My YAPC::NA 2012 recap

June 19, 2012 by Andy | 0 comments

Random notes and comments about YAPC::NA in Madison, WI

ack 2.0

I uploaded ack 2.00alpha01 to the CPAN.

All that week, Rob Hoelz did a ton of work, and Jerry Gay was invaluable in helping us work through some configuration issues. Then, out of nowhere, Ryan Olson swooped in to close some sticky issues in the GitHub queue. I love conferences for bringing people together to get things done.

Finally, on Thursday night at the Bad Movie BOF I hacked away on the final few tickets while watching “Computer Beach Party (1987)”. Halfway through MST3K’s take on “Catalina Caper (1967)”, I made the alpha release. If that’s not heaven, I don’t know what is.

Mojolicious

Glen Hinkle

Mojolicious looks really cool. Glen called it a “full web framework, not partial,” although I’m not sure what would count as a partial framework.

It has no outside dependencies, and works to have a lot of bleeding edge features like websockets, non-blocking events, IPv6 and concurrent requests.

Mojo::UserAgent is the client that is part of Mojolicious, and it’s got all sorts of cool features:

DOM parsing
text selection via CSS selectors
- For example, “give me all the text that is #introduction ul li.”
- Command line: mojo get mojolicio.us '#introduction ul li'
JSON parsing
JSON pointers
- JSON pointers look like XPath as a way of specifying data in a JSON string

Mojolicious is based on “routes”, which look like:

get '/'
get '/:placeholder'
get '/#relaxed'
get '/*wildcard'

The latter three are (apparently) ways of making flexible URL specifications that then return information to your app about the URL.

Sample app with Mojolicious::Lite:

use Mojolicious::Lite;

# Render the embedded template named "mytemplate" for requests to /
get '/' => sub {
    my $self = shift;
    $self->render( template => 'mytemplate' );
};

app->start;

__DATA__
@@ mytemplate.html.ep
Hello!

Mojolicious also has its own templating language that looks a lot like Mason, but Glen said you can use Template Toolkit as well (and presumably others, but TT was the only one I was interested in.)

Full Mojolicious includes a dev server called Morbo and you can run your apps through the Hypnotoad “hot-code-reloading production server” if you don’t want to run under Apache/etc.

Another selling point for Mojolicious: They value making things “beautiful” and “fun”. Glen specifically said “Join our IRC channel. We will not be mean to you.”

Perl-as-a-Service shootout

Mark Allen

Slides

This was disappointing because I was hoping for recommendations to use or not use a given vendor’s offerings. I was hoping at least for “This vendor does this, and that one does that differently,” but all I came away with was “they’re pretty much the same.”

It’s a good sign that, as Mark put it, “getting PSGI-compliant apps into PaaS is generally pain free.”

His criteria were as follows:

Ease of deployment
Performance (ignored)
Cost (ignored)
How “magical” the Perl support is (first class or hacked together)

Why ignore performance and cost? I don’t know.

Big data and PDL

There were three sessions back-to-back about PDL, the Perl Data Language. It’s in the same space as Mathematica and R. I was disappointed because I was hoping for big data analysis outside of just number crunching. The analysis of galaxy luminosity was pretty and looked very easy to do, but it didn’t have any application I was interested in. I bailed after the 2nd talk.

My big takeaway from the talk was that I need to take a statistics class.

Web security 101

Michael Peters gave a good intro talk on security, handwaving the tech details with examples of “This is how bad guys can get your info.”

Emphasis on not trusting your client data, but I was surprised and disappointed that he seemed to steer people away from Perl’s taint mode. He made vague reference to there being bugs with regexes and taint mode, but I don’t know what he’s referring to.

Taint mode is one of my favorite things about Perl 5, and there are (last I checked) no plans for implementing it in Perl 6.

One of Michael’s examples of a SQL injection attack used sleep() to let the attacker infer information about the database from response timings. I asked him to write that up for bobby-tables.com.

On being a polyglot

Miyagawa gave a great overview of how he spends time in Perl, Python and Ruby, and what he learns from each, and what each language learns from the others.

Key point: Ruby is not the enemy. They are neighbors.

Things he likes about Ruby:

Everything is an object
More Perlish than Python
Diversity matters = TIMTOWTDI
Meta programming built in and encouraged
Convention of ! and ? in method names
- str.upcase! to upcase str in place
- str.empty? for methods that return true or false
Ability to omit self
Everything is an expression.
No need to type ; (unlike Python)
Implicit better than explicit
block, iterators and yield
No semicolons, 2-space indent.
- (This last one gives me the creeps. 2-space indent!??!)

Naming differences between the three:

Perl naming: Descriptive, boring, clones become ::Simple
Python naming: Descriptive, confusing, everything is py* or *py
Ruby naming: Fancy, creative, chaotic (Sinatra, Rails, etc)
With frameworks, all the languages get creative: Django, Bottle, Catalyst, Dancer, Mojolicious

When you’re going to borrow something from another language, don’t just borrow it, but copy it wholesale. Example: Perl’s WWW::Mechanize getting cloned as Ruby’s WWW::Mechanize.

Doing Things Wrong, chromatic

chromatic talked about the value of doing things “wrong” and embracing your constraints. Sometimes you can’t do The Perfect Job, and that’s OK, and sometimes the result comes out even better.

Example: chromatic wanted to do some parallel web fetching. He could have dug into LWP::Parallel, but instead he went with what he knew: waitpid() and shelling to curl.
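The same start-the-children-then-wait pattern can be sketched directly in shell. This is not chromatic’s code (his was Perl calling waitpid()); it is a minimal illustration that swaps in the shell’s background jobs and the wait builtin. The function name fetch_all and the FETCHER override variable are made up for this sketch.

```shell
# fetch_all OUTDIR URL...
# Starts one background fetch per URL, then waits for each child in turn.
# FETCHER is a hypothetical override hook for this sketch; it defaults to curl.
fetch_all() {
    outdir=$1
    shift
    count=0
    pids=""
    for url in "$@"; do
        count=$((count + 1))
        # Each fetch runs as its own child process; $! is that child's PID.
        "${FETCHER:-curl}" -s -o "$outdir/$count.out" "$url" &
        pids="$pids $!"
    done

    failures=0
    for pid in $pids; do
        # wait PID returns that child's exit status, much like waitpid().
        wait "$pid" || failures=$((failures + 1))
    done
    return "$failures"
}
```

Calling it looks like fetch_all /tmp/pages http://example.com/ http://example.org/, and the return value is the number of fetches that failed.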

Screen scraping example:

Obvious answer: HTML::TokeParser::Simple or Mojo::DOM
Common: Regexes
Lovely: Template::Extract

Parsing HTML with regex may be the “wrong” way to do it, but sometimes, it’s the best solution.

Perl 6 lists

Patrick Michaud talked about all kinds of awesome stuff you can do with lists and arrays in Perl 6. After a bit I stopped trying to take notes and follow what he was saying and instead just let it wash over me so I could absorb the coolness.

I would really like Perl 6 to be easy enough to install for serious play. I need to get my feet back into the Perl 6 pool and see how I can help.

Tweakers Anonymous

John Anderson (genehack)

Quick overview of cool things that he has in his configs.

“The F keys are not just to skip tracks in your music player.”
Keep your configs in git. You will screw them up. This will save you.
Make your editor chmod +x when you create a .pl file since you know you will want to run it.

The coolest thing was this plugin called flymake. Apparently it runs continuously, submitting your code to a compiler (or perl -c) as you type. As soon as John made a typo on a line and moved to the next line, the error line was highlighted. He then demonstrated doing this with Perl::Critic, which must be dog slow, but flymake lets you adjust the frequency of checks.

Exceptional Exceptions

Mark Fowler, now at OmniTI. Great discussion of exceptions in Perl.

Returning false on failure sucks because you have to follow your failures all the way up the call tree. It’s tedious and error-prone because all it takes is one link in the chain to not propagate the error and you’re out of luck.

The alternative is exceptions, in the style of try/catch from Java.

There are three non-deprecated ways of doing exceptions in Perl.

eval

The block form, eval { ... }, is often confused with eval $string, which compiles and runs a string as code. The block form is an expression, not a control structure like if, so it requires a semicolon after the closing brace. It works but it’s a pain.

Try::Tiny

Simple extension to the syntax
Uses $_ not $@

TryCatch

Has named exception variables
Fully functional syntax
Very fast and featureful
Large dependency base

TryCatch is a little faster than Try::Tiny, but eval is much much faster than either of them.

TryCatch has much more clever syntax, but looks (to me) to be more dangerous.

Mark recommends that whatever you use, you make exceptions out of Exception::Class objects.

Self-selecting for the thick-skinned means turning away contributors.

May 29, 2012 by Andy | 2 Comments

Every so often, usually in the middle of an online argument or flame war, someone will say that the climate of the group has him or her uncomfortable. He’ll say something like “I don’t want to be around all this hostility” or, worst of all, “This makes me not want to get involved.” The reply sometimes comes back “You’re just thin-skinned.”

Labeling someone as “thin-skinned” makes no sense. There is no measure of skin thickness. When someone says “You are thin-skinned,” he’s really saying “You are less willing to put up with anti-social behavior than I am.”

I wonder what the speaker hopes for “You’re just thin-skinned” to do. Is that supposed to inspire the listener? Make him realize the error of his ways? I don’t know what the intent is, but it communicates “You are wrong to feel that way” and that’s hurtful, not helpful. There’s nothing wrong with not wanting to put up with anti-social behavior.

None of this is an endorsement of being easily offended, however you may define “easily.” I wish we all had the attitude of Gina Trapani, who once said “I eat your sexist comments for breakfast. YUM.” But not everyone does, and that’s no reason to shut them out. Yes, online communities can get hostile, but that doesn’t mean we need to tacitly endorse that hostility. We can do better, and we should, to help our communities grow and thrive.

Aside from ignoring the aspect of treating other humans with compassion, it makes no sense to ignore or insult those you see as thin-skinned. Ricardo Signes recalled a lightning talk at OSCON 2011 where someone noted “When we say that this community requires a thick skin, it means we’re self-selecting for only people with thick skin.”

Self-selecting for the thick-skinned means turning away contributors. If you were running a restaurant, and a customer said “I like the food here, but my waiter was rude to me,” the sensible restaurateur would take this as an opportunity for improvement. You’d thank the patron for bringing it to your attention. You wouldn’t say “Well, that’s just the way it is here” or “You’re just too sensitive.”

There’s an adage in business that for every customer complaint you get, there are between ten and 100 other dissatisfied customers that don’t say anything and go somewhere else. This is especially so in the case of those tarred as “thin-skinned” by someone in the community. For every person who speaks up and says “I don’t like this hostility”, how many more unsubscribe from the list, leave the IRC channel or vow not to come back to the user group meeting again, all without saying a word about it?

In online communities, we’re not dealing with an owner-customer relationship, but nonetheless contributors to the community are a scarce commodity. A business owner can’t afford to turn away customers. Is your online community or open source project so flush with talent that you can turn away contributors?

My Solr+Tomcat troubles, and how I fixed them

May 22, 2012 by Andy | 1 Comment

I’ve been working at getting Solr working under Tomcat, and spent most of a day working on fixing these problems. The fixes didn’t take as much time as trying to grok the Java app ecosystem.

My Solr install worked well. I was able to import records and search them through the interface. Where I ran into trouble was with the Velocity search browser that comes with Solr.

I’m documenting my troubles and their solutions here because otherwise they won’t exist on the web for people to find. Putting solutions to problems on the web makes them findable for the next poor guy who has the same problem. I figure that if I spend a day working on fixing problems, I can spend another hour publishing them so others can benefit.

These are for Solr 3.5 running under Tomcat 6.0.24.

Unable to open velocity.log

Velocity tries to create a file velocity.log and gets a permission failure.

HTTP Status 500 - org.apache.velocity.exception.VelocityException:
Failed to initialize an instance of
org.apache.velocity.runtime.log.Log4JLogChute with the current
runtime configuration. java.lang.RuntimeException:
org.apache.velocity.exception.VelocityException: Failed to initialize
an instance of org.apache.velocity.runtime.log.Log4JLogChute with
the current runtime configuration. at
...
Caused by: java.io.FileNotFoundException: velocity.log
(Permission denied) at java.io.FileOutputStream.openAppend(Native
Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:207)
...

But where is it trying to create the file? What directory? Since no pathname was specified, it seemed that the file would be created in the current working directory of Tomcat. What would that be?

First I had to figure out which process Tomcat was running as:

frisbee:~ $ ps aux | grep tomcat
tomcat     498  0.6  1.3 6240056 214880 ?      Sl   09:27   0:10 /usr/lib/jvm/java/bin/java ....

In this case, it’s PID 498. So we go to the /proc/498 directory and see what’s in there.

frisbee:~ $ cd /proc/498
frisbee:/proc/498 $ ls -al
ls: cannot read symbolic link cwd: Permission denied
ls: cannot read symbolic link root: Permission denied
ls: cannot read symbolic link exe: Permission denied
total 0
dr-xr-xr-x   7 tomcat tomcat 0 May 22 09:27 ./
dr-xr-xr-x 173 root   root   0 May 17 11:33 ../
dr-xr-xr-x   2 tomcat tomcat 0 May 22 09:58 attr/
-rw-r--r--   1 tomcat tomcat 0 May 22 09:58 autogroup
-r--------   1 tomcat tomcat 0 May 22 09:58 auxv
-r--r--r--   1 tomcat tomcat 0 May 22 09:58 cgroup
--w-------   1 tomcat tomcat 0 May 22 09:58 clear_refs
-r--r--r--   1 tomcat tomcat 0 May 22 09:56 cmdline
-rw-r--r--   1 tomcat tomcat 0 May 22 09:58 coredump_filter
-r--r--r--   1 tomcat tomcat 0 May 22 09:58 cpuset
lrwxrwxrwx   1 tomcat tomcat 0 May 22 09:58 cwd
...

We can see that cwd is a symlink to a directory, but we have to be root to see what the target directory is. I have to run ls again as root.

frisbee:/proc/498 $ sudo ls -al
[sudo] password for alester:
total 0
dr-xr-xr-x   7 tomcat tomcat 0 May 22 09:27 .
dr-xr-xr-x 174 root   root   0 May 17 11:33 ..
dr-xr-xr-x   2 tomcat tomcat 0 May 22 09:58 attr
-rw-r--r--   1 tomcat tomcat 0 May 22 09:58 autogroup
-r--------   1 tomcat tomcat 0 May 22 09:58 auxv
-r--r--r--   1 tomcat tomcat 0 May 22 09:58 cgroup
--w-------   1 tomcat tomcat 0 May 22 09:58 clear_refs
-r--r--r--   1 tomcat tomcat 0 May 22 09:56 cmdline
-rw-r--r--   1 tomcat tomcat 0 May 22 09:58 coredump_filter
-r--r--r--   1 tomcat tomcat 0 May 22 09:58 cpuset
lrwxrwxrwx   1 tomcat tomcat 0 May 22 09:58 cwd -> /usr/share/tomcat6

I could also have used the stat command.

frisbee:/proc/498 $ sudo stat cwd
File: `cwd' -> `/usr/share/tomcat6'
Size: 0               Blocks: 0          IO Block: 1024   symbolic link
Device: 3h/3d   Inode: 100017      Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (   91/  tomcat)   Gid: (   91/  tomcat)
Access: 2012-05-22 09:58:17.131009458 -0500
Modify: 2012-05-22 09:58:17.130009715 -0500
Change: 2012-05-22 09:58:17.130009715 -0500
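A third option, not used in the session above, is readlink, which prints only the symlink target. The sketch below assumes a Linux-style /proc filesystem; proc_cwd is a made-up helper name. It demonstrates on the current shell’s own PID, which needs no extra privileges. For a process owned by another user, such as tomcat, it would need to run under sudo.

```shell
# Print the working directory of a process by resolving /proc/<pid>/cwd.
# proc_cwd is a hypothetical helper name for this sketch.
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# Demonstrate on the current shell ($$), which we are always allowed to inspect.
proc_cwd "$$" || echo "no readable /proc entry for PID $$"
```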

So we find that the CWD is /usr/share/tomcat6. I don’t want the tomcat user to have rights to that directory, so instead I create a velocity.log file in a proper log directory and then symlink to it.

frisbee:/proc/498 $ cd /var/log/tomcat6
frisbee:/var/log/tomcat6 $ sudo touch velocity.log
frisbee:/var/log/tomcat6 $ sudo chown tomcat:tomcat velocity.log
frisbee:/var/log/tomcat6 $ cd /usr/share/tomcat6
frisbee:/usr/share/tomcat6 $ sudo ln -s /var/log/tomcat6/velocity.log velocity.log

Now the app is able to open /usr/share/tomcat6/velocity.log without error.
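The pattern can be tried safely without sudo. This minimal sketch uses two scratch directories as stand-ins for /var/log/tomcat6 and /usr/share/tomcat6 (the variable names are made up) and shows that a write through the symlink lands in the real log file.

```shell
# Scratch directories standing in for the log directory and Tomcat's CWD.
log_dir=$(mktemp -d)
app_cwd=$(mktemp -d)

# Create the real log file where logs belong...
touch "$log_dir/velocity.log"
# ...and leave only a symlink to it in the application's working directory.
ln -s "$log_dir/velocity.log" "$app_cwd/velocity.log"

# Appending through the symlink writes to the real file.
echo "test entry" >> "$app_cwd/velocity.log"
cat "$log_dir/velocity.log"
```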

log4j error

Once I created a log file Velocity could write to, it started throwing an error with log4j. log4j is the Java logging package.

org.apache.log4j.Logger.setAdditivity(Z)V java.lang.NoSuchMethodError:
org.apache.log4j.Logger.setAdditivity(Z)V at
org.apache.velocity.runtime.log.Log4JLogChute.initAppender(Log4JLogChute.java:126) at
org.apache.velocity.runtime.log.Log4JLogChute.init(Log4JLogChute.java:85) at
org.apache.velocity.runtime.log.LogManager.createLogChute(LogManager.java:157) at
org.apache.velocity.runtime.log.LogManager.updateLog(LogManager.java:255) at
org.apache.velocity.runtime.RuntimeInstance.initializeLog(RuntimeInstance.java:795) at
org.apache.velocity.runtime.RuntimeInstance.init(RuntimeInstance.java:250) at
org.apache.velocity.app.VelocityEngine.init(VelocityEngine.java:107) at
org.apache.solr.response.VelocityResponseWriter.getEngine(VelocityResponseWriter.java:132) at
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:40) at
org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.write(SolrCore.java:1774) at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:352) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273) at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555) at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)

In searching the web for this error, I found this ticket in the Solr bug tracker that says that the log4j .jar files should be removed from the Solr tarball, because they can conflict with existing .jars on the system. That conflict was exactly the error I was getting.

I wanted to remove the extra .jar files, so I used locate to search my system for any log4j .jars. Indeed, there was one installed with solr:

frisbee:~ $ locate log4j
...
/var/lib/tomcat6/webapps/solr/WEB-INF/lib/log4j-over-slf4j-1.6.1.jar
...
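locate answers from a prebuilt index, so a recently installed file may not show up until the index is refreshed. If that is a concern, find walks the live filesystem instead. The sketch below runs against a scratch directory standing in for a webapp’s WEB-INF/lib; other-library.jar is a made-up file name for illustration.

```shell
# Scratch tree standing in for a webapp's WEB-INF/lib directory.
lib_dir=$(mktemp -d)
touch "$lib_dir/log4j-over-slf4j-1.6.1.jar" "$lib_dir/other-library.jar"

# List every .jar under the directory whose name mentions log4j.
found=$(find "$lib_dir" -type f -name '*log4j*.jar')
echo "$found"
```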

So I just changed the extension of the file so it wouldn’t get loaded as a .jar.

frisbee:~ $ sudo mv /var/lib/tomcat6/webapps/solr/WEB-INF/lib/log4j-over-slf4j-1.6.1.{jar,jarx}

Now Velocity loads beautifully. Now the real work starts: Configuration of Velocity to understand the schema in my Solr core.

I hope this helps someone in the future!

Rethink the post-interview thank you note

May 15, 2012 by Andy | 2 Comments

Good golly do people get riled up by the idea of sending a thank you note after a job interview. “Why should I thank them, they didn’t give me a gift!” is a common refrain in /r/jobs. “They should be thanking me!”

I think the big problem is the name, “thank you note.” It makes us recall being forced to say nice things about the horrible sweater Aunt Margaret gave us for Christmas.

It’s not a thank you note. It’s a followup. It doesn’t have to be any more than this:

Dear Mr. Manager,

Thank you for the opportunity to meet with you today. I enjoyed the interview and tour and discussing your database administration needs. Based on our discussions with Peter Programmer, I’m sure that my PostgreSQL database administration skills would be a valuable addition to the Yoyodyne team. I look forward to hearing from you.

Sincerely,
Susan Candidate.

There’s nothing odious there. You’re not fawning or begging. You’re thanking the interviewer for his time, reminding him of key parts of the interview and your key skills, and reasserting that you are interested in the job. (And before you say “Of course I’m interested, I went to the interview!”, know that perceived indifference and/or lack of enthusiasm is an interview killer.)

People ask “Do I really have to do that?” and I say “No, you don’t HAVE to, you GET to.” It’s not a chore, it’s an opportunity.

Please help me with terminology for “small acts that add to a greater whole”

May 11, 2012 by Andy | 14 Comments

I’m looking for a term to describe small positive actions that individuals do to add up to a greater whole.

Examples in the world of open source software might include:

Answering a question on a mailing list
Testing a beta release
Welcoming someone to a community
Submitting a bug report, or clarifying an existing one
Patching a bug
Closing a ticket
Removing dead code
Silencing a compiler warning
Adding a test to the test suite
Blogging about how you use a software package
Thanking others on the project
Patching the documentation
Adding a tutorial example to the docs
Adding notes to the README
Hosting or speaking at a user group meeting
Attending a user group meeting

Outside of software development specifically, the best example is making an edit to a Wikipedia page. Wikipedia is nothing but millions of these small actions, aggregated.

The term “microaggression” was coined to describe a small non-physical interaction between people that communicates hostility towards others. I’m looking for the opposite.

The Japanese term “kaizen” means “improvement”, or “change for the better”, and is close to what I’m talking about, but I’m looking for a term for the actions, not the process.

If there’s not a similar term to describe the small positive actions that create a greater whole, I’m going to coin it.

Ideas? References? Existing terms I haven’t thought of? Please post them below.

Before you write a patch, write an email

April 27, 2012 by Andy | 7 Comments

I often get surprise patches in my projects from people I’ve never heard from. I’m not talking about things like fixing typos, or fixing a bug in the bug tracker. I’m talking about new features, handed over fully-formed. Unfortunately, it’s sometimes the case that the patch doesn’t fit the project, or where the project is going. I feel bad turning down these changes, but it’s what I have to do.

Sometimes it feels like they’re trying to do their best to make the patch a surprise, sort of like working hard to buy your mom an awesome birthday present without her knowing about it. But in the case of contributing to a project, surprise isn’t a good thing. Talking to the project first doesn’t take away from the value of what you’re trying to do. This talking up front may even turn up a better way to do what you want.

There’s nothing wrong with collaborating with others to plan work to be done. In our day-to-day jobs, when management, clients and users push us to start construction of a project before requirements are complete, it’s called WISCY, or Why Isn’t Someone Coding Yet? As programmers, it’s our job to push back against this tendency in order to avoid wasted work. Sometimes this means pushing back against users, and sometimes it means pushing back against ourselves.

I’m not suggesting that would-be contributors go through some sort of annoying process, filling out online forms to justify their wants. I’m just talking about a simple email. I know that we want to get to the fun part of coding, but it makes sense to spend a few minutes to drop a quick note: “Hey, I love project Foo, and I was thinking about adding a switch to do X.” You’re sure to get back a “Sounds great! Love to have it!” or a “No, thanks, we’ve thought about that and decided not to do that”. Maybe you’ll find that what you’re suggesting is already done and ready for the next release. Or maybe you’ll get no reply to your email at all, which tells you your work will probably be ignored anyway.

I’m not suggesting that you shouldn’t modify code for your own purposes. That’s part of the beauty of using open source. If you need to add a feature for yourself, go ahead. But if your goal is to contribute to the project as well as scratching your own itch, it only makes sense to start with communication.

Communication starts with understanding how the project works. The docs probably include something about the development process the project uses. While you’re at it, join the project’s mailing list and read the last few dozen messages in the archive. I can’t tell you how many times I’ve answered a question or patch from someone when I’ve said the same thing to someone else a week earlier.

Next time you have an idea to contribute a change to an open source project, let someone know what you’re thinking first. Find out if your patch is something the project wants. Find out what the preferred process for submitting changes is. Save yourself from wasted time.

We want your collaboration! We want your help! Just talk to us first.

What if news stories were written like resumes?

April 20, 2012 by Andy | 0 comments

If news stories were written like the resumes I see every day, a news story about a fire might look like this:

“There was a fire on Tuesday in a building. Traffic was backed up some distance for some period of time. Costs of the damage were estimated. There may have been fatalities and injuries, or maybe not.”

Now look at your resume. Does it have bullet items like “Wrote web apps in Ruby”? That’s just about as barely informative as my hypothetical news story above. However, your resume’s job is to get you an interview by providing compelling details in your work history.

Add details! What sort of web apps? What did they do? Did they drive company revenue? How many users used them? How big were these apps?

Or maybe you have a bullet point of “provided help desk support.” How many users did you support? How many incidents per day/week? What sorts of problems? Were they geographically close, or remote? What OSes did you support? What apps? Was there some sort of service level agreement you had to hit?

If you don’t provide these details, the reader is left to make her own assumptions. “Help desk support” might mean something as basic as handling two phone calls a day for “I can’t get the Google to work” questions. Without details you provide, that’s the picture the reader is free to infer.

When you write about your work experiences, you have a picture in your head of the history and skills you’re talking about. To you, “wrote web apps in Ruby” or “provided help desk support” brings back the memory of what that entailed. The reader doesn’t have access to your memory. That’s why you have a resume with written words. You have to spell it out, to draw that picture for her. Your details make that happen and increase the chances you’ll get an interview.

Programmers, please take five minutes to provide some data for an experiment

April 19, 2012 by Andy | 29 Comments

Whenever people talk about ack, there’s always a discussion of whether ack is faster than grep, and how much faster, and people provide data points that show “I searched this tree with find+grep in 8.3 seconds, and it took ack 11.5 seconds”. Thing is, that doesn’t take into account the amount of time it takes to type the command.

How much faster is it to type an ack command line vs. a find+xargs line? I wanted to time myself.

Inspired by this tweet by @climagic, I wanted to find out for myself. I used time read to see how long it would take me to type three different command lines.

The three command lines are:
A: ack --perl foo
B: find . -name '*.php' | xargs grep foo
C: find . -name '*.pl' -o -name '*.pm' | xargs grep foo

So I tried it out using time read. Note that it’s not actually executing the command, but measuring how long it takes to type the line and hit Enter.

$ time read
find . -name '*.pl' -o -name '*.pm' | xargs grep foo

real    0m8.648s
user    0m0.000s
sys     0m0.000s

For me, my timings came out to average about 1.4s for A, 6.1s for B and 8.6s for C. That was with practice. I also found that it is nearly impossible for me to type the punctuation-heavy B and C lines without making typos and having to correct them.

So I ask of you, dear readers, would you please try this little experiment yourself, and then post your results in the comments? Just give me numbers for A, B and C and then also include the name of your favorite Beatle so I know you actually read this. Also, if you have any insights as to why you think your results came out the way they did, please let me know.

At this point I’m just collecting data. It’s imperfect, but I’m OK with that.

Yes, I’m sure there’s another way I could do this timing. It might even be “better”, for some values of “better”.
Yes, I know that I’m asking people to report their own data and there may be observational bias.
Yes, I know I’m excluding Windows users from my sample.
Yes, I know it’s possible to create shell aliases for long command lines.
Yes, I know that the find command lines should be using find -print0 and xargs -0.
Yes, I know that some shells have globbing like **/*.{pl,pm}.
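For reference, here is a minimal sketch of what the null-delimited form of command line C looks like. It runs against a scratch directory with made-up file names, one of which contains a space. The -print0 and -0 options are extensions that the GNU and BSD versions of find and xargs both provide.

```shell
# Build a scratch tree containing a Perl file whose name has a space in it.
demo_dir=$(mktemp -d)
printf 'my $foo = 1;\n' > "$demo_dir/my script.pl"
printf 'package Bar;\n' > "$demo_dir/Bar.pm"

# The parentheses group the two -name tests so that -print0 applies to both.
# NUL-separated names survive spaces; grep -l prints only matching file names.
matches=$(find "$demo_dir" \( -name '*.pl' -o -name '*.pm' \) -print0 |
    xargs -0 grep -l foo)
echo "$matches"
rm -r "$demo_dir"
```

Without -print0 and -0, xargs would split "my script.pl" into two arguments and grep would complain about two files that do not exist.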

Note: I’ve heard from a zsh user that time doesn’t work for this because it’s a shell function, but /usr/bin/time does work.

Thanks for your help! I’ll report on results in a future post.

The world’s two worst variable names

April 18, 2012 by Andy | 70 Comments

As programmers, assigning names makes up a big part of our jobs. Phil Karlton said “There are only two hard things in Computer Science: cache invalidation and naming things.” It’s a hard problem, and it’s something we deal with every time we write a line of code. Whether it’s a variable or a table or a column in that table or a file on the filesystem, or what we call our projects and products, naming is a big deal.

Bad variable naming is everywhere. Maybe you’ll find variables that are too short to be adequately descriptive. The programmer might as well have been working in TRS-80 BASIC, where only the first two characters of variable names were significant, and we had to keep a handwritten lookup chart of names in a spiral notebook next to the keyboard.

Sometimes you’ll find variables where all vowels have been removed as a shortening technique, instead of simple truncation, so you have $cstmr instead of $cust. I sure hope you don’t have to distinguish the customers from costumers! Worse, $cstmr is harder to type because of the lack of vowels, and is no longer pronounceable in conversation.

There are also intentionally bad variable names, where the writer was more interested in being funny than clear. I’ve seen $crap as a loop variable, and a colleague tells of overhauling old code with a function called THE_LONE_RANGER_RIDES_AGAIN(). That’s not the type of bad variable name I mean.

While I’m well aware that variable naming conventions can often turn into a religious war, I’m entirely confident when I declare The World’s Worst Variable Name is $data.

Of course it’s data! That’s what variables contain! That’s all they ever contain. It’s like if you were packing up your belongings in moving boxes, and on the side you labeled the box “matter.”

Variable names should say what type of data they hold. Asking the question “what kind” is an easy way to enhance your variable naming. I once saw $data used when reading a record from a database table. The code was something like:

$data = read_record();
print "ID = ", $data["CUSTOMER_ID"];

Asking the question “what kind of $data?” turns up immediate ideas for renaming. $record would be a good start. $customer_record would be better still.

Vague names are the worst, but right behind them are naming related objects with nearly identical names that do not distinguish them. Therefore the World’s Second Worst Variable Name is: $data2.

More generally, any variable that relies on a numeral to distinguish it from a similar variable needs to be refactored, immediately. Usually, you’ll see it like this:

$total = $price * $qty;
$total2 = $total - $discount;
$total2 += $total2 * $taxrate;

$total3 = $purchase_order_value + $available_credit;
if ( $total2 < $total3 ) {
    print "You can't afford this order.";
}

You can see this as an archaeological dig through the code. At one point, the code only figured out the total cost of the order, $total. If that’s all the code does, then $total is a fine name. Unfortunately, someone came along later, added code for handling discounts and tax rate, and took the lazy way out by putting it in $total2. Finally, someone added some checking against the total that the user can pay and named it $total3.

The real killer in this chunk of code is that if statement:

if ( $total2 < $total3 )

You can’t read that without going back to figure out how each value was calculated. You have to scroll back up to keep track of what’s what.

If you’re faced with naming something $total2, change the existing name to something more specific. Spend the five minutes to name the variables appropriately. This level of refactoring is one of the easiest, cheapest and safest you can do, especially if the renaming is confined to a single subroutine.

Let’s do a simple search-and-replace on the coding horror above:

$order_total = $price * $qty;
$payable_total = $order_total - $discount;
$payable_total += $payable_total * $taxrate;

$available_funds = $purchase_order_value + $available_credit;
if ( $payable_total < $available_funds ) {
    print "You can't afford this order.";
}

The only thing that changed was the variable names, and already it’s much easier to read. Now there’s no ambiguity as to what each of the _total variables means. And look what we found: The comparison in the if statement was reversed. The code complains when the payable total is less than the available funds, which is exactly when the customer can afford the order. Effective naming makes it obvious.
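Here is a sketch of the fix, assuming the intent is to refuse the order when the amount owed is more than the funds available. The sample values are made up purely for illustration and are not from any real code.

```perl
use strict;
use warnings;

# Hypothetical sample values, for illustration only.
my $price                = 20.00;
my $qty                  = 3;
my $discount             = 5.00;
my $taxrate              = 0.10;
my $purchase_order_value = 40.00;
my $available_credit     = 15.00;

# What the order costs before and after discount and tax.
my $order_total   = $price * $qty;
my $payable_total = $order_total - $discount;
$payable_total   += $payable_total * $taxrate;

# What the customer is able to pay.
my $available_funds = $purchase_order_value + $available_credit;

# Corrected comparison: the order is unaffordable when it costs
# MORE than the funds available, not less.
if ( $payable_total > $available_funds ) {
    print "You can't afford this order.\n";
}
```

With these made-up numbers the payable total is larger than the available funds, so the message prints. With the original names, the same one-character fix would have been much harder to spot.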

There is one exception to the rule that all variables ending with numerals are bad. If the entity itself is named with a number, then keep that as part of the name. It’s fine to use $sha1 for a variable that holds a SHA-1 hash. It helps no one to rename it to $sha_one.

After I wrote the first version of this article, I created two Perl::Critic policies to check for these naming problems: ProhibitVagueNames and ProhibitNumberedNames. Both are part of my add-on module Perl::Critic::Bangs.
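A minimal sketch of turning both policies on in a .perlcriticrc file, assuming Perl::Critic and Perl::Critic::Bangs are installed from CPAN. The section names follow Perl::Critic’s usual convention of dropping the Perl::Critic::Policy:: prefix. The severity lines are my assumption about how you might want them reported, so check each policy’s documentation for its defaults and options.

```ini
# .perlcriticrc
# Raising severity to 5 is an illustrative choice, so that both policies
# are reported even when perlcritic runs at its gentlest setting.

[Bangs::ProhibitVagueNames]
severity = 5

[Bangs::ProhibitNumberedNames]
severity = 5
```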

What other naming sins drive you crazy? Have you created automated ways to detect them?

Undecided if something should go on your resume? Add more detail for guidance.

April 11, 2012 by Andy | 2 Comments

Conventional wisdom has it that resumes have to be written in the most clipped, stilted business-speak possible. It’s not true. Thinking that way is a disservice to our resumes and our job prospects.

A poster on Reddit asked how proficient he should be in German before listing it on his resume. You can see where he’s coming from. He’s wondering if he can add a “Languages spoken: German” bullet point to his resume, and that’s good. The problem is that the clipped business-speak mentality has him thinking that that’s all he can say.

You can and should add detail to your resume. The more detail you add, the less chance there is for misinterpretation, and it helps you think more about your skills and how you can sell them to the reader.

I suggest that instead of putting an overly terse “Languages spoken: German”, you add a sentence giving details. This might be, for example:

I am fluent in written and spoken German, and have been for the past 20 years.
I have conversational fluency with spoken German.
I know some German words I picked up from my Grandma.

If in the process of writing the details of your skill you find that it sounds silly, then you’ve answered your question as to whether it should be on your resume. To be clear, that last bullet item isn’t worth putting on your resume.

This process works with any item you want to put on a resume. As you add detail, does it still sound like it’s worth putting on there? If not, leave it off. If it is, work with that detail to grab the reader’s attention.

Programmers struggle with this all the time. “How much Ruby do I have to know before I can put it on my resume?” Add detail to answer your own question. If you’re not going to be comfortable answering the question “How have you used Ruby?” in the interview, then don’t put it on your resume.

Finally, always remember why you have a resume: A resume exists to get the reader to call you in for an interview. If something isn’t going to make the reader say “We need to get her in here ASAP”, then leave it off.

Andy Lester

Technology, careers, life and being happy