Paul Scheinast offers many naïve Perl benchmarks

The Benchmark module is one of the most dangerous things to give a new and uninformed Perl programmer. Paul Scheinast put together a page, Write Fast Code in Perl, that completely misses the point of optimizing code. There’s some conversation about it in the Reddit post How to write fast code in Perl.

As I emphasize in Mastering Perl, faster code comes from better algorithms instead of different syntax. The difference between tr/// and y///, for instance, is so insignificant that the real problem is the time you waste thinking about it.

But the big problem with these benchmarks, and with the people who tout the results, is that they don’t run them more than once. They get an answer and stop thinking about it.

Here’s his Perl y vs tr benchmark:

use Benchmark qw(:all) ;
 $x = "12121211"; $x=~tr/1/X/; print $x.$/;
 $x = "12121211"; $x=~y/1/X/;  print $x.$/;
 
cmpthese(-2, {
'tr'  => sub {$x = "12121211"; $x=~tr/1/X/;},
'y'   => sub {$x = "12121211"; $x=~y/1/X/;},
});

He shows one set of results:

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 3772974/s  -- -9%
y  4156585/s 10%  --

To a knowledgeable Perl programmer, this is an amazing result, since tr and y are the same thing. It’s not that they do the same task in different ways. They compile to the same code (see “Use B::Deparse to see what perl thinks the code is” at The Effective Perler). The two operators are synonyms and are indistinguishable after compilation:

$ perl -MO=Deparse -e 'sub x { my $x = "123456"; $x =~ tr/3/X/ }'
sub x {
    my $x = '123456';
    $x =~ tr/3/X/;
}
-e syntax OK

$ perl -MO=Deparse -e 'sub x { my $x = "123456"; $x =~ y/3/X/ }'
sub x {
    my $x = '123456';
    $x =~ tr/3/X/;
}
-e syntax OK
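The same check works from inside a program. This sketch, which relies only on the core B::Deparse module, compiles two anonymous subs that differ only in spelling tr versus y and compares their deparsed text. Equal deparsed text is strong evidence of identical compiled code, though it is a comparison of regenerated source rather than of the op trees themselves:

```perl
use strict;
use warnings;
use B::Deparse;

# Two anonymous subs that differ only in the spelling of the operator.
my $with_tr = sub { my $x = "123456"; $x =~ tr/3/X/ };
my $with_y  = sub { my $x = "123456"; $x =~ y/3/X/ };

# coderef2text turns compiled code back into Perl source.
my $deparse = B::Deparse->new;
my $tr_text = $deparse->coderef2text($with_tr);
my $y_text  = $deparse->coderef2text($with_y);

print $tr_text, "\n";    # the y version comes back spelled as tr as well
print $tr_text eq $y_text ? "same deparsed source\n" : "different\n";
```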

Benchmarking is dangerous because there isn’t a single answer. There are many answers and many things to consider. After I run a benchmark, I run it again to see if I get the same answer. Typically, the percentages change a bit, but in this case the two snippets trade places too:

$ for i in `seq 1 6`; do perl5.22.0 tr-v-y.pl; done
X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  4549604/s  -- -5%
tr 4764789/s  5%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  5218028/s  -- -1%
tr 5290980/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  5131999/s  -- -2%
tr 5236451/s  2%  --

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 4508617/s  -- -1%
y  4550135/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 4091430/s  -- -1%
y  4151598/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  4683416/s  -- -0%
tr 4683416/s  0%  --

If I thought that these two operators were different, I might be confused by these results. Instead of stopping here, I want to try something else. I’ll benchmark y against itself. I should get the same times for each snippet because it’s the same thing:

use Benchmark qw(:all) ;
 $x = "12121211"; $x=~tr/1/X/; print $x.$/;
 $x = "12121211"; $x=~y/1/X/;  print $x.$/;
 
cmpthese(-2, {
'y2'  => sub {$x = "12121211"; $x=~y/1/X/;},
'y1'  => sub {$x = "12121211"; $x=~y/1/X/;},
});

I see the same sort of variation I saw before when I run the program several times in a row:

$ for i in `seq 1 6`; do perl5.22.0 y-v-y.pl; done
X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 4612790/s  -- -1%
y2 4676593/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 4612502/s  -- -4%
y1 4799265/s  4%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 5160561/s  -- -3%
y1 5324464/s  3%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 5232626/s  -- -2%
y1 5314607/s  2%  --

X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 4969555/s  -- -7%
y2 5330167/s  7%  --

X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 5187125/s  -- -2%
y2 5278021/s  2%  --

Comparing y to itself shows that there’s some inherent uncertainty in the measurement, and there always will be. Ignore that to your own embarrassment.
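The same uncertainty shows up without launching separate processes. This sketch measures the identical y snippet five times in one program with Benchmark’s countit and reports the spread between the fastest and slowest rounds. The 0.2-second minimum and the five rounds are arbitrary choices, the rate is computed as iterations per CPU second, and the numbers it prints will differ on every machine and every run:

```perl
use strict;
use warnings;
use Benchmark qw(countit);

# The snippet from the benchmark above; every round runs identical code.
my $code = sub { my $x = "12121211"; $x =~ y/1/X/ };

my @rates;
for my $round ( 1 .. 5 ) {
    # countit runs the code for at least 0.2 CPU seconds and
    # returns a Benchmark object describing the run.
    my $t = countit( 0.2, $code );
    push @rates, $t->iters / $t->cpu_a;    # iterations per CPU second
}

my ( $min, $max ) = ( sort { $a <=> $b } @rates )[ 0, -1 ];
printf "round %d: %.0f/s\n", $_ + 1, $rates[$_] for 0 .. $#rates;
printf "spread: %.1f%% of the slowest round\n", 100 * ( $max - $min ) / $min;
```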

Notice that across the runs the rate ranges from 4612502/s to 5330167/s. That’s a difference of about 718,000 iterations per second, or about 15.6% of 4,612,502 and 13.5% of 5,330,167. Those differences are much greater than the relative percentages in the reports. That’s a problem. Running another test right after a test gives different absolute rates even when the percentages within each report barely move. Damn you multi-tasking computers!
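The spread is easy to recompute from the reports themselves. This sketch copies all twelve rates from the six y-versus-y runs and uses min and max from the core List::Util module:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Every rate reported in the six y-versus-y runs, both rows of each run.
my @rates = (
    4612790, 4676593, 4612502, 4799265, 5160561, 5324464,
    5232626, 5314607, 4969555, 5330167, 5187125, 5278021,
);

my $slowest = min @rates;
my $fastest = max @rates;
my $spread  = $fastest - $slowest;

printf "slowest %d/s, fastest %d/s, spread %d/s\n", $slowest, $fastest, $spread;
printf "spread is %.1f%% of the slowest rate and %.1f%% of the fastest\n",
    100 * $spread / $slowest, 100 * $spread / $fastest;
```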

elif’s missing ‘e’

@aran384 complains on Twitter about Perl’s elsif, throwing up his hands with a “WTF?”. It’s the sort of facile complaint from people who can’t think or don’t want to think. » Read more…

Bugzilla mis-using CGI.pm is not a bug

Gervase Markham writes about a New Class of Vulnerability that shows the unbridled ego that leads to someone blaming the language and ignoring their own lack of competence.

He writes about a new class of vulnerability for what he labels a bug. It’s neither new nor a bug. Miyagawa wrote about this in 2009 in Perl: Why parameters() sucks and what we can do, and that’s still not the origin of it. I knew about it in the 90s when I was making these mistakes. If you think you’ve found something new, you’re probably wrong. Don’t get too excited too soon.

Gervase is writing about the param method from CGI.pm. In particular, in list context it returns a list of all the values submitted for the form field with that name:

my @params = $cgi->param( $field );

If there are no form fields with that name, it returns the empty list. That means when Bugzilla used it in list context to create a hash, it was courting disaster by not properly checking its return value:

my $otheruser = Bugzilla::User->create({
    login_name => $login_name, 
    realname   => $cgi->param('realname'), 
    cryptpassword => $password});

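If there’s no realname, param returns an empty list, every later element shifts one position to the left, and the string cryptpassword becomes the value for realname while the password itself becomes a key.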
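The shift is easy to reproduce without CGI.pm. In this sketch, fake_param and the %form data are hypothetical stand-ins that mimic only the context behavior described above (every value in list context, the first value in scalar context); this is not CGI.pm’s implementation. Forcing scalar context with scalar is one way to keep the hash aligned:

```perl
use strict;
use warnings;

# Hypothetical form data: a login was submitted, but no realname field.
my %form = ( login => ['alice'], realname => [] );

# Hypothetical stand-in for a multi-valued lookup such as CGI.pm's param:
# every value in list context, only the first value in scalar context.
sub fake_param {
    my ($name) = @_;
    my @values = @{ $form{$name} || [] };
    return wantarray ? @values : $values[0];
}

my $password = 'secret-hash';

# Same shape as the Bugzilla call: the empty list from fake_param vanishes,
# so every later element shifts left by one. With warnings on, perl also
# reports "Odd number of elements in anonymous hash".
my $broken = {
    login_name    => 'alice',
    realname      => fake_param('realname'),
    cryptpassword => $password,
};

# Forcing scalar context keeps exactly one value (here undef) in the slot.
my $fixed = {
    login_name    => 'alice',
    realname      => scalar fake_param('realname'),
    cryptpassword => $password,
};

print "broken realname: $broken->{realname}\n";    # cryptpassword
print "fixed keys: ", join( ', ', sort keys %$fixed ), "\n";
```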

This is a problem with the competence of the Bugzilla developers. Hey, it happens. But, it’s how you react when you mess up that matters. Blaming the language is the wrong move. It’s not even a language issue. It’s a misuse of a designed and documented API.

But the API misuse isn’t the real problem here. The Bugzilla developers are taking unfiltered and unvalidated input directly from the user and doing things with it. Even if the param method did what they expected, they would still be wrong.
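Validating the field before it reaches the hash addresses both problems at once. A minimal sketch follows; validated_realname, its allowed characters, and its length limit are hypothetical choices for illustration, not Bugzilla’s or CGI.pm’s rules. It takes every submitted value, insists on exactly one, and returns an explicit undef on failure, because a bare return would hand back an empty list in list context and recreate the original bug:

```perl
use strict;
use warnings;

# Hypothetical validator. The allowed characters (letters, spaces, periods,
# apostrophes, hyphens) and the 64-character limit are illustrative only.
sub validated_realname {
    my @values = @_;    # every submitted value, however many there were

    # Missing or repeated fields are rejected. Return an explicit undef: a bare
    # "return" would give back an empty list in list context, the original bug.
    return undef unless @values == 1 && defined $values[0];

    my ($name) = $values[0] =~ /\A([\p{L} .'-]{1,64})\z/;
    return $name;    # undef when the pattern doesn't match
}

# Usage sketch with literal values standing in for form input.
my %user = (
    login_name    => 'alice',
    realname      => validated_realname(),    # nothing submitted
    cryptpassword => 'secret-hash',
);

print defined $user{realname}
    ? "realname ok\n"
    : "realname missing or invalid, reject the request\n";
```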

These sensationalist posts then make the rounds of the internet, amplified by people who never bothered to learn the tool. They fit those people’s uninformed mental models of the world. Other ignorant people are quick to jump onto that bandwagon because they mirror the behavior of the group they want to belong to. And most everyone ends up worse for it.

If you mess up, take the blame for it and move on. It makes life so much easier.

rename takes two scalars, not a list

Perl gets us used to the idea that we can throw a list at a function and it all works out. The first item in the list becomes the first parameter, the second list item becomes the next parameter, and so on. We expect these two calls to some_sub to act the same: » Read more…

It’s a way of thinking

danvk tried to solve the Josephus Problem in Perl, Python, and Ruby, comparing each one. He had some pretty ignorant things to say about each. » Read more…

Don’t forget that path!

I wanted to rename some files, so I pulled out File::Find and created a callback to do that. Then nothing appeared to happen: » Read more…

Why I didn’t submit that patch

Ovid notes his three requirements for submitting a patch:

  1. The other devs should be pleasant to work with
  2. The code base should be relevant to me or at least fun
  3. The barrier to entry should be as low as possible

» Read more…

Don’t look at that CPAN module!

Can’t use CPAN modules? If you can’t, you have company, whether you want it or not. CPAN Dependency Hell is a real problem, and it’s much worse if you’re an experienced Perl programmer. Management might want to clamp down on third-party, unsupported code. Lawyers might be afraid. I don’t want to get into that bit though. I’ll concede, for the purposes of this post, that these people can’t use any CPAN module. » Read more…

Missing the big picture

chromatic is a bit harsh on Seda Özses’s article “Very simple login using Perl, jQuery, Ajax, JSON and MySQL” on IBM developerWorks, and his tone encouraged a lot of other people to be equally harsh. This has always been one of the biggest problems for technical communities: they forget that a person is on the other side. Not only that, they forget that a person is also on their side. I think chromatic could have modulated his tone and formed a very useful post had he actually explained his changes in the code. » Read more…

It’s precedence, not context

Ovid blames Perl for a newbie-level mistake in a Twitter post.
Marcel goes on to misread the problem as one of context. Ovid posts a fix that uses parentheses to solve the precedence problem. This is chiefly a problem of code reading skills.

Here’s the code:

$ perl -MData::Dumper -e 'sub boo { 1,2,3 } my @x = (boo()||5,8,7); print Dumper \@x'

The output is:

$VAR1 = [
          3,
          8,
          7
        ];

Ovid doesn’t say much about what he thinks should happen, but it shouldn’t be hard for anyone who actually knows Perl (and programming) to figure it out. If you understand precedence, you know the order in which things happen.

First, you have to break it down into what happens when. Many experienced programmers skip this step because they think their experience lets them skip the basics. They try to take in complex expressions all at once and figure out what they do. That’s where people get confused. They ignore the few simple rules of code reading.

Perl figures out the right side of the assignment operator first, so you have to figure out this expression, which is in list context because of the assignment to @x, an array:

(boo()||5,8,7)

In list context, the comma operator separates the elements of the list. There are only a few operators lower in precedence, and || isn’t one of them. The list is then going to be the results of these three expressions:

boo()||5
8
7

This is not what some people (maybe Ovid) expect, because they parse it as a choice between two lists:

boo()
(5,8,7)
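Running both groupings makes the difference concrete. This sketch reuses boo from Ovid’s one-liner; the list-or-default idiom at the end is one possible way to get what the misreading expects, not something from Ovid’s post:

```perl
use strict;
use warnings;

sub boo { 1, 2, 3 }

# How perl actually groups it: || binds tighter than the comma.
my @as_written = ( boo() || 5, 8, 7 );
my @explicit   = ( ( boo() || 5 ), 8, 7 );

# The "choice between two lists" reading, spelled out with parentheses.
# It still doesn't return (1, 2, 3): || evaluates boo() in scalar context.
my @misread = ( boo() || ( 5, 8, 7 ) );

# One way to get "boo's list, or else (5, 8, 7)": call boo() in list
# context first, then fall back only if it returned nothing.
my @wanted = boo();
@wanted = ( 5, 8, 7 ) unless @wanted;

print "@as_written | @explicit | @misread | @wanted\n";   # 3 8 7 | 3 8 7 | 3 | 1 2 3
```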

Most of the misunderstanding comes from thinking the comma is just a way to separate items instead of thinking about it as an operator that has precedence like other operators. Some of the rest of the misunderstanding comes from perceptual narrowing; people are primed to think about lists, so they forget what they should know and substitute new rules that only deal with lists.

Change the comma to a different, unfamiliar character, such as ‡, and show it to a programmer who understands precedence. I assert the confusion disappears, because the programmer doesn’t bring any misconceptions about the comma into reading the code:

boo() || 5 ‡ 8 ‡ 7

Once you understand the precedence, the last two expressions of the list, 8 and 7, are easy. The first one is easy too. If you looked at it by itself you shouldn’t have a problem with it. You call boo() in scalar context because || evaluates its left operand in scalar context. If it returns a true value, use it. Otherwise, use 5. You might recognize the process better if you saw it with a different scalar operator, such as +:

boo() + 5
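A quick check of that claim, again reusing boo from Ovid’s one-liner: with either operator, the call happens in scalar context and yields 3:

```perl
use strict;
use warnings;

sub boo { 1, 2, 3 }

# Both operators evaluate boo() in scalar context, where it yields 3.
my $sum    = boo() + 5;    # 3 + 5
my $either = boo() || 5;   # 3 is true, so || returns it

print "$sum $either\n";    # 8 3
```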

The definition for the subroutine is just sub boo { 1,2,3 }. In scalar context, that evaluates to the final element, 3, because that’s what the comma operator does in scalar context. Two things are lacking in most people’s analysis: the comma is an operator, and it responds to context. This is almost excusable as a gotcha, but it’s such a well-known gotcha that a practicing Perl programmer should know it. This isn’t some obscure corner of the language. It’s the very basics of how the language works.
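The context dependence is easy to check directly. This sketch reuses boo and adds a hypothetical letters sub whose values aren’t numbers, so the last element can’t be mistaken for a count; the array at the end contrasts with it, since an array in scalar context gives its length:

```perl
use strict;
use warnings;

sub boo     { 1, 2, 3 }
# Hypothetical sub with non-numeric values, so "last element" can't be
# confused with "number of elements".
sub letters { 'x', 'y', 'z' }

my @list   = boo();              # list context: 1, 2, 3
my $scalar = boo();              # scalar context: comma operator returns 3
my $last   = letters();          # scalar context: 'z', the last operand
my $count  = () = letters();     # list assignment in scalar context counts: 3
my @array  = letters();
my $size   = @array;             # an array in scalar context is its length: 3

print "@list | $scalar | $last | $count | $size\n";   # 1 2 3 | 3 | z | 3 | 3
```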

The perlop documentation is quite clear. The comma operator in scalar context evaluates its left operand and throws the value away (so there may be side effects), then evaluates its right operand, in this case 3, and returns it. There’s even an example in perlfaq4’s “What is the difference between a list and an array?” that tells you exactly that, and it’s there because so many people make this same mistake.
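To see the evaluate-and-discard behavior, including its side effects, this sketch uses a hypothetical step function that logs each call. Only the last operand’s value survives, but every operand runs:

```perl
use strict;
use warnings;

my @log;
# Hypothetical helper that records each call, so the side effects are visible.
sub step { my ($n) = @_; push @log, "step $n"; return $n * 10 }

# Scalar assignment puts the parenthesized comma expression in scalar context:
# each left operand runs (and its value is thrown away), the last is returned.
my $result = ( step(1), step(2), step(3) );

print "$result\n";    # 30
print "@log\n";       # step 1 step 2 step 3
```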

Experienced programmers often charge ahead where they’d do well to read the documentation. It doesn’t matter what you think it should do; it only matters what it does. Intuition is fine when it works out, but it’s not an excuse for a lack of knowledge or education. Intuition is a fool’s game; it only has hope when everyone thinks the same, and nobody does.

The basics matter quite a bit. Don’t get lazy.
