Paul Scheinast offers many naïve Perl benchmarks

The Benchmark module is one of the most dangerous things to give a new and uninformed Perl programmer. Paul Scheinast put together a page, “Write Fast Code in Perl”, that completely misses the point of optimizing code. There’s some discussion in the Reddit post “How to write fast code in Perl”.

As I emphasize in Mastering Perl, faster code comes from better algorithms rather than different syntax. The difference between tr/// and y///, for instance, is so insignificant that the real problem is the time you waste thinking about it.
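To illustrate the algorithm-versus-syntax point, here is a minimal sketch that compares two ways of testing membership: scanning a list with grep and looking up a key in a hash. The data set, the sub names in_list and in_hash, and the one-second run time are all assumptions made for this example; none of them come from the page under discussion.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 10,000 strings, stored both as a list and as hash keys.
my @items  = map { "item$_" } 1 .. 10_000;
my %seen   = map { $_ => 1 } @items;
my $target = 'item9999';

# Linear scan: compares the target against every element of the list.
sub in_list {
    my ($t) = @_;
    return scalar( grep { $_ eq $t } @items );
}

# Hash lookup: a single key test, independent of the list length.
sub in_hash {
    my ($t) = @_;
    return exists $seen{$t} ? 1 : 0;
}

# Run each candidate for at least one CPU second and print the comparison.
cmpthese( -1, {
    list_scan   => sub { in_list($target) },
    hash_lookup => sub { in_hash($target) },
} );
```

The exact rates will vary from run to run, but the gap here should be orders of magnitude rather than a few percent, which is the kind of difference worth acting on.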

But the big problem with these benchmarks, and with the people who tout the results, is that they run them only once. They get an answer and stop thinking about it.

Here’s his Perl y vs tr benchmark:

use Benchmark qw(:all) ;
 $x = "12121211"; $x=~tr/1/X/; print $x.$/;
 $x = "12121211"; $x=~y/1/X/;  print $x.$/;
 
cmpthese(-2, {
'tr'  => sub {$x = "12121211"; $x=~tr/1/X/;},
'y'   => sub {$x = "12121211"; $x=~y/1/X/;},
});

He shows one set of results:

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 3772974/s  -- -9%
y  4156585/s 10%  --

To a knowledgeable Perl programmer, this is an amazing result since tr and y are the same thing. It’s not that they do the same task in different ways; they compile to the same code (see “Use B::Deparse to see what perl thinks the code is” at The Effective Perler). The two operators are synonyms and are indistinguishable after compilation:

$ perl -MO=Deparse -e 'sub x { my $x = "123456"; $x =~ tr/3/X/ }'
sub x {
    my $x = '123456';
    $x =~ tr/3/X/;
}
-e syntax OK

$ perl -MO=Deparse -e 'sub x { my $x = "123456"; $x =~ y/3/X/ }'
sub x {
    my $x = '123456';
    $x =~ tr/3/X/;
}
-e syntax OK
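The same check can be made from inside a program. This is a minimal sketch that uses the coderef2text method of B::Deparse to turn two anonymous subs back into source text and compares the results; the variable names are made up for the example.

```perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;

# Two anonymous subs that differ only in how the operator is spelled.
my $with_tr = sub { my $x = "123456"; $x =~ tr/3/X/; };
my $with_y  = sub { my $x = "123456"; $x =~ y/3/X/; };

# Ask perl to turn each compiled sub back into source text.
my $tr_text = $deparse->coderef2text($with_tr);
my $y_text  = $deparse->coderef2text($with_y);

# If the two spellings compile to the same thing, the text is identical.
print $tr_text eq $y_text ? "same code\n" : "different code\n";
```

If the deparsed text matches, any difference a benchmark reports between the two subs has to come from the measurement rather than from the code.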

Benchmarking is dangerous because there isn’t one answer. There are many answers and many things to consider. After I run a benchmark, I run it again to see if I get the same answer. Typically, the percentages change a bit, but in this case the two snippets trade places too:

$ for i in `seq 1 6`; do perl5.22.0 tr-v-y.pl; done
X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  4549604/s  -- -5%
tr 4764789/s  5%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  5218028/s  -- -1%
tr 5290980/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  5131999/s  -- -2%
tr 5236451/s  2%  --

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 4508617/s  -- -1%
y  4550135/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 4091430/s  -- -1%
y  4151598/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  4683416/s  -- -0%
tr 4683416/s  0%  --

If I thought that these two operators were different, I might be confused by these results. Instead of stopping here, I want to try something else. I’ll benchmark y against itself. I should get the same times for each snippet because it’s the same thing:

use Benchmark qw(:all) ;
 $x = "12121211"; $x=~tr/1/X/; print $x.$/;
 $x = "12121211"; $x=~y/1/X/;  print $x.$/;
 
cmpthese(-2, {
'y2'  => sub {$x = "12121211"; $x=~y/1/X/;},
'y1'  => sub {$x = "12121211"; $x=~y/1/X/;},
});

I see the same sort of results as before when I run the tests one after another. The two identical snippets still differ by a few percent and still trade places:

$ for i in `seq 1 6`; do perl5.22.0 y-v-y.pl; done
X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 4612790/s  -- -1%
y2 4676593/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 4612502/s  -- -4%
y1 4799265/s  4%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 5160561/s  -- -3%
y1 5324464/s  3%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 5232626/s  -- -2%
y1 5314607/s  2%  --

X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 4969555/s  -- -7%
y2 5330167/s  7%  --

X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 5187125/s  -- -2%
y2 5278021/s  2%  --

Comparing y to itself shows that there’s some inherent uncertainty in the measurement, and there always will be. Ignore that to your own embarrassment.
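One way to put a number on that uncertainty is to time a single snippet several times and look at how far the rates spread. The following is a rough sketch using countit from the Benchmark module; the helper names measure_rates and spread_percent, the five repetitions, and the half-second run time are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(countit);
use List::Util qw(min max sum);

# Time the same code $runs times, each for at least $seconds CPU seconds,
# and return the iterations-per-CPU-second rate of each run.
sub measure_rates {
    my ( $code, $runs, $seconds ) = @_;
    my @rates;
    for ( 1 .. $runs ) {
        my $t   = countit( $seconds, $code );
        my $cpu = $t->cpu_p;    # user + system CPU time of this process
        push @rates, $t->iters / $cpu if $cpu > 0;
    }
    return @rates;
}

# Spread between the slowest and fastest run, as a percentage of the slowest.
sub spread_percent {
    my @rates = @_;
    return 100 * ( max(@rates) - min(@rates) ) / min(@rates);
}

my $x;
my @rates = measure_rates( sub { $x = "12121211"; $x =~ y/1/X/ }, 5, 0.5 );

printf "min %.0f/s  max %.0f/s  mean %.0f/s  spread %.1f%%\n",
    min(@rates), max(@rates), sum(@rates) / @rates, spread_percent(@rates);
```

If identical code shows a spread of a few percent, then a difference of a few percent between two snippets is within the noise and says nothing about which one is faster.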

Notice that across the y-versus-y runs the rate ranges from 4,612,502/s to 5,330,167/s. That’s a difference of about 700,000 iterations per second, or roughly 16% of the lower rate and 13% of the higher one. Those differences are much greater than the relative percentages in the reports. That’s a problem. Running another test right after a test gives different results even if the percentages within the test are the same. Damn you multi-tasking computers!
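The arithmetic is easy to check. This short sketch takes the twelve rates reported in the six y-versus-y runs above, finds the lowest and highest, and expresses the difference as a percentage of each; only the variable names are invented here.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Rates (iterations per second) copied from the six y-versus-y reports,
# one report per row.
my @rates = (
    4612790, 4676593,
    4612502, 4799265,
    5160561, 5324464,
    5232626, 5314607,
    4969555, 5330167,
    5187125, 5278021,
);

my ( $low, $high ) = ( min(@rates), max(@rates) );
my $diff = $high - $low;

printf "lowest  %d/s\nhighest %d/s\n", $low, $high;
printf "difference %d/s (%.1f%% of lowest, %.1f%% of highest)\n",
    $diff, 100 * $diff / $low, 100 * $diff / $high;
```

The run-to-run spread computed this way is several times larger than most of the within-run percentages that cmpthese printed.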
