Programming versus programs

Some people use programming languages to create programs and other people use those programs. Many people know the difference. Some people don’t. Witness this tweet from @Nolaan_boy:



He’s upset that he can’t tell the difference between building your own tool and using one that someone else has already written. He shows a Perl program using Text::CSV_XS (the correct way to do that). The program looks well written and reasonable; it’s even commented. Nightmare?

On the other side, he shows the command-line use of csvcut, a program written in Python. I think it’s a nice program; when I saw this defective tweet I downloaded csvcut and tried it. Nice job, programmers! However, @Nolaan_boy is not writing Python and is not dealing with the Python language. I’m sure he’d have an equally hard time dealing with the Python source.

He could have posted that csvcut was a really nice program that solved his problem. He could have used it and been done with life. But, that’s not what he wanted to do. He wanted to bash Perl. I don’t mind if you don’t like Perl, but if you’re really done with it, you don’t have an ongoing emotional response to it. If you do, you’re still letting it influence you.

But this really isn’t about Perl. This is about a particular Python failure mode where programmers reinforce their bonds by attacking a common target. They don’t need Perl to be that target, but it’s what the Python in-crowd has latched on to. Instead of talking about the great things coming out of Python, they have to bash things they don’t understand.

Paul Scheinast offers many naïve Perl benchmarks

The Benchmark module is one of the most dangerous things to give a new and uninformed Perl programmer. Paul Scheinast constructed a page to Write Fast Code in Perl which completely misses the point of optimizing code. There’s some conversation in the Reddit post How to write fast code in Perl.

As I emphasize in Mastering Perl, faster code comes from better algorithms instead of different syntax. The difference between tr/// and y, for instance, is so insignificant that the real problem is the time you waste thinking about it.

But, the big problem with these benchmarks and the people who tout the results is that they don’t run them more than once. They get an answer and stop thinking about it.

Here’s his Perl y vs tr benchmark:

use Benchmark qw(:all) ;
 $x = "12121211"; $x=~tr/1/X/; print $x.$/;
 $x = "12121211"; $x=~y/1/X/;  print $x.$/;
 
cmpthese(-2, {
'tr'  => sub {$x = "12121211"; $x=~tr/1/X/;},
'y'   => sub {$x = "12121211"; $x=~y/1/X/;},
});

He shows one set of results:

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 3772974/s  -- -9%
y  4156585/s 10%  --

To a knowledgeable Perl programmer, this is an amazing result since tr and y are the same thing. It’s not that they do the same task in different ways. They compile to the same code (see “Use B::Deparse to see what perl thinks the code is” at The Effective Perler). The two operators are synonyms and are indistinguishable after compilation:

$ perl -MO=Deparse -e 'sub x { my $x = "123456"; $x =~ tr/3/X/ }'
sub x {
    my $x = '123456';
    $x =~ tr/3/X/;
}
-e syntax OK

$ perl -MO=Deparse -e 'sub x { my $x = "123456"; $x =~ y/3/X/ }'
sub x {
    my $x = '123456';
    $x =~ tr/3/X/;
}
-e syntax OK

Benchmarking is dangerous because there isn’t a single answer. There are many answers and many things to consider. After I run a benchmark, I run it again to see if I get the same answer. Typically, the percentages change a bit, but in this case the two snippets trade places too:

$ for i in `seq 1 6`; do perl5.22.0 tr-v-y.pl; done
X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  4549604/s  -- -5%
tr 4764789/s  5%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  5218028/s  -- -1%
tr 5290980/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  5131999/s  -- -2%
tr 5236451/s  2%  --

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 4508617/s  -- -1%
y  4550135/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate  tr   y
tr 4091430/s  -- -1%
y  4151598/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate   y  tr
y  4683416/s  -- -0%
tr 4683416/s  0%  --

If I thought that these two operators were different, I might be confused by these. Instead of stopping here, I want to try something else. I’ll benchmark y against itself. I should get the same times for each snippet because it’s the same thing:

use Benchmark qw(:all) ;
 $x = "12121211"; $x=~tr/1/X/; print $x.$/;
 $x = "12121211"; $x=~y/1/X/;  print $x.$/;
 
cmpthese(-2, {
'y2'  => sub {$x = "12121211"; $x=~y/1/X/;},
'y1'  => sub {$x = "12121211"; $x=~y/1/X/;},
});

I see the same sort of results I saw before when I run the tests one after another:

$ for i in `seq 1 6`; do perl5.22.0 y-v-y.pl; done
X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 4612790/s  -- -1%
y2 4676593/s  1%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 4612502/s  -- -4%
y1 4799265/s  4%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 5160561/s  -- -3%
y1 5324464/s  3%  --

X2X2X2XX
X2X2X2XX
        Rate  y2  y1
y2 5232626/s  -- -2%
y1 5314607/s  2%  --

X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 4969555/s  -- -7%
y2 5330167/s  7%  --

X2X2X2XX
X2X2X2XX
        Rate  y1  y2
y1 5187125/s  -- -2%
y2 5278021/s  2%  --

Comparing the y to itself shows that there’s some inherent uncertainty in the measurement. And, there always will be. Ignore that to your own embarrassment.
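Here’s a minimal sketch of one way to see that uncertainty without leaving the program: time the same snippet several times with Benchmark’s countit and print each rate. The number of runs, the 0.2 second time limit, and the variable names are my assumptions for illustration, and the rates will differ on every machine and every run.

```perl
use strict;
use warnings;
use Benchmark qw(countit);

# The same snippet every time, so any variation is measurement noise
my $code = sub { my $x = "12121211"; $x =~ y/1/X/; };

my $runs    = 5;      # assumption: enough runs to watch the rate wander
my $seconds = 0.2;    # assumption: short, just to keep the sketch quick

my @rates;
for my $run ( 1 .. $runs ) {
    my $t = countit( $seconds, $code );    # returns a Benchmark object
    push @rates, $t->iters / $t->cpu_p;    # iterations per CPU second
    printf "run %d: %.0f/s\n", $run, $rates[-1];
}
```

If the rates for identical code don’t agree with each other, a small difference between two different snippets doesn’t mean much.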

Notice that across the runs the rate ranges from 4612502/s to 5330167/s. That’s a difference of about 700,000 iterations per second, or 16% of 4,612,502 or 13% of 5,330,167. Those differences are much greater than the relative percentages in the reports. That’s a problem. Running another test right after a test gives different results even if the percentages within the test are the same. Damn you multi-tasking computers!
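That arithmetic is easy to check. This is a small sketch with the twelve rates copied by hand from the y-versus-y output above; the variable names are mine, and nothing here comes from Benchmark itself.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Rates (iterations per second) copied from the six y-versus-y runs above
my @rates = qw(
    4612790 4676593
    4612502 4799265
    5160561 5324464
    5232626 5314607
    4969555 5330167
    5187125 5278021
);

my( $low, $high ) = ( min(@rates), max(@rates) );
my $spread = $high - $low;

printf "low %d/s, high %d/s, spread %d/s\n", $low, $high, $spread;
printf "spread is %.1f%% of the low rate and %.1f%% of the high rate\n",
    100 * $spread / $low, 100 * $spread / $high;
```

Compare the spread across runs to the percentages inside any single report before deciding that one snippet is faster.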

elif’s missing ‘e’

@aran384 complains on Twitter about Perl’s elsif, throwing up his hands with a “WTF?”. It’s the sort of facile complaint by the people who can’t think or don’t want to think. » Read more…

Bugzilla mis-using CGI.pm is not a bug

Gervase Markham writes about a New Class of Vulnerability that shows the unbridled ego that leads to someone blaming the language and ignoring their own lack of competence.

He writes about a new class of vulnerability for what he labels a bug. It’s neither new nor a bug. Miyagawa wrote about this in 2009 in Perl: Why parameters() sucks and what we can do, and that’s still not the origin of it. I knew about it in the 90s when I was making these mistakes. If you think you’ve found something new, you’re probably wrong. Don’t get too excited too soon.

Gervase is writing about the param method from CGI.pm. In particular, in list context it returns a list of the values of all the form fields with that name:

my @params = $cgi->param( $field );

If there are no form fields with that name, it returns the empty list. That means when Bugzilla used it in list context to create a hash, it was courting disaster by not properly checking its return value:

my $otheruser = Bugzilla::User->create({
    login_name => $login_name, 
    realname   => $cgi->param('realname'), 
    cryptpassword => $password});

If there’s no realname, param returns an empty list and cryptpassword becomes the value for realname.
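Here’s a minimal sketch of that shift. It doesn’t load CGI.pm or Bugzilla; fake_param is a hypothetical stand-in that I wrote to mimic the documented behavior (all values in list context, the first value or undef in scalar context), and the form data is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $cgi->param, not the real CGI.pm code:
# all values in list context, the first value (or undef) in scalar context.
my %form = ( login_name => ['buster'] );    # the request has no realname

sub fake_param {
    my( $field ) = @_;
    my @values = @{ $form{$field} || [] };
    return wantarray ? @values : $values[0];
}

# List context: the empty list disappears and the pairs shift over
my @args = (
    login_name    => 'buster',
    realname      => fake_param('realname'),
    cryptpassword => 'secret',
);

my %broken;
{
    no warnings 'misc';    # silence "Odd number of elements in hash assignment"
    %broken = @args;
}

# Forcing scalar context keeps each key with its own value
my %fixed = (
    login_name    => 'buster',
    realname      => scalar fake_param('realname'),
    cryptpassword => 'secret',
);

print "broken realname: $broken{realname}\n";
print "fixed realname: ", ( defined $fixed{realname} ? $fixed{realname} : 'undef' ), "\n";
```

In the broken version the list has five elements instead of six, so realname pairs with the string cryptpassword and the password turns into a key of its own. The scalar fixes the shape of the hash, but it doesn’t validate anything.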

This is a problem with the competence of the Bugzilla developers. Hey, it happens. But, it’s how you react when you mess up that matters. Blaming the language is the wrong move. It’s not even a language issue. It’s a misuse of a designed and documented API.

But, that’s not the real problem here. The syntax and API misuse is a problem, but the bigger one is that the Bugzilla developers are taking unfiltered and unvalidated input directly from the user and doing things with it. Even if the param method did what they expected, they would still be wrong.

These sensationalist posts then make the rounds of the internet, amplified by people who never bothered to learn the tool. It fits their uninformed mental models of the world. Other ignorant people are quick to jump onto that bandwagon because they mirror the behavior of the group they want to belong to. And, most everyone ends up worse for it.

If you mess up, take the blame for it and move on. It makes life so much easier.

rename takes two scalars, not a list

Perl gets us used to the idea that we can throw a list at a function and it all works out. The first item in the list becomes the first parameter, the second list item becomes the next parameter, and so on. We expect these two calls to some_sub to act the same: » Read more…

The Josephus Problem: It’s a way of thinking

danvk tried to solve the Josephus Problem in Perl, Python, and Ruby, comparing each one. He had some pretty ignorant things to say about each. » Read more…

Don’t forget that path!

I wanted to rename some files, so I pulled out File::Find and created a callback to do that. Then nothing appeared to happen: » Read more…

Why I didn’t submit that patch

Ovid notes his three requirements for submitting a patch:

  1. The other devs should be pleasant to work with
  2. The code base should be relevant to me or at least fun
  3. The barrier to entry should be as low as possible

» Read more…

Don’t look at that CPAN module!

Can’t use CPAN modules? If you can’t, you have company, whether you want it or not. CPAN Dependency Hell is a real problem, and it’s much worse if you’re an experienced Perl programmer. Management might want to clamp down on third-party, unsupported code. Lawyers might be afraid. I don’t want to get into that bit though. I’ll concede, for the purposes of this post, that these people can’t use any CPAN module. » Read more…

Missing the big picture

chromatic is a bit harsh on Seda Özses’s article “Very simple login using Perl, jQuery, Ajax, JSON and MySQL” on IBM developerWorks, and his tone encouraged a lot of other people to be equally harsh. This has always been one of the biggest problems for technical communities: they forget that a person is on the other side. Not only that, they forget that a person is also on their side. I think chromatic could have modulated his tone and formed a very useful post had he actually explained his changes in the code. » Read more…