It’s precedence, not context

Ovid blames Perl for a newbie-level mistake in a Twitter post.
Marcel goes on to misread the problem as one of context. Ovid posts a fix that uses parentheses to solve the precedence problem. This is chiefly a problem of code reading skills.

Here’s the code:

$ perl -MData::Dumper -e 'sub boo { 1,2,3 } my @x = (boo()||5,8,7); print Dumper \@x'

The output is:

$VAR1 = [
          3,
          8,
          7
        ];

Ovid doesn’t say much about what he thinks should happen, but it shouldn’t be hard for any person who actually knows Perl (and programming), to figure it out. If you understand precedence, you know which order things will occur.

First, you have to break it down into what happens when. Many experienced people often skip this step because they think their experience should allow them to skip the basics. They try to take in complex expressions all at once and figure out what they do. That’s where people get confused. They ignore the few simple rules of code reading.

Perl figures out the right side of the assignment operator first, so you have to figure out this expression, which is in list context because of the assignment to @x, an array:

(boo()||5,8,7)

In list context, the comma operator separates the elements of the list. There are only a few operators lower in precedence, and || isn’t one of them. The list is then going to be the results of these three expressions:

boo()||5
8
7

This is not what some people (maybe Ovid) don’t expect because they parse it as the choice between two lists:

boo()
(5,8,7)

Most of the misunderstanding comes from thinking the comma is just a way to separate items instead of thinking about it as an operator that has precedence like other operators. Some of the rest of the misunderstanding comes from perceptual narrowing; people are primed to think about lists so they forget what they should know and substitute new rules that only deals with lists.

Change the comma to a different, unfamiliar character, such as ‡, and show it to a programmer who understands precedence and I assert the confusion disappears because the programmer doesn’t insert his misconceptions about the comma into reading the code:

bar() || 5 ‡ 8 ‡ 7

Once you understand the precedence, the last two expressions of the list, 8 and 7, are easy. The first one is easy too. If you looked at it by itself you shouldn’t have a problem with it. You call boo() in scalar context because || is a scalar argument. If it returns a true value, use it. Otherwise, use 5. That’s easy enough too. You might recognize the process better if you saw it with a different scalar operator, such as +:

bar() + 5

The definition for the subroutine is just boo { 1,2,3 }. In scalar context, that is the final element in the series because that’s what the comma operator in scalar context does. Two things are lacking here in most people’s analysis: the comma is an operator and it responds to context. This is almost excusable as a gotcha, but it’s such a well known gotcha that a practicing Perl programmer should know it. This isn’t some obscure corner of the language. It’s the very basics of how the language works.

The perlop documentation is quite clear. The comma operator in scalar context evaluates its left expression and discards it (so, there may be side effects), then evaluates its rightmost element, in this case three, and returns it. There’s even an example in the perlfaq4’s “What’s the difference between a list and an array” that tells you exactly that, and it does that because so many people make this same mistake.

Experienced programmers often charge ahead where they’d do well to read the documentation. It doesn’t matter what you think it should do; it only matters what it does. Intuition is fine when it works out, but it’s not an excuse for a lack of knowledge or education. Intuition is a fool’s game; it only has hope when everyone thinks the same, and nobody does.

The basics matter quite a bit. Don’t get lazy.

Leave a comment

2 Comments.

  1. I tried to comment on Marcel’s post, but blogs.perl.org can’t seem to stomach my openid url.

    Anyway, I think part of the problem (at least for me) is that I don’t keep in the front of my mind the idea that context is effectively passed in to subs as an extra parameter. Yeah, I know that subs can use wantarray() (or more sophisticated mechanisms) to *check* what context they were called in, but I don’t tend to think about context silently affecting the behavior of the return expression.

    This combines with the fact that many Perl programmers don’t actively think of the comma operator as having different behaviors in different contexts (because we so seldom intentionally use it in scalar context).

    I wish 35 years ago K&R had decided to use a much less widely-used character for the “throw away your first parameter and return your 2nd” operation. In Perl, having one operator with 2 very different behaviors, one of which is used in pretty much every meaningful snippet of Perl code, and one of which is almost never intentionally invoked, strikes me (with the benefit of hindsight) as a bad design choice.

  2. It’s much more obvious when one changes sub boo {1,2,3 } to sub boo { 10,11,12 }.

Leave a Reply

You must be logged in to post a comment.

7ads6x98y