
I know I can slurp a file by setting the input record separator ($/) to an undefined value, like

open my $fh, '<', $filename or die "Cannot read from $filename: $!";
my $contents = do { local $/; <$fh> };

But recently I came across a very similar but different idiom:

open my $fh, '<', $filename or die "Cannot read from $filename: $!";
my $contents = do { local $/ = <$fh> };

(Note the local $/ = <$fh> instead of local $/; <$fh>).

Both of these work, and there are examples on CPAN of both the variant with the assignment and the one without (although the latter is, unsurprisingly, much more common).

But my question is why does it work? What is the variant with the assignment doing?

PS: I also know that I should be using e.g. File::Slurper to slurp files, but life is funny like that sometimes.

jja

1 Answer


This is the result of an undocumented optimization on which you shouldn't rely.


Normally LHS = RHS is evaluated as follows:

  1. RHS is evaluated.
  2. LHS is evaluated.
  3. Assignment is evaluated.

As you can see, the right-hand side of the assignment is evaluated first[1]. This allows the following to work:

my $x = 123;

{
   my $x = $x * 2;
   say $x;  # 246
}

say $x;  # 123

Obviously, something different —and undocumented— is happening in your case. That's because LHS = <$fh> is special. Rather than reading from the file then assigning the result to the left-hand side, readline (<>) writes directly to the result of the left-hand side of the assignment.[2]

  1. LHS is evaluated. (This backs up $/ and sets it to undef in your case.)
  2. $fh is evaluated.
  3. readline is evaluated, writing directly to the result of the left-hand side of the assignment.

No assignment is performed.

This optimization is undocumented, and you shouldn't rely on it.

local $/ = uc(<$fh>) wouldn't work, for example.
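
To make the difference concrete, here is a minimal sketch that compares the variants. The sample data and the in-memory filehandles are assumptions made for illustration only; they stand in for a real file. The third read depends on the undocumented optimization described above, so its result is printed rather than asserted.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for a real file.
my $data = "line one\nline two\n";

# Documented idiom: $/ is undefined before the read, so the whole file is read.
open my $fh1, '<', \$data or die "Cannot open in-memory file: $!";
my $documented = do { local $/; <$fh1> };

# Wrapping the read in uc() restores the normal order of evaluation:
# <$fh2> runs while $/ is still "\n", so only the first line is read.
open my $fh2, '<', \$data or die "Cannot open in-memory file: $!";
my $wrapped = do { local $/ = uc(<$fh2>) };

# Relies on the undocumented optimization: $/ is localized (and undefined)
# before readline runs, and readline writes straight into it.
open my $fh3, '<', \$data or die "Cannot open in-memory file: $!";
my $optimized = do { local $/ = <$fh3> };

printf "documented idiom: %d bytes\n", length $documented;
printf "uc() variant:     %d bytes\n", length $wrapped;
printf "assignment idiom: %d bytes\n", length $optimized;
```

The documented idiom returns all 18 bytes, while the uc() variant returns only the 9 bytes of the upper-cased first line.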


  1. The compiled code has the right-hand side evaluated first:

    $ perl -MO=Concise,-exec -e'$L = $R'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v:{
    3  <#> gvsv[*R] s                   <- $R
    4  <#> gvsv[*L] s                   <- $L
    5  <2> sassign vKS/2                <- =
    6  <@> leave[1 ref] vKP/REFC
    -e syntax OK
    

    The following shows the right-hand side evaluated first:

    $ perl -e'sub f :lvalue { CORE::say $_[0]; $x } f("L") = f("R")'
    R
    L
    
  2. $x = uc(<>) evaluates uc(<>) before $x, then performs an assignment:

    $ perl -MO=Concise,-exec -e'$x = uc(<>)'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v:{
    3  <#> gv[*ARGV] s                  \
    4  <1> readline[t3] sK/1             > RHS
    5  <1> uc[t4] sK/1                  /
    6  <#> gvsv[*x] s                   -> LHS
    7  <2> sassign vKS/2
    8  <@> leave[1 ref] vKP/REFC
    -e syntax OK
    

    $x = <> evaluates $x before <>, and it doesn't perform an assignment:

    $ perl -MO=Concise,-exec -e'$x = <>'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v:{
    3  <#> gvsv[*x] s                   -> LHS
    4  <#> gv[*ARGV] s                  \  RHS
    5  <1> readline[t3] sKS/1           /
    6  <@> leave[1 ref] vKP/REFC
    -e syntax OK
    

    Note the (uppercase) S next to readline that wasn't there before. This "special" flag is what tells readline to write to $x.

    Adding local doesn't change anything.

    $ perl -MO=Concise,-exec -e'local $x = <>'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v:{
    3  <#> gvsv[*x] s/LVINTRO
    4  <#> gv[*ARGV] s
    5  <1> readline[t3] sKS/1
    6  <@> leave[1 ref] vKP/REFC
    -e syntax OK
    
ikegami
  • @melpomene, In reply to the comment to your deleted answer, `local` doesn't remove magic. `perl -e'local $! = 5; CORE::say $!'` You gotta localize the entire glob. `perl -e'local *!; $! = 5; CORE::say $!'` So yeah, unrelated. Note that `local $_` is special cased so it does remove magic. `perl -e'for $x ($!) { local $x = 5; CORE::say $x }'` vs `perl -e'for ($!) { local $_ = 5; CORE::say $_ }'` – ikegami Sep 02 '19 at 15:44
  • "writing directly to the result of the left-hand side of the assignment": in this, you mean the left-hand as in "the one on the left of the `do {}`"? – jja Sep 02 '19 at 15:45
  • I only talk about `local $/ = <>`. The rest is irrelevant to the question. – ikegami Sep 02 '19 at 15:47
  • In the full statement, `do` makes a copy of the result, which is then assigned (copied) to the `$contents`. These particular copies are efficient; they don't result in a copy of the string buffer. (5.20 introduced a copy-on-write mechanism, so copying scalars containing strings is always efficient. But this was efficient before that because of an optimization that occurred when copying temporary scalars. Ownership of the string buffer would be transferred via pointer assignment rather than duplicating the string buffer.) – ikegami Sep 02 '19 at 15:49
  • The copy made by `do` does involve creating (and then freeing) a new scalar, however, so `my $contents; { local $/; $contents = <$fh>; }` is a bit more efficient. This is the inefficiency that the `$x = <>` optimization eliminates. (Note that I go for readability and use `do`. After all, we're reading from the disk.) – ikegami Sep 02 '19 at 15:59
  • Thanks for the interesting and thorough explanation, and I'd suggest that you move/duplicate the bold comment about not relying on this behaviour right at the beginning of the answer for the TL;DR impatients – polettix Sep 08 '19 at 08:53