Re^3: Parsing a file and finding the dependencies in it

Hope you understood Corion's reply. "while (<DATA>)" causes $_ to be set to the next line in __DATA__ at every iteration.
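To see the implicit assignment to $_ in isolation, here is a minimal sketch. It reads from an in-memory filehandle rather than __DATA__ so that it stands alone; the sub name read_lines and the sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect every line of a filehandle via the implicit $_.
sub read_lines {
    my ($fh) = @_;
    my @seen;
    while (<$fh>) {      # shorthand for: while (defined($_ = <$fh>))
        chomp;           # chomp acts on $_ when given no argument
        push @seen, $_;
    }
    return @seen;
}

my $text = "first\nsecond\nthird\n";    # made-up sample input
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
print "$_\n" for read_lines($fh);
close $fh;
```

The loop ends when the read returns undef at end of file, exactly as it does with <DATA>.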

This construct: my $num = /regex/.../regex/ uses the range operator in scalar context, which is also known as the flip-flop operator. A classic post on this by Grandfather is: Flipin good, or a total flop?.

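A single regex that matches returns a true value (a numeric 1, I believe). When two regexes are joined by the ... operator, the result is instead a sequence number telling us which line of the record we are on, starting with 1 on the line that matches the first regex. On lines outside a record the result is false (the empty string).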

I would suggest that you put a print "num=$num\n"; statement in the loop and watch what happens. You will see values like: 1,2,3,4E0.
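As a rough sketch of that experiment, the snippet below collects the flip-flop value for each line of a small made-up input. The sub name flip_flop_values and the sample lines are assumptions for illustration, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the flip-flop value seen on each input line.
sub flip_flop_values {
    my @values;
    for (@_) {                        # each line is aliased to $_
        my $num = /^\[/ ... /\]/;     # scalar context: sequence number or ''
        push @values, $num;
    }
    return @values;
}

# Made-up sample: one record surrounded by lines that belong to no record.
my @lines = ("junk\n", "[\n", "ID: 123\n", "Start: /tmp/file.1\n", "]\n", "more junk\n");

# Show the empty string (false) as '' so it is visible in the output.
print join(",", map { $_ eq '' ? "''" : $_ } flip_flop_values(@lines)), "\n";
# prints: '',1,2,3,4E0,''
```

Lines outside the record give the empty string, lines inside count up from 1, and the closing line carries the E0 suffix.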

The 4E0 means that something is different about this line number! And indeed there is. It is the line that contains the ']' character (the last line of the record, the line that matches the 2nd regex). Perl appends the string "E0" to the final sequence number of a range. E0 is just exponential notation meaning 10**0, and any nonzero number raised to the zeroth power is 1. So 4E0 = 4 * 10**0 = 4 * 1 = 4 from a numeric perspective. This is a clever way to return 2 pieces of information in a single value: the exponential format tells me the record is over, and if I want to do some math on the value, it is a perfectly legitimate representation of the number 4.
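Both pieces of information can be pulled apart with a pattern match and ordinary arithmetic. This is only a sketch; the sub name decode_flip_flop is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a flip-flop value into (line number, last-line flag).
sub decode_flip_flop {
    my ($num) = @_;
    my $is_last = $num =~ /E0$/ ? 1 : 0;   # the E0 suffix marks the end of the record
    return ($num + 0, $is_last);           # "4E0" + 0 numifies to 4 without a warning
}

for my $num ("3", "4E0") {
    my ($n, $last) = decode_flip_flop($num);
    print "raw=$num numeric=$n last=$last\n";
}
# prints:
# raw=3 numeric=3 last=0
# raw=4E0 numeric=4 last=1
```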

Update:
I could have written the code with a more conventional parsing scheme: when the first line of a record is detected, call a subroutine which processes lines until the last line of the record is detected. This eliminates the need for flag values like "I'm inside the record now..", etc. The flip-flop implementation essentially does what the code below does:

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
   process_record() if /^\[/;  #start of record
}

sub process_record
{
   my %record;
   my $line;
   my $line_num=1;
   
   while (defined ($line = <DATA>) and $line !~ /\]/)
   {   
      print "line= ",$line_num++,"  ",$line;
      # do splits and fill in %record here
   }
   
   print "Record Complete!\n\n";
   # use %record here to populate other hashes
   # %record is thrown away when sub returns.
}

=prints
line= 1  ID:        123
line= 2  Start:     /tmp/file.1  /tmp/file.2  /tmp/file.3
line= 3  Done:      /complete/success.1 /complete/success.2
Record Complete!

line= 1  ID:        456
line= 2  Start:     /complete/success.1  /complete/success.2  /tmp/file.3
line= 3  Done:      /complete/success.3  /complete/success.4
Record Complete!
=cut

__DATA__
[  
ID:        123
Start:     /tmp/file.1  /tmp/file.2  /tmp/file.3
Done:      /complete/success.1 /complete/success.2
]

[  
ID:        456
Start:     /complete/success.1  /complete/success.2  /tmp/file.3
Done:      /complete/success.3  /complete/success.4
]

Re^4: Parsing a file and finding the dependencies in it by remiah (Hermit) on Jul 07, 2011 at 07:30 UTC
I had no idea that '...' was an operator. I'm now reading perlop's Range Operators section and am gradually understanding the mystery. Thanks to Corion and Marshall. Marshall's explanation is a great help to me. A thousand miles to go before I sleep.