in reply to Parsing a file line by line until a certain indice in array

Hi sluggo,

I think the main problem is that where you have:

if ($volume =~ $vol_to_parse) {

you really should have:

if ($volume =~ /$vol_to_parse/) {

since it's a regular expression.

Update:  Thanks to AnomalousMonk below for pointing out that my "fix" above was incorrect, and for teaching me something new!

A couple of other suggestions, though:

You should get in the habit of using warnings (in addition to strict).

Note that you've got a lot of unnecessary string concatenation, since double-quoted strings interpolate variables.  In general, everywhere you've got a <close-quotes> <dot> <open-quotes> combination, you can remove it without changing your functionality (and the result is much easier to read!)

For example, you can change:

print "Using: "."vol: $vol_to_parse". " and " . "file: $file_to_parse" . "\n";

to:

print "Using: vol: $vol_to_parse and file: $file_to_parse\n";

Finally, if you use $! after system calls like open, it will tell you the exact error you got (eg. "file not found", "permission error", etc.):

open(DAT, "<", $file_to_parse) || die "Could not open file! ($!)\n";
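
To see what $! actually contains, here is a minimal sketch. The helper name try_open and the path are hypothetical (the path is simply assumed not to exist), and the lexical filehandle with "or" is a common alternative to the bareword DAT handle, not something your script requires:

```perl
use strict;
use warnings;

# Hypothetical helper: returns an empty string on success, or a message
# containing the operating system's error text ($!) when open fails.
sub try_open {
    my ($path) = @_;
    open(my $fh, "<", $path) or return "Could not open '$path': $!";
    close($fh);
    return "";
}

# This path is assumed not to exist on your system.
print try_open("/no/such/dir/file1.txt"), "\n";
```

The exact wording of the error text comes from the operating system, so it will vary from one platform to another.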

There are numerous other issues that I'll address by example rather than explanation.  Here's a stripped-down, cleaned-up and working version of your program:

#!/usr/bin/perl

# Libraries
use strict;
use warnings;
use Getopt::Long;

# Globals and default arguments
my $vol_to_parse  = "john";
my $file_to_parse = "file1.txt";
my @raw_data;

# Main program
process_args();
open_sesame();
pars0r();

# Subroutines
sub process_args {
    GetOptions (
        'v=s' => \$vol_to_parse,
        'h=s' => \$file_to_parse,
    ) or die "syntax: $0 -v <volume> -h <file>\n";
}

sub open_sesame{
    open(DAT, "<", $file_to_parse) || die "Could not open file! ($!)\n";
    chomp(@raw_data = <DAT>);
    close(DAT);
}

sub pars0r{
    foreach my $volume (@raw_data) {
        if ($volume =~ /$vol_to_parse/) {
            print "$vol_to_parse got got \n";
        }
    }
}

Feel free to ask other questions (of course), and keep up the good Perl learning!


s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

Re^2: Parsing a file line by line until a certain indice in array
by AnomalousMonk (Archbishop) on Sep 05, 2009 at 21:20 UTC
    I think the main problem is that where you have:
        if ($volume =~ $vol_to_parse) {
    you really should have:
        if ($volume =~ /$vol_to_parse/) {
    since it's a regular expression.

    A 'naked string' used as a regex will work just as well with the  =~ and  !~ binding operators as enclosing the string in  // delimiters, as shown by the example below.

    >perl -wMstrict -le "my $rx = 'x\dy'; for my $str (qw(xxy x23y x4y xy)) { print qq{match '$str'} if $str =~ $rx; print qq{no match '$str'} if $str !~ $rx; }"
    no match 'xxy'
    no match 'x23y'
    match 'x4y'
    no match 'xy'

    The problem with using a string in that way (or within // delimiters) is that Perl has to re-interpolate the pattern each time it is encountered and, if the resulting string has changed since the last time, re-compile the regex (IIRC). This probably does not matter in a short script processing a short file, but it may matter if the file is 100,000,000 lines long!

    That repeated work can be avoided by using the /o 'compile once' regex modifier:
        if ($volume =~ /$vol_to_parse/o) {

    However, the preferred method with modern Perl is to compile a regex object with the  qr operator. With such an object, you never have to worry about re-compilation unless you really need to do it.
        my $rx_vol_to_parse = qr{ \Q$vol_to_parse\E }xms;
        if ($volume =~ $rx_vol_to_parse) {
    In addition (and probably more importantly), if you use a regex object you don't have to worry about differences in interpolation between single- and double-quoted strings and in regexes, and regex metacharacters. For instance, consider the difference between the behavior of
        my $rx = 'x\dy';
    and
        my $rx = "x\dy";
    in the command-line example given above.

    As mentioned above, the difference between the two approaches (i.e., regex object vs. naked string) is probably not significant in this particular situation, but it's always best to try to develop good programming habits right from the beginning.
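
The effect of \Q...\E inside qr can be sketched as follows. The helper name literal_rx is hypothetical, and the sample lines are only modelled on the kind of volume names in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: builds a compiled regex that matches the volume
# name literally, so metacharacters such as '.' are not special.
sub literal_rx {
    my ($name) = @_;
    return qr{ \Q$name\E }xms;
}

my $rx = literal_rx('/vol/john/.snapshot');

for my $line ('/vol/john/.snapshot - total: 0 Kb',
              '/vol/john/Xsnapshot - total: 0 Kb') {
    print "match: $line\n" if $line =~ $rx;
}
# prints: match: /vol/john/.snapshot - total: 0 Kb
```

Without the \Q...\E, the '.' would be a regex metacharacter and the second line would match as well.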

      Thank you very much for the help guys. All my questions thus far have been answered and then some.

      It completely works the way I want it to, but I am curious: what if some lines had, let's say, a word at the beginning of the line that I did not want returned with the rest of the line? For instance, I have one line with the word DISKUSED at the front.

      The normal structure of the lines I am returning look like this:

      DISKUSED OK - /vol/hello/ - total: 83886080 Kb - used 519800 Kb (1%) - free: 83366280 Kb
      /vol/john/.snapshot - total: 0 Kb - used 30971856 Kb (0%) - free: 0 Kb
      /vol/bill/ - total: 20132660 Kb - used 7178128 Kb (36%) - free: 12954532 Kb
      /vol/ted/ - total: 52428800 Kb - used 4137924 Kb (8%) - free: 48290876 Kb

      There is only one instance where there will be something before the volume name in my file. I just want to be prepared for that, but at the same time I am curious as to how to get past that if it occurred multiple times.

      I guess what I am trying to ask is, as opposed to returning the entire line, how can I manipulate the STDOUT to be printing everything from $vol_to_parse to end of line?

      And secondly, is that the right way to think about a situation like this in Perl? Or is there some kind of reverse-chomp-type modifier?

      Once again, any and all help is greatly appreciated.

      Thank you,

      Sluggo

        If I understand your question correctly, you might try something like:
        my $vol_to_parse = qr{ /what/ever }xms;
        my $line = get_a_line_somehow();
        if ($line =~ m{ ($vol_to_parse .*) }xms) {
            print "$1 \n";
        }
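
A runnable sketch of the same capture idea, applied to the DISKUSED line quoted earlier in the thread. The helper name from_volume is hypothetical, and it assumes the line has already been chomped:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the part of $line from the first
# occurrence of $vol to the end of the line, or undef if $vol is absent.
sub from_volume {
    my ($line, $vol) = @_;
    return $line =~ m{ ( \Q$vol\E .* ) }xms ? $1 : undef;
}

my $line = 'DISKUSED OK - /vol/hello/ - total: 83886080 Kb - used 519800 Kb (1%) - free: 83366280 Kb';
print from_volume($line, '/vol/hello/'), "\n";
# prints: /vol/hello/ - total: 83886080 Kb - used 519800 Kb (1%) - free: 83366280 Kb
```

Because the pattern is not anchored to the start of the line, it does not matter how many words come before the volume name.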
Re^2: Parsing a file line by line until a certain indice in array
by sluggo (Novice) on Sep 05, 2009 at 20:21 UTC
    Hello Liverpole,

    Thank you very much for your help and insight; I now better understand how to think when programming in Perl, as well as its structure.

    The file I am parsing contains each volume on a separate line, but then at the bottom it repeats every line without the \n, which I believe is a side-effect of parsing all volumes at once with the script that I am using.

    I wanted to make the pars0r subroutine skip printing the '$vol_to_parse got got' to STDOUT for every instance of john after the first.

    Is the proper way in Perl to make another nested if statement inside the existing if statement under the pars0r subroutine? I was thinking to declare a counter to increment/count how many times it showed up and only print the first instance. Is there an easier way to skip every instance of john after the first instance?

    Something like:

    sub pars0r{
        foreach my $volume (@raw_data) {
            if ($volume =~ /$vol_to_parse/) {
                my $counter++;
                if($counter <=1){
                    print "$vol_to_parse got got \n";
                }
                elsif ( $counter > 1 ) {
                    next; # or break; ?
                }
            }
        }
    }

    Am I approaching the logic in this the wrong way?

    This is my code so far, but it prints out the line 'john got got' 6 times as opposed to one, where did I go wrong?

    #!/usr/bin/perl

    #Libraries
    use strict;
    use warnings;
    use Getopt::Long;

    #Globals and default arguments
    my $vol_to_parse = "john";
    my $file_to_parse = "file1.txt";
    my @raw_data;
    my $counter=0;

    #Main Program
    process_args();
    open_sesame();
    pars0r();

    #Subroutines
    sub process_args{
        GetOptions (
            'v=s' => \$vol_to_parse,
            'h=s' => \$file_to_parse,
        ) or die "syntax: $0 -v <volume> -h <file>\n";
    }

    sub open_sesame{
        open(DAT, "<", $file_to_parse) || die "Could not open file! ($!)\n";
        chomp(@raw_data = <DAT>);
        close(DAT);
    }

    sub pars0r{
        foreach my $volume (@raw_data) {
            if ($volume =~ /$vol_to_parse/){
                $counter++;
            }
            if ($counter >=1){
                print "$vol_to_parse got got \n";
            }
            elsif( $counter > 1){
                next;
            }
        }
    }

    Output is:

    john got got
    john got got
    john got got
    john got got
    john got got
    john got got
      Hi sluggo,

      Your first example of pars0r immediately above has a subtle bug in it -- you have my $counter++ instead of $counter++.   This has the effect of creating a new instance of $counter, which is why you'll be printing "$vol_to_parse got got\n" every time you get a match:

      sub pars0r{
          foreach my $volume (@raw_data) {
              if ($volume =~ /$vol_to_parse/) {
                  my $counter++;            # Create new $counter, set it to 1
                  if($counter <= 1){        # This will always be true
                      print "$vol_to_parse got got \n";
                  }
                  elsif ($counter > 1 ) {   # Block will never be executed
                      next; # or break; ?
                  }
              }
          }
      }

      If you take out the my (and declare $counter once, outside the loop, as your full program already does), it should function correctly.

      Your second example of pars0r immediately above has a different bug; even if you change the if ($counter >= 1) to if ($counter <= 1) as I think you meant, the print sits outside the matching if, so you will print "$vol_to_parse got got\n" for every line, match or not, until you find a second match:

      sub pars0r{
          foreach my $volume (@raw_data) {
              if ($volume =~ /$vol_to_parse/){
                  $counter++;      # Set counter to one the first time
              }
              if ($counter <= 1){  # Changed ">= 1" to "<= 1"
                  # This block executes for every line (match or not)
                  # until the second match, which is NOT what you want!
                  ##
                  print "$vol_to_parse got got \n";
              }
              elsif( $counter > 1){
                  next;
              }
          }
      }

      Why not just do something like this:

      sub pars0r{
          my $b_got_match = 0;     # Initialize boolean to FALSE
          foreach my $volume (@raw_data) {
              if ($volume =~ /$vol_to_parse/){
                  if (!$b_got_match) {
                      # Only print the match once, as $b_got_match gets
                      # set once we print it, and then this conditional
                      # will never again get executed.
                      ##
                      print "$vol_to_parse got got\n";
                      $b_got_match = 1;
                  }

                  # We can still track the total number of matches
                  ++$counter;
              }
          }
      }
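
If only the first match matters and the total count is not needed, another option is to leave the loop with last. This is a sketch of that alternative, not the approach used above; the helper name first_match is hypothetical, and it takes the volume and lines as arguments instead of using the globals:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index of the first line matching
# $vol, or -1 when nothing matches. 'last' ends the loop early, so
# later duplicates are never examined.
sub first_match {
    my ($vol, @lines) = @_;
    my $rx    = qr{ \Q$vol\E }xms;
    my $found = -1;
    for my $i (0 .. $#lines) {
        if ($lines[$i] =~ $rx) {
            $found = $i;
            last;
        }
    }
    return $found;
}

my @raw_data = ('/vol/hello/', '/vol/john/', '/vol/bill/', '/vol/john/');
my $idx = first_match('john', @raw_data);
print "john got got\n" if $idx >= 0;
```

The trade-off is that the loop stops reading at the first hit, so this only suits the case where the repeated lines at the bottom of the file can be ignored entirely.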
