sluggo has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am a novice at Perl, but am enjoying learning the language thus far. However, I have run into an obstacle and am totally clueless how to proceed. This is my code:
#!/usr/bin/perl
use strict;
use Getopt::Long;
#GetOpt::Long::Configure('bundling');

#sub process_args();
#sub pars0r();

&process_args;
&open_sesame;
&pars0r;

#my $hostname;
my $vol_to_parse;
my $file_to_parse;
my @raw_data;
my $status;
my $volume;

sub process_args(){
    $status = GetOptions (
        'v=s' => \$vol_to_parse, # volume to parse argumento
        'h=s' => \$file_to_parse);
    print "Process arg subrountine real\n";
    print "Using: "."vol: $vol_to_parse". " and " . "file: $file_to_parse" . "\n";
}

#$file_to_parse="netapp_volume_location";

sub open_sesame{
    open(DAT, $file_to_parse) || die("Could not open file! Is it there?");
    @raw_data=<DAT>; #Take everything from file and put in @raw_data array
    close(DAT);
    print "open and close ran \n";
}

#my @arrayz=($raw_data[0]..$raw_data[-1]);
print "Gonna print whole array";
print "$raw_data[0..-1]";
my $y = qw ($raw_data[0]..$raw_data[3]);

sub pars0r{
    # foreach $volume (@arrayz)
    foreach $volume (@raw_data)
    {
        chomp($volume);
        # print "$volume" . "\n";
        if ($volume =~ $vol_to_parse)
        {
            print "$vol_to_parse"." ". "got got \n";
        }
        # elsif ($volume !=~ $vol_to_parse)
        # {
        #     redo;
        #}
        else{
            print "Did not find volume!\n";
        }
    }
}
Basically, I am using a Nagios plugin to check a NetApp filer, and I export the disk usage of all of the filer's volumes to a file. I want to parse this file line by line and return each line that matches a certain phrase (the volume name, i.e. $vol_to_parse). Unfortunately, the plugin not only writes each volume's disk usage on its own line in the text file, it also repeats every volume in a nasty mess at the end of the file. This is why I was trying to do: my @arrayz=($raw_data[0]..$raw_data-1);

However, my test file file1.txt contains the following on 4 separate lines:

hello

goodbye

john

dillinger

and my output is:

root# perl ult_test.pl -v john -h file1.txt

Process arg subrountine real

Using: vol: john and file: file1.txt

open and close ran

Did not find volume!

Did not find volume!

john got got

Did not find volume!

Gonna print whole arraygoodby.

I only want it to return the line with john in it. Am I going about this all wrong? Any help would be greatly appreciated. Thank you.

Sluggo

Replies are listed 'Best First'.
Re: Parsing a file line by line until a certain indice in array
by toolic (Bishop) on Sep 05, 2009 at 17:26 UTC
    I only want it to return the line with john in it.
    If you mean that you only want your program to print to STDOUT the line with 'john' in it, then you should remove every print except the one in your 'got got' line.

    If you mean something else, please clarify. It would help if you show us an example of the exact output you expect, inside 'code' tags.

    See also: How do I compose an effective node title?

      Hello toolic,

      Thank you very much for taking the time to reply. I am sorry about the lack of clarity in my post (first time hehe). Also, sorry for the massive prints, I was using them to see which parts of the code were actually executing.

      Yes you are correct, I would like to print to STDOUT just the line with john in it.

      The exact output I expect is:

      root# perl ult_test.pl -v john -h file1.txt
      john got got

      Really all I'm looking for is john got got; the other prints just let me know that the other sections were run. I ran into trouble with the subroutine 'pars0r' printing the else statement for every element/line in the list/array, when all I want is the line that contains john.

      Thank you very much for your time. If I am still unclear at all, please let me know and I will do my best to convey better.

      Sluggo

Re: Parsing a file line by line until a certain indice in array
by liverpole (Monsignor) on Sep 05, 2009 at 19:23 UTC
    Hi sluggo,

    I think the main problem is that where you have:

    if ($volume =~ $vol_to_parse) {

    you really should have:

    if ($volume =~ /$vol_to_parse/) {

    since it's a regular expression.

    Update:  Thanks to AnomalousMonk below for pointing out that my "fix" above was incorrect, and for teaching me something new!

    A couple of other suggestions, though:

    You should get in the habit of using warnings (in addition to strict).

    Note that you've got a lot of string concatenation which is unnecessary, because double-quoted strings interpolate variables.  In general, everywhere you've got a <close-quotes> <dot> <open-quotes> combination, you can remove it without any change to your functionality (and the result is much easier to read!)

    For example, you can change:

    print "Using: "."vol: $vol_to_parse". " and " . "file: $file_to_parse" . "\n";

    to:

    print "Using: vol: $vol_to_parse and file: $file_to_parse\n";

    Finally, if you use $! after system calls like open, it will tell you the exact error you got (eg. "file not found", "permission error", etc.):

    open(DAT, "<", $file_to_parse) || die "Could not open file! ($!)\n";

    There are numerous other issues that I'll address by example rather than explanation.  Here's a stripped-down, cleaned-up and working version of your program:

    #!/usr/bin/perl

    # Libraries
    use strict;
    use warnings;
    use Getopt::Long;

    # Globals and default arguments
    my $vol_to_parse  = "john";
    my $file_to_parse = "file1.txt";
    my @raw_data;

    # Main program
    process_args();
    open_sesame();
    pars0r();

    # Subroutines
    sub process_args {
        GetOptions (
            'v=s' => \$vol_to_parse,
            'h=s' => \$file_to_parse,
        ) or die "syntax: $0 -v <volume> -h <file>\n";
    }

    sub open_sesame{
        open(DAT, "<", $file_to_parse) || die "Could not open file! ($!)\n";
        chomp(@raw_data = <DAT>);
        close(DAT);
    }

    sub pars0r{
        foreach my $volume (@raw_data) {
            if ($volume =~ /$vol_to_parse/) {
                print "$vol_to_parse got got \n";
            }
        }
    }

    Feel free to ask other questions (of course), and keep up the good Perl learning!


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      I think the main problem is that where you have:
          if ($volume =~ $vol_to_parse) {
      you really should have:
          if ($volume =~ /$vol_to_parse/) {
      since it's a regular expression.

      A 'naked string' used as a regex will work just as well with the  =~ and  !~ binding operators as enclosing the string in  // delimiters, as shown by the example below.

      >perl -wMstrict -le
      "my $rx = 'x\dy';
       for my $str (qw(xxy x23y x4y xy)) {
           print qq{match '$str'} if $str =~ $rx;
           print qq{no match '$str'} if $str !~ $rx;
           }
      "
      no match 'xxy'
      no match 'x23y'
      match 'x4y'
      no match 'xy'

      The problem with using a string in that way (or within  // delimiters) is that Perl has to re-check the interpolated string each time the match is encountered, and re-compile the regex whenever the string has changed (IIRC). This probably does not matter in a short script processing a short file, but it may matter if the file is 100,000,000 lines long!

      The re-compilation problem can be avoided by using the  /o 'compile once' regex modifier:
          if ($volume =~ /$vol_to_parse/o) {

      However, the preferred method with modern Perl is to compile a regex object with the  qr operator. With such an object, you never have to worry about re-compilation unless you really need to do it.
          my $rx_vol_to_parse = qr{ \Q$vol_to_parse\E }xms;
          if ($volume =~ $rx_vol_to_parse) {
      In addition (and probably more importantly), if you use a regex object you don't have to worry about differences in interpolation between single- and double-quoted strings and in regexes, and regex metacharacters. For instance, consider the difference between the behavior of
          my $rx = 'x\dy';
      and
          my $rx = "x\dy";
      in the command-line example given above.

      As mentioned above, the difference between the two approaches (i.e., regex object vs. naked string) is probably not significant in this particular situation, but it's always best to try to develop good programming habits right from the beginning.
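
To make the metacharacter point concrete, here is a minimal sketch. The helper names and the sample volume names (db.1, db21) are made up for illustration and do not come from the original script. With \Q...\E the dot is matched literally, so only the db.1 line is returned; without it the dot matches any character, so the db21 line is returned as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that contain $name literally.
sub lines_matching_literal {
    my ($name, @lines) = @_;
    my $rx = qr/\Q$name\E/;    # compiled once; \Q...\E escapes metacharacters
    return grep { $_ =~ $rx } @lines;
}

# Hypothetical helper: same, but treats $name as a raw regex pattern.
sub lines_matching_pattern {
    my ($name, @lines) = @_;
    my $rx = qr/$name/;        # '.' in $name matches any character here
    return grep { $_ =~ $rx } @lines;
}

# Made-up sample data in the style of the plugin output
my @lines = ('/vol/db.1/ - total: 10 Kb', '/vol/db21/ - total: 20 Kb');

print "literal: $_\n" for lines_matching_literal('db.1', @lines);
print "pattern: $_\n" for lines_matching_pattern('db.1', @lines);
```

Whether the volume name should be treated as a literal string or as a pattern is a design choice; for a name passed on the command line, the literal form is probably the less surprising one.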

        Thank you very much for the help guys. All my questions thus far have been answered and then some.

        It completely works the way I want it to, but I am curious: what if some lines had, let's say, a word at the beginning of the line that I did not want to return with the rest of the line? For instance, I have one line with the word DISKUSED at the front.

        The normal structure of the lines I am returning looks like this:

        DISKUSED OK - /vol/hello/ - total: 83886080 Kb - used 519800 Kb (1%) - free: 83366280 Kb
        /vol/john/.snapshot - total: 0 Kb - used 30971856 Kb (0%) - free: 0 Kb
        /vol/bill/ - total: 20132660 Kb - used 7178128 Kb (36%) - free: 12954532 Kb
        /vol/ted/ - total: 52428800 Kb - used 4137924 Kb (8%) - free: 48290876 Kb

        There is only one instance where there will be something before the volume name in my file. I just want to be prepared for that, but at the same time I am curious as to how to get past that if it occurred multiple times.

        I guess what I am trying to ask is, as opposed to returning the entire line, how can I manipulate the STDOUT to be printing everything from $vol_to_parse to end of line?

        And secondly, is that the right way to think about a situation like this in Perl? Or is there some kind of reverse-chomp-type modifier?

        Once again, any and all help is greatly appreciated.

        Thank you,

        Sluggo
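
One possible way to print everything from the volume name to the end of the line is a capturing regex. This is only a sketch: the helper name from_volume is made up, and it assumes volume names always appear as /vol/<name>, as they do in the sample lines above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the part of $line that starts at the first
# "/vol/<name>" and runs to the end of the line, or undef if it is absent.
sub from_volume {
    my ($line, $name) = @_;
    # \Q...\E quotes the name; \b stops "john" from matching "johnny"
    return $line =~ m{(/vol/\Q$name\E\b.*)} ? $1 : undef;
}

my $line = 'DISKUSED OK - /vol/hello/ - total: 83886080 Kb - used 519800 Kb (1%) - free: 83366280 Kb';
my $tail = from_volume($line, 'hello');

# Prints the line without the leading "DISKUSED OK - "
print "$tail\n" if defined $tail;
```

Because the capture starts wherever /vol/<name> is found, it does not matter how much text precedes the volume name or how many lines have such a prefix.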

      Hello Liverpole,

      Thank you very much for your help and insight, I am better understanding how to think when programming in Perl as well as the structure.

      The file I am parsing contains each volume on a separate line, but then at the bottom it repeats every line without the \n, which I believe is a side-effect of checking all volumes at once with the script that I am using.

      I wanted to make the pars0r subroutine skip printing the '$vol_to_parse got got' to STDOUT for every instance of john after the first.

      Is the proper way in Perl to make another nested if statement inside the existing if statement under the pars0r subroutine? I was thinking to declare a counter to increment/count how many times it showed up and only print the first instance. Is there an easier way to skip every instance of john after the first instance?

      Something like:

      sub pars0r{
          foreach my $volume (@raw_data) {
              if ($volume =~ /$vol_to_parse/) {
                  my $counter++;
                  if($counter <=1){
                      print "$vol_to_parse got got \n";
                  }
                  elsif ( $counter > 1 ) {
                      next; # or break; ?
                  }
              }
          }
      }

      Am I approaching the logic in this the wrong way?

      This is my code so far, but it prints out the line 'john got got' 6 times as opposed to one, where did I go wrong?

      #!/usr/bin/perl

      #Libraries
      use strict;
      use warnings;
      use Getopt::Long;

      #Globals and default arguments
      my $vol_to_parse = "john";
      my $file_to_parse = "file1.txt";
      my @raw_data;
      my $counter=0;

      #Main Program
      process_args();
      open_sesame();
      pars0r();

      #Subroutines
      sub process_args{
          GetOptions (
              'v=s' => \$vol_to_parse,
              'h=s' => \$file_to_parse,
          ) or die "syntax: $0 -v <volume> -h <file>\n";
      }

      sub open_sesame{
          open(DAT, "<", $file_to_parse) || die "Could not open file! ($!)\n";
          chomp(@raw_data = <DAT>);
          close(DAT);
      }

      sub pars0r{
          foreach my $volume (@raw_data) {
              if ($volume =~ /$vol_to_parse/){
                  $counter++;
              }
              if ($counter >=1){
                  print "$vol_to_parse got got \n";
              }
              elsif( $counter > 1){
                  next;
              }
          }
      }

      Output is:

      john got got
      john got got
      john got got
      john got got
      john got got
      john got got
        Hi sluggo,

        Your first example of pars0r immediately above has a subtle bug in it -- you have my $counter++ instead of $counter++.   This has the effect of creating a new instance of $counter, which is why you'll be printing "$vol_to_parse got got\n" every time you get a match:

        sub pars0r{
            foreach my $volume (@raw_data) {
                if ($volume =~ /$vol_to_parse/) {
                    my $counter++;             # Create new $counter, set it to 1
                    if($counter <= 1){         # This will always be true
                        print "$vol_to_parse got got \n";
                    }
                    elsif ($counter > 1 ) {    # Block will never be executed
                        next; # or break; ?
                    }
                }
            }
        }

        If you take out the my it should function correctly.

        Your second example of pars0r immediately above has a different bug; even if you change the if ($counter >= 1) to if ($counter <= 1) as I think you meant, you will print "$vol_to_parse got got\n" for every line, matching or not, until you find a second match (including the lines before the first match, since $counter starts at 0):

        sub pars0r{
            foreach my $volume (@raw_data) {
                if ($volume =~ /$vol_to_parse/){
                    $counter++;          # Set counter to one the first time
                }
                if ($counter <= 1){      # Changed ">= 1" to "<= 1"
                    # This block executes for every line up to the
                    # second match (including the lines before the first
                    # match), which is NOT what you want!
                    ##
                    print "$vol_to_parse got got \n";
                }
                elsif( $counter > 1){
                    next;
                }
            }
        }

        Why not just do something like this:

        sub pars0r{
            my $b_got_match = 0;    # Initialize boolean to FALSE
            foreach my $volume (@raw_data) {
                if ($volume =~ /$vol_to_parse/){
                    if (!$b_got_match) {
                        # Only print the match once, as $b_got_match gets
                        # set once we print it, and then this conditional
                        # will never again get executed.
                        ##
                        print "$vol_to_parse got got\n";
                        $b_got_match = 1;
                    }

                    # We can still track the total number of matches
                    ++$counter;
                }
            }
        }
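
If the total number of matches is not needed, another option is to leave the loop at the first match. Perl's equivalent of "break" is last. The sketch below assumes a made-up helper name (first_match) and made-up sample data, with the last element standing in for the repeated mess at the end of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the first line that contains $name,
# or undef if no line matches.
sub first_match {
    my ($name, @lines) = @_;
    my $rx = qr/\Q$name\E/;    # compile once, treat the name literally
    my $found;
    for my $line (@lines) {
        if ($line =~ $rx) {
            $found = $line;
            last;              # stop scanning; later matches are ignored
        }
    }
    return $found;
}

# Made-up sample data: four lines plus a repeated jumble at the end
my @raw_data = ('hello', 'goodbye', 'john', 'dillinger',
                'hello goodbye john dillinger');

my $hit = first_match('john', @raw_data);
print "$hit got got\n" if defined $hit;
```

The flag-based version above keeps reading so that it can count every match; this one trades the count for an early exit, which may matter on very long files.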

        s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/