annie06 has asked for the wisdom of the Perl Monks concerning the following question:

To add to my original question (and to also post here so it may be seen...)
Ok, I'm trying something new as well here and took your suggestions. I want to be able to pass arguments to my script so that it knows what file to open and what to look for in the file:

For example, you would call it with: my_script file "Info I want"
#!/usr/bin/perl
$infile = shift @ARGV;
open (FILE, $infile) or die "Unable to open $infile: $!";
while (<FILE>) {
    foreach (@ARGV) {
        $pattern = $_;
        if ($pattern eq "Info I want") {
            if (/^Info I want/../^Start of/) {
                print unless ( /^Start of Info/ );
            }
        }
    }
}
close (FILE);


But this doesn't return what I need. It actually prints repeated "Info I want" to the screen.



Hi all, Here is an example of my file that I want to parse:

----------------
asdfsdfds
asdfasdf
blah blah blah

Info I Want

I want this line
And this line
And this line

Start of Info I don't want

blah blah
blah blah blah

So the only constant is the heading of the text I want "Info I want". But I don't know how many lines will be after that before we get to the line that reads "Start of Info I don't want" (also a constant).
Is there any way to parse this? Right now I have:
#!/usr/bin/perl
$infile = "my_file";
open (FILE, $infile);
@text = <FILE>;
close(FILE);
foreach $line (@text) {
    $_ = $line;
    if (/^Info I want/../^Start of Info/) {
        print;
    }
}
But that prints an extra line that I don't want:
START OF OUTPUT:
Info I Want

I want this line
And this line
And this line

Start of Info I don't want
END OF OUTPUT

Replies are listed 'Best First'.
Re: Parse a block of text
by dragonchild (Archbishop) on Jul 07, 2008 at 19:22 UTC
    Exclude the line you don't want.
    if (/^Info I want/../^Start of Info/) { unless ( /^Start of Info/ ) { print; } }
    Yes, there's a nifty-keen way to do it with regexes, but this is simpler, less error-prone, and easier for your maintainers to understand.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      That works, thanks so much! Just curious, how would you do it with regexes?
        Your problem is that you're reading a line at a time, so you don't have the information you need until it's too late. You'd have to switch to reading two lines at a time, and that's just silly. If you read the whole file into one string instead, you'd use a positive lookahead, in-string line anchors, and the /s and /m modifiers.
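One possible reading of that suggestion, as a rough sketch rather than dragonchild's actual code: it assumes the whole file is already in a single string, and the helper name between_markers and the sample text are made up for illustration. The marker spelling follows the sample file ("Info I Want").

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the start line and the
# stop line, or undef if the markers are not found in that order.
sub between_markers {
    my ($text) = @_;
    # /m: ^ matches at the start of every line inside the string.
    # /s: . also matches newlines, so .*? can span several lines.
    # (?=...) is a positive lookahead: the stop line must follow,
    # but it is not consumed and not captured.
    return $text =~ /^Info I Want\n(.*?)(?=^Start of Info)/ms ? $1 : undef;
}

# Made-up sample resembling the file in the question.
my $sample = join '', map { "$_\n" }
    'blah blah blah', '', 'Info I Want', '', 'I want this line',
    'And this line', '', "Start of Info I don't want", 'blah blah';

print between_markers($sample);
```

The capture keeps the blank lines around the block; trimming them would need an extra step.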

Re: Parse a block of text
by talexb (Chancellor) on Jul 07, 2008 at 19:44 UTC

    You're almost there. I think you probably want something like this:

    #!/usr/bin/perl
    $infile = "my_file";
    open (FILE, $infile) or die "Unable to open $infile: $!";
    while(<FILE>) {
        if (/^Info I want/../^Start of Info/) {
            print unless (/^Start of Info/);
        }
    }
    close (FILE);

    I haven't tested this .. but it should point you in the right direction. Note that a) I tested the open to make sure that the operation succeeded, and b) I didn't read the file into an array.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      Ok, I'm trying something new as well here and took your suggestions. I want to be able to pass arguments to my script so that it knows what file to open and what to look for in the file:

      For example, you would call it with: my_script file "Info I want"


      #!/usr/bin/perl
      $infile = shift @ARGV;
      open (FILE, $infile) or die "Unable to open $infile: $!";
      while (<FILE>) {
          foreach (@ARGV) {
              $pattern = $_;
              if ($pattern eq "Info I want") {
                  if (/^Info I want/../^Start of/) {
                      print unless ( /^Start of Info/ );
                  }
              }
          }
      }
      close (FILE);

      But this doesn't return what I need. It actually prints repeated "Info I want" to the screen.

        I find that if I add comments as I go, writing code is much easier. Here's some code that works for me:

        #!/usr/bin/perl
        use strict;

        {
            # Get filename, starting regex and stopping regex
            # from the command line.
            my ( $filename, $start, $stop ) = @ARGV;

            # Check the inputs, open the file.
            die "ERROR: Need to define start and stop arguments."
                unless ( defined($start) && defined($stop) );
            open(INPUT, $filename)
                or die "Unable to open $filename: $!";

            # Loop through the file, looking for the starting
            # regexp and skipping lines till then.
            while(<INPUT>) {
                if ( /$start/ ) {
                    # Found the matching line! Print it, and start
                    # looking for the stopping regexp.
                    print;
                    while(<INPUT>) {
                        # If we found the stopping regexp, drop out
                        # of this loop; otherwise, print a line and
                        # keep going.
                        if ( /$stop/ ) {
                            last;
                        } else {
                            print;
                        }
                    }
                }
            }
            close(INPUT);
        }

        I've grabbed all of the arguments from the command line right at the start -- this is far easier than trying to remember what's still left to be read in later.

        The advantage to this code is that it will find multiple occurrences of start/stop in a single file.

        Alex / talexb / Toronto

        "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

        I don't quite get what exactly you want to pass in as your patterns, since you seem to be hardcoding them anyway, so I'll take a stab in the dark. Also, in scalar context the ".." and "..." operators return a sequence number for each line in the range, with "E0" appended on the line that matched the stop pattern. So:
        my ($file, $start, $stop) = @ARGV;
        $start = qr/^$start/;
        $stop  = qr/^$stop/;
        open(my $fh, "<", $file) or die "Err: $!";
        while(<$fh>) {
            if ( my $status = /$start/.../$stop/ ) {
                print unless $status =~ /E/;
            }
        }
        close $fh;
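To see what the range operator actually hands back, here is a small sketch that records its value for each line. The line list is invented for illustration and the patterns are hardcoded to match the sample file's spelling.

```perl
use strict;
use warnings;

# Invented input lines, loosely following the file in the question.
my @lines = (
    "noise\n", "Info I Want\n", "first\n", "second\n",
    "Start of Info I don't want\n", "more noise\n",
);

my @seen;    # value returned by the range operator for each line
for (@lines) {
    my $status = /^Info I Want/ ... /^Start of Info/;
    push @seen, $status;
}

# Empty string while outside the range, a counter inside it,
# and "E0" appended on the line that ends the range.
print join(',', map { "[$_]" } @seen), "\n";
# prints: [],[1],[2],[3],[4E0],[]
```

That trailing "E0" is what the `$status =~ /E/` test above keys on to skip the stop line.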
Re: Parse a block of text
by Lawliet (Curate) on Jul 07, 2008 at 19:24 UTC

    Put parentheses around the part you want

    if (/^(Info I want.+)Start of Info/) { print $1 ; }

    Try that.

    <(^.^-<) <(-^.^<) <(-^.^-)> (>^.^-)> (>-^.^)>
      if (/^(Info I want.+)Start of Info/) { print $1 ; }
      This still grabs the 'Info I want' part, which doesn't seem (despite the name) to be part of the desired info. Thus, exclude it. Meanwhile, assuming the whole file is in $_, we might as well add the /s flag, so that the /./ can gobble up new-lines (which seems to be desired); add the /m flag and explicitly note that the Info I want and Start of Info should be on their own lines; and allow there to be nothing at all between the markers (unless we don't want that). This gives:
      print $1 if ( /^Info I want$(.*)^Start of Info$/ms )
      (Of course, this makes the end-of-section marker 'Start of Info', which is slightly different from what the original poster said.) It's easy to gobble up leading and trailing new-lines, if they're not significant:
      print $1 if ( /^Info I want\n+(.*?)\n+Start of Info$/ms )
      (Note the non-greedy quantifier.) Of course, this only fetches one occurrence of the info; if there might be many occurrences, then we can use a lazy quantifier as above with a while loop and the /g flag. If the original poster can change the format of the input files, then it might be appropriate to consider changing the markers and using Text::Balanced.
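The while-loop variant mentioned above might look roughly like this. It is a sketch only: the helper name all_blocks and the sample text are invented, the stop marker is matched without a trailing $ so that the longer "Start of Info I don't want" line still ends a block, and the spelling follows the sample file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every block found between a start line
# and the next stop line, in order of appearance.
sub all_blocks {
    my ($text) = @_;
    my @blocks;
    # /g in scalar context resumes after the previous match each time;
    # the lazy .*? stops at the nearest stop marker.
    while ( $text =~ /^Info I Want\n+(.*?)\n+Start of Info/msg ) {
        push @blocks, $1;
    }
    return @blocks;
}

# Invented sample with two occurrences.
my $text = <<'END';
Info I Want

first block

Start of Info I don't want
junk
Info I Want
second block
Start of Info I don't want
END

print "$_\n---\n" for all_blocks($text);
```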

        Ah right - the multiline flag. Forgot about that.

        Thanks for correcting me.

Re: Parse a block of text
by Cristoforo (Curate) on Jul 07, 2008 at 22:35 UTC
    Here's a way to identify if it is either the beginning or ending match or both.
    if((my $first = /^Info I want/) ... (my $last = /^Start of/)) { print unless $first || $last; }
Re: Parse a block of text
by hilitai (Monk) on Jul 07, 2008 at 19:26 UTC
    print unless /^Start of Info/;

    perhaps?

Re: Parse a block of text
by Anonymous Monk on Jul 07, 2008 at 19:49 UTC
    Another approach, not necessarily better than the others:

    use warnings;
    use strict;

    my $START = qr{ \A Info [ ] I [ ] Want }xms;
    my $STOP  = qr{ \A Start [ ] of [ ] Info [ ] I [ ] don't }xms;

    while (<DATA>) {
        if (/$START/ .. /$STOP/ xor /$STOP/) {
            print;
        }
    }

    __DATA__
    asdfsdfds
    asdfasdf
    blah blah blah
    Info I Want
    I want this line
    And this line
    And this line
    this is the last line i want
    Start of Info I don't want
    blah blah
    blah blah blah

    Output:

    Info I Want
    I want this line
    And this line
    And this line
    this is the last line i want
      while (<DATA>) {    # logic crystal clear here
          next unless /$START/ .. /$STOP/;
          # next if /$START/;    # exclude line with START pattern
          next if /$STOP/;       # exclude line with STOP pattern
          print;
      }
      On second thought...

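      The xor solution fails if there is a stop-line not balanced by a start-line: for such a line the range test is false but the /$STOP/ test is true, so the xor is true and the stray stop-line gets printed. A better solution (or at least one that doesn't have a subtle potential bug) is: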

      use warnings;
      use strict;

      my $START = qr{ \A Info [ ] I [ ] Want }xms;
      my $STOP  = qr{ \A Start [ ] of [ ] Info [ ] I [ ] don't }xms;

      while (<DATA>) {
          # if (/$START/ .. /$STOP/ xor /$STOP/) {
          #     print;
          # }
          if (/$START/ .. /$STOP/ and not /$STOP/) {
              print;
          }
      }

      __DATA__
      asdfsdfds
      asdfasdf
      blah blah blah
      Info I Want
      I want this line
      And this line
      And this line
      this is the last line i want
      Start of Info I don't want
      blah blah
      blah blah blah
      Start of Info I don't want
      blah blah
      blah blah blah
      Output:

      Info I Want
      I want this line
      And this line
      And this line
      this is the last line i want
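The difference between the two conditions can be checked side by side. This is a sketch with an invented line list that includes one stray stop-line; the patterns are simplified versions of $START and $STOP above, written without /x.

```perl
use strict;
use warnings;

my $START = qr/\AInfo I Want/;
my $STOP  = qr/\AStart of Info I don't/;

# Invented input: one complete block, then a stop line with no start.
my @lines = (
    "Info I Want\n",
    "wanted line\n",
    "Start of Info I don't want\n",
    "noise\n",
    "Start of Info I don't want\n",    # stray stop line
);

my ( @with_xor, @with_and_not );
for (@lines) {
    push @with_xor, $_ if /$START/ .. /$STOP/ xor /$STOP/;
}
for (@lines) {
    push @with_and_not, $_ if /$START/ .. /$STOP/ and not /$STOP/;
}

print "xor kept ", scalar(@with_xor), " lines\n";
print "and-not kept ", scalar(@with_and_not), " lines\n";
```

With this input the xor version keeps three lines, the last being the stray stop-line, while the and-not version keeps only the two lines of the real block.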