Parse a block of text

annie06 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Parse a block of text by dragonchild (Archbishop) on Jul 07, 2008 at 19:22 UTC
Exclude the line you don't want. `if (/^Info I want/../^Start of Info/) { unless ( /^Start of Info/ ) { print; } }` Yes, there's a nifty-keen way to do it with regexes, but this is simpler, less error-prone, and easier for your maintainers to understand. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re^2: Parse a block of text by annie06 (Acolyte) on Jul 07, 2008 at 19:31 UTC
That works, thanks so much! Just curious, how would you do it with regexes?
Re^3: Parse a block of text by dragonchild (Archbishop) on Jul 08, 2008 at 00:42 UTC
Your problem is that you're reading a line at a time, which means you don't have the information you need until it's too late. You'd have to switch to reading two lines at a time, and that's just silly. But if you had the whole text in one string, you'd use a positive lookahead, in-string line anchors, and the /s and /m modifiers.
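The regex approach described above could look roughly like the following. This is only a sketch, assuming the whole input has already been read into a single string; the helper name `block_before_stop` and the sample text are made up for illustration. It keeps the start line and drops the stop line, mirroring the line-by-line version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given the whole input as one
# string, return everything from the start-marker line up to, but not
# including, the stop-marker line. Returns undef when there is no match.
sub block_before_stop {
    my ($text) = @_;

    # /m : ^ matches at the start of every line inside the string
    # /s : . also matches newlines, so .*? can span several lines
    # (?=^Start of Info) : positive lookahead; the capture stops right
    #                      before the stop line without consuming it
    my ($block) = $text =~ /(^Info I want.*?)(?=^Start of Info)/sm;
    return $block;
}

# Made-up sample input, standing in for a slurped file.
my $text = join '', map { "$_\n" }
    'junk', 'Info I want', 'line one', 'line two', 'Start of Info', 'more junk';

my $block = block_before_stop($text);
print $block if defined $block;
```

Compared with the flip-flop version, this needs the entire file in memory and a reader who is comfortable with lookaheads, which is the trade-off the post is pointing at.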
Re: Parse a block of text by talexb (Chancellor) on Jul 07, 2008 at 19:44 UTC
You're almost there. I think you probably want something like this:

```perl
#!/usr/bin/perl

$infile = "my_file";
open (FILE, $infile) or die "Unable to open $infile: $!";

while(<FILE>) {
    if (/^Info I want/../^Start of Info/) {
        print unless (/^Start of Info/);
    }
}
close (FILE);
```

I haven't tested this, but it should point you in the right direction. Note that a) I tested the open to make sure that the operation succeeded, and b) I didn't read the file into an array.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
Re^2: Parse a block of text by annie06 (Acolyte) on Jul 07, 2008 at 20:40 UTC
Ok, I'm trying something new as well here and took your suggestions. I want to be able to pass arguments to my script so that it knows what file to open and what to look for in the file. For example, you would call it with `my_script file "Info I want"`:

```perl
#!/usr/bin/perl

$infile = shift @ARGV;
open (FILE, $infile) or die "Unable to open $infile: $!";

while (<FILE>) {
    foreach (@ARGV) {
        $pattern = $_;
        if ($pattern eq "Info I want") {
            if (/^Info I want/../^Start of/) {
                print unless ( /^Start of Info/ );
            }
        }
    }
}
close (FILE);
```

But this doesn't return what I need. It actually prints "Info I want" to the screen repeatedly.
Re^3: Parse a block of text by talexb (Chancellor) on Jul 07, 2008 at 21:14 UTC
I find that if I add comments as I go, writing code is much easier. Here's some code that works for me:

```perl
#!/usr/bin/perl

use strict;

{
    # Get filename, starting regex and stopping regex
    # from the command line.
    my ( $filename, $start, $stop ) = @ARGV;

    # Check the inputs, open the file.
    die "ERROR: Need to define start and stop arguments."
        unless ( defined($start) && defined($stop) );
    open(INPUT, $filename) or die "Unable to open $filename: $!";

    # Loop through the file, looking for the starting
    # regexp and skipping lines till then.
    while(<INPUT>) {
        if ( /$start/ ) {
            # Found the matching line! Print it, and start
            # looking for the stopping regexp.
            print;
            while(<INPUT>) {
                # If we found the stopping regexp, drop out
                # of this loop; otherwise, print a line and
                # keep going.
                if ( /$stop/ ) {
                    last;
                } else {
                    print;
                }
            }
        }
    }
    close(INPUT);
}
```

I've grabbed all of the arguments from the command line right at the start -- this is far easier than trying to remember what's still left to be read in later. The advantage to this code is that it will find multiple occurrences of start/stop in a single file.
Re^3: Parse a block of text by runrig (Abbot) on Jul 07, 2008 at 22:52 UTC
I don't quite get what exactly you want to pass in as your patterns since you seem to be hardcoding them anyway... so I'll take a stab in the dark. Also, the ".." and "..." operators return a sequence number for each line in the range, with "E0" appended on the line that matched the stop pattern. So:

```perl
my ($file, $start, $stop) = @ARGV;

$start = qr/^$start/;
$stop  = qr/^$stop/;

open(my $fh, "<", $file) or die "Err: $!";
while(<$fh>) {
    if ( my $status = /$start/.../$stop/ ) {
        print unless $status =~ /E/;
    }
}
close $fh;
```
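To see what the range operator actually hands back, here is a small sketch that records its value for each line. The helper name `range_values` and the `BEGIN`/`END` markers are invented for this illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: feed lines through the three-dot range operator and
# collect the value it returns for each one.
sub range_values {
    my @values;
    for (@_) {
        # Empty string outside the range; 1, 2, 3, ... inside it;
        # the last line of the range gets "E0" appended.
        my $status = /^BEGIN/ ... /^END/;
        push @values, $status;
    }
    return @values;
}

print join( ',', map { "[$_]" } range_values(qw(x BEGIN a b END y)) ), "\n";
# prints: [],[1],[2],[3],[4E0],[]
```

So `$status == 1` identifies the start line and `$status =~ /E0$/` identifies the stop line, which is enough to drop either boundary (or both) without re-testing the patterns.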
Re: Parse a block of text by Lawliet (Curate) on Jul 07, 2008 at 19:24 UTC
Put parentheses around the part you want: `if (/^(Info I want.+)Start of Info/) { print $1 ; }` Try that. `<(^.^-<) <(-^.^<) <(-^.^-)> (>^.^-)> (>-^.^)>`
Re^2: Parse a block of text by JadeNB (Chaplain) on Jul 07, 2008 at 21:04 UTC
`if (/^(Info I want.+)Start of Info/) { print $1 ; }`

This still grabs the 'Info I want' part, which doesn't seem (despite the name) to be part of the desired info. Thus, exclude it. Meanwhile, we might as well add the /s flag, so that the `.` can gobble up new-lines (which seems to be desired), and the /m flag, so that `^` and `$` match at line boundaries; explicitly notice that the Info I want and Start of Info should be on their own lines; and allow there to be nothing at all between the markers (unless we don't want that). This gives: `print $1 if ( /^Info I want$(.*)^Start of Info$/sm )` (Of course, this makes the end-of-section marker 'Start of Info', which is slightly different from what the original poster said.)

It's easy to gobble up leading and trailing new-lines, if they're not significant: `print $1 if ( /^Info I want\n+(.*?)\n+Start of Info$/sm )` (Note the non-greedy quantifier.)

Of course, this only fetches one occurrence of the info; if there might be many occurrences, then we can use a lazy quantifier as above with a while loop and the /g flag. If the original poster can change the format of the input files, then it might be appropriate to consider changing the markers and using Text::Balanced.
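The while-loop-with-/g idea could be sketched like this, assuming the file contents are already in one string. The helper name `all_blocks` and the sample text are hypothetical, and the pattern is the lazy-quantifier one from the post with /s added so that `.` can cross line breaks.

```perl
use strict;
use warnings;

# Hypothetical helper: return every block found between an 'Info I want'
# line and the following 'Start of Info' line.
sub all_blocks {
    my ($text) = @_;
    my @blocks;

    # /g in scalar context resumes after the previous match on each pass;
    # the lazy .*? stops at the first stop marker instead of the last one.
    while ( $text =~ /^Info I want\n+(.*?)\n+Start of Info$/smg ) {
        push @blocks, $1;
    }
    return @blocks;
}

# Made-up sample input with two marked sections.
my $text = <<'END_TEXT';
junk
Info I want
first a
first b
Start of Info
more junk
Info I want
second
Start of Info
END_TEXT

print "---\n$_\n" for all_blocks($text);
```

To run this against a real file, the text would have to be slurped first, for example with `local $/;` before reading from the filehandle.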
Re^3: Parse a block of text by Lawliet (Curate) on Jul 07, 2008 at 21:15 UTC
Ah right - the multiline flag. Forgot about that. Thanks for correcting me.
Re: Parse a block of text by Cristoforo (Curate) on Jul 07, 2008 at 22:35 UTC
Here's a way to identify if it is either the beginning or ending match or both. `if((my $first = /^Info I want/) ... (my $last = /^Start of/)) { print unless $first || $last; }`
Re: Parse a block of text by hilitai (Monk) on Jul 07, 2008 at 19:26 UTC
`print unless /^Start of Info/;` perhaps?
Re: Parse a block of text by Anonymous Monk on Jul 07, 2008 at 19:49 UTC
Another approach, not necessarily better than the others:

```perl
use warnings;
use strict;

my $START = qr{ \A Info  [ ] I  [ ] Want }xms;
my $STOP  = qr{ \A Start [ ] of [ ] Info [ ] I [ ] don't }xms;

while (<DATA>) {
    if (/$START/ .. /$STOP/ xor /$STOP/) {
        print;
    }
}

__DATA__
asdfsdfds
asdfasdf
blah blah blah
Info I Want
I want this line
And this line
And this line
this is the last line i want
Start of Info I don't want
blah blah blah
blah blah
```

Output:

```
Info I Want
I want this line
And this line
And this line
this is the last line i want
```
Re^2: Parse a block of text by Anonymous Monk on Jul 07, 2008 at 20:46 UTC
```perl
while (<DATA>) {
    # logic crystal clear here
    next unless /$START/ .. /$STOP/;
    # next if /$START/;  # exclude line with START pattern
    next if /$STOP/;     # exclude line with STOP pattern
    print;
}
```
Re^2: Parse a block of text by Anonymous Monk on Jul 07, 2008 at 20:17 UTC
On second thought... This solution fails if there is a stop-line not balanced by a start-line. A better solution (or at least one that doesn't have a subtle potential bug) is:

```perl
use warnings;
use strict;

my $START = qr{ \A Info  [ ] I  [ ] Want }xms;
my $STOP  = qr{ \A Start [ ] of [ ] Info [ ] I [ ] don't }xms;

while (<DATA>) {
    # if (/$START/ .. /$STOP/ xor /$STOP/) {
    #     print;
    # }
    if (/$START/ .. /$STOP/ and not /$STOP/) {
        print;
    }
}

__DATA__
asdfsdfds
asdfasdf
blah blah blah
Info I Want
I want this line
And this line
And this line
this is the last line i want
Start of Info I don't want
blah blah blah
blah blah
Start of Info I don't want
blah blah blah
blah blah
```

Output:

```
Info I Want
I want this line
And this line
And this line
this is the last line i want
```