kanikilu has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm pretty new to Perl, and I was wondering if I could do this, and if so, if I could get some input from you guys.

What I would like to do is have a script that searches all the .log files in the current directory (where the script is located). It should then look for a certain word in each file. If it finds the word, it should write out a text file that includes the line it was found on, and the two previous lines. It should do this for each instance of the word in each file of that directory.

So if for instance the word was "hello", the text file would look something like this after one instance:

-----------------------------------------
(1) From file nameoffile.log:

.............previous line.............
.............previous line.............
...current line........hello...........
-----------------------------------------

I'm sure this can be done, but I'm just not sure where to start. If it helps, I'm on Windows 2000, using ActiveState's ActivePerl 5.6.1.626

Thanks in advance for any help!

Replies are listed 'Best First'.
Re: Newbie Text Parsing Question
by Abigail (Deacon) on Jul 06, 2001 at 01:18 UTC
    Here's a slight variation. This one prints the lines before and after the line(s) that match (the number of lines is controlled by the variable $range). It uses a circular buffer, but all the functionality of dealing with circularity is hidden inside a tie mechanism.
    #!/opt/perl/bin/perl
    use strict;
    use warnings;

    my $file  = "/usr/dict/words";
    my $word  = "perl";
    my $range = 1;                  # -$range .. $range
    my $size  = 2 * $range + 1;

    sub TIEARRAY  {bless [("") x $_ [1]] => $_ [0]}
    sub STORE     {${$_ [0]} [$_ [1] % @{$_ [0]}] = $_ [2]}
    sub FETCH     {${$_ [0]} [$_ [1] % @{$_ [0]}]}
    sub FETCHSIZE {scalar @{$_[0]}}
    sub STORESIZE {die}

    tie my @buffer => 'main', $size;

    open my $fh => $file or die "Failed to open $file: $!";
    while (<$fh>) {
        $buffer [$.] = $_;
        if ($buffer [$. - $range] =~ /$word/) {
            print @buffer [$. - $size + 1 .. $.];
        }
    }

    # Borderline, matches at the end:
    for my $line ($. - $range + 1 .. $.) {
        print @buffer [$line - $range .. $.] if $buffer [$line] =~ /$word/;
    }
    __END__

    -- Abigail

Re: Newbie Text Parsing Question
by VSarkiss (Monsignor) on Jul 06, 2001 at 00:05 UTC
    Yes and yes.

    If the files are small, the easiest way to "back up" is to read the whole thing into memory. Something like this:

    # Loop over all files with a .log suffix
    foreach my $fn (<*.log>) {
        # Open the file, if possible, and read it all into @f
        open I, $fn or warn("Couldn't open $fn: $!"), next;
        my @f = <I>;
        close I;

        # Go through it a line at a time
        for (my $i = 0; $i < @f; $i++) {
            # If you find "hello" anywhere in the line,
            # back up two lines and print if possible
            if ($f[$i] =~ /hello/) {
                print $f[$i-2] if $i > 1;
                print $f[$i-1] if $i > 0;
                print $f[$i];
            }
            # Note: the conditions matter. Without them a negative
            # index like $f[-2] wraps around to the end of @f, so you'd
            # print lines from the bottom of the file instead.
        }
    }
    If the files are large, this could eat up lots of memory. In that case, you'll have to play games with backing up inside the file, which is trickier. (An exercise for the reader. ;-)
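    If you ever do need that, the trick is to remember where the last couple of lines started with tell() and seek() back to re-read them when a match shows up. A rough, untested sketch (the filename is just a stand-in):

    use strict;
    use warnings;

    my $file = 'big.log';                 # stand-in name, use your own
    open my $fh, '<', $file or die "Couldn't open $file: $!";

    my @starts;                           # offsets of the current + two previous lines
    until (eof $fh) {
        push @starts, tell $fh;           # offset of the line about to be read
        shift @starts while @starts > 3;  # never keep more than three offsets
        my $line = <$fh>;
        next unless $line =~ /hello/;

        my $resume = tell $fh;                # remember where to pick up again
        seek $fh, $starts[0], 0;              # jump back (at most) two lines
        print scalar <$fh> for 1 .. @starts;  # re-read and print the context
        seek $fh, $resume, 0;                 # carry on after the matching line
    }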

    HTH

      Thanks for the reply! The files should be between about 5 and 45 KB. Is this too big? I'll try what you suggested and reply back...
        Thanks! It worked perfectly. I added a couple lines to output it to a file and make the output a little "prettier", but it suits my purposes just fine. And there doesn't seem to be any memory 'issues'... Thanks again.
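        For the curious, the extra bits amounted to a counter and a filehandle, roughly like this (the output filename is just what I picked):

        use strict;

        # the same loop as above, printing to a file instead of the screen
        my $count = 0;
        open OUT, '>', 'results.txt' or die "Couldn't open results.txt: $!";

        foreach my $fn (<*.log>) {
            open I, $fn or warn("Couldn't open $fn: $!"), next;
            my @f = <I>;
            close I;

            for (my $i = 0; $i < @f; $i++) {
                next unless $f[$i] =~ /hello/;
                $count++;
                print OUT '-' x 41, "\n";
                print OUT "($count) From file $fn:\n\n";
                print OUT $f[$i-2] if $i > 1;
                print OUT $f[$i-1] if $i > 0;
                print OUT $f[$i];
                print OUT '-' x 41, "\n";
            }
        }
        close OUT;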
Re: Newbie Text Parsing Question
by Albannach (Monsignor) on Jul 06, 2001 at 00:29 UTC
    If file size is a concern, just keep the recent lines in an array instead of slurping up everything. I just threw this thing together but it seems to do the trick:
    use strict;

    my $fileglob  = shift || '*.pl';
    my $pattern   = shift || 'hello';
    my $keeplines = shift || 3;

    for my $file (<${fileglob}>) {
        unless (open(IN, $file)) {
            warn "Can't read from $file: $!";
            next;
        }
        my @lines;
        while (<IN>) {
            push @lines, $_;
            shift @lines if (@lines > $keeplines);
            if (/$pattern/i) {
                print "--- From $file:---\n@lines\n";
            }
        }
    }

    --
    I'd like to be able to assign to an luser

Re: Newbie Text Parsing Question
by tachyon (Chancellor) on Jul 06, 2001 at 00:31 UTC

    This does the trick by reading line by line and appending each hit to an output file - lets you do really big files without a really big memory :-)

    #!/usr/bin/perl -w
    use strict;

    my $logfile = "/path/to/logfile";
    my $outfile = "/path/to/outfile";
    my $find    = "hello";

    my $second  = '';
    my $first   = '';
    my $line    = 0;

    # allow regex unfriendly chars in $find
    $find = quotemeta $find;

    open (FILE, "<$logfile") or die "Oops perl says $!";
    while (<FILE>) {
        chomp;
        $line++;
        &print_found if /$find/;
        $second = $first;
        $first  = $_;
    }

    sub print_found {
        open (OUT, ">>$outfile") or die "Oops perl says $!";
        print OUT "Line: $line\n";
        print OUT "$second\n$first\n$_\n\n";
        close OUT;
    }

    Hope this helps

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Newbie Text Parsing Question
by Brovnik (Hermit) on Jul 06, 2001 at 00:49 UTC
    My version, edited from the first reply.
    foreach my $fn (<*.log>) {
        # start with 3 entries to ensure 3 lines
        my @fifo = ('', '', '');
        open I, $fn or warn("Couldn't open $fn: $!"), next;
        while (<I>) {
            # Add current line on one end and remove the first entry
            push(@fifo, $_);
            shift(@fifo);
            if (/monk/) {
                print '-' x 40, $/;
                print "From file [$fn]:\n\n";
                print @fifo;
                print '-' x 40, $/;
            }
        }
        close I;
    }

    --
    Brovnik
Re: Newbie Text Parsing Question
by particle (Vicar) on Jul 06, 2001 at 01:37 UTC
    this ought to let you look at files of any size, and i think the output format is just what you wanted.
    #!/usr/bin/perl -w
    use strict;

    my $ext = '.log';       # extension to look for in filenames
    my $pat = 'bob';        # adjust to value to search for
    my $match;              # track how many matches

    opendir(DH, '.') or die("CANT! $!");            # open directory

    foreach my $diritem ( readdir(DH) ) {           # read directory
        next unless ( -f $diritem &&                # is it a file?
                      $diritem =~ m/$ext$/ );       # does it end in $ext?

        open(FH, '<', $diritem) or die("CANT! $!"); # open the file for reading

        my @buffer;                                 # buffer for previous lines
        push @buffer, scalar <FH> for 1 .. 3;       # create three line buffer

        while(<FH>) {                   # read the file line by line
            push @buffer, $_;           # add line to end of buffer
            shift @buffer;              # remove line from front of buffer

            if( /$pat/ ) {              # did i find the search pattern?
                $match++;               # increment my match count

                # print fancy output, with separator line,
                # match counter, filename,
                # two previous lines and matching line
                print "-----------------------------------------\n";
                print "($match) From file $diritem\n";
                print $buffer[0], $buffer[1], $buffer[2];
            } # if
        } # while

        close FH;                       # close the file
    } # foreach
    aah, that was a fun diversion! i searched for 'bob'. you might like to search for something more useful....

    ~Particle

(Follow-Up)Re: Newbie Text Parsing Question
by Hofmator (Curate) on Jul 06, 2001 at 14:08 UTC

    Reading this question I thought at once of the grep family. Something along the lines of % grep -B 2 pattern *.log > outfile should do the trick. Then I saw Windows 2000, so I thought the PPT or the utilities 'find' or 'findstr' could help out. But sadly none of them supports any of the context options (like -2 or -B 2 or -C) which I considered standard for this kind of utility.
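
    In the meantime, a one-liner does a passable imitation of -B 2 (just a sketch - the pattern and filename are placeholders, and on Windows you have to name the files yourself since cmd.exe won't expand *.log for you):

    perl -ne "push @b, $_; shift @b if @b > 3; print @b if /pattern/" some.log > outfile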

    Can somebody tell me why these were not included in the PPT - to be more precise in tcgrep??

    -- Hofmator