in reply to Searching a gzip file

I would unzip the file to a temporary location, then process it as a normal file.
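
For the unzip-to-a-temporary-file route, a minimal sketch (the logfile.gz and /tmp paths below are only placeholders):

use strict;
use warnings;

# Decompress to a temporary file first, then read it like any plain text file.
my $tmp = '/tmp/logfile.txt';                  # placeholder path
system("gunzip -c logfile.gz > $tmp") == 0
    or die "gunzip failed: $?\n";

open my $fh, '<', $tmp or die "Cannot open $tmp: $!\n";
while (my $line = <$fh>) {
    # ... normal line-by-line processing goes here ...
}
close $fh;
unlink $tmp;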

This is the sort of thing you could easily do with awk, or of course with Perl. The general approach I would use is that of a state machine.

In this approach, consider that at any moment there are exactly four kinds of line you could be looking at:

  1. Pattern 1.
  2. Pattern 2.
  3. A line that is neither.
  4. End of file.  (No more lines exist.)

And there are three states:
  1. The last pattern seen was Pattern 1.
  2. The last pattern seen was Pattern 2.
  3. Neither pattern has been seen yet.  (Initial state.)

So, the general idea is to imagine a 3x4 rectangular table and to work out, for each square, what the program needs to do.
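
To make that concrete, here is a rough Perl sketch of one way such a state machine could look, assuming Pattern 1 and Pattern 2 are plain regular expressions and that the goal is to print each Pattern 1 ... Pattern 2 block. The two qr// patterns and that action are placeholders; what each cell of the table actually does depends on the real requirement.

use strict;
use warnings;

my $pat1  = qr/PATTERN1/;        # placeholder for Pattern 1
my $pat2  = qr/PATTERN2/;        # placeholder for Pattern 2
my $state = 'none';              # 'none', 'saw1', or 'saw2'
my @held;                        # lines collected since the last Pattern 1

while (my $line = <>) {
    if ($line =~ $pat1) {
        $state = 'saw1';
        @held  = ($line);        # start a fresh block at Pattern 1
    }
    elsif ($line =~ $pat2) {
        if ($state eq 'saw1') {
            print @held, $line;  # a Pattern 1 ... Pattern 2 block is complete
        }
        $state = 'saw2';
        @held  = ();
    }
    elsif ($state eq 'saw1') {
        push @held, $line;       # a line that is neither, inside a block
    }
    # in state 'none' or 'saw2', a neither-line needs no action
}
# End of file: any half-finished block left in @held is simply dropped

Each branch corresponds to one column of the table; end of file is the fourth column and is handled when the while loop runs out of lines.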

Re^2: Searching a gzip file
by baski (Novice) on Aug 26, 2010 at 15:08 UTC
    That is a very methodical approach; I will try it and post my code once it's done. If I understand it correctly, this involves unzipping the file and using a file pointer to remember where I last stopped scanning. Is there any other way of doing it without unzipping the file? I am not averse to using awk/sed, but from my limited knowledge I can't seem to come up with a way of achieving this. The point where I am getting stuck is remembering the position of pattern 1. That's why I thought Perl would rescue me with file handles, but anything with file handles will impose a limitation on the size of the file I am dealing with. Also, I would like to point out that pattern 1 and pattern 2 are different:

    <name> name 1 </name>

    # unknown no. of lines

    <id> unique id1 </id>

    # unknown no. of lines

    <name> name 2 </name>

    # unknown no. of lines

    <id> unique id2 </id>
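
    A minimal sketch of how this could be done without unzipping the file, using the core IO::Uncompress::Gunzip module to read the compressed stream line by line and a scalar to hold the most recent <name> instead of a file position. The log.gz filename is a placeholder, and the regexes assume the tags look like the sample above:

    use strict;
    use warnings;
    use IO::Uncompress::Gunzip qw($GunzipError);

    my $z = IO::Uncompress::Gunzip->new('log.gz')
        or die "gunzip failed: $GunzipError\n";

    my $last_name;                           # most recent <name> seen, if any
    while (my $line = <$z>) {
        if ($line =~ m{<\s*name\s*>\s*(.*?)\s*<\s*/name\s*>}) {
            $last_name = $1;                 # remember the name itself, not a file offset
        }
        elsif ($line =~ m{<\s*id\s*>\s*(.*?)\s*<\s*/id\s*>}) {
            print "$last_name => $1\n" if defined $last_name;
            undef $last_name;                # wait for the next <name>
        }
    }
    close $z;

    This reads one line at a time, so the size of the file is not a concern.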

Best way to search through blocks of data
by baski (Novice) on Sep 11, 2010 at 02:57 UTC
    Hi Monks, I need suggestions on the fastest way to do this: I have 3 directories with log files in them. All log files have the following pattern:

    <block>

    <id> xyz </id>

    <url> foo.com </url>

    ..

    <response> xyz </response>

    </block>

    <block>

    ..

    The task is to get the id from the logs of the first directory if the url is foo.com, search for that id in all the directories (including the first one), and print the responses from the corresponding blocks into a separate file.
    # getting the ids from the first directory
    use strict;
    use warnings;
    use IO::File;

    our @IDs;          # ids collected from the first directory
    our @logfiles;     # per-directory lists of file names (array of array refs, filled elsewhere)
    our @logdirs;      # the directory paths themselves (filled elsewhere)

    sub doFile {
        my ($fn) = @_;
        chomp $fn;
        print "opening $fn\n";
        my $fh = IO::File->new($fn, 'r')
            or die "Cannot open file $fn: $!\n";
        my @msgLines;
        while (my $l = <$fh>) {
            push @msgLines, $l;
            next unless $l =~ m{</msg>\s*$};    # end of one block
            if (grep { m{http://.*foo\.com} } @msgLines) {
                # this block mentions foo.com, so pull out its id
                my ($id) = map { m{<Id>(\d+)</Id>} ? $1 : () } @msgLines;
                push @IDs, $id if defined $id;
                # store @msgLines somewhere too; that array can serve as the source
                # for searching for responses from the first directory, and something
                # similar is needed for the rest of the directories
            }
            @msgLines = ();
        }
    }

    my @firstdir = @{ $logfiles[0] };
    my $path     = $logdirs[0];
    foreach my $file (@firstdir) {
        my $curpath = "$path/$file";
        print "In foreach trying to open $curpath\n";
        doFile($curpath);
    }
    The log files are huge, so zipping them into a single file is not possible (out of disk space). Any Perl modules that can help me with this task?
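
    A possible shape for the second pass, sketched under the assumption that @IDs has been filled as above and that @logdirs holds all three directory paths (both names are taken from the snippet); the tag regexes follow the sample pattern, and responses.txt is a placeholder output name:

    use strict;
    use warnings;

    our @IDs;        # from the first pass above
    our @logdirs;    # all three log directories

    # Second pass: scan every directory and print the response from any block
    # whose id was collected in the first pass.
    my %wanted = map { $_ => 1 } @IDs;    # fast lookup of the ids to report

    open my $out, '>', 'responses.txt' or die "Cannot open responses.txt: $!\n";

    for my $dir (@logdirs) {
        opendir my $dh, $dir or die "Cannot read $dir: $!\n";
        for my $file (grep { -f "$dir/$_" } readdir $dh) {
            open my $fh, '<', "$dir/$file" or die "Cannot open $dir/$file: $!\n";
            my @block;
            while (my $line = <$fh>) {
                push @block, $line;
                next unless $line =~ m{</block>\s*$};    # one block is complete
                my ($id)   = map { m{<id>\s*(\S+)\s*</id>}             ? $1 : () } @block;
                my ($resp) = map { m{<response>\s*(.*?)\s*</response>} ? $1 : () } @block;
                print {$out} "$id\t$resp\n"
                    if defined $id && $wanted{$id} && defined $resp;
                @block = ();
            }
            close $fh;
        }
        closedir $dh;
    }
    close $out;

    Because each block is dropped as soon as it has been checked, only one block is ever held in memory at a time, which keeps the huge files manageable.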