smbs has asked for the wisdom of the Perl Monks concerning the following question:

I have a large text file: 130,000 lines (about 5 MB).
I want to copy all lines that start with the same 5 letters,
e.g. "abcde", and end with the letters "partname", into
a new file.
I am using a "foreach", reading line by line and doing
a match. It works, but it is very slow!
I need something faster!
open C, ">c:\\somefile.txt";
open (FH, "k:\\1\\somefile.txt") || die "Couldn't open file: $!";
@required = ();
@all = <FH>;
foreach $item (@all) {
    next unless (index($item, 'abcde') == 0);
    if ($item =~ /abcde.*PARTNAME/) {
        push(@required, $item);
    }
}
print C @required;

2005-01-12 Janitored by Arunbear - added code tags, as per Monastery guidelines

Re: looking for speed!! large file search and extract
by bart (Canon) on Jan 12, 2005 at 15:38 UTC
    I see no reason to
    • read in the whole file first and then process it line by line, or
    • do a similar test twice when one will do.

    I'd just do

    while(<FH>) { push @required, $_ if /^abcde.*partname$/; }
    or, if that's all you do with those lines:
    while(<FH>) { print C if /^abcde.*partname$/; }
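    For reference, a complete, runnable version of this streaming approach might look like the sketch below (the paths are reused from the original post; lexical filehandles and three-argument open are just stylistic choices, not part of the reply above):

    use strict;
    use warnings;

    # Read the input line by line and write matching lines straight out,
    # so only one line is ever held in memory.
    open my $in,  '<', 'k:\\1\\somefile.txt' or die "Couldn't open input: $!";
    open my $out, '>', 'c:\\somefile.txt'    or die "Couldn't open output: $!";

    while (<$in>) {
        print {$out} $_ if /^abcde.*partname$/;
    }

    close $in;
    close $out;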
Re: looking for speed!! large file search and extract
by holli (Abbot) on Jan 12, 2005 at 15:47 UTC
    one-liner:
    c:\> perl -n -e "print if /^abcde.+PARTNAME$/" c:\somefile.txt>k:\1\so +mefile.txt
    or
    c:\> perl -n -e "print if /^abcde/ && /PARTNAME$/" c:\somefile.txt>k:\ +1\somefile.txt
    whatever is faster.

    Update: The second one is approx. 50% faster.
    I tried with a file of 73 MB and 900,000 lines, where every second line matches.
    One-liner 1 takes 11 seconds, one-liner 2 takes 6 seconds.

    Update:
    A one-liner using substr():
    c:\> perl -n -e "print if substr($_,0,5) eq q(abcde) && substr($_,-9) eq qq(PARTNAME\n)" c:\somefile.txt > k:\1\somefile.txt
      I recommend looking at substr and (if on Unix) the egrep utility, too.

      Caution: Contents may have been coded under pressure.
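      Following up on the egrep suggestion above: on Unix the whole extraction could be a single shell command, for example (the output filename here is arbitrary):

      egrep '^abcde.*PARTNAME$' somefile.txt > extracted.txt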
        Thanks for the answer, but now I have to make a small change:
        I only want to extract a line on the condition that the
        line directly above it starts and ends with the
        following 5 characters: "xyzdf".
        Basically I am looking for a 2-line match.
        Thanks.
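        One way to handle that (just a sketch, assuming the test means the previous line both starts and ends with "xyzdf" as described above, and using the FH/C filehandles from the original post) is to remember the previous line in a variable:

        my $prev = '';
        while (<FH>) {
            # print the current line only if it matches and the line
            # directly above it starts and ends with "xyzdf"
            print C $_ if /^abcde.*PARTNAME$/
                       && $prev =~ /^xyzdf/ && $prev =~ /xyzdf$/;
            $prev = $_;    # remember this line for the next iteration
        }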
Re: looking for speed!! large file search and extract
by Tanktalus (Canon) on Jan 12, 2005 at 15:40 UTC

    Instead of @required and @all and the foreach:

    while (<FH>) { print C $_ if /^abcde.*PARTNAME$/; }
    Note that if "abcde" or "PARTNAME" (or both) come from variables that don't change while reading this particular file, I would compile that regexp once:
    my $re = qr/^$start.*$end$/;
    while (<FH>) { print C $_ if m/$re/; }
    Another possible improvement may be to remove the .* and do two matches:
    my $startre = qr/^$start/;
    my $endre   = qr/$end$/;
    while (<FH>) { print C $_ if m/$startre/ and m/$endre/; }
    Also, please put your code into <code> and </code> tags - makes it much easier to read. Thanks.
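    To check which variant actually wins on your data, a quick comparison with the core Benchmark module might look like this sketch (the sample lines and the 50% match ratio are made up for illustration):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my ($start, $end) = ('abcde', 'PARTNAME');
    my $combined = qr/^\Q$start\E.*\Q$end\E$/;
    my $startre  = qr/^\Q$start\E/;
    my $endre    = qr/\Q$end\E$/;

    # Half of the test lines match, half do not.
    my @lines = map { $_ % 2 ? "abcde some text PARTNAME" : "zzzzz some text OTHER" } 1 .. 10_000;

    cmpthese(-3, {
        one_regex   => sub { my $n = grep { /$combined/ } @lines },
        two_regexes => sub { my $n = grep { /$startre/ && /$endre/ } @lines },
    });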

Re: looking for speed!! large file search and extract
by waswas-fng (Curate) on Jan 12, 2005 at 15:43 UTC
    open INFI, "c:\\input\\file.txt" or die "cant open infile: $!\n";
    open OUTFI, ">c:\\output\\file.txt" or die "cant open outfile: $!\n";
    while (<INFI>) {
        print OUTFI if /^abcde.*partname$/;
    }


    -Waswas
Re: looking for speed!! large file search and extract
by vek (Prior) on Jan 12, 2005 at 22:29 UTC

    Please remember that @all=<FH>; will read the entire file into memory. Safe for small files but might cause you grief with huge files.

    -- vek --
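    Schematically, the difference vek describes is:

    # slurp: all 130,000 lines are held in memory at once
    my @all = <FH>;

    # stream: only the current line is held in memory
    while (my $line = <FH>) {
        # process $line here
    }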
Re: looking for speed!! large file search and extract
by perlsen (Chaplain) on Jan 13, 2005 at 10:35 UTC

    I think this takes less time to process.
    If you wish, please try this:

    undef $/;
    open (FH, "D:\\temp.txt") || die "Couldn't open file: $!";
    @required = ();
    $str = <FH>;
    close(FH);
    (@arr) = $str =~ m#(abcde.*?PARTNAME)#gsi;
    print "$_\n" for @arr;