Re: looking for speed!! large file search and extract

one-liner:

c:\> perl -n -e "print if /^abcde.+PARTNAME$/" c:\somefile.txt>k:\1\somefile.txt

c:\> perl -n -e "print if /^abcde/ && /PARTNAME$/" c:\somefile.txt>k:\1\somefile.txt

Use whichever is faster.

Update: The second one takes roughly half the time.
I tried with a file of 73 MB and 900,000 lines, where every second line matches.
One-liner 1 takes 11 seconds, one-liner 2 takes 6 seconds.

Update: a one-liner using substr():

c:\> perl -n -e "print if substr($_,0,5) eq q(abcde) && substr($_,-9) eq qq(PARTNAME\n)" c:\somefile.txt>k:\1\somefile.txt
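The three filters above can be timed against each other with the core Benchmark module. This is only a rough sketch: the synthetic lines, the line count, and the filter names are assumptions made for illustration, and the ranking may differ with real data, another Perl version, or another machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data (an assumption): every second line matches, as in the test above.
my @lines = map {
    $_ % 2 ? "abcde filler text $_ PARTNAME\n" : "other filler text $_\n"
} 1 .. 20_000;

# The three filters from the one-liners; each returns the number of matches.
# They agree on this data, where every line is longer than 9 characters.
my %filters = (
    one_regex   => sub { scalar grep { /^abcde.+PARTNAME$/ } @lines },
    two_regexes => sub { scalar grep { /^abcde/ && /PARTNAME$/ } @lines },
    substr_eq   => sub {
        scalar grep {
            substr($_, 0, 5) eq 'abcde' && substr($_, -9) eq "PARTNAME\n"
        } @lines;
    },
);

# Run each filter for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%filters);
```

Because this runs on an in-memory array, it measures only the matching itself, not file I/O, so the absolute numbers will not line up with the timings quoted for the 73 MB file.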


Re^2: looking for speed!! large file search and extract by Roy Johnson (Monsignor) on Jan 12, 2005 at 16:03 UTC
I recommend looking at substr and (if on Unix) the egrep utility, too. Caution: Contents may have been coded under pressure.
Re^3: looking for speed!! large file search and extract by smbs (Acolyte) on Jan 12, 2005 at 17:00 UTC
Thanks for the answer, but now I have to make a small change: I only want to extract the lines on condition that the line directly above starts and ends with the following 5 characters, "xyzdf". Basically, I am looking for a two-line match. Thanks.
Re^4: looking for speed!! large file search and extract by Tanktalus (Canon) on Jan 12, 2005 at 17:07 UTC
A couple of minor points (and maybe I'm a bit too new to PM to make the comments): You probably should have posted this as a new question, not as a reply to the previous question, since it's now a new question. You probably should try something yourself, and then come back if it doesn't work. Or even if it does: share your answer and get feedback on it.

One WTDI is to use a simplified state machine:

    my $match;
    while (<FH>) {
        print C $_ if $match and /^abcde.*PARTNAME$/;
        $match = /xyzdf$/;
    }

This will set $match to true if the current line matches xyzdf at the end, false otherwise. The next time through the loop, we only check your second-line regexp if $match is already true (that is, the previous line matched the other regexp).
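The state machine described above might be wrapped into a self-contained script along these lines. It is a sketch, not the poster's exact program: the sub name `print_after_marker`, the lexical filehandles, and the in-memory sample data are assumptions for illustration. Like the posted snippet, it only checks that the previous line ends in xyzdf.

```perl
use strict;
use warnings;

# Copy every line matching /^abcde.*PARTNAME$/ whose predecessor ends in
# "xyzdf". The sub name and the filehandle arguments are illustrative.
sub print_after_marker {
    my ($in, $out) = @_;
    my $match;    # true if the previous line ended in xyzdf
    while (my $line = <$in>) {
        print {$out} $line if $match and $line =~ /^abcde.*PARTNAME$/;
        $match = $line =~ /xyzdf$/;
    }
    return;
}

# Demo on in-memory data (made up for this sketch) instead of real files.
my $input = join '', map { "$_\n" }
    'foo xyzdf', 'abcde 1 PARTNAME', 'abcde 2 PARTNAME', 'xyzdf', 'other';
open my $in,  '<', \$input     or die "cannot read input: $!";
open my $out, '>', \my $result or die "cannot write result: $!";
print_after_marker($in, $out);
close $out;
print $result;
```

Only the first "abcde ... PARTNAME" line is printed here, because the second one follows a line that does not end in xyzdf. To require that the previous line also starts with xyzdf, the assignment could test `/^xyzdf/` as well.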
Re^5: looking for speed!! large file search and extract by Roy Johnson (Monsignor) on Jan 12, 2005 at 17:18 UTC
Re^4: looking for speed!! large file search and extract by kutsu (Priest) on Jan 12, 2005 at 17:08 UTC
Then change holli's command-line statement from `perl -n -e "print if /^abcde/ && /PARTNAME$/" c:\somefile.txt>k:\1\somefile.txt` to `perl -n -e "print if /^xyzdf/ && /xyzdf$/" c:\somefile.txt>k:\1\somefile.txt`. If you don't understand this, I really recommend you read perlre. Update: Somehow I missed reading "line directly above", so ignore the rest, except for reading perlre; that's always a good idea if you haven't. "Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce
Re^4: looking for speed!! large file search and extract by holli (Abbot) on Jan 12, 2005 at 19:19 UTC
If I get your comment right, this could be:

c:\> perl -n -e "print $last, $_ if /xyzdf$/ && $last; $last= /^xyzdf/ ? $_ : ''" file1>file2

Assuming file1 looks like

abc
xyzdf def
hij xyzdf
klm
xyzdf nop
qrs xyzdf

file2 will end up as

xyzdf def
hij xyzdf
xyzdf nop
qrs xyzdf

Is that what you want? Update: if not, post some sample data and the desired output.
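The same pairing logic can be written as a small script for easier experimenting. This is a sketch under assumptions: the sub name `matching_pairs` is made up, and the sample is read as six lines of one or two words each (abc / xyzdf def / hij xyzdf / klm / xyzdf nop / qrs xyzdf).

```perl
use strict;
use warnings;

# Return each (previous, current) pair where the previous line starts with
# "xyzdf" and the current line ends with it. The sub name is illustrative.
sub matching_pairs {
    my @lines = @_;
    my $last = '';    # previous line if it started with xyzdf, else ''
    my @out;
    for my $line (@lines) {
        push @out, $last, $line if $line =~ /xyzdf$/ && $last;
        $last = $line =~ /^xyzdf/ ? $line : '';
    }
    return @out;
}

# Sample data as interpreted above (an assumption about the line breaks).
my @file1 = map { "$_\n" }
    'abc', 'xyzdf def', 'hij xyzdf', 'klm', 'xyzdf nop', 'qrs xyzdf';
print matching_pairs(@file1);
```

Holding the whole file in an array is fine for a small test, but for the large files discussed in this thread the streaming one-liner is the better fit, since it keeps only one previous line in memory.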
Re^5: looking for speed!! large file search and extract by smbs (Acolyte) on Jan 13, 2005 at 09:39 UTC
Re^6: looking for speed!! large file search and extract by holli (Abbot) on Jan 13, 2005 at 10:14 UTC