in reply to Re: Dealing with large files in Perl
in thread Dealing with large files in Perl

You're absolutely right. Your code matches exactly what I'm looking for; however, I'm not getting the result I expect. Here's the output after executing what you listed:
 
<snip>
 
 
00e06f16b25 41000 306000 00112f9486bf 412 1696
 
</snip>
 
What I'm looking for is to search for the value 00e06f16b25, match it against file2, then take the matching lines from both file1 and file2 and merge them into file3. So the result should be:
 
00e06f16b25 41000 306000 00e06f16b25 389 5000

Re^3: Dealing with large files in Perl
by graff (Chancellor) on May 16, 2005 at 21:43 UTC
    If you really ran the code exactly as I posted it, and your first command-line arg (assigned to $value) was really "00e06f16b25", then I just don't see how you could come up with the output that you cited inside your "snip" tags. Please double-check that you didn't alter the code, and that you ran it as intended.

    But now that you have provided more information about your data -- that the value you want to match is the first token on each data line, and this consists of a long hex number -- you can speed things up and make it more trustworthy by using "substr" and "eq" instead of a regex match:

    use strict;

    my $Usage = "Usage: $0 value file1 file2\n";
    die $Usage unless ( @ARGV == 3 and -f $ARGV[1] and -f $ARGV[2] );

    my $value  = shift;    # removes first element from @ARGV
    my $chklen = length( $value );
    my @match;             # will hold matching line from each file

    for my $file ( @ARGV ) {    # loop over remaining two ARG's
        open( IN, $file ) or die "$file: $!";
        while (<IN>) {
            if ( substr( $_, 0, $chklen ) eq $value ) {
                chomp;
                push @match, $_;
                last;
            }
        }
        close IN;    # (this was implicit in the earlier version)
    }
    print join( " ", @match ), "\n";

    Note that in either version, if the value you provide on the command line turns out to be shorter than the initial hex number on each line of the input files, there's a chance that you'll get a "false alarm" match.

    For example, in the initial regex version, if the search value on the command line was just "6b" or "00", this could explain why the record from the second file was not right -- "6b" and "00" are found in both records.
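To make the "false alarm" case concrete, here is a minimal sketch that compares the prefix check with a stricter check on the whole first token. The helper names `prefix_match` and `token_match` are hypothetical and are not part of the code posted above; the sample line is taken from the output quoted earlier in the thread, and the stricter check assumes the fields are separated by whitespace.

```perl
use strict;
use warnings;

# Prefix comparison, as in the substr/eq version above.
sub prefix_match {
    my ( $line, $value ) = @_;
    return substr( $line, 0, length $value ) eq $value;
}

# Hypothetical stricter check: the whole first whitespace-delimited
# token must equal the search value, so a short value cannot match
# the start of a longer hex number.
sub token_match {
    my ( $line, $value ) = @_;
    my ($first) = split ' ', $line;
    return defined $first && $first eq $value;
}

my $line = "00e06f16b25 41000 306000\n";

# Try a too-short value and the full value against the same line.
for my $value ( '00', '00e06f16b25' ) {
    printf "%-12s prefix=%s token=%s\n", $value,
        ( prefix_match( $line, $value ) ? 'yes' : 'no' ),
        ( token_match( $line, $value )  ? 'yes' : 'no' );
}
```

With the short value "00" the prefix check reports a match while the token check does not; with the full value both agree. Splitting each line costs a little more than a bare substr, so whether the extra safety is worth it depends on how the search values are supplied.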

      This code didn't return anything. However, the last sample output I provided came from a grep I ran against the file, which I copied together with your generated output. I agree with what you said about using substr as opposed to a regex. What I need to know is: if I need to implement this within the script as another sub, how can I do that? Please get back to me.
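One possible way to wrap the substr comparison in a sub is sketched below. The names `first_match` and `merge_matches` are hypothetical, and the lexical filehandle with three-argument open is a substitution for the bareword `IN` used in the code above, not what was originally posted. It assumes, as in that code, that only the first matching line of each file is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of $file that starts with
# $value (newline removed), or undef if no line matches.
sub first_match {
    my ( $value, $file ) = @_;
    my $chklen = length $value;
    open( my $in, '<', $file ) or die "$file: $!";
    my $found;
    while ( my $line = <$in> ) {
        if ( substr( $line, 0, $chklen ) eq $value ) {
            chomp $line;
            $found = $line;
            last;
        }
    }
    close $in;
    return $found;
}

# Hypothetical wrapper: collect the match from each file and join them
# into one output line; files with no match contribute nothing.
sub merge_matches {
    my ( $value, @files ) = @_;
    my @match = grep { defined } map { first_match( $value, $_ ) } @files;
    return join( ' ', @match );
}

# Run as a script only when called with: value file1 file2
if ( @ARGV == 3 ) {
    my ( $value, @files ) = @ARGV;
    print merge_matches( $value, @files ), "\n";
}
```

Returning undef for "no match" lets the caller decide what to do when one file lacks the value, instead of silently printing a shorter line. To write the merged line to a third file, the caller could print the return value to a filehandle opened for writing.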
      Good evening all. No need to respond to this, as I got my script to work. Once it's complete I'll post the code. Regards,
        Thanking people is nice. Telling us what solution you found is even better, because it means someone else can learn from your experience.