Re^3: Dealing with large files in Perl

If you really ran the code exactly as I posted it, and your first command-line arg (assigned to $value) was really "00e06f16b25", then I just don't see how you could come up with the output that you cited inside your "snip" tags. Please double-check that you didn't alter the code, and that you ran it as intended.

But now that you have provided more information about your data -- that the value you want to match is the first token on each data line, and this consists of a long hex number -- you can speed things up and make it more trustworthy by using "substr" and "eq" instead of a regex match:

use strict;
use warnings;

my $Usage = "Usage: $0 value file1 file2\n";
die $Usage unless ( @ARGV == 3 and -f $ARGV[1] and -f $ARGV[2] );

my $value  = shift;  # removes first element from @ARGV
my $chklen = length( $value );

my @match;  # will hold the first matching line from each file

for my $file ( @ARGV ) {  # loop over the remaining two args
    open( my $in, '<', $file ) or die "$file: $!";
    while (<$in>) {
        # compare the leading characters of the line directly with $value
        if ( substr( $_, 0, $chklen ) eq $value ) {
            chomp;
            push @match, $_;
            last;  # stop at the first match in this file
        }
    }
    close $in; # (this was implicit in the earlier version)
}

print join( " ", @match ), "\n";

Note that in either version, if the value you provide on the command line turns out to be shorter than the initial hex number on each line of the input files, there's a chance that you'll get a "false alarm" match.

For example, in the initial regex version, if the search value on the command line was just "6b" or "00", this could explain why the record from the second file was not right -- "6b" and "00" are found in both records.
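
If you want to rule out that kind of false alarm, one option is to compare the whole first token rather than just a prefix of the line. What follows is only a minimal sketch, not part of the script above: the helper name first_token_is and the sample line are made up for illustration, and it assumes the first token on each line is delimited by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script above): true only when the first
# whitespace-delimited token of $line is exactly equal to $value.
sub first_token_is {
    my ( $line, $value ) = @_;
    my ($token) = split ' ', $line;    # split ' ' skips leading whitespace
    return defined($token) && $token eq $value;
}

# Made-up sample line, for illustration only.
my $line = "00e06f16b25 some other fields\n";

# Compare the prefix test (substr/eq) with the whole-token test.
for my $value ( '00e06f16b25', '00e0' ) {
    my $prefix = substr( $line, 0, length $value ) eq $value ? 'yes' : 'no';
    my $exact  = first_token_is( $line, $value )             ? 'yes' : 'no';
    print "$value: prefix match = $prefix, exact token match = $exact\n";
}
```

With the shortened value "00e0", the prefix test still succeeds while the whole-token test does not, so a truncated command-line value no longer yields a match. The cost is a split on every line, which is somewhat slower than a bare substr comparison.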


Replies are listed 'Best First'.
Re^4: Dealing with large files in Perl by tester786 (Initiate) on May 17, 2005 at 20:02 UTC
This code didn't return anything. However, the last sample output I provided came from a grep I ran against the file; I just copied it in alongside the output your script generated. I agree with what you said about using substr as opposed to a regex. What I need to know is this: if I need to implement this within the script as another sub, how can I do that? Please get back to me.
finding highest and lowest number by tester786 (Initiate) on May 23, 2005 at 23:24 UTC
Good evening all, No need to respond to this, as I got my script to work. Once it is completed I'll post the code. Regards,
Re^5: Dealing with large files in Perl by jZed (Prior) on May 23, 2005 at 23:33 UTC
Thanking people is nice. Telling us what solution you found is even better, because it means someone else can learn from your experience.
Re^6: Dealing with large files in Perl by tester786 (Initiate) on May 26, 2005 at 06:12 UTC
Certainly... I should have done that.

#!/usr/bin/perl
use Fcntl ':flock';

&processf();

sub processf() {
    unless (open(FILER, "ts-speed.txt")) {
        die ("cannot open input file out2\n");
    }
    $newfile = "final.out";
    flock FILER, LOCK_SH;   # shared lock on the input file (handle was mistyped as FILE)
    while ($line = <FILER>) {   # first loop starts here..
        chomp($line);
        my $mymac = substr($line, 0, 12);
        unless (open(FILER2, "docsis-file-orig")) {
            die ("cannot open input file out1\n");
        }
        while ($item = <FILER2>) {   # second loop starts here....
            chomp($item);
            $mymac1 = substr($item, 0, 12);
            if ($mymac eq $mymac1) {   # exact comparison of the two 12-character prefixes
                unless (open(FILEW, ">>$newfile")) {
                    die ("cannot open input file newfile\n");
                }
                flock(FILEW, 2);   # file lock set
                printf("'%s' and '%s' are identical\n", $mymac, $mymac1);
                print FILEW ("$item $line\n");
                flock(FILEW, 8);   # file lock unset
                close(FILEW);
            }   # if end here;
        }   # while second end here;
        close(FILER2);
    }   # while first end here;
    close(FILER);
}   # sub end here;