
If you really ran the code exactly as I posted it, and your first command-line arg (assigned to $value) was really "00e06f16b25", then I just don't see how you could come up with the output that you cited inside your "snip" tags. Please double-check that you didn't alter the code, and that you ran it as intended.

But now that you have provided more information about your data -- that the value you want to match is the first token on each data line, and this consists of a long hex number -- you can speed things up and make it more trustworthy by using "substr" and "eq" instead of a regex match:

use strict;

my $Usage = "Usage: $0 value file1 file2\n";
die $Usage unless ( @ARGV == 3 and -f $ARGV[1] and -f $ARGV[2] );

my $value = shift;          # removes first element from @ARGV
my $chklen = length( $value );
my @match;                  # will hold matching line from each file

for my $file ( @ARGV ) {    # loop over remaining two ARGs
    open( IN, $file ) or die "$file: $!";
    while (<IN>) {
        if ( substr( $_, 0, $chklen ) eq $value ) {
            chomp;
            push @match, $_;
            last;
        }
    }
    close IN;               # (this was implicit in the earlier version)
}

print join( " ", @match ), "\n";

Note that in either version, if the value you provide on the command line turns out to be shorter than the initial hex number on each line of the input files, there's a chance that you'll get a "false alarm" match.

For example, in the initial regex version, if the search value on the command line was just "6b" or "00", this could explain why the record from the second file was not right -- "6b" and "00" are found in both records.
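One way to rule out those false alarms is to compare the whole first token with "eq" instead of a fixed-length prefix. This is only a sketch, assuming the hex key is separated from the rest of the line by whitespace; the sub name first_token_is and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the first whitespace-delimited
# token of $line is exactly $value, so a short value such as "6b" can
# no longer match a longer hex key by prefix.
sub first_token_is {
    my ($line, $value) = @_;
    my ($token) = split ' ', $line;    # ' ' splits on runs of whitespace, skipping leading blanks
    return defined $token && $token eq $value;
}

# Made-up sample lines: a substr prefix test would match both,
# while the whole-token test matches only the second.
for my $line ("00e06f16b25 foo bar\n", "00e06f16b2 baz\n") {
    my $verdict = first_token_is($line, "00e06f16b2") ? "match" : "no match";
    print "$verdict: $line";
}
# prints:
#   no match: 00e06f16b25 foo bar
#   match: 00e06f16b2 baz
```

The cost is one split per line instead of one substr, which is usually negligible next to the I/O of reading a large file.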

Re^4: Dealing with large files in Perl
by tester786 (Initiate) on May 17, 2005 at 20:02 UTC
    This code didn't return anything. However, the last sample output I provided was from a grep I ran against the file, and I just copied it in alongside your generated output. I agree with what you said about using substr as opposed to a regex. What I need to know is how to implement this within the script as a separate sub. Please get back to me.
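A minimal sketch of the substr/eq scan moved into its own subroutine, which takes the search value and a file name and returns the first matching line. The name find_first_match and the sample records are hypothetical, and this is an illustration rather than a tested drop-in for the real data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan $file for the first line whose leading
# characters equal $value (the same substr/eq test as the script above)
# and return that line without its newline, or undef if nothing matches.
sub find_first_match {
    my ($value, $file) = @_;
    my $chklen = length $value;
    open(my $in, '<', $file) or die "$file: $!\n";
    while (my $line = <$in>) {
        if (substr($line, 0, $chklen) eq $value) {
            chomp $line;
            close $in;
            return $line;
        }
    }
    close $in;
    return;    # undef in scalar context: no match in this file
}

# Example usage with a throwaway file; the records are made up.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "00e06f16b25 first record\n", "00e06f16b26 second record\n";
close $fh;

my $hit = find_first_match("00e06f16b26", $path);
print defined $hit ? "$hit\n" : "no match\n";    # prints "00e06f16b26 second record"
```

In the earlier script, the body of the for loop would then reduce to something like: my $m = find_first_match( $value, $file ); push @match, $m if defined $m;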
finding highest and lowest number
by tester786 (Initiate) on May 23, 2005 at 23:24 UTC
    Good evening all, no need to respond to this, as I got my script to work again. Once it's completed I'll post the code. Regards,
      Thanking people is nice. Telling us what solution you found is even better, because it means someone else can learn from your experience.
        Certainly... I should have done that.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Fcntl ':flock';    # imports LOCK_SH and LOCK_EX

        processf();

        sub processf {
            my $newfile = "final.out";

            open(my $filer, '<', "ts-speed.txt")
                or die "cannot open input file ts-speed.txt: $!\n";
            flock($filer, LOCK_SH);    # shared lock on the file being read

            while (my $line = <$filer>) {    # first loop starts here
                chomp($line);
                my $mymac = substr($line, 0, 12);

                # reopen the second file so each outer line scans it from the top
                open(my $filer2, '<', "docsis-file-orig")
                    or die "cannot open input file docsis-file-orig: $!\n";

                while (my $item = <$filer2>) {    # second loop starts here
                    chomp($item);
                    my $mymac1 = substr($item, 0, 12);
                    if ($mymac eq $mymac1) {    # exact 12-character key match
                        open(my $filew, '>>', $newfile)
                            or die "cannot open output file $newfile: $!\n";
                        flock($filew, LOCK_EX);    # exclusive lock while appending
                        printf("'%s' and '%s' are identical\n", $mymac, $mymac1);
                        print $filew "$item $line\n";
                        close($filew);    # flushes and releases the lock
                    }    # if ends here
                }    # second while ends here
                close($filer2);
            }    # first while ends here

            close($filer);
        }    # sub ends here
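Because docsis-file-orig is reopened and rescanned for every line of ts-speed.txt, the run time grows with the product of the two file sizes. A common alternative, sketched below and not what the script above does, is to read the second file once into a hash keyed on the 12-character prefix and then make a single pass over the first file. The sub name join_on_mac is hypothetical, and the sketch returns the joined "item line" strings instead of appending to final.out, with no file locking.

```perl
use strict;
use warnings;

# Hypothetical helper: load $docsis_file into a hash keyed by the first
# 12 characters, then scan $speed_file once. Output order matches the
# nested-loop version: speed-file order, then docsis-file order.
sub join_on_mac {
    my ($speed_file, $docsis_file) = @_;

    my %docsis;    # 12-char prefix => list of full docsis lines
    open(my $d, '<', $docsis_file) or die "$docsis_file: $!\n";
    while (my $item = <$d>) {
        chomp $item;
        push @{ $docsis{ substr($item, 0, 12) } }, $item;
    }
    close $d;

    my @out;
    open(my $s, '<', $speed_file) or die "$speed_file: $!\n";
    while (my $line = <$s>) {
        chomp $line;
        my $mac = substr($line, 0, 12);
        next unless exists $docsis{$mac};
        push @out, "$_ $line" for @{ $docsis{$mac} };
    }
    close $s;
    return @out;
}

# Usage: perl join.pl ts-speed.txt docsis-file-orig >> final.out
if (@ARGV == 2) {
    print "$_\n" for join_on_mac(@ARGV);
}
```

The trade-off is memory: the whole of docsis-file-orig is held in the hash, which is usually fine for key files of moderate size and avoids re-reading it thousands of times.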