batcater98 has asked for the wisdom of the Perl Monks concerning the following question:

I have a very large file that I am parsing, as in my last post - thank you for the help!! Now I am trying to use a list of items in an array: parse through the large file, pull out a field, and check whether it is in the array. If so, increment a counter to get a record count. I don't think my lookup in the array is working.

$logfile="d:\\ScanReps\\TR&D\\WDL_Scan_0000.csv";
$locofile = "d:\\ScanReps\\TR&D\\locos.txt";
open(DAT, $logfile);
open(loco, $locofile);
%loco_data = <loco>;
close(loco);
$linecnt1 = 0;
$linecnt2 = 0;
while ($record = <DAT>) {
    chop($locodn);
    ($RR_Name,$Loco_no,$Event_Date,$Event_Time,$Event_Code,$Event_Duration,$Source_File)=split(/,/,$record);
    chop($loco_data);
    chop($Loco_no);
    if (exists($loco_data{$Loco_no})) {
        $linecnt1 = $linecnt1 +1;
    }
    $linecnt2 = $linecnt2 + 1;
    print "Rec: ",$linecnt2, " In: ",$linecnt1, "\r";
}
close(DAT);

The locos.txt is just a list of numbers, one number per line. I want to compare the value pulled into $Loco_no to the values in the array and, if it is there, increment the counter. Thanks, Ad.

Replies are listed 'Best First'.
Re: If Exists in an Array?
by keszler (Priest) on Nov 12, 2009 at 23:03 UTC
    Have you checked what's in %loco_data?
    use strict;
    use Data::Dump qw/dump/;

    my %loco_data = <DATA>;
    print dump \%loco_data;

    __DATA__
    123
    234
    345
    456

    #{
    #  "123\n" => 234
    #,
    #  "345\n" => 456
    #,
    #}

    Also, you should at least use strict; adding use warnings; and use diagnostics; will give you even more info.
    For opening files it's better to use the three-argument form of open.
    Is the code you posted a cut-n-paste? I don't see where $locodn is coming from, or why you'd need to chop it.
    chomp is generally preferable to chop.
    To increment variables: $linecnt1++ or $linecnt1 += $x.
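The tips above can be sketched together as follows. This is only a minimal illustration, not the OP's script: the file name locos_demo.txt and the variable names are made up for the example, and the file is written first just so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical file name, created here only so the sketch is self-contained.
my $file = 'locos_demo.txt';

open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} "8849\n893\n";
close($out);

# Three-argument open with a lexical filehandle and an error check.
open(my $in, '<', $file) or die "Cannot open $file: $!";

my %loco_data;
my $count = 0;
while (my $line = <$in>) {
    chomp($line);           # removes only a trailing newline; chop removes whatever the last character is
    $loco_data{$line} = 1;  # one hash key per line
    $count++;               # idiomatic increment
}
close($in);
unlink $file;

print "Loaded $count keys: ", join(', ', sort keys %loco_data), "\n";
```

Reading the file line by line and storing each line as a key avoids the key/value pairing that %loco_data = <loco> produces.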

Re: If Exists in an Array?
by bichonfrise74 (Vicar) on Nov 12, 2009 at 23:07 UTC
    Note: untested. I think this is something that you might want to take a look at. Again, untested.
    #!/usr/bin/perl
    use strict;

    my $logfile = '/tmp/WDL_Scan_0000.csv';
    my $locofile = '/tmp/locos.txt';
    my ($locodn, $loco_data);
    my ($linecnt1, $linecnt2) = 0;

    open( my $file_1, '<', $logfile)
        or die "Error: Cannot open $logfile.";
    my @loco_data = <$file_1>;
    close( $file_1 );

    open( my $file_2, '<', $locofile)
        or die "Error: Cannot open $locofile.";
    while ( <$file_2> ) {
        chop($locodn);
        my ($RR_Name, $Loco_no, $Event_Date, $Event_Time,
            $Event_Code, $Event_Duration,$Source_File ) = split( /,/ );
        chop($loco_data);
        chop($Loco_no);
        $linecnt1++ if grep { /\b$Loco_no\b/ } @loco_data;
        $linecnt2++;
        print "Rec: ",$linecnt2, " In: ",$linecnt1, "\r";
    }
    close( $file_2 );

      my ($linecnt1, $linecnt2) = 0;

      Is that what you meant to write?
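A short sketch of what that line does, in case the point of the question is not obvious. The names $first, $second, $total and $matched are made up for the illustration; only $linecnt1 and $linecnt2 come from the post above.

```perl
use strict;
use warnings;

# List assignment hands out values position by position,
# so a single 0 on the right initializes only the first variable.
my ($first, $second) = 0;
print defined $second ? "second is defined\n" : "second is undef\n";

# Either of these initializes both counters to 0.
my ($linecnt1, $linecnt2) = (0, 0);
my $total = my $matched = 0;   # chained scalar assignment
print "$linecnt1 $linecnt2 $total $matched\n";
```

In the posted script the second counter is only ever touched with ++, which treats undef as 0 without a warning, so the count itself should still come out right. Printing or comparing it before the first increment would be a different story.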

      $linecnt1++ if grep { /\b$Loco_no\b/ } @loco_data;

      Doing a hash lookup--like the op was trying to do--is more efficient than searching the entire array every time through the loop. You know, O(n squared) v. O(n) type stuff.

      use strict;
      use warnings;
      use 5.010;

      open (my $LOGFILE, "<", "logfile.txt");
      open (my $MATCHFILE, "<", "matches.txt");

      chomp(my @keys = <$MATCHFILE>);
      close $MATCHFILE;

      #Initialize hash:
      my %target_matches;
      @target_matches{@keys} = ();
      #now the keys exist, and the values are undef

      my($num_records, $match_count) = (0, 0);

      while (<$LOGFILE>) {
          chomp;
          my @fields = split /,/;

          if (exists $target_matches{$fields[1]}) {
              $match_count++;
          }

          $num_records++;
          say "Total records: $num_records, matches: $match_count";
      }

      close $LOGFILE;

      matches.txt:

      1
      2
      3
      4

      logfile.txt:

      a,5,b,c,d,e
      a,2,b,c,d,e
      a,8,b,c,d,e
      a,3,b,c,d,e

      output:

      Total records: 1, matches: 0
      Total records: 2, matches: 1
      Total records: 3, matches: 1
      Total records: 4, matches: 2

        I prefer:
        my %target_matches = map { chomp; $_,undef } <$MATCHFILE>;

        No need for @keys that way.

        Note: untested.

        Ah.
        Thanks so much - this is going to work great. I have one other question. Because of the way my source file is built, the $field1 value has leading spaces. How can I remove them on the fly?
        Source Example:

        sx, 8849, sample1, source2
        st, 893, sample2, source3

        Matching Table:

        8849
        893

        With the leading white space the match does not hit. Thanks Again, Ad.
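
One possible way to handle this, sketched under a few assumptions: the records and keys below are the samples from the post, held in an array instead of being read from a file, and the hash is built the same way as %target_matches in the earlier reply. The idea is to let split consume the whitespace around each comma.

```perl
use strict;
use warnings;

# Keys taken from the matching table in the post.
my %target_matches;
@target_matches{qw(8849 893)} = ();

# Sample records from the post, with a space after each comma.
my @records = (
    "sx, 8849, sample1, source2\n",
    "st, 893, sample2, source3\n",
);

my $match_count = 0;
for my $record (@records) {
    chomp $record;
    # Splitting on a comma plus any surrounding whitespace
    # returns the fields without leading spaces.
    my @fields = split /\s*,\s*/, $record;
    $match_count++ if exists $target_matches{ $fields[1] };
}

print "matches: $match_count\n";
```

If only the one field needs cleaning, a substitution such as $fields[1] =~ s/^\s+//; after a plain split /,/ should do the same job for that field.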