Monks, I'm stumped on this one. I have two files. file1 is the original, with 37621 names in it, one per line; I produced it using grep > file1. file2 has 37585 names in it, which are exactly the same as those in file1, except that 36 are missing. I thought, well, I can use perl to get the differences. So off I went and wrote a small script:
while (<FH1>) {
    chomp;
    my $line = $_;
    #print __LINE__ . " $_\n";
    my ($var1, $var2) = split(/:/, $line);
    $var2 = substr($var2, 1);    # remove the leading space left by split
    $diffHash1{$var2} = $var2;
}
close(FH1);

getCount(\%diffHash1);    # $scalar = keys(%hash)
my $cnt1 = getCount(\%diffHash1, '1');

print "Reading in $file2\n";
open FH2, "< $file2" or die "Could not open $file2: $!";
while (<FH2>) {
    chomp;
    my $line = $_;
    my ($var1, $var2) = split(/:/, $line);
    $var2 = substr($var2, 1);
    $diffHash2{$var2} = $var2;
}
close(FH2);

print "Comparing $file1 and $file2\n";
my $cnt2 = getCount(\%diffHash2, '1');
print "Count in $file1: $cnt1\n";
print "Count in $file2: $cnt2\n";

if ($cnt1 > $cnt2) {
    # file1 has more keys: collect the names missing from file2
    while (my ($k1, $v1) = each(%diffHash1)) {
        my $line = $k1;
        if (!exists $diffHash2{$line}) {
            $resHash{$line} = $line;
        }
    }
}
else {
    # otherwise collect the names missing from file1
    while (my ($k1, $v1) = each(%diffHash2)) {
        my $line = $k1;
        if (!exists $diffHash1{$line}) {
            $resHash{$line} = $line;
        }
    }
}
getCount just does a keys %hash into a scalar to get a count of the keys. Problem is, perl thinks there are only 37585 lines in file1. Huh?
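For reference, getCount itself is not shown in the post. The following is only a minimal sketch of what it is described as doing, assuming it takes a hash reference plus an optional second argument (ignored here); the sample lines and the %seen hash are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of getCount: the post only says it puts
# keys(%hash) into a scalar. The second argument is accepted but unused.
sub getCount {
    my ($hash_ref, $flag) = @_;
    my $count = keys %$hash_ref;    # keys in scalar context = number of keys
    return $count;
}

# Made-up sample data: three input lines, one name repeated.
my @lines = ('insert_job: alpha', 'insert_job: beta', 'insert_job: alpha');

my %seen;
for my $line (@lines) {
    my (undef, $name) = split /:/, $line;
    $name = substr($name, 1);       # drop the space after the colon
    $seen{$name} = $name;
}

printf "lines: %d, keys: %d\n", scalar(@lines), getCount(\%seen, '1');
# prints "lines: 3, keys: 2"
```

Note that a count taken this way is the number of distinct keys in the hash, which need not equal the number of lines read.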
Since every line is in the format insert_job: <jn>, I can check perl's count with grep:
grep insert_job file1 | wc -l
which returns 37621. Alright, methinks, maybe there are blank lines, or some other entries that I'm not coding for. So I sort the two files, run diff on them with the output redirected to a file, and get 36 names that aren't in file2. Question is, where am I letting perl down in my code? How come perl isn't seeing all the jobs in file1?

In reply to Different counts between perl and grep by herda05
