Maybe your Perl code is right. The Perl code and the grep are doing two different things. The Perl code is populating a hash, which means that if the same key (job number) is inserted twice, the second insert simply overwrites the first and the key is only counted once. The grep, however, is less picky: every matching line gets printed, duplicates included. To make the shell count ignore duplicates as well, try:

    grep insert_job | sort -u | wc -l

That might print something a bit closer to the Perl result, assuming the lines carry no other data that differs between duplicates (sort -u compares whole lines, not just the job number). Or, in Perl, try this:

    my $dupes = 0;
    while (<FH1>) {
        chomp;
        my ($var1,$var2) = split(/:/,$_);
        $var2 = substr($var2,1); # remove the leading space left over from the split

        # here is the important bit:
        if (exists $diffHash1{$var2})
        {
          ++$dupes;
          print "Dupe on line $.: $var2\n";
        }

        $diffHash1{$var2} = $var2;
    }
    print "$dupes dupes found in file1.\n";

This will tell you about any dupes (only the repeated lines, not the line where the value first appeared; we could add that too, but I'll leave it to you if you decide you want it). Then, if you add the number of keys in the hash to the number of dupes, you should get your original grep count.
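If you do want the original line reported as well, here is a rough sketch of one way to do it, not a drop-in replacement for your loop. It stores the line number of the first occurrence as the hash value instead of the key itself. The sub name count_dupes, the sample data and the job names are all made up for illustration, and unlike the loop above it passes a limit of 2 to split so that any further colons stay in the value.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of "label: value" lines and returns
# the number of distinct values, the number of duplicate lines, and a
# reference to a list of messages naming both the duplicate line and
# the line where the value was first seen.
sub count_dupes {
    my @lines = @_;
    my (%first_seen, @reports);
    my $dupes   = 0;
    my $line_no = 0;

    for my $line (@lines) {
        ++$line_no;
        chomp(my $copy = $line);

        # Assumption: at most two fields matter, so split with a limit of 2.
        my (undef, $value) = split /:/, $copy, 2;
        next unless defined $value;    # skip lines without a colon
        $value =~ s/^\s+//;            # drop the space after the colon

        if (exists $first_seen{$value}) {
            ++$dupes;
            push @reports,
                "Dupe on line $line_no (first seen on line $first_seen{$value}): $value";
        }
        else {
            # Remember where this value first appeared.
            $first_seen{$value} = $line_no;
        }
    }
    return (scalar(keys %first_seen), $dupes, \@reports);
}

# Made-up sample data, not taken from the real file.
my @sample = (
    "insert_job: job_a\n",
    "insert_job: job_b\n",
    "insert_job: job_a\n",
    "insert_job: job_c\n",
    "insert_job: job_b\n",
);

my ($unique, $dupes, $reports) = count_dupes(@sample);
print "$_\n" for @$reports;
print "$unique unique + $dupes dupes = ", $unique + $dupes, " lines\n";
```

With the sample above, the unique count plus the dupe count equals the number of input lines, which is the same relationship you should see between the hash size, the dupes, and the grep count.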

That's not to say that the rest of your code is clean and doesn't require any stylistic changes, but we'll focus on the problem first, and worry about style later ;-)

In reply to Re: Different counts between perl and grep by Tanktalus
in thread Different counts between perl and grep by herda05
