in reply to Long list is long
G'day Chuma,
There are a number of problems with the code you posted.
From your description, I'd say the bottleneck lies with the population of the three arrays: @p, @q and @i. That is all unnecessary work; those arrays are not even needed. See "perlperf - Perl Performance and Optimization Techniques" for benchmarking and profiling techniques, to get a clearer picture of where the problems lie.
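As a starting point, a hedged sketch of a profiling session using the Devel::NYTProf profiler from CPAN (you'd need to install it first; the script name merge_count.pl is just for illustration):

```shell
# Run the script under the profiler; this writes raw data to ./nytprof.out
perl -d:NYTProf merge_count.pl

# Convert the raw data into an HTML report (in ./nytprof/) with
# per-line and per-subroutine timings
nytprofhtml
```

The per-line timings should show immediately whether the array population is where the time goes.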
I created these three dummy test files. In case you're unfamiliar with cat -vet, ^I represents a tab and $ represents a newline.
$ for i in A B C; do echo -e "\n*** $i"; cat $i; echo '----'; cat -vet $i; done

*** A
foo 73
bar 35
word 27
blah 23
----
foo^I73$
bar^I35$
word^I27$
blah^I23$

*** B
bar 35
yada 3
word 27
blah 23
----
bar^I35$
yada^I3$
word^I27$
blah^I23$

*** C
foo 73
word 27
blah 23
life 42
----
foo^I73$
word^I27$
blah^I23$
life^I42$
Then this test code:
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my @in_files = qw{A B C};
my $outfile  = 'merge_count.out';

my %data;
my $out_fmt = "%s\t%d\n";

for my $infile (@in_files) {
    open my $fh, '<', $infile;
    while (<$fh>) {
        my ($word, $count) = split;
        $data{$word} += $count;
    }
}

open my $fh, '>', $outfile;

for my $key (sort { $data{$a} <=> $data{$b} } keys %data) {
    printf $fh $out_fmt, $key, $data{$key};
}
Output (raw and showing special characters):
$ cat merge_count.out
yada 3
life 42
blah 69
bar 70
word 81
foo 146

$ cat -vet merge_count.out
yada^I3$
life^I42$
blah^I69$
bar^I70$
word^I81$
foo^I146$
Try that with your real files. I expect it will be faster and avoid the bottlenecks. Let us know if you still have problems: show your new code and profiling output (in <readmore> or <spoiler> tags).
— Ken
Replies are listed 'Best First'.
Re^2: Long list is long
by eyepopslikeamosquito (Archbishop) on Oct 30, 2022 at 10:11 UTC
by kcott (Archbishop) on Oct 30, 2022 at 13:34 UTC