in reply to very slow processing

When you make the first pass, instead of pushing each ID into an array then filtering out duplicates, use a hash with ID as the key. For the value, concatenate your formatted output. For the second pass, loop on the keys of the hash, printing the strings in the hash:

    my %urecs;
    for my $line (@lines) {
        next unless $line =~ /your regex/;
        my ($date, $id, $keyword) = ($1, $2, $3);
        $urecs{$id} .= "$date,$id,$keyword\n";
    }
    print $urecs{$_} for keys %urecs;

Disclaimer: Not tested.

Re^2: very slow processing
by sandy105 (Scribe) on Aug 20, 2014 at 18:37 UTC

    the IDs are repeated, so I need to check for unique IDs and keywords

      Hash keys are always unique. In my example, if an ID has already been seen, the new string is appended to the previous content of the value for that ID.

      I could have written:

      for my $line (@lines) {
          next unless $line =~ /your regex/;
          if (exists $urecs{$2}) {
              $urecs{$2} .= "$1,$2,$3\n";
          }
          else {
              $urecs{$2} = "$1,$2,$3\n";
          }
      }

      but that is not necessary because Perl treats appending to an undefined value the same as appending to an empty string.
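
      For example, here is a minimal sketch of that behaviour (the %text hash, the key, and the strings are made up purely for illustration):

          use strict;
          use warnings;

          my %text;
          # the key 42 does not exist yet, so the first .= behaves as if
          # it were appending to an empty string
          $text{42} .= "first line\n";
          $text{42} .= "second line\n";
          print $text{42};    # prints both lines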

      Another thing you could do: $ids{$2}++ would give you a hash of the IDs seen (the keys) and how many times each was seen (the values); again, there is no need to check for existence first.
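
      A quick sketch of that counting idiom, using made-up IDs in place of the captured $2:

          use strict;
          use warnings;

          my %ids;
          # each ++ on a missing key starts the count at 1, so no
          # exists() check is needed before incrementing
          $ids{$_}++ for qw(A7 B2 A7 C9 A7 B2);
          print "$_ seen $ids{$_} time(s)\n" for sort keys %ids;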

      As for checking the keywords, I left that out so as to keep my example focused on the use of the hash.