add/replace map result into existing hash

fredo2906 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a list of files that each contain multiple lines. I read those lines into an array, then I would like to push them into a hash with map for quick lookups later.

for(glob("*.gz")){
        my @o = `zcat $_ | sed 's/[<> ]//g'`;chomp @o;push @l,@o;
}
my %h = map { $_, 1 } @l;

I am trying to remove "my %h = map { $_, 1 } @l;" and "push @l,@o" to use less memory and maybe speed up the process a bit. Any good ideas?

====== Update

zcat file1.gz will return :

- line1 xxxxx
- line2 yyyyy
- line3 zzzzz

After each pass of the loop, the array @l contains:

- file1_line1 xxxxx
- file1_line2 yyyyy
- file1_line3 zzzzz
- filen_line1 xxxxxxx
- filen_line2 yyyyyyy
- filen_line3 zzzzzzz

Then the hash %h contains:

xxxxx -> 1
yyyyy -> 1
zzzzz -> 1
xxxxxxx -> 1
yyyyyyy -> 1
zzzzzzz -> 1

The number of keys is in the millions, so using a hash is much better than using grep on an array to check later whether a key exists. Every little bit counts: even though I have not profiled the code, I tried both approaches, and the whole run takes about 15 min with grep and about 1 min with a hash.
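To illustrate why the hash wins for repeated lookups, here is a minimal sketch. The sample keys and the helper names `in_array` and `in_hash` are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample keys standing in for the millions of real ones.
my @l = qw(xxxxx yyyyy zzzzz);

# Build the lookup hash once, one entry per distinct key.
my %h = map { $_ => 1 } @l;

# Linear search: grep walks the whole array on every query.
sub in_array {
    my ($key) = @_;
    return scalar(grep { $_ eq $key } @l) ? 1 : 0;
}

# Hash lookup: a single probe per query, independent of the number of keys.
sub in_hash {
    my ($key) = @_;
    return exists $h{$key} ? 1 : 0;
}

for my $key (qw(yyyyy missing)) {
    printf "%-8s grep=%d hash=%d\n", $key, in_array($key), in_hash($key);
}
```

Both helpers give the same answers; the difference is that each `grep` call scans every element, which is roughly where the gap between 15 min and 1 min would come from.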


Replies are listed 'Best First'.
Re: add/replace map result into existing hash by DrHyde (Prior) on Feb 19, 2014 at 11:41 UTC
Your `for` loop is effectively a `map`:

        @l = map { ... } glob("*.gz")

Does that help?

BTW, the `shell stuff` will give you a scalar with embedded newlines, not a list of lines of text, which is what I presume you want. You'll be better off using `open(my $fh, '-|', "zcat $_")` instead, reading a line at a time, and translating the little sed snippet into Perl.

Finally, why do you think that rewriting the code will save memory or make it faster? And do you know that it actually needs to be made faster? Have you profiled your code?
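The line-at-a-time approach suggested above might look roughly like the sketch below. The helper name `add_lines` and the in-memory demo data are assumptions for illustration, and `tr/<> //d` is used as a stand-in for the sed substitution. One caveat: in list context, such as the OP's `my @o = ...` assignment, backticks already return one element per line, so the main gain from streaming is that only one line is held in memory at a time instead of a whole file's output.

```perl
use strict;
use warnings;

# Read lines from a filehandle, delete '<', '>' and spaces
# (the Perl counterpart of sed 's/[<> ]//g'), and store each
# cleaned line as a key of the hash referenced by $seen.
sub add_lines {
    my ($fh, $seen) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ tr/<> //d;
        $seen->{$line} = 1;
    }
    return;
}

my %h;
for my $file (glob '*.gz') {
    # List-form pipe open: zcat is run directly, without a shell.
    open(my $fh, '-|', 'zcat', $file)
        or die "Cannot run zcat on $file: $!";
    add_lines($fh, \%h);
    close($fh) or warn "zcat reported a problem for $file\n";
}

# Demonstration on an in-memory filehandle (hypothetical sample data).
open(my $demo, '<', \"qw<er ty>\na>sd <fgh\n") or die "open failed: $!";
my %demo_seen;
add_lines($demo, \%demo_seen);
close $demo;
print join(',', sort keys %demo_seen), "\n";    # prints "asdfgh,qwerty"
```

With this shape neither `@o` nor `@l` is needed: the hash is filled directly as each line arrives.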
Re: add/replace map result into existing hash by hdb (Monsignor) on Feb 19, 2014 at 11:36 UTC
It is not quite clear what you want your final structure to look like. Here is something untested:

        my %h = map { $_ => [ `zcat $_ | sed 's/[<> ]//g'` ] } glob("*.gz");
        chomp @$_ for values %h;
Re^2: add/replace map result into existing hash by fredo2906 (Acolyte) on Feb 19, 2014 at 14:06 UTC
Thanks, something like that is actually what I was looking for. I updated the post so you can see the kind of structure I am using.
Re: add/replace map result into existing hash by kcott (Archbishop) on Feb 19, 2014 at 15:27 UTC
G'day fredo2906,

I created some test input, before seeing your updated OP, as follows:

        $ cat > pm_1075438_1.txt
        qw<er ty>
        a>sd <fgh
        $ gzip pm_1075438_1.txt
        $ cat > pm_1075438_2.txt
        <zxc vbn>
        123> <456
        $ gzip pm_1075438_2.txt

This script removes the need for the intermediary `@o` and `@l`:

        #!/usr/bin/env perl

        use strict;
        use warnings;

        my %h;
        ++$h{$_} for map { chomp; $_ } `zcat @{[glob '*.gz']} | sed 's/[<> ]//g'`;

        use Data::Dump;
        dd \%h;

Output:

        { 123456 => 1, asdfgh => 1, qwerty => 1, zxcvbn => 1 }

FWIW, this line:

        ++$h{$_} for map { chomp; s/[<> ]//g; $_ } `zcat @{[glob '*.gz']}`;

produces identical output. I'll leave you to benchmark (if you want).

-- Ken
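One property of the `++$h{$_}` idiom worth noting: unlike `map { $_, 1 }`, it counts how often each key occurs. A small sketch, using hypothetical sample lines in place of the zcat output so that it runs without any .gz files:

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the zcat output;
# 'qw<er ty>' appears twice on purpose.
my @lines = ("qw<er ty>\n", "a>sd <fgh\n", "qw<er ty>\n");

my %h;
# chomp and s///g act on $_, which aliases each element,
# so @lines itself is cleaned in place as a side effect.
++$h{$_} for map { chomp; s/[<> ]//g; $_ } @lines;

print "$_ => $h{$_}\n" for sort keys %h;
# prints:
# asdfgh => 1
# qwerty => 2
```

For a pure existence check the counts do not matter, since `exists $h{$key}` is true either way.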
Re^2: add/replace map result into existing hash by fredo2906 (Acolyte) on Feb 20, 2014 at 00:10 UTC
Thank you. Exactly what I needed.