I am trying to get rid of "my %h = map { $_, 1 } @l;" and "push @l,@o" to use less memory and maybe speed the process up a bit. Any good ideas?

    for (glob("*.gz")) {
        my @o = `zcat $_ | sed 's/[<> ]//g'`;
        chomp @o;
        push @l, @o;
    }
    my %h = map { $_, 1 } @l;
====== Update
zcat file1.gz will return:

- line1 xxxxx
- line2 yyyyy
- line3 zzzzz

The array @l then contains, accumulated over each turn of the loop:

- file1_line1 xxxxx
- file1_line2 yyyyy
- file1_line3 zzzzz
- filen_line1 xxxxxxx
- filen_line2 yyyyyyy
- filen_line3 zzzzzzz

and the hash %h then contains:

xxxxx -> 1
yyyyy -> 1
zzzzz -> 1
xxxxxxx -> 1
yyyyyyy -> 1
zzzzzzz -> 1

The number of keys is in the millions, so using a hash is much better than running grep over an array to check later whether a key exists. Every little bit counts. I have not profiled the code, but I tried both approaches: the whole treatment takes about 15 min with grep and about 1 min with a hash.
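One possible way to drop both the intermediate array and the map is to fill %h directly while reading each file line by line. This is only a sketch under some assumptions: the helper name add_keys is made up for illustration, zcat is assumed to be on the PATH, and tr/<> //d is swapped in for the sed call (it deletes the same three characters). It has not been benchmarked against the original.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read lines from a
# filehandle, delete '<', '>' and spaces, and store each cleaned line
# as a key of the hash passed by reference.
sub add_keys {
    my ($seen, $fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ tr/<> //d;      # same characters as sed 's/[<> ]//g'
        $seen->{$line} = 1;
    }
    return;
}

my %h;
for my $file (glob '*.gz') {
    # List-form pipe open: no shell involved, lines are streamed one at
    # a time instead of being slurped into a temporary array.
    open(my $fh, '-|', 'zcat', $file)
        or die "Cannot run zcat on $file: $!";
    add_keys(\%h, $fh);
    close($fh) or warn "zcat reported a problem for $file (status $?)\n";
}
```

If the backticks are kept, "$h{$_} = 1 for @o;" inside the loop would still avoid building @l and the final map, at the cost of holding one file's lines in @o at a time.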
In reply to add/replace map result into existing hash by fredo2906