Re^5: please help me to resolve the Line comments and appending issue

An array can be searched with grep, but if you just want to check whether an item exists in the list, using a hash is faster:

my %keywords = map { $_ => 1 } qw/L CLI BNE LTR .../;
...
#later in the code
if (exists $keywords{$whatever_is_suspected_to_be_a_keyword}) {
 # it is a keyword
 ...
} ...
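Here is a self-contained version of that snippet, as a sketch. The four keywords are hypothetical stand-ins for the elided list, and `is_keyword_hash`/`is_keyword_grep` are illustrative helper names, not part of the original code. It builds the hash once and then compares a hash lookup with a grep over the same list:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical keyword list; the real list in the post is elided ("...").
my @keyword_list = qw/L CLI BNE LTR/;

# Build the lookup table once: each keyword becomes a hash key with value 1.
my %keywords = map { $_ => 1 } @keyword_list;

# Membership test via the hash: a single key lookup, no scan of the list.
sub is_keyword_hash {
    my ($word) = @_;
    return exists $keywords{$word} ? 1 : 0;
}

# Same test as a linear scan of the array with grep (scalar context = match count).
sub is_keyword_grep {
    my ($word) = @_;
    return ( grep { $_ eq $word } @keyword_list ) ? 1 : 0;
}

for my $token (qw/CLI MVC BNE/) {
    printf "%-4s hash:%d grep:%d\n", $token,
        is_keyword_hash($token), is_keyword_grep($token);
}
```

Both helpers give the same answers; the difference is only in cost, since grep walks the whole array on every call while the hash pays its cost once, up front.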

Sorry if my advice was wrong.

Re^6: please help me to resolve the Line comments and appending issue
by aaron_baugher (Curate) on Aug 10, 2012 at 15:46 UTC

Almost always true in practice. But there is more overhead in a hash than in an array, and that's magnified if you already have an array and have to duplicate it as a hash. If you're going to check more than one value against the list, as in the FAQ "How do I search file2 for all the values in file1?" then a hash is almost certainly the best solution. But if you only have to check one or two values, you may be better off sticking with grep:

 % cat 986735.pl
#!/usr/bin/env perl
use Modern::Perl;
use Benchmark qw(:all);

my @words = split /\s+/, `cat bigfile`; #8.5MB file
say scalar @words, " words in bigfile"; #1.3M words
my $match = 'professional'; # appears 16 times scattered through bigfile

cmpthese( 10, {
        'hash it' => \&hashit,
        'grep it' => \&grepit,
        'first it' => \&firstit,
});

sub hashit {
    my %h;
    @h{@words} = ();
    my $exists = exists $h{$match};
}

sub grepit {
    my $exists = grep { $_ eq $match } @words;
}

sub firstit {
    use List::Util 'first';
    my $exists = first { $_ eq $match } @words;
}
 % perl 986735.pl
1293687 words in bigfile
           Rate  hash it  grep it first it
hash it  3.33/s       --     -43%     -89%
grep it  5.80/s      74%       --     -81%
first it 29.8/s     795%     413%       --

So if I only need to search the list once, grep wins over a hash. For multiple searches, a hash comes out ahead. List::Util's first() routine splits the difference a little; with my dataset it beats the hash for up to about 8 searches, but I assume that would vary greatly depending on how early in the array the match is made. It'd take more testing with matches found earlier/later/never in the array to form a good comparison there.
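The "about 8 searches" figure can be checked with some rough arithmetic on the rates from the benchmark output. This sketch assumes the hash lookup itself is negligible next to building the hash, and that every linear search costs the same as in this benchmark (which, as noted, really depends on where the match sits in the array); `break_even` is just an illustrative helper:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Rates (iterations per second) copied from the cmpthese output in the post.
# One "hash it" iteration = build the hash + one lookup;
# one "grep it" or "first it" iteration = one linear search.
my %rate = ( hash => 3.33, grep => 5.80, first => 29.8 );

# Seconds per iteration for each method.
my %cost = map { $_ => 1 / $rate{$_} } keys %rate;

# Break-even: the number of searches N at which N * search_cost equals
# the one-time cost of building the hash (lookups treated as free).
sub break_even {
    my ($method) = @_;
    return $cost{hash} / $cost{$method};
}

printf "%-5s break-even after %.1f searches\n", $_, break_even($_) for qw/grep first/;
```

With these rates the break-even is roughly 1.7 searches for grep and 8.9 for first: grep is cheaper than building the hash for a single search but not for two, and first stays cheaper for up to eight searches, which agrees with the numbers above.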

My simple conclusion would be: always build a hash unless you know your program will only search it once; then use grep or List::Util::first. Also, if I'm pulling data in from somewhere (like a file) and I know I'm going to be searching it this way, I put it straight into a hash from the start.
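A minimal sketch of loading data straight into a hash, assuming the input holds whitespace-separated words. `load_words` is a hypothetical helper name, and the temporary file with its four words exists only to make the example runnable:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read whitespace-separated words from $path straight into a hash,
# so no intermediate array is ever built.
sub load_words {
    my ($path) = @_;
    my %seen;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while ( my $line = <$fh> ) {
        # split ' ' skips leading whitespace and discards the trailing newline
        $seen{$_} = 1 for split ' ', $line;
    }
    close $fh;
    return \%seen;
}

# Demo only: write a small hypothetical word file to a temporary location.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "alpha beta\ngamma professional\n";
close $out;

my $words = load_words($path);

# Every later check is a single hash lookup.
for my $w (qw/professional delta/) {
    printf "%s: %s\n", $w, exists $words->{$w} ? 'found' : 'not found';
}
```

Filling the hash inside the read loop skips the extra step of copying an existing array into a hash, which was the overhead the benchmark's "hash it" case paid on every run.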

Aaron B.
Available for small or large Perl jobs; see my home node.
