PerlMonks  

Re^4: grep for lines containg two variables

by l3v3l (Monk)
on Dec 08, 2005 at 17:54 UTC ( [id://515313]=note )


in reply to Re^3: grep for lines containg two variables
in thread grep for lines containg two variables

I am not sure I understand your points - so to clarify:

The listing above was posted as pseudocode, not the actual code I ran. In the test case I put the literal strings of interest in place of $string1 and $string2 (and I have updated the node to illustrate). I used what I found interesting for my specific situation, and left it generic when posting the code example above because I thought that would be more useful. I understand your point re: my $str..., and it now works properly if I use your qr update (and I get the same times). Is this not a useful or valid benchmark?

my input file was:

RH_MEa0001bA09_1 1253 871 10 GGAGAGGGGTCGAATTTCTC...
RH_MEa0001bB03_1 553 104 12 GTCCGTTGCAACAAAAGTGA...
RH_MEa0001bC11_1 1160 385 12 TGGGGTTGAAGAAAGGTTNG...
RH_MEa0001bG06_1 710 14 18 Invalid starting position (14)
RH_MEa0001bG06_2 710 34 10 GGGGGACACCTTCTCTCTCT...
RH_MEa0001bG06_3 710 51 10 GGGGGACACCTTCTCTCTCT...
etc

Since different boxes perform differently depending on input files, strings, memory, processor, etc., I guess it is better to list relative results instead of specifics. Is it just luck that your results confirm the general reason I posted, namely that the lookahead grep is the fastest (accurate) solution?
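For relative rather than absolute numbers, Benchmark's cmpthese prints a table of rates with percentage differences between the tests. The following is only a minimal sketch: it assumes a few of the sample lines shown above stand in for the real input file, and the names @sample and %tests are made up for illustration. Each test is passed as a code reference so that it can see the lexical array.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for the input file: lines copied from the sample above.
my @sample = (
    "RH_MEa0001bA09_1 1253 871 10 GGAGAGGGGTCGAATTTCTC...\n",
    "RH_MEa0001bG06_1 710 14 18 Invalid starting position (14)\n",
    "RH_MEa0001bG06_2 710 34 10 GGGGGACACCTTCTCTCTCT...\n",
    "RH_MEa0001bG06_3 710 51 10 GGGGGACACCTTCTCTCTCT...\n",
);

# The three approaches from this thread, each wrapped in a closure.
my %tests = (
    grep_and => sub {
        my @r = grep { /GGGGGACACCTTCTCTCTCT/ && /RH_MEa0001bG06/ } @sample;
    },
    double_grep => sub {
        my @r = grep { /GGGGGACACCTTCTCTCTCT/ } grep { /RH_MEa0001bG06/ } @sample;
    },
    lookahead_grep => sub {
        my @r = grep { /^(?=.*GGGGGACACCTTCTCTCTCT)(?=.*RH_MEa0001bG06)/ } @sample;
    },
);

# A negative count means "run each test for at least that many CPU seconds",
# so the table compares rates rather than depending on a fixed iteration count.
cmpthese(-1, \%tests);
```

The absolute rates will still vary from box to box, but the percentage columns give the relative ordering that can be compared between machines.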

Replies are listed 'Best First'.
Re^5: grep for lines containg two variables
by ikegami (Patriarch) on Dec 08, 2005 at 18:35 UTC
    I am not sure I understand your points

    Change q{ to q{use strict; or q{print scalar @data; and you'll see. You've updated your node, but the problem is still there. @data is empty in the tests, because the tests are using our @main::data and not the my @data that holds the test file.
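The scoping problem described here can be reproduced in a few lines. This is a sketch under assumptions, not the code from the node: the variables $seen_by_string and $seen_by_sub are invented for the demonstration, and timeit is used only because it runs the code without printing a report.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit);

my @data = ('a', 'b', 'c');   # lexical: only visible to code compiled in this file

# Hypothetical package variables used to report what each test could see.
our $seen_by_string;
our $seen_by_sub;

# String form: Benchmark compiles this text itself, outside the scope of
# "my @data", so @data here refers to the package variable @main::data.
timeit(1, q{ $main::seen_by_string = scalar @data; });

# Closure form: the sub is compiled right here and captures the lexical @data.
timeit(1, sub { $seen_by_sub = scalar @data; });

print "string code saw $seen_by_string elements\n";
print "sub code saw $seen_by_sub elements\n";
```

With a stock Benchmark module the string version should report 0 elements and the sub version 3, which is why a test written as q{ ... } ends up grepping an empty list and looks impossibly fast.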

    listing above was posted as pseudo code and not the actual code run

    That's rather silly.

    since diff boxes have different performance depending on input_files, strings, mem., proc. etc -

    Sorry, but your machine is not 540x faster than mine. Change q{ to sub { and you'll see.

      Right! Thank you for the pointers and clarification, it makes sense now! Is this now valid?
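      #!/usr/bin/perl -w
      # usage : ./this_script.pl input_file > captured_benchmarks
      use strict;
      use Benchmark;

      # Read the test lines from the file named on the command line;
      # with no argument, fall back to this script itself ($0).
      my $file = @ARGV ? shift : $0;
      my @data = do { open(my $fh, '<', $file) or die "Cannot open $file: $!"; <$fh> };

      timethese (1000000, {
          # one grep, both patterns tested on each line
          grep_and => sub {
              my @res1 = grep /GGGGGACACCTTCTCTCTCT/ && /RH_MEa0001bG06/, @data;
          },
          # two greps chained: filter on one pattern, then on the other
          double_grep => sub {
              my @res2 = grep /GGGGGACACCTTCTCTCTCT/, grep /RH_MEa0001bG06/, @data;
          },
          # one regex with two lookaheads anchored at the start of the line
          lookahead_grep => sub {
              my @res3 = grep /^(?=.*GGGGGACACCTTCTCTCTCT)(?=.*RH_MEa0001bG06)/, @data;
          },
      });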
      because I get the following now:
      Benchmark: timing 1000000 iterations of double_grep, grep_and, lookahead_grep...
      double_grep: 11 wallclock secs ( 9.06 usr + 0.00 sys = 9.06 CPU) @ 110350.92/s (n=1000000)
      grep_and: 8 wallclock secs ( 8.70 usr + 0.00 sys = 8.70 CPU) @ 114902.91/s (n=1000000)
      lookahead_grep: 21 wallclock secs (20.31 usr + 0.00 sys = 20.31 CPU) @ 49231.98/s (n=1000000)
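Since the comparison is only meaningful if the three expressions select the same lines, a quick check alongside the timing may help. A minimal sketch, assuming a few sample lines from the input file shown earlier in the thread; @sample, %found and $reference are illustrative names, not part of the posted script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the input file, copied from the sample lines.
my @sample = (
    "RH_MEa0001bC11_1 1160 385 12 TGGGGTTGAAGAAAGGTTNG...\n",
    "RH_MEa0001bG06_1 710 14 18 Invalid starting position (14)\n",
    "RH_MEa0001bG06_2 710 34 10 GGGGGACACCTTCTCTCTCT...\n",
    "RH_MEa0001bG06_3 710 51 10 GGGGGACACCTTCTCTCTCT...\n",
);

# Run each approach once and keep the matching lines.
my %found = (
    grep_and       => [ grep { /GGGGGACACCTTCTCTCTCT/ && /RH_MEa0001bG06/ } @sample ],
    double_grep    => [ grep { /GGGGGACACCTTCTCTCTCT/ } grep { /RH_MEa0001bG06/ } @sample ],
    lookahead_grep => [ grep { /^(?=.*GGGGGACACCTTCTCTCTCT)(?=.*RH_MEa0001bG06)/ } @sample ],
);

# Compare every result list against grep_and's result.
my $reference = join '', @{ $found{grep_and} };
for my $name (sort keys %found) {
    my $same = join('', @{ $found{$name} }) eq $reference;
    printf "%-15s %d lines, %s\n", $name, scalar @{ $found{$name} },
        $same ? 'same as grep_and' : 'DIFFERENT';
}
```

Note that in the figures above lookahead_grep runs at roughly half the rate of the other two tests.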
