in reply to performance issues

I was almost with this up to the regex /^(.*)\t(.*)$/. Can you post the working code, from which one can unambiguously reverse-engineer the requirement?
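For reference, here is a sketch of what that regex does to one line of the file. The sample strings are hypothetical. It also shows one property of the regex that may matter: the first (.*) is greedy, so on a line with more than one tab the split happens at the last tab, not the first.

```perl
use strict;
use warnings;

# Hypothetical sample line: two "blocks" separated by a tab.
my $line = "abcd\ts1";

my ($left, $right);
if ($line =~ /^(.*)\t(.*)$/) {
    ($left, $right) = ($1, $2);
    print "left=[$left] right=[$right]\n";    # prints: left=[abcd] right=[s1]
}

# The first (.*) is greedy, so a line with several tabs is split at the
# LAST tab; split with a limit of 2 splits at the FIRST tab instead.
my $multi = "a\tb\tc";
my ($greedy_left) = $multi =~ /^(.*)\t(.*)$/;    # "a\tb"
my ($split_left)  = split /\t/, $multi, 2;       # "a"
```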

When you say you are searching for a "given string", what do you mean? Are you looking for lines in the file where the "block" before the tab exactly matches the given string? Or the given string as a substring of one block, or of either or both blocks? Or what?
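To make those alternatives concrete, here is a sketch of each reading, assuming lines of the form left<TAB>right. The sample lines and the %result hash are hypothetical. It uses split and index rather than a regex, which is enough for plain substring tests.

```perl
use strict;
use warnings;

# Hypothetical sample data in the "left<TAB>right" format.
my @lines = ("abcd\ts1", "bc\ts5", "xyz\tqbc", "abc\txbc");
my $s = 'bc';    # the "given string"

my %result;
for my $line (@lines) {
    my ($left, $right) = split /\t/, $line, 2;    # split at the first tab
    my $in_left  = index($left,  $s) >= 0;        # index returns -1 if absent
    my $in_right = index($right, $s) >= 0;
    $result{$line} = {
        exact     => $left eq $s             ? 1 : 0,  # left block equals $s
        in_left   => $in_left                ? 1 : 0,  # substring of left block
        in_either => ($in_left || $in_right) ? 1 : 0,  # substring of either block
        in_both   => ($in_left && $in_right) ? 1 : 0,  # substring of both blocks
    };
    my $r = $result{$line};
    print "$left | $right: exact=$r->{exact} in_left=$r->{in_left} ",
          "in_either=$r->{in_either} in_both=$r->{in_both}\n";
}
```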

Re^2: performance issues
by perlcat (Novice) on Jan 27, 2009 at 19:09 UTC
    I will post some code asap. I'm indeed looking for a substring in the left block. I'm using m=~ to achieve this.

      OK. Based on what I understand you want, see if something along these lines works any better:

        use strict ;
        use warnings ;

        my $data ;
        { local $/ ; $data = <DATA> ; } ;

        my $s = 'bc' ; # Substring to search for in left hand side

        my @r = ($data =~ m/$s[^\s]*[ \t]+(.*)/g) ;
        print "@r\n" ;

        __DATA__
        abcd s1
        efgh s2
        ijklm s3
        nopq s4
        bc s5
        aqbc s6
        rst s7
        uvwx s8
        yz s9
      Note that the [ \t]+ can be changed to simply \t for use with your file (I've used [ \t]+ here only because it works even if the whitespace between the "blocks" has been converted to any mixture of tabs and spaces). Note also that the (.*) stops at the end of the line, since . does not match a newline.

      Note also that this assumes the right-hand side contains no tabs (or spaces).

      The regex can also find the search substring in the right-hand side, but since the right-hand side is followed by a newline rather than a tab or space, the [^\s]*[ \t]+ part makes that match fail.
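A quick check of that reasoning, using the same pattern on two hypothetical lines where "bc" occurs only in the right-hand block: with no whitespace after it the match fails, while a right-hand side containing a space produces a false match, which is why the no-tabs-or-spaces assumption matters.

```perl
use strict;
use warnings;

my $s = 'bc';

# Hypothetical lines: the substring occurs only in the right-hand block.
my $no_space   = "xyz\tqbcq\n";        # right side has no whitespace
my $with_space = "xyz\tqbc more\n";    # right side contains a space

# Same pattern as the first example above.
my @r1 = ($no_space   =~ m/$s[^\s]*[ \t]+(.*)/g);   # fails: "\n" follows "qbcq"
my @r2 = ($with_space =~ m/$s[^\s]*[ \t]+(.*)/g);   # false match in right side

print scalar(@r1), " match(es) for \$no_space\n";   # 0 match(es)
print "\$with_space captured: '@r2'\n";             # captured: 'more'
```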

      The following:

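        my $data = "\n" ;    # leading newline so the first line is also preceded by one
        { local $/ ; $data .= <DATA> ; } ;
        my $s = 'bc' ; # Substring to search for in left hand side
        # [^\t\n]*? cannot run past the tab, so the match of $s must start in the left hand side
        my @r = ($data =~ m/\n[^\t\n]*?$s[^\s]*[ \t]+(.*)/g) ;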
      avoids matching in the right-hand side, so the right-hand side can contain tabs (and spaces). I don't have enough data to know whether there's any performance advantage or disadvantage here.
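One way to get that data is the core Benchmark module. This is a sketch under stated assumptions: the input is synthetic (100,000 generated lines, one in ten with "bc" in the left-hand block), the sub names first_regex and second_regex are hypothetical, and the second pattern is written with [^\t\n]*? so its prefix cannot run past the tab. Real timings will depend on the actual file and Perl version, so no results are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 100_000 "left<TAB>right" lines; every tenth
# left block ("abc...") contains the search substring.
my $s       = 'bc';
my $data    = join '', map { ($_ % 10 ? "xyz$_" : "abc$_") . "\tr$_\n" } 1 .. 100_000;
my $nl_data = "\n" . $data;    # the newline-anchored variant needs a leading "\n"

# Each sub returns the number of captured right-hand sides.
sub first_regex  { my @r = ($data    =~ m/$s[^\s]*[ \t]+(.*)/g);             return scalar @r }
sub second_regex { my @r = ($nl_data =~ m/\n[^\t\n]*?$s[^\s]*[ \t]+(.*)/g); return scalar @r }

# Sanity check: both variants should find the same lines before timing them.
die "results differ\n" unless first_regex() == second_regex();

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, { first => \&first_regex, second => \&second_regex });
```

Running it against a copy of the real file, rather than generated lines, would give a more meaningful answer.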

        Thank you, I will test the code this weekend and see if it runs any faster.
      I'm using m=~ to achieve this.
      I guess that this is a typo: if you've got working code at all, then you must be using something like (if we must call it that) =~m//, not m=~.
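For the record, a minimal sketch of the binding-operator forms, with hypothetical values for $left and $s. The \Q...\E escape quotes any regex metacharacters in $s, which matters if the search string can ever contain characters such as "." or "+".

```perl
use strict;
use warnings;

my $left = 'aqbc';    # hypothetical left-hand block
my $s    = 'bc';      # hypothetical search substring

# =~ binds a string to the match operator m//; the m can be dropped when
# the delimiters are slashes.
my $found      = $left =~ m/\Q$s\E/;    # true: "aqbc" contains "bc"
my $also_found = $left =~ /\Q$s\E/;     # the same match written without m
my $no_xyz     = $left !~ /xyz/;        # !~ is the negated binding operator

print "found=", ($found ? 1 : 0), " no_xyz=", ($no_xyz ? 1 : 0), "\n";
```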