in reply to Split Function - Positions

Update: Ignore this! It is much slower.

This might work out a little faster if performance is an issue, which it usually is with genome-related stuff.

#! perl -slw
use strict;

my $re = '([^X]+)X*' . '(?:([^X]+)X*)?' x 100;
$re = qr[$re];

my $screen = "ATCGATCGXXXXXATCGATXXXACTGCTACGGTACXXXAATTATXGCGCGXXT";
$screen =~ $re;
print for @-[ 1 .. $#- ];

__END__
P:\test>test2
0
13
22
38
45
52

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Re: Re: Split Function - Positions
by duff (Parson) on Jun 02, 2004 at 03:10 UTC
    It is much slower.

    Surely it must be so; look at the size of your RE! :-)

      Agreed. Though I had thought that by first grabbing the matches with a standard m[([^X]+)]g, I would know how big to make the big regex. A second pass with that regex would then populate @-.
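A minimal sketch of that two-pass idea, for illustration only. The helper name run_positions is hypothetical, and joining the groups with X+ (rather than the X* and optional groups of the original regex) is an assumption made to keep the pattern exactly as long as the number of runs. It is not claimed to be faster than the simple loops.

```perl
use strict;
use warnings;

# Hypothetical helper: start offsets of every run of non-X characters,
# found with a regex sized to the number of runs.
sub run_positions {
    my ($screen) = @_;

    # Pass 1: count the runs with a plain global match in list context.
    my $runs = () = $screen =~ /[^X]+/g;
    return unless $runs;    # no runs, nothing to report

    # Pass 2: one capture group per run, separated by the X padding.
    my $re = join 'X+', ('([^X]+)') x $runs;
    $re = qr/$re/;

    # A single match fills @-; $-[$n] is the start offset of group $n.
    return unless $screen =~ $re;
    my @starts = @-[ 1 .. $#- ];
    return @starts;
}

my $screen = "ATCGATCGXXXXXATCGATXXXACTGCTACGGTACXXXAATTATXGCGCGXXT";
print join( ' ', run_positions($screen) ), "\n";    # prints: 0 13 22 38 45 52
```

The first pass already walks the whole string, so any saving would have to come from reading @- once instead of once per match.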

      As it turns out,

      push @posns, pos($screen) - length $1 while $screen =~ /([^X]+)/g;

      is substantially faster than

      push @posns, $-[ 0 ] while $screen =~ m[([^X]+)]g;

      which surprised me. I'm not sure why that would be.

      My best guess is that @- uses tie-style magic and is populated only when it is accessed, rather than when the regex runs. Perhaps the captures are stored as LVALUE refs, and @- and @+ are derived from those if and when they are called for?
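For anyone who wants to check the timing difference between the two loops on their own perl, here is a rough sketch using the core Benchmark module. The sub names via_pos and via_at_minus and the iteration count are assumptions chosen for illustration; the results will depend on the perl version and the input, so no numbers are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $screen = "ATCGATCGXXXXXATCGATXXXACTGCTACGGTACXXXAATTATXGCGCGXXT";

# Variant 1: derive each start from pos() and the length of the capture.
sub via_pos {
    my @posns;
    push @posns, pos($screen) - length $1 while $screen =~ /([^X]+)/g;
    return @posns;
}

# Variant 2: read the start of the whole match from @-.
sub via_at_minus {
    my @posns;
    push @posns, $-[0] while $screen =~ m[([^X]+)]g;
    return @posns;
}

# Sanity check: both variants should report the same offsets.
print join( ' ', via_pos() ),      "\n";    # prints: 0 13 22 38 45 52
print join( ' ', via_at_minus() ), "\n";    # prints: 0 13 22 38 45 52

# Run each variant 200_000 times and print a comparison table.
cmpthese( 200_000, {
    'pos - length' => sub { my @p = via_pos() },
    '$-[0]'        => sub { my @p = via_at_minus() },
} );
```

A string this short mostly measures per-match overhead; a longer, more realistic sequence may shift the ratio.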

