Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Producing a list of offsets efficiently

by dws (Chancellor)
on May 28, 2005 at 20:10 UTC ( #461391=note: print w/replies, xml ) Need Help??


in reply to Producing a list of offsets efficiently

my $string = "aXbcXdefgXhijXklmnopqrXstuvwXyz"; my $target = 'X'; my @offsets; while ( $string =~ /$target/g ) { push @offsets, pos($string) - length($target); } print "@offsets\n";

Does the trick for me, and it scales to arbitrary substrings.


(Update: hoisting length($target) out of the loop is a no-brainer. Alas, I had no brain at the time.)

Replies are listed 'Best First'.
Re^2: Producing a list of offsets efficiently
by nobull (Friar) on May 28, 2005 at 20:20 UTC
    pos($string) - length($target) can be replaced with $-[0].
Re^2: Producing a list of offsets efficiently
by BrowserUk (Patriarch) on May 28, 2005 at 20:12 UTC

    Problem: If the string is very large with lots of occurances of 'X', your @offsets array is being constantly resized and copied as you add new values.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

      FWIW, push seems to be faster than direct indexing on a pre-grown array, as the following table shows.

      The only difference between the windex and windex_2 alternatives in the following table is that the former uses push and the latter uses direct indexing on a pre-grown array. The size of the string is 100_000, containing about 3800 matches. (Full code within the readmore tags.)

      Rate wregex windex_2 windex wregex 212/s -- -24% -32% windex_2 277/s 31% -- -11% windex 310/s 47% 12% --

      the lowliest monk

      In that case, I'd probably code up an XS module that makes two passes over the target string, allocating the result vector once after the first pass, and filling it in on the second.

      The resize and copies have an amortized constant cost per array element added. Put another way, pushing one element at a time averages out to a O(1) operation.

        Thoughts?

        #! perl -slw use strict; use Benchmark::Timer; my $T= new Benchmark::Timer; for( 1 .. 1000 ) { my @small = (1) x 100; my @large = (1 ) x 130900; $T->start( "small: add 100 by 1" ); push @small, 1 for 1 .. 100; $T->stop( "small: add 100 by 1" ); $T->start( "large: add 100 by 1" ); push @large, 1 for 1 .. 100; $T->stop( "large: add 100 by 1" ); $T->start( "small: add 100 by 100" ); push @small, 1 .. 100; $T->stop( "small: add 100 by 100" ); $T->start( "large: add 100 by 100" ); push @large, 1 .. 100; $T->stop( "large: add 100 by 100" ); } $T->report; __END__ P:\test>461552,pl 1000 trials of small: add 100 by 1 ( 94.947ms total), 94us/trial 1000 trials of large: add 100 by 1 (112.226ms total), 112us/trial 1000 trials of small: add 100 by 100 ( 16.009ms total), 16us/trial 1000 trials of large: add 100 by 100 ( 15.977ms total), 15us/trial

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://461391]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (5)
As of 2023-12-07 16:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What's your preferred 'use VERSION' for new CPAN modules in 2023?











    Results (33 votes). Check out past polls.

    Notices?