Re: foreach (@array) s/x/y/ efficiency

First, an interesting point about the regex... Within a character class, \b matches a backspace, rather than a word-boundary. [\W\b] will match either a non-word character or a backspace character (which is a non-word character anyway).
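
A quick way to check this for yourself, as a minimal sketch (the test strings and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical test strings, chosen only to illustrate the point.
my $backspace = "\x08";   # a lone backspace character
my $plain     = "ab";     # has word boundaries, but no backspace

# Inside a character class, \b means backspace.
my $class_hits_backspace = $backspace =~ /[\b]/ ? 1 : 0;   # 1
my $class_hits_plain     = $plain     =~ /[\b]/ ? 1 : 0;   # 0

# Outside a character class, \b is the zero-width word-boundary assertion.
my $boundary_hits_plain  = $plain =~ /\bab\b/ ? 1 : 0;     # 1

# Backspace is a non-word character, so \W alone already covers it.
my $nonword_hits_backspace = $backspace =~ /\W/ ? 1 : 0;   # 1

print "class/backspace=$class_hits_backspace class/plain=$class_hits_plain ",
      "boundary/plain=$boundary_hits_plain nonword/backspace=$nonword_hits_backspace\n";
```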

I would actually use lookbehind and lookahead to make the replacement simpler:

s,(?<!\w)($worda)(\W+)($wordb)(?!\w),<B>$1</B>$2<B>$3</B>,i

Next, I'm trying to figure out what makes s|||g; s|||g; necessary. Because your regex only allows non-word characters between word A and word B, and <B> and </B> each contain a word character, once bold tags are put around a word that word should never be matched by your regex again. Ah... Unless your material may already contain some bold tags before you do any of the substitutions. Then you could end up with doubled tags to remove.
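
Both cases can be seen in a small sketch. The sample strings and the key phrase are made up, and the <B> tags are taken from the replacement used in the code further down:

```perl
use strict;
use warnings;

my ($worda, $wordb) = ("not", "immediately");   # hypothetical key phrase
my $pattern = qr/(?<!\w)($worda)(\W+)($wordb)(?!\w)/i;

# Words that were already tagged one by one are not matched again,
# because "</B> <B>" between them contains the word character "B".
my $tagged = "did <B>not</B> <B>immediately</B> answer";
(my $tagged_again = $tagged) =~ s{$pattern}{<B>$1</B>$2<B>$3</B>};

# A pre-existing tag pair around the whole phrase still matches,
# which is where doubled tags come from.
my $pre_bold = "did <B>not immediately</B> answer";
(my $doubled = $pre_bold) =~ s{$pattern}{<B>$1</B>$2<B>$3</B>};

print "$tagged_again\n";   # unchanged
print "$doubled\n";        # did <B><B>not</B> <B>immediately</B></B> answer
```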

Finally, here's how I would try to do this more efficiently. I would combine @material into a single string, perform the substitutions, and then split back to @material.

my $material = join "\0", @material;

foreach my $phrase (@key_phrases) {
    my($worda, $wordb) = split / /, $phrase;

    $material =~ s{(?<!\w)($worda)([^\w\0]+)($wordb)(?!\w)}
                  {<B>$1</B>$2<B>$3</B>}i;
}

@material = split /\0/, $material;

As you can see, I'm using "\0" as a temporary divider between pieces of @material; I've updated the regex to make sure matches don't overlap two pieces.
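
To illustrate why the divider has to be excluded from the separator class, here is a small sketch with made-up data in which a phrase straddles two elements:

```perl
use strict;
use warnings;

# Made-up data: "not immediately" is split across two elements.
my @material = ("he did not", "immediately reply");
my ($worda, $wordb) = ("not", "immediately");

my $joined = join "\0", @material;

# \W+ also accepts the "\0" divider, so the match would span two pieces.
my $spans_with_nonword = $joined =~ /(?<!\w)($worda)(\W+)($wordb)(?!\w)/i ? 1 : 0;        # 1

# [^\w\0]+ rejects the divider, so the two pieces stay independent.
my $spans_with_class   = $joined =~ /(?<!\w)($worda)([^\w\0]+)($wordb)(?!\w)/i ? 1 : 0;   # 0

# Splitting on the divider restores the original pieces.
my @restored = split /\0/, $joined;

print "with \\W+: $spans_with_nonword, with [^\\w\\0]+: $spans_with_class, pieces: ",
      scalar(@restored), "\n";
```

One caveat: `split` without a limit drops trailing empty fields, so this round trip assumes the last elements of @material are not empty strings.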

I considered also building a single regex to match all the key phrases, but since each phrase appears only once I don't know if that would be more efficient.
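
For reference, a single-regex version might look roughly like the sketch below. It is untested for speed and rests on a few assumptions of mine: Perl 5.10 or later (for the branch-reset group `(?|...)`), `quotemeta` around each word, made-up sample data, and `/g`, which replaces every occurrence rather than only the first one per phrase.

```perl
use strict;
use warnings;

# Made-up sample data; the real @material and @key_phrases are much larger.
my @material    = ("He did not immediately answer.\n", "They went very soon after.\n");
my @key_phrases = ("not immediately", "very soon");

# One alternative per phrase. Inside (?|...) the capture numbering restarts
# in each alternative, so $1, $2 and $3 mean the same thing for every phrase.
my $alternatives = join '|', map {
    my ($worda, $wordb) = split / /, $_;
    '(' . quotemeta($worda) . ')([^\w\0]+)(' . quotemeta($wordb) . ')';
} @key_phrases;
my $all_phrases = qr/(?<!\w)(?|$alternatives)(?!\w)/i;

# Same join / substitute / split round trip as before, but with one pass.
my $joined = join "\0", @material;
$joined =~ s{$all_phrases}{<B>$1</B>$2<B>$3</B>}g;
my @marked = split /\0/, $joined;

print @marked;
```

Whether one large alternation beats several thousand small substitutions would have to be measured on the real data.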


Re: Re: foreach (@array) s/x/y/ efficiency by gryphon (Abbot) on Jan 11, 2001 at 02:43 UTC
This actually ends up taking longer to run than the original. I think that's because the s/// has to fly through the entire string of `$material`, whereas the original loop version stops (it `last`s out of the loop) as soon as it hits the match. (Not that I would really know for sure what I'm talking about.)
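One way to compare the two approaches is the core Benchmark module. The sketch below is only a harness: the data is made up, and `mark_with_loop` is an assumed reconstruction of the original loop (which is not shown here) based on the description above, so any timings it prints say nothing about the real data set.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up stand-ins for the real @material and @key_phrases.
my @material    = map { "1\t1\tLuke\t1\t$_\tHe did not immediately answer verse $_.\n" } 1 .. 200;
my @key_phrases = ("not immediately", "answer verse");

# Assumed shape of the original: for each phrase, scan the elements
# and stop at the first one where the substitution succeeds.
sub mark_with_loop {
    my @copy = @material;
    for my $phrase (@key_phrases) {
        my ($worda, $wordb) = split / /, $phrase;
        for my $line (@copy) {
            last if $line =~ s{(?<!\w)($worda)(\W+)($wordb)(?!\w)}{<B>$1</B>$2<B>$3</B>}i;
        }
    }
    return join '', @copy;
}

# The join / substitute / split approach.
sub mark_with_join {
    my $joined = join "\0", @material;
    for my $phrase (@key_phrases) {
        my ($worda, $wordb) = split / /, $phrase;
        $joined =~ s{(?<!\w)($worda)([^\w\0]+)($wordb)(?!\w)}{<B>$1</B>$2<B>$3</B>}i;
    }
    return join '', split /\0/, $joined;
}

# Check that both variants agree before timing them.
my $same_output = mark_with_loop() eq mark_with_join() ? 1 : 0;
print "same output: $same_output\n";

cmpthese(200, { loop => \&mark_with_loop, join => \&mark_with_join });
```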
Re: Re: Re: foreach (@array) s/x/y/ efficiency by chipmunk (Parson) on Jan 11, 2001 at 03:36 UTC
Could you provide the data you used to compare the different solutions? I really was expecting mine to be faster, so I'd like to figure out what I did wrong. Thanks!
Re: Re: Re: Re: foreach (@array) s/x/y/ efficiency by gryphon (Abbot) on Jan 11, 2001 at 04:15 UTC
It's the book of Luke from the NIV Bible. Each element of `@material` is set up as `^\d\t\d\tLuke\t$chapter\t$verse\t$verse_text\n`. And `@key_phrases` is just each key phrase from the material, with a single space between the two words (e.g. `$key_phrases[6153] = "not immediately";`).