in reply to performance enhancement

The "normal" way to do this is with two substitutions: $str =~ s/^\s+//; $str =~ s/\s+\z//;. Is that not fast enough? I see a few problems with String::Strip:
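A minimal sketch of the two-substitution approach, wrapped in a hypothetical trim helper. The name is illustrative and does not come from String::Strip or any core module:

```perl
use strict;
use warnings;

# Hypothetical helper: trim leading and trailing whitespace with two
# anchored substitutions. ^ (without /m) matches only at the start of the
# string; \z matches only at the very end (unlike $, it does not also
# match before a trailing newline).
sub trim {
    my ($str) = @_;
    $str =~ s/^\s+//;
    $str =~ s/\s+\z//;
    return $str;
}

print '[', trim("   hello world \t\n"), "]\n";   # prints [hello world]
```

The first substitution can only match at the start of the string and the second only at the end, so interior whitespace is left alone.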

Re^2: performance enhancement
by demerphq (Chancellor) on Jul 19, 2006 at 22:37 UTC

    The "normal" way to do this is with two substitutions

    I've often pondered an optimisation of $s =~ s/^\s+|\s+$//g that would make this no longer true. So far it's been over my head, in the sense of requiring too much research time compared to other useful tasks I could be doing, but maybe one day...

    And for people wondering why this isn't the recommended way: it's because this pattern will try to match at every position in the string. The regex engine isn't currently smart enough to optimise it so that the pattern is only tried twice (once at the start of the string and once at the end).
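For comparison, a minimal sketch of the single-substitution form discussed here, wrapped in a hypothetical trim_onesub helper. It gives the same result as the two-substitution version on ordinary input; the difference lies in how much work the engine does to get there.

```perl
use strict;
use warnings;

# Hypothetical helper: trim both ends with one /g substitution.
# Only the two ends can actually match (the first branch is anchored
# with ^, the second with $), but the engine may still attempt the
# alternation at each position in between.
sub trim_onesub {
    my ($str) = @_;
    $str =~ s/^\s+|\s+$//g;
    return $str;
}

print '[', trim_onesub("   hello world \t"), "]\n";   # prints [hello world]
```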

    ---
    $world=~s/war/peace/g

      Why s/^\s+|\s*$//g rather than s/^\s+|\s+$//g, s/^\s*|\s*$//g or s/^\s*|\s+$//g?

      A benchmark suggests that the two-substitution approach is faster than any of the single-substitution approaches, and that there are interesting variations between the different single-substitution options:

                 Rate starstar plusstar plusplus starplus   twosub
      starstar 47.0/s       --      -8%     -25%     -28%     -42%
      plusstar 51.2/s       9%       --     -18%     -21%     -37%
      plusplus 62.5/s      33%      22%       --      -4%     -23%
      starplus 65.1/s      39%      27%       4%       --     -20%
      twosub   81.6/s      74%      59%      31%      25%       --

      The benchmark uses a single large string (100_000 characters) with a fairly large run of spaces (1000) at the start and end.
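The benchmark code itself was not posted. The following is only a sketch that rebuilds the described setup (100_000 characters, 1000 spaces at each end) with the core Benchmark module. The filler character in the middle and the mapping of the labels to patterns are assumptions. Because results depend heavily on the input, numbers from this sketch need not match the table above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed input: 100_000 characters in total, with a run of 1000 spaces
# at each end. The 'x' filler is an assumption; the post does not say
# what the string contained.
my $orig = (' ' x 1000) . ('x' x 98_000) . (' ' x 1000);

# Assumed label mapping: the first word names the quantifier on the
# leading branch, the second the quantifier on the trailing branch.
my %trim = (
    twosub   => sub { my $s = $orig; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s },
    plusplus => sub { my $s = $orig; $s =~ s/^\s+|\s+$//g; $s },
    plusstar => sub { my $s = $orig; $s =~ s/^\s+|\s*$//g; $s },
    starplus => sub { my $s = $orig; $s =~ s/^\s*|\s+$//g; $s },
    starstar => sub { my $s = $orig; $s =~ s/^\s*|\s*$//g; $s },
);

# Sanity check: every variant must produce the same trimmed string.
for my $name (sort keys %trim) {
    die "$name gave a wrong result\n"
        unless $trim{$name}->() eq 'x' x 98_000;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%trim);
```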


      DWIM is Perl's answer to Gödel

        Why s/^\s+|\s*$//g rather than s/^\s+|\s+$//g, s/^\s*|\s*$//g or s/^\s*|\s+$//g?

        Er, the quantifier mismatch was a typo. I have corrected the original node.

        But it's good, as you can see the speed advantage of the twosub method. That said, I suspect you would see a radically different result if the string were more "normal", for instance the content of a node, with the intention of trimming each line.
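To illustrate the "trim each line" case, here is a sketch assuming a small, hypothetical multi-line string. It contrasts splitting into lines and applying the two substitutions to each line with a single /m substitution over the whole string. It only shows that both produce the same result; it makes no claim about which is faster.

```perl
use strict;
use warnings;

# Hypothetical input: a "normal" multi-line string, like the body of a node.
my $text = "  first line  \n\tsecond line\t\n   \nlast line   ";

# Split into lines and apply the two anchored substitutions to each line.
sub trim_lines_twosub {
    my ($s) = @_;
    my @lines = split /\n/, $s, -1;    # -1 keeps trailing empty fields
    for (@lines) {                     # $_ aliases each element of @lines
        s/^\s+//;
        s/\s+\z//;
    }
    return join "\n", @lines;
}

# One substitution over the whole string. /m makes ^ and $ match at every
# line boundary; \h (horizontal whitespace, Perl 5.10+) is used instead of
# \s so that the newlines themselves are not removed.
sub trim_lines_onesub {
    my ($s) = @_;
    $s =~ s/^\h+|\h+$//mg;
    return $s;
}

print trim_lines_onesub($text), "\n";
```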

        Also, you have to be very careful when benchmarking regexes: really subtle differences in the input string and the pattern can produce wildly different run times because of how the optimiser handles them. For instance, if your string and pattern allow a single FBM (Fast Boyer-Moore, the fixed-substring search Perl's optimiser uses) search followed by a match followed by a failing FBM search, then it's going to be massively faster than a pattern where an FBM search succeeds many times, with each candidate rejected afterwards by the regex engine itself.

        FBM is really fast; the regex engine is not. In fact, despite the common perception that the regex engine itself is fast, I'd say it's not. Rather, the reputation comes from a lot of really tricky optimisations that cut down as much as possible on how much the regex engine proper is involved. In other words, the perl regex engine is perceived as fast mostly because we do our damnedest not to use it when we don't need to.
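A rough way to observe the FBM effect described here is a sketch like the following. The inputs are hypothetical: both strings are the same length and each contains exactly one match for /needle\d/, but in $rare the fixed substring "needle" appears once, while in $frequent it appears 20_001 times and every candidate except the last must be rejected by the regex engine. This assumes the optimiser picks "needle" as the substring to search for, which use re 'debug' can confirm. No timings are claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical inputs of equal length (120_007 characters each).
my $rare     = ('x' x 120_000) . 'needle1';       # "needle" occurs once
my $frequent = ('needle' x 20_000) . 'needle1';   # "needle" occurs 20_001 times

my $re = qr/needle\d/;

# Count matches with //g in list context. $_[0] is used directly to avoid
# copying the large string on every call.
sub count_matches {
    my $n = () = $_[0] =~ /$re/g;
    return $n;
}

cmpthese(-1, {
    rare     => sub { count_matches($rare) },
    frequent => sub { count_matches($frequent) },
});
```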
