in reply to Re3: Regex: get first N characters but break at whitespace
in thread Regex: get first N characters but break at whitespace

A very good point. Mine did not save a lot. I did make one more stab, though I doubt it will outrun tye's regexp:
(look ma, no regexp)
    # Code by itself :
    sub mz2 {
        $c = substr($string,0,201);
        $a = rindex($c,' ');
        ( $a == 201 ? $string : substr($string,0,$a));
    }

    ## Bench addition :
    MZS2 => '($chunk) = &mz2()',

    ## Bench output :
    C:\WINDOWS\DESKTOP>perl index
    Benchmark: running Hofmator, MZS2, MZSanford, japhy, tye, each for at least 3 CPU seconds...
      Hofmator:  4 wallclock secs ( 3.07 usr +  0.00 sys =  3.07 CPU) @ 37802.61/s (n=116054)
          MZS2:  4 wallclock secs ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 44232.59/s (n=138448)
     MZSanford:  3 wallclock secs ( 3.19 usr +  0.00 sys =  3.19 CPU) @ 13159.87/s (n=41980)
         japhy:  3 wallclock secs ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 49181.79/s (n=153939)
           tye:  4 wallclock secs ( 3.02 usr +  0.00 sys =  3.02 CPU) @ 51403.64/s (n=155239)

As expected, no better. But worth the exercise. I did a quick test to make sure the output was the same, though I would normally use a regexp, as it would be clearer in most cases than all this oddness. I do not know if this will work on all input cases, your mileage may vary, etc.
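For the "all input cases" worry, here is a rough sketch of the same substr/rindex idea with the edge cases spelled out. rindex returns -1 when it finds nothing, and a string already within the limit needs no cutting. The sub name chunk_at_space, the $limit parameter and the hard cut when there is no space are assumptions for illustration. None of this is the benchmarked code.

```perl
use strict;
use warnings;

# Hypothetical helper (not the benchmarked mz2): return at most $limit
# characters of $string, cut back to the last literal space.
sub chunk_at_space {
    my ($string, $limit) = @_;

    # Already short enough: nothing to cut.
    return $string if length($string) <= $limit;

    # Take one extra character so a space sitting right at the limit counts.
    my $head = substr($string, 0, $limit + 1);
    my $pos  = rindex($head, ' ');

    # rindex returns -1 when there is no space; fall back to a hard cut
    # (an assumption, other policies are just as defensible).
    return substr($string, 0, $limit) if $pos < 0;

    return substr($string, 0, $pos);
}

print chunk_at_space('the quick brown fox', 12), "\n";   # prints "the quick"
```

Like the original, this only breaks on a literal space, not on other whitespace.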

Re5: Regex: get first N characters but break at whitespace
by Hofmator (Curate) on Jan 16, 2002 at 16:51 UTC

    Your idea relaxes the requirements somewhat as it uses literal space instead of whitespace. Furthermore you might be left with extra spaces at the end of your string.
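    Both effects are easy to see on small made-up samples (these are not the benchmark strings, and the limits of 10 and 9 are arbitrary). The regex here is only a stand-in for "break at any whitespace", not one of the benchmarked solutions:

```perl
use strict;
use warnings;

# Made-up samples, not the benchmark data.
my $tabbed = "one two\tthree four";   # tab at index 7, spaces at 3 and 13
my $padded = "one two   three";       # run of three spaces at 7..9

# Literal space only: the tab is ignored, so the cut lands at index 3.
my $by_space = substr($tabbed, 0, rindex($tabbed, ' ', 10));

# Any whitespace: the tab counts as a break, so the cut lands at index 7.
my ($by_ws) = $tabbed =~ /^(.{0,10})\s/s;

# A run of spaces leaves trailing blanks in the chunk unless trimmed.
my $untrimmed = substr($padded, 0, rindex($padded, ' ', 9));
(my $trimmed = $untrimmed) =~ s/\s+\z//;

print "[$by_space] [$by_ws] [$untrimmed] [$trimmed]\n";
```

    The first pair differs only because of the tab. The second pair shows the leftover spaces that a trailing s/\s+\z// would remove, at some extra cost.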

    If both of these things don't matter then you are easily fastest like this (rindex can take a starting index as its 3rd argument!):

        $chunk = substr($string,0,rindex($string,' ',200));
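    A minimal sketch of how that third argument behaves, on a made-up string. The position is the rightmost index at which a match may start, and rindex returns -1 when nothing is found at or before it:

```perl
use strict;
use warnings;

my $string = 'aaa bbb ccc';            # made-up sample; spaces at index 3 and 7

print rindex($string, ' '), "\n";      # 7  (last space anywhere)
print rindex($string, ' ', 6), "\n";   # 3  (last space starting at or before index 6)
print rindex($string, ' ', 2), "\n";   # -1 (no space at or before index 2)

# Caveat: if that -1 is passed straight to substr as the length,
# substr drops only the final character instead of cutting at a space.
my $chunk = substr($string, 0, rindex($string, ' ', 2));
print "$chunk\n";                      # aaa bbb cc
```

    So the one-liner assumes there is a space somewhere in the first 201 characters. If the input may lack one, check for -1 first.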

    The benchmark shows a huge gain, but of course it's a bit unfair :) considering the different requirements ...

        Benchmark: running japhy, rindex, tye, each for at least 3 CPU seconds...
             japhy:  2 wallclock secs ( 3.00 usr +  0.00 sys =  3.00 CPU) @ 244478.67/s (n=733436)
            rindex:  3 wallclock secs ( 3.00 usr +  0.00 sys =  3.00 CPU) @ 842494.33/s (n=2527483)
               tye:  4 wallclock secs ( 3.39 usr +  0.00 sys =  3.39 CPU) @ 305243.95/s (n=1034777)

    -- Hofmator