in reply to stripped punctuation

use strict;
use warnings;

my $stripped = qr/["!]/;
my $word = '"Wilmer!"';
$word =~ s/^(?:$stripped)+(.*?)(?:$stripped)+$/$1/;
print $word;

Perl is Huffman encoded by design.

Re^2: stripped punctuation
by thealienz1 (Pilgrim) on Oct 06, 2005 at 20:20 UTC

    After looking at your regexp I took to simplifying my needs with:

    $word =~ s/^[^\w\d]+(.*?)[^\w\d]+$/$1/;

    My intention is to remove everything that is not a letter or digit before the first letter or digit, and likewise everything after the last one. The regex makes sense when I read it, but in my testing it does not work.

    Update

    It works on the simple example I gave for 'Wilmer!'. I was running the word count with a script as its input, and the odd results I was seeing came from the syntax in that script. I apologize.

      Except you want to strip punctuation from the beginning or end. The above regex only works if there is punctuation at both beginning and end.
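
      To make that concrete, here is a minimal sketch. The helper name strip_both_ends and the extra test words are made up for illustration. Both character classes use +, so a word with punctuation on only one side does not match and is left untouched:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the parent post.
sub strip_both_ends {
    my ($word) = @_;
    # Needs at least one non-word character at BOTH ends to match at all.
    $word =~ s/^[^\w\d]+(.*?)[^\w\d]+$/$1/;
    return $word;
}

for my $input ('"Wilmer!"', 'Wilmer",', '"Wilmer') {
    # Only the first input is changed; the other two come back as-is.
    printf "%-10s => %s\n", $input, strip_both_ends($input);
}
```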

      If removing any trailing/leading punctuation is in fact your goal, what about something like:

      use strict;
      use warnings;

      my $word = 'Wilmer",';
      $word =~ s/^ \W*?          # ignore any leading punc
                 ( \w .*? )      # swallow everything lazily
                 (?: \W+ )? $    # ignore any trailing punc
                /$1/x;
      print $word;

      Update: Mind you, at that point, a much simpler regex will likely serve you better in terms of speed and readability:

      $word =~ s/(?:^\W+)|(?:\W+$)//g;
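
      A quick way to check the one-liner against a few shapes of input, as a sketch only. The helper name strip_punct and the sample words are assumptions, not part of the original question:

```perl
use strict;
use warnings;

# Hypothetical helper: drop leading and trailing non-word characters.
sub strip_punct {
    my ($word) = @_;
    # /g lets the alternation fire once at the start and once at the end.
    $word =~ s/(?:^\W+)|(?:\W+$)//g;
    return $word;
}

for my $input ('"Wilmer!"', 'Wilmer",', '--Wilmer', 'Wilmer') {
    # Every input here comes out as plain Wilmer.
    printf "%-10s => %s\n", $input, strip_punct($input);
}
```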

      Final update - benchmark:

                     Rate     capture non_capture
      capture     16561/s          --        -28%
      non_capture 22861/s         38%          --

      The second suggestion is about 30% faster, on average.
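
      The benchmark script itself is not shown here, so the following is only a sketch of how such a comparison could be set up with the core Benchmark module. The sub names capture_strip and non_capture_strip and the test word are assumptions; only the labels capture and non_capture come from the table above. Numbers will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed stand-in for the capturing version posted earlier in this reply.
sub capture_strip {
    my ($word) = @_;
    $word =~ s/^\W*?(\w.*?)(?:\W+)?$/$1/;
    return $word;
}

# Assumed stand-in for the simpler alternation version.
sub non_capture_strip {
    my ($word) = @_;
    $word =~ s/(?:^\W+)|(?:\W+$)//g;
    return $word;
}

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    capture     => sub { capture_strip('"Wilmer!"') },
    non_capture => sub { non_capture_strip('"Wilmer!"') },
});
```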

      Additionally, \w doesn't mean what you think it means.
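
      A small sketch of the point: \w matches letters, digits and the underscore, so \W (and [^\w\d], which is the same set) leaves underscores in place. The helper names and the stricter ASCII-only class below are illustrative assumptions, not something from the original question:

```perl
use strict;
use warnings;

# Hypothetical helper: the \W-based substitution from above.
sub strip_nonword {
    my ($word) = @_;
    $word =~ s/(?:^\W+)|(?:\W+$)//g;
    return $word;
}

# Hypothetical stricter variant: keep only ASCII letters and digits at the ends.
sub strip_nonalnum {
    my ($word) = @_;
    $word =~ s/(?:^[^A-Za-z0-9]+)|(?:[^A-Za-z0-9]+$)//g;
    return $word;
}

print strip_nonword('__init__'),  "\n";   # underscores count as \w, so they stay
print strip_nonalnum('__init__'), "\n";   # underscores are stripped here
```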

        I did basically what your second regexp does, but in two steps. I will try yours, though; I am curious about the difference in speed between them. I am also wondering what you mean by \w not meaning what I think it means.