Re: stripped punctuation

Re^2: stripped punctuation by thealienz1 (Pilgrim) on Oct 06, 2005 at 20:20 UTC
After looking at your regexp I simplified my needs to: `$word =~ s/^[^\w\d]+(.*?)[^\w\d]+$/$1/;` My intention is to remove everything that is not a letter or number up to the first letter, then keep everything up to the trailing run of non-letters and non-digits. When I look at it, it makes sense, but in my testing it does not work. Update: It works on the simple example I gave for 'Wilmer!'. I was running the word count with a script as the input, and the odd results I was seeing came from the syntax in the script. I apologize.
Re^3: stripped punctuation by fishbot_v2 (Chaplain) on Oct 06, 2005 at 20:50 UTC
Except you want to strip punctuation from the beginning or end. The above regex only works if there is punctuation at both beginning and end. If removing any trailing/leading punctuation is in fact your goal, what about something like:

    use strict;
    use warnings;

    my $word = 'Wilmer",';
    $word =~ s/^ \W*          # ignore any leading punc
               ( \w .*? )     # swallow everything lazily
               (?: \W+ )? $   # ignore any trailing punc
              /$1/x;
    print $word;

Update: Mind you, at that point, a much simpler regex will likely serve you better in terms of speed and readability: `$word =~ s/(?:^\W+)|(?:\W+$)//g;`

Final update - benchmark:

                   Rate     capture non_capture
    capture     16561/s          --        -28%
    non_capture 22861/s         38%          --

The second suggestion is about 30% faster, on average. Additionally, `\w` doesn't mean what you think it means.
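To make the "both ends" point concrete, here is a minimal sketch comparing the two substitutions on a few inputs. The helper names `strip_both_ends` and `strip_either_end` are made up for illustration, and the first regex assumes the capture group in the earlier post was meant to be the lazy `(.*?)`.

```perl
use strict;
use warnings;

# Hypothetical helper: the earlier regex, which needs punctuation
# at BOTH the start and the end before it matches at all.
sub strip_both_ends {
    my ($word) = @_;
    $word =~ s/^[^\w\d]+(.*?)[^\w\d]+$/$1/;
    return $word;
}

# Hypothetical helper: the alternation form, which removes a leading
# run and/or a trailing run of non-word characters independently.
sub strip_either_end {
    my ($word) = @_;
    $word =~ s/(?:^\W+)|(?:\W+$)//g;
    return $word;
}

# Print each input next to what the two helpers return.
for my $input ('"Wilmer",', 'Wilmer!', '(Wilmer') {
    printf "%-10s both=%-10s either=%s\n",
        $input, strip_both_ends($input), strip_either_end($input);
}
```

Only the first input has punctuation on both sides, so it is the only one the anchored-both-ends regex changes; the alternation form reduces all three to `Wilmer`.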
Re^4: stripped punctuation by thealienz1 (Pilgrim) on Oct 06, 2005 at 21:37 UTC
I did basically your second regexp there in two steps. I will try yours, though. I am curious about the difference in speed between them. Of course, I am wondering what you mean by `\w` doesn't mean what I think it means.
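One likely reading of the `\w` remark (an assumption, since the comment itself is not spelled out here): in Perl, `\w` matches letters, digits, and the underscore, so the `\d` in `[^\w\d]` is redundant and underscores are never treated as punctuation. A small sketch, with `is_word_char` and `strip_edges` as hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helper: true if a single character matches \w.
sub is_word_char {
    my ($char) = @_;
    return $char =~ /^\w$/ ? 1 : 0;
}

# Hypothetical helper: strip leading/trailing non-word characters.
sub strip_edges {
    my ($word) = @_;
    $word =~ s/(?:^\W+)|(?:\W+$)//g;
    return $word;
}

# Letters, digits, and '_' match \w; '-' and the apostrophe do not.
for my $char ('a', '7', '_', '-', "'") {
    printf "%-3s %s\n", $char,
        is_word_char($char) ? 'matches \w' : 'does not match \w';
}

# Underscores count as word characters, so they survive the stripping.
print strip_edges('__init__!'), "\n";
```

If underscores should count as punctuation for the word count, an explicit class such as `[^A-Za-z0-9]` in place of `\W` would be one way to say so.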
Re^5: stripped punctuation by fishbot_v2 (Chaplain) on Oct 06, 2005 at 21:45 UTC
Re^5: stripped punctuation by Nkuvu (Priest) on Oct 06, 2005 at 21:50 UTC