in reply to Regexp explanation

The regex you're using matches a "word character" (i.e. any alphanumeric character or the underscore) or a single quote, as many of them in a row as possible. If you also want it to match a dash, you have to say so:
$string =~ /((\w|'|-)+)/g
But that's a bit un-regex-like. A character class is the natural tool in this situation:
$string =~ /([-\w']+)/g
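A quick way to convince yourself that the alternation and the character class match the same text is to run both on one string. The sample string below is made up for illustration. One detail: the inner group is written as (?:...) here, because in list context m//g returns every capture group, so the original ((\w|'|-)+) would also hand back the last character of each match as an extra element.

```perl
use strict;
use warnings;

# Hypothetical sample text: an apostrophe inside a word, an internal
# dash, quotes around a word, and dashes around a single letter.
my $string = q{that're well-known 'quoted' -x-};

# Alternation version, with the inner group made non-capturing so that
# list-context m//g returns only the outer capture for each match.
my @alt = $string =~ /((?:\w|'|-)+)/g;

# Character-class version.
my @class = $string =~ /([-\w']+)/g;

print "alternation: @alt\n";
print "class:       @class\n";
```

Both lines should print that're well-known 'quoted' -x-, with the quotes around quoted and the dashes around x still attached.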
But this also matches dashes and apostrophes at the beginning or end of a word, which may or may not be what you want. If not, you could require that a dash or apostrophe sit between word characters:
$string =~ /(\w+[-']?\w+)/g
But this has the unwanted effect that a word must be at least two characters long. So we can add an alternation that also allows a single-character word (or number) whenever the first branch (two or more word characters, optionally with a dash or apostrophe between them) can't match:
$string =~ /(\w+[-']?\w+|\w)/g
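To see the difference the |\w branch makes, here is a small comparison of the last two patterns on a made-up sentence that contains one-letter words.

```perl
use strict;
use warnings;

# Hypothetical sample text with one-letter words ("I", "a") and a
# letter surrounded by dashes.
my $string = q{I saw a well-known 'quoted' -x- word};

# At least two word characters, optionally joined by one dash or apostrophe.
my @two_or_more = $string =~ /(\w+[-']?\w+)/g;

# Same, but fall back to a single word character when the first branch fails.
my @with_single = $string =~ /(\w+[-']?\w+|\w)/g;

print "two or more: @two_or_more\n";
print "with single: @with_single\n";
```

The first list loses I, a and x; the second keeps them. In both, the quotes around quoted and the dashes around x are no longer part of the match.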
A small, complete test case:
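#!/usr/local/bin/perl
use strict;
use warnings;

# Undefine the input record separator so <DATA> reads the whole section at once.
local $/;
my $string = <DATA>;

# Words are two or more word characters, optionally joined by a single
# dash or apostrophe, or else a lone word character.
while ($string =~ /(\w+[-']?\w+|\w)/g) {
    print "Word: <$1>\n";
}

__DATA__
This is a sentence with words that're different from other words. They have apostrophes in them (') and dashes, or dash-like characters (-).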
Try running this code and compare the output with the sentence in the __DATA__ section.
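One thing worth testing: the pattern allows only one dash or apostrophe per match, so a compound such as mother-in-law comes out in two pieces. A possible generalization, which is my own suggestion and not something from the discussion above, is \w+(?:[-']\w+)*: a run of word characters followed by any number of "-word" or "'word" pieces. Because it already matches a single character, it doesn't need the |\w branch. The sample string is made up.

```perl
use strict;
use warnings;

# Hypothetical sample text with compounds containing several dashes
# or apostrophes, plus a letter surrounded by dashes.
my $string = q{mother-in-law isn't rock'n'roll -x-};

# Pattern from the post: at most one dash or apostrophe per match.
my @one_joint = $string =~ /(\w+[-']?\w+|\w)/g;

# Assumed generalization: word characters, then zero or more
# dash/apostrophe + word-character pieces. Also matches one-letter words.
my @many_joints = $string =~ /(\w+(?:[-']\w+)*)/g;

print "one joint:   @one_joint\n";
print "many joints: @many_joints\n";
```

The first line should show mother-in law isn't rock'n roll x, the second mother-in-law isn't rock'n'roll x.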

The ultimate guide to regular expressions (in my opinion) is Jeffrey Friedl's Mastering Regular Expressions, 2nd Edition.

Arjen

Re: Re: Regexp explanation
by Anonymous Monk on Apr 02, 2004 at 20:57 UTC
    Thanks, I'll run the code now. There are so many possible ways of combining words into a single compound word that I need to test every case.