in reply to Regex - Matching prefixes of a word

If I'm following you correctly, you want 'h', 'he', 'hel', 'hell' and 'hello' all to match when the test is for 'hello'. You can use \b (Using character classes) to anchor at the start (or end) of the word (it's a zero-width match), and then a set of nested non-capturing groupings (Non capturing groupings), each using | (alternation, Matching this or that) to choose between stopping at a word boundary and continuing with the next letter. So for hello, you could use:

$line =~ /\bh(?:\b|e(?:\b|l(?:\b|l(?:\b|o))))/i;

I've added case-insensitivity for good measure. Note that this does not match 'howdy'. Since I imagine you'll want to do this for multiple words, you'll probably want to auto-generate your regular expressions, using something like:

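use strict;
use warnings;

while (my $word = <DATA>) {
    chomp $word;    # keep the newline out of the regex
    my $regex = '\b' . substr($word, 0, 1);
    foreach my $letter (split //, substr($word, 1)) {
        $regex .= '(?:\b|' . $letter;    # open one group per letter
    }
    $regex .= ')' x (length($word) - 1);    # close them all
    print "$regex\n";
}
__DATA__
hello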
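As a quick sanity check of the hand-written hello pattern, something like the following sketch tries it against a few sample words. The helper name has_hello_prefix and the word list are made up for illustration. One thing it shows is that nothing anchors the pattern after the final 'o', so a longer word such as 'hellos' is also accepted.

```perl
use strict;
use warnings;

# Hand-written pattern from the post, unchanged.
my $re = qr/\bh(?:\b|e(?:\b|l(?:\b|l(?:\b|o))))/i;

# Hypothetical helper: returns 1 if $line matches the pattern, 0 otherwise.
sub has_hello_prefix {
    my ($line) = @_;
    return $line =~ $re ? 1 : 0;
}

# Illustrative inputs only. 'howdy' and 'help' are rejected; 'hellos' is
# accepted because there is no \b after the last letter of the pattern.
for my $word (qw(h he hel hell hello Hell howdy help hellos)) {
    printf "%-7s %s\n", $word, has_hello_prefix($word) ? 'match' : 'no match';
}
```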

Update: As usual, ikegami's code is better than mine. A rewritten autogenerator using his pattern (Update 2: including dependence on Perl version for regex):

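use strict;
use warnings;

# Possessive quantifiers (?+) need Perl 5.10 or later
my $five10 = $] >= 5.010 ? 1 : 0;

while (my $word = <DATA>) {
    chomp $word;    # keep the newline out of the regex
    my $regex = '\b' . substr($word, 0, 1);
    foreach my $letter (split //, substr($word, 1)) {
        $regex .= '(?:' . $letter;
    }
    $regex .= (')?' . '+' x $five10) x (length($word) - 1) . '\b';
    print "$regex\n";
}
__DATA__
hello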
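For use inside a larger program, the same generator could be wrapped in a function that returns a compiled regex. This is only a sketch: the name prefix_regex is hypothetical, the word is assumed to be non-empty, and the quotemeta call is my own addition so that words containing regex metacharacters don't break the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: build a case-insensitive regex that matches any
# non-empty prefix of $word as a whole word. Assumes $word is non-empty.
sub prefix_regex {
    my ($word) = @_;

    # Escape each character in case the word contains metacharacters.
    my ($first, @rest) = map { quotemeta($_) } split //, $word;

    # Possessive quantifiers need Perl 5.10 or later.
    my $close = $] >= 5.010 ? ')?+' : ')?';

    my $pattern = '\b' . $first
                . join('', map { "(?:$_" } @rest)    # one group per letter
                . $close x @rest                     # close every group
                . '\b';
    return qr/$pattern/i;
}

my $re = prefix_regex('hello');
for my $word (qw(h hel hello hellos howdy)) {
    print "$word: ", ($word =~ $re ? 'match' : 'no match'), "\n";
}
```

With the trailing \b in place, 'hellos' and 'howdy' should both be rejected, while 'h', 'hel' and 'hello' are accepted.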

Re^2: Regex - Matching prefixes of a word
by ikegami (Patriarch) on Jul 24, 2009 at 16:24 UTC
    That's overly complicated. Move the \b to the end.
    $line =~ /\bh(?:e(?:l(?:l(?:o)?)?)?)?\b/i;
    In 5.10 and later, you can even disable the needless backtracking by making the quantifiers possessive:
    $line =~ /\bh(?:e(?:l(?:l(?:o)?+)?+)?+)?+\b/i;
Re^2: Regex - Matching prefixes of a word
by SuicideJunkie (Vicar) on Jul 24, 2009 at 17:06 UTC

    I hadn't considered generating the regex algorithmically, and it is a great idea. I was concerned about complexity and readability of the regex itself, but I missed the (now) obvious curtain to hide the man behind :)

    A hash of huge, ugly and efficient pregenerated regex fragments to include will work wonderfully!
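One way the hash of pregenerated fragments might look is sketched below. The helper name prefix_fragment, the keyword list and the surrounding $greeting pattern are all assumptions made for illustration; only the nested-optional fragment shape comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the nested-optional fragment for one word as
# a plain string, so it can be interpolated into larger patterns.
sub prefix_fragment {
    my ($word) = @_;
    my ($first, @rest) = map { quotemeta($_) } split //, $word;
    return $first . join('', map { "(?:$_" } @rest) . ')?' x @rest;
}

# Made-up keyword list; every fragment is generated once, up front.
my %fragment = map { $_ => prefix_fragment($_) } qw(hello goodbye status);

# Include a fragment in a larger pattern. The \b on either side keeps the
# abbreviation from matching inside a longer word.
my $greeting = qr/^\s*\b$fragment{hello}\b\s+(\w+)/i;

if ('  hel world' =~ $greeting) {
    print "greeted: $1\n";
}
```

Keeping the fragments as strings rather than compiled regexes means they can be dropped into any larger pattern, and the ugly part stays behind the curtain in one place.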