John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to write a regex that will match a truncated word? For example, if my target word is "foobar", I want to match qr/fooba$/, qr/foob$/, qr/foo$/, qr/fo$/, or qr/f$/.

—John

Re: Matching a truncated word
by grinder (Bishop) on Jul 31, 2001 at 21:09 UTC
    Match the other way around:

    perl -le '$x = shift; print 1 if $x and "foobar" =~ /^\Q$x/'

    update: added a first test to stop empty $x's from matching (thanks Sifmole), and a \Q in the regex as per the Perl Cookbook. Also note that in Real Life I would of course say 'foobar' and not "foobar", but that's neither here nor there.
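A minimal sketch of the same "match the other way around" idea, wrapped in a reusable sub. The name is_truncation_of is hypothetical and does not come from the thread. It checks the length explicitly instead of relying on plain truthiness, so an input such as "0" is not treated as empty (which would matter for a target starting with a digit). It keeps \Q...\E so that regex metacharacters in the candidate are taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $candidate is a non-empty prefix of $target.
# Anchoring the candidate with ^ matches "the other way around", and \Q...\E
# makes any regex metacharacters in $candidate literal.
sub is_truncation_of {
    my ($candidate, $target) = @_;
    # Reject undef and "" explicitly; a plain truth test would also reject "0".
    return 0 unless defined $candidate && length $candidate;
    return $target =~ /^\Q$candidate\E/ ? 1 : 0;
}

for my $word (qw(fooba foo f foobar foobaz bar)) {
    printf "%-7s %s\n", $word,
        is_truncation_of($word, 'foobar') ? 'match' : 'no match';
}
```

Without \Q, a candidate such as "f.o" would be read as a pattern and would match "foobar"; with it, "f.o" is rejected.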

    --
    g r i n d e r
      Minor nitpick: you will test true for the empty case, which is not correct.
Re: Matching a truncated word
by ChemBoy (Priest) on Jul 31, 2001 at 21:12 UTC

    I think that in this case the best approach uses substr:

    my $matched = do { my $i = length $_; $_ eq substr "foobar",0,$i; };
    or alternatively
    my $matched = ($_ eq substr "foobar",0,length);
    (if we're going to use special variables, may as well do it right...)
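The substr comparison can also be packaged as a sub that takes both strings explicitly instead of relying on $_. This is only a sketch: is_prefix_substr is a hypothetical name, and the guards against undefined, empty, and over-long candidates are additions that the one-liners above do not have.

```perl
use strict;
use warnings;

# Hypothetical helper: the substr() comparison as a reusable sub.
# $candidate is a truncation of $target when it equals the first
# length($candidate) characters of $target.
sub is_prefix_substr {
    my ($candidate, $target) = @_;
    return 0 unless defined $candidate;
    my $len = length $candidate;
    return 0 if $len == 0 || $len > length $target;   # empty, or longer than target
    return $candidate eq substr($target, 0, $len) ? 1 : 0;
}

for my $cand ('foob', 'oob') {
    my $ok = is_prefix_substr($cand, 'foobar');
    print "$cand: ", ($ok ? 'match' : 'no match'), "\n";
}
```

Because no regex is built, metacharacters in the candidate need no escaping at all.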

    The idea is from the CB discussion that led to <plug type="shameless">this snippet </plug>, which seems to work--I am hopeful therefore that this will also work. :-)



    If God had meant us to fly, he would *never* have given us the railroads.
        --Michael Flanders

Re: Matching a truncated word
by dragonchild (Archbishop) on Jul 31, 2001 at 20:54 UTC
    Well, one possibility is not to use a regex at all, but instead to do something like:

    my @word  = split //, $word_to_match;
    my $match = 0;
    for (0 .. $#word) {
        # compare against each prefix: "f", "fo", "foo", ...
        if ($word_to_compare eq join '', @word[0 .. $_]) {
            $match = 1;
            last;
        }
    }
    print "Matched!\n" if $match;
Re: Matching a truncated word
by Sifmole (Chaplain) on Jul 31, 2001 at 21:01 UTC
    How about this?
    $f =~ /f(o(o(b(a(r)?)?)?)?)?$/
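The nested pattern can be generated for an arbitrary word instead of being typed by hand. The sketch below (nested_trunc_pattern is a hypothetical name) builds the same nesting from the last character inward, using non-capturing (?:...) groups and quotemeta on each character. Like the hand-written version, it has no anchor or word boundary at the front, so it will also accept the final "f" of a word such as "leaf".

```perl
use strict;
use warnings;

# Hypothetical builder for nested optional groups.
# For "foobar" it returns f(?:o(?:o(?:b(?:a(?:r)?)?)?)?)?
sub nested_trunc_pattern {
    my ($word) = @_;
    die "need a non-empty word\n" unless defined $word && length $word;
    my @chars = split //, $word;
    my $pat   = '';
    # Work from the last character inward, wrapping the tail in (?:...)? each time.
    for my $c (reverse @chars[1 .. $#chars]) {
        $pat = '(?:' . quotemeta($c) . $pat . ')?';
    }
    return quotemeta($chars[0]) . $pat;
}

my $re = nested_trunc_pattern('foobar');
print "$re\n";
for my $s ('ends in foob', 'ends in foobar', 'ends in bar') {
    print "$s: ", ($s =~ /$re$/ ? 'match' : 'no match'), "\n";
}
```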
      Wow, is that ugly. I frequently have a difficult time with nested parens in regexes, so I wrote a quick test script to use this regex to figure out where $1, $2 etc would go:
      #!/usr/bin/perl -w
      $string = "foobar";
      if ($string =~ /(f(o(o(b(a(r)?)?)?)?)?)$/) {
          print "$1\n";
          print "$2\n";
          print "$3\n";
      }
      There is nothing really earth-shattering in the results:
      foobar
      oobar
      obar
      So essentially, this will do very much what I suppose John wants to do, as the most complete match goes to $1.

      Very nice, Sifmole, but I bet this tends to be a pretty slow regex.

      Scott

        I didn't test the speed, but I wouldn't be surprised to find out that it was slow. There are better ways to do this, as other replies show; however, I decided to be literal and present a regex, since that was what he asked for. :)
Re: Matching a truncated word
by John M. Dlugosz (Monsignor) on Jul 31, 2001 at 23:07 UTC
    Clarification: It should match the truncated piece of the last word, but the string being searched contains stuff before that. E.g.
    $text = "This sentence ends in foob";   # truncated from "...foobar"
    $text =~ /foobar$/;
    $text =~ /fooba$/;
    $text =~ /foob$/;
    # ... etc. ...
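For the clarified problem, the "match the other way around" idea still applies if the last word is pulled out first. A minimal sketch, assuming a "word" is a trailing run of \w characters with nothing after it (so a final period or space means no match); ends_with_truncation is a hypothetical name. It uses index() == 0 as the prefix test, which sidesteps regex escaping of the target entirely.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the trailing word of $text if it is a
# non-empty prefix of $target, otherwise undef.
# Assumes a "word" is a run of \w characters at the very end of $text.
sub ends_with_truncation {
    my ($text, $target) = @_;
    my ($last) = $text =~ /(\w+)$/;     # capture the trailing word, if any
    return undef unless defined $last;
    # index() returns 0 only when $last appears at the very start of $target.
    return index($target, $last) == 0 ? $last : undef;
}

for my $text ('This sentence ends in foob', 'This sentence ends in oob') {
    my $hit = ends_with_truncation($text, 'foobar');
    print "$text: ", (defined $hit ? "truncated word $hit" : 'no match'), "\n";
}
```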
      Hopefully this isn't too obfuscated; I've tried to restrain myself here. The trunc_match function builds a regular expression for any given input string, and should handle weird characters too, since each piece is escaped with quotemeta().
      sub trunc_match {
          my ($what) = @_;
          my @bits;
          for (1 .. length $what) {
              push @bits, $what;
              chop $what;
          }
          return '(' . join('|', map { quotemeta($_) } @bits) . ')$';
      }

      my $rx = trunc_match("foobar");
      $_ = "This sentence ends in foob";
      if (/$rx/) {
          print "Truncated, ends in '$1'\n";
      }
      The format of the regex is something like:
      (foobar|fooba|foob|foo|fo|f)$
      So you get whatever you're looking for in $1, or the returned array if you're brave enough to use /g.
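One refinement worth considering: as written, the alternation will also match the final "f" of an unrelated trailing word such as "leaf". Below is a sketch of a variant (trunc_match_word is a hypothetical name) that adds a \b word boundary in front and returns a compiled qr// object. It assumes the target word begins with a word character, since \b behaves differently otherwise.

```perl
use strict;
use warnings;

# Hypothetical variant of trunc_match: same alternation of prefixes,
# longest first, but anchored at a word boundary as well as at the end.
sub trunc_match_word {
    my ($what) = @_;
    # All non-empty prefixes, longest first: foobar, fooba, ..., f
    my @bits = map { substr($what, 0, $_) } reverse 1 .. length $what;
    my $alt  = join '|', map { quotemeta($_) } @bits;
    my $pat  = '\b(' . $alt . ')$';    # single quotes keep \b and $ literal
    return qr/$pat/;
}

my $rx = trunc_match_word('foobar');
for my $text ('This sentence ends in foob', 'This sentence ends in leaf') {
    if ($text =~ $rx) {
        print "Truncated, ends in '$1'\n";
    }
    else {
        print "No truncated foobar at the end\n";
    }
}
```

Listing the prefixes longest first still matters here: the engine tries alternatives left to right, so $1 receives the longest prefix that fits.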

      Update:
      For some reason, I had confused qr with quotemeta, and so I am updating the code here to be more sensible in that regard. Thanks, once again, Hofmator.
        That's an interesting use of qr// to escape out the individual parts, instead of using logic involving \Q..\E. I like the idea, since it avoids the confusion of escape chars in strings vs. in regex, and multiple levels of escaping. I'll be sure to remember that.
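For reference, quotemeta() and \Q...\E apply the same escaping; \Q is the interpolating-string spelling of quotemeta. A small sketch comparing the two (the sample string is arbitrary):

```perl
use strict;
use warnings;

# quotemeta() and \Q...\E apply the same escaping.
my $raw          = 'a.b*c';
my $via_function = quotemeta $raw;
my $via_escape   = "\Q$raw\E";
if ($via_function eq $via_escape) {
    print "same\n";
}
else {
    print "different\n";
}

# Escaped, the metacharacters are literal inside a regex.
print "literal match\n"     if 'xa.b*cx' =~ /\Q$raw\E/;
print "no wildcard match\n" unless 'aXbbbc' =~ /\Q$raw\E/;
```

Unescaped, /a.b*c/ would happily match "aXbbbc", which is exactly the surprise the escaping prevents.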