Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to remove the first occurrence of the literal '.vacation.msg' (or any piece of its name), because of a small problem with the way my logging script works. Since I'm over my deadline, this is what I generated:
s/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)//;
Does anyone know a non-ugly way to do the same thing?

Replies are listed 'Best First'.
Re: more efficient regular expression please
by hardburn (Abbot) on Jul 03, 2003 at 14:32 UTC

    Probably slower than what you're using, but easier to understand (I hope):

    for (my $str = 'vacation.msg'; $str; $str = substr($str, 1, length($str))) {
        $replace_on =~ s/$str//;
    }

    Ahh, the rarely seen (around here, anyway) three-element form of for. I've tested this loop and it produces the strings correctly. I don't know if this exactly matches your needs, but you should be able to modify it easily enough.

    Update: Forgot that you need to escape the '.' before it gets put in a regex, which complicates things a bit.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

      Forgot that you need to escape the '.' before it gets put in a regex, which complicates things a bit.
      Not by that much. All you need to add is '\Q':
      $replace_on =~ s/\Q$str//;
      I won't comment on the actual value of your idea, though. :)
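A minimal sketch of the loop with '\Q' applied, for anyone who wants to try it. Two things here are my assumptions and not part of the posts above: the helper name strip_first_suffix is hypothetical, and the loop stops after the first successful substitution (longest piece first) so that only one piece is removed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove the first piece of
# $name found in $text, trying the longest suffix first.
sub strip_first_suffix {
    my ($text, $name) = @_;
    for my $start (0 .. length($name) - 1) {
        my $piece = substr $name, $start;
        # \Q quotes regex metacharacters such as '.'
        return $text if $text =~ s/\Q$piece//;
    }
    return $text;    # nothing matched, text unchanged
}

print strip_first_suffix('foo cation.msg bar', 'vacation.msg'), "\n";
```

Note that the shortest pieces ('sg', 'g') will match almost any text, so in practice you would probably want a minimum piece length.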
Re: more efficient regular expression please
by l2kashe (Deacon) on Jul 03, 2003 at 15:10 UTC
    So I'm not sure what you mean by more efficient: in terms of runtime, or in terms of regex construction? I also don't know what you mean by "any pieces of its name", or rather how paranoid you want to be, but this is what I have.
    #!/usr/bin/perl
    use strict;

    my $tomatch = 'foobar the vacation.msg is here';

    # Build every suffix of 'vacation': n, on, ion, ... vacation
    my $str = '';
    my @permute = map { $str = $_ . $str } reverse( split(//, 'vacation') );

    # Longest alternative first, so the fullest piece is preferred
    my $regex = '(\.?(?:' . join('|', reverse(@permute)) . ')\.msg)';

    print "$regex\n";
    print "Match: $1\n" if ( $tomatch =~ m/$regex/ );
    HTH

    MMMMM... Chocolaty Perl Goodness.....
Re: more efficient regular expression please
by BrowserUk (Patriarch) on Jul 03, 2003 at 16:04 UTC

    Assuming that I haven't screwed up again, which is by no means guaranteed, your regex doesn't quite catch all the cases.

    Input  cation.msgThe quick brown fox jumps over the lazy dog

    Output  caThe quick brown fox jumps over the lazy dog
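The case can be reproduced with a few lines. This is only a sketch: the pattern below is the original one as I read it, with what look like line-wrap artifacts (stray spaces and a '+') removed, which is an assumption about what was intended.

```perl
use strict;
use warnings;

# The original pattern as I read it, with the line-wrap debris removed
# (an assumption about the poster's intent).
my $re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

my $line = 'cation.msgThe quick brown fox jumps over the lazy dog';
(my $out = $line) =~ s/$re//;

# No alternative covers 'cation.msg', so the first thing that matches
# is 'tion.msg', and the leading 'ca' is left behind.
print "$out\n";
```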

    However, that could be corrected. And the sad truth is that your version actually runs a tad faster than my (second) attempt, and also tye's offering, though the difference is marginal, and may fade once you correct the error.

    The only saving grace is that mine is shorter and prettier, but then tye's is even shorter and prettier than mine, and runs a tad quicker to boot.

    I'm taking a vacation as of now, from PM at least. It would be nice to take a real one, but that's not an option. I might be back someday.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: more efficient regular expression please
by BrowserUk (Patriarch) on Jul 03, 2003 at 15:01 UTC

    This seems to do what you want, if I understand you correctly.

    $line =~ s[v?a?c?a?t?i?o?n\.msg][];

    Update: Right idea, wrong execution. See 271190 below :( I think this is better.

    $line =~ s[(?:(?:(?:(?:(?:(?:v?a)?c)?a)?t)?i)?o)?n\.msg][];





      That also removes cat.msg, vain.msg, etc. Closer to the original intent (as best as I can tell) would be:

      $line =~ s[v?(a?(c?(a?(t?(i?(o?(n?(\.?(m?(s?g))))))))))][];

                      - tye
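To see what the grouping buys, here is a small sketch comparing the all-optional pattern from the parent post with the nested form from its update. The helper name strip and the second test string are mine, chosen only for illustration.

```perl
use strict;
use warnings;

my $flat   = qr/v?a?c?a?t?i?o?n\.msg/;
my $nested = qr/(?:(?:(?:(?:(?:(?:v?a)?c)?a)?t)?i)?o)?n\.msg/;

# Hypothetical helper: apply one substitution and return the result.
sub strip {
    my ($re, $text) = @_;
    $text =~ s/$re//;
    return $text;
}

for my $input ('vain.msg', 'xxcation.msgyy') {
    printf "%s: flat='%s' nested='%s'\n",
        $input, strip($flat, $input), strip($nested, $input);
}
```

With the flat pattern every letter is independently optional, so 'vain.msg' vanishes entirely. The nested pattern only allows a letter when everything after it is present, so it takes just the 'n.msg' tail and leaves 'vai'. Both give the same result on a genuine piece such as 'cation.msg'.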

        You're right. It does need brackets. I had another go, similar in vein to yours, though I bracketed the other way. Both seem to work, with the same caveat: if the partial insertion is preceded by one or more characters that match the missing part, they get removed as well. Without some delimiter, this will always be the case.

        Any thought about which bracketing causes the least amount of work?
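One way to answer that empirically is the core Benchmark module. The sketch below is only a template: it times the nested form against a generated alternation of suffixes, which is my stand-in for the second candidate. Swap in whichever bracketings you actually want to compare; the test line and iteration count are arbitrary.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $name = 'vacation';

# Alternation of suffixes, longest first: vacation|acation|...|n
my $alt    = join '|', map { quotemeta substr $name, $_ } 0 .. length($name) - 1;
my $alt_re = qr/(?:$alt)\.msg/;
my $nested = qr/(?:(?:(?:(?:(?:(?:v?a)?c)?a)?t)?i)?o)?n\.msg/;

my $line = 'The quick brown fox jumps over the lation.msgzy dog';

# Each sub copies the line and runs one substitution on the copy.
cmpthese(200_000, {
    alternation => sub { (my $s = $line) =~ s/$alt_re// },
    nested      => sub { (my $s = $line) =~ s/$nested// },
});
```

Timings will vary with the perl version and the input, so treat any single run as a hint, not a verdict.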




      oops!

      last two lines of output

      The quick brown fox jumps oer the lazy dog
      The quick brown fox jumps over the lazy dog

      Where did that 'v' go?


        You're right. It was flawed. Another attempt.

        #! perl -slw
        use strict;

        # Set up some test data.
        my $text = 'The quick brown fox jumps over the lazy dog';

        # Insert each of the first seven suffixes of 'vacation.msg'
        # at a random position in a copy of the text.
        my @tests = map{
            my $t = $text;
            substr($t, rand( length $text ), 0 ) = $_;
            $t;
        } map{ substr 'vacation.msg', $_ } 0 ..6;

        print "Before\n";
        print for @tests;

        # Remove the (possibly truncated) insertion from each test string.
        s[(?:(?:(?:(?:(?:(?:v?a)?c)?a)?t)?i)?o)?n\.msg][] for @tests;

        print "\nAfter\n";
        print for @tests;

        __END__
        P:\>junk
        Before

        The quick brown fox jumps ovvacation.msger the lazy dog
        The quick brown fox jumps oacation.msgver the lazy dog
        The quick brocation.msgwn fox jumps over the lazy dog
        The quick brown fox jumpsation.msg over the lazy dog
        The quick brown fox jumps over the lation.msgzy dog
        Tion.msghe quick brown fox jumps over the lazy dog
        The quick brown fox jumps over the lazy on.msgdog

        After

        The quick brown fox jumps over the lazy dog
        The quick brown fox jumps over the lazy dog
        The quick brown fox jumps over the lazy dog
        The quick brown fox jumps over the lazy dog
        The quick brown fox jumps over the lzy dog
        The quick brown fox jumps over the lazy dog
        The quick brown fox jumps over the lazy dog

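        This time, the apparent error is explainable: in the fifth test, 'tion.msg' was inserted directly after the 'a' of 'lazy', so the text reads 'ation.msg' and the regex legitimately removes that 'a' as well.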



Re: more efficient regular expression please
by CountZero (Bishop) on Jul 04, 2003 at 13:20 UTC

    Rather than immediately jumping into churning out code, we should perhaps ask why one should also look at "any pieces of its name"?

    Is it perhaps because this string might be split over two lines?

    If that is the case, then I can imagine that simpler solutions are available.
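If a line break in the middle of the name really is the issue (an assumption; the original question does not say so), one possible sketch is to match the full literal while tolerating an optional newline between any two characters. The sample log text and path below are invented for illustration.

```perl
use strict;
use warnings;

my $name = '.vacation.msg';

# Quote each character, then allow an optional line break between them.
my $pattern = join '\n?', map { quotemeta($_) } split //, $name;

# Invented sample: the name is split across two lines.
my $log = "delivered to /home/user/.vaca\ntion.msg at noon\n";

(my $clean = $log) =~ s/$pattern//;
print $clean;
```

This removes only the complete name, whether or not it is wrapped, so unrelated text containing short pieces such as 'g' or 'msg' is left alone.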

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law