nmerriweather has asked for the wisdom of the Perl Monks concerning the following question:

I can't get this regex right. I want to split a die statement into 2 components -- the user raised error ( die "MY ERROR" ) and the perl remark ( "at /path/to/file line 3." )

I've been staring at this too long. Can someone point me in the right direction? This doesn't work at all, but it should illustrate what I mean:
sub die_clean { my ( $input ) = @_; my ( $msg , $line ) = $input =~ /^([.]+)( at [\\\/\w\.\-\ ] line [ +\d]+\.)$/; }

Re: regex die?
by johngg (Canon) on Mar 19, 2006 at 22:15 UTC
    I think part of your problem is that you are using square brackets [] inappropriately. They set up a character class, so your [.]+ will match one or more literal full stops; inside a character class the full stop loses its meta-character meaning of "match any character" and becomes a literal. Likewise, [abc]{3} would match exactly three characters, each of which is a, b or c.
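A minimal sketch of that difference. The sample strings are made up for illustration and are not from the original question:

```perl
use strict;
use warnings;

# Hypothetical sample string, not from the original post.
my $text = "oops... at x";

# [.]+ matches only a run of literal full stops.
my ($dots) = $text =~ /([.]+)/;

# .+ matches any characters except newline, so here it takes the whole string.
my ($any) = $text =~ /(.+)/;

# [abc]{3} matches exactly three characters, each being a, b or c.
my ($three) = "xxcabxx" =~ /([abc]{3})/;

print "dots:  <$dots>\n";
print "any:   <$any>\n";
print "three: <$three>\n";
```

This is why the [.]+ in the original attempt can never match the text of an error message unless that message consists of nothing but full stops.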

    It also makes things easier when matching paths to choose a delimiter character other than "/" by using the m operator. You can choose a character like the pipe symbol "|" or balanced brackets like {}. Once you've done that you no longer have to escape the "/" characters.
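To illustrate the delimiter point, here is a small sketch that matches the same hypothetical path three ways. The path and variable names are assumptions for the example only:

```perl
use strict;
use warnings;

# Hypothetical path, used only for illustration.
my $path = "/path/to/file";

# Default delimiters: every "/" in the pattern has to be escaped.
my $slashes = $path =~ /^\/path\/to\/file$/ ? 1 : 0;

# Balanced braces as delimiters: the same pattern with no escaping.
my $braces = $path =~ m{^/path/to/file$} ? 1 : 0;

# A pipe works too, as long as the pattern itself needs no "|" alternation.
my $pipes = $path =~ m|^/path/to/file$| ? 1 : 0;

print "$slashes $braces $pipes\n";
```

All three patterns are equivalent; the brace and pipe forms are simply easier to read when the pattern is full of slashes.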

    My (non-tested) attempt at your sub die_clean would be:-

    sub die_clean {
        my $input = shift;
        my ($msg, $line) = $input =~ m{(.+?)( at /\S+ line \d+\.)$};
    }

    The $ sign anchors the match to the end of the string, just on the off-chance that the "MY ERROR" part itself contains "at /path/to/file line n.", which is unlikely but stranger things have happened. The .+? says match one or more of any character in a non-greedy fashion, so it stops as soon as the rest of the pattern can match. Without the ?, a .+ would first consume the entire string and then have to backtrack until the rest of the regular expression could match.
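A runnable sketch of the anchored, non-greedy approach described above. The explicit return, the leading ^ anchor, the name $where and the test messages are additions for illustration, and it assumes a Unix-style path without spaces:

```perl
use strict;
use warnings;

# Sketch only: splits a die message into the user text and the
# " at FILE line N." remark that perl appends.
sub die_clean {
    my ($input) = @_;
    my ($msg, $where) = $input =~ m{^(.+?)( at /\S+ line \d+\.)$};
    return ($msg, $where);
}

# Typical message: die adds the remark and a trailing newline.
my @plain = die_clean("MY ERROR at /path/to/file line 3.\n");

# Hypothetical message whose own text looks like a perl remark.
my @tricky = die_clean("bad at /tmp/x line 1. at /path/to/file line 3.\n");

print "<$plain[0]> <$plain[1]>\n";
print "<$tricky[0]> <$tricky[1]>\n";
```

Because of the $ anchor, the second call still splits at the final remark rather than at the look-alike inside the message. The $ matches just before the trailing newline, so the newline itself is not captured.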

    Cheers,

    JohnGG

      There's no need for /\S+, however. Better to simply use .+?

        Six of one, a half dozen of the other unless /path/to/file contains spaces. It easily could, of course, but I mainly work on Unix-like systems so it is a bit of a blind spot with me :-)

        Cheers

        JohnGG

Re: regex die?
by GrandFather (Saint) on Mar 19, 2006 at 22:03 UTC

    This seems to be what you want, albeit not very clever - it will fail if the message contains the word "at". :)

    use warnings;
    use strict;

    while (<DATA>) {
        my ($msg, $line) = /((?:(?!\bat\b).)*)(.*)/;
        print "$msg\n $line\n";
    }

    __DATA__
    This is my text at noname2.pl line 4.

    Prints:

    This is my text
     at noname2.pl line 4.
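The caveat about "at" can be seen in a short sketch using the same pattern. The second input string is invented for the example:

```perl
use strict;
use warnings;

# Same pattern as above: take everything up to the first standalone "at".
my $re = qr/((?:(?!\bat\b).)*)(.*)/;

my $ok  = "This is my text at noname2.pl line 4.";
my $bad = "look at this at noname2.pl line 4.";   # hypothetical message

my ($msg1, $rest1) = $ok  =~ $re;
my ($msg2, $rest2) = $bad =~ $re;

print "<$msg1> <$rest1>\n";
print "<$msg2> <$rest2>\n";
```

In the second case the split happens at the first "at", inside the message, so $msg2 is only "look " and the rest of the message ends up in $rest2. No text is lost, though: the two pieces always concatenate back to the original string.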

    DWIM is Perl's answer to Gödel
Re: regex die?
by wazoox (Prior) on Mar 19, 2006 at 21:57 UTC
    Isn't split enough, then? Something like my ($l, $p) = split / at /, $message, 2;
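A small sketch of the split-with-limit idea, using invented messages. The limit of 2 means everything after the first " at " stays in the second field:

```perl
use strict;
use warnings;

# Hypothetical die messages for illustration.
my $message = "MY ERROR at /path/to/file line 3.\n";
my $awkward = "look at this at /path/to/file line 3.\n";

# Limit 2: split only at the first " at ", keep the remainder intact.
my ($l, $p)   = split / at /, $message, 2;
my ($l2, $p2) = split / at /, $awkward, 2;

print "<$l>\n";
print "<$l2>\n";
```

For the first message $l is "MY ERROR" and $p holds the file and line remark, trailing newline included. For the second, the split lands on the " at " inside the message, so $l2 is only "look".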
Re: regex die?
by ayrnieu (Beadle) on Mar 19, 2006 at 22:24 UTC
    my ($msg, $line) = (split /\s*at|line\s*|\.$/, $input)[0,2];

    With GrandFather's caveat.

      except that the regex solution I gave returns all the text, even if it gets the split point wrong. The "split on 'at' solution" throws away anything after a second at, which is a lot worse.



        *shrug*, if I had a version that launched nuclear weapons when the message contained an 'at', it would still be the case that an assumption had been violated and that subsequent assumptions are then in peril.

Re: regex die?
by acid06 (Friar) on Mar 20, 2006 at 02:21 UTC
    Wouldn't something like this suffice?
    sub die_clean {
        my ($err) = @_;
        my ($rmk, $msg) = map {scalar reverse} split(/ ta /, reverse($err), 2);
        my ($line) = ($rmk =~ /line (\d+)/);
    }
    You can even have the word 'at' in the error message.
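A runnable sketch of the reverse-and-split idea. The name die_clean_rev, the explicit return list and the test messages are assumptions added for the example:

```perl
use strict;
use warnings;

# Sketch only: reverse the string, split once on " ta " (a reversed " at "),
# then reverse each piece back. This splits at the LAST " at " in the input.
sub die_clean_rev {
    my ($err) = @_;
    my ($rmk, $msg) = map { scalar reverse $_ }
                      split / ta /, reverse($err), 2;
    my ($line) = $rmk =~ /line (\d+)/;
    return ($msg, $rmk, $line);
}

# Message that itself contains the word "at".
my ($msg, $rmk, $line) = die_clean_rev("look at this at /path/to/file line 3.\n");

# Hypothetical path containing " at ": the split lands inside the path.
my ($msg2, $rmk2, $line2) = die_clean_rev("oops at /my dir at home/x.pl line 7.\n");

print "<$msg> line $line\n";
print "<$msg2> line $line2\n";
```

The first call recovers "look at this" and line 3. The second shows one way this could go wrong: a path with " at " in it moves part of the path into the message, although the line number is still found.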


    acid06
    perl -e "print pack('h*', 16369646), scalar reverse $="
      The split at 'ta' is probably as bad as splitting on 'at'... BUT reversing the string and then applying a regex might be worth a stab. I know regexes have ^ and $, but the reversed form might be more natural to write.
        Find a breakage and I'll buy you a beer.


        acid06