bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monasterians,

Business::FedEx::DirectConnect, used for estimating FedEx shipping costs (and errors), returns a response that uses quotes and commas as "delimiters:"

"F059"3,"Invalid Recipient postal code format for specified country."99,""

I'm interested in parsing what comes after the "3,". I started with a funky looking:

if ($response =~ /"3,"([\w \!\@\#\$\%\*\&\(\)\-\+\=\.\/\?\,\:\;\']+)"/i) { $error = $1; }

I next tried a cleaner, and successful:

$response =~ /3",([^"]+.+)"/;

I have read caveats about using the greedy .+, but the negated character class [^"] seems to save me.
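To make the greedy-versus-negated-class concern concrete, here is a minimal sketch using the sample response from this post. It assumes the pattern anchors on the 3," prefix described above; the variable names $greedy and $bounded are only illustrative.

```perl
use strict;
use warnings;

my $response = '"F059"3,"Invalid Recipient postal code format for specified country."99,""';

# Greedy: .+ runs to the end of the string, then backs off to the LAST quote.
my ($greedy) = $response =~ /3,"(.+)"/;

# Negated class: [^"]+ cannot cross a quote, so it stops at the first one.
my ($bounded) = $response =~ /3,"([^"]+)"/;

print "greedy:  $greedy\n";
print "bounded: $bounded\n";
```

The greedy capture drags in the trailing ."99," part of the response, while the negated class alone yields just the message. Following [^"]+ with .+ brings the greedy behaviour back, so the class only helps when it is the whole capture.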

Is there a faster/safer/better regex for this? Thanks in advance.


—Brad
"The important work of moving the world forward does not wait to be done by perfect men." George Eliot

Re: Regex negation (or golf)
by ikegami (Patriarch) on Feb 20, 2007 at 04:14 UTC
    sub dequote {
        for (my $s = @_ ? $_[0] : $_) {
            s/^"//;
            s/"$//;
            return $_;
        }
    }

    {
        my $qstr = qr{"[^"]*"};
        my $num  = qr{\d+};

        my $response = '"F059"3,"Invalid Recipient postal code format for specified country."99,""';

        my ($error, $code) = $response =~ /^$qstr$num,($qstr)($num),$qstr$/
            or die("Unrecognized format\n");

        $error = dequote $error;

        print("error: $error\n");
        print("code: $code\n");
    }

    $qstr and dequote need to be adjusted to handle your (unmentioned) quoting mechanism.
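As an illustration of what such an adjustment might look like, here is a sketch that assumes embedded quotes are escaped with a backslash. That escaping rule is purely an assumption (the thread never says how FedEx quotes embedded quotes), and the sample response is made up.

```perl
use strict;
use warnings;

# ASSUMPTION: embedded quotes are escaped with a backslash (\").
# The thread does not say how the real responses quote them.
my $qstr = qr{"(?:[^"\\]|\\.)*"};
my $num  = qr{\d+};

sub dequote {
    my ($s) = @_;
    $s =~ s/^"//;          # strip the opening quote
    $s =~ s/"$//;          # strip the closing quote
    $s =~ s/\\(.)/$1/g;    # undo the assumed backslash escapes
    return $s;
}

# Made-up sample response containing escaped quotes.
my $response = '"F059"3,"Bad \"zip\" code."99,""';

my ($error, $code) = $response =~ /^$qstr$num,($qstr)($num),$qstr$/
    or die "Unrecognized format\n";
$error = dequote($error);

print "error: $error\n";
print "code: $code\n";
```

If the real mechanism turns out to be doubled quotes or something else, only $qstr and the unescaping line in dequote would need to change.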

Re: Regex negation (or golf)
by GrandFather (Saint) on Feb 20, 2007 at 03:20 UTC

    What are the fields in that thing? Looks to me rather like comma separated data with two sub-fields per field. If that is the case then maybe something like Text::xSV or Text::CSV is appropriate for a first pass over the line?


    DWIM is Perl's answer to Gödel
Re: Regex negation (or golf)
by ady (Deacon) on Feb 20, 2007 at 06:33 UTC
    Hi, maybe you can use this alternative...

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my $text = '"F059"3,"Invalid Recipient postal code format for specified country."99,""';
    my @text = split /"/, $text;
    print Dumper(\@text);
    print $text[3];

    Output:

    $VAR1 = [
              '',
              'F059',
              '3,',
              'Invalid Recipient postal code format for specified country.',
              '99,'
            ];
    Invalid Recipient postal code format for specified country.

    Best regards,
    allan
Re: Regex negation (or golf)
by johngg (Canon) on Feb 20, 2007 at 10:28 UTC
    The format of the response is potentially difficult so some sort of parser would probably be the safest option. The possibility of commas in the quoted parts of the response makes splitting on comma unreliable. Assuming fields are delimited by commas and sub-fields are one inside the double-quotes and one after them, this should work.

    use strict;
    use warnings;

    use Data::Dumper;

    my $rxFlds = qr{(?x)
        "
        ([^"]*)
        "
        ([^,]*)
        (?:,|\z)
    };

    my $resp = q{"F059"3,"Invalid blech"99,"","This, that"33,""77,};
    my @flds;
    while ( $resp =~ m{$rxFlds}g ) {
        push @flds, [$1, $2];
    }

    print Data::Dumper->Dumpxs([\@flds], [qw{*flds}]);

    Here's the output

    @flds = (
              [ 'F059', '3' ],
              [ 'Invalid blech', '99' ],
              [ '', '' ],
              [ 'This, that', '33' ],
              [ '', '77' ]
            );

    I hope this is of use.

    Cheers,

    JohnGG

      Precisely on the comma thing. The only thing I'm sure about is that a quote begins the next data pair.


      —Brad
      "The important work of moving the world forward does not wait to be done by perfect men." George Eliot
        The other worry would be if your double-quoted field contained double quotes, e.g. $resp = q{"Wrong "name" given"88}; All bets would be off then and you'd probably have to go for a parser solution.

        Cheers,

        JohnGG
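To show why all bets are off, here is a small sketch that runs the same field pattern from the reply above against the embedded-quote example. Nothing here is new apart from the printf line at the end.

```perl
use strict;
use warnings;

# Same pattern as in the reply above: a quoted part, then everything up to a comma.
my $rxFlds = qr{(?x) " ([^"]*) " ([^,]*) (?:,|\z) };

my $resp = q{"Wrong "name" given"88};
my @flds;
while ( $resp =~ m{$rxFlds}g ) {
    push @flds, [$1, $2];
}

# The quoted part is cut short at the first embedded quote,
# and the rest of the message lands in the trailing sub-field.
printf "quoted: [%s]  trailing: [%s]\n", @$_ for @flds;
```

The match still succeeds, which is the dangerous part: the pattern silently returns a truncated message instead of failing.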

Re: Regex negation (or golf)
by Skeeve (Parson) on Feb 20, 2007 at 11:07 UTC

    I'm not sure, but isn't this a kind of weird response string?

    I looked into the CPAN documentation and found nothing about this kind of result. Which method do you use? Wouldn't it be a good idea to look into other methods first for one that can cope with these results?


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e