jdtoronto has asked for the wisdom of the Perl Monks concerning the following question:

I hate regexen. I have been programming Perl for nearly 10 years and I still haven't come to terms with them, so I need a little help here, please:
There is a new voicemail in mailbox 11234567890:
From: "Unknown" <Unknown>
Length: 0:13 seconds
Date: Thursday, January 19, 2006 at 11:35:01 AM
I need to extract the 11-digit number following the word 'mailbox' into one variable, and then get the value of the From: field (in this case unknown) into another variable. The text is in a single string.

Help please!

jdtoronto

Replies are listed 'Best First'.
Re: A little regex help please!
by ikegami (Patriarch) on Jan 31, 2006 at 21:23 UTC

    More of the same, but

    • Lookahead not needed.
    • I think /(.*?)\n/ is slower than /([^\n]*)/.
    • local $/; is safer than undef $/;.
    • It's probably safer to search for "mailbox" followed by a number than for just a number.
    • Hardcoding the number's length is probably undesirable.
    local $/;
    my $text = <DATA>;
    my ($number, $from) = $text =~ /mailbox (\d+).*?From:\s*([^\n]+)/;

    It might be more efficient to replace the .*? with .*. Both will work.
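
The point about local $/ can be illustrated with a small sketch. The helper name slurp(), the in-memory filehandle, and the assumption that the sample text spans four lines are mine, not part of the post. I have also added /s to the pattern so that .*? can cross the newline between the two fields.

```perl
use strict;
use warnings;

# Sample text from the question, assumed here to span four lines.
my $data = join "\n",
    'There is a new voicemail in mailbox 11234567890:',
    'From: "Unknown" <Unknown>',
    'Length: 0:13 seconds',
    'Date: Thursday, January 19, 2006 at 11:35:01 AM',
    '';

# slurp() is a hypothetical helper that reads a whole filehandle at once.
sub slurp {
    my ($fh) = @_;
    local $/;               # $/ is undef only until this sub returns
    return scalar <$fh>;
}

# Read from an in-memory filehandle so the sketch needs no external file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $text = slurp($fh);
close $fh;

# Outside the sub, $/ has its previous value again, so later
# line-by-line reads elsewhere in the program are unaffected.
my $separator_restored = defined $/ && $/ eq "\n";

# /s (an addition) lets ".*?" match across the newline.
my ($number, $from) = $text =~ /mailbox (\d+).*?From:\s*([^\n]+)/s;
print "$number -> $from\n";
```

With undef $/ instead of local $/, the separator would stay undefined for the rest of the program, which is the hazard the bullet above warns about.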

      /.*?\n/ isn't really any slower, since it jumps to the closest newline it finds. The silly thing is the ? in that regex. The .*? in your regex WON'T match over the newlines, so you're missing an /s modifier I think. And since [^\n] IS ., I'd just write:
      {
          local $/;
          my ($num, $from) = <FILE> =~ m{
              mailbox \s+ (\d+): \s+
              From: \s* (.*)
          }x;
      }
      Untested, but it seems to me that it should work properly, given the data from the OP.

      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
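
A quick way to check the /s remark is to run the same pattern with and without the modifier. This is only a sketch: it assumes the sample text spans four lines and uses a plain string instead of the FILE handle.

```perl
use strict;
use warnings;

# Sample text from the question, assumed here to span four lines.
my $text = join "\n",
    'There is a new voicemail in mailbox 11234567890:',
    'From: "Unknown" <Unknown>',
    'Length: 0:13 seconds',
    'Date: Thursday, January 19, 2006 at 11:35:01 AM',
    '';

# Without /s, "." never matches a newline, so ".*?" cannot bridge two lines.
my $without_s = $text =~ /mailbox (\d+).*?From:/  ? 1 : 0;
my $with_s    = $text =~ /mailbox (\d+).*?From:/s ? 1 : 0;

# The /x pattern needs no /s: \s+ already matches the newline,
# and the final (.*) stops at the end of the From: line.
my ($num, $from) = $text =~ m{
    mailbox \s+ (\d+): \s+
    From: \s* (.*)
}x;

print "without /s: $without_s, with /s: $with_s\n";
print "$num -> $from\n";
```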
Re: A little regex help please!
by InfiniteSilence (Curate) on Jan 31, 2006 at 20:41 UTC
    Lookahead assertion
    #!/usr/bin/perl -w
    use strict;

    local $/;
    my $line = <DATA>;
    if ($line =~ m/(\d+):\s+From:\s+(.*?)(?=Length)/s) {
        print $1 . $2;
    }
    1;
    __DATA__
    There is a new voicemail in mailbox 11234567890:
    From: "Unknown" <Unknown>
    Length: 0:13 seconds
    Date: Thursday, January 19, 2006 at 11:35:01 AM
    Prints:
    perl hate.pl
    11234567890"Unknown" <Unknown>

    Update: Okay, I have to ask it... how in the hell did you go 10 years coding in Perl without learning regexes?

    Celebrate Intellectual Diversity

      Well, going ten years without learning regexes is pretty hard! I have a little 'cookbook' of regex formulae, I have every Perl book worth having, and a few that are worth nothing, and much of the time I can work out the basics. If you really want to know the sad truth, I am a mathematician who had a heart attack at age 36 (I am now 52), and there are some areas in which I was most proficient which now baffle me totally. Other things I have no trouble with at all! Go figure.

      jdtoronto

      Update: Okay, I have to ask it... how in the hell did you go 10 years coding in Perl without learning regexes?

      Wild guess: much fiddling with index and substr, and occasionally pack and unpack?
      ;-)
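
For comparison, here is roughly what the index and substr route might look like for this task. It is a sketch only: the variable names are illustrative, and it assumes the sample text spans four lines with the From: field ending at a newline.

```perl
use strict;
use warnings;

# Sample text from the question, assumed here to span four lines.
my $text = join "\n",
    'There is a new voicemail in mailbox 11234567890:',
    'From: "Unknown" <Unknown>',
    'Length: 0:13 seconds',
    'Date: Thursday, January 19, 2006 at 11:35:01 AM',
    '';

# Mailbox number: starts right after "mailbox " and ends at the next ":".
my $key   = 'mailbox ';
my $start = index($text, $key);
die "no mailbox marker found\n" if $start < 0;
$start += length $key;

my $end = index($text, ':', $start);
die "no colon after mailbox number\n" if $end < 0;
my $number = substr($text, $start, $end - $start);

# From: value: everything after "From: " up to the end of that line.
my $label      = 'From: ';
my $from_start = index($text, $label, $end);
die "no From: field found\n" if $from_start < 0;
$from_start += length $label;

my $from_end = index($text, "\n", $from_start);
$from_end = length $text if $from_end < 0;    # field is on the last line
my $from = substr($text, $from_start, $from_end - $from_start);

print "$number -> $from\n";
```

It works, but it does not check that the mailbox value is actually digits, which the regex versions get for free from \d+.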

Re: A little regex help please!
by explorer (Chaplain) on Jan 31, 2006 at 20:48 UTC
    Let kk.txt be a file containing your data.
    Run the code like this: perl program.pl kk.txt
    #!/usr/bin/perl -l
    use warnings;
    use strict;

    undef $/;
    my $txt = <>;
    (my $number) = $txt =~ /mailbox (\d{11})/;
    (my $from)   = $txt =~ /From: (.*?)\n/;
    print "$number -> $from\n";
Re: A little regex help please!
by davido (Cardinal) on Feb 01, 2006 at 02:51 UTC

    Assuming the entire "record" is held in a scalar:

    use strict;
    use warnings;

    my $string = <<HERE;
    There is a new voicemail in mailbox 11234567890:
    From: "Unknown" <Unknown>
    Length: 0:13 seconds
    Date: Thursday, January 19, 2006 at 11:35:01 AM
    HERE

    if( $string =~ m/
            mailbox\s+(\d+):
            .+?
            From:\s+(\S[^\n]*)
        /sx ) {
        print "Match: mailbox $1, from $2\n";
    }

    It may be unnecessarily explicit about what it requires for anchors, but with this sort of thing it's probably better to leave as little as possible to chance.


    Dave
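
One way to reuse the explicit pattern above is to wrap it in a small sub that returns nothing when the record does not look as expected. This is a sketch under assumptions: parse_voicemail() is a hypothetical name, and the four-line layout of the sample text is assumed.

```perl
use strict;
use warnings;

# parse_voicemail() is a hypothetical wrapper around the pattern above.
# It returns (mailbox, from) on success and an empty list otherwise.
sub parse_voicemail {
    my ($string) = @_;
    return unless defined $string;
    if ( $string =~ m/
            mailbox\s+(\d+):     # mailbox number, then a literal colon
            .+?                  # whatever separates the two fields
            From:\s+(\S[^\n]*)   # From: value, up to the end of its line
        /sx ) {
        return ($1, $2);
    }
    return;
}

# Sample text from the question, assumed here to span four lines.
my $record = join "\n",
    'There is a new voicemail in mailbox 11234567890:',
    'From: "Unknown" <Unknown>',
    'Length: 0:13 seconds',
    'Date: Thursday, January 19, 2006 at 11:35:01 AM',
    '';

my ($mailbox, $from) = parse_voicemail($record);
my @no_match         = parse_voicemail("There is no voicemail today\n");

print "Match: mailbox $mailbox, from $from\n" if defined $mailbox;
print "No match for the second string\n"      unless @no_match;
```

Because the sub returns an empty list on failure, a caller can test the result in list context before using either value.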