Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hints, or possibly an elbow in the ribs, needed on parsing a Subject: header to grab the complete string.
Three (I hope) Subject variations:
"foo.com, bar.net, blah.org, trivial.com";
"foo.com, bar.net, blah.org & trivial.com";
"foo.com, bar.net, blah.org and trivial.com";
if (/^([\w.-]+((,\s?)|(\s&\s)|(\sand\s)|$))+/) {
    print "Got a match - $1\n";
}
else {
    print "Thursday night on the town - bad idea.\n";
}
prints only the first domain, much to my annoyance.
Out of interest, the typo <read: getting desperate>: /^([\w.-]+((,\s?)|(\s&\s)|(\sand\s)|$?)$)+/ ...core dumped. Is this a Friday thing?

Re: friday morning regex
by dingus (Friar) on Nov 08, 2002 at 10:17 UTC
    Hints, or possibly an elbow in the ribs, needed on parsing a Subject: header to grab the complete string. Three (I hope) Subject variations: "foo.com, bar.net, blah.org, trivial.com"; "foo.com, bar.net, blah.org & trivial.com"; "foo.com, bar.net, blah.org and trivial.com"; <SNIP CODE> prints only the first domain, much to my annoyance.
    Well, $1 only ever holds one domain. What you want is to print $_ if you have all 4. Alternatively, if you actually want the four separately (and know that there are only four), then you can do
    my ($first, $second, $third, $fourth) = (/^([\w.-]+((,\s?)|(\s&\s)|(\sand\s)|$))+/);
    Probably better to use split and get a list, as in
    my @topics = split(/\s*,\s*|\s+&\s+|\s+and\s+/);
    Update: fix typos and s/3/4/; I need my coffee too!
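One thing worth checking about the list assignment above: as far as I can tell, a capture group that sits under a quantifier is overwritten on every repetition, so it cannot hand back four separate domains. The sketch below runs the regex from the question against one of the sample subjects and then tries an assumed alternative (not from the thread) that uses a single capture group with /g in list context.

```perl
use strict;
use warnings;

my $subject = 'foo.com, bar.net, blah.org and trivial.com';

# The regex from the question: group 1 sits under the trailing "+", so each
# repetition overwrites it and only the final repetition is left in $1.
my $last = $subject =~ /^([\w.-]+((,\s?)|(\s&\s)|(\sand\s)|$))+/ ? $1 : '';

# Assumed alternative (not from the thread): one capture group, matched
# repeatedly with /g in list context, returns every domain.
my @domains = $subject =~ /([\w.-]+)(?:,\s?|\s&\s|\sand\s|$)/g;

print "last repetition: $last\n";
print "all domains: @domains\n";
```

On this input the first line should show only trivial.com, while @domains holds all four names. The /g form does not anchor at the start of the string, so it is a sketch for trusted input rather than a validator.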

    Dingus


    Enter any 47-digit prime number to continue.

    Edit by tye to change PRE to CODE around wide lines

      Split was my fallback plan, but your

      /^(([\w.-]+((,\s?)|(\s&\s)|(\s+and\s)|$))+)/

      ..hasn't failed here yet.
      Where there's a will, there's a wa.. regexp :)

      Cheers dingus

      Simon, UK
Re: friday morning regex
by BrowserUk (Patriarch) on Nov 08, 2002 at 10:19 UTC

    My offer.

    #! perl -sw
    use strict;

    while (<DATA>) {
        my @domains = /([^,]+),\s*([^,]+),\s*([^,]+)\s*(?:,|and|&)\s*(.*?)$/;
        print $_, $/ for @domains;
        print $/;
    }

    =pod comment only

    c:\test>test
    foo.com
    bar.net
    blah.org
    trivial.com

    foo.com
    bar.net
    blah.org
    trivial.com

    foo.com
    bar.net
    blah.org
    trivial.com

    c:\test>

    =cut

    __END__
    foo.com, bar.net, blah.org, trivial.com
    foo.com, bar.net, blah.org & trivial.com
    foo.com, bar.net, blah.org and trivial.com

    Nah! You're thinking of Simon Templar, originally played (on UKTV) by Roger Moore and later by Ian Ogilvy
Re: friday morning regex
by robartes (Priest) on Nov 08, 2002 at 10:15 UTC
    This is one of those situations where split will help you to leave your sanity intact. Assuming your subjects are always comma delimited, with a possible & or 'and' as last separator:
    use strict;

    my @subjects = (
        "camel, flea, humbug, hubris",
        "Camel, Flea, Humbug & Hubris",
        "CAMEL, FLEA, HUMBUG and HUBRIS",
    );

    for (@subjects) {
        # Split on commas first, then re-split the last element on "&" and "and".
        my @elements = split /\s*,\s*/;
        push @elements, split /\s+&\s+/, pop(@elements);
        push @elements, split /\s+and\s+/, pop(@elements);
        print join("\n", @elements), "\n";
    }
    Note that the whitespace handling in the separators above is a bit crude, but hey, it's just example code :)

    Update: Must have more coffee ... see dingus' post below for a split that does it all in one go. Then again, that brings you right back in regex headache land, so YMMV.
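For anyone who wants the whitespace handling a little less crude, here is a minimal sketch of a one-pass split. The helper name split_subject and the extra test subject are hypothetical, not from the thread. It assumes "&" and "and" only count as separators when surrounded by whitespace, so names that merely contain "and" stay in one piece.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a subject line into items.
sub split_subject {
    my ($subject) = @_;
    $subject =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    # Commas may have optional whitespace around them; "&" and "and" must be
    # surrounded by whitespace so that e.g. brand.com is not split on "and".
    return split /\s*,\s*|\s+&\s+|\s+and\s+/, $subject;
}

for my $subject ('camel, flea, humbug, hubris',
                 'Camel, Flea, Humbug & Hubris',
                 ' brand.com ,sand.org and island.net ') {
    print join('|', split_subject($subject)), "\n";
}
```

The last subject is there to exercise the awkward cases: stray spaces at both ends, a space before the comma, and "and" hiding inside three of the names.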

    CU
    Robartes-

Re: friday morning regex
by Tanalis (Curate) on Nov 08, 2002 at 10:24 UTC
    Look very carefully at your parentheses - Perl uses these to figure out what it's returning from the match.

    If all your lines are going to be of that format, and you can trust the data, it's possible to match the entire string using:

    /^([\w\s,.&-]+)/

    which works fine for me here for your given data.
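To make the point about parentheses concrete, here is a rough sketch that keeps the stricter separator pattern from the question. Groups are numbered by the position of their opening parenthesis, so one outer pair around the whole repetition becomes $1, and the inner groups are written as (?:...) so they capture nothing. The variable name $whole_list is my own.

```perl
use strict;
use warnings;

my @subjects = (
    'foo.com, bar.net, blah.org, trivial.com',
    'foo.com, bar.net, blah.org & trivial.com',
    'foo.com, bar.net, blah.org and trivial.com',
);

# One outer capturing pair wraps the repeated part, so $1 is the whole list.
# The inner groups are non-capturing and exist only for grouping.
my $whole_list = qr/^((?:[\w.-]+(?:,\s?|\s&\s|\sand\s|$))+)/;

for my $subject (@subjects) {
    if ($subject =~ $whole_list) {
        print "Got a match - $1\n";
    }
    else {
        print "No match for: $subject\n";
    }
}
```

Compared with the character-class version, this one is pickier about what may sit between the names, which may or may not matter for data you trust.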

    Hope that helps ..
    --Foxcub

    Update: Or just use $_. It's easier. :)