knewter has asked for the wisdom of the Perl Monks concerning the following question:

If I have a series of records listed as:

Name<br />
Address<br />
URL<br /><br />
Name2<br />
Address2<br /><br />
The URL line is optional (most records do not have one, as in the second record above). How would one write a regular expression that matches both of the above records, with $1, $2, and $3 capturing Name, Address, and URL respectively?

Replies are listed 'Best First'.
Re: Regexp with optional group containing backreference
by ikegami (Patriarch) on Aug 15, 2005 at 20:25 UTC

    (I'm assuming those <br /> are a problem with your post and not in the original data.)

    Reading in paragraph mode should do the trick:

    $/ = '';
    while (<DATA>) {
        chomp;
        my ($name, $addr, $url) = split(/\n/, $_);
        print("name: $name\n");
        print("addr: $addr\n");
        print("url: ", (defined($url) ? $url : '[undef]'), "\n");
        print("--\n");
    }

    __DATA__
    Name
    Address
    URL

    Name2
    Address2

    output:

    name: Name
    addr: Address
    url: URL
    --
    name: Name2
    addr: Address2
    url: [undef]
    --
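As a minimal sketch of what paragraph mode does, the snippet below wraps the same idea in a helper and reads from an in-memory filehandle instead of DATA. The name `read_records` and the hash keys are assumptions made for illustration; they are not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a block of text into records using paragraph mode.
sub read_records {
    my ($text) = @_;
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @records;
    while (my $para = <$fh>) {
        chomp $para;  # in paragraph mode, chomp strips all trailing newlines
        my ($name, $addr, $url) = split /\n/, $para;
        # url stays undef when the record has only two lines
        push @records, { name => $name, addr => $addr, url => $url };
    }
    close $fh;
    return @records;
}

my @records = read_records("Name\nAddress\nURL\n\nName2\nAddress2\n");
for my $r (@records) {
    printf "%s | %s | %s\n", $r->{name}, $r->{addr},
        defined $r->{url} ? $r->{url} : '[undef]';
}
```

Using `local $/` inside the helper keeps the change to the input record separator from leaking into the rest of the program.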
Re: Regexp with optional group containing backreference
by Transient (Hermit) on Aug 15, 2005 at 20:24 UTC
    #!/usr/bin/perl

    local $/ = "";
    while (<DATA>) {
        my ( $name, $address, $url ) = split /<br \/>/, $_;
        print "Got $name $address $url\n";
    }
    __DATA__
    Name<br />
    Address<br />
    URL<br /><br />

    Name2<br />
    Address2<br /><br />
    P.S. you can chomp if you want to...

    Update:
    (...you can leave your friends behind, 'cause your friends don't chomp and if they don't chomp, well they're no friends of mine) - sorry, I had to...

    If the records are not separated by newlines:
    #!/usr/bin/perl

    local $/ = "<br /><br />";
    while (<DATA>) {
        s/^\s*// && s/\s*$//;
        my ( $name, $address, $url ) = split /<br \/>/, $_;
        print "Got $name $address $url\n";
    }
    __DATA__
    Name<br /> Address<br /> URL<br /><br /> Name2<br /> Address2<br /><br />
Re: Regexp with optional group containing backreference
by GrandFather (Saint) on Aug 15, 2005 at 22:12 UTC

    A variant using a regex:

    use warnings;
    use strict;

    $/ = '';
    while (<DATA>) {
        m|\G[\n\s]*(.*?)<br />[\n\s]*(.*?)(?:<br />[\n\s]*(.*?))?(?:<br />){2}[\n\s]*|isg;
        my $url = $3 || "";
        print "Name: $1\nAddr: $2\nURL: $url\n\n";
    }
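The same idea can be spelled out with the /x modifier so that each part of the pattern carries a comment. This is only a sketch: the helper name `parse_records` is hypothetical, the pattern is a simplified variant rather than the regex above, and it assumes that no field contains a `<` character.

```perl
use strict;
use warnings;

# Hypothetical helper: pull (name, address, url) triples out of one string.
sub parse_records {
    my ($text) = @_;
    my @records;
    while ($text =~ m{
            \s* ([^<]+?) \s* <br\s*/>          # group 1: name line
            \s* ([^<]+?) \s* <br\s*/>          # group 2: address line
            (?: \s* ([^<]+?) \s* <br\s*/> )?   # group 3: optional URL line
            \s* <br\s*/>                       # second br tag closes the record
        }gx) {
        push @records, [ $1, $2, $3 ];         # third element is undef without a URL
    }
    return @records;
}

my $data = "Name<br />\nAddress<br />\nURL<br /><br />\n"
         . "Name2<br />\nAddress2<br /><br />\n";
for my $rec (parse_records($data)) {
    my ($name, $addr, $url) = @$rec;
    print "Name: $name\nAddr: $addr\nURL: ", (defined $url ? $url : ''), "\n\n";
}
```

Because the optional group is non-capturing on the outside and capturing on the inside, the URL always lands in the third capture, and a record without a URL leaves it undefined instead of shifting the address.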

    Perl is Huffman encoded by design.
Re: Regexp with optional group containing backreference
by AReed (Pilgrim) on Aug 16, 2005 at 05:55 UTC
    Very late, but just for fun since I can't sleep anyway...
    Solution #1:
    use strict;
    use warnings;

    $/ = "<br /><br />";
    while (<DATA>) {
        chomp;
        my @record = split(/<br \/>/);
        print "@record\n" if (@record == 3);
    }
    __DATA__
    Name1<br />Address1<br /><br />Name2<br />Address2<br />URL2<br /><br />Name3<br />Address3<br />URL3<br /><br />Name4<br />Address4<br /><br />
    and solution #2:
    use strict;
    use warnings;

    $/ = "<br />";
    my $i = -1;
    my @record;
    while (<DATA>) {
        chomp;
        if (/^$/) {
            $i = -1;
            next;
        }
        $record[++$i] = $_;
        print "@record\n" if ($i == 2);
    }
    __DATA__
    Name1<br />Address1<br /><br />Name2<br />Address2<br />URL2<br /><br />Name3<br />Address3<br />URL3<br /><br />Name4<br />Address4<br /><br />
Re: Regexp with optional group containing backreference
by TedPride (Priest) on Aug 16, 2005 at 07:39 UTC
    I personally wouldn't use a regex for this, but here you go. The secret is to make the entire URL part of the match optional with (?:REGEX HERE)?
    $_ = join '', <DATA>;
    # The URL group is optional, so $3 is undef for records without a URL line.
    while (m/(.*)<br \/>\n(.*)<br \/>(?:\n(.*)<br \/>)?<br \/>/g) {
        my $url = defined $3 ? $3 : '';
        print "$1 - $2 - $url\n\n";
    }
    __DATA__
    Name<br />
    Address<br />
    URL<br /><br />
    Name2<br />
    Address2<br /><br />