GaijinPunch has asked for the wisdom of the Perl Monks concerning the following question:

Hey kids. Sucking some stuff out of an HTML file. It's not enough to warrant any module, as it's pretty clean stuff. However, there's one thing I want to do.

( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/$1/;
This works, but every now and again $1 holds some br's (sorry, if I type it like a tag, it's interpreted as such). Sometimes zero, sometimes ten. I know I can add a line to it and easily take them out, but is there an easier 1-line solution? I've been doing Perl for over a year now... it's probably time I got a bit funkier w/ my regexes. I thought this would work, but it didn't.

( ( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/$1/ ) =~ s/<br>//g;


...but says I can't substitute in a substitute.

Replies are listed 'Best First'.
Re: regex help
by JediWizard (Deacon) on Jun 27, 2005 at 01:44 UTC

    Any reason you are opposed to two lines?

    ($result = $htmlline) =~ s/<tag marker>(.+?)<\/td>/$1/;
    $result =~ s/<br(?:\s\/)?>/\n/g;

    Update: If you really want one line: (untested)

    ( $result = $htmlline ) =~ s{<tag marker>(.+?)</td>}{join("\n", split(/<br(?:\s\/?)?>/, $1))}eg;
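
For anyone who wants to try the two-step version on real data, here is a minimal sketch. The sample line and the `<td class="desc">` marker are made up for illustration (the thread only says "tag marker"), and the `<br>` pattern is loosened slightly so that it also accepts `<br/>`.

```perl
use strict;
use warnings;

# Hypothetical input line; '<td class="desc">' stands in for "<tag marker>".
my $htmlline = '<td class="desc">first<br>second<br />third</td>';

# Step 1: copy the line, then replace the whole cell with its contents.
( my $result = $htmlline ) =~ s/<td class="desc">(.+?)<\/td>/$1/;

# Step 2: turn each <br>, <br/> or <br /> into a newline.
$result =~ s/<br(?:\s?\/)?>/\n/g;

print "$result\n";
```

This should print the three words on separate lines. Swap `\n` for an empty replacement if the breaks should simply disappear.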

    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

Re: regex help
by Samy_rio (Vicar) on Jun 27, 2005 at 01:52 UTC

    Hi, check whether it helps you.

    ( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/my $tmp = $1; $tmp =~ s!<br>!!g; $tmp/eg;

    Regards,
    Samy

Re: regex help
by GrandFather (Saint) on Jun 27, 2005 at 01:47 UTC

    Show us what you want to achieve by giving some example text that gets mishandled.

    Note that you can include tag stuff by wrapping it in <code> ... </code> tags.


    Perl is Huffman encoded by design.
Re: regex help
by davidrw (Prior) on Jun 27, 2005 at 01:56 UTC
    Still sorta 1 line (though just keeping as two lines is probably best for clarity), but can just change your:
    ( ( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/$1/ ) =~ s/<br>//g;
    to this to ditch the "can't substitute in a substitute" error:
    ( $result = $htmlline ) =~ s#<tag marker>(.+?)</td>#$1# && $result =~ s/<br>//g;
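
The reason the original fails is that s/// returns the number of replacements it made, not the modified string, so there is nothing for a second s/// to bind to. Chaining with `&&` sidesteps that by running the second substitution against `$result` itself. A rough sketch, with a made-up sample line and a hypothetical `<td class="desc">` marker:

```perl
use strict;
use warnings;

# Hypothetical sample; '<td class="desc">' stands in for "<tag marker>".
my $htmlline = '<td class="desc">one<br>two<br>three</td>';

# Declare first: a variable introduced with "my" is not visible
# until the following statement, so it cannot be reused after &&.
my $result;

# s### returns the number of replacements made (1 here), which is true,
# so the right-hand side of && runs and strips the <br> tags.
( $result = $htmlline ) =~ s#<td class="desc">(.+?)</td>#$1#
    && $result =~ s/<br>//g;

# When the first substitution finds nothing, it returns false and
# the <br> removal is skipped entirely.
my $untouched;
( $untouched = 'no cell here<br>' ) =~ s#<td class="desc">(.+?)</td>#$1#
    && $untouched =~ s/<br>//g;

print "$result\n$untouched\n";
```

Note the side effect of `&&`: lines without the marker keep their br's, which may or may not be what you want.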
      I'm not really opposed to two lines... I just always see more complex (and ultimately, though not necessarily in this case, better optimized) code. Just thought I'd try to push myself, that's all.

      Thanks for the tips... I will give them a whirl.
Re: regex help
by injunjoel (Priest) on Jun 27, 2005 at 02:43 UTC
    Greetings,
    Looks like someone beat me to it but here it goes anyway.
    ($result = $htmlline) =~ s/<tag marker>(.+?)<\/td>/my $t=$1; $t=~s!<br>!!g; $t;/eg;

    The /e switch is a beautiful thing.
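
To make the /e idea concrete: the replacement part is run as Perl code and whatever it evaluates to gets substituted. The sketch below shows the temp-variable form from this thread next to a shorter variant using the /r flag, which returns a modified copy instead of changing its target. /r only exists in Perl 5.14 and later, so it was not an option when this thread was written. The sample line and the `<td class="desc">` marker are assumptions for illustration.

```perl
use strict;
use warnings;
use 5.014;    # the /r flag below needs Perl 5.14 or later

# Hypothetical sample; '<td class="desc">' stands in for "<tag marker>".
my $htmlline = '<td class="desc">alpha<br>beta<br>gamma</td>';

# /e treats the replacement as code; its final value is substituted.
# $1 is read-only, so it is copied into $t before being edited.
( my $with_tmp = $htmlline )
    =~ s/<td class="desc">(.+?)<\/td>/my $t = $1; $t =~ s!<br>!!g; $t/e;

# Same idea with /r: the inner s{}{}gr works on a copy of $1
# and returns the cleaned-up string directly.
( my $with_r = $htmlline )
    =~ s/<td class="desc">(.+?)<\/td>/$1 =~ s{<br>}{}gr/e;

print "$with_tmp\n$with_r\n";
```

The inner substitution uses different delimiters from the outer one so that the outer s/// is not terminated early.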

    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: regex help
by GrandFather (Saint) on Jun 27, 2005 at 01:55 UTC

    Most likely what you want is:

    ([^<]+?) (rather than (.+?))

    to avoid collecting any tags.
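
One thing worth checking before switching: a negated class cannot step over a `<` at all, so if the cell does contain br's the pattern will likely fail to match rather than skip them. A small sketch with made-up sample lines (the `<td class="desc">` marker is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical sample cells; '<td class="desc">' stands in for "<tag marker>".
my @lines = (
    '<td class="desc">just text</td>',
    '<td class="desc">text<br>more</td>',
);

my @captured;
for my $line (@lines) {
    # [^<]+? can never consume a "<", so it must reach </td> directly.
    if ( $line =~ m{<td class="desc">([^<]+?)</td>} ) {
        push @captured, $1;
    }
    else {
        push @captured, undef;    # an intervening tag blocked the match
    }
}

print( defined($_) ? "captured: $_\n" : "no match\n" ) for @captured;
```

So this form suits cells that are known to hold plain text only. For cells with embedded br's, one of the strip-afterwards approaches is probably the safer bet.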


    Perl is Huffman encoded by design.