littleperl has asked for the wisdom of the Perl Monks concerning the following question:

With the following regex I would like to extract the value for "rn", but only after "action=login" has occurred, because the page (text) from which I want to extract it has multiple "rn" occurrences.

A little help with this would be very welcome, as this goes well beyond my Perl/regex knowledge.

source string:
.... <form method="post" action="login.lp" name="authform" id="authform"> <input type="hidden" name="rn" value="-1383135969">

my working regex so far:

$message =~ /\"rn\"\s+value\=+"([^"]+)"/ ; $rn= $1;

my attempt of recognizing the pattern first:

$message =~ /^action.*\"rn\"\s+value\=+"([^"]+)"/

Re: regex match after pattern
by choroba (Cardinal) on Mar 12, 2015 at 15:52 UTC
    Don't use regular expressions to parse HTML. Use a proper parser:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;

    my $html = 'XML::LibXML'->load_html( string => << '__HTML__');
    <form method="post" action="logout.lp" name="authform" id="logoutform">
    <input type="hidden" name="rn" value="-1383135969">
    <form method="post" action="login.lp" name="authform" id="authform">
    <input type="hidden" name="rn" value="-1383135969">
    __HTML__

    my $value = $html->findvalue('//form[@action="login.lp"]/input[@name="rn"]/@value');
    print $value, "\n";
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      I've got just shy of four million reasons why a proper parser should be used for this purpose. But the inquisitive engineer in me wants to know: Under what conditions (not including fear of modules) might one justify not using the proper parser procedure?

        I'd use the regex approach if the input in question was generated from a known template that would never change, which means all the whitespace would always be in the same place. Or in a one-time script I'll never run again. Which makes it almost never. :-)
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Under what conditions (not including fear of modules) might one justify not using the proper parser procedure?

        When you know how to write and debug the regular expression yourself ;) and understand the risks

      Thank you for the quick response. Why didn't I opt for a parser? Not sure; "didn't use my head" is probably the most appropriate answer :) Stepping away from the regex idea for this pattern search, then.
Re: regex match after pattern
by Athanasius (Archbishop) on Mar 12, 2015 at 16:04 UTC

    Hello littleperl,

    choroba is correct, of course, but for the record, this regex works:

    use strict;
    use warnings;

    my $message =<<END;
    <abc name="rn" value="42">
    <form method="post" action="login.lp" name="authform" id="authform">
    <input type="hidden" name="rn" value="-1383135969">
    <xyz name="rn" value="43">
    END

    $message =~ /action\s*=\s*"login.*?".*?"rn"\s+value\s*=\s*"(.+?)"/s;
    my $rn = $1;
    print "rn = '$rn'\n" if $rn;

    Output:

    1:34 >perl 1180_SoPW.pl
    rn = '-1383135969'

    1:38 >

    Notes:

    • In the regex elements .*? and .+? the ? specifies a minimal match instead of the default maximal one
    • the /s modifier is needed to allow . to match newlines
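    The two notes above can be checked with a small sketch. It reuses the regex from this reply on a shortened sample string; the variable names ($minimal, $greedy, $no_s) are only illustrative and not part of the original script.

```perl
use strict;
use warnings;

# Sample input modelled on the thread: the wanted "rn" follows the login
# form, and another "rn" appears later on the page.
my $message = <<'END';
<form method="post" action="login.lp" name="authform" id="authform">
<input type="hidden" name="rn" value="-1383135969">
<xyz name="rn" value="43">
END

# Minimal .*? with /s: stops at the first "rn" after action="login...".
my ($minimal) = $message =~ /action\s*=\s*"login.*?".*?"rn"\s+value\s*=\s*"(.+?)"/s;

# Greedy .* with /s: runs to the end of the string, then backtracks to
# the LAST "rn", which is not the one we want.
my ($greedy) = $message =~ /action\s*=\s*"login.*?".*"rn"\s+value\s*=\s*"(.+?)"/s;

# Without /s, . cannot cross the newline between the two tags: no match,
# so the list assignment leaves $no_s undefined.
my ($no_s) = $message =~ /action\s*=\s*"login.*?".*?"rn"\s+value\s*=\s*"(.+?)"/;

print "minimal: $minimal\n";                                  # -1383135969
print "greedy:  $greedy\n";                                   # 43
print "no /s:   ", ( defined $no_s ? $no_s : 'no match' ), "\n";  # no match
```

    Assigning the match to a list, as in my ($minimal) = ..., also avoids reading a stale $1 when the match fails.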

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: regex match after pattern
by ww (Archbishop) on Mar 12, 2015 at 16:19 UTC

    Another approach to a regex solution, assuming Choroba's explanation of when a regex might be the appropriate tool... and multiple sources, simulated by the array below:

    #!/usr/bin/perl
    use 5.018;
    use strict;
    use warnings;

    my @message = ('<form method="post" action="login.lp" name="authform" id="authform"> <input type="hidden" name="rn" value="-1383135969">',
                   '<form method="post" action="login.lp" name="authform" id="authform"> <input type="hidden" name="rn" value="-2383135969">',
                   '<form method="post" action="login.lp" name="authform" id="authform"> <input type="hidden" name="rn" value="-3383135969">');

    for my $msg (@message) {
        $msg =~ /^action.*?\"rn\"\s+value\=+"([^"]+)"/g;
        # say " DEBUG: At Ln 15, msg: $msg";
        if ( $msg =~ /\"rn\"\s+value\=+"([^"]+)"/ ) {
            my $rn= $1;
            say "At Ln 19, \$rn: $rn\n";
        }
    }

    =head
    C:\>1119824.pl
    At Ln 19, $rn: -1383135969
    At Ln 19, $rn: -2383135969
    At Ln 19, $rn: -3383135969
    =cut
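    In the loop above, the first match is anchored with ^action, which cannot match a string that begins with <form, so the value is in effect picked up by the second, unanchored match alone. A minimal sketch of one way to fold the "only after the login action" condition into a single match per message follows; the sample data (including the logout form) and the names @messages and @rn are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs: two login forms and one logout form.
my @messages = (
    '<form method="post" action="login.lp" id="authform"> <input type="hidden" name="rn" value="-1383135969">',
    '<form method="post" action="logout.lp" id="logoutform"> <input type="hidden" name="rn" value="-2383135969">',
    '<form method="post" action="login.lp" id="authform"> <input type="hidden" name="rn" value="-3383135969">',
);

my @rn;
for my $msg (@messages) {
    # One match does both jobs: require action="login..." first, then
    # capture the value of the first "rn" that follows the form tag.
    if ( $msg =~ /action\s*=\s*"login[^"]*"[^>]*>.*?"rn"\s+value\s*=\s*"([^"]+)"/s ) {
        push @rn, $1;
    }
}

# Only the two login forms contribute a value.
print "$_\n" for @rn;
```

    This still carries the usual caveats about parsing HTML with a regex that are raised earlier in the thread; it only shows how the two conditions can live in one pattern.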