fixles has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to extract some info from HTML where I need to identify it across multiple lines. The HTML is as follows.
<td class="Label" align="right">Last Login:</td> <td>Yesterday</td>
I'm trying to extract the word Yesterday, but whatever I try fails. Is there some way of matching the newline or the spaces? Printing a join of @TD = $mech->content() =~ m|<td>(.*?)</td>|g shows Yesterday, but also everything else within a TD tag. The HTML is indented by about 20 spaces, as in the example above. Does anyone know how I could use a regex to extract just Yesterday from this HTML?

Many thanks,
James

Re: Regex on HTML across multiple lines with WWW::Mechanize->content()
by Corion (Patriarch) on Jul 25, 2011 at 18:36 UTC
Re: Regex on HTML across multiple lines with WWW::Mechanize->content()
by onelesd (Pilgrim) on Jul 26, 2011 at 06:15 UTC

    Use the /s and/or /m modifier, documented in perlre:

    • m

      Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.

    • s

      Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

      Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.
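
A minimal sketch of how this could apply to the HTML in the question. It assumes the two cells sit on separate lines with about 20 spaces of indentation, as described; $html is a hypothetical stand-in for $mech->content(), and the variable names are illustrative only. The first pattern anchors on the label cell and uses \s* to absorb the newline and indentation; the second shows that /s lets "." cross the line break, and the third shows the same pattern failing without it.

```perl
use strict;
use warnings;

# Stand-in for $mech->content(); the line break and indentation are
# assumptions based on the description in the question.
my $html = <<'HTML';
                    <td class="Label" align="right">Last Login:</td>
                    <td>Yesterday</td>
HTML

# Anchor on the label cell, then let \s* match the newline and the
# indentation before the value cell. [^<]*? cannot run past a tag.
my ($last_login) = $html
    =~ m{<td[^>]*>\s*Last Login:\s*</td>\s*<td>\s*([^<]*?)\s*</td>}i;

# Same idea with /s: "." now matches the newline, so the lazy .*? can
# bridge the gap between the two cells.
my ($with_s) = $html =~ m{Last Login:</td>.*?<td>(.*?)</td>}s;

# Without /s, "." stops at the newline and the pattern does not match.
my ($without_s) = $html =~ m{Last Login:</td>.*?<td>(.*?)</td>};

print defined $last_login ? "label-anchored: $last_login\n" : "label-anchored: no match\n";
print defined $with_s     ? "with /s: $with_s\n"            : "with /s: no match\n";
print defined $without_s  ? "without /s: $without_s\n"      : "without /s: no match\n";
```

Anchoring on the "Last Login:" label is what keeps the match from returning every TD on the page, which was the problem with the bare m|<td>(.*?)</td>|g in the question. Note that /m is not needed here, since neither pattern uses "^" or "$".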