Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I want to capture this line in an HTML file:
A - Tilt: 19° - Segments: 1(72-93)

and its respective HTML line is:
<td align="left"><b>A</b> - Tilt: 19&#176; - Segments: 1(72-93)</td>

My regexp is:
if($_=~/\<td align\=\"left\"\>\<b\>A\<\/b\>\s+\-\s+Tilt\:\s+\d+\&\#\d+\;\s+\-\s+Segments\:\s+(.*?)\<\/td\>/)

but it doesn't capture the info I want (i.e. the part after Segments:).
Any idea why?

Re: RegExp not working!
by Anonymous Monk on Apr 30, 2014 at 12:33 UTC
    use warnings;
    use strict;
    use re 'debug';
    q{<td align="left"><b>A</b> - Tilt: 19&#176; - Segments: 1(72-93)</td>}
        =~ /\<td align\=\"left\"\>\<b\>A\<\/b\>\s+\-\s+Tilt\:\s+\d+\&\#\d+\;\s+\-\s+Segments\:\s+(.*?)\<\/td\>/;
    print "$1\n";
    __END__
    # ... snip debugging stuff ...
    Match successful!
    1(72-93)

    So maybe your input is not what you are showing (maybe it's split on two lines?).
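
If the cell really is split across two lines in the file, a line-by-line loop never sees the whole <td> at once. Here is a minimal sketch of that failure mode, assuming a hypothetical two-line input (the thread never shows the actual file) and using the same pattern with the unneeded escapes dropped. The sub names match_by_line and match_whole are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: the same cell, but broken across two lines.
my $html = qq{<td align="left"><b>A</b> - Tilt: 19&#176;\n - Segments: 1(72-93)</td>\n};

my $re = qr{<td align="left"><b>A</b>\s+-\s+Tilt:\s+\d+&#\d+;\s+-\s+Segments:\s+(.*?)</td>};

# Line by line: no single line contains the whole cell, so nothing matches.
sub match_by_line {
    my ($text) = @_;
    for my $line (split /^/m, $text) {
        return $1 if $line =~ $re;
    }
    return;
}

# Whole text at once: \s+ also matches the embedded newline.
sub match_whole {
    my ($text) = @_;
    return $text =~ $re ? $1 : undef;
}

print defined match_by_line($html) ? "by line: match\n" : "by line: no match\n";
print "whole text: ", match_whole($html), "\n";
```

If this turns out to be the problem, reading the file in one go (or at least joining the lines that make up one cell) before matching would be one way around it.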

    How do I match XML, HTML, or other nasty, ugly things with a regex?

Re: RegExp not working!
by LanX (Saint) on Apr 30, 2014 at 12:39 UTC
    I suggest you do a binary search to localize your problem. (splitting the problem in halves)

    Probably an issue with special characters...
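
One possible way to do that halving in code: cut the pattern into pieces and bisect on how many leading pieces still match. This is only a sketch. The piece boundaries, the sub names, and the deliberately broken fifth piece (a stray " +" in front of the semicolon) are all assumptions made up for the demonstration.

```perl
use strict;
use warnings;

my $line = '<td align="left"><b>A</b> - Tilt: 19&#176; - Segments: 1(72-93)</td>';

# The pattern cut into pieces. Piece 4 is deliberately broken
# (hypothetical bug: it demands a space before the semicolon).
my @pieces = (
    '<td align="left">',
    '<b>A</b>',
    '\s+-\s+Tilt:',
    '\s+\d+&\#\d+',
    ' +;',
    '\s+-\s+Segments:',
    '\s+(.*?)</td>',
);

# Does the pattern built from the first $n pieces match?
sub prefix_matches {
    my ($n, $text) = @_;
    my $src = join '', @pieces[0 .. $n - 1];
    # (?:...) keeps the pattern from ever being the special empty pattern.
    return $text =~ /(?:$src)/ ? 1 : 0;
}

# Bisect for the first piece whose addition makes the match fail.
# Returns its index, or -1 if the full pattern matches.
sub first_bad_piece {
    my ($text) = @_;
    return -1 if prefix_matches(scalar @pieces, $text);
    my ($good, $bad) = (0, scalar @pieces);    # zero pieces always match
    while ($bad - $good > 1) {
        my $mid = int(($good + $bad) / 2);
        if (prefix_matches($mid, $text)) { $good = $mid }
        else                             { $bad  = $mid }
    }
    return $bad - 1;
}

my $i = first_bad_piece($line);
print $i < 0 ? "pattern matches\n" : "match breaks at piece $i: '$pieces[$i]'\n";
```

Bisection works here because a match of a longer prefix always contains a match of every shorter prefix, so "the first n pieces match" can only flip from true to false once.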

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Hm, now I notice it works... Probably I was making some kind of stupid mistake before!
      Thank you guys!
Re: RegExp not working!
by ww (Archbishop) on Apr 30, 2014 at 21:21 UTC
    Your regex has entirely too many escapes... except where it needs them:
    #!/usr/bin/perl
    use strict;
    use 5.016;
    # 1084488
    my $line = '<td align="left"><b>A</b> - Tilt: 19&#176; - Segments: 1(72-93)</td>';
    if ($line =~ /<td align="left"><b>A<\/b>\s+-\s+Tilt:\s+\d+&#\d+;\s+-\s+Segments:\s+(.*?)<\/td>/) {
        say 'yep, good match';
        say $1;
    } else {
        say 'boo';
    }

    Note that literal parentheses, such as those around the range 72-93, need to be escaped if you put them in the pattern; otherwise they set up a capture. Conversely, to capture the whole line, as you asked, you need to parenthesize the whole regex (i.e., everything between the regex delimiters). Or, if you don't want the HTML (your OP indicated that's the case), put pairs of capturing parentheses around just the parts you do want (to skip over the closing </b> and to eliminate the closing </td>), in which case your .*? to match the segment value will need to be more specific or otherwise modified.
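
A rough sketch of that last suggestion, assuming the goal is the visible text without the markup: one capture group per wanted piece, with the literal parentheses around the range escaped. The sub name parse_cell, the variable names, and the reading of the leading "1" as a segment count are guesses for illustration, not something the thread states.

```perl
use strict;
use warnings;

my $line = '<td align="left"><b>A</b> - Tilt: 19&#176; - Segments: 1(72-93)</td>';

# Returns (label, tilt, count, range start, range end),
# or an empty list when the text does not match.
sub parse_cell {
    my ($text) = @_;
    return $text =~ m{
        <td\ align="left"><b>(\w+)</b>   # label, captured without the <b> tags
        \s+-\s+Tilt:\s+(\d+)&\#\d+;      # tilt value, entity left uncaptured
        \s+-\s+Segments:\s+(\d+)         # leading number (presumably a count)
        \( (\d+) - (\d+) \)              # escaped parens are literal, inner groups capture
        </td>
    }x;
}

my ($label, $tilt, $count, $from, $to) = parse_cell($line);
print "$label - Tilt: $tilt - Segments: $count($from-$to)\n";
```

Under /x the space in "td align" and the # in the entity have to be escaped, since unescaped whitespace is ignored and a bare # starts a comment.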

    Updated by putting that last para, that got lost somewhere :-(, back in the node, and adding the observation re skipping the html.


    Questions containing the words "doesn't work" (or their moral equivalent) will usually get a downvote from me unless accompanied by:
    1. code
    2. verbatim error and/or warning messages
    3. a coherent explanation of what "doesn't work" actually means.

    check Ln42!