OfficeLinebacker has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, esteemed monks!

I am parsing some HTML to extract a piece of text. I have split the HTML into chunks, and a typical chunk looks like this:

/td><td class="stdViewCon">Moving Services for MSCD Student Success Building</td></tr></table></TD></TR> <TR VALIGN=top><TD><table class="stdViewITbl"><tr><td class="stdViewLnkLbl">
The part I want is "Moving Services for MSCD Student Success Building", or whatever text is in that TD. My code goes like this:
$pieces[4] =~ m/>(^<+)</;
my $title = $1;
where $pieces[4] holds that chunk.

First of all, why isn't this working?

Second, is there a better, "best practices" way to do the regex match and store the result in $title?

Thanks!


I like computer programming because it's like Legos for the mind.

Re: Question why this Regex isn't matching
by pvaldes (Chaplain) on Sep 30, 2011 at 20:43 UTC

    If you still want a regex:

    my $chain = '/td><td class="stdViewCon">Moving Services for MSCD Student Success Building</td></tr></table></TD></TR>';
    if ($chain =~ /<td class="stdViewCon">(.*?)<\/td>/m) { print $1 } else { print "ups" }

    But you probably should use an HTML-parsing module from CPAN instead.
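To see why the capture uses the non-greedy (.*?) rather than (.*), here is a minimal sketch. The second stdViewCon cell in $chain is hypothetical. It is added only so that there are two closing </td> tags for the greedy version to run past.

```perl
use strict;
use warnings;

# The first cell is from the question; the second cell is hypothetical,
# added only to show the difference between greedy and non-greedy matching.
my $chain = '<td class="stdViewCon">Moving Services for MSCD Student Success Building</td>'
          . '<td class="stdViewCon">Second cell</td>';

# Non-greedy: (.*?) stops at the FIRST </td> after the opening tag.
my ($lazy) = $chain =~ /<td class="stdViewCon">(.*?)<\/td>/;

# Greedy: (.*) runs on to the LAST </td> in the string.
my ($greedy) = $chain =~ /<td class="stdViewCon">(.*)<\/td>/;

print "lazy:   $lazy\n";
print "greedy: $greedy\n";
```

The greedy capture swallows the first </td> and the whole second opening tag, which is why the original reply uses (.*?).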

    Update: or maybe if you feel twisted...

    if($chain =~ /<\n?t\n?d\n?[ ]+c\n?l\n?a\n?s\n?s\n?=\n?"\n?s\n?t\n?d\n?V\n?i\n?e\n?w\n?C\n?on\n?"\n?>(.*?\n?.*?)<\n?\/\n?t\n?d\n?>/m){print $1}

    This is a better regex; it also matches text with line breaks inside it, like

    <td class="stdViewC
    on">Moving Services for MSCD Stude
    nt Success Building</td
    ></tr></tab
    le></TD></TR>
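If the only breaks you expect are inside the cell text, a less twisted option might be the /s modifier, which lets . also match a newline. Here is a minimal sketch, assuming a hypothetical chunk where the title was wrapped mid-word. Unlike the regex above, it does not handle breaks inside the tag names themselves.

```perl
use strict;
use warnings;

# Hypothetical chunk in which the cell text was broken across two lines.
my $chunk = qq{<td class="stdViewCon">Moving Services for MSCD Stude\nnt Success Building</td>};

# Without /s, "." does not match "\n", so (.*?) can never reach </td>.
my ($without_s) = $chunk =~ m{<td class="stdViewCon">(.*?)</td>};

# With /s, "." also matches "\n", so the capture spans the line break.
my ($with_s) = $chunk =~ m{<td class="stdViewCon">(.*?)</td>}s;

# Remove the break afterwards (assumes the break was mid-word, as here).
(my $clean = $with_s) =~ s/\n//g;

print defined $without_s ? "without /s: $without_s\n" : "without /s: no match\n";
print "with /s, cleaned: $clean\n";
```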
Re: Question why this Regex isn't matching
by OfficeLinebacker (Chaplain) on Sep 30, 2011 at 17:29 UTC
    So I ended up doing:
    my @pieces4 = split /<|>/, $pieces[4];
    my $title = $pieces4[3];
    But I am still curious as to why my match wasn't working. Thanks! TMTOWTDI!
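The split approach works because of the exact layout of the chunk. Splitting on either bracket leaves an empty field wherever '>' is immediately followed by '<'. This sketch prints the fields so you can see why the title lands at index 3. It assumes the chunk always starts with '/td><td ...'; a different layout would move the index.

```perl
use strict;
use warnings;

# The chunk from the question, shortened after the closing </tr>.
my $chunk = '/td><td class="stdViewCon">Moving Services for MSCD Student Success Building</td></tr>';

# Split on either angle bracket; the class [<>] is equivalent to <|> here.
# Trailing empty fields are dropped by split.
my @pieces4 = split /[<>]/, $chunk;

# Show which index each field lands in; index 1 is the empty string
# between "/td>" and "<td".
for my $i (0 .. $#pieces4) {
    print "$i: '$pieces4[$i]'\n";
}

my $title = $pieces4[3];
```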

      $pieces[4] =~ m/>(^<+)</;

      This regex wants a '>' character followed by ^ (the hat metacharacter), which here is the start-of-string anchor! That can never match, because the '>' would have to come before the start of the string. You might think the /m regex modifier would help by letting ^ match at embedded newlines (Update: Actually, even that won't happen. With the /m switch, ^ matches only at the very start of the string or immediately after a newline, so the match would need something like / > \n ^ /xm). Did you perhaps mean something like m/>([^<]+)</ instead? Only as the first character inside a character class does ^ mean negation, so [^<]+ is "one or more characters that are not <".
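Here is a minimal sketch contrasting the two patterns on the chunk from the question, shortened after the closing </tr>:

```perl
use strict;
use warnings;

my $chunk = '/td><td class="stdViewCon">Moving Services for MSCD Student Success Building</td></tr>';

# Original pattern: ^ inside ( ) is the start-of-string anchor, and <+ means
# "one or more literal < characters", so nothing can match right after a '>'.
my $broken = $chunk =~ m/>(^<+)</ ? 'matched' : 'no match';

# Fixed pattern: [^<]+ is a negated character class, "one or more characters
# that are not <", captured between a '>' and the next '<'.
my ($title) = $chunk =~ m/>([^<]+)</;

print "original pattern: $broken\n";
print "fixed pattern:    $title\n";
```

The first '>' in the chunk (after "/td") is followed directly by '<', so [^<]+ fails there and the engine moves on to the '>' that closes the stdViewCon tag.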

        YES! I thought ^ meant the same thing inside () as inside []. Thanks for clearing that up. And yes, the latter is what I want, because I want to capture that part of the match into $1.
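On the "best practices" part of the question, one common idiom is to capture in list context and check the result instead of reading $1 afterwards. A failed match leaves $1 holding the value from the last successful match. Here is a sketch, assuming a hypothetical chunk with an empty cell that the pattern cannot match:

```perl
use strict;
use warnings;

my $good = '/td><td class="stdViewCon">Moving Services for MSCD Student Success Building</td>';
my $bad  = '<td></td>';    # hypothetical chunk with an empty cell: no match

# Risky: a failed match does not reset $1, so $stale silently receives
# the title captured by the previous successful match.
$good =~ m/>([^<]+)</;
$bad  =~ m/>([^<]+)</;
my $stale = $1;

# Safer: in list context a failed match returns an empty list, so $title
# is undef rather than a leftover value; test it before using it.
my ($title) = $bad =~ m/>([^<]+)</;
if (defined $title) {
    print "title: $title\n";
} else {
    print "no title in this chunk\n";
}
print "stale \$1 was: $stale\n";
```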
