You could consider using a parser.

This extracts the first five pieces of data you're after from a fragment of the first URL you listed. Sure, there is a lot of code, but IMO this approach will be easier to write and maintain, and easier to adapt when the HTML changes (which it will).

#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;

# slurp the sample HTML from the __DATA__ section
my $html = do{local $/;<DATA>};
my $t = HTML::TreeBuilder->new_from_content( $html )
  or die qq{cant parse string\n};

# the marketWatch cell holds a script with the ugid and marketId
my $td = $t->look_down(
  q{_tag}  => q{td},
  q{class} => q{marketWatch},
  q{style} => qr{width:\s+20px;},
) or die qq{cant find td\n};

my $script = $td->look_down(
  q{_tag} => q{script},
) or die qq{cant find script\n};

my $js = $script->as_HTML;
my ($ugid, $marketid) = $js =~ /ugid=([^&]+)&marketId=([^&]+)&/;
printf qq{%-10s: %s\n}, q{ugid}, $ugid;
printf qq{%-10s: %s\n}, q{marketId}, $marketid;

# name
$td = $t->look_down(
  q{_tag}  => q{td},
  q{style} => qr{width:\s+22%},
) or die qq{cant find name\n};
printf qq{%-10s: %s\n}, q{name}, $td->as_trimmed_text;

# qty
$td = $t->look_down(
  q{_tag}  => q{td},
  q{class} => q{qty},
) or die qq{cant find qty\n};
printf qq{%-10s: %s\n}, q{qty}, $td->as_trimmed_text;

# price
$td = $t->look_down(
  q{_tag}  => q{td},
  q{class} => q{price},
) or die qq{cant find price\n};
printf qq{%-10s: %s\n}, q{price}, $td->as_trimmed_text;

# gratuitous white space wiped
__DATA__
<td class="marketWatch" style="width: 20px;"><p class="pngfix" id="market-watch-1">&#160;</p>
<script type="text/javascript">
$('market-watch-1').onclick=function(){location.href='/marketWatch.html?ugid=696398000&marketId=700213356&action=add';};
</script>
</td>
<td style="width: 25%;">Super Bowl XLIII</td>
<td style="width: 22%;">
<a href="/marketplace/sports/nfl/super-bowl-xliii/upper-deck-end-zone/1/ravens">Upper Deck End Zone</a>
</td>
<td class="qty">
<input type="hidden" id="qty_select_1" value="2"/>2
</td>
<td class="price" style="width: 18%;">
<input type="hidden" id="trade_price_1" value="25" />
<span class="price">$25.00</span>
</td>

Output:

ugid      : 696398000
marketId  : 700213356
name      : Upper Deck End Zone
qty       : 2
price     : $25.00
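
If the full page repeats that group of cells once per listing, the same look_down calls can be run against each row instead of the whole tree. What follows is only a rough sketch: I am assuming each listing sits in its own <tr> (the fragment above doesn't show that), the cell_text helper is my own invention, and the second row of the sample table is made up purely for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;

# Hypothetical table: the first row reuses values from the fragment above,
# the second row is invented for illustration only.
my $html = <<'HTML';
<table>
<tr>
<td style="width: 22%;"><a href="#">Upper Deck End Zone</a></td>
<td class="qty">2</td>
<td class="price"><span class="price">$25.00</span></td>
</tr>
<tr>
<td style="width: 22%;"><a href="#">Lower Level Sideline</a></td>
<td class="qty">4</td>
<td class="price"><span class="price">$40.00</span></td>
</tr>
</table>
HTML

my $t = HTML::TreeBuilder->new_from_content( $html )
  or die qq{cant parse string\n};

# Hypothetical helper: trimmed text of the first <td> under $row that
# matches the given criteria, or an empty string if there is none.
sub cell_text {
  my ($row, @criteria) = @_;
  my $td = $row->look_down( q{_tag} => q{td}, @criteria );
  return $td ? $td->as_trimmed_text : q{};
}

# In list context look_down returns every matching element, not just the first.
my @rows;
for my $tr ( $t->look_down( q{_tag} => q{tr} ) ) {
  push @rows, {
    name  => cell_text( $tr, q{style} => qr{width:\s+22%} ),
    qty   => cell_text( $tr, q{class} => q{qty} ),
    price => cell_text( $tr, q{class} => q{price} ),
  };
}
$t->delete;  # free the parse tree once the data has been copied out

printf qq{%-22s %3s  %s\n}, @{$_}{qw(name qty price)} for @rows;
```

Collecting each row into a hash keeps the extraction separate from the printing, so if the markup changes only the criteria passed to cell_text should need touching.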

Re^6: Regex infinite loop?
by Ninth Prince (Acolyte) on Oct 17, 2008 at 15:03 UTC

    Wow! I was thinking this might be the way I would (ultimately) have to go. I was reading about TreeBuilder in "Perl & LWP" on the train this morning. I am a PERL novice -- this will be a great learning tool for me! Thank you very much.

      The first step in growing from a novice to an earnest student of Perl is to never, ever write "PERL". The language is called Perl and the implementation is perl, but "PERL" makes you stand out as someone who knows nothing about it, and from your code it is clear you are well past that point.

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James