regex and WWW::Mechanize parsing a ->content of mechanize object

spx2 has asked for the wisdom of the Perl Monks concerning the following question:

I want to parse a piece of HTML with WWW::Mechanize. `$mech` is a WWW::Mechanize object, and the page contains a string that I want to capture, but none of the regexes I have tried seem to work.

I'll paste the relevant piece of the HTML here. There are no other pieces like it on the page.

      <div style="font-size: 14px; font-weight: bold; border-bottom: 1px solid black; margin-bottom: 5px; padding-bottom: 2px;">
        Cristina&#39;s Stats
    <\/div>

Now I'm somewhat suspicious that the `&#39;` is messing things up, but I'm not sure. OK, we'll come back to this later.

The following is the code I wrote to match the snippet above:

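use strict;
use warnings;

$_ = qq/
      <div style="font-size: 14px; font-weight: bold; border-bottom: 1px solid black; margin-bottom: 5px; padding-bottom: 2px;">
        Cristina&#39;s Stats
    <\/div>
/;

# Capture everything between the tag's closing ">" (plus one character) and "s Stats".
if (/>.(.*)s Stats.*<\/div>/s) {
    print $1;
}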

What I am sure of is that it does skip over the newline and that it picks up the name Cristina, which is what I want the regex to match... well, I'm pretty close to it. I'm not sure how WWW::Mechanize delivers the `&#39;`: is it decoded into a `'` character, or left as it is in `$mech->content`? (I didn't check that... :| )
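A minimal sketch of a tighter pattern, assuming the only difference between the test string and the live page is how the apostrophe is encoded. The greedy `(.*)` above also captures the indentation and the `&#39;`; anchoring on `\w+` keeps only the name, and the alternation `(?:'|&#39;)` accepts either a literal apostrophe or the numeric entity. The sub name `extract_name` and the shortened sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name in front of "'s Stats", whether the
# apostrophe arrives as a literal ' or as the entity &#39;.
sub extract_name {
    my ($html) = @_;
    # \s* skips the newline and indentation after the tag's closing ">";
    # (\w+) captures only word characters, so no leading spaces end up in $1.
    return $html =~ m{>\s*(\w+)(?:'|&#39;)s Stats}s ? $1 : undef;
}

my $encoded = qq{<div style="font-size: 14px;">\n        Cristina&#39;s Stats\n    </div>};
my $decoded = qq{<div style="font-size: 14px;">\n        Cristina's Stats\n    </div>};

for my $html ($encoded, $decoded) {
    my $name = extract_name($html);
    print defined $name ? "matched: $name\n" : "no match\n";
}
```

Both sample strings should report the same name, so the pattern no longer depends on which form of the apostrophe the server sends.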

Hmm, look how PerlMonks displays it: '

The fact is that with the code above the regex works OK, but against the real web content in `$mech->content` it doesn't work as expected: it doesn't match anything at all.

How can I fix this regex? Or what other WWW::Mechanize methods or properties could help solve the problem?


Re: regex and WWW::Mechanize parsing a ->content of mechanize object by Cody Pendant (Prior) on Jul 01, 2007 at 03:16 UTC
Are you absolutely sure that WWW::Mechanize is seeing the same HTML you are? Print out $agent->content() and make sure it contains what you think it does. I've been caught before by a server sending different code to Mechanize than it sent to my browser, because it saw it as a different agent.
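Following that advice, a small diagnostic sketch that counts how the apostrophe and the closing tag are actually encoded in the fetched HTML, so the regex can be written against what the server really sends. It stores the page in a variable and binds every match to it explicitly with `=~`; in a real script that variable would come from `$mech->content`. The sub name `report_variants` and the sample string are hypothetical stand-ins for a live page, and the `<\/div>` variant is checked only because the snippet pasted in the question shows it.

```perl
use strict;
use warnings;

# Hypothetical helper: count how the apostrophe and the closing tag are
# encoded in a chunk of HTML.
sub report_variants {
    my ($html) = @_;
    # "() = ... =~ //g" in scalar context yields the number of matches.
    my %count = (
        'literal apostrophe' => scalar(() = $html =~ /'s Stats/g),
        'entity &#39;'       => scalar(() = $html =~ /&#39;s Stats/g),
        'plain </div>'       => scalar(() = $html =~ m{</div>}g),
        'escaped <\/div>'    => scalar(() = $html =~ m{<\\/div>}g),
    );
    return \%count;
}

# In a real script this would be: my $html = $mech->content;
my $html = qq{<div>\n  Cristina&#39;s Stats\n<\\/div>};

my $report = report_variants($html);
printf "%-20s %d\n", $_, $report->{$_} for sort keys %$report;
```

Whichever counts come back non-zero tell you which literal characters the regex has to expect.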
Re: regex and WWW::Mechanize parsing a ->content of mechanize object by c4onastick (Friar) on Jul 01, 2007 at 02:51 UTC
You can make this a little more specific by using: `m%>\s+(\w+)'s Stats\s+</div>%s; print $1;` This is, of course, under the assumption that every one you want to match is in the form "Name's Stats".
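The pattern above expects a literal apostrophe, while the snippet in the question shows the page sending `&#39;`. One option is to decode entities before matching. HTML::Entities' `decode_entities` is the usual tool for that; to keep this sketch core-only, it swaps in a hand-rolled substitution that decodes numeric entities only (the sub name `decode_numeric_entities` is hypothetical, and named entities such as `&amp;` are left alone).

```perl
use strict;
use warnings;

# Core-only stand-in for HTML::Entities::decode_entities: handles decimal
# (&#39;) and hex (&#x27;) numeric entities only, not named ones like &amp;.
sub decode_numeric_entities {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    $text =~ s/&#x([0-9a-fA-F]+);/chr(hex $1)/ge;
    return $text;
}

my $html = qq{<div style="font-weight: bold;">\n        Cristina&#39;s Stats\n    </div>};

# Decode first, then apply the regex from the reply unchanged.
my $decoded = decode_numeric_entities($html);
if ($decoded =~ m%>\s+(\w+)'s Stats\s+</div>%s) {
    print "$1\n";
}
```

If the live HTML really contains `<\/div>` as in the pasted snippet, the `</div>` part of the pattern would also need to allow the backslash, for example `<\\?/div>`.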
Re^2: regex and WWW::Mechanize parsing a ->content of mechanize object by spx2 (Deacon) on Jul 01, 2007 at 03:07 UTC
Thank you very much for the answer, but the main problem still remains: the script only works on the test case, not on the real page fetched with WWW::Mechanize...