spx2 has asked for the wisdom of the Perl Monks concerning the following question:

I want to parse a piece of HTML fetched with WWW::Mechanize ($mech is a WWW::Mechanize object). There is a string in it that I want to capture, but none of the regex approaches I have tried seem to work.

Here is the relevant piece of the HTML I want to parse. There are no other pieces like it in the page.

<div style="font-size: 14px; font-weight: bold; border-bottom: 1px solid black; margin-bottom: 5px; padding-bottom: 2px;"> Cristina&#39;s Stats <\/div>

I suspect that the &#39; is what is messing things up, but I'm not sure. More on that below.

The following is the code I wrote to match the HTML above:

$_ = qq/ <div style="font-size: 14px; font-weight: bold; border-bottom: 1px solid black; margin-bottom: 5px; padding-bottom: 2px;"> Cristina&#39;s Stats <\/div> /;
/>.(.*)s Stats.*<\/div>/s;
print $1;

What I am sure of is that the regex skips over the line ending and goes on to capture the name Cristina, which is what I actually want it to match. Well, I'm pretty close to it. I'm not sure how WWW::Mechanize handles &#39;: is it decoded into a character, or does it appear literally, as written, in $mech->content? (I didn't check that.)

Hmm, look how PerlMonks displays it: '

The fact is that the regex works fine in the test code above, but against the real web content in $mech->content it does not work as expected: it doesn't match anything at all.

How can I fix this regex? Or what other WWW::Mechanize methods or properties could help solve the problem?

Replies are listed 'Best First'.
Re: regex and WWW::Mechanize parsing a ->content of mechanize object
by Cody Pendant (Prior) on Jul 01, 2007 at 03:16 UTC
    Are you absolutely sure that WWW::Mechanize is seeing the same HTML you are?

    Print out $agent->content() and make sure it contains what you think it does.

    I've been caught before by a server sending different code to Mechanize than it sent to my browser, because it saw it as a different agent.



    Nobody says perl looks like line-noise any more
    kids today don't know what line-noise IS ...
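
The check suggested in the reply above can be sketched in plain Perl. This is only an illustration: context_around is a hypothetical helper, not a WWW::Mechanize method, and the sample string stands in for whatever $mech->content actually returns. It shows the raw text around the marker so you can see exactly what the script received.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): returns the text around
# the first occurrence of $marker in $content, or nothing if it is absent.
sub context_around {
    my ($content, $marker, $radius) = @_;
    $radius = 40 unless defined $radius;

    my $pos = index($content, $marker);
    return if $pos < 0;                      # marker not present at all

    my $start = $pos - $radius;
    $start = 0 if $start < 0;                # clamp to the start of the string
    return substr($content, $start, $pos - $start + length($marker) + $radius);
}

# Stand-in for $mech->content; in a real script you would pass that instead.
my $content = qq{<div style="font-weight: bold;"> Cristina&#39;s Stats </div>};

my $seen = context_around($content, 's Stats');
print defined $seen
    ? "Found: [$seen]\n"
    : "Marker not found in fetched content\n";
```

If the marker is missing from the real content, the page served to the script probably differs from what the browser shows. WWW::Mechanize's agent_alias method can be used to present a browser-like User-Agent in that case.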
Re: regex and WWW::Mechanize parsing a ->content of mechanize object
by c4onastick (Friar) on Jul 01, 2007 at 02:51 UTC
    You can make this a little more specific by using:
    m%>\s+(\w+)&#39;s Stats\s+</div>%s; print $1;
    This is, of course, under the assumption that every one you want to match is in the form "Name's Stats".

      Thank you very much for the answer, but the main problem still remains: the script only works in the test case and not in a real case with WWW::Mechanize...
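
One possible reason a pattern works on the test string but not on the fetched page is that the real text differs in small ways. The sketch below is a more tolerant variant of the regex from the reply, under assumptions the thread does not confirm: that the apostrophe may arrive as &#39;, &apos; or a literal ', and that the closing tag may be </div> or an escaped <\/div>. Note that qq/.../ in the test case turns <\/div> into </div>, so the test string never contained the backslash. stats_name is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: extracts the name from "Name's Stats" before a
# closing div. Assumptions (not confirmed by the thread): the apostrophe may
# be &#39;, &apos; or a literal ', and the tag may be </div> or <\/div>.
sub stats_name {
    my ($html) = @_;
    return $1 if $html =~ m{>\s*(\w+)(?:&#39;|&apos;|')s\s+Stats\s*<\\?/div>}s;
    return;    # no match
}

# Sample inputs standing in for $mech->content.
my @samples = (
    q{<div style="font-weight: bold;"> Cristina&#39;s Stats </div>},
    q{<div style="font-weight: bold;"> Cristina's Stats <\/div>},
);

for my $sample (@samples) {
    my $name = stats_name($sample);
    print defined $name ? "$name\n" : "no match\n";
}
```

Like the original suggestion, this assumes the name is a single run of word characters; a name containing spaces or punctuation would need a wider capture.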