Re: HTML::TokeParser Problem

Re^2: HTML::TokeParser Problem by sirius98 (Acolyte) on Dec 17, 2004 at 03:14 UTC
It's an ASP page. The actual tag, as it appears when I view the page source, is below; note the problem at `saloon.18"`. How do I fix this? `<input style="display:none" name="stradtext" type="text" value="Mercedes-Benz S500 W220 2002 Zircon Silver Anthracite trim, 4dr Sedan 4.96ltr 8 Cyls Petrol 5sp Auto, 22884kms, One owner. Fully optioned luxury saloon.18" alloy wheels. Low kilometres. Inspection will not disappoint., ABS, Air Bag, Air Con, CD Player, C'lock, Climate Control, Cruise cont, Electric Seats, Immobiliser, Leather, Logbooks, Mags, Metallic Paint, Phone Kit, Pwr Mirrors, Pwr Windows, Pwr Steer, Radio Cass, Speed Alert, Sunroof, Traction Control, Trip Computer, 802GZE $189950.00 Briggs Carse Moloney 779 Kingsford Smith Drv Eagle Farm (07)36302244">`
Re^3: HTML::TokeParser Problem by graff (Chancellor) on Dec 17, 2004 at 05:41 UTC
I think this is where one is supposed to use a '%HH' expression to safely invoke/evoke the "special" character (a double quote in this case) without screwing up everything else: `... saloon.18%22 alloy...` The same might apply to the apostrophe that occurs later in the same string (replace it with "%27"), and since you'll be doing stuff in Perl with this string, you'd better treat the dollar sign as well ("%24"). By any chance, has something already been done to this text, in terms of "decoding" URI escapes, before you reach the point in your script that throws the error? If so, maybe just postpone that step until later in the script. Update (oops): As tye points out in the following reply, I'm wrong: it's not a URI-escape thing, it's an HTML-entity thing. So my question should have been phrased "has something been done to decode HTML entity references (like `&quot;`)?" If so, don't do that, or do it later.
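To illustrate why the order matters, here is a minimal sketch, assuming HTML::Parser (which supplies both HTML::TokeParser and HTML::Entities) is installed. The sample fragment and the helper `first_input_value` are made up for illustration. Letting the parser decode the attribute value keeps the inch mark, while running `decode_entities` over the whole page first turns `&quot;` back into a bare `"` that ends the attribute early.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# Made-up fragment in which the server escaped the inch mark correctly.
my $page = q{<input name="stradtext" value="luxury saloon.18&quot; alloy wheels">};

# Hypothetical helper: value attribute of the first <input> tag in $html.
sub first_input_value {
    my ($html) = @_;
    my $p   = HTML::TokeParser->new(\$html);
    my $tag = $p->get_tag('input') or return undef;
    return $tag->[1]{value};    # attribute values come back entity-decoded
}

# Right order: tokenize first and let the parser decode the attribute value.
my $good = first_input_value($page);

# Wrong order: decoding the whole page first reintroduces a bare double
# quote, which the parser then treats as the end of the value.
my $decoded_page = decode_entities($page);
my $bad = first_input_value($decoded_page);

print "parse, then decode: $good\n";
print "decode, then parse: $bad\n";
```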
Re^4: HTML::TokeParser Problem (entities) by tye (Sage) on Dec 17, 2004 at 06:51 UTC
No, %XX escapes are for URLs, which this isn't. This is just HTML, so use `&quot;` in place of `"`. - tye
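On the side that generates the page, tye's fix amounts to HTML-escaping the text before it goes into the attribute. A minimal sketch using `encode_entities` from HTML::Entities (part of the HTML::Parser distribution); the ad text is a shortened copy of the one above, and the `<input>` template is only an illustration of how the ASP page might build the tag.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use HTML::TokeParser;

my $ad_text = q{Fully optioned luxury saloon.18" alloy wheels.};

# Escape only the characters that are unsafe inside a double-quoted attribute.
my $escaped = encode_entities($ad_text, q{<>&"});

my $html = qq{<input name="stradtext" type="text" value="$escaped">};
print "$html\n";

# A parser reading that markup gets the original text back, inch mark included.
my $p     = HTML::TokeParser->new(\$html);
my $value = $p->get_tag('input')->[1]{value};
print "$value\n";
```

Of course this only helps whoever controls the ASP page; a scraper has to cope with the markup as served.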
Re^4: HTML::TokeParser Problem by sirius98 (Acolyte) on Dec 17, 2004 at 10:08 UTC
I'm parsing a web page whose content format I have no control over. Basically, I fetch the page with the LWP::UserAgent module, then run HTML::TokeParser on it, then run the while loop from my original post. I'm wondering whether I can set what the TokeParser object I create treats as the end of the attribute value, so that instead of `"` it would be `>`. Any suggestions would be great.
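As far as I know, HTML::TokeParser (and HTML::Parser underneath it) has no option for changing what terminates a quoted attribute value. One workaround is to repair the raw page returned by LWP before handing it to the parser. This is a heuristic sketch, assuming the broken `value` attribute is always the last attribute of the `name="stradtext"` tag and never itself contains `">`; `repair_stradtext` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: treats the first `">` after value=" as the real end of
# the attribute and re-escapes any stray double quotes in between.
# Heuristic only: assumes value is the last attribute of the stradtext tag.
sub repair_stradtext {
    my ($html) = @_;
    $html =~ s{(<input\b[^>]*\bname="stradtext"[^>]*?\bvalue=")(.*?)("\s*>)}{
        my ($head, $val, $tail) = ($1, $2, $3);
        $val =~ s/"/&quot;/g;    # escape every quote inside the value
        $head . $val . $tail;
    }gse;
    return $html;
}

my $raw   = q{<input name="stradtext" value="luxury saloon.18" alloy wheels.">};
my $fixed = repair_stradtext($raw);
print "$fixed\n";
```

The repaired string can then be passed to `HTML::TokeParser->new(\$fixed)` as before, and `value` will come back whole.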
Re^4: HTML::TokeParser Problem by sirius98 (Acolyte) on Dec 18, 2004 at 05:35 UTC
I think the only way I'm going to get what I need out of this tag is to capture the whole tag's contents. Is there a way to get the entire text of the tag instead of, for example, just the value attribute? I have to use TokeParser because I need to pick out one particular tag from several similar tags that differ only by name.
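HTML::TokeParser already hands back the whole tag: for a start tag, `get_tag` returns `[$tag, $attr, $attrseq, $text]`, where `$text` is the tag's original source text. A minimal sketch, assuming the `name` attribute (which comes before the broken `value`) still parses cleanly so the tag can be selected by name. The sample markup is shortened, and the regex that pulls the value out of the raw text is a heuristic that assumes `value` is the last attribute.

```perl
use strict;
use warnings;
use HTML::TokeParser;

my $html = q{<input name="other" value="x">}
         . q{<input style="display:none" name="stradtext" type="text" }
         . q{value="luxury saloon.18" alloy wheels. C'lock, $189950.00">};

my $p = HTML::TokeParser->new(\$html);
my ($raw, $value);
while (my $tag = $p->get_tag('input')) {
    my ($tagname, $attr, $attrseq, $text) = @$tag;
    # Pick the tag by its name attribute, which precedes the broken value.
    next unless defined $attr->{name} && $attr->{name} eq 'stradtext';
    $raw = $text;    # the whole tag, exactly as it appeared in the page
    # Heuristic: everything between value=" and the tag's final ">.
    ($value) = $raw =~ /\bvalue="(.*)"\s*>\z/s;
    last;
}
print "$raw\n$value\n";
```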