in reply to Re: HTML parsing OR capturing text from a string within tags
in thread HTML parsing OR capturing text from a string within tags

Popcorn Dave, I looked at your code, but I don't know how it works yet. Will it allow me to add my own string and remove the text right after it? For example...
<div\042\... > Person <b> Ran <\div>
Will it allow me to capture Person Ran? I think this is the file where I can add my own tags :)
HTML-Tree-3.23/lib/HTML/AsSubs.pm

Re^3: HTML parsing OR capturing text from a string within tags
by Popcorn Dave (Abbot) on Dec 24, 2006 at 09:12 UTC
    All that code does is get an HTML page and parse it into tokens. It will spit the whole mess out, so I ran it at the command line and redirected the output to a file, e.g. perl tokeparser.pl > output.txt

    That way you can scan through the file and see how it's tokenizing the information you fed it.
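
For the capture asked about above, a minimal sketch using HTML::TokeParser on a string rather than a fetched page might look like the following. The class="name" attribute is a made-up stand-in, since the attributes in the original example were elided, and the closing tag is written as </div>. This is only an illustration of the token-based approach, not the code from the earlier post.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup; the attribute is an assumption made for this sketch.
my $html = '<div class="name"> Person <b> Ran </div>';

# Passing a reference to a scalar tells TokeParser to parse the string itself.
my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

my @captured;
# get_tag('div') skips ahead to the next <div ...> start tag.
while (my $tag = $p->get_tag('div')) {
    # Collect the text up to the closing </div>, ignoring tags such as <b>
    # along the way, with surrounding whitespace trimmed and runs collapsed.
    push @captured, $p->get_trimmed_text('/div');
}

print "$_\n" for @captured;
```

Nothing in HTML-Tree needs to be edited for this; the tag name to look for is just an argument to get_tag.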

    Revolution. Today, 3 O'Clock. Meet behind the monkey bars.

    If quizzes are quizzical, what are tests?

      Yahoo offers something that I can use. I can send Yahoo a request and it will send me an XML file, BUT I am getting errors because Yahoo has URLs with &'s in the file. I can either replace all of the &'s with %26, save the file, and then let XML::Parser do the work, or I can look at the parser code, determine where it parses the file, and make the change there. I found where it parses the file in Expat.pm :: sub parse. It then calls ParseString(), but I can't find the sub ParseString.

      http://local.yahooapis.com/LocalSearchService/V2/localSearch?appid=YahooDemo&query=plumbing&zip=22222&format=php&results=10

      Kevin
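
In XML a literal & has to be written as &amp;, so one way to pre-process the saved response without touching the parser is to escape only the ampersands that do not already begin an entity. The sketch below uses core Perl only; the helper name escape_bare_ampersands and the sample <Url> element are hypothetical, and it assumes the file contains no CDATA sections (where a bare & is legal and should be left alone). Note that %26 would also make the file parse, but it changes the URL itself, because the & would no longer act as a query-string separator.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every '&' that is not the start of a named,
# decimal, or hexadecimal entity reference into '&amp;'.
sub escape_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

# Made-up sample line standing in for part of the saved response.
my $raw   = '<Url>http://example.com/search?query=plumbing&zip=22222</Url>';
my $fixed = escape_bare_ampersands($raw);

print "$fixed\n";
```

The cleaned string can then be handed to whatever XML parser is in use. Already-escaped text such as &amp; or &#38; passes through unchanged, so running the helper twice does no harm.
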
        I'm not sure why XML::Parser would complain about the & between code tags, but I've never used it myself. You might have a go at that with XML::Simple. I've seen quite a few monks say positive things about that module.

        As far as the & goes, there are monks who are better equipped to handle that question. The code I pointed you to was for tearing down HTML into parsed tokens, not for dealing with XML.
