Re: Re: Extracting similar data from html

What would be the best way to break the code into smaller chunks? The best approach I can think of wouldn't help, because these two examples would still end up in the same chunk: both fall under the same general category, Geography.

And to tell you the truth, I had the same idea about them publishing to XML the first time I checked out the pages, and I even predicted on the spot that their next release will also come out in XML. If they don't, it's just damn dehumifiying. =)


Replies are listed 'Best First'.
Re: Re: Re: Extracting similar data from html by ichimunki (Priest) on Jan 24, 2001 at 19:57 UTC
MeowChow's advice on how to work with the single RE looks good to me, except that using it on all the different countries assumes the matches will all be valid and found in the same order on every page (so if a visual survey confirms this will work, by all means use that). This is the problem with RE-based parsing of HTML/XML: it seems like every time you solve one problem, you find at least one more that impacts your last solution. The HTML::TokeParser module is really easy to use and will make this whole job a lot easier.
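
For what it's worth, here is a minimal sketch of what the HTML::TokeParser route might look like. The markup below is made up for illustration (the real pages' structure isn't shown in this thread), and `extract_fields` is a hypothetical helper name, so treat this as a starting point under those assumptions rather than a drop-in solution.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical markup: assumes each field is a bold label followed by
# its value inside the same paragraph. Adjust the tags to the real pages.
my $html = <<'HTML';
<h2>Geography</h2>
<p><b>Area:</b> 1,000 sq km</p>
<p><b>Climate:</b> temperate</p>
HTML

# Hypothetical helper: returns a hash ref of label => value pairs.
sub extract_fields {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser";
    my %field;

    # Skip ahead to each <b> start tag, wherever it appears in the page.
    while ($p->get_tag('b')) {
        my $label = $p->get_trimmed_text('/b');   # text up to </b>
        $p->get_tag('/b');                        # consume the </b> itself
        my $value = $p->get_trimmed_text('/p');   # text up to </p>
        $label =~ s/:\s*$//;                      # drop the trailing colon
        $field{$label} = $value;
    }
    return \%field;
}

my $fields = extract_fields($html);
print "$_ => $fields->{$_}\n" for sort keys %$fields;
```

Because the loop keys off the tags rather than a fixed position in the text, a page that lists its fields in a different order, or omits one, should still yield whatever labels it does contain.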
