Re: Pattern matching html.

Generally speaking, parsing HTML files using regular expressions is a dangerous business. There's always just one more complication to take into account.

A far better approach is to use a proper HTML parser. The HTML::Parser module is available from CPAN, but it seems to me that one of its subclasses, HTML::TokeParser or HTML::TreeBuilder, might be more appropriate in this case. There's a particularly good article on HTML::TreeBuilder in the current Perl Journal (issue 19).
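As a rough illustration of the token-based style, here is a minimal sketch using HTML::TokeParser. It assumes the HTML-Parser distribution is installed from CPAN; the sub name extract_links and the sample markup are made up for the example and have nothing to do with the page in the original question.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser distribution on CPAN

# Hypothetical helper: return [href, link text] pairs for every
# <a href="..."> element found in an HTML string.
sub extract_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my @links;
    # get_tag('a') skips ahead to the next <a ...> start tag
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};    # element 1 is the attribute hash
        next unless defined $href;     # ignore <a name="..."> anchors
        # Read text up to the closing </a>, with whitespace collapsed
        my $text = $parser->get_trimmed_text('/a');
        push @links, [$href, $text];
    }
    return @links;
}

# Made-up sample document
my $sample = <<'HTML';
<html><body>
<p>See <a href="http://www.cpan.org/">the
   CPAN</a> and <a name="top">an anchor</a>.</p>
</body></html>
HTML

printf "%s => %s\n", @$_ for extract_links($sample);
```

The parser takes care of attribute quoting, case differences in tag names and line breaks inside elements, which are exactly the complications a hand-rolled regex tends to miss.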

--
<http://www.dave.org.uk>

"Perl makes the fun jobs fun
and the boring jobs bearable" - me

Replies are listed 'Best First'.

Re: Re: Pattern matching html.
by zzspectrez (Hermit) on Nov 25, 2000 at 05:45 UTC

I read the article in Perl Journal. I'm not sure if its solution would work well with my situation. The data I'm looking for is not embedded within HTML tokens but is within JavaScript functions within the HTML source. Unless I am mistaken, HTML::TreeBuilder will not be able to help me with this problem. The data I am looking for is within some JavaScript calls like `if (day = 10) document.write("The data I want")`

I have not used these modules, so maybe I am missing something. About all I can see using them for is to grab all the functions within the `<script LANGUAGE="JavaScript"> .... blah blah ... </script>` tags. But even this doesn't seem as straightforward, since there are three different script sections, and once I get the data I still have to parse it with a regular expression, right?

Thanks!
zzSPECTREz
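
For what it's worth, the two-step idea described above could be sketched roughly like this, assuming HTML::TokeParser (from the HTML-Parser distribution on CPAN) is installed. The sub names script_bodies and written_strings and the sample page are hypothetical, and the regular expression only handles a double-quoted string literal passed directly to document.write(), so treat it as a starting point rather than a complete answer.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser distribution on CPAN

# Step 1 (hypothetical helper): collect the body of every <script>
# element, however many script sections the page contains.
sub script_bodies {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my @bodies;
    while ($parser->get_tag('script')) {
        # Everything up to the matching </script> is returned as text
        push @bodies, $parser->get_text('/script');
    }
    return @bodies;
}

# Step 2 (hypothetical helper): pull out double-quoted string literals
# passed to document.write(). Backslash escapes inside the string are
# allowed; single-quoted or concatenated arguments are not handled.
sub written_strings {
    my ($js) = @_;
    return $js =~ /document\.write\(\s*"((?:[^"\\]|\\.)*)"\s*\)/g;
}

# Made-up sample page with more than one script section
my $page = <<'HTML';
<html><head>
<script LANGUAGE="JavaScript">
if (day == 10) document.write("The data I want");
</script>
</head><body>
<p>Ordinary markup that should be ignored.</p>
<script LANGUAGE="JavaScript">
document.write("More data");
</script>
</body></html>
HTML

print "$_\n" for map { written_strings($_) } script_bodies($page);
```

So a regular expression is still needed for the second step, but it only ever sees JavaScript source, and the loop in the first step handles any number of script sections without extra work.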
