Re: Using HTML::TableExtract

As others have pointed out, the newlines in the header strings of the original source are the culprit.

What is generally missed about header-based extraction with HTML::TableExtract, however, is that the strings you use to define the headers are eventually turned into case-insensitive regular expressions. A literal space in your header string therefore won't match a newline in the page's header text, but \s+ will.
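
To see the difference in isolation, here is a minimal sketch in plain Perl. It assumes the header cell in the page reads "Compatible with", a newline, then "Versions", and it only approximates the matching by compiling each string with /i; it is not the module's actual code. The names header_matches and $cell_text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical header cell text: the label is broken across two lines
# in the HTML source, so it contains a newline instead of a space.
my $cell_text = "Compatible with\nVersions";

# Rough stand-in for the behavior described above: treat the header
# string as a case-insensitive regex. Not the module's actual code.
sub header_matches {
    my ($header, $text) = @_;
    return $text =~ /$header/i ? 1 : 0;
}

my $literal  = header_matches('Compatible with Versions',     $cell_text);
my $flexible = header_matches('Compatible\s+with\s+Versions', $cell_text);
my $anycase  = header_matches('compatible\s+WITH\s+versions', $cell_text);

printf "%-16s %s\n", 'literal spaces:', $literal  ? 'match' : 'no match';
printf "%-16s %s\n", 'with \s+:',       $flexible ? 'match' : 'no match';
printf "%-16s %s\n", 'mixed case:',     $anycase  ? 'match' : 'no match';
```

Only the literal-space version fails here, because a space in a pattern matches a space and nothing else, while \s+ also covers the newline.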

So change the following part:

my $te = new HTML::TableExtract( headers => ["Product Name",
                                             "Software Version",
                                             "Compatible with Versions",
                                             "Date" ] );

to this (notice the single quotes; otherwise you'll have to escape your backslashes):

my $te = new HTML::TableExtract( headers => ['Product\s+Name',
                                             'Software\s+Version',
                                             'Compatible\s+with\s+Versions',
                                             'Date' ] );
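
If the quoting point isn't obvious, this small sketch shows what Perl actually stores for each quoting style. It uses only core Perl; $cell_text is a hypothetical header value with an embedded newline, and the /i match is again just an approximation of how the header string ends up being used.

```perl
use strict;
use warnings;

my $single  = 'Product\s+Name';    # backslash survives
my $escaped = "Product\\s+Name";   # same string, backslash escaped by hand
my $double  = do {
    no warnings 'misc';            # hush "Unrecognized escape \s passed through"
    "Product\s+Name";              # backslash is dropped
};

my $cell_text = "Product\nName";   # hypothetical header text with a newline

print "single:  $single\n";
print "escaped: $escaped\n";
print "double:  $double\n";

print "single-quoted pattern matches\n"
    if $cell_text =~ /$single/i;
print "double-quoted pattern does not match\n"
    unless $cell_text =~ /$double/i;
```

The double-quoted version silently turns into a pattern for "Product" followed by one or more "s" characters and then "Name", which is why the header is never found.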

...and things will work as you expect. Also note that rather than strings, you can pass pre-compiled regexps from qr//, like so:

my $te = new HTML::TableExtract( headers => [qr/Product\s+Name/,
                                             qr/Software\s+Version/,
                                             qr/Compatible\s+with\s+Versions/,
                                             'Date' ] );

Cheers,
Matt


Replies are listed 'Best First'.

Re^2: Using HTML::TableExtract
by davido (Cardinal) on Jun 19, 2004 at 04:09 UTC

Thanks for all the answers, everyone. I had it in the back of my mind that the problem might be related to embedded newlines, but I tried embedding my own in the header search strings and just didn't get the combination quite right. The tr/// suggestion was helpful.

But I particularly like the fact that I can pass a regexp in. As I thought the issue over, I actually said to myself, "I wish I could just pass in a regexp." Voilà, I can. ;)

Thanks again.

Dave
