in reply to File/String search...
Ah, jcwren has the right module name.
(jcwren) RE: Re: File/String search...
by jcwren (Prior) on Sep 07, 2000 at 01:29 UTC
Be aware, though, that this module may be a little difficult to wrap your brain around if you're new to Perl. If you're not comfortable with sub-classing and basic OO, it may be a little overwhelming. That said, there are some examples, and if you dig in the docs, you'll find something that you should be able to carve up for your purposes. However, it's not something you'll be able to do in 20 minutes...

I can't tell exactly what you're trying to do from what you've provided, but there is a module called HTML::TableParser. Since you're using <TR>/</TR> tags, you're dealing with table rows, and HTML::TableParser is useful for yanking the data out of tables. The problem is that if you need the HREF attribute info, HTML::TableParser won't give it to you. I had a similar problem in the luke_repwalker.pl script. At the bottom of that code is a package you may be able to extract; with a little tinkering, it would allow you to pull out both the table text and the HREF links. Unless I'm making it more complicated than what you're trying to do, this may be of help. If you need some additional assistance getting that working, drop me a /msg or an e-mail and we'll see what we can get going.

Using regexps to extract HTML *can* work, but it's not the best idea. Certain tags aren't balanced pairs, which can really mess you up. Also, there are some places where people will render the starting tags, but not the ending tags. Most browsers, trying to be the accommodating beasts they are, don't care about the end tags. This is particularly true of table rows and data. As such, unless you can be assured that the HTML conforms to the DTD spec, using regexps is risky business.

This code was something I came up with, based on a /msg from Sharky_The_Dog. I realize that it could be collapsed into one statement, but that wasn't the point (and, dang it, tilly, I know $filename and $match could be 'use vars'!). It's also based on the fact that Sharky says his HTML is machine generated, and legal.
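To make the regexp caveat above concrete, here is a minimal sketch. It is not the code referred to in this post, just an illustration. The sample HTML strings and the `rows_by_regexp` helper are made up for the example. It shows a naive row-matching regexp working on balanced markup and silently finding nothing once the optional end tags are dropped, which is the kind of HTML a browser would still display without complaint.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: the same two-row table, once with every tag
# closed and once with the optional </TD> and </TR> end tags left out.
my $balanced = '<TABLE><TR><TD>one</TD></TR><TR><TD>two</TD></TR></TABLE>';
my $sloppy   = '<TABLE><TR><TD>one<TR><TD>two</TABLE>';

# Naive extraction (illustrative helper, not from any module):
# capture whatever sits between <TR> and the next </TR>.
sub rows_by_regexp {
    my ($html) = @_;
    my @rows = $html =~ m{<TR>(.*?)</TR>}gis;
    return @rows;
}

my @good = rows_by_regexp($balanced);
my @bad  = rows_by_regexp($sloppy);

printf "balanced: %d rows\n", scalar @good;   # prints "balanced: 2 rows"
printf "sloppy:   %d rows\n", scalar @bad;    # prints "sloppy:   0 rows"
```

If the HTML really is machine generated and every row is closed, the first case is what you get and a regexp like this can be good enough. If you can't guarantee that, a real parser is the safer bet.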
--Chris