Re: Extract info from HTML

Parsing HTML with regexen is always hard. I'd suggest that you use HTML::LinkExtor to grab all of the links off of the page, then sort through them looking for the ones you actually want.
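
For what it's worth, here is a minimal sketch of that approach. The sample HTML and the `/user/` filter pattern are made up for illustration; substitute whatever distinguishes the links you care about on the real page.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample page; in practice this would be the fetched HTML.
my $html = <<'HTML';
<html><body>
<a href="/user/alice">alice</a>
<a href="/about.html">About</a>
<img src="/logo.png">
<a href="/user/bob">bob</a>
</body></html>
HTML

my @links;

# The callback receives the tag name followed by attribute => URL pairs.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    push @links, $attr{href};
});

$parser->parse($html);
$parser->eof;

# Sort through the links, keeping only those matching the assumed pattern.
my @wanted = grep { m{^/user/} } @links;

print "$_\n" for @wanted;
```

Note that the filter only ever sees the URL itself, so this works only when the links you want can be told apart by their href alone.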

The other possibility would be to use HTML::TreeBuilder, but that might turn out to be more complicated. That said, I've found that TreeBuilder is a much easier way of thinking about HTML than HTML::Parser. Most of the useful documentation for HTML::TreeBuilder is found under HTML::Element, BTW.
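
A rough sketch of the TreeBuilder route, assuming (hypothetically) that the links of interest sit inside `div` elements with a `class` of `reply`; the markup here is invented, so adjust the `look_down` criteria to match the real page. `look_down` and `as_text` are among the methods documented under HTML::Element.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page; the class names are assumptions for this sketch.
my $html = <<'HTML';
<html><body>
<div class="reply"><a href="/user/alice">alice</a> wrote: ...</div>
<div class="nav"><a href="/user/carol">carol</a></div>
<div class="reply"><a href="/user/bob">bob</a> wrote: ...</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @authors;

# In list context, look_down() returns every element matching the criteria.
for my $reply ($tree->look_down(_tag => 'div', class => 'reply')) {
    # In scalar context, look_down() returns the first match (or undef).
    my $link = $reply->look_down(_tag => 'a') or next;
    push @authors, $link->as_text;
}

# Free the tree explicitly once done with it.
$tree->delete;

print "$_\n" for @authors;
```

The difference from the link-list approach is that the selection here uses where a link sits in the document, not just what its URL looks like.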

perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^+*`^ ve^#$&V"+@( NO CARRIER'

Replies are listed 'Best First'.
(crazyinsomniac) Re^2: Extract info from HTML
by crazyinsomniac (Prior) on Nov 12, 2001 at 13:00 UTC

"I'd suggest that you use HTML::LinkExtor to grab all of the links off of the page, then sort through them looking for the ones you actually want."

In some cases, that might work, but for this particular one, he has no way of determining which links were those of "authors" who replied to Name Space.

As for HTML::TreeBuilder, demerphq puts on a nice show on how you'd do it, but for me, it's too much work (and a lil' bit of a mind bender).

___crazyinsomniac_______________________________________
Disclaimer: Don't blame. It came from inside the void
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"
