in reply to Re^2: My first socket program is SLOW?
in thread My first socket program is SLOW?

"I'm actually trying to fetch about 6 pages and then snatch stuff out of them with regex's"

The general advice is not to parse or manipulate HTML/XML with a regex, but to use one of the parsing modules on CPAN. For HTML, look at modules such as HTML::Parser and HTML::TokeParser. See also the node "HTML::TokeParser help - parsing headlines", or use Super Search to find more examples.
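
As a rough illustration of the parser-based approach, here is a minimal sketch using HTML::TokeParser. The helper name extract_headlines, the choice of <h2> as the tag of interest, and the sample markup are assumptions made up for this example; adapt them to the pages you are fetching.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the text of every <h2> element in an HTML string.
sub extract_headlines {
    my ($html) = @_;

    # A reference to a scalar makes the parser read the string itself
    # rather than treating the argument as a file name.
    my $parser = HTML::TokeParser->new( \$html )
        or die "Could not create parser\n";

    my @headlines;
    while ( $parser->get_tag('h2') ) {
        # Collect the text up to the closing </h2>, with surrounding
        # whitespace trimmed.
        push @headlines, $parser->get_trimmed_text('/h2');
    }
    return @headlines;
}

# Made-up sample markup, standing in for a fetched page.
my $sample = <<'HTML';
<html><body>
<h2 class="news">First headline</h2>
<p>Some text</p>
<h2>Second headline</h2>
</body></html>
HTML

print "$_\n" for extract_headlines($sample);
```

Unlike a regex, this does not care about attribute order, extra whitespace, or line breaks inside the tag, which is where hand-written patterns tend to break.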

Hope this helps

Martin

Re^4: My first socket program is SLOW?
by ttlgreen (Sexton) on Jan 15, 2009 at 21:43 UTC

    Interesting... Fetching a page using the get function from LWP::Simple is a lot faster than what I was doing... I guess I should have taken the advice and gone with that in the first place.

    Looks like the solution here is to rewrite the whole thing to use LWP::Simple, use threads to fetch all the pages at once, and call it a day!
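
    A rough sketch of that plan, assuming a perl built with thread support. The helper name fetch_all and its optional second argument (a replacement fetcher, handy for trying things out without a network) are made up for illustration; only get comes from LWP::Simple.

```perl
use strict;
use warnings;
use threads;
use LWP::Simple qw(get);

# Hypothetical helper: fetch every URL in its own thread and return a
# hash of url => content (undef where a fetch failed).
# The optional second argument replaces LWP::Simple's get.
sub fetch_all {
    my ( $urls, $fetch ) = @_;
    $fetch ||= \&get;

    my %thread_for;
    for my $url (@$urls) {
        # Assigning to a scalar gives the thread scalar context,
        # so join() below returns a single value.
        my $thr = threads->create( sub { $fetch->($url) } );
        $thread_for{$url} = $thr;
    }

    my %content;
    for my $url ( keys %thread_for ) {
        # join() waits for the thread and hands back its return value.
        $content{$url} = $thread_for{$url}->join;
    }
    return %content;
}

# URLs are taken from the command line, for example:
#   perl fetch_all.pl http://example.com/ http://example.org/
if (@ARGV) {
    my %pages = fetch_all( [@ARGV] );
    for my $url ( sort keys %pages ) {
        my $status = defined $pages{$url}
            ? length( $pages{$url} ) . " bytes"
            : "failed";
        print "$url: $status\n";
    }
}
```

    With only about 6 pages, one thread per URL is simple enough; for many more pages a fixed pool of worker threads would probably be a better fit.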

    Thanks for the great help everyone.

    By the way, Martin, is there a page or discussion somewhere about /why/ "The general advice is not to use a regex to parse/manipulate HTML/XML..."?

    I'd be curious to know more about that. Basically all I'm doing is pulling a few numbers and similar values out of the pages, not trying to do something to all the HTML tags. I'll check out those modules and pages anyway, though.

    Thanks!