I am hoping to add a module to CPAN and would like some feedback, comments, and ideas. The functionality of this module is detailed below. I am not really sure what I should call it. So far I have the following options in mind:
The module is intended to be used as part of a web crawler although I have found myself using parts of it elsewhere.
The basic functionality of the proposed module will include:
I intend to make these functions available both independently and together through an object-oriented interface.
The OO part would look something like this:
    my $info = foo::bar->new( {
        CURRENT_URL             => 'www.site_i_am_crawling.com/page_i_am_crawling.html',  ## new() will croak if this is not provided.
        FIND_CONTAINED_URLS     => 1,                   ## Default 1
        BREAK_CONTAINED_URLS    => 1,                   ## Default 1
        ABSOLUTE_CONTAINED_URLS => 1,                   ## Default 1
        CLEAN_URLS              => 1,                   ## Default 1
        CURRENT_URL_HTML        => "long string here",  ## Optional, will be extracted if this is not provided.
        'USER-AGENT'            => '',                  ## Quoted: a bare USER-AGENT would parse as a subtraction.
        TIMEOUT                 => 5,
        DEBUG                   => 0,
    } );

    $info->get_url_info(
        ## Can reset object parameters here.
        ## All processing will be performed only when this function is called.
    );

    my @array_of_urls = $info->get_contained_urls();

    ## ALSO for NON-OO
    @array_of_urls = get_contained_urls( URL => '', HTML => '' );

    ...

    my $all_results = $info->get_all_results();
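To make the non-OO call more concrete, here is a minimal sketch of what get_contained_urls( URL => ..., HTML => ... ) could do if it were built on the existing CPAN modules HTML::LinkExtor and URI. This is an assumption about the intended behaviour, not the proposed implementation: it only looks at href attributes on a tags, it assumes the URL argument carries a scheme such as http://, and it treats "clean" as meaning canonical form with the fragment dropped and duplicates removed.

```perl
use strict;
use warnings;
use HTML::LinkExtor;    # CPAN, part of the HTML-Parser distribution
use URI;                # CPAN

# Hypothetical stand-in for the proposed non-OO get_contained_urls().
# Returns the absolute, canonical URLs found in <a href="..."> attributes.
sub get_contained_urls {
    my %args = @_;
    my $base = $args{URL};     # assumed to include a scheme, e.g. http://
    my $html = $args{HTML};

    my ( @urls, %seen );
    my $parser = HTML::LinkExtor->new(
        sub {
            my ( $tag, %attr ) = @_;
            return unless $tag eq 'a' && defined $attr{href};

            # Resolve against the page URL, then normalise case and escapes.
            my $url = URI->new_abs( $attr{href}, $base )->canonical;
            $url->fragment(undef);    # "#top" and friends are not separate pages

            my $string = $url->as_string;
            push @urls, $string unless $seen{$string}++;
        }
    );
    $parser->parse($html);
    $parser->eof;

    return @urls;
}

my @found = get_contained_urls(
    URL  => 'http://www.example.com/dir/page.html',
    HTML => '<a href="../about.html">About</a> <a href="HTTP://Example.ORG/x#top">X</a>',
);
print "$_\n" for @found;
```

Comparing against a sketch like this may also help pin down which parts of the proposed module (breaking URLs into components, cleaning, fetching the HTML) go beyond what these two modules already provide.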
The following is a list of existing CPAN modules that are similar to the one proposed here.
Replies are listed 'Best First'.

Re: RFC: URI::URL::Detail
by moritz (Cardinal) on Aug 07, 2009 at 13:26 UTC
by tmharish (Friar) on Aug 07, 2009 at 14:09 UTC
by moritz (Cardinal) on Aug 07, 2009 at 16:04 UTC

Re: RFC: URI::URL::Detail
by Anonymous Monk on Aug 08, 2009 at 10:05 UTC
by tmharish (Friar) on Aug 11, 2009 at 11:43 UTC

Re: RFC: URI::URL::Detail
by tmharish (Friar) on Aug 09, 2009 at 08:27 UTC
by Jenda (Abbot) on Aug 10, 2009 at 09:51 UTC
by tmharish (Friar) on Aug 11, 2009 at 08:14 UTC
by Anonymous Monk on Aug 11, 2009 at 08:27 UTC