Hi,

Not sure if you want to use my version, but this seems to work for CSS (JS should be a simple tweak):

sub get_css { 

    my $tmp = shift  ;

    my $self         ;
    my $url          ;
    my $html         ;

    my @result_arr   ;

    my $user_agent = "Html_Miner/0.01" ;
    my $timeout    = 60                ; 

    my $domain       ;
    

    ## First extract all required information.

    if( UNIVERSAL::isa( $tmp, 'HTML::Miner' )  ) { 

        ## Called on an object: reuse what it already holds.
        $self = $tmp                            ;

        $url     =  $self->{ CURRENT_URL      } ;
        $html    =  $self->{ CURRENT_URL_HTML } ;
        $domain  =  $self->{ _BASE_DOMAIN     } ;

    } else { 

        $url = $tmp                             ;

        ## Check for validity of url! 
        ## ( $domain is the outer variable, so it is not shadowed here. )
        my ( $valid_url, $protocol, $uri )      ;
        ( $valid_url, $protocol, $domain, $uri ) =  
            _convert_to_valid_url( $url )       ;
        $url = $valid_url                       ;

        my @params               = @_           ;
        my $html_has_been_passed = @params      ;

        if( $html_has_been_passed ) { 
            $html = shift                       ;
        } else { 

            ## Need to retrieve html 

            eval { 
                require LWP::UserAgent          ;
                require HTTP::Request           ;
            }; 
            croak( "LWP::UserAgent and HTTP::Request are required "
                 . "if the url is to be fetched!" ) 
                if( $@ );

            $html = _get_url_html( $url, $user_agent, $timeout )   ;

        } ## HTML Not passed

    }     ## Not called on Object.

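    ## Match href anywhere inside a single <link ...> tag; [^>] stops the
    ## match from running past the end of that tag.
    while( $html =~ m/<link\b[^>]*?\bhref\s*=\s*["']([^"']+?\.css)["'][^>]*>/gi ){
        push( @result_arr, $1 );
    }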

    return \@result_arr;

}
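
For the JS tweak mentioned above, a minimal sketch might look like the following. It is a standalone illustration rather than part of HTML::Miner: the name extract_js_urls is hypothetical, it takes the HTML as a plain string, and it assumes the script URL ends in .js with no query string.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of HTML::Miner): collects the src of
## every <script> tag whose URL ends in .js from an HTML string.
sub extract_js_urls {
    my ( $html ) = @_;
    my @result_arr;

    ## [^>] keeps each match inside one tag; src may be quoted with
    ## either single or double quotes and may appear in any position.
    while( $html =~ m/<script\b[^>]*?\bsrc\s*=\s*["']([^"']+?\.js)["'][^>]*>/gi ) {
        push( @result_arr, $1 );
    }

    return \@result_arr;
}

my $sample = '<script type="text/javascript" src="/js/main.js"></script>';
print "$_\n" for @{ extract_js_urls( $sample ) };   ## prints /js/main.js
```

Inline scripts with no src attribute are skipped. A regex like this is still only a rough filter, so a real HTML parser would be safer for messy pages.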

Cheers

Andy

In reply to Re^5: Extract CSS + JS + Image URLs from a HTML page? by ultranerds
in thread Extract CSS + JS + Image URLs from a HTML page? by ultranerds
