Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I know the beauty of Perl modules is that I can edit them to suit my purposes, and of course I should write to the author (in this case brian d foy) and ask him for an update to the code. But right now, can I get a witness, I mean, can I get a hand adding one more sub to HTML::SimpleLinkExtor so that I can use it to grab remote script files? The tag is <SCRIPT> and the attribute is SRC and I swear, I've tried, I just can't figure out where to start...


($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print

Re: Who wants to help me adjust LinkExtor::Simple?
by tachyon (Chancellor) on Jun 15, 2004 at 06:01 UTC

    Add this line:

    script tag

    to the %AUTO_METHODS hash and it should work (untested):

    %AUTO_METHODS = qw(
        background attribute
        href       attribute
        src        attribute
        a          tag
        area       tag
        base       tag
        body       tag
        img        tag
        frame      tag
        script     tag
    );
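With a `script tag` entry in %AUTO_METHODS (or with HTML::SimpleLinkExtor 1.05 or later, which a reply further down says includes it), calling the new method might look like the minimal sketch below. The sample HTML and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::SimpleLinkExtor;

# Made-up sample page; only the SCRIPT tag matters here.
my $html = <<'HTML';
<html>
<head><script language="JavaScript" src="/foo.js"></script></head>
<body><a href="/some-link"><img src="/some-image"></a></body>
</html>
HTML

my $extor = HTML::SimpleLinkExtor->new;
$extor->parse($html);

# script() is only available when %AUTO_METHODS has a "script tag" entry
# (assumption: patched module, or version 1.05 or later).
my @scripts = $extor->script;
print "$_\n" for @scripts;
```

No base URL is passed to new() here, so the links should come back as they appear in the page.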

    cheers

    tachyon

      HTML::SimpleLinkExtor 1.05, just released, contains this addition. I also included a note in the docs to tell people that they can do this while they wait for me to upload a fix.
      Tachyon, you're a legend. It works! Thanks so much. Five seconds with pico and I'm ready to go. Great stuff.


Re: Who wants to help me adjust LinkExtor::Simple?
by PodMaster (Abbot) on Jun 15, 2004 at 05:56 UTC
    The tag is <SCRIPT> and the attribute is SRC and I swear, I've tried, I just can't figure out where to start...
    You start by reading the source, and showing us what you tried :)

    I would just use HTML::LinkExtractor. The interface is a tad different, but it has better support for "links."

    On the other hand, looking at the HTML::SimpleLinkExtor source I can see that it's using HTML::LinkExtor which decides what constitutes a link via %HTML::Tagset::linkElements.

    Update: Hmm, it looks like %HTML::Tagset::linkElements already has script/src in there, so I'd suggest you show the HTML and the Perl that's not working out for you.
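One way to check this is to look at the table directly and then let HTML::LinkExtor do the extraction with a callback. This is a minimal sketch, not the code of either module; the sample HTML and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use HTML::Tagset;

# Made-up sample page for illustration.
my $html = <<'HTML';
<html>
<head><script language="JavaScript" src="/foo.js"></script></head>
<body><a href="/some-link"><img src="/some-image"></a></body>
</html>
HTML

# Which attributes of <script> does HTML::Tagset treat as links?
my @script_link_attrs = @{ $HTML::Tagset::linkElements{script} || [] };
print "script link attributes: @script_link_attrs\n";

# The callback receives the lowercased tag name followed by
# attribute/value pairs for the link attributes found on that tag.
my @script_srcs;
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %links) = @_;
    push @script_srcs, $links{src}
        if $tag eq 'script' && defined $links{src};
});
$parser->parse($html);
$parser->eof;

print "$_\n" for @script_srcs;
```

Because no base URL is given to new(), the callback sees the attribute values as plain strings rather than URI objects.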


Re: Who wants to help me adjust LinkExtor::Simple?
by saskaqueer (Friar) on Jun 15, 2004 at 05:45 UTC

    This:

    ... and of course I should write to the author (in this case Brian D Foy) and ask him for an update to the code...

    and this:

    $ perldoc HTML::SimpleLinkExtor
    <snip>
    =head1 TO DO

    This module doesn't handle all of the HTML tags that might have links. If someone wants those, I'll add them.

    That's all that anybody needs to say :)

      I did see that! : ) I was just opening it up, hoping it would say "here's where we extract the IMG SRC attributes" and then go ahead with something I could munge, like an if ($tag eq 'img') { push @img_srcs, $attributes{'src'} } kind of thing, but no, it was all rather more obscure than that.

      I'll write to Brian but in the meantime, anyone got any ideas for me?

      The other way to go, of course, is to get into HTML::TokeParser or something and do it myself, but that seems a shame when this module does everything I want except that one thing.



Re: Who wants to help me adjust LinkExtor::Simple?
by saskaqueer (Friar) on Jun 15, 2004 at 06:26 UTC

    Carrying on with my recent fetish for HTML::Parser, I provide this. It will grab the 'src', 'href', 'background' and any other attributes you wish from any HTML element. This could be changed very easily to limit which tag elements are recognized, etc. Enjoy.

    #!perl -w
    use strict;
    use HTML::Parser;

    # list of html attributes which contain UR[IL]s
    my @ATTR = qw( src href background );

    my $parser = HTML::Parser->new(
        start_h => [ \&parser_tag, 'self, attr' ]
    );

    $parser->parse_file( *DATA );
    print join($/, @{ $parser->{_links} }), $/;

    sub parser_tag {
        my ($self, $attr) = @_;

        while ( my ($attr_n, $attr_v) = each %$attr ) {
            next unless grep $_ eq $attr_n, @ATTR;
            push @{ $self->{_links} }, $attr_v;
        }
    }

    __DATA__
    <html>
    <head>
    <title>Sample Page</title>
    <script language="JavaScript" src="/foo.js"></script>
    <style type="text/css" src="/bar.css"></style>
    </head>
    <body background="/qux.jpg">
    <a href="/some-link"><img src="/some-image" /></a>
    </body>
    </html>
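As one way of limiting which elements are recognized, the handler can also ask HTML::Parser for the tag name and check it against a lookup table. This is a minimal sketch assuming only SCRIPT/SRC is wanted; the %WANTED table, the @links array, and the sample HTML are made up here and are not part of the post above.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical lookup table: tag name => attributes to collect from it.
my %WANTED = ( script => ['src'] );

# Made-up sample page; upper-case markup to match the original question.
my $html = <<'HTML';
<HTML>
<HEAD><SCRIPT LANGUAGE="JavaScript" SRC="/foo.js"></SCRIPT></HEAD>
<BODY BACKGROUND="/qux.jpg"><A HREF="/some-link"><IMG SRC="/some-image"></A></BODY>
</HTML>
HTML

my @links;
my $parser = HTML::Parser->new(
    api_version => 3,
    # 'tagname' and 'attr' are lowercased by the parser by default.
    start_h => [
        sub {
            my ($tagname, $attr) = @_;
            my $names = $WANTED{$tagname} or return;   # skip other tags
            for my $name (@$names) {
                push @links, $attr->{$name} if defined $attr->{$name};
            }
        },
        'tagname, attr',
    ],
);
$parser->parse($html);
$parser->eof;

print "$_\n" for @links;
```

Adding more tags is then a matter of extending %WANTED, for example with an entry such as img => ['src'].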