Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a link extractor that works with a command-line argument, and I now need to make it work with File::Find. Here is the original version, which takes the URL as a command-line argument:
use HTML::LinkExtor;
use LWP::Simple;

$base_url = shift;
$parser = HTML::LinkExtor->new(undef, $base_url);
$parser->parse(get($base_url))->eof;
@links = $parser->links;
foreach $linkarray (@links) {
    @element  = @$linkarray;
    $elt_type = shift @element;
    while (@element) {
        ($attr_name, $attr_value) = splice(@element, 0, 2);
        $seen{$attr_value}++;
    }
}
for (sort keys %seen) { print $_, "\n"; }
My attempt to add File::Find gives me an error message: "error Permission denied: /thedirectory". How can I get this to work?
use HTML::LinkExtor;
use LWP::Simple;
use File::Find;

sub LinkRoutine {
    my $name = $File::Find::name;
    open ( FH, $name ) or die "error $!: $name\n";
    while (my $line = <FH>) {
        $base_url = $name;
        $parser = HTML::LinkExtor->new(undef, $name);
        $parser->parse(get($name))->eof;
        @links = $parser->links;
        foreach $linkarray (@links) {
            @element  = @$linkarray;
            $elt_type = shift @element;
            while (@element) {
                ($attr_name, $attr_value) = splice(@element, 0, 2);
                $seen{$attr_value}++;
            }
        }
        close(FH)
    }
}

find( \&LinkRoutine, "/thedirectory" );
for (sort keys %seen) { print $_, "\n"; }

Re: File Find error
by chip (Curate) on Jun 23, 2003 at 18:36 UTC
    Trying to open a directory as a file is not OK on some operating systems, and almost never helpful anyway. I suggest the first thing in LinkRoutine, after assigning to $name, should be:

       return unless -f $name;

        -- Chip Salzenberg, Free-Floating Agent of Chaos
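A minimal sketch of where such a check fits in a File::Find callback, assuming File::Find's default behaviour of changing into each directory as it walks (so `$_` is the bare file name and `$File::Find::name` the full path). The `html_files` helper and the `.htm`/`.html` name filter are illustrative assumptions, not part of the original script. Testing `$_` rather than the full path also keeps the `-f` check working when the starting directory is given as a relative path.

```perl
use strict;
use warnings;
use File::Find;

# html_files() is a hypothetical helper: it returns the full paths of the
# plain files (directories are skipped) under $dir whose names end in
# .htm or .html.
sub html_files {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # File::Find has chdir'ed into $File::Find::dir, so $_ is the
            # bare file name; testing $_ also works when $dir is relative.
            return unless -f $_;
            return unless /\.html?\z/i;
            push @found, $File::Find::name;
        },
        $dir,
    );
    return sort @found;
}

# Run as a script: list HTML files under the directory given on the
# command line (defaults to the current directory).
unless (caller) {
    print "$_\n" for html_files(@ARGV ? $ARGV[0] : '.');
}
```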

      I added it as you suggested, and now I don't get an error message, but I don't get any output at all either. Please advise what else I need to do to get this to work:
      use HTML::LinkExtor;
      use LWP::Simple;
      use File::Find;

      sub LinkRoutine {
          my $name = $File::Find::name;
          return unless -f $name;
          open ( FH, $name ) or die "error $!: $name\n";
          while (my $line = <FH>) {
              $base_url = $name;
              $parser = HTML::LinkExtor->new(undef, $name);
              $parser->parse(get($name))->eof;
              @links = $parser->links;
              foreach $linkarray (@links) {
                  @element  = @$linkarray;
                  $elt_type = shift @element;
                  while (@element) {
                      ($attr_name, $attr_value) = splice(@element, 0, 2);
                      $seen{$attr_value}++;
                  }
              }
              close(FH)
          }
      }

      find( \&LinkRoutine, "/perl/bin" );
      for (sort keys %seen) { print $_, "\n"; }
        I suspect you don't want to break your file up into individual lines before HTML parsing them. :-) To test whether that's the problem, put:

          local $/;

        before the 'while my $line' loop.

            -- Chip Salzenberg, Free-Floating Agent of Chaos
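A minimal sketch of the script with the whole-file read applied. It slurps each file with the `do { local $/; <$fh> }` idiom and hands the contents to `HTML::LinkExtor` via `parse`, instead of calling `get($name)`. LWP::Simple's `get` fetches a URL, so passing it a bare filesystem path such as `/perl/bin/foo.html` returns undef and nothing gets parsed, whether or not `$/` is changed. The helper names `links_in_file` and `links_under` and the `.htm`/`.html` filter are assumptions for illustration. No base URL is passed, so links are reported exactly as written in the files.

```perl
use strict;
use warnings;
use File::Find;
use HTML::LinkExtor;

# Hypothetical helper: return the link attribute values in one local file.
sub links_in_file {
    my ($path) = @_;
    open my $fh, '<', $path or do { warn "Cannot open $path: $!\n"; return };
    my $html = do { local $/; <$fh> };    # slurp: undef $/ reads the whole file
    close $fh;
    $html = '' unless defined $html;

    my $parser = HTML::LinkExtor->new;    # no callback, so links are stored
    $parser->parse($html);
    $parser->eof;

    my @values;
    for my $link ($parser->links) {
        # Each entry is [ $tag, $attr_name => $attr_value, ... ].
        my ($tag, %attrs) = @$link;
        push @values, values %attrs;
    }
    return @values;
}

# Hypothetical helper: walk $dir and return the unique links, sorted.
sub links_under {
    my ($dir) = @_;
    my %seen;
    find(
        sub {
            # $_ is the bare file name; File::Find has chdir'ed for us.
            return unless -f $_ && /\.html?\z/i;
            my $file = $_;
            $seen{$_}++ for links_in_file($file);
        },
        $dir,
    );
    return sort keys %seen;
}

unless (caller) {
    print "$_\n" for links_under(@ARGV ? $ARGV[0] : '.');
}
```

Saved as a script, it can be run with the directory as its argument (for example `/thedirectory`); with no argument it walks the current directory.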