DigitalKitty has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.

I am trying to write a small script to count the number of links in an HTML file. I'm not sure how to place an 'extension' on the end of the regex (.com, etc.), but this is what I have so far:

#!/usr/bin/perl -w
use strict;

@ARGV = "test.html";
my $count = 0;
while (<>) {
    foreach (/(http:|ftp:)\/\//g) {
        $count++;
    }
}
print "There are $count links in this file.\n";

If anyone could offer a suggestion ( or criticism ), I would be most appreciative. Thanks, D.K.

Re: Counting links in a file?
by Fletch (Bishop) on Mar 24, 2002 at 18:16 UTC
Re: Counting links in a file?
by Amoe (Friar) on Mar 24, 2002 at 19:47 UTC

    If it's only a really short hack, and the page doesn't contain the strings "http://" or "ftp://" outside of anchor tags, that'll be fine. For a slightly more sophisticated method, you can use the old mainstay HTML::TokeParser.

    use HTML::TokeParser;

    my $count = 0;
    my $parsee = HTML::TokeParser->new('test.html')
        or die "couldn't open test.html: $!";
    while (my $tag = $parsee->get_tag('a')) {
        $count++ if $tag->[1]{href} =~ m{(http|ftp)://}i;
    }
    print "There are $count links in this file.\n";
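
    As for the 'extension' part of the question, one possible approach is to keep the parser loop and tighten the pattern so the host has to end in a given suffix. What follows is only a rough sketch: the count_links sub, the suffix list and the sample HTML are made up for illustration, and the host pattern is a simplification that does not handle things like user:password@host URLs. It also skips anchors that have no href at all (such as <a name="...">), which would otherwise trigger an "uninitialized value" warning under warnings.

    ```perl
    use strict;
    use warnings;
    use HTML::TokeParser;

    # Hypothetical helper: count <a href="..."> links whose scheme is http or ftp
    # and whose host ends in one of the given suffixes (e.g. 'com', 'org').
    sub count_links {
        my ($html, @suffixes) = @_;

        # Build an alternation such as "com|org", escaping any metacharacters.
        my $alt = join '|', map { quotemeta($_) } @suffixes;

        # scheme://host, where the host must end in ".suffix" and be followed
        # by a path, port, query, fragment, or the end of the string.
        my $re = qr{^(?:http|ftp)://[^/:?#]*\.(?:$alt)(?=[/:?#]|\z)}i;

        # A scalar reference makes HTML::TokeParser parse the string itself
        # rather than treat it as a file name.
        my $parser = HTML::TokeParser->new(\$html)
            or die "couldn't create parser";

        my $count = 0;
        while (my $tag = $parser->get_tag('a')) {
            my $href = $tag->[1]{href};
            next unless defined $href;    # e.g. <a name="top"> has no href
            $count++ if $href =~ $re;
        }
        return $count;
    }

    # Made-up sample document for illustration only.
    my $html = '<a href="http://example.com/">a</a>'
             . '<a href="ftp://ftp.example.org/pub">b</a>'
             . '<a href="http://example.net/">c</a>'
             . '<a name="top">d</a>'
             . '<p>http://not-a-link.com/</p>';

    print "There are ", count_links($html, 'com', 'org'), " links in this file.\n";
    ```

    With that sample string, only the example.com and ftp.example.org anchors should be counted; the bare URL in the paragraph text is ignored because it is not inside an anchor tag. To run it against a file instead, passing the file name to HTML::TokeParser->new as in the snippet above should work the same way.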

    And if you ever want to do anything more complicated, which you probably will eventually, you have crazyinsomniac's superb tutorial to guide you.


    --
    my one true love