Parsing to get server info

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to parse my links and get the server name between "http://" and the first "/".
I then need to get a count of each one. Here is an example of what I have:

http://riverserver/dir1/dir2/index.html
http://riverserver/dir1/dir2/index.html
http://perlmonks.org/index.pl
http://webserver.company.serv/Adir/Bdir/thePage.cfm
http://riverserver/dir1/dir2/index.html
http://webserver.company.serv/Adir/Bdir/thePage.cfm

I want to parse the info and get this:

riverserver = 3
perlmonks.org = 1
webserver.company.serv = 2

I think I have a problem with how to parse the links and add them to a hash, but I am not sure if I am doing this right:

use strict;
my $link1 = 'http://perlmonks.org/index.pl';
my $link2 = 'http://riverserver/dir1/dir2/index.html';
#more links etc...
$link1 =~ s/http\:\/\///;
$link2 =~ s/http\:\/\///;

my @link1 = split /\//, $link1;
my @link2 = split /\//, $link2;

print "$link1[0]\n";
print "$link2[0]\n";


Re: Parsing to get server info by ctilmes (Vicar) on Jul 15, 2003 at 13:18 UTC
Use URI.

use URI;

my %hostcount;
while (<DATA>) {
    my $u = URI->new($_);
    $hostcount{$u->host}++;
}
foreach my $host (keys %hostcount) {
    print "$host = $hostcount{$host}\n";
}
__DATA__
http://riverserver/dir1/dir2/index.html
http://riverserver/dir1/dir2/index.html
http://perlmonks.org/index.pl
http://webserver.company.serv/Adir/Bdir/thePage.cfm
http://riverserver/dir1/dir2/index.html
http://webserver.company.serv/Adir/Bdir/thePage.cfm

Output:

webserver.company.serv = 2
riverserver = 3
perlmonks.org = 1
Re: Re: Parsing to get server info by Anonymous Monk on Jul 15, 2003 at 16:02 UTC
I tried exactly as you had and got this message:

Can't locate object method "host" via package "URI::_generic" (perhaps you forgot to load "URI::_generic"?) at C:\Perl\bin\url1.pl line 11, <DATA> line 7.

I do have the "URI" module on my Windows NT machine. Please advise.
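
A possible explanation, offered as a sketch and not a confirmed diagnosis: URI->new returns a URI::_generic object when the input has no scheme, and that class has no host method. The posted data has six lines, so a seventh line that is blank or lacks "http://" would trigger exactly this error. The helper name count_hosts and the sample input below are hypothetical; the sketch assumes the CPAN URI module is installed.

```perl
use strict;
use warnings;
use URI;    # assumption: the CPAN URI module is installed

# Hypothetical helper: count hosts, skipping lines that yield no host.
sub count_hosts {
    my @lines = @_;
    my %hostcount;
    for my $line (@lines) {
        ( my $text = $line ) =~ s/^\s+|\s+$//g;    # trim whitespace and newline
        my $u = URI->new($text);
        # Blank or schemeless input gives a URI::_generic object, which has no host().
        next unless $u->can('host');
        my $host = $u->host;
        next unless defined $host && length $host;
        $hostcount{$host}++;
    }
    return \%hostcount;
}

my $counts = count_hosts(
    "http://riverserver/dir1/dir2/index.html\n",
    "http://perlmonks.org/index.pl\n",
    "\n",                         # blank line, like a stray trailing line after __DATA__
    "riverserver/no/scheme\n",    # no scheme, so the object has no host method
);
print "$_ = $counts->{$_}\n" for sort keys %$counts;
```

Checking can('host') before calling it lets the loop skip such lines instead of dying on the first one.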
Re: Parsing to get server info by gjb (Vicar) on Jul 15, 2003 at 13:14 UTC
If you're sure that the data you process contains only valid URLs, you can do it a bit more conveniently with:

$link =~ m{http://([^/]+)};
$server = $1;

Hope this helps, -gjb-
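
If the data might contain lines that are not valid URLs, note that $1 is not reset by a failed match; it keeps the value from the last successful one. A minimal sketch of a checked version follows. The helper name server_of is hypothetical, and the leading ^ anchor is an addition not present in the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the server name, or undef when the link does not match.
sub server_of {
    my ($link) = @_;
    # In list context a successful match returns its captures;
    # a failed match returns an empty list, leaving $server undefined.
    my ($server) = $link =~ m{^http://([^/]+)};
    return $server;
}

for my $link ( 'http://perlmonks.org/index.pl', 'not a link' ) {
    my $server = server_of($link);
    print defined $server
        ? "$server\n"
        : "no server found in '$link'\n";
}
```

Assigning the match in list context avoids reading $1 at all, so a stale capture cannot leak into the result.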
Re: Re: Parsing to get server info by l2kashe (Deacon) on Jul 15, 2003 at 14:29 UTC
As an add-on here, maybe:

$line =~ m{^https?://([^/]+)};
$server{$1}++;

for ( keys %server ) {
    print "$_ : $server{$_}\n";
}

This way it will also catch https URLs, though using a module as recommended elsewhere in this thread is probably the best way.

MMMMM... Chocolaty Perl Goodness.....
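
The original question lists servers in the order they first appear, whereas keys returns hash entries in no particular order. One way to keep first-seen order, sketched here with a hypothetical helper tally_servers, is to record each server in an array the first time it is counted. The match is also checked before $1 is used.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an arrayref of servers in first-seen order
# and a hashref of counts.
sub tally_servers {
    my @links = @_;
    my ( %count, @order );
    for my $link (@links) {
        next unless $link =~ m{^https?://([^/]+)};    # skip lines that are not http(s) links
        # The post-increment returns the old count, which is false on first sight.
        push @order, $1 unless $count{$1}++;
    }
    return ( \@order, \%count );
}

my ( $order, $count ) = tally_servers(
    'http://riverserver/dir1/dir2/index.html',
    'http://riverserver/dir1/dir2/index.html',
    'http://perlmonks.org/index.pl',
    'http://webserver.company.serv/Adir/Bdir/thePage.cfm',
    'http://riverserver/dir1/dir2/index.html',
    'http://webserver.company.serv/Adir/Bdir/thePage.cfm',
);
print "$_ = $count->{$_}\n" for @$order;
```

With the six links from the question this prints riverserver = 3, perlmonks.org = 1 and webserver.company.serv = 2, in that order.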