in reply to use Perl to maintain blocked host file
Assuming you want to parse each text file for things that look like sitenames ... this is an easy (if time-consuming) task in Perl. Try something like:
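    # Each input line (STDIN, or the files named in @ARGV) is taken as a filename.
    while (<>) {
        chomp;    # strip the newline, or open() looks for the wrong name
        # "or", not "||": "||" binds to $_ and the die could never fire
        open IN, '<', $_ or die "Can't open $_: $!";
        while (<IN>) {
            if (m!https?://([^/"'\s]+)!) {    # first URL-ish thing on the line
                print "'$1'\n";
            }
        }
        close IN;
    }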
This code is untested. Please try and figure out what it's doing before you tell others it works. It also isn't complete and is easily fooled. It's an 80/20 solution, useful only if 80/20 solutions are acceptable. Much better would be to employ something like URI or HTML::Parser.
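If you want to squeeze a little more out of the regex approach before reaching for a real parser, here is a rough sketch that picks up every match on a line (not just the first), strips any user@ prefix and :port suffix, and prints each host once. The sub name extract_hosts and the extra characters in the character class are my own assumptions, not anything from the original question, and it is still an 80/20 solution that is easily fooled.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): returns the unique,
# lowercased hostnames found in a string of text, sorted.
sub extract_hosts {
    my ($text) = @_;
    my %seen;
    # With /g in list context the match returns every capture,
    # so several URLs on one line are all picked up.
    for my $host ($text =~ m!https?://([^/?#"'<>\s]+)!g) {
        $host =~ s/^.*\@//;    # drop any user:pass@ prefix
        $host =~ s/:\d+$//;    # drop a trailing :port
        $seen{ lc $host } = 1; # hostnames are case-insensitive
    }
    my @hosts = sort keys %seen;
    return @hosts;
}

# Only runs when files are named on the command line.
if (@ARGV) {
    local $/;                  # slurp mode: one read per file
    print "$_\n" for extract_hosts(join '', <>);
}
```

Unlike the loop above, this takes the files to scan as command-line arguments rather than a list of filenames on STDIN. Deduplicating through a hash keeps the output usable as a host list without a separate sort | uniq pass.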
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.