dwhite20899 has asked for the wisdom of the Perl Monks concerning the following question:
I have a list of about 100 companies, and I need to find the short registered names from DNS for each of them. For example, I have "Adobe Systems, Inc." - since they own "adobe.com" the result is "adobe". "American Power Conversion" should result in "apc".
This needs to be done just once, so I'm tempted to just put a person on it, but if there is a way to automate it, I'd rather do that.
Any insights are appreciated! Doug
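One way the guessing step might be automated is sketched below. It is a minimal sketch, not a definitive method: `candidates` and `resolves` are hypothetical helper names, the list of legal suffixes is an assumed and incomplete one, and a name that resolves under `.com` is not proof that the company owns it, so the results would still need a human check.

```perl
use strict;
use warnings;

# Hypothetical helper: guess short-name candidates from a company name.
sub candidates {
    my ($name) = @_;

    # Lowercase, drop punctuation, split into words.
    (my $clean = lc $name) =~ s/[^a-z0-9 ]+//g;
    my @words = grep { length } split / +/, $clean;

    # Drop common legal suffixes (an assumed, incomplete list).
    my %noise = map { $_ => 1 } qw(inc corp ltd llc co);
    @words = grep { !$noise{$_} } @words;
    return () unless @words;

    my @guesses = (
        $words[0],                                   # first word, e.g. "adobe"
        join('', @words),                            # all words run together
        join('', map { substr($_, 0, 1) } @words),   # initials, e.g. "apc"
    );

    # Remove duplicates while keeping the order above.
    my %seen;
    return grep { !$seen{$_}++ } @guesses;
}

# Hypothetical helper: true if "<candidate>.com" resolves right now.
# Uses Perl's built-in gethostbyname; assumes the .com TLD and needs network access.
sub resolves {
    my ($candidate) = @_;
    my $addr = gethostbyname("$candidate.com");
    return defined $addr ? 1 : 0;
}

for my $name ("Adobe Systems, Inc.", "American Power Conversion") {
    print "$name => ", join(", ", candidates($name)), "\n";
}
```

Each candidate could then be passed to `resolves` to discard guesses that have no DNS entry at all.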
Update: Got something working.
It needs some work, but this gets me dang close.
Given a file of known NVD vendor strings and a file of NSRL manufacturer text strings, this fetches 6 URLs from Yahoo for each manufacturer, strips them down to host names, and prints suggested mappings in square brackets, known NVD vendor strings in curly braces, and other context unmarked, all tab separated.
Output looks like this:
BeLight Software		[belight]	[belightsoft]	en.wikipedia
BeLight Software, Ltd.		[belightsoft]	go.cadwire	macs.about
Bea Systems, Inc.		[beasys]	{bea}	{oracle}
Beermat Software Ltd.		[beermatsoftware]	encarta.msn
Belkin Corp		[belkin]	bizjournals	cnet	updates.zdnet
Bell Atlantic Internet Solutions Inc.		[bellatlantic]	prnewswire	verizon	yale.edu
Berkley Systems		berkeley	best.me.berkeley.edu	bt-systems	bvsystems	en.wikipedia	gis.co.berkeley.sc.us
Bethesda SoftWorks		bethsoft	elderscrolls	support.bethsoft
Big Fish Games		[bigfishgames]	atlantis.bigfishgames	bigfish.es
Big Fish Games, Inc.		[bigfishgames]	bigfish.es	bigfishgames.es	otg.bigfishgames
BioWare		[bioware]	blog.bioware	nwn.bioware	store.bioware

The code:
#!/opt/local/bin/perl -w
use strict;
use Yahoo::Search;
use vars qw( %nvdVendor $textName %foundName @Results );

# Load the known NVD vendor strings.
open(VIN,"NVD-vendors.txt") or die "$0 : can't open support file NVD-vendors.txt\n";
while(<VIN>) {
    chomp;
    $nvdVendor{$_} = 1;
}
close(VIN);

# Look up each NSRL manufacturer string and print the candidate names.
open(NIN,"NSRL-manufacturers.txt") or die "$0 : can't open support file NSRL-manufacturers.txt\n";
while(<NIN>) {
    chomp;
    $textName = $_;
    (defined $textName) or next;
    @Results = Yahoo::Search->Results(Doc => "$textName", AppId => "YahooDemo", Count => 6, Mode => 'all');
    warn $@ if $@; # report any errors
    for my $Result (@Results) {
        addFullName($Result->Url);
    }
    print "$textName\t\t";
    my %guesses;
    for my $k (keys %foundName) {
        if (defined $nvdVendor{$k}) {
            $guesses{"\{$k\}"} = 1;
        } else {
            if (closeEnuff($textName, $k)) {
                $guesses{"[$k]"} = 1;
            } else {
                $guesses{$k} = 1;
            }
        }
        delete $foundName{$k};
    }
    print join("\t", (sort keys %guesses));
    print "\n";
    sleep(3);
} # NIN
exit;

# No empty prototype here: the sub takes one argument (the URL).
sub addFullName {
    my $url = shift;
    $url = lc($url);
    ($url =~ /^http:/ ) or return(0);
    $url =~ s/http:\/\/// ;
    (my $n, my $p) = split(/\//,$url,2); # get the server name
    $n =~ s/^www\.// ; # strip common pre/postfixes
    $n =~ s/\.com$// ;
    $n =~ s/\.net$// ;
    $n =~ s/\.org$// ;
    $n =~ s/\.co\.uk$// ;
    $foundName{$n} += 1;
    return(1);
}

# No empty prototype here either: the sub takes the text name and a candidate.
sub closeEnuff {
    my $t = shift;
    my $y = shift;
    # does the candidate match a word in the text name?
    # (\Q...\E keeps dots in the candidate from acting as regex wildcards)
    if ($t =~ / \Q$y\E /i ) { return(1); }
    $t =~ s/ //g ;
    # does the candidate match the text name with spaces removed?
    if ($t =~ /\Q$y\E/i ) { return(1); }
    # should do a check after removing special chars
    return(0);
}
__END__
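The last comment in `closeEnuff` mentions a check after removing special characters. A minimal sketch of what that check might look like follows; `squash` and `close_enough_squashed` are hypothetical names, not part of the script above, and the Hewlett-Packard string is a made-up test input rather than a line from the NSRL file.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase and strip everything but letters and digits.
sub squash {
    my ($s) = @_;
    $s = lc $s;
    $s =~ s/[^a-z0-9]+//g;
    return $s;
}

# True if the squashed candidate appears inside the squashed text name.
sub close_enough_squashed {
    my ($text, $candidate) = @_;
    my $c = squash($candidate);
    return 0 unless length $c;
    return index(squash($text), $c) >= 0 ? 1 : 0;
}

# Made-up input: the hyphen and the trailing "Co." no longer block the match.
my $hit = close_enough_squashed("Hewlett-Packard Co.", "hewlettpackard");
print $hit ? "match\n" : "no match\n";
```

Using `index` on the squashed strings avoids regex metacharacters entirely. An abbreviation such as "bethsoft" for "Bethesda SoftWorks" would still not match, so those rows would stay unbracketed.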
Replies are listed 'Best First'.

Re: Finding short DNS names from long text
by merlyn (Sage) on Jul 12, 2009 at 20:03 UTC
by dwhite20899 (Friar) on Jul 12, 2009 at 20:54 UTC
by hossman (Prior) on Jul 13, 2009 at 01:22 UTC
by dwhite20899 (Friar) on Jul 13, 2009 at 01:58 UTC

Re: Finding short DNS names from long text
by mzedeler (Pilgrim) on Jul 12, 2009 at 20:28 UTC
by dwhite20899 (Friar) on Jul 12, 2009 at 20:40 UTC