I've been looking on CPAN, googling, etc. and I can't find anything on this topic, so I'm hoping someone here can share a flash of brilliance.

I have a list of about 100 companies, and I need to find the short registered names from DNS for each of them. For example, I have "Adobe Systems, Inc." - since they own "adobe.com" the result is "adobe". "American Power Conversion" should result in "apc".

This needs to be done just once, so I'm tempted to just put a person on it, but if there is a way to automate it, I'd rather do that.

Any insights are appreciated! Doug

Update: Got something working.

It needs some work, but this gets me dang close.

Given a file with the NVD known vendor strings and a file with the NSRL manufacturer text strings, this fetches 6 URLs from Yahoo for each manufacturer and strips each one down to a bare host name. It then prints, tab separated, the suggested mappings in square brackets, known NVD vendor strings in curly braces, and any remaining host names as other context.

Output looks like this:

BeLight Software    [belight]    [belightsoft]    en.wikipedia
BeLight Software, Ltd.    [belightsoft]    go.cadwire    macs.about
Bea Systems, Inc.    [beasys]    {bea}    {oracle}
Beermat Software Ltd.    [beermatsoftware]    encarta.msn
Belkin Corp    [belkin]    bizjournals    cnet    updates.zdnet
Bell Atlantic Internet Solutions Inc.    [bellatlantic]    prnewswire    verizon    yale.edu
Berkley Systems    berkeley    best.me.berkeley.edu    bt-systems    bvsystems    en.wikipedia    gis.co.berkeley.sc.us
Bethesda SoftWorks    bethsoft    elderscrolls    support.bethsoft
Big Fish Games    [bigfishgames]    atlantis.bigfishgames    bigfish.es
Big Fish Games, Inc.    [bigfishgames]    bigfish.es    bigfishgames.es    otg.bigfishgames
BioWare    [bioware]    blog.bioware    nwn.bioware    store.bioware

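For anyone who wants to post-process that output, here is a minimal sketch of splitting one line back into its parts. It assumes the layout described above (name, an empty field, then tab-separated guesses); `parse_line` and the hash keys are names made up for this illustration, not part of the script.

```perl
use strict;
use warnings;

# Split one output line into the company name, the bracketed suggestions,
# the braced NVD vendor matches, and the leftover host names.
# Assumption: fields are tab separated, and empty fields can be ignored.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($name, @fields) = grep { $_ ne '' } split /\t/, $line;
    my %out = (name => $name, suggested => [], nvd => [], other => []);
    for my $f (@fields) {
        if    ($f =~ /^\[(.+)\]$/) { push @{ $out{suggested} }, $1 }
        elsif ($f =~ /^\{(.+)\}$/) { push @{ $out{nvd} }, $1 }
        else                       { push @{ $out{other} }, $f }
    }
    return \%out;
}

my $rec = parse_line("Bea Systems, Inc.\t\t[beasys]\t{bea}\t{oracle}\n");
print "$rec->{name}: suggested=@{ $rec->{suggested} } nvd=@{ $rec->{nvd} }\n";
```

Lines with no bracketed or braced entry (like the "Berkley Systems" one) come back with empty `suggested` and `nvd` lists, which is probably a reasonable way to flag the names that still need a person to look at them.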
The code:
#!/opt/local/bin/perl -w
use strict;
use Yahoo::Search;
use vars qw( %nvdVendor $textName %foundName @Results );

# load the known NVD vendor strings
open(VIN, "NVD-vendors.txt")
    or die "$0 : can't open support file NVD-vendors.txt\n";
while (<VIN>) {
    chomp;
    $nvdVendor{$_} = 1;
}
close(VIN);

# look up each NSRL manufacturer name and print the guesses
open(NIN, "NSRL-manufacturers.txt")
    or die "$0 : can't open support file NSRL-manufacturers.txt\n";
while (<NIN>) {
    chomp;
    $textName = $_;
    (defined $textName) or next;
    @Results = Yahoo::Search->Results(Doc   => "$textName",
                                      AppId => "YahooDemo",
                                      Count => 6,
                                      Mode  => 'all');
    warn $@ if $@; # report any errors
    for my $Result (@Results) {
        addFullName($Result->Url);
    }
    print "$textName\t\t";
    my %guesses;
    for my $k (keys %foundName) {
        if (defined $nvdVendor{$k}) {
            $guesses{"\{$k\}"} = 1;       # known NVD vendor string
        } else {
            if (closeEnuff($textName, $k)) {
                $guesses{"[$k]"} = 1;     # suggested mapping
            } else {
                $guesses{$k} = 1;         # other context
            }
        }
        delete $foundName{$k};
    }
    print join("\t", (sort keys %guesses));
    print "\n";
    sleep(3);
} # NIN
close(NIN);
exit;

# no () prototype here: the sub takes an argument
sub addFullName {
    my $url = shift;
    $url = lc($url);
    ($url =~ /^http:/ ) or return(0);
    $url =~ s/http:\/\/// ;
    (my $n, my $p) = split(/\//,$url,2); # get the server name
    $n =~ s/^www\.// ; # strip common pre/postfixes
    $n =~ s/\.com$// ;
    $n =~ s/\.net$// ;
    $n =~ s/\.org$// ;
    $n =~ s/\.co\.uk$// ;
    $foundName{$n} += 1;
    return(1);
}

sub closeEnuff {
    my $t = shift;
    my $y = shift;
    # \Q...\E keeps dots in the candidate from acting as regex wildcards
    if ($t =~ / \Q$y\E /i ) { return(1); } # does the candidate match a word in the text name?
    $t =~ s/ //g ;
    if ($t =~ /\Q$y\E/i ) { return(1); } # does the candidate match the text name with spaces removed?
    # should do a check after removing special chars
    return(0);
}
__END__
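The "should do a check after removing special chars" note in closeEnuff could be handled roughly like this. It is only a sketch: `squash`, `initials` and `close_enough` are hypothetical helper names, and the initials test is an extra guess aimed at cases like "American Power Conversion" and "apc", not something the script above does.

```perl
use strict;
use warnings;

# Lowercase a name and drop everything except letters and digits,
# so "Big Fish Games, Inc." becomes "bigfishgamesinc".
sub squash {
    my ($s) = @_;
    $s = lc $s;
    $s =~ s/[^a-z0-9]//g;
    return $s;
}

# First character of each word: "American Power Conversion" -> "apc".
sub initials {
    my ($s) = @_;
    return lc join '', map { substr($_, 0, 1) }
                       grep { /^[A-Za-z0-9]/ } split /\s+/, $s;
}

# Hypothetical variant of closeEnuff(): true if the candidate appears in
# the squashed company name, or if it equals the name's initials.
sub close_enough {
    my ($text, $cand) = @_;
    my $c = squash($cand);
    return 0 unless length $c;
    return 1 if index(squash($text), $c) >= 0;
    return 1 if $c eq initials($text);
    return 0;
}

for my $pair (["Big Fish Games, Inc.",      "bigfishgames"],
              ["American Power Conversion", "apc"],
              ["Belkin Corp",               "cnet"]) {
    my ($text, $cand) = @$pair;
    printf "%-28s %-14s %s\n", $text, $cand,
        close_enough($text, $cand) ? "match" : "no match";
}
```

Two caveats: very short candidates will match almost any name by substring, and suffixes such as "Inc." or "Ltd." still count toward the initials, so a stop list for those would likely be needed before trusting the acronym test.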

In reply to Finding short DNS names from long text by dwhite20899
