vit has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
Does anybody know of a Perl program or module that can identify, at least roughly, Parked or Ads-Portal domains? That is, given a domain, recognize that it is Parked or an Ads-Portal.

Replies are listed 'Best First'.
Re: Parked or Ads-Portal Domain identification
by chrestomanci (Priest) on Mar 23, 2011 at 21:09 UTC

    This is not a Perl-specific solution, but my approach would be to look up the IP address for www.<domain>, and then do a reverse DNS lookup on that IP address.

    If the domain is parked, then there will probably be thousands of domains on the same IP address. If a domain is on shared hosting, then I would expect at most a few hundred.

    For a Perl script, I would start by reading the docs for gethostbyaddr, and for any modules that provide an interface to the same information.
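The two lookups described above can be sketched roughly as follows. This is only a minimal illustration using Perl's built-in gethostbyname and gethostbyaddr together with the core Socket module; the helper names addresses_for and names_for are made up for this example. Note that a reverse (PTR) lookup normally returns only the name the owner of the IP address has published, so on its own it may not enumerate every domain hosted there.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

# Forward lookup: IPv4 addresses for a host name, as dotted quads.
# (addresses_for is a hypothetical helper name.)
sub addresses_for {
    my ($host) = @_;
    my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($host);
    return map { inet_ntoa($_) } @addrs;
}

# Reverse lookup: PTR name plus any aliases for a dotted-quad IP.
# Returns an empty list when nothing is published for the address.
# (names_for is a hypothetical helper name.)
sub names_for {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return;
    my ($name, $aliases) = gethostbyaddr($packed, AF_INET);
    return unless defined $name;
    return ($name, split ' ', $aliases // '');
}

# Usage: perl lookup.pl example.com
if (my $domain = shift @ARGV) {
    for my $ip (addresses_for("www.$domain")) {
        my @names = names_for($ip);
        print "$ip => ", (@names ? join(', ', @names) : '(no PTR record)'), "\n";
    }
}
```

If the reverse lookup returns a name that obviously belongs to a parking company, that is a hint in itself, but treat it as a heuristic rather than proof.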

      /*If the domain is parked, then there will probably be thousands of domains on the same IP address. If a domain is on shared hosting, then I would expect at most a few hundred.*/
      I did not understand: "thousands of domains" where? Do you mean that if I use "gethostbyaddr" I will get thousands of domains?
      And even if I determine that this is a shared host, how do I know whether a given domain is parked or a regular site?

        On any normal domain, there will be an 'A' Record that points to a web server that hosts a website for the domain.

        It is possible for several domains to share the same server, on the same IP address. This is commonly done for low traffic web sites, blogs and the like.

        A parked domain will by definition have extremely low traffic, and the company that owns the parked domain will also own thousands of others, so logic dictates that they will all be on the same web server and share the same IP address (seeing as IP addresses are scarce and web servers are expensive).

        Therefore, in order to guess if a domain is parked, one way would be to find out how many other domains share the same IP address.
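The counting idea can be sketched like this, assuming you already have a list of candidate domains from some other source (a zone file, a crawl, a reverse-IP data set), since a single reverse DNS query will generally not list them for you. The helper names resolve_www, group_by_ip and probably_parked are invented for this example, and the threshold is a guess taken from the "thousands versus a few hundred" rule of thumb above, not a tested value.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Default resolver: first IPv4 address of www.<domain>, or undef on failure.
sub resolve_www {
    my ($domain) = @_;
    my (undef, undef, undef, undef, @addrs) = gethostbyname("www.$domain");
    return @addrs ? inet_ntoa($addrs[0]) : undef;
}

# Group domains by the IP they resolve to. The resolver is a code ref
# so that it can be replaced (for caching, Net::DNS, or testing).
sub group_by_ip {
    my ($domains, $resolver) = @_;
    $resolver ||= \&resolve_www;
    my %by_ip;
    for my $domain (@$domains) {
        my $ip = $resolver->($domain);
        next unless defined $ip;          # skip domains that do not resolve
        push @{ $by_ip{$ip} }, $domain;
    }
    return \%by_ip;
}

# Domains whose IP is shared by at least $threshold domains from the list.
sub probably_parked {
    my ($by_ip, $threshold) = @_;
    my @hits = map { @$_ } grep { @$_ >= $threshold } values %$by_ip;
    @hits = sort @hits;
    return @hits;
}

# Usage: perl parked.pl domains.txt   (one domain per line)
if (my $file = shift @ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    chomp(my @domains = <$fh>);
    close $fh;
    # 1000 is an assumed cut-off, not a measured one.
    print "$_\n" for probably_parked(group_by_ip(\@domains), 1000);
}
```

The count only reflects the domains in your own list, so a small list will never reach the threshold; the result is a guess to be combined with other signals, not a verdict.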

Re: Parked or Ads-Portal Domain identification
by planetscape (Chancellor) on Mar 27, 2011 at 19:41 UTC

    On further reflection, how does the Wayback Machine know to stop archiving?

    When I encounter a parked domain (manually), the first thing I do is click my Wayback Machine shortcut (javascript:location.href=%22http://web.archive.org/web/*/%22+location.href as provided by bart), and check the most recent archive. Sometimes I get a cached version of the ads portal, but often I don't. Anecdotal, yes, but it may be worth asking them how they do it (wayback@archive.org is their publicly available address).

    HTH,

    planetscape
Re: Parked or Ads-Portal Domain identification
by planetscape (Chancellor) on Mar 25, 2011 at 17:18 UTC

    By its search engine preview, it will appear the single most promising of hits, and it will have no cached version. </semi-facetious>

    HTH,

    planetscape