fraktalisman has asked for the wisdom of the Perl Monks concerning the following question:

There are lots of rumours about dynamic pages, i.e. web pages generated with Perl, PHP, or ASP.
After examining the displayed page rank (using the Google toolbar) of different pages of a content management system of our company, I got the impression that .html files get a better rating than .pl files do. OTOH perlmonks.org has a very high page rank on almost every page, and it's all .pl (what a surprise).
Now has any of you got the same impression, and has someone maybe found out some definite dos and don'ts for dynamic web pages to get them the appropriate ranking?

Replies are listed 'Best First'.
Re: (OT) Google page rank for dynamic pages
by matija (Priest) on Mar 10, 2004 at 23:25 UTC
    Companies that claim to improve page ranking through various ways of fooling google charge thousands of dollars per consultation. Even then they can't guarantee that a site's favorable position in the rankings for a given search term will remain constant through google's next reorganisation of its ranking algorithm. While the general outline of their algorithm is public, the exact details are closely guarded. After all, any of the "improve your pagerank" companies that knew the exact details could fool the spiders into placing all its clients at the top of relevant search lists, making a lot of money and devaluing google's utility in the process.

    Google frequently tweaks their algorithm precisely because they don't want to reward people who abuse their engine with generated-for-google pages (or, worse yet, webs of pages).

    In fact, if you perform a search and find clearly fraudulent results, click on the "Help us improve" link at the bottom of the search page, and fill out the form. I understand google's engineers are pretty good about lowering the ranking of frauds reported through that page.

    The page rank of a particular site depends on the quality of content on that site, on the relevance of content (how similar the pages pointed at are to the links pointing at them), on the page rank of the pages pointing at it, and last, but not least, on the number of pages pointing at the site. (And probably on many more factors we can only guess at.)

    So Perlmonks rates high because a lot of high-quality pages point at various points within the monastery, and the monastery scores high on the relevance, and low on "fraudulence indicators", the .pl in the URL notwithstanding.
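
    To make the link-based part of that concrete, here is a minimal sketch of the simplified, published PageRank iteration. It is not Google's actual ranking code, whose details are not public. The function name, the damping value of 0.85, and the toy link graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    This is the textbook formula only; real ranking uses many more signals.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small base share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to a .pl page, which links to one of them.
toy_web = {
    "monastery.pl": ["blog.html"],
    "blog.html": ["monastery.pl"],
    "wiki.html": ["monastery.pl"],
    "lonely.html": ["monastery.pl"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:15s} {score:.4f}")
```

    In this toy graph the .pl page ends up with the highest score simply because the most pages point at it. The file extension never enters the calculation.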

      I don't doubt the content's quality here at perlmonks. Still I'm wondering about my observation that often at the same site, .html pages seem to get better ranking than .pl pages even if there's next to no content on the .html ones, or content of obviously the same quality level on both kinds of pages.
      One such site is www.xsk8.de (a German inline skating magazine) which has in fact .html parts as well as .pl
Re: (OT) Google page rank for dynamic pages
by swngnmonk (Pilgrim) on Mar 10, 2004 at 23:13 UTC

    It's hard to comment directly on your situation, as there are no concrete examples to look at, but a couple of thoughts:

    In general, search engines tend to do a lot better with static HTML pages than they do with dynamic content. Form submissions, site searches, and other such user-friendly interfaces on dynamic sites cause real problems for crawlers. Many dynamic sites require user decisions to deliver content - these things are just not easily codeable from the crawler-side.
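
    As a rough illustration of the crawler's side, here is a minimal sketch of a link collector that only follows plain `<a href>` links. It is a deliberate simplification, not how any particular search engine works. The class name, the sample page, and the example.com URLs are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the links a very simple crawler could follow from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs reachable through plain links
        self.forms = 0    # forms seen but not submitted

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            # javascript: pseudo-links need a script engine, so skip them.
            if href and not href.lower().startswith("javascript:"):
                self.links.append(urljoin(self.base_url, href))
        elif tag == "form":
            # This crawler has no idea what to type into a form; it only counts it.
            self.forms += 1


# Hypothetical page mixing static links, a dynamic link, script navigation and a search form.
page = """
<a href="/articles/skates.html">Static article</a>
<a href="/cgi-bin/show.pl?id=7">Dynamic article</a>
<a href="javascript:openMenu()">Menu</a>
<form action="/cgi-bin/search.pl"><input name="q"></form>
"""

collector = LinkCollector("http://www.example.com/")
collector.feed(page)
print(collector.links)
print("forms skipped:", collector.forms)
```

    Note that the dynamic page behind an ordinary link is found just like the static one. Only the content that sits behind the script menu or the search form stays invisible to this kind of crawler.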

    Part of Google's PageRank algorithm is dependent on how many links there are to a given page (from other sites). I'd guess that a lot of people with websites link to Perlmonks. Not knowing a lot about PageRank, I'm guessing that the real difference in scores between PerlMonks content and your site's dynamic content has to do with that last point - external links. How to fix that? Get more people to link to you.

    A company I worked for back in the dot-com days did a lot of crawling, and some sites were just impossible - javascript or search forms would be the only way to navigate the site, and you can't just clobber a search CGI to find what you're looking for. Our solution was to get these sites to send us DB dumps of the info we needed instead. Perhaps Google has a service like that available?

Re: (OT) Google page rank for dynamic pages
by wolfi (Scribe) on Apr 01, 2004 at 12:15 UTC

    (My current "client" was particularly worried about this, so i've had to do some R&D on the subject the past few mo's. Some might be redundant w/what others have said...)

    every search engine uses its own algorithm for determining rankings, but these are some common means:

    ~meta-keywords
    ~meta-descriptions
    ~meta-keywords or -descriptions compared to actual content (text) in the site.
    ~text located near the top OF THE SOURCE CODE gets higher ranking (tables, js/css scripts, etc can push this down)
    ~text in heading-tags "H1, H2, etc" or bold or linked gets higher weight
    ~links from other sites
    ~links to other sites
    ~comparison of words used together in the page vs. the search engine user's inputted criteria to determine relevance.
    ~they prefer non-script and non-framed pages
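
    To see what a page offers for the first few items in that list, here is a minimal sketch that pulls out the meta keywords, the meta description and the heading text, then checks which keywords actually occur in the page text. The class and function names and the sample page are assumptions for illustration. No search engine documents its checks in this form.

```python
from html.parser import HTMLParser


class OnPageSignals(HTMLParser):
    """Gathers meta keywords/description, heading text and all page text."""

    HEADINGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.meta = {}        # "keywords" / "description" -> content string
        self.headings = []    # text of each h1-h3 element
        self.text = []        # every text chunk (script/style text included)
        self._in_heading = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""
        elif tag in self.HEADINGS:
            self._in_heading = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self.headings.append("".join(self._buffer).strip())
            self._in_heading = False

    def handle_data(self, data):
        self.text.append(data)
        if self._in_heading:
            self._buffer.append(data)


def keywords_in_text(signals):
    """Maps each meta keyword to whether it occurs in the page text."""
    text = " ".join(signals.text).lower()
    keywords = [k.strip().lower() for k in signals.meta.get("keywords", "").split(",")]
    return {k: k in text for k in keywords if k}


# Hypothetical sample page.
sample = """<html><head>
<meta name="keywords" content="inline skating, magazine">
<meta name="description" content="A hypothetical skating page">
</head><body>
<h1>Skate <b>news</b></h1>
<p>All about inline skating.</p>
</body></html>"""

signals = OnPageSignals()
signals.feed(sample)
keyword_hits = keywords_in_text(signals)
print(signals.meta)
print(signals.headings)
print(keyword_hits)
```

    A keyword that shows up in the meta tag but nowhere in the text ("magazine" in the sample) is the kind of mismatch the third item in the list is about.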

    as far as .pl vs. .html or dynamic vs. static goes - it's always been easier for search engines to index static content, rather than dynamic stuff. But over the past few yrs, with the growing dependency on dynamic content, search engines have had to evolve their rules and programs and i believe, most can index these pages. And those that can't, won't be long for this world.

    HOWEVER...