I'm about to move RadicalMatterDotCom to a new server and plan to rebuild the database into subcategories. Spidering dynamic content (as mentioned before) is not a problem for the scripts I use.

The db will be split into categories, and I am wondering whether I should include a Perl docs category and spider sites like perlmonks.org, CPAN, etc. Thoughts?

Re: RadicalMatterDotCom
by blakem (Monsignor) on Aug 23, 2001 at 00:25 UTC
      A spider does not need to be a registered user, so it only has the nodes to deal with and can skip the "lastnode_id" part. The bigger problem is duplication of information, because the display of a node also includes its replies. So if the spider can recognize single nodes and sort them out, avoiding multiple lookups and storing everything n times (where n is the depth of "Re:" under a node), it could work.
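
The deduplication described above could be sketched roughly as follows. This is only an illustration in Python, not anything the site or an existing spider provides: the helper names `node_id` and `unique_nodes` are made up, and it assumes every node is reachable through a `node_id` query parameter, with extras such as `lastnode_id` ignored.

```python
from urllib.parse import urlparse, parse_qs


def node_id(url):
    """Return the node_id query parameter of a URL, or None if it has none."""
    values = parse_qs(urlparse(url).query).get("node_id")
    return values[0] if values else None


def unique_nodes(urls):
    """Yield (node_id, url) once per node, skipping repeat sightings.

    URLs that differ only in other parameters (for example lastnode_id)
    count as the same node, so each node is fetched and stored once.
    """
    seen = set()
    for url in urls:
        nid = node_id(url)
        if nid is None or nid in seen:
            continue  # not a node page, or already queued
        seen.add(nid)
        yield nid, url
```

Feeding it every link found on a page gives the crawl queue one entry per node, however many reply views point back at it.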
      Anyhow, for archiving purposes, to make perlmonks more easily searchable, and to allow better categorization, perlmonks.org might need to offer something like:
      http://www.perlmonks.org/index.pl?node_id=83485&view_mode=plain_txt
      so the spider does not get tangled up in all the dynamic content such as nodelets and menu bars.
      And I bet the Everything engine has such a feature, even if it's hidden somewhere deep and only used for debugging or so.
      But, with sincere apologies, as long as this would load more work onto vroom, it's better to write a really good spider.
      Yes, I like your idea a lot, because I believe everything that's been posted so far would make up a Perl Monks bookshelf of tips, traps, tricks and so on. (Well, except meditations and discussions, but you could offer those to mindspring.com or pilosophy.org for linking.) {grin}

      Have a nice day
      All decision is left to your taste
        Seems like you've proved my point though. Any spider that successfully indexes perlmonks will have to be specially customized for the site. Or, said another way, perlmonks is not friendly to the general search spider.

        Oh, and I think you might be looking for DisplayType Raw

        -Blake

Re: RadicalMatterDotCom
by E-Bitch (Pilgrim) on Aug 23, 2001 at 00:32 UTC
    Problem is (and this is my opinion, yers may be different), I don't consider Perl Monks to be a "documentation" site. Documentation implies static content, like books and such, and while, yes, PM does include various tutorials, it doesn't contain much static content.

    just my 2 cents though, I may be totally off base.

    what about "perl communities" or just "perl sites"?
    _________________________________________
    E-Bitch
    Tempora Mutantur Nos et Mutamur in Illis
    "The Times are Changed Even as We are Changed in Them"
Re: RadicalMatterDotCom
by Zecho (Hermit) on Aug 23, 2001 at 03:57 UTC
    Wow, now that was a lot more than I had asked for. The ultimate question was: is there a use, or a need, for a searchable db of Perl sites like perlmonks, CPAN, etc.? I don't know how my site would take to spidering perlmonks, but for the sake of curiosity, I'm gonna point it at it and see what it does. I will probably start rebuilding everything next week, so it'll only be temporary.