A spider does not need to be a registered user, so it only sees the nodes themselves and can skip the "lastnode_id" part. The bigger problem is duplicated information, because the display of a node also includes its replies. If the spider recognizes individual nodes and sorts them out, so that it avoids repeated lookups and does not store everything n times (where n is the "re:" level of a node), it could work.
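A minimal sketch of that deduplication idea, assuming a hypothetical in-memory `SITE` table in place of real page fetches (no real perlmonks URL or API is used). Each simulated page view returns a node together with its whole reply subtree, as the site's display does, and the spider keeps a store keyed by node_id, so every node is saved once and nodes already harvested from an earlier page are not looked up again.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the site: node_id -> its text and the node_ids
# of its direct replies. Not real perlmonks data or a real API.
SITE: Dict[int, dict] = {
    1: {"text": "root question", "replies": [2, 3]},
    2: {"text": "re: root question", "replies": [4]},
    3: {"text": "re: root question (another)", "replies": []},
    4: {"text": "re: re: root question", "replies": []},
}


def fetch_page(node_id: int) -> List[Tuple[int, str]]:
    """Simulate one page view: the node plus its entire reply subtree."""
    page = []
    stack = [node_id]
    while stack:
        nid = stack.pop()
        page.append((nid, SITE[nid]["text"]))
        stack.extend(SITE[nid]["replies"])
    return page


def crawl(start_ids: Iterable[int]) -> Tuple[Dict[int, str], int]:
    """Visit start_ids in order, storing each node once; return (store, page fetches)."""
    store: Dict[int, str] = {}
    fetches = 0
    queue = deque(start_ids)
    while queue:
        nid = queue.popleft()
        if nid in store:
            continue  # already harvested from an earlier page: skip the lookup
        fetches += 1
        for seen_id, text in fetch_page(nid):
            store.setdefault(seen_id, text)  # keep the first copy only
    return store, fetches
```

Here `crawl([1, 2, 3, 4])` fetches only the root page and still stores all four nodes once; starting from a deep reply instead, as in `crawl([4, 1])`, costs one extra fetch, so the order in which start nodes are visited matters.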
Anyhow, for archiving purposes, to make perlmonks more easily searchable, and to allow better categorization, perlmonks.org might need to offer something like:
http://www.perlmonks.org/index.pl?node_id=83485&view_mode=plain_txt so that the spider does not get tangled up in all the dynamic content such as nodelets and menu bars.
And I bet the Everything engine already has such a feature, even if it is hidden somewhere deep and only used for debugging or the like.
But, with sincere apologies, as long as this would load work onto
vroom, better write a really good spider.
Yes, I like your idea a lot, because I believe everything posted so far would make up a PerlMonks bookshelf of tips, traps, tricks and so on (well, except for meditations and discussions, but you could offer those to mindspring.com or pilosophy.org for linking). {grin}
Have a nice day
All decisions are left to your taste