note marto I didn't check all of the domains, no. I think the only valid archive would be an up-to-date database extract of node content (and some of the other metadata), rather than partial snapshots of page impressions from a moment in time. Update: I seem to recall different domains having different robots.txt rules that affect indexing. 11159096 11159141