The Personal Open Directory script does a very similar job: it 'reads' the pages from
dmoz.org, rewrites the URLs, re-brands the pages as necessary, and allows sites such as
my own to offer the content without having to store several hundred megabytes of data.
The code, while a little spaghettified, is a good example of what you want to achieve.
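
To make the idea concrete, here is a minimal sketch of the same fetch, rewrite, re-brand approach. It is not the Personal Open Directory code itself, just an illustration in Python; the local entry point `/directory.php?path=`, the function names and the banner markup are all assumptions made up for the example.

```python
import re
from urllib.parse import quote, urljoin, urlparse
from urllib.request import urlopen

UPSTREAM = "http://dmoz.org"            # source site named above
LOCAL_SCRIPT = "/directory.php?path="   # hypothetical local entry point

HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)
BODY_RE = re.compile(r"(<body[^>]*>)", re.IGNORECASE)


def fetch(path, upstream=UPSTREAM):
    """Read one page from the upstream site on demand (nothing is stored)."""
    with urlopen(upstream + path) as response:
        return response.read().decode("utf-8", errors="replace")


def rewrite_links(html, page_url, upstream=UPSTREAM, local=LOCAL_SCRIPT):
    """Point links that stay on the upstream host back at the local script."""
    upstream_host = urlparse(upstream).netloc

    def replace(match):
        # Resolve relative links against the page they came from.
        absolute = urljoin(page_url, match.group(1))
        parts = urlparse(absolute)
        if parts.netloc != upstream_host:
            return match.group(0)  # listed external site: leave untouched
        return 'href="%s%s"' % (local, quote(parts.path, safe="/"))

    return HREF_RE.sub(replace, html)


def rebrand(html, banner):
    """Insert a local banner right after the opening <body> tag, if any."""
    return BODY_RE.sub(lambda m: m.group(1) + banner, html, count=1)


def serve(path, banner):
    """Fetch, rewrite and re-brand a single directory page."""
    html = fetch(path)
    html = rewrite_links(html, urljoin(UPSTREAM, path))
    return rebrand(html, banner)
```

A request for a category would then be handled with something like serve("/Computers/", "<h1>My Directory</h1>"). A regular expression is used for link rewriting only to keep the sketch short; a real HTML parser would be more robust against unusual markup.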