The site I manage has been using conventional (hand-crafted) web pages since its inception, and we now have over 300 pages. We have a site search function using the popular Swish-e tool. This is a C program that is kicked off by a cron job each night; it scans each file in the server document tree and builds the search indexes. When a person searches our site, they are given (hopefully) a list of pages, identified by document title - that is, the text between the <title> and </title> tags.
Now, since we are about to move to EmbPerl::Object to make the site far easier to manage, each page file contains only the guts of the page as HTML, with shared embperl files supplying the standard page headers and so on. Any browser (or spider) fetching pages through our server is delivered the complete HTML, with title, body content and so on. No problem there. But swish-e, which runs its index generation outside of the web server, only sees the "raw" files on disk. Hence, even though it indexes all the searchable text, there are no title tags in the content files.
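To illustrate what I mean (a rough sketch only - the file names and the fixed title are just placeholders, not our actual templates), the shared base template holds the page skeleton and the <title> tag, and EmbPerl::Object pulls the requested content file into it with Execute('*'). The content files on disk, which are all swish-e ever sees, have no head section at all:

    <!-- base.epl: shared wrapper applied to every request -->
    <html>
     <head>
      <title>Our Site</title>   <!-- the only place a <title> tag exists -->
     </head>
     <body>
      [- Execute ('*') -]       <!-- insert the requested content file here -->
     </body>
    </html>

    <!-- a typical content file on disk: no <html>, <head> or <title> -->
    <h1>Widget price list</h1>
    <p>...</p>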
Have other people faced this problem? Is there a version of swish-e - or something similar - that can be scheduled on a regular basis but indexes documents retrieved through the web server itself? I am sure this could be done with LWP, but I don't want to reinvent the wheel . . .
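Something along these lines is what I had in mind with LWP (a rough sketch only - urls.txt and the ./rendered directory are just placeholders for illustration): fetch each page through the web server so the templated <title> is present, save the rendered HTML to a directory, and point the nightly swish-e run at that directory instead of the raw document tree.

    #!/usr/bin/perl -w
    # Rough sketch: fetch each page through the web server so the rendered
    # HTML (including the <title> supplied by the templates) is what gets
    # indexed.  urls.txt and ./rendered are placeholder names.
    use strict;
    use LWP::UserAgent;
    use URI;
    use File::Path;

    my $outdir = './rendered';
    mkpath($outdir) unless -d $outdir;

    my $ua = LWP::UserAgent->new(agent => 'site-indexer/0.1');

    open LIST, 'urls.txt' or die "urls.txt: $!";
    while (my $url = <LIST>) {
        chomp $url;
        next unless $url;

        my $resp = $ua->get($url);
        unless ($resp->is_success) {
            warn "failed $url: ", $resp->status_line, "\n";
            next;
        }

        # turn the URL path into a flat file name swish-e can index
        my $name = URI->new($url)->path;
        $name =~ s{/}{_}g;
        $name = 'index' if $name eq '' or $name eq '_';
        $name .= '.html' unless $name =~ /\.html?$/;

        open OUT, "> $outdir/$name" or die "$outdir/$name: $!";
        print OUT $resp->content;
        close OUT;
    }
    close LIST;

That way the existing swish-e cron job stays exactly as it is; it just indexes the rendered copies rather than the raw content files.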
How do those sites with large content management systems provide this search capability?