in reply to Script generated site index db

If you have access to your document root, why go through a bot at all? It seems like it would be easier to just grep through all your HTML files in htdocs (or wherever they live) than to deal with the error handling, off-site link following, etc. that a bot requires. Quicker, too.

Just a thought.

-Lee

"To be civilized is to deny one's nature."

Replies are listed 'Best First'.
Re: Re: Script generated site index db
by S_Shrum (Pilgrim) on Mar 19, 2002 at 08:55 UTC

    The problem there is that my site uses nested HTML documents...the URI will be a Perl script call that takes the following:

    Page = General site identity document
    Table = Content header (table results, document abstract info, etc.)
    Record = Document content (tabled data, documents, etc.)

    ...and nest them in the following order: Record -> Table -> Page

    Just searching the files would create a list of all the words on the site, but it would not take into account the layout of the files and their nesting order...the URI is the key.
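
    To sketch what I mean (the script name view.pl and the parameter names below are made up for illustration, not my real script), the unit of indexing is the generated URI built from the Page/Table/Record triple, not any single file on disk:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: each indexable "page" is really a script call whose
# parameters name the Page, Table, and Record pieces. The script name and
# parameter names are assumptions for illustration only.
my @entries = (
    { page => 'main', table => 'docs', record => 'install.html' },
    { page => 'main', table => 'docs', record => 'readme.html'  },
);

for my $e (@entries) {
    my $uri = sprintf 'http://www.shrum.net/cgi-bin/view.pl?page=%s&table=%s&record=%s',
        @{$e}{qw(page table record)};
    print "$uri\n";    # this URI, not the record file, is the index key
}
```

    So a grep over the files would find the words, but could never reconstruct these URIs.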

    ======================
    Sean Shrum
    http://www.shrum.net

      You could expand the filename. I am assuming when you say nested you mean subdirectories.

      Let's say your document root is /usr/local/apache/htdocs
      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Find;

      my $filedir = '/usr/local/apache/htdocs';
      my $baseurl = 'http://someplace.com/shrum';
      my @docs;

      sub process_file {
          return if -d;                         # Skip directories.
          push @docs, [$File::Find::dir, $_];
      }

      find(\&process_file, $filedir);

      foreach my $doc (@docs) {
          $doc->[0] =~ s/\Q$filedir\E/$baseurl/;    # [0] is dir, [1] is filename.
          print "URL is $doc->[0]/$doc->[1]\n";
      }


      -Lee

      "To be civilized is to deny one's nature."
        This is still very much a half-solution for sites that incorporate dynamic content or server-side includes. For sites such as these, for which I first looked into this issue, a locally run HTTP indexing engine is an absolute must. It should also be noted that indexing can be scheduled for low-utilisation times, so the impact on the server is minimal.

        The other advantage this approach offers is the ability to fold website maintenance tasks, such as broken-link checking and content auditing, into the same process.

         

        perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'

        My bad...

        By nested, I literally mean nested in that:

        I have a script that makes LWP::Simple calls to the Page, Table, and Record files specified in the script call and stores them in variables. These files are templates into which I substitute the site content data. Once the substitutions are complete, I place the record template into the table template, and that into the page template.
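
        The nesting order can be sketched like this. In the real script the three templates are fetched with LWP::Simple's get(); literal strings stand in for them here, and the %%...%% placeholder names are assumptions for illustration, not my actual markers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for the three templates (really fetched via LWP::Simple::get).
my $page   = "<html><body>%%TABLE%%</body></html>";
my $table  = "<table><tr><td>%%RECORD%%</td></tr></table>";
my $record = "<p>%%CONTENT%%</p>";

my $content = 'row data pulled from docs.dat';

$record =~ s/%%CONTENT%%/$content/;   # fill the record template
$table  =~ s/%%RECORD%%/$record/;     # nest: record -> table
$page   =~ s/%%TABLE%%/$table/;       # nest: table -> page

print $page, "\n";
```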

        To give you a better idea, here is an example of a completed URL from my site.

        The above URI uses 3 template files:

        Page
        Table, and
        Record

        The majority of the page content that you see (at present) is from the docs.dat.

        Currently, this setup allows me to create web pages either with content from the docs.dat or by simply specifying a document HTML page as the RECORD, in which case the user is none the wiser...the page displays as if it had come from the docs.dat (even though it's not). Hence the problem: the document content (in this case) will not be in the docs.dat (only the reference to the document will be listed in the URI).

        It doesn't make for the cleanest HTML but it works 99% of the time. ;D

        ======================
        Sean Shrum
        http://www.shrum.net