in reply to Re: Re: Script generated site index db
in thread Script generated site index db

You could expand the filename. I am assuming when you say nested you mean subdirectories.

Let's say your document root is /usr/local/apache/htdocs
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $filedir = '/usr/local/apache/htdocs';
my $baseurl = "http://someplace.com/shrum";
my @docs    = ();

sub process_file {
    return if -d;    # Skip directories.
    push @docs, [$File::Find::dir, $_];
}

find(\&process_file, $filedir);

foreach my $doc (@docs) {
    # [0] is dir, [1] is filename.
    # \Q...\E keeps the dots and slashes in the path from acting as regex syntax.
    $doc->[0] =~ s/^\Q$filedir\E/$baseurl/;
    print "URL is " . $doc->[0] . '/' . $doc->[1], "\n";
}


-Lee

"To be civilized is to deny one's nature."

Re: Re: Re: Re: Script generated site index db
by rob_au (Abbot) on Mar 19, 2002 at 09:33 UTC
    This is still very much a half-solution for sites that incorporate dynamic content or server-side includes. For such sites, which are what first led me to look into this issue, a locally run HTTP indexing engine is an absolute must. It should also be noted that indexing can be scheduled for low-utilisation times, so the impact on the server is minimal.

    The other advantage which this approach offers is the ability to incorporate web site maintenance such as broken link checking and content-auditing into the same process.
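A minimal sketch of that idea, not the engine referred to above: the `index_pages` routine, the fetch callback, and the canned pages are all assumptions made for illustration. Each page is indexed from its rendered output, and any URL that cannot be fetched is recorded as a broken link in the same pass.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: index rendered pages and note broken links in one pass.
# $fetch is any code ref that returns the page body, or undef on failure
# (for example, a wrapper around LWP::Simple's get()).
sub index_pages {
    my ($fetch, @urls) = @_;
    my (%index, @broken);
    foreach my $url (@urls) {
        my $html = $fetch->($url);
        unless (defined $html) {
            push @broken, $url;    # Could not fetch: report as a broken link.
            next;
        }
        (my $text = $html) =~ s/<[^>]*>/ /g;    # Crude tag stripping.
        foreach my $word (map { lc($_) } $text =~ /(\w+)/g) {
            $index{$word}{$url}++;              # word -> url -> count
        }
    }
    return (\%index, \@broken);
}

# Stand-in fetcher over canned pages, so the sketch runs without a server.
my %pages = (
    'http://localhost/a.html' => '<html><body><h1>Perl index</h1></body></html>',
);
my ($index, $broken) = index_pages(
    sub { $pages{ $_[0] } },
    'http://localhost/a.html', 'http://localhost/missing.html',
);
print "broken: $_\n" for @$broken;
print "perl found in: $_\n" for sort keys %{ $index->{perl} };
```

Passing the fetcher in as a code ref keeps the indexing logic separate from the HTTP client, so the same routine could be pointed at a real local server when one is available.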

     

    perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'

      I totally agree that munging files is a less than ideal solution. No insult to S_Shrum intended, but when I saw that he was working with flat files and planning to search flat files, I geared the answer towards the question.

      When possible, I think indexing is best done as part of the publishing process. If you're working with the data files that generate a site, you're in a much better position to know what data is relevant and in what context.

      Most of our clients have their data piped through us at one stage or another, which makes it much easier. For instance, if you have a global footer on every page that contains a keyword, you probably don't want every page returned in the results. Trying to detect things like that from the outside would be pretty difficult.
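To illustrate the footer case, here is a rough sketch, assuming the publishing step knows the exact footer markup. The `$footer` string and the `indexable_words` helper are made up for the example. The footer is removed before the words are collected, so a keyword that appears only in the footer does not match every page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical global footer that the publishing step appends to every page.
my $footer = '<div class="footer">Powered by Perl</div>';

# Return the unique, lowercased words of a page body with the shared footer
# left out.
sub indexable_words {
    my ($html) = @_;
    $html =~ s/\Q$footer\E//g;    # Drop the known footer markup verbatim.
    $html =~ s/<[^>]*>/ /g;       # Crude tag stripping.
    my %seen;
    return grep { !$seen{$_}++ } map { lc($_) } $html =~ /(\w+)/g;
}

my @words = indexable_words("<p>Site index notes</p>$footer");
print join(' ', @words), "\n";
```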

      -Lee

      "To be civilized is to deny one's nature."
Re: Re: Re: Re: Script generated site index db
by S_Shrum (Pilgrim) on Mar 19, 2002 at 09:38 UTC

    My bad...

    By nested, I literally mean nested in that:

    I have a script that makes LWP::Simple calls to fetch the Page, Table, and Record files specified in the script call and stores them in variables. These files are templates into which I substitute the site content data. Once the substitutions are completed, I place the record template into the table template, and the table template into the page template.
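The nesting described above might look roughly like this. It is only a sketch: the `[[...]]` placeholder syntax, the `fill` helper, and the inline templates and rows are assumptions for illustration, not the actual script, which would fetch its templates with LWP::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical templates, inlined here so the sketch runs on its own.
my $page_tmpl   = '<html><body>[[TABLE]]</body></html>';
my $table_tmpl  = '<table>[[RECORDS]]</table>';
my $record_tmpl = '<tr><td>[[title]]</td></tr>';

# Replace each [[name]] placeholder with the matching value from %$data;
# unknown placeholders are left untouched.
sub fill {
    my ($tmpl, $data) = @_;
    $tmpl =~ s/\[\[(\w+)\]\]/exists $data->{$1} ? $data->{$1} : '[[' . $1 . ']]'/ge;
    return $tmpl;
}

# Stand-in for rows read from the content data file.
my @rows = ({ title => 'First doc' }, { title => 'Second doc' });

# Record into table, then table into page.
my $records = join '', map { fill($record_tmpl, $_) } @rows;
my $table   = fill($table_tmpl, { RECORDS => $records });
my $page    = fill($page_tmpl,  { TABLE   => $table });
print "$page\n";
```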

    To give you a better idea here is an example of a completed URL from my site.

    The above URI uses 3 template files:

    Page
    Table, and
    Record

    The majority of the page content that you see (at present) is from the docs.dat.

    Currently, this setup allows me to create web pages either with content from the docs.dat or by simply specifying an HTML document as the RECORD, in which case the user is none the wiser: the page displays as if it had come from the docs.dat (even though it hasn't). Hence the problem: in that case the document content will not be in the docs.dat (only the reference to the document will be listed in the URI).

    It doesn't make for the cleanest HTML but it works 99% of the time. ;D

    ======================
    Sean Shrum
    http://www.shrum.net