http://qs1969.pair.com?node_id=225586

primus has asked for the wisdom of the Perl Monks concerning the following question:

Hail Monks!

I have been wondering about this for a while and was hoping someone here could clarify it.

On some websites the URL looks like, for example, "http://www.eweek.com/article2/0,3959,809353,00.asp?kc=EWTH102099TX1K0100487". I was wondering what the "0,3959,809353,00.asp" part is. Is it a parameter being passed to a script, or does the file have commas in its name? I wasn't sure whether files with commas in the name are even possible...

But anyway, I like this method, even though I do not fully understand it, and I wanted to try it out, though I need some help with what is actually going on.

Thanks, monks!

------

Another example: "http://gamespot.com/gamespot/filters/0,10850,6013548,00.html"

Re: quasi-perl related ... more website design
by seattlejohn (Deacon) on Jan 09, 2003 at 18:37 UTC
    That particular URL format is characteristic of Vignette content-management products. It's been a while since I worked with their tools, but my recollection is that it works something like this.

    Each of the four comma-delimited numbers has a specific meaning. The first one is normally 0 to use a cached version of a page if one exists, whereas a 1 indicates that the page should be rebuilt dynamically. The second number is the identifier of the template used to build the page content. The third number is the identifier of the content used to build the page. The fourth number has something to do with browser-specific versions of a page, but I've never seen it used in practice.

    By varying the second number but keeping the third number constant, a producer can vary the template that controls content presentation while keeping the underlying content constant. By varying the third number but keeping the second number constant, a producer can vary the content that appears in a specific template.
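
    To make the four fields concrete, here is a minimal sketch that pulls them out of such a URL with a regular expression. The sub name parse_vignette_url and the hash keys are my own labels based on the description above, not official Vignette terminology, and the pattern assumes the segment is always four numbers followed by a file extension.

```perl
use strict;
use warnings;

# Hypothetical helper: split the last path segment of a Vignette-style URL
# into its four comma-delimited fields. The key names are illustrative only.
sub parse_vignette_url {
    my ($url) = @_;

    # Capture "a,b,c,d.ext" at the end of the path, before any query string.
    my ($cache, $template, $content, $variant) =
        $url =~ m{/(\d+),(\d+),(\d+),(\d+)\.\w+(?:\?.*)?$}
        or return;

    return {
        cache_flag  => $cache,       # 0 = cached page, 1 = rebuild
        template_id => $template,    # which template renders the page
        content_id  => $content,     # which content fills the template
        variant     => $variant,     # browser-specific version, rarely used
    };
}

my $fields = parse_vignette_url(
    'http://www.eweek.com/article2/0,3959,809353,00.asp?kc=EWTH102099TX1K0100487'
);
print join(', ', map { "$_=$fields->{$_}" } sort keys %$fields), "\n";
```

    Swapping template_id while holding content_id fixed corresponds to the "same content, different presentation" case described above.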

    I'm not familiar with how eWeek uses the "?kc=xxx" portion of the URL, but it may have something to do with tracking where a referral to a page originated, or perhaps user-specific state.

    You mention you weren't sure whether commas are allowable in filenames. It probably depends on your OS, but in any case it probably doesn't matter, because the Vignette URLs do not correspond to real filenames. Instead, the URL above is essentially invoking a program with the parameters (0,3959,809353,00) and whatever's in kc, and that program returns the appropriate page.

    You can accomplish something similar with perl and CGI. For example, if you want your site to have URLs like this:
    http://www.mysite.com/show/14/22/56

    You can do it by creating a script called show (no extension) in your docroot, then using the $ENV{PATH_INFO} variable to capture the /14/22/56 "parameters" that the script was invoked with. (This makes some assumptions that may not be true in practice, e.g. that your host will let you create an executable script in your docroot.)
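
    As a rough sketch of what such a show script might look like, assuming the web server is configured to execute it as CGI and sets PATH_INFO as usual: the helper name path_params is hypothetical, and the digits-only check is just one possible way to validate the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn PATH_INFO such as "/14/22/56" into a list of
# numeric parameters. Returns an empty list if any segment is not all digits.
sub path_params {
    my ($path_info) = @_;
    return () unless defined $path_info;

    # Split on slashes and drop the empty leading segment.
    my @parts = grep { length } split m{/}, $path_info;
    return () if grep { !/^\d+\z/ } @parts;
    return @parts;
}

my @params = path_params($ENV{PATH_INFO});

# Emit a minimal CGI response listing what was captured.
print "Content-Type: text/plain\r\n\r\n";
if (@params) {
    print "Parameters: @params\n";
}
else {
    print "No valid parameters in the path.\n";
}
```

    In a real site the parameters would then select a template and content (for example via HTML::Template) instead of being echoed back.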

    Building a database- and template-driven site is indeed something worth exploring (though this particular URL syntax may not be the most effective for all applications). You may want to check out modules like HTML::Template and Template Toolkit to get an idea what's possible.

            $perlmonks{seattlejohn} = 'John Clyman';

      I am using HTML::Template now, and I think I will check out the idea of using an executable script in my docroot to pull parameters. Is there an easy way to go about it? I heard of one way: have the script intercept and parse $ENV{PATH_INFO} and display the corresponding page. Thank you, fellow monk.
Re: quasi-perl related ... more website design
by derby (Abbot) on Jan 09, 2003 at 18:03 UTC
Re: quasi-perl related ... more website design
by osama (Scribe) on Jan 09, 2003 at 22:05 UTC

    I have used a trick with Apache to get the data for the web pages from a database (in a CMS).

    I would have a 404 handler that runs my script. The script checks $ENV{REQUEST_URI} and queries the database. If the page is found, it overrides the 404 status with a 200 OK and spits out the page; otherwise it keeps the 404 status and shows a custom message (a nice one :)
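
    A minimal sketch of that idea, under some assumptions: Apache is pointed at the script with something like "ErrorDocument 404 /cgi-bin/notfound.cgi" (the script path is made up), the original request path is available in REQUEST_URI, and the %pages hash stands in for the real database query. The respond sub is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the database: maps request paths to page bodies.
my %pages = (
    '/about' => '<h1>About</h1>',
);

# Hypothetical helper: build the full CGI response for a requested URI.
sub respond {
    my ($uri) = @_;
    $uri = '' unless defined $uri;
    $uri =~ s/\?.*//s;    # ignore any query string

    if (exists $pages{$uri}) {
        # Found: send 200 instead of the 404 that triggered this handler.
        return "Status: 200 OK\r\nContent-Type: text/html\r\n\r\n$pages{$uri}\n";
    }

    # Not found: keep the 404 status but show a friendlier message.
    return "Status: 404 Not Found\r\nContent-Type: text/html\r\n\r\n"
         . "<h1>Sorry, nothing lives at this address.</h1>\n";
}

print respond($ENV{REQUEST_URI});
```

    The Status header is what lets the CGI script decide the final response code, so pages that exist only in the database look like ordinary successful requests to the browser.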