punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Machinating Monks,

I'm using an htaccess redirect to send users who request any file from subfolder-a on Site-A, to a logging Perl script on Site-B. Then I want my script to return the user to their originally requested page.

Easy, you say - $ENV{HTTP_REFERER}, but referer info is not reliable - sometimes missing, and it can be spoofed.

How can I reliably know which page was originally requested?

Thanks.




Time flies like an arrow. Fruit flies like a banana.

Re: How to reliably get referrer?
by chromatic (Archbishop) on Mar 02, 2009 at 19:32 UTC

    You can't.

    Okay, that's not true. If you have complete control of all client machines, such that all access went through your preferred mechanism, with no possible way anyone could ever access your program from any other mechanism, you can. (The statelessness of HTTP combined with the open access to any URI-accessible resource necessitates these strong steps.)

    That's probably impractical -- so you can't.

      Well, if that's the case, I'll see about going at it from the other end and see if I can have htaccess pass on the url that was requested.
Re: How to reliably get referrer?
by Joost (Canon) on Mar 02, 2009 at 22:15 UTC
    You could pass on a dynamically (server-side) generated and logged token from site A to site B (for instance, as a GET parameter), then you can check the tokens later to see if they match. Or better yet, use some kind of inter-server mechanism (like memcached) to check the token when the request reaches site B.

    Without some dynamic serverside control of server A, you won't be able to get even remotely reliable prevention of spoofing.

    updated for clarification

      Ya, that sounds like the best idea yet. It'll add a step - htaccess redirects to a local Perl script, which generates a hash from a private key and redirects to the remote Perl script, which checks the hash against the same private key, and, if OK, does its logging and redirects back to the originally requested file. Do-able.
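A minimal Perl sketch of the shared-key check described above, assuming both servers hold the same secret and that site A sends the requested path, a timestamp, and the token as query parameters. The names `$SECRET`, `$MAX_AGE`, `make_token`, and `token_ok` are hypothetical, and HMAC-SHA256 (from the core `Digest::SHA` module) is one possible choice of hash, not something the thread specifies.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical shared secret; both servers would need the same value.
my $SECRET  = 'replace-with-a-long-random-string';
my $MAX_AGE = 300;    # seconds a token stays valid (an assumed value)

# Site A: sign the requested path together with a timestamp.
sub make_token {
    my ($path, $time) = @_;
    return hmac_sha256_hex("$time:$path", $SECRET);
}

# Site B: recompute the signature and reject stale or altered requests.
sub token_ok {
    my ($path, $time, $token, $now) = @_;
    return 0 unless defined $time && $time =~ /\A\d+\z/;
    return 0 if abs($now - $time) > $MAX_AGE;
    my $expected = make_token($path, $time);
    return 0 unless defined $token && length($token) == length($expected);
    # Compare every character so timing does not reveal a partial match.
    my $diff = 0;
    $diff |= ord(substr($token, $_, 1)) ^ ord(substr($expected, $_, 1))
        for 0 .. length($expected) - 1;
    return $diff == 0 ? 1 : 0;
}
```

Site A would call `make_token($path, time)` before redirecting, and site B would call `token_ok($path, $time, $token, time)` before logging. Signing the path along with the timestamp means a token captured for one file cannot be reused for another, and it expires after `$MAX_AGE` seconds.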

      But, how to make that happen as a GET so it doesn't all show up in the location bar? I was thinking redirects by printing location headers, but that'll put the info in the url.

      Update - uh, I mean the opposite - instead of redirecting as a GET, it'd be nice to be able to redirect as a POST so that the info won't show up in the location bar.
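A Location header cannot turn the browser's GET into a POST. One common workaround is to return a small HTML page containing a form that submits itself. The sketch below is only an illustration: `post_form_page` and `html_escape` are hypothetical helpers, the target URL is made up, and the approach assumes the client runs JavaScript (a submit button is included as a fallback).

```perl
use strict;
use warnings;

# Escape a value for use inside an HTML attribute.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build a page whose only job is to POST the given fields to $action.
sub post_form_page {
    my ($action, %fields) = @_;
    my $inputs = join '', map {
        sprintf qq{<input type="hidden" name="%s" value="%s">\n},
            html_escape($_), html_escape($fields{$_})
    } sort keys %fields;
    return qq{<!DOCTYPE html>\n<html><body onload="document.forms[0].submit()">\n}
        . sprintf(qq{<form method="post" action="%s">\n}, html_escape($action))
        . $inputs
        . qq{<noscript><input type="submit" value="Continue"></noscript>\n}
        . qq{</form></body></html>\n};
}
```

The local script would print a `Content-Type: text/html` header followed by `post_form_page('http://site-b/script.pl', file => $file, token => $token)`. The values stay out of the location bar, but they are still visible to anyone who views the page source.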

      Update 2 - hmmm, maybe less do-able than I thought - because of course when the last Perl script redirects to the desired file, htaccess will test the referer again. If I'm appending a dynamically generated hash, it won't match whatever url htaccess has been statically programmed to test against, and if I append a static key, then all it takes is one successful visit to the page and the user can circumvent the system by re-inputting that url. Hmmmmm.....

Re: How to reliably get referrer?
by zwon (Abbot) on Mar 02, 2009 at 19:45 UTC

    You should pass the originally requested page to the Perl script as a parameter. When a user requests http://site-a/filename, you redirect them to http://site-b/script.pl?file=filename, and script.pl can redirect them back using the file parameter.
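A rough sketch of what script.pl could look like under this approach. The host `http://site-a`, the folder `/subfolder-a/`, and the helper names are assumptions for illustration. The query string is parsed by hand so the example needs no non-core modules, and the `file` value is checked so the script cannot be abused to redirect users to arbitrary sites.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values: the real host and folder names are not given in the thread.
my $SITE_A = 'http://site-a';
my $FOLDER = '/subfolder-a/';

# Decode one percent-encoded query-string value.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Pull the "file" parameter out of a raw query string, or return undef.
sub requested_file {
    my ($query) = @_;
    for my $pair (split /[&;]/, $query // '') {
        my ($k, $v) = split /=/, $pair, 2;
        return url_decode($v // '') if defined $k && $k eq 'file';
    }
    return;
}

# Build the URL to send the user back to. Anything that is not a plain
# relative path falls back to the folder index.
sub return_url {
    my ($file) = @_;
    return "$SITE_A$FOLDER"
        unless defined $file
        && $file =~ m{\A[\w.-]+(?:/[\w.-]+)*\z}
        && $file !~ /\.\./;
    return "$SITE_A$FOLDER$file";
}

# When run as a script (not when loaded by another file), log and redirect.
unless (caller) {
    my $file = requested_file($ENV{QUERY_STRING});
    # ... logging would go here ...
    print "Status: 302 Found\r\n";
    print "Location: ", return_url($file), "\r\n\r\n";
}
```

A request with a missing or malformed `file` parameter is sent to the folder index instead of failing, which covers the bookmarked-URL case, though not a deliberately forged one.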

      What if the user bookmarks the latter and removes the query string? What if the user types it in directly? What if the client ignores query string parameters?

      This isn't completely reliable either.

        True, but the OP didn't specify any concerns about malicious users trying to break the app. The referer business is unreliable even when users are friendly.

        UPDATE: I guess "spoofed" does sound like a concern about tampering by users.

        This isn't completely reliable either.

        Sure. It's always possible to use a user agent that doesn't support redirection ;).

Re: How to reliably get referrer?
by CountZero (Bishop) on Mar 02, 2009 at 20:43 UTC
    send users who request any file from subfolder-a on Site-A, to a logging Perl script on Site-B
    Can't you get this information from the log files on site-A? That means much less overhead, and no unreliable redirecting.
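As an illustration of the log-file route, here is a small Perl sketch that picks requests for subfolder-a out of an access log. It assumes Apache's "combined" log format, and `parse_hit` is a hypothetical helper name. Site-A's actual log format is not stated in the thread.

```perl
use strict;
use warnings;

# Assumes Apache's "combined" log format.
my $LINE = qr{
    \A (\S+) \s+ \S+ \s+ \S+ \s+     # client address, identity, user
    \[ ([^\]]+) \] \s+               # timestamp
    " (\S+) \s+ (\S+) [^"]* " \s+    # method and requested path
    (\d{3}) \s+ \S+                  # status and response size
    (?: \s+ "([^"]*)" )?             # referer, if it was logged
}x;

# Return a hash ref for requests under /subfolder-a/, or nothing otherwise.
sub parse_hit {
    my ($line) = @_;
    return unless $line =~ $LINE;
    my %hit = (
        ip      => $1,
        time    => $2,
        method  => $3,
        path    => $4,
        status  => $5,
        referer => $6 // '',
    );
    return unless $hit{path} =~ m{\A/subfolder-a/};
    return \%hit;
}

# Only read log files when some are named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        my $hit = parse_hit($line) or next;
        print join("\t", @{$hit}{qw(time ip path referer)}), "\n";
    }
}
```

Saved under any file name and run with the access log path as its argument, this prints one tab-separated line per matching request.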

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: How to reliably get referrer?
by punch_card_don (Curate) on Mar 03, 2009 at 16:50 UTC
    MUCH simpler idea - I think

    htaccess checks the Query String, not the Referer.

    IF (Query String != long_string_a+date+long_string_b+hour+long_string_c) THEN
        redirect to logging script on other server, appending the originally requested url, the local date and the local hour
    ELSE
        simply re-write url to remove query string
    END
    The Perl script on the other server strips the originally requested url, date and hour, generates a Query String using static long_string_a, b, and c, the date and the hour, and redirects back to

    originally_requested_url?q=long_string_a+date+long_string_b+hour+long_string_c

    But htaccess strips the query string off, so no one ever sees it, it's nigh impossible to guess, and it's dynamic so very hard to spoof.

    Any holes in my theory?
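The key scheme in this post could be sketched in Perl roughly as follows. The three strings are placeholders for long_string_a, long_string_b, and long_string_c, and the date and hour formats (`YYYYMMDD` and a two-digit hour) are assumptions. `key_is_current` stands in for the comparison that htaccess would make on Site-A. It also accepts the previous hour, which is an addition not in the post: a user redirected at 13:59 who comes back at 14:00 would otherwise fail the check and be redirected again.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Placeholder secrets standing in for long_string_a, _b and _c.
my ($A, $B, $C) = ('AAAA', 'BBBB', 'CCCC');

# Join the static strings with the date and hour, as in the post.
sub build_key {
    my ($date, $hour) = @_;
    return $A . $date . $B . $hour . $C;
}

# Logging script: the URL to send the user back to.
sub return_url {
    my ($url, $date, $hour) = @_;
    return "$url?q=" . build_key($date, $hour);
}

# Site-A side: does the q value match the current hour, or the one before?
sub key_is_current {
    my ($q, $now) = @_;
    for my $t ($now, $now - 3600) {
        my @lt  = localtime $t;
        my $key = build_key(strftime('%Y%m%d', @lt), strftime('%H', @lt));
        return 1 if $q eq $key;
    }
    return 0;
}
```

Because the key depends only on the static strings and the clock, everyone who visits in the same hour gets the same key, and it is visible in the redirect response even though htaccess later strips it from the URL.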