Just today I was toying around with a little idea I had: using $ENV{PATH_INFO} in a CGI script to dynamically include content-generating modules into the main script. The PATH_INFO would be a slash-delimited string, with the first part being the name of the module and everything else arguments to the module.

The basic idea was as follows:

    # FILE: main script
    use strict;
    use warnings;

    (my $module) = $ENV{PATH_INFO} =~ m!^/([^/]+)/; # mind this regex, I'm coming back at it in a few secs!
    require "$module.pl";
    my $content = $module->run();
    #...
    print $header, $content, $footer;

    # FILE: any random module, let's say myawesomemodule.pl
    package myawesomemodule;
    use strict;
    use warnings;

    sub run {
        ... # generate content
        return $content;
    }

    1; # hey; it's a require after all.

And now a request to http://localhost/mainscript/myawesomemodule would just work.

However, like I said, I also wanted to include a few arguments in the PATH_INFO, so the regex I made looked somewhat like this:

    my ($module, @module_args) = $ENV{PATH_INFO} =~ m!/([^/]+)(?:/([^/]+))+!;

The idea, obviously, was to grab the first part into $module and everything else into @module_args. Much to my surprise I found only the last argument inside the array, though. So for example http://localhost/mainscript/myawesomemodule/foo/bar/baz would only receive "baz" as its single argument.

I fiddled around with adding ^ and $ start and end markers to the regex, but that didn't help either. I made the match global with /g and again nothing much seemed to change. I sort of understood what was going on: I only had two pairs of capturing parentheses, so it made sense that the regex would only return two things. Still, I was fairly sure a task as simple as this should be doable without a lot of hassle. I mean, all I have is a simple slash-delimited string of which only the first part has a somewhat special meaning.
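For what it's worth, the behaviour is easy to reproduce outside of CGI. The following is only a minimal sketch, assuming a hard-coded $path_info as a stand-in for $ENV{PATH_INFO}: a capture group that sits inside a quantifier is still a single group, so every repetition overwrites the previous capture and only the last one survives.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ENV{PATH_INFO}.
my $path_info = "/myawesomemodule/foo/bar/baz";

# The second pair of parentheses is repeated by the trailing +, but it is
# still one capture group: each repetition overwrites the previous value.
my ($module, @module_args) = $path_info =~ m!/([^/]+)(?:/([^/]+))+!;

print "module: $module\n";        # prints "module: myawesomemodule"
print "args:   @module_args\n";   # prints "args:   baz"
```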

After an hour of staring at it, fiddling around with one or two bytes of changed code, it dawned on me! Eureka.

    my @module_args = $ENV{PATH_INFO} =~ m!/([^/]+)!g;
    my $module = shift @module_args || $some_default_value;
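To see what the /g match returns in list context, the two lines can be wrapped in a small helper. This is just a sketch: the name parse_path_info and the default module name "index" are made up for illustration. Note that because of the ||, a first segment of "0" would also fall back to the default.

```perl
use strict;
use warnings;

# Hypothetical helper: split a PATH_INFO-style string into a module name
# and its arguments. The default module name is an assumption.
sub parse_path_info {
    my ($path_info, $default) = @_;

    # In list context, m//g returns the captures of every match.
    my @parts  = $path_info =~ m!/([^/]+)!g;
    my $module = shift @parts || $default;

    return ($module, @parts);
}

my ($module, @module_args) = parse_path_info("/myawesomemodule/foo/bar/baz", "index");
print "module: $module\n";        # prints "module: myawesomemodule"
print "args:   @module_args\n";   # prints "args:   foo bar baz"

# An empty path yields no matches at all, so the default kicks in.
my ($fallback) = parse_path_info("", "index");
print "fallback: $fallback\n";    # prints "fallback: index"
```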

So, yeah. The solution was truly simple and I'm rather amazed it took me so long to find out. Perl aims to make the easy things easy and the hard things possible. But if you don't know how to wield that power, well, you'll find yourself making the easy things unachievable.

PS. I toyed with it a bit further and by now the main script also checks whether "$module.pl" even exists. If not, it checks -e "${module}_$ENV{REQUEST_METHOD}.pl" so that you can have separate modules for something like a form: myform_GET.pl is the module that generates the form, whereas myform_POST.pl processes it. But to the external world they are hidden behind the same address, namely http://example.com/mainscript/myform. I think this looks rather elegant.
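The lookup order described in the PS could be sketched roughly like this. The helper name resolve_module_file and the $dir parameter are assumptions made for the sake of a self-contained example (the real script may well look in the current directory), and the temporary directory only exists to make the sketch runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the file to require for a module name, trying
# "$module.pl" first and "${module}_$method.pl" second. Returns undef (in
# scalar context) if neither file exists.
sub resolve_module_file {
    my ($module, $method, $dir) = @_;

    for my $candidate ("$module.pl", "${module}_$method.pl") {
        my $file = "$dir/$candidate";
        return $file if -e $file;
    }
    return;
}

# Set up a throwaway directory holding the two form modules from the PS.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(myform_GET.pl myform_POST.pl)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

my $get_file  = resolve_module_file("myform", "GET",  $dir);
my $post_file = resolve_module_file("myform", "POST", $dir);
print "GET  -> $get_file\n";
print "POST -> $post_file\n";
```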

Re: Regex stupidity - or, making the easy things hard
by JavaFan (Canon) on Oct 09, 2009 at 08:46 UTC
    (my $module) = $ENV{PATH_INFO} =~ m!^/([^/]+)/; # mind this regex, I'm coming back at it in a few secs!
    When I read this, I thought the posting was about the fact that the above is valid code, but the regexp is very unlikely to match a typical PATH_INFO (note that the above code has the terminating regexp delimiter as the last character of the comment!).
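    A small sketch of that effect, assuming a hard-coded $path_info in place of $ENV{PATH_INFO} and with a semicolon added after the final "!" so that the statement stands on its own:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ENV{PATH_INFO}.
my $path_info = "/myawesomemodule/foo";

# As posted: the pattern only ends at the "!" that closes the comment, so
# the comment text is part of the regex and must appear in the string.
(my $as_posted) = $path_info =~ m!^/([^/]+)/; # mind this regex, I'm coming back at it in a few secs!;

# With the delimiter closed right after the capture group.
(my $fixed) = $path_info =~ m!^/([^/]+)!;

print "as posted: ", (defined $as_posted ? $as_posted : "(no match)"), "\n";  # prints "as posted: (no match)"
print "fixed:     ", (defined $fixed     ? $fixed     : "(no match)"), "\n";  # prints "fixed:     myawesomemodule"
```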
    my @module_args = $ENV{PATH_INFO} =~ m!/([^/]+)!g;
    I would have written it as:
    my @module_args = split '/+', $ENV{PATH_INFO};

      Thoughtful, and perhaps your solution improves readability. But let's consider your approach:

      use strict;
      use warnings;

      my $path_info = "/some_module/with/some/args";
      my @module_args = split '/+', $path_info;
      print join(" :: ", @module_args), "\n";
      __END__
       :: some_module :: with :: some :: args

      Note how there is first an empty element in the array, because the PATH_INFO has a leading slash. Then there is the "real first" part which contains the module name and then come the arguments. So in order to make your solution work, you need two shifts to get the module name. I wonder how useful that is but then again, I might be missing something here.

        (undef, my ($module, @args)) = split '/+', $path_info;
        No shifts.
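        As a runnable sketch of the line above, using the same hard-coded $path_info as the earlier example: the leading slash produces an empty first field, and assigning that field to undef throws it away.

```perl
use strict;
use warnings;

my $path_info = "/some_module/with/some/args";

# split returns ("", "some_module", "with", "some", "args"); the undef
# placeholder swallows the empty first field, so no shift is needed.
(undef, my ($module, @args)) = split '/+', $path_info;

print "module: $module\n";   # prints "module: some_module"
print "args:   @args\n";     # prints "args:   with some args"
```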
Re: Regex stupidity - or, making the easy things hard
by Anonymous Monk on Oct 09, 2009 at 06:18 UTC

      I have considered that too, and although I must admit that you're quite right to point it out, I decided not to care about security just yet, simply because I was only trying to make some sort of proof of concept. However, since I'm indeed planning on using a similar system in a real website I'm developing, I will heed your warnings.

      So thank you for bringing the security issue to my attention!