Cool uses for path_info

Dallaylaen has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear esteemed monks,

My toy web framework's documentation explicitly states that it should only accept validated data from the remote user, the natural and basic validation method being, of course, regular expressions.

However, the path_info parameter is for now accepted as-is, which is one of my early design mistakes. An application itself is divided into paths (much like in Dancer); anything in the URI following the matching part is considered additional input. So the current usage is:

MVC::Neaf->route( "/foo/bar" => sub {
    my $request = shift;
    
    $request->param( name => qr/\w+/ ); # undef unless name is 1+ word characters
    $request->path_info(); # oops user input slips through
} );

Here any URI that doesn't match any of the configured routes would return a customizable 404 Not Found page. So would a handler that calls die 404; or $request->error(404, %params); at some point.

Now I would like to correct this mistake by adding a path_components => qr/.../ parameter to the handler definition and a path_components() method to the request object. The method would return path_info itself, followed by the capture groups $1, $2 ... of the validation regexp (if any). If the regexp doesn't match (or wasn't specified), the application would just show a 404 page.

MVC::Neaf->route( "/foo/bar" => sub {
    my $request = shift;

    $request->path_components->[0]; # 1+ digits guaranteed
}, path_components => qr/\d+/ );

This way only the parts of application that actually need path_info (wiki pages, /calendar/YYYY/MM/DD etc) would get it, while the others would just reply with 404 unless called correctly.
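
To make the intended behaviour concrete, here is a minimal sketch of the matching step. The helper name `match_path_components` is hypothetical and is not part of MVC::Neaf; the sketch also assumes the regexp must match the whole of path_info, which is what the "1+ digits guaranteed" comment above implies.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MVC::Neaf.
# Returns [ $path_info, $1, $2, ... ] on a full match, undef otherwise
# (the caller would then reply with 404).
sub match_path_components {
    my ($path_info, $regex) = @_;
    return undef unless defined $regex;    # no regexp given: always 404

    # The extra outer group captures the whole path_info, so a list-context
    # match yields the full string followed by the user's own groups.
    # Caveat: numeric backreferences inside $regex shift by one.
    my @parts = $path_info =~ /\A($regex)\z/;
    return @parts ? \@parts : undef;
}

my $date = match_path_components( '2016/11', qr{(\d{4})/(\d{2})} );
my $bad  = match_path_components( 'abc123',  qr/\d+/ );
```

Here `$date` holds the full string followed by the two captures, while `$bad` is undef because `abc123` is not digits only.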

So my questions here are:

1) Does this scheme seem reasonable?

2) What would be a better name for path_components? It's too long and clumsy, but I'll take it if I can't come up with something better.


Replies are listed 'Best First'.
Re: Cool uses for path_info by shmem (Chancellor) on Nov 23, 2016 at 19:54 UTC
I would just fix `path_info()` by untainting it, and be done. After all, user input may slip through if it is valid and doesn't do any harm. I haven't looked through the entire module, so I can only guess that requests fail elsewhere with a 404 if `path_info()` doesn't provide anything useful for a component relying on it. But then, you probably do untaint both the environment and user input as early as possible, don't you? If not, you should have a very good reason.
perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re^2: Cool uses for path_info by Dallaylaen (Chaplain) on Nov 24, 2016 at 14:29 UTC
Thanks for your reply. Currently, a customizable 404 page is returned if (1) the URI doesn't match any `route` configured in the application, or (2) the handler calls `die 404;` (or its longer analog). Cookies and parameters have a signature like `$request->param( name => qr/.../ );`. Sorry for not explaining this in the question.

And yes, fixing `path_info()` into "untaint" style was my first thought. However, after trying it out I noticed that only a few paths in an actual application require `path_info`, and in those that don't I keep using boilerplate along the lines of `die 404 if $request->path_info(qr/./);`

Consider something like

    /questions
    /questions/tagged/\w+

I would like to get a 404 upon requesting `/questions/foobar` automatically, without having to specify anything in the handler. Also, if there's something like

    /history/\d{4}/\d{2}

the path is likely to be processed with a further regexp extracting specific values, so why not do it for the user at once and return the captured values?

That's why I'm thinking of going for a more convoluted API and deprecating `path_info()` altogether. Complex APIs are evil, but so is boilerplate code and unneeded repetition.
Re^2: Cool uses for path_info by Dallaylaen (Chaplain) on Nov 28, 2016 at 12:53 UTC
I ended up adding a `path_info_regex` parameter to the path handler definition. It untaints `path_info` for future use and results in a 404 if `path_info` doesn't match. The current behavior is deprecated and will be phased out in future versions (in fact, the regex will just get a default value equal to `^$`). What I originally came up with was clearly overengineered. Thanks again for the discussion!
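
A rough sketch of what such a check might look like, for readers following along. `checked_path_info` is a hypothetical name and not the actual MVC::Neaf code; the only things taken from the post are the regex parameter, the 404-on-mismatch rule, and the planned `^$` default.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MVC::Neaf.
# Returns the validated path_info, or undef if the request should get a 404.
sub checked_path_info {
    my ($raw, $regex) = @_;
    $regex = qr/^$/ unless defined $regex;    # default: no extra path allowed

    # A value taken from a capture group is normally untainted under -T
    # (unless "use re 'taint'" is in effect).
    return $raw =~ /\A($regex)\z/ ? $1 : undef;
}

my $page  = checked_path_info( 'Main_Page', qr/\w+/ );
my $empty = checked_path_info( '' );          # defined, but an empty string
my $stray = checked_path_info( 'foobar' );    # rejected by the default regex
```

Since an empty path_info is a valid (but false) value, callers of such a helper would need to test the result with `defined` rather than for truth.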
Re: Cool uses for path_info by RonW (Parson) on Nov 28, 2016 at 19:46 UTC
Seems to me that when a new route is added, the default should be that path_info exactly match the route. Otherwise, either supply the route as a regex, or supply an optional parameter which is a regex for matching acceptable additional path_info content. If either the route regex or the additional regex fails to match, then try the next route. Either way, path_info would be validated before it is available to any handler.
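
The "try the next route" idea can be sketched roughly as follows. The route table layout, the `extra` key and the `dispatch` function are all made up for illustration and do not reflect MVC::Neaf's API; a route without `extra` only accepts an exact match.

```perl
use strict;
use warnings;

# Hypothetical route table; the keys are illustrative only.
my @routes = (
    { prefix => '/questions/tagged', extra => qr/\w+/, name => 'tagged' },
    { prefix => '/questions',                          name => 'list'   },
);

# Returns ( route name, validated extra path ) or ( 404 ).
sub dispatch {
    my ($path) = @_;
    for my $route (@routes) {
        my $prefix = $route->{prefix};
        # The prefix must end at a path-segment boundary.
        next unless $path =~ m{\A\Q$prefix\E(?:/(.*))?\z}s;
        my $rest  = defined $1 ? $1 : '';
        my $extra = defined $route->{extra} ? $route->{extra} : qr/^$/;
        # On a mismatch, fall through to the next route instead of failing.
        next unless $rest =~ /\A(?:$extra)\z/;
        return ( $route->{name}, $rest );
    }
    return (404);
}

my @tagged = dispatch('/questions/tagged/perl');
my @list   = dispatch('/questions');
my @stray  = dispatch('/questions/foobar');
```

With this table, `/questions/foobar` matches neither route and ends up as a 404 without any code in the handlers.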
Re^2: Cool uses for path_info by Dallaylaen (Chaplain) on Nov 29, 2016 at 01:31 UTC
Maybe I wasn't clear enough: I meant `path_info()` to be the part of the path after the matched route (which I refer to as `script_name`, following the CGI specification more or less), not including the matched route. That said, I did it exactly as you suggested.