Dallaylaen has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear esteemed monks,

My toy web framework's documentation explicitly states that it should only accept validated data from the remote user, the natural and basic validation method being, of course, a regular expression match.

However, the path_info parameter is for now accepted as-is, which was one of my early design mistakes. An application is divided into paths (much like in Dancer); anything in the URI following the matching part is considered additional input. So the current usage is:

MVC::Neaf->route( "/foo/bar" => sub {
    my $request = shift;
    $request->param( name => qr/\w+/ ); # undef unless name is 1+ word characters
    $request->path_info();              # oops, user input slips through
} );

Here any URI that doesn't match any of the configured routes would return a customizable 404 Not Found page. So would a handler that calls die 404; or $request->error(404, %params); at some point.

Now I would like to correct this mistake by adding path_components => qr/.../ parameter to the handler definition and path_components() method to the request object that would return path_info itself, followed by capture groups $1, $2 ... in the validation regexp (if any). If the regexp doesn't match (or wasn't specified), the application would just show a 404 page.

MVC::Neaf->route( "/foo/bar" => sub {
    my $request = shift;
    $request->path_components->[0]; # 1+ digits guaranteed
}, path_components => qr/\d+/ );

This way only the parts of application that actually need path_info (wiki pages, /calendar/YYYY/MM/DD etc) would get it, while the others would just reply with 404 unless called correctly.
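To make the proposed return value concrete, here is a minimal sketch of the matching step in isolation. The helper name match_path_components is hypothetical and is not part of MVC::Neaf; anchoring the pattern to the whole path_info is also an assumption, since the post does not say whether partial matches count.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MVC::Neaf: validate path_info against
# a regexp and return [ path_info, $1, $2, ... ], or undef on mismatch.
sub match_path_components {
    my ($path_info, $regex) = @_;

    # No regexp configured: no path_info is acceptable, caller shows a 404.
    return undef unless defined $regex;

    # Assumption: the whole path_info must match, not just a substring.
    my @captures = $path_info =~ /\A(?:$regex)\z/
        or return undef;

    # A match without capture groups yields (1) in list context, so drop it.
    # @+ holds one entry for the whole match plus one per group.
    my $group_count = scalar(@+) - 1;
    @captures = () unless $group_count;

    return [ $path_info, @captures ];
}

my $ok  = match_path_components( '2016/11', qr{(\d{4})/(\d{2})} );
my $bad = match_path_components( '2016/xx', qr{(\d{4})/(\d{2})} );
print join( ',', @$ok ), "\n";              # 2016/11,2016,11
print defined $bad ? "match\n" : "404\n";   # 404
```

A handler for something like /calendar/YYYY/MM would then read the year and month from elements 1 and 2 without running a second regexp.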

So my questions here are:

1) Does this scheme seem reasonable?

2) What would be a better name for path_components? It's too long and clumsy, but I'll take it if I can't come up with something better.

Re: Cool uses for path_info
by shmem (Chancellor) on Nov 23, 2016 at 19:54 UTC

    I would just fix path_info() so that it untaints its value, and be done with it. After all, user input may slip through if it is valid and doesn't do any harm. I haven't looked through the entire module, so I can only guess that requests fail elsewhere with a 404 if path_info() doesn't provide anything useful for a component relying on it.

    But then, you probably do untaint both the environment and user input as early as possible, don't you? If not, you should have a very good reason.

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

      Thanks for your reply.

      Currently, a customizable 404 page is returned if (1) the URI doesn't match any route configured in the application, or (2) the handler calls die 404; (or its longer analog). Cookies and parameters have a signature like $request->param( name => qr/.../ );. Sorry for not explaining this in the question.

      And yes, changing path_info() to an "untaint" style was my first thought. However, after trying it out I noticed that only a few paths in an actual application require path_info, and in those that don't I keep writing boilerplate along the lines of

      die 404 if $request->path_info(qr/.*/);

      Consider something like

      /questions
      /questions/tagged/\w+
      

      I would like to get a 404 upon requesting /questions/foobar automatically, without having to specify anything in the handler.

      Also if there's something like

      /history/\d{4}/\d{2}

      the path is likely to be processed by a further regexp extracting specific values, so why not do it for the user at once and return the captured values?

      That's why I'm thinking of going for a more convoluted API and deprecating path_info() altogether. Complex APIs are evil, but so is boilerplate code and unneeded repetition.

      I ended up adding a path_info_regex parameter to the path handler definition. It untaints path_info for future use and results in a 404 if it doesn't match. The current behavior is deprecated and will be phased out in future versions (in fact, the regex will just get a default value equal to ^$). What I originally came up with was clearly overengineered. Thanks again for the discussion!
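The path_info_regex behavior described in the reply above can be sketched as follows. This is only an illustration under assumptions: validate_path_info is a hypothetical stand-in, the real MVC::Neaf internals may differ, and dying with the text "404" merely mimics the die 404; convention mentioned earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the framework's check, not MVC::Neaf code.
# Returns the validated path_info or dies with 404.
sub validate_path_info {
    my ($path_info, $regex) = @_;

    # Planned default from the post: only an empty path_info is accepted.
    $regex = qr/^$/ unless defined $regex;

    # Returning the captured text (not the original variable) is what
    # untaints the value when running under -T.
    $path_info =~ /\A($regex)\z/
        or die "404\n";

    return $1;
}

print validate_path_info( 'perl', qr/\w+/ ), "\n";   # perl

# No regex given, so a non-empty path_info is rejected.
my $result = eval { validate_path_info('foobar') };
print defined $result ? "ok\n" : "404\n";            # 404
```

With the default in place, a route that never mentions path_info rejects /questions/foobar on its own, which removes the per-handler boilerplate.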
Re: Cool uses for path_info
by RonW (Parson) on Nov 28, 2016 at 19:46 UTC

    Seems to me that when a new route is added, the default should be that the path must exactly match the route. Otherwise, either supply the route as a regex, or supply an optional parameter which is a regex for matching acceptable additional path_info content. If either the route regex or the additional regex fails to match, then try the next route.

    Either way, path_info would be validated before it was available to any handler.
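The dispatch order suggested in this reply might look roughly like the sketch below. The route table layout and the find_route function are hypothetical and only illustrate the "validate, else try the next route" idea; they are not taken from MVC::Neaf.

```perl
use strict;
use warnings;

# Hypothetical route table: [ prefix, optional regex for the remainder ].
my @routes = (
    [ '/questions/tagged', qr/\w+/ ],
    [ '/questions' ],   # no regex: the path must match the route exactly
);

# Returns ( prefix, path_info ) for the first acceptable route,
# or an empty list if none accepts the path (the caller shows a 404).
sub find_route {
    my ($path) = @_;
    for my $route (@routes) {
        my ($prefix, $extra) = @$route;

        # The prefix must be followed by end of path or by a slash.
        next unless $path =~ m{\A\Q$prefix\E(?:/(.*))?\z}s;
        my $rest = defined $1 ? $1 : '';

        # Without a regex, only an empty remainder is acceptable.
        $extra = qr/^$/ unless defined $extra;
        return ( $prefix, $rest ) if $rest =~ /\A(?:$extra)\z/;

        # Remainder rejected: fall through and try the next route.
    }
    return;
}

for my $path ( '/questions', '/questions/tagged/perl', '/questions/foobar' ) {
    my @hit = find_route($path);
    print "$path => ", ( @hit ? "route $hit[0], path_info '$hit[1]'" : '404' ), "\n";
}
```

Because the check happens during dispatch, a handler only ever sees a path_info that its own route has already accepted.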

      Maybe I wasn't clear enough: I meant path_info() to be the part of the path after the matched route, not including the matched route itself (which I refer to as script_name, more or less following the CGI specification).

      That said, I did it exactly as you suggested.