
When designing my latest web application thingy, I decided on rather unique (as far as I know) query string parsing requirements. Basically, each 'o' argument is associated with the values of all the 'd' arguments that immediately follow it, keyed by the value of the 'o' argument. That is, instead of this query string:

o=foo&d=baz&d=bar&o=one&d=uno&o=none&totallyrandom=foo

being parsed like so:

{
  'totallyrandom' => 'foo',
  'd' => [
           'baz',
           'bar',
           'uno'
         ],
  'o' => [
           'foo',
           'one',
           'none'
         ]
};

which is what CGI.pm and others do by default, I want it parsed as follows:

{
  'none' => undef,
  'one' => 'uno',
  'foo' => [
             'baz',
             'bar'
           ]
};

I realize these requirements are a little odd, but I feel they suit my application very well and will be easy to use. However, they require me to write my own query string parser, which I have done. I present it here in the hope that some monk can spot an error or something that's not optimal, so I can fix it now, before it's 'in production'. It's used like this: my $u = QueryParse->new; my %args = $u->handle( $ENV{QUERY_STRING} );


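package QueryParse;
use strict;
use warnings;

sub new
{
  my $class = shift;
  return bless {}, $class;    # two-argument bless so subclasses work
}

sub handle
{
  my $self = shift;
  my $query_string = shift;

  # No query string: fall back to the request body. The defined check
  # avoids a warning when $ENV{QUERY_STRING} is unset.
  if( !defined $query_string || $query_string eq '' )
  {
    local $/ = undef;
    $query_string = <STDIN>;
    $query_string = '' unless defined $query_string;
  }

  #o=foo & d=baz & d=qux & o=n & d=o & d=f
  my @pairs = split /[;&]/, $query_string;

  my %arg;
  for( my $i = 0; $i < @pairs; $i++ )
  {
    # Only an 'o' pair starts a group; everything else is skipped.
    next unless $pairs[ $i ] =~ /^[oO]=/;

    # A limit of 2 keeps the value whole even if it contains '='.
    my $o = ( split /=/, $pairs[ $i ], 2 )[ 1 ];
    my @opts;

    # Consume every 'd' pair that immediately follows this 'o'.
    while( $i + 1 < @pairs && $pairs[ $i + 1 ] =~ /^[dD]=/ )
    {
      $i++;
      push @opts, ( split /=/, $pairs[ $i ], 2 )[ 1 ];
    }

    # Decode the key and the values in place ($_ is aliased by for,
    # so the caller's $_ is not clobbered).
    for( $o, @opts )
    {
      tr/+/ /;                                  # '+' encodes a space
      s/%([0-9A-Fa-f]{2})/chr( hex( $1 ) )/eg;  # %XX hex escapes
    }

    # No values -> undef, one value -> scalar, several -> array ref.
    $arg{ $o } = @opts > 1 ? [ @opts ] : $opts[ 0 ];
  }

  return %arg;
}

1;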
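For comparison, here is a minimal single-pass sketch of the same grouping rule, assuming the output shape shown earlier (undef, scalar, or array ref). Instead of scanning ahead from each 'o', it remembers the most recent 'o' key and attaches each following 'd' to it; any other pair ends the group. The names parse_grouped and form_unescape are hypothetical and not part of the original module.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original module): decode one
# form-encoded component, '+' to space first, then %XX escapes.
sub form_unescape {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

# Hypothetical single-pass variant: track the current 'o' key and
# attach each following 'd' value to it. Any other pair ends the group.
sub parse_grouped {
    my ($query_string) = @_;
    my (%values, $current);

    for my $pair (split /[;&]/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;

        if (defined $value && lc $name eq 'o') {
            $current = form_unescape($value);
            $values{$current} = [];    # a repeated 'o' replaces its old group
        }
        elsif (defined $value && lc $name eq 'd' && defined $current) {
            push @{ $values{$current} }, form_unescape($value);
        }
        else {
            undef $current;            # unrelated or malformed pair: stop collecting
        }
    }

    # Collapse to the original's shape: no values -> undef,
    # one value -> scalar, several -> array ref.
    my %arg;
    for my $key (keys %values) {
        my @v = @{ $values{$key} };
        $arg{$key} = @v > 1 ? [@v] : $v[0];
    }
    return %arg;
}

my %arg = parse_grouped('o=foo&d=baz&d=bar&o=one&d=uno&o=none&totallyrandom=foo');
# %arg is now ( foo => ['baz', 'bar'], one => 'uno', none => undef )
```

Because there is no look-ahead index to manage, an 'o' with no 'd' pairs simply ends up with an empty list, which collapses to undef.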

Two notes. First, I'm deliberately ignoring options passed via POST if options are already being passed via GET. This lets me use options in the query string to indicate that, say, a file is being uploaded, and then use some other module to handle the upload itself.
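One way to make the GET-over-POST rule explicit is a small helper that picks the raw input before parsing. This is a sketch assuming a standard CGI environment; the name raw_query and its hash-ref/filehandle parameters are hypothetical, chosen so it can be tested without a web server. Unlike the original, it reads exactly CONTENT_LENGTH bytes rather than reading STDIN to end of file, which is what the CGI specification (RFC 3875) asks scripts to do.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original module): return the raw
# query data, preferring the GET query string and falling back to the
# request body only when the query string is empty.
# $env is a hash ref shaped like %ENV; $in is the handle holding the body.
sub raw_query {
    my ($env, $in) = @_;

    my $qs = $env->{QUERY_STRING};
    return $qs if defined $qs && length $qs;

    # Read exactly CONTENT_LENGTH bytes instead of slurping to EOF.
    my $len = $env->{CONTENT_LENGTH} || 0;
    return '' unless $len =~ /^\d+$/ && $len > 0;

    binmode $in;
    my $body = '';
    while (length($body) < $len) {
        my $got = read($in, $body, $len - length($body), length($body));
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # client sent less than it promised
    }
    return $body;
}

# In a real CGI script this would be called as raw_query(\%ENV, \*STDIN).
```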

Second, I deliberately inlined the regex for unescaping the hex codes rather than using the URI::Escape module, because its author mentions in the docs that uri_unescape does exactly the same thing as the regex, and that calling the function adds anywhere from a 40% to a 70% slowdown.
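If the speed difference matters, it is easy to check on your own perl with the core Benchmark module. This sketch compares the inline substitution against the same substitution wrapped in a sub; decode_sub is a hypothetical stand-in for a library call such as URI::Escape's uri_unescape, not that function itself, so it measures only the call overhead. Results depend on the perl version, the machine, and how many escapes the input contains, so no particular numbers are implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for a library unescape function: the same
# %XX substitution, but behind a sub call (and on a copy of the input).
sub decode_sub {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

# One input with escapes and one without, since the cost of the call
# is proportionally larger when there is nothing to decode.
my @samples = ('foo%20bar%2Bbaz', 'plainvalue');

# Both approaches must agree before timing them.
for my $in (@samples) {
    (my $inline = $in) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    die "mismatch for $in" unless $inline eq decode_sub($in);
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    inline => sub {
        for my $in (@samples) {
            (my $s = $in) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
        }
    },
    sub_call => sub {
        for my $in (@samples) {
            my $s = decode_sub($in);
        }
    },
});
```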

In reply to Reinventing wheels: query string parsing. by BUU
