Re^2: use English; and performance

i'm trying to create a string tokenizer for a config file parser and the best that i've managed to think of is this:

#!/usr/bin/perl

use strict;
use Data::Dumper;

my $line = q[keyword1 value keyword2 "value with spaces" keyword3 value];

print Dumper tokenize_line($line);

sub tokenize_line {
    my $line = shift;

    my @tokens;
    while ($line =~ /(\S+)/g) {
        # every non-space match is a token
        push @tokens, $1;

        # anything in double-quotes is a single token
        if ($line =~ /\G\s*"(.+?)"/) {
            push @tokens, $1;
            # continue from this last match
            $line = $';
        }
    }

    return \@tokens;
}

which outputs this:

$VAR1 = [
          'keyword1',
          'value',
          'keyword2',
          'value with spaces',
          'keyword3',
          'value'
        ];

i know it's an ugly hack, trying to substitute the original string with the rest of the matched pattern ($line = $';), but in my previous attempts i would use split and substr to achieve the same results... and it was very ugly :)
what would be a better way to write this? i will be parsing some hundred lines from a config file, so i don't think i want a performance penalty. thank you all for your time and advice!

:)))))


Re^3: use English; and performance by Aristotle (Chancellor) on Mar 03, 2006 at 16:09 UTC
Use `/gc` in your speculative match. `/c` prevents `pos()` from being reset on match failure.

Makeshifts last the longest.
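A minimal sketch of how that advice might be applied to the tokenizer from the question, assuming the rest of the sub stays as posted: the only changes are the `/gc` flags on the quoted-string match and dropping the `$line = $'` reassignment. With `/c`, a failed speculative match leaves `pos()` where the outer `/g` loop put it, so the loop simply carries on from there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub tokenize_line {
    my ($line) = @_;

    my @tokens;
    while ($line =~ /(\S+)/g) {
        # every non-space match is a token
        push @tokens, $1;

        # speculative match: a double-quoted string right after this token.
        # /g makes a successful match advance pos($line);
        # /c keeps pos($line) unchanged when the match fails.
        if ($line =~ /\G\s*"(.+?)"/gc) {
            push @tokens, $1;
        }
    }

    return \@tokens;
}

print join('|', @{ tokenize_line(q[keyword1 value keyword2 "value with spaces" keyword3 value]) }), "\n";
```

Without `/c`, a failed inner match would reset `pos($line)` and the outer loop would start over from the beginning of the string, which is presumably what the `$'` reassignment was working around.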