OK, (mostly) as an intellectual exercise, I'm writing a simple tokenizer in Perl using regexes. Never mind that this might be better done by reading the input linearly and using some RPN or something like that; I'm using regexes (mainly because I'm rubbish at using 'em).
Now, the problem. To split the input line into quoted and unquoted parts, I use the following regex:
Next, wanting this thing to accept both `"` and `'` as quote characters, I tried this:
This, however, has the unfortunate side effect of producing a lot of empty matches and, for some reason, returning the closing quote twice (the second time as a single character all by itself).
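The regexes themselves aren't shown, so this is only a guess at the cause. If the two-quote version captures the quote character in its own group so that a backreference can match the closing quote (something like `split /((["']).*?\2)/, $line`), then `split` returns every capture group as an extra field. For `q{say "hi" x}` that hypothetical pattern gives `('say ', '"hi"', '"', ' x')`, so the quote character turns up again on its own. `split` also emits an empty field wherever a quoted run starts the line or two runs are adjacent. One way around both problems is to drop `split` and walk the string with `\G` and the `/gc` modifiers, as in the sketch below. The `tokenize` function and its `[type => text]` token format are assumptions for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical tokenizer sketch: walk the line with \G and /gc so each
# character is consumed exactly once. No split means no empty fields and
# no extra fields from capture groups.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    pos($line) = 0;    # start matching at offset 0 (avoids undef pos)
    while (pos($line) < length $line) {
        if ($line =~ /\G(["'])(.*?)\1/gcs) {
            # Quoted run: group 1 is the opening quote, \1 requires the
            # same character to close it, group 2 is the contents.
            push @tokens, [ quoted => $2 ];
        }
        elsif ($line =~ /\G([^"']+)/gc) {
            # Unquoted run: everything up to the next quote character.
            push @tokens, [ plain => $1 ];
        }
        else {
            # A quote with no matching close; /gc left pos() unchanged.
            die "Unterminated quote at offset " . pos($line) . "\n";
        }
    }
    return @tokens;
}

for my $tok (tokenize(q{say "hi there" 'x y' z})) {
    printf "%-6s [%s]\n", @$tok;
}
```

Because the closing quote must be the same character as the opening one, `"it's"` comes back as a single quoted token. Unbalanced quotes make `tokenize` die rather than being silently swallowed, and whitespace between quoted runs is kept as `plain` tokens; filter with `grep { $_->[1] =~ /\S/ }` if you don't want those.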