I've used regex branches instead of your for loop, and moved the matching into the while condition to eliminate the explicit loop control and to avoid the repeated zero-length matches. I've replaced the AoA with named captures.

    use strict;
    use warnings;

    $_ = q(
    \bib{ref0}{article}{
      author={Y. Bartal},
      volume={37},
      pages={184},
      date={1996},
      issn={0272-5428},
    }
    );

    my @tokfd;

    # One alternation branch per token type; the name of the capture
    # that matched tells us which kind of token we found.
    my $tokre = qr{
        (?<bib>     \\bib(?![A-Za-z])                 )
      | (?<text>    (?s: \\(?:[A-Za-z]+|.) )          )
      | (?<comment> \%.*\n\s*                         )
      | (?<equal>   \=                                )
      | (?<begin>   \{                                )
      | (?<end>     \}                                )
      | (?<space>   \s+                               )
      | (?<word>    [A-Za-z0-9_\-\.]+                 )
      | (?<text>    [^\\\%\=\{\}\sA-Za-z0-9_\-\.]     )
    }x;

    # %+ holds only the named capture that matched, so this pushes
    # one [ name, text ] pair per token.
    push @tokfd, [ keys %+, values %+ ] while /\G$tokre/gc;

    # The last successful match must end at the end of the input.
    die "internal error: amsref reader tokenizer cannot match input line: +($_) at " . pos($_)
        if ( $+[0] != length );

    for my $t (@tokfd) {
        my ( $i, $c ) = @$t;
        $c =~ s/\n/\\n/g;    # show newlines as a literal \n
        printf qq(%-8s "%s"\n), $i, $c;
    }
As far as I can tell it produces the same output as yours, but I think it's a little more concise. It is also easy to see in the output when you accidentally make a branch matching the null string.
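To illustrate that last point, here is a minimal sketch with a deliberately broken branch. The two-branch regex and the tokenize helper are made up for this example and are not part of the code above; the <space> branch uses \s* instead of \s+, so it can match the null string.

```perl
use strict;
use warnings;

# Hypothetical two-branch tokenizer, for illustration only.
# The <space> branch is deliberately wrong: \s* can match nothing.
my $tokre = qr{
    (?<word>  [A-Za-z]+ )
  | (?<space> \s*       )    # should be \s+
}x;

# Returns a list of [ name, text ] pairs, built the same way as above.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    push @tokens, [ keys %+, values %+ ] while $input =~ /\G$tokre/gc;
    return @tokens;
}

# "1" matches neither branch properly, so the broken <space> branch
# matches the empty string in front of it and tokenizing stops there.
for my $t ( tokenize("ab1") ) {
    my ( $name, $text ) = @$t;
    printf qq(%-8s "%s"\n), $name, $text;
}
```

The word token is followed by a space token whose text is "", which is the telltale sign. Perl does not allow a second zero-length match at the same position, so the loop ends instead of spinning forever.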
I hope it is to your liking.

In reply to "Re: The story of a strange line of code: pos($_) = pos($_);" by rubasov, in thread "The story of a strange line of code: pos($_) = pos($_);" by ambrus