For the sake of TIMTOWTDI I've tried to rewrite your code a little, by moving much of your explicit looping logic into the regex, letting the regex engine do the dirty work. Here it is:

use strict;
use warnings;

$_ = q(
\bib{ref0}{article}{
        author={Y. Bartal},
        volume={37},
        pages={184},
        date={1996},
        issn={0272-5428},
}
);

my @tokfd;
my $tokre = qr{
    (?<bib>     \\bib(?![A-Za-z])             )
  | (?<text>    (?s: \\(?:[A-Za-z]+|.) )      )
  | (?<comment> \%.*\n\s*                     )
  | (?<equal>   \=                            )
  | (?<begin>   \{                            )
  | (?<end>     \}                            )
  | (?<space>   \s+                           )
  | (?<word>    [A-Za-z0-9_\-\.]+             )
  | (?<text>    [^\\\%\=\{\}\sA-Za-z0-9_\-\.] )
}x;

push @tokfd, [ keys %+, values %+ ] while /\G$tokre/gc;
die "internal error: amsref reader tokenizer cannot match input line: ($_) at " . pos($_)
  if ( $+[0] != length );

for my $t (@tokfd) {
  my ( $i, $c ) = @$t;
  $c =~ s/\n/\\n/g;
  printf qq(%-8s "%s"\n), $i, $c;
}

I've used alternation branches in a single regex instead of your for loop, and moved the match into the while condition, which eliminates the explicit loop control and avoids the repeated zero-length matches. Named captures replace the AoA.
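
To make the mechanism concrete, here is a minimal sketch of the same pattern on a toy input. The variable names ($input, $toy_re, @tokens) and the reduced set of branches are my own, chosen only for illustration; they are not taken from your code.

```perl
use strict;
use warnings;

# Hypothetical toy input; not from the original tokenizer.
my $input = 'pages={184}';

# One alternation, one named capture per branch.
my $toy_re = qr{
    (?<word>  [A-Za-z0-9]+ )
  | (?<equal> =            )
  | (?<begin> \{           )
  | (?<end>   \}           )
}x;

my @tokens;

# \G anchors every attempt at the current pos(); /g advances pos()
# after each success, and /c keeps pos() when the match finally fails.
while ( $input =~ /\G$toy_re/gc ) {
    my ($name) = keys %+;    # only the branch that matched is in %+
    push @tokens, [ $name, $+{$name} ];
}

# If the loop stopped early, no branch could match at pos().
die "cannot tokenize at offset " . ( pos($input) // 0 ) . "\n"
  if ( pos($input) // 0 ) != length $input;

printf qq(%-8s "%s"\n), @$_ for @tokens;
```

The loop body only records the token; all of the position bookkeeping is left to \G and the /gc flags.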

As far as I can tell it produces the same output as yours, but I think it's a little more concise. It is also easy to see in the output when you accidentally make a branch matching the null string.
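
A small sketch of what I mean by that last point. The branch name oops and the input are hypothetical; the oops branch is deliberately written so that it can match zero characters.

```perl
use strict;
use warnings;

# Hypothetical input and branch names, for illustration only.
my $text = 'ab';

my $bad_re = qr{
    (?<oops>   x*    )    # mistake: matches the null string when no 'x' is present
  | (?<letter> [a-z] )
}x;

my @seen;
while ( $text =~ /\G$bad_re/gc ) {
    my ($name) = keys %+;
    push @seen, [ $name, $+{$name} ];
}

# Each empty token shows up as a line with "" in the output.
printf qq(%-8s "%s"\n), @$_ for @seen;
```

As I understand the rule described in perlre (a match directly after a zero-length match may not itself be zero-length), the loop still terminates, and the output alternates oops "" lines with the real letter tokens, so the faulty branch stands out immediately.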

I hope it is to your liking.

In reply to Re: The story of a strange line of code: pos($_) = pos($_); by rubasov
in thread The story of a strange line of code: pos($_) = pos($_); by ambrus
