I have a question that's making me feel like a newbie, but I'm stumped and now seeking wisdom.

I'm trying to transform a lot of text using a stream of tokens. To do so, I tokenize a string with a single regular expression, built by concatenating smaller regular expressions, and repeatedly strip the first (next) token from the string and pass it to a handler. Here is a simplified version of what I mean:

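    my $regex = qr/\d+|\w+|\s+|./s;   # \d+ before \w+ (else it never wins); '.' always eats one char
    my $text  = 'The world is foo 2!';
    while ($text =~ s/^($regex)//) { print "token: $1\n"; }   # qr// keeps ^ on every alternative; () sets $1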

What I'm stumped on is the *best way* to determine what part of the regular expression the current token matched -- thereby telling me the type of token and which handler I should pass it to.
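A minimal sketch of one common answer, assuming Perl 5.10 or later: wrap each smaller regex in its own named capture group before concatenating them. After a successful match, %+ contains only the named groups that actually matched, so with one group per alternative its single key is the token type, which can then index a dispatch table of handlers. The type names, @token_types, %handler and tokenize() below are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical token table: [type name, sub-pattern], tried in this order.
# \d+ must come before \w+, otherwise digits would be matched as words.
my @token_types = (
    [ num   => qr/\d+/ ],
    [ word  => qr/\w+/ ],
    [ space => qr/\s+/ ],
    [ other => qr/./s  ],
);

# Concatenate the smaller regexes into one alternation, wrapping each
# in a named capture group whose name is the token type.
my $alt   = join '|', map { sprintf '(?<%s>%s)', @$_ } @token_types;
my $token = qr/$alt/;

# Hypothetical handlers, one per token type.
my %handler = (
    num   => sub { "NUM($_[0])" },
    word  => sub { "WORD($_[0])" },
    space => sub { "SPACE" },
    other => sub { "OTHER($_[0])" },
);

sub tokenize {
    my ($text) = @_;
    my @out;
    while ($text =~ s/^$token//) {
        # %+ holds only the named groups that matched; with one group per
        # alternative, its single key is the type of this token.
        my ($type) = keys %+;
        push @out, $handler{$type}->($+{$type});
    }
    return @out;
}

print "$_\n" for tokenize('The world is foo 2!');
```

Order in @token_types matters: alternation tries the alternatives left to right, so a more specific pattern (\d+) has to come before a more general one (\w+) that would also match it.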

I'm trying to avoid using (?{ }) to set a key I can use to call the handler; I've had my share of issues with scripts that used that construct. Doing a second match to classify the token also seems rather inefficient. I'm hoping there is another solution that I'm just not grasping. Can anyone offer any suggestions that may lift the veil of ignorance from my eyes?
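If named captures aren't an option, a similar sketch works with plain numbered groups: put each alternative in its own capturing group and read $#-, which perlvar documents as the index of the last matched subgroup, while $+ holds the text matched by the highest used capture group. This version also swaps the s/^...// loop for a \G ... /gc scanner, which walks the string with pos() instead of deleting each token from the front. @types and tokenize() are hypothetical names, and the mapping assumes @types lists the types in the same order as the groups.

```perl
use strict;
use warnings;

# Hypothetical type names, in the same order as the capture groups below
# (group 1 => 'num', group 2 => 'word', ...).
my @types = qw(num word space other);

# One capturing group per alternative; \d+ precedes \w+ so numbers
# are not swallowed by the word pattern.
my $token = qr/(\d+)|(\w+)|(\s+)|(.)/s;

sub tokenize {
    my ($text) = @_;
    my @out;
    # \G anchors each match at pos($text); /g advances pos() and /c keeps
    # it when the final attempt fails at the end of the string.
    while ($text =~ /\G$token/gc) {
        # Only one alternative can match, so the last matched group ($#-)
        # is the one that matched, and $+ is its text.
        my $group  = $#-;
        my $lexeme = $+;
        push @out, [ $types[$group - 1], $lexeme ];
    }
    return @out;
}

for my $t (tokenize('The world is foo 2!')) {
    printf "%-5s '%s'\n", @$t;
}
```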


In reply to Determining what part of a regex matched. by Anonymous Monk
