Re^3: Perl Regex

If you want to build an expression parser, you first need to split the input into tokens. If you really, really want to write your tokenizer as a single regular expression, the following might help you:

#!perl -w
use strict;
use Data::Dumper;

while (<DATA>) {
    my @tokens= m!\s*
                  (
                   \d+    # any sequence of digits
                  |[-+x/] # or an operator
                  )
                  \s*
                 !sxg;
    print Dumper \@tokens;
}

__DATA__
--1
300+400x500
3 + 4 x 5

Note that this tokenizer does not distinguish signs from operators, and it does not check syntactic or semantic correctness; characters it does not recognize are silently skipped. If you introduce parentheses, you will have to check that they match in the actual parser. You will also have to deal with unary minus and plus, as in "-1" and "+1", which are somewhat different from binary expressions like "2 - 1" and "2 + 1".
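One way to tell a sign from a binary operator is to remember whether the previous token was an operator. What follows is only a rough sketch of that idea, not a full parser: the sub name tokenize and the token labels NUM, UNARY and BINOP are made up for illustration, and it assumes a "-" or "+" is unary whenever it starts the expression or follows another operator.

```perl
use strict;
use warnings;

# Hypothetical helper: split an expression into [type, text] pairs.
# Assumption: a sign is unary at the start of the expression or
# directly after another operator; everywhere else it is binary.
sub tokenize {
    my ($expr) = @_;
    my @tokens;
    my $expect_operand = 1;    # true at the start and after every operator

    for ($expr) {              # alias $_ so that \G and pos() track $expr
        while (1) {
            /\G\s+/gc and next;    # skip whitespace
            /\G\z/gc  and last;    # end of input

            if (/\G(\d+)/gc) {
                push @tokens, [ NUM => $1 ];
                $expect_operand = 0;
            }
            elsif (/\G([-+x\/])/gc) {
                my $op   = $1;
                my $type = ($expect_operand && $op =~ /[-+]/) ? 'UNARY' : 'BINOP';
                push @tokens, [ $type => $op ];
                $expect_operand = 1;
            }
            else {
                # Unlike the single-regex version, refuse unknown input.
                /\G(.)/gcs;
                die "Unexpected character '$1' in '$expr'\n";
            }
        }
    }
    return @tokens;
}

# Usage: print one "TYPE text" line per token.
print "$_->[0] $_->[1]\n" for tokenize("2 - -1");
```

The /gc flags keep the match position after a failed attempt, so each alternative is tried at the same spot. Checking that the resulting token sequence makes sense (for example, that a BINOP has operands on both sides) is still the parser's job.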

Update: I now realize that you're trying to tackle this problem in a larger context. I recommend against trying to use one regular expression for the whole task. First extract the formula, then split up the formula into its terms. Do not try to merge the capturing of an unknown number of elements together with the capturing of a known number of elements.
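A minimal sketch of the two-step approach, assuming a line format of "name = formula". That format is my guess for illustration only, and the sub name parse_line is made up; adjust the first regex to whatever your real input looks like.

```perl
use strict;
use warnings;

# Hypothetical input format: "<name> = <formula>". Adjust the first
# regex to fit the real data.
sub parse_line {
    my ($line) = @_;

    # Step 1: a known number of captures (the name and the whole formula).
    my ($name, $formula) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/
        or return;

    # Step 2: an unknown number of terms, collected with /g in list context.
    my @terms = $formula =~ /(\d+|[-+x\/])/g;

    return ($name, @terms);
}

# Usage: the first element is the name, the rest are the terms.
my ($name, @terms) = parse_line("total = 300+400x500\n");
print "$name: @terms\n";
```

Each regex now has exactly one job, so the fixed captures of step 1 never get mixed up with the repeated captures of step 2.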

Re^4: Perl Regex by Buddyhelp (Initiate) on Oct 10, 2013 at 11:43 UTC
Thank you very much for the input, but it seems a little complicated for me to put in. If you look at the code I posted, I am able to get all the values except for words like get, come and go. Can you please tell me why that is, and how I can modify the existing regex to retrieve get, come and go as well? Thanks,