in reply to Re: Regular Exp parsing
in thread Regular Exp parsing

Just for good marks and all: when a regex is meant to match a whole line, it is good form to anchor it with ^ (matches the beginning of a line) and $ (matches the end of a line) to speed processing. So your regex would change to:

/^(\S+) (\S+) (\S+) (\d+):$/;

Also, if you truly wanted $1 to be set, you could do so by just executing the regex provided by MarkM.

So instead of:

my($wday, $mon, $mday, $time, $year) = $var =~ /\A(\S+) (\S+) (\S+) (\d+):/;
it would just be

$var =~ /\A(\S+) (\S+) (\S+) (\d+):/;
and $1 would be set to the first captured group, $2 to the second, and so on.

These are read-only variables, so to modify the values you would have to copy them into a separate variable as MarkM has done. If, however, you just need to display or store the results, why generate additional variables?
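To make the $1/$2 point concrete, here is a minimal sketch. The sample string in $var is my own assumption (the original data is not shown in the thread), and the @seen array is just there for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the original poster's data is not shown in the thread.
my $var = 'Fri Dec 13 21:25:00 2002';

my @seen;
if ($var =~ /\A(\S+) (\S+) (\S+) (\d+):/) {
    # $1..$4 are read-only and only meaningful after a successful match.
    @seen = ($1, $2, $3, $4);
    print "weekday=$1 month=$2 day=$3 hour=$4\n";
}

# Copy a capture into an ordinary variable before changing it.
my $mon = $seen[1];
$mon = uc $mon;
print "month upper-cased: $mon\n";
```

Checking the result of the match before reading $1 matters: after a failed match the capture variables still hold whatever the last successful match left in them.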

If you are doing this over a large number of entries, you might also want to look into optimizing your regex with a lookahead assertion (I think that is the correct term), which lets the regex test a condition before attempting to match any further into the string/line/block, etc.

example

(?=[SMTWF]) warning: I may have the syntax off, so please test this before using it.
at the beginning of your regex should help to quickly skip those lines which do not start with one of the capital letters that begin the days of the week, nifty huh?
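For reference, Perl writes a positive lookahead as (?=...), while (?:...) is a non-capturing group that does consume text. Below is a rough sketch of the idea with made-up sample lines. Whether the lookahead actually saves time depends on the data and the Perl version, so measure it (for example with the core Benchmark module) before relying on it.

```perl
use strict;
use warnings;

# Hypothetical sample lines; only the first starts with a capital weekday letter.
my @lines = (
    'Fri Dec 13 21:25:00 2002',
    'fri Dec 13 21:25:00 2002',
    '# comment line',
);

# (?=[SMTWF]) is a zero-width lookahead: it checks the next character
# without consuming it, so (\S+) still captures the whole weekday.
my $re = qr/\A(?=[SMTWF])(\S+) (\S+) (\S+) (\d+):/;

my @matched;
for my $line (@lines) {
    push @matched, $1 if $line =~ $re;
}
print "weekdays: @matched\n";
```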

Just my 2 cents since I love regex.

Dave -- Saving the world one node at a time

Re: Re: Re: Regular Exp parsing
by MarkM (Curate) on Dec 13, 2002 at 21:25 UTC

    Zapowork: \A..\z is just as efficient as ^..$

    \A..\z should be used to anchor a pure string, whereas ^..$ should be used to anchor a line. For most cases, the difference is subtle enough that there is virtually no difference (this is why cookbook examples and a lot of existing code are able to get away with never using \A..\z). Still, it is proper to be accurate. If it is not expected or acceptable for a string to end with '\n', \z should be used instead of $.

    For example:

    if ($ARGV[0] =~ /^-o$/) { ... }

    Will match "-o" or "-o\n". For command line arguments, "-o\n" should not be allowed. The more accurate expression is:

    if ($ARGV[0] =~ /\A-o\z/) { ... }

    The reason I am so rigid about this point is that I have been hit by the difference in production code. I am now very strict about using \A..\z for strings and ^..$ for lines.
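A quick way to see the difference is to run both patterns against both strings. This is only a sketch: @args stands in for real command-line arguments, and the @dollar and @z arrays are illustrative names.

```perl
use strict;
use warnings;

# Test strings standing in for command-line arguments.
my @args = ("-o", "-o\n");

my (@dollar, @z);
for my $arg (@args) {
    # $ also matches just before a newline at the very end of the string.
    push @dollar, ($arg =~ /^-o$/   ? 1 : 0);
    # \z matches only at the true end of the string.
    push @z,      ($arg =~ /\A-o\z/ ? 1 : 0);
}
print "with ^..\$  : @dollar\n";
print "with \\A..\\z: @z\n";
```

The first pattern accepts both strings, while the second rejects the one with the trailing newline.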

      Hi Mark,

      That's great information. I didn't know that \n would not be literally matched when using $ as an anchor. I normally chomp all my strings before they get to that point, so I hadn't encountered it. Knowing this now, though, is there a reason why? Does $ assume EOL characters?

      BTW - Did you mean to put a \z in your initial example?

      Dave -- Saving the world one node at a time

        The $ thing is due to legacy behaviour, and the fact that when most people say $, they mean "end of string, or end of line, but not the end-of-line character itself." There is no question that $ is one of the most useful regexp primitives there is. It is just that people are very comfortable with it, so it sometimes gets used in places where it is questionable, or, very rarely, in places where problems can arise.
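To illustrate how $ behaves, here is a small sketch with a made-up two-line string. It compares $ with and without the /m modifier, and \Z with \z; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-line string with a trailing newline.
my $text = "one\ntwo\n";

# Without /m, $ matches only at the end of the string,
# or just before a newline that ends the string.
my @plain = $text =~ /(\w+)$/g;

# With /m, $ also matches just before every embedded newline.
my @multi = $text =~ /(\w+)$/mg;

# \Z behaves like $ without /m; \z insists on the true end of the string.
my $big_z   = $text =~ /two\Z/ ? 1 : 0;
my $small_z = $text =~ /two\z/ ? 1 : 0;

print "plain: @plain\n";
print "multi: @multi\n";
print "\\Z: $big_z  \\z: $small_z\n";
```

Only \z refuses to look past the trailing newline, which is why it is the safer anchor for strings that must not end in '\n'.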

        In my initial example, I used ':' instead of '\z', because the original example looked as if the year was trailed with a ':' and since I didn't know exactly what was after the ':', I figured it would be simpler to just not care, and align the regexp based on the ':'. In the original example, the ':' may have been a typo, in which case I probably would have used \z as you suggest.

        Cheers,
        mark