Regular Exp parsing

Cupojava has asked for the wisdom of the Perl Monks concerning the following question:

Re: Regular Exp parsing by tadman (Prior) on Dec 13, 2002 at 18:14 UTC
Since your data is delimited by spaces, you can really simplify this using `split`:
`chop($date_string); # Remove colon`
`my @date = split(' ', $date_string);`
Now `$date[0]` is `'Fri'`, `$date[1]` is `'Nov'` and so forth. If you're wary of using `chop`, you can always use `substr` instead.
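A minimal sketch of the `split` approach above, assuming the sample string quoted elsewhere in this thread with the single-digit day padded by an extra space (the padding is an assumption, based on Cupojava's later remark). It swaps `chop` for a substitution that removes the colon only when one is present; that is a substitution made for this sketch, not tadman's code.

```perl
use strict;
use warnings;

# Assumed sample input: single-digit day padded with an extra space.
my $date_string = "Fri Nov  8 15:00:02 2002:";

# Remove the trailing colon only if it is actually there.
$date_string =~ s/:\z//;

# split with the string ' ' (not a regex) splits on runs of whitespace
# and skips leading whitespace, so the padded day causes no empty fields.
my @date = split ' ', $date_string;

print join('|', @date), "\n";   # prints Fri|Nov|8|15:00:02|2002
```

Because `' '` is the special whitespace mode of `split`, the same call works whether the day is one digit or two.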
Re: Regular Exp parsing by MarkM (Curate) on Dec 13, 2002 at 17:42 UTC
The easiest way to parse a string such as this is: `my($wday, $mon, $mday, $time, $year) = $var =~ /\A(\S+)\s+(\S+)\s+(\S+)\s+(\d+):/;` Broken down: it scans `$var` for a regexp that matches a specific pattern, returning the captured groups (the things in '()') as a list. If this is done in a loop, tag 'or next' on the end to make it skip strings that do not match. NOTE: Updated in response to Cupojava's observation.
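A hedged sketch of the list-context match with `or next` in a loop. Note that the pattern as posted has four capture groups for five variables, so `$year` would be left undefined; the version below assumes the time and year are separate whitespace-delimited fields and adds a fifth group. The sample lines and the `@parsed` array are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative input; the middle entry should be skipped.
my @lines = (
    "Fri Nov  8 15:00:02 2002:",
    "not a timestamp",
    "Sat Nov 16 09:30:00 2002:",
);

my @parsed;
for my $var (@lines) {
    # In list context the match returns the captures; an empty list
    # (no match) makes the assignment false, so 'or next' skips the line.
    my ($wday, $mon, $mday, $time, $year) =
        $var =~ /\A(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+):/
        or next;
    push @parsed, "$year-$mon-$mday $time ($wday)";
}

print "$_\n" for @parsed;
```

Using `\s+` between fields means the single-digit and double-digit day cases are both handled.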
Re: Re: Regular Exp parsing by Zapawork (Scribe) on Dec 13, 2002 at 18:58 UTC
Just for good marks and all: when you are dealing with a regex that will match a whole line, it is good form to use ^ (matches the beginning of a line) and $ (matches the end of a line) to speed processing. So your regex would change to: `/^\A(\S+) (\S+) (\S+) (\d+):$/;`
Also, if you truly wanted $1 to be set, you could do so by just executing the regex statement provided by MarkM. So instead of: `my($wday, $mon, $mday, $time, $year) = $var =~ /\A(\S+) (\S+) (\S+) (\d+):/;` it would just be `$var =~ /\A(\S+) (\S+) (\S+) (\d+):/;` and $1 would be set to the first capture, $2 to the second, and so on. These are read-only variables, so to modify them you would have to assign them to a separate variable as MarkM has done. If, however, you just need to display or store the results, why generate additional variables?
If you are doing this over a large number of entries, you also might want to look into optimizing your statements using lookaheads (I think that is the correct term), which allow the regex to set a qualifier before attempting to match any further into the string/line/block, etc. Example: `(?:[SMTWF])` (warning: I know my syntax is off, so please don't use this) at the beginning of your regex should help to quickly skip those lines which do not start with a capital letter from the days of the week. Nifty, huh?
Just my .02 cents since I love regex.
Dave -- Saving the world one node at a time
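For reference, a small sketch of what a real lookahead looks like and of reading `$1`, `$2`, ... directly. It assumes the sample string from this thread; `(?=[SMTWF])` is the zero-width lookahead form, whereas `(?:...)` is only a non-capturing group. No speed-up is claimed or measured here, and the `@fields` array is just an illustrative name.

```perl
use strict;
use warnings;

my $var = "Fri Nov  8 15:00:02 2002:";
my @fields;

# (?=[SMTWF]) checks the next character without consuming it,
# so (\S+) still captures the whole weekday name.
if ($var =~ /\A(?=[SMTWF])(\S+)\s+(\S+)\s+(\d+)\s+(\d+):/) {
    # $1..$4 are only meaningful after a successful match, so read
    # them inside the if block and copy them if they must outlive it.
    @fields = ($1, $2, $3, $4);
    print "weekday=$1 month=$2 day=$3 hour=$4\n";
}
```

Testing the match before touching the capture variables avoids picking up stale values left over from an earlier successful match.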
Re: Re: Re: Regular Exp parsing by MarkM (Curate) on Dec 13, 2002 at 21:25 UTC
Zapawork: \A..\z is just as efficient as ^..$. \A..\z should be used to anchor a pure string, whereas ^..$ should be used to anchor a line. For most cases, the difference is subtle enough that, virtually, there is no difference (this is why cookbook examples and a lot of existing code are able to get away with never using \A..\z). Still, it is proper to be accurate.
If it is not expected or acceptable for a string to end with '\n', \z should be used instead of $. For example: `if ($ARGV[0] =~ /^-o$/) { ... }` will match "-o" or "-o\n". For command line arguments, "-o\n" should not be allowed. The more accurate expression is: `if ($ARGV[0] =~ /\A-o\z/) { ... }`
The reason I am so rigid about this point is that I have been hit by the difference in production code. I am now very strict about using \A..\z for strings and ^..$ for lines.
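The difference described above can be checked with a short sketch. The two helper subs are hypothetical names introduced only for this illustration; the patterns are the ones from the post.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub matches_line_anchors   { return $_[0] =~ /^-o$/   ? 1 : 0 }
sub matches_string_anchors { return $_[0] =~ /\A-o\z/ ? 1 : 0 }

for my $arg ("-o", "-o\n", "-ox") {
    (my $shown = $arg) =~ s/\n/\\n/g;   # make the newline visible when printing
    printf "%-6s ^..\$ => %d, \\A..\\z => %d\n",
        "\"$shown\"",
        matches_line_anchors($arg),
        matches_string_anchors($arg);
}
```

Only the "-o\n" case differs: without the /m modifier, `$` also matches just before a trailing newline, while `\z` matches only at the very end of the string.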
Re: Re: Re: Re: Regular Exp parsing by Zapawork (Scribe) on Dec 13, 2002 at 21:44 UTC
Re: Re: Re: Re: Re: Regular Exp parsing by MarkM (Curate) on Dec 13, 2002 at 21:49 UTC
Re: Re: Regular Exp parsing by Cupojava (Novice) on Dec 13, 2002 at 21:05 UTC
I see what you are doing, but it isn't going to work because the mday field is sometimes a single digit and sometimes two digits, so the number of spaces between the fields differs.
Re: Regular Exp parsing by senik148 (Beadle) on Dec 13, 2002 at 18:43 UTC
or this.. `$string = "Fri Nov 8 15:00:02 2002:"; my ($1, $2, $3, $4, $5) = split(/ /, $string);`
Re: Re: Regular Exp parsing by MarkM (Curate) on Dec 13, 2002 at 18:54 UTC
You can't use $1, $2, ... in my(). It would be far better to avoid using $1, $2, ... altogether and use properly named variables that do not have any scoping issues.
Re: Re: Regular Exp parsing by Cabrion (Friar) on Dec 14, 2002 at 14:13 UTC
or this.. `$string = "Fri Nov 8 15:00:02 2002:"; my ($1, $2, $3, $4, $5) = split(/ +/, $string);` Noting that the + allows for one or more spaces.
Re: Re: Re: Regular Exp parsing by Cabrion (Friar) on Dec 14, 2002 at 14:16 UTC
Oops, corrected: the use of $1, $2, etc. wouldn't work, as pointed out above, since they are "magic" variables used by regexps. `$string = "Fri Nov 8 15:00:02 2002:"; my ($a, $b, $c, $d, $e) = split(/ +/, $string);` Noting that the + allows for one or more spaces.
Re: Re: Re: Re: Regular Exp parsing by hardburn (Abbot) on Dec 16, 2002 at 15:24 UTC