Regular expression

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Regular expression by Masem (Monsignor) on May 01, 2001 at 17:30 UTC
Make it easier on yourself and use split to grab each item, since you already have the spaces there: `my ( $id, $time, $date, $col1, $col2, $col3, $col4 ) = split / /, $line;` Then you can concentrate on any other checks you want to do to make sure the number is valid. Note that in your expression to get the E number, you are using '.', which you need to escape if you want to match a decimal point; otherwise it will match any character. Try something like `\d\.\dE[-+]\d\d` (particularly if this is coming from Fortran or C output). Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
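A minimal sketch of the split-then-validate approach, assuming a hypothetical seven-field record whose last field is the exponent value (the real data layout is not shown in the thread). It swaps `split / /` for `split ' '`, which splits on runs of whitespace, and uses `\d+` after the escaped point so that a multi-digit mantissa such as 1.509667E+03 is accepted.

```perl
use strict;
use warnings;

# Hypothetical sample record: seven whitespace-separated fields, the last
# one in Fortran/C-style exponent notation (layout is an assumption).
my $line = '1042 17:30:00 01-May-2001 12 7 3 1.509667E+03';

# split ' ' splits on runs of whitespace, so repeated spaces do not
# produce empty fields the way split / / would.
my ( $id, $time, $date, $col1, $col2, $col3, $col4 ) = split ' ', $line;

# \. matches a literal decimal point (a bare . would match any character);
# \d+ after it allows more than one mantissa digit.
my $is_valid = $col4 =~ /^\d\.\d+E[-+]\d\d$/ ? 1 : 0;

print "time=$time date=$date value=$col4 valid=$is_valid\n";
```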
Re: Regular expression by Anonymous Monk on May 01, 2001 at 17:40 UTC
I was asking about the efficiency of split versus a regex. Here is what I found. Using split: `while (<DATA>) { ($time, $date, $data1, $data2) = (split)[1,2,6,7]; print "$time $date $data1 $data2", "\n"; }` took 26 wallclock secs (2.17 usr + 0.55 sys = 2.72 CPU). Using a regex: `while (<DATA>) { /^(\d+) (\d\d:\d\d:\d\d) (\d\d-\w+-\d{4}) (\d+) (\d+\s+\d+) (\d.\dE[+-]\d+) (\d.\dE[+-]\d+)/; print "$2 $3 $6 $7\n"; }` took 25 wallclock secs (1.98 usr + 0.73 sys = 2.71 CPU). Much of a muchness, really. Regards, Stacy.
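One way to compare the two approaches while keeping printing out of the timed code is the core Benchmark module's `cmpthese`. This is a sketch under assumptions: the records are hypothetical, held in an array instead of read from `<DATA>`, and follow the eight-field layout that `(split)[1,2,6,7]` implies. The regex variant uses an escaped `\.` and `\d+` rather than the posted `\d.\d`, so both subs extract the same fields.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical records: id, time, date, three integers, two exponent values.
my @lines = ('1042 17:30:00 01-May-2001 12 7 3 1.509667E+03 2.250000E-01') x 1000;

# Extract fields 1, 2, 6, 7 with a list slice of an awk-style split.
sub by_split {
    my @out;
    for (@lines) {
        push @out, join ' ', (split)[1, 2, 6, 7];
    }
    return \@out;
}

# Extract the same fields with a single anchored regex.
sub by_regex {
    my @out;
    for (@lines) {
        next unless /^(\d+) (\d\d:\d\d:\d\d) (\d\d-\w+-\d{4}) (\d+) (\d+\s+\d+) (\d\.\d+E[+-]\d+) (\d\.\d+E[+-]\d+)$/;
        push @out, "$2 $3 $6 $7";
    }
    return \@out;
}

# Check that both approaches agree before timing them.
my $same = join("\n", @{ by_split() }) eq join("\n", @{ by_regex() }) ? 1 : 0;
print "same output: $same\n";

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { split => \&by_split, regex => \&by_regex });
```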
Re: Re: Regular expression by Anonymous Monk on May 01, 2001 at 17:54 UTC
Hmmm... If I print to a file instead of STDOUT, I get the code down to 13-14 seconds for both methods... Regards, Stacy.
Re: Regular expression by suaveant (Parson) on May 01, 2001 at 17:47 UTC
No one really told you why your query for the last number failed, though Masem sort of did. It failed because `\d.\d` matches one digit, then any single character, then one more digit. As Masem said, you need `\.` to match a literal `.` instead of any character, but you also need `\d*` or `\d+` after the `\.` to match more than one digit. You were matching only the 5 after the `.` instead of 509667, so the regex could not match the E. - Ant
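A small demonstration of this point, using a hypothetical value built from the digits mentioned above. With the pattern anchored at `^`, the unescaped version consumes `1`, `.`, `5` and then fails on `E` versus `0`; the escaped version with `\d+` matches, and the last test shows that an unescaped `.` also accepts a non-point separator.

```perl
use strict;
use warnings;

my $value = '1.509667E+03';    # hypothetical value

# '.' matches any character and one \d covers only the '5',
# so the pattern's 'E' is compared against '0' and the match fails.
my $loose = $value =~ /^\d.\dE[-+]\d+$/ ? 1 : 0;

# '\.' matches only a literal point, and \d+ consumes all of '509667'.
my $fixed = $value =~ /^\d\.\d+E[-+]\d+$/ ? 1 : 0;

# Without the escape, '.' happily matches an 'x' where a point belongs.
my $wrong_sep = '1x509667E+03' =~ /^\d.\d+E[-+]\d+$/ ? 1 : 0;

print "loose=$loose fixed=$fixed wrong_sep=$wrong_sep\n";
```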
Re: Re: Regular expression by Anonymous Monk on May 01, 2001 at 18:17 UTC
So what if there is a '-' in front of the data in the last column? `/^(\d+) (\d\d:\d\d:\d\d) (\d\d-\w+-\d{4}) (\d+) (\d+\s+\d+) ([-]\d+\.\d+E[-+]\d+)/;` It doesn't work...
Re: Re: Re: Regular expression by suaveant (Parson) on May 01, 2001 at 18:55 UTC
`-?` allows for one or no `-`, so use it where you have `[-]`, which always requires a minus. In general, `?` means zero or one, `*` means zero or more, and `+` means one or more. - Ant
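A sketch of the optional minus sign on its own, with hypothetical sample values. The `-?` quantifier accepts zero or one `-`, so both the signed and unsigned values match, while a doubled minus or a leading `+` does not.

```perl
use strict;
use warnings;

# -? makes the leading minus optional (zero or one occurrence).
my $num_re = qr/^-?\d+\.\d+E[-+]\d+$/;

my @samples = ('1.509667E+03', '-1.509667E+03', '--1.5E+03', '+1.5E+03');    # hypothetical

my %matches;
for my $s (@samples) {
    $matches{$s} = $s =~ $num_re ? 1 : 0;
    print "$s => $matches{$s}\n";
}
```

If a leading `+` should be accepted as well, `[-+]?` is the analogous optional character class.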
Re: Regular expression by DeaconBlues (Monk) on May 01, 2001 at 19:55 UTC
Depending on the context of parsing the log, you might find it easier to use AWK!! Woohoo! Something like: `{ print $2"|"$3"|"$7 }` You would run something like this: `awk '{print $2"|"$3"|"$7}' web.log | perl parselog.pl` Then your Perl would be something like this: `while (<>) { chomp; my ($time, $date, $expo) = split /\|/; print "$time, $date, $expo\n"; }` I have recently started using AWK to parse delimited files. It's nice. Sorry about suggesting a non-Perl solution. :-) I think it might be *NIX only too, but I am not sure.
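For comparison, the same two stages can be written in Perl alone. This sketch assumes a hypothetical seven-field record; note that awk fields are 1-based while Perl array indices are 0-based, so awk's `$2`, `$3`, `$7` become indices 1, 2 and 6.

```perl
use strict;
use warnings;

# Equivalent of awk's { print $2"|"$3"|"$7 } for one line.
sub awk_like {
    my ($line) = @_;
    my @f = split ' ', $line;    # awk-style split on runs of whitespace
    return join '|', @f[1, 2, 6];
}

# Split the pipe-delimited record back apart; '|' must be escaped in the regex.
sub unpack_record {
    my ($rec) = @_;
    my ($time, $date, $expo) = split /\|/, $rec;
    return ($time, $date, $expo);
}

my $line  = '1042 17:30:00 01-May-2001 12 7 3 1.509667E+03';    # hypothetical record
my $rec   = awk_like($line);
my @parts = unpack_record($rec);
print "$rec\n", join(', ', @parts), "\n";
```

The awk stage also has a Perl one-liner counterpart: `perl -lane 'print join "|", @F[1,2,6]' web.log`, where `-a` autosplits each line into `@F` using the same 0-based indexing.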
Re: Regular expression by le (Friar) on May 01, 2001 at 17:34 UTC
Maybe this will work: `while (<DATA>) { print "$1 $2 $3\n" if /^\S+\s+(\S+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s+(\S+)$/; }` It might not be very efficient, though.
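The same pattern, rewritten with the `/x` modifier so each field can be commented. The sample lines are hypothetical; the second one has eight fields to show that the `^` and `$` anchors make the pattern require exactly seven whitespace-separated fields.

```perl
use strict;
use warnings;

# Seven fields; capture the 2nd, 3rd and 7th.
my $re = qr/
    ^ \S+ \s+                 # field 1: id (ignored)
    (\S+) \s+                 # field 2: time
    (\S+) \s+                 # field 3: date
    \S+ \s+ \S+ \s+ \S+ \s+   # fields 4-6 (ignored)
    (\S+) $/x;                # field 7: value, then end of line

my @lines = (
    '1042 17:30:00 01-May-2001 12 7 3 1.509667E+03',                 # 7 fields
    '1042 17:30:00 01-May-2001 12 7 3 1.509667E+03 2.250000E-01',    # 8 fields
);

my @found;
for (@lines) {
    push @found, "$1 $2 $3" if /$re/;
}
print "$_\n" for @found;
```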
Re: Regular expression by converter (Priest) on May 01, 2001 at 19:07 UTC
If all your records have the same format, split is probably the way to go: `while (<DATA>) { (undef, $time, $date, undef, undef, undef, $num) = split; }`
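A `use strict`-friendly variant of this idea, assuming a hypothetical seven-field record. `undef` placeholders in a `my (...)` list discard the unwanted fields; the list slice at the end picks out the same three fields by 0-based index.

```perl
use strict;
use warnings;

my @lines = ('1042 17:30:00 01-May-2001 12 7 3 1.509667E+03');    # hypothetical record

my @rows;
for (@lines) {
    # undef in the target list throws away fields 1, 4, 5 and 6.
    my (undef, $time, $date, undef, undef, undef, $num) = split;
    push @rows, [$time, $date, $num];
}

# The same three fields via a list slice of split.
my @slice = (split ' ', $lines[0])[1, 2, 6];

print join(' ', @{ $rows[0] }), "\n";
```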