Pattern Match/Trim Variables.

LostS has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Pattern Match/Trim Variables. by fruiture (Curate) on Feb 17, 2003 at 19:10 UTC
"MM/DD/YYYY HH:MM:SS.XXX" and remove ".XXX" that's it. This directly translates to perl: `s{^ ( #capture \d{2} #MM / \d{2} #DD / \d{4} #YYYY \s+ \d{2} #HH : \d{2} #MM : \d{2} #SS ) #stop capturing \. \d{3} #XXX $} {$1}x #replace by captured` [download] Have you read perlrequick and perlre? -- http://fruiture.de	[reply] [d/l]
Re: Re: Pattern Match/Trim Variables. by LostS (Friar) on Feb 17, 2003 at 19:34 UTC
OK, taking the time to look at what you said has greatly helped... I began to think about how the code works, looked at your information, and it worked great. Thank you...

-----------------------
Billy S. Slinar Hardtail - Hand of Dane Datal Ephialtes - Guildless RallosZek.Net Admin/WebMaster

```sh
perl -le '$cat = "cat"; if ($cat =~ /\143\x61\x74/) { print "Its a cat!\n"; } else { print "Thats a dog\n"; }'
```
Re: Pattern Match/Trim Variables. by dvergin (Monsignor) on Feb 17, 2003 at 20:01 UTC
I like to keep my regexes simple. Strip a dot and some digits from the end of a string:

```perl
my $str = '02/17/2003 11:44:19.123';
$str =~ s/\.\d+$//;
print "$str\n";
```

You say you are struggling with regexes, so here is the explanation:

We could try to say `s/.\d+$//` but it happens that the dot is magic (matches most anything). So to match a dot we have to "escape" it with the backslash. Like this: `\.`

Next comes `\d` which means "any digit". The plus after it means "one or more". Like this: `\d+`

Then we have the `$`. When `$` is used at the end of a regex, it means "anchor this match to the end of the string" (there are some nuances here we won't bother with). Actually, in this case we don't need the `$` anchor -- we know that there is only one place in the string that will match "a dot followed by some digits". But it is perhaps a kindness to the next human who looks at the code to provide this visual clue that the match is expected to occur at the end of the string.

So now we have: `\.\d+$`

We put this regex snippet into a substitution regex: `s/ / /` But we leave the second half empty. This means "whatever you matched in the first half of the `s///`, replace it with nothing at all".

So reading `s/\.\d+$//` straight off the page, we could translate it as: match a literal dot followed by some digits anchored to the end of the string and replace them with nothing.

Hope that helps.

------------------------------------------------------------
"Perl is a mess and that's good because the problem space is also a mess." - Larry Wall
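To see what the backslash buys you, the two patterns can be run side by side. This is only a minimal sketch and not part of the post above; the helper names `strip_fraction` and `strip_unescaped` are made up for illustration. On a timestamp that has no fractional part, the unescaped dot matches the last colon instead and eats the seconds:

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing ".digits" using the regex from the post.
sub strip_fraction {
    my ($str) = @_;
    $str =~ s/\.\d+$//;    # literal dot, one or more digits, end of string
    return $str;
}

# Hypothetical helper: the same regex with the dot left unescaped.
sub strip_unescaped {
    my ($str) = @_;
    $str =~ s/.\d+$//;     # any character, one or more digits, end of string
    return $str;
}

print strip_fraction('02/17/2003 11:44:19.123'), "\n";  # 02/17/2003 11:44:19
print strip_fraction('02/17/2003 11:44:19'),     "\n";  # unchanged
print strip_unescaped('02/17/2003 11:44:19'),    "\n";  # 02/17/2003 11:44
```

With the escaped dot, a string that has nothing to trim passes through untouched, which is usually what you want when only some of the values carry milliseconds.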
Re: Pattern Match/Trim Variables. by enoch (Chaplain) on Feb 17, 2003 at 20:10 UTC
You can replace those 4 substitutions with one transliteration.

```perl
$line =~ tr/\t\n\r"/\x09\x0A\x0D'/;
```

And then, since the date format is fixed, just match up until the period.

```perl
$line =~ s{
    (             # start capturing into $1
      [\d/\s:]+   # match digits,
                  # forward slashes, spaces,
                  # or colons 1 or more times
    )
    \.            # stop capturing into $1
                  # when you hit a period
    \d\d\d}       # match three more digits
  {$1}x;          # replace it all w/ $1
```

enoch

Edit: removed the `ig` options from the `tr` because they are not necessary (and not even valid) options.
Re: Pattern Match/Trim Variables. by ihb (Deacon) on Feb 17, 2003 at 23:38 UTC
Others have replied regarding your question, so I'll leave that alone. But I do want to suggest that the variable `$line` should be removed here. This is mostly a style question, but imho the code gets a lot nicer if `$_` is used instead of another variable. The infamous `$line` is over-used, if you ask me.

```perl
foreach (@rows) {
    s/\t/\x09/ig;
    s/\n/\x0A/ig;
    s/\r/\x0D/ig;
    s/"/'/ig;
    print FILE qq{"$_",};  # Other delimiters too.
    $cell++;
}
```

Other issues with this code are left aside.

ihb
Re: Pattern Match/Trim Variables. by steves (Curate) on Feb 17, 2003 at 22:09 UTC
In defense of my pathetically ugly regexp, I offer two things:

1. I was assuming, based on the original request, that some but not all dates needed fixing. Some of the other suggestions may not be specific enough to handle that.
2. I sometimes (maybe too often) use that explicit `\d\d` type formatting to show what the pattern is. Anyway, `\d{2}` is 5 characters and `\d\d` is 4. 8-)

But I'd probably take the first one that's nicely commented over mine.
Re: Pattern Match/Trim Variables. by steves (Curate) on Feb 17, 2003 at 19:12 UTC
I'm not sure what those first three substitutions are. You appear to be replacing characters with themselves, specifying the match as a metacharacter and the replacement as a hex value. The substitution you want, to get rid of the trailing piece of the date, is:

```perl
my $date = '02/17/2003 14:09:34.087';
$date =~ s#^(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d)\.\d\d\d$#$1#;
```
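A fully anchored pattern like this only touches values that are a complete timestamp with milliseconds and leaves everything else alone. The sketch below is an illustration under assumptions, not code from the thread: `trim_strict` and `trim_loose` are hypothetical helper names and the cell values are made up to stand in for a mixed row of data.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the anchored substitution shown above.
sub trim_strict {
    my ($val) = @_;
    $val =~ s#^(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d)\.\d\d\d$#$1#;
    return $val;
}

# Hypothetical helper using a looser "dot plus digits at the end" pattern.
sub trim_loose {
    my ($val) = @_;
    $val =~ s/\.\d+$//;
    return $val;
}

# Made-up cell values: one date with milliseconds, one without, one non-date.
for my $cell ('02/17/2003 14:09:34.087', '02/17/2003 14:09:34', 'version 1.25') {
    printf "%-25s strict: %-22s loose: %s\n",
        $cell, trim_strict($cell), trim_loose($cell);
}
```

The strict version leaves `version 1.25` intact, while the loose one shortens it to `version 1`. Which behaviour is right depends on what else can appear in the data.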
Re: Re: Pattern Match/Trim Variables. by LostS (Friar) on Feb 17, 2003 at 19:21 UTC
Don't worry about those other replace strings. I am creating a CSV and need those for formatting... But how do I do an:

```perl
if ($line =~ /\d\d\/\d\d\/\d\d\d\d \d\d:\d\d:\d\d.\d\d\d/) {
    $line =~ s#^(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d)\.\d\d\d$#$1#;
}
```

Is that correct??
Re^3: Pattern Match/Trim Variables. by Coruscate (Sexton) on Feb 17, 2003 at 20:44 UTC
You could do that, but the if() statement is completely unnecessary. If the match is not found in the string, the substitution won't do anything, so the if() is redundant. As for that regex, I'd recommend you look further into the discussion and pick out one of the other (shorter!) ones.

If the above content is missing any vital points or you feel that any of the information is misleading, incorrect or irrelevant, please feel free to downvote the post. At the same time, reply to this node or /msg me to tell me what is wrong with the post, so that I may update the node to the best of my ability. If you do not inform me as to why the post deserved a downvote, your vote does not have any significance and will be disregarded.
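A quick way to convince yourself of this: `s///` returns the number of substitutions it made, or a false value if it made none, and a string that does not match is left as it was. The following is a minimal sketch, using one of the shorter regexes from this thread; `trim_and_count` is a hypothetical helper name, not anything from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: run the substitution with no guarding if() and
# return both the resulting string and the number of replacements made.
sub trim_and_count {
    my ($line) = @_;
    my $count = ($line =~ s/\.\d+$//) || 0;   # false (no match) becomes 0
    return ($line, $count);
}

for my $input ('02/17/2003 14:09:34.087', 'no date here') {
    my ($result, $count) = trim_and_count($input);
    print "'$input' -> '$result' ($count replacement(s))\n";
}
```

If you ever do need to know whether something was trimmed, test the return value of the substitution itself rather than matching twice.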
Re: Re: Re: Pattern Match/Trim Variables. by steves (Curate) on Feb 17, 2003 at 22:18 UTC
Those first three substitutions are doing nothing unless there's some magic I'm missing. A tab has hex value 09, a newline hex value 0a, and a carriage return hex value 0d. The metacharacter representation and the hex representation in your substitutions evaluate to the same thing. So you're replacing each one with itself.
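This is easy to check directly. The sketch below assumes an ASCII-based platform (the hex values differ on EBCDIC systems) and reconstructs the three substitutions from the code quoted elsewhere in this thread; `apply_subs` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Compare each escape with its hex form (assumes an ASCII-based platform).
print "tab:             ", ("\t" eq "\x09" ? "same" : "different"), "\n";
print "newline:         ", ("\n" eq "\x0A" ? "same" : "different"), "\n";
print "carriage return: ", ("\r" eq "\x0D" ? "same" : "different"), "\n";

# Hypothetical helper: the three substitutions under discussion.
sub apply_subs {
    my ($line) = @_;
    $line =~ s/\t/\x09/g;
    $line =~ s/\n/\x0A/g;
    $line =~ s/\r/\x0D/g;
    return $line;
}

my $before = "a\tb\nc\rd";
my $after  = apply_subs($before);
print "string ", ($before eq $after ? "unchanged" : "changed"), "\n";
```

Only the fourth substitution in the original code, the one turning double quotes into single quotes, actually changes the data.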

