In your code, you slurp the entire file into an array, join all the lines using a fixed string, and then use a regex that specifically includes that fixed string as the last thing to match. I don't quite understand why you're doing it this way; I don't see the advantage over a normal while (<$filehandle>) { ... } loop. I didn't really test your code because the sample log entry you provided doesn't actually match your regex, but from what I can tell, your code will silently skip any log entries that don't match the regex, and it will always skip the last log entry (join only places the string between elements, so nothing follows the final entry).
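As a rough illustration of the line-by-line approach, here is a minimal sketch. The helper name split_log, the placeholder pattern, and the in-memory sample data are my own assumptions for the demo, not taken from your code. The point is only that every line is either matched or reported, including the last one.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper: read a log line by line and return the matching
# and the non-matching lines separately, so nothing is dropped silently.
sub split_log {
    my ($fh, $re) = @_;
    my (@matched, @skipped);
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ $re ) { push @matched, $line }
        else                { push @skipped, $line }
    }
    return (\@matched, \@skipped);
}

# Demo on an in-memory filehandle; the pattern is only a placeholder.
my $log = "2017-02-20T09:30:53.177000 first\n"
        . "not a log line\n"
        . "2017-02-20T09:30:54.000000 last\n";
open my $fh, '<', \$log or die "Can't open in-memory file: $!";
my ($matched, $skipped) = split_log($fh, qr/^\d{4}-\d{2}-\d{2}T/);
close $fh;

warn "Skipping unknown line format: $_\n" for @$skipped;
print "$_\n" for @$matched;
```

With a real file you would open it by name (and check the open for errors) instead of using the in-memory string.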
I see a couple of other issues with your code: you don't use strict and warnings, and you don't check some of your opens for errors. In your regexes, you don't need to put (...) capturing groups around things you don't actually want to capture into the $1, $2, ... variables, e.g. you can say /...T.../ instead of /...(T).../. You might also want to look into the /x regex modifier (see perlre) to make your regexes easier to read and follow. Also, I'd strongly recommend using an appropriate module such as Text::CSV for CSV output.
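To show what /x and non-capturing groups can look like, here is a small sketch. The pattern is an illustrative one I made up for an ISO-8601-like timestamp; it is not your original regex.

```perl
use warnings;
use strict;

# Illustrative pattern: only the date and the time are captured,
# the literal "T" and the fractional seconds are not.
my $ts_re = qr{
    ^
    ( \d{4} - \d{2} - \d{2} )   # first capture: date, e.g. 2017-02-20
    T                           # literal separator, no capture needed
    ( \d{2} : \d{2} : \d{2} )   # second capture: time, e.g. 09:30:53
    (?: \. \d+ )?               # optional fraction, grouped but not captured
    \s+
}x;

my $line = '2017-02-20T09:30:53.177000 20[] 0000000000000000 Error Description One';
if ( my ($date, $time) = $line =~ $ts_re ) {
    print "date=$date time=$time\n";   # prints "date=2017-02-20 time=09:30:53"
}
else {
    warn "no match\n";
}
```

Under /x, whitespace and # comments inside the pattern are ignored, so a literal space has to be written as \s or escaped.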
I'm not sure I fully understand your questions. Instead, I can show you how I might have coded this. Personally, I like to validate the format of input files a little bit as I read them. Instead of DateTime::Format::Flexible, I'd use several DateTime::Format::Strptime parsers, and first use a heuristic to decide which format the log line has. It seems from your sample inputs that the log line formats are quite different, which is why I've duplicated the parsing and output logic in the if statements below, but if your log lines are instead similar, you should of course not duplicate that code and move the common parsing code outside of the ifs.
#!/usr/bin/env perl
use warnings;
use strict;
use 5.010; # for /p and ${^MATCH}
use DateTime;
use DateTime::Format::Strptime;
use Text::CSV;

# One parser per known timestamp format
my $strp_one = DateTime::Format::Strptime->new(on_error=>'croak',
    time_zone=>'UTC', pattern => '%Y-%m-%dT%H:%M:%S.%6N');
my $strp_two = DateTime::Format::Strptime->new(on_error=>'croak',
    time_zone=>'UTC', pattern => '%a %b %d %H:%M:%S %Y');

my $csv = Text::CSV->new({binary=>1, always_quote=>1,
    blank_is_undef=>1, eol=>$/, auto_diag=>2});

while (<DATA>) {
    chomp;
    # Heuristic: decide the line format from how the timestamp looks
    if (/^\d{4,}-[\d\-T\:\.]+(?=\s+)/p) {
        my ($dts,$rest) = (${^MATCH}, ${^POSTMATCH});
        my $dt = $strp_one->parse_datetime($dts);
        # parse "$rest" and break it into more fields here
        $csv->print(select, [
            $dt->strftime('%Y-%m-%d-%H-%M-%S-%6N-%Z'), $rest ] );
    }
    elsif (/^\w+\s+\w+\s+\d+\s+[\d\:]+\s+\d{4,}(?=\s+)/p) {
        my ($dts,$rest) = (${^MATCH}, ${^POSTMATCH});
        my $dt = $strp_two->parse_datetime($dts);
        # parse "$rest" and break it into more fields here
        $csv->print(select, [
            $dt->strftime('%Y-%m-%d-%H-%M-%S-%6N-%Z'), $rest ] );
    }
    else { warn "Skipping unknown line format: $_" }
}

__DATA__
2017-02-20T09:30:53.177000 20[] 0000000000000000 Error Description One
Mon Feb 20 09:31:25 2017 [INFO] [AGENTEXEC] Error Description Two
2017-02-20T09:30:53.177000 20[] 0000000000000000 Error Description Three
Mon Feb 20 09:31:25 2017 [INFO] [AGENTEXEC] Error Description Four
Output:
"2017-02-20-09-30-53-177000-UTC"," 20[] 0000000000000000 Error Description One"
"2017-02-20-09-31-25-000000-UTC"," [INFO] [AGENTEXEC] Error Description Two"
"2017-02-20-09-30-53-177000-UTC"," 20[] 0000000000000000 Error Description Three"
"2017-02-20-09-31-25-000000-UTC"," [INFO] [AGENTEXEC] Error Description Four"
One disadvantage of the above approach is that if you have a lot of different date/time formats in your log files, you'd have to add more and more parsers. So if that's the case, you can also try using DateTime::Format::Flexible, and the same basic idea as above (use a regex to pull the date/time string from the beginning of the line before attempting to parse it) applies.
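The "pull the date/time string off the front of the line first" step can also be written as a small table, so that adding a format means adding one entry instead of another elsif branch. This is only a sketch: the format names (iso, ctime) and the helper split_timestamp are hypothetical, and the actual parsing is left out on purpose, since that is where you would call whichever parser you choose.

```perl
use warnings;
use strict;

# Each entry pairs a hypothetical format name with a regex for the
# timestamp at the start of the line (same patterns as in the script above).
my @formats = (
    [ iso   => qr{\d{4,}-[\d\-T:.]+} ],
    [ ctime => qr{\w+\s+\w+\s+\d+\s+[\d:]+\s+\d{4,}} ],
);

# Returns (format name, timestamp string, rest of line),
# or an empty list if no known format matches.
sub split_timestamp {
    my ($line) = @_;
    for my $format (@formats) {
        my ($name, $re) = @$format;
        return ($name, $1, $2) if $line =~ /\A($re)(?=\s)(.*)\z/s;
    }
    return;
}

while ( my $line = <DATA> ) {
    chomp $line;
    if ( my ($name, $dts, $rest) = split_timestamp($line) ) {
        # hand $dts to the date/time parser of your choice here
        print "format=$name timestamp=[$dts] rest=[$rest]\n";
    }
    else { warn "Skipping unknown line format: $line\n" }
}

__DATA__
2017-02-20T09:30:53.177000 20[] 0000000000000000 Error Description One
Mon Feb 20 09:31:25 2017 [INFO] [AGENTEXEC] Error Description Two
```

Whether a given parser accepts every timestamp string you extract this way is something you'd need to test against your real logs.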
In reply to Re: DateTime::Format::Flexible; for Log Parse with multiple formatted lines
by haukex
in thread DateTime::Format::Flexible; for Log Parse with multiple formatted lines
by TCLion