in reply to regex with /s modifier does not provide the expected result
I think choroba is correct that the real problem is that the $line variable passed to your sub has already been truncated at the newline character, and that there is nothing wrong with your regex.
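choroba's diagnosis is easy to check in isolation. The sketch below uses a made-up two-line entry and assumes the sub is fed lines read one at a time (e.g. with `<$fh>`). That assumption comes from the symptom, not from your code. It shows that /s only changes what `.` matches, so it cannot bring back text that never reached the regex:

```perl
use strict;
use warnings;

# Hypothetical two-line entry, as it would look if read in one piece.
my $entry = "30.08.2016 08:00:00.004 *ERROR* first part\nsecond part";

my ($no_s)   = $entry =~ /\*ERROR\*\s+(.*)/;   # "." stops at "\n"
my ($with_s) = $entry =~ /\*ERROR\*\s+(.*)/s;  # "." also matches "\n"

# Simulate reading the same text line by line from a filehandle.
open my $fh, '<', \$entry or die "cannot open in-memory handle: $!";
my $first_line = <$fh>;    # reads only up to and including the first "\n"
close $fh;
chomp $first_line;
my ($truncated) = $first_line =~ /\*ERROR\*\s+(.*)/s;  # /s has no "\n" left to span

print "without /s: '$no_s'\n";
print "with /s:    '$with_s'\n";
print "one line:   '$truncated'\n";
```

The fix therefore belongs in whatever assembles the entry before calling the sub (for example, accumulating continuation lines), not in the regex.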
Anyway, here are some things I noticed:
- It looks like you forgot to escape the dot characters in your regex's date capture group, so they match any character rather than a literal dot.
- Same with the dot before the microseconds field.
- Your regex doesn't require any whitespace between the digits at the end of the date field and the start of the time field, which forces you to over-specify both fields to avoid ambiguity and false matches.
- The single character separating the "pool-def" and "class-name" fields should probably be an explicit space rather than any character.
- The space that terminates "class-name" is included in the capture, and non-greedy repetition doesn't make sense to me here.
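To illustrate the first two points, here is a small sketch using a made-up date string with dashes instead of dots. The two patterns are simplified stand-ins for a date field, not your actual regex:

```perl
use strict;
use warnings;

# Hypothetical input: dashes where the log format uses dots.
my $odd = "30-08-2016";

my $unescaped = qr/^\d\d.\d\d.\d{4}$/;    # "." matches any character except "\n"
my $escaped   = qr/^\d\d\.\d\d\.\d{4}$/;  # "\." matches only a literal dot

print "unescaped: ", ($odd =~ $unescaped ? "match" : "no match"), "\n";
print "escaped:   ", ($odd =~ $escaped   ? "match" : "no match"), "\n";
```

Note that inside a character class, as in `[\d\.]`, the dot is already literal, so the backslash there is harmless but optional.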
So for what it's worth, here is how I would handle the regex:
    use strict; use warnings;
    my $line = "30.08.2016 08:00:00.004 *ERROR* [pool-7-thread-5] "
             . "com.day.cq.reporting.impl.snapshots.SnapshotServiceImpl "
             . "Error accessing repository during creation of report snapshot data\n"
             . "javax.jcr.LoginException: Cannot derive user name for bundle "
             . "com.day.cq.cq-reporting [313] and sub service null";
    my $pattern = 'ERROR';
    # date, time (fractional seconds skipped), level, [thread], class, message
    my $r = qr/([\d\.]+)\s+([\d:]+)\.?\d*\s+\*(\Q$pattern\E)\*\s+(\[.*?\])\s+([\w\.]+)\s*(.*)/s;
    my @fields = $line =~ $r;
    unshift(@fields, undef);   # 1-base to match regex capture indexes
    for my $i (1..$#fields) {
        print "[$i] '$fields[$i]'\n";
    }
Output:
    [1] '30.08.2016'
    [2] '08:00:00'
    [3] 'ERROR'
    [4] '[pool-7-thread-5]'
    [5] 'com.day.cq.reporting.impl.snapshots.SnapshotServiceImpl'
    [6] 'Error accessing repository during creation of report snapshot data
    javax.jcr.LoginException: Cannot derive user name for bundle com.day.cq.cq-reporting [313] and sub service null'
I don't like to over-specify fields when I expect the input's format to be sane (of course, I have only seen one sample of your log entries, so take this with a grain of salt), so my regex's date field matches any blob of digits and dots, and its time field matches any blob of digits and colons. I think this helps maintainability and makes the regex more tolerant of small future changes to the log file's format.
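To illustrate the tolerance point, here is a sketch that runs the regex from above and a deliberately over-specified alternative against a made-up entry whose format has drifted (no zero padding, no fractional seconds). The rigid pattern and the sample line are hypothetical, written only for this comparison:

```perl
use strict;
use warnings;

my $pattern = 'ERROR';

# The regex from the post above.
my $r = qr/([\d\.]+)\s+([\d:]+)\.?\d*\s+\*(\Q$pattern\E)\*\s+(\[.*?\])\s+([\w\.]+)\s*(.*)/s;

# Hypothetical over-specified alternative: fixed digit counts, mandatory milliseconds.
my $rigid = qr/(\d\d\.\d\d\.\d{4})\s+(\d\d:\d\d:\d\d)\.\d{3}\s+\*(\Q$pattern\E)\*/;

# Hypothetical entry whose format has drifted slightly.
my $drifted = "1.9.2016 8:00:00 *ERROR* [main] com.example.Foo Something failed";

my @loose_fields = $drifted =~ $r;
print "loose: ", (@loose_fields ? join(' | ', @loose_fields) : 'no match'), "\n";
print "rigid: ", ($drifted =~ $rigid ? 'match' : 'no match'), "\n";
```

The loose version still splits the drifted line into the same six fields, while the rigid one silently stops matching.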
My regex's class field is similarly a blob of word characters and dots, but it is actually more specific than yours, which only asks for something containing a dot, starting one character after "pool-def" and terminated by a space.
Also, all fields in my regex are explicitly separated by one or more whitespace characters, which helps avoid undesired matches caused by the ambiguity that the under-specified date and time fields would otherwise introduce.
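Here is a sketch of the kind of ambiguity that mandatory separators guard against. The malformed entry (its time field is missing) and both patterns are hypothetical, trimmed down to the date and time parts. With an optional separator (`\s*`), the regex engine backtracks into the date blob and splits it into a bogus date and time; with a mandatory `\s+`, the match simply fails:

```perl
use strict;
use warnings;

# Hypothetical malformed entry: the time field is missing entirely.
my $bad = "30.08.2016 *ERROR* something went wrong";

my $optional_sep = qr/([\d.]+)\s*([\d:]+)\s+\*ERROR\*/;  # date/time separator optional
my $required_sep = qr/([\d.]+)\s+([\d:]+)\s+\*ERROR\*/;  # separator mandatory

my ($date, $time) = $bad =~ $optional_sep;
if (defined $date) {
    print "optional: date='$date' time='$time' (a false match)\n";
} else {
    print "optional: no match\n";
}
print "required: ", ($bad =~ $required_sep ? "match" : "no match"), "\n";
```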