There's been a lot of talk about how to parse given logfile formats -- but my question today is: what's the best way to log a set of fields per line in a somewhat readable format so that they can be parsed efficiently later?
Text::CSV comes to mind. Logging fields comma-separated, escaping comma-containing fields with double-quotes ("...") and escaping quotes by doubling them ("") works nicely:
    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new();

    while (<DATA>) {
        chomp;    # strip the trailing newline before parsing
        $csv->parse($_) or die "Parse error";
        my @cols = $csv->fields();
        print join('/', @cols), "\n";
    }

    __DATA__
    foo,bar,baz
    foo,"bar baz","foo ""the bar"" baz"

However, Text::CSV is slow and chokes on special characters (try parsing "fü,bär,bäz").
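For the logging side, the CSV quoting rules described above can be sketched in a few lines without any module. This is only a minimal sketch: the helper name csv_line is made up for illustration, and it assumes that quoting fields only when they contain a comma, a double quote or a newline is acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV): join fields into one CSV
# log line, quoting only fields that contain a comma, quote or newline.
sub csv_line {
    my @fields = @_;             # work on a copy of the arguments
    for my $f (@fields) {
        if ($f =~ /[",\n]/) {
            $f =~ s/"/""/g;      # escape quotes by doubling them
            $f = qq{"$f"};       # wrap the field in double quotes
        }
    }
    return join(',', @fields);
}

print csv_line('foo', 'bar,baz', 'foo "the bar" baz'), "\n";
# prints: foo,"bar,baz","foo ""the bar"" baz"
```

Fields without any of these characters are written unquoted, which keeps the common case readable.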
Being able to choose the log format is certainly an advantage. How about the following: separate fields by spaces, wrap fields containing spaces in double quotes ("..."), escape literal quotes with a backslash (\") and escape backslashes with another backslash (\\). The following code parses this format:
    use strict;
    use warnings;

    while (<DATA>) {
        my @columns;
        while (/("(?:\\\\|\\"|.)*?")|    # "quoted"
                (\S+)                     # unquoted
               /gx) {
            my $match;
            if (defined $1) {             # matched quoted
                $match = $1;
                $match =~ s/^"//;         # remove opening "
                $match =~ s/"$//;         # remove closing "
            } else {
                $match = $2;
            }
            # Unescape in one pass (\\ -> \ and \" -> ") so that an
            # unescaped backslash is never mistaken for an escape.
            $match =~ s/\\([\\"])/$1/g;
            push(@columns, $match);       # store
        }
        print join('/', @columns), "\n";
    }

    __DATA__
    foo bar baz
    foo "bar \\ baz" "foo \"the bar\" baz"
    fü bär bäz

I bet there are plenty of other ways to do this -- what's the most efficient one? My conditions are that the format is readable (so no \0 as field separator) and that the separator may show up in a field value.
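To check that such a format round-trips, it helps to have the writer next to the parser. The following is a rough sketch, not a tuned implementation: encode_line and decode_line are hypothetical names, the quoting rules are the ones the parser above accepts, and fields are assumed not to contain newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: encode fields into one space-separated log line.
sub encode_line {
    my @fields = @_;                                # work on a copy
    for my $f (@fields) {
        $f =~ s/([\\"])/\\$1/g;                     # \ -> \\ and " -> \"
        $f = qq{"$f"} if $f eq '' or $f =~ /\s/;    # quote empty or spaced fields
    }
    return join(' ', @fields);
}

# Hypothetical helper: decode one log line back into its fields.
sub decode_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ /"((?:\\.|[^"\\])*)"|(\S+)/g) {
        my $f = defined $1 ? $1 : $2;               # quoted or unquoted field
        $f =~ s/\\(.)/$1/g;                         # drop the escaping backslash
        push @fields, $f;
    }
    return @fields;
}

my @in   = ('foo', 'bar \\ baz', 'foo "the bar" baz');
my $line = encode_line(@in);
print "$line\n";
print join('/', decode_line($line)), "\n";
```

Because the character class [^"\\] cannot overlap with an escape sequence, the quoted branch of the regex does not need to backtrack, which may help a little on long lines.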
Who's got the most efficient format/algorithm?
In reply to Who beats Text::CSV? by saintmike