Is it guaranteed that the compression method field will never contain an embedded space, like "LZW cmp" or something?
When building a regexp against sample data (as opposed to "against a specification") my approach tends to be exactly the opposite of Fletch's - make the regexp constrain as much as possible, so that I can warn if I ever see new data that violates my expectations:
if ($line =~ m{^
    \s* \d+              # size?
    \s+ \w+              # compression method
    \s+ \d+              # compressed size?
    \s+ \d+ %            # compression ratio
    \s+ \d+ - \d+ - \d+  # date
    \s+ \d+ : \d+        # time
    \s+ [0-9a-f]{8}      # checksum?
    \s+ (.*)             # filename
$}xi) {
    $filename = $1;      # only read $1 after a successful match, else it may be stale
} else {
    warn "Couldn't match input line '$line'";
}
It is worth checking whether it is possible to store a filename containing odd characters, such as a newline or a backslash, to see what the listing looks like then. Similarly it is worth looking for boundary conditions on other fields - if the size is more than 8 digits, does it still retain at least one following space?
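As a rough sketch of how those checks might be exercised, the strict pattern can be wrapped in a sub and run over a few hand-made lines. The sub name parse_listing_line and the sample lines are invented for illustration; they are not real output from the archiver, so substitute lines captured from the actual tool.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the filename if the line fits the strict
# pattern, or nothing (undef in scalar context) if it does not.
sub parse_listing_line {
    my ($line) = @_;
    if ($line =~ m{^
        \s* \d+              # size?
        \s+ \w+              # compression method
        \s+ \d+              # compressed size?
        \s+ \d+ %            # compression ratio
        \s+ \d+ - \d+ - \d+  # date
        \s+ \d+ : \d+        # time
        \s+ [0-9a-f]{8}      # checksum?
        \s+ (.*)             # filename
    $}xi) {
        return $1;
    }
    return;
}

# Invented sample lines, one per boundary condition of interest.
my @samples = (
    # filename containing a space: still fits the pattern
    "  12345 LZW 4567 63% 03-15-04 12:30 1a2b3c4d docs/read me.txt",
    # compression method containing a space: violates \w+
    "  12345 LZW cmp 4567 63% 03-15-04 12:30 1a2b3c4d notes.txt",
    # size so wide that no space is left before the method
    "123456789LZW 4567 63% 03-15-04 12:30 1a2b3c4d big.bin",
);

for my $line (@samples) {
    my $filename = parse_listing_line($line);
    if (defined $filename) {
        print "ok: '$filename'\n";
    } else {
        warn "Couldn't match input line '$line'\n";
    }
}
```

The point of keeping the pattern this tight is that the second and third lines produce a warning instead of a silently wrong filename.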
Hugo