Re: Regular Expressions Matching with Perl

Re^2: Regular Expressions Matching with Perl by hv (Prior) on Apr 21, 2005 at 09:54 UTC
Is it guaranteed that the compression method field will never contain a space, like "LZW cmp" or something?

When building a regexp against sample data (as opposed to "against a specification"), my approach tends to be exactly the opposite of Fletch's: make the regexp constrain as much as possible, so that I can warn if I ever see new data that violates my expectations:

    $line =~ m{^
        \s* \d+                 # size?
        \s+ \w+                 # compression method
        \s+ \d+                 # compressed size?
        \s+ \d+ %               # compression ratio
        \s+ \d+ - \d+ - \d+     # date
        \s+ \d+ : \d+           # time
        \s+ [0-9a-f]{8}         # checksum?
        \s+ (.*)                # filename
    $}xi or warn "Couldn't match input line '$line'";
    $filename = $1;

It is worth checking whether it is possible to store a filename with some odd characters, such as a newline or a backslash, to see what happens. Similarly, it is worth looking for boundary conditions on other fields: if the size is more than 8 digits, does it still retain at least one following space?

Hugo
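
As a minimal sketch of how the validating pattern above might be exercised, the snippet below wraps it in a helper and feeds it two lines. The helper name `parse_listing_line` and both sample lines are hypothetical, invented here for illustration; the real listing format may differ. The helper returns the capture only on a successful match, because `$1` keeps its value from an earlier successful match when a later match fails.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the filename, or undef (after a warning)
# when the line does not fit the expected layout.
sub parse_listing_line {
    my ($line) = @_;
    if ($line =~ m{^
        \s* \d+                 # size?
        \s+ \w+                 # compression method
        \s+ \d+                 # compressed size?
        \s+ \d+ %               # compression ratio
        \s+ \d+ - \d+ - \d+     # date
        \s+ \d+ : \d+           # time
        \s+ [0-9a-f]{8}         # checksum?
        \s+ (.*)                # filename
    $}xi) {
        return $1;
    }
    warn "Couldn't match input line '$line'\n";
    return;
}

# Assumed sample data: one well-formed line, and one with a two-word
# compression method such as "LZW cmp".
my @samples = (
    '   12345  Stored     4321  65%  04-21-05  09:54  1a2b3c4d  docs/read me.txt',
    '   12345  LZW cmp    4321  65%  04-21-05  09:54  1a2b3c4d  file.txt',
);

for my $line (@samples) {
    my $filename = parse_listing_line($line);
    print "filename: $filename\n" if defined $filename;
}
```

Under these assumptions the first line yields `docs/read me.txt`, while the second triggers the warning instead of silently producing a wrong filename.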
Re^3: Regular Expressions Matching with Perl by ikegami (Patriarch) on Apr 21, 2005 at 14:18 UTC
> Is it guaranteed that the compression method field will never contain a space, like "LZW cmp" or something?

Yes, I think it's always one word (and probably specifically for easy parsing, judging by the odd names).

> When building a regexp against sample data my approach tends to be exactly the opposite of Fletch's

I call the two approaches "Extraction" (`/:.{15}(.*)/`) and "Validation" (yours). Which one I use is determined by the situation. Sometimes there's a happy middle that's a mixture of both (Fletch's `/[[:xdigit:]]{8}\s+(.*)$/`).
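
To make the trade-off concrete, here is a rough sketch comparing the fixed-offset extraction with the checksum-anchored middle ground. The helper names `by_offset` and `by_checksum` and both sample lines are hypothetical; the column widths are assumed for illustration and were chosen so that exactly 15 characters separate the colon from the filename in the first line.

```perl
use strict;
use warnings;

# "Extraction": skip a fixed 15 characters after the first colon.
sub by_offset {
    my ($line) = @_;
    my ($name) = $line =~ /:.{15}(.*)/;
    return $name;
}

# Middle ground: anchor on the 8-hex-digit checksum, then take the rest.
sub by_checksum {
    my ($line) = @_;
    my ($name) = $line =~ /[[:xdigit:]]{8}\s+(.*)$/;
    return $name;
}

# Assumed sample lines: the second has one space fewer after the checksum.
my $aligned = '   12345  Stored     4321  65%  04-21-05  09:54  1a2b3c4d   notes.txt';
my $shifted = '   12345  Stored     4321  65%  04-21-05  09:54  1a2b3c4d  notes.txt';

for my $line ($aligned, $shifted) {
    printf "offset: %-10s checksum: %s\n", by_offset($line), by_checksum($line);
}
```

With these assumed lines, both helpers agree on the first one, but the fixed offset drops the first character of the filename on the second, while the checksum anchor still recovers it. Neither helper notices a malformed line the way the validating pattern does.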