Re: Regular Expressions Matching with Perl
by ikegami (Patriarch) on Apr 20, 2005 at 17:40 UTC
chomp($line);
push(@temp, (split(' ', $line, 8))[-1]);
Tested.
Is it guaranteed that the compression method field will never nest a space, like "LZW cmp" or something?
When building a regexp against sample data (as opposed to "against a specification") my approach tends to be exactly the opposite of Fletch's - make the regexp constrain as much as possible, so that I can warn if I ever see new data that violates my expectations:
$line =~ m{^
\s* \d+ # size?
\s+ \w+ # compression method
\s+ \d+ # compressed size?
\s+ \d+ % # compression ratio
\s+ \d+ - \d+ - \d+ # date
\s+ \d+ : \d+ # time
\s+ [0-9a-f]{8} # checksum?
\s+ (.*) # filename
$}xi or warn "Couldn't match input line '$line'";
$filename = $1;
It is worth checking whether it is possible to store a filename with some odd characters to see what happens, such as a newline, backslash etc. Similarly it is worth looking for boundary conditions on other fields - if the size is more than 8 digits does it still retain at least one following space?
Hugo
Is it guaranteed that the compression method field will never nest a space, like "LZW cmp" or something?
yes, I think it's always one word (and probably specifically for easy parsing, judging by the odd names).
When building a regexp against sample data my approach tends to be exactly the opposite of Fletch's
I call the two approaches "Extraction" (/:.{15}(.*)/) and "Validation" (yours). Which one I use depends on the situation. Sometimes there's a happy middle that mixes both (Fletch's /[[:xdigit:]]{8}\s+(.*)$/).
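To make the "Extraction" versus "Validation" distinction concrete, here is a minimal sketch that runs both styles over the same lines. The sub names are hypothetical, and the second input line is invented to mimic the "LZW cmp" case raised above; only the first line comes from the sample data in this thread.

```perl
use strict;
use warnings;

# "Extraction": anchor on the 8-hex-digit checksum and take whatever follows.
sub extract_name {
    my ($line) = @_;
    my ($name) = $line =~ /[[:xdigit:]]{8}\s+(.*)$/;
    return $name;    # undef if nothing matched
}

# "Validation": describe every field and reject lines that deviate.
sub validate_name {
    my ($line) = @_;
    my ($name) = $line =~ m{^
        \s* \d+ \s+ \w+ \s+ \d+ \s+ \d+ %      # size, method, compressed size, ratio
        \s+ \d+ - \d+ - \d+ \s+ \d+ : \d+      # date, time
        \s+ [0-9a-f]{8} \s+ (.*)               # checksum, filename
    $}xi;
    return $name;    # undef if the line breaks any expectation
}

my @lines = (
    '704106 DeflatN 83362 89% 04-04-05 19:00 8e76dc22 file1.dat',
    '10 LZW cmp 5 50% 04-20-05 08:43 00000000 x.txt',    # invented: method contains a space
);

for my $line (@lines) {
    my $e = extract_name($line);
    my $v = validate_name($line);
    printf "extraction: %s | validation: %s\n",
        defined $e ? $e : '(no match)',
        defined $v ? $v : '(rejected)';
}
```

The extraction regexp returns a filename for both lines, while the validating one rejects the second, which is the early warning the validation approach is meant to provide.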
Re: Regular Expressions Matching with Perl
by Transient (Hermit) on Apr 20, 2005 at 17:36 UTC
$filename = (split( ' ',$line, 8 ))[-1]
This won't work because of the leading spaces. See my reply for the fix.
It will always be in the 8th column - I'll give that a shot and see. Just tried my regexp and it didn't do a thing. I'll give this a shot and see what happens.
Re: Regular Expressions Matching with Perl
by davidrw (Prior) on Apr 20, 2005 at 17:44 UTC
If you were able to use modules, I would suggest Archive::Zip. But here is a non-module solution:
$line =~ s/^\s+//;                            # strip leading whitespace
$line =~ s/\s+$//;                            # strip trailing whitespace
my @cols = split(/ +/, $line);                # split on spaces
my $filename = join(' ', splice(@cols, 7) );  # piece the filename back together
push @temp, $filename;                        # store filename
You could possibly try a fixed-width solution, but I'd be worried that it wouldn't work if the first or third columns varied too much in size.
Update: I think I overthought this a little and forgot about the LIMIT parameter to split(), which is probably better than breaking and re-gluing the filename.
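As a rough illustration of why the LIMIT parameter is the safer choice, the sketch below compares it with the split-and-rejoin approach. The sub names are hypothetical, and the input line is invented: it follows the layout of the sample data but uses a filename with two consecutive spaces.

```perl
use strict;
use warnings;

# split with a LIMIT of 8: the eighth field is the untouched rest of the line.
sub name_with_limit {
    my ($line) = @_;
    chomp $line;
    return (split ' ', $line, 8)[-1];
}

# split on every run of spaces, then re-join: inner runs of spaces collapse.
sub name_with_rejoin {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+//;
    my @cols = split / +/, $line;
    return join ' ', splice(@cols, 7);
}

# Invented line: the filename contains two consecutive spaces.
my $line = "  0 Stored 0 0% 04-20-05 08:43 00000000 my  file.zip\n";

printf "limit:  '%s'\n", name_with_limit($line);
printf "rejoin: '%s'\n", name_with_rejoin($line);
```

With LIMIT the filename comes back exactly as it appears in the line; the re-join version quietly turns the double space into a single one.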
I've thought about Archive::Zip as well, but it's not installed on our system. :-)
There is an alternative way to "install" pure-Perl modules: copy and paste their source code directly into your script. It will require some changes, but they are usually small and trivial.
Re: Regular Expressions Matching with Perl
by Fletch (Bishop) on Apr 20, 2005 at 17:40 UTC
Regexen should be as short as possible, but no shorter.
my( $file ) = /[[:xdigit:]]{8}\s+(.*)$/;
/:\d\d\s\s[[:xdigit:]]{8}\s\s(.*)$/
Re: Regular Expressions Matching with Perl
by NateTut (Deacon) on Apr 20, 2005 at 17:40 UTC
I would think substr would be a good choice too since you seem to be dealing with fixed length fields/records.
Unfortunately, I think substr will fail when the raw size of the file is 10MB or more, when the compressed size of the file is 10MB or more, or when the compression ratio is 100%. (I've seen it round to 100% once.)
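The concern about widening fields can be sketched as follows. The layout is a deliberately simplified, hypothetical one (a size right-aligned in 7 characters, one space, then the name) and not the real listing format; NAME_START and the sub names are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical layout, not the real listing format: a size right-aligned
# in 7 characters, one space, then the filename.
use constant NAME_START => 8;

sub make_line {
    my ($size, $name) = @_;
    return sprintf('%7d %s', $size, $name);
}

# Fixed offset: correct only while the size fits in its 7-character column.
sub name_by_substr {
    my ($line) = @_;
    return substr($line, NAME_START);
}

# Whitespace-delimited: unaffected by how wide the size field grows.
sub name_by_split {
    my ($line) = @_;
    return (split ' ', $line, 2)[-1];
}

for my $size (704106, 10485760) {
    my $line = make_line($size, 'file1.dat');
    printf "substr: '%s'  split: '%s'\n",
        name_by_substr($line), name_by_split($line);
}
```

Once the size needs an eighth digit, every later column shifts right by one, so the fixed offset lands on the separator instead of the first character of the name.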
You wouldn't need to run substr on the whole file at one time, you would run it against each line of the file separately like this:
use strict;
use warnings;
use Data::Dumper;
my @Temp;
use constant FileNameStart => 58;
while(<DATA>)
{
chomp();
push(@Temp, substr($_,FileNameStart));
}
print(Dumper(@Temp));
__DATA__
0 Stored 0 0% 04-20-05 08:43 00000000 test 1 2 3.zip
704106 DeflatN 83362 89% 04-04-05 19:00 8e76dc22 file1.dat