Total number of lines for a Zipped file.

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Total number of lines for a Zipped file. by mce (Curate) on Oct 29, 2002 at 13:20 UTC
Cookbook, pages 278-279:
`open(FILE, "< $filename") or die "can't open $filename: $!";  # use < for taint reasons`
`$count += tr/\n/\n/ while sysread(FILE, $_, 2 ** 16);`
I hope this helps.
---------------------------
Dr. Mark Ceulemans, Senior Consultant, IT Masters, Belgium
Re: Total number of lines for a Zipped file. by BrowserUk (Patriarch) on Oct 29, 2002 at 13:39 UTC
I'm not sure what your performance requirements are, but the following will scan a single directory of 100+ .pl files (for example) on my system and count and print the number of lines in each, in the format `file:nn`, in less than 1 second, including printing. With the output redirected to the null device it does several thousand files in less than 5 seconds.

Update: Scrunched to prevent wrapping.

`perl -e"@_=map{glob}@ARGV;print$_,':'and$_=do{local(@ARGV,$/)=$_;<>},print$£=s!$/!!g,$/for@_" c:\\.pl`

It is set up for use on a Win32 system, so it has extra stuff for getting around CMD's lack of globbing.

I did some very unscientific tests on zip files (using WINZIP 7.0) and discovered that for simple text files there appears to be a relationship between the number of NL sequences in the zip and the number of lines in the file, i.e. text-file-lines + 3 = zip-file-NLs. Maybe you could find a similar relationship using your favorite zip utility on your system.

Update: It seems that at least one person felt that the above was too obfuscated, so here's an expanded version as a normal program with a few comments.

    #! perl -sw
    use strict;

    die "Usage: $0 filespec|glob [[filespec|glob] ... ]\n" unless @ARGV;

    # On Win32, CMD doesn't do globbing, so we need to expand @ARGV ourselves.
    @_ = map{glob}@ARGV;

    # Print the name of the file and a colon
    print $_, ':'

    # Slurp the contents of the file into $_
    and $_ = do{ local( @ARGV, $/ ) = $_; <>; },

    # Use s///g to look for $/ (newline for your system) in the scalar read above.
    # By assigning s///g to a scalar it returns the number of matches,
    # so we print that and a newline
    print $£ = s!$/!!g, $/

    # for all the files we found by globbing the command line arguments.
    for @_;
    __END__

Nah! You're thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy
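The newline relationship observed above depends on the zip utility and the data, so a more direct route is to decompress each archive member and count its newlines. The following is only a minimal sketch, assuming the `IO::Uncompress::Unzip` module (part of the IO-Compress distribution, which I believe has shipped with Perl since 5.10) is available. The helper name `count_zip_lines` is hypothetical and not from the original post.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: returns a hash of member name => newline count
# for every member of a .zip archive.
sub count_zip_lines {
    my ($zipfile) = @_;

    my $u = IO::Uncompress::Unzip->new($zipfile)
        or die "Cannot open $zipfile: $UnzipError\n";

    my %lines;
    my $status;

    # Walk the archive one member at a time.
    for ($status = 1; $status > 0; $status = $u->nextStream()) {
        my $name = $u->getHeaderInfo()->{Name};
        $lines{$name} = 0;

        # Decompress the member block by block and count newlines,
        # so the whole member never has to fit in memory.
        my $buf;
        while (($status = $u->read($buf)) > 0) {
            $lines{$name} += ($buf =~ tr/\n//);
        }
        last if $status < 0;
    }
    die "Error processing $zipfile: $UnzipError\n" if $status < 0;

    return %lines;
}

# Print "archive/member:count" for each archive named on the command line.
for my $zip (@ARGV) {
    my %lines = count_zip_lines($zip);
    printf "%s/%s:%d\n", $zip, $_, $lines{$_} for sort keys %lines;
}
```

As with `wc -l`, this counts newline characters, so a final line without a trailing newline is not counted.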
Re: Total number of lines for a Zipped file. by thor (Priest) on Oct 29, 2002 at 12:52 UTC
Quick hack: `gunzip -c <filename> | wc -l`
thor
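Where `gunzip` and `wc` are not available (on Win32, for instance), roughly the same result can be had from Perl alone by applying the chunked `tr` counting from the Cookbook reply to a decompressed stream. This is a sketch under assumptions: it relies on the `IO::Uncompress::Gunzip` module being installed, and the helper name `count_gz_lines` is made up for illustration.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: counts newline characters in the decompressed
# contents of a gzip file without holding the whole file in memory.
sub count_gz_lines {
    my ($path) = @_;

    # MultiStream => 1 also reads concatenated gzip streams in one file.
    my $gz = IO::Uncompress::Gunzip->new($path, MultiStream => 1)
        or die "Cannot open $path: $GunzipError\n";

    my $count = 0;
    my $buf;

    # read() returns the number of uncompressed bytes, 0 at end of file,
    # or a negative value on error.
    while ((my $status = $gz->read($buf)) != 0) {
        die "Error reading $path: $GunzipError\n" if $status < 0;
        $count += ($buf =~ tr/\n//);
    }
    $gz->close;

    return $count;
}

# Print "file:count" for each gzip file named on the command line.
printf "%s:%d\n", $_, count_gz_lines($_) for @ARGV;
```

Like `wc -l`, this counts newline characters, so a last line with no trailing newline is not included in the total.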