PerlMonks  

Splitted Zip

by polettix (Vicar)
on Mar 04, 2006 at 02:42 UTC ( [id://534408]=sourcecode )
Category: Utility Scripts
Author/Contact Info Flavio Poletti / frodo72
Description: Email often imposes a hard limit on the size of the files we can send. Compression comes in handy on these occasions because it reduces the size of the data, but it may not be sufficient. Many tools can produce a split ZIP file, but this approach, while general, demands more knowledge from the recipient, who is obliged to save all the chunks in one directory. Many users simply don't want to grasp this simple concept, and insist on double-clicking on the file they receive.

This is where split-zip.pl comes to the rescue. If it can.

Its purpose is to distribute the files to be sent across multiple ZIP archives, each of which remains valid and self-contained. Thus, the casual user double-clicking on any one of them will be happy and will see some of the files. Of course, this approach fails miserably if you need to send a single, huge file: in that case you're stuck with training your users better.

Note: I only tested it in a few cases; be sure to read the disclaimer at the end!

#!/usr/bin/perl
use strict;
use warnings;
use Carp;

use version; my $VERSION = qv('0.0.1');

use Pod::Usage qw( pod2usage );
use Getopt::Long;
use Archive::Zip;
use Fatal qw( open );

# A little help here
unless (@ARGV) {
   pod2usage(-verbose => 1);
   exit 0;
}

# Grab command-line configurations, croak where needed
my %config;
my_GetOptions(\%config);

# On with the show: first of all, pre-generate a bunch of zips
my @zips;
foreach my $file (@ARGV) {
   my $archive = Archive::Zip->new();
   my $member  = $archive->addFile($file);

   my $contents = zip_it($archive);
   my $size     = length $contents;
   if ($size >= $config{max}) {    # Immediate output
      print {*STDERR} "Single compressed file '$file' beyond limit\n"
        if $size > $config{max};
      save_contents($contents);
   }
   else {
      push @zips, [length($contents), $archive, $member, $contents];
   }
} ## end foreach my $file (@ARGV)

# Order from biggest to smallest
@zips = reverse sort { $a->[0] <=> $b->[0] } @zips;

# Now, try to pack them using some lazy algorithm
while (scalar @zips) {
   my $bigger = shift @zips;
   my ($size, $archive, $member, $contents) = @$bigger;
   for (my $index = 0; $index < @zips; ++$index) {

      # The sum of the zip sizes SHOULD be greater than the size
      # of the zip containing both. At least I hope
      next if $size + $zips[$index][0] > $config{max};

      # Ok, I found a suitable companion, although it may be suboptimal.
      my $companion = splice @zips, $index, 1;

      # Grab its data, and update all stuff
      $archive->addMember($companion->[2]);
      $contents = zip_it($archive);
      $size     = length $contents;

      # splice moved the next candidate into this slot: step back so
      # that ++$index does not skip it
      --$index;
   } ## end for (my $index = 0; $index...

   # Time to save, $contents already contains zipped data
   save_contents($contents);
} ## end while (scalar @zips)

# "Save" archive to a scalar, returns the scalar
sub zip_it {
   my ($archive) = @_;
   my $contents;
   open my $fh, '>', \$contents;
   binmode $fh;
   $archive->writeToFileHandle($fh);
   close $fh;
   return $contents;
} ## end sub zip_it

# Save a scalar to a file, whose name is produced dynamically
sub save_contents {
   my $filename = sprintf "%s%03d.zip", $config{prefix},
     ++$config{counter};
   open my $fh, '>', $filename;
   binmode $fh;
   print {$fh} $_[0];
   close $fh;
   return;
} ## end sub save_contents

# Bottom line from an idea of Aristotle
sub mylength {
   return defined $_[0] && length $_[0];
}

# Get options, perform minimal parameter validation. MINIMAL.
sub my_GetOptions {
   my ($href) = @_;
   GetOptions($href, 'max|m=i', 'prefix|p=s', 'help|h');

   if ($href->{help}) {
      pod2usage(-verbose => 2);
      exit 0;
   }
   unless (mylength $href->{prefix}) {
      print {*STDERR} "please specify a prefix\n\n";
      pod2usage(-verbose => 1);
      exit 1;
   }
   unless (mylength $href->{max}) {
      print {*STDERR} "specify a maximum size or use another tool!\n\n";
      pod2usage(-verbose => 1);
      exit 1;
   }
   return;
} ## end sub my_GetOptions

__END__

=head1 NAME

split-zip.pl - ZIP files in different archives, possibly


=head1 VERSION

This document describes split-zip.pl version 0.0.1


=head1 SYNOPSIS

   # Print these examples
   shell$ split-zip.pl

   # Print documentation, also with --help
   shell$ split-zip.pl -h

   # Split with maximum zip size of 1000000 bytes, with zip files
   # starting by 'zipped' (produces 'zipped001.zip', 'zipped002.zip',
   # and so on). Zip all files in directory :)
   shell$ split-zip.pl -m 1000000 -p zipped *

  
=head1 DESCRIPTION

Email often imposes a hard limit on the size of the files we can send.
Compression comes in handy on these occasions because it reduces the
size of the data, but it may not be sufficient. Many tools can produce
a split ZIP file, but this approach, while general, demands more
knowledge from the recipient, who is obliged to save all the chunks in
one directory. Many users simply don't want to grasp this simple
concept, and insist on double-clicking on the file they receive.

This is where split-zip.pl comes to the rescue. If it can. Its purpose
is to distribute the files to be sent across multiple ZIP archives,
each of which remains valid and self-contained. Thus, the casual user
double-clicking on any one of them will be happy and will see some
of the files. Of course, this approach fails miserably if you need to
send a single, huge file: in that case you're stuck with training your
users better.


=head1 INTERFACE

If you launch the script without parameters, the examples in the
SYNOPSIS are printed out. Otherwise, you have the following options:

=over

=item -h / --help

Print this extended documentation and exit.

=item -m / --max

Set the maximum value for the size of an archive. If the script is
obliged to produce an archive which is bigger, it will emit a
warning on standard error.

This value is in bytes.

=item -p / --prefix

Set the filename prefix. Each zip file will be produced according to
the following schema:

   <prefix>XXX.zip

where <prefix> is the prefix set with this option, and XXX is a
progressive number starting from 001.

=back

Note that for I<real world> usage you have to provide I<both> the
maximum value and the prefix.


=head1 DIAGNOSTICS

The script will complain if you omit either the maximum
size or the prefix, also printing the SYNOPSIS examples.

If C<open()> fails, the script dies with a C<croak>-style error
(courtesy of the Fatal module).

=head1 CONFIGURATION AND ENVIRONMENT

split-zip.pl requires no configuration files or environment variables.


=head1 DEPENDENCIES

Archive::Zip is at the very base of this script. Carp always comes handy.
Getopt::Long and Pod::Usage have become two of my must-have modules. Like
version.


=head1 INCOMPATIBILITIES

None reported.


=head1 BUGS AND LIMITATIONS

No bugs have been reported, but this doesn't mean there aren't any.

The script is surely limited, it only tries to address a specific
problem and solve it if it can. Feel free to extend it with more
features!

Please report any bugs or feature requests through http://rt.cpan.org/


=head1 AUTHOR

Flavio Poletti C<flavio@polettix.it>


=head1 LICENCE AND COPYRIGHT

Copyright (c) 2006, Flavio Poletti C<flavio@polettix.it>. All rights reserved.

This script is free software; you can redistribute it and/or
modify it under the same terms as Perl itself. See L<perlartistic>
and L<perlgpl>.

This script is free software: you may redistribute it and/or
modify it under the same terms as Perl itself. See also
L<perlartistic> and L<perlgpl>.


=head1 DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH
YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE
LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL,
OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

=head1 DISCLAIMER OF WARRANTY (TRANSLATED FROM THE ITALIAN)

Because this software is provided under a free-of-charge licence,
there is no warranty associated with it, for the purposes of and to
the extent permitted by applicable law. Unless otherwise specified
elsewhere, the owner and copyright holder provides this software
"as is" without warranty of any kind, either expressed or implied,
including, among others (but not limited to), any implied warranties
of merchantability and fitness for a particular purpose. The entire
risk as to the quality and performance of this software remains with
you. Should the software prove defective, you assume all
responsibility and costs for all necessary servicing, repair, or
correction.

In no event, unless required by applicable law or governed by a
written agreement, will any of the copyright holders, or any other
party who may modify or redistribute this software as permitted by
the above licence, be held liable to you for damages, including
general, special, incidental, or consequential damages, arising out
of the use or inability to use this software. This includes, purely
by way of example and without limitation, loss of data, unintended or
unwanted alteration of data, losses sustained by you or by third
parties, or a failure of the software to operate with any other
software. This disclaimer of warranty remains in effect even if the
copyright holders, or any other party, have been advised of the
possibility of such damages.

If you decide to use this software, you do so at your own risk. If
you think that the terms of this disclaimer of warranty do not suit
your needs, or your way of regarding software, or the way you have
always treated third-party software, do not use it. If you use it,
you expressly accept this disclaimer of warranty and full
responsibility for any kind of damage, of any nature, that may result
from it.

=cut
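
The packing loop in the script above can be tried out on plain numbers, without building any archive. What follows is only a minimal sketch of that greedy "largest first, then fill with whatever still fits" idea: `pack_sizes` is a hypothetical helper that is not part of split-zip.pl, and it simply adds sizes together, whereas the script re-zips the combined archive and measures the real result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in split-zip.pl): group sizes into bins whose
# total is at most $max. A size that is already over the limit ends up
# alone in its own bin.
sub pack_sizes {
   my ($max, @sizes) = @_;
   my @pending = sort { $b <=> $a } @sizes;    # biggest first
   my @bins;
   while (@pending) {
      my $first = shift @pending;
      my @bin   = ($first);
      my $total = $first;
      my $index = 0;
      while ($index < @pending) {
         if ($total + $pending[$index] > $max) {
            ++$index;    # does not fit, look at the next candidate
            next;
         }

         # Fits: move it into the bin. $index is left alone because
         # splice shifted the following element into this slot.
         my $size = splice @pending, $index, 1;
         push @bin, $size;
         $total += $size;
      }
      push @bins, \@bin;
   }
   return @bins;
}

# One line per bin, sizes joined by '+'
my @bins = pack_sizes(100, 70, 20, 60, 30, 10, 120);
print join('+', @$_), "\n" for @bins;
```

With a limit of 100 the sizes above end up in three bins: 120 alone (over the limit), 70+30, and 60+20+10.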
Replies are listed 'Best First'.
Re: Splitted Zip
by jonadab (Parson) on Mar 05, 2006 at 03:23 UTC
    Of course, this approach fails miserably if you need to send a single, huge file: in that case you're stuck with training your users better.

    There is a way around that too, but it requires some knowledge of the recipient's setup, and requires that the user be able to receive attachments with executable content types. (Still, many users can not receive zipfiles either, due to restrictions on attachment content types, so a fully general solution is not possible in any case.)

    It works like this: you break the large file into chunks, give them sequential filenames, and send the chunks (possibly compressed, if much is to be gained by that). You also send a small custom executable that checks for the presence of all the chunks and, if they are all present, assembles them. If they're not all present, it displays a dialog that basically says, "wait, you don't have all the pieces yet, come back and run me again after you've received all the pieces". The executable can be very small in most cases, so it adds very little overhead. (If the user is on Windows, for instance, it can be a batch file and will probably not exceed a kilobyte unless there are a LOT of pieces to put together. If the user is on practically anything else, it could be a very short Perl script.)

    Of course, this requires that the user trusts you enough to run your executable, and that the user's sysadmin trusts the user enough to let them receive executable email attachments. If either of these conditions fails, then you revert to training the user to assemble the pieces.
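
As a rough illustration of the chunk-and-reassemble idea described above, here is a minimal Perl sketch. Everything in it is an assumption made for the example: the sub names `split_file` and `join_chunks`, and the `<name>.001`, `<name>.002`, ... naming scheme. It prints a message on standard error instead of showing a dialog, and it is not code from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed naming scheme: chunk files are "<name>.001", "<name>.002", ...

# Sender side: cut $file into pieces of at most $chunk_size bytes and
# return the list of chunk file names.
sub split_file {
   my ($file, $chunk_size) = @_;
   open my $in, '<', $file or die "open('$file'): $!";
   binmode $in;
   my @chunks;
   while (1) {
      my $read = read($in, my $buffer, $chunk_size);
      die "read('$file'): $!" unless defined $read;
      last unless $read;    # end of file
      my $name = sprintf '%s.%03d', $file, scalar(@chunks) + 1;
      open my $out, '>', $name or die "open('$name'): $!";
      binmode $out;
      print {$out} $buffer or die "print('$name'): $!";
      close $out or die "close('$name'): $!";
      push @chunks, $name;
   }
   close $in;
   return @chunks;
}

# Receiver side: rebuild $target from $count chunks, but only if every
# piece is there. Returns 1 on success, 0 when pieces are missing.
sub join_chunks {
   my ($target, $count) = @_;
   my @names   = map { sprintf '%s.%03d', $target, $_ } 1 .. $count;
   my @missing = grep { !-e $_ } @names;
   if (@missing) {
      print {*STDERR} "Still waiting for: @missing\n";
      return 0;
   }
   open my $out, '>', $target or die "open('$target'): $!";
   binmode $out;
   for my $name (@names) {
      open my $in, '<', $name or die "open('$name'): $!";
      binmode $in;
      local $/;    # slurp the whole chunk at once
      print {$out} <$in>;
      close $in;
   }
   close $out or die "close('$target'): $!";
   return 1;
}
```

The sender would call something like `split_file('report.pdf', 1_000_000)` (a made-up file name), and the small script mailed along with the pieces would boil down to `join_chunks('report.pdf', 4)`.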

    The larger problem is that all of these solutions assume the user knows how to open attachments. About a third of all users do NOT know how to open attachments. I have found that making the content available via http and putting the URI on a line by itself in the email message is frequently a better solution for this reason, and also because it's less likely to run into problems with filters. That can still happen sometimes, if the network admin _really_ doesn't trust the users. I know a guy who at work is behind a transparent proxy that only allows text/* and strips out Javascript, so if I had to get something to him there I'd probably encode it as Perl code, send or serve it as text/plain, and tell him how to run it from the command line. He's on OS X, so Perl is present; if he were behind such a proxy _and_ on Windows, getting around it would require even more cleverness.
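
One possible reading of "encode it as Perl code" is a plain-text script that carries the file as Base64 after `__DATA__` and writes it back out when run. The sketch below is only a guess at that idea, not what the poster actually had in mind: `make_extractor` is a hypothetical helper, and it assumes the target file name contains no quotes or backslashes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw( encode_base64 );

# Hypothetical helper: returns the text of a Perl script that, when run,
# recreates $data in a file called $filename in the current directory.
# Assumption: $filename contains no quotes or backslashes.
sub make_extractor {
   my ($filename, $data) = @_;
   my $payload = encode_base64($data);    # plain ASCII, 76-column lines
   my $script  = <<'END_OF_SCRIPT';
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw( decode_base64 );
my $filename = 'FILENAME';
open my $out, '>', $filename or die "open('$filename'): $!";
binmode $out;
print {$out} decode_base64(do { local $/; <DATA> });
close $out or die "close('$filename'): $!";
print "saved $filename\n";
__DATA__
END_OF_SCRIPT

   # Fill in the real file name (first and only placeholder)
   $script =~ s/FILENAME/$filename/;
   return $script . $payload;
}
```

The returned text can be saved or served as text/plain; the recipient runs it with `perl` from the command line and gets the original file back in the current directory.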


    Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
      I think that in the "big file" case I'd stick to the ZIP splitting feature instead of rolling my own. My only concern is that in this case you have to be sure that the recipient is sufficiently smart to save the attachments (instead of dumbly double-clicking on them) in the same directory, and launch the main zip file (but this is simpler, it's the only zip file!).

      The mail-an-URL solution is also my preferred one, when feasible. But sometimes it's simply not possible: in the particular case that started the development of my small script, we were taking part in a tender and the rules were that the stuff had to be both sent by email and delivered via CD-ROM.

      Thank you for the suggestions, anyway!

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
