Dear Monks,
It's a long time since I have asked a question on SOPW, such is the power of Super Search and the many arcane snippets of information one can find in the darkest corners of the monastery. Alas, this problem has me stumped.
I have automated downloading genome databases from ftp.ncbi.nih.gov/blast/db, uncompressing them, and indexing them with a search engine, prospector.ucsf.edu.
My script runs on both Windows and Sun operating systems.
I currently uncompress the databases with a system call, using either the uncompress command on Unix or a port of the tool on Windows (UNCOMP.exe). One of the databases (/genbank/genpept.fas.gz, Genpept) is a gzip-compressed foo.gz file, and I use the CPAN Compress::Zlib library for that one. I would like to use Compress::Zlib for all the files. I've played with the example scripts and include my modified versions below, but they don't seem to work on the foo.Z files. Has anyone had success with foo.Z files before?
As some of the databases are very large, I don't want to do this in memory.
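For reference, this is roughly what the chunked approach looks like for the foo.gz case that already works for me. It is a minimal sketch, not my production script: the sub name gunzip_stream and the 4096-byte chunk size are just illustrative choices, and it only covers gzip files, not foo.Z.

```perl
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;

# Decompress a gzip file to disk in fixed-size chunks so the whole
# database never has to sit in memory. Returns the bytes written.
# (gunzip_stream is a hypothetical helper name used for illustration.)
sub gunzip_stream {
    my ($in_name, $out_name) = @_;

    my $gz = gzopen($in_name, "rb")
        or die "Can't open gzip input file $in_name: $gzerrno\n";
    open(my $out, '>', $out_name)
        or die "Can't open output file $out_name: $!\n";
    binmode $out;

    my ($buffer, $bytes, $total) = ('', 0, 0);
    # gzread returns the number of bytes read, 0 at end of file, -1 on error.
    while (($bytes = $gz->gzread($buffer, 4096)) > 0) {
        print {$out} $buffer
            or die "Can't write to $out_name: $!\n";
        $total += $bytes;
    }
    die "Error reading $in_name: $gzerrno\n" if $bytes < 0;

    $gz->gzclose();
    close($out) or die "Can't close $out_name: $!\n";
    return $total;
}

# Usage: perl gunzip_stream.pl input.gz output
if (@ARGV == 2) {
    my $written = gunzip_stream(@ARGV);
    print "Wrote $written bytes to $ARGV[1]\n";
}
```

The running total in $total is what I would feed to a progress bar if I knew the final size up front.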
Which leads me to an additional question: is there any way of finding out the original file size before uncompressing the file? I can make a guess by dividing the compressed file size by 0.55 for amino acid files and 0.3 for DNA files. It would be great to know the actual file size, as I could then use it to create a progress bar.
As always, thanks for any tips on solving this problem.
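To make the guess concrete, here is a small sketch of the estimate and how it would drive a progress bar. The sub names and the 'aa'/'dna' keys are made up for illustration; the only real inputs are the 0.55 and 0.3 ratios mentioned above, which are rough figures rather than anything exact.

```perl
#!/usr/bin/perl -w
use strict;

# Rough ratios of compressed size to original size (my own guesses).
my %ratio = (aa => 0.55, dna => 0.3);

# Estimate the uncompressed size in bytes, rounded to the nearest byte.
sub estimate_uncompressed_size {
    my ($compressed_bytes, $type) = @_;
    die "Unknown database type: $type\n" unless exists $ratio{$type};
    return int($compressed_bytes / $ratio{$type} + 0.5);
}

# Whole-number percentage for a progress bar. Capped at 100 because
# the estimate can turn out smaller than the real file.
sub percent_done {
    my ($bytes_written, $estimated_total) = @_;
    return 100 if $estimated_total <= 0;
    my $pct = int(100 * $bytes_written / $estimated_total);
    return $pct > 100 ? 100 : $pct;
}

my $file = "yeast.aa.Z";
if (-e $file) {
    my $compressed = -s $file;
    my $estimate   = estimate_uncompressed_size($compressed, 'aa');
    print "Estimated uncompressed size of $file: $estimate bytes\n";
}
```

The cap in percent_done is the weak spot: with an estimate the bar can stall at 100% or finish early, which is why I would prefer the real size.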
The code:
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;

my ($tempFilename2, $tempFilename1, $gz, $buffer, $gzerrno);
$tempFilename1 = "yeast.aa.Z";
$tempFilename2 = "yeast.aa";

open (GZIPFILE, "$tempFilename1")
    || warn "Can't open zip input file: $tempFilename1: $!";
binmode GZIPFILE;
open (DATAFILE, ">>$tempFilename2")
    || warn "Can't open uncompressed data file: $tempFilename2: $!";
binmode DATAFILE;

my $x = inflateInit()
    or die "Cannot create a inflation stream\n" ;
my $input = '' ;
my ($output, $status) ;
while (read(GZIPFILE, $input, 4096))
{
    ($output, $status) = $x->inflate(\$input) ;
    print DATAFILE $output
        if $status == Z_OK or $status == Z_STREAM_END ;
    last if $status != Z_OK ;
}
die "inflation failed\n"
    unless $status == Z_STREAM_END ;
Or using the IO-Zlib library:
#!/usr/bin/perl -w
use strict;
use IO::Zlib;

my $tempFilename1 = "yeast.aa.Z";
my $tempFilename2 = "yeast.aa";

open (DATAFILE, ">>$tempFilename2")
    || warn "Can't open uncompressed data file: $tempFilename2: $!";
my $fh = new IO::Zlib;
if ($fh->open("$tempFilename1", "rb")) {
    print DATAFILE <$fh>;
    $fh->close;
}