Hello all,
I have a gzipped file that contains a string of two-bit binary codes. I'm trying to read it in to do some work on the contents. However, when I read it with IO::Uncompress::AnyUncompress, each read ends before the end of the file. Reading through a gunzip -c | pipe works just fine, though. I've included a file snippet and my code below. Is this a bug in IO::Uncompress::AnyUncompress, or is something wrong with how my code uses it?
Edit: The file snippet below shows where the read ends, on line 899924, in each of the three files I'm processing. I don't know whether those codes together add up to some sort of stop code.
1 899682 <B6>^E
1 899740 <B5>^E
1 899766 <B6>^E
1 899767 <B6>^E
1 899816 <B5>^E<B8>^E<BD>^E
1 899915 <B5>^E<B6>^E<B8>^E<BD>^E
1 899924 <B5>^E<B6>^E<B8>^E<BD>^E
1 900121 <B5>^E
1 900159 <B6>^E
1 900373 <B5>^E<B6>^E<B8>^E<BD>^E
1 900686 <B5>^E<B6>^E<BD>^E
1 900791 <B6>^E
1 900902 <B5>^E<B6>^E<B8>^E<BD>^E
1 900903 <B5>^E<B6>^E<B8>^E<BD>^E
1 901004 <B8>^E
1 901005 <B8>^E
1 901020 <B5>^E<B6>^E<B8>^E<BD>^E
1 901092 <B5>^E<B8>^E<BD>^E
1 901129 <B5>^E<B6>^E<B8>^E<BD>^E
1 901188 <B5>^E
1 901369 <B6>^E<B8>^E<BD>^E
1 901423 <B5>^E<BD>^E
Working code below results in line count 5042137.
my $lineCount = 0;
foreach my $file (@whiFiles) {
    print "reading $file\n";
    open IN, "gunzip -c $file |" or die "Can't open file $!";
    while (<IN>) {
        next if ($_ =~ /^#/);
        chomp;
        my ($chr, $pos, $codes) = split(/\t/, $_);
        $lineCount++;
    }
    close IN;
}
print "Line count is $lineCount\n";
Non-working code below results in line count 13263.
my $lineCount = 0;
foreach my $file (@whiFiles) {
    print "reading $file\n";
    my $HANDLE = new IO::Uncompress::AnyUncompress($file, Transparent => 1,
                                                   AutoClose => 1) or die;
    while (<$HANDLE>) {
        next if ($_ =~ /^#/);
        chomp;
        my ($chr, $pos, $codes) = split(/\t/, $_);
        $lineCount++;
    }
    close $HANDLE;
}
print "Line count is $lineCount\n";
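One thing that might be worth checking (this is an assumption, since nothing above confirms how the files were produced): if each file is really several gzip streams written back to back, for example from concatenating .gz files or from a block-compressing tool, then gunzip -c reads all of the streams, while the IO::Uncompress modules stop after the first stream unless MultiStream => 1 is passed. The sketch below builds a small two-stream test file (the sample data and the count_lines helper are made up for illustration) and counts lines with and without that option.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyUncompress qw($AnyUncompressError);

# Made-up sample data: two separate chunks, each compressed on its own.
my $first  = "1\t100\tA\n1\t200\tB\n";
my $second = "1\t300\tC\n";
my ($gz1, $gz2);
gzip \$first  => \$gz1 or die "gzip failed: $GzipError";
gzip \$second => \$gz2 or die "gzip failed: $GzipError";

# Write both gzip streams back to back into one temporary file.
my ($tmp_fh, $tmp_name) = tempfile(SUFFIX => '.gz', UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} $gz1, $gz2;
close $tmp_fh or die "Can't close $tmp_name: $!";

# Hypothetical helper: count non-comment lines, passing any extra
# constructor options (such as MultiStream) straight through.
sub count_lines {
    my ($file, %opts) = @_;
    my $z = IO::Uncompress::AnyUncompress->new($file, Transparent => 1, %opts)
        or die "Can't open $file: $AnyUncompressError";
    my $n = 0;
    while (my $line = <$z>) {
        next if $line =~ /^#/;
        $n++;
    }
    close $z;
    return $n;
}

my $default = count_lines($tmp_name);                    # first stream only
my $multi   = count_lines($tmp_name, MultiStream => 1);  # all streams
print "default: $default, MultiStream: $multi\n";
```

If the real files behave the same way, adding MultiStream => 1 to the constructor call in the non-working code would be the change to try. If the counts still differ from the gunzip -c result, the cause is something else.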