in reply to merge multiple files giving out of memory error
Memory consumption seems to be about 50% of your solution's, or less, with the running time being the same or a bit shorter.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use List::Util qw{ sum };

my %h;
$/ = q();                                      # Paragraph mode: read one blank-line separated record at a time.
while (my $block = <>) {
    my @lines   = split /\n/, $block;
    my $key     = $lines[1];                   # The sequence line identifies the record.
    my ($count) = $lines[3] =~ /\s(\d+)/;      # The count is the number at the end of the fourth line.
    unless (exists $h{$key}) {
        $block =~ s/\n\n?$//;                  # Drop the trailing blank line(s).
        $block =~ s/\s*\d+$//;                 # Drop the count; counts are appended separately below.
        $h{$key} = $block;
    }
    $h{$key} .= "\t$count";                    # Append this file's count to the stored record.
}

for my $key (sort keys %h) {
    my ($match) = $h{$key} =~ /((?:\d+\t*)+)$/;   # The tab-separated counts collected above.
    my @counts  = $match =~ /\d+/g;
    my $sum     = sum(@counts);
    say join "\t", $h{$key}, "count:$sum\n";
}
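If you save the script as, say, merge.pl (the name is mine, not part of the original post), you can feed it all the input files at once; paragraph mode makes the diamond operator return one blank-line-separated record per read:

    perl merge.pl file* > merged

For every distinct sequence, the output reprints the first record seen for that sequence, with the individual counts appended as tab-separated fields and a count:N total at the end.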
If you're interested, here's how I created the input data:
#!/usr/bin/perl
use warnings;
use strict;

my ($n, $size, $dna_length) = @ARGV;

# Each record is a four-line, FASTQ-like block. The single-quoted heredoc
# keeps @ns from being interpolated.
my $template = << '__EOT__';
@ns
%s
+
//%s %d
__EOT__

open my $OUT, '>', "file$n" or die $!;
for my $i (1 .. $size) {
    my $dna     = join q(), map qw( A C G T )[ rand 4 ], 1 .. $dna_length;
    my $ignored = join q(), map +('A' .. 'Z')[ rand 26 ], 1 .. 4;
    my $count   = 1 + int rand 20;
    printf {$OUT} $template, $dna, $ignored, $count;
    print {$OUT} "\n" unless $i == $size;      # Blank line between records, but not after the last one.
}
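For reference, each record the template produces is a four-line, FASTQ-like block; with made-up sequence, tag, and count it looks like this:

    @ns
    ACGTGA
    +
    //QWER 17

The merge script above keys on the second line (the sequence) and pulls the count from the number at the end of the fourth line.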
Tested with 1000 files, each with size = 10000 records, where the DNA length was 6 or 10.
#!/bin/bash
for i in {000..999} ; do
    create-file $i 10000 6
done
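To compare peak memory of the two solutions, one option (my choice of tool, not necessarily what was used here) is GNU time's verbose mode, which reports the maximum resident set size of the process:

    /usr/bin/time -v perl merge.pl file* > merged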
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,