comment on

This is untested (so may need minor tweaks), and the error messages and checking are prefunctory, but it should be close:

Update: Corrected a few typos and tested on a small (500,000) sequence dataset (It took 3.5 minutes, so yours should be done in around an hour. It uses < 2 MB ram)

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

sub readFASTA {
    my( $fh ) = @_;
    local $/ = "\n";
    my $header = <$fh>;
    $header = substr $header, 1 if ord $header == ord '>'; ## First li
+ne
    chomp $header;
    local $/ = '>';
    my $seq = <$fh>;
    chomp $seq; ## get rid of trailing '>';
    $seq =~ tr[\n][]d;
    return $header, \$seq;
}

sub writeFASTA {
    local $\ = "\n";
    my( $fh, $header, $refSeq, $lineLength ) = @_;
    $lineLength ||= 50;
    my $length = length( $$refSeq );
    my $chunks = int( $length / $lineLength );
    --$chunks unless $length % $lineLength;
    print $fh '>', $header;
    print $fh substr $$refSeq, $_*$lineLength, $lineLength for 0 .. $c
+hunks;
    return;
}


open SMALL, '<', 'theSmallFile' or die $!;
open BIG,   '<', 'theBigFile'   or die $!;

until( eof( SMALL ) ) {
    my( $bigHdr, $bigRef ) = readFASTA( \*BIG );

    my( $hdrLeft,  $leftRef  ) = readFASTA( \*SMALL );
    my( $leftId, $leftLen )  = $hdrLeft =~ m[([^_]+)_left\s+length=(\d
++)];
    die "Bad small left header '$hdrLeft'"
        unless defined $leftId and defined $leftLen;

    my( $hdrRight, $rightRef ) = readFASTA( \*SMALL );
    my( $rightId, $rightLen ) = $hdrRight =~ m[([^_]+)_right\s+length=
+(\d+)];
    die "Bad small right header '$hdrRight'"
        unless defined $rightId and defined $rightLen;

    die "Out of sequence data at record $.: $bigHdr/$leftId/$rightId"
        unless $bigHdr eq $leftId and $bigHdr eq $rightId;

    my @bits = split ' ', $$bigRef;

    writeFASTA(
        \*STDOUT, $hdrLeft,
        \join(' ', @bits[ 0 ..  $leftLen -1 ] ), 1e6
    );
    writeFASTA(
        \*STDOUT, $hdrRight,
        \join( ' ', @bits[ -$rightLen ..  -1 ] ), 1e6
    );

}
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^3: When the input file is huge !!! by BrowserUk
in thread When the input file is huge !!! by sugar

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.