
Not sure whether this is useful, but it loads this 165MB FASTA file in 1.156 seconds, using 165MB of RAM, under Perl 5.8.3. Iteration performance isn't too bad either.

package FASTA::Faster;
use strict;
use warnings;
use Carp;

my %raw;
my %seq;

our $DEBUG = 0;

sub TIEHASH {
    $DEBUG and carp "TIEHASH: @_";
    my( $class, $file, @options ) = @_;

    my $self = bless \$file, $class;
    open my $in, '< :raw', $file or croak "$file : $!";
    sysread( $in, $raw{ $self }, -s $file ) or croak "$file : $!";
    close $in;

    $raw{ $self } .= "\n>";    ## Update: Make sure we capture the last record.

    $seq{ $self }{ $1 } = \substr( $raw{ $self }, $-[ 2 ], $+[ 2 ] - $-[ 2 ] )
        while $raw{ $self } =~ m[>(\S+)\s[^\n]*?\n(.*?)\n(?=>)]sg;

    return $self;
}

use constant {
    SELF => 0,
    KEY  => 1,
};

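sub FETCH {
    $DEBUG and carp "FETCH: @_";
    ## Unknown IDs return undef instead of dereferencing undef under strict.
    my $ref = $seq{ $_[ SELF ] }{ $_[ KEY ] } or return;
    my $value = $$ref;       ## the sequence is copied only here
    $value =~ tr[\n][]d;     ## strip the embedded line breaks
    $value;
}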

sub EXISTS {
    $DEBUG and carp "EXISTS: @_";
    exists $seq{ $_[ SELF ] }{ $_[ KEY ] };
}

sub FIRSTKEY {
    $DEBUG and carp "FIRSTKEY: @_";
    keys %{ $seq{ $_[ SELF ] } };
    each %{ $seq{ $_[ SELF ] } };
}

sub NEXTKEY {
    $DEBUG and carp "NEXTKEY: @_";
    each %{ $seq{ $_[ SELF ] } };
}

sub SCALAR {
    $DEBUG and carp "SCALAR: @_";
    croak 'Not implemented';
}

sub STORE {
    $DEBUG and carp "STORE: @_";
    croak 'Not implemented';
}

sub DELETE {
    $DEBUG and carp "DELETE: @_";
    croak 'Not implemented';
}

return 1 if caller;

package main;
use Benchmark::Timer;
my $T = Benchmark::Timer->new;
local $\=$/;

my %sequence;

$T->start( 'load' );
my $seqRef = tie %sequence, 'FASTA::Faster', 'na_clones.dros.RELEASE2.5';
$T->stop( 'load' );

$T->start( 'keys' );
map $_, keys %sequence;
$T->stop( 'keys' );
print scalar keys %sequence;

$T->start( 'values' );
map $_, values %sequence;
$T->stop( 'values' );
print scalar values %sequence;

$T->report;

printf 'Check memory'; <STDIN>;

my( $key, $value );

print "$key =>\n$value\n" while ( $key, $value ) = each %sequence;

__END__
P:\test\FASTA>perl faster.pm
940
940
1 trial of load (1.165s total)

1 trial of keys (12.100ms total)

1 trial of values (12.311ms total)
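
For anyone wondering why the load is cheap: TIEHASH never copies the sequences out of the slurped buffer. It only stores references to substr lvalues, using the capture offsets in @- and @+, and the copy happens later in FETCH. Below is a minimal standalone sketch of that idea, without the tie. The two-record buffer, %index and fetch_sequence() are made up for illustration and are not part of the module above.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA buffer standing in for the slurped file.
my $raw = ">seq1 first record\nACGT\nTTGA\n>seq2 second record\nGGCC\n";
$raw .= "\n>";    # sentinel so the last record also matches

my %index;
# Same pattern as the post: $1 is the ID, group 2 spans the sequence lines.
while ( $raw =~ m[>(\S+)\s[^\n]*?\n(.*?)\n(?=>)]sg ) {
    # @- and @+ hold the start and end offsets of each capture group,
    # so this stores a reference into $raw rather than a copy.
    $index{ $1 } = \substr( $raw, $-[ 2 ], $+[ 2 ] - $-[ 2 ] );
}

# Illustrative helper (not in the original module): copy on demand.
sub fetch_sequence {
    my( $id ) = @_;
    return undef unless exists $index{ $id };
    my $value = ${ $index{ $id } };    # the copy happens only here
    $value =~ tr[\n][]d;               # strip embedded line breaks
    return $value;
}

print fetch_sequence( 'seq1' ), "\n";
print fetch_sequence( 'seq2' ), "\n";
```

The trade-off is that every lookup pays for a copy plus the tr///, which is fine if you touch each sequence a few times but worth caching if you hit the same ID repeatedly.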

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

In reply to Re: Creating very long strings from text files (DNA sequences) by BrowserUk
in thread Creating very long strings from text files (DNA sequences) by bobychan
