in reply to Creating very long strings from text files (DNA sequences)

You could try just setting $/ to "\n>". That is, set the "input record separator" to what actually separates your input records. Gotta love perl for its ability not to answer the question, but rather to very simply change the question to a different one (to which the answer is trivial).

This would look something like this:

    local $/ = "\n>";   # records are now separated by a newline followed by ">"
    my %seqs_hash;
    while (my $line = <SEQS>) {
        chomp $line;                      # strip the trailing "\n>"
        # Peel off the header line (only the first record keeps its leading ">")
        $line =~ s/^>?\s*(.+?)\r?\n//
            or do { warn "Malformatted sequence\n"; next; };
        my $seq_name = $1;
        $line =~ s/\s//g;                 # drop the newlines within the sequence
        $seqs_hash{$seq_name} = $line;
    }
That way you're only doing one read per sequence, so there's no need for string concatenation at all.
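
To actually use it, you'd open the filehandle first; something like this (assuming a FASTA-style file named seqs.fa):

    open SEQS, '<', 'seqs.fa' or die "Can't open seqs.fa: $!";
    # ... the record-reading loop above ...
    close SEQS;

    # Each hash value is now one complete, whitespace-free sequence:
    print "$_: ", length($seqs_hash{$_}), " bases\n" for sort keys %seqs_hash;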
------------ :Wq Not an editor command: Wq

Re^2: Creating very long strings from text files (DNA sequences)
by bobychan (Initiate) on Jun 26, 2004 at 17:04 UTC
    I tried the suggestions of Anonymous Monk and etcshadow. Unfortunately...

    • .= in perl 5.6: 30 seconds
    • store in array, join into string: 5 minutes
    • set $/ to "\n>": 4 minutes
    Sorry about the typo, etcshadow!
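
    For what it's worth, the timings above came from reading a real sequence file; a minimal sketch for timing just the two string-building strategies in isolation (using the core Benchmark module and synthetic stand-in data) might look like:

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Synthetic stand-in for the lines of one long sequence
        my @lines = ('ACGT' x 15) x 10_000;

        cmpthese(-5, {
            # repeated .= concatenation
            concat => sub {
                my $seq = '';
                $seq .= $_ for @lines;
            },
            # push into an array, join once at the end
            array_join => sub {
                my @parts;
                push @parts, $_ for @lines;
                my $seq = join '', @parts;
            },
        });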

    I glanced through the perl58delta, but did not see any architectural changes to memory allocation. Could there be something else causing the slowdown?

    At this point, I've become interested in this issue above and beyond just solving my particular sequence-reading problem. Is this a bug or a feature of perl 5.8...?

      set $/ to "\n": 4 minutes
      Was that a typo? Did you mean: set $/ to "\n>" (with a greater-than sign)? Because if you did it with just $/ set to "\n", then you haven't actually changed anything: "\n" is already the default input record separator, so you're still reading one line at a time.

      Of course, if that's just a typo, then I could see that my "solution" (which I didn't have any data to benchmark with... or a perl 5.8 install to benchmark against) doesn't help significantly... and that's entirely possible, but I'm curious nonetheless.
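
      A quick way to see the difference (assuming a hypothetical seqs.fa; the counts in the comments are for a file holding three multi-line sequences):

          open my $fh, '<', 'seqs.fa' or die "Can't open seqs.fa: $!";
          {
              local $/ = "\n";         # the default: one read per line
              my $reads = 0;
              $reads++ while <$fh>;
              print "with \\n:  $reads reads\n";   # one per line of the file
          }
          seek $fh, 0, 0;              # rewind and re-read
          {
              local $/ = "\n>";        # one read per sequence record
              my $reads = 0;
              $reads++ while <$fh>;
              print "with \\n>: $reads reads\n";   # one per sequence, i.e. 3
          }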

      ------------ :Wq Not an editor command: Wq