in reply to Get random unique lines from file
This script takes the name of the file and the number of sequences you want and outputs that number of randomly selected sequences.
It is simple, fast, and should handle any size of input file with minimal memory usage, because it keeps only the byte offset of each sequence in memory, never the sequences themselves:
    #! perl -slw
    use strict;
    use List::Util qw[ shuffle ];

    # Treat '>' as the record separator, so each read ends at the next header.
    $/ = '>';
    open FASTA, '<:raw', $ARGV[0] or die $!;

    # Index pass: after each read, tell() is the offset just past a '>',
    # i.e. the start of the next sequence. The final offset is end-of-file.
    my @seqPosns;
    push @seqPosns, tell( FASTA ) while <FASTA>;

    @seqPosns = shuffle @seqPosns;

    # Note: the range 0 .. N is inclusive, so this slice holds N+1 offsets
    # (the run below asks for 2 and prints 3). The count defaults to 10.
    for ( @seqPosns[ 0 .. $ARGV[ 1 ] // 10 ] ) {
        seek FASTA, $_, 0;
        my $seq = <FASTA>;
        chomp( $seq );    # remove the trailing '>' (the current $/)
        chop( $seq );     # remove the trailing newline
        print '>', $seq;
    }

    close FASTA;
    __END__
    C:\test>988096.pl C:/dell/test/LCS/bioMan.fasta 2
    >af418682
    TTCCACAACTTTCCACCAAGCTCTACAAGATCCCAGAGTCAGGGGCCTGTATTTTCC
    TGGGTCTTTTGGGCTTTGCCGCTCCATTTACACAATGTGGTTATCCTGCATTAATGC
    ACTTCTTTCCTTCAGTACGAGATCTCCTAGATACCGCCTCAGCTCTATATCGGGAAG
    TCAAACAATCCAGATTGGGACTTCAACCCCATCAAGGACCACTGGCCACAAGCCAAC
    >ab033557
    CTCCACGACATTCCACCAAGCTCTGCTAGATCCCAGAGTGAGGGGCCTTTACTTTCC
    TGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCCTTAATGC
    ACTTCTTTCCTTCCATTCGAGATCTTCTCGACACCGCCTCTGCTCTGTATCGGGAGG
    TCAAACAATCCAGATTGGGACTTCAACCCCAACAAGGATCAATGGCCAGAAGCAAAT
    >x97850
    CTCCACAACTTTCCTCCAAACTCTTCAAGATTCCAGAGTCAGGGCCCTGTACCTTCC
    TGGGTCTTTTGGGGTTTGCCGCCCCTTTCACGCAATGTGGATATCCTGCTTTAATGC
    ACTTTTTTCCTTCTATTCGAGATCTCCTCGACACCGCCTCTGCTCTGTATCGGGAGG
    TCAGAAAATCCAGATTGGGACCTCAACCCGCACAAGGACAACTGGCCGGACGCCAAC
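For anyone who wants exactly the requested number of sequences, here is a minimal sketch of the same index-shuffle-seek idea wrapped in a sub. The name `random_fasta_records` is hypothetical and not part of the script above. As assumptions of this sketch, it discards the final offset (which is end-of-file, not a sequence start), clamps the request to the number of sequences available, and strips all trailing whitespace from each record.

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical helper (an illustration, not the original script):
# return up to $want randomly chosen FASTA records from $path.
sub random_fasta_records {
    my ( $path, $want ) = @_;

    local $/ = '>';    # each read ends at the next header marker
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Index pass: record the offset just past each '>'.
    my @posns;
    push @posns, tell( $fh ) while <$fh>;
    pop @posns;        # the last offset is end-of-file, not a sequence

    my @picked = shuffle @posns;
    $want = @picked if $want > @picked;    # never ask for more than exist

    my @records;
    for my $pos ( @picked[ 0 .. $want - 1 ] ) {
        seek( $fh, $pos, 0 ) or die "seek failed: $!";
        my $rec = <$fh>;
        chomp $rec;             # drop the trailing '>' (current $/)
        $rec =~ s/\s+\z//;      # drop trailing newline(s)
        push @records, ">$rec";
    }

    close $fh;
    return @records;
}

# Command-line use: script.pl file.fasta [count], count defaulting to 10.
if ( @ARGV ) {
    my ( $file, $n ) = @ARGV;
    print "$_\n" for random_fasta_records( $file, $n // 10 );
}
```

Memory use stays proportional to the number of sequences, not the file size, because only the offsets are held.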
Re^2: Get random unique lines from file
by Marshall (Canon) on Aug 17, 2012 at 23:26 UTC
by BrowserUk (Patriarch) on Aug 17, 2012 at 23:56 UTC