in reply to unique sequences
It looks like you need something like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $input_file  = '/scratch/Drosophila/dmel-all-chromosome-r6.02.fasta';
    my $output_file = 'unique12KmersEndingGG.fasta';

    open my $FASTA, '<', $input_file  or die "Cannot open '$input_file' because: $!";
    open my $KMERS, '>', $output_file or die "Cannot open '$output_file' because: $!";

    my ( $count, %unique_data );

    while ( my $line = <$FASTA> ) {
        next if $line =~ /^>/;    # skip FASTA header lines
        while ( $line =~ / ( .{9} [ATCG]{10} G \K G ) /gsx ) {
            # print each captured sequence only the first time it is seen
            print $KMERS '>crispr_', ++$count, "\n$1\n" unless $unique_data{ $1 }++;
        }
    }
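To try the de-duplication without the genome file, here is a minimal sketch that runs the same regex over a few made-up in-memory lines. The `unique_kmers` helper and the sample sequences are assumptions for illustration only; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns each matching sequence once, in first-seen order.
sub unique_kmers {
    my @lines = @_;    # copy the arguments so the /g matches run on our own scalars
    my ( @kmers, %seen );
    for my $line (@lines) {
        next if $line =~ /^>/;    # skip FASTA header lines
        while ( $line =~ / ( .{9} [ATCG]{10} G \K G ) /gsx ) {
            # %seen is 0 (false) the first time a sequence turns up
            push @kmers, $1 unless $seen{$1}++;
        }
    }
    return @kmers;
}

# Made-up sample data standing in for the FASTA file.
my @sample = (
    ">demo_header\n",
    "AAAAAAAAAACGTACGTACGG\n",
    "AAAAAAAAAACGTACGTACGG\n",    # exact duplicate of the previous line
    "TTTTTTTTTACGTACGTACGA\n",    # contains no GG, so nothing matches
);

my $count = 0;
print '>crispr_', ++$count, "\n$_\n" for unique_kmers(@sample);
```

With this sample data only one record should come out, because the second sequence line repeats the first and the third has no GG in it.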