
Hello, I've been working my way into Perl for about a week now. I need it to write scripts for manipulating DNA sequences. As a first step, I need to cut off part of a string. This might be trivial for an experienced programmer, but right now I don't know where to go from here...
I've heard of BioPerl, but since I'm quite new to Perl AND to programming in general, I get buried under an avalanche of information and really don't know where to look first. Plus, I need to get some results rather quickly.
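For the very first step, cutting off part of a string, Perl's built-in `index` and `substr` are enough, so no module (BioPerl or otherwise) is needed. A minimal sketch; the helper name `cut_after` and the example sequence are made up for illustration:

```perl
use strict;
use warnings;

# Return the part of $seq that follows the first occurrence of $primer,
# or undef when $primer does not occur in $seq.
sub cut_after {
    my ($seq, $primer) = @_;
    my $pos = index($seq, $primer);     # index() returns -1 when not found
    return undef if $pos < 0;
    # substr() with only an offset returns everything up to the end
    return substr($seq, $pos + length($primer));
}

my $V1 = 'AGAGTTTGATCCTGGCTCAG';        # the V1 forward primer from the script
print cut_after('ACGTAGAGTTTGATCCTGGCTCAGTTGACCA', $V1), "\n";   # prints TTGACCA
```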

So, what I want the script to do is read in a file (FASTA format), compare the beginning of every sequence with some specified strings (primers) and, when there's a match, remove the matching string from the sequence. The trimmed sequences should be stored in a new file and, perhaps as a control, the non-matching sequences in another file. This leads to another question: do you have to store the processed data in an array before writing it to an output file, or can it be written directly?
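On the array question: the records do not have to be collected in an array first. A minimal sketch, assuming ordinary FASTA input (a `>` identifier line followed by one or more sequence lines) and that "match" means the primer sits at the very start of the sequence. The sub name `sort_fasta` and the file names in the comments are hypothetical:

```perl
use strict;
use warnings;

# Copy FASTA records from $in to $kept_fh (the sequence starts with one of
# the primers, which is cut off) or to $rejected_fh (no primer at the start).
# Every record is printed as soon as it is complete, so only the current
# record is held in memory; no array of the whole file is needed.
sub sort_fasta {
    my ($in, $kept_fh, $rejected_fh, @primers) = @_;
    my ($header, $seq);

    # print the record collected so far to the right output
    my $write_record = sub {
        return unless defined $header;              # nothing collected yet
        for my $primer (@primers) {
            if (index($seq, $primer) == 0) {        # primer at the very start
                print {$kept_fh} $header, "\n", substr($seq, length($primer)), "\n";
                return;
            }
        }
        print {$rejected_fh} $header, "\n", $seq, "\n";
    };

    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>/) {                        # identifier: new record
            $write_record->();
            ($header, $seq) = ($line, '');
        }
        else {
            $seq .= $line;                          # sequence may span lines
        }
    }
    $write_record->();                              # the last record
    return;
}

# Usage with real files (the file names are placeholders):
# open(my $in,  '<', 'input.txt')    or die "input.txt: $!";
# open(my $ok,  '>', 'trimmed.txt')  or die "trimmed.txt: $!";
# open(my $bad, '>', 'rejected.txt') or die "rejected.txt: $!";
# sort_fasta($in, $ok, $bad, 'AGAGTTTGATCCTGGCTCAG');
```

Writing each record as soon as it is complete keeps memory use flat regardless of file size. Note that the sketch writes every sequence back on a single line, even if it was wrapped over several lines in the input.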
Normally, the primers start at the very beginning of the sequence, but to catch possible errors it might be useful to also delete anything in front of the primer. (By the way, each sequence is preceded by a unique identifier line starting with ">", so an identifier and its sequence should always stay together.)
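Deleting anything in front of the primer as well fits a single substitution: a non-greedy `^.*?` followed by the primer, quoted with `\Q...\E` so that it is matched as a literal string rather than as a pattern. A minimal sketch; `trim_primer` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Remove $primer and anything in front of it from $seq.
# Returns the trimmed sequence, or undef if $primer is not found.
# ($seq is a copy, so the caller's string is not changed.)
sub trim_primer {
    my ($seq, $primer) = @_;
    # ^.*?   : the shortest possible run of leading bases (non-greedy)
    # \Q..\E : treat the primer as a literal string
    return ($seq =~ s/^.*?\Q$primer\E//) ? $seq : undef;
}

my $V1 = 'AGAGTTTGATCCTGGCTCAG';
for my $s ('AGAGTTTGATCCTGGCTCAGTTGA', 'NNAGAGTTTGATCCTGGCTCAGTTGA', 'TTGACCATTT') {
    my $t = trim_primer($s, $V1);
    print defined $t ? "trimmed: $t\n" : "no primer: $s\n";
}
```

Since only the sequence string is modified, its identifier line is untouched; keeping the two together is then a matter of reading and writing them as one record.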

Also, I was wondering whether it would make a difference in speed to check the entire file (a couple of thousand lines) separately for each primer, or to go line by line and check every primer against each line (so the file is read only once). Does this make sense? :o)
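On the speed question: both approaches perform at most (number of sequences) × (number of primers) comparisons, so the main difference is that going line by line reads the file only once and can stop at the first primer that matches, instead of rescanning all the data once per primer. For a few thousand lines either is likely to be fast. A further option is to join all primers into one regular expression; perl 5.10 and later typically compile an alternation of plain strings into a trie, so all primers are tried in a single scan. A sketch, where `make_primer_regex` and `match_any` are hypothetical helpers and the barcode.primer strings are built from values in the script:

```perl
use strict;
use warnings;

# Combine all primers into one pattern. ^.*? lets anything precede the
# primer; the capture group records which primer actually matched.
sub make_primer_regex {
    my @primers = @_;
    # longest first: at a given position the longer primer wins the tie
    my $alt = join '|', map { quotemeta($_) }
                        sort { length($b) <=> length($a) } @primers;
    return qr/^.*?($alt)/;
}

# Test one sequence against all primers at once.
# Returns (matched primer, trimmed sequence), or (undef, original sequence).
sub match_any {
    my ($seq, $re) = @_;
    return (undef, $seq) unless $seq =~ $re;
    return ($1, substr($seq, $+[0]));   # $+[0]: offset where the match ended
}

my $V1 = 'AGAGTTTGATCCTGGCTCAG';
my @primers = map { $_ . $V1 } ('AGCCTAAGCT', 'TCAAGTTAGC');   # barcode.primer
my $re = make_primer_regex(@primers);

for my $seq ('NNTCAAGTTAGC' . $V1 . 'ACGT', 'TTTTTTTT') {
    my ($primer, $rest) = match_any($seq, $re);
    print defined $primer ? "matched $primer, kept $rest\n" : "no match: $seq\n";
}
```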

Don't mind the comments too much :)

Thanks in advance!

#!C:/Perl/bin/perl
use strict;
use warnings;
use File::Path;

# This script processes a fasta file containing DNA sequences

# Part 1: declare variables, constants, ...

# forward (F) barcodes

my @forward = ("AGCCTAAGCT",
               "TCAAGTTAGC",
               "AGCCTGGCAT",
               "ACGGTCCATG",
               "ACTTGCCGAT",
               "ACGGTGGATC",
               "ATCCGCCTAG",
               "ATGGCGGTAC");

# reverse (R) barcodes

my @reverse = ("AGCTTAGGCT",
               "TAGCCTAAGC",
               "AGCTTGCCAT",
               "ACGTTCAATG",
               "ACTGGCGGAT",
               "ACGTTGAATC",
               "ATCGGCAAGT",
               "ATGCCGTTAC");

# primers used for Variable Region 1 (V1) and Variable Region 3 (V3) of 16S rRNA
# forward primer (V1 region)
my $V1 = 'AGAGTTTGATCCTGGCTCAG';

# reverse primer (V3 region)
my $V3 = 'GTATTACCGCGGCTGCTGGCA';


# Part 2: locate the input file with the data
my $input_file = "C:/../input.txt";

# name the filehandle: FASTA_IN (open for reading; stop if that fails)
open (FASTA_IN, '<', $input_file) or die "Cannot open $input_file: $!";

# read all lines of the (FASTA-formatted) input file into an array

my @raw_DNA = <FASTA_IN>;

#test imported data 
#print "@raw_DNA\n";

# close the input file

close FASTA_IN;

# Part 3: start processing sequences

# 3.1 Create arrays to hold processed results
my @Processed_Sequences = ();
my @Rejected_Sequences = ();

# 3.2 concatenate each barcode with the appropriate primer

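for my $current_barcode (0..$#forward)
    {
       # barcode followed by the V1 primer; no "\n" here, otherwise
       # $F could only match at the very end of a line
       my $F = "$forward[$current_barcode]$V1";
       #test concatenation
       # print "$F\n";

       # test the current barcode.primer against the sequences and, if they
       # match, remove the barcode and primer (and anything in front of them)
       for my $line (@raw_DNA)
       {
           next if $line =~ /^>/;          # leave the identifier lines alone
           # \Q...\E matches $F literally; $line is an alias into @raw_DNA,
           # so the substitution trims the stored sequence in place
           if ($line =~ s/^.*?\Q$F\E//)
           {
               print "match $F\n";
           }
       }
    }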
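Since every forward barcode in the script is exactly 10 bases long and is always followed by the same V1 primer, one alternative to looping over the barcodes for each sequence is a hash keyed on the 30-base barcode.primer string: look up the first 30 bases of a sequence and you get the barcode (sample) directly. This sketch assumes the barcode sits at the very start of the sequence, so unlike a regex it will not catch reads with extra bases in front; `%barcode_of` and `demultiplex` are hypothetical names:

```perl
use strict;
use warnings;

my @forward = ("AGCCTAAGCT", "TCAAGTTAGC", "AGCCTGGCAT", "ACGGTCCATG",
               "ACTTGCCGAT", "ACGGTGGATC", "ATCCGCCTAG", "ATGGCGGTAC");
my $V1 = 'AGAGTTTGATCCTGGCTCAG';

# barcode.primer string => position of that barcode in @forward
my %barcode_of = map { ($forward[$_] . $V1 => $_) } 0 .. $#forward;

# assumes all barcodes have the same length (10 bases each here)
my $key_len = length($forward[0]) + length($V1);

# Return (barcode index, trimmed sequence), or an empty list when the
# start of $seq is not one of the known barcode.primer strings.
sub demultiplex {
    my ($seq) = @_;
    my $idx = $barcode_of{ substr($seq, 0, $key_len) };
    return unless defined $idx;
    return ($idx, substr($seq, $key_len));
}

my ($idx, $rest) = demultiplex('ACGGTCCATG' . $V1 . 'TTGACC');
print "barcode $idx ($forward[$idx]): $rest\n" if defined $idx;
```

A hash lookup costs about the same no matter how many barcodes there are, which is why it can be attractive when the barcode position is fixed.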

In reply to remove part of string (DNA) by Furor
