splitting a string

rkenshin09 has asked for the wisdom of the Perl Monks concerning the following question:

Hey,

So, I'm basically trying to write a script that cuts a DNA sequence at specific points. I'm able to find the correct sequence within the string, but I can't for the life of me figure out how to split the string at the right site. Here's my code so far:

use strict;

use Getopt::Std;  #Invokes Get options package
our ($opt_c,$opt_e);
getopts('ce:');


my @seq = <>;
chomp(@seq);
my $seq = join("", @seq);


our %re = (
      'BamHI'  => 'GGATCC 1', # G | GATCC
      'EcoRI'  => 'GAATTC 1', # G | AATTC
      'EcoRII' => 'CCWGG 0',  #   | CCWGG
);

my @enz = split(/\b(\s+)\b/,$re{$opt_e});
my $enzseq = $enz[0];
my $cut = $enz[2];
print "The restriction site is: $enzseq\n\n";

while($seq =~ /($enzseq)/ig) {
    my $site = pos($seq),"\n";
    print "$site\n";
}
[download]

Right now, it prints the locations of where to split the sequence, but I'm not exactly sure how to split the string at these specific location into fragments. For instance, if the string is 6000 characters, and the search string is found at 500, 1000, 2000, 4000, 5000, I want to store/print: (500, 500, 1000, 2000, 1000, 1000).

Any help would be greatly appreciated, Thanks.

~~~~~~~~~~

the current code prints:


The input sequence length is 48502

The restriction site is: GAATTC

21231
26109
31752
39173
44977
[download]

So, basically, I want to cut at each of those sites, so the fragments should have a length of 21231, (26109-21231), etc, where you subtract each shorter one from the longer one, where the last one is (48502-44977)

Comment on splitting a string Select or Download Code

Replies are listed 'Best First'.
Re: splitting a string by GrandFather (Saint) on Sep 26, 2008 at 03:09 UTC
It may be that you are looking for something like: `use strict; my $seq = '------------GGATCC________________GGATCC=============GGATCC ++++++++++'; my @splits = (0); push @splits, pos $seq while $seq =~ /(?=GGATCC)/ig; push @splits, length $seq; for my $index (0 .. $#splits) { my $width = $splits[$index + 1] - $splits[$index]; print substr ($seq, $splits[$index], $width), "\n"; }` [download] Prints: `------------ GGATCC________________ GGATCC============= GGATCC+++++++++` [download] However it's a bit hard to tell because you include uninteresting rubbish in your sample that means it is unrunnable, but you don't include any sample data and you don't show the result you would actually like. Perl reduces RSI - it saves typing	[reply] [d/l] [select]
Re: splitting a string by kyle (Abbot) on Sep 26, 2008 at 02:53 UTC
Are you looking for substr? Maybe you want to use split again, sort of like: `my @frangments = split /$enzseq/i, $seq;` [download]	[reply] [d/l]
Re: splitting a string by johngg (Canon) on Sep 26, 2008 at 11:05 UTC
If I have understood correctly, you could do a split using look-around assertions, see perlre, either using a look-ahead to split before your enzyme or a look-behind to split after it. use strict; use warnings; open my $seqFH, q{<}, \ <<'EOD' or die qq{open: << HEREDOC: $!\n}; CAGTTTCGATCGAATC ATTATTGGATCCGCAT GCGTCAGGATCCAATC EOD my $sequence = join q{}, map { chomp; $_ } <$seqFH>; close $seqFH or die qq{close: << HEREDOC: $!\n}; print qq{Sequence:\n $sequence\n}; my $enzymeSeq = q{GGATCC}; my $rxSplitBefore = qr{(?=$enzymeSeq)}; my $rxSplitAfter = qr{(?<=$enzymeSeq)}; print qq{Splitting before $enzymeSeq\n}; my @chunksBefore = split m{$rxSplitBefore}, $sequence; print qq{ $_\n} for @chunksBefore; print qq{Splitting after $enzymeSeq\n}; my @chunksAfter = split m{$rxSplitAfter}, $sequence; print qq{ $_\n} for @chunksAfter; [download] The output. `Sequence: CAGTTTCGATCGAATCATTATTGGATCCGCATGCGTCAGGATCCAATC Splitting before GGATCC CAGTTTCGATCGAATCATTATT GGATCCGCATGCGTCA GGATCCAATC Splitting after GGATCC CAGTTTCGATCGAATCATTATTGGATCC GCATGCGTCAGGATCC AATC` [download] I hope I have guessed correctly and this is of some use. Cheers, JohnGG Update: Corrected `die` text.	[reply] [d/l] [select]
A reply falls below the community's threshold of quality. You may see it by logging in.