separation a string

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: separation a string by BrowserUk (Patriarch) on Nov 03, 2011 at 11:00 UTC
Update: Improved the performance.

    sub ntuples {
        my( $n, $s ) = @_;
        my $b  = $n - 1;                    # how far to step back after each read
        my $n2 = length( $s ) - $n + 1;     # number of overlapping tuples
        return unpack "(A$n X$b)$n2", $s;
    }

    my $genome = 'AACCCDGYAEELPSWWYAOOLLLSSBBBDDD';
    print join( ' - ', ntuples( $_, $genome ) ), "\n" for 2 .. 5;

Output:

    AA - AC - CC - CC - CD - DG - GY - YA - AE - EE - EL - LP - PS - SW - WW - WY - YA - AO - OO - OL - LL - LL - LS - SS - SB - BB - BB - BD - DD - DD
    AAC - ACC - CCC - CCD - CDG - DGY - GYA - YAE - AEE - EEL - ELP - LPS - PSW - SWW - WWY - WYA - YAO - AOO - OOL - OLL - LLL - LLS - LSS - SSB - SBB - BBB - BBD - BDD - DDD
    AACC - ACCC - CCCD - CCDG - CDGY - DGYA - GYAE - YAEE - AEEL - EELP - ELPS - LPSW - PSWW - SWWY - WWYA - WYAO - YAOO - AOOL - OOLL - OLLL - LLLS - LLSS - LSSB - SSBB - SBBB - BBBD - BBDD - BDDD
    AACCC - ACCCD - CCCDG - CCDGY - CDGYA - DGYAE - GYAEE - YAEEL - AEELP - EELPS - ELPSW - LPSWW - PSWWY - SWWYA - WWYAO - WYAOO - YAOOL - AOOLL - OOLLL - OLLLS - LLLSS - LLSSB - LSSBB - SSBBB - SBBBD - BBBDD - BBDDD
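For readers unfamiliar with unpack templates, here is a minimal sketch of how the template in this reply is put together. In Perl's pack/unpack language, `A$n` extracts $n characters (stripping trailing spaces), `X$b` steps back $b positions, and a parenthesised group followed by a count repeats that many times, so each repetition advances by one character overall. The input `$demo` and the names `$back` and `$count` are illustrative and not from the post, and the sketch assumes $n is no longer than the string.

```perl
use strict;
use warnings;

my $demo  = 'ABCDE';                    # hypothetical short input
my $n     = 3;                          # tuple length
my $back  = $n - 1;                     # step back so the next read starts one char later
my $count = length($demo) - $n + 1;     # number of overlapping tuples

my $template = "(A$n X$back)$count";
print "$template\n";                    # (A3 X2)3

my @tuples = unpack $template, $demo;
print "@tuples\n";                      # ABC BCD CDE
```

Because `A` strips trailing whitespace, input that can contain spaces would probably want `a` instead; that variation is not tested here.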
Re: separation a string by Ratazong (Monsignor) on Nov 03, 2011 at 09:39 UTC
substr could be your friend: it returns a substring that starts at a given offset and has a given length. All you have to do is write suitable loops over the required offsets and lengths. HTH, Rata
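As a quick illustration of the offset and length arguments mentioned above, the following sketch calls substr a few times on a shortened, made-up string (the variable names are illustrative only). Offsets are zero-based.

```perl
use strict;
use warnings;

my $s = 'AACCCDGY';                 # hypothetical, shortened input

# substr(STRING, OFFSET, LENGTH)
my $first  = substr($s, 0, 2);      # two characters starting at offset 0
my $second = substr($s, 1, 2);      # same length, shifted right by one
my $triple = substr($s, 2, 3);      # three characters starting at offset 2

print "$first $second $triple\n";   # AA AC CCC
```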
Re^2: separation a string by Anonymous Monk on Nov 03, 2011 at 10:03 UTC
Yes, substr was the only thing that I used, but my bigger problem is that I don't want to just separate 2-2 or 3-3. I need to have all 2 alphabets or 3 alphabets words, as I showed in the example. Thank you again.
Re^3: separation a string by Eliya (Vicar) on Nov 03, 2011 at 10:31 UTC
"I need to have all 2 alphabets or 3 alphabets words"

You can call substr() repeatedly in a loop (as Ratazong pointed out), which gives you all 2/3/...-substrings. I'm not 100% sure what your task is, but judging from the sample output, you seem to want something like this:

    my $s = "AACCCDGYAEELPSWWYAOOLLLSSBBBDDD";

    for my $len (2..4) {
        my @parts;
        for my $offs (0..length($s)-$len) {
            push @parts, substr($s, $offs, $len);
        }
        print "i=$len: @parts\n";
    }
    __END__
    i=2: AA AC CC CC CD DG GY YA AE EE EL LP PS SW WW WY YA AO OO OL LL LL LS SS SB BB BB BD DD DD
    i=3: AAC ACC CCC CCD CDG DGY GYA YAE AEE EEL ELP LPS PSW SWW WWY WYA YAO AOO OOL OLL LLL LLS LSS SSB SBB BBB BBD BDD DDD
    i=4: AACC ACCC CCCD CCDG CDGY DGYA GYAE YAEE AEEL EELP ELPS LPSW PSWW SWWY WWYA WYAO YAOO AOOL OOLL OLLL LLLS LLSS LSSB SSBB SBBB BBBD BBDD BDDD
Re: separation a string by AnomalousMonk (Archbishop) on Nov 03, 2011 at 12:06 UTC
BrowserUk's unpack approach is probably a bit faster, but here's the 'standard' regex approach:

    >perl -wMstrict -le
    "my $s = 'AACCCDGYAEELPSWWYA';
     ;;
     for my $n (2 .. 5) {
       my @subseqs = $s =~ m{ (?= (.{$n})) }xmsg;
       print qq{n $n: @subseqs};
       }
    "
    n 2: AA AC CC CC CD DG GY YA AE EE EL LP PS SW WW WY YA
    n 3: AAC ACC CCC CCD CDG DGY GYA YAE AEE EEL ELP LPS PSW SWW WWY WYA
    n 4: AACC ACCC CCCD CCDG CDGY DGYA GYAE YAEE AEEL EELP ELPS LPSW PSWW SWWY WWYA
    n 5: AACCC ACCCD CCCDG CCDGY CDGYA DGYAE GYAEE YAEEL AEELP EELPS ELPSW LPSWW PSWWY SWWYA
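The lookahead is what makes the matches overlap. A plain capture consumes the characters it matches, which yields the "2-2 or 3-3" split the original poster did not want, whereas a capture inside `(?= ...)` is zero-width, so the regex engine moves forward only one character between matches. The sketch below contrasts the two on a shortened, made-up string; `@chunks` and `@windows` are illustrative names, not from the post.

```perl
use strict;
use warnings;

my $s = 'AACCCDGY';     # hypothetical, shortened input
my $n = 3;

# Plain capture: each match consumes $n characters, so the pieces do not overlap.
# The trailing 'GY' is too short to match and is dropped.
my @chunks = $s =~ m{ (.{$n}) }xmsg;

# Capture inside a lookahead: nothing is consumed, so every offset is tried.
my @windows = $s =~ m{ (?= (.{$n}) ) }xmsg;

print "chunks:  @chunks\n";     # chunks:  AAC CCD
print "windows: @windows\n";    # windows: AAC ACC CCC CCD CDG DGY
```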