
Win8 Strawberry 5.8.9.5 (32) Sun 09/27/2020 14:19:34
C:\@Work\Perl\monks
>perl
use strict;
use warnings;

use Data::Dump qw(dd);

my $string = 'ACGTAAAAATGCCCATGGGGGGG';

my @repeats = do {
    my $p;
    grep { $p = !$p } $string =~ m{ ((.) \2+) }xmsg;
};

dd \@repeats;
__END__
["AAAAA", "CCC", "GGGGGGG"]

Update 1: But you also want lengths:

Win8 Strawberry 5.8.9.5 (32) Sun 09/27/2020 14:20:42
C:\@Work\Perl\monks
>perl
use strict;
use warnings;

use Data::Dump qw(dd);

my $string = 'ACGTAAAAATGCCCATGGGGGGG';

my @repeats_and_lengths = do {
    my $p;
    map [ $_, length ], grep { $p = !$p } $string =~ m{ ((.) \2+) }xmsg;
};

dd \@repeats_and_lengths;
__END__
[["AAAAA", 5], ["CCC", 3], ["GGGGGGG", 7]]
You already know how to sort this. :)
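
For completeness, a minimal sketch of one way the sort might look. The ordering (longest run first, ties broken alphabetically) is an assumption, since the thread does not say which order is wanted, and @sorted is just an illustrative name.

```perl
use strict;
use warnings;

# The [substring, length] pairs from the output above.
my @repeats_and_lengths = (["AAAAA", 5], ["CCC", 3], ["GGGGGGG", 7]);

# Assumed ordering: descending by length, then ascending by substring.
my @sorted =
    sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
    @repeats_and_lengths;

print join(' ', map { "$_->[0]:$_->[1]" } @sorted), "\n";
# prints: GGGGGGG:7 AAAAA:5 CCC:3
```

Swapping $a and $b in the first comparison gives shortest-first instead.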

Update 2:

... there are statements in the while loop that look doubtful ...
[Struck out: Other than the useless /g modifier on the /.../g regex,] Oops... the /g is not useless! I don't see anything objectionable. There are usually several ways to do anything, and which is "best" is often a question of taste, unless you're Benchmark-ing.
... the idea of using an array to store the substring along with its length might not be good.
Again, I see nothing to gripe about. It's a matter of taste and the best impedance match to the rest of the code.
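
On the /g point: the original poster's loop is not reproduced in this reply, so the following is only a sketch of the general shape such a while loop might take, reusing the regex and test string from above. It shows why /g matters in scalar context.

```perl
use strict;
use warnings;

my $string = 'ACGTAAAAATGCCCATGGGGGGG';

my @repeats_and_lengths;

# In scalar context, /g makes each match resume where the previous one
# ended. Without /g the match would restart at position 0 on every
# iteration and, since this string contains a run, the loop would never end.
while ($string =~ m{ ((.) \2+) }xmsg) {
    push @repeats_and_lengths, [ $1, length $1 ];
}

print join(' ', map { "$_->[0]:$_->[1]" } @repeats_and_lengths), "\n";
# prints: AAAAA:5 CCC:3 GGGGGGG:7
```

This builds the same array-of-pairs structure as the map/grep version, so the choice between the two really is a matter of taste.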

Update 3: Oh, and one more thing... If you're doing a buncha matching operations on a buncha long sequences, it might be useful to add a validation step for each input sequence to be sure it consists only of [ATCG] characters before any further matching operations are done. This allows you to match with . (dot) and know that you can only be matching a valid base character. This might save significant time over many matches, but this can only be determined for sure by benchmarking. (I'd be inclined to add a validation step anyway just to be sure your data really is what you think it is.)
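
A minimal sketch of what such a validation step might look like. The helper name is_valid_sequence is hypothetical, and treating empty or lowercase input as invalid is an assumption; adjust to whatever your data actually allows.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the sequence is defined, non-empty,
# and made up entirely of uppercase A, T, C, G.
sub is_valid_sequence {
    my ($seq) = @_;
    return (defined $seq && $seq =~ m{ \A [ATCG]+ \z }xms) ? 1 : 0;
}

for my $seq ('ACGTAAAAATGCCCATGGGGGGG', 'ACGTNNNACGT', '') {
    printf "%-25s %s\n", "'$seq'", is_valid_sequence($seq) ? 'ok' : 'invalid';
}
```

Because \z anchors at the very end of the string, a trailing newline makes a sequence invalid, so chomp input lines before checking them.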


Give a man a fish:  <%-{-{-{-<


In reply to Re: substrings that consist of repeating characters (updated x3) by AnomalousMonk
in thread substrings that consist of repeating characters by Anonymous Monk
