Hi monks,

I need to write a script that trims the end of a string if it ends in too many copies of the same letter. If a string ends with three or more identical letters, I would like to shorten that run to exactly two. My initial version is below.
#!/usr/bin/perl
use warnings;
use strict;

while (<DATA>) {
    chomp;
    print "$_ -> ";
    m/(\w)\Z/;
    my $last = $1;
    if (s/($last{3,})\Z/$last$last/) {
        my $len = length($1)-2;
        print "$len -> $_\n";
    }
    else {
        print "\n"
    }
}
__DATA__
ACTGCTAGGGGGGG
TCAGCTAGCNA
ACTGSCGACAAAA
GTCTGAGTTATTT
And its output is:
ACTGCTAGGGGGGG -> 5 -> ACTGCTAGG
TCAGCTAGCNA -> 
ACTGSCGACAAAA -> 2 -> ACTGSCGACAA
GTCTGAGTTATTT -> 1 -> GTCTGAGTTATT
The prints are only there to show what is happening. I do, however, need the length of the trimmed string, though I can of course just call length() after trimming to get it.

However, I'm wondering whether there is a better, or should I say faster, way of doing this. My inputs are several GB long, and I would like to avoid a regex that needs to be recompiled every time it is used (which it currently does, as there is a variable in the substitution). I thought of using the (??{ }) construct, but since I would need to "go back" after finding the last character, I couldn't come up with a workable solution.
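
For reference, one direction that avoids interpolating a variable into the pattern is a backreference, so the regex is a constant. This is only a minimal sketch, not a benchmarked answer: the name trim_tail and the in-script sample list are made up for illustration, and it assumes Perl 5.10 or later for \K. Whether it is actually faster on multi-GB input would need measuring.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: trim a trailing run of 3 or more identical word
# characters down to 2. Returns the trimmed string and the number of
# characters that were removed (0 if nothing was trimmed).
sub trim_tail {
    my ($str) = @_;
    my $before = length $str;

    # (\w)\1  matches the first two characters of the run, which are kept;
    # \K      discards what was matched so far from the replaced portion;
    # \1+\z   matches the rest of the run up to the end of the string.
    # The pattern contains no variables, so it is compiled only once.
    $str =~ s/(\w)\1\K\1+\z//;

    return ($str, $before - length $str);
}

# Sample strings taken from the script above.
for my $seq (qw(ACTGCTAGGGGGGG TCAGCTAGCNA ACTGSCGACAAAA GTCTGAGTTATTT)) {
    my ($trimmed, $removed) = trim_tail($seq);
    print "$seq -> $removed -> $trimmed\n";
}
```

The removed count comes from the difference in length before and after the substitution, so no capture variable needs to be read afterwards. Unlike the original script, a line with nothing to trim is printed with a count of 0 rather than an empty result.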

In reply to Regex related question by Hena
