Hi monks,
I need to write a script that trims the end of a string when it ends in too many identical letters. If a string ends with 3 or more identical letters, I would like to shorten that run to just two. My initial version is below.
#!/usr/bin/perl
use warnings;
use strict;
while (<DATA>) {
    chomp;
    print "$_ -> ";
    m/(\w)\Z/;
    my $last = $1;
    if (s/($last{3,})\Z/$last$last/) {
        my $len = length($1)-2;
        print "$len -> $_\n";
    } else { print "\n" }
}
__DATA__
ACTGCTAGGGGGGG
TCAGCTAGCNA
ACTGSCGACAAAA
GTCTGAGTTATTT
The output is:
ACTGCTAGGGGGGG -> 5 -> ACTGCTAGG
TCAGCTAGCNA ->
ACTGSCGACAAAA -> 2 -> ACTGSCGACAA
GTCTGAGTTATTT -> 1 -> GTCTGAGTTATT
The prints are only there to illustrate what happens. However, I do need the length of the trimmed part, though of course I could just get it with length() after trimming.
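A minimal sketch of one alternative, assuming the run to collapse always consists of word characters as in the original: a backreference (\1) matches "the same character again", so the last character no longer has to be captured first and interpolated. The pattern is then a constant and should be compiled only once. The number of removed characters is taken from length() before and after the substitution. The helper name trim_tail is hypothetical, and whether this is actually faster on multi-GB input would need to be measured (for example with the core Benchmark module).

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: collapse a trailing run of 3+ identical word
# characters down to 2. Returns (trimmed string, number of chars removed).
# The pattern contains no variables, so it is compiled once; \1 refers
# back to whatever character (\w) captured.
sub trim_tail {
    my ($s) = @_;
    my $before = length $s;
    $s =~ s/(\w)\1\1+\z/$1$1/;
    return ($s, $before - length $s);
}

for my $seq (qw(ACTGCTAGGGGGGG TCAGCTAGCNA ACTGSCGACAAAA GTCTGAGTTATTT)) {
    my ($trimmed, $removed) = trim_tail($seq);
    print "$seq -> $removed -> $trimmed\n";
}
```

For the four sample strings this gives the same trimmed strings and counts (5, 0, 2, 1) as the original script.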
However, I'm wondering whether there is a better, or rather faster, way of doing this. My inputs are several GB long, and I would like to avoid a regex that needs to be recompiled every time it is used (which mine currently does, since there is a variable in the substitution). I thought of using the (??{ ... }) construct, but since I would need to "go back" after finding the last character, I couldn't come up with a workable solution.
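Since the difficulty is "going back" from the last character, one regex-free option is to do exactly that with substr: take the final character and walk backwards while the characters match it. The sketch below assumes the strings are plain single-byte sequences and, unlike the original \w pattern, compares any character. The helper trim_run is hypothetical; it edits the string in place through a reference and returns the number of characters removed. It may or may not beat the regex in practice, so benchmarking on real data is advisable.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: shorten a trailing run of 3+ identical characters
# to 2, modifying the string in place via a reference.
# Returns the number of characters removed (0 if nothing was trimmed).
sub trim_run {
    my ($sref) = @_;
    my $len = length $$sref;
    return 0 if $len < 3;
    my $last = substr($$sref, -1, 1);          # final character
    my $i = $len - 2;                          # start at second-to-last
    # Walk backwards while characters equal the final one.
    $i-- while $i >= 0 && substr($$sref, $i, 1) eq $last;
    my $run = $len - 1 - $i;                   # length of the trailing run
    return 0 if $run < 3;
    my $removed = $run - 2;
    substr($$sref, $len - $removed, $removed, '');   # drop the excess
    return $removed;
}

for my $seq (qw(ACTGCTAGGGGGGG TCAGCTAGCNA ACTGSCGACAAAA GTCTGAGTTATTT)) {
    my $s = $seq;                              # copy: qw() values are read-only
    my $removed = trim_run(\$s);
    print "$seq -> $removed -> $s\n";
}
```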