perlquestion
Polyglot
<p>I have data in a format like this:</p>
<code>
43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:9; 65:1:4; 65:1:18
</code>
<p>I'm trying to condense these into ranges: wherever the first two colon-delimited numbers are the same and the third numbers are consecutive, the run should collapse into a single entry. For this sample, my output should be:</p>
<code>
43:1:1-6; 27:3:7-9; 65:1:4; 65:1:18
</code>
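<p>To pin down the intended behavior, here is a minimal loop-based sketch of the transformation. It is not the single-substitution solution I am after, and it assumes every item has the form a:b:c and that items are separated by semicolons. The name <i>collapse_ranges</i> is just a hypothetical helper, not from any module.</p>
<code>
use strict;
use warnings;

# Hypothetical helper: collapse consecutive a:b:N items into a:b:first-last.
sub collapse_ranges {
    my ($seq) = @_;
    my @runs;    # each run is [ "a:b" prefix, first number, last number ]
    for my $item (split /\s*;\s*/, $seq) {
        # Skip anything that does not look like a:b:c (an assumption).
        my ($prefix, $n) = $item =~ /^\s*(\d+:\d+):(\d+)\s*$/
            or next;
        if (@runs and $runs[-1][0] eq $prefix and $n == $runs[-1][2] + 1) {
            $runs[-1][2] = $n;                 # extend the current run
        }
        else {
            push @runs, [ $prefix, $n, $n ];   # start a new run
        }
    }
    my @out;
    for my $run (@runs) {
        my ($p, $lo, $hi) = @$run;
        # A run of length one is printed without a range suffix.
        push @out, $lo == $hi ? "$p:$lo" : "$p:$lo-$hi";
    }
    return join '; ', @out;
}

print collapse_ranges('43:1:1; 43:1:2; 43:1:3; 27:3:7; 27:3:8; 65:1:4; 65:1:18'), "\n";
</code>
<p>Tracking only the first and last number of each run sidesteps the "which capture group matched last" problem, because no capture groups span more than one item.</p>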
<p>The variable length of the runs is what is throwing me on this one. While I can match a run with a regex, I don't know how to make the substitution using only the <i>last</i> matched number in the sequence, i.e., how to know which capture group was the last one to match.</p>
<p>Here is what I was trying:</p>
<code>
$seq =~ s/
(\d+):(\d+):(\d+)
(?:;|\s)*
(\1):(\2):(?{1+($3|$6)})
/$1:$2:$3-$6/xg;
</code>
<p>This leaves me with the wrong output: </p>
<code>
#OUTPUT
43:1:1-2; 43:1:3-4; 43:1:5-6; 27:3:7-8; 27:3:9; 65:1:4-18
</code>
<p>Am I attempting something beyond the bounds of regex?</p>
<div class="pmsig"><div class="pmsig-644375">
<p>Blessings,</p>
<p><i>~Polyglot~</i></p>
</div></div>