Update: I missed the part about multiple I and D sections, so I adjusted the loop to handle them. I also see now that Count Zero had already posted some code earlier in the thread, above the level I replied to. His code looks fine to me. What I did is very similar, except that I used substr() instead of print.

I think this does what you want. Basically, in the CIGAR, an insertion becomes a deletion and vice versa, so I apply the edit instructions in the CIGAR in an inverse sense.
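To make the "inverse sense" concrete, here is a minimal sketch that swaps I and D in a CIGAR string. The helper name invert_cigar is made up for illustration, and it assumes the CIGAR contains only M, I and D operations:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: swap I and D in a CIGAR string, leaving M alone.
# Assumes only M, I and D operations appear in the CIGAR.
sub invert_cigar {
    my ($cigar) = @_;
    my %inverse = (M => 'M', I => 'D', D => 'I');
    my $out = '';
    while ($cigar =~ /(\d+)([MID])/g) {
        $out .= $1 . $inverse{$2};   # keep the length, swap the operation
    }
    return $out;
}

print invert_cigar('2M2I2M3D10M'), "\n";   # prints 2M2D2M3I10M
```

Reading the inverted string left to right gives the edits to apply to the input: skip 2, delete 2, skip 2, insert 3, skip 10.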

The total of the field lengths in the CIGAR (viewed in the inverted sense) may be less than the number of characters in the input, so I think this means the output should be truncated to whatever that total is.

Whether some final adjustment is needed after applying the inverse of all the editing commands, either truncating or perhaps adding more "X"s, is unclear to me. It is just a matter of knowing what is required, which is why I kept a running tally of the total length.
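One way to see what lengths the CIGAR itself implies is to tally them directly. This is only a sketch, assuming the usual SAM convention that M and I consume characters of the input read while M and D consume characters of the reference. The function name cigar_lengths is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (read length, reference length) implied by a CIGAR.
# Assumes only M, I and D operations, with the usual SAM meaning.
sub cigar_lengths {
    my ($cigar) = @_;
    my ($read_len, $ref_len) = (0, 0);
    while ($cigar =~ /(\d+)([MID])/g) {
        my ($n, $op) = ($1, $2);
        $read_len += $n if $op eq 'M' || $op eq 'I';   # present in the input
        $ref_len  += $n if $op eq 'M' || $op eq 'D';   # present in the reference
    }
    return ($read_len, $ref_len);
}

for my $cigar (qw(8M2I7M 2M2I2M3D10M 8M2D7M)) {
    my ($read_len, $ref_len) = cigar_lengths($cigar);
    print "$cigar: read=$read_len ref=$ref_len\n";
}
```

When the read length computed this way differs from the actual number of characters in the input, the CIGAR does not describe the whole input, and that is exactly the case where truncating or padding has to be decided.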

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
    next if /^\s*$/;    #skip blank lines
    my ($input, $CIGAR) = split;
    my $ref = $input;   #working copy of $input

    my (@edit_cmd) = $CIGAR =~ m/\d+\w/g;   #e.g. ("8M", "2I", "7M")
    my $curr_pos  = 0;
    my $total_len = 0;
    foreach my $cmd (@edit_cmd)
    {
        if (my ($M) = $cmd =~ m/(\d+)M/)
        {
            $curr_pos  += $M;
            $total_len += $M;
        }
        elsif (my ($I) = $cmd =~ m/(\d+)I/)   #match $cmd, not the whole $CIGAR
        {
            substr($ref,$curr_pos,$I,'');     #delete $I characters
            $total_len -= $I;
        }
        elsif (my ($D) = $cmd =~ m/(\d+)D/)   #match $cmd, not the whole $CIGAR
        {
            substr($ref,$curr_pos,0,"X" x $D);   #insert $D X's
            $total_len += $D;
            $curr_pos  += $D;
        }
    }
    $ref = substr($ref,0,$total_len);   #truncate ?????
    print "INPUT = $input CIGAR = $CIGAR\n";
    print "REF = $ref\n\n";
}

=prints

INPUT = CGAATTAATGGGAATTG CIGAR = 8M2I7M
REF = CGAATTAAGGAAT

INPUT = CGAATTAATGGGAATTG CIGAR = 2M2I2M3D10M
REF = CGTTXXXAATGGGAA

INPUT = CGAATTAATGGGA CIGAR = 8M2D7M
REF = CGAATTAAXXTGGGA

=cut

__DATA__
CGAATTAATGGGAATTG 8M2I7M
CGAATTAATGGGAATTG 2M2I2M3D10M
CGAATTAATGGGA 8M2D7M

In reply to Re^3: string manipulation with Regex by Marshall
in thread string manipulation with Regex by FluffyBunny
