in reply to Re^2: Regex /g and interpolated lengths
in thread Regex /g and interpolated lengths

Hmm, your formatting seems a little messed up there; in particular, the file data doesn't look like what your code seems to be expecting. Could you please repost that, with <code></code> tags around the code and data?


Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan

Re^4: Regex /g and interpolated lengths
by Anonymous Monk on Nov 15, 2005 at 22:10 UTC
    Sorry, I thought it would be easier to answer and didn't want to burden you with code. But here comes the important part of the code. I hope it has all you need and that I didn't forget anything.
    open(FILE, "<", $multiz);
    my @content = <FILE>;
    close FILE;
    s/^\w\w\t(.+)(?:\n)?/$1/ for(@content);
    my $len = length($content[0]) - 5;
    my $allseq = join("", @content);
    my @branchPoints;
    while($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(.*)/ig) {
        my $dist = length($7);
        push(@branchPoints, "\nBranchpoint:\t$1\t$dist\n\t\t$2\t$dist\n\t\t$3\t$dist\n\t\t$4\t$dist\n\t\t$5\t$dist\n\t\t$6\t$dist\n\n");
    }
    print REPORT @branchPoints;
    File $multiz comes here:
    aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac

    20051115 Janitored by Corion: Fixed code tags

      Offhand I'd say that you aren't accounting for the newlines in the data. You're trying to match across all six lines explicitly, but aren't matching the newlines. Try changing the line where you set @content to look like this:

      chomp(my @content = <FILE>);
      And then there shouldn't be any newlines in $allseq.

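      Oh, and if you're trying to match the same pattern several times, you can do something like this: /(?:pattern){6}/ (the group is needed so that the {6} applies to the whole pattern rather than just its last character). Though in your case that'll throw a kink into how you get the values out with $1 and friends.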
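To illustrate the "kink" with $1 and friends: a capture group under a quantifier keeps only its last repetition, whereas spelling the group out gives one capture per occurrence. A minimal sketch, using a made-up string of three back-to-back motifs ($seq is hypothetical, not the poster's data):

```perl
use strict;
use warnings;

# Hypothetical input: three motifs directly after one another.
my $seq = 'ctgacctgatctaac';

# One capture group repeated three times: only the last repetition survives.
my @quantified = $seq =~ /^(ct[ag]a[ct]){3}$/;

# Three separate capture groups: one value per occurrence.
my @spelled = $seq =~ /^(ct[ag]a[ct])(ct[ag]a[ct])(ct[ag]a[ct])$/;

print "quantified: @quantified\n";
print "spelled:    @spelled\n";
```

So the quantified form is shorter, but it only helps if the final repetition is the one you need.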

        s/^\w\w\t(.+)(?:\n)?/$1/ for(@content);
        Newlines should have been removed by that line.
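A quick way to check that claim is to run the substitution on a couple of rows and look for leftover newlines. The sketch below uses two hypothetical rows: one with the tab the pattern expects, and one with a space instead, which is purely an assumption for illustration. A row that does not match the prefix is left untouched, newline included, while chomp strips it unconditionally.

```perl
use strict;
use warnings;

# Hypothetical rows: the first has "two word characters + tab" as its prefix,
# the second has a space instead of the tab and so will not match.
my @content = ("aa\tctgac\n", "aa ctgat\n");

s/^\w\w\t(.+)(?:\n)?/$1/ for @content;

# Count the rows that still end in a newline after the substitution.
my $with_newline = grep { /\n\z/ } @content;
print "rows still ending in a newline: $with_newline\n";

# chomp does not depend on the prefix matching.
chomp @content;
```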

      Almost :-). It's <code></code>, not <code><\code>



      Running that code gives

      Branchpoint:    ctgac   34
                      ctgac   34
                      ctgac   34
                      ctgac   34
                      ctgac   34
                      ctgac   34

      What were you expecting? If this is not the output you are getting, then I'd guess duff probably has the answer.


        No, sorry, the machine is a Windows system, so no confusing \r or whatever. Yes, what I want to get is all aligned Branchpoints. So it should be:
        ctgac 34   ctgat 18   ctaac 0
        ctgac 34   ctgat 18   ctaac 0
        ctgac 34   ctgat 18   ctaac 0
        ctgac 34   ctgat 18   ctaac 0
        ctgac 34   ctgat 18   ctaac 0
        ctgac 34   ctgat 18   ctaac 0
        or something like that, regardless of the format. But it should find all Branchpoints, not find the first one and then surrender.
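One likely reason only the first block shows up: a successful match consumes everything from the first motif to the end of the string, so /g has nothing left to resume on. A possible way around that, sketched here under assumptions, is to put the whole six-row pattern inside a lookahead so that each match is zero-length and the next attempt starts one character further on. The data below is a hypothetical stand-in modelled on the posted rows (six identical 39-character sequences, prefix and newline already stripped), and $motif, $column, $aligned and @branch_points are names made up for the sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the posted data: six identical 39-character rows.
my $row     = 'ctgac' . 'a' x 11 . 'ctgat' . 'a' x 13 . 'ctaac';
my @content = ($row) x 6;

my $len    = length($content[0]) - 5;   # gap from a motif's end to the same column one row down
my $allseq = join '', @content;

my $motif = qr/ct[ag]a[ct]/i;           # same motif as ct(?:a|g)a(?:c|t)

# Six captured motifs, each separated by $len arbitrary characters, wrapped in
# a lookahead so that a match consumes nothing and /g can try the next column.
my $column  = join ".{$len}", ("($motif)") x 6;
my $aligned = qr/(?=$column)/;

my @branch_points;
while ($allseq =~ /$aligned/g) {
    my $col = pos $allseq;              # zero-length match, so pos is the start column
    push @branch_points, {
        col    => $col,
        dist   => $len - $col,          # characters from the motif's end to the row's end
        motifs => [$1, $2, $3, $4, $5, $6],
    };
}

for my $bp (@branch_points) {
    print "Branchpoint:\n";
    print "\t$_\t$bp->{dist}\n" for @{ $bp->{motifs} };
}
```

For this stand-in data the loop reports three aligned columns, with distances 34, 18 and 0. The distance is computed from the column position rather than from a trailing (.*), since the lookahead no longer captures the rest of the row.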