
Sorry, I thought it would be easier to answer without it and didn't want to burden you with code. Here is the important part of the code. I hope it's all you need and I didn't forget anything.

open(FILE, "<", $multiz);
my @content = <FILE>;
close FILE;
# strip the two-letter tag, the tab and the newline from each row
s/^\w\w\t(.+)(?:\n)?/$1/ for(@content);
# distance from the end of a 5-char motif to the same column in the next row
my $len = length($content[0]) - 5;
my $allseq = join("", @content);
my @branchPoints;
# six motifs stacked in one column; (.*) captures whatever follows the last one
while($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(.*)/ig) {
    my $dist = length($7);
    push(@branchPoints, "\nBranchpoint:\t$1\t$dist\n\t\t$2\t$dist\n\t\t$3\t$dist\n\t\t$4\t$dist\n\t\t$5\t$dist\n\t\t$6\t$dist\n\n");
}
print REPORT @branchPoints;

Here is the file $multiz:

aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac

20051115 Janitored by Corion: Fixed formatting, code tags
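A self-contained sketch of the column-alignment idea the code above relies on: once the tags and newlines are stripped, skipping `$len` characters after a 5-character motif lands in the same column one row down. In the original, the trailing `(.*)` swallows the rest of the string, so the `/g` loop can succeed at most once. This sketch swaps in a different technique (not the poster's code): the stacked pattern is wrapped in a zero-width lookahead so `/g` can report every column that holds six aligned motifs. The rows are hard-coded from the sample data, and the tab after `aa` is an assumption based on what the original substitution expects.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the sample file: a two-letter tag, a tab
# (assumed, since the original substitution expects one), then the sequence.
my @rows = ("aa\tctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac\n") x 6;

# Same cleanup as the original code: drop the tag, the tab and the newline.
s/^\w\w\t(.+)\n?/$1/ for @rows;

my $rowlen = length $rows[0];
my $len    = $rowlen - 5;      # gap from the end of a motif to the same column one row down
my $allseq = join '', @rows;

my $motif = 'ct[ag]a[ct]';                  # same strings as ct(?:a|g)a(?:c|t)
my $stack = join ".{$len}", ($motif) x 6;   # six motifs stacked in one column

# The lookahead keeps every match zero-width, so /g advances one character
# at a time instead of jumping past everything the pattern consumed.
my @hits;
while ($allseq =~ /(?=$stack)/gi) {
    # With six rows, a full stack can only start in the first row,
    # so pos() is both the string offset and the column.
    my $start = pos $allseq;
    my @found = map { substr($allseq, $start + $_ * $rowlen, 5) } 0 .. 5;
    push @hits, [$start, @found];
}

print "column $_->[0]: ", join(' ', @{$_}[1 .. 6]), "\n" for @hits;
```

On the sample data every row is identical, so each of the three motifs in a row (`ctgac`, `ctgat`, `ctaac`) yields its own stack.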

Re^3: Regex /g and interpolated lengths
by tirwhan (Abbot) on Nov 15, 2005 at 22:07 UTC

    Hmm, your formatting seems a little messed up there; in particular, the file data doesn't look like what your code expects. Could you please repost it with <code></code> tags around the code and data?


    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan

        Offhand, I'd say that you aren't accounting for the newlines in the data. You're trying to match across all six lines explicitly but aren't matching the newlines. Try changing the line where you set @content to look like this:

        chomp(my @content = <FILE>);
        Then there shouldn't be any newlines in $allseq.
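A minimal sketch of what the `chomp` suggestion does, using a hypothetical in-memory string in place of the `$multiz` file. `chomp` on the whole list removes the trailing newline from every element, so the joined string has no `\n` in it. The sample rows here deliberately use a space instead of a tab: the original substitution would not match such lines, they would keep their newlines, and that is the case `chomp` guards against.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the $multiz file: space-separated rows, which the
# original tab-based substitution would leave untouched (newlines included).
my $data = "aa ctgac\naa ctgat\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# chomp applied to the whole list strips the newline from every element.
chomp(my @content = <$fh>);
close $fh;

my $allseq = join '', @content;
print "newline found\n" if $allseq =~ /\n/;
print "$allseq\n";
```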

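        Oh, and if you're trying to match the same pattern several times, you can use a quantified group like /(?:pattern){6}/ (without the (?:...) group, /pattern{6}/ would repeat only the final character). In your case, though, that throws a kink into how you get the values out with $1 and friends, because a repeated capture group only keeps its last match.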
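To make the `$1` caveat concrete, here is a small sketch with a hypothetical gap length of 2 and a made-up three-motif string. The quantified version matches, but `$1` holds only the last motif. One workaround, assumed here rather than taken from the thread, is to build the pattern with Perl's `x` repetition operator so that every repetition gets its own capture group, then read all of them from a list-context match.

```perl
use strict;
use warnings;

my $len   = 2;                       # hypothetical gap length for a tiny example
my $motif = 'ct[ag]a[ct]';           # same strings as ct(?:a|g)a(?:c|t)
my $text  = 'ctgacxxctgatxxctaac';   # three motifs separated by $len filler chars

# A quantified group matches all three motifs, but $1 only keeps the last one.
my ($last) = $text =~ /(?:($motif)(?:.{$len})?){3}/;

# Building the pattern with the x operator gives each repetition its own
# capture group, so a list-context match returns every motif.
my $re  = join ".{$len}", ("($motif)") x 3;
my @all = $text =~ /$re/;

print "quantified group: $last\n";
print "separate groups:  @all\n";
```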

        Almost :-). It's <code></code>, not <code><\code>



        Running that code gives:

        Branchpoint:    ctgac   34
                        ctgac   34
                        ctgac   34
                        ctgac   34
                        ctgac   34
                        ctgac   34

        What were you expecting? If this is not the output you are getting, I'd guess duff has the answer.

