
Sorry, I thought it would be easier to answer and didn't want to burden you with code. Here is the important part of the code. I hope everything you need is there and I didn't forget anything.
open(FILE, "<", $multiz);
my @content = <FILE>;
close FILE;
s/^\w\w\t(.+)(?:\n)?/$1/ for(@content);
my $len = length($content[0]) - 5;
my $allseq = join("", @content);
my @branchPoints;
while($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(.*)/ig) {
    my $dist = length($7);
    push(@branchPoints, "\nBranchpoint:\t$1\t$dist\n\t\t$2\t$dist\n\t\t$3\t$dist\n\t\t$4\t$dist\n\t\t$5\t$dist\n\t\t$6\t$dist\n\n");
}
print REPORT @branchPoints;
The file $multiz looks like this:
aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac aa ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac

20051115 Janitored by Corion: Fixed code tags

Re^5: Regex /g and interpolated lengths
by duff (Parson) on Nov 15, 2005 at 22:18 UTC

    Offhand I'd say that you aren't accounting for the newlines in the data. You're trying to match across all six lines explicitly, but aren't matching the newlines. Try changing the line where you set @content to look like this:

    chomp(my @content = <FILE>);
    And then there shouldn't be any newlines in $allseq.

    Oh, and if you're trying to match the same pattern several times, you can do something like this: /(?:pattern){6}/ (the group is needed, because a bare /pattern{6}/ repeats only the final character). Though in your case that'll throw a kink into how you get the values out with $1 and friends.
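
    A minimal sketch of that last point, assuming a made-up one-line sample (six motifs, each followed by two filler characters) rather than the real file. The sub names are hypothetical. It shows that a capture inside a repeated group keeps only its final iteration, while a list-context match with /g returns every motif:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample: six motifs, each followed by two filler characters.
    my $seq = join '', map { $_ . "aa" } qw(ctgac ctgat ctaac ctaat ctgac ctgat);

    # The quantifier applies to the whole (?:...) group.
    # The capture inside it is overwritten on every repetition,
    # so $1 holds only the motif from the last iteration.
    sub last_repeat {
        my ($text) = @_;
        return $text =~ /(?:(ct[ag]a[ct])..){6}/i ? $1 : undef;
    }

    # In list context, /g returns the capture from every match.
    sub all_motifs {
        my ($text) = @_;
        return $text =~ /(ct[ag]a[ct])/gi;
    }

    print "last capture: ", last_repeat($seq), "\n";
    print "all captures: ", join(' ', all_motifs($seq)), "\n";
    ```

    So the repetition shortens the pattern, but pulling out all six values still needs six explicit groups or a /g loop.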

      s/^\w\w\t(.+)(?:\n)?/$1/ for(@content);
      Newlines should have been removed by that line.

        Ah, good point. Perhaps the file was generated on a DOS/Windows-based system but he's running his program on a Unix-based or Macintosh system? The newlines would be different between those systems and his RE wouldn't be working properly.
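
        One way to test that theory, sketched under assumptions: the sample line below is made up, and the two sub names are hypothetical. Because `.` matches "\r" but not "\n", the substitution from the original post would leave a carriage return at the end of a CRLF line that was read without newline translation. Stripping `\r?\n` explicitly avoids that:

        ```perl
        use strict;
        use warnings;

        # The substitution from the original post, applied to a copy of the line.
        sub strip_original {
            my ($line) = @_;
            $line =~ s/^\w\w\t(.+)(?:\n)?/$1/;
            return $line;
        }

        # Assumed alternative: remove an optional CR plus LF first, then the label.
        sub strip_portable {
            my ($line) = @_;
            $line =~ s/\r?\n\z//;
            $line =~ s/^\w\w\t//;
            return $line;
        }

        # Hypothetical line with a DOS line ending, as seen without CRLF translation.
        my $dos_line = "aa\tctgacaaaactgat\r\n";

        printf "original: %d chars\n", length strip_original($dos_line);
        printf "portable: %d chars\n", length strip_portable($dos_line);
        ```

        If the two lengths differ on the real data, a stray "\r" is throwing off $len.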

Re^5: Regex /g and interpolated lengths
by tirwhan (Abbot) on Nov 15, 2005 at 22:14 UTC

    Almost :-). It's <code></code>, not <code><\code>


    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Re^5: Regex /g and interpolated lengths
by tirwhan (Abbot) on Nov 15, 2005 at 22:32 UTC

    Running that code gives

    Branchpoint:    ctgac   34
                    ctgac   34
                    ctgac   34
                    ctgac   34
                    ctgac   34
                    ctgac   34

    What were you expecting? If this is not the output you are getting, then I'd guess duff probably has the answer.


      No, sorry, the machine is a Windows system, so there is no confusing \r or whatever. Yes, what I want to get is all of the aligned branchpoints. So it should be:
      ctgac 34 ctgat 18 ctaac 0
      ctgac 34 ctgat 18 ctaac 0
      ctgac 34 ctgat 18 ctaac 0
      ctgac 34 ctgat 18 ctaac 0
      ctgac 34 ctgat 18 ctaac 0
      ctgac 34 ctgat 18 ctaac 0
      or something like that, regardless of the format. But it should find all branchpoints, not just the first one and then give up.
        My thinking was that it reaches the end of the string, so it stops with only one branchpoint found. Could that be the explanation?

        Your code measures the length of the line only once and then matches the given pattern and any characters up to the end of the line, after which it starts at the beginning of the next line. This should do what you want:

        while(my $line=<FILE>) {
            my $linecopy=$line;
            while ($line=~m/(ct[ag]a[ct])/g) {
                my $match=$1;
                my ($rest)=$linecopy=~m/(?:$match)(.*)$/;
                print "$match\t".length($rest)."\t";
            }
            print "\n";
        }

        Update: Or, if you're into recursion:

        sub matchit {
            my $data=shift;
            if ($data =~ m/(ct[ag]a[ct])(.*)$/) {
                print "$1\t".length($2)."\t";
                matchit($2);
            } else {
                print "\n";
            }
        }
        matchit($_) while(<FILE>);
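
        A third variant, offered only as a sketch: it uses pos() to compute the distance to the end of the line, so it needs neither a copy of the line nor recursion, and a motif that occurs twice in one line still gets its own distance. The sub name `branchpoints` and the sample line are assumptions. The sample is built with filler lengths of 11 and 13 to mirror the distances quoted earlier in the thread:

        ```perl
        use strict;
        use warnings;

        # Return "motif<TAB>distance" strings for one line, where distance is the
        # number of characters between the end of the motif and the end of the line.
        sub branchpoints {
            my ($line) = @_;
            $line =~ s/\r?\n\z//;    # drop the line ending so it is not counted
            my @found;
            while ($line =~ /(ct[ag]a[ct])/gi) {
                # pos() is the offset just past the current match
                push @found, "$1\t" . (length($line) - pos($line));
            }
            return @found;
        }

        # Hypothetical stand-in for one line of the posted file.
        my $sample = "aa\t" . "ctgac" . ("a" x 11) . "ctgat" . ("a" x 13) . "ctaac" . "\n";
        print join("\t", branchpoints($sample)), "\n";
        ```

        For this sample it reports ctgac at 34, ctgat at 18 and ctaac at 0. As far as I can tell, the original loop stops after one match because the trailing (.*) consumes the rest of the joined string, which leaves nothing for the next /g iteration to find.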
