in reply to Regex /g and interpolated lengths

Well, that would depend entirely on your data, wouldn't it? Your regex looks for the desired string (e.g. "ctaac"), followed by exactly $len other characters. When I run that it works perfectly:

$allseq = "ctaacjkjfggctaatlkj";
$len = 3;
while ($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})/g) {
    push(@array, $1);
}
print join("-", @array) . "\n";
prints
ctaac-ctaat

What were you expecting? Do you perhaps want {0,$len}, i.e. up to $len arbitrary characters?
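A minimal sketch of the difference, assuming a made-up string whose second motif is followed by only two characters (the sample string and the @exact/@upto names are hypothetical): with .{$len} that motif is skipped, while .{0,$len} accepts it.

```perl
use strict;
use warnings;

# Hypothetical sample: the second motif ("ctaat") is followed by only
# two characters ("lk") before the end of the string.
my $allseq = "ctaacjkjfggctaatlk";
my $len    = 3;

# Exactly $len characters must follow each motif.
my @exact;
while ($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})/g) {
    push @exact, $1;
}

# Anywhere from 0 to $len characters may follow each motif.
my @upto;
while ($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{0,$len})/g) {
    push @upto, $1;
}

print join("-", @exact), "\n";    # prints: ctaac
print join("-", @upto), "\n";     # prints: ctaac-ctaat
```

Because {0,$len} is greedy, it still takes all $len characters whenever they are available, so it only changes the result near the end of the string.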


Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan

Re^2: Regex /g and interpolated lengths
by Anonymous Monk on Nov 15, 2005 at 21:57 UTC

    Sorry, I thought the question would be easy to answer and didn't want to burden you with code. Here is the important part of the code; I hope it has everything you need and I didn't forget anything.

    open(FILE, "<", $multiz);
    my @content = <FILE>;
    close FILE;
    s/^\w\w\t(.+)(?:\n)?/$1/ for(@content);
    my $len = length($content[0]) - 5;
    my $allseq = join("", @content);
    my @branchPoints;
    while($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(.*)/ig) {
        my $dist = length($7);
        push(@branchPoints, "\nBranchpoint:\t$1\t$dist\n\t\t$2\t$dist\n\t\t$3\t$dist\n\t\t$4\t$dist\n\t\t$5\t$dist\n\t\t$6\t$dist\n\n");
    }
    print REPORT @branchPoints;

    And here is the file $multiz:

    aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
    aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac

    20051115 Janitored by Corion: Fixed formatting, code tags
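The six-motif pattern above depends on a bit of arithmetic: once the two-letter prefix and tab are stripped and the rows are joined, a 5-character motif starting at column c, followed by length(row) - 5 more characters, ends just before column c of the next row. The sketch below checks that on a hypothetical three-row input (the rows, $motif and the shortened three-motif pattern are assumptions for illustration; the cleanup and $len lines are taken from the post).

```perl
use strict;
use warnings;

# Hypothetical 3-row input in the "xx<TAB>sequence" layout the posted
# code expects (not the poster's actual file).
my @content = (
    "aa\txxctaacxxxxx\n",
    "aa\txxctgatxxxxx\n",
    "aa\txxctaacxxxxx\n",
);

# Same cleanup as in the post: drop the two-letter prefix, the tab
# and the trailing newline.
s/^\w\w\t(.+)(?:\n)?/$1/ for @content;

my $len    = length($content[0]) - 5;    # row width minus motif width (7 here)
my $allseq = join("", @content);
my $motif  = qr/ct(?:a|g)a(?:c|t)/;

# A motif at column $col of one row plus $len more characters ends
# right before column $col of the next row, so the motifs line up.
if ($allseq =~ /($motif)(?:.{$len})($motif)(?:.{$len})($motif)/) {
    my $col = $-[1];    # start offset of the first motif = its column in row 0
    print "column $col: $1 $2 $3\n";    # prints: column 2: ctaac ctgat ctaac
}
```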

      Hmm, your formatting seems a little messed up there; in particular, the file data doesn't look like what your code expects. Could you please repost it with <code></code> tags around the code and data?



Re^2: Regex /g and interpolated lengths
by Anonymous Monk on Nov 15, 2005 at 22:03 UTC
    Sorry for being confusing. All lines should be joined, and I want to find all "aligned" words that match the regex part (ct(?:a|g)a(?:c|t))(?:.{$len}), repeated six times, once for each of the 6 sequences. Your code works, but it is not what I want. Thank you anyway.
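One thing in the posted pattern limits what the while (/g) loop can find: the trailing (.*) greedily consumes the rest of the joined string, so after the first alignment there is nothing left to match and the loop runs at most once. One possible alternative, a plain per-column loop rather than a single regex, is sketched below; it assumes equal-width rows with the prefixes already stripped, and the rows and the @aligned name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical alignment rows (prefixes like "aa\t" already stripped);
# all rows are assumed to have the same width.
my @rows = (
    "ctgacaaactgataaa",
    "ctgacaaactaacaaa",
    "ctaacaaactgataaa",
);

my $width = length $rows[0];
my @aligned;    # each entry: [column, word in row 0, word in row 1, ...]

# Try every column where a 5-character motif still fits in the row.
COLUMN: for my $col (0 .. $width - 5) {
    my @words;
    for my $row (@rows) {
        my $word = substr($row, $col, 5);
        # Skip this column as soon as one row lacks the motif here
        # (/i mirrors the case-insensitive match in the post).
        next COLUMN unless $word =~ /^ct(?:a|g)a(?:c|t)$/i;
        push @words, $word;
    }
    push @aligned, [$col, @words];
}

for my $hit (@aligned) {
    my ($col, @words) = @$hit;
    print "column $col: @words\n";
}
```

With these sample rows it reports column 0 (ctgac ctgac ctaac) and column 8 (ctgat ctaac ctgat). Testing substr() per row also means a motif can never straddle two rows, which a pattern run over the joined string cannot rule out on its own.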