Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

while ($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})/g) {
    push(@array, $1);
}
$len being the length of a word. I always get only the first match, never all of them. Why is that? I guess it has something to do with $len, because without it I get all matches. Thank you in advance.

Re: Regex /g and interpolated lengths
by tirwhan (Abbot) on Nov 15, 2005 at 21:45 UTC

    Well, that would depend entirely on your data, wouldn't it? Your regex looks for the desired string (e.g. "ctaac"), followed by exactly $len other characters. When I run it, it works perfectly:

    $allseq = "ctaacjkjfggctaatlkj";
    $len = 3;
    while ($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})/g) {
        push(@array, $1);
    }
    print join("-", @array) . "\n";
    prints
    ctaac-ctaat

    What were you expecting? Do you maybe want {0,$len} arbitrary characters?
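The difference tirwhan is asking about can be seen in a small sketch. The test string and the helper name find_hits are made up for illustration. With an exact .{$len}, a word that sits fewer than $len characters from the end of the string can never match. With .{0,$len}, that word is still found.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every ct[ag]a[ct] word (same as
# ct(?:a|g)a(?:c|t)) that is followed by $min to $max arbitrary characters.
sub find_hits {
    my ($seq, $min, $max) = @_;
    my @hits;
    while ($seq =~ /(ct[ag]a[ct]).{$min,$max}/g) {
        push @hits, $1;
    }
    return @hits;
}

my $len = 3;
# Made-up data: the second word has only two characters after it.
my $seq = "ctaacjkjfggctaatlk";

print join("-", find_hits($seq, $len, $len)), "\n";    # exactly $len: ctaac
print join("-", find_hits($seq, 0, $len)), "\n";       # 0 to $len: ctaac-ctaat
```

Because .{0,$len} is greedy, it still takes the full $len characters whenever they are available. It only settles for fewer characters at the end of the string.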


    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan

      Sorry, I thought it would be easier to answer without code and didn't want to burden you with it. Here is the important part of the code. I hope it contains everything you need and I didn't forget anything.

      open(FILE, "<", $multiz);
      my @content = <FILE>;
      close FILE;
      s/^\w\w\t(.+)(?:\n)?/$1/ for (@content);
      my $len = length($content[0]) - 5;
      my $allseq = join("", @content);
      my @branchPoints;
      while ($allseq =~ /(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(?:.{$len})(ct(?:a|g)a(?:c|t))(.*)/ig) {
          my $dist = length($7);
          push(@branchPoints, "\nBranchpoint:\t$1\t$dist\n\t\t$2\t$dist\n\t\t$3\t$dist\n\t\t$4\t$dist\n\t\t$5\t$dist\n\t\t$6\t$dist\n\n");
      }
      print REPORT @branchPoints;

      Here is the file $multiz:

      aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
      aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
      aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
      aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
      aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac
      aa	ctgacaaaaaaaaaaactgataaaaaaaaaaaaactaac

      20051115 Janitored by Corion: Fixed formatting, code tags
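The replies in this thread don't reach a diagnosis, so what follows is an observation rather than something stated here. The pattern in the code above ends in (.*). Once the newlines have been stripped, (.*) greedily consumes everything up to the end of $allseq. After the first successful match, pos($allseq) therefore sits at the end of the string, and the next /g iteration has nothing left to scan. Below is a minimal sketch with a made-up string; the helper count_matches is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times /g matches $re in $str.
sub count_matches {
    my ($str, $re) = @_;
    my $n = 0;
    $n++ while $str =~ /$re/g;
    return $n;
}

my $str = "ctaacxxxctgatxxxctaacxxx";    # made-up data with three words

# /g resumes right after each word: 3 matches.
print count_matches($str, qr/ct[ag]a[ct]/), "\n";

# The trailing (.*) swallows the rest of the string on the first match: 1 match.
print count_matches($str, qr/ct[ag]a[ct](.*)/), "\n";

# A lookahead captures the same tail without consuming it: 3 matches again.
print count_matches($str, qr/ct[ag]a[ct](?=(.*))/), "\n";
```

Capturing the tail inside (?=(.*)) keeps length($7) usable as the distance without moving pos(). However, the six-word pattern also consumes the five gaps between words. So even without (.*), a later alignment that starts in the first sequence would lie behind pos() and be skipped.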

        Hmm, your formatting seems a little messed up there, especially the file data, which doesn't look like what your code seems to expect. Could you please repost that, with <code></code> tags around the code and data?


      Sorry for being confusing. All lines are supposed to be joined, and I want to find all "aligned" words that match the regex part (ct(?:a|g)a(?:c|t))(?:.{$len}), repeated six times because there are 6 sequences. Your code does work, but it is not what I want. Thank you anyway.
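Here is a sketch of one possible approach to the goal described here. It assumes all sequences have the same length and are joined with no separators, as the earlier code does after stripping the "aa" prefixes and newlines. The whole pattern goes inside a zero-width lookahead, so /g never consumes the rows between the words and every starting column gets tried. The distance to the end is computed from the match offsets in @+ instead of capturing (.*). The helper find_aligned, its return structure, and the sample rows are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: $allseq holds $n sequences of $width characters each,
# concatenated with no separators. Returns one hash per column at which every
# sequence has a ct[ag]a[ct] word, with the distance from the end of the last
# word to the end of the string (what the original computed as length($7)).
sub find_aligned {
    my ($allseq, $width, $n) = @_;
    my $word = qr/ct[ag]a[ct]/i;        # same as ct(?:a|g)a(?:c|t), case-insensitive like /i
    my $gap  = $width - 5;              # from the end of one word to the same column one row down
    my $body = join ".{$gap}", ("($word)") x $n;

    my @hits;
    # The lookahead consumes nothing; after each empty match Perl retries
    # one character further on, so no alignment is skipped.
    while ($allseq =~ /(?=$body)/g) {
        my $col = $-[0] % $width;
        next if $col > $width - 5;      # the words would wrap into the next row
        my @words = map { substr($allseq, $-[$_], 5) } 1 .. $n;
        push @hits, { col => $col, words => \@words, dist => length($allseq) - $+[$n] };
    }
    return @hits;
}

# Usage with made-up data: three rows of width 14, aligned words at columns 0 and 6.
my @rows = ("ctaacxctgatxxx") x 3;
for my $h (find_aligned(join("", @rows), 14, scalar @rows)) {
    print "col $h->{col}: @{$h->{words}} (dist $h->{dist})\n";
}
```

With this data the loop reports both columns, 0 and 6. A version that consumes the gaps would stop after column 0, because its match would already end in the third row.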
Re: Regex /g and interpolated lengths
by Roy Johnson (Monsignor) on Nov 15, 2005 at 21:44 UTC
    To see what you're talking about, we need the value of $allseq that you're working with, and also the value of $len.

    Caution: Contents may have been coded under pressure.
Re: Regex /g and interpolated lengths
by duff (Parson) on Nov 15, 2005 at 21:45 UTC

    Are you sure that the pattern should match more than once? Show a more complete example, including values for $len and $allseq.
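One way to answer that question for your own data is to log where each /g iteration matches and where pos() resumes. Here is a minimal sketch using tirwhan's sample string; the helper trace_matches is hypothetical. If a match runs to the end of the string, the logged resume position equals the string's length, and no further match is possible.

```perl
use strict;
use warnings;

# Hypothetical helper: trace each /g iteration, recording the matched text,
# its offsets, and the position pos() leaves for the next search.
sub trace_matches {
    my ($str, $re) = @_;
    my @log;
    while ($str =~ /$re/g) {
        my ($from, $to) = ($-[0], $+[0]);    # start and end offsets of the whole match
        push @log, sprintf("matched '%s' at %d-%d, next search starts at %d",
            substr($str, $from, $to - $from), $from, $to - 1, pos($str));
    }
    return @log;
}

my $len = 3;
print "$_\n" for trace_matches("ctaacjkjfggctaatlkj", qr/(ct[ag]a[ct]).{$len}/);
```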

    P.S. Put <code></code> tags around your code.