There seems to be precious little further info on that var
Well, it's in in the perlvar manual page in my install ;-)
But it looks like one has to use English to see its default value:
qwurx [shmem] ~/perlmonks > perl5.10.0 -le 'print ${^RE_TRIE_MAXBUF}'
qwurx [shmem] ~/perlmonks > perl5.10.0 -MEnglish -le 'print ${^RE_TRI
+E_MAXBUF}'
65536
But yes, the number 2**16 rings a bell. It looks like the TRIE-optimization has that
limit (of tokens? objects? branches?). Running with -D512 reveals
13104 probes:
EXECUTING...
Trying 13104 probes with perl 5.010000 at 669148.pl line 18.
1203590260 at 669148.pl line 20.
Compiling REx "TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTTAAC|
+CCTCA"...
Final program:
1: TRIEC-EXACT[ACGT] (65526)
<TATGTTTCGT>
<CCGCTTTTTA>
<CGAAGATTTC>
...
<AACAGTGAGG>
<GAAACTCGCG>
<GAGAGATGGA>
65526: END (0)
stclass AHOCORASICKC-EXACT[ACGT] minlen 10
Compiling REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TG
+TGTTT"...
Final program:
1: OPEN1 (3)
3: TRIEC-EXACT[ACGT] (65529)
<TATGTTTCGT>
<CCGCTTTTTA>
<CGAAGATTTC>
...
<AACAGTGAGG>
<GAAACTCGCG>
<GAGAGATGGA>
65529: CLOSE1 (65531)
65531: END (0)
stclass AHOCORASICKC-EXACT[ACGT] minlen 10
Matching REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGT
+GTTT"... against "ACTCGAATTCCGAATAGATAGAAGTCTGCTGATAATATCGCGCCGGT
TCTGATGCGCCTC"...
Matching stclass AHOCORASICKC-EXACT[ACGT] against "ACTCGAATTCCGAATAGAT
+AGAAGTCTGCTGATAATATCGCGCCGGTTCTGATGCGCCTC"... (1000000 chars)
0 <> <ACTCGAATTC> | Charid: 2 CP: 41 State: 1, word=0
+- legal
1 <A> <CTCGAATTCC> | Charid: 4 CP: 43 State: 52, word=0
+- legal
2 <AC> <TCGAATTCCG> | Charid: 1 CP: 54 State: ce, word=0
+- legal
3 <ACT> <CGAATTCCGA> | Charid: 4 CP: 43 State: 1a2, word=0
+- legal
4 <ACTC> <GAATTCCGAA> | Charid: 3 CP: 47 State: 1a3, word=0
+- legal
13105 probes:
EXECUTING...
Trying 13105 probes with perl 5.010000 at 669148.pl line 18.
1203590218 at 669148.pl line 20.
Compiling REx "TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTTAAC|
+CCTCA"...
Final program:
1: TRIEC-EXACT[ACGT] (65531)
<TATGTTTCGT>
<CCGCTTTTTA>
<CGAAGATTTC>
...
<GAAACTCGCG>
<GAGAGATGGA>
<CGCCGAGGAT>
65531: END (0)
stclass AHOCORASICKC-EXACT[ACGT] minlen 10
Compiling REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TG
+TGTTT"...
Final program:
1: OPEN1 (3)
3: BRANCHJ (11)
5: EXACT <TATGTTTCGT> (9)
9: LONGJMP (104850)
11: BRANCHJ (19)
13: EXACT <CCGCTTTTTA> (17)
17: LONGJMP (104850)
19: BRANCHJ (27)
21: EXACT <CGAAGATTTC> (25)
25: LONGJMP (104850)
...
The 65531: END (0) looks - though I don't know at all what it means - just too close to 2**16 ...
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
|