in reply to Re^2: Find number of short words in long word
in thread Find number of short words in long word

Thanks greatly for the start on this... All *possible* combinations are important: across 20K sequences, each will likely appear at least a couple of times. Overlapping instances are important too, so "AGCTGT" would need to be scored as:

         AGCT  GCTG  CTGT  TGTA  etc.
AGCTGT      1     1     1     0

and so on...

Re^4: Find number of short words in long word
by SuicideJunkie (Vicar) on Jul 14, 2009 at 22:50 UTC

    Overlapping instances of different patterns would match fine, since you'd be searching for them separately.

    The problem arises when the last few characters of a search pattern are the same as its first few characters, so that two matches of the same pattern could overlap. Those are the cases where you need the lookaheads.


    'ACTACTA' for example; when searching for 'ACTA', should that score two matches or just one?

    If you want it to be two, you need the lookaheads. If you want it to be just one match, then the regex pattern is simply 'ACTA', but it sounds like you want the lookaheads.
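
To make the difference concrete, here is a minimal sketch of both counting styles. It is written in Python rather than Perl, on the assumption that the zero-width lookahead syntax `(?=...)` behaves the same way in both; the function names `count_overlapping` and `count_non_overlapping` are hypothetical and not from the thread.

```python
import re

def count_overlapping(pattern, text):
    # Hypothetical helper: wrap the literal pattern in a zero-width
    # lookahead, so a match consumes no characters and the scan can
    # find another occurrence starting one position later.
    lookahead = re.compile("(?=" + re.escape(pattern) + ")")
    return sum(1 for _ in lookahead.finditer(text))

def count_non_overlapping(pattern, text):
    # Hypothetical helper: a plain match consumes the characters it
    # matched, so a second occurrence sharing them is skipped.
    return len(re.findall(re.escape(pattern), text))

print(count_overlapping("ACTA", "ACTACTA"))      # 2
print(count_non_overlapping("ACTA", "ACTACTA"))  # 1
```

Run against the "AGCTGT" example from the question, `count_overlapping` should give 1 for AGCT, GCTG and CTGT, and 0 for TGTA. Scoring every possible 4-letter word this way means one pass over the sequence per word, which may or may not be acceptable over 20K sequences.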