
Anyway 9 seconds are not that much..

See how it works out in the real thing.

especially if I start with your algo...

index isn't my algo, just a poor, pure perl substitute for it.

If you look closely at the inline C version, longCmp() is not your average brute force string search. It uses several short circuits to avoid unnecessary comparisons.

For example, it checks the last byte in the haystack up front.

First, there is no point in comparing all 300 bytes if the last bytes don't match.
When longCmp() encounters this scenario:
```
haystack:....0ACGTACGTACGT0....
needle  :     ACGTACGc
```
The fact that the last bytes to be matched are different means it doesn't have to compare all the intermediate bytes to discover that, and can move forward to the next position.
Second, if the last byte that would be compared is 0, then not only is there no point in comparing these two, but we can skip ahead to the start of the next string in the haystack.
That is, when longCmp() tries this comparison:
```
haystack:....0ACGTACGTACGT0....
needle  :          AACCGGTT
```
It sees that the last byte that would be compared is a null, and not only skips that comparison, but skips ahead to the start of the next string:
```
haystack:....0ACGTACGTACGT0....
needle  :                  AACCGGTT
```
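
To make the two short circuits concrete, here is a minimal sketch in C. It is not the longCmp() posted earlier in the thread: the function name, signature and return convention are made up for illustration, and it assumes the haystack is a single buffer of null-separated strings and that the needle contains no null bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; not the thread's longCmp().
 * hay    : buffer of hlen bytes holding strings separated by '\0'
 * needle : nlen bytes, assumed to contain no '\0'
 * Returns the offset of the first match in hay, or -1 if none. */
long find_with_short_circuits(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    assert(nlen > 0);
    size_t pos = 0;

    while (pos + nlen <= hlen) {
        size_t last = pos + nlen - 1;   /* last byte this window would compare */
        char   tail = hay[last];

        if (tail == '\0') {
            /* Short circuit 2: the window ends on a separator, so every
             * window that still covers that byte must fail too.
             * Jump to the start of the next string. */
            pos = last + 1;
            continue;
        }
        if (tail != needle[nlen - 1]) {
            /* Short circuit 1: last bytes differ, so skip the full compare. */
            pos++;
            continue;
        }
        /* Last bytes agree; only now compare the remaining nlen - 1 bytes. */
        if (memcmp(hay + pos, needle, nlen - 1) == 0)
            return (long)pos;
        pos++;
    }
    return -1;
}
```

With a haystack of `"\0ACGTACGTACGT\0AACCGGTT"`, a search for `AACCGGTT` rejects the early positions on the last-byte test alone, hits the null at offset 13, and jumps straight to offset 14 where it matches.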

It isn't possible to code this kind of logic in perl efficiently, hence the inline C solution. My pure perl version was simply a poor substitute until the OP can sort out his Inline::C install.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^15: list of unique strings, also eliminating matching substrings by BrowserUk
in thread list of unique strings, also eliminating matching substrings by lindsay_grey
