
glasswalk3r,
First of all, don't do this:
$ ./test < textfile | sort | uniq
Perl has more than enough tools to do the job that is done with the sort and uniq programs. System calls are expensive, and sometimes the speed of those programs doesn't pay for the cost of invoking them.

I think you are repeating what you have heard others say without really understanding it yourself. In this particular case, sort and uniq are most likely compiled C programs optimized for a single task, and they will probably outperform an equivalent Perl solution. System calls can be expensive, but that is not the issue here: the shell starts sort and uniq once each for the whole pipeline, so their startup cost is paid once, not once per line.
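For comparison, here is a minimal sketch of the in-Perl replacement being suggested. The function name `sort_uniq` is hypothetical. It does the same job as the pipeline, but in interpreted Perl, and it keeps every distinct line in a hash in memory.

```perl
use strict;
use warnings;

# Pure-Perl stand-in for `sort | uniq`: a hash drops repeated lines, then
# sort orders what is left. Perl's cmp compares by character value, which
# matches sort(1) ordering only under the C locale.
sub sort_uniq {
    my @lines = @_;
    my %seen;
    return sort { $a cmp $b } grep { !$seen{$_}++ } @lines;
}

# Demo on literal data; a real script would pass in the lines it read.
my @out = sort_uniq(qw(b a b c a));
print "@out\n";    # prints "a b c"
```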

open(IN, "<$file") or die "Cannot read $file: $!\n";
my @content = <IN>;
close(IN);
This will speed things up compared to using a while block.

Well, it may speed things up, but at the expense of memory. I do not know how many lines are in the file, but if individual strings are 9 million characters long, this may well be the wrong way to go: slurping holds the entire file in memory at once, while a while loop holds only the current line. You still need to loop through the array, so it does not avoid the loop either. Whatever speed savings there are come from the disk I/O.
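A minimal sketch of the line-at-a-time alternative, assuming the per-line work is independent of the other lines. The function `longest_line_length` and the in-memory sample data are hypothetical placeholders for whatever the real loop does.

```perl
use strict;
use warnings;

# Line-at-a-time processing: only the current line is held in memory, so
# the footprint does not grow with the number of lines. Slurping with
# my @content = <$fh> keeps every line in memory at once instead.
# Tracking the longest line is just placeholder per-line work.
sub longest_line_length {
    my ($fh) = @_;
    my $max = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $max = length($line) if length($line) > $max;
    }
    return $max;
}

# Demo on an in-memory filehandle; for a real file use
# open(my $fh, '<', $file) or die "Cannot read $file: $!\n";
my $data = "abc\n123456789\nxy\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!\n";
print longest_line_length($fh), "\n";    # prints 9
close($fh);
```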

Try as much as you can to avoid using next inside for loops. Look up the Schwartzian Transform to see how to improve your code. Try using @sequence = split( //, $sequence ) instead of another loop.

I am not sure why you think using next inside a for loop is a bad thing. If you can eliminate the unwanted elements before entering the loop, that is an advantage, because you no longer evaluate a conditional on every iteration. That is seldom possible, though. The ST (Schwartzian Transform) is used to speed up sorting when computing the sort key for each element is expensive: it computes every key once up front instead of recomputing it in each comparison. It looks out of place alongside the rest of your advice, so you should explain why you think it is relevant here.
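A minimal sketch of when eliminating elements before the loop actually pays off, assuming the same list is scanned several times. The sub names and sample data are hypothetical. With a single pass, grep still tests each element once, so there is little to gain over next.

```perl
use strict;
use warnings;

# Version 1: `next` re-tests every element on every pass.
sub total_with_next {
    my ($items, $passes) = @_;
    my $total = 0;
    for (1 .. $passes) {
        for my $n (@$items) {
            next if $n < 0;    # skip negative values
            $total += $n;
        }
    }
    return $total;
}

# Version 2: filter once up front, so the repeated loop has no conditional.
sub total_prefiltered {
    my ($items, $passes) = @_;
    my @wanted = grep { $_ >= 0 } @$items;    # the only test of each element
    my $total = 0;
    for (1 .. $passes) {
        $total += $_ for @wanted;
    }
    return $total;
}

my @items = (3, -1, 4, -1, 5);
print total_with_next(\@items, 3), " ", total_prefiltered(\@items, 3), "\n";    # prints "36 36"
```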
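For reference, a minimal Schwartzian Transform sketch. The sample sequences and the key function `count_g` (which counts G characters) are made up for illustration; they stand in for a key that would be costly to recompute on every comparison.

```perl
use strict;
use warnings;

# Hypothetical "expensive" key: the number of G characters in a string.
sub count_g { my ($s) = @_; return ($s =~ tr/G//); }

my @seqs = qw(GATTACA GGGCC ACGT TTTT);

# Read bottom to top: compute each key once, sort on the cached key,
# then discard the key and keep the original string.
my @sorted =
    map  { $_->[0] }                    # 3. keep only the original string
    sort { $a->[1] <=> $b->[1]          # 2. compare cached keys ...
           or $a->[0] cmp $b->[0] }     #    ... ties broken alphabetically
    map  { [ $_, count_g($_) ] }        # 1. pair each string with its key
    @seqs;

print "@sorted\n";    # prints "TTTT ACGT GATTACA GGGCC"
```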

Finally, the real problem here is the size of the numbers involved. A brute-force algorithm, no matter how well it is tuned, will be extremely slow at finding the longest repeated substring of a 9-million-digit number. If you are interested in the math, I will be happy to provide it.
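To make the scale concrete, here is a minimal brute-force sketch. It is not Yzzyx's actual code, and the sub name is hypothetical. It compares the text starting at every pair of positions, so for a string of n characters there are n(n-1)/2 pairs; for n = 9,000,000 that is roughly 4.05 x 10^13 pairs before a single character comparison, and each comparison can run up to n characters, giving O(n^3) in the worst case.

```perl
use strict;
use warnings;

# Brute-force longest repeated substring: for every pair of start positions
# (i, j) with i < j, extend while the characters match. Overlapping
# occurrences count. Worst case O(n^3).
sub longest_repeated_brute {
    my ($s) = @_;
    my $n = length $s;
    my ($best_pos, $best_len) = (0, 0);
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            my $len = 0;
            $len++ while $j + $len < $n
                && substr($s, $i + $len, 1) eq substr($s, $j + $len, 1);
            ($best_pos, $best_len) = ($i, $len) if $len > $best_len;
        }
    }
    return substr($s, $best_pos, $best_len);    # "" when nothing repeats
}

print longest_repeated_brute("1231234123"), "\n";    # prints 123
```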

Cheers - L~R

In reply to Re^2: Longest repeated string... by Limbic~Region
in thread Longest repeated string... by Yzzyx
