
How many CPUs do you have? Neither threads nor processes are likely to help unless you have more than one, and even then the method you are considering is unlikely to be much quicker, because every thread or process would be accessing the same disc and file. You would likely just slow things down.

One thing that might help is to load the file as a single string. I know you said you do not have enough memory, but I am guessing that you have been trying to load the file as an array of lines, which takes a lot more RAM than loading it as a single string. It's a rare machine these days that does not have 200MB available.

The following shows my fairly average machine loading a 200MB file consisting of 100-character lines of random digits, then searching the resultant string and finding 2000+ occurrences, all in just under 3/4 of a second.

#! perl -slw
use strict;
use Benchmark::Timer;
my $T = Benchmark::Timer->new;

# Slurp the whole file into one scalar with a single sysread.
$T->start('read');
open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
my $text;
sysread $in, $text, -s( $ARGV[0] ) or die $!;
$T->stop('read');
printf "file contains %d bytes\n", length $text;

# Count every occurrence of the target within the single string.
my $count = 0;
$T->start('regex');
$count++ while $text =~ m[(12345)]g;
$T->stop('regex');
print "$count matches found";

$T->report;

__END__
C:\test>junk junk.dat
file contains 213909504 bytes
2016 matches found
1 trial of read (452.624ms total)

1 trial of regex (260.872ms total)

If you are searching for multiple texts and need to capture/output whole lines then things will be slower, but it's not possible to be more realistic without better information.
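For what it's worth, here is a rough sketch of what the multiple-text, whole-line case might look like against the same slurped string. The sub name matching_lines, the search strings and the sample text are made up for illustration, and I am assuming plain literal strings rather than patterns; your real requirements may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper: return every whole line of the referenced text
# that contains at least one of the literal strings in @needles.
sub matching_lines {
    my ( $text_ref, @needles ) = @_;

    # Build one alternation of quoted literals so the text is scanned once.
    my $alt = join '|', map { quotemeta } @needles;

    # With /m, ^ and $ anchor at line boundaries; without /s, '.' never
    # crosses a newline, so each capture is exactly one line.
    my $re = qr/^(.*(?:$alt).*)$/m;

    my @lines;
    push @lines, $1 while $$text_ref =~ /$re/g;
    return @lines;
}

# Illustrative data only; in practice $text would be the slurped file.
my $text = "abc12345def\nno match here\nxx67890\n12345 and 67890\n";
print "$_\n" for matching_lines( \$text, '12345', '67890' );
```

Run against the sample string, this prints the first, third and fourth lines. A line containing several of the search strings is returned only once, because the match consumes the whole line. Expect it to be slower than the simple count, since the regex has to backtrack within each line to find the line boundaries.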

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re: How to speed up/multi thread extract from txt files? by BrowserUk
in thread How to speed up/multi thread extract from txt files? by MelaOS
