Hi monks,

I need some help with this one.

I would like to speed up reading from a file. I know I can do that by first splitting the file into several smaller files and then forking the reading process. Is there any other, more elegant way to do this?

Why am I trying to do this? Well, the script I'm dealing with takes a line from a file and processes it. The processing is what slows my procedure down, and because I'm while-looping through the file, my idea was to split the file and then fork the whole procedure for every piece of it. Putting the file in memory is not an option (the file is too big for my PC).

So what I'm asking for is a different point of view on this problem of mine. A different idea...

Thanks

pseudocode

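use strict;
use warnings;

my $file  = 'original_file';   # placeholder name of the big input file
my $parts = 4;

# first pass: count the lines
open(my $in, '<', $file) or die "cannot open $file: $!";
my $counter = 0;
$counter++ while <$in>;
close $in;

# lines per piece, rounded up so the remainder never spills into a 5th file
my $piece = int(($counter + $parts - 1) / $parts);

# second pass: copy the lines into "$file.1" .. "$file.$parts"
open($in, '<', $file) or die "cannot open $file: $!";
my ($count_for_piece, $n) = (0, 1);
open(my $out, '>', "$file.$n") or die "cannot open $file.$n: $!";
while (my $line = <$in>) {
  if ($count_for_piece == $piece) {
    close $out;
    $n++;
    open($out, '>', "$file.$n") or die "cannot open $file.$n: $!";
    $count_for_piece = 0;
  }
  print {$out} $line;
  $count_for_piece++;
}
close $out;
close $in;

my @ch;
for my $i (1 .. $parts) {
  my $pid = fork();
  # fork returns undef on failure, so check that before comparing with 0
  die "fork failed: $!" unless defined $pid;
  if ($pid) {
    push(@ch, $pid);   # parent: remember the child
  }
  else {
    # read from file $i and do some processing
    exit;
  }
}
waitpid($_, 0) for @ch;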

This is just an example of what I do to speed up my work!

Update:

#read from file 1 and do some processing

I didn't really benchmark that, but what happens here is that a line is read, a regex identifies the number in it, and that number is then looked up in an in-memory hash table. Then, based on a correlated value from that table, a quick statistical correction (FDR) is calculated for that value. So basically, what I was thinking of when trying to speed things up is to divide my calculation and regex identification across several cores (the CPUs are at 100% when I do my parallelization as mentioned). I'll do some benchmarking later and post the results.
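
Roughly, the per-line work looks like the sketch below. This is not my real code: the regex, the table contents, the sample lines and the name process_line are all made up for illustration, and the FDR step is only marked with a comment.

```perl
use strict;
use warnings;

# Hypothetical lookup table: number found in a line => correlated value.
# The real table is much bigger and is loaded into memory up front.
my %table = (101 => 0.5, 202 => 0.25);

# Pull the first number out of a line and look it up in the table.
# Returns undef when the line has no number or the number is unknown.
sub process_line {
    my ($line) = @_;
    my ($id) = $line =~ /(\d+)/ or return;
    return unless exists $table{$id};
    # the FDR correction would be calculated here; left out of this sketch
    return $table{$id};
}

# Made-up sample lines standing in for the lines of one file piece.
for my $line ("gene 101 foo\n", "no number here\n", "id=202\n") {
    my $value = process_line($line);
    print defined $value ? "$value\n" : "skipped\n";
}
```

Each forked child would run this kind of loop over its own piece of the file, with the hash already built in the parent before the fork.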

In reply to fork IO by baxy77bax
