xiaoyafeng has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'd like to split a file into three parts and search each part for a different string. Based on GrandFather's reply and Super Search, Tie::File seems to be the best way. Below is my code:
use strict;
use warnings;
use Tie::File;

my $file_name = 'test.txt';
my @name;
tie @name, 'Tie::File', $file_name;

die "error!!!" unless defined (my $pid_1 = fork());
exit search_string(0, 9999, "8888") unless ($pid_1);
die "error!!!" unless defined (my $pid_2 = fork());
exit search_string(10000, 19999, "18888") unless ($pid_2);
die "error!!!" unless defined (my $pid_3 = fork());
exit search_string(20000, 29999, "28888") unless ($pid_3);

wait; wait; wait;
untie @name;

sub search_string {
    my ($from, $to, $string) = @_;
    for ($from .. $to) {
        print "the $_ th line of test.txt is $string\n"
            if $name[$_] =~ /^$string$/x;
    }
}

__test.txt__
1
2
3
...
29998
29999
30000
It works quite well the FIRST time, but... look:
# first time
the 8887 th line of test.txt is 8888
the 18887 th line of test.txt is 18888
the 28888 th line of test.txt is 28888

# second time
the 8888 th line of test.txt is 8888
the 18888 th line of test.txt is 18888
the 28889 th line of test.txt is 28888

# third time
the 8887 th line of test.txt is 8888
the 28887 th line of test.txt is 28888
Use of uninitialized value in pattern match (m//) at serarch_file.pl line 42, <$fh> line 29884.
Use of uninitialized value in pattern match (m//) at serarch_file.pl line 42, <$fh> line 29884.
Use of uninitialized value in pattern match (m//) at serarch_file.pl line 42, <$fh> line 29884.
Use of uninitialized value in pattern match (m//) at serarch_file.pl line 42, <$fh> line 29884.
Use of uninitialized value in pattern match (m//) at serarch_file.pl line 42, <$fh> line 29884.
Please advise!

I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction

Re: regarding Tie::File
by chromatic (Archbishop) on Jan 13, 2007 at 01:54 UTC
    Below is my code:

    That code doesn't have 42 lines. Thus the error is coming from another program. As we can't see it, we can only guess as to what it might be.

    As a side note, why are you using fork? Do you have a multiprocessor machine? Your single file is probably on a single hard drive and almost definitely on a single hard drive controller. Your bottleneck here is IO. As your algorithm doesn't support the overlap of lines, you can't even benefit from any OS-level caching.

    I'm also not sure that Tie::File is fork safe. It might be. I don't know.
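One thing that can be said with some confidence: a file handle opened before fork() is shared by the parent and all children, including its read position, so processes that seek and read through the same handle can disturb each other. That may or may not be the whole story behind the changing results above. The sketch below sidesteps the question by tying the file inside each child, after the fork. The sub names search_range and search_in_children are made up for this illustration, and the read-only mode is an assumption about what the search needs.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Return the 0-based record indices in [$from, $to] whose record equals $string.
# The tie happens inside this sub, so every caller (every child process)
# opens its own handle on the file instead of inheriting a shared one.
sub search_range {
    my ($file, $from, $to, $string) = @_;
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "cannot tie $file: $!";
    # Records past the end of the file come back undefined, so test for that.
    my @hits = grep { defined $lines[$_] && $lines[$_] eq $string } $from .. $to;
    untie @lines;
    return @hits;
}

# Fork one child per job; each job is [ from, to, string ].
sub search_in_children {
    my ($file, @jobs) = @_;
    for my $job (@jobs) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        next if $pid;                  # parent: go on to start the next child
        my ($from, $to, $string) = @$job;
        print "record $_ of $file is $string\n"
            for search_range($file, $from, $to, $string);
        exit 0;                        # child: never fall back into the loop
    }
    1 while wait() != -1;              # parent: reap every child
}

# Example call with the ranges from the question (runs only if a file is given).
search_in_children($ARGV[0], [0, 9999, '8888'], [10000, 19999, '18888'],
    [20000, 29999, '28888']) if @ARGV;
```

The indices are 0-based, as in the original code, so record 8887 holds the text 8888 when the file counts up from 1. Whether three processes are any faster than one is a separate question, since all of them still read from the same disk.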

      Thanks for your reply. In fact, the script is going to run on a two-processor Tru64 machine, and the file I want to search is over 500 MB. So I thought that using fork might speed up my search.

        I thought that using fork might speed up my search.

        It's pretty unlikely, as it is. Can you split the file into two pieces and put each on separate hard drives with separate controllers? That's more likely to help. (It's not easy to say for sure. There's some black magic with regard to performance optimization. It's absolutely necessary to identify the single most likely bottleneck, though. That's IO here, as it usually is, and having only one file means that that's the most likely place for the bottleneck to start.)

        If your actual code is about as simple as that which you posted, GNU grep might be more useful; see the -n option.
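If grep is not available, roughly the same thing can be done in Perl with one sequential pass and no fork or Tie::File at all. This is only a sketch: the sub name find_lines is invented here, and it assumes that "match" means the whole line equals one of the search strings, as the anchored regex in the question suggests.

```perl
use strict;
use warnings;

# Scan $file once and return [line_number, text] pairs for every line that
# exactly equals one of @wanted. Line numbers are 1-based, as with grep -n.
sub find_lines {
    my ($file, @wanted) = @_;
    my %wanted = map { $_ => 1 } @wanted;    # hash lookup per line
    open my $in, '<', $file or die "cannot open $file: $!";
    my @hits;
    while (my $line = <$in>) {
        chomp $line;
        push @hits, [ $., $line ] if $wanted{$line};    # $. is the current line number
    }
    close $in;
    return @hits;
}

# Example call with the strings from the question (runs only if a file is given).
if (@ARGV) {
    printf "%d:%s\n", @$_ for find_lines($ARGV[0], '8888', '18888', '28888');
}
```

Because all three strings are checked during the same pass, the file is read from disk only once, which matters more than the number of processes when IO is the bottleneck.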

      And what does "That code doesn't have 42 lines" mean? Will it lead to unpredictable results?


      Update: corrected a grammar mistake, as ww mentioned.

        Your warning message was "Use of uninitialized value in pattern match (m//) at serarch_file.pl line 42, ..." This means the warning came from a script with at least 42 lines of code in it. Yours didn't; therefore, whatever script you quoted wasn't what produced the warning.
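To see how Perl attaches the script name and line number to such a warning, here is a small self-contained sketch. It is not the poster's script: it deliberately matches an undefined variable and captures the warning text so it can be inspected.

```perl
use strict;
use warnings;

# Capture warnings instead of printing them, so the text can be inspected.
my @captured;
local $SIG{__WARN__} = sub { push @captured, $_[0] };

my $value;                            # deliberately left undefined
my $matched = $value =~ /^8888$/;     # the warning is raised for this line
my $warn_line = __LINE__ - 1;         # line number of the match above

# The captured text names this script and the line of the match.
print $captured[0];
```

The line number in the message is the line of the statement that used the undefined value, which is why a report of "line 42" implies the script that ran was at least 42 lines long.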

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊