PerlMonks  

Matching data in a big file

by alexsc01 (Novice)
on Dec 14, 2013 at 10:00 UTC ( [id://1067118]=perlquestion )

alexsc01 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm reading in a 19MB Rich Text file (a student's report card) using:

    open(RTF_FILE, "file");
    while (<RTF_FILE>) { $document .= $_; }
    $document =~ s/SURNAME/$surname/;
    ....
    ....
    print $document;

I have about 20 of these substitution lines. At first I had them inside the while loop, and opening a document took about 30 seconds. Now that they are outside the loop, it takes about 5 seconds. Is there a better way to do this?

Scott

Replies are listed 'Best First'.
Re: Matching data in a big file
by hdb (Monsignor) on Dec 14, 2013 at 10:07 UTC

    You can slurp the file in one go by unsetting the special variable $/:

    { local $/; $document = <RTF_FILE>; }
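For reference, here is a minimal sketch of how that idiom might fit into the original task. It assumes a lexical filehandle with three-argument open, and the helper names slurp_file and fill_placeholders are hypothetical, not from the original post. Like the poster's s/SURNAME/$surname/, each substitution replaces only the first occurrence.

```perl
use strict;
use warnings;

# Slurp a whole file into one scalar by localizing $/ (the input record separator).
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    local $/;                 # undef only within this sub: <$fh> reads to end of file
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Apply each placeholder substitution once to the in-memory document.
# Keys are sorted so the order of substitutions is deterministic.
sub fill_placeholders {
    my ($document, %values) = @_;
    for my $placeholder (sort keys %values) {
        # \Q...\E treats the placeholder as literal text, not as a regex.
        $document =~ s/\Q$placeholder\E/$values{$placeholder}/;
    }
    return $document;
}
```

Usage would look something like print fill_placeholders(slurp_file("file"), SURNAME => $surname); with one key per placeholder. Because the file is read once and each substitution runs once over the whole string, the work no longer scales with the number of lines.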

      hdb:

      Lately, I've just been using:

      $document = join("", <RTF_FILE>);

      Compared to the localize technique, I find it clear and easy to type. It is, however, very inefficient. It's fast enough, though, that until today I had never noticed the time it takes to load the file. Will I continue to use the join version? Certainly, except for a task where I need to slurp enough files that it would make a significant difference.

      Just because I have them, here are my benchmark results in short: File::Slurp and the local technique are roughly equivalent, and both are about 10 times faster than the join version.
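A comparison along these lines could be sketched with the core Benchmark module. This is not the original benchmark code: the temporary test file, its size, and the sub names slurp_local and slurp_join are assumptions made for illustration, and File::Slurp is left out so the sketch needs only core modules.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a throwaway test file (size and contents are arbitrary assumptions).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line $_ of some sample text\n" for 1 .. 20_000;
close($tmp_fh) or die "Cannot close '$tmp_name': $!";

# Slurp by localizing $/ so a single read returns the whole file.
sub slurp_local {
    open(my $fh, '<', $tmp_name) or die "Cannot open '$tmp_name': $!";
    local $/;
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Slurp by reading every line in list context and joining the lines.
sub slurp_join {
    open(my $fh, '<', $tmp_name) or die "Cannot open '$tmp_name': $!";
    my $content = join('', <$fh>);
    close($fh);
    return $content;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    local_slurp => \&slurp_local,
    join_lines  => \&slurp_join,
});
```

The numbers will vary with file size, line length, and Perl version, so treat any ratio you see as specific to your own machine and data.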

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        Thanks roboticus for your useful comparison.

      slurp without $/

      use Path::Tiny qw/ path /;
      my $document = path( "file" )->slurp;
      $document = path( "file" )->slurp_raw;
      $document = path( "file" )->slurp_utf8;
      $document = path( "file" )->slurp( {binmode => ":raw"} );

Re: Matching data in a big file
by CountZero (Bishop) on Dec 14, 2013 at 12:23 UTC
    Have a look also at File::Slurp and its edit_file and edit_file_lines functions which can do "in place" edits to your file.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
