PhilFromIndy has asked for the wisdom of the Perl Monks concerning the following question:

I've got a 680+ MB fixed-width text file. Normally no big deal, but this file is one long line. I want to put a newline at the end of every record so I can deal with this monster in a reasonable manner. Each record is 164 characters long; I wrote the following script to put in the newlines, but after running for several hours it just gave me an "Out of memory!" error and provided no other output.
open(INPUT,"filename.txt") or die "Will not open input: $!";
open(OUTPUT,">output.txt"); or die "Will not open output: $!";
my $input = <INPUT>;
my $line;
my $i;
my $length = length($input);

for ($i=0; $i<$length; $i=$i+164){
    $line = substr($input,$i,164);
    print "$line\n";          # Output to screen to make sure it works
    print OUTPUT "$line\n";
}

Is there another way to go about this, or do I need a machine with more available RAM? I ran this on a newish Core 2 Duo with 3+ GB of RAM.

Replies are listed 'Best First'.
Re: Dealing with huge text string
by BrowserUk (Patriarch) on Mar 28, 2008 at 13:25 UTC

    Read it one record at a time:

    open(INPUT,"filename.txt") or die "Will not open input: $!";
    open(OUTPUT,">output.txt") or die "Will not open output: $!";

    local $/ = \164;

    while( <INPUT> ) {   ## Updated per jwkrahn's post below
        print OUTPUT "$_\n";
    }

    close OUTPUT;
    close INPUT;

    BTW: The semicolon after open(OUTPUT,">output.txt"); is kind of a giveaway that you didn't have the error checking :)

    As a one-liner:

    perl -ple"BEGIN{$/=\164}" filename.txt >output.txt

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Change  while( <IN> ) { to  while( <INPUT> ) { and it may work better.

      Took a few seconds to run, thanks!
      The superfluous semicolon ended up in the opening of the output file while I was changing the names of the files to protect the innocent.
      That will certainly fail in many circumstances. See ikegami's post and my example. If you are unlucky enough to break a wide character in half with 164-byte reads, well, that would suck.
Re: Dealing with huge text string
by ikegami (Patriarch) on Mar 28, 2008 at 13:54 UTC
    read would be the alternative to setting $/
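    A minimal sketch of that read-based alternative, using an in-memory filehandle and a 6-byte record length so it is self-contained (substitute the real file and 164 for the actual problem):

```perl
use strict;
use warnings;

# In-memory stand-in for the fixed-width file: three 6-byte records.
my $data = 'AAAAAA' . 'BBBBBB' . 'CCCCCC';
open my $in, '<', \$data or die "cant open input: $!";

my $reclen = 6;    # 164 for the original problem
while ( ( my $got = read $in, my $record, $reclen ) ) {
    die "short record ($got bytes)" if $got != $reclen;
    print "$record\n";
}
close $in;
```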
Re: Dealing with huge text string
by locked_user sundialsvc4 (Abbot) on Mar 28, 2008 at 13:42 UTC

    Yeah, don't forget that “memory” is virtual. In other words, it is backed by a disk file. So your program tried to copy 680+ megabytes from one disk file to another, the hard way. It probably never succeeded in doing just that.

    There are several ways to do it (of course), but yes, the bottom line is that you need to read n bytes at a time and write each piece out followed by a newline.
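    That bottom line can be sketched with buffered reads of several records at a time; the in-memory input and tiny sizes here are illustrative (for the real file, open filename.txt and make the chunk a large multiple of 164):

```perl
use strict;
use warnings;

my $reclen = 4;                     # 164 in the original problem
my $chunk  = $reclen * 3;           # keep the chunk a multiple of the record size
my $input  = 'aaaabbbbccccdddd';    # stands in for the 680+ MB file
open my $in,  '<', \$input     or die "cant open input: $!";
open my $out, '>', \my $output or die "cant open output: $!";

while ( ( my $got = read $in, my $buf, $chunk ) ) {
    for ( my $i = 0; $i < $got; $i += $reclen ) {
        print $out substr( $buf, $i, $reclen ), "\n";
    }
}
close $out;

print $output;    # four 4-byte records, one per line
```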

Re: Dealing with huge text string
by mobiusinversion (Beadle) on Mar 28, 2008 at 19:16 UTC
    BrowserUk should have a look at ikegami's post and the perldoc entry on $/

    If by "each record is 164 characters long", our friend Phil really meant that each record is 164 bytes long, then BrowserUk's solution would be fine. If, on the other hand, Phil's file had a wide character in it (that is, a single logical character that requires more than one byte of storage, for example the pound sign £ or the trademark sign ™), he'd be smoked.

    The most general way for Phil to feed fixed width fields from a file is as follows.
    use strict;
    use warnings;

    my $length = 164;
    my $file   = 'path/to/filename.txt';

    # you will supply the right encoding for your data;
    # 'UTF-8' is one common example
    open(my $F, '<:encoding(UTF-8)', $file) or die "cant open $file\n$!\n";

    while( read( $F, my $record, $length ) ){
        # do something with $record
    }
    If Phil was sure that his text file contained no wide characters, he could omit the ':encoding' portion of the open mode; read operates on bytes unless otherwise informed by the status of the filehandle in question.
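    A small self-contained illustration of that point, assuming UTF-8 data: once an :encoding layer is on the handle, read counts characters rather than bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Four pound signs: 4 characters, but 8 bytes when encoded as UTF-8.
my $bytes = encode( 'UTF-8', "\x{A3}" x 4 );
open my $in, '<:encoding(UTF-8)', \$bytes or die "cant open: $!";

read $in, my $record, 2;    # 2 *characters*, i.e. 4 bytes from the stream
print length($record);      # prints 2
```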

    A related issue:

    To test if an in-memory scalar contains wide-characters, use the bytes pragma and the following trick:
    my $c = 'some_scalar_data';

    test_for_wide_chars: {
        require bytes;
        if (   bytes::length($c) > length($c)
            || ( $] >= 5.008 && $c =~ /[^\0-\xFF]/ ) )
        {
            print "i found a wide character!";
        }
    }
      If by "each record is 164 characters long", our friend Phil really meant that each record is 164 bytes long, then BrowserUk's solution would be fine.

      It was

      Six hours of research to find a semantic quibble for a problem that was solved five hours ago?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Ah young Browser, wide characters are no laughing matter!

        Have you ever done any web programming? If so then you'll have run into wide characters when using HTML entities.

        How about this, try this code and witness the power of wide characters, which really do exist!
        use strict;
        use warnings;
        use LWP::UserAgent;

        open(my $F, '>:utf8', 'wide-chars-example.html') or die "cant open: $!";
        my $url  = 'http://www.w3schools.com/tags/ref_symbols.asp';
        my $html = LWP::UserAgent->new()->get($url)->content;
        print $F $html;
        Now open the newly made file using your method... Try:
        your_method: {
            local $/ = \2;  # note the reference: \2 means two-byte records,
                            # while $/ = 2 would set the separator to the string "2"
            # open 'wide-chars-example.html' and process the 'records'
        }
        You'll get an interesting surprise!
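        You can see the surprise without fetching the page at all; assuming the file holds UTF-8 bytes, two-byte records split the pound sign's two-byte encoding down the middle:

```perl
use strict;
use warnings;

# Raw UTF-8 bytes of "a£b": the pound sign is the two bytes 0xC2 0xA3.
my $bytes = "a\xC2\xA3b";
open my $in, '<', \$bytes or die "cant open: $!";

local $/ = \2;              # fixed two-byte records
my @records = <$in>;
# $records[0] is "a\xC2" -- an 'a' plus half a pound sign: the character is broken.
```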

        Take care!