in reply to Dealing with huge text string

BrowserUK should have a look at Ikegami's post and the perldoc entry on: $/

If by "each record is 164 characters long", our friend Phil really meant that each record is 164 bytes long, then Browser's solution would be fine. If, on the other hand, Phil's file had a wide character in it (that is, a single logical character that requires more than one byte of storage, for example the pound sign £ or the trademark sign ™), he'd be smoked.

The most general way for Phil to feed fixed width fields from a file is as follows.
use strict;
use warnings;

my $length = 164;
my $file   = 'path/to/filename.txt';

# You will supply the right value for the encoding.
# One common example is UTF-8 (also written 'utf8').
open(my $F, '<:encoding(UTF-8)', $file)
    or die "can't open $file\n$!\n";

while ( read( $F, my $record, $length ) ) {
    # do something with $record
}
If Phil were sure that his text file contained no wide characters, he could omit the ':encoding' portion of the open mode; read operates on bytes unless the layers on the filehandle in question tell it otherwise.
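
To make the byte-versus-character difference concrete, here is a minimal sketch. The sample records, the 4-character record length and the read_records helper are all made up for illustration; none of them come from Phil's data. It writes three records to a temporary file as UTF-8, then reads them back once through an encoding layer and once as raw bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: three 4-character records, each holding one
# character that needs more than one byte in UTF-8 (pound sign, trademark).
my @records = ("ab\x{A3}1", "cd\x{2122}2", "ef\x{A3}3");
my $length  = 4;    # record length, counted in characters

# Write the records back to back, encoded as UTF-8.
my ($out, $file) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} join('', @records);
close($out) or die "can't close $file: $!\n";

# Hypothetical helper: read $length units at a time using the given open mode.
sub read_records {
    my ($mode) = @_;
    open(my $in, $mode, $file) or die "can't open $file: $!\n";
    my @got;
    while ( read( $in, my $record, $length ) ) {
        push @got, $record;
    }
    close($in);
    return @got;
}

my @by_char = read_records('<:encoding(UTF-8)');   # read counts characters
my @by_byte = read_records('<:raw');               # read counts bytes

printf "character reads: %d records\n", scalar @by_char;
printf "byte reads:      %d records\n", scalar @by_byte;
```

The file holds 12 characters but 16 bytes, so the character-mode reader should hand back the three original records, while the byte-mode reader returns four 4-byte chunks whose boundaries no longer line up with the records.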

A related issue:

To test if an in-memory scalar contains wide-characters, use the bytes pragma and the following trick:
my $c = 'some_scalar_data';

test_for_wide_chars: {
    require bytes;    # loads bytes::length without enabling byte semantics
    if ( bytes::length($c) > length($c)
        || ( $] >= 5.008 && $c =~ /[^\0-\xFF]/ ) )
    {
        print "i found a wide character!\n";
    }
}
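
As a rough illustration of what that test is measuring, the sketch below wraps it in a helper and runs it on two strings. The has_wide_chars name and the sample strings are invented for this example, and it assumes Perl 5.8 or later, so the version check is dropped.

```perl
use strict;
use warnings;
require bytes;    # loads bytes::length without enabling byte semantics

# Hypothetical helper wrapping the same two checks as the trick above.
sub has_wide_chars {
    my ($c) = @_;
    return ( bytes::length($c) > length($c) || $c =~ /[^\0-\xFF]/ ) ? 1 : 0;
}

# Plain ASCII, then a string holding only the trademark sign (U+2122).
for my $sample ( "abc", "\x{2122}" ) {
    printf "chars=%d bytes=%d wide=%d\n",
        length($sample), bytes::length($sample), has_wide_chars($sample);
}
```

For the ASCII string the two lengths agree; for the trademark sign length reports one character while bytes::length reports the three bytes Perl uses to store it internally, and that gap is what the first half of the test picks up.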

Re^2: Dealing with huge text string
by BrowserUk (Patriarch) on Mar 28, 2008 at 19:33 UTC
    If by "each record is 164 characters long", our friend Phil really meant that each record is 164 bytes long, then Browser's solution would be fine.

    It was.

    Six hours of research to find a semantic quibble for a problem that was solved five hours ago?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Ah young Browser, wide characters are no laughing matter!

      Have you ever done any web programming? If so then you'll have run into wide characters when using HTML entities.

      How about this, try this code and witness the power of wide characters, which really do exist!
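      use LWP::UserAgent;

      my $url      = 'http://www.w3schools.com/tags/ref_symbols.asp';
      my $response = LWP::UserAgent->new()->get($url);
      die "can't fetch $url: ", $response->status_line, "\n"
          unless $response->is_success;

      # decoded_content returns characters, so the :utf8 layer encodes them once
      my $html = $response->decoded_content;

      open(my $F, '>:utf8', 'wide-chars-example.html')
          or die "can't open wide-chars-example.html\n$!\n";
      print $F $html;
      close($F);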
      Now open the newly made file using your method... Try:
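      your_method: {
          # a reference to an integer requests fixed-size records;
          # a plain 2 would make the string "2" the record separator
          local $/ = \2;
          # open 'wide-chars-example.html', process the 'records'
      }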
      You'll get an interesting surprise!

      Take care!

        Of course they do. I deal with them all the time...when I have to.

        But I do not waste effort supporting the two greatest evils in software development today--What-if pessimism and Wouldn't-it-be-nice-if optimism--in one-liners that run for a few seconds and may never be used again.

        Let's see. My response was posted 4 minutes after the question was asked--and it worked. Yours came 6 hours later. By my reckoning, assuming a very conservative 7 hour day--7*60 / 4 = 105 responses a day, and 105 * 6 / 7 = 90 days--on that showing it would take you 3 months to do what I would do in a day? Nah! That can't be right.

