in reply to 32 Bit Perl causing segmentation fault if data is big

Using 32-bit 5.8.9, loading a 37MB file consisting of 37*1024 chunks of 1024 chars with '~' separators uses 120MB total and no traps:

C:\test>perl -e"print 'x' x 1024 . '~' for 1..37*1024; print 'x'" > huge.file

C:\test>dir huge.file
18/03/2010  23:58        38,835,201 huge.file

C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file

Or with 37*32*1024 chunks of 32 chars, it took 343MB, again with no traps:

C:\test>perl -e"print 'x' x 32 . '~' for 1..37*32*1024; print 'x'" > huge.file

C:\test>dir huge.file
19/03/2010  00:06        40,009,729 huge.file

C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file

Given a different OS (Vista) and Perl version, that probably tells you very little, except that unless your machine has a tiny amount of RAM, this probably isn't a memory limit problem.

What may be of more interest is that if you set $/ = '~'; you can read the file in '~'-terminated pieces and push each piece onto the array as you go.

On my machine, the latter test from above then requires only 105MB total and runs much faster:

perl -e"local $/ = '~'; push @{ $h{lines } }, $_ while <>; <STDIN>" huge.file
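For anyone who prefers a script to a one-liner, here is a minimal sketch of the same record-at-a-time read. The sub name read_records and the small in-memory sample data are made up for illustration; a real run would open huge.file instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for huge.file: 1000 chunks of 32 'x' chars, each
# followed by '~', plus one trailing 'x' (same shape as the generator above).
my $chunk = ( 'x' x 32 ) . '~';
my $data  = $chunk x 1000 . 'x';

# Read a handle one separator-terminated record at a time.
# read_records is an illustrative name, not part of any module.
sub read_records {
    my ( $fh, $sep ) = @_;
    local $/ = $sep;    # redefine "end of line" for this scope only
    my @records;
    while ( my $rec = <$fh> ) {
        push @records, $rec;    # each record still ends with the separator
    }
    return \@records;
}

# An in-memory filehandle keeps the sketch self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $records = read_records( $fh, '~' );
close $fh;

printf "%d records read\n", scalar @$records;
```

Only one record is held in a temporary at any time, rather than the whole file plus the list that split builds from it, which is presumably where the lower memory figure comes from.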

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

Re^2: 32 Bit Perl causing segmentation fault if data is big
by peacelover1976 (Initiate) on Mar 19, 2010 at 01:57 UTC
    Thanks BrowserUk. Can you elaborate a bit on the sample that you have provided? Sorry for my ignorance. I am new to Perl, so if you can correlate your sample to the sample code that I provided, I will be really thankful. Thanks, Peacelover1976
      Can you elaborate a bit on the sample that you have provided?

      By "the sample", I presume you mean this one (reformatted for readability)?

      perl -e" local $/ = '~'; push @{ $h{lines } }, $_ while <>; " huge.file

      When you call readline (or use the <> operator as above), Perl determines how much to read from the file by looking for a character (or sequence of characters) that matches the current setting of the special variable $/, also known as $INPUT_RECORD_SEPARATOR (you'll have to scroll down a ways to find it). Normally, $/ defaults to a newline. But...

      The sample above sets $/ = '~';. That means that readline will stop reading when it encounters a '~' in the input stream.

      I.e., instead of readline reading the whole 37 MB in as a single string, then splitting it into a big list, and then assigning that to the array en masse, the code above reads just up to the first '~' character, pushes that chunk onto the array, then loops (while) back to get the next chunk up to the next '~'.

      Put another way: setting $/ = '~'; has the effect of redefining a line as a sequence of chars terminated by a '~'.

      I hope one of those descriptions helps--ask again if it doesn't--because I know of no other language that has a standard library that allows you to do this. So when you first encounter it, it is definitely a bit weird.
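One detail worth seeing in action: each record that readline returns still carries its trailing '~', and chomp removes whatever $/ is currently set to. The sketch below uses made-up sample data (alpha~beta~gamma) and checks that the chomped records match what split would produce.

```perl
use strict;
use warnings;

my $data = 'alpha~beta~gamma';    # made-up sample data

# The all-at-once approach: split the whole string.
my @via_split = split /~/, $data;

# The record-at-a-time approach: '~' acts as the line terminator.
my @via_readline;
{
    local $/ = '~';
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    while ( my $rec = <$fh> ) {
        chomp $rec;    # chomp strips the current $/, here '~'
        push @via_readline, $rec;
    }
    close $fh;
}

print join( '|', @via_readline ), "\n";    # prints alpha|beta|gamma
```

Without the chomp, the first two elements would be 'alpha~' and 'beta~', which is how the one-liner above stores them.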



        Hi BrowserUk,

        I understood the role of special variable from your notes. I also tried to implement the same in my code. Please find it below:

        sub new {
            my $class = shift();
            my $self = {};
            $self->{fname} = shift() or $self->{fname} = "STDIN";
            print "Inside new in X12.pm\n";

            # Open File Handle
            if ($self->{fname} ne "STDIN") {
                # Check For Empty File
                if (not -s $self->{fname}) {
                    #die "Empty or non-existent input file.\n";
                    return 0;
                }
                local *IN;
                print "Before File Open\n";
                open(IN, "< $self->{fname}") or return 0; #die "Can't open $self->{fname} for input: $!\n";
                print "After File Open\n";
                $self->{file} = *IN;
                $self->{fopen} = 1;
            }
            else {
                $self->{file} = *STDIN;
            }

            my $line = "";
            while (not length($line)) {
                print "inside while loop in X12.pm\n";
                $line = readline($self->{file});
                #die "Unexpected EOF in file $self->{fname}\n" if (not defined($line));
                return 0 if (not defined($line));
                $line =~ s/^\s+//; # Strip leading whitespace
                print "line read from the file in X12.pm\n";
            }

            # Get Header Information and Initalize File Read Buffer
            if ($line !~ /^ISA/) {
                $self->{fieldOut} = "*";
                $self->{segOut} = "\n";
                $self->{compSep} = "<";
                $self->{fieldSep} = "\\*";
                $self->{segSep} = "\\\n";
                print "self in if cond in X12.pm\n";
            }
            elsif (length($line) < 106) {
                return 0; #die "ISA segment invalid, aborting.\n";
            }
            else {
                $self->{fieldOut} = substr($line, 3, 1);
                $self->{segOut} = substr($line, 105, 1);
                $self->{compSep} = substr($line, 104, 1);
                $self->{fieldSep} = "\\$self->{fieldOut}";
                $self->{segSep} = "\\$self->{segOut}";
                print "self in else cond in X12.pm\n";
            }
            print "self->{segSep} is $self->{segSep}\n";

            local $/ = $self->{segSep};
            # @{ $self->{lines}} = split(/$self->{segSep}/, $line);
            push @{ $self->{lines}}, $_ while readline($self->{file});
            print "New in X12 complete\n";

            bless($self, $class);
            return $self;
        }

        But I don't get the desired results. Am I doing anything wrong? Please let me know.

        Thanks,

        peacelover1976