Re^3: 32 Bit Perl causing segmentation fault if data is big

Can you elaborate a bit on the sample that you have provided?

By "the sample", I presume you mean this one (reformatted for readability)?

perl -e"
    local $/ = '~'; 
    push @{ $h{lines } }, $_ while <>; 
" huge.file

When you call readline (or use the <> operator, as above), Perl decides how much to read from the file by looking for the character (or sequence of characters) that matches the current setting of the special variable $/ (also known as $INPUT_RECORD_SEPARATOR; see perlvar). Normally, $/ defaults to a newline. But...

What the sample above does is set $/ to '~'. That means that readline will stop reading when it encounters a '~' in the input stream.
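To see this in isolation, here is a minimal sketch. The sample string and the in-memory filehandle are my own stand-ins for huge.file (assumptions for illustration only), so it can be run without any file on disk:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for huge.file:
# three records, each terminated by '~'.
my $data = "ISA*00*one~GS*two~ST*three~";

# An in-memory filehandle, so the sketch needs no file on disk.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = '~';                 # readline now stops after each '~'
    while ( my $rec = <$fh> ) {     # one '~'-terminated record per iteration
        push @records, $rec;
    }
}
close $fh;

print scalar(@records), " records\n";   # 3 records
print "[$_]\n" for @records;            # each record still ends in '~'
```

Note that the separator stays on the end of each record, just as a newline does when $/ has its default value.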

I.e. readline no longer reads the whole 37 MB in as a single string, which then has to be split into a big list and assigned to the array en masse.

Instead, the code above reads just up to the first '~' character and pushes that chunk onto the array; then it loops (while) back to get the next chunk, up to the next '~'.
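The two approaches can be put side by side in a small sketch. The tiny sample string is a hypothetical stand-in for the 37 MB file, and I chomp inside the loop only so that the two results can be compared directly; the one-liner above does not do that:

```perl
use strict;
use warnings;

my $data = "seg1~seg2~seg3~";    # hypothetical stand-in for the big file

# Approach 1: slurp everything, then split.
# The whole input exists as one big string while the pieces are produced.
my @slurped;
{
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/;                        # undef: readline returns the entire input
    my $whole = <$fh>;
    @slurped = split /~/, $whole;
    close $fh;
}

# Approach 2: read one '~'-terminated record at a time.
# Only the current record is held in addition to the growing array.
my @streamed;
{
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/ = '~';
    while ( my $rec = <$fh> ) {
        chomp $rec;                  # chomp strips the current $/, here '~'
        push @streamed, $rec;
    }
    close $fh;
}

print "Both approaches give the same records\n"
    if "@slurped" eq "@streamed";
```

The end result is the same; what differs is how much has to be held in memory at once while getting there.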

Put another way: setting $/ = '~' has the effect of redefining a line as a sequence of characters terminated by a '~'.

I hope one of those descriptions helps (ask again if it doesn't), because I know of no other language with a standard library that lets you do this. So when you first encounter it, it is definitely a bit weird.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"I'd rather go naked than blow up my ass"

Re^4: 32 Bit Perl causing segmentation fault if data is big by peacelover1976 (Initiate) on Mar 19, 2010 at 04:00 UTC
Hi BrowserUk,

I understood the role of the special variable from your notes. I also tried to implement the same in my code. Please find it below:

    sub new {
        my $class = shift();
        my $self = {};
        $self->{fname} = shift() or $self->{fname} = "STDIN";
        print "Inside new in X12.pm\n";

        # Open File Handle
        if ($self->{fname} ne "STDIN") {
            # Check For Empty File
            if (not -s $self->{fname}) {
                #die "Empty or non-existent input file.\n";
                return 0;
            }
            local IN;
            print "Before File Open\n";
            open(IN, "< $self->{fname}") or return 0; #die "Can't open $self->{fname} for input: $!\n";
            print "After File Open\n";
            $self->{file} = IN;
            $self->{fopen} = 1;
        } else {
            $self->{file} = STDIN;
        }

        my $line = "";
        while (not length($line)) {
            print "inside while loop in X12.pm\n";
            $line = readline($self->{file});
            #die "Unexpected EOF in file $self->{fname}\n" if (not defined($line));
            return 0 if (not defined($line));
            $line =~ s/^\s+//; # Strip leading whitespace
            print "line read from the file in X12.pm\n";
        }

        # Get Header Information and Initialize File Read Buffer
        if ($line !~ /^ISA/) {
            $self->{fieldOut} = "";
            $self->{segOut} = "\n";
            $self->{compSep} = "<";
            $self->{fieldSep} = "\\*";
            $self->{segSep} = "\\\n";
            print "self in if cond in X12.pm\n";
        } elsif (length($line) < 106) {
            return 0; #die "ISA segment invalid, aborting.\n";
        } else {
            $self->{fieldOut} = substr($line, 3, 1);
            $self->{segOut} = substr($line, 105, 1);
            $self->{compSep} = substr($line, 104, 1);
            $self->{fieldSep} = "\\$self->{fieldOut}";
            $self->{segSep} = "\\$self->{segOut}";
            print "self in else cond in X12.pm\n";
        }
        print "self->{segSep} is $self->{segSep}\n";

        local $/ = $self->{segSep};
        # @{ $self->{lines}} = split(/$self->{segSep}/, $line);
        push @{ $self->{lines}}, $_ while readline($self->{file});
        print "New in X12 complete\n";
        bless($self, $class);
        return $self;
    }

But I don't get the desired results. Am I doing anything wrong? Please let me know.

Thanks,
peacelover1976
Re^5: 32 Bit Perl causing segmentation fault if data is big by BrowserUk (Patriarch) on Mar 19, 2010 at 09:04 UTC
But I don't get the desired results.

It's your turn to elaborate a bit :) What isn't happening that should be? Or what is happening that shouldn't?

If the problem is that the '~' separators are being left on the end of the lines, then that's easily dealt with using chomp once the array is filled. I.e.

    ...
    local $/ = $self->{segSep};
    # @{ $self->{lines}} = split(/$self->{segSep}/, $line);
    push @{ $self->{lines}}, $_ while readline($self->{file});
    chomp @{ $self->{lines} }; ## remove the '~'s from the ends
    ...
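One thing that may be worth checking while waiting for more detail: $/ is treated as a literal string, not a regex. In the posted code, $self->{segSep} is built as a backslash followed by the separator character (the form wanted for split), while $self->{segOut} holds the raw character. Whether that is the cause of the problem is only a guess on my part, since the symptom hasn't been described. The sketch below uses a hypothetical helper, read_records, and made-up sample data to show the difference between the two forms, along with chomp applied to the whole array:

```perl
use strict;
use warnings;

my $data = "ISA*one~GS*two~";    # hypothetical sample input

# Hypothetical helper: read every record from $data using the given
# record separator, then strip that separator from the end of each one.
sub read_records {
    my ($sep) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/ = $sep;             # literal string; regex escapes mean nothing here
    my @lines;
    push @lines, $_ while <$fh>;
    close $fh;
    chomp @lines;                # removes a trailing $/ from every element
    return @lines;
}

my @good = read_records('~');      # the raw separator character
my @bad  = read_records('\\~');    # regex-escaped form: the two characters \ and ~

print scalar(@good), " records with '~'\n";       # 2 records
print scalar(@bad),  " record(s) with '\\~'\n";   # 1 record: the whole input
```

With the escaped form, the two-character sequence never occurs in the data, so a single readline returns everything up to end of file. If that matches what you are seeing, setting $/ from the raw character (segOut in the posted code) rather than the escaped one would be the thing to try.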