Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, when I run my program on the server it shows something like "segmentation fault (core dumped)". I don't really understand it. Does it have something to do with the memory allocation of my local variables? Here is my code:
use strict;
use warnings;

open(my $fhConditions, "<1.txt");
my $l = 25;
open(my $read, "<2.txt");
my @e = <$read>;
my $d = join('', @e);
$d =~ s/\s+//g;

while (my $line = <$fhConditions>) {
    chomp $line;
    my @info = split("\t", $line);
    if ($info[2] eq "+") {
        my $match = substr($d, $info[4], $l);
        print "y" if $match =~ m/AAGCTT/;
        print "n" if $match !~ m/AAGCTT/;
    }
    if ($info[2] eq "-") {
        my $a = $info[4] - $l;
        my $match = substr($d, $a, $l);
        print "y" if $match =~ m/AAGCTT/;
        print "n" if $match !~ m/AAGCTT/;
    }
}

Replies are listed 'Best First'.
Re: segmentation fault (core dumped!)
by davido (Cardinal) on Jul 03, 2012 at 05:11 UTC

    Since Perl doesn't complain about a lack of indentation in code, we can cross that one off the list. Did it give you an actual message such as "Out of memory!", or did it just silently fail?

    How big is 2.txt?

    Totally unrelated to a core dump, but a good idea nevertheless: As you're not checking for failure in your opens, you should put use autodie; at the top of your script near your use strict; line. That way you'll know if a file fails to open.
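
    For example (a minimal sketch, reusing the file names from the code above):

        use strict;
        use warnings;
        use autodie;    # open, read, seek, close, etc. now die with a useful message on failure

        open(my $fhConditions, "<", "1.txt");   # dies here if 1.txt can't be opened
        open(my $read,         "<", "2.txt");

    Or, if you prefer not to use autodie, check each open yourself:

        open(my $read, "<", "2.txt") or die "Cannot open 2.txt: $!";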

    On the topic of memory, you can cut your consumption considerably if you don't store both an array and a scalar, each with its own copy of the input file. Something like this:

    my $d = do { local $/ = undef; <$read>; };    # slurp the whole file in one read, no intermediate array

    This totally eliminates @e. If the input file is huge, you save a lot of memory. If it's really huge, even that saving won't be enough.


    Dave

      My input file (2.txt) is nearly 3 GB, and 1.txt is nearly 1 GB.

        The code, as you have it, is reading the entire "2.txt" file into memory, and then making another copy of it in memory as it's converted from an array to a scalar. So your memory footprint is a lot bigger than it has to be. But depending on your system, it may not help to simply avoid making that second copy. You may need to come up with an algorithm that doesn't pull the entire 3 GB file into memory all at once.

        Here are three distinct alternatives that you might consider:

        • Find a way to process 2.txt in chunks.
        • As you read 2.txt into memory, convert A, T, G, C from bytes to bits; if A=0, T=1, G=2, C=3, then you can store each character position in two bits instead of eight (a rough sketch follows this list).
        • Keep 2.txt on disk, and do a lot of seeking and telling (see the sketch at the end of this reply).
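
        To illustrate the second option, here is a rough, untested sketch using Perl's vec() to store two bits per base. It assumes the sequence contains only the characters A, T, G and C; the helper names (pack_seq, unpack_window) are just for illustration:

            use strict;
            use warnings;

            my %code   = (A => 0, T => 1, G => 2, C => 3);
            my @decode = qw(A T G C);

            # Pack a string of bases into a bit string, two bits per base.
            sub pack_seq {
                my ($seq) = @_;
                my $packed = '';
                vec($packed, $_, 2) = $code{ substr($seq, $_, 1) } for 0 .. length($seq) - 1;
                return $packed;
            }

            # Recover a window of $len bases starting at character position $pos.
            sub unpack_window {
                my ($packed, $pos, $len) = @_;
                return join '', map { $decode[ vec($packed, $_, 2) ] } $pos .. $pos + $len - 1;
            }

            my $packed = pack_seq("AAGCTTGGCC");
            print unpack_window($packed, 2, 6), "\n";    # prints GCTTGG

        Packing 3 GB of sequence this way in pure Perl will take a while, but the packed string shrinks to roughly 750 MB, at the cost of unpacking each 25-character window before matching.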

        There are surely other strategies, but these are at least options you can consider.

        Each of these has implications with respect to complexity and performance. You know more about your problem than we do, and frankly, I'm not too interested in implementing a seek/tell or transcoding solution for you. But both are possible (albeit a pain in the backside).
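
        To give a flavour of the seek/read variant, here is a minimal, untested sketch. It assumes 2.txt holds the sequence as one unwrapped line, so that a character position is also a byte offset; if the file is wrapped, you would have to adjust the offsets for the newlines:

            use strict;
            use warnings;
            use autodie;

            my $l = 25;
            open(my $seq,        "<", "2.txt");
            open(my $conditions, "<", "1.txt");

            # Read only the $l-byte window starting at $offset, instead of
            # holding the whole sequence in memory.
            sub window {
                my ($offset) = @_;
                return "" if $offset < 0;
                seek($seq, $offset, 0);
                read($seq, my $buf, $l);
                return $buf;
            }

            while (my $line = <$conditions>) {
                chomp $line;
                my @info = split /\t/, $line;
                if ($info[2] eq "+") {
                    my $match = window($info[4]);
                    print $match =~ /AAGCTT/ ? "y" : "n";
                }
                elsif ($info[2] eq "-") {
                    my $match = window($info[4] - $l);
                    print $match =~ /AAGCTT/ ? "y" : "n";
                }
            }

        This keeps the memory footprint at a few kilobytes no matter how large 2.txt is, at the price of one seek and one small read per line of 1.txt.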


        Dave