in reply to Re^2: segmentation fault (core dumped!)
in thread segmentation fault (core dumped!)

The code, as you have it, reads the entire "2.txt" file into memory and then makes a second in-memory copy of it as the array is converted to a scalar, so your memory footprint is a lot bigger than it has to be. But depending on your system, it may not help to simply avoid making that second copy. You may need to come up with an algorithm that doesn't pull the entire 3GB file into memory all at once.
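
To show what avoiding that second copy looks like, here's a minimal slurp-into-one-scalar sketch (assuming 2.txt is the input file):

    open my $fh, '<', '2.txt' or die "Can't open 2.txt: $!";
    my $data = do { local $/; <$fh> };   # undef $/ = read the whole file into ONE scalar
    close $fh;

That still puts the whole 3GB into RAM, though, which is why the alternatives below may matter more.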

Here are three distinct alternatives that you might consider (rough sketches follow below):

- Rework your algorithm so it processes the file a line or a chunk at a time, instead of holding all of it in memory at once.
- Treat the file as a random-access structure, using seek/tell to read only the portions you need, when you need them.
- Transcode the data into a more compact in-memory representation (for example, two bits per [ACGT] base instead of a full byte).

There are surely other strategies, but these are at least options you can consider.
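
For the first option, a bare-bones outline might look like this (your real per-record logic goes where the comment is):

    open my $fh, '<', '2.txt' or die "Can't open 2.txt: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        # ... do whatever per-line (or per-record) work your algorithm allows ...
    }
    close $fh;

Memory stays small and roughly constant, because only one line is held at a time.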

Each of these has implications with respect to complexity and performance. You know more about your problem than we do, and frankly, I'm not too interested in implementing a seek/tell or transcoding solution for you. But both are possible (albeit a pain in the backside).
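
Purely for illustration (not a drop-in solution), a seek/tell-style read of one window of the file could look like the following; the offset and window size here are made up:

    open my $fh, '<', '2.txt' or die "Can't open 2.txt: $!";
    binmode $fh;
    seek $fh, 1_000_000, 0;               # jump to byte 1,000,000 (whence 0 = from start)
    read $fh, my $window, 1024;           # read just 1KB from that position
    printf "now at byte %d\n", tell $fh;  # tell reports the new offset
    close $fh;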


Dave

Re^4: segmentation fault (core dumped!)
by Anonymous Monk on Jul 03, 2012 at 06:36 UTC
Re^4: segmentation fault (core dumped!)
by Anonymous Monk on Jul 03, 2012 at 06:19 UTC
    Using sed will make chunks in a few minutes, but the thing is, I need to have the entire data, and the server has 512GB of memory. Is there any problem if I store everything (2.txt, containing 3GB of data) in a single scalar variable? Can you just check my code?

      Well, I thought it was clear that I did check your code. Not checking the return value of open could be allowing a silent failure, but that wouldn't look anything like a core dump. I asked what error message you were getting, and you haven't answered that yet. I'm assuming, given the size of the files, that you're getting an "Out of memory!" error. Even if the server has copious amounts of RAM, a 32-bit build of Perl can't address more than 2GB (I think). A 64-bit build shouldn't have that restriction, so you could probably get your script to run under a 64-bit Perl if it's built right.
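
      For reference, checking open's return value and the bitness of your perl could look something like this (pointer size is one quick indicator):

          use Config;
          print "pointer size: $Config{ptrsize} bytes\n";   # 8 on a 64-bit perl, 4 on a 32-bit one

          # Always check open, so a failure isn't silent:
          open my $fh, '<', '2.txt' or die "Can't open 2.txt: $!";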

      I provided a suggestion for minimizing the memory footprint (I even supplied some code demonstrating how), by eliminating a second in-memory copy of the data, and by storing the large file only in a single scalar rather than in an array and a scalar. That's a bigger savings than you might think, because each array element consumes as much memory as a scalar (which is more than a dozen bytes each). By eliminating the array altogether and holding the data in a single scalar you're reducing your memory footprint to about the same size as the file itself, plus a relatively small amount of overhead.
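
      If you want to see that overhead for yourself, the Devel::Size module from CPAN can report it (illustrative only; the exact numbers depend on your Perl build):

          use Devel::Size qw(total_size);

          my $string = 'ACGT' x 1000;      # one 4000-character scalar
          my @array  = ('ACGT') x 1000;    # 1000 tiny scalars

          printf "single scalar: %d bytes\n",      total_size($string);
          printf "1000-element array: %d bytes\n", total_size(\@array);

      The array comes out many times larger than the single scalar, even though both hold the same 4000 characters.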

      You mentioned that you need to have the entire data. So I'll assume you've done your research and due diligence, and that there really is no algorithm that would let you work on the data in chunks instead of all at once. That's fine. If a 64-bit Perl still doesn't give you enough wiggle room, then you have to start looking at a random-access file (seek/tell), or at transcoding (converting each byte to its smallest possible representation, possibly two bits per [ACGT]).
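
      To make the transcoding idea concrete, a rough sketch (nothing more) using vec to store two bits per base might look like:

          my %code = ( A => 0, C => 1, G => 2, T => 3 );
          my @base = qw( A C G T );

          my $seq    = 'ACGTTGCA';         # stand-in for your real sequence data
          my $packed = '';
          my $i      = 0;
          vec( $packed, $i++, 2 ) = $code{$_} for split //, $seq;

          # pull base number 5 (0-based) back out:
          my $fifth = $base[ vec( $packed, 5, 2 ) ];   # 'G'

      That cuts the in-memory size to roughly a quarter of the raw text, at the cost of packing and unpacking on every access.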


      Dave

      Even if the server really does have 512GB of RAM, you may only have access to a fraction of it. You may want to check with your sysadmin to see whether there are user- or process-based limits on resource usage.