in reply to segmentation fault (core dumped!)

Since Perl doesn't complain about lack of indentation in code, we can cross that one off the list. Did it give you an actual message such as "Out of memory!", or did it just silently fail?

How big is 2.txt?

Totally unrelated to a core dump, but a good idea nevertheless: As you're not checking for failure in your opens, you should put use autodie; at the top of your script near your use strict; line. That way you'll know if a file fails to open.
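
For example, near the top of the script (using the 2.txt name from your post; autodie makes open, close, and friends die with a useful message on failure):

use strict;
use warnings;
use autodie;                  # failed opens are no longer silently ignored

open my $read, '<', '2.txt';  # dies with the filename and reason if this fails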

On the topic of memory, you can cut your consumption considerably if you don't store both an array and a scalar each with their own copy of the input file. Something like this:

my $d = do { local $/ = undef; <$read>; };  # slurp the whole file in one read; $/ is restored after the block

This totally eliminates @e. If the input file is huge, you save a lot of memory. If it's really huge, even that savings won't be enough.
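
(For reference, the pattern this replaces is presumably something along the lines of my @e = <$read>; followed by joining @e into one scalar, which leaves two full copies of the file in memory at once.)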


Dave

Re^2: segmentation fault (core dumped!)
by Anonymous Monk on Jul 03, 2012 at 05:35 UTC
    My input file (2.txt) is nearly 3 GB, and 1.txt would be nearly 1 GB.

      The code, as you have it, is reading the entire "2.txt" file into memory, and then making another copy of it in memory as it's converted from an array to a scalar. So your memory footprint is a lot bigger than it has to be. But depending on your system, it may not help to simply avoid making that second copy. You may need to come up with an algorithm that doesn't pull the entire 3 GB file into memory all at once.

      Here are three distinct alternatives that you might consider:

      • Find a way to process 2.txt in chunks.
      • As you read 2.txt into memory, convert the A/T/G/C characters from bytes to bits; if A=0, T=1, G=2, C=3, then you can store each character position in two bits instead of eight (a rough sketch follows below).
      • Keep 2.txt on disk, and do a lot of seeking and telling.

      There are surely other strategies, but these are at least options you can consider.

      Each of these has implications with respect to complexity and performance. You know more about your problem than we do, and frankly, I'm not too interested in implementing a seek/tell or transcoding solution for you. But both are possible (albeit a pain in the backside).
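
      To show the shape of the second idea, here is a bare-bones sketch that reads 2.txt line by line and packs each base into two bits with vec (the A=0, T=1, G=2, C=3 mapping is just the one above; anything other than uppercase A, T, G, or C, such as N, would need extra handling):

      my %code = ( A => 0, T => 1, G => 2, C => 3 );

      my $packed = '';                 # ends up roughly 1/4 the size of 2.txt
      my $n      = 0;
      while ( my $line = <$read> ) {
          chomp $line;
          for my $base ( split //, $line ) {
              vec( $packed, $n++, 2 ) = $code{$base};   # four bases per byte
          }
      }

      # later, the i-th base's two-bit code is recovered with vec( $packed, $i, 2 )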


      Dave

        Using sed will make the chunks in a few minutes, but the thing is I need to have the entire data, and the server has 512 GB of memory. Is there any problem if I am storing everything (2.txt, containing 3 GB of data) in a single scalar variable? Can you just check my code?