
We don't know anything about your algorithm or your input. How can we help?

Re^2: out of memory problem
by perlbeginner10 (Acolyte) on Mar 15, 2006 at 20:58 UTC
    Sorry about that. Here is my code:
        my %fnameof;
        my %valueof;
        my @relation;
        my @second;
        my $mainfile;
        my $subfiles;
        {
            open ($testdataset, "datasetnew.txt") or die "Cannot open file";
            @testdataset = <$testdataset>;
            close ($testdataset);
            open (STDOUT, ">>result.txt");
            $fcount = 1;
            $secondcount = 0;
            @testdataset = grep { $_ ne '' } @testdataset;
            @testdataset = grep /\S/, @testdataset;
            foreach $dataline (@testdataset) {
                ($mainfile, $subfiles) = GetFileName($dataline);
                for ($mainfile) {
                    $mainfile =~ s/^\s+//;
                    $mainfile =~ s/\s+$//;
                }
                addtoHash($mainfile);
                @subfiles = @keywords = split(/;/, $subfiles);
                @subfiles = grep { $_ ne '' } @subfiles;
                @subfiles = grep /\S/, @subfiles;
                foreach $subfile (@subfiles) {
                    $subfile =~ s/^\s+//;
                    $subfile =~ s/\s+$//;
                    addtoHash($subfile) unless ($_ ne '');
                }
                #defining the relation of mainfile with subfiles. Each mainfile has relation weight = 1 with subfile.
                foreach $subfile (@subfiles) {
                    $relation[$valueof{$mainfile}][$valueof{$subfile}] = 1;
                    $second[$secondcount] = "$valueof{$mainfile};$valueof{$subfile}";
                    $secondcount++;
                }
            }
            #creating transitive relationship. ie: if A->B and B->C, then A->C
            foreach $seconditem (@second) {
                @test = split(/;/, $seconditem);
                $b = $test[0];
                $c = $test[1];
                for ($k = 1; $k<=$secondcount; $k++) {
                    if ($relation[$c][$k] gt 0) {
                        $relation[$b][$k] = $relation[$b][$k]+1;
                    }
                }
            }
            PrintArray();
        }

        #get mainfile and subfiles
        sub GetFileName{
            my $item = $_[0];
            @datasplit = split(/\t/, $item);
            $mainfile = @datasplit[0];
            $subfiles = @datasplit[1];
            return ($mainfile, $subfiles);
        }

        sub addtoHash{
            my $file = $_[0];
            $exist = 0;
            for ($i = 0; $i < $fcount; $i++) {
                if ($fnameof{$i} eq $file) {
                    $exist = $i;
                }
            }
            if ($exist == 0) {
                $fnameof{$fcount}= $file;
                $valueof{$file} = $fcount;
                $fcount++;
            }
        }

        sub PrintArray(){
            for($i=1;$i<$fcount; $i++) {
                for($j=1;$j<$fcount;$j++){
                    if (defined ($relation[$i][$j])) {
                        print $fnameof{$i}."-".$relation[$i][$j]."->".$fnameof{$j}."\n";
                    }
                }
            }
            print "\n";
        }
    And here is a sample dataset:
        cancer	breast cancer; lung cancer; heart cancer; stomach cancer;
        breast cancer	foot cancer;
        foot cancer	some cancer;
        lung cancer	blood cancer; foot cancer;
        heart cancer	foot cancer;
        stomach cancer	foot cancer;
        blood cancer	some cancer;
    But the real dataset is huge: about 48 MB. My computer has 1 GB of memory. I ran this program on both Windows and Fedora Core, and the result is the same: with the 48 MB dataset the output is blank. PS: If there is anything else that could improve my code, please let me know.
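A minimal sketch of parsing one line of the sample data, assuming (as GetFileName implies) that the main name is separated from the sub-file list by a tab and the sub-files by `;`. The name `parse_line` is hypothetical; it folds the trimming and empty-entry filtering that the posted code spreads over several steps into one place.

```perl
use strict;
use warnings;

# Hypothetical replacement for GetFileName: split one dataset line into the
# main file name and a list of sub-file names.
# ASSUMPTION: "main<TAB>sub1; sub2; ..." format, as implied by split(/\t/, ...)
# in the original code; a trailing ';' is allowed.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($main, $rest) = split /\t/, $line, 2;
    return unless defined $main && defined $rest;   # malformed line: empty list

    # Trim leading and trailing whitespace from a name.
    my $trim = sub { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s };

    $main = $trim->($main);
    # Split on ';', trim each piece, and drop empty entries.
    my @subs = grep { length } map { $trim->($_) } split /;/, $rest;
    return ($main, \@subs);
}

my ($main, $subs) = parse_line("lung cancer\tblood cancer; foot cancer;\n");
print "$main: ", join(', ', @$subs), "\n";   # lung cancer: blood cancer, foot cancer
```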

      First glance -

      • add use strict; and use warnings; to your code, then clean up the errors and warnings.
      • don't use $a or $b as variable names - they are special variables used by sort
      • use the three-argument form of open
      • where does $fcount in addtoHash get a value? Make it explicit by passing the value into the sub rather than relying on a global.
      • Don't prototype PrintArray - especially after its first use!
      • you probably want chomp @testdataset; before @testdataset = grep { $_ ne '' } @testdataset;
      • @testdataset = grep { $_ ne '' } @testdataset; is redundant when followed by @testdataset = grep /\S/, @testdataset;
      • what does for ($mainfile) { achieve?
      • You test if ($exist == 0), but $i can == 0 and therefore $exist can == 0 (in addtoHash)
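Several of the points above concern addtoHash. A sketch of one way to address them, assuming ids start at 1 as in the posted code: state is passed in explicitly instead of through globals such as $fcount, and a hash lookup replaces the loop over every index, which otherwise makes each call proportional to the number of names seen so far. `add_to_index` and the layout of `%index` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rework of addtoHash: no globals, and an O(1) hash lookup
# instead of scanning every index with a for loop.
# ASSUMPTION: ids start at 1, as in the original code.
sub add_to_index {
    my ($index, $file) = @_;    # $index: { valueof => {}, fnameof => [], next => 1 }
    my $id = $index->{valueof}{$file};
    return $id if defined $id;  # already known: "found" is "defined", not "!= 0"

    $id = $index->{next}++;     # take the next free id
    $index->{valueof}{$file} = $id;
    $index->{fnameof}[$id]   = $file;
    return $id;
}

my %index = (valueof => {}, fnameof => [], next => 1);
for my $name ('cancer', 'breast cancer', 'cancer') {
    printf "%s => %d\n", $name, add_to_index(\%index, $name);
}
# cancer => 1
# breast cancer => 2
# cancer => 1
```

Testing `defined` on the lookup avoids the `$exist == 0` ambiguity entirely, because an unknown name and a known one can no longer produce the same value.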

      You could describe the output you expect. Sometimes knowing what is expected of a piece of code helps understand it - sometimes it helps misunderstand it :)

      Update: more items added


      DWIM is Perl's answer to Gödel
        The output of the program with the sample dataset I gave in my previous post is:
            cancer-1->breast cancer
            cancer-1->lung cancer
            cancer-1->heart cancer
            cancer-1->stomach cancer
            cancer-4->foot cancer
            cancer-1->blood cancer
            breast cancer-1->foot cancer
            breast cancer-1->some cancer
            lung cancer-1->foot cancer
            lung cancer-2->some cancer
            lung cancer-1->blood cancer
            heart cancer-1->foot cancer
            heart cancer-1->some cancer
            stomach cancer-1->foot cancer
            stomach cancer-1->some cancer
            foot cancer-1->some cancer
            blood cancer-1->some cancer
        As you can see, I am trying to create transitive relationships, i.e. if A->B and B->C, then A->C. I am also weighting the relationships: if, in addition, A->D and D->C, then A->C is created a second time, so the relation [valueof{A}][valueof{C}] = 2 (that is, 1 + 1), because the relation was created twice.
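One way to express this weighting, sketched with a hash of hashes instead of the 2-D @relation array, so that only existing edges are visited rather than looping $k over every edge for every pair. The assumption, labeled in the code, is that weights come from direct edges only: weight(A->C) is 1 for a direct edge plus 1 for every intermediate B with A->B and B->C. `weighted_relations` is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch: weighted "A->B and B->C gives A->C" relation using a hash of hashes.
# ASSUMPTION: weights are computed from the direct edges only:
#   weight(A->C) = (1 if A->C is a direct edge)
#                + (number of B with A->B and B->C)
# weighted_relations is a hypothetical name, not from the original post.
sub weighted_relations {
    my ($direct) = @_;    # { A => { B => 1, ... }, ... }
    my %weight;

    # Every direct edge starts with weight 1.
    for my $from (keys %$direct) {
        $weight{$from}{$_} = 1 for keys %{ $direct->{$from} };
    }

    # Add 1 to A->C for each intermediate B; only real successors are visited.
    for my $from (keys %$direct) {
        for my $mid (keys %{ $direct->{$from} }) {
            next unless exists $direct->{$mid};   # $mid has no outgoing edges
            $weight{$from}{$_}++ for keys %{ $direct->{$mid} };
        }
    }
    return \%weight;
}

# Direct edges from the sample dataset in the question.
my %direct = (
    'cancer'         => { 'breast cancer' => 1, 'lung cancer' => 1,
                          'heart cancer'  => 1, 'stomach cancer' => 1 },
    'breast cancer'  => { 'foot cancer'  => 1 },
    'foot cancer'    => { 'some cancer'  => 1 },
    'lung cancer'    => { 'blood cancer' => 1, 'foot cancer' => 1 },
    'heart cancer'   => { 'foot cancer'  => 1 },
    'stomach cancer' => { 'foot cancer'  => 1 },
    'blood cancer'   => { 'some cancer'  => 1 },
);

# Print in the same "A-weight->C" style as PrintArray, sorted by name.
my $w = weighted_relations(\%direct);
for my $from (sort keys %$w) {
    printf "%s-%d->%s\n", $from, $w->{$from}{$_}, $_ for sort keys %{ $w->{$from} };
}
```

On the sample dataset this yields the same 17 relations and weights as the output shown above (for example cancer-4->foot cancer and lung cancer-2->some cancer), printed in alphabetical rather than insertion order. The posted code updates @relation while still looping over it, so with a different line order it can also count longer paths; this sketch deliberately does not reproduce that order dependence.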

      I don't have time to look at it personally, at least not now, but the following will help you greatly. Change

          open ($testdataset, "datasetnew.txt") or die "Cannot open file";
          @testdataset = <$testdataset>;
          close ($testdataset);
          @testdataset = grep { $_ ne '' } @testdataset;
          @testdataset = grep /\S/, @testdataset;
          foreach $dataline (@testdataset) {

      to

          open (my $testdataset, '<', "datasetnew.txt") or die "Cannot open input file: $!\n";
          while (my $dataline = <$testdataset>) {
              next if $dataline =~ /^\s*$/;

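      You'll have two or three fewer copies of your file in memory, because the file is read one line at a time instead of being slurped into @testdataset and then copied again by each grep.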
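Taking that suggestion a step further, a sketch of a complete line-by-line reader that collects the direct edges as it goes, so the file is never held in memory as a whole (the edge hash itself still grows with the data). It assumes the tab and `;` format used by GetFileName; `read_edges` is a hypothetical helper, and the demo reads from an in-memory string so it runs without datasetnew.txt.

```perl
use strict;
use warnings;

# Sketch of the line-by-line approach: only the current line is in memory
# while reading; the direct edges are collected as we go.
# ASSUMPTION: "main<TAB>sub1; sub2; ..." format, as GetFileName implies.
# read_edges is a hypothetical helper name.
sub read_edges {
    my ($fh) = @_;
    my %direct;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;            # skip blank lines
        chomp $line;
        my ($main, $rest) = split /\t/, $line, 2;
        next unless defined $rest;           # no tab: not a data line
        $main =~ s/^\s+|\s+$//g;             # trim the main name
        for my $part (split /;/, $rest) {
            (my $name = $part) =~ s/^\s+|\s+$//g;
            next unless length $name;        # ignore empty entries
            $direct{$main}{$name} = 1;
        }
    }
    return \%direct;
}

# Demo on an in-memory file so it runs without datasetnew.txt. For the real data:
#   open(my $fh, '<', 'datasetnew.txt') or die "Cannot open input file: $!\n";
my $data = "cancer\tbreast cancer; lung cancer;\n\nlung cancer\tblood cancer;\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";
my $direct = read_edges($fh);
close($fh);
print join(', ', sort keys %{ $direct->{'cancer'} }), "\n";   # breast cancer, lung cancer
```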