in reply to Re: out of memory problem
in thread out of memory problem

Sorry about that. Here is my code
my %fnameof;
my %valueof;
my @relation;
my @second;
my $mainfile;
my $subfiles;

{
    open ($testdataset, "datasetnew.txt") or die "Cannot open file";
    @testdataset = <$testdataset>;
    close ($testdataset);
    open (STDOUT, ">>result.txt");
    $fcount = 1;
    $secondcount = 0;
    @testdataset = grep { $_ ne '' } @testdataset;
    @testdataset = grep /\S/, @testdataset;
    foreach $dataline (@testdataset) {
        ($mainfile, $subfiles) = GetFileName($dataline);
        for ($mainfile) {
            $mainfile =~ s/^\s+//;
            $mainfile =~ s/\s+$//;
        }
        addtoHash($mainfile);
        @subfiles = @keywords = split(/;/, $subfiles);
        @subfiles = grep { $_ ne '' } @subfiles;
        @subfiles = grep /\S/, @subfiles;
        foreach $subfile (@subfiles) {
            $subfile =~ s/^\s+//;
            $subfile =~ s/\s+$//;
            addtoHash($subfile) unless ($_ ne '');
        }
        # defining the relation of mainfile with subfiles.
        # Each mainfile has relation weight = 1 with subfile.
        foreach $subfile (@subfiles) {
            $relation[$valueof{$mainfile}][$valueof{$subfile}] = 1;
            $second[$secondcount] = "$valueof{$mainfile};$valueof{$subfile}";
            $secondcount++;
        }
    }
    # creating transitive relationship. ie: if A->B and B->C, then A->C
    foreach $seconditem (@second) {
        @test = split(/;/, $seconditem);
        $b = $test[0];
        $c = $test[1];
        for ($k = 1; $k <= $secondcount; $k++) {
            if ($relation[$c][$k] gt 0) {
                $relation[$b][$k] = $relation[$b][$k] + 1;
            }
        }
    }
    PrintArray();
}

# get mainfile and subfiles
sub GetFileName {
    my $item = $_[0];
    @datasplit = split(/\t/, $item);
    $mainfile = @datasplit[0];
    $subfiles = @datasplit[1];
    return ($mainfile, $subfiles);
}

sub addtoHash {
    my $file = $_[0];
    $exist = 0;
    for ($i = 0; $i < $fcount; $i++) {
        if ($fnameof{$i} eq $file) {
            $exist = $i;
        }
    }
    if ($exist == 0) {
        $fnameof{$fcount} = $file;
        $valueof{$file} = $fcount;
        $fcount++;
    }
}

sub PrintArray() {
    for ($i = 1; $i < $fcount; $i++) {
        for ($j = 1; $j < $fcount; $j++) {
            if (defined($relation[$i][$j])) {
                print $fnameof{$i} . "-" . $relation[$i][$j] . "->" . $fnameof{$j} . "\n";
            }
        }
    }
    print "\n";
}
And Here is sample dataset:
cancer	breast cancer; lung cancer; heart cancer; stomach cancer;
breast cancer	foot cancer;
foot cancer	some cancer;
lung cancer	blood cancer; foot cancer;
heart cancer	foot cancer;
stomach cancer	foot cancer;
blood cancer	some cancer;
But the real dataset is actually huge: it's about 48MB. I have 1GB of memory in my computer. I ran this program on Windows and on Fedora Core, but the result is the same with the 48MB dataset: blank output. PS: If there are any other points that can improve my code, please let me know.

Replies are listed 'Best First'.
Re^3: out of memory problem
by GrandFather (Saint) on Mar 15, 2006 at 21:14 UTC

    First glance -

    • add use strict; and use warnings; to your code, then clean up the errors and warnings.
    • don't use $a or $b as variable names - they are reserved for use by sort
    • use the three parameter open
    • where does $fcount in addtoHash get a value? Make it explicit by passing the value into the sub rather than relying on a global.
    • Don't prototype PrintArray - especially after its first use!
    • you probably want chomp @testdataset; before @testdataset = grep { $_ ne '' } @testdataset;
    • @testdataset = grep { $_ ne '' } @testdataset; is redundant when followed by @testdataset = grep /\S/, @testdataset;
    • what does for ($mainfile) { achieve?
    • You test if ($exist == 0), but $i can == 0 and therefore $exist can == 0 (in addtoHash)
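    The last two points (the global $fcount and the $exist == 0 ambiguity) can be addressed by testing membership with a hash lookup instead of a linear scan. This is a hypothetical rewrite of addtoHash, not the original poster's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (%fnameof, %valueof);
my $fcount = 1;

# Hypothetical replacement for addtoHash: exists() makes the
# "already registered?" test O(1), and because an id is only
# assigned once per file, the $exist == 0 ambiguity disappears.
sub add_to_hash {
    my ($file) = @_;
    return if exists $valueof{$file};    # already registered
    $fnameof{$fcount} = $file;
    $valueof{$file}   = $fcount;
    ++$fcount;
}

add_to_hash('cancer');
add_to_hash('breast cancer');
add_to_hash('cancer');    # duplicate is ignored
print "$valueof{'cancer'} $valueof{'breast cancer'}\n";
```

    With a 48MB dataset the linear scan inside addtoHash is also a performance problem on its own: it turns hash registration into O(n^2) work overall.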

    You could describe the output you expect. Sometimes knowing what is expected of a piece of code helps understand it - sometimes it helps misunderstand it :)

    Update: more items added


    DWIM is Perl's answer to Gödel
      The output of the program with the sample dataset I gave in the previous thread is:
      cancer-1->breast cancer
      cancer-1->lung cancer
      cancer-1->heart cancer
      cancer-1->stomach cancer
      cancer-4->foot cancer
      cancer-1->blood cancer
      breast cancer-1->foot cancer
      breast cancer-1->some cancer
      lung cancer-1->foot cancer
      lung cancer-2->some cancer
      lung cancer-1->blood cancer
      heart cancer-1->foot cancer
      heart cancer-1->some cancer
      stomach cancer-1->foot cancer
      stomach cancer-1->some cancer
      foot cancer-1->some cancer
      blood cancer-1->some cancer
      As you can see, I am trying to create transitive relationships, i.e. if A->B and B->C, then A->C. But I am also giving weight to the relationships: if additionally A->D and D->C, then A->C is created again, so $relation[$valueof{A}][$valueof{C}] = 2 (i.e. 1+1), because the relation was created twice.
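      The weighting step described above can be sketched with a tiny hash-of-hashes example (the nodes A, B, C, D are hypothetical, not from the dataset):

```perl
use strict;
use warnings;

# Edges stored as $edge{$from}{$to} = weight.
my %edge = (
    A => { B => 1, D => 1 },
    B => { C => 1 },
    D => { C => 1 },
);

# If X->Y and Y->Z, add 1 to X->Z. A reaches C through two
# different middle nodes (B and D), so A->C ends up with weight 2.
for my $x (keys %edge) {
    for my $y (keys %{ $edge{$x} }) {
        ++$edge{$x}{$_} for keys %{ $edge{$y} };
    }
}

print "A -> C weight: $edge{A}{C}\n";
```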

        Now that I understand what you are doing, here's code that generates the output you are looking for, albeit in a different order:

        use strict;
        use warnings;

        my %mappings;

        while (<DATA>) {
            chomp;
            next if ! /\S/;
            s/^\s+//;
            s/\s+$//;
            my ($mainfile, $subfiles) = split /\s*,\s*/;
            my @subfiles = split /\s*;\s*/, $subfiles;
            @subfiles = grep /\S/, @subfiles;
            $mappings{$mainfile}{$_} = 1 for @subfiles;
        }

        # Generate transitive relationships. ie: if A->B and B->C, then A->C
        for my $A (keys %mappings) {
            for my $B (keys %{$mappings{$A}}) {
                ++$mappings{$A}{$_} for keys %{$mappings{$B}};
            }
        }

        for my $key (keys %mappings) {
            print "$key - $mappings{$key}{$_} -> $_\n" for keys %{$mappings{$key}};
        }

        __DATA__
        cancer,breast cancer; lung cancer; heart cancer; stomach cancer;
        breast cancer,foot cancer;
        foot cancer,some cancer;
        lung cancer,blood cancer; foot cancer;
        heart cancer,foot cancer;
        stomach cancer,foot cancer;
        blood cancer,some cancer;

        Prints:

        lung cancer - 2 -> some cancer
        lung cancer - 1 -> blood cancer
        lung cancer - 1 -> foot cancer
        cancer - 1 -> some cancer
        cancer - 1 -> lung cancer
        cancer - 1 -> blood cancer
        cancer - 1 -> breast cancer
        cancer - 4 -> foot cancer
        cancer - 1 -> heart cancer
        cancer - 1 -> stomach cancer
        blood cancer - 1 -> some cancer
        breast cancer - 1 -> some cancer
        breast cancer - 1 -> foot cancer
        heart cancer - 1 -> some cancer
        heart cancer - 1 -> foot cancer
        foot cancer - 1 -> some cancer
        stomach cancer - 1 -> some cancer
        stomach cancer - 1 -> foot cancer

        DWIM is Perl's answer to Gödel
Re^3: out of memory problem
by ikegami (Patriarch) on Mar 15, 2006 at 22:47 UTC

    I don't have time to look at it personally, at least not now, but the following will help you greatly. Change

    open ($testdataset, "datasetnew.txt") or die "Cannot open file";
    @testdataset = <$testdataset>;
    close ($testdataset);
    @testdataset = grep { $_ ne '' } @testdataset;
    @testdataset = grep /\S/, @testdataset;
    foreach $dataline (@testdataset) {

    to

    open (my $testdataset, '<', "datasetnew.txt")
        or die "Cannot open input file: $!\n";
    while (my $dataline = <$testdataset>) {
        next if $dataline =~ /^\s*$/;

    You'll have (2 or 3) fewer copies of your file in memory.
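    The streaming pattern suggested above looks like this in full; the @lines array is a stand-in for reading from the real datasetnew.txt handle, so the sketch is self-contained:

```perl
use strict;
use warnings;

# Stand-in for the file handle; in real code this would be
# while (my $dataline = <$testdataset>) { ... }
my @lines = (
    "cancer\tbreast cancer; lung cancer;\n",
    "\n",
    "breast cancer\tfoot cancer;\n",
);

my $records = 0;
for my $dataline (@lines) {
    next if $dataline =~ /^\s*$/;    # skip blanks; no grep copies needed
    chomp $dataline;
    my ($mainfile, $subfiles) = split /\t/, $dataline, 2;
    $records++;    # real code would process the record here
}
print "processed $records records\n";
```

    Only one line is held in memory at a time, instead of the whole 48MB file plus the copies made by each grep.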