in reply to Re: Reducing memory footprint when doing a lookup of millions of coordinates
in thread Reducing memory footprint when doing a lookup of millions of coordinates

Cheers BrowserUK,

I'll definitely give that a go.

Out of interest is there an advantage to using the constant for END and REP rather than 0 and 1? I've not used that before.

Many thanks for your help

Rich

Re^3: Reducing memory footprint when doing a lookup of millions of coordinates
by BrowserUk (Patriarch) on Feb 27, 2011 at 12:40 UTC
    is there an advantage to using the constant for END and REP rather than 0 and 1?

    Beyond a little extra clarity, no. I did it to make the two versions visibly comparable.
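For readers who have not met `use constant` before, here is a minimal sketch of the idea. The code under discussion is not reproduced in this part of the thread, so the record layout, the sample values, and the names RANGE_END and RANGE_REP (stand-ins for END and REP) are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical field positions. The names are stand-ins for the
# thread's END and REP; which index holds which field is assumed.
use constant { RANGE_END => 0, RANGE_REP => 1 };

# One range record stored as a small array: [ end coordinate, rep value ].
# The values are invented sample data.
my @range = ( 1500, 'L1' );

# Literal indexes: correct, but the reader must remember what 0 and 1 mean.
my $end_literal = $range[0];
my $rep_literal = $range[1];

# Named indexes: the same two elements, with self-describing subscripts.
my $end_named = $range[RANGE_END];
my $rep_named = $range[RANGE_REP];

print "end=$end_named rep=$rep_named\n";
```

Both forms read the same array elements. Perl inlines constants like these at compile time, so the named form should cost nothing extra at run time, which fits the answer above: the gain is readability only.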

    There might be some extra memory savings to be had if you could give a clearer idea of the numbers involved.

    I.e., how many chromosomes are there? What are the approximate maximum lengths of both the chromosomes and the ranges?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Cool. That makes sense. Cheers

      There are 24 chromosomes in the file that I'm interested in. The total number of records (ranges) per chromosome is as follows:

      chr1   2235512
      chr2    674652
      chr3    348269
      chr4    323500
      chr5    308100
      chr6    338158
      chr7    280734
      chr8    253229
      chr9    224412
      chr10   237524
      chr11   240186
      chr12   250300
      chr13   161894
      chr14   160126
      chr15   152561
      chr16   170145
      chr17   167623
      chr18   126566
      chr19   134123
      chr20   123693
      chr21    61077
      chr22    75260
      chrX    265561
      chrY     43169

      The lengths of the ranges are usually between about 10 and 300.

      Does this help?

        Does this help?

        No.

        But from those figures I have to agree with moritz that splitting the dataset into 24 files and loading each set individually makes perfect sense. It would hardly affect your performance at all, but it would reduce your memory consumption to that of the largest set.

        I.e., roughly 2235512/7356374 ≈ 30% of 1.2GB, which is about 350MB.
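The split itself could be done along the following lines. This is only a sketch: the input format is not shown in this part of the thread, so it assumes that each line begins with the chromosome name (e.g. "chr1") followed by whitespace. The sub name split_by_chromosome, the ".txt" output naming, and the sample data are all hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Split a combined ranges file into one file per chromosome.
# Assumption: every line starts with the chromosome name, then whitespace.
sub split_by_chromosome {
    my ( $in_fh, $out_dir ) = @_;
    my %out_fh;    # chromosome name => open output handle
    my %count;     # chromosome name => number of lines written

    while ( my $line = <$in_fh> ) {
        my ($chr) = $line =~ /^(\S+)/ or next;    # skip blank lines

        # Open each output file once, the first time its chromosome is seen.
        unless ( $out_fh{$chr} ) {
            my $path = File::Spec->catfile( $out_dir, "$chr.txt" );
            open my $fh, '>', $path or die "Cannot write $path: $!";
            $out_fh{$chr} = $fh;
        }
        print { $out_fh{$chr} } $line;
        $count{$chr}++;
    }

    close $_ or die "close failed: $!" for values %out_fh;
    return \%count;
}

# Demonstration on a tiny in-memory sample (invented data, not the real file).
my $sample = "chr1 100 250\nchr2 40 90\nchr1 300 420\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
my $dir   = tempdir( CLEANUP => 1 );
my $count = split_by_chromosome( $in, $dir );
print "$_: $count->{$_} line(s)\n" for sort keys %$count;
```

With only 24 chromosomes, keeping every output handle open at once is cheap, and the input is read in a single pass. Each per-chromosome file can then be loaded and processed on its own, so only one set has to be in memory at a time.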

