You are running out of memory because your program requires more virtual memory to complete, than your computer is capable of supplying!

At byte offset 113864 of your equation (the original, fully expanded one with numeric symbols):

...,15618,+,15619,+,15620,15621,15622,+,x,+,x,+,x,+,x,15623,15624,1562 +5,... 113864..............................................^

Where (using my best yet compression technique described below), your equation calls for the multiplication of two additive (+) terms containing 22665 & 186 subterms respectively resulting in a subterm containing 3.8 million terms.

This subterm is next used at offset 113952:

...,x,15630,x,15631,x,15632,x,15633,x,+,x,+,x 113952..................................^

where it is multiplied by another with 6 terms to produce a resultant containing 22,846,320 (+)terms. But, at the point where this caclulation starts, even with compression and token cache substitutions, the process is already consuming 1.7GB!

I estimate that it would require at least 6GB to complete that expansion. And there are two more to go! Obviously, with 32-bit machines being limited to (at most) 3GB of virtual memory per process, this is not going to work.

My DB/SQL skills are more dormant than real at this stage, but I do not see any easy (or hard) way of using a DB for this. Maybe if you re-described the operation of your postfix expression evaluator, someone with greater DB foo than me could see a way to approach this. I do not expect it would be very fast though.

Other solutions

One possibility is to move your application to a well specified 64-bit machine with say 8GB of ram.

Another it to start caching the intermediate terms to disk. Besides being very slow, the more I think about this, the harder it gets. Essentially you would have to load the smaller of the next two terms from a disk based stack(??How to implement??), into memory, break it to its subterms, and then read the larger in, subterm by subterm, and output the result back to disk on the fly. Distinctly non-trivial.

The final possibility that I see, and have been pursuing as time permitted, is to find a more aggressive compression strategy than I have been using till now.

Compresion strategy: token caching

Any subterm of of a subequation like 1,2,3,4,5,6+1,2,3,4,5,6,7,8,9+7,8,9 can be reduced (in size) by caching the common subsequences of multiplicative terms and subtituting a symbol. So the above caches 1,2,3,4,5,6 and assigns symbol A; caches 7,8,9 as B; and the result becomes A+A,B+B. And the multiplicative terms A,B can in turn be cached and C substituted.

Once the calculation is complete, these substitutions can be (recursively) expanded on-the-fly as the result is written to disk.

The problem is, even with this symbol caching, the equation as the point outlined above consisting of purely additive terms. Eg. PQRS+HIJK+ABCD...(22,million) and it is still breaking the memory bank. And you cannot do symbol substitution on sequences of additive terms because multiplication needs access to the individual elements.

However, the largest cache token I've seen in use is only four characters (eg.'PQRS'), which means there must (at the point I've managed to get to so far), be less than 456,976 (26^4) unique terms in the cache. And as the largest subequation (so far) contains 3.8 million terms, there must be many duplicates. Eg.

ABCD+IJKL+ABCD+PQRS+ABCD+IJKL...

Which could be reduced to:

(ABCD*3)+(IJKL*2)+(PQRS*1)...

and those subterms can safely be used in subsequent calculations for a further considerable reduction in memory usage. They can then be expanded on output as above.

I was still trying to a) convince myself this was correct; b) work out how to code it; when I saw your latest post.

I also wonder quite what you are going to do with the results of this? I don't see that the human brain is going to derive much information from a several gigabytes of symbolic mush.

I think I recall you mentioning Excel? I seriously doubt Excel's capability to even load this much data on a 32-bit machine, never mind do anything useful with it?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re: Out of Memory 2. by BrowserUk
in thread Out of Memory 2. by dneedles

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.