Update: Ignore this, my error. I should have checked the return code from tie. If the tie fails, %h simply remains an ordinary in-memory hash. The extra memory used is just the overhead of loading the module.
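For anyone repeating the test, here is a minimal sketch of the same loop with the return value of tie checked. The helper name find_duplicates, the explicit O_CREAT|O_RDWR flags, the 0666 mode and the command-line usage are my own illustrative assumptions, not part of the original run. If the database file already exists from an earlier run, the line numbers will be appended to the old values.

```perl
use strict;
use warnings;
use Fcntl;      # O_CREAT, O_RDWR
use DB_File;

# Hypothetical helper (not from the original post): ties a hash to $db_file,
# records the line numbers of every line in $dat_file, and returns the
# line-number lists of lines that occur more than once.
sub find_duplicates {
    my ( $dat_file, $db_file ) = @_;

    my %h;
    # tie returns false on failure; without this check %h silently stays
    # an ordinary in-memory hash.
    tie %h, 'DB_File', $db_file, O_CREAT | O_RDWR, 0666, $DB_HASH
        or die "Cannot tie '$db_file': $!";

    open my $in, '<', $dat_file or die "Cannot open '$dat_file': $!";
    while ( my $line = <$in> ) {
        $h{$line} .= ' ' . $.;      # same accumulation as the original loop
    }
    close $in;

    # A value holding more than one line number marks a duplicated line.
    my @dups = grep { /\d \d/ } values %h;
    untie %h;
    return @dups;
}

# Assumed usage: perl finddups.pl test.dat test.db
if ( @ARGV == 2 ) {
    print "lines$_\n" for find_duplicates(@ARGV);
}
```

With the check in place, a failed tie stops the script straight away instead of quietly falling back to a plain hash.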

I'm probably doing something wrong, but I just tried the following code to detect duplicates in my 80 MB file (1_000_000 lines x 80 chars) and it took close to half an hour to hash the whole file.

#! perl -slw
use strict;
use DB_File;

tie %h, 'DB_File', 'test.db';
open IN, '<', 'test.dat' or warn $!;

print scalar localtime;
$h{ $_ } .= ' ' . $. while $_ = <IN>;
print scalar localtime;
exit;

__END__
Thu Nov 13 20:55:30 2003
Thu Nov 13 21:23:31 2003

That wasn't much of a surprise, but the fact that it consumed 190 MB of memory doing so was, as this is considerably more than building a straight hash in memory.

Is there some way of limiting the memory use?


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail


In reply to Re: Re: Re: Re: Re^2: Are two lines in the text file equal (!count) by BrowserUk
in thread Are two lines in the text file equal by prostoalex
