So, how about dealing with the memory issue? I thought the beauty of Haskell was that it was non-strict or lazy. That it dealt with infinite lists. Then why does getContents insist on loading the whole darn file?

I was curious about why this happened, so I tried my own variations to see if I could overcome it. Nothing I tried worked; Hugs ran out of memory with a large file every time. So I did some more searching online and through the Haskell mailing list. There's a thread that talks about the 'wc' program, and gives the reasons why it uses so much memory.

In essence, the problem isn't that Haskell is too eager and reads in the entire file. The problem is that it's being too lazy: it doesn't evaluate the '+1's in the recursive function call as it goes. It keeps all these '+1's around in memory as unevaluated "thunks", waiting to be evaluated once it hits the base case, and with a large file you run out of memory before it gets there.
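To make the thunk build-up concrete, here is a minimal sketch. It is not the code from the mailing list thread; countLazy and countStrict are hypothetical names for a small line counter. The lazy version never needs the value of its accumulator until the end, so without optimization it can pile up a chain of pending (+1)s. The strict version uses seq to force each increment before recursing.

```haskell
module Main where

-- Lazy accumulator: n is never inspected until the base case,
-- so each newline may add another pending (+1) to an unevaluated chain.
countLazy :: Int -> String -> Int
countLazy n []          = n
countLazy n ('\n' : xs) = countLazy (n + 1) xs
countLazy n (_    : xs) = countLazy n xs

-- Strict accumulator: seq forces the new count before the recursive call,
-- so the accumulator is always a plain number.
countStrict :: Int -> String -> Int
countStrict n []          = n
countStrict n ('\n' : xs) = let n' = n + 1 in n' `seq` countStrict n' xs
countStrict n (_    : xs) = countStrict n xs

main :: IO ()
main = do
  let sample = "one\ntwo\nthree\n"
  print (countLazy 0 sample)    -- prints 3
  print (countStrict 0 sample)  -- prints 3
```

Both functions give the same answer; the difference only shows up in memory use on large inputs. An optimizing compiler such as GHC with -O may work out on its own that the accumulator can be evaluated eagerly, so the lazy version is most likely to blow up in an interpreter like Hugs or in unoptimized builds.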

The solution they came up with on the mailing list was to create a data type to hold the stats, and put strictness markers on its fields so that the '+1's get evaluated immediately. Below is your example converted to that style. The strictness markers are the exclamation points before the field types in the data constructor Stats. They tell Haskell to evaluate whatever value is about to be put into that slot whenever a Stats value is constructed, instead of keeping the "thunks" around to evaluate later. This version does run in constant memory space with a huge file.

    data Stats = Stats !Int !Int !Int

    wcf :: Stats -> String -> Stats
    wcf stats           []          = stats
    wcf (Stats cc w lc) (' '  : xs) = wcf (Stats (cc+1) (w+1) (lc  )) xs
    wcf (Stats cc w lc) ('\t' : xs) = wcf (Stats (cc+1) (w+1) (lc  )) xs
    wcf (Stats cc w lc) ('\n' : xs) = wcf (Stats (cc+1) (w+1) (lc+1)) xs
    wcf (Stats cc w lc) ( x   : xs) = wcf (Stats (cc+1) (w  ) (lc  )) xs

    wc :: IO ()
    wc = do
        putStr ( "Filename: " )
        name <- getLine
        contents <- readFile name
        let (Stats cc w lc) = wcf (Stats 0 0 0) contents
        putStrLn ( "The file " ++ name ++ " has " )
        putStrLn ( show cc ++ " chars " )
        putStrLn ( show w ++ " words " )
        putStrLn ( show lc ++ " lines." )

    main = wc
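The same idea can also be written as a strict left fold. This is a sketch of an alternative, not what the mailing list proposed: it assumes foldl' from Data.List is available in your implementation, and step and wcFold are names made up for this example. foldl' forces the accumulator at each step, and the strict fields then force the three counters. The counting rules are copied from the code above, including counting every whitespace character as a word.

```haskell
module Main where

import Data.List (foldl')

-- Same strict data type as in the post; deriving Show is added only
-- so the result can be printed directly.
data Stats = Stats !Int !Int !Int
  deriving (Eq, Show)

-- Update the counters for one character (chars, words, lines).
step :: Stats -> Char -> Stats
step (Stats cc w lc) c = case c of
  ' '  -> Stats (cc + 1) (w + 1) lc
  '\t' -> Stats (cc + 1) (w + 1) lc
  '\n' -> Stats (cc + 1) (w + 1) (lc + 1)
  _    -> Stats (cc + 1) w lc

-- foldl' evaluates each intermediate Stats, so no chain of thunks builds up.
wcFold :: String -> Stats
wcFold = foldl' step (Stats 0 0 0)

main :: IO ()
main = print (wcFold "hello world\nbye\n")  -- prints Stats 16 3 2
```

To use it on a file, replace the call to wcf in wc with wcFold contents; readFile still hands over the contents lazily, so the fold consumes the file as it is read.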


In reply to Re^14: World's shortest intro to function programming by kelan
in thread Thread on Joel on software forum : "I hate Perl programmers." by techcode
