nwboy74 has asked for the wisdom of the Perl Monks concerning the following question:

I have a long-running, complex Perl process that downloads pages, extracts a couple of strings that get stored in a hash, and continues. After a page is downloaded, it is stored as an array of lines. I watched memory use on a Windows machine while it ran: after every page, memory usage jumped by 2MB and never went back down. Eventually I get an access violation error and the whole thing shuts down.

It's as if the arrays aren't getting garbage collected, but I've examined the reference count just before the reference goes out of scope, and it is only 1 (checked with Devel::Peek's Dump):

SV = RV(0x1a36f28) at 0x1b78394
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,ROK)
  RV = 0x1b78b8c
  SV = PVAV(0x1b6c41c) at 0x1b78b8c
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY)
    ...

I'm not sure what else to check. The computer running the script has only 512MB of RAM, and memory use just climbs and climbs the longer it runs. As I said, I'm only storing what amounts to a phone number and address from each page in a hash. There's no way that one phone number and address take up 2MB.
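Dump's REFCNT only describes the top of the structure; data further down (the part shown as ... above) can still be kept alive by a reference cycle. One way to check whether a piece of data is really freed is to hold a weak reference to it: weak references do not count toward REFCNT, and Perl sets them to undef when the data goes away. A minimal sketch, where page_is_freed() and the placeholder lines are hypothetical stand-ins for the real page handling, and the optional self-reference simulates a leak:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-in for handling one page: build an array of lines,
# keep a weak "probe" reference to it, and let the strong reference go
# out of scope. The weak probe does not raise REFCNT, so it turns into
# undef as soon as Perl actually frees the array.
sub page_is_freed {
    my ($make_cycle) = @_;
    my $probe;
    {
        my $lines = [ "line 1\n", "line 2\n" ];   # placeholder page content
        push @$lines, $lines if $make_cycle;       # array now refers to itself
        $probe = $lines;
        weaken($probe);
    }                                              # $lines goes out of scope here
    return defined $probe ? 0 : 1;
}

printf "plain page freed:  %s\n", page_is_freed(0) ? "yes" : "no";
printf "cyclic page freed: %s\n", page_is_freed(1) ? "yes" : "no";
```

The same probe trick can be pointed at any inner piece of the real structure to find which part is surviving.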

Re: Running out of Memory
by Khen1950fx (Canon) on Oct 07, 2010 at 22:42 UTC
Re: Running out of Memory
by bluescreen (Friar) on Oct 07, 2010 at 22:34 UTC

    This is a general recommendation: post your code and we will be able to help you. Ideally, come up with a script of 40 lines or fewer that reproduces the problem.

    As the Anonymous Monk said, your problem might be in one of the items in the array, and that item could be keeping the whole array from being freed.

Re: Running out of Memory
by Anonymous Monk on Oct 07, 2010 at 21:45 UTC
    It's as if the arrays aren't getting garbage collected, but I've examined the reference count just prior to the reference leaving scope and it only has one reference (using Devel::Peek Dump).

    Oh look, a reference count of 1 :)

    my @one;
    my @two;
    my @three;
    push @two,   \@three;
    push @three, \@two;
    push @one,   \@two, \@three;
    use Devel::Peek;
    Dump(\@one);
    __END__
    SV = RV(0x3e8a90) at 0x3e8a84
      REFCNT = 1
      FLAGS = (TEMP,ROK)
      RV = 0x98a404
      SV = PVAV(0x3e995c) at 0x98a404
        REFCNT = 2
        FLAGS = (PADMY)
        ARRAY = 0x997dec
        FILL = 1
        MAX = 3
        ARYLEN = 0x0
        FLAGS = (REAL)
        Elt No. 0
        SV = RV(0x98a4b0) at 0x98a4a4
          REFCNT = 1
          FLAGS = (ROK)
          RV = 0x98a424
          SV = PVAV(0x3e9b84) at 0x98a424
            REFCNT = 3
            FLAGS = (PADMY)
            ARRAY = 0x9b1d7c
            FILL = 0
            MAX = 3
            ARYLEN = 0x0
            FLAGS = (REAL)
            Elt No. 0
            SV = RV(0x3e8b30) at 0x3e8b24
              REFCNT = 1
              FLAGS = (ROK)
              RV = 0x98a444
        Elt No. 1
        SV = RV(0x3e8970) at 0x3e8964
          REFCNT = 1
          FLAGS = (ROK)
          RV = 0x98a444
          SV = PVAV(0x3e9b9c) at 0x98a444
            REFCNT = 3
            FLAGS = (PADMY)
            ARRAY = 0x994b2c
            FILL = 0
            MAX = 3
            ARYLEN = 0x0
            FLAGS = (REAL)
            Elt No. 0
            SV = RV(0x3e8b80) at 0x3e8b74
              REFCNT = 1
              FLAGS = (ROK)
              RV = 0x98a424
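      Look at SV = PVAV(0x3e995c) at 0x98a404, the array the reference points to: it has a REFCNT of 2, and the two arrays inside it (0x98a424 and 0x98a444) each have a REFCNT of 3 because they also point at each other.
        :) The ... in your original post could be hiding an elephant :)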
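    A common way to let such a structure be freed is to weaken one link of the cycle with Scalar::Util's weaken, so that link no longer counts toward REFCNT. The sketch below rebuilds the @two/@three cycle from above, adds a hypothetical Tracked class whose DESTROY only counts frees, and compares the pair with and without a weakened back-reference. It is illustrative only; which link to weaken depends on the real data structure.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical helper class: counts how many of its objects get destroyed.
package Tracked;
my $freed = 0;
sub new     { return bless {}, shift }
sub DESTROY { $freed++ }
sub freed   { return $freed }

package main;

# Rebuild the @two/@three cycle, with one Tracked object in each array.
# When $use_weaken is true, the back-reference from @three to @two is
# weakened, so it no longer counts toward @two's REFCNT and the pair
# can be torn down when both arrays go out of scope.
sub build_pair {
    my ($use_weaken) = @_;
    my @two   = (Tracked->new);
    my @three = (Tracked->new);
    push @two,   \@three;
    push @three, \@two;
    weaken($three[-1]) if $use_weaken;
    return;
}

# How many of the pair's two Tracked objects were freed on scope exit.
sub freed_by_pair {
    my ($use_weaken) = @_;
    my $before = Tracked::freed();
    build_pair($use_weaken);
    return Tracked::freed() - $before;
}

print "without weaken: ", freed_by_pair(0), " of 2 freed\n";
print "with weaken:    ", freed_by_pair(1), " of 2 freed\n";
```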
Re: Running out of Memory
by aquarium (Curate) on Oct 08, 2010 at 04:25 UTC
    It sounds like you need to tackle the problem slightly differently: the process outlined is a bit wasteful, slurping whole pages just to extract a minimum of data. Surely a grep or similar setup would do, or reading the file a few lines at a time using a window/buffering technique.
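    A minimal sketch of the line-at-a-time idea: scan a filehandle and stop at the first match instead of keeping the whole page as an array of lines. The in-memory filehandle, the sample page, the first_phone() helper and the US-style phone pattern are all assumptions for illustration; in the real script the handle would come from wherever the page is saved or streamed.

```perl
use strict;
use warnings;

# Hypothetical pattern for a US-style number such as 555-123-4567;
# real pages will need something more forgiving.
my $PHONE_RE = qr/(\d{3}[-. ]\d{3}[-. ]\d{4})/;

# Read a filehandle one line at a time and return the first phone number
# found, without ever holding the whole page as an array of lines.
sub first_phone {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return $1 if $line =~ $PHONE_RE;
    }
    return;    # no match on this page
}

# Demo: an in-memory filehandle stands in for the downloaded page.
my $page = "<html>\n<p>Call us</p>\n<p>555-123-4567</p>\n</html>\n";
open my $fh, '<', \$page or die "cannot open in-memory page: $!";
my %contacts;
$contacts{'example page'} = first_phone($fh);
close $fh;
print "$contacts{'example page'}\n";
```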
    By the way, phone numbers can be formatted in a myriad of ways. Also, make sure you do the right thing with those phone numbers: just because they appear on a web page doesn't automatically mean you can make unsolicited or bulk calls to them, and doing so can be illegal in some states/countries. But that's obviously just friendly advice beyond the scope of this forum.
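    Because of those formatting differences, it may help to normalize numbers before using them as hash keys. This hypothetical normalize_phone() simply keeps the digits, so the three spellings below collapse to a single key; real data (country codes, extensions) would need more care.

```perl
use strict;
use warnings;

# Hypothetical normalizer: keep only the digits, so that differently
# formatted copies of the same number end up as the same hash key.
sub normalize_phone {
    my ($raw) = @_;
    (my $digits = $raw) =~ tr/0-9//cd;   # delete every non-digit character
    return $digits;
}

my %seen;
$seen{ normalize_phone($_) }++ for ('(555) 123-4567', '555.123.4567', '555 123 4567');
for my $number (sort keys %seen) {
    print "$number => $seen{$number}\n";
}
```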
    the hardest line to type correctly is: stty erase ^H