PerlMonks  

memory consumption

by moked (Beadle)
on Jul 07, 2009 at 08:54 UTC ( [id://777810]=perlquestion )

moked has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm using the following code to put a file on a remote server
that indicates whether I have new mail on my Exchange server.
The script does exactly what it should,
but it has one flaw: after running for about a day it consumes about 400MB of memory.
I'm trying to solve this issue, but so far without success.

The script runs on Windows XP SP3.
    #!/usr/bin/perl
    use WWW::Mechanize;
    use HTTP::Cookies;
    use Stream::Reader;

    $url="https://mymail.company.com";
    my $username = "XXXXXXXXXX";
    my $password = "xxxxxxxxxx";
    my $mechanize = WWW::Mechanize->new(autocheck => 1);

    CHK_STRT:
    $mechanize->cookie_jar(HTTP::Cookies->new());
    $mechanize->credentials($username,$password);
    $mechanize->get($url);
    $mechanize->get("https://mymail.company.com/exchange/User/Inbox/?Cmd=contents&Page=1");
    my $page = $mechanize->content();
    $mechanize->get("https://mymail.company.com/exchange/User/Inbox/Sub_dir_1/?Cmd=contents&Page=1");
    my $Sub_1 = $mechanize->content();
    $mechanize->get("https://mymail.company.com/exchange/User/Inbox/Sub_dir_2/?Cmd=contents&Page=1");
    my $Sub_2 = $mechanize->content();
    $mechanize->get("https://mymail.company.com/exchange/User/Inbox/sub_dir_3/?Cmd=contents&Page=1");
    my $Sub_3 = $mechanize->content();
    $mechanize->get("https://mymail.company.com/exchange/User/Inbox/sub_dir_4/?Cmd=contents&Page=1");
    my $Sub_4 = $mechanize->content();

    open(FH, ">school.txt") or die " Can't open school file\n";
    binmode FH, ':utf8';
    print FH $page;
    print FH $Sub_1;
    print FH $Sub_2;
    print FH $Sub_3;
    print FH $Sub_4;
    close(FH);

    my @substrings = ( 'icon-msg-unread.gif' );
    my $handler;
    open( $handler,'<','school.txt' ) or die "can't Reopen the file\n";
    my $stream = Stream::Reader->new( $handler );
    my $result = $stream->readto(\@substrings, {Mode => 'E'}); #This mode returns false
    $emails = 1;
    close $handler;

    open(WR, ">announce.txt");
    if( $result ) {
        print WR "new\n";
        $emails++;
    } elsif( $stream->{Error} ) {
        die "Fatal error during reading file!\n";
    } else {
        print WR "old\n";
    }
    close WR;

    unlink('C:/MC/school.txt');
    system 'ftp -s:ftpc ftp.server > Log.log';
    unlink 'Log.log';
    sleep(60);
    goto CHK_STRT;

Thanks ahead,
Moked

Replies are listed 'Best First'.
Re: memory consumption
by bangers (Pilgrim) on Jul 07, 2009 at 09:47 UTC
    As a suggestion, I'd scope your variables and create $mechanize on each iteration. This may let Perl free the memory once the object goes out of scope at the end of each pass.
    #!/usr/bin/perl
    use WWW::Mechanize;
    use HTTP::Cookies;
    use Stream::Reader;

    $url="https://mymail.company.com";
    my $username = "XXXXXXXXXX";
    my $password = "xxxxxxxxxx";

    CHK_STRT: {
        my $mechanize = WWW::Mechanize->new(autocheck => 1);
        $mechanize->cookie_jar(HTTP::Cookies->new());
        $mechanize->credentials($username,$password);
        $mechanize->get($url);
        $mechanize->get("https://mymail.company.com/exchange/User/Inbox/?Cmd=contents&Page=1");
        my $page = $mechanize->content();
        $mechanize->get("https://mymail.company.com/exchange/User/Inbox/Sub_dir_1/?Cmd=contents&Page=1");
        my $Sub_1 = $mechanize->content();
        $mechanize->get("https://mymail.company.com/exchange/User/Inbox/Sub_dir_2/?Cmd=contents&Page=1");
        my $Sub_2 = $mechanize->content();
        $mechanize->get("https://mymail.company.com/exchange/User/Inbox/sub_dir_3/?Cmd=contents&Page=1");
        my $Sub_3 = $mechanize->content();
        $mechanize->get("https://mymail.company.com/exchange/User/Inbox/sub_dir_4/?Cmd=contents&Page=1");
        my $Sub_4 = $mechanize->content();

        open(FH, ">school.txt") or die " Can't open school file\n";
        binmode FH, ':utf8';
        print FH $page;
        print FH $Sub_1;
        print FH $Sub_2;
        print FH $Sub_3;
        print FH $Sub_4;
        close(FH);

        my @substrings = ( 'icon-msg-unread.gif' );
        my $handler;
        open( $handler,'<','school.txt' ) or die "can't Reopen the file\n";
        my $stream = Stream::Reader->new( $handler );
        my $result = $stream->readto(\@substrings, {Mode => 'E'}); #This mode returns false
        $emails = 1;
        close $handler;

        open(WR, ">announce.txt");
        if( $result ) {
            print WR "new\n";
            $emails++;
        } elsif( $stream->{Error} ) {
            die "Fatal error during reading file!\n";
        } else {
            print WR "old\n";
        }
        close WR;
    }

    unlink('C:/MC/school.txt');
    system 'ftp -s:ftpc ftp.server > Log.log';
    unlink 'Log.log';
    sleep(60);
    goto CHK_STRT;
    I haven't tried this code, but it may make a starting point.

      WWW::Mechanize keeps a history of all visited pages, so your solution will work.

Re: memory consumption
by JavaFan (Canon) on Jul 07, 2009 at 10:37 UTC
    Instead of sleeping, you may be better off running the script from cron (or whatever they have on Windows). That has the advantage that if the script dies (and you do have a couple of die statements - your program is clearly not written with durability in mind), a minute later, another instance tries again.

    As for the memory leak, a classical solution for such daemons is to have them exec() themselves every once in a while. This doesn't work (easily) if the daemon needs to keep state, but your program doesn't. So for instance, you could keep a counter which you increment each loop, and once the counter goes over 500 (or some other number), instead of the goto, you do an exec $0;

    Still, I'd go for the crontab solution.
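The counter-and-exec idea described above could be sketched roughly as follows. This is only an illustration, not the OP's script: run_with_restart and its work, max_iterations, interval, and restart parameters are hypothetical names, and the default restart re-runs the script through $^X with @ARGV instead of a bare exec $0, which is an assumption made so that the interpreter and arguments are passed along explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: call $work repeatedly and, once the iteration
# counter goes over $max, hand control to a restart action instead of
# looping again.
sub run_with_restart {
    my (%args) = @_;
    my $work     = $args{work};
    my $max      = $args{max_iterations};
    my $interval = $args{interval};

    # Default restart action (an assumption): replace this process with a
    # fresh run of the same script, keeping the command line arguments.
    my $restart = $args{restart}
        || sub { exec $^X, $0, @ARGV or die "exec failed: $!\n" };

    my $count = 0;
    while (1) {
        $work->();
        return $restart->() if ++$count > $max;
        sleep $interval if $interval;
    }
}

# Possible usage, with check_mail() standing in for the body of the loop:
#   run_with_restart(
#       work           => \&check_mail,
#       max_iterations => 500,
#       interval       => 60,
#   );
```

Passing the restart action in as a code reference keeps the loop easy to try out without actually replacing the process.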

      Some notes:

      Often, it is bad when two instances of a program run at the same time. So, when you run the script using "scheduled tasks" and this is a problem, check that the current instance is the only instance (which is another common problem).

      Windows has no exec() system call, so that trick won't work on Windows. (Windows also lacks fork(). Another reason to stay away from Windows. ;-) Recent perl versions try to emulate both, but the emulation is far from being complete - simply because Windows has no equivalent for those API calls.)

      exec $0 removes all command line arguments. Often, you don't want that. exec($0,@ARGV) keeps them.

      In any case, exec() removes all context your program had, it literally starts from the beginning. If you need some state information, you have to keep it outside of the process, e.g. in a file or in an environment variable.
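Keeping state outside the process, in a file, might look something like the sketch below. The save_state and load_state helpers, the key=value file layout, and the file name in the usage comment are all made up for illustration; nothing in the OP's script requires them.

```perl
use strict;
use warnings;

# Hypothetical helper: write a flat hash of state as "key=value" lines.
sub save_state {
    my ($path, $state) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!\n";
    print {$fh} "$_=$state->{$_}\n" for sort keys %$state;
    close $fh or die "Can't close $path: $!\n";
    return;
}

# Hypothetical helper: read the state back; a missing file means no state.
sub load_state {
    my ($path) = @_;
    return {} unless -e $path;
    open my $fh, '<', $path or die "Can't read $path: $!\n";
    my %state;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($key, $value) = split /=/, $line, 2;
        $state{$key} = $value if defined $value;
    }
    close $fh;
    return \%state;
}

# Possible usage around a restart (file name is an assumption):
#   my $state = load_state('monitor.state');
#   $state->{emails}++;
#   save_state('monitor.state', $state);
#   exec($0, @ARGV);
```

This only handles simple scalar values without newlines; anything more structured would want a proper serializer.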

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        I checked whether the OP's program uses any arguments; it doesn't, so exec $0 should be fine. As for context, most of it, like environment variables, the cwd, and even open file descriptors, will be preserved. State information of the process itself of course isn't, but I already mentioned that. Nor does the OP's program use state information (although the module it uses builds up state, which causes the memory leak; that state isn't used, and the fact that exec loses it is exactly why exec helps here).

        As for preventing duplicates from being run, something like:

        use Fcntl ':flock';
        open my $me, $0 or die;
        flock $me, LOCK_EX|LOCK_NB or exit;
        near the beginning of your program usually does the trick.
Re: memory consumption
by Your Mother (Archbishop) on Jul 07, 2009 at 16:38 UTC

    Extending what bangers and Corion said already, you can ask Mech not to keep a history if you need or prefer to have a single object (this was documented incorrectly in earlier versions, but I think it has worked for a long time):

    $mech->stack_depth(0)
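For a long-running loop with a single object, the same setting can also be passed to the constructor. A minimal sketch, assuming the folder URLs from the question; the fetch_all helper and the @urls list are illustrative names, and nothing is fetched until fetch_all is called.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# stack_depth => 0 asks Mechanize to keep no page history at all.
my $mech = WWW::Mechanize->new( autocheck => 1, stack_depth => 0 );

# Folder URLs modelled on the ones in the question (shortened list).
my @urls = (
    'https://mymail.company.com/exchange/User/Inbox/?Cmd=contents&Page=1',
    'https://mymail.company.com/exchange/User/Inbox/Sub_dir_1/?Cmd=contents&Page=1',
);

# Hypothetical helper: fetch each URL with the shared object and return
# the page contents in the same order.
sub fetch_all {
    my ($agent, @list) = @_;
    return map { $agent->get($_); $agent->content } @list;
}

# Possible usage inside the polling loop:
#   my @pages = fetch_all($mech, @urls);
```

With the history switched off, reusing one object across iterations should no longer accumulate old pages, though it is worth watching the process size for a day to confirm.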

Re: memory consumption
by missingthepoint (Friar) on Jul 08, 2009 at 10:06 UTC

    This seems to be the most common problem with WWW::Mechanize... people leaving it running for long periods and running out of memory. I'm writing a tutorial for Mechanize which I'll post on PM soon - next few days hopefully. I'll include this in the 'troubleshooting' bit.


    The zeroeth step in writing a module is to make sure that there isn't already a decent one in CPAN. (-- Pod::Simple::Subclassing)
