in reply to Mail::MboxParser pegs the CPU

Well, you can't parse 50MB in a second... You should create an index for your mailbox and load that instead of parsing the mailbox on every request. I would use a database for this purpose. But if you want to stick with the file, take a look at the make_index method in Mail::MboxParser. It would require some additional work, as you probably want to index more information than just the message number, but it's something you can start with.

Re^2: Mail::MboxParser pegs the CPU
by oko1 (Deacon) on Jan 15, 2010 at 19:25 UTC

    Actually, I've been doing that while waiting for a reply here - but I can't quite figure out what's going on. According to the docs, 'make_index' is supposed to run automatically as soon as I exec a 'get_message' method - but the cache file never gets created, no matter what I do (!). I've been trying to figure that out for the past hour or so; still haven't found anything like an answer.

    What I really wish is that I actually understood this process of indexing (I have some hazy conception of saving pointers to message positions within the file, and then reusing those instead of traversing the entire file, but no clue of how to make that work efficiently.) I would have preferred to write that part myself, but had to rely on a module instead.


    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf

      You can use something like this to create an index:

      use strict;
      use warnings;
      use Mail::MboxParser;

      my $mb = Mail::MboxParser->new(
          'mbox',
      );
      my $ind = $mb->make_index;
      for ( 0 .. $mb->nmsgs - 1 ) {
          printf "%5.5d => %10.10d => %s\n",
              $_,
              $mb->get_pos($_),
              $mb->get_message($_)->header->{subject};
      }

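
The indexing idea described above (saving each message's position in the file, then seeking to it instead of traversing the whole file) can also be sketched without the module. This is only a minimal illustration, not how Mail::MboxParser implements make_index. The sub names build_index and read_message are hypothetical, and the sketch assumes the classic mbox layout in which every message starts with a "From " line that is either the first line of the file or preceded by a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the mbox once and record the byte offset
# of every "From " separator line.
sub build_index {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my @offsets;
    my $pos        = 0;
    my $prev_blank = 1;    # the start of the file counts as a boundary
    while ( my $line = <$fh> ) {
        push @offsets, $pos if $prev_blank && $line =~ /^From /;
        $prev_blank = ( $line =~ /^\r?\n?$/ ) ? 1 : 0;
        $pos = tell $fh;    # offset of the next line
    }
    close $fh;
    return \@offsets;
}

# Hypothetical helper: fetch message number $n by seeking straight to
# its stored offset instead of re-reading everything before it.
sub read_message {
    my ( $path, $offsets, $n ) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $start = $offsets->[$n];
    my $end   = $n < $#$offsets ? $offsets->[ $n + 1 ] : ( -s $fh );
    seek $fh, $start, 0 or die "seek failed: $!";
    read $fh, my $msg, $end - $start;
    close $fh;
    return $msg;
}
```

With these, something like `my $idx = build_index('mbox'); print read_message('mbox', $idx, 3);` would print the fourth message. The offsets array could be saved to disk (for example with the core Storable module) and reloaded on later requests, so the full scan only has to be repeated when the mailbox changes.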
        > You can use something like this to create index:

        I'm sorry to disagree, but - no, you can't. Per the docs:

        enable_cache
            When set to a true value, caching is used but only if you gave *cache_file_name*. There is no default value here!

        cache_file_name
            The file used for caching. This option is mandatory if *enable_cache* is true.

        Neither of these is set in your code - and setting them does not create the specified file.

        I've tried explicitly using '$mb->make_index' in my code, by the way - in which both of the above are defined (please see the code I originally posted.) The cache file still does not get created (and, yes, I do have write permissions in that directory; an 'open' call in the script creates one without any problem.) At this point, it's beginning to look like a module bug.

