Hi, all -

I'm trying to parse out a 50+MB mbox and show the users an interface for reading and searching it, and I used Mail::MboxParser to do the indexing. Unfortunately, whenever I try to run the CGI that I wrote, it pegs the CPU - and stays there, for a long time. I did a few tests, and it is definitely related to the size of the file:

for n in 100k 1M 10M; do
    echo -n '*** Processing a' $n 'mailbox: ***'
    head -c $n origami_archive > test
    time ./mboxparser.cgi > /dev/null
done

*** Processing a 100k mailbox: ***
real    0m1.399s
user    0m1.364s
sys     0m0.036s

*** Processing a 1M mailbox: ***
real    0m5.943s
user    0m5.880s
sys     0m0.060s

*** Processing a 10M mailbox: ***
real    0m53.079s
user    0m52.991s
sys     0m0.088s

50MB takes... well, I didn't have the patience. :) Way too long to be useful, anyway.
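For a rough idea of what the full file would cost, here is a back-of-the-envelope sketch based on the 1M and 10M timings. It assumes the run time keeps growing linearly with file size, which I have not verified past 10M, and `estimate_seconds` is just a helper name made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "real" times measured above, in seconds, for the 1M and 10M test files.
my ($t_1mb, $t_10mb) = (5.943, 53.079);

# Cost per additional megabyte between the two samples.
# ASSUMPTION: run time keeps growing linearly beyond 10 MB.
my $per_mb = ($t_10mb - $t_1mb) / (10 - 1);

# Hypothetical helper: extrapolate from the 10 MB measurement.
sub estimate_seconds {
    my ($size_mb) = @_;
    return $t_10mb + $per_mb * ($size_mb - 10);
}

printf "about %.1f s per MB, roughly %.0f s for 50 MB\n",
    $per_mb, estimate_seconds(50);
```

If that assumption holds, every single page view of the full archive would take a bit over four minutes of CPU time.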

So, here's my code. I can't see what I'm doing wrong, so I'd really appreciate help!

#!/usr/bin/perl -w
# Created by Ben Okopnik on Thu Jan 14 21:55:46 EST 2010
use strict;
use Mail::MboxParser;
use CGI::Carp qw/fatalsToBrowser warningsToBrowser/;
use CGI qw/:standard/;
$|++;

my $fname = "test";

my $mb = Mail::MboxParser->new(
    $fname,
    parseropts => {
        enable_cache    => 1,
        # enable_grep   => 1,
        cache_file_name => '/tmp/cache' . substr(rand(), 1, 10)
    }
);

my($self) = $0 =~ m{([^/]+)$};
my $count = $mb->nmsgs - 1;

binmode STDOUT, ':encoding(UTF-8)';     # Set up utf-8 output
print header(-charset => 'utf-8'),
    start_html( -encoding => 'utf-8', -title => 'Origami Archive');

if (!param('msg')){
    my $end;
    my $incr = 50;
    my $start = param('start') || 0;
    my $div;
    # $start is always going to be $incr * $_ for 0 .. int($count / $incr)

    # If we're more than $incr posts from the start (i.e., $start is $incr or more),
    # show the "Previous" link
    if ($start > 0){
        my $bottom = $start - $incr;
        print a({-href=>"$self?start=$bottom"}, "Previous $incr");
        $div = " | ";
    }
    # If we're >= $incr posts from the end, show the 'Next' link
    if ($count - $start >= $incr){
        my $top = $start + $incr;
        print $div if $div;
        print a({-href=>"$self?start=$top"}, "Next $incr");
        $end = $top - 1;
    }
    else {
        $end = $count;
    }
    print hr;
    # print "Start: $start End: $end";

    # Subscripting one message after the other
    print "<table>\n";
    for my $idx ($start .. $end) {
        my $msg = $mb->get_message($idx);
        my %m = %{$msg->header};
        print Tr(td(b("&gt;&gt;"),
                    a({-href=>"$self?msg=$idx"}, escapeHTML($m{subject}))),
                 td(escapeHTML($m{from}))), "\n";
    }
    print "</table>\n";
}
else {
    my $msg = param('msg');
    my $prev = $msg - ($msg > 0 ? 1 : 0);
    my $next = $msg + ($msg < $count ? 1 : 0);
    print join " | ",
        ( $msg ? a({-href=>"$self?msg=0"}, "&lt;&lt;") : "&lt;&lt;" ),
        ($msg ? a({-href=>"$self?msg=$prev"}, "Previous") : "Previous"),
        a({-href=>$self}, "Index"),
        a({-href=>"$self?msg=$next"}, "Next"),
        ($msg < $count ? a({-href=>"$self?msg=$count"}, "&gt;&gt;") : "&gt;&gt;");
    print hr, pre($mb->get_message($msg));
}
print end_html;

--
"Language shapes the way we think, and determines what we can think about."
-- B. L. Whorf

In reply to Mail::MboxParser pegs the CPU by oko1
