Read "Death to Dot Star!". Basically the star operator is greedy, and will eat as much as possible before matching the rest of the expression. So your first dot star will eat most of the 20MB file (i.e. everything up to the last occurrence of the rest of the pattern), which probably causes some memory problems.
The solution is to either make the operator non-greedy (put a ? at the end: .*?) or to restate the regexp to get rid of the dot so it doesn't match as much.
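To make the difference concrete, here is a minimal sketch on a made-up string (the sample text and variable names are only for illustration, not from the original mail file). It shows the greedy match, the non-greedy one, and a version restated with a negated character class instead of the dot:

```perl
use strict;
use warnings;

# Hypothetical sample: two angle-bracketed tokens on one line.
my $text = '<a> text <b>';

# Greedy: .* grabs everything, then backs off only as far as the LAST ">".
my ($greedy) = $text =~ /<(.*)>/;

# Non-greedy: .*? stops at the FIRST ">" that lets the match succeed.
my ($lazy) = $text =~ /<(.*?)>/;

# No dot at all: [^>]* simply cannot run past a ">".
my ($negated) = $text =~ /<([^>]*)>/;

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "negated: $negated\n";
```

On a 20MB string the greedy form does the same thing, just with a lot more text to swallow and give back.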
In your case, though, you'd probably be better off searching CPAN for some modules to parse the mail for you.
Parsers are tricky beasts.
Looking on CPAN is your best bet. If you wanted to do it yourself, I'd suggest really reading the MIME RFCs, because I can see some problems with what you have there immediately.
I'm gonna assume that the .* issues others have mentioned have been resolved, and you are using .*? for a non-greedy match. That'll help immediately.
But the end-terminator for an attachment isn't "From", it's a second line matching the first "--.*" line. Worse, the proper value of the ".*" is specified on a different line (the boundary parameter of the multipart Content-Type header), which you don't want to chop out.
Ignoring that last problem, you could probably use something like:
s{
(\n--).*?\s*\n # Match boundary line
Content-Type:\ image # Find image part
.*? # Match part non-greedily
(\1) # Match start of next boundary line
}{$1}gsx; # Replace with start of boundary line (/x allows the comments)
This should match the proper beginnings and endings better. I'd love to get rid of the .*? parts, but I'm not sure if it can be done. I thought of using \S* for the first, but the boundary line can contain spaces, so that won't work.
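For what it's worth, here is one possible variant, only a sketch and tested on nothing but the toy message below. It swaps the first .*? for [^\n]*, which allows spaces but cannot cross a newline, and uses a look-ahead for the next boundary instead of consuming it, so two image parts in a row both get removed under /g. The sub name strip_image_parts and the sample message are made up for illustration; it assumes \n line endings and that Content-Type is the first header after the boundary line, and it still ignores the real boundary value from the headers.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every MIME part whose first header line is
# "Content-Type: image...". Assumes \n line endings.
sub strip_image_parts {
    my ($msg) = @_;
    $msg =~ s{
        \n--[^\n]*\n            # a boundary line; [^\n]* cannot cross a newline
        Content-Type:\ image    # the part is an image
        .*?                     # headers and body of the part, non-greedily
        (?=\n--)                # stop just before the next boundary line
    }{}gsx;
    return $msg;
}

# Made-up message: one text part followed by two image parts.
my $sample = join "\n",
    'Content-Type: multipart/mixed; boundary="XYZ"',
    '',
    '--XYZ',
    'Content-Type: text/plain',
    '',
    'Hello',
    '--XYZ',
    'Content-Type: image/gif',
    '',
    'R0lGODlhAQABAAAAACw=',
    '--XYZ',
    'Content-Type: image/png',
    '',
    'iVBORw0KGgo=',
    '--XYZ--',
    '';

print strip_image_parts($sample);
```

Because the look-ahead leaves the following boundary in the string, the next search can start on it. A body that happens to contain a line starting with "--" would still fool this, which is one more reason to lean on a real MIME parser.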
That's a very greedy regexp, with that .* in there and the /s modifier at the end. I bet that's your problem. Try creating a little mailbag and trying it on that--I bet it'll run, and you'll see how your regex fails.
(If you can't get it to work, tell us a little more about how those mail files are structured--I don't think it would be hard to make it work reliably, but I don't know a darn thing about Netscape mail files.)