in reply to Re: Get CID inline attachments with MIME::Parser
in thread Get CID inline attachments with MIME::Parser

About srand:

I don't know whether this has changed since, but when I first began learning Perl, I noticed that if I ran 2 instances of a script printing a sequence of random digits, both would show the same sequence unless srand; was run at the beginning. So I have gotten used to calling srand; at the start whenever I am going to use rand();

About repeated calls to rand:

The repeated calls to rand are there to force leading zeroes in case I get a number like 000001. Perl would normally strip off all leading zeroes, leading to strange filenames.

By calling int(rand(10)) repeatedly, I guarantee that the resulting number has exactly that many digits. So if I want to generate a 10-digit number that is ALWAYS 10 digits, even if the number coming out is 1, I would run:

$number = int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10));

That forces the result to the string "0000000001" if that number comes up, and not "1".

About testing if $foldername exists:

Since the filename consists of time() and 5 random digits, writing two mails into the same folder requires both of the following:

- 2 or more mails must be finished by postfix in the exact same second.

- Both 5-digit random numbers must be exactly the same. The chance of this happening is 0.001 %.

You would have a higher chance of winning the lottery numbers on TV than of both of these happening at the same time. And IF it did happen, no harm would be done except for 2 mails getting merged into one.
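The 0.001 % figure can be checked with a short calculation. This is only a sketch: it assumes every 5-digit suffix is equally likely and that draws are independent, which a real rand() may not guarantee, and the helper name collision_probability is made up for the illustration. It also shows how the chance grows when more than 2 mails finish in the same second.

```perl
use strict;
use warnings;

# Probability that at least two of $n names collide when each one draws
# uniformly and independently from $space possible suffixes
# (the classic birthday problem). Hypothetical helper for illustration.
sub collision_probability {
    my ($n, $space) = @_;
    my $p_unique = 1;
    $p_unique *= ($space - $_) / $space for 0 .. $n - 1;
    return 1 - $p_unique;
}

my $space = 10**5;    # 5 random digits: 00000 .. 99999
for my $n (2, 10, 100) {
    printf "%3d mails in one second: %.4f %%\n",
        $n, 100 * collision_probability($n, $space);
}
```

For 2 mails this gives 1 in 100000, which is the 0.001 % quoted above; the figure only holds per pair of mails delivered within the same second.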

The SMTP mail server IS located on the web server; they are running on the same machine! That's why I want to skip all the overhead of going internally through IMAP and POP3. I also prefer to block incoming IMAP and POP3 in the firewall for security reasons and only have ports 80, 25 and 53 open.

The problem with parsing the mail as it is opened by the receiver is this: suppose someone sends you a 50 MB mail with 49 MB of attachments. You might not want to download the attachments, but you still want to read the body of the mail. You would still have to wait until the attachments are parsed before the body can be opened.

There's NO MIME parser in the webmail system. All the webmail system does is scan a folder of files and generate output based on that. MIME::Parser will have parsed everything by the time an email has been received.

About permissions: I prefer to code the permission system myself. As you might see, the mail is placed in the /my/ folder. That's a user of the webmail system. Once I have everything running, I will make the system place the mail in the /$user/ folder, where $user is the part before the @. No malicious user can access another user's mail, since their login makes the webmail system read from "their" folder. There's no need to configure Unix permissions, since no unauthorized person has admin or physical access to the server machine.

About maildir: Maildir writes the mail to disk before it is parsed. That means parsing has to wait until the mail is fully delivered. By letting postfix stream the mail into the parser while the remote MTA is still writing to my postfix server, I can launch parsing at the same time as the remote MTA sends "DATA" to me. This speeds things up. The mail is written to disk completely parsed and ready for the webmail system to pick up.

About switching mailservers: I selected postfix because it's efficient and it can stream the mail to a mailbox command's STDIN. If I switched mailservers, I would require that the new one can do that. If I MUST switch to an incompatible type, it would be easy to replace parse(\*STDIN) in my script with parse_open($path_to_mailfile_in_mailserver), since most, if not all, SMTP servers write a MIME file somewhere.

Re^3: Get CID inline attachments with MIME::Parser
by roboticus (Chancellor) on Nov 28, 2010 at 21:17 UTC

    sebastiannielsen:

    About srand:

    I don't know whether this has changed since, but when I first began learning Perl, I noticed that if I ran 2 instances of a script printing a sequence of random digits, both would show the same sequence unless srand; was run at the beginning. So I have gotten used to calling srand; at the start whenever I am going to use rand();

    A typical random number generator will generate the same sequence of numbers with the same initial state. It can be a blessing or a curse. Be sure to seed your random number generator when you want individual runs to be different. I typically use time for that.

    About repeated calls to rand:

    The repeated calls to rand are there to force leading zeroes in case I get a number like 000001. Perl would normally strip off all leading zeroes, leading to strange filenames.

    By calling int(rand(10)) repeatedly, I guarantee that the resulting number has exactly that many digits. So if I want to generate a 10-digit number that is ALWAYS 10 digits, even if the number coming out is 1, I would run:

    $number = int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10));

    That forces the result to the string "0000000001" if that number comes up, and not "1".

    A simpler method would be:

    $number = sprintf "%010u", int(rand(10000000000));
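    The same idea works for any fixed width, for example the 5-digit suffix used in the folder name. A minimal sketch, where the helper name random_digits is made up for the illustration; note that on a Perl built with 32-bit integers a 10-digit value may not fit the %u or %d conversion, so smaller widths are the safer case.

```perl
use strict;
use warnings;

# Return a random number as a string of exactly $digits characters,
# padded on the left with zeroes. Hypothetical helper for illustration.
sub random_digits {
    my ($digits) = @_;
    return sprintf "%0${digits}d", int(rand(10 ** $digits));
}

my $suffix = random_digits(5);
print "$suffix\n";    # always 5 characters, e.g. a value like 00417
```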

    ...roboticus

      A typical random number generator will generate the same sequence of numbers with the same initial state. It can be a blessing or a curse. Be sure to seed your random number generator when you want individual runs to be different. I typically use time for that.

      Except with ancient Perls (before 5.004), calling srand() yourself is not necessary. rand() will call srand() on its first call. The seed value used is a function of time, PID and memory allocation, so what rand() does by itself is typically better than what calling srand() yourself would do.

      Basically to get the same set of random numbers for a new run you have to keep specifying the same seed in an explicit call to srand() at the start of the run. I do that when I am debugging and want to get the same numbers so that some error case is reliably repeatable.
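      A small sketch of that debugging trick: seeding with the same value twice gives the same digits twice. The seed 42 and the helper name digits_for_seed are arbitrary choices for the example.

```perl
use strict;
use warnings;

# Seed the generator explicitly, then draw $count random digits.
# Hypothetical helper for illustration.
sub digits_for_seed {
    my ($seed, $count) = @_;
    srand($seed);
    return join '', map { int(rand(10)) } 1 .. $count;
}

my $first  = digits_for_seed(42, 10);
my $second = digits_for_seed(42, 10);
print "same sequence both times\n" if $first eq $second;
```

      Leave out the explicit srand() call in production code so that each run gets its own seed.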

Re^3: Get CID inline attachments with MIME::Parser
by afoken (Chancellor) on Nov 29, 2010 at 13:35 UTC
    repeated calls to rand

    sprintf

    And by the way: why would it hurt to have directory names without leading zeros in front of the random number part? Separate the timestamp and the random number by some non-digit character, and the two parts can no longer run into each other. Given a Unix-based system, the random number doesn't even have to be an integer to be part of the filename. You could use one call to rand() to get a number between 0 and 1 with a lot of digits, and thus use the full potential of the random number generator.
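    A minimal sketch of such a name, using "-" as the separator. The helper name spool_name and the range of the random part are assumptions made for the example, not taken from the posted script.

```perl
use strict;
use warnings;

# Build a name from a timestamp and a random part, separated by "-",
# so no zero padding is needed to keep the two parts apart.
# Hypothetical helper for illustration.
sub spool_name {
    my ($now) = @_;
    my $random = int(rand(1_000_000_000));
    return "$now-$random";
}

print spool_name(time()), "\n";
```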


    About testing if $foldername exists:

    Since the filename consists of time() and 5 random digits, writing two mails into the same folder requires both of the following:

    - 2 or more mails must be finished by postfix in the exact same second.

    - Both 5-digit random numbers must be exactly the same. The chance of this happening is 0.001 %.

    • Testing that mkdir did not fail with EEXIST gives you a collision chance of exactly zero. Whenever you see EEXIST, generate a new random filename and try again.
    • Not testing at all that mkdir succeeded can cause subsequent errors. If you omit further error checks, this may end in data loss. open() in your updated posting has no traces of error checks, neither or die nor use autodie. $parser->parse() is even wrapped in an eval {}, but no code checks $@ or the state of $parser after that.
    • On the servers I use, it is quite possible that two instances of the mail server run in parallel and each deliver one e-mail in exactly the same second.
    • Are you sure that the probability of generating two identical strings from two runs of int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)).int(rand(10)) is just 0.001 %? If rand() were completely fair, random, and independent of past results, the probability of each of the possible combinations from 00000 to 99999 would be equal. So you would have a chance of 1 in 100000 for each combination. But rand() is a pseudo random number generator, where each result depends on the internal state of the PRNG. Combined with the massive rounding due to int, I guess that the combinations are not equally distributed, and so the collision probability is higher. Did you know that the PRNG on some perl interpreters has just 15 bits, i.e. 32768 different "random" numbers?
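    The EEXIST retry from the first point could be sketched like this. The helper name make_unique_dir, the name format, the retry limit of 100 and the temporary base directory are all assumptions made for the example; the real base would be the mail folder.

```perl
use strict;
use warnings;
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

# Create a uniquely named directory below $base and return its path.
# mkdir either creates the directory or fails, so two deliveries can
# never end up owning the same directory. Hypothetical helper.
sub make_unique_dir {
    my ($base) = @_;
    for my $attempt (1 .. 100) {
        my $dir = sprintf "%s/%d-%05d", $base, time(), int(rand(100_000));
        return $dir if mkdir $dir, 0700;
        # "Already exists" means: pick another name. Anything else is fatal.
        die "mkdir $dir failed: $!" unless $! == EEXIST;
    }
    die "no free directory name found below $base after 100 attempts";
}

my $base = tempdir(CLEANUP => 1);    # stand-in for the real mail folder
my $dir  = make_unique_dir($base);
print "created $dir\n";
```

    The same pattern of checking the return value applies to open() and to $@ after the eval around $parser->parse().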

    The SMTP mail server IS located on the web server; they are running on the same machine! That's why I want to skip all the overhead of going internally through IMAP and POP3.

    Sure, it is now. But that approach won't scale when you need to support more users than the machine can handle. Being able to separate mail and web services to two or more different machines would help you. For that, you would need a clear distinction between both. IMAP could clearly separate both services.


    I also prefer to block incoming IMAP and POP3 in the firewall for security reasons and only have ports 80, 25 and 53 open.

    And this is relevant because ...? Given your code that lacks error checks and taint mode while processing data from untrustworthy sources, I guess that attacks via HTTP or SMTP are quite possible. Port filtering won't help at all. And if you run an ancient version of BIND on port 53, your server is very likely already rooted.

    Disabling all unused services is a good idea, because it reduces the risk of being attacked. But still, you could use IMAP here, simply by configuring the imapd to listen only to connections from localhost. Should your needs grow, you could connect mail server and web server by a cable between two dedicated network cards, and make imapd listen only on the address assigned to that card.


    The problem with parsing the mail as it is opened by the receiver is this: suppose someone sends you a 50 MB mail with 49 MB of attachments. You might not want to download the attachments, but you still want to read the body of the mail. You would still have to wait until the attachments are parsed before the body can be opened.

    This is probably a limitation of MIME::Parser. But it is not a generic limitation of the e-mail system as we currently use it. You can stop parsing the e-mail at any arbitrary point and use what you got so far. You don't have to process attachments to see the mail body. You may need to decode some or all attachments if the mail body is HTML and refers to some or all attachments.


    About permissions: I prefer to code the permission system myself. As you might see, the mail is placed in the /my/ folder. That's a user of the webmail system. Once I have everything running, I will make the system place the mail in the /$user/ folder, where $user is the part before the @. No malicious user can access another user's mail, since their login makes the webmail system read from "their" folder. There's no need to configure Unix permissions, since no unauthorized person has admin or physical access to the server machine.

    Good luck. Your attempts at securing the system don't look very promising. Given your setup, all a bad guy needs is a single bug in any of the applications running on the web server, and he has access to all mails on the server. Unix permissions could help you prevent that.
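    As a rough sketch of what that could look like: create each mail directory so that only its owning account can enter or read it. This assumes mail delivery and the webmail code run under a dedicated account that other web applications do not share; the helper name make_private_dir and the temporary directory are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a directory that only its owner can list, enter or write to.
# Hypothetical helper for illustration.
sub make_private_dir {
    my ($path) = @_;
    mkdir $path, 0700 or die "mkdir $path failed: $!";
    # The mode given to mkdir is filtered by the umask, so set it again.
    chmod 0700, $path or die "chmod $path failed: $!";
    return $path;
}

my $root    = tempdir(CLEANUP => 1);    # stand-in for the mail root
my $userdir = make_private_dir("$root/my");
printf "%s has mode %04o\n", $userdir, (stat $userdir)[2] & 07777;
```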


    About maildir: Maildir writes the mail to disk before it is parsed. That means parsing has to wait until the mail is fully delivered.

    How would you display attachments that have not yet been parsed? Right, that won't work. So you have to wait for the entire mail, no matter what happens.


    About switching mailservers: I selected postfix because it's efficient and it can stream the mail to a mailbox command's STDIN. If I switched mailservers, I would require that the new one can do that.

    Most mailservers can deliver to procmail or a procmail replacement via a pipe. But that's not the point. Tight integration into the mail server will make it much harder to switch to a different mail server when your current mail server can't handle your future requirements.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)