
Update: I suggest you first look at the solution below before reading the complete thread.

Class, prepare your mouse pointers, move them near '--' and be ready to click 'Vote!', because I'm going to say something terrible about CPAN.

A few days ago I asked about Reliable email parsing, and that discussion uncovered some points:

People don't need reliable Perl modules; worksforme is enough.
People don't even understand that a reliable solution is a good thing which must be the first-priority goal (even if you can't reach this goal right now)!
CPAN == goodness. Using a CPAN module is the only recommended solution for anything. If you say all existing CPAN modules for {any task here} are wrong because {any reason here}, you get a lot of '--', no matter whether you are right or not!

For me, ANY software MUST be developed with these priorities in mind:

be reliable: it should do its work correctly (e.g. for the email parsing task it must support all features defined in the RFCs related to email format, plus be able to handle non-RFC-compliant emails produced by buggy software)
be secure: it should not allow unauthorized usage
the next priorities may vary: having a lot of features, working fast, being cheaper, having an intuitive interface, etc.

... because if software isn't doing exactly what it should, or opens a way for hackers/viruses into my system, I don't need it, no matter which (unreliable!) features it has!

I say "ANY software", but this is especially important for all reusable things like Perl modules, and for core things like the OS kernel, perl itself, a text editor or a web server (i.e. everything you use to develop your own application). Without this, developing your own reliable application becomes a nightmare, because the lower layers you use aren't reliable themselves and you spend a huge amount of time detecting and fixing bugs there... or reimplementing these layers as part of your application. :(

Now about CPAN. 99.9% of CPAN modules are developed without reliability and security in mind. They are "worksforme", feature-rich, have ugly interfaces, are bloated, clever, anything! But not reliable and secure.

So, if you want to develop a reliable application you can't use most CPAN modules. The list of "safe" modules may vary, but usually it includes:

some 'mature' modules which you can't rewrite (DBI, Crypt::*, Math::Pari, ...)
core modules (Socket, Fcntl, List::MoreUtils, Sys::Syslog, Time::HiRes, Data::Dumper, Carp, Inline, Test::*, ...)
and some clever small/simple modules, usually developed by a few well-known authors (Perl6::Export::Attrs, Data::Alias, Regexp::Common, ...)

If this looks like "a lot" of modules, then check them again: they are all low-level modules, each doing a small, simple task (probably except DBI)! All the CPAN modules I have seen which try to solve more complex tasks don't look reliable and secure. A-L-L! OK, I (still) hope there are some exceptions which I missed, but this isn't important.

So... for every high-level task (and for low-level tasks which have no good enough CPAN module) you must develop a custom solution. We have all read: 'CPAN is goodness!'; 'Any task of yours is already solved by a reusable CPAN module, no need to reinvent it!'; 'Perl is better than other languages because only Perl has CPAN!'; etc. for YEARS...
This results in people ceasing to think critically about CPAN; they believe it's goodness because it IS goodness, and that's all.

Now it's time to give some proof: small examples for people who think CPAN modules ARE reliable.

First example - executing an external process.
There are already a lot of ways to execute a command: system(), open(), `` (backticks), IPC::Open3, IPC::Run, IPC::Run3. In short, system() is good but doesn't allow interaction with the running command, and none of the others handle signals correctly. The gory details are below, if you're interested.

While it's possible with system() to run a command
with custom filehandles instead of inheriting only
STDIN/STDOUT/STDERR of the current process, it has some limitations:

 - You can't use the safe LIST form of system(); you have to
   execute your command through the shell to get filehandle
   redirections like '2>&1' working.
 - You can't interact with the running command using pipes.
 - If you need to pass a non-STDIN/OUT/ERR filehandle, or more
   than 3 filehandles, to the command, you have to use fcntl()
   to modify the close-on-exec flag and/or reopen your
   STDIN/OUT/ERR onto the needed filehandles.
 - The user can't set a timeout for the command. There is no way
   to set up alarm() for the command. The user can set up alarm()
   for his main process, but:
   1) this isn't acceptable in a module, because the user may
      already have set up some alarm() before calling our module
   2) this alarm() will interrupt system(), but not the running
      command (and we can't kill it because we don't know its pid)

open() limitations:

 - open(), unlike system(), doesn't block SIGCHLD. This may be
   fine in user code, but in a module it results in two problems
   if the user has a $SIG{CHLD} handler installed:
   1) the pid of the process open()'ed in the module will be
      delivered to the user's SIGCHLD handler
   2) the $? status will not be available in the module
 - open(), unlike system(), doesn't ignore SIGINT & SIGQUIT in
   the main process, so if you use open() instead of system()
   just to interact with the executed command using a pipe like:
        open(my $fh, 'some_foreground_command |')
   then both this command and your main process will receive
   these signals if the user presses Ctrl-C or Ctrl-\.
 - If the command exits while the user print()s into the pipe,
   SIGPIPE will kill the main process unless the user blocks it.
 - All system() limitations also apply to open(), with one
   correction: you can interact with the running command using
   a pipe, but only a single pipe - if you need more pipes then
   you should use IPC::Open3, IPC::Run or IPC::Run3.

`command` (backticks) limitations:

 - No safe LIST form.
 - You can't interact with the running command using pipes.
 - If you need to pass a non-STDIN/OUT/ERR filehandle, or more
   than 3 filehandles, to the command, you have to use fcntl()
   to modify the close-on-exec flag and/or reopen your
   STDIN/OUT/ERR onto the needed filehandles.
 - All open() limitations apply.

IPC::Open3 limitations:

 - No signal handling at all, so all open() limitations about
   SIGCHLD, SIGINT, SIGQUIT and SIGPIPE apply.
 - No timeout.
 - Unable to use more than 3 filehandles.

IPC::Run limitations:

 - No signal handling at all, so all open() limitations about
   SIGCHLD, SIGINT, SIGQUIT and SIGPIPE apply.

IPC::Run3 limitations:

 - No signal handling at all, so all open() limitations about
   SIGCHLD, SIGINT, SIGQUIT and SIGPIPE apply.
 - You can't interact with running command using pipes.

My implementation of this task is 110 lines of code, and it's the only implementation I know of which handles all the nuances described in Stevens' APUE book. You can check it by downloading the POWER::Utils module from my website.
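To make the signal points above concrete, here is a minimal sketch. It is NOT the POWER::Utils code; run_cmd is a hypothetical helper made up for illustration. It assumes a POSIX system and a Time::HiRes with clock_gettime(CLOCK_MONOTONIC). Like system(), it blocks SIGCHLD and ignores SIGINT/SIGQUIT in the parent while the child runs, uses the LIST form of exec (no shell), and adds a timeout without touching alarm():

```perl
use strict;
use warnings;
use POSIX qw(:signal_h :sys_wait_h);
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);

# Hypothetical helper: run @cmd without a shell, give up after $timeout
# seconds. Returns ($status, $timed_out); $status has the layout of $?.
sub run_cmd {
    my ($timeout, @cmd) = @_;

    # Block SIGCHLD so a user's $SIG{CHLD} handler can't reap our child.
    my $block = POSIX::SigSet->new(SIGCHLD);
    my $old   = POSIX::SigSet->new;
    sigprocmask(SIG_BLOCK, $block, $old) or die "sigprocmask: $!";

    # Ignore SIGINT/SIGQUIT in the parent only, as system() does.
    my ($saved_int, $saved_quit) = @SIG{qw(INT QUIT)};
    local $SIG{INT}  = 'IGNORE';
    local $SIG{QUIT} = 'IGNORE';

    my $pid = fork();
    if (!defined $pid) {
        my $err = $!;
        sigprocmask(SIG_SETMASK, $old);
        die "fork: $err";
    }
    if ($pid == 0) {
        # Child: put back the caller's dispositions and signal mask.
        $SIG{INT}  = $saved_int  || 'DEFAULT';
        $SIG{QUIT} = $saved_quit || 'DEFAULT';
        sigprocmask(SIG_SETMASK, $old);
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }

    # Parent: poll for the child against a monotonic deadline.
    my ($status, $timed_out) = (undef, 0);
    my $deadline = clock_gettime(CLOCK_MONOTONIC) + $timeout;
    while (1) {
        my $got = waitpid($pid, WNOHANG);
        if ($got == $pid) { $status = $?; last }
        last if $got == -1;                  # child vanished or error
        if (!$timed_out && clock_gettime(CLOCK_MONOTONIC) >= $deadline) {
            kill 'KILL', $pid;               # we know the pid, so we can kill it
            $timed_out = 1;
        }
        select(undef, undef, undef, 0.01);   # sleep 10 ms
    }

    sigprocmask(SIG_SETMASK, $old);
    return ($status, $timed_out);
}

my ($status, $timed_out) = run_cmd(5, $^X, '-e', 'exit 3');
printf "exit code %d, timed out: %s\n", $status >> 8, $timed_out ? 'yes' : 'no';
```

This sketch leaves out pipes to the child, SIGPIPE handling and extra filehandles, so it covers only part of the list above. Polling every 10 ms is a simplification chosen to keep the example short.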

Second example - timers and timeouts.
If you use the time() function to implement a timer or timeout, no matter whether CORE::time() or Time::HiRes::time(), your code is unreliable. Why? Because there are NTP and /bin/date, and they may (and will!) change the current time, both forward and backward. The only reliable way to implement a timer/timeout on Linux is to use the CLOCK_MONOTONIC parameter of clock_gettime(2). This required syscall() until I asked the Time::HiRes author to add this feature; since Time::HiRes 1.77 you have clock_gettime(CLOCK_MONOTONIC). How many CPAN modules work with timers/timeouts? And does even a single one of them use CLOCK_MONOTONIC? A few months ago I searched CPAN and found 0 such modules.
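As an illustration, a timeout built on the monotonic clock might look like the sketch below. wait_until is a hypothetical helper name, and the 10 ms polling interval is an arbitrary choice; only clock_gettime and CLOCK_MONOTONIC come from Time::HiRes.

```perl
use strict;
use warnings;
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);

# Hypothetical helper: call $poll->() until it returns true or until
# $timeout seconds have passed on the monotonic clock.
# Returns 1 if the condition became true, 0 on timeout.
sub wait_until {
    my ($timeout, $poll) = @_;
    # The deadline is unaffected by NTP or /bin/date changing wall time.
    my $deadline = clock_gettime(CLOCK_MONOTONIC) + $timeout;
    until ($poll->()) {
        return 0 if clock_gettime(CLOCK_MONOTONIC) >= $deadline;
        select(undef, undef, undef, 0.01);   # sleep 10 ms
    }
    return 1;
}

# Usage: wait up to 2 seconds for a file to appear.
my $found = wait_until(2, sub { -e '/tmp/some.flag' });
print $found ? "ready\n" : "timed out\n";
```

The same code written with time() would wait too long, or return too early, whenever the system clock is stepped during the wait.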

Third example - mailbox parsing (from my previous thread).
In short, a mailbox can be in one of 4 formats: mboxo, mboxrd, mboxcl and mboxcl2. There is no way to autodetect the format. Reading a mailbox using the wrong format leads to damaged messages. CPAN has many modules with a 'read mailbox' feature, but none of them allows the user to configure the mailbox format BEFORE reading, and only one has a note in its documentation about these formats (and it tries its best to autodetect the format, which is impossible anyway).
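To show how the wrong format damages a message, here is a small sketch of "From " unquoting with the format passed in explicitly. It is not taken from any CPAN module; unquote_from_line is a hypothetical name. It assumes the quoting rules as they are commonly described: mboxo and mboxcl quote only lines starting with "From ", mboxrd quotes any line matching /^>*From /, and mboxcl2 does no quoting and relies on Content-Length.

```perl
use strict;
use warnings;

# Hypothetical helper: undo "From " quoting on one body line, according
# to the mailbox format the caller explicitly asked for.
sub unquote_from_line {
    my ($format, $line) = @_;
    if ($format eq 'mboxrd') {
        $line =~ s/^>(>*From )/$1/;     # strip exactly one leading '>'
    }
    elsif ($format eq 'mboxo' || $format eq 'mboxcl') {
        $line =~ s/^>From /From /;      # only single-level quoting exists
    }
    elsif ($format ne 'mboxcl2') {      # mboxcl2: lines are stored as-is
        die "unknown mbox format: $format";
    }
    return $line;
}

# The original body line ">From here on..." as stored in an mboxrd mailbox:
my $stored = '>>From here on...';
print unquote_from_line('mboxrd', $stored), "\n";   # >From here on...
print unquote_from_line('mboxo',  $stored), "\n";   # >>From here on...
```

Read as mboxrd, the line comes back as written; read as mboxo, it keeps an extra '>' that the sender never typed.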

Fourth example - reliable eval().
Eval? What the hell is wrong with eval!? Only one thing: eval() doesn't support one advanced Perl feature: source filters. You can find my version of eval (6 lines), which is compatible with Perl's built-in eval but also supports source filters, in the same POWER::Utils module.

Want more? OK. Fifth example - using GPG.
CPAN contains a lot of modules for accessing GnuPG, but none of them is reliable enough and some of them are not secure. My version (see the POWER::GPG module on my site) executes gpg with correct signal handling; uses non-blocking pipes and multiplexing to avoid hanging on huge files; never stores sensitive information on the hard drive in temporary files (one temporary file is used for checking a detached signature); and uses a timeout to protect against gpg hangs. Details about the existing CPAN modules are below:

Limitations of other modules from a reliability/security point of view:

=over

=item GPG

 - Doesn't handle SIGCHLD, SIGINT, SIGQUIT.
 - Hangs on large files.
 - Parses unreliable STDOUT instead of reliable --status-fd.

=item Crypt::GPG

 - Doesn't handle SIGINT, SIGQUIT.
 - Handles SIGCHLD incorrectly.

=item GnuPG

 - Doesn't handle SIGCHLD, SIGINT, SIGQUIT.
 - Uses the deprecated shared memory interface.
 - Uses temporary files to store sensitive information.

=item GnuPG::Interface

 - Doesn't handle SIGCHLD, SIGINT, SIGQUIT.

=back
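On the --status-fd point: gpg writes machine-readable lines prefixed with "[GNUPG:] " to the descriptor given by --status-fd, and these are meant for programs, unlike the human-oriented messages. A minimal sketch of reading them, assuming the text has already been collected from that descriptor; parse_status is a hypothetical helper (not POWER::GPG), and the sample key ID and user are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: turn text read from gpg's --status-fd descriptor
# into a list of { keyword => ..., args => ... } records.
sub parse_status {
    my ($text) = @_;
    my @events;
    for my $line (split /\r?\n/, $text) {
        # Only lines with the "[GNUPG:] " prefix are status lines.
        next unless $line =~ /^\[GNUPG:\] (\S+)(?: (.*))?$/;
        push @events, { keyword => $1, args => defined $2 ? $2 : '' };
    }
    return @events;
}

# Made-up sample input, for illustration only.
my $sample = <<'END';
[GNUPG:] NEWSIG
gpg: this human-readable line is ignored
[GNUPG:] GOODSIG 0123456789ABCDEF Example User <user@example.org>
END

for my $ev (parse_status($sample)) {
    print "$ev->{keyword}: $ev->{args}\n";
}
```

A real caller would still have to decide which keywords count as success for its operation; that policy is deliberately left out here.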

And the last thing, which drives me crazy. Perl isn't a new language; it has been actively used around the world for more than 10 years. Perl is, by design, a text processing language, used initially for system administration. Parsing emails is a really needed task; it's a text processing task and it's related to system administration. CPAN has a huge number of modules for this task, and not one of them supports all the email-related RFCs. After so many years no one has developed such a module! AND NO ONE THINKS IT'S NEEDED - at least the comments on my question say so... "Yeah, why do you think you need a reliable email parser, what will you use it for? Forget about it, man, you don't need it. Nobody needs it!" :(

In reply to Reliable software: SOLVED (was: Reliable software OR Is CPAN the sacred cow) by powerman
