in reply to Parallel::ForkManager takes too much time to start 'finish' function

The on_finish callback is only called when P::FM reaps a child, and P::FM only reaps a child under three conditions:

There could be an arbitrarily long delay between a child exiting and one of the above events. Adding the following to your program should eliminate that delay:

$SIG{CHLD} = sub { $pm->reap_finished_children };
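
For context, here is a minimal sketch of where that handler might sit in a complete program. The job count, the sleep durations, and the `%finished_at` hash are made up for illustration and are not taken from your code; the `local ($!, $?)` line is an extra precaution I added so the handler does not clobber those variables in whatever code it interrupts.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes qw(time sleep);

my $pm = Parallel::ForkManager->new(4);

# Record when each on_finish callback actually runs in the parent.
my %finished_at;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    $finished_at{$ident} = time;
});

# Reap children as soon as the parent is told that one has exited,
# instead of waiting for the next P::FM call that happens to reap.
$SIG{CHLD} = sub {
    local ($!, $?);    # assumption: keep the interrupted code's $! and $? intact
    $pm->reap_finished_children;
};

for my $job (1 .. 4) {
    $pm->start($job) and next;    # parent: move on to the next job
    sleep 0.1;                    # child: stand-in for the real work
    $pm->finish(0);
}

# The parent stays busy for about a second without calling any P::FM method.
# Time::HiRes::sleep returns early when a signal arrives, hence the loop.
my $t0 = time;
sleep 0.1 while time - $t0 < 1;

my $early = keys %finished_at;    # callbacks that ran during the busy period
$pm->wait_all_children;
my $total = keys %finished_at;

print "callbacks run before wait_all_children: $early of $total\n";
```

Without the `$SIG{CHLD}` line, I would expect the callbacks in this sketch to run only once `wait_all_children` is reached.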

By the way, if the work performed by your child only takes "a few milliseconds or nanoseconds", you are actually slowing things down by using P::FM. Data passed to `finish` gets serialized and written to disk, then read from the disk and deserialized for the `on_finish` callback!