I previously worked on the core network system for GPRS (a 2.5G wireless system). We used a similar environment: not the same type of card, but the binary was likewise compiled and loaded onto the card.
I don't really like your idea of sharing data directly among processes. From a structural point of view this style of multi-processing has proved to be a mistake, and it greatly reduces the maintainability of your application.
On that GPRS system we used both TCP and UDP to communicate between processes, and in practice performance was not a problem with either. We also used streams at any tightly coupled point (you would have to check whether your card supports anything similar to Unix streams).
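To make the message-passing idea concrete, here is a minimal sketch in C of two processes talking over UDP on the loopback interface instead of sharing memory. It assumes a POSIX-like environment with BSD sockets and fork(), which your card may or may not provide. The function name udp_roundtrip and the 2-second timeout are made up for illustration; this is not code from the GPRS system.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that sends `msg` as one UDP datagram
 * over loopback; the parent receives it into `out` (NUL-terminated).
 * Returns the number of bytes received, or -1 on error or timeout. */
ssize_t udp_roundtrip(const char *msg, char *out, size_t out_cap)
{
    assert(msg != NULL && out != NULL && out_cap > 1);

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0)
        return -1;

    /* Do not block forever if the sender fails (2 s is an arbitrary choice). */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    /* Bind before forking so the datagram is queued even if the
     * child sends before the parent reaches recvfrom(). */
    socklen_t len = sizeof addr;
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) < 0) {
        close(rx);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(rx);
        return -1;
    }

    if (pid == 0) {
        /* Child: the sending process. It owns no shared state,
         * only the address of the receiver. */
        close(rx);
        int tx = socket(AF_INET, SOCK_DGRAM, 0);
        if (tx < 0)
            _exit(1);
        ssize_t sent = sendto(tx, msg, strlen(msg), 0,
                              (struct sockaddr *)&addr, sizeof addr);
        close(tx);
        _exit(sent < 0 ? 1 : 0);
    }

    /* Parent: the receiving process. */
    ssize_t n = recvfrom(rx, out, out_cap - 1, 0, NULL, NULL);
    if (n >= 0)
        out[n] = '\0';
    close(rx);
    waitpid(pid, NULL, 0);
    return n;
}
```

Calling udp_roundtrip("attach-request", buf, sizeof buf) should leave the message in buf. The point of the design is that each process only sees the bytes it is sent, so you can later move one side to another card or host by changing the address, without touching the other side.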
(I think you made a typo when you wrote multi-threading across multiple processors, but anyway.)
As for language, C is usually the best choice for this kind of project.
Writing Perl scripts and then compiling them into C might shorten the development cycle, but it will almost certainly hurt performance. I would assume that anything running on the card requires high performance and throughput.
Also, I would imagine that all the libraries you can find, if any, are in C.