in reply to Re: Fastest way to determine whether a specific port is open via a Socket
in thread Fastest way to determine whether a specific port is open via a Socket

The program that I am writing reads content from files in a folder, with filenames looking like this:
IP_ADDRESS.txt. The content gets submitted to the IP_ADDRESS and the file is deleted; then the next one, and so on. The important thing is that the files that didn't go through must be retried again and again, while in the meantime newly created files keep arriving, whose content cannot be delayed by the retry procedure. That's why I am looking for a really fast or independent process (thread) to do the retrying. Your input is highly appreciated.
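For the probe in the thread title, a minimal Perl sketch is a plain TCP connect with IO::Socket::INET's Timeout option (the host, port, and timeout here are just example values, and connect-timeout behaviour can vary a little by platform):

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Quick reachability probe: attempt a TCP connect with a short timeout
# and treat "connection established" as "port open".
sub port_open {
    my ($host, $port, $timeout) = @_;
    $timeout ||= 2;                       # seconds; tune for your network
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}

my $status = port_open('127.0.0.1', 515, 1);   # 515 = standard LPD port
print $status ? "open\n" : "closed\n";
```

A failed connect simply returns undef from the constructor, so the check costs at most the timeout per dead host.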

Re^3: Fastest way to determine whether a specific port is open via a Socket
by jbert (Priest) on Oct 30, 2006 at 15:53 UTC
    OK, so you've got a system which is doing message passing between servers based on IP, where both latency and reliability are important.

    Implementing reliability reliably (hah!) is pretty hard. How well are you covering against power failure (you're fsync'ing file descriptors, right? Are you flushing stdio buffers beforehand?) You're triggering off the existence of files - do you have race conditions where a file is created (empty) and might be processed before it's filled in? You're processing collections of related files. Are they created in a specified order?

    I'd seriously consider using existing software for this. In particular, either find a message-passing module/library or even consider good old store-and-forward SMTP. Any grown-up mail system will have the reliability thing sorted and also won't have any issues with one failing delivery stalling the entire queue. You might need to tune connection timeouts and retry timeouts though.

    If you don't like this idea and want/need to write it yourself, you're going to have to go threaded, multi-process, or asynchronous. All of these solutions will add a lot of complexity to your setup, and your best chance is to pick the one you understand best.

    If it were me and I had to implement this sort of thing then I might go for an async, event-based system and a simple UDP protocol. The events are then as simple as:

    1. New file to process (add new memory record, send UDP msg)
    2. UDP ACK received (clear memory record, log delivery)
    3. Timed event: UDP response not received (inc retry counter in mem record, retry UDP send or discard+log)
    4. Timed event: poll for new files (OR use system filesystems to generate notification events)
    It shouldn't get much more complex than that, but you will be doing async programming, so you can get bitten by operations that sometimes take unexpectedly long (e.g. DNS queries).
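    The four events above can be sketched with IO::Select and a UDP socket. This is only an illustration of the loop's shape, not the poster's design: the job id, timeouts, and the loopback "printer" that ACKs jobs (so the demo terminates) are all assumptions.

```perl
use strict;
use warnings;
use Socket qw(sockaddr_in inet_aton);
use IO::Socket::INET;
use IO::Select;

my %pending;          # job id => { host, port, sent_at, retries }
my $max_retries = 5;
my $delivered   = 0;

# A loopback "printer" that ACKs jobs, so this demo actually terminates;
# in the real tool the remote end would be the LPD host.
my $printer = IO::Socket::INET->new(Proto => 'udp', LocalAddr => '127.0.0.1')
    or die "printer socket: $!";
my $sock = IO::Socket::INET->new(Proto => 'udp') or die "socket: $!";
my $sel  = IO::Select->new($sock, $printer);

sub send_job {
    my ($id) = @_;
    my $rec = $pending{$id};
    $sock->send("JOB $id", 0,
        sockaddr_in($rec->{port}, inet_aton($rec->{host})));
    $rec->{sent_at} = time;
}

# Event 1: new file to process -> add a memory record, send a UDP msg.
$pending{'192.168.0.7'} =
    { host => '127.0.0.1', port => $printer->sockport, retries => 0 };
send_job('192.168.0.7');

while (%pending) {
    for my $fh ($sel->can_read(1)) {
        my $peer = $fh->recv(my $buf, 1024);
        if ($fh == $printer and $buf =~ /^JOB (\S+)/) {
            $printer->send("ACK $1", 0, $peer);   # remote side ACKs
        }
        elsif ($fh == $sock and $buf =~ /^ACK (\S+)/) {
            # Event 2: ACK received -> clear memory record, log delivery.
            delete $pending{$1};
            $delivered++;
            print "delivered $1\n";
        }
    }
    # Event 3: timed event - no ACK yet: bump retry count, resend or drop.
    for my $id (keys %pending) {
        my $rec = $pending{$id};
        next if time - $rec->{sent_at} < 3;       # per-job ACK timeout
        if (++$rec->{retries} > $max_retries) {
            print "giving up on $id\n";
            delete $pending{$id};
        }
        else {
            send_job($id);
        }
    }
    # Event 4 (polling the spool directory for new files) is omitted here.
}
```

    Everything hangs off one select loop, so a dead host only costs its retry timer rather than blocking the fresh jobs.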

    You'll also have to think about how much state you need to save on shutdown/restore and if you need to sync it to prevent against uncontrolled shutdown (power loss).

    If you want to do multiprocess, cpan throws up "Parallel::ForkManager" as a possibility. That might help. What makes multiprocess painful is sharing information after process-creation time, beyond the child's exit code. In your case, you might get away with a simple succeed/fail exit code for each delivery, which might make things quite simple.
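    The exit-code idea might look like this with Parallel::ForkManager (the job list and the `deliver` stub are hypothetical stand-ins; real code would do the actual LPR send in the child):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);   # at most 4 deliveries in flight

# The only state that comes back from a child is its exit code, so use
# 0 = delivered, non-zero = leave the job for the retry (bad) queue.
my %failed;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $job) = @_;
    $failed{$job} = 1 if $exit_code;
});

my @jobs = ('192.168.0.7.txt', '192.168.0.9.txt');   # hypothetical spool
for my $job (@jobs) {
    $pm->start($job) and next;    # parent: schedule the next job
    my $ok = deliver($job);       # child: attempt exactly one delivery
    $pm->finish($ok ? 0 : 1);
}
$pm->wait_all_children;
print scalar(keys %failed), " failed\n";

sub deliver {
    my ($file) = @_;
    # Placeholder: real code would lpr the file's contents to the host
    # named in the filename and return true on success.
    return 1;
}
```

    A slow or dead host then only ties up one of the worker slots while the others keep draining fresh jobs.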

      Thanks a lot for the reply. What I am doing is an LPR queue. I have several hosts running WinLPD - an LPD alternative for Windows 98. On the other hand, I have two applications that are supposed to print jobs to the remote hosts (it is a point-of-sale system). The whole idea is to manage a bad queue for the jobs that are not going through, while at the same time not delaying the jobs going to hosts that are "alive". I looked around for an LPR queue but there is nothing freely available, which is why I decided to do this myself and then host it on SourceForge as a tool, as there is a need for Windows-based LPR queue software that is free of charge.

      I will have a look at Parallel::ForkManager and try to utilize it in processing the bad queue. The jobs in my case are very small text files (mainly customer slips), which makes the processing of a single job very quick... except when the host on the other side is not responding and I need to retry the job until it becomes available. Using threads makes the biggest sense for now.

      The process that I am talking about looks like this:

      <APPLICATION> -> <JOB_FILE> -> <QUEUE_PROCESSOR-LPR> -> <REMOTE_LPD_PRINTER>
        Ah, OK. Good luck.

        FYI, one approach which some mail systems use is to keep "known bad" and "just received" in different queues.

        The idea here is that any time you see something in the just-received queue it is likely to be good (most requests are good) and so you schedule it immediately. If it fails, you move it to the 'bad' queue, where you have a different policy (perhaps a simple round-robin where one process tries each 'bad' in turn, with a short sleep before walking the queue again).

        These can either be logical (say, two different @arrays referring to the same directory root) or physical (actually having two different root dirs). One characteristic of the latter is that the 'badness' state persists over a restart. This can be either good or bad, depending on what behaviour you want. There's certainly something to be said for a reboot forgiving the past sins of unavailable printers.
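        A physical two-directory version might be sketched like this (the spool layout, the seeded job file, and the always-failing `deliver` stub are assumptions for the demo, not part of the poster's tool):

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical layout: two physical queues under one spool root, so the
# "known bad" state survives a restart of the queue processor.
my $root  = tempdir(CLEANUP => 1);
my $new_q = "$root/new";
my $bad_q = "$root/bad";
make_path($new_q, $bad_q);

# Seed one job so the demo has something to move.
open my $fh, '>', "$new_q/192.168.0.7.txt" or die $!;
print {$fh} "customer slip\n";
close $fh;

sub deliver {
    my ($file) = @_;
    # Placeholder: try the delivery once; here we pretend the host is down.
    return 0;
}

# Fast path: anything in the just-received queue is probably deliverable.
for my $file (glob "$new_q/*.txt") {
    if (deliver($file)) { unlink $file; next }
    (my $demoted = $file) =~ s{^\Q$new_q\E}{$bad_q};
    move($file, $demoted);        # known bad: gets the slower retry policy
}

# Slow path: one round-robin pass over the bad queue per wakeup, with a
# short sleep between passes (the sleep is omitted in this sketch).
for my $file (glob "$bad_q/*.txt") {
    unlink $file if deliver($file);
}

my @bad_now = glob "$bad_q/*.txt";
print scalar(@bad_now), " job(s) still in the bad queue\n";
```

        The demotion on first failure keeps the fast path free, and the bad-queue pass can afford long per-host timeouts without delaying anything new.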

        There are a *lot* of similarities to the SMTP MTA problem - e.g. keeping per-destination availability info to avoid repeated connections to a known-dead host. Overkill for v1.0, but you might get some good ideas from reading the FAQs, docs or code for one of the good SMTP MTAs, say Exim.