guha has asked for the wisdom of the Perl Monks concerning the following question:

Background:

I have a couple of programs which access different remote databases distributed in our wan. The programs are invoked in a cronish manner several times a day. The communication is done with tcp/ip over fixed 2Mb lines.

Current situation:

Lately there has been a lot of problems with the reliability of those links, related to HW-failure at the link provider and excavators cutting the backbone fiber, etc, etc.

Problem:

Until recently I didn't think this was a major problem with the application, because if the program failed with a timeout due to the broken link, then DB's would be resynced on the next time the programs starts with a functional link

However, here's where I spotted an, to me anyways, odd behaviour of possibly DBI and/or DBD::ODBC.

Consider this scenario

  1. The link is broken
  2. The program starts.
  3. A few seconds after the DBI->connect method was invoked the program dies with a msg saying that an timeout has occurred.
  4. Time passes ...
  5. The program starts again.
  6. .... but it never exits ....
  7. The link goes up again, but the program started at 5 is still in blocking wait or whatever.
  8. I kill the program manually.
  9. The program starts again and remember that the link is now up.
  10. .... but it never exits from connect.

The fix:

What I did was to use Net::Ping to test wheter the link was up before invoking the connect method of DBI. This does however not handle the case where the link goes down during a connection, and the previous scenario could very well occur again.

More info
Program fail on Win2000 and WinXP
AS perl v 635
DBI v 1.37
DBD::ODBC v 1.05

Also I'm using DSN's to connect.

my $dbh = DBI->connect("DBI:ODBC:$dsn") or die "Could not connect to $dsn >> $DBI::errstr\n";

I have tried to find something related through google and one of the mailing-list archives for DBI, but no avail.

Question:

As I'm totally stumped by this, I welcome any input to the problem.

Replies are listed 'Best First'.
Re: Timeout problems with DBI
by zakzebrowski (Curate) on Sep 10, 2003 at 11:40 UTC

    Welcome to the world of wan programming. It *can* be done, but you have to do it carefully. I think the problem you are running into, is that the ODBC driver might not have a 'timeout' setting enabled. I know that when I wrote a comm layer using a database and a jdbc driver, executing a 'select' would not timeout, even if the tcp connection was broken. I had to manually specify (in jdbc's dsn string) two timeout values, one for the connect, and one for the time that it took for the connection to return results. Additionally, even when the client timed out right and disconnected the connection, the *server* still thought it had a connection. This was a problem because we quickly reached a 'connection limit' for the server. So, I set the timeout for the jdbc connection && the database connection to the same value, and that (seemed) to work ok....

    If you want to test this in the future, consider setting up an openbsd / freebsd box with ipfw on it. You can throttle the network connection, introduce random packet drops, etc. You can then automate your testing by using Net::Telnet or Net::SSH to send commands to ipfw, and Cluster::Run (not yet released, working on it at home) or your favourite Distributed:: module to stress test the server.

    Good luck! Must get java filth off my hands... unclean! unclean!


    ----
    Zak
Re: Timeout problems with DBI
by jdtoronto (Prior) on Sep 10, 2003 at 12:29 UTC
    You have discovered that ODBC is not resilient!

    Using MySQL we never use ODBC to connect - even on our LAN, of course using an MS database we don't have much choice, so we don't use MS.

    We got around it in some cases by actually wrtiing a SOAP server in Perl on the DB machine and accessing that from the client side - look up SOAP::Lite

    As someone else said - 'Good Luck'

    jdtoronto

Re: Timeout problems with DBI
by monktim (Friar) on Sep 10, 2003 at 12:45 UTC
    I have Perl processes that update multiple dBs using Win32::ODBC than run 24/7. I ran into problems with some dB servers not being available because of maintence or other reasons. I wrote a package that connects to the dB(s) and stored the connection information. I have a sub that executes the query and if I get (Win32::ODBC::Error() =~ /Communication link failure/) after execution then another connection to the dB is made and the query executes again. It's something like this.
    my ($rc) = $sql_obj->sql($query); return ($sql_obj, undef) unless $rc; return ($sql_obj, "".Win32::ODBC::Error()) unless (Win32::ODBC::Error +() =~ /Communication link failure/); my($sql_obj) = $self->connect($server, $user, $password); return (undef, "Cannot reconnect") unless ($new_sql_obj); $rc=$sql_obj->sql($query); return ($sql_obj, "".$rc) if ($rc); return ($sql_obj, undef);
Re: Timeout problems with DBI
by liz (Monsignor) on Sep 10, 2003 at 10:43 UTC
    Are you sure the problem is on your end, and not e.g. a hanging lock on the database / ODBC end?

    Liz

Re: Timeout problems with DBI
by guha (Priest) on Sep 11, 2003 at 12:45 UTC

    Followup

    I've been doing some homework, as a result of the replies, thank you all.

    However first I would like to make a couple of clarifications.

    1. Point 10 of the scenario should actually read "Works again"
    2. I think I need to emphasize that the major problem is not the broken link, but the fact that program invocation # > 1 hangs in the connect method ( see below for (too much) detail.
    liz
    As I see it in this case, there is no contact with the DB-server. Of course the requester/client or possibly DSN-manager could have a bug which results in this behaviour. I have no idea how to approach that.

    What I have done is to enable tracing in DBI, settings for greatest detail.

    zakzebrowski
    Yes I've checked for timeouts and it's set for 15 secs. Which seem to be what I see for the first program invocation.

    jdtoronto
    Interesting idea, I would like it alot if you could elaborate on the pros of the SOAP solution compared to ODBC in this regard.

      The advantage of SOAP is that the SOAP server is on the database server. The ODBC connection is direct and does not have to worry about network issues. The disadvantage is that you would need to rewrite your application to use SOAP and move the database calls to the application server.

      Another option is DBI::Proxy. It is a pure Perl DBI proxy to remote server with its own DBI connection to the database. The big advantage is over SOAP is that the interface is DBI so you wouldn't have to rewrit your application.

      Well, if you know it *should* exit, but doesn't... grab the pid, use win32::Process and win32::process::info , when you launch the process, grab a pid and start a 'watcher' process. If timeout > run time, do a Win32::Process::KillProcess on that process... As you suggest, it may be an odbc bug...
      Cheers & Good luck.

      ----
      Zak