Re^7: Time::HiRes sleep does not always work

I'm sorry if it offends you, but that honestly was my experience of Windows back when I used it. For example at one point I was using Win32::OLE to drive IE to load a website then print it to a postscript file. (That was then passed through ps2pdf to get a PDF version of the web page.) I was using the option to tell IE to do the print but not pop up an alert. It would work about, say, 99.9% of the time. But one time in a thousand the alert did come up.

I got around it by making nothing else dependent on that job finishing in a timely manner, emailing someone if the alert came up, then having someone manually check that alert every week or two. Oh, and I pushed the person who developed the web page to make a standards compliant version so that we could just use html2pdf and get rid of the Windows dependency. (That happened about 6 months later, and I was very glad when it happened.)

According to Microsoft, every layer from Win32::OLE on was part of and integrated with Windows. Including IE. So when I say that this was non-deterministic behaviour within Windows, I am using Microsoft's definition of Windows. I do not know why it happened. It "smells like" a race condition but I cannot prove that.

Admittedly this was some years ago. As you may know, I avoid dealing with Windows whenever I can. The problems may have all gone away, and I wouldn't know.

However I doubt it. Every so often I hear stories that confirm my low opinions. For instance with the initial release of Vista I heard complaints from people testing it that they couldn't get the same result twice from an install. More specifically, they could script an install of the whole OS and applications. Start with 2 factory machines that were supposedly identical. Do identical installs. And they would wind up with 2 machines that would misbehave in different ways. Of course this was before Vista was in wide release and I am sure it has improved since, but still a non-deterministic install process does not inspire confidence.

If I was to guess a cause for these problems, I would say that it is a combination of heavy multi-threading and a dysfunctional development process. If you want to know about the latter, you could do worse than to occasionally read the discussions from insiders at http://minimsft.blogspot.com/. I know we disagree on multi-threading, but read through The Problem with Threads and tell me whether you can honestly disagree with the technical content of that paper. (I know you will disagree with the tone, so please identify technical content you disagree with.)

A second cause of problems is that behaviour may be deterministic, but is affected by things outside of our knowledge. DLL hell comes to mind as a common cause, but it is far from being the only one. For instance many home users have no idea when their computer is captured and made part of a bot net. How many of us don't know someone whose computer mysteriously got corrupted and crashed? How often is it because of computer viruses and worms? Even if the behaviour is deterministic, from the operator's point of view, it isn't. If you're like my nanny your computer starts off fine, gets slow, then at some point won't turn on. For no apparent reason.

And before you say it, yes I know that other operating systems like Linux use multi-threading. However by and large they use it less than Windows does. And their internal development processes seem to be in better shape. Yes, they have security problems. But those are exploited less than the ones in Windows. The result is that I haven't experienced anything like the amount of pain from non-deterministic behaviour on Linux that I remember from Windows.

Re^8: Time::HiRes sleep does not always work by BrowserUk (Patriarch) on Aug 21, 2008 at 00:14 UTC
Okay. Let's tackle these one at a time:

Automating IE from Perl.

Perl: which for the most part operates despite running on win32 rather than with it.

Doing IPC via Win32::OLE (an early version of it, given your timeline): a module which has probably suffered more bugs in its history than any other module known to man.

To drive Internet Explorer: a multi-threaded GUI program designed for human interaction.

To load a page from a remote server: one whose performance, loading, specification ... you have no control over.

Via an ever-changing network of remote bridges and routers: all of which are not only not under your control, but are not under any single point of control.

Using HTTP: a connectionless, unreliable protocol.

To load a page, and then print it.

And when 1 time in 1000 a "timing issue" causes a problem, "Windoze" is the cause!

"Start with 2 factory machines that were supposedly identical. Do identical installs."

On one big project I was deeply involved in, 700 "identical" HP servers were purchased on a single purchase order. The (EU mandated and approved) purchase order required that all 700 machines be of identical specification. When automated installs of HPUX started to be rolled out in a 20 server pilot test, 7 of them failed. In subsequent checks on the 700 machines, 62 variations of motherboard were found. 11 different combinations of hard drive/drive controller manufacture were found. That grew to well over 100 different hardware combinations when minor hardware revisions were taken into account. And that rose to well over 300 variations once minor firmware revisions were accounted for. In the ensuing legal case, it was determined that it was impossible for a manufacturer to provide 700 identical machines. The "identical" clauses were dropped from the contracts for the 40,000 NT workstations, because no manufacturer would sign them. Even badly seated memory chips or adapter cards can cause install time hardware checks to produce different results. And all OSs can and do suffer the same way.

"DLL hell."

Badly versioned shared library problems are not confined to Windows. Indeed, I first encountered them on IBM VM/SP. And as this paper from some guy at Princeton shows, Linux isn't immune to this either. Indeed, he says: "Surprisingly, it is arguable that the shared library problem under Linux is perhaps even worse than the corresponding problems in Microsoft Windows."

Windows machines get hacked--due to "non-determinism".

The US government is currently trying to extradite Gary McKinnon for hacking "53 US Army computers, 26 US Navy computers, 16 Nasa computers, one US department of Defence computer and one US Air Force computer." Apparently, "The entire network of more than 300 computers at US Naval Weapons Station Earle, in New Jersey, is said to have been left inoperable after Mr McKinnon deleted files." How many of those were running some version of *nix do you think? And if the US Navy/Army/Air Force and NASA aren't capable of securing their machines, what chance has "your nanny"? How do you think your nanny would fare trying to resolve the compiler and linker errors when building her own copy of Linux? And how do you think she would fare configuring IPTables?

Yes, Windows is far from perfect. And the way it comes pre-installed with half the ports open to the world is downright idiocy. But remember that Dell configure the copies of Windows installed on Dell machines; and Sony on Sony machines; etc. Windows can be bolted down pretty tight, but it takes know-how. The manufacturers should have that know-how. Granted, if MS defaulted it to be bolted down, and forced the manufacturers to have to unbolt things, then they might put more thought into the process, but knee-jerk "blame it on Windows" answers really don't benefit anyone.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^9: Time::HiRes sleep does not always work by tilly (Archbishop) on Aug 21, 2008 at 06:02 UTC
Sigh. I provided enough information that you should have seen that most of those steps are not possible sources of the problem.

Even if the network and webserver were not under our control (they happened to be), there were four calls through Win32::OLE. The call to bring up IE worked reliably. The call to navigate to the web page worked reliably. At this point all external networking, HTTP and so on has succeeded. Next came the call to print. That always went through. The call to close IE always worked. The point of failure was that IE did not always pay attention to the flag that was passed saying not to pop up a dialog box. That call is entirely within the machine in question. All of the external networking already successfully happened at previous steps.

Now for all of the bugs in Win32::OLE, I've never heard anything indicating any way that code-paths within it could be non-deterministic. Besides, every call to Win32::OLE successfully went through, and only the passing of a particular flag got messed up. While I can't prove it, I would be willing to bet large amounts of money that if the problem could have been debugged, we would have found that Perl reliably passed that call to Win32::OLE, which reliably passed it into the appropriate underlying C library supplied by Microsoft. After all, given that I had a reasonably simple single-threaded program, there is no possible cause to expect any non-determinism within that part of the code.

If that belief is correct, then the flag must have been lost at some point after that. But every layer after that is part of Windows! Yes, including IE, which (as you say) is a multi-threaded GUI program designed for human interaction. (Which is, I suspect, the most likely location for the actual bug.) According to Microsoft, that is part of Windows. So I've seen strong evidence that the problem arises in a piece of software that is (according to Microsoft) part of Windows. How then am I wrong to blame Windows for the problem?
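For readers who have not seen this kind of automation, the four calls described above might look roughly like the sketch below. It is a reconstruction under stated assumptions, not the original script: the helper name `print_page_silently`, the one-second polling wait and the command-line URL are all hypothetical, and it presumes a default printer that writes PostScript. The numeric values are the standard OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER and READYSTATE_COMPLETE constants used with IE's `ExecWB` method and `ReadyState` property.

```perl
use strict;
use warnings;

# Standard IE automation constants (OLECMDID / OLECMDEXECOPT / READYSTATE enums).
use constant OLECMDID_PRINT               => 6;
use constant OLECMDEXECOPT_DONTPROMPTUSER => 2;
use constant READYSTATE_COMPLETE          => 4;

# Hypothetical helper: load $url in IE and print it without a dialog.
# Windows-only; Win32::OLE is loaded at run time so the file compiles anywhere.
sub print_page_silently {
    my ($url) = @_;
    require Win32::OLE;

    # Call 1: bring up IE.
    my $ie = Win32::OLE->new('InternetExplorer.Application')
        or die "Cannot start IE: " . Win32::OLE->LastError();

    # Call 2: navigate, then poll until the document has finished loading.
    $ie->Navigate($url);
    sleep 1 while $ie->{Busy} || $ie->{ReadyState} != READYSTATE_COMPLETE;

    # Call 3: print, passing the flag that asks IE not to prompt the user.
    # This is the flag that was reportedly ignored about 1 time in 1000.
    $ie->ExecWB(OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER);

    # Call 4: close IE. The print job may still be spooling at this point,
    # so a real script would probably need to wait first (assumption).
    $ie->Quit();
    return 1;
}

# Only attempt the automation when run on Windows with a URL argument.
print_page_silently($ARGV[0]) if @ARGV && $^O eq 'MSWin32';
```

Everything up to the `ExecWB` call involves the network; the flag itself is handed over in a single local call, which is the step the argument above turns on.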
Hardware differences would be a reasonable explanation of that specific problem. It would not of other problems I've heard of though.

Of course library version conflicts happen elsewhere. However most of us, including me, have had more grief from it on Windows than on other operating systems. Between other operating systems it varies widely. I have personally experienced and heard of more problems with it on Red Hat than Debian. Gnome has inflicted more of it on the Linux world than, say, Perl. However Windows has a history of being particularly bad in this regard.

You got this one backwards. I was saying that uncertainty over whether your machine has been hacked or infected results in perceived non-determinism. And not that non-determinism causes security problems.

Yes, security can be a problem with any system. However it seems to be more of a problem for Windows. Take, for instance, the case you bring up of Gary McKinnon. I can't find any records of what mix of machines he hacked into. But reading through things like this interview it looks like what he did was have a Perl script search for Windows machines that had the default blank password for Administrator, and then he logged in. So it seems quite possible that none of the machines he accessed was any version of *nix. And if any were, then certainly the majority of them were not.

Now on to the default install. You are putting far too much fault on the manufacturers. While it is true that Dell actually configures the copies of Windows on their machines, they are forced by contract* to configure it within very tight guidelines provided by Microsoft. Microsoft took control of that after they got upset about pre-installs of Netscape on Windows 95 machines, and to the best of my knowledge have never given it up. Therefore no matter how competent they are, Microsoft makes them screw up.

Let me double-check that. (Searches.) Hmm. http://msdn.microsoft.com/en-us/isv/bb190477.aspx includes the sentence, "OEMs cannot change any default settings in Windows, except as specifically described in Microsoft documentation or other contractual agreements." That doesn't exactly give the OEMs a lot of wriggle-room, does it?

OK, that is specific to XP and to Windows Server 2003. But I think that it would take something big to make Microsoft change that policy. And I haven't heard of anything that could qualify. So unless you can come up with something more recent, it looks to me like all of the problems with the default install of Windows are entirely Microsoft's fault. No matter how much the OEMs might wish things were different, they are prevented by contract with Microsoft from doing anything more serious for security than installing a third party anti-virus product.
Re^10: Time::HiRes sleep does not always work by BrowserUk (Patriarch) on Aug 21, 2008 at 11:37 UTC
Having made one long and painful foray into the world of driving IE from Perl via OLE (never again), and from the sounds of things in almost exactly the same time frame as you, I'll tell you that your problem was almost certainly caused by unprocessed messages in IE's message queue. If you could revisit that code now, and insert a few strategic `Win32::OLE->SpinMessageLoop` calls, you'd probably be able to cure that 1 in 1000 event.

"However Windows has a history of being particularly bad in this regard."

Yep. It was really bad in Win95/98/ME running third-party VB apps. That was quite a while ago...

"I was saying that uncertainty over whether your machine has been hacked or infected results in perceived non-determinism."

That's like perceiving Fords as having bad fuel economy when you don't bother to keep your tyres correctly inflated.

Read on a little further: "OEMs may install third-party applications that launch in place of Internet Explorer, Outlook Express, Windows Media Player, Windows Messenger, or Windows Firewall as enabled by Windows registry settings and other documented mechanisms."

So, the OEMs are not "prevented by contract with Microsoft from doing anything more serious for security than installing a third party anti-virus product."! If every OEM installed (say) the free version of ZoneAlarm and configured it, there would be no botnets. I don't know for sure that Zone Labs would allow them to do that for free, but it seems like a reasonable bet that the opportunity for product placement and potential upsell would be pretty persuasive. That one step alone wouldn't make users experts or careful, but it would at least make them aware.

And then there is the whole thing of "or as permitted in the license agreement." I don't know what is in those agreements, as they are not published, but if you've ever used a newly purchased Sony VAIO laptop and seen the amount of customisation, and mostly useless preinstalled crap, on their systems, you'd think they could find the time to install a decent firewall, browser and email program, rather than a bunch of root kits.

I'm no apologist for MS and roundly criticise them for their failings, but laying the blame for problems caused by badly installed/corrupted Perl modules at their door doesn't help anyone.

Yes. Windows systems dominate the bulk of cracked, hacked and rooted systems, but then they dominate the bulk of all installed systems, by a huge margin. And if even the US military fail to secure them properly, how do you expect the average non-geek to do so?

And as other systems and software become more prevalent, so they are being targeted. Witness the recent spate of Apple hacks; pdf and Flash hacks; Firefox hacks. The list goes on.
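The `SpinMessageLoop` suggestion at the top of this reply could be tried along the following lines. This is a minimal sketch, untested against the original failure: `pump_until`, its timeout and its polling interval are hypothetical names and values, and whether pumping messages actually cures the stray dialog is a conjecture rather than an established fix. The helper is written against plain code references so it can be exercised without Windows; the commented usage shows where `Win32::OLE->SpinMessageLoop` would plug in.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: call $pump (e.g. a message-loop spin) repeatedly until
# $done->() returns true or $timeout seconds have passed.
# Returns 1 if the condition was met, 0 on timeout.
sub pump_until {
    my ($done, $pump, $timeout, $interval) = @_;
    $interval = 0.05 unless defined $interval;
    my $deadline = time() + $timeout;
    while (1) {
        $pump->();                      # let pending messages be dispatched
        return 1 if $done->();
        return 0 if time() >= $deadline;
        sleep($interval);               # fractional sleep via Time::HiRes
    }
}

# Possible use on Windows, with $ie an InternetExplorer.Application object:
#   require Win32::OLE;
#   my $spin = sub { Win32::OLE->SpinMessageLoop };
#   $ie->Navigate($url);
#   pump_until(sub { !$ie->{Busy} && $ie->{ReadyState} == 4 }, $spin, 30)
#       or die "page load timed out";
#   pump_until(sub { 1 }, $spin, 1);    # spin once more just before printing
#   $ie->ExecWB(6, 2);                  # OLECMDID_PRINT, DONTPROMPTUSER
```

Passing the pump as a code reference keeps the waiting logic separate from the OLE calls, so the timeout behaviour can be checked on any platform.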
Re^11: Time::HiRes sleep does not always work by zentara (Cardinal) on Aug 21, 2008 at 12:10 UTC
Re^12: Time::HiRes sleep does not always work by tilly (Archbishop) on Aug 21, 2008 at 21:40 UTC
Re^12: Time::HiRes sleep does not always work by BrowserUk (Patriarch) on Aug 21, 2008 at 12:49 UTC