Re: Re: Execution order of END/CHECK vs BEGIN/INIT

Re^2: Re: Execution order of END/CHECK vs BEGIN/INIT by adrianh (Chancellor) on Jun 28, 2003 at 08:56 UTC
Have to disagree. Always found END block behaviour completely intuitive. Indeed, it never occurred to me that anybody would want it to work in any other way. I don't often have multiple END blocks, but when I do I want them to run LIFO. Either because later blocks need the context that earlier blocks clean up, or because early blocks need some state that is only known to later blocks. Yes, you can come up with situations where they don't do what you want - but I would argue that would be because you're trying to use them for the wrong sort of thing ;-)
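A minimal sketch of the ordering being discussed, assuming a platform where list-form pipe `open` works. The helper `run_snippet` is hypothetical (it is not from the thread); it runs a snippet in a child perl so that the child's END blocks fire and their output can be captured.

```perl
use strict;
use warnings;

# Run a snippet in a child perl and return everything it prints to STDOUT.
sub run_snippet {
    my ($code) = @_;
    open my $fh, '-|', $^X, '-e', $code or die "cannot start perl: $!";
    local $/;                 # slurp mode
    my $out = <$fh>;
    close $fh;
    return $out;
}

my $order = run_snippet(<<'PERL');
BEGIN { print "BEGIN 1\n" }
END   { print "END 1\n" }
BEGIN { print "BEGIN 2\n" }
END   { print "END 2\n" }
print "main\n";
PERL

print $order;
```

BEGIN blocks run in the order they are compiled (FIFO), while END blocks run in reverse order of definition (LIFO), so the child prints `BEGIN 1`, `BEGIN 2`, `main`, `END 2`, `END 1`.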
Re: Re: Re: Execution order of END/CHECK vs BEGIN/INIT by tilly (Archbishop) on Jun 28, 2003 at 16:14 UTC
It is a heuristic. Which is a fancy way of saying that it doesn't really work. If you want to be a real control freak, it is possible to manufacture cases in which you want operations to go in any order you could possibly ask for, and there is no way for Perl to meet every theoretically possible case. In fact if you want some real room for foot shootage, just look at exec and exit. If someone happens to include those in the code, your END blocks don't get to run at all...

The heuristic is that your BEGIN blocks are for initializations that you want to happen early, and END blocks are for final cleanup. Furthermore anything that appears after you might need your functionality, therefore your initializations need to happen before theirs, and your cleanup has to occur after theirs. Therefore BEGINs are FIFO and ENDs are LIFO.

Now to your bullets.

"I want this piece of code in my main program to be the very last thing executed, so where do I put it?"

What is your reason for wanting it to be last? Control of program flow? END blocks are not meant to be part of normal program flow. If you try to use them for what they aren't meant to do, it is no surprise that you can cause yourself confusion and pain. If not control of program flow, then what? Well, probably cleanup. In which case see above. You put it after any initializations that you want available in your END block, either in your code or in modules that you load.

"Well, before anything that you want to execute before it."

Regular code that appears after it will execute before it (if it executes at all). You only need to worry about its placement vs other END blocks. And there it mostly does the right thing.

"So, I put it at the top of the program?"

You place it at the point in the program where it is obvious that it will need to be run eventually.
Which is generally directly after any initializations that it needs to clean up, and we like this because putting related code together makes synchronization errors less likely.

"Well, no. If you do that then it will be executed after the END blocks in any packages you use which might not be what you want, so put it at the top of the program, except after any use statements for packages that might have END blocks that need to execute after your END block."

Did you want to insist on it executing last, or merely to clean yourself up? END blocks have been thought through as a way to clean yourself up. I have yet to see a complaint about them in practice. (Now if you want to complain, go take a look at the calling of DESTROY in global destruction. Every so often I need to explain to people why that messed up and tell them how to fix it by doing their cleanup in an END block...)

"But how do I know if the modules I use need to execute their END blocks after I execute mine?"

You should not need to know whether they have END blocks or not. Place your END after your initialization, and your initialization after loading any desired functionality, and your END blocks normally will have whatever functionality is reasonable.

However there is an interesting case here when the functionality that you need might be AUTOLOADed at runtime, and the AUTOLOAD might add an END block of its own. In this case you will want to be very careful to make sure that the functionality that you need is exercised before Perl sees your END block. Which means that you either make sure the initialization is in a BEGIN block, or put the initialization in regular code and wrap your END in an eval. But note that I have yet to see someone ask a question indicating that they tripped up on this, and even in this pathological case the principle of putting the END block right after all initializations have happened is precisely what you want to do.

"Read the source Luke."
"And hope that the authors thought to document the need."

If the authors used END blocks as intended, correct usage will be obvious. (Just put your END right after your initializations and don't worry about it...) If the authors chose to miscode their module and put an Easter egg in the END without warning, well, this is but the smallest of ways in which bad code can cause problems for the person using it.

"Come to think of it, why might they need to do that?"

Normally because they have some state that they want to be properly cleaned up?

"Well, the module might have class data shared by all its instances that needs to be cleaned up. This cannot be done as a part of any individual instance's DESTROY method, so it has to be done in an END block."

Gotcha alert. Your instances might want to use that data in their own DESTROY methods. But they might be in global variables somewhere that is not cleaned up until global destruction, which happens after END blocks. If this is a problem (I have seen it be occasionally) then the module will want to also manage all of its instances and finalize them during the END phase. (If you want access to virtually any other data, including your own internal variables, then you want to do this. Ilya does have a patch which is in 5.8 IIRC which has a heuristic that mostly gets global destruction right, but it isn't perfect.)

"Ah! But that gives me a problem. My program is converting some irreplaceable flat-file legacy data to DB format. I want to ensure that the file gets deleted after it has been successfully input, and that is what I am going to do in my END block, but I need to ensure that the data has been successfully flushed to the DB first. If there is any chance that the connection will fail and the data I stored is lost or corrupted, then I don't want to delete the file. How do I handle this?"

You have irreplaceable data which you are going to allow to be automatically deleted by possibly buggy code in the middle of execution?
That would seem to be your biggest problem right there... But we shall suppose that the coder has good reasons for wanting to do this (umm... you are out of space and management refuses to buy backup media; OK, attempting to live with a PHB, I sympathize). How do you accomplish the act? Well, in that unfortunate case I would decide on how I am going to track success/failure, and then in my END block, wrap my unlink in an `if ($is_success) {...}` block.

"You've got backups of the files haven't you:)"

Before doing anything automatic and possibly nasty with data, I insist on having backups. I know I am human. I have messed up often enough to not trust myself, and I definitely know better than to trust someone else who has not yet learned to take proper precautions.

In summary. It is a heuristic. It can theoretically go wrong. But I have yet to see the order of execution of END blocks fail to do what is desired in real code if the END block is placed directly after the initialization that it cleans up. Unlike, say, global destruction. Or even the ability of people to unexpectedly eliminate the END phase with an exit or exec. (If you use END blocks, make sure to plead with the sometime C coders to not call exit...)
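A minimal sketch of the `if ($is_success) {...}` guard described above. Everything except `$is_success` is an assumption made for illustration: the temp file stands in for the legacy flat file, and `remove_if_loaded` and `load_into_db` are hypothetical names (a real loader would insert, commit, and verify against the database).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the legacy flat file.
my ($fh, $legacy_file) = tempfile();
print {$fh} "legacy record\n";
close $fh or die "close failed: $!";

# Set to true only once the load is known to have succeeded.
my $is_success = 0;

# Delete the file only when the caller vouches for the import.
sub remove_if_loaded {
    my ($file, $ok) = @_;
    return 0 unless $ok;
    return unlink($file) ? 1 : 0;
}

# Placed directly after the initialization it cleans up.
END { remove_if_loaded($legacy_file, $is_success) if defined $legacy_file }

# Hypothetical loader: here it only checks that the file is non-empty.
sub load_into_db {
    my ($file) = @_;
    return -s $file ? 1 : 0;
}

$is_success = load_into_db($legacy_file);
```

Because the flag starts false and is only set after the load reports success, a `die` anywhere before the final assignment leaves the file in place when the END block runs.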
Re^4: Execution order of END/CHECK vs BEGIN/INIT (exit??) by tye (Sage) on Jun 30, 2003 at 14:36 UTC
"In fact if you want some real room for foot shootage, just look at exec and exit. If someone happens to include those in the code, your END blocks don't get to run at all..."

Um, exit doesn't prevent END blocks from being run. Not even die does that. Just exec and uncaught fatal signals (as noted in perlmod).

Also, BrowserUk, note that BEGIN/END are referred to as "package constructors and destructors", which supports the LIFO order for "destructors".

- tye
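A small sketch that checks the exit and exec cases, assuming list-form pipe `open` is available. `run_snippet` is a hypothetical helper (not from the thread) that runs code in a child perl and captures its STDOUT.

```perl
use strict;
use warnings;

# Run a snippet in a child perl and return everything it prints to STDOUT.
sub run_snippet {
    my ($code) = @_;
    open my $fh, '-|', $^X, '-e', $code or die "cannot start perl: $!";
    local $/;                 # slurp mode
    my $out = <$fh>;
    close $fh;
    return $out;
}

# exit: the END block still runs.
my $with_exit = run_snippet(<<'PERL');
END { print "END ran\n" }
exit 0;
PERL

# exec: the process image is replaced, so the END block never runs.
my $with_exec = run_snippet(<<'PERL');
END { print "END ran\n" }
exec $^X, '-e', 'print "replaced\n"';
PERL

print "exit: $with_exit";
print "exec: $with_exec";
```

The first child prints `END ran`; the second prints only `replaced`, because exec hands the process over to the new program before any END block gets a chance.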
Re: Re^4: Execution order of END/CHECK vs BEGIN/INIT (exit??) by tilly (Archbishop) on Jun 30, 2003 at 16:21 UTC
Oops, you're exactly right. I knew about exec, and had a mental association with exit as well, but should have checked the documentation. My unfair maligning of exit is almost certainly due to the fact that I try to avoid it because of the difficulty it causes when you go from stand-alone scripts to scripts embedded in a persistent interpreter.
Re: Re^4: Execution order of END/CHECK vs BEGIN/INIT (exit??) by BrowserUk (Patriarch) on Jun 30, 2003 at 15:36 UTC
Perhaps Tilly was thinking of POSIX::_exit which was mooted (by me, though previously by another) but as a possible workaround to a different problem. On the package constructors/destructors point, I realise the nature of the beast, and if you accept that such implicit destructors are a 'Good Thing', then I agree that there is some merit in their being called in LIFO order between the packages. Clumsily stated, but I mean that the END blocks of the first encountered package (generally main) should be called last. Those of the second package encountered--generally the first use statement within main--should be called second from last. Those of the third package encountered--perhaps the first use statement within the first package called from main--being called third from last. And so on through the package hierarchy. But I still find it counter-intuitive that the END blocks within any given package will be called in the reverse of their physical ordering in the code, and even worse, that I would have to place the END block(s) in my code, be it main or package, before any use statements in that same package, in order to ensure that my code gets its final opportunity to clean up after those subordinate packages it calls. If I were sitting down to design the END block processing from scratch, I would combine the END blocks within a package in their natural ordering, and call these combined destructors in the LIFO order in which the packages came into existence. That's probably too complex, but it seems like the 'right way' to do it to me. Generally, I think that reliance on automatic destructors is a bad thing, and that every package should offer me a call that allows me to inform them when I have finished with them. The END block for that package could then test to see if I have done the right thing, and only call the destructor if I haven't.
At the end of the day, my opposition to the current mechanism is purely academic as I have rarely ever used END blocks, and never in anger. And as Tilly pointed out, I've also never heard of anyone who has actually fallen foul of the problem I perceive, and so this is likely just (another) load of hot air on my behalf. That doesn't lessen the strength of my feelings against the status quo though, and I would still consider it to be a bug. Or at least, consider any code that relied upon intimate knowledge of the order in which they are called as a bug, because if one person can write code that relies upon this, then two can, and eventually the two pieces of code will come together, and there is no way to have both called last. Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
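A minimal sketch of the proposal above: a package that offers an explicit "I have finished with you" call, with its END block acting only as a fallback. `My::Resource`, `release`, and `@log` are hypothetical names chosen for illustration, not an existing module.

```perl
use strict;
use warnings;

{
    package My::Resource;    # hypothetical module, not from the thread

    our @log;                # records what cleanup actually did
    my $released = 0;

    # Explicit "I'm done with you" call; safe to call more than once.
    sub release {
        return 0 if $released;
        $released = 1;
        push @log, 'released';
        return 1;
    }

    # Fallback: only does work if the caller never called release().
    END { release() }
}

# The caller decides when cleanup happens.
My::Resource::release();
print "cleanup steps so far: @My::Resource::log\n";
```

With this shape the caller controls the timing, and the END block becomes a no-op when the caller has already done the right thing, so its position relative to other END blocks stops mattering in the normal case.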
Re: Re: Re^4: Execution order of END/CHECK vs BEGIN/INIT (exit??) by tilly (Archbishop) on Jun 30, 2003 at 16:34 UTC
Re: Re: Re: Re: Execution order of END/CHECK vs BEGIN/INIT by BrowserUk (Patriarch) on Jun 28, 2003 at 17:50 UTC
I guess we aren't likely to reach an agreement on this. If a restaurant put my dessert on the table first but told me not to eat it until last, I would probably feel much the same as I do about this. I appreciate, and implied as much, that my contrived example doesn't hold up to scrutiny, but the point I was trying to make does. If it is legitimate for a module to require an END block to achieve its purpose, then it is also legitimate for my program to have a similar requirement. Your own example probably makes my case better than I can improve upon. I want to write a program that accesses a DB through a module that uses a class-based DB connection, and an END block to free that connection when the program is cleaning up. Once I have performed my processing, I need to exec a follow-on script. To ensure that all the resources my script uses, directly or indirectly, are properly cleaned up, I place the exec in an END block so that everything gets the chance to clean itself up before I do so. Except that unless I know that one of the modules I am using has an END block that needs to be called before I exec, and take the unusual step of placing my END block at the top of my script before use-ing that module, then my END block will be called before the module's, the exec occurs and the module never gets its chance to clean up. And therein lies the fudge. It may be a reasonable compromise given the architecture, but it is hardly the "right" solution IMO. A correct design would be for the module to have a class method that my program could call to explicitly request that the class frees off all its resources. That way, I as the main script author can choose when things should be freed, and not have to rely upon explicit knowledge of the calling order of a chain of global events to ensure that my program runs correctly.
Of course, if every class/module had such an 'Okay, I'm finished with you so do anything you need to do to clean yourself up' call, then I wouldn't need to use an END block in my code to perform the exec, I could just arrange for it to be the last thing that happened in the normal flow having called the appropriate "I'm done with you" routines, and that would be that. I can see no logic in the idea that a 'module' should be any more (or less) likely to need to have its END block called later than my END block. What if I am using two modules that need their END blocks to be called last? Then whichever order I choose to use them is going to be wrong. The fact that I have to know that I have to use whichever module most needs to have its END block called last, FIRST, is not just counter-intuitive, it's just plain wrong. There was a recent thread about what order people favour for their use statements. The answers ranged from alphabetical, to pragmas followed by utilities followed by classes, except the vars pragma which always came last. So far, I haven't hit upon any good reasons to favour any one ordering over any other. They generally get added in what is basically chronological order as I find the need for them. The idea that I should have to scour the source code for each module looking for END blocks and then try and decide what order they should be called in and then reverse that to determine the order of my uses, beggars belief. I strongly suspect that END blocks came into being as a poor man's 'last gasp', fatal exception clean up mechanism and the real answer is that there should be no inter-module ordering dependencies between them. Were this the case, there would be no need to know what order they execute, which would be a wholly good thing. It would also remove the need to reverse the logical ordering of multiple END blocks within any given source file.
The need for this backward ordering arises solely because the one-time exception mechanism has been subverted. People have decided that rather than providing both an explicit call to perform cleanup and a fatal exception cleanup, they don't need to provide the former and require the caller to use it, because the latter will get called anyway...except that sometimes it doesn't. I wonder if there is any history of when this ordering decision was made and why. I also wonder if it will persist into P6. Personally I strongly hope that it doesn't but we will probably have to wait for "Apocalypse N of N" for the answer to that, given that the last Apocalypse would seem to be the logical place to consider END blocks. But then again, maybe they should have been in Apo 1:)
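A sketch of the exec-in-END scenario from this post, assuming list-form pipe `open` is available. `Fake::DB` is a hypothetical package standing in for a used module that frees its connection in an END block, and `run_snippet` is a hypothetical helper that runs code in a child perl.

```perl
use strict;
use warnings;

# Run a snippet in a child perl and return everything it prints to STDOUT.
sub run_snippet {
    my ($code) = @_;
    open my $fh, '-|', $^X, '-e', $code or die "cannot start perl: $!";
    local $/;                 # slurp mode
    my $out = <$fh>;
    close $fh;
    return $out;
}

# Unbuffer the child's STDOUT so nothing is lost across exec.
my $prelude  = q{$| = 1;};
# Stands in for `use Some::DB::Module` with its own END block.
my $module   = q{{ package Fake::DB; END { print "module cleanup\n" } }};
# The main program's END block, which execs a follow-on script.
my $main_end = q{END { print "main END\n"; exec $^X, '-e', 'print "follow-on\n"' }};

my $end_after_use  = run_snippet(join "\n", $prelude, $module, $main_end);
my $end_before_use = run_snippet(join "\n", $prelude, $main_end, $module);

print "END after use:\n$end_after_use";
print "END before use:\n$end_before_use";
```

With the main END block defined after the module, it runs first, execs, and `module cleanup` never appears. Defined before the module, it runs last, so the module cleans up before the exec. That is the placement dependency being objected to here.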
Re: Re: Re: Re: Re: Execution order of END/CHECK vs BEGIN/INIT by tilly (Archbishop) on Jun 28, 2003 at 19:29 UTC
Question. Have you, in any real code that you have written (as opposed to hypothetical code that you might choose to write some day) ever been bitten by this behaviour? Have you seen anyone complaining about problems which they had because of it? Me neither. And it has managed to do the right thing when I needed it in some relatively complex cases. As for your module loading question, that is a red herring. Why should it matter what order you use the modules in? If the modules have dependencies, they are responsible for pulling them in, and in so doing will guarantee that any functionality the module needs to have when it unloads will still be there. OK, so this system can get into trouble with circular dependencies (for instance Carp and Exporter have to play games because each depends on the other), but that is at load, not unload. It is not your responsibility to manage your modules' dependencies, and it turns out that you don't need to. And now because I pointed out how to create a problem (with exec) you raise that case. Well it is valid, if you wish to use exec you can quickly cause a headache. Use exec in a complex C or C++ program and you cause similar ones for the same reason. (Using it within a Perl script is arguably just using it in a complex C program...) But it seems that most people who want to use exec manage to deal. Now suppose that someone comes up with some reason why their module should have its END go last. I have never seen that module, but I can dream up cases where you would want to. Well were I writing that module, I would document that fact up front because it is an important usage note that people are going to have to work with. Now suppose that someone else came up with another that did the same. And you wanted to use both. Um, well, good luck. Odds are that if they both want control of the end slot, they do things that are pretty incompatible. 
If you want both things to happen you probably need to make two system calls (or fork and then load each separately, or something else crazy). Perl cannot anticipate every possible need all of the time. And should not worry itself overly about anticipating needs that people can dream up, but nobody has apparently ever wanted in real life. But if you really want to get more control over END blocks, odds are pretty good that you could go stare at Perl's source and find where the END blocks actually get stored for use in cleanup. You can write a module that can go and manipulate those internals to your heart's content. It would actually be useful if you used it to implement versions of exit and exec that actually call the END blocks before exiting. However I don't think that you will find that anyone is particularly unhappy with how END blocks work in practice. (You might find some unhappiness with exit and exec though. But not that much since most people have found their own ways around that possible breakage.) And finally, given that END blocks don't cause significant complaint now, and the person who was in charge then is still in charge, well guess what I predict Larry Wall is going to say in the appropriate Apocalypse?
Re: Re: Re: Re: Re: Re: Execution order of END/CHECK vs BEGIN/INIT by BrowserUk (Patriarch) on Jun 28, 2003 at 19:43 UTC