in reply to Execution order of END/CHECK vs BEGIN/INIT

There is indeed a reason for this.

Suppose that you have code with an END block. Suppose that your code uses a module which also has an END block that cleans up the module's external dependencies (e.g. database connections) so that the module is then unusable.

In which order do we want to run these END blocks? Well, we want your user code to run first, because you might save something to the database, and then we want the module to unload second. Where do we think that these things will probably be placed? Well, the module use will likely be at the top of your code, and your END block at the bottom of your code. So the END block that we want to execute first is likely to be the one that we saw second. Which means that LIFO is most likely to be the heuristic that gets it right.

Referring to Programming Perl, 2nd Ed., on page 283 they say: "You may have multiple END blocks in a file -- they will execute in reverse order of definition; that is last in, first out (LIFO). That is so that related BEGINs and ENDs will nest the way you'd expect, if you pair them up." Which is another variation of what I described. You can pair the initialization and cleanup code either in the file or in modules, and later code (which might expect to have the earlier initialized stuff there) will be entirely nested between the two.
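
The nesting can be seen in a small sketch. It runs a throwaway script in a child interpreter (via `$^X`) so that the order of the output, including what the END blocks print, can be captured. The helper `run_child` and the block labels are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run a Perl snippet in a child interpreter and
# return everything it prints, including output from its END blocks.
sub run_child {
    my ($code) = @_;
    open(my $fh, '-|', $^X, '-e', $code) or die "cannot start perl: $!";
    local $/;                            # slurp mode
    my $out = <$fh>;
    close $fh;
    return defined $out ? $out : '';
}

my $script = <<'PERL';
BEGIN { print "init A\n" }       # e.g. a module opening its connection
END   { print "cleanup A\n" }    # ...and closing it again
BEGIN { print "init B\n" }       # user code that relies on A
END   { print "cleanup B\n" }    # runs first, while A is still usable
print "main program\n";
PERL

print run_child($script);
```

The BEGIN blocks run in the order they are seen and the END blocks in the reverse order, so B's setup and cleanup sit entirely inside A's.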

Sure it seems odd. But it is what is most likely to do The Right Thing. (Which Perl always tries to do.)

UPDATE: I had said that the Camel's case was both a special case, and a generalization of what I described. That didn't make much sense - it is a variation.

Re: Re: Execution order of END/CHECK vs BEGIN/INIT
by BrowserUk (Patriarch) on Jun 28, 2003 at 07:28 UTC

    Like the man said, there had to be a reason:) I would dispute that it is a good one though. Apart from the lack of intuitivity (is that a word?), there are just so many ways to break this.

    The heuristic can be summed up as:

    • I want this piece of code in my main program to be the very last thing executed, so where do I put it?
    • Well, before anything that you want to execute before it.
    • So, I put it at the top of the program?
    • Well, no. If you do that then it will be executed after the END blocks in any packages you use which might not be what you want, so put it at the top of the program, except after any use statements for packages that might have END blocks that need to execute after your END block.
    • But how do I know if the modules I use need to execute their END blocks after I execute mine?
    • Read the source Luke. And hope that the authors thought to document the need.
    • Come to think of it, why might they need to do that?
    • Well, the module might have class data shared by all its instances that needs to be cleaned up. This cannot be done as a part of any individual instance's DESTROY method, so it has to be done in an END block.
    • Ah! But that gives me a problem. My program is converting some irreplaceable flat-file legacy data to DB format. I want to ensure that the file gets deleted after it has been successfully input, and that is what I am going to do in my END block, but I need to ensure that the data has been successfully flushed to the DB first. If there is any chance that the connection will fail and the data I stored is lost or corrupted, then I don't want to delete the file. How do I handle this?
    • You've got backups of the files haven't you:)

    A bit contrived, but it still seems more than slightly weird to me. Sort of makes me hanker for the simplicity of old-time BASIC's line-numbered code:)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      Have to disagree. Always found END block behaviour completely intuitive. Indeed, it never occurred to me that anybody would want it to work in any other way.

      I don't often have multiple END blocks, but when I do I want them to run LIFO. Either because later blocks need the context that earlier blocks clean up, or because early blocks need some state that is only known to later blocks.

      Yes, you can come up with situations where they don't do what you want - but I would argue that would be because you're trying to use them for the wrong sort of thing ;-)

      It is a heuristic, which is a fancy way of saying that it doesn't really work. If you want to be a real control freak, it is possible to manufacture cases in which you want operations to go in any order you could possibly ask for, and there is no way for Perl to meet every theoretically possible case.

      In fact if you want some real room for foot shootage, just look at exec and exit. If someone happens to include those in the code, your END blocks don't get to run at all...

      The heuristic is that your BEGIN blocks are for initializations that you want to happen early, and END blocks are for final cleanup. Furthermore anything that appears after you might need your functionality, therefore your initializations need to happen before them, and your cleanup has to occur after them. Therefore BEGINs are FIFO and ENDs are LIFO.

      Now to your bullets.

      "I want this piece of code in my main program to be the very last thing executed, so where do I put it?"
      What is your reason for wanting it to be last? Control of program flow? END blocks are not meant to be part of normal program flow. If you try to use them for what they aren't meant to do, it is no surprise that you can cause yourself confusion and pain. If not control of program flow, then what? Well probably cleanup. In which case see above. You put it after any initializations that you want available in your END block, either in your code or in modules that you load.

      "Well, before anything that you want to execute before it."
      Regular code that appears after it will execute before it (if it executes at all). You only need to worry about its placement vs other END blocks. And there it mostly does the right thing.

      "So, I put it at the top of the program?"
      You place it at the point in the program where it is obvious that it will need to be run eventually. Which is generally directly after any initializations that it needs to clean up, and we like this because putting related code together makes synchronization errors less likely.

      "Well, no. If you do that then it will be executed after the END blocks in any packages you use which might not be what you want, so put it at the top of the program, except after any use statements for packages that might have END blocks that need to execute after your END block."
      Did you want to insist on it executing last, or merely to clean yourself up? END blocks have been thought through as a way to clean yourself up. I have yet to see a complaint about them in practice. (Now if you want to complain, go take a look at the calling of DESTROY in global destruction. Every so often I need to explain to people why that messed up, and tell them how to fix it by doing their cleanup in an END block...)

      "But how do I know if the modules I use need to execute their END blocks after I execute mine?"
      You should not need to know whether they have END blocks or not. Place your END after your initialization, and your initialization after loading any desired functionality, and your END blocks normally will have whatever functionality is reasonable. However there is an interesting case here when the functionality that you need might be AUTOLOADed at runtime, and the AUTOLOAD might add an END block of its own. In this case you will want to be very careful to make sure that the functionality that you need is exercised before Perl sees your END block. Which means that you either make sure the initialization is in a BEGIN block, or put the initialization in regular code and wrap your END in an eval. But note that I have yet to see someone ask a question indicating that they tripped up on this, and even in this pathological case the principle of putting the END block right after all initializations have happened is precisely what you want to do.

      "Read the source Luke. And hope that the authors thought to document the need."
      If the authors used END blocks as intended, correct usage will be obvious. (Just put your END right after your initializations and don't worry about it...) If the authors chose to miscode their module and put an Easter egg in the END without warning, well this is but the smallest of ways in which bad code can cause problems for the person using it.

      "Come to think of it, why might they need to do that?"
      Normally because they have some state that they want to be properly cleaned up?

      "Well, the module might have class data shared by all its instances that needs to be cleaned up. This cannot be done as a part of any individual instance's DESTROY method, so it has to be done in an END block."
      Gotcha alert. Your instances might want to use that data in their own DESTROY methods. But they might be in global variables somewhere that are not cleaned up until global destruction, which happens after END blocks. If this is a problem (I have seen it be occasionally) then the module will want to also manage all of its instances and finalize them during the END phase. (If you want access to virtually any other data, including your own internal variables, then you want to do this. Ilya does have a patch which is in 5.8 IIRC which has a heuristic that mostly gets global destruction right, but it isn't perfect.)

      "Ah! But that gives me a problem. My program is converting some irreplaceable flat-file legacy data to DB format. I want to ensure that the file gets deleted after it has been successfully input, and that is what I am going to do in my END block, but I need to ensure that the data has been successfully flushed to the DB first. If there is any chance that the connection will fail and the data I stored is lost or corrupted, then I don't want to delete the file. How do I handle this?"
      You have irreplaceable data which you are going to allow to be automatically deleted by possibly buggy code in the middle of execution? That would seem to be your biggest problem right there... But we shall suppose that the coder has good reasons for wanting to do this (umm.. you are out of space and management refuses to buy backup media, OK, attempting to live with a PHB, I sympathize). How do you accomplish the act? Well, in that unfortunate case I would decide on how I am going to track success/failure, and then in my END block, wrap my unlink in an if ($is_success) {...} block.
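
A minimal sketch of that guard, assuming the program tracks success in a flag. The names `$source_file`, `$is_success`, and `remove_source_if_safe` are invented for illustration; how success is actually verified is left to the program.

```perl
use strict;
use warnings;

# Hypothetical state: the legacy file being converted, and a flag that the
# main program sets only after it has verified the data reached the DB.
my $source_file;
my $is_success = 0;

# Delete the file only when the load is known to have succeeded.
# Returns 1 if the file was removed, 0 otherwise.
sub remove_source_if_safe {
    my ($file, $ok) = @_;
    return 0 unless $ok && defined $file && -e $file;
    return unlink($file) ? 1 : 0;
}

# Runs at program end; it does nothing unless the flag was set.
END { remove_source_if_safe($source_file, $is_success) }
```

Because the flag starts out false, any early failure leaves the file in place; the deletion only happens if the main program explicitly recorded success.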

      "You've got backups of the files haven't you:)"
      Before doing anything automatic and possibly nasty with data, I insist on having backups. I know I am human. I have messed up often enough to not trust myself, and I definitely know better than to trust someone else who has not yet learned to take proper precautions.

      In summary: it is a heuristic. It can theoretically go wrong. But I have yet to see the order of execution of END blocks fail to do what is desired in real code if the END block is placed directly after the initialization that it cleans up. Unlike, say, global destruction. Or even the ability of people to unexpectedly eliminate the END phase with an exit or exec. (If you use END blocks, make sure to plead with the sometime C coders to not call exit...)

        In fact if you want some real room for foot shootage, just look at exec and exit. If someone happens to include those in the code, your END blocks don't get to run at all...

        Um, exit doesn't prevent END blocks from being run. Not even die does that. Just exec and uncaught fatal signals (as noted in perlmod).
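
This is easy to check. The sketch below runs three throwaway one-liners in child interpreters and records whether each child's END block printed anything. `run_child` is a helper made up for this example, and the child closes STDERR before dying only to keep the die message off the terminal.

```perl
use strict;
use warnings;

# Hypothetical helper: run a snippet in a child perl and capture its STDOUT.
sub run_child {
    my ($code) = @_;
    open(my $fh, '-|', $^X, '-e', $code) or die "cannot start perl: $!";
    local $/;                       # slurp mode
    my $out = <$fh>;
    close $fh;                      # the die case exits non-zero; that is expected
    return defined $out ? $out : '';
}

# Every child starts with the same END block, then leaves in a different way.
my $end = 'END { print "END ran\n" } ';

my %result = (
    exit => run_child($end . 'exit 0;'),
    die  => run_child($end . 'close STDERR; die "fatal\n";'),
    exec => run_child($end . 'exec $^X, "-e", "1";'),
);

for my $how (sort keys %result) {
    printf "%-4s: %s\n", $how,
        $result{$how} ? 'END block ran' : 'END block skipped';
}
```

The report should show the END block running for exit and die and being skipped for exec, which matches the description in perlmod.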

        Also, BrowserUk, note that BEGIN/END are referred to as "package constructors and destructors", which supports the LIFO order for "destructors".

                        - tye

        I guess we aren't likely to reach an agreement on this. If a restaurant put my dessert on the table first but told me not to eat it until last, I would probably feel much the same as I do about this.

        I appreciate, and implied as much, that my contrived example doesn't hold up to scrutiny, but the point I was trying to make does. If it is legitimate for a module to require an END block to achieve its purpose, then it is also legitimate for my program to have a similar requirement. Your own example probably makes my case better than I can improve upon.

        I want to write a program that accesses a DB through a module that uses a class-based DB connection, and an END block to free that connection when the program is cleaning up. Once I have performed my processing, I need to exec a follow-on script. To ensure that all the resources my script uses, directly or indirectly, are properly cleaned up, I place the exec in an END block so that everything gets the chance to clean itself up before I do so.

        Except that unless I know that one of the modules I am using has an END block that needs to be called before I exec, and take the unusual step of placing my END block at the top of my script before use-ing that module, my END block will be called before the module's, the exec occurs, and the module never gets its chance to clean up.

        And therein lies the fudge. It may be a reasonable compromise given the architecture, but it is hardly the "right" solution IMO. A correct design would be for the module to have a class method that my program could call to cause the class to explicitly free off all its resources. That way, I as the main script author can choose when things should be freed, and not have to rely upon explicit knowledge of the calling order of a chain of global events to ensure that my program runs correctly. Of course, if every class/module had such an 'Okay, I'm finished with you so do anything you need to do to clean yourself up' call, then I wouldn't need to use an END block in my code to perform the exec. I could just arrange for it to be the last thing that happened in the normal flow, having called the appropriate "I'm done with you" routines, and that would be that.
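
Such a design might look like the following sketch: a module that exposes an explicit class method for releasing its shared resource, with the END block kept only as a safety net. `My::DB`, `new`, `release`, and `log_entries` are invented names, and the "connection" is just a hash standing in for a real handle.

```perl
use strict;
use warnings;

package My::DB;   # hypothetical module, not a real CPAN distribution

my $shared;       # class-wide connection shared by all instances
my @log;          # records what happened, for demonstration only

sub new {
    my ($class) = @_;
    $shared ||= { open => 1 };        # stand-in for a real DB handle
    return bless { conn => $shared }, $class;
}

# Explicit "I'm finished with you" call. Safe to call more than once.
sub release {
    my ($class) = @_;
    return 0 unless $shared;
    $shared->{open} = 0;
    undef $shared;
    push @log, 'released';
    return 1;
}

sub log_entries { return @log }

# Safety net only: does nothing if the caller already released.
END { My::DB->release() }

package main;

my $db = My::DB->new;
# ... normal processing ...
My::DB->release();   # the caller decides when cleanup happens
# an exec of a follow-on script could now go here, in normal program flow
```

With the release made explicit and idempotent, the relative order of END blocks no longer matters to the caller.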

        I can see no logic in the idea that a 'module' should be any more (or less) likely to need to have its END block called later than my END block. What if I am using two modules that both need their END blocks to be called last? Then whichever order I choose to use them in is going to be wrong. The fact that I have to know that I have to use whichever module most needs to have its END block called last, FIRST, is not just counter-intuitive, it's just plain wrong.

        There was a recent thread about what order people favour for their use statements. The answers ranged from alphabetical, to pragmas followed by utilities followed by classes, except the vars pragma which always came last. So far, I haven't hit upon any good reasons to favour any one ordering over any other. They generally get added in what is basically chronological order as I find the need for them. The idea that I should have to scour the source code for each module looking for END blocks, and then try to decide what order they should be called in, and then reverse that to determine the order of my uses, beggars belief.

        I strongly suspect that END blocks came into being as a poor man's 'last gasp', fatal exception clean up mechanism and the real answer is that there should be no inter-module ordering dependencies between them. Were this the case, there would be no need to know what order they execute, which would be a wholly good thing. It would also remove the need to reverse the logical ordering of multiple END blocks within any given source file.

        The need for this backward ordering arises solely because the one-time exception mechanism has been subverted. People have decided that rather than providing an explicit call to perform cleanup and a fatal exception cleanup, they don't need to provide the former and require the caller to use it, because the latter will get called anyway...except that sometimes it doesn't.

        I wonder if there is any history of when this ordering decision was made and why. I also wonder if it will persist into P6. Personally I strongly hope that it doesn't, but we will probably have to wait for "Apocalypse N of N" for the answer to that, given that the last Apocalypse would seem to be the logical place to consider END blocks.

        But then again, maybe they should have been in Apo 1:)




Re: Re: Execution order of END/CHECK vs BEGIN/INIT
by belden (Friar) on Jun 28, 2003 at 07:05 UTC

    <blink><blink>

    Wow. Perfect explanation. Thanks. ++tilly

    blyman
    setenv EXINIT 'set noai ts=2'