in reply to Re: Re: Sorting a large file
in thread Sorting a large file
Secondly for temporary files, use File::Temp instead of hand-rolling.
And thirdly the fact that you defined your store function inside of your sortsub function suggests some confusion on your part. You cannot get nested functions that way.
As a matter of personal taste I would drop the prototype, use strict, etc, etc, etc. But the three I gave above are the biggies.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re (tilly) 3: Sorting a large file
by clintp (Curate) on Feb 22, 2001 at 02:58 UTC | |
As I recommended in Re (tilly) 1: Sorting a large file, I would tryFile::Sort before the above code.No problem, except this was a demonstration of merge sorting, nothing more, and identified as a quick hack. Merge sorting is not rocket science and a valid solution to the problem at hand. Secondly for temporary files, use File::Temp instead of hand-rolling.For eons, tempfiles have been created with tmpnam() from POSIX. Nothing wrong with that. In fact, I suggested exactly that. Yes, File::Temp might be your solution, but not everyone's. TIMTOWDI
And thirdly the fact that you defined your store function inside of your sortsub function suggests some confusion on your part. You cannot get nested functions that way.No, no confusion. Did you read the code? You'll note that's NOT a nested subroutine declaration. The sort subroutine is all on one line. The bare block is to provide a lexical scope. Read it again.
As a matter of personal taste I would drop the prototype, use strict, etc, etc, etc. But the three I gave above are the biggies.For something written in 20 minutes, that works, demonstrates the topic at hand, has (or suggests) idiomatic perl coding... I didn't think it was too bad. PS: The prototype is NECESSARY. Comment was there for your benefit. Read it and you'll discover why. | [reply] |
by tilly (Archbishop) on Feb 22, 2001 at 05:18 UTC | |
First of all merge sorting is not rocket science. However it does take some care to get right. For instance you have a rather naive scalability. You did not clean up your temporaries. Your code is not portable. Secondly there are good reasons that I suggest using File::Temp rather than getting a name and then using that from the POSIX module. The first reason is that there is no way to guarantee no gap between when a temporary name is chosen and when you open that temporary file using the POSIX interface. By contrast with File::Temp if there is any way to do it on your OS, it will. The second reason is that File::Temp provides automatic and secure cleanup (to whatever extent is possible) of temporary files. Both are very good reasons to use File::Temp rather than POSIX. As for the third point, we were both confused by your code. I had not looked closely enough for the position of the braces. You had not put in the keyword my. (Which strict would have told you.) Therefore your bare block served no purpose but to obfuscate. As for the matter of personal taste. It takes rather less than 20 minutes to go look at Sort, and find out that there is a wheel that looks like it has been maintained and has had a lot of input for this exact problem. You can, then install and write to that to solve the problem in a cleaner, more efficient, and portable fashion. That is what I call idiomatic coding! As for the prototype, I see that you used it, but I have to wonder at the gratuitous lack of backwards compatibility. Note that 5.6.0 is not a very good release - why write code that forces people onto it? Anyways if you want to reinvent the wheel, well two can play at that game. Here is the wheel reinvented more scalably, more flexibly, using fewer filehandles, using strict, doing its own cleanup... PS: What do I mean by more scalable? Well if you compile in large file support then this can, with a fixed amount of memory, sort terabytes of logfiles with only a few dozen passes. PPS: With a large number of files your repeated sorts are going to be slow. I don't know whether your 2 pass solution gains more on IO than it loses on the inefficiency of the comparison. I would suspect that it doesn't, but I don't feel like benching that. | [reply] [d/l] |
by chipmunk (Parson) on Feb 22, 2001 at 10:25 UTC | |
| [reply] |