JockoHelios has asked for the wisdom of the Perl Monks concerning the following question:
My one defense is that you gave me so much to think about and work through that I put in a lot more time than usual to work the concepts into my code :) Successfully, too. I love it when that happens.
Original text: I'm running through an array via foreach, and as it runs through the array I'd like to be able to delete rows based on specific content. I've found reference material showing how to do this with a hash, but not with an array.
Changing over to a hash would add to the complexity. I need to keep the rows in the same order as they are read in from the file, and don't otherwise need hash features for this data.
Is it possible to just delete the current row from an array when looping through it without setting up a counter and splice operation? That's the only way I can see of accomplishing this, and I suspect this problem could be solved in a more concise way.
I can clear the row with a regex substitution, but this leaves a blank row in place and the dreaded uninitialized value warning rears up...
Replies are listed 'Best First'.
Re: foreach array - delete current row ?
by dave_the_m (Monsignor) on Jun 02, 2013 at 21:37 UTC
Dave.
|
Re: foreach array - delete current row ?
by davido (Cardinal) on Jun 02, 2013 at 22:20 UTC
Consider three options:
If you have the ability to throw memory at the problem, I would tend to favor the linear solution provided by grep or by foreach combined with push.
Dave
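The code for the three options did not survive in this copy of the thread. As a minimal sketch of the two linear approaches named above (grep, and foreach combined with push), assuming hypothetical sample rows and a made-up `keep_row` test that are not from the original post:

```perl
use strict;
use warnings;

# Hypothetical sample data: rows as they might be read from a file.
my @rows = ("alpha", "# comment", "beta", "", "gamma");

# Hypothetical filter: keep rows that are non-empty and not comments.
sub keep_row {
    my ($row) = @_;
    return $row ne '' && $row !~ /^#/;
}

# grep builds a new list of the rows that pass the test, in original order.
my @kept_grep = grep { keep_row($_) } @rows;

# foreach plus push gives the same result and leaves room for extra
# per-row work inside the loop.
my @kept_push;
foreach my $row (@rows) {
    push @kept_push, $row if keep_row($row);
}

print join(",", @kept_grep), "\n";   # alpha,beta,gamma
print join(",", @kept_push), "\n";   # alpha,beta,gamma
```

Both versions leave the original array untouched while the new one is built, which is where the extra memory goes.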
|
Re: foreach array - delete current row ?
by BrowserUk (Patriarch) on Jun 02, 2013 at 23:41 UTC
Three methods; which is best depends upon the size of the array, but grep is never a bad choice unless you're tight on memory:
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by roboticus (Chancellor) on Jun 03, 2013 at 10:50 UTC
Update 3: As BrowserUk notes below, my code is flawed. Since he already corrected it, I'll leave my incorrect code here. For future benchmarking, I think I'll have to try and remember to put in a test case to compare the results of all the routines to ensure that they match. I've done that sometimes in the past when I wasn't sure my code was correct. However it seems that I should *always* do it, since I was sure that *this* code was correct. <sigh>
Generally, I just build a new array because it's easiest. I didn't realize that it's reasonably fast, too.
I'm surprised at how much the for_splice version moves around in the rankings. While waiting for it to finish, I realized that your offset copy method would be faster if you saved the bookkeeping for the end, so I tweaked it a little:
... will this *ever* finish? If it does before I go to work, I'll update the node with the result.
Update: Added the edit in place chunk, and replaced the output to include those results as well.
Update 2: The 1e6 case finally finished, so I added them. If anyone cares, this is on an Intel Atom 330 1.6GHz machine with 2GB RAM. Additionally, I slightly formatted the output (removing a few blank lines, and inserting some in other places) for readability.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
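The idea in Update 3, checking that every routine returns the same result before timing any of them, could look roughly like this. It is only a sketch: the two routines and the helper name `mismatched_routines` are hypothetical stand-ins, not the benchmark code from this node.

```perl
use strict;
use warnings;

# Hypothetical filter routines to be compared; each takes an array ref
# and returns a new array ref holding only the even numbers.
my %routines = (
    grep_copy => sub { [ grep { $_ % 2 == 0 } @{ $_[0] } ] },
    for_push  => sub {
        my @out;
        for my $n (@{ $_[0] }) { push @out, $n if $n % 2 == 0 }
        return \@out;
    },
);

# Run every routine on its own copy of the input and compare each result
# with the first one; returns the names of the routines that disagree.
sub mismatched_routines {
    my ($input, $subs) = @_;
    my ($reference, @bad);
    for my $name (sort keys %$subs) {
        my $result = join ",", @{ $subs->{$name}->([@$input]) };
        $reference = $result unless defined $reference;
        push @bad, $name if $result ne $reference;
    }
    return @bad;
}

my @bad = mismatched_routines([1 .. 10], \%routines);
print @bad ? "Mismatch in: @bad\n" : "All routines agree\n";
```

Giving each routine a fresh copy of the input matters for the in-place variants, which would otherwise hand an already-filtered array to the next routine.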
by BrowserUk (Patriarch) on Jun 03, 2013 at 13:44 UTC
And actually, my original benchmark was also flawed -- or at least lazy -- in as much as it conflates the time taken to build the original array into the overall timings; which is unrealistic. Correcting for that, I get yet another set of numbers:
Which just goes to prove: a) be careful what you benchmark; b) O(n²) in C can often be considerably faster than O(n) in Perl, if the former avoids the multiple opcodes of the latter. Where the copy-offset (or your in-place) method really comes into its own is when the array being filtered is close to the limits of your memory to start with.
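The copy-offset code itself is missing from this copy of the thread. A minimal sketch of what an in-place filter of that kind might look like, with the truncation saved for the end as roboticus describes, follows. The sub name `filter_in_place` and its callback interface are assumptions for illustration, not the benchmarked code.

```perl
use strict;
use warnings;

# Filter an array in place: copy each kept element down over the discarded
# ones, then truncate the tail once at the end. No second array is built.
# $keep is a hypothetical callback that receives one element.
sub filter_in_place {
    my ($array, $keep) = @_;
    my $write = 0;
    for my $read (0 .. $#$array) {
        next unless $keep->($array->[$read]);
        $array->[$write++] = $array->[$read];
    }
    $#$array = $write - 1;   # drop everything past the last kept element
    return;
}

my @numbers = (1 .. 10);
filter_in_place(\@numbers, sub { $_[0] % 2 == 0 });
print "@numbers\n";   # 2 4 6 8 10
```

The loop iterates over a range of indexes rather than over the array itself, so assigning to elements and shrinking the array afterwards does not disturb the iteration.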
by roboticus (Chancellor) on Jun 03, 2013 at 18:05 UTC
by Anonymous Monk on Apr 27, 2015 at 01:55 UTC
by BrowserUk (Patriarch) on Apr 27, 2015 at 08:24 UTC
|
Re: foreach array - delete current row ?
by Laurent_R (Canon) on Jun 02, 2013 at 22:36 UTC
Deleting elements from an array while iterating over it is explicitly documented as 'So don't do that' in perlsyn. Yes, this is really getting bad and messy.
But for some reason, it seems that it sort of works fine with a hash. I had a program where I was progressively "improving" my hash by deleting useless elements, and everything went fine. Then I thought that it might be more efficient to use an array instead, and I figured out that it did not work at all. The array got completely messed up.
In brief, it seems that you can delete records from a hash while processing it, but can't do it in an array. OK, maybe I got it wrong and did not understand fully what was going on, but I got the feeling that you can modify a hash while processing it with the keys function, but not an array with the foreach function. Does any monk here have an explanation for this apparently different behavior?
by davido (Cardinal) on Jun 02, 2013 at 22:43 UTC
As documented in each, it is safe to use the each iterator to iterate over a hash and delete the element most recently returned. It is also safe to generate a list of keys, and then delete hash elements by iterating over those keys -- you're not deleting elements from the list returned by keys, you're deleting elements from the hash that were listed by keys; a subtle but important difference.
With an array, deleting elements requires rearranging the array, which (as the C++ folks like to say) invalidates the iterator, or as you mentioned, the array gets "completely messed up."
Dave
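The two safe hash patterns described here can be sketched as follows; the hashes and their contents are hypothetical examples, not data from the thread.

```perl
use strict;
use warnings;

# Hypothetical data: word counts, from which zero counts are to be removed.
my %count = (apple => 3, pear => 0, plum => 2, fig => 0);

# keys returns a separate list, so deleting from the hash does not
# disturb the list being iterated over.
for my $word (keys %count) {
    delete $count{$word} if $count{$word} == 0;
}

# each also permits deleting the entry it most recently returned.
my %stock = (nuts => 0, bolts => 7, washers => 0);
while (my ($item, $qty) = each %stock) {
    delete $stock{$item} if $qty == 0;
}

print join(",", sort keys %count), "\n";   # apple,plum
print join(",", sort keys %stock), "\n";   # bolts
```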
|
Re: foreach array - delete current row ?
by vsespb (Chaplain) on Jun 02, 2013 at 21:46 UTC
"delete the current row from an array when looping through it without setting up a counter and splice operation"
IMHO counter+splice is the most feasible way. I use it when copying arrays with grep is too slow.
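One common way to write the counter and splice approach is to walk the indexes from the end, so that each splice only shifts elements that have already been examined. This is a sketch with hypothetical rows, not vsespb's own code.

```perl
use strict;
use warnings;

# Hypothetical rows; anything starting with "drop" is to be deleted.
my @rows = ("keep 1", "drop 1", "drop 2", "keep 2", "drop 3");

# Count down from the last index so that removing element $i never
# changes the position of an element still waiting to be checked.
for (my $i = $#rows; $i >= 0; $i--) {
    splice @rows, $i, 1 if $rows[$i] =~ /^drop/;
}

print join("|", @rows), "\n";   # keep 1|keep 2
```

Counting upward also works, but then the counter must not be advanced after a splice, because the next element has just moved into the current slot.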
|
Re: foreach array - delete current row ?
by locked_user sundialsvc4 (Abbot) on Jun 03, 2013 at 02:11 UTC
Instead of deleting the items that you want to discard from an array that you are iterating over ... which is a very dicey proposition in any programming language ... just create a new list and push the entries that you want to keep onto it. (The grep function might do this in one fell swoop.) Then, replace the original list with the new one.
@list = grep( &keep_me, @list);
In almost every case, the two lists consist of references to some bit of information ... and a reference, no matter what monstrosity it may refer to, is small and cheap. You’re actually not moving the contents (big) around, just references to them (small). As you build the new list, the reference counts of all its contents briefly rise to “2,” then, when the new list replaces the old one (and the old one vanishes into the gloom of the garbage-collector ...) they all go down again.
by AnomalousMonk (Archbishop) on Jun 03, 2013 at 07:01 UTC
I would add that the keep_me() function of the example must operate on the $_ default scalar, which is topicalized by grep.
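To illustrate the point, here is a minimal sketch in which a hypothetical `keep_me` reads the current element from `$_` rather than from its arguments. The body of the predicate and the sample list are assumptions, since the thread does not define them.

```perl
use strict;
use warnings;

# Hypothetical predicate: reads the element from $_, which grep sets
# to each item of the list in turn.
sub keep_me { return length($_) > 0 && $_ !~ /^\s*#/ }

my @list = ("one", "", "# note", "two");
@list = grep { keep_me() } @list;

print join(",", @list), "\n";   # one,two
```

A predicate written to take its element from `@_` instead would need to be called as `grep { keep_me($_) } @list`.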