in reply to searching for unique numbers into a string
use List::MoreUtils qw(uniq);
my @unique = uniq split(/\t/, $myLine);
$myLine = join("\t", @unique);
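For anyone without List::MoreUtils installed, here is a minimal sketch of the same order-preserving idea in core Perl. The sample value of $myLine and the %seen hash are made up for illustration; this is not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical sample input: tab-separated numbers with repeats.
my $myLine = join "\t", qw(7 3 7 12 3 5);

# Keep only the first occurrence of each field.
# %seen counts how often each value has been encountered so far.
my %seen;
my @unique = grep { !$seen{$_}++ } split /\t/, $myLine;

$myLine = join "\t", @unique;
print "$myLine\n";    # fields are now 7, 3, 12, 5 (tab-separated)
```

Because grep walks the fields left to right, the first occurrence of each value keeps its original position.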
Re^2: searching for unique numbers into a string
by almut (Canon) on Apr 06, 2009 at 15:44 UTC
Interestingly, when you benchmark it, the OP's method turns out to be slightly faster (~20% with Perl 5.8.8, ~40% with Perl 5.10.0) than List::MoreUtils' implementation, which is
So, if the original order of values doesn't need to be maintained, it isn't such a bad choice after all, though BrowserUk's form would be somewhat more natural, IMHO (but not faster).
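The OP's code isn't quoted in this reply, so the following is only a guess at the general shape of an approach that gives up the original order: the fields become hash keys and duplicates collapse on their own. The sample input and the %uniq hash are assumptions.

```perl
use strict;
use warnings;

my $myLine = join "\t", qw(7 3 7 12 3 5);   # hypothetical sample input

# Use the fields as hash keys; duplicates collapse, but key order is arbitrary.
my %uniq;
@uniq{ split /\t/, $myLine } = ();

# Sort numerically only to get a stable, printable result.
my @unique = sort { $a <=> $b } keys %uniq;
print join("\t", @unique), "\n";
```

Without the sort, the order of keys %uniq can differ from run to run, which is why this only fits when the input order doesn't matter.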
by GrandFather (Saint) on Apr 06, 2009 at 23:06 UTC
I'd like to see that benchmark. My version gives somewhat different results: Prints (neglecting the sanity check output):
Update: UtilsM uses List::MoreUtils. UtilsH is the uniq code implemented in the same context as the other benchmark tests.
True laziness is hard work
by almut (Canon) on Apr 07, 2009 at 02:02 UTC
OK, a couple of errors on my part (mea culpa). However, after more judicious investigation, it looks like the results are highly data dependent. So what did I do? First, the code (cleaned up, and with GrandFather's @values added):
I first started with my input data ("AB", an adapted/simplified version of BrowserUk's random input generator), and got the following results:
From this I had concluded (prematurely) that there is virtually no difference between "uniq1" and "uniqM" (the XS implementation), so I commented out the latter benchmark (my error 1). Then, after having played around a bit, I had settled on the following results (which is where the reported ~40% for Perl 5.10.0 came from):
The thing I had overlooked (error 2) is that my $data pointer was still referring to BrowserUk's data ("BU"), which I had been playing around with in between. So those results are in fact for rather unusual input, i.e. 1000 values of around 4K each... The full set with the BU data is, btw:
which shows that, for large strings (probably all of them being unique), the XS variant is clearly the slowest (!). With GrandFather's input data, OTOH, I do get similar results:
Overall, BrowserUk's uniq() seems to be the winner. In other words, with my original data (which isn't all that unrealistic), the findings essentially remain the same. But there is huge variation depending on the type of input. Moral of the story: thou shalt not be lazy and withhold your benchmark code (telling myself) ;(
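Since the benchmark code from this thread isn't reproduced here, the following is a rough sketch of how such a comparison could be set up with the core Benchmark module. The names uniq_seen and uniq_keys and the random input are assumptions, not the code anyone in the thread actually ran. An XS variant such as List::MoreUtils::uniq could be added as a third entry if the module is installed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: 1000 small integers with many duplicates.
# Change this to see how strongly the ranking depends on the data.
my @values = map { int rand 100 } 1 .. 1000;

# Order-preserving variant: keep the first occurrence of each value.
sub uniq_seen {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Order-losing variant: let hash keys collapse the duplicates.
sub uniq_keys {
    my %h;
    @h{@_} = ();
    return keys %h;
}

# Sanity check before timing: both variants must find the same set of values.
my $seen_set = join ',', sort { $a <=> $b } uniq_seen(@values);
my $keys_set = join ',', sort { $a <=> $b } uniq_keys(@values);
die "variants disagree\n" unless $seen_set eq $keys_set;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    seen => sub { my @u = uniq_seen(@values) },
    keys => sub { my @u = uniq_keys(@values) },
} );
```

Swapping @values for a few long, mostly unique strings is a quick way to check the data dependence described above on your own Perl build.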
Re^2: searching for unique numbers into a string
by jwkrahn (Abbot) on Apr 06, 2009 at 15:52 UTC
"'grep' builds an array"
Wrong. grep builds a list. Perhaps you should read "What is the difference between a list and an array?".
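A small sketch of the distinction being drawn, using made-up data: grep hands back a list (or a count, in scalar context), and it only becomes an array once it is assigned to one.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 2, 3, 3, 3);   # an array: a named variable holding a list

# In list context grep returns a list, which is assigned to an array here.
my @big = grep { $_ > 1 } @numbers;

# In scalar context grep returns the number of matches, not a container.
my $count = grep { $_ > 1 } @numbers;

# A list can be sliced, but only an array can be modified in place.
my $second = ( grep { $_ > 1 } @numbers )[1];
push @big, 4;

print "count=$count second=$second big=@big\n";
```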
by Marshall (Canon) on Apr 07, 2009 at 17:14 UTC
I personally think this gets into what I would call "language lawyering": fine parsing of the terminology to no real benefit. I personally like the way Tom does it, introducing the term array and then quickly moving to calling all of these Perl equivalents of "arrays in other languages" lists. That a list is described by an array variable type is not that important a distinction to me.

When we get into more complex Perl structures like LoL (List of Lists), LoH (List of Hashes), LoLoL (List of Lists of Lists), my opinion is that these are MUCH more descriptive than other types of terms.

I guess part of this has to do with what somebody's programming background is. In the C world, a "traditional" 2-D or higher-order C array is a pretty worthless data structure for most jobs. There are lots of problems with this; just one is that you have to pass around both dimensions, which makes it very hard to write general-purpose matrix routines. Also, for example, I don't know of any traditional 2-D arrays used in the Unix O/S. Maybe there are some, I just don't know where they are. Starting with intermediate C, "traditional" 2-D C arrays go the way of the dodo bird.

The way in C to build a practical 2-D structure, say of ints, is an int ** (array of pointers to arrays of ints). This is very close to exactly what a Perl LoL is! In C, this is also a 2-D array, but it is a special kind of 2-D array. In Perl, calling this a LoL, List of Lists (or more specifically List of references to Lists), is much more descriptive of what is really going on! A main point with a LoL is that everything is a pointer until you get to the final dimension.

A "traditional" array has fixed memory layout and dimensions. That is not what a Perl list is! Any Perl list that has a name can be "grown", even ones that are initialized with X number of elements at the beginning of the program.

I'm sure this post will generate some controversy. Maybe sometimes we get too caught up in yelling about terminology? I like the terms LoL, etc. If somebody wants to call this AoA, I'm not that bent out of shape about it. I think LoL is better, but this is not the "end of the world".
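To make the LoL point concrete, here is a short sketch with made-up data: the outer array holds references to anonymous arrays, and both the outer and inner levels can grow after initialization.

```perl
use strict;
use warnings;

# An array whose elements are references to anonymous arrays (a "LoL"/AoA).
my @lol = ( [ 1, 2, 3 ], [ 4, 5 ] );

push @lol, [ 6 ];            # grow the outer dimension: add a new row
push @{ $lol[1] }, 99;       # grow one inner row; rows may have different lengths

# Each row knows its own length, so no dimensions need to be passed around.
for my $row (@lol) {
    print scalar(@$row), " elements: @$row\n";
}
```

Unlike a fixed-layout 2-D C array, the rows here are independent, so a routine only needs the outer reference to walk the whole structure.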