in reply to writing perl function like sql IN

The canonical answer is to use a hash:

    my %in_set;
    undef @in_set{ 1, 3, 7 };

    if ( exists $in_set{ $x } ) {
        # ...
    }

The grep solution mentioned before is also common.
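
For comparison, a minimal grep-based test might look like the sketch below; the list @set and the numeric comparison are illustrative assumptions, not part of the original post:

    my @set = ( 1, 3, 7 );
    if ( grep { $_ == $x } @set ) {    # use eq instead of == for string values
        # ...
    }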

The other solutions posted involving modules have much more overhead than either of these two, but have strengths in various areas beyond your simple use case.

Makeshifts last the longest.

Re^2: writing perl function like sql IN
by davido (Cardinal) on Aug 31, 2004 at 06:00 UTC

    grep may be a common solution, but it's not necessarily the most efficient. It probably doesn't matter for small sets or infrequent tests. But if efficiency does matter, keep in mind that the grep solution always has an O(n) outcome, even if the item you're looking for happens to be the first element in the list: grep always searches the entire list.

    Any linear search where the loop terminates as soon as the item is found will have a worst case of O(n), but an average case of O(n/2) (which isn't really proper big-O notation), assuming a randomly ordered list.
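
    For illustration, a hand-rolled linear scan with early termination might look like this sketch (the names @set and $x are assumed from the examples above):

        my $found = 0;
        for my $elem (@set) {
            if ( $elem == $x ) {    # stop scanning as soon as a match is found
                $found = 1;
                last;
            }
        }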

    A binary search solution will achieve O(log n), assuming the list is sorted and, again, that the search quits as soon as the item is found. Its cost grows far more slowly than the linear search's as the dataset gets larger.
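
    As a rough sketch (assuming @set is already sorted numerically; names are illustrative), a binary search might look like:

        my ( $lo, $hi ) = ( 0, $#set );
        my $found = 0;
        while ( $lo <= $hi ) {
            my $mid = int( ( $lo + $hi ) / 2 );
            if    ( $set[$mid] == $x ) { $found = 1; last }    # quit as soon as the item is found
            elsif ( $set[$mid] <  $x ) { $lo = $mid + 1 }      # search the upper half
            else                       { $hi = $mid - 1 }      # search the lower half
        }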

    A well-written hashing algorithm, such as Perl's %hash, can achieve O(1); it's not sensitive to order, nor should it be affected by dataset size (see the O'Reilly Wolf book, Mastering Algorithms with Perl, pp. 158-159, 168). So for many purposes a good old-fashioned Perl hash might be one of the best solutions ...assuming the dataset is in memory.
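
    A common way to build such a lookup hash from an existing list (just a sketch, equivalent in spirit to the slice idiom in the parent node; @set and $x are assumed as above):

        my %in_set = map { $_ => 1 } @set;   # one-time O(n) setup cost
        if ( $in_set{$x} ) {                 # each lookup is O(1) on average
            # ...
        }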

    For those unfamiliar with big-O (O(x)) notation, it's often used to describe how an algorithm's work grows as its dataset grows. "O(1)" means the work doesn't grow at all. "O(n)" means the work grows in direct proportion to the dataset. "O(log n)" grows far more slowly than "O(n)", for example. The reason I said O(n/2) for the average case isn't really valid big-O notation is that big-O isn't concerned with average cases; it expresses an "order of growth won't be worse than..." relationship. Just a quick refresher. ;)


    Dave

      You forgot to mention the other side of the trade-off: a hash trades memory for speed. grep runs in O(n), but also consumes a lot less memory. It also takes a certain amount of work to set up the hash; if you're only going to test against a particular set a few times, that up-front effort may well never be amortized.

      TANSTAAFL.

      Makeshifts last the longest.