in reply to Re: is index faster than regexp for fixed text token?
in thread is index faster than regexp for fixed text token?

Excellent thoughts, thank you. For me the question is about squeezing more performance out of a mod_perl web application that needs to check for named options in tiny search strings, so perhaps the mundane index implementation is better there. Regexp matches are faster for me to type, so I'll probably use them more, and they can take /i to make them case-insensitive, which with index requires lc on both arguments.
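As a minimal sketch of the two approaches (the helper names and the sample strings are made up for illustration, not taken from the real application):

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive fixed-token check using index.
# Both the search string and the token have to be lowercased first.
sub has_token_index {
    my ($haystack, $token) = @_;
    return index(lc $haystack, lc $token) >= 0 ? 1 : 0;
}

# Hypothetical helper: the same check as a regexp match.
# \Q...\E quotes any regexp metacharacters in the token, and /i
# handles the case-insensitivity.
sub has_token_regex {
    my ($haystack, $token) = @_;
    return $haystack =~ /\Q$token\E/i ? 1 : 0;
}

print has_token_index("title:perl SORT:date", "sort:"), "\n";
print has_token_regex("title:perl SORT:date", "sort:"), "\n";
```

Both helpers can be handed to cmpthese from the core Benchmark module to time them on strings that look like the real ones. For non-ASCII text, lc on both sides and /i may not agree in every case, so this sketch assumes plain ASCII option names.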

I studied Boyer-Moore in my first C Algorithms class under the very demanding Dr. Wong. I'll never forget him: he took points off my code for forgetting to comment, and I learned as much as 50% of my programming craft from him. Nearly failed that class! I was such a geek that I signed up for it as a freshman, even though he taught it like a graduate course. I got the impression that he didn't like me... but that probably made me study harder! He also showed me the XOR trick for swapping two registers without an intermediary:

$a^=$b; $b^=$a; $a^=$b; # swap a and b
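For comparison, here is a small sketch of how that swap behaves in Perl next to a plain list-assignment swap. The sub names are hypothetical, and the sketch assumes the default behavior of ^ (no `use feature 'bitwise'`), where two string operands are XORed byte by byte and the shorter one is padded with NUL bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: XOR swap on copies of the two arguments.
sub xor_swap {
    my ($x, $y) = @_;
    $x ^= $y; $y ^= $x; $x ^= $y;
    return ($x, $y);
}

# Hypothetical helper: list-assignment swap, which works for any scalars.
sub list_swap {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x);
    return ($x, $y);
}

my @nums = xor_swap(5, 9);           # numeric XOR: swaps cleanly
my @strs = xor_swap("ab", "wxyz");   # string XOR: "ab" comes back NUL-padded
my @list = list_swap("ab", "wxyz");  # plain swap: both values intact

printf "xor on numbers: %d %d\n", @nums;
printf "xor on strings: lengths %d and %d\n", map { length } @strs;
printf "list swap:      %s %s\n", @list;
```

With integers the XOR version does swap the values, but with strings of different lengths the shorter value picks up trailing NUL bytes, so it no longer compares equal to the original.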

SSF

Re^3: is index faster than regexp for fixed text token?
by roboticus (Chancellor) on Jul 05, 2009 at 17:31 UTC
    sflitman:

    I'd definitely update the benchmark to test cases that are close to what you're using in your application, just to make sure you get the best result. Include the lc calls, as well, so you can see if they push you to the regex side.

    Oh, yeah, if you're going to do other matches as well, be sure to see if you can incorporate them into the regex to let it get even more performance for you.
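
    A rough sketch of folding several option checks into one regex; the option names and the find_options helper are assumptions for illustration, not something from the original application:

    ```perl
    use strict;
    use warnings;

    # Hypothetical option names; substitute the real ones.
    my @options = qw(sort title author);

    # Build one alternation and compile it once, rather than running
    # one match per option.
    my $alternation = join '|', map { quotemeta } @options;
    my $option_re   = qr/\b($alternation):/i;

    # Return the lowercased names of all options present in the query.
    sub find_options {
        my ($query) = @_;
        my @found = map { lc } $query =~ /$option_re/g;
        return @found;
    }

    print join(",", find_options("Title:perl SORT:date")), "\n";
    ```

    Whether this beats several index calls depends on the data, so it belongs in the same benchmark as the other candidates.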

    Regarding your class--I always found it best to take the course with the teachers other students considered "the hardest", as I found I usually learned more from them. And because of their reputation, I'd keep up on the notes & homework. It's amazing how easy a difficult class is if you keep up on the reading and homework.

    ...roboticus

    Update: Added second paragraph.

Re^3: is index faster than regexp for fixed text token?
by Marshall (Canon) on Jul 06, 2009 at 06:27 UTC
    "He also showed me the XOR trick for swapping two registers without an intermediary":
    $a^=$b; $b^=$a; $a^=$b; # swap a and b

    This "trick" is part of a first-level class in C. It is shown to demonstrate the power of XOR. It looks cool, but it is inefficient and in general not a good idea, even though it produces a correct result. On most modern CPUs XOR itself costs the same as other simple bitwise ops like OR, AND, or NOT; the inefficiency is that the three XORs must run one after another, so the trick is usually no faster than a swap through a temporary. Anyway, this construct is just demonstrated to explain XOR; it is not practical and is not used even in C.

    In Perl, this is bad code! - just wrong.

    The Perl way: ($a,$b)=($b,$a);
    the above is practical and could be used.

      I agree; I like the Perl way better because it is more expressive and readable, plus it makes more sense to my XOR-incapable neurons. (Actually, I heard a lecture somewhere that neuron dendrites compute analog boolean-like operations such as AND and NOT on their neural inputs).

      SSF

        Great! Glad that we agree that ($B,$A)=($A,$B) is FAR superior to the XOR trick! Clarity is an important aspect of software engineering.

        XOR is fundamental to data encryption and error correction codes. But for the most part, you won't need to use this in your normal code!!

        This is a bit off topic as this is 'C' Code, but just to show you another tricky way of using XOR in a low level sub. This is basically "modem code" as it looks like the gibberish that would result from an unsync'd modem!!

        Low-level C has stuff like this, but almost no Perl code should have it! Heck, the raw assembly code is easier to understand than this critter below, but this is good, idiomatic C code. Enjoy.

        /* flip all bits except between range defined by startBit and numBits to left */
        /* bit numbering like: 7 6 5 4 3 2 1 0 */
        inline unsigned short flipNotInRange (unsigned short in, int startBit, int numBits)
        {
            unsigned short result;
            result = (~in) ^ (~((unsigned short) ~0 << numBits) << startBit);
            return result;
        }