
Interesting. Needed a few tweaks to compile (maybe I grabbed an early version).

It also drops the non-word characters from the output, and I couldn't get the context to work when two matches were closer together than the BEFORE and AFTER settings.
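One way to keep the non-word characters is to split with a capturing group. Perl's split returns the captured separators along with the fields, so punctuation and newlines can be carried through and the text rebuilt exactly. This is a minimal sketch, and the tokenize name and the [\w'] word-character class are illustrative assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: split text into alternating word and separator
# tokens. Because the pattern has capturing parentheses, split() also
# returns the separators, so punctuation and newlines are not lost.
sub tokenize {
    my ($text) = @_;
    # grep drops the empty leading field that split returns when the
    # text begins with a separator.
    return grep { length } split /([^\w']+)/, $text;
}

my $text   = "me, hypnotize me\nWith every";
my @tokens = tokenize($text);
my @words  = grep { /\w/ } @tokens;    # the words alone, for matching

print scalar(@words), " words, ", scalar(@tokens), " tokens\n";
print join('', @tokens) eq $text ? "round trip ok\n" : "round trip FAILED\n";
```

Matching can then run against @words, while output is built from @tokens so the original separators survive.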

I'll spend some time with your code later, as it's an interesting approach. I like your coding style, too!

Anyway, as you can see, this is not a trivial problem to solve... I'll keep checking back.

Thanks again,

BTW -- did you try running the code I posted?

Swish module (was Re: Re: Optimizing regular expressions)
by japhy (Canon) on Jun 03, 2001 at 04:07 UTC
    I had forgotten some capturing parentheses. That's fixed now, and it seems to run fine. I haven't run your code yet: after I saw your post I got very interested in how to do stream matching, so I wanted to write my own.

    japhy -- Perl and Regex Hacker
      I'm using:

      $find = Swish->new(
          "'",     # ignore beginning
          "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'",
          "'",     # ignore ending
          'breath every',
          5,       # words to highlight BEFORE
          5,       # words to highlight AFTER
      );

      With a call-back function of:

      $find->stream(\*DATA, sub { " ... $_[1]<b>$_[2]</b>$_[3] ..." });
      while (my $t = $find->match) { print "[[$t]]\n" }

      And I get this result:

      keywords are: <breath> <every>
      [[ ... With <b>every</b> form you come You surprise ...]]
      [[ ... me, hypnotize me << shouldn't be a break here
      With <b>every</b> ...]]
      [[ ... <b>breath</b> I take << or here
      You're depriving, suffocating ...]]
      [[, choking << here's the rest of the text?
      Now the time has come when I tell myself There's nothing more I can take Then you show me Does it matter right now If I'm already numb That's what you do to control me ]]

      It's tough to detect when matches fall near each other and then print out a single continuous stretch of text instead of overlapping excerpts.

      In other words, I'm looking for this output:

      With <b>every</b> form you come ... hypnotize me With <b>every</b> <b>breath</b> I take You're ...

      That is, the code should recognize that "every" and "breath" are near each other, so it doesn't print the words between them twice.
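The merging described above can be done in a single pass over the stream. This is a minimal sketch, assuming whitespace-separated words and a case-insensitive keyword match that ignores punctuation; context_excerpts is a hypothetical name, not part of the Swish module. It keeps at most BEFORE unprinted words in a buffer, counts down AFTER words after each match, and closes an excerpt only when a word has to be dropped from the buffer, since that word lies outside every window.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the Swish module's API): read words from $fh,
# highlight any word in @$keywords, and keep $before/$after words of context.
# Windows that touch or overlap are merged, so no word is printed twice.
sub context_excerpts {
    my ($fh, $keywords, $before, $after) = @_;
    my %want = map { lc($_) => 1 } @$keywords;
    my @buffer;          # up to $before recent words not yet printed
    my @out;             # words of the excerpt being built
    my @excerpts;        # finished excerpts
    my $remaining = 0;   # trailing-context words still owed to the last match

    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            (my $bare = lc $word) =~ s/[^\w']//g;    # ignore punctuation when comparing
            if ($want{$bare}) {
                # The buffered words are the leading context. They are
                # contiguous with @out, because dropping a word closes @out.
                push @out, @buffer, "<b>$word</b>";
                @buffer    = ();
                $remaining = $after;
            }
            elsif ($remaining > 0) {
                push @out, $word;                   # inside the trailing window
                $remaining--;
            }
            else {
                push @buffer, $word;
                if (@buffer > $before) {
                    shift @buffer;                  # this word is in no window
                    if (@out) {                     # so the current excerpt ends
                        push @excerpts, join ' ', @out;
                        @out = ();
                    }
                }
            }
        }
    }
    push @excerpts, join ' ', @out if @out;
    return @excerpts;
}

my $text = <<'END';
one two three breath four five
six seven eight nine ten every eleven
twelve every thirteen
END
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @excerpts = context_excerpts($fh, [qw(breath every)], 2, 2);
print '... ', join(' ... ', @excerpts), " ...\n";
# prints: ... two three <b>breath</b> four five ... nine ten <b>every</b> eleven twelve <b>every</b> thirteen ...
```

The two "every" matches have overlapping windows, so they come out as one excerpt with no repeated words. This sketch joins words with single spaces, so the original newlines and spacing are not preserved. Carrying separator tokens through the same loop would preserve them.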

      Well, I'll have to work through it on Monday.

      Thanks,

        Ah, I understand your request now. That will take a bit of finessing to achieve. And those line breaks that you say should not be there SHOULD be there, since that is where newlines existed in the data stream.
