I think that your approach is better than a regexp. Is it the best approach, or can we do anything better? I have millions of sentences and want to find common patterns of "informative" sentences this way.

Re^3: Replacing Types
by shmem (Chancellor) on Sep 21, 2006 at 20:38 UTC
    Any solution is quite fine until it has to scale. But that depends on the dataset and on the goals. You were talking of simple search and replace operations; now it's about finding interesting patterns via search operations through a huge dataset. This usually requires indexing of tokens / database-like operations / vectorizing terms.
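
    To make "indexing of tokens" concrete, here is a minimal sketch in Python (chosen only for illustration, not Perl). The function names, the whitespace tokenizer and the sample sentences are all assumptions made up for this example; a real run over millions of sentences would more likely keep the index in a database or a search engine than in memory.

```python
from collections import Counter, defaultdict


def tokenize(sentence):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return sentence.lower().split()


def build_index(sentences):
    # Inverted index: token -> set of ids of the sentences containing it.
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            index[token].add(sid)
    return index


def count_bigrams(sentences):
    # Count adjacent token pairs as a crude stand-in for "common patterns".
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical sample data, not taken from the thread.
sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a bird flew over the mat",
]

index = build_index(sentences)
bigrams = count_bigrams(sentences)

# Sentences containing both "sat" and "the": intersect the posting sets.
both = index["sat"] & index["the"]
```

    Looking up a token is then a dictionary access instead of a scan over every sentence, and bigrams.most_common(n) lists the most frequent adjacent pairs.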

    I begin to suspect an XY Problem... maybe you should use a search engine like Swish-E or Lucene.

    What are you really trying to do?

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}