in reply to Way to "trim" part of a phrase?

Once you have divided up the block into phrases, would not a substitution be the most efficient way of stripping the unwanted bits?

s/[^\.!\,\:\)]++[\.!\,\:\)]//

I haven't tested this, it is just a thought - i.e. match a run of non-target characters followed by a single target character, and replace the lot with nothing?
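To spell the idea out, here is a minimal sketch of the same substitution written with /x so each part can carry a comment. The sample text is a shortened version of the phrase quoted further down the thread, and the variable name $phrase is only for illustration. The possessive ++ needs Perl 5.10 or later; a plain + should behave the same here.

```perl
use strict;
use warnings;

# Shortened sample phrase from the thread; $phrase is an illustrative name.
my $phrase = 'pour 2 personnes, achetez deux motos tout le monde';

$phrase =~ s/
    [^.!,:)]++   # a run of characters that are NOT one of the delimiters
    [.!,:)]      # the single delimiter that closes the leading chunk
//x;

# The substitution leaves the space that followed the delimiter, so trim it.
$phrase =~ s/^\s+//;

print "$phrase\n";   # prints: achetez deux motos tout le monde
```

Inside a character class the characters . , : and ) are already literal, so the backslashes in the original pattern are harmless but not required.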

And if you haven't already thought of it, Benchmark is good for comparing any alternative ways you can think of doing this. And if speed is really an issue, Devel::NYTProf is pretty rock and roll for profiling and optimising code! HTH!
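As a rough sketch of what such a comparison could look like, the snippet below uses cmpthese from the core Benchmark module. The two subs, trim_regex and trim_split, are hypothetical names, and the split-based variant is only one possible alternative picked for illustration, not something proposed in this thread.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Shortened sample phrase from the thread.
my $phrase = 'pour 2 personnes, achetez deux motos tout le monde';

# Candidate 1: the substitution suggested above.
sub trim_regex {
    my ($text) = @_;
    $text =~ s/[^.!,:)]++[.!,:)]//;
    return $text;
}

# Candidate 2 (hypothetical alternative): split once on the first delimiter
# and keep whatever follows it.
sub trim_split {
    my ($text) = @_;
    my (undef, $rest) = split /[.!,:)]/, $text, 2;
    return defined $rest ? $rest : $text;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { trim_regex($phrase) },
    split => sub { trim_split($phrase) },
});
```

The two candidates agree on input like the sample, but they differ on edge cases (for example a phrase that starts with a delimiter), so check behaviour before comparing speed. For profiling a whole script, Devel::NYTProf is typically run as perl -d:NYTProf yourscript.pl followed by nytprofhtml, where yourscript.pl is a placeholder.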

Just a something something...

Re^2: Way to "trim" part of a phrase?
by ultranerds (Hermit) on Aug 14, 2009 at 08:36 UTC
    Hi,

    Thanks for the reply. However, it doesn't seem to do anything :(

            $text =~ s/[^\.!\,\:\)]+[\.!\,\:\)]//;

    pour 2 personnes, achetez deux motos tout le monde je prépare un un projet de voyage assez similaire

    ..still comes out like that, instead of how it should be, with:

    achetez deux motos tout le monde je prépare un un projet de voyage assez similaire

    Any suggestions?

    Re benchmarking - we are already using Benchmark to keep track of speed (as it's a large site, even a small increase in CPU usage can be a major headache for us)

    TIA!

    Andy
      Never mind - there was a typo in your regex (you had ^ inside the [ bit =)) This works:
      my $string = q|pour 2 personnes, achetez deux motos tout le monde je prépare un un projet de voyage assez similaire|;
      print qq|OLD STRING: $string \n|;
      $string =~ s/[^\.!\,\:\)]+[\.!\,\:\)]//;
      print qq|NEW STRING: $string|;
      Thanks again. Andy

        I can't see a difference between the two variations you posted! Also, the [^chars]+[chars] was deliberate - when you define a character class, putting ^ at the start negates it - so I meant 'match a string of non-special characters, followed by a special character'. I guess it could also be put as s/.*[\.\:\!\)\,]// i.e. match anything followed by a special character (although, being greedy, .* would run up to the last special character in the phrase rather than the first).
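        A small sketch of how the variants compare. The input string here is made up and deliberately contains two delimiters, since the sample phrase in this thread only has one and would not show any difference. The .*? form is an extra variant added for comparison, not one posted above.

```perl
use strict;
use warnings;

# Hypothetical input with TWO delimiters, to show where the patterns diverge.
my $input = 'one, two: three';

# Negated class: the run of non-delimiters cannot cross the first delimiter.
(my $negated = $input) =~ s/[^.!,:)]+[.!,:)]//;

# Greedy .* grabs everything, then backtracks to the LAST delimiter.
(my $greedy = $input) =~ s/.*[.:!),]//;

# Non-greedy .*? stops at the FIRST delimiter, like the negated class.
(my $lazy = $input) =~ s/.*?[.:!),]//;

printf "negated: '%s'\n", $negated;   # ' two: three'
printf "greedy:  '%s'\n", $greedy;    # ' three'
printf "lazy:    '%s'\n", $lazy;      # ' two: three'
```

        With a single delimiter, as in the sample phrase, all three give the same result.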

        I understand your pain about overheads and memory usage though, so keep pushing and you'll get there eventually!

        Just a something something...