#!/usr/bin/perl -w
use strict;
my $text = <

## [leo@mescaline ~]$ perl recurring.pl

0) phrase [[[This is my first sentence, Ok]]]
   digest [[[thisismyfirstsentenceok]]]
   digest occurrences: 1
1) phrase [[[ This is another one]]]
   digest [[[thisisanotherone]]]
   digest occurrences: 1
2) phrase [[[ Leonardo da Vinci died at Clos Lucé, France, on 2nd May, 1519]]]
   digest [[[leonardodavincidiedatcloslucfranceonndmay]]]
   digest occurrences: 1
3) phrase [[[ The only solution I can think of is to loop through the text, word by word, and search the remaining text for multiple occurences of that word]]]
   digest [[[theonlysolutionicanthinkofistoloopthroughthetextwordbywordandsearchtheremainingtextformultipleoccurencesofthatword]]]
   digest occurrences: 1
4) phrase [[[ If found, check if the successive words are the same]]]
   digest [[[iffoundcheckifthesuccessivewordsarethesame]]]
   digest occurrences: 1
5) phrase [[[ But that method is very slow, as I need to loop through the content many times]]]
   digest [[[butthatmethodisveryslowasineedtoloopthroughthecontentmanytimes]]]
   digest occurrences: 4
6) phrase [[[ I'm wondering if there's a way to do it more efficient]]]
   digest [[[imwonderingiftheresawaytodoitmoreefficient]]]
   digest occurrences: 4
7) phrase [[[ But that method is very slow, as I need to loop through the content many times]]]
   digest [[[butthatmethodisveryslowasineedtoloopthroughthecontentmanytimes]]]
   digest occurrences: 4
8) phrase [[[ I'm wondering if there's a way to do it more efficient]]]
   digest [[[imwonderingiftheresawaytodoitmoreefficient]]]
   digest occurrences: 4
9) phrase [[[ But that method is very slow, as I need to loop through the content many times]]]
   digest [[[butthatmethodisveryslowasineedtoloopthroughthecontentmanytimes]]]
   digest occurrences: 4
10) phrase [[[ I'm wondering if there's a way to do it more efficient]]]
   digest [[[imwonderingiftheresawaytodoitmoreefficient]]]
   digest occurrences: 4
11) phrase [[[ But that method is very slow, as I need to loop through the content many times]]]
   digest [[[butthatmethodisveryslowasineedtoloopthroughthecontentmanytimes]]]
   digest occurrences: 4
12) phrase [[[ I'm wondering if there's a way to do it more efficient]]]
   digest [[[imwonderingiftheresawaytodoitmoreefficient]]]
   digest occurrences: 4
13) phrase [[[ "of the" is a phrase that's apt to recur (and many times) in many documents]]]
   digest [[[oftheisaphrasethatsapttorecurandmanytimesinmanydocuments]]]
   digest occurrences: 1
14) phrase [[[ Do you care]]]
   digest [[[doyoucare]]]
   digest occurrences: 1
15) phrase [[[ Or do you really mean that the ONLY recurring phrase you care about is "Leonardo da Vinci" or something similarly restricted]]]
   digest [[[ordoyoureallymeanthattheonlyrecurringphraseyoucareaboutisleonardodavinciorsomethingsimilarlyrestricted]]]
   digest occurrences: 1
16) phrase [[[ "Leonardo da Vinci" or something similarly restricted]]]
   digest [[[leonardodavinciorsomethingsimilarlyrestricted]]]
   digest occurrences: 6
17) phrase [[[ "Leonardo da Vinci" or something similarly restricted]]]
   digest [[[leonardodavinciorsomethingsimilarlyrestricted]]]
   digest occurrences: 6
18) phrase [[[ "Leonardo da Vinci" or something similarly restricted]]]
   digest [[[leonardodavinciorsomethingsimilarlyrestricted]]]
   digest occurrences: 6
19) phrase [[[ "Leonardo da Vinci" or something similarly restricted]]]
   digest [[[leonardodavinciorsomethingsimilarlyrestricted]]]
   digest occurrences: 6
20) phrase [[[ "Leonardo da Vinci" or something similarly restricted]]]
   digest [[[leonardodavinciorsomethingsimilarlyrestricted]]]
   digest occurrences: 6
21) phrase [[[ "Leonardo da Vinci" or something similarly restricted]]]
   digest [[[leonardodavinciorsomethingsimilarlyrestricted]]]
   digest occurrences: 6
22) phrase [[[ And while the "speed" will depend (in part) on your algorithm, the time the process will take to run to completion will likely be most influenced by the size of the text to search and the specificity (or simplicity) of the search phrase (hint: read "regular expression"), for any given language and box upon which to run it]]]
   digest [[[andwhilethespeedwilldependinpartonyouralgorithmthetimetheprocesswilltaketoruntocompletionwilllikelybemostinfluencedbythesizeofthetexttosearchandthespecificityorsimplicityofthesearchphrasehintreadregularexpressionforanygivenlanguageandboxuponwhichtorunit]]]
   digest occurrences: 1
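
The body of the script is cut off above, so here is a minimal sketch of how output in this shape could be produced. It is not the original recurring.pl. The rules are inferred from the output only: phrases seem to be split at sentence-ending punctuation, and the digest seems to be the lowercased phrase with everything except the letters a-z removed (which is why "Lucé" becomes "luc" and "2nd" becomes "nd"). The sample text, the delimiter set and the helper name digest are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text; the input of the original script is not shown.
my $text = "This is a test. Is it slow? This is a test. Yes, it is slow!";

# Split into phrases at '.', '?' or '!' (assumed delimiters) and drop
# pieces that contain only whitespace.
my @phrases = grep { /\S/ } split /[.?!]/, $text;

# Reduce a phrase to a digest: lowercase it, then keep only a-z.
# Phrases that differ only in case, spacing or punctuation collide.
sub digest {
    my ($phrase) = @_;
    my $d = lc $phrase;
    $d =~ s/[^a-z]//g;
    return $d;
}

# First pass: count how often each digest occurs (one hash update per phrase).
my %count;
$count{ digest($_) }++ for @phrases;

# Second pass: report every phrase with the count of its digest.
for my $i (0 .. $#phrases) {
    my $d = digest($phrases[$i]);
    printf "%d) phrase [[[%s]]]\n   digest [[[%s]]]\n   digest occurrences: %d\n",
        $i, $phrases[$i], $d, $count{$d};
}
```

With the sample text, "This is a test" is reported twice with an occurrence count of 2, and the other two phrases with a count of 1. Counting through a hash keeps the work to two passes over the phrase list, instead of rescanning the remaining text for every word.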