hyu968 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am new to Perl (and to scripting languages in general, as a matter of fact), and I need your help with the following. I need to search text files (lengthy files with many pages, in RTF format) for a certain word or combination of words, for example, "North Korea". I want to record each hit as a row in an Excel file, with the following columns: 1st column: the 25 characters before and after "North Korea"; 2nd column: the line where it was found; 3rd column: the word position on that line. Ideally, I would also like to search for multiple keywords, for example, "North", "South", and "Korea". The output would then have a 4th column in the Excel file that specifies which of the keywords was found. Any help with this will be greatly appreciated! Thanks! GM

Re: Search a text file for a keyword, and create Excel file output
by Anonymous Monk on Jun 02, 2013 at 00:18 UTC

    Any help with this will be greatly appreciated! Thanks! GM

    Sure, here you go :) perlintro (Regular expressions...), for RTF: RTF::Tokenizer (MANIFEST: http://search.cpan.org/dist/RTF-Tokenizer/MANIFEST), Text::CSV_XS (examples/csv2xls, listed in http://search.cpan.org/dist/Text-CSV_XS/MANIFEST), DBD::CSV
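
    A minimal sketch of the regular-expression part, assuming the RTF has already been converted to plain text (for instance with RTF::Tokenizer) and that CSV, which Excel can open, is an acceptable output format. The keyword list, the sample text, and the helper names find_hits and csv_field are made up for illustration. Word position is counted here in whitespace-separated words, which is only one possible definition.

```perl
use strict;
use warnings;

# Hypothetical keyword list. Longer alternatives go first so that
# "North Korea" is preferred over "North" at the same position.
my @keywords = ('North Korea', 'North', 'South', 'Korea');
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @keywords;
my $re = qr/\b($alternation)\b/;

# Return one [context, line number, word position, keyword] row per hit.
sub find_hits {
    my ($line, $line_no) = @_;
    my @rows;
    while ($line =~ /$re/g) {
        my $keyword = $1;
        my ($start, $end) = ($-[1], $+[1]);   # offsets of the match
        # Up to 25 characters before and after the keyword.
        my $from    = $start > 25 ? $start - 25 : 0;
        my $context = substr $line, $from, ($end - $from) + 25;
        # Words before the hit, plus one, give the word position.
        my @before = split ' ', substr($line, 0, $start);
        push @rows, [$context, $line_no, scalar(@before) + 1, $keyword];
    }
    return @rows;
}

# Quote a CSV field: wrap it in double quotes, double embedded quotes.
sub csv_field {
    my ($field) = @_;
    $field =~ s/"/""/g;
    return qq{"$field"};
}

# Made-up sample standing in for the converted RTF text.
my @lines = (
    'Talks between North Korea and South Korea resumed.',
    'The North is watching.',
);

print join(',', map { csv_field($_) } qw(context line word_position keyword)), "\n";
my $line_no = 0;
for my $line (@lines) {
    $line_no++;
    for my $row (find_hits($line, $line_no)) {
        print join(',', map { csv_field($_) } @$row), "\n";
    }
}
```

    For real input, the @lines array would be replaced by a loop over the converted file, and Text::CSV_XS would be a sturdier choice than the hand-rolled csv_field.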

Re: Search a text file for a keyword, and create Excel file output
by Jenda (Abbot) on Jun 02, 2013 at 11:15 UTC

    Maybe in this case it would be better to look at scripting MS Word, either directly (here's how to add the tools to the ribbon) or from Perl (Win32::OLE). I'm not sure the other solutions will be able to give you the correct line numbers. I did not check what information Word provides about the things you ask it to find, though.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re: Search a text file for a keyword, and create Excel file output
by rpnoble419 (Pilgrim) on Jun 02, 2013 at 12:26 UTC

    You will have a difficult time with this project as you have it planned. First, I don't think Excel is the best choice for storing the results. If you have that much data, you will exceed the 65,536-row limit of the Excel 97 format, and if you use the Excel 2007 format, the file will be very slow to process. Second, the term "line" is difficult to define, as RTF does not have a character limit for its paragraphs. At best, you can capture the number of the paragraph the keyword is located in, unless you can force each "line" to be 80 characters in a mono-spaced font.
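
    A rough sketch of reporting paragraph numbers instead of line numbers, assuming the text has already been extracted from the RTF and that paragraphs are separated by blank lines (an assumption about the extracted text, not something RTF guarantees). The function name hits_by_paragraph and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Return [paragraph number, character offset, keyword] for each hit.
# Assumption: paragraphs are separated by one or more blank lines.
sub hits_by_paragraph {
    my ($text, $re) = @_;
    my @hits;
    my $para_no = 0;
    for my $para (split /\n\s*\n/, $text) {
        next unless $para =~ /\S/;     # skip empty chunks
        $para_no++;
        while ($para =~ /($re)/g) {
            # $-[1] is the offset of the match within the paragraph.
            push @hits, [$para_no, $-[1], $1];
        }
    }
    return @hits;
}

# Made-up sample: two paragraphs, the second wrapped over two lines.
my $sample = "First paragraph about North Korea.\n\n"
           . "Second one,\nwrapped, mentions Korea twice: Korea.\n";

for my $hit (hits_by_paragraph($sample, qr/\bNorth Korea\b|\bKorea\b/)) {
    printf "paragraph %d, offset %d: %s\n", @$hit;
}
```

    Because the paragraph is searched as a whole, a keyword is reported with the same paragraph number no matter how the text happens to be wrapped.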

    It is best to "Ask" for help by showing us what you have attempted already along with a sample of the data you plan to process. Best of luck learning Perl as it is a wonderful language to use. All of what you need to do this project can be found here at the monastery if you know where to look for it.