maheshkumar has asked for the wisdom of the Perl Monks concerning the following question:

I need to write a Perl program that reads a text file and looks for a specific word. Whenever that word appears in the file, the program should start recording the lines or data written after it, and stop at another specific word. In other words, the data between those two words should be copied to another file. The pair of words can appear multiple times in the file.

Replies are listed 'Best First'.
Re: Extracting specific Data
by marto (Cardinal) on Jun 26, 2012 at 12:07 UTC
Re: Extracting specific Data
by Athanasius (Archbishop) on Jun 26, 2012 at 12:15 UTC
Re: Extracting specific Data
by jayto (Acolyte) on Jun 26, 2012 at 13:11 UTC

    Maybe this will help. It uses a regex to get all the characters between two words.

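    use strict;
    use warnings;

    my $file_content = "";

    open(my $input, '<', 'readbetween.txt')
        or die "cannot open readbetween.txt : $!";
    # Join the whole file into one string (newlines are removed)
    while (my $line = <$input>) {
        chomp($line);
        $file_content .= $line;
    }
    close($input);

    # /g finds every match; note that (.+) is greedy, so if the pair of
    # words occurs more than once it captures from the first word1 to
    # the last word2
    while ($file_content =~ /word1(.+)word2/mg) {
        print "$1\n";    # This is the content between the two words
    }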
      I tried the following
      open(FILE, "<Google.txt") or die "Could not open file: $!";
      while (<FILE>) {
          if (m/^Google/) {
              print $_;
          }
      }

      My Google.txt has the following data

      Google Google mahesh kumar sam

      I am confused: what if my file has multiple lines?
        Your code works one line at a time, so it only prints lines that begin with the word; look at my code. It first joins the whole file into one string, then runs the regex in a while loop with /mg. The g means match globally, so every match in the joined string is found, whichever line it came from. (The m only changes how ^ and $ match and is not needed here.) word1 is the word where you want to start recording the data, and word2 is where you want to stop reading it. The (.+) captures everything in between those two words (non-inclusive), and $1 holds the data captured by (.+).
      Well that is working completely fine. Thanks a lot!!!

        Okay, what if I have a file from which I have to extract the data between those two words, but I don't want it to match globally? For instance, the first time word1 and word2 appear, it prints the data between them; then word1 appears again somewhere later in the file, and word2 appears again too. So there are chunks of data throughout the file, each between "word1" and "word2". Nothing happens when I remove the global-match flags (mg) from the end of the regex.
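One possible way to get each chunk separately is a non-greedy capture. The following is only a sketch under assumptions: the helper name chunks_between and the sample string are hypothetical and not from the thread. It keeps the /g flag but swaps the greedy (.+) for (.*?), so each match stops at the first word2 that follows a word1 instead of running on to the last one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns every chunk of text
# found between $start and $end.
sub chunks_between {
    my ($text, $start, $end) = @_;
    # \Q...\E quotes the marker words so they are matched literally.
    # (.*?) is non-greedy: it stops at the first $end after each $start.
    # /s lets . match newlines; /g in list context returns all captures.
    my @chunks = $text =~ /\Q$start\E(.*?)\Q$end\E/gs;
    return @chunks;
}

# Made-up sample data with two separate word1 ... word2 chunks
my $text   = "word1 first chunk word2 noise word1 second\nchunk word2";
my @chunks = chunks_between($text, 'word1', 'word2');
print "[$_]\n" for @chunks;
```

If only the first chunk is wanted, taking $chunks[0] from the returned list should be enough; the remaining elements are the later chunks in file order.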

Re: Extracting specific Data
by flexvault (Monsignor) on Jun 26, 2012 at 12:44 UTC

    Welcome maheshkumar,

    Two things you may want to search on:

    • "Slurp a file" or reading the entire file into a string in memory. I use 'read' or 'sysread', but there are other techniques.

    • 'index' function to find exact location of sub-strings

    If the file is larger than available memory, you can use 'read' or 'sysread' in a loop, getting chunks of the file that will fit in available memory. This may be more complicated, but you don't have to worry about line endings.
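The slurp and 'index' suggestions above might be combined roughly as follows. This is a minimal sketch, assuming the file fits in memory; the helper names slurp_file and extract_with_index are hypothetical, and it slurps by undefining $/ rather than calling 'read' or 'sysread'.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    local $/;                 # undefine the record separator: read it all
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Hypothetical helper: collect the text between each $start/$end pair
# using index() and substr() instead of a regex.
sub extract_with_index {
    my ($text, $start, $end) = @_;
    my @chunks;
    my $pos = 0;
    while (1) {
        my $from = index($text, $start, $pos);
        last if $from < 0;                     # no further start word
        $from += length $start;                # skip past the start word
        my $to = index($text, $end, $from);
        last if $to < 0;                       # unterminated chunk: stop
        push @chunks, substr($text, $from, $to - $from);
        $pos = $to + length $end;              # continue after the end word
    }
    return @chunks;
}

# Made-up sample string; a real run would pass slurp_file($path) instead
my $demo = "x word1 alpha word2 y word1 beta word2 z";
print "$_\n" for extract_with_index($demo, 'word1', 'word2');
```

To copy the chunks to another file, as the original question asks, the print could be directed at a filehandle opened for writing instead of standard output.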

    Good Luck!

    "Well done is better than well said." - Benjamin Franklin