in reply to Extracting specific Data

Maybe this will help; it uses a regex to get all characters between two words.

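use strict;
use warnings;

my $file_content = "";

open(my $in, "<", "readbetween.txt") or die "cannot open readbetween.txt : $!";
while (my $line = <$in>) {
    chomp($line);
    $file_content .= $line;    # join the whole file into one string
}
close($in);

# Note: .+ is greedy, so with several word1/word2 pairs it runs to the LAST word2;
# (.+?) would stop at the nearest one instead.
while ($file_content =~ /word1(.+)word2/mg) {
    print "$1\n";    # This is the content between the two words
}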

Re^2: Extracting specific Data
by maheshkumar (Sexton) on Jun 26, 2012 at 13:16 UTC
    I tried the following
    open(FILE, "<Google.txt") or die "Could not open file: $!";
    while (<FILE>) {
        if (m/^Google/) {
            print $_;
        }
    }

    My Google.txt has the following data

    Google Google mahesh kumar sam

    I am confused: what if my file has multiple lines?
      Your code checks one line at a time; look at my code. I join the whole file into one string and run the regex in a while loop with /g, which means match globally, so it finds every occurrence in the file. (The /m only changes what ^ and $ match, so it has no effect here.) word1 is the word where you want to start recording the data and word2 is where you want to stop reading. The (.+) is everything in between those two words (non-inclusive), and $1 holds the data captured by (.+).
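One thing worth spelling out about (.+): it is greedy, so when the two words appear more than once it captures from the first word1 to the last word2. Below is a minimal sketch of a non-greedy variant that returns each chunk separately. The helper name chunks_between and the sample string are made up for illustration; they are not from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every chunk found between $start and $end.
sub chunks_between {
    my ($text, $start, $end) = @_;
    # \Q...\E quotes the words so they are matched literally.
    # .+? is non-greedy: it stops at the nearest $end, not the last one.
    # /s lets . match newlines, so a chunk may span several lines.
    # /g in list context returns all captures at once.
    my @chunks = $text =~ /\Q$start\E(.+?)\Q$end\E/gs;
    return @chunks;
}

# Made-up sample data with two word1/word2 pairs.
my $sample = "word1 first word2 noise word1 second word2";
print "[$_]\n" for chunks_between($sample, 'word1', 'word2');
```

With the greedy (.+) the same sample would give a single capture running from "first" all the way to "second".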
Re^2: Extracting specific Data
by maheshkumar (Sexton) on Jun 26, 2012 at 13:36 UTC
    Well that is working completely fine. Thanks a lot!!!

      Okay, what if I have a file from which I need to extract the data between those two words, but I don't want it to match globally? For instance, the first time word1 and word2 appear it prints the data between them; then word1 appears again somewhere later in the file, and word2 also appears again. So there are chunks of data throughout the file, each between "word1" and "word2"... Nothing happens when I remove the global-match flags (mg) from the end of the regex...

        And what are the requirements when regions overlap? And what if start/end words double?

        You should look into constructs like

        $ perl -ne'/start/i .. /end/i and print' file
        $ perl -ne'/\b start \b/ix ... /\b end \b/ix and print' file
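For anyone who wants the same idea inside a script rather than a one-liner, here is a rough sketch of the range (flip-flop) operator in a loop. The helper name lines_in_ranges and the sample lines are assumptions for illustration only. With two dots the end pattern is also tested on the line that matched the start; with three dots it is first tested on the following line.

```perl
use strict;
use warnings;

# Hypothetical helper: keep every line from a "start" line through the
# next "end" line, for each such region in the input.
sub lines_in_ranges {
    my ($start_re, $end_re, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        # In scalar context .. is the flip-flop operator: false until the
        # left side matches, then true up to and including the line where
        # the right side matches.
        # Caveat: the operator keeps its state between calls to this sub,
        # so an unclosed region carries over into the next call.
        push @kept, $line if $line =~ $start_re .. $line =~ $end_re;
    }
    return @kept;
}

# Made-up sample data.
my @sample = ("noise", "start here", "keep me", "end here", "more noise");
print "$_\n" for lines_in_ranges(qr/\bstart\b/i, qr/\bend\b/i, @sample);
```

This prints whole lines, markers included, which is what the one-liners do as well. It does not answer the overlap question; what should happen there is still up to the requirements.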

        And start reading :)


        Enjoy, Have FUN! H.Merijn