Extracting specific Data

maheshkumar has asked for the wisdom of the Perl Monks concerning the following question:

Re: Extracting specific Data by marto (Cardinal) on Jun 26, 2012 at 12:07 UTC
Welcome to the Monastery. This doesn't read like a Perl question, just a requirement: you need to achieve x, y, z using a Perl program. What parts are you having a problem with? If you haven't tried anything yet, solve the problem using pen/pencil and paper, then figure out how to code your solution using Perl. If you are just starting to learn Perl, I suggest you take the time to read:

perlintro
http://learn.perl.org
http://perldoc.perl.org
The Tutorials section of this site

Regarding this site, please read and understand PerlMonks for the Absolute Beginner and How do I post a question effectively?
Re: Extracting specific Data by Athanasius (Archbishop) on Jun 26, 2012 at 12:15 UTC
See also the recent thread Extract Tags between Two strings.

Athanasius <°(((>< contra mundum
Re: Extracting specific Data by jayto (Acolyte) on Jun 26, 2012 at 13:11 UTC
Maybe this will help. It uses a regex to get all the characters between two words:

```perl
use strict;
use warnings;

# Slurp the whole file into one string so a match can span lines
# (joining chomped lines with no separator would glue words together)
open(my $in, '<', 'readbetween.txt') or die "cannot open readbetween.txt: $!";
my $file_content = do { local $/; <$in> };
close($in);

# /g walks through every word1...word2 pair, .+? (non-greedy) stops at the
# nearest word2, and /s lets . match newlines
while ($file_content =~ /word1(.+?)word2/gs) {
    print "$1\n";    # the content between the two words
}
```
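To see why the quantifier matters once word1...word2 occurs more than once in a file, here is a small sketch that runs a greedy `(.+)` and a non-greedy `(.+?)` capture over the same in-memory string. The sample text and the helper sub `between` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text containing two word1...word2 chunks
my $text = "word1 alpha word2 junk word1 beta word2";

# Return the $1 capture of every //g match of $re against $str
sub between {
    my ($str, $re) = @_;
    my @found;
    while ($str =~ /$re/g) {
        push @found, $1;
    }
    return @found;
}

# Greedy: .+ runs to the LAST word2, swallowing the markers in the middle
my @greedy = between($text, qr/word1(.+)word2/);

# Non-greedy: .+? stops at the NEAREST word2, giving one capture per chunk
my @lazy = between($text, qr/word1(.+?)word2/);

print "greedy: [$_]\n" for @greedy;
print "lazy:   [$_]\n" for @lazy;
```

The greedy pattern yields a single capture running from the first word1 to the last word2; the non-greedy one yields one capture per pair.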
Re^2: Extracting specific Data by maheshkumar (Sexton) on Jun 26, 2012 at 13:16 UTC
I tried the following:

```perl
open(FILE, "<Google.txt") or die "Could not open file: $!";
while (<FILE>) {
    # print every line that starts with "Google"
    if (m/^Google/) {
        print $_;
    }
}
close(FILE);
```

My Google.txt has the following data: Google Google mahesh kumar sam

I am confused: what if my file has multiple lines?
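A line-by-line `while` loop like the one in that attempt already copes with a file of many lines: it reads one line per iteration, and the `m/^Google/` test runs against each line in turn. A minimal sketch that checks this, using Perl's in-memory filehandles instead of Google.txt; the sample data and its one-entry-per-line layout are hypothetical, since the original layout of Google.txt was lost:

```perl
use strict;
use warnings;

# Hypothetical stand-in for Google.txt (assumed: one entry per line)
my $data = "Google\nGoogle mahesh\nkumar\nsam Google\n";

# open() on a scalar reference gives an in-memory filehandle
open(my $fh, '<', \$data) or die "Could not open in-memory file: $!";

my @hits;
while (my $line = <$fh>) {
    # ^ anchors at the start of each line, so "sam Google" is not kept
    push @hits, $line if $line =~ /^Google/;
}
close($fh);

print @hits;
```

Only lines that begin with Google are kept; a line that merely contains Google further along is skipped, because `^` anchors at the start of the line.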
Re^3: Extracting specific Data by jayto (Acolyte) on Jun 26, 2012 at 13:24 UTC
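Your code would work for one line; look at my code. I put the regex in a while loop with the /g modifier, which means "match globally": each pass through the loop continues from where the previous match ended, so every match in the string is found. It covers all lines in the file because the whole file was read into a single string first. (The /m modifier only changes what ^ and $ match, so it plays no part here.) word1 is the word where you want to start recording the data, and word2 is where you want to stop. The parenthesized group captures everything between those two words (not including the words themselves), and $1 holds whatever it captured.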
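A short sketch of what each modifier contributes, run on a made-up string whose first chunk spans a line break: `/g` moves the loop from match to match, `/s` lets `.` match a newline, and `/m` only changes where `^` (and `$`) can match.

```perl
use strict;
use warnings;

# Made-up input: the first chunk spans a newline, the second does not
my $str = "word1 a\nb word2\nword1 c word2";

# Without /s, . cannot match "\n", so the multi-line chunk is skipped
my @no_s;
push @no_s, $1 while $str =~ /word1(.+?)word2/g;

# With /s, . matches newlines too, so both chunks are found
my @with_s;
push @with_s, $1 while $str =~ /word1(.+?)word2/gs;

# /m lets ^ match after every newline; it has no effect on .
my $starts_m  = () = $str =~ /^word1/gm;   # lines that start with word1
my $starts_no = () = $str =~ /^word1/g;    # ^ matches only at string start

print scalar(@no_s), " ", scalar(@with_s), " $starts_m $starts_no\n";
```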
Re^2: Extracting specific Data by maheshkumar (Sexton) on Jun 26, 2012 at 13:36 UTC
Well, that is working completely fine. Thanks a lot!!!
Re^3: Extracting specific Data by maheshkumar (Sexton) on Jun 26, 2012 at 13:56 UTC
Okay, what if I had a file from which I had to extract the data between those two words, but I don't want it to match globally? For instance, word1 and word2 appear once and it prints the data between them; then word1 appears again somewhere later in the file, and word2 also appears again. So the file contains several chunks of data, each between "word1" and "word2". Nothing happens when I remove the global-match modifiers (mg) from the end of the pattern...
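One way to collect each chunk separately without holding the whole file in memory is Perl's flip-flop (range) operator in scalar context: `/word1/ .. /word2/` becomes true on a line matching word1 and stays true up to and including the next line matching word2, then resets. A minimal sketch, assuming word1 and word2 sit on their own lines; the marker names and sample input are placeholders:

```perl
use strict;
use warnings;

# Placeholder input: two chunks, each delimited by marker lines
my $input = "noise\nword1\nfirst chunk\nword2\nmore noise\nword1\nsecond\nchunk\nword2\n";
open(my $fh, '<', \$input) or die "cannot open input: $!";

my @chunks;
my $current;
while (my $line = <$fh>) {
    # The flip-flop returns a sequence number: 1 on the start line, and
    # the number on the closing line gets "E0" appended (e.g. "3E0")
    if (my $seq = ($line =~ /word1/) .. ($line =~ /word2/)) {
        if ($seq == 1) {
            $current = '';            # start marker: begin a new chunk
        }
        elsif ($seq =~ /E0$/) {
            push @chunks, $current;   # end marker: chunk is complete
        }
        else {
            $current .= $line;        # a line between the markers
        }
    }
}
close($fh);

print "--- chunk ---\n$_" for @chunks;
```

Because `..` also tests the right-hand side on the line that opened the range, a single line containing both markers would open and close a range at once; this sketch does not handle that case.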
Re^4: Extracting specific Data by Tux (Canon) on Jun 26, 2012 at 14:33 UTC
Re: Extracting specific Data by flexvault (Monsignor) on Jun 26, 2012 at 12:44 UTC
Welcome maheshkumar,

Two things you may want to search on:

1. "Slurp a file", i.e. reading the entire file into a string in memory. I use 'read' or 'sysread', but there are other techniques.
2. The 'index' function, to find the exact location of substrings.

If the file is larger than available memory, you can use 'read' or 'sysread' in a loop, getting chunks of the file that will fit in available memory. This may be more complicated, but you don't have to worry about line endings.

Good Luck!

"Well done is better than well said." - Benjamin Franklin
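A minimal sketch of the slurp-plus-`index` approach, assuming the markers are plain strings (so no regex escaping is involved). For the slurp it uses the `local $/` idiom rather than 'read' or 'sysread'; the sub name `extract_between` and the sample text are illustrative, not from the thread:

```perl
use strict;
use warnings;

# Return every substring found between $start and the next $end in $text,
# scanning with index() instead of a regex
sub extract_between {
    my ($text, $start, $end) = @_;
    my @found;
    my $pos = 0;
    while ((my $s = index($text, $start, $pos)) >= 0) {
        my $from = $s + length $start;       # first char after start marker
        my $e = index($text, $end, $from);   # nearest end marker after it
        last if $e < 0;                      # unterminated chunk: stop
        push @found, substr($text, $from, $e - $from);
        $pos = $e + length $end;             # resume after the end marker
    }
    return @found;
}

# Slurp: with $/ undefined, a single readline returns the whole file
my $sample = "x word1 one word2 y\nword1 two\nlines word2 z";
open(my $fh, '<', \$sample) or die "cannot open: $!";
my $content = do { local $/; <$fh> };
close($fh);

my @parts = extract_between($content, 'word1', 'word2');
print "[$_]\n" for @parts;
```

Since `index` works on raw positions, line endings never come into it: a chunk that spans several lines is returned with its newlines intact.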