Storing String from Line Before Regex Match

Nico has asked for the wisdom of the Perl Monks concerning the following question:

Hello again everyone!

I always end up finding myself coming straight back to PerlMonks when I'm stumped because everyone is always so helpful!

Right now I'm attempting to analyze a text file. This text file (I will give an example) has multiple "sections" separated by blank lines ("\n\n"). These sections contain a bunch of data I don't care about, but there are a few lines that I need to grab. Here is where my problem lies.

Example Text File

First Name: John
Last Name: Doe
Occupation: Network Administrator
Location: West Coast

First Name: Jane
Last Name: Doe
Occupation: Human Resources
Location: East Coast

First Name: James
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA

I have been trying to use regex to search for a string, for example "Central USA" and then use that to match the "First Name" and "Last Name" lines and CAPTURE their names.

I attempted to use a regex "lookbehind", but I can't do that because the lookbehind would have to be variable in length: I don't know the length of the first or last name, and I have to account for that. I have been attempting to use http://regexstorm.net/tester to accomplish this.

When I don't use a lookbehind, my regex search picks up the first "First Name" and "Last Name" lines in the file, regardless of whether they are near where I matched the "Location" field. This makes sense, but I want it to grab the "First Name" and "Last Name" lines that come right before "Central USA".

Should I be going at this a different way?

Example Code

if ($line =~ /First Name:\s+([A-Za-z0-9 _ ( )]*).*?Last Name:\s+([A-Za-z0-9 _ ( )]*).*?Location: Central USA/s) {
     print $line;
}
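To see why the regex above returns the wrong person, here is a small harness. It is a sketch that assumes the whole file has already been read into one string; the variable names `$text`, `$first` and `$last` are made up for the illustration. The regex is the one from the question, with only the line wrap repaired:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample file from the question, held in one string.
my $text = <<'END';
First Name: John
Last Name: Doe
Occupation: Network Administrator
Location: West Coast

First Name: Jane
Last Name: Doe
Occupation: Human Resources
Location: East Coast

First Name: James
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA
END

my ($first, $last);

# The question's regex: /s lets .*? run across newlines, including the
# blank lines that separate the sections.
if ($text =~ /First Name:\s+([A-Za-z0-9 _ ( )]*).*?Last Name:\s+([A-Za-z0-9 _ ( )]*).*?Location: Central USA/s) {
    ($first, $last) = ($1, $2);
}

# The engine tries starting positions from left to right, and the first
# one that can succeed is John's "First Name" line, so this prints "John Doe".
print "$first $last\n";
```

The lazy `.*?` only keeps each gap as short as possible from the position where the match started; it does not make the engine prefer a later starting point, which is why the match begins at John's record.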

As always, any help would be greatly appreciated!


Re: Storing String from Line Before Regex Match by toolic (Bishop) on Mar 31, 2016 at 18:32 UTC
A different approach is to read the file as records separated by a blank line. Store the data into a hash for each record, then print out only what you need. One benefit is that this method is independent of the order of the lines within each record.

use warnings;
use strict;

$/ = "\n\n";    # read one blank-line-separated record at a time
while (<DATA>) {
    my %data;
    for my $line (split /\n/) {
        # "Key: value" -> key and value; there is no space before the colon
        my ($k, $v) = split /\s*:\s*/, $line, 2;
        $data{$k} = $v;
    }
    print "$data{'First Name'} $data{'Last Name'}\n" if $data{Location} eq 'Central USA';
}

__DATA__
First Name: John
Last Name: Doe
Occupation: Network Administrator
Location: West Coast

First Name: Jane
Last Name: Doe
Occupation: Human Resources
Location: East Coast

First Name: James
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA
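If the blank lines in the real file are not always exactly one `\n\n`, Perl's paragraph mode may be a safer record separator. The sketch below is a variant, not toolic's code: it sets `$/` to the empty string (which, per perlvar, treats two or more consecutive empty lines as a single empty line), splits each line only at the first colon, and wraps the loop in a hypothetical `names_for_location` sub. The sample data deliberately has two blank lines before the last record:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one record per blank-line-separated block.
sub names_for_location {
    my ($fh, $wanted) = @_;
    my @found;
    local $/ = '';    # paragraph mode; the old $/ is restored on return
    while (my $record = <$fh>) {
        my %data;
        for my $line (split /\n/, $record) {
            # Split "Key: value" on the first colon only.
            my ($k, $v) = split /\s*:\s*/, $line, 2;
            $data{$k} = $v if defined $v;
        }
        push @found, "$data{'First Name'} $data{'Last Name'}"
            if defined $data{Location} && $data{Location} eq $wanted;
    }
    return @found;
}

# Sample data with two blank lines before the last record on purpose.
my $text = <<'END';
First Name: John
Last Name: Doe
Occupation: Network Administrator
Location: West Coast

First Name: Jane
Last Name: Doe
Occupation: Human Resources
Location: East Coast


First Name: James
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA
END

# An in-memory filehandle stands in for the real input file here.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @names = names_for_location($fh, 'Central USA');
print "$_\n" for @names;
```

Using `local` keeps the paragraph-mode setting confined to the sub, so other reads in the program still see the normal line-by-line `$/`.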
Re: Storing String from Line Before Regex Match by haukex (Archbishop) on Mar 31, 2016 at 19:23 UTC
Hi Nico,

I like toolic's approach better, but in the spirit of TIMTOWTDI, here is a regex that works on the whole file read into $line:

my $re = qr/
    First\ Name: \s+ (.+)\n
    Last\ Name: \s+ (.+)\n
    (?:.+\n)*
    Location:\ Central\ USA\n
/x;
while ($line=~/$re/g) {
    print "<$1> <$2>\n";
}

Note the use of the /x modifier to make the regex more readable. Also, I removed the `/s` modifier, so that the dot `.` doesn't match newlines; since `.+` needs at least one character, `(?:.+\n)*` cannot step over the empty line between sections. I think this approach is probably a little less robust than splitting the input file on empty lines, but if you're sure of the formatting of the input files this should still work.

Hope that helps,
-- Hauke D
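If the location to search for comes from a variable rather than being hard-coded, it can be interpolated into haukex's pattern with `\Q...\E`. The sketch below wraps the regex in a hypothetical `names_at` sub; the sub name and the test data are made up for the example. Because of the `/x` modifier a bare space in the pattern is ignored, so it is quotemeta's escaping of the space in a value such as "Central USA" that keeps the space significant:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: haukex's regex with the location passed in.
sub names_at {
    my ($text, $location) = @_;
    # \Q...\E escapes regex metacharacters and spaces in $location,
    # so the value is matched literally even under /x.
    my $re = qr/
        First\ Name: \s+ (.+)\n
        Last\ Name:  \s+ (.+)\n
        (?:.+\n)*                 # remaining non-empty lines of the section
        Location:\ \Q$location\E\n
    /x;
    my @names;
    while ($text =~ /$re/g) {
        push @names, "$1 $2";
    }
    return @names;
}

my $text = <<'END';
First Name: John
Last Name: Doe
Occupation: Network Administrator
Location: West Coast

First Name: James
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA

First Name: Another
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA
END

print "$_\n" for names_at($text, 'Central USA');
```

As in haukex's version, the pattern expects a newline after the location, so the last line of the input must end with one.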
Re: Storing String from Line Before Regex Match by Laurent_R (Canon) on Apr 01, 2016 at 06:35 UTC
You might simply read your data in a loop, capture the names as you go, and use these captures only when needed (when location matches "Central USA").
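A minimal sketch of that line-by-line idea follows. It assumes the field labels always start the line exactly as in the sample, and, as a choice made for this sketch, it clears the remembered names at each blank line so that a section missing a name cannot inherit the previous one. The `names_in_location` sub is a hypothetical name, and an in-memory filehandle stands in for the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remember the most recent names while reading line by
# line, and report them when a matching Location line turns up.
sub names_in_location {
    my ($fh, $wanted) = @_;
    my ($first, $last);
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^\s*$/) {
            # A blank line ends the section: forget the names seen so far.
            ($first, $last) = (undef, undef);
        }
        elsif ($line =~ /^First Name:\s*(.*)/) {
            $first = $1;
        }
        elsif ($line =~ /^Last Name:\s*(.*)/) {
            $last = $1;
        }
        elsif ($line =~ /^Location:\s*(.*)/ && $1 eq $wanted) {
            # Only now are the captured names actually used.
            push @found, join ' ', grep { defined } $first, $last;
        }
    }
    return @found;
}

my $text = <<'END';
First Name: John
Last Name: Doe
Occupation: Network Administrator
Location: West Coast

First Name: Jane
Last Name: Doe
Occupation: Human Resources
Location: East Coast

First Name: James
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @names = names_in_location($fh, 'Central USA');
print "$_\n" for @names;
```

This never holds more than one line in memory, which may matter if the real file is large.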
Re: Storing String from Line Before Regex Match by tybalt89 (Monsignor) on Apr 01, 2017 at 20:57 UTC
The .* lets your match cross over section boundaries. ${stayinsection}* acts just like .* but will not allow crossing over your section boundary of "\n\n". Just a slightly advanced regex trick :)

#!/usr/bin/perl
# http://perlmonks.org/?node_id=1159215
use strict;
use warnings;

# one character, as long as it does not start a "\n\n" section break
my $stayinsection = qr/(?:(?!\n\n).)/s;

$_ = do { local $/; <DATA> };    # slurp the whole input

print "<$1> <$2>\n" while /First Name:\s+([A-Za-z0-9 _ ( )]*)${stayinsection}*Last Name:\s+([A-Za-z0-9 _ ( )]*)${stayinsection}*Location: Central USA/g;

__DATA__
First Name: John
Last Name: Doe
Occupation: Network Administrator
Location: West Coast

First Name: Jane
Last Name: Doe
Occupation: Human Resources
Location: East Coast

First Name: James
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA

First Name: Jane
Last Name: Doe
Occupation: Human Resources
Location: East Coast

First Name: Another
Last Name: Doe
Occupation: Technical Support Engineer
Location: Central USA
Re^2: Storing String from Line Before Regex Match by AnomalousMonk (Archbishop) on Apr 01, 2017 at 23:02 UTC
I notice the character class `[A-Za-z0-9 _ ( )]` with two extra space characters. Just out of idle curiosity, is this done to enhance visual presentation/readability, or for some other reason?

Update: Also: What is the purpose of the `$searchfor` string?

Give a man a fish: `<%-{-{-{-<`
Re^3: Storing String from Line Before Regex Match by tybalt89 (Monsignor) on Apr 02, 2017 at 01:22 UTC
I just left the OP's character class code as it was. I don't know why he had multiple spaces in there. I had tried several different solutions before deciding to just modify the OP's regex. The $searchfor string is left over from an early test version; it should be removed. In fact, I think I'll go do that now. Thanks for the catch.