Match two files using regex

chemshifts has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Match two files using regex by stevieb (Canon) on Jun 02, 2015 at 18:01 UTC
Even though I answered this on StackOverflow, I'll paste my solution here for completeness. Here's one way using split() hackery:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $f1 = 'file1.txt';
    my $f2 = 'file2.txt';

    my @pdb;

    open my $pdb_file, '<', $f2
      or die "Can't open the PDB file $f2: $!";

    while (my $line = <$pdb_file>){
        chomp $line;
        push @pdb, $line;
    }
    close $pdb_file;

    open my $shifts_file, '<', $f1
      or die "Can't open the SHIFTS file $f1: $!";

    while (my $line = <$shifts_file>){
        chomp $line;
        my $pdb_line = shift @pdb;

        # - inner split: get the third element from the $pdb_line
        # - outer split: get the first element (character) from the
        #   result of the inner split
        my $criteria = (split('', (split('\s+', $pdb_line))[2]))[0];

        # - compare the 2nd element of the file1.txt line against
        #   the above split() operations
        if ((split('\s+', $line))[1] eq $criteria){
            print "$pdb_line\n";
        }
        else {
            print "** >$pdb_line< doesn't match >$line<\n";
        }
    }

Files:

file1.txt (note I changed line two to ensure a non-match worked):

    1 H 35
    1 A 22
    1 H 20

file2.txt:

    A 1 HB2 MET 1
    A 2 CA MET 1
    A 3 HA MET 1

Output:

    ./app.pl
    A 1 HB2 MET 1
    **>A 2 CA MET 1< doesn't match >1 A 22<
    A 3 HA MET 1

-stevieb
Re^2: Match two files using regex by Anonymous Monk on Jun 02, 2015 at 18:06 UTC
It was correct, and I appreciate you posting it, as I got the output I needed. However, I would just like to see where I went wrong with my code.
Re^3: Match two files using regex by stevieb (Canon) on Jun 02, 2015 at 18:10 UTC
Gotcha. Going forward, it's always best to state up front that you've cross-posted and that you're just looking for further advice. No harm done. I'll have another look at your original code later on if nobody else gets a chance to debug it. Cheers, -stevieb
Re^4: Match two files using regex by chemshifts (Initiate) on Jun 02, 2015 at 18:13 UTC
Re^2: Match two files using regex by chemshifts (Initiate) on Jun 02, 2015 at 18:08 UTC
It was correct, and I appreciate you posting it, as I got the output I needed. However, I would just like to see where I went wrong with my code.
Re^2: Match two files using regex by Anonymous Monk on Jun 03, 2015 at 05:18 UTC
Actually, it doesn't look correct. The output lines are supposed to begin with a number, not a letter, and the output is missing the values from the third column of the first file, which should be the fourth column of the output.
Re^3: Match two files using regex by chemshifts (Initiate) on Jun 03, 2015 at 14:34 UTC
I was able to get the correct output by adding these variables to the print command.
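The thread does not show which variables were added. A minimal sketch of what such a change might look like, assuming the wanted layout is residue number, residue name, atom name, shift value (the layout in the "output wanted" block posted elsewhere in this thread). `format_pair` is a hypothetical helper name, not something from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: combine one shifts line with its paired PDB line.
# Returns the reordered output line, or undef when the atom letters differ.
sub format_pair {
    my ($shift_line, $pdb_line) = @_;

    # split ' ' splits on runs of whitespace and ignores leading blanks
    my ($res_num, $letter, $shift)      = split ' ', $shift_line;
    my (undef, undef, $atom, $res_name) = split ' ', $pdb_line;

    return unless defined $shift && defined $res_name;
    return unless substr($atom, 0, 1) eq $letter;

    # Assumed column order: residue number, residue name, atom, shift
    return "$res_num $res_name $atom $shift";
}

my $out = format_pair('1 H 35', 'A 1 HB2 MET 1');
print defined $out ? "$out\n" : "no match\n";
```

Naming the fields once with a list assignment keeps the print statement readable and avoids repeating the nested split() calls.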
Re: Match two files using regex by stevieb (Canon) on Jun 02, 2015 at 17:49 UTC
What was wrong with my StackOverflow answer that was marked as correct? -stevieb
Re: Match two files using regex by Anonymous Monk on Jun 02, 2015 at 18:41 UTC
It's just a simple one-liner

    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1128822
    use strict;
    use warnings;

    $_ = <<END; # input
    1 H 35
    1 C 22
    2 H 20
    2 C 30

    A 1 HB2 MET 1
    A 2 CA  MET 1
    A 3 HA  ASP 2
    A 4 CA  ASP 2
    END

    =output wanted

    1 MET HB2 35
    1 MET CA 22
    2 ASP HA 20
    2 ASP CA 30

    =cut

    print "$1 $5 $4 $3\n" while /^(\S+)\s+(\w)\s+(\S+)(?=.*\n\n.*^\S+\s+\S+\s+(\2..)\s+(\S+)\s+\1)/gms;

:)
Re: Match two files using regex by GotToBTru (Prior) on Jun 02, 2015 at 19:21 UTC
You declare my $value inside the loop, so that variable will cease to exist once the loop exits. You need to move your test inside the loop, and don't re-declare the variable. That's the first problem I see. Dum Spiro Spero
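The original script is not shown in this thread, so the following is only a sketch of the scoping issue described above. The sample lines and the `count_hits` name are assumptions made for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ('1 H 35', '1 A 22', '1 H 20');

# Problem pattern: $value is declared with my inside the loop body,
# so it no longer exists once the loop ends. A test placed after the
# loop cannot see it (under use strict this is a compile-time error):
#
#   for my $line (@lines) { my $value = (split ' ', $line)[1]; }
#   print "hit\n" if $value eq 'H';   # $value is not in scope here

# Fix: run the test inside the loop, where $value is still in scope.
sub count_hits {
    my ($wanted, @input) = @_;
    my $hits = 0;                           # declared once, outside the loop
    for my $line (@input) {
        my $value = (split ' ', $line)[1];  # fresh variable per iteration
        $hits++ if $value eq $wanted;       # test happens inside the loop
    }
    return $hits;
}

print count_hits('H', @lines), "\n";
```

Anything that must survive the loop (here the counter `$hits`) is declared before the loop; anything that only matters per line stays inside it.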
Re^2: Match two files using regex by chemshifts (Initiate) on Jun 02, 2015 at 19:35 UTC
I see, I guess it's the same for the fields variable as well...
Re^3: Match two files using regex by GotToBTru (Prior) on Jun 02, 2015 at 19:53 UTC
Yep. I strongly suggest you get familiar with the Perl debugger. It will be an enormous help to you as you learn the language. You can inspect the values of variables while the program is running. I frequently use it to try out syntax, especially when dealing with references. Dum Spiro Spero
Re: Match two files using regex by Anonymous Monk on Jun 03, 2015 at 05:23 UTC
Did you forget to mention that the first column of the first file also has to match the last column of the second file?
Re^2: Match two files using regex by chemshifts (Initiate) on Jun 03, 2015 at 14:31 UTC
It should, but that was too complicated for me to write in a script.
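One way the extra requirement could be handled without a single large regex is to index the second file by its last column and look each shift line up in that index. This is a sketch under assumptions, not the approach used by anyone in the thread: `match_shifts` is a hypothetical name, the column meanings are inferred from the sample data posted in this thread, and only the first matching atom per shift line is kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: match shift lines to PDB lines by residue number
# (shift column 1 vs. PDB column 5) and by atom letter (shift column 2
# vs. the first character of PDB column 3).
sub match_shifts {
    my ($shifts, $pdb) = @_;    # two array references of text lines

    # Index the PDB records by residue number.
    my %by_residue;
    for my $line (@$pdb) {
        my (undef, undef, $atom, $res_name, $res_num) = split ' ', $line;
        next unless defined $res_num;    # skip blank or short lines
        push @{ $by_residue{$res_num} }, [ $atom, $res_name ];
    }

    my @out;
    for my $line (@$shifts) {
        my ($res_num, $letter, $shift) = split ' ', $line;
        next unless defined $shift;
        for my $rec (@{ $by_residue{$res_num} || [] }) {
            my ($atom, $res_name) = @$rec;
            next unless substr($atom, 0, 1) eq $letter;
            push @out, "$res_num $res_name $atom $shift";
            last;                        # keep the first matching atom only
        }
    }
    return @out;
}

my @shifts = ('1 H 35', '1 C 22', '2 H 20', '2 C 30');
my @pdb    = ('A 1 HB2 MET 1', 'A 2 CA MET 1', 'A 3 HA ASP 2', 'A 4 CA ASP 2');
print "$_\n" for match_shifts(\@shifts, \@pdb);
```

Because the lookup is keyed on the residue number, the two files do not have to list their lines in the same order, unlike a line-by-line pairing.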
Re^3: Match two files using regex by Anonymous Monk on Jun 03, 2015 at 19:55 UTC
But it's a requirement... The one-liner above does it. Here's an expanded version of my one-liner that I hope makes it easier to understand.

    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1128822
    use strict;
    use warnings;

    $_ = <<END; # input
    1 H 35
    1 C 22
    2 H 20
    2 C 30

    A 1 HB2 MET 1
    A 2 CA  MET 1
    A 3 HA  ASP 2
    A 4 CA  ASP 2
    END

    =output wanted

    1 MET HB2 35
    1 MET CA 22
    2 ASP HA 20
    2 ASP CA 30

    =cut

    #print "$1 $5 $4 $3\n" while /^(\S+)\s+(\w)\s+(\S+)(?=.*\n\n.*^\S+\s+\S+\s+(\2..)\s+(\S+)\s+\1\b)/gms;

    # expanded for clarity

    print "$1 $5 $4 $3\n" while /   # match
      ^         # starting at the start of a line
      (\S+)     # capture first field
      \s+       # skip whitespace
      (\w)      # capture letter in column 2
      \s+       # skip whitespace
      (\S+)     # capture third field
      (?=       # zero-width positive lookahead
        .*      # skipping to
        \n\n    # the empty line separating first and second file
                # this guarantees the patterns above this are in the first file
                # and the patterns below are in the second file
        .*      # skipping to
        ^       # start of a line in second file
        \S+     # skip first field (not needed)
        \s+     # skip whitespace
        \S+     # skip second field (not needed)
        \s+     # skip whitespace
        (\2..)  # capture third field if it starts with previously captured letter (three wide)
        \s+     # skip whitespace
        (\S+)   # capture fourth field
        \s+     # skip whitespace
        \1      # make sure fifth field matches first field of first file.
        \b      # ensure complete match
      )         # end of zero-width lookahead
      /gmsx;    # global, match any start of line, . matches \n, expanded

    __END__
Re^4: Match two files using regex by chemshifts (Initiate) on Jun 03, 2015 at 20:02 UTC