Accessing secondary elements in array

chavanak has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I have two arrays that I have to compare. E.g.:

Array1
```
ATOM   2198  [b]SG  CYS L  51[/b]      39.781 -12.827   5.691  1.00 26.67
ATOM   2199  N   MET L  52      37.845 -15.766   5.722  1.00 33.08
ATOM   2200  CA  MET L  52      38.312 -17.144   5.674  1.00 33.08
ATOM   2201  C   MET L  52      37.329 -18.022   4.901  1.00 33.08
```

Array2
```
ATOM   2212 [b] CB  MET L  52[/b]      17.332  94.112  87.029  1.00  0.00
ATOM   2213  CG  MET L  52      18.017  94.866  88.170  1.00  0.00
ATOM   2214  SD  MET L  52      18.711  96.457  87.699  1.00  0.00
ATOM   2215  CE  MET L  52      17.198  97.429  87.820  1.00  0.00
ATOM   2216  N   ARG L  53      19.331  91.671  87.132  1.00  0.00
```

I am supposed to remove elements from array2 that are not present in array1. But my problem is that I have to judge the differences based only on the bold text above, i.e., the Perl program should compare both arrays, and if the bold part is common to both files, that element should be removed from array2. I don't understand how I can tell Perl to look only at the bold text and ignore the remaining text and numbers. Can anyone help me? Cheers


Re: Accessing secondary elements in array by johngg (Canon) on Nov 04, 2009 at 15:07 UTC
If I've understood correctly, you could construct a hash keyed by the four columns of interest. If you split each line on whitespace, slice out those columns and join them again with some delimiter (I chose a colon), you have the key. I have added a line to your "array 2" data with a common "bold part" so you can see that it gets removed. Note that I have used Data::Dumper so that you can see the lookup hash and the resultant `@array2`. Here's the code.

```perl
use strict;
use warnings;
use Data::Dumper;

# Read the first data set through an in-memory filehandle on a heredoc.
open my $array1FH, q{<}, \ <<'EOF1' or die qq{open: < HEREDOC 1: $!\n};
ATOM   2198  SG  CYS L  51      39.781 -12.827   5.691  1.00 26.67
ATOM   2199  N   MET L  52      37.845 -15.766   5.722  1.00 33.08
ATOM   2200  CA  MET L  52      38.312 -17.144   5.674  1.00 33.08
ATOM   2201  C   MET L  52      37.329 -18.022   4.901  1.00 33.08
EOF1

my @array1 = <$array1FH>;
close $array1FH or die qq{close: < HEREDOC 1: $!\n};

# Key = columns 2..5 (atom, residue, chain, residue number) joined by colons.
my %array1Lookup = map { join( q{:}, ( split )[ 2 .. 5 ] ), 1 } @array1;

# The leading * makes Dumpxs print %array1Lookup = ( ... ) rather than a hashref.
print Data::Dumper->Dumpxs( [ \ %array1Lookup ], [ qw{ *array1Lookup } ] );

open my $array2FH, q{<}, \ <<'EOF2' or die qq{open: < HEREDOC 2: $!\n};
ATOM   2212  CB  MET L  52      17.332  94.112  87.029  1.00  0.00
ATOM   2213  CG  MET L  52      18.017  94.866  88.170  1.00  0.00
ATOM   2214  SD  MET L  52      18.711  96.457  87.699  1.00  0.00
ATOM   2215  CE  MET L  52      17.198  97.429  87.820  1.00  0.00
ATOM   2216  N   ARG L  53      19.331  91.671  87.132  1.00  0.00
ATOM   2217  CA  MET L  52      19.331  91.671  87.132  1.00  0.00
EOF2

# Keep only the lines whose key does not appear in the first data set.
my @array2 = ();
while ( <$array2FH> )
{
    chomp;
    my $lookupKey = join q{:}, ( split )[ 2 .. 5 ];
    next if $array1Lookup{ $lookupKey };
    push @array2, $_;
}
close $array2FH or die qq{close: < HEREDOC 2: $!\n};

print Data::Dumper->Dumpxs( [ \ @array2 ], [ qw{ *array2 } ] );
```

The output.
```
%array1Lookup = (
                  'C:MET:L:52' => 1,
                  'N:MET:L:52' => 1,
                  'CA:MET:L:52' => 1,
                  'SG:CYS:L:51' => 1
                );
@array2 = (
            'ATOM   2212  CB  MET L  52      17.332  94.112  87.029  1.00  0.00',
            'ATOM   2213  CG  MET L  52      18.017  94.866  88.170  1.00  0.00',
            'ATOM   2214  SD  MET L  52      18.711  96.457  87.699  1.00  0.00',
            'ATOM   2215  CE  MET L  52      17.198  97.429  87.820  1.00  0.00',
            'ATOM   2216  N   ARG L  53      19.331  91.671  87.132  1.00  0.00'
          );
```

I hope I have guessed correctly and this is of some help.

Cheers,

JohnGG

Update: Added missing `@` sigil to `array2` in 2nd paragraph.
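The same hash-lookup idea also works when both arrays are already in memory rather than read through filehandles. This is a minimal sketch, assuming the lines are whitespace-separated exactly as in the sample data and that 0-based columns 2 to 5 are the ones to compare. The helper name `lookup_key` and the two-line sample arrays are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the comparison key from the four columns of
# interest (0-based indexes 2..5 after splitting on whitespace).
sub lookup_key {
    my ($line) = @_;
    return join q{:}, ( split q{ }, $line )[ 2 .. 5 ];
}

# Hypothetical sample data in the question's layout.
my @array1 = (
    'ATOM   2198  SG  CYS L  51      39.781 -12.827   5.691  1.00 26.67',
    'ATOM   2200  CA  MET L  52      38.312 -17.144   5.674  1.00 33.08',
);
my @array2 = (
    'ATOM   2212  CB  MET L  52      17.332  94.112  87.029  1.00  0.00',
    'ATOM   2217  CA  MET L  52      19.331  91.671  87.132  1.00  0.00',
);

# Lookup hash of every key that occurs in @array1.
my %in_array1 = map { ( lookup_key($_) => 1 ) } @array1;

# Keep only the @array2 lines whose key is absent from @array1.
my @filtered = grep { !$in_array1{ lookup_key($_) } } @array2;

print "$_\n" for @filtered;
```

Using `grep` over an in-memory array replaces the explicit `while`/`next`/`push` loop; the underlying technique (one hash build, then constant-time lookups) is unchanged.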
Re: Accessing secondary elements in array by JavaFan (Canon) on Nov 04, 2009 at 12:03 UTC
I would use a regexp to extract the "bold" part, and compare that. Which part do you have a problem with? The regexp? Comparing two strings? Intersecting the arrays (for that, see perlfaq4)?
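Once the "bold" parts have been pulled out as plain strings, intersecting or differencing the two lists comes down to a lookup hash plus `grep`, in the spirit of the perlfaq4 entry on array intersection and difference. A minimal sketch; the key strings below are illustrative values built from the question's columns, not real extracted data.

```perl
use strict;
use warnings;

# Illustrative keys, as if already extracted from each array.
my @keys1 = qw( SG:CYS:L:51 N:MET:L:52 CA:MET:L:52 C:MET:L:52 );
my @keys2 = qw( CB:MET:L:52 CA:MET:L:52 N:ARG:L:53 );

# Build a lookup hash from the first list with a hash slice
# (the keys exist; their values are simply undef).
my %in_keys1;
@in_keys1{@keys1} = ();

# Intersection: keys of @keys2 that also occur in @keys1.
my @common = grep { exists $in_keys1{$_} } @keys2;

# Difference: keys of @keys2 that do not occur in @keys1.
my @only_in_keys2 = grep { !exists $in_keys1{$_} } @keys2;

print "common: @common\n";
print "only in second list: @only_in_keys2\n";
```

Both results preserve the order of `@keys2`, which matters if the filtered array is written back out in its original sequence.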
Re^2: Accessing secondary elements in array by chavanak (Initiate) on Nov 04, 2009 at 12:21 UTC
My problem is with the regexp and the intersection. To be honest, I have no idea how to use a regexp for this particular task :( I am very new to Perl, so any pointers to material or example code would be really helpful.
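For data that does not carry literal `[b]` markers, a regexp can capture the four columns of interest directly. This is a minimal sketch, assuming every line starts with `ATOM` and a numeric serial, and that the atom name, residue name, chain and residue number are each separated by whitespace as in the sample lines; real files whose fields run together would need a different pattern.

```perl
use strict;
use warnings;

my $line = 'ATOM   2212  CB  MET L  52      17.332  94.112  87.029  1.00  0.00';

# Skip "ATOM" and the serial number, then capture atom name, residue
# name, chain ID and residue number. A list assignment in scalar
# context yields the number of captures, so "or die" fires on no match.
my ( $atom, $residue, $chain, $resnum ) =
    $line =~ /^ATOM\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)/
    or die "line does not look like an ATOM record: $line\n";

# Join the captured fields into a single comparison key.
my $key = join q{:}, $atom, $residue, $chain, $resnum;
print "$key\n";
```

The resulting key can be used with a lookup hash in exactly the same way as the `split`-based key.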
Re^3: Accessing secondary elements in array by JavaFan (Canon) on Nov 04, 2009 at 12:30 UTC
A few assumptions: "bold text" is the part of the text that is surrounded by `[b]` and `[/b]`; "bold text" isn't nested inside "bold text"; and there's at most one piece of "bold text" per string. Strings not containing any bold text are to be ignored. Then I would do something like (not tested):

```perl
my %seen;
m{\[b\](.*?)\[/b\]} and $seen{$1} = 1 for @array1;
my @result = grep { m{\[b\](.*?)\[/b\]} && !$seen{$1} } @array2;
```
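A self-contained version of the regexp approach that can be run as is. Like the reply, it assumes the markers are literal `[b]`/`[/b]` text. The whitespace normalization is an addition that is not part of the reply: in the sample data the second array's bold part starts with a space (`[b] CB`) while the first does not (`[b]SG`), so the raw captures would never compare equal. The helper name `bold_part` and the extra `CA` line in `@array2` are hypothetical.

```perl
use strict;
use warnings;

my @array1 = (
    'ATOM   2198  [b]SG  CYS L  51[/b]      39.781 -12.827   5.691  1.00 26.67',
    'ATOM   2200  [b]CA  MET L  52[/b]      38.312 -17.144   5.674  1.00 33.08',
);
my @array2 = (
    'ATOM   2212 [b] CB  MET L  52[/b]      17.332  94.112  87.029  1.00  0.00',
    'ATOM   2217 [b] CA  MET L  52[/b]      19.331  91.671  87.132  1.00  0.00',
    'ATOM   2213  CG  MET L  52      18.017  94.866  88.170  1.00  0.00',
);

# Hypothetical helper: return the text between [b] and [/b], trimmed and
# with inner whitespace runs collapsed to one space, or undef if the line
# has no tagged part.
sub bold_part {
    my ($line) = @_;
    return undef unless $line =~ m{\[b\](.*?)\[/b\]};
    my $bold = $1;
    $bold =~ s/^\s+|\s+$//g;
    $bold =~ s/\s+/ /g;
    return $bold;
}

# Record every normalized bold part found in @array1.
my %seen;
for my $line (@array1) {
    my $key = bold_part($line);
    $seen{$key} = 1 if defined $key;
}

# Keep @array2 lines whose bold part was not seen in @array1; lines with
# no bold part are dropped, matching the "ignore" assumption above.
my @result = grep {
    my $key = bold_part($_);
    defined $key && !$seen{$key};
} @array2;

print "$_\n" for @result;
```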
Re^4: Accessing secondary elements in array by chavanak (Initiate) on Nov 04, 2009 at 13:00 UTC
Re^5: Accessing secondary elements in array by JavaFan (Canon) on Nov 04, 2009 at 13:16 UTC