Extracting string based on comparison to second string?

Replies are listed 'Best First'.
Re: Extracting string based on comparison to second string? by kennethk (Abbot) on May 19, 2010 at 21:36 UTC
What have you tried? What didn't work? See How do I post a question effectively?. In general, we can be more helpful when we can see where you've had difficulty and can gauge your experience. As I understand your spec, you can use regular expressions to locate where in your masks your codes fall and then use the special variables @- and @+ along with substr to extract the appropriate characters. Something like: #!/usr/bin/perl use strict; use warnings; my @searches = qw( .LLL LLLL .RRR RRRR ); my @file = <DATA>; chomp @file; for my $search (@searches) { for my $line (@file) { my ($string, $mask) = split /\s+/, $line, 2; my @matches; while ($mask =~ /\Q$search\E/g) { my $start = $-[0]; my $end = $+[0]; push @matches, substr $string, $start, $end - $start; } local $" = ","; print "$line\t\t@matches\n"; } } __DATA__ THAETURQU .LLL..RRR JURYATAUTIOPW .LLL...LL.RRR TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL [download] Note I used \Q and \E to escape metacharacters since . has special meaning in regular expressions. See perlretut on info on how to build regular expressions.	[reply] [d/l]
Re: Extracting string based on comparison to second string? by ikegami (Patriarch) on May 19, 2010 at 22:38 UTC
Should "LL" result in `THAETURQU .LLL..RRR HA,AE JURYATAUTIOPW .LLL...LL.RRR UR,RY,UT TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL WQ,QN,NQ,MN,NH,HF` [download] or `THAETURQU .LLL..RRR HA JURYATAUTIOPW .LLL...LL.RRR UR,UT TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL WQ,NQ,MN,HF` [download] kennethk assumed the latter. The following handles the former: #!/usr/bin/perl use strict; use warnings; my @patterns = qw( .LLL LLLL .RRR RRRR LL ); chomp( my @file = <DATA> ); for my $pat (@patterns) { print("pattern = $pat\n"); our $string; our @matches; my $re = do { use re 'eval'; qr/ \Q$pat\E (?{ my $start = $-[0]; my $end = $+[0]; push @matches, substr($string, $start, $end - $start); }) (?!) /x }; for my $line (@file) { (local $string, my $mask) = split(/\s+/, $line, 2); local @matches; $mask =~ $re; local $" = ","; print "$line\t@matches\n"; } print("\n"); } __DATA__ THAETURQU .LLL..RRR JURYATAUTIOPW .LLL...LL.RRR TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL [download] pattern = .LLL THAETURQU .LLL..RRR THAE JURYATAUTIOPW .LLL...LL.RRR JURY TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL OWQN,WMNH pattern = LLLL THAETURQU .LLL..RRR JURYATAUTIOPW .LLL...LL.RRR TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL WQNQ,MNHF pattern = .RRR THAETURQU .LLL..RRR URQU JURYATAUTIOPW .LLL...LL.RRR IOPW TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL pattern = RRRR THAETURQU .LLL..RRR JURYATAUTIOPW .LLL...LL.RRR TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL pattern = LL THAETURQU .LLL..RRR HA,AE JURYATAUTIOPW .LLL...LL.RRR UR,RY,UT TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL WQ,QN,NQ,UT,MN,NH,HF [download]	[reply] [d/l] [select]
Re: Extracting string based on comparison to second string? by johngg (Canon) on May 19, 2010 at 23:01 UTC
This solution is similar to kennethk's but uses a look-ahead in the regular expression and pos to find where the match occurred rather than special variables. use strict; use warnings; use 5.010; my @searches = qw{ .LLL LLLL .RRR RRRR }; chomp( my @lines = <DATA> ); foreach my $search ( @searches ) { my $rxSearch = qr{(?=\Q$search\E)}; my $length = length $search; foreach my $line ( @lines ) { my ( $string, $mask ) = split m{\s+}, $line, 2; say qq{Looking for '$search' in '$mask'}; while ( $mask =~ m{$rxSearch}g ) { my $pos = pos $mask; my $ind = q{ } x length $mask; substr $ind, $pos, $length, q{^} x $length; say qq{ Found at offset $pos}; say qq{ $string\n $mask}; say qq{ $ind - @{ [ substr $string, $pos, $length ] }} +; } } } __DATA__ THAETURQU .LLL..RRR JURYATAUTIOPW .LLL...LL.RRR TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL [download] The output. Looking for '.LLL' in '.LLL..RRR' Found at offset 0 THAETURQU .LLL..RRR ^^^^ - THAE Looking for '.LLL' in '.LLL...LL.RRR' Found at offset 0 JURYATAUTIOPW .LLL...LL.RRR ^^^^ - JURY Looking for '.LLL' in 'L..L..LLLL.LL.R..L.LLLL' Found at offset 5 TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL ^^^^ - OWQN Found at offset 18 TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL ^^^^ - WMNH Looking for 'LLLL' in '.LLL..RRR' Looking for 'LLLL' in '.LLL...LL.RRR' Looking for 'LLLL' in 'L..L..LLLL.LL.R..L.LLLL' Found at offset 6 TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL ^^^^ - WQNQ Found at offset 19 TURIWOWQNQQUTYRIOPWMNHF L..L..LLLL.LL.R..L.LLLL ^^^^ - MNHF Looking for '.RRR' in '.LLL..RRR' Found at offset 5 THAETURQU .LLL..RRR ^^^^ - URQU Looking for '.RRR' in '.LLL...LL.RRR' Found at offset 9 JURYATAUTIOPW .LLL...LL.RRR ^^^^ - IOPW Looking for '.RRR' in 'L..L..LLLL.LL.R..L.LLLL' Looking for 'RRRR' in '.LLL..RRR' Looking for 'RRRR' in '.LLL...LL.RRR' Looking for 'RRRR' in 'L..L..LLLL.LL.R..L.LLLL' [download] I hope this is useful. Cheers, JohnGG Update: Corrected typo.	[reply] [d/l] [select]
Re^2: Extracting string based on comparison to second string? by trudy123g (Initiate) on May 20, 2010 at 22:16 UTC
Thanks guys - really kind of you! The output from JohnGG's first example was best, I can deal with that. I've a lot still to learn!	[reply]

How do I post a question effectively?

As I understand your spec, you can use regular expressions to locate where in your masks your codes fall and then use the special variables @- and @+ along with substr to extract the appropriate characters. Something like:

#!/usr/bin/perl
use strict;
use warnings;

my @searches = qw(
                    .LLL
                    LLLL
                    .RRR
                    RRRR
                );
                
my @file = <DATA>;
chomp @file;

for my $search (@searches) {
    for my $line (@file) {
        my ($string, $mask) = split /\s+/, $line, 2;
        my @matches;
        while ($mask =~ /\Q$search\E/g) {
            my $start = $-[0];
            my $end   = $+[0];
            push @matches, substr $string, $start, $end - $start;
        }
        local $" = ",";
        print "$line\t\t@matches\n";
    }
}


__DATA__
THAETURQU                 .LLL..RRR
JURYATAUTIOPW             .LLL...LL.RRR
TURIWOWQNQQUTYRIOPWMNHF   L..L..LLLL.LL.R..L.LLLL
[download]

Note I used \Q and \E to escape metacharacters since . has special meaning in regular expressions. See perlretut on info on how to build regular expressions.

[reply]
[d/l]

THAETURQU                .LLL..RRR                HA,AE
JURYATAUTIOPW            .LLL...LL.RRR            UR,RY,UT
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL  WQ,QN,NQ,MN,NH,HF
[download]

THAETURQU                .LLL..RRR                HA
JURYATAUTIOPW            .LLL...LL.RRR            UR,UT
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL  WQ,NQ,MN,HF
[download]

kennethk assumed the latter. The following handles the former:

#!/usr/bin/perl
use strict;
use warnings;

my @patterns = qw(
    .LLL
    LLLL
    .RRR
    RRRR
    LL
);

chomp( my @file = <DATA> );

for my $pat (@patterns) {
    print("pattern = $pat\n");

    our $string;
    our @matches;
    my $re = do {
        use re 'eval';
        qr/
            \Q$pat\E
            (?{
                my $start = $-[0];
                my $end   = $+[0];
                push @matches, substr($string, $start, $end - $start);
            })
            (?!)
        /x
    };

    for my $line (@file) {
        (local $string, my $mask) = split(/\s+/, $line, 2);
        local @matches;
        $mask =~ $re;
        local $" = ",";
        print "$line\t@matches\n";
    }

    print("\n");
}

__DATA__
THAETURQU                .LLL..RRR
JURYATAUTIOPW            .LLL...LL.RRR
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL
[download]

pattern = .LLL
THAETURQU                .LLL..RRR                THAE
JURYATAUTIOPW            .LLL...LL.RRR            JURY
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL  OWQN,WMNH

pattern = LLLL
THAETURQU                .LLL..RRR
JURYATAUTIOPW            .LLL...LL.RRR
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL  WQNQ,MNHF

pattern = .RRR
THAETURQU                .LLL..RRR                URQU
JURYATAUTIOPW            .LLL...LL.RRR            IOPW
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL

pattern = RRRR
THAETURQU                .LLL..RRR
JURYATAUTIOPW            .LLL...LL.RRR
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL

pattern = LL
THAETURQU                .LLL..RRR                HA,AE
JURYATAUTIOPW            .LLL...LL.RRR            UR,RY,UT
TURIWOWQNQQUTYRIOPWMNHF  L..L..LLLL.LL.R..L.LLLL  WQ,QN,NQ,UT,MN,NH,HF
[download]

[reply]
[d/l]
[select]

This solution is similar to kennethk's but uses a look-ahead in the regular expression and pos to find where the match occurred rather than special variables.

use strict;
use warnings;

use 5.010;

my @searches = qw{
   .LLL
   LLLL
   .RRR
   RRRR
   };
                
chomp( my @lines = <DATA> );

foreach my $search ( @searches )
{
    my $rxSearch = qr{(?=\Q$search\E)};
    my $length   = length $search;
    foreach my $line ( @lines )
    {
        my ( $string, $mask ) = split m{\s+}, $line, 2;
        say qq{Looking for '$search' in '$mask'};

        while ( $mask =~ m{$rxSearch}g )
        {
            my $pos = pos $mask;
            my $ind = q{ } x length $mask;
            substr $ind, $pos, $length, q{^} x $length;
            say qq{     Found at offset $pos};
            say qq{     $string\n     $mask};
            say qq{     $ind - @{ [ substr $string, $pos, $length ] }}
+;
        }
    }
}


__DATA__
THAETURQU            .LLL..RRR
JURYATAUTIOPW            .LLL...LL.RRR
TURIWOWQNQQUTYRIOPWMNHF        L..L..LLLL.LL.R..L.LLLL
[download]

The output.

Looking for '.LLL' in '.LLL..RRR'
     Found at offset 0
     THAETURQU
     .LLL..RRR
     ^^^^      - THAE
Looking for '.LLL' in '.LLL...LL.RRR'
     Found at offset 0
     JURYATAUTIOPW
     .LLL...LL.RRR
     ^^^^          - JURY
Looking for '.LLL' in 'L..L..LLLL.LL.R..L.LLLL'
     Found at offset 5
     TURIWOWQNQQUTYRIOPWMNHF
     L..L..LLLL.LL.R..L.LLLL
          ^^^^               - OWQN
     Found at offset 18
     TURIWOWQNQQUTYRIOPWMNHF
     L..L..LLLL.LL.R..L.LLLL
                       ^^^^  - WMNH
Looking for 'LLLL' in '.LLL..RRR'
Looking for 'LLLL' in '.LLL...LL.RRR'
Looking for 'LLLL' in 'L..L..LLLL.LL.R..L.LLLL'
     Found at offset 6
     TURIWOWQNQQUTYRIOPWMNHF
     L..L..LLLL.LL.R..L.LLLL
           ^^^^              - WQNQ
     Found at offset 19
     TURIWOWQNQQUTYRIOPWMNHF
     L..L..LLLL.LL.R..L.LLLL
                        ^^^^ - MNHF
Looking for '.RRR' in '.LLL..RRR'
     Found at offset 5
     THAETURQU
     .LLL..RRR
          ^^^^ - URQU
Looking for '.RRR' in '.LLL...LL.RRR'
     Found at offset 9
     JURYATAUTIOPW
     .LLL...LL.RRR
              ^^^^ - IOPW
Looking for '.RRR' in 'L..L..LLLL.LL.R..L.LLLL'
Looking for 'RRRR' in '.LLL..RRR'
Looking for 'RRRR' in '.LLL...LL.RRR'
Looking for 'RRRR' in 'L..L..LLLL.LL.R..L.LLLL'
[download]

I hope this is useful.

Cheers,

JohnGG

Update: Corrected typo.

[reply]
[d/l]
[select]

Thanks guys - really kind of you! The output from JohnGG's first example was best, I can deal with that. I've a lot still to learn!

[reply]