ashnator has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have to parse File 1 into a hash (key => position) and then
use each key to locate the matching record in File 2. Once I
have found the position in that record, I have to search
backwards for the word "FAN". Once I locate FAN, I have to
move forward from it in windows of 3 letters until I reach my
position, and then print the window that contains the
position to the output.
File 1 looks like this:

    12345 11
    67890 21

File 2 looks like this:

    12345
    ABCDEFANABCDEFGHIJKLMNOPQRSTVVWXWZ
    67890
    ABACFHAYJAYAFANJAKALAHUSSGSJISUSSKSOWUWSLSS

--------------------END----------------
What I need is this:
Suppose position 11 from File 1 falls on the letter C in
File 2. I then search backwards from there to find "FAN".
Once it is located, I move forward from it in windows of 3
letters and output the window that includes position 11 (C),
so ABC is the output.
I have written a program, but it is not giving the correct
output.
Please help.
    #!/usr/bin/perl -w
    my %href;
    my $fn = "key.txt";
    open(FH, "$fn") || die "Cannot open file";
    while (<FH>) {
        chomp($_);
        $href{$1} = $2 if $_ =~ /(\S+)\s+(\S+)/;
    }
    while (my ($key, $value) = each(%href)) {
        #print $key. ", ". $value."\n";
    }
    open(FD,"<check.txt") || die("Can't open: $!");
    $/ = '>';
    while ( <FD> ) {
        chomp;
        next unless ( s{ \A (\S+) \s+ (?= \d ) }{}xms and exists( $href{$1} ));
        my $name = $1;
        my $position = $2;
        my @numbers = split /\w+/;
        my $one_number = $numbers[$href{$name} - 1];
        #if ( $one_number >= $quality ) {
        print "$name\t\t$href{$name}\t$one_number\n";
        #print F1 "$name\t\t$href{$name}\t$one_number\n";
        # }
    }
    close FD;
    #close F1;

20081224 Janitored by Corion: Restored content

Re: Regex problem
by ikegami (Patriarch) on Dec 20, 2008 at 08:36 UTC
    • Always use use strict;! It finds an error immediately in this case.
    • s{ \A (\S+) \s+ (?= \d ) }{}xms only has one capture, so $2 will always be undef
    • No line in File2 has a space followed by a digit, so s{ \A (\S+) \s+ (?= \d ) }{}xms will never match.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $qfn1 = "File1.txt";
    my $qfn2 = "File2.txt";

    my %positions;
    {
        open(my $fh, '<', $qfn1)
            or die("Cannot open file \"$qfn1\": $!\n");
        while (<$fh>) {
            my ($key, $pos) = split /\s+/;
            $positions{$key} = $pos;
        }
    }

    {
        open(my $fh, '<', $qfn2)
            or die("Cannot open file \"$qfn2\": $!\n");
        for (;;) {
            defined( my $key  = <$fh> ) or last;
            defined( my $text = <$fh> ) or last;
            chomp($key);
            chomp($text);
            defined( my $pos = $positions{$key} ) or next;
            $pos = $pos - 1 - 3;
            $pos >= 0 or next;
            my ($str) = $text =~ /^.{0,$pos}FAN(.{3})/ or next;
            print("$key: $str\n");
        }
    }

    12345: ABC
    67890: JAK
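
The arithmetic in `$pos = $pos - 1 - 3` is easy to miss, so here is a minimal sketch of the same match wrapped in a helper. The name `triplet_after_fan` is made up for illustration and is not part of the code above. Subtracting 1 turns the 1-based position from File 1 into a 0-based offset, and subtracting 3 leaves room for FAN itself, so `.{0,$max_start}` limits how far into the line FAN may start. Because the quantifier is greedy, the last FAN that still fits is the one that matches.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the three characters
# that follow the last "FAN" ending before the 1-based position $pos,
# or undef when there is no such match.
sub triplet_after_fan {
    my ($text, $pos) = @_;

    # -1: 1-based position -> 0-based offset; -3: room for "FAN" itself.
    my $max_start = $pos - 1 - 3;
    return if $max_start < 0;

    # Greedy .{0,N} prefers the FAN that starts furthest to the right.
    my ($str) = $text =~ /^.{0,$max_start}FAN(.{3})/;
    return $str;
}

print triplet_after_fan("ABCDEFANABCDEFGHIJKLMNOPQRSTVVWXWZ", 11), "\n";
print triplet_after_fan("ABACFHAYJAYAFANJAKALAHUSSGSJISUSSKSOWUWSLSS", 21), "\n";
```

Note that this always returns the triplet directly after FAN. It does not step forward to the triplet that contains the position.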
Re: Regex problem
by ig (Vicar) on Dec 20, 2008 at 16:08 UTC

    As an alternative to the regular expression ikegami showed you, you can use rindex and substr to locate and extract your text. You might find this a little easier to understand (after you read the manual pages). For example:

    use strict;
    use warnings;

    my $pos = 21;
    my $str = "ABCDEFANABCDEFGHIJKLMNOPQRSTVVWXWZ";
    my $pos_fan = rindex($str, "FAN", $pos);
    print substr($str, $pos_fan + 3, 3) . "\n";

    produces

    ABC
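
Two details of `rindex` are worth keeping in mind with the example above: its third argument is a 0-based offset, and it returns -1 when nothing is found. With a return value of -1, `substr($str, $pos_fan + 3, 3)` would quietly return the text at offset 2. The sketch below assumes the File 1 positions are 1-based, as the "C at position 11" example suggests. The helper name `last_fan_offset` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based offset of the last "FAN" that starts at or
# before the 1-based position $pos, or undef when there is none.
sub last_fan_offset {
    my ($str, $pos) = @_;
    my $i = rindex($str, "FAN", $pos - 1);   # rindex counts from 0
    return $i < 0 ? undef : $i;
}

my $str = "ABCDEFANABCDEFGHIJKLMNOPQRSTVVWXWZ";
for my $pos (5, 6, 21) {
    my $i = last_fan_offset($str, $pos);
    print defined $i
        ? "pos $pos: FAN at offset $i, next three letters: "
            . substr($str, $i + 3, 3) . "\n"
        : "pos $pos: no FAN at or before this position\n";
}
```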
Extracting Locations in 3 window size
by ashnator (Sexton) on Dec 20, 2008 at 16:28 UTC
    Dear Monks,
    I actually want the position from File 1 (say 21 for 67890 in this case) to be placed in a 3-letter window. After finding the word FAN, the triplets should start from that
    pattern and continue until one of them reaches position 21; the triplet that contains that position is what gets printed to the output.

    I have arrived at this code after some changes, but the program prints only the 3 letters just after the FAN word, without taking the position from File 1 into account. For
    example, in the second case (67890) the position is 21, but the output is JAK instead of ALA. That means it always prints the triplet right after FAN instead of the one at the
    original position from File 1.
    File 1 looks like this:

        12345 14
        67890 21

    File 2 looks like this:

        >gi|12345|ref|OM_2343434|Some text ...
        ABCDEFANADEFGHIJKLMNOPQRSTVVWXWZ
        >gi|67890|ref|HM_2338373|Some text ...
        ABACFHAYJAYAFANJAKALAHUSSGSJISUSSKSOWUWSLSS

    The output should be like this:

        12345 11 FGH
        67890 21 ALA

    Also, the File 1 position character should be made bold.
    The Code is like this:-
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $qfn1 = "File1.txt";
    my $qfn2 = "File2.txt";

    my %positions;
    {
        open(my $fh, '<', $qfn1)
            or die("Cannot open file \"$qfn1\": $!\n");
        while (<$fh>) {
            my ($key, $pos) = split /\s+/;
            $positions{$key} = $pos;
        }
    }

    {
        open(my $fh, '<', $qfn2)
            or die("Cannot open file \"$qfn2\": $!\n");
        for (;;) {
            defined( my $key  = <$fh> ) or last;
            defined( my $text = <$fh> ) or last;
            chomp($key);
            chomp($text);
            defined( my $pos = $positions{$key} ) or next;
            $pos = $pos - 1 - 3;
            $pos >= 0 or next;
            my ($str) = $text =~ /^.{0,$pos}FAN(.{3})/ or next;
            print("$key: $str\n");
        }
    }

    20081224 Janitored by Corion: Restored content

      The following produces the output you describe, except for the bolding. For highlighting you might have a look at Term::ANSIColor.

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $qfn1 = "File1.txt";
      my $qfn2 = "File2.txt";

      my %positions;
      {
          open(my $fh, '<', $qfn1)
              or die("Cannot open file \"$qfn1\": $!\n");
          while (<$fh>) {
              my ($key, $pos) = split /\s+/;
              $positions{$key} = $pos;
          }
      }

      {
          open(my $fh, '<', $qfn2)
              or die("Cannot open file \"$qfn2\": $!\n");
          for (;;) {
              defined( my $key  = <$fh> ) or last;
              defined( my $text = <$fh> ) or last;
              chomp($key);
              chomp($text);
              $key = (split(/\|/, $key, 3))[1];
              defined( my $pos = $positions{$key} ) or next;
              my $index = rindex($text, "FAN", $pos);
              next if ( $index < 0 );
              $index += 3 while ( ($index + 3) < $pos );
              print "$key $pos " . substr($text, $index, 3) . "\n";
          }
      }
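
For the bolding, one possible approach is sketched below using `colored` from Term::ANSIColor. The helper name `bold_window` is made up for illustration. It reuses the same `rindex` and stepping logic as the code above and assumes the File 1 position is 1-based. The ANSI escape codes only render as bold on a terminal that supports them; written to a file they show up as raw escape sequences.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(colored);

# Hypothetical helper: returns the 3-letter window (counted in triplets from
# the last "FAN") that contains the 1-based position $pos, with the character
# at that position wrapped in ANSI bold codes. Returns undef if no FAN is found.
sub bold_window {
    my ($text, $pos) = @_;

    my $index = rindex($text, "FAN", $pos);
    return if $index < 0;

    # Step forward three letters at a time until the window covers $pos.
    $index += 3 while $index + 3 < $pos;

    my $window = substr($text, $index, 3);
    my $offset = ($pos - 1) - $index;    # where $pos falls inside the window

    # Leave the window unchanged if the position is not inside it.
    return $window if $offset < 0 || $offset >= length $window;

    # Four-argument substr replaces one character with its bold version.
    substr($window, $offset, 1, colored(substr($window, $offset, 1), 'bold'));
    return $window;
}

my $seq = "ABACFHAYJAYAFANJAKALAHUSSGSJISUSSKSOWUWSLSS";
print "67890 21 ", bold_window($seq, 21), "\n";
```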