in reply to Re^11: Read, match string and print
in thread Read, match string and print

No doubt I should learn more Perl (and more about programming in general). I think I managed to handle the print statement. The code is now:
#!/usr/bin/perl
use strict;
use warnings;

my %info;                          # here we collect all information
my @columns = qw(gi version cds);  # The name and order of the columns we want to print
my $data = '/DATA/GenBankFile.gb'; # GenBank file is located at C:\DATA

open INFILE, '<', $data or die "Cannot!\n";
while (<INFILE>) {
    if (m!GI:(\d+)!) {
        if ($info{cds}) {
            # we are in a CDS block
            $info{gi} = $1;
        };
    } elsif (m!^\s+CDS\s+(.*)!) {
        # a new gene information has started
        flush_info();
        # now remember the CDS
        $info{cds} = $1
    } elsif (m!^VERSION.*\w:(\d+)! ) {
        $info{ version } = $1;
    #} else {warn "Ignoring unknown line [$_]\n";};
    };
}

# Output any leftover information:
#flush_info();

sub flush_info {
    # print out all information:
    print "$info{gi}"."\t"."$info{version}"."\t"."$info{cds}\n";
    # and forget the collected information
    %info = ();
};
And here is some part of the output:
Use of uninitialized value $info{"gi"} in string at C:\Perl\bin\wtf14.pl line 32, <INFILE> line 154.
Use of uninitialized value $info{"version"} in string at C:\Perl\bin\wtf14.pl line 32, <INFILE> line 154.
Use of uninitialized value $info{"cds"} in concatenation (.) or string at C:\Perl\bin\wtf14.pl line 32, <INFILE> line 154.
Use of uninitialized value $info{"version"} in string at C:\Perl\bin\wtf14.pl line 32, <INFILE> line 180.
Use of uninitialized value $info{"version"} in string at C:\Perl\bin\wtf14.pl line 32, <INFILE> line 206.
...
Use of uninitialized value $info{"version"} in string at C:\Perl\bin\wtf14.pl line 32, <INFILE> line 2450.
Use of uninitialized value $info{"version"} in string at C:\Perl\bin\wtf14.pl line 32, <INFILE> line 2459.
30061484		join(68351..68408,76646..77058)
30061486		join(123270..123327,126056..126333)
83582801		join(138186..138234,139415..139665)
30061487		complement(join(168527..168759,170216..170264))
197245446		join(207930..207987,209919..210412)
23943927		join(238420..238477,239718..239947)
55769555		complement(join(251848..251908,256609..256727,
21264338		278228..279439
197245445		306569..307516
41327717		join(330288..330476,333854..334279)
144953899		join(368655..368945,371931..372223,376842..377334)
5454168		join(389383..389423,398170..398263,398376..398574,
111185943		join(389402..389423,390525..390669,398170..398263,
47419901		complement(join(419230..419485,419752..419939,
4503095		complement(join(464605..464720,467020..467106,
29570791		complement(join(464605..464720,467020..467106,
48255908		complement(join(464605..464720,467020..467106,
22129777		complement(join(585235..585309,590357..590881))
22129778		complement(join(629358..629561,633620..633829))
197245441		join(629536..629555,633613..634228,642601..643113)
156564358		complement(join(644315..645105,656113..656245))
108389007		complement(join(741670..741882,742345..742468,
46391104		825448..826335
108389127		825448..826335
21328453		825448..826335
83722283		complement(join(853603..853763,854927..855057,
90970328		complement(join(941000..941109,947817..947957,
145611425		complement(join(941000..941109,944578..944763,
145611430		join(1099417..1099545,1106141..1106293,1108069..1108151,
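A likely reason the VERSION column is empty in every row above: in this first script `m!GI:(\d+)!` is tested before `m!^VERSION...!`, and a GenBank VERSION line also contains `GI:` (the sample record later in the thread has `VERSION NC_000023.10 GI:123456789`). Assuming the real file's VERSION lines look like that sample, the GI branch swallows the VERSION line, and for the first record `$info{cds}` is not yet set, so nothing is stored at all. The sketch below (the helper names `classify_gi_first` and `classify_version_first` are hypothetical) only reports which branch each ordering of the same three tests would reach:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same if/elsif order as the first script: the GI test comes first,
# so it wins for any line that contains "GI:".
sub classify_gi_first {
    my ($line) = @_;
    if    ($line =~ m!GI:(\d+)!)           { return "GI branch ($1)" }
    elsif ($line =~ m!^\s+CDS\s+(.*)!)     { return "CDS branch" }
    elsif ($line =~ m!^VERSION.*\w:(\d+)!) { return "VERSION branch ($1)" }
    return "no branch";
}

# The same tests with VERSION checked first, as in the revised script.
sub classify_version_first {
    my ($line) = @_;
    if    ($line =~ m!^VERSION.*\w:(\d+)!) { return "VERSION branch ($1)" }
    elsif ($line =~ m!GI:(\d+)!)           { return "GI branch ($1)" }
    elsif ($line =~ m!^\s+CDS\s+(.*)!)     { return "CDS branch" }
    return "no branch";
}

my $version_line = "VERSION     NC_000023.10  GI:123456789\n";
print "GI first:      ", classify_gi_first($version_line), "\n";
print "VERSION first: ", classify_version_first($version_line), "\n";
```

With the GI test first, the VERSION line reports `GI branch (123456789)`; with the VERSION test first, it reports `VERSION branch (123456789)`.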

Re^13: Read, match string and print
by Corion (Patriarch) on Feb 08, 2010 at 11:09 UTC

    I guess that's because the VERSION information never gets set, or maybe gets set only once. But only you can tell that because you are the only one who has the input file. Consider looking at your input file, printing out each line of the input file, and then also printing out what the script does (ignore, capture GI info, capture CDS info, capture VERSION info), and also outputting the current state as it is kept in %info (using Data::Dumper for example). Then you will understand where your program does the wrong thing.
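Corion's suggestion might look roughly like the following sketch: for each input line it records which branch fired and dumps `%info` with Data::Dumper afterwards. The branch order follows the revised script further down in this thread (VERSION tested first), `%info = ()` stands in for the clearing done by `flush_info()`, and the inline sample record and action labels are illustrative assumptions; to debug the real data, loop over the lines of the GenBank file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
$Data::Dumper::Indent   = 0;   # keep each dump on one line
$Data::Dumper::Terse    = 1;   # drop the "$VAR1 = " prefix

# Illustrative input modeled on the sample record in this thread.
my $sample = <<'END';
VERSION     NC_000023.10  GI:123456789
     CDS             join(11111..222222,333333..444444)
                     /db_xref="GI:55555555"
//
END

my %info;
my @trace;   # each entry: [input line, action taken, %info after the line]

for my $line (split /\n/, $sample) {
    my $action;
    if ($line =~ m!^VERSION.*\w:(\d+)!) {
        $info{version} = $1;
        $action = 'capture VERSION';
    } elsif ($line =~ m!GI:(\d+)!) {
        if ($info{cds}) {
            $info{gi} = $1;
            $action = 'capture GI';
        } else {
            $action = 'GI outside CDS, ignored';
        }
    } elsif ($line =~ m!^\s+CDS\s+(.*)!) {
        %info = ();          # what flush_info() does after printing
        $info{cds} = $1;
        $action = 'capture CDS';
    } else {
        $action = 'ignore';
    }
    push @trace, [ $line, $action, Dumper(\%info) ];
}

printf "%-55s | %-23s | %s\n", @$_ for @trace;
```

On this sample, the dump printed after the CDS line no longer contains a `version` key, which is exactly where the value is lost.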

      Indeed, the culprit is VERSION, which appears only once but needs to be printed on every row along with the other two variables. So the output should look like:
      gi1    VERSION    cds1
      gi2    VERSION    cds2
      ...    ...        ...
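The row layout above (three tab-separated fields per CDS) can also be produced with a hash slice and `join` instead of chained string concatenation; the otherwise unused `@columns` in the first script looks like it was meant for this. A small sketch, where `format_row` is a hypothetical helper name; it only formats a row and does not by itself fix the missing VERSION:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names in output order (the same list as @columns in the first script).
my @columns = qw(gi version cds);

# Join the wanted fields of one record with tabs, using a hash slice.
sub format_row {
    my ($info) = @_;
    return join("\t", @{$info}{@columns}) . "\n";
}

# Illustrative values taken from the sample record posted in this thread.
print format_row({
    gi      => '55555555',
    version => '123456789',
    cds     => 'join(11111..222222,333333..444444)',
});
```

With these example values it prints `55555555`, `123456789` and the `join(...)` location separated by tabs, regardless of the order in which the keys were inserted.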
      Okay, and here is the code:
      #!/usr/bin/perl
      use strict;
      use warnings;

      my %info = ('gi' => "", 'version' => "", 'cds' => ""); # here we collect all information

      sub flush_info {
          # print out all information:
          print "$info{gi}"."\t"."$info{version}"."\t"."$info{cds}\n";
          %info = ();
      };

      my $data = '/DATA/GenBankFile.gb'; # GenBank file is located at C:\DATA
      open (INFILE, '<', $data) or die "Cannot!\n";
      while (<INFILE>) {
          last if m!//$!;
          if (m!^VERSION.*\w:(\d+)! ) {
              $info{version} = $1;
          } elsif (m!GI:(\d+)!) {
              if ($info{cds}) { # we are in a CDS block
                  $info{gi} = $1;
              };
          } elsif (m!^\s+CDS\s+(.*)!) {
              # a new gene information has started
              flush_info();
              # now remember the CDS
              $info{cds} = $1;
          } else {warn "Ignoring unknown line [$_]\n";};
      };
      # Output any leftover information:
      flush_info();
      For this input file:
      LOCUS NC_0000230 600020 bp DNA linear CON 21-APR-2007
      VERSION NC_000023.10 GI:123456789
           CDS             join(11111..222222,333333..444444)
                           /db_xref="GI:55555555"
           CDS             join(66666..7777777,888888..99999)
                           /db_xref="GI:10101010"
      //
      this is what we get:
      Ignoring unknown line [LOCUS NC_0000230 600020 bp DNA linear CON 21-APR-2007
      ]
      Use of uninitialized value $info{"version"} in string at C:\Perl\bin\wtf16.pl line 12, <INFILE> line 5.
      Use of uninitialized value $info{"version"} in string at C:\Perl\bin\wtf16.pl line 12, <INFILE> line 7.
      	123456789	
      55555555		join(11111..222222,333333..444444)
      10101010		join(66666..7777777,888888..99999)
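      The first line is ignored, as expected. Then the script finds the VERSION line and keeps its value. But as soon as the first CDS line calls flush_info(), that value is printed on its own, without waiting for the other two variables, and %info is emptied. Since there is no other VERSION line in the rest of the file, VERSION is never printed again (strictly speaking it is now undefined, so it prints as an empty string along with an "uninitialized value" warning).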

      I thought of two ways to overcome this problem, but either they did not work out or I did not implement them properly.

      1. Within the subroutine, I added a next along with an if condition, as follows:

      sub flush_info {
          # print out all information:
          next print "$info{gi}"."\t"."$info{version}"."\t"."$info{cds}\n" if ($info{gi} or $info{version} or $info{cds} eq undef;
          %info = ();
      };

      The idea was to skip printing unless all three variables are defined. With this I intended to fix the first line of the output (i.e. the line that prints only one variable).

      2. Within the while loop, I tried to keep the VERSION value permanently, but I could not get it to work. I tried using the match operator =~ as an assignment and, alternatively, introducing a fourth variable to hold the VERSION value for printing.
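The two ideas can be combined. A minimal sketch, assuming one VERSION line per record and one GI per CDS: VERSION goes into its own variable (the fourth variable from idea 2), so clearing the per-CDS hash no longer erases it, and the flush routine skips incomplete rows (idea 1). Inside a sub, `return` is the way to leave early, since `next` belongs to loops, and `defined` is the test for undef (`$x eq undef` compares against an empty string and warns). The names `%cds`, `flush_cds` and `@rows` are hypothetical, and the record is read from an in-memory string rather than /DATA/GenBankFile.gb.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input modeled on the sample record posted above.
my $record = <<'END';
LOCUS       NC_0000230    600020 bp    DNA     linear   CON 21-APR-2007
VERSION     NC_000023.10  GI:123456789
     CDS             join(11111..222222,333333..444444)
                     /db_xref="GI:55555555"
     CDS             join(66666..7777777,888888..99999)
                     /db_xref="GI:10101010"
//
END

my $version;   # the "fourth variable": set from VERSION, never cleared below
my %cds;       # per-CDS fields only: gi and cds
my @rows;      # finished output rows

# Finish the current CDS and keep its row only if it is complete.
sub flush_cds {
    my @row = ($cds{gi}, $version, $cds{cds});
    %cds = ();                           # idea 2: $version is not touched
    return if grep { !defined } @row;    # idea 1: skip incomplete rows
    push @rows, join("\t", @row);
}

# Read the in-memory string as if it were a file.
open my $fh, '<', \$record or die "Cannot open record: $!";
while (my $line = <$fh>) {
    chomp $line;
    last if $line =~ m!^//!;                      # end of the record
    if ($line =~ m!^VERSION.*\w:(\d+)!) {
        $version = $1;
    } elsif ($line =~ m!^\s+CDS\s+(.*)!) {
        flush_cds();                              # close the previous CDS, if any
        $cds{cds} = $1;
    } elsif ($line =~ m!GI:(\d+)! && $cds{cds}) {
        $cds{gi} = $1;                            # GI from the current CDS
    }
}
close $fh;
flush_cds();                                      # the last CDS of the record

print "$_\n" for @rows;
```

On the sample this yields two rows, each carrying 123456789 in the middle column. For a file with several records, one would presumably reset `$version` at each `//` line instead of stopping there (the `last` mirrors the revised script), and CDS locations that span several lines are still cut at the first line, as in the output earlier in the thread.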

      No progress. =) I am stuck once again, hooray!