in reply to While loop with nested if statements

If you had some example input, I could perhaps run the code. Right now all I can see are the compiler errors.

However, some things look odd to me:

my $headerHash; $headerHash{$1} = $2;
You have already declared my %headerHash; outside of the loop. Here $headerHash is declared as a scalar, not a hash! Perl keeps separate namespaces for scalars and hashes, so you can have both a scalar and a hash named "headerHash", but in this case I think that is a bad idea because it makes your code very confusing. I suggest you rename this scalar version of $headerHash to something else.

Update: Even if you move my $headerHash; above the while loop, line 28 (print OUTFILE $headerHash, "\n", $seq, "\n";) will still go wrong at run time: as far as I can see, this scalar $headerHash is never assigned any value, so it will be "undef" and Perl will warn about an uninitialized value instead of printing a header.

Maybe you are confused about hash syntax? A hash like %headerHash stores key/value pairs. You set a single element by giving both the key and the value, like $hash{$key} = $value.
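Here is a minimal sketch of that point. The names %header and $header and the sample values are made up for illustration and are not taken from the original script. It shows that a scalar and a hash can share a name without sharing any data, and that a hash element is written with a $ sigil plus {key}:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative names only; they do not come from the original script.
my %header = ();          # the hash: a collection of key/value pairs
my $header = 'a scalar';  # a separate variable that merely shares the name

# A single hash element is addressed as $name{key}:
$header{'Homo sapiens'} = 'MSSDSEMAIF';

# $header without braces is still the scalar and was not touched above.
print "scalar: $header\n";
print "hash element: $header{'Homo sapiens'}\n";
print "number of keys: ", scalar(keys %header), "\n";
```

Giving the two variables different names avoids having to work out from the braces which one is meant.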

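Be careful, something like %hash=55; doesn't do what you think!! It does not give "the hash" a value. It is a list assignment with an odd number of elements, so the old contents are thrown away and you are left with the single key 55 whose value is undef (plus a warning under use warnings). Assigning to keys(%hash) is another surprise: it stores nothing at all and only changes the size of the hash, i.e. more buckets. Consider: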

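#!/usr/bin/perl
use strict;

my %hash;               # defaults to 8 buckets
$hash{a} = 33;          # use one of the buckets
my $buckets = %hash;    # hash in scalar context
print "$buckets\n";     # prints 1/8, 1 of 8 buckets used

keys(%hash) = 32;       # increase size of hash to 32 buckets
$buckets = %hash;
print "$buckets\n";     # prints 1/32, 1 of 32 buckets used

# Note: the used/total figures above are what Perl printed before 5.26.
# From Perl 5.26 on, a hash in scalar context returns the number of keys;
# Hash::Util::bucket_ratio(%hash) reports the old used/total figure.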
Perl starts a new hash with a default of 8 buckets. When it needs to grow the hash, it doubles the number of buckets: 8, 16, 32, 64, etc. I have benchmarked presizing big hashes to large bucket counts to prevent this automatic resizing, but found that it makes almost no difference in performance. Perl is surprisingly efficient at this transparent operation. It is best to just let Perl "do its thing" without trying to overly "help it". I only mention this here to show how a slip in hash assignment syntax can produce some very unexpected results.
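For comparison, here is a small sketch of what a plain list assignment to a hash does, as opposed to assigning to keys(%hash). The variable names are illustrative, and the warning is captured in an array through $SIG{__WARN__} only so that the script can show it at the end:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = (a => 33);

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    %hash = 55;    # list assignment with a single element
}

my @keys = keys %hash;
print "keys: @keys\n";
print "value is ", (defined $hash{55} ? "defined" : "undef"), "\n";
print "old key 'a' ", (exists $hash{a} ? "survives" : "is gone"), "\n";
print "warning: $warnings[0]" if @warnings;
```

The assignment replaces the whole hash, so the earlier key a is lost and the only key left is 55 with an undefined value.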

Re^2: While loop with nested if statements
by lairel (Novice) on Apr 03, 2016 at 16:21 UTC

    I've added an example of the input and output to my original post, and my code currently looks like this. I am still not sure I am using the hash correctly, and the hash key is really throwing me. I'm not getting any errors with this code, but my output file is empty.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use diagnostics;

    open( INFILE, "<", 'myosin.fasta') or die $!; #open original myosin.fasta for reading
    open( OUTFILE, ">", 'modifiedHeaders.fasta') or die $!; #open/create new fasta for writing

    my %headerHash; #create hash for the header
    my $seq;
    my $header;

    #loop through old file
    while (<INFILE>){
        chomp;
        my $line = $_;
        # use if/then to separate headers from seq, while regex to select only species name in header
        if ($line =~ /^>/){
            $header = $line;
            if ($header =~ /(>gi.*)\[(.+)\](>g.*)\[(.+)\]/){
                my $headerHash = $2;
                $headerHash{$2} = $2;
            }
        }
        else {
            $seq = $line;
        }
    }
    close INFILE;

    for my $key(keys %headerHash){
        print OUTFILE $headerHash{$key}, "\n", $seq, "\n";
    }
      I still see multiple problems. Can you show what you expect the output lines to be? I see there is now some input in the OP. This is what I get now:
      Global symbol "$headerHash" requires explicit package name at C:\Projects_Perl\anotherfastathing.pl line 30.
      Execution of C:\Projects_Perl\anotherfastathing.pl aborted due to compilation errors (#1)
          (F) You've said "use strict" or "use strict vars", which indicates
          that all variables must either be lexically scoped (using "my" or "state"),
          declared beforehand using "our", or explicitly qualified to say
          which package the global variable is in (using "::").

      Uncaught exception from user code:
          Global symbol "$headerHash" requires explicit package name at C:\Projects_Perl\anotherfastathing.pl line 30.
      Execution of C:\Projects_Perl\anotherfastathing.pl aborted due to compilation errors.

      Process completed with exit code 255
      Update: OK, this looks like what you want. Why not something like this?
      #!/usr/bin/perl
      use warnings;
      use strict;
      use diagnostics;

      my $firstline = <DATA>;
      my ($species) = $firstline =~ /\[(.+?)\]/;
      print "$species\n";
      while (<DATA>) {print;}

      =Prints:
      Homo sapiens
      MSSDSEMAIFGEAAPFLRKSERERIEAQNKPFDAKTSVFVVDPKESFVKATVQSREGGKVTAKTEAGATVTVKDDQVFPM
      NPPKYDKIEDMAMMTHLHEPAVLYNLKERYAAWMIYTYSGLFCVTVNPYKWLPVYNAEVVTAYRGKKRQEAPPHIFSISD
      NAYQFMLTDRENQSILITGESGAGKTVNTKRVIQYFATIAVTGEKKKEEVTSGKMQGTLEDQIISANPLLEAFGNAKTVR
      NDNSSRFGKFIRIHFGTTGKLASADIETYLLEKSRVTFQLKAERSYHIFYQIMSNKKPDLIEMLLITTNPYDYAFVSQGE
      ITVPSIDDQEELMATDSAIEILGFTSDERVSIYKLTGAVMHYGNMKFKQKQREEQAEPDGTEVADKAAYLQNLNSADLLK
      ALCYPRVKVGNEYVTKGQTVQQVYNAVGALAKAVYDKMFLWMVTRINQQLDTKQPRQYFIGVLDIAGFEIFDFNSLEQLC
      INFTNEKLQQFFNHHMFVLEQEEYKKEGIEWTFIDFGMDLAACIELIEKPMGIFSILEEECMFPKATDTSFKNKLYEQHL
      GKSNNFQKPKPAKGKPEAHFSLIHYAGTVDYNIAGWLDKNKDPLNETVVGLYQKSAMKTLALLFVGATGAEAEAGGGKKG
      GKKKGSSFQTVSALFRENLNKLMTNLRSTHPHFVRCIIPNETKTPGAMEHELVLHQLRCNGVLEGIRICRKGFPSRILYA
      =cut

      __DATA__
      >gi|115527082|ref|NP_005954.3| myosin-1 [Homo sapiens] >gi|226694176|sp|P12882.3|MYH1_HUMAN RecName: Full=Myosin-1; AltName: Full=Myosin heavy chain 1; AltName: Full=Myosin heavy chain 2x; Short=MyHC-2x; AltName: Full=Myosin heavy chain IIx/d; Short=MyHC-IIx/d; AltName: Full=Myosin heavy chain, skeletal muscle, adult 1 [Homo sapiens] >gi|119610411|gb|EAW90005.1| hCG1986604, isoform CRA_b
      MSSDSEMAIFGEAAPFLRKSERERIEAQNKPFDAKTSVFVVDPKESFVKATVQSREGGKVTAKTEAGATVTVKDDQVFPM
      NPPKYDKIEDMAMMTHLHEPAVLYNLKERYAAWMIYTYSGLFCVTVNPYKWLPVYNAEVVTAYRGKKRQEAPPHIFSISD
      NAYQFMLTDRENQSILITGESGAGKTVNTKRVIQYFATIAVTGEKKKEEVTSGKMQGTLEDQIISANPLLEAFGNAKTVR
      NDNSSRFGKFIRIHFGTTGKLASADIETYLLEKSRVTFQLKAERSYHIFYQIMSNKKPDLIEMLLITTNPYDYAFVSQGE
      ITVPSIDDQEELMATDSAIEILGFTSDERVSIYKLTGAVMHYGNMKFKQKQREEQAEPDGTEVADKAAYLQNLNSADLLK
      ALCYPRVKVGNEYVTKGQTVQQVYNAVGALAKAVYDKMFLWMVTRINQQLDTKQPRQYFIGVLDIAGFEIFDFNSLEQLC
      INFTNEKLQQFFNHHMFVLEQEEYKKEGIEWTFIDFGMDLAACIELIEKPMGIFSILEEECMFPKATDTSFKNKLYEQHL
      GKSNNFQKPKPAKGKPEAHFSLIHYAGTVDYNIAGWLDKNKDPLNETVVGLYQKSAMKTLALLFVGATGAEAEAGGGKKG
      GKKKGSSFQTVSALFRENLNKLMTNLRSTHPHFVRCIIPNETKTPGAMEHELVLHQLRCNGVLEGIRICRKGFPSRILYA
        #!/usr/bin/perl -w
        use strict;
        use warnings;

        my %hash = ();
        my $key = '';
        while(<>){
            chomp;
            if($_ =~ /^>gi.*\[(.+)\]\s>gi.*/){
                $key = $1;    # species name from the header line
            }else{
                $hash{$key} .= $_;    # append sequence line to the current species
            }
        }
        foreach(keys %hash){
            print join("\n",$_,$hash{$_}),"\n";
        }
        This does what you need