Re: Skript help needed - RegEx & Hashes
by haukex (Archbishop) on Oct 10, 2018 at 11:00 UTC
|
Welcome to Perl and PerlMonks, PandaRaey!
First, a few general tips:
- It's very good you're using strict and warnings! However, note that pre-declaring your variables at the top of the script like that is not so good, because they are then like global variables. Instead, it's best to declare them at the smallest scope where they are needed. For example, foreach my $folder (@folders) {, while ( my $file = readdir(DIR) ) {, or my $reads = 0;. Skimming your code, I don't see any obvious scoping issues caused by the global variables, but I might have overlooked something.
- Please try to use consistent indentation. perltidy can help.
- I would recommend using the more modern three-argument form of open, and lexical filehandles (my $fh instead of bareword filehandles like FILE, since the latter are global). For example: open my $fh, '<', $filename or die "$filename: $!";
- Always check open for errors, which you do on your first open, but not on the second or third.
The last point may even account for your problem, "I do not get the content I should printed into the Merge-File". Another possibility is that you might want to open that file for appending (">>"), because ">" overwrites the file (Update: I just saw that hippo made the same point.).
Other than that, I have looked at your code, and nothing obvious has jumped out at me yet. A few thoughts: You don't seem to be chomping the lines you read from files (removing the newline at the end), and you could use some of the tips from the Basic debugging checklist, like using Data::Dumper to print out the contents of your variables while your program is running (I recommend setting $Data::Dumper::Useqq=1; to see whitespace better). Also, I see you're doing $tRNAname = $line[0]; $tRNAname = $&;, which doesn't make sense to me because the second assignment will just overwrite the first. Other than that, your expected output seems to depend on your data and algorithm.
The problem is that without sample data, we can't really run the code. It would be best if you could provide a Short, Self-Contained, Correct Example, that is, some small sample input that demonstrates your problem, the expected output for that input, and your actual output, including any error messages you might be getting (all within <code> tags). Also a description of what your algorithm is supposed to be doing would help.
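To illustrate the tips above in one place, here is a minimal sketch (not your script) that combines narrowly scoped variables, a checked three-argument open with a lexical filehandle, chomp, and Data::Dumper. The helper name sum_reads, the in-memory sample string, and the stricter regex are my assumptions for the sake of a runnable example:

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;    # show whitespace such as "\n" explicitly

# Hypothetical helper: add up the numbers on the ">N" lines of an
# already opened filehandle and ignore the sequence lines.
sub sum_reads {
    my ($fh) = @_;
    my $reads = 0;                               # declared where it is needed
    while ( my $line = <$fh> ) {
        chomp $line;                             # remove the trailing newline
        next unless $line =~ /^>([0-9]+)$/;      # only ">digits" lines count
        $reads += $1;
    }
    return $reads;
}

# Assumed sample data; an in-memory file keeps the example self-contained.
my $sample = ">1\nCCTCCTCTACCTCATCCCAGTT\n>1\nGGGTTCGATTCCCGGTCAGGGAT\n";
open my $fh, '<', \$sample or die "in-memory file: $!";
my $reads = sum_reads($fh);
close $fh;

print Dumper($reads);
```

With a real file you would pass "$folder/$file" to open instead of the reference to $sample.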
| [reply] [d/l] [select] |
|
|
All good points (++). One minor comment:
Please try to use consistent indentation.
From my quick eye-parse of the code, PandaRaey is using consistent indentation. It's just that their choice of indentation scheme (Whitesmiths) is unusual to see in Perl.
| [reply] |
|
|
Yes, you're right! (When I skimmed the code, the final closes jumped out at me as looking off, I'm obviously not used to this style.)
Sorry, PandaRaey, this is Perl and There Is More Than One Way To Do It, you're free to use whatever indentation style you like, as long as it's consistent (which it is in this case). If I may make a minor comment though: some more whitespace in statements like foreach$folder(@folders) would make them a little easier to read, IMHO.
| [reply] [d/l] [select] |
Re: Script help needed - RegEx & Hashes
by hippo (Archbishop) on Oct 10, 2018 at 10:58 UTC
|
open(MERGE,">merge"); #open new file to save the new sortet stuff in
You are doing this inside the foreach loop, so each time round the loop the contents of this file get clobbered (i.e. erased) by the new call to open. Use append mode instead:
open (MERGE, '>>merge');
Alternatively you could open the file once before starting the foreach loop and close it once after the end. There are pros and cons to both approaches neither of which should probably concern you today.
Let us know if this solves it for you.
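A minimal sketch of the second approach (open once before the loop, close once after), assuming a temporary directory and made-up folder names in place of the real ones:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for the working directory.
my $dir = tempdir( CLEANUP => 1 );
my $out = "$dir/merge";

# Open once, before the loop; '>' is fine here because the file is
# only opened a single time and nothing gets clobbered mid-run.
open my $merge, '>', $out or die "$out: $!";
for my $folder (qw(UNITAS_a UNITAS_b UNITAS_c)) {    # hypothetical names
    print {$merge} "$folder\n";
}
close $merge or die "$out: $!";

# Read the file back to confirm that every iteration's output survived.
open my $in, '<', $out or die "$out: $!";
my @lines = <$in>;
close $in;
print scalar(@lines), " lines written\n";
```

Re-running this does not accumulate old results, which is one of the differences from append mode.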
| [reply] [d/l] [select] |
Re: Skript help needed - RegEx & Hashes
by PandaRaey (Novice) on Oct 10, 2018 at 12:56 UTC
|
Thank you all already very much for the super fast replies. I will work myself through the tips and tricks and post an update as soon as I can.
Because it was asked for, here is a more detailed description of my problem:
I have several folders which all contain two specific files. One that ends in ".mapped_sequences" and one that always has the same name "unitas.tRF-table.txt".
The mapped_sequences file looks like this, always with a number followed by a gene sequence:
>1
CCTCCTCTACCTCATCCCAGTT
>1
GGGTTCGATTCCCGGTCAGGGAT
The other file looks like this (without the four header lines and just a few example lines as the whole file is a bit big):
source_tRNA	5p-tR-halves (fractionated)	5p-tR-halves (absolute)	5p-tRFs (fractionated)	5p-tRFs (absolute)	3p-tR-halves (fractionated)	3p-tR-halves (absolute)	3p-CCA-tRFs (fractionated)	3p-CCA-tRFs (absolute)	3p-tRFs (fractionated)	3p-tRFs (absolute)	tRF-1 (fractionated)	tRF-1 (absolute)	tRNA-leader (fractionated)	tRNA-leader (absolute)	misc-tRFs (fractionated)	misc-tRFs (absolute)
MT-TL2	0	0	0	0	0	0	0	0	0	0	0	0	6.16666666666667	18	0	0
MT-TL2-ENSG00000210191.1	1	1	4	4	0	0	0	0	0	0	0	0	0	0	124	124
MT-TM	0	0	0	0	0	0	0	0	0	0	6	6	0	0	0	0
MT-TM-ENSG00000210112.1	13	13	9	9	0	0	0	0	0	0	0	0	0	0	40.8333333333333	43
MT-TN	0	0	0	0	0	0	0	0	0	0	1.5	3	2	2	0	0
MT-TN-ENSG00000210135.1	0	0	1	1	0	0	0	0	0	0	0	0	0	0	25.25	26
MT-TP	0	0	0	0	0	0	0	0	0	0	2	2	0	0	0	0
tRNA-Ala-AGC-1-1	0	0	0.142857142857143	1	0	0	0	0	0	0	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-11-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-15-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4.26111111111111	21
tRNA-Ala-AGC-2-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-2-2	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-3-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-4-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	1.17407407407407	13
tRNA-Ala-AGC-5-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-6-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	2
tRNA-Ala-AGC-7-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-8-1	0	0	0.5	1	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-8-2	0	0	0.5	1	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-9-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.511111111111111	3
tRNA-Ala-AGC-9-2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.511111111111111	3
tRNA-Ala-CGC-1-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	5.84074074074074	21
tRNA-Ala-CGC-2-1	0	0	5.75	46	0	0	0	0	0	0	19	19	1	1	4.75740740740741	21
tRNA-Ala-CGC-3-1	0	0	5.75	46	0	0	0	0	0	0	10	10	0	0	6.07407407407407	8
tRNA-Ala-CGC-4-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.28835978835979	11
tRNA-Ala-TGC-1-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	5.12645502645503	24
tRNA-Ala-TGC-2-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	5.12645502645503	24
tRNA-Ala-TGC-3-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	29.2931216931217	74
tRNA-Ala-TGC-3-2	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	29.2931216931217	74
tRNA-Ala-TGC-4-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	95.7097883597884	113
tRNA-Ala-TGC-5-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	2.20978835978836	17
tRNA-Ala-TGC-6-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	0.0740740740740741	2
tRNA-Ala-TGC-7-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	2.20978835978836	17
tRNA-Arg-ACG-1-1	0	0	0.2	2	0	0	0.142857142857143	1	0	0	13	13	0	0	9.83333333333333	95
So the first task was to add up all of the numbers from the first file (the reads), which is the one thing I got to work, and it does so very well for all the files.
The next task would be to re-calculate the numbers in the 2nd file (number/reads*1000000) and afterwards sum the numbers together. As you can see from the 2nd code example, there are multiple lines for the same amino-acid combination; all values for one combination should be summed up and saved in a new, more organized file (the merged file, with only the columns for the fractionated parts). I hope I could somehow explain what this script should do.
Regarding the indentation style - what would be a common one? I have to admit I only know this one. I got a book from my professor to find my way into Perl and that was the one they used there, so I kinda stuck to that.
Once again, thank you already for the super quick replies. I am very glad I found so much help so quickly ~Panda
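For readers following along, the calculation described above (value/reads*1000000, summed per amino-acid/anticodon combination) could be sketched roughly as follows. The helper names group_name and rpm, the three sample rows, and the read count of 2 are assumptions for illustration; the grouping regex is also a slightly stricter variant of the one used in the script:

```perl
use strict;
use warnings;

# Hypothetical helper: reduce "tRNA-Ala-AGC-4-1" to "tRNA-Ala-AGC"
# and "MT-TM-ENSG00000210112.1" to "MT-TM".
sub group_name {
    my ($name) = @_;
    return $1 if $name =~ /^(tRNA-[^-]+-[^-]+)/;
    ( my $short = $name ) =~ s/-ENS.+$//;
    return $short;
}

# Hypothetical helper: normalise a value to reads per million.
sub rpm {
    my ( $value, $reads ) = @_;
    return $value / $reads * 1_000_000;
}

my $reads = 2;    # assumed total from the mapped_sequences sample
my %sum;
for my $row (
    [ 'tRNA-Ala-AGC-4-1',        5.75 ],
    [ 'tRNA-Ala-AGC-8-1',        0.5 ],
    [ 'MT-TM-ENSG00000210112.1', 9 ],
    )
{
    my ( $name, $value ) = @$row;
    $sum{ group_name($name) } += rpm( $value, $reads );
}

printf "%s\t%s\n", $_, $sum{$_} for sort keys %sum;
```

Only one column is summed here; the real table has one such sum per fractionated column.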
Re: Skript help needed - RegEx & Hashes
by jwkrahn (Abbot) on Oct 10, 2018 at 18:17 UTC
|
@folders=glob("*"); #to get all folders in directory; extension ("*") as wildcard to get all names
foreach$folder(@folders) #to speak to each element in directory
{
next if ($folder!~/^UNITAS_/); #skip elements which do not start with "UNITAS"
Because you are only interested in "folders" that begin with the string "UNITAS_" you can do that with glob:
# to get UNITAS_* folders in directory
my @folders = glob "UNITAS_*";
# to speak to each element in directory
foreach my $folder ( @folders )
{
$head=<TRF>; #remove the first four lines of the trf-table.txt file
$head=<TRF>;
$head=<TRF>;
$head=<TRF>;
You don't need a variable to do that; a bare read in void context discards one line:
<TRF>; #remove the first four lines of the trf-table.txt file
<TRF>;
<TRF>;
<TRF>;
Or use a loop:
# remove the first four lines of the trf-table.txt file
<TRF> for 1 .. 4;
| [reply] [d/l] [select] |
Re: Skript help needed - RegEx & Hashes
by poj (Abbot) on Oct 10, 2018 at 20:00 UTC
|
while($line=<TRF>)
{
# @line=split("\t",$trftable); # error
@line=split("\t",$line);
also you probably want the merge file in the same folder as the other 2 files
open MERGE,">","$folder/merge"
or die "Could not open $folder/merge : $!";
poj
| [reply] [d/l] [select] |
Re: Skript help needed - RegEx & Hashes
by PandaRaey (Novice) on Oct 11, 2018 at 17:42 UTC
|
Thank you all so, so much for your feedback and the help, you are my heroes right now. The errors pointed out by poj, hippo and haukex solved the issue: when I made only those quick changes in my old script (the variable fix, where I still can't believe I overlooked that I had used the wrong variable all along, and the append mode), it worked wonders. I get what I want and I could cry tears of happiness right now. You guys cannot imagine the relief I am feeling right now.
However, I do not just want a working script anymore but one that also looks good. For that I would love to implement all the changes that have been suggested, but I am running into several problems while doing so.
1) When I try to declare the variables where I need them, for some reason I still get the following error: Global symbol "VARIABLE" requires explicit package name (did you forget to declare "my VARIABLE"?)
2) While trying to use the newer suggested way of opening files, I get the following error: Scalar found where operator expected at new_reads.pl line 24, near """$folder" (Missing operator before $folder?)
I don't really understand why these errors are occurring, but I would love to get them fixed simply because I don't want my script to be an "outdated" thing, using old ways of handling files and variables lol.
I copied the "newer" version of the script below and once again, thank all of you for your help.
#!/usr/bin/perl -w
use strict;
use warnings;
#Initiate all variables, hashes and co
#Open folders in working directory
my @folders = glob("*"); #to get all folders in directory; extension ("*") as wildcard to get all names
foreach my $folder(@folders) #to speak to each element in directory
    {
    next if ($folder!~/^UNITAS_/); #skip elements which do not start with "UNITAS"
    opendir(DIR,$folder)||die print$!; #open folder, end script when opening is not possible (DIR is the "filehandle" for the directory)
    print"\n$folder";
    while( my $file=readdir(DIR)) #returns content of folder
        {
        next if($file!~/\.mapped_sequences$/); #get the mapped_sequences file we need to read out the reads
        print"\n$file"; #print out file names to make sure we get the right files
        my $reads = 0; #set the number of reads to 0 for each run
        open my $fileone, '<', "$folder/$file" or die ""$folder/$file": $!";
        while(my $tocount=<$fileone>)#read file
            {
            chomp $tocount;
            $tocount =~ s/>//g; #remove all ">"
            next if ($tocount =~ /[A-Za-z]/); #skip lines which contain the sequence
            if ($tocount =~ /[0-9]/) #get the read-number
                {
                print"\n$tocount";
                $reads = ($reads + $tocount); # add up all reads
                }
            print"\n$reads";
            }
        close $fileone;
        my $trftable = 'unitas.tRF-table.txt'; #save file name in variable
        open my $trf, '<', "$folder/$trftable" or die ""$folder/$trftable": $!";
        undef = <$trf> for 1 .. 4;
        my %hash = (); #initiate empty hash
        while( my $line=<$trf>)
            {
            chomp $line;
            my @line=split("\t",$line);
            if($line[0]=~s/tRNA-[^-]+-...//) # "tRNA-" (matches "tRNA" and "-"), "[^-]+" (from beginning to end, anything), "-..." (another dash to the end)
                {
                my $tRNAname=$line[0];
                $tRNAname=$&; # "$&" = last pattern match
                print"\n$tRNAname";
                }
            else
                {
                my $tRNAname=$line[0];
                $tRNAname=~s/-ENS.+$//; # "-ENS.+$" (matches all names containing "-ENS" through to the end)
                print"\n$tRNAname";
                }
            my $hash{$tRNAname}{"5p-tR-halves"}+=$line[1]/$reads*1000000;
            $hash{$tRNAname}{"5p-tRFs"}+=$line[3]/$reads*1000000;
            $hash{$tRNAname}{"3p-tR-halves"}+=$line[5]/$reads*1000000;
            $hash{$tRNAname}{"3p-CCA-tRFs"}+=$line[7]/$reads*1000000;
            $hash{$tRNAname}{"3p-tRFs"}+=$line[9]/$reads*1000000;
            $hash{$tRNAname}{"tRF-1"}+=$line[11]/$reads*1000000;
            $hash{$tRNAname}{"tRNA-leader"}+=$line[13]/$reads*1000000;
            $hash{$tRNAname}{"misc-tRFs"}+=$line[15]/$reads*1000000;
            }
        open my $merge,">>","$folder/$merge" or die "Could not open $folder/$merge : $!";
        my @tRF_types=("5p-tR-halves","5p-tRFs","3p-tR-halves","3p-CCA-tRFs","3p-tRFs","tRF-1","tRNA-leader","misc-tRFs");
        foreach $tRNAname(sort{$a cmp $b}keys%hash) #sorts them alphabetically by keys
            {
            print MERGE $tRNAname; # print tRNA name
            foreach my $tRF_type(@tRF_types)
                {
                print MERGE"\t$hash{$tRNAname}{$tRF_type}"; # print counts for each tRF type separated by tab
                }
            print MERGE"\n";# print newline
            }
        close TRF;
        close MERGE;
        close DIR;
        }
    }
| [reply] [d/l] |
|
|
line 24 - remove double "
#open my $fileone, '<', "$folder/$file" or die ""$folder/$file": $!";
open my $fileone, '<', "$folder/$file" or die "$folder/$file: $!";
line 44 - same
#open my $trf, '<', "$folder/$trftable" or die ""$folder/$trftable": $!";
open my $trf, '<', "$folder/$trftable" or die "$folder/$trftable: $!";
line 46 remove undef
#undef = <$trf> for 1 .. 4;
<$trf> for 1 .. 4;
line 49 add declare here to expand scope of variable
my $tRNAname;
line 57 - remove my
#my $tRNAname=$line[0];
$tRNAname=$line[0];
line 63 remove my
#my $tRNAname=$line[0];
$tRNAname=$line[0];
line 68 - remove my ( as %hash declared earlier )
#my $hash{$tRNAname}{"5p-tR-halves"}+=$line[1]/$reads*1000000;
$hash{$tRNAname}{"5p-tR-halves"}+=$line[1]/$reads*1000000;
line 69 - change $merge after $folder to merge
#open my $merge,">>","$folder/$merge" or die "Could not open $folder/$merge : $!";
open my $merge,">>","$folder/merge" or die "Could not open $folder/merge : $!";
line 84..94 change MERGE to $merge (and close the lexical handles you actually opened)
print $merge $tRNAname; # print tRNA name
foreach my $tRF_type(@tRF_types)
{
print $merge "\t$hash{$tRNAname}{$tRF_type}"; # print counts for each tRF type separated by tab
}
print $merge "\n";# print newline
}
close $trf;
close $merge;
closedir DIR; # DIR was opened with opendir, so it needs closedir
poj | [reply] |
|
|
die "'$folder/$file': $!"; # choose a different literal character
die "\"$folder/$file\": $!"; # escape the inner "
die qq{"$folder/$file": $!}; # use an alternative outer delimiter
| [reply] [d/l] [select] |
|
|
Here's how I might have written that script, with the following changes to your version:
- Formatting: I've used my personal formatting style; a matter of taste of course (and sometimes I even vary my own style, if I think it looks better another way). For example, I added a bunch of whitespace and removed a couple of parens where it's not strictly necessary (but you are free to add parens if you like). I wrapped a few long lines so they would display nicely here, but usually I write my open ... or die ... on one line if it fits reasonably.
- Don't need both #!/usr/bin/perl -w and use warnings; (What's wrong with -w and $^W)
- I used the glob suggestion from jwkrahn, and also made sure that it would only return directories with the -d filetest operator. Note that glob has quite a few caveats, but with fixed strings it's ok.
- I applied most of the changes suggested by poj and others.
- Just to shorten the code a bit, I used an intermediate hash reference $h for $hash{$tRNAname}, in order for this to work I had to make sure to initialize $hash{$tRNAname} with an empty anonymous hash: my $h = ( $hash{$tRNAname} //= {} ) means "assign {} (an empty anonymous hash) to $hash{$tRNAname} if the latter is not yet defined, then assign the value of $hash{$tRNAname} to $h". (See also perlreftut and perlref.) Update: And see hippo's reply for one way to shorten it even more.
- You were closing the directory handle too early, and I had to change the scoping of a couple of variables like $tRNAname.
- I switched to using Data::Dumper to output the variables, which I configured so that I like the output better (although normally I'd use Data::Dump; Data::Dumper is a core module). BTW, I'm not sure why you were prefixing the \n in your prints, but normally one would do things like print "$tocount\n";
- You said open my $merge,">>","$folder/$merge", but the latter variable doesn't yet exist at that point (my $merge doesn't take effect until after the open statement), and in your original script you said open(MERGE,">merge"), so I'm not sure if you want a merge file per folder, or a single merge file in the current working directory? If it's the latter, then hippo's suggestion of opening the file once at the top of the script is probably better; also, then you don't have to use append mode.
- I'm not sure about if ( $tocount =~ /[0-9]/ ): If you want to make sure that it contains only digits, you should anchor your regex, as in /^[0-9]+$/.
- Plus I made a few other tweaks and used idioms in a few places, such as ( $tRNAname = $line[0] ) =~ s/-ENS.+$//, which means "copy $line[0] to $tRNAname and then apply the regex to $tRNAname".
- {$a cmp $b} is the default sort order and isn't really needed, unless you really want to be explicit (it doesn't hurt).
Please have a look, and if you have any questions, please let us know.
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
$Data::Dumper::Quotekeys = 0;
$Data::Dumper::Sortkeys = 1;
for my $folder ( grep {-d} glob('UNITAS_*') ) {
    print Data::Dumper->Dump([$folder], [qw/folder/]);
    opendir my $dh, $folder or die "$folder: $!";
    while ( my $file = readdir($dh) ) {
        next if $file !~ /\.mapped_sequences$/;
        print Data::Dumper->Dump([$file], [qw/file/]);
        my $reads = 0;
        open my $fileone, '<', "$folder/$file"
            or die "$folder/$file: $!";
        while ( my $tocount = <$fileone> ) {
            chomp $tocount;
            $tocount =~ s/>//g;
            next if $tocount =~ /[A-Za-z]/;
            if ( $tocount =~ /[0-9]/ ) {
                print Data::Dumper->Dump([$tocount], [qw/tocount/]);
                $reads += $tocount;
            }
        }
        close $fileone;
        print Data::Dumper->Dump([$reads], [qw/reads/]);
        my %hash;
        my $trftable = 'unitas.tRF-table.txt';
        open my $trf, '<', "$folder/$trftable"
            or die "$folder/$trftable: $!";
        <$trf> for 1 .. 4;    # skip the four header lines
        while ( my $line = <$trf> ) {
            chomp $line;
            my @line = split /\t/, $line;
            #print Data::Dumper->Dump([\@line], [qw/*line/]);
            my $tRNAname;
            if ( $line[0] =~ s/tRNA-[^-]+-...// )
                { $tRNAname = $& }
            else
                { ( $tRNAname = $line[0] ) =~ s/-ENS.+$// }
            print Data::Dumper->Dump([$tRNAname], [qw/tRNAname/]);
            my $h = ( $hash{$tRNAname} //= {} );
            $h->{"5p-tR-halves"} += $line[ 1] / $reads * 1000000;
            $h->{"5p-tRFs"}      += $line[ 3] / $reads * 1000000;
            $h->{"3p-tR-halves"} += $line[ 5] / $reads * 1000000;
            $h->{"3p-CCA-tRFs"}  += $line[ 7] / $reads * 1000000;
            $h->{"3p-tRFs"}      += $line[ 9] / $reads * 1000000;
            $h->{"tRF-1"}        += $line[11] / $reads * 1000000;
            $h->{"tRNA-leader"}  += $line[13] / $reads * 1000000;
            $h->{"misc-tRFs"}    += $line[15] / $reads * 1000000;
        }
        close $trf;
        print Data::Dumper->Dump([\%hash], [qw/*hash/]);
        open my $merge, '>>', "$folder/merge"
            or die "$folder/merge: $!";
        my @tRF_types = ("5p-tR-halves", "5p-tRFs", "3p-tR-halves",
            "3p-CCA-tRFs", "3p-tRFs", "tRF-1", "tRNA-leader",
            "misc-tRFs");
        for my $tRNAname ( sort keys %hash ) {
            print $merge $tRNAname;
            for my $tRF_type (@tRF_types) {
                print $merge "\t$hash{$tRNAname}{$tRF_type}";
            }
            print $merge "\n";
        }
        close $merge;
    }
    closedir $dh;    # directory handles are closed with closedir, not close
}
For the sample data from this post, the output file merge I get is the following. Note that if you re-run the script, because of the append mode on the merge file, the same lines get added to that file again.
MT-TM	6500000	4500000	0	0	0	0	0	20416666.6666666
MT-TN	0	500000	0	0	0	750000	1000000	12625000
MT-TP	0	0	0	0	0	1000000	0	0
tRNA-Ala-AGC	0	3863095.23809524	0	0	136363.636363636	0	0	23353306.8783069
tRNA-Ala-CGC	0	8708333.33333333	0	0	0	14500000	500000	8980291.00529101
tRNA-Ala-TGC	0	11833333.3333333	0	0	90909.0909090909	0	0	84521296.2962963
tRNA-Arg-ACG	0	100000	0	71428.5714285715	0	6500000	0	4916666.66666667
Update: Minor edits and a few additions to the explanations.
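Two of the idioms from the list above, shown in isolation as a small sketch. The sample names and values are made up, and the key "total" is only a placeholder; the //= operator needs Perl 5.10 or newer:

```perl
use strict;
use warnings;

my %hash;
for my $pair ( [ 'MT-TM', 1 ], [ 'MT-TM', 2 ], [ 'MT-TN', 5 ] ) {
    my ( $name, $value ) = @$pair;

    # Create the inner hash only if it does not exist yet, then keep
    # a reference to it so the following lines stay short.
    my $h = ( $hash{$name} //= {} );
    $h->{total} += $value;    # modifies %hash through the reference
}

# Copy-then-substitute: the string is copied into $short first,
# and the substitution is then applied to the copy only.
( my $short = 'MT-TM-ENSG00000210112.1' ) =~ s/-ENS.+$//;

print "$short total: $hash{$short}{total}\n";
```

Because $h points at the same anonymous hash that is stored in %hash, nothing has to be copied back after the update.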
| [reply] [d/l] [select] |
|
|
my $i = 1;
for (qw/5p-tR-halves 5p-tRFs 3p-tR-halves 3p-CCA-tRFs 3p-tRFs tRF-1 tRNA-leader misc-tRFs/) {
    $h->{$_} += $line[$i] / $reads * 1000000;
    $i += 2; # Odd entries only
}
| [reply] [d/l] [select] |
|
|
|
|
my %hash = (); #initiate empty hash
while ( my $line = <$trf> )
{
next if $. < 5; # skip first four lines
chomp $line;
my @line = split /\t/, $line;
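Both ways of skipping header lines that came up in this thread can be tried side by side. This is only a sketch on an assumed six-line in-memory file; the variable names are mine:

```perl
use strict;
use warnings;

# Assumed sample: six lines, of which the first four count as header.
my $data = join '', map { "line $_\n" } 1 .. 6;

# Variant 1: read and discard four lines up front. A bare <$fh> in
# void context reads exactly one line per loop iteration.
open my $fh, '<', \$data or die "in-memory file: $!";
<$fh> for 1 .. 4;
my $next = <$fh>;    # first line after the header
close $fh;           # an explicit close also resets $.

# Variant 2: skip inside the read loop using the line counter $.
open $fh, '<', \$data or die "in-memory file: $!";
my @rest;
while ( my $line = <$fh> ) {
    next if $. < 5;    # lines 1 to 4 are skipped
    push @rest, $line;
}
close $fh;

print "variant 1 continues with: $next";
print "variant 2 kept ", scalar(@rest), " lines\n";
```

Note that an assignment in list context, such as my ($first) = <$fh>;, would read the whole file at once, so it is not a substitute for either variant.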
| [reply] [d/l] [select] |
Re: Skript help needed - RegEx & Hashes
by PandaRaey (Novice) on Oct 15, 2018 at 16:45 UTC
|
Hello guys, thank all so, so much for your replies. I learned so much from these it's amazing haha.
I also apologize for not replying earlier; I had some major issues with the internet and was only able to get back online and take a look at your help today.
@Haukex; thank you very much. The reason I have been using the formatting as I did is mainly that I constantly forget to open or close brackets, and the way I did it gives me a somewhat better overview of the brackets I open and close. My guess is that the more I work with Perl and the more I get used to it, the more likely I can switch to a different formatting style.
Another question I have is about this: $h->{"3p-tRFs"} . How was it possible to replace the variable call from my version with " -> " ? My second question is about Data::Dumper. Isn't it counterproductive to use it when you are working with large data sets and would end up printing a lot of content into the terminal? Or would you just use Data::Dumper until you have made sure that your script is working? I can definitely see the advantages of Data::Dumper, but I am just a bit worried it could slow things down too much with large data sets.
poj, hippo, jwkrahn, thank you all so much for your help and for bearing with my probably very basic and banal problems. I definitely learned a lot from your help and I am slowly becoming less scared of working with Perl.
| [reply] |
|
|
Glad to help, that's what we're here for :-)
Like I said, don't worry too much about the formatting - it's much more important that you apply whatever formatting style you choose consistently, because no matter what formatting, if indentation isn't applied consistently, it's much easier to make mistakes. Using one of the more common styles might make your code a little easier to read to others, but inconsistent indentation is much more problematic.
$h->{"3p-tRFs"} . How was it possible to replace the variable call from my version with " -> " ?
The Arrow Operator is both the method call operator and the dereferencing operator. For example, I can say:
my %hash = ( hello => "foo", world => "bar");
my $hashref = \%hash; # store a reference to the %hash
print $hashref->{hello}; # prints "foo"
$hashref->{world} = "quz"; # change "bar" to "quz" in orig. hash
References are explained quite nicely in perlreftut - if you've heard of the concept of pointers, references are kind of like "safer" pointers, and therefore less scary ;-) Two advantages of references are that (a) instead of copying data structures when they are passed as arguments to functions*, you can just pass a reference instead, which saves memory and allows the function to modify the original data if desired, and (b) you can build complex data structures out of them, for example an array can contain a list of references to hashes, then you have an AoH (array of hashes); hash values can be references to arrays (HoA, hash of arrays), and all sorts of complex data structures. You can see plenty of examples of the latter in perldsc, and references are explained in detail in perlref.
An "anonymous" hash or array is called that because it doesn't have a name. In "my $hashref = \%hash;", the hash being referenced by $hashref has a name, %hash. In "my $hashref = {};", this does basically the same as the previous piece of code, but now the hash referenced by $hashref is newly created and does not have a name (quite useful when building nested data structures).
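A small sketch of the points above, with made-up data: an array of hashes (AoH), a hash of arrays (HoA), a function that receives a reference instead of a copy, and an anonymous hash. The names @aoh, %hoa and double_reads are only for illustration:

```perl
use strict;
use warnings;

# AoH: an array whose elements are references to anonymous hashes.
my @aoh = (
    { name => 'MT-TM', reads => 13 },
    { name => 'MT-TN', reads => 3 },
);

# HoA: a hash whose values are references to anonymous arrays.
my %hoa = ( 'tRNA-Ala' => [ 'AGC', 'CGC', 'TGC' ] );

# The function gets a reference, so it changes the caller's data
# instead of working on a copy.
sub double_reads {
    my ($rows) = @_;
    $_->{reads} *= 2 for @$rows;
    return;
}
double_reads( \@aoh );

# An anonymous hash: newly created, reachable only through $anon.
my $anon = {};
$anon->{hello} = 'foo';

print "$aoh[0]{name}: $aoh[0]{reads}\n";    # prints "MT-TM: 26"
print "$hoa{'tRNA-Ala'}[1]\n";              # prints "CGC"
print "$anon->{hello}\n";                   # prints "foo"
```

Between adjacent subscripts the arrow is optional, which is why $aoh[0]{name} works without writing $aoh[0]->{name}.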
... Data:Dumper. Isn't it contra-productive to use it when you are working with large data-sets and would end up printing a lot of content into the terminal?
It's just intended for debugging output - I figured that's what you wanted because you were using prints in your original code. It's no problem to comment them out, or probably even better to do something like:
my $DEBUG = 0; # at the top of the program
...
$DEBUG and print Dumper(...);
# or
print Dumper(...) if $DEBUG;
I personally prefer the former because it's visually a bit easier to skip those lines when you're skimming the code. Or, if performance is a concern, the following will be optimized away (the disadvantage being it's a bit trickier to change a constant via a command-line option):
use constant DEBUG => 0;
...
DEBUG and print Dumper(...);
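If you go with the plain variable rather than the constant, switching it from the command line is straightforward. A minimal sketch, assuming a --debug option name and a helper parse_debug that I made up for this example; Getopt::Long is a core module:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Data::Dumper;

# Hypothetical helper: returns 1 if --debug was given, 0 otherwise
# (the "!" also allows an explicit --nodebug).
sub parse_debug {
    my @args  = @_;
    my $debug = 0;
    GetOptionsFromArray( \@args, 'debug!' => \$debug )
        or die "Usage: $0 [--debug]\n";
    return $debug;
}

my $DEBUG = parse_debug(@ARGV);

my %hash = ( hello => 'foo', world => 'bar' );
$DEBUG and print Dumper( \%hash );    # silent unless --debug was passed
print "done\n";
```

Run as "perl script.pl --debug" to see the dump; without the option only "done" is printed.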
Minor edits for clarity.
* Update 2: This statement applies when using the common ways to access arguments: sub abc { my ($foo,$bar,...) = @_ } and sub abc { my $foo=shift; my $bar=shift; ... }. Don't worry about accessing the elements of @_ directly just yet ($_[0], $_[1], ...), that's a topic for another day. | [reply] [d/l] [select] |
|
|
> I constantly forget to open or close brackets
You can get a wonderful Perl editor free from Activestate, which will insert a closing
bracket any time you type an open bracket:
www.activestate.com/komodo-ide/downloads/ide
| [reply] |
|
|
a lot of non commercial editors do that too
| [reply] |
|
|