bioinformatics has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I'm working on yet another program, and have run into a little problem. The program is designed to pull data out of a series of text files and print it out in a series of columns. Thus far, my program gets all the required data, but is unable to output it correctly. The sub get_signal gives me a hash of arrays containing my data; I need to assign each of these arrays a different name, and then know those names so I can print them. Ideally, I want this automated enough that no matter how many files are input into the program (and therefore no matter how many arrays I end up with), the program will assign each one a unique variable name and then be capable of printing them out. My code:
#! usr/local/bin/perl
use Cwd;
print STDOUT "Please enter the name and location of the directory to parse\:\n";
$directory=<STDIN>;
chomp $directory;
open (OUTPUTFILE,">junk.txt");
opendir (DIR, "$directory") or die "Failed to open directory: $!";
@filename=readdir(DIR);
@trash=splice(@filename, 0,2);
@genius=@filename;
sub get_signal {
    while (@filename) {
        $file=shift @filename;
        @final_data='';
        use Cwd 'chdir';
        chdir "./data";
        open (FILE, "$file") or die;
        @data=<FILE>;
        $spliced_data=splice(@data, 1, 14);
        foreach (@data) {
            ($a, $b, $c, $d, $e, $f)=split(/\t/);
            push(@final_data, "$d\n");
        }
        %hash={"$file"=>@final_data};   # this hash assignment doesn't work
        close (FILE);
    }
    @values=values(%hash);
    return @values;
}
sub get_targets {
    $target=shift @genius;
    use Cwd 'chdir';
    chdir "./data";
    open (FL, "$target") or die;
    @info=<FL>;
    $excess=splice(@info, 1, 14);
    foreach (@info) {
        ($z, $x, $w, $y, $u, $v)=split(/\t/);
        push(@targets, "$z\n");
    }
    close (FL);
    return @targets;
}
@column=get_targets;
@next_columns=get_signal;
for ($i=0;$i<=scalar(@next_columns);$i++) {
    @$i=@next_columns[$i];   # my attempt at assigning a unique variable, which doesn't work
}
print @next_columns;
print OUTPUTFILE "@final_data";
close OUTPUTFILE;
exit;
Thank you all for your time and thoughts!!!
bioinformatics

Replies are listed 'Best First'.
Re: Unique Variable names...
by dragonchild (Archbishop) on Jul 29, 2003 at 18:48 UTC
    A few notes:
    1. Use strict, use my, and pass your data into your functions. The way you're doing it now, with all globals, is a recipe for a major headache.
    2. %hash={"$file"=>@final_data};   # this hash assignment doesn't work
       close (FILE);
       }
       @values=values(%hash);
      Yeah, no kidding that's not going to work. You create %hash every iteration through @filename. Try something like:
      $hash{$file} = \@final_data;
      close (FILE);
      }
      @values=values(%hash);
    If I understand your code correctly, you're attempting to read all the files in a directory and grab all the values in the 4th column of each file, as defined by a tab delimiter. That's get_signal().

    I'm not sure what you're doing with get_targets(), so I'll ignore it for now.

    I would implement the subset of your script that doesn't deal with get_targets() as such:

    #!/usr/local/bin/perl

    # Why do you need this?!?
    #use Cwd qw(chdir);

    use IO::Dir;
    use IO::File;

    print "Please enter the name and location of the directory to parse:\n";
    chomp(my $directory = <STDIN>);

    my $dh = IO::Dir->new($directory)
        || die "Cannot open directory '$directory': $!\n";

    my @filenames;
    push @filenames, $_ for map { "$directory/$_" } grep !/^\.\.?/, $dh->read;
    $dh->close;

    my %file_data;
    foreach my $filename (@filenames) {
        my @final_data;

        # Why do you need to do this?!?
        #chdir "./data";

        my $fh = IO::File->new($filename)
            || die "Cannot open file '$filename': $!\n";

        my $i = 0;
        while (<$fh>) {
            next while $i++ <= 14;
            push @{$file_data{$filename}}, (split /\t/)[3];
        }
        $fh->close;
    }

    # Now, at this point, you have a hash called %file_data
    # which is keyed by filename. Each filename points to an
    # array reference containing the values in the 4th column,
    # starting at the 15th line. What do you want to do with it?
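    For example, if all you then wanted was one column per file, an untested sketch (assuming every file yields the same number of data rows) might look like:

    my @names = sort keys %file_data;
    print join("\t", @names), "\n";                 # header row: the filenames
    my $rows  = @{ $file_data{ $names[0] } };       # row count, taken from the first file
    for my $row ( 0 .. $rows - 1 ) {
        print join("\t", map { $file_data{$_}[$row] } @names), "\n";
    }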

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      Minor style/correctness whinge, but I believe $hash{$file} = \@final_data; should be $hash{$file}=[@final_data];. I suppose it wouldn't matter if you strictly used my @foo; every time the loop went around, but I think this way would be cleaner, with less chance to bug up someplace.
        Actually, it does matter, and it can matter a lot. $x{$y} = \@z; takes a reference to a data structure that already exists. $x{$y} = [@z]; creates a new anonymous array and copies the existing data structure into it. That can be an expensive operation. Taking a reference always takes the same amount of time, regardless of how many elements there are in @z.

        I suppose it wouldn't matter if you strictly used my @foo; every time the loop went around, but I think this way would be cleaner, with less chance to bug up someplace.

        Always using my @foo; every time the loop went around is both cleaner and less bug-prone. I am having the language handle my memory management for me. The language will always do it right - I might not. The rule of thumb is that if you're doing $x{$y} = [@z]; and you don't have a compelling reason why, you probably are doing something that is bug-prone and should rewrite it.
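        To make the distinction concrete, a small untested sketch (read_column() is just a hypothetical stand-in for whatever fills the array):

        my %hash;
        for my $file (@filenames) {
            my @final_data = read_column($file);    # fresh lexical array each pass (read_column is hypothetical)
            $hash{$file} = \@final_data;             # store a reference -- no copying, constant time
            # $hash{$file} = [@final_data];          # would build a new anonymous array and copy every element
        }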

        ------
        We are the carpenters and bricklayers of the Information Age.

        Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

        Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      The reason I used the Cwd module is that I needed to change the current directory for the program to function. Hence, when I use your program, I have the same issue: it is unable to open the files until the working directory is changed within the program.
      Bioinformatics
        bioinformatics,
        Why do you need to use Cwd when chdir will work just fine?

        Why do you put the chdir inside the foreach loop? Wouldn't changing the directory once, before the loop, be sufficient?

        As a final note - dragonchild's code provides the directory and file name in the file list, so the chdir really is not required.
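        Roughly (untested; @files stands in for your list of plain file names and $directory is whatever the user typed in):

        # Option 1: chdir once, before the loop
        chdir $directory or die "Cannot chdir to '$directory': $!";
        for my $file (@files) {
            open my $fh, '<', $file or die "Cannot open '$file': $!";
            # ... read from $fh ...
        }

        # Option 2: skip chdir entirely and open each file by its full path
        for my $file (@files) {
            open my $fh, '<', "$directory/$file" or die "Cannot open '$directory/$file': $!";
            # ... read from $fh ...
        }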

        Cheers - L~R

      Thank you for your suggestions. I have incorporated a number of them into my program, as well as made it more streamlined. However, my problem still remains. I need to print the data from @signal into consecutive columns, as shown in my post at the end of the thread. The only way I know how to make this manageable is to take the data pushed into @signal, grep it, and then shove it into 5 separate arrays (there are currently 5 input files). This is all well and good, except that I need to somehow make this program capable of handling a different number of files each time it is run. Do I need to manually assign, say, 30 arrays in which to put the data, placing a limit on the program? I'm sure there are better ways to do this, but I don't know how.
      NOTE: please be patient with me, as I'm only a beginning programmer, perl being my first language. Having only been working with it for a month now, I suppose I could be doing worse...:-)
      My latest code:
      #! usr/local/bin/perl -w
      use Cwd;
      use IO::Dir;
      use IO::File;
      print STDOUT "Please enter the name and location of the directory to parse\:\n";
      chomp (my $directory=<STDIN>);
      open (OUTPUTFILE,">junk.txt");
      my $dh = IO::Dir->new($directory) || die "Cannot open directory '$directory': $!\n";
      my @filenames;
      push @filenames, $_ for map { "$directory/$_" } grep !/^\.\.?/, $dh->read;
      $dh->close;
      @output=get_signal;
      for (@output){
          @{$signal[0-4]};
          @{$rprobe};
      }
      print OUTPUTFILE "@signal";
      close OUTPUTFILE;
      exit;
      sub get_signal {
          while (@filename) {
              $file=shift @filename;
              use Cwd 'chdir';
              chdir "./data";
              open (FILE, "$file") or die;
              @data=<FILE>;
              my $i=0;
              foreach (@data) {
                  next while $i++ <=14;
                  push @signal, (split(/\t/))[3];
              }
              my $g=0;
              foreach (@data) {
                  next while $g++ <=14;
                  @probe=(split(/\t/))[0];
              }
              $hash{$file}=\@signal;
              close (FILE);
          }
          @values=values(%hash);
          $rprobe=\@probe;
          return @values;
          return $rprobe;
      }
      Bioinformatics
Re: Unique Variable names...
by CountZero (Bishop) on Jul 29, 2003 at 19:15 UTC

    Without already being able to give you a solution, I have the following comments:

    1. Just a style argument: why do you put the sub definitions in the middle of your code? It makes the structure a lot harder to read.
    2. Splicing the first two items off your array of filenames/directories is a nice trick if you can be sure that the first two items are always the dot and dot-dot entries. That is not guaranteed and may not be portable across all OSes (see the small sketch after this list for a name-based filter instead).
    3. Your program assumes (as is your good right) a very specific directory and file structure (the top level only holds directories, and each such directory contains a "data" file), which makes it difficult to test your script if one doesn't have the same structure.
    4. get_targets and get_signal seem to go through the same "data" file, just extracting different items (the first and the fourth, respectively) and saving the rest in variables which are never used (if you did use warnings you would have received some warnings in this respect). The same goes for the variables $scratch, $excess and $spliced_data, which are essentially just garbage bins in your script.
    5. Rather than using global variables, you could pass an argument list to your subs. If you did that, you would really see that you are using the same arguments in both subroutines. Now you use @genius and @filename, which are just copies of each other, but that is not readily apparent.
    6. What you are trying to do with %hash={"$file"=>@final_data} beats me. Could you explain it?
    7. Why do you return the value of @targets to @column? You never use the @column array anywhere.
    8. Why did you think @$i=@next_columns[$i] would work? Can you explain your reasoning behind it?
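    Regarding point 2, a small untested sketch of filtering readdir by name instead of by position:

    opendir (DIR, $directory) or die "Failed to open directory: $!";
    my @filename = grep { !/^\.\.?$/ } readdir(DIR);   # drop '.' and '..' wherever they appear
    closedir (DIR);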
    About the "unique variable name" thing: why would you need that? I'm not convinced that it is necessary for your purpose. May I suggest that you give us an example of your inputs and your expected output? That would make it a lot easier to help you.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      My input data file would look something like:
      AFFX-BioB-5_at   20   20   200.2   P   0.001
      AFFX-BioB-M_at   20   20   400.4   P   0.002
      AFFX-BioB-3_at   20   20   200.5   P   0.003
      I want the 4th column, with the signal data. Actually, that other subroutine gets only the first column from the first file; I didn't know how to do that any other way, since everything else was in a loop. This way, I have everything set up to look like this (if it worked, anyway):
      AFFX-BioB-5_at   200.0   300.0   400.0
      AFFX-BioB-M_at   200.0   300.0   400.0
      AFFX-BioB-3_at   200.0   300.0   400.0
      Thanks for your suggestions!!
      Bioinformatics

        OK, now we're talking.

        Assuming you want a result which lists first the AFFX-BioB-xxx_at identifier, followed by all the signal data connected to that identifier, I suggest:

        • you drop the get_targets sub and all references to it.
        • Then you change your sub get_signal to:
          while (@filename) {
              my $i;
              $file=shift @filename;
              use Cwd 'chdir';
              chdir "./data";
              open (FILE, "$file") or die;
              while (<FILE>) {
                  next while $i++ <= 14;
                  (my $id, undef, undef, my $signal, undef, undef)=split(/\t/);
                  push @{$outputdata{$id}}, $signal;
              }
              close (FILE);
          }
          After having run this sub over all your files you will find in %outputdata a nicely ordered (per identifier) structure of your signal-data.
        • "Printing this datastructure goes as follows:
          for $id (keys %outputdata) {
              print "$id:\t", (join("\t", @{$outputdata{$id}})), "\n";
          }
          Of course you can print it to a filehandle. This is a format which is suitable for importing into a database or a spreadsheet.
        The "magic" of using references to anonymous arrays may perhaps be a bit too deep for someone who is just starting to program, but if you read Chapters 8 and 9 of the Camel book a few times and study the examples given, much will become clearer.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Unique Variable names...
by LameNerd (Hermit) on Jul 29, 2003 at 18:31 UTC
    You could get the "names" like so ...
    sub get_signal {
        ...
        @values=values(%hash);
        @names=keys(%hash);
        return (\@values, \@names);
    }
    ...
    my ( $arrRef_next_columns, $arrRef_names ) = get_signal;
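    And then, assuming each value is an array reference (as in dragonchild's $hash{$file} = \@final_data; fix), you could walk the two returned structures in parallel (untested):

    my ( $arrRef_next_columns, $arrRef_names ) = get_signal();
    for my $i ( 0 .. $#{$arrRef_names} ) {
        print "$arrRef_names->[$i]:\t@{ $arrRef_next_columns->[$i] }\n";
    }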