in reply to Re: Tallying appearance of a unique string from hash keys
in thread Tallying appearance of a unique string from hash keys

Hello. I've made progress on my script (for a newbie), but I still have some gaps that I could use some help with, if anyone is able to help.
#! /local/bin/perl
use strict;
use warnings;

#Declare hash to pull IDs and corresponding degrees into from file
my %degree;
my %newdegree;
my @geneID_1;
my @geneID_2;

my $filename = "edges.txt";   #Set a variable for our file name
open(my $fh, "<", $filename) or die "Can't open file $filename.";   #Open the file edges.txt
while (<$fh>) {
    if ($_ =~ m/(\S+)\t(\S+)/) {   #Match the IDs in the file to $1 and $2
        $degree{$1}++;             #Count the appearance of each ID, and store this as
        $degree{$2}++;             #the value for that key (this will be the degree)
        push (@geneID_1, $1);
        push (@geneID_2, $2);
    }
}
close $fh;   #Close the file edges.txt

#Calling the following subroutines
DegreeDistribution();
RandomSequence();
DegreeRan();

#Subroutines can be found below
sub DegreeDistribution{   #Determines the degree distribution (i.e. frequency of each degree)
    my %degree_distribution;
    $degree_distribution{$_}++ for values %degree;   #Creates a hash with the keys as the degrees and the values as the frequencies
    for my $id (keys %degree) {                #This corresponds each below to it's key
        my $d = $degree{$id};                  #This is the degree value corresponding to its gene ID
        my $freq = $degree_distribution{$d};   #This is the frequency value corresponding to its gene ID???
        print "$id has degree:\t$d\t(freq: $freq)\n";
    }
}

sub RandomSequence{   #Generates a hash of random ID interactions and each IDs degree
    my $length = scalar (@geneID_1);
    for (my $i=0; $i<$length; $i++){
        my $ID = int(rand($length));
        my $new_id1 = $geneID_1[$ID];
        my $new_id2 = $geneID_2[$ID];
        $newdegree{$new_id1}++;
        $newdegree{$new_id2}++;
    }
}

sub DegreeRan{   #same as sub DegreeDistribution but for the random sequence
    my %degree_distribution;
    $degree_distribution{$_}++ for values %newdegree;   #Creates a hash with the keys as the degrees and the values as the frequencies
    for my $id (keys %degree) {                #This corresponds each below to it's key
        my $d = $degree{$id};                  #This is the degree value corresponding to its gene ID
        my $freq = $degree_distribution{$d};   #This is the frequency value corresponding to its gene ID???
        print "$id has degree:\t$d\t(freq: $freq)\n";
    }
}

exit;
I have a few concerns:

1. I want to create three random networks, not just one. I'm not sure how to write my subroutine so that it produces a different (i.e. differently named) hash each time, which I can then return to the main script. The return function didn't seem to work properly: it only returned one key-value pair, not the whole list.
2. Similarly, I want to run each hash through the DegreeDistribution subroutine, because making a separate subroutine for each hash defeats the purpose of a subroutine.
3. Probably also closely related: I want to return the final values to the program so I can use them all (from the original dataset and the three random datasets) in an Excel file.

Thank you to anyone who responds.

Re^3: Tallying appearance of a unique string from hash keys
by tilly (Archbishop) on Mar 28, 2009 at 03:15 UTC
    First of all how to pass a hash:
    # Populate a hash.
    my %result = some_function(%data);

    sub some_function {
        my %passed = @_;
        my %to_return;
        # Do stuff with %passed here and populate %to_return
        return %to_return;
    }
    With this you can do things like this:
    my %distribution = degree_distribution();
    my %random_distribution_1 = random_distribution();
    my %random_distribution_2 = random_distribution();
    my %random_distribution_3 = random_distribution();
    output(%distribution);
    output(%random_distribution_1);
    output(%random_distribution_2);
    output(%random_distribution_3);
    and so on. (I'm not suggesting that those be actual functions you use, but that gives you an idea.)
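    To make the pattern above concrete, here is a minimal sketch of one subroutine that accepts any degree hash and returns its distribution as a whole hash. The name degree_distribution and the sample IDs are made up for illustration; they are not taken from the original script. Note that the returned list has to be assigned to a hash (my %h = ...) for all the pairs to be kept.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a degree hash (ID => degree) as a flat list
# and returns a hash mapping each degree to the number of IDs that have it.
sub degree_distribution {
    my %degree = @_;
    my %distribution;
    $distribution{$_}++ for values %degree;
    return %distribution;
}

# Made-up sample data, for illustration only.
my %degree = (geneA => 2, geneB => 1, geneC => 2);

# Assigning the returned list to a hash keeps every key-value pair.
my %distribution = degree_distribution(%degree);

for my $d (sort { $a <=> $b } keys %distribution) {
    print "degree $d occurs $distribution{$d} time(s)\n";
}
```

    Because the hash is passed in as an argument, the same subroutine can be reused for the original data and for each random network, rather than writing one subroutine per hash.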

    Before long, I predict, having to work repetitively with three random distributions will get very old. That's when you'll want to work with more complex data structures. For that, read the references quick reference and come back if you have any questions.
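    As a rough sketch of what "more complex data structures" could look like here, the three random distributions can be kept in one array of hash references and handled in a loop. Everything below is an assumption for illustration: random_degrees is a stand-in that merely assigns random degrees and is not a proper random-network generator, and the IDs are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real random-network generator: it gives each
# ID a random degree between 1 and 3 and returns a reference to that hash.
sub random_degrees {
    my @ids = @_;
    my %degree;
    $degree{$_} = 1 + int(rand(3)) for @ids;
    return \%degree;
}

# Takes a reference to a degree hash (ID => degree) and returns a
# reference to a hash of degree => frequency.
sub degree_distribution {
    my ($degree_ref) = @_;
    my %distribution;
    $distribution{$_}++ for values %$degree_ref;
    return \%distribution;
}

my @ids = qw(geneA geneB geneC geneD);   # made-up IDs

# One array holds all three networks instead of three named hashes.
my @random_networks;
push @random_networks, random_degrees(@ids) for 1 .. 3;

for my $i (0 .. $#random_networks) {
    my $dist = degree_distribution($random_networks[$i]);
    for my $d (sort { $a <=> $b } keys %$dist) {
        print "network ", $i + 1, ": degree $d occurs $dist->{$d} time(s)\n";
    }
}
```

    Passing references also avoids copying each hash on every call, and changing the number of random networks only means changing the range in the loop.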

      Hello, I've discovered that my random hashes based on my above code are not actually random, but are biased based on the original network. What I should do is outlined in my pseudocode, some elements of which I am having trouble with:
      #read IDs into %edges using match operator /(\S*)(\t)(\S*)/ and special variables $1, $3 for each ID per line
      #assign IDs as format $edge{ID1,ID2}
      #skip edge assignment if ID2,ID1 already exists .. the network needs to be undirected such that ID2-ID1 is equivalent to ID1-ID2 and should therefore not be counted twice
      #populate hash of unique IDs from %edges
      #create array of unique IDs from hash of unique IDs .... @uniqIDs, $uniqIDs[0]=ID1, $uniqIDs[1]=ID2 etc .. how do i do this?
      #initialize degree counter hash (IDs are keys, values are degrees) using @uniqIDs with foreach
      #go through %edges and increment %counter for each ID
      #generate random network with int(rand(scalar(@uniqIDs))), discard random picks if they already exist or represent undirected equivalent
      #initialize random network degree counter hash, and then count random network degrees as before
      I'm already stuck at how to skip an edge assignment if $2,$1 exists. Then, how can I populate a new hash of unique IDs? Again, the problem of excluding something if it already exists.
      my $filename = "edges.txt";   #Set a variable for our file name
      open(my $fh, "<", $filename) or die "Can't open file $filename.";   #Open the file edges.txt
      while (<$fh>) {
          if ($_ =~ m/(\S+)\t(\S+)/) {   #Match the IDs in the file to $1 and $2
              $edge{$1,$2}= $holder;     #assign IDs as format $edge{ID1,ID2}
          }
      }
      close $fh;   #Close the file edges.txt
      my @list = keys %uniqedge;
      print "@list\n";   #Prints the list of keys, but they are uni-directional (i.e. repeated for ID1-ID2 and ID2-ID1)
      Once I have these steps, I can proceed to counting and using the unique IDs list to create my random networks.

        You might find the Graph module helpful for some of what you are doing.

        Some of your questions and difficulties relate to quite basic aspects of Perl and data structures. This is not surprising, given that you say you are a "newbie". As the adage goes, "you must learn to walk before you can run": it may be quicker for you to put aside your bigger problem briefly and focus on the basics. A good place to start for Perl, with links to many resources, is: Where and how to start learning Perl. Otherwise, you might make sure you are familiar with data structures, graphs, algorithms in general, statistics and sets. Once you know all this (and I am not trying to suggest you currently know none of it), you will be able to solve your bigger problem much more easily and quickly.

        In particular, it seems you need to review: perldata, perlreftut, perlref, perldsc and perllol. These will help you better understand several of your problems and make you aware of alternative solutions for them.

        update:

        Again, the problem of excluding something if it already exists.

        A common technique for finding/avoiding duplicates is to use a hash to store what you already have and then do lookup.

        #!/usr/bin/perl
        #
        use strict;
        use warnings;

        my @list_of_items = qw(a b c a d b b z f);

        my %seen;

        foreach my $item (@list_of_items) {
            if($seen{$item}++) {
                # do what is appropriate for items that have already been seen
                print "saw an $item again\n";
            } else {
                # do what is appropriate for items that have not already been seen
                print "saw an $item\n";
            }
        }

        foreach my $item (sort keys %seen) {
            print "$item is in \@list_of_items $seen{$item} " . (($seen{$item} > 1)?"times\n":"time\n");
        }
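        The same %seen idea might be applied to the undirected-edge problem roughly as follows. This is only a sketch under assumptions: the edge lines are invented sample data standing in for edges.txt, and the variable names (%seen_edge, @uniq_ids and so on) are hypothetical. Sorting each pair before building the key means ID2-ID1 and ID1-ID2 map to the same key, so the reversed edge is skipped.

```perl
use strict;
use warnings;

# Made-up edge list standing in for the lines of edges.txt.
my @lines = ("A\tB", "B\tA", "A\tC", "C\tD", "A\tB");

my (%seen_edge, @edges, %seen_id, @uniq_ids, %degree);

for my $line (@lines) {
    next unless $line =~ m/^(\S+)\t(\S+)/;

    # Sorting the pair gives B-A and A-B the same canonical key.
    my ($id1, $id2) = sort { $a cmp $b } ($1, $2);
    next if $seen_edge{"$id1\t$id2"}++;   # skip edges already stored

    push @edges, [$id1, $id2];
    for my $id ($id1, $id2) {
        # Record each ID once, in order of first appearance.
        push @uniq_ids, $id unless $seen_id{$id}++;
        $degree{$id}++;
    }
}

print "unique IDs: @uniq_ids\n";
print "$_ has degree $degree{$_}\n" for sort keys %degree;
```

        With the unique IDs in an array, an expression such as $uniq_ids[ int(rand(@uniq_ids)) ] picks one at random, and the same canonical-key check can be used to discard random edges that already exist.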