in reply to Selecting a random number, and back calculating to chromosome and contig!

You're a victim of typos. When I ran your code through perl -c, I got a lot of messages like:
Global symbol "$conig" requires explicit package name at 349376.pl line 555.
You spell that variable as "$contig" in some places and "$conig" in others. However, that's not the entire scope of your problem. You declare $contig as my in different scopes, so I'd expect that your code wouldn't run as you expect. A short demonstration of what I'm talking about:
    my $contig = 3;
    my $foo = 6;
    if ($foo < 8) {
        my $contig = 5;
    }
    print "contig = $contig\n";
    __END__
    contig = 3
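If the intent was for the value set inside the block to survive past it, one way to get that (a minimal sketch reusing the variable names from the demonstration above, not taken from the original script) is to declare the variable once in the outer scope and assign to it without a second my:

```perl
use strict;
use warnings;

my $contig = 3;
my $foo    = 6;
if ($foo < 8) {
    $contig = 5;    # no "my" here, so this assigns to the outer $contig
}
print "contig = $contig\n";    # prints "contig = 5"
```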

As a matter of style, you should probably avoid the whole "if..elsif..elsif..elsif..." routine for this. As I'm sure you found out, it's very tedious to type. A more visually appealing (and easier to maintain) approach is to do something like the following:

    # Each value must be a hash reference, hence the braces; plain
    # parentheses would flatten everything into one long list.
    my %master = (
        167280 => { chromosome => 1, contig => 'NT_077402.1' },
        217280 => { chromosome => 1, contig => 'gap' },
        257582 => { chromosome => 1, contig => 'NT_077911.1' },
        # etc
    );

    # @numbers holds the random numbers to look up
    NUMBER: foreach my $number (@numbers) {
        foreach my $key (sort { $a <=> $b } keys %master) {
            if ($number < $key) {
                my $chromosome = $master{$key}{chromosome};
                my $contig = $master{$key}{contig};
                # do whatever processing here; I'll just print
                print "chromosome = $chromosome, contig = $contig\n";
                next NUMBER;
            }
        }
    }
This style has the advantage of maintainability. Should you find an error in your processing logic (for example, if I found that I had misspelled 'chromosome' in my print statement), you have exactly one place to fix it.

Hope this helps,
thor

Update: I just thought of another reason to avoid the "if..elsif..." paradigm. Once, someone came to me for help in trying to run a tool that had such a thing in it. perl core dumped because the script was too large. I converted the solution to a hash, and it ran in no time at all.

Re: Re: Selecting a random number, and back calculating to chromosome and contig!
by Sameet (Beadle) on Apr 30, 2004 at 18:01 UTC
    I will try to do this. Actually, this hash business is sort of tricky, and I tried to avoid it! But I can see that using hashes is probably the best option I have. I will look into it and get back as soon as possible.
    Thank you for all the help
    Sameet


    Update
    Hi,
    I made a lookup table as you suggested, but it has still become a huge file (huge is subjective, I know :-) ): something like 650+ lines of code for the lookup hash itself. I will test it and get back again if I have more doubts or need more help.
    Thank you again for all the help!


    Update 2
    Actually, the values I am downloading come from a file in a standard data format, an HTML file, where the length of each contig is given. What I am trying to do is turn those contigs into intervals and then find which interval a given random number falls into.
    What I want to know is whether there are any HTML parsers available and, if so (I am sure there are, but I am at a loss as to which one to use! :-( ), how to go about using them.
    Regards


    UPDATE3
    I have done what you asked, but it gives me the following errors:
    Argument "NT_077988.2" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "contig" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_024498.12" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_030737.8" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_077627.2" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_079485.1" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_035608.1" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_077911.1" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_023736.16" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "Y" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_077931.2" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_011903.10" isn't numeric in sort at newhumangenomelookup.pl line 736.
    Argument "NT_011295.10" isn't numeric in sort at newhumangenomelookup.pl line 736.
    This is just part of the error output I got. Can you give me some hints as to what is going wrong?
    I am giving my new code here
    Sameet
      It looks like you wrote your if/else statements by cutting and pasting each statement, and then modifying each individual statement. While this may feel like real work, it's really not an efficient use of your time. If you ever wanted to modify this program, it would be very difficult to do because of its length. Developing with repetitive code is also prone to typos and other sorts of little errors, as you can see by the conig/contig typos that are in your code. By using a better data structure like thor proposes, you can avoid making minute modifications to repetitive pieces of code (like your if/else statement) and concentrate more on developing your program.

      Another idea that you may want to look into as you learn more is representing your data, and the operations on that data, as objects. For a reference, check out this short example on OO programming in Perl.

      As you develop your code more and more, you may want to begin abstracting your useful code into subroutines. Here's an example that relies on thor's hash-based data structure shown above:

      sub print_contig_information {
          my $number = shift;
          foreach my $key (sort { $a <=> $b } keys %master) {
              if ($number < $key) {
                  my $chromosome = $master{$key}{chromosome};
                  my $contig = $master{$key}{contig};
                  # do whatever processing here; I'll just print
                  print "chromosome = $chromosome, contig = $contig\n";
                  return;    # stop at the first matching boundary
              }
          }
      }
      By modularizing your code this way, you only have to make a change in one place when you'd like to modify your program, instead of repeating it throughout the script.
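One possible next step in the same direction, offered only as a sketch: have the subroutine return the matching record rather than print it, so the caller decides what to do with it. The name find_contig, the @boundaries array, and the three test numbers are hypothetical; the sample rows are copied from thor's post, and the real table would be much longer.

```perl
use strict;
use warnings;

# Sample rows copied from thor's post above.
my %master = (
    167280 => { chromosome => 1, contig => 'NT_077402.1' },
    217280 => { chromosome => 1, contig => 'gap' },
    257582 => { chromosome => 1, contig => 'NT_077911.1' },
);

# Sort the boundaries once rather than on every lookup.
my @boundaries = sort { $a <=> $b } keys %master;

# find_contig (hypothetical helper): returns the record for the first
# boundary greater than $number, or undef if $number is past the last one.
sub find_contig {
    my ($number) = @_;
    foreach my $key (@boundaries) {
        return $master{$key} if $number < $key;
    }
    return;
}

foreach my $number (100, 200_000, 999_999) {
    my $hit = find_contig($number);
    if ($hit) {
        print "$number: chromosome = $hit->{chromosome}, contig = $hit->{contig}\n";
    }
    else {
        print "$number: beyond the last boundary\n";
    }
}
```

Sorting the keys once outside the subroutine avoids re-sorting a 650-line table for every random number.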

      Hope that this helps. :)

        Hi,
        I have tried doing this, but I keep getting error messages saying that "chromosome" isn't numeric in the sort. It also gives the same error for some contigs. Is there a way around this?

        Sameet
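
A likely cause of the "isn't numeric in sort" warnings, assuming the lookup hash was written with parentheses around each inner record: parentheses do not nest in a Perl list, so the inner key/value pairs are flattened into the outer hash, and strings such as "chromosome", "contig", and the contig names end up as top-level keys. The sketch below uses two sample rows from the thread; the names %flat and %nested are only for illustration.

```perl
use strict;
use warnings;

# Parentheses: the inner lists are flattened into one long key/value list.
my %flat = (
    167280 => (chromosome => 1, contig => 'NT_077402.1'),
    217280 => (chromosome => 1, contig => 'gap'),
);

# Braces: each row is an anonymous hash reference, so every top-level
# key is a number.
my %nested = (
    167280 => { chromosome => 1, contig => 'NT_077402.1' },
    217280 => { chromosome => 1, contig => 'gap' },
);

my @flat_keys   = sort keys %flat;                   # string sort, no warning
my @nested_keys = sort { $a <=> $b } keys %nested;   # numeric sort is safe here

print "flat keys:   @flat_keys\n";
print "nested keys: @nested_keys\n";
```

Running a numeric sort over the keys of %flat would produce warnings of the kind quoted above; with the braces version, the keys are all numeric and the sort is clean.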