
Dear toolic,
This script works beautifully on my dummy data.

When I run it using my real files I get an error message that repeats itself line after line until I stop it.

Use of uninitialized value in string eq at HUGOID_extract.pl line 50, <$GENEFILE> line 1.

Line 50 is: if ($genes[2] eq $hugo[$i])

I am confused as to why it would work on dummy data but not real.
My only thought is that in the IDs file (DUMMYHUGO) there are up to 28 columns with aliases for gene names, for which I want to return the HUGO ID (hopefully this is clear from my initial post). However, not all gene names have 28 aliases.

With the dummy data, there was always an 'eq' for the gene name. In the real files there may not be. Is it possible that if there is no match in the DUMMYHUGO file that the script doesn't know how to move on?

I have added some code after your if statement (below):

if ($genes[2] eq $hugo[$i]) { print $OUT "$genes[0]\t$genes[1]\t$genes[2]\t$genes[3]\t$hugo[1]\n"; }


My code is as follows (looks a little squiffy here but still):
# added by me
else { print $OUT "$genes[0]\t$genes[1]\t$genes[2]\t$genes[3]\tHUGO_notfound\n"; $i++; }


I still get the infinite error though.
Any further ideas that can help?
Thanks again.

Re^3: (Failing) script to return an official ID
by toolic (Bishop) on Apr 12, 2008 at 20:01 UTC
    The code I posted, as I mentioned, was not a complete solution. The code assumed that every line in the genes file would have at least 3 columns and that every line in the hugo file would have at least 9 columns, since this is what your dummy input sample files had.

    If your actual files have fewer columns, then you might get those warnings.

    If your actual files have blank lines, then you might get those warnings.
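    As a rough sketch of how both cases could be guarded against (the helper names split_fields and safe_eq and the sample rows are made up for illustration, not part of the code I posted), blank lines can be turned into an empty field list and the comparison can be skipped whenever a column is missing:

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-separated line into fields,
# returning an empty list for blank lines.
sub split_fields {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;          # strip the line ending (LF or CRLF)
    return () unless $line =~ /\S/;  # blank line: nothing to compare
    return split /\t/, $line, -1;    # -1 keeps trailing empty fields
}

# Hypothetical helper: true only if both values are defined and equal,
# so a missing column never reaches 'eq'.
sub safe_eq {
    my ($x, $y) = @_;
    return (defined($x) && defined($y) && $x eq $y) ? 1 : 0;
}

# Made-up rows: the hugo row has only 2 columns, not 9.
my @genes = split_fields("chr1\t100\tGENE_A\n");
my @hugo  = split_fields("ID1\tGENE_A\n");

for my $i (0 .. 8) {
    print "match at column $i\n" if safe_eq($genes[2], $hugo[$i]);
}
```

    With these made-up rows the loop still visits indexes 2 to 8, which do not exist in @hugo, but safe_eq returns false for them instead of triggering the warning.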

    It is impossible for me to know the structure of your input files without seeing more (small) examples. My guess is that you now need to check the format of your input. For example, you could check how many columns are in each line of the genes file by checking how many elements are in the array:

    my $cols = scalar @genes;
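    To apply that check to a whole file, something along these lines might work. It is only a sketch: the helper name short_lines, the minimum of 3 columns, and the sample text are assumptions for illustration, and an in-memory filehandle stands in for your real genes file.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line number, column count] for every line
# that has fewer than $min_cols tab-separated columns.
sub short_lines {
    my ($fh, $min_cols) = @_;
    my @short;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        my $cols   = scalar @fields;
        push @short, [$., $cols] if $cols < $min_cols;
    }
    return @short;
}

# Made-up sample: a full line, a blank line, and a short line.
my $sample = "a\tb\tc\td\n\na\tb\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
for my $entry (short_lines($fh, 3)) {
    printf "line %d has %d column(s)\n", @$entry;
}
close $fh;
```

    Any line this reports is one where $genes[2] would be undefined in the comparison.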

    Are you sure the code is looping infinitely? I could believe that the code would take a long time to run if your input files are really big (1 million lines, many columns per line).

      Yes, my dummy files were very simplified. Some lines may not have any aliases at all; the number ranges from 0 to 35. That must be the problem.

      No, I'm not sure the code loops infinitely; I stopped it after many, many lines!

      Thanks again.