
There is a BioPerl module that knows how to talk to NCBI's E-Utilities: see http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook (it mentions HomoloGene; I suppose it works, but I haven't tried it). You can also use the E-Utilities directly. Both approaches have a slight learning curve.
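For the "directly" route, here is a minimal sketch of what a call could look like. I am assuming the standard esearch endpoint and that it accepts `homologene` as the database name; the function name `esearch_url` is made up for illustration, and I have not run this against HomoloGene myself.

```shell
#!/bin/sh
# Build an E-Utilities esearch URL for the HomoloGene database.
# Assumption: db=homologene is accepted by esearch.
# The search term is passed through as-is, so keep it to characters that
# need no URL encoding (a plain gene symbol, for instance).
esearch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=homologene&term=%s\n' "$1"
}

# Usage (needs network access; the reply is XML listing matching record ids):
#   curl -s "$(esearch_url ACADM)"
```

The function only builds the URL; fetching and parsing the XML reply is where the learning curve starts.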

A third approach is to download HomoloGene into a local database. The NCBI E-Utilities work well, but when working with HomoloGene I find it handier (and faster) to have all the data locally, using the file that NCBI provides at:

  ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/

The file 'homologene.data' there, when stored in a database, looks like this (just showing 10 random rows):

 homologene_group_id | tax_id | geneid  |     symbol      | protein_gi | protein_accession
---------------------+--------+---------+-----------------+------------+-------------------
                   3 |   9606 |      34 | ACADM           |  187960098 | NP_001120800.1
                   3 |   9598 |  469356 | ACADM           |  114557331 | XP_524741.2
                   3 |   9615 |  490207 | ACADM           |   73960161 | XP_547328.2
                   3 |   9913 |  505968 | ACADM           |  115497690 | NP_001068703.1
                   3 |  10090 |   11364 | Acadm           |    6680618 | NP_031408.1
                   3 |  10116 |   24158 | Acadm           |    8392833 | NP_058682.1
                   3 |   7955 |  406283 | acadm           |   47085823 | NP_998254.1
                   3 |   7227 |   38864 | CG12262         |   24660351 | NP_648149.1
                   3 |   7165 | 1276346 | AgaP_AGAP005662 |   58387602 | XP_315683.2
                   3 |   6239 |  181757 | acdh-10         |   17569725 | NP_510788.1
(10 rows)

What you want is to look up your human gene or accession (human: tax_id=9606), take its homologene_group_id, and see whether there is a Drosophila melanogaster (fly: tax_id=7227) record with the same group id.
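If you just want to try that lookup before setting up any database, it can be sketched with awk straight on the flat file. This is a swapped-in shortcut, not the database approach itself. It assumes homologene.data is tab-separated with the six columns in the order shown in the table above; the function name `fly_homologs` is made up for illustration.

```shell
#!/bin/sh
# Print the fly (tax_id 7227) rows that share a HomoloGene group with a
# human (tax_id 9606) gene symbol.
# Assumed columns (tab-separated):
#   group_id, tax_id, geneid, symbol, protein_gi, protein_accession
fly_homologs() {
  symbol=$1
  file=$2
  # The file is read twice: the first pass collects the group ids of the
  # human gene, the second pass prints the fly records in those groups.
  awk -F '\t' -v sym="$symbol" '
    NR == FNR { if ($2 == 9606 && $4 == sym) groups[$1] = 1; next }
    $2 == 7227 && ($1 in groups)
  ' "$file" "$file"
}

# Usage:
#   fly_homologs ACADM homologene.data
```

With the rows shown above, ACADM leads to the CG12262 record. No output means there is no fly record in the group, or the symbol was not found for human (the match is case-sensitive).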

If you have basic database skills, here is a way to load that file into a PostgreSQL database:

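#!/bin/sh

# Fetch the current HomoloGene flat file (tab-separated, no header line).
wget ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/homologene.data

# (Re)create the table and load the file; psql reads the data on stdin.
# psql connects with its defaults (PGDATABASE, PGUSER, ...).
< homologene.data psql -c "
  drop table if exists my_homologene_data;
  create table         my_homologene_data (
    homologene_group_id  integer , 
    tax_id               integer , 
    geneid               integer , 
    symbol               text    , 
    protein_gi           integer , 
    protein_accession    text     
 );
 copy my_homologene_data from stdin csv delimiter E'\t';
"

# Sanity check: how many rows were loaded?
echo "select count(*) from my_homologene_data" | psql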

The records that have the same group id are homologs.

select * from my_homologene_data where homologene_group_id = 31015
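To go from a human gene straight to its fly homolog in one step, a self-join on the group id should do. This is only a sketch: it assumes the my_homologene_data table from the load script, and the function name `fly_homolog_sql` and the psql variable `symbol` are my own choices for illustration.

```shell
#!/bin/sh
# Emit a query that joins the table to itself on the group id:
# one side is the human gene, the other side its fly homolog (if any).
# :'symbol' is a psql variable, filled in with  psql -v symbol=...
fly_homolog_sql() {
  cat <<'EOF'
select human.symbol          as human_symbol,
       fly.geneid            as fly_geneid,
       fly.symbol            as fly_symbol,
       fly.protein_accession as fly_protein
from my_homologene_data human
join my_homologene_data fly using (homologene_group_id)
where human.tax_id = 9606
  and human.symbol = :'symbol'
  and fly.tax_id   = 7227;
EOF
}

# Usage (needs the table to be loaded first):
#   fly_homolog_sql | psql -v symbol=ACADM
```

The query goes to psql on stdin rather than via -c because psql does not substitute variables inside a -c string.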

With that group id, you can easily construct links into specific NCBI homologene pages too:

http://www.ncbi.nlm.nih.gov/homologene/?term=31015
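If you have a whole list of group ids, a small filter can build those links for you. A minimal sketch, with `homologene_links` being a made-up name and the URL pattern taken from the link above:

```shell
#!/bin/sh
# Turn HomoloGene group ids (one per line on stdin) into page URLs.
# Empty lines are skipped.
homologene_links() {
  while read -r group_id; do
    if [ -n "$group_id" ]; then
      printf 'http://www.ncbi.nlm.nih.gov/homologene/?term=%s\n' "$group_id"
    fi
  done
}

# Usage, e.g. fed from the database (-A -t gives bare, unaligned values):
#   psql -At -c "select distinct homologene_group_id from my_homologene_data where symbol = 'ACADM'" | homologene_links
```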

hth

P.S. Re zoological nomenclature: in the binomial name Drosophila melanogaster, 'melanogaster' is the specific epithet and must *always* be lower case; only the genus name is capitalised.

In reply to Re: Homologene BioPerl by erix
in thread Homologene BioPerl by ZWcarp
