hoffmann has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a data set composed of several Gaussian kernels. I have a program that currently computes mutual information for all samples in the data set and also calculates mutual information for subsets at the extrema of the distribution. In the same program, I have a class that creates a vector of samples through sampling with replacement. I will still need to create an implementation that uses this replacement appropriately. Thus, I need to create a script that composes these data sets for bootstrapping. My question is, how will a bootstrapping algorithm effect the mutual information calculation for the samples at the extrema of the distribution? Furthermore, how exactly would I program such a bootstrapping implementation? Thanks!

Replies are listed 'Best First'.
Re: Bootstrapping Implementation
by Anonymous Monk on Jul 13, 2008 at 03:26 UTC
    You can get started as soon as you can explain what is such a bootstrapping implementation. I am no mathematician (or statistician), but I'm sure even they would need more information in order to help you.
Re: Bootstrapping Implementation
by samtregar (Abbot) on Jul 13, 2008 at 22:36 UTC
    Bravo! Now show us the code that generated this question. I'm guessing you used Markov chains, but what was your source text?

    -sam

Re: Bootstrapping Implementation
by hoffmann (Novice) on Jul 16, 2008 at 22:42 UTC
    Maybe further clarification will help. I have a software used for bioinformatics that reconstructs interaction networks, think genes for example. Briefly, points (or experiments) are randomly sampled from the original data set with replacement and assembled into new 'bootstrapped' data sets containing the same number of experiments as the original. The software is then applied to a large number of such pseudo-data sets to generate a set of bootstrap networks. A consensus network is then constructed that includes edges (i.e., interactions) that are supported across many of the bootstrap networks. The first script, 'qsubbootstrap.pl', is used for the submission of bootstrap jobs to computational clusters. The first argument to this script is the name of the output directory, and the second and third arguments specify the range that is used to number the resulting files. Upon completion of the jobs, executing the second script, 'getconsensusnet.pl', will combine all the adjacency matrices and randomly permute the inferred edges in each bootstrap network to perform significance tests. It will then construct the consensus network based on the user-specified P-value threshold. This script takes as arguments the directory containing the bootstrap networks and the significance threshold. For example, 'perl Scripts/getconsensusnet.pl BoostrapNetworks 1e-7' will construct a consensus network from all bootstrap networks contained in the 'BootstrapNetworks' directory using significance threshold '1e-7'. I want to employ a similar procedure in a related program that calculates two sets of mutual information as described in my first post. Will such an implementation be possible?