Re: A generic biomedical data processing library

Re^2: A generic biomedical data processing library by CountZero (Bishop) on Mar 21, 2010 at 17:04 UTC
Why not add a second header line that identifies the types of the fields labelled in the first line? I rarely find it a good idea to change anything in external data files, and allowing the users to change a data-file is certainly courting disaster. Before you know it, some data is inadvertently changed and a wrong diagnosis follows.
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
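A minimal sketch of the second-header-line idea, written in Python for brevity (the same few lines translate directly to Perl). The type names `int`, `float` and `str`, the sample columns, and the `read_typed` helper are assumptions made for illustration; the thread does not define a type vocabulary.

```python
import csv
import io

# Hypothetical type vocabulary: maps a name on the second header line
# to the function that converts a raw field.
CONVERTERS = {"int": int, "float": float, "str": str}

def read_typed(handle):
    """Read delimited text whose second line names the type of each column."""
    reader = csv.reader(handle)
    names = next(reader)   # first header line: field names
    types = next(reader)   # second header line: field types
    if len(names) != len(types):
        raise ValueError("name and type header lines differ in length")
    unknown = [t for t in types if t not in CONVERTERS]
    if unknown:
        raise ValueError(f"unknown type(s) in second header line: {unknown}")
    # Convert every data row according to the declared column types.
    return [
        {n: CONVERTERS[t](v) for n, t, v in zip(names, types, row)}
        for row in reader
    ]

# Invented sample data, for illustration only.
sample = "patient_id,age,glucose\nstr,int,float\nP001,54,5.6\nP002,61,7.1\n"
rows = read_typed(io.StringIO(sample))
```

A reader that knows nothing about the convention would see the type line as an ordinary first data row, which is the cost of this variant.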
Re^3: A generic biomedical data processing library by BrowserUk (Patriarch) on Mar 21, 2010 at 17:14 UTC
"allowing the users to change a data-file is courting disaster"
And who are "the users" in this case? "Disaster" is a little dramatic, don't you think?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "I'd rather go naked than blow up my ass"
Re^4: A generic biomedical data processing library by CountZero (Bishop) on Mar 21, 2010 at 20:31 UTC
It is biomedical data, so you could be playing with someone's life. In such cases one cannot be careful enough.
CountZero
Re^5: A generic biomedical data processing library by BrowserUk (Patriarch) on Mar 21, 2010 at 21:12 UTC
Re^6: A generic biomedical data processing library by CountZero (Bishop) on Mar 22, 2010 at 18:40 UTC
Re^6: A generic biomedical data processing library by spiros (Beadle) on Mar 23, 2010 at 08:58 UTC
Some notes below your chosen depth have not been shown here
Re^2: A generic biomedical data processing library by spiros (Beadle) on Mar 21, 2010 at 13:52 UTC
That is a good idea. The only problem I can think of is that the data tends to be multidimensional, so doing this for 150 variables might be tedious. I would also like to keep it simple and readable; I am leaning towards YAML for this.
Re^3: A generic biomedical data processing library by almut (Canon) on Mar 21, 2010 at 15:15 UTC
"I am leaning towards YAML for this."
Just a general note. During my last job (in a lab mostly frequented by psychologists, linguists, etc.) I wrote a suite of modules for EEG/ERP analysis. My general conclusion was that if you make the user interface too complex (as is unfortunately sometimes required for generic solutions), people simply aren't going to use the tool, particularly if the entry threshold is high and it doesn't come with lots of ready-to-use cut-n-paste examples. In other words, using YAML would be fine if they know it already, but otherwise they might not be willing to learn it (which could mean - at least if you work in the same lab - you'll always be the one eventually writing the code for them :)
Re^4: A generic biomedical data processing library by spiros (Beadle) on Mar 23, 2010 at 08:54 UTC
I agree with your points, and at the same time I am being realistic. I don't expect anybody at my work to use this module; they all use statistical packages. I would expect the average Perl user to be able to use it easily, though. My plan is/was to have a separate tool with something like "Cut and paste the column names in this space", and the markup would be generated automagically.
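A rough sketch of what such a generator could look like, in Python for brevity. The `yaml_skeleton` name, the `columns`/`name`/`type` layout and the `str` default are all assumptions; the thread does not specify the markup. It also assumes plain column names that need no YAML quoting.

```python
import re

def yaml_skeleton(pasted, default_type="str"):
    """Turn pasted column names into a YAML skeleton for the user to edit.

    Names may be separated by commas, tabs or newlines.
    """
    names = [n.strip() for n in re.split(r"[,\t\n]+", pasted) if n.strip()]
    lines = ["columns:"]
    for name in names:
        lines.append(f"  - name: {name}")
        lines.append(f"    type: {default_type}")  # placeholder to be edited
    return "\n".join(lines) + "\n"

# Invented column names, pasted straight from a header line.
print(yaml_skeleton("patient_id\tage\tglucose"), end="")
```

The user would then only have to correct the `type` entries that differ from the default, rather than typing 150 entries by hand.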
Re^4: A generic biomedical data processing library by spiros (Beadle) on Mar 22, 2010 at 18:07 UTC
Thank you all for your useful comments. Spiros
Re^3: A generic biomedical data processing library by BrowserUk (Patriarch) on Mar 21, 2010 at 14:22 UTC
Dealing with 150 fields is always going to be tedious, whether you spread them horizontally over a single line or vertically over 150 lines. The nice thing about the prefix/suffix idea is that it can be embedded in a standard Xsv file, and normal Xsv handling can still be used. If the processor uses the header line, the fields just carry some extra information. If it discards the headers, the extra information is discarded along with them. If it uses the field names for processing, you need only preprocess the first line to strip the suffixes for it to keep working. Those are things that a YAML/XML/other format description would never allow.
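The suffix-stripping step can be sketched as follows, in Python for brevity. The `:` separator, the `name:type` layout and the `split_header` helper are assumptions for illustration; the exact prefix/suffix syntax is not spelled out in this part of the thread.

```python
import csv
import io

def split_header(fields, sep=":"):
    """Split 'name:type' header fields into parallel name and type lists.

    A field without the separator gets a type of None.
    """
    names, types = [], []
    for field in fields:
        name, found, suffix = field.partition(sep)
        names.append(name)
        types.append(suffix if found else None)
    return names, types

# Invented sample data: an ordinary CSV whose header carries type suffixes.
sample = "patient_id:str,age:int,glucose:float\nP001,54,5.6\n"
reader = csv.reader(io.StringIO(sample))
names, types = split_header(next(reader))

# Downstream code sees the plain field names, as with any other CSV.
records = [dict(zip(names, row)) for row in reader]
```

Only the first line is preprocessed, so the data rows pass through untouched and any existing CSV tooling keeps working.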