Re^2: A generic biomedical data processing library
by CountZero (Bishop) on Mar 21, 2010 at 17:04 UTC
Why not add a second header line that identifies the types of the fields labelled in the first line? I rarely find it a good idea to change anything in external data files, and allowing the users to change a data file is certainly courting disaster. Before you know it, some data is inadvertently changed and a wrong diagnosis follows.
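Here is a minimal sketch of the second-header-line suggestion, written in Python for brevity (the thread is about a Perl library, but the logic carries over). It assumes the first line holds the field names and the second line holds one type label per field. The labels `int`, `float` and `str` and the function name `read_typed_csv` are hypothetical, not part of any existing library. The function only reads the text and never writes anything back.

```python
import csv
import io

# Hypothetical mapping from type labels (second header line) to converters.
CONVERTERS = {"int": int, "float": float, "str": str}

def read_typed_csv(text):
    """Read CSV text whose first line holds field names and whose second
    line holds a type label per field; return a list of dicts with the
    values converted to those types. The input is only read, never modified."""
    reader = csv.reader(io.StringIO(text))
    names = next(reader)
    types = next(reader)
    if len(names) != len(types):
        raise ValueError("name and type header lines differ in length")
    converters = [CONVERTERS[t.strip()] for t in types]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(names):
            raise ValueError(f"row has {len(row)} fields, expected {len(names)}")
        records.append({n: conv(v) for n, conv, v in zip(names, converters, row)})
    return records
```

For example, `read_typed_csv("id,age,weight\nint,int,float\n1,34,70.5\n")` yields one record with `age` as an integer and `weight` as a float. An unknown type label raises `KeyError`, which surfaces a typo in the header instead of silently misreading data.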
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^2: A generic biomedical data processing library
by spiros (Beadle) on Mar 21, 2010 at 13:52 UTC
That is a good idea; the only problem I can think of is that the data tends to be multidimensional, so doing this for 150 variables might be tedious. I would also like to describe the data in a simple, readable way, and I am leaning towards YAML for this.
I am leaning towards YAML for this.
Just a general note. During my last job (in a lab mostly frequented by psychologists, linguists, etc.) I wrote a suite of modules for EEG/ERP analysis, and my general conclusion was that if you make the user interface too complex (as is unfortunately sometimes required for generic solutions), people simply aren't going to use the tool, in particular if the entry threshold is high and it doesn't come with lots of ready-to-use cut-n-paste examples.
In other words, using YAML would be fine if they already know it, but otherwise they might not be willing to learn it (which could mean, at least if you work in the same lab, that you'll always be the one who ends up writing the code for them :)
I agree with your points, and at the same time I am being realistic. I don't expect anybody from my work here to use this module; they all use statistical packages. I would, though, expect the average Perl user to be able to use it easily.
My plan is (or was) to have a separate tool that says something like "Cut and paste the column names in this space", and the markup would be generated automagically.
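One possible shape for the "paste the column names" tool is sketched below in Python. The `fields:` / `name:` / `type:` layout, the `TODO` placeholder and the function name `describe_columns` are assumptions for illustration, not a schema proposed in the thread. Names are emitted as JSON-quoted strings, which are also valid YAML double-quoted scalars, so column names containing colons or other special characters do not break the generated markup.

```python
import json
import re

def describe_columns(pasted):
    """Turn pasted column names (comma-, tab- or newline-separated) into a
    YAML-style description skeleton for the user to fill in.
    The field layout and the TODO placeholder are hypothetical."""
    names = [n.strip() for n in re.split(r"[,\t\n]+", pasted) if n.strip()]
    lines = ["fields:"]
    for name in names:
        # json.dumps quotes and escapes the name; the result is valid YAML too
        lines.append(f"  - name: {json.dumps(name)}")
        lines.append("    type: TODO")
    return "\n".join(lines) + "\n"
```

The user would only have to replace each `TODO` with a type, which keeps the entry threshold low while still producing a machine-readable description.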
Thank you all for your useful comments.
Spiros
Dealing with 150 fields is always going to be tedious, whether you spread them horizontally over a single line or vertically over 150 lines.
The nice thing about the prefix/suffix idea is that it can be embedded in a standard Xsv file, so normal Xsv handling can still be used. If the processor uses the header line, the fields just carry some extra information. If it discards the header line, the extra information is simply discarded with it. If it uses the field names for processing, you only need to preprocess the first line to strip the suffixes for it to keep working.
These are things a YAML/XML/other format description would never allow.
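A minimal Python sketch of the suffix idea, assuming a hypothetical `name:type` convention with `:` as the separator (the post does not fix a syntax). `split_header` recovers the names and their suffixes; `strip_suffixes` rewrites only the first line, so tools that expect plain field names still work while the data lines pass through untouched. Both function names are illustrative.

```python
import csv
import io

SEP = ":"  # hypothetical separator between field name and type suffix

def split_header(header):
    """Split 'name:suffix' fields into (name, suffix); suffix is None if absent."""
    pairs = []
    for field in header:
        name, sep, suffix = field.partition(SEP)
        pairs.append((name, suffix if sep else None))
    return pairs

def strip_suffixes(text):
    """Return the CSV text with suffixes removed from the first line only.
    The rewritten header line is terminated with '\\n'; other lines are kept as-is."""
    lines = text.splitlines(keepends=True)
    if not lines:
        return text
    header = next(csv.reader([lines[0]]))
    names = [name for name, _ in split_header(header)]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(names)
    return out.getvalue() + "".join(lines[1:])
```

Note that an ordinary CSV reader can consume the suffixed file unchanged; the suffixes only show up as slightly longer column names, which is the point made in the post.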
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.