Renyulb28 has asked for the wisdom of the Perl Monks concerning the following question:
The dataset I just received is horribly formatted, and my limited Perl knowledge is not enough to do what I would like, so I would like to ask you monks for aid. The dataset has five columns: sample ID, mother ID, father ID, sex, and attribute. The first problem is that some observations list two or three IDs in a single field, because it was uncertain which was the true one. Thus an ID that would normally be 1293 might appear as 1293&1295 or 1293&1295&1305. Whenever Perl finds a "&" symbol, I would like it to delete the "&" along with everything after it in that field, leaving only the first ID. So far I have only found how to delete the whole line when it matches, not just part of the line:
$ perl -ni -e 'print unless /&/' filename
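One way to remove only the matching part is a substitution in place of the `print unless` filter. The following is a minimal sketch, not a definitive solution: it assumes the columns are separated by whitespace (the post does not say which delimiter the file uses), and `strip_extra_ids` is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first ID of an "a&b&c" field.
# Assumption: columns are separated by whitespace, so a run of
# alternative IDs ends at the next whitespace character.
sub strip_extra_ids {
    my ($line) = @_;
    $line =~ s/&\S*//g;    # drop each "&" and everything up to the next whitespace
    return $line;
}

# prints: 1293 1001 1002 1 13HCR-NIH-0
print strip_extra_ids("1293&1295&1305 1001 1002&1003 1 13HCR-NIH-0\n");
```

Under the same assumption, the equivalent one-liner would be `perl -pi.bak -e 's/&\S*//g' filename`, which keeps a backup copy in `filename.bak`.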
The second problem is that the attribute column needs to be 0 for missing, 1 for HCR, or 2 for LCR. Right now the values look like 13HCR-NIH-0 or 13LCR-NIH-0, where the digits are arbitrary. If the value in column 5 contains the string "HCR", I would like Perl to change the entire value to 1, and likewise change values containing "LCR" to 2. For this I have tried find and replace:
$ perl -p -i.bak -e 's/13HCR-NIH-0/1/g' filename
but this is far too time-consuming, as there are too many permutations of the digits.
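Matching on the letters alone avoids enumerating the digit permutations. The sketch below is one possible approach under stated assumptions: columns are whitespace-separated, any value containing neither "HCR" nor "LCR" is treated as missing (0), and the output is rejoined with tabs. `recode_attribute` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: recode the attribute column (column 5) as
# 1 for HCR, 2 for LCR, and 0 for anything else (assumed to mean missing).
# Assumption: whitespace-separated columns; output is rejoined with tabs.
sub recode_attribute {
    my ($line) = @_;
    my @f = split ' ', $line;    # split on runs of whitespace; also drops the newline
    return $line if @f < 5;      # leave short or blank lines untouched
    $f[4] = $f[4] =~ /HCR/ ? 1
          : $f[4] =~ /LCR/ ? 2
          :                  0;
    return join("\t", @f) . "\n";
}

print recode_attribute("1293 1001 1002 1 13HCR-NIH-0\n");
print recode_attribute("1294 1001 1002 2 27LCR-NIH-3\n");
```

Restricting the match to the fifth field means that an ID or other column that happened to contain "HCR" would not be rewritten. If the file has a header row, it would need to be skipped, since its fifth field would otherwise be recoded to 0.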
Thank you for any advice/help.
Replies are listed 'Best First'.
Re: Data format - delete parts of string and replace strings that match characters partly with numbers.
by BrowserUk (Patriarch) on Apr 01, 2011 at 17:53 UTC

Re: Data format - delete parts of string and replace strings that match characters partly with numbers.
by jellisii2 (Hermit) on Apr 01, 2011 at 17:53 UTC

Re: Data format - delete parts of string and replace strings that match characters partly with numbers.
by locked_user sundialsvc4 (Abbot) on Apr 01, 2011 at 18:21 UTC