in reply to formating data input

Using -a (autosplit into @F), -F (split delimiter), -n (read the file line by line) and -l (strip the newline on input, add it back on print). See perlrun.

On a Unix system, swap the double and single quotes.

C:\test>perl -aF"\|" -nle"print join '|', @F[0,1]" junk6.pl
271479 | Papaya leaf curl Guandong virus
12202 | Lettuce mosaic virus
116056 | Pelargonium zonate spot virus
45709 | Sabia virus

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: formating data input
by manu7495 (Initiate) on Jun 27, 2007 at 14:19 UTC
    THANKS TONS AND TONS for all the wonderful suggestions. I got what I wanted on the shell. Is there any way I could use this inside the code I have posted and still do what I am intending to do? I just tried and it screwed up a bit, so I would welcome any suggestions. Again, thanks a lot; I very much appreciate the help.

      I do not know how to answer you because the code you posted doesn't seem to relate to the data posted--either before or after the reduction to two fields.

      The program is looking for lines that begin with the word ARRAY and ignores any other lines, but none of your posted data has lines that begin with ARRAY.


        Here is exactly the input data I am using this script for, pasted below. It chokes on the first cell (the first cell you would see if you imported the whole data into Excel) when that cell has the longer form <field1> | <field2> | <field3> | <field4>; all of this is one annotation in one CELL. Look at code 1 below, which is the actual data. I need to keep only <field1> | <field2> of this cell but still retain the rest of the values in the other cells. Look at code 2; that is the form I intend to have. The one-line code you sent earlier does strip the data into the format I want, but it takes away all the other data in my table, which is not intended. Looking at the 2 examples pasted here as code (tab file), I am sure you will understand what I am asking.
        ARRAYS  V1.4HFChip-1  V1.4HFChip-10  V1.4HFChip-100  V1.4HFChip-11  V1.4HFChip-12  V1.4HFChip-13
        37133 | Tula virus | Bunyaviridae | V1.3_110017:22.1  0.539026  0.357762  0.801409  0.315076  0.207579  0.946322
        263532 | Possum enterovirus W6 | Picornaviridae | V1.3_116027:82.1  0.242743  0.712059  0.474686  0.738211  0.26494  0.529945
        271479 | Papaya leaf curl Guandong virus | Geminiviridae | V1.3_105649:75.8  0.291412  0.726736  0.277159  0.893388  0.24579  0.904211
        12202 | Lettuce mosaic virus | Potyviridae | V1.3_118815:65.4  0.39146  0.567612  0.771404  0.671439  0.427434  0.855816
        116056 | Pelargonium zonate spot virus | Bromoviridae | V1_111931:65.5  0.704965  0.750921  0.66365  0.835392  0.654149  0.04262
        45709 | Sabia virus | Arenaviridae | V1_112261:16.8  0.392471  0.740175  0.584603  0.861441  0.434677  0.758832
        130556 | Culex nigripalpus NPV | Baculoviridae | V1_112047:15.8  0.315955  0.882084  0.551393  0.909915  0.088346  0.745482
        312349 | Procyon lotor papillomavirus type 1 | Papillomaviridae | V1.3_113827:83.8  0.652409  0.200222  0.65569  0.239118  0.537655  0.889673
        243550 | Calicivirus isolate TCG | Caliciviridae | V1.3_115411:78.6  0.324359  0.820308  0.238306  0.88163  0.311354  0.741035
        150285 | Garlic virus E | Flexiviridae | V1.3_103783:90.0  0.267302  0.809609  0.55432  0.908932  0.193653  0.718928

        ARRAYS  V1.4HFChip-1  V1.4HFChip-10  V1.4HFChip-100  V1.4HFChip-11  V1.4HFChip-12  V1.4HFChip-13
        37133 | Tula virus  0.539026  0.357762  0.801409  0.315076  0.207579  0.946322
        263532 | Possum enterovirus W6  0.242743  0.712059  0.474686  0.738211  0.26494  0.529945
        271479 | Papaya leaf curl Guandong virus  0.291412  0.726736  0.277159  0.893388  0.24579  0.904211
        12202 | Lettuce mosaic virus  0.39146  0.567612  0.771404  0.671439  0.427434  0.855816
        116056 | Pelargonium zonate spot virus  0.704965  0.750921  0.66365  0.835392  0.654149  0.04262
        45709 | Sabia virus  0.392471  0.740175  0.584603  0.861441  0.434677  0.758832
        130556 | Culex nigripalpus NPV  0.315955  0.882084  0.551393  0.909915  0.088346  0.745482
        312349 | Procyon lotor papillomavirus type 1  0.652409  0.200222  0.65569  0.239118  0.537655  0.889673
        243550 | Calicivirus isolate TCG  0.324359  0.820308  0.238306  0.88163  0.311354  0.741035
        150285 | Garlic virus E  0.267302  0.809609  0.55432  0.908932  0.193653  0.718928
        Hope this makes my question clear. To restate: I just want the curation to happen in the first cell while retaining all the other values. Just for information, I have several thousand cells like this.