in reply to Parsing csv without changing dimension of original file
to the stackoverflow script.
next if $. < 2;
Without access to the data - I'm guessing that this suppresses output of the heading line, and might cause the problem you have seen.
You can avoid speculation about your intent, and get better responses if you show a sample of the original data, and the output you expect.
...it is unhealthy to remain near things that are in the process of blowing up. man page for WARP, by Larry Wall
Re^2: Parsing csv without changing dimension of original file
by huck (Prior) on Mar 06, 2017 at 20:17 UTC
Because there is no close $fh; between the two opens, next if $. < 2; is meaningless if there are any lines in kegg_pathway_title.txt. $. is only reset by an explicit close, so it keeps counting on from the lines already read from the dictionary file.
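To make the point about $. concrete, here is a minimal sketch. It assumes two small temporary files standing in for the dictionary and the data file; the helper names write_temp and first_line_number are invented for this illustration and are not from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the given lines to a temporary file and return its name.
sub write_temp {
    my @lines = @_;
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} "$_\n" for @lines;
    close $out or die "close failed: $!";
    return $name;
}

# Return the value of $. seen after reading the first line of the second
# file, with or without an explicit close between the two opens.
sub first_line_number {
    my ($first, $second, $close_between) = @_;
    open my $fh, '<', $first or die "Cannot open $first: $!";
    my @slurped = <$fh>;            # read to EOF; $. now counts these lines
    close $fh if $close_between;    # an explicit close resets $.
    open $fh, '<', $second or die "Cannot open $second: $!";
    my $line = <$fh>;
    my $n = $.;
    close $fh;
    return $n;
}

my $dict = write_temp('a 1', 'b 2', 'c 3');   # stand-in for the dictionary
my $data = write_temp('header', 'row');       # stand-in for the csv

print "without close: ", first_line_number($dict, $data, 0), "\n";
print "with close:    ", first_line_number($dict, $data, 1), "\n";
```

With a three-line dictionary, the first line of the second file is seen as line 4 when there is no close, so a test like next if $. < 2; never skips the header.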
Re^2: Parsing csv without changing dimension of original file
by zillur (Novice) on Mar 06, 2017 at 23:56 UTC
Thank you very much for your reply. Here is the sample of my original data "kegg_pathway_title.txt":
The orthogroups_3.csv has 13 columns:
Expected output:
I want the number of columns (13) in orthogroups_3.csv and in the parsed results to be the same.
Best regards
Zillur
by huck (Prior) on Mar 07, 2017 at 01:08 UTC
Well, there are many, many things wrong here. On your original example page, http://stackoverflow.com/questions/11678939/replace-text-based-on-a-dictionary, you missed the part where it says "I have a dictionary(dict.txt). It is space separated and it reads like this:". Your kegg_pathway_title.txt instead has a tab after the replace-from field. In a way that is easy to fix: change the line to
Next implies that the fields are separated by a tab (\t). Yes, your column headers are separated by tabs, and there are tabs in your other rows, but row OG0000000 has only 5 tab-separated fields: three of them are blank because of consecutive tabs, and the rest of the row is treated as one field, number 5. There is at least a tab after the OG0000000, however.

The next line, OG0000001, does have 13 tab-delimited fields: OG0000001 has a tab after it to put it in its own column, followed by 11 blank fields due to consecutive tabs, with the rest of the row treated as the contents of the 13th column.

Given the following as your dictionary

one notices that none of the replace-from fields in it occur in your sample Orthogroups_3.csv file at all, so your expected output is a myth.

And besides the tab after the replace-from field in your dictionary file, adding this code

identifies a major problem

result

Those new tabs introduce "extra columns" to the output.

The code that identifies all these problems is
All this leads me to think you don't have much of a clue as to what you are doing and are just trying cookie-cutter examples found on the web. This is a bad thing to do.

Edit: code tabs around the huge fields, but I'm not sure it's any better.
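As an illustration of the field-count check described above, and not the code from the original post, here is a minimal sketch that counts tab-separated fields per row and reports rows that disagree with the header. The names count_fields and ragged_rows and the sample rows are assumptions made for this example.

```perl
use strict;
use warnings;

# Count tab-separated fields in a line. The -1 limit keeps trailing
# empty fields, which split would otherwise silently drop.
sub count_fields {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;
    return scalar @fields;
}

# Return [line_number, field_count] pairs for every row whose field
# count differs from the header row (the first line).
sub ragged_rows {
    my @lines = @_;
    my $expected = count_fields($lines[0]);
    my @bad;
    for my $i (1 .. $#lines) {
        my $n = count_fields($lines[$i]);
        push @bad, [$i + 1, $n] if $n != $expected;
    }
    return @bad;
}

# Hypothetical sample: a 3-column header, one good row, one short row.
my @sample = ("id\tsp1\tsp2\n", "OG1\t\t\n", "OG2\tgeneA\n");
printf "line %d has %d fields\n", @$_ for ragged_rows(@sample);
```

Running the same check on the output of the replacement script would show whether tabs inside the replacement text have added columns.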
by zillur (Novice) on Mar 07, 2017 at 03:54 UTC
Thank you very much for your comment. Sorry for the inconvenience. Using "sep=\t" in your first script solved the problem, and this gives me exactly the same output as the latest script.
my %dict = map { chomp; split '\t', $_, 2 } <$fh>;
I have another problem. In the result, I still have the previous text in many cells. It's strange: some cells are replaced exactly and some are not. Either they are not replaced, or they are replaced by the 1st column of 'kegg_pathway_title.txt'. I tried to delete those strings but failed. What I have done:
cut -f1 bioDBnet_db2db_KEGG_Title_final.txt > exclude-these.txt
but it failed. I have tried in many ways. Is there any way to upload my files here?
Best Regards
Zillur
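For reference, a minimal sketch of a tab-separated dictionary lookup that keeps the column count unchanged. It assumes each cell holds exactly one replace-from value (whole-cell matching), which may not hold for the real Orthogroups data; load_dict, translate_row and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup table from "replace-from<TAB>replace-to" lines.
sub load_dict {
    my @lines = @_;
    my %dict;
    for my $line (@lines) {
        chomp $line;
        my ($from, $to) = split /\t/, $line, 2;
        next unless defined $to;      # skip lines that contain no tab
        $dict{$from} = $to;
    }
    return %dict;
}

# Replace whole cells only; cells without a dictionary entry are kept
# as they are, and the number of columns is unchanged as long as no
# replacement value itself contains a tab.
sub translate_row {
    my ($row, $dict) = @_;
    my @cells = split /\t/, $row, -1;   # -1 keeps trailing empty cells
    @cells = map { exists $dict->{$_} ? $dict->{$_} : $_ } @cells;
    return join "\t", @cells;
}

# Hypothetical sample data.
my %dict = load_dict("K1\tGlycolysis\n", "K2\tTCA cycle\n");
print translate_row("OG1\tK1\t\tK2\t", \%dict), "\n";
```

Because the limit of 2 leaves any further tabs inside the replace-to value, a dictionary line with more than one tab would still add columns to the output.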