Dear Monks,
I have several thousand lines in a format like the following:
ID1<TAB>name1-name2-name3-name4....-nameX
ID2<TAB>name1-name2-name3-name4....-nameX
ID3<TAB>name1-name2-name3-name4....-nameX
...
What I want to achieve is to keep each ID as it appears, and to keep ONLY the unique names in the second column, in their order of first appearance. For instance, imagine:
ID1<TAB>nick-john-helena
ID2<TAB>george-andreas-lisa-anna-matthew-andreas-lisa
ID3<TAB>olivia-niels-peter-lars-niels-lars-olivia-olivia
...
my output should be:
ID1<TAB>nick-john-helena
ID2<TAB>george-andreas-lisa-anna-matthew
ID3<TAB>olivia-niels-peter-lars
What I am looking for is a quick solution to this. My approach would be to read each line, store the ID and the string of names, split the names using - as the delimiter, put all the names into a temp array on the fly, then make this array unique and print the unique elements.
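A minimal sketch of that approach, in case it helps frame the question. It assumes the two columns are separated by a single tab, that names never contain a hyphen themselves, and that the first occurrence of each name should be the one kept. The helper name unique_names is made up for illustration, and a %seen hash stands in for the "make the array unique" step:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a hyphen-separated string of names and
# returns it with repeats removed, keeping first-seen order.
sub unique_names {
    my ($names) = @_;
    my %seen;
    # grep keeps a name only the first time it is encountered.
    return join '-', grep { !$seen{$_}++ } split /-/, $names;
}

# Read from the files named on the command line, or from STDIN.
while (my $line = <>) {
    chomp $line;

    # Split into ID and the rest of the line at the first tab only.
    my ($id, $names) = split /\t/, $line, 2;

    # Assumption: lines without a second column are passed through as-is.
    if (!defined $names) {
        print "$line\n";
        next;
    }

    print $id, "\t", unique_names($names), "\n";
}
```

It would be run as something like `perl dedup.pl input.txt > output.txt`, where both file names are placeholders. Because %seen is declared inside the helper, it is reset for every line, so a name repeated under a different ID is not dropped.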
Any more clever solution perhaps?