Hi,
I have a tab-separated file that may run up to 5000 lines.
The file format is something like this:
XXXXXS331632 XXXXXS331632 female 40087 a5
XXXXXS331632 XXXXXS331632 female 47735 a5
XXXXXS331681 XXXXXS331681 male 40087 e6
XXXXXS331681 XXXXXS331681 male 47735 e6
XXXXXS331856 XXXXXS331856 male 40177 d1
XXXXXS331856 XXXXXS331856 male 47737 d1
What I really want to do is delete rows that appear twice, ignoring the difference in the 4th column (for example 40087 vs. 47735). It does not matter whether the first or the second entry of each pair is removed. In the end I would like a file with the duplicate entries removed.
Something like this:
XXXXXS331632 XXXXXS331632 female 40087 a5
XXXXXS331681 XXXXXS331681 male 40087 e6
XXXXXS331856 XXXXXS331856 male 40177 d1
Any suggestions, please?
Thanks for your time.
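For illustration, here is a minimal sketch of the de-duplication described above, written in Python. It assumes the fields really are separated by single tab characters, that two rows count as duplicates when every field except the 4th matches, and that keeping the first occurrence is acceptable. The names `dedupe_rows` and `dedupe_file` are made up for this example, as are any file paths passed to them.

```python
def dedupe_rows(lines, ignore_index=3, sep="\t"):
    """Yield the first row seen for each key.

    The key is every field except the one at ignore_index
    (0-based, so 3 means the 4th column).
    """
    seen = set()
    for line in lines:
        row = line.rstrip("\r\n")
        if not row:
            continue  # skip blank lines
        fields = row.split(sep)
        key = tuple(f for i, f in enumerate(fields) if i != ignore_index)
        if key not in seen:
            seen.add(key)
            yield row


def dedupe_file(src_path, dst_path):
    """Read src_path, write the de-duplicated rows to dst_path (hypothetical paths)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for row in dedupe_rows(src):
            dst.write(row + "\n")
```

Calling `dedupe_file("input.tsv", "output.tsv")` would write one row per unique combination of columns 1, 2, 3 and 5, in the original order. A set of seen keys is enough here because the file is small (about 5000 lines), and it also works if the duplicate rows are not adjacent.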