Hi. I have a dataset of 100,000 lines like the following (note the first and last lines of this example):
ENSP00000372533 ENSP00000372214
ENSP00000372533 ENSP00000362744
ENSP00000372525 ENSP00000368486
ENSP00000372521 ENSP00000355119
ENSP00000372521 ENSP00000362981
ENSP00000372214 ENSP00000372533
Every line such as:
ENSP1 ENSP4
will appear again somewhere later in the set, with the same two IDs in the opposite order:
ENSP4 ENSP1 (which I consider redundant).
I want to make this set non-redundant. Could you please suggest a way of "cleaning" the dataset, i.e. getting rid of lines that already exist in the opposite order? I couldn't think of a way to do it that would not be immensely time-consuming and clumsy.
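To make the goal concrete, here is a minimal Python sketch of the behaviour I am after. It rests on assumptions of mine: each line holds exactly two whitespace-separated IDs, the first occurrence of a pair is the one to keep, and the function name dedupe_pairs is purely hypothetical. It treats each line as an unordered pair by sorting the two IDs into a key, and remembers the keys already seen in a set, so the whole file is read only once.

```python
def dedupe_pairs(lines):
    """Keep the first occurrence of each unordered ID pair.

    A later line with the same two IDs, in either order, is dropped.
    """
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            # Assumption: blank or malformed lines are simply skipped.
            continue
        # Sorting makes "A B" and "B A" produce the same key.
        key = tuple(sorted(fields))
        if key in seen:
            continue
        seen.add(key)
        kept.append(line.strip())
    return kept


sample = [
    "ENSP00000372533 ENSP00000372214",
    "ENSP00000372533 ENSP00000362744",
    "ENSP00000372525 ENSP00000368486",
    "ENSP00000372521 ENSP00000355119",
    "ENSP00000372521 ENSP00000362981",
    "ENSP00000372214 ENSP00000372533",
]

for pair in dedupe_pairs(sample):
    print(pair)
```

On the six example lines above this should keep the first five and drop the last one, since it repeats the first pair in reverse. Because a file object can be iterated line by line, the same function could presumably be fed an open file instead of a list.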
Thanks a lot for any idea!