Hi,
I am having a problem with text file manipulation in Perl.

I have a file with over 200,000 (2 lakh) lines of data.
I need to find the duplicate strings in the file and copy
the records that contain them into another file.
Is there a function or module in Perl that lets me find all the
duplicates in a file in one go and print them
to another file?
The following is a more detailed form of my requirement:

The input to the code is a text file with records in the following format.

dn: cn=1148734,ou=Employees,dc=jci,dc=com
displayname: Herek, Moriah L
jdirlastfourssn: 2888

dn: cn=1148735,ou=Employees,dc=jci,dc=com
displayname: Pelletier, Michael J
jdirlastfourssn: 8719
uid: cpellem

dn: cn=1148736,ou=Employees,dc=jci,dc=com
displayname: Manimanakis, Aris N
jdirlastfourssn: 0366

dn: cn=1148738,ou=Employees,dc=jci,dc=com
displayname: Bernardini, James A
jdirlastfourssn: 8540

dn: cn=1148739,ou=Employees,dc=jci,dc=com
displayname: Steyvers, Robert L
jdirlastfourssn: 8634

dn: cn=1148740,ou=Employees,dc=jci,dc=com
displayname: Vest, Elizabeth G
jdirlastfourssn: 7487

The file will look like the above.
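
Since the records are separated by blank lines, one way to read them might be Perl's paragraph mode (setting `$/` to the empty string), so that each read returns a whole record. This is only a minimal sketch: the name `read_records` is made up for illustration, and it assumes every record carries at most one `displayname` line.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records from a filehandle.
# Returns a list of hashrefs: { text => raw record, displayname => value or undef }.
sub read_records {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: one blank-line-separated record per read
    my @records;
    while ( my $rec = <$fh> ) {
        $rec =~ s/\n+\z//;    # drop the trailing newlines that end the record
        # Capture the text after "displayname:" (attribute name matched case-insensitively).
        my ($name) = $rec =~ /^displayname:[ \t]*(.*?)[ \t]*$/mi;
        push @records, { text => $rec, displayname => $name };
    }
    return @records;
}
```

Each element then holds the raw record text plus the extracted display name, which is `undef` for a record that has no such attribute.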
What I need to do is:

1. Take the first entry and get the value of its displayname attribute.
2. Check whether there is another record with the same displayname value. (There could be
multiple such records.)
3. If so, extract all of the matching records and write them into
another file.
4. Delete these duplicate records from the parent file.
5. Do that for all records.
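
As far as I know there is no single built-in that does all of this, but the steps above could be sketched with a hash of counts and two passes over the file. This is a rough sketch under some assumptions: `split_duplicates` and `displayname` are hypothetical names, display names are compared as exact strings, and instead of deleting from the parent file in place it writes the non-duplicate records to a separate file that can then replace the original.

```perl
use strict;
use warnings;

# Hypothetical helper: return the displayname value of one record, or undef.
sub displayname {
    my ($rec) = @_;
    my ($name) = $rec =~ /^displayname:[ \t]*(.*?)[ \t]*$/mi;
    return $name;
}

# Hypothetical helper: copy records whose displayname occurs more than once
# to $dup_path and all other records to $uniq_path.
# Returns the number of records written to $dup_path.
sub split_duplicates {
    my ( $in_path, $dup_path, $uniq_path ) = @_;
    local $/ = '';    # paragraph mode: each read returns one whole record

    # Pass 1: count how often each displayname value occurs.
    my %count;
    open my $in, '<', $in_path or die "Cannot read $in_path: $!";
    while ( my $rec = <$in> ) {
        my $name = displayname($rec);
        $count{$name}++ if defined $name;
    }
    close $in;

    # Pass 2: route each record to the duplicates file or the uniques file.
    open $in, '<', $in_path or die "Cannot read $in_path: $!";
    open my $dup,  '>', $dup_path  or die "Cannot write $dup_path: $!";
    open my $uniq, '>', $uniq_path or die "Cannot write $uniq_path: $!";
    my $moved = 0;
    while ( my $rec = <$in> ) {
        $rec =~ s/\n+\z//;    # normalise the record's trailing newlines
        my $name = displayname($rec);
        if ( defined $name && $count{$name} > 1 ) {
            print {$dup} "$rec\n\n";
            $moved++;
        }
        else {
            print {$uniq} "$rec\n\n";
        }
    }
    close $in;
    close $dup  or die "Cannot close $dup_path: $!";
    close $uniq or die "Cannot close $uniq_path: $!";
    return $moved;
}
```

A call might look like `split_duplicates('employees.txt', 'duplicates.txt', 'employees.clean.txt')`, where all three file names are placeholders; renaming the clean file over the original afterwards would cover step 4. Only the display names and their counts are held in memory, which should be manageable for a file of this size.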

I hope you got what I meant.

In reply to Help needed on text file/String manipulation by chinkusimon