Here's how I'd do it:

    use strict;
    use warnings;
    use Data::Dumper;

    my $delimiter = "|";    # delimiter for multiple tags
    my %data;

    while (my $line = <DATA>) {
        chomp $line;
        $line =~ s/^(.*?):\s*//;    # remove leading number and colon, and any whitespace

        my $rec = {};
        my $partno;
        for my $field ( split(/,/, $line) ) {
            # A limit of 2 keeps an empty value (e.g. "tags=") as '' instead of undef
            my ($key, $value) = split(/=/, $field, 2);
            if ($key eq 'partnum') {
                $partno = $value;
            }
            else {
                $rec->{$key} = $value;
            }
        }
        next unless defined $partno;    # skip lines that have no part number

        if ( defined $data{ $partno } ) {
            # Duplicate part number: merge tags, sum quantities
            if ( $data{$partno}{'tags'} ) {
                # Append only non-empty tags, so no trailing delimiter is left behind
                $data{$partno}{'tags'} .= $delimiter . $rec->{'tags'}
                    if length $rec->{'tags'};
            }
            else {
                $data{$partno}{'tags'} = $rec->{'tags'};
            }
            $data{$partno}{'quantity'} += $rec->{'quantity'};
            unless ($data{$partno}{'description'} eq $rec->{'description'}) {
                warn "Multiple descriptions for $partno !\n";
            }
        }
        else {
            # First time this part number is seen
            $data{ $partno } = $rec;
        }
    }

    print Dumper(\%data), "\n";

    __DATA__
    413: partnum=2204133000,description=PRESS GAUGE,quantity=1.0000,tags=PI-412
    414: partnum=2202261000,description=THERMOWELL,quantity=2.0000,tags=
    415: partnum=2201176000,description=THERMOMETER,quantity=2.0000,tags=
    581: partnum=2204227002,description=TEMP TRANSMITTER,quantity=1.0000,tags=TE/TT-102
    582: partnum=2201176000,description=THERMOMETER,quantity=3.0000,tags=TI-100 TI-101 TI-200
    576: partnum=2204133000,description=PRESS GAUGE,quantity=1.0000,tags=PI-400

Notes:

I get this output:

    $VAR1 = {
              '2201176000' => {
                                'quantity' => 5,
                                'description' => 'THERMOMETER',
                                'tags' => 'TI-100 TI-101 TI-200'
                              },
              '2204133000' => {
                                'quantity' => 2,
                                'description' => 'PRESS GAUGE',
                                'tags' => 'PI-412|PI-400'
                              },
              '2202261000' => {
                                'quantity' => '2.0000',
                                'description' => 'THERMOWELL',
                                'tags' => ''
                              },
              '2204227002' => {
                                'quantity' => '1.0000',
                                'description' => 'TEMP TRANSMITTER',
                                'tags' => 'TE/TT-102'
                              }
            };

In reply to Re: Removing Duplicates in a HoH by scorpio17
in thread Removing Duplicates in a HoH by DunLidjun
