How do I remove duplicate numeric elements of an array and preserve alphabetic elements?

jzelkowsz has asked for the wisdom of the Perl Monks concerning the following question:

I need to preserve all alphabetic elements of the array but the duplicate numeric elements must be removed. The original array is 45,000+ elements. I am trying to end up with a result like the below (yes, the pipe is required):

20055111|YOUSLAV,YURT,TENWIMPL
20011271|YOUSLAV,WUMARTHE
20011541|YOUSLAV,TENWIMPL
20102741|WEDLOFOU,YOUSLAV,YURT,KUPLYSO,TENWIMPL
20155505|YOUSLAV,YURT,TENWIMPL
20147155|YOUSLAV,KUPLYSO,FRIMA

The original data looked like this:

20055111,YOUSLAV,
20055111,YURT,
20055111,TENWIMPL,
20011271,YOUSLAV,
20011271,WUMARTHE
20011541,YOUSLAV,
20011541,TENWIMPL,
20102741,WEDLOFOU,
20102741,YOUSLAV,
20102741,YURT,
20102741,KUPLYSO,
20102741,TENWIMPL,
20155505,YOUSLAV,
20155505,YURT,
20155505,TENWIMPL,
20147155,YOUSLAV,
20147155,KUPLYSO,
20147155,FRIMA,

I have tried the below but (unfortunately) it removes ALL duplicate elements. I am trying to preserve the alphabetic elements.

sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}


my @cert = qw( 

 20055111 YOUSLAV  20055111 YURT  20055111 TENWIMPL  20011271 YOUSLAV
 20011271 WUMARTHE  20011541 YOUSLAV  20011541 TENWIMPL  20102741 WEDLOFOU
 20102741 YOUSLAV  20102741 YURT  20102741 KUPLYSO  20102741 TENWIMPL
 20155505 YOUSLAV  20155505 YURT  20155505 TENWIMPL  20147155 YOUSLAV
 20147155 KUPLYSO  20147155 FRIMA
);

my @filtered = uniq(@cert);

print "@filtered\n";

Below is a sample of the file I am trying to work with. I replace all the commas with spaces in my array:

20055111,YOUSLAV, 20055111,YURT, 20055111,TENWIMPL, 20011271,YOUSLAV, 20011271,WUMARTHE, 20011541,YOUSLAV, 20011541,TENWIMPL, 20102741,WEDLOFOU, 20102741,YOUSLAV, 20102741,YURT, 20102741,KUPLYSO, 20102741,TENWIMPL, 20155505,YOUSLAV, 20155505,YURT, 20155505,TENWIMPL, 20147155,YOUSLAV, 20147155,KUPLYSO, 20147155,FRIMA, 20172145,TENWIMPL, 20172175,TENWIMPL, 20175511,FRIMA, 20174117,TENWIMPL, 20175410,TENWIMPL, 20175554,YOUSAID, 20202011,FRIMATEC, 20214475,CIPWOMAT, 20271275,YOUSLAV, 20271275,YURT, 20271275,TENWIMPL, 20217175,YURT, 20217175,KUPLYSO, 20217175,TENWIMPL, 20217177,WEDLOFOU, 20217177,YOUSLAV, 20217177,YURT, 20217177,YURTRN, 20217177,YURTRN, 20217177,TENWIMPL, 20217177,WEDLOFOU, 20217177,YOUSLAV, 20217177,KUPLYSO, 20217177,TENWIMPL, 20217171,YOUSLAV, 20217171,YURT, 20217171,TENWIMPL, 20217171,YOUSLAV, 20217171,YURT, 20217171,TENWIMPL, 20217110,WEDLOFOU, 20217110,YOUSLAV, 20217110,KUPLYSO, 20217110,TENWIMPL, 20217112,YOUSLAV, 20217112,YOUTESSNO, 20217112,YOUTESSNO, 20217507,YOUSLAV, 20217501,WEDLOFOU, 20217501,YOUSLAV, 20217501,TENWIMPL, 20217512,TENWIMPL, 20217517,YOUSLAV, 20217517,FRIMA, 20217517,YOUSLAV, 20217517,YURT, 20217517,TENWIMPL, 20217511,YOUSLAV, 20217511,SYMKIR, 20217511,TENWIMPL, 20217520,WEDLOFOU, 20217520,YOUSLAV, 20217520,TENWIMPL, 20217521,YOUSLAV, 20217521,TENWIMPL, 20217522,WEDLOFOU, 20217522,YOUSLAV, 20217522,CIPWOMAT, 20217522,TENTMIR, 20217522,TENTMIR, 20217555,YOUSLAV, 20217555,YURT, 20217555,TENWIMPL, 20217557,CODNGSPC, 20217774,YOUSLAV, 20217774,KUPLYSO,

Re: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by hippo (Archbishop) on Jun 04, 2018 at 14:32 UTC
Have you read the FAQ "How can I remove duplicate elements from a list or array?" If so, have you tried to modify the approaches given there to meet your individual requirements? I would modify the last example.
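One way to read that hint, as a minimal sketch: keep the `%seen` filter from the question but apply it only to elements that look numeric, so every alphabetic element passes through. The sub name `uniq_numeric` and the `/^\d+$/` test for "numeric" are assumptions made for illustration; they do not come from the FAQ.

```perl
use strict;
use warnings;

# Drop repeated all-digit elements; pass every other element through.
# Assumption: "numeric" means the element consists only of digits.
sub uniq_numeric {
    my %seen;
    return grep { !/^\d+$/ || !$seen{$_}++ } @_;
}

my @cert = qw(
    20055111 YOUSLAV 20055111 YURT 20055111 TENWIMPL
    20011271 YOUSLAV 20011271 WUMARTHE
);

# Prints: 20055111 YOUSLAV YURT TENWIMPL 20011271 YOUSLAV WUMARTHE
print join( ' ', uniq_numeric(@cert) ), "\n";
```

This flat result only reads correctly if all entries for a number are adjacent in the input. If a number can reappear later, its names would end up listed after a different number, and grouping by number in a hash is probably the safer route.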
Re: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by BrowserUk (Patriarch) on Jun 04, 2018 at 15:26 UTC
Try this:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

my( @ordered, %grouped );

while( <DATA> ) {
    chomp;
    my @pair = split ',', $_;
    $ordered[ @ordered ] = $pair[ 0 ] unless exists $grouped{ $pair[ 0 ] };
    push @{ $grouped{ $pair[0] } }, $pair[1];
}

#pp \@ordered, \%grouped;

print "$_|", join ',', @{ $grouped{ $_ } } for @ordered;

__DATA__
20055111,YOUSLAV,
20055111,YURT,
20055111,TENWIMPL,
20011271,YOUSLAV,
20011271,WUMARTHE
20011541,YOUSLAV,
20011541,TENWIMPL,
20102741,WEDLOFOU,
20102741,YOUSLAV,
20102741,YURT,
20102741,KUPLYSO,
20102741,TENWIMPL,
20155505,YOUSLAV,
20155505,YURT,
20155505,TENWIMPL,
20147155,YOUSLAV,
20147155,KUPLYSO,
20147155,FRIMA,

Output:

C:\test>1215831.pl
20055111|YOUSLAV,YURT,TENWIMPL
20011271|YOUSLAV,WUMARTHE
20011541|YOUSLAV,TENWIMPL
20102741|WEDLOFOU,YOUSLAV,YURT,KUPLYSO,TENWIMPL
20155505|YOUSLAV,YURT,TENWIMPL
20147155|YOUSLAV,KUPLYSO,FRIMA
Re^2: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by Marshall (Canon) on Jun 04, 2018 at 20:29 UTC
I like your code++. A couple of minor nits: I personally try to avoid subscripts in favor of assigning names to the variables in a split. In this case, I'm not sure what the number (or the name) represents; I'm sure the OP knows a better description than we do. I would use a push instead of an array assignment to @ordered, just because it seems more natural to me. This is minor stuff - no problem at all with your code.

Update: I guess I don't know what is supposed to happen if say YOUSLAV appeared twice for 20055111 or whether that is even possible to occur. If that is possible, the OP should clarify.

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

my( @ordered, %grouped );

while( <DATA> ) {
    chomp;
    my ($number, $name) = split ',', $_;
    push (@ordered, $number ) unless exists $grouped{$number };
    push @{ $grouped{ $number } }, $name;
}

#pp \@ordered, \%grouped;

print "$_|", join ',', @{ $grouped{ $_ } } for @ordered;

=prints
20055111|YOUSLAV,YURT,TENWIMPL
20011271|YOUSLAV,WUMARTHE
20011541|YOUSLAV,TENWIMPL
20102741|WEDLOFOU,YOUSLAV,YURT,KUPLYSO,TENWIMPL
20155505|YOUSLAV,YURT,TENWIMPL
20147155|YOUSLAV,KUPLYSO,FRIMA
=cut

__DATA__
20055111,YOUSLAV,
20055111,YURT,
20055111,TENWIMPL,
20011271,YOUSLAV,
20011271,WUMARTHE
20011541,YOUSLAV,
20011541,TENWIMPL,
20102741,WEDLOFOU,
20102741,YOUSLAV,
20102741,YURT,
20102741,KUPLYSO,
20102741,TENWIMPL,
20155505,YOUSLAV,
20155505,YURT,
20155505,TENWIMPL,
20147155,YOUSLAV,
20147155,KUPLYSO,
20147155,FRIMA,
Re^3: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by jzelkowsz (Novice) on Jun 07, 2018 at 13:11 UTC
You said: "I guess I don't know what is supposed to happen if say YOUSLAV appeared twice for 20055111 or whether that is even possible to occur." It's not possible for the term to appear twice with the number. His solution appears to work very well!
Re^4: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by Marshall (Canon) on Jun 07, 2018 at 14:30 UTC
Re^2: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by jzelkowsz (Novice) on Jun 07, 2018 at 12:59 UTC
I'm very pleased to say your solution is working. I commented out the chomp statement and put in two file-handling statements, and now it's doing exactly what I need. I had to install the "Data::Dump" module. Thank you, this is great work and very slick!
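The thread does not show those two file-handling statements, so the following is only a guess at the shape of such a change: the same grouping logic wrapped in a sub that reads from one filehandle and writes to another. The sub name `group_records` is hypothetical, and the demo uses an in-memory filehandle so that it runs without any data file.

```perl
use strict;
use warnings;

# Read "number,name," records from $in and write one
# "number|name,name,..." line per number to $out, in first-seen order.
sub group_records {
    my ( $in, $out ) = @_;
    my ( @ordered, %grouped );
    while ( my $line = <$in> ) {
        chomp $line;
        my ( $number, $name ) = split /,/, $line;
        next unless defined $name;    # skip blank or malformed lines
        push @ordered, $number unless exists $grouped{$number};
        push @{ $grouped{$number} }, $name;
    }
    print {$out} "$_|", join( ',', @{ $grouped{$_} } ), "\n" for @ordered;
    return;
}

# Demo on an in-memory filehandle instead of a real file.
my $data = "20055111,YOUSLAV,\n20055111,YURT,\n20011271,YOUSLAV,\n";
open my $in, '<', \$data or die "Cannot open in-memory input: $!";
group_records( $in, \*STDOUT );
close $in;
```

For real data, `$in` and `$out` would come from three-argument `open` calls on the actual input and output paths, each checked with `or die`.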
Re: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? -- oneliner by Discipulus (Canon) on Jun 04, 2018 at 15:52 UTC
Hello jzelkowsz, and welcome to the monastery and to the wonderful world of Perl!

As you have already got useful answers, I propose a short version (pay attention to the Windows double quotes!):

perl -F"," -lane "push @{$h{$F[0]}},$F[1]}{print map{$_.'|'.(join',',@{$h{$_}}).qq(\n)}keys %h" data.txt
20155505|YOUSLAV,YURT,TENWIMPL
20102741|WEDLOFOU,YOUSLAV,YURT,KUPLYSO,TENWIMPL
20011541|YOUSLAV,TENWIMPL
20011271|YOUSLAV,WUMARTHE
20147155|YOUSLAV,KUPLYSO,FRIMA
20055111|YOUSLAV,YURT,TENWIMPL

See `perlrun` to get all these perl switches explained, but use `-MO=Deparse` to see the one-liner expanded and more readable (the curly braces in `$F[1]}{print` are a trick called the Eskimo greeting ;):

perl -MO=Deparse -F"," -lane "push @{$h{$F[0]}},$F[1]}{print map{$_.'|'.(join',',@{$h{$_}}).qq(\n)}keys %h"
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our(@F) = split(/,/, $_, 0);
    push @{$h{$F[0]};}, $F[1];
}
{
    print map({$_ . '|' . join(',', @{$h{$_};}) . "\n";} keys %h);
}
-e syntax OK

Note that the output follows hash order (`keys %h`), so the numbers do not come out in the order of the input file.
Re^2: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? -- oneliner by jzelkowsz (Novice) on Jun 07, 2018 at 13:21 UTC
perl -F"," -lane "push @{$h{$F[0]}},$F[1]}{print map{$_.'|'.(join',',@{$h{$_}}).qq(\n)}keys %h" data.txt

This definitely works too.

June 8th 2018 Discipulus added code tags
Re: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by LanX (Saint) on Jun 04, 2018 at 14:55 UTC
This puzzles me:

> preserve all alphabetic elements of the array but the duplicate numeric elements must be removed

but I think what you want is to parse the data pairwise ($number, $name) and have a unique list of names per number.

In this case I'd suggest building a hash of hashes (if the original order doesn't matter). Just set `$names_per_num{$number}{$name} = 1` for each combination. After that you'll just need to iterate over all numbers and print the `keys` of the sub-hash to get your desired output.

No code yet, we'd love to help you improve your attempts! :)

HTH!

Cheers Rolf

PS: and if the original order matters, just use the above HoH as a %seen filter while iterating over the list.
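The hash-of-hashes idea can be sketched roughly as follows. The sub name `names_per_number` and the separate `@order` array (used to keep the numbers in first-seen order) are assumptions for illustration. The input is taken to be a flat list of alternating number and name, like the `@cert` array in the question.

```perl
use strict;
use warnings;

# Build number => { name => 1 } from a flat (number, name, ...) list.
# Returns the numbers in first-seen order plus the hash of hashes.
sub names_per_number {
    my @flat = @_;
    my ( @order, %names_per_num );
    while ( my ( $number, $name ) = splice @flat, 0, 2 ) {
        push @order, $number unless exists $names_per_num{$number};
        $names_per_num{$number}{$name} = 1;    # a repeated name collapses here
    }
    return ( \@order, \%names_per_num );
}

my ( $order, $hoh ) = names_per_number(
    qw( 20217112 YOUSLAV 20217112 YOUTESSNO 20217112 YOUTESSNO 20217507 YOUSLAV )
);

# Names are sorted because the key order of a hash is not defined.
print "$_|", join( ',', sort keys %{ $hoh->{$_} } ), "\n" for @$order;
```

Unlike the hash-of-arrays answers, this drops a name that is repeated for the same number (as `YOUTESSNO` is in the demo input), at the cost of losing the original order of the names within each number.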
Re^2: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by jzelkowsz (Novice) on Jun 07, 2018 at 13:08 UTC
Rolf: Thank you very much for your reply! I will read the link you mentioned. I have found Perl Monks to be very helpful!
Re: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by BillKSmith (Monsignor) on Jun 04, 2018 at 15:31 UTC
Do not think of your problem as removing numeric data. Look at the problem as one of combining all the alpha data that belongs to the same numeric 'key'. In this view, store the data as a hash-of-arrays with the numbers as the keys.

use strict;
use warnings;
use autodie;

open my $FH, '<', 'jzelkowsz.dat';
my %data;
# Read one "number,name, " record at a time by setting the input record separator.
while (my $pair = do{ $/ = ', ';<$FH>}) {
    my ($numeric, $alpha) = split qr/,/, $pair;
    push @{$data{$numeric}}, $alpha;
}
foreach my $num (sort keys %data) {
    $" = ',';     # join interpolated array elements with commas
    $\ = "\n";    # append a newline to every print
    print "$num|@{$data{$num}}";
}

Bill
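As a variation on the same hash-of-arrays idea, here is a sketch that pulls the pairs out with a global regex match rather than a custom record separator, so it does not matter whether the pairs are separated by spaces or by newlines. The sub name `pairs_to_hoa` and the pattern (digits, a comma, then letters) are assumptions based on the sample data in the question.

```perl
use strict;
use warnings;

# Collect number => [names] from text holding "number,name," pairs,
# whether the pairs are separated by spaces or by newlines.
sub pairs_to_hoa {
    my ($text) = @_;
    my %data;
    while ( $text =~ /(\d+),([A-Za-z]+)/g ) {
        push @{ $data{$1} }, $2;
    }
    return \%data;
}

my $sample = "20172145,TENWIMPL, 20217112,YOUSLAV, 20217112,YOUTESSNO,\n";
my $data   = pairs_to_hoa($sample);
{
    local $" = ',';    # list separator, changed only inside this block
    print "$_|@{ $data->{$_} }\n" for sort keys %$data;
}
```

Using `local` keeps the change to `$"` confined to the block, so other code that interpolates arrays keeps the default space separator.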
Re^2: How do I remove duplicate numeric elements of an array and preserve alphabetic elements? by jzelkowsz (Novice) on Jun 07, 2018 at 13:25 UTC
Thank you, Bill. I appreciate a different way of looking at the problem.