1Nf3 has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I'm trying to remove duplicate entries in a text file. It's a simple English-Polish dictionary. Basically, I have a file like this:
anatomy=anatomia
ancestor=poprzednik
ancestor=przodek
ancestral=dziedziczny
ancestral=rodowy
ancestry=pochodzenie
ancestry=przodkowie
anchor=kotwica
when what I want is this:
anatomy=anatomia
ancestor=poprzednik, przodek
ancestral=dziedziczny, rodowy
ancestry=pochodzenie, przodkowie
anchor=kotwica
My problem is - I don't know my way around Perl the way I'd like to, so I'm not sure how to approach this. Right now, I'm thinking about regular expressions and substitutions, something like this:
#!/usr/bin/perl
while (<>) {
    s{
        (^[^=]+)   #should match the duplicated word
        [=]
        (.+)       #should be the translation after the "="
        \n
        \1
        [=]
        (.+)
        \n
    }{$1=$2, $3}xig;
    print;
}
But I think that (if this works) it's a solution for doubled entries, not tripled ones. Could anyone tell me how to replace any number of repetitions in a file like mine? For example, my input file is:
ancient=starozytny
ancillary=pomocniczy
ancillary=sluzebny
ancillary=wspomagajacy
and=a, coraz, i
and=oraz
anecdote=anegdota
anemone=zawilec
and the output should be:
ancient=starozytny
ancillary=pomocniczy, sluzebny, wspomagajacy
and=a, coraz, i, oraz
anecdote=anegdota
anemone=zawilec
Thanks for any suggestions.

Re: Remove duplicate words from a dictionary
by merlyn (Sage) on Dec 27, 2006 at 22:56 UTC
    my %mapping;
    while (<>) {
        my ($key, $value) = /^(.*?)=(.*)$/ or die "Cannot parse $_";
        $mapping{$key}{$_} = 1 for split /, /, $value;
    }
    for my $word (sort keys %mapping) {
        my @aliases = sort keys %{$mapping{$word}};
        print "$word=", join(", ", @aliases), "\n";
    }

    update: Yeah, I wanted it to be a HoH, fixed the code, sorry.
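
    To make the hash-of-hashes idea easier to try without a data file, here is a minimal sketch that wraps the same logic in a subroutine. The name merge_entries and the in-memory list of lines are assumptions made for illustration; they are not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes "key=value" lines and
# returns merged "key=v1, v2" lines, using a hash of hashes.
sub merge_entries {
    my @lines = @_;
    my %mapping;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($key, $value) = $line =~ /^(.*?)=(.*)$/
            or die "Cannot parse $line";
        # Translations become keys of the inner hash, so repeats collapse.
        $mapping{$key}{$_} = 1 for split /, /, $value;
    }
    my @out;
    for my $word (sort keys %mapping) {
        push @out, "$word=" . join(", ", sort keys %{ $mapping{$word} });
    }
    return @out;
}

print "$_\n" for merge_entries(
    "ancillary=pomocniczy\n",
    "ancillary=sluzebny\n",
    "ancillary=wspomagajacy\n",
    "and=a, coraz, i\n",
    "and=oraz\n",
);
# prints:
# ancillary=pomocniczy, sluzebny, wspomagajacy
# and=a, coraz, i, oraz
```

    Because both levels are sorted on output, the translations come out alphabetically rather than in file order.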
      Thank you. Tomorrow I will test it against my 500k data file, but I'm sure it's just the thing I needed. Thank you for the solution, and thank you for the books. I would say "the books that taught me everything I know about Perl", but as you see, I don't know much, so I'll write this: Thanks for the great books you wrote, for if I don't know much about Perl, it's not their fault.
      %mapping seems to be a HoH in the first loop, and a HoA in the second. I'd probably modify it to (untested):
      my %mapping;
      while (<>) {
          my ($key, $value) = /^(.*?)=(.*)$/ or die "Cannot parse $_";
          push @{$mapping{$key}}, split /, /, $value;
      }
      for my $word (sort keys %mapping) {
          my @aliases = sort @{$mapping{$word}};
          print "$word=", join(", ", @aliases), "\n";
      }
      or, if scared by possible repetitions (untested, again):
      my %mapping;
      while (<>) {
          my ($key, $value) = /^(.*?)=(.*)$/ or die "Cannot parse $_";
          $mapping{$key}{$_} = 1 for split /, /, $value;
      }
      for my $word (sort keys %mapping) {
          my @aliases = sort keys %{$mapping{$word}};
          print "$word=", join(", ", @aliases), "\n";
      }

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
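
      The HoA variant above keeps repeated translations, while the HoH variant drops them but emits everything in sorted order. If file order matters, one possible middle ground is an array for order plus a "seen" hash for de-duplication. This is only a sketch; merge_in_order and its variable names are hypothetical and not taken from any reply.

```perl
use strict;
use warnings;

# Hypothetical helper: merges "key=value" lines, keeping keys and
# translations in first-seen order and skipping exact repeats.
sub merge_in_order {
    my @lines = @_;
    my (%translations, %seen, @key_order);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($key, $value) = $line =~ /^(.*?)=(.*)$/
            or die "Cannot parse $line";
        push @key_order, $key unless exists $translations{$key};
        $translations{$key} ||= [];
        for my $word (split /, /, $value) {
            # %seen does the de-duplication; the array keeps the order.
            push @{ $translations{$key} }, $word
                unless $seen{$key}{$word}++;
        }
    }
    my @out;
    for my $key (@key_order) {
        push @out, "$key=" . join(", ", @{ $translations{$key} });
    }
    return @out;
}

print "$_\n" for merge_in_order(
    "and=oraz\n",
    "and=a, coraz, i\n",
    "and=oraz\n",
);
# prints:
# and=oraz, a, coraz, i
```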
Re: Remove duplicate words from a dictionary
by ambrus (Abbot) on Dec 27, 2006 at 23:09 UTC

    As a quick fix,

    #!/usr/bin/perl
    local $/ = undef;
    $_ = <>;
    1 while s{
        (^[^=]+)   #should match the duplicated word
        [=]
        (.+)       #should be the translation after the "="
        \n
        \1
        [=]
        (.+ \n)
    }{$1=$2, $3}xmig;
    print;
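
    Two things make the substitution above work where the line-by-line attempt cannot: the whole file is slurped into one string, so a pattern containing \n can span two entries, and "1 while" repeats the substitution until nothing matches, so each pass merges one more adjacent duplicate. Here is a small sketch of that loop on an in-memory string. The subroutine name collapse_adjacent is hypothetical, and using [^=\n] for the key (so a key cannot run across lines) is my own tweak, not part of the reply.

```perl
use strict;
use warnings;

# Hypothetical wrapper: takes the whole file as one string and merges
# adjacent lines that share a key. Assumes every line ends in a newline.
sub collapse_adjacent {
    my ($text) = @_;
    # Each pass merges pairs of neighbours; repeat until no pair is left.
    1 while $text =~ s{
        ^([^=\n]+)   # key at the start of a line
        =
        (.+)         # translations on this line
        \n
        \1           # the same key on the next line
        =
        (.+\n)       # its translations, newline included
    }{$1=$2, $3}xmig;
    return $text;
}

print collapse_adjacent(
    "ancillary=pomocniczy\n"
  . "ancillary=sluzebny\n"
  . "ancillary=wspomagajacy\n"
  . "and=a, coraz, i\n"
  . "and=oraz\n"
);
# prints:
# ancillary=pomocniczy, sluzebny, wspomagajacy
# and=a, coraz, i, oraz
```

    Note that this only merges duplicates that sit on neighbouring lines, which holds for a sorted dictionary file like the one in the question.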
Re: Remove duplicate words from a dictionary
by siva kumar (Pilgrim) on Dec 28, 2006 at 07:05 UTC
    You can simply try this:
    my %hashMap;
    open(my $fh, '<', "test.txt") or die("Can't open file: $!");
    while (<$fh>) {
        next if /^\s*$/;    # skip blank lines
        my ($key, $value) = /^(.*?)=(.*)$/ or die "Cannot parse $_";
        if (exists $hashMap{$key}) {
            # Join with ", " to match the requested output format.
            $hashMap{$key} .= ", " . $value;
        } else {
            $hashMap{$key} = $value;
        }
    }
    close($fh);
    for my $word (sort keys %hashMap) {
        print "$word=" . $hashMap{$word} . "\n";
    }