Re: Regex: remove non-adjacent duplicate hashtags

I know you're asking for a regexp, but here's another way to do it that also gives you an intermediate count of how many dupes there are:

my $s = "#tag1 #tag2 #tag3 #tag1";

print qq{Original: $s\n};
my ($x, $d);  # $x: tag => count; $d: tags seen more than once
for (split /\s+/, $s) { ++$x->{$_}; $d->{$_} = $x->{$_} if $x->{$_} > 1 }
printf qq{Uniq: %s\n}, join(' ', sort keys %$x);
printf qq{Dupes found for: %s\n}, join(', ', sort keys %$d);

Output:

Original: #tag1 #tag2 #tag3 #tag1
Uniq: #tag1 #tag2 #tag3
Dupes found for: #tag1

I am sure this can be shrunk down quite a bit, but I tried to strike a balance between terseness and readability. You can inspect $x for counts.
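
As a minimal sketch of what inspecting the counts might look like: the counting step is restated here so the snippet runs on its own, and @report is just a name made up for this example.

```perl
use strict;
use warnings;

my $s = "#tag1 #tag2 #tag3 #tag1";

# Count occurrences of each tag in a hash reference.
my $x = {};
$x->{$_}++ for split /\s+/, $s;

# Build one line per tag with its count, sorted by tag name.
my @report = map { "$_ seen $x->{$_} time(s)" } sort keys %$x;
print "$_\n" for @report;

# Prints:
# #tag1 seen 2 time(s)
# #tag2 seen 1 time(s)
# #tag3 seen 1 time(s)
```

Any tag whose count is greater than 1 is a duplicate, so the separate $d hash is a convenience rather than a necessity.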


Re^2: Regex: remove non-adjacent duplicate hashtags by hippo (Archbishop) on Jul 24, 2022 at 09:20 UTC
To solve the presented problem, this type of approach is the one I would use too, but with List::Util::uniq for speed and clarity. Core modules with XS are ideal for this type of task.

#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'uniq';

my $tags = '#tag1 #tag2 #tag3 #tag1';
my $uniq = join ' ', uniq split / /, $tags;
print "Orig: $tags\nUniq: $uniq\n";

🦛
Re^3: Regex: remove non-adjacent duplicate hashtags by perlfan (Parson) on Jul 26, 2022 at 06:27 UTC
Yes, I was just showing a different approach that seemed to solve the issue, i.e., getting the list of unique tags. I also wanted to show how to do that with hashrefs, but sure, what you suggest is another option.
Re^4: Regex: remove non-adjacent duplicate hashtags by AnomalousMonk (Archbishop) on Jul 26, 2022 at 10:19 UTC
I ++ed your reply when I first saw it, but I'm having second thoughts. (But I won't try to take back my upvote. :)

A solution based on uniq (which utilizes a hash under the hood) preserves the order of items in the input list (with only the first of a set of non-unique items being output) whereas the method using sort does not. If information on the particular input list items that were not unique is needed, a single hash can be used both for uniqifying and for counting (and a hash reference is not needed).

Win8 Strawberry 5.8.9.5 (32) Tue 07/26/2022 5:28:42
C:\@Work\Perl\monks
>perl
use strict;
use warnings;

my $s = "#tag2 #tag1 #tag2 #tag3 #tag1";
print qq{Orig: '$s' \n};

my %u;
$s = join ' ', grep { ! $u{$_}++ } split /\s+/, $s;
print qq{Uniq: '$s' \n};

printf qq{Dups: '%s' \n}, join ', ', sort grep { $u{$_} > 1 } keys %u;
^Z
Orig: '#tag2 #tag1 #tag2 #tag3 #tag1'
Uniq: '#tag2 #tag1 #tag3'
Dups: '#tag1, #tag2'

Note: `uniq` may be found in List::MoreUtils rather than in List::Util in older versions of Perl.

Give a man a fish: `<%-{-{-{-<`
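
On the note about where `uniq` lives: a minimal sketch of one way to cope with an installation whose List::Util lacks it, without requiring List::MoreUtils. The $uniq code reference and the hand-rolled fallback are assumptions made for this example, not something from the posts above. The fallback is the same order-preserving grep idiom used in the session above.

```perl
use strict;
use warnings;
use List::Util ();

# Use List::Util's uniq if this installation provides it; otherwise
# fall back to a hand-rolled, order-preserving equivalent (assumption:
# first occurrence wins, as with uniq).
my $uniq = List::Util->can('uniq')
    || sub { my %seen; grep { !$seen{$_}++ } @_ };

my $s    = "#tag2 #tag1 #tag2 #tag3 #tag1";
my @tags = $uniq->(split /\s+/, $s);

print "Uniq: @tags\n";   # Uniq: #tag2 #tag1 #tag3
```

Either branch keeps the first occurrence of each tag in its original position, so the output does not depend on which one is taken.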