Collapsing a string to unique characters

dwhite20899 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Collapsing a string to unique characters by ikegami (Patriarch) on Jan 09, 2009 at 14:02 UTC
In existing order: `perl -nle"my %seen; print grep !$seen{$_}++, /./g"` In lexical order: `perl -nle"my %seen; print sort grep !$seen{$_}++, /./g"` By the way, the `chomp` is useless because `-nl` already chomps. `>perl -MO=Deparse -nle"foo()" BEGIN { $/ = "\n"; $\ = "\n"; } LINE: while (defined($_ = <ARGV>)) { chomp $_; foo(); } -e syntax OK` Update: Oops, I had my tests inverted. Fixed.
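For anyone who finds the one-liners hard to read, the same `%seen` idiom can be written out as a small script. This is only a sketch; the sub names `uniq_in_order` and `uniq_sorted` are made up for illustration and do not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each character,
# in the order the characters are seen.
sub uniq_in_order {
    my ($s) = @_;
    my %seen;
    # $seen{$_}++ is false only the first time a character turns up.
    return join '', grep { !$seen{$_}++ } split //, $s;
}

# Hypothetical helper: the same set of characters, sorted as strings.
sub uniq_sorted {
    my ($s) = @_;
    return join '', sort { $a cmp $b } split( //, uniq_in_order($s) );
}

print uniq_in_order("mississippi"), "\n";    # prints misp
print uniq_sorted("mississippi"),   "\n";    # prints imps
```

Dropping the `!` inverts the test and keeps only the repeats, which is the mistake mentioned in the update above.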
Re^2: Collapsing a string to unique characters by dwhite20899 (Friar) on Jan 09, 2009 at 14:17 UTC
ikegami, I tried the lexical order method, and it's almost what I need... On a Mac, I get this output: `FMMNllltuwwxxxz 13AEITfhlllstxx /39CMMNhhlllllluuwwxxx 8ELMMfhllllxxx 49MMdhllllouxx 2239MMlllquxxxx 018AMMPPQTbbffhllllx /2559Mllrtuuwwxz` If I change `$seen{$_}++` to `$seen{$_}+=2` then I get this: `/034589BDFFHKMMMNNUXabcfghjllllnqsttuuwwwxxxxyzz /112334589AACEEGIIKLMNPQRSTTbdeffhhjkllllnopqssttuxxx //01335899ABCCIMMMNNRTVYbcdefghhhjkllllllloqtuuuwwwxxxxy /035889BEEFKLLMMMNOPQTabcffghhlllllmqrtuvwxxxx +/134456899ACGKMMMNOQWbddghhlllllmnooqrtuuwxxx /2223345899ABDEFGHIMMMNORTabdfhillllnqqtuuwxxxxxyz +/00113457889AACEIMMMNPPPQQTTZabbbcefffghhlllllmqtuwxx //2234555899ACGILMMNOQUbehlllmoqrrsttuuuwwwxxyzz` which DOES list all the chars used, but has duplicates.
Re^2: Collapsing a string to unique characters by dwhite20899 (Friar) on Jan 09, 2009 at 14:21 UTC
*BRILLIANT* That bang did it. Sweet!
Re: Collapsing a string to unique characters by Corion (Patriarch) on Jan 09, 2009 at 13:54 UTC
If the order of the characters is of no concern, you can do it in one regex and the lookup hash: `perl -wple "%seen=();s/(.)/$seen{$1}++?'':$1/ge"`
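A small sketch of what that substitution does, with a made-up sub name (`collapse_subst`). The substitution deletes every repeat and leaves first occurrences where they are, so the result comes out in first-seen order rather than sorted order.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the s///ge one-liner.
sub collapse_subst {
    my ($s) = @_;
    my %seen;
    # /e evaluates the replacement as code: the empty string for a repeat,
    # the character itself the first time it is seen.
    $s =~ s/(.)/$seen{$1}++ ? '' : $1/ge;
    return $s;
}

print collapse_subst("mississippi"), "\n";    # prints misp
```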
Re^2: Collapsing a string to unique characters by dwhite20899 (Friar) on Jan 09, 2009 at 14:27 UTC
Holy moley. I need the order, but I'll save this for another use. Thanks!
Re: Collapsing a string to unique characters by BrowserUk (Patriarch) on Jan 09, 2009 at 14:23 UTC
Golf: 55 (and no sort!) `perl -ple"local($\",@_);@_[unpack'C*',$_]=split'';$_=qq[@_]" test-strings +/034589BDFHKMNUXabcfghjlnqstuwxyz /1234589ACEGIKLMNPQRSTbdefhjklnopqstux /013589ABCIMNRTVYbcdefghjkloqtuwxy /03589BEFKLMNOPQTabcfghlmqrtuvwx +/1345689ACGKMNOQWbdghlmnoqrtuwx /234589ABDEFGHIMNORTabdfhilnqtuwxyz +/01345789ACEIMNPQTZabcefghlmqtuwx /234589ACGILMNOQUbehlmoqrstuwxyz` Unix version is one less: `perl -ple'local($",@_);@_[unpack"C*",$_]=split"";$_=qq[@_]'` Two less (thanks ikegami): `perl -ple'local($",@_);@_[unpack"C*",$_]=split"";$_="@_"'` Update: 54: `perl -ple"local(@_);@_[unpack'C*',$_]=split'';$_=join'',@_" test-strings` 50: `-ple"@_=();@_[unpack'C*',$_]=split'';$_=join'',@_"` Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
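Ungolfed, the array-slice trick might look like the sketch below. The sub name `collapse_by_ord` is hypothetical, and the code assumes single-byte characters, which holds for the `a-zA-Z0-9/+` alphabet in question. Note the `C*` template: a bare `C` would unpack only the first character.

```perl
use strict;
use warnings;

# Hypothetical ungolfed version of the slice-by-character-code idea.
sub collapse_by_ord {
    my ($s) = @_;
    my @slot;
    # unpack 'C*' yields one code per character; used as slice indices,
    # those codes store each character at the index equal to its code.
    # A repeated character simply overwrites its own slot.
    @slot[ unpack 'C*', $s ] = split //, $s;
    # Slots never assigned stay undef; drop them and join the rest.
    return join '', grep { defined } @slot;
}

print collapse_by_ord("mississippi"), "\n";    # prints imps
```

The result comes out in code order for free, because array indices are already ordered. That is why no `sort` is needed.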
Re^2: Collapsing a string to unique characters by ikegami (Patriarch) on Jan 09, 2009 at 14:49 UTC
Depending on platform and Perl version:
Unix, pre 5.10: 42: `-nle'@_=();@_[unpack"C*",$_]=/./g;print@_'`
Windows, pre 5.10: 40: `-nle@_=();@_[unpack'C*',$_]=/./g;print@_`
Unix, 5.10+: 40: `-nlE'@_=();@_[unpack"C*",$_]=/./g;say@_'`
Windows, 5.10+: 38: `-nlE@_=();@_[unpack'C*',$_]=/./g;say@_`
Re^3: Collapsing a string to unique characters by Corion (Patriarch) on Jan 09, 2009 at 14:55 UTC
38, resp. 36 when using `say` instead of `print`: `-nle@_[unpack'C*',$_]=/./g;@_=!print@_` Update: And 36 resp. 34: `-nle@_[ord]=$_,for/(.)/g;@_=!print@_` And if you're using the `-E+say`, you can shave off one more by leaving off the `-l`, at 33 strokes: `-nE@_[ord]=$_,for/(.)/g;@_=!say@_` Update 2: And incorporating BrowserUk and JavaFan's ideas, 34 strokes on Windows, and also on Unix if your shell doesn't treat "!" as special: `-nE@_[map+ord,/./g]=/./g;@_=!say@_`
Re^3: Collapsing a string to unique characters by BrowserUk (Patriarch) on Jan 09, 2009 at 15:10 UTC
Minus 1 on all: Windows, pre 5.10: 39 `-nlelocal@_[map+ord,/./g]=/./g;print@_`
Re^3: Collapsing a string to unique characters by JavaFan (Canon) on Jan 09, 2009 at 15:04 UTC
You can leave off the -l. You don't need it for the output, as you're using say. And you don't need it for the chomp, as /./ doesn't match the newline.
Re^2: Collapsing a string to unique characters by dwhite20899 (Friar) on Jan 09, 2009 at 14:32 UTC
Be kind! It's still morning here, and I'm not done my tea! That's spectacular, but it hurts my brain...
Re: Collapsing a string to unique characters by JavaFan (Canon) on Jan 09, 2009 at 15:07 UTC
35 (Unix): `-nE'@_=();@_[ord]=$_ for/./g;say@_'`
Re^2: Collapsing a string to unique characters by BrowserUk (Patriarch) on Jan 09, 2009 at 15:49 UTC
Anywhere, any version: 32 `-ple$_=join'',sort/./g;tr/!-~//s`
Re^3: Collapsing a string to unique characters by jwkrahn (Abbot) on Jan 10, 2009 at 09:58 UTC
`-ple$_=join'',sort/./g;y///cs`
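The sort-then-squeeze approach, spelled out as a sketch (the sub name `collapse_sorted` is invented here). It uses the `y///cs` form, which does not depend on the `!-~` character range.

```perl
use strict;
use warnings;

# Hypothetical ungolfed version of "sort, then squeeze repeats".
sub collapse_sorted {
    my ($s) = @_;
    # Sorting puts equal characters next to each other ...
    my $sorted = join '', sort { $a cmp $b } split //, $s;
    # ... then tr with /c (complement of an empty list, i.e. every
    # character) and /s (squeeze) collapses each run to one character.
    $sorted =~ tr///cs;
    return $sorted;
}

print collapse_sorted("mississippi"), "\n";    # prints imps
```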
Re^4: Collapsing a string to unique characters by BrowserUk (Patriarch) on Jan 10, 2009 at 11:01 UTC
Re^3: Collapsing a string to unique characters by JavaFan (Canon) on Jan 09, 2009 at 22:02 UTC
"Anywhere" Nope, it'll fail on an EBCDIC platform, as the ~ is somewhere between the 'r' and the 's'.
Re^2: Collapsing a string to unique characters by Corion (Patriarch) on Jan 09, 2009 at 15:11 UTC
Applying my same tricks, you can replace the `@_=();say@_` initialization by `@_=!say@_`, which is two chars shorter, yielding 31 on Windows and 33 on Unix: `-nE@_[ord]=$_,for/./g;@_=!say@_`
Re: Collapsing a string to unique characters by jwkrahn (Abbot) on Jan 09, 2009 at 14:16 UTC
$ echo "9b/lllqtUst48MMMxwBHz+wluFguNx5h3DnyKxfxFjNazwc0X
9b/lllqtxTElC8GLsftS2RKkAxI1MfQTIuNx5h3P4eoEphA31djsn
9b/lllqtyk/lC8MMMxwwwhcTAxlhNBl1ugoluNx5h3fjxud309RVeCIY
9b/lllqtcxP8MMMxBvFfOh8lLxQTfguNx5h3LKrE0maElw
9b/lllqtdx48MMMxw6h4+mol1ugoluNx5h3AKrdGQ9OCnW
9b/lllqtf2l48MMMxDxaxxz29OAHIuNx5h3wG2TEyFqu3RdBin
9b/lllqthPElC8MMMxw78QQ0bfaPlI14TfguNx5h30beTAmP1cfA+Z
9b/lllqtryA8GL2m2s9OQrxzotuIuNx5h3MUw/45uzMeC5ww
" | perl -pe'1 while s/(.)(?=.*\1)//'
9b/qUst48MBH+lgu5h3DnyKfxFjNazwc0X
9b/qlC8GLtS2RKkMfQTIuNx5P4eoEphA31djsn
bqtyk/8MwcTAB1golN5hfjxud309RVeCIY
9b/qtcPMBvFO8QTfguNx5h3LKr0maElw
b/qt8Mw64+m1goluNx5h3AKrdGQ9OCnW
b/tfl48MDaz9OAHINx5hwG2TEyFqu3RdBin
9/qtECMw78QalI4guNx5h30beTmP1cfA+Z
blqyA8GLm2s9OQrotINxh3U/4uzMeC5w
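A sketch of the lookahead version as a sub (the name `collapse_keep_last` is hypothetical). It shows one property that is easy to miss: the regex deletes a character whenever the same character occurs again later, so what survives is the last occurrence of each character, not the first.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the lookahead substitution.
sub collapse_keep_last {
    my ($s) = @_;
    # Remove one character that is followed, somewhere later in the
    # string, by a copy of itself; loop until nothing more matches.
    1 while $s =~ s/(.)(?=.*\1)//;
    return $s;
}

print collapse_keep_last("mississippi"), "\n";    # prints mspi
```

Each pass removes a single character and rescans from the start, so this is probably not the variant to reach for when speed matters.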
Re: Collapsing a string to unique characters by gone2015 (Deacon) on Jan 09, 2009 at 15:11 UTC
I tried:
`sub ext_s {     # Returns characters in sorted order
  my ($s) = @_ ;
  my %h ;
  @h{split(//, $s)} = undef ;
  return join('', sort keys %h) ;
} ;
sub ext_o {     # Returns characters in original order
  my ($s) = @_ ;
  my @h ;
  return join('', grep { !$h[ord($_)]++ } split(//, $s)) ;
} ;`
compared to:
`sub ext_dw {
  my ($s) = @_ ;
  my %h ;
  $h{$_} = undef foreach split(//, $s) ;
  return join('', sort keys %h) ;
} ;
sub ext_cn {
  my ($s) = @_ ;
  my %h ;
  $s =~ s/(.)/$h{$1}++?'':$1/ge ;
  return $s ;
} ;`
and benchmarked:
      Rate   cn   dw    s    o
cn 10101/s   -- -40% -54% -59%
dw 16949/s  68%   -- -22% -31%
s  21739/s 115%  28%   -- -11%
o  24390/s 141%  44%  12%   --
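The harness behind those numbers isn't shown, but a comparison like this is usually run with the core `Benchmark` module. The sketch below is an assumption about how one might reproduce it, not the code actually used; the input string is just one sample line from elsewhere in the thread, and the rates will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test input: one sample string, not the real data set.
my $str = '9b/lllqtUst48MMMxwBHz+wluFguNx5h3DnyKxfxFjNazwc0X';

sub ext_s {    # sorted order, via a hash slice
    my ($s) = @_;
    my %h;
    @h{ split //, $s } = ();
    return join '', sort keys %h;
}

sub ext_o {    # original order, via an array indexed by character code
    my ($s) = @_;
    my @h;
    return join '', grep { !$h[ ord $_ ]++ } split //, $s;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    ext_s => sub { ext_s($str) },
    ext_o => sub { ext_o($str) },
} );
```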
Re^2: Collapsing a string to unique characters by dwhite20899 (Friar) on Jan 09, 2009 at 20:00 UTC
oshalla, too nice, you beat me to it. Considering I'm going to be doing this for 100 million strings, that's great to know.
Re^3: Collapsing a string to unique characters by BrowserUk (Patriarch) on Jan 09, 2009 at 20:42 UTC
If speed is your need, then compare this: `sub buk_s { my @c; $c[ $_ ] = $_ for unpack 'C*', $_[ 0 ]; pack 'C*', grep defined, @c; }`
Re: Collapsing a string to unique characters by JavaFan (Canon) on Jan 10, 2009 at 10:32 UTC
26 chars `-nE'@;{/./g}=();%;=!say%;'`
Re^2: Collapsing a string to unique characters by BrowserUk (Patriarch) on Jan 10, 2009 at 11:21 UTC
No good. Hash keys are unordered.
C:\test>perl -ple$_=join'',sort/./g;y///cs test-strings
+/034589BDFHKMNUXabcfghjlnqstuwxyz
/1234589ACEGIKLMNPQRSTbdefhjklnopqstux
/013589ABCIMNRTVYbcdefghjkloqtuwxy
/03589BEFKLMNOPQTabcfghlmqrtuvwx
+/1345689ACGKMNOQWbdghlmnoqrtuwx
/234589ABDEFGHIMNORTabdfhilnqtuwxyz
+/01345789ACEIMNPQTZabcefghlmqtuwx
/234589ACGILMNOQUbehlmoqrstuwxyz
C:\test>\Perl510\bin\perl5.10.0.exe -nE"@;{/./g}=();%;=!say%;" test-strings
/aNKjyugtsBHcDqbzUwFxMh0fnX39+8l45
S/TNKd2Eju1ktesqbIGxQhMCfLAn3P98lp4Ro5
/TNdYjyu1kgteBcqbIwxVhM0CfA398lR5o
/aTNKEugtvBcqbwFrxQhM0LfO3Pm98l5
/NKdu1gtWqbGwrxQhMCA6nO3m9+8l45o
/TaNdE2yutBHDqbIGzFwxhMfiAnO398l4R5
/TaN7EZu1gtecqbIwxQMh0CfA3Pm9+8l45
/N2yutesqbIGzUwrxQMhCLAO3m98l45o
Re^3: Collapsing a string to unique characters by JavaFan (Canon) on Jan 10, 2009 at 17:20 UTC
"No good. Hash keys are unordered." So? The OP didn't make it a requirement that the result be ordered: "I have a number of strings, made up only of 64 characters: a-zA-Z0-9/+ . I need to collapse these down to just the unique characters in the string." Besides, Perlmonks has a long tradition of making small changes to the requirements for the sake of winning at golf. ;-)
Re^4: Collapsing a string to unique characters by ikegami (Patriarch) on Jan 10, 2009 at 17:32 UTC
Re^5: Collapsing a string to unique characters by dwhite20899 (Friar) on Jan 11, 2009 at 00:47 UTC
Re^2: Collapsing a string to unique characters by CountZero (Bishop) on Jan 10, 2009 at 11:24 UTC
That is too nice, but you will have to explain how it works. CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^3: Collapsing a string to unique characters by JavaFan (Canon) on Jan 12, 2009 at 14:42 UTC
`-nE'@;{/./g}=();%;=!say%;'` `/./g` is in list context, so it's a shorthand for `/(.)/g` and will hence return a list of all characters (without the newline). `@;{...}` is a slice of the hash `%;`. `@;{/./g} = ()` sets all values in the slice to undef. The keys are the characters of the line. `say %;` prints the hash as key-value pairs. Since the values are undefined, the values are printed as empty strings. So, in effect, it prints all the characters of the line, without duplicates. `%;=!say%;` `say` will return true, so its negation will be the empty string. So it'll make `%;` have one element: the empty string as key, and the undefined value as value. This will be printed for the next line, but since they are both printed as empty strings, you won't actually see it.
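The hash-slice step, taken on its own with an ordinary variable name, might look like this sketch. `%chars` and `unique_chars_unordered` are illustrative names only. The demo line sorts the result purely so the printed output is predictable, since hash key order is not.

```perl
use strict;
use warnings;

# Hypothetical ungolfed version of the hash-slice step.
sub unique_chars_unordered {
    my ($line) = @_;
    my %chars;
    # /./g in list context returns every character except the newline;
    # the slice assignment makes each one a key with an undef value.
    @chars{ $line =~ /./g } = ();
    # Key order is up to the hash, so treat the result as a set.
    return join '', keys %chars;
}

my $set = unique_chars_unordered("mississippi\n");
print join( '', sort { $a cmp $b } split( //, $set ) ), "\n";    # prints imps
```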
Re^4: Collapsing a string to unique characters by CountZero (Bishop) on Jan 12, 2009 at 18:38 UTC