harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks!

I have a little problem that might as well be a "homework" assignment. I already have a solution, but my question is is there a nicer or more efficient solution to this problem?

I want to write a function that tests its arguments to make sure that they are all made up of the same characters. My assumption is that if two strings have the same makeup, then they are going to have the same length. So, for example, if a string is made up of 5 As and 2 Bs, then you could have "AAAAABB" or "AABAABA" or "AABBAAA" or any other configuration, BUT they will always be 7 bytes altogether, never more or less. So, first thing I want to make sure is that each argument has the same length. All arguments could be empty strings, or all arguments could be undefined, or all arguments could be N bytes long. Once we figure this out, then we have to find out if the strings are made up of the same stuff.

The first idea that came to my mind is that I'm going to sort the letters. If the sorted strings are the same, then the string has the same makeup. If not, the function returns zero. But could there be a faster solution, something I haven't thought of?

#!/usr/bin/perl use strict; use warnings; Test(); sub hasSameChars { # Make sure each argument has the same type and same length. @_ > 1 or return 0; # Require at least 2 arguments # Check if arguments are of the same length. my $THISLEN = -2; # Current argument's length my $PREVLEN = -2; # Previous argument's length foreach (@_) { $THISLEN = (defined $_) ? length($_) : -1; if ($PREVLEN > -2) { $PREVLEN == $THISLEN or return 0; } $PREVLEN = $THISLEN; } # Okay, at this point, we know that all arguments are # the same type and same length. If they are all undefined, # then $THISLEN will be -1 or if they are all empty strings, then # $THISLEN will be zero, so we can exit right here in that case: $THISLEN < 1 and return 1; # We end up here if all arguments are defined, and all are the same +length. # Check if arguments have the same makeup: my $SAMPLE = join('', sort(split(//, shift))); foreach (@_) { join('', sort(split(//, $_))) eq $SAMPLE or return 0; } return 1; } sub Test { my $ERRORS = 0; # TESTING TRUE: hasSameChars("Hello World!", "Hello World!") == 1 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl ") == 1 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl ", "Hlde! Wlloor") == 1 or + $ERRORS++; hasSameChars("FALLEN", "LENFAL", "ELFALN") == 1 or + $ERRORS++; hasSameChars(undef, undef, undef) == 1 or + $ERRORS++; hasSameChars('', '', '', '') == 1 or + $ERRORS++; hasSameChars('A', 'A', 'A', 'A') == 1 or + $ERRORS++; hasSameChars("\r\n\t\0\0\1", "\0\1\t\0\n\r", "\n\1\r\t\0\0") == 1 or + $ERRORS++; hasSameChars('ABB', 'BBA', 'BAB', 'BBA') == 1 or + $ERRORS++; hasSameChars(' ', ' ', ' ', ' ') == 1 or + $ERRORS++; hasSameChars("ABC", "CAB", "BAC", "BCA", "ACB", "ABC") == 1 or + $ERRORS++; # TESTING FALSE: hasSameChars("\r\n\t\0\1\2", "\0\2\t\1\n\r", "\t\t\r\1\2\n") == 0 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl ", "Hlde! Wlloo") == 0 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl") == 0 or + $ERRORS++; hasSameChars("ABC", "C_B", "BAC", "BCA", "ACB", "ABC") == 0 or + $ERRORS++; hasSameChars("ALLEN", "ELLEN", "HALLEN", "HELLEN") == 0 or + $ERRORS++; hasSameChars("FALLEN", "Lenfal", "ELFALN") == 0 or + $ERRORS++; hasSameChars("FALLEN", "FALLEN ") == 0 or + $ERRORS++; hasSameChars(undef, "FALLEN", "FALLEN") == 0 or + $ERRORS++; hasSameChars("FALLEN", undef, "FALLEN") == 0 or + $ERRORS++; hasSameChars("FALLEN", "FALLEN", undef) == 0 or + $ERRORS++; hasSameChars('', undef, undef) == 0 or + $ERRORS++; hasSameChars('ABB', 'ABA', 'BAA', 'BAA') == 0 or + $ERRORS++; hasSameChars(' ', ' ', ' ', "\t") == 0 or + $ERRORS++; print "\n\n ERRORS = $ERRORS \n\n"; }

Replies are listed 'Best First'.
Re: Check if string A and B are made up of same chars
by LanX (Saint) on Nov 17, 2024 at 00:36 UTC
    Not sure if it's faster.

    It's late and I haven't benchmarked it. ( I suppose it also depends on the length of the strings.)

    For multiple strings you'll need to reset %h to the first count.

    use strict; use warnings; sub test { my ($str1,$str2) = @_; my %h; $h{$_}++ for split //, $str1; $h{$_}-- for split //, $str2; $_ and return 0 for values %h; return 1; } print test(("Hello World!")x2); # 1 print test("Hello World!","Hello World"); # 0

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      Thank you! That's an interesting solution. In the meanwhile, I also thought of one. This version removes characters one by one from both strings:

      Edit: Here is the latest updated version. I think, this is the fastest so far:

        The performance of those algorithms depends a lot on the quality of the input.

        Like

        • length
        • entropy (randomness)

        For instance a linear approach like mine will most likely dominate for longer strings, which are equal.

        But how likely are a bunch of totally random long strings ever to be equal???

        That's a microscopic chance, and any algorithm which is managing to opt out as early as possible will dominate, little matter the complexity of the approach in total.

        So first task:

        define the problem space and provide a generator to create representative input.

        Second task:

        write a Benchmark suite based on this input.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

Re: Check if string A and B are made up of same chars
by Tux (Canon) on Nov 18, 2024 at 12:01 UTC

    Using unpack, hashes and Test::More:

    #!/usr/bin/perl use 5.026001; use warnings; use Test::More; sub same { my ($s1, $s2) = @_; my %l1; $l1{$_}++ for unpack "(A)*" => $s1; my %l2; $l2{$_}++ for unpack "(A)*" => $s2; is_deeply (\%l1, \%l2, "'$s1' ~~~ '$s2'"); } # same sub diff { my ($s1, $s2) = @_; my %l1; $l1{$_}++ for unpack "(A)*" => $s1; my %l2; $l2{$_}++ for unpack "(A)*" => $s2; isnt ((join "" => map { $_, $l1{$_} } sort keys %l1), (join "" => map { $_, $l2{$_} } sort keys %l2), "'$s1' ~!~ '$s2'"); } # diff same ("ABBABAA", "BAABABA"); diff ("ABBABAA", "BAABBBA"); diff ("ABBABAA", "ABBABaA"); diff ("ABBABAA", "BAABBAA "); diff ("ABBABAA", "BAABABA "); diff ("ABBABAA", "BAABABAA"); done_testing;

    -->

    $ test.pl ok 1 - 'ABBABAA' ~~~ 'BAABABA' ok 2 - 'ABBABAA' ~!~ 'BAABBBA' ok 3 - 'ABBABAA' ~!~ 'ABBABaA' ok 4 - 'ABBABAA' ~!~ 'BAABBAA ' ok 5 - 'ABBABAA' ~!~ 'BAABABA ' ok 6 - 'ABBABAA' ~!~ 'BAABABAA' 1..6

    edit: add example for Test2:

    use 5.026001; use warnings; use Test2::V0; sub same { my ($s1, $s2) = @_; my %l1; $l1{$_}++ for unpack "(A)*" => $s1; my %l2; $l2{$_}++ for unpack "(A)*" => $s2; is (\%l1, \%l2, "'$s1' ~~~ '$s2'"); } # same sub diff { my ($s1, $s2) = @_; my %l1; $l1{$_}++ for unpack "(A)*" => $s1; my %l2; $l2{$_}++ for unpack "(A)*" => $s2; isnt (\%l1, \%l2, "'$s1' ~!~ '$s2'"); # <- this line is now m +uch simpler } # diff same ("ABBABAA", "BAABABA"); diff ("ABBABAA", "BAABBBA"); diff ("ABBABAA", "ABBABaA"); diff ("ABBABAA", "BAABBAA "); diff ("ABBABAA", "BAABABA "); diff ("ABBABAA", "BAABABAA"); done_testing;

    Enjoy, Have FUN! H.Merijn
Re: Check if string A and B are made up of same chars
by tybalt89 (Monsignor) on Nov 18, 2024 at 16:23 UTC

    And now for something completely different :)

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11162756 use warnings; Test(); sub hasSameChars { ( grep(defined, @_) or return 1 ) < @_ and return 0; my %same = map { do { my @count = (0) x 256; $count[ord $&]++ while /./gs; "@count" }, 1 } @_; return %same == 1 ? 1 : 0; } sub Test { my $ERRORS = 0; # TESTING TRUE: hasSameChars("Hello World!", "Hello World!") == 1 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl ") == 1 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl ", "Hlde! Wlloor") == 1 or + $ERRORS++; hasSameChars("FALLEN", "LENFAL", "ELFALN") == 1 or + $ERRORS++; hasSameChars(undef, undef, undef) == 1 or + $ERRORS++; hasSameChars('', '', '', '') == 1 or + $ERRORS++; hasSameChars('A', 'A', 'A', 'A') == 1 or + $ERRORS++; hasSameChars("\r\n\t\0\0\1", "\0\1\t\0\n\r", "\n\1\r\t\0\0") == 1 or + $ERRORS++; hasSameChars('ABB', 'BBA', 'BAB', 'BBA') == 1 or + $ERRORS++; hasSameChars(' ', ' ', ' ', ' ') == 1 or + $ERRORS++; hasSameChars("ABC", "CAB", "BAC", "BCA", "ACB", "ABC") == 1 or + $ERRORS++; # TESTING FALSE: hasSameChars("\r\n\t\0\1\2", "\0\2\t\1\n\r", "\t\t\r\1\2\n") == 0 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl ", "Hlde! Wlloo") == 0 or + $ERRORS++; hasSameChars("Hello World!", "Hdll!ooreWl") == 0 or + $ERRORS++; hasSameChars("ABC", "C_B", "BAC", "BCA", "ACB", "ABC") == 0 or + $ERRORS++; hasSameChars("ALLEN", "ELLEN", "HALLEN", "HELLEN") == 0 or + $ERRORS++; hasSameChars("FALLEN", "Lenfal", "ELFALN") == 0 or + $ERRORS++; hasSameChars("FALLEN", "FALLEN ") == 0 or + $ERRORS++; hasSameChars(undef, "FALLEN", "FALLEN") == 0 or + $ERRORS++; hasSameChars("FALLEN", undef, "FALLEN") == 0 or + $ERRORS++; hasSameChars("FALLEN", "FALLEN", undef) == 0 or + $ERRORS++; hasSameChars('', undef, undef) == 0 or + $ERRORS++; hasSameChars('ABB', 'ABA', 'BAA', 'BAA') == 0 or + $ERRORS++; hasSameChars(' ', ' ', ' ', "\t") == 0 or + $ERRORS++; print "ERRORS = $ERRORS\n"; }

    Outputs:

    ERRORS = 0

      Well, let's see:

      X:\>perl 11162789.pl Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "1/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. ERRORS = 0 X:\>perl -v This is perl 5, version 14, subversion 2 (v5.14.2) built for MSWin32-x +64-multi-t hread Copyright 1987-2011, Larry Wall Perl may be copied only under the terms of either the Artistic License + or the GNU General Public License, which may be found in the Perl 5 source ki +t. Complete documentation for Perl, including FAQ lists, should be found +on this system using "man perl" or "perldoc perl". If you have access to + the Internet, point your browser at http://www.perl.org/, the Perl Home Pa +ge. X:\>

      Line 16 is this one:

      return %same == 1 ? 1 : 0;

      Prior to Perl 5.25, a non-empty hash in scalar context returned a STRING containing used and allocated buckets, separated by a slash. After that, a hash in scalar context returns the number of keys in the hash. Obviously, you did not consider that. keys %same would always return the number of keys.

      And while we are at it, let's see what happens once you leave the 8 bit world and process Unicode strings. Adding the following line to the "TESTING FALSE" lines:

      hasSameChars("ABC\x{ABCD}", "ABC\x{CDEF}") == 0 or + $ERRORS++;
      ... Use of uninitialized value in join or string at 11162789.pl line 14. Use of uninitialized value in join or string at 11162789.pl line 14. Use of uninitialized value in join or string at 11162789.pl line 14. Use of uninitialized value in join or string at 11162789.pl line 14. Use of uninitialized value in join or string at 11162789.pl line 14. Use of uninitialized value in join or string at 11162789.pl line 14. Terminating on signal SIGBREAK(21) Use of uninitialized value in join or string at 11162789.pl line 14. Terminating on signal SIGBREAK(21) Use of uninitialized value in join or string at 11162789.pl line 14. X:\>

      Yes, it seems that it does not terminate. Actually, it would terminate, after issuing many, many warnings. Much more than I'm willing watch scrolling across my screen.

      Line 14 is this line:

      "@count"

      Let's change my change to use "ABC\x{101}" and "ABC\x{102}" to see what happens:

      ... Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. Use of uninitialized value in join or string at 11162789.pl line 14. Use of uninitialized value in join or string at 11162789.pl line 14. Use of uninitialized value in join or string at 11162789.pl line 14. Argument "2/8" isn't numeric in numeric eq (==) at 11162789.pl line 16 +. ERRORS = 0 X:\>

      Well, it seems you don't know Unicode at all, or you just completely ignored it.

      Both is sad.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Funny enough, always ERRORS = 0

        So either he just needs to silence the warnings or the OP has to work on his test suite...

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

      Subtitles:

      Our hero is using the stringified @count array as hash key. All counts are equal IFF the hash has exactly one entry.

      Entertaining! 👏🏼

      😉

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

Re: Check if string A and B are made up of same chars
by karlgoethebier (Abbot) on Nov 17, 2024 at 15:07 UTC

    Variety delights:

    #!/usr/bin/env perl use strict; use warnings; use Set::Scalar; use List::MoreUtils qw(uniq); use feature qw(say); my $string = qw(kizuaheli); my $s_1 = Set::Scalar->new(uniq(split//, $string)); my $s_2 = Set::Scalar->new(uniq(split//, $string)); say $s_1->is_equal($s_2); say $s_1 == $s_2; my $s_3 = Set::Scalar->new(uniq(split//, $string)); my $s_4 = Set::Scalar->new(uniq(split//, "")); say $s_3->is_equal($s_4); say $s_3 == $s_4;

    Update: arunbear has pointed out that the use of uniq is superfluous here - see below.

      The uniq is not really needed there because Set::Scalar itself will discard any duplicates.

        Yes, thanks. I had to convince myself again - it's been a long time since I last used it.

        my $s_1 = Set::Scalar->new(split//, "abc"); my $s_2 = Set::Scalar->new(split//, "aaaaaabc"); dd $s_1; dd $s_2; __END__ do { my $a = bless({ elements => { a => "a", b => "b", c => "c" }, universe => bless({ elements => { a => "a", b => "b", c => "c" }, null => bless({ universe => 'fix' }, "Set::Scalar::N +ull"), universe => undef, }, "Set::Scalar::Universe"), }, "Set::Scalar"); $a->{universe}{null}{universe} = $a->{universe}; $a; } do { my $a = bless({ elements => { a => "a", b => "b", c => "c" }, universe => bless({ elements => { a => "a", b => "b", c => "c" }, null => bless({ universe => 'fix' }, "Set::Scalar::N +ull"), universe => undef, }, "Set::Scalar::Universe"), }, "Set::Scalar"); $a->{universe}{null}{universe} = $a->{universe}; $a; }
      Well the OP said "made up of same chars" but he wants "same amount of same characters" ¹

      Please compare how his test for "ABB" and "ABA" are supposed to fail.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

      ¹) or in math lingo "is a permutation"