emdub has asked for the wisdom of the Perl Monks concerning the following question:

I have a very specific need and I have just enough knowledge to make me dangerous, but not enough to put that lethality to good use. What I need to do is take a string from one file, file1.txt for example, and remove any shared letters with a string from another file, file2.txt for example. And then do the same for the string in file2.txt.

Sample Input:

File1.txt: CHARLIE ROOT
File2.txt: HARRY NODE
Output: CLI T Y ND

In the above the letters shared between the two strings are removed and the resulting combined string is printed out. I have about 300 lines per file. Can someone help?

Replies are listed 'Best First'.
Re: Mostly Harmless
by Athanasius (Archbishop) on Dec 28, 2013 at 06:28 UTC

    Hello emdub, and welcome to the Monastery!

    Here is one way to do this (for a single pair of lines):

    #! perl use strict; use warnings; my $string1 = 'CHARLIE ROOT'; my $string2 = 'HARRY NODE'; my @list1 = split //, $string1; my @list2 = split //, $string2; my %hash1 = map { $_ => undef } @list1; my %hash2 = map { $_ => undef } @list2; delete $hash1{' '}; delete $hash2{' '}; for (@list1) { print unless exists $hash2{$_}; } for (@list2) { print unless exists $hash1{$_}; }

    Output:

    16:26 >perl 815_SoPW.pl CLI TY ND 16:26 >

    There are no doubt more elegant/streamlined approaches, but this should give you an idea. (Extending the script to read and compare lines from the two files is left as an exercise!)

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Here is another approach (again, for one pair of words, so extending would be an exercise for the reader):

      use strict; use warnings; my $str1 = q{CHARLIE ROOT}; my $str2 = q{HARRY NODE}; my $output = process( $str1, $str2 ) . q{ } . process( $str2, $str1 ); print $output, qq{\n}; sub process { my ( $word1, $word2 ) = @_; my @letters_to_remove = grep{ ! m/\s+/; } split //, $word2; my $remove_str = join q{}, q{[}, @letters_to_remove, q{]}; my $result = $word1; $result =~ s/$remove_str//g; return $result; }

      Output:

      CLI T Y ND

      In this code, a character group is constructed on the fly, and a substitution operation replaces each member globally in the string with nothing, removing the character. (The string is not as efficient as it could be, since it may contain repeated characters, but this was off-the-cuff code, so there we are.)

      Hope that helps-if not in usefulness, then at least in seeing different possible approaches to the same problem.

Re: Mostly Harmless
by CountZero (Bishop) on Dec 28, 2013 at 10:07 UTC
    And now using the Set::Scalar module.
    use Modern::Perl; use Set::Scalar; my $first_word = 'CHARLIE ROOT'; my $second_word = 'HARRY NODE'; my $first = Set::Scalar->new( split //, $first_word ); my $second = Set::Scalar->new( split //, $second_word ); say $first - $second; # difference between $first and $second say $second - $first; # difference between $second and $first say $first % $second; # symmetric difference
    Output:
    (C I L T) (D N Y) (C D I L N T Y)

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Mostly Harmless
by GrandFather (Saint) on Dec 28, 2013 at 09:07 UTC

    A single hash (and a couple of temporary hashes) cleans up the code somewhat at the cost of a higher degree of Perl knowledge for understanding:

    use strict; use warnings; my $str1 = 'CHARLIE ROOT'; my $str2 = 'HARRY NODE'; my @chars1 = split '', $str1; my @chars2 = split '', $str2; my %hits; $hits{$_} = 1 for @chars1; ++$hits{$_} for keys {map {$_ => 1} @chars2}; print $_ for grep {$_ eq ' ' || $hits{$_}++ == 1} @chars1, @chars2;

    Prints:

    CLI TY ND
    True laziness is hard work
Re: Mostly Harmless
by Lennotoecom (Pilgrim) on Dec 28, 2013 at 11:26 UTC
    $a[0] = 'CHARLIE ROOT'; $a[1] = 'HARRY NODE'; s/[$a[~$i]]//g and print while ($i, $_) = each @a;
    and if you wish to keep spaces between words
    $a[0] = 'CHARLIE ROOT'; $a[1] = 'HARRY NODE'; s/([$a[~$i]])/$1 eq ' '?' ':''/eg and print while ($i, $_) = each @a;
Re: Mostly Harmless
by kcott (Archbishop) on Dec 29, 2013 at 00:26 UTC

    G'day emdub,

    Welcome to the monastery.

    The following code shows how you could handle multiple pairs of strings. I've hard-coded a space to match your "Output". You haven't indicated exactly what you mean by "letter": I've used any character matching /\p{Alpha}/ (see "perluniprops: Properties accessible through \p{} and \P{}") — your concept of "letter" may differ.

    #!/usr/bin/env perl -l use strict; use warnings; my @pairs = (['CHARLIE ROOT', 'HARRY NODE'], ['abc1', 'cde1']); print get_non_shared(reverse @$_), ' ', get_non_shared(@$_) for @pairs +; sub get_non_shared { my ($ref, $src) = @_; my %ref_char = map { $_ => 1 } grep { /\p{Alpha}/ } split '' => $r +ef; join '' => grep { ! $ref_char{$_} } split '' => $src; }

    Output:

    CLI T Y ND ab1 de1

    Note that you haven't indicated how you get your pairs of strings nor what combinations are involved in the operations. Does one string equate to one line from a file? Are you operating on just single pairs from matching lines (i.e. ~300 operations) or all possible combinations (i.e. ~90,000 operations)? Depending on the answers to those — and potentially other questions, such as how often do you plan to run the program — some efficiency optimisations may be useful.

    -- Ken