fish496 has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of some 30000 Swedish words. The words contain the three extra letters of the Swedish alphabet: ö, ä, å. I wish to create a backward dictionary (an alphabetical sort starting from the last letter of each word) so that all words with similar suffixes are grouped together. To achieve this I tried to reverse every word using the reverse function, but the three extra letters are replaced with other characters:
¥Ã - å
€Ã - ä
¶Ã - ö
How do I reverse the strings without getting the characters shown above? This is my code:
#!/usr/bin/perl -w
# Reading protein sequence data from a file, take 3

# The filename of the file containing the protein sequence data
$filename = 'svensk.txt';

# First we have to "open" the file
open(FILE, $filename);

# Read the protein sequence data from the file, and store it
# into the array variable @protein
@WORDS = <FILE>;

# Print the protein onto the screen
##print scalar (@protein);
foreach $str (@WORDS) {
    $str2 = reverse $str;
    print $str2;
}

close PROTEINFILE;

exit;

2019-02-08 Athanasius added code and pre tags

Replies are listed 'Best First'.
Re: Creating reverse dictionary
by GrandFather (Saint) on Feb 08, 2019 at 01:53 UTC

    Perl needs to know that you are dealing with Unicode characters. Consider:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;

    my $str = "ö123\nå456\nå789";

    open my $inFileUTF, '<:utf8', \$str;
    my @lines = <$inFileUTF>;

    chomp @lines;
    @lines = map {scalar reverse} @lines;
    print "Using utf8\n";
    print "$_\n" for @lines;

    open my $inFile, '<', \$str;
    @lines = <$inFile>;

    chomp @lines;
    @lines = map {scalar reverse} @lines;
    print "\nUsing default file I/O\n";
    print "$_\n" for @lines;

    Prints:

    Using utf8
    321ö
    654å
    987å

    Using default file I/O
    321¶Ã
    654¥Ã
    987¥Ã

    Note the use of both strict and warnings. Always use strictures.

    Note the use of three parameter open and lexical file handles (usual file checking omitted because of the string trick).

    The \$str syntax lets me open the string as though it were a file. Very useful here to provide the data in the same "file" as the script.

    See PerlIO for a description of the open custom layer options - the :utf8 bit.
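
    Applied to a real file, the same idea might look like the sketch below. It assumes the word list is UTF-8 encoded with one word per line, which the question does not confirm. The sub name reversed_words is made up for illustration, and the sketch uses the :encoding(UTF-8) layer, which also validates the input, rather than the plain :utf8 layer shown above.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: read a word list and return every word reversed.
    # Assumption: the file is UTF-8 encoded, one word per line.
    sub reversed_words {
        my ($path) = @_;

        # Three-argument open, lexical handle, and the usual error check
        open my $in, '<:encoding(UTF-8)', $path
            or die "Cannot open '$path': $!";
        chomp(my @words = <$in>);
        close $in;

        # The lines are now character strings, so reverse works per letter
        return map { scalar reverse } @words;
    }

    # Only runs when a file name is given on the command line
    if (defined(my $file = shift @ARGV)) {
        binmode STDOUT, ':encoding(UTF-8)';   # encode characters again on output
        print "$_\n" for reversed_words($file);
    }
    ```

    Run it as, for example, perl reverse_words.pl svensk.txt (the script name is arbitrary). If the terminal does not expect UTF-8, the output layer would need to name the terminal's encoding instead.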

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Creating reverse dictionary
by haukex (Archbishop) on Feb 08, 2019 at 07:19 UTC

    As GrandFather said, the issue you describe definitely sounds like an encoding issue. In terms of sorting, you might be interested in Unicode::Collate:

    use warnings;
    use strict;
    use Unicode::Collate;
    use utf8;
    use open qw/:std :utf8/;
    
    my @words=('två','tvagit','ägg','agget','öga','ogallrad','Offerdal');
    
    my @rev = map {scalar reverse} @words;
    my @sorted = Unicode::Collate->new->sort(@rev);
    
    print "$_\n" for @sorted;
    
    __END__
    
    agö
    åvt
    darllago
    ggä
    ladreffO
    tegga
    tigavt
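
    If the dictionary should list the original words (grouped by ending) rather than their reversed forms, and use Swedish alphabetical order, something along these lines may help. This is only a sketch: it swaps in Unicode::Collate::Locale with the 'sv' locale, which sorts å, ä and ö after z, and it sorts on the collation key of each reversed word. The sample words are the ones from above, and the right-aligned printf is just one way to make the shared endings visible.

    ```perl
    use warnings;
    use strict;
    use utf8;
    use open qw/:std :utf8/;
    use Unicode::Collate::Locale;

    # Sample data only; a real run would read the word list from a file
    my @words = ('två','tvagit','ägg','agget','öga','ogallrad','Offerdal');

    # Swedish tailoring: å, ä and ö are separate letters placed after z
    my $collator = Unicode::Collate::Locale->new(locale => 'sv');

    # Pair each word with the sort key of its reversed form, sort on the
    # key, then keep only the original word
    my @backward =
        map  { $_->[1] }
        sort { $a->[0] cmp $b->[0] }
        map  { [ $collator->getSortKey(scalar reverse $_), $_ ] }
        @words;

    # Right-align so that words with the same ending line up
    printf "%10s\n", $_ for @backward;
    ```

    Computing the sort key once per word keeps the comparison itself a plain string cmp, which should matter for a list of 30000 words.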
    
Re: Creating reverse dictionary
by hippo (Archbishop) on Feb 08, 2019 at 09:54 UTC

    How to handle this will depend on both the character encoding of your input file and the character encoding of your output (in this case, the terminal). Once you know these you can use Encode to decode from your input and encode to your output.
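
    As a rough illustration of the decode and encode steps, here is a minimal sketch that assumes both the input and the output are UTF-8 (your file and terminal may well differ, in which case the encoding names change). It also shows where the odd characters come from: reversing the raw bytes swaps the two bytes that make up "å", and those swapped bytes show up as "¥Ã" when displayed as Latin-1.

    ```perl
    use strict;
    use warnings;
    use Encode qw(decode encode);

    # Assumption: input is UTF-8. These four bytes are "två" in UTF-8.
    my $bytes = "tv\xC3\xA5";

    # Reversing the bytes breaks the two-byte sequence for "å"
    my $broken = scalar reverse $bytes;      # "\xA5\xC3vt", not valid UTF-8

    # Decode to characters first, then reverse
    my $chars = decode('UTF-8', $bytes);     # three characters: t, v, å
    my $reversed_chars = scalar reverse $chars;

    # Encode again for the output (assumed to be UTF-8 as well)
    my $reversed = encode('UTF-8', $reversed_chars);

    binmode STDOUT;                          # write the encoded bytes unchanged
    print $reversed, "\n";
    ```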

    Now, here are a few supplementary questions for you:

    • Why no strict?
    • Why do your comments refer throughout to protein sequences when instead you are dealing with a Swedish dictionary?
    • Why persist with bareword filehandles? This combined with the lack of strict means that you did not spot that you have opened FILE but attempted to close PROTEINFILE.

    If you can correct these and properly decode from and encode to the correct character encodings you should be all set. Good luck.

Re: Creating reverse dictionary
by NetWallah (Canon) on Feb 08, 2019 at 01:47 UTC
    Not sure about the contents of your file, but this works for me :
    g>perl -E "$x.= chr($_) for (148,133,134); say $x ;say scalar reverse $x "
    öàå
    åàö

    As a computer, I find your faith in technology amusing.