perlron has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
With a desire to write elegant/optimal code in Perl, I would like to hear any suggestions the monks might have to spare me.
A customer gave me a Word document with country names (containing duplicates, due to multiple committees per country).
I need to create a drop-down in HTML showing country names. Hence my trivial code below to read a list of duplicate country names, identify the unique names and write them to a file in alphabetical order. Basic stuff.
#!/usr/bin/perl
use strict;
my ($key,$name,%countries);
open (my $fh1,"<","files/country_listv1.txt") or die $!;
while(<$fh1>){
    if (!exists $countries{$_}){
        $countries{$_} = '1';
    }
}
open (my $fh2,">","files/country_listv2.txt") or die $!;
foreach $key (sort keys %countries){
    print $fh2 $key;
}
# close takes a single filehandle, so close each one separately
close($fh1);
close($fh2);
Update: I still have one issue: the comma character appears as €TM. I suspect it's an encoding issue with my text file. I am using TextEdit on a Mac, which doesn't seem to have a "save as UTF-8" option. Strange. I'm updating this query rather than creating a new one. I'm going to try some options, but please tell me which is the best way out, as this problem seems a generic one: (1) use UTF-8 encoding in my Perl script; (2) recreate the text file in vim; (3) any other options?

Do we have some Perl code checker available to validate scripts/modules?
Thanks
Do not wait to strike when the iron is hot! Make it hot by striking - WB Yeats

Replies are listed 'Best First'.
Re: Sorting/Cleansing a Duplicate File
by toolic (Bishop) on Oct 25, 2014 at 13:48 UTC
    Another common way to code the while loop:
    while(<$fh1>){ $countries{$_}++; }
    do we have some perl code checker available for us to validate scripts / modules ?
Re: Sorting/Cleansing a Duplicate File
by Laurent_R (Canon) on Oct 25, 2014 at 14:28 UTC
    Hi,

    Your code looks OK to me, but you don't really need this line:

    if (!exists $countries{$_})
    in your code, the hash will remove duplicates without that.

    Concerning your other question, this is not exactly a code checker, but the use of the:

    use warnings;
    pragma will definitely help you find a number of common errors and possibly dangerous or deprecated constructs.

      Thanks. You mean the value will get reassigned every time? I will check this. That's a good one!
        Yes, it will be reassigned every time, but you don't care, do you? And at the end, the keys of your hash will be unique.
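A minimal sketch of the point above, using a small made-up list (`@names` is hypothetical and stands in for the lines of the input file): assigning to a key that already exists just overwrites its value, so the keys end up unique without any `exists` test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines read from the file.
my @names = ("India\n", "Brazil\n", "India\n", "Chile\n", "Brazil\n");

my %countries;
# No exists() test needed: a repeated key simply has its value reassigned.
$countries{$_} = 1 for @names;

my @unique = sort keys %countries;
print @unique;    # one line per distinct country, in sorted order
```

The value stored is irrelevant here; only the keys are read back.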
Re: Sorting/Cleansing a Duplicate File
by poj (Abbot) on Oct 25, 2014 at 14:48 UTC
    .. a word document ..
    I suggest you chomp the input. Also consider removing any leading/trailing whitespace and skipping over blank lines.
    poj
      And the Word document introduced some Unicode/UTF-8 characters which somehow appeared weird on the HTML page. It's that Perl utf8 thingy I keep reading about, I think.
      Thanks, I did chomp it. But I need a newline in the file anyway, so I let it be.
        Yes, but the last line may not have a newline, and then you could get a duplicate.
        poj
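poj's suggestions can be sketched as follows. The input is a made-up in-memory string (`$raw` is hypothetical) so the example runs on its own. Its last line has no trailing newline, which is the case where an un-chomped "Chile\n" and "Chile" would count as two different keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: stray whitespace, a blank line, and a final
# "Chile" with no trailing newline.
my $raw = "India\n  Brazil \n\nChile\nIndia\nChile";
open my $in, '<', \$raw or die "Cannot open in-memory input: $!";

my %countries;
while (my $line = <$in>) {
    chomp $line;                 # drop the newline, if there is one
    $line =~ s/^\s+|\s+$//g;     # strip leading/trailing whitespace
    next if $line eq '';         # skip blank lines
    $countries{$line} = 1;
}
close $in;

my @unique = sort keys %countries;
print "$_\n" for @unique;        # add the newline back on output
```

Chomping on input and adding "\n" on output keeps one name per line in the result while making the keys independent of how each input line happened to end.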
Re: Sorting/Cleansing a Duplicate File
by GotToBTru (Prior) on Oct 26, 2014 at 03:44 UTC

    If the file has a .txt extension, I guess it could be a "word document", a file containing words. That name suggests a Microsoft Word document, which probably would make this a bit trickier.

    1 Peter 4:10
      Can I have a link to understand how to deal with tricky UTF-8 issues when working with text in Perl?
Re: Sorting/Cleansing a Duplicate File
by Anonymous Monk on Oct 29, 2014 at 13:38 UTC

    The most common way to do this, as shown, is to dump all of the values into a hash ... where you only care about the keys, not the values ... then read all of the keys back out with sort( keys( varname ) ).   Unicode can be a wrinkle.
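On the Unicode wrinkle, here is a hedged sketch that assumes the input file really is UTF-8 (the thread does not confirm this). It decodes on input and encodes on output with the `:encoding(UTF-8)` I/O layer. The sample file and its contents are invented for illustration and written to a temporary file so the example is self-contained. Note that the `use utf8` pragma only tells Perl that the script's own source is UTF-8; it does not change how files are read or written.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file, created only so the sketch runs on its own.
# \x{f4} is o-circumflex, \x{2019} is a typographic apostrophe.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':encoding(UTF-8)';
print {$tmp} "C\x{f4}te d\x{2019}Ivoire\nBrazil\nC\x{f4}te d\x{2019}Ivoire\n";
close $tmp or die "Cannot close $path: $!";

# Decode on input, so Perl sees characters rather than raw bytes.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my %countries;
while (my $line = <$in>) {
    chomp $line;
    $countries{$line} = 1;
}
close $in;

my @unique = sort keys %countries;

# Encode on output, so whatever reads it receives valid UTF-8 again.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @unique;
```

If the output still looks wrong in a browser, the HTML page may also need to declare its character set as UTF-8; that part is outside the Perl script.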