in reply to AWK? Split one file in seperate files based on country

Sorry, no awk. Here is a Perl solution.
It reads the whole file into a hash, storing each line under its country (the first column). Then it creates one output file per country and writes that country's lines to it.
#!/usr/bin/perl
use strict;
use warnings;

my %countries;

open my $IN, '<:encoding(utf-8)', '1.csv' or die $!;

while (<$IN>) {
    my @columns = split /;/;
    push @{ $countries{$columns[0]} }, $_;
}

for my $country (keys %countries) {
    open my $OUT, '>:encoding(utf-8)', "$country.csv" or die $!;
    for (@{ $countries{$country} }) {
        print {$OUT} $_;
    }
    close $OUT or die $!;
}
I used UTF-8 encoding for the accented characters. You might need to change the encoding if your input file uses a different one.
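
If you are not sure which encoding the input uses, a rough check like the one below may help. It is only a sketch: looks_like_utf8 is a hypothetical helper name, and the suggestion of cp1252 as the fallback is an assumption (a common default on Windows), not something the file itself can confirm. The check reads the file as raw bytes and tries a strict UTF-8 decode of every line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if every line of $path is valid UTF-8,
# 0 as soon as one line fails to decode.
sub looks_like_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        # FB_CROAK makes decode() die on malformed bytes instead of
        # silently substituting a replacement character.
        my $ok = eval { decode('UTF-8', $line, Encode::FB_CROAK); 1 };
        return 0 unless $ok;
    }
    close $fh;
    return 1;
}

# Usage: perl check_utf8.pl 1.csv
for my $file (@ARGV) {
    # Assumption: a file that is not UTF-8 is probably cp1252 / Latin-1.
    my $layer = looks_like_utf8($file) ? ':encoding(UTF-8)' : ':encoding(cp1252)';
    print "$file: try opening it with '<$layer'\n";
}
```

A pure ASCII file passes the check as well, because ASCII is a subset of UTF-8. In that case the choice of layer makes no difference.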

Re^2: AWK? Split one file in seperate files based on country
by Janwhatever (Novice) on May 31, 2012 at 08:28 UTC
    Hi, I'm trying to understand what you created, and piece by piece I'm putting it together. But when I run it, I get the following error:

    utf8 "\xEB" does not map to Unicode at C:\bla\bla\bla line 9, <$IN> line XXXX.

    So I'm guessing this has something to do with the encoding. I'm not sure what kind of encoding I use. Is there a way to look this up in my file? Or is it possible to remove the encoding part ('<:encoding(utf-8)') so it reads the file as a normal file? I didn't need it before.

    @BrowserUk / jwkrahn: I don't have enough experience to work with your answers, I'm afraid.

    Edit: When I remove the encoding part I get this: No such file or directory at Z:\Data-Content\Data\test\jan\ALL_DATA\ori.pl line 13, <$IN> line 50001. (Line 50001 is the end of the input file.) What am I doing wrong?
      Have you removed the : as well? The open should look like
      open my $IN, '<', '1.csv' or die $!;
        Haha, it worked! Thanks so much :D This explains a lot for me :) I do have one other question. The first line of the original file has the names of the columns. Is it possible to apply this to the output files as well? In other words, keep the first row in all output files.

        So the first row consists of: country_name;region;city;etc;etc;etc
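
One possible way to keep the header, sketched under a few assumptions: the first line of the input really is the header row, the fields are separated by semicolons as above, and plain '<' / '>' opens (no encoding layer) are what you want. The sub name split_with_header and the output directory argument are made up for this example and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits $input into one file per country inside
# $out_dir and repeats the header line at the top of every output file.
# Returns the number of country files written.
sub split_with_header {
    my ($input, $out_dir) = @_;

    open my $IN, '<', $input or die "Cannot open $input: $!";

    # Read the header once, before the loop, so it is not treated as data.
    my $header = <$IN>;
    return 0 unless defined $header;    # empty input file

    my %countries;
    while (my $line = <$IN>) {
        next unless $line =~ /\S/;      # skip blank lines
        my ($country) = split /;/, $line;
        push @{ $countries{$country} }, $line;
    }
    close $IN;

    for my $country (keys %countries) {
        open my $OUT, '>', "$out_dir/$country.csv"
            or die "Cannot write $country.csv: $!";
        # Header first, then all lines collected for this country.
        print {$OUT} $header, @{ $countries{$country} };
        close $OUT or die $!;
    }
    return scalar keys %countries;
}

# Usage: perl split.pl 1.csv [output_dir]
split_with_header($ARGV[0], $ARGV[1] // '.') if @ARGV;
```

The only real change from the original loop is that the first line is read before the while loop starts and printed at the top of each output file.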