melun has asked for the wisdom of the Perl Monks concerning the following question:

I've been looking for a way to re-sort and rearrange hundreds of lines from a file, and then save the result in three or more files.
I'm new to Perl and I haven't managed to produce a bigger script like this with guides.
The conversion goes like this:

First I get the raw list, whose formatting I cannot change directly:
- The first number is the category of the user (these could be sorted into multiple files: 1's in 1.txt, 2's in 2.txt and so on)
1 Förenäme1 Surname1 passwd1
1 Forename2 Surnäme2 passwd2
2 Forenäme3 Surname3 passwd3
3 Forename4 Surname4 passwd4
These are meant to be converted to a list like this:
- The first |-delimited field is in lowercase, with ä,å->a and ö->o
- Other copied values are like in the unprocessed file
- Text-contents of FOO BAR and BAZ can be configured in the start of the process
forename1 surname1|passwd1|Förenäme1 Surname1|FOO|BAR|BAZ|FOO Förenäme1 Surname1
Hope somebody can give me a push in the right direction. Thanks in advance.

Update:
Thanks, fizbin, for your diligence! Works like a charm.
Started to expand this forward...

Replies are listed 'Best First'.
Re: A bit more complex resorting
by fizbin (Chaplain) on Aug 21, 2005 at 16:20 UTC

    I have to disagree with the previous poster - there's nothing really database-y about what you want to do.

    I'm not going to do everything, just the bits that I found interesting.

    Note that the following works reliably only on perl 5.8 and above.
    Although you may get the following to work in an earlier perl, the outcome will likely not be what you want. Specifically, the output is likely to be in utf8, which I strongly suspect is not what you wanted.

    The interesting thing in what you ask is to remove all those accents. The easiest way to do that is with the Unicode::Normalize module, which has shipped with perl since 5.8 (on an older perl you'll need to install it from CPAN). This module gives you access to the various Unicode normalization forms; the one we'll use is called NFKD, which splits every accented letter into a multi-character sequence of letter + combining accent mark. Then you can use a regular expression with perl's support for Unicode properties to remove any character that has the "mark" property (that's what \pM is doing below).

    So here's code that'll do what you want, except for splitting the lines into categories and setting FOO, BAR and BAZ from the command line, both of which should be easy changes to make.

    #! perl
    use Unicode::Normalize;  # for the NFKD function
    use strict;
    use warnings;

    my ($FOO, $BAR, $BAZ) = qw(FOO BAR BAZ);

    # uses system default encoding for INFILE; say
    # '<:encoding(iso-8859-1)' to explicitly use iso-latin-1
    open(INFILE, '<', 'test1.txt') or die "Cannot open test1.txt: $!";
    while (<INFILE>) {
        chomp;
        my ($category, $fornom, $surnom, $pass, @rest) = split;
        die "Extra crud at the end of the line: @rest" if (@rest);
        my ($squashed_fornom) = NFKD($fornom);    # NFKD separates accented letters
                                                  # into letters + combining mark
        $squashed_fornom =~ s/\pM//g;             # remove marks
        $squashed_fornom = lc($squashed_fornom);  # lowercase
        my ($squashed_surnom) = NFKD($surnom);
        $squashed_surnom =~ s/\pM//g;
        $squashed_surnom = lc($squashed_surnom);
        print "$squashed_fornom $squashed_surnom";
        print "|$pass|$fornom $surnom|$FOO|$BAR|$BAZ|$FOO $fornom $surnom\n";
    }
    close(INFILE);

    And that's it. For older perl versions, you'd probably have to go through and manually create a lookup table to convert from an accented letter to a non-accented letter.
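
For the older-perl route, a lookup table covering just the three letters from the question can be a single tr/// translation. This is only a sketch: it assumes the input is ISO-8859-1 (Latin-1) bytes and that ä, å, ö and their capitals are the only accented letters that occur. The name squash_name is made up for illustration.

```perl
use strict;

# Hypothetical helper for illustration only.
# Assumes Latin-1 input: \xE4 = ä, \xE5 = å, \xF6 = ö,
# \xC4 = Ä, \xC5 = Å, \xD6 = Ö.
sub squash_name {
    my ($name) = @_;
    # Map each accented letter to its unaccented counterpart.
    $name =~ tr/\xE4\xE5\xF6\xC4\xC5\xD6/aaoAAO/;
    # Under the assumption above only ASCII letters remain,
    # so a plain lc() is enough.
    return lc($name);
}

# Latin-1 bytes for "Förenäme1"
print squash_name("F\xF6ren\xE4me1"), "\n";
```

Any other accented letter would pass through untouched, so the table has to be extended by hand for each new character.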

    Update: Changed the code to something that'll work in perl 5.6 and higher, though this code is highly fragile on perls that old, and the slightest change is liable to cause your output to spring back to utf-8.
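
The two pieces left out (one output file per category, and configurable FOO/BAR/BAZ) might be sketched roughly as follows. The helper names write_to_category, close_categories and labels_from_args are hypothetical, and the sketch assumes each category's lines go to "<category>.txt" in a directory of your choosing, as the question describes.

```perl
use strict;
use warnings;

# One cached filehandle per category, so each file is opened only once.
my %out;

# Hypothetical helper: write an already-formatted line to "<dir>/<category>.txt".
sub write_to_category {
    my ($category, $line, $dir) = @_;
    $dir = '.' unless defined $dir;
    unless ($out{$category}) {
        open($out{$category}, '>', "$dir/$category.txt")
            or die "Cannot open $dir/$category.txt: $!";
    }
    print { $out{$category} } $line;
}

# Hypothetical helper: close every category file opened so far.
sub close_categories {
    for my $fh (values %out) {
        close($fh) or die "close failed: $!";
    }
    %out = ();
}

# Hypothetical helper: take the three labels from an argument list
# (for example @ARGV), falling back to the defaults used earlier.
sub labels_from_args {
    my @args = @_;
    return @args >= 3 ? @args[0 .. 2] : qw(FOO BAR BAZ);
}
```

In the main loop you would build the output line as one string and pass it to write_to_category($category, $line) instead of printing it, set the labels with my ($FOO, $BAR, $BAZ) = labels_from_args(@ARGV), and call close_categories() after the loop.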

    -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
Re: A bit more complex resorting
by jZed (Prior) on Aug 21, 2005 at 14:23 UTC
    Since you are asking about database-like operations, why don't you use one of the modules that treat delimited text files as databases? Both DBD::CSV and DBD::AnyData can handle the kinds of files you show. If you know SQL, you can simply use ORDER BY to sort on any column, CREATE TABLE AS SELECT to make new tables from combinations of old tables, etc. If you don't know SQL, you can use those same modules with Tie::DBI or Class::DBI as front ends that will hide the SQL from you.