steph_bow has asked for the wisdom of the Perl Monks concerning the following question:


Dear Monks, could you give me a hand?

I would like to gather some elements of a list into one element.

My input file looks like this:
0601 3 NORM 2 ALLO XLF753 U 0045 0050
0603 5 NORM 2 ALLO ADR2CG 0430 0438
0604 6 NORM 2 ALLO AF681VC i U 0500 0510
0605 7 NORM 2 ALLO AF651PQ i 0515 0523
0606 8 NORM 2 ALLO AF713BR i 0445 0453
0607 9 NORM 2 ALLO AFR100M i 0520 0533
0609 11 NORM 2 ALLO GJT775 i E 2300 2315
0610 12 NORM 2 ALLO AF661WN i 0450 0500

I would like my output file to look like this:
0601;3;NORM;2;ALLO;XLF753;U;0045;0050;
0603;5;NORM;2;ALLO;ADR2CG;;0430;0438;
0604;6;NORM;2;ALLO;AF681VC;i U;0500;0510;
0605;7;NORM;2;ALLO;AF651PQ;i;0515;0523;
0606;8;NORM;2;ALLO;AF713BR;i;0445;0453;
0607;9;NORM;2;ALLO;AFR100M;i;0520;0533;
0609;11;NORM;2;ALLO;GJT775;i E;2300;2315;
0610;12;NORM;2;ALLO;AF661WN;i;0450;0500;

So I split each line and handled several cases, depending on how the line is laid out (no single-letter flag, one, or several).
I have written some code, but it does not seem to work. Could you please tell me where I went wrong? Thanks
#!/usr/bin/perl
use strict;
use warnings;

# this is the file we wish to have in a good format
my $file = "$ARGV[0]";
my $Current_Dir = `pwd`;
# print STDOUT "the current directory is $Current_Dir";
open(INFILE,"$ARGV[0]") or die "Can't open $ARGV[0]: $!";

# name of the OUTFILE
# do not forget the "" if
# never put a \n at the end of the OUTFILE name otherwise it does not create the output
my ${outfile_name} = "bon_format_$file";

# to open the file
# OUTFILE is the name of the HANDLE in this case
open (OUTFILE, ">${outfile_name}.csv") or die "Can't open ${outfile_name}.csv: $!";

my @Parts;
my $part;

while (<INFILE>) {
    # the lines are composed of elements separated by a semicolon
    my $Line = $_;
    my @Elements = split(";", $Line);
    my $element = $Elements[1];
    @Parts = split(" ",$element);
    print STDOUT "the seventh element is $Parts[6]\n";
    my $longueur = @Parts;
    print STDOUT "the number of elements is $longueur\n";

    # case where $Parts[5] is ADR2G and $Parts[6] is 2045
    if (($Parts[5] eq /\w+\d+/) & ($Parts[6] eq /\d\d\d\d/)){
        print OUTFILE "$Parts[5];$Parts[6]\n";
    }

    # case where $Parts[6] is E and $Parts[7] is 6043
    if (($Parts[6] eq /\w/) & ($Parts[7] eq /\d\d\d\d/)){
        print OUTFILE "$Parts[6];$Parts[7]\n";
    }

    # case where $Parts[6] is i, $Parts[7] is E and $Parts[8] is 6043
    if (($Parts[6] eq /\w/) & ($Parts[7] eq /\w/) & ($Parts[8] eq /\d\d\d\d/)){
        print OUTFILE "$Parts[6] $Parts[7];$Parts[8]\n";
    }

    # case where $Parts[6] is E, $Parts[7] is i, $Parts[8] is U and $Parts[9] is 3065
    if (($Parts[6] eq /\w/) & ($Parts[7] eq /\w/) & ($Parts[8] eq /\w/) & ($Parts[9] eq /\d\d\d\d/)){
        print OUTFILE "$Parts[6] $Parts[7] $Parts[8];$Parts[9]\n";
    }
}
close INFILE;
close OUTFILE;
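
A note on the comparisons in the script above, offered as a minimal sketch rather than a full fix. In Perl, a pattern is applied to a variable with the binding operator `=~`. An expression such as `$x eq /re/` instead matches the pattern against `$_` and then compares `$x` with the result of that match. Likewise `&` is the bitwise AND, while `&&` is the logical one. The helper names and sample values below are hypothetical and only illustrate the operators.

```perl
use strict;
use warnings;

# Hypothetical sample values, standing in for $Parts[5] and $Parts[6].
my ($callsign, $time) = ('ADR2CG', '0430');

# `=~` binds the pattern to the variable on its left.
sub looks_like_time { return $_[0] =~ /^\d{4}$/     ? 1 : 0 }
sub looks_like_flag { return $_[0] =~ /^[A-Za-z]$/ ? 1 : 0 }

# `&&` is the logical "and"; both tests must succeed.
if ($callsign =~ /^\w+$/ && looks_like_time($time)) {
    print "$callsign;$time\n";
}
```

Anchoring the patterns with `^` and `$` keeps a four-digit time from being mistaken for a single-letter flag, and the reverse.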

Replies are listed 'Best First'.
Re: gathering of some elements of a list
by FunkyMonk (Bishop) on Jul 30, 2007 at 14:06 UTC
    Looks like fixed width columns, so use that fact...

    while( <DATA> ) {
        my $ieu = substr( $_, 34, 7);
        $ieu =~ s/\s+//g;
        substr( $_, 34, 7) = sprintf("%-7s", $ieu || '-');
        print;
    }
    __DATA__
    0601 3 NORM 2 ALLO XLF753 U 0045 0050
    0603 5 NORM 2 ALLO ADR2CG 0430 0438
    0604 6 NORM 2 ALLO AF681VC i U 0500 0510
    0605 7 NORM 2 ALLO AF651PQ i 0515 0523
    0606 8 NORM 2 ALLO AF713BR i 0445 0453
    0607 9 NORM 2 ALLO AFR100M i 0520 0533
    0609 11 NORM 2 ALLO GJT775 i E 2300 2315
    0610 12 NORM 2 ALLO AF661WN i 0450 0500 0500

    Output:

    0601 3 NORM 2 ALLO XLF753 U 0045 0050
    0603 5 NORM 2 ALLO ADR2CG - 0430 0438
    0604 6 NORM 2 ALLO AF681VC iU 0500 0510
    0605 7 NORM 2 ALLO AF651PQ i 0515 0523
    0606 8 NORM 2 ALLO AF713BR i 0445 0453
    0607 9 NORM 2 ALLO AFR100M i 0520 0533
    0609 11 NORM 2 ALLO GJT775 iE 2300 2315
    0610 12 NORM 2 ALLO AF661WN i 0450 0500 0500

    I've added $ieu || '-' so that an empty column is replaced with a dash. You may not want that, but I find dealing with delimited columns easier than fixed width.
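
The in-place replacement described above can be sketched as a small function. This is only an illustration: `squeeze_flags` is a hypothetical name, the offset 34 and width 7 are taken from the reply, and the sample record is rebuilt with `sprintf` on the assumption that the columns really sit at those offsets.

```perl
use strict;
use warnings;

# Squeeze the flag letters found in a fixed-width window of a line.
# The placeholder is optional and is used only when the window is empty.
sub squeeze_flags {
    my ($line, $placeholder) = @_;
    my $flags = substr($line, 34, 7);
    $flags =~ s/\s+//g;                       # drop all whitespace in the window
    $flags = $placeholder if $flags eq '' && defined $placeholder;
    substr($line, 34, 7) = sprintf('%-7s', $flags);   # keep the width at 7
    return $line;
}

# Hypothetical aligned record (column widths are an assumption).
my $record = sprintf('%-5s%-7s%-5s%-3s%-5s%-9s%-2s%-4s%-2s%-5s%s',
    '0604', '6', 'NORM', '2', 'ALLO', 'AF681VC', 'i', '', 'U', '0500', '0510');

print squeeze_flags($record, '-'), "\n";
```

Because the window is padded back to seven characters, the line keeps its length and the columns to the right do not move.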

    update: added output


      Thanks a lot FunkyMonk

      That's exactly how I would like the output to be (I can then add a delimiter, but the main difficulty was gathering the i, E, U, etc.)
Re: gathering of some elements of a list
by ikegami (Patriarch) on Jul 30, 2007 at 14:01 UTC

    Is your data aligned as shown? It might make more sense to treat the records (lines) as having fixed-width fields.

    while (<DATA>) {
        my @parts = unpack('A5 A7 A5 A3 A5 A9 A2 A4 A2 A5 A5 A*', $_);
        print(join('|', @parts), "\n");
    }
    __DATA__
    0601 3 NORM 2 ALLO XLF753 U 0045 0050
    0603 5 NORM 2 ALLO ADR2CG 0430 0438
    0604 6 NORM 2 ALLO AF681VC i U 0500 0510
    0605 7 NORM 2 ALLO AF651PQ i 0515 0523
    0606 8 NORM 2 ALLO AF713BR i 0445 0453
    0607 9 NORM 2 ALLO AFR100M i 0520 0533
    0609 11 NORM 2 ALLO GJT775 i E 2300 2315
    0610 12 NORM 2 ALLO AF661WN i 0450 0500 0500

    0601|3|NORM|2|ALLO|XLF753|||U|0045|0050|
    0603|5|NORM|2|ALLO|ADR2CG||||0430|0438|
    0604|6|NORM|2|ALLO|AF681VC|i||U|0500|0510|
    0605|7|NORM|2|ALLO|AF651PQ|i|||0515|0523|
    0606|8|NORM|2|ALLO|AF713BR|i|||0445|0453|
    0607|9|NORM|2|ALLO|AFR100M|i|||0520|0533|
    0609|11|NORM|2|ALLO|GJT775|i|E||2300|2315|
    0610|12|NORM|2|ALLO|AF661WN|i|||0450|0500|0500
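
To see why empty columns come out as empty fields, here is a small sketch of how the `A` format behaves in `unpack`: each `An` item takes n characters and strips trailing spaces, so a blank column yields an empty string instead of shifting the later fields. The record is hypothetical and is padded with `sprintf` on the assumption that the data follows the template's widths.

```perl
use strict;
use warnings;

# Field widths taken from the unpack template in the reply above.
my $template = 'A5 A7 A5 A3 A5 A9 A2 A4 A2 A5 A5 A*';

# Hypothetical record with no flag letters at all (widths are an assumption).
my $record = sprintf('%-5s%-7s%-5s%-3s%-5s%-9s%-2s%-4s%-2s%-5s%-5s',
    '0603', '5', 'NORM', '2', 'ALLO', 'ADR2CG', '', '', '', '0430', '0438');

my @fields = unpack($template, $record);

# The three flag columns are present but empty, so the two times
# keep the same field positions as on a line that does carry flags.
print scalar(@fields), " fields: ", join('|', @fields), "\n";
```

A whitespace `split` on the same record would return fewer elements, which is what makes the position of the times vary from line to line.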

      Dear ikegami
      I would like my output file to look like this:
      0601|3|NORM|2|ALLO|XLF753|U|0045|0050|
      0603|5|NORM|2|ALLO|ADR2CG||0430|0438|
      0604|6|NORM|2|ALLO|AF681VC|i U|0500|0510|
      0605|7|NORM|2|ALLO|AF651PQ|i|0515|0523|
      0606|8|NORM|2|ALLO|AF713BR|i|0445|0453|
      0607|9|NORM|2|ALLO|AFR100M|i|0520|0533|
      0609|11|NORM|2|ALLO|GJT775|i E|2300|2315|
      0610|12|NORM|2|ALLO|AF661WN|i|0450|0500|0500

      That is, there should be a single column for i, E, and U, and if two of them appear on the same line, they should share that column.

        Here's a solution that creates the output you've added to the OP using the updated input data. (When making changes to a post, especially if replies relied on the original unchanged data, specify the changes you made by adding "Update:".)

        while (<DATA>) {
            my @parts = unpack('A5 A7 A5 A3 A5 A9 A8 A5 A*', $_);
            for ($parts[6]) {
                s/\s+/ /g;
                s/^\s//;
            }
            print(join(';', @parts), "\n");
        }

        or

        while (<DATA>) {
            my @parts = unpack('A5 A7 A5 A3 A5 A9 A2 A4 A2 A5 A*', $_);
            my $flags = join ' ', grep length, @parts[6..8];
            splice(@parts, 6, 3, $flags);
            print(join(';', @parts), "\n");
        }

        The second version is basically the same as FunkyMonk's, but I do the substitution after extracting the fields.
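
As a quick check that the two variants agree, the sketch below wraps each one in a function and compares their results. The names `flags_wide` and `flags_split` are hypothetical, and the records are rebuilt with `sprintf` on the assumption that the flag letters sit in the columns the templates imply.

```perl
use strict;
use warnings;

# First variant: one 8-character flag column, whitespace normalised afterwards.
sub flags_wide {
    my @parts = unpack('A5 A7 A5 A3 A5 A9 A8 A5 A*', $_[0]);
    for ($parts[6]) { s/\s+/ /g; s/^\s//; }
    return join(';', @parts);
}

# Second variant: three narrow flag columns, non-empty ones joined by a space.
sub flags_split {
    my @parts = unpack('A5 A7 A5 A3 A5 A9 A2 A4 A2 A5 A*', $_[0]);
    my $flags = join ' ', grep length, @parts[6..8];
    splice(@parts, 6, 3, $flags);
    return join(';', @parts);
}

# Hypothetical aligned records (flag offsets are an assumption).
my $fmt = '%-5s%-7s%-5s%-3s%-5s%-9s%-2s%-4s%-2s%-5s%-5s';
my @records = (
    sprintf($fmt, '0604', '6',  'NORM', '2', 'ALLO', 'AF681VC', 'i', '',  'U', '0500', '0510'),
    sprintf($fmt, '0609', '11', 'NORM', '2', 'ALLO', 'GJT775',  'i', 'E', '',  '2300', '2315'),
    sprintf($fmt, '0603', '5',  'NORM', '2', 'ALLO', 'ADR2CG',  '',  '',  '',  '0430', '0438'),
);

for my $record (@records) {
    my ($wide, $split) = (flags_wide($record), flags_split($record));
    print $wide eq $split ? "same: $wide\n" : "differ: $wide / $split\n";
}
```

The two can only diverge if a flag letter straddles the narrow column boundaries, which these sample records do not exercise.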

Re: gathering of some elements of a list
by liverpole (Monsignor) on Jul 30, 2007 at 14:07 UTC
    Hi steph_bow,

    Your code, as presented, isn't working for me.

    For one thing, $file isn't defined (should it be "ALL_FT"?).  You should really use strict and warnings all the time in your programs, to catch these types of errors.

    Secondly, you appear to be looking for the string "REGULATION ALERTS", but it doesn't appear in your input file.  Hence, the entire block beginning with if (/REGULATION ALERTS/..eof(INFILE)){ is never getting executed.

    Thirdly, I'm not sure exactly what you want for output.  It might be more helpful if you could show an example of the exact output you're looking for.  That way, we'll have a target to aim for.


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

      Dear Liverpole
      Thanks for your remarks
      I have updated the post as you suggested (added "use warnings" and "use strict"; as for the input file, you can give it any name you like).

      In this case I do not need "REGULATION ALERTS"; sorry for the error.
        Okay, that's definitely an improvement.

        Now you're assigning to $file (which doesn't need quotes around it):

        my $file = "$ARGV[0]";

        # Better:
        (my $file = $ARGV[0]) or die "Syntax error ...\n";

        Why not use the same variable thereafter?:

        open(INFILE,"$ARGV[0]") or die "Can't open $ARGV[0]: $!";

        # Better:
        open(INFILE, $file) or die "Can't open '$file': $!\n";
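
A common variation on the suggestion above, shown only as a sketch: the three-argument form of `open` with a lexical filehandle keeps the mode separate from the file name and closes the handle when it goes out of scope. `open_for_reading` is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;

# Open a file for reading with three-argument open and a lexical handle.
# Dies with the file name and the OS error message on failure.
sub open_for_reading {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Can't open '$path': $!\n";
    return $in;
}

# Usage sketch: take the file name from the command line, as the script does.
if (@ARGV) {
    my $in = open_for_reading($ARGV[0]);
    while (my $line = <$in>) {
        print $line;
    }
    close $in;
}
```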

        But I still think you should specify the exact output you're looking for.  I suspect that the solution to get you from input data to output data may be easier than you think, and knowing what output you require will help us to modify your program to generate that output correctly.


Re: gathering of some elements of a list
by toolic (Bishop) on Jul 30, 2007 at 14:06 UTC
    The first thing you could do is add the strictures:
    use warnings;
    use strict;
    This will issue the following useful complaint, notifying you that $file was not assigned a value:
    Global symbol "$file" requires explicit package name at script line 17.
    Secondly, your input data does not contain the string "REGULATION ALERTS". When I run your script, I get nothing printed to the output because this string was never found.