oopl1999 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am extremely new to Perl, and any help would be appreciated. I have data in text form such as this:

A 1
B 3 6 2
C 3 4 6 2 10

This is just an example; the actual data is much longer and more complex. I need to parse the data from a text file and then write it to another text file in the following format:

A 1
B 2
B 3
B 6
C 2
C 3
C 4
C 6
C 10

The problem is that I won't know how many rows or columns the data has. In other words, I don't know how many letters there are or how many numbers each letter has. Can anyone help me with this or give me any pointers?

Re: Perl Formatting Text
by Marshall (Canon) on Jun 23, 2016 at 00:30 UTC
    Try something like this:
#!/usr/bin/perl
use warnings;
use strict;

while (my $line = <DATA>)
{
    next if $line =~ /^\s*$/;          #skip blank lines
    my ($label, @rest) = split ' ', $line;
    @rest = sort {$a <=> $b} @rest;    #numeric sort
    foreach my $col (@rest)
    {
        print "$label $col\n";
    }
}

=prints
A 1
B 2
B 3
B 6
C 2
C 3
C 4
C 6
C 10
=cut

__DATA__
A 1
B 3 6 2
C 3 4 6 2 10
      Hi, I am a little confused by your code. Where did you put the input data in the code, or the name of the file that contains the data to sort?
        The post that I made is runnable code. If you download it, it will run "as is". I hope that you did that!

        Instead of an actual data file, I used the predefined DATA file handle. The data that is being read is right after the __DATA__ statement. This allows me to make a single post that shows the program, the output, and the input data.

        What Perl does is open the .pl program for reading in order to compile it, stop at __DATA__, and leave that file handle open as DATA, positioned at the beginning of the line right after __DATA__. The DATA file handle is initialized by Perl without me doing anything extra. Pretty cool! As trivia, it is possible to seek the DATA file handle back to the beginning of the file (seek DATA, 0, 0). This would allow a Perl program to actually "read itself".

        To make a "real program", you need to put in something like this:

        open(FILE, '<', "yourfilename") or die "unable to open read file: $!";
        Now put FILE everywhere that I used DATA.
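Since the original question also asks for the result to go into another text file, here is a minimal sketch of the complete program with real files. It assumes the input and output file names are passed on the command line; the script name reformat.pl and the helper reformat_file are made up for illustration. It uses lexical file handles ($in, $out) instead of the bareword FILE, which is the more common style today:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: reads lines like "B 3 6 2" from $in_name and writes
# one "label number" line per number, numerically sorted, to $out_name.
sub reformat_file {
    my ($in_name, $out_name) = @_;
    open my $in,  '<', $in_name  or die "unable to open $in_name for read: $!";
    open my $out, '>', $out_name or die "unable to open $out_name for write: $!";
    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;                      # skip blank lines
        my ($label, @rest) = split ' ', $line;
        foreach my $col (sort { $a <=> $b } @rest) {   # numeric sort
            print {$out} "$label $col\n";
        }
    }
    close $in;
    close $out or die "unable to close $out_name: $!";
}

# Assumed usage: perl reformat.pl input.txt output.txt
if (@ARGV == 2) {
    reformat_file(@ARGV);
}
```

Checking the return value of close on the output handle is worthwhile, because a failed write (a full disk, for example) may only be reported when the buffered data is flushed.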

        The other "trick" that I used was POD (Plain Old Documentation), the format that perldoc reads. Perl has a way of embedding documentation right into the program, and there are utilities (perldoc, pod2html) that turn certain markup tags into nicely formatted documentation and HTML pages. The "=prints" line says that what follows is documentation; when Perl is expecting a new statement, any line that starts with "=" followed by a word does this. The "=cut" line says "end of documentation". So everything between and including the =prints and =cut lines is skipped by the compiler because it figures that this is program documentation.
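A small sketch of that rule, using the standard =pod command rather than =prints: the assignment inside the documentation block is never compiled, so the counter ends up at 1, not 100. The variable name $count is only for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $count = 0;

=pod

Everything from the "=pod" line through the "=cut" line is documentation.
The compiler skips it, so this assignment never runs:

    $count = 99;

=cut

$count++;
print "count is $count\n";    # prints "count is 1"
```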

        Asking questions if you see something that you don't understand is fine. I can't predict in advance what you know or don't know.

Re: Perl Formatting Text
by AnomalousMonk (Archbishop) on Jun 23, 2016 at 01:11 UTC

    If this is not homework, Text::CSV_XS is your friend. (If this is homework, Text::CSV_XS will be your friend once you're out in the Real World.) For your test input file:

    c:\@Work\Perl\monks\oop11999>perl -e "use warnings; use strict;
    ;;
    use Text::CSV_XS;
    ;;
    use Data::Dump qw(dd);
    ;;
    my $csv = Text::CSV_XS->new ({ sep_char => ' ', })
      or die qq{Cannot use CSV: }, Text::CSV_XS->error_diag;
    ;;
    open my $fh, '<', 'test.csv' or die qq{opening test.csv: $!};
    ;;
    my %letters;
    while (my $row = $csv->getline($fh)) {
      my ($letter, @numbers) = @$row;
      push @{ $letters{$letter} }, @numbers;
    }
    $csv->eof or $csv->error_diag;
    close $fh or die qq{closing test.csv: $!};
    ;;
    dd \%letters;
    ;;
    for my $letter (sort keys %letters) {
      print qq{$letter $_ \n} for @{ $letters{$letter} };
    }
    "
    { A => [1], B => [3, 6, 2], C => [3, 4, 6, 2, 10] }
    A 1
    B 3
    B 6
    B 2
    C 3
    C 4
    C 6
    C 2
    C 10
    (The dd \%letters; statement is just for debugging and illustration.)
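The output above keeps each letter's numbers in their original order (B 3, B 6, B 2), while the question asked for them sorted. A minimal core-Perl sketch of the same hash-of-arrays idea without Text::CSV_XS, adding a numeric sort. It assumes fields are separated by runs of whitespace, and it merges a label that appears on more than one line into a single list, which may or may not be what the real data needs. The helper names collect and report, and the extra "B 5" line in the demo input, are made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Collect every number under its label; a label that appears on several
# lines is merged into one list (an assumption about the data).
sub collect {
    my ($fh) = @_;
    my %numbers_for;
    while (my $line = <$fh>) {
        my ($label, @numbers) = split ' ', $line;
        next unless defined $label;                # skip blank lines
        push @{ $numbers_for{$label} }, @numbers;
    }
    return \%numbers_for;
}

# Print "label number" lines: labels in string order, numbers numerically.
sub report {
    my ($numbers_for, $out) = @_;
    for my $label (sort keys %$numbers_for) {
        print {$out} "$label $_\n" for sort { $a <=> $b } @{ $numbers_for->{$label} };
    }
}

# Demo with an in-memory "file" so the example needs no input file.
my $input = "A 1\nB 3 6 2\nC 3 4 6 2 10\nB 5\n";
open my $in, '<', \$input or die "cannot open in-memory input: $!";
report(collect($in), \*STDOUT);
```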



Re: Perl Formatting Text
by NetWallah (Canon) on Jun 23, 2016 at 04:40 UTC
    Here is the obligatory one-liner:
    perl -an -E '$c=shift@F; say qq|$c $_| for sort {$a<=>$b} @F' your-file.txt
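For readers new to the switches: perlrun documents that -n wraps the code in a while (<>) loop, -a splits each line into @F as if by split(' '), and -E is like -e but also enables features such as say. A rough, testable expansion of the one-liner, with the input and output handles passed to a made-up sub one_liner_body instead of reading <> directly:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';

# Roughly what "perl -an -E '...'" does: -n wraps the code in a
# while (<>) loop, and -a splits each line on whitespace into @F.
sub one_liner_body {
    my ($in, $out) = @_;
    local $_;                                  # don't clobber the caller's $_
    while (<$in>) {
        my @F = split ' ';                     # -a: autosplit $_ into @F
        my $c = shift @F;
        # On a blank line @F is empty, so the loop below prints nothing.
        say {$out} "$c $_" for sort { $a <=> $b } @F;
    }
}

# Demo with an in-memory "file".
my $demo = "A 1\nB 3 6 2\nC 3 4 6 2 10\n";
open my $demo_in, '<', \$demo or die "cannot open in-memory input: $!";
one_liner_body($demo_in, \*STDOUT);
```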


Re: Perl Formatting Text
by BillKSmith (Monsignor) on Jun 23, 2016 at 02:32 UTC
    I have a lot of questions about your requirements. Can a line have more than one letter? Can the same number appear more than once for the same letter? Can the same letter be repeated anywhere on a line? Or in the file? If the answer to any of these questions is 'yes', what must you do? Is your real data just numbers and letters? If not, how can we tell the difference?
    Bill
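To illustrate why these answers matter: if the same number can appear more than once under a label and the duplicates should be printed only once (an assumption, not something the OP has stated), the usual Perl idiom is a %seen hash inside grep. A minimal sketch with a made-up helper unique_sorted:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumes duplicate numbers under the same label should be printed once.
# %seen compares values as strings, so 6 and 6.0 would count as different.
sub unique_sorted {
    my @numbers = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @numbers;   # keep the first of each value
    return sort { $a <=> $b } @unique;
}

print join(' ', unique_sorted(3, 6, 2, 6, 3)), "\n";   # prints "2 3 6"
```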
      10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)

      This is an example of one of the lines in the file. I now realize my example probably wasn't the best.

      I would want the end file to be:

      10GBE_ADDR1 R3629.2 (ANALOG:107)
      10GBE_ADDR1 R3633.1 (ANALOG:107)
      10GBE_ADDR1 (ANALOG:107) U212.19

      And so on for the next lines.

        So I see the plot thickens...

        I made a straightforward modification to the previous code to account for the fact that you have pairs of things instead of single space-separated things in the input data. I am confused by your last example output line, 10GBE_ADDR1 (ANALOG:107) U212.19. I just assumed that this was a cut-n-paste error? If not, then you have a lot more explaining to do about "what the rules are".

        I am not sure if this is what you need, but we are incrementally closer...

#!/usr/bin/perl
use warnings;
use strict;

while (my $line = <DATA>)
{
    next if $line =~ /^\s*$/;    #skip blank lines
    my ($label, @rest) = split ' ', $line;
    my @pairs;
    while (@rest)
    {
        my $first_num_thing = shift @rest;
        my $paren_thing     = shift @rest;
        push @pairs, "$first_num_thing $paren_thing";
    }
    @pairs = sort @pairs;        #may need special sort??
    foreach my $col (@pairs)
    {
        print "$label $col\n";
    }
}

=prints
10GBE_ADDR1 R3629.2 (ANALOG:107)
10GBE_ADDR1 R3633.1 (ANALOG:107)
10GBE_ADDR1 U212.19 (INPUT:107)
=cut

__DATA__
10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)
        Of course, in your "real" code (as opposed to my "demo" code), use something more descriptive than "$paren_thing". I am sure in your actual context that thing has some name or description that makes a lot more sense than that!

        I hope that you have read my previous answer to your questions and that this post makes more sense to you now. As with the previous code post, this is "runnable code" as is.

        What I expect you to do is use my code as a starting point. Play with it. Modify it. I am trying to provide enough info to get you "unstuck". You need to start writing some code yourself. There are of course other ways to write this code. I attempted to be straightforward and not overly fancy.
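The "#may need special sort??" comment in the code is worth taking seriously: plain sort compares strings, so a designator like R10.1 would sort before R9.2. A minimal sketch of one possible special sort, assuming every pair starts with letters, a number, a dot and a pin number, as in R3629.2. The helper names designator_key and sort_pairs, and the designators R9.2 and R10.1 used in the check, are made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumption: each pair starts with a designator like "R3629.2"
# (letters, a number, a dot, a pin number).
sub designator_key {
    my ($pair) = @_;
    my ($prefix, $num, $pin) = $pair =~ /^([A-Za-z]+)(\d+)\.(\d+)/
        or return ($pair, 0, 0);    # unexpected format: fall back to the text
    return ($prefix, $num, $pin);
}

# Sort by the letters, then numerically by the number, then by the pin.
sub sort_pairs {
    return sort {
        my @ka = designator_key($a);
        my @kb = designator_key($b);
        $ka[0] cmp $kb[0] or $ka[1] <=> $kb[1] or $ka[2] <=> $kb[2]
    } @_;
}

my @pairs = ('U212.19 (INPUT:107)', 'R3633.1 (ANALOG:107)', 'R3629.2 (ANALOG:107)');
print "$_\n" for sort_pairs(@pairs);
```

For large files, computing the keys once per item (a Schwartzian transform) would avoid re-parsing each designator on every comparison.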

        Update:
        Ok, I will demo another technique. If you can understand how both of these programs work, then you are well on your way. Split and "match global" can solve an enormous percentage of file parsing problems.

#!/usr/bin/perl
use warnings;
use strict;

while (my $line = <DATA>)
{
    next if $line =~ /^\s*$/;    #skip blank lines
    my ($label, $rest) = split ' ', $line, 2;
    (my @pairs) = $rest =~ /(\S+\s+\S+)/g;    #called "match global"
    @pairs = sort @pairs;
    foreach my $col (@pairs)
    {
        print "$label $col\n";
    }
}

=prints
10GBE_ADDR1 R3629.2 (ANALOG:107)
10GBE_ADDR1 R3633.1 (ANALOG:107)
10GBE_ADDR1 U212.19 (INPUT:107)
=cut

__DATA__
10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)
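A variation on match global, sketched under the assumption that the second half of every pair is always wrapped in parentheses: in scalar context, a //g match inside a while loop returns one match per iteration, and two capture groups keep the designator and the text inside the parentheses apart. The helper split_pairs is made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Pull each "designator (attribute)" pair apart with a match-global loop.
# Assumption: the attribute is always wrapped in parentheses.
sub split_pairs {
    my ($rest) = @_;
    my @pairs;
    while ($rest =~ /(\S+)\s+\(([^)]*)\)/g) {
        push @pairs, [$1, $2];    # [designator, text inside the parens]
    }
    return @pairs;
}

my $line = '10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)';
my ($label, $rest) = split ' ', $line, 2;
for my $pair (split_pairs($rest)) {
    my ($designator, $attribute) = @$pair;
    print "$label $designator ($attribute)\n";
}
```

Keeping the two parts separate makes it easy to sort or filter on either one later, for example selecting only the INPUT entries.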
        oopl1999, you answered only one of my seven questions. Someone might guess the rest of the answers correctly and give you a good solution, but you will get more and better solutions if you post the answers to my previous questions. Remember that examples alone cannot tell us which conditions are impossible.
        Bill