in reply to Re: Perl Formatting Text
in thread Perl Formatting Text

10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)

This is an example of one of the lines in the file. I now realize my example probably wasn't the best.

I would want the end file to be:

10GBE_ADDR1 R3629.2 (ANALOG:107)
10GBE_ADDR1 R3633.1 (ANALOG:107)
10GBE_ADDR1 (ANALOG:107) U212.19

And so on for the next lines

Re^3: Perl Formatting Text
by Marshall (Canon) on Jun 23, 2016 at 21:48 UTC
    So I see the plot thickens...

    I made a straightforward modification to previous code to account for the fact that you have pairs of things instead of single space separated things in the input data. I am confused by your last example output line 10GBE_ADDR1 (ANALOG:107) U212.19. I just assumed that this was a cut-n-paste error? If not, then you have a lot more explaining to do about "what the rules are".

    I am not sure if this is what you need, but we are incrementally closer...

    #!/usr/bin/perl
    use warnings;
    use strict;

    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;  #skip blank lines
        my ($label, @rest) = split ' ', $line;
        my @pairs;
        while (@rest)
        {
            my $first_num_thing = shift @rest;
            my $paren_thing = shift @rest;
            push @pairs, "$first_num_thing $paren_thing";
        }
        @pairs = sort @pairs;  #may need special sort??
        foreach my $col (@pairs)
        {
            print "$label $col\n";
        }
    }

    =prints
    10GBE_ADDR1 R3629.2 (ANALOG:107)
    10GBE_ADDR1 R3633.1 (ANALOG:107)
    10GBE_ADDR1 U212.19 (INPUT:107)
    =cut

    __DATA__
    10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)

    Now of course in your "real" code vs my "demo" code, use something more descriptive than "$paren_thing". I am sure in your actual context that thing has some name or description that makes a lot more sense than that!

    I hope that you have read my previous answer to your questions and that this post makes more sense to you now. As with the previous code post, this is "runnable code" as is.

    What I expect you to do is use my code as a starting point. Play with it. Modify it. I am trying to provide enough info to get you "unstuck". You need to start writing some code yourself. There are of course other ways to write this code. I attempted to be straightforward and not overly fancy.
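On the "may need special sort??" comment in the code above: the default sort compares strings, so a designator such as R10.1 would land before R9.1. Below is a minimal sketch of one possible alternative, assuming every pair begins with a designator of the form letters, number, dot, pin number (as in R3629.2). That format is a guess from the sample line, and sort_pairs is a hypothetical name, not something from the thread. It needs Perl 5.10 or later for the // operator.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: sort "designator (type:num)" pairs by the letter
# prefix, then numerically by the number and by the pin after the dot.
# Assumption: designators look like R3629.2; anything else sorts first.
sub sort_pairs {
    my @pairs = @_;
    my @keyed = map {
        my ($prefix, $num, $pin) = /^([A-Za-z]+)(\d+)\.(\d+)/;
        [ $_, $prefix // '', $num // 0, $pin // 0 ]
    } @pairs;
    return map { $_->[0] }
           sort {
               $a->[1] cmp $b->[1]     # letter prefix, as a string
            or $a->[2] <=> $b->[2]     # designator number, numerically
            or $a->[3] <=> $b->[3]     # pin number, numerically
            or $a->[0] cmp $b->[0]     # fall back to the whole string
           } @keyed;
}

my @pairs = ('U212.19 (INPUT:107)', 'R3633.1 (ANALOG:107)', 'R3629.2 (ANALOG:107)');
print "$_\n" for sort_pairs(@pairs);
```

If plain string order is good enough for the real data, the simple `sort @pairs` already in the program is the easier choice.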

    Update:
    Ok, I will demo another technique. If you can understand how both of these programs work, then you are well on your way. Split and "match global" can solve an enormous percentage of file parsing problems.

    #!/usr/bin/perl
    use warnings;
    use strict;

    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;  #skip blank lines
        my ($label, $rest) = split ' ', $line, 2;
        (my @pairs) = $rest =~ /(\S+\s+\S+)/g;  #called "match global"
        @pairs = sort @pairs;
        foreach my $col (@pairs)
        {
            print "$label $col\n";
        }
    }

    =prints
    10GBE_ADDR1 R3629.2 (ANALOG:107)
    10GBE_ADDR1 R3633.1 (ANALOG:107)
    10GBE_ADDR1 U212.19 (INPUT:107)
    =cut

    __DATA__
    10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)

      Hi, I was wondering if you could help with a couple of last things. I have modified the code to take input from a file, and I have also commented everything to make sure I understand it. By the way, I have gone with the first method you provided.

      My first problem is getting the output to a new file rather than to the terminal. I have tried several different methods of this and have failed.

      The second problem is in the data itself. Every once in a while the data may look like this because there are too many of the tags for one id:

      10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107)

      U212.19 (INPUT:107)

      Basically it will start a new line to write all of the tags. I am not sure how to recognize that there is no id and how to then make the following tags attribute themselves back to the previous id. And here is the code I used.

      open FILE, '<', "golden.rpt" or die "unable to open read file $!";
      while (my $line = <FILE>)
      {
          next if $line =~ /^\s*$/;  #skip blank lines
          next if $line =~ /#/;      #Skip comments
          #split line into a scalar for netname and put all of the
          #reference designators into an array
          my ($netname, @referencedesignators) = split ' ', $line;
          my @singlereference;
          while (@referencedesignators)
          {
              #split array into two scalars: one for the letter then number
              #sequence and the other for the analog thing
              my $firstpart  = shift @referencedesignators;
              my $secondpart = shift @referencedesignators;
              #push these two scalars to form a pair. Each pair is one
              #reference designator. These pairs form an array.
              push @singlereference, "$firstpart $secondpart";
          }
          #sort ascending; the pairs are strings, so use the default
          #string sort rather than the numeric <=>
          @singlereference = sort @singlereference;
          foreach my $col (@singlereference)
          {
              #print the netname along with each column of the array
              #containing the singlereference designators.
              print "$netname $col\n";
          }
      }
      print "done\n";

        Hi, well if you did a google search on "perl open", you would come to http://perldoc.perl.org/functions/open.html as the first link.

        You need something like this:

        open OUT , '>', "outfilename" or die "unable to open out $!";
        Then instead of just print "whatever", you write print OUT "whatever". The print goes to the OUT file instead of to the terminal.
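As a minimal sketch of the same idea with a lexical filehandle instead of the bareword OUT: the helper name write_lines and the file name formatted.rpt are made up for illustration, and the two sample lines are taken from the earlier output in this thread.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: write each line to the named file, one per row.
# Opening with '>' creates the file or truncates an existing one.
sub write_lines {
    my ($outfile, @lines) = @_;
    open(my $out, '>', $outfile)
        or die "unable to open $outfile for writing: $!";
    print {$out} "$_\n" for @lines;   # print to the filehandle, not the terminal
    close($out) or die "unable to close $outfile: $!";
    return scalar @lines;
}

# "formatted.rpt" is an assumed output file name.
write_lines('formatted.rpt',
    '10GBE_ADDR1 R3629.2 (ANALOG:107)',
    '10GBE_ADDR1 R3633.1 (ANALOG:107)');
```

In the loop-based program, the equivalent is to open the output file once before the while loop and change each print to print to that handle.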

        The kind of problem that you are having where the file contains something weird that you find during debug happens all the time when using ad hoc methods to parse something for which a complete spec is not known in advance.

        I am happy to see that you are making an effort to understand the code. My efforts to help are pointless unless actual knowledge is being transferred.

        The generic problem is that you can't output the new lines until you are sure that you've got all of the referencedesignators. The typical solution is to delay the printout until you have seen the next line with a valid $netname. This introduces a couple of complications.

        First, how do we tell if this is a new "record" or a continuation of the previous line? That depends upon what $netnames look like. If all netnames have an underscore in them and the other continuation tokens do not, then something like this would work:

        my $test = '10GBE_ADDR1';
        #decide if first part of line has ABC901X_ ...
        print "new record\n" if $test =~ m/^[A-Z0-9a-z]+_/;

        The above code decides that "10GBE_" is a match. If that regex (regular expression) is not adequate, then some other "rule" is needed.
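Applied to whole lines rather than a single test string, that check might look like the sketch below. It assumes, as above, that every netname begins with letters or digits followed by an underscore and that continuation lines begin with a designator such as U212.19. The name is_new_record is hypothetical.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Returns 1 if the first whitespace-separated token of the line looks like
# a netname (letters/digits followed by an underscore), otherwise 0.
# Assumption: continuation lines never start with such a token.
sub is_new_record {
    my ($line) = @_;
    my ($first_token) = split ' ', $line;
    return 0 unless defined $first_token;   # blank line
    return $first_token =~ m/^[A-Za-z0-9]+_/ ? 1 : 0;
}

for my $line ('10GBE_ADDR1 R3629.2 (ANALOG:107)', 'U212.19 (INPUT:107)') {
    my $kind = is_new_record($line) ? 'new record' : 'continuation';
    print "$kind: $line\n";
}
```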

        Second, since we no longer have a simple "read line, process line, print line(s)" loop, some "memory" is needed, i.e. we have "read line; if a new record is starting, print lines based upon the previous $netname and previous @singlereference; process line".

        Third, you will find that the last record is problematic. The while loop will end when there are no more lines, but that last record will not have been output yet. So you need some "cleanup" code to do that.

        sub print_record
        {
            return unless @singlereference;  #no work to do
            #the pairs are strings, so use the default string sort
            @singlereference = sort @singlereference;
            foreach my $col (@singlereference)
            {
                print "$netname $col\n";
            }
            @singlereference = ();  #reset array to empty
            return;
        }

        Now you can call this subroutine inside or after the while loop ends provided that you give @singlereference and $netname greater scope by declaring them before the while loop starts.
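Putting the three points together, one possible shape is sketched below. It is only a sketch under the same assumption about what netnames look like; format_report is a hypothetical name, the 10GBE_ADDR2 line is invented sample data, and the output is collected in an array so it can be printed or written to a file afterwards.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Sketch only. Assumption: netnames match /^[A-Za-z0-9]+_/ and
# continuation lines do not.
sub format_report {
    my @lines = @_;
    my $netname;            # "memory": netname of the record being collected
    my @singlereference;    # "memory": pairs collected so far for that netname
    my @output;

    # Emit the remembered record, then reset the array to empty.
    my $print_record = sub {
        return unless @singlereference;      # no work to do
        push @output, "$netname $_" for sort @singlereference;
        @singlereference = ();
        return;
    };

    foreach my $line (@lines) {
        next if $line =~ /^\s*$/;            # skip blank lines
        next if $line =~ /#/;                # skip comments
        my @tokens = split ' ', $line;
        if ($tokens[0] =~ m/^[A-Za-z0-9]+_/) {
            $print_record->();               # new record, so the old one is complete
            $netname = shift @tokens;
        }
        next unless defined $netname;        # continuation with nothing to attach to
        while (@tokens >= 2) {
            my $firstpart  = shift @tokens;
            my $secondpart = shift @tokens;
            push @singlereference, "$firstpart $secondpart";
        }
    }
    $print_record->();                       # cleanup: the last record is still pending
    return @output;
}

my @input = (
    "10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107)\n",
    "\n",
    "U212.19 (INPUT:107)\n",
    "10GBE_ADDR2 R3700.1 (ANALOG:107)\n",    # invented second record
);
print "$_\n" for format_report(@input);
```

Here the state lives inside one subroutine instead of in variables declared before the while loop; either arrangement gives the record-printing code access to the remembered netname and pairs.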

        Hope these tips help. Try some code and ask if you are having problems. The code doesn't need to be some "masterpiece", it just has to be logically correct and work.

      Thank you for the help! I will run these programs and play around with them. If I have any more questions I will ask for clarification.
      And that was a copy paste error.
Re^3: Perl Formatting Text
by BillKSmith (Monsignor) on Jun 23, 2016 at 20:22 UTC
    oopl1999, You answered only one of my seven questions. Someone might guess the rest of the answers correctly and give you a good solution, but you will get more and better solutions if you post the answers to my previous questions. Remember that examples alone cannot tell about conditions that are impossible.
    Bill

      A line can have more than one letter

      The number or string will never be repeated.

      The letter may be repeated

      If the letters or numbers are repeated, just sort them the same way (as I did in the second example I provided).

      The data is letters and numbers but not all mixed together not split (as shown by my example).

      Sorry for my ignorance. I'm new to the forum and to Perl as well. I hope you can still help!