Re^3: Perl Formatting Text

So I see the plot thickens...

I made a straightforward modification to the previous code to account for the fact that you have pairs of things instead of single space-separated things in the input data. I am confused by your last example output line, 10GBE_ADDR1 (ANALOG:107) U212.19. I just assumed that this was a cut-n-paste error. If not, then you have a lot more explaining to do about "what the rules are".

I am not sure if this is what you need, but we are incrementally closer...

#!/usr/bin/perl
use warnings;
use strict;

while (my $line = <DATA>)
{
  next if $line =~ /^\s*$/; #skip blank lines
  
  my ($label, @rest) = split ' ', $line;
  
  my @pairs;
  while (@rest)
  {
     my $first_num_thing = shift @rest;
     my $paren_thing = shift @rest;
     push @pairs, "$first_num_thing $paren_thing";
  }
  
  @pairs = sort @pairs;  #may need special sort??
  
  foreach my $col (@pairs)
  {
     print "$label $col\n";
  }
}  

=prints
10GBE_ADDR1 R3629.2 (ANALOG:107)
10GBE_ADDR1 R3633.1 (ANALOG:107)
10GBE_ADDR1 U212.19 (INPUT:107)
=cut


__DATA__
10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)

Now of course in your "real" code vs my "demo" code, use something more descriptive than "$paren_thing". I am sure in your actual context that thing has some name or description that makes a lot more sense than that!

I hope that you have read my previous answer to your questions and that this post makes more sense to you now. As with the previous code post, this is "runnable code" as is.

What I expect you to do is use my code as a starting point. Play with it. Modify it. I am trying to provide enough info to get you "unstuck". You need to start writing some code yourself. There are of course other ways to write this code. I attempted to be straightforward and not overly fancy.
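On the "#may need special sort??" comment in the code: a plain `sort` compares the pairs as text, so a part numbered 3629 would land before one numbered 363. If that matters for your data, a custom comparison is one option. The following is only a sketch: the helper names `designator_key` and `by_designator` are made up for illustration, the sample value `R363.1` is invented, and it assumes every designator looks like letters, digits, a dot, then a numeric pin.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: break "R3629.2 (ANALOG:107)" into sortable parts.
# Assumes letters, then digits, then a dot, then a numeric pin.
sub designator_key
{
   my ($pair) = @_;
   my ($letters, $number, $pin) = $pair =~ /^([A-Za-z]+)(\d+)\.(\d+)/;
   # anything that does not fit the pattern gets neutral defaults
   return ($letters // '', $number // 0, $pin // 0);
}

# Comparison routine for sort: uses the package variables $a and $b.
sub by_designator
{
   my @ka = designator_key($a);
   my @kb = designator_key($b);
   return $ka[0] cmp $kb[0]      # letter prefix, compared as text
       || $ka[1] <=> $kb[1]      # part number, compared numerically
       || $ka[2] <=> $kb[2]      # pin number, compared numerically
       || $a cmp $b;             # tie-breaker: plain string compare
}

# 'R363.1' is invented sample data to show the difference from plain sort
my @pairs = ('R3629.2 (ANALOG:107)', 'U212.19 (INPUT:107)', 'R363.1 (ANALOG:107)');
print "$_\n" for sort by_designator @pairs;

# prints:
# R363.1 (ANALOG:107)
# R3629.2 (ANALOG:107)
# U212.19 (INPUT:107)
```

In the program above you would then write `@pairs = sort by_designator @pairs;` in place of the plain sort.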

Update:
Ok, I will demo another technique. If you can understand how both of these programs work, then you are well on your way. Split and "match global" can solve an enormous percentage of file parsing problems.

#!/usr/bin/perl
use warnings;
use strict;

while (my $line = <DATA>)
{
  next if $line =~ /^\s*$/; #skip blank lines
  
  my ($label, $rest) = split ' ', $line,2;
  
  (my @pairs) = $rest =~ /(\S+\s+\S+)/g; #called "match global";
  @pairs = sort @pairs;
  
  foreach my $col (@pairs)
  {
     print "$label $col\n";
  }
}  

=prints
10GBE_ADDR1 R3629.2 (ANALOG:107)
10GBE_ADDR1 R3633.1 (ANALOG:107)
10GBE_ADDR1 U212.19 (INPUT:107)
=cut


__DATA__
10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)
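If the two key lines in the second program are unclear, it may help to run them on their own and look at what each step produces. This is a small stand-alone sketch using a shortened copy of the sample line, not a replacement for the program above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $line = "10GBE_ADDR1 R3629.2 (ANALOG:107) U212.19 (INPUT:107)\n";

# A limit of 2 stops splitting after the first field, so $rest keeps
# everything after the label (including the trailing newline).
my ($label, $rest) = split ' ', $line, 2;

# In list context, a match with /g returns every capture of every match.
# Each match here is "non-space run, whitespace, non-space run".
my @pairs = $rest =~ /(\S+\s+\S+)/g;

print "label: $label\n";
print "pair:  $_\n" for @pairs;

# prints:
# label: 10GBE_ADDR1
# pair:  R3629.2 (ANALOG:107)
# pair:  U212.19 (INPUT:107)
```

One difference between the two techniques worth knowing: if a line has an odd number of tokens after the label, this regex quietly drops the leftover token, while the shift loop in the first program builds a pair with an undefined second half and warns about it.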

Re^4: Perl Formatting Text by oopl1999 (Novice) on Jun 23, 2016 at 23:54 UTC
Thank you for the help! I will run these programs and play around with them. If I have any more questions I will ask for clarification.
Re^4: Perl Formatting Text by oopl1999 (Novice) on Jun 24, 2016 at 01:56 UTC
Hi, I was wondering if you could help with a couple of last things. I have modified the code to take input from a file, and I have also commented everything to make sure I understand it. By the way, I have gone with the first method you provided.

My first problem is getting the output to a new file rather than to the terminal. I have tried several different methods of this and have failed.

The second problem is in the data itself. Every once in a while the data may look like this because there are too many of the tags for one id:

10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107) U212.19 (INPUT:107)

Basically it will start a new line to write all of the tags. I am not sure how to recognize that there is no id and how to then make the following tags attribute themselves back to the previous id. And here is the code I used.

open FILE, '<', "golden.rpt" or "die unable to open read file $!";

while (my $line = <FILE>)
{
  next if $line =~ /^\s*$/; #skip blank lines
  next if $line =~ /#/; #Skip comments

  #split file into a scalar for netname and put all of the reference designators into an array
  my ($netname, @referencedesignators) = split ' ', $line;

  my @singlereference;
  while (@referencedesignators)
  {
     #split array into two scalars one for the letter then number sequence and the other for the analog thing
     my $firstpart = shift @referencedesignators;
     my $secondpart = shift @referencedesignators;
     #push these two scalars to form a pair. Each pair is one reference designator. These pairs form an array.
     push @singlereference, "$firstpart $secondpart";
  }

  @singlereference = sort {$a <=> $b} @singlereference; #sort by ascending

  foreach my $col (@singlereference)
  {
     #print the netname along with each column of the array containing the singlereference designators.
     print "$netname $col\n";
  }
}
print "done\n";
Re^5: Perl Formatting Text by Marshall (Canon) on Jun 24, 2016 at 13:31 UTC
Hi, well, if you did a Google search on "perl open", you would come to http://perldoc.perl.org/functions/open.html as the first link. You need something like this:

open OUT, '>', "outfilename" or die "unable to open out $!";

Then instead of just `print "whatever"` you put in `print OUT "whatever"`. The print goes to the OUT file instead of to the terminal.

The kind of problem that you are having, where the file contains something weird that you find during debug, happens all the time when using ad hoc methods to parse something for which a complete spec is not known in advance.

I am happy to see that you are making an effort to understand the code. My efforts to help are pointless unless actual knowledge is being transferred.

The generic problem is that you can't output the new lines until you are sure that you've got all of the `referencedesignators`. The typical solution is to delay the printout until you have seen the next line with a valid `$netname`. This introduces a couple of complications.

First, how do we tell if this is a new "record" or a continuation of the previous line? That depends upon what $netnames look like. If all netnames have an underscore in them and the other continuation tokens do not, then something like this would work:

my $test = '10GBE_ADDR1';
#decide if first part of line has ABC901X_ ...
print "new record\n" if $test =~ m/^[A-Z0-9a-z]+_/;

The above code decides that "10GBE_" is a match. If that regex (regular expression) is not adequate, then some other "rule" is needed.

Second, since we don't have a simple "read line, process line, print line(s)", some "memory" is needed, i.e. we have "read line; if a new record is starting, print lines based upon the previous $netname and previous @singlereference; process line".

Third, you will find that the last record is problematic. The while loop will end when there are no more lines, but that last record will not have been output yet. So you need some "cleanup" code to do that.

sub print_record
{
   return unless @singlereference;  #no work to do
   @singlereference = sort @singlereference;  #string sort; <=> would compare these strings as numbers
   foreach my $col (@singlereference)
   {
      print "$netname $col\n";
   }
   @singlereference = ();  #reset array to empty
   return;
}

Now you can call this subroutine inside the while loop or after it ends, provided that you give @singlereference and $netname greater scope by declaring them before the while loop starts.

Hope these tips help. Try some code and ask if you are having problems. The code doesn't need to be some "masterpiece"; it just has to be logically correct and work.
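Putting the three points together (decide whether a line starts a new record, remember the current record, flush the last one after the loop), one possible shape for the whole program is sketched below. It rests on assumptions that may not hold for the real file: netnames are taken to contain an underscore in their first token while continuation lines do not, lines are assumed to wrap only between complete pairs, and the point where the sample data wraps is a guess. The input comes from an in-memory string standing in for golden.rpt, the names `$NETNAME_RE`, `$designator` and `$type` are illustrative, and "outfilename" is the same placeholder used above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumption: the first token of a new record contains an underscore.
my $NETNAME_RE = qr/^[A-Za-z0-9]+_/;

# Stand-in for golden.rpt; where the line wraps is a guess.
my $input = <<'END';
10GBE_ADDR1 R3629.2 (ANALOG:107) R3633.1 (ANALOG:107)
U212.19 (INPUT:107)
END
open my $in,  '<', \$input        or die "unable to open input: $!";
open my $out, '>', 'outfilename'  or die "unable to open out: $!";

my $netname;             # declared before the loop so that both
my @singlereference;     # survive from one input line to the next

sub print_record
{
   return unless @singlereference;          # no work to do
   # plain string sort; the pairs are text, not numbers
   print $out "$netname $_\n" for sort @singlereference;
   @singlereference = ();                   # reset for the next record
   return;
}

while (my $line = <$in>)
{
   next if $line =~ /^\s*$/;                # skip blank lines
   next if $line =~ /#/;                    # skip comments
   my @tokens = split ' ', $line;

   if ($tokens[0] =~ $NETNAME_RE)           # a new record starts here
   {
      print_record();                       # flush the previous record
      $netname = shift @tokens;
   }
   next unless defined $netname;            # continuation with no record yet

   while (@tokens >= 2)                     # take complete pairs only
   {
      my $designator = shift @tokens;
      my $type       = shift @tokens;
      push @singlereference, "$designator $type";
   }
}
print_record();                             # cleanup: flush the last record
close $out or die "unable to close out: $!";
```

With the sample input, outfilename should end up holding three lines, one per pair, each prefixed with 10GBE_ADDR1. To read the real file, replace the in-memory open with `open my $in, '<', 'golden.rpt' or die ...`.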
Re^6: Perl Formatting Text by oopl1999 (Novice) on Jun 25, 2016 at 01:43 UTC
I had already tried what you said to output to the file, but it was not working. I had been trying it without opening the file at the beginning of the code; adding the open solved my problem. As for the problem when the data skips to a new line, I actually solved that by myself. I saw your code but decided I should at least attempt it before getting help. Thank you for all the help!
Re^7: Perl Formatting Text by Marshall (Canon) on Jun 26, 2016 at 14:26 UTC
Re^4: Perl Formatting Text by oopl1999 (Novice) on Jun 23, 2016 at 23:54 UTC
And that was a copy paste error.