Re: Remove unwanted chars and maintain table integrity

I took a stab at this (below). You actually have a "pretty well behaved" input format. The basic job of the main loop is to assemble a %record holding everything that relates to the most recently seen line that contained "name" info.

Many parsing problems of this type have trouble with the first record, the last record, or both, i.e., these cases often need handling that is slightly different from everything in between. Below, I try to print whenever I see a line with name info. That will happen for the first real "data" line too, but the print routine won't actually do anything then, because it will figure out that there is no "real" data there yet!
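To make the first/last-record idea concrete, here is a minimal stand-alone sketch of the same "flush when a new name appears, then flush once more at the end" pattern. It is not the program below: the two-column input lines and the names @records, %current and flush_record are hypothetical, made up just for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-column input: a line not starting with "W" begins a record.
my @lines = (
    'Alice    W',
    'W        1',
    'W        2',
    'Bob      W',
    'W        3',
);

my @records;    # finished records (hypothetical name)
my %current;    # the record being assembled (hypothetical name)

sub flush_record {
    return unless exists $current{name};   # first-record case: nothing assembled yet
    push @records, { %current };            # keep a copy of the finished record
    %current = ();                          # start fresh for the next one
}

for my $line (@lines) {
    my ($name, $val) = split /\s+/, $line;
    flush_record() if $name ne 'W';          # a new name means the previous record is done
    $current{name} = $name if $name ne 'W';
    push @{ $current{vals} }, $val if $val ne 'W';
}
flush_record();                               # last-record case: nothing else would trigger it

print "$_->{name}: @{ $_->{vals} }\n" for @records;
```

Without the final flush_record() call, Bob's record would be silently lost.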

I use a regex to parse the input line and assign the captures straight into application-specific $variables, without using any $1, $2, $3, $4 stuff. Those numbers have no application-specific meaning and are just "clutter". I used the /x modifier on the regex, which makes it ignore whitespace inside the pattern, so that I could add spaces to line up the variable names with what each group captures. The next lines are very regular in appearance and function: if the $var isn't a "W", then something is done with it. The IDs are kept in an array. Use split when the separator is very regular; think regex when that simple idea doesn't work.
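For this particular input, the columns appear to always be separated by two or more spaces, while the names contain only single spaces, so split should also work here. A small sketch, assuming that holds for the real data (worth checking), that runs the /x regex from the program below and split /\s{2,}/ side by side on three of the sample lines:

```perl
use strict;
use warnings;

# Assumption: fields are separated by 2+ spaces and no field contains
# 2+ spaces in a row (true for these sample lines).
my @rows = (
    'Timothy Watson 12 Medulla    W     W      W',
    'W                            W     W      ID:10',
    'W                            5     W      W',
);

my @parsed;
for my $row (@rows) {
    # The regex from the post; /x makes the spaces inside the pattern insignificant.
    my @by_regex = $row =~ m/^(.*?)\s{2,} (\S+)  \s+  (\S+)    \s+   (\S+)/x;

    # The split alternative: break the line wherever 2+ whitespace chars occur.
    my @by_split = split /\s{2,}/, $row;

    my $verdict = "@by_regex" eq "@by_split" ? 'same' : 'different';
    print "$verdict: ", join('|', @by_split), "\n";
    push @parsed, \@by_split;
}
```

If a name could ever contain a double space, split would break it apart, which is exactly when the regex earns its keep.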

The output subroutine uses a "formatted print". This is ancient stuff (it predates even C), and Perl supports it with printf. You can specify whether each field is left- or right-justified and how wide it is. For most report-generation applications, it is not necessary to "find the longest line" and then adjust everything based on that. In fact, that is often the wrong thing to do! I advocate a nice solution for the 99% case and let the other 1% fall into some "unaligned, wacko-looking case". If you get too much space between the columns, the report becomes harder to read, so prefer a layout that looks nice 99% of the time over one that is always aligned but sometimes hard to read! See the Perl docs for printf (and sprintf, which describes the format codes) and adjust the spacing accordingly. Always put at least one explicit space between fields, so that two fields never "run together".
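A quick sketch of what the width and justification flags do, using sprintf so the results can be inspected. The values are made up; the point is that a too-long value is not truncated, it just pushes the rest of the line over (the 1% case), unless you also give a precision such as %.4s.

```perl
use strict;
use warnings;

# %-12s : left-justify in a 12-column field
# %5s   : right-justify in a 5-column field
# The literal space between the two fields keeps them from running together.
my $short = sprintf "%-12s %5s", 'Maya', 1;

# A value wider than its field is NOT truncated: it pushes the rest of the
# line to the right (the "unaligned 1% case").
my $long = sprintf "%-12s %5s", 'A very long name', 1;

# A precision on %s truncates the string instead (here to 4 characters).
my $cut = sprintf "%.4s", 'Medulla';

print "[$short]\n[$long]\n[$cut]\n";
```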

Note that I call output_record() again after the main loop to take care of the "last record" special case. I hope this additional explanation helps. You said that you were new, and that often prompts me to at least try to explain more.

Have fun!

#!/usr/bin/perl -w
use strict;

<DATA>; # discard the header line; no need to assign it to a variable

my %record =();

while (<DATA>)
{
   next if /^\s*$/;                 # skip blank lines
   output_record() if (!/^W\s/);   # just an "attempt to print"
   
   my (   $name,      $count,     $length,       $id) = 
      (m/^(.*?)\s{2,} (\S+)  \s+  (\S+)    \s+   (\S+)/x);
      
   $record{'name'}  = $name     if $name   !~ /^W\s*$/;
   $record{'count'} = $count    if $count  !~ /^W\s*$/;
   $record{'length'}= $length   if $length !~ /^W\s*$/;
   push (@{$record{'id'}},$id)  if $id     !~ /^W\s*$/;
}

output_record();

sub output_record
{ 
   if (!exists($record{'name'})) { return }
   
   printf "%-30s  %-3s  %-3s  %s\n", $record{'name'},
                                     $record{'count'},
                                     $record{'length'},
                                     shift @{$record{'id'}};
                                
   foreach my $id ( @{$record{'id'}} )
   {
      # pad to the ID column (30+2+3+2+3+2 = 42 chars) so IDs of any length line up
      printf "%42s%s\n", '', $id;
   }
   
   print "\n";     #blank as spacer before next record
   
   %record=();     #record dumped, so delete it!
   return;                                
}

=CODE PRINTS:
Timothy Watson 12 Medulla       5    16   ID:10
                                          ID:11
                                          ID:12
                                          ID:13
                                          ID:14

Maya Alabina 5 Exo              1    11   ID:28
                                          ID:30
=cut                                          


__DATA__
Character                        Count Length Pro_ID
Timothy Watson 12 Medulla    W     W      W
W                            W     W      ID:10
W                            W     W      ID:11
W                            W     W      ID:12
W                            W     W      ID:13
W                            W     W      ID:14
W                            5     W      W
W                            W     16     W
Maya Alabina 5 Exo           W     W      W
W                            W     W      ID:28
W                            W     W      ID:30
W                            1     W      W
W                            W     11     W