Parsing Header Last Field

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parsing Header Last Field by toolic (Bishop) on Jul 05, 2011 at 15:04 UTC
I think you have an off-by-one error. Change: `for ( my $i = 0 ; $i <= @parts ; $i++ ) {` [download] to: `for ( my $i = 0 ; $i < @parts ; $i++ ) {` [download] You probably need to chomp @Proto, then add the newline back to your @Assembly. After using perltidy, here are the changes: use warnings; use strict; use Text::ParseWords; my @Proto = <DATA>; chomp @Proto; my $Header = shift @Proto; my @Assembly; my $Nr = &parsingHeader( $Header, "Number" ); my $Nr_1 = &parsingHeader( $Header, "Number_1" ); my $name = &parsingHeader( $Header, "Name" ); foreach (@Proto) { my @fields = quotewords( ';', 0, $_ ); $fields[$Nr] =~ tr/0123456789/9876543210/ if defined $Nr; $fields[$Nr_1] =~ tr/0123456789/9876543210/ if defined $Nr_1; $fields[$name] =~ tr/A-Za-z/B-Wzb-w/ if defined $name; push @Assembly, join( ";", @fields ), "\n"; } unshift @Assembly, $Header; print @Assembly; sub parsingHeader { my $x; my $Row = shift @_; my $Column = shift @_; my @parts = split( /;/, $Row ); for ( my $i = 0 ; $i < @parts ; $i++ ) { if ( $parts[$i] eq $Column ) { $x = $i; last; } } return $x; } __DATA__ Number;Name;Age;Gender;Number_1;Dummy 123;Andrew;54;m;123;AAA 234;John;43;m;234;JJJ 345;Helen;23;w;345;HHH [download]	[reply] [d/l] [select]
Re^2: Parsing Header Last Field by GrandFather (Saint) on Jul 06, 2011 at 12:24 UTC
Even better would be to use a Perl for loop: `for my $i (0 .. $#parts) {` [download] which is clearer, safer and, as icing on the cake, is more succinct too. True laziness is hard work	[reply] [d/l]
Re: Parsing Header Last Field by Tux (Canon) on Jul 05, 2011 at 15:07 UTC
Looking at the data, wouldn't it be much easier to use a CSV based module like Text::CSV_XS, Text::CSV or DBD::CSV? These all support semi-colon as separator character. Enjoy, Have FUN! H.Merijn	[reply]
Re: Parsing Header Last Field by Marshall (Canon) on Jul 05, 2011 at 16:01 UTC
Your code (after the off by one error is fixed) does work, but you aren't getting much "mileage" out of ParseWords. I recoded without that below. A few comments: -don't push when you could print. -there is no need to save all the output in an array. -using a hash allowed me to get rid of variables like $Nr_1 and put code in same lingo as the file itself. -this kind of "anonimization" is easily cracked - not so anon. -you will need a better scheme if this data is sensitive. #!/usr/bin/perl -w use strict; my $header_line = <DATA>; print $header_line; $header_line =~ s/\s$//; # no trailing blanks or \n # a bit better than chomp() here # (chomp only gets rid of \n) # since words will be hash keys + my %index; # field name to index number my $i=0; foreach ( split /;/,$header_line ) { $index{$_}=$i++; } while(<DATA>) { next if /^\s$/; # skip blank lines chomp; my @parts = split(/;/,$_); $parts[$index{Number}] =~ tr/0123456789/9876543210/; $parts[$index{Number_1}] =~ tr/0123456789/9876543210/; $parts[$index{Name}] =~ tr/A-Za-z/B-Wzb-w/; print join (";",@parts),"\n"; } =output Number;Name;Age;Gender;Number_1 876;Brhviw;54;m;876;AAA 765;Kslr;43;m;765;JJJ 654;Iipir;23;w;654;HHH =cut __DATA__ Number;Name;Age;Gender;Number_1 123;Andrew;54;m;123;AAA 234;John;43;m;234;JJJ 345;Helen;23;w;345;HHH [download]	[reply] [d/l]
Re: Parsing Header Last Field by Anonymous Monk on Jul 05, 2011 at 15:23 UTC
I have to anonymize several files (semicolon-separated). The following script does the job. ... Where is the mistake? There are several. A simple substitution cipher (like rot 13 ) doesn't make the data anonymous, and its easily reversed. You're not chomping your input, so parsingHeader( $Header, "Number_1" ) returns undef You have a useless while loop :)	[reply]
Re: Parsing Header Last Field by Anonymous Monk on Jul 05, 2011 at 15:55 UTC
Dear All, thank you very much! After chomp'ing the input and changing from $i <= @parts to $i < @parts it works now! I also changed `unshift @Assembly, $Header;` to `unshift @Assembly, $Header."\n";` since the first row sticks on the header otherwise. However I could not do it without while (<DATA>)-loop. I am a newbie thus please do not beat me, but the code of toolic works by me only after I returned the while(<DATA>)-loop back. In fact, this loop is necessary anyway since I process the files not the DATA-snippet, this was just an example. You are right, this anonymizing is very primitive, but this is just a fool-proofing and is sufficient in this case, it is not necessary to protect this data from the hacker attack. And: unfortunately I can work with the standard installation of perl only, I cannot install the other modules - it is just a policy here. Too bad. Thank you all very much again. If you find another mistakes in this snippet of code it would be grate. VE	[reply] [d/l] [select]
Re^2: Parsing Header Last Field by toolic (Bishop) on Jul 05, 2011 at 16:06 UTC
You're welcome. And: unfortunately I can work with the standard installation of perl only, I cannot install the other modules - it is just a policy here. Too bad. Are you sure about that? See: Yes, even you can use CPAN	[reply]
Re^3: Parsing Header Last Field by Anonymous Monk on Jul 05, 2011 at 16:18 UTC
Thank you again, I will try it.	[reply]
Re^2: Parsing Header Last Field by Anonymous Monk on Jul 05, 2011 at 20:36 UTC
However I could not do it without while (<DATA>)-loop Sure you can, simply shift the first, empty (all whitespace) line before the header -- you don't need a loop for that <DATA> is readline DATA, and readline in scalar context reads one line	[reply]
Re: Parsing Header Last Field by Anonymous Monk on Jul 05, 2011 at 16:02 UTC
... and: toolic, thank you for link to perltidy, I did not know about the tool.	[reply]
Re: Parsing Header Last Field by Anonymous Monk on Jul 05, 2011 at 16:09 UTC
Thank you Marshall, this is great!	[reply]
Re: Parsing Header Last Field by Anonymous Monk on Jul 08, 2011 at 07:37 UTC
Dear Monks, I would like to give a feedback. It works great now (the hash solution twice so fast as a solution with a sub). Still I have a problem :-) There are many files which I process. Neither of files have every field in the header, which I modify. Some of files have even no such fields in their header. I noticed that e.g. `$index{"NAME"};` returns undef in these cases which is interpreted as 0. Therefore the first column in these files is erroneously modified. I did the following thing: `if ($index{"NAME"} =~/\d+/) { $parts[$index{"NAME"}] = ""; }` [download] and this works. I need to repeat this preamble each time however. Is this a workaround or the solution (i.e. can this be done better)? Please note that the different fields are processed differently (some become "", some become replacement of digits etc.). Thanks! VE	[reply] [d/l] [select]
Re^2: Parsing Header Last Field by Marshall (Canon) on Jul 10, 2011 at 16:00 UTC
Sounds like you need something like this: `$parts[$index{Name}] =~ tr/A-Za-z/B-Wzb-w/ if defined ($index{Name});` [download] Checking whether or not the hash key is defined or exists is faster and more clear than running a regex to see if it contains a digit. Perhaps you need something similar to this on other replacements also?	[reply] [d/l]