in reply to Re^2: Cleaning Data Between Specified Columns
in thread Cleaning Data Between Specified Columns

Sorry Aristotle. Fletch's (partial) solution, neat as the technique is, falls foul of the fact that deleting the apostrophes in one range causes all the subsequent columns to shift.
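To illustrate the shift, here is a minimal sketch. It assumes 1-based, inclusive column ranges; the sample record and the helper name `clean_in_place` are made up for the demonstration and are not Fletch's code.

```perl
use strict;
use warnings;

# Delete apostrophes in each 1-based, inclusive column range, editing the
# line in place and reusing the original offsets for every range.
sub clean_in_place {
    my ($line, @ranges) = @_;
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        my $len   = $end - $start + 1;
        my $chunk = substr $line, $start - 1, $len;
        $chunk =~ tr/'//d;                      # may shorten the chunk
        substr $line, $start - 1, $len, $chunk; # later text moves left
    }
    return $line;
}

my $before = "it's  |'tis  |ok";
my $after  = clean_in_place($before, [1, 6], [8, 13]);
print "$before\n$after\n";
```

Removing the apostrophe from columns 1-6 pulls everything after it one column to the left, so the second range (8-13) now starts at the "t" and the apostrophe in front of it survives: the second line printed is `its  |'tis  |ok`.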


Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re^4: Cleaning Data Between Specified Columns
by Aristotle (Chancellor) on Jan 28, 2003 at 09:35 UTC
    I should have tested. Anyway, in this case, it's a simple matter of changing the order of operations:
    { local *_ = \substr $source, $start, $len; y/a-zA-Z0-9\n\|\-'/ /c; y/'//d; }
    However, that obviously only works if there's only one operation affecting length. For a more general case, I'd do something like this (untested):
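    #!/usr/bin/perl -w
    use strict;

    # Usage: script FILE START-END [START-END ...]
    # Columns are taken to be 1-based and inclusive. Each range becomes a
    # pair of offsets (start - 1, end), sorted by start column.
    my @range = map @$_,
                sort { $a->[0] <=> $b->[0] }
                map { /^(\d+)-(\d+)$/ ? [ $1 - 1, $2 ] : die "Bad range '$_'\n" }
                splice @ARGV, 1;
    unshift @range, 0;

    # turn offsets into field lengths; the last field takes the rest of the line
    $range[$_] = $range[$_+1] - $range[$_] for 0 .. $#range-1;
    $range[-1] = '*';

    die "Negative length field specified" if grep $_ < 0, @range[0 .. $#range-1];

    # "a" rather than "A": unpack must not strip trailing whitespace
    my $fmt = join " ", map "a$_", @range;

    # pick odd numbered elements
    my @selected = grep $_ % 2, 0 .. $#range;

    while(<>) {
        my @field = unpack $fmt, $_;
        for (@field[@selected]) {
            tr/a-zA-Z0-9\n\|\-'/ /c;
            tr/'//d;
        }
        print join '', @field;
    }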
    The point is to structure your data whenever possible. The end of an array element is never ambiguous; a \x7F marker can be, and whatever mark character I've picked, I've always been bitten by it.
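As a small worked example of that idea, the sketch below splits one record into fields with unpack, cleans only the selected fields, and joins the result. It assumes 1-based, inclusive ranges that are already sorted and do not overlap; `template_for`, `clean_line` and the sample record are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build an unpack template from 1-based, inclusive
# column ranges (assumed sorted and non-overlapping).
sub template_for {
    my @ranges = @_;
    my ($pos, @parts) = (0);
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        push @parts, 'a' . ($start - 1 - $pos);  # untouched gap before the range
        push @parts, 'a' . ($end - $start + 1);  # the range itself
        $pos = $end;
    }
    return join ' ', @parts, 'a*';               # rest of the line
}

# Clean the odd-numbered fields (the ranges); gaps pass through unchanged.
sub clean_line {
    my ($line, @ranges) = @_;
    my @field = unpack(template_for(@ranges), $line);
    for my $i (grep { $_ % 2 } 0 .. $#field) {
        $field[$i] =~ tr/'//d;
    }
    return join '', @field;
}

print template_for([1, 6], [8, 13]), "\n";
print clean_line("it's  |'tis  |ok", [1, 6], [8, 13]), "\n";
```

For the ranges 1-6 and 8-13 the template comes out as `a0 a6 a1 a6 a*`, and the record is printed as `its  |tis  |ok`. Each field is its own array element, so shortening one cannot move the boundaries of another.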

    Makeshifts last the longest.