in reply to split, manipulate, join

It's hard to give specific advice without seeing a sample of your input data, but it may be hard for you to show us a sample if your lines are 550 columns wide. Short answer: yes, it's quite likely that there is a faster solution using split and/or regex than what you're doing now with multiple index and substr commands.

One example to get you started: If you want the six columns numbered 10, 20, 30, 32, 34, and 38 (with the first column numbered 0), and your columns are separated by whitespace, you could do this to get them in an array:

while (my $line = <$file>) {
    chomp $line;
    # split the line on whitespace, then take certain indexed columns
    my @columns = (split /\s+/, $line)[10,20,30,32,34,38];
    do_stuff_with_those_columns(@columns);
}

On the other hand, if you know what you want out of each column, you may be able to skip the step of splitting into an array and go straight to extracting what you need. It just depends on the data. Give us at least a couple lines if you can.
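For instance, if each line happened to carry a date and a decimal amount you wanted (a made-up format here; your real columns will differ), one regex could pull both out without splitting at all:

```perl
my $line = 'TXN0001 2015-04-10 19.99 USD';   # invented sample record
if ( my ($date, $amount) = $line =~ /(\d{4}-\d{2}-\d{2})\s+(\d+\.\d{2})/ ) {
    # capture the date and the amount in one pass
    print "date=$date amount=$amount\n";
}
```

Whether this beats a split depends entirely on how regular the data is, which is why a sample helps.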

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Re^2: split, manipulate, join
by jc.smith3 (Initiate) on Apr 10, 2015 at 22:55 UTC
    I really can't give any data as it is customer sensitive. Assume I have to search 10 columns for key1=, key2=, key3=, or key4=. There is no way to predict what will come after those four words, but I have to strip out whatever follows, up to a & or the end of the field. I have to blank the original data where it comes from (just what I am pulling, not the whole column) and then append new columns with what I stripped out. I like the idea of the slice syntax, but then how would I go back and update those same columns?
      > I really can't give any data as it is customer sensitive

      please invent non-sensitive data for your use case.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

      Can you make up some random but representative data?

      For example, it sounds a lot like you're parsing a URL for GET parameters, so a line with fake customer data might be:
      http://www.example.com/random_funky_characters_here.html?name=JoeSmith&address=123UniversalAve&CreditCardNumber=1234567890&kneecapstate=broken
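      If it really is query-string data like that, a sketch of splitting it into a hash (the URL and parameter names are the made-up ones from above):

```perl
my $url = 'http://www.example.com/page.html?name=JoeSmith&address=123UniversalAve';
my ($query) = $url =~ /\?(.*)/;           # everything after the '?'
my %params;
for my $pair (split /&/, $query) {
    my ($k, $v) = split /=/, $pair, 2;    # limit of 2 in case a value contains '='
    $params{$k} = $v;
}
print "$params{name}\n";
```

      For real-world URLs with encoded characters you'd want a proper module rather than this hand-rolled split, but it shows the shape of the problem.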

      my $inrec='0;1;2?foo=duh;3;4;5;6;7;8;9;;;;;;';

      You've already given some input data, presumably faked to the necessary degree. All you needed to do was to give a couple more fake input record instances and their corresponding output records to make the needed transformation(s) much more clear. No "customer sensitive" data need apply.


      Give a man a fish:  <%-(-(-(-<

        I thank everyone for their replies so far. I have updated the input examples. INREC1 does not meet the specs, as the word ekey= is not in element 2. INREC2 does meet the specs, so it blanks the data following ekey= and moves the data it found to the end of the array. The thing to keep in mind is that I have 550 columns and I have to check 10 of those for 4 different words. If found, I must blank what follows those words and then move the blanked data to the end of the array.
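        A column-wise sketch of that blank-and-append step. Everything here is assumed for illustration: semicolon-separated fields as in the earlier sample record, the 10 columns to check being indices 0 through 9, the key names, and the key word itself staying in the blanked field; adjust all of those to your real layout:

```perl
my $line   = '0;1;2?ekey=duh&rest;3;4;5;6;7;8;9';   # invented sample record
my @fields = split /;/, $line, -1;        # -1 keeps trailing empty fields
my @moved;
for my $i (0 .. 9) {                      # the 10 columns to check (assumed)
    for my $key (qw(ekey key1 key2 key3)) {   # the 4 words (names assumed)
        if ( $fields[$i] =~ s/\Q$key\E=([^&]*)/$key=/ ) {
            push @moved, $1;              # save the stripped value
        }
    }
}
push @fields, @moved;                     # append stripped values as new columns
$line = join ';', @fields;
print "$line\n";
```

        Because the loop modifies $fields[$i] in place and then rejoins, there is no separate "go back and update" pass: the slice-style read and the update happen on the same array.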

      You can strip out the values while moving them. From what you describe, I'd ignore the columns (unless it's possible for those key2= patterns to appear in other columns where you want to ignore them). So you could use something like this to get the value for key1, strip it out, and append it to the end of the line:

      open my $in,  '<', 'infile'  or die $!;   # open input file
      open my $out, '>', 'outfile' or die $!;   # open output file
      while (my $line = <$in>) {
          chomp $line;                          # remove newline
          if ( $line =~ s/key1=([^&\s]+)// ) {  # capture value in $1 while
                                                # replacing with empty string
              $line .= " key1=$1";              # append key and value to line
          }
          print $out "$line\n";                 # print line to output file
      }
      close $in;
      close $out;

      That will do one find/replace/copy for a single key, so you can repeat the if block, changing key1= to key2= and so on, to handle the others.
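      Rather than repeating the block by hand, the same substitution can be wrapped in a loop over the key names (key1 through key4 as described; the sample line is invented):

```perl
my $line = 'a;b;key2=foo&more;c;key1=bar';   # invented sample record
for my $key (qw(key1 key2 key3 key4)) {
    if ( $line =~ s/\Q$key\E=([^&\s]+)// ) {  # strip first occurrence of key=value
        $line .= " $key=$1";                  # append key and value to the line
    }
}
print "$line\n";
```

      The \Q...\E quoting is just defensive, in case a real key name ever contains a regex metacharacter.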

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.