in reply to How to substitute something from only between two specified charecters

A couple of solutions for you. You can put an extra qualifier on the split regex. In the first example below, I split on whitespace, but only if that whitespace is preceded by a digit or the / character. This is done with a positive lookbehind assertion. So a name like "District of Columbia" keeps its spaces, and no split happens on them.
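Here is a minimal sketch of the lookbehind split applied to a single record. The sub name `split_on_digit_or_slash` is hypothetical, used only for illustration. The regex itself is the one from the program.

```perl
use strict;
use warnings;

# Hypothetical helper: split only on whitespace that directly follows
# a digit or a "/" (positive lookbehind), so the spaces inside a place
# name such as "District of Columbia" are left alone.
sub split_on_digit_or_slash {
    my ($line) = @_;
    return split /(?<=\d|\/)\s+/, $line;
}

my $record = '>cds:ADD23250 A/District of Columbia/INS17/2009 2009/10/26 HA';
my @tokens = split_on_digit_or_slash($record);
print join("\n", @tokens), "\n";
# >cds:ADD23250
# A/District of Columbia/INS17/2009
# 2009/10/26
# HA
```

Both alternatives in the lookbehind are one character wide, which matters because Perl's lookbehind needs a fixed width.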

In the second example below, I used the same extra-qualifier trick in a substitution: remove whitespace, but only where it is preceded by a letter. Then I did an ordinary split on the result.
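A small sketch of the "squeeze, then split" approach on one record, with its newline still attached. The sub name `squeeze_then_split` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: delete whitespace that directly follows a letter,
# then split what is left on the default whitespace pattern.
sub squeeze_then_split {
    my ($line) = @_;
    $line =~ s/(?<=[a-zA-Z])\s+//g;   # "District of Columbia" -> "DistrictofColumbia"
    return split ' ', $line;          # same as a bare split on $_
}

my @tokens = squeeze_then_split(">cds:ADD23250 A/District of Columbia/INS17/2009 2009/10/26 HA\n");
print join("\n", @tokens), "\n";
```

The trade-off is that every multi-word name gets run together ("San Diego" becomes "SanDiego"), whereas the lookbehind split keeps the original spacing.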

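Note that the chomp is not necessary in the second case. A bare split splits $_ on \s+ (ignoring leading whitespace), and \s matches space, \t, \n, \r and \f, among others. The trailing \n is therefore treated as a separator, and the empty field after it is discarded. (With this data, the s/// also happens to delete that \n already, because it follows the letter A in HA.) In the first example chomp() is needed because the modified split condition only splits on whitespace that follows a digit or /, and the \n after HA follows a letter, so it would stay attached to the last field.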
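To see the difference concretely, here is a small sketch that splits one record with its newline still attached, once with the lookbehind pattern (with and without chomp) and once with the default split. The `show` helper is hypothetical and only makes the newline visible.

```perl
use strict;
use warnings;

my $line = ">cds:ADD23250 A/District of Columbia/INS17/2009 2009/10/26 HA\n";

# Hypothetical helper: print a field with any newline shown as \n.
sub show { my ($s) = @_; $s =~ s/\n/\\n/g; return "'$s'" }

# Lookbehind split without chomp: the "\n" follows "A", which is neither a
# digit nor "/", so it is not a split point and stays on the last field.
my @unchomped = split /(?<=\d|\/)\s+/, $line;
print 'lookbehind, no chomp: ', show($unchomped[-1]), "\n";   # 'HA\n'

# The same split after chomp: the last field is clean.
chomp(my $chomped = $line);
my @clean = split /(?<=\d|\/)\s+/, $chomped;
print 'lookbehind, chomped:  ', show($clean[-1]), "\n";       # 'HA'

# Default split (' '): "\n" matches \s+, and the empty trailing field
# that results is discarded, so no chomp is needed.
my @default = split ' ', $line;
print 'default split:        ', show($default[-1]), "\n";     # 'HA'
```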

The seek statement just "rewinds" the DATA filehandle. DATA starts out positioned at the first byte after the __DATA__ line. $begin records that byte's offset (via tell) so that I can go back to it. If I had done seek DATA,0,0; that would have moved the file pointer to the very start of the file, right before the "hashbang" line. If for some reason you would like a Perl program to read itself, that is one way!
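A minimal sketch of the same tell/seek rewind, assuming an in-memory filehandle as a stand-in for DATA so the example is self-contained. The "HEADER" line plays the role of the program text before __DATA__; it and the sample records are made up for illustration.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for DATA; the first line plays the
# role of everything before __DATA__.
my $text = "HEADER\nrecord one\nrecord two\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

my $skipped = <$fh>;        # consume the "header" line
my $begin   = tell($fh);    # byte offset of the first record

my @first_pass = <$fh>;     # read to end of file

seek($fh, $begin, 0) or die "seek failed: $!";   # whence 0: absolute offset
my @second_pass = <$fh>;    # the same records again

seek($fh, 0, 0) or die "seek failed: $!";        # offset 0: the very start
my $first_line = <$fh>;     # "HEADER\n", like re-reading the #! line

print scalar(@first_pass), " records, then ", scalar(@second_pass), " again\n";
print "from offset 0: $first_line";
close $fh;
```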

#!/usr/bin/perl -w
use strict;

my $begin = tell(DATA);   # remember where the data starts, to rewind DATA later on

while (<DATA>) {
    chomp;
    # (?<=\d|\/) is a positive lookbehind assertion:
    # a digit or / must precede the \s+ in order to split
    # upon it. Note that chomp is necessary here: the
    # trailing \n follows the letter A in HA, not a digit
    # or /, so this split would never remove it.
    my @tokens = split(/(?<=\d|\/)\s+/, $_);
    print join("\n", @tokens), "\n";
}

=prints like:
>cds:ADD23250
A/District of Columbia/INS17/2009
2009/10/26
HA
=cut

seek DATA, $begin, 0;   # rewinds DATA back to the beginning of the data

while (<DATA>) {
    s/(?<=[a-zA-Z])\s+//g;   # remove whitespace if preceded by a letter
    my @tokens = split;
    print join("\n", @tokens), "\n";
}

=prints like:
>cds:ADD23250
A/DistrictofColumbia/INS17/2009
2009/10/26
HA
=cut

__DATA__
>cds:ADD75048 A/Brussels/INS71/2009 2009/10/30 HA
>cds:ADF58353 A/Germany-MV/HGW4/2009 2009/12/ HA
>cds:ADF58351 A/Germany-MV/HGW6/2009 2009/12/ HA
>cds:ADU76781 A/England/94780010/2009 2009/10/22 HA
>cds:AEA30293 A/Netherlands/2223b/2009 2009/11/18 HA
>cds:ADD23250 A/District of Columbia/INS17/2009 2009/10/26 HA
>cds:ADX98640 A/San Diego/INS13/2009 2009/10/19 HA
>cds:ADD74978 A/San Diego/INS54/2009 2009/10/12 HA
>cds:ADF27925 A/Texas/JMS407/2010 2010/01/11 HA
>cds:ADM95824 A/Finland/661/2009 2009/10/26 HA
>cds:ADD97035 A/Wisconsin/629-D00036/2009 2009/09/15 HA