SiLiK has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I need to split a $string into $subStrings of different lengths. The name of each substring ($var) and its length in characters ($noChar) are given in a @list.

Here is an example @list:
$var $noChar
1: date 6
2: time 4
3: duration 4
4: calling-num 15
5: dialed-num 18
6: code-used 4
7: cond-code 1
8: ppm 5
9: auth-code 13

And an example $string:
21090412500011            209         070279501 9047    0         1111


Now here is the tricky part. The @list changes (and so does the $string) from time to time (I can get the list as @ or %), so I have to split the $string into $subStrings depending on the content of the @list at that moment. In the end I need an @anotherList of $var, $subString.

What is the best way to go about this?


10x ahead...

Replies are listed 'Best First'.
Re: split string by character count
by Corion (Patriarch) on Sep 21, 2004 at 12:41 UTC

    pack and unpack are well suited to fixed-length strings. As your list of elements is variable, you will have to create the template for unpack dynamically from your list. The idea is first to split your list of elements into the template and the names, and then to use these two to create the hash:

    my $unpack_template;
    my @names;
    for (@list) {
        die "Don't know what to do with $_"
            unless /^\d+: ([-a-z]+) (\d+)$/;
        $unpack_template .= "A$2";
        push @names, $1;
    }
    print "I will be unpacking using '$unpack_template'\n";

    my @values = unpack $unpack_template, $string;

    # Now, pack everything into a nice hash so we can use the name
    # to address the value:
    my %result;
    @result{ @names } = @values;
    for my $k (sort keys %result) {
        print "$k => $result{$k}\n";
    }
Re: split string by character count
by gothic_mallard (Pilgrim) on Sep 21, 2004 at 12:52 UTC

    Have you thought of using unpack? Then each time you wish to split the $string into $subStrings you can just change the unpack template.

    I'm not sure how you're representing @list - is each line an element that needs parsing into $var and $noChar or individual elements?

    Assuming the structure you have above you could do something like:

    my $string = '21090412500011            209         070279501 9047    0         1111';
    my $f = "A6A4A4A15A18A4A1A5A13"; # Dynamically construct from list values.
    my @subStrings = unpack($f, $string);

    Then the @subStrings list would hold all the substrings you require.

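    All you have to do each time you read it is parse the @list and create the $f unpack template. Each substring is represented by an "A" (a space-padded text string) followed by the length of the substring, i.e. "A6" is a 6 char substring. When unpacking, "A" strips trailing spaces from each field; use "a" if you want the field exactly as it appears.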
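To illustrate the template construction described above, here is a minimal sketch. The helper name `build_template` and the flat (name, width) input list are assumptions made for the example, not something the thread specifies. It also shows that "A" leaves leading spaces in place, so right-aligned fields may need an extra trim.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a flat (name, width, name, width, ...) list
# into an unpack template such as "A15A18".
sub build_template {
    my @pairs    = @_;
    my $template = '';
    while ( my ( $name, $width ) = splice @pairs, 0, 2 ) {
        $template .= "A$width";
    }
    return $template;
}

my $template = build_template( 'calling-num', 15, 'dialed-num', 18 );

# Two right-aligned fields, 15 and 18 characters wide.
my $record = sprintf '%15s%18s', '209', '070279501';

# "A" strips trailing spaces only, so the leading padding survives.
my @raw = unpack $template, $record;

# Remove the leading padding from a copy of each field.
my @trimmed = map { my $v = $_; $v =~ s/^\s+//; $v } @raw;

print "$template\n";
print "[$_]\n" for @raw;
print "[$_]\n" for @trimmed;
```

Whether to trim is a design choice: if leading spaces are significant in the data, keep the raw values.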

Re: split string by character count
by Limbic~Region (Chancellor) on Sep 21, 2004 at 12:55 UTC
    SiLiK,
    The following code should be all you want and a bag of chips too.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @template = (
        '1: date 6',
        '2: time 4',
        '3: duration 4',
        '4: calling-num 15',
        '5: dialed-num 18',
        '6: code-used 4',
        '7: cond-code 1',
        '8: ppm 5',
        '9: auth-code 13',
    );

    my $string = '21090412500011            209         070279501 9047    0         1111';

    my %code = Parse_String(\@template, $string);

    print "$code{date}\n";
    print "$code{'auth-code'}\n";

    sub Parse_String {
        my ($rosetta, $string) = @_;
        my $template;
        my @fields;
        for ( @{ $rosetta } ) {
            if ( /^\d+:\s+([^\s]+)\s+(\d+)/ ) {
                push @fields , $1;
                $template .= 'a' . $2;
            }
        }
        my %code;
        @code{ @fields } = unpack $template , $string;
        return %code;
    }
    I know I didn't provide you with another array because it appears that a hash would be better. It should be trivial to change it if that's what you truly want.

    Cheers - L~R
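For readers who do want the array form the question asked for, the change mentioned above might look like the following sketch. The name `parse_to_pairs` is hypothetical, and the layout lines are assumed to have the same "1: date 6" shape as in the question. Unlike a hash, the flat list keeps the fields in layout order.

```perl
use strict;
use warnings;

# Hypothetical variant of Parse_String: returns a flat list of
# (name, value, name, value, ...) in layout order.
sub parse_to_pairs {
    my ( $layout, $string ) = @_;
    my @fields;
    my $template = '';
    for my $line ( @{$layout} ) {
        next unless $line =~ /^\d+:\s+(\S+)\s+(\d+)/;
        push @fields, $1;
        $template .= "a$2";
    }
    my @values = unpack $template, $string;

    # Interleave names and values by index.
    return map { ( $fields[$_], $values[$_] ) } 0 .. $#fields;
}

my @layout = ( '1: date 6', '2: time 4', '3: duration 4' );
my @pairs  = parse_to_pairs( \@layout, '21090412500011' );
print "@pairs\n";
```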

      10x L~R

      I decided to go with your solution since it made me re-read the chapter on working with arrays from my Perl book to understand it ;P

      many thanks to all !
Re: split string by character count
by bpphillips (Friar) on Sep 21, 2004 at 13:18 UTC
    As noted above, pack and unpack are Perl's built-in way of doing this kind of thing. However, I can never remember how the syntax works for the templates required by those two functions. I've used Parse::FixedLength some and have found it very handy for splitting out fixed-width text files. Here's an example of how to use it (not sure of the exact format of the data in @list or what you need after processing each line):

    use Parse::FixedLength;

    my @input = <FH>;   # not sure where the input is from
    my @list  = ();     # whatever...
    foreach my $line (@input) {
        my $parser = Parse::FixedLength->new([ map { m/^\d+: (.+) (\d+)/ } @list ]);
        my $hash = $parser->parse($line);
        # do whatever with $hash
    }

    Just another option to using pack/unpack manually...
Re: split string by character count
by TheEnigma (Pilgrim) on Sep 21, 2004 at 13:14 UTC
    I'm assuming the 1:, 2:, etc. aren't actually in @list. You said you wanted to end up with an array; how about this:

    #!/usr/bin/perl -w
    use strict;

    my $string = "21090412500011            209         070279501 9047    0         1111";
    my @list = ("date", 6, "time", 4, "duration", 4, "calling-num", 15,
                "dialed-num", 18, "code-used", 4, "cond-code", 1,
                "ppm", 5, "auth-code", 13,);
    my @anotherlist;
    my $ptr = 0;
    while (1) {
        push(@anotherlist, shift(@list));
        my $length = shift(@list);
        push(@anotherlist, substr($string, $ptr, $length));
        $ptr += $length;
        last if $ptr >= length($string);
    }
    print "@anotherlist\n";

    TheEnigma

      For added utility, convert the array to a hash:

      sub extract_fields {
          my @anotherlist;
          my $ptr = 0;
          while (@list) {
              push(@anotherlist, shift(@list));
              my $length = shift(@list);
              if ($ptr >= length($string)) {
                  push(@anotherlist, undef);
              }
              else {
                  push(@anotherlist, substr($string, $ptr, $length));
              }
              $ptr += $length;
          }
          push(@anotherlist, '_leftover', substr($string, $ptr))
              if ($ptr < length($string));
          return @anotherlist;
      }

      %fields = extract_fields(...);
Re: split string by character count
by Jasper (Chaplain) on Sep 21, 2004 at 14:29 UTC
    %hash = map {/ (\S+) /, substr $string, 0, $', ''} @list;

    I'd normally say that unpack is probably your friend, as shown above in the other replies. However, it looks to me like a nonsense of a format to be supplied with. I'd ask them (the mysterious they, of course) to supply it in a csv file with column headers, and run it through DBX. Or something like that.
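The one-liner above leans on four-argument substr, which returns the leading characters and removes them from $string in the same step. A longer sketch of the same idea follows; the name `consume_fields` is hypothetical, and the layout lines are assumed to look like "1: date 6" as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper; works on its own copy of the string.
sub consume_fields {
    my ( $string, @layout ) = @_;
    my %field;
    for my $line (@layout) {
        # Skip any line that does not look like "N: name width".
        my ( $name, $width ) = $line =~ /^\d+:\s+(\S+)\s+(\d+)/
            or next;

        # Four-argument substr returns the first $width characters and
        # replaces them with '', so $string shrinks on every pass.
        $field{$name} = substr $string, 0, $width, '';
    }
    return %field;
}

my %field = consume_fields( '21090412500011',
    '1: date 6', '2: time 4', '3: duration 4' );
print "$_ => $field{$_}\n" for sort keys %field;
```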