Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a string similar to:
a(b)cd(e)f
I want to split the string into pieces of 5 characters and do something to each piece:
split 1 = a(b)c
split 2 = d(e)f
Any advice would be appreciated. Thanks

2005-10-04 Retitled from "split" by Corion, as per Monastery guidelines

Re: Split a string into items of constant length
by Roy Johnson (Monsignor) on Oct 04, 2005 at 12:41 UTC
    Another solution is to use magical values for open and $/:
    my $a = 'a(b)cd(e)fg(h)ij(k)l';
    open IN, '<', \$a or die "Reading string: $!\n";
    $/ = \5;
    my @s = <IN>;
    print map "$_\n", @s;
    (You need to be using a modern perl. I think 5.8.x)
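The same idea can be sketched with a lexical filehandle and a localized $/, so that the record-size setting does not leak into the rest of the program. The variable names here are illustrative, and the sketch assumes a perl with in-memory filehandles (5.8 or later). If the string length is not a multiple of 5, the final record is simply shorter.

```perl
use strict;
use warnings;

my $string = 'a(b)cd(e)fg(h)ij(k)l';

# Open a read-only in-memory filehandle on the string.
open my $fh, '<', \$string or die "Reading string: $!\n";

my @records;
{
    # A reference to an integer makes readline return fixed-size records.
    # local restores the previous value of $/ when the block ends.
    local $/ = \5;
    @records = <$fh>;
}
close $fh;

print "$_\n" for @records;
```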

    Caution: Contents may have been coded under pressure.
Re: Split a string into items of constant length
by Perl Mouse (Chaplain) on Oct 04, 2005 at 10:11 UTC
    my @chunks = unpack "(A5)*", "a(b)cd(e)f";
    print $chunks[0], "\n";
    print $chunks[1], "\n";
    __END__
    a(b)c
    d(e)f
    Perl --((8:>*
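One detail worth knowing about this template: when unpacking, "A" strips trailing spaces and NULs from each field, while "a" returns the field unchanged. A small sketch with a made-up, space-padded input shows the difference. Which one is wanted depends on whether trailing spaces are data or padding.

```perl
use strict;
use warnings;

# Hypothetical input: two 5-character fields padded with spaces.
my $padded = 'ab   cd   ';

my @stripped = unpack '(A5)*', $padded;   # "A" removes trailing spaces and NULs
my @verbatim = unpack '(a5)*', $padded;   # "a" keeps every character

print join('|', @stripped), "\n";
print join('|', @verbatim), "\n";
```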

      This won't play well with UTF8.
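One way to read this caution, assuming it refers to strings that still hold undecoded UTF-8 bytes: unpack then counts bytes, so a chunk boundary can fall in the middle of a multi-byte character. The sketch below uses a made-up string of five two-byte characters and compares byte chunks with character chunks taken after decoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Five copies of U+00E9; each one encodes to two bytes in UTF-8.
my $bytes = encode('UTF-8', "\x{e9}" x 5);    # 10 bytes

# On the raw bytes, each chunk is 5 bytes, so a character gets split.
my @byte_chunks = unpack '(A5)*', $bytes;

# After decoding, a regex match counts characters instead of bytes.
my $chars       = decode('UTF-8', $bytes);
my @char_chunks = $chars =~ /.{1,5}/gs;

printf "%d byte chunks, %d character chunk(s)\n",
    scalar @byte_chunks, scalar @char_chunks;
```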

      -sauoq
      "My two cents aren't worth a dime.";
      

      Hi,

      Or more dynamic, but note the little 'empty' problem.

      #!/usr/bin/perl
      use strict;
      use warnings;

      while (<DATA>) {
          my $line = $_;
          chomp $line;
          my @chunks = unpack "(A5)*", $line;
          print @chunks . "\n";
          # beware, the last 'entry' in chunks is empty, note the < and >!
          foreach my $i (@chunks) {
              print ">" . $i . "<\n";
          }
      }
      __DATA__
      a(b)cd(e)fg(h)i
      j(k)lm(n)o
      --
      if ( 1 ) { $postman->ring() for (1..2); }
        my $line = $_;
        chomp $line;
        my @chunks = unpack "(A5)*", $line;
        Just a side note: what's wrong with
        chomp; my @chunks = unpack "(A5)*", $_;
        ?

        (or else

        while (my $line=<DATA>) { ...
        instead.)

        Also, more as a matter of personal stylistic preference:

        foreach my $i (@chunks) { print ">" . $i . "<\n"; }
        why not
        print ">$_<\n" for @chunks;
        instead?
Re: Split a string into items of constant length
by Samy_rio (Vicar) on Oct 04, 2005 at 10:22 UTC

    Try this,

    $a = "a(b)cd(e)f";
    print "\n", substr $a, 0, 5, '' until $a eq '';

    Regards,
    Velusamy R.
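Note that the four-argument substr above empties the original string as it goes. If the string is still needed afterwards, a non-destructive variant can step an offset through it instead. This is only a sketch; the names $string, $size and $pos are illustrative.

```perl
use strict;
use warnings;

my $string = 'a(b)cd(e)fg(h)';   # 14 characters, not a multiple of 5
my $size   = 5;                  # chunk length

my @chunks;
for (my $pos = 0; $pos < length($string); $pos += $size) {
    # Three-argument substr only reads; the last chunk may be shorter.
    push @chunks, substr($string, $pos, $size);
}

print "$_\n" for @chunks;
```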

      Just a minor nitpick that is always worth repeating, IMHO: $a should not be used as a general purpose variable -- see sort.

      Update: (especially for those who didn't like this post) sauoq explained in greater detail the potential issues with using $a and $b as general purpose variables. He also pointed out that, as is well known, in most cases it won't do much harm. However, it was apparent that the person I was replying to was not aware of them, and the OP appeared to be a newbie. So, in this context, I'm still convinced it was an important point to bring to their attention. Sad to notice that more than one's mileage does vary...

        $a should not be used as a general purpose variable

        It's probably better to explain that rather than refer to a manpage that won't. At least, not very well. Yes, sort uses package variable $a and $b. It also localizes them.

        $ perl -le '$a="foo"; my @n = (3,2,1); print for sort {$a<=>$b} @n; print $a'
        1
        2
        3
        foo
        It only becomes an issue if you declare them as lexical variables.
        $ perl -le 'my $a; my @n = (3,2,1); print for sort {$a<=>$b} @n'
        Can't use "my $a" in sort comparison at -e line 1.
        And the better fix is probably not to avoid $a and $b but to be explicit in your sort blocks and subs by using $::a and $::b (or $Foo::a and $Foo::b if you are in package Foo) explicitly.
        $ perl -le 'my $a; my @n = (3,2,1); print for sort {$::a<=>$::b} @n;'
        1
        2
        3
        That isn't to say avoiding $a and $b is a bad thing... they are generally lousy variable names anyway. But I often use them in one-liners. So long as you know when and, more importantly, why it can be an issue, there's no harm in it.

        -sauoq
        "My two cents aren't worth a dime.";
        
      thanks for all those suggestions!
Re: Split a string into items of constant length
by sauoq (Abbot) on Oct 04, 2005 at 10:22 UTC

    One way...

    my $string = "a(b)cd(e)f";
    my @parts = split /(?=.{5}$)/, $string;
    And if your strings are longer and you want to chop it all up the same way...
    my $string = "a(b)cd(e)fg(h)ij(k)l";
    my @parts = split /(?=(?:.{5})+$)/, $string;
    It would be simpler to drop split in this case though...
    my $string = "a(b)cd(e)fg(h)ij(k)l";
    my @parts = $string =~ /(.{5})/g;

    -sauoq
    "My two cents aren't worth a dime.";
    
      my $string = "a(b)cd(e)f";
      my @parts = split /(?=.{5}$)/, $string;
      Why use an extended regex with split where a simple match would do?
      [struck out] local $_ = "a(b)cd(e)f"; my @parts = split /.{5}/g;
      local $_ = "a(b)cd(e)f";
      my @parts = /.{5}/g;
      (Yes: of course this does not take care of the case when the given string has a length that is not a multiple of 5. But then I'm using unpack myself, just as Perl Mouse does!)

      Update: fixed a split that I had inadvertently left in -- see the struck-out code above. Thanks to sauoq's comment.

        Yes: of course this does not take care of the case when the given string has a length that is not a multiple of 5.
        If you want to catch that, you can change the regex to /.{1,5}/g.
        perl -le 'print for "ABCDEFGHIJKLMNOPQRSTUVWXYZ" =~ /.{1,5}/g'
        ABCDE
        FGHIJ
        KLMNO
        PQRST
        UVWXY
        Z
        Of course, depending on what it's used for, simply discarding it can be a better idea, and /.{5}/g works just fine for that.
        my @parts = split /.{5}/g;

        You probably meant to leave split out of that. I'm not sure because of your comment about a string with a length that isn't a multiple of 5 though... In any case, you were probably adding this at the same time I was adding (the correct version of) it as an afterthought to my own post. That's pretty quick as I added it within seconds... :-P

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: Split a string into items of constant length
by inman (Curate) on Oct 04, 2005 at 11:07 UTC
    A simple regex should work fine. No need for anything complicated.
    my $data = 'a(b)cd(e)f';
    my @chunks = $data =~ /.{5}/g;
    print "@chunks";

    Change the value in the regex for a different number of chars.
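If the chunk length needs to vary, the number can be interpolated into the regex. The following is a minimal sketch; chunk_string() is a hypothetical helper, not something from this thread. It uses {1,$size} so a short final piece is kept, and the /s modifier so that "." also matches newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string into pieces of at most $size characters.
sub chunk_string {
    my ($string, $size) = @_;
    die "Chunk size must be a positive integer\n"
        unless defined $size && $size =~ /^[1-9][0-9]*$/;
    # In list context a /g match returns every matched piece.
    return $string =~ /.{1,$size}/gs;
}

my @chunks = chunk_string('a(b)cd(e)fg(h)', 5);
print "@chunks\n";
```

Replacing {1,$size} with {$size} would instead discard a trailing piece that is shorter than the requested length.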

Re: Split a string into items of constant length
by blazar (Canon) on Oct 04, 2005 at 10:22 UTC
    do_something($_) for unpack 'A5A5', 'a(b)cd(e)f';

    Update: I know that we should not really care about XP points, but since I see a -1 on top of this node, I'm astonished. What's wrong with the code I proposed?

    I don't see why it shouldn't work:

    $ perl -le 'print for unpack qw/A5A5 a(b)cd(e)f/'
    a(b)c
    d(e)f
    I wonder if the person who downvoted this also had something intelligent to say...