OK, I took the idea and did basically a complete rewrite. In particular I noticed the following:

I made the interfaces less magic. For instance you have this magic stuff on the filehandle. I made that a separate function. This will work with tied filehandles as well. For the same reason I stopped relying on $/: the author of a tied filehandle may not pay attention to it.
If you are a module, there is no need to do initializations in a BEGIN block.
I would have moved your functions into @EXPORT_OK as Exporter suggests, but you want this for one-offs. OK, TIMTOWTDI. But if I were using it I would make that change.
I wondered if your @VERSION was meant to be $VERSION.
I note that there is no equivalent to the third argument to split. I played both ways with that then left it alone. Just note that without it, trailing empty fields get dropped.
I am doing a rewrite and didn't include any POD. You should.
I made this n-dimensional because, well, because I can.
You were not completely clear what the argument order was, and naming the first one $second and the second one $first is IMO confusing. I made it recursive, but still you should note the naming issue. If you wanted 2-dim I would suggest $inner and $outer as names.
You are using explicit indexes. I almost never find that necessary. In this version I use map. Otherwise you could push onto the anon array. Avoiding ever thinking about the index leads to fewer opportunities to mess up, and often results in faster code as well!
I am using qr// to avoid recompiling REs. Given the function call overhead this probably isn't a win. I did it mainly to mention that if you are going to do repeated uses of an RE, you can and should avoid compilation overhead.
The reason for my wrappers is so that my recursion won't mess up on the defaults. :-)
I considered checking wantarray, but the complication in the interface did not seem appropriate for short stuff.
Note that this entire approach is going to fail miserably on formats with things like escape characters and escape sequences. For instance the CSV format is never going to be easily handled using this. Something to consider before using this for an interesting problem.
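To make two of the caveats above concrete (the missing third argument to split, and formats with quoting), here is a minimal sketch. The sample strings are made up for illustration and are not from the original module.

```perl
use strict;
use warnings;

# Without a limit, split drops trailing empty fields.
my @short = split /\t/, "a\tb\t\t";
# A negative limit keeps them.
my @full  = split /\t/, "a\tb\t\t", -1;

print scalar(@short), " vs ", scalar(@full), "\n";   # prints "2 vs 4"

# Hypothetical CSV record with three logical fields. The quoted
# field contains the delimiter, so a plain split cuts it in two.
my $line  = 'widget,"Smith, John",42';
my @cells = split /,/, $line;

print scalar(@cells), "\n";   # prints "4", not 3
```

If trailing empty fields matter, passing -1 as the limit inside _split would be one way to keep them. For quoted formats a real parser is the safer route.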

Oh right, and you want to see code? OK.

package SuperSplit;
use strict;

use Exporter;
use vars qw( @EXPORT @ISA $VERSION );
$VERSION = 0.02;
@ISA = 'Exporter';
@EXPORT = qw( superjoin supersplit supersplit_io );

# Takes a reference to an n-dim array followed by n strings. 
# Joins the array on those strings (inner to outer),
# defaulting to "\t", "\n"
sub superjoin {
  my $a_ref = shift;
  push (@_, "\t") if @_ < 1;
  push (@_, "\n") if @_ < 2;
  _join($a_ref, @_);
}

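# Recursive worker: joins inner levels first, outermost string last.
# Works on a copy so the caller's array is left untouched.
sub _join {
  my $a_ref = shift;
  my $str = pop;
  my @items = @$a_ref;
  if (@_) {
    @items = map {_join($_, @_)} @items;
  }
  join $str, @items;
}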

# Splits the input from a filehandle (defaults to STDIN).
# Any further args are passed through to supersplit.
sub supersplit_io {
  my $fh = shift;
  unless (defined($fh)) {
    $fh = \*STDIN;
  }
  unshift @_, join '', <$fh>;
  supersplit(@_);
}

# n-dim split.  First arg is text, rest are patterns, listed
# inner to outer.  Defaults to /\t/, /\n/
sub supersplit {
  my $text = shift;
  if (@_ < 1) {
    push @_, "\t";
  }
  if (@_ < 2) {
    push @_, "\n";
  }
  _split($text, map {qr/$_/} @_);
}

sub _split {
  my $text = shift;
  my $re = pop;
  my @res = split($re, $text); # Consider the third arg?
  if (@_) {
    @res = map {_split($_, @_)} @res;
  }
  \@res;
}

1;
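
For anyone who wants to try the idea without installing anything, here is a condensed, self-contained sketch of the same recursion. The names nd_split and nd_join are stand-ins I made up so this runs on its own; they are not part of the module, and they skip the defaults and the qr// wrapping. The in-memory filehandle is just an assumption about how you might feed text to something like supersplit_io.

```perl
use strict;
use warnings;

# Stand-in for the recursive split: separators are listed inner to outer.
sub nd_split {
  my ($text, @seps) = @_;
  my $re = pop @seps;                    # outermost pattern
  my @parts = split $re, $text;
  @parts = map { nd_split($_, @seps) } @parts if @seps;
  return \@parts;
}

# Stand-in for the recursive join: copies the list, so the input
# structure is not modified.
sub nd_join {
  my ($a_ref, @seps) = @_;
  my $str = pop @seps;                   # outermost string
  my @parts = @$a_ref;
  @parts = map { nd_join($_, @seps) } @parts if @seps;
  return join $str, @parts;
}

my $text = "a\tb\nc\td";

# Slurp from an in-memory filehandle, the way supersplit_io slurps <$fh>.
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $slurped = join '', <$fh>;
close $fh;

my $table = nd_split($slurped, qr/\t/, qr/\n/);
print $table->[1][0], "\n";              # prints "c"

my $back = nd_join($table, "\t", "\n");
print $back eq $text ? "round trip ok\n" : "mismatch\n";
```

Note that no index variable appears anywhere; map does the bookkeeping at every level.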

Cheers,
Ben

PS Please take the quantity and detail of my response as a sign that I liked the idea enough to critique it, and not as criticism of the effort you put in...

In reply to Re (tilly) 1: Supersplit by tilly
in thread Supersplit by jeroenes
