OK, I took the idea and did basically a complete rewrite.
Here are the things I changed and the things I noticed:
- I made the interfaces less magic. For instance you
have this magic stuff on the filehandle. I made that a
separate function. This will work with tied filehandles
as well. For the same reason I stopped using $/
because the author of a tied method may not pay attention
to that.
- If you are a module, there is no need to do
initializations in a BEGIN block.
- I would have moved your functions into @EXPORT_OK as
Exporter suggests, but you want this for one-offs. OK,
TIMTOWTDI. But if I were using it I would have made that
change.
- I wondered if your @VERSION was meant to be $VERSION.
- I note that there is no equivalent to the third
argument to split. I played both ways with that then
left it alone. Just note that without it, split drops
trailing empty fields.
- I am doing a rewrite and didn't include any POD. You
should.
- I made this n-dimensional because, well, because I can.
- You were not completely clear what the argument order
was, and naming the first one $second and the second one
$first is IMO confusing. I made it recursive, but still
you should note the naming issue. If you wanted 2-dim I
would suggest $inner and $outer as names.
- You are using explicit indexes. I almost never find
that necessary. In this version I use map. Otherwise
you could push onto the anon array. Avoiding ever
thinking about the index leads to fewer opportunities to
mess up, and often results in faster code as well!
- I am using qr// to avoid recompiling REs. Given the
function call overhead this probably isn't a win. I did
it mainly to mention that if you are going to do repeated
uses of an RE, you can and should avoid compilation
overhead.
- The reason for my wrappers is so that my recursion
won't mess up on the defaults. :-)
- I considered checking wantarray, but the complication
in the interface did not seem appropriate for short stuff.
- Note that this entire approach is going to fail
miserably on formats with things like escape characters
and escape sequences. For instance the CSV format is
never going to be easily handled using this. Something
to consider before using this for an interesting problem.
Oh right, and you want to see code? OK.
package SuperSplit;
use strict;
use Exporter;
use vars qw( @EXPORT @ISA $VERSION );
$VERSION = 0.02;
@ISA = 'Exporter';
@EXPORT = qw( superjoin supersplit supersplit_io );
# Takes a reference to an n-dim array followed by n strings.
# Joins the array on those strings (inner to outer),
# defaulting to "\t", "\n"
sub superjoin {
    my $a_ref = shift;
    push (@_, "\t") if @_ < 1;
    push (@_, "\n") if @_ < 2;
    _join($a_ref, @_);
}
sub _join {
    my $a_ref = shift;
    my $str = pop;
    # Work on a copy so the caller's array is not overwritten.
    my @parts = @$a_ref;
    if (@_) {
        @parts = map {_join($_, @_)} @parts;
    }
    join $str, @parts;
}
# Splits the input from a filehandle
sub supersplit_io {
    my $fh = shift;
    unless (defined($fh)) {
        $fh = \*STDIN;
    }
    unshift @_, join '', <$fh>;
    supersplit(@_);
}
# n-dim split. First arg is text, rest are patterns, listed
# inner to outer. Defaults to /\t/, /\n/
sub supersplit {
    my $text = shift;
    if (@_ < 1) {
        push @_, "\t";
    }
    if (@_ < 2) {
        push @_, "\n";
    }
    _split($text, map {qr/$_/} @_);
}
sub _split {
    my $text = shift;
    my $re = pop;
    my @res = split($re, $text); # Consider the third arg?
    if (@_) {
        @res = map {_split($_, @_)} @res;
    }
    \@res;
}
1;
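For what it's worth, here is a small standalone sketch of two of the caveats from my notes: what the third argument to split changes, and why CSV will bite you. It does not use the module at all, and the sample data and variable names are made up for illustration. It also shows the map-instead-of-indexes style and separators compiled once with qr//.

```perl
use strict;
use warnings;

# Compile each separator once with qr// and reuse it.
my ($inner, $outer) = (qr/\t/, qr/\n/);

# Made-up sample: the first row ends in two empty fields.
my $text = "a\tb\t\t\nc\td\n";

# Default split drops trailing empty fields; map avoids explicit indexes.
my @short = map { [ split $inner, $_ ] } split $outer, $text;

# A negative limit as the third argument keeps them.
my @full = map { [ split $inner, $_, -1 ] } split $outer, $text;

printf "default: %d fields, limit -1: %d fields\n",
    scalar @{ $short[0] }, scalar @{ $full[0] };

# A quoted CSV field that contains the separator is cut in two.
my @csv = split /,/, '"Smith, John",42';
printf "csv: %d fields\n", scalar @csv;
```

The first row comes back with 2 fields by default and 4 with the -1 limit, and the two-column CSV line comes back as 3 pieces. If you need the trailing fields, passing a limit through to split in _split is probably the place to do it.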
Cheers,
Ben
PS Please take the quantity and detail of my response as a
sign that I liked the idea enough to critique it, and
not as criticism of the effort you put in...