anjiro has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to split a line into chunks of whitespace followed by non-whitespace, but I don't want to lose any of the whitespace. As an example, given the first line of df (I've replaced spaces with '_' to make it more obvious what I need):
Filesystem___________1K-blocks______Used_Available_Use%_Mounted_on
I want (split parts denoted by []s):
[Filesystem][___________1K-blocks][______Used][_Available][_Use%][_Mounted][_on]
I almost have what I want with split(/(?<=\S)(\s)/); but I get this:
[Filesystem][_][__________1K-blocks][_][_____Used][_][Available][_][Use%][_][Mounted][_][on]
Any advice?

Replies are listed 'Best First'.
Re: Splitting without losing
by graff (Chancellor) on Jan 31, 2003 at 02:36 UTC
    You are very close. Try making the regex look like this:
    split(/(?<=\S)(?=\s)/);
    That is, use both look-behind and look-ahead assertions, both zero-width. (update: made the above line coincide with the tested code below) I tried the following snippet on the command line to verify that:
    $s="AAA B X Y Z W\n";
    @a=split(/(?<=\S)(?=\s)/, $s);
    print length($s),":$s";
    print join(":",length(join("",@a)),scalar(@a),"\n");
    print join("\n",@a),"\n";
    Output:
    20:AAA B X Y Z W
    20:7:
    AAA
     B
     X
     Y
     Z
     W
    Note that this treats the final LF (or CRLF, if that's your flavor) as a token -- the second line of output shows that the array got seven elements, while the string had just six non-whitespace tokens followed by LF.
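If the extra newline token is unwanted, one option is to chomp the line before splitting. A minimal sketch, assuming a df header line with the spacing shown in the question; the variable names are illustrative only:

```perl
use strict;
use warnings;

# Assumed sample input: a df header line, including its trailing newline.
my $line = "Filesystem           1K-blocks      Used Available Use% Mounted on\n";

# Remove the newline so it does not become a token of its own.
chomp(my $header = $line);

# Split at every zero-width position with a non-space on the left and
# whitespace on the right; nothing is consumed, so nothing is lost.
my @parts = split /(?<=\S)(?=\s)/, $header;

print "[", join("][", @parts), "]\n";

# Joining the pieces back together should reproduce the chomped line.
print join("", @parts) eq $header ? "round-trip ok\n" : "round-trip FAILED\n";
```

Each element keeps its leading whitespace, so the column widths can still be measured with length().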
Re: Splitting without losing
by Kanji (Parson) on Jan 31, 2003 at 02:37 UTC

    Rather than split, I'd be more inclined to use a simple regex...

    my $line = 'Filesystem 1K-blocks Used Available Use% Mounted on';
    my @fields = $line =~ /(\s*\S+)/g;
    print "[", join( "][", @fields ), "]\n";
    __END__
    [Filesystem][ 1K-blocks][ Used][ Available][ Use%][ Mounted][ on]

        --k.
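One difference from the split approaches may matter: /(\s*\S+)/g only matches whitespace that is followed by a non-space, so trailing whitespace (including a newline) is not captured. A small sketch with an assumed sample line:

```perl
use strict;
use warnings;

# Assumed sample input with two trailing spaces and a newline.
my $line = "Used Available Use%  \n";

# Each match is optional leading whitespace plus one run of non-whitespace.
my @fields = $line =~ /(\s*\S+)/g;

print "[", join("][", @fields), "]\n";

# The trailing whitespace never matched, so the pieces are shorter than the input.
my $lost = length($line) - length(join("", @fields));
print "characters not captured: $lost\n";
```

For a header line that has already been chomped and has no trailing blanks, both approaches should give the same pieces.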


Re: Splitting without losing
by Paladin (Vicar) on Jan 31, 2003 at 02:33 UTC
    Try:
    split /(?<!_)(?=_)/,$string; # Using _ instead of space like your example.
    I.e., split at the zero-width position between a non-space on the left and a space on the right, without capturing either.
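As a quick check, here is that pattern applied to the underscore-marked header from the question (a sketch; the variable names are illustrative):

```perl
use strict;
use warnings;

# The underscore-marked df header from the original question.
my $string = 'Filesystem___________1K-blocks______Used_Available_Use%_Mounted_on';

# Split where the next character is "_" and the previous character is not "_".
my @parts = split /(?<!_)(?=_)/, $string;

my $shown = "[" . join("][", @parts) . "]";
print "$shown\n";
```

This should print the bracketed layout asked for in the question, with every underscore preserved at the start of the following field.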
Re: Splitting without losing
by helgi (Hermit) on Jan 31, 2003 at 12:51 UTC
    I have a simpler solution that seems to work for most cases:

    split /\b/;

    --
    Regards,
    Helgi Briem
    helgi AT decode DOT is
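For comparison, a short sketch of what split /\b/ does with a df-style header (a shortened sample line is assumed). Because \b matches on both sides of every word, runs of whitespace come back as separate elements, and non-word characters such as "-" and "%" cause additional splits:

```perl
use strict;
use warnings;

# Assumed, shortened sample of a df header line.
my $line = 'Filesystem   1K-blocks Use% Mounted on';

# \b matches at every transition between word and non-word characters.
my @parts = split /\b/, $line;

print "[", join("][", @parts), "]\n";
print scalar(@parts), " parts\n";
```

Nothing is lost here either (joining the parts restores the line), but the whitespace is not attached to the following field.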

Re: Splitting without losing
by anjiro (Beadle) on Jan 31, 2003 at 03:47 UTC
    Awesome! You guys rock. =) Thanks a lot.
Re: Splitting without losing
by DaveH (Monk) on Feb 01, 2003 at 10:54 UTC

    Hi.

    It looks like you are trying to parse the output of 'df -k' in some way. Perhaps you should instead look into using a module like Filesys::Df. It uses the statvfs system call, so it should be more portable if your code is destined to run on other systems. You never know, it may also end up being faster.

    Trying to parse df consistently cross-platform is like trying to tie down smoke. ;-) Each OS seems to print out the information in a slightly different way.

    Cheers,

    -- Dave :-)


    $q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print