LoneRanger has asked for the wisdom of the Perl Monks concerning the following question:

I have something that looks like this:
Sharename      Type      Comment
---------      ----      -------
IPC$           IPC       Remote IPC
(more lines like this)

Server               Comment
---------            -------
STUFF                Stuff
(more lines like this)

I just want to grab the text from the top set of columns (all except Comment) into a hash. So I guess I just need some crafty regex help.

Re (tilly) 1: multi-line (string doc) parsing
by tilly (Archbishop) on Jan 27, 2001 at 23:30 UTC
    I suspect that your problem is best solved by locating the positions of the columns and then using an unpack to extract columns into text. For an idea on how to find positions, take a look at the pos trick in Locate char in a string. (Probably with a pattern like /(^| )(?=\S)/g.)
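
    A minimal sketch of the pos-plus-unpack idea, not tilly's own code. It assumes the header row lines up with the data and that no column title contains a space; the sample rows (including ADMIN$) and the names @starts, $template and %type_of are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample rows; sprintf guarantees the columns line up.
my $header = sprintf '%-15s%-10s%s', 'Sharename', 'Type', 'Comment';
my @rows   = (
    sprintf('%-15s%-10s%s', 'IPC$',   'IPC',  'Remote IPC'),
    sprintf('%-15s%-10s%s', 'ADMIN$', 'Disk', 'Remote Admin'),
);

# Record where each column starts: pos() is the offset just past each match,
# which is the first character of a word in the header.
my @starts;
while ($header =~ /(^| )(?=\S)/g) {
    push @starts, pos($header);
}

# Turn the start offsets into an unpack template such as "A15 A10 A*".
my @widths   = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
my $template = join ' ', (map { "A$_" } @widths), 'A*';

# "A" fields drop trailing spaces, so the values come out trimmed.
my %type_of;
for my $row (@rows) {
    my ($share, $type) = unpack $template, $row;   # Comment is ignored
    $type_of{$share} = $type;
}

print "$_ => $type_of{$_}\n" for sort keys %type_of;
```

    Taking the positions from the header rather than from a data row avoids being fooled by values such as "Remote IPC" that contain a single space.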

    Of course it will be a bit harder than that because you have to recognize when you leave the ASCII table and start another.

    And check first whether you have tabs. If you do then you should either be able to use them for a split or else follow the advice in How do I expand tabs in a string? so you can do the above.
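
    The tab check could look roughly like this. It is only a sketch: the sample line is invented, and it uses the core Text::Tabs module (default tab stop of 8) as one way to do the expansion.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and $tabstop

# Hypothetical tab-separated line.
my $line = "IPC\$\tIPC\tRemote IPC";

my (@fields, $expanded);
if ($line =~ /\t/) {
    # Option 1: the tabs are the delimiters, so a plain split is enough.
    @fields = split /\t/, $line;

    # Option 2: turn the tabs into spaces so fixed-column parsing works.
    ($expanded) = expand($line);
}

print join('|', @fields), "\n";
print "$expanded\n";
```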

Re: multi-line (string doc) parsing
by Fastolfe (Vicar) on Jan 27, 2001 at 23:32 UTC
    I wrote a small bit of code for a related problem a while back that should reliably parse this for you: Parse fixed-length ascii table

    The only caveat is the fact that you'll have to break each part up into a chunk of its own and process each separately, and you'd have to strip out the ------ fields (before or after parsing). Hope this helps.
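
    One way to do that chunking, sketched under the assumption that a blank line separates the tables (the original post does not show whether it does). This is not the code from the linked node; @tables is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical input; the blank line between the tables is an assumption.
my $text = join "\n",
    'Sharename      Type      Comment',
    '---------      ----      -------',
    'IPC$           IPC       Remote IPC',
    '',
    'Server               Comment',
    '---------            -------',
    'STUFF                Stuff',
    '';

my @tables;
for my $chunk (split /\n\s*\n/, $text) {
    # Drop rows made up only of dashes and whitespace (the "------" rulers).
    my @rows = grep { !/^[\s-]*$/ } split /\n/, $chunk;
    push @tables, \@rows if @rows;
}

# Each element of @tables is now one table: header row first, then data rows.
printf "table %d has %d rows\n", $_ + 1, scalar @{ $tables[$_] } for 0 .. $#tables;
```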

Re: multi-line (string doc) parsing
by Trinary (Pilgrim) on Jan 27, 2001 at 23:31 UTC
    It would help to know whether the columns are (hopefully) separated by something other than a bunch of spaces; a tab would be convenient. Otherwise I guess you could just split on two or more spaces, but I don't know exactly what the data looks like there.

    Anyway, a simple approach: pseudo-codish

    foreach (@lines) {
        # skip blank lines, dashed ruler lines and the header row
        next if /^[\s-]*$/ || /^Sharename/;
        # the second table starts here, so stop
        last if /^Server/;
        my @vals = split /\s{2,}/;
        $hash{$vals[0]}{'Type'}    = $vals[1];
        $hash{$vals[0]}{'Comment'} = $vals[2];
    }
    As usual, untested and mostly not proofread. =) Point out errors as you see fit, of course this is a totally simple method of solving the problem...much cooler solutions abound.

    Enjoy

    Trinary

Re: multi-line (string doc) parsing
by 2501 (Pilgrim) on Jan 27, 2001 at 23:53 UTC
    here is a quickie if the file is broken up by spaces. I am not wildly happy with it, because you have to assume that sharename and type don't have spaces in them. hell, I tried, it works, it just ain't the best.:)
    use strict;

    while (<DATA>) {
        # expects leading whitespace, then Sharename and Type (no spaces in either), then Comment
        if ($_ =~ /^\s+(\S+)\s+(\S+)\s+(.+)\s+$/) {
            print "$1\t$2\t$3\t\n";
        }
    }
    __DATA__
        Sharename      Type      Comment
        ---------      ----      -------
        IPC$           IPC       Remote IPC

        Server               Comment
        ---------            -------
        STUFF                Stuff