This does what you *seem* to want, with your single line of input. (However, you should try to get the data in CSV format so you can use a real parser; using a regexp for this is flaky.)
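If the data can be produced as CSV, a real parser copes with things a regexp struggles with, such as the comma inside the gecos value. Here is a minimal sketch, assuming a hypothetical CSV export with one header row of keys and one row of values. To stay dependency-free it uses parse_line from the core module Text::ParseWords, which is a swapped-in stand-in rather than a full CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical CSV version of part of the record (not real output).
# The gecos value contains a comma, so it has to be quoted.
my $header = 'user,id,admgroups,gecos,home';
my $row    = 'eab12345,00000,,"AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000)",/home/eab12345';

# A naive split breaks the quoted gecos value into two pieces.
my @naive = split /,/, $row;
print scalar(@naive), " fields from a naive split\n";

# parse_line(DELIMITER, KEEP, LINE) splits only on commas outside quotes;
# KEEP = 0 strips the quote characters from the returned fields.
my @keys   = parse_line( ',', 0, $header );
my @values = parse_line( ',', 0, $row );

# Pair each header name with its value using a hash slice.
my %record;
@record{@keys} = @values;

print "$_ => '$record{$_}'\n" for @keys;
```

For real CSV (doubled quotes inside fields, embedded newlines), Text::CSV from CPAN is the usual choice.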

(Update: I treated the first word as special, not as a key, since it has neither a value nor a following separator. I see from your later comments that a word in this format can be a key as well. I can think of no way to differentiate between a key with no separator or value and a word in a multi-word value. If you expect both "bare" keys and multi-word values in a string with no distinct separators between pairs, you will have to get your data produced differently.)
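To make the ambiguity concrete, here is a minimal sketch that applies the same lookahead idea as the code below to a short, hypothetical input in which "bare" is meant to be a key with no separator or value. The regexp has no way to know that, so it folds "bare" into the value of x.

```perl
use strict;
use warnings;

# Hypothetical input: "bare" is intended as a key with no '=' or value,
# but nothing distinguishes it from a second word of x's value.
my $txt = 'x=1 bare y=2';

# Key, '=', then characters up to (not including) the next "word=".
my %pairs = $txt =~ / (\w+) = ( (?: (?!\w+=) .+? )+ ) /xg;

# Trim trailing whitespace (values are aliased, so this edits the hash).
s/\s+\z// for values %pairs;

print "$_ => '$pairs{$_}'\n" for sort keys %pairs;
# x => '1 bare'
# y => '2'
```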

(Update 2: I see from your later responses that the first word should in fact be treated as a key, even though it is not followed by the key-value separator. I've updated the code to include it in the data hash, with the empty string as its value rather than undef.)

use strict;
use warnings;
use feature 'say';
use Data::Dumper;
$Data::Dumper::Sortkeys = $Data::Dumper::Indent = 1;

chomp( my $input = <DATA> );

# The first word is a bare key: no separator and no value.
my ( $first, $txt ) = split / /, $input, 2;

my %pairs = $txt =~ /
    ( \w+ )        # capture key
    =              # separator
    (              # capture value
      (?:          # group (but don't additionally capture)
        (?!\w+=)   # (must not be followed by the next key-separator)
        .+?        # at least one character (even if just the whitespace), non-greedy
      )*           # zero or more of those, so a trailing "key=" still matches
    )              # end value
/msxg;

# Include the bare first word as a key with an empty value.
$pairs{ $first } = '';

# Trim trailing whitespace from values (values are aliased into the hash).
s/\s+\z// for values %pairs;

say Dumper \%pairs;

__END__
eab12345 id=00000 pgrp=abcdefgh groups=abcdefgh home=/home/eab12345 shell=/usr/bin/ksh gecos=AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000) auditclasses=general,files,TCPIP login=true su=true rlogin=true daemon=true admin=false sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=00 registry=AD SYSTEM=AD logintimes= loginretries=5 pwdwarntime=5 account_locked=false minage=0 maxage=13 maxexpired=0 minalpha=1 minother=1 mindiff=1 maxrepeats=2 minlen=8 histexpire=13 histsize=8 pwdchecks= dictionlist=/abc/def/ghi/jkl default_roles= fsize=-1 cpu=-1 data=-1 stack=65536 core=000000 rss=65536 nofiles=2000 time_last_login=1512632113 time_last_unsuccessful_login=1505304923 tty_last_login=ssh tty_last_unsuccessful_login=ssh host_last_login=0.000.000.000 host_last_unsuccessful_login=0.000.000.000 unsuccessful_login_count=0 roles=
Output:
$VAR1 = {
  'SYSTEM' => 'AD',
  'account_locked' => 'false',
  'admgroups' => '',
  'admin' => 'false',
  'auditclasses' => 'general,files,TCPIP',
  'auth1' => 'SYSTEM',
  'auth2' => 'NONE',
  'core' => '000000',
  'cpu' => '-1',
  'daemon' => 'true',
  'data' => '-1',
  'default_roles' => '',
  'dictionlist' => '/abc/def/ghi/jkl',
  'eab12345' => '',
  'expires' => '0',
  'fsize' => '-1',
  'gecos' => 'AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000)',
  'groups' => 'abcdefgh',
  'histexpire' => '13',
  'histsize' => '8',
  'home' => '/home/eab12345',
  'host_last_login' => '0.000.000.000',
  'host_last_unsuccessful_login' => '0.000.000.000',
  'id' => '00000',
  'login' => 'true',
  'loginretries' => '5',
  'logintimes' => '',
  'maxage' => '13',
  'maxexpired' => '0',
  'maxrepeats' => '2',
  'minage' => '0',
  'minalpha' => '1',
  'mindiff' => '1',
  'minlen' => '8',
  'minother' => '1',
  'nofiles' => '2000',
  'pgrp' => 'abcdefgh',
  'pwdchecks' => '',
  'pwdwarntime' => '5',
  'registry' => 'AD',
  'rlogin' => 'true',
  'roles' => '',
  'rss' => '65536',
  'shell' => '/usr/bin/ksh',
  'stack' => '65536',
  'su' => 'true',
  'sugroups' => 'ALL',
  'time_last_login' => '1512632113',
  'time_last_unsuccessful_login' => '1505304923',
  'tpath' => 'nosak',
  'tty_last_login' => 'ssh',
  'tty_last_unsuccessful_login' => 'ssh',
  'ttys' => 'ALL',
  'umask' => '00',
  'unsuccessful_login_count' => '0'
};
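For comparison, the same lookahead idea can be expressed with split instead of one big match: break the line only at whitespace that is immediately followed by a "word=" key, then split each field on its first '='. This is a sketch on a shortened, hypothetical sample of the input, not a replacement for the code above, and it has the same limitation described in the first update (a bare key in the middle of the line cannot be told apart from a word of a multi-word value).

```perl
use strict;
use warnings;

# Shortened, hypothetical sample of the input line.
my $input = 'eab12345 id=00000 home=/home/eab12345 admgroups= '
          . 'gecos=AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000) roles=';

# The first word is the bare key; the rest holds the key=value pairs.
my ( $first, $txt ) = split / /, $input, 2;

# Break only at whitespace immediately followed by "word=",
# so the spaces inside the gecos value are left alone.
my @fields = split /\s+(?=\w+=)/, $txt;

# Split each field on its first '=' only. The positive limit of 2 keeps
# an empty value (e.g. "admgroups=") and any later '=' inside a value.
my %pairs = map { my ( $k, $v ) = split /=/, $_, 2; ( $k => $v ) } @fields;

# The bare first word becomes a key with an empty value.
$pairs{$first} = '';

print "$_ => '$pairs{$_}'\n" for sort keys %pairs;
```

Because split consumes the whitespace between pairs, no trimming is needed here, although a trailing space at the very end of the line would still end up in the last value.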

Hope this helps!


The way forward always starts with a minimal test.

In reply to Re: Splitting a long row with multiple delimiters. by 1nickt
in thread Splitting a long row with multiple delimiters. by dipit
