Re: Splitting a long row with multiple delimiters.

A single sample input is usually not enough to reliably design a regex (see also). <update> Please use <code> tags when posting sample input. Also, can't you get this data in a more parseable format? </update> I have made the following assumptions:

Keys always match \w+
Values may not contain =
Keys are always preceded by whitespace

use warnings;
use strict;

my $str = q{eab12345 id=00000 pgrp=abcdefgh groups=abcdefgh home=/home/eab12345 shell=/usr/bin/ksh gecos=AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000) auditclasses=general,files,TCPIP login=true su=true rlogin=true daemon=true admin=false sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=00 registry=AD SYSTEM=AD logintimes= loginretries=5 pwdwarntime=5 account_locked=false minage=0 maxage=13 maxexpired=0 minalpha=1 minother=1 mindiff=1 maxrepeats=2 minlen=8 histexpire=13 histsize=8 pwdchecks= dictionlist=/abc/def/ghi/jkl default_roles= fsize=-1 cpu=-1 data=-1 stack=65536 core=000000 rss=65536 nofiles=2000 time_last_login=1512632113 time_last_unsuccessful_login=1505304923 tty_last_login=ssh tty_last_unsuccessful_login=ssh host_last_login=0.000.000.000 host_last_unsuccessful_login=0.000.000.000 unsuccessful_login_count=0 roles= };

my $REGEX = qr{ (?|
        # treat beginning of string as a key only
        \A \s* (?<key> \w+ ) \s*
    |    # otherwise, a normal key=value pair
        (?<= \s ) # key must be preceded by a space
        (?<key> \w+ )  \s* = \s*
        (?<value> # a value may not look like another key=value
            (?: (?! \s* \w+ = ) [^=] )*
        ) \s*
    ) }msx;

pos($str)=undef;
while ( $str =~ /\G$REGEX/gc ) {
    print "<", $+{key}, "> = <", $+{value}//'undef', ">\n";
}
die "failed to parse at pos ".pos($str)
    unless pos($str)==length($str);

<eab12345> = <undef>
<id> = <00000>
<pgrp> = <abcdefgh>
<groups> = <abcdefgh>
<home> = </home/eab12345>
<shell> = </usr/bin/ksh>
<gecos> = <AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000)>
<auditclasses> = <general,files,TCPIP>
<login> = <true>
<su> = <true>
<rlogin> = <true>
<daemon> = <true>
<admin> = <false>
<sugroups> = <ALL>
<admgroups> = <>
<tpath> = <nosak>
<ttys> = <ALL>
<expires> = <0>
<auth1> = <SYSTEM>
<auth2> = <NONE>
<umask> = <00>
<registry> = <AD>
<SYSTEM> = <AD>
<logintimes> = <>
<loginretries> = <5>
<pwdwarntime> = <5>
<account_locked> = <false>
<minage> = <0>
<maxage> = <13>
<maxexpired> = <0>
<minalpha> = <1>
<minother> = <1>
<mindiff> = <1>
<maxrepeats> = <2>
<minlen> = <8>
<histexpire> = <13>
<histsize> = <8>
<pwdchecks> = <>
<dictionlist> = </abc/def/ghi/jkl>
<default_roles> = <>
<fsize> = <-1>
<cpu> = <-1>
<data> = <-1>
<stack> = <65536>
<core> = <000000>
<rss> = <65536>
<nofiles> = <2000>
<time_last_login> = <1512632113>
<time_last_unsuccessful_login> = <1505304923>
<tty_last_login> = <ssh>
<tty_last_unsuccessful_login> = <ssh>
<host_last_login> = <0.000.000.000>
<host_last_unsuccessful_login> = <0.000.000.000>
<unsuccessful_login_count> = <0>
<roles> = <>
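If you need to look values up by key rather than just print them, the same loop can fill a hash. This is only a sketch: the sample string is a shortened, made-up line in the same shape as the input, the name `%rec` is my own, and the same three assumptions about keys and values still apply. A repeated key would overwrite the earlier value, so the sketch warns about that.

```perl
use warnings;
use strict;

# Shortened, made-up sample in the same shape as the real input.
my $str = q{eab12345 id=00000 gecos=AB/C RTYUI, LMNOP (AS 00000) admgroups= shell=/usr/bin/ksh roles= };

# Same pattern as in the post.
my $REGEX = qr{ (?|
        # treat beginning of string as a key only
        \A \s* (?<key> \w+ ) \s*
    |   # otherwise, a normal key=value pair
        (?<= \s )
        (?<key> \w+ )  \s* = \s*
        (?<value>
            (?: (?! \s* \w+ = ) [^=] )*
        ) \s*
    ) }msx;

my %rec;    # hypothetical name: key => value for one input line
pos($str) = undef;
while ( $str =~ /\G$REGEX/gc ) {
    my ( $key, $value ) = ( $+{key}, $+{value} );
    warn "duplicate key '$key'\n" if exists $rec{$key};
    $rec{$key} = $value;    # undef for the bare leading name
}
# \G plus /gc leaves pos() where matching stopped, so any
# unparsed remainder is detected here.
die "failed to parse at pos " . pos($str)
    unless pos($str) == length($str);

print "shell = $rec{shell}\n";    # prints "shell = /usr/bin/ksh"
print "gecos = $rec{gecos}\n";    # prints "gecos = AB/C RTYUI, LMNOP (AS 00000)"
```

Empty values such as `admgroups=` end up as empty strings, while the leading bare name is stored with an undefined value, so `defined` distinguishes the two cases.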

Re^2: Splitting a long row with multiple delimiters. by dipit (Sexton) on Jan 19, 2018 at 15:10 UTC
Thank you for your efforts! But I am getting <undef> as the value. Also, please explain in more detail how it works; I am unable to understand regex!

<eab12345> = <undef>
<id> = <undef>
<pgrp> = <undef>
<groups> = <undef>
<home> = <undef>
<shell> = <undef>
<gecos> = <undef>
<auditclasses> = <undef>
<login> = <undef>
<su> = <undef>
<rlogin> = <undef>
<daemon> = <undef>
<admin> = <undef>
<sugroups> = <undef>
<admgroups> = <undef>
<tpath> = <undef>
<ttys> = <undef>
<expires> = <undef>
<auth1> = <undef>
<auth2> = <undef>
<umask> = <undef>
contd...

Re^3: Splitting a long row with multiple delimiters. by haukex (Archbishop) on Jan 21, 2018 at 13:35 UTC
> But getting <undef> as value

Is your Perl 5.12 or earlier, that is, more than six years old? There apparently was a bug with `(?| )` that was fixed in v5.14 (this appears to be the commit). You should consider upgrading.

> I am unable to understand regex!

Are you familiar with concepts such as non-capturing groups `(?: )` and other basics like `\s*` etc.? If not, you probably want to read perlretut first. In fact, as far as I can tell, even the advanced regex features I used are explained there (with more details on each in perlre):

`(?| )` - Alternative capture group numbering
`(?<name> )` - Named backreferences (see also %+)
`(?<= )` and `(?! )` - Looking ahead and looking behind
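
To make those features a little more concrete, here is a minimal sketch that exercises each one in isolation. The input strings are invented for illustration and are not taken from your data; the patterns are simplified versions of the pieces used in the regex above, not the regex itself.

```perl
use warnings;
use strict;

# (?<name> ... ): named capture, read back through the %+ hash
if ( "id=00000" =~ / (?<key> \w+ ) = (?<value> \d+ ) /x ) {
    print "named: $+{key} -> $+{value}\n";    # prints "named: id -> 00000"
}

# (?| ... ): branch reset, every alternative numbers its groups from $1 again,
# so the second alternative also fills $1 and $2 (instead of $3 and $4)
for my $s ( "x=1", "y:2" ) {
    if ( $s =~ / (?| (\w+) = (\d+) | (\w+) : (\d+) ) /x ) {
        print "branch reset: $1 / $2\n";      # prints "x / 1", then "y / 2"
    }
}

# (?<= ... ): lookbehind, requires preceding text without consuming it;
# "a" is not preceded by whitespace, so only "b" is found
my @keys = "a=1 b=2" =~ / (?<= \s ) (\w+) = /gx;
print "lookbehind: @keys\n";                  # prints "lookbehind: b"

# (?! ... ): negative lookahead, consume characters one at a time and
# stop just before something that looks like the next " word="
my ($val) = "gecos=John Q Public shell=/bin/sh"
    =~ / = ( (?: (?! \s+ \w+ = ) . )* ) /x;
print "lookahead: $val\n";                    # prints "lookahead: John Q Public"
```

The last pattern is the same idea as the `(?<value> ... )` group in my regex: the value keeps growing until the lookahead sees what would be the start of the next key=value pair.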