in reply to Splitting a long row with multiple delimiters.

A single sample input is usually not enough to reliably design a regex (see also). <update> Please use <code> tags when posting sample input. Also, can't you get this data in a more parseable format? </update> I have made the following assumptions:

use warnings;
use strict;

my $str = q{eab12345 id=00000 pgrp=abcdefgh groups=abcdefgh home=/home/eab12345 shell=/usr/bin/ksh gecos=AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000) auditclasses=general,files,TCPIP login=true su=true rlogin=true daemon=true admin=false sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=00 registry=AD SYSTEM=AD logintimes= loginretries=5 pwdwarntime=5 account_locked=false minage=0 maxage=13 maxexpired=0 minalpha=1 minother=1 mindiff=1 maxrepeats=2 minlen=8 histexpire=13 histsize=8 pwdchecks= dictionlist=/abc/def/ghi/jkl default_roles= fsize=-1 cpu=-1 data=-1 stack=65536 core=000000 rss=65536 nofiles=2000 time_last_login=1512632113 time_last_unsuccessful_login=1505304923 tty_last_login=ssh tty_last_unsuccessful_login=ssh host_last_login=0.000.000.000 host_last_unsuccessful_login=0.000.000.000 unsuccessful_login_count=0 roles= };

my $REGEX = qr{
    (?|
        # treat beginning of string as a key only
        \A \s* (?<key> \w+ ) \s*
    |
        # otherwise, a normal key=value pair
        (?<= \s )  # key must be preceded by a space
        (?<key> \w+ ) \s* = \s*
        (?<value>
            # a value may not look like another key=value
            (?: (?! \s* \w+ = ) [^=] )*
        )
        \s*
    )
}msx;

pos($str)=undef;
while ( $str =~ /\G$REGEX/gc ) {
    print "<", $+{key}, "> = <", $+{value}//'undef', ">\n";
}
die "failed to parse at pos ".pos($str)
    unless pos($str)==length($str);
<eab12345> = <undef>
<id> = <00000>
<pgrp> = <abcdefgh>
<groups> = <abcdefgh>
<home> = </home/eab12345>
<shell> = </usr/bin/ksh>
<gecos> = <AB/C/Y0000/ABC/XYZ RTYUI, LMNOP *CONTRACTOR* (AS 00000)>
<auditclasses> = <general,files,TCPIP>
<login> = <true>
<su> = <true>
<rlogin> = <true>
<daemon> = <true>
<admin> = <false>
<sugroups> = <ALL>
<admgroups> = <>
<tpath> = <nosak>
<ttys> = <ALL>
<expires> = <0>
<auth1> = <SYSTEM>
<auth2> = <NONE>
<umask> = <00>
<registry> = <AD>
<SYSTEM> = <AD>
<logintimes> = <>
<loginretries> = <5>
<pwdwarntime> = <5>
<account_locked> = <false>
<minage> = <0>
<maxage> = <13>
<maxexpired> = <0>
<minalpha> = <1>
<minother> = <1>
<mindiff> = <1>
<maxrepeats> = <2>
<minlen> = <8>
<histexpire> = <13>
<histsize> = <8>
<pwdchecks> = <>
<dictionlist> = </abc/def/ghi/jkl>
<default_roles> = <>
<fsize> = <-1>
<cpu> = <-1>
<data> = <-1>
<stack> = <65536>
<core> = <000000>
<rss> = <65536>
<nofiles> = <2000>
<time_last_login> = <1512632113>
<time_last_unsuccessful_login> = <1505304923>
<tty_last_login> = <ssh>
<tty_last_unsuccessful_login> = <ssh>
<host_last_login> = <0.000.000.000>
<host_last_unsuccessful_login> = <0.000.000.000>
<unsuccessful_login_count> = <0>
<roles> = <>
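
To look at the value-matching part in isolation, here is a minimal sketch. The helper name first_value and the shortened sample string are made up for illustration and are not part of the code above. It matches one character at a time with [^=], but only while the negative lookahead confirms that the text ahead does not look like the start of another key= token.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: return the value that
# follows the first "=" in the string.
sub first_value {
    my ($str) = @_;
    my ($value) = $str =~ /
        =
        (                            # capture the value
            (?:
                (?! \s* \w+ = )      # stop if another "key=" starts here
                [^=]                 # otherwise take one non-"=" character
            )*
        )
    /x;
    return $value;
}

print first_value("gecos=RTYUI, LMNOP (AS 00000) login=true"), "\n";
```

Because the lookahead is checked before every single character, spaces inside a value are accepted, while the space directly in front of the next key is not.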

Re^2: Splitting a long row with multiple delimiters.
by dipit (Sexton) on Jan 19, 2018 at 15:10 UTC

    Thank you for your efforts! But getting <undef> as value:

    Also, please explain more how it's working, i am unable to understand regex!

    <eab12345> = <undef>
    <id> = <undef>
    <pgrp> = <undef>
    <groups> = <undef>
    <home> = <undef>
    <shell> = <undef>
    <gecos> = <undef>
    <auditclasses> = <undef>
    <login> = <undef>
    <su> = <undef>
    <rlogin> = <undef>
    <daemon> = <undef>
    <admin> = <undef>
    <sugroups> = <undef>
    <admgroups> = <undef>
    <tpath> = <undef>
    <ttys> = <undef>
    <expires> = <undef>
    <auth1> = <undef>
    <auth2> = <undef>
    <umask> = <undef>
    contd........................
      But getting <undef> as value

      Is your Perl 5.12 or earlier, that is, more than six years old? There apparently was a bug with (?| ) that was fixed in v5.14 (this appears to be the commit). You should consider upgrading.
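
If upgrading is not an option, one possible workaround is to avoid (?| ) and named captures altogether. The following is only a sketch under that assumption: the sub name parse_pairs and the shortened test string are hypothetical, and I have not tested it on an old Perl. It handles the leading bare key with a separate match and then loops over the key=value pairs using the numbered captures $1 and $2.

```perl
use strict;
use warnings;

# Hypothetical helper: same parsing idea, but without (?| ) or %+.
# Returns a list of [key, value] pairs; the first value is undef.
sub parse_pairs {
    my ($str) = @_;
    my @pairs;
    pos($str) = undef;

    # The string starts with a bare key that has no "=value" part.
    if ( $str =~ /\A \s* (\w+) \s* /gcx ) {
        push @pairs, [ $1, undef ];
    }

    # Each following token is key=value; the value ends where the
    # next "key=" begins.
    while ( $str =~ /\G (?<= \s ) (\w+) \s* = \s*
                     ( (?: (?! \s* \w+ = ) [^=] )* ) \s* /gcx ) {
        push @pairs, [ $1, $2 ];
    }

    my $pos = defined pos($str) ? pos($str) : 0;
    die "failed to parse at pos $pos" unless $pos == length($str);
    return @pairs;
}

for my $pair ( parse_pairs("eab12345 id=00000 admgroups= tpath=nosak") ) {
    my ( $key, $value ) = @$pair;
    print "<$key> = <", ( defined $value ? $value : 'undef' ), ">\n";
}
```

Splitting the work into two matches costs a few more lines, but each regex then has only one set of capture groups, so the branch-reset construct is not needed.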

      i am unable to understand regex!

      Are you familiar with concepts such as non-capturing groups (?: ) and other basics like \s* etc.? If not, you probably want to read perlretut first. In fact, as far as I can tell even the advanced regex features I used are explained there (with more details on each in perlre):