Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I need to get rid of some data inside quotes, and I am having problems because each string spans multiple lines, so on the next iteration the regex picks up the wrong thing. Any help would be great.

Here is the code, with some data and a sample of what the results should look like.
Thanks again!

#!/perl/bin/perl
use strict;
use warnings;
use CGI qw/:standard/;
use CGI::Carp qw(fatalsToBrowser);

print header();

while (<DATA>) {
    chomp;
    if (/CN=\w+/) {
        #$_ =~ s/^(.+?"CN=.+?),.+$/$1"/;
        $_ =~ s/^(\w+\s,\d+,"CN=.+?),.+$/$1"/;
        print "$_\n";
    }
}
__DATA__
DHCP Administrators,2,"CN=May Mary,OU=Enterprise Admins,Users,DC=INTERNETNET,DC=com
Domain Admins,Users,DC=INTERNETNETCOM,DC=com",,

DnsAdmins,2,"Enterprise Admins,Users,DC=INTERNETNET,DC=com
CN=Arth Gure,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
Domain Admins,Users,DC=INTERNETCOM,DC=com",,

AS400 Query Database,2,"CN=Joe Car,OU=Systems/Operations,OU=MIS,OU=User Accounts,DC=INTERNETNET,DC=com
CN=Ricrad Tallar,OU=Systems/Operations,OU=MIS,OU=User Accounts,DC=INTERNETNETCOM,DC=com",,

Deptpar Access,8,"CN=John Class,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
CN=Judy Lipa,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
CN=George Grey,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
Artur More,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
CN=Raimun Sirilo,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
CN=Amilcar Ove,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
Daniel Santos,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
CN=Paula Corte,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com"

Human_Resources,3,"CN=Katarine Gilly,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
CN=Chris Head,OU=Human Resources,OU=Finance & Administration,OU=User Accounts,DC=INTERNETNET,DC=com
CN=Susany Cadru,OU=Human Resources,OU=Finance & Administration,OU=User Accounts,DC=INTERNETNET,DC=com"

=comment

Here is what I am expecting to get back:

DHCP Administrators,2,"CN=May Mary"
DnsAdmins,2,"CN=Arth Gure"
AS400 Query Database,2,"CN=Joe Car,CN=Ricrad Tallar"
Deptpar Access,8,"CN=John Class,CN=Judy Lipa,CN=George Grey,CN=Raimun Sirilo,CN=Amilcar Ove,CN=Paula Corte"
Human_Resources,3,"CN=Katarine Gilly,CN=Chris Head,CN=Susany Cadru"

=cut

Re: Regular Expression Help
by Joost (Canon) on Jan 13, 2005 at 13:54 UTC
Re: Regular Expression Help
by gellyfish (Monsignor) on Jan 13, 2005 at 14:12 UTC

    If your data is exactly as you have it there, then:

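    local $/ = '';    # paragraph mode: each read returns one blank-line-separated record

    while (<DATA>) {
        chomp;
        # Split into group name, number, and the rest (the quoted member list)
        my ( $group, $number, $members ) = split /,/, $_, 3;
        # Collect every CN=... value up to the next comma
        my @members = ( $members =~ /(CN=[^,]+)/g );
        $members = '"' . join( ',', @members ) . '"';
        print "$group,$number,$members\n";
    }
    __DATA__
    ...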
    will work fine. Of course, if the data changes format in the slightest, it will probably break.

    PS: Please do not keep posting new threads when you have a follow-up to a previously posted question.

    /J\

Re: Regular Expression Help
by Random_Walk (Prior) on Jan 13, 2005 at 14:21 UTC

    Myself, I would do it like this; I think it is more readable than regexing it all in one go.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI qw/:standard/;
    use CGI::Carp qw(fatalsToBrowser);

    print header();

    local $/ = "";    # paragraph mode: one blank-line-separated record per read
    while (<DATA>) {
        s/\n/,/sg;    # join the lines of the record with commas
        s/\"//g;      # drop the double quotes
        my ($name, $number, @bits) = split /,/;
        my $string = '';
        for (@bits) {
            # keep only the fields that start with CN=
            $string .= ", \"" . $_ . "\"" if $_ =~ /^CN=/;
        }
        print "$name, $number$string\n";
    }
    __DATA__
    DHCP Administrators,2,"CN=May Mary,OU=Enterprise Admins,Users,DC=INTERNETNET,DC=com
    Domain Admins,Users,DC=INTERNETNETCOM,DC=com",,

    DnsAdmins,2,"Enterprise Admins,Users,DC=INTERNETNET,DC=com
    CN=Arth Gure,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    Domain Admins,Users,DC=INTERNETCOM,DC=com",,

    AS400 Query Database,2,"CN=Joe Car,OU=Systems/Operations,OU=MIS,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Ricrad Tallar,OU=Systems/Operations,OU=MIS,OU=User Accounts,DC=INTERNETNETCOM,DC=com",,

    Deptpar Access,8,"CN=John Class,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Judy Lipa,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=George Grey,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    Artur More,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Raimun Sirilo,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Amilcar Ove,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    Daniel Santos,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Paula Corte,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com"

    Human_Resources,3,"CN=Katarine Gilly,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Chris Head,OU=Human Resources,OU=Finance & Administration,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Susany Cadru,OU=Human Resources,OU=Finance & Administration,OU=User Accounts,DC=INTERNETNET,DC=com"

    Results (without the CGI):

    DHCP Administrators, 2, "CN=May Mary"
    DnsAdmins, 2, "CN=Arth Gure"
    AS400 Query Database, 2, "CN=Joe Car", "CN=Ricrad Tallar"
    Deptpar Access, 8, "CN=John Class", "CN=Judy Lipa", "CN=George Grey", "CN=Raimun Sirilo", "CN=Amilcar Ove", "CN=Paula Corte"
    Human_Resources, 3, "CN=Katarine Gilly", "CN=Chris Head", "CN=Susany Cadru"

    Cheers,
    R.

Re: Regular Expression Help
by holli (Abbot) on Jan 13, 2005 at 14:23 UTC
    That is easy.
    $/ = "";
    while (<DATA>) {
        /^([^,]+,[0-9]+)/;
        $a = $1;
        @a = /(CN=[^,"]+)/msg;
        print qq($a,"), join(", ", @a), qq("\n\n);
    }
    __DATA__
    DHCP Administrators,2,"CN=May Mary,OU=Enterprise Admins,Users,DC=INTERNETNET,DC=com
    Domain Admins,Users,DC=INTERNETNETCOM,DC=com",,

    DnsAdmins,2,"Enterprise Admins,Users,DC=INTERNETNET,DC=com
    CN=Arth Gure,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    Domain Admins,Users,DC=INTERNETCOM,DC=com",,

    AS400 Query Database,2,"CN=Joe Car,OU=Systems/Operations,OU=MIS,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Ricrad Tallar,OU=Systems/Operations,OU=MIS,OU=User Accounts,DC=INTERNETNETCOM,DC=com",,

    Deptpar Access,8,"CN=John Class,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Judy Lipa,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=George Grey,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    Artur More,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Raimun Sirilo,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Amilcar Ove,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com
    Daniel Santos,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Paula Corte,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com"

    Human_Resources,3,"CN=Katarine Gilly,OU=Executive,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Chris Head,OU=Human Resources,OU=Finance & Administration,OU=User Accounts,DC=INTERNETNET,DC=com
    CN=Susany Cadru,OU=Human Resources,OU=Finance & Administration,OU=User Accounts,DC=INTERNETNET,DC=com"
    Output:
    DHCP Administrators,2,"CN=May Mary"

    DnsAdmins,2,"CN=Arth Gure"

    AS400 Query Database,2,"CN=Joe Car, CN=Ricrad Tallar"

    Deptpar Access,8,"CN=John Class, CN=Judy Lipa, CN=George Grey, CN=Raimun Sirilo, CN=Amilcar Ove, CN=Paula Corte"

    Human_Resources,3,"CN=Katarine Gilly, CN=Chris Head, CN=Susany Cadru"
    Hint:
    if one regex gets too complicated, split it up.
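
    As a rough illustration of that hint, the two regexes can live in one small function that handles a single record. This is only a sketch: the name summarise_record is hypothetical, and it assumes each record starts with "group,number" and that a CN value never contains a comma, a double quote or a newline.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: turn one multi-line record into 'group,number,"CN=...,CN=..."'.
    sub summarise_record {
        my ($record) = @_;

        # Step 1: grab the leading "group,number" pair; give up if it is missing.
        my ($head) = $record =~ /^([^,]+,\d+)/
            or return;

        # Step 2: collect every CN=... value, stopping at a comma, quote or newline.
        my @cns = $record =~ /(CN=[^,"\n]+)/g;

        return sprintf '%s,"%s"', $head, join( ',', @cns );
    }

    # Sample record taken from the data in this thread.
    my $record = qq{DnsAdmins,2,"Enterprise Admins,Users,DC=INTERNETNET,DC=com\n}
               . qq{CN=Arth Gure,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com"};
    print summarise_record($record), "\n";
    ```

    Each regex does one job, so when the format changes only one of them needs touching.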
      Actually, it was my fault: I forgot to say that sometimes the data will have lines that don't match the ones I have here, and then the code breaks, unfortunately, just because it is looking for a blank line separating each block.
        How do these lines look? Would you please provide an example?
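
        Until an example turns up, here is one possible sketch that does not depend on a blank line between blocks. It assumes (and this is only a guess at the real data) that every record has exactly one opening and one closing double quote, so a record is complete once the quotes balance. The name split_records is hypothetical.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: group lines into records by balancing double quotes,
        # so no blank line between records is needed.
        sub split_records {
            my (@lines) = @_;
            my ( @records, $buffer );
            for my $line (@lines) {
                # Ignore blank lines that fall between records.
                next if !defined($buffer) && $line !~ /\S/;
                $buffer = defined($buffer) ? "$buffer\n$line" : $line;
                my $quotes = () = $buffer =~ /"/g;    # count the quotes seen so far
                if ( $quotes % 2 == 0 ) {             # balanced: the record is complete
                    push @records, $buffer;
                    undef $buffer;
                }
            }
            push @records, $buffer if defined $buffer;    # unterminated last record
            return @records;
        }

        # Two records from this thread, with no blank line between them.
        my @lines = (
            'DHCP Administrators,2,"CN=May Mary,OU=Enterprise Admins,Users,DC=INTERNETNET,DC=com',
            'Domain Admins,Users,DC=INTERNETNETCOM,DC=com",,',
            'DnsAdmins,2,"Enterprise Admins,Users,DC=INTERNETNET,DC=com',
            'CN=Arth Gure,OU=Marketing,OU=User Accounts,DC=INTERNETNET,DC=com",,',
        );

        for my $record ( split_records(@lines) ) {
            # Skip anything that does not start with "group,number".
            my ($head) = $record =~ /^([^,]+,\d+)/ or next;
            my @cns = $record =~ /(CN=[^,"\n]+)/g;
            print qq($head,"), join( ',', @cns ), qq("\n);
        }
        ```

        Lines without any quotes come back as one-line records, and the loop skips them unless they start with "group,number".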