in reply to inline replacement in existing regex

It sounds like you're asking how to substitute newlines in the middle of that substitution. The reason you're asking is that

it doesn't work if $3 contains multiples lines
I think this premise is wrong. What do you mean that it "doesn't work"? I just tried a small test case (assuming that the encode_base64 in your code is the one from MIME::Base64) and it works just fine.

Re^2: inline replacement in existing regex
by raiten (Acolyte) on Dec 01, 2004 at 06:04 UTC
    If I use
    $content =~ s/(\w+)(: *)(.*)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;
    it works, but it matches the largest text segment, not the smallest one, which is what I want.
    To answer tall_man: I'm trying to clean up the LDIF output from Mozilla to get standard LDAP data that I can load into a directory.
    So I get a file with multiple entries like
    cn: dn=toto
    sn: XXX
    description: YYY

    Some of the fields are base64 encoded. I have to decode them, add a proper suffix to cn, remove some fields, reformat others, and so on.
    Some of these base64 fields contain multiple lines, and at the end of the conversion (the base64 re-encoding step) my file has:
    cn: dn=toto
    sn: XXX
    description: YYY
    yyyy
    <xxxxyyxxxx>

    cn: dn=toto2
    sn: XXX2
    description: YYY2

    cn: dn=toto3
    sn: XXX3
    description: YYY3
    tttt
    uuuuuu
    <xxxxyyxxxx>

    $content =~ s/(\w+)(: *)(.*)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;

    => encodes the largest possible match.
    $content =~ s/(\w+)(: *)([^\n]*)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;

    => doesn't match fields that span more than one line. I hope this is clearer.

    thanks

      It sounds like you just want to modify the greediness of *:

      $content =~ s/(\w+)(: *)(.*?)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;
      The ? following the * makes it non-greedy, so the match ends at the earliest possible point in the string rather than the latest.
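      To make the difference concrete, here is a minimal sketch on a made-up string. The <sep> marker and the sample data are placeholders for illustration, not taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical two-field sample; <sep> stands in for the real separator line.
my $text = "a: 1\n<sep>\nb: 2\n<sep>\n";

# Greedy: with /s, .* runs on to the LAST separator in the string.
my ($greedy) = $text =~ /(\w+: *.*)\n<sep>\n/s;

# Non-greedy: .*? stops at the FIRST separator after the match start.
my ($lazy) = $text =~ /(\w+: *.*?)\n<sep>\n/s;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

      Note that both matches start at the same place (the first "a:"); the ? only changes where the match ends.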
        Thanks, but it does not help: the match is still a bit greedy.

        I use the following code:
        #!/usr/bin/perl -w
        # File: UTFtobase64.pl

        use strict;
        use warnings;
        use MIME::Base64;

        my $ldif_input_fn = 'personal-utf-final.ldif';
        my $ldif_output_fn = 'personal-final.ldif';

        open (INFILE, $ldif_input_fn) or die "Error opening ".$ldif_input_fn.".\n";
        binmode INFILE;

        my $content = '';
        &read_file(\$ldif_input_fn, \$content);

        ## FIXME: what if the contents is on more than 1 lines
        # ? previous conversion '\n' -> '%%EOL%%' -> '\n'
        #$content =~ s/(\w+)(: *)([^\n]*)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;

        ## take the smaller string ...
        #$content =~ s/(\w+)(: *)(.*?)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;

        ## take the bigger string ...
        $content =~ s/(\w+)(: *)(.*)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;

        &write_file(\('>'.$ldif_output_fn), \$content);

        exit(0);

        # reads a file
        sub read_file {
            my ($fn_ptr, $text_ptr) = @_;
            open (INFILE, $$fn_ptr) or die "Error opening ".$$fn_ptr.".\n";
            binmode INFILE;
            ## read the whole text and convert windows linebreaks(\r\n) to unix
            ## linebreaks(\n)
            while (my $line = <INFILE>) {
                $line =~ s!\r\n!\n!gs;
                $$text_ptr .= $line;
            }
            close INFILE;
            return;
        }

        # writes a file
        sub write_file {
            my ($fn_ptr, $text_ptr) = @_;
            open (OUTFILE, $$fn_ptr) or die "Error opening ".$$fn_ptr.".\n";
            binmode OUTFILE;
            print OUTFILE $$text_ptr;
            close OUTFILE;
            return;
        }

        To recap, this is one part of converting the LDIF exported from Mozilla contacts into a real LDAP-compliant file. My test file:
        dn: cn=toto1,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        sn: b
        cn: a b

        dn: cn=address_1,cn=toto1,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        objectclass: inetOrgPerson
        givenName: a
        sn: b
        cn: a b
        cn: address_1
        mail: toto1@a.com
        o: o

        dn: cn=toto2,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        sn: b2
        cn: a2 b2

        dn: cn=address_1,cn=toto2,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        objectclass: inetOrgPerson
        givenName: a2
        sn: b2
        cn: a2 b2
        cn: address_1
        mail: toto2@a.com
        o: o

        dn: cn=toto3,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        sn: b3
        cn: a3 b3

        dn: cn=address_1,cn=toto3,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        objectclass: inetOrgPerson
        givenName: a3
        sn: b3
        cn: a3 b3
        cn: address_1
        mail: a3 b3
        title: Responsable Commercial
        o: OO
        description: 09/09/2004
        reunion pabx/call center
        <xxxxyyxxxx>

        dn: cn=toto4,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        sn: b4
        cn: a4 b4

        dn: cn=address_1,cn=toto4,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        objectclass: inetOrgPerson
        givenName: a4
        sn: b4
        cn: a4 b4
        cn: address_1
        mail: toto4@a.com
        description: 09/09/2004
        reunion pabx/call center
        <xxxxyyxxxx>

        dn: cn=toto5,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        sn: b5
        cn: a5 b5

        dn: cn=address_1,cn=toto5,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        objectclass: inetOrgPerson
        givenName: a5
        sn: b5
        cn: a5 b5
        cn: address_1
        mail: toto5@a.com
        o: o

        The problem is with descriptions of 3 lines (2 real lines + the separator).
        * With
        $content =~ s/(\w+)(: *)([^\n]*)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;
        these lines are not base64 re-encoded.
        * With
        $content =~ s/(\w+)(: *)(.*)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;
        &write_file(\('>'.$ldif_output_fn), \$content);
        nearly the whole file is.
        * With
        $content =~ s/(\w+)(: *)(.*?)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/ges;
        &write_file(\('>'.$ldif_output_fn), \$content);
        I get
        dn:: Y249dG90bzEsb3U9Rm91cm5pc3NldXJzLCBvdT1QZW9wbGUsZGM9YXJsaXMsZGM9bG9jYWwyCm9iamVjdGNsYXNzOiB0b3AKb2JqZWN0Y2xhc3M6IHBlcnNvbgpvYmplY3RjbGFzczogb3JnYW5pemF0aW9uYWxQZXJzb24Kc246IGIKY246IGEgYgoKZG46IGNuPWFkZHJlc3NfMSxjbj10b3RvMSxvdT1Gb3Vybmlzc2V1cnMsIG91PVBlb3BsZSxkYz1hcmxpcyxkYz1sb2NhbDIKb2JqZWN0Y2xhc3M6IHRvcApvYmplY3RjbGFzczogcGVyc29uCm9iamVjdGNsYXNzOiBvcmdhbml6YXRpb25hbFBlcnNvbgpvYmplY3RjbGFzczogaW5ldE9yZ1BlcnNvbgpnaXZlbk5hbWU6IGEKc246IGIKY246IGEgYgpjbjogYWRkcmVzc18xCm1haWw6IHRvdG8xQGEuY29tCm86IG8KCmRuOiBjbj10b3RvMixvdT1Gb3Vybmlzc2V1cnMsIG91PVBlb3BsZSxkYz1hcmxpcyxkYz1sb2NhbDIKb2JqZWN0Y2xhc3M6IHRvcApvYmplY3RjbGFzczogcGVyc29uCm9iamVjdGNsYXNzOiBvcmdhbml6YXRpb25hbFBlcnNvbgpzbjogYjIKY246IGEyIGIyCgpkbjogY249YWRkcmVzc18xLGNuPXRvdG8yLG91PUZvdXJuaXNzZXVycywgb3U9UGVvcGxlLGRjPWFybGlzLGRjPWxvY2FsMgpvYmplY3RjbGFzczogdG9wCm9iamVjdGNsYXNzOiBwZXJzb24Kb2JqZWN0Y2xhc3M6IG9yZ2FuaXphdGlvbmFsUGVyc29uCm9iamVjdGNsYXNzOiBpbmV0T3JnUGVyc29uCmdpdmVuTmFtZTogYTIKc246IGIyCmNuOiBhMiBiMgpjbjogYWRkcmVzc18xCm1haWw6IHRvdG8yQGEuY29tCm86IG8KCmRuOiBjbj10b3RvMyxvdT1Gb3Vybmlzc2V1cnMsIG91PVBlb3BsZSxkYz1hcmxpcyxkYz1sb2NhbDIKb2JqZWN0Y2xhc3M6IHRvcApvYmplY3RjbGFzczogcGVyc29uCm9iamVjdGNsYXNzOiBvcmdhbml6YXRpb25hbFBlcnNvbgpzbjogYjMKY246IGEzIGIzCgpkbjogY249YWRkcmVzc18xLGNuPXRvdG8zLG91PUZvdXJuaXNzZXVycywgb3U9UGVvcGxlLGRjPWFybGlzLGRjPWxvY2FsMgpvYmplY3RjbGFzczogdG9wCm9iamVjdGNsYXNzOiBwZXJzb24Kb2JqZWN0Y2xhc3M6IG9yZ2FuaXphdGlvbmFsUGVyc29uCm9iamVjdGNsYXNzOiBpbmV0T3JnUGVyc29uCmdpdmVuTmFtZTogYTMKc246IGIzCmNuOiBhMyBiMwpjbjogYWRkcmVzc18xCm1haWw6IGEzIGIzCnRpdGxlOiBSZXNwb25zYWJsZSBDb21tZXJjaWFsCm86IE9PCmRlc2NyaXB0aW9uOiAwOS8wOS8yMDA0CnJldW5pb24gcGFieC9jYWxsIGNlbnRlcg==

        dn:: Y249dG90bzQsb3U9Rm91cm5pc3NldXJzLCBvdT1QZW9wbGUsZGM9YXJsaXMsZGM9bG9jYWwyCm9iamVjdGNsYXNzOiB0b3AKb2JqZWN0Y2xhc3M6IHBlcnNvbgpvYmplY3RjbGFzczogb3JnYW5pemF0aW9uYWxQZXJzb24Kc246IGI0CmNuOiBhNCBiNAoKZG46IGNuPWFkZHJlc3NfMSxjbj10b3RvNCxvdT1Gb3Vybmlzc2V1cnMsIG91PVBlb3BsZSxkYz1hcmxpcyxkYz1sb2NhbDIKb2JqZWN0Y2xhc3M6IHRvcApvYmplY3RjbGFzczogcGVyc29uCm9iamVjdGNsYXNzOiBvcmdhbml6YXRpb25hbFBlcnNvbgpvYmplY3RjbGFzczogaW5ldE9yZ1BlcnNvbgpnaXZlbk5hbWU6IGE0CnNuOiBiNApjbjogYTQgYjQKY246IGFkZHJlc3NfMQptYWlsOiB0b3RvNEBhLmNvbQpkZXNjcmlwdGlvbjogMDkvMDkvMjAwNApyZXVuaW9uIHBhYngvY2FsbCBjZW50ZXI=

        dn: cn=toto5,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        sn: b5
        cn: a5 b5

        dn: cn=address_1,cn=toto5,ou=Fournisseurs, ou=People,dc=test,dc=local
        objectclass: top
        objectclass: person
        objectclass: organizationalPerson
        objectclass: inetOrgPerson
        givenName: a5
        sn: b5
        cn: a5 b5
        cn: address_1
        mail: toto5@a.com
        o: o
        So the result is split into more parts, but it is still not quite what I want.
        thanks
        regards
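
        A possible explanation, offered as a sketch and not tested against the real file: the regex engine takes the leftmost position where a match can start, and .*? only controls where the match ends. With /s, the first "dn" in the file is already a valid start, so everything from there up to the first separator gets encoded. One way around this might be to anchor the attribute name to the start of a line with ^ and /m, and to allow continuation lines only when they do not look like a new "attr:" line. The sample data below is a made-up miniature of the test file, and the lookahead (?!\w+:) is an assumption about what a continuation line looks like; it would wrongly reject a continuation line that itself begins with a word followed by a colon.

```perl
use strict;
use warnings;
use MIME::Base64;

# Hypothetical miniature of the test file; <xxxxyyxxxx> ends a value to encode.
my $content = join "\n",
    'dn: cn=toto3,dc=test,dc=local',
    'sn: b3',
    'description: 09/09/2004',
    'reunion pabx/call center',
    '<xxxxyyxxxx>',
    '',
    'dn: cn=toto5,dc=test,dc=local',
    'sn: b5',
    '';

# ^ with /m pins the attribute name to the start of a line.
# [^\n]* takes the rest of that line; each extra line is accepted only
# if it does not start with "word:" (assumed to mean a new attribute).
$content =~ s/^(\w+)(: *)([^\n]*(?:\n(?!\w+:)[^\n]*)*?)\n<xxxxyyxxxx>\n/$1.":".$2.encode_base64($3,'')."\n"/gem;

print $content;
```

        In this sketch only the description value (both of its lines) is captured in $3, so the dn and sn lines before it are left untouched.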