Perl300 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am trying to create XML from a text file using XML::Writer, which I found by searching the PerlMonks archives. The only problem I have now is changing the regex used in the example so that it works exactly the way I want. I was able to adapt it into something useful, but it still needs a few tweaks.

Test.txt is the file I am trying to convert into XML. Its contents are:

GI-eSTB-MIB-NPH::eSTBUserSettingOutput.0 = INTEGER: hd1920x1080i(1)
GI-eSTB-MIB-NPH::eSTBUserSetting43OverRide.0 = INTEGER: on480p(3)

The code I am using is:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Writer;

my $out;
my $xml = XML::Writer->new(OUTPUT => \$out, DATA_MODE => 1, DATA_INDENT => ' ');
$xml->xmlDecl();
$xml->startTag('doc');

my $check_1 = 0;

open(my $fh, "<", "/home/drone/DrumTesting/Test.txt")
    or die "Failed to open file: $!\n";

while (<$fh>) {
    chomp;
    next if !length;

    my ($string1, $string2, $subscript_name, $subscript_value) =
        /
          ^([^\::]+)
          ([^\s]+)
          \.([^\s]+)
          \s(.*)
        /x;

    if ( $check_1 == 0 ) {
        $xml->startTag($string1);
        $check_1 += 1;
    }

    $xml->startTag($string2);
    $xml->dataElement($subscript_name => $subscript_value);
    $xml->endTag();
}

$xml->endTag();
$xml->endTag();
$xml->end();

print $out;
close $fh;

This generates the following XML:

<?xml version="1.0"?>
<doc>
 <GI-eSTB-MIB-NPH>
  <::eSTBUserSettingOutput>
   <0>= INTEGER: hd1920x1080i(1)</0>
  </::eSTBUserSettingOutput>
  <::eSTBUserSetting43OverRide>
   <0>= INTEGER: on480p(3)</0>
  </::eSTBUserSetting43OverRide>
 </GI-eSTB-MIB-NPH>
</doc>

What I am trying to do now is get rid of the leading "::" and "= " in the output. Can you please suggest what regex changes I need so that $string2 and $subscript_value start from the position I want, instead of where the preceding capture stopped matching? I think it should be something like:

^([^\::]+) (/::(\[a-zA-Z]+)\./) \.([^\s]+) \s(.*)

instead of what I have used in the code, but it gives an error when I compile it: Unmatched ( in regex; marked by <-- HERE in m/ ^(^\::+) ( <-- HERE / at <script_name>.pl line 24.

Also, please suggest a good place where I can learn and practice regex extensively.
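
The "Unmatched (" error quoted above is consistent with how Perl reads a pattern written between slashes: an unescaped "/" inside the pattern ends it, so the regex stops right after the opening parenthesis. The backslash in "\[a-zA-Z]" would also turn the bracket into a literal character. The following is only a minimal sketch of one way to keep "::" and "= " out of the captures, by matching them outside the parentheses. The helper name parse_line is hypothetical, and the pattern assumes that object names contain no dots, which the two sample lines cannot confirm.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split one line of
# Test.txt into MIB name, object name, instance index and value.
sub parse_line {
    my ($line) = @_;
    # m{...}x uses braces as delimiters, so a literal slash would not end
    # the pattern. The separators sit outside the capture groups, so they
    # are consumed by the match but never captured.
    return $line =~ m{
        ^ ([^:]+)     # MIB name: everything before the first colon
        ::            # separator, matched but not captured
        ([^\s.]+)     # object name: assumed to contain no dots
        \. (\S+)      # instance index after the dot
        \s* = \s*     # equals sign with optional spaces, not captured
        (.*)          # the rest of the line is the value
    }x;
}

my $line = 'GI-eSTB-MIB-NPH::eSTBUserSettingOutput.0 = INTEGER: hd1920x1080i(1)';
my ($mib, $object, $index, $value) = parse_line($line);
print join('|', $mib, $object, $index, $value), "\n";
# prints: GI-eSTB-MIB-NPH|eSTBUserSettingOutput|0|INTEGER: hd1920x1080i(1)
```

In list context a failed match returns an empty list, so the four variables would be undefined for a line that does not fit the pattern. Checking for that before building tags is probably worthwhile.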

Replies are listed 'Best First'.
Re: What regex changes are needed while creating xml
by toolic (Bishop) on Jun 25, 2015 at 18:12 UTC
    I got rid of $string1 since you don't seem to use it, and I changed the 1st line of the regex:
    use warnings;
    use strict;

    while (<DATA>) {
        chomp;
        my ($string2, $subscript_name, $subscript_value) =
            /
              ^[^:]+::
              ([^\s]+)
              \.([^\s]+)
              \s(.*)
            /x;
        print "$string2\n";
    }

    __DATA__
    GI-eSTB-MIB-NPH::eSTBUserSettingOutput.0 = INTEGER: hd1920x1080i(1)
    GI-eSTB-MIB-NPH::eSTBUserSetting43OverRide.0 = INTEGER: on480p(3)

    Outputs:

    eSTBUserSettingOutput
    eSTBUserSetting43OverRide
Re: What regex changes are needed while creating xml
by stevieb (Canon) on Jun 25, 2015 at 18:24 UTC

    Does this look right?

    <?xml version="1.0"?>
    <doc>
     <GI-eSTB-MIB-NPH>
      <eSTBUserSettingOutput>
       <0>INTEGER: hd1920x1080i(1)</0>
      </eSTBUserSettingOutput>
      <eSTBUserSetting43OverRide>
       <0>INTEGER: on480p(3)</0>
      </eSTBUserSetting43OverRide>
     </GI-eSTB-MIB-NPH>
    </doc>

    If so:

    / ^(.*?):: ([^\s]+) \.([^\s]+)\s+= \s(.*) /x;

    Two lines of input data aren't really enough to craft a robust regex, so if the above code works, you'll just have to test it against a larger dataset.

    -stevieb
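
One way to test the pattern above against a larger dataset is to collect the lines it fails to match instead of letting them pass silently. This is a rough sketch only: the helper name split_snmp_line is hypothetical, and the third sample line is made up purely to show what a non-matching line looks like.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern suggested above.
# Returns the four captured fields, or an empty list when nothing matches.
sub split_snmp_line {
    my ($line) = @_;
    return $line =~ / ^(.*?):: ([^\s]+) \.([^\s]+)\s+= \s(.*) /x;
}

my @samples = (
    'GI-eSTB-MIB-NPH::eSTBUserSettingOutput.0 = INTEGER: hd1920x1080i(1)',
    'GI-eSTB-MIB-NPH::eSTBUserSetting43OverRide.0 = INTEGER: on480p(3)',
    'a made-up line with no separators',    # illustrative only
);

my @unmatched;
for my $line (@samples) {
    my @fields = split_snmp_line($line);
    if (@fields) {
        print join(' | ', @fields), "\n";
    }
    else {
        push @unmatched, $line;    # keep for later inspection
    }
}

# Report every line the pattern could not handle.
warn "No match: $_\n" for @unmatched;
```

Running the same loop over the real file, rather than the hard-coded @samples, should show quickly which kinds of lines the pattern still needs to handle.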

      Hi stevieb, that works, and thank you for your help. Yes, I agree that just two lines are not enough for thorough testing, so I tried running it on the entire file, which is generated dynamically, but got the following error:
      Code point \u0016 is not a valid character in XML at ./Call_to_snmpwalk.pl line 32.

      Line 32 is: $xml->dataElement($subscript_name => $subscript_value);

      I am still looking into it. It seems to be a data-related issue: judging by the error message, some of the generated data contains a character that is not valid in XML.

      @toolic: Thank you for your help, but I was using $string1 in the code at:

      $xml->startTag($string1);
        I searched for the error "Code point \u0016 is not a valid character in XML at ./Call_to_snmpwalk.pl line 32". It seems to be caused by control characters in the text, which are not allowed in XML. So I have two options:

        1) Remove these control characters from the file and then print. I have tried this using:

        perl -pe's/\x08//g' <file1.xml >file1.xml

        But this gives the error: Bad name after g' at <script_name>.pl line 13.

        2) Generate an actual .xml file from the code, write the converted text into it, and then read it back. Does anyone have any suggestions on point 1 or 2?
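
On point 1, the "Bad name after g'" message reports a script name and line number, which suggests the one-liner was placed inside a Perl script rather than run from the shell. When it is run from the shell, redirecting output to the same file that is being read (<file1.xml >file1.xml) normally empties the file before perl reads it; perl's -i switch edits a file in place instead. The one-liner also removes only \x08, while the error message names code point 0016. A sketch of an alternative, under the assumption that dropping the offending characters is acceptable, is to clean each value inside the script before it reaches dataElement. The helper name strip_invalid_xml_chars and the sample value are made up for illustration; the character class follows the Char production of XML 1.0.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every character that falls outside the
# XML 1.0 "Char" production. Tab, line feed and carriage return are kept;
# all other control characters below 0x20 are dropped.
sub strip_invalid_xml_chars {
    my ($text) = @_;
    $text =~ s/[^\x09\x0A\x0D\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//g;
    return $text;
}

# Made-up value containing two control characters (0x16 and 0x08).
my $raw   = "STRING: abc\x16def\x08";
my $clean = strip_invalid_xml_chars($raw);
print "$clean\n";    # prints: STRING: abcdef
```

In the script from the original post, the cleaned value could then be passed along as $xml->dataElement($subscript_name => strip_invalid_xml_chars($subscript_value)). Whether silently dropping those characters is acceptable depends on what the data means, so replacing them with a visible placeholder may be the safer choice.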