
You mentioned that the system is isolated, but you didn't say that the only way to get your Perl program onto it is to type it in manually. We have several isolated systems where I work, and we can get files onto them by burning the files to a DVD and giving that to IT. They scan the DVD, then load the files onto whichever isolated system we specify. Getting files off is easier: get a blank DVD from the IT office, burn the files onto it, then bring the DVD to our own PCs.

(BTW, XML::Parser::Lite is not as big as it looks, and it only uses core Perl modules that will almost certainly be on your isolated system. It uses an enhanced version of ShallowParse. The rest of the code is "OO packaging" and support for dynamically configuring callbacks. That code can be stripped away and your callbacks given the default names.)

ShallowParse is not hard to use. It returns a list of the tags and of the text within and between elements, which makes both easy to get at. And, since you are working with the returned list, you can apply regular expressions to its items. However, it doesn't break out the attributes of elements. You have to "post process" the start tags to get the attributes if the input XML has them (and you need them).
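To make the shape of the returned list concrete, here is a minimal sketch. Note that simple_shallow_parse is a hypothetical, much simplified stand-in that I made up for illustration, not the REX grammar: it treats anything from '<' to the next '>' as a tag and any run without '<' as text, so it will mis-split comments or CDATA sections that contain '>'.

```perl
use strict;
use warnings;

# Simplified stand-in for ShallowParse (NOT the REX grammar): a tag is
# anything from '<' to the next '>', and text is any run without '<'.
sub simple_shallow_parse {
    my ($xml) = @_;
    # In list context, m//g without capture groups returns every match.
    return $xml =~ /<[^>]*>|[^<]+/g;
}

my @tokens = simple_shallow_parse('<a><b>hi</b> there</a>');
print "[$_]\n" for @tokens;
# prints, one per line: [<a>] [<b>] [hi] [</b>] [ there] [</a>]
```

Each tag and each run of text is one list item, in document order, which is why plain string comparisons and regexes on the items are enough for simple documents.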

For my example of how to use ShallowParse, I am assuming all the elements you are interested in are simple containers with no attributes.

#!perl

# REX/Perl 1.0 
# Robert D. Cameron "REX: XML Shallow Parsing with Regular Expressions",
# Technical Report TR 1998-17, School of Computing Science, Simon Fraser 
# University, November, 1998.
# Copyright (c) 1998, Robert D. Cameron. 
# The following code may be freely used and distributed provided that
# this copyright and citation notice remains intact and that modifications
# or additions are clearly identified.

$TextSE = "[^<]+";
$UntilHyphen = "[^-]*-";
$Until2Hyphens = "$UntilHyphen(?:[^-]$UntilHyphen)*-";
$CommentCE = "$Until2Hyphens>?";
$UntilRSBs = "[^\\]]*](?:[^\\]]+])*]+";
$CDATA_CE = "$UntilRSBs(?:[^\\]>]$UntilRSBs)*>";
$S = "[ \\n\\t\\r]+";
$NameStrt = "[A-Za-z_:]|[^\\x00-\\x7F]";
$NameChar = "[A-Za-z0-9_:.-]|[^\\x00-\\x7F]";
$Name = "(?:$NameStrt)(?:$NameChar)*";
$QuoteSE = "\"[^\"]*\"|'[^']*'";
$DT_IdentSE = "$S$Name(?:$S(?:$Name|$QuoteSE))*";
$MarkupDeclCE = "(?:[^\\]\"'><]+|$QuoteSE)*>";
$S1 = "[\\n\\r\\t ]";
$UntilQMs = "[^?]*\\?+";
$PI_Tail = "\\?>|$S1$UntilQMs(?:[^>?]$UntilQMs)*>";
$DT_ItemSE = "<(?:!(?:--$Until2Hyphens>|[^-]$MarkupDeclCE)|\\?$Name(?:$PI_Tail))|%$Name;|$S";
$DocTypeCE = "$DT_IdentSE(?:$S)?(?:\\[(?:$DT_ItemSE)*](?:$S)?)?>?";
$DeclCE = "--(?:$CommentCE)?|\\[CDATA\\[(?:$CDATA_CE)?|DOCTYPE(?:$DocTypeCE)?";
$PI_CE = "$Name(?:$PI_Tail)?";
$EndTagCE = "$Name(?:$S)?>?";
$AttValSE = "\"[^<\"]*\"|'[^<']*'";
$ElemTagCE = "$Name(?:$S$Name(?:$S)?=(?:$S)?(?:$AttValSE))*(?:$S)?/?>?";
$MarkupSPE = "<(?:!(?:$DeclCE)?|\\?(?:$PI_CE)?|/(?:$EndTagCE)?|(?:$ElemTagCE)?)";
$XML_SPE = "$TextSE|$MarkupSPE";


sub ShallowParse { 
  my($XML_document) = @_;
  return $XML_document =~ /$XML_SPE/g;
}

my @els = ShallowParse(<<_EOD_);
<scan>
<RemoteHost>
<hostname>example.com</hostname>
<synopsis>I scanned this host
and didn't find anything interesting.
</synopsis>
</RemoteHost>
</scan>
_EOD_

my $n;
for (@els)
{
    if (($_ eq '<RemoteHost>') .. ($_ eq '</RemoteHost>'))
    {
        if ($n = (($_ eq '<hostname>') .. ($_ eq '</hostname>')))
        {
            next if (($n < 2) || (rindex($n,'E0') > 0)); # skip the tags
            print "Host: $_\n";
        }
        elsif ($n = (($_ eq '<synopsis>') .. ($_ eq '</synopsis>')))
        {
            next if (($n < 2) || (rindex($n,'E0') > 0)); # skip the tags
            print "Synopsis:\n$_\n";
        }
    }
}

About using the range operator: in scalar context, .. acts as a flip-flop. Its value is false (the empty string) until the first condition matches, at which point it becomes 1 to indicate that the condition just became true. Then, each time the range is tested, the value increases by 1. When the second condition matches, the value is still numerically incremented, but the "stringification" of the value ends with 'E0' to indicate that the second condition just became true. In my example, I use this behavior to skip the start and end tags of each element with less complex logic in the code.
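A small trace shows the values the range operator takes as it walks a list. This is only a sketch: flip_flop_trace and the '<a>' ... '</a>' markers are names I made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: record the value of the scalar-context range
# (flip-flop) operator for each item of a list.
sub flip_flop_trace {
    my @items = @_;
    my @trace;
    for (@items) {
        # Scalar assignment forces scalar context, so .. is a flip-flop here.
        my $n = (($_ eq '<a>') .. ($_ eq '</a>'));
        push @trace, $n eq '' ? '(false)' : $n;
    }
    return @trace;
}

print join(' ', flip_flop_trace('x', '<a>', 'text', 'more', '</a>', 'y')), "\n";
# prints: (false) 1 2 3 4E0 (false)
```

The start tag is always sequence number 1 and the end tag is the only value carrying 'E0', so testing $n < 2 and rindex($n,'E0') > 0 is enough to separate the tags from the content between them.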

If any of your elements contained attributes, then the element processing would change:

if ($n = ((/^<foo/) .. ($_ eq '</foo>')))
{
    if ($n == 1)
    {
        ...; # get the attribute(s) and value(s)
    }
    elsif (rindex($n,'E0') < 1)
    {
        ...; # process the content
    }
}

Or, if the element was not a container:

if (/^<bar/ && /\/>$/)
{
    ...; # get the attribute(s) and value(s)
}
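For the "get the attribute(s) and value(s)" step in either case above, one more regex over the tag string is usually enough. The following is a minimal sketch, not part of REX: tag_attributes is a hypothetical helper, it only recognizes ASCII attribute names, and it does not decode entities such as &amp;amp; in the values.

```perl
use strict;
use warnings;

# Hypothetical helper: given one start tag (or empty-element tag) as
# returned by ShallowParse, return a hash of attribute name => value.
sub tag_attributes {
    my ($tag) = @_;
    my %attr;
    # name = "value" or name = 'value'; whitespace around '=' is allowed.
    # Assumption: ASCII names only, and values are returned undecoded.
    while ($tag =~ /([A-Za-z_:][A-Za-z0-9_:.-]*)\s*=\s*(?:"([^<"]*)"|'([^<']*)')/g) {
        $attr{$1} = defined $2 ? $2 : $3;
    }
    return %attr;
}

my %attrs = tag_attributes(q{<bar id="7" kind='x' />});
print "$_=$attrs{$_}\n" for sort keys %attrs;
# prints:
# id=7
# kind=x
```

Inside the loop over the ShallowParse list you would call it as tag_attributes($_) at the point where $n == 1 (or where the /^<bar/ test matches).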

In reply to Re^3: No tools? Use Perl?! by RonW
in thread No tools? Use Perl?! by Boyd.Ako
