Re: No tools? Use Perl?!
by Athanasius (Archbishop) on Jul 28, 2016 at 02:47 UTC
Hello Boyd.Ako, and welcome to the Monastery!
It seems you’re missing a forward slash in the terminating regex:
print if (/^\<ReportHost/../^\<\/ReportHost>/);
#                               ^
BTW, you don’t need to backslash the < character. Also, the regex will be easier to read if you change the delimiters, so that you don’t have to backslash the forward slash character:
print if m{^<ReportHost>} .. m{^</ReportHost>};
Note that the ^ metacharacter in a regex matches the beginning of a line. Is this what you want? It would be safer to match any occurrences of <ReportHost> and </ReportHost> wherever they occur within the XML document (i.e., leave out the ^ characters).
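To make the suggestion concrete, here is a minimal sketch of the flip-flop with brace delimiters and without the ^ anchors. The sub name report_host_lines and the sample data are made up for illustration; I have also assumed the opening tag may carry attributes, so the left-hand pattern stops at a word boundary instead of requiring the closing >.

```perl
use strict;
use warnings;

# Return the lines lying between an opening <ReportHost ...> tag and its
# closing </ReportHost> tag, inclusive. The tag name comes from the thread;
# the sub name and the sample data below are hypothetical.
sub report_host_lines {
    my @lines = @_;
    my @kept;
    for (@lines) {
        # Scalar-context ".." is the flip-flop: true from the line matching
        # the left pattern through the line matching the right pattern.
        push @kept, $_ if m{<ReportHost\b} .. m{</ReportHost>};
    }
    return @kept;
}

my @sample = (
    "<Report>\n",
    "  <ReportHost name=\"10.0.0.1\">\n",
    "    <tag>value</tag>\n",
    "  </ReportHost>\n",
    "</Report>\n",
);
print report_host_lines(@sample);
```

Because the patterns are not anchored, indented tags are matched as well.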
Hope that helps,
print if (/^\<ReportHost/../^\<\/ReportHost>/);
#                               ^
...you don’t need to backslash the < character. Also, the regex will be easier to read if you change the delimiters
Thanks for the tip! Can you explain or link me to an explanation of the new syntax?
Note that the ^ metacharacter in a regex matches the beginning of a line. Is this what you want?
Instinctively, yes, since I'm expecting to see it at the beginning of the line. Although, you do bring up a good point.
My forward slash is there. It's just after the flip-flop. No?
No. In your original post, the statement is
print if (/^\<ReportHost/../^\<\ReportHost>/);
          no forward slash here ^
The \R escape sequence (matching a generic linebreak) was added with Perl version 5.10. Please see Character Classes and other Special Escapes in perlre. Prior to version 5.10, its use would have earned you an "Unrecognized escape \R passed through..." warning if you had enabled warnings, which you very wisely seem to be doing.
c:\@Work\Perl>perl -wMstrict -le
"print qq{perl version $]};
;;
my $rx = qr{ ^ \< \ReportHost> }xms;
print $rx;
"
perl version 5.010001
(?msx-i: ^ \< \ReportHost> )
c:\@Work\Perl>perl -wMstrict -le
"print qq{perl version $]};
;;
my $rx = qr{ ^ \< \ReportHost> }xms;
print $rx;
"
Unrecognized escape \R passed through at -e line 1.
perl version 5.008009
(?msx-i: ^ < ReportHost> )
Update:
Can you explain or link me to an explanation of the new syntax?
I'm not sure what "new" syntax you're referring to. Documentation for your local installation of Perl should be available to you from the command line via, e.g.,
perldoc perlre
with the most important regex syntax doc files being perlre, perlretut, and perlrequick. For the qr// m// s/// tr/// operators, see perlop. (At least I hope you can use perldoc. If not, your Perl installation is seriously b0rken!)
On-line, see The Documentation for all Perl documentation for the most recent and all previous Perl versions.
Give a man a fish: <%-{-{-{-<
If you look again at the original post, you’ll see that the forward slash which is there is functioning as a delimiter for the regex. But the forward slash to be matched is missing. That’s why I recommended using different delimiters for the regex: if the delimiter is something other than a slash, you don’t need to backslash the slash inside the regex. Too many back- and forward-slashes produce what is called “leaning toothpick syndrome.”
On the syntax for delimiters, see, e.g., perlretut#Simple-word-matching and perlop#Quote-and-Quote-like-Operators. Note that if you use any delimiter other than a forward slash, you need to prefix the regex with an m: /abc/ becomes m!abc! or m[abc], etc.
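A small sketch of the delimiter rule, using the closing tag from this thread as the test string. The variable names are only for illustration. All three matches are the same pattern; only the delimiters change.

```perl
use strict;
use warnings;

my $line = '</ReportHost>';

# The same pattern written three ways; only the delimiters differ.
my $with_slashes = $line =~ /^<\/ReportHost>/ ? 1 : 0;  # the slash must be escaped
my $with_braces  = $line =~ m{^</ReportHost>} ? 1 : 0;  # no escaping needed
my $with_bangs   = $line =~ m!^</ReportHost>! ? 1 : 0;  # m is required here too

print "slashes=$with_slashes braces=$with_braces bangs=$with_bangs\n";
```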
Hope that helps,
Re: No tools? Use Perl?!
by codiac (Beadle) on Jul 28, 2016 at 10:42 UTC
Your while loop looks at one line at a time but your text says "synopsis (multi line)" ... D'oh!
Even if you are disconnected from the net you must have some modules installed, maybe you can have a look around and see if someone else got a useful module installed.
If not, let's hope the files are small enough to fit in memory
# Suck in the whole file
my $text = do { local $/; <$fh> };
# use a nested group to exclude the close tag
while ($text=~ m{<ReportHost[^>]*>(?:(?!</ReportHost>).)*</ReportHost>}s) {
    # print out the content of each ReportHost tag
    print "$1\n";
}
Untested! :)
The regexp needs the /g switch or the code will loop forever, and you use $1 but don't have capturing groups.
I don't know the specifics of how /g works. All I know is that in other scripts I've written, it stops at the first instance. Considering that I have multiple <RemoteHost> sections I don't think it will work and thus am assuming that is what I want. I haven't set up $1 captures yet because it should only have one thing being fed to it: the file. I'll work on error handling for invalid input later.
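For reference, a hedged sketch of what /g does in a while loop: in scalar context each iteration resumes where the previous match ended, so the loop visits every block and then stops. The sub name report_host_bodies and the sample data are made up, and the sketch assumes ReportHost elements are never nested.

```perl
use strict;
use warnings;

# Collect the content of every <ReportHost>...</ReportHost> block in a string.
# Assumes the whole document is already in memory and that ReportHost
# elements are not nested. The sub name is hypothetical.
sub report_host_bodies {
    my ($text) = @_;
    my @bodies;
    # /g in scalar context resumes after the previous match on each
    # iteration, so the loop ends once no further match is found.
    # /s lets "." match newlines; the outer parentheses capture into $1.
    while ($text =~ m{<ReportHost[^>]*>((?:(?!</ReportHost>).)*)</ReportHost>}sg) {
        push @bodies, $1;
    }
    return @bodies;
}

my $text = <<'END_XML';
<ReportHost name="a">
first
</ReportHost>
<ReportHost name="b">
second
</ReportHost>
END_XML

print "Found ", scalar(report_host_bodies($text)), " blocks\n";
```

Without /g the match would restart from the beginning of the string every time, which is why the loop would never end.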
Your while loop looks at one line at a time but your text says "synopsis (multi line)" ... D'oh!
What I mean is that the XML object data spans several lines between the opening and closing tags. It's not a <tag text="blah blah blah"></tag> but more of a
<tag name="stuff">
blah
blah
blah
</tag>
Even if you are disconnected from the net you must have some modules installed, maybe you can have a look around and see if someone else got a useful module installed.
From instmodsh:
CPAN::Meta
CPAN::Meta::Requirements
CPAN::Meta::YAML
Crypt::Blowfish_PP
ExtUtils::CBuilder
File::SearchPath
IPC::Cmd
JMX::JMX4Perl
JSON::PP
Locale::Maketext::Simple
Module::Build
Module::CoreList
Module::Load
Module::Load::Conditional
Module::Metadata
Params::Check
Parse::CPAN::Meta
Perl
Perl::OSType
Term::Clui
Term::ShellUI
Term::Size
Test::Simple
check_postgres
parent
version
If not, let's hope the files are small enough to fit in memory
I'm iffy about the concept of slurping and generally avoid it due to all the warnings that come with it. The files get anywhere from 3-5MB on a 2GB system with nagios running. Don't know if that's "small" enough.
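If slurping is a worry, one alternative (not proposed by anyone in this thread, so treat it as a sketch) is to set the input record separator $/ to the closing tag, so that each read returns one record instead of the whole file. The sub name each_report_host and the in-memory sample are hypothetical, and this assumes the closing tag is always spelled exactly </ReportHost>.

```perl
use strict;
use warnings;

# Read one ReportHost record at a time by treating the closing tag as the
# input record separator. Hypothetical helper; assumes the closing tag is
# always written exactly as "</ReportHost>".
sub each_report_host {
    my ($fh, $callback) = @_;
    local $/ = '</ReportHost>';
    while (my $chunk = <$fh>) {
        # Drop anything before the opening tag (e.g. the document prologue)
        # and skip the trailing chunk that holds no record at all.
        next unless $chunk =~ m{(<ReportHost\b.*</ReportHost>)}s;
        $callback->($1);
    }
    return;
}

# Usage with an in-memory filehandle standing in for a real file.
my $xml = qq{<Report>\n<ReportHost name="a">\nx\n</ReportHost>\n}
        . qq{<ReportHost name="b">\ny\n</ReportHost>\n</Report>\n};
open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";
my $count = 0;
each_report_host($fh, sub { $count++ });
close $fh;
print "Records seen: $count\n";
```

Only one record is held in memory at a time, at the cost of depending on the exact spelling of the closing tag.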
Re: No tools? Use Perl?!
by RonW (Parson) on Jul 29, 2016 at 01:32 UTC
There are tiny, freely available, pure-Perl XML parsers. These are small enough that you could copy and paste them into your Perl program.
The tiniest one (less than 50 lines) returns a list of parsed elements. From the list, you can find the first occurrence of RemoteHost, then find the hostname, synopsis and other elements you want and extract their values. Then find the /RemoteHost element, then find the next RemoteHost element, find and extract the subelement, finally, repeat until end of file reached.
This one is at http://www.cs.sfu.ca/~cameron/REX.html#AppA in appendix A.
The other is actually the module XML::Parser::Lite and is based on the above parser. It's a little easier to use than the "bare" parser. The module is small enough you could copy/paste it into your Perl program. It does not have to be its own package, or you can do package main; after the end of XML::Parser::Lite
HOWEVER, you cannot use regular expressions (or even split) in your callback functions when using XML::Parser::Lite, because the callbacks are called by Perl's regex engine. You should not need regular expressions or split anyway, as the element names given to the callbacks can be compared directly (using eq) to scalar values, and the element values are cleanly extracted.
I might be able to do the shallow parsing method. It's just about small enough for me to retype. (Did I mention I'm on an isolated system?) XML::Parser::Lite is way too much for me to retype and requires other modules.
From the list, you can find the first occurrence of RemoteHost, then find the hostname, synopsis and other elements you want and extract their values. Then find the /RemoteHost element, then find the next RemoteHost element, find and extract the subelement, finally, repeat until end of file reached.
Don't suppose you could give me a quick example of how to do that with ShallowParse?
While you mentioned isolated, you didn't say the only way to get your Perl program into it is by typing it manually. We have several isolated systems where I work. However, we are able to get files into them by burning them to a DVD and giving that to IT. They scan the DVD, then load the files into whichever isolated system we specify. Getting files off is easier. Get a blank DVD from the IT office, burn the files on to it, then bring the DVD to our own PCs.
(BTW, XML::Parser::Lite is not as big as it looks. And only uses core Perl modules that almost certainly will be on your isolated system. It uses an enhanced version of ShallowParse. The rest of the code is "OO packaging" and for dynamically configuring call backs. That code can be stripped away and your call backs named the default names.)
ShallowParse is not hard to use. It returns a list of the elements and the content within and between elements, making those easier to get. And, since you are working with the returned list, you can use regular expressions. However, it doesn't handle attributes of elements. You have to "post process" the elements to get the attributes if the input XML has those (and you need them).
For my example of how to use ShallowParse, I am assuming all the elements you are interested in are simple containers with no attributes.
#!perl
# REX/Perl 1.0
# Robert D. Cameron "REX: XML Shallow Parsing with Regular Expressions",
# Technical Report TR 1998-17, School of Computing Science, Simon Fraser
# University, November, 1998.
# Copyright (c) 1998, Robert D. Cameron.
# The following code may be freely used and distributed provided that
# this copyright and citation notice remains intact and that modifications
# or additions are clearly identified.
$TextSE = "[^<]+";
$UntilHyphen = "[^-]*-";
$Until2Hyphens = "$UntilHyphen(?:[^-]$UntilHyphen)*-";
$CommentCE = "$Until2Hyphens>?";
$UntilRSBs = "[^\\]]*](?:[^\\]]+])*]+";
$CDATA_CE = "$UntilRSBs(?:[^\\]>]$UntilRSBs)*>";
$S = "[ \\n\\t\\r]+";
$NameStrt = "[A-Za-z_:]|[^\\x00-\\x7F]";
$NameChar = "[A-Za-z0-9_:.-]|[^\\x00-\\x7F]";
$Name = "(?:$NameStrt)(?:$NameChar)*";
$QuoteSE = "\"[^\"]*\"|'[^']*'";
$DT_IdentSE = "$S$Name(?:$S(?:$Name|$QuoteSE))*";
$MarkupDeclCE = "(?:[^\\]\"'><]+|$QuoteSE)*>";
$S1 = "[\\n\\r\\t ]";
$UntilQMs = "[^?]*\\?+";
$PI_Tail = "\\?>|$S1$UntilQMs(?:[^>?]$UntilQMs)*>";
$DT_ItemSE = "<(?:!(?:--$Until2Hyphens>|[^-]$MarkupDeclCE)|\\?$Name(?:$PI_Tail))|%$Name;|$S";
$DocTypeCE = "$DT_IdentSE(?:$S)?(?:\\[(?:$DT_ItemSE)*](?:$S)?)?>?";
$DeclCE = "--(?:$CommentCE)?|\\[CDATA\\[(?:$CDATA_CE)?|DOCTYPE(?:$DocTypeCE)?";
$PI_CE = "$Name(?:$PI_Tail)?";
$EndTagCE = "$Name(?:$S)?>?";
$AttValSE = "\"[^<\"]*\"|'[^<']*'";
$ElemTagCE = "$Name(?:$S$Name(?:$S)?=(?:$S)?(?:$AttValSE))*(?:$S)?/?>?";
$MarkupSPE = "<(?:!(?:$DeclCE)?|\\?(?:$PI_CE)?|/(?:$EndTagCE)?|(?:$ElemTagCE)?)";
$XML_SPE = "$TextSE|$MarkupSPE";
sub ShallowParse {
    my($XML_document) = @_;
    return $XML_document =~ /$XML_SPE/g;
}
my @els = ShallowParse(<<_EOD_);
<scan>
<RemoteHost>
<hostname>example.com</hostname>
<synopsis>I scanned this host
and didn't find anything interesting.
</synopsis>
</RemoteHost>
</scan>
_EOD_
my $n;
for (@els)
{
    if (($_ eq '<RemoteHost>') .. ($_ eq '</RemoteHost>'))
    {
        if ($n = (($_ eq '<hostname>') .. ($_ eq '</hostname>')))
        {
            next if (($n < 2) || (rindex($n,'E0') > 0)); # skip the tags
            print "Host: $_\n";
        }
        elsif ($n = (($_ eq '<synopsis>') .. ($_ eq '</synopsis>')))
        {
            next if (($n < 2) || (rindex($n,'E0') > 0)); # skip the tags
            print "Synopsis:\n$_\n";
        }
    }
}
About using the range operator: in scalar context it returns a false value (the empty string) until the first condition matches, then returns 1, which indicates the condition just became true. Each subsequent time the range is tested, the value increases by 1. When the second condition matches, the value is still incremented, but its string form ends with 'E0' to indicate that the second condition just became true. In my example, I use this behavior to skip the start and end tags of each element with less complex logic in the code.
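To see those sequence numbers directly, here is a tiny sketch that records what the range operator returns for each item in a list. The sub name flip_flop_values and the <a> tag are made up for the demonstration.

```perl
use strict;
use warnings;

# Record what the scalar-context range operator returns for each item.
# The sub name and the <a> element are hypothetical.
sub flip_flop_values {
    my @items = @_;
    my @seen;
    for (@items) {
        my $n = ($_ eq '<a>') .. ($_ eq '</a>');
        push @seen, $n;
    }
    return @seen;
}

my @values = flip_flop_values('x', '<a>', 'text', '</a>', 'y');
print join(',', map { "[$_]" } @values), "\n";   # prints [],[1],[2],[3E0],[]
```

The start tag gets 1, the end tag gets a value ending in E0, and everything in between is content.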
If any of your elements contained attributes, then the element processing would change:
if ($n = ((/^<foo/) .. ($_ eq '</foo>')))
{
    if ($n == 1)
    {
        ...; # get the attribute(s) and value(s)
    }
    elsif (rindex($n,'E0') < 1)
    {
        ...; # process the content
    }
}
Or, if the element was not a container:
if (/^<bar/ && /\/>$/)
{
    ...; # get the attribute(s) and value(s)
}
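For the "get the attribute(s) and value(s)" step, one possible sketch is below. The sub name tag_attributes is made up, and this is deliberately simplified: it assumes every attribute value is quoted and it does not decode entities such as &amp;.

```perl
use strict;
use warnings;

# Pull attribute name/value pairs out of a single start-tag string such as
# the ones ShallowParse returns. Hypothetical helper; assumes quoted values
# and does not decode entities.
sub tag_attributes {
    my ($tag) = @_;
    my %attr;
    # Name, optional whitespace, "=", then a double- or single-quoted value.
    while ($tag =~ /([A-Za-z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
        $attr{$1} = defined $2 ? $2 : $3;
    }
    return %attr;
}

my %attr = tag_attributes(q{<foo name="stuff" id='42'>});
print "$_=$attr{$_}\n" for sort keys %attr;
```

Since this works on the list returned by ShallowParse and not inside a parser callback, using a regular expression here is fine.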
Re: No tools? Use Perl?!
by Jenda (Abbot) on Aug 01, 2016 at 14:56 UTC
Are you sure you haven't got any installed? I think XML::Parser used to be part of the Perl installation. While the interface is ... erm ... awkward, it is a complete standards conforming XML parser.
Jenda
Enoch was right!
Enjoy the last years of Rome.
>corelist XML::Parser
Data for 2013-08-12
XML::Parser was not in CORE (or so I think)
>
And I can't find anything in http://cpansearch.perl.org/src/BINGOS/Module-CoreList-5.20160720/lib/Module/CoreList.pm, so no, XML::Parser appears to be a strictly non-core module.
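The same check can be scripted with Module::CoreList itself, which matters on an isolated box where the corelist command may be missing. This is a sketch; the sub name core_status is made up, and the answer is only as current as the locally installed Module::CoreList data.

```perl
use strict;
use warnings;
use Module::CoreList;

# Report whether a module has ever shipped with core Perl, according to the
# locally installed Module::CoreList data (which may be out of date).
# The sub name is hypothetical.
sub core_status {
    my ($module) = @_;
    my $first = Module::CoreList->first_release($module);
    return defined $first ? "first in core with perl $first" : "not in core";
}

print "$_: ", core_status($_), "\n" for qw(XML::Parser JSON::PP);
```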
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)