XML::Grove
by Laurent CAPRANI
on Sep 12, 2000 at 18:00
XML::Grove is a tree-based processing module for XML. It is based on Data::Grove, a general model for storing hierarchical information. Thanks to its very Perl-ish implementation, it is very intuitive for Perl programmers to use, and it could serve as a base for many useful tools.
Unfortunately, as of September 2000, all development on XML::Grove and grove-based tools seems to have been abandoned: an extremely simple bug that was pointed out a year ago still hasn't been fixed.
XML::UM
by mirod
on Sep 12, 2000 at 14:25
Description
The XML::UM module uses the maps that come with
XML::Encoding to perform the reverse operation: it
creates mapping routines that encode a UTF-8 string
in the chosen encoding.
This module comes in handy when processing XML, as
XML::Parser converts all strings to UTF-8.
A typical usage scenario would be:
# create the encoding routine (only once!)
$encode= XML::UM::get_encode(Encoding => 'big5');
...
# convert a UTF-8 string to the desired encoding
$encoded_string= $encode->($utf8_string);
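For instance, building on the snippet above, the encoding routine can be used
from an XML::Parser Char handler to re-encode the character data on output
(the output filehandle and document name below are only illustrative):

use XML::Parser;
use XML::UM;

my $encode= XML::UM::get_encode( Encoding => 'big5');  # create once

open( OUT, '>out.txt') or die "cannot create out.txt: $!";

my $parser= XML::Parser->new( Handlers =>
    { Char => sub { my( $expat, $utf8_string)= @_;
                    print OUT $encode->( $utf8_string);  # re-encode on output
                  },
    });
$parser->parsefile( 'doc.xml');
close OUT;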
Warning: the version of XML::UM in
libxml-enno-1.02 has an installation problem. To fix
this, once you have downloaded and uncompressed the
module, before doing perl Makefile.PL, edit
the UM.pm file in the lib/XML
directory and replace the $ENCDIR value with
the location of your XML::Encoding maps (it
should be /usr/local/src/XML-Encoding-1.01/maps
or /opt/src/XML-Encoding-1.01/maps/).
Why use XML::UM?
- it works!
- it might be the only easy solution for you
Why NOT use XML::UM?
- it's slow
- it cannot deal with latin-1
- you can wait for the new Unicode features in Perl
Personal comments
XML::UM is probably just an interim solution while the
new Unicode features in Perl are being developed. They
will essentially perform the same tasks, just faster and
in the Perl core (which means more support).
In the meantime XML::UM is easy to use and can really
save you some headaches with encodings.
The absence of a latin-1 conversion function (due to the
fact that expat supports latin-1 natively, hence there is
no encoding table for it in XML::Encoding) is a big
flaw though.
It would be really nice if someone picked up the module
and added latin-1 support. Recoding it in C could help too.
Apache::MP3
by maverick
on Sep 11, 2000 at 22:17
The Good
This is an excellent package for anyone who wants to add
MP3 listing and streaming to Apache. The installation
directions are simple and straightforward. There are
a number of configuration options to change the behaviour
of the module. It also uses MP3::Info to extract ID3 tag
info for the listings, and this information can be cached for
better performance. I have nearly 5 GB of MP3s on a K6-2 450
and speed isn't an issue.
The Bad
I had originally installed this on a stock Red Hat Apache/mod_perl
setup (mod_perl loaded dynamically). For some reason this
played havoc with the stability of the module. Once I
recompiled Apache and mod_perl from source, it behaved
flawlessly.
The look of the generated pages is controlled via
a cascading style sheet. While this simplifies altering
the appearance, it doesn't allow for ultimate configurability.
And lastly, somebody else wrote it.
For those of you who had considered writing one, the task
has already been done. :)
Proc::Daemon
by ncw
on Sep 11, 2000 at 20:58
Proc::Daemon is a particularly useful module for anyone
writing Unix daemons (programs which run in the background with no user input). It detaches your program from
its parent, allowing it to run undisturbed by the parent,
by you logging out, or by the parent waiting for it to die.
This is a simple module and most of the code in it appears
in the Perl Cookbook, but this module brings it together
in a neatly wrapped package so you don't have to remember
any of those horrid details (like the double fork, setsid, chdir, reopening STDOUT, etc.)!
Use it like this
use Proc::Daemon;
Proc::Daemon::Init();
That is it!
You'll quite likely want to re-open STDOUT and STDERR
to a log file though, like this:
open(STDOUT, ">>$LOG")
or die "Failed to re-open STDOUT to $LOG";
open(STDERR, ">&STDOUT")
or die "Failed to re-open STDERR to STDOUT";
I've used it quite a few times now, for writing real daemons,
sendmail filters which need to run for a long time (eg email to SMS),
for detaching processes from crontab, and for long running CGI programs
(which don't need to return a response to the user).
Warning:
This module is unlikely to work under Windows, though it probably
could be made to.
Verdict: Small but perfectly formed!
Date::Calc
by Anonymous Monk
on Sep 11, 2000 at 18:02
I first learned of this module from my copy of the Perl Cookbook (the 'Ram' book). Date::Calc enables you to do just about anything there is to be done with dates, much of it with built-in functions, and handles just about every sensible date format there is.
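For a taste of the interface, here is a small sketch using a handful of its
functions (dates are passed around as year, month, day lists; the values are
arbitrary):

use Date::Calc qw(Delta_Days Add_Delta_Days Day_of_Week);

my @today = (2000, 9, 11);                                 # (year, month, day)
print "Days left in 2000: ", Delta_Days(@today, 2000, 12, 31), "\n";
print "90 days from now:  ", join('-', Add_Delta_Days(@today, 90)), "\n";
print "Day of week:       ", Day_of_Week(@today), "\n";    # 1 = Monday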
C::Scan
by knight
on Sep 11, 2000 at 15:36
Description
The C::Scan module performs fast,
accurate scanning of C source code.
It provides an object interface
for accessing information about
a particular C source file.
The main interface
(after creating the initial object)
is to use a get() method
that will fetch information using
a set of pre-defined keywords
which specify the type of information
you want:
- function declarations
- (in-line) function definitions
- macro definitions, with or without macro arguments
- typedefs
- extern variables
- included files
A lot of the information is available
either raw or parsed, depending
on the specific keyword used
(for example, 'fdecls' vs. 'parsed_fdecls').
Why should you use it?
You want to use Perl
to extract information about
C source code,
including the functions declared or defined,
arguments to functions,
typedefs,
macros defined, etc.
Why should you NOT use it?
- You need a full-blown C parser.
C::Scan is not that.
- You need to scan C++.
Any bad points?
The documentation is lacking.
This is really annoying
because almost all of the keyword fetches
that try to parse the text
use complex and arbitrary structures for return values:
an array ref of refs to arrays
that each hold five defined values,
a hash ref where the hash values are array refs
to two-element arrays,
etc.
Don't be surprised if you have to dive in
to the code to really figure out
what's being returned.
Related Modules
C::Scan is an example of extremely
powerful use of the
Data::Flow module
(not surprising,
as both were originally
written by Ilya).
The keywords you use to fetch information
are the underlying Data::Flow recipe keywords.
Personal notes
I used C::Scan
to create a code pre-processor
that would scan our C source
and dump various information
into structures
for use by an administrative interface.
This ended up eliminating several
steps in our process that would
always break when someone
added a new command function
but didn't update the right help-text table,
etc.
I learned a lot
from threading my way through
the C::Scan source code.
It makes liberal use of \G in regexes
to loop through text looking
for pieces it can identify
as a function, typedef, etc.,
and the pos builtin to fetch
and set the offset for the searches.
This allows the module to use
multiple copies of the text side-by-side,
one with the comments and strings whited out
and the other with full text.
This way, it can scan a "sanitized" version
to identify C syntax by position,
but then return full text from the other string.
This is an extremely effective
and astonishingly efficient technique.
Example
Examples of a few ways
to pull information from C::Scan:
use C::Scan;

my $c = new C::Scan( filename        => 'foo.c',
                     filename_filter => 'foo.c',
                     add_cppflags    => '-DFOOBAR',
                     includeDirs     => [ 'dir1', 'dir2' ],
                   );

#
# Fetch and iterate through information about function declarations.
#
my $array_ref = $c->get('parsed_fdecls');
foreach my $func (@$array_ref) {
    my ($type, $name, $args, $full_text, undef) = @$func;
    foreach my $arg (@$args) {
        my ($atype, $aname, $aargs, $afull_text, $array_modifiers) = @$arg;
    }
}

#
# Fetch and iterate through information about #define values w/out args.
#
my $hash_ref = $c->get('defines_no_args');
foreach my $macro_name (keys %$hash_ref) {
    my $macro_text = $hash_ref->{$macro_name};
}

#
# Fetch and iterate through information about #define macros w/args.
#
my $macros_hash = $c->get('defines_args');
foreach my $macro_name (keys %$macros_hash) {
    my ($arg_ref, $macro_text) = @{ $macros_hash->{$macro_name} };
    my @macro_args = @$arg_ref;
}
Win32::API
by Guildenstern
on Sep 11, 2000 at 14:07
Every now and then I find a need to call a function that would be second nature if I were using Visual C++ in a Windows environment. I happened across Win32::API, and was very pleasantly surprised at what the module can do.
Win32::API allows functions to be imported from Windows DLL files. It also makes the job much easier by making the actual calling of the function incredibly simple. No more trying to do weird type conversions and stupid pointer tricks - just pack your variables and call the function. If the function doesn't take a structure as a parameter, it's even easier - just call it like a normal Perl sub.
I've had great success at using it, and while I haven't benchmarked any results it appears to be quite fast. Coding the function call is simple, also. It takes longer to research the function and its parameters than it does to write the call.
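As a small sketch of what a call looks like, here is the classic GetTickCount
example (the type letters follow the module's conventions, 'N' being an
unsigned number; no structures or pointers involved):

use Win32::API;

# GetTickCount lives in kernel32, takes no arguments and returns a DWORD
my $GetTickCount = new Win32::API('kernel32', 'GetTickCount', [], 'N')
    or die "Could not import GetTickCount";

print "Milliseconds since boot: ", $GetTickCount->Call(), "\n";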
Drawbacks: There are two drawbacks to Win32::API that I've noticed so far. One could be easily remedied, while the other is probably not something I should hold my breath for. Here they are:
- It would be nice if the documentation listed or gave a link to information about what size standard MS parameters are. It's kind of a pain to have to track down what the difference between a WORD and a DWORD is so you know what to pack.
- It would be really nice if there were support for functions that have callbacks. One example I'm thinking of is placing icons in the system tray. To be able to respond to mouse clicks, the function call that places the icon there must specify a callback that gets executed on a mouse click. AFAIK, there is no way to do this yet in Perl, and it may be impossible at present.
Bottom Line: Win32::API is a great way to build a Windows based Perl solution without having to resort to an MS programming solution because "there's no module that does X".
IPC::Open3
by tilly
on Sep 11, 2000 at 01:44
A basic Unix process starts life with a default place to
read data from the outside world (STDIN), a default place
to write data (STDOUT), a basic place to write errors to
(STDERR), and some parent process who hopefully will be
interested in how you died and will catch your return code
(0 by default unless you exit or die).
Perl has many ways of starting processes. The most common
are the following:
- system: This launches a process which will use your
STDIN, STDOUT, STDERR, and returns you the return code. In
the process the special variables $! and $? will be set.
- Backticks: This launches a process which will use your
STDIN and STDERR, but which returns its STDOUT to you; the
return code ends up in $?.
- open(CMD, "| $cmd"): Open a process which
you can write to the STDIN of, which uses your STDOUT and
STDERR. The return of open is the process ID. You can
collect the return with wait or waitpid.
- open(CMD, "$cmd |"): Open a process whose
STDOUT you can read from, and which uses your STDIN and
STDERR. Essentially the same as backticks but you can
start processing the data interactively. (This can matter
a lot if the command returns large amounts of data.) The
return of open is the process ID. You can
collect the return with wait or waitpid.
This should suffice for most purposes. (See perlipc for
more on inter-process communication.) But from time to
time you need more control. That is where IPC::Open3
becomes useful.
IPC::Open3 exports a function, open3. This allows
you to start a command and choose what it gets for STDIN,
STDOUT, and STDERR. For instance you might hand it for
STDIN a filehandle reading from /dev/null to suppress any
attempts on its part to interact. You might want to keep
track of STDERR for error reporting. Anything you want.
As with open, open3 returns the process ID. When the
program should have exited you can then call wait or
waitpid, check that you collected the right child, then
look at $? for the return value.
For an example of real use of this, see Run commands in parallel.
Another useful example that would make a good exercise is to
write a function that runs a process and suppresses all output
unless the return code is non-zero, in which case it prints
an error message with information on the command, the return
code, and what was printed to STDERR. (I use something
like this for cron jobs; a sketch follows.)
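Here is a sketch of that exercise (the function name is mine; note that
draining STDOUT before STDERR can block if the command writes a lot to
STDERR first, so a real version might use select):

use strict;
use IPC::Open3;
use Symbol qw(gensym);

sub run_quietly {
    my @cmd = @_;
    my ($in, $out, $err) = (gensym, gensym, gensym);  # STDERR needs a real glob
    my $pid = open3($in, $out, $err, @cmd);
    close $in;                       # the child sees EOF on STDIN right away
    my @stdout = <$out>;             # drain the pipes before waiting
    my @stderr = <$err>;
    waitpid($pid, 0);                # collect the child and set $?
    if ($?) {
        my $code = $? >> 8;
        print STDERR "'@cmd' failed with exit code $code:\n", @stderr;
    }
    return $? == 0;
}

run_quietly('ls', '/no/such/directory');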
Mail::POP3Client
by xjar
on Sep 10, 2000 at 23:36
Description
Mail::POP3Client implements an object-oriented interface to a POP3 server, following RFC 1939. It is quite easy to use.
Why use Mail::POP3Client?
- You want to easily access a POP3 server with Perl.
Why NOT use Mail::POP3Client?
- You are accessing an IMAP server :-).
Example
use Mail::POP3Client;
$pop = new Mail::POP3Client( USER => "me",
PASSWORD => "mypassword",
HOST => "pop3.do.main" );
for( $i = 1; $i <= $pop->Count(); $i++ ) {
    foreach( $pop->Head( $i ) ) {
        /^(From|Subject):\s+/i && print $_, "\n";
    }
}
$pop->Close();
Personal Notes
Mail::POP3Client is the module I used when writing pemail, a command-line MUA. I found the module to be very intuitive and straightforward. Writing pemail and using this module are what really got me into Perl programming. I highly recommend this module for any programs that require POP3 server access.
XML::Parser
by mirod
on Sep 10, 2000 at 16:06
Description
XML::Parser provides ways to parse XML documents.
- built on top of XML::Parser::Expat, a lower level
interface to James Clark's expat library
- most of the other XML modules are built on top of
XML::Parser
- stream-oriented
- for each event found while parsing a document a
user-defined handler can be called
- events are start and end tags, text, but also
comments, processing instructions, CDATA, entities,
element or attribute declarations in the DTD...
- handlers receive the parser object and context
information (see the sketch after this list)
- sets of pre-defined handlers can be used as
Styles
- A companion module, XML::Encoding, allows
XML::Parser to parse XML documents in various
encodings, besides the natively supported UTF-8, UTF-16 and
ISO-8859-1 (latin-1)
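As a minimal sketch of the handler mechanism, here is a parse of a
hypothetical doc.xml with just Start, End and Char handlers registered:

use XML::Parser;

my $parser= XML::Parser->new( Handlers =>
    { Start => sub { my( $expat, $gi, %atts)= @_; print "start: $gi\n"; },
      End   => sub { my( $expat, $gi)= @_;        print "end:   $gi\n"; },
      Char  => sub { my( $expat, $string)= @_;    print "text:  $string\n"; },
    });
$parser->parsefile( 'doc.xml');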
Why use XML::Parser
- widely used, the first XML module, hence it is
very robust
- if you need performance: it is low level, and
obviously all modules based on it are slower
- you need access to some parsing events that are
masked by higher-level modules
- one of the Styles does exactly what you want
- if you want to write your own module based on
XML::Parser
Why NOT use XML::Parser
Related Modules
Besides the modules already mentioned:
- XML::UM can translate characters between various
encodings,
- XML::Checker is a validating parser that just
replaces XML::Parser
Personal comments
XML::Parser is the basis of most XML processing in Perl.
Even if you don't plan to use it directly, you should at
least know how to use it if you are working with XML.
That said, I think that it is usually a good idea to have a look at the various modules that sub-class XML::Parser, as they are usually easier to use.
There are some compatibility problems between XML::Parser version 2.28 and higher and a lot of other modules, most notably XML::DOM.
Plus it seems to be doing some funky stuff with UTF-8 strings.
Hence I would stick to version 2.27 at the moment.
Update: the ActiveState distribution
currently includes XML::Parser 2.27
Things to know about XML::Parser
Characters are converted to UTF-8
XML::Parser will gladly parse latin-1 (ISO 8859-1) documents provided the XML declaration mentions that encoding. It will convert all characters to UTF-8 though, so
outputting latin-1 is tricky. You will need to use Perl's
Unicode functions, which have changed recently, so I will postpone detailed instructions until I catch up with them ;--(
Catching exceptions
The XML recommendation mandates that when an error is
found in the XML the parser stop processing
immediately. XML::Parser goes even further: it
displays an error message and then dies.
To avoid dying wrap the parse in an
eval block:
eval { $parser->parse };
if( $@)
{ my $error= $@;
#cleanup
}
Getting all the character data
The Char handler can be called several times
within a single text element. This happens when the
text includes new lines or entities, or even at random,
depending on expat's buffering mechanism. So
the real content should actually be built by accumulating
the strings passed to Char, and by using the result only
in the End handler.
my $stored_content=''; # global

sub Start
  { my( $expat, $gi, %atts)= @_;
    process( $stored_content);  # needed for mixed content such as
                                # <p>text <b>bold</b> more text</p>
    $stored_content='';         # needs to be reset
  }

sub Char
  { my( $expat, $string)= @_;
    $stored_content .= $string; # can't do much with it
  }

sub End
  { my( $expat, $gi)= @_;
    process( $stored_content);  # now it's full
    $stored_content='';         # reset here too
  }
XML::Parser Styles
Styles are handler bundles. Five styles are defined in
XML::Parser; others can be created by users.
Subs
Each time an element starts, a sub by that name is
called with the same parameters that the Start handler
gets called with.
Each time an element ends, a sub with that name
plus an underscore ("_") appended is
called with the same parameters that the
End handler gets called with.
Tree
Parse will return a parse tree for the document. Each
node in the tree takes the form of a tag, content
pair. Text nodes are represented with a pseudo-tag of
"0" and the string that is their content.
For elements, the content is an array reference. The
first item in the array is a (possibly empty) hash
reference containing attributes.
The remainder of the array is a sequence of
tag-content pairs representing the content of the
element.
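For instance, a tiny document and the structure it parses into:

use XML::Parser;

my $p   = XML::Parser->new( Style => 'Tree');
my $tree= $p->parse( '<doc><a att="1">text</a></doc>');
# $tree now holds:
# [ 'doc', [ {}, 'a', [ { att => '1' }, 0, 'text' ] ] ]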
Objects
This is similar to the Tree style, except that a hash
object is created for each element. The corresponding
object will be in the class whose name is created by
appending "::" to the element name.
Non-markup text will be in the ::Characters class.
The contents of the corresponding object will be in
an anonymous array that is the value of the Kids
property for that object.
Stream
If none of the subs that this style looks for is
there, then the effect of parsing with this style is
to print a canonical copy of the document without
comments or declarations. All the subs receive as
their 1st parameter the Expat instance for the
document they're parsing.
It looks for the following routines (a small sketch follows the list):
- StartDocument: called at the start of
the parse.
- StartTag: called for every start tag
with a second parameter of the element type.
The $_ variable will contain a copy of
the tag and the %_ variable will contain
attribute values supplied for that element.
- EndTag: called for every end tag with a
second parameter of the element type. The $_
variable will contain a copy of the end tag.
- Text: called just before start or end
tags with accumulated non-markup text in the $_
variable.
- PI: called for processing instructions.
The $_ variable will contain a copy of the PI and
the target and data are sent as 2nd and
3rd parameters respectively.
- EndDocument: called at conclusion of the
parse.
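A small sketch using a few of these routines (they are looked up in the
calling package, or in the one given by the Pkg option):

use XML::Parser;

my $p= XML::Parser->new( Style => 'Stream');
$p->parse( '<doc>Hello <b>world</b></doc>');

sub StartTag { my( $expat, $gi)= @_; print "start: $gi\n"; }
sub EndTag   { my( $expat, $gi)= @_; print "end:   $gi\n"; }
sub Text     { print "text: $_\n" if /\S/; }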
Debug
This just prints out the document in outline form.
Email::Valid
by kilinrax
on Sep 09, 2000 at 14:33
Description
Checks an email address for RFC 822 compliance, and, optionally, can also perform an MX check on the domain.
It's worth pointing out here again that attempting to check an email address with a regexp is a very bad idea (see merlyn's reaction to one such attempt, or the explanation of it from perlfaq 9).
Requirements
Who Should Use It?
- Anyone who wants a simple and fairly reliable check of an email address submitted via a form over the web.
Any Bad Points?
- Not a 100% reliable method of verifying an email address (can't be done, except by sending mail to it).
- Requests can sometimes take a while to process.
Example
#!/usr/bin/perl
require 5;
use strict;
use Email::Valid;
use vars qw($addr $email);

if (@ARGV) {
    foreach $email (@ARGV) {
        eval {
            unless ($addr = Email::Valid->address( -address => $email,
                                                   -mxcheck => 1 )) {
                warn "address failed $Email::Valid::Details check.";
            }
        };
        warn "an error was encountered: $@" if $@;
    }
} else {
    print <<EOF;
Usage: $0 [email(s)]
Synopsis: checks email address is rfc822 compliant, and performs an mx check on the domain.
EOF
}
Data::Dumper
by geektron
on Sep 08, 2000 at 23:28
Data::Dumper is in the standard Perl distribution. It's probably the easiest module to use: just issue a 'use Data::Dumper' call, and then either print to STDOUT (with 'print Dumper \@foo') or stick the output into some HTML generated by an application (I prefer to wrap it in <pre> tags, so the output isn't mangled).
Data::Dumper also works keenly with objects, and gives you the class into which your variable is blessed (which makes debugging object relations really quick).
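A quick sketch of both uses, dumping a nested structure and a blessed object
(the class name is made up):

use Data::Dumper;

my @foo = ( { name => 'larry', langs => [ 'perl' ] }, [ 1, 2, 3 ] );
print Dumper( \@foo );                        # plain nested structure

my $obj = bless { count => 42 }, 'My::Counter';
print "<pre>\n", Dumper( $obj ), "</pre>\n";  # shows bless( { ... }, 'My::Counter' )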
Filter::Handle
by Adam
on Sep 08, 2000 at 21:51
This module was originally written here at the Perl Monks Monastery. It provides a simple interface allowing users to tie a filter subroutine to a filehandle. There are many things you could do with this... from labeling STDERR output as STDERR output to easily putting line numbers and timestamps in a logfile.
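I won't reproduce Filter::Handle's own interface from memory here, but the
underlying idea is a tied filehandle whose PRINT method runs your filter
subroutine; a hand-rolled sketch of that idea (not the module's API):

package LabelFilter;
sub TIEHANDLE { my( $class, $filter)= @_; bless { filter => $filter }, $class }
sub PRINT     { my $self= shift; print STDOUT $self->{filter}->( @_ ) }

package main;
tie *STDERR, 'LabelFilter',
    sub { map { scalar( localtime) . " STDERR: $_" } @_ };

print STDERR "something went wrong\n";   # comes out labeled and time-stamped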
XML::Writer
by mirod
on Sep 08, 2000 at 09:35
Description
XML::Writer generates XML using an interface similar to CGI.pm. It
allows various checks to be performed on the document and takes care of
special caracter encoding.
Why use XML::Writer?
- you are generating XML documents "from scratch"
- you are used to CGI.pm
- XML::Writer is quite mature
Why NOT use XML::Writer?
- another method is more appropriate
- you don't like CGI.pm!
Related modules
XML::ValidWriter and
XML::AutoWriter both aim at emulating XML::Writer's
interface:
- XML::ValidWriter performs some checks on the output document. Notably it
checks whether the elements and attributes are declared in the DTD and whether
you are closing the appropriate element.
- XML::AutoWriter automatically generates missing start or end tags, based
on the DTD.
XML::Generator and
XML::Handler::YAWriter are two other modules that generate XML.
Personal notes
At the moment XML::Writer seems to be the most mature and
efficient module for generating XML. Of course a lot of the
transformation modules such as XML::Simple, XML::DOM and
XML::Twig can also be used, as can plain print statements,
but I think that XML::Writer is a lot more convenient and
adds some control over the generated XML.
Example
#!/bin/perl -w
use strict;
use XML::Writer;
use IO;
my $doc = new IO::File(">doc.xml");
my $writer = new XML::Writer(OUTPUT => $doc);
$writer->startTag("doc", class => "simple"); # tag + att
$writer->dataElement( 'title', "Simple XML Document");# text elt
$writer->startTag( "section");
$writer->dataElement( 'title', "Introduction",
no => 1, type => "intro");
$writer->startTag( "para");
$writer->characters( "a text with");
$writer->dataElement( 'bold', "bold");
$writer->characters( " words.");
$writer->endTag( "para");
$writer->endTag(); # close section
$writer->endTag(); # close doc
$writer->end(); # check that the doc
# has only one element
$doc->close(); # fixed (was $output->close(); ) as suggested by the post below
Roman
by mirod
on Sep 07, 2000 at 10:38
use Roman;
$arabic = arabic($roman) if isroman($roman);
$roman = Roman($arabic);
$roman = roman($arabic);
Why use Roman?
- to number list items in Roman format
- to display dates in an MPAA-approved format
Why NOT use Roman?
- if you need Roman numerals above 4000 (or if you don't
need Roman numerals at all!)
Note: the module does not have a Makefile.PL, so you will
have to copy it into your Perl module path yourself,
which should be something like
/usr/lib/perl5/site_perl/5.6.0/. Alternatively
you can use ExtUtils::MakeMaker to generate
a Makefile:
perl -e 'use ExtUtils::MakeMaker; WriteMakefile(NAME => "Roman");'
Personal comments
Roman is a little module that I found when I had to
convert Roman-numbered lists from XML to HTML. Instead
of spending half an hour remembering how those guys
counted and then writing it myself, it took me 5 minutes
to install a generic solution. Cool!
I guess now with Unicode being available the module
could be upgraded to handle more numbers.
Update: it might look like Dominus does not quite
like Roman: "Roman.pm is a new contender for stupidest Perl module ever written."
But he is actually talking about a different module, one he
wrote himself and apparently never submitted to CPAN, which allows you to write things like $IV+$IV
and get VIII as a result.