User Questions
XML::Grove
4 direct replies — Read more / Contribute
by Laurent CAPRANI
on Sep 12, 2000 at 18:00

    XML::Grove is a tree-based processing module for XML.

    It is based on data::grove, a general model for hierarchical storing of information. Thanks to its very perl-ish implementation, it is very intuitive for Perl programmers to use and it could be a base for many useful tools.

    Unfortunately, as of September 2000, all development on XML::Grove and grove-based tools seems abandoned: an extremely simple bug that was pointed out a year ago still hasn't been fixed.

XML::UM
No replies — Read more | Post response
by mirod
on Sep 12, 2000 at 14:25

    Description

    The XML::UM module uses the maps that come with XML::Encoding to perform the reverse operation. It creates mapping routines that encode a UTF-8 string in the chosen encoding.

    This module comes in handy when processing XML, as XML::Parser converts all strings to UTF-8.

    A typical usage scenario would be:

    # create the encoding routine (only once!)
    $encode= XML::UM::get_encode( Encoding => 'big5');
    ...
    # convert a utf8 string to the desired encoding
    $encoded_string= $encode( $utf8_string);

    Warning: the version of XML::UM in libxml-enno-1.02 has an installation problem. To fix this, once you have downloaded and uncompressed the module, before doing perl Makefile.PL, edit the UM.pm file in the lib/XML directory and replace the $ENCDIR value with the location of your XML::Encoding maps (it should be something like /usr/local/src/XML-Encoding-1.01/maps or /opt/src/XML-Encoding-1.01/maps/).

    Why use XML::UM?

    • it works!
    • it might be the only easy solution for you

    Why NOT use XML::UM?

    • it's slow
    • it cannot deal with latin-1
    • you can wait for the new Unicode features in Perl

    Personal comments

    XML::UM is probably just an interim solution while the new Unicode features in Perl are being developed. They will essentially perform the same tasks, just faster and in the Perl core (which means more support).

    In the meantime XML::UM is easy to use and can really save you some headaches with encodings.

    The absence of a latin-1 conversion function (due to the fact that expat supports latin-1 natively, so there is no encoding table for it in XML::Encoding) is a big flaw though.

    It would be really nice if someone would pick up the module and add latin-1 support. Recoding it in C could help too.

Apache::MP3
1 direct reply — Read more / Contribute
by maverick
on Sep 11, 2000 at 22:17

    The Good

    This is an excellent package for anyone who wants to add MP3 listing and streaming to Apache. The installation directions are simple and straightforward. There are a number of configuration options to change the behaviour of the module. It also uses MP3::Info to extract ID3 tag info for the listings, and this information can be cached for better performance. I have nearly 5 GB of MP3s on a K6-2 450 and speed isn't an issue.
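    As a sketch, hooking the module into mod_perl looks something like the following httpd.conf fragment (the /songs location and music directory are made up for illustration):

```apache
# illustrative httpd.conf fragment -- location and path are assumptions
Alias /songs /home/media/mp3
<Location /songs>
    SetHandler  perl-script
    PerlHandler Apache::MP3
</Location>
```

    After a restart, requests under /songs get the generated listing and streaming pages instead of a plain directory index.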

    The Bad

    I had originally installed this on a stock Red Hat Apache/mod_perl setup (mod_perl loaded dynamically). For some reason this played havoc with the stability of the module. Once I recompiled Apache and mod_perl from source, it behaved flawlessly.

    The look of the generated pages is controlled via a cascading style sheet. While this simplifies altering the appearance, it does put a limit on how far the pages can be customized.

    And lastly, somebody else wrote it. For those of you who had considered writing one, the task has been done. :)

Proc::Daemon
1 direct reply — Read more / Contribute
by ncw
on Sep 11, 2000 at 20:58

    Proc::Daemon is a particularly useful module for anyone writing Unix daemons (programs which run in the background with no user input). It detaches your program from its parent, allowing it to run undisturbed by the parent, by you logging out, or by the parent waiting for it to die.

    This is a simple module and most of the code in it is contained in The Perl Cookbook; however, this module brings it together in a neatly wrapped package so you don't have to remember any of those horrid details (like the double fork, setsid, chdir, reopening STDOUT, etc.)!

    Use it like this

    use Proc::Daemon;
    Proc::Daemon::Init();
    That is it!

    You'll quite likely want to re-open STDOUT and STDERR to a log file though, like this:

    open(STDOUT, ">>$LOG")   or die "Failed to re-open STDOUT to $LOG";
    open(STDERR, ">&STDOUT") or die "Failed to re-open STDERR to STDOUT";
    I've used it quite a few times now, for writing real daemons, sendmail filters which need to run for a long time (eg email to SMS), for detaching processes from crontab, and for long running CGI programs (which don't need to return a response to the user).

    Warning: This module is unlikely to work under Windows, though it probably could be made to.

    Verdict: Small but perfectly formed!

Date::Calc
2 direct replies — Read more / Contribute
by Anonymous Monk
on Sep 11, 2000 at 18:02

    I first learned of this module from my copy of the Perl Cookbook (the 'Ram' book). Date::Calc enables you to do just about anything there is to be done with dates, much of it with built-in functions, and handles just about every sensible date format there is.
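    For example, a couple of the built-in functions in action (a small sketch; Delta_Days and Add_Delta_Days are among the module's standard exports):

```perl
use strict;
use warnings;
use Date::Calc qw(Delta_Days Add_Delta_Days);

# days between two dates, given as (year, month, day) triples
my $days = Delta_Days(2000, 1, 1,  2000, 9, 11);

# date arithmetic: the date 30 days after Sep 11, 2000
my ($y, $m, $d) = Add_Delta_Days(2000, 9, 11, 30);

print "$days days elapsed; 30 days later is $y-$m-$d\n";
```

    The module handles leap years, month lengths and so on for you, which is exactly the sort of fiddly detail you don't want to re-implement.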

C::Scan
1 direct reply — Read more / Contribute
by knight
on Sep 11, 2000 at 15:36

    Description

    The C::Scan module performs fast, accurate scanning of C source code. It provides an object interface for accessing information about a particular C source file. The main interface (after creating the initial object) is to use a get() method that will fetch information using a set of pre-defined keywords which specify the type of information you want:
    • function declarations
    • (in-line) function definitions
    • macro definitions, with our without macro arguments
    • typedefs
    • extern variables
    • included files
    A lot of the information is available either raw or parsed, depending on the specific keyword used (for example, 'fdecls' vs. 'parsed_fdecls').

    Why should you use it?

    You want to use Perl to extract information about C source code, including the functions declared or defined, arguments to functions, typedefs, macros defined, etc.

    Why should you NOT use it?

    • You need a full-blown C parser. C::Scan is not that.
    • You need to scan C++.

    Any bad points?

    The documentation is lacking. This is really annoying because almost all of the keyword fetches that try to parse the text use complex and arbitrary structures for return values: an array ref of refs to arrays that each hold five defined values, a hash ref where the hash values are array refs to two-element arrays, etc. Don't be surprised if you have to dive into the code to really figure out what's being returned.

    Related Modules

    C::Scan is an example of extremely powerful use of the Data::Flow module (not surprising, as both were originally written by Ilya). The keywords you use to fetch information are the underlying Data::Flow recipe keywords.

    Personal notes

    I used C::Scan to create a code pre-processor that would scan our C source and dump various information into structures for use by an administrative interface. This ended up eliminating several steps in our process that would always break when someone added a new command function but didn't update the right help-text table, etc.

    I learned a lot from threading my way through the C::Scan source code. It makes liberal use of \G in regexes to loop through text looking for pieces it can identify as a function, typedef, etc., and the pos builtin to fetch and set the offset for the searches. This allows the module to use multiple copies of the text side-by-side, one with the comments and strings whited out and the other with full text. This way, it can scan a "sanitized" version to identify C syntax by position, but then return full text from the other string. This is an extremely effective and astonishingly efficient technique.

    Example

    Examples of a few ways to pull information from C::Scan:
    $c = new C::Scan(filename        => 'foo.c',
                     filename_filter => 'foo.c',
                     add_cppflags    => '-DFOOBAR',
                     includeDirs     => [ 'dir1', 'dir2' ]);

    #
    # Fetch and iterate through information about function declarations.
    #
    my $array_ref = $c->get('parsed_fdecls');
    foreach my $func (@$array_ref) {
        my ($type, $name, $args, $full_text, undef) = @$func;
        foreach my $arg (@$args) {
            my ($atype, $aname, $aargs, $arg_text, $array_modifiers) = @$arg;
        }
    }

    #
    # Fetch and iterate through information about #define values w/out args.
    #
    my $macros = $c->get('defines_no_args');
    foreach my $macro_name (keys %$macros) {
        my $macro_text = $macros->{$macro_name};
    }

    #
    # Fetch and iterate through information about #define macros w/args.
    #
    my $macros_with_args = $c->get('defines_args');
    foreach my $macro_name (keys %$macros_with_args) {
        my ($arg_ref, $macro_text) = @{ $macros_with_args->{$macro_name} };
        my @macro_args = @$arg_ref;
    }
Win32::API
2 direct replies — Read more / Contribute
by Guildenstern
on Sep 11, 2000 at 14:07
    Every now and then I find a need to call a function that would be second nature if I were using Visual C++ in a Windows environment. I happened across Win32::API, and was very pleasantly surprised at what the module can do.
    Win32::API allows functions to be imported from Windows DLL files. It also makes the job much easier by making the actual calling of the function incredibly simple. No more trying to do weird type conversions and stupid pointer tricks - just pack your variables and call the function. If the function doesn't take a structure as a parameter, it's even easier - just call it like a normal Perl sub.
    I've had great success using it, and while I haven't benchmarked any results it appears to be quite fast. Coding the function call is simple, too. It takes longer to research the function and its parameters than it does to write the call. Drawbacks: There are two drawbacks to Win32::API that I've noticed so far. One could be easily remedied, while the other is probably not something I should hold my breath for. Here they are:
    1. It would be nice if the documentation listed or gave a link to information about what size standard MS parameters are. It's kind of a pain to have to track down what the difference between a WORD and a DWORD is so you know what to pack.
    2. It would be really nice if there were support for functions that take callbacks. One example I'm thinking of is placing icons in the system tray. To be able to respond to mouse clicks, the function call that places the icon there must specify a callback that gets executed on a mouse click. AFAIK, there is no way to do this yet in Perl, and it may be impossible at the present.

    Bottom Line: Win32::API is a great way to build a Windows based Perl solution without having to resort to an MS programming solution because "there's no module that does X".
IPC::Open3
1 direct reply — Read more / Contribute
by tilly
on Sep 11, 2000 at 01:44
    A basic Unix process starts life with a default place to read data from the outside world (STDIN), a default place to write data (STDOUT), a basic place to write errors to (STDERR), and some parent process who hopefully will be interested in how you died and will catch your return code (0 by default unless you exit or die).

    Perl has many ways of starting processes. The most common are the following:

    1. system: This launches a process which will use your STDIN, STDOUT, STDERR, and returns you the return code. In the process the special variables $! and $? will be set.
    2. Backticks: This launches a process which will use your STDIN and STDERR, but which will return its STDOUT to you; the return code ends up in $?.
    3. open(CMD, "| $cmd"): Open a process which you can write to the STDIN of, which uses your STDOUT and STDERR. The return of open is the process ID. You can collect the return with wait or waitpid.
    4. open(CMD, "$cmd |"): Open a process which you can read the STDOUT of, which uses your STDIN and STDERR. Essentially the same as backticks but you can start processing the data interactively. (This can matter a lot if the command returns large amounts of data.) The return of open is the process ID. You can collect the return with wait or waitpid.
    This should suffice for most purposes. (See perlipc for more on inter-process communication.) But from time to time you need more control. That is where IPC::Open3 becomes useful.

    IPC::Open3 exports a function, open3. This allows you to start a command and choose what it gets for STDIN, STDOUT, and STDERR. For instance you might hand it for STDIN a filehandle reading from /dev/null to supress any attempts on its part to interact. You might want to keep track of STDERR for error reporting. Anything you want.

    As with open, open3 returns the process ID. When the program should have exited you can then call wait or waitpid, check that you collected the right child, then look at $? for the return value.
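    This pattern can be sketched as follows (the child here is just an inline perl one-liner, made up for illustration):

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol qw(gensym);

# Start a child, choosing its STDIN, STDOUT and STDERR explicitly.
my $err = gensym;    # open3 wants a pre-created glob for STDERR
my $pid = open3(my $in, my $out, $err,
                $^X, '-e', 'print "out\n"; print STDERR "err\n"');
close $in;                       # nothing to feed the child

my $stdout = join '', <$out>;    # what the child printed
my $stderr = join '', <$err>;    # its errors, kept separate

waitpid($pid, 0);                # collect the right child
my $rc = $? >> 8;                # its return value
```

    Note that reading STDOUT to completion before STDERR only works when the outputs are small; for anything bigger you need select or IO::Select to avoid deadlock.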

    For an example of real use of this, see Run commands in parallel. Another useful example that would make a good exercise is to write a function that runs a process and suppresses all output unless the return code is non-zero, in which case it prints an error message with information on the command, the return code, and what was printed to STDERR. (I use something like this for cron jobs.)

Mail::POP3Client
2 direct replies — Read more / Contribute
by xjar
on Sep 10, 2000 at 23:36

    Description

    Mail::POP3Client implements an Object Oriented interface to a POP3 server, as described in RFC 1939. It is quite easy to use.

    Why use Mail::POP3Client?

  • You want to easily access a POP3 server with Perl.

    Why NOT use Mail::POP3Client?

  • You are accessing an IMAP server :-).

    Example

    use Mail::POP3Client;

    $pop = new Mail::POP3Client( USER     => "me",
                                 PASSWORD => "mypassword",
                                 HOST     => "pop3.do.main" );
    for( $i = 1; $i <= $pop->Count(); $i++ ) {
        foreach( $pop->Head( $i ) ) {
            /^(From|Subject):\s+/i && print $_, "\n";
        }
    }
    $pop->Close();

    Personal Notes

    Mail::POP3Client is the module I used when writing pemail, a command line MUA. I found the module to be very intuitive and straightforward. Writing pemail and using this module are what really got me into Perl programming. I highly recommend this module for any programs that require POP3 server access.

XML::Parser
3 direct replies — Read more / Contribute
by mirod
on Sep 10, 2000 at 16:06

    Description

    XML::Parser provides ways to parse XML documents.

    • built on top of XML::Parser::Expat, a lower level interface to James Clark's expat library
    • most of the other XML modules are built on top of XML::Parser
    • stream-oriented
    • for each event found while parsing a document a user-defined handler can be called
    • events are start and end tags, text, but also comments, processing instructions, CDATA, entities, element or attribute declarations in the DTD...
    • handlers receive the parser object and context information
    • sets of pre-defined handlers can be used as Styles
    • A companion module, XML::Encoding, allows XML::Parser to parse XML documents in various encodings, besides the native UTF-8, UTF-16 and ISO-8859-1 (latin-1)

    Why use XML::Parser

    • widely used and the first XML module, hence it is very robust
    • you need performance: it is low level, so all modules based on it are necessarily slower
    • you need access to some parsing events that are masked by higher-level modules
    • one of the Styles does exactly what you want
    • you want to write your own module based on XML::Parser

    Why NOT use XML::Parser

    Related Modules

    Besides the modules already mentioned:

    • XML::UM can translate characters between various encodings,
    • XML::Checker is a validating parser that can be dropped in as a replacement for XML::Parser

    Personal comments

    XML::Parser is the basis of most XML processing in Perl. Even if you don't plan to use it directly, you should at least know how to use it if you are working with XML.

    That said, I think it is usually a good idea to have a look at the various modules that sub-class XML::Parser, as they are usually easier to use.

    There are some compatibility problems between XML::Parser version 2.28 and higher and a lot of other modules, most notably XML::DOM. Plus it seems to be doing some funky stuff with UTF-8 strings. Hence I would stick to version 2.27 at the moment.

    Update: the ActiveState distribution currently includes XML::Parser 2.27

    Things to know about XML::Parser

    Characters are converted to UTF-8

    XML::Parser will gladly parse latin-1 (ISO 8859-1) documents provided the XML declaration mentions that encoding. It will convert all characters to UTF-8 though, so outputting latin-1 is tricky. You will need to use Perl's Unicode functions, which have changed recently, so I will postpone detailed instructions until I catch up with them ;--(

    Catching exceptions

    The XML recommendation mandates that when an error is found in the XML, the parser stop processing immediately. XML::Parser goes even further: it displays an error message and then dies.

    To avoid dying wrap the parse in an eval block:

    eval { $parser->parse };
    if( $@) {
        my $error= $@;
        # cleanup
    }

    Getting all the character data

    The Char handler can be called several times within a single text element. This happens when the text includes new lines or entities, or even at random, depending on expat's buffering mechanism. So the real content should actually be built by appending each string passed to Char, and used only in the End handler.

    my $stored_content='';           # global

    sub Start {
        my( $expat, $gi, %atts)= @_;
        process( $stored_content);   # needed for mixed content such as
                                     # <p>text <b>bold</b> more text</p>
        $stored_content='';          # needs to be reset
    }

    sub Char {
        my( $expat, $string)= @_;
        $stored_content .= $string;  # can't do much with it yet
    }

    sub End {
        my( $expat, $gi)= @_;
        process( $stored_content);   # now it's full
        $stored_content='';          # reset here too
    }

    XML::Parser Styles

    Styles are handler bundles. Five styles are defined in XML::Parser; others can be created by users.

    Subs

    Each time an element starts, a sub by that name is called with the same parameters that the Start handler gets called with.

    Each time an element ends, a sub with that name appended with an underscore ("_"), is called with the same parameters that the End handler gets called with.

    Tree

    Parse will return a parse tree for the document. Each node in the tree takes the form of a tag, content pair. Text nodes are represented with a pseudo-tag of "0" and the string that is their content. For elements, the content is an array reference. The first item in the array is a (possibly empty) hash reference containing attributes.

    The remainder of the array is a sequence of tag-content pairs representing the content of the element.
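    For instance, parsing `<doc><p>Hi</p></doc>` with the Tree style would produce a structure shaped like this (hand-written here to show the layout):

```perl
use strict;
use warnings;

# the shape of an XML::Parser Tree-style parse result for
# <doc><p>Hi</p></doc>, written out by hand
my $tree =
  [ doc => [ {},                   # attributes of <doc> (none)
             p => [ {},            # attributes of <p> (none)
                    "0" => "Hi",   # pseudo-tag "0" marks a text node
                  ],
           ],
  ];
```

    Walking such a tree is then just a matter of taking the array two elements at a time: a tag (or "0") followed by its content.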

    Objects

    This is similar to the Tree style, except that a hash object is created for each element. The corresponding object will be in the class whose name is created by appending "::" to the element name. Non-markup text will be in the ::Characters class. The contents of the corresponding object will be in an anonymous array that is the value of the Kids property for that object.

    Stream

    If none of the subs that this style looks for is there, then the effect of parsing with this style is to print a canonical copy of the document without comments or declarations. All the subs receive as their 1st parameter the Expat instance for the document they're parsing.

    It looks for the following routines:

    • StartDocument: called at the start of the parse.
    • StartTag: called for every start tag with a second parameter of the element type. The $_ variable will contain a copy of the tag and the %_ variable will contain attribute values supplied for that element.
    • EndTag: called for every end tag with a second parameter of the element type. The $_ variable will contain a copy of the end tag.
    • Text: called just before start or end tags with accumulated non-markup text in the $_ variable.
    • PI: called for processing instructions. The $_ variable will contain a copy of the PI and the target and data are sent as 2nd and 3rd parameters respectively.
    • EndDocument: called at conclusion of the parse.

    Debug

    This just prints out the document in outline form.

Email::Valid
1 direct reply — Read more / Contribute
by kilinrax
on Sep 09, 2000 at 14:33

    Description

    Checks an email address for rfc822 compliance, and, optionally, can also perform an mx check on the domain.
    It's worth pointing out here again that attempting to check an email address with a regexp is a very bad idea (see merlyn's reaction to one such attempt, or the explanation of it from perlfaq 9).

    Requirements

    Who Should Use It?

    • Anyone who wants a simple and fairly reliable check of an email address submitted via a form over the web.

    Any Bad Points?

    • Not a 100% reliable method of verifying an email address (can't be done, except by sending mail to it).
    • Requests can sometimes take a while to process.

    Example

    #!/usr/bin/perl

    require 5;
    use strict;
    use Email::Valid;
    use vars qw($addr $email);

    if (@ARGV) {
        foreach $email (@ARGV) {
            eval {
                unless ($addr = Email::Valid->address( -address => $email,
                                                       -mxcheck => 1 )) {
                    warn "address failed $Email::Valid::Details check.";
                }
            };
            warn "an error was encountered: $@" if $@;
        }
    } else {
        print <<"EOF";
Usage: $0 [email(s)]
Synopsis: checks email address is rfc822 compliant, and performs an mx check on the domain.
EOF
    }
Data::Dumper
5 direct replies — Read more / Contribute
by geektron
on Sep 08, 2000 at 23:28
    Data::Dumper is in the standard Perl distribution. It's probably the easiest module to use: just issue a 'use Data::Dumper' call, and then either print to STDOUT (with 'print Dumper \@foo') or stick the output into some HTML used by an application (I prefer to wrap it in <pre> tags so the output isn't mangled). Data::Dumper also works keenly with objects, and gives you the class into which your variable is blessed (which makes debugging object relations really quick).
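    A quick sketch of the blessed-object case (the class name and data here are made up):

```perl
use strict;
use warnings;
use Data::Dumper;

# a made-up object to inspect
my $obj = bless { name => 'geektron', posts => [ 23, 42 ] }, 'My::Monk';

my $dump = Dumper($obj);     # the dump shows the bless and the class name
print "<pre>$dump</pre>";    # wrapped in <pre> for HTML output
```

    The output is itself valid Perl, so a dump can be pasted back into code or eval'd to recreate the structure.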
Filter::Handle
2 direct replies — Read more / Contribute
by Adam
on Sep 08, 2000 at 21:51

    This module was originally written here at the Perl Monks Monastery. It provides a simple interface allowing users to tie a filter subroutine to a filehandle. There are many things you could do with this... from labeling STDERR output as such, to easily putting line numbers and timestamps in a logfile.

XML::Writer
5 direct replies — Read more / Contribute
by mirod
on Sep 08, 2000 at 09:35

    Description

    XML::Writer generates XML using an interface similar to CGI.pm. It allows various checks to be performed on the document and takes care of encoding special characters.

    Why use XML::Writer?

    • you are generating XML documents "from scratch"
    • you are used to CGI.pm
    • XML::Writer is quite mature

    Why NOT use XML::Writer?

    • another method is more appropriate
    • you don't like CGI.pm!

    Related modules

    XML::ValidWriter and XML::AutoWriter both aim at emulating the XML::Writer interface:

    • XML::ValidWriter performs some checks on the output document. Notably it checks whether the elements and attributes are declared in the DTD and whether you are closing the appropriate element.
    • XML::AutoWriter automatically generates missing start or end tags, based on the DTD.

    XML::Generator and XML::Handler::YAWriter are two other modules that generate XML.

    Personal notes

    At the moment XML::Writer seems to be the most mature and efficient module for generating XML. Of course a lot of the transformation modules such as XML::Simple, XML::DOM and XML::Twig can also be used.

    Of course plain prints can also be used, but I think that XML::Writer is a lot more convenient and adds some control over the generated XML.

    Example

    #!/bin/perl -w

    use strict;
    use XML::Writer;
    use IO;

    my $doc = new IO::File(">doc.xml");
    my $writer = new XML::Writer(OUTPUT => $doc);

    $writer->startTag("doc", class => "simple");           # tag + att
    $writer->dataElement( 'title', "Simple XML Document"); # text elt
    $writer->startTag( "section");
    $writer->dataElement( 'title', "Introduction",
                          no => 1, type => "intro");
    $writer->startTag( "para");
    $writer->characters( "a text with");
    $writer->dataElement( 'bold', "bold");
    $writer->characters( " words.");
    $writer->endTag( "para");
    $writer->endTag();           # close section
    $writer->endTag();           # close doc
    $writer->end();              # check that the doc has only one element
    $doc->close();               # fixed (was $output->close();) as suggested by the post below
Roman
3 direct replies — Read more / Contribute
by mirod
on Sep 07, 2000 at 10:38

    Description

    Roman is a module for conversion between Roman and Arabic numerals.

    use Roman;

    $arabic = arabic($roman) if isroman($roman);
    $roman  = Roman($arabic);
    $roman  = roman($arabic);

    Why use Roman?

    • to number list item numbers in roman format
    • to display dates in an MPAA approved format

    Why NOT use Roman?

    • if you need Roman numbers above 4000 (or if you don't need roman numbers!)

    Note: the module does not have a Makefile.PL, so you will have to copy it into your Perl module path yourself, which should be something like /usr/lib/perl5/site_perl/5.6.0/. Alternatively you can use ExtUtils::MakeMaker to generate a Makefile:

    perl -e 'use ExtUtils::MakeMaker; WriteMakefile(NAME => "Roman");'

    Personal comments

    Roman is a little module that I found when I had to convert Roman-numbered lists from XML to HTML. Instead of spending half an hour remembering how those guys counted and then writing it myself, it took me 5 minutes to install a generic solution. Cool!

    I guess now with Unicode being available the module could be upgraded to handle more numbers.

    Update: it might look like Dominus does not quite like Roman ("Roman.pm is a new contender for stupidest Perl module ever written") but he is actually talking about a different module, one he wrote himself and apparently never submitted to CPAN, which allows you to write things like $IV+$IV and get VIII as a result.