I have large piles of code lying around that I have written over the past couple of years that do nothing more than scrape embedded Web servers in printers to retrieve various bits of information: the state of the toner cartridges, the firmware revision, the serial number and other odds and ends.

I'm planning on tidying all this up and releasing it on CPAN. Hopefully it'll be elegant (and simple) enough that other people will want to contribute their own code for dealing with other printers. And that would be Good. I posted a message to module-authors@ the other day and it looks like Printer::Status is an acceptable namespace to use. A particular printer might be dealt with by the Printer::Status::Acme::x200dn module.

There are a couple of issues I need to sort out and so I'm looking for input.

Firstly, different printers report different things. Some will tell you the number of pages printed to date, others will give you the serial number, others the amount of toner left in the cartridge. What is more, when asked about toner cartridges, some printers will give you the amount of toner left in increments of 25%. Others will tell you that x thousand pages have been printed from it, and that an estimated y thousand more can be printed, based on a historical average ink coverage of z percent.

Another issue, though one I'm not overly worried about, is that some printers offer only horrid Java applet interfaces, which makes scraping them impossible. If the printer has an SNMP module installed, though, you can get at the information that way. Either way, one needs to be able to specify favoured or alternate protocols for getting at the information.

Another hassle is simply dealing with the acquisition of information. I currently have scripts that use LWP::Parallel to fetch things from all printers at once. This is great when you want to interrogate a farm of printers and not have it take too long. One more thing: different types of information appear on different pages. One needs to specify what one wants to fetch so that the acquirer can figure out which pages it needs to load.
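
To make that last point concrete, here is a minimal sketch of how an acquirer might work out which pages to load. It assumes each printer module keeps a table mapping attribute names to the page that carries them; the attribute names, the paths and the pages_needed function are all invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical table: which page of the embedded Web server carries
# which attribute. Paths and names are invented for this sketch.
my %page_for = (
    serial_number     => '/info.html',
    firmware_revision => '/info.html',
    toner             => '/supplies.html',
    page_count        => '/usage.html',
);

# Reduce a list of wanted attributes to the distinct pages to fetch,
# preserving first-seen order. An unknown attribute is a caller error.
sub pages_needed {
    my @wanted = @_;
    my (%seen, @pages);
    for my $attr (@wanted) {
        my $page = $page_for{$attr}
            or die "unknown attribute '$attr'\n";
        push @pages, $page unless $seen{$page}++;
    }
    return @pages;
}

my @pages = pages_needed(qw[serial_number firmware_revision toner]);
print "$_\n" for @pages;
```

Two attributes that live on the same page cost only one fetch, which matters when a whole farm of printers is being interrogated.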

I can think of a couple of ways of dealing with this. Each printer module could have a method (say, attrs) that, when called, returns an array of method names to call, in order to iterate over all that the module has to offer:

    for my $attr ( @{ $printer->attrs() } ) {
        print $printer->$attr();
    }

But that won't quite cut it for printers with sophisticated toner details (although someone on the m-a@ list pointed to Hash::AsObject which may do the trick). But if anyone can point to prior art or Another Way To Do It, I'm all ears.
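
One possible way around the toner problem, sketched under the assumption that a "sophisticated" attribute simply returns a hash reference instead of a scalar. The My::Printer class, its canned values and the render helper are hypothetical and stand in for a real printer module.

```perl
use strict;
use warnings;

# Hypothetical printer class with canned data, for illustration only.
package My::Printer {
    sub new           { return bless {}, shift }
    sub attrs         { return [qw(serial_number toner)] }
    sub serial_number { return 'ABC123' }

    # A structured attribute: returned as a hash reference.
    sub toner {
        my %detail = (
            pages_printed   => 4000,
            pages_remaining => 2000,
            coverage_pct    => 5,
        );
        return \%detail;
    }
}

# Render one attribute value: plain scalars as they are, hash
# references as sorted key=value pairs.
sub render {
    my ($value) = @_;
    return $value unless ref($value) eq 'HASH';
    my @parts;
    push @parts, "$_=$value->{$_}" for sort keys %$value;
    return join ', ', @parts;
}

my $printer = My::Printer->new;
my %report;
for my $attr ( @{ $printer->attrs } ) {
    $report{$attr} = render( $printer->$attr() );
    print "$attr: $report{$attr}\n";
}
```

The caller still iterates over attrs() as before, and only has to check ref() to cope with the richer values.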

I think this is one of those instances where new() doesn't buy you much as a constructor. So consider the following:
    my $want = [
        { printer1 => Printer::Status::Acme::x1000->new( qw[serial_number bios_revision] ) },
        { printer2 => Printer::Status::Acme::x1200->new( qw[firmware_revision toner] ) },
        { printer3 => Printer::Status::Acme::c800dn->snmp( qw[mac_address serial_number] ) },
    ];
    my %result = Printer::Status->as_hash($want);
    # or
    my @result = Printer::Status->as_array($want);

Sometimes you want to get the results back and refer to them by name, other times in the order in which you specified them (regardless of how Printer::Status actually went about fetching all the information). The constructor you choose determines how you get things back, somewhat like DBI's fetchrow_arrayref and fetchrow_hashref.
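
A minimal sketch of how as_hash and as_array could share one collection step and differ only in how they return the results. My::Status and My::Printer are hypothetical stand-ins for the proposed modules, and fetch returns canned values instead of talking to a printer.

```perl
use strict;
use warnings;

package My::Printer {
    # Remember which attributes were asked for; nothing is fetched yet.
    sub new {
        my ($class, @attrs) = @_;
        return bless { want => [@attrs] }, $class;
    }

    # Stand-in for the real acquisition: returns canned values.
    sub fetch {
        my ($self) = @_;
        my %data;
        $data{$_} = "value-of-$_" for @{ $self->{want} };
        return \%data;
    }
}

package My::Status {
    # Shared worker: a list of [name, data] pairs in request order.
    sub _collect {
        my ($class, $want) = @_;
        my @pairs;
        for my $entry (@$want) {
            my ($name, $printer) = %$entry;    # each entry holds one pair
            push @pairs, [ $name, $printer->fetch ];
        }
        return @pairs;
    }

    # Results keyed by the name given in the request.
    sub as_hash {
        my ($class, $want) = @_;
        my %by_name;
        $by_name{ $_->[0] } = $_->[1] for $class->_collect($want);
        return %by_name;
    }

    # Results in the order of the request.
    sub as_array {
        my ($class, $want) = @_;
        my @in_order;
        push @in_order, $_->[1] for $class->_collect($want);
        return @in_order;
    }
}

my $want = [
    { office => My::Printer->new(qw[serial_number toner]) },
    { lobby  => My::Printer->new(qw[firmware_revision]) },
];

my %result = My::Status->as_hash($want);
my @result = My::Status->as_array($want);
print $result{office}{toner}, "\n";
print $result[1]{firmware_revision}, "\n";
```

Note that as_hash silently keeps only the last entry if two entries share a name, so names in a request need to be unique.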

The ugliest thing is the division of duties between the printer module and the acquirer. An object starts out half-constructed. The acquirer then asks it what it needs in order to fill in the blanks, and the object answers with a list of URLs. The acquirer fetches them (in parallel with a number of other printers, possibly in parallel with itself) and then hands the results back to the printer object. At this point the printer object is fully constructed.
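
The two-phase handshake described above can be sketched as follows. Everything here is an assumption made for illustration: the My::Printer and My::Acquirer classes, the urls_needed and fill method names, the page layout and the regexes. The fetcher is a plain code reference returning canned pages, standing in for whatever HTTP layer would really be used.

```perl
use strict;
use warnings;

package My::Printer {
    # Hypothetical attribute => [path, regex] table for one model.
    my %source = (
        serial_number => [ '/info',     qr/Serial:\s*(\S+)/ ],
        toner         => [ '/supplies', qr/Toner:\s*(\d+)%/ ],
    );

    # Half-constructed: we know what is wanted, but hold no data yet.
    sub new {
        my ($class, $host, @attrs) = @_;
        return bless { host => $host, want => [@attrs], data => {} }, $class;
    }

    sub _url { my ($self, $path) = @_; return "http://$self->{host}$path" }

    # Phase 1: tell the acquirer which URLs are needed, each one once.
    sub urls_needed {
        my ($self) = @_;
        my (%seen, @urls);
        for my $attr (@{ $self->{want} }) {
            my $url = $self->_url( $source{$attr}[0] );
            push @urls, $url unless $seen{$url}++;
        }
        return @urls;
    }

    # Phase 2: receive the fetched pages and extract the values.
    sub fill {
        my ($self, $content_for) = @_;
        for my $attr (@{ $self->{want} }) {
            my ($path, $re) = @{ $source{$attr} };
            my $page = $content_for->{ $self->_url($path) };
            my ($value) = defined $page ? $page =~ $re : ();
            $self->{data}{$attr} = $value;
        }
        return $self;
    }

    sub get { my ($self, $attr) = @_; return $self->{data}{$attr} }
}

package My::Acquirer {
    # $fetch is a code ref standing in for the real HTTP layer.
    sub run {
        my ($class, $fetch, @printers) = @_;
        my %content_for;
        for my $printer (@printers) {
            for my $url ($printer->urls_needed) {
                $content_for{$url} //= $fetch->($url);
            }
        }
        $_->fill(\%content_for) for @printers;
        return @printers;
    }
}

# Canned pages instead of a network.
my %fake_web = (
    'http://printer1.example/info'     => 'Model: x1000 Serial: ABC123',
    'http://printer1.example/supplies' => 'Toner: 75%',
);
my $printer = My::Printer->new('printer1.example', qw[serial_number toner]);
My::Acquirer->run( sub { $fake_web{ $_[0] } }, $printer );
print $printer->get('serial_number'), ' ', $printer->get('toner'), "\n";
```

Because the fetcher is passed in, the printer class never needs to know whether pages arrive serially, in parallel, or from a test fixture.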

On the other hand, if you only have a single printer, you don't care to jump through all those hoops. Instead you just create a printer object and try to access its fuser_temperature method. The object, seeing that it has been called upon to provide some information it does not yet have, then creates its own temporary Printer::Status object and uses it to fetch what it needs, thereby completing its own initialisation. That A can use B, and B can use A, seems a little... odd.
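
One way to take some of the oddness out of the single-printer case, as a sketch only: the object is handed a loader when it is created and calls it the first time any attribute is requested. The My::LazyPrinter class, the loader argument and the canned values are all hypothetical; in a real module the loader would wrap the temporary acquirer.

```perl
use strict;
use warnings;

package My::LazyPrinter {
    sub new {
        my ($class, %args) = @_;
        # 'loader' is a code ref standing in for a temporary acquirer.
        return bless { loader => $args{loader}, data => undef, loads => 0 }, $class;
    }

    # Fetch everything on first use, then serve from the cache.
    sub _data {
        my ($self) = @_;
        unless ( $self->{data} ) {
            $self->{data} = $self->{loader}->();
            $self->{loads}++;
        }
        return $self->{data};
    }

    sub fuser_temperature { my ($self) = @_; return $self->_data->{fuser_temperature} }
    sub serial_number     { my ($self) = @_; return $self->_data->{serial_number} }
}

my $printer = My::LazyPrinter->new(
    loader => sub {
        my %canned = ( fuser_temperature => 180, serial_number => 'ABC123' );
        return \%canned;
    },
);
print $printer->fuser_temperature, "\n";
print $printer->serial_number, "\n";
```

The printer object depends only on "something callable that returns data", so the circular relationship between the two classes becomes a one-way dependency.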

Note that I do think it's also pointless to try and force all printers into a common mould (in terms of what information they can return). The stuff offered is just too different. Of course, I hope that I'll be able to lay down guidelines of some sort, so that if at least two printers offer the same piece of information, it will have the same name (e.g., I'd like to avoid serial_no versus serial_number).

In the end, if you're interested in a printer, you shall have to consult the manpage to find out what it offers.

So, if anyone has pointers to module hierarchies that do something similar, or observations (yeah but if you do it like that you won't be able to do this), I'll be very interested to hear what you have to say.

I'd like to avoid building one only to throw it away, so if the good monks can advise me, maybe the first version will be a winner.

Thanks for any clues I can use.

- another intruder with the mooring of the heart of the Perl

Re: Strategies and suggestions for a printer status hierarchy
by dragonchild (Archbishop) on Dec 21, 2004 at 14:59 UTC
    In other words, you want to provide something along the lines of DBI's catalog methods. So, what I would do is the following:
    • Define a mechanism for creating objects without having to know the whole class name. Something like DBI's connect where the DBD is figured out from the DSN. I would suggest using various items:
      • Manufacturer
      • Model
      • IP address / Windows address
      Remember - you still have to talk to this printer, so you need to be able to specify which physical printer the object represents. Also, are there ways to auto-detect a printer's manufacturer and model if you know its IP address?
    • Define the interface by which you interrogate the object. Instead of using your attrs() method, I would actually have the Printer::Status::Acme::x1000 class provide methods. So, that way, you can say something like
      if ($printer->can( 'serial_number' )) {
          print "Serial Number: ", $printer->serial_number, $/;
      }
    • Provide a set of guidelines for authors, similar to the DBD-author guide that tbunce created. You also will want a mailing list. Then, when someone wants to write one of these, they will come to the mailing list and be educated in the ways of the architecture. That's how you make sure you don't have serial_no() vs. serial_number() vs. ser_no().
    • If there are multiple ways to access the information, the PSD (printer-status definition) author should be able to support them, but provide a useful default. Then, the user can specify in the connect() clause which connection method.

    You really need to study how DBI did things. I'd also suggest using the namespaces PSI (printer-status interface), PSD (printer-status definition), and PSIx (other printer-status modules).
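
A rough sketch of what such a DBI-style connect might look like. The My::PSI and My::PSD::Acme::x1000 packages, the argument names and the default_protocol method are assumptions made for illustration, not an existing API. A real implementation would load the driver module from disk; here the driver lives in the same file to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical driver class, playing the role a DBD plays for DBI.
package My::PSD::Acme::x1000 {
    sub default_protocol { return 'http' }

    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }

    sub serial_number { return 'ABC123' }    # canned value for the sketch
}

package My::PSI {
    # Build the driver class name from manufacturer and model, then
    # construct it; the caller may override the default protocol.
    sub connect {
        my ($class, %args) = @_;
        my $driver = "My::PSD::$args{manufacturer}::$args{model}";
        $driver->can('new')
            or die "no driver for $args{manufacturer} $args{model}\n";
        return $driver->new(
            address  => $args{address},
            protocol => $args{protocol} // $driver->default_protocol,
        );
    }
}

my $printer = My::PSI->connect(
    manufacturer => 'Acme',
    model        => 'x1000',
    address      => '192.0.2.10',
    protocol     => 'snmp',
);

# Callers probe for an attribute with can() before asking for it.
if ( $printer->can('serial_number') ) {
    print "Serial Number: ", $printer->serial_number, "\n";
}
print "fuser_temperature not offered\n"
    unless $printer->can('fuser_temperature');
```

The user names only the manufacturer, model and address; which class does the work, and over which protocol by default, is the driver author's business.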

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Strategies and suggestions for a printer status hierarchy
by belg4mit (Prior) on Dec 21, 2004 at 21:27 UTC
    Image::Info has to cope with a similar problem and handles it the other way: everything available is published, and you try to standardize the attribute names.
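
As a sketch of that "publish everything" idea applied to printers (this is not Image::Info's own code or API): everything the printer reports goes into one flat hash, and an alias table renames the keys that have an agreed standard name. The alias table, the key names and the standardise function are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical alias table: vendor-specific key => canonical name.
my %canonical = (
    serial_no => 'serial_number',
    ser_no    => 'serial_number',
    fw_rev    => 'firmware_revision',
);

# Publish everything the printer reported, renaming known aliases
# and passing unknown keys through untouched.
sub standardise {
    my (%raw) = @_;
    my %info;
    for my $key ( keys %raw ) {
        my $name = $canonical{$key} // $key;
        $info{$name} = $raw{$key};
    }
    return \%info;
}

my $info = standardise( ser_no => 'ABC123', fw_rev => '1.4', drum_life => '80%' );
print "$_: $info->{$_}\n" for sort keys %$info;
```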

    --
    I'm not belgian but I play one on TV.