neodon has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm having a debate at work with some coworkers about a component in our system. I'd like to explain how it works with some examples, and get some feedback on how simple, robust, and maintainable you think it is. I'd also appreciate some suggestions for improving the system.

I've changed the names and values so the problem is a little more general.

First, we have a table called Message. It has a column called Status with the following possible values: NOT_SCANNED, NOT_CHECKED, NOT_VERIFIED, QUEUED, COMPLETE, and BROKEN.

NOT_SCANNED means the message hasn't been scanned for viruses. NOT_CHECKED means it hasn't been spell checked, and NOT_VERIFIED means the sender of the message hasn't been verified.

If any of our worker processes has a problem with the message, it will mark it as BROKEN so it can be dealt with later.

When a new message is NOT_SCANNED, a worker checks the message for viruses. A message *must* be virus scanned before it can be checked or verified. After it is scanned, the worker calls $message->set_next_status() so it can update its own status.

The next status after NOT_SCANNED could be either NOT_CHECKED or NOT_VERIFIED, depending on the user's preference for spell checking. If the message needs spell checked, it is marked NOT_CHECKED and the spell checker gets it. If the user does not want spell checking, the message status is marked NOT_VERIFIED.

If the message is marked NOT_CHECKED, the spell checker worker will do its thing and then call $message->set_next_status(), which happens to set the status to NOT_VERIFIED. Messages are always verified, but not always spell checked.

Another worker watches for NOT_VERIFIED messages, and it marks them as QUEUED and puts them in a queue to be verified.

Finally, one more worker dequeues messages, verifies them, and calls $message->set_next_status(), which will mark them as COMPLETE.

Here's some code:

package Message; ... sub is_not_scanned { return shift->status eq 'NOT_SCANNED' } sub is_not_checked { return shift->status eq 'NOT_CHECKED' } sub is_not_verified { return shift->status eq 'NOT_VERIFIED' } sub is_queued { return shift->status eq 'QUEUED' } sub is_complete { return shift->status eq 'COMPLETE' } sub is_broken { return shift->status eq 'BROKEN' } sub is_scanned { my ($self) = @_; return !$self->is_not_scanned && !$self->is_broken; } sub is_checked { my ($self) = @_; return $self->is_scanned && !$self->is_not_checked && !$self->is_broken; } sub set_next_status { my ($self) = @_; return if $self->is_complete || $self->is_broken; my $enable_spell_check = $self->user->preferences->enable_spell_check; my $is_complete = $self->is_not_verified || $self->is_queued; my $is_not_checked = $self->is_not_scanned && $enable_spell_check; my $status = $is_complete ? 'COMPLETE' : $is_not_checked ? 'NOT_CHECKED' : 'NOT_VERIFIED ; $self->update({ status => $status }); return; }

UPDATE:

I've gotten some good responses to this post, and I'd like to implement some of the suggestions below, after asking some more questions.

One simple yet compelling observation by arkturuz is that the status values shouldn't be negated (e.g. NOT_SCANNED vs SCANNED). However, removing the negation from the status values would require rethinking all the various possibilities for them as a message is processed and rewriting most of the code above.

I hadn't realized the design looks like a finite-state machine, and so I am very interested in exploring this approach. Before I do, however, I'd like to ask a follow-up question. I'll post it as a reply too. The question is this:

A message can have multiple independent states, such as being scanned and checked, or verified but broken, or complete but not spell checked. Is it better to keep these states in separate columns?

Perhaps columns like scan_status, check_status, and verify_status with values like PENDING, COMPLETE, NA, BROKEN. These could also be on a separate table like message_status.

Would this be a significant improvement? Does it improve simplicity, robustness, and maintainability?

Replies are listed 'Best First'.
Re: Is this a simple, robust, and maintainable design?
by eyepopslikeamosquito (Archbishop) on Feb 02, 2011 at 20:39 UTC

    Hard to comment without knowing the full context of your system but it sounds like a Finite-state machine. If you haven't coded a finite state machine in Perl before, this article may be helpful.

Re: Is this a simple, robust, and maintainable design?
by jdporter (Paladin) on Feb 02, 2011 at 21:01 UTC

    IMHO, processes (workflows) are better implemented with actual state machines. You have custom code implementing every possible state transition. That's not very scaleable at all. You might check out these modules on CPAN: Workflow, Class-Workflow, Method-Workflow.

    What is the sound of Windows? Is it not the sound of a wall upon which people have smashed their heads... all the way through?
Re: Is this a simple, robust, and maintainable design?
by Jim (Curate) on Feb 02, 2011 at 20:53 UTC
Re: Is this a simple, robust, and maintainable design?
by arkturuz (Curate) on Feb 03, 2011 at 09:15 UTC
    I'd also appreciate some suggestions for improving the system.

    I would suggest reversing the logic of variable names starting with 'NOT_'. It can be quite confusing to follow the logic once the whole system becomes pretty complex. So, why not name your states like 'CHECKED', 'VERIFIED', etc. and functions like 'is_checked', 'is_verified', etc. It's much easier to follow the reasoning, IMO. Notice that you don't have variables like 'NOT_BROKEN' or 'NOT_COMPLETE'.

      'Negating' the NOT states actually implies the next state. The negation of NOT_CHECKED to CHECKED actually makes CHECKED semantically the equivelent of NOT_VERIFIED. I'd change the sense by changing NOT_CHECKED to NEEDS_CHECK to retain the state but to avoid the seeming double negatives scattered through the code.

      True laziness is hard work
        That's pretty insightful. It would be a relatively easy change that would make everything much clearer.
      I absolutely agree with you on this. Having negated attributes and then negating them again in conditions is pretty confusing. Thanks for pointing it out!
Re: Is this a simple, robust, and maintainable design?
by neodon (Novice) on Feb 03, 2011 at 19:59 UTC

    I have some thoughts and questions about ways to improve the design.

    A message can have multiple independent states, such as being scanned and checked, or verified but broken, or complete but not spell checked. Is it better to keep these states in separate columns?

    Perhaps columns like scan_status, check_status, and verify_status with values like PENDING, COMPLETE, NA, BROKEN. These could also be on a separate table like message_status.

    Would this be a significant improvement? Does it improve simplicity, robustness, and maintainability?

      You can think about these in a couple of different ways. One way is to consider each possibility a distinct state such as VERIFIED_BROKEN, ... . That implies separate code to deal with each state.

      An alternative is to consider fundamental states (VERIFIED) and state modifying flags (BROKEN). The code then consists of handlers for each state with conditional code in the state handling code to deal with allowed modifier flags.

      If the flag handling is a small amount of code in each state handler then using flags in a flag column seems sensible. If the exceptional state handling code is significantly different than the unexceptional state code I'd do it as distinct states and use a single column.

      Either of these approaches work well with using a state machine as suggested by others.

      True laziness is hard work