
I'm semi-retired, which means I take care of a client's system of Perl scripts that mostly run without my intervention. I log everything with the excellent Log::Log4perl module, and sometimes I tail those files to keep an eye on the various scripts that run. One group of scripts creates tickets for new orders, and other scripts update these tickets based on what Sage (the accounting system) says.

Eventually, I started to think about understanding the life-cycle of these tickets -- they get created (that's logged in one file), they get updated (logged in a couple of other files), and they get closed (logged in two other files). Could I parse all of the log files and see the life-cycle just by drawing inferences? It's an academic exercise, since all I have to do is query the ticketing system's API about the history of a ticket, but as I said, I'm mostly retired, and I'm still curious.

The lines are like this:

2024/11/28 10:54:04 INFO : Update ticket 425955 to add invoice 802436 tag .. OK
2024/11/28 10:54:05 INFO : Update ticket 425912 to add invoice 802435 tag .. OK
2024/11/28 10:54:06 INFO : Add note to ticket 425912 with info about invoice 802435 .. OK
2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK

So I created an AoH data structure with the filename, a useful regular expression, and an action (create or update). (Because for me, it always starts with a data structure to organize the logic.) But then I realized each log file had different elements that needed collecting. How do I handle that without having to write code for each log file? Can't I just add something clever to my data structure?

Eventually, some of my brain cells told me I needed to use a named capture in the regular expressions to handle this. Other brain cells complained that I'd never used that before, but the first group of brain cells said, Nonsense (or Buck Up, I forget), it's all in the Camel if you just look.

So, when you're capturing stuff in a regexp with a clause like (\d+), that first capture just gets stashed in $1. But you can also name that capture (a feature I never needed until now), like this: (?<ticket>\d+). And you get it out by looking at the magic variable %+, so the ticket value is available as $+{ ticket }. SO COOL!
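As a minimal sketch of the named-capture mechanics described above, the following matches one of the sample log lines. The pattern and the %found variable are illustrative assumptions, not the patterns from the actual scripts.

```perl
use strict;
use warnings;

# Sample line taken from the log excerpt earlier in the post.
my $line = '2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK';

my %found;
if ( $line =~ /Create FD ticket (?<ticket>\d+) for order (?<order>\d+)/ ) {
    # %+ reflects only the most recent successful match in this scope,
    # so copy it if the values are needed after another match runs.
    %found = %+;
    print "ticket=$found{ticket} order=$found{order}\n";
}
```

Named groups are still numbered as well, so in this sketch $1 and $+{ticket} refer to the same captured text.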

I was then able to write a bunch of regular expressions, all with named captures, and collect whatever I needed from the log lines. Then, if a particular element was there, I would add it to the history hash I was building. So one of the AoH entries looked like this:

{
  filename => 'status.log',
  regexp   => qr/Update (?<ticket>\d+) status to (?<status>.+) \.\./,
  action   => 'update'
},
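To show how a table of such entries can drive the parsing without per-file code, here is a rough sketch. The filenames, the two patterns, and the %lines_for stand-in for reading the files are all hypothetical, modelled on the sample log lines rather than on the real configuration.

```perl
use strict;
use warnings;

# Hypothetical table: filenames and patterns are illustrative only.
my @entries = (
    {
        filename => 'create.log',
        regexp   => qr/Create FD ticket (?<ticket>\d+) for order (?<order>\d+) \.\./,
        action   => 'create',
    },
    {
        filename => 'invoice.log',
        regexp   => qr/Update ticket (?<ticket>\d+) to add invoice (?<invoice>\d+) tag \.\./,
        action   => 'update',
    },
);

# Stand-in for opening and reading each file: sample lines keyed by filename.
my %lines_for = (
    'create.log'  => ['2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK'],
    'invoice.log' => ['2024/11/28 10:54:04 INFO : Update ticket 425955 to add invoice 802436 tag .. OK'],
);

my @matches;
for my $entry (@entries) {
    for my $line ( @{ $lines_for{ $entry->{filename} } } ) {
        next unless $line =~ $entry->{regexp};
        my %captured = %+;    # whatever this entry's pattern named
        push @matches, { action => $entry->{action}, %captured };
    }
}

# Print each match with its keys in sorted order.
for my $m (@matches) {
    print join( ', ', map { "$_=$m->{$_}" } sort keys %$m ), "\n";
}
```

The loop never mentions order or invoice by name; each entry's pattern decides which elements show up, which is the point of putting the named captures in the data structure.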

Then, putting stuff into the history hash was this large statement:

$history{ $+{ ticket } }{ $entry->{ action } } = {
                                     date  => $words[0],
                                    'time' => $words[1],
  ( exists ( $+{ order } ) ? (       order => $+{ order } ) : () ),
  ( exists ( $+{ invoice } ) ? (   invoice => $+{ invoice } ) : () ),
  ( exists ( $+{ shipment } ) ? ( shipment => $+{ shipment } ) : () ),
  ( exists ( $+{ scheduled_date } ) ? (
                            scheduled_date => $+{ scheduled_date } ) : () ),
  ( exists ( $+{ status } ) ? (     status => $+{ status } ) : () ),
};

I wanted to do all of this in a single statement, rather than have individual if statements for each possible element.
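A possible variation on that single statement, offered only as a sketch: since %+ lists only the named groups that actually captured something, its contents can be copied in wholesale instead of testing each name. Unlike the statement above, this stores every named capture except ticket, not a fixed list of fields. The pattern, the $action variable, and the sample line are assumptions for illustration.

```perl
use strict;
use warnings;

my %history;
my $action = 'update';    # stands in for $entry->{ action }
my $line   = '2024/11/28 10:54:04 INFO : Update ticket 425955 to add invoice 802436 tag .. OK';
my @words  = split ' ', $line;    # date and time are the first two fields

if ( $line =~ /Update ticket (?<ticket>\d+) to add invoice (?<invoice>\d+)/ ) {
    my %captured = %+;                          # only groups that captured
    my $ticket   = delete $captured{ticket};    # used as the key, not stored
    $history{$ticket}{$action} = {
        date => $words[0],
        time => $words[1],
        %captured,
    };
}
```

The trade-off is that a new named capture in any pattern flows into the history hash automatically, which may or may not be wanted.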

The code runs fine and does what I expect. Named captures are a very cool feature, and they do exactly what I needed. Props to all the smart folks who came up with that idea (and then implemented it). What a cool language.

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

In reply to Perl's hidden depths by talexb
