I use Authorize.Net as a credit card processor, and although they have online reports, I don't have a lot of flexibility in slicing and dicing the information, so I had to write my own report maker. It's not pretty right now, but I won't need it that often and I'll improve it the next time I use it. Feel free to steal as much of it as you like.

I can download all of my transaction information in one of four formats: standard, standard plus, extended, and extended plus. I don't know who chose to break things down like that, or why the standard formats are tab-separated and the extended ones are comma-separated. It looks to me like two programmers who refused to talk to each other implemented the different parts. No big whoop: I can handle either quite easily. (And for the purposes of this script, I did the quick and dirty thing rather than pull out the big CPAN guns).

I want to make a report that shows my total settlements for each day (a settlement is just a charge seen from the merchant's perspective: you charge, and I settle the charge to get the money), broken down into two groups: MasterCard and Visa settlements, and American Express settlements. Each group is handled by a different bank that deposits money in my account, so I can't look at the daily grand total and match that to a deposit on my bank statement.

I also don't see the settlements on the same day I charge the card; it can be up to five days later if there are holidays and weekends close by. Not only that, the deposits don't happen in order: I can settle transactions today and get that money before the money for the transactions I settled yesterday. Furthermore, charges aren't necessarily settled on the same day they're made. The settlements happen near the close of business each day, so a charge made in the evening shows up in the next day's batch of settlements. I can't just total my receipts each day because I don't know which ones will be in which batch. I want to make sure I'm getting all my money and also know what deposits I should see coming up, so I need this report to compare to my bank statement.

And the trouble is just beginning.

The tab-separated format has everything I need save for one thing: the transaction status. It lists every attempted charge, whether it was an authorization, a decline, a void, or something else, but it doesn't record what finally happened to each transaction. I can't create totals from it because I can't tell which charges went through and which were declined or voided. Here's a sample line from the tab-separated file.

Response Code	Authorization Code	Address Verification Status	Transaction ID	Submit Date/Time	Card Number	Expiration Date	Invoice Number	Invoice Description	Total Amount	Method	Action Code	Customer ID	Customer First Name	Customer Last Name	Company	Address	City	State	ZIP	Country	Phone	Fax	Email	Ship-To First Name	Ship-To Last Name	Ship-To Company	Ship-To Address	Ship-To City	Ship-To State	Ship-To ZIP	Ship-To Country	L2 - Tax	L2 - Duty	L2 - Freight	L2 - Tax Exempt	L2 - Purchase Order Number	Routing Number	Bank Account Number
1	151984	Z	708323005	10-Nov-2004 10:18:14 AM CST	XXXX2009	XXXX	2004111001	test of Amex	0.00	A	VOID	brian d	foy	5250 N. Broadway Suite 157	Chicago	IL	60640	USA	0.00	0.00	0.00	No

The comma-separated format knows the transaction status, but it doesn't know the charge type: MasterCard, American Express, or Visa. It boggles me that the reports are missing these fields, but they are. Here's a sample line:

Transaction ID,Transaction Status,Settlement Amount,Settlement Currency,Settlement Date/Time,Authorization Amount,Authorization Currency,Submit Date/Time,Authorization Code,Reference Transaction ID,Transaction Type,Address Verification Status,Card Code Status,Fraudscreen Applied,Recurring Billing Transaction,Partial Capture Status,Card Number,Expiration Date,Bank Account Number,Routing Number,Total Amount,Currency,Invoice Number,Invoice Description,Customer First Name,Customer Last Name,Company,Address,City,State,ZIP,Country,Phone,Fax,Email,Customer ID,Ship-To First Name,Ship-To Last Name,Ship-To Company,Ship-To Address,Ship-To City,Ship-To State,Ship-To ZIP,Ship-To Country,L2 - Tax,L2 - Freight,L2 - Duty,L2 - Tax Exempt,L2 - Purchase Order Number,CAVV Results Code,Reserved1,Reserved2,Reserved3,Reserved4,Reserved5,Reserved6,Reserved7,Reserved8,Reserved9,Reserved10,Reserved11,Reserved12,Reserved13,Reserved14,Reserved15,Reserved16,Reserved17,Reserved18,Reserved19,Reserved20
708323005,Voided,0.00,USD,10-Nov-2004 07:01:18 PM CST,0.00,USD,10-Nov-2004 10:18:14 AM CST,151984,0,Authorization Only,Z - Street Address: No Match -- First 5 Digits of Zip: Match,,Not Applicable,N,Not Applicable,XXXX2009,XXXX,,,0.00,USD,2004111001,test of Amex,brian d,foy,,5250 N. Broadway Suite 157,Chicago,IL,60640,USA,,,,,,,,,,,,,0.00,0.00,0.00,No,

Both formats have a transaction ID (although in different columns, so load_file() takes a column index and is a bit ugly), and I can join the records on that. After reading in each file (nothing fancy yet, but as I use this more, there will probably be some sort of LWP thing to download them) and loading them into a big hash of hashes, I select from the comma-separated records only the ones that have been settled successfully. The choose() routine simply deletes any keys whose sub-key "Transaction Status" is not "Settled Successfully". The merge() routine joins the second-level hash keys based on the transaction ID, which both formats share.

Having done that, and knowing I don't need most of the information from the files, I use prune() to delete any of the sub-keys not in @fields. I don't really need to do this, but I had an easier time looking at Data::Dumper output this way. If I were handling huge files, I would have done this sooner, but I have only several hundred records and the script runs fast enough for me not to care right now.
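Here's roughly the shape of things at that point; the transaction ID and the values are made up for illustration:

use Data::Dumper;

# One merged record after merge() and prune(): keyed on transaction ID,
# holding only the three fields listed in @fields. Values are invented.
my $commas = {
    '700000001' => {
        'Method'               => 'M',    # copied over from the tab file
        'Settlement Amount'    => '16.00',
        'Settlement Date/Time' => '17-Nov-2004 02:19:53 AM CST',
    },
};

print Dumper( $commas );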

Once I have the records I want, I turn the date-time groups into a date format I want to use. All the records settled in a particular batch have the same date-time group (which looks like 17-Nov-2004 02:19:53 AM CST), so I really don't need the time. In munge_dates() I just throw away the time and turn the month name into a number, giving me dates like 2004-11-17 that sort correctly as strings.

Now I have this big data structure sitting around. I have an entry for every successful credit card settlement, and each settlement knows all of its data. I don't care about individual settlements, though: I want daily totals. I need to turn the data structure inside out so I key on the date instead of the transaction ID, and that's what invert() does. The invert() function also throws away extra data, since I only want to know how much I charged and which type of card I charged it to. I end up with a hash of arrays, and each array holds card-type/amount pairs.
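The result looks something like this, again with made-up values:

use Data::Dumper;

# After invert(): dates key lists of [ method, amount ] pairs.
my $dates = {
    '2004-12-10' => [ [ 'M', '16.00' ], [ 'V', '32.00' ] ],
    '2004-12-12' => [ [ 'M', '16.00' ] ],
};

print Dumper( $dates );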

The report_totals() routine turns that hash into a table, and I actually think the table is pretty slick. I wanted to use a format for this, but lexical variables carried the day. I started with a printf format to put the numbers in columns, but sometimes that would print a 0.00. That just clutters the report, so I changed all the format specifiers to strings and made the string empty if the total came out as the string 0.00. Now I see only the important numbers. The first three number columns are totals by card type (which I like to see, but they don't really matter for this problem), the column between the pipes is the total money I'm due for that day, and the last two are the deposit amounts I should see on my bank statement: one separate deposit for each amount in those columns.

Date             M       V       A |   Total |     M+V       A
---------------------------------------------------------------------------
2004-12-10   16.00   32.00         |   48.00 |   48.00

2004-12-12   16.00                 |   16.00 |   16.00

2004-12-13   32.00           16.00 |   48.00 |   32.00   16.00

2004-12-14   16.00                 |   16.00 |   16.00

I print that report, put it next to my printed bank statement, and start matching settlements to deposits. Badda bing, badda boom: when I'm done, some settlements have no check marks next to them. They owe me a bunch of money, and I know the dates they missed. That's one of the reasons I think the online reports aren't that helpful, even though I could just attribute it to incompetence: if you can't easily see what they owe you, you can't easily request your money. Oh well.

#!/usr/bin/perl
use File::Spec;

my $Desktop = "/Users/brian/Desktop";

@ARGV = map { File::Spec->catfile( $Desktop, $_ ) }
    qw( Download20041217-040705.txt Download20041217-034041.txt );

my $commas = load_comma_file( $ARGV[0] );
choose( $commas, 'Transaction Status' => 'Settled Successfully' );

my $tabs = load_tab_file( $ARGV[1] );

merge( $tabs, $commas );

my @fields = ( 'Settlement Amount', 'Method', 'Settlement Date/Time', );

prune( $commas, @fields );

munge_dates( $commas );

my $dates = invert( $commas );

report_totals( $dates );

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

sub merge {
    my( $tabs, $commas ) = @_;

    foreach my $key ( keys %$commas ) {
        $commas->{$key}{'Method'} = $tabs->{$key}{'Method'};
    }
}

sub load_tab_file   { load_file( $_[0], \&read_tab_record,   3 ) }
sub load_comma_file { load_file( $_[0], \&read_comma_record, 0 ) }

sub read_comma_record { read_record( $_[0], qr/,/  ) }
sub read_tab_record   { read_record( $_[0], qr/\t/ ) }

sub trim { $_[0] =~ s/^\s*|\s*$//g; $_[0] }

sub read_record { [ map { trim($_) } split m/$_[1]/, $_[0] ] }

sub choose {
    my $hash = shift;
    my( $field, $value ) = @_;

    foreach ( keys %$hash ) {
        my $subhash = $hash->{$_};
        delete $hash->{$_} unless $subhash->{$field} eq $value;
    }
}

sub munge_dates {
    my $hash = shift;

    foreach ( keys %$hash ) {
        my $date = ( split /\s+/, $hash->{$_}{'Settlement Date/Time'} )[0];
        my( $day, $month, $year ) = split /-/, $date;
        $month = month( $month );

        delete $hash->{$_}{'Settlement Date/Time'};
        $hash->{$_}{'Settlement Date'} =
            sprintf "%4d-%02d-%02d", $year, $month, $day;
    }
}

sub month {
    my %hash = qw( Oct 10 Nov 11 Dec 12 Jan 1 );
    $hash{ $_[0] };
}

sub prune {
    my $hash   = shift;
    my %fields = map { $_, 1 } @_;

    foreach ( keys %$hash ) {
        my $subhash = $hash->{$_};

        foreach ( keys %$subhash ) {
            delete $subhash->{$_} unless exists $fields{$_};
        }
    }
}

sub load_file {
    my( $file, $reader, $pivot ) = @_;

    open my $fh, $file or die "Could not open file: $file\n$!\n";

    my $columns = $reader->( scalar <$fh> );

    my $hash = {};

    while( <$fh> ) {
        my $fields   = $reader->( $_ );
        my $trans_id = $fields->[$pivot];

        $hash->{ $trans_id } ||= {};
        @{ $hash->{ $trans_id } }{ @$columns } = @$fields;
    }

    return $hash;
}

sub invert {
    my $hash  = shift;
    my $dates = {};

    foreach ( keys %$hash ) {
        my $date = $hash->{$_}{'Settlement Date'};
        $dates->{$date} ||= [];

        push @{ $dates->{$date} },
            [ @{ $hash->{$_} }{ 'Method', 'Settlement Amount' } ];
    }

    $dates;
}

sub report_totals {
    my $hash = shift;

    print "\n";
    printf "%-10s %7s %7s %7s | %7s | %7s %7s\n" . "-" x 75 . "\n",
        "Date", "M", "V", "A", "Total", "M+V", "A";

    foreach my $date ( sort keys %$hash ) {
        my %total;

        foreach my $t ( @{ $hash->{$date} } ) {
            $total{ $t->[0] } += $t->[1];
        }

        printf "%10s %7s %7s %7s | %7s | %7s %7s\n\n",
            $date,
            map { my $s = sprintf "%.2f", $_; $s eq '0.00' ? '' : $s }
                @total{ qw( M V A ) },
                $total{M} + $total{V} + $total{A},
                $total{M} + $total{V},
                $total{A};
    }
}
--
brian d foy <bdfoy@cpan.org>

Replies are listed 'Best First'.
Re: Perl, Authorize.Net, and Broken Kneecaps
by idsfa (Vicar) on Dec 17, 2004 at 19:56 UTC

    You are soooo screwed the first time your input CSV has a comma in one of the text fields. In general, a regex is not the right tool for ?SV. Consider rewriting using Text::CSV_XS or Text::xSV, which will handle both your TAB and Comma separated files properly.
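    Something like this minimal sketch would do it, keeping the same read_record() interface so the rest of the script doesn't change; only the sep_char differs between the two formats:

    use Text::CSV_XS;

    # A sketch of read_record() rebuilt around Text::CSV_XS, so quoted
    # fields that contain the separator parse correctly.
    sub make_reader {
        my( $sep ) = @_;
        my $csv = Text::CSV_XS->new( { sep_char => $sep, binary => 1 } );

        return sub {
            my( $line ) = @_;
            chomp $line;
            $csv->parse( $line ) or die "Bad line: $line";
            return [ $csv->fields ];
        };
    }

    my $read_comma_record = make_reader( ',' );
    my $read_tab_record   = make_reader( "\t" );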

    If you want to further abstract your logic from your data, you should probably look into AnyData, which will happily slurp up your files for you. The DBD::AnyData module will even do the JOIN and SELECT on the two tables.
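    A rough, untested sketch of that join; the file names here are invented, and since the real report headers contain spaces, assume they've first been rewritten to SQL-friendly names like trans_id:

    use DBI;

    # Register each download as a table and let SQL do the join and the
    # status filter. File and column names are illustrative only.
    my $dbh = DBI->connect( 'dbi:AnyData:', undef, undef, { RaiseError => 1 } );

    $dbh->func( 'commas', 'CSV', 'comma_report.csv', 'ad_catalog' );
    $dbh->func( 'tabs',   'Tab', 'tab_report.txt',   'ad_catalog' );

    my $rows = $dbh->selectall_arrayref( q{
        SELECT commas.settle_date, tabs.method, commas.amount
        FROM   commas, tabs
        WHERE  commas.trans_id = tabs.trans_id
          AND  commas.status   = 'Settled Successfully'
    } );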


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. -- Cyrus H. Gordon

      In this case, there aren't commas in the data I need. In general, you are right, but this isn't the general case, and treating everything like the general case is usually a lot of wasted effort. I know that the parts of the lines that matter to me won't have the separator character in them.

      I'm not screwed to any degree, even if there are commas in the data. This little script simply makes the report I needed this morning. It's not controlling medical equipment, nuclear reactors, or air traffic. If it breaks, I don't lose any money and nobody dies. People don't lose their jobs, and the sky does not fall. I'm not even committing the minor sin of uploading this to CPAN.

      Perhaps your definition of "screwed" is different from mine, but a text report going a bit wonky just means I do the extra work later, if I need it. For now, I have the report I need. A month from now, if I even remember where I put this script and I need something different, I'll change it.

      Until then, I don't need a perfect or proper script. As Joel Spolsky says in "Hard-Assed Bug Fixing", "Fixing bugs is only important when the value of having the bug fixed exceeds the cost of fixing it." Indeed, he goes on to show the economic disadvantage of too much programming.

      The point of the exercise is not programming, but results. Anything beyond the results is wasted time. That's why, although I considered it (as noted in the original post), I didn't use a module, but I designed the script so that if I change my mind, almost nothing changes. The flow stays the same; some subroutines at the end of the program change how they do things, but not their interface. The rest of the program doesn't know how I parse a line: it just knows that it gets data. I don't need DBI for this simple task, and right now I don't need another module.

      If you want to spend a lot of quality time with your keyboard, that's fine. I prefer to get things done, and sometimes a "proper" program is just wasted time. Instead of doing everything "properly" from the start, design things so you can go back and easily change the program later. Use the simplest solution to a part of the problem, and stop when it works. Move on to the next part of the problem. Stop when that works. When you get the results you need, stop working. I may never use this script again, so any extra work is a waste. If I do use it again and it breaks, then I fix it, but only just in time and as needed.

      I've met a lot of perfectionists, and in my opinion, most of them don't get that much done.

      --
      brian d foy <bdfoy@cpan.org>
        Sometimes I use a module even if the power it brings could be seen as overkill for the solution at hand.

        I put the extra time on the "education" account. By learning something extra while solving a simple problem, I spend the time once; the next time, I have knowledge to build on. In the meantime I've widened my horizons slightly, and I often end up with a more solid solution.

        The other day I had to create a very simple HTML page with a ToC for a couple of Doxygen reports. I could have just printed HTML to a file, but I went with Template Toolkit (which I've used before, though it was some time ago and I needed a refresher). Altogether this took a little longer than printing the HTML directly, but the code was a lot clearer and shorter.
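        A stripped-down sketch of that approach (the report names and the output file are invented for illustration):

        use Template;

        # In real life this list would come from scanning the Doxygen
        # output directories; it's hard-coded here for illustration.
        my @reports = (
            { name => 'libfoo', url => 'libfoo/html/index.html' },
            { name => 'libbar', url => 'libbar/html/index.html' },
        );

        my $template = q{
            <ul>
            [% FOREACH report = reports %]
            <li><a href="[% report.url %]">[% report.name %]</a></li>
            [% END %]
            </ul>
        };

        my $tt = Template->new;
        $tt->process( \$template, { reports => \@reports }, 'toc.html' )
            or die $tt->error;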

        Next time I have to do something similar, I'll get a real good solution in no time.

        /J

        If you want to spend a lot of quality time with your keyboard, that's fine. I prefer to get things done, and sometimes a "proper" program is just wasted time.

        And yet, in the time it took you to write this node, you could have fixed the supposed "bug".

Re: Perl, Authorize.Net, and Broken Kneecaps
by Stevie-O (Friar) on Dec 17, 2004 at 21:29 UTC
    "That's one of the reasons I think the reports aren't that helpful, even though I could just attribute it to incompetence: if you can't easily see what they owe you, you can't easily request your money."

    Never attribute to malice that which is adequately explained by human stupidity :)

    That said, bravo -- excellent work. I have just one suggestion: have you considered using a SQL database (nothing major -- maybe something as simple as SQLite) to manage things? Your script could read one report and generate the rows as INSERT statements, then read the other report and fill in the missing data with UPDATE statements.

    Perl is great for parsing the incoming data (especially given its heterogeneous nature), but SQL servers shine at the data manipulation you're doing (specifically, the 'inversion' you're performing, plus grouping by date).
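    A minimal sketch of that idea with DBD::SQLite (the table and column names are invented for illustration):

    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=settlements.db', '', '',
        { RaiseError => 1 } );

    $dbh->do( q{
        CREATE TABLE settlements (
            trans_id    INTEGER PRIMARY KEY,
            settle_date TEXT,   -- e.g. 2004-12-10
            method      TEXT,   -- M, V, or A
            amount      REAL
        )
    } );

    # One INSERT per comma-separated record, then one UPDATE per
    # tab-separated record to fill in the card type.
    my $insert = $dbh->prepare(
        q{INSERT INTO settlements (trans_id, settle_date, amount) VALUES (?, ?, ?)} );
    my $update = $dbh->prepare(
        q{UPDATE settlements SET method = ? WHERE trans_id = ?} );

    # ... execute $insert for each comma record, $update for each tab record ...

    # The date "inversion" then falls out of a single query.
    my $totals = $dbh->selectall_arrayref( q{
        SELECT settle_date, method, SUM(amount)
        FROM   settlements
        GROUP  BY settle_date, method
        ORDER  BY settle_date
    } );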

    Remember: Perl is a very flexible tool, just like a Swiss Army knife. But if I'm turning a lot of screws, I'm not going to reach for my keychain, even if it does have a mini screwdriver; I'm going to use a "real" screwdriver. Sure, my "real" screwdriver doesn't do anything else, but it's way better suited to turning screws.

    --Stevie-O
    $"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

      At some point when I have a lot of free time, I might consider downloading all the data and storing it in a database, but I'd need a lot of free time to want to do that. It wouldn't take long; it's just not that useful. Of all the things I could do with my time, making my own database for all of this doesn't rank that high on my list. ;)

      I'm not going to need to do fancy selects. Most records have no relationship to other records, and it's very easy to pull up a single record. I already get summary reports about total charges and so on, so I don't need that. Authorize.Net has the basic stuff covered; I just needed something a little different this time. If I had done this with DBI, it would have taken a lot more time and a lot more code. There's not a big gain there.

      The interesting stuff is already in my subscriber database. That's where I can do nifty things, should I get the time. That is a relational database. :)

      And I'm not attributing anything to malice: it's greed. :)

      --
      brian d foy <bdfoy@cpan.org>