bigbot has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm developing a script which analyzes financial trades. I want to find out the average number of trades open at the same time. I have the start and end dates of each trade, but I cannot figure out a way to do this properly.

I have found a few ways to compare date ranges and see if they overlap... However the problem is that I think it needs to be recursive.. Because I really need to compare each trade to all the other trades.

For instance, trade #1 might be a very long trade lasting several weeks. During this time dozens of other trades open and close. Using a simple date overlap method, I would find dozens of trades "open at the same time" as trade #1. However this does not help me find the average number of open trades, because these trades all had their own open/close dates which may or may not overlap with the rest of them. All I know for sure is that they all overlap with trade #1 at some point in time.

So I would greatly appreciate anyone's input on this. Again I'm looking for the average number of trades open at one time.

  • Comment on Recursive method to check overlap of many date ranges

Replies are listed 'Best First'.
Re: Recursive method to check overlap of many date ranges
by Perlbotics (Archbishop) on Sep 17, 2011 at 09:51 UTC

    I do not think that recursion is helpful here, though not impossible. First step seems to come up with a proper definition of average number of trades open at the same time (What time? How to deal with inactivity?).

    Maybe you can just re-sample your data like so:

    • Find start time (time_start) of first trade.
    • Find end time (time_end) of last trade.
    • Define a suitable sample interval (interval, e.g 15s).
    • From time_start to time_end in increments of interval do
      • Count the number of open trades at sample point: count[n x interval]
    • Generate mean value from sampled list (maybe ignore samples with zero count).
    • Alternatively: Use a sliding window and reduce further.
    • Potential refinement: Ignore some of the first and last samples (transients).

    Update: Instead of the synchronous sampling method suggested above, count parallel trades asynchronously at open/close times (events) - including/excluding the opened/closed trade respectively. You might weight the count with its duration (relative to observation period) to get an average value. Here, duration of count might differ from duration of given trade since it is the time span between adjacent events.

    Example:

    • Trade#1: open=10:10/close=11:15/duration=01:05;
    • Trade#2: open=10:12/close=10:18/duration=00:06;
    • Trade#3: open=10:15/close=10:26/duration=00:11;
    =Asynchronous Method= =Sampling Method= Time Action Count Duration Weight Count 15s-Samples ----------------------------------------------- ------------------- 10:10 open#1 1 2 2/65 1 8 10:12 open#2 2 3 3/65 2 12 10:15 open#3 3 3 3/65 3 12 10:18 close#2 2 8 8/65 2 32 10:26 close#3 1 49 49/65 1 196 11:15 close#1 0 - - - - ----------------------------------------------- -------------------
    Observation period: 10:10-11:15 = 65 (Minutes) = 260 15s-samples
    Asynchronous Method: Weighted sum (count * weight) = 82/65; Events = 6; Duration = 65; Average = 82/65 = 1.26
    Sampling Method: Sum = 328; Samples = 260; Duration = 260; Average = 328/260 = 1.26
    (Pardon me for leaving out the units...)

    Seems, the asynchronous method is more efficient (same result, lesser computations).

      Thanks for the responses. I should have been more specific regarding what I meant when I said the "average number of open trades".

      I do not care if there are no trades at all open. I also do not care about how long a trade is open. If two trades are open at the same time even for 1 minute I consider that two trades open at once. So I'm not exactly sure the best calculation to use.. Perlbotics your asynchronously at open/close times seems to be what I'm looking for. So far the testing shows that it's pretty accurate. Thanks! Let me know if you have any update considering the definition I presented for average number of open trades.

Re: Recursive method to check overlap of many date ranges
by LanX (Saint) on Sep 17, 2011 at 21:24 UTC
    Simple math: The average number of overlaps is the sum of all interval length divided by the complete observation period.

    If not, then please give example data demonstrating what you want.

    Cheers Rolf

Re: Recursive method to check overlap of many date ranges
by choroba (Cardinal) on Sep 17, 2011 at 09:40 UTC
Re: Recursive method to check overlap of many date ranges
by mshipper (Novice) on Mar 30, 2012 at 17:14 UTC
    I have some code on CPAN that does this.. I think.

    The example is take from the Data::Range::Compare::Stream::CookBook and is intended to compute exact downtime of a bank branch location when all 3 vnps are offline.

    The concept is the same as computing network down time when there are multiple redundant links. You just want to know where the overlap is and how long each overlap is, then average the time out based on your current data.

    Update: I found POSIX mktime was faster by a land slide than DateTime. Output is as follows:
    Total Number of Trades: 2339
    Total Trade Overlaps: 8034
    Average Trade Overlap time in seconds: 45
    Min Tade overlap time in seconds: 0
    Max Tade overlap time in seconds: 918
    Min number of overlapping trades: 2
    Max number of overlapping trades: 5
    Number of trades that did not overlap: 1222
    Average trade time for ranges that did not overlap: 171
    

    The input format for each file was as follows.

    20120228162825 - 20120228164406
    20120228170024 - 20120228171347
    20120228172137 - 20120228173503
    20120228173630 - 20120228174720
    20120228180047 - 20120228180404
    20120228181045 - 20120228181901
    20120228183436 - 20120228183622
    20120228185051 - 20120228185807
    20120228190808 - 20120228192117
    20120228193517 - 20120228194152
    20120228194243 - 20120228195755
    20120228195826 - 20120228200730
    20120228201307 - 20120228201534
    20120228202250 - 20120228203621
    20120228204718 - 20120228205709
    20120228210425 - 20120228211237
    20120228211427 - 20120228212922
    20120228213936 - 20120228214346
    20120228215321 - 20120228215727
    20120228220610 - 20120228221833
    20120228223125 - 20120228224514
    20120228225429 - 20120228230900
    20120228232155 - 20120228232426
    20120228233838 - 20120228234145
    20120228234334 - 20120228235752
    #!/usr/bin/perl use strict; use warnings; use POSIX qw(mktime); use lib qw(../lib); use Data::Range::Compare::Stream::Iterator::File; use Data::Range::Compare::Stream::Iterator::Compare::Asc; use Data::Range::Compare::Stream::Iterator::Consolidate::OverlapAsColu +mn; sub parse_line { my ($line)=@_; my $ref=[$line=~ /(\d+)/g]; foreach my $date (@$ref) { my @info=unpack('a4a2a2a2a2a2',$date); $info[0] -=1900; $info[1] -=1; $date=mktime($info[5],$info[4],$info[3],$info[2],$info[1],$info[0] +, 0, 0, 0); } return $ref; } my $cmp=new Data::Range::Compare::Stream::Iterator::Compare::Asc; my @it_list; foreach my $file (qw(posix_time_a.src posix_time_b.src posix_time_c.sr +c posix_time_d.src posix_time_e.src)) { my $iterator=new Data::Range::Compare::Stream::Iterator::File( parse_line=>\&parse_line, filename=>$file, NEW_FROM=>'Data::Range::Compare::Stream::PosixTime', ); # save our file iterator so we can figure out how many lines were in + each file push @it_list,$iterator; my $con=new Data::Range::Compare::Stream::Iterator::Consolidate::Ove +rlapAsColumn($iterator,$cmp); $cmp->add_consolidator($con); } my $total=0; my $time=0; my $non_overlaps=0; my $non_time=0; my $min_overlap=undef; my $max_overlap=0; my $max_overlap_count=0; my $min_overlap_count=undef; while($cmp->has_next) { my $result=$cmp->get_next; next if $result->is_empty; if($result->get_overlap_count>1) { my $overlap_time=$result->get_common->time_count; $max_overlap_count=$result->get_overlap_count if $max_overlap_coun +t < $result->get_overlap_count; if(defined($min_overlap_count)) { $min_overlap_count=$result->get_overlap_count if $min_overlap_co +unt > $result->get_overlap_count; } else { $min_overlap_count=$result->get_overlap_count; } $total +=$result->get_overlap_count; $time +=$overlap_time; $max_overlap=$overlap_time if $max_overlap < $overlap_time; if(defined($min_overlap)) { $min_overlap=$overlap_time if $min_overlap > $overlap_time; } else { $min_overlap=$overlap_time; } } else { $non_overlaps++; $non_time +=$result->get_common->time_count; } } my $total_trades=0; foreach my $it (@it_list) { $total_trades +=$it->get_pos; } print "Total Number of Trades: $total_trades\n"; print "Total Trade Overlaps: $total\n"; print "Average Trade Overlap time in seconds: ",int($time/$total),"\n" +; print "Min Tade overlap time in seconds: $min_overlap\n"; print "Max Tade overlap time in seconds: $max_overlap\n"; print "Min number of overlapping trades: $min_overlap_count\n"; print "Max number of overlapping trades: $max_overlap_count\n"; print "Number of trades that did not overlap: $non_overlaps\n"; print "Average trade time for ranges that did not overlap: ",int($non_ +time/$non_overlaps),"\n"; { package Data::Range::Compare::Stream::PosixTime; use strict; use warnings; use POSIX qw(strftime); use base qw(Data::Range::Compare::Stream); use constant NEW_FROM_CLASS=>'Data::Range::Compare::Stream::PosixTim +e'; sub format_range_value { my ($self,$value)=@_; strftime('%Y%m%d%H%M%S',localtime($value)); } sub range_start_to_string { my ($self)=@_; $self->format_range_value($self->range_start); } sub time_count { my ($self)=@_; $self->range_end - $self->range_start } sub range_end_to_string { my ($self)=@_; $self->format_range_value($self->range_end); } 1; }
Re: Recursive method to check overlap of many date ranges
by limzz (Novice) on Sep 21, 2011 at 19:41 UTC

    Average number of trades open at the same time per what? I'll guess day? Think about what an average is. You need your data points and the quantity of data. Your data is trades open per time increment, your quantity is the number of increments (e.g. if you were looking at the data over a year, then 365 days). Now the problem is a pretty easy one. You could do it many different ways. Cycle through your time increments, for each just see how many trades were open.