in reply to Calculating Completion of Feeds with Varying Volumes

Thank you all for your replies.

GrandFather's solution is interesting and I will give it a try. Definitely did not think of doing a check that way.

The reason I have created this script is to track feeds which are trending in one direction or another at some varying rate - it is difficult to set a constant value to check against. Hence, I have created a neural network for each feed which uses extensive historical data to make fairly accurate predictions of feed volumes for the following day.

To answer your questions, ww:

The significance of marking a feed "completed" is that I can stop worrying about it not being finished on time. I have a similar NN predicting a recv'd time for each feed as well, so that I know when I should start worrying about a feed being overly late and can contact the content provider. There is no automated action that I would risk linking to this script; it simply feeds a webapp that I use for monitoring.

That said, I am willing to risk mistakenly marking a feed "complete." This is very rare due to a feature of the feeds: the majority of the files come in 1 or 2 of the updates. Usually, once I have recv'd these updates, the feed can be marked complete.

Waiting for actual completion is generally not an option due to the feature described above. The final files of the feed typically arrive much, much later than the bulk and I am only concerned with having recv'd that majority of files as it generally guarantees that the rest will follow. So it wouldn't be a small increment to wait =)

Also, I do not control these feeds, nor do I have a way to contact the owners for a total size.

So I was thinking that there must be a way to create some sort of continuous sliding tolerance value which I could use to calculate acceptable "complete" volumes for feeds of all sizes.
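One way such a sliding tolerance could look is sketched below. This is only an illustration, not something proposed in the thread: the function names, the square-root curve, and the `floor`, `ceiling`, and `scale` values are all assumptions that would need tuning against the historical data. The idea is that the allowed shortfall is a fraction of the predicted volume, and that fraction shrinks smoothly as the predicted volume grows, so small feeds get a loose check and large feeds a tight one.

```python
import math


def tolerance(predicted, floor=0.02, ceiling=0.25, scale=50.0):
    """Fraction of the predicted volume that may be missing.

    Assumed curve: starts near `ceiling` for tiny feeds and decays
    smoothly toward `floor` as the predicted volume grows. `scale` is the
    (hypothetical) volume at which the tolerance sits halfway between
    the two bounds.
    """
    if predicted <= 0:
        raise ValueError("predicted volume must be positive")
    return floor + (ceiling - floor) / (1.0 + math.sqrt(predicted / scale))


def complete_threshold(predicted, **kwargs):
    """Smallest received volume that counts as 'complete'."""
    return predicted * (1.0 - tolerance(predicted, **kwargs))


def is_complete(received, predicted, **kwargs):
    """True once the received volume reaches the sliding threshold."""
    return received >= complete_threshold(predicted, **kwargs)


if __name__ == "__main__":
    for predicted in (10, 50, 500, 5000, 50000):
        print(predicted,
              round(tolerance(predicted), 4),
              round(complete_threshold(predicted), 1))
```

With these placeholder defaults, a feed predicted at 50 files is marked complete at roughly 86% received, while one predicted at 5000 files needs roughly 96%. Any other smooth, decreasing curve (logarithmic, exponential decay) would slot into `tolerance` the same way.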

Re^2: Calculating Completion of Feeds with Varying Volumes
by ww (Archbishop) on Mar 29, 2012 at 15:14 UTC
    ++ for your response... even though I still have some issues.
    1. You say "I am only concerned with having recv'd that majority of files as it generally guarantees that the rest will follow." My paranoia/pessimism (about the inclination of complicated processes to fail unexpectedly) tells me that if I don't have the whole package, I may not get it. OTOH, "...generally guarantees...." is likely a fair to good indicator if your "extensive historic data" allows you to infer a stage at which the feed is unlikely to fail.
    2. On the proverbial third hand, why mark a feed "completed" when it's not? You could just as well mark it "Lookin' good, so far at nn%" and report that to your app. And, perhaps even better, you could also use your historic data to call attention to any feed that is failing to satisfy your "likely to succeed" criterion at some stage of reception.

      A notice that one has a potential problem is likely, IMO, to be more useful than a notice that says "All's well on the Western Front."

    GrandFather's approach should be easy to adapt to identifying likely failures.
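    The status-reporting idea above might be sketched roughly as follows. Everything here is hypothetical: the function name, the `slack` margin, and especially `expected_fraction_by_now`, which stands in for whatever the historical data says a healthy feed has usually delivered by this stage of reception.

```python
def feed_status(received, predicted, expected_fraction_by_now, slack=0.10):
    """Return a progress message instead of a binary 'complete' flag.

    `expected_fraction_by_now` (0.0 to 1.0) is assumed to come from the
    historical data; `slack` is an assumed margin, as a fraction of the
    predicted volume, before a lagging feed is flagged.
    """
    if predicted <= 0:
        raise ValueError("predicted volume must be positive")
    pct = 100.0 * received / predicted
    expected_pct = 100.0 * expected_fraction_by_now
    # Flag the feed only when it trails the expectation by more than the slack.
    if pct + 100.0 * slack < expected_pct:
        return (f"AT RISK: {pct:.0f}% received, "
                f"expected about {expected_pct:.0f}% by now")
    return f"Lookin' good, so far at {pct:.0f}%"


if __name__ == "__main__":
    print(feed_status(450, 1000, 0.5))
    print(feed_status(300, 1000, 0.5))
```

    The first call reports a feed that is slightly behind but within the margin; the second is far enough behind to be flagged, which is the kind of notice that calls attention to a potential problem.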