raghuprasad241 has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am trying to find modules for some advanced data mining projects in Perl. In Python there is the itertools module, whose functions (like pairwise) do this kind of task for me. However, I need to do this in Perl, so I am trying to avoid reinventing the wheel. I will describe a sample problem to give you an idea of what I need, so you can make recommendations. I have a data set like the one below:

10/01/2016 99.71
10/02/2016 99.53
10/03/2016 100.10
10/04/2016 100.96
10/05/2016 100.99
10/06/2016 101.38
10/07/2016 100.74
10/08/2016 100.70
10/09/2016 99.88
10/10/2016 97.62
10/11/2016 97.55
10/12/2016 99.12
Now I need to find 5 consecutive records whose values (the field next to the date) are either increasing or decreasing. For example, I need the following 2 result sets in my output:
10/02/2016 99.53
10/03/2016 100.10
10/04/2016 100.96
10/05/2016 100.99
10/06/2016 101.38

10/07/2016 100.74
10/08/2016 100.70
10/09/2016 99.88
10/10/2016 97.62
10/11/2016 97.55
Please note that I am looking for exactly 5 consecutive records with either an increasing or a decreasing pattern; any other number of consecutive records with those patterns should not qualify.

I know this is a little advanced for me to write from scratch, given my Perl experience. I would appreciate any input on modules I could use to accomplish this task, or at least any ideas for writing the code efficiently, given that each file contains a high volume of records.

Again, I am not expecting one of you to write the code for me, although I would not complain about it :-)

Thanks! Monk

Re: Recommendations for perl modules to work on data sets ?
by choroba (Cardinal) on Oct 21, 2016 at 15:13 UTC
    I don't know what pairwise does in Python, but it sounds similar to pairs in List::Util. See also List::MoreUtils for pairwise and many other functions. On the other hand, I don't see how these functions can help you in getting the monotonic sequences.
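    A minimal sketch of the List::Util route, assuming a List::Util recent enough to export pairs (1.29 or later; recent perls ship one). pairs() fits the flattened "date value date value" layout from the question, but it builds non-overlapping pairs, so comparing each record with its neighbour (what Python's itertools.pairwise gives you) still takes an index walk. The variable names are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(pairs);

# Part of the question's data, flattened onto one line: date, value, date, value, ...
my $line = '10/01/2016 99.71 10/02/2016 99.53 10/03/2016 100.10';

# pairs() chunks a list into non-overlapping two-element array refs.
my @records = pairs(split ' ', $line);    # [date, value], [date, value], ...

# pairs() does not yield overlapping neighbours (r0, r1), (r1, r2), ...,
# so walk the indices to compare each record with the one before it:
# -1 = down, 0 = same, 1 = up.
my @steps = map { $records[$_][1] <=> $records[$_ - 1][1] } 1 .. $#records;

print "$_->[0] $_->[1]\n" for @records;
print "@steps\n";    # prints "-1 1"
```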

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Recommendations for perl modules to work on data sets ?
by talexb (Chancellor) on Oct 21, 2016 at 15:15 UTC

    There may be a module that could help you here, but my preference would be to just roll up my sleeves and write some Perl to do this. It sounds like there are two rules:

    1. Starting at some point in the list (and this point may progress along the list), elements 0 to 4 must be monotonically increasing; and
    2. element 5 must be less than element 4.
    And similarly for the other direction.
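    A sketch of those two rules as a single pass, so that only six records are held in memory at once, which may matter for large files. It checks both directions, treats end of input as ending a run (an assumption; the rules above don't cover the last records), and the sub names scan_runs_of_five and is_monotonic are made up for illustration. Taken literally, the rules also accept the last five records of a longer run: the sample's decline really begins at 10/06/2016, yet 10/07/2016 to 10/11/2016 qualifies, which is the result the question asks for.

```perl
use strict;
use warnings;

# True if every step in @recs goes in direction $dir (1 = up, -1 = down).
# Each record is an array ref: [date, value].
sub is_monotonic {
    my ($dir, @recs) = @_;
    for my $i (1 .. $#recs) {
        return 0 unless ($recs[$i][1] <=> $recs[$i - 1][1]) == $dir;
    }
    return 1;
}

# $next_record returns the next [date, value], or undef at end of input;
# $on_match is called with the five records of each qualifying run.
sub scan_runs_of_five {
    my ($next_record, $on_match) = @_;
    my @win;    # slots 0..4: candidate run (rule 1); slot 5: the record after it

    my $check = sub {
        my @run = @win[0 .. 4];
        for my $dir (1, -1) {
            next unless is_monotonic($dir, @run);
            # Rule 2: the record after the run must go the other way.
            # At end of input there is no such record, so the run counts.
            next if @win == 6 && ($win[5][1] <=> $win[4][1]) != -$dir;
            $on_match->(@run);
        }
    };

    while (defined(my $rec = $next_record->())) {
        push @win, $rec;
        next if @win < 6;
        $check->();
        shift @win;    # slide the window forward by one record
    }
    $check->() if @win == 5;    # the last five records have no successor
}

# The sample data from the question; with a real file, the iterator
# would read from a filehandle instead of shifting an array.
my @lines = (
    '10/01/2016 99.71',  '10/02/2016 99.53',  '10/03/2016 100.10',
    '10/04/2016 100.96', '10/05/2016 100.99', '10/06/2016 101.38',
    '10/07/2016 100.74', '10/08/2016 100.70', '10/09/2016 99.88',
    '10/10/2016 97.62',  '10/11/2016 97.55',  '10/12/2016 99.12',
);

my @found;
scan_runs_of_five(
    sub { my $l = shift @lines; defined $l ? [ split ' ', $l ] : undef },
    sub { push @found, [@_]; print "$_->[0] $_->[1]\n" for @_; print "\n" },
);
```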

    This would probably be a good interview question .. there are lots of different ways to do it.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Recommendations for perl modules to work on data sets ?
by tybalt89 (Monsignor) on Oct 21, 2016 at 18:41 UTC

    Fun little problem. When you think patterns, think regex :)

    #!/usr/bin/perl -l

    # http://perlmonks.org/?node_id=1174464

    use strict;
    use warnings;

    my @data;
    my $increasing = '';    # one code per step: 1 = down, 2 = same, 3 = up
    my $previous;

    while (<DATA>) {
        my $n = push @data, $_;
        my ($this) = (split)[1];    # the value after the date
        $previous //= $this;
        $n > 1 and $increasing .= ($this <=> $previous) + 2;
        $previous = $this;
    }

    #print "$increasing\n";

    # First alternative: four identical up/down codes (five records) not
    # followed by a fifth, plus the code that breaks the run. Second: any
    # other run of codes. Only the first sets $1, so only it prints.
    my $pos = 0;
    $1 && print( @data[$pos..$pos+4]), $pos += length $&
      while $increasing =~ / ([13])\1{3}(?!\1)(?:.|$) | (.)\2* /gx;

    __DATA__
    10/01/2016 99.71
    10/02/2016 99.53
    10/03/2016 100.10
    10/04/2016 100.96
    10/05/2016 100.99
    10/06/2016 101.38
    10/07/2016 100.74
    10/08/2016 100.70
    10/09/2016 99.88
    10/10/2016 97.62
    10/11/2016 97.55
    10/12/2016 99.12
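    To see what the regex above actually scans, here is a small sketch that builds the same direction string for the sample values and reports where each qualifying match starts. It reuses the encoding and pattern from the code above; encode_steps and @starts are names made up for illustration.

```perl
use strict;
use warnings;

# Same encoding as the code above: for every record after the first,
# append (this value <=> previous value) + 2, so 1 = down, 2 = same, 3 = up.
sub encode_steps {
    my @values = @_;
    return join '', map { ($values[$_] <=> $values[$_ - 1]) + 2 } 1 .. $#values;
}

my @values = qw(99.71 99.53 100.10 100.96 100.99 101.38
                100.74 100.70 99.88 97.62 97.55 99.12);
my $steps = encode_steps(@values);
print "$steps\n";    # prints 13333111113

# Code $i compares record $i+1 with record $i, so a match starting at
# offset $i begins with record $i. $-[0] is the offset where a match starts.
my @starts;
while ($steps =~ / ([13])\1{3}(?!\1)(?:.|$) | (.)\2* /gx) {
    push @starts, $-[0] if defined $1;
}
print "run of five starts at record $_ ($values[$_])\n" for @starts;
```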
      @tybalt89, thank you for showing me the cool code.

      I have a question, though: isn't it true that regexes are slow when you are dealing with large amounts of data?

      Thanks!
        Premature optimization is the root of all evil (or at least most of it) in programming. (Donald Knuth)

        In general, if a regex is the right tool to solve your problem, then use a regex. Try to change it to something else afterwards only if you've found that it is too slow. But don't make it more complicated than it needs to be if you don't have to.

        Well, having said that, I should add that there are some exceptions: sometimes it is good to think about a fast algorithm or a more efficient data structure before you start coding, but that usually does not apply to micro-optimizations and small efficiencies such as regexes versus other solutions (such as substr, index or unpack).

        Only benchmarking can answer that question. See Benchmark.pm
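        A sketch of how Benchmark.pm's cmpthese could compare tybalt89's regex with a hand-written loop that finds the same runs. The random-walk input and the sub names are hypothetical stand-ins for a large data file, and no timings are claimed here: they depend on your perl and your data, so run it yourself.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Step codes as in tybalt89's reply: 1 = down, 2 = same, 3 = up.
sub step_codes {
    my ($v) = @_;
    return map { ($v->[$_] <=> $v->[$_ - 1]) + 2 } 1 .. $#$v;
}

# Regex version: returns the index of the first record of each run of five.
sub starts_by_regex {
    my $codes = join '', step_codes($_[0]);
    my @starts;
    while ($codes =~ / ([13])\1{3}(?!\1)(?:.|$) | (.)\2* /gx) {
        push @starts, $-[0] if defined $1;
    }
    return @starts;
}

# Loop version with the same semantics: measure each run of equal codes;
# a run of exactly four up or down codes is a hit, and the code that ends
# it is skipped, just as the regex consumes it.
sub starts_by_loop {
    my @codes = step_codes($_[0]);
    my @starts;
    my $p = 0;
    while ($p < @codes) {
        my $k = 1;
        $k++ while $p + $k < @codes && $codes[$p + $k] == $codes[$p];
        if ($codes[$p] != 2 && $k == 4) {
            push @starts, $p;
            $p += 5;
        }
        else {
            $p += $k;
        }
    }
    return @starts;
}

# Hypothetical bulk data: a seeded random walk.
srand 42;
my @big = (100);
push @big, $big[-1] + rand(2) - 1 for 1 .. 100_000;

my @sample = qw(99.71 99.53 100.10 100.96 100.99 101.38
                100.74 100.70 99.88 97.62 97.55 99.12);

my @by_regex = starts_by_regex(\@big);
my @by_loop  = starts_by_loop(\@big);
die "the two versions disagree\n" unless "@by_regex" eq "@by_loop";

# Negative count: run each sub for at least 1 CPU second.
cmpthese(-1, {
    regex => sub { starts_by_regex(\@big) },
    loop  => sub { starts_by_loop(\@big) },
});
```

        Checking that both versions agree before timing them keeps the comparison honest.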

Re: Recommendations for perl modules to work on data sets ?
by raghuprasad241 (Beadle) on Oct 21, 2016 at 16:22 UTC
    Thank you, guys, for the responses.
    I will probably write my own code for this. I think finding 5 consecutive increasing or decreasing values in a list should not be that difficult.

    However, how do I map them to the dates again?

    Thanks!
    Monk
      "However, how do I map them to the dates again?"
      If you build a hash with dates as keys, then you can work with the values and print back the key and value as needed.
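      A minimal sketch of the hash approach, assuming zero-padded MM/DD/YYYY dates. A hash returns its keys in no particular order, and MM/DD/YYYY strings do not sort chronologically across years, so the dates are sorted on a YYYYMMDD key before the values are pulled out. The 01/05/2017 record and the date_key helper are hypothetical, added only to show the cross-year case.

```perl
use strict;
use warnings;

# Hash keyed by date, as suggested above.
my %value_for = (
    '10/01/2016' => '99.71',
    '10/02/2016' => '99.53',
    '10/03/2016' => '100.10',
    '01/05/2017' => '101.00',    # hypothetical record from a later year
);

# Plain `sort keys` would put 01/05/2017 first, so sort on YYYYMMDD instead.
sub date_key {
    my ($month, $day, $year) = split m{/}, shift;
    return "$year$month$day";
}

my @dates  = sort { date_key($a) cmp date_key($b) } keys %value_for;
my @values = @value_for{@dates};    # hash slice: values in date order

# Any index $i found while scanning @values maps straight back to $dates[$i].
print "$dates[$_] $values[$_]\n" for 0 .. $#dates;
```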

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.