rinkish85 has asked for the wisdom of the Perl Monks concerning the following question:

I have the sample data below and I need to extract the lines that fall within a specific date range, say from 01/02/2018 to 06/02/2018. The last column in the sample is the date, in dd/mm/yyyy format. Columns are separated by one or more spaces.

DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
XA29EN35 M4C6M D7F577Q 111.222.333.123 AWAY - [02/Feb/2018:01:08:39 -0800] 02/02/2018
JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
MH72RG27 Y6X0N C7E352J 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
QA43MM82 Y3K0O H8E229P 111.222.333.123 HOME - [11/Feb/2018:01:08:39 -0800] 11/02/2018
MK72HI41 A6W2I M3X402D 111.222.333.123 HOME - [15/Feb/2018:01:08:39 -0800] 15/02/2018
LJ28XY72 E0Z4L E8W757S 111.222.333.123 HOME - [22/Mar/2018:01:08:39 -0800] 22/03/2018
TJ47EG77 W5J6A A7L557Q 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
TJ47EG78 E6L8Z J4X329P 216.142.233.73 - - [30/Nov/2018:00:03:08 -0500] 30/11/2018
TJ47EG79 N2V1H Q1Y615G 10.101.128.66 - - [03/Apr/2018:00:42:11 -0500] 03/04/2018
Thanks.

Re: extract lines from file between date range.
by thanos1983 (Parson) on Mar 23, 2019 at 17:34 UTC

    Hello rinkish85,

    Part of learning how to write a script is trying and failing. Apart from that golden rule, there is also researching on the web. If you had spent 5 minutes searching, you would have found Getting lines in a file between two patterns and How do I extract all text between two keywords like start and end?, both of which have previously been asked on this forum.

    Fellow Monk poj has already replied to your question, but just for fun here is another way. Since you do not specify whether you want to capture duplicate dates that appear between the starting and ending points, here is another possible way to do it, based on your sample data:

    The code lets you pass multiple files through @ARGV.
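    The threads linked above describe printing everything between a start pattern and an end pattern, which in Perl is usually done with the flip-flop (..) operator reading from <>, i.e. the files named in @ARGV. Below is a minimal sketch of that idea; it is not thanos1983's code, and the helper name lines_between_dates and the hard-coded dates are hypothetical. Two caveats: it assumes the input is sorted by date (in the OP's sample, the later 01/02/2018 line reopens the range and everything after it is printed), and, as perlop documents, each .. keeps its own state even across calls to the sub that contains it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines from the first one whose last
# column equals $from, through the next one whose last column equals $to.
# Assumes the lines are sorted by date.
sub lines_between_dates {
    my ($from, $to, @lines) = @_;
    my $start_re = qr/\s\Q$from\E\s*$/;    # match only the trailing date
    my $end_re   = qr/\s\Q$to\E\s*$/;
    # The flip-flop turns true on a $start_re match and stays true
    # up to and including the next $end_re match.
    return grep { /$start_re/ .. /$end_re/ } @lines;
}

# <> reads every file named on the command line, one after another.
if (@ARGV) {
    print lines_between_dates('01/02/2018', '06/02/2018', <>);
}
```

    Usage would be something like perl range.pl file1.txt file2.txt (the script name is arbitrary).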

    Update: If you do not want to include the start and end dates themselves, simply increase the start date in the regex by one day and decrease the end date by one day. That way neither endpoint is included.
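    Adjusting a dd/mm/yyyy date by one day by hand is error-prone at month and year boundaries (one day after 31/01/2018 is 01/02/2018). A minimal sketch using the core Time::Local and POSIX modules follows; shift_date is a hypothetical helper name, and the arithmetic is done in UTC so daylight-saving changes cannot skew the result:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper: move a dd/mm/yyyy date by $days whole days.
sub shift_date {
    my ($date, $days) = @_;
    my ($d, $m, $y) = split m{/}, $date;
    # UTC has no DST, so every day is exactly 86_400 seconds in epoch terms.
    my $epoch = timegm(0, 0, 0, $d, $m - 1, $y) + $days * 86_400;
    return strftime('%d/%m/%Y', gmtime $epoch);
}

# Exclusive bounds: start one day later, end one day earlier.
my ($from, $to) = (shift_date('01/02/2018', 1), shift_date('06/02/2018', -1));
print "$from .. $to\n";    # prints 02/02/2018 .. 05/02/2018
```

    The shifted strings can then be dropped into whatever start and end patterns you use.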

    Sample below:

    Update2: Reducing the lines a bit. See below:

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: extract lines from file between date range.
by Marshall (Canon) on Mar 23, 2019 at 21:41 UTC
    From reading posts so far, I take it that the Monks are irritated that you haven't presented any attempts at code of your own. Giving an attempt yourself is an important part of the learning process. The Monks will grow deaf to your requests if you don't start writing code yourself!

    Comparing dates/times can get complicated. In general, I recommend converting input strings to ISO 8601 format, "YYYY-MM-DD", where leading zeros are required for single-digit fields. The reason for doing this is that you can then use a simple string comparison or sort rather than a more complex numeric one. Here is some code for you. Note that I use the string comparison operators "ge" (greater than or equal) and "le" (less than or equal).

    Update: Minor code improvement to if statement, deleted $_ from print $_ as that is extraneous.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $start = '2018-02-03';   # ISO 8601 format YYYY-MM-DD
    my $end   = '2018-02-06';

    while (<DATA>)
    {
        if (my ($day, $month, $year) = $_ =~ m/(\d{2})\/(\d{2})\/(\d{4})\s*$/)
        {
            my $this_date = "$year-$month-$day";
            print if ($this_date ge $start and $this_date le $end);
        }
    }

    # prints:
    # JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
    # SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
    # BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
    # ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018

    __DATA__
    some bogus line here
    DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    XA29EN35 M4C6M D7F577Q 111.222.333.123 AWAY - [02/Feb/2018:01:08:39 -0800] 02/02/2018
    JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
    SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
    BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
    MH72RG27 Y6X0N C7E352J 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
    QA43MM82 Y3K0O H8E229P 111.222.333.123 HOME - [11/Feb/2018:01:08:39 -0800] 11/02/2018
    MK72HI41 A6W2I M3X402D 111.222.333.123 HOME - [15/Feb/2018:01:08:39 -0800] 15/02/2018
    LJ28XY72 E0Z4L E8W757S 111.222.333.123 HOME - [22/Mar/2018:01:08:39 -0800] 22/03/2018
    TJ47EG77 W5J6A A7L557Q 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    TJ47EG78 E6L8Z J4X329P 216.142.233.73 - - [30/Nov/2018:00:03:08 -0500] 30/11/2018
    TJ47EG79 N2V1H Q1Y615G 10.101.128.66 - - [03/Apr/2018:00:42:11 -0500] 03/04/2018
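    To see why converting to ISO format matters before using ge/le, here is a small sketch comparing raw dd/mm/yyyy strings with their YYYY-MM-DD forms. The helper to_iso is hypothetical and assumes zero-padded fields, as in the sample data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: dd/mm/yyyy -> YYYY-MM-DD (assumes zero-padded fields).
sub to_iso {
    my ($d, $m, $y) = split m{/}, shift;
    return "$y-$m-$d";
}

my ($early, $late) = ('11/02/2018', '03/04/2018');    # 11 Feb, 3 Apr

# Raw dd/mm/yyyy strings compare day-first, so April sorts before February:
printf "raw: %s\n", ($late lt $early) ? 'April before February?!' : 'ok';
# The ISO forms compare in the same order as the dates they represent:
printf "iso: %s\n", (to_iso($early) lt to_iso($late)) ? 'ok' : 'wrong';
```

    The first line prints "raw: April before February?!" and the second "iso: ok", which is why the year has to come first for string comparison to work.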
    I don't know where the trailing date field (e.g. 22/03/2018) came from. It appears that the bracketed value, e.g. [03/Apr/2018:00:42:11 -0500], is the real timestamp. An offset of -0800 means 8 hours behind GMT/UTC (perhaps Pacific Standard Time?). Yes, there is a fine difference between GMT and UTC, but it is meaningless here. Typically you would convert these times to UTC and use that for all database storage, then convert to local time for presentation to users, for example Pacific Daylight Time, which is UTC-0700.
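    A minimal sketch of that UTC conversion, assuming the bracketed stamps always look like [03/Apr/2018:00:42:11 -0500]. The helper name stamp_to_utc is hypothetical, and only the core Time::Local and POSIX modules are used:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

my %month_num = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
                 Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: find "[dd/Mon/yyyy:HH:MM:SS +hhmm]" in a line and
# return it as a UTC string "YYYY-MM-DDTHH:MM:SSZ", or undef if not found.
sub stamp_to_utc {
    my ($line) = @_;
    my ($d, $mon, $y, $H, $M, $S, $sign, $oh, $om) = $line =~
        m{\[(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) ([-+])(\d\d)(\d\d)\]}
        or return undef;
    return undef unless exists $month_num{$mon};
    # Read the wall-clock fields as if they were already UTC...
    my $as_if_utc = timegm($S, $M, $H, $d, $month_num{$mon}, $y);
    # ...then undo the offset: local = UTC + offset, so UTC = local - offset.
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($as_if_utc - $offset));
}

print stamp_to_utc('[03/Apr/2018:00:42:11 -0500]'), "\n";    # 2018-04-03T05:42:11Z
```

    The result is also a fixed-width, zero-padded string, so ge and le work on it directly. Keep in mind that converting to UTC can move a record onto a different calendar day than its local date, so decide which day the range should be based on.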
Re: extract lines from file between date range.
by hippo (Archbishop) on Mar 23, 2019 at 11:14 UTC

    It's nice to have a fairly solid spec. On that basis, Perl does sound like a sensible way to tackle this. Good luck with your project, rinkish85.

Re: extract lines from file between date range.
by bliako (Abbot) on Mar 23, 2019 at 12:10 UTC

    I second hippo's Good Luck wishes to rinkish85 and I would hereby like to copyright the term "Cuckoo egg programming" (on artistic licence) to describe a very specific way of earning a wage as a modern-day programmer.

    I would also like to commend those Human Resources departments all over the world that manage to spot such talents practicing the art of "Cuckoo egg programming" and do hire said talent, adding immense value to their respective companies: airplanes repeatedly falling down killing innocent folk, stock markets crashing, online systems with holes like Swiss cheese, banks' computer systems crippled for months, processors wide open to attack for years, hospitals either releasing patient records to anyone or crippling doctor access to them, hospitals held hostage by scriptkids because they relied on mickey-mouse OS thanks to advice by consultants, airline computer systems failing for weeks and preventing millions from flying; the list is quite long.

    bw, bliako

Re: extract lines from file between date range.
by poj (Abbot) on Mar 23, 2019 at 11:13 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $start = ymd('01/02/2018');
    my $end   = ymd('06/02/2018');

    my $infile  = 'input.txt';
    my $outfile = 'output.txt';

    open IN,  '<', $infile  or die "Could not open '$infile' : $!";
    open OUT, '>', $outfile or die "Could not open '$outfile' : $!";

    my $count = 0;
    while (<IN>){
      if (/(\d{2}\/\d{2}\/\d{4})\s*$/){
        my $date = ymd($1);
        if ($date >= $start && $date <= $end){
          print OUT $_;
          ++$count;
        } else {
          print "$date Skipped : $_";
        }
      } else {
        print "No date : $_";
      }
    }
    close IN;
    close OUT;
    printf "%d records written to '%s'\n", $count, $outfile;

    sub ymd {
      sprintf "%04d%02d%02d", reverse split /\D/, shift;
    }
    poj
Re: extract lines from file between date range.
by kcott (Archbishop) on Mar 24, 2019 at 08:52 UTC

    G'day rinkish85,

    Firstly, I concur with what others have said regarding making no effort. This is the third question you've posted; and the third time you've just provided some sort of spec and expected others to do your work for you. Please read "How (Not) To Ask A Question", paying particular attention to the "Do Your Own Work" section. Then read "How do I post a question effectively?" and "SSCCE" to learn how to improve your posts.

    The following is not a canned solution; it just shows a technique that might be useful for you. You'll need to get your start and end dates into a canonical format (which may be different to what I've used). You'll need to handle your own I/O (see open if you don't know how to do that). If you're unfamiliar with the 'r' modifier, see "Non-destructive substitution"; and, if your version of Perl is earlier than 5.14, you'll need an intermediary variable to hold the original value of each record.
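    For Perls older than 5.14, which lack the /r modifier, the intermediary-variable approach might look like the sketch below: copy the record, then run the substitution on the copy so the original is still available to print. The three records and the @kept array are illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;    # no "use 5.014" needed, since /r is not used

my ($start, $end) = qw{20180201 20180206};

# Illustrative records taken from the OP's sample.
my @records = (
    "DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018\n",
    "QA43MM82 Y3K0O H8E229P 111.222.333.123 HOME - [11/Feb/2018:01:08:39 -0800] 11/02/2018\n",
    "JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018\n",
);

my @kept;
for (@records) {
    # Copy $_ into $date first, then substitute on the copy only.
    (my $date = $_) =~ s/^.*?(\d{2})\/(\d{2})\/(\d{4})\s*$/$3$2$1/;
    next unless $date ge $start && $date le $end;
    push @kept, $_;    # $_ is still the complete, unmodified record
}
print @kept;
```

    This keeps the DQ94JD84 and JK20TQ67 records and leaves every element of @records unchanged.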

    Here's the script showing the technique:

    #!/usr/bin/env perl

    use 5.014;
    use warnings;

    my ($start, $end) = qw{20180201 20180206};

    while (<DATA>) {
        my $date = s/^.*?(\d{2})\/(\d{2})\/(\d{4})\s*$/$3$2$1/r;
        next unless $date ge $start && $date le $end;
        print;
    }

    __DATA__
    DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    XA29EN35 M4C6M D7F577Q 111.222.333.123 AWAY - [02/Feb/2018:01:08:39 -0800] 02/02/2018
    JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
    SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
    BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
    MH72RG27 Y6X0N C7E352J 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
    QA43MM82 Y3K0O H8E229P 111.222.333.123 HOME - [11/Feb/2018:01:08:39 -0800] 11/02/2018
    MK72HI41 A6W2I M3X402D 111.222.333.123 HOME - [15/Feb/2018:01:08:39 -0800] 15/02/2018
    LJ28XY72 E0Z4L E8W757S 111.222.333.123 HOME - [22/Mar/2018:01:08:39 -0800] 22/03/2018
    TJ47EG77 W5J6A A7L557Q 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    TJ47EG78 E6L8Z J4X329P 216.142.233.73 - - [30/Nov/2018:00:03:08 -0500] 30/11/2018
    TJ47EG79 N2V1H Q1Y615G 10.101.128.66 - - [03/Apr/2018:00:42:11 -0500] 03/04/2018

    Here's the output after running that script:

    DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    XA29EN35 M4C6M D7F577Q 111.222.333.123 AWAY - [02/Feb/2018:01:08:39 -0800] 02/02/2018
    JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
    SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
    BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
    MH72RG27 Y6X0N C7E352J 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
    ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
    TJ47EG77 W5J6A A7L557Q 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018

    — Ken

Re: extract lines from file between date range.
by haukex (Archbishop) on Mar 24, 2019 at 22:34 UTC
Re: extract lines from file between date range.
by karlgoethebier (Abbot) on Mar 23, 2019 at 16:07 UTC

    See also

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: extract lines from file between date range.
by NetWallah (Canon) on Mar 24, 2019 at 19:32 UTC
    Offering the obligatory one-liner for this:
    perl -ane '$d=join q|/|,reverse(split q|/|, $F[8],3); print if $d ge q|2018/02/01| and $d le q|2018/02/06|' YOUR-FILE.TXT
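    For anyone unfamiliar with the switches: -n wraps the code in a while (<>) { ... } loop over the named files, and -a splits each line on whitespace into @F before your code runs. Below is a longer, commented sketch of the same filter with the logic pulled into a hypothetical keep_line sub. One deliberate change from the one-liner: it uses $F[-1] (the last field) instead of $F[8], since the OP says the date is always the last column.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper holding the one-liner's filter logic.
sub keep_line {
    my ($line, $from, $to) = @_;
    my @F = split ' ', $line;    # what -a does: split on runs of whitespace
    return 0 unless @F;          # blank line: nothing to compare
    # dd/mm/yyyy -> yyyy/mm/dd, so a string compare orders by date
    my $d = join q|/|, reverse(split q|/|, $F[-1], 3);
    return $d ge $from && $d le $to;
}

# -n supplies this loop in the one-liner. It is guarded by @ARGV here so
# the sketch does not sit waiting on STDIN when run without arguments.
if (@ARGV) {
    while (my $line = <>) {
        print $line if keep_line($line, '2018/02/01', '2018/02/06');
    }
}
```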

                    "It's ten o'clock... Do you know where your AI programs are?"