Re: extract lines from file between date range.
by thanos1983 (Parson) on Mar 23, 2019 at 17:34 UTC
Hello rinkish85,
Part of learning how to write a script is trying and failing. Apart from that golden rule, there is also research on the web. If you had spent 5 minutes searching, you would have found Getting lines in a file between two patterns and How do I extract all text between two keywords like start and end?, which have previously been asked on this forum.
Fellow Monk poj has already replied to your question, but just for fun here is another way. Since you do not specify whether you want to capture duplicate dates that appear between the starting and ending points, see below another possible way to do it based on your sample data:
The code lets you pass multiple files through @ARGV (i.e. on the command line).
Update: If you do not want to include the starting and ending points, simply increment the start date in the regex by one day and decrement the end date by one day. This way the start and end dates will not be included.
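For anyone who wants to experiment with the pattern-range idea, here is a minimal sketch using Perl's scalar `..` (flip-flop) operator. It is an illustration under assumptions (hypothetical sample lines, hard-coded dates), not necessarily the code thanos1983 posted. The flip-flop's return value also offers an alternative to shifting the dates: it is 1 on the first line of a range and carries an "E0" suffix on the last, so both boundary lines can be dropped without touching the regexes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines; only the trailing DD/MM/YYYY field matters here.
my @lines = (
    "A 01/02/2018\n",
    "B 02/02/2018\n",
    "C 06/02/2018\n",
    "D 11/02/2018\n",
);

my (@inclusive, @exclusive);
for (@lines) {    # with real files: while (<>) reads every file in @ARGV
    # Scalar '..' (flip-flop) is true from a line matching the left
    # pattern through the next line matching the right pattern.
    if (my $seq = m{01/02/2018\s*$} .. m{06/02/2018\s*$}) {
        push @inclusive, $_;
        # $seq is 1 on the first line of the range and ends in "E0"
        # on the last, so this test drops both boundary lines.
        push @exclusive, $_ if $seq > 1 && $seq !~ /E0$/;
    }
}

print "inclusive:\n", @inclusive;
print "exclusive:\n", @exclusive;
```

One caveat with this approach: a later line matching the start pattern re-opens the range, and if no end line follows, every remaining line is selected. In the sample data, TJ47EG77 (01/02/2018) comes after ET43US76 (06/02/2018), so the last three records would be included. Comparing normalised dates, as the other replies do, avoids this.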
Sample below:
Update2: Reducing the lines a bit. See below:
Hope this helps, BR.
Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: extract lines from file between date range.
by Marshall (Canon) on Mar 23, 2019 at 21:41 UTC
From reading posts so far, I take it that the Monks are irritated that you haven't presented any attempts at code of your own. Giving an attempt yourself is an important part of the learning process. The Monks will grow deaf to your requests if you don't start writing code yourself!
Comparing dates/times can get complicated. In general, I recommend converting input strings to output strings in ISO 8601 format. This is like "YYYY-MM-DD", where leading zeros are required for single-digit fields. The reason for doing this is that you can then use a simple string comparison or sort rather than a more complex numeric one. Here is some code for you. Note that I use the string comparison operators "ge" (greater than or equal) and "le" (less than or equal).
Update: Minor code improvement to the if statement; deleted $_ from print $_, as it is extraneous.
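#!/usr/bin/perl
use strict;
use warnings;

my $start = '2018-02-03';    # ISO 8601 format: YYYY-MM-DD
my $end   = '2018-02-06';

while (<DATA>)
{
    # Capture the trailing DD/MM/YYYY date, if the line has one
    if (my ($day, $month, $year) = $_ =~ m/(\d{2})\/(\d{2})\/(\d{4})\s*$/)
    {
        my $this_date = "$year-$month-$day";    # rebuild as YYYY-MM-DD
        # Zero-padded ISO dates order correctly as strings, so ge/le work
        print if ($this_date ge $start and $this_date le $end);
    }
}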
=prints
JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
=cut
__DATA__
some bogus line here
DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
XA29EN35 M4C6M D7F577Q 111.222.333.123 AWAY - [02/Feb/2018:01:08:39 -0800] 02/02/2018
JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
MH72RG27 Y6X0N C7E352J 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
QA43MM82 Y3K0O H8E229P 111.222.333.123 HOME - [11/Feb/2018:01:08:39 -0800] 11/02/2018
MK72HI41 A6W2I M3X402D 111.222.333.123 HOME - [15/Feb/2018:01:08:39 -0800] 15/02/2018
LJ28XY72 E0Z4L E8W757S 111.222.333.123 HOME - [22/Mar/2018:01:08:39 -0800] 22/03/2018
TJ47EG77 W5J6A A7L557Q 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
TJ47EG78 E6L8Z J4X329P 216.142.233.73 - - [30/Nov/2018:00:03:08 -0500] 30/11/2018
TJ47EG79 N2V1H Q1Y615G 10.101.128.66 - - [03/Apr/2018:00:42:11 -0500] 03/04/2018
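To illustrate why the ISO-style rewrite matters, here is a small sketch comparing a DD/MM/YYYY string with its YYYY-MM-DD form; the `to_iso` helper and the two dates are hypothetical. As raw strings, 5 Feb 2018 compares as later than 1 Mar 2018, because the comparison starts at the day field. Once the year comes first and every field is zero-padded, string order matches calendar order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: "DD/MM/YYYY" -> "YYYY-MM-DD" (undef if no match).
sub to_iso {
    my ($dmy) = @_;
    my ($d, $m, $y) = $dmy =~ m{^(\d{2})/(\d{2})/(\d{4})$} or return;
    return "$y-$m-$d";
}

my ($feb5, $mar1) = ('05/02/2018', '01/03/2018');

# Raw DD/MM/YYYY strings compare day first, so 5 Feb looks "later".
printf "raw: %s\n", ($feb5 lt $mar1 ? 'earlier' : 'later');

# ISO strings compare year, then month, then day, as intended.
printf "iso: %s\n", (to_iso($feb5) lt to_iso($mar1) ? 'earlier' : 'later');

# The same property lets a plain string sort order the dates correctly.
my @sorted = sort { $a cmp $b } map { to_iso($_) } ($mar1, $feb5);
print "@sorted\n";
```

This prints `raw: later`, then `iso: earlier`, then `2018-02-05 2018-03-01`.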
I don't know where the trailing dates such as 22/03/2018 came from. It appears that the bracketed value, e.g. [03/Apr/2018:00:42:11 -0500], is the real time stamp. The -0800 means 8 hours behind GMT/UTC (perhaps Pacific Standard Time?). Yes, there is a fine difference between GMT and UTC, but it is meaningless here. Typically, you would convert these times to UTC and use that for all DB storage, then convert to local time (for example Pacific Daylight Time, which is -0700 from GMT) for user presentation.
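Following that advice, here is a minimal sketch that normalises the bracketed time stamp to UTC. It assumes the stamp always has the layout `[DD/Mon/YYYY:HH:MM:SS ±HHMM]` seen in the sample data, with English month abbreviations. It relies only on the core modules `Time::Local` (`timegm`) and `POSIX` (`strftime`); the helper `stamp_to_utc` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

my %mon_num;
@mon_num{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Hypothetical helper: turn "[03/Apr/2018:00:42:11 -0500]" inside a line
# into "2018-04-03T05:42:11Z"; returns undef if no stamp is found.
sub stamp_to_utc {
    my ($line) = @_;
    my ($d, $mon, $y, $H, $M, $S, $sign, $oh, $om) = $line =~
        m{\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]}
        or return;
    return unless exists $mon_num{$mon};

    # Read the wall-clock fields as if they were UTC...
    my $epoch = timegm($S, $M, $H, $d, $mon_num{$mon}, $y);
    # ...then remove the zone offset (-0500 = 5 hours behind UTC).
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    $epoch -= $offset;

    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));
}

my $utc = stamp_to_utc('TJ47EG79 N2V1H Q1Y615G 10.101.128.66 - - [03/Apr/2018:00:42:11 -0500] 03/04/2018');
print "$utc\n";
```

The resulting strings are fixed-width ISO 8601 values, so they can be compared with `ge`/`le` in the same way as the dates in the script above, without the skew between the -0800 and -0500 records.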
Re: extract lines from file between date range.
by hippo (Archbishop) on Mar 23, 2019 at 11:14 UTC
It's nice to have a fairly solid spec. On that basis, Perl does sound like a sensible way to tackle this. Good luck with your project, rinkish85.
Re: extract lines from file between date range.
by bliako (Abbot) on Mar 23, 2019 at 12:10 UTC
I second hippo's Good Luck wishes to rinkish85 and I would hereby like to copyright the term "Cuckoo egg programming" (on artistic licence) to describe a very specific way of earning a wage as a modern-day programmer.
I would also like to commend those Human Resources departments all over the world that manage to spot such talents practicing the art of "Cuckoo egg programming" and hire said talent, adding immense value to their respective companies: airplanes repeatedly falling out of the sky killing innocent people, stock markets crashing, online systems with more holes than Swiss cheese, banks' computer systems crippled for months, processors wide open to attack for years, hospitals either releasing patient records to anyone or crippling doctors' access to them, hospitals held hostage by script kiddies because they relied on a Mickey Mouse OS on the advice of consultants, airlines' computer systems failing for weeks and grounding millions of passengers; the list is quite long.
bw, bliako
Re: extract lines from file between date range.
by poj (Abbot) on Mar 23, 2019 at 11:13 UTC
#!/usr/bin/perl
use strict;
use warnings;

# Range limits as YYYYMMDD numbers
my $start = ymd('01/02/2018');
my $end   = ymd('06/02/2018');

my $infile  = 'input.txt';
my $outfile = 'output.txt';

open my $in, '<', $infile
    or die "Could not open '$infile' : $!";
open my $out, '>', $outfile
    or die "Could not open '$outfile' : $!";

my $count = 0;
while (<$in>) {
    # Look for a trailing DD/MM/YYYY date
    if (/(\d{2}\/\d{2}\/\d{4})\s*$/) {
        my $date = ymd($1);
        if ($date >= $start && $date <= $end) {
            print $out $_;
            ++$count;
        } else {
            print "$date Skipped : $_";
        }
    } else {
        print "No date : $_";
    }
}
close $in;
close $out;
printf "%d records written to '%s'\n", $count, $outfile;

# Convert DD/MM/YYYY to YYYYMMDD: split on non-digits, reverse the fields
sub ymd {
    sprintf "%04d%02d%02d", reverse split /\D/, shift;
}
poj
Re: extract lines from file between date range.
by kcott (Archbishop) on Mar 24, 2019 at 08:52 UTC
G'day rinkish85,
Firstly, I concur with what others have said regarding making no effort.
This is the third question you've posted, and the third time you've just provided some sort of spec and expected others to do your work for you.
Please read "How (Not) To Ask A Question", paying particular attention to the "Do Your Own Work" section.
Then read "How do I post a question effectively?" and "SSCCE" to learn how to improve your posts.
The following is not a canned solution; it just shows a technique that might be useful for you.
You'll need to get your start and end dates into a canonical format (which may be different to what I've used).
You'll need to handle your own I/O (see open if you don't know how to do that).
If you're unfamiliar with the 'r' modifier, see "Non-destructive substitution"; and, if your version of Perl is earlier than 5.14, you'll need an intermediary variable to hold the original value of each record.
Here's the script showing the technique:
#!/usr/bin/env perl

use 5.014;    # needed for the /r modifier
use warnings;

my ($start, $end) = qw{20180201 20180206};

while (<DATA>) {
    # /r returns the modified copy and leaves $_ untouched
    my $date = s/^.*?(\d{2})\/(\d{2})\/(\d{4})\s*$/$3$2$1/r;
    next unless $date ge $start && $date le $end;
    print;
}
__DATA__
DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
XA29EN35 M4C6M D7F577Q 111.222.333.123 AWAY - [02/Feb/2018:01:08:39 -0800] 02/02/2018
JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
MH72RG27 Y6X0N C7E352J 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
QA43MM82 Y3K0O H8E229P 111.222.333.123 HOME - [11/Feb/2018:01:08:39 -0800] 11/02/2018
MK72HI41 A6W2I M3X402D 111.222.333.123 HOME - [15/Feb/2018:01:08:39 -0800] 15/02/2018
LJ28XY72 E0Z4L E8W757S 111.222.333.123 HOME - [22/Mar/2018:01:08:39 -0800] 22/03/2018
TJ47EG77 W5J6A A7L557Q 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
TJ47EG78 E6L8Z J4X329P 216.142.233.73 - - [30/Nov/2018:00:03:08 -0500] 30/11/2018
TJ47EG79 N2V1H Q1Y615G 10.101.128.66 - - [03/Apr/2018:00:42:11 -0500] 03/04/2018
Here's the output after running that script:
DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
XA29EN35 M4C6M D7F577Q 111.222.333.123 AWAY - [02/Feb/2018:01:08:39 -0800] 02/02/2018
JK20TQ67 K1L0V T6Z148X 111.222.333.123 HOME - [03/Feb/2018:01:08:39 -0800] 03/02/2018
SO78NZ28 B5S8J W9F920Z 111.222.333.123 HOME - [04/Feb/2018:01:08:39 -0800] 04/02/2018
BI55SY64 R6P5H A9U757R 111.222.333.123 HOME - [05/Feb/2018:01:08:39 -0800] 05/02/2018
MH72RG27 Y6X0N C7E352J 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
ET43US76 F5S3W X2L870O 111.222.333.123 HOME - [06/Feb/2018:01:08:39 -0800] 06/02/2018
TJ47EG77 W5J6A A7L557Q 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018
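kcott's note about Perl versions earlier than 5.14 can be sketched as follows: without the `/r` modifier, copy each record into an intermediary variable and run the substitution on the copy, so the original line survives for printing. The `in_range` helper and the two sample records are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # no "use 5.014" needed: /r is not used here

my ($start, $end) = qw{20180201 20180206};

# Hypothetical helper: true if the record's trailing DD/MM/YYYY date
# falls within the range (inclusive).
sub in_range {
    my ($record) = @_;
    # Copy first, then substitute on the copy; $record stays unchanged.
    (my $date = $record) =~ s/^.*?(\d{2})\/(\d{2})\/(\d{4})\s*$/$3$2$1/;
    return $date ge $start && $date le $end;
}

my @records = (
    "DQ94JD84 S8G2H A9X946N 111.222.333.123 HOME - [01/Feb/2018:01:08:39 -0800] 01/02/2018\n",
    "QA43MM82 Y3K0O H8E229P 111.222.333.123 HOME - [11/Feb/2018:01:08:39 -0800] 11/02/2018\n",
);
for (@records) {
    print if in_range($_);
}
```

Only the first record falls within 20180201 to 20180206, so only it is printed.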
Re: extract lines from file between date range.
by haukex (Archbishop) on Mar 24, 2019 at 22:34 UTC
Re: extract lines from file between date range.
by karlgoethebier (Abbot) on Mar 23, 2019 at 16:07 UTC
See also
«The Crux of the Biscuit is the Apostrophe»
perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help
Re: extract lines from file between date range.
by NetWallah (Canon) on Mar 24, 2019 at 19:32 UTC
Offering the obligatory one-liner for this:
perl -ane '$d=join q|/|,reverse(split q|/|, $F[8],3); print if $d ge q|2018/02/01| and $d le q|2018/02/06|' YOUR-FILE.TXT
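In case the switches in that one-liner are unfamiliar, here is a rough script equivalent, as a sketch: `-n` wraps the code in a `while (<>)` loop and `-a` splits each line into `@F` on whitespace, so `$F[8]` is the trailing DD/MM/YYYY field in the sample data. The `keep_line` helper, the short-line check and the `@ARGV` guard are illustrative additions, not part of the original one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what  perl -ane 'CODE' FILE  does: -n wraps CODE in a
# while (<>) loop and -a fills @F with split(' ', $_).
sub keep_line {
    my ($line) = @_;
    my @F = split ' ', $line;
    return 0 unless defined $F[8];    # too few fields (e.g. a bogus line)
    # "DD/MM/YYYY" -> "YYYY/MM/DD", so string order equals date order
    my $d = join '/', reverse split m{/}, $F[8], 3;
    return $d ge '2018/02/01' && $d le '2018/02/06';
}

# Only read input when file names are given, e.g.: perl this.pl YOUR-FILE.TXT
if (@ARGV) {
    while (my $line = <>) {
        print $line if keep_line($line);
    }
}
```

Unlike the one-liner, this version skips lines with fewer than nine fields explicitly instead of comparing an undefined value.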
"It's ten o'clock... Do you know where your AI programs are?"