Re: Match Line And Combine Into One Line

Replies are listed 'Best First'.
Re^2: Match Line And Combine Into One Line by jlope043 (Acolyte) on Jul 21, 2016 at 18:45 UTC
Hi Ken, thank you and sorry I should have included the headers of the INPUT file `ACCOUNT,DATE,NOTE H123456,20151209,THIS IS A TEST` [download] All my accounts begin with an alphanumeric which is the reason the H is present, on my other scripts I have I usually use a my $find to simplify my search for the account, example below. `my $find = '^(H0\|HT)'` One question I do have is I normally write my scripts to export to a new file in this case what would be the correct format to do so? I have this but I think I am missing something `use strict; use warnings; my %reformat; my $re = qr{^(H\d+,\d+,)(.*)$}; open (NEW, ">", "Notes_Test_OUTPUT.txt" ) or die "could not open:$ +!"; open (FILE, "<", "Notes_Test.txt") or die "could not open:$!"; while (<FILE>) { chomp; /$re/; push @{ $reformat{$1} }, $2; } print NEW if $_, join ' ', @{ $reformat{$_} } for keys %reformat; close (FILE); close (NEW);` [download]	[reply] [d/l] [select]
Re^3: Match Line And Combine Into One Line by kcott (Archbishop) on Jul 21, 2016 at 20:25 UTC
Firstly, this code won't compile as it contains a syntax error. Do not just post untested code! If you don't understand an error message, post the error you're getting and ask. Here's the offending line: `print NEW if $_, join ' ', @{ $reformat{$_} } for keys %reformat;` [download] Take a look at "perlsyn: Statement Modifiers". The very first sentence starts with: Any simple statement may optionally be followed by a SINGLE modifier, ... "SINGLE" is emphasised for a very good reason: you can only use one statement modifier per statement. In the line I've identified, you've used two: `if` and `for`. Had you tried to run your code, you would have got a syntax error similar to the one in this example: `$ perl -e 'my @x = qw{a b}; print if $_ for @x' syntax error at -e line 1, near "$_ for " Execution of -e aborted due to compilation errors.` [download] You have another issue that isn't an error but which would generate warning messages. The problem is that you haven't accounted for the file header line. You can skip this line with the simple expedient of adding this as the first line of your `while` loop: `next if $. == 1;` [download] `$.` is a special variable that holds the line count. Line 1 is the header line and `next` will effectively ignore it. See "perlvar: Variables related to filehandles" for a more detailed description. It's good that you've used the 3-argument form of open; it's less good that you've chosen global package variables to hold the filehandles and, indeed worse, that you've not chosen meaningful names. Once you get into the habit of using names like `FILE`, you'll use them often and, in all likelihood, multiple times in the same script or module: this is highly error-prone and can lead to bugs that are hard to track down. Instead, use lexical variables, with meaningful names, in the smallest possible scope; this greatly reduces the chances of errors and, in many cases, means you don't even need to use `close` as Perl will do this for you. It's also good that you're checking for I/O errors with "`or die 'error message'`" code; however, hand-crafting these messages is tedious and it's easy to leave out important information or forget to add them altogether. If you use the autodie pragma, Perl will perform this task for you: less work for you and less chances of errors. Putting all that together, along with your additional information, here's a new version of the script. Although not shown, my original script was `pm_1168253_reformat_input.pl`, this one's called `pm_1168253_reformat_input_WITH_FILES.pl`. `#!/usr/bin/env perl -l use strict; use warnings; use autodie; my $input_file = 'pm_1168253_reformat_input_INPUT.txt'; my $output_file = 'pm_1168253_reformat_input_OUTPUT.txt'; my %reformat; my $re = qr{^(H\d+,\d+,)(.*)$}; { open my $in_fh, '<', $input_file; while (<$in_fh>) { next if $. == 1; chomp; /$re/; push @{ $reformat{$1} }, $2; } } { open my $out_fh, '>', $output_file; print $out_fh $_, join ' ', @{ $reformat{$_} } for keys %reformat; }` [download] Note the anonymous blocks. The filehandles go out of scope once these blocks are exited: their reference counts are reduced to zero and Perl performs an implicit `close`. Here's the input file: `$ cat pm_1168253_reformat_input_INPUT.txt ACCOUNT,DATE,NOTE H123456,20151209,THIS IS A TEST H123456,20151209,TO COMBINE ALL H123456,20151209,MY MATCHING LINES H123456,20151209,INTO THE FIRST LINE H123456,20151209,THAT MATCHES. H654321,20151209,MATCH LINES FOR THIS H654321,20151209,ACCT INTO THE H654321,20151209,TOP LINE OF THE ACCT H432165,20151209,SINGLE LINE FOR THIS ONE` [download] And here's the output file before and after running the script: `$ cat pm_1168253_reformat_input_OUTPUT.txt cat: pm_1168253_reformat_input_OUTPUT.txt: No such file or directory $ pm_1168253_reformat_input_WITH_FILES.pl $ cat pm_1168253_reformat_input_OUTPUT.txt H432165,20151209,SINGLE LINE FOR THIS ONE H123456,20151209,THIS IS A TEST TO COMBINE ALL MY MATCHING LINES INTO +THE FIRST LINE THAT MATCHES. H654321,20151209,MATCH LINES FOR THIS ACCT INTO THE TOP LINE OF THE AC +CT` [download] As before, you may need a different ordering for your output but I'm still in the dark as to what you require. — Ken	[reply] [d/l] [select]
Re^4: Match Line And Combine Into One Line by jlope043 (Acolyte) on Jul 21, 2016 at 23:45 UTC
Thank you very much Ken and sorry about all the confusion. This stuff is not as easy as I thought it would be, so trying to explain my goal or outcome is difficult at times. But this worked exactly how I want it to. Thank you again for all your help and explanation.	[reply]
Re^3: Match Line And Combine Into One Line [Follow-up] by kcott (Archbishop) on Jul 22, 2016 at 14:04 UTC
This is a follow-up to my earlier post. Firstly, I hope I didn't give the impression that there was anything special about anonymous blocks and scoping. That's standard block behaviour. From "perlsub: Private Variables via my()": The `my` operator declares the listed variables to be lexically confined to the enclosing block, conditional (`if` /`unless` /`elsif` /`else` ), loop (`for` /`foreach` /`while` /`until` /`continue`), subroutine, `eval`, or `do`/`require`/`use`'d file. ... I used anonymous blocks to demonstrate scoping issues; however, in a real-world application, far from being anonymous, they probably would be named to allow reuse: perhaps, `&read_input` and `&write_output`, called in a loop iterating input filenames from the command line. I made some modifications to (a copy of) the earlier script, to demonstrate: #!/usr/bin/env perl -l use strict; use warnings; use autodie; for (@ARGV) { my ($input_file, $output_file) = ($_, $_ . '__OUTPUT.txt'); my (%reformat, @order); read_input($input_file, \%reformat, \@order); write_output($output_file, \%reformat, \@order); } { my $re; INIT { $re = qr{^(H\d+,\d+,)(.)$} } sub read_input { my ($input_file, $reformat, $order) = @_; open my $in_fh, '<', $input_file; while (<$in_fh>) { next if $. == 1; chomp; /$re/; my ($key, $str_part) = ($1, $2); push @$order, $key unless exists $reformat->{$key}; push @{ $reformat->{$key} }, $str_part; } return; } } sub write_output { my ($output_file, $reformat, $order) = @_; open my $out_fh, '>', $output_file; print $out_fh $_, join ' ', @{ $reformat->{$_} } for @$order; return; } [download] I've been testing like this (without any problems): $ pm_1168253_reformat_input_WITH_FILES_PRODUCTON.pl pm_1168253_reforma +t_input_INPUT.txt pm_1168253_reformat_input_INPUT_CLONE.txt; cat pm_1 +168253_reformat_input_INPUT.txt__OUTPUT.txt; cat pm_1168253_reformat_ +input_INPUT_CLONE.txt__OUTPUT.txt H123456,20151209,THIS IS A TEST TO COMBINE ALL MY MATCHING LINES INTO +THE FIRST LINE THAT MATCHES. H654321,20151209,MATCH LINES FOR THIS ACCT INTO THE TOP LINE OF THE AC +CT H432165,20151209,SINGLE LINE FOR THIS ONE H123456,20151209,THIS IS A TEST TO COMBINE ALL MY MATCHING LINES INTO +THE FIRST LINE THAT MATCHES. H654321,20151209,MATCH LINES FOR THIS ACCT INTO THE TOP LINE OF THE AC +CT H432165,20151209,SINGLE LINE FOR THIS ONE [download] Notes: If you don't know about `@ARGV`, see "perlvar: Variables related to filehandles". The anonymous block (with the `INIT` block) is described in the Persistent variables with closures* section of "perlsub: Persistent Private Variables". While you're there, take a look at the Persistent variables via state() section (it's just above). Note: The `state` function was introduced in Perl v5.10.0 (see "perl5100delta: state() variables" ). "A new class of variables has been introduced. ..." `INIT`, and friends, are described in "perlmod: BEGIN, UNITCHECK, CHECK, INIT and END". There is now a line (in `&read_input`) to capture the same order as the expected output in your OP: `push @$order, $key unless exists $reformat->{$key};` [download] — Ken	[reply] [d/l] [select]
Re^3: Match Line And Combine Into One Line by Corion (Patriarch) on Jul 21, 2016 at 18:52 UTC
What makes you think that you are missing something?	[reply]
Re^4: Match Line And Combine Into One Line by jlope043 (Acolyte) on Jul 21, 2016 at 19:00 UTC
Hi Corion, well first I found a few mistakes I made, but my lines are wrapping and not breaking into new lines for each account. Corrected code below but not working as expected. `use strict; use warnings; my %reformat; my $re = qr{^(H\d+,\d+,)(.*)$}; open (NEW, ">", "Notes_Test_OUTPUT.txt" ) or die "could not open:$ +!"; open (FILE, "<", "Notes_Test.txt") or die "could not open:$!"; while (<FILE>) { chomp; /$re/; push @{ $reformat{$1} }, $2; print NEW $_, join ' ', @{ $reformat{$_} } for keys %reformat; } close (FILE); close (NEW);` [download]	[reply] [d/l]
Re^5: Match Line And Combine Into One Line by Corion (Patriarch) on Jul 21, 2016 at 19:04 UTC