Dear Monks,
I am struggling with text files that I want to modify with a regex. I converted a PDF file to a text file, which then contains the following text:
"Inventories
Trade receivables
Assets
This item includes impairment losses."
I now want to remove all lines that do not end with a period (the lines Inventories, Trade receivables, Assets) and have the following code:
use strict;
use warnings;
use File::Copy;

my $destination = "/directory/originalinputfiles/";
my @files = glob("/directory/*.txt");

foreach my $file (@files) {
    my $newfile = $file . ".tmp";
    open(IN, '<:encoding(UTF-8)', $file) or die "Could not open '$file' $!";
    open(OUT, '>:encoding(UTF-8)', $newfile) or die "Could not open '$newfile' $!";
    while (my $text = <IN>) {
        # Intended to blank out lines that start and end with a letter
        $text =~ s/^[a-z].*[a-z]$//gmi;
        print OUT $text;
    }
    close(IN) or die "Could not close input file: '$file' $!";
    close(OUT) or die "Could not close output file '$newfile' $!";
    # Keep the original in the archive folder, then put the cleaned file in its place
    move $file, $destination or die "Could not move the original input file: $file to the folder: '$destination' $!";
    rename $newfile, $file or die "Could not rename file: '$newfile' $!";
    print "\n done \n";
}
If I run this script, the first three lines (Inventories, Trade receivables, Assets) are not matched and therefore not removed. However, if I manually add a paragraph break after every line in the text file, these lines are matched and removed. That is not an option, as I have thousands of text files.
I already tried to add a newline after every line with a regex, which also does not work:
$text =~ s/$/\n/;
Why is my code not working and how can I solve it?
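One possible cause, offered as a guess rather than a diagnosis: if the converted files use Windows-style CRLF line endings, every line ends in "\r\n", so the character just before the newline is "\r" and `[a-z]$` cannot match. The minimal sketch below assumes exactly that. The helper name `keep_sentence_lines` and the sample data are hypothetical and not part of the original script. It strips trailing whitespace (including a stray "\r") and keeps only the lines that end with a period.

```perl
use strict;
use warnings;

# Hypothetical helper: returns only the lines that end with a period.
# Trailing whitespace, including a "\r" left over from CRLF line endings,
# is stripped first; kept lines are written back with a plain "\n".
sub keep_sentence_lines {
    my (@lines) = @_;
    my @kept;
    for my $line (@lines) {
        (my $clean = $line) =~ s/\s+\z//;   # drops "\r\n", "\n" and trailing spaces
        next unless $clean =~ /\.\z/;        # skip lines not ending with a period
        push @kept, "$clean\n";
    }
    return @kept;
}

# Assumed sample input: the text from the question with CRLF line endings.
my @input = (
    "Inventories\r\n",
    "Trade receivables\r\n",
    "Assets\r\n",
    "This item includes impairment losses.\r\n",
);

# Prints only: This item includes impairment losses.
print keep_sentence_lines(@input);
```

Inside the original loop the same idea would amount to cleaning `$text` and using `next` for lines that do not end with a period, rather than substituting them with an empty string, which would leave blank lines behind. Note that this sketch also drops empty lines. Whether the files really contain "\r" is an assumption that can be checked with a hex dump of one of them.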
In reply to Perl regex txt file new line (not recognised?) by thurinus