Easiest way to filter a file based on user input

Peter Keystrokes has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Easiest way to filter a file based on user input by haukex (Archbishop) on Jul 07, 2017 at 10:51 UTC
Sorry, but I don't understand your description. Could you show a few examples of what the user will be entering on the command line, and for each sample input show which lines should be filtered and which shouldn't?
Re^2: Easiest way to filter a file based on user input by Peter Keystrokes (Beadle) on Jul 07, 2017 at 11:07 UTC
So for example, if the user enters -3, all the lines in the file that begin with a numerical value that is greater than 3 will be excluded. Or if the user enters the value 1, all lines beginning with a value greater than 1 will be excluded.
Re^3: Easiest way to filter a file based on user input (updated) by haukex (Archbishop) on Jul 07, 2017 at 11:38 UTC
"So for example, if the user enters -3, all the lines in the file that begin with a numerical value that is greater than 3 will be excluded. Or if the user enters the value 1, all lines beginning with a value greater than 1 will be excluded."

So based on that description, the user entering `-3` is the same as entering `3`?

    #!/usr/bin/env perl
    use strict;
    use warnings;

    print "Enter limit: ";
    chomp( my $limit = <STDIN> );
    $limit = abs($limit);

    open my $in, '<', "file.hairpin" or die $!;
    open my $sifted, '>', "new_file.hairpin" or die $!;
    while (<$in>){
        next if /^None/;
        next if /^(\d+)/ && $1 > $limit;
        print $sifted $_;
    }
    close $in;
    close $sifted;

Or as a one-liner (where "123" is the limit):

    perl -ne 'print unless /^None/ || ( /^(\d+)/ && $1>123 )' file.hairpin >new_file.hairpin

As for your code here, it looks like you don't need to collect your lines in arrays but can write them to the output file directly (or, at the very least, you don't need to open your output file once per line of output).

Update: I just noticed that the sample input in the OP includes decimals and negative numbers, so you'd have to adjust the regex in my example code above accordingly. But before you try to develop really complex regexes, have a look at Regexp::Common::number.
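The update above says the `\d+` pattern needs adjusting for decimals and negative numbers. The following is only a minimal sketch of one way to do that with a hand-written pattern rather than Regexp::Common. It assumes the limit is compared with its sign intact (no `abs`), so that entering -3 keeps only scores of -3 or lower; that reading of the question is an assumption. `keep_line` and `$LEADING_NUMBER` are hypothetical names introduced for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hand-written pattern (an assumption, not the one from Regexp::Common):
# an optionally signed integer or decimal at the start of the line,
# e.g. "-4.4", "3", "+0.5".
my $LEADING_NUMBER = qr/^([+-]?\d+(?:\.\d+)?)/;

# Hypothetical helper: returns 1 if the line should be kept, 0 otherwise.
sub keep_line {
    my ($line, $limit) = @_;
    return 0 if $line =~ /^None/;        # drop lines that have no score
    if ( $line =~ $LEADING_NUMBER ) {
        return $1 <= $limit ? 1 : 0;     # drop scores above the limit
    }
    return 1;                            # keep headers and anything else
}

# Demo using lines shaped like the sample data posted in this thread.
my $limit = -3;
my @lines = (
    "hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1 FORWARD\n",
    "-4.4.. 6 .. 17 xxxxxxxxxxGTGAC CAGT ATGC ACTG AAGATGAGGTTTGTG\n",
    "-0.9.. 5 .. 18 xxxxxxxxxxxGTGA CCAGT ATGC ACTGA AGATGAGGTTTGTGG\n",
    "None.. 1 .. 20 xxxxxxxxxxxxxxx GTGACCAGTATGCACTGAAG ATGAGGTTTGTGGAC\n",
);
print grep { keep_line($_, $limit) } @lines;
```

With a limit of -3 the demo keeps the header line and the -4.4 line and drops the -0.9 and None lines. If the absolute-value reading is the intended one instead, the comparison would need `abs` on both sides.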
Re^4: Easiest way to filter a file based on user input (updated) by Peter Keystrokes (Beadle) on Jul 07, 2017 at 16:24 UTC
Re^5: Easiest way to filter a file based on user input (updated) by hippo (Archbishop) on Jul 07, 2017 at 16:44 UTC
Re^4: Easiest way to filter a file based on user input (updated) by Peter Keystrokes (Beadle) on Jul 07, 2017 at 14:51 UTC
Re: Easiest way to filter a file based on user input by 1nickt (Canon) on Jul 07, 2017 at 10:49 UTC
Hi, please show the code you have tried, reduced to an SSCCE. The way forward always starts with a minimal test.
Re^2: Easiest way to filter a file based on user input by Peter Keystrokes (Beadle) on Jul 07, 2017 at 11:04 UTC
My sincerest apologies for the amateurish code you're about to see...

    #!/usr/bin/perl
    use strict; use warnings;

    print "The lower the score the more stable the structure.", "\n",
          "Please set a limiting value e.g. -3: ", "\n";

    my $value = <STDIN>;

    open IN, "file.hairpin", or die $!;

    my @trash;
    my @treasure;
    while (<IN>){
        if ($_ =~ /^>+/){
            push @treasure, $_;
        }elsif($_ =~ /^None+/){
            push @trash, $_;
        }elsif($_ =~ /(^d+)/){
            ## Here I don't know how to incorporate the value I get from the user with the value
            ## in the file
        }else{
            push @treasure, $_;
        }
    }
    close IN;

    foreach my $stuff (@treasure){
        open SIFTED, '>>', "new_file.hairpin", or die $!;
        print SIFTED, $stuff."\n";
        close SIFTED;
    }
Re^3: Easiest way to filter a file based on user input by 1nickt (Canon) on Jul 07, 2017 at 12:28 UTC
Hi, thanks for posting your code. Here is a version that appears to do what you want. Note the following things:

- You need to `chomp()` the user input to remove the newline so you can use the string in a comparison.
- The sample data you provided doesn't contain anything that would match your first regexp.
- The regexp that captures the number at the start of the line (for comparison with the user input) is crude: it only matches negative numbers with exactly one integer digit and one decimal place. You'll need to change it if the user could enter a positive number, or a negative integer, or anything else.
- After capturing the match it is available in the special variable `$1`, which is used for the comparison.
- I placed your sample data in the script in the __DATA__ section for this demo; it's fine to open and read a file as in your original. I also skipped the writing to an out file.
- I placed multiple "debug statements" in the code, i.e. printing out things to show what's going on. Once the program is working correctly you can remove those, but it's a good technique for discovering problems in your data processing.

    #!/usr/bin/perl
    use strict; use warnings; use feature 'say';

    print "The lower the score the more stable the structure.", "\n",
          "Please set a limiting value e.g. -3: ", "\n";

    chomp( my $value = <STDIN> );
    chomp( my @input = <DATA> );

    my @trash;
    my @treasure;

    for ( @input ){
        if ( /^>+/ ) {
            say "$_ matches '/^>+/'";
            push @treasure, $_;
        } elsif ( /^None/ ) {
            say "$_ matches '/^None/'";
            push @trash, $_;
        } elsif( /(^[\d\.-]{4})/ ) {
            say "$_ matches '/(^[\d\.-]{4})/'";
            if ( $1 <= $value ) {
                say "$1 is <= $value";
                push @treasure, $_;
            } else {
                say "$1 is > $value";
                push @trash, $_;
            }
        } else {
            say "$_ doesn't match anything!";
            push @trash, $_;
        }
    }

    say 'Treasure:';
    foreach my $stuff ( @treasure ) {
        say $stuff;
    }

    __END__
    hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1 FORWARD
    -4.4.. 6 .. 17 xxxxxxxxxxGTGAC CAGT ATGC ACTG AAGATGAGGTTTGTG
    -0.9.. 5 .. 18 xxxxxxxxxxxGTGA CCAGT ATGC ACTGA AGATGAGGTTTGTGG
    None.. 1 .. 20 xxxxxxxxxxxxxxx GTGACCAGTATGCACTGAAG ATGAGGTTTGTGGAC

Hope this helps! The way forward always starts with a minimal test.
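The demo above skips writing an output file and warns that the capture regexp needs to change for other kinds of input. As a hedged sketch of how those two points might be covered, the code below validates the limit with `looks_like_number` from the core module Scalar::Util and writes kept lines to a single output handle that is opened once. `filter_scores` is a hypothetical helper, and the assumption that every score is followed by ".." is taken only from the sample data in this thread. Unlike the demo, it keeps lines that match nothing (such as the header), which is also an assumption. In-memory filehandles stand in for `file.hairpin` and `new_file.hairpin` so the example is self-contained.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: copy lines from $in to $out, skipping "None" lines
# and lines whose leading score is greater than $limit.
sub filter_scores {
    my ($in, $out, $limit) = @_;
    die "Limit '$limit' is not a number\n" unless looks_like_number($limit);

    while ( my $line = <$in> ) {
        next if $line =~ /^None/;
        # Assumption from the sample data: the score is the text before
        # the first "..", e.g. "-4.4.. 6 .. 17" gives "-4.4".
        my ($score) = $line =~ /^(\S+?)\.\./;
        next if defined $score
             && looks_like_number($score)
             && $score > $limit;
        print {$out} $line;    # headers and low-enough scores are kept
    }
    return;
}

# Demo: in-memory handles replace the real input and output files.
my $data = join '',
    "hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1 FORWARD\n",
    "-4.4.. 6 .. 17 xxxxxxxxxxGTGAC CAGT ATGC ACTG AAGATGAGGTTTGTG\n",
    "-0.9.. 5 .. 18 xxxxxxxxxxxGTGA CCAGT ATGC ACTGA AGATGAGGTTTGTGG\n",
    "None.. 1 .. 20 xxxxxxxxxxxxxxx GTGACCAGTATGCACTGAAG ATGAGGTTTGTGGAC\n";

chomp( my $limit = "-3\n" );    # stands in for chomp( my $limit = <STDIN> )

open my $in,  '<', \$data       or die $!;
my $result = '';
open my $out, '>', \$result     or die $!;
filter_scores($in, $out, $limit);
close $in;
close $out;

print $result;
```

With real files, the two `open` calls would take the file names instead of scalar references; the point of the design is that the output handle is opened once, before the loop, rather than once per line.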
Re^4: Easiest way to filter a file based on user input by Peter Keystrokes (Beadle) on Jul 07, 2017 at 14:22 UTC