Without knowing how the data is formatted, it's hard to give an exact solution, but here goes.

(Note: as I'm sure you know, everything in Perl can be done a million different ways. I prefer to use hash and array references and do everything in one or two regular expressions when possible.)

First, read in your file and store the unwanted ids:
## open the file and read in data
my $list_file = '/g/Viruses/prophage_data/emptySeqList_aa.txt';
    ## try to use single quotes when
    ## you don't need string interpolation,
    ## e.g., no variables or "\n"
open (my $fh, '<', $list_file) or die "Cannot open $list_file: $!";
    ## it is often preferable to use a
    ## variable to store a filehandle
my @lines = <$fh>; # reads entire file in one go
    ## This is technically bad form,
    ## but assuming your file isn't too big, it's fine
close ($fh);
my $text = join ('', @lines); # combines all lines into one string

## Here's where your file format will change the code
## I'm assuming nothing is in the file but gene ids,
## and that each id consists of letters, numbers, and underscores.
## This regex will identify all geneids (using \w+)
## and store them as hash keys.
my $geneids_to_remove = {}; # create a hash reference
$text =~ s/(\w+)
    (?{ # in regex code
        $geneids_to_remove->{$1} = 1; # store geneids in a hash
    })
    //gx;
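
If the in-regex code block feels opaque, the same lookup hash can be built with a global match in list context. This is only a sketch of an alternative, not a replacement for the approach above; the ids in $text are made-up sample data, and I'm still assuming the file holds nothing but ids made of letters, numbers, and underscores.

```perl
use strict;
use warnings;

## hypothetical sample data standing in for the contents of the id file
my $text = "gene_001\ngene_002\n\ngene_017\n";

## a global match in list context returns every \w+ token in the string
my @ids = $text =~ /(\w+)/g;

## turn the list into (id => 1) pairs and store them in a hash reference
my $geneids_to_remove = { map { $_ => 1 } @ids };

## show what was collected, in sorted order
print join(',', sort keys %$geneids_to_remove), "\n";
## prints: gene_001,gene_002,gene_017
```

Either way you end up with a hash whose keys are the ids to drop, so the rest of the code does not care which version you used.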

Now, we read in your other file -- there are two options here:
1) do it per line or
2) do it all at once.

#### Per line ####
my $ptt_file = '/g/Viruses/prophage_data/prophage_region.ptt1';
open ($fh, '<', $ptt_file) or die "Cannot open $ptt_file: $!";

## precompile a regex to capture the geneid on each line
## I assume the gene id is the first thing on each line
my $gene_id;
my $rx_find_geneid = qr/^(\w+) (?{ $gene_id = $1; })/x;

## I prefer to avoid $_ for clarity
my $saved_lines = '';
while (my $line = <$fh>) {
    ## run precompiled regex; a line with no leading geneid is kept as-is
    ## (without this check, $gene_id would still hold the previous line's id)
    if ($line !~ /$rx_find_geneid/) {
        $saved_lines .= $line;
        next;
    }
    ## check to see if it exists in the hash
    ## if not, save it
    if (! exists $geneids_to_remove->{$gene_id}) {
        $saved_lines .= $line;
    }
}
close ($fh);
or (my preference)
#### One big regex ####
## don't do this and the previous
## read in file
my $ptt_file = '/g/Viruses/prophage_data/prophage_region.ptt1';
open ($fh, '<', $ptt_file) or die "Cannot open $ptt_file: $!";
@lines = <$fh>;
close ($fh);
$text = join ('', @lines);

## you don't need to precompile this -- it's for clarity
## and in case you ever want to remove these from multiple
## files, i.e., put it in a loop
## Again, I assume the geneid is at the front of the line.
## Note: lines that don't start with a geneid, and a last line
## with no trailing newline, never match and so are not saved.
my $saved_lines = '';
my $rx_rm_lines = qr/
    (^(\w+).+$ [\r\n])
    (?{
        if (! exists $geneids_to_remove->{$2}) {
            $saved_lines .= $1;
        }
    })
    /xm; # the 'm' modifier lets ^ and $ match at every line, not just the ends of the string

## run the regex (you can use s/.../$1/g if you
## don't want to destroy the string as you search)
$text =~ s/$rx_rm_lines//g;
Now write it out (regardless of which method you used):
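## write out saved data
my $outfile = 'filtered.ptt'; ## hypothetical name; set this to your own output path
open ($fh, '>', $outfile) or die "Cannot open $outfile: $!";
print $fh $saved_lines;
close ($fh);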
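
To try the whole flow without touching your real files, here is a small self-contained sketch of the per-line variant. Everything in it is an assumption for illustration: the sample ids, the tab-separated columns, and the use of a plain capture instead of an in-regex code block. It reads from an in-memory filehandle, so swap in your own paths once it behaves the way you expect.

```perl
use strict;
use warnings;

## hypothetical sample inputs; real data would come from the two files
my $id_text  = "gene_002\ngene_017\n";
my $ptt_text = "gene_001\t100..400\t+\n"
             . "gene_002\t500..900\t-\n"
             . "gene_017\t950..1200\t+\n"
             . "gene_020\t1300..1500\t-\n";

## build the lookup hash of ids to drop
my %remove = map { $_ => 1 } $id_text =~ /(\w+)/g;

## read the table line by line from an in-memory filehandle
open (my $in, '<', \$ptt_text) or die "Cannot open in-memory input: $!";
my $saved_lines = '';
while (my $line = <$in>) {
    ## capture the leading id; lines without one are kept unchanged
    my ($gene_id) = $line =~ /^(\w+)/;
    next if defined $gene_id && exists $remove{$gene_id};
    $saved_lines .= $line;
}
close ($in);

## only the gene_001 and gene_020 lines should be left
print $saved_lines;
```

Declaring $gene_id inside the loop means a line that fails to match can never inherit the id from the line before it.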

In reply to Re: compare lists and delete unwanted from file by muppetjones
in thread compare lists and delete unwanted from file by AWallBuilder
