You haven't shown us what you have tried so far, nor what your criteria for "best" are (fastest, least memory, simplest, etc). We cannot easily help you if you don't do your homework first.
From what you've given us, I'd egrep -v the data from the command line and skip perl altogether ...
The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
| [reply] [d/l] |
One way to do this (discard N lines previous to a match) is to keep a buffer of seen lines which is at least N long. You print lines which overflow out normally and then just discard the buffer on a match. Lastly, remember to print anything in the buffer at the end.
In code:
#!/usr/bin/perl
use strict;
use warnings;

# We don't really need an array for one line, but it seems
# conceptually nicer (and generalises more easily)
my $max_buffer_size = 1;
my @buffer;
my $line;

while ($line = <ARGV>) {
    push @buffer, $line;

    # Replace the pattern match with your criterion if this
    # isn't right.
    @buffer = () if $line =~ /^.id/;

    if (scalar @buffer > $max_buffer_size) {
        print shift @buffer;
    }
}

print @buffer;
| [reply] [d/l] |
I can, off the top of my head, think of at least three ways, which I view as distinct:
- Use Tie::File, which lets you treat a file (more or less) as an array.
- Use File::ReadBackwards, which (duh!) reads a file backwards.
- Read the file a record at a time, but keep track of the contents of the current and previous record, and print them as needed.
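As a rough illustration of the third option (remembering the previous line), here is a minimal sketch. The sub name drop_match_and_line_above and the sample data are made up for the example; the real record layout isn't shown in the thread, so treat the pattern as an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the input lines with every line matching $pattern removed,
# together with the single line immediately above it.
sub drop_match_and_line_above {
    my ($pattern, @lines) = @_;
    my @kept;
    my $previous;    # held back until we know the next line is not a match

    for my $line (@lines) {
        if ($line =~ $pattern) {
            undef $previous;    # discard the held line and the match itself
            next;
        }
        push @kept, $previous if defined $previous;
        $previous = $line;
    }

    push @kept, $previous if defined $previous;    # flush the last held line
    return @kept;
}

# Hypothetical sample data (assumed layout: a name line, then labelled lines).
my @sample = (
    "John Smith\n",
    "<id:> 12345\n",
    "<city:> Springfield\n",
    "<phone:> 555-0100\n",
);

print drop_match_and_line_above(qr/^<id:>/, @sample);
```

With the sample above this should print only the city and phone lines. To run it over a real file you would feed it the lines read from a filehandle instead of @sample.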
If your data are as shown (a name, followed by lines starting with labels, such as <id:>, <city:>, etc), something like this may work:
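#!perl
use strict;
use warnings;

# Assumption: the input file name is the first command-line argument.
my $infile = shift @ARGV or die "Usage: $0 infile\n";
open(my $fh, "<", $infile) or die "Could not open $infile because $!\n";
while (<$fh>) {
    # Skip lines that start with <id:> or with a letter (the name line).
    next if /^<id:>|^[A-Za-z]/;
    print;
}
close($fh);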
Now, if my regex brain is turned on, this regex should skip lines which start with <id:> or start with letters. Incidentally, how is this anonymizing data if you're leaving addresses and phone numbers?
Update
Having noticed ww's comment in a message, I may have misread or confused the title ("Remove line above matching criteria") and "extract name and id". The regex I put in the sample above (unless I screwed it up) should skip the name and id; to skip everything else one could change the "if" to "unless".
emc
If it's not foggy out, I need new glasses.
| [reply] [d/l] |
To reiterate what idsfa said, we need to know what you have tried, what the record separators are, etc, etc. Just generally more information.
That being said, you could always load each record into a variable (hash or array), then pass that variable to a function that checks the data for inconsistencies, or whatever you are looking for. The function removes whatever needs to be removed (manipulating a hash or array is simple if the structure doesn't change) and then returns the new variable to the main program. Obviously this can also be done via an OO method (and this is likely preferred, since this sounds like repetitive data).
| [reply] |
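To make the load-a-record-into-a-hash idea above a bit more concrete, here is a small sketch. The helper name scrub_record and the field names (name, id, city, phone) are assumptions for illustration only; how the records are actually parsed depends on separators we haven't been told about.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the record without the listed fields.
sub scrub_record {
    my ($record, @fields_to_remove) = @_;
    my %clean = %$record;                # shallow copy; caller's hash is untouched
    delete @clean{@fields_to_remove};    # hash-slice delete
    return \%clean;
}

# Hypothetical record; the field names are assumed, not taken from real data.
my %record = (
    name  => 'John Smith',
    id    => '12345',
    city  => 'Springfield',
    phone => '555-0100',
);

my $clean = scrub_record(\%record, qw(name id));
print "$_: $clean->{$_}\n" for sort keys %$clean;
```

Returning a copy rather than modifying the hash in place keeps the original record available if the main program still needs it; modifying in place would work just as well if it doesn't.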
Your first paragraph ("extract user names and ID's…") and your final paragraph ("remove the ID and name for each entry") are contradictory.
"Extract" usually means "copy from a file (or database, etc) for use elsewhere". "Remove" usually means "erase"; they are not (in the English version of computerese) synonyms, although they are in normal English (having "a tooth extracted" and "a tooth removed" both result in one fewer teeth in one's mouth).
emc
At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.
—Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
| [reply] |