Can you be a little more specific about which duplicates you wish to ignore? Do you expect to find duplicate words within the same line? Within the file but not within a single line?
OK, the code above writes the entries to an output log.
Here's an example of one of the logs:
cell-tstc-65_DM/userQ
cell-tstc-65_DM/userQ
node-tstc-65_DM/userF
I need to ignore both the blank entry and the duplicate. The blank entry gets written when authDataAlias is set to an empty string: ""
Ok, so it sounds like you have a single authAlias on any given line, but the same authAlias may be present multiple times in a given file. And you want a list of all the unique authAlias entries within the file. Given that assumption, what you want to do is store each authAlias in a hash within the loop, filtering out any that you're not interested in (either because they've already been found, or because they are not valid). So, with those changes, you end up with:
#!/usr/bin/perl
use strict;
use warnings;
my $data_file = '/home/resources.xml';
my $data_out  = '/home/out.log';
open my $data_in, '<',  $data_file or die "can't open $data_file: $!";
open my $data_fh, '>>', $data_out  or die "can't open $data_out: $!";
my %match_hash;    # aliases already written to the output file
while (my $line = <$data_in>) {
    if ($line =~ m/authDataAlias=(.*-.*-.*_DM)/i) {
        my $match = $1;
        next if $match eq '';                # skip empty aliases
        next if exists $match_hash{$match};  # skip aliases already seen
        $match_hash{$match} = 1;             # remember it, then print it once
        print {$data_fh} "$match\n";
    }
}
close $data_in;
close $data_fh;
You could also move the print statement out of the loop: first collect every unique alias in %match_hash, then open the output file, print the hash keys, and close the file.
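Here is a minimal sketch of that collect-then-print variant. It reads from an in-memory string so it runs on its own; the sample lines and the collect_aliases/write_aliases helpers are hypothetical, and it assumes each alias is written as authDataAlias="..." (so it swaps in the pattern authDataAlias="([^"]+)" instead of the _DM pattern above).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: gather unique, non-empty authDataAlias values.
# Assumption: every alias is double-quoted, e.g. authDataAlias="cell-tstc-65_DM/userQ".
sub collect_aliases {
    my ($in) = @_;
    my %seen;
    while (my $line = <$in>) {
        # [^"]+ needs at least one character, so authDataAlias="" never matches
        $seen{$1} = 1 while $line =~ m/authDataAlias="([^"]+)"/gi;
    }
    return sort keys %seen;    # sorted only to make the output order stable
}

# Hypothetical helper: write the collected aliases in one pass, after the loop.
sub write_aliases {
    my ($out, @aliases) = @_;
    print {$out} "$_\n" for @aliases;
}

# Made-up sample data standing in for resources.xml.
my $sample = <<'XML';
<a authDataAlias="cell-tstc-65_DM/userQ" x="1">
<b authDataAlias="">
<c authDataAlias="cell-tstc-65_DM/userQ" y="2">
<d authDataAlias="node-tstc-65_DM/userF">
XML

open my $in, '<', \$sample or die "can't read sample: $!";
my @aliases = collect_aliases($in);
close $in;

write_aliases(\*STDOUT, @aliases);
```

For the real files, open /home/resources.xml with '<' and /home/out.log with '>>' and pass those handles in place of the in-memory one and STDOUT. Perl's hash order is not fixed, which is why the keys are sorted before printing.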
I can deal with the duplicates; that's not a big deal. I think I need help with my regex, though. Some of the strings have an empty value and some have a value, and I need the ones that look like this:
authDataAlias="cell-tstc-65_DM/userQ"
I'm only concerned with: cell-tstc-65_DM/userQ
If I use this pattern, I get a blank line in the output file, because one of the authDataAlias attributes is set to ="":
($line =~ m/authDataAlias=\"([^\"]*)\"/i)
cell-tstc-65_DM/userQ
cell-tstc-65_DM/user1
If I use this pattern instead, it skips the empty string but captures too much of the line:
($line =~ m/authDataAlias=(.*-.*-.*_DM\/.*)/i)
"cell-tstc-65_DM/userQ" connectionDefinition="ConnectionDefinition_1054132487569" cmpDatasource="DataSource_1195273954323">
"cell-tstc-65_DM/user1" relationalResourceAdapter="builtin_rra" statementCacheSize="10" datasourceHelperClassname="com.ibm.websphere.rsadapter.DB2UniversalDataStoreHelper">
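">
One possible fix, sketched under the assumption that every alias is double-quoted. The first pattern matches authDataAlias="" because [^"]* allows zero characters, and the second over-captures because .* is greedy and runs to the end of the line, swallowing the other attributes. Using [^"]+ (one or more non-quote characters) keeps the capture inside the quotes and refuses to match an empty value. The sample lines and the capture helper below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample lines modeled on the resources.xml excerpt above.
my @lines = (
    '<x authDataAlias="" jndiName="a">',
    '<y authDataAlias="cell-tstc-65_DM/userQ" connectionDefinition="ConnectionDefinition_1054132487569">',
);

# Hypothetical helper: first capture group of $re on $line, or undef if no match.
sub capture {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

my $star   = qr/authDataAlias="([^"]*)"/i;         # zero or more: also matches ""
my $greedy = qr/authDataAlias=(.*-.*-.*_DM\/.*)/i;  # greedy .* runs to end of line
my $plus   = qr/authDataAlias="([^"]+)"/i;         # one or more: skips ""

for my $line (@lines) {
    for my $re ($star, $greedy, $plus) {
        my $got   = capture($line, $re);
        my $shown = defined $got ? "[$got]" : '(no match)';
        print "$shown\n";
    }
}
```

If you would rather keep [^"]*, the other option is to test the capture inside the loop and skip it with next if $1 eq ''.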