Need Some help with finding a word in a file

was6guy has asked for the wisdom of the Perl Monks concerning the following question:

I need some help pulling an occurrence of a string out of a file. The string can appear at different positions in the file, and it does not occur a set number of times. I'd like to pull the string out of the file and save it to another file, ignoring duplicate occurrences of the string. Here's what the string looks like:

authDataAlias="cell-tstc-65_DM/userQ"

The string will always be: authDataAlias="*-*-*_DM/*

Here's an example of the line containing the string:

As an easy first step, I can pull the line containing the string. If I knew where the string would be in the line each time, I could grab it, but since its position is random... I'm lost. Could someone help me expand on this:

#!/usr/bin/perl

my $data_file = '/home/resources.xml';
my $data_out = '/home/out.log';
open DATA, "$data_file" or die "can't open $data_file $!";
open DATA_OUT, ">>$data_out";
my @array_of_data = <DATA>;


foreach my $line (@array_of_data)
{

     if ($line =~ m/authDataAlias=.*-.*-.*_DM/i)
        {
        print DATA_OUT "$line\n";
        }
}

close (DATA);
close (DATA_OUT);

Replies are listed 'Best First'.
Re: Need Some help with finding a word in a file by jrsimmon (Hermit) on Nov 28, 2007 at 23:57 UTC
You're 99% of the way there already! Use () and $1 in your match to pull out the data you need. Ex:

#!/usr/bin/perl

my $data_file = '/home/resources.xml';
my $data_out = '/home/out.log';
open DATA, "$data_file" or die "can't open $data_file $!";
open DATA_OUT, ">>$data_out";
my @array_of_data = <DATA>;
my $match;

foreach my $line (@array_of_data)
{
    if ($line =~ m/authDataAlias=(.*-.*-.*_DM)/i)
    {
        # $1 holds whatever the parenthesized part of the regex matched
        $match = $1;
        # write only the captured text, not the whole line
        print DATA_OUT "$match\n";
    }
}

close (DATA);
close (DATA_OUT);

The special variables $1, $2, etc. are set to the data inside the parens when you use parens to encapsulate part of your regex. So $1 holds what the first (...) matched, $2 the second, and so forth.
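To illustrate the capture variables described above, here is a minimal sketch with two groups. The sample line and the variable names $alias and $user are made up for illustration; only the authDataAlias format comes from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line, modeled on the alias format shown in the question.
my $line = 'name="dataSource" authDataAlias="cell-tstc-65_DM/userQ" cmpDatasource="x"';

# Two capture groups: $1 gets the part before the slash, $2 the part after it.
# [^"\/]+ cannot cross a quote or a slash; [^"]+ stops at the closing quote.
my ($alias, $user);
if ($line =~ m/authDataAlias="([^"\/]+_DM)\/([^"]+)"/) {
    ($alias, $user) = ($1, $2);
    print "alias: $alias\n";
    print "user:  $user\n";
}
```

Copying $1 and $2 into named variables right after the match is a common habit, since the next successful match overwrites them.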
Re^2: Need Some help with finding a word in a file by was6guy (Initiate) on Nov 29, 2007 at 14:42 UTC
Thank you so much. Does anyone know how to ignore duplicates?
Re^3: Need Some help with finding a word in a file by jrsimmon (Hermit) on Nov 29, 2007 at 15:42 UTC
Can you be a little more specific about which duplicates you wish to ignore? Do you expect to find duplicate words within the same line? Within the file but not within a single line?
Re^4: Need Some help with finding a word in a file by was6guy (Initiate) on Nov 29, 2007 at 19:27 UTC
Re^5: Need Some help with finding a word in a file by jrsimmon (Hermit) on Nov 29, 2007 at 20:43 UTC
Re^3: Need Some help with finding a word in a file by was6guy (Initiate) on Nov 29, 2007 at 21:27 UTC
I can deal with the duplicates; that's not a big deal. But I think I need help with my regex. Some of the strings have a null value and some have a value. I need the ones that look like this:

authDataAlias="cell-tstc-65_DM/userQ"

I'm only concerned with:

cell-tstc-65_DM/userQ

If I do this, the match returns a blank line in the file, since one of the authDataAlias strings is set to ="":

($line =~ m/authDataAlias=\"([^\"]*)\"/i)

cell-tstc-65_DM/userQ
cell-tstc-65_DM/user1

If I run this match, it ignores the empty string but returns too much of the line:

($line =~ m/authDataAlias=(.*-.*-.*_DM\/.*)/i)

"cell-tstc-65_DM/userQ" connectionDefinition="ConnectionDefinition_1054132487569" cmpDatasource="DataSource_1195273954323">
"cell-tstc-65_DM/user1" relationalResourceAdapter="builtin_rra" statementCacheSize="10" datasourceHelperClassname="com.ibm.websphere.rsadapter.DB2UniversalDataStoreHelper">
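One possible way out of both problems above, offered as a sketch rather than a definitive fix: use a negated character class with + instead of * or .* inside the quotes. The + requires at least one character, so an empty authDataAlias="" no longer matches, and [^"] cannot run past the closing quote. The sample lines below are hypothetical, shortened from the output in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines: one empty alias, two populated ones.
my @lines = (
    '<factories name="a" authDataAlias="" connectionDefinition="ConnectionDefinition_1">',
    '<factories name="b" authDataAlias="cell-tstc-65_DM/userQ" connectionDefinition="ConnectionDefinition_1">',
    '<factories name="c" authDataAlias="cell-tstc-65_DM/user1" relationalResourceAdapter="builtin_rra">',
);

my @found;
foreach my $line (@lines) {
    # [^"]+ needs at least one character, so authDataAlias="" is skipped,
    # and it cannot run past the closing quote into the rest of the line.
    if ($line =~ m/authDataAlias="([^"]+_DM\/[^"]+)"/i) {
        push @found, $1;
    }
}

print "$_\n" for @found;
```

With these three lines, only the two populated aliases are collected, each without its surrounding quotes.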
Re: Need Some help with finding a word in a file by thundergnat (Deacon) on Nov 29, 2007 at 20:54 UTC
If you know what is directly before the information you need, try changing the input record separator. Any time you think "unique", you most likely will want a hash.

#!/usr/bin/perl
use warnings;
use strict;

$/ = 'authDataAlias=';

my %no_dupes;

foreach my $line (<DATA>) {
    if ($line =~ m/^"(.*?_DM\S+)"/i) {
        $no_dupes{$1} = 0;
    }
}

print "$_\n" for keys %no_dupes;

__DATA__
<factories xmi:type="resources.jdbc:CMPConnectorFactory" xmi:id="CMPConnectorFactory_1195273978412" name="dataSource" authMechanismPreference="BASIC_PASSWORD" authDataAlias="cell-tstc-65_DM/userQ" connectionDefinition="ConnectionDefinition_1054132487569" cmpDatasource="DataSource_1195273954323"><factories xmi:type="resources.jdbc:CMPConnectorFactory" xmi:id="CMPConnectorFactory_1195273978412" name="dataSource" authMechanismPreference="BASIC_PASSWORD" authDataAlias="cell-tstc-65_DM/userQ" connectionDefinition="ConnectionDefinition_1054132487569" cmpDatasource="DataSource_1195273954323"><factories xmi:type="resources.jdbc:CMPConnectorFactory" xmi:id="CMPConnectorFactory_1195273978412" name="dataSource" authMechanismPreference="BASIC_PASSWORD" authDataAlias="cell-tstc-65_DM/userF" connectionDefinition="ConnectionDefinition_1054132487569" cmpDatasource="DataSource_1195273954323">
<factories xmi:type="resources.jdbc:CMPConnectorFactory" xmi:id="CMPConnectorFactory_1195273978412" name="dataSource" authMechanismPreference="BASIC_PASSWORD" authDataAlias="node-tstc-65_DM/userF" connectionDefinition="ConnectionDefinition_1054132487569" cmpDatasource="DataSource_1195273954323">
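A variation on the hash idea above, as a sketch only: keys returns hash keys in no particular order, so if the order of first appearance matters, a %seen hash plus an array keeps it. This version reads line by line instead of changing $/, and uses /g to pick up more than one alias per line. The sample lines and the names %seen and @unique are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the second line holds two aliases, one of them a repeat.
my @lines = (
    'authDataAlias="cell-tstc-65_DM/userQ" cmpDatasource="DataSource_1"',
    'authDataAlias="cell-tstc-65_DM/userQ" x="1" authDataAlias="node-tstc-65_DM/userF"',
    'authDataAlias=""',
);

my %seen;      # aliases already recorded
my @unique;    # aliases in first-seen order

foreach my $line (@lines) {
    # With /g in list context, the match returns every capture on the line.
    foreach my $alias ($line =~ m/authDataAlias="([^"]+_DM\/[^"]+)"/gi) {
        # $seen{$alias}++ is false only the first time a value appears.
        push @unique, $alias unless $seen{$alias}++;
    }
}

print "$_\n" for @unique;
```

The counts left in %seen also tell you how often each alias occurred, should that ever be useful.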
Re^2: Need Some help with finding a word in a file by was6guy (Initiate) on Nov 29, 2007 at 21:33 UTC
I think I need help with my regex. Some of the strings have a null value and some have a value. I need the ones that look like this:

authDataAlias="cell-tstc-65_DM/userQ"

I'm only concerned with:

cell-tstc-65_DM/userQ

If I do this, the match returns a blank line in the file, since one of the authDataAlias strings is set to ="":

($line =~ m/authDataAlias=\"([^\"]*)\"/i)

cell-tstc-65_DM/userQ
cell-tstc-65_DM/user1

If I run this match, it ignores the empty string but returns too much of the line:

($line =~ m/authDataAlias=(.*-.*-.*_DM\/.*)/i)

"cell-tstc-65_DM/userQ" connectionDefinition="ConnectionDefinition_1054132487569" cmpDatasource="DataSource_1195273954323">
"cell-tstc-65_DM/user1" relationalResourceAdapter="builtin_rra" statementCacheSize="10" datasourceHelperClassname="com.ibm.websphere.rsadapter.DB2UniversalDataStoreHelper">
Re^3: Need Some help with finding a word in a file by was6guy (Initiate) on Nov 29, 2007 at 21:41 UTC
Your above example works perfectly. THANK YOU!