using an RE to consolidate repeated words

jaa has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am trying to use an RE to consolidate repeated words.

A simplified example of my before/after:

  my $before  = "apples/apples_green";
  my $after   = $before;
  my $desired = "apples_green";

  # I tried this, but no go!
  $after =~ s#^(\w+)/$1_#$1_#;

  print "before [$before] after [$after] desired [$desired]\n";

Any suggestions much appreciated!

Regards,

Jeff


Replies are listed 'Best First'.
Re: using an RE to consolidate repeated words by GrandFather (Saint) on Aug 16, 2005 at 11:19 UTC
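`s#^(\w+)/$1_#$1_#` should be `s#^(\w+)/\1_#$1_#`. `$1` does not work as a backreference in the match string: it is interpolated like any other variable before the match runs, so it holds whatever an earlier successful match captured (nothing, here). Use `\1` in the match string and `$1` in the replacement string.

Perl is Huffman encoded by design.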
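A minimal sketch of the corrected substitution, run against the string from the original question plus one string that should be left alone. The helper name `consolidate` and the second test string are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# consolidate() is a hypothetical helper name, used only for this sketch.
sub consolidate {
    my ($str) = @_;
    # \1 refers back to whatever group 1 captured in *this* match;
    # $1 is only meaningful on the replacement side.
    $str =~ s#^(\w+)/\1_#$1_#;
    return $str;
}

print consolidate("apples/apples_green"), "\n";   # repeated word is collapsed
print consolidate("apples/pears_green"),  "\n";   # no repeat, left unchanged
```

On Perl 5.10 and later, `\g{1}` is an equivalent spelling of `\1` that cannot be confused with an octal escape.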
Re^2: using an RE to consolidate repeated words by jaa (Friar) on Aug 16, 2005 at 11:50 UTC
Fab - thanks and kudos, just the pointer I was looking for! Regards, Jeff
Re: using an RE to consolidate repeated words by inman (Curate) on Aug 16, 2005 at 11:23 UTC
You need to use a back reference.

  $after =~ s#^(\w+)/\1#$1#;

Tidying up the syntax and changing the test for the bit in between the repeats gives:

  $after =~ s/(\w+)\W+\1/$1/;

I assumed that there is no need to anchor the pattern to the start of the string, so I removed the ^.
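One thing to watch with the unanchored form: without the `^`, the captured group may start in the middle of a word. The sketch below compares the unanchored pattern with a variant that adds `\b` in front of the group. The `\b` variant, both helper names, and the `bat/at_home` test string are assumptions added for illustration, not something proposed in the thread.

```perl
use strict;
use warnings;

# Both helper names are hypothetical, for illustration only.
sub squeeze_loose {
    my ($str) = @_;
    $str =~ s/(\w+)\W+\1/$1/;      # unanchored form: group 1 may start mid-word
    return $str;
}

sub squeeze_bounded {
    my ($str) = @_;
    $str =~ s/\b(\w+)\W+\1/$1/;    # group 1 must start at a word boundary
    return $str;
}

for my $s ("apples/apples_green", "bat/at_home") {
    printf "%-22s loose [%s] bounded [%s]\n",
        $s, squeeze_loose($s), squeeze_bounded($s);
}
```

Both versions turn `apples/apples_green` into `apples_green`. For `bat/at_home`, the unanchored pattern matches `at/at` inside the string and yields `bat_home`, while the `\b` variant leaves the string alone. Whether that matters depends on the real data.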
Re: using an RE to consolidate repeated words by GrandFather (Saint) on Aug 16, 2005 at 11:16 UTC
Can you provide a more realistic set of data? The single example you have given looks rather artificial. Alternatively, what generates data that looks like that?

Perl is Huffman encoded by design.
Re^2: using an RE to consolidate repeated words by jaa (Friar) on Aug 16, 2005 at 11:46 UTC
The data is being generated as part of a backup process, whose naming is outside my control. I have to collate stats on the various backup folders, by a derived group name. If I were to hand code it, I would do something like:

  use File::Basename qw( dirname basename );

  for my $folder (
      '/var/vavoom/cherry/cherry_etc',        # cherry_etc
      '/var/varoom/cherry/cherry_var_data',   # cherry_var_data
      '/var/vavoom/peach/peach_etc',          # peach_etc
      '/var/varoom/mysql/peach_mysql_chant',  # mysql_peach_chant
      '/var/vavoom/upload/var_finite',        # upload_var_finite
      '/var/vavoom/upload/var_open',          # upload_var_open
  ) {
      my $group  = basename($folder);
      my $parent = basename(dirname($folder));
      $group =~ s/$parent\_//g;
      $group = $parent . '_' . $group;
      print sprintf("%-40s %s\n", $folder, $group );
  }

This prints:

  /var/vavoom/cherry/cherry_etc            cherry_etc
  /var/varoom/cherry/cherry_var_data       cherry_var_data
  /var/vavoom/peach/peach_etc              peach_etc
  /var/varoom/mysql/peach_mysql_chant      mysql_peach_chant
  /var/vavoom/upload/var_finite            upload_var_finite
  /var/vavoom/upload/var_open              upload_var_open

I was hoping for pointers to an RE technique that would enable me to consolidate repeating words.

Regards, Jeff
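One possible regex-only take on the hand-coded loop above, as a sketch rather than a definitive answer: a single match pulls the last two path components out of the folder name, and a substitution then strips every `parent_` from the leaf before the parent is put back in front. The helper name `group_name` is hypothetical, and `\Q...\E` is added on the assumption that a parent directory name could contain regex metacharacters.

```perl
use strict;
use warnings;

# group_name() is a hypothetical helper; it applies the same rule as the
# File::Basename loop, but with regexes only.
sub group_name {
    my ($folder) = @_;
    # Capture the last two path components: parent directory and leaf.
    my ($parent, $leaf) = $folder =~ m{([^/]+)/([^/]+)/?\z}
        or return;
    # Drop every "parent_" occurrence from the leaf, quoting metacharacters.
    $leaf =~ s/\Q$parent\E_//g;
    return "${parent}_$leaf";
}

for my $folder (
    '/var/vavoom/cherry/cherry_etc',
    '/var/varoom/mysql/peach_mysql_chant',
    '/var/vavoom/upload/var_finite',
) {
    printf "%-40s %s\n", $folder, group_name($folder);
}
```

Like the original loop, this removes `parent_` wherever it appears in the leaf, including inside a longer word, so it may need tightening if real folder names can overlap that way.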
