Search and Replace


	PerlMonks

by nofernandes (Beadle)

on Aug 27, 2003 at 09:08 UTC

nofernandes has asked for the wisdom of the Perl Monks concerning the following question:

Hello sapient Monks,

I have a file that contains various kinds of information. Each kind of information is enclosed in a specific tag, as in XML or HTML, but the file is not a document of either type.

The file is like this:

<FMS>

   <HEAD>
      ORADEV61
      GL_FORMS   FULL_ANALYSIS
      ORAFORMS50   GET_DATABLOCK_REFS
   </HEAD>

   [<OPEN_API>
      Server8i
      User1
      READABLE
      passworduser1

   </OPEN_API>]

   <MOD>

   <NOT_REANALYZED>
      "G:\GL_FORMS_4\gsdl.fmb"   GL
      "G:\GL_FORMS_4\glccat.fmb"   GL
      "G:\GL_FORMS_4\glsdcnv.fmb"
      "G:\GL_FORMS_4\glcovsca.fmb>"   GL
   </NOT_REANALYZED>

   <EXTERNAL>
   </EXTERNAL>

   <REANALYZED>
      "G:\GL_FORMS_5\glsss.fmb"   GL
      "G:\GL_FORMS_5\glcxcvat.fmb>"   GL
      "G:\GL_FORMS_5\glcsdsnv.fmb"   GL
      "G:\GL_FORMS_5\glcqsoa.fmb" 
      "G:\GL_FORMS_5\glcqsdvur.fmb" 
      "G:\GL_FORMS_5\gldeqsdf.fmb"   GL
      "G:\GL_FORMS_5\glg xf rp.fmb"   GL
      "G:\GL_FORMS_5\gljou.fmb.fmb"   GL
      "G:\GL_FORMS_5\gl sou.fmb" 
      "G:\GL_FORMS_5\gl sou2.mmb.mmb"   GL
   </REANALYZED>

   </MOD>

</FMS>

My goal is to erase the content between <REANALYZED> and </REANALYZED> (or any other pair of tags) and substitute new content, i.e. a new list of files.

What's the best way to do that?

Must I use a regex, or is there a module for this kind of operation?

Thank you for your valuable help.

Nuno

(edited by ybiC: removed funky ¶ EOL chars)

Replies are listed 'Best First'.
Re: Search and Replace by liz (Monsignor) on Aug 27, 2003 at 09:17 UTC
If you stick in a:

    <?xml version="1.0" encoding="iso-8859-1"?>

you'll find that this actually is valid XML. Which means you could go for an XML type solution, either DOM or stream based. Or you could read the whole file in memory and do regular expressions.

Judging from what you've given us, there is no "best" way. There's just TIMTOWTDI.

Hope this helps.

Liz

Re: Search and Replace by cchampion (Curate) on Aug 27, 2003 at 12:25 UTC
This script will do what you want, provided that your file is not too big for your memory.

    #!/usr/bin/perl -w
    use strict;

    my $start_flag = '<REANALYZED>';
    my $stop_flag  = '</REANALYZED>';

    # new content for the block (single-quoted here-doc: backslashes stay literal)
    my $replacement = <<'REPL';
    "z:\path\newfilename1.ext" GL
    "z:\path\newfilename2.ext" GL
    "z:\path\newfilename3.ext" GL
    "z:\path\newfilename4.ext" GL
    "z:\path\newfilename5.ext" GL
    "z:\path\newfilename6.ext" GL
    REPL

    open ORIGINAL, "< original.txt"
        or die "can't open original file\n";
    my $original;
    {
        local $/;                   # slurp mode: read the whole file at once
        $original = <ORIGINAL>;
    }
    close ORIGINAL;

    # replace everything between the two flags; the captured group keeps
    # the whitespace that preceded the stop flag
    $original =~ s/ $start_flag
                    .*?
                    (\s*)
                    $stop_flag \n
                  /$start_flag\n$replacement$1$stop_flag/gsx;

    open COPY, "> copy.txt"
        or die "can't write to copy\n";
    print COPY $original;
    close COPY;

HTH

Re: Re: Search and Replace by dbwiz (Curate) on Aug 27, 2003 at 14:47 UTC
What about a one-liner? If you have your replacement strings in "repl.txt", this will do the trick.

    perl -0pe 'BEGIN{open R,"repl.txt";$repl=<R>;close R}s/(<REANALYZED>).*?(<\/REANALYZED>)/$1$repl$2/s' original.txt

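Since the question also asks about "any other tags", the same substitution can be wrapped in a sub that takes the tag name as a parameter. This is only a minimal sketch, not what either reply above posted: `replace_block` is a hypothetical name, and it assumes the tags sit on lines of their own and that the replacement text ends with a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: swap the body of every <TAG>...</TAG> block in a string.
# Assumes each tag is alone on its line and a tag never nests inside itself.
sub replace_block {
    my ($text, $tag, $replacement) = @_;
    $text =~ s{
        ^ ([ \t]*) <\Q$tag\E> [ \t]* \n    # opening tag line, group 1 is its indentation
        .*?                                # old body, as little as possible
        ^ [ \t]* </\Q$tag\E> [ \t]*        # closing tag line
        $
    }{$1<$tag>\n$replacement$1</$tag>}gmsx;
    return $text;
}

my $sample = <<'END';
<MOD>
   <REANALYZED>
      "G:\GL_FORMS_5\glsss.fmb"   GL
   </REANALYZED>
</MOD>
END

print replace_block($sample, 'REANALYZED',
    qq(      "z:\\path\\newfilename1.ext"   GL\n));
```

The `\Q...\E` pair quotes the tag name, so a name containing regex metacharacters would still be matched literally. Lines such as `[<OPEN_API>` in the sample file, where the tag is wrapped in brackets, are deliberately not matched by this sketch.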
Re: Search and Replace by aquarium (Curate) on Aug 27, 2003 at 12:41 UTC
If particular (exact) tags each map to a particular file holding the replacement content, then you are hardly dealing with regex at all, but instead with a simple string comparison (for the tags, that is). Something along the lines of the following should just about do it, if I'm understanding your question correctly.

    $| = 1;    # unbuffered STDOUT, so our prints and cat's output stay in order

    while($line=<>) {
        chomp $line;
        if($line=~m/<REANALYZED>/) {
            print "$line\n";
            # emit the new content in place of the old
            system("cat new_content_for_reanalyzed_tag_file");
            # skip the old content up to the closing tag
            do {
                $line=<>;
                chomp $line;
            } until($line=~m/<\/REANALYZED>/);
            print "$line\n";
        } elsif($line=~m/<ANOTHERTAG>/) {
            print "$line\n";
            system("cat another_file_for_this_tag");
            # . . . etc.
        } else {
            print "$line\n";
        }
    }

I'm sure you can make that much simpler by using some perlisms and make some subs out of re-usable code, but that's the main gist of it. Use redirection to feed this program with your input file, and redirect standard output as you need. E.g.

    perl this_script.pl <orig_file >new_file

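The "perlisms and subs" hinted at above might look roughly like the following. This is a sketch under assumptions, not the poster's code: `filter_blocks` and the hash of replacement bodies are hypothetical, the new content is held in memory instead of being printed with `cat`, and tags are assumed to be upper case and alone on their lines.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, replacing the body of each
# block whose tag name is a key of the %$new_body hash.
sub filter_blocks {
    my ($in, $out, $new_body) = @_;
    my $skip_until;                       # closing tag we are waiting for, if any
    while (my $line = <$in>) {
        if (defined $skip_until) {
            next unless index($line, $skip_until) >= 0;   # drop old content
            undef $skip_until;            # closing tag reached: fall through and print it
        }
        elsif ($line =~ /<([A-Z_0-9]+)>/ && exists $new_body->{$1}) {
            print {$out} $line, $new_body->{$1};          # opening tag, then new body
            $skip_until = "</$1>";
            next;
        }
        print {$out} $line;
    }
    return;
}

# Demo on an in-memory file; the tag content here is made up.
my $text = "<MOD>\n<REANALYZED>\nold line\n</REANALYZED>\n</MOD>\n";
open my $in, '<', \$text or die "can't open in-memory input: $!";
filter_blocks($in, \*STDOUT, { REANALYZED => "new line\n" });
close $in;
```

To use it as a filter with shell redirection, as suggested above, pass `\*STDIN` and `\*STDOUT` as the two handles. Because the loop is driven by `while`, an input that ends before the closing tag simply stops instead of looping forever.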
Re: Search and Replace by Abigail-II (Bishop) on Aug 27, 2003 at 09:18 UTC
"Must I use a regex, or is there a module for this kind of operation?"

Well, there is never a must to use a regexp. However, the question "is a regexp a good way to tackle this problem?" cannot be answered. You don't define what your "tags" look like; you just give an example. Your example suggests that tags appear on lines of their own, are all written in full caps, don't have attributes, don't nest themselves, and are always of the open/close form (no standalone tags).

Depending on the definition of what tags look like and how they can appear, regexes may or may not be the way to go.

Abigail

Re: Re: Search and Replace by nofernandes (Beadle) on Aug 27, 2003 at 09:34 UTC
The tags are precisely like that. They appear on lines of their own, are all written in full caps, don't have attributes, don't nest themselves, and are always of the open/close form.

And I can't include a line saying that it is XML, because this file is going to be read by an application that analyzes the file and extracts the various pieces of information from it.

So, the tags are always like that. They open and they close, with no tags inside, and always on separate lines.

Thanks for your help.

Nuno

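Those rules are simple enough to check mechanically before running any replacement. The following is a rough sketch only: `check_tags` is a hypothetical name, and the pattern tolerates square brackets around a tag purely because the sample file contains `[<OPEN_API>` and `</OPEN_API>]`.

```perl
use strict;
use warnings;

# Hypothetical checker: returns a list of problems; an empty list means the
# text follows the stated rules (tags alone on their line, upper case,
# no attributes, never nested in themselves, every opening tag closed in order).
sub check_tags {
    my ($text) = @_;
    my (@open, @problems);
    my $n = 0;
    for my $line (split /\n/, $text) {
        $n++;
        next unless $line =~ /</;         # plain data line, nothing to check
        if ($line =~ m{^\s*\[?<(/?)([A-Z][A-Z0-9_]*)>\]?\s*$}) {
            my ($closing, $tag) = ($1, $2);
            if (!$closing) {
                push @problems, "line $n: <$tag> nested in itself"
                    if grep { $_ eq $tag } @open;
                push @open, $tag;
            }
            elsif (@open && $open[-1] eq $tag) {
                pop @open;                # matches the most recent opening tag
            }
            else {
                push @problems, "line $n: unexpected </$tag>";
            }
        }
        else {
            push @problems, "line $n: tag shares its line with other text";
        }
    }
    push @problems, map { "<$_> never closed" } @open;
    return @problems;
}

my @problems = check_tags("<FMS>\n   <HEAD>\n      ORADEV61\n   </HEAD>\n</FMS>\n");
print @problems ? join("\n", @problems) . "\n" : "tags look consistent\n";
```

If the checker reports nothing, a line-based or regex-based replacement is on reasonably safe ground; if it reports problems, the file does not match the description and a stricter parser may be the better choice.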
Re: Re: Re: Search and Replace by liz (Monsignor) on Aug 27, 2003 at 09:50 UTC
"And I can't include a line saying that it is XML..."

Actually, if the strange line ending characters "¶" are an artefact of you grabbing and posting the actual content of the file from e.g. Word, then you don't have to include the <?xml line. If there is no indication of version or encoding, any XML processor will assume version "1.0" and encoding "utf-8". If the characters in your file are all US-ASCII characters (which in this example they are), you are fine.

Liz

Re: Search and Replace by nofernandes (Beadle) on Aug 27, 2003 at 10:02 UTC
Re: Search and Replace by nofernandes (Beadle) on Aug 27, 2003 at 15:31 UTC
Thank you all for your precious help once again. I've written the following code, which works:

    use strict;
    use warnings;

    my $path='D:\npodev\forms\src';
    my @intervalos= ("[a-e]", "[f-h]", "[i-m]", "[n]", "[o-s]", "[t-z]");
    my $esquema="RMS_MC";
    my $start_flag = '<REANALYZED>';
    my $stop_flag = '</REANALYZED>';

    # list the files in $path whose names start with a letter in $intervalo
    # and whose extension starts with f or m
    sub procura_ficheiros{
        my ($path,$intervalo)=@_;

        opendir(DIR, $path) ||die "$!";
        my @files=grep{/^$intervalo.*\.[fm]/i} readdir(DIR);
        closedir (DIR);

        # turn each file name into a quoted full path followed by the schema
        my $aux=0;
        foreach my $pos (@files){
            $files[$aux]="\t\"".$path."\\".$files[$aux]."\""." $esquema\n";
            $aux++;
        }
        return @files;
    }

    foreach my $intervalo (@intervalos){
        my @ola=procura_ficheiros($path,$intervalo);
        my $replacement = join "", @ola;

        open ORIGINAL, "< teste.txt"
            or die "can't open original file\n";
        my $original;
        {
            local $/;
            $original = <ORIGINAL>;
        }
        close ORIGINAL;

        $original =~ s/ $start_flag
                        .*?
                        (\s*)
                        $stop_flag \n
                      /$start_flag\n$replacement$1$stop_flag/gsx;

        open COPY, "> copy.txt"
            or die "can't write to copy\n";
        print COPY $original;
        close COPY;

        system "hello.bat";
    }

Thank you..

Nuno

Re: Re: Search and Replace by Not_a_Number (Prior) on Aug 27, 2003 at 20:37 UTC
If it ain't broke, there's no need to fix it. Right. Nevertheless, these lines jumped out at me:

    my $aux=0;
    foreach my $pos (@files){
        $files[$aux]="\t\"".$path."\\".$files[$aux]."\""." $esquema\n";
        $aux++;
    }

1. Variable interpolation

First, take the line:

    $files[$aux]="\t\"".$path."\\".$files[$aux]."\""." $esquema\n";

This reminds me of something else I saw recently, which (simplified) read something like this:

    my $line = $first . " " . $second . " ". $third;

As if Perl didn't allow variable interpolation! Your line can be simplified to:

    $files[$aux]="\t\"$path\\$files[$aux]\" $esquema\n";

Further, by using `qq()`, you would no longer have to escape the double quotes:

    $files[$aux] = qq '\t"$path\\$files[$aux]" $esquema\n';

which I for one find a bit easier on the eye.

2. for/foreach loops

But now to come to your `foreach` loop (inner bit modified as above):

    my $aux=0;
    foreach my $pos (@files){
        $files[$aux] = qq '\t"$path\\$files[$aux]" $esquema\n';
        $aux++;
    }

I'm far from being a Perl guru, but I think I may safely say that this is ugly :) You can do this far more efficiently in Perl:

    foreach my $pos ( 0 .. $#files ) {
        $files[$pos] = qq '\t"$path\\$files[$pos]" $esquema\n';
    }

<parenthesis>Occasionally, you might also come across some rather quaint variations of the above, along the lines of:

    foreach ( my $pos = 0; $pos < @files; $pos++ ) { ... }

but it's best to forget these unless you want to increment your counter (`$pos`) by anything other than 1.</parenthesis>

However, you would still not be using the full power of Perl. To quote Mark Jason Dominus: "Any time you have a C-like `for` loop that loops over the indices of an array, you're probably making a mistake."

A more perlish way to do what you want would be:

    foreach my $file ( @files ) {
        $file = qq '\t"$path\\$file" $esquema\n';
    }

In fact, you can use Perl's built-in `$_` variable to shorten that even further:

    for ( @files ) {    # 'for' and 'foreach' are equivalent
        $_ = qq '\t"$path\\$_" $esquema\n';
    }

From which, to end this far longer than initially intended post, it is only a short step to using `map`:

    @files = map { qq '\t"$path\\$_" $esquema\n' } @files;

HTH

dave
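For anyone who wants to convince themselves that these variants really are interchangeable, here is a small side-by-side sketch. The file names in `@names` are made up for illustration (the real script gets them from `readdir`), and `qq(...)` is used in place of `qq '...'` purely as a matter of taste.

```perl
use strict;
use warnings;

# Made-up sample values; only $path and $esquema come from the script above.
my $path    = 'D:\npodev\forms\src';
my $esquema = 'RMS_MC';
my @names   = ('abc.fmb', 'bcd.mmb');

# Index-based loop
my @by_index = @names;
foreach my $pos ( 0 .. $#by_index ) {
    $by_index[$pos] = qq(\t"$path\\$by_index[$pos]" $esquema\n);
}

# Aliasing loop: $file is an alias, so assigning to it edits the array in place
my @by_alias = @names;
foreach my $file (@by_alias) {
    $file = qq(\t"$path\\$file" $esquema\n);
}

# map builds a new list and leaves @names untouched
my @by_map = map { qq(\t"$path\\$_" $esquema\n) } @names;

print @by_map;
print "all three agree\n"
    if "@by_index" eq "@by_alias" && "@by_alias" eq "@by_map";
```

One practical difference worth keeping in mind: the loops modify their array in place, while `map` returns a new list, so the original names stay available if they are needed later.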
