PerlMonks  

Search and Replace

by nofernandes (Beadle)
on Aug 27, 2003 at 09:08 UTC ( [id://286986]=perlquestion )

nofernandes has asked for the wisdom of the Perl Monks concerning the following question:

Hello sapient Monks,

I have a file that contains several kinds of information. Each kind is wrapped in its own tag, as in XML or HTML, but the file is not a document of either type.

The file is like this:

<FMS>
<HEAD>
ORADEV61 GL_FORMS FULL_ANALYSIS ORAFORMS50 GET_DATABLOCK_REFS
</HEAD>
[<OPEN_API>
Server8i User1 READABLE passworduser1
</OPEN_API>]
<MOD>
<NOT_REANALYZED>
"G:\GL_FORMS_4\gsdl.fmb" GL
"G:\GL_FORMS_4\glccat.fmb" GL
"G:\GL_FORMS_4\glsdcnv.fmb"
"G:\GL_FORMS_4\glcovsca.fmb>" GL
</NOT_REANALYZED>
<EXTERNAL>
</EXTERNAL>
<REANALYZED>
"G:\GL_FORMS_5\glsss.fmb" GL
"G:\GL_FORMS_5\glcxcvat.fmb>" GL
"G:\GL_FORMS_5\glcsdsnv.fmb" GL
"G:\GL_FORMS_5\glcqsoa.fmb"
"G:\GL_FORMS_5\glcqsdvur.fmb"
"G:\GL_FORMS_5\gldeqsdf.fmb" GL
"G:\GL_FORMS_5\glg xf rp.fmb" GL
"G:\GL_FORMS_5\gljou.fmb.fmb" GL
"G:\GL_FORMS_5\gl sou.fmb"
"G:\GL_FORMS_5\gl sou2.mmb.mmb" GL
</REANALYZED>
</MOD>
</FMS>

My goal is to erase the content between <REANALYZED> and </REANALYZED> (or any other pair of tags) and substitute new content, that is, a new list of files.

What's the best way to do that?

Must I use a regex, or is there a module for this kind of operation?

Thank you for your valuable help.

Nuno

edited by ybiC: removed funky ķ EOL chars

Replies are listed 'Best First'.
Re: Search and Replace
by liz (Monsignor) on Aug 27, 2003 at 09:17 UTC
    If you stick in a:
    <?xml version="1.0" encoding="iso-8859-1"?>
    you'll find that this actually is valid XML. Which means you could go for an XML type solution, either DOM or stream based.

    Or you could read the whole file in memory and do regular expressions.

    Judging from what you've given us, there is no "best" way. There's just TIMTOWTDI.

    Hope this helps.

    Liz

Re: Search and Replace
by cchampion (Curate) on Aug 27, 2003 at 12:25 UTC

    This script will do what you want, provided that your file is not too big for your memory.

    #!/usr/bin/perl -w
    use strict;

    my $start_flag  = '<REANALYZED>';
    my $stop_flag   = '</REANALYZED>';
    my $replacement = <<'REPL';
    "z:\path\newfilename1.ext" GL
    "z:\path\newfilename2.ext" GL
    "z:\path\newfilename3.ext" GL
    "z:\path\newfilename4.ext" GL
    "z:\path\newfilename5.ext" GL
    "z:\path\newfilename6.ext" GL
    REPL

    # slurp the whole file into one string
    open ORIGINAL, "< original.txt" or die "can't open original file\n";
    my $original;
    {
        local $/;
        $original = <ORIGINAL>;
    }
    close ORIGINAL;

    # replace everything between the flags; the trailing \n in the
    # replacement puts back the newline that the pattern consumes
    $original =~ s/
        $start_flag
        .*?
        (\s*) $stop_flag \n
        /$start_flag\n$replacement$1$stop_flag\n/gsx;

    open COPY, "> copy.txt" or die "can't write to copy\n";
    print COPY $original;
    close COPY;

    HTH

      What about a one-liner?

      If you have your replacement strings in "repl.txt", this will do the trick.

      perl -0pe 'BEGIN{open R,"repl.txt";$repl=<R>;close R}s/(<REANALYZED>).*?(<\/REANALYZED>)/$1$repl$2/s' original.txt
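      To cover the "or any other tags" part of the question, the same substitution can be wrapped in a sub that takes the tag name. This is only a sketch: replace_tag_content is a made-up name, the sample data is invented, and it assumes each tag sits on a line of its own, as in the posted file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace whatever sits between
# the <TAG> line and the </TAG> line with $replacement, keeping both tag lines.
sub replace_tag_content {
    my ($text, $tag, $replacement) = @_;
    my $open_re  = quotemeta "<$tag>";
    my $close_re = quotemeta "</$tag>";

    # Make sure the new block ends with a newline so the closing tag
    # stays on a line of its own.
    $replacement .= "\n" unless $replacement =~ /\n\z/;

    # /m: ^ and $ match at line boundaries; /s: . also matches newlines.
    $text =~ s{^(\h*$open_re\h*\n).*?^(\h*$close_re\h*)$}{$1$replacement$2}msg;
    return $text;
}

# Invented sample input, shaped like the file in the question.
my $sample = "<MOD>\n<REANALYZED>\n\"G:\\old\\a.fmb\" GL\n</REANALYZED>\n</MOD>\n";
print replace_tag_content($sample, 'REANALYZED', qq{"z:\\new\\b.fmb" GL\n});
```

      quotemeta keeps characters such as "/" in the tag from being read as regex syntax, and the non-greedy .*? stops at the first closing tag, so several sections with the same tag are each replaced separately.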
Re: Search and Replace
by aquarium (Curate) on Aug 27, 2003 at 12:41 UTC
    If each particular (exact) tag determines which file supplies the replacement content, then you hardly need regexes at all, just a simple string comparison (for the tags, that is). Something along the lines of the following should just about do it, if I'm understanding your question correctly.
    while ($line = <>) {
        chomp $line;
        if ($line =~ m/<REANALYZED>/) {
            print "$line\n";
            # emit the new content in place of the old
            system("cat new_content_for_reanalyzed_tag_file");
            # skip the old content up to the closing tag
            do {
                $line = <>;
                chomp $line;
            } until ($line =~ m/<\/REANALYZED>/);
            print "$line\n";
        }
        elsif ($line =~ m/<ANOTHERTAG>/) {
            print "$line\n";
            system("cat another_file_for_this_tag");
            # . . . etc.
        }
        else {
            print "$line\n";
        }
    }

    I'm sure you can make that much simpler by using some perlisms and turning the reusable code into subs, but that's the main gist of it. Use redirection to feed this program your input file, and redirect standard output as needed, e.g. perl this_script.pl <orig_file >new_file
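    As a rough illustration of the "make some subs" remark, the same line-by-line idea can be driven by a hash that maps each tag name to its new content, which avoids one elsif branch per tag and the external cat. The name replace_sections and the sample data are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines through unchanged, but swap the body of
# every tag listed in %$new_body (tag name => array ref of new lines).
sub replace_sections {
    my ($lines, $new_body) = @_;
    my @out;
    my $skip_until;    # closing tag we are waiting for, if any

    for my $line (@$lines) {
        if (defined $skip_until) {
            next unless $line =~ /\Q$skip_until\E/;   # drop old content
            undef $skip_until;                        # closing tag: keep it
        }
        elsif ($line =~ /<([A-Z_]+)>/ && exists $new_body->{$1}) {
            push @out, $line, @{ $new_body->{$1} };   # tag line + new body
            $skip_until = "</$1>";
            next;
        }
        push @out, $line;
    }
    return @out;
}

# Invented sample input, shaped like the file in the question.
my @input = (
    "<MOD>\n", "<REANALYZED>\n", "\"G:\\old.fmb\" GL\n",
    "</REANALYZED>\n", "</MOD>\n",
);
print replace_sections(\@input, { REANALYZED => [ qq{"z:\\new.fmb" GL\n} ] });
```

    To use it as a filter in the way described above, pass [<STDIN>] instead of \@input and keep the shell redirection.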
Re: Search and Replace
by Abigail-II (Bishop) on Aug 27, 2003 at 09:18 UTC
    Must I use a regex, or is there a module for this kind of operation?

    Well, there is never a must to use a regexp. However, the question of whether a regexp is a good way to tackle this problem cannot be answered from what you posted. You don't define what your "tags" look like; you just give an example. Your example suggests that tags appear on lines of their own, are all written in full caps, don't have attributes, don't nest themselves, and are always of the open/close form (no standalone tags). Depending on the definition of what tags look like and how they can appear, regexes may or may not be the way to go.

    Abigail
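    The assumptions listed above can be checked mechanically before settling on an approach. A rough sketch follows; check_tag_layout is a hypothetical helper, and the tag pattern (upper-case letters, digits, underscore) is a guess based on the posted sample, not a definition of the format.

```perl
use strict;
use warnings;

# Hypothetical checker: returns a list of complaints about lines that break
# the assumptions (tag alone on its line, upper case, no attributes,
# properly paired, a tag never nested inside itself).
sub check_tag_layout {
    my @lines = @_;
    my (@open, @problems);
    my $n = 0;

    for my $line (@lines) {
        $n++;
        next unless $line =~ /</;    # no tag on this line
        if ($line =~ m{^\s*<(/?)([A-Z][A-Z0-9_]*)>\s*$}) {
            my ($closing, $name) = ($1, $2);
            if ($closing) {
                my $expected = pop @open;
                push @problems, "line $n: unexpected </$name>"
                    unless defined $expected && $expected eq $name;
            }
            else {
                push @problems, "line $n: <$name> nested in itself"
                    if grep { $_ eq $name } @open;
                push @open, $name;
            }
        }
        else {
            push @problems, "line $n: not a bare upper-case tag on its own line";
        }
    }
    push @problems, "unclosed <$_>" for @open;
    return @problems;
}

# Invented sample: the second line carries data next to the tag.
print "$_\n" for check_tag_layout("<MOD>\n", "<REANALYZED> x.fmb\n", "</MOD>\n");
```

    An empty return list means a line-oriented or simple regex solution is on safe ground; any complaint is a hint that the format needs a more careful definition first.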

      The tags are precisely like that. They appear on lines of their own, are all written in full caps, don't have attributes, don't nest themselves, and are always of the open/close form.

      And I can't include a line saying that it is XML, because this file is going to be read by an application that analyzes it and extracts the various pieces of information from it.

      So the tags are always like that: they open and they close, with no tags inside, and always on separate lines.

      Thanks for your help.

      Nuno

        And I can't include a line saying that it is XML...

        Actually, if the strange line ending characters "ķ" are an artefact of you grabbing and posting the actual content of the file from e.g. Word, then you don't have to include the <?xml line. If there is no indication of version or encoding, any XML processor will assume version "1.0" and encoding "utf-8". If the characters in your file are all US-ASCII characters (which in this example they are), you are fine.

        Liz
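        Whether a file really is pure US-ASCII is easy to test before relying on that default. A minimal sketch, where is_us_ascii is a made-up helper name and the strings are invented samples:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string holds only 7-bit ASCII bytes.
sub is_us_ascii {
    my ($bytes) = @_;
    return $bytes !~ /[^\x00-\x7F]/;
}

for my $sample (qq{"G:\\GL_FORMS_5\\glsss.fmb" GL\n}, "What\xEFs") {
    print is_us_ascii($sample) ? "ASCII only\n" : "non-ASCII byte found\n";
}
```

        For a real file, slurp it with local $/ (as in the script further up the thread) and pass the resulting string to the helper.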

Re: Search and Replace
by nofernandes (Beadle) on Aug 27, 2003 at 15:31 UTC

    Thank you all for your precious help once again.

    I've written the following code, which works:

    use strict;
    use warnings;

    my $path       = 'D:\npodev\forms\src';
    my @intervalos = ("[a-e]", "[f-h]", "[i-m]", "[n]", "[o-s]", "[t-z]");
    my $esquema    = "RMS_MC";
    my $start_flag = '<REANALYZED>';
    my $stop_flag  = '</REANALYZED>';

    # list the files in $path whose name starts with a letter in $intervalo
    # and whose extension starts with f or m
    sub procura_ficheiros {
        my ($path, $intervalo) = @_;
        opendir(DIR, $path) || die "$!";
        # [fm] rather than [f|m]: inside a character class, | is a literal
        my @files = grep { /^$intervalo.*\.[fm]/i } readdir(DIR);
        closedir(DIR);
        my $aux = 0;
        foreach my $pos (@files) {
            $files[$aux] = "\t\"".$path."\\".$files[$aux]."\""."   $esquema\n";
            $aux++;
        }
        return @files;
    }

    foreach my $intervalo (@intervalos) {
        my @ola = procura_ficheiros($path, $intervalo);
        my $replacement = join "", @ola;

        open ORIGINAL, "< teste.txt" or die "can't open original file\n";
        my $original;
        {
            local $/;
            $original = <ORIGINAL>;
        }
        close ORIGINAL;

        # the trailing \n in the replacement puts back the newline
        # that the pattern consumes
        $original =~ s/
            $start_flag
            .*?
            (\s*) $stop_flag \n
            /$start_flag\n$replacement$1$stop_flag\n/gsx;

        open COPY, "> copy.txt" or die "can't write to copy\n";
        print COPY $original;
        close COPY;

        system "hello.bat";
    }

    Thank you..

    Nuno

      If it ain't broke, there's no need to fix it. Right. Nevertheless, these lines jumped out at me:

      my $aux=0;
      foreach my $pos (@files){
          $files[$aux]="\t\"".$path."\\".$files[$aux]."\""."   $esquema\n";
          $aux++;
      }

      1. Variable interpolation

      First, take the line:

      $files[$aux]="\t\"".$path."\\".$files[$aux]."\""."   $esquema\n";

      This reminds me of something else I saw recently, which (simplified) read something like this:

      my $line = $first . " " . $second . " ". $third;

      As if Perl didn't allow variable interpolation! Your line can be simplified to:

      $files[$aux]="\t\"$path\\$files[$aux]\"   $esquema\n";

      Further, by using qq(), you would no longer have to escape the double quotes:

      $files[$aux] = qq '\t"$path\\$files[$aux]"   $esquema\n';

      which I for one find a bit easier on the eye.
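      A quick way to convince yourself that the three spellings build the same string is to compare them directly. The file name below is a made-up sample value, and the sketch uses qq{} with braces rather than the quote-delimited form shown above; both interpolate in the same way.

```perl
use strict;
use warnings;

my $path    = 'D:\npodev\forms\src';
my $file    = 'abc.fmb';             # made-up sample value
my $esquema = 'RMS_MC';

my $concatenated = "\t\"" . $path . "\\" . $file . "\"" . "   $esquema\n";
my $interpolated = "\t\"$path\\$file\"   $esquema\n";
my $with_qq      = qq{\t"$path\\$file"   $esquema\n};

print "all three agree\n"
    if $concatenated eq $interpolated && $interpolated eq $with_qq;
```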

      2. for/foreach loops

      But now to come to your foreach loop (inner bit modified as above):

      my $aux=0;
      foreach my $pos (@files){
          $files[$aux] = qq '\t"$path\\$files[$aux]"   $esquema\n';
          $aux++;
      }

      I'm far from being a Perl guru, but I think I may safely say that this is ugly :) You can do this far more efficiently in Perl:

      foreach my $pos ( 0 .. $#files ) {
          $files[$pos] = qq '\t"$path\\$files[$pos]"   $esquema\n';
      }

      <parenthesis>Occasionally, you might also come across some rather quaint variations of the above, along the lines of:

      foreach ( my $pos = 0; $pos < @files; $pos++ ) { ... }

      but it's best to forget these unless you want to increment your counter ($pos) by anything other than 1.</parenthesis>

      However, you would still not be using the full power of Perl. To quote Mark Jason Dominus: "Any time you have a C-like for loop that loops over the indices of an array, you're probably making a mistake." A more perlish way to do what you want would be:

      foreach my $file ( @files ) {
          $file = qq '\t"$path\\$file"   $esquema\n';
      }

      In fact, you can use Perl's built-in $_ variable to shorten that even further:

      for ( @files ) {   # 'for' and 'foreach' are equivalent
          $_ = qq '\t"$path\\$_"   $esquema\n';
      }

      From which, to end this far longer than initially intended post, it is only a short step to using map:

      @files = map { qq '\t"$path\\$_"   $esquema\n' } @files;
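      The loop variants can be run side by side to see that they end up with the same list. The sketch below uses made-up sample values and qq{} with braces; note that the foreach versions change the array in place (the loop variable is an alias for each element), while map builds a new list and leaves its input alone.

```perl
use strict;
use warnings;

my $path    = 'D:\src';              # made-up sample values
my $esquema = 'RMS_MC';
my @names   = ('a.fmb', 'b.mmb');

# 1. loop over the indices
my @by_index = @names;
for my $pos (0 .. $#by_index) {
    $by_index[$pos] = qq{\t"$path\\$by_index[$pos]"   $esquema\n};
}

# 2. loop over the elements; $file aliases each element in turn
my @by_alias = @names;
for my $file (@by_alias) {
    $file = qq{\t"$path\\$file"   $esquema\n};
}

# 3. map returns a new list; @names itself is untouched
my @by_map = map { qq{\t"$path\\$_"   $esquema\n} } @names;

print "same result\n"
    if "@by_index" eq "@by_alias" && "@by_alias" eq "@by_map";
```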

      HTH

      dave
