Dear monks, I want to move some lines in an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<sec><title>The Impact of Aerosols</title>
<p>The impact of aerosols on our daily lives is large, as our activities are performed in an atmospheric sea containing gases and particles (<xref ref-type="table" rid="ch1-t001">Table 1.1</xref>). The particles, liquid and solid, organic and inorganic, viable and nonviable, influence the environment. Natural particle phenomena include cloud formation, the role of particles in the water cycle, the shaping of land by wind, pollination of plants, and the distribution of seeds and spores. Human uses of aerosols include the atomization of fuels prior to combustion, the application of paints, cosmetics, medicines, insecticides, and lubricants; and scientific uses.</p>
<p>Unfortunately, aerosols often cause problems which resist eradication. Among these are infectious diseases including the common cold, influenza, viral pneumonia, measles, mumps, and tuberculosis. Other diseases in which inhaled particles often play a central role are bronchitis, pulmonary emphysema, asthma, diffuse interstitial fibrosis, alveolitis, silicosis, anthracosilicosis, berylliosis, farmers lung, byssinossis, lung cancer, and nasal cancer.</p>
<sec><title>Size Regimes</title>
<p>The great diversity in particle size, shape, and composition makes it impossible to describe aerosol behavior simply. As a starting point, one can divide aerosols into <italic>regimes</italic> (<xref ref-type="table" rid="ch1-t002">Table 1.2</xref>). These regimes, which encompass given size ranges, are each associated with sets of equations that describe the physical behavior of aerosols. An important dimensionless parameter, the <italic>Knudsen number</italic>, Kn, which relates the particle radius, r<sub>p</sub>, to the molecular mean free-path of the suspending gas, <italic>&lambda;</italic>g, is given by:</p>
<disp-formula id="ch1equ-001"><tex-math><?TeX \begin{equation}<$$>{\rm{Kn}} = {{{\lambda _{\rm{g}}}} \over {{{\rm{r}}_{\rm{p}}}}}<$$>\end{equation}?></tex-math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ch1equ-001.gif"/></disp-formula>
<p><bold>Cloud</bold>&mdash;Any free (not spatially confined) aerosol system with a definite overall shape and size. Rain clouds and smoke rings are examples.</p>
<p><bold>Colloid</bold>&mdash;A dispersion of liquid or solid particles in a gas, liquid, or solid medium that has all of the following properties: slow settling, large surface to volume ratio, invisibility to the unaided eye, and producing scattering of a light beam. Examples include smoke, milk, and gelatin.</p>
<p>Several reference books on aerosols have been published. The basic theoretical reference is a work by Nicholai A. Fuchs (1964) entitled <italic>The Mechanics of Aerosols</italic>, which was translated from Russian into English by R.E. Daisley and Marina Fuchs and edited by C.N. Davies. A variety of additional books, some general and some specialized, are presented in <xref ref-type="table" rid="ch1-t003">Table 1.3</xref>. Although not exhaustive, the listed references cover most problems that arise in studies with aerosols.</p></sec></sec>
<table-wrap id="ch1-t001" position="float"><label>Table 1.1<x> </x></label><caption><p></bold> Some Particles Commonly Found in Air, Their Sizes and Impacts on Natural Phenomena and Human Health </caption><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ch1t001.gif"/></table-wrap>
<table-wrap id="ch1-t002" position="float"><label>Table 1.2<x> </x></label><caption><p></bold> The Major Particle Regimes and the Dependence of Various Properties on Particle Radius </caption><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ch1t002.gif"/></table-wrap>
<table-wrap id="ch1-t003" position="float"><label>Table 1.3<x> </x></label><caption><p></bold> Selected References on Aerosols </caption><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ch1t003.gif"/></table-wrap>
</sec>
</sec>
For example, the <table-wrap> line for Table 1.1 should be moved to the end of the line in which Table 1.1 is cited, and likewise for the other tables. This is my code to move the lines:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my $sgmfile= $ARGV[0];
my $key = '';
my $xline = '';
open (STDOUT, ">output.xml") or die $!;
open (FILE, "<$sgmfile") or die "Can't open $sgmfile: $!\n";
my $line = '';
my %hash = ();
my $hash = \%hash;
my @tmpA = ();
open (STDOUT, ">output.xml") or die $!;
foreach $line (<FILE>) {
chomp $line;
if($line =~ m/<table-wrap id=(.*?) position=(.*?)><label>(.*?)<x> <\/x><\/label><caption><p>(.*?)<\/caption><graphic xmlns:xlink=(.*?) xlink:href=(.*?)\/><\/table-wrap>/){
my @array = $line;
($hash,@tmpA) = recur(\@array);
} else{print $line."\n";}
}
close FILE;
open (XFILE, "<output.xml") or die "Can't open output.xml: $!\n";
foreach $key (sort (keys(%hash))) {
print $key,"\n";
print $hash{$key},"\n";
foreach $xline(<XFILE>){
chomp $xline;
if ($xline =~ m/$key/){
print $xline,$hash{$key},"\n";
delete $hash{$key};
}
else{ print $xline,"\n";}
}
}
close XFILE;
sub recur($) {
my $arrRef = $_[0];
my @procArr = @$arrRef;
my $assign = '';
my @tmpArr = @procArr;
shift(@tmpArr);
my $arr = $procArr[0];
if ($arr =~ m/<table-wrap id=(.*?) position=(.*?)><label>(.*?)<x> <\/x><\/label><caption><p>(.*?)<\/caption><graphic xmlns:xlink=(.*?) xlink:href=(.*?)\/><\/table-wrap>/ig) {
$assign = $3;
my $tmpVal = '';
$tmpVal = "<table-wrap id=$1 position=$2><label>$3<x> <\/x><\/label><caption><p>$4<\/caption><graphic xmlns:xlink=$5 xlink:href=$6\/><\/table-wrap>";
$hash{$assign} = $tmpVal;
foreach $line (<FILE>) {
chomp $line;}
}
else{}
return (\%hash, @tmpArr);
}
This code moves only the first line. Can anyone please help me debug it?
Re: file handling error
by dHarry (Abbot) on Sep 16, 2010 at 15:46 UTC
I agree with JavaFan. Your code is a bit messy and difficult to make sense of, so starting from scratch might not be such a bad idea. In general, if you have to deal with XML documents it's better to use a parser, e.g. XML::LibXML. The regex approach you're using often backfires: a slight change to the XML format probably means a "surprise".
My “wisdom” on this issue is that ... there ought to be a big sign pasted over the doorway that leads to “the way you’re trying to do it now.” That sign would read:
Through me you pass into the city of woe:
Through me you pass into eternal pain:
Through me among the people lost for aye.
Justice the founder of my fabric mov'd:
To rear me was the task of power divine,
Supremest wisdom, and primeval love.
Before me things create were none, save things
Eternal, and eternal I endure.
All hope abandon ye who enter here.
“The right way to do it ... the only way to do it ...” is with an XML parser (package from CPAN). If you take something like, say, XML::Simple, this will be able to parse the XML into a hashref structure. You can then manipulate that structure any way you need to. Then, write it out.
When writing processing routines like this, I always try to follow two principles:
- Always write the output to a different file than the input. This gives you two generations of the data: “before,” and “after.”
- Sounds silly, but ... make very sure that the manipulation that you are supposed to be doing has not already been done. Write the code to be very defensive. Make it examine the data, looking for trouble. Make it respond gracefully and robustly in all circumstances.
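A minimal sketch of those two principles applied to the original question, in core Perl only. It is line-based rather than parser-based, so it only works under the assumptions visible in the posted sample: each <table-wrap> element sits on a single line, and the citing line carries a rid="..." attribute equal to the table's id. The sub name move_tables and the command-line usage are hypothetical. The table is placed on the line following the first citation rather than appended to the same line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to the input lines and
# returns the lines with each table moved after its first citation.
sub move_tables {
    my ($lines) = @_;
    my (%table, @body);

    # Pass 1: set aside every line that is one complete <table-wrap> element.
    for my $line (@$lines) {
        if ($line =~ /^<table-wrap\s+id="([^"]+)".*<\/table-wrap>\s*$/) {
            $table{$1} = $line;
        }
        else {
            push @body, $line;
        }
    }

    # Pass 2: emit each table right after the first line that cites it.
    my @out;
    for my $line (@body) {
        push @out, $line;
        while ($line =~ /\brid="([^"]+)"/g) {
            my $wrap = delete $table{$1};   # delete so it is emitted only once
            push @out, $wrap if defined $wrap;
        }
    }

    # Defensive: tables that nobody cites are kept at the end, not dropped.
    push @out, map { $table{$_} } sort keys %table;
    return @out;
}

# Usage: perl move_tables.pl input.xml output.xml
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    die "Refusing to overwrite the input file\n" if $in_name eq $out_name;

    open my $in, '<', $in_name or die "Can't open $in_name: $!\n";
    chomp(my @lines = <$in>);
    close $in;

    open my $out, '>', $out_name or die "Can't open $out_name: $!\n";
    print {$out} "$_\n" for move_tables(\@lines);
    close $out or die "Can't close $out_name: $!\n";
}
```

Because the first pass collects the tables wherever they currently are, running the script on its own output should reproduce the same file, which is one way of handling the "already been done" case.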
Re: file handling error
by JavaFan (Canon) on Sep 16, 2010 at 15:19 UTC
It's hard to understand your code if you let a random number generator be in charge of indentation. How can you ever code that way?
Anyway, ignoring all the pointless juggling you're doing with arrayrefs if all you're passing into a sub is a single string (and the verbose way of unpacking the ref), never mind the double matching, it seems you're writing to a file (after opening it twice!), then reading from it before flushing or closing.
I've given up on trying to understand what you are possibly trying to do. I suggest you remove the code and start over.
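To illustrate the last point about reading a file before the writer has flushed or closed it, here is a small core-Perl sketch. The helper name lines_on_disk is made up for the example, and the temporary file stands in for output.xml. With default buffering, a short write normally stays in Perl's output buffer until the handle is flushed or closed, so a second handle opened on the same file sees nothing yet.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count the lines currently readable from $path.
sub lines_on_disk {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!\n";
    my @lines = <$in>;
    close $in;
    return scalar @lines;
}

# tempfile() returns an already-open write handle plus the file's name.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "<p>first pass output</p>\n";

# Read while the writer is still open: the data is usually still buffered.
my $before = lines_on_disk($path);

# Closing the writer flushes the buffer to disk.
close $out or die "Can't close $path: $!\n";
my $after = lines_on_disk($path);

print "lines visible before close: $before, after close: $after\n";
```

In the posted script the situation is worse still, because STDOUT keeps writing to output.xml while XFILE reads from it. Closing the first pass's output handle before reopening the file for reading, and writing the second pass to a different file, avoids both problems.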
Re: file handling error
by graff (Chancellor) on Sep 17, 2010 at 01:57 UTC
With all due respect to the monks who suggest that you use an XML parsing module here (I do agree with them in general), I'm afraid that approach won't work for the sample data you posted. I hope (for your sake) that you just made some copy/paste errors when you were trying to post.
When I tried using XML::Parser on it, I got failures on the entity references &lambda; and &mdash; -- maybe a good DTD will solve this. (I just converted them to the Unicode numeric character references for λ and the em dash, respectively.)
But beyond that, your 15 lines of XML data is badly messed up in terms of tag layout, and cannot be parsed as XML. Here's a little one-liner that will reduce your data sample to just the tags, putting one tag per line:
perl -pe 's{(?<=>)[^<]*}{\n}g' your_file.xml
That just replaces 0 or more characters between a > and a following < with a line-feed. Look carefully at the output, and you'll see what a mess it is. One big issue is that it starts with <sec>, and there's additional XML content after the corresponding </sec> token.
UPDATE: To see what I'm talking about more clearly, pipe the output of that one-liner through grep, like this:
perl -pe 's{(?<=>)[^<]*}{\n}g' your_file.xml | grep sec
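The same inspection can be done mechanically. The following is a rough sketch in core Perl of a tag-balance check, not an XML parser: the sub name tag_problems is hypothetical, and it assumes there are no CDATA sections, no "<" or ">" inside attribute values, and no element-like text inside comments or processing instructions. It pushes each opening tag on a stack, pops on the matching close, and reports anything that does not line up, including elements that open after the root element has already closed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of human-readable nesting problems.
sub tag_problems {
    my ($xml) = @_;
    my (@stack, @problems);
    my $root_closed = 0;

    # Visit every element tag. <?...?> and <!...> are skipped because the
    # name must start with a letter directly after "<" or "</".
    while ($xml =~ m{<(/?)([A-Za-z][\w:.-]*)[^<>]*?(/?)>}g) {
        my ($closing, $name, $empty) = ($1, $2, $3);
        next if $empty;    # self-closing, e.g. <graphic .../>

        if (!$closing) {
            push @problems, "<$name> opens after the root element closed"
                if $root_closed && !@stack;
            push @stack, $name;
        }
        elsif (grep { $_ eq $name } @stack) {
            # Unwind to the matching open tag, reporting what was left open.
            while ((my $top = pop @stack) ne $name) {
                push @problems, "<$top> is not closed before </$name>";
            }
            $root_closed = 1 if !@stack;
        }
        else {
            push @problems, "</$name> has no matching open tag";
        }
    }

    push @problems, map { "<$_> is never closed" } reverse @stack;
    return @problems;
}

# Usage: perl tag_problems.pl your_file.xml
if (@ARGV) {
    local $/;    # slurp mode: read the whole file as one string
    print "$_\n" for tag_problems(scalar <>);
}
```

An empty list only means the tags nest consistently under these assumptions; a real parser is still needed to confirm that the file is well-formed XML.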