Here is the code of the converter. It is for my non-commercial beekeeping website, which is written in the Serbian Latin alphabet. I am working on its new design and would like to have it converted into Cyrillic too. It is not a small site, maybe a few hundred pages. I know there is software for that, but somehow I don't like it. Until recently I never expected that I could have the site in Cyrillic at all, or that I could try to do it myself, but with Perl I think it is possible, even for a beginner like me.
It works for simple HTML pages. If I have external CSS files, maybe it will work with those too; I haven't tried yet. So I am asking the monks just for comments on my approach.
It reads an HTML file, converts the text into Cyrillic, leaves the code untouched, and creates a new HTML file in Cyrillic. The next steps are to read a whole directory or the whole website, and there are a lot of other things to be done, but that is not part of my question now.
I read the input file (held in one string) part by part, where each part is either code between "<" and ">" or text between ">" and "<". To determine where the code is and where the text is, I use a parameter $k that is set to 1 after "<" and to 2 after ">".
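As a sketch of the same idea, the split into code parts and text parts could also be done with `split` and a capturing group instead of a character-by-character loop. This is a different technique from the flag-based loop in my script, and it assumes that tags never contain a literal ">" inside attribute values, comments, or scripts. The helper name `split_tags_and_text` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split HTML into alternating text and tag pieces.
# The capturing group makes split() keep the tags in the returned list;
# grep { length } drops the empty pieces between adjacent tags.
sub split_tags_and_text {
    my ($html) = @_;
    return grep { length } split /(<[^>]*>)/, $html;
}

my @parts = split_tags_and_text('<p>primer <em>sajta</em></p>');
for my $part (@parts) {
    # a piece that starts with "<" is code, everything else is text
    my $kind = $part =~ /^</ ? 'code' : 'text';
    print "$kind: $part\n";
}
```

Only the pieces labelled "text" would then be passed to the conversion subroutine, and the pieces would be joined back together in the same order.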
A subroutine converts the strings. A hash contains a dictionary of one-to-one equivalents. Letters that look the same (for example "a", "e", etc.) are omitted, and I wonder whether that is OK: are the Latin and the Cyrillic letter "a" the same character in an HTML file and in its encoding?
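One quick way to check the question about "a" would be to compare the Unicode code points of the two letters. The sketch below assumes the Cyrillic letter is U+0430 (CYRILLIC SMALL LETTER A); the helper name `codepoint` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: format a character's code point as U+XXXX.
sub codepoint {
    my ($char) = @_;
    return sprintf 'U+%04X', ord $char;
}

my $latin_a    = 'a';          # Latin small letter a
my $cyrillic_a = "\x{430}";    # Cyrillic small letter a (assumed U+0430)

print codepoint($latin_a),    "\n";    # U+0061
print codepoint($cyrillic_a), "\n";    # U+0430
print $latin_a eq $cyrillic_a ? "same\n" : "different\n";    # different
```

If the code points differ, as they appear to here, the letters left out of the dictionary stay Latin characters in the output even though they look the same on screen.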
The script also prints the output file on standard output.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ":utf8");
use open ':encoding(utf8)';    # input/output default encoding will be UTF-8

# reads input file into string $infile
my $infile;
open(my $in, '<', 'index_latin.html') or die "Cannot open index_latin.html: $!";
{
    local $/;                  # slurp mode, limited to this block
    $infile = <$in>;
}
close($in);

my $k         = 1;     # parameter: =1 between < >, =2 between > <
my $string    = '';    # "<code between>"
my $txtstring = '';    # >"text between"<
my $outcode   = '';    # output: code and converted text together

# splits input file into characters
foreach my $char (split //, $infile) {
    if ($char eq "<") {
        $k = 1;
    }
    if ($k == 2) {
        $txtstring .= $char;
    }
    else {
        $string .= $char;
    }
    if ($char eq ">") {
        if (substr($txtstring, 0, 1) eq "&") {    # &nbsp; will not be converted
            $string    = $txtstring . $string;    # goes to string code
            $txtstring = '';
        }
        $outcode .= konverter($txtstring) . $string;
        $k         = 2;
        $string    = '';
        $txtstring = '';
    }    # of if char eq ">"
}    # of foreach

# anything left after the last ">" (for example a trailing newline)
$outcode .= konverter($txtstring) . $string;

# writing to file
my $filename = "index_cyrilic.htm";
open(my $fh, '>', $filename) or die "Cannot open $filename: $!";
print $fh $outcode;
close($fh);
```
<readmore>
```perl
print "\n";
print "code on the output:\n";
print "\n";
print "$outcode\n";

# converting string into Cyrillic
sub konverter {
    my ($for_conv) = @_;    # string to be converted, passed as argument

    # dictionary
    my %dict = (
        "b" => "б", "B" => "Б", "c" => "ц", "C" => "Ц", "č" => "ч", "Č" => "Ч",
        "ć" => "ћ", "Ć" => "Ћ", "d" => "д", "D" => "Д", "đ" => "ђ", "Đ" => "Ђ",
        "f" => "ф", "F" => "Ф", "g" => "г", "G" => "Г", "h" => "х", "H" => "Х",
        "i" => "и", "I" => "И", "l" => "л", "L" => "Л", "m" => "м",
        "n" => "н", "N" => "Н", "p" => "п", "P" => "П", "r" => "р", "R" => "Р",
        "s" => "с", "S" => "С", "š" => "ш", "Š" => "Ш", "t" => "т",
        "u" => "у", "U" => "У", "v" => "в", "V" => "В", "z" => "з", "Z" => "З",
        "ž" => "ж", "Ž" => "Ж",
    );

    my @conv_arr = split(//, $for_conv);    # splits input string for conversion
    my $ind = 0;     # index of array element
    my $out = "";    # output, converted string

    while ($ind <= $#conv_arr) {
        my $str_char = $conv_arr[$ind];    # current character
        # next character, or "" if there are no more characters
        my $next = $ind == $#conv_arr ? "" : $conv_arr[$ind + 1];

        if (exists($dict{$str_char})) {
            # Latin two-character letters to be replaced with one Cyrillic
            if    ($str_char eq "n" && $next eq "j") { $out .= "њ"; $ind++; }
            elsif ($str_char eq "N" && $next eq "j") { $out .= "Њ"; $ind++; }
            elsif ($str_char eq "l" && $next eq "j") { $out .= "љ"; $ind++; }
            elsif ($str_char eq "L" && $next eq "j") { $out .= "Љ"; $ind++; }
            elsif ($str_char eq "d" && $next eq "ž") { $out .= "џ"; $ind++; }
            elsif ($str_char eq "D" && $next eq "ž") { $out .= "Џ"; $ind++; }
            else {    # one character letters
                $out .= $dict{$str_char};
            }
            $ind++;
        }    # of if exists
        else {
            $out .= $str_char;
            $ind++;
        }
    }    # of while
    return $out;
}    # of sub
```
</readmore>
Here is the HTML code of the input file index_latin.html, for testing:
```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>primer <font color="#003300"><strong><font color="#006600">sajta</font></strong></font></p>
<table width="124" border="1" cellspacing="2" cellpadding="2">
  <tr>
    <td width="41">Fhšž</td>
    <td width="63">Hjć</td>
  </tr>
  <tr>
    <td>abcd</td>
    <td>145</td>
  </tr>
</table>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><em>konverzija</em> iz latinice u ćirilicu šŠ čČ đĐ žŽ nj Nj</p>
<p>poslednji red</p>
</body>
</html>
```
This is the output code. I hope that I used readmore successfully.
```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>тeст</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>примeр <font color="#003300"><strong><font color="#006600">сajтa</font></strong></font></p>
<table width="124" border="1" cellspacing="2" cellpadding="2">
  <tr>
    <td width="41">Фхшж</td>
    <td width="63">Хjћ</td>
  </tr>
  <tr>
    <td>aбцд</td>
    <td>145</td>
  </tr>
</table>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><em>koнвeрзиja</em> из лaтиницe у ћирилицу шШ чЧ ђЂ жЖ њ Њ</p>
<p>пoслeдњи рeд</p>
</body>
</html>
```