Hi, I have a log file that contains lines from multiple Windows OSes in different languages.

The log file is always saved in "ASCII", i.e. the native (non-Unicode) format of the OS on which it is saved. That means that when the log file is saved and opened in Notepad on an English OS, the Japanese and German characters are displayed using the English OS's native character set (see below). When the same log file is opened on a Japanese OS, the Japanese characters are displayed as Japanese, but the German ones still look like junk.

I need to parse these log files to extract each field so that I can substitute, add or delete fields, then write the result to a new log file (see Note). I want to extract those fields and replace them with new, user-defined values (of course, the user needs to supply each value in the same code page as that particular log line).
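A minimal sketch of the split/substitute/write-back step, written in Python rather than Perl and assuming (as in the sample below) that no field contains an embedded comma, even inside double quotes. It never decodes the lines: the comma (0x2C), double-quote and newline bytes cannot occur inside a two-byte Shift-JIS (cp932) character, and cp1252 is single-byte, so splitting the raw bytes on `b","` gives the same field boundaries whatever code page a line uses, and untouched fields are written back byte-for-byte. `split_line`, `join_line`, `rewrite_file` and `example_edit` are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

Fields = List[bytes]

def split_line(raw: bytes) -> Tuple[Fields, bytes]:
    """Split one raw log line into (fields, line_ending) without decoding it."""
    body = raw.rstrip(b"\r\n")
    return body.split(b","), raw[len(body):]   # keep the original terminator

def join_line(fields: Fields, ending: bytes) -> bytes:
    """Reassemble a line from its fields and its original terminator."""
    return b",".join(fields) + ending

def rewrite_file(src: str, dst: str, edit: Callable[[Fields], Fields]) -> None:
    """Apply `edit` to the field list of every line of `src`, writing `dst`."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for raw in fin:                        # binary mode: lines end at b"\n"
            fields, ending = split_line(raw)
            fout.write(join_line(edit(fields), ending))

def example_edit(fields: Fields) -> Fields:
    """Hypothetical edit: substitute field 5, delete field 6, append a field."""
    fields[5] = b"NewUser"
    del fields[6]
    fields.append(b"added")
    return fields
```

With this approach a user-supplied replacement value is also just bytes; it only has to be in the same code page as the line it is inserted into.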
Is there a way to dynamically detect which code page (character set) each line uses? Do I even need to do that for my purpose?
What other ideas do you have?
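One common heuristic for per-line detection is to try a short list of candidate code pages in a fixed order and keep the first one that decodes the raw bytes without error. The sketch below assumes the candidates are cp932 (Shift-JIS, Japanese Windows) and cp1252 (Western European Windows, assumed for the German and English machines); `guess_codepage` and `CANDIDATES` are hypothetical names. Detection mainly matters when the text itself must be interpreted, e.g. to check that a user-supplied value is compatible with a line's code page; merely splitting on ASCII commas does not need it.

```python
from typing import Optional, Sequence

# Assumed candidates: cp932 (Shift-JIS as used by Japanese Windows) and
# cp1252 (Western European Windows, assumed for the German/English machines).
# Order matters: cp1252 accepts almost every byte, so it is the fallback.
CANDIDATES = ("cp932", "cp1252")

def guess_codepage(raw: bytes, candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Heuristically guess the code page of one raw log line."""
    if raw.isascii():
        return "ascii"        # 7-bit only: every candidate reads it identically
    for cp in candidates:
        try:
            raw.decode(cp)    # strict decoding raises on invalid sequences
        except UnicodeDecodeError:
            continue
        return cp
    return None               # no candidate could decode the line
```

This is only a guess, not a reliable detector: some cp1252 byte pairs, such as an umlaut followed by an ASCII letter (ä then r, bytes 0xE4 0x72), also form a valid two-byte Shift-JIS character, so a German line can be reported as cp932.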
Note:
1. Each line is newline-terminated.
2. Each field (column) in a line is delimited by a comma (,). Note that a field may contain \\, which I think will interfere with regex operations in Perl.
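Backslashes only cause trouble when they are interpreted as regex syntax. A minimal Python sketch (the helper name `replace_literal` is hypothetical) that treats both the search text and the replacement literally: `re.escape` neutralises metacharacters in the pattern, and passing a function as the replacement stops `re.sub` from processing backslash escapes in it. One caveat when matching on raw bytes: 0x5C (the backslash byte) can also be the second byte of a Shift-JIS character, e.g. ソ is 0x83 0x5C, so a byte-level match can in principle land inside a Japanese character.

```python
import re

def replace_literal(line: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` in `line` with `new`, both taken literally."""
    # re.escape() backslash-escapes regex metacharacters in `old`, including
    # backslashes, so the compiled pattern matches `old` byte-for-byte.
    pattern = re.compile(re.escape(old))
    # A callable replacement is inserted as-is; a bytes template would instead
    # have its backslash escapes (e.g. \\ or \1) interpreted by re.sub().
    return pattern.sub(lambda _match: new, line)
```

Passing `new` directly as the replacement template would silently turn each `\\` into a single `\`. In Perl, `quotemeta` (or `\Q...\E` inside a pattern) plays the same role as `re.escape`.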
Thanks
230913132C20,50,1,131174,JAP-LINKSYS3-0,Administrator,,C:\Program Files\netjumper\linkgrabber99\TSImages\testraffles.jpg,1,4,1,0,1090519040,"",1129718602,,0,101 0 0 File Remediation Delete C:\\Program Files\\netjumper\\linkgrabber99\\TSImages\\testraffles.jpg 2001 1 3f2d7104-9ac7-4867-aa40-73ea69b9a6a2,1207837606,4294905926,0,0,0,0,0,0,,0,0,0,0,JAP-NETGEAR3-0,{F38B5FB1-C17D-484D-AB83-11D2F12B09E9},,(IP)-10.160.32.162,JAPSCS30,WGSCS3.0,00:0D:56:7E:99:D3,10.0.0.359,,,,,,,,,,,,,,,,0,2A2AD14D03D22042B368E2D057BF9AE1,1184e2df-fd6d-4911-a004-b244a204cd43,78381056,JAP-LINKSYS3-0
230913132C20,50,1,131174,JAP-LINKSYS3-0,Administrator,,C:\Program Files\netjumper\linkgrabber99\INSTALL.LOG,1,4,1,0,1090519040,"",1129718602,,0,101 0 0 File Remediation Delete C:\\Program Files\\netjumper\\linkgrabber99\\INSTALL.LOG 2001 1 3f2d7104-9ac7-4867-aa40-73ea69b9a6a2,1207837607,4294905926,0,0,0,0,0,0,,0,0,0,0,JAP-NETGEAR3-0,{F38B5FB1-C17D-484D-AB83-11D2F12B09E9},,(IP)-10.160.32.162,JAPSCS30,WGSCS3.0,00:0D:56:7E:99:D3,10.0.0.359,,,,,,,,,,,,,,,,0,2A2AD14D03D22042B368E2D057BF9AE1,1184e2df-fd6d-4911-a004-b244a204cd43,78381056,JAP-LINKSYS3-0
230913132C20,50,1,131174,JAP-LINKSYS3-0,Administrator,,C:\Program Files\netjumper\linkgrabber99\ReadMe.txt,1,4,1,0,1090519040,"",1129718602,,0,101 0 0 File Remediation Delete C:\\Program Files\\netjumper\\linkgrabber99\\ReadMe.txt 2001 1 3f2d7104-9ac7-4867-aa40-73ea69b9a6a2,1207837608,4294905926,0,0,0,0,0,0,,0,0,0,0,JAP-NETGEAR3-0,{F38B5FB1-C17D-484D-AB83-11D2F12B09E9},,(IP)-10.160.32.162,JAPSCS30,WGSCS3.0,00:0D:56:7E:99:D3,10.0.0.359,,,,,,,,,,,,,,,,0,2A2AD14D03D22042B368E2D057BF9AE1,1184e2df-fd6d-4911-a004-b244a204cd43,78381056,JAP-LINKSYS3-0
230913132C20,50,1,131174,JAP-LINKSYS3-0,Administrator,,C:\Program Files\netjumper\linkgrabber99\UNWISE.EXE,1,4,1,0,1090519040,"",1129718602,,0,101 0 0 File Remediation Delete C:\\Program Files\\netjumper\\linkgrabber99\\UNWISE.EXE 2001 1 3f2d7104-9ac7-4867-aa40-73ea69b9a6a2,1207837609,4294905926,0,0,0,0,0,0,,0,0,0,0,JAP-NETGEAR3-0,{F38B5FB1-C17D-484D-AB83-11D2F12B09E9},,(IP)-10.160.32.162,JAPSCS30,WGSCS3.0,00:0D:56:7E:99:D3,10.0.0.359,,,,,,,,,,,,,,,,0,2A2AD14D03D22042B368E2D057BF9AE1,1184e2df-fd6d-4911-a004-b244a204cd43,78381056,JAP-LINKSYS3-0
230913132C20,50,1,131174,JAP-LINKSYS3-0,Administrator,,C:\Documents and Settings\Administrator\X^[g j[\vO\LinkGrabber99\LinkGrabber99.lnk,1,4,1,0,1090519040,"",1129718602,,0,101 0 0 File Remediation Delete C:\\Documents and Settings\\Administrator\\X^[g j[\\vO\\LinkGrabber99\\LinkGrabber99.lnk 2001 1 3f2d7104-9ac7-4867-aa40-73ea69b9a6a2,1207837610,4294905926,0,0,0,0,0,0,,0,0,0,0,JAP-NETGEAR3-0,{F38B5FB1-C17D-484D-AB83-11D2F12B09E9},,(IP)-10.160.32.162,JAPSCS30,WGSCS3.0,00:0D:56:7E:99:D3,10.0.0.359,,,,,,,,,,,,,,,,0,2A2AD14D03D22042B368E2D057BF9AE1,1184e2df-fd6d-4911-a004-b244a204cd43,78381056,JAP-LINKSYS3-0
230913132C20,50,1,131174,JAP-LINKSYS3-0,Administrator,,C:\Documents and Settings\Administrator\X^[g j[\vO\LinkGrabber99\Unwise.lnk,1,4,1,0,1090519040,"",1129718602,,0,101 0 0 File Remediation Delete C:\\Documents and Settings\\Administrator\\X^[g j[\\vO\\LinkGrabber99\\Unwise.lnk 2001 1 3f2d7104-9ac7-4867-aa40-73ea69b9a6a2,1207837611,4294905926,0,0,0,0,0,0,,0,0,0,0,JAP-NETGEAR3-0,{F38B5FB1-C17D-484D-AB83-11D2F12B09E9},,(IP)-10.160.32.162,JAPSCS30,WGSCS3.0,00:0D:56:7E:99:D3,10.0.0.359,,,,,,,,,,,,,,,,0,2A2AD14D03D22042B368E2D057BF9AE1,1184e2df-fd6d-4911-a004-b244a204cd43,78381056,JAP-LINKSYS3-0
230A08123534,6,2,1,GER-NETGEAR3-0,Administrator,,,,,,,16777216,"Could not scan 1 files inside D:\Project\DUMBV\all DUMBVirus\Crash in Turbo\TMDGB292.cab due to extraction errors encountered by the Decomposer Engines.",0,,0,,,,,0,,,,,,,,,,,{E36FDC15-54A2-484A-BA84-998C32062FC4},,(IP)-10.160.32.144,GER_SCS30,WG_GER-ENG,00:12:3F:61:75:21,10.0.0.359,,,,,,,,,,,,,,,,0,A63A014939DAB04B9169884492DA3F9F,,,GER-NETGEAR3-0
230A08123534,6,2,1,GER-NETGEAR3-0,Administrator,,,,,,,16777216,"Could not scan 1 files inside D:\Project\DUMBV\all DUMBVirus\Crash in Turbo\TMTC8DD0.cab due to extraction errors encountered by the Decomposer Engines.",0,,0,,,,,0,,,,,,,,,,,{E36FDC15-54A2-484A-BA84-998C32062FC4},,(IP)-10.160.32.144,GER_SCS30,WG_GER-ENG,00:12:3F:61:75:21,10.0.0.359,,,,,,,,,,,,,,,,0,A63A014939DAB04B9169884492DA3F9F,,,GER-NETGEAR3-0
230A08123534,5,1,1,GER-NETGEAR3-0,Administrator,Dir II.A,D:\Project\DUMBV\all DUMBVirus\DB1.LZH>>.COM,5,1,1,2147483904,16420,"",1131471642,,0,,0,433,0,0,0,1,1,1,20051107.019,49622,2,5,0,,{E36FDC15-54A2-484A-BA84-998C32062FC4},,(IP)-10.160.32.144,GER_SCS30,WG_GER-ENG,00:12:3F:61:75:21,10.0.0.359,,,,,,,,,,,,,,,,0,A63A014939DAB04B9169884492DA3F9F,,0,GER-NETGEAR3-0
230A08123534,5,1,1,GER-NETGEAR3-0,Administrator,DSCE.2100,D:\Project\DUMBV\all DUMBVirus\DB1.LZH>>|\.COM,5,1,1,2147483904,17444,"",1131471642,,0,,0,12253,0,0,0,1,1,2,20051107.019,49622,0,4,0,,{E36FDC15-54A2-484A-BA84-998C32062FC4},,(IP)-10.160.32.144,GER_SCS30,WG_GER-ENG,00:12:3F:61:75:21,10.0.0.359,,,,,,,,,,,,,,,,0,A63A014939DAB04B9169884492DA3F9F,,0,GER-NETGEAR3-0
230A08123534,5,1,1,GER-NETGEAR3-0,Administrator,XM.Laroux.A,D:\Project\DUMBV\all DUMBVirus\DB1.LZH>>.XLS,5,1,1,2147484928,17444,"",1131471642,,0,,0,8105,0,0,0,1,1,3,20051107.019,49622,0,4,0,,{E36FDC15-54A2-484A-BA84-998C32062FC4},,(IP)-10.160.32.144,GER_SCS30,WG_GER-ENG,00:12:3F:61:75:21,10.0.0.359,,,,,,,,,,,,,,,,0,A63A014939DAB04B9169884492DA3F9F,,0,GER-NETGEAR3-0
230A08123534,5,1,1,GER-NETGEAR3-0,Administrator,WM.NPAD Variant,D:\Project\DUMBV\all DUMBVirus\DB1.LZH>>{.DOT,5,1,1,2147484928,17444,"",1131471642,,0,,0,7890,0,0,0,1,1,4,20051107.019,49622,0,4,0,,{E36FDC15-54A2-484A-BA84-998C32062FC4},,(IP)-10.160.32.144,GER_SCS30,WG_GER-ENG,00:12:3F:61:75:21,10.0.0.359,,,,,,,,,,,,,,,,0,A63A014939DAB04B9169884492DA3F9F,,0,GER-NETGEAR3-0
230A08123534,5,1,1,GER-NETGEAR3-0,Administrator,Jeru.1808.Frere Jac,D:\Project\DUMBV\all DUMBVirus\DB1.LZH>>X.EXE,5,1,1,2147483904,17444,"",1131471642,,0,,0,755,0,0,0,1,1,5,20051107.019,49622,0,4,0,,{E36FDC15-54A2-484A-BA84-998C32062FC4},,(IP)-10.160.32.144,GER_SCS30,WG_GER-ENG,00:12:3F:61:75:21,10.0.0.359,,,,,,,,,,,,,,,,0,A63A014939DAB04B9169884492DA3F9F,,0,GER-NETGEAR3-0
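The junk in the Japanese lines above can be reproduced with a short Python sketch. It assumes, from the surrounding `C:\Documents and Settings\Administrator\...\LinkGrabber99` path, that `X^[g j[\vO` is what is left of the Japanese Start Menu folder "スタート メニュー\プログラム" after every byte of 0x80 or above has been lost; the sketch shows that the cp932 (Shift-JIS) bytes of that folder name leave exactly this residue, and that the same bytes read as cp1252 turn into junk. The variable names are illustrative only.

```python
# Assumption: the mangled folder "X^[g j[\vO" in the sample is the Japanese
# Start Menu path "スタート メニュー\プログラム" written in cp932 (Shift-JIS).
original = "スタート メニュー\\プログラム"
raw = original.encode("cp932")

# Decoded with the right code page, the bytes round-trip cleanly.
assert raw.decode("cp932") == original

# Decoded as Western Windows text (cp1252) they turn into junk; 0x81 and 0x8D
# are unassigned in Python's cp1252 codec, so errors="replace" yields U+FFFD.
as_cp1252 = raw.decode("cp1252", errors="replace")

# Dropping every byte >= 0x80 leaves only the ASCII-range bytes,
# i.e. the fragment seen in the log.
ascii_residue = bytes(b for b in raw if b < 0x80)   # b'X^[g j[\\vO'
```

Most of the Japanese characters here are two bytes whose second byte falls in the ASCII range, which is why a readable-looking but meaningless fragment survives instead of nothing at all.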

In reply to dynamically detect code page by edwardt_tril
