in reply to Binary Comparision

What do you mean by "binary string"? Are your input files actually line-oriented text data, as implied by your use of while (<>), or are they streams of non-text byte values (e.g. machine instructions, values to be loaded into registers, etc)? If the latter, is there any sort of structure for either of the inputs (fixed record length, or some sort of record delimiter)?

I expect the information given in earlier replies will be helpful, but without the information you haven't given (sample data, or some description of it), I can't really tell what should be changed in your code, let alone how to change it.

Re^2: Binary Comparision
by punklrokk (Scribe) on Feb 28, 2007 at 03:49 UTC
    Here is the example binary; it's a stream of machine code, actually in Portable Executable (PE) format. I'm not sure how to figure out whether there is structure or not (in the sense of line breaks, etc.).

    All that being said, I just realized my approach needs to be modified: basically, take my defs (say we are looking for ,4044484<4@4D4H4) and search for them within streams, which do have an end; I just haven't figured out how to find the ends yet.

    I see I actually want to do a match /,4044484<4@4D4H4/ on each line as opposed to comparing them, which I can handle. The part I am still having trouble grasping is how to treat this binary data. I think I'm just used to always working with text, so this binary is throwing me off.

    Hope this helps you guys help me. (See "virus.a" below, cut way short.)

    JP

    MZP ... This program must be run under Win32 ... PE ... .text .data .tls .rdata .idata .edata .rsrc .reloc ... fb:C++HOOK ...
    [rest of the dump: unprintable machine code interleaved with readable strings such as Sysutils::Exception, System::TObject, System::AnsiString, TForm1, Forms::TForm, Forms::TCustomForm, Forms::TScrollingWinControl, Controls::TWinControl, Controls::TControl, Classes::TComponent, Classes::TPersistent, TOrderedList, THintAction, TCustomImageList, TContainedAction, TCustomActionList, TCustomAction, TTextLayout]

      There are no differences[*] between text and binary files except how you open them. Your plan would fail for text too. Consider trying to match "def\nghi" in a file whose content is "abcdef\nghijkl". You have the same problem whether the file is text (lines) or binary (blocks). The problem you really have is not text vs binary. If you solve this problem for text files, you also solve it for binary files.
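      To make that concrete, here is a minimal sketch of the failure, using the same "abcdef\nghijkl" content and "def\nghi" pattern. The helper name found_when_read_by and the 8-byte block size are only illustrative assumptions; the in-memory filehandle is there so the example needs no file on disk.

```perl
use strict;
use warnings;

my $content = "abcdef\nghijkl";
my $needle  = "def\nghi";

# Illustrative helper (not from the original post): read $content chunk by
# chunk with the given record separator and report whether any single chunk
# contains $needle.
sub found_when_read_by {
    my ($separator) = @_;
    open(my $fh, '<', \$content) or die "Cannot open in-memory file: $!";
    binmode($fh);
    local $/ = $separator;
    my $found = 0;
    while (defined(my $chunk = <$fh>)) {
        $found = 1 if index($chunk, $needle) >= 0;
    }
    close($fh);
    return $found;
}

my $by_line  = found_when_read_by("\n");  # chunks: "abcdef\n", "ghijkl"
my $by_block = found_when_read_by(\8);    # chunks: "abcdef\ng", "hijkl"
my $whole    = index($content, $needle) >= 0 ? 1 : 0;

print "by line: $by_line, by 8-byte block: $by_block, whole file: $whole\n";
# prints "by line: 0, by 8-byte block: 0, whole file: 1"
```

      Both chunked reads miss the pattern because it straddles a chunk boundary, whether the boundary is a newline or a fixed byte count.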

      If you know the length of the longest signature, you could use

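      my $longest_sig_len = ...;
      my $block_size = 4096;
      # Round up to a multiple of 1024 that can hold the longest signature.
      $block_size = 1024 * int(($longest_sig_len + 1023) / 1024)
          if $block_size < $longest_sig_len;

      binmode($fh);
      local $/ = \$block_size;    # read fixed-size blocks instead of lines
      my $block = '';
      while (<$fh>) {
          # Keep the last ($longest_sig_len - 1) bytes of the previous block
          # so a signature that spans two blocks is still found
          # (assumes $longest_sig_len > 1).
          $block = substr($block, -($longest_sig_len - 1)) . $_;
          ... search for signature in $block ...
      }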

      That's the approach I'd take if I were looking for one string. There are surely algorithms that are more efficient at searching for a number of strings.
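      For several signatures, the same overlapping-block idea can be wrapped in a function. This is only a sketch under my own assumptions: the name scan_for_signatures, its arguments, and the returned hash of byte offsets are hypothetical, and it simply calls index once per signature per block, so it is not one of the more efficient multi-string algorithms.

```perl
use strict;
use warnings;

# Hypothetical helper: scan $fh block by block and return a hash reference
# mapping each signature that was found to the byte offset of its first match.
sub scan_for_signatures {
    my ($fh, $signatures, $block_size) = @_;
    $block_size ||= 4096;

    my $longest = 0;
    for my $sig (@$signatures) {
        $longest = length($sig) if length($sig) > $longest;
    }
    my $keep = $longest > 0 ? $longest - 1 : 0;  # bytes carried between blocks

    binmode($fh);
    local $/ = \$block_size;        # fixed-size reads instead of lines

    my %found;
    my $tail        = '';
    my $tail_offset = 0;            # file offset of the first byte of $tail
    while (defined(my $chunk = <$fh>)) {
        my $window = $tail . $chunk;
        for my $sig (@$signatures) {
            next if exists $found{$sig};
            my $pos = index($window, $sig);
            $found{$sig} = $tail_offset + $pos if $pos >= 0;
        }
        # Drop everything except the last $keep bytes of the window.
        my $drop = length($window) - $keep;
        $drop = 0 if $drop < 0;
        $tail = substr($window, $drop);
        $tail_offset += $drop;
    }
    return \%found;
}

# Usage with made-up data: the signature from the thread sits at offset 10,
# and the tiny block size of 8 forces it to span several blocks.
my $data = ("\x00" x 10) . ',4044484<4@4D4H4' . ("\xFF" x 10);
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $hits = scan_for_signatures($fh, [ ',4044484<4@4D4H4', 'MZP' ], 8);
close($fh);

for my $sig (sort keys %$hits) {
    print "found '$sig' at offset $hits->{$sig}\n";
}
```

      Signatures that never appear (here 'MZP') are simply absent from the returned hash.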

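      [*] You can even use while (<FILE>) on a binary file, but it might read more than you expect, because a "line" ends only at the next newline byte, wherever that happens to fall. Setting $/ to a reference to a number (e.g. $/ = \1024; or $block_size = 1024; $/ = \$block_size;) solves that by making each read return at most that many bytes.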
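      A small sketch of the difference, assuming 4000 bytes of made-up data with no newline byte in it. The helper name chunk_lengths is hypothetical.

```perl
use strict;
use warnings;

# 4000 bytes of made-up binary data with no "\n" anywhere.
my $data = "\x4D\x5A\x50\x00" x 1000;

# Hypothetical helper: return the length of every chunk the readline
# operator yields for a given value of $/.
sub chunk_lengths {
    my ($separator) = @_;
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    binmode($fh);
    local $/ = $separator;
    my @lengths;
    while (defined(my $chunk = <$fh>)) {
        push @lengths, length($chunk);
    }
    close($fh);
    return @lengths;
}

my @by_line  = chunk_lengths("\n");    # the whole file comes back as one "line"
my @by_block = chunk_lengths(\1024);   # fixed-size reads; the last one is shorter

print "by line:  @by_line\n";    # prints "by line:  4000"
print "by block: @by_block\n";   # prints "by block: 1024 1024 1024 928"
```

      With the default separator, memory use depends on where newline bytes happen to fall in the file; with a numeric reference it is bounded by the block size.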
