Here is the example binary; it's streams of machine code. It's actually in Portable Executable (PE) format. I'm not sure how to figure out whether there is any structure to it (in terms of line breaks, etc.).
All that being said, I just realized my approach needs to be modified to take my defs and look for them within the streams (say we are looking for ,4044484<4@4D4H4). The streams do have an end; I just haven't figured out how to find it yet.
I see that I actually want to do a match /,4044484<4@4D4H4/ on each line as opposed to comparing them, which I can handle. The part I'm still having trouble grasping is how to treat this binary data. I think I'm just used to always working with text, so this binary is throwing me off.
Hope this helps you guys help me... (See "virus.a" below, cut wayyy short.)
JP
MZP @
+ !L!This program must be run under Win32
$7
+
+
+
+
+ PE L 1;
+ @ P
+ ! 0
+ PT p
+ .text `.data
+ F @ .tls ` @
+ .rdata p @ P.idata 0 "
+ @ @.edata 2 @ @.rsrc 0
+ 0 4 @ @.reloc ` V d @ P
+
+
+
+
+
+
+ fb:C++HOOKD D D R
+j y ‹# Z|" S# j 5 YhpD j S D j s 5 3 D
+áD ` PSh ù tM=D s
Qj Pp u
PP5D u 5D u _ù tnu D s z
+Ã=D r(5D ]u tPj蟷 P 5D lu Ã=D r5
+D u áD dg‹, ‹ÐSh D$PE P. ‹ ‹؉D
+u
D D D M [ÐD J D ;D tP荶 ÐU‹‹E‹U
+} tE E ‹
lE E E E D 3D 3D E <=E u8@
+ D D D ‹tE E4‹E ]ÐSV‹|E =E u
+; t‹3҉‹փ; u^[ÐU‹3Uh@ d0d E 3ZYYdh@
+ ]Ð-E U‹SVWD ' fE ‹E ‹ ‹E ‹‹
E ‹lD E ‹ U fE ‹E ‹‹U fE ԃ 3‹Ud
+ _^[‹] \@ Exception & 0 D T
+ @ X Sysutils::Exception ,@ x@
+U‹ UE@D
& E } |&fE M‹E c M3‹Ev ‹Md
} ~‹E蒳 ‹]Ð 0 @ D D
+H System::TObject 0 D H
+ dD L System::AnsiString \@ Exception * U‹j
+ hD hxD j : ]U‹j hD hD j " ]U‹ЈU~莲 MЈU
+ED $ fE ‹M3‹E4 Ej hlD hTD j Ѹ ‹Ud ‹E}
+ tF ‹]ÐU‹ЈU~" MЈUE4D G$ fE ‹M3‹E
+ E‹Ud ‹E} t ‹] @ TForm1 * U‹UE
+E ‹ YY]U‹̉UЉEԸD # fE Ep ‹E‹M‹ %M
+ fE E EhD u! uw‹E ‹ ME
+ ׯ ‹Md
‹]ÐU‹؉EXD 9# fE ‹U3ɉ
‹Ed ‹E‹]U‹QE‹E8 t‹U‹D Y]ÐU‹MUE‹Ef8
u
3‹E‹] D@ TForm * x@ AnsiString *
+0 @ P |@ T Forms::TForm @
+ 0 8 H @ L TForm1
+D@ @ TForm1D JA V vp U‹貯 UE
+$D ! } |m3‹EE ‹Ud } ~‹Ei ‹]ÐU‹ȉM
+UˉE̸dD {! E UE詭 EfE UE蕭 EUE臭 E
+0fE UEq E‹UˡC g ME ‹ ME {
+ ME k ME [ ‹Md
‹] U‹ЈU~肮 MЈUED fE u‹M3‹E1
+ E‹Ud ‹E} tJ ‹] U‹> UED Q E
+ } |fE m3‹E蘀 ‹Ud } ~‹E ‹] 0
+ D T PA X Forms::TCustomForm H@
+ x@ $ @ 8 @ L 0 L \
‹A ` Forms::TScrollingWinControl |@ 0
+ \ ` ,@ d System::DelphiInterface<Form
+s::IDesigner> 0 \ `
+@ d System::DelphiInterface<Forms::IOleForm>
+0 H X B \ Controls::TWinControl
+ l@ @ H x@ X U‹UE} t)‹E8 t
+‹U‹
Q‹P‹U3ɉ
Et u YYY]ÐU‹UE} t)‹E8 t‹U‹
Q‹P‹U3ɉ
Et u^ YYY]Ð 0 D T 8VB X
+ Controls::TControl <@ x@ p 0
+` d @ h System::DelphiInterface<Controls::IDo
+ckManager> $ 0 D T lyC
+X Classes::TComponent @ x@ U‹UE}
+ t)‹E8 t‹U‹
Q‹P‹U3ɉ
Et u YYY]Ð 0 H X 2C \
+ Classes::TPersistent ,@ U‹-E r]]U‹
+E r]]ì@ @ C D D D
+D D (D <D |@ tD @ @ TOrderedList@
+ @ `@ D D D D D (D <D |@ @ @ @ TS
+tack‹RЋÐSVt6 ‹‹3‹ XC F‹Ƅt
+g d ‹^[SVY ‹‹‹F ‹Ӏ‹ ~‹& ^[Ћ
+P‹JI‹‹P
ÐSV‹‹‹R‹‹C‹PJ# ‹^[Ћ@ ÐU‹3Uh @ d0d
+E 3ZYYdh @ f ]Ð-E U‹3UhI @ d0d E 3ZYYdhP
+ @ . ]Ð-E è @ !@ @ l8@
+C D yC D D (D <D 8@@ p@@ |C 3C }C L|C :@ C@ C }C }C
+ C D!@ C C tC |C xC ;@ THintAction !@ THintAction @ 89@
+ StdActns pC ` \B@ HintSVt* ‹‹3‹
+ FP ‹Ƅtf d ‹^[ÐU‹3Uh!@ d0d E 3ZYYd
+h!@ ]Ð-E "@ "@ "@ C
+ D D D D D (D <D 5@ 5@ TChangeLink"@ TImageInde
+x "@ "@ "@ "@ ` C C D yC
+D D (D <D #@ -@ 01@ ,@ }C L|C }C ~C C }C }C C D#@ )
+@ #@ .@ .@ TCustomImageList"@ TCustomImageList"@ C
+ ImgList ‹ЁtJt ø ЋЁ t u
+ø ÐSVt* ‹‹3‹U F( F$ ‹‹R4‹ƄtU
+ d ‹^[ÐSVWD ‹‹‹O
‹‹ ‹w@~ ‹GD ‹ ‹G@ 3G@‹GHt ‹Ӏ‹U
+ ~‹ _^[ÐU‹j S‹3Uh$@ d0d XC F C@‹C$|
= {(}!UE ‹MC z ] C, C5‹e
C7 C8C<@ S CD‹5 3ZYYdh$@ ED
+[Y]Ãx0 Ãx0 u U‹SEj E3Uhi%@ d0d ‹E‹XD‹E‹
+@$P‹E‹@(P‹EP蟥 ‹‹l ‹1 ‹@3C ‹‹R PEP‹‹R,‹
+33U EP‹ Z 3ZYYdhp%@ ‹EPj
‹E‹@Ht
‹E3҉PH[‹]S‹‹C(D$‹C$$TD$PR譤 ‹D$C(‹$C$‹
+YZ[ÐSV‹‹‹ t‹‹s0‹f ^[ÐS‹‹‹C0[ÐSV
+W‹‹‹‹‹F t/‹ <u ‹‹R`#‹‹‹Q‹‹Qh‹‹R`
+‹FD‹R`_^[ÐS‹‹Vt{6 u
‹P譣 3C0‹f [ÐU‹j SVW‹3Uh'@ d0d ‹s,VV3C5‹
+D
P‹C$P‹C(PX ‹{0u!UE ‹MC ‹C8=t
+ ‹‹ 3ZYYdh'@ E ` _^[Y]U‹SV‹‹ډE@
+ E3Uh'@ d0d @ E3Uh'@ d0d ‹EG‹M‹‹E~
+P‹M‹‹EpP‹E‹@0Ps E3ZYYdh'@ ‹E 3ZYYd
+h'@ ‹E ‹Ef ‹E^[‹]ÐS‹‹t‹PS
+ [3[ÐU‹j SV‹‹3Uh(@ d0d ‹;|!UE ‹MC
+ x [ ‹TtV‹P ‹f 3ZYYdh(@ EZ
+ ^[Y]ÐlÐSV‹‹‹t‹@P‹C0P訡 s8‹f
+ ^[ÐS‹‹t‹PM [‹C8[ÐU‹SVW‹U‹‹
+ } tD‹EP‹C<P‹Pj j ‹EP‹EP‹ P‹E
+P‹P e {H u.@ U ‹sH‹7 ‹S(‹‹Q@‹S$‹‹
+Q4‹CH ‹@ ‹C$PEP‹K(33 EP‹CHj Z$ j j h
+j j j j ‹CHK P‹EP‹Pz ‹EC$PEP‹MK(‹U‹Ed ‹
+CH E‹G ‹ ‹h VY j V' hF j j ‹EP
+‹C$P‹C(P‹E@P‹E@PV ‹G ‹N ‹h V j V hF
+ j j ‹EP‹C$P‹C(P‹EP‹EPV躟 _^[‹] U‹SVW‹‹‹‹t.W‹EP
+3C4‹D 3ҊS7D PEP‹‹U‹‹S0_^[] U‹SVWUE‹E‹@$
+PEP‹E‹H(33; ‹E?
3Uh,@ d0d @ a E3Uh›,@ d0d ‹E‹P$‹E‹Q4‹E‹P(‹E‹
+Q@@ ( E3Uh~,@ d0d ‹E ‹E‹P$‹E‹Q4‹E‹P(‹E‹Q
+@‹EPj ‹N|mF3‹EB ‹U‹ j j j ‹ PS‹EP ‹E
+ ‹U‹ jj j ‹ PS‹EP ‹M‹U‹ECNu3ZYYdh,@
+ ‹E 3ZYYdh,@ ‹E 3ZYYdh,@ ‹E
+ _^[‹]ÐSV‹‹t‹‹‹N^[ÐSV‹‹u
‹[^[‹‹4"@ ‹F5C5F7C7V4‹ F6C6‹
+‹‹H‹iu ‹f‹C$P‹C(P‹P ‹|PҜ
+‹‹$‹F<C<‹‹1 ‹‹z ^[ÐSV‹‹‹‹4"@ Q
+ F5C5F7C7V4‹ F6C6‹F<C<‹‹‹‹‹
+u ‹‹C$P‹C(P‹P] ‹P ‹‹l‹‹
+^[‹‹ ^[U‹j SV‹‹3Uh.@ d0d t9‹‹R ;C$|‹‹R,
+;C(}!UE ‹MC # 3ZYYdh.@ E) ^[
+Y]SV‹:V4tV4‹f ^[ÐS‹‹33 [SVWU‹FL~P 8‹F@t
+"‹xO|G3‹‹F@ f? EOuf~Z t‹‹F\VX]_^[ÐSVWU‹‹‹C@
+t.‹xO|&G3‹‹C@ ;u3E‹‹C@y FOu]_^[ÐB‹H@t‹
+ ÐSVW‹‹‹ ‹‹ ;u‹ ‹‹V‹C u3_^[
+U‹SVW‹‹t‹‹‹;t E ‹u‹
+u E @C ' E3Uh0@ d0d ‹U‹ @C
+E3Uh0@ d0d ‹U‹ ‹U‹EE3ZYYdh0@ ‹E
+3ZYYdh0@ ‹E E_^[‹]ÐU‹S‹E‹@x t7‹E‹@‹X ‹
+‹4"@ t‹E‹‹E‹@t3[]‹E‹@[]ÐU‹
+SUE‹U‹E`K ‹EPh4@ ‹EPh$5@ UtY‹Ⱥ1@ ‹E‹S[YY]
+ Bitmap U‹SVW‹ډEU ‹ U ‹ @
+ E3Uh 4@ d0d ‹d ‹‹‹E‹QP‹U‹\ @ E3
+Uh3@ d0d ‹‹E‹QP@ E‹E‹P(‹E‹Q@‹E‹P$‹E‹Q4
+@ E‹Eq ‹E‹P(‹E‹Q@‹E‹P$‹E‹Q4‹E‹@$PEP‹E‹H(3
+3 ‹E 3Uh3@ d0d ‹E‹R ‹U‹J$H @E3}
+ ‹E‹R,‹U‹J(‹N F3ۃ} ‹E‹@$PEP‹E‹H(‹E‹P$
+‹E‹@( EP‹E P‹E UY ‹E‹@$PEP‹E‹H(‹E‹P$
+‹E‹@(H EP‹E P‹E UYN ‹M‹U‹EMCNQ
+GM3ZYYdh3@ ‹E ‹E ‹E 3ZYYdh3@
+‹E ‹ 3ZYYdh4@ ‹E n _^[‹]ÐU‹QS‹j ‹ʡ0C
+=O E3Uh4@ d0d ‹EtP ‹‹k‹u‹
E C 9 3ZYYdh4@ ‹E [Y]SVW‹‹‹
+‹ D$‹Թ ‹‹ST$ ‹‹S$f‹$fD$‹T$‹Y
+ ‹$;D$tIu|$Lu‹‹ ‹‹l_^[ÐU‹QS‹j ‹ʡ0C
+ 1N E3Uh5@ d0d ‹EtP‹P$ u‹
E C 5 3ZYYdh5@ ‹E [Y]@PSxP ~HPx
+L t
@L f [ÐSV ‹‹‹Ft‹o‹Ӏ‹› ~‹ ^[ÐSf
+x
t‹‹P‹CS[ÐU‹3Uh56@ d0d E 3ZYYdh<6@ B ]Ð-
+E L6@ TContainedAction7@ C ActnList pC H ;@ d:@
+ Category6@ TActionLink9@ C ActnList 7@ T
+7@ L6@ f7@ 7@ P !C C D yC D D (D <D :@ 4C |
+C 3C }C L|C :@ ~C C }C }C C C C C tC |C xC ;@
+ pC H X;@ ;@ h:@ x:@ TContainedAction7@
+ D8@ 8@ .8@ H C C D yC D D (D <D <@
+4C |C 3C }C =@ }C ~C C }C }C C 4<@ (>@ <@ 4=@
+>@ ?@ TCustomActionListD8@ TCustomActionList7@ C ActnList
+ 8@ 9@ <9@ 9@ (9@ 6@ C D yC D D (D
+<D 8@@ p@@ |C 3C }C L|C :@ C@ C }C }C C ?@ C C tC |
+C xC ;@ pC T pC ` D@ <D@
TCustomAction<9@
TCustomAction8@ H6@ ActnList 9@ :@
+ P!C D D D D D (D <D DC @C xC ́C C ЁC C C
+C 0?@ D?@ X?@ l?@ ?@ ?@ ?@ ?@ ?@ ?@ ?@ ?@ ?@ ?@ ?@ ?@ T
+ActionLinkSV ‹‹‹FLt‹ ‹Ӏ‹G ~‹ ^[ЋPLt
‹R$ ÃÐЋPLt‹B ÃxL tB ÐSVW‹‹‹‹B ‹
+~,‹‹7@ d t ‹‹{ _^[ÐSVW‹‹‹|1‹SL‹z$‹W}
+3;‹N;t‹ג ‹CL‹@$‹‹ _^[ÐSV‹‹‹‹SHh tCH‹
+‹CLt‹R0^[ÐSV‹‹‹CL;tt‹ t ‹‹ ^[SV‹‹C
+u‹‹7@ › t ‹‹^[ÐSVW‹‹~Lt‹‹f u2
+E ‹ ‹ u ‹F u‹3Ҹ@ Ht3_^[ÐSVW‹‹~Lt‹
+‹f{ u2E ‹ ‹u u ‹F u‹3Ҹ? t Ht3_
+^[ÐU‹QSVt6 U‹3‹< XC C$!@ ‹s
+(^F=@ ‹À} tI d ‹^[Y]SVW8 ‹‹‹G( ‹;
+ ‹w$~ ‹G$ ‹Ӏ‹< ~‹ _^[ÐU‹SVUE‹E
+‹@$‹XK|$C3‹E‹@$‹ ‹P;Uu‹‹EUFKu^[YY] SVW‹‹‹‹‹C$
+! |‹‹6@ ‹]_^[ÐSV‹‹‹C,t‹S(‹ƉC,t‹S(
+‹‹C,l< ^[Ð;P,u‹R0ÐSVW‹‹‹‹‹‹> u*;~,u3‹
+‹‹6@ 2 t ‹‹% _^[ÐSV‹‹‹‹C$ ^L‹‹; ^[ÐSV
+‹‹‹‹C$ |3FL^[SVW‹f{2 t‹‹C4S0‹C$‹pN|F3‹‹C$
+ ‹R0GNuC t(‹s‹‹,sA t8 t‹8 ‹R_^[ÐSVWU‹‹
+‹F> ‹f‹F‹H ‹‹G$‹pN|,F3‹G$‹E f;hhu‹G$‹5 f
+ CNu3]_^[ÐSQ$ fx: t
‹‹‹C<S8$Z[ÐSQ$ fxB t
‹‹‹CDS@$Z[ÐS‹‹C‹l8@ [ÐS‹‹C‹l8@ [ÐS‹‹C‹l8@
+ [ÐS‹‹C‹l8@ [ÐS‹‹C‹l8@ { [ÐS‹‹C‹l8@ g [Ð
+S‹‹C‹l8@ S [ÐS‹‹C‹l8@ ? [ÐÐÐÐÐÐÐÐ
+SVt~ ‹‹3‹A FPFYFdFj‹Ƅt d
+‹^[SV ‹‹‹FxI ‹F|A ‹Ӏ‹~‹b ^[ÐSVW‹‹
+‹‹l8@ tU‹ST‹‹X SX‹ SY‹ ‹S\‹R ‹S`‹
+‹Sd‹ f‹Sh‹K Sj‹ ‹‹ _^[ÐSVWU‹‹‹‹ST~
+tM‹C@‹xO|1G3‹‹C@ ‹`9@ t‹‹C@ ‹‹Q@FOuҍCT‹
+ ‹‹R0]_^[ÐSVWU‹‹:]XtF‹E@‹xO|1G3‹‹E@ ‹`9@ t
+‹‹E@z ‹‹QDFOu҈]X‹‹R0]_^[SVWU‹‹:]YtF‹E@‹xO|1G3‹‹E@
+; ‹`9@ 0 t‹‹E@" ‹‹QHFOu҈]Y‹‹R0]_^[SVWU‹‹;k\tF‹
+C@‹xO|1G3‹‹C@ ‹`9@ t‹‹C@ ‹‹QLFOu҉k\‹‹R0
+]_^[SVWU‹‹‹‹S` tM‹C@‹xO|1G3‹‹C@ ‹`9@ y t‹‹C@
+k ‹‹QPFOuҍC`‹ ‹‹R0]_^[ÐSVWU‹‹;kdtF‹C@‹xO|1G3‹
+‹C@# ‹`9@ t‹‹C@
‹‹QTFOu҉kd‹‹R0]_^[SVWU‹‹f;khtG‹C@‹xO|1G3‹‹C@ ‹`
+9@ t‹‹C@ ‹‹QXFOufkh‹‹R0]_^[ÐSVWU‹‹:]jtF‹E@‹x
+O|1G3‹‹E@o ‹`9@ d t‹‹E@V ‹‹Q\FOu҈]j‹‹R0]_^[
+SVW‹‹‹F‹VT u
‹^t
C t3‹‹‹: t‹F@x u ‹‹_^[SQ$fxr t
‹‹‹CtSp$Z[ÐS‹‹‹RD{Y t‹1u3[ð[ÐU‹3UhD@ d0d
+ E 3ZYYdhD@ ]Ð-E False True
+ . 1ҊPDÐSVWt;1ɊH‹D‹ Ht‹|1;Ju\
2uIu@t9~݃_^[Ð8u‹|m ÐSVW tJ
+* 1ۊXt^
|tD
f‹X9t ODu‹Ft ‹ Ȋ*ߊX l2luKu_^[SW‹:‹?
+?t 1ɊO\‹Jz‹Rrw
r
f‹r‹ss t% _[t% _[ÐSVW‹7‹6>t
+ 1ۊ^\‹W u‹wwr 0 r
fr_^[ÐVW‹V u‹~~wr
8_^_^Á ‹‹ 1ɊHL8rȪ_^ÐSVW ‹‹‹
+‹‹ k ‹‹‹ _^[ÐSV‹‹‹‹= ^[ÐVW‹V
+ u‹~~wr
8_^_^Á _^ÐSV‹‹‹‹ ^[ÐVW‹V u
+‹~~wr
8_^_^Á _^ÐU‹j SVW‹‹‹3UhG@ d0d E‹
+ ‹M‹‹3ZYYdhG@ E _^[Y]ÐS‹‹t
+t
t[[~[U‹SV1‹
‹ Y\m‹Jzwa+eH@ yH@ z‹Rr PL
+ H@ H@ H@ H@ H@ >>
D >Á 4yH@ ^[] SV‹‹‹‹ ^[ÐVW‹V u‹~
+~wr
8_^_^Á _^ÐS‹Zzw‹Rq1r [
+[Á ‹‹YX[ÐU‹z‹Jw‹Ruur
+ ‹U‹MH] U‹3UhI@ d0d E uD I D ‹pC
+ 3ZYYdhI@ ]Ð-E I@ TTextLayout
There are no differences[*] between text and binary files except how you open them. Your plan would fail for text too. Consider trying to match "def\nghi" in a file whose content is "abcdef\nghijkl". You have the same problem whether the file is text (lines) or binary (blocks). The problem you really have is not text vs binary. If you solve this problem for text files, you also solve it for binary files.
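To make that concrete, here is a minimal sketch using the content and signature from the paragraph above. The variable names are made up for illustration, and the "file" is an in-memory string rather than a real file on disk.

```perl
use strict;
use warnings;

# Content and signature taken from the example above.
my $content   = "abcdef\nghijkl";
my $signature = "def\nghi";

# Line by line: the signature spans the newline, so neither
# "abcdef\n" nor "ghijkl" contains it.
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";
my $line_hits = 0;
while (defined(my $line = <$fh>)) {
    $line_hits++ if index($line, $signature) >= 0;
}
close $fh;

# Whole buffer: the signature is there, starting right after "abc".
my $offset = index($content, $signature);

print "line-by-line hits: $line_hits\n";
print "whole-buffer offset: $offset\n";
```

The same thing happens with fixed-size blocks: any signature that straddles a boundary is invisible if each piece is searched on its own.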
If you know the length of the longest signature, you could use
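    my $longest_sig_len = ...;
    my $block_size = 4096;
    # If the longest signature does not fit in one block, round the
    # block size up to the next multiple of 1024 that holds it.
    $block_size = int(($longest_sig_len + 1023) / 1024) * 1024
        if $block_size < $longest_sig_len;
    local $/ = \$block_size;    # read fixed-size records instead of lines
    my $block = '';
    while (<$fh>) {
        # Keep the tail of the previous block so a signature that
        # straddles a block boundary is still seen whole.
        $block = substr($block, -($longest_sig_len-1)) . $_;
        # ... search for signature in $block ...
    }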
That's the approach I'd take if I were looking for one string. There are surely algorithms that are more efficient at searching for a number of strings.
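For what it's worth, the overlapping-block idea can be fleshed out into something runnable. This is only a sketch under some assumptions: find_signatures is a hypothetical helper name, the signatures are plain byte strings, and several signatures are handled by simply calling index once per signature, which is not one of the more efficient multi-string algorithms. Matches that lie entirely in the carried-over tail are skipped so nothing is reported twice.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle in fixed-size blocks and return
# [signature, absolute offset] pairs. Assumes $block_size >= 1.
sub find_signatures {
    my ($fh, $sigs, $block_size) = @_;

    my $longest = 0;
    for my $sig (@$sigs) {
        $longest = length $sig if length $sig > $longest;
    }

    binmode $fh;                 # raw bytes, no newline translation
    local $/ = \$block_size;     # fixed-size records

    my @hits;
    my $tail       = '';         # last ($longest - 1) bytes seen so far
    my $tail_start = 0;          # absolute offset of the first byte of $tail
    while (defined(my $chunk = <$fh>)) {
        my $window = $tail . $chunk;
        for my $sig (@$sigs) {
            my $pos = 0;
            while (($pos = index($window, $sig, $pos)) >= 0) {
                # A match wholly inside the tail was reported last round.
                push @hits, [$sig, $tail_start + $pos]
                    if $pos + length($sig) > length $tail;
                $pos++;
            }
        }
        my $keep = $longest - 1;
        $keep = length $window if $keep > length $window;
        $tail_start += length($window) - $keep;
        $tail = $keep ? substr($window, -$keep) : '';
    }
    return @hits;
}

# Tiny demonstration with a 4-byte block size, so the first signature
# is split across block boundaries.
my $data = "abcdef\nghijkl";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @hits = find_signatures($fh, ["def\nghi", "jk"], 4);
close $fh;

printf "%d-byte signature at offset %d\n", length $_->[0], $_->[1] for @hits;
```

With a real file you would open it with '<' (or '<:raw') and pass a realistic block size such as 4096; the 4-byte blocks here are only to force boundary straddling.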
* You can even use while (<FILE>) on a binary file, but it might read more than you expect, because each read runs up to the next newline byte, wherever that happens to be. Setting $/ to a reference to a number (e.g. $/ = \1024; or $block_size = 1024; $/ = \$block_size;) solves that.
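A quick way to see the difference is to compare record lengths. This is a sketch; record_lengths is a made-up helper, and the data is 3000 bytes that happen to contain no newline at all.

```perl
use strict;
use warnings;

# 3000 bytes of "binary" data with no newline byte anywhere.
my $data = "\x4D\x5A" . ("\x00" x 2998);

# Hypothetical helper: read $buffer with the given $/ and return the
# length of every record that comes back.
sub record_lengths {
    my ($buffer, $separator) = @_;
    open my $fh, '<', \$buffer or die "Cannot open in-memory file: $!";
    binmode $fh;
    local $/ = $separator;
    my @lengths;
    while (defined(my $record = <$fh>)) {
        push @lengths, length $record;
    }
    close $fh;
    return @lengths;
}

my @by_line  = record_lengths($data, "\n");    # default line reading
my @by_block = record_lengths($data, \1024);   # fixed 1024-byte records

print "by line:  @by_line\n";
print "by block: @by_block\n";
```

Reading by line returns the whole 3000 bytes as a single "line", while reading by block returns pieces of at most 1024 bytes.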