I am still focused on Blackhole URLs, specifically the binary get request. As I look deeper into the URL, tightening up the regex seems possible, as well as broadening the detection to catch those that use longer hex values. There are distinct dialects in the binary get request that are emerging.
The improved Regex
Binary Get Request:
\.php\?\w{2,8}\=(0[0-9a-b]|3[0-9]){5,32}\&\w{2,9}\=(0[0-9a-b]|3[0-9]){10}\&\w{1,8}\=\d{2}\&\w{1,8}\=\w{1,8}\&\w{1,8}\=\w{1,8}
Optimized by suggestions from Will Metcalf @node5. Thanks Will.
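As a quick sanity check, the binary get regex can be run against the sample URLs quoted in this post. This is only a minimal sketch using Python's re module; the helper name is_binary_get and the benign URL are my own illustrative assumptions, not part of any sensor or signature set.

```python
import re

# Binary get request pattern, copied verbatim from the regex above.
BINARY_GET = re.compile(
    r"\.php\?\w{2,8}\=(0[0-9a-b]|3[0-9]){5,32}"
    r"\&\w{2,9}\=(0[0-9a-b]|3[0-9]){10}"
    r"\&\w{1,8}\=\d{2}\&\w{1,8}\=\w{1,8}\&\w{1,8}\=\w{1,8}"
)


def is_binary_get(url: str) -> bool:
    """Return True when the URL contains a match for the binary get pattern."""
    return BINARY_GET.search(url) is not None


# Two sample URLs taken from this post.
short_sample = (
    "/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b"
    "&f=02&nu=j&rw=m"
)
long_sample = (
    "/links/tune-spreads-action.php"
    "?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308"
    "&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb"
)
# Hypothetical benign URL, included only for contrast.
benign_sample = "/index.php?id=12&page=2"

for url in (short_sample, long_sample, benign_sample):
    print(is_binary_get(url), url[:40])
```

The pattern is deliberately left unanchored, so it can be searched against a full request line or a proxy log entry.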
PDF Get Request:
\.php\?\w{2,9}\=(0[0-9a-b]|3[0-9]){5}\&\w{3,9}\=(3[0-9a-f]|4[0-9a-f])\&\w{3,9}\=(0[0-9a-b]|3[0-9]){10}\&\w{3,9}\=(0[0-9a-b]{1,8})00020002
Thanks to @Dr4g0nFlySm0k3 for widening out my sample set and testing.
Dialects in the Binary Get Request:
While the exact meaning of the dialects is unknown to me at this time, there are three distinct dialects I have seen in binary get requests in the wild up to this point. By dialect, I mean a particular pattern variation that is shared among a group of binary get requests.
Dialect 1: The 2by10
In this dialect, the first parameter is 2 letters followed by 10 hex characters (2by10). The second parameter is 2 characters followed by 20 hex characters (2by20), then comes 1 character followed by two digits (1by2), then 2by1 and 2by1. This is the most common dialect I have seen in the wild and was the basis for my first regex to detect the binary.
/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b&f=02&nu=j&rw=m
Dialect 2: The 3by10
In this dialect, it goes 3by10, then 3/4by20. The remainder varies, but the third parameter is consistently a two-digit number. I do not have enough of these to extrapolate a predictable pattern yet.
Dialect 3: The 4/5/6by64
In this dialect, the first parameter is 4, 5 or 6 letters followed by 64 hex characters (4/5/6by64). The second parameter is 8 or 9 characters followed by 20 hex characters (8/9by20). The remaining parameters fluctuate, but the third parameter is always a two-digit number.
/links/tune-spreads-action.php?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb
/detects/stones-instruction_think.php?hij=0802340202&fwi=0b0a33350a0735020405&nktu=03&wai=mpevbgmy&xsrpwq=rjbgqjpy
These are only my observations of the values in the field. The dialects could represent a fingerprint that identifies different actors, different versions of the exploit kit or different setups of the exploit kit.
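The "NbyM" shorthand used for the dialects can be computed mechanically: N is the length of the parameter name and M is the length of its value. The sketch below is one way to do that with Python's urllib.parse; the function name param_shape is an illustrative assumption of mine, and the labels are just the notation used in this post.

```python
from urllib.parse import urlsplit, parse_qsl


def param_shape(url: str) -> list:
    """Return an 'NbyM' label per query parameter (name length by value length)."""
    params = parse_qsl(urlsplit(url).query)
    return [f"{len(name)}by{len(value)}" for name, value in params]


# Sample URLs taken from this post.
dialect_2by10 = (
    "/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b"
    "&f=02&nu=j&rw=m"
)
dialect_6by64 = (
    "/links/tune-spreads-action.php"
    "?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308"
    "&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb"
)

print(param_shape(dialect_2by10))  # ['2by10', '2by20', '1by2', '2by1', '2by1']
print(param_shape(dialect_6by64))  # ['6by64', '8by20', '7by2', '4by6', '7by8']
```

Grouping captured URLs by the first label or two is a cheap way to see whether new samples fall into one of the known dialects.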
What are the Hex values?
Any of us who analyzed or detected the old version of Blackhole are familiar with the old f= and e= parameters. It appears they still exist, only they have been morphed. The new version of Blackhole contains the same parameters, obfuscated by mixing garbage hexadecimal values into each number, with random characters inserted for good measure.
Let's break down one of the URLs.
/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b&f=02&nu=j&rw=m
0735020b0b = 5
07 = bell
35 = 5
02 = start of the text
0b = vertical tab
0b = vertical tab
3307093738070736060b = 3786
33 = 3
07 = bell
09 = Horizontal tab
37 = 7
38 = 8
07 = bell
07 = bell
36 = 6
06 = Acknowledge
0b = Vertical tab
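The manual breakdown above amounts to reading the value as hex-encoded bytes, throwing away the control characters and keeping only the ASCII digits (0x30 to 0x39). Here is a minimal sketch of that step; the function name decode_param is my own, and it assumes the padding is always made of non-digit bytes, which is what the samples so far show.

```python
def decode_param(hex_value: str) -> str:
    """Decode a hex string to bytes and keep only the ASCII digits 0-9."""
    raw = bytes.fromhex(hex_value)
    # 0x30..0x39 are the ASCII codes for '0'..'9'; everything else is padding.
    return "".join(chr(b) for b in raw if 0x30 <= b <= 0x39)


print(decode_param("0735020b0b"))            # 5
print(decode_param("3307093738070736060b"))  # 3786
```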
Let's do another one.
/links/observe_resources-film.php?gf=050934030b&fe=0a050304380b37370a36&c=02&pr=n&od=v
050934030b = 4
05 = Enquiry
09 = Horizontal tab
34 = 4
03 = End of text
0b = vertical tab
0a050304380b37370a36 = 8776
0a = Line feed
05 = Enquiry
03 = End of text
04 = End of transmission
38 = 8
0b = vertical tab
37 = 7
37 = 7
0a = Line feed
36 = 6
Both of these URLs are of dialect 2by10. You will note that the first parameter turns out to be a single digit while the second value is four digits.
Now let's go back to the fake AV infection URLs I looked at on September 15th.
hxxp://108.178.59.39/links/reveals_formed.php?udvf=03080407333603030a3302340235073836093508033706363836353505080833&tvaxpmbue=0a09380b0a3508360208&rdm=02&bnvru=dolz&gwxjfli=ewsxs
03080407333603030a3302340235073836093508033706363836353505080833 = 363458657686553
0a09380b0a3508360208 = 856
This follows a 4by64 dialect and the value of the first parameter is 363,458,657,686,553 and the second is 856.
Now let's look at another one:
/links/tune-spreads-action.php?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb
This is a 6by64 dialect where the first parameter equals 38,865,545,353 and the second parameter equals 74.
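To check the longer values without decoding 32 bytes by hand, the same idea can be applied to a whole URL: pull out the first two query parameters, hex-decode them and keep only the digit bytes. This is a rough sketch under the same assumption as before (padding never contains ASCII digits); decode_first_two is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qsl


def decode_first_two(url: str) -> tuple:
    """Return the first two query parameters decoded to integers."""
    params = parse_qsl(urlsplit(url).query)
    decoded = []
    for _name, value in params[:2]:
        # Keep only bytes in the ASCII digit range 0x30..0x39.
        digits = bytes(b for b in bytes.fromhex(value) if 0x30 <= b <= 0x39)
        decoded.append(int(digits.decode("ascii")))
    return tuple(decoded)


# The two long-form URLs discussed above.
fake_av_url = (
    "hxxp://108.178.59.39/links/reveals_formed.php"
    "?udvf=03080407333603030a3302340235073836093508033706363836353505080833"
    "&tvaxpmbue=0a09380b0a3508360208&rdm=02&bnvru=dolz&gwxjfli=ewsxs"
)
tune_url = (
    "/links/tune-spreads-action.php"
    "?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308"
    "&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb"
)

print(decode_first_two(fake_av_url))  # (363458657686553, 856)
print(decode_first_two(tune_url))     # (38865545353, 74)
```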
Thanks to those who contributed their URLs to help broaden the analysis set and @Dr4g0nFlySm0k3 for discussions on the subject. #malwaremustdie.
There is fluctuation in the remaining parameters.
@hemcoined - you are correct. If you view the script which generates those URLs, the parameters are generated via the version of software you have - i.e. Adobe reader. This is my understanding of the reason for the fluctuation.