Thursday, November 1, 2012

Deeper into Blackhole, URLs and dialects.

Written by Frank Angiolelli, CISSP

I am still focused on Blackhole URLs, specifically the binary get request. Looking deeper into the URL, it seems possible both to tighten up the regex and to broaden the detection to catch requests that use longer hex values. Distinct dialects are also emerging in the binary get request.


The improved Regex

Binary Get Request:
\.php\?\w{2,8}\=(0[0-9a-b]|3[0-9]){5,32}\&\w{2,9}\=(0[0-9a-b]|3[0-9]){10}\&\w{1,8}\=\d{2}\&\w{1,8}\=\w{1,8}\&\w{1,8}\=\w{1,8}

Optimized by suggestions from Will Metcalf @node5. Thanks Will.
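
To try the binary get regex against sample URLs, here is a minimal sketch in Python. The pattern is copied verbatim from above. The helper name is_binary_get and the benign URL are my own assumptions for illustration, not part of the original detection.

```python
import re

# Binary get request pattern, copied verbatim from the post.
BINARY_GET = re.compile(
    r"\.php\?\w{2,8}\=(0[0-9a-b]|3[0-9]){5,32}"
    r"\&\w{2,9}\=(0[0-9a-b]|3[0-9]){10}"
    r"\&\w{1,8}\=\d{2}\&\w{1,8}\=\w{1,8}\&\w{1,8}\=\w{1,8}"
)

def is_binary_get(url: str) -> bool:
    """Return True if the URL contains the binary get request pattern."""
    return BINARY_GET.search(url) is not None

samples = [
    "/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b&f=02&nu=j&rw=m",
    "/links/tune-spreads-action.php?uxytgf=3306380338020a0b0b02360609350608"
    "350409050334350933080a3505063308&abnczdde=06090a3708050a063402"
    "&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb",
    "/index.php?id=12345&page=2",  # hypothetical benign URL, not from the post
]

for url in samples:
    print(is_binary_get(url), url)
```

The two Blackhole samples from this post should match and the benign URL should not. A real deployment would normally run the regex inside an IDS or proxy rule rather than a script.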

PDF Get Request:
\.php\?\w{2,9}\=(0[0-9a-b]|3[0-9]){5}\&\w{3,9}\=(3[0-9a-f]|4[0-9a-f])\&\w{3,9}\=(0[0-9a-b]|3[0-9]){10}\&\w{3,9}\=(0[0-9a-b]{1,8})00020002

Thanks to @Dr4g0nFlySm0k3 for widening out my sample set and testing.


Dialects in the Binary Get Request:

While the exact meaning of the dialects is unknown to me at this time, I have seen three distinct dialects in binary get requests in the wild so far. By dialect, I mean a particular pattern variation that is shared among a group of binary get requests.

Dialect 1: The 2by10
In this dialect, the first parameter is 2 letters followed by 10 hex characters (2by10). The second parameter is 2 letters followed by 20 hex characters (2by20), then comes 1 letter followed by two digits (1by2), then 2by1 and 2by1. This is the most common dialect I have seen in the wild and was the basis for my first regex to detect the binary.
/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b&f=02&nu=j&rw=m

Dialect 2: The 3by10
In this dialect, the pattern is 3by10, then 3/4by20. The remaining parameters vary, although the third parameter is consistently a two-digit number. I do not have enough of these to extrapolate a predictable pattern yet.

Dialect 3: The 4/5/6by64
In this dialect, the first parameter is 4, 5 or 6 letters followed by 64 hex characters (4/5/6by64). The second parameter is 8 or 9 letters followed by 20 hex characters (8/9by20). The remaining parameters fluctuate, but the third parameter is always a two-digit number.
/links/tune-spreads-action.php?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb

For comparison, a sample whose first parameter follows the 3by10 pattern of Dialect 2:
/detects/stones-instruction_think.php?hij=0802340202&fwi=0b0a33350a0735020405&nktu=03&wai=mpevbgmy&xsrpwq=rjbgqjpy

These are only my observations of the values in the field. They could represent a fingerprint that identifies different actors, different versions of the exploit kit or different setups of the exploit kit.
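
The "NbyM" naming can be computed mechanically. This is a rough sketch, assuming the label is simply the length of each parameter name followed by the length of its value. The function name param_shape is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def param_shape(url: str) -> list:
    """Return an 'NbyM' label for every query parameter, in order.

    N is the length of the parameter name and M the length of its value.
    """
    params = parse_qsl(urlsplit(url).query)
    return [f"{len(name)}by{len(value)}" for name, value in params]

url = "/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b&f=02&nu=j&rw=m"
print(param_shape(url))  # the first label names the dialect
```

For the column.php sample this yields 2by10, 2by20, 1by2, 2by1, 2by1, matching the Dialect 1 description above. Grouping captured URLs by their first label is one way to sort samples into dialects.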

What are the Hex values?


The hex values consist of two things intermixed: randomized garbage values and numeric digits. All hex values are either 00-0b or 30-39. The 00-0b values are likely garbage, while 30-39 are the ASCII codes for the digits 0-9.

Any of us who analyzed or detected the old version of Blackhole are familiar with the old f= and e= parameters. Well, I'm here to tell you it appears they still exist, only they have been morphed. The new version of Blackhole contains the same parameters, obfuscated by mixing garbage hexadecimal values into each number and inserting random characters for good measure.

Let's break down one of the URLs.
/forum/links/column.php?tf=0735020b0b&ve=3307093738070736060b&f=02&nu=j&rw=m

0735020b0b = 5
07 = bell
35 = 5
02 = start of the text
0b = vertical tab
0b = vertical tab

3307093738070736060b = 3786
33 = 3
07 = bell
09 = Horizontal tab
37 = 7
38 = 8
07 = bell
07 = bell
36 = 6
06 = Acknowledge
0b = Vertical tab
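
The manual breakdown above can be reproduced with a short script. This is a sketch under the assumption stated earlier, that 30-39 bytes are digits and 00-0b bytes are filler. The names breakdown and digits are my own, and the labels are the standard ASCII abbreviations for the control codes.

```python
# Standard ASCII abbreviations for the control codes 0x00 through 0x0b.
CONTROL_NAMES = ["NUL", "SOH", "STX", "ETX", "EOT", "ENQ",
                 "ACK", "BEL", "BS", "HT", "LF", "VT"]

def breakdown(hex_value: str) -> list:
    """Split a hex string into byte pairs and label each pair."""
    rows = []
    for i in range(0, len(hex_value), 2):
        pair = hex_value[i:i + 2]
        byte = int(pair, 16)
        if 0x30 <= byte <= 0x39:
            rows.append((pair, chr(byte)))            # ASCII digit
        elif byte <= 0x0b:
            rows.append((pair, CONTROL_NAMES[byte]))  # assumed filler
        else:
            rows.append((pair, "?"))                  # outside observed ranges
    return rows

def digits(hex_value: str) -> str:
    """Keep only the digit bytes and join them into one string."""
    return "".join(label for _, label in breakdown(hex_value) if label.isdigit())

for value in ("0735020b0b", "3307093738070736060b"):
    print(value, "=", digits(value))
    for pair, label in breakdown(value):
        print(" ", pair, "=", label)
```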


Let's do another one.

/links/observe_resources-film.php?gf=050934030b&fe=0a050304380b37370a36&c=02&pr=n&od=v

050934030b = 4
05 = Enquiry
09 = Horizontal tab
34 = 4
03 = EndofText
0b = Vertical tab

0a050304380b37370a36 = 8776
0a = Line feed
05 = Enquiry
03 = EndofText
04 = EndofTransmission
38 = 8
0b = Vertical tab
37 = 7
37 = 7
0a = Line feed
36 = 6


Both of these URLs are of the 2by10 dialect. Note that in each one the first parameter decodes to a single digit, while the second decodes to four digits.


Now let's go back to the fake AV infection URLs I looked at on September 15th:
hxxp://108.178.59.39/links/reveals_formed.php?udvf=03080407333603030a3302340235073836093508033706363836353505080833&tvaxpmbue=0a09380b0a3508360208&rdm=02&bnvru=dolz&gwxjfli=ewsxs


03080407333603030a3302340235073836093508033706363836353505080833 = 363458657686553


0a09380b0a3508360208 = 856

This follows the 4by64 dialect. The first parameter decodes to 363,458,657,686,553 and the second to 856.

Now let's look at another one:
/links/tune-spreads-action.php?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb

This is a 6by64 dialect where the first parameter equals 38,865,545,353 and the second parameter equals 74.
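
The same decoding can be applied to a whole URL at once. This is a minimal sketch that assumes only the first two parameters carry encoded numbers, as in the samples above. The function names decode_value and decode_first_two are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qsl

def decode_value(hex_value: str) -> str:
    """Drop the 00-0b filler pairs and return the digits encoded as 30-39."""
    pairs = re.findall(r"[0-9a-f]{2}", hex_value)
    return "".join(p[1] for p in pairs if p[0] == "3" and p[1].isdigit())

def decode_first_two(url: str) -> list:
    """Decode the first two query parameters of a binary get request URL."""
    params = parse_qsl(urlsplit(url).query)
    return [(name, decode_value(value)) for name, value in params[:2]]

fake_av = ("hxxp://108.178.59.39/links/reveals_formed.php"
           "?udvf=03080407333603030a3302340235073836093508033706363836353505080833"
           "&tvaxpmbue=0a09380b0a3508360208&rdm=02&bnvru=dolz&gwxjfli=ewsxs")
tune = ("/links/tune-spreads-action.php"
        "?uxytgf=3306380338020a0b0b02360609350608350409050334350933080a3505063308"
        "&abnczdde=06090a3708050a063402&jvfagfn=02&pusr=uwelha&tibqqyl=rpfarbmb")

for url in (fake_av, tune):
    print(decode_first_two(url))
```

Running it against the two URLs above should reproduce the values worked out by hand.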

Thanks to those who contributed their URLs to help broaden the analysis set and @Dr4g0nFlySm0k3 for discussions on the subject. #malwaremustdie.

2 comments:

  1. There is fluctuation in the remaining parameters.
    Lifeline System

  2. @hemcoined - you are correct. If you view the script which generates those URLs, the parameters are generated via the version of software you have - i.e. Adobe reader. This is my understanding of the reason for the fluctuation.