The Gootloader malware family uses a distinctive form of social engineering to infect computers: Its creators lure people to visit compromised, legitimate WordPress websites using hijacked Google search results, present the visitors to these sites with a simulated online message board, and link to the malware from a simulated “conversation” where a fake visitor asks a fake site admin the exact question that the victim was searching for an answer to.
Most of the infection process is driven by code that runs on the compromised WordPress server and on another server we have previously named “the mothership,” which orchestrates an elaborate dance to dynamically produce a page that seemingly answers the exact question you’re asking. Gootloader’s operators make subtle, behind-the-scenes changes to the compromised WordPress sites that cause those sites to load the extra content from the mothership.
Every aspect of this process is obfuscated to such a degree that even the owners of the compromised WordPress pages often cannot identify the modifications in their own site or trigger the Gootloader code to run when they visit their own pages. At the same time, unless you control one of the affected WordPress sites, it can be very difficult (if not impossible) to get hold of this code to study it: The modified WordPress database entries and PHP scripts that comprise Gootloader reside only on the compromised server, where security researchers normally cannot access them (barring physical or shell access to the server itself).
Sophos X-Ops has previously reported on various aspects of Gootloader. Now, Sophos X-Ops has reconstructed how Gootloader’s server-side operations function, using breadcrumbs and clues left by both the threat actors and other security researchers, published in open-source tools around the internet. We have pulled this collective knowledge together into this report.
In this post, I’ll explain how I was able to reconstruct how the malicious SEO works; how the landing page code on the initial, compromised website validates visitors and then redirects some of them to a second website; how the Gootloader operators use the second website to generate a realistic-looking message board dynamically; how the multi-stage infection process works; and how all of these parts are orchestrated by a “mothership” server, run by Gootloader’s operators, that decides which visitors get attacked and which get bounced back to Google’s homepage.
Gootloader’s poisoned SEO
A list of Gootloader JScript filenames, which correspond to the search query that led victims to download them
Gootloader has been using a virtually unchanged malicious SEO method for nearly eight years. When we have done threat hunting in the past, we’ve used our own telemetry to find the key phrases Gootloader used to deliver a malicious JScript file: Gootloader always names these first-stage files to match the search phrase that led the victim into the trap.
Finding new names for these first-stage downloaders also means discovering new phrases the Gootloader operators are using as lures. VirusTotal’s live hunting and retrohunting services led us to these updated payloads, despite the fact that Gootloader’s creators use code obfuscation to an almost absurd degree. We had to devise threat hunting queries such as the following Yara rule:
rule gootkit_js_stage1
{
    strings:
        $a1 = /function .{4,60}{return .{1,20} % .{0,8}\(.{1,20}\+.{1,20}\);}/
        $a2 = /function [\w]{1,14}\(.{1,14},.{1,50}\) {return .{1,14}.substr\(.{1,10},.{1,10}\);}/
        $a3 = /function [\w]{1,14}\(.{1,50}\) {return .{1,14}.length;.{1,4}}/
        $a4 = /function [\w]{1,14}\(.{0,40}\){.{0,40};while \([\w]{1,20} < [23][\d]{3}\) {/
        $b1 = /;WScript.Sleep\([\d]{4,10}\);/
        $b2 = /function [\w]{1,14}\(.{0,40}\) {.{0,40};while\([\w]{1,20}<\([\w]{1,14}\*[\d]{1,8}\)\){[\w]{1,14}\+\+}}/
    condition:
        all of ($a*) and any of ($b*)
}
While this rule was effective at the time of our research, Gootloader’s operators have subsequently modified the JScript to render this search obsolete. In order to stay on top of these changes, we needed to analyze newer versions of the heavily obfuscated JScript code.
As part of the obfuscation, the attackers break up the code. Every elementary capability is implemented in a separate function, initially featuring randomly generated variables, then later switching to variable names selected from a dictionary.
In the example above, $a1, $a2, and $a3 match functions that performed the elementary tasks in the decryptor.
$a1 matches the function that determines the parity of a number, matching this obfuscated form:
function dance(expect,support,thin,foot,had){return expect % (magnet+magnet);}
$a2 matches the function that returns a substring from a string, matching this obfuscated form:
function supply(spoke,seed,your,build,charge,carry,sat) {return spoke.substr(seed,your);}
$a3 matches the function that returns the length of a string, matching this obfuscated form:
function verb(consonant) {return consonant.length; }
$a4 matches the main decoder loop, which contains the length of the encoded part (somewhere between 2000 and 4000 bytes), matching this obfuscated form:
function wave(down){against=kill;hole="";while (against < 2146) {spell=cause(down,against);hole=cool(hole,spell,against); against++; }return hole;}
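As a quick sanity check, the four $a patterns can be tried against the sample functions quoted above. The following is a minimal sketch, not the hunting rule itself: the patterns are hand-translated to Python's `re` syntax (with literal parentheses, braces, dots, and plus signs escaped), the names `STAGE1_PATTERNS`, `SAMPLES`, and `matched_patterns` are hypothetical, and the $b delay patterns are left out.

```python
import re

# Python translations of the $a1..$a4 patterns (an assumption-laden port,
# not the original YARA source).
STAGE1_PATTERNS = {
    "a1": r"function .{4,60}\{return .{1,20} % .{0,8}\(.{1,20}\+.{1,20}\);\}",
    "a2": r"function \w{1,14}\(.{1,14},.{1,50}\) \{return .{1,14}\.substr\(.{1,10},.{1,10}\);\}",
    "a3": r"function \w{1,14}\(.{1,50}\) \{return .{1,14}\.length;.{1,4}\}",
    "a4": r"function \w{1,14}\(.{0,40}\)\{.{0,40};while \(\w{1,20} < [23]\d{3}\) \{",
}

# The obfuscated sample functions quoted in the text.
SAMPLES = {
    "a1": 'function dance(expect,support,thin,foot,had){return expect % (magnet+magnet);}',
    "a2": 'function supply(spoke,seed,your,build,charge,carry,sat) {return spoke.substr(seed,your);}',
    "a3": 'function verb(consonant) {return consonant.length; }',
    "a4": 'function wave(down){against=kill;hole="";while (against < 2146) '
          '{spell=cause(down,against);hole=cool(hole,spell,against); against++; }return hole;}',
}


def matched_patterns(script: str) -> set:
    """Return the names of every pattern found anywhere in the script text."""
    return {name for name, rx in STAGE1_PATTERNS.items() if re.search(rx, script)}
```

Calling `matched_patterns("\n".join(SAMPLES.values()))` reports which of the four building blocks are present; the real rule additionally requires one of the delay patterns.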
The code used long delays to make dynamic analysis more difficult, extending to hours the time needed to properly run the code.
Initially, Gootloader used the WScript.Sleep function (matched by $b1) to create this delay. Over time, Gootloader’s creators replaced this with a less recognizable implementation (matched by $b2), like this function, which essentially increments a counter for a very long time:
function string2(evening6) { sky0=25; while(sky0<(evening6*4921)){ sky0++ } }
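To get a feel for how much work the counter loop performs, the sample can be mirrored in Python. This is only an illustrative sketch: the function names are hypothetical, the defaults 25 and 4921 are copied from the sample above, and the argument the malware actually passes is not shown here.

```python
def busy_wait(arg, multiplier=4921, start=25):
    """Mirror of the sample delay loop; returns how many increments ran."""
    counter = start
    iterations = 0
    while counter < arg * multiplier:
        counter += 1
        iterations += 1
    return iterations


def expected_iterations(arg, multiplier=4921, start=25):
    """Closed form for the loop above: the delay is pure arithmetic, no sleep call."""
    return max(0, arg * multiplier - start)
```

Because the delay is ordinary arithmetic rather than a call such as WScript.Sleep, a sandbox that only hooks sleep functions would have nothing to shorten.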
Even though the code is highly obfuscated, knowing the structure of the code enabled us to create the above seemingly loose Yara rule – which caught thousands of first stage downloader scripts with zero false positives.
Once we had the original file names, we had the search terms. With those, we could find the landing pages: The Gootloader operators had manipulated the search rankings so successfully that the compromised landing sites ended up near the top of the search results (even as the first result, as in the example below):
Gootloader has poisoned search results in multiple languages, including German, French, and Korean
How did the malicious pages end up at the top of the search results?
We were able to learn how the malicious SEO was so effective by inspecting the HTML source of the search landing pages.
There is a hidden element, the name of which is actually a server ID, used at many places in the code (a47ec48 in the following example). It starts with the letter ‘a’ followed by 6 hexadecimal characters:
...
The source of the Gootloader landing pages reveals a number of different search terms and phrases the threat actors wanted search engines to index. The linked subpages (selected with green) don’t actually exist. The injected WordPress code defines a few hooks, one of which handles requests for nonexistent pages; that hook serves the fake forum discussion when the victim clicks on the search result.
That hidden element had links (selected with green) and the matching targeted search terms (selected with brown):
This hidden element will not be visible to human webpage visitors. But search engine crawlers see and process it, which tricks the search engines into treating the website as if it provides relevant content on the poisoned search term, thus ranking the site high in the search results.
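The server ID format described above (the letter ‘a’ followed by six hexadecimal characters) lends itself to a simple pattern search over page source or PHP code. The sketch below is a hypothetical helper, not a tool we used; it assumes lowercase hex digits, as in the IDs quoted in this post, and it will produce false positives on any unrelated seven-character token of the same shape.

```python
import re

# 'a' followed by exactly six lowercase hex characters, as a standalone token.
SERVER_ID_RE = re.compile(r"\ba[0-9a-f]{6}\b")


def find_server_ids(text: str) -> list:
    """Return the distinct candidate server IDs found in text, sorted."""
    return sorted(set(SERVER_ID_RE.findall(text)))
```

Candidates found this way still need manual review, for example by checking whether the same token recurs in a request parameter name and in an element name on the same page.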
A screenshot of the source code from a Gootkit/Gootloader landing page. Image courtesy of Sucuri Research.
The report (and screenshot) revealed three promising strings:
The request: $_GET[‘a55d837’
A malicious web domain name: ‘my-game[.]biz’
A SQL query (shown on a different screenshot in Sucuri’s blog): ‘SELECT * FROM backupdb_’
Searching Google for code fragment $_GET[‘a55d837’ led us to an online decoder page, where the result (now deleted) of another researcher’s query revealed the encoded version of the PHP code used in the malicious web page:
function qwc1() {
    global $wpdb, $table_prefix, $qwc1;
    $qwc2 = explode('.', $_SERVER["\x52\105\x4d\117\x54\105\x5f\101\x44\104\x52"]);
    if (sizeof($qwc2) == 4) {
        if ($wpdb->get_var("\x53\105\x4c\105\x43\124\x20\105\x58\111\x53\124\x53\40\x28\123\x45\114\x45\103\x54\40\x2a\40\x46\122\x4f\115\x20\142\x61\143\x6b\165\x70\144\x62\137" . $table_prefix . "\x6c\163\x74\141\x74\40\x57\110\x45\122\x45\40\x77\160\x20\75\x20\47" . $qwc2[0] . '|' . $qwc2[1] . '|' . $qwc2[2] . "\x27\51\x3b") == 1) {
and the decoded version of that same script:
function qwc1() {
    global $wpdb, $table_prefix, $qwc1;
    $qwc2 = explode('.', $_SERVER["REMOTE_ADDR"]);
    if (sizeof($qwc2) == 4) {
        if ($wpdb->get_var("SELECT EXISTS (SELECT * FROM backupdb_" . $table_prefix . "lstat WHERE wp = '" . $qwc2[0] . '|' . $qwc2[1] . '|' . $qwc2[2] . "');") == 1) {
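The string obfuscation in the encoded version relies on PHP double-quoted string escapes, alternating between hexadecimal (\xHH) and octal (\NNN) forms. A minimal Python sketch of a decoder follows; it assumes each escape carries the leading backslash that PHP requires, handles only these two escape types, and uses a hypothetical function name.

```python
import re

# \xHH (one or two hex digits) or \NNN (one to three octal digits).
_PHP_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})|\\([0-7]{1,3})")


def decode_php_escapes(obfuscated: str) -> str:
    """Replace hex and octal escapes with the characters they encode."""
    def _replace(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        # PHP keeps only the low byte of an octal escape.
        return chr(int(match.group(2), 8) & 0xFF)

    return _PHP_ESCAPE.sub(_replace, obfuscated)
```

Applied to the array key in the snippet, `decode_php_escapes(r"\x52\105\x4d\117\x54\105\x5f\101\x44\104\x52")` yields `REMOTE_ADDR`, matching the decoded script.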
While it isn’t clear how the code ended up on that website, the Internet never forgets: Search engines found and indexed this analysis. This gave us our first insight into what the injected code of the compromised landing pages would look like.
(Both the analysis linked above, and another page I subsequently found on malwaredecoder.com, were later removed by their respective site owners. Search results that reveal ephemeral analysis pages like these are only available for a short period of time. If you plan to cite source materials from sites such as these, keep an offline copy of the page, because they may not be there when you return.)
At this point we didn’t know exactly how the sites are compromised, but we knew from the report that malicious PHP code is somehow inserted into the WordPress installation.
A VirusTotal search for content:”SELECT * FROM backupdb_” returned a couple of files from a compromised server that contain an error message:
WordPress database error: [Table 'interfree.backupdb_wp_lstat' doesn't exist] SELECT EXISTS (SELECT * FROM backupdb_wp_lstat WHERE wp = '117|50|2');
The criminals are likely using the database table backupdb_wp_lstat, which must have been removed from the server during a cleanup. We hunted for this content on VirusTotal (search term: content:”backupdb_wp_lstat”), hoping we would stumble upon a database dump. It is always a good idea to set up such hunting rules and run additional retrohunts, which can reveal other valuable files or data.
We were lucky, and found an archive file containing a SQL dump of the WordPress database from a compromised server on a public malware repository.
The WordPress database dump included this table that contains a set of the first three octets of IP addresses, a block list of IP ranges that cannot revisit the Gootloader website on the same day
The dumped database contains a table called backupdb_wp_lstat. Later analysis determined that this table contains the IP address blocklist the malicious website uses to prevent repeat visits.
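For analysts working with such a dump, the pipe-delimited entries (such as the 117|50|2 value in the error message above) can be converted to ordinary CIDR notation. This is a sketch using Python's standard `ipaddress` module; the helper names are hypothetical, and it assumes every entry holds exactly the first three IPv4 octets.

```python
import ipaddress


def blocklist_entry_to_network(entry: str) -> ipaddress.IPv4Network:
    """Turn an entry such as '117|50|2' into the /24 network it covers."""
    octets = entry.split("|")
    if len(octets) != 3:
        raise ValueError(f"expected three octets, got {entry!r}")
    return ipaddress.ip_network(".".join(octets) + ".0/24")


def ip_to_blocklist_entry(ip: str) -> str:
    """Build the lookup key the PHP code derives from a visitor address."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    return "|".join(str(address).split(".")[:3])
```

With these helpers, checking whether a given address would have been refused is a membership test such as `ipaddress.ip_address("117.50.2.77") in blocklist_entry_to_network("117|50|2")`, where the address is an invented example.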
The obfuscated PHP code was also viewable in the database dump:
A block of base64-encoded data stored as a variable named $pposte in a WordPress database
…as was the injected SEO poisoning content, with the j$k..j$k marker:
Malicious SEO content phrases embedded in a WordPress database table, linking the site to an Excel spreadsheet converter search query
Researchers who want to hunt for this identifiable string in the Descriptions property of the malicious landing pages can use the regex /j\$k([0-9]{1,10})j\$k/ (the dollar signs are escaped so that they match literally).
The “place marker” string appears in the OpenGraph metadata SEO headers of a Gootloader-modified web page
This marker serves as placeholder for the spot where Gootloader’s link to the page renderer script is inserted. When the Gootloader page is served up, it excludes the marker from the page source.
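The marker hunt can be sketched in a few lines of Python. Because `$` is a regex metacharacter, the sketch escapes it so that it matches a literal dollar sign; the function name is hypothetical, and the input is assumed to be raw database or page content in which the marker has not yet been stripped.

```python
import re

# Literal 'j$k', one to ten digits, literal 'j$k'.
MARKER_RE = re.compile(r"j\$k([0-9]{1,10})j\$k")


def find_markers(text: str) -> list:
    """Return the numeric payload of every j$k...j$k marker in text."""
    return MARKER_RE.findall(text)
```

Since the served page excludes the marker, this search is most useful against database dumps and other at-rest copies of the injected content.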
However, the code extracted from the SQL database dump was not exactly the same as what was shown in the Sucuri blog. We continued searching for more examples by pivoting on the C2 server my-game[.]biz, and found a handful of PHP files referring to that server:
Files that contain references to the Gootloader “mothership” website (screenshot courtesy of VirusTotal)
The submission name commented_functions.php looked promising. Indeed, it turned out to be likely the work of a researcher, analyzing the PHP source code from the compromised WordPress installation. It was kindly documented in detail, saving us some analysis time (which also helped because we didn’t have all the components).
Commented text, preceded with double slashes, documents the Gootkit characteristics of modified web pages
We were able to use the base64 string referenced in the “html” comment above to search VirusTotal, which led us to a (relatively) recently uploaded SQL dump.
A WordPress database dump in VirusTotal
The dump file contained the previously referenced base64 blob…
A SQL dump from a compromised WordPress installation contains base64-encoded elements of the Gootkit/Gootloader modifications
…which, when decoded, output the same code that was originally published by Sucuri:
The decoded base64 data from the WordPress database reveals the PHP script that handles decoding the malicious content for a site visitor
With this in hand, we had greater confidence in the provenance of this malicious code. We also identified the table where Gootloader stores it in a compromised WordPress database. Having located the dump of the WordPress database and the PHP code on the online decoder site, we had a complete copy of the malicious content hosted on the compromised landing sites.
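Site owners and researchers with access to a SQL dump can apply the same idea defensively: look for long base64 runs and see whether they decode to PHP-like content. The following is a rough sketch under stated assumptions. The function name, the minimum run length, and the keyword list are all hypothetical choices, and legitimate plugins also store base64 data, so every hit needs manual review.

```python
import base64
import binascii
import re

# Strings that suggest decoded content is PHP hooking into WordPress or
# reading request parameters. An illustrative list, not a signature set.
SUSPICIOUS = ("$_POST", "$_GET", "eval(", "base64_decode", "add_action")


def find_encoded_php(dump_text: str, min_len: int = 200):
    """Yield (offset, matched_keywords, decoded_text) for suspicious blobs."""
    run = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    for match in run.finditer(dump_text):
        core = match.group(0).rstrip("=")
        core += "=" * (-len(core) % 4)  # restore padding before decoding
        try:
            decoded = base64.b64decode(core, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 after all
        text = decoded.decode("utf-8", errors="replace")
        hits = [keyword for keyword in SUSPICIOUS if keyword in text]
        if hits:
            yield match.start(), hits, text
```

Iterating over `find_encoded_php(open("dump.sql").read())` (the file name is a placeholder) prints nothing by itself; the caller decides how to report or triage each decoded blob.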
What’s in the landing page code?
This code contains a simple PHP command shell, which the Gootloader attackers can use to maintain access to compromised pages.
The variable $pposte holds the name of the parameter that gets executed. If the compromised website receives an HTTPS POST with that string in it, the code on the page will decode and execute any base64 encoded commands it receives, turning into a bare-bones command shell the attackers can use to maintain control over the server:
A simple command shell Gootloader inserts into the PHP running in a WordPress site the threat actors have compromised
At other points inside the code, the script defines filters for WordPress events, which trigger the execution of functions based on predefined conditions.
For example, the following function executes once the attackers have set up the compromised WordPress environment: the invoked code (referenced as “qvc5”) initializes the backupdb_wp_lstat database table.
add_action("wp", "qvc5");
This snippet from the qvc5() function initializes the backend databases used by Gootloader:
On preparing the requested web page, the malicious event handler hooks build the request to the “mothership” (a name I’ve given to the website the Gootloader operators use to centrally manage their fleet of compromised blogs). The communication sends the mothership the following parameters of the initial request, all in base64 encoded form:
One of Gootloader’s most problematic behaviors is that it only allows the potential victim to visit the site once in a 24-hour period. It does this by adding the originating IP address of this communication (the address of the victim PC, variable ‘b’ above) to a block list. The server also geofences IP address ranges, and only allows requests to originate from specific countries of interest to the Gootloader threat actor. The referrer string (variable ‘d’ above) contains the original search terms.
(In this example, the “&d=” referrer string is the base64-encoded value of “google/?q=cisco_wpa_agreement”)
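The encoding of these parameters can be illustrated with a short Python sketch. Only the parameter names ‘b’ (visitor IP) and ‘d’ (referrer) come from the description above; the helper names, the sample IP address, and the use of standard URL query encoding around the base64 values are assumptions made for the example.

```python
import base64
from urllib.parse import parse_qsl, urlencode


def encode_params(params: dict) -> str:
    """Base64-encode each value and join the pairs as a URL query string."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in params.items()
    }
    return urlencode(encoded)  # percent-encodes '+', '/' and '='


def decode_params(query: str) -> dict:
    """Reverse encode_params: parse the query, then base64-decode each value."""
    return {
        key: base64.b64decode(value).decode("utf-8", errors="replace")
        for key, value in parse_qsl(query)
    }
```

A captured request can then be inspected with `decode_params(query_string)`. Note that `parse_qsl` turns an unescaped `+` into a space, so traffic that carries raw base64 without percent-encoding would need that substitution undone before decoding.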
Later, we will see that the server’s response will be the fake forum page renderer code.
The mothership sends the fake forum page
The mothership response contains two parts: one contains the HTML header elements, and the other contains the page body content. The two are delimited in the code by a tag.
The header part contains multiple elements, separated by pipe (“|”) characters. Using what it gets from the mothership, the landing page code will gather the HTML content:
The portion of the Gootkit code that collects the HTML content of the fake page it will later draw over the top of the compromised website
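The two-part response format can be sketched as a small parser. This is an illustration only: the delimiter tag is not reproduced here, so the sketch takes it as a required argument, and both the function name and the assumption that the header part precedes the body are hypothetical.

```python
def split_mothership_response(body: str, delimiter: str):
    """Split a response into (header_fields, page_body).

    delimiter is the tag separating the two parts; it must be supplied by
    the caller because this sketch does not assume its value.
    """
    header_part, found, page_body = body.partition(delimiter)
    if not found:
        raise ValueError("delimiter not present in response")
    # Header elements are separated by pipe characters.
    return header_part.split("|"), page_body
```

Using `partition` rather than `split` means only the first occurrence of the delimiter is consumed, so a body that happens to contain the same tag again is left intact.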
The script adds the entire /24 IP address range where the request originated to a 24-hour block list. Neither the originating computer, nor any others with the same initial three sets of numbers in its IP address, can get the page again for at least a day. (This was already seen in the SQL database dump):
The Gootkit code blocks repeat visitors by adding not only the visitor’s IP address to a block list, but the entire class C IPv4 address range surrounding the visitor’s address, just for good measure
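The effect of this /24 block list can be modeled in a few lines of Python. The sketch below is a simplified model, not the attackers' PHP: the class and method names are hypothetical, the block list lives in memory rather than in a database table, and the 24-hour expiry is implemented with timestamps, which is an assumption about a mechanism that is not shown here.

```python
import time

BLOCK_SECONDS = 24 * 60 * 60  # one day


class VisitorBlockList:
    """Tracks the first three IPv4 octets of each visitor already served."""

    def __init__(self):
        self._last_served = {}  # '117|50|2' -> timestamp of last serve

    @staticmethod
    def _key(ip: str):
        octets = ip.split(".")
        # Mirrors the sizeof(...) == 4 check in the PHP snippet.
        return "|".join(octets[:3]) if len(octets) == 4 else None

    def should_serve(self, ip: str, now: float = None) -> bool:
        """Return True (and record the visit) if this /24 is not blocked."""
        now = time.time() if now is None else now
        key = self._key(ip)
        if key is None:
            return False  # assumption: anything but dotted IPv4 is refused
        last = self._last_served.get(key)
        if last is not None and now - last < BLOCK_SECONDS:
            return False
        self._last_served[key] = now
        return True
```

This model makes the practical consequence clear: a researcher who revisits from the same network shortly after a colleague, or after their own first attempt, sees only the benign site.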
How Gootloader renders the fake forum page
If the request comes from an IP address that isn’t on the block list, the malicious code in the compromised WordPress database takes action and delivers the bogus message board content (typically titled simply “Questions And Answers”) to the visitor’s browser.
The Gootloader fake forum page, featuring a “question” and an “answer” that links to the Gootloader JScript first-stage payload
The only visible malicious content in the source code of a compromised landing page is a simple inserted JavaScript tag. For example: