Hi, welcome to the Part 2 of my Vidar infostealer analysis writeup. In part 1 of this post, I covered detailed technical analysis of packed executable dropped by initial stager by extracting and exploring embedded shellcode which is unpacking and self-injecting final payload. This part focuses on detailed static analysis of final injected payload: unpacked Vidar infostealer, defying anti-analysis techniques employed by malware (string decryption, dynamically loading DLLs and resolving APIs), automating analysis and finally uncovering stealer’s main functionality through deobfuscated/decrypted strings.

SHA256: fca48ccbf3db60291b49f2290317b4919007dcc4fb943c1136eb70cf998260a5

Vidar in a Nutshell

The Vidar Stealer is popular stealer written in C++ and has been active since October 2018 and seen in numerous different campaigns. It has been utilized by the threat actors behind GandCrab to use Vidar infostealer in the process for distributing the ransomware as second stage payload, which helps increasing their profits. The family is quite flexible in its operations as it can be configured to grab specific information dynamically. It fetches its configuration from C2 server at runtime which dictates what features are activated and which information is gathered and exfiltrated from victim machine. It also downloads several benign supporting dlls (freebl3.dll, mozglue.dll, msvcp140.dll and nss3.dll) to process encrypted data from browsers such as email credentials, chat account details, web-browsing cookies, etc., compresses everything into a ZIP archive, and then exfiltrates the archive to the attackers via an HTTP POST request. Once this is done, it kills its own process and deletes downloaded DLLs, working directory contents and main executable in an attempt to wipe all evidence of its presence from the victim’s machine.

Technical Analysis

I’ll start analysis by loading this executable directly in IDA to look for important strings, IDA’s strings window show some intersting plaintext and base64 encoded strings stored in .rdata section

image

if I quickly decode few base64 strings in Cyberchef, it results in junk data giving a clue that strings are possibly encrypted before they were base64 encoded

image

next I’ll check for encryption algorithm but KANAL fails to detect any potential algorithm for string encryption as given in figure below

image

so let’s start digging it statically to see how string encryption actually works in this case, for this purpose I’ll double click a base64 encoded string randomly to see where it’s been used by finding its Xrefs which takes us to sub_423050 routine

image

this routine seems to be processing most of the base64 encoded strings and storing result for each processed string in a global variable, apart from first two variables which seem to be storing plaintext values for possible decryption key and domain, let’s rename this routine to wrap_decrypt_strings

image

sub_422F70 in wrap_decrypt_strings routine can be seen from figure above to be repititively called with base64 strings, has been Xref’d for ~400 times, it can be assumed it is processing encrypted strings and can be renamed to decrypt_strings for our convenience as shown in the figure below

image

further exploring decrypt_strings by loading the executable in x64dbg, debugging unveils that first two calls to sub_4011C0 routine are just copying values of key and base64 encoded encrypted string to local variables, next routine sub_422D00 is decoding base64 string, stores decoded hex value to a local variable and returns address of this local variable

image

base64 decoded hex string can also be verified in cyberchef

image

later it calculates length for base64 decoded hex string and allocates buffer equivalent of that length on heap, next two calls to sub_401330 routine are allocating two buffers on heap for key and base64 decoded hex string respectively before it proceeds to finally decrypt data using sub_422980, quick decompilation of code for this routine results in three well recognized RC4 loops

image

string decryption can be confirmed by following Cyberchef recipe

image

decompiled version of decrypt_strings routine sums up all the steps described above

image

once processing for wrap_decrypt_strings completes, it continues to process next routine from _WinMain, a quick overview of sub_419700 this routine reveals that it makes extensive use of global variables which were initialized in wrap_decrypt_strings apart from two calls to sub_4196D0 and sub_4195A0 routines respectively which can further be explored by debugging

image

in the figure above, routine sub_4196D0 is parsing PEB structure to get base address for Kernel32.dll loaded in memory by accessing _PEB -> PEB_LDR_DATA -> InLoadOrderModuleList structures respetively, next routine sub_4195A0 being called is taking two parametes: 1). kernel32.dll base address 2). address of a global variable dword_432204 (LoadLibraryA) in first call and dword_432438 (GetProcAddress) in second call

image

where sub_4195A0 is parsing kernel32.dll’s header by navigating from IMAGE_DOS_HEADER -> IMAGE_NT_HEADER -> IMAGE_OPTIONAL_HEADER.DATA_DIRECTORY -> IMAGE_EXPORT_DIRECTORY.AddressOfNames to retrieve export name and compare it with value of API contained by input parameter value which in this case is LoadLibraryA

image

if both strings match, it returns API’s address by accessing value of IMAGE_EXPORT_DIRECTORY.AddressOfFunctions field, resolved address is stored in dword_432898 variable while second call to sub_4195A0 resolves GetProcAddress, stores resolved address to dword_43280C which is subsequently used to resolve rest of API functions at runtime. I wrote an IDAPython script here which is first decrypting strings from wrap_decrypt_strings, resolving APIs from sub_419700 routine, adding comments and giving meaningful names to global variables storing resolved APIs to properly understand code flow and its functionality. decrypt_strings routine from IDAPython script is finding key, locating ~400 base64 encoded encrypted strings, base64 decoding strings and using key to decrypt base64 decoded hex strings, adding decrypted strings as comments and renaming variables as shown in figure below

image

resolve_apis routine from script is resolving ~100 APIs from 11 libraries from sub_419700 routine

image

after resolving APIs, next routine sub_41F4A0 checks if victime machine is part of CIS (Commonwealth of Independent States) countries which include Armenia, Azerbaijan, Belarus, Georgia, Kazakhstan, Kyrgyzstan, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, and Uzbekistan, it retrieves language ID for current user by calling GetUserDefaultLangID API and compares returned result with specified location codes

image

where 0x43F corresponds to Kazakhstan, 0x443 to Uzbekistan, 0x82C to Azerbaijan and so on, it continues performing its tasks if user’s language ID doesn’t fall in the above mentioned category, otherwise it’ll stop execution and exit, next routine sub_41B700 performs windows defender anti-emulation check by compareing computer name to HAL9TH and user name to JohnDoe strings

image

once all required checks are passed, sub_420BE0 routine is called which consists of stealer’s grabbing module, it prepares urls and destination path strings where downloaded dlls from C2 server are to be stored before performing any other activity

image

it downloads 7 dlls under C:\Programdata\

image

next it creates its working directory under C:\Programdata, name of directory is randomly generated 15 digit string like C:\ProgramData\920304972255009 where it further creates four sub-directories (autofill, cc, cookies and crypto) which are required to be created to store stolen data from browser, outlook, cryptocurrency wallets and system information gathering modules

image

different types of browsers are being targeted to steal autofill, credit card, cookies, browsing history and victim’s login credentials, this module is equipped with advanced stealing and encryption techniques

image

it further queries registry about SMTP and IMAP servers with confidential data and password, gathers data about connected outlook accounts (if any) and finally dumps all the data to outlook.txt file in its working directory

image

later it scans for .wallet, .seco, .passphrase and .keystore files for ~30 cryptocurrency wallets on their installed paths and copies scanned files to “crypto” in working directory

image

Vidar creates an HTTP POST request for C&C (http://himarkh.xyz/main.php) server in order to download configuration for grabbing module at runtime, parses downloaded configuration and proceeds to gather host, hardware and installed software related info

image

which is stored in system.txt file according to the specified format as shown in figure below

image

the same routine also captures screenshots which is stored as “screenshot.jpg” inside working directory

image

immidiately after that a zip file with “_8294024645.zip” name format is created and stolen contents from working directory are compressed (file is compressed using Zip2 encryption algorithm as identified by KANAL)

image

the compressed file is now ready to be exfiltrated to its C&C server in another POST request

image

after exiting from recursive grabbing module, it deletes downloaded DLLs and files created in working directory being used to dump stolen data and information in order to remove its traces from victim machine

image

eventually it prepares a command “/c taskkill /pid PID & erase EXECUTABLE_PATH & RD /S /Q WORKING_DIRECTORY_PATH\* & exit” which gets executed using cmd.exe to kill the running infostealer process and to delete remaining directories created by this process and the process itself.

That’s it for Vidar infostealer’s in-depth static analysis and analysis automation! see you soon in another blogpost.