The vast majority of serious malware over the past 30 years has been written in assembly or in compiled languages such as C, C++ and Delphi. Over the past decade, however, malware has become more and more diverse and, as a result, is increasingly being written in interpreted languages such as Python. The low entry threshold, ease of use, high development speed, and huge collection of libraries have made Python attractive to most programmers, including malware developers. Python is becoming an increasingly popular tool for writing trojans, exploits, information stealers, and the like. As Python's popularity continues to grow steadily, and the barrier to entry into the monoculture of malware written in C remains high, it seems clear that Python will be used more and more in cyber attacks, including for writing various kinds of Trojans.
Figure 1: Trends in the popularity of major programming languages over the past decade
The price of this convenience shows up most clearly in file size. For the simplest "Hello, world!" program, the executable produced by PyInstaller is about 7 MB, against roughly 20 KB for the same program compiled from C; py2exe produces a file of about the same size as PyInstaller (6.83 MB). An executable built from Python code is this much larger because an interpreter must be bundled inside it (much like a shared object file on Linux) for the program to run. Nuitka does considerably better, producing a portable binary of 432 KB. It gets there by converting Python modules to C, in this case turning one line of Python into more than 8,000 lines of C, and then using the libpython library and static C files to execute the code just as CPython does.
On the analysis side, unpacking a packaged executable yields pyc files, which can be decompiled and restored to source code with uncompyle6, and YARA rules can detect executables compiled with PyInstaller as well as those compiled with py2exe.
Times are changing
Compared to a standard compiled language (such as C), writing malware in Python presents a number of challenges. First, Python must be installed on the operating system in order to interpret and execute the code. However, as will be shown below, applications written in Python can easily be converted to a regular executable using various methods.
Secondly, malware written in Python is usually large, takes up a lot of memory, and, as a result, requires more computing resources. Serious malware found in the wild is often small, inconspicuous, consumes little memory, and uses limited computing power. A compiled sample written in C can be around 200 KB, while a comparable sample written in Python, after being converted to an executable, is about 20 MB. Thus, interpreted languages consume far more CPU and RAM.
However, by 2020, digital and information technologies have advanced a great deal. The Internet gets faster every day, computers have more RAM and larger hard drives, processors are more powerful, and so on. Python, for its part, keeps growing in popularity and already comes pre-installed on macOS and most Linux distributions.
Missing interpreter? Not a problem!
Microsoft Windows is still the main target for most malware attacks, but Python is not installed by default on this operating system. Accordingly, for more effective and widespread distribution, a malicious script must be converted into an executable file. There are many ways to "compile" Python. Consider the most popular utilities.
PyInstaller
PyInstaller can convert Python scripts into standalone executable files for Windows, Linux, and macOS by freezing the code. This method is one of the most popular for converting code into an executable format and is widely used for both legitimate and malicious purposes.
As an example, the simplest "Hello, world!" program can be converted to an executable file with PyInstaller. The result is a portable, self-contained ELF file, the Linux equivalent of an EXE file on Windows. The same program written and compiled in C serves as a comparison.
Py2exe
Py2exe is another popular method for converting code into a standalone EXE executable. As with PyInstaller, an interpreter is bundled with the code to create a portable executable. However, py2exe will likely fall out of use eventually: it does not support versions after Python 3.4, because CPython bytecode changed a lot in Python 3.6 and up.
Py2exe uses the distutils package and requires a small setup.py to create an executable. As in the previous example, the simplest "Hello, world!" program can be compiled with py2exe.
Figure 2: Size of the executable created with py2exe
Nuitka
Nuitka is probably one of the most underrated yet most advanced methods for converting Python code into an executable. First, the Python code is translated into C code, and then the libpython library is linked in to execute the code in exactly the same way as CPython. Nuitka can use different C compilers, including gcc, clang, MinGW64, Visual Studio 2019+ and clang-cl, to compile the generated C code.
Again, the simple "Hello, world!" program can be compiled with Nuitka.
The result looks very promising, and it seems highly likely that Nuitka will be developed further as a "Python compiler". For example, additional useful features may appear, such as protection against reverse engineering. There are already several utilities that easily parse binaries compiled with PyInstaller and py2exe in order to recover the Python source code. If the executable is created with Nuitka and the code is converted from Python to C, the reverse engineer's task becomes much more complicated.
Other useful utilities
A big help for malware written in Python is the huge ecosystem of open source packages and repositories. Almost any problem you want to solve has most likely already been solved in one form or another with Python. Accordingly, malware authors can find simple functions on the web, while more complex functionality probably does not have to be written from scratch.
Consider three categories of simple yet useful and powerful utilities:
- Code obfuscation.
- Creation of screenshots.
- Making web requests.
Code obfuscation
Malware authors using Python have many obfuscation libraries at their disposal to make code unreadable, for example pyminifier and pyarmor.
Taking screenshots
Malware designed to steal information often has a feature for taking screenshots of users' desktops. With Python this functionality is easy to implement, as several libraries are already available, including pyscreenshot and python-mss.
Making web requests
Malware often uses web requests to perform various tasks on a compromised system, including administration, obtaining an external IP address, downloading new payload parts, and more. With Python, making web requests is easy and can be done with the standard library or with open source libraries such as requests and httpx. For example, the external IP address of a compromised system can be easily obtained using the requests library.
Feature Benefits
eval()
The eval() built-in function is generally considered highly controversial and carries serious security risks when used in code. On the other hand, it is very useful when writing malware.
The eval() function is very powerful and can be used to execute strings of Python code inside a script. When implemented correctly, this single function is often used to run high-level scripts or "plugins" on the fly in compiled malware. Similarly, some malware written in C embeds a Lua engine to run scripts written in that language. Such functionality has been found in well-known malware such as Flame.
Imagine a group of hackers remotely interacting with malware written in Python. If the group suddenly finds itself in an unexpected situation that demands a quick reaction, the ability to execute code directly on the target system can be very useful. In addition, malware written in Python can be deployed with very limited functionality and gain new features as needed, in order to remain inconspicuous for as long as possible.
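From a defender's point of view, the security risk mentioned above comes down to the fact that eval() will evaluate any expression it is handed, including function calls. The following is a minimal sketch of that difference, contrasting eval() with the standard library's ast.literal_eval(), which accepts only literals. The helper name parse_untrusted is hypothetical and is not taken from the article.

```python
import ast


def parse_untrusted(text):
    """Parse a Python literal from untrusted text without running any code.

    Returns None when the text is not a plain literal.
    (Hypothetical helper, for illustration only.)
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return None


# eval() evaluates arbitrary expressions, so the function call is executed.
print(eval("len('abc') + 1"))

# literal_eval() accepts literals such as lists, numbers, and strings...
print(parse_untrusted("[1, 2, 3]"))

# ...but refuses anything that would require executing code.
print(parse_untrusted("len('abc') + 1"))
```

This is why code that must interpret external input usually avoids eval() entirely, and why its presence in a decompiled sample is worth a closer look.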
Let's move on to real examples of malware from the wild.
SeaDuke
SeaDuke is probably the most famous malware written in Python. In 2015-2016, the Democratic National Committee (DNC) was compromised by two groups that many analysts attributed to APTs 28 and 29.
An impressive analysis of SeaDuke was conducted by the Unit 42 team at Palo Alto. The decompiled source code of this malware is also available. In addition, F-Secure has published an excellent white paper discussing SeaDuke and related malware.
SeaDuke is a Trojan written in Python that was converted into a Windows executable using PyInstaller and packed with UPX. The source code was obfuscated to make analysis difficult. The malware has many capabilities, including several methods for persisting quietly on Windows over the long term, running cross-platform, and making web requests for command and control.
Figure 4: SeaDuke Code Sample
PWOBot
PWOBot is another well-known piece of malware that, like SeaDuke, was compiled using PyInstaller. PWOBot was mainly active in 2013-2015 and affected several European organizations, mainly in Poland.
PWOBot had many functions, including logging keystrokes, establishing persistence on the system, downloading and executing files, running Python code, making web requests, and mining cryptocurrencies. An excellent analysis of PWOBot was done by the Unit 42 team at Palo Alto.
PyLocky
PyLocky is ransomware compiled with PyInstaller. Its main activity was seen in the US, France, Italy and Korea. This malware implements anti-sandboxing, command and control, and file encryption using the 3DES algorithm.
A good analysis of PyLocky was done by Trend Micro, and analysts from Talos Intelligence managed to create a file decryptor to recover information encrypted on victims' systems.
PoetRAT
PoetRAT is a Trojan that targeted the Azerbaijani government and energy sector in early 2020. The Trojan infiltrated systems and stole information related to ICS/SCADA systems that control wind turbines.
The malware was delivered via Word documents and had a host of information theft capabilities, including exfiltrating files over FTP, capturing images from webcams, downloading additional utilities, keylogging, working with browsers, and stealing credentials. Talos Intelligence has written an excellent article on the unknown actor using this malware.
The script used to capture images from webcams is shown below:
Figure 5: Code section for capturing images from webcams
Open source
In addition to malware from the wild, there are several open source Trojans, such as pupy and Stitch. These projects demonstrate how complex and feature-rich malware of this kind can be. Pupy is cross-platform, runs entirely in memory, leaves a very small footprint, can combine multiple methods for encrypted command transmission, can migrate into processes using reflective injection, and can load Python code remotely from memory.
Malware analysis tools
There are many utilities for analyzing malware written in Python, even in compiled form. Let's take a quick look at some of the tools available.
uncompyle6
The successor to decompyle, uncompyle, and uncompyle2 is uncompyle6, a cross-platform decompiler that can convert Python bytecode back into Python source code.
Consider the simplest "Hello, world!" script executed as a module in the form of a pyc file (containing the bytecode). Its source code can be restored using uncompyle6.
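To make the relationship between source code, bytecode, and a pyc file concrete, the sketch below uses only the standard library (py_compile, marshal, dis) rather than uncompyle6 itself. It assumes CPython 3.7 or later, where the pyc header is 16 bytes long; the file names and the helper compile_and_load are hypothetical.

```python
import dis
import importlib.util
import marshal
import os
import py_compile
import tempfile

PYC_HEADER_SIZE = 16  # assumption: CPython 3.7+ header layout


def compile_and_load(source_text):
    """Compile source text to a pyc file, then load its code object back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        pyc = os.path.join(tmp, "hello.pyc")
        with open(src, "w") as fh:
            fh.write(source_text)
        py_compile.compile(src, cfile=pyc, doraise=True)
        with open(pyc, "rb") as fh:
            data = fh.read()
    magic = data[:4]                                  # identifies the Python version
    code = marshal.loads(data[PYC_HEADER_SIZE:])      # the serialized code object
    return magic, code


magic, code = compile_and_load('print("Hello, world!")\n')
print(magic == importlib.util.MAGIC_NUMBER)  # header matches this interpreter
print(code.co_consts)                        # string constants survive compilation
dis.dis(code)                                # human-readable bytecode listing
```

The constants and names stored in the code object are what make decompilation back to readable source feasible.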
pyinstxtractor.py (PyInstaller Extractor)
PyInstaller Extractor can extract Python data from executables compiled with PyInstaller.
python-exe-unpacker
The python_exe_unpack.py script can be used to unpack and decompile executables compiled with py2exe.
Compiled file
At compile time, PyInstaller and py2exe add unique strings to the executable, which makes it much easier to detect using YARA rules.
PyInstaller writes the string "pyi-windows-manifest-filename" almost at the very end of the executable, which can be observed in a hex editor (HxD):
Figure 6: Unique string added by PyInstaller at compile time
A YARA rule that matches this string can be used to detect executables compiled with PyInstaller.
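As a rough stand-in for such a rule, the same string match can be sketched in plain Python. This is only an illustration under stated assumptions: the marker is the one named above, its presence may depend on the PyInstaller version and target platform, and the function name contains_marker is hypothetical. A real YARA rule would typically add further conditions, such as checking the file header.

```python
import os
import tempfile

# Marker string named in the article; presence may vary by PyInstaller version.
PYINSTALLER_MARKER = b"pyi-windows-manifest-filename"


def contains_marker(path, marker=PYINSTALLER_MARKER, chunk_size=1 << 20):
    """Return True if the file at path contains the marker byte string.

    The file is read in chunks; a short tail of the previous data is kept
    so that a marker straddling a chunk boundary is still found.
    """
    overlap = len(marker) - 1
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-overlap:]


# Demonstration on a throwaway file rather than a real executable.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"\x00" * 64 + PYINSTALLER_MARKER + b"\x00" * 8)
    sample_path = fh.name
try:
    print(contains_marker(sample_path))
finally:
    os.remove(sample_path)
```

Reading in chunks keeps memory use flat even for multi-megabyte executables, which matters given the file sizes discussed earlier.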
Conclusion
This is where the story about malware written in Python ends. It's very interesting to see trends change as computing power grows and computer systems become easier to work with. As security professionals, we need to keep a close eye on malware written in Python, otherwise problems may arise when we least expect them.
Happy Hacking! Cyber Souls