Don't be afraid of XXE vulnerabilities: understand the beast and how to detect them
Today, XML External Entity (XXE) vulnerabilities are still ubiquitous, even though recommendations to protect against them have been an integral part of security standards for years. In this post, the first in a series of three, we will try to demystify XXE vulnerabilities and present the rule we put in place to help you detect and prevent them.
An XML entity is declared in the Document Type Definition (DTD) of an XML document, which is introduced by the DOCTYPE declaration. An entity is internal if its value is defined inside the document, or external if its value is retrieved from a URI. When an entity reference is subsequently used in the XML document, the reference is replaced by the value that was retrieved for it. For example, the following XML document retrieves the value of the xxe entity via a URI pointing to a file, whose content is then embedded into the document:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE person [
  <!ENTITY internal "Matt">
  <!ENTITY xxe SYSTEM "file:///data/city.txt">
]>
<person>
  <name>&internal;</name>
  <city>&xxe;</city>
  <age>18</age>
</person>
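To see entity substitution in action, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It keeps only the internal entity from the document above, because how external entities are handled depends on the parser and its configuration. The names DOCUMENT and read_name are ours, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Same document as above, reduced to the internal entity only.
DOCUMENT = """<!DOCTYPE person [
  <!ENTITY internal "Matt">
]>
<person>
  <name>&internal;</name>
  <age>18</age>
</person>"""


def read_name(xml_text: str) -> str:
    """Return the text of <name> after the parser has expanded entity references."""
    root = ET.fromstring(xml_text)
    return root.findtext("name")


if __name__ == "__main__":
    print(read_name(DOCUMENT))  # prints "Matt"
```

The application code never sees the &internal; reference, only the substituted value. This is why an external entity pointing to a sensitive resource is dangerous when the parser agrees to resolve it.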
An application dealing with XML files should be careful to restrict external entities to authorized file system and network resources. Otherwise, it opens the door to arbitrary file disclosure and server-side request forgery (SSRF) attacks:
<!DOCTYPE person [
  <!ENTITY file SYSTEM "file:///etc/passwd">
  <!ENTITY ssrf SYSTEM "https://internal.network/sensitive_information">
]>
Note: an entity can be general, as shown above, or it can be a parameter entity. Parameter entities are declared with a % sign and can be both defined and referenced only inside the DTD, whereas general entities are referenced in the document content.
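To make these distinctions concrete, the following sketch lists the entities declared in a document and classifies each one as internal or external, general or parameter. It relies on the EntityDeclHandler callback of Python's standard xml.parsers.expat module. The function name and the shape of the returned dictionaries are our own assumptions, and this is only an illustration, not how rule S2755 is implemented. The declared entities are never referenced in the sample, so nothing is fetched.

```python
from xml.parsers import expat

SAMPLE = """<!DOCTYPE person [
  <!ENTITY internal "Matt">
  <!ENTITY file SYSTEM "file:///etc/passwd">
]>
<person/>"""


def list_entity_declarations(xml_text: str) -> list:
    """Return one dict per entity declared in the DTD of xml_text."""
    declarations = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External entities carry a system identifier (a URI);
        # internal entities carry a literal value instead.
        declarations.append({
            "name": name,
            "kind": "parameter" if is_parameter else "general",
            "external": system_id is not None,
            "target": system_id if system_id is not None else value,
        })

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return declarations


if __name__ == "__main__":
    for declaration in list_entity_declarations(SAMPLE):
        print(declaration)
```

Any entry with "external" set to True is worth a closer look, since its target is a resource the parser may be asked to fetch.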
How to detect XXE vulnerabilities?
Rule S2755 to the rescue
To help developers with this, rule S2755 “XML parsers should not be vulnerable to XXE attacks” is available for C#, Java, JS/TS, Python, PHP, and C/C++ in SonarLint, SonarCloud and all editions of SonarQube.
This rule raises an issue whenever the XML processor is misconfigured, even when it only parses trusted XML files. We believe that there are only advantages to controlling and limiting the use of external entities:
- For performance reasons: it’s a good practice to reduce dependencies on external resources.
- For security reasons: it’s difficult to guarantee that a trusted XML file has not been tampered with, in place or in transit, by a malicious third party (as you’ll see below).
- In general: it makes sense to securely configure the XML parser in your project as soon as you start parsing XML files, even if you consider them to be trusted. That way you no longer have to worry about the risk of XXE vulnerabilities in the future in case the XML parser processes other XML files that you do not control.
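As an illustration of what securely configuring a parser can look like, here is a minimal sketch for Python's standard xml.sax module. It explicitly turns off external general and parameter entities before parsing. The helper names (TextCollector, parse_without_external_entities) are ours, and the exact switches differ for other parsers and languages, so treat this as one example rather than a universal recipe. Python 3.7.1 and later already disable external general entities in xml.sax by default; setting the features explicitly documents the intent and guards against older runtimes.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Collects all character data seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes: bytes) -> str:
    """Parse xml_bytes with external entities disabled and return its text."""
    parser = xml.sax.make_parser()
    # Do not fetch external general entities (e.g. <!ENTITY xxe SYSTEM "...">).
    parser.setFeature(feature_external_ges, False)
    # Do not fetch external parameter entities or the external DTD subset.
    parser.setFeature(feature_external_pes, False)

    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)
```

With this configuration, a reference such as &xxe; to an external file is skipped instead of being replaced by the file's content, while the rest of the document is parsed as usual.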
Not convinced? Take a look at some real and severe vulnerabilities found by S2755 in various well-known open source projects written in different programming languages:
- XXE vulnerability in WordPress 5.7 (CVE-2021-29447), the most popular PHP CMS, when authenticated users upload media files (covered in a previous blog post).
- XXE vulnerability in pikepdf 2.9.2 (CVE-2021-29421), a Python library for manipulating PDF files, when PDF XMP metadata (based on XML) is parsed.
- XXE vulnerability in WxJava 3.7.4.A, a Java SDK for developing WeChat mobile payment apps.
- XXE vulnerability in DefectDojo 1.6.4, a popular Python vulnerability management tool, when XML files from a partner vulnerability scan tool are parsed.
- XXE vulnerability in Openfire, a Java XMPP server.
Advice for assessing S2755 issues
Keep these things in mind when assessing XXE vulnerability issues in your own projects:
- Think of the worst-case scenario, for instance a malicious system user manipulating XML files, or a compromised partner application from which XML files are retrieved.
- Read the documentation for your XML processor, especially its default behavior for resolving external entities.
- Many file formats and technologies, such as Office documents, RSS, PDF, SOAP, SVG, XML-RPC, XMPP and others, are partly based on XML, which is not widely known. It is therefore easy to overlook that parsing these files can introduce XXE vulnerabilities. So don't be surprised if rule S2755 is triggered, for example, when parsing XMP metadata from a PDF file.
In this post, we saw examples of XXE vulnerabilities in popular open source projects written in different programming languages. We explained how to assess XXE vulnerability issues and what the benefits of rule S2755 are. But only you can prevent vulnerabilities, so next time we’ll discuss how to fix them.