How to restrict XXE resolving?

By Eric Therond

In my last post, we saw how to configure XML parsers to globally disable XXE declarations and expansions. It was an easy but strict solution that might be difficult to implement in your project. So today, I’ll talk about how to restrict XXE precisely. Again, our code examples will be mainly in Java with some references to other languages.

Restricting external connections to authorized protocols

The Java JAXP API version 1.5 introduced new properties to restrict access to external content. In this post, we will only focus on restricting XXE and thus on the ACCESS_EXTERNAL_DTD property. However, keep in mind that restricting access to other external resources, like schemas, is also necessary to reduce the risk of other XML vulnerabilities.

This property must be defined with a list of authorized protocols to use:

factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "http");

In this example, only connections to http resources are allowed. This prevents exfiltration of sensitive files through XXEs that use the file URI scheme, but it is still insufficient: sensitive content can also be retrieved from resources reachable over the network. We’ll see later that this property often acts as a first filter and is used in combination with an entity resolver.

Completely disabling XXE can be done safely by setting this property to an empty string:

factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
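
To make the effect of the two settings above concrete, here is a minimal sketch, assuming the JDK's built-in DOM parser. The class name, the helper method and the file:///etc/hostname entity are illustrative and not part of the original examples. Note that DocumentBuilderFactory exposes JAXP properties through setAttribute rather than setProperty.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ExternalDtdAccessDemo {

    /**
     * Parses xml with ACCESS_EXTERNAL_DTD set to allowedProtocols and
     * reports whether the parser accepted the document.
     */
    static boolean parsesWith(String allowedProtocols, String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Comma-separated list of protocols; "" means no external access at all.
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, allowedProtocols);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // The parser rejected the document or could not read a resource.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative payload: an external entity using the file URI scheme.
        String xxe = "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><r>&x;</r>";
        System.out.println("file entity, http only: " + parsesWith("http", xxe));
        System.out.println("file entity, nothing allowed: " + parsesWith("", xxe));
        System.out.println("no entity, nothing allowed: " + parsesWith("", "<r>hello</r>"));
    }
}
```

With the stock JDK parser, the two documents that reference the file entity should be rejected, while a document without external entities still parses.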

Note: we’ve seen some XML processor implementations, like Apache Xerces, that still don’t support these JAXP 1.5 properties.
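
When the implementation in use may not recognize the property, one option is to try it and fall back to a stricter setting. The sketch below assumes a DocumentBuilderFactory, which throws IllegalArgumentException for an unrecognized attribute, and falls back to the Xerces disallow-doctype-decl feature, which rejects any document containing a DOCTYPE. The class and method names are hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class DtdAccessFallback {

    // Xerces feature that rejects any document containing a DOCTYPE declaration.
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /**
     * Tries the JAXP 1.5 property first. Returns true if it was accepted,
     * false if the stricter DOCTYPE ban had to be used instead.
     */
    static boolean restrictExternalDtd(DocumentBuilderFactory factory) {
        try {
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            return true;
        } catch (IllegalArgumentException unsupported) {
            try {
                factory.setFeature(DISALLOW_DOCTYPE, true);
            } catch (ParserConfigurationException e) {
                // Neither protection is available: refuse to continue silently.
                throw new IllegalStateException("Cannot restrict DTD access on this parser", e);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("JAXP 1.5 property supported: " + restrictExternalDtd(factory));
    }
}
```

The fallback is deliberately stricter than the property: it trades flexibility for the guarantee that the parser is never left in its permissive default state.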

With the libxml library, the closest setting to restrict external connections is the LIBXML_NONET feature which prohibits the use of the network when retrieving external resources. Here's an example in PHP:

$doc = simplexml_load_string($xml, "SimpleXMLElement", LIBXML_NONET);

Resolving entities with a custom resolver

A SAX (Simple API for XML, originally a Java-only API) parser is based on an event-driven API. The resolveEntity callback of the registered EntityResolver implementation is invoked for every external entity the parser is about to open. By default, the built-in Java resolver will attempt to access almost any external content defined in an XML document.

So, to secure a SAX application, either the XXE declarations or reference expansions should be disabled, as we saw in the second post in this series, or a custom resolver should be used depending on the application needs. 

Common use cases for using a custom resolver are:

  • To use a cached version of a resource instead of always fetching it from the network.
  • To replace the scheme (like http://) of URI resources with a more appropriate one (https://).
  • To transform relative URI resources to absolute ones.
  • And of course to authorize only safe and validated entities.

A custom resolver consists of an EntityResolver interface implementation that should be registered using the setEntityResolver method of the SAX processor. 
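
As a rough illustration of the scheme-replacement use case, here is a sketch of a resolver that upgrades http:// identifiers to https:// for a single allow-listed host and returns empty content for everything else. The class name and the idea of a single trusted host are assumptions made for the example. Keep in mind that the parser opens whatever the returned InputSource points at, so the resolver itself has to decide which locations are acceptable.

```java
import java.io.StringReader;
import java.net.URI;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class SchemeUpgradingResolver implements EntityResolver {

    private final String trustedHost; // hypothetical allow-listed host

    SchemeUpgradingResolver(String trustedHost) {
        this.trustedHost = trustedHost;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        URI uri = parseOrNull(systemId);
        if (uri != null && trustedHost.equalsIgnoreCase(uri.getHost())) {
            if ("https".equalsIgnoreCase(uri.getScheme())) {
                return new InputSource(systemId);
            }
            if ("http".equalsIgnoreCase(uri.getScheme())) {
                // Drop the 4-character "http" prefix and use https instead.
                return new InputSource("https" + systemId.substring(4));
            }
        }
        // Anything else is replaced by empty content rather than fetched.
        return new InputSource(new StringReader(""));
    }

    private static URI parseOrNull(String value) {
        if (value == null) {
            return null;
        }
        try {
            return URI.create(value);
        } catch (IllegalArgumentException malformed) {
            return null;
        }
    }

    public static void main(String[] args) {
        SchemeUpgradingResolver resolver = new SchemeUpgradingResolver("www.externalwebsite.com");
        System.out.println(resolver.resolveEntity(null,
                "http://www.externalwebsite.com/logos/logo.png").getSystemId());
        System.out.println(resolver.resolveEntity(null, "file:///etc/passwd").getSystemId());
    }
}
```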

One valid solution to prevent XXE vulnerabilities, which we’ve seen a lot in open source projects, is to return an empty string as the content of every entity, which completely disables entity resolution. The difference from the previous solutions we discussed is that XXE declarations are still allowed in XML files but their resolution is disabled, providing a non-blocking and secure way to parse XML files:

builder.setEntityResolver(new EntityResolver() {
   @Override
   public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
     return new InputSource(new StringReader(""));
   }
 });
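
For readers who want to try this out, the following self-contained sketch registers the same kind of resolver on a DocumentBuilder and parses a document that declares an external entity. The class name, helper method and sample document are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EmptyResolverDemo {

    /** Parses xml with every external entity replaced by empty content and returns the root text. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // EntityResolver has a single method, so a lambda can implement it.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The declaration is accepted, but the entity expands to nothing.
        String xml = "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><r>a&x;b</r>";
        System.out.println(rootText(xml));
    }
}
```

The document parses without error and the referenced file is never opened, since the resolver supplies the (empty) content itself.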

Note that registering null as an entity resolver is equivalent to using the default, insecure resolver, so there is likely no good reason to do that:

builder.setEntityResolver(null);

When a custom resolver returns null, the parser falls back to its default behavior and fetches the external content:

builder.setEntityResolver(new EntityResolver() {
   @Override
   public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
     return null;
   }
 });

Therefore, a permissive entity resolver, like the one above, can still lead to XXE vulnerabilities. This is where the properties we discussed earlier come in. Below, the entityResolver uses a custom resolution for entities with a systemId ending in logo.png and uses the default behavior otherwise. Since the default behavior has been modified on the first line by setting the ACCESS_EXTERNAL_DTD property to an empty string, the entity resolver is secure.

builder.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
builder.setEntityResolver(new EntityResolver() {
  @Override
  public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
    if (systemId.endsWith("logo.png")) {
      // Serve the allow-listed entity from a bundled resource and close the stream afterwards.
      try (InputStream in = classLoader.getResourceAsStream("com/package/logo.png")) {
        return new InputSource(new StringReader(Base64.getEncoder().encodeToString(IOUtils.toByteArray(in))));
      }
    }

    // Anything else falls back to the default behavior, which the property above blocks.
    return null;
  }
});
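
The snippet above depends on its surrounding project (classLoader, IOUtils, the builder). As a standalone approximation, the sketch below applies the same combination to a DocumentBuilder from the JDK, with the logo content replaced by a hypothetical in-memory placeholder string. The class and method names are also made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class AllowListResolverDemo {

    /** Returns the root element text, or null when the parser rejects the document. */
    static String rootTextOrNull(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // First filter: the default resolution may not use any protocol.
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                if (systemId != null && systemId.endsWith("logo.png")) {
                    // Allow-listed entity: content supplied by the application.
                    return new InputSource(new StringReader("LOGO-PLACEHOLDER"));
                }
                // Everything else goes through the (now blocked) default path.
                return null;
            });
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException | IOException e) {
            return null;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String allowed = "<!DOCTYPE r [<!ENTITY logo SYSTEM "
                + "\"https://www.externalwebsite.com/logos/logo.png\">]><r>&logo;</r>";
        String blocked = "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><r>&x;</r>";
        System.out.println("allowed entity: " + rootTextOrNull(allowed));
        System.out.println("other entity:   " + rootTextOrNull(blocked));
    }
}
```

The design point is that the resolver only has to describe what is allowed. Everything it does not recognize is handled by the restricted default rather than by extra code in the resolver.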

Libraries in other languages provide similar features to allow custom resolution of external entities. This is the case for PHP's libxml with the libxml_set_external_entity_loader function:

libxml_set_external_entity_loader(
   function ($public, $system, $context) {
       if(str_ends_with($system, "logo.png")) {
           return fopen("./logo.png", "r");
       }
 
       return null;
   }
);

The major difference with libxml is that returning null in the custom resolver will not resolve the entity and will trigger an error.

Resolving entities with a Catalog

Mapping entities to other resources is also commonly done using an XML Catalog.
An XML Catalog is an XML file that maps external entity identifiers to URIs.

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
 <system systemId="https://www.externalwebsite.com/logos/logo.png" uri="logo.png"/>
</catalog>

The XML processor performs a lookup in the Catalog for each entity found when parsing an XML file. If there is no match, an exception is thrown, providing a safe way to protect against XXE vulnerabilities.

In Java, a Catalog can be used by calling the setEntityResolver method as follows:

URL catalogUrl = classLoader.getResource(catalogFile);
CatalogResolver cr = CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUrl.toURI());
 
builder.setEntityResolver(cr);
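
To see the lookup and the no-match behavior in isolation, here is a small sketch that writes the catalog shown earlier to a temporary directory and queries the resolver directly. It assumes Java 9 or later (the javax.xml.catalog package) and the default catalog features. The class name, helper methods and temporary file layout are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogException;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

final class CatalogLookupDemo {

    static final String CATALOG =
            "<?xml version=\"1.0\"?>\n"
            + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            + " <system systemId=\"https://www.externalwebsite.com/logos/logo.png\""
            + " uri=\"logo.png\"/>\n"
            + "</catalog>\n";

    /** Writes the catalog to a temporary directory and builds a resolver for it. */
    static CatalogResolver newResolver() {
        try {
            Path dir = Files.createTempDirectory("catalog-demo");
            Path catalogFile = dir.resolve("catalog.xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));
            return CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the URI the identifier maps to, or null when the lookup is rejected. */
    static String lookup(CatalogResolver resolver, String systemId) {
        try {
            InputSource source = resolver.resolveEntity(null, systemId);
            return source == null ? null : source.getSystemId();
        } catch (CatalogException noMatch) {
            // Default features use strict resolution: unknown identifiers fail.
            return null;
        }
    }

    public static void main(String[] args) {
        CatalogResolver resolver = newResolver();
        System.out.println(lookup(resolver, "https://www.externalwebsite.com/logos/logo.png"));
        System.out.println(lookup(resolver, "file:///etc/hostname"));
    }
}
```

The relative uri in the catalog entry is resolved against the location of the catalog file, so the matched identifier maps to a logo.png next to catalog.xml.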

What more to know about XML vulnerabilities

Although XXE is the most dangerous vulnerability to be aware of when parsing XML files, other issues can arise, such as denial of service attacks or fetching of external content with XInclude or XSLT file I/O elements. We'll be releasing a set of rules soon to further strengthen your XML parser. Stay tuned!
