Checkmk: Remote Code Execution by Chaining Multiple Bugs (1/3)

by stefan schiller|

Checkmk is a modern IT infrastructure monitoring solution developed in Python and C++. According to the vendor’s website, more than 2,000 customers rely on Checkmk. Due to its purpose, Checkmk is a central component usually deployed at a privileged position in a company’s network. This makes it a high-profile target for threat actors.

In our effort to help secure the open-source world, we decided to look at the open-source edition of Checkmk, which is based on a Nagios monitoring core and seamlessly integrates NagVis to visualize status data on maps and diagrams. During our research, we identified multiple vulnerabilities in Checkmk and its NagVis integration, which can be chained together by an unauthenticated, remote attacker to fully take over the server running a vulnerable version of Checkmk.

In this first article, in a series of three, we start by getting an overview of all identified vulnerabilities and a basic understanding of the Checkmk architecture. Furthermore, we determine the disastrous impact of chaining the identified vulnerabilities together. We also dive deep into the technical details of the first two vulnerabilities, which pave the way for an unauthenticated attacker to gain remote code execution.

Impact

We discovered multiple vulnerabilities in Checkmk and its NagVis integration with the following CVSS scores assigned by the vendor:

  • CVSS 9.1: Code Injection in watolib’s auth.php
  • CVSS 9.1: Arbitrary File Read in NagVis
  • CVSS 6.8: Line Feed Injection in ajax_graph_images.py
  • CVSS 5.0: Server-Side Request Forgery in agent-receiver

These vulnerabilities can be chained together by an unauthenticated, remote attacker to gain code execution on the server running Checkmk version 2.1.0p10 and lower:

We verified the exploitation for the open-source Raw Edition by leveraging a specific feature of its monitoring core. It is likely that an attacker can use similar techniques to exploit a server running an Enterprise Editions.

All of these issues are fixed with Checkmk version 2.1.0p12. We strongly recommend updating any instance with a version before this release.

Technical Details

In this section, we start by looking at the basic architecture of Checkmk and its components. Based on this, we outline how the identified vulnerabilities can be chained together by an attacker and deep dive into the technical details of the first two vulnerabilities, which are the beginning of a full chain to gain unauthenticated, remote code execution.

Background

Checkmk is an IT infrastructure monitoring solution similar to Zabbix or Icinga. The configuration and monitoring of servers, networks, applications, etc., is done via a web interface. This user-facing component is developed in Python and is called Checkmk GUI.

In order to retrieve additional information from the monitored systems, it is possible to deploy a monitoring agent on these systems. The component responsible for registering agents and receiving data from these agents is called the agent-receiver.

The following picture outlines the basic architecture of Checkmk:

Checkmk exposes two ports on the external network interface by default:

  • TCP port 80: actual web interface
  • TCP port 8000: agent-receiver

The first component of the web interface is an Apache web server running on TCP port 80, which serves as a reverse proxy. It is possible to run multiple Checkmk instances on a single host. These instances are called monitoring sites or simply sites. For each site, a dedicated, internal Apache server is spawned. The purpose of the outer reverse proxy is to map requests for a specific site to the corresponding internal Apache server dedicated to the requested site. In the picture above, the site monitoring is mapped to the Apache server running on TCP port 5000. From the outside, this Apache server can only be reached via the reverse proxy because it only listens on localhost.

The site-dedicated Apache server forwards requests to either the actual Checkmk GUI, a Python WSGI application, or via FCGI to a PHP wrapper in order to integrate the NagVis PHP component.

The heart of Checkmk is the monitoring core, which is responsible for initiating checks, collecting data, detecting state changes, and providing information to the GUI. While the Checkmk Enterprise Editions have their own monitoring core, the open-source Raw Edition uses a Nagios monitoring core. To retrieve data from it, the core provides an interface called Livestatus, which is implemented as a C++ Nagios broker module called livestatus.o. This interface uses a proprietary protocol called Livestatus Query Language (LQL), which is similar to both HTTP and SQL. For example, a query to retrieve the name and IP address of all monitored hosts, which are in DOWN (1) or UNREACH (2) state, looks like this:

The response may look like this:

More advanced queries can be built by using additional headers. Whenever the GUI needs some data from the core, it sends an LQL query to it, and the core responds with the requested data.

The second component directly reachable via the external interface is the agent-receiver. The agent-receiver is a FastAPI web server listening on TCP port 8000, which provides different routes for registering agents and collecting data from these agents.

With this basic understanding of Checkmk’s components, let’s see how an unauthenticated attacker would be able to chain the identified code vulnerabilities together in order to gain remote code execution.

Exploitation Chain

Some of the identified vulnerabilities have limited practical impact on their own. However, a malicious attacker can chain them together to achieve remote code execution.

The following picture summarizes what abilities the exploitation of an individual vulnerability yields and how an attacker can build on this ability to leverage the following vulnerability to further increase control, which finally results in unauthenticated, remote code execution:

The exploitation chain starts with a Server-Side Request Forgery in the agent-receiver (1), which can be leveraged by an attacker to access an endpoint only reachable from localhost. This endpoint is vulnerable to a Line Feed Injection (2). This gives an attacker the ability to forge arbitrary LQL queries, which are used by the Checkmk GUI to retrieve data from the monitoring core. An attacker can take advantage of this ability to delete arbitrary files, which can further be leveraged to bypass the authentication mechanism in the NagVis component.

Once an attacker has gained access to the NagVis component, an authenticated Arbitrary File Read vulnerability (3) in NagVis can be leveraged to read a special Checkmk configuration file called automation.secret. With access to the contents of this file, an attacker can gain access to the Checkmk GUI in the context of the automation user. This access can further be turned into remote code execution by exploiting a Code Injection vulnerability (4) in a Checkmk GUI subcomponent called watolib, which generates a file named auth.php required for the NagVis integration. 

After this rough overview of the exploitation chain, let’s dive into the technical details of the first two code vulnerabilities:

Server-Side Request Forgery in agent-receiver

The Checkmk agent-receiver is a FastAPI web server, which is by default exposed on TCP port 8000. Most of the provided endpoints forward requests to the Checkmk REST API, which is part of the Checkmk GUI exposed on TCP port 80.

The endpoint called /register_with_hostname expects a POST request with credentials provided via HTTP Basic authentication as well as the two JSON-encoded parameters uuid and host_name in the body. The endpoint handler itself only verifies that any credentials are provided and that the two parameters are present.

In order to retrieve and validate the host configuration of the host identified by the host_name parameter, the function host_configuration is called:

checkmk/agent-receiver/agent-receiver/endpoints.py

@agent_receiver_app.post(
   "/register_with_hostname",
   status_code=HTTP_204_NO_CONTENT,
)
async def register_with_hostname(
   *,
   credentials: HTTPBasicCredentials = Depends(security),
   registration_body: RegistrationWithHNBody,
) -> Response:
   _validate_registration_request(
       host_configuration(
           credentials,
           registration_body.host_name,
       )
   )

The host_configuration function forwards the request to the Checkmk REST API by calling the function _forward_get. The user-provided parameter host_name is appended to the target URL without any sanitization or encoding:

checkmk/agent-receiver/agent-receiver/checkmk_rest_api.py

def host_configuration(
   credentials: HTTPBasicCredentials,
   host_name: str,
) -> HostConfiguration:
   if (
       response := _forward_get(
           f"objects/host_config_internal/{host_name}",
           credentials,
       )
   ).status_code == HTTPStatus.OK:

This lack of sanitization and encoding leads to a limited Server-Side Request Forgery (SSRF) vulnerability.

At first, the impact of this vulnerability does not seem to be very high because the SSRF is limited to a GET request to the hostname and port of the Checkmk GUI, and an attacker cannot even read the response. However, this gives an attacker the essential ability to exploit a second vulnerability. Let’s have a look at it.

Line Feed Injection in ajax_graph_images.py

The Checkmk GUI only provides a minimal number of unauthenticated endpoints. This greatly reduces the attack surface. One of the unauthenticated endpoints is called /ajax_graph_images.py, whose endpoint handler is implemented in the function ajax_graph_images_for_notifications. The purpose of this endpoint is to generate an image with performance data for a given host or service.

Although this endpoint can be accessed unauthenticated, access is restricted by only allowing requests, which originate from localhost (127.0.0.1 or ::1):

checkmk/cmk/gui/plugins/metrics/graph_images.py

def ajax_graph_images_for_notifications(
   resolve_combined_single_metric_spec: Callable[
       [CombinedGraphSpec], Sequence[CombinedGraphMetricSpec]
   ],
) -> None:
   """Registered as `noauth:ajax_graph_images`."""
   if request.remote_ip not in ["127.0.0.1", "::1"]:
       raise MKUnauthenticatedException(
           _("You are not allowed to access this page (%s).") % request.remote_ip
       )

   with SuperUserContext():
       _answer_graph_image_request(resolve_combined_single_metric_spec)

After verifying that the request originates from localhost, the function _answer_graph_image_request is called. This function validates that a host GET parameter is provided and then calls get_graph_data_from_livestatus:

checkmk/cmk/gui/plugins/metrics/graph_images.py

def _answer_graph_image_request(
   resolve_combined_single_metric_spec: Callable[
       [CombinedGraphSpec], Sequence[CombinedGraphMetricSpec]
   ],
) -> None:
   try:
       host_name = request.get_ascii_input_mandatory("host")
       if not host_name:
           raise MKGeneralException(_('Missing mandatory "host" parameter'))
       ...
       try:
           row = get_graph_data_from_livestatus(site, host_name, service_description)

The function get_graph_data_from_livestatus retrieves performance data for the given host via the Livestatus Query Language (LQL) interface. When inspecting all invoked functions within the call stack, the _ensure_connected function caught our attention:

checkmk/cmk/gui/sites.py

def _ensure_connected(user: Optional[LoggedInUser], force_authuser: Optional[UserId]) -> None:
   ...
   if force_authuser is None:
       request_force_authuser = request.get_str_input("force_authuser")
       force_authuser = UserId(request_force_authuser) if request_force_authuser else None
   ...
   _set_livestatus_auth(user, force_authuser)

Although this is an internal function part of the code responsible for querying the LQL interface, a GET parameter called force_authuser is accessed. Further inspecting the call stack reveals that this GET parameter is inserted into the AuthUser header of the LQL query without any sanitization:

The AuthUser header is used to restrict the response to data that the specified user is allowed to see. However, this is not essential for our considerations. The important aspect is that the above AuthUser string contains the value of the GET parameter force_authuser and this string is inserted into the final LQL query sent to the monitoring core. Since the GET parameter force_authuser is not sanitized, it is also possible to insert line feed characters (0x0a) into the LQL query.

Usually, an external attacker cannot reach the vulnerable endpoint /ajax_graph_images.py because it is restricted to localhost only. When combined with the SSRF vulnerability in the agent-receiver this assumption is not valid anymore. The SSRF can for example be used to trigger a request with the following GET parameter:

This request results in the following LQL query sent to the core:

By using a line feed character in the force_authuser parameter, additional headers can be injected into the LQL query:

The resulting LQL query contains the additional header:

The ability to inject a whole new query in order to use other tables or commands would increase the attack surface even more. An attacker could try to add two line feed characters and insert a new query after these:

However, the LQL interface terminates the connection by default if two subsequent line feed characters are read, which form the end of a single query. Thus the second query is not evaluated:

This behavior can be altered by leveraging the KeepAlive header. When this header is set to on, the connection will be kept alive. This way whole new LQL queries can be injected:

This results in three distinct LQL queries, which are processed separately.

Query 1:

Query 2:

Query 3:

The second query can be fully controlled by an attacker.

With this ability, an attacker has literally made it to the core of Checkmk. Within the next article of this series, we will explore the LQL interface as a new attack surface and see how some minor differences in a developer’s implementation can prevent or enable an attacker to bypass authentication mechanisms.

Patch

Checkmk patched the limited SSRF in the agent-receiver in version 2.1.0p12 (commit). According to our recommendations, the endpoint handler for /register_with_hostname now URL-encodes the host_name parameter before inserting it into the URL:

checkmk/agent-receiver/agent-receiver/checkmk_rest_api.py

from urllib.parse import quote
...

def _url_encode_hostname(host_name: str) -> str:
    ...
    return quote(host_name, safe="")  # '/' is not "safe" here
...

def host_configuration(...):
   ...
       response := _forward_get(
           f"objects/host_config_internal/{_url_encode_hostname(
host_name)}", ...)
   ...

This prevents an attacker from accessing other endpoints than the intended one when the request is forwarded to the Checkmk REST API.

The Line Feed Injection vulnerability was also patched with version 2.1.0p12 (commit) by validating the value provided for the AuthUser header:

checkmk/livestatus/api/python/livestatus.py

# Pattern for allowed UserId values
validate_user_id_regex = re.compile(r"^[\w_][-\w.@_]*$")
...
   # Set user to be used in certain authorization domain
   def set_auth_user(self, domain: str, user: UserId) -> None:
       # Prevent setting AuthUser to values that would be rejected later. See Werk 14384.
       # Empty value is allowed and used to delete from auth_users dict.
       if user and validate_user_id_regex.match(user) is None:
           raise ValueError("Invalid user ID")

Also, an additional check for injected line feed characters was introduced:

checkmk/livestatus/api/python/livestatus.py

   def build_query(self, query_obj: Query, add_headers: str) -> str:
       # Prevent injection of further livestatus commands inside AuthUser header.
       if "\n" in self.auth_header[:-1]:
           raise MKLivestatusQueryError("Refusing to build query with invalid AuthUser header.")

These patches effectively prevent an attacker from injecting line feed characters in the force_authuser parameter.

Timeline

DateAction
2022-08-22We report all issues to Checkmk.
2022-08-23Vendor confirms all issues.
2022-09-15Vendor releases patched version 2.1.0p12.

Summary

In this first article in a series of three, we briefly introduced the Checkmk architecture and outlined the vulnerabilities we identified including the serious impact of chaining these together. We also did a technical deep dive into the first two vulnerabilities, which enable an external attacker to send arbitrary LQL queries to the monitoring core.

The root cause of most vulnerabilities is the lack of sanitization of user-controlled data. This is also true for both of the vulnerabilities we looked at. The Line Feed Injection vulnerability is somehow hard to spot because the user-controlled data is accessed by a function deep down in the call stack and not directly in the endpoint handler. This is generally a bad pattern and should be prevented.

In the next article in this series, we will have a more detailed look at the LQL interface and derive the impact of an attacker’s ability to forge arbitrary queries. We will also look at Checkmk’s NagVis integration and how the aforementioned ability can be leveraged to bypass the authentication of NagVis due to some specific implementation details.

Finally, we would like to thank the Checkmk team very much for quickly responding to our report, handling each issue with absolute transparency, and providing a comprehensive patch for all reported vulnerabilities.

Related Blog Posts