Measuring hypervisor impact on Linux scheduling
CEME 1210 | Fri 07 Aug 3 p.m.–3:45 p.m.
Presented by
Roan Richmond is a Software Engineer working in Automotive for Codethink. He has worked on multiple large-scale automotive projects for different client OEMs, specialising in Automotive Linux for Software-Defined Vehicles (SDV). He focuses on both the integration of Linux into complex automotive stacks and how to measure trust in these large systems, which often include vendor-supplied binaries.
Roan has also contributed to open-source projects such as the Linux kernel, QEMU, CVA6 and the Trustable Software Framework.
At work, Roan is focused on how Linux can be configured to be safe enough to run ASIL-D rated processes, which were previously constrained to proprietary real-time operating systems, such as AUTOSAR.
Abstract
The Linux kernel is used in much of today’s critical infrastructure, often operating autonomously and running processes without direct human supervision. In many of the industries that benefit most from Linux, including Cloud, Finance and Automotive, it is common to see Linux used with a hypervisor. Using a hypervisor is commonly thought to allow for better security, reliability, and safety. However, the impacts of a hypervisor are often overlooked when it is presented as the only viable solution. As software complexity increases, decisions that have performance implications and deepen the software stack should not be taken lightly.
Hypervisor Coercion
In automotive, there is often no choice but to use a hypervisor; it could be mandated by hardware suppliers as a clause for using the hardware components safely. This can result in a situation where an integrator has to awkwardly stack the original system on top of an externally supplied hypervisor.
To make matters worse, the hypervisors that are forced upon automotive companies are often proprietary black boxes, supplied only as binaries for integrators to build on.
Naive Illusions
When migrating a bare-metal Linux-based system to a hypervisor, it is a fool's game to assume that things will just work the first time and that the “light-weight” hypervisor will have no impact on the system. Therefore, it is important to understand the impact a given hypervisor has on performance and the limitations this impact puts on the system.
Without a proper understanding of the hypervisor's effects on the system, both benefits and pitfalls, adding a hypervisor to the base of the software stack results in a less predictable system. Increased complexity, reduced performance, unexpected resource contention, and reduced hardware peripheral support are all realistic outcomes of introducing a proprietary hypervisor without proper consideration.
Statistical over Blind Trust
Hypervisor documentation often describes overhead in terms ranging from vague labels such as “light” to specific figures such as “adds 10% overhead to page allocations”. In general, these claims are provided without evidence or any means of reproducing them. Relying on these values is a blunder that regularly results in unforeseen issues when moving into production.
Measuring performance metrics on representative workloads is one method of building confidence and deriving overheads. These metrics can then be used to inform architectural decisions about the Linux system running on top of the hypervisor, such as avoiding specific physical CPUs which the hypervisor regularly pre-empts.
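As a rough illustration of this kind of measurement, the sketch below pins the calling process to one CPU and records how late each periodic wake-up arrives. It is a minimal sketch under stated assumptions, not the method used in the talk: the function names `measure_wakeup_overshoot_ns` and `summarise_ns` are hypothetical, the process runs under the default scheduling policy rather than a deadline policy, and inside a guest the CPU number refers to a virtual CPU whose mapping to a physical CPU is not visible from Python.

```python
import os
import statistics
import time


def measure_wakeup_overshoot_ns(cpu, period_ns=1_000_000, iterations=200):
    """Record how late (in ns) each periodic wake-up is while pinned to `cpu`.

    Hypothetical helper: pinning uses os.sched_setaffinity, which is only
    available on Linux, so pinning is skipped on other platforms.
    """
    can_pin = hasattr(os, "sched_setaffinity")
    previous = os.sched_getaffinity(0) if can_pin else None
    if can_pin:
        os.sched_setaffinity(0, {cpu})
    overshoots = []
    try:
        next_wake = time.monotonic_ns() + period_ns
        for _ in range(iterations):
            remaining = next_wake - time.monotonic_ns()
            if remaining > 0:
                time.sleep(remaining / 1e9)
            # Lateness relative to the intended wake-up time, never negative.
            overshoots.append(max(0, time.monotonic_ns() - next_wake))
            next_wake += period_ns
    finally:
        # Restore the original affinity so later measurements start clean.
        if can_pin:
            os.sched_setaffinity(0, previous)
    return overshoots


def summarise_ns(overshoots):
    """Summarise overshoot samples; p99 is a simple index-based approximation."""
    ordered = sorted(overshoots)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }
```

Running the measurement once per CPU in `os.sched_getaffinity(0)`, on both bare metal and on top of the hypervisor, gives per-CPU summaries that can be compared directly. A Python loop adds its own jitter, so a dedicated tool or a C implementation would be a more representative workload for real measurements.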
The Talk
This talk presents an investigation into the effects of a black-box hypervisor on deadline scheduling in the Linux kernel. From the millions of data points gathered, trends can be identified, driving further analysis and resulting in actionable architectural decisions backed by data. Specifically, the talk will show how analysis of scheduling data led to hypotheses about the hypervisor's core weighting, and how these hypotheses were translated into limitations placed on integrated applications.
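To make the step from data points to limitations concrete, the following sketch computes a deadline miss rate per CPU and flags CPUs above a chosen threshold. It is only an illustration under assumptions: the sample format `(cpu, completion_latency_us)`, the function names, the deadline and the threshold are all hypothetical, and any data passed in here would be synthetic rather than results from the talk.

```python
from collections import defaultdict


def deadline_miss_rates(samples, deadline_us):
    """Return {cpu: fraction of samples whose latency exceeded the deadline}.

    `samples` is an iterable of (cpu, completion_latency_us) pairs; this
    record layout is an assumption made for the sketch.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for cpu, latency_us in samples:
        totals[cpu] += 1
        if latency_us > deadline_us:
            misses[cpu] += 1
    return {cpu: misses[cpu] / totals[cpu] for cpu in totals}


def cpus_to_avoid(miss_rates, max_miss_rate):
    """List CPUs whose miss rate is above the acceptable threshold."""
    return sorted(cpu for cpu, rate in miss_rates.items() if rate > max_miss_rate)
```

A result such as "CPU 1 misses far more deadlines than its neighbours" is the kind of trend that could prompt a hypothesis about how the hypervisor weights cores, and a resulting rule that deadline-critical tasks are not placed on that CPU.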