diff options
Diffstat (limited to 'docs/system')
109 files changed, 11074 insertions, 0 deletions
diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst new file mode 100644 index 000000000..cec87e374 --- /dev/null +++ b/docs/system/arm/aspeed.rst @@ -0,0 +1,109 @@ +Aspeed family boards (``*-bmc``, ``ast2500-evb``, ``ast2600-evb``) +================================================================== + +The QEMU Aspeed machines model BMCs of various OpenPOWER systems and +Aspeed evaluation boards. They are based on different releases of the +Aspeed SoC: the AST2400 integrating an ARM926EJ-S CPU (400MHz), the +AST2500 with an ARM1176JZS CPU (800MHz) and, more recently, the AST2600 +with dual-core ARM Cortex-A7 CPUs (1.2GHz). + +The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, SPI, I2C, +etc. + +AST2400 SoC based machines: + +- ``palmetto-bmc`` OpenPOWER Palmetto POWER8 BMC +- ``quanta-q71l-bmc`` OpenBMC Quanta BMC + +AST2500 SoC based machines: + +- ``ast2500-evb`` Aspeed AST2500 Evaluation board +- ``romulus-bmc`` OpenPOWER Romulus POWER9 BMC +- ``witherspoon-bmc`` OpenPOWER Witherspoon POWER9 BMC +- ``sonorapass-bmc`` OCP SonoraPass BMC +- ``swift-bmc`` OpenPOWER Swift POWER9 BMC + +AST2600 SoC based machines: + +- ``ast2600-evb`` Aspeed AST2600 Evaluation board (Cortex-A7) +- ``tacoma-bmc`` OpenPOWER Witherspoon POWER9 AST2600 BMC + +Supported devices +----------------- + + * SMP (for the AST2600 Cortex-A7) + * Interrupt Controller (VIC) + * Timer Controller + * RTC Controller + * I2C Controller + * System Control Unit (SCU) + * SRAM mapping + * X-DMA Controller (basic interface) + * Static Memory Controller (SMC or FMC) - Only SPI Flash support + * SPI Memory Controller + * USB 2.0 Controller + * SD/MMC storage controllers + * SDRAM controller (dummy interface for basic settings and training) + * Watchdog Controller + * GPIO Controller (Master only) + * UART + * Ethernet controllers + * Front LEDs (PCA9552 on I2C bus) + * LPC Peripheral Controller (a subset of subdevices are supported) + * Hash/Crypto Engine (HACE) - Hash support only. TODO: HMAC and RSA + + +Missing devices +--------------- + + * Coprocessor support + * ADC (out of tree implementation) + * PWM and Fan Controller + * Slave GPIO Controller + * Super I/O Controller + * PCI-Express 1 Controller + * Graphic Display Controller + * PECI Controller + * MCTP Controller + * Mailbox Controller + * Virtual UART + * eSPI Controller + * I3C Controller + +Boot options +------------ + +The Aspeed machines can be started using the ``-kernel`` option to +load a Linux kernel, or from a firmware image. Images can be downloaded from +the OpenBMC jenkins: + + https://jenkins.openbmc.org/job/ci-openbmc/lastSuccessfulBuild/distro=ubuntu,label=docker-builder + +or directly from the OpenBMC GitHub release repository: + + https://github.com/openbmc/openbmc/releases + +The image should be attached as an MTD drive. Run: + +.. code-block:: bash + + $ qemu-system-arm -M romulus-bmc -nic user \ + -drive file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd -nographic + +Options specific to Aspeed machines are: + + * ``execute-in-place`` which emulates booting from the CE0 flash + device by using the FMC controller to load the instructions, and + not simply from RAM. This takes a little longer. + + * ``fmc-model`` to change the FMC Flash model. The firmware needs + support for the chip model to boot. + + * ``spi-model`` to change the SPI Flash model. + +For instance, to start the ``ast2500-evb`` machine with a different +FMC chip and a bigger (64M) SPI chip, use: + +.. code-block:: bash + + -M ast2500-evb,fmc-model=mx25l25635e,spi-model=mx66u51235f
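+ +This fragment slots into a full invocation together with the MTD drive option shown above, e.g. (a sketch; the flash image name is a placeholder): + +.. code-block:: bash + + $ qemu-system-arm -M ast2500-evb,fmc-model=mx25l25635e,spi-model=mx66u51235f \ + -drive file=flash-ast2500-evb.img,format=raw,if=mtd -nographic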
diff --git a/docs/system/arm/collie.rst b/docs/system/arm/collie.rst new file mode 100644 index 000000000..5cc67b6d1 --- /dev/null +++ b/docs/system/arm/collie.rst @@ -0,0 +1,16 @@ +Sharp Zaurus SL-5500 (``collie``) +================================= + +This machine is a model of the Sharp Zaurus SL-5500, which was +a 1990s PDA based on the StrongARM SA1110. + +Implemented devices: + + * NOR flash + * Interrupt controller + * Timer + * RTC + * GPIO + * Peripheral Pin Controller (PPC) + * UARTs + * Synchronous Serial Ports (SSP) diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst new file mode 100644 index 000000000..584eb1709 --- /dev/null +++ b/docs/system/arm/cpu-features.rst @@ -0,0 +1,393 @@ +Arm CPU Features +================ + +CPU features are optional features that a CPU of a supporting type may +choose to implement or not. In QEMU, optional CPU features have +corresponding boolean CPU properties that, when enabled, indicate +that the feature is implemented, and, conversely, when disabled, +indicate that it is not implemented. An example of an Arm CPU feature +is the Performance Monitoring Unit (PMU). CPU types such as the +Cortex-A15 and the Cortex-A57, which respectively implement Arm +architecture reference manuals ARMv7-A and ARMv8-A, may both optionally +implement PMUs. For example, if a user wants to use a Cortex-A15 without +a PMU, then the ``-cpu`` parameter should contain ``pmu=off`` on the QEMU +command line, i.e. ``-cpu cortex-a15,pmu=off``. + +As not all CPU types support all optional CPU features, whether or +not a CPU property exists depends on the CPU type. For example, CPUs +that implement the ARMv8-A architecture reference manual may optionally +support the AArch32 CPU feature, which may be enabled by disabling the +``aarch64`` CPU property. A CPU type such as the Cortex-A15, which does +not implement ARMv8-A, will not have the ``aarch64`` CPU property. + +QEMU's support may be limited for some CPU features, only partially +supporting the feature or only supporting the feature under certain +configurations. For example, the ``aarch64`` CPU feature, which, when +disabled, enables the optional AArch32 CPU feature, is only supported +when using the KVM accelerator and when running on a host CPU type that +supports the feature. While ``aarch64`` currently only works with KVM, +it could in principle also work with TCG. CPU features that are specific +to KVM are prefixed with "kvm-" and are described in "KVM VCPU Features". + +CPU Feature Probing +=================== + +Determining which CPU features are available and functional for a given +CPU type is possible with the ``query-cpu-model-expansion`` QMP command. +Below are some examples where ``scripts/qmp/qmp-shell`` (see the top comment +block in the script for usage) is used to issue the QMP commands.
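+ +One way to set this up (the socket path and machine type here are arbitrary examples) is to expose a QMP socket and point ``qmp-shell`` at it:: + + $ qemu-system-aarch64 -M virt -cpu max -nographic \ + -qmp unix:/tmp/qmp.sock,server=on,wait=off + $ scripts/qmp/qmp-shell /tmp/qmp.sock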
(1) Determine which CPU features are available for the ``max`` CPU type + (Note, we started QEMU with qemu-system-aarch64, so ``max`` is + implementing the ARMv8-A reference manual in this case):: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max"} + { "return": { + "model": { "name": "max", "props": { + "sve1664": true, "pmu": true, "sve1792": true, "sve1920": true, + "sve128": true, "aarch64": true, "sve1024": true, "sve": true, + "sve640": true, "sve768": true, "sve1408": true, "sve256": true, + "sve1152": true, "sve512": true, "sve384": true, "sve1536": true, + "sve896": true, "sve1280": true, "sve2048": true + }}}} + +We see that the ``max`` CPU type has the ``pmu``, ``aarch64``, ``sve``, and many +``sve<N>`` CPU features. We also see that all the CPU features are +enabled, as they are all ``true``. (The ``sve<N>`` CPU features are all +optional SVE vector lengths (see "SVE CPU Properties"). While with TCG +all SVE vector lengths can be supported, when KVM is in use it's more +likely that only a few lengths will be supported, if SVE is supported at +all.) + +(2) Let's try to disable the PMU:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false}} + { "return": { + "model": { "name": "max", "props": { + "sve1664": true, "pmu": false, "sve1792": true, "sve1920": true, + "sve128": true, "aarch64": true, "sve1024": true, "sve": true, + "sve640": true, "sve768": true, "sve1408": true, "sve256": true, + "sve1152": true, "sve512": true, "sve384": true, "sve1536": true, + "sve896": true, "sve1280": true, "sve2048": true + }}}} + +We see it worked, as ``pmu`` is now ``false``. + +(3) Let's try to disable ``aarch64``, which enables the AArch32 CPU feature:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"aarch64":false}} + {"error": { + "class": "GenericError", "desc": + "'aarch64' feature cannot be disabled unless KVM is enabled and 32-bit EL1 is supported" + }} + +It looks like this feature is limited to a configuration we do not +currently have. + +(4) Let's disable ``sve`` and see what happens to all the optional SVE + vector lengths:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"sve":false}} + { "return": { + "model": { "name": "max", "props": { + "sve1664": false, "pmu": true, "sve1792": false, "sve1920": false, + "sve128": false, "aarch64": true, "sve1024": false, "sve": false, + "sve640": false, "sve768": false, "sve1408": false, "sve256": false, + "sve1152": false, "sve512": false, "sve384": false, "sve1536": false, + "sve896": false, "sve1280": false, "sve2048": false + }}}} + +As expected they are now all ``false``. + +(5) Let's try probing CPU features for the Cortex-A15 CPU type:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"cortex-a15"} + {"return": {"model": {"name": "cortex-a15", "props": {"pmu": true}}}} + +Only the ``pmu`` CPU feature is available. + +A note about CPU feature dependencies +------------------------------------- + +It's possible for features to have dependencies on other features. That +is, it may be possible to change one feature at a time without error, but +when attempting to change all features at once an error could occur +depending on the order they are processed. It's also possible that changing +all at once doesn't generate an error, because a feature's dependencies +are satisfied with other features, but the same feature cannot be changed +independently without error. For these reasons callers should always +attempt to make their desired changes all at once in order to ensure the +collection is valid.
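+ +For example, a candidate set of changes can be validated together in a single expansion:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false,"sve":false}}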
A note about CPU models and KVM +------------------------------- + +Named CPU models generally do not work with KVM. There are a few cases +that do work, e.g. using the named CPU model ``cortex-a57`` with KVM on a +seattle host, but mostly if KVM is enabled the ``host`` CPU type must be +used. This means the guest is provided all the same CPU features as the +host CPU type has. And, for this reason, the ``host`` CPU type should +enable all CPU features that the host has by default. Indeed it's even +a bit strange to allow disabling CPU features that the host has when using +the ``host`` CPU type, but in the absence of CPU models it's the best we can +do if we want to launch guests without all the host's CPU features enabled. + +Enabling KVM also affects the ``query-cpu-model-expansion`` QMP command. The +effect is not only limited to specific features, as pointed out in example +(3) of "CPU Feature Probing", but also to which CPU types may be expanded. +When KVM is enabled, only the ``max``, ``host``, and current CPU type may be +expanded. This restriction is necessary as it's not possible to know all +CPU types that may work with KVM, but it does impose a small risk of users +experiencing unexpected errors. For example on a seattle, as mentioned +above, the ``cortex-a57`` CPU type is also valid when KVM is enabled. +Therefore a user could use the ``host`` CPU type for the current type, but +then attempt to query ``cortex-a57``, however that query will fail with our +restrictions. This shouldn't be an issue though as management layers and +users have been preferring the ``host`` CPU type for use with KVM for quite +some time. Additionally, if the KVM-enabled QEMU instance running on a +seattle host is using the ``cortex-a57`` CPU type, then querying ``cortex-a57`` +will work. + +Using CPU Features +================== + +After determining which CPU features are available and supported for a +given CPU type, they may be selectively enabled or disabled on the +QEMU command line with that CPU type:: + + $ qemu-system-aarch64 -M virt -cpu max,pmu=off,sve=on,sve128=on,sve256=on + +The example above disables the PMU and enables the first two SVE vector +lengths for the ``max`` CPU type. Note, the ``sve=on`` isn't actually +necessary, because, as we observed above with our probe of the ``max`` CPU +type, ``sve`` is already on by default. Also, based on our probe of +defaults, it would seem we need to disable many SVE vector lengths, rather +than only enabling the two we want. This isn't the case, because, as +disabling many SVE vector lengths would be quite verbose, the ``sve<N>`` CPU +properties have special semantics (see "SVE CPU Property Parsing +Semantics"). + +KVM VCPU Features +================= + +KVM VCPU features are CPU features that are specific to KVM, such as +paravirt features or features that enable CPU virtualization extensions. +The features' CPU properties are only available when KVM is enabled and +are named with the prefix "kvm-". KVM VCPU features may be probed, +enabled, and disabled in the same way as other CPU features. Below is +the list of KVM VCPU features and their descriptions. + + kvm-no-adjvtime By default kvm-no-adjvtime is disabled. This + means that by default the virtual time + adjustment is enabled (i.e. vtime *is* + adjusted).
+ + When virtual time adjustment is enabled each + time the VM transitions back to running state + the VCPU's virtual counter is updated to ensure + stopped time is not counted. This avoids time + jumps surprising guest OSes and applications, + as long as they use the virtual counter for + timekeeping. However it has the side effect of + the virtual and physical counters diverging. + All timekeeping based on the virtual counter + will appear to lag behind any timekeeping that + does not subtract VM stopped time. The guest + may resynchronize its virtual counter with + other time sources as needed. + + Enable kvm-no-adjvtime to disable virtual time + adjustment, also restoring the legacy (pre-5.0) + behavior. + + kvm-steal-time Since v5.2, kvm-steal-time is enabled by + default when KVM is enabled, the feature is + supported, and the guest is 64-bit. + + When kvm-steal-time is enabled a 64-bit guest + can account for time its CPUs were not running + due to the host not scheduling the corresponding + VCPU threads. The accounting statistics may + influence the guest scheduler behavior and/or be + exposed to the guest userspace. + +TCG VCPU Features +================= + +TCG VCPU features are CPU features that are specific to TCG. +Below is the list of TCG VCPU features and their descriptions. + + pauth Enable or disable ``FEAT_PAuth``, pointer + authentication. By default, the feature is + enabled with ``-cpu max``. + + pauth-impdef When ``FEAT_PAuth`` is enabled, either the + *impdef* (Implementation Defined) algorithm + is enabled or the *architected* QARMA algorithm + is enabled. By default the impdef algorithm + is disabled, and QARMA is enabled. + + The architected QARMA algorithm has good + cryptographic properties, but can be quite slow + to emulate. The impdef algorithm used by QEMU + is non-cryptographic but significantly faster. + +SVE CPU Properties +================== + +There are two types of SVE CPU properties: ``sve`` and ``sve<N>``. The first +is used to enable or disable the entire SVE feature, just as the ``pmu`` +CPU property completely enables or disables the PMU. The second type +is used to enable or disable specific vector lengths, where ``N`` is the +number of bits of the length. The ``sve<N>`` CPU properties have special +dependencies and constraints, see "SVE CPU Property Dependencies and +Constraints" below. Additionally, as we want all supported vector lengths +to be enabled by default, in order to avoid overly verbose command +lines (command lines full of ``sve<N>=off``, for all ``N`` not wanted), we +provide the parsing semantics listed in "SVE CPU Property Parsing +Semantics". + +SVE CPU Property Dependencies and Constraints +--------------------------------------------- + + 1) At least one vector length must be enabled when ``sve`` is enabled. + + 2) If a vector length ``N`` is enabled, then, when KVM is enabled, all + smaller, host supported vector lengths must also be enabled. If + KVM is not enabled, then only all the smaller, power-of-two vector + lengths must be enabled. E.g. with KVM if the host supports all + vector lengths up to 512-bits (128, 256, 384, 512), then if ``sve512`` + is enabled, the 128-bit vector length, 256-bit vector length, and + 384-bit vector length must also be enabled. Without KVM, the 384-bit + vector length would not be required. + + 3) If KVM is enabled then only vector lengths that the host CPU type + supports may be enabled. If SVE is not supported by the host, then + no ``sve*`` properties may be enabled.
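+ +When KVM is enabled, the vector lengths the host supports can be discovered by expanding the ``host`` CPU type, as described in "CPU Feature Probing", for example:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"host"}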
+ +SVE CPU Property Parsing Semantics +---------------------------------- + + 1) If SVE is disabled (``sve=off``), then which SVE vector lengths + are enabled or disabled is irrelevant to the guest, as the entire + SVE feature is disabled and that disables all vector lengths for + the guest. However QEMU will still track any ``sve<N>`` CPU + properties provided by the user. If later an ``sve=on`` is provided, + then the guest will get only the enabled lengths. If no ``sve=on`` + is provided and there are explicitly enabled vector lengths, then + an error is generated. + + 2) If SVE is enabled (``sve=on``), but no ``sve<N>`` CPU properties are + provided, then all supported vector lengths are enabled, which when + KVM is not in use means including the non-power-of-two lengths, and, + when KVM is in use, it means all vector lengths supported by the host + processor. + + 3) If SVE is enabled, then an error is generated when attempting to + disable the last enabled vector length (see constraint (1) of "SVE + CPU Property Dependencies and Constraints"). + + 4) If one or more vector lengths have been explicitly enabled and at + least one of the dependency lengths of the maximum enabled length + has been explicitly disabled, then an error is generated (see + constraint (2) of "SVE CPU Property Dependencies and Constraints"). + + 5) When KVM is enabled, if the host does not support SVE, then an error + is generated when attempting to enable any ``sve*`` properties (see + constraint (3) of "SVE CPU Property Dependencies and Constraints"). + + 6) When KVM is enabled, if the host does support SVE, then an error is + generated when attempting to enable any vector lengths not supported + by the host (see constraint (3) of "SVE CPU Property Dependencies and + Constraints"). + + 7) If one or more ``sve<N>`` CPU properties are set ``off``, but no ``sve<N>`` + CPU properties are set ``on``, then the specified vector lengths are + disabled but the default for any unspecified lengths remains enabled. + When KVM is not enabled, disabling a power-of-two vector length also + disables all vector lengths larger than the power-of-two length. + When KVM is enabled, then disabling any supported vector length also + disables all larger vector lengths (see constraint (2) of "SVE CPU + Property Dependencies and Constraints"). + + 8) If one or more ``sve<N>`` CPU properties are set to ``on``, then they + are enabled and all unspecified lengths default to disabled, except + for the required lengths per constraint (2) of "SVE CPU Property + Dependencies and Constraints", which will even be auto-enabled if + they were not explicitly enabled. + + 9) If SVE was disabled (``sve=off``), allowing all vector lengths to be + explicitly disabled (i.e. avoiding the error specified in (3) of + "SVE CPU Property Parsing Semantics"), then if later an ``sve=on`` is + provided an error will be generated. To avoid this error, one must + enable at least one vector length prior to enabling SVE.
+ +SVE CPU Property Examples +------------------------- + + 1) Disable SVE:: + + $ qemu-system-aarch64 -M virt -cpu max,sve=off + + 2) Implicitly enable all vector lengths for the ``max`` CPU type:: + + $ qemu-system-aarch64 -M virt -cpu max + + 3) When KVM is enabled, implicitly enable all host CPU supported vector + lengths with the ``host`` CPU type:: + + $ qemu-system-aarch64 -M virt,accel=kvm -cpu host + + 4) Only enable the 128-bit vector length:: + + $ qemu-system-aarch64 -M virt -cpu max,sve128=on + + 5) Disable the 512-bit vector length and all larger vector lengths, + since 512 is a power-of-two. This results in all the smaller, + uninitialized lengths (128, 256, and 384) defaulting to enabled:: + + $ qemu-system-aarch64 -M virt -cpu max,sve512=off + + 6) Enable the 128-bit, 256-bit, and 512-bit vector lengths:: + + $ qemu-system-aarch64 -M virt -cpu max,sve128=on,sve256=on,sve512=on + + 7) The same as (6), but since the 128-bit and 256-bit vector + lengths are required for the 512-bit vector length to be enabled, + then allow them to be auto-enabled:: + + $ qemu-system-aarch64 -M virt -cpu max,sve512=on + + 8) Do the same as (7), but by first disabling SVE and then re-enabling it:: + + $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve512=on,sve=on + + 9) Force errors regarding the last vector length:: + + $ qemu-system-aarch64 -M virt -cpu max,sve128=off + $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve128=off,sve=on + +SVE CPU Property Recommendations +-------------------------------- + +The examples in "SVE CPU Property Examples" exhibit many ways to select +vector lengths which developers may find useful in order to avoid overly +verbose command lines. However, the recommended way to select vector +lengths is to explicitly enable each desired length. Therefore only +examples (1), (4), and (6) exhibit recommended uses of the properties. + +SVE User-mode Default Vector Length Property +-------------------------------------------- + +For qemu-aarch64, the CPU property ``sve-default-vector-length=N`` is +defined to mirror the Linux kernel parameter file +``/proc/sys/abi/sve_default_vector_length``. The default length, ``N``, +is in units of bytes and must be between 16 and 8192. +If not specified, the default vector length is 64. + +If the default length is larger than the maximum vector length enabled, +the actual vector length will be reduced. Note that the maximum vector +length supported by QEMU is 256 bytes. + +If this property is set to ``-1`` then the default vector length +is set to the maximum possible length. diff --git a/docs/system/arm/cubieboard.rst b/docs/system/arm/cubieboard.rst new file mode 100644 index 000000000..344ff8cef --- /dev/null +++ b/docs/system/arm/cubieboard.rst @@ -0,0 +1,16 @@ +Cubietech Cubieboard (``cubieboard``) +===================================== + +The ``cubieboard`` model emulates the Cubietech Cubieboard, +which is a Cortex-A8 based single-board computer using +the Allwinner A10 SoC. + +Emulated devices: + +- Timer +- UART +- RTC +- EMAC +- SDHCI +- USB controller +- SATA controller diff --git a/docs/system/arm/digic.rst b/docs/system/arm/digic.rst new file mode 100644 index 000000000..2b3520ff5 --- /dev/null +++ b/docs/system/arm/digic.rst @@ -0,0 +1,11 @@ +Canon A1100 (``canon-a1100``) +============================= + +This machine is a model of the Canon PowerShot A1100 camera, which +uses the DIGIC SoC.
This model is based on reverse engineering efforts +by the contributors to the `CHDK <http://chdk.wikia.com/>`_ and +`Magic Lantern <http://www.magiclantern.fm/>`_ projects. + +The emulation is incomplete. In particular it can't be used +to run the original camera firmware, but it can successfully run +an experimental version of the `barebox bootloader <http://www.barebox.org/>`_. diff --git a/docs/system/arm/emcraft-sf2.rst b/docs/system/arm/emcraft-sf2.rst new file mode 100644 index 000000000..377e24872 --- /dev/null +++ b/docs/system/arm/emcraft-sf2.rst @@ -0,0 +1,15 @@ +Emcraft SmartFusion2 SOM kit (``emcraft-sf2``) +============================================== + +The ``emcraft-sf2`` board emulates the SmartFusion2 SOM kit from +Emcraft (M2S010). This is a System-on-Module from Emcraft Systems, +based on the SmartFusion2 SoC FPGA from Microsemi Corporation. +The SoC is based on a Cortex-M4 processor. + +Emulated devices: + +- System timer +- System registers +- SPI controller +- UART +- EMAC diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst new file mode 100644 index 000000000..144dc491d --- /dev/null +++ b/docs/system/arm/emulation.rst @@ -0,0 +1,103 @@ +A-profile CPU architecture support +================================== + +QEMU's TCG emulation includes support for the Armv5, Armv6, Armv7 and +Armv8 versions of the A-profile architecture. It also has support for +the following architecture extensions: + +- FEAT_AA32BF16 (AArch32 BFloat16 instructions) +- FEAT_AA32HPD (AArch32 hierarchical permission disables) +- FEAT_AA32I8MM (AArch32 Int8 matrix multiplication instructions) +- FEAT_AES (AESD and AESE instructions) +- FEAT_BF16 (AArch64 BFloat16 instructions) +- FEAT_BTI (Branch Target Identification) +- FEAT_DIT (Data Independent Timing instructions) +- FEAT_DPB (DC CVAP instruction) +- FEAT_DotProd (Advanced SIMD dot product instructions) +- FEAT_FCMA (Floating-point complex number instructions) +- FEAT_FHM (Floating-point half-precision multiplication instructions) +- FEAT_FP16 (Half-precision floating-point data processing) +- FEAT_FRINTTS (Floating-point to integer instructions) +- FEAT_FlagM (Flag manipulation instructions v2) +- FEAT_FlagM2 (Enhancements to flag manipulation instructions) +- FEAT_HPDS (Hierarchical permission disables) +- FEAT_I8MM (AArch64 Int8 matrix multiplication instructions) +- FEAT_JSCVT (JavaScript conversion instructions) +- FEAT_LOR (Limited ordering regions) +- FEAT_LRCPC (Load-acquire RCpc instructions) +- FEAT_LRCPC2 (Load-acquire RCpc instructions v2) +- FEAT_LSE (Large System Extensions) +- FEAT_MTE (Memory Tagging Extension) +- FEAT_MTE2 (Memory Tagging Extension) +- FEAT_MTE3 (MTE Asymmetric Fault Handling) +- FEAT_PAN (Privileged access never) +- FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN) +- FEAT_PAuth (Pointer authentication) +- FEAT_PMULL (PMULL, PMULL2 instructions) +- FEAT_PMUv3p1 (PMU Extensions v3.1) +- FEAT_PMUv3p4 (PMU Extensions v3.4) +- FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions) +- FEAT_RNG (Random number generator) +- FEAT_SB (Speculation Barrier) +- FEAT_SEL2 (Secure EL2) +- FEAT_SHA1 (SHA1 instructions) +- FEAT_SHA256 (SHA256 instructions) +- FEAT_SHA3 (Advanced SIMD SHA3 instructions) +- FEAT_SHA512 (Advanced SIMD SHA512 instructions) +- FEAT_SM3 (Advanced SIMD SM3 instructions) +- FEAT_SM4 (Advanced SIMD SM4 instructions) +- FEAT_SPECRES (Speculation restriction instructions) +- FEAT_SSBS (Speculative Store Bypass Safe) +- FEAT_TLBIOS (TLB invalidate instructions in Outer Shareable domain)
- FEAT_TLBIRANGE (TLB invalidate range instructions) +- FEAT_TTCNP (Translation table Common not private translations) +- FEAT_TTST (Small translation tables) +- FEAT_UAO (Unprivileged Access Override control) +- FEAT_VHE (Virtualization Host Extensions) +- FEAT_VMID16 (16-bit VMID) +- FEAT_XNX (Translation table stage 2 Unprivileged Execute-never) +- SVE (The Scalable Vector Extension) +- SVE2 (The Scalable Vector Extension v2) + +For information on the specifics of these extensions, please refer +to the `Armv8-A Arm Architecture Reference Manual +<https://developer.arm.com/documentation/ddi0487/latest>`_. + +When a specific named CPU is being emulated, only those features which +are present in hardware for that CPU are emulated. (If a feature is +not in the list above then it is not supported, even if the real +hardware should have it.) The ``max`` CPU enables all features. + +R-profile CPU architecture support +================================== + +QEMU's TCG emulation support for R-profile CPUs is currently limited. +We emulate only the Cortex-R5 and Cortex-R5F CPUs. + +M-profile CPU architecture support +================================== + +QEMU's TCG emulation includes support for Armv6-M, Armv7-M, Armv8-M, and +Armv8.1-M versions of the M-profile architecture. It also has support +for the following architecture extensions: + +- FP (Floating-point Extension) +- FPCXT (FPCXT access instructions) +- HP (Half-precision floating-point instructions) +- LOB (Low Overhead loops and Branch future) +- M (Main Extension) +- MPU (Memory Protection Unit Extension) +- PXN (Privileged Execute Never) +- RAS (Reliability, Availability and Serviceability): "minimum RAS Extension" only +- S (Security Extension) +- ST (System Timer Extension) + +For information on the specifics of these extensions, please refer +to the `Armv8-M Arm Architecture Reference Manual +<https://developer.arm.com/documentation/ddi0553/latest>`_. + +When a specific named CPU is being emulated, only those features which +are present in hardware for that CPU are emulated. (If a feature is +not in the list above then it is not supported, even if the real +hardware should have it.) There is no equivalent of the ``max`` CPU for +M-profile. diff --git a/docs/system/arm/gumstix.rst b/docs/system/arm/gumstix.rst new file mode 100644 index 000000000..cb373139d --- /dev/null +++ b/docs/system/arm/gumstix.rst @@ -0,0 +1,21 @@ +Gumstix Connex and Verdex (``connex``, ``verdex``) +================================================== + +These machines model the Gumstix Connex and Verdex boards. +The Connex has a PXA255 CPU and the Verdex has a PXA270. + +Implemented devices: + + * NOR flash + * SMC91C111 ethernet + * Interrupt controller + * DMA + * Timer + * GPIO + * MMC/SD card + * Fast infra-red communications port (FIR) + * LCD controller + * Synchronous serial ports (SPI) + * PCMCIA interface + * I2C + * I2S diff --git a/docs/system/arm/highbank.rst b/docs/system/arm/highbank.rst new file mode 100644 index 000000000..bb4965b36 --- /dev/null +++ b/docs/system/arm/highbank.rst @@ -0,0 +1,19 @@ +Calxeda Highbank and Midway (``highbank``, ``midway``) +====================================================== + +``highbank`` is a model of the Calxeda Highbank (ECX-1000) system, +which has four Cortex-A9 cores. + +``midway`` is a model of the Calxeda Midway (ECX-2000) system, +which has four Cortex-A15 cores.
+ +Emulated devices: + +- L2x0 cache controller +- SP804 dual timer +- PL011 UART +- PL061 GPIOs +- PL031 RTC +- PL022 synchronous serial port controller +- AHCI +- XGMAC ethernet controllers diff --git a/docs/system/arm/imx25-pdk.rst b/docs/system/arm/imx25-pdk.rst new file mode 100644 index 000000000..2a9711e8a --- /dev/null +++ b/docs/system/arm/imx25-pdk.rst @@ -0,0 +1,19 @@ +NXP i.MX25 PDK board (``imx25-pdk``) +==================================== + +The ``imx25-pdk`` board emulates the NXP i.MX25 Product Development Kit +board, which is based on an i.MX25 SoC using an ARM926 CPU. + +Emulated devices: + +- SD controller +- AVIC +- CCM +- GPT +- EPIT timers +- FEC +- RNGC +- I2C +- GPIO controllers +- Watchdog timer +- USB controllers diff --git a/docs/system/arm/integratorcp.rst b/docs/system/arm/integratorcp.rst new file mode 100644 index 000000000..594438008 --- /dev/null +++ b/docs/system/arm/integratorcp.rst @@ -0,0 +1,16 @@ +Arm Integrator/CP (``integratorcp``) +==================================== + +The Arm Integrator/CP board is emulated with the following devices: + +- ARM926E, ARM1026E, ARM946E, ARM1136 or Cortex-A8 CPU + +- Two PL011 UARTs + +- SMC 91c111 Ethernet adapter + +- PL110 LCD controller + +- PL050 KMI with PS/2 keyboard and mouse. + +- PL181 MultiMedia Card Interface with SD card. diff --git a/docs/system/arm/kzm.rst b/docs/system/arm/kzm.rst new file mode 100644 index 000000000..bb018fbdf --- /dev/null +++ b/docs/system/arm/kzm.rst @@ -0,0 +1,18 @@ +Kyoto Microcomputer KZM-ARM11-01 (``kzm``) +========================================== + +The ``kzm`` board emulates the Kyoto Microcomputer KZM-ARM11-01 +evaluation board, which is based on an NXP i.MX31 SoC +using an ARM1136 CPU. + +Emulated devices: + +- UARTs +- LAN9118 ethernet +- AVIC +- CCM +- GPT +- EPIT timers +- I2C +- GPIO controllers +- Watchdog timer diff --git a/docs/system/arm/mainstone.rst b/docs/system/arm/mainstone.rst new file mode 100644 index 000000000..05310f42c --- /dev/null +++ b/docs/system/arm/mainstone.rst @@ -0,0 +1,25 @@ +Intel Mainstone II board (``mainstone``) +======================================== + +The ``mainstone`` board emulates the Intel Mainstone II development +board, which uses a PXA270 CPU. + +Emulated devices: + +- Flash memory +- Keypad +- MMC controller +- 91C111 ethernet +- PIC +- Timer +- DMA +- GPIO +- FIR +- Serial +- LCD controller +- SSP +- USB controller +- RTC +- PCMCIA +- I2C +- I2S diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst new file mode 100644 index 000000000..8a75beb3a --- /dev/null +++ b/docs/system/arm/mps2.rst @@ -0,0 +1,57 @@ +Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an547``) +========================================================================================================================================================= + +These board models all use Arm M-profile CPUs. + +The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a +bigger FPGA but is otherwise the same as the 2; the 3 has a bigger +FPGA again, can handle 4GB of RAM, and has a USB controller and QSPI flash). + +Since the CPU itself and most of the devices are in the FPGA, the +details of the board as seen by the guest depend significantly on the +FPGA image.
+ +QEMU models the following FPGA images: + +``mps2-an385`` + Cortex-M3 as documented in Arm Application Note AN385 +``mps2-an386`` + Cortex-M4 as documented in Arm Application Note AN386 +``mps2-an500`` + Cortex-M7 as documented in Arm Application Note AN500 +``mps2-an505`` + Cortex-M33 as documented in Arm Application Note AN505 +``mps2-an511`` + Cortex-M3 'DesignStart' as documented in Arm Application Note AN511 +``mps2-an521`` + Dual Cortex-M33 as documented in Arm Application Note AN521 +``mps3-an524`` + Dual Cortex-M33 on an MPS3, as documented in Arm Application Note AN524 +``mps3-an547`` + Cortex-M55 on an MPS3, as documented in Arm Application Note AN547 + +Differences between QEMU and real hardware: + +- AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to + block RAM is unimplemented (QEMU always maps this to ZBT SSRAM1, as + if zbt_boot_ctrl is always zero) +- AN524 remapping of low memory to either BRAM or to QSPI flash is + unimplemented (QEMU always maps this to BRAM, ignoring the + SCC CFG_REG0 memory-remap bit) +- QEMU provides a LAN9118 ethernet rather than LAN9220; the only + guest-visible difference is that the LAN9118 doesn't support checksum + offloading +- QEMU does not model the QSPI flash in MPS3 boards as real QSPI + flash, but only as simple ROM, so attempting to rewrite the flash + from the guest will fail +- QEMU does not model the USB controller in MPS3 boards + +Machine-specific options +"""""""""""""""""""""""" + +The following machine-specific options are supported: + +remap + Supported for ``mps3-an524`` only. + Set ``BRAM``/``QSPI`` to select the initial memory mapping. The + default is ``BRAM``. diff --git a/docs/system/arm/musca.rst b/docs/system/arm/musca.rst new file mode 100644 index 000000000..81e3dc921 --- /dev/null +++ b/docs/system/arm/musca.rst @@ -0,0 +1,31 @@ +Arm Musca boards (``musca-a``, ``musca-b1``) +============================================ + +The Arm Musca development boards are a reference implementation +of a system using the SSE-200 Subsystem for Embedded. They are +dual Cortex-M33 systems. + +QEMU provides models of the A and B1 variants of this board. + +Unimplemented devices: + +- SPI +- |I2C| +- |I2S| +- PWM +- QSPI +- Timer +- SCC +- GPIO +- eFlash +- MHU +- PVT +- SDIO +- CryptoCell + +Note that (like the real hardware) the Musca-A machine is +asymmetric: CPU 0 does not have the FPU or DSP extensions, +but CPU 1 does. Also like the real hardware, the memory maps +for the A and B1 variants differ significantly, so guest +software must be built for the right variant. + diff --git a/docs/system/arm/musicpal.rst b/docs/system/arm/musicpal.rst new file mode 100644 index 000000000..9de380edf --- /dev/null +++ b/docs/system/arm/musicpal.rst @@ -0,0 +1,19 @@ +Freecom MusicPal (``musicpal``) +=============================== + +The Freecom MusicPal internet radio emulation includes the following +elements: + +- Marvell MV88W8618 Arm core. + +- 32 MB RAM, 256 KB SRAM, 8 MB flash.
+ +- Up to two 16550 UARTs + +- MV88W8xx8 Ethernet controller + +- MV88W8618 audio controller, WM8750 CODEC and mixer + +- 128x64 display with brightness control + +- 2 buttons, 2 navigation wheels with button function diff --git a/docs/system/arm/nrf.rst b/docs/system/arm/nrf.rst new file mode 100644 index 000000000..eda87bd76 --- /dev/null +++ b/docs/system/arm/nrf.rst @@ -0,0 +1,51 @@ +Nordic nRF boards (``microbit``) +================================ + +The `Nordic nRF`_ chips are a family of ARM-based SoCs that are +designed to be used for low-power and short-range wireless solutions. + +.. _Nordic nRF: https://www.nordicsemi.com/Products + +The nRF51 series is the first series for short-range wireless applications. +It is superseded by the nRF52 series. +The following machines are based on this chip: + +- ``microbit`` BBC micro:bit board with nRF51822 SoC + +There are other series such as nRF52, nRF53 and nRF91 which are currently not +supported by QEMU. + +Supported devices +----------------- + + * ARM Cortex-M0 (ARMv6-M) + * Serial ports (UART) + * Clock controller + * Timers + * Random Number Generator (RNG) + * GPIO controller + * NVMC + * SWI + +Missing devices +--------------- + + * Watchdog + * Real-Time Clock (RTC) controller + * TWI (I2C) + * SPI controller + * Analog to Digital Converter (ADC) + * Quadrature decoder + * Radio + +Boot options +------------ + +The Micro:bit machine can be started using the ``-device`` option to load a +firmware in `ihex format`_. Example: + +.. _ihex format: https://en.wikipedia.org/wiki/Intel_HEX + +.. code-block:: bash + + $ qemu-system-arm -M microbit -device loader,file=test.hex diff --git a/docs/system/arm/nseries.rst b/docs/system/arm/nseries.rst new file mode 100644 index 000000000..cd9edf5d8 --- /dev/null +++ b/docs/system/arm/nseries.rst @@ -0,0 +1,33 @@ +Nokia N800 and N810 tablets (``n800``, ``n810``) +================================================ + +Nokia N800 and N810 internet tablets (also known as RX-34 and RX-44 / +48) emulation supports the following elements: + +- Texas Instruments OMAP2420 System-on-chip (ARM1136 core) + +- RAM and non-volatile OneNAND Flash memories + +- Display connected to EPSON remote framebuffer chip and OMAP on-chip + display controller and a LS041y3 MIPI DBI-C controller + +- TI TSC2301 (in N800) and TI TSC2005 (in N810) touchscreen + controllers driven through SPI bus + +- National Semiconductor LM8323-controlled qwerty keyboard driven + through |I2C| bus + +- Secure Digital card connected to OMAP MMC/SD host + +- Three OMAP on-chip UARTs and on-chip STI debugging console + +- Mentor Graphics "Inventra" dual-role USB controller embedded in a + TI TUSB6010 chip - only USB host mode is supported + +- TI TMP105 temperature sensor driven through |I2C| bus + +- TI TWL92230C power management companion with an RTC on + |I2C| bus + +- Nokia RETU and TAHVO multi-purpose chips with an RTC, connected + through CBUS diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst new file mode 100644 index 000000000..adf497e67 --- /dev/null +++ b/docs/system/arm/nuvoton.rst @@ -0,0 +1,95 @@ +Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``) +================================================================ + +The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are +designed to be used as Baseboard Management Controllers (BMCs) in various +servers.
They all feature one or two ARM Cortex-A9 CPU cores, as well as an +assortment of peripherals targeted for either Enterprise or Data Center / +Hyperscale applications. The former is a superset of the latter, so NPCM750 has +all the peripherals of NPCM730 and more. + +.. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/ + +The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise +segment. The following machines are based on this chip: + +- ``npcm750-evb`` Nuvoton NPCM750 Evaluation board + +The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and +Hyperscale applications. The following machines are based on this chip: + +- ``quanta-gbs-bmc`` Quanta GBS server BMC +- ``quanta-gsj`` Quanta GSJ server BMC +- ``kudo-bmc`` Fii USA Kudo server BMC + +There are also two more SoCs, NPCM710 and NPCM705, which are single-core +variants of NPCM750 and NPCM730, respectively. These are currently not +supported by QEMU. + +Supported devices +----------------- + + * SMP (Dual Core Cortex-A9) + * Cortex-A9MPCore built-in peripherals: SCU, GIC, Global Timer, Private Timer + and Watchdog. + * SRAM, ROM and DRAM mappings + * System Global Control Registers (GCR) + * Clock and reset controller (CLK) + * Timer controller (TIM) + * Serial ports (16550-based) + * DDR4 memory controller (dummy interface indicating memory training is done) + * OTP controllers (no protection features) + * Flash Interface Unit (FIU; no protection features) + * Random Number Generator (RNG) + * USB host (USBH) + * GPIO controller + * Analog to Digital Converter (ADC) + * Pulse Width Modulation (PWM) + * SMBus controller (SMBF) + * Ethernet controller (EMC) + * Tachometer + +Missing devices +--------------- + + * LPC/eSPI host-to-BMC interface, including + + * Keyboard and mouse controller interface (KBCI) + * Keyboard Controller Style (KCS) channels + * BIOS POST code FIFO + * System Wake-up Control (SWC) + * Shared memory (SHM) + * eSPI slave interface + + * Ethernet controller (GMAC) + * USB device (USBD) + * Peripheral SPI controller (PSPI) + * SD/MMC host + * PECI interface + * PCI and PCIe root complex and bridges + * VDM and MCTP support + * Serial I/O expansion + * LPC/eSPI host + * Coprocessor + * Graphics + * Video capture + * Encoding compression engine + * Security features + +Boot options +------------ + +The Nuvoton machines can boot from an OpenBMC firmware image, or directly into +a kernel using the ``-kernel`` option. OpenBMC images for ``quanta-gsj`` and +possibly others can be downloaded from the OpenPOWER jenkins: + + https://openpower.xyz/ + +The firmware image should be attached as an MTD drive. Example: + +.. code-block:: bash + + $ qemu-system-arm -machine quanta-gsj -nographic \ + -drive file=image-bmc,if=mtd,bus=0,unit=0,format=raw + +The default root password for test images is usually ``0penBmc``. diff --git a/docs/system/arm/orangepi.rst b/docs/system/arm/orangepi.rst new file mode 100644 index 000000000..83c744519 --- /dev/null +++ b/docs/system/arm/orangepi.rst @@ -0,0 +1,263 @@ +Orange Pi PC (``orangepi-pc``) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The Xunlong Orange Pi PC is an Allwinner H3 System-on-Chip +based embedded computer with mainline support in both U-Boot +and Linux. The board comes with a quad-core Cortex-A7 @ 1.3GHz, +1GiB RAM, 100Mbit ethernet, USB, SD/MMC, HDMI and +various other I/O.
+ +Supported devices +""""""""""""""""" + +The Orange Pi PC machine supports the following devices: + + * SMP (Quad Core Cortex-A7) + * Generic Interrupt Controller configuration + * SRAM mappings + * SDRAM controller + * Real Time Clock + * Timer device (re-used from Allwinner A10) + * UART + * SD/MMC storage controller + * EMAC ethernet + * USB 2.0 interfaces + * Clock Control Unit + * System Control module + * Security Identifier device + +Limitations +""""""""""" + +Currently, Orange Pi PC does *not* support the following features: + +- Graphical output via HDMI, GPU and/or the Display Engine +- Audio output +- Hardware Watchdog + +Also see the 'unimplemented' array in the Allwinner H3 SoC module +for a complete list of unimplemented I/O devices: ``./hw/arm/allwinner-h3.c`` + +Boot options +"""""""""""" + +The Orange Pi PC machine can start using the standard -kernel functionality +for loading a Linux kernel or ELF executable. Additionally, the Orange Pi PC +machine can also emulate the BootROM which is present on an actual Allwinner H3 +based SoC, which loads the bootloader from an SD card, specified via the -sd argument +to qemu-system-arm. + +Machine-specific options +"""""""""""""""""""""""" + +The following machine-specific options are supported: + +- allwinner-rtc.base-year=YYYY + + The Allwinner RTC device is automatically created by the Orange Pi PC machine + and uses a default base year value which can be overridden using the 'base-year' property. + The base year is the actual represented year when the RTC year value is zero. + This option can be used in case the target operating system driver uses a different + base year value. The minimum value for the base year is 1900. + +- allwinner-sid.identifier=abcd1122-a000-b000-c000-12345678ffff + + The Security Identifier value can be read by the guest. + For example, U-Boot uses it to determine a unique MAC address. + +The above machine-specific options can be specified in qemu-system-arm +via the '-global' argument, for example: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -sd mycard.img \ + -global allwinner-rtc.base-year=2000 + +Running mainline Linux +"""""""""""""""""""""" + +Mainline Linux kernels from 4.19 up to latest master are known to work. +To build a Linux mainline kernel that can be booted by the Orange Pi PC machine, +simply configure the kernel using the sunxi_defconfig configuration: + +.. code-block:: bash + + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make mrproper + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make sunxi_defconfig + +To be able to use USB storage, you need to manually enable the corresponding +configuration item. Start the kconfig configuration tool: + +.. code-block:: bash + + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make menuconfig + +Navigate to the following item, enable it and save your configuration: + + Device Drivers > USB support > USB Mass Storage support + +Build the Linux kernel with: + +.. code-block:: bash + + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make + +To boot the newly built Linux kernel in QEMU with the Orange Pi PC machine, use: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -kernel /path/to/linux/arch/arm/boot/zImage \ + -append 'console=ttyS0,115200' \ + -dtb /path/to/linux/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dtb + +Orange Pi PC images +""""""""""""""""""" + +Note that the mainline kernel does not have a root filesystem.
You may provide it +with an official Orange Pi PC image from the official website: + + http://www.orangepi.org/downloadresources/ + +Another possibility is to run an Armbian image for Orange Pi PC which +can be downloaded from: + + https://www.armbian.com/orange-pi-pc/ + +Alternatively, you can also choose to build your own image with buildroot +using the orangepi_pc_defconfig. Also see https://buildroot.org for more information. + +When using an image as an SD card, it must be resized to a power of two. This can be +done with the ``qemu-img`` command. It is recommended to only increase the image size +instead of shrinking it to a power of two, to avoid loss of data. For example, +to prepare a downloaded Armbian image, first extract it and then increase +its size to one gigabyte as follows: + +.. code-block:: bash + + $ qemu-img resize Armbian_19.11.3_Orangepipc_bionic_current_5.3.9.img 1G + +You can choose to attach the selected image either as an SD card or as USB mass storage. +For example, to boot using the Orange Pi PC Debian image on SD card, simply add the -sd +argument and provide the proper root= kernel parameter: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -kernel /path/to/linux/arch/arm/boot/zImage \ + -append 'console=ttyS0,115200 root=/dev/mmcblk0p2' \ + -dtb /path/to/linux/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dtb \ + -sd OrangePi_pc_debian_stretch_server_linux5.3.5_v1.0.img + +To attach the image as a USB mass storage device to the machine, +simply append to the command: + +.. code-block:: bash + + -drive if=none,id=stick,file=myimage.img \ + -device usb-storage,bus=usb-bus.0,drive=stick + +Instead of providing a custom Linux kernel via the -kernel command you may also +choose to let the Orange Pi PC machine load the bootloader from SD card, just like +a real board would do using the BootROM. Simply pass the selected image via the -sd +argument and remove the -kernel, -append, -dtb and -initrd arguments: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -sd Armbian_19.11.3_Orangepipc_buster_current_5.3.9.img + +Note that both the official Orange Pi PC images and Armbian images start +a lot of userland programs via systemd. Depending on the host hardware and OS, +they may be slow to emulate, especially due to emulating the 4 cores. +To help reduce this slowdown, you can +give the following kernel parameters via U-Boot (or via -append): + +.. code-block:: bash + + => setenv extraargs 'systemd.default_timeout_start_sec=9000 loglevel=7 nosmp console=ttyS0,115200' + +Running U-Boot +"""""""""""""" + +U-Boot mainline can be built and configured using the orangepi_pc_defconfig +with similar commands as described above for Linux. Note that it is recommended +for development/testing to select the following configuration setting in U-Boot: + + Device Tree Control > Provider for DTB for DT Control > Embedded DTB + +To start U-Boot using the Orange Pi PC machine, provide the +u-boot binary to the -kernel argument: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -kernel /path/to/uboot/u-boot -sd disk.img + +Use the following U-Boot commands to load and boot a Linux kernel from SD card: + +.. code-block:: bash
+ + => setenv bootargs console=ttyS0,115200 + => ext2load mmc 0 0x42000000 zImage + => ext2load mmc 0 0x43000000 sun8i-h3-orangepi-pc.dtb + => bootz 0x42000000 - 0x43000000 + +Running NetBSD +"""""""""""""" + +The NetBSD operating system also includes support for Allwinner H3 based boards, +including the Orange Pi PC. NetBSD 9.0 is known to work best for the Orange Pi PC +board and provides a fully working system with serial console, networking and storage. +For the Orange Pi PC machine, get the 'evbarm-earmv7hf' based image from: + + https://cdn.netbsd.org/pub/NetBSD/NetBSD-9.0/evbarm-earmv7hf/binary/gzimg/armv7.img.gz + +The image requires U-Boot to be installed into it manually. Build U-Boot with +the orangepi_pc_defconfig configuration as described in the previous section. +Next, unzip the NetBSD image and write the U-Boot binary including SPL using: + +.. code-block:: bash + + $ gunzip armv7.img.gz + $ dd if=/path/to/u-boot-sunxi-with-spl.bin of=armv7.img bs=1024 seek=8 conv=notrunc + +Finally, before starting the machine, the SD image must be extended so +that its size is a power of two and the NetBSD kernel +will not conclude the NetBSD partition is larger than the emulated SD card: + +.. code-block:: bash + + $ qemu-img resize armv7.img 2G + +Start the machine using the following command: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -sd armv7.img -global allwinner-rtc.base-year=2000 + +At the U-Boot stage, interrupt the automatic boot process by pressing a key +and set the following environment variables before booting: + +.. code-block:: bash + + => setenv bootargs root=ld0a + => setenv kernel netbsd-GENERIC.ub + => setenv fdtfile dtb/sun8i-h3-orangepi-pc.dtb + => setenv bootcmd 'fatload mmc 0:1 ${kernel_addr_r} ${kernel}; fatload mmc 0:1 ${fdt_addr_r} ${fdtfile}; fdt addr ${fdt_addr_r}; bootm ${kernel_addr_r} - ${fdt_addr_r}' + +Optionally you may save the environment variables to SD card with 'saveenv'. +To continue booting simply give the 'boot' command and NetBSD boots. + +Orange Pi PC integration tests +"""""""""""""""""""""""""""""" + +The Orange Pi PC machine has several integration tests included. +To run the whole set of tests, build QEMU from source and simply +provide the following command: + +.. code-block:: bash
+ + $ AVOCADO_ALLOW_LARGE_STORAGE=yes avocado --show=app,console run \ + -t machine:orangepi-pc tests/avocado/boot_linux_console.py diff --git a/docs/system/arm/palm.rst b/docs/system/arm/palm.rst new file mode 100644 index 000000000..47ff9b36d --- /dev/null +++ b/docs/system/arm/palm.rst @@ -0,0 +1,23 @@ +Palm Tungsten|E PDA (``cheetah``) +================================= + +The Palm Tungsten|E PDA (codename "Cheetah") emulation includes the +following elements: + +- Texas Instruments OMAP310 System-on-chip (ARM925T core) + +- ROM and RAM memories (ROM firmware image can be loaded with + -option-rom) + +- On-chip LCD controller + +- On-chip Real Time Clock + +- TI TSC2102i touchscreen controller / analog-digital converter / + Audio CODEC, connected through MicroWire and |I2S| busses + +- GPIO-connected matrix keypad + +- Secure Digital card connected to OMAP MMC/SD host + +- Three on-chip UARTs diff --git a/docs/system/arm/raspi.rst b/docs/system/arm/raspi.rst new file mode 100644 index 000000000..922fe375a --- /dev/null +++ b/docs/system/arm/raspi.rst @@ -0,0 +1,43 @@ +Raspberry Pi boards (``raspi0``, ``raspi1ap``, ``raspi2b``, ``raspi3ap``, ``raspi3b``) +====================================================================================== + + +QEMU provides models of the following Raspberry Pi boards: + +``raspi0`` and ``raspi1ap`` + ARM1176JZF-S core, 512 MiB of RAM +``raspi2b`` + Cortex-A7 (4 cores), 1 GiB of RAM +``raspi3ap`` + Cortex-A53 (4 cores), 512 MiB of RAM +``raspi3b`` + Cortex-A53 (4 cores), 1 GiB of RAM + + +Implemented devices +------------------- + + * ARM1176JZF-S, Cortex-A7 or Cortex-A53 CPU + * Interrupt controller + * DMA controller + * Clock and reset controller (CPRMAN) + * System Timer + * GPIO controller + * Serial ports (BCM2835 AUX - 16550 based - and PL011) + * Random Number Generator (RNG) + * Frame Buffer + * USB host (USBH) + * SD/MMC host controller + * SoC thermal sensor + * USB2 host controller (DWC2 and MPHI) + * MailBox controller (MBOX) + * VideoCore firmware (property) + + +Missing devices +--------------- + + * Peripheral SPI controller (SPI) + * Analog to Digital Converter (ADC) + * Pulse Width Modulation (PWM) diff --git a/docs/system/arm/realview.rst b/docs/system/arm/realview.rst new file mode 100644 index 000000000..65f5be346 --- /dev/null +++ b/docs/system/arm/realview.rst @@ -0,0 +1,34 @@ +Arm Realview boards (``realview-eb``, ``realview-eb-mpcore``, ``realview-pb-a8``, ``realview-pbx-a9``) +====================================================================================================== + +Several variants of the Arm RealView baseboard are emulated, including +the EB, PB-A8 and PBX-A9. Due to interactions with the bootloader, only +certain Linux kernel configurations work out of the box on these boards. + +Kernels for the PB-A8 board should have CONFIG_REALVIEW_HIGH_PHYS_OFFSET +enabled in the kernel, and expect 512M RAM. Kernels for the PBX-A9 board +should have CONFIG_SPARSEMEM enabled, CONFIG_REALVIEW_HIGH_PHYS_OFFSET +disabled and expect 1024M RAM.
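+ +For example, a PB-A8 kernel configured as described above might be started as follows (a sketch; the kernel image path is a placeholder): + +.. code-block:: bash + + $ qemu-system-arm -M realview-pb-a8 -m 512M -nographic \ + -kernel zImage -append 'console=ttyAMA0'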
+ +The following devices are emulated: + +- ARM926E, ARM1136, ARM11MPCore, Cortex-A8 or Cortex-A9 MPCore CPU + +- Arm AMBA Generic/Distributed Interrupt Controller + +- Four PL011 UARTs + +- SMC 91c111 or SMSC LAN9118 Ethernet adapter + +- PL110 LCD controller + +- PL050 KMI with PS/2 keyboard and mouse + +- PCI host bridge + +- PCI OHCI USB controller + +- LSI53C895A PCI SCSI Host Bus Adapter with hard disk and CD-ROM + devices + +- PL181 MultiMedia Card Interface with SD card. diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst new file mode 100644 index 000000000..4ccb0560a --- /dev/null +++ b/docs/system/arm/sabrelite.rst @@ -0,0 +1,119 @@ +Boundary Devices SABRE Lite (``sabrelite``) +=========================================== + +The Boundary Devices SABRE Lite i.MX6 Development Board is a low-cost +development platform featuring Freescale / NXP Semiconductor's powerful +i.MX 6 Quad Applications Processor. + +Supported devices +----------------- + +The SABRE Lite machine supports the following devices: + + * Up to 4 Cortex-A9 cores + * Generic Interrupt Controller + * 1 Clock Controller Module + * 1 System Reset Controller + * 5 UARTs + * 2 EPIT timers + * 1 GPT timer + * 2 Watchdog timers + * 1 FEC Ethernet controller + * 3 I2C controllers + * 7 GPIO controllers + * 4 SDHC storage controllers + * 4 USB 2.0 host controllers + * 5 ECSPI controllers + * 1 SST 25VF016B flash + +Please note the above list is the complete superset of devices the QEMU +SABRE Lite machine can support. In a normal use case, a device tree blob +that represents a real-world SABRE Lite board exposes only a subset of +these devices to the guest software. + +Boot options +------------ + +The SABRE Lite machine can start using the standard -kernel functionality +for loading a Linux kernel, U-Boot bootloader or ELF executable. + +Running Linux kernel +-------------------- + +The Linux mainline v5.10 release has been tested at the time of writing. To build a Linux +mainline kernel that can be booted by the SABRE Lite machine, simply configure +the kernel using the imx_v6_v7_defconfig configuration: + +.. code-block:: bash + + $ export ARCH=arm + $ export CROSS_COMPILE=arm-linux-gnueabihf- + $ make imx_v6_v7_defconfig + $ make + +To boot the newly built Linux kernel in QEMU with the SABRE Lite machine, use: + +.. code-block:: bash + + $ qemu-system-arm -M sabrelite -smp 4 -m 1G \ + -display none -serial null -serial stdio \ + -kernel arch/arm/boot/zImage \ + -dtb arch/arm/boot/dts/imx6q-sabrelite.dtb \ + -initrd /path/to/rootfs.ext4 \ + -append "root=/dev/ram" + +Running U-Boot +-------------- + +The U-Boot mainline v2020.10 release has been tested at the time of writing. To build a +U-Boot mainline bootloader that can be booted by the SABRE Lite machine, use +the mx6qsabrelite_defconfig with similar commands as described above for Linux: + +.. code-block:: bash + + $ export CROSS_COMPILE=arm-linux-gnueabihf- + $ make mx6qsabrelite_defconfig + +Note that we need to adjust settings by: + +.. code-block:: bash + + $ make menuconfig + +then manually select the following configuration in U-Boot: + + Device Tree Control > Provider of DTB for DT Control > Embedded DTB + +To start U-Boot using the SABRE Lite machine, provide the u-boot binary to +the -kernel argument, along with an SD card image with rootfs: + +.. code-block:: bash + + $ qemu-system-arm -M sabrelite -smp 4 -m 1G \ + -display none -serial null -serial stdio \ + -kernel u-boot + +The following example shows booting the Linux kernel via DHCP, using the +rootfs on an SD card.
This requires some additional command line parameters
+for QEMU:
+
+.. code-block:: none
+
+    -nic user,tftp=/path/to/kernel/zImage \
+    -drive file=sdcard.img,id=rootfs -device sd-card,drive=rootfs
+
+The directory for the built-in TFTP server should also contain the device tree
+blob of the SABRE Lite board. The sample SD card image is populated with the
+root file system in a single partition. You may adjust the kernel "root="
+boot parameter accordingly.
+
+After U-Boot boots, type the following commands in the U-Boot command shell to
+boot the Linux kernel:
+
+.. code-block:: none
+
+  => setenv ethaddr 00:11:22:33:44:55
+  => setenv bootfile zImage
+  => dhcp
+  => tftpboot 14000000 imx6q-sabrelite.dtb
+  => setenv bootargs root=/dev/mmcblk3p1
+  => bootz 12000000 - 14000000
diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
new file mode 100644
index 000000000..b499d7e92
--- /dev/null
+++ b/docs/system/arm/sbsa.rst
@@ -0,0 +1,32 @@
+Arm Server Base System Architecture Reference board (``sbsa-ref``)
+==================================================================
+
+While the ``virt`` board is a generic board platform that doesn't match
+any real hardware, the ``sbsa-ref`` board is intended to look like real
+hardware. The `Server Base System Architecture
+<https://developer.arm.com/documentation/den0029/latest>`_ defines a
+minimum baseline of hardware support and, importantly, how the firmware
+reports that to any operating system. It is a static system that
+reports a very minimal DT to the firmware for non-discoverable
+information about components affected by the QEMU command line (i.e.
+CPUs and memory). As a result it must have a firmware specifically
+built to expect a certain hardware layout (as you would in a real
+machine).
+
+It is intended to be a machine for developing firmware and testing
+standards compliance with operating systems.
+
+Supported devices
+"""""""""""""""""
+
+The sbsa-ref board supports:
+
+  - A configurable number of AArch64 CPUs
+  - GIC version 3
+  - System bus AHCI controller
+  - System bus EHCI controller
+  - CD-ROM and hard disk on the AHCI bus
+  - E1000E ethernet card on the PCIe bus
+  - VGA display adaptor on the PCIe bus
+  - A generic SBSA watchdog device
+
diff --git a/docs/system/arm/stellaris.rst b/docs/system/arm/stellaris.rst
new file mode 100644
index 000000000..8af4ad79c
--- /dev/null
+++ b/docs/system/arm/stellaris.rst
@@ -0,0 +1,26 @@
+Stellaris boards (``lm3s6965evb``, ``lm3s811evb``)
+==================================================
+
+The Luminary Micro Stellaris LM3S811EVB emulation includes the following
+devices:
+
+- Cortex-M3 CPU core.
+
+- 64k Flash and 8k SRAM.
+
+- Timers, UARTs, ADC and |I2C| interface.
+
+- OSRAM Pictiva 96x16 OLED with SSD0303 controller on
+  |I2C| bus.
+
+The Luminary Micro Stellaris LM3S6965EVB emulation includes the
+following devices:
+
+- Cortex-M3 CPU core.
+
+- 256k Flash and 64k SRAM.
+
+- Timers, UARTs, ADC, |I2C| and SSI interfaces.
+
+- OSRAM Pictiva 128x64 OLED with SSD0323 controller connected via
+  SSI.
diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst
new file mode 100644
index 000000000..508b92cf8
--- /dev/null
+++ b/docs/system/arm/stm32.rst
@@ -0,0 +1,66 @@
+STMicroelectronics STM32 boards (``netduino2``, ``netduinoplus2``, ``stm32vldiscovery``)
+========================================================================================
+
+The `STM32`_ chips are a family of 32-bit ARM-based microcontrollers by
+STMicroelectronics.
+
+.. _STM32: https://www.st.com/en/microcontrollers-microprocessors/stm32-32-bit-arm-cortex-mcus.html
+
+The STM32F1 series is based on the ARM Cortex-M3 core. The following
+machines are based on this chip:
+
+- ``stm32vldiscovery`` STM32VLDISCOVERY board with STM32F100RBT6 microcontroller
+
+The STM32F2 series is based on the ARM Cortex-M3 core. The following
+machines are based on this chip:
+
+- ``netduino2`` Netduino 2 board with STM32F205RFT6 microcontroller
+
+The STM32F4 series is based on the ARM Cortex-M4F core. This series is
+pin-to-pin compatible with the STM32F2 series. The following machines are
+based on this chip:
+
+- ``netduinoplus2`` Netduino Plus 2 board with STM32F405RGT6 microcontroller
+
+There are many other STM32 series that are currently not supported by QEMU.
+
+Supported devices
+-----------------
+
+ * ARM Cortex-M3, Cortex-M4F
+ * Analog to Digital Converter (ADC)
+ * EXTI interrupt
+ * Serial ports (USART)
+ * SPI controller
+ * System configuration (SYSCFG)
+ * Timer controller (TIMER)
+
+Missing devices
+---------------
+
+ * Camera interface (DCMI)
+ * Controller Area Network (CAN)
+ * Cyclic Redundancy Check (CRC) calculation unit
+ * Digital to Analog Converter (DAC)
+ * DMA controller
+ * Ethernet controller
+ * Flash Interface Unit
+ * GPIO controller
+ * I2C controller
+ * Inter-Integrated Sound (I2S) controller
+ * Power supply configuration (PWR)
+ * Random Number Generator (RNG)
+ * Real-Time Clock (RTC) controller
+ * Reset and Clock Controller (RCC)
+ * Secure Digital Input/Output (SDIO) interface
+ * USB OTG
+ * Watchdog controller (IWDG, WWDG)
+
+Boot options
+------------
+
+The STM32 machines can be started using the ``-kernel`` option to load a
+firmware. Example:
+
+.. code-block:: bash
+
+  $ qemu-system-arm -M stm32vldiscovery -kernel firmware.bin
diff --git a/docs/system/arm/sx1.rst b/docs/system/arm/sx1.rst
new file mode 100644
index 000000000..8bce30d4b
--- /dev/null
+++ b/docs/system/arm/sx1.rst
@@ -0,0 +1,18 @@
+Siemens SX1 (``sx1``, ``sx1-v1``)
+=================================
+
+Basic emulation of the Siemens SX1, models v1 and v2 (default). The
+emulation includes the following elements:
+
+- Texas Instruments OMAP310 System-on-chip (ARM925T core)
+
+- ROM and RAM memories (ROM firmware image can be loaded with
+  -pflash); V1: one flash of 16MB and one flash of 8MB; V2: one
+  flash of 32MB
+
+- On-chip LCD controller
+
+- On-chip Real Time Clock
+
+- Secure Digital card connected to OMAP MMC/SD host
+
+- Three on-chip UARTs
diff --git a/docs/system/arm/versatile.rst b/docs/system/arm/versatile.rst
new file mode 100644
index 000000000..2ae792bac
--- /dev/null
+++ b/docs/system/arm/versatile.rst
@@ -0,0 +1,63 @@
+Arm Versatile boards (``versatileab``, ``versatilepb``)
+=======================================================
+
+The Arm Versatile baseboard is emulated with the following devices:
+
+- ARM926E, ARM1136 or Cortex-A8 CPU
+
+- PL190 Vectored Interrupt Controller
+
+- Four PL011 UARTs
+
+- SMC 91c111 Ethernet adapter
+
+- PL110 LCD controller
+
+- PL050 KMI with PS/2 keyboard and mouse.
+
+- PCI host bridge. Note the emulated PCI bridge only provides access
+  to PCI memory space. It does not provide access to PCI IO space. This
+  means some devices (eg. ne2k_pci NIC) are not usable, and others (eg.
+  rtl8139 NIC) are only usable when the guest drivers use the memory
+  mapped control registers.
+
+- PCI OHCI USB controller.
+
+- LSI53C895A PCI SCSI Host Bus Adapter with hard disk and CD-ROM
+  devices.
+
+- PL181 MultiMedia Card Interface with SD card.
+
+Booting a Linux kernel
+----------------------
+
+Building a current Linux kernel with ``versatile_defconfig`` should be
+enough to get something running. Nowadays an out-of-tree build is
+recommended (and also useful if you build a lot of different targets).
+In the following example $BLD points to the build directory and $SRC
+points to the root of the Linux source tree. You can drop $SRC if you
+are running from there.
+
+.. code-block:: bash
+
+  $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- versatile_defconfig
+  $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
+
+You will also want to enable some additional modules if you want to boot
+something from the SCSI interface::
+
+    CONFIG_PCI=y
+    CONFIG_PCI_VERSATILE=y
+    CONFIG_SCSI=y
+    CONFIG_SCSI_SYM53C8XX_2=y
+
+You can then boot with a command line like:
+
+.. code-block:: bash
+
+  $ qemu-system-arm -machine type=versatilepb \
+      -serial mon:stdio \
+      -drive if=scsi,driver=file,filename=debian-buster-armel-rootfs.ext4 \
+      -kernel zImage \
+      -dtb versatile-pb.dtb \
+      -append "console=ttyAMA0 ro root=/dev/sda"
diff --git a/docs/system/arm/vexpress.rst b/docs/system/arm/vexpress.rst
new file mode 100644
index 000000000..3e3839e92
--- /dev/null
+++ b/docs/system/arm/vexpress.rst
@@ -0,0 +1,88 @@
+Arm Versatile Express boards (``vexpress-a9``, ``vexpress-a15``)
+================================================================
+
+QEMU models two variants of the Arm Versatile Express development
+board family:
+
+- ``vexpress-a9`` models the combination of the Versatile Express
+  motherboard and the CoreTile Express A9x4 daughterboard
+- ``vexpress-a15`` models the combination of the Versatile Express
+  motherboard and the CoreTile Express A15x2 daughterboard
+
+Note that as this hardware does not have PCI, IDE or SCSI,
+the only available storage option is the emulated SD card.
+
+Implemented devices:
+
+- PL041 audio
+- PL181 SD controller
+- PL050 keyboard and mouse
+- PL011 UARTs
+- SP804 timers
+- I2C controller
+- PL031 RTC
+- PL111 LCD display controller
+- Flash memory
+- LAN9118 ethernet
+
+Unimplemented devices:
+
+- SP810 system control block
+- PCI-express
+- USB controller (Philips ISP1761)
+- Local DAP ROM
+- CoreSight interfaces
+- PL301 AXI interconnect
+- SCC
+- System counter
+- HDLCD controller (``vexpress-a15``)
+- SP805 watchdog
+- PL341 dynamic memory controller
+- DMA330 DMA controller
+- PL354 static memory controller
+- BP147 TrustZone Protection Controller
+- TrustZone Address Space Controller
+
+Other differences between the hardware and the QEMU model:
+
+- QEMU will default to creating one CPU unless you pass a different
+  ``-smp`` argument
+- QEMU allows the amount of RAM provided to be specified with the
+  ``-m`` argument
+- QEMU defaults to providing a CPU which does not provide either
+  TrustZone or the Virtualization Extensions: if you want these you
+  must enable them with ``-machine secure=on`` and ``-machine
+  virtualization=on``
+- QEMU provides 4 virtio-mmio virtio transports; these start at
+  address ``0x10013000`` for ``vexpress-a9`` and at ``0x1c130000`` for
+  ``vexpress-a15``, and have IRQs from 40 upwards. If a dtb is
+  provided on the command line then QEMU will edit it to include
+  suitable entries describing these transports for the guest.
+
+Booting a Linux kernel
+----------------------
+
+Building a current Linux kernel with ``multi_v7_defconfig`` should be
+enough to get something running. Nowadays an out-of-tree build is
+recommended (and also useful if you build a lot of different targets).
+In the following example $BLD points to the build directory and $SRC
+points to the root of the Linux source tree. You can drop $SRC if you
+are running from there.
+
+.. code-block:: bash
+
+  $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- multi_v7_defconfig
+  $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
+
+By default you will want to boot your rootfs off the SD card interface.
+Your rootfs will need to be padded to the right size. With a suitable
+DTB you could also add devices to the virtio-mmio bus.
+
+.. code-block:: bash
+
+  $ qemu-system-arm -cpu cortex-a15 -smp 4 -m 4096 \
+      -machine type=vexpress-a15 -serial mon:stdio \
+      -drive if=sd,driver=file,filename=armel-rootfs.ext4 \
+      -kernel zImage \
+      -dtb vexpress-v2p-ca15-tc1.dtb \
+      -append "console=ttyAMA0 root=/dev/mmcblk0 ro"
diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
new file mode 100644
index 000000000..850787495
--- /dev/null
+++ b/docs/system/arm/virt.rst
@@ -0,0 +1,168 @@
+'virt' generic virtual platform (``virt``)
+==========================================
+
+The ``virt`` board is a platform which does not correspond to any
+real hardware; it is designed for use in virtual machines.
+It is the recommended board type if you simply want to run
+a guest such as Linux and do not care about reproducing the
+idiosyncrasies and limitations of a particular bit of real-world
+hardware.
+
+This is a "versioned" board model, so as well as the ``virt`` machine
+type itself (which may have improvements, bugfixes and other minor
+changes between QEMU versions), versioned machine types are provided
+which are guaranteed to behave the same as in previous QEMU releases,
+so that VM migration will work between QEMU versions. For instance the
+``virt-5.0`` machine type will behave like the ``virt`` machine from
+the QEMU 5.0 release, and migration should work between ``virt-5.0``
+of the 5.0 release and ``virt-5.0`` of the 5.1 release. Migration
+is not guaranteed to work between different QEMU releases for
+the non-versioned ``virt`` machine type.
+
+Supported devices
+"""""""""""""""""
+
+The virt board supports:
+
+- PCI/PCIe devices
+- Flash memory
+- One PL011 UART
+- An RTC
+- The fw_cfg device that allows a guest to obtain data from QEMU
+- A PL061 GPIO controller
+- An optional SMMUv3 IOMMU
+- hotpluggable DIMMs
+- hotpluggable NVDIMMs
+- An MSI controller (GICv2M or ITS). GICv2M is selected by default along
+  with GICv2. ITS is selected by default with GICv3 (>= virt-2.7). Note
+  that ITS is not modeled in TCG mode.
+- 32 virtio-mmio transport devices
+- running guests using the KVM accelerator on aarch64 hardware
+- large amounts of RAM (at least 255GB, and more if using highmem)
+- many CPUs (up to 512 if using a GICv3 and highmem)
+- Secure-World-only devices if the CPU has TrustZone:
+
+  - A second PL011 UART
+  - A second PL061 GPIO controller, with GPIO lines for triggering
+    a system reset or system poweroff
+  - A secure flash memory
+  - 16MB of secure RAM
+
+Supported guest CPU types:
+
+- ``cortex-a7`` (32-bit)
+- ``cortex-a15`` (32-bit; the default)
+- ``cortex-a53`` (64-bit)
+- ``cortex-a57`` (64-bit)
+- ``cortex-a72`` (64-bit)
+- ``a64fx`` (64-bit)
+- ``host`` (with KVM only)
+- ``max`` (same as ``host`` for KVM; best possible emulation with TCG)
+
+Note that the default is ``cortex-a15``, so for an AArch64 guest you must
+specify a CPU type.
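+
+For example, a minimal AArch64 launch (a sketch, assuming a
+hypothetical guest kernel ``Image`` built for this platform) could
+look like:
+
+.. code-block:: bash
+
+  $ qemu-system-aarch64 -M virt -cpu cortex-a53 -m 1G \
+      -kernel Image -append "console=ttyAMA0" -nographic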
+
+Graphics output is available, but unlike the x86 PC machine types
+there is no default display device enabled: you should select one from
+the Display devices section of "-device help". The recommended option
+is ``virtio-gpu-pci``; this is the only one which will work correctly
+with KVM. You may also need to ensure your guest kernel is configured
+with support for this; see below.
+
+Machine-specific options
+""""""""""""""""""""""""
+
+The following machine-specific options are supported:
+
+secure
+  Set ``on``/``off`` to enable/disable emulating a guest CPU which implements the
+  Arm Security Extensions (TrustZone). The default is ``off``.
+
+virtualization
+  Set ``on``/``off`` to enable/disable emulating a guest CPU which implements the
+  Arm Virtualization Extensions. The default is ``off``.
+
+mte
+  Set ``on``/``off`` to enable/disable emulating a guest CPU which implements the
+  Arm Memory Tagging Extensions. The default is ``off``.
+
+highmem
+  Set ``on``/``off`` to enable/disable placing devices and RAM in physical
+  address space above 32 bits. The default is ``on`` for machine types
+  later than ``virt-2.12``.
+
+gic-version
+  Specify the version of the Generic Interrupt Controller (GIC) to provide.
+  Valid values are:
+
+  ``2``
+    GICv2
+  ``3``
+    GICv3
+  ``host``
+    Use the same GIC version the host provides, when using KVM
+  ``max``
+    Use the best GIC version possible (same as host when using KVM;
+    currently the same as ``2`` for TCG, but this may change in future)
+
+its
+  Set ``on``/``off`` to enable/disable ITS instantiation. The default is ``on``
+  for machine types later than ``virt-2.7``.
+
+iommu
+  Set the IOMMU type to create for the guest. Valid values are:
+
+  ``none``
+    Don't create an IOMMU (the default)
+  ``smmuv3``
+    Create an SMMUv3
+
+ras
+  Set ``on``/``off`` to enable/disable reporting host memory errors to a guest
+  using ACPI and guest external abort exceptions. The default is ``off``.
+
+Linux guest kernel configuration
+""""""""""""""""""""""""""""""""
+
+The 'defconfig' for Linux arm and arm64 kernels should include the
+right device drivers for virtio and the PCI controller; however some older
+kernel versions, especially for 32-bit Arm, did not have everything
+enabled by default. If you're not seeing PCI devices that you expect,
+then check that your guest config has::
+
+  CONFIG_PCI=y
+  CONFIG_VIRTIO_PCI=y
+  CONFIG_PCI_HOST_GENERIC=y
+
+If you want to use the ``virtio-gpu-pci`` graphics device you will also
+need::
+
+  CONFIG_DRM=y
+  CONFIG_DRM_VIRTIO_GPU=y
+
+Hardware configuration information for bare-metal programming
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+The ``virt`` board automatically generates a device tree blob ("dtb")
+which it passes to the guest. This provides information about the
+addresses, interrupt lines and other configuration of the various devices
+in the system. Guest code can rely on and hard-code the following
+addresses:
+
+- Flash memory starts at address 0x0000_0000
+
+- RAM starts at 0x4000_0000
+
+All other information about device locations may change between
+QEMU versions, so guest code must look in the DTB.
+ +QEMU supports two types of guest image boot for ``virt``, and +the way for the guest code to locate the dtb binary differs: + +- For guests using the Linux kernel boot protocol (this means any + non-ELF file passed to the QEMU ``-kernel`` option) the address + of the DTB is passed in a register (``r2`` for 32-bit guests, + or ``x0`` for 64-bit guests) + +- For guests booting as "bare-metal" (any other kind of boot), + the DTB is at the start of RAM (0x4000_0000) diff --git a/docs/system/arm/xlnx-versal-virt.rst b/docs/system/arm/xlnx-versal-virt.rst new file mode 100644 index 000000000..92ad10d2d --- /dev/null +++ b/docs/system/arm/xlnx-versal-virt.rst @@ -0,0 +1,226 @@ +Xilinx Versal Virt (``xlnx-versal-virt``) +========================================= + +Xilinx Versal is a family of heterogeneous multi-core SoCs +(System on Chip) that combine traditional hardened CPUs and I/O +peripherals in a Processing System (PS) with runtime programmable +FPGA logic (PL) and an Artificial Intelligence Engine (AIE). + +More details here: +https://www.xilinx.com/products/silicon-devices/acap/versal.html + +The family of Versal SoCs share a single architecture but come in +different parts with different speed grades, amounts of PL and +other differences. + +The Xilinx Versal Virt board in QEMU is a model of a virtual board +(does not exist in reality) with a virtual Versal SoC without I/O +limitations. Currently, we support the following cores and devices: + +Implemented CPU cores: + +- 2 ACPUs (ARM Cortex-A72) + +Implemented devices: + +- Interrupt controller (ARM GICv3) +- 2 UARTs (ARM PL011) +- An RTC (Versal built-in) +- 2 GEMs (Cadence MACB Ethernet MACs) +- 8 ADMA (Xilinx zDMA) channels +- 2 SD Controllers +- OCM (256KB of On Chip Memory) +- XRAM (4MB of on chip Accelerator RAM) +- DDR memory +- BBRAM (36 bytes of Battery-backed RAM) +- eFUSE (3072 bytes of one-time field-programmable bit array) + +QEMU does not yet model any other devices, including the PL and the AI Engine. + +Other differences between the hardware and the QEMU model: + +- QEMU allows the amount of DDR memory provided to be specified with the + ``-m`` argument. If a DTB is provided on the command line then QEMU will + edit it to include suitable entries describing the Versal DDR memory ranges. + +- QEMU provides 8 virtio-mmio virtio transports; these start at + address ``0xa0000000`` and have IRQs from 111 and upwards. + +Running +""""""" +If the user provides an Operating System to be loaded, we expect users +to use the ``-kernel`` command line option. + +Users can load firmware or boot-loaders with the ``-device loader`` options. + +When loading an OS, QEMU generates a DTB and selects an appropriate address +where it gets loaded. This DTB will be passed to the kernel in register x0. + +If there's no ``-kernel`` option, we generate a DTB and place it at 0x1000 +for boot-loaders or firmware to pick it up. + +If users want to provide their own DTB, they can use the ``-dtb`` option. +These DTBs will have their memory nodes modified to match QEMU's +selected ram_size option before they get passed to the kernel or FW. + +When loading an OS, we turn on QEMU's PSCI implementation with SMC +as the PSCI conduit. When there's no ``-kernel`` option, we assume the user +provides EL3 firmware to handle PSCI. + +A few examples: + +Direct Linux boot of a generic ARM64 upstream Linux kernel: + +.. 
code-block:: bash
+
+  $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \
+      -serial mon:stdio -display none \
+      -kernel arch/arm64/boot/Image \
+      -nic user -nic user \
+      -device virtio-rng-device,bus=virtio-mmio-bus.0 \
+      -drive if=none,index=0,file=hd0.qcow2,id=hd0,snapshot \
+      -drive file=qemu_sd.qcow2,if=sd,index=0,snapshot \
+      -device virtio-blk-device,drive=hd0 -append root=/dev/vda
+
+Direct Linux boot of PetaLinux 2019.2:
+
+.. code-block:: bash
+
+  $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \
+      -serial mon:stdio -display none \
+      -kernel petalinux-v2019.2/Image \
+      -append "rdinit=/sbin/init console=ttyAMA0,115200n8 earlycon=pl011,mmio,0xFF000000,115200n8" \
+      -net nic,model=cadence_gem,netdev=net0 -netdev user,id=net0 \
+      -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \
+      -object rng-random,filename=/dev/urandom,id=rng0
+
+Boot PetaLinux 2019.2 via ARM Trusted Firmware (2018.3 because the 2019.2
+version of ATF tries to configure the CCI which we don't model) and U-Boot:
+
+.. code-block:: bash
+
+  $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \
+      -serial stdio -display none \
+      -device loader,file=petalinux-v2018.3/bl31.elf,cpu-num=0 \
+      -device loader,file=petalinux-v2019.2/u-boot.elf \
+      -device loader,addr=0x20000000,file=petalinux-v2019.2/Image \
+      -nic user -nic user \
+      -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \
+      -object rng-random,filename=/dev/urandom,id=rng0
+
+Run the following at the U-Boot prompt:
+
+.. code-block:: bash
+
+  Versal>
+  fdt addr $fdtcontroladdr
+  fdt move $fdtcontroladdr 0x40000000
+  fdt set /timer clock-frequency <0x3dfd240>
+  setenv bootargs "rdinit=/sbin/init maxcpus=1 console=ttyAMA0,115200n8 earlycon=pl011,mmio,0xFF000000,115200n8"
+  booti 20000000 - 40000000
+
+Boot Linux as Dom0 on Xen via U-Boot:
+
+.. code-block:: bash
+
+  $ qemu-system-aarch64 -M xlnx-versal-virt -m 4G \
+      -serial stdio -display none \
+      -device loader,file=petalinux-v2019.2/u-boot.elf,cpu-num=0 \
+      -device loader,addr=0x30000000,file=linux/2018-04-24/xen \
+      -device loader,addr=0x40000000,file=petalinux-v2019.2/Image \
+      -nic user -nic user \
+      -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \
+      -object rng-random,filename=/dev/urandom,id=rng0
+
+Run the following at the U-Boot prompt:
+
+.. code-block:: bash
+
+  Versal>
+  fdt addr $fdtcontroladdr
+  fdt move $fdtcontroladdr 0x20000000
+  fdt set /timer clock-frequency <0x3dfd240>
+  fdt set /chosen xen,xen-bootargs "console=dtuart dtuart=/uart@ff000000 dom0_mem=640M bootscrub=0 maxcpus=1 timer_slop=0"
+  fdt set /chosen xen,dom0-bootargs "rdinit=/sbin/init clk_ignore_unused console=hvc0 maxcpus=1"
+  fdt mknode /chosen dom0
+  fdt set /chosen/dom0 compatible "xen,multiboot-module"
+  fdt set /chosen/dom0 reg <0x00000000 0x40000000 0x0 0x03100000>
+  booti 30000000 - 20000000
+
+Boot Linux as Dom0 on Xen via ARM Trusted Firmware and U-Boot:
+
+.. code-block:: bash
+
+  $ qemu-system-aarch64 -M xlnx-versal-virt -m 4G \
+      -serial stdio -display none \
+      -device loader,file=petalinux-v2018.3/bl31.elf,cpu-num=0 \
+      -device loader,file=petalinux-v2019.2/u-boot.elf \
+      -device loader,addr=0x30000000,file=linux/2018-04-24/xen \
+      -device loader,addr=0x40000000,file=petalinux-v2019.2/Image \
+      -nic user -nic user \
+      -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \
+      -object rng-random,filename=/dev/urandom,id=rng0
+
+Run the following at the U-Boot prompt:
+
+.. code-block:: bash
+
+  Versal>
+  fdt addr $fdtcontroladdr
+  fdt move $fdtcontroladdr 0x20000000
+  fdt set /timer clock-frequency <0x3dfd240>
+  fdt set /chosen xen,xen-bootargs "console=dtuart dtuart=/uart@ff000000 dom0_mem=640M bootscrub=0 maxcpus=1 timer_slop=0"
+  fdt set /chosen xen,dom0-bootargs "rdinit=/sbin/init clk_ignore_unused console=hvc0 maxcpus=1"
+  fdt mknode /chosen dom0
+  fdt set /chosen/dom0 compatible "xen,multiboot-module"
+  fdt set /chosen/dom0 reg <0x00000000 0x40000000 0x0 0x03100000>
+  booti 30000000 - 20000000
+
+BBRAM File Backend
+""""""""""""""""""
+BBRAM can have an optional file backend, which must be a seekable
+binary file with a size of 36 bytes or larger. A file with all
+binary 0s is a 'blank'.
+
+To add a file backend for the BBRAM:
+
+.. code-block:: bash
+
+  -drive if=pflash,index=0,file=versal-bbram.bin,format=raw
+
+To use a different index value, N, from default of 0, add:
+
+.. code-block:: bash
+
+  -global xlnx,bbram-ctrl.drive-index=N
+
+eFUSE File Backend
+""""""""""""""""""
+eFUSE can have an optional file backend, which must be a seekable
+binary file with a size of 3072 bytes or larger. A file with all
+binary 0s is a 'blank'.
+
+To add a file backend for the eFUSE:
+
+.. code-block:: bash
+
+  -drive if=pflash,index=1,file=versal-efuse.bin,format=raw
+
+To use a different index value, N, from default of 1, add:
+
+.. code-block:: bash
+
+  -global xlnx,efuse.drive-index=N
+
+.. warning::
+   In an actual physical Versal device, BBRAM and eFUSE contain
+   sensitive data. The QEMU device models do **not** encrypt or
+   obfuscate any data when holding them in models' memory or when
+   writing them to their file backends.
+
+   Thus, a file backend should be used with caution, and 'format=luks'
+   is highly recommended (albeit with usage complexity).
+
+   Better yet, do not use actual product data when running a guest image
+   on this Xilinx Versal Virt board.
diff --git a/docs/system/arm/xscale.rst b/docs/system/arm/xscale.rst
new file mode 100644
index 000000000..d2d5949e1
--- /dev/null
+++ b/docs/system/arm/xscale.rst
@@ -0,0 +1,35 @@
+Sharp XScale-based PDA models (``akita``, ``borzoi``, ``spitz``, ``terrier``, ``tosa``)
+=======================================================================================
+
+The Sharp Zaurus is a series of XScale-based PDAs able to run Linux
+(the 'SL series').
+
+The SL-6000 (\"Tosa\"), released in 2005, uses a PXA255 System-on-chip.
+
+The SL-C3000 (\"Spitz\"), SL-C1000 (\"Akita\"), SL-C3100 (\"Borzoi\") and
+SL-C3200 (\"Terrier\") use a PXA270.
+
+The clamshell PDA models emulation includes the following peripherals:
+
+- Intel PXA255/PXA270 System-on-chip (ARMv5TE core)
+
+- NAND Flash memory - not in \"Tosa\"
+
+- IBM/Hitachi DSCM microdrive in a PXA PCMCIA slot - not in \"Akita\"
+
+- On-chip OHCI USB controller - not in \"Tosa\"
+
+- On-chip LCD controller
+
+- On-chip Real Time Clock
+
+- TI ADS7846 touchscreen controller on SSP bus
+
+- Maxim MAX1111 analog-digital converter on |I2C| bus
+
+- GPIO-connected keyboard controller and LEDs
+
+- Secure Digital card connected to PXA MMC/SD host
+
+- Three on-chip UARTs
+
+- WM8750 audio CODEC on |I2C| and |I2S| busses
diff --git a/docs/system/authz.rst b/docs/system/authz.rst
new file mode 100644
index 000000000..55b7315e4
--- /dev/null
+++ b/docs/system/authz.rst
@@ -0,0 +1,257 @@
+.. _client authorization:
+
+Client authorization
+--------------------
+
+When configuring a QEMU network backend with either TLS certificates or SASL
+authentication, access will be granted if the client successfully proves
+their identity. If the authorization identity database is scoped to the QEMU
+client this may be sufficient. It is common, however, for the identity database
+to be much broader and thus authentication alone does not enable sufficient
+access control. In this case QEMU provides a flexible system for enforcing
+finer grained authorization on clients post-authentication.
+
+Identity providers
+~~~~~~~~~~~~~~~~~~
+
+At the time of writing there are two authentication frameworks used by QEMU
+that emit an identity upon completion.
+
+ * TLS x509 certificate distinguished name.
+
+   When configuring the QEMU backend as a network server with TLS, there
+   is a choice of credentials to use. The most common scenario is to utilize
+   x509 certificates. The simplest configuration only involves issuing
+   certificates to the servers, allowing the client to avoid a MITM attack
+   against their intended server.
+
+   It is possible, however, to enable mutual verification by requiring that
+   the client provide a certificate to the server to prove its own identity.
+   This is done by setting the property ``verify-peer=yes`` on the
+   ``tls-creds-x509`` object, which is in fact the default.
+
+   When peer verification is enabled, the client will need to be issued a
+   certificate by the same certificate authority as the server. If this
+   still does not provide sufficiently strong access control, the
+   Distinguished Name of the certificate can be used as an identity in the
+   QEMU authorization framework.
+
+ * SASL username.
+
+   When configuring the QEMU backend as a network server with SASL, upon
+   completion of the SASL authentication mechanism, a username will be
+   provided. The format of this username will vary depending on the choice
+   of mechanism configured for SASL. It might be a simple UNIX style user
+   ``joebloggs``, while if using Kerberos/GSSAPI it can have a realm
+   attached ``joebloggs@QEMU.ORG``. Whatever format the username is presented
+   in, it can be used with the QEMU authorization framework.
+
+Authorization drivers
+~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU authorization framework is a general purpose design with a choice of
+user-customizable drivers. These are provided as objects that can be
+created at startup using the ``-object`` argument, or at runtime using the
+``object_add`` monitor command.
+
+Simple
+^^^^^^
+
+This authorization driver provides a simple mechanism for granting access
+based on an exact match against a single identity. This is useful when it is
+known that only a single client is to be allowed access.
+
+A possible use case would be when configuring QEMU for an incoming live
+migration. It is known exactly which source QEMU the migration is expected
+to arrive from. The x509 certificate associated with this source QEMU would
+thus be used as the identity to match against. Alternatively if the virtual
+machine is dedicated to a specific tenant, then the VNC server would be
+configured with SASL and the username of only that tenant listed.
+
+To create an instance of this driver via QMP:
+
+::
+
+   {
+     "execute": "object-add",
+     "arguments": {
+       "qom-type": "authz-simple",
+       "id": "authz0",
+       "identity": "fred"
+     }
+   }
+
+
+Or via the command line
+
+::
+
+   -object authz-simple,id=authz0,identity=fred
+
+
+List
+^^^^
+
+In some network backends it will be desirable to grant access to a range of
+clients. This authorization driver provides a list mechanism for granting
+access by matching identities against a list of permitted ones. Each match
+rule has an associated policy and a catch-all policy applies if no rule
+matches. The match can either be done as an exact string comparison, or can
+use the shell-like glob syntax, which allows for use of wildcards.
+
+To create an instance of this class via QMP:
+
+::
+
+   {
+     "execute": "object-add",
+     "arguments": {
+       "qom-type": "authz-list",
+       "id": "authz0",
+       "rules": [
+          { "match": "fred", "policy": "allow", "format": "exact" },
+          { "match": "bob", "policy": "allow", "format": "exact" },
+          { "match": "danb", "policy": "deny", "format": "exact" },
+          { "match": "dan*", "policy": "allow", "format": "glob" }
+       ],
+       "policy": "deny"
+     }
+   }
+
+
+Due to the way this driver requires setting nested properties, creating
+it on the command line will require use of the JSON syntax for ``-object``.
+In most cases, however, the next driver will be more suitable.
+
+List file
+^^^^^^^^^
+
+This is a variant on the previous driver that allows for a more dynamic
+access control policy by storing the match rules in a standalone file
+that can be reloaded automatically upon change.
+
+To create an instance of this class via QMP:
+
+::
+
+   {
+     "execute": "object-add",
+     "arguments": {
+       "qom-type": "authz-list-file",
+       "id": "authz0",
+       "filename": "/etc/qemu/myvm-vnc.acl",
+       "refresh": true
+     }
+   }
+
+
+If ``refresh`` is enabled, inotify is used to monitor for changes
+to the file and auto-reload the rules.
+
+The ``myvm-vnc.acl`` file should contain the match rules in a format that
+closely matches the previous driver:
+
+::
+
+   {
+     "rules": [
+        { "match": "fred", "policy": "allow", "format": "exact" },
+        { "match": "bob", "policy": "allow", "format": "exact" },
+        { "match": "danb", "policy": "deny", "format": "exact" },
+        { "match": "dan*", "policy": "allow", "format": "glob" }
+     ],
+     "policy": "deny"
+   }
+
+
+The object can be created on the command line using
+
+::
+
+   -object authz-list-file,id=authz0,\
+           filename=/etc/qemu/myvm-vnc.acl,refresh=on
+
+
+PAM
+^^^
+
+In some scenarios it might be desirable to integrate with authorization
+mechanisms that are implemented outside of QEMU. In order to allow maximum
+flexibility, QEMU provides a driver that uses the ``PAM`` framework.
+
+To create an instance of this class via QMP:
+
+::
+
+   {
+     "execute": "object-add",
+     "arguments": {
+       "qom-type": "authz-pam",
+       "id": "authz0",
+       "parameters": {
+         "service": "qemu-vnc-tls"
+       }
+     }
+   }
+
+
+The driver only uses the PAM "account" verification
+subsystem. The above config would require a config
+file ``/etc/pam.d/qemu-vnc-tls``. For a simple file
+lookup it would contain
+
+::
+
+   account requisite pam_listfile.so item=user sense=allow \
+       file=/etc/qemu/vnc.allow
+
+
+The external file would then contain a list of usernames.
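+
+For example, the ``/etc/qemu/vnc.allow`` file referenced by the PAM
+config above (illustrative contents only) might list one username per
+line:
+
+::
+
+   fred
+   bob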
+
+If an x509 certificate distinguished name was being used as the username,
+a suitable entry would match the distinguished name:
+
+::
+
+   CN=laptop.berrange.com,O=Berrange Home,L=London,ST=London,C=GB
+
+
+On the command line it can be created using
+
+::
+
+   -object authz-pam,id=authz0,service=qemu-vnc-tls
+
+
+There are a variety of PAM plugins that can be used which are not illustrated
+here, and it is possible to implement brand new plugins using the PAM API.
+
+
+Connecting backends
+~~~~~~~~~~~~~~~~~~~
+
+The authorization driver is created using the ``-object`` argument and then
+needs to be associated with a network service. The authorization driver object
+will be given a unique ID that needs to be referenced.
+
+The property to set in the network service will vary depending on the type of
+identity to verify. By convention, any network server backend that uses TLS
+will provide a ``tls-authz`` property, while any server using SASL will provide
+a ``sasl-authz`` property.
+
+Thus an example using SASL and authorization for the VNC server would look
+like:
+
+::
+
+   $QEMU --object authz-simple,id=authz0,identity=fred \
+         --vnc 0.0.0.0:1,sasl,sasl-authz=authz0
+
+While to validate both the x509 certificate and SASL username:
+
+::
+
+   echo "CN=laptop.qemu.org,O=QEMU Project,L=London,ST=London,C=GB" >> tls.acl
+   $QEMU --object authz-simple,id=authz0,identity=fred \
+         --object authz-list-file,id=authz1,filename=tls.acl \
+         --object tls-creds-x509,id=tls0,dir=/etc/qemu/tls,verify-peer=yes \
+         --vnc 0.0.0.0:1,sasl,sasl-authz=authz0,tls-creds=tls0,tls-authz=authz1
diff --git a/docs/system/barrier.rst b/docs/system/barrier.rst
new file mode 100644
index 000000000..155d7d290
--- /dev/null
+++ b/docs/system/barrier.rst
@@ -0,0 +1,44 @@
+QEMU Barrier Client
+===================
+
+Generally, mouse and keyboard are grabbed through the QEMU video
+interface emulation.
+
+But when we want to use a video graphics adapter via PCI passthrough
+there is no way to provide the keyboard and mouse inputs to the VM
+except by plugging a second set of mouse and keyboard to the host
+or by installing KVM software in the guest OS.
+
+The QEMU Barrier client avoids this by implementing the Barrier
+protocol directly in QEMU.
+
+`Barrier <https://github.com/debauchee/barrier>`__
+is a KVM (Keyboard-Video-Mouse) software forked from Symless's
+synergy 1.9 codebase.
+
+This protocol is enabled by adding an input-barrier object to QEMU.
+
+Syntax::
+
+    input-barrier,id=<object-id>,name=<guest display name>
+    [,server=<barrier server address>][,port=<barrier server port>]
+    [,x-origin=<x-origin>][,y-origin=<y-origin>]
+    [,width=<width>][,height=<height>]
+
+The object can be added on the QEMU command line, for instance with::
+
+    -object input-barrier,id=barrier0,name=VM-1
+
+where VM-1 is the name of the display configured in the Barrier server
+on the host providing the mouse and the keyboard events.
+
+By default, ``<barrier server address>`` is ``localhost``,
+``<port>`` is ``24800``, ``<x-origin>`` and ``<y-origin>`` are set to ``0``,
+``<width>`` and ``<height>`` to ``1920`` and ``1080``.
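+
+For instance, to point the client at a remote Barrier server rather
+than relying on the defaults (a sketch assuming a hypothetical server
+at 192.168.0.10 listening on the default port)::
+
+    -object input-barrier,id=barrier0,name=VM-1,server=192.168.0.10,port=24800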
+
+If the Barrier server is stopped, QEMU needs to be reconnected manually,
+by removing and re-adding the input-barrier object, for instance
+with the help of the HMP monitor::
+
+    (qemu) object_del barrier0
+    (qemu) object_add input-barrier,id=barrier0,name=VM-1
diff --git a/docs/system/bootindex.rst b/docs/system/bootindex.rst
new file mode 100644
index 000000000..8b057f812
--- /dev/null
+++ b/docs/system/bootindex.rst
@@ -0,0 +1,76 @@
+Managing device boot order with bootindex properties
+====================================================
+
+QEMU can tell QEMU-aware guest firmware (like the x86 PC BIOS)
+in which order it should look for a bootable OS on which devices.
+A simple way to set this order is to use the ``-boot order=`` option,
+but you can also do this more flexibly, by setting a ``bootindex``
+property on the individual block or net devices you specify
+on the QEMU command line.
+
+The ``bootindex`` properties are used to determine the order in which
+firmware will consider devices for booting the guest OS. If the
+``bootindex`` property is not set for a device, it gets the lowest
+boot priority. There is no particular order in which devices with no
+``bootindex`` property set will be considered for booting, but they
+will still be bootable.
+
+Some guest machine types (for instance the s390x machines) do
+not support ``-boot order=``; on those machines you must always
+use ``bootindex`` properties.
+
+There is no way to set a ``bootindex`` property if you are using
+a short-form option like ``-hda`` or ``-cdrom``, so to use
+``bootindex`` properties you will need to expand out those options
+into long-form ``-drive`` and ``-device`` option pairs.
+
+Example
+-------
+
+Let's assume we have a QEMU machine with two NICs (virtio, e1000) and two
+disks (IDE, virtio):
+
+.. parsed-literal::
+
+  |qemu_system| -drive file=disk1.img,if=none,id=disk1 \\
+                -device ide-hd,drive=disk1,bootindex=4 \\
+                -drive file=disk2.img,if=none,id=disk2 \\
+                -device virtio-blk-pci,drive=disk2,bootindex=3 \\
+                -netdev type=user,id=net0 \\
+                -device virtio-net-pci,netdev=net0,bootindex=2 \\
+                -netdev type=user,id=net1 \\
+                -device e1000,netdev=net1,bootindex=1
+
+Given the command above, firmware should try to boot from the e1000 NIC
+first. If this fails, it should try the virtio NIC next; if this fails
+too, it should try the virtio disk, and then the IDE disk.
+
+Limitations
+-----------
+
+Some firmware has limitations on which devices can be considered for
+booting. For instance, the PC BIOS boot specification allows only one
+disk to be bootable. If boot from disk fails for some reason, the BIOS
+won't retry booting from another disk. It can still try to boot from
+floppy or net, though.
+
+Sometimes, firmware cannot map the device path QEMU wants firmware to
+boot from to a boot method. It doesn't happen for devices the firmware
+can natively boot from, but if firmware relies on an option ROM for
+booting, and the same option ROM is used for booting from more than one
+device, the firmware may not be able to ask the option ROM to boot from
+a particular device reliably. For instance with the PC BIOS, if a SCSI HBA
+has three bootable devices target1, target3, target5 connected to it,
+the option ROM will have a boot method for each of them, but it is not
+possible to map from boot method back to a specific target. This is a
+shortcoming of the PC BIOS boot specification.
+
+Mixing bootindex and boot order parameters
+------------------------------------------
+
+Note that it does not make sense to use the bootindex property together
+with the ``-boot order=...`` (or ``-boot once=...``) parameter. The guest
+firmware implementations normally either support one or the other,
+but not both parameters at the same time. Mixing them will result in
+undefined behavior, and thus the guest firmware will likely not boot
+from the expected devices.
diff --git a/docs/system/cpu-hotplug.rst b/docs/system/cpu-hotplug.rst
new file mode 100644
index 000000000..015ce2b6e
--- /dev/null
+++ b/docs/system/cpu-hotplug.rst
@@ -0,0 +1,142 @@
+===================
+Virtual CPU hotplug
+===================
+
+A complete example of vCPU hotplug (and hot-unplug) using QMP
+``device_add`` and ``device_del``.
+
+vCPU hotplug
+------------
+
+(1) Launch QEMU as follows (note that "maxcpus" is mandatory to
+    allow vCPU hotplug)::
+
+      $ qemu-system-x86_64 -display none -no-user-config -m 2048 \
+          -nodefaults -monitor stdio -machine pc,accel=kvm,usb=off \
+          -smp 1,maxcpus=2 -cpu IvyBridge-IBRS \
+          -qmp unix:/tmp/qmp-sock,server=on,wait=off
+
+(2) Run 'qmp-shell' (located in the source tree, under "scripts/qmp/")
+    to connect to the just-launched QEMU::
+
+      $> ./qmp-shell -p -v /tmp/qmp-sock
+      [...]
+      (QEMU)
+
+(3) Find out which CPU types could be plugged, and into which sockets::
+
+      (QEMU) query-hotpluggable-cpus
+      {
+          "execute": "query-hotpluggable-cpus",
+          "arguments": {}
+      }
+      {
+          "return": [
+              {
+                  "type": "IvyBridge-IBRS-x86_64-cpu",
+                  "vcpus-count": 1,
+                  "props": {
+                      "socket-id": 1,
+                      "core-id": 0,
+                      "thread-id": 0
+                  }
+              },
+              {
+                  "qom-path": "/machine/unattached/device[0]",
+                  "type": "IvyBridge-IBRS-x86_64-cpu",
+                  "vcpus-count": 1,
+                  "props": {
+                      "socket-id": 0,
+                      "core-id": 0,
+                      "thread-id": 0
+                  }
+              }
+          ]
+      }
+      (QEMU)
+
+(4) The ``query-hotpluggable-cpus`` command returns an object for CPUs
+    that are present (containing a "qom-path" member) or which may be
+    hot-plugged (no "qom-path" member). From its output in step (3), we
+    can see that ``IvyBridge-IBRS-x86_64-cpu`` is present in socket 0,
+    while hot-plugging a CPU into socket 1 requires passing the listed
+    properties to QMP ``device_add``::
+
+      (QEMU) device_add id=cpu-2 driver=IvyBridge-IBRS-x86_64-cpu socket-id=1 core-id=0 thread-id=0
+      {
+          "execute": "device_add",
+          "arguments": {
+              "socket-id": 1,
+              "driver": "IvyBridge-IBRS-x86_64-cpu",
+              "id": "cpu-2",
+              "core-id": 0,
+              "thread-id": 0
+          }
+      }
+      {
+          "return": {}
+      }
+      (QEMU)
+
+(5) Optionally, run QMP ``query-cpus-fast`` for some details about the
+    vCPUs::
+
+      (QEMU) query-cpus-fast
+      {
+          "execute": "query-cpus-fast",
+          "arguments": {}
+      }
+      {
+          "return": [
+              {
+                  "qom-path": "/machine/unattached/device[0]",
+                  "target": "x86_64",
+                  "thread-id": 11534,
+                  "cpu-index": 0,
+                  "props": {
+                      "socket-id": 0,
+                      "core-id": 0,
+                      "thread-id": 0
+                  },
+                  "arch": "x86"
+              },
+              {
+                  "qom-path": "/machine/peripheral/cpu-2",
+                  "target": "x86_64",
+                  "thread-id": 12106,
+                  "cpu-index": 1,
+                  "props": {
+                      "socket-id": 1,
+                      "core-id": 0,
+                      "thread-id": 0
+                  },
+                  "arch": "x86"
+              }
+          ]
+      }
+      (QEMU)
+
+vCPU hot-unplug
+---------------
+
+From the 'qmp-shell', invoke the QMP ``device_del`` command::
+
+      (QEMU) device_del id=cpu-2
+      {
+          "execute": "device_del",
+          "arguments": {
+              "id": "cpu-2"
+          }
+      }
+      {
+          "return": {}
+      }
+      (QEMU)
+
+.. note::
+   vCPU hot-unplug requires guest cooperation; so the ``device_del``
+   command above does not guarantee vCPU removal -- it's a "request to
+   unplug". At this point, the guest will get a System Control
+   Interrupt (SCI) and call the ACPI handler for the affected vCPU
+   device. Then the guest kernel will bring the vCPU offline and tell
+   QEMU to unplug it.
diff --git a/docs/system/cpu-models-mips.rst.inc b/docs/system/cpu-models-mips.rst.inc
new file mode 100644
index 000000000..02cc4bb88
--- /dev/null
+++ b/docs/system/cpu-models-mips.rst.inc
@@ -0,0 +1,111 @@
+Supported CPU model configurations on MIPS hosts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU supports a variety of MIPS CPU models:
+
+Supported CPU models for MIPS32 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are supported for use on MIPS32 hosts.
+Administrators / applications are recommended to use the CPU model that
+matches the generation of the host CPUs in use. In a deployment with a
+mixture of host CPU models between machines, if live migration
+compatibility is required, use the newest CPU model that is compatible
+across all desired hosts.
+
+``mips32r6-generic``
+  MIPS32 Processor (Release 6, 2015)
+
+``P5600``
+  MIPS32 Processor (P5600, 2014)
+
+``M14K``, ``M14Kc``
+  MIPS32 Processor (M14K, 2009)
+
+``74Kf``
+  MIPS32 Processor (74K, 2007)
+
+``34Kf``
+  MIPS32 Processor (34K, 2006)
+
+``24Kc``, ``24KEc``, ``24Kf``
+  MIPS32 Processor (24K, 2003)
+
+``4Kc``, ``4Km``, ``4KEcR1``, ``4KEmR1``, ``4KEc``, ``4KEm``
+  MIPS32 Processor (4K, 1999)
+
+
+Supported CPU models for MIPS64 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are supported for use on MIPS64 hosts.
+Administrators / applications are recommended to use the CPU model that
+matches the generation of the host CPUs in use. In a deployment with a
+mixture of host CPU models between machines, if live migration
+compatibility is required, use the newest CPU model that is compatible
+across all desired hosts.
+
+``I6400``
+  MIPS64 Processor (Release 6, 2014)
+
+``Loongson-2E``
+  MIPS64 Processor (Loongson 2, 2006)
+
+``Loongson-2F``
+  MIPS64 Processor (Loongson 2, 2008)
+
+``Loongson-3A1000``
+  MIPS64 Processor (Loongson 3, 2010)
+
+``Loongson-3A4000``
+  MIPS64 Processor (Loongson 3, 2018)
+
+``mips64dspr2``
+  MIPS64 Processor (Release 2, 2006)
+
+``MIPS64R2-generic``, ``5KEc``, ``5KEf``
+  MIPS64 Processor (Release 2, 2002)
+
+``20Kc``
+  MIPS64 Processor (20K, 2000)
+
+``5Kc``, ``5Kf``
+  MIPS64 Processor (5K, 1999)
+
+``VR5432``
+  MIPS64 Processor (VR, 1998)
+
+``R4000``
+  MIPS64 Processor (MIPS III, 1991)
+
+
+Supported CPU models for nanoMIPS hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are supported for use on nanoMIPS hosts.
+Administrators / applications are recommended to use the CPU model that
+matches the generation of the host CPUs in use. In a deployment with a
+mixture of host CPU models between machines, if live migration
+compatibility is required, use the newest CPU model that is compatible
+across all desired hosts.
+
+``I7200``
+  MIPS I7200 (nanoMIPS, 2018)
+
+Preferred CPU models for MIPS hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are preferred for use on different MIPS hosts:
+
+``MIPS III``
+  R4000
+
+``MIPS32R2``
+  34Kf
+
+``MIPS64R6``
+  I6400
+
+``nanoMIPS``
+  I7200
+
diff --git a/docs/system/cpu-models-x86-abi.csv b/docs/system/cpu-models-x86-abi.csv
new file mode 100644
index 000000000..f3f3b60be
--- /dev/null
+++ b/docs/system/cpu-models-x86-abi.csv
@@ -0,0 +1,67 @@
+Model,baseline,v2,v3,v4
+486-v1,,,,
+Broadwell-v1,✅,✅,✅,
+Broadwell-v2,✅,✅,✅,
+Broadwell-v3,✅,✅,✅,
+Broadwell-v4,✅,✅,✅,
+Cascadelake-Server-v1,✅,✅,✅,✅
+Cascadelake-Server-v2,✅,✅,✅,✅
+Cascadelake-Server-v3,✅,✅,✅,✅
+Cascadelake-Server-v4,✅,✅,✅,✅
+Conroe-v1,✅,,,
+Cooperlake-v1,✅,✅,✅,✅
+Denverton-v1,✅,✅,,
+Denverton-v2,✅,✅,,
+Dhyana-v1,✅,✅,✅,
+EPYC-Milan-v1,✅,✅,✅,
+EPYC-Rome-v1,✅,✅,✅,
+EPYC-Rome-v2,✅,✅,✅,
+EPYC-v1,✅,✅,✅,
+EPYC-v2,✅,✅,✅,
+EPYC-v3,✅,✅,✅,
+Haswell-v1,✅,✅,✅,
+Haswell-v2,✅,✅,✅,
+Haswell-v3,✅,✅,✅,
+Haswell-v4,✅,✅,✅,
+Icelake-Client-v1,✅,✅,✅,
+Icelake-Client-v2,✅,✅,✅,
+Icelake-Server-v1,✅,✅,✅,✅
+Icelake-Server-v2,✅,✅,✅,✅
+Icelake-Server-v3,✅,✅,✅,✅
+Icelake-Server-v4,✅,✅,✅,✅
+IvyBridge-v1,✅,✅,,
+IvyBridge-v2,✅,✅,,
+KnightsMill-v1,✅,✅,✅,
+Nehalem-v1,✅,✅,,
+Nehalem-v2,✅,✅,,
+Opteron_G1-v1,✅,,,
+Opteron_G2-v1,✅,,,
+Opteron_G3-v1,✅,,,
+Opteron_G4-v1,✅,✅,,
+Opteron_G5-v1,✅,✅,,
+Penryn-v1,✅,,,
+SandyBridge-v1,✅,✅,,
+SandyBridge-v2,✅,✅,,
+Skylake-Client-v1,✅,✅,✅,
+Skylake-Client-v2,✅,✅,✅,
+Skylake-Client-v3,✅,✅,✅,
+Skylake-Server-v1,✅,✅,✅,✅
+Skylake-Server-v2,✅,✅,✅,✅
+Skylake-Server-v3,✅,✅,✅,✅
+Skylake-Server-v4,✅,✅,✅,✅
+Snowridge-v1,✅,✅,,
+Snowridge-v2,✅,✅,,
+Westmere-v1,✅,✅,,
+Westmere-v2,✅,✅,,
+athlon-v1,,,,
+core2duo-v1,✅,,,
+coreduo-v1,,,,
+kvm32-v1,,,,
+kvm64-v1,✅,,,
+n270-v1,,,,
+pentium-v1,,,,
+pentium2-v1,,,,
+pentium3-v1,,,,
+phenom-v1,✅,,,
+qemu32-v1,,,,
+qemu64-v1,✅,,,
diff --git a/docs/system/cpu-models-x86.rst.inc b/docs/system/cpu-models-x86.rst.inc
new file mode 100644
index 000000000..7f6368f99
--- /dev/null
+++ b/docs/system/cpu-models-x86.rst.inc
@@ -0,0 +1,440 @@
+Recommendations for KVM CPU model configuration on x86 hosts
+============================================================
+
+The information that follows provides recommendations for configuring
+CPU models on x86 hosts. The goals are to maximise performance, while
+protecting guest OS against various CPU hardware flaws, and optionally
+enabling live migration between hosts with heterogeneous CPU models.
+
+
+Two ways to configure CPU models with QEMU / KVM
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+(1) **Host passthrough**
+
+    This passes the host CPU model features, model, stepping, exactly to
+    the guest. Note that KVM may filter out some host CPU model features
+    if they cannot be supported with virtualization. Live migration is
+    unsafe when this mode is used as libvirt / QEMU cannot guarantee a
+    stable CPU is exposed to the guest across hosts. This is the
+    recommended CPU to use, provided live migration is not required.
+
+(2) **Named model**
+
+    QEMU comes with a number of predefined named CPU models that
+    typically refer to specific generations of hardware released by
+    Intel and AMD. These allow the guest VMs to have a degree of
+    isolation from the host CPU, allowing greater flexibility in live
+    migrating between hosts with differing hardware.
+
+In both cases, it is possible to optionally add or remove individual CPU
+features, to alter what is presented to the guest by default.
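+
+For instance, features can be toggled directly on the QEMU command
+line (``pcid`` and ``vmx`` here merely stand in for whichever features
+are relevant to a deployment; the full syntax is illustrated at the
+end of this document):
+
+.. parsed-literal::
+
+  |qemu_system| -cpu Haswell,pcid=on,vmx=off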
+
+Libvirt supports a third way to configure CPU models known as "Host
+model". This uses the QEMU "Named model" feature, automatically picking
+a CPU model that is similar to the host CPU, and then adding extra features
+to approximate the host model as closely as possible. This does not
+guarantee the CPU family, stepping, etc will precisely match the host
+CPU, as they would with "Host passthrough", but gives much of the
+benefit of passthrough, while making live migration safe.
+
+
+ABI compatibility levels for CPU models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The x86_64 architecture has a number of `ABI compatibility levels`_
+defined. Traditionally most operating systems and toolchains would
+only target the original baseline ABI. It is expected that in
+future OS and toolchains are likely to target newer ABIs. The
+table that follows illustrates which ABI compatibility levels
+can be satisfied by the QEMU CPU models. Note that the table only
+lists the long term stable CPU model versions (eg Haswell-v4).
+In addition to what is listed, there are also many CPU model
+aliases which resolve to a different CPU model version,
+depending on which machine type is in use.
+
+.. _ABI compatibility levels: https://gitlab.com/x86-psABIs/x86-64-ABI/
+
+.. csv-table:: x86-64 ABI compatibility levels
+   :file: cpu-models-x86-abi.csv
+   :widths: 40,15,15,15,15
+   :header-rows: 2
+
+
+Preferred CPU models for Intel x86 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are preferred for use on Intel hosts.
+Administrators / applications are recommended to use the CPU model that
+matches the generation of the host CPUs in use. In a deployment with a
+mixture of host CPU models between machines, if live migration
+compatibility is required, use the newest CPU model that is compatible
+across all desired hosts.
+
+``Cascadelake-Server``, ``Cascadelake-Server-noTSX``
+  Intel Xeon Processor (Cascade Lake, 2019), with "stepping" levels 6
+  or 7 only. (The Cascade Lake Xeon processor with *stepping 5 is
+  vulnerable to MDS variants*.)
+
+``Skylake-Server``, ``Skylake-Server-IBRS``, ``Skylake-Server-IBRS-noTSX``
+  Intel Xeon Processor (Skylake, 2016)
+
+``Skylake-Client``, ``Skylake-Client-IBRS``, ``Skylake-Client-noTSX-IBRS``
+  Intel Core Processor (Skylake, 2015)
+
+``Broadwell``, ``Broadwell-IBRS``, ``Broadwell-noTSX``, ``Broadwell-noTSX-IBRS``
+  Intel Core Processor (Broadwell, 2014)
+
+``Haswell``, ``Haswell-IBRS``, ``Haswell-noTSX``, ``Haswell-noTSX-IBRS``
+  Intel Core Processor (Haswell, 2013)
+
+``IvyBridge``, ``IvyBridge-IBRS``
+  Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
+
+``SandyBridge``, ``SandyBridge-IBRS``
+  Intel Xeon E312xx (Sandy Bridge, 2011)
+
+``Westmere``, ``Westmere-IBRS``
+  Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
+
+``Nehalem``, ``Nehalem-IBRS``
+  Intel Core i7 9xx (Nehalem Class Core i7, 2008)
+
+``Penryn``
+  Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
+
+``Conroe``
+  Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)
+
+
+Important CPU features for Intel x86 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following are important CPU features that should be used on Intel
+x86 hosts, when available in the host CPU. Some of them require explicit
+configuration to enable, as they are not included by default in some, or
+all, of the named CPU models listed above. In general all of these
+features are included if using "Host passthrough" or "Host model".
+
+``pcid``
+  Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix.
+
+  Included by default in Haswell, Broadwell & Skylake Intel CPU models.
+
+  Should be explicitly turned on for Westmere, SandyBridge, and
+  IvyBridge Intel CPU models. Note that some desktop/mobile Westmere
+  CPUs cannot support this feature.
+
+``spec-ctrl``
+  Required to enable the Spectre v2 (CVE-2017-5715) fix.
+
+  Included by default in Intel CPU models with -IBRS suffix.
+
+  Must be explicitly turned on for Intel CPU models without -IBRS
+  suffix.
+
+  Requires the host CPU microcode to support this feature before it
+  can be used for guest CPUs.
+
+``stibp``
+  Required to enable stronger Spectre v2 (CVE-2017-5715) fixes in some
+  operating systems.
+
+  Must be explicitly turned on for all Intel CPU models.
+
+  Requires the host CPU microcode to support this feature before it can
+  be used for guest CPUs.
+
+``ssbd``
+  Required to enable the CVE-2018-3639 fix.
+
+  Not included by default in any Intel CPU model.
+
+  Must be explicitly turned on for all Intel CPU models.
+
+  Requires the host CPU microcode to support this feature before it
+  can be used for guest CPUs.
+
+``pdpe1gb``
+  Recommended to allow guest OS to use 1GB size pages.
+
+  Not included by default in any Intel CPU model.
+
+  Should be explicitly turned on for all Intel CPU models.
+
+  Note that not all CPU hardware will support this feature.
+
+``md-clear``
+  Required to confirm the MDS (CVE-2018-12126, CVE-2018-12127,
+  CVE-2018-12130, CVE-2019-11091) fixes.
+
+  Not included by default in any Intel CPU model.
+
+  Must be explicitly turned on for all Intel CPU models.
+
+  Requires the host CPU microcode to support this feature before it
+  can be used for guest CPUs.
+
+``mds-no``
+  Recommended to inform the guest OS that the host is *not* vulnerable
+  to any of the MDS variants ([MFBDS] CVE-2018-12130, [MLPDS]
+  CVE-2018-12127, [MSBDS] CVE-2018-12126).
+
+  This is an MSR (Model-Specific Register) feature rather than a CPUID feature,
+  so it will not appear in the Linux ``/proc/cpuinfo`` in the host or
+  guest. Instead, the host kernel uses it to populate the MDS
+  vulnerability file in ``sysfs``.
+
+  So it should only be enabled for VMs if the host reports ``Not
+  affected`` in the ``/sys/devices/system/cpu/vulnerabilities/mds`` file.
+
+``taa-no``
+  Recommended to inform the guest that the host is *not*
+  vulnerable to CVE-2019-11135, TSX Asynchronous Abort (TAA).
+
+  This too is an MSR feature, so it does not show up in the Linux
+  ``/proc/cpuinfo`` in the host or guest.
+
+  It should only be enabled for VMs if the host reports ``Not affected``
+  in the ``/sys/devices/system/cpu/vulnerabilities/tsx_async_abort``
+  file.
+
+``tsx-ctrl``
+  Recommended to inform the guest that it can disable the Intel TSX
+  (Transactional Synchronization Extensions) feature; or, if the
+  processor is vulnerable, use the Intel VERW instruction (a
+  processor-level instruction that performs checks on memory access) as
+  a mitigation for the TAA vulnerability. (For details, refer to
+  Intel's `deep dive into MDS
+  <https://software.intel.com/security-software-guidance/insights/deep-dive-intel-analysis-microarchitectural-data-sampling>`_.)
+
+  Expose this to the guest OS if and only if: (a) the host has TSX
+  enabled; *and* (b) the guest has ``rtm`` CPU flag enabled.
+
+  By disabling TSX, KVM-based guests can avoid paying the price of
+  mitigating TSX-based attacks.
+
+  Note that ``tsx-ctrl`` too is an MSR feature, so it does not show
+  up in the Linux ``/proc/cpuinfo`` in the host or guest.
+
+  To validate that Intel TSX is indeed disabled for the guest, there are
+  two ways: (a) check for the *absence* of ``rtm`` in the guest's
+  ``/proc/cpuinfo``; or (b) the
+  ``/sys/devices/system/cpu/vulnerabilities/tsx_async_abort`` file in
+  the guest should report ``Mitigation: TSX disabled``.
+
+
+Preferred CPU models for AMD x86 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are preferred for use on AMD hosts.
+Administrators / applications are recommended to use the CPU model that
+matches the generation of the host CPUs in use. In a deployment with a
+mixture of host CPU models between machines, if live migration
+compatibility is required, use the newest CPU model that is compatible
+across all desired hosts.
+
+``EPYC``, ``EPYC-IBPB``
+  AMD EPYC Processor (2017)
+
+``Opteron_G5``
+  AMD Opteron 63xx class CPU (2012)
+
+``Opteron_G4``
+  AMD Opteron 62xx class CPU (2011)
+
+``Opteron_G3``
+  AMD Opteron 23xx (Gen 3 Class Opteron, 2009)
+
+``Opteron_G2``
+  AMD Opteron 22xx (Gen 2 Class Opteron, 2006)
+
+``Opteron_G1``
+  AMD Opteron 240 (Gen 1 Class Opteron, 2004)
+
+
+Important CPU features for AMD x86 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following are important CPU features that should be used on AMD x86
+hosts, when available in the host CPU. Some of them require explicit
+configuration to enable, as they are not included by default in some, or
+all, of the named CPU models listed above. In general all of these
+features are included if using "Host passthrough" or "Host model".
+
+``ibpb``
+  Required to enable the Spectre v2 (CVE-2017-5715) fix.
+
+  Included by default in AMD CPU models with -IBPB suffix.
+
+  Must be explicitly turned on for AMD CPU models without -IBPB suffix.
+
+  Requires the host CPU microcode to support this feature before it
+  can be used for guest CPUs.
+
+``stibp``
+  Required to enable stronger Spectre v2 (CVE-2017-5715) fixes in some
+  operating systems.
+
+  Must be explicitly turned on for all AMD CPU models.
+
+  Requires the host CPU microcode to support this feature before it
+  can be used for guest CPUs.
+
+``virt-ssbd``
+  Required to enable the CVE-2018-3639 fix.
+
+  Not included by default in any AMD CPU model.
+
+  Must be explicitly turned on for all AMD CPU models.
+
+  This should be provided to guests, even if amd-ssbd is also provided,
+  for maximum guest compatibility.
+
+  Note that for some QEMU / libvirt versions, this must be force enabled
+  when using "Host model", because it is a virtual feature that doesn't
+  exist in the physical host CPUs.
+
+``amd-ssbd``
+  Required to enable the CVE-2018-3639 fix.
+
+  Not included by default in any AMD CPU model.
+
+  Must be explicitly turned on for all AMD CPU models.
+
+  This provides higher performance than ``virt-ssbd`` so should be
+  exposed to guests whenever available in the host. ``virt-ssbd`` should
+  nonetheless also be exposed for maximum guest compatibility as some
+  kernels only know about ``virt-ssbd``.
+
+``amd-no-ssb``
+  Recommended to indicate the host is not vulnerable to CVE-2018-3639.
+
+  Not included by default in any AMD CPU model.
+
+  Future hardware generations of CPU will not be vulnerable to
+  CVE-2018-3639, and thus the guest should be told not to enable
+  its mitigations, by exposing amd-no-ssb. This is mutually
+  exclusive with virt-ssbd and amd-ssbd.
+
+``pdpe1gb``
+  Recommended to allow guest OS to use 1GB size pages.
+
+  Not included by default in any AMD CPU model.
+
+  Should be explicitly turned on for all AMD CPU models.
+
+  Note that not all CPU hardware will support this feature.
+
+
+Default x86 CPU models
+^^^^^^^^^^^^^^^^^^^^^^
+
+The default QEMU CPU models are designed such that they can run on all
+hosts. If an application does not wish to perform any host
+compatibility checks before launching guests, the default is guaranteed
+to work.
+
+The default CPU models will, however, leave the guest OS vulnerable to
+various CPU hardware flaws, so their use is strongly discouraged.
+Applications should follow the earlier guidance to set up a better CPU
+configuration, with host passthrough recommended if live migration is
+not needed.
+
+``qemu32``, ``qemu64``
+  QEMU Virtual CPU version 2.5+ (32 & 64 bit variants)
+
+``qemu64`` is used for x86_64 guests and ``qemu32`` is used for i686
+guests, when no ``-cpu`` argument is given to QEMU, or no ``<cpu>`` is
+provided in libvirt XML.
+
+Other non-recommended x86 CPUs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are compatible with most AMD and Intel x86
+hosts, but their usage is discouraged, as they expose a very limited
+featureset, which prevents guests from having optimal performance.
+
+``kvm32``, ``kvm64``
+  Common KVM processor (32 & 64 bit variants).
+
+  Legacy models just for historical compatibility with ancient QEMU
+  versions.
+
+``486``, ``athlon``, ``phenom``, ``coreduo``, ``core2duo``, ``n270``, ``pentium``, ``pentium2``, ``pentium3``
+  Various very old x86 CPU models, mostly predating the introduction
+  of hardware assisted virtualization, that should thus not be
+  required for running virtual machines.
+
+
+Syntax for configuring CPU models
+=================================
+
+The examples below illustrate the approach to configuring the various
+CPU models / features in QEMU and libvirt.
+
+QEMU command line
+^^^^^^^^^^^^^^^^^
+
+Host passthrough:
+
+.. parsed-literal::
+
+  |qemu_system| -cpu host
+
+Host passthrough with feature customization:
+
+.. parsed-literal::
+
+  |qemu_system| -cpu host,vmx=off,...
+
+Named CPU models:
+
+.. parsed-literal::
+
+  |qemu_system| -cpu Westmere
+
+Named CPU models with feature customization:
+
+.. parsed-literal::
+
+  |qemu_system| -cpu Westmere,pcid=on,...
+
+Libvirt guest XML
+^^^^^^^^^^^^^^^^^
+
+Host passthrough::
+
+    <cpu mode='host-passthrough'/>
+
+Host passthrough with feature customization::
+
+    <cpu mode='host-passthrough'>
+        <feature name="vmx" policy="disable"/>
+        ...
+    </cpu>
+
+Host model::
+
+    <cpu mode='host-model'/>
+
+Host model with feature customization::
+
+    <cpu mode='host-model'>
+        <feature name="vmx" policy="disable"/>
+        ...
+    </cpu>
+
+Named model::
+
+    <cpu mode='custom'>
+        <model name="Westmere"/>
+    </cpu>
+
+Named model with feature customization::
+
+    <cpu mode='custom'>
+        <model name="Westmere"/>
+        <feature name="pcid" policy="require"/>
+        ...
+    </cpu>
diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
new file mode 100644
index 000000000..19944f526
--- /dev/null
+++ b/docs/system/device-emulation.rst
@@ -0,0 +1,91 @@
+.. _device-emulation:
+
+Device Emulation
+----------------
+
+QEMU supports the emulation of a large number of devices from
+peripherals such as network cards and USB devices to integrated systems
+on a chip (SoCs). Configuration of these is often a source of
+confusion so it helps to have an understanding of some of the terms
+used to describe devices within QEMU.
+
+Common Terms
+~~~~~~~~~~~~
+
+Device Front End
+================
+
+A device front end is how a device is presented to the guest.
The type
+of device presented should match the hardware that the guest operating
+system is expecting to see. All devices can be specified with the
+``--device`` command line option. Running QEMU with the command line
+options ``--device help`` will list all devices it is aware of. Using
+the command line ``--device foo,help`` will list the additional
+configuration options available for that device.
+
+A front end is often paired with a back end, which describes how the
+host's resources are used in the emulation.
+
+Device Buses
+============
+
+Most devices will exist on a BUS of some sort. Depending on the
+machine model you choose (``-M foo``) a number of buses will have been
+automatically created. In most cases the BUS a device is attached to
+can be inferred, for example PCI devices are generally automatically
+allocated to the next free address of the first PCI bus found. However
+in complicated configurations you can explicitly specify what bus
+(``bus=ID``) a device is attached to along with its address
+(``addr=N``).
+
+Some devices, for example a PCI SCSI host controller, will add
+additional buses to the system that other devices can be attached to.
+A hypothetical chain of devices might look like::
+
+  --device foo,bus=pci.0,addr=0,id=foo
+  --device bar,bus=foo.0,addr=1,id=baz
+
+which would be a bar device (with the ID of baz) which is attached to
+the first foo bus (foo.0) at address 1. The foo device which provides
+that bus is itself attached to the first PCI bus (pci.0).
+
+
+Device Back End
+===============
+
+The back end describes how the data from the emulated device will be
+processed by QEMU. The configuration of the back end is usually
+specific to the class of device being emulated. For example serial
+devices will be backed by a ``--chardev`` which can redirect the data
+to a file or socket or some other system. Storage devices are handled
+by ``--blockdev`` which will specify how blocks are handled, for
+example being stored in a qcow2 file or accessing a raw host disk
+partition. Back ends can sometimes be stacked to implement features
+like snapshots.
+
+While the choice of back end is generally transparent to the guest,
+there are cases where features will not be reported to the guest if
+the back end is unable to support them.
+
+Device Pass Through
+===================
+
+Device pass through is where the device is actually given access to
+the underlying hardware. This can be as simple as exposing a single
+USB device on the host system to the guest or dedicating a video card
+in a PCI slot to the exclusive use of the guest.
+
+
+Emulated Devices
+~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   devices/ivshmem.rst
+   devices/net.rst
+   devices/nvme.rst
+   devices/usb.rst
+   devices/vhost-user.rst
+   devices/virtio-pmem.rst
+   devices/vhost-user-rng.rst
diff --git a/docs/system/device-url-syntax.rst.inc b/docs/system/device-url-syntax.rst.inc
new file mode 100644
index 000000000..7dbc525fa
--- /dev/null
+++ b/docs/system/device-url-syntax.rst.inc
@@ -0,0 +1,210 @@
+
+In addition to using normal file images for the emulated storage
+devices, QEMU can also use networked resources such as iSCSI devices.
+These are specified using a special URL syntax.
+
+``iSCSI``
+   iSCSI support allows QEMU to access iSCSI resources directly and
+   use them as images for the guest storage. Both disk and cdrom images
+   are supported.
+
+   Syntax for specifying iSCSI LUNs is
+   "iscsi://<target-ip>[:<port>]/<target-iqn>/<lun>"
+
+   By default qemu will use the iSCSI initiator-name
+   'iqn.2008-11.org.linux-kvm[:<name>]' but this can also be set from
+   the command line or a configuration file.
+
+   Since QEMU version 2.4 it is possible to specify an iSCSI request
+   timeout to detect stalled requests and force a reestablishment of the
+   session. The timeout is specified in seconds. The default is 0 which
+   means no timeout. Libiscsi 1.15.0 or greater is required for this
+   feature.
+
+   Example (without authentication):
+
+   .. parsed-literal::
+
+      |qemu_system| -iscsi initiator-name=iqn.2001-04.com.example:my-initiator \\
+                    -cdrom iscsi://192.0.2.1/iqn.2001-04.com.example/2 \\
+                    -drive file=iscsi://192.0.2.1/iqn.2001-04.com.example/1
+
+   Example (CHAP username/password via URL):
+
+   .. parsed-literal::
+
+      |qemu_system| -drive file=iscsi://user%password@192.0.2.1/iqn.2001-04.com.example/1
+
+   Example (CHAP username/password via environment variables):
+
+   .. parsed-literal::
+
+      LIBISCSI_CHAP_USERNAME="user" \\
+      LIBISCSI_CHAP_PASSWORD="password" \\
+      |qemu_system| -drive file=iscsi://192.0.2.1/iqn.2001-04.com.example/1
+
+``NBD``
+   QEMU supports NBD (Network Block Devices) using both the TCP
+   protocol and Unix Domain Sockets. With TCP, the default port is
+   10809.
+
+   Syntax for specifying an NBD device using TCP, in preferred URI form:
+   "nbd://<server-ip>[:<port>]/[<export>]"
+
+   Syntax for specifying an NBD device using Unix Domain Sockets;
+   remember that '?' is a shell glob character and may need quoting:
+   "nbd+unix:///[<export>]?socket=<domain-socket>"
+
+   Older syntax that is also recognized:
+   "nbd:<server-ip>:<port>[:exportname=<export>]"
+
+   Older syntax for specifying an NBD device using Unix Domain Sockets:
+   "nbd:unix:<domain-socket>[:exportname=<export>]"
+
+   Example for TCP:
+
+   .. parsed-literal::
+
+      |qemu_system| --drive file=nbd:192.0.2.1:30000
+
+   Example for Unix Domain Sockets:
+
+   .. parsed-literal::
+
+      |qemu_system| --drive file=nbd:unix:/tmp/nbd-socket
+
+``SSH``
+   QEMU supports SSH (Secure Shell) access to remote disks.
+
+   Examples:
+
+   .. parsed-literal::
+
+      |qemu_system| -drive file=ssh://user@host/path/to/disk.img
+      |qemu_system| -drive file.driver=ssh,file.user=user,file.host=host,file.port=22,file.path=/path/to/disk.img
+
+   Currently authentication must be done using ssh-agent. Other
+   authentication methods may be supported in future.
+
+``GlusterFS``
+   GlusterFS is a user space distributed file system. QEMU supports the
+   use of GlusterFS volumes for hosting VM disk images using TCP, Unix
+   Domain Sockets and RDMA transport protocols.
+
+   Syntax for specifying a VM disk image on a GlusterFS volume is
+
+   .. parsed-literal::
+
+      URI:
+      gluster[+type]://[host[:port]]/volume/path[?socket=...][,debug=N][,logfile=...]
+
+      JSON:
+      'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol","path":"a.img","debug":N,"logfile":"...",
+                                      "server":[{"type":"tcp","host":"...","port":"..."},
+                                                {"type":"unix","socket":"..."}]}}'
+
+   Example
+
+   .. 
parsed-literal:: + + URI: + |qemu_system| --drive file=gluster://192.0.2.1/testvol/a.img, + file.debug=9,file.logfile=/var/log/qemu-gluster.log + + JSON: + |qemu_system| 'json:{"driver":"qcow2", + "file":{"driver":"gluster", + "volume":"testvol","path":"a.img", + "debug":9,"logfile":"/var/log/qemu-gluster.log", + "server":[{"type":"tcp","host":"1.2.3.4","port":24007}, + {"type":"unix","socket":"/var/run/glusterd.socket"}]}}' + |qemu_system| -drive driver=qcow2,file.driver=gluster,file.volume=testvol,file.path=/path/a.img, + file.debug=9,file.logfile=/var/log/qemu-gluster.log, + file.server.0.type=tcp,file.server.0.host=1.2.3.4,file.server.0.port=24007, + file.server.1.type=unix,file.server.1.socket=/var/run/glusterd.socket + + See also http://www.gluster.org. + +``HTTP/HTTPS/FTP/FTPS`` + QEMU supports read-only access to files accessed over http(s) and + ftp(s). + + Syntax using a single filename: + + :: + + <protocol>://[<username>[:<password>]@]<host>/<path> + + where: + + ``protocol`` + 'http', 'https', 'ftp', or 'ftps'. + + ``username`` + Optional username for authentication to the remote server. + + ``password`` + Optional password for authentication to the remote server. + + ``host`` + Address of the remote server. + + ``path`` + Path on the remote server, including any query string. + + The following options are also supported: + + ``url`` + The full URL when passing options to the driver explicitly. + + ``readahead`` + The amount of data to read ahead with each range request to the + remote server. This value may optionally have the suffix 'T', 'G', + 'M', 'K', 'k' or 'b'. If it does not have a suffix, it will be + assumed to be in bytes. The value must be a multiple of 512 bytes. + It defaults to 256k. + + ``sslverify`` + Whether to verify the remote server's certificate when connecting + over SSL. It can have the value 'on' or 'off'. It defaults to + 'on'. + + ``cookie`` + Send this cookie (it can also be a list of cookies separated by + ';') with each outgoing request. Only supported when using + protocols such as HTTP which support cookies, otherwise ignored. + + ``timeout`` + Set the timeout in seconds of the CURL connection. This timeout is + the time that CURL waits for a response from the remote server to + get the size of the image to be downloaded. If not set, the + default timeout of 5 seconds is used. + + Note that when passing options to qemu explicitly, ``driver`` is the + value of <protocol>. + + Example: boot from a remote Fedora 20 live ISO image + + .. parsed-literal:: + + |qemu_system_x86| --drive media=cdrom,file=https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/20/Live/x86_64/Fedora-Live-Desktop-x86_64-20-1.iso,readonly + + |qemu_system_x86| --drive media=cdrom,file.driver=http,file.url=http://archives.fedoraproject.org/pub/fedora/linux/releases/20/Live/x86_64/Fedora-Live-Desktop-x86_64-20-1.iso,readonly + + Example: boot from a remote Fedora 20 cloud image using a local + overlay for writes, copy-on-read, and a readahead of 64k + + .. 
parsed-literal:: + + qemu-img create -f qcow2 -o backing_file='json:{"file.driver":"http",, "file.url":"http://archives.fedoraproject.org/pub/archive/fedora/linux/releases/20/Images/x86_64/Fedora-x86_64-20-20131211.1-sda.qcow2",, "file.readahead":"64k"}' /tmp/Fedora-x86_64-20-20131211.1-sda.qcow2 + + |qemu_system_x86| -drive file=/tmp/Fedora-x86_64-20-20131211.1-sda.qcow2,copy-on-read=on + + Example: boot from an image stored on a VMware vSphere server with a + self-signed certificate using a local overlay for writes, a readahead + of 64k and a timeout of 10 seconds. + + .. parsed-literal:: + + qemu-img create -f qcow2 -o backing_file='json:{"file.driver":"https",, "file.url":"https://user:password@vsphere.example.com/folder/test/test-flat.vmdk?dcPath=Datacenter&dsName=datastore1",, "file.sslverify":"off",, "file.readahead":"64k",, "file.timeout":10}' /tmp/test.qcow2 + + |qemu_system_x86| -drive file=/tmp/test.qcow2 diff --git a/docs/system/devices/ivshmem.rst b/docs/system/devices/ivshmem.rst new file mode 100644 index 000000000..b03a48afa --- /dev/null +++ b/docs/system/devices/ivshmem.rst @@ -0,0 +1,64 @@ +.. _pcsys_005fivshmem: + +Inter-VM Shared Memory device +----------------------------- + +On Linux hosts, a shared memory device is available. The basic syntax +is: + +.. parsed-literal:: + + |qemu_system_x86| -device ivshmem-plain,memdev=hostmem + +where hostmem names a host memory backend. For a POSIX shared memory +backend, use something like + +:: + + -object memory-backend-file,size=1M,share,mem-path=/dev/shm/ivshmem,id=hostmem + +If desired, interrupts can be sent between guest VMs accessing the same +shared memory region. Interrupt support requires using a shared memory +server and using a chardev socket to connect to it. The code for the +shared memory server is qemu.git/contrib/ivshmem-server. An example +syntax when using the shared memory server is: + +.. parsed-literal:: + + # First start the ivshmem server once and for all + ivshmem-server -p pidfile -S path -m shm-name -l shm-size -n vectors + + # Then start your qemu instances with matching arguments + |qemu_system_x86| -device ivshmem-doorbell,vectors=vectors,chardev=id + -chardev socket,path=path,id=id + +When using the server, the guest will be assigned a VM ID (>=0) that +allows guests using the same server to communicate via interrupts. +Guests can read their VM ID from a device register (see +ivshmem-spec.txt). + +Migration with ivshmem +~~~~~~~~~~~~~~~~~~~~~~ + +With device property ``master=on``, the guest will copy the shared +memory on migration to the destination host. With ``master=off``, the +guest will not be able to migrate with the device attached. In the +latter case, the device should be detached and then reattached after +migration using the PCI hotplug support. + +At most one of the devices sharing the same memory can be master. The +master must complete migration before you plug back the other devices. + +ivshmem and hugepages +~~~~~~~~~~~~~~~~~~~~~ + +Instead of specifying the <shm size> using POSIX shm, you may specify a +memory backend that has hugepage support: + +.. parsed-literal:: + + |qemu_system_x86| -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1 + -device ivshmem-plain,memdev=mb1 + +ivshmem-server also supports hugepages mount points with the ``-m`` +memory path argument. 
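+
+As a hedged sketch combining the two, the flags below follow the usage
+line shown earlier, while the pidfile/socket paths, the mount point and
+the sizes are illustrative assumptions:
+
+::
+
+    # serve 1G of shared memory from a hugepage mount, with 2 vectors
+    ivshmem-server -p /tmp/ivshmem.pid -S /tmp/ivshmem.sock \
+        -m /dev/hugepages -l 1G -n 2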
diff --git a/docs/system/devices/net.rst b/docs/system/devices/net.rst
new file mode 100644
index 000000000..4b2640c44
--- /dev/null
+++ b/docs/system/devices/net.rst
@@ -0,0 +1,100 @@
+.. _pcsys_005fnetwork:
+
+Network emulation
+-----------------
+
+QEMU can simulate several network cards (e.g. PCI or ISA cards on the PC
+target) and can connect them to a network backend on the host or an
+emulated hub. The various host network backends can either be used to
+connect the NIC of the guest to a real network (e.g. by using a TAP
+device or the non-privileged user mode network stack), or to other
+guest instances running in another QEMU process (e.g. by using the
+socket host network backend).
+
+Using TAP network interfaces
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This is the standard way to connect QEMU to a real network. QEMU adds a
+virtual network device on your host (called ``tapN``), and you can then
+configure it as if it were a real ethernet card.
+
+Linux host
+^^^^^^^^^^
+
+As an example, you can download the ``linux-test-xxx.tar.gz`` archive
+and copy the script ``qemu-ifup`` into ``/etc``, and configure ``sudo``
+properly so that the command ``ifconfig`` contained in ``qemu-ifup`` can
+be executed as root. You must verify that your host kernel supports the
+TAP network interfaces: the device ``/dev/net/tun`` must be present.
+
+See :ref:`sec_005finvocation` for examples of command
+lines using the TAP network interfaces.
+
+Windows host
+^^^^^^^^^^^^
+
+There is a virtual ethernet driver for Windows 2000/XP systems, called
+TAP-Win32. But it is not included in standard QEMU for Windows, so you
+will need to get it separately. It is part of the OpenVPN package, so
+download OpenVPN from: https://openvpn.net/.
+
+Using the user mode network stack
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By using the option ``-net user`` (default configuration if no ``-net``
+option is specified), QEMU uses a completely user mode network stack
+(you don't need root privilege to use the virtual network). The virtual
+network configuration is the following::
+
+    guest (10.0.2.15)  <------>  Firewall/DHCP server <-----> Internet
+                          |          (10.0.2.2)
+                          |
+                          ---->  DNS server (10.0.2.3)
+                          |
+                          ---->  SMB server (10.0.2.4)
+
+The QEMU VM behaves as if it were behind a firewall which blocks all
+incoming connections. You can use a DHCP client to automatically
+configure the network in the QEMU VM. The DHCP server assigns addresses
+to the hosts starting from 10.0.2.15.
+
+In order to check that the user mode network is working, you can ping
+the address 10.0.2.2 and verify that you got an address in the range
+10.0.2.x from the QEMU virtual DHCP server.
+
+Note that ICMP traffic in general does not work with user mode
+networking. ``ping``, aka ICMP echo, to the local router (10.0.2.2)
+shall work, however. If you're using QEMU on Linux >= 3.0, it can use
+unprivileged ICMP ping sockets to allow ``ping`` to the Internet. The
+host admin has to set the ping_group_range in order to grant access to
+those sockets. To allow ping for GID 100 (usually users group)::
+
+    echo 100 100 > /proc/sys/net/ipv4/ping_group_range
+
+When using the built-in TFTP server, the router is also the TFTP server.
+
+When using the ``'-netdev user,hostfwd=...'`` option, TCP or UDP
+connections can be redirected from the host to the guest. This allows,
+for example, redirecting X11, telnet or SSH connections.
+
+Hubs
+~~~~
+
+QEMU can simulate several hubs. A hub can be thought of as a virtual
+connection between several network devices.
These devices can be, for
+example, QEMU virtual ethernet cards or virtual host ethernet devices
+(TAP devices). You can connect guest NICs or host network backends to
+such a hub using the ``-netdev hubport`` or ``-nic hubport`` options.
+The legacy ``-net`` option also connects the given device to the
+emulated hub with ID 0 (i.e. the default hub) unless you specify a
+netdev with ``-net nic,netdev=xxx`` here.
+
+Connecting emulated networks between QEMU instances
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Using the ``-netdev socket`` (or ``-nic socket`` or ``-net socket``)
+option, it is possible to create emulated networks that span several
+QEMU instances. See the description of the ``-netdev socket`` option in
+:ref:`sec_005finvocation` for a basic example.
diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
new file mode 100644
index 000000000..b5acb2a9c
--- /dev/null
+++ b/docs/system/devices/nvme.rst
@@ -0,0 +1,241 @@
+==============
+NVMe Emulation
+==============
+
+QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
+``nvme-subsys`` devices.
+
+See the following sections for specific information on
+
+  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
+  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
+    `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
+    Protection`_.
+
+Adding NVMe Devices
+===================
+
+Controller Emulation
+--------------------
+
+The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
+specification. All mandatory features are implemented, with a couple of
+exceptions and limitations:
+
+  * Accounting numbers in the SMART/Health log page are reset when the device
+    is power cycled.
+  * Interrupt Coalescing is not supported and is disabled by default.
+
+The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
+following parameters:
+
+.. code-block:: console
+
+    -drive file=nvm.img,if=none,id=nvm
+    -device nvme,serial=deadbeef,drive=nvm
+
+There are a number of optional general parameters for the ``nvme`` device. Some
+are mentioned here, but see ``-device nvme,help`` to list all possible
+parameters.
+
+``max_ioqpairs=UINT32`` (default: ``64``)
+  Set the maximum number of allowed I/O queue pairs. This replaces the
+  deprecated ``num_queues`` parameter.
+
+``msix_qsize=UINT16`` (default: ``65``)
+  The number of MSI-X vectors that the device should support.
+
+``mdts=UINT8`` (default: ``7``)
+  Set the Maximum Data Transfer Size of the device.
+
+``use-intel-id`` (default: ``off``)
+  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
+  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
+  previously used.
+
+Additional Namespaces
+---------------------
+
+In the simplest possible invocation sketched above, the device only supports a
+single namespace with the namespace identifier ``1``. To support multiple
+namespaces and additional features, the ``nvme-ns`` device must be used.
+
+.. code-block:: console
+
+    -device nvme,id=nvme-ctrl-0,serial=deadbeef
+    -drive file=nvm-1.img,if=none,id=nvm-1
+    -device nvme-ns,drive=nvm-1
+    -drive file=nvm-2.img,if=none,id=nvm-2
+    -device nvme-ns,drive=nvm-2
+
+The namespaces defined by the ``nvme-ns`` device will attach to the most
+recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
+identifiers are allocated automatically, starting from ``1``.
+
+There are a number of parameters available:
+
+``nsid`` (default: ``0``)
+  Explicitly set the namespace identifier.
+
+``uuid`` (default: *autogenerated*)
+  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
+  descriptor in the Namespace Identification Descriptor List.
+
+``eui64``
+  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
+  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
+  Since machine type 6.1 a non-zero default value is used if the parameter
+  is not provided. For earlier machine types the field defaults to 0.
+
+``bus``
+  If there are more ``nvme`` devices defined, this parameter may be used to
+  attach the namespace to a specific ``nvme`` device (identified by an ``id``
+  parameter on the controller device).
+
+NVM Subsystems
+--------------
+
+Additional features become available if the controller device (``nvme``) is
+linked to an NVM Subsystem device (``nvme-subsys``).
+
+The NVM Subsystem emulation allows features such as shared namespaces and
+multipath I/O.
+
+.. code-block:: console
+
+    -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
+    -device nvme,serial=a,subsys=nvme-subsys-0
+    -device nvme,serial=b,subsys=nvme-subsys-0
+
+This will create an NVM subsystem with two controllers. Having controllers
+linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:
+
+``shared`` (default: ``on`` since 6.2)
+  Specifies that the namespace will be attached to all controllers in the
+  subsystem. If set to ``off``, the namespace will remain a private namespace
+  and may only be attached to a single controller at a time. Shared namespaces
+  are always automatically attached to all controllers (also when controllers
+  are hotplugged).
+
+``detached`` (default: ``off``)
+  If set to ``on``, the namespace will be available in the subsystem, but
+  not attached to any controllers initially. A shared namespace with this set
+  to ``on`` will never be automatically attached to controllers.
+
+Thus, adding
+
+.. code-block:: console
+
+    -drive file=nvm-1.img,if=none,id=nvm-1
+    -device nvme-ns,drive=nvm-1,nsid=1
+    -drive file=nvm-2.img,if=none,id=nvm-2
+    -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on
+
+will cause NSID 1 to be a shared namespace that is initially attached to both
+controllers. NSID 3 will be a private namespace due to ``shared=off`` and only
+attachable to a single controller at a time. Additionally it will not be
+attached to any controller initially (due to ``detached=on``) or to hotplugged
+controllers.
+
+Optional Features
+=================
+
+Controller Memory Buffer
+------------------------
+
+``nvme`` device parameters related to the Controller Memory Buffer support:
+
+``cmb_size_mb=UINT32`` (default: ``0``)
+  This adds a Controller Memory Buffer of the given size at offset zero in BAR
+  2.
+
+``legacy-cmb`` (default: ``off``)
+  By default, the device uses the "v1.4 scheme" for the Controller Memory
+  Buffer support (i.e. the CMB is initially disabled and must be explicitly
+  enabled by the host). Set this to ``on`` to behave as a v1.3 device wrt. the
+  CMB.
+
+Simple Copy
+-----------
+
+The device includes support for TP 4065 ("Simple Copy Command"). A number of
+additional ``nvme-ns`` device parameters may be used to control the Copy
+command limits:
+
+``mssrl=UINT16`` (default: ``128``)
+  Set the Maximum Single Source Range Length (``MSSRL``). This is the maximum
+  number of logical blocks that may be specified in each source range.
+
+``mcl=UINT32`` (default: ``128``)
+  Set the Maximum Copy Length (``MCL``). This is the maximum number of logical
+  blocks that may be specified in a Copy command (the total for all source
+  ranges).
+
+``msrc=UINT8`` (default: ``127``)
+  Set the Maximum Source Range Count (``MSRC``). This is the maximum number of
+  source ranges that may be used in a Copy command. This is a 0-based value.
+
+Zoned Namespaces
+----------------
+
+A namespace may be "Zoned" as defined by TP 4053 ("Zoned Namespaces"). Set
+``zoned=on`` on an ``nvme-ns`` device to configure it as a zoned namespace.
+
+The namespace may be configured with additional parameters:
+
+``zoned.zone_size=SIZE`` (default: ``128MiB``)
+  Define the zone size (``ZSZE``).
+
+``zoned.zone_capacity=SIZE`` (default: ``0``)
+  Define the zone capacity (``ZCAP``). If left at the default (``0``), the zone
+  capacity will equal the zone size.
+
+``zoned.descr_ext_size=UINT32`` (default: ``0``)
+  Set the Zone Descriptor Extension Size (``ZDES``). Must be a multiple of 64
+  bytes.
+
+``zoned.cross_read=BOOL`` (default: ``off``)
+  Set to ``on`` to allow reads to cross zone boundaries.
+
+``zoned.max_active=UINT32`` (default: ``0``)
+  Set the maximum number of active resources (``MAR``). The default (``0``)
+  allows all zones to be active.
+
+``zoned.max_open=UINT32`` (default: ``0``)
+  Set the maximum number of open resources (``MOR``). The default (``0``)
+  allows all zones to be open. If ``zoned.max_active`` is specified, this value
+  must be less than or equal to that.
+
+``zoned.zasl=UINT8`` (default: ``0``)
+  Set the maximum data transfer size for the Zone Append command. Like
+  ``mdts``, the value is specified as a power of two (2^n) and is in units of
+  the minimum memory page size (CAP.MPSMIN). The default value (``0``)
+  has this property inherit the ``mdts`` value.
+
+Metadata
+--------
+
+The virtual namespace device supports LBA metadata in the form of separate
+metadata (``MPTR``-based) and extended LBAs.
+
+``ms=UINT16`` (default: ``0``)
+  Defines the number of metadata bytes per LBA.
+
+``mset=UINT8`` (default: ``0``)
+  Set to ``1`` to enable extended LBAs.
+
+End-to-End Data Protection
+--------------------------
+
+The virtual namespace device supports DIF- and DIX-based protection information
+(depending on ``mset``).
+
+``pi=UINT8`` (default: ``0``)
+  Enable protection information of the specified type (type ``1``, ``2`` or
+  ``3``).
+
+``pil=UINT8`` (default: ``0``)
+  Controls the location of the protection information within the metadata. Set
+  to ``1`` to transfer protection information as the first eight bytes of
+  metadata. Otherwise, the protection information is transferred as the last
+  eight bytes.
diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
new file mode 100644
index 000000000..afb7d6c22
--- /dev/null
+++ b/docs/system/devices/usb.rst
@@ -0,0 +1,351 @@
+.. _pcsys_005fusb:
+
+USB emulation
+-------------
+
+QEMU can emulate a PCI UHCI, OHCI, EHCI or XHCI USB controller. You can
+plug in virtual USB devices or real host USB devices (the latter only
+works with certain host operating systems). QEMU will automatically
+create and connect virtual USB hubs as necessary to connect multiple
+USB devices.
+
+USB controllers
+~~~~~~~~~~~~~~~
+
+XHCI controller support
+^^^^^^^^^^^^^^^^^^^^^^^
+
+QEMU has XHCI host adapter support. The XHCI hardware design is much
+more virtualization-friendly when compared to EHCI and UHCI, thus XHCI
+emulation uses fewer resources (especially CPU).
So if your guest
+supports XHCI (which should be the case for any operating system
+released around 2010 or later) we recommend using it::
+
+    qemu -device qemu-xhci
+
+XHCI supports USB 1.1, USB 2.0 and USB 3.0 devices, so this is the
+only controller you need. With only a single USB controller (and
+therefore only a single USB bus) present in the system there is no
+need to use the bus= parameter when adding USB devices.
+
+
+EHCI controller support
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The QEMU EHCI Adapter supports USB 2.0 devices. It can be used either
+standalone or with companion controllers (UHCI, OHCI) for USB 1.1
+devices. The companion controller setup is more convenient to use
+because it provides a single USB bus supporting both USB 2.0 and USB
+1.1 devices. See the next section for details.
+
+When running EHCI in standalone mode you can add UHCI or OHCI
+controllers for USB 1.1 devices too. Each controller creates its own
+bus though, so there are two completely separate USB buses: one USB
+1.1 bus driven by the UHCI controller and one USB 2.0 bus driven by
+the EHCI controller. Devices must be attached to the correct
+controller manually.
+
+The easiest way to add a UHCI controller to a ``pc`` machine is the
+``-usb`` switch. QEMU will create the UHCI controller as a function of
+the PIIX3 chipset. The USB 1.1 bus will carry the name ``usb-bus.0``.
+
+You can use the standard ``-device`` switch to add an EHCI controller to
+your virtual machine. It is strongly recommended to specify an ID for
+the controller so the USB 2.0 bus gets an individual name, for example
+``-device usb-ehci,id=ehci``. This will give you a USB 2.0 bus named
+``ehci.0``.
+
+When adding USB devices using the ``-device`` switch you can specify the
+bus they should be attached to. Here is a complete example:
+
+.. parsed-literal::
+
+   |qemu_system| -M pc ${otheroptions} \\
+       -drive if=none,id=usbstick,format=raw,file=/path/to/image \\
+       -usb \\
+       -device usb-ehci,id=ehci \\
+       -device usb-tablet,bus=usb-bus.0 \\
+       -device usb-storage,bus=ehci.0,drive=usbstick
+
+This attaches a USB tablet to the UHCI adapter and a USB mass storage
+device to the EHCI adapter.
+
+
+Companion controller support
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The UHCI and OHCI controllers can attach to a USB bus created by EHCI
+as companion controllers. This is done by specifying the ``masterbus``
+and ``firstport`` properties. ``masterbus`` specifies the bus name the
+controller should attach to. ``firstport`` specifies the first port the
+controller should attach to, which is needed as usually one EHCI
+controller with six ports has three UHCI companion controllers with
+two ports each.
+
+There is a config file in docs which will do all this for
+you, which you can use like this:
+
+.. parsed-literal::
+
+   |qemu_system| -readconfig docs/config/ich9-ehci-uhci.cfg
+
+Then use ``bus=ehci.0`` to assign your USB devices to that bus.
+
+Using the ``-usb`` switch for ``q35`` machines will create a similar
+USB controller configuration.
+
+
+.. _Connecting USB devices:
+
+Connecting USB devices
+~~~~~~~~~~~~~~~~~~~~~~
+
+USB devices can be connected with the ``-device usb-...`` command line
+option or the ``device_add`` monitor command. Available devices are:
+
+``usb-mouse``
+   Virtual Mouse. This will override the PS/2 mouse emulation when
+   activated.
+
+``usb-tablet``
+   Pointer device that uses absolute coordinates (like a touchscreen).
+   This means QEMU is able to report the mouse position without having
+   to grab the mouse.
Also overrides the PS/2 mouse emulation when + activated. + +``usb-storage,drive=drive_id`` + Mass storage device backed by drive_id (see the :ref:`disk images` + chapter in the System Emulation Users Guide). This is the classic + bulk-only transport protocol used by 99% of USB sticks. This + example shows it connected to an XHCI USB controller and with + a drive backed by a raw format disk image: + + .. parsed-literal:: + + |qemu_system| [...] \\ + -drive if=none,id=stick,format=raw,file=/path/to/file.img \\ + -device nec-usb-xhci,id=xhci \\ + -device usb-storage,bus=xhci.0,drive=stick + +``usb-uas`` + USB attached SCSI device. This does not create a SCSI disk, so + you need to explicitly create a ``scsi-hd`` or ``scsi-cd`` device + on the command line, as well as using the ``-drive`` option to + specify what those disks are backed by. One ``usb-uas`` device can + handle multiple logical units (disks). This example creates three + logical units: two disks and one cdrom drive: + + .. parsed-literal:: + + |qemu_system| [...] \\ + -drive if=none,id=uas-disk1,format=raw,file=/path/to/file1.img \\ + -drive if=none,id=uas-disk2,format=raw,file=/path/to/file2.img \\ + -drive if=none,id=uas-cdrom,media=cdrom,format=raw,file=/path/to/image.iso \\ + -device nec-usb-xhci,id=xhci \\ + -device usb-uas,id=uas,bus=xhci.0 \\ + -device scsi-hd,bus=uas.0,scsi-id=0,lun=0,drive=uas-disk1 \\ + -device scsi-hd,bus=uas.0,scsi-id=0,lun=1,drive=uas-disk2 \\ + -device scsi-cd,bus=uas.0,scsi-id=0,lun=5,drive=uas-cdrom + +``usb-bot`` + Bulk-only transport storage device. This presents the guest with the + same USB bulk-only transport protocol interface as ``usb-storage``, but + the QEMU command line option works like ``usb-uas`` and does not + automatically create SCSI disks for you. ``usb-bot`` supports up to + 16 LUNs. Unlike ``usb-uas``, the LUN numbers must be continuous, + i.e. for three devices you must use 0+1+2. The 0+1+5 numbering from the + ``usb-uas`` example above won't work with ``usb-bot``. + +``usb-mtp,rootdir=dir`` + Media transfer protocol device, using dir as root of the file tree + that is presented to the guest. + +``usb-host,hostbus=bus,hostaddr=addr`` + Pass through the host device identified by bus and addr + +``usb-host,vendorid=vendor,productid=product`` + Pass through the host device identified by vendor and product ID + +``usb-wacom-tablet`` + Virtual Wacom PenPartner tablet. This device is similar to the + ``tablet`` above but it can be used with the tslib library because in + addition to touch coordinates it reports touch pressure. + +``usb-kbd`` + Standard USB keyboard. Will override the PS/2 keyboard (if present). + +``usb-serial,chardev=id`` + Serial converter. This emulates an FTDI FT232BM chip connected to + host character device id. + +``usb-braille,chardev=id`` + Braille device. This will use BrlAPI to display the braille output on + a real or fake device referenced by id. + +``usb-net[,netdev=id]`` + Network adapter that supports CDC ethernet and RNDIS protocols. id + specifies a netdev defined with ``-netdev …,id=id``. For instance, + user-mode networking can be used with + + .. parsed-literal:: + + |qemu_system| [...] 
-netdev user,id=net0 -device usb-net,netdev=net0
+
+``usb-ccid``
+   Smartcard reader device
+
+``usb-audio``
+   USB audio device
+
+``u2f-{emulated,passthru}``
+   Universal Second Factor device
+
+Physical port addressing
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For all the above USB devices, by default QEMU will plug the device
+into the next available port on the specified USB bus, or onto
+some available USB bus if you didn't specify one explicitly.
+If you need to, you can also specify the physical port where
+the device will show up in the guest. This can be done using the
+``port`` property. UHCI has two root ports (1,2). EHCI has six root
+ports (1-6), and the emulated (1.1) USB hub has eight ports.
+
+Plugging a tablet into UHCI port 1 works like this::
+
+    -device usb-tablet,bus=usb-bus.0,port=1
+
+Plugging a hub into UHCI port 2 works like this::
+
+    -device usb-hub,bus=usb-bus.0,port=2
+
+Plugging a virtual USB stick into port 4 of the hub just plugged in
+works this way::
+
+    -device usb-storage,bus=usb-bus.0,port=2.4,drive=...
+
+In the monitor, the ``device_add`` command also accepts a ``port``
+property specification. If you want to unplug devices too you should
+specify some unique id which you can use to refer to the device.
+You can then use ``device_del`` to unplug the device later.
+For example::
+
+    (qemu) device_add usb-tablet,bus=usb-bus.0,port=1,id=my-tablet
+    (qemu) device_del my-tablet
+
+Hotplugging USB storage
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``usb-bot`` and ``usb-uas`` devices can be hotplugged. In the hotplug
+case they are added with ``attached = false`` so the guest will not see
+the device until the ``attached`` property is explicitly set to true.
+That allows you to attach one or more scsi devices before making the
+device visible to the guest. The workflow looks like this:
+
+#. ``device_add usb-bot,id=foo``
+#. ``device_add scsi-{hd,cd},bus=foo.0,lun=0``
+#. optionally add more devices (luns 1 ... 15)
+#. ``scripts/qmp/qom-set foo.attached = true``
+
+.. _host_005fusb_005fdevices:
+
+Using host USB devices on a Linux host
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+WARNING: this is an experimental feature. QEMU will slow down when using
+it. USB devices requiring real time streaming (i.e. USB Video Cameras)
+are not supported yet.
+
+1. If you use an early Linux 2.4 kernel, verify that no Linux driver is
+   actually using the USB device. A simple way to do that is to
+   disable the corresponding kernel module by renaming it from
+   ``mydriver.o`` to ``mydriver.o.disabled``.
+
+2. Verify that ``/proc/bus/usb`` is working (most Linux distributions
+   should enable it by default). You should see something like this:
+
+   ::
+
+      ls /proc/bus/usb
+      001  devices  drivers
+
+3. Since only root can access the USB devices directly, you can
+   either launch QEMU as root or change the permissions of the USB
+   devices you want to use. For testing, the following suffices:
+
+   ::
+
+      chown -R myuid /proc/bus/usb
+
+4. Launch QEMU and do in the monitor:
+
+   ::
+
+      info usbhost
+      Device 1.2, speed 480 Mb/s
+        Class 00: USB device 1234:5678, USB DISK
+
+   You should see the list of the devices you can use (never try to use
+   hubs; it won't work).
+
+5. Add the device in QEMU by using:
+
+   ::
+
+      device_add usb-host,vendorid=0x1234,productid=0x5678
+
+   Normally the guest OS should report that a new USB device is plugged
+   in. You can use the option ``-device usb-host,...`` to do the same.
+
+6. Now you can try to use the host USB device in QEMU.
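+
+As a sketch, the command line equivalent of steps 4 and 5 (reusing the
+illustrative vendor and product IDs from the monitor example above) is:
+
+.. parsed-literal::
+
+   |qemu_system| -usb -device usb-host,vendorid=0x1234,productid=0x5678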
+
+When relaunching QEMU, you may have to unplug and replug the USB
+device to make it work again (this is a bug).
+
+``usb-host`` properties for specifying the host device
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The example above uses the ``vendorid`` and ``productid`` to
+specify which host device to pass through, but this is not
+the only way to specify the host device. ``usb-host`` supports
+the following properties:
+
+``hostbus=<nr>``
+  Specifies the bus number the device must be attached to
+``hostaddr=<nr>``
+  Specifies the device address the device got assigned by the host OS
+``hostport=<str>``
+  Specifies the physical port the device is attached to
+``vendorid=<hexnr>``
+  Specifies the vendor ID of the device
+``productid=<hexnr>``
+  Specifies the product ID of the device.
+
+In theory you can combine all these properties as you like. In
+practice only a few combinations are useful:
+
+- ``vendorid`` and ``productid`` -- match for a specific device, pass it to
+  the guest when it shows up somewhere in the host.
+
+- ``hostbus`` and ``hostport`` -- match for a specific physical port in the
+  host, any device which is plugged in there gets passed to the
+  guest.
+
+- ``hostbus`` and ``hostaddr`` -- most useful for ad-hoc pass through as the
+  hostaddr isn't stable. The next time you plug the device into the host it
+  will get a new hostaddr.
+
+Note that on the host USB 1.1 devices are handled by UHCI/OHCI and USB
+2.0 by EHCI. That means different USB devices plugged into the very
+same physical port on the host may show up on different host buses
+depending on the speed. Supposing that devices plugged into a given
+physical port appear as bus 1 + port 1 for 2.0 devices and bus 3 + port 1
+for 1.1 devices, you can pass through any device plugged into that port
+and also assign it to the correct USB bus in QEMU like this:
+
+.. parsed-literal::
+
+   |qemu_system| -M pc [...] \\
+       -usb \\
+       -device usb-ehci,id=ehci \\
+       -device usb-host,bus=usb-bus.0,hostbus=3,hostport=1 \\
+       -device usb-host,bus=ehci.0,hostbus=1,hostport=1
diff --git a/docs/system/devices/vhost-user-rng.rst b/docs/system/devices/vhost-user-rng.rst
new file mode 100644
index 000000000..a145d4105
--- /dev/null
+++ b/docs/system/devices/vhost-user-rng.rst
@@ -0,0 +1,39 @@
+QEMU vhost-user-rng - RNG emulation
+===================================
+
+Background
+----------
+
+What follows builds on the material presented in vhost-user.rst - it should
+be reviewed before moving forward with the content in this file.
+
+Description
+-----------
+
+The vhost-user-rng device implementation was designed to work with a random
+number generator daemon such as the one found in the vhost-device crate of
+the rust-vmm project available on github [1].
+
+[1]. https://github.com/rust-vmm/vhost-device
+
+Examples
+--------
+
+The daemon should be started first:
+
+::
+
+  host# vhost-device-rng --socket-path=rng.sock -c 1 -m 512 -p 1000
+
+The QEMU invocation needs to create a chardev socket the device can
+use to communicate as well as share the guest's memory over a memfd.
+
+::
+
+  host# qemu-system \
+      -chardev socket,path=$(PATH)/rng.sock,id=rng0 \
+      -device vhost-user-rng-pci,chardev=rng0 \
+      -m 4096 \
+      -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
+      -numa node,memdev=mem \
+      ...
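+
+Once the guest has booted, the device is driven by the guest's normal
+virtio-rng driver. As a hedged sketch for a Linux guest with the hwrng
+framework enabled (the sysfs path is the usual location but may vary
+with kernel configuration), you can check which provider is active and
+read some entropy from it:
+
+::
+
+  guest# cat /sys/devices/virtual/misc/hw_random/rng_current
+  guest# dd if=/dev/hwrng bs=16 count=1 | od -A x -t x1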
diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
new file mode 100644
index 000000000..86128114f
--- /dev/null
+++ b/docs/system/devices/vhost-user.rst
@@ -0,0 +1,59 @@
+.. _vhost_user:
+
+vhost-user back ends
+--------------------
+
+vhost-user back ends are a way to service the requests of VirtIO
+devices outside of QEMU itself. To do this there are a number of
+things required.
+
+vhost-user device
+===================
+
+These are simple stub devices that ensure the VirtIO device is visible
+to the guest. The code is mostly boilerplate although each device has
+a ``chardev`` option which specifies the ID of the ``--chardev``
+device that connects via a socket to the vhost-user *daemon*.
+
+vhost-user daemon
+=================
+
+This is a separate process that is connected to by QEMU via a socket
+following the :ref:`vhost_user_proto`. There are a number of daemons
+that can be built as part of the QEMU project when enabled, although
+any daemon that meets the specification for a given device can be used.
+
+Shared memory object
+====================
+
+In order for the daemon to access the VirtIO queues to process the
+requests it needs access to the guest's address space. This is
+achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
+objects. A reference to a file-descriptor which can access this object
+will be passed via the socket as part of the protocol negotiation.
+
+Currently the shared memory object needs to match the size of the main
+system memory as defined by the ``-m`` argument.
+
+Example
+=======
+
+First start your daemon.
+
+.. parsed-literal::
+
+  $ virtio-foo --socket-path=/var/run/foo.sock $OTHER_ARGS
+
+Then you start your QEMU instance specifying the device, chardev and
+memory objects.
+
+.. parsed-literal::
+
+  $ |qemu_system| \\
+      -m 4096 \\
+      -chardev socket,id=ba1,path=/var/run/foo.sock \\
+      -device vhost-user-foo,chardev=ba1,$OTHER_ARGS \\
+      -object memory-backend-memfd,id=mem,size=4G,share=on \\
+      -numa node,memdev=mem \\
+      ...
+
diff --git a/docs/system/devices/virtio-pmem.rst b/docs/system/devices/virtio-pmem.rst
new file mode 100644
index 000000000..c82ac0673
--- /dev/null
+++ b/docs/system/devices/virtio-pmem.rst
@@ -0,0 +1,76 @@
+
+===========
+virtio pmem
+===========
+
+This document explains the setup and usage of the virtio pmem device.
+The virtio pmem device is a paravirtualized persistent memory device
+on regular (i.e. non-NVDIMM) storage.
+
+Usecase
+-------
+
+Virtio pmem allows the guest to bypass the guest page cache and directly
+use the host page cache. This reduces the guest memory footprint as the
+host can make efficient memory reclaim decisions under memory pressure.
+
+How does virtio-pmem compare to the nvdimm emulation?
+-----------------------------------------------------
+
+NVDIMM emulation on regular (i.e. non-NVDIMM) host storage does not
+persist the guest writes as there are no defined semantics in the device
+specification. The virtio pmem device provides guest write persistence
+on non-NVDIMM host storage.
+
+virtio pmem usage
+-----------------
+
+A virtio pmem device backed by a memory-backend-file can be created on
+the QEMU command line as in the following example::
+
+  -object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G
+  -device virtio-pmem-pci,memdev=mem1,id=nv1
+
+where:
+
+  - "object memory-backend-file,id=mem1,share,mem-path=<image>, size=<image size>"
+    creates a backend file with the specified size.
+
+  - "device virtio-pmem-pci,memdev=mem1,id=nv1" creates a virtio pmem
+    PCI device whose storage is provided by the above memory backend
+    device.
+
+Multiple virtio pmem devices can be created if multiple pairs of "-object"
+and "-device" are provided.
+
+Hotplug
+-------
+
+Virtio pmem devices can be hotplugged via the QEMU monitor. First, the
+memory backing has to be added via 'object_add'; afterwards, the virtio
+pmem device can be added via 'device_add'.
+
+For example, the following commands add another 4GB virtio pmem device to
+the guest::
+
+  (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
+  (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
+
+Guest Data Persistence
+----------------------
+
+Guest data persistence on non-NVDIMM storage requires guest userspace
+applications to perform fsync/msync. This is different from a real
+nvdimm backend, where no additional fsync/msync is required. The
+fsync/msync persists guest writes to the host backing file, which would
+otherwise remain in the host page cache, risking data loss in case of
+power failure.
+
+With the virtio pmem device, the MAP_SYNC mmap flag is not supported.
+This hints to the application that it must perform fsync for write
+persistence.
+
+Limitations
+-----------
+
+- Real nvdimm device backend is not supported.
+- virtio pmem hotunplug is not supported.
+- ACPI NVDIMM features like regions/namespaces are not supported.
+- The ndctl command is not supported.
diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
new file mode 100644
index 000000000..453eb73f6
--- /dev/null
+++ b/docs/system/gdb.rst
@@ -0,0 +1,194 @@
+.. _GDB usage:
+
+GDB usage
+---------
+
+QEMU supports working with gdb via gdb's remote-connection facility
+(the "gdbstub"). This allows you to debug guest code in the same
+way that you might with a low-level debug facility like JTAG
+on real hardware. You can stop and start the virtual machine,
+examine state like registers and memory, and set breakpoints and
+watchpoints.
+
+In order to use gdb, launch QEMU with the ``-s`` and ``-S`` options.
+The ``-s`` option will make QEMU listen for an incoming connection
+from gdb on TCP port 1234, and ``-S`` will make QEMU not start the
+guest until you tell it to from gdb. (If you want to specify which
+TCP port to use or to use something other than TCP for the gdbstub
+connection, use the ``-gdb dev`` option instead of ``-s``. See
+`Using unix sockets`_ for an example.)
+
+.. parsed-literal::
+
+   |qemu_system| -s -S -kernel bzImage -hda rootdisk.img -append "root=/dev/hda"
+
+QEMU will launch but will silently wait for gdb to connect.
+
+Then launch gdb on the 'vmlinux' executable::
+
+   > gdb vmlinux
+
+In gdb, connect to QEMU::
+
+   (gdb) target remote localhost:1234
+
+Then you can use gdb normally. For example, type 'c' to launch the
+kernel::
+
+   (gdb) c
+
+Here are some useful tips in order to use gdb on system code:
+
+1. Use ``info reg`` to display all the CPU registers.
+
+2. Use ``x/10i $eip`` to display the code at the PC position.
+
+3. Use ``set architecture i8086`` to dump 16 bit code. Then use
+   ``x/10i $cs*16+$eip`` to dump the code at the PC position.
+
+Debugging multicore machines
+============================
+
+GDB's abstraction for debugging targets with multiple possible
+parallel flows of execution is a two layer one: it supports multiple
+"inferiors", each of which can have multiple "threads".
When the QEMU
+machine has more than one CPU, QEMU exposes each CPU cluster as a
+separate "inferior", where each CPU within the cluster is a separate
+"thread". Most QEMU machine types have identical CPUs, so there is a
+single cluster which has all the CPUs in it. A few machine types are
+heterogeneous and have multiple clusters: for example the ``sifive_u``
+machine has a cluster with one E51 core and a second cluster with four
+U54 cores. Here the E51 is the only thread in the first inferior, and
+the U54 cores are all threads in the second inferior.
+
+When you connect gdb to the gdbstub, it will automatically
+connect to the first inferior; you can display the CPUs in this
+cluster using the gdb ``info threads`` command, and switch between
+them using gdb's usual thread-management commands.
+
+For multi-cluster machines, unfortunately gdb does not by default
+handle multiple inferiors, and so you have to explicitly connect
+to them. First, you must connect with the ``extended-remote``
+protocol, not ``remote``::
+
+    (gdb) target extended-remote localhost:1234
+
+Once connected, gdb will have a single inferior, for the
+first cluster. You need to create inferiors for the other
+clusters and attach to them, like this::
+
+  (gdb) add-inferior
+  Added inferior 2
+  (gdb) inferior 2
+  [Switching to inferior 2 [<null>] (<noexec>)]
+  (gdb) attach 2
+  Attaching to process 2
+  warning: No executable has been specified and target does not support
+  determining executable automatically.  Try using the "file" command.
+  0x00000000 in ?? ()
+
+Once you've done this, ``info threads`` will show CPUs in
+all the clusters you have attached to::
+
+  (gdb) info threads
+    Id   Target Id         Frame
+    1.1  Thread 1.1 (cortex-m33-arm-cpu cpu [running]) 0x00000000 in ?? ()
+  * 2.1  Thread 2.2 (cortex-m33-arm-cpu cpu [halted ]) 0x00000000 in ?? ()
+
+You probably also want to set gdb to ``schedule-multiple`` mode,
+so that when you tell gdb to ``continue`` it resumes all CPUs,
+not just those in the cluster you are currently working on::
+
+  (gdb) set schedule-multiple on
+
+Using unix sockets
+==================
+
+An alternate method for connecting gdb to the QEMU gdbstub is to use
+a unix socket (if supported by your operating system). This is useful when
+running several tests in parallel, or if you do not have a known free TCP
+port (e.g. when running automated tests).
+
+First create a chardev with the appropriate options, then
+instruct the gdbserver to use that device:
+
+.. parsed-literal::
+
+   |qemu_system| -chardev socket,path=/tmp/gdb-socket,server=on,wait=off,id=gdb0 -gdb chardev:gdb0 -S ...
+
+Start gdb as before, but this time connect using the path to
+the socket::
+
+   (gdb) target remote /tmp/gdb-socket
+
+Note that to use a unix socket for the connection you will need
+gdb version 9.0 or newer.
+
+Advanced debugging options
+==========================
+
+Changing single-stepping behaviour
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The default single stepping behavior is to step with the IRQs and timer
+service routines off. It is set this way because when gdb executes a
+single step it expects to advance beyond the current instruction. With
+the IRQs and timer service routines on, a single step might jump into
+one of the interrupt or exception vectors instead of executing the
+current instruction. This means you may hit the same breakpoint a number
+of times before executing the instruction gdb wants to have executed.
+Because there are rare circumstances where you want to single step into
+an interrupt vector, the behavior can be controlled from GDB. There are
+three commands you can use to query and set the single-step behavior:
+
+``maintenance packet qqemu.sstepbits``
+    This will display the MASK bits used to control the single stepping,
+    for example:
+
+    ::
+
+        (gdb) maintenance packet qqemu.sstepbits
+        sending: "qqemu.sstepbits"
+        received: "ENABLE=1,NOIRQ=2,NOTIMER=4"
+
+``maintenance packet qqemu.sstep``
+    This will display the current value of the mask used when single
+    stepping, for example:
+
+    ::
+
+        (gdb) maintenance packet qqemu.sstep
+        sending: "qqemu.sstep"
+        received: "0x7"
+
+``maintenance packet Qqemu.sstep=HEX_VALUE``
+    This will change the single step mask, so if you wanted to enable IRQs
+    on the single step, but not timers, you would use:
+
+    ::
+
+        (gdb) maintenance packet Qqemu.sstep=0x5
+        sending: "qemu.sstep=0x5"
+        received: "OK"
+
+Examining physical memory
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Another feature that the QEMU gdbstub provides is the ability to toggle
+the memory GDB works with. By default GDB will show the current process
+memory respecting the virtual address translation.
+
+If you want to examine/change the physical memory you can set the gdbstub
+to work with the physical memory rather than the virtual one.
+
+The memory mode can be checked by sending the following command:
+
+``maintenance packet qqemu.PhyMemMode``
+    This will return either 0 or 1; 1 indicates you are currently in the
+    physical memory mode.
+
+``maintenance packet Qqemu.PhyMemMode:1``
+    This will change the memory mode to physical memory.
+
+``maintenance packet Qqemu.PhyMemMode:0``
+    This will change it back to normal memory mode.
diff --git a/docs/system/generic-loader.rst b/docs/system/generic-loader.rst
new file mode 100644
index 000000000..4f9fb005f
--- /dev/null
+++ b/docs/system/generic-loader.rst
@@ -0,0 +1,120 @@
+..
+   Copyright (c) 2016, Xilinx Inc.
+
+   This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+   the COPYING file in the top-level directory.
+
+Generic Loader
+--------------
+
+The 'loader' device allows the user to load multiple images or values into
+QEMU at startup.
+
+Loading Data into Memory
+^^^^^^^^^^^^^^^^^^^^^^^^
+The loader device allows memory values to be set from the command line. This
+can be done by following the syntax below::
+
+  -device loader,addr=<addr>,data=<data>,data-len=<data-len> \
+                [,data-be=<data-be>][,cpu-num=<cpu-num>]
+
+``<addr>``
+  The address to store the data in.
+
+``<data>``
+  The value to be written to the address. The maximum size of the data
+  is 8 bytes.
+
+``<data-len>``
+  The length of the data in bytes. This argument must be included if
+  the data argument is.
+
+``<data-be>``
+  Set to true if the data to be stored on the guest should be written
+  as big endian data. The default is to write little endian data.
+
+``<cpu-num>``
+  The number of the CPU whose address space the data should be
+  loaded into. If not specified, the address space of the first CPU is used.
+
+All values are parsed using the standard QemuOpts parsing. This allows the user
+to specify any values in any format supported. By default the values
+will be parsed as decimal. To use hex values the user should prefix the number
+with a '0x'.
+
+An example of loading value 0x8000000e to address 0xfd1a0104 is::
+
+    -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4
+
+Setting a CPU's Program Counter
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The loader device allows the CPU's PC to be set from the command line. This
+can be done by following the syntax below::
+
+  -device loader,addr=<addr>,cpu-num=<cpu-num>
+
+``<addr>``
+  The value to use as the CPU's PC.
+
+``<cpu-num>``
+  The number of the CPU whose PC should be set to the specified value.
+
+All values are parsed using the standard QemuOpts parsing. This allows the user
+to specify any values in any format supported. By default the values
+will be parsed as decimal. To use hex values the user should prefix the number
+with a '0x'.
+
+An example of setting CPU 0's PC to 0x8000 is::
+
+    -device loader,addr=0x8000,cpu-num=0
+
+Loading Files
+^^^^^^^^^^^^^
+
+The loader device also allows files to be loaded into memory. It can load ELF,
+U-Boot, and Intel HEX executable formats as well as raw images. The syntax is
+shown below::
+
+  -device loader,file=<file>[,addr=<addr>][,cpu-num=<cpu-num>][,force-raw=<raw>]
+
+``<file>``
+  A file to be loaded into memory.
+
+``<addr>``
+  The memory address where the file should be loaded. This is required
+  for raw images and ignored for non-raw files.
+
+``<cpu-num>``
+  This specifies the CPU that should be used. This is an
+  optional argument and will cause the CPU's PC to be set to the
+  memory address where the raw file is loaded or the entry point
+  specified in the executable format header. This option should only
+  be used for the boot image. This will also cause the image to be
+  written to the specified CPU's address space. If not specified, the
+  default is CPU 0.
+
+``<force-raw>``
+  Setting 'force-raw=on' forces the file to be treated as a raw image.
+  This can be used to load supported executable formats as if they
+  were raw.
+
+All values are parsed using the standard QemuOpts parsing. This allows the user
+to specify any values in any format supported. By default the values
+will be parsed as decimal. To use hex values the user should prefix the number
+with a '0x'.
+
+An example of loading an ELF file which CPU0 will boot is shown below::
+
+  -device loader,file=./images/boot.elf,cpu-num=0
+
+Restrictions and ToDos
+^^^^^^^^^^^^^^^^^^^^^^
+
+At the moment it is just assumed that if you specify a cpu-num then
+you want to set the PC as well. This might not always be the case. In
+the future the internal state 'set_pc' (which exists in the generic loader
+now) should be exposed to the user so that they can choose if the PC
+is set or not.
+
+
diff --git a/docs/system/guest-loader.rst b/docs/system/guest-loader.rst
new file mode 100644
index 000000000..9ef9776bf
--- /dev/null
+++ b/docs/system/guest-loader.rst
@@ -0,0 +1,54 @@
+..
+   Copyright (c) 2020, Linaro
+
+Guest Loader
+------------
+
+The guest loader is similar to the ``generic-loader`` although it is
+aimed at the particular use case of loading hypervisor guests. This is
+useful for debugging hypervisors without having to jump through the
+hoops of firmware and boot-loaders.
+
+The guest loader does two things:
+
+  - loads blobs (kernels and initial ram disks) into memory
+  - sets platform FDT data so hypervisors can find and boot them
+
+This is what is typically done by a boot-loader like grub, using its
+multi-boot capability. A typical example would look like:
+
+.. parsed-literal::
+
+   |qemu_system| -kernel ~/xen.git/xen/xen \
+     -append "dom0_mem=1G,max:1G loglvl=all guest_loglvl=all" \
+     -device guest-loader,addr=0x42000000,kernel=Image,bootargs="root=/dev/sda2 ro console=hvc0 earlyprintk=xen" \
+     -device guest-loader,addr=0x47000000,initrd=rootfs.cpio
+
+In the above example the Xen hypervisor is loaded by the -kernel
+parameter and passed its boot arguments via -append. The Dom0 guest
+is loaded into memory by the two guest-loader devices. Each blob will
+get a ``/chosen/module@<addr>`` entry in the FDT to indicate its
+location and size. Additional information can be passed using
+additional arguments.
+
+Currently the only supported machines which use FDT data to boot are
+the ARM and RISC-V ``virt`` machines.
+
+Arguments
+^^^^^^^^^
+
+The full syntax of the guest-loader is::
+
+  -device guest-loader,addr=<addr>[,kernel=<file>,[bootargs=<args>]][,initrd=<file>]
+
+``addr=<addr>``
+  This is mandatory and indicates the start address of the blob.
+
+``kernel|initrd=<file>``
+  Indicates the filename of the kernel or initrd blob. Both blobs will
+  have the "multiboot,module" compatibility string as well as
+  "multiboot,kernel" or "multiboot,ramdisk" as appropriate.
+
+``bootargs=<args>``
+  This is an optional field for kernel blobs which will pass the
+  command line via the ``/chosen/module@<addr>/bootargs`` node.
diff --git a/docs/system/i386/cpu.rst b/docs/system/i386/cpu.rst
new file mode 100644
index 000000000..738719da9
--- /dev/null
+++ b/docs/system/i386/cpu.rst
@@ -0,0 +1 @@
+.. include:: ../cpu-models-x86.rst.inc
diff --git a/docs/system/i386/kvm-pv.rst b/docs/system/i386/kvm-pv.rst
new file mode 100644
index 000000000..1e5a9923e
--- /dev/null
+++ b/docs/system/i386/kvm-pv.rst
@@ -0,0 +1,100 @@
+Paravirtualized KVM features
+============================
+
+Description
+-----------
+
+In cases where implementing a hardware interface in software would be slow,
+``KVM`` implements its own paravirtualized interfaces.
+
+Setup
+-----
+
+Paravirtualized ``KVM`` features are represented as CPU flags. The following
+features are enabled by default for any CPU model when ``KVM`` acceleration is
+enabled:
+
+- ``kvmclock``
+- ``kvm-nopiodelay``
+- ``kvm-asyncpf``
+- ``kvm-steal-time``
+- ``kvm-pv-eoi``
+- ``kvmclock-stable-bit``
+
+The ``kvm-msi-ext-dest-id`` feature is enabled by default in x2apic mode with
+split irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic").
+
+Note: when CPU model ``host`` is used, QEMU passes through all supported
+paravirtualized ``KVM`` features to the guest.
+
+Existing features
+-----------------
+
+``kvmclock``
+  Expose a ``KVM`` specific paravirtualized clocksource to the guest. Supported
+  since Linux v2.6.26.
+
+``kvm-nopiodelay``
+  The guest doesn't need to perform delays on PIO operations. Supported since
+  Linux v2.6.26.
+
+``kvm-mmu``
+  This feature is deprecated.
+
+``kvm-asyncpf``
+  Enable asynchronous page fault mechanism. Supported since Linux v2.6.38.
+  Note: since Linux v5.10 the feature is deprecated and not enabled by ``KVM``.
+  Use ``kvm-asyncpf-int`` instead.
+
+``kvm-steal-time``
+  Enable stolen (when guest vCPU is not running) time accounting. Supported
+  since Linux v3.1.
+
+``kvm-pv-eoi``
+  Enable paravirtualized end-of-interrupt signaling. Supported since Linux
+  v3.10.
+
+``kvm-pv-unhalt``
+  Enable paravirtualized spinlocks support. Supported since Linux v3.12.
+
+``kvm-pv-tlb-flush``
+  Enable paravirtualized TLB flush mechanism. Supported since Linux v4.16.
+
+``kvm-pv-ipi``
+  Enable paravirtualized IPI mechanism. Supported since Linux v4.19.
+
+``kvm-poll-control``
+  Enable host-side polling on HLT control from the guest. Supported since Linux
+  v5.10.
+
+``kvm-pv-sched-yield``
+  Enable paravirtualized sched yield feature. Supported since Linux v5.10.
+
+``kvm-asyncpf-int``
+  Enable interrupt based asynchronous page fault mechanism. Supported since Linux
+  v5.10.
+
+``kvm-msi-ext-dest-id``
+  Support 'Extended Destination ID' for external interrupts. The feature allows
+  the guest to use up to 32768 CPUs without IRQ remapping (but other limits may
+  apply making the number of supported vCPUs for a given configuration lower).
+  Supported since Linux v5.10.
+
+``kvmclock-stable-bit``
+  Tell the guest that the guest-visible TSC value can be fully trusted for
+  kvmclock computations and no warps are expected. Supported since Linux v2.6.35.
+
+Supplementary features
+----------------------
+
+``kvm-pv-enforce-cpuid``
+  Limit the supported paravirtualized feature set to the exposed features only.
+  Note, by default, ``KVM`` allows the guest to use all currently supported
+  paravirtualized features even when they were not announced in guest visible
+  CPUIDs. Supported since Linux v5.10.
+
+
+Useful links
+------------
+
+Please refer to Documentation/virt/kvm in Linux for additional details.
diff --git a/docs/system/i386/microvm.rst b/docs/system/i386/microvm.rst
new file mode 100644
index 000000000..1675e37d3
--- /dev/null
+++ b/docs/system/i386/microvm.rst
@@ -0,0 +1,128 @@
+'microvm' virtual platform (``microvm``)
+========================================
+
+``microvm`` is a machine type inspired by ``Firecracker`` and
+constructed after its machine model.
+
+It's a minimalist machine type without ``PCI`` or ``ACPI`` support,
+designed for short-lived guests. microvm also establishes a baseline
+for benchmarking and optimizing both QEMU and guest operating systems,
+since it is optimized for both boot time and footprint.
+
+
+Supported devices
+-----------------
+
+The microvm machine type supports the following devices:
+
+- ISA bus
+- i8259 PIC (optional)
+- i8254 PIT (optional)
+- MC146818 RTC (optional)
+- One ISA serial port (optional)
+- LAPIC
+- IOAPIC (with kernel-irqchip=split by default)
+- kvmclock (if using KVM)
+- fw_cfg
+- Up to eight virtio-mmio devices (configured by the user)
+
+
+Limitations
+-----------
+
+Currently, microvm does *not* support the following features:
+
+- PCI-only devices.
+- Hotplug of any kind.
+- Live migration across QEMU versions.
+
+
+Using the microvm machine type
+------------------------------
+
+Machine-specific options
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+It supports the following machine-specific options:
+
+- microvm.x-option-roms=bool (Set off to disable loading option ROMs)
+- microvm.pit=OnOffAuto (Enable i8254 PIT)
+- microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
+- microvm.pic=OnOffAuto (Enable i8259 PIC)
+- microvm.rtc=OnOffAuto (Enable MC146818 RTC)
+- microvm.auto-kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
+
+
+Boot options
+~~~~~~~~~~~~
+
+By default, microvm uses ``qboot`` as its BIOS, to obtain better boot
+times, but it's also compatible with ``SeaBIOS``.
+
+As no current firmware is able to boot from a block device using
+``virtio-mmio`` as its transport, a microvm-based VM needs to be run
+using a host-side kernel and, optionally, an initrd image.
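+
+If you want to try ``SeaBIOS`` instead of ``qboot``, the generic ``-bios``
+option can point QEMU at a different firmware image; the file name below is
+only an illustration, use whatever SeaBIOS build your distribution ships::
+
+  $ qemu-system-x86_64 -M microvm \
+     -bios /path/to/seabios-microvm.bin \
+     -kernel vmlinux -append "console=ttyS0" -nographic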
+
+
+Running a microvm-based VM
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, microvm aims for maximum compatibility, enabling both
+legacy and non-legacy devices. In this example, a VM is created
+without passing any additional machine-specific option, using the
+legacy ``ISA serial`` device as console::
+
+  $ qemu-system-x86_64 -M microvm \
+     -enable-kvm -cpu host -m 512m -smp 2 \
+     -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
+     -nodefaults -no-user-config -nographic \
+     -serial stdio \
+     -drive id=test,file=test.img,format=raw,if=none \
+     -device virtio-blk-device,drive=test \
+     -netdev tap,id=tap0,script=no,downscript=no \
+     -device virtio-net-device,netdev=tap0
+
+While the example above works, you might be interested in reducing the
+footprint further by disabling some legacy devices. If you're using
+``KVM``, you can disable the ``RTC``, making the Guest rely on
+``kvmclock`` exclusively. Additionally, if your host's CPUs have the
+``TSC_DEADLINE`` feature, you can also disable both the i8259 PIC and
+the i8254 PIT (make sure you're also emulating a CPU with such feature
+in the guest).
+
+This is an example of a VM with all optional legacy features
+disabled::
+
+  $ qemu-system-x86_64 \
+     -M microvm,x-option-roms=off,pit=off,pic=off,isa-serial=off,rtc=off \
+     -enable-kvm -cpu host -m 512m -smp 2 \
+     -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
+     -nodefaults -no-user-config -nographic \
+     -chardev stdio,id=virtiocon0 \
+     -device virtio-serial-device \
+     -device virtconsole,chardev=virtiocon0 \
+     -drive id=test,file=test.img,format=raw,if=none \
+     -device virtio-blk-device,drive=test \
+     -netdev tap,id=tap0,script=no,downscript=no \
+     -device virtio-net-device,netdev=tap0
+
+
+Triggering a guest-initiated shut down
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As the microvm machine type includes just a small set of system
+devices, some x86 mechanisms for rebooting or shutting down the
+system, like sending a key sequence to the keyboard or writing to an
+ACPI register, don't have any effect in the VM.
+
+The recommended way to trigger a guest-initiated shut down is by
+generating a ``triple-fault``, which will cause the VM to initiate a
+reboot. Additionally, if the ``-no-reboot`` argument is present in the
+command line, QEMU will detect this event and terminate its own
+execution gracefully.
+
+Linux does support this mechanism, but by default it will only be used
+after other options have been tried and failed, causing the reboot to
+be delayed by a small number of seconds. It's possible to instruct it
+to try the triple-fault mechanism first, by adding ``reboot=t`` to the
+kernel's command line.
diff --git a/docs/system/i386/pc.rst b/docs/system/i386/pc.rst
new file mode 100644
index 000000000..d543c11a5
--- /dev/null
+++ b/docs/system/i386/pc.rst
@@ -0,0 +1,7 @@
+i440fx PC (``pc-i440fx``, ``pc``)
+=================================
+
+Peripherals
+~~~~~~~~~~~
+
+.. include:: ../target-i386-desc.rst.inc
diff --git a/docs/system/i386/sgx.rst b/docs/system/i386/sgx.rst
new file mode 100644
index 000000000..f8fade5ac
--- /dev/null
+++ b/docs/system/i386/sgx.rst
@@ -0,0 +1,165 @@
+Software Guard eXtensions (SGX)
+===============================
+
+Overview
+--------
+
+Intel Software Guard eXtensions (SGX) is a set of instructions and mechanisms
+for memory accesses that provide secure access for sensitive applications
+and data.
+SGX allows an application to use its particular address space as an
+*enclave*, which is a protected area that provides confidentiality and
+integrity even in the presence of privileged malware. Accesses to the
+enclave memory area from any software not resident in the enclave are
+prevented, including those from privileged software.
+
+Virtual SGX
+-----------
+
+The SGX feature is exposed to the guest via the SGX CPUID leaves. For most
+of the SGX CPUID info we can report the same values to the guest as on the
+host. By reporting the same CPUID, the guest is able to use the full
+capacity of SGX, and KVM doesn't need to emulate that info.
+
+The guest's EPC base and size are determined by QEMU, and KVM needs QEMU to
+pass that information to it before it can initialize SGX for the guest.
+
+Virtual EPC
+~~~~~~~~~~~
+
+By default, QEMU does not assign EPC to a VM, i.e. fully enabling SGX in a VM
+requires explicit allocation of EPC to the VM. Similar to other specialized
+memory types, e.g. hugetlbfs, EPC is exposed as a memory backend.
+
+SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be realized
+prior to realizing the vCPUs themselves, which occurs long before generic
+devices are parsed and realized. This limitation means that EPC does not
+require -maxmem as EPC is not treated as {cold,hot}plugged memory.
+
+QEMU does not artificially restrict the number of EPC sections exposed to a
+guest, e.g. QEMU will happily allow you to create 64 1M EPC sections. Be aware
+that some kernels may not recognize all EPC sections, e.g. the Linux SGX driver
+is hardwired to support only 8 EPC sections.
+
+The following QEMU snippet creates two EPC sections, with 64M pre-allocated
+to the VM and an additional 28M mapped but not allocated::
+
+ -object memory-backend-epc,id=mem1,size=64M,prealloc=on \
+ -object memory-backend-epc,id=mem2,size=28M \
+ -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
+
+Note:
+
+The size and location of the virtual EPC are far less restricted compared
+to physical EPC. Because physical EPC is protected via range registers,
+the size of the physical EPC must be a power of two (though software sees
+a subset of the full EPC, e.g. 92M or 128M) and the EPC must be naturally
+aligned. KVM SGX's virtual EPC is purely a software construct and only
+requires the size and location to be page aligned. QEMU enforces that the
+EPC size is a multiple of 4k and will ensure the base of the EPC is 4k
+aligned. To simplify the implementation, EPC is always located above 4g in
+the guest physical address space.
+
+Migration
+~~~~~~~~~
+
+QEMU/KVM doesn't prevent live migrating SGX VMs, although from hardware's
+perspective, SGX doesn't support live migration, since both EPC and the SGX
+key hierarchy are bound to the physical platform. However, live migration
+can be supported in the sense that the guest software stack can recreate
+enclaves when it suffers a sudden loss of EPC, and guest enclaves can detect
+SGX keys changing and handle it gracefully. For instance, when ERESUME fails
+with #PF.SGX, guest software can gracefully detect it and recreate enclaves;
+and when an enclave fails to unseal sensitive information from outside, it
+can detect the error and the sensitive information can be provisioned to it
+again.
+
+CPUID
+~~~~~
+
+Due to its myriad dependencies, SGX is currently not listed as supported
+in any of QEMU's built-in CPU configurations.
+To expose SGX (and SGX Launch Control) to a guest, you must either use
+``-cpu host`` to pass-through the host CPU model, or explicitly enable
+SGX when using a built-in CPU model, e.g. via ``-cpu <model>,+sgx`` or
+``-cpu <model>,+sgx,+sgxlc``.
+
+All SGX sub-features enumerated through CPUID, e.g. SGX2, MISCSELECT,
+ATTRIBUTES, etc... can be restricted via CPUID flags. Be aware that enforcing
+restriction of MISCSELECT, ATTRIBUTES and XFRM requires intercepting ECREATE,
+i.e. may marginally reduce SGX performance in the guest. All SGX sub-features
+controlled via -cpu are prefixed with "sgx", e.g.::
+
+  $ qemu-system-x86_64 -cpu help | xargs printf "%s\n" | grep sgx
+  sgx
+  sgx-debug
+  sgx-encls-c
+  sgx-enclv
+  sgx-exinfo
+  sgx-kss
+  sgx-mode64
+  sgx-provisionkey
+  sgx-tokenkey
+  sgx1
+  sgx2
+  sgxlc
+
+The following QEMU snippet passes through the host CPU but restricts access to
+the provision and EINIT token keys::
+
+ -cpu host,-sgx-provisionkey,-sgx-tokenkey
+
+SGX sub-features cannot be emulated, i.e. sub-features that are not present
+in hardware cannot be forced on via '-cpu'.
+
+Virtualize SGX Launch Control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU SGX support for Launch Control (LC) is passive, in the sense that it
+does not actively change the LC configuration. QEMU SGX provides the user
+the ability to set/clear the CPUID flag (and by extension the associated
+IA32_FEATURE_CONTROL MSR bit in fw_cfg) and saves/restores the LE Hash MSRs
+when getting/putting guest state, but QEMU does not add new controls to
+directly modify the LC configuration. Similar to hardware behavior, locking
+the LC configuration to a non-Intel value is left to guest firmware. Unlike
+the host BIOS setting for SGX Launch Control (LC), there is by design no
+special BIOS setting for an SGX guest. If the host is in locked mode,
+creating a VM with SGX is still allowed.
+
+Feature Control
+~~~~~~~~~~~~~~~
+
+QEMU SGX updates the ``etc/msr_feature_control`` fw_cfg entry to set the SGX
+(bit 18) and SGX LC (bit 17) flags based on their respective CPUID support,
+i.e. existing guest firmware will automatically set SGX and SGX LC accordingly,
+assuming said firmware supports fw_cfg.msr_feature_control.
+
+Launching a guest
+-----------------
+
+To launch an SGX guest:
+
+.. parsed-literal::
+
+  |qemu_system_x86| \\
+   -cpu host,+sgx-provisionkey \\
+   -object memory-backend-epc,id=mem1,size=64M,prealloc=on \\
+   -object memory-backend-epc,id=mem2,size=28M \\
+   -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
+
+Utilizing SGX in the guest requires a kernel/OS with SGX support.
+The support can be determined in the guest by::
+
+  $ grep sgx /proc/cpuinfo
+
+and the SGX EPC info by::
+
+  $ dmesg | grep sgx
+  [    1.242142] sgx: EPC section 0x180000000-0x181bfffff
+  [    1.242319] sgx: EPC section 0x181c00000-0x1837fffff
+
+References
+----------
+
+- `SGX Homepage <https://software.intel.com/sgx>`__
+
+- `SGX SDK <https://github.com/intel/linux-sgx.git>`__
+
+- SGX specification: Intel SDM Volume 3
diff --git a/docs/system/images.rst b/docs/system/images.rst
new file mode 100644
index 000000000..d000bd6b6
--- /dev/null
+++ b/docs/system/images.rst
@@ -0,0 +1,85 @@
+.. _disk images:
+
+Disk Images
+-----------
+
+QEMU supports many disk image formats, including growable disk images
+(their size increases as non-empty sectors are written), compressed and
+encrypted disk images.
+
+.. _disk_005fimages_005fquickstart:
+
+Quick start for disk image creation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can create a disk image with the command::
+
+   qemu-img create myimage.img mysize
+
+where myimage.img is the disk image filename and mysize is its size in
+kilobytes. You can add an ``M`` suffix to give the size in megabytes and
+a ``G`` suffix for gigabytes.
+
+See the ``qemu-img`` invocation documentation for more information.
+
+.. _disk_005fimages_005fsnapshot_005fmode:
+
+Snapshot mode
+~~~~~~~~~~~~~
+
+If you use the option ``-snapshot``, all disk images are considered as
+read-only. When sectors are written, they are written to a temporary file
+created in ``/tmp``. You can however force the write back to the raw
+disk images by using the ``commit`` monitor command (or C-a s in the
+serial console).
+
+.. _vm_005fsnapshots:
+
+VM snapshots
+~~~~~~~~~~~~
+
+VM snapshots are snapshots of the complete virtual machine including CPU
+state, RAM, device state and the content of all the writable disks. In
+order to use VM snapshots, you must have at least one non-removable and
+writable block device using the ``qcow2`` disk image format. Normally
+this device is the first virtual hard drive.
+
+Use the monitor command ``savevm`` to create a new VM snapshot or
+replace an existing one. A human readable name can be assigned to each
+snapshot in addition to its numerical ID.
+
+Use ``loadvm`` to restore a VM snapshot and ``delvm`` to remove a VM
+snapshot. ``info snapshots`` lists the available snapshots with their
+associated information::
+
+   (qemu) info snapshots
+   Snapshot devices: hda
+   Snapshot list (from hda):
+   ID        TAG      VM SIZE                DATE       VM CLOCK
+   1         start        41M 2006-08-06 12:38:02   00:00:14.954
+   2                      40M 2006-08-06 12:43:29   00:00:18.633
+   3         msys         40M 2006-08-06 12:44:04   00:00:23.514
+
+A VM snapshot is made of a VM state info (its size is shown in
+``info snapshots``) and a snapshot of every writable disk image. The VM
+state info is stored in the first ``qcow2`` non-removable and writable
+block device. The disk image snapshots are stored in every disk image.
+The size of a snapshot in a disk image is difficult to evaluate and is
+not shown by ``info snapshots`` because the associated disk sectors are
+shared among all the snapshots to save disk space (otherwise each
+snapshot would need a full copy of all the disk images).
+
+When using the (unrelated) ``-snapshot`` option
+(:ref:`disk_005fimages_005fsnapshot_005fmode`),
+you can always make VM snapshots, but they are deleted as soon as you
+exit QEMU.
+
+VM snapshots currently have the following known limitations:
+
+- They cannot cope with removable devices if they are removed or
+  inserted after a snapshot is done.
+
+- A few device drivers still have incomplete snapshot support so their
+  state is not saved or restored properly (in particular USB).
+
+.. include:: qemu-block-drivers.rst.inc
diff --git a/docs/system/index.rst b/docs/system/index.rst
new file mode 100644
index 000000000..73bbedbc2
--- /dev/null
+++ b/docs/system/index.rst
@@ -0,0 +1,36 @@
+----------------
+System Emulation
+----------------
+
+This section of the manual is the overall guide for users using QEMU
+for full system emulation (as opposed to user-mode emulation).
+This includes working with hypervisors such as KVM, Xen, Hax
+or Hypervisor.Framework.
+
+.. toctree::
+   :maxdepth: 3
+
+   quickstart
+   invocation
+   device-emulation
+   keys
+   mux-chardev
+   monitor
+   images
+   virtio-net-failover
+   linuxboot
+   generic-loader
+   guest-loader
+   barrier
+   vnc-security
+   tls
+   secrets
+   authz
+   gdb
+   managed-startup
+   bootindex
+   cpu-hotplug
+   pr-manager
+   targets
+   security
+   multi-process
diff --git a/docs/system/invocation.rst b/docs/system/invocation.rst
new file mode 100644
index 000000000..4ba38fc23
--- /dev/null
+++ b/docs/system/invocation.rst
@@ -0,0 +1,18 @@
+.. _sec_005finvocation:
+
+Invocation
+----------
+
+.. parsed-literal::
+
+   |qemu_system| [options] [disk_image]
+
+disk_image is a raw hard disk image for IDE hard disk 0. Some targets do
+not need a disk image.
+
+.. hxtool-doc:: qemu-options.hx
+
+Device URL Syntax
+~~~~~~~~~~~~~~~~~
+
+.. include:: device-url-syntax.rst.inc
diff --git a/docs/system/keys.rst b/docs/system/keys.rst
new file mode 100644
index 000000000..e596ae6c4
--- /dev/null
+++ b/docs/system/keys.rst
@@ -0,0 +1,6 @@
+.. _pcsys_005fkeys:
+
+Keys in the graphical frontends
+-------------------------------
+
+.. include:: keys.rst.inc
diff --git a/docs/system/keys.rst.inc b/docs/system/keys.rst.inc
new file mode 100644
index 000000000..bd9b8e5f6
--- /dev/null
+++ b/docs/system/keys.rst.inc
@@ -0,0 +1,35 @@
+During the graphical emulation, you can use special key combinations to
+change modes. The default key mappings are shown below, but if you use
+``-alt-grab`` then the modifier is Ctrl-Alt-Shift (instead of Ctrl-Alt)
+and if you use ``-ctrl-grab`` then the modifier is the right Ctrl key
+(instead of Ctrl-Alt):
+
+Ctrl-Alt-f
+   Toggle full screen
+
+Ctrl-Alt-+
+   Enlarge the screen
+
+Ctrl-Alt\--
+   Shrink the screen
+
+Ctrl-Alt-u
+   Restore the screen's un-scaled dimensions
+
+Ctrl-Alt-n
+   Switch to virtual console 'n'. Standard console mappings are:
+
+   *1*
+      Target system display
+
+   *2*
+      Monitor
+
+   *3*
+      Serial port
+
+Ctrl-Alt
+   Toggle mouse and keyboard grab.
+
+In the virtual consoles, you can use Ctrl-Up, Ctrl-Down, Ctrl-PageUp and
+Ctrl-PageDown to move in the back log.
diff --git a/docs/system/linuxboot.rst b/docs/system/linuxboot.rst
new file mode 100644
index 000000000..228650abc
--- /dev/null
+++ b/docs/system/linuxboot.rst
@@ -0,0 +1,30 @@
+.. _direct_005flinux_005fboot:
+
+Direct Linux Boot
+-----------------
+
+This section explains how to launch a Linux kernel inside QEMU without
+having to make a full bootable image. It is very useful for fast Linux
+kernel testing.
+
+The syntax is:
+
+.. parsed-literal::
+
+   |qemu_system| -kernel bzImage -hda rootdisk.img -append "root=/dev/hda"
+
+Use ``-kernel`` to provide the Linux kernel image and ``-append`` to
+give the kernel command line arguments. The ``-initrd`` option can be
+used to provide an INITRD image.
+
+If you do not need graphical output, you can disable it and redirect the
+virtual serial port and the QEMU monitor to the console with the
+``-nographic`` option. The typical command line is:
+
+.. parsed-literal::
+
+   |qemu_system| -kernel bzImage -hda rootdisk.img \
+   -append "root=/dev/hda console=ttyS0" -nographic
+
+Use Ctrl-a c to switch between the serial console and the monitor (see
+:ref:`pcsys_005fkeys`).
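+
+As a sketch of the ``-initrd`` option mentioned above, the same command
+line extended with an initrd image would look like this (the initrd file
+name is only illustrative):
+
+.. parsed-literal::
+
+   |qemu_system| -kernel bzImage -initrd initrd.img -hda rootdisk.img \
+   -append "root=/dev/hda console=ttyS0" -nographic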
diff --git a/docs/system/managed-startup.rst b/docs/system/managed-startup.rst
new file mode 100644
index 000000000..9bcf98ea7
--- /dev/null
+++ b/docs/system/managed-startup.rst
@@ -0,0 +1,35 @@
+Managed start up options
+========================
+
+In system mode emulation, it's possible to create a VM in a paused
+state using the ``-S`` command line option. In this state the machine
+is completely initialized according to command line options and ready
+to execute VM code but VCPU threads are not executing any code. The VM
+state in this paused state depends on the way QEMU was started. It
+could be in:
+
+- initial state (after reset/power on state)
+- with direct kernel loading, the initial state could be amended to execute
+  code loaded by QEMU in the VM's RAM
+- with incoming migration, initial state will be amended with the migrated
+  machine state after migration completes
+
+This paused state is typically used by users to query machine state and/or
+additionally configure the machine (by hotplugging devices) in runtime before
+allowing VM code to run.
+
+However, at the ``-S`` pause point, it's impossible to configure options
+that affect initial VM creation (like: ``-smp``/``-m``/``-numa`` ...) or
+cold plug devices. The experimental ``--preconfig`` command line option
+allows pausing QEMU before the initial VM creation, in a "preconfig" state,
+where additional queries and configuration can be performed via QMP
+before moving on to the resulting configuration startup. In the
+preconfig state, QEMU only allows a limited set of commands over the
+QMP monitor, where the commands do not depend on an initialized
+machine, including but not limited to:
+
+- ``qmp_capabilities``
+- ``query-qmp-schema``
+- ``query-commands``
+- ``query-status``
+- ``x-exit-preconfig``
diff --git a/docs/system/monitor.rst b/docs/system/monitor.rst
new file mode 100644
index 000000000..ff5c43461
--- /dev/null
+++ b/docs/system/monitor.rst
@@ -0,0 +1,31 @@
+.. _QEMU monitor:
+
+QEMU Monitor
+------------
+
+The QEMU monitor is used to give complex commands to the QEMU emulator.
+You can use it to:
+
+- Remove or insert removable media images (such as CD-ROM or
+  floppies).
+
+- Freeze/unfreeze the Virtual Machine (VM) and save or restore its
+  state from a disk file.
+
+- Inspect the VM state without an external debugger.
+
+Commands
+~~~~~~~~
+
+The following commands are available:
+
+.. hxtool-doc:: hmp-commands.hx
+
+.. hxtool-doc:: hmp-commands-info.hx
+
+Integer expressions
+~~~~~~~~~~~~~~~~~~~
+
+The monitor understands integer expressions for every integer argument.
+You can use register names to get the value of specific CPU registers
+by prefixing them with *$*.
diff --git a/docs/system/multi-process.rst b/docs/system/multi-process.rst
new file mode 100644
index 000000000..210531ee1
--- /dev/null
+++ b/docs/system/multi-process.rst
@@ -0,0 +1,64 @@
+Multi-process QEMU
+==================
+
+This document describes how to configure and use multi-process qemu.
+For the design document refer to docs/devel/qemu-multiprocess.
+
+1) Configuration
+----------------
+
+Multi-process is enabled by default for targets that enable KVM.
+
+
+2) Usage
+--------
+
+Multi-process QEMU requires an orchestrator to launch.
+
+Following is a description of the command line used to launch mpqemu.
+
+* Orchestrator:
+
+  - The Orchestrator creates a unix socketpair
+
+  - It launches the remote process and passes one of the
+    sockets to it via command-line.
+ + - It then launches QEMU and specifies the other socket as an option + to the Proxy device object + +* Remote Process: + + - QEMU can enter remote process mode by using the "remote" machine + option. + + - The orchestrator creates a "remote-object" with details about + the device and the file descriptor for the device + + - The remaining options are no different from how one launches QEMU with + devices. + + - Example command-line for the remote process is as follows: + + /usr/bin/qemu-system-x86_64 \ + -machine x-remote \ + -device lsi53c895a,id=lsi0 \ + -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2 \ + -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0 \ + -object x-remote-object,id=robj1,devid=lsi0,fd=4, + +* QEMU: + + - Since parts of the RAM are shared between QEMU & remote process, a + memory-backend-memfd is required to facilitate this, as follows: + + -object memory-backend-memfd,id=mem,size=2G + + - A "x-pci-proxy-dev" device is created for each of the PCI devices emulated + in the remote process. A "socket" sub-option specifies the other end of + unix channel created by orchestrator. The "id" sub-option must be specified + and should be the same as the "id" specified for the remote PCI device + + - Example commandline for QEMU is as follows: + + -device x-pci-proxy-dev,id=lsi0,socket=3 diff --git a/docs/system/mux-chardev.rst b/docs/system/mux-chardev.rst new file mode 100644 index 000000000..05064068a --- /dev/null +++ b/docs/system/mux-chardev.rst @@ -0,0 +1,6 @@ +.. _keys in the character backend multiplexer: + +Keys in the character backend multiplexer +----------------------------------------- + +.. include:: mux-chardev.rst.inc diff --git a/docs/system/mux-chardev.rst.inc b/docs/system/mux-chardev.rst.inc new file mode 100644 index 000000000..84ea12cbf --- /dev/null +++ b/docs/system/mux-chardev.rst.inc @@ -0,0 +1,27 @@ +During emulation, if you are using a character backend multiplexer +(which is the default if you are using ``-nographic``) then several +commands are available via an escape sequence. These key sequences all +start with an escape character, which is Ctrl-a by default, but can be +changed with ``-echr``. The list below assumes you're using the default. + +Ctrl-a h + Print this help + +Ctrl-a x + Exit emulator + +Ctrl-a s + Save disk data back to file (if -snapshot) + +Ctrl-a t + Toggle console timestamps + +Ctrl-a b + Send break (magic sysrq in Linux) + +Ctrl-a c + Rotate between the frontends connected to the multiplexer (usually + this switches between the monitor and the console) + +Ctrl-a Ctrl-a + Send the escape character to the frontend diff --git a/docs/system/ppc/embedded.rst b/docs/system/ppc/embedded.rst new file mode 100644 index 000000000..cfffbda24 --- /dev/null +++ b/docs/system/ppc/embedded.rst @@ -0,0 +1,10 @@ +Embedded family boards +====================== + +- ``bamboo`` bamboo +- ``mpc8544ds`` mpc8544ds +- ``ppce500`` generic paravirt e500 platform +- ``ref405ep`` ref405ep +- ``sam460ex`` aCube Sam460ex +- ``taihu`` taihu +- ``virtex-ml507`` Xilinx Virtex ML507 reference design diff --git a/docs/system/ppc/powermac.rst b/docs/system/ppc/powermac.rst new file mode 100644 index 000000000..04334ba21 --- /dev/null +++ b/docs/system/ppc/powermac.rst @@ -0,0 +1,34 @@ +PowerMac family boards (``g3beige``, ``mac99``) +================================================================== + +Use the executable ``qemu-system-ppc`` to simulate a complete PowerMac +PowerPC system. 
+ +- ``g3beige`` Heathrow based PowerMAC +- ``mac99`` Mac99 based PowerMAC + +Supported devices +----------------- + +QEMU emulates the following PowerMac peripherals: + + * UniNorth or Grackle PCI Bridge + * PCI VGA compatible card with VESA Bochs Extensions + * 2 PMAC IDE interfaces with hard disk and CD-ROM support + * NE2000 PCI adapters + * Non Volatile RAM + * VIA-CUDA with ADB keyboard and mouse. + + +Missing devices +--------------- + + * To be identified + +Firmware +-------- + +Since version 0.9.1, QEMU uses OpenBIOS https://www.openbios.org/ for +the g3beige and mac99 PowerMac and the 40p machines. OpenBIOS is a free +(GPL v2) portable firmware implementation. The goal is to implement a +100% IEEE 1275-1994 (referred to as Open Firmware) compliant firmware. diff --git a/docs/system/ppc/powernv.rst b/docs/system/ppc/powernv.rst new file mode 100644 index 000000000..86186b7d2 --- /dev/null +++ b/docs/system/ppc/powernv.rst @@ -0,0 +1,192 @@ +PowerNV family boards (``powernv8``, ``powernv9``) +================================================================== + +PowerNV (as Non-Virtualized) is the "baremetal" platform using the +OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can +be used as an hypervisor OS, running KVM guests, or simply as a host +OS. + +The PowerNV QEMU machine tries to emulate a PowerNV system at the +level of the skiboot firmware, which loads the OS and provides some +runtime services. Power Systems have a lower firmware (HostBoot) that +does low level system initialization, like DRAM training. This is +beyond the scope of what QEMU addresses today. + +Supported devices +----------------- + + * Multi processor support for POWER8, POWER8NVL and POWER9. + * XSCOM, serial communication sideband bus to configure chiplets + * Simple LPC Controller + * Processor Service Interface (PSI) Controller + * Interrupt Controller, XICS (POWER8) and XIVE (POWER9) + * POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge + * Simple OCC is an on-chip microcontroller used for power management + tasks + * iBT device to handle BMC communication, with the internal BMC + simulator provided by QEMU or an external BMC such as an Aspeed + QEMU machine. + * PNOR containing the different firmware partitions. + +Missing devices +--------------- + +A lot is missing, among which : + + * POWER10 processor + * XIVE2 (POWER10) interrupt controller + * I2C controllers (yet to be merged) + * NPU/NPU2/NPU3 controllers + * EEH support for PCIe Host bridge controllers + * NX controller + * VAS controller + * chipTOD (Time Of Day) + * Self Boot Engine (SBE). + * FSI bus + +Firmware +-------- + +The OPAL firmware (OpenPower Abstraction Layer) for OpenPower systems +includes the runtime services ``skiboot`` and the bootloader kernel and +initramfs ``skiroot``. Source code can be found on GitHub: + + https://github.com/open-power. + +Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the `OpenPOWER <https://github.com/open-power/op-build/releases/>`__ site. + +QEMU includes a prebuilt image of ``skiboot`` which is updated when a +more recent version is required by the models. + +Boot options +------------ + +Here is a simple setup with one e1000e NIC : + +.. 
code-block:: bash + + $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 \ + -accel tcg,thread=single \ + -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 \ + -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \ + -kernel ./zImage.epapr \ + -initrd ./rootfs.cpio.xz \ + -nographic + +and a SATA disk : + +.. code-block:: bash + + -device ich9-ahci,id=sata0,bus=pcie.1,addr=0x0 \ + -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \ + -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \ + +Complex PCIe configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~ +Six PHBs are defined per chip (POWER9) but no default PCI layout is +provided (to be compatible with libvirt). One PCI device can be added +on any of the available PCIe slots using command line options such as: + +.. code-block:: bash + + -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 + -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0 + + -device megasas,id=scsi0,bus=pcie.0,addr=0x0 + -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none + -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 + +Here is a full example with two different storage controllers on +different PHBs, each with a disk, the second PHB is empty : + +.. code-block:: bash + + $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 -accel tcg,thread=single \ + -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \ + \ + -device megasas,id=scsi0,bus=pcie.0,addr=0x0 \ + -drive file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \ + -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \ + \ + -device pcie-pci-bridge,id=bridge1,bus=pcie.1,addr=0x0 \ + \ + -device ich9-ahci,id=sata0,bus=bridge1,addr=0x1 \ + -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \ + -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \ + -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=bridge1,addr=0x2 \ + -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \ + -device nec-usb-xhci,bus=bridge1,addr=0x7 \ + \ + -serial mon:stdio -nographic + +You can also use VIRTIO devices : + +.. code-block:: bash + + -drive file=./fedora-ppc64le.qcow2,if=none,snapshot=on,id=drive0 \ + -device virtio-blk-pci,drive=drive0,id=blk0,bus=pcie.0 \ + \ + -netdev tap,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,id=netdev0 \ + -device virtio-net-pci,netdev=netdev0,id=net0,bus=pcie.1 \ + \ + -fsdev local,id=fsdev0,path=$HOME,security_model=passthrough \ + -device virtio-9p-pci,fsdev=fsdev0,mount_tag=host,bus=pcie.2 + +Multi sockets +~~~~~~~~~~~~~ + +The number of sockets is deduced from the number of CPUs and the +number of cores. ``-smp 2,cores=1`` will define a machine with 2 +sockets of 1 core, whereas ``-smp 2,cores=2`` will define a machine +with 1 socket of 2 cores. ``-smp 8,cores=2``, 4 sockets of 2 cores. + +BMC configuration +~~~~~~~~~~~~~~~~~ + +OpenPOWER systems negotiate the shutdown and reboot with their +BMC. The QEMU PowerNV machine embeds an IPMI BMC simulator using the +iBT interface and should offer the same power features. + +If you want to define your own BMC, use ``-nodefaults`` and specify +one on the command line : + +.. 
code-block:: bash + + -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 + +The files `palmetto-SDR.bin <http://www.kaod.org/qemu/powernv/palmetto-SDR.bin>`__ +and `palmetto-FRU.bin <http://www.kaod.org/qemu/powernv/palmetto-FRU.bin>`__ +define a Sensor Data Record repository and a Field Replaceable Unit +inventory for a palmetto BMC. They can be used to extend the QEMU BMC +simulator. + +.. code-block:: bash + + -device ipmi-bmc-sim,sdrfile=./palmetto-SDR.bin,fruareasize=256,frudatafile=./palmetto-FRU.bin,id=bmc0 \ + -device isa-ipmi-bt,bmc=bmc0,irq=10 + +The PowerNV machine can also be run with an external IPMI BMC device +connected to a remote QEMU machine acting as BMC, using these options +: + +.. code-block:: bash + + -chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \ + -device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \ + -device isa-ipmi-bt,bmc=bmc0,irq=10 \ + -nodefaults + +NVRAM +~~~~~ + +Use a MTD drive to add a PNOR to the machine, and get a NVRAM : + +.. code-block:: bash + + -drive file=./witherspoon.pnor,format=raw,if=mtd + +CAVEATS +------- + + * No support for multiple HW threads (SMT=1). Same as pseries. + * CPU can hang when doing intensive I/Os. Use ``-append powersave=off`` in that case. diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst new file mode 100644 index 000000000..9beef3917 --- /dev/null +++ b/docs/system/ppc/ppce500.rst @@ -0,0 +1,164 @@ +ppce500 generic platform (``ppce500``) +====================================== + +QEMU for PPC supports a special ``ppce500`` machine designed for emulation and +virtualization purposes. + +Supported devices +----------------- + +The ``ppce500`` machine supports the following devices: + +* PowerPC e500 series core (e500v2/e500mc/e5500/e6500) +* Configuration, Control, and Status Register (CCSR) +* Multicore Programmable Interrupt Controller (MPIC) with MSI support +* 1 16550A UART device +* 1 Freescale MPC8xxx I2C controller +* 1 Pericom pt7c4338 RTC via I2C +* 1 Freescale MPC8xxx GPIO controller +* Power-off functionality via one GPIO pin +* 1 Freescale MPC8xxx PCI host controller +* VirtIO devices via PCI bus +* 1 Freescale Enhanced Triple Speed Ethernet controller (eTSEC) + +Hardware configuration information +---------------------------------- + +The ``ppce500`` machine automatically generates a device tree blob ("dtb") +which it passes to the guest, if there is no ``-dtb`` option. This provides +information about the addresses, interrupt lines and other configuration of +the various devices in the system. + +If users want to provide their own DTB, they can use the ``-dtb`` option. +These DTBs should have the following requirements: + +* The number of subnodes under /cpus node should match QEMU's ``-smp`` option +* The /memory reg size should match QEMU’s selected ram_size via ``-m`` + +Both ``qemu-system-ppc`` and ``qemu-system-ppc64`` provide emulation for the +following 32-bit PowerPC CPUs: + +* e500v2 +* e500mc + +Additionally ``qemu-system-ppc64`` provides support for the following 64-bit +PowerPC CPUs: + +* e5500 +* e6500 + +The CPU type can be specified via the ``-cpu`` command line. If not specified, +it creates a machine with e500v2 core. The following example shows an e6500 +based machine creation: + +.. code-block:: bash + + $ qemu-system-ppc64 -nographic -M ppce500 -cpu e6500 + +Boot options +------------ + +The ``ppce500`` machine can start using the standard -kernel functionality +for loading a payload like an OS kernel (e.g.: Linux), or U-Boot firmware. 
+
+When -bios is omitted, the default pc-bios/u-boot.e500 firmware image is used
+as the BIOS. QEMU follows the truth table below to select which payload to
+execute:
+
+===== ========== =======
+-bios -kernel    payload
+===== ========== =======
+N     N          u-boot
+N     Y          kernel
+Y     don't care u-boot
+===== ========== =======
+
+When both -bios and -kernel are present, QEMU loads U-Boot and U-Boot in
+turn automatically loads the kernel image specified by the -kernel parameter
+via U-Boot's built-in "bootm" command, hence a legacy uImage format is
+required in such a scenario.
+
+Running Linux kernel
+--------------------
+
+Linux mainline v5.11 release is tested at the time of writing. To build a
+Linux mainline kernel that can be booted by the ``ppce500`` machine in
+64-bit mode, simply configure the kernel using the defconfig configuration:
+
+.. code-block:: bash
+
+  $ export ARCH=powerpc
+  $ export CROSS_COMPILE=powerpc-linux-
+  $ make corenet64_smp_defconfig
+  $ make menuconfig
+
+then manually select the following configuration:
+
+  Platform support > Freescale Book-E Machine Type > QEMU generic e500 platform
+
+To boot the newly built Linux kernel in QEMU with the ``ppce500`` machine:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc64 -M ppce500 -cpu e5500 -smp 4 -m 2G \
+      -display none -serial stdio \
+      -kernel vmlinux \
+      -initrd /path/to/rootfs.cpio \
+      -append "root=/dev/ram"
+
+To build a Linux mainline kernel that can be booted by the ``ppce500`` machine
+in 32-bit mode, use the same 64-bit configuration steps except the defconfig
+file should use corenet32_smp_defconfig.
+
+To boot the 32-bit Linux kernel:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc{64|32} -M ppce500 -cpu e500mc -smp 4 -m 2G \
+      -display none -serial stdio \
+      -kernel vmlinux \
+      -initrd /path/to/rootfs.cpio \
+      -append "root=/dev/ram"
+
+Running U-Boot
+--------------
+
+U-Boot mainline v2021.07 release is tested at the time of writing. To build a
+U-Boot mainline bootloader that can be booted by the ``ppce500`` machine, use
+the qemu-ppce500_defconfig with similar commands as described above for Linux:
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=powerpc-linux-
+  $ make qemu-ppce500_defconfig
+
+You will get the u-boot file in the build tree.
+
+When U-Boot boots, you will notice the following if using ``-cpu e6500``:
+
+.. code-block:: none
+
+  CPU:   Unknown, Version: 0.0, (0x00000000)
+  Core:  e6500, Version: 2.0, (0x80400020)
+
+This is because we only specified a core name to QEMU and it does not have a
+meaningful SVR value which represents an actual SoC that integrates such a
+core. You can specify a real world SoC device that QEMU has built-in support
+for, but all these SoCs are e500v2-based MPC85xx series, hence you cannot
+test anything built for P4080 (e500mc), P5020 (e5500) and T2080 (e6500).
+
+By default a VirtIO standard PCI networking device is connected as an
+ethernet interface at PCI address 0.1.0, but we can switch that to an e1000
+NIC by:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc -M ppce500 -smp 4 -m 2G \
+      -display none -serial stdio \
+      -bios u-boot \
+      -nic tap,ifname=tap0,script=no,downscript=no,model=e1000
+
+The QEMU ``ppce500`` machine can also dynamically instantiate an eTSEC device
+if ``-device eTSEC`` is given to QEMU:
+
+.. code-block:: bash
+
+  -netdev tap,ifname=tap0,script=no,downscript=no,id=net0 -device eTSEC,netdev=net0
diff --git a/docs/system/ppc/prep.rst b/docs/system/ppc/prep.rst
new file mode 100644
index 000000000..bd9eb8eab
--- /dev/null
+++ b/docs/system/ppc/prep.rst
@@ -0,0 +1,18 @@
+Prep machine (``40p``)
+==================================================================
+
+Use the executable ``qemu-system-ppc`` to simulate a complete 40P (PREP)
+PowerPC system.
+
+Supported devices
+-----------------
+
+QEMU emulates the following 40P (PREP) peripherals:
+
+ * PCI Bridge
+ * PCI VGA compatible card with VESA Bochs Extensions
+ * 2 IDE interfaces with hard disk and CD-ROM support
+ * Floppy disk
+ * PCnet network adapters
+ * Serial port
+ * PREP Non Volatile RAM
+ * PC compatible keyboard and mouse.
diff --git a/docs/system/ppc/pseries.rst b/docs/system/ppc/pseries.rst
new file mode 100644
index 000000000..932d4dd17
--- /dev/null
+++ b/docs/system/ppc/pseries.rst
@@ -0,0 +1,12 @@
+pSeries family boards (``pseries``)
+===================================
+
+Supported devices
+-----------------
+
+Missing devices
+---------------
+
+
+Firmware
+--------
diff --git a/docs/system/pr-manager.rst b/docs/system/pr-manager.rst
new file mode 100644
index 000000000..b19a0c15e
--- /dev/null
+++ b/docs/system/pr-manager.rst
@@ -0,0 +1,83 @@
+===============================
+Persistent reservation managers
+===============================
+
+SCSI persistent reservations allow restricting access to block devices
+to specific initiators in a shared storage setup. When implementing
+clustering of virtual machines, it is a common requirement for virtual
+machines to send persistent reservation SCSI commands. However,
+the operating system restricts sending these commands to unprivileged
+programs because incorrect usage can disrupt regular operation of the
+storage fabric.
+
+For this reason, QEMU's SCSI passthrough devices, ``scsi-block``
+and ``scsi-generic`` (both are only available on Linux) can delegate
+implementation of persistent reservations to a separate object,
+the "persistent reservation manager". Only PERSISTENT RESERVE OUT and
+PERSISTENT RESERVE IN commands are passed to the persistent reservation
+manager object; other commands are processed by QEMU as usual.
+
+-----------------------------------------
+Defining a persistent reservation manager
+-----------------------------------------
+
+A persistent reservation manager is an instance of a subclass of the
+"pr-manager" QOM class.
+
+Right now only one subclass is defined, ``pr-manager-helper``, which
+forwards the commands to an external privileged helper program
+over Unix sockets. The helper program only allows sending persistent
+reservation commands to devices for which QEMU has a file descriptor,
+so that QEMU will not be able to effect persistent reservations
+unless it has access to both the socket and the device.
+
+``pr-manager-helper`` has a single string property, ``path``, which
+accepts the path to the helper program's Unix socket.
+For example, the following command line defines a ``pr-manager-helper``
+object and attaches it to a SCSI passthrough device::
+
+      $ qemu-system-x86_64 \
+          -device virtio-scsi \
+          -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
+          -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 \
+          -device scsi-block,drive=hd
+
+Alternatively, using ``-blockdev``::
+
+      $ qemu-system-x86_64 \
+          -device virtio-scsi \
+          -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
+          -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
+          -device scsi-block,drive=hd
+
+You will also need to ensure that the helper program
+:command:`qemu-pr-helper` is running, and that it has been
+set up to use the same socket filename as your QEMU commandline
+specifies. See the qemu-pr-helper documentation or manpage for
+further details.
+
+---------------------------------------------
+Multipath devices and persistent reservations
+---------------------------------------------
+
+Proper support of persistent reservation for multipath devices requires
+communication with the multipath daemon, so that the reservation is
+registered and applied when a path is newly discovered or becomes online
+again. :command:`qemu-pr-helper` can do this if the ``libmpathpersist``
+library was available on the system at build time.
+
+As of August 2017, a reservation key must be specified in ``multipath.conf``
+for ``multipathd`` to check for persistent reservation for newly
+discovered paths or reinstated paths. The attribute can be added
+to the ``defaults`` section or the ``multipaths`` section; for example::
+
+    multipaths {
+        multipath {
+            wwid             XXXXXXXXXXXXXXXX
+            alias            yellow
+            reservation_key  0x123abc
+        }
+    }
+
+Linking :program:`qemu-pr-helper` to ``libmpathpersist`` does not impede
+its usage on regular SCSI devices.
diff --git a/docs/system/qemu-block-drivers.rst b/docs/system/qemu-block-drivers.rst
new file mode 100644
index 000000000..c2c0114ce
--- /dev/null
+++ b/docs/system/qemu-block-drivers.rst
@@ -0,0 +1,24 @@
+:orphan:
+
+============================
+QEMU block drivers reference
+============================
+
+--------
+Synopsis
+--------
+
+QEMU block driver reference manual
+
+-----------
+Description
+-----------
+
+.. include:: qemu-block-drivers.rst.inc
+
+--------
+See also
+--------
+
+The HTML documentation of QEMU for more precise information and Linux
+user mode emulator invocation.
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
new file mode 100644
index 000000000..e31378442
--- /dev/null
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -0,0 +1,911 @@
+Disk image file formats
+~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU supports many image file formats that can be used with VMs as well as with
+any of the tools (like ``qemu-img``). This includes the preferred formats
+raw and qcow2 as well as formats that are supported for compatibility with
+older QEMU versions or other hypervisors.
+
+Depending on the image format, different options can be passed to
+``qemu-img create`` and ``qemu-img convert`` using the ``-o`` option.
+This section describes each format and the options that are supported for it.
+
+.. program:: image-formats
+.. option:: raw
+
+  Raw disk image format. This format has the advantage of
+  being simple and easily exportable to all other emulators.
+  If your file system supports *holes* (for example in ext2 or ext3 on
+  Linux or NTFS on Windows), then only the written sectors will reserve
+  space. Use ``qemu-img info`` to obtain the real size used by the
+  image or ``ls -ls`` on Unix/Linux.
+
+  Supported options:
+
+  .. program:: raw
+  .. option:: preallocation
+
+    Preallocation mode (allowed values: ``off``, ``falloc``,
+    ``full``). ``falloc`` mode preallocates space for the image by
+    calling ``posix_fallocate()``. ``full`` mode preallocates space
+    for the image by writing data to the underlying storage. This data may or
+    may not be zero, depending on the storage location.
+
+.. program:: image-formats
+.. option:: qcow2
+
+  QEMU image format, the most versatile format. Use it to have smaller
+  images (useful if your filesystem does not support holes, for example
+  on Windows), zlib-based compression and support for multiple VM
+  snapshots.
+
+  Supported options:
+
+  .. program:: qcow2
+  .. option:: compat
+
+    Determines the qcow2 version to use. ``compat=0.10`` uses the
+    traditional image format that can be read by any QEMU since 0.10.
+    ``compat=1.1`` enables image format extensions that only QEMU 1.1 and
+    newer understand (this is the default). Amongst others, this includes
+    zero clusters, which allow efficient copy-on-read for sparse images.
+
+  .. option:: backing_file
+
+    File name of a base image (see ``create`` subcommand)
+
+  .. option:: backing_fmt
+
+    Image format of the base image
+
+  .. option:: encryption
+
+    This option is deprecated and equivalent to ``encrypt.format=aes``
+
+  .. option:: encrypt.format
+
+    If this is set to ``luks``, it requests that the qcow2 payload (not
+    qcow2 header) be encrypted using the LUKS format. The passphrase to
+    use to unlock the LUKS key slot is given by the ``encrypt.key-secret``
+    parameter. LUKS encryption parameters can be tuned with the other
+    ``encrypt.*`` parameters.
+
+    If this is set to ``aes``, the image is encrypted with 128-bit AES-CBC.
+    The encryption key is given by the ``encrypt.key-secret`` parameter.
+    This encryption format is considered to be flawed by modern cryptography
+    standards, suffering from a number of design problems:
+
+    - The AES-CBC cipher is used with predictable initialization vectors based
+      on the sector number. This makes it vulnerable to chosen plaintext attacks
+      which can reveal the existence of encrypted data.
+    - The user passphrase is directly used as the encryption key. A poorly
+      chosen or short passphrase will compromise the security of the encryption.
+    - In the event of the passphrase being compromised there is no way to
+      change the passphrase to protect data in any qcow images. The files must
+      be cloned, using a different encryption passphrase in the new file. The
+      original file must then be securely erased using a program like shred,
+      though even this is ineffective with many modern storage technologies.
+
+    The use of this is no longer supported in system emulators. Support only
+    remains in the command line utilities, for the purposes of data liberation
+    and interoperability with old versions of QEMU. The ``luks`` format
+    should be used instead.
+
+  .. option:: encrypt.key-secret
+
+    Provides the ID of a ``secret`` object that contains the passphrase
+    (``encrypt.format=luks``) or encryption key (``encrypt.format=aes``).
+
+  .. option:: encrypt.cipher-alg
+
+    Name of the cipher algorithm and key length. Currently defaults
+    to ``aes-256``. Only used when ``encrypt.format=luks``.
+
+  .. option:: encrypt.cipher-mode
+
+    Name of the encryption mode to use. Currently defaults to ``xts``.
+    Only used when ``encrypt.format=luks``.
+
+  .. option:: encrypt.ivgen-alg
+
+    Name of the initialization vector generator algorithm. Currently defaults
+    to ``plain64``. Only used when ``encrypt.format=luks``.
+
+  .. option:: encrypt.ivgen-hash-alg
+
+    Name of the hash algorithm to use with the initialization vector generator
+    (if required). Defaults to ``sha256``. Only used when ``encrypt.format=luks``.
+
+  .. option:: encrypt.hash-alg
+
+    Name of the hash algorithm to use for the PBKDF algorithm.
+    Defaults to ``sha256``. Only used when ``encrypt.format=luks``.
+
+  .. option:: encrypt.iter-time
+
+    Amount of time, in milliseconds, to use for PBKDF algorithm per key slot.
+    Defaults to ``2000``. Only used when ``encrypt.format=luks``.
+
+  .. option:: cluster_size
+
+    Changes the qcow2 cluster size (must be between 512 and 2M). Smaller cluster
+    sizes can improve the image file size whereas larger cluster sizes generally
+    provide better performance.
+
+  .. option:: preallocation
+
+    Preallocation mode (allowed values: ``off``, ``metadata``, ``falloc``,
+    ``full``). An image with preallocated metadata is initially larger but can
+    improve performance when the image needs to grow. ``falloc`` and ``full``
+    preallocations are like the same options of the ``raw`` format, but also
+    set up metadata.
+
+  .. option:: lazy_refcounts
+
+    If this option is set to ``on``, reference count updates are postponed with
+    the goal of avoiding metadata I/O and improving performance. This is
+    particularly interesting with :option:`cache=writethrough` which doesn't batch
+    metadata updates. The tradeoff is that after a host crash, the reference count
+    tables must be rebuilt, i.e. on the next open an (automatic) ``qemu-img
+    check -r all`` is required, which may take some time.
+
+    This option can only be enabled if ``compat=1.1`` is specified.
+
+  .. option:: nocow
+
+    If this option is set to ``on``, it will turn off COW of the file. It is only
+    valid on btrfs and has no effect on other file systems.
+
+    Btrfs has low performance when hosting a VM image file, even more so
+    when the guest in the VM is also using btrfs as its file system. Turning off
+    COW is a way to mitigate this bad performance. Generally there are two
+    ways to turn off COW on btrfs:
+
+    - Disable it by mounting with nodatacow, then all newly created files
+      will be NOCOW.
+    - For an empty file, add the NOCOW file attribute. That's what this
+      option does.
+
+    Note: this option is only valid for new or empty files. If there is
+    an existing file which is COW and has data blocks already, it cannot
+    be changed to NOCOW by setting ``nocow=on``. One can issue ``lsattr
+    filename`` to check if the NOCOW flag is set or not (a capital 'C' is
+    the NOCOW flag).
+
+.. program:: image-formats
+.. option:: qed
+
+  Old QEMU image format with support for backing files and compact image files
+  (when your filesystem or transport medium does not support holes).
+
+  When converting QED images to qcow2, you might want to consider using the
+  ``lazy_refcounts=on`` option to get a more QED-like behaviour.
+
+  Supported options:
+
+  .. program:: qed
+  .. option:: backing_file
+
+    File name of a base image (see ``create`` subcommand).
+
+  .. option:: backing_fmt
+
+    Image file format of backing file (optional). Useful if the format cannot be
+    autodetected because it has no header, like some vhd/vpc files.
+
+  .. option:: cluster_size
+
+    Changes the cluster size (must be a power of 2 between 4K and 64K). Smaller
+    cluster sizes can improve the image file size whereas larger cluster sizes
+    generally provide better performance.
+
+  .. option:: table_size
+
+    Changes the number of clusters per L1/L2 table (must be a
+    power of 2 between 1 and 16). There is normally no need to
+    change this value but this option can be used for
+    performance benchmarking.
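+
+  As noted above, when converting a QED image to qcow2 you might enable
+  ``lazy_refcounts``; such a conversion could look like this (a minimal
+  sketch, with illustrative file names)::
+
+    qemu-img convert -f qed -O qcow2 -o lazy_refcounts=on,compat=1.1 old.qed new.qcow2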
+
+.. program:: image-formats
+.. option:: qcow
+
+  Old QEMU image format with support for backing files, compact image files,
+  encryption and compression.
+
+  Supported options:
+
+  .. program:: qcow
+  .. option:: backing_file
+
+    File name of a base image (see ``create`` subcommand)
+
+  .. option:: encryption
+
+    This option is deprecated and equivalent to ``encrypt.format=aes``
+
+  .. option:: encrypt.format
+
+    If this is set to ``aes``, the image is encrypted with 128-bit AES-CBC.
+    The encryption key is given by the ``encrypt.key-secret`` parameter.
+    This encryption format is considered to be flawed by modern cryptography
+    standards, suffering from a number of design problems enumerated previously
+    against the ``qcow2`` image format.
+
+    The use of this is no longer supported in system emulators. Support only
+    remains in the command line utilities, for the purposes of data liberation
+    and interoperability with old versions of QEMU.
+
+    Users requiring native encryption should use the ``qcow2`` format
+    instead with ``encrypt.format=luks``.
+
+  .. option:: encrypt.key-secret
+
+    Provides the ID of a ``secret`` object that contains the encryption
+    key (``encrypt.format=aes``).
+
+.. program:: image-formats
+.. option:: luks
+
+  LUKS v1 encryption format, compatible with Linux dm-crypt/cryptsetup
+
+  Supported options:
+
+  .. program:: luks
+  .. option:: key-secret
+
+    Provides the ID of a ``secret`` object that contains the passphrase.
+
+  .. option:: cipher-alg
+
+    Name of the cipher algorithm and key length. Currently defaults
+    to ``aes-256``.
+
+  .. option:: cipher-mode
+
+    Name of the encryption mode to use. Currently defaults to ``xts``.
+
+  .. option:: ivgen-alg
+
+    Name of the initialization vector generator algorithm. Currently defaults
+    to ``plain64``.
+
+  .. option:: ivgen-hash-alg
+
+    Name of the hash algorithm to use with the initialization vector generator
+    (if required). Defaults to ``sha256``.
+
+  .. option:: hash-alg
+
+    Name of the hash algorithm to use for the PBKDF algorithm.
+    Defaults to ``sha256``.
+
+  .. option:: iter-time
+
+    Amount of time, in milliseconds, to use for PBKDF algorithm per key slot.
+    Defaults to ``2000``.
+
+.. program:: image-formats
+.. option:: vdi
+
+  VirtualBox 1.1 compatible image format.
+
+  Supported options:
+
+  .. program:: vdi
+  .. option:: static
+
+    If this option is set to ``on``, the image is created with metadata
+    preallocation.
+
+.. program:: image-formats
+.. option:: vmdk
+
+  VMware 3 and 4 compatible image format.
+
+  Supported options:
+
+  .. program:: vmdk
+  .. option:: backing_file
+
+    File name of a base image (see ``create`` subcommand).
+
+  .. option:: compat6
+
+    Create a VMDK version 6 image (instead of version 4)
+
+  .. option:: hwversion
+
+    Specify the VMDK virtual hardware version. The ``compat6`` flag cannot be
+    enabled if ``hwversion`` is specified.
+
+  .. option:: subformat
+
+    Specifies which VMDK subformat to use. Valid options are
+    ``monolithicSparse`` (default),
+    ``monolithicFlat``,
+    ``twoGbMaxExtentSparse``,
+    ``twoGbMaxExtentFlat`` and
+    ``streamOptimized``.
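+
+  For instance, a VMDK image using the ``streamOptimized`` subformat could be
+  created like this (the file name and size are arbitrary examples)::
+
+    qemu-img create -f vmdk -o subformat=streamOptimized disk.vmdk 10G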
+
+.. program:: image-formats
+.. option:: vpc
+
+  VirtualPC compatible image format (VHD).
+
+  Supported options:
+
+  .. program:: vpc
+  .. option:: subformat
+
+    Specifies which VHD subformat to use. Valid options are
+    ``dynamic`` (default) and ``fixed``.
+
+.. program:: image-formats
+.. option:: VHDX
+
+  Hyper-V compatible image format (VHDX).
+
+  Supported options:
+
+  .. program:: VHDX
+  .. option:: subformat
+
+    Specifies which VHDX subformat to use. Valid options are
+    ``dynamic`` (default) and ``fixed``.
+
+  .. option:: block_state_zero
+
+    Force use of payload blocks of type 'ZERO'. Can be set to ``on`` (default)
+    or ``off``. When set to ``off``, new blocks will be created as
+    ``PAYLOAD_BLOCK_NOT_PRESENT``, which means parsers are free to return
+    arbitrary data for those blocks. Do not set to ``off`` when using
+    ``qemu-img convert`` with ``subformat=dynamic``.
+
+  .. option:: block_size
+
+    Block size; min 1 MB, max 256 MB. 0 means auto-calculate based on
+    image size.
+
+  .. option:: log_size
+
+    Log size; min 1 MB.
+
+Read-only formats
+~~~~~~~~~~~~~~~~~
+
+More disk image file formats are supported in a read-only mode.
+
+.. program:: image-formats
+.. option:: bochs
+
+  Bochs images of ``growing`` type.
+
+.. program:: image-formats
+.. option:: cloop
+
+  Linux Compressed Loop image, useful only to reuse directly compressed
+  CD-ROM images present for example in the Knoppix CD-ROMs.
+
+.. program:: image-formats
+.. option:: dmg
+
+  Apple disk image.
+
+.. program:: image-formats
+.. option:: parallels
+
+  Parallels disk image format.
+
+Using host drives
+~~~~~~~~~~~~~~~~~
+
+In addition to disk image files, QEMU can directly access host
+devices. We describe here the usage for QEMU version >= 0.8.3.
+
+Linux
+^^^^^
+
+On Linux, you can directly use the host device filename instead of a
+disk image filename provided you have enough privileges to access
+it. For example, use ``/dev/cdrom`` to access the CDROM.
+
+CD
+  You can specify a CDROM device even if no CDROM is loaded. QEMU has
+  specific code to detect CDROM insertion or removal. CDROM ejection by
+  the guest OS is supported. Currently only data CDs are supported.
+
+Floppy
+  You can specify a floppy device even if no floppy is loaded. Floppy
+  removal is currently not detected accurately (if you change the floppy
+  without any floppy access happening while it is not loaded, the guest
+  OS will think that the same floppy is loaded).
+  Use of the host's floppy device is deprecated, and support for it will
+  be removed in a future release.
+
+Hard disks
+  Hard disks can be used. Normally you must specify the whole disk
+  (``/dev/hdb`` instead of ``/dev/hdb1``) so that the guest OS can
+  see it as a partitioned disk. WARNING: unless you know what you are
+  doing, it is better to only make READ-ONLY accesses to the hard disk;
+  otherwise you may corrupt your host data (use the ``-snapshot`` command
+  line option or modify the device permissions accordingly).
+
+Windows
+^^^^^^^
+
+CD
+  The preferred syntax is the drive letter (e.g. ``d:``). The
+  alternate syntax ``\\.\d:`` is supported. ``/dev/cdrom`` is
+  supported as an alias to the first CDROM drive.
+
+  Currently there is no specific code to handle removable media, so it
+  is better to use the ``change`` or ``eject`` monitor commands to
+  change or eject media.
+
+Hard disks
+  Hard disks can be used with the syntax: ``\\.\PhysicalDriveN``
+  where *N* is the drive number (0 is the first hard disk).
+
+  WARNING: unless you know what you are doing, it is better to only make
+  READ-ONLY accesses to the hard disk; otherwise you may corrupt your
+  host data (use the ``-snapshot`` command line option so that the
+  modifications are written to a temporary file).
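+
+  For example, the first physical disk could be attached with all guest
+  writes diverted to a temporary file (an illustrative command; the drive
+  number depends on your setup):
+
+  .. parsed-literal::
+
+    |qemu_system| -snapshot -drive file=\\.\PhysicalDrive0,format=raw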
+
+Mac OS X
+^^^^^^^^
+
+``/dev/cdrom`` is an alias to the first CDROM.
+
+Currently there is no specific code to handle removable media, so it
+is better to use the ``change`` or ``eject`` monitor commands to
+change or eject media.
+
+Virtual FAT disk images
+~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU can automatically create a virtual FAT disk image from a
+directory tree. In order to use it, just type:
+
+.. parsed-literal::
+
+  |qemu_system| linux.img -hdb fat:/my_directory
+
+Then you can access all the files in the ``/my_directory``
+directory without having to copy them into a disk image or to export
+them via SAMBA or NFS. The default access is *read-only*.
+
+Floppies can be emulated with the ``:floppy:`` option:
+
+.. parsed-literal::
+
+  |qemu_system| linux.img -fda fat:floppy:/my_directory
+
+Read/write support is available for testing (beta stage) with the
+``:rw:`` option:
+
+.. parsed-literal::
+
+  |qemu_system| linux.img -fda fat:floppy:rw:/my_directory
+
+What you should *never* do:
+
+- use non-ASCII filenames
+- use "-snapshot" together with ":rw:"
+- expect it to work when loadvm'ing
+- write to the FAT directory on the host system while accessing it with the guest system
+
+NBD access
+~~~~~~~~~~
+
+QEMU can directly access block devices exported using the Network Block Device
+protocol.
+
+.. parsed-literal::
+
+  |qemu_system| linux.img -hdb nbd://my_nbd_server.mydomain.org:1024/
+
+If the NBD server is located on the same host, you can use a Unix socket
+instead of an inet socket:
+
+.. parsed-literal::
+
+  |qemu_system| linux.img -hdb nbd+unix://?socket=/tmp/my_socket
+
+In this case, the block device must be exported using ``qemu-nbd``:
+
+.. parsed-literal::
+
+  qemu-nbd --socket=/tmp/my_socket my_disk.qcow2
+
+The use of ``qemu-nbd`` allows sharing of a disk between several guests:
+
+.. parsed-literal::
+
+  qemu-nbd --socket=/tmp/my_socket --share=2 my_disk.qcow2
+
+and then you can use it with two guests:
+
+.. parsed-literal::
+
+  |qemu_system| linux1.img -hdb nbd+unix://?socket=/tmp/my_socket
+  |qemu_system| linux2.img -hdb nbd+unix://?socket=/tmp/my_socket
+
+If the ``nbd-server`` uses named exports (supported since NBD 2.9.18, or with QEMU's
+own embedded NBD server), you must specify an export name in the URI:
+
+.. parsed-literal::
+
+  |qemu_system| -cdrom nbd://localhost/debian-500-ppc-netinst
+  |qemu_system| -cdrom nbd://localhost/openSUSE-11.1-ppc-netinst
+
+The URI syntax for NBD is supported since QEMU 1.3. An alternative syntax is
+also available. Here are some examples of the older syntax:
+
+.. parsed-literal::
+
+  |qemu_system| linux.img -hdb nbd:my_nbd_server.mydomain.org:1024
+  |qemu_system| linux2.img -hdb nbd:unix:/tmp/my_socket
+  |qemu_system| -cdrom nbd:localhost:10809:exportname=debian-500-ppc-netinst
+
+iSCSI LUNs
+~~~~~~~~~~
+
+iSCSI is a popular protocol used to access SCSI devices across a computer
+network.
+
+There are two different ways iSCSI devices can be used by QEMU.
+
+The first method is to mount the iSCSI LUN on the host, make it appear as
+any other ordinary SCSI device on the host, and then access this device as a
+/dev/sd device from QEMU. How to do this differs between host OSes.
+
+The second method involves using the iSCSI initiator that is built into
+QEMU.
+This provides a mechanism that works the same way regardless of which
+host OS you are running QEMU on. This section will describe this second method
+of using iSCSI together with QEMU.
+
+In QEMU, iSCSI devices are described using special iSCSI URLs. URL syntax:
+
+::
+
+  iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn-name>/<lun>
+
+Username and password are optional and only used if your target is set up
+using CHAP authentication for access control.
+Alternatively the username and password can also be set via environment
+variables so that they do not show up in the process list:
+
+::
+
+  export LIBISCSI_CHAP_USERNAME=<username>
+  export LIBISCSI_CHAP_PASSWORD=<password>
+  iscsi://<host>/<target-iqn-name>/<lun>
+
+Various session-related parameters can be set via special options, either
+in a configuration file provided via ``-readconfig`` or directly on the
+command line.
+
+If the initiator-name is not specified, QEMU will use a default name
+of 'iqn.2008-11.org.linux-kvm[:<uuid>]' where <uuid> is the UUID of the
+virtual machine. If the UUID is not specified, QEMU will use
+'iqn.2008-11.org.linux-kvm[:<name>]' where <name> is the name of the
+virtual machine.
+
+Setting a specific initiator name to use when logging in to the target:
+
+::
+
+  -iscsi initiator-name=iqn.qemu.test:my-initiator
+
+Controlling which type of header digest to negotiate with the target:
+
+::
+
+  -iscsi header-digest=CRC32C|CRC32C-NONE|NONE-CRC32C|NONE
+
+These can also be set via a configuration file:
+
+::
+
+  [iscsi]
+    user = "CHAP username"
+    password = "CHAP password"
+    initiator-name = "iqn.qemu.test:my-initiator"
+    # header digest is one of CRC32C|CRC32C-NONE|NONE-CRC32C|NONE
+    header-digest = "CRC32C"
+
+Setting the target name allows different options for different targets:
+
+::
+
+  [iscsi "iqn.target.name"]
+    user = "CHAP username"
+    password = "CHAP password"
+    initiator-name = "iqn.qemu.test:my-initiator"
+    # header digest is one of CRC32C|CRC32C-NONE|NONE-CRC32C|NONE
+    header-digest = "CRC32C"
+
+How to use a configuration file to set iSCSI configuration options:
+
+.. parsed-literal::
+
+  cat >iscsi.conf <<EOF
+  [iscsi]
+    user = "me"
+    password = "my password"
+    initiator-name = "iqn.qemu.test:my-initiator"
+    header-digest = "CRC32C"
+  EOF
+
+  |qemu_system| -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \\
+    -readconfig iscsi.conf
+
+How to set up a simple iSCSI target on loopback and access it via QEMU:
+this example shows how to set up an iSCSI target with one CDROM and one DISK
+using the Linux STGT software target. This target is available on Red Hat-based
+systems as the package 'scsi-target-utils'.
+
+.. parsed-literal::
+
+  tgtd --iscsi portal=127.0.0.1:3260
+  tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.qemu.test
+  tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \\
+    -b /IMAGES/disk.img --device-type=disk
+  tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 2 \\
+    -b /IMAGES/cd.iso --device-type=cd
+  tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
+
+  |qemu_system| -iscsi initiator-name=iqn.qemu.test:my-initiator \\
+    -boot d -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \\
+    -cdrom iscsi://127.0.0.1/iqn.qemu.test/2
+
+GlusterFS disk images
+~~~~~~~~~~~~~~~~~~~~~
+
+GlusterFS is a user space distributed file system.
+
+You can boot from the GlusterFS disk image with the command:
+
+URI:
+
+.. parsed-literal::
+
+  |qemu_system| -drive file=gluster[+TYPE]://[HOST[:PORT]]/VOLUME/PATH
+                       [?socket=...][,file.debug=9][,file.logfile=...]
+
+JSON:
+
+.. parsed-literal::
+
+  |qemu_system| 'json:{"driver":"qcow2",
+                       "file":{"driver":"gluster",
+                                "volume":"testvol","path":"a.img","debug":9,"logfile":"...",
+                                "server":[{"type":"tcp","host":"...","port":"..."},
+                                          {"type":"unix","socket":"..."}]}}'
+
+*gluster* is the protocol.
+
+*TYPE* specifies the transport type used to connect to the gluster
+management daemon (glusterd). Valid transport types are
+tcp and unix. In the URI form, if a transport type isn't specified,
+then tcp type is assumed.
+
+*HOST* specifies the server where the volume file specification for
+the given volume resides. This can be either a hostname or an IPv4 address.
+If the transport type is unix, then the *HOST* field should not be specified.
+Instead, the *socket* field needs to be populated with the path to the Unix
+domain socket.
+
+*PORT* is the port number on which glusterd is listening. This is optional
+and, if not specified, defaults to port 24007. If the transport type is unix,
+then *PORT* should not be specified.
+
+*VOLUME* is the name of the gluster volume which contains the disk image.
+
+*PATH* is the path to the actual disk image that resides on the gluster volume.
+
+*debug* is the logging level of the gluster protocol driver. Debug levels
+are 0-9, with 9 being the most verbose, and 0 representing no debugging output.
+The default level is 4. The current logging levels defined in the gluster source
+are 0 - None, 1 - Emergency, 2 - Alert, 3 - Critical, 4 - Error, 5 - Warning,
+6 - Notice, 7 - Info, 8 - Debug, 9 - Trace
+
+*logfile* specifies the path of a log file; it directs logging to the
+specified file and also helps persist the gfapi logs. The default is stderr.
+
+You can create a GlusterFS disk image with the command:
+
+.. parsed-literal::
+
+  qemu-img create gluster://HOST/VOLUME/PATH SIZE
+
+Examples:
+
+.. parsed-literal::
+
+  |qemu_system| -drive file=gluster://1.2.3.4/testvol/a.img
+  |qemu_system| -drive file=gluster+tcp://1.2.3.4/testvol/a.img
+  |qemu_system| -drive file=gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
+  |qemu_system| -drive file=gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
+  |qemu_system| -drive file=gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
+  |qemu_system| -drive file=gluster+tcp://server.domain.com:24007/testvol/dir/a.img
+  |qemu_system| -drive file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
+  |qemu_system| -drive file=gluster+rdma://1.2.3.4:24007/testvol/a.img
+  |qemu_system| -drive file=gluster://1.2.3.4/testvol/a.img,file.debug=9,file.logfile=/var/log/qemu-gluster.log
+  |qemu_system| 'json:{"driver":"qcow2",
+                       "file":{"driver":"gluster",
+                                "volume":"testvol","path":"a.img",
+                                "debug":9,"logfile":"/var/log/qemu-gluster.log",
+                                "server":[{"type":"tcp","host":"1.2.3.4","port":24007},
+                                          {"type":"unix","socket":"/var/run/glusterd.socket"}]}}'
+  |qemu_system| -drive driver=qcow2,file.driver=gluster,file.volume=testvol,file.path=/path/a.img,
+                       file.debug=9,file.logfile=/var/log/qemu-gluster.log,
+                       file.server.0.type=tcp,file.server.0.host=1.2.3.4,file.server.0.port=24007,
+                       file.server.1.type=unix,file.server.1.socket=/var/run/glusterd.socket
+
+Secure Shell (ssh) disk images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can access disk images located on a remote ssh server
+by using the ssh protocol:
+
+.. parsed-literal::
+
+  |qemu_system| -drive file=ssh://[USER@]SERVER[:PORT]/PATH[?host_key_check=HOST_KEY_CHECK]
+
+Alternative syntax using properties:
+
+.. parsed-literal::
+
+  |qemu_system| -drive file.driver=ssh[,file.user=USER],file.host=SERVER[,file.port=PORT],file.path=PATH[,file.host_key_check=HOST_KEY_CHECK]
+
+*ssh* is the protocol.
+
+*USER* is the remote user. If not specified, then the local
+username is tried.
+
+*SERVER* specifies the remote ssh server. Any ssh server can be
+used, but it must implement the sftp-server protocol. Most Unix/Linux
+systems should work without requiring any extra configuration.
+
+*PORT* is the port number on which sshd is listening. By default
+the standard ssh port (22) is used.
+
+*PATH* is the path to the disk image.
+
+The optional *HOST_KEY_CHECK* parameter controls how the remote
+host's key is checked. The default is ``yes`` which means to use
+the local ``.ssh/known_hosts`` file. Setting this to ``no``
+turns off known-hosts checking. Or you can check that the host key
+matches a specific fingerprint:
+``host_key_check=md5:78:45:8e:14:57:4f:d5:45:83:0a:0e:f3:49:82:c9:c8``
+(``sha1:`` can also be used as a prefix, but note that OpenSSH
+tools only use MD5 to print fingerprints).
+
+Currently authentication must be done using ssh-agent. Other
+authentication methods may be supported in the future.
+
+Note: Many ssh servers do not support an ``fsync``-style operation.
+The ssh driver cannot guarantee that disk flush requests are
+obeyed, and this causes a risk of disk corruption if the remote
+server or network goes down during writes. The driver will
+print a warning when ``fsync`` is not supported:
+
+::
+
+  warning: ssh server ssh.example.com:22 does not support fsync
+
+With sufficiently new versions of libssh and OpenSSH, ``fsync`` is
+supported.
+
+NVMe disk images
+~~~~~~~~~~~~~~~~
+
+NVM Express (NVMe) storage controllers can be accessed directly by a userspace
+driver in QEMU. This bypasses the host kernel file system and block layers
+while retaining QEMU block layer functionalities, such as block jobs, I/O
+throttling, image formats, etc. Disk I/O performance is typically higher than
+with ``-drive file=/dev/sda`` using either thread pool or linux-aio.
+
+The controller will be exclusively used by the QEMU process once started. To be
+able to share storage between multiple VMs and other applications on the host,
+please use the file-based protocols.
+
+Before starting QEMU, bind the host NVMe controller to the host vfio-pci
+driver. For example:
+
+.. parsed-literal::
+
+  # modprobe vfio-pci
+  # lspci -n -s 0000:06:0d.0
+  06:0d.0 0401: 1102:0002 (rev 08)
+  # echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
+  # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
+
+  # |qemu_system| -drive file=nvme://HOST:BUS:SLOT.FUNC/NAMESPACE
+
+Alternative syntax using properties:
+
+.. parsed-literal::
+
+  |qemu_system| -drive file.driver=nvme,file.device=HOST:BUS:SLOT.FUNC,file.namespace=NAMESPACE
+
+*HOST*:*BUS*:*SLOT*.\ *FUNC* is the NVMe controller's PCI device
+address on the host.
+
+*NAMESPACE* is the NVMe namespace number, starting from 1.
+
+Disk image file locking
+~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, QEMU tries to protect image files from unexpected concurrent
+access, as long as it's supported by the block protocol driver and host
+operating system. If multiple QEMU processes (including QEMU emulators and
+utilities) try to open the same image with conflicting accessing modes, all but
+the first one will get an error.
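+
+For example, if a guest is already running from an image, a second process
+trying to open the same image read-write will fail (an illustrative session
+with a hypothetical ``test.qcow2``; the exact message may vary between QEMU
+versions)::
+
+  $ qemu-system-x86_64 -drive file=test.qcow2 &
+  $ qemu-system-x86_64 -drive file=test.qcow2
+  qemu-system-x86_64: -drive file=test.qcow2: Failed to get "write" lock
+  Is another process using the image [test.qcow2]?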
+
+This feature is currently supported by the file protocol on Linux with the Open
+File Descriptor (OFD) locking API, and can be configured to fall back to POSIX
+locking if the host doesn't support Linux OFD locking.
+
+To explicitly enable image locking, specify "locking=on" in the file protocol
+driver options. If OFD locking is not possible, a warning will be printed and
+the POSIX locking API will be used. In this case there is a risk that the lock
+will get silently lost when doing hot plugging and block jobs, due to the
+shortcomings of the POSIX locking API.
+
+QEMU transparently handles lock handover during shared storage migration. For
+shared virtual disk images between multiple VMs, the "share-rw" device option
+should be used.
+
+By default, the guest has exclusive write access to its disk image. If the
+guest can safely share the disk image with other writers the
+``-device ...,share-rw=on`` parameter can be used. This is only safe if
+the guest is running software, such as a cluster file system, that
+coordinates disk accesses to avoid corruption.
+
+Note that share-rw=on only declares the guest's ability to share the disk.
+Some QEMU features, such as image file formats, require exclusive write access
+to the disk image and this is unaffected by the share-rw=on option.
+
+Alternatively, locking can be fully disabled with the "locking=off" block
+device option. In the command line, the option is usually in the form of
+"file.locking=off" as the protocol driver is normally placed as a "file" child
+under a format driver. For example:
+
+::
+
+  -blockdev driver=qcow2,file.filename=/path/to/image,file.locking=off,file.driver=file
+
+To check if image locking is active, check the output of the "lslocks" command
+on the host and see if there are locks held by the QEMU process on the image
+file. More than one byte could be locked by the QEMU instance, each byte of
+which reflects a particular permission that is acquired or protected by the
+running block driver.
+
+Filter drivers
+~~~~~~~~~~~~~~
+
+QEMU supports several filter drivers, which do not store any data, but perform
+additional tasks, hooking I/O requests.
+
+.. program:: filter-drivers
+.. option:: preallocate
+
+  The preallocate filter driver is intended to be inserted between format
+  and protocol nodes and preallocates some additional space
+  (expanding the protocol file) when writing past the file's end. This can be
+  useful for file systems with slow allocation.
+
+  Supported options:
+
+  .. program:: preallocate
+  .. option:: prealloc-align
+
+    On preallocation, align the file length to this value (in bytes), default 1M.
+
+  .. program:: preallocate
+  .. option:: prealloc-size
+
+    How much to preallocate (in bytes), default 128M.
diff --git a/docs/system/qemu-cpu-models.rst b/docs/system/qemu-cpu-models.rst
new file mode 100644
index 000000000..5cf6e46f8
--- /dev/null
+++ b/docs/system/qemu-cpu-models.rst
@@ -0,0 +1,24 @@
+:orphan:
+
+==================================
+QEMU / KVM CPU model configuration
+==================================
+
+--------
+Synopsis
+--------
+
+QEMU CPU Modelling Infrastructure manual
+
+-----------
+Description
+-----------
+
+.. include:: cpu-models-x86.rst.inc
+.. include:: cpu-models-mips.rst.inc
+
+--------
+See also
+--------
+
+The HTML documentation of QEMU for more precise information and Linux user mode emulator invocation.
diff --git a/docs/system/qemu-manpage.rst b/docs/system/qemu-manpage.rst
new file mode 100644
index 000000000..c47a41275
--- /dev/null
+++ b/docs/system/qemu-manpage.rst
@@ -0,0 +1,51 @@
+:orphan:
+
+..
+   This file is the skeleton for the qemu.1 manpage. It mostly
+   should simply include the .rst.inc files corresponding to the
+   parts of the documentation that go in the manpage as well as the
+   HTML manual.
+
+=======================
+QEMU User Documentation
+=======================
+
+--------
+Synopsis
+--------
+
+.. parsed-literal::
+
+   |qemu_system| [options] [disk_image]
+
+-----------
+Description
+-----------
+
+.. include:: target-i386-desc.rst.inc
+
+-------
+Options
+-------
+
+disk_image is a raw hard disk image for IDE hard disk 0. Some targets do
+not need a disk image.
+
+.. hxtool-doc:: qemu-options.hx
+
+.. include:: keys.rst.inc
+
+.. include:: mux-chardev.rst.inc
+
+-----
+Notes
+-----
+
+.. include:: device-url-syntax.rst.inc
+
+--------
+See also
+--------
+
+The HTML documentation of QEMU for more precise information and Linux
+user mode emulator invocation.
diff --git a/docs/system/quickstart.rst b/docs/system/quickstart.rst
new file mode 100644
index 000000000..681678c86
--- /dev/null
+++ b/docs/system/quickstart.rst
@@ -0,0 +1,21 @@
+.. _pcsys_005fquickstart:
+
+Quick Start
+-----------
+
+Download and uncompress a PC hard disk image with Linux installed (e.g.
+``linux.img``) and type:
+
+.. parsed-literal::
+
+   |qemu_system| linux.img
+
+Linux should boot and give you a prompt.
+
+Users should be aware that the above example elides a lot of the complexity
+of setting up a VM with x86_64-specific defaults and assumes the
+first non-switch argument is a PC-compatible disk image with a boot
+sector. For a non-x86 system where we emulate a broad range of machine
+types, the command lines are generally more explicit in defining the
+machine and boot behaviour. You will find more example command lines
+in the :ref:`system-targets-ref` section of the manual.
diff --git a/docs/system/riscv/microchip-icicle-kit.rst b/docs/system/riscv/microchip-icicle-kit.rst
new file mode 100644
index 000000000..40798b1aa
--- /dev/null
+++ b/docs/system/riscv/microchip-icicle-kit.rst
@@ -0,0 +1,149 @@
+Microchip PolarFire SoC Icicle Kit (``microchip-icicle-kit``)
+=============================================================
+
+The Microchip PolarFire SoC Icicle Kit integrates a PolarFire SoC, with one
+SiFive E51 core plus four U54 cores, many on-chip peripherals and an FPGA.
+
+For more details about Microchip PolarFire SoC, please see:
+https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga
+
+The Icicle Kit board information can be found here:
+https://www.microsemi.com/existing-parts/parts/152514
+
+Supported devices
+-----------------
+
+The ``microchip-icicle-kit`` machine supports the following devices:
+
+* 1 E51 core
+* 4 U54 cores
+* Core Level Interruptor (CLINT)
+* Platform-Level Interrupt Controller (PLIC)
+* L2 Loosely Integrated Memory (L2-LIM)
+* DDR memory controller
+* 5 MMUARTs
+* 1 DMA controller
+* 2 GEM Ethernet controllers
+* 1 SDHC storage controller
+
+Boot options
+------------
+
+The ``microchip-icicle-kit`` machine can start using the standard -bios
+functionality for loading its BIOS image, aka Hart Software Services (HSS_).
+HSS loads the second-stage bootloader U-Boot from an SD card. Then a kernel
+can be loaded from U-Boot. It also supports direct kernel booting via the
+-kernel option along with the device tree blob via -dtb.
+When direct kernel boot is used, the OpenSBI fw_dynamic BIOS image is used
+to boot a payload like U-Boot or an OS kernel directly.
+
+The user-provided DTB should meet the following requirements:
+
+* The /cpus node should contain at least one subnode for E51 and the number
+  of subnodes should match QEMU's ``-smp`` option
+* The /memory reg size should match QEMU's selected ram_size via ``-m``
+* Should contain a node for the CLINT device with a compatible string
+  "riscv,clint0"
+
+QEMU uses the truth table below to select which payload to execute:
+
+===== ========== ========== =======
+-bios -kernel    -dtb       payload
+===== ========== ========== =======
+ N    N          don't care HSS
+ Y    don't care don't care HSS
+ N    Y          Y          kernel
+===== ========== ========== =======
+
+The memory is set to 1537 MiB by default, which is the minimum high memory
+size required by HSS. A sanity check on the RAM size is performed in the
+machine init routine, prompting the user to increase the RAM size to more
+than 1537 MiB when less than 1537 MiB is detected.
+
+Running HSS
+-----------
+
+The HSS 2020.12 release is tested at the time of writing. To build an HSS
+image that can be booted by the ``microchip-icicle-kit`` machine, type the
+following in the HSS source tree:
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ cp boards/mpfs-icicle-kit-es/def_config .config
+  $ make BOARD=mpfs-icicle-kit-es
+
+Download the official SD card image released by Microchip and prepare it for
+QEMU usage:
+
+.. code-block:: bash
+
+  $ wget ftp://ftpsoc.microsemi.com/outgoing/core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz
+  $ gunzip core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz
+  $ qemu-img resize core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic 4G
+
+Then we can boot the machine by:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 \
+      -bios path/to/hss.bin -sd path/to/sdcard.img \
+      -nic user,model=cadence_gem \
+      -nic tap,ifname=tap,model=cadence_gem,script=no \
+      -display none -serial stdio \
+      -chardev socket,id=serial1,path=serial1.sock,server=on,wait=on \
+      -serial chardev:serial1
+
+With the above command line, the current terminal session will be used for
+the first serial port. Open another terminal window, and use ``minicom`` to
+connect to the second serial port.
+
+.. code-block:: bash
+
+  $ minicom -D unix\#serial1.sock
+
+HSS output is on the first serial port (stdio) and U-Boot output is on the
+second serial port. U-Boot will automatically load the Linux kernel from
+the SD card image.
+
+Direct Kernel Boot
+------------------
+
+Sometimes we just want to test booting a new kernel, and transforming the
+kernel image to the format required by the HSS bootflow is tedious. We can
+use '-kernel' for direct kernel booting just like other RISC-V machines do.
+
+In this mode, the OpenSBI fw_dynamic BIOS image for the 'generic' platform is
+used to boot an S-mode payload like U-Boot or an OS kernel directly.
+
+For example, the following commands show building a U-Boot image from U-Boot
+mainline v2021.07 for the Microchip Icicle Kit board:
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ make microchip_mpfs_icicle_defconfig
+
+Then we can boot the machine by:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 -m 2G \
+      -sd path/to/sdcard.img \
+      -nic user,model=cadence_gem \
+      -nic tap,ifname=tap,model=cadence_gem,script=no \
+      -display none -serial stdio \
+      -kernel path/to/u-boot/build/dir/u-boot.bin \
+      -dtb path/to/u-boot/build/dir/u-boot.dtb
+
+CAVEATS:
+
+* Check the "stdout-path" property in the /chosen node in the DTB to determine
+  which serial port is used for the serial console, e.g.: if the console is set
+  to the second serial port, change to use "-serial null -serial stdio".
+* The default U-Boot configuration uses CONFIG_OF_SEPARATE, hence the ELF image
+  ``u-boot`` cannot be passed to "-kernel" as it does not contain the DTB;
+  ``u-boot.bin``, which does contain one, has to be used instead. To use the
+  ELF image, we need to change to CONFIG_OF_EMBED or CONFIG_OF_PRIOR_STAGE.
+
+.. _HSS: https://github.com/polarfire-soc/hart-software-services
diff --git a/docs/system/riscv/shakti-c.rst b/docs/system/riscv/shakti-c.rst
new file mode 100644
index 000000000..fea57f7b6
--- /dev/null
+++ b/docs/system/riscv/shakti-c.rst
@@ -0,0 +1,82 @@
+Shakti C Reference Platform (``shakti_c``)
+==========================================
+
+The Shakti C Reference Platform is a reference platform based on the Arty A7
+100T FPGA board for the Shakti SoC.
+
+The Shakti SoC is based on the Shakti C-class processor core. Shakti C
+is a 64-bit RV64GCSUN processor core.
+
+For more details on the Shakti SoC, please see:
+https://gitlab.com/shaktiproject/cores/shakti-soc/-/blob/master/fpga/boards/artya7-100t/c-class/README.rst
+
+For more info on the Shakti C-class core, please see:
+https://c-class.readthedocs.io/en/latest/
+
+Supported devices
+-----------------
+
+The ``shakti_c`` machine supports the following devices:
+
+ * 1 C-class core
+ * Core Level Interruptor (CLINT)
+ * Platform-Level Interrupt Controller (PLIC)
+ * 1 UART
+
+Boot options
+------------
+
+The ``shakti_c`` machine can start using the standard -bios
+functionality for loading a baremetal application or OpenSBI.
+
+Boot the machine
+----------------
+
+Shakti SDK
+~~~~~~~~~~
+The Shakti SDK can be used to generate the baremetal example UART applications.
+
+.. code-block:: bash
+
+  $ git clone https://gitlab.com/behindbytes/shakti-sdk.git
+  $ cd shakti-sdk
+  $ make software PROGRAM=loopback TARGET=artix7_100t
+
+The binary will be generated at:
+  software/examples/uart_applns/loopback/output/loopback.shakti
+
+You could also download the precompiled example applications using the
+commands below.
+
+.. code-block:: bash
+
+  $ wget -c https://gitlab.com/behindbytes/shakti-binaries/-/raw/master/sdk/shakti_sdk_qemu.zip
+  $ unzip shakti_sdk_qemu.zip
+
+Then we can run the UART example using:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M shakti_c -nographic \
+      -bios path/to/shakti_sdk_qemu/loopback.shakti
+
+OpenSBI
+~~~~~~~
+We can also run OpenSBI with the Test Payload.
+
+.. code-block:: bash
+
+  $ git clone https://github.com/riscv/opensbi.git -b v0.9
+  $ cd opensbi
+  $ wget -c https://gitlab.com/behindbytes/shakti-binaries/-/raw/master/dts/shakti.dtb
+  $ export CROSS_COMPILE=riscv64-unknown-elf-
+  $ export FW_FDT_PATH=./shakti.dtb
+  $ make PLATFORM=generic
+
+fw_payload.elf will be generated at build/platform/generic/firmware/fw_payload.elf.
+Boot it using the QEMU command below.
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M shakti_c -nographic \
+      -bios path/to/fw_payload.elf
diff --git a/docs/system/riscv/sifive_u.rst b/docs/system/riscv/sifive_u.rst
new file mode 100644
index 000000000..7b166567f
--- /dev/null
+++ b/docs/system/riscv/sifive_u.rst
@@ -0,0 +1,375 @@
+SiFive HiFive Unleashed (``sifive_u``)
+======================================
+
+The SiFive HiFive Unleashed Development Board is the ultimate RISC-V
+development board featuring the Freedom U540 multi-core RISC-V processor.
+
+Supported devices
+-----------------
+
+The ``sifive_u`` machine supports the following devices:
+
+* 1 E51 / E31 core
+* Up to 4 U54 / U34 cores
+* Core Local Interruptor (CLINT)
+* Platform-Level Interrupt Controller (PLIC)
+* Power, Reset, Clock, Interrupt (PRCI)
+* L2 Loosely Integrated Memory (L2-LIM)
+* DDR memory controller
+* 2 UARTs
+* 1 GEM Ethernet controller
+* 1 GPIO controller
+* 1 One-Time Programmable (OTP) memory with stored serial number
+* 1 DMA controller
+* 2 QSPI controllers
+* 1 ISSI 25WP256 flash
+* 1 SD card in SPI mode
+* PWM0 and PWM1
+
+Please note the real-world HiFive Unleashed board has a fixed configuration of
+1 E51 core and 4 U54 cores, and the RISC-V core boots in 64-bit mode.
+With QEMU, one can create a machine with 1 E51 core and up to 4 U54 cores. It
+is also possible to create a 32-bit variant with the same peripherals except
+that the RISC-V cores are replaced by the 32-bit ones (E31 and U34), to help
+testing of 32-bit guest software.
+
+Hardware configuration information
+----------------------------------
+
+The ``sifive_u`` machine automatically generates a device tree blob ("dtb")
+which it passes to the guest, if there is no ``-dtb`` option. This provides
+information about the addresses, interrupt lines and other configuration of
+the various devices in the system. Guest software should discover the devices
+that are present in the generated DTB instead of using a DTB for the real
+hardware, as some of the devices are not modeled by QEMU and trying to access
+these devices may cause unexpected behavior.
+
+If users want to provide their own DTB, they can use the ``-dtb`` option.
+These DTBs should have the following requirements:
+
+* The /cpus node should contain at least one subnode for E51 and the number
+  of subnodes should match QEMU's ``-smp`` option
+* The /memory reg size should match QEMU's selected ram_size via ``-m``
+* Should contain a node for the CLINT device with a compatible string
+  "riscv,clint0" if using with OpenSBI BIOS images
+
+Boot options
+------------
+
+The ``sifive_u`` machine can start using the standard -kernel functionality
+for loading a Linux kernel, a VxWorks kernel, a modified U-Boot bootloader
+(S-mode) or ELF executable with the default OpenSBI firmware image as the
+-bios. It also supports booting the unmodified U-Boot bootloader using the
+standard -bios functionality.
+
+Machine-specific options
+------------------------
+
+The following machine-specific options are supported:
+
+- serial=nnn
+
+  The board serial number. When not given, the default serial number 1 is used.
+
+  SiFive reserves the first 1 KiB of the 16 KiB OTP memory for internal use.
+  Currently it is only used to store the serial number of the board at
+  offset 0xfc. U-Boot reads the serial number from the OTP memory, and uses
+  it to generate a unique MAC address to be programmed to the on-chip GEM
+  Ethernet controller.
+  When multiple QEMU ``sifive_u`` machines are created and connected to the
+  same subnet, they all have the same MAC address, which creates an unusable
+  network. In such a scenario, users should give different values to serial=
+  when creating different ``sifive_u`` machines.
+
+- start-in-flash
+
+  When given, QEMU's ROM code jumps to the QSPI memory-mapped flash directly.
+  Otherwise QEMU will jump to DRAM or L2LIM depending on the msel= value.
+  When not given, it defaults to direct DRAM booting.
+
+- msel=[6|11]
+
+  Mode Select (MSEL[3:0]) pins value, used to control where to boot from.
+
+  The FU540 SoC supports booting from several sources, which are controlled
+  using the Mode Select pins on the chip. Typically, the boot process runs
+  through several stages before it begins execution of user-provided programs.
+  These stages typically include the following:
+
+  1. Zeroth Stage Boot Loader (ZSBL), which is contained in an on-chip mask
+     ROM and provided by QEMU. Note that the QEMU-implemented ROM code is not
+     the same as what is programmed in the hardware; the QEMU one is a
+     simplified version, but it provides the same functionality as the
+     hardware.
+  2. First Stage Boot Loader (FSBL), which brings up PLLs and DDR memory.
+     This is U-Boot SPL.
+  3. Second Stage Boot Loader (SSBL), which further initializes additional
+     peripherals as needed. This is U-Boot proper combined with an OpenSBI
+     fw_dynamic firmware image.
+
+  msel=6 means FSBL and SSBL are both on the QSPI flash. msel=11 means FSBL
+  and SSBL are both on the SD card.
+
+Running Linux kernel
+--------------------
+
+Linux mainline v5.10 release is tested at the time of writing. To build a
+Linux mainline kernel that can be booted by the ``sifive_u`` machine in
+64-bit mode, simply configure the kernel using the defconfig configuration:
+
+.. code-block:: bash
+
+  $ export ARCH=riscv
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ make defconfig
+  $ make
+
+To boot the newly built Linux kernel in QEMU with the ``sifive_u`` machine:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M sifive_u -smp 5 -m 2G \
+      -display none -serial stdio \
+      -kernel arch/riscv/boot/Image \
+      -initrd /path/to/rootfs.ext4 \
+      -append "root=/dev/ram"
+
+Alternatively, we can use a custom DTB to boot the machine by inserting a CLINT
+node in fu540-c000.dtsi in the Linux kernel,
+
+.. code-block:: none
+
+    clint: clint@2000000 {
+        compatible = "riscv,clint0";
+        interrupts-extended = <&cpu0_intc 3 &cpu0_intc 7
+                               &cpu1_intc 3 &cpu1_intc 7
+                               &cpu2_intc 3 &cpu2_intc 7
+                               &cpu3_intc 3 &cpu3_intc 7
+                               &cpu4_intc 3 &cpu4_intc 7>;
+        reg = <0x00 0x2000000 0x00 0x10000>;
+    };
+
+with the following command line options:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M sifive_u -smp 5 -m 8G \
+      -display none -serial stdio \
+      -kernel arch/riscv/boot/Image \
+      -dtb arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dtb \
+      -initrd /path/to/rootfs.ext4 \
+      -append "root=/dev/ram"
+
+To build a Linux mainline kernel that can be booted by the ``sifive_u`` machine
+in 32-bit mode, use the rv32_defconfig configuration. A patch is required to
+fix the 32-bit boot issue for Linux kernel v5.10.
+
+.. code-block:: bash
+
+  $ export ARCH=riscv
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ curl https://patchwork.kernel.org/project/linux-riscv/patch/20201219001356.2887782-1-atish.patra@wdc.com/mbox/ > riscv.patch
+  $ git am riscv.patch
+  $ make rv32_defconfig
+  $ make
+
+Replace ``qemu-system-riscv64`` with ``qemu-system-riscv32`` in the command
+line above to boot the 32-bit Linux kernel. A rootfs image containing 32-bit
+applications should be used in order for the kernel to boot to user space.
+
+Running VxWorks kernel
+----------------------
+
+VxWorks 7 SR0650 release is tested at the time of writing. To build a 64-bit
+VxWorks mainline kernel that can be booted by the ``sifive_u`` machine, simply
+create a VxWorks source build project based on the sifive_generic BSP, and a
+VxWorks image project to generate the bootable VxWorks image, by following the
+BSP documentation instructions.
+
+A pre-built 64-bit VxWorks 7 image for the HiFive Unleashed board is available
+as part of the VxWorks SDK for testing as well. Instructions to download the
+SDK:
+
+.. code-block:: bash
+
+  $ wget https://labs.windriver.com/downloads/wrsdk-vxworks7-sifive-hifive-1.01.tar.bz2
+  $ tar xvf wrsdk-vxworks7-sifive-hifive-1.01.tar.bz2
+  $ ls bsps/sifive_generic_1_0_0_0/uboot/uVxWorks
+
+To boot the VxWorks kernel in QEMU with the ``sifive_u`` machine, use:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M sifive_u -smp 5 -m 2G \
+      -display none -serial stdio \
+      -nic tap,ifname=tap0,script=no,downscript=no \
+      -kernel /path/to/vxWorks \
+      -append "gem(0,0)host:vxWorks h=192.168.200.1 e=192.168.200.2:ffffff00 u=target pw=vxTarget f=0x01"
+
+It is also possible to test 32-bit VxWorks on the ``sifive_u`` machine. Create
+a 32-bit project to build the 32-bit VxWorks image, and use exactly the same
+command line options with ``qemu-system-riscv32``.
+
+Running U-Boot
+--------------
+
+U-Boot mainline v2021.07 release is tested at the time of writing. To build a
+U-Boot mainline bootloader that can be booted by the ``sifive_u`` machine, use
+the sifive_unleashed_defconfig with similar commands as described above for
+Linux:
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ export OPENSBI=/path/to/opensbi-riscv64-generic-fw_dynamic.bin
+  $ make sifive_unleashed_defconfig
+
+You will get the spl/u-boot-spl.bin and u-boot.itb files in the build tree.
+
+To start U-Boot using the ``sifive_u`` machine, prepare an SPI flash image or
+SD card image that is properly partitioned and populated with the correct
+contents. genimage_ can be used to generate these images.
+
+A sample configuration file for a 128 MiB SD card image is:
+
+.. code-block:: bash
+
+  $ cat genimage_sdcard.cfg
+  image sdcard.img {
+          size = 128M
+
+          hdimage {
+                  gpt = true
+          }
+
+          partition u-boot-spl {
+                  image = "u-boot-spl.bin"
+                  offset = 17K
+                  partition-type-uuid = 5B193300-FC78-40CD-8002-E86C45580B47
+          }
+
+          partition u-boot {
+                  image = "u-boot.itb"
+                  offset = 1041K
+                  partition-type-uuid = 2E54B353-1271-4842-806F-E436D6AF6985
+          }
+  }
+
+The SPI flash image has slightly different partition offsets, and the size has
+to be 32 MiB to match the ISSI 25WP256 flash on the real board:
+
+.. code-block:: bash
+
+  $ cat genimage_spi-nor.cfg
+  image spi-nor.img {
+          size = 32M
+
+          hdimage {
+                  gpt = true
+          }
+
+          partition u-boot-spl {
+                  image = "u-boot-spl.bin"
+                  offset = 20K
+                  partition-type-uuid = 5B193300-FC78-40CD-8002-E86C45580B47
+          }
+
+          partition u-boot {
+                  image = "u-boot.itb"
+                  offset = 1044K
+                  partition-type-uuid = 2E54B353-1271-4842-806F-E436D6AF6985
+          }
+  }
+
+Assuming the U-Boot binaries are put in the same directory as the config file,
+we can generate the image by:
+
+.. code-block:: bash
+
+  $ genimage --config genimage_<boot_src>.cfg --inputpath .
+
+Boot U-Boot from the SD card by specifying msel=11 and passing the SD card
+image to the QEMU ``sifive_u`` machine:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M sifive_u,msel=11 -smp 5 -m 8G \
+      -display none -serial stdio \
+      -bios /path/to/u-boot-spl.bin \
+      -drive file=/path/to/sdcard.img,if=sd
+
+Changing the msel= value to 6 allows booting U-Boot from the SPI flash:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M sifive_u,msel=6 -smp 5 -m 8G \
+      -display none -serial stdio \
+      -bios /path/to/u-boot-spl.bin \
+      -drive file=/path/to/spi-nor.img,if=mtd
+
+Note that when testing U-Boot, QEMU's automatically generated device tree blob
+is not used because U-Boot itself embeds device tree blobs for U-Boot SPL and
+U-Boot proper. Hence the number of cores and the size of memory have to match
+the real hardware, i.e. 5 cores (-smp 5) and 8 GiB of memory (-m 8G).
+
+The above use case runs upstream U-Boot for the SiFive HiFive Unleashed
+board on the QEMU ``sifive_u`` machine out of the box. This allows users to
+develop and test the recommended RISC-V boot flow with a real world use
+case: ZSBL (in QEMU) loads U-Boot SPL from SD card or SPI flash to L2LIM,
+then U-Boot SPL loads the combined payload image of OpenSBI fw_dynamic
+firmware and U-Boot proper.
+
+However, sometimes we want a quick test of booting U-Boot on QEMU without
+the need to prepare the SPI flash or SD card images. An alternate way can be
+used, which is to create a U-Boot S-mode image by modifying the configuration
+of U-Boot:
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ make sifive_unleashed_defconfig
+  $ make menuconfig
+
+then manually select the following configuration:
+
+  * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage bootloader DTB
+
+and unselect the following configuration:
+
+  * Library routines ---> Allow access to binman information in the device tree
+
+This changes U-Boot to use the QEMU-generated device tree blob, and bypasses
+the U-Boot SPL stage.
+
+Boot the 64-bit U-Boot S-mode image directly:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M sifive_u -smp 5 -m 2G \
+      -display none -serial stdio \
+      -kernel /path/to/u-boot.bin
+
+It's possible to create a 32-bit U-Boot S-mode image as well.
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ make sifive_unleashed_defconfig
+  $ make menuconfig
+
+then manually update the following configuration in U-Boot:
+
+  * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage bootloader DTB
+  * RISC-V architecture ---> Base ISA ---> RV32I
+  * Boot options ---> Boot images ---> Text Base ---> 0x80400000
+
+and unselect the following configuration:
+
+  * Library routines ---> Allow access to binman information in the device tree
+
+Use the same command line options to boot the 32-bit U-Boot S-mode image:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv32 -M sifive_u -smp 5 -m 2G \
+      -display none -serial stdio \
+      -kernel /path/to/u-boot.bin
+
+.. _genimage: https://github.com/pengutronix/genimage
diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
new file mode 100644
index 000000000..fa016584b
--- /dev/null
+++ b/docs/system/riscv/virt.rst
@@ -0,0 +1,148 @@
+'virt' Generic Virtual Platform (``virt``)
+==========================================
+
+The ``virt`` board is a platform which does not correspond to any real
+hardware; it is designed for use in virtual machines. It is the recommended
+board type if you simply want to run a guest such as Linux and do not care
+about reproducing the idiosyncrasies and limitations of a particular bit of
+real-world hardware.
+
+Supported devices
+-----------------
+
+The ``virt`` machine supports the following devices:
+
+* Up to 8 generic RV32GC/RV64GC cores, with optional extensions
+* Core Local Interruptor (CLINT)
+* Platform-Level Interrupt Controller (PLIC)
+* CFI parallel NOR flash memory
+* 1 NS16550 compatible UART
+* 1 Google Goldfish RTC
+* 1 SiFive Test device
+* 8 virtio-mmio transport devices
+* 1 generic PCIe host bridge
+* The fw_cfg device that allows a guest to obtain data from QEMU
+
+Note that the default CPU is a generic RV32GC/RV64GC. Optional extensions
+can be enabled via command line parameters, e.g.: ``-cpu rv64,x-h=true``
+enables the hypervisor extension for RV64.
+
+Hardware configuration information
+----------------------------------
+
+The ``virt`` machine automatically generates a device tree blob ("dtb")
+which it passes to the guest, if there is no ``-dtb`` option. This provides
+information about the addresses, interrupt lines and other configuration of
+the various devices in the system. Guest software should discover the devices
+that are present in the generated DTB.
+
+If users want to provide their own DTB, they can use the ``-dtb`` option.
+These DTBs should have the following requirements:
+
+* The number of subnodes of the /cpus node should match QEMU's ``-smp`` option
+* The /memory reg size should match QEMU's selected ram_size via ``-m``
+* Should contain a node for the CLINT device with a compatible string
+  "riscv,clint0" if using with OpenSBI BIOS images
+
+Boot options
+------------
+
+The ``virt`` machine can start using the standard -kernel functionality
+for loading a Linux kernel, a VxWorks kernel, or an S-mode U-Boot bootloader,
+with the default OpenSBI firmware image as the -bios. It also supports
+the recommended RISC-V bootflow: U-Boot SPL (M-mode) loads OpenSBI fw_dynamic
+firmware and U-Boot proper (S-mode), using the standard -bios functionality.
+
+Machine-specific options
+------------------------
+
+The following machine-specific options are supported:
+
+- aclint=[on|off]
+
+  When this option is "on", ACLINT devices will be emulated instead of the
+  SiFive CLINT. When not specified, this option is assumed to be "off".
+
+Running Linux kernel
+--------------------
+
+Linux mainline v5.12 release is tested at the time of writing. To build a
+Linux mainline kernel that can be booted by the ``virt`` machine in
+64-bit mode, simply configure the kernel using the defconfig configuration:
+
+.. code-block:: bash
+
+  $ export ARCH=riscv
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ make defconfig
+  $ make
+
+To boot the newly built Linux kernel in QEMU with the ``virt`` machine:
+
+
+Running Linux kernel
+--------------------
+
+Linux mainline v5.12 release is tested at the time of writing. To build a
+Linux mainline kernel that can be booted by the ``virt`` machine in
+64-bit mode, simply configure the kernel using the defconfig configuration:
+
+.. code-block:: bash
+
+   $ export ARCH=riscv
+   $ export CROSS_COMPILE=riscv64-linux-
+   $ make defconfig
+   $ make
+
+To boot the newly built Linux kernel in QEMU with the ``virt`` machine:
+
+.. code-block:: bash
+
+   $ qemu-system-riscv64 -M virt -smp 4 -m 2G \
+       -display none -serial stdio \
+       -kernel arch/riscv/boot/Image \
+       -initrd /path/to/rootfs.cpio \
+       -append "root=/dev/ram"
+
+To build a Linux mainline kernel that can be booted by the ``virt`` machine
+in 32-bit mode, use the rv32_defconfig configuration. A patch is required to
+fix the 32-bit boot issue for Linux kernel v5.12.
+
+.. code-block:: bash
+
+   $ export ARCH=riscv
+   $ export CROSS_COMPILE=riscv64-linux-
+   $ curl https://patchwork.kernel.org/project/linux-riscv/patch/20210627135117.28641-1-bmeng.cn@gmail.com/mbox/ > riscv.patch
+   $ git am riscv.patch
+   $ make rv32_defconfig
+   $ make
+
+Replace ``qemu-system-riscv64`` with ``qemu-system-riscv32`` in the command
+line above to boot the 32-bit Linux kernel. A rootfs image containing 32-bit
+applications must be used in order for the kernel to boot to user space.
+
+Running U-Boot
+--------------
+
+U-Boot mainline v2021.04 release is tested at the time of writing. To build
+an S-mode U-Boot bootloader that can be booted by the ``virt`` machine, use
+qemu-riscv64_smode_defconfig, with commands similar to those described above
+for Linux:
+
+.. code-block:: bash
+
+   $ export CROSS_COMPILE=riscv64-linux-
+   $ make qemu-riscv64_smode_defconfig
+
+Boot the 64-bit U-Boot S-mode image directly:
+
+.. code-block:: bash
+
+   $ qemu-system-riscv64 -M virt -smp 4 -m 2G \
+       -display none -serial stdio \
+       -kernel /path/to/u-boot.bin
+
+To test booting U-Boot SPL, which runs in M-mode and in turn loads a FIT
+image that bundles OpenSBI fw_dynamic firmware and U-Boot proper (S-mode)
+together, build the U-Boot images using qemu-riscv64_spl_defconfig:
+
+.. code-block:: bash
+
+   $ export CROSS_COMPILE=riscv64-linux-
+   $ export OPENSBI=/path/to/opensbi-riscv64-generic-fw_dynamic.bin
+   $ make qemu-riscv64_spl_defconfig
+
+The minimal QEMU commands to run U-Boot SPL are:
+
+.. code-block:: bash
+
+   $ qemu-system-riscv64 -M virt -smp 4 -m 2G \
+       -display none -serial stdio \
+       -bios /path/to/u-boot-spl \
+       -device loader,file=/path/to/u-boot.itb,addr=0x80200000
+
+To test 32-bit U-Boot images, switch to the qemu-riscv32_smode_defconfig and
+qemu-riscv32_spl_defconfig builds, and replace ``qemu-system-riscv64`` with
+``qemu-system-riscv32`` in the command lines above to boot the 32-bit U-Boot.
diff --git a/docs/system/s390x/3270.rst b/docs/system/s390x/3270.rst
new file mode 100644
index 000000000..0e173b323
--- /dev/null
+++ b/docs/system/s390x/3270.rst
@@ -0,0 +1,63 @@
+3270 devices
+============
+
+The 3270 is the classic 'green-screen' console of the mainframes (see the
+`IBM 3270 Wikipedia article <https://en.wikipedia.org/wiki/IBM_3270>`__).
+
+The 3270 data stream is not implemented within QEMU; the device only provides
+TN3270 (a telnet extension; see `RFC 854 <https://tools.ietf.org/html/rfc854>`__
+and `RFC 1576 <https://tools.ietf.org/html/rfc1576>`__) and leaves the heavy
+lifting to an external 3270 terminal emulator (such as ``x3270``) to make a
+single 3270 device available to a guest. Note that this supports basic
+features only.
+
+To provide a 3270 device to a guest, create an ``x-terminal3270`` linked to
+a ``tn3270`` chardev. The guest will see a 3270 channel device. In order
+to actually be able to use it, attach the ``x3270`` emulator to the chardev.
+
+Example configuration
+---------------------
+
+* Make sure that 3270 support is enabled in the guest's Linux kernel.
+  You need ``CONFIG_TN3270`` and at least one of ``CONFIG_TN3270_TTY`` (for
+  additional ttys) or ``CONFIG_TN3270_CONSOLE`` (for a 3270 console).
+
+* Add a ``tn3270`` chardev and an ``x-terminal3270`` to the QEMU command line::
+
+    -chardev socket,id=ch0,host=0.0.0.0,port=2300,wait=off,server=on,tn3270=on
+    -device x-terminal3270,chardev=ch0,devno=fe.0.000a,id=terminal0
+
+* Start the guest. In the guest, use ``chccwdev -e 0.0.000a`` to enable
+  the device.
+
+* On the host, start the ``x3270`` emulator::
+
+    x3270 <host>:2300
+
+* In the guest, locate the 3270 device node under ``/dev/3270/`` (say,
+  ``tty1``) and start a getty on it::
+
+    systemctl start serial-getty@3270-tty1.service
+
+  This should get you an additional tty for logging into the guest.
+
+* If you want to use the 3270 device as the Linux kernel console instead of
+  an additional tty, you can also append ``conmode=3270 condev=000a`` to
+  the guest's kernel command line. The kernel should then use the 3270 as
+  its console after the next boot.
+
+Restrictions
+------------
+
+3270 support is very basic. In particular:
+
+* Only one 3270 device is supported.
+
+* It has only been tested with Linux guests and the x3270 emulator.
+
+* TLS/SSL is not supported.
+
+* Resizing on reattach is not supported.
+
+* Multiple commands in one inbound buffer (for example, when the reset key
+  is pressed while the network is slow) are not supported.
diff --git a/docs/system/s390x/bootdevices.rst b/docs/system/s390x/bootdevices.rst
new file mode 100644
index 000000000..9e591cb9d
--- /dev/null
+++ b/docs/system/s390x/bootdevices.rst
@@ -0,0 +1,82 @@
+Boot devices on s390x
+=====================
+
+Booting with bootindex parameter
+--------------------------------
+
+For classical mainframe guests (i.e. LPAR or z/VM installations), you always
+have to explicitly specify the disk where you want to boot from (or "IPL"
+from, in s390x-speak -- IPL means "Initial Program Load"). In particular,
+there can also be only one boot device according to the architecture
+specification, thus specifying multiple boot devices is not possible (yet).
+
+So for booting an s390x guest in QEMU, you should always mark the
+device where you want to boot from with the ``bootindex`` property, for
+example::
+
+  qemu-system-s390x -drive if=none,id=dr1,file=guest.qcow2 \
+                    -device virtio-blk,drive=dr1,bootindex=1
+
+For booting from a CD-ROM ISO image (which needs to include El-Torito boot
+information in order to be bootable), it is recommended to specify a
+``scsi-cd`` device, for example like this::
+
+  qemu-system-s390x -blockdev file,node-name=c1,filename=... \
+                    -device virtio-scsi \
+                    -device scsi-cd,drive=c1,bootindex=1
+
+Note that you really have to use the ``bootindex`` property to select the
+boot device. The old-fashioned ``-boot order=...`` command of QEMU (and
+also ``-boot once=...``) is not supported on s390x.
+
+
+Booting without bootindex parameter
+-----------------------------------
+
+The QEMU guest firmware (the so-called s390-ccw bios) also has some
+rudimentary support for scanning through the available block devices. So in
+case you did not specify a boot device with the ``bootindex`` property, there
+is still a chance that it finds a bootable device on its own and starts a
+guest operating system from it. However, this scanning algorithm is still
+very rough and may be incomplete, so it might fail to detect a bootable
+device in many cases.
+It is really recommended to always specify the boot device with the
+``bootindex`` property instead.
+
+This also means that you should avoid the classical short-cut commands like
+``-hda``, ``-cdrom`` or ``-drive if=virtio``, since it is not possible to
+specify the ``bootindex`` with these commands. Note that the convenience
+``-cdrom`` option does not even give you a real (virtio-scsi) CD-ROM device
+on s390x. Due to technical limitations in the QEMU code base, you will get a
+virtio-blk device with this parameter instead, which might not be the right
+device type for installing a Linux distribution via ISO image. It is
+recommended to specify a CD-ROM device via ``-device scsi-cd`` (as mentioned
+above) instead.
+
+
+Booting from a network device
+-----------------------------
+
+Besides the normal guest firmware (which is loaded from the file
+``s390-ccw.img`` in the data directory of QEMU, or via the ``-bios`` option),
+QEMU ships with a small TFTP network bootloader firmware for virtio-net-ccw
+devices, too. This firmware is loaded from a file called ``s390-netboot.img``
+in the QEMU data directory. In case you want to load it from a different
+filename instead, you can specify it via the
+``-global s390-ipl.netboot_fw=filename`` command line option.
+
+The ``bootindex`` property is especially important for booting via the
+network. If you don't specify the ``bootindex`` property here, the network
+bootloader firmware code won't get loaded into guest memory, so the network
+boot will fail. For a successful network boot, try something like this::
+
+  qemu-system-s390x -netdev user,id=n1,tftp=...,bootfile=... \
+                    -device virtio-net-ccw,netdev=n1,bootindex=1
+
+The network bootloader firmware also has basic support for pxelinux.cfg-style
+configuration files. See the `PXELINUX Configuration page
+<https://wiki.syslinux.org/wiki/index.php?title=PXELINUX#Configuration>`__
+for details on how to set up the configuration file on your TFTP server.
+The supported configuration file entries are ``DEFAULT``, ``LABEL``,
+``KERNEL``, ``INITRD`` and ``APPEND`` (see the `Syslinux Config file syntax
+<https://wiki.syslinux.org/wiki/index.php?title=Config>`__ for more
+information).
diff --git a/docs/system/s390x/css.rst b/docs/system/s390x/css.rst
new file mode 100644
index 000000000..3b4016118
--- /dev/null
+++ b/docs/system/s390x/css.rst
@@ -0,0 +1,86 @@
+The virtual channel subsystem
+=============================
+
+QEMU implements a virtual channel subsystem with subchannels, (mostly
+functionless) channel paths, and channel devices (virtio-ccw, 3270, and
+devices passed via vfio-ccw). It supports multiple subchannel sets (MSS) and
+multiple channel subsystems extended (MCSS-E).
+
+All channel devices support the ``devno`` property, which takes a parameter
+in the form ``<cssid>.<ssid>.<device number>``.
+
+The default channel subsystem image id (``<cssid>``) is ``0xfe``. Devices in
+there will show up in channel subsystem image ``0`` to guests that do not
+enable MCSS-E. Note that devices with a different cssid will not be visible
+if the guest OS does not enable MCSS-E (which is true for all supported guest
+operating systems today).
+
+Supported values for the subchannel set id (``<ssid>``) range from ``0-3``.
+Devices with an ssid that is not ``0`` will not be visible if the guest OS
+does not enable MSS (any Linux version that supports virtio also enables
+MSS). Any device may be put into any subchannel set; there is no restriction
+by device type.
+
+The device number can range from ``0-0xffff``.
+
+If the ``devno`` property is not specified for a device, QEMU will choose the
+next free device number in subchannel set 0, skipping to the next subchannel
+set if no more device numbers are free.
+
+QEMU places a device at the first free subchannel in the specified subchannel
+set. If a device is hotunplugged and later replugged, it may appear at a
+different subchannel. (This is similar to how z/VM works.)
+
+
+Examples
+--------
+
+* a virtio-net device, cssid/ssid/devno automatically assigned::
+
+    -device virtio-net-ccw
+
+  In a Linux guest (without default devices and no other devices specified
+  prior to this one), this will show up as ``0.0.0000`` under subchannel
+  ``0.0.0000``.
+
+  The auto-assigned properties in QEMU (as seen via e.g. ``info qtree``)
+  would be ``dev_id = "fe.0.0000"`` and ``subch_id = "fe.0.0000"``.
+
+* a virtio-rng device in subchannel set ``0``::
+
+    -device virtio-rng-ccw,devno=fe.0.0042
+
+  If added to the same Linux guest as above, it would show up as ``0.0.0042``
+  under subchannel ``0.0.0001``.
+
+  The properties for the device would be ``dev_id = "fe.0.0042"`` and
+  ``subch_id = "fe.0.0001"``.
+
+* a virtio-gpu device in subchannel set ``2``::
+
+    -device virtio-gpu-ccw,devno=fe.2.1111
+
+  If added to the same Linux guest as above, it would show up as ``0.2.1111``
+  under subchannel ``0.2.0000``.
+
+  The properties for the device would be ``dev_id = "fe.2.1111"`` and
+  ``subch_id = "fe.2.0000"``.
+
+* a virtio-mouse device in a non-standard channel subsystem image::
+
+    -device virtio-mouse-ccw,devno=2.0.2222
+
+  This would not show up in a standard Linux guest.
+
+  The properties for the device would be ``dev_id = "2.0.2222"`` and
+  ``subch_id = "2.0.0000"``.
+
+* a virtio-keyboard device in another non-standard channel subsystem image::
+
+    -device virtio-keyboard-ccw,devno=0.0.1234
+
+  This would not show up in a standard Linux guest, either, as ``0`` is not
+  the standard channel subsystem image id.
+
+  The properties for the device would be ``dev_id = "0.0.1234"`` and
+  ``subch_id = "0.0.0000"``.
diff --git a/docs/system/s390x/protvirt.rst b/docs/system/s390x/protvirt.rst
new file mode 100644
index 000000000..aee63ed7e
--- /dev/null
+++ b/docs/system/s390x/protvirt.rst
@@ -0,0 +1,67 @@
+Protected Virtualization on s390x
+=================================
+
+The memory and most of the registers of Protected Virtual Machines
+(PVMs) are encrypted or inaccessible to the hypervisor, effectively
+prohibiting VM introspection when the VM is running. At rest, PVMs are
+encrypted and can only be decrypted by the firmware, represented by an
+entity called Ultravisor, of specific IBM Z machines.
+
+
+Prerequisites
+-------------
+
+To run PVMs, a machine with the Protected Virtualization feature, as
+indicated by the Ultravisor Call facility (stfle bit 158), is
+required. The Ultravisor needs to be initialized at boot by setting
+``prot_virt=1`` on the host's kernel command line.
+
+Running PVMs requires using the KVM hypervisor.
+
+If those requirements are met, the capability ``KVM_CAP_S390_PROTECTED``
+will indicate that KVM can support PVMs on that LPAR.
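+
+A quick way to check that the Ultravisor was enabled at boot is to look at
+the host's kernel command line (a minimal sketch; it only verifies that the
+parameter was set, not that the Ultravisor initialized successfully)::
+
+    grep -o 'prot_virt=1' /proc/cmdline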
+
+
+Running a Protected Virtual Machine
+-----------------------------------
+
+To run a PVM you will need to select a CPU model which includes the
+``Unpack facility`` (stfle bit 161 represented by the feature
+``unpack``/``S390_FEAT_UNPACK``), and add these options to the command line::
+
+    -object s390-pv-guest,id=pv0 \
+    -machine confidential-guest-support=pv0
+
+Adding these options will:
+
+* Ensure the ``unpack`` facility is available
+* Enable the IOMMU by default for all I/O devices
+* Initialize the PV mechanism
+
+Passthrough (vfio) devices are currently not supported.
+
+Host huge page backings are not supported. However, guests can use huge
+pages as indicated by their facilities.
+
+
+Boot Process
+------------
+
+A secure guest image can either be loaded from disk or supplied on the
+QEMU command line. Booting from disk is done by the unmodified
+s390-ccw BIOS. That is, the bootmap is interpreted, multiple components
+are read into memory and control is transferred to one of the
+components (zipl stage3). Stage3 does some fixups and then transfers
+control to some program residing in guest memory, which is normally
+the OS kernel. The secure image has another component prepended
+(stage3a) that uses the new diag308 subcodes 8 and 10 to trigger the
+transition into secure mode.
+
+Booting from the image supplied on the QEMU command line requires that
+the file passed via ``-kernel`` has the same memory layout as would result
+from the disk boot. This memory layout includes the encrypted
+components (kernel, initrd, cmdline), the stage3a loader and
+metadata. In case this boot method is used, the command line
+options ``-initrd`` and ``-cmdline`` are ineffective. The preparation of a
+PVM image is done via the ``genprotimg`` tool from the s390-tools
+collection.
diff --git a/docs/system/s390x/vfio-ap.rst b/docs/system/s390x/vfio-ap.rst
new file mode 100644
index 000000000..084ba9c4e
--- /dev/null
+++ b/docs/system/s390x/vfio-ap.rst
@@ -0,0 +1,916 @@
+Adjunct Processor (AP) Device
+=============================
+
+.. contents::
+
+Introduction
+------------
+
+The IBM Adjunct Processor (AP) Cryptographic Facility comprises
+three AP instructions and from 1 to 256 PCIe cryptographic adapter cards.
+These AP devices provide cryptographic functions to all CPUs assigned to a
+linux system running in an IBM Z system LPAR.
+
+On s390x, AP adapter cards are exposed via the AP bus. This document
+describes how those cards may be made available to KVM guests using the
+VFIO mediated device framework.
+
+AP Architectural Overview
+-------------------------
+
+In order to understand the terminology used in the rest of this document,
+let's start with some definitions:
+
+* AP adapter
+
+  An AP adapter is an IBM Z adapter card that can perform cryptographic
+  functions. There can be from 0 to 256 adapters assigned to an LPAR
+  depending on the machine model. Adapters assigned to the LPAR in which a
+  linux host is running will be available to the linux host. Each adapter is
+  identified by a number from 0 to 255; however, the maximum adapter number
+  allowed is determined by machine model. When installed, an AP adapter is
+  accessed by AP instructions executed by any CPU.
+
+* AP domain
+
+  An adapter is partitioned into domains. Each domain can be thought of as
+  a set of hardware registers for processing AP instructions. An adapter can
+  hold up to 256 domains; however, the maximum domain number allowed is
+  determined by machine model. Each domain is identified by a number from 0
+  to 255.
+  Domains can be further classified into two types:
+
+  * Usage domains are domains that can be accessed directly to process AP
+    commands
+
+  * Control domains are domains that are accessed indirectly by AP
+    commands sent to a usage domain to control or change the domain; for
+    example, to set a secure private key for the domain.
+
+* AP Queue
+
+  An AP queue is the means by which an AP command-request message is sent to
+  an AP usage domain inside a specific AP. An AP queue is identified by a
+  tuple comprised of an AP adapter ID (APID) and an AP queue index (APQI).
+  The APQI corresponds to a given usage domain number within the adapter.
+  This tuple forms an AP Queue Number (APQN) uniquely identifying an AP
+  queue. AP instructions include a field containing the APQN to identify the
+  AP queue to which the AP command-request message is to be sent for
+  processing.
+
+* AP Instructions:
+
+  There are three AP instructions:
+
+  * NQAP: to enqueue an AP command-request message to a queue
+  * DQAP: to dequeue an AP command-reply message from a queue
+  * PQAP: to administer the queues
+
+  AP instructions identify the domain that is targeted to process the AP
+  command; this must be one of the usage domains. An AP command may modify a
+  domain that is not one of the usage domains, but the modified domain
+  must be one of the control domains.
+
+Start Interpretive Execution (SIE) Instruction
+----------------------------------------------
+
+A KVM guest is started by executing the Start Interpretive Execution (SIE)
+instruction. The SIE state description is a control block that contains the
+state information for a KVM guest and is supplied as input to the SIE
+instruction. The SIE state description contains a satellite control block
+called the Crypto Control Block (CRYCB). The CRYCB contains three fields to
+identify the adapters, usage domains and control domains assigned to the KVM
+guest:
+
+* The AP Mask (APM) field is a bit mask that identifies the AP adapters
+  assigned to the KVM guest. Each bit in the mask, from left to right,
+  corresponds to an APID from 0-255. If a bit is set, the corresponding
+  adapter is valid for use by the KVM guest.
+
+* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage
+  domains assigned to the KVM guest. Each bit in the mask, from left to
+  right, corresponds to an AP queue index (APQI) from 0-255. If a bit is
+  set, the corresponding queue is valid for use by the KVM guest.
+
+* The AP Domain Mask (ADM) field is a bit mask that identifies the AP
+  control domains assigned to the KVM guest. The ADM bit mask controls which
+  domains can be changed by an AP command-request message sent to a usage
+  domain from the guest. Each bit in the mask, from left to right,
+  corresponds to a domain from 0-255. If a bit is set, the corresponding
+  domain can be modified by an AP command-request message sent to a usage
+  domain.
+
+Recall from the description of an AP Queue that AP instructions include
+an APQN to identify the AP adapter and AP queue to which an AP
+command-request message is to be sent (NQAP and PQAP instructions), or from
+which a command-reply message is to be received (DQAP instruction). The
+validity of an APQN is defined by the matrix calculated from the APM and
+AQM; it is the cross product of all assigned adapter numbers (APM) with all
+assigned queue indexes (AQM). For example, if adapters 1 and 2 and usage
+domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6), (2,5) and
+(2,6) will be valid for the guest.
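+
+The cross product is easy to enumerate; as a purely illustrative sketch
+(this shell loop is not part of any AP tooling)::
+
+  $ for a in 1 2; do for d in 5 6; do printf '(%d,%d) ' $a $d; done; done; echo
+  (1,5) (1,6) (2,5) (2,6)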
+
+The APQNs can provide secure key functionality - i.e., a private key is
+stored on the adapter card for each of its domains - so each APQN must be
+assigned to at most one guest or the linux host.
+
+Example 1: Valid configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
++----------+--------+--------+
+|          | Guest1 | Guest2 |
++==========+========+========+
+| adapters |  1, 2  |  1, 2  |
++----------+--------+--------+
+| domains  |  5, 6  |   7    |
++----------+--------+--------+
+
+This is valid because both guests have a unique set of APQNs:
+
+* Guest1 has APQNs (1,5), (1,6), (2,5) and (2,6);
+* Guest2 has APQNs (1,7) and (2,7).
+
+Example 2: Valid configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
++----------+--------+--------+
+|          | Guest1 | Guest2 |
++==========+========+========+
+| adapters |  1, 2  |  3, 4  |
++----------+--------+--------+
+| domains  |  5, 6  |  5, 6  |
++----------+--------+--------+
+
+This is also valid because both guests have a unique set of APQNs:
+
+* Guest1 has APQNs (1,5), (1,6), (2,5), (2,6);
+* Guest2 has APQNs (3,5), (3,6), (4,5), (4,6).
+
+Example 3: Invalid configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
++----------+--------+--------+
+|          | Guest1 | Guest2 |
++==========+========+========+
+| adapters |  1, 2  |   1    |
++----------+--------+--------+
+| domains  |  5, 6  |  6, 7  |
++----------+--------+--------+
+
+This is an invalid configuration because both guests have access to
+APQN (1,6).
+
+AP Matrix Configuration on Linux Host
+-------------------------------------
+
+A linux system is a guest of the LPAR in which it is running and has access
+to the AP resources configured for the LPAR. The LPAR's AP matrix is
+configured via its Activation Profile which can be edited on the HMC. When
+the linux system is started, the AP bus will detect the AP devices assigned
+to the LPAR and create the following in sysfs::
+
+  /sys/bus/ap
+  ... [devices]
+  ...... xx.yyyy
+  ...... ...
+  ...... cardxx
+  ...... ...
+
+Where:
+
+``cardxx``
+  is AP adapter number xx (in hex)
+
+``xx.yyyy``
+  is an APQN with xx specifying the APID and yyyy specifying the APQI
+
+For example, if AP adapters 5 and 6 and domains 4, 71 (0x47), 171 (0xab) and
+255 (0xff) are configured for the LPAR, the sysfs representation on the
+linux host system would look like this::
+
+  /sys/bus/ap
+  ... [devices]
+  ...... 05.0004
+  ...... 05.0047
+  ...... 05.00ab
+  ...... 05.00ff
+  ...... 06.0004
+  ...... 06.0047
+  ...... 06.00ab
+  ...... 06.00ff
+  ...... card05
+  ...... card06
+
+A set of default device drivers is also created to control each type of AP
+device that can be assigned to the LPAR on which a linux host is running::
+
+  /sys/bus/ap
+  ... [drivers]
+  ...... [cex2acard]    for Crypto Express 2/3 accelerator cards
+  ...... [cex2aqueue]   for AP queues served by Crypto Express 2/3
+                        accelerator cards
+  ...... [cex4card]     for Crypto Express 4/5/6 accelerator and coprocessor
+                        cards
+  ...... [cex4queue]    for AP queues served by Crypto Express 4/5/6
+                        accelerator and coprocessor cards
+  ...... [pcixcccard]   for Crypto Express 2/3 coprocessor cards
+  ...... [pcixccqueue]  for AP queues served by Crypto Express 2/3
+                        coprocessor cards
+
+Binding AP devices to device drivers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are two sysfs files that specify bitmasks marking a subset of the APQN
+range as 'usable by the default AP queue device drivers' or 'not usable by
+the default device drivers' and thus available for use by the alternate
+device driver(s).
+The sysfs locations of the masks are::
+
+  /sys/bus/ap/apmask
+  /sys/bus/ap/aqmask
+
+The ``apmask`` is a 256-bit mask that identifies a set of AP adapter IDs
+(APID). Each bit in the mask, from left to right (i.e., from most significant
+to least significant bit in big endian order), corresponds to an APID from
+0-255. If a bit is set, the APID is marked as usable only by the default AP
+queue device drivers; otherwise, the APID is usable by the vfio_ap
+device driver.
+
+The ``aqmask`` is a 256-bit mask that identifies a set of AP queue indexes
+(APQI). Each bit in the mask, from left to right (i.e., from most significant
+to least significant bit in big endian order), corresponds to an APQI from
+0-255. If a bit is set, the APQI is marked as usable only by the default AP
+queue device drivers; otherwise, the APQI is usable by the vfio_ap device
+driver.
+
+Take, for example, the following mask::
+
+  0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+
+It indicates:
+
+  1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
+  belong to the vfio_ap device driver's pool.
+
+The APQN of each AP queue device assigned to the linux host is checked by the
+AP bus against the set of APQNs derived from the cross product of APIDs
+and APQIs marked as usable only by the default AP queue device drivers. If a
+match is detected, only the default AP queue device drivers will be probed;
+otherwise, the vfio_ap device driver will be probed.
+
+By default, the two masks are set to reserve all APQNs for use by the default
+AP queue device drivers. There are two ways the default masks can be changed:
+
+ 1. The sysfs mask files can be edited by echoing a string into the
+    respective sysfs mask file in one of two formats:
+
+    * An absolute hex string starting with 0x - like "0x12345678" - sets
+      the mask. If the given string is shorter than the mask, it is padded
+      with 0s on the right; for example, specifying a mask value of 0x41 is
+      the same as specifying::
+
+        0x4100000000000000000000000000000000000000000000000000000000000000
+
+      Keep in mind that the mask reads from left to right (i.e., most
+      significant to least significant bit in big endian order), so the mask
+      above identifies device numbers 1 and 7 (``01000001``).
+
+      If the string is longer than the mask, the operation is terminated with
+      an error (EINVAL).
+
+    * Individual bits in the mask can be switched on and off by specifying
+      each bit number to be switched in a comma separated list. Each bit
+      number string must be prepended with a plus (``+``) or minus (``-``) to
+      indicate the corresponding bit is to be switched on (``+``) or off
+      (``-``). Some valid values are::
+
+        "+0"    switches bit 0 on
+        "-13"   switches bit 13 off
+        "+0x41" switches bit 65 on
+        "-0xff" switches bit 255 off
+
+      The following example::
+
+        +0,-6,+0x47,-0xf0
+
+      Switches bits 0 and 71 (0x47) on
+      Switches bits 6 and 240 (0xf0) off
+
+      Note that the bits not specified in the list remain as they were before
+      the operation.
+ 2. The masks can also be changed at boot time via parameters on the kernel
+    command line like this::
+
+      ap.apmask=0xffff ap.aqmask=0x40
+
+    This would create the following masks:
+
+    apmask::
+
+      0xffff000000000000000000000000000000000000000000000000000000000000
+
+    aqmask::
+
+      0x4000000000000000000000000000000000000000000000000000000000000000
+
+    Resulting in these two pools::
+
+      default drivers pool:    adapter 0-15, domain 1
+      alternate drivers pool:  adapter 16-255, domains 0, 2-255
+
+Configuring an AP matrix for a linux guest
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The sysfs interfaces for configuring an AP matrix for a guest are built on
+the VFIO mediated device framework. To configure an AP matrix for a guest, a
+mediated matrix device must first be created for the
+``/sys/devices/vfio_ap/matrix`` device. When the vfio_ap device driver is
+loaded, it registers with the VFIO mediated device framework. When the
+driver registers, the sysfs interfaces for creating mediated matrix devices
+are created::
+
+  /sys/devices
+  ... [vfio_ap]
+  ......[matrix]
+  ......... [mdev_supported_types]
+  ............ [vfio_ap-passthrough]
+  ............... create
+  ............... [devices]
+
+A mediated AP matrix device is created by writing a UUID to the attribute
+file named ``create``, for example::
+
+  uuidgen > create
+
+or
+
+::
+
+  echo $uuid > create
+
+When a mediated AP matrix device is created, a sysfs directory named after
+the UUID is created in the ``devices`` subdirectory::
+
+  /sys/devices
+  ... [vfio_ap]
+  ......[matrix]
+  ......... [mdev_supported_types]
+  ............ [vfio_ap-passthrough]
+  ............... create
+  ............... [devices]
+  .................. [$uuid]
+
+There will also be three sets of attribute files created in the mediated
+matrix device's sysfs directory to configure an AP matrix for the
+KVM guest::
+
+  /sys/devices
+  ... [vfio_ap]
+  ......[matrix]
+  ......... [mdev_supported_types]
+  ............ [vfio_ap-passthrough]
+  ............... create
+  ............... [devices]
+  .................. [$uuid]
+  ..................... assign_adapter
+  ..................... assign_control_domain
+  ..................... assign_domain
+  ..................... matrix
+  ..................... unassign_adapter
+  ..................... unassign_control_domain
+  ..................... unassign_domain
+
+``assign_adapter``
+  To assign an AP adapter to the mediated matrix device, its APID is written
+  to the ``assign_adapter`` file. This may be done multiple times to assign
+  more than one adapter. The APID may be specified using conventional
+  semantics as a decimal, hexadecimal, or octal number. For example, to
+  assign adapters 4, 5 and 16 to a mediated matrix device in decimal,
+  hexadecimal and octal respectively::
+
+    echo 4 > assign_adapter
+    echo 0x5 > assign_adapter
+    echo 020 > assign_adapter
+
+  In order to successfully assign an adapter:
+
+  * The adapter number specified must represent a value from 0 up to the
+    maximum adapter number allowed by the machine model. If an adapter number
+    higher than the maximum is specified, the operation will terminate with
+    an error (ENODEV).
+
+  * All APQNs that can be derived from the adapter ID being assigned and the
+    IDs of the previously assigned domains must be bound to the vfio_ap
+    device driver. If no domains have yet been assigned, then there must be
+    at least one APQN with the specified APID bound to the vfio_ap driver.
+    If no such APQNs are bound to the driver, the operation will terminate
+    with an error (EADDRNOTAVAIL).
+
+  * No APQN that can be derived from the adapter ID and the IDs of the
+    previously assigned domains can be assigned to another mediated matrix
+    device. If an APQN is assigned to another mediated matrix device, the
+    operation will terminate with an error (EADDRINUSE).
+
+``unassign_adapter``
+  To unassign an AP adapter, its APID is written to the ``unassign_adapter``
+  file. This may also be done multiple times to unassign more than one
+  adapter.
+
+``assign_domain``
+  To assign a usage domain, the domain number is written into the
+  ``assign_domain`` file. This may be done multiple times to assign more than
+  one usage domain. The domain number is specified using conventional
+  semantics as a decimal, hexadecimal, or octal number. For example, to
+  assign usage domains 4, 8, and 71 to a mediated matrix device in decimal,
+  hexadecimal and octal respectively::
+
+    echo 4 > assign_domain
+    echo 0x8 > assign_domain
+    echo 0107 > assign_domain
+
+  In order to successfully assign a domain:
+
+  * The domain number specified must represent a value from 0 up to the
+    maximum domain number allowed by the machine model. If a domain number
+    higher than the maximum is specified, the operation will terminate with
+    an error (ENODEV).
+
+  * All APQNs that can be derived from the domain ID being assigned and the
+    IDs of the previously assigned adapters must be bound to the vfio_ap
+    device driver. If no adapters have yet been assigned, then there must be
+    at least one APQN with the specified APQI bound to the vfio_ap driver.
+    If no such APQNs are bound to the driver, the operation will terminate
+    with an error (EADDRNOTAVAIL).
+
+  * No APQN that can be derived from the domain ID being assigned and the
+    IDs of the previously assigned adapters can be assigned to another
+    mediated matrix device. If an APQN is assigned to another mediated
+    matrix device, the operation will terminate with an error (EADDRINUSE).
+
+``unassign_domain``
+  To unassign a usage domain, the domain number is written into the
+  ``unassign_domain`` file. This may be done multiple times to unassign more
+  than one usage domain.
+
+``assign_control_domain``
+  To assign a control domain, the domain number is written into the
+  ``assign_control_domain`` file. This may be done multiple times to
+  assign more than one control domain. The domain number may be specified
+  using conventional semantics as a decimal, hexadecimal, or octal number.
+  For example, to assign control domains 4, 8, and 71 to a mediated matrix
+  device in decimal, hexadecimal and octal respectively::
+
+    echo 4 > assign_control_domain
+    echo 0x8 > assign_control_domain
+    echo 0107 > assign_control_domain
+
+  In order to successfully assign a control domain, the domain number
+  specified must represent a value from 0 up to the maximum domain number
+  allowed by the machine model. If a control domain number higher than the
+  maximum is specified, the operation will terminate with an error (ENODEV).
+
+``unassign_control_domain``
+  To unassign a control domain, the domain number is written into the
+  ``unassign_control_domain`` file. This may be done multiple times to
+  unassign more than one control domain.
+
+Note: No changes to the AP matrix will be allowed while a guest using
+the mediated matrix device is running. Attempts to assign an adapter,
+domain or control domain will be rejected and an error (EBUSY) returned.
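+
+Putting the pieces together, a minimal sketch of configuring a matrix from
+a mediated device's sysfs directory (``$uuid`` is a previously created
+mediated device; the ``matrix`` file shows the resulting APQNs)::
+
+  cd /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid
+  echo 5 > assign_adapter
+  echo 0x47 > assign_domain
+  cat matrix
+  05.0047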
+
+Starting a Linux Guest Configured with an AP Matrix
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To provide a mediated matrix device for use by a guest, the following option
+must be specified on the QEMU command line::
+
+  -device vfio-ap,sysfsdev=$path-to-mdev
+
+The sysfsdev parameter specifies the path to the mediated matrix device.
+There are a number of ways to specify this path::
+
+  /sys/devices/vfio_ap/matrix/$uuid
+  /sys/bus/mdev/devices/$uuid
+  /sys/bus/mdev/drivers/vfio_mdev/$uuid
+  /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid
+
+When the linux guest is started, the guest will open the mediated
+matrix device's file descriptor to get information about the mediated matrix
+device. The ``vfio_ap`` device driver will update the APM, AQM, and ADM
+fields in the guest's CRYCB with the adapter, usage domain and control
+domains assigned via the mediated matrix device's sysfs attribute files.
+Programs running on the linux guest will then:
+
+1. Have direct access to the APQNs derived from the cross product of the AP
+   adapter numbers (APID) and queue indexes (APQI) specified in the APM and
+   AQM fields of the guest's CRYCB respectively. These APQNs identify the AP
+   queues that are valid for use by the guest; meaning, AP commands can be
+   sent by the guest to any of these queues for processing.
+
+2. Have authorization to process AP commands to change a control domain
+   identified in the ADM field of the guest's CRYCB. The AP command must be
+   sent to a valid APQN (see 1 above).
+
+CPU model features:
+
+Three CPU model features are available for controlling guest access to AP
+facilities:
+
+1. AP facilities feature
+
+   The AP facilities feature indicates that AP facilities are installed on
+   the guest. This feature will be exposed for use only if the AP facilities
+   are installed on the host system. The feature is s390-specific and is
+   represented as a parameter of the -cpu option on the QEMU command line::
+
+     qemu-system-s390x -cpu $model,ap=on|off
+
+   Where:
+
+   ``$model``
+     is the CPU model defined for the guest (defaults to the model of
+     the host system if not specified).
+
+   ``ap=on|off``
+     indicates whether AP facilities are installed (on) or not
+     (off). The default for CPU models zEC12 or newer
+     is ``ap=on``. AP facilities must be installed on the guest if a
+     vfio-ap device (``-device vfio-ap,sysfsdev=$path``) is configured
+     for the guest, or the guest will fail to start.
+
+2. Query Configuration Information (QCI) facility
+
+   The QCI facility is used by the AP bus running on the guest to query the
+   configuration of the AP facilities. This facility will be available
+   only if the QCI facility is installed on the host system. The feature is
+   s390-specific and is represented as a parameter of the -cpu option on the
+   QEMU command line::
+
+     qemu-system-s390x -cpu $model,apqci=on|off
+
+   Where:
+
+   ``$model``
+     is the CPU model defined for the guest
+
+   ``apqci=on|off``
+     indicates whether the QCI facility is installed (on) or
+     not (off). The default for CPU models zEC12 or newer
+     is ``apqci=on``; for older models, QCI will not be installed.
+
+     If QCI is installed (``apqci=on``) but AP facilities are not
+     (``ap=off``), an error message will be logged, but the guest
+     will be allowed to start. It makes no sense to have QCI
+     installed if the AP facilities are not; this is considered
+     an invalid configuration.
+
+     If the QCI facility is not installed, APQNs with an APQI
+     greater than 15 will not be detected by the AP bus
+     running on the guest.
+
+3. Adjunct Process Facility Test (APFT) facility
+
+   The APFT facility is used by the AP bus running on the guest to test the
+   AP facilities available for a given AP queue. This facility will be
+   available only if the APFT facility is installed on the host system. The
+   feature is s390-specific and is represented as a parameter of the -cpu
+   option on the QEMU command line::
+
+     qemu-system-s390x -cpu $model,apft=on|off
+
+   Where:
+
+   ``$model``
+     is the CPU model defined for the guest (defaults to the model of
+     the host system if not specified).
+
+   ``apft=on|off``
+     indicates whether the APFT facility is installed (on) or
+     not (off). The default for CPU models zEC12 and
+     newer is ``apft=on``; for older models, APFT will not be
+     installed.
+
+     If APFT is installed (``apft=on``) but AP facilities are not
+     (``ap=off``), an error message will be logged, but the guest
+     will be allowed to start. It makes no sense to have APFT
+     installed if the AP facilities are not; this is considered
+     an invalid configuration.
+
+     It also makes no sense to turn APFT off because the AP bus
+     running on the guest will not detect CEX4 and newer devices
+     without it. Since only CEX4 and newer devices are supported
+     for guest usage, no AP devices can be made accessible to a
+     guest started without APFT installed.
+
+Hot plug a vfio-ap device into a running guest
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Only one vfio-ap device can be attached to the virtual machine's ap-bus, so
+a vfio-ap device can be hot plugged if and only if no vfio-ap device is
+attached to the bus already, whether via the QEMU command line or a prior
+hot plug action.
+
+To hot plug a vfio-ap device, use the QEMU ``device_add`` command::
+
+  (qemu) device_add vfio-ap,sysfsdev="$path-to-mdev",id="$id"
+
+Where the ``$path-to-mdev`` value specifies the absolute path to a mediated
+device to which AP resources to be used by the guest have been assigned.
+``$id`` is the name value for the optional id parameter.
+
+Note that on Linux guests, the AP devices will be created in the
+``/sys/bus/ap/devices`` directory when the AP bus subsequently performs its
+periodic scan, so there may be a short delay before the AP devices are
+accessible on the guest.
+
+The command will fail if:
+
+* A vfio-ap device has already been attached to the virtual machine's
+  ap-bus.
+
+* The CPU model features for controlling guest access to AP facilities are
+  not enabled (see 'CPU model features' subsection in the previous section).
+
+Hot unplug a vfio-ap device from a running guest
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A vfio-ap device can be unplugged from a running KVM guest if a vfio-ap
+device has been attached to the virtual machine's ap-bus via the QEMU
+command line or a prior hot plug action.
+
+To hot unplug a vfio-ap device, use the QEMU ``device_del`` command::
+
+  (qemu) device_del "$id"
+
+Where ``$id`` is the same id that was specified at device creation.
+
+On a Linux guest, the AP devices will be removed from the
+``/sys/bus/ap/devices`` directory on the guest when the AP bus subsequently
+performs its periodic scan, so there may be a short delay before the AP
+devices are no longer accessible by the guest.
+
+The command will fail if the ``$id`` specified on the ``device_del`` command
+does not match the id specified when the vfio-ap device was attached to
+the virtual machine's ap-bus.
+
+Example: Configure AP Matrices for Three Linux Guests
+-----------------------------------------------------
+
+Let's now provide an example to illustrate how KVM guests may be given
+access to AP facilities. For this example, we will show how to configure
+three guests such that executing the lszcrypt command on the guests would
+look like this:
+
+Guest1::
+
+  CARD.DOMAIN TYPE  MODE
+  ------------------------------
+  05          CEX5C CCA-Coproc
+  05.0004     CEX5C CCA-Coproc
+  05.00ab     CEX5C CCA-Coproc
+  06          CEX5A Accelerator
+  06.0004     CEX5A Accelerator
+  06.00ab     CEX5A Accelerator
+
+Guest2::
+
+  CARD.DOMAIN TYPE  MODE
+  ------------------------------
+  05          CEX5A Accelerator
+  05.0047     CEX5A Accelerator
+  05.00ff     CEX5A Accelerator
+
+Guest3::
+
+  CARD.DOMAIN TYPE  MODE
+  ------------------------------
+  06          CEX5A Accelerator
+  06.0047     CEX5A Accelerator
+  06.00ff     CEX5A Accelerator
+
+These are the steps:
+
+1. Install the vfio_ap module on the linux host. The dependency chain for
+   the vfio_ap module is:
+
+   * iommu
+   * s390
+   * zcrypt
+   * vfio
+   * vfio_mdev
+   * vfio_mdev_device
+   * KVM
+
+   To build the vfio_ap module, the kernel build must be configured with the
+   following Kconfig elements selected:
+
+   * IOMMU_SUPPORT
+   * S390
+   * ZCRYPT
+   * S390_AP_IOMMU
+   * VFIO
+   * VFIO_MDEV
+   * VFIO_MDEV_DEVICE
+   * KVM
+
+   If using make menuconfig select the following to build the vfio_ap
+   module::
+
+     -> Device Drivers
+        -> IOMMU Hardware Support
+           select S390 AP IOMMU Support
+        -> VFIO Non-Privileged userspace driver framework
+           -> Mediated device driver framework
+              -> VFIO driver for Mediated devices
+     -> I/O subsystem
+        -> VFIO support for AP devices
+
+2. Secure the AP queues to be used by the three guests so that the host
+   cannot access them. To secure the AP queues 05.0004, 05.0047, 05.00ab,
+   05.00ff, 06.0004, 06.0047, 06.00ab, and 06.00ff for use by the vfio_ap
+   device driver, the corresponding APQNs must be removed from the default
+   queue drivers pool as follows::
+
+     echo -5,-6 > /sys/bus/ap/apmask
+
+     echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask
+
+   This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff,
+   06.0004, 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap
+   device driver. The sysfs directory for the vfio_ap device driver will now
+   contain symbolic links to the AP queue devices bound to it::
+
+     /sys/bus/ap
+     ... [drivers]
+     ...... [vfio_ap]
+     ......... [05.0004]
+     ......... [05.0047]
+     ......... [05.00ab]
+     ......... [05.00ff]
+     ......... [06.0004]
+     ......... [06.0047]
+     ......... [06.00ab]
+     ......... [06.00ff]
+
+   Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later)
+   can be bound to the vfio_ap device driver. The reason for this is to
+   simplify the implementation by not needlessly complicating the design by
+   supporting older devices that will go out of service in the relatively
+   near future, and for which there are few older systems on which to test.
+
+   The administrator, therefore, must take care to secure only AP queues
+   that can be bound to the vfio_ap device driver. The device type for a
+   given AP queue device can be read from the parent card's sysfs directory.
+   For example, to see the hardware type of the queue 05.0004::
+
+     cat /sys/bus/ap/devices/card05/hwtype
+
+   The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to
+   the vfio_ap device driver.
+
+3. Create the mediated devices needed to configure the AP matrices for the
+   three guests and to provide an interface to the vfio_ap driver for
+   use by the guests::
+
+     /sys/devices/vfio_ap/matrix/
+     ... [mdev_supported_types]
+     ...... [vfio_ap-passthrough] (passthrough mediated matrix device type)
+     ......... create
+     ......... [devices]
+
+   To create the mediated devices for the three guests::
+
+     uuidgen > create
+     uuidgen > create
+     uuidgen > create
+
+   or
+
+   ::
+
+     echo $uuid1 > create
+     echo $uuid2 > create
+     echo $uuid3 > create
+
+   This will create three mediated devices in the [devices] subdirectory
+   named after the UUID used to create the mediated device. We'll call them
+   $uuid1, $uuid2 and $uuid3 and this is the sysfs directory structure after
+   creation::
+
+     /sys/devices/vfio_ap/matrix/
+     ... [mdev_supported_types]
+     ...... [vfio_ap-passthrough]
+     ......... [devices]
+     ............ [$uuid1]
+     ............... assign_adapter
+     ............... assign_control_domain
+     ............... assign_domain
+     ............... matrix
+     ............... unassign_adapter
+     ............... unassign_control_domain
+     ............... unassign_domain
+
+     ............ [$uuid2]
+     ............... assign_adapter
+     ............... assign_control_domain
+     ............... assign_domain
+     ............... matrix
+     ............... unassign_adapter
+     ............... unassign_control_domain
+     ............... unassign_domain
+
+     ............ [$uuid3]
+     ............... assign_adapter
+     ............... assign_control_domain
+     ............... assign_domain
+     ............... matrix
+     ............... unassign_adapter
+     ............... unassign_control_domain
+     ............... unassign_domain
+
+4. The administrator now needs to configure the matrices for the mediated
+   devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).
+
+   This is how the matrix is configured for Guest1::
+
+     echo 5 > assign_adapter
+     echo 6 > assign_adapter
+     echo 4 > assign_domain
+     echo 0xab > assign_domain
+
+   Control domains can similarly be assigned using the
+   assign_control_domain sysfs file.
+
+   If a mistake is made configuring an adapter, domain or control domain,
+   you can use the ``unassign_xxx`` interfaces to unassign the adapter,
+   domain or control domain.
+
+   To display the matrix configuration for Guest1::
+
+     cat matrix
+
+   The output will display the APQNs in the format ``xx.yyyy``, where xx is
+   the adapter number and yyyy is the domain number. The output for Guest1
+   will look like this::
+
+     05.0004
+     05.00ab
+     06.0004
+     06.00ab
+
+   This is how the matrix is configured for Guest2::
+
+     echo 5 > assign_adapter
+     echo 0x47 > assign_domain
+     echo 0xff > assign_domain
+
+   This is how the matrix is configured for Guest3::
+
+     echo 6 > assign_adapter
+     echo 0x47 > assign_domain
+     echo 0xff > assign_domain
+
+5. Start Guest1::
+
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+       -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
+
+6. Start Guest2::
+
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+       -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
+
+7. Start Guest3::
+
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+       -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
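+
+Once a guest is up, the AP configuration can be verified from inside that
+guest with the ``lszcrypt`` tool from the s390-tools package (assuming it is
+installed in the guest); for Guest1, the output should match the listing
+shown at the beginning of this example::
+
+  lszcrypt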
+
+When the guest is shut down, the mediated matrix devices may be removed.
+
+Using our example again, to remove the mediated matrix device $uuid1::
+
+  /sys/devices/vfio_ap/matrix/
+  ... [mdev_supported_types]
+  ...... [vfio_ap-passthrough]
+  ......... [devices]
+  ............ [$uuid1]
+  ............... remove
+
+  echo 1 > remove
+
+This will remove all of the mdev matrix device's sysfs structures including
+the mdev device itself. To recreate and reconfigure the mdev matrix device,
+all of the steps starting with step 3 will have to be performed again. Note
+that the remove will fail if a guest using the mdev is still running.
+
+It is not necessary to remove an mdev matrix device, but one may want to
+remove it if no guest will use it during the remaining lifetime of the linux
+host. If the mdev matrix device is removed, one may want to also reconfigure
+the pool of adapters and queues reserved for use by the default drivers.
+
+Limitations
+-----------
+
+* The KVM/kernel interfaces do not provide a way to prevent restoring an
+  APQN to the default drivers pool of a queue that is still assigned to a
+  mediated device in use by a guest. It is incumbent upon the administrator
+  to ensure there is no mediated device in use by a guest to which the APQN
+  is assigned lest the host be given access to the private data of the AP
+  queue device, such as a private key configured specifically for the guest.
+
+* Dynamically assigning AP resources to or unassigning AP resources from a
+  mediated matrix device - see `Configuring an AP matrix for a linux
+  guest`_ section above - while a running guest is using it is currently
+  not supported.
+
+* Live guest migration is not supported for guests using AP devices. If a
+  guest is using AP devices, the vfio-ap device configured for the guest
+  must be unplugged before migrating the guest (see `Hot unplug a vfio-ap
+  device from a running guest`_ section above.)
diff --git a/docs/system/s390x/vfio-ccw.rst b/docs/system/s390x/vfio-ccw.rst
new file mode 100644
index 000000000..41e0bad5b
--- /dev/null
+++ b/docs/system/s390x/vfio-ccw.rst
@@ -0,0 +1,77 @@
+Subchannel passthrough via vfio-ccw
+===================================
+
+vfio-ccw (based upon the mediated vfio device infrastructure) makes it
+possible to expose certain I/O subchannels and their devices to a guest.
+The host will not interact with those subchannels/devices any more.
+
+Note that while vfio-ccw should work with most non-QDIO devices, only ECKD
+DASDs have really been tested.
+
+Example configuration
+---------------------
+
+Step 1: configure the host device
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As every mdev is identified by a uuid, the first step is to obtain one::
+
+  [root@host ~]# uuidgen
+  7e270a25-e163-4922-af60-757fc8ed48c6
+
+Note: it is recommended to use the ``mdevctl`` tool for actually configuring
+the host device.
+
+To define the same device as configured below to be started
+automatically, use
+
+::
+
+  [root@host ~]# driverctl -b css set-override 0.0.0313 vfio_ccw
+  [root@host ~]# mdevctl define -u 7e270a25-e163-4922-af60-757fc8ed48c6 \
+                 -p 0.0.0313 -t vfio_ccw-io -a
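+
+To check that the definition took effect, the defined devices can be listed
+(a sketch assuming a reasonably recent ``mdevctl``; the exact output format
+may vary between versions)::
+
+  [root@host ~]# mdevctl list --defined
+  7e270a25-e163-4922-af60-757fc8ed48c6 0.0.0313 vfio_ccw-io auto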
+
+If using ``mdevctl`` is not possible or wanted, follow the manual procedure
+below.
+
+* Locate the subchannel for the device (in this example, ``0.0.2b09``)::
+
+    [root@host ~]# lscss | grep 0.0.2b09 | awk '{print $2}'
+    0.0.0313
+
+* Unbind the subchannel (in this example, ``0.0.0313``) from the standard
+  I/O subchannel driver and bind it to the vfio-ccw driver::
+
+    [root@host ~]# echo 0.0.0313 > /sys/bus/css/devices/0.0.0313/driver/unbind
+    [root@host ~]# echo 0.0.0313 > /sys/bus/css/drivers/vfio_ccw/bind
+
+* Create the mediated device (identified by the uuid)::
+
+    [root@host ~]# echo "7e270a25-e163-4922-af60-757fc8ed48c6" > \
+                   /sys/bus/css/devices/0.0.0313/mdev_supported_types/vfio_ccw-io/create
+
+Step 2: configure QEMU
+~~~~~~~~~~~~~~~~~~~~~~
+
+* Reference the created mediated device and (optionally) pick a device id to
+  be presented in the guest (here, ``fe.0.1234``, which will end up visible
+  in the guest as ``0.0.1234``)::
+
+    -device vfio-ccw,devno=fe.0.1234,sysfsdev=\
+     /sys/bus/mdev/devices/7e270a25-e163-4922-af60-757fc8ed48c6
+
+* Start the guest. The device (here, ``0.0.1234``) should now be usable::
+
+    [root@guest ~]# lscss -d 0.0.1234
+    Device   Subchan.  DevType CU Type Use  PIM PAM POM  CHPID
+    ----------------------------------------------------------------------
+    0.0.1234 0.0.0007  3390/0e 3990/e9      f0  f0  ff   1a2a3a0a 00000000
+    [root@guest ~]# chccwdev -e 0.0.1234
+    Setting device 0.0.1234 online
+    Done
+    [root@guest ~]# dmesg -t
+    (...)
+    dasd-eckd 0.0.1234: A channel path to the device has become operational
+    dasd-eckd 0.0.1234: New DASD 3390/0E (CU 3990/01) with 10017 cylinders, 15 heads, 224 sectors
+    dasd-eckd 0.0.1234: DASD with 4 KB/block, 7212240 KB total size, 48 KB/track, compatible disk layout
+    dasda:VOL1/  0X2B09: dasda1
diff --git a/docs/system/secrets.rst b/docs/system/secrets.rst
new file mode 100644
index 000000000..4a177369b
--- /dev/null
+++ b/docs/system/secrets.rst
@@ -0,0 +1,162 @@
+.. _secret data:
+
+Providing secret data to QEMU
+-----------------------------
+
+There are a variety of objects in QEMU which require secret data to be
+provided by the administrator or management application. For example, network
+block devices often require a password, LUKS block devices require a
+passphrase to unlock key material, and remote desktop services require an
+access password. QEMU has a general purpose mechanism for providing secret
+data to QEMU in a secure manner, using the ``secret`` object type.
+
+At startup this can be done using the ``-object secret,...`` command line
+argument. At runtime this can be done using the ``object_add`` QMP / HMP
+monitor commands. The examples that follow will illustrate use of ``-object``
+command lines, but they all apply equivalently in QMP / HMP. When creating
+a ``secret`` object it must be given a unique ID string. This ID is then
+used to identify the object when configuring the thing which needs the data.
+
+
+INSECURE: Passing secrets as clear text inline
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**The following should never be done in a production environment or on a
+multi-user host. Command line arguments are usually visible in the process
+listings and are often collected in log files by system monitoring agents
+or bug reporting tools. QMP/HMP commands and their arguments are also often
+logged and attached to bug reports. This all risks compromising secrets that
+are passed inline.**
+
+For the convenience of people debugging / developing with QEMU, it is
+possible to pass secret data inline on the command line.
+
+::
+
+   -object secret,id=secvnc0,data=87539319
+
+
+It is also possible to provide the data in base64 encoded format, which is
+particularly useful if the data contains binary characters that would clash
+with argument parsing.
+
+::
+
+   -object secret,id=secvnc0,data=ODc1MzkzMTk=,format=base64
+
+
+**Note: base64 encoding does not provide any security benefit.**
+
+Passing secrets as clear text via a file
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The simplest approach to providing data securely is to use a file to store
+the secret:
+
+::
+
+   -object secret,id=secvnc0,file=vnc-password.txt
+
+
+In this example the file ``vnc-password.txt`` contains the plain text secret
+data. It is important to note that the contents of the file are treated as
+an opaque blob. The entire raw file contents are used as the value, thus it
+is important not to mistakenly add any trailing newline character in the
+file if this newline is not intended to be part of the secret data.
+
+In some cases it might be more convenient to pass the secret data in base64
+format and have QEMU decode it to get the raw bytes before use:
+
+::
+
+   -object secret,id=sec0,file=vnc-password.txt,format=base64
+
+
+The file should generally be given mode ``0600`` or ``0400`` permissions, and
+have its user/group ownership set to the same account that the QEMU process
+will be launched under. If using mandatory access control such as SELinux,
+then the file should be labelled to only grant access to the specific QEMU
+process that needs access. This will prevent other processes/users from
+compromising the secret data.
+
+
+Passing secrets as cipher text inline
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To address the insecurity of passing secrets inline as clear text, it is
+possible to configure a second secret as an AES key to use for decrypting
+the data.
+
+The secret used as the AES key must always be configured using the file
+based storage mechanism:
+
+::
+
+   -object secret,id=secmaster,file=masterkey.data,format=base64
+
+
+In this case the ``masterkey.data`` file would be initialized with 32
+cryptographically secure random bytes, which are then base64 encoded.
+The contents of this file will be used as an AES-256 key to encrypt the
+real secret that can now be safely passed to QEMU inline as cipher text
+
+::
+
+   -object secret,id=secvnc0,keyid=secmaster,data=BASE64-CIPHERTEXT,iv=BASE64-IV,format=base64
+
+
+In this example ``BASE64-CIPHERTEXT`` is the result of AES-256-CBC
+encrypting the secret with ``masterkey.data`` and then base64 encoding the
+ciphertext. The ``BASE64-IV`` data is 16 random bytes which have been base64
+encoded. These bytes are used as the initialization vector for the
+AES-256-CBC value.
+
+A single master key can be used to encrypt all subsequent secrets, **but it
+is critical that a different initialization vector is used for every
+secret**.
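+
+As an illustration, one possible way to generate the master key, IV and
+cipher text on a Linux host (a sketch assuming the ``openssl``, ``base64``
+and ``hexdump`` tools; any cryptographically secure source of random bytes
+will do)::
+
+   # Generate a base64-encoded 32-byte master key and a 16-byte IV
+   openssl rand -base64 32 > masterkey.data
+   openssl rand -base64 16 > iv.data
+
+   # AES-256-CBC encrypt the secret, base64 encoding the result (-a)
+   KEY=$(base64 -d masterkey.data | hexdump -v -e '/1 "%02X"')
+   IV=$(base64 -d iv.data | hexdump -v -e '/1 "%02X"')
+   CIPHERTEXT=$(printf '87539319' | openssl enc -aes-256-cbc -a -K $KEY -iv $IV)
+
+The resulting ``$CIPHERTEXT`` and the content of ``iv.data`` can then be
+used as the ``data`` and ``iv`` values in the ``-object secret`` example
+above.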
It is possible to combine use of the keyring with +other features mentioned earlier such as base64 encoding: + +:: + + -object secret_keyring,id=secvnc0,serial=1729,format=base64 + + +and also encryption with a master key: + +:: + + -object secret_keyring,id=secvnc0,keyid=secmaster,serial=1729,iv=BASE64-IV + + +Best practice +~~~~~~~~~~~~~ + +It is recommended for production deployments to use a master key secret, and +then pass all subsequent inline secrets encrypted with the master key. + +Each QEMU instance must have a distinct master key, and that must be generated +from a cryptographically secure random data source. The master key should be +deleted immediately upon QEMU shutdown. If passing the master key as a file, +the key file must have access control rules applied that restrict access to +just the one QEMU process that is intended to use it. Alternatively the Linux +keyring can be used to pass the master key to QEMU. + +The secrets for individual QEMU device backends must all then be encrypted +with this master key. + +This procedure helps ensure that the individual secrets for QEMU backends will +not be compromised, even if ``-object`` CLI args or ``object_add`` monitor +commands are collected in log files and attached to public bug support tickets. +The only item that needs strongly protecting is the master key file. diff --git a/docs/system/security.rst b/docs/system/security.rst new file mode 100644 index 000000000..f2092c876 --- /dev/null +++ b/docs/system/security.rst @@ -0,0 +1,173 @@ +Security +======== + +Overview +-------- + +This chapter explains the security requirements that QEMU is designed to meet +and principles for securely deploying QEMU. + +Security Requirements +--------------------- + +QEMU supports many different use cases, some of which have stricter security +requirements than others. The community has agreed on the overall security +requirements that users may depend on. These requirements define what is +considered supported from a security perspective. + +Virtualization Use Case +''''''''''''''''''''''' + +The virtualization use case covers cloud and virtual private server (VPS) +hosting, as well as traditional data center and desktop virtualization. These +use cases rely on hardware virtualization extensions to execute guest code +safely on the physical CPU at close-to-native speed. + +The following entities are untrusted, meaning that they may be buggy or +malicious: + +- Guest +- User-facing interfaces (e.g. VNC, SPICE, WebSocket) +- Network protocols (e.g. NBD, live migration) +- User-supplied files (e.g. disk images, kernels, device trees) +- Passthrough devices (e.g. PCI, USB) + +Bugs affecting these entities are evaluated on whether they can cause damage in +real-world use cases and treated as security bugs if this is the case. + +Non-virtualization Use Case +''''''''''''''''''''''''''' + +The non-virtualization use case covers emulation using the Tiny Code Generator +(TCG). In principle the TCG and device emulation code used in conjunction with +the non-virtualization use case should meet the same security requirements as +the virtualization use case. However, for historical reasons much of the +non-virtualization use case code was not written with these security +requirements in mind. + +Bugs affecting the non-virtualization use case are not considered security +bugs at this time. Users with non-virtualization use cases must not rely on +QEMU to provide guest isolation or any security guarantees. 
+ +Architecture +------------ + +This section describes the design principles that ensure the security +requirements are met. + +Guest Isolation +''''''''''''''' + +Guest isolation is the confinement of guest code to the virtual machine. When +guest code gains control of execution on the host this is called escaping the +virtual machine. Isolation also includes resource limits such as throttling of +CPU, memory, disk, or network. Guests must be unable to exceed their resource +limits. + +QEMU presents an attack surface to the guest in the form of emulated devices. +The guest must not be able to gain control of QEMU. Bugs in emulated devices +could allow malicious guests to gain code execution in QEMU. At this point the +guest has escaped the virtual machine and is able to act in the context of the +QEMU process on the host. + +Guests often interact with other guests and share resources with them. A +malicious guest must not gain control of other guests or access their data. +Disk image files and network traffic must be protected from other guests unless +explicitly shared between them by the user. + +Principle of Least Privilege +'''''''''''''''''''''''''''' + +The principle of least privilege states that each component only has access to +the privileges necessary for its function. In the case of QEMU this means that +each process only has access to resources belonging to the guest. + +The QEMU process should not have access to any resources that are inaccessible +to the guest. This way the guest does not gain anything by escaping into the +QEMU process since it already has access to those same resources from within +the guest. + +Following the principle of least privilege immediately fulfills guest isolation +requirements. For example, guest A only has access to its own disk image file +``a.img`` and not guest B's disk image file ``b.img``. + +In reality certain resources are inaccessible to the guest but must be +available to QEMU to perform its function. For example, host system calls are +necessary for QEMU but are not exposed to guests. A guest that escapes into +the QEMU process can then begin invoking host system calls. + +New features must be designed to follow the principle of least privilege. +Should this not be possible for technical reasons, the security risk must be +clearly documented so users are aware of the trade-off of enabling the feature. + +Isolation mechanisms +'''''''''''''''''''' + +Several isolation mechanisms are available to realize this architecture of +guest isolation and the principle of least privilege. With the exception of +Linux seccomp, these mechanisms are all deployed by management tools that +launch QEMU, such as libvirt. They are also platform-specific so they are only +described briefly for Linux here. + +The fundamental isolation mechanism is that QEMU processes must run as +unprivileged users. Sometimes it seems more convenient to launch QEMU as +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a +huge security risk. File descriptor passing can be used to give an otherwise +unprivileged QEMU process access to host devices without running QEMU as root. +It is also possible to launch QEMU as a non-root user and configure UNIX groups +for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes. +Some Linux distros already ship with UNIX groups for these devices by default. + +- SELinux and AppArmor make it possible to confine processes beyond the + traditional UNIX process and file permissions model. 
They restrict the QEMU + process from accessing processes and files on the host system that are not + needed by QEMU. + +- Resource limits and cgroup controllers provide throughput and utilization + limits on key resources such as CPU time, memory, and I/O bandwidth. + +- Linux namespaces can be used to make process, file system, and other system + resources unavailable to QEMU. A namespaced QEMU process is restricted to only + those resources that were granted to it. + +- Linux seccomp is available via the QEMU ``--sandbox`` option. It disables + system calls that are not needed by QEMU, thereby reducing the host kernel + attack surface. + +Sensitive configurations +------------------------ + +There are aspects of QEMU that can have security implications which users & +management applications must be aware of. + +Monitor console (QMP and HMP) +''''''''''''''''''''''''''''' + +The monitor console (whether used with QMP or HMP) provides an interface +to dynamically control many aspects of QEMU's runtime operation. Many of the +commands exposed will instruct QEMU to access content on the host file system +and/or trigger spawning of external processes. + +For example, the ``migrate`` command allows for the spawning of arbitrary +processes for the purpose of tunnelling the migration data stream. The +``blockdev-add`` command instructs QEMU to open arbitrary files, exposing +their content to the guest as a virtual disk. + +Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor, +or Linux namespaces, the monitor console should be considered to have privileges +equivalent to those of the user account QEMU is running under. + +It is further important to consider the security of the character device backend +over which the monitor console is exposed. It needs to have protection against +malicious third parties which might try to make unauthorized connections, or +perform man-in-the-middle attacks. Many of the character device backends do not +satisfy this requirement and so must not be used for the monitor console. + +The general recommendation is that the monitor console should be exposed over +a UNIX domain socket backend to the local host only. Use of the TCP based +character device backend is inappropriate unless configured to use both TLS +encryption and authorization control policy on client connections. + +In summary, the monitor console is considered a privileged control interface to +QEMU and as such should only be made accessible to a trusted management +application or user. diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst new file mode 100644 index 000000000..91ebc26c6 --- /dev/null +++ b/docs/system/target-arm.rst @@ -0,0 +1,120 @@ +.. _ARM-System-emulator: + +Arm System emulator +------------------- + +QEMU can emulate both 32-bit and 64-bit Arm CPUs. Use the +``qemu-system-aarch64`` executable to simulate a 64-bit Arm machine. +You can use either ``qemu-system-arm`` or ``qemu-system-aarch64`` +to simulate a 32-bit Arm machine: in general, command lines that +work for ``qemu-system-arm`` will behave the same when used with +``qemu-system-aarch64``. + +QEMU has generally good support for Arm guests. It has support for +nearly fifty different machines. The reason we support so many is that +Arm hardware is much more widely varying than x86 hardware. 
Arm CPUs
+are generally built into "system-on-chip" (SoC) designs created by
+many different companies with different devices, and these SoCs are
+then built into machines which can vary still further even if they use
+the same SoC. Even with fifty boards QEMU does not cover more than a
+small fraction of the Arm hardware ecosystem.
+
+The situation for 64-bit Arm is fairly similar, except that we don't
+implement so many different machines.
+
+As well as the more common "A-profile" CPUs (which have MMUs and will
+run Linux) QEMU also supports "M-profile" CPUs such as the Cortex-M0,
+Cortex-M4 and Cortex-M33 (which are microcontrollers used in deeply
+embedded boards). For most boards the CPU type is fixed (matching what
+the hardware has), so typically you don't need to specify the CPU type
+by hand, except for special cases like the ``virt`` board.
+
+Choosing a board model
+======================
+
+For QEMU's Arm system emulation, you must specify which board
+model you want to use with the ``-M`` or ``--machine`` option;
+there is no default.
+
+Because Arm systems differ so much and in fundamental ways, typically
+operating system or firmware images intended to run on one machine
+will not run at all on any other. This is often surprising for new
+users who are used to the x86 world where every system looks like a
+standard PC. (Once the kernel has booted, most userspace software
+cares much less about the detail of the hardware.)
+
+If you already have a system image or a kernel that works on hardware
+and you want to boot with QEMU, check whether QEMU lists that machine
+in its ``-machine help`` output. If it is listed, then you can probably
+use that board model. If it is not listed, then unfortunately your image
+will almost certainly not boot on QEMU. (You might be able to
+extract the filesystem and use that with a different kernel which
+boots on a system that QEMU does emulate.)
+
+If you don't care about reproducing the idiosyncrasies of a particular
+bit of hardware, such as a small amount of RAM, no PCI, no hard
+disk, etc., and just want to run Linux, the best option is to use the
+``virt`` board. This is a platform which doesn't correspond to any
+real hardware and is designed for use in virtual machines. You'll
+need to compile Linux with a suitable configuration for running on
+the ``virt`` board. ``virt`` supports PCI, virtio, recent CPUs and
+large amounts of RAM. It also supports 64-bit CPUs.
+
+Board-specific documentation
+============================
+
+Unfortunately many of the Arm boards QEMU supports are currently
+undocumented; you can get a complete list by running
+``qemu-system-aarch64 --machine help``.
+
+..
+   This table of contents should be kept sorted alphabetically
+   by the title text of each file, which isn't the same ordering
+   as an alphabetical sort by filename.
+
+.. toctree::
+   :maxdepth: 1
+
+   arm/integratorcp
+   arm/mps2
+   arm/musca
+   arm/realview
+   arm/sbsa
+   arm/versatile
+   arm/vexpress
+   arm/aspeed
+   arm/sabrelite
+   arm/digic
+   arm/cubieboard
+   arm/emcraft-sf2
+   arm/highbank
+   arm/musicpal
+   arm/gumstix
+   arm/mainstone
+   arm/kzm
+   arm/nrf
+   arm/nseries
+   arm/nuvoton
+   arm/imx25-pdk
+   arm/orangepi
+   arm/palm
+   arm/raspi
+   arm/xscale
+   arm/collie
+   arm/sx1
+   arm/stellaris
+   arm/stm32
+   arm/virt
+   arm/xlnx-versal-virt
+
+Emulated CPU architecture support
+=================================
+
+.. toctree::
+   arm/emulation
+
+Arm CPU features
+================
+
+.. 
toctree::
+   arm/cpu-features
diff --git a/docs/system/target-avr.rst b/docs/system/target-avr.rst
new file mode 100644
index 000000000..03d5ab51c
--- /dev/null
+++ b/docs/system/target-avr.rst
@@ -0,0 +1,48 @@
+.. _AVR-System-emulator:
+
+AVR System emulator
+-------------------
+
+Use the executable ``qemu-system-avr`` to emulate an AVR 8-bit based machine.
+These can have one of the following cores: avr1, avr2, avr25, avr3, avr31,
+avr35, avr4, avr5, avr51, avr6, avrtiny, xmega2, xmega3, xmega4, xmega5,
+xmega6 and xmega7.
+
+For now it supports a few Arduino boards for educational and testing purposes.
+These boards use an ATmega controller, whose model is limited to USART and
+16-bit timer devices, enough to run FreeRTOS based applications (like
+https://github.com/seharris/qemu-avr-tests/blob/master/free-rtos/Demo/AVR_ATMega2560_GCC/demo.elf
+).
+
+The following are examples of possible usage, assuming ``demo.elf`` is
+compiled for an AVR CPU:
+
+- Continuous, uninterrupted execution::
+
+   qemu-system-avr -machine mega2560 -bios demo.elf
+
+- Continuous, uninterrupted execution with serial output into a telnet window::
+
+   qemu-system-avr -M mega2560 -bios demo.elf -nographic \
+       -serial tcp::5678,server=on,wait=off
+
+  and then in another shell::
+
+   telnet localhost 5678
+
+- Debugging with the GDB debugger::
+
+   qemu-system-avr -machine mega2560 -bios demo.elf -s -S
+
+  and then in another shell::
+
+   avr-gdb demo.elf
+
+  and then within the GDB shell::
+
+   target remote :1234
+
+- Printing executed instructions (that have not yet been translated by the
+  JIT compiler)::
+
+   qemu-system-avr -machine mega2560 -bios demo.elf -d in_asm
diff --git a/docs/system/target-i386-desc.rst.inc b/docs/system/target-i386-desc.rst.inc
new file mode 100644
index 000000000..7d1fffacb
--- /dev/null
+++ b/docs/system/target-i386-desc.rst.inc
@@ -0,0 +1,73 @@
+The QEMU PC System emulator simulates the following peripherals:
+
+- i440FX host PCI bridge and PIIX3 PCI to ISA bridge
+
+- Cirrus CLGD 5446 PCI VGA card or dummy VGA card with Bochs VESA
+  extensions (hardware level, including all non-standard modes).
+
+- PS/2 mouse and keyboard
+
+- 2 PCI IDE interfaces with hard disk and CD-ROM support
+
+- Floppy disk
+
+- PCI and ISA network adapters
+
+- Serial ports
+
+- IPMI BMC, either an internal or external one
+
+- Creative SoundBlaster 16 sound card
+
+- ENSONIQ AudioPCI ES1370 sound card
+
+- Intel 82801AA AC97 Audio compatible sound card
+
+- Intel HD Audio Controller and HDA codec
+
+- Adlib (OPL2) - Yamaha YM3812 compatible chip
+
+- Gravis Ultrasound GF1 sound card
+
+- CS4231A compatible sound card
+
+- PC speaker
+
+- PCI UHCI, OHCI, EHCI or XHCI USB controller and a virtual USB-1.1
+  hub.
+
+SMP is supported with up to 255 CPUs.
+
+QEMU uses the PC BIOS from the SeaBIOS project and the Plex86/Bochs LGPL
+VGA BIOS.
+
+QEMU uses YM3812 emulation by Tatsuyuki Satoh.
+
+QEMU uses GUS emulation (GUSEMU32 http://www.deinmeister.de/gusemu/) by
+Tibor \"TS\" Schütz.
+
+Note that, by default, GUS shares IRQ(7) with parallel ports, so QEMU
+must be told not to configure any parallel ports for GUS to work.
+
+.. parsed-literal::
+
+   |qemu_system_x86| dos.img -device gus -parallel none
+
+Alternatively:
+
+.. parsed-literal::
+
+   |qemu_system_x86| dos.img -device gus,irq=5
+
+Or some other unclaimed IRQ.
+
+CS4231A is the chip used in Windows Sound System and GUSMAX products.
+
+The PC speaker audio device can be configured using the pcspk-audiodev
+machine property, i.e.
+
+.. 
parsed-literal::
+
+   |qemu_system_x86| some.img \
+       -audiodev <backend>,id=<name> \
+       -machine pcspk-audiodev=<name>
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
new file mode 100644
index 000000000..4daa53c35
--- /dev/null
+++ b/docs/system/target-i386.rst
@@ -0,0 +1,40 @@
+.. _QEMU-PC-System-emulator:
+
+x86 System emulator
+-------------------
+
+.. _pcsys_005fdevices:
+
+Board-specific documentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+..
+   This table of contents should be kept sorted alphabetically
+   by the title text of each file, which isn't the same ordering
+   as an alphabetical sort by filename.
+
+.. toctree::
+   :maxdepth: 1
+
+   i386/microvm
+   i386/pc
+
+Architectural features
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   i386/cpu
+   i386/kvm-pv
+   i386/sgx
+
+.. _pcsys_005freq:
+
+OS requirements
+~~~~~~~~~~~~~~~
+
+On x86_64 hosts, the default set of CPU features enabled by the KVM
+accelerator requires the host to be running Linux v4.5 or newer. Red Hat
+Enterprise Linux 7 is also supported, since the required
+functionality was backported.
diff --git a/docs/system/target-m68k.rst b/docs/system/target-m68k.rst
new file mode 100644
index 000000000..d28d3b92e
--- /dev/null
+++ b/docs/system/target-m68k.rst
@@ -0,0 +1,21 @@
+.. _ColdFire-System-emulator:
+
+ColdFire System emulator
+------------------------
+
+Use the executable ``qemu-system-m68k`` to simulate a ColdFire machine.
+The emulator is able to boot a uClinux kernel.
+
+The M5208EVB emulation includes the following devices:
+
+- MCF5208 ColdFire V2 Microprocessor (ISA A+ with EMAC).
+
+- Three on-chip UARTs.
+
+- Fast Ethernet Controller (FEC)
+
+The AN5206 emulation includes the following devices:
+
+- MCF5206 ColdFire V2 Microprocessor.
+
+- Two on-chip UARTs.
diff --git a/docs/system/target-mips.rst b/docs/system/target-mips.rst
new file mode 100644
index 000000000..138441bde
--- /dev/null
+++ b/docs/system/target-mips.rst
@@ -0,0 +1,130 @@
+.. _MIPS-System-emulator:
+
+MIPS System emulator
+--------------------
+
+Four executables cover simulation of 32 and 64-bit MIPS systems in both
+endian options: ``qemu-system-mips``, ``qemu-system-mipsel``,
+``qemu-system-mips64`` and ``qemu-system-mips64el``. Five different
+machine types are emulated:
+
+- A generic ISA PC-like machine \"mips\"
+
+- The MIPS Malta prototype board \"malta\"
+
+- An ACER Pica \"pica61\". This machine needs the 64-bit emulator.
+
+- MIPS emulator pseudo board \"mipssim\"
+
+- A MIPS Magnum R4000 machine \"magnum\". This machine needs the
+  64-bit emulator.
+
+The generic emulation is supported by Debian 'Etch' and is able to
+install Debian into a virtual disk image. 
The following devices are
+emulated:
+
+- A range of MIPS CPUs, default is the 24Kf
+
+- PC style serial port
+
+- PC style IDE disk
+
+- NE2000 network card
+
+The Malta emulation supports the following devices:
+
+- Core board with MIPS 24Kf CPU and Galileo system controller
+
+- PIIX4 PCI/USB/SMbus controller
+
+- The Multi-I/O chip's serial device
+
+- PCI network cards (PCnet32 and others)
+
+- Malta FPGA serial device
+
+- Cirrus (default) or any other PCI VGA graphics card
+
+The Boston board emulation supports the following devices:
+
+- Xilinx FPGA, which includes a PCIe root port and a UART
+
+- Intel EG20T PCH, which connects the I/O peripherals, but only the SATA
+  bus is emulated
+
+The ACER Pica emulation supports:
+
+- MIPS R4000 CPU
+
+- PC-style IRQ and DMA controllers
+
+- PC Keyboard
+
+- IDE controller
+
+The MIPS Magnum R4000 emulation supports:
+
+- MIPS R4000 CPU
+
+- PC-style IRQ controller
+
+- PC Keyboard
+
+- SCSI controller
+
+- G364 framebuffer
+
+The Fuloong 2E emulation supports:
+
+- Loongson 2E CPU
+
+- Bonito64 system controller as North Bridge
+
+- VT82C686 chipset as South Bridge
+
+- RTL8139D as a network card chipset
+
+The Loongson-3 virtual platform emulation supports:
+
+- Loongson 3A CPU
+
+- LIOINTC as interrupt controller
+
+- GPEX and virtio as peripheral devices
+
+- Both KVM and TCG supported
+
+The mipssim pseudo board emulation provides an environment similar to
+what the proprietary MIPS emulator uses for running Linux. It supports:
+
+- A range of MIPS CPUs, default is the 24Kf
+
+- PC style serial port
+
+- MIPSnet network emulation
+
+.. include:: cpu-models-mips.rst.inc
+
+.. _nanoMIPS-System-emulator:
+
+nanoMIPS System emulator
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The executable ``qemu-system-mipsel`` also covers simulation of 32-bit
+nanoMIPS systems in little-endian mode:
+
+- nanoMIPS I7200 CPU
+
+An example of ``qemu-system-mipsel`` usage for nanoMIPS is shown below:
+
+Download ``<disk_image_file>`` from
+https://mipsdistros.mips.com/LinuxDistro/nanomips/buildroot/index.html.
+
+Download ``<kernel_image_file>`` from
+https://mipsdistros.mips.com/LinuxDistro/nanomips/kernels/v4.15.18-432-gb2eb9a8b07a1-20180627102142/index.html.
+
+Start system emulation of the Malta board with a nanoMIPS I7200 CPU::
+
+   qemu-system-mipsel -cpu I7200 -kernel <kernel_image_file> \
+       -M malta -serial stdio -m <memory_size> -hda <disk_image_file> \
+       -append "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
diff --git a/docs/system/target-ppc.rst b/docs/system/target-ppc.rst
new file mode 100644
index 000000000..4f6eb93b1
--- /dev/null
+++ b/docs/system/target-ppc.rst
@@ -0,0 +1,25 @@
+.. _PowerPC-System-emulator:
+
+PowerPC System emulator
+-----------------------
+
+Board-specific documentation
+============================
+
+You can get a complete list by running
+``qemu-system-ppc64 --machine help``.
+
+..
+   This table of contents should be kept sorted alphabetically
+   by the title text of each file, which isn't the same ordering
+   as an alphabetical sort by filename.
+
+.. toctree::
+   :maxdepth: 1
+
+   ppc/embedded
+   ppc/powermac
+   ppc/powernv
+   ppc/ppce500
+   ppc/prep
+   ppc/pseries
diff --git a/docs/system/target-riscv.rst b/docs/system/target-riscv.rst
new file mode 100644
index 000000000..89a866e4f
--- /dev/null
+++ b/docs/system/target-riscv.rst
@@ -0,0 +1,86 @@
+.. _RISC-V-System-emulator:
+
+RISC-V System emulator
+======================
+
+QEMU can emulate both 32-bit and 64-bit RISC-V CPUs. 
Use the
+``qemu-system-riscv64`` executable to simulate a 64-bit RISC-V machine and
+the ``qemu-system-riscv32`` executable to simulate a 32-bit RISC-V machine.
+
+QEMU has generally good support for RISC-V guests. It has support for
+several different machines. The reason for this variety is that RISC-V
+hardware is much more widely varying than x86 hardware. RISC-V
+CPUs are generally built into "system-on-chip" (SoC) designs created by
+many different companies with different devices, and these SoCs are
+then built into machines which can vary still further even if they use
+the same SoC.
+
+For most boards the CPU type is fixed (matching what the hardware has),
+so typically you don't need to specify the CPU type by hand, except for
+special cases like the ``virt`` board.
+
+Choosing a board model
+----------------------
+
+For QEMU's RISC-V system emulation, you must specify which board
+model you want to use with the ``-M`` or ``--machine`` option;
+there is no default.
+
+Because RISC-V systems differ so much and in fundamental ways, typically
+operating system or firmware images intended to run on one machine
+will not run at all on any other. This is often surprising for new
+users who are used to the x86 world where every system looks like a
+standard PC. (Once the kernel has booted, most user space software
+cares much less about the detail of the hardware.)
+
+If you already have a system image or a kernel that works on hardware
+and you want to boot with QEMU, check whether QEMU lists that machine
+in its ``-machine help`` output. If it is listed, then you can probably
+use that board model. If it is not listed, then unfortunately your image
+will almost certainly not boot on QEMU. (You might be able to
+extract the file system and use that with a different kernel which
+boots on a system that QEMU does emulate.)
+
+If you don't care about reproducing the idiosyncrasies of a particular
+bit of hardware, such as a small amount of RAM, no PCI, no hard
+disk, etc., and just want to run Linux, the best option is to use the
+``virt`` board. This is a platform which doesn't correspond to any
+real hardware and is designed for use in virtual machines. You'll
+need to compile Linux with a suitable configuration for running on
+the ``virt`` board. ``virt`` supports PCI, virtio, recent CPUs and
+large amounts of RAM. It also supports 64-bit CPUs.
+
+Board-specific documentation
+----------------------------
+
+Unfortunately many of the RISC-V boards QEMU supports are currently
+undocumented; you can get a complete list by running
+``qemu-system-riscv64 --machine help``, or
+``qemu-system-riscv32 --machine help``.
+
+..
+   This table of contents should be kept sorted alphabetically
+   by the title text of each file, which isn't the same ordering
+   as an alphabetical sort by filename.
+
+.. toctree::
+   :maxdepth: 1
+
+   riscv/microchip-icicle-kit
+   riscv/shakti-c
+   riscv/sifive_u
+   riscv/virt
+
+RISC-V CPU firmware
+-------------------
+
+When using the ``sifive_u`` or ``virt`` machine there are three different
+firmware boot options:
+
+1. ``-bios default`` - This is the default behaviour if no -bios option
+   is included. This option will load the default OpenSBI firmware
+   automatically. The firmware is included with the QEMU release and no
+   user interaction is required. All a user needs to do is specify the
+   kernel they want to boot with the -kernel option.
+2. ``-bios none`` - QEMU will not automatically load any firmware. It is
+   up to the user to load all the images they need.
+3. 
``-bios <file>`` - Tells QEMU to load the specified file as the firmware.
diff --git a/docs/system/target-rx.rst b/docs/system/target-rx.rst
new file mode 100644
index 000000000..4a20a89a0
--- /dev/null
+++ b/docs/system/target-rx.rst
@@ -0,0 +1,36 @@
+.. _RX-System-emulator:
+
+RX System emulator
+--------------------
+
+Use the executable ``qemu-system-rx`` to simulate an RX target (GDB simulator).
+This target emulates the following devices:
+
+- R5F562N8 MCU
+
+  - On-chip memory (ROM 512KB, RAM 96KB)
+  - Interrupt Control Unit (ICUa)
+  - 8-bit timer x 1CH (TMR0,1)
+  - Compare Match Timer x 2CH (CMT0,1)
+  - Serial Communication Interface x 1CH (SCI0)
+
+- External memory 16MByte
+
+Examples of ``qemu-system-rx`` usage for RX are shown below.
+
+Download ``<u-boot_image_file>`` from
+https://osdn.net/users/ysato/pf/qemu/dl/u-boot.bin.gz
+
+Start emulation of rx-virt::
+
+   qemu-system-rx -M gdbsim-r5f562n8 -bios <u-boot_image_file>
+
+Download ``<kernel_image_file>`` from
+https://osdn.net/users/ysato/pf/qemu/dl/zImage
+
+Download ``<device_tree_blob>`` from
+https://osdn.net/users/ysato/pf/qemu/dl/rx-virt.dtb
+
+Start emulation of rx-virt::
+
+   qemu-system-rx -M gdbsim-r5f562n8 \
+       -kernel <kernel_image_file> -dtb <device_tree_blob> \
+       -append "earlycon"
diff --git a/docs/system/target-s390x.rst b/docs/system/target-s390x.rst
new file mode 100644
index 000000000..c636f6411
--- /dev/null
+++ b/docs/system/target-s390x.rst
@@ -0,0 +1,35 @@
+.. _s390x-System-emulator:
+
+s390x System emulator
+---------------------
+
+QEMU can emulate z/Architecture (in particular, 64-bit) s390x systems
+via the ``qemu-system-s390x`` binary. Only one machine type,
+``s390-ccw-virtio``, is supported (with versioning for compatibility
+handling).
+
+When using KVM as accelerator, QEMU can emulate CPUs up to the generation
+of the host. When using the default CPU model with TCG as accelerator,
+QEMU will emulate a subset of z13 CPU features that should be enough to run
+distributions built for the z13.
+
+Device support
+==============
+
+QEMU will not emulate most of the traditional devices found under LPAR or
+z/VM; virtio devices (especially using virtio-ccw) make up the bulk of
+the available devices. Passthrough of host devices via vfio-pci, vfio-ccw,
+or vfio-ap is also available.
+
+.. toctree::
+   s390x/vfio-ap
+   s390x/css
+   s390x/3270
+   s390x/vfio-ccw
+
+Architectural features
+======================
+
+.. toctree::
+   s390x/bootdevices
+   s390x/protvirt
diff --git a/docs/system/target-sparc.rst b/docs/system/target-sparc.rst
new file mode 100644
index 000000000..b55f8d09e
--- /dev/null
+++ b/docs/system/target-sparc.rst
@@ -0,0 +1,62 @@
+.. _Sparc32-System-emulator:
+
+Sparc32 System emulator
+-----------------------
+
+Use the executable ``qemu-system-sparc`` to simulate the following Sun4m
+architecture machines:
+
+- SPARCstation 4
+
+- SPARCstation 5
+
+- SPARCstation 10
+
+- SPARCstation 20
+
+- SPARCserver 600MP
+
+- SPARCstation LX
+
+- SPARCstation Voyager
+
+- SPARCclassic
+
+- SPARCbook
+
+The emulation is somewhat complete. SMP up to 16 CPUs is supported, but
+Linux limits the number of usable CPUs to 4. 
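For example, a minimal
+sketch of booting the sample kernel and ram disk mentioned at the end of
+this section on a four-CPU SPARCstation 10 (the file names are
+placeholders)::
+
+   qemu-system-sparc -M SS-10 -smp 4 -m 256 \
+       -kernel vmlinux -initrd initrd.img \
+       -append "console=ttyS0" -nographic
+
+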
QEMU emulates the following sun4m peripherals:
+
+- IOMMU
+
+- TCX or cgthree Frame buffer
+
+- Lance (Am7990) Ethernet
+
+- Non Volatile RAM M48T02/M48T08
+
+- Slave I/O: timers, interrupt controllers, Zilog serial ports,
+  keyboard and power/reset logic
+
+- ESP SCSI controller with hard disk and CD-ROM support
+
+- Floppy drive (not on SS-600MP)
+
+- CS4231 sound device (only on SS-5, not working yet)
+
+The number of peripherals is fixed in the architecture. Maximum memory
+size depends on the machine type; for the SS-5 it is 256MB and for the
+others 2047MB.
+
+Since version 0.8.2, QEMU uses OpenBIOS https://www.openbios.org/.
+OpenBIOS is a free (GPL v2) portable firmware implementation. The goal
+is to implement a 100% IEEE 1275-1994 (referred to as Open Firmware)
+compliant firmware.
+
+A sample Linux 2.6 series kernel and ram disk image are available on the
+QEMU web site. There are still issues with NetBSD and OpenBSD, but most
+kernel versions work. Please note that currently older Solaris kernels
+don't work, probably due to interface issues between OpenBIOS and
+Solaris.
diff --git a/docs/system/target-sparc64.rst b/docs/system/target-sparc64.rst
new file mode 100644
index 000000000..97e334b93
--- /dev/null
+++ b/docs/system/target-sparc64.rst
@@ -0,0 +1,37 @@
+.. _Sparc64-System-emulator:
+
+Sparc64 System emulator
+-----------------------
+
+Use the executable ``qemu-system-sparc64`` to simulate a Sun4u
+(UltraSPARC PC-like machine), Sun4v (T1 PC-like machine), or generic
+Niagara (T1) machine. The Sun4u emulator is mostly complete, being able
+to run Linux, NetBSD and OpenBSD in headless (-nographic) mode. The
+Sun4v emulator is still a work in progress.
+
+The Niagara T1 emulator makes use of firmware and OS binaries supplied
+in the S10image/ directory of the OpenSPARC T1 project
+http://download.oracle.com/technetwork/systems/opensparc/OpenSPARCT1_Arch.1.5.tar.bz2
+and is able to boot the disk.s10hw2 Solaris image.
+
+::
+
+   qemu-system-sparc64 -M niagara -L /path-to/S10image/ \
+       -nographic -m 256 \
+       -drive if=pflash,readonly=on,file=/S10image/disk.s10hw2
+
+QEMU emulates the following peripherals:
+
+- UltraSparc IIi APB PCI Bridge
+
+- PCI VGA compatible card with VESA Bochs Extensions
+
+- PS/2 mouse and keyboard
+
+- Non Volatile RAM M48T59
+
+- PC-compatible serial ports
+
+- 2 PCI IDE interfaces with hard disk and CD-ROM support
+
+- Floppy disk
diff --git a/docs/system/target-xtensa.rst b/docs/system/target-xtensa.rst
new file mode 100644
index 000000000..8d703ad76
--- /dev/null
+++ b/docs/system/target-xtensa.rst
@@ -0,0 +1,27 @@
+.. _Xtensa-System-emulator:
+
+Xtensa System emulator
+----------------------
+
+Two executables cover simulation of both Xtensa endian options,
+``qemu-system-xtensa`` and ``qemu-system-xtensaeb``. Two different
+machine types are emulated:
+
+- Xtensa emulator pseudo board \"sim\"
+
+- Avnet LX60/LX110/LX200 board
+
+The sim pseudo board emulation provides an environment similar to the one
+provided by the proprietary Tensilica ISS. It supports:
+
+- A range of Xtensa CPUs, default is the DC232B
+
+- Console and filesystem access via semihosting calls
+
+The Avnet LX60/LX110/LX200 emulation supports:
+
+- A range of Xtensa CPUs, default is the DC232B
+
+- 16550 UART
+
+- OpenCores 10/100 Mbps Ethernet MAC
diff --git a/docs/system/targets.rst b/docs/system/targets.rst
new file mode 100644
index 000000000..9dcd95dd8
--- /dev/null
+++ b/docs/system/targets.rst
@@ -0,0 +1,30 @@
+.. 
_system-targets-ref:
+
+QEMU System Emulator Targets
+============================
+
+QEMU is a generic emulator and it emulates many machines. Most of the
+options are similar for all machines. Specific information about the
+various targets is given in the following sections.
+
+Contents:
+
+..
+   This table of contents should be kept sorted alphabetically
+   by the title text of each file, which isn't the same ordering
+   as an alphabetical sort by filename.
+
+.. toctree::
+
+   target-arm
+   target-avr
+   target-m68k
+   target-mips
+   target-ppc
+   target-riscv
+   target-rx
+   target-s390x
+   target-sparc
+   target-sparc64
+   target-i386
+   target-xtensa
diff --git a/docs/system/tls.rst b/docs/system/tls.rst
new file mode 100644
index 000000000..1a0467436
--- /dev/null
+++ b/docs/system/tls.rst
@@ -0,0 +1,328 @@
+.. _network_005ftls:
+
+TLS setup for network services
+------------------------------
+
+Almost all network services in QEMU have the ability to use TLS for
+session data encryption, along with x509 certificates for simple client
+authentication. What follows is a description of how to generate
+certificates suitable for usage with QEMU, and applies to the VNC
+server, character devices with the TCP backend, NBD server and client,
+and migration server and client.
+
+At a high level, QEMU requires certificates and private keys to be
+provided in PEM format. Aside from the core fields, the certificates
+should include various extension data sets, including v3 basic
+constraints data, key purpose, key usage and subject alt name.
+
+The GnuTLS package includes a command called ``certtool`` which can be
+used to easily generate certificates and keys in the required format
+with the expected data present. Alternatively a certificate management
+service may be used.
+
+At a minimum it is necessary to set up a certificate authority, and
+issue certificates to each server. If using x509 certificates for
+authentication, then each client will also need to be issued a
+certificate.
+
+Assuming that the QEMU network services will only ever be exposed to
+clients on a private intranet, there is no need to use a commercial
+certificate authority to create certificates. A self-signed CA is
+sufficient, and in fact likely to be more secure since it removes the
+ability of malicious 3rd parties to trick the CA into mis-issuing certs
+for impersonating your services. The only likely exception where a
+commercial CA might be desirable is if enabling the VNC websockets
+server and exposing it directly to remote browser clients. In such a
+case it might be useful to use a commercial CA to avoid needing to
+install custom CA certs in the web browsers.
+
+The recommendation is for the server to keep its certificates in either
+``/etc/pki/qemu`` or, for unprivileged users, in ``$HOME/.pki/qemu``.
+
+.. _tls_005fgenerate_005fca:
+
+Setup the Certificate Authority
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This step only needs to be performed once per organization /
+organizational unit. First the CA needs a private key. This key must be
+kept VERY secret and secure. If this key is compromised, the entire
+trust chain of the certificates issued with it is lost.
+
+::
+
+   # certtool --generate-privkey > ca-key.pem
+
+Generating a self-signed certificate requires one core piece of
+information, the name of the organization. 
A template file ``ca.info``
+should be populated with the desired data to avoid having to deal with
+interactive prompts from certtool::
+
+   # cat > ca.info <<EOF
+   cn = Name of your organization
+   ca
+   cert_signing_key
+   EOF
+   # certtool --generate-self-signed \
+              --load-privkey ca-key.pem \
+              --template ca.info \
+              --outfile ca-cert.pem
+
+The ``ca`` keyword in the template sets the v3 basic constraints
+extension to indicate this certificate is for a CA, while
+``cert_signing_key`` sets the key usage extension to indicate this will
+be used for signing other keys. The generated ``ca-cert.pem`` file
+should be copied to all servers and clients wishing to utilize TLS
+support in the VNC server. The ``ca-key.pem`` must not be
+disclosed/copied anywhere except the host responsible for issuing
+certificates.
+
+.. _tls_005fgenerate_005fserver:
+
+Issuing server certificates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each server (or host) needs to be issued with a key and certificate.
+When a client connects, the certificate is sent to the client, which
+validates it against the CA certificate. The core pieces of information
+for a server certificate are the hostnames and/or IP addresses that
+will be used by clients when connecting. The hostname / IP address that
+the client specifies when connecting will be validated against the
+hostname(s) and IP address(es) recorded in the server certificate, and
+if no match is found the client will close the connection.
+
+Thus it is recommended that the server certificate include both the
+fully qualified and unqualified hostnames. If the server will have
+permanently assigned IP address(es), and clients are likely to use them
+when connecting, they may also be included in the certificate. Both IPv4
+and IPv6 addresses are supported. Historically certificates only
+included one hostname in the ``CN`` field, however, usage of this field
+for validation is now deprecated. Instead modern TLS clients will
+validate against the Subject Alt Name extension data, which allows for
+multiple entries. In the future usage of the ``CN`` field may be
+discontinued entirely, so providing SAN extension data is strongly
+recommended.
+
+On the host holding the CA, create template files containing the
+information for each server, and use them to issue server certificates.
+
+::
+
+   # cat > server-hostNNN.info <<EOF
+   organization = Name of your organization
+   cn = hostNNN.foo.example.com
+   dns_name = hostNNN
+   dns_name = hostNNN.foo.example.com
+   ip_address = 10.0.1.87
+   ip_address = 192.8.0.92
+   ip_address = 2620:0:cafe::87
+   ip_address = 2001:24::92
+   tls_www_server
+   encryption_key
+   signing_key
+   EOF
+   # certtool --generate-privkey > server-hostNNN-key.pem
+   # certtool --generate-certificate \
+              --load-ca-certificate ca-cert.pem \
+              --load-ca-privkey ca-key.pem \
+              --load-privkey server-hostNNN-key.pem \
+              --template server-hostNNN.info \
+              --outfile server-hostNNN-cert.pem
+
+The ``dns_name`` and ``ip_address`` fields in the template set the
+subject alt name extension data. The ``tls_www_server`` keyword sets
+the key purpose extension to indicate this certificate is intended for
+usage in a web server. Although QEMU network services are not in fact
+HTTP servers (except for VNC websockets), setting this key purpose is
+still recommended. The ``encryption_key`` and ``signing_key`` keywords
+set the key usage extension to indicate this certificate is intended
+for usage in the data session. 
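Before copying the files into place,
+it can be worth checking that the expected subject alt name entries were
+actually issued; ``certtool`` can print the parsed contents of a
+certificate (a quick sanity check, not a required step)::
+
+   # certtool -i --infile server-hostNNN-cert.pem
+
+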
The ``server-hostNNN-key.pem`` and ``server-hostNNN-cert.pem`` files
+should now be securely copied to the server for which they were
+generated, and renamed to ``server-key.pem`` and ``server-cert.pem``
+when added to the ``/etc/pki/qemu`` directory on the target host. The
+``server-key.pem`` file is security sensitive and should be kept
+protected with file mode 0600 to prevent disclosure.
+
+.. _tls_005fgenerate_005fclient:
+
+Issuing client certificates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU x509 TLS credential setup defaults to enabling client
+verification using certificates, providing a simple authentication
+mechanism. If this default is used, each client also needs to be issued
+a certificate. The client certificate contains enough metadata to
+uniquely identify the client within the scope of the certificate
+authority. The client certificate would typically include fields for
+organization, state, city, building, etc.
+
+Once again on the host holding the CA, create template files containing
+the information for each client, and use them to issue client
+certificates.
+
+::
+
+   # cat > client-hostNNN.info <<EOF
+   country = GB
+   state = London
+   locality = City Of London
+   organization = Name of your organization
+   cn = hostNNN.foo.example.com
+   tls_www_client
+   encryption_key
+   signing_key
+   EOF
+   # certtool --generate-privkey > client-hostNNN-key.pem
+   # certtool --generate-certificate \
+              --load-ca-certificate ca-cert.pem \
+              --load-ca-privkey ca-key.pem \
+              --load-privkey client-hostNNN-key.pem \
+              --template client-hostNNN.info \
+              --outfile client-hostNNN-cert.pem
+
+The subject alt name extension data is not required for clients, so
+the ``dns_name`` and ``ip_address`` fields are not included. The
+``tls_www_client`` keyword sets the key purpose extension to indicate
+this certificate is intended for usage in a web client. Although QEMU
+network clients are not in fact HTTP clients, setting this key purpose
+is still recommended. The ``encryption_key`` and ``signing_key``
+keywords set the key usage extension to indicate this certificate is
+intended for usage in the data session.
+
+The ``client-hostNNN-key.pem`` and ``client-hostNNN-cert.pem`` files
+should now be securely copied to the client for which they were
+generated, and renamed to ``client-key.pem`` and ``client-cert.pem``
+when added to the ``/etc/pki/qemu`` directory on the target host. The
+``client-key.pem`` file is security sensitive and should be kept
+protected with file mode 0600 to prevent disclosure.
+
+If a single host is going to be using TLS in both a client and server
+role, it is possible to create a single certificate to cover both roles.
+This would be quite common for the migration and NBD services, where a
+QEMU process will be started by accepting a TLS protected incoming
+migration, and later itself be migrated out to another host. To generate
+a single certificate, simply include the template data from both the
+client and server instructions in one. 
+
+::
+
+   # cat > both-hostNNN.info <<EOF
+   country = GB
+   state = London
+   locality = City Of London
+   organization = Name of your organization
+   cn = hostNNN.foo.example.com
+   dns_name = hostNNN
+   dns_name = hostNNN.foo.example.com
+   ip_address = 10.0.1.87
+   ip_address = 192.8.0.92
+   ip_address = 2620:0:cafe::87
+   ip_address = 2001:24::92
+   tls_www_server
+   tls_www_client
+   encryption_key
+   signing_key
+   EOF
+   # certtool --generate-privkey > both-hostNNN-key.pem
+   # certtool --generate-certificate \
+              --load-ca-certificate ca-cert.pem \
+              --load-ca-privkey ca-key.pem \
+              --load-privkey both-hostNNN-key.pem \
+              --template both-hostNNN.info \
+              --outfile both-hostNNN-cert.pem
+
+When copying the PEM files to the target host, save them twice, once as
+``server-cert.pem`` and ``server-key.pem``, and again as
+``client-cert.pem`` and ``client-key.pem``.
+
+.. _tls_005fcreds_005fsetup:
+
+TLS x509 credential configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU has a standard mechanism for loading x509 credentials that will be
+used for network services and clients. It requires specifying the
+``tls-creds-x509`` class name to the ``--object`` command line argument
+for the system emulators. Each set of credentials loaded should be given
+a unique string identifier via the ``id`` parameter. A single set of TLS
+credentials can be used for multiple network backends, so VNC,
+migration, NBD, and character devices can all share the same
+credentials. Note, however, that credentials for use in a client
+endpoint must be loaded separately from those used in a server endpoint.
+
+When specifying the object, the ``dir`` parameter specifies which
+directory contains the credential files. This directory is expected to
+contain files with the names mentioned previously: ``ca-cert.pem``,
+``server-key.pem``, ``server-cert.pem``, ``client-key.pem`` and
+``client-cert.pem`` as appropriate. It is also possible to include a set
+of pre-generated Diffie-Hellman (DH) parameters in a file
+``dh-params.pem``, which can be created using the
+``certtool --generate-dh-params`` command. If omitted, QEMU will
+dynamically generate DH parameters when loading the credentials.
+
+The ``endpoint`` parameter indicates whether the credentials will be
+used for a network client or server, and determines which PEM files are
+loaded.
+
+The ``verify`` parameter determines whether x509 certificate validation
+should be performed. This defaults to enabled, meaning clients will
+always validate the server hostname against the certificate subject alt
+name fields and/or CN field. It also means that servers will request
+that clients provide a certificate and will validate it. Verification
+should never be turned off for client endpoints, however, it may be
+turned off for server endpoints if an alternative mechanism is used to
+authenticate clients. For example, the VNC server can use SASL to
+authenticate clients instead.
+
+To load server credentials with client certificate validation enabled:
+
+.. parsed-literal::
+
+   |qemu_system| -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server
+
+while to load client credentials use:
+
+.. parsed-literal::
+
+   |qemu_system| -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=client
+
+Network services which support TLS will all have a ``tls-creds``
+parameter which expects the ID of the TLS credentials object. For
+example with VNC:
+
+.. parsed-literal::
+
+   |qemu_system| -vnc 0.0.0.0:0,tls-creds=tls0
+
+.. 
_tls_005fpsk:
+
+TLS Pre-Shared Keys (PSK)
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Instead of using certificates, you may also use TLS Pre-Shared Keys
+(TLS-PSK). This can be simpler to set up than certificates but is less
+scalable.
+
+Use the GnuTLS ``psktool`` program to generate a ``keys.psk`` file
+containing one or more usernames and random keys::
+
+   mkdir -m 0700 /tmp/keys
+   psktool -u rich -p /tmp/keys/keys.psk
+
+TLS-enabled servers such as ``qemu-nbd`` can use this directory like so::
+
+   qemu-nbd \
+     -t -x / \
+     --object tls-creds-psk,id=tls0,endpoint=server,dir=/tmp/keys \
+     --tls-creds tls0 \
+     image.qcow2
+
+When connecting from a qemu-based client you must specify the directory
+containing ``keys.psk`` and an optional username (defaults to "qemu")::
+
+   qemu-img info \
+     --object tls-creds-psk,id=tls0,dir=/tmp/keys,username=rich,endpoint=client \
+     --image-opts \
+     file.driver=nbd,file.host=localhost,file.port=10809,file.tls-creds=tls0,file.export=/
diff --git a/docs/system/virtio-net-failover.rst b/docs/system/virtio-net-failover.rst
new file mode 100644
index 000000000..6002dc5d9
--- /dev/null
+++ b/docs/system/virtio-net-failover.rst
@@ -0,0 +1,68 @@
+======================================
+QEMU virtio-net standby (net_failover)
+======================================
+
+This document explains the setup and usage of the virtio-net standby
+feature, which is used to create a net_failover pair of devices.
+
+The general idea is that we have a pair of devices, a (vfio-)pci and a
+virtio-net device. Before migration the vfio device is unplugged and data
+flows through the virtio-net device; on the target side another vfio-pci
+device is plugged in to take over the data path. In the guest the
+net_failover kernel module will pair net devices with the same MAC address.
+
+The two devices are called primary and standby device. The fast hardware
+based networking device is called the primary device and the virtio-net
+device is the standby device.
+
+Restrictions
+------------
+
+Currently only PCIe devices are allowed as primary devices; this restriction
+can be lifted in the future with enhanced QEMU support. Also, only networking
+devices are allowed as primary devices. The user needs to ensure that primary
+and standby devices are not plugged into the same PCIe slot.
+
+Usecase
+-------
+
+  Virtio-net standby allows easy migration while using a passed-through fast
+  networking device by falling back to a virtio-net device for the duration
+  of the migration. It is like a simple version of a bond; the difference is
+  that it requires no configuration in the guest. When a guest is
+  live-migrated to another host, QEMU will unplug the primary device via the
+  PCIe based hotplug handler and traffic will go through the virtio-net
+  device. On the target system the primary device will be automatically
+  plugged back in and the net_failover module registers it again as the
+  primary device.
+
+Usage
+-----
+
+  The primary device can be hotplugged or be part of the startup
+  configuration:
+
+    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
+    bus=root2,failover=on
+
+  With the parameter failover=on the VIRTIO_NET_F_STANDBY feature will be
+  enabled.
+
+    -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1
+
+  failover_pair_id references the id of the virtio-net standby device. This
+  is only for pairing the devices within QEMU. The guest kernel module
+  net_failover will match devices with identical MAC addresses. 
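Putting the
+  two together, a minimal sketch of a startup configuration pairing the two
+  devices on a q35 machine (the netdev, the root port names and the PCI
+  host address are taken from the examples above and are illustrative):
+
+    -netdev tap,id=hostnet1 \
+    -device pcie-root-port,id=root1 \
+    -device pcie-root-port,id=root2 \
+    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
+    bus=root2,failover=on \
+    -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1
+
+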
Hotplug
+-------
+
+  Both primary and standby device can be hotplugged via the QEMU monitor.
+  Note that if the virtio-net device is plugged first, a warning will be
+  issued that it couldn't find the primary device.
+
+Migration
+---------
+
+  A new migration state, wait-unplug, was added for this feature. If failover
+  primary devices are present in the configuration, migration will go into
+  this state. It will wait until the device unplug is completed in the guest
+  and then move into active state. On the target system the primary devices
+  will be automatically hotplugged when the feature bit was negotiated for
+  the virtio-net standby device.
diff --git a/docs/system/vnc-security.rst b/docs/system/vnc-security.rst
new file mode 100644
index 000000000..4c1769eeb
--- /dev/null
+++ b/docs/system/vnc-security.rst
@@ -0,0 +1,203 @@
+.. _VNC security:
+
+VNC security
+------------
+
+The VNC server capability provides access to the graphical console of
+the guest VM across the network. This has a number of security
+considerations depending on the deployment scenarios.
+
+.. _vnc_005fsec_005fnone:
+
+Without passwords
+~~~~~~~~~~~~~~~~~
+
+The simplest VNC server setup does not include any form of
+authentication. For this setup it is recommended to restrict it to
+listen on a UNIX domain socket only. For example
+
+.. parsed-literal::
+
+   |qemu_system| [...OPTIONS...] -vnc unix:/home/joebloggs/.qemu-myvm-vnc
+
+This ensures that only users on the local box with read/write access to
+that path can access the VNC server. To securely access the VNC server
+from a remote machine, a combination of netcat+ssh can be used to
+provide a secure tunnel.
+
+.. _vnc_005fsec_005fpassword:
+
+With passwords
+~~~~~~~~~~~~~~
+
+The VNC protocol has limited support for password based authentication.
+Since the protocol limits passwords to 8 characters, it should not be
+considered to provide high security. The password can be fairly easily
+brute-forced by a client making repeated connections. For this reason, a
+VNC server using password authentication should be restricted to only
+listen on the loopback interface or UNIX domain sockets. Password
+authentication is not supported when operating in FIPS 140-2 compliance
+mode, as it requires the use of the DES cipher. Password authentication
+is requested with the ``password`` option, and then once QEMU is running
+the password is set with the monitor. Until the monitor is used to set
+the password all clients will be rejected.
+
+.. parsed-literal::
+
+   |qemu_system| [...OPTIONS...] -vnc :1,password=on -monitor stdio
+   (qemu) change vnc password
+   Password: ********
+   (qemu)
+
+.. _vnc_005fsec_005fcertificate:
+
+With x509 certificates
+~~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU VNC server also implements the VeNCrypt extension allowing use
+of TLS for encryption of the session, and x509 certificates for
+authentication. The use of x509 certificates is strongly recommended,
+because TLS on its own is susceptible to man-in-the-middle attacks.
+Basic x509 certificate support provides a secure session, but no
+authentication. This allows any client to connect, and provides an
+encrypted session.
+
+.. parsed-literal::
+
+   |qemu_system| [...OPTIONS...] \
+     -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=off \
+     -vnc :1,tls-creds=tls0 -monitor stdio
+
+In the above example ``/etc/pki/qemu`` should contain at least three
+files, ``ca-cert.pem``, ``server-cert.pem`` and ``server-key.pem``. 
+
+Unprivileged users will want to use a private directory, for example
+``$HOME/.pki/qemu``. NB the ``server-key.pem`` file should be protected
+with file mode 0600 so that it is only readable by the user owning it.
+
+.. _vnc_005fsec_005fcertificate_005fverify:
+
+With x509 certificates and client verification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Certificates can also provide a means to authenticate the client
+connecting. The server will request that the client provide a
+certificate, which it will then validate against the CA certificate.
+This is a good choice if deploying in an environment with a private
+internal certificate authority. It uses the same syntax as previously,
+but with ``verify-peer`` set to ``on`` instead.
+
+.. parsed-literal::
+
+   |qemu_system| [...OPTIONS...] \
+     -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \
+     -vnc :1,tls-creds=tls0 -monitor stdio
+
+.. _vnc_005fsec_005fcertificate_005fpw:
+
+With x509 certificates, client verification and passwords
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Finally, the previous method can be combined with VNC password
+authentication to provide two layers of authentication for clients.
+
+.. parsed-literal::
+
+   |qemu_system| [...OPTIONS...] \
+     -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \
+     -vnc :1,tls-creds=tls0,password=on -monitor stdio
+   (qemu) change vnc password
+   Password: ********
+   (qemu)
+
+.. _vnc_005fsec_005fsasl:
+
+With SASL authentication
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The SASL authentication method is a VNC extension that provides an
+easily extendable, pluggable authentication method. This allows for
+integration with a wide range of authentication mechanisms, such as
+PAM, GSSAPI/Kerberos, LDAP, SQL databases, one-time keys and more. The
+strength of the authentication depends on the exact mechanism
+configured. If the chosen mechanism also provides an SSF layer, then it
+will encrypt the datastream as well.
+
+Refer to the later docs on how to choose the exact SASL mechanism used
+for authentication, but assuming use of one supporting SSF, then QEMU
+can be launched with:
+
+.. parsed-literal::
+
+   |qemu_system| [...OPTIONS...] -vnc :1,sasl=on -monitor stdio
+
+.. _vnc_005fsec_005fcertificate_005fsasl:
+
+With x509 certificates and SASL authentication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the desired SASL authentication mechanism does not support SSF
+layers, then it is strongly advised to run it in combination with TLS
+and x509 certificates. This provides a securely encrypted data stream,
+avoiding the risk of compromising the security credentials. This can be
+enabled by combining the 'sasl' option with the aforementioned TLS +
+x509 options:
+
+.. parsed-literal::
+
+   |qemu_system| [...OPTIONS...] \
+     -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \
+     -vnc :1,tls-creds=tls0,sasl=on -monitor stdio
+
+.. _vnc_005fsetup_005fsasl:
+
+Configuring SASL mechanisms
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following documentation assumes use of the Cyrus SASL implementation
+on a Linux host, but the principles should apply to any other SASL
+implementation or host. When SASL is enabled, the mechanism
+configuration will be loaded from the system default SASL service
+config ``/etc/sasl2/qemu.conf``. If running QEMU as an unprivileged
+user, the environment variable ``SASL_CONF_PATH`` can be used to make
+it search alternate locations for the service config file. 
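A sketch of
+pointing QEMU at a per-user location (the directory path is illustrative;
+the directory must contain a ``qemu.conf`` file)::
+
+   $ export SASL_CONF_PATH=$HOME/.config/qemu/sasl
+   $ qemu-system-x86_64 [...OPTIONS...] -vnc :1,sasl=on
+
+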
If the TLS option is enabled for VNC, then it will provide session
+encryption, otherwise the SASL mechanism will have to provide
+encryption. In the latter case the list of possible plugins that can be
+used is drastically reduced. In fact only the GSSAPI SASL mechanism
+provides an acceptable level of security by modern standards. Previous
+versions of QEMU referred to the DIGEST-MD5 mechanism, however, it has
+multiple serious flaws described in detail in RFC 6331 and thus should
+never be used any more. The SCRAM-SHA-256 mechanism provides a simple
+username/password auth facility similar to DIGEST-MD5, but does not
+support session encryption, so it can only be used in combination with
+TLS.
+
+When not using TLS, the recommended configuration is
+
+::
+
+   mech_list: gssapi
+   keytab: /etc/qemu/krb5.tab
+
+This says to use the 'GSSAPI' mechanism with the Kerberos v5 protocol,
+with the server principal stored in /etc/qemu/krb5.tab. For this to work
+the administrator of your KDC must generate a Kerberos principal for the
+server, with a name of 'qemu/somehost.example.com@EXAMPLE.COM',
+replacing 'somehost.example.com' with the fully qualified host name of
+the machine running QEMU, and 'EXAMPLE.COM' with the Kerberos Realm.
+
+When using TLS, if username+password authentication is desired, then a
+reasonable configuration is
+
+::
+
+   mech_list: scram-sha-256
+   sasldb_path: /etc/qemu/passwd.db
+
+The ``saslpasswd2`` program can be used to populate the ``passwd.db``
+file with accounts. Note that the ``passwd.db`` file stores passwords
+in clear text.
+
+Other SASL configurations will be left as an exercise for the reader.
+Note that all mechanisms, except GSSAPI, should be combined with use of
+TLS to ensure a secure data channel. |