aboutsummaryrefslogtreecommitdiffstats
path: root/docs/specs/ppc-spapr-xive.rst
diff options
context:
space:
mode:
authorTimos Ampelikiotis <t.ampelikiotis@virtualopensystems.com>2023-10-10 11:40:56 +0000
committerTimos Ampelikiotis <t.ampelikiotis@virtualopensystems.com>2023-10-10 11:40:56 +0000
commite02cda008591317b1625707ff8e115a4841aa889 (patch)
treeaee302e3cf8b59ec2d32ec481be3d1afddfc8968 /docs/specs/ppc-spapr-xive.rst
parentcc668e6b7e0ffd8c9d130513d12053cf5eda1d3b (diff)
Introduce Virtio-loopback epsilon release:
Epsilon release introduces a new compatibility layer which make virtio-loopback design to work with QEMU and rust-vmm vhost-user backend without require any changes. Signed-off-by: Timos Ampelikiotis <t.ampelikiotis@virtualopensystems.com> Change-Id: I52e57563e08a7d0bdc002f8e928ee61ba0c53dd9
Diffstat (limited to 'docs/specs/ppc-spapr-xive.rst')
-rw-r--r--docs/specs/ppc-spapr-xive.rst282
1 files changed, 282 insertions, 0 deletions
diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst
new file mode 100644
index 000000000..f47f739e0
--- /dev/null
+++ b/docs/specs/ppc-spapr-xive.rst
@@ -0,0 +1,282 @@
+XIVE for sPAPR (pseries machines)
+=================================
+
+The POWER9 processor comes with a new interrupt controller
+architecture, called XIVE as "eXternal Interrupt Virtualization
+Engine". It supports a larger number of interrupt sources and offers
+virtualization features which enables the HW to deliver interrupts
+directly to virtual processors without hypervisor assistance.
+
+A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
+processors can run under two interrupt modes:
+
+- *Legacy Compatibility Mode*
+
+ the hypervisor provides identical interfaces and similar
+ functionality to PAPR+ Version 2.7. This is the default mode
+
+ It is also referred as *XICS* in QEMU.
+
+- *XIVE native exploitation mode*
+
+ the hypervisor provides new interfaces to manage the XIVE control
+ structures, and provides direct control for interrupt management
+ through MMIO pages.
+
+Which interrupt modes can be used by the machine is negotiated with
+the guest O/S during the Client Architecture Support negotiation
+sequence. The two modes are mutually exclusive.
+
+Both interrupt mode share the same IRQ number space. See below for the
+layout.
+
+CAS Negotiation
+---------------
+
+QEMU advertises the supported interrupt modes in the device tree
+property ``ibm,arch-vec-5-platform-support`` in byte 23 and the OS
+Selection for XIVE is indicated in the ``ibm,architecture-vec-5``
+property byte 23.
+
+The interrupt modes supported by the machine depend on the CPU type
+(POWER9 is required for XIVE) but also on the machine property
+``ic-mode`` which can be set on the command line. It can take the
+following values: ``xics``, ``xive``, and ``dual`` which is the
+default mode. ``dual`` means that both modes XICS **and** XIVE are
+supported and if the guest OS supports XIVE, this mode will be
+selected.
+
+The chosen interrupt mode is activated after a reconfiguration done
+in a machine reset.
+
+KVM negotiation
+---------------
+
+When the guest starts under KVM, the capabilities of the host kernel
+and QEMU are also negotiated. Depending on the version of the host
+kernel, KVM will advertise the XIVE capability to QEMU or not.
+
+Nevertheless, the available interrupt modes in the machine should not
+depend on the XIVE KVM capability of the host. On older kernels
+without XIVE KVM support, QEMU will use the emulated XIVE device as a
+fallback and on newer kernels (>=5.2), the KVM XIVE device.
+
+XIVE native exploitation mode is not supported for KVM nested guests,
+VMs running under a L1 hypervisor (KVM on pSeries). In that case, the
+hypervisor will not advertise the KVM capability and QEMU will use the
+emulated XIVE device, same as for older versions of KVM.
+
+As a final refinement, the user can also switch the use of the KVM
+device with the machine option ``kernel_irqchip``.
+
+
+XIVE support in KVM
+~~~~~~~~~~~~~~~~~~~
+
+For guest OSes supporting XIVE, the resulting interrupt modes on host
+kernels with XIVE KVM support are the following:
+
+============== ============= ============= ================
+ic-mode kernel_irqchip
+-------------- ----------------------------------------------
+/ allowed off on
+ (default)
+============== ============= ============= ================
+dual (default) XIVE KVM XIVE emul. XIVE KVM
+xive XIVE KVM XIVE emul. XIVE KVM
+xics XICS KVM XICS emul. XICS KVM
+============== ============= ============= ================
+
+For legacy guest OSes without XIVE support, the resulting interrupt
+modes are the following:
+
+============== ============= ============= ================
+ic-mode kernel_irqchip
+-------------- ----------------------------------------------
+/ allowed off on
+ (default)
+============== ============= ============= ================
+dual (default) XICS KVM XICS emul. XICS KVM
+xive QEMU error(3) QEMU error(3) QEMU error(3)
+xics XICS KVM XICS emul. XICS KVM
+============== ============= ============= ================
+
+(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
+ mode (XICS), either don't set the ic-mode machine property or try
+ ic-mode=xics or ic-mode=dual``
+
+
+No XIVE support in KVM
+~~~~~~~~~~~~~~~~~~~~~~
+
+For guest OSes supporting XIVE, the resulting interrupt modes on host
+kernels without XIVE KVM support are the following:
+
+============== ============= ============= ================
+ic-mode kernel_irqchip
+-------------- ----------------------------------------------
+/ allowed off on
+ (default)
+============== ============= ============= ================
+dual (default) XIVE emul.(1) XIVE emul. QEMU error (2)
+xive XIVE emul.(1) XIVE emul. QEMU error (2)
+xics XICS KVM XICS emul. XICS KVM
+============== ============= ============= ================
+
+
+(1) QEMU warns with ``warning: kernel_irqchip requested but unavailable:
+ IRQ_XIVE capability must be present for KVM``
+ In some cases (old host kernels or KVM nested guests), one may hit a
+ QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
+ with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``
+(2) QEMU fails with ``kernel_irqchip requested but unavailable:
+ IRQ_XIVE capability must be present for KVM``
+
+
+For legacy guest OSes without XIVE support, the resulting interrupt
+modes are the following:
+
+============== ============= ============= ================
+ic-mode kernel_irqchip
+-------------- ----------------------------------------------
+/ allowed off on
+ (default)
+============== ============= ============= ================
+dual (default) QEMU error(4) XICS emul. QEMU error(4)
+xive QEMU error(3) QEMU error(3) QEMU error(3)
+xics XICS KVM XICS emul. XICS KVM
+============== ============= ============= ================
+
+(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
+ mode (XICS), either don't set the ic-mode machine property or try
+ ic-mode=xics or ic-mode=dual``
+(4) QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
+ with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``
+
+
+XIVE Device tree properties
+---------------------------
+
+The properties for the PAPR interrupt controller node when the *XIVE
+native exploitation mode* is selected should contain:
+
+- ``device_type``
+
+ value should be "power-ivpe".
+
+- ``compatible``
+
+ value should be "ibm,power-ivpe".
+
+- ``reg``
+
+ contains the base address and size of the thread interrupt
+ managnement areas (TIMA), for the User level and for the Guest OS
+ level. Only the Guest OS level is taken into account today.
+
+- ``ibm,xive-eq-sizes``
+
+ the size of the event queues. One cell per size supported, contains
+ log2 of size, in ascending order.
+
+- ``ibm,xive-lisn-ranges``
+
+ the IRQ interrupt number ranges assigned to the guest for the IPIs.
+
+The root node also exports :
+
+- ``ibm,plat-res-int-priorities``
+
+ contains a list of priorities that the hypervisor has reserved for
+ its own use.
+
+IRQ number space
+----------------
+
+IRQ Number space of the ``pseries`` machine is 8K wide and is the same
+for both interrupt mode. The different ranges are defined as follow :
+
+- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
+- ``0x1000 .. 0x1000`` 1 EPOW
+- ``0x1001 .. 0x1001`` 1 HOTPLUG
+- ``0x1002 .. 0x10FF`` unused
+- ``0x1100 .. 0x11FF`` 256 VIO devices
+- ``0x1200 .. 0x127F`` 32x4 LSIs for PHB devices
+- ``0x1280 .. 0x12FF`` unused
+- ``0x1300 .. 0x1FFF`` PHB MSIs (dynamically allocated)
+
+Monitoring XIVE
+---------------
+
+The state of the XIVE interrupt controller can be queried through the
+monitor commands ``info pic``. The output comes in two parts.
+
+First, the state of the thread interrupt context registers is dumped
+for each CPU :
+
+::
+
+ (qemu) info pic
+ CPU[0000]: QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2
+ CPU[0000]: USER 00 00 00 00 00 00 00 00 00000000
+ CPU[0000]: OS 00 ff 00 00 ff 00 ff ff 80000400
+ CPU[0000]: POOL 00 00 00 00 00 00 00 00 00000000
+ CPU[0000]: PHYS 00 00 00 00 00 00 00 ff 00000000
+ ...
+
+In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
+the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
+line which is set to the VP identifier.
+
+Then comes the routing information which aggregates the EAS and the
+END configuration:
+
+::
+
+ ...
+ LISN PQ EISN CPU/PRIO EQ
+ 00000000 MSI -- 00000010 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
+ 00000001 MSI -- 00000010 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
+ 00000002 MSI -- 00000010 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
+ 00000003 MSI -- 00000010 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ]
+ 00000004 MSI -Q M 00000000
+ 00000005 MSI -Q M 00000000
+ 00000006 MSI -Q M 00000000
+ 00000007 MSI -Q M 00000000
+ 00001000 MSI -- 00000012 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
+ 00001001 MSI -- 00000013 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
+ 00001100 MSI -- 00000100 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
+ 00001101 MSI -Q M 00000000
+ 00001200 LSI -Q M 00000000
+ 00001201 LSI -Q M 00000000
+ 00001202 LSI -Q M 00000000
+ 00001203 LSI -Q M 00000000
+ 00001300 MSI -- 00000102 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
+ 00001301 MSI -- 00000103 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
+ 00001302 MSI -- 00000104 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ]
+
+The source information and configuration:
+
+- The ``LISN`` column outputs the interrupt number of the source in
+ range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI``
+- The ``PQ`` column reflects the state of the PQ bits of the source :
+
+ - ``--`` source is ready to take events
+ - ``P-`` an event was sent and an EOI is PENDING
+ - ``PQ`` an event was QUEUED
+ - ``-Q`` source is OFF
+
+ a ``M`` indicates that source is *MASKED* at the EAS level,
+
+The targeting configuration :
+
+- The ``EISN`` column is the event data that will be queued in the event
+ queue of the O/S.
+- The ``CPU/PRIO`` column is the tuple defining the CPU number and
+ priority queue serving the source.
+- The ``EQ`` column outputs :
+
+ - the current index of the event queue/ the max number of entries
+ - the O/S event queue address
+ - the toggle bit
+ - the last entries that were pushed in the event queue.