Diffstat (limited to 'roms/skiboot/doc/release-notes/skiboot-5.9-rc1.rst')
-rw-r--r-- | roms/skiboot/doc/release-notes/skiboot-5.9-rc1.rst | 529 |
1 files changed, 529 insertions, 0 deletions
diff --git a/roms/skiboot/doc/release-notes/skiboot-5.9-rc1.rst b/roms/skiboot/doc/release-notes/skiboot-5.9-rc1.rst new file mode 100644 index 000000000..7822015bb --- /dev/null +++ b/roms/skiboot/doc/release-notes/skiboot-5.9-rc1.rst @@ -0,0 +1,529 @@ +.. _skiboot-5.9-rc1: + +skiboot-5.9-rc1 +=============== + +skiboot v5.9-rc1 was released on Wednesday October 11th 2017. It is the first +release candidate of skiboot 5.9, which will become the new stable release +of skiboot following the 5.8 release, first released August 31st 2017. + +skiboot v5.9-rc1 contains all bug fixes as of :ref:`skiboot-5.4.7` +and :ref:`skiboot-5.1.21` (the currently maintained stable releases). We +do not currently expect to do any 5.8.x stable releases. + +For details on how the skiboot stable releases work, see :ref:`stable-rules`. + +The current plan is to cut the final 5.9 by October 17th, with skiboot 5.9 +being for all POWER8 and POWER9 platforms in op-build v1.20 (due October 18th). +This release will be targeted at early POWER9 systems. + +Over skiboot-5.8, we have the following changes: + +New Features +------------ + +POWER8 +^^^^^^ +- fast-reset by default (if possible) + + Currently, this is limited to POWER8 systems. + + A normal reboot will, rather than doing a full IPL, go through a + fast reboot procedure. This reduces the "reboot to petitboot" time + from minutes to a handful of seconds. + +POWER9 +^^^^^^ +- POWER9 power management during boot + + Less power should be consumed during boot. +- OPAL_SIGNAL_SYSTEM_RESET for POWER9 + + This implements OPAL_SIGNAL_SYSTEM_RESET, using scom registers to + quiesce the target thread and raise a system reset exception on it. + It has been tested on DD2 with stop0 ESL=0 and ESL=1 shallow power + saving modes. + + DD1 is not implemented because it is sufficiently different as to + make support difficult.
+- Enable deep idle states for POWER9 + + - SLW: Add support for p9_stop_api + + The p9_stop_api calls are used to set SPR state on a core wakeup from a deeper + low power state. p9_stop_api uses low level platform firmware and + self-restore microcode to restore the SPRs to requested values. + + Code is taken from: + https://github.com/open-power/hostboot/tree/master/src/import/chips/p9/procedures/utils/stopreg + - SLW: Removing timebase related flags for stop4 + + When a core enters stop4, it does not lose the decrementer and time base. + Hence remove the flags OPAL_PM_DEC_STOP and OPAL_PM_TIMEBASE_STOP. + - SLW: Allow deep states if homer address is known + + Use a common variable has_wakeup_engine instead of has_slw to tell if + the: + - SLW image is populated in case of power8 + - CME image is populated in case of power9 + + Currently we expect CME to be loaded if homer address is known (except + for simulators) + - SLW: Configure self-restore for HRMOR + + Make a stop api call using libpore to restore the HRMOR register. HRMOR needs + to be cleared so that when threads exit stop, they arrive at the Linux + system_reset vector (0x100). + - SLW: Add opal_slw_set_reg support for power9 + + This OPAL call is made from Linux to OPAL to configure values in + various SPRs after wakeup from a deep idle state. +- PHB4: CAPP recovery + + CAPP recovery is initiated when a CAPP Machine Check is detected. + The CAPP recovery procedure is initiated via a Hypervisor Maintenance + interrupt (HMI). + + A CAPP Machine Check may arise from either an error that results in a PHB + freeze or from an internal CAPP error with CAPP checkstop FIR action. + An error that causes a PHB freeze will result in the link down signal + being asserted. The system continues running and the CAPP and PSL will + be re-initialized. + + This implements CAPP recovery for POWER9 systems. +- Add ``wafer-location`` property for POWER9 + + Extract wafer-location from ECID and add property under xscom node.
+ - bits 64:71 are the chip x location (7:0) + - bits 72:79 are the chip y location (7:0) + + Sample output: :: + + [root@wsp xscom@623fc00000000]# lsprop ecid + ecid 019a00d4 03100718 852c0000 00fd7911 + [root@wsp xscom@623fc00000000]# lsprop wafer-location + wafer-location 00000085 0000002c +- Add ``wafer-id`` property for POWER9 + + Wafer id is derived from ECID data. + - bits 4:63 are the wafer id (ten 6-bit fields, each containing a code) + + Sample output: :: + + [root@wsp xscom@623fc00000000]# lsprop ecid + ecid 019a00d4 03100718 852c0000 00fd7911 + [root@wsp xscom@623fc00000000]# lsprop wafer-id + wafer-id "6Q0DG340SO" +- Add ``ecid`` property under ``xscom`` node for POWER9. + Sample output: :: + + [root@wsp xscom@623fc00000000]# lsprop ecid + ecid 019a00d4 03100718 852c0000 00fd7911 +- Add ibm,firmware-versions device tree node + + In P8, hostboot provides a mini device tree. It contains the ``/ibm,firmware-versions`` + node which has various firmware component version details. + + In P9, OPAL builds the device tree. This patch adds support to parse the VERSION + section of PNOR and create the ``/ibm,firmware-versions`` device tree node. + + Sample output: :: + + /sys/firmware/devicetree/base/ibm,firmware-versions # lsprop . + occ "6a00709" + skiboot "v5.7-rc1-p344fb62" + buildroot "2017.02.2-7-g23118ce" + capp-ucode "9c73e9f" + petitboot "v1.4.3-p98b6d83" + sbe "02021c6" + open-power "witherspoon-v1.17-128-gf1b53c7-dirty" + .... + .... + +POWER9 +------ + +- Disable Transactional Memory on Power9 DD 2.1 + + Update pa_features_p9[] to disable TM (Transactional Memory). On DD 2.1 + TM is not usable by Linux without other workarounds, so skiboot must + disable it. +- xscom: Do not print error message for 'chiplet offline' return values + + xscom_read/write operations return CHIPLET_OFFLINE when the chiplet is offline.
+ Some multicast xscom_read/write requests from HBRT result in xscom operations + on offline chiplet(s), printing the warnings below in the OPAL console: :: + + [ 135.036327572,3] XSCOM: Read failed, ret = -14 + [ 135.092689829,3] XSCOM: Read failed, ret = -14 + + Some SCOM users can deal correctly with this error code (notably opal-prd), + so the error message is (in practice) erroneous. +- IMC: Fix the core_imc_event_mask + + CORE_IMC_EVENT_MASK is a scom that contains bits to control event sampling for + different machine states for core IMC. The current event-mask setting samples + events only in the host kernel (hypervisor) and host userspace. + + This patch enables the sampling of events in other machine states (like guest + kernel and guest userspace). +- IMC: Update the nest_pmus array with occ/gpe microcode uav updates + + OCC/gpe nest microcode maintains the list of individual nest units + supported. Sync the recent updates to the UAV with nest_pmus array. + + For reference, the occ/gpe microcode link for the UAV: + https://github.com/open-power/occ/blob/master/src/occ_gpe1/gpe1_24x7.h +- Parse IOSLOT information from HDAT + + Add structure definitions that describe the physical PCIe topology of + a system and parse them into the device-tree based PCIe slot + description. +- idle: user context state loss flags fix for stop states + + The "lite" stop variants with PSSCR[ESL]=PSSCR[EC]=0 do not lose user + context, while the non-lite variants do (ESL: enable state loss). + + Some of the POWER9 idle states had these wrong. + +CAPI +^^^^ +- POWER9 DD2 update + + The CAPI initialization sequence has been updated in DD2. + This patch adapts to the changes, retaining compatibility with DD1. + The patch includes some changes to DD1 fix-ups as well. +- Load CAPP microcode for POWER9 DD2.0 and DD2.1 +- capi: Mask Psl Credit timeout error for POWER9 + + Mask the PSL credit timeout error in CAPP FIR Mask register + bit(46).
As per the h/w team this error is now deprecated and shouldn't + cause any fir-action for P9. + +NVLINK2 +^^^^^^^ + +A notable change is that we now generate the device tree description of +NVLINK based on the HDAT we get from hostboot. Since Hostboot will generate +HDAT based on VPD, you now *MUST* have correct VPD programmed or we will +*default* to a Sequoia layout, which will lead to random problems if you +are not booting a Sequoia Witherspoon planar. In the case of booting with +old VPD and/or Hostboot, we print a **giant scary warning** in order to scare you. + +- npu2: Read slot label from the HDAT link node + + Binding GPUs to emulated NPU PCI devices is done using the slot labels. + Since the NPU devices do not have a matching slot node, we need to + copy the label in here. + +- npu2: Copy link speed from the npu HDAT node + + This needs to be in the PCI device node so the speed of the NVLink + can be passed to the GPU driver. +- npu2: hw-procedures: Add settings to PHY_RESET + + Set a few new values in the PHY_RESET procedure, as specified by our + updated programming guide documentation. +- Parse NVLink information from HDAT + + Add the per-chip structures that describe how the A-Bus/NVLink/OpenCAPI + phy is configured. This generates the npu@xyz nodes for each chip on + systems that support it. +- npu2: Add vendor cap for IRQ testing + + Provide a way to test recoverable data link interrupts via a new + vendor capability byte. +- npu2: Enable recoverable data link (no-stall) interrupts + + Allow the NPU2 to trigger "recoverable data link" interrupts. + +- npu2: Implement basic FLR (Function Level Reset) +- npu2: hw-procedures: Update PHY DC calibration procedure +- npu2: hw-procedures: Change rx_pr_phase_step value + +XIVE +^^^^ +- xive: Fix opal_xive_dump_tm() to access W2 properly. + The HW only supported limited access sizes.
+- xive: Make opal_xive_allocate_irq() properly try all chips + + When requested via OPAL_XIVE_ANY_CHIP, we need to try all + chips. We first try the current one (on which the caller + sits) and if that fails, we iterate all chips until the + allocation succeeds. +- xive: Fix initialization & cleanup of HW thread contexts + + Rather than trying to "pull" everything and clear VT (which didn't + work and caused some FIRs to be set), just clear and then + set the PTER thread enable bit. This has the side effect of + completely resetting the corresponding thread context. + + This fixes the spurious XIVE FIRs reported by PRD and fircheck. +- xive: Add debug option for detecting misrouted IPI in emulation + + This is high overhead so we don't enable it by default even + in debug builds. It's also a bit messy, but it allowed me to + detect and debug a locking issue earlier so it can be useful. +- xive: Increase the interrupt "gap" on debug builds + + We normally allocate IPIs from 0x10. Make that 0x1000 on debug + builds to limit the chances of overlapping with Linux interrupt + numbers, which makes debugging code that confuses them easier. + + Also add a warning in emulation if we get an interrupt in the + queue whose number is below the gap. +- xive: Fix locking around cache scrub & watch + + Thankfully the missing locking only affects debug code and + init code that doesn't run concurrently. Also adds a DEBUG + option that checks the lock is properly held. +- xive: Workaround HW issue with scrub facility + + Without this, we sometimes don't observe from a CPU the + values written to the ENDs or NVTs via the cache watch.
+- xive: Add exerciser for cache watch/scrub facility in DEBUG builds +- xive: Make assertion in xive_eq_for_target() more informative +- xive: Add debug code to check initial cache updates +- xive: Ensure pressure relief interrupts are disabled + + We don't use them and we hijack the VP field with their + configuration to store the EQ reference, so make sure the + kernel or guest can't turn them back on by doing MMIO + writes to ACK#. +- xive: Don't try setting the reserved ACK# field in VPs + + That doesn't work, the HW doesn't implement it in the cache + watch facility anyway. +- xive: Remove useless memory barriers in VP/EQ inits + + We no longer update "live" memory structures, we use a temporary + copy on the stack and update the actual memory structure using + the cache watch, so those barriers are pointless. + +PHB4 +^^^^ +- phb4: Mask RXE_ARB: DEC Stage Valid Error + + Change the inits to mask out the RXE ARB: DEC Stage Valid Error (bit + 370). This has been a fatal error but should be informational only. + + This update will be in the next version of the phb4 workbook. +- phb4: Add additional adapter to retrain whitelist + + The single port version of the ConnectX-5 has a different device ID 0x1017. + Updated descriptions to match pciutils database. +- PHB4: Default to PCIe GEN3 on POWER9 DD2.00 + + You can use the NVRAM override for DD2.00 screened parts. +- phb4: Retrain link if degraded + + On P9 Scale Out (Nimbus) DD2.0 and Scale Up (Cumulus) DD1.0 (and + below) the PCIe PHY can lock up, causing training issues. This can cause + a degradation in speed or width in ~5% of training cases (depending on + the card). This is fixed in later chip revisions. This issue can also + cause PCIe links to not train at all, but this case is already + handled. + + This patch checks if the PCIe link has trained optimally and if not, + does a full PHB reset (to fix the PHY lockup) and retrain.
+ + One complication is some devices are known to train degraded unless + device specific configuration is performed. Because of this, we only + retrain when the device is in a whitelist. All devices in the current + whitelist have been tested on a P9DSU/Boston, ZZ and Witherspoon. + + We always gather information on the link and print it in the logs even + if the card is not in the whitelist. + + For testing purposes, there's an nvram option to retry all PCIe cards and all + P9 chips when a degraded link is detected. The new option is + 'pci-retry-all=true' which can be set using: + ``nvram -p ibm,skiboot --update-config pci-retry-all=true``. + This option may increase the boot time if used on a badly behaving + card. + + +IBM FSP platforms +----------------- + +- FSP/NVRAM: Handle "get vNVRAM statistics" command + + FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM + statistics. OPAL doesn't maintain any such statistics. Hence return + FSP_STATUS_INVALID_SUBCMD. + + Fixes these messages appearing in the OPAL log: :: + + [16944.384670488,3] FSP: Unhandled message eb0500 + [16944.474110465,3] FSP: Unhandled message eb0500 + [16945.111280784,3] FSP: Unhandled message eb0500 + [16945.293393485,3] FSP: Unhandled message eb0500 +- fsp: Move common prints to trace + + These two prints just end up filling the skiboot logs on any machine + that's been booted for more than a few hours. + + They have never been useful, so make them trace level. They were: :: + SURV: Received heartbeat acknowledge from FSP + SURV: Sending the heartbeat command to FSP + +BMC based systems +----------------- +- hw/lpc-uart: read from RBR to clear character timeout interrupts + + When using the aspeed SUART, we see a condition where the UART sends + continuous character timeout interrupts. This change adds a (heavily + commented) dummy read from the RBR to clear the interrupt condition on + init.
+ + This was observed on p9dsu systems, but likely applies to other systems + using the SUART. +- astbmc: Add methods for handling Device Tree based slots + e.g. ones from HDAT on POWER9. + +General +------- +- ipmi: Convert common debug prints to trace + + OPAL logs messages for every IPMI request from host. Sometimes the OPAL console + is filled with only these messages. This path is pretty stable now and + we have enough logs to cover the bad path. Hence let's convert these debug + messages to trace/info messages. Examples are: :: + + [ 1356.423958816,7] opal_ipmi_recv(cmd: 0xf0 netfn: 0x3b resp_size: 0x02) + [ 1356.430774496,7] opal_ipmi_send(cmd: 0xf0 netfn: 0x3a len: 0x3b) + [ 1356.430797392,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: Message sent to host + [ 1356.431668496,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: IPMI MSG done +- libflash/file: Handle short read()s and write()s correctly + + Currently we don't move the buffer along for a short read() or write(), + nor do we request only the remaining amount. + +- hw/p8-i2c: Rework timeout handling + + Currently we treat a timeout as a hard failure and will automatically + fail any transactions that hit their timeout. This results in + unnecessarily failing I2C requests if interrupts are dropped, etc. + Although these are bad things that we should log, we can handle them + better by checking the actual hardware status and completing the + transaction if there are no real errors. This patch reworks the timeout + handling to check the status and continue the transaction + if it can, while logging an error if it detects a timeout due to a + dropped interrupt. +- core/flash: Only expect ELF header for BOOTKERNEL partition flash resource + + When loading a flash resource which isn't signed (secure and trusted + boot) and which doesn't have a subpartition, we assume it's the + BOOTKERNEL since previously this was the only such resource.
Thus we + also assumed it had an ELF header which we parsed to get the size of the + partition rather than trusting the actual_size field in the FFS header. + A previous commit (9727fe3 DT: Add ibm,firmware-versions node) added the + version resource which isn't signed and also doesn't have a subpartition, + thus we expect it to have an ELF header. It doesn't, so we print the + error message "FLASH: Invalid ELF header part VERSION". + + It is a fluke that this works currently since we load the secure boot + header unconditionally and this happens to be the same size as the + version partition. We also don't update the return code on error so + happen to return OPAL_SUCCESS. + + To make this explicitly correct, only check for an ELF header if we are + loading the BOOTKERNEL resource, otherwise use the partition size from + the FFS header. Also set the return code on error so we don't + erroneously return OPAL_SUCCESS. Add a check that the resource will fit + in the supplied buffer to prevent buffer overrun. +- flash: Support adding the no-erase property to flash + + The mbox protocol explicitly states that an erase is not required + before a write. This means that issuing an erase from userspace, + through the mtd device, and back returns a successful operation + that does nothing. Unfortunately, this makes userspace tools unhappy. + Linux MTD devices support the MTD_NO_ERASE flag which conveys that + writes do not require erases on the underlying flash devices. We + should set this property on all of our + devices which do not require erases to be performed. + + NOTE: This still requires a linux kernel component to set the + MTD_NO_ERASE flag from the device tree property. + +Utilities +--------- +- external/gard: Clear entire guard partition instead of entry by entry + + When using the current implementation of the gard tool to ecc clear the + entire GUARD partition, it is done one gard record at a time.
While this + may be ok when accessing the actual flash, this is very slow when done + from the host over the mbox protocol (on the order of 4 minutes) because + the bmc side is required to do many reads, erases and writes under the hood. + + Fix this by rewriting the gard tool reset_partition() function. Now we + allocate all the erased guard entries and (if required) apply ecc to the + entire buffer. Then we can do one big erase and write of the entire + partition. This reduces the time to clear the guard partition to on the + order of 4 seconds. +- opal-prd: Fix opal-prd command line options + + HBRT OCC reset interface depends on service processor type. + + - FSP: reset_pm_complex() + - BMC: process_occ_reset() + + We have both ``occ`` and ``pm-complex`` command line interfaces. + This patch adds support to display an appropriate message depending + on system type. + + === ==================== ============================ + SP Command Action + === ==================== ============================ + FSP opal-prd occ display error message + FSP opal-prd pm-complex Call pm_complex_reset() + BMC opal-prd occ Call process_occ_reset() + BMC opal-prd pm-complex display error message + === ==================== ============================ + +- opal-prd: detect service processor type and + then make appropriate occ reset call. +- pflash: Fix erase command for unaligned start address + + The erase_range() function handles erasing the flash for a given start + address and length, and can handle an unaligned start address and + length. However in the unaligned start address case we are incorrectly + calculating the remaining size which can lead to incomplete erases. + + If we're going to update the remaining size based on what the start + address was then we probably want to do that before we override the + original start address. So rearrange the code so that this is indeed the + case.
+- external/gard: Print an error if run on an FSP system + +Simulators +---------- + +- mambo: Add mambo socket program + + This adds a program that can be run inside a mambo simulator in linux + userspace which enables TCP sockets to be proxied in and out of the + simulator to the host. + + Unlike mambo bogusnet, it requires no linux or skiboot specific + drivers/infrastructure to run. + + Run inside the simulator: + + - to forward host ssh connections to sim ssh server: + ``./mambo-socket-proxy -h 10022 -s 22``, then connect to port 10022 + on your host with ``ssh -p 10022 localhost`` + - to allow http proxy access from inside the sim to local http proxy: + ``./mambo-socket-proxy -b proxy.mynetwork -h 3128 -s 3128`` + + Multiple connections are supported. +- idle: disable stop*_lite POWER9 idle states for Mambo platform + + Mambo prior to Mambo.7.8.21 had a bug where the stop idle instruction + with PSSCR[ESL]=PSSCR[EC]=0 would resume with MSR set as though it had + taken a system reset interrupt. + + Linux currently executes this instruction with MSR already set that + way, so the problem went unnoticed. A proposed patch to Linux changes + that, and causes the idle code to crash. Work around this by disabling + lite stop states for the mambo platform for now.
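As an illustration of the ``wafer-location`` entry in the release notes above, the following is a minimal sketch of how the two fields could be pulled out of the ``ecid`` property. It is not skiboot's implementation. It assumes that the four ``ecid`` cells form one 128-bit value with the first cell most significant, and that the bit ranges use IBM numbering (bit 0 is the most significant bit). The function name ``wafer_location`` is hypothetical.

```python
def wafer_location(ecid_words):
    """Return (x, y) chip location from four 32-bit ECID cells.

    Assumption: cells are concatenated most-significant first and
    bit ranges use IBM numbering, where bit 0 is the MSB of the
    128-bit value.
    """
    ecid = 0
    for word in ecid_words:
        ecid = (ecid << 32) | (word & 0xFFFFFFFF)

    def field(first, last):
        # Extract bits first..last (inclusive, IBM numbering).
        width = last - first + 1
        shift = 127 - last
        return (ecid >> shift) & ((1 << width) - 1)

    # bits 64:71 are the chip x location, bits 72:79 the chip y location
    return field(64, 71), field(72, 79)


# ECID cells taken from the sample lsprop output in the release notes.
sample_ecid = [0x019A00D4, 0x03100718, 0x852C0000, 0x00FD7911]
x, y = wafer_location(sample_ecid)
print(f"wafer-location {x:08x} {y:08x}")
```

Under these assumptions the sample ECID yields the same two cells as the sample ``lsprop wafer-location`` output, which suggests the bit numbering assumption matches the release notes.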