Message ID | 20190809145833.1020-1-jgross@suse.com (mailing list archive) |
---|---|
Series | xen: add core scheduling support |

Hi Juergen,

The latest round of testing revealed the following 3 Xen crashes:

1. vcpu_sleep_sync() <-- vlapic_init_sipi_action()
   This was seen multiple times. It tends to happen on large Windows Server
   VMs (>= 12 vCPUs).
   https://paste.debian.net/1095844/

2. vcpu_sleep_sync() <-- vcpu_set_periodic_timer() <-- pv_hypercall()
   This was seen on Sles 11.4 PV VM with 32 vCPUs and 6GB RAM
   https://paste.debian.net/1095843/

3. ACPI S5: schedule.c#sched_slave()
   This one was during shutdown again.
   https://paste.debian.net/1095845/

There also were a number of Dom0 soft-lockups (together with RCU stalls)
but unfortunately our crashdump tool was a bit broken so full Dom0 kernel
logs were not captured this time.

And CentOS 5.11 suspend regression (as reported previously) is still
present.

Thanks,
Sergey

Hi Sergey,

On 15.08.19 12:17, Sergey Dyasli wrote:
> Hi Juergen,
>
> The latest round of testing revealed the following 3 Xen crashes:
>
> 1. vcpu_sleep_sync() <-- vlapic_init_sipi_action()
>    This was seen multiple times. It tends to happen on large Windows Server
>    VMs (>= 12 vCPUs).
>    https://paste.debian.net/1095844/
>
> 2. vcpu_sleep_sync() <-- vcpu_set_periodic_timer() <-- pv_hypercall()
>    This was seen on Sles 11.4 PV VM with 32 vCPUs and 6GB RAM
>    https://paste.debian.net/1095843/
>
> 3. ACPI S5: schedule.c#sched_slave()
>    This one was during shutdown again.
>    https://paste.debian.net/1095845/
>
> There also were a number of Dom0 soft-lockups (together with RCU stalls)
> but unfortunately our crashdump tool was a bit broken so full Dom0 kernel
> logs were not captured this time.
>
> And CentOS 5.11 suspend regression (as reported previously) is still
> present.

I have found one bug in the v->is_running setting, which might be
related to your observed hangs, but I couldn't verify that.

I have rebased my series with the is_running fix to current staging.
You can find that under

  git://github.com/jgross1/xen/ sched-v2b-2019-09-05

This branch includes the needed prerequisite series ("xen/sched: use new
idle scheduler for free cpus"), and some additional debug enhancements
I'd like you to use:

- my series for debugtrace enhancements
- my series for lock-profiling enhancements
- and as the last patch some debugtrace hooks in the scheduler

Please make sure to have CONFIG_DEBUG_TRACE and CONFIG_DEBUG_LOCK_PROFILE
set when building the hypervisor.

When testing please add:

  async-show-all=on watchdog=on debugtrace=cpu:16

to the hypervisor boot parameters. This should add a lot of additional
output in case of a crash, which will (hopefully) help to find the
reason(s) for hangs and crashes.

Juergen
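
The test setup requested above (two build options plus three boot parameters) is easy to get partly wrong, so a small pre-flight check may help before starting a long test run. The following is only a sketch and not part of the series: the helper names are hypothetical, and it assumes the build configuration is a Kconfig-style text file with `NAME=y` lines and that the hypervisor command line is available as a single whitespace-separated string.

```python
# Hypothetical pre-flight check for the debug setup requested in this mail.
# Option and parameter names are taken from the mail; everything else
# (function names, input formats) is an assumption made for illustration.

REQUIRED_CONFIG = ("CONFIG_DEBUG_TRACE", "CONFIG_DEBUG_LOCK_PROFILE")
REQUIRED_BOOT_PARAMS = ("async-show-all=on", "watchdog=on", "debugtrace=cpu:16")


def missing_config_options(config_text):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value.strip() == "y":
            enabled.add(name.strip())
    return [opt for opt in REQUIRED_CONFIG if opt not in enabled]


def missing_boot_params(cmdline):
    """Return the required boot parameters absent from the command line."""
    present = set(cmdline.split())
    return [param for param in REQUIRED_BOOT_PARAMS if param not in present]


if __name__ == "__main__":
    sample_config = "CONFIG_DEBUG_TRACE=y\n# CONFIG_DEBUG_LOCK_PROFILE is not set\n"
    sample_cmdline = "watchdog=on debugtrace=cpu:16"
    print("missing config options:", missing_config_options(sample_config))
    print("missing boot parameters:", missing_boot_params(sample_cmdline))
```

Both helpers do exact string matching only, so a parameter spelled differently (for example a different debugtrace value) is reported as missing rather than silently accepted.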