Message ID | 5761463502000078000F5355@prv-mh.provo.novell.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Wednesday, June 15, 2016, 12:12:37 PM, you wrote: >>>> On 15.06.16 at 11:38, <linux@eikelenboom.it> wrote: >> Wednesday, June 15, 2016, 10:57:03 AM, you wrote: >> >>> Wednesday, June 15, 2016, 10:29:37 AM, you wrote: >> >>>>>>> On 15.06.16 at 01:49, <linux@eikelenboom.it> wrote: >>>>> Just tested latest xen-unstable 4.8 (xen_changeset git:d337764), >>>>> but one of the latest commits seems to have broken boot of HVM guests >>>>> (using qemu-xen) previous build with xen_changeset git:6e908ee worked >>>>> fine. >> >>>> Primary suspects would seem to be 67fc274bbe and bfa84968b2, >>>> but (obviously) I didn't see any issues with them in my own >>>> testing, so could you >>>> - instead of doing a full bisect, revert just those two >> >>> Will give reverting that a shot. >> >> Reverting bfa84968b2 is sufficient. > Could you give this wild guess a try on top of the tree without the > revert? > --- unstable.orig/xen/arch/x86/hvm/emulate.c > +++ unstable/xen/arch/x86/hvm/emulate.c > @@ -1180,7 +1180,7 @@ static int hvmemul_rep_movs( > pfec |= PFEC_user_mode; > > bytes = PAGE_SIZE - (saddr & ~PAGE_MASK); - if ( vio->>mmio_access.read_access && + if ( vio->>mmio_access.read_access && !vio->mmio_access.write_access && > (vio->mmio_gla == (saddr & PAGE_MASK)) && > bytes >= bytes_per_rep ) > { Unfortunately still crashes. -- Sander >>>> And then of course this domain_crash() could of course be >>>> accompanied by some helpful printk() ... >> >> Do you have a debug patch of what you are interested in ? > Not yet - basically we should log all of the variables involved in the > condition leading to the domain_crash(). > Jan
>>> On 15.06.16 at 14:00, <linux@eikelenboom.it> wrote: > Wednesday, June 15, 2016, 12:12:37 PM, you wrote: >>>>> On 15.06.16 at 11:38, <linux@eikelenboom.it> wrote: >>> Wednesday, June 15, 2016, 10:57:03 AM, you wrote: >>>> Wednesday, June 15, 2016, 10:29:37 AM, you wrote: >>>>>>>> On 15.06.16 at 01:49, <linux@eikelenboom.it> wrote: >>>>>> Just tested latest xen-unstable 4.8 (xen_changeset git:d337764), >>>>>> but one of the latest commits seems to have broken boot of HVM guests >>>>>> (using qemu-xen) previous build with xen_changeset git:6e908ee worked >>>>>> fine. >>> >>>>> Primary suspects would seem to be 67fc274bbe and bfa84968b2, >>>>> but (obviously) I didn't see any issues with them in my own >>>>> testing, so could you >>>>> - instead of doing a full bisect, revert just those two >>> >>>> Will give reverting that a shot. >>> >>> Reverting bfa84968b2 is sufficient. > >> Could you give this wild guess a try on top of the tree without the >> revert? > >> --- unstable.orig/xen/arch/x86/hvm/emulate.c >> +++ unstable/xen/arch/x86/hvm/emulate.c >> @@ -1180,7 +1180,7 @@ static int hvmemul_rep_movs( >> pfec |= PFEC_user_mode; >> >> bytes = PAGE_SIZE - (saddr & ~PAGE_MASK); > - if ( vio->>mmio_access.read_access && > + if ( vio->>mmio_access.read_access && !vio->mmio_access.write_access && >> (vio->mmio_gla == (saddr & PAGE_MASK)) && >> bytes >= bytes_per_rep ) >> { > > Unfortunately still crashes. Thanks for trying. Which basically just leaves the p.count > *reps part in that domain_crash() condition, as that's the only other thing involved in that check which said commit could have an effect on (as far as I can tell at least). Would you be up for another experiment, removing that one line? Other things to try (just to understand the issue) would be to - revert only each half of said commit individually (the two hunks really are independent), - remove just the two latch_linear_to_phys() calls. Apart from that, and just to see whether there are other differences between your guest(s) and mine, could you post a guest config from one that's affected? Jan
Wednesday, June 15, 2016, 2:48:55 PM, you wrote: >>>> On 15.06.16 at 14:00, <linux@eikelenboom.it> wrote: >> Wednesday, June 15, 2016, 12:12:37 PM, you wrote: >>>>>> On 15.06.16 at 11:38, <linux@eikelenboom.it> wrote: >>>> Wednesday, June 15, 2016, 10:57:03 AM, you wrote: >>>>> Wednesday, June 15, 2016, 10:29:37 AM, you wrote: >>>>>>>>> On 15.06.16 at 01:49, <linux@eikelenboom.it> wrote: >>>>>>> Just tested latest xen-unstable 4.8 (xen_changeset git:d337764), >>>>>>> but one of the latest commits seems to have broken boot of HVM guests >>>>>>> (using qemu-xen) previous build with xen_changeset git:6e908ee worked >>>>>>> fine. >>>> >>>>>> Primary suspects would seem to be 67fc274bbe and bfa84968b2, >>>>>> but (obviously) I didn't see any issues with them in my own >>>>>> testing, so could you >>>>>> - instead of doing a full bisect, revert just those two >>>> >>>>> Will give reverting that a shot. >>>> >>>> Reverting bfa84968b2 is sufficient. >> >>> Could you give this wild guess a try on top of the tree without the >>> revert? >> >>> --- unstable.orig/xen/arch/x86/hvm/emulate.c >>> +++ unstable/xen/arch/x86/hvm/emulate.c >>> @@ -1180,7 +1180,7 @@ static int hvmemul_rep_movs( >>> pfec |= PFEC_user_mode; >>> >>> bytes = PAGE_SIZE - (saddr & ~PAGE_MASK); >> - if ( vio->>mmio_access.read_access && >> + if ( vio->>mmio_access.read_access && !vio->mmio_access.write_access && >>> (vio->mmio_gla == (saddr & PAGE_MASK)) && >>> bytes >= bytes_per_rep ) >>> { >> >> Unfortunately still crashes. > Thanks for trying. Which basically just leaves the p.count > *reps > part in that domain_crash() condition, as that's the only other thing > involved in that check which said commit could have an effect on (as > far as I can tell at least). Would you be up for another experiment, > removing that one line? Other things to try (just to understand the > issue) would be to > - revert only each half of said commit individually (the two hunks > really are independent), > - remove just the two latch_linear_to_phys() calls. Will try some of that and let you know. > Apart from that, and just to see whether there are other differences > between your guest(s) and mine, could you post a guest config from > one that's affected? Hope you are not too disappointed it's rather sparse: builder='hvm' device_model_version = 'qemu-xen' device_model_user = 'root' memory = 512 name = 'test_guest' vcpus = 4 cpu_weight = 768 vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, model=e1000' ] disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', 'phy:/dev/xen_vms/test_guest2,hdb,w' ] on_crash = 'preserve' boot='c' vnc=0 serial='pty' Both dom0 and guest run Debian Jessie, as said platform is AMD, running a 4.7-rc3ish kernel. > Jan
>>> On 15.06.16 at 15:58, <linux@eikelenboom.it> wrote: > Wednesday, June 15, 2016, 2:48:55 PM, you wrote: >> Apart from that, and just to see whether there are other differences >> between your guest(s) and mine, could you post a guest config from >> one that's affected? > > Hope you are not too disappointed it's rather sparse: In no way. > builder='hvm' > device_model_version = 'qemu-xen' > device_model_user = 'root' > memory = 512 > name = 'test_guest' > vcpus = 4 > cpu_weight = 768 > vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, > model=e1000' ] > disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', > 'phy:/dev/xen_vms/test_guest2,hdb,w' ] > on_crash = 'preserve' > boot='c' > vnc=0 > serial='pty' I wonder whether mine having stdvga=0 matters. Albeit a quick test passing stdvga=1 works here. And I don't think the vnc= setting should have an effect here. Jan
On 06/15/2016 10:07 AM, Jan Beulich wrote: >>>> On 15.06.16 at 15:58, <linux@eikelenboom.it> wrote: >> Wednesday, June 15, 2016, 2:48:55 PM, you wrote: >>> Apart from that, and just to see whether there are other differences >>> between your guest(s) and mine, could you post a guest config from >>> one that's affected? >> Hope you are not too disappointed it's rather sparse: > In no way. > >> builder='hvm' >> device_model_version = 'qemu-xen' >> device_model_user = 'root' >> memory = 512 >> name = 'test_guest' >> vcpus = 4 >> cpu_weight = 768 >> vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, >> model=e1000' ] >> disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', >> 'phy:/dev/xen_vms/test_guest2,hdb,w' ] >> on_crash = 'preserve' >> boot='c' >> vnc=0 >> serial='pty' > I wonder whether mine having > > stdvga=0 > > matters. Albeit a quick test passing stdvga=1 works here. And I > don't think the vnc= setting should have an effect here. Our nightly picked up this crash as well on an AMD box (Intel passed). I believe this is due to + if ( *reps * bytes_per_rep > bytes ) + *reps = bytes / bytes_per_rep; in hvmemul_rep_stos() and then, as you pointed out in another message, we fail p.count > *reps comparison. -boris -boris in hvmemul_rep_stos.
>>> On 15.06.16 at 16:20, <boris.ostrovsky@oracle.com> wrote: > On 06/15/2016 10:07 AM, Jan Beulich wrote: >>>>> On 15.06.16 at 15:58, <linux@eikelenboom.it> wrote: >>> Wednesday, June 15, 2016, 2:48:55 PM, you wrote: >>>> Apart from that, and just to see whether there are other differences >>>> between your guest(s) and mine, could you post a guest config from >>>> one that's affected? >>> Hope you are not too disappointed it's rather sparse: >> In no way. >> >>> builder='hvm' >>> device_model_version = 'qemu-xen' >>> device_model_user = 'root' >>> memory = 512 >>> name = 'test_guest' >>> vcpus = 4 >>> cpu_weight = 768 >>> vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, >>> model=e1000' ] >>> disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', >>> 'phy:/dev/xen_vms/test_guest2,hdb,w' ] >>> on_crash = 'preserve' >>> boot='c' >>> vnc=0 >>> serial='pty' >> I wonder whether mine having >> >> stdvga=0 >> >> matters. Albeit a quick test passing stdvga=1 works here. And I >> don't think the vnc= setting should have an effect here. > > Our nightly picked up this crash as well on an AMD box (Intel passed). > > I believe this is due to > > + if ( *reps * bytes_per_rep > bytes ) > + *reps = bytes / bytes_per_rep; > > in hvmemul_rep_stos() and then, as you pointed out in another message, > we fail p.count > *reps comparison. But the really interesting thing then is - why only AMD? Jan
--- unstable.orig/xen/arch/x86/hvm/emulate.c +++ unstable/xen/arch/x86/hvm/emulate.c @@ -1180,7 +1180,7 @@ static int hvmemul_rep_movs( pfec |= PFEC_user_mode; bytes = PAGE_SIZE - (saddr & ~PAGE_MASK); - if ( vio->mmio_access.read_access && + if ( vio->mmio_access.read_access && !vio->mmio_access.write_access && (vio->mmio_gla == (saddr & PAGE_MASK)) && bytes >= bytes_per_rep ) {