Message ID | 20200525143849.4964-1-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/vkms: Don't warn hrtimer_forward_now failure. | expand |
On 2020/05/26 2:00, Daniel Vetter wrote: > Forgot to add: I did run this quickly with vkms as secondary output. > Good fireworks show, but there's an entire army of additional splats > after the first one. The WARN_ON you're removing is only the > messenger, the real bug is probably one of the later backtraces. Or at > least points more clearly at the real bug. > -Daniel > >> >> Debug printk() patch shows that hrtimer_forward_now() can return larger than 1. >> What is the reason you are expecting hrtimer_forward_now() to always return 1 ? >> >>> >>> If the vkms driver is loaded, and there's nothing else going on, then what >>> I expect to happen is that this virtual hw is entirely off. And in that >>> case, the vblank ioctl should be rejected outright. So there's definitely >>> something fishy going on to begin with. >>> >>> If otoh the virtual hw is somehow on (maybe fbcon gets loaded, no idea), >>> then the vblank wait shouldn't just blow up like this. I see. Another debug printk() patch ( https://syzkaller.appspot.com/text?tag=Patch&x=135f254a100000 ) found that [ 86.613767][ T7837] [drm:drm_stub_open] [ 86.621592][ T7837] [drm:drm_open] pid = 7837, minor = 1 [ 86.651632][ T7837] [drm:drm_ioctl] pid=7837, dev=0xe201, auth=1, DRM_IOCTL_WAIT_VBLANK [ 86.663391][ T7837] [drm:drm_calc_timestamping_constants] *ERROR* crtc 32: Can't calculate constants, dotclock = 0! [ 86.679151][ T7837] [drm:drm_calc_timestamping_constants] crtc 32: hwmode: htotal 0, vtotal 0, vdisplay 0 [ 86.688865][ T7837] [drm:drm_calc_timestamping_constants] crtc 32: clock 0 kHz framedur 0 linedur 0 [ 86.688899][ T7837] Starting out->vblank_hrtimer : out->period_ns=0 vblank->framedur_ns=0 [ 86.689348][ T7837] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 [ 86.706883][ T7837] [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=0, diff=0, hw=0 hw_last=0 [ 86.706962][ C1] now=86714983667 interval=0 ret_overrun=17693375 hrtimer_start(&out->vblank_hrtimer, 0, HRTIMER_MODE_REL) from vkms_enable_vblank() is causing immediate vkms_vblank_simulate() callback. The reason of vblank->framedur_ns == 0 is that drm_calc_timestamping_constants() fails to calculate framedur_ns (which is 0 by default) due to mode->crtc_clock <= 0. Thus, somehow initializing mode->crtc_clock > 0 might be able to solve this problem. By the way, if you have an environment that can reproduce this problem, you can test with debug printk() patch above applied.
On Tue, May 26, 2020 at 6:39 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/05/26 13:18, Tetsuo Handa wrote: > > due to mode->crtc_clock <= 0. Thus, somehow initializing mode->crtc_clock > 0 might be able > > to solve this problem. > > Well, I came to think that vkms_enable_vblank() should return an error to the caller > when drm_calc_timestamping_constants() failed... If my memory is right we shouldn't even get there. crtc->mode being all zeros sounds like the simulated output isn't on (no surprise, syzbot doesn't enable it and fbcon is probably on card0), so something higher up should reject this. I'll see whether I can figure out what vkms isn't doing right (or whether there's a higher level bug in drm_vblank.c code), yesterday my machine died and already evening anyway. Thanks for digging into this a bit more meanwhile. Cheers, Daniel > > static int vkms_enable_vblank(struct drm_crtc *crtc) > { > struct drm_device *dev = crtc->dev; > unsigned int pipe = drm_crtc_index(crtc); > struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > struct vkms_output *out = drm_crtc_to_vkms_output(crtc); > > drm_calc_timestamping_constants(crtc, &crtc->mode); > > hrtimer_init(&out->vblank_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); > out->vblank_hrtimer.function = &vkms_vblank_simulate; > out->period_ns = ktime_set(0, vblank->framedur_ns); > hrtimer_start(&out->vblank_hrtimer, out->period_ns, HRTIMER_MODE_REL); > > return 0; > }
On Tue, May 26, 2020 at 01:11:28PM +0200, Daniel Vetter wrote: > On Tue, May 26, 2020 at 6:39 AM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > On 2020/05/26 13:18, Tetsuo Handa wrote: > > > due to mode->crtc_clock <= 0. Thus, somehow initializing mode->crtc_clock > 0 might be able > > > to solve this problem. > > > > Well, I came to think that vkms_enable_vblank() should return an error to the caller > > when drm_calc_timestamping_constants() failed... > > If my memory is right we shouldn't even get there. crtc->mode being > all zeros sounds like the simulated output isn't on (no surprise, > syzbot doesn't enable it and fbcon is probably on card0), so something > higher up should reject this. I'll see whether I can figure out what > vkms isn't doing right (or whether there's a higher level bug in > drm_vblank.c code), yesterday my machine died and already evening > anyway. > > Thanks for digging into this a bit more meanwhile. Ok good news, I think I have a fix. Bad news it's code used by like 50 drivers, so this will take some time to get reviewed and merged (and I think almost all these other drivers are buggy too, which is why I decided to fix the shared code). Patch below, can you pls confirm this fixes it? Thanks, Daniel commit b882894ac0fce412b67db79a5c92f2b599ec5069 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue May 26 14:29:00 2020 +0200 drm/atomic-helper: reset vblank on crtc reset Only when vblanks are supported ofc. Some drivers do this already, but most unfortunately missed it. This opens up bugs after driver load, before the crtc is enabled for the first time. syzbot spotted this when loading vkms as a secondary output. Given how many drivers are buggy it's best to solve this once and for all in shared helper code. Aside from moving the few existing calls to drm_crtc_vblank_reset into helpers (i915 doesn't use helpers, so keeps its own) I think the regression risk is minimal: atomic helpers already rely on drivers calling drm_crtc_vblank_on/off correctly in their hooks when they support vblanks. And driver that's failing to handle vblanks after this is missing those calls already, and vblanks could only work by accident when enabling a CRTC for the first time right after boot. FIXME: Fix up drivers and audit them all. Big thanks to Tetsuo for helping track down what's going wrong here. Link: https://syzkaller.appspot.com/bug?id=0ba17d70d062b2595e1f061231474800f076c7cb Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+0871b14ca2e2fb64f6e3@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c b/drivers/gpu/drm/drm_atomic_state_helper.c index 8fce6a115dfe..843964252239 100644 --- a/drivers/gpu/drm/drm_atomic_state_helper.c +++ b/drivers/gpu/drm/drm_atomic_state_helper.c @@ -32,6 +32,7 @@ #include <drm/drm_device.h> #include <drm/drm_plane.h> #include <drm/drm_print.h> +#include <drm/drm_vblank.h> #include <drm/drm_writeback.h> #include <linux/slab.h> @@ -93,6 +94,9 @@ __drm_atomic_helper_crtc_reset(struct drm_crtc *crtc, if (crtc_state) __drm_atomic_helper_crtc_state_reset(crtc_state, crtc); + if (crtc->dev->num_crtcs) + drm_crtc_vblank_reset(crtc); + crtc->state = crtc_state; } EXPORT_SYMBOL(__drm_atomic_helper_crtc_reset);
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index ac85e17428f8..cc1811ce6092 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -13,12 +13,9 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) vblank_hrtimer); struct drm_crtc *crtc = &output->crtc; struct vkms_crtc_state *state; - u64 ret_overrun; bool ret; - ret_overrun = hrtimer_forward_now(&output->vblank_hrtimer, - output->period_ns); - WARN_ON(ret_overrun != 1); + hrtimer_forward_now(&output->vblank_hrtimer, output->period_ns); spin_lock(&output->lock); ret = drm_crtc_handle_vblank(crtc);
Commit 3a0709928b172a41 ("drm/vkms: Add vblank events simulated by hrtimers") introduced ret_overrun variable. And that variable was an unused-but-set-variable until commit 09ef09b4ab95dc40 ("drm/vkms: WARN when hrtimer_forward_now fails") added WARN_ON(ret_overrun != 1). Now, syzbot is hitting this WARN_ON() using a simple reproducer that does open("/dev/dri/card1") followed by ioctl(DRM_IOCTL_WAIT_VBLANK), and a debug printk() patch says that syzbot is getting output->vblank_hrtimer.base->get_time()=93531904774 (which is uptime) output->period_ns=0 ret_overrun=216994 . I can't understand what "verify the hrtimer_forward_now return" in that commit wants to say. hrtimer_forward_now() must return, and the return value of hrtimer_forward_now() is not a boolean. Why comparing with 1 ? Anyway, this failure is not something that worth crashing the system. Let's remove the ret_overrun variable and WARN_ON() test. Link: https://syzkaller.appspot.com/bug?id=0ba17d70d062b2595e1f061231474800f076c7cb Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+0871b14ca2e2fb64f6e3@syzkaller.appspotmail.com Fixes: 09ef09b4ab95dc40 ("drm/vkms: WARN when hrtimer_forward_now fails") --- drivers/gpu/drm/vkms/vkms_crtc.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)