From patchwork Mon Jan 25 16:38:30 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Mario Kleiner X-Patchwork-Id: 8111571 Return-Path: X-Original-To: patchwork-dri-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id DD3C9BEEE5 for ; Mon, 25 Jan 2016 16:38:39 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 9F487202FF for ; Mon, 25 Jan 2016 16:38:38 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 707C4202F8 for ; Mon, 25 Jan 2016 16:38:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A0C906E4C1; Mon, 25 Jan 2016 08:38:36 -0800 (PST) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from mail-wm0-f48.google.com (mail-wm0-f48.google.com [74.125.82.48]) by gabe.freedesktop.org (Postfix) with ESMTPS id 794686E4C1 for ; Mon, 25 Jan 2016 08:38:35 -0800 (PST) Received: by mail-wm0-f48.google.com with SMTP id r129so71591327wmr.0 for ; Mon, 25 Jan 2016 08:38:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type:content-transfer-encoding; bh=5aVZpf3xhHRgmNICv1N05eSUBG9gvhZ7aw7hOWzaL6M=; b=Vz+Cw/WsPHQyWv3qHbhuy2mAvynx0WLousHplFbzNJ9iKbEjxyvX7xyexf61Da0vSZ 1ILOM8I/4wFhJLvNkbSJxtmDsYTxYhbvXMpxEPECv/b/GpXLpB7mnFRm0ZZijlyYrBKl kpbPYRRuvLsQXqM+M5DbRBd1RkCiGy1R/cJG98QjRX34bH6rKcydOJ50vV/o/gqbbFBE xy8nlCDRAEdIJUk1svFMAlkZglL1gHy4sNjzMv998AivdfP7xstseSCQYq7hnhu/LZpK z4q7NpCSUIymiRlsivWCONWrrHap2uIrmGqD4BPrjcENYwn4sq3lB2p3gMOqKsLAzxRY l9vQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-type :content-transfer-encoding; bh=5aVZpf3xhHRgmNICv1N05eSUBG9gvhZ7aw7hOWzaL6M=; b=m+yLR1zLtDD0q2vnyJk4Z2AZSP6uzc9tYrvSGnRwgH+LokGYpa3WmIF+MBuKrUtErG EVEPL6rWpx9KzjZ6U/lOArI+z9hQzfLWAACI9UPC69StdM8gSkqCFNZ8DqipyNjvYArF Am6y/e5no7pK83WWjctSKI82JIiDGZyqln+pE1qX2bv5FLbNZjg677biziY/ZCsqnYbm +TwqtUY4EHnUn7GzaFL6q5pG6K/0+8/weS8ecMoDnbPBgp/o2VYDQnWoeUB0hBrTXHuP H7GhbLLnNPtqt4VzmTKp40kZRuvavutSzI8b5CDwjWMSvKN7ogtS5oJmjbu+VAhZhl77 ieeg== X-Gm-Message-State: AG10YOTHFXRXo5DvTfHRldaKZp8Y5LL9c0oi9cPYMPhy137Ch7CLf8u8pGG41Ukg7UQWYQ== X-Received: by 10.28.137.213 with SMTP id l204mr20426035wmd.100.1453739914167; Mon, 25 Jan 2016 08:38:34 -0800 (PST) Received: from [172.25.199.14] (cin-11.medizin.uni-tuebingen.de. [134.2.118.242]) by smtp.gmail.com with ESMTPSA id x2sm11164654wjf.13.2016.01.25.08.38.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Jan 2016 08:38:33 -0800 (PST) Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon To: =?UTF-8?B?VmlsbGUgU3lyasOkbMOk?= References: <56A07D97.6030606@daenzer.net> <20160121075849.GH19130@phenom.ffwll.local> <56A0989E.30006@daenzer.net> <20160121100905.GL19130@phenom.ffwll.local> <56A19C98.8020208@daenzer.net> <20160122151835.GM23290@intel.com> <56A5A171.7000205@daenzer.net> <56A6203D.3030803@gmail.com> <20160125132310.GS23290@intel.com> <56A626D5.2040808@gmail.com> <20160125145309.GT23290@intel.com> From: Mario Kleiner Message-ID: <56A64F86.9080605@gmail.com> Date: Mon, 25 Jan 2016 17:38:30 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <20160125145309.GT23290@intel.com> Cc: Daniel Vetter , =?UTF-8?Q?Michel_D=c3=a4nzer?= , LKML , dri-devel@lists.freedesktop.org, Alex Deucher , =?UTF-8?Q?Christian_K=c3=b6nig?= , Vlastimil Babka X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: > On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: >> >> >> On 01/25/2016 02:23 PM, Ville Syrjälä wrote: >>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: >>>> >>>> >>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote: >>>>> On 23.01.2016 00:18, Ville Syrjälä wrote: >>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: >>>>>>> >>>>>>> [ Trimming KDE folks from Cc ] >>>>>>> >>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote: >>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote: >>>>>>>>>> >>>>>>>>>> Can you please point me at the vblank on/off jump bug please? >>>>>>>>> >>>>>>>>> AFAIR I originally reported it in response to >>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >>>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC. >>>>>>>>> See >>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >>>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter >>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after >>>>>>>>> just a few days. >>>>>>>> >>>>>>>> Ok, so just uncovered the overflow bug. >>>>>>> >>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off >>>>>>> counter jumping bug (similar to the bug this thread is about), which >>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems >>>>>>> to happen when turning off the CRTC: >>>>>>> >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 >>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] >>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 >>>>>> >>>>>> Not sure what bug we're talking about here, but here the hw counter >>>>>> clearly jumps backwards. >>>>>> >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 >>>>>> >>>>>> Same here. >>>>> >>>>> At least one of the jumps is expected, because this is around turning >>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back >>>>> though. >>>>> >>>>> >>>>>> These things just don't happen on i915 because drm_vblank_off() and >>>>>> drm_vblank_on() are always called around the times when the hw counter >>>>>> might get reset. Or at least that's how it should be. >>>>> >>>>> Which is of course the idea of Daniel's patch (which is what I'm getting >>>>> the above with) or Mario's patch as well, but clearly something's still >>>>> wrong. It's certainly possible that it's something in the driver, but >>>>> since calling drm_vblank_pre/post_modeset from the same places seems to >>>>> work fine (ignoring the regression discussed in this thread)... Do >>>>> drm_vblank_on/off require something else to handle this correctly? >>>>> >>>>> >>>> >>>> I suspect it is because vblank_disable_and_save calls >>>> drm_update_vblank_count() unconditionally, even if vblank irqs are >>>> already off. >>>> >>>> So on a manual display disable -> reenable you get something like >>>> >>>> At disable: >>>> >>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> >>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes >>>> final count. >>>> >>>> Then the crtc is shut down and its hw counter resets to zero. >>>> >>>> At reenable: >>>> >>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> >>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> >>>> drm_vblank_off -> vblank_disable_and_save -> A pointless >>>> drm_update_vblank_count() while the hw counter is already reset to zero >>>> --> Unwanted counter jump. >>>> >>>> >>>> The problem doesn't happen on a pure modeset to a different video >>>> resolution/refresh rate, as then we only have one call into >>>> atombios_crtc_dpms(DPMS_OFF). >>>> >>>> I think the fix is to fix vblank_disable_and_save() to only call >>>> drm_update_vblank_count() if vblank irqs get actually disabled, not on >>>> no-op calls. I will try that now. >>> >>> It does that on purpose. Otherwise the vblank counter would appear to >>> have stalled while the interrupt was off. >>> >> >> Ok, that's what the comments there say, although i don't see atm. why >> that perceived stall would be a big problem. I checked all callers of >> vblank_disable_and_save(). They are all careful to not call that >> function if vblanks are already disabled. The only exception is >> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms >> drivers which have resetting hw counters or other problematic behaviour >> during modesets etc. then this will break. E.g., calling the vblank >> timestamping stuff is also not safe/well-defined during modesets when >> the timestamping constants are not (yet) updated to reflect the new mode >> timing of the modeset in progress. > > The idea is to maintain the appearance that the counter ticks all the > time as long as the crtc is active. While that may not be really > required in case if no one is currently interested in the vblank > counter, I think it's a nice thing to have just to make the behaviour > of the counter consistent. > > As far as calling drm_vblank_off() after the hw counter got reset, well, > that not correct. It should be called before the reset. What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The first call to DMPS_OFF will call drm_vblank_off() and really disable vblank-irqs if they were running, updating the counts/ts a last time. But then the dpms off will reset the hw counter to zero. When one reenables the display, a second call to DPMS_OFF will now call drm_vblank_off again when it apparently shouldn't. I just tested this patch, which fixes the counter jumps on radeon-kms with my or Daniel's drm_vblank_off patches to radeon: /* @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe) return; spin_lock_irqsave(&dev->vbl_lock, irqflags); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled); + /* Drop our private "prevent drm_vblank_get" refcount */ if (vblank->inmodeset) { atomic_dec(&vblank->refcount); Another, maybe better, approach might be to no-op redundant calls to drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to drm_vblank_on() iff !vblank->inmodeset. -mario > >> >> -mario >> >> >>>> >>>> Otherwise kms drivers would have to be careful to never call >>>> drm_vblank_off multiple times before calling drm_vblank_on, but the help >>>> text to drm_vblank_on() claims that unbalanced calls to these functions >>>> are perfectly fine. >>>> >>>> -mario >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 607f493..d739d93 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe) spin_lock_irqsave(&dev->event_lock, irqflags); spin_lock(&dev->vbl_lock); - vblank_disable_and_save(dev, pipe); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled); + + if (vblank->enabled) + vblank_disable_and_save(dev, pipe); wake_up(&vblank->queue);