From patchwork Fri Nov 13 15:12:04 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 7620521 Return-Path: X-Original-To: patchwork-dri-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 679239F392 for ; Mon, 16 Nov 2015 02:32:56 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 2092620592 for ; Mon, 16 Nov 2015 02:32:55 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 3071120588 for ; Mon, 16 Nov 2015 02:32:53 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2767F6E784; Sun, 15 Nov 2015 18:31:37 -0800 (PST) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from mail-io0-f180.google.com (mail-io0-f180.google.com [209.85.223.180]) by gabe.freedesktop.org (Postfix) with ESMTPS id 478FC6F07E for ; Fri, 13 Nov 2015 07:12:07 -0800 (PST) Received: by ioir85 with SMTP id r85so59310792ioi.1 for ; Fri, 13 Nov 2015 07:12:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel_dk.20150623.gappssmtp.com; s=20150623; h=date:from:to:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:fcc:user-agent; bh=QN5KRQlgeoSCLOH8q9fGjLOmKJu4mUxnt0Un67WAMRY=; b=fjql8l+GZd/Y/qYao4lMvCzQgTSA/HlYm5d4Br29halhC+W1e8vHYXTX5tB6bt+jXF O66/EZvibhtLzoMFoMfFY+FJoPveZCn2SL4Yjci2p5p2r1NUq6BEL+mbjNJU0cANij1v mdcJFT9VozrohXQieuHD9aZutIu3rLyMZ87ochfW4CF2Mx5g81ZixF6mBXqb+9al/1H2 rtiilMV/G0+4tRwLRTvH964oqu5sJRHSdUBt5K+cFMIfW8JlonDUiKx5Qs3vTUilprCT Jf8hq26HMvnQBuFNtitiE2RH3ic5bxzdKVfJuwWJJCwNdi3d3ODs2YDSv/lW6Z/GIn3s C1Dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to:fcc :user-agent; bh=QN5KRQlgeoSCLOH8q9fGjLOmKJu4mUxnt0Un67WAMRY=; b=Rv06B67KbD1ABNbq+1WicE2AXdcP+v1pS3Day6RupfGFW9Q25DU2ml6GWnmq/cQZnU UHJumB0ugRRn6RFdI7aWBADItBwMSGXzO9czlEsNiNGpkseV7sp0M/itle/6Ow94vurk ZUyA5lgPdM63ry+WOwI/qO1H+QiUQFWrpkadilPXI/lhNuBf8fyGVucFODmmxAR3Nwln uUEZg5ySIlyvDt55C/SzwVllhLZ4d8Z3eTI56qdCgp+Q0PP3dOjT8SHbTmwRMjCHURtS 8q4AUb+16v/NifKYdXW3ggGfEUQrxpB0G/qRc3zWct/B+ed6vvGZVVndAMR+1vjLdGmf LAww== X-Gm-Message-State: ALoCoQmTJg4h0a82rNX21wYcFaVJp0HRA7MeKCJybXg3QpVm+NNI9Pqk1Z75KtfXHjcoHl/AhYBl X-Received: by 10.107.3.42 with SMTP id 42mr20771179iod.83.1447427527215; Fri, 13 Nov 2015 07:12:07 -0800 (PST) Received: from localhost ([216.160.245.98]) by smtp.gmail.com with ESMTPSA id p4sm1473910igj.8.2015.11.13.07.12.05 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 13 Nov 2015 07:12:05 -0800 (PST) Date: Fri, 13 Nov 2015 08:12:04 -0700 From: Jens Axboe To: Chris Wilson , Daniel Vetter , DRI Development , LKML Subject: Re: __i915_spin_request() sucks Message-ID: <20151113151204.GA8939@kernel.dk> References: <5644F850.2060803@kernel.dk> <5644F941.9090505@kernel.dk> <20151112221908.GA26194@nuc-i3427.alporthouse.com> <56451812.2050704@kernel.dk> <20151113091558.GN6247@nuc-i3427.alporthouse.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20151113091558.GN6247@nuc-i3427.alporthouse.com> FCC: imap://axboe%40kernel.dk@imap.gmail.com/[Gmail]/Sent Mail X-Identity-Key: id1 X-Account-Key: account1 X-Mozilla-Draft-Info: internal/draft; vcard=0; receipt=0; DSN=0; uuencode=0; attachmentreminder=0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 X-Mailman-Approved-At: Sun, 15 Nov 2015 18:31:21 -0800 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 11/13/2015 02:15 AM, Chris Wilson wrote: > On Thu, Nov 12, 2015 at 03:52:02PM -0700, Jens Axboe wrote: >> On 11/12/2015 03:19 PM, Chris Wilson wrote: >>>>> So today, I figured I'd try just killing that spin. If it fails, we'll >>>>> punt to normal completions, so easy change. And wow, MASSIVE difference. >>>>> I can now scroll in chrome and not rage! It's like the laptop is 10x >>>>> faster now. >>>>> >>>>> Ran git blame, and found: >>>>> >>>>> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 >>>>> Author: Chris Wilson >>>>> Date: Tue Apr 7 16:20:41 2015 +0100 >>>>> >>>>> drm/i915: Optimistically spin for the request completion >>>>> >>>>> and read the commit message. Doesn't sound that impressive. Especially >>>>> not for something that screws up interactive performance by a LOT. >>>>> >>>>> What's the deal? Revert? >>> >>> The tests that it improved the most were the latency sensitive tests and >>> since my Broadwell xps13 behaves itself, I'd like to understand how it >>> culminates in an interactivity loss. >>> >>> 1. Maybe it is the uninterruptible nature of the polling, making X's >>> SIGIO jerky: >> >> This one still feels bad. >> >>> 2. Or maybe it is increased mutex contention: >> >> And so does this one... I had to manually apply hunks 2-3, and after >> doing seat-of-the-pants testing for both variants, I confirmed with >> perf that we're still seeing a ton of time in __i915_wait_request() >> for both of them. >> >>> Or maybe it is an indirect effect, such as power balancing between the >>> CPU and GPU, or just thermal throttling, or it may be the task being >>> penalised for consuming its timeslice (for which any completion polling >>> seems susceptible). >> >> Look, polls in the 1-10ms range are just insane. Either you botched >> the commit message and really meant "~1ms at most" and in which case >> I'd suspect you of smoking something good, or you hacked it up wrong >> and used jiffies when you really wanted to be using some other time >> check that really did give you 1us. > > What other time check? I was under the impression of setting up a > hrtimer was expensive and jiffies was available. Looping for 10ms is a lot more expensive :-). jiffies is always there, but it's WAY too coarse to be used for something like this. You could use ktime_get(), there's a lot of helpers for checking time_after, adding msecs, etc. Something like the below, not tested here yet. >> I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks. >> >>>> "Limit the spinning to a single jiffie (~1us) at most" >>>> >>>> is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms! >>>> Even if I had HZ=1000, that'd still be 1ms of spinning. That's >>>> seriously screwed up, guys. >>> >>> That's over and above the termination condition for blk_poll(). >> >> ?! And this is related, how? Comparing apples and oranges. One is a >> test opt-in feature for experimentation, the other is >> unconditionally enabled for everyone. I believe the commit even says >> so. See the difference? Would I use busy loop spinning waiting for >> rotating storage completions, which are in the 1-10ms range? No, >> with the reason being that the potential wins for spins are in the >> usec range. > > Equally I expect the service interval for a batch to be around 2-20us. > There are many workloads that execute 30-50k requests/s, and you can > appreciate that they are sensitive to the latency in setting up an irq > and receiving it - just as equally leaving that irq enabled saturates a > CPU with simply executing the irq handler. So what mechanism do you use > to guard against either a very long queue depth or an abnormal request > causing msec+ spins? Not disputing that polling can work, but it needs to be a bit more clever. Do you know which requests are fast and which ones are not? Could you track it? Should we make this a module option? 20usec is too long to poll. If we look at the wins of polling, we're talking anywhere from 1-2 usec to maybe 5 usec, depending on different factors. So spinning between 1-3 usec should be a hard limit on most platforms. And it's somewhat of a policy decision, since it does involve throwing CPU at the problem. There's a crossover point where below it's always a win, but that needs a lot more work than just optimistic spinning for everything. You also do need a check for whether the task has been woken up, that's also missing. As for interrupt mitigation, I'd consider that a separate problem. It's a lot simpler than the app induced polling that i915 is doing here. So if overloading a core with IRQs is an issue, I'd solve that differently similarly to NAPI or blk-iopoll (not to be confused with blk_poll() that you referenced and is app induced polling). diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 5cf4a1998273..658514e899b1 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1148,17 +1148,19 @@ static bool missed_irq(struct drm_i915_private *dev_priv, static int __i915_spin_request(struct drm_i915_gem_request *req) { - unsigned long timeout; + ktime_t start, end; if (i915_gem_request_get_ring(req)->irq_refcount) return -EBUSY; - timeout = jiffies + 1; + start = ktime_get(); + end.tv64 = start.tv64; + ktime_add_us(end, 1); while (!need_resched()) { if (i915_gem_request_completed(req, true)) return 0; - if (time_after_eq(jiffies, timeout)) + if (ktime_after(start, end)) break; cpu_relax_lowlatency();