From patchwork Thu May 29 14:34:59 2014
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 4265851
Date: Thu, 29 May 2014 23:34:59 +0900
From: Tetsuo Handa
To: konrad.wilk@oracle.com
Cc: linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 glommer@openvz.org, linux-mm@kvack.org, mgorman@suse.de, dchinner@redhat.com
Subject: Re: [PATCH] gpu/drm/ttm: Use mutex_lock_killable() for shrinker functions.
Message-Id: <201405292334.EAG00503.FLOOJFStHVQMFO@I-love.SAKURA.ne.jp>
In-Reply-To: <201405290647.DHI69200.HSFVFMFOJOLOQt@I-love.SAKURA.ne.jp>
References: <201405210030.HBD65663.FFLVHOFMSJOtOQ@I-love.SAKURA.ne.jp>
 <201405242322.AID86423.HOMLQJOtFFVOSF@I-love.SAKURA.ne.jp>
 <20140528185445.GA23122@phenom.dumpdata.com>
 <201405290647.DHI69200.HSFVFMFOJOLOQt@I-love.SAKURA.ne.jp>

Tetsuo Handa wrote:
> Konrad Rzeszutek Wilk wrote:
> > On Sat, May 24, 2014 at 11:22:09PM +0900, Tetsuo Handa wrote:
> > > Hello.
> > >
> > > I tried to test whether it is OK (from the point of view of reentrancy) to use
> > > mutex_lock() or mutex_lock_killable() inside shrinker functions when those
> > > shrinker functions do memory allocation, because
> > > drivers/gpu/drm/ttm/ttm_page_alloc_dma.c does memory allocation with a mutex
> > > held inside ttm_dma_pool_shrink_scan().
> > >
> > > If I compile a test module shown below which mimics an extreme case of what
> > > ttm_dma_pool_shrink_scan() will do
> >
> > And ttm_pool_shrink_scan.
>
> I don't know why, but ttm_pool_shrink_scan() does not take the mutex.
>

Well, it seems to me that ttm_pool_shrink_scan() not taking the mutex is a bug
which could lead to a stack overflow if the kmalloc() in ttm_page_pool_free()
triggered recursion:

  shrink_slab()
  => ttm_pool_shrink_scan()
     => ttm_page_pool_free()
        => kmalloc(GFP_KERNEL)
           => shrink_slab()
              => ttm_pool_shrink_scan()
                 => ttm_page_pool_free()
                    => kmalloc(GFP_KERNEL)
                       => ...

Maybe shrink_slab() should be updated not to call the same shrinker in
parallel?

Also, it seems to me that ttm_dma_pool_shrink_scan() has a potential
division-by-0 bug, as described below. Is this patch correct?

----------
>From 4a65744a300e14e5e202c5f13ba2759e1e797d29 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa
Date: Thu, 29 May 2014 18:25:42 +0900
Subject: [PATCH] gpu/drm/ttm: Use mutex_trylock() for shrinker functions.

I can observe that a RHEL7 environment stalls with 100% CPU usage when a
certain type of memory pressure is applied. While the shrinker functions
are called by shrink_slab() before the OOM killer is triggered, the stall
lasts for many minutes.

One of the reasons for this stall is that ttm_dma_pool_shrink_count() /
ttm_dma_pool_shrink_scan() are called and block on
mutex_lock(&_manager->lock). A GFP_KERNEL allocation with _manager->lock
held causes a caller (including kswapd) to deadlock when these functions
are called due to memory pressure. This patch changes "mutex_lock();" to
"if (!mutex_trylock()) return ...;" in order to avoid the deadlock.

At the same time, this patch fixes a potential division by 0 caused by
unconditionally doing "% _manager->npools": list_empty(&_manager->pools)
being false does not guarantee that _manager->npools != 0 after taking
_manager->lock, because _manager->npools is updated under _manager->lock.

This patch also moves the update of the start_pool variable in order to
avoid skipping a pool when choosing which pool to shrink in round-robin
style. start_pool is changed from "atomic_t" to "unsigned int" because it
is now updated under _manager->lock.
Signed-off-by: Tetsuo Handa
Cc: stable [3.3+]
---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index fb8259f..5e332b4 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -1004,9 +1004,9 @@ EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
 static unsigned long
 ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
-	static atomic_t start_pool = ATOMIC_INIT(0);
+	static unsigned int start_pool;
 	unsigned idx = 0;
-	unsigned pool_offset = atomic_add_return(1, &start_pool);
+	unsigned pool_offset;
 	unsigned shrink_pages = sc->nr_to_scan;
 	struct device_pools *p;
 	unsigned long freed = 0;
@@ -1014,8 +1014,11 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	if (list_empty(&_manager->pools))
 		return SHRINK_STOP;
 
-	mutex_lock(&_manager->lock);
-	pool_offset = pool_offset % _manager->npools;
+	if (!mutex_trylock(&_manager->lock))
+		return SHRINK_STOP;
+	if (!_manager->npools)
+		goto out;
+	pool_offset = ++start_pool % _manager->npools;
 	list_for_each_entry(p, &_manager->pools, pools) {
 		unsigned nr_free;
 
@@ -1034,6 +1037,7 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			 p->pool->dev_name, p->pool->name, current->pid,
 			 nr_free, shrink_pages);
 	}
+out:
 	mutex_unlock(&_manager->lock);
 	return freed;
 }
@@ -1044,7 +1048,8 @@ ttm_dma_pool_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 	struct device_pools *p;
 	unsigned long count = 0;
 
-	mutex_lock(&_manager->lock);
+	if (!mutex_trylock(&_manager->lock))
+		return 0;
 	list_for_each_entry(p, &_manager->pools, pools)
 		count += p->pool->npages_free;
 	mutex_unlock(&_manager->lock);
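
By the way, in case it helps review, below is a minimal, untested sketch of
the same trylock pattern as a standalone module, outside of TTM. The demo_*
names and structures are invented for illustration only; the mutex and
shrinker calls (mutex_trylock(), register_shrinker(), SHRINK_STOP,
DEFAULT_SEEKS) are the real kernel APIs of the 3.12+ shrinker interface.

----------
/*
 * Untested sketch (not TTM code): a shrinker whose callbacks use only
 * mutex_trylock(), so a GFP_KERNEL allocation performed elsewhere while
 * demo_lock is held cannot recurse through shrink_slab() back into these
 * callbacks and deadlock.  Pool creation/destruction, which would update
 * demo_pools/demo_npools under demo_lock, is omitted.
 */
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/shrinker.h>

struct demo_pool {			/* made-up stand-in for device_pools */
	struct list_head list;
	unsigned long nr_free;
};

static DEFINE_MUTEX(demo_lock);
static LIST_HEAD(demo_pools);
static unsigned int demo_npools;	/* updated only under demo_lock */

static unsigned long demo_shrink_count(struct shrinker *shrink,
				       struct shrink_control *sc)
{
	struct demo_pool *p;
	unsigned long count = 0;

	/* Somebody else holds the lock: report nothing rather than block. */
	if (!mutex_trylock(&demo_lock))
		return 0;
	list_for_each_entry(p, &demo_pools, list)
		count += p->nr_free;
	mutex_unlock(&demo_lock);
	return count;
}

static unsigned long demo_shrink_scan(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	static unsigned int start_pool;	/* round-robin cursor, under demo_lock */
	struct demo_pool *p;
	unsigned long freed = 0;
	unsigned int pool_offset;

	if (!mutex_trylock(&demo_lock))
		return SHRINK_STOP;
	/* Re-check under the lock: the "%" below requires npools != 0. */
	if (!demo_npools) {
		mutex_unlock(&demo_lock);
		return SHRINK_STOP;
	}
	pool_offset = ++start_pool % demo_npools;
	list_for_each_entry(p, &demo_pools, list) {
		unsigned long nr;

		/* Skip the pools before this call's starting offset. */
		if (pool_offset) {
			pool_offset--;
			continue;
		}
		/* Stand-in for the driver's real page freeing. */
		nr = min(p->nr_free, sc->nr_to_scan);
		p->nr_free -= nr;
		freed += nr;
		break;
	}
	mutex_unlock(&demo_lock);
	return freed;
}

static struct shrinker demo_shrinker = {
	.count_objects	= demo_shrink_count,
	.scan_objects	= demo_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

static int __init demo_init(void)
{
	return register_shrinker(&demo_shrinker);
}

static void __exit demo_exit(void)
{
	unregister_shrinker(&demo_shrinker);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
----------

The important part is that neither callback ever waits on demo_lock; if the
lock is contended (possibly by the very thread that is allocating memory),
count reports 0 and scan returns SHRINK_STOP, so shrink_slab() simply moves
on to the next shrinker instead of deadlocking or recursing.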