From patchwork Tue Aug 12 16:27:35 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shawn Bohrer
X-Patchwork-Id: 4713751
From: Shawn Bohrer
To: Roland Dreier
Cc: Christoph Lameter, Sean Hefty, Hal Rosenstock,
 linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
 tomk@rgmadvisors.com, Shawn Bohrer
Subject: [PATCH] ib_umem_release should decrement mm->pinned_vm from ib_umem_get
Date: Tue, 12 Aug 2014 11:27:35 -0500
Message-Id: <1407860855-6564-1-git-send-email-shawn.bohrer@gmail.com>
X-Mailer: git-send-email 1.9.3
X-Mailing-List: linux-rdma@vger.kernel.org

From: Shawn Bohrer

While debugging an application that receives -ENOMEM from ib_reg_mr(), I
found that ib_umem_get() can fail because the pinned_vm count has
wrapped, which makes it always larger than the lock limit even with
RLIMIT_MEMLOCK set to RLIM_INFINITY.

The wrapping of pinned_vm occurs because the process that calls
ib_reg_mr() has its mm->pinned_vm count incremented. Later, a different
process, with a different mm_struct than the one that allocated the
ib_umem struct, ends up releasing it. That decrements the releasing
process's mm->pinned_vm count past zero, and the count wraps.
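
The wrap described above can be reproduced in user space with a small
model. This is only a sketch, not kernel code: the toy_* names are
hypothetical, 4 KiB pages are assumed, and the limit check is assumed to
compare the would-be pinned page count against the rlimit shifted down
to pages.

```c
#include <assert.h>
#include <limits.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the single mm_struct field of interest. */
struct toy_mm {
	unsigned long pinned_vm;
};

/*
 * Assumed shape of the limit check: returns 1 when pinning npages more
 * pages would exceed the limit, which is the case reported as -ENOMEM.
 */
int toy_over_limit(const struct toy_mm *mm, unsigned long npages,
		   unsigned long rlim_bytes)
{
	unsigned long locked = npages + mm->pinned_vm;
	unsigned long lock_limit = rlim_bytes >> TOY_PAGE_SHIFT;

	return locked > lock_limit;
}

/* Charge one mm at registration time, then uncharge a different one. */
void toy_mismatched_release(struct toy_mm *owner, struct toy_mm *other,
			    unsigned long npages)
{
	owner->pinned_vm += npages;	/* charged when the memory is pinned */
	other->pinned_vm -= npages;	/* release lands on the wrong mm */
}

/* Runs the scenario and returns the wrapped counter of the wrong mm. */
unsigned long toy_wrap_demo(void)
{
	struct toy_mm owner = { 0 };
	struct toy_mm other = { 0 };

	toy_mismatched_release(&owner, &other, 16);

	/* The owner is never uncharged; the other mm is now huge. */
	assert(owner.pinned_vm == 16);
	/* Even an all-ones limit (assumed RLIM_INFINITY) is exceeded. */
	assert(toy_over_limit(&other, 1, ULONG_MAX));

	return other.pinned_vm;
}
```

In this model the second mm starts at zero, so the unsigned subtraction
wraps to a value near ULONG_MAX, while the limit is at most ULONG_MAX
shifted right by the page shift. Every later pin attempt against that mm
is therefore rejected.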

I'm not entirely sure what circumstances cause a process other than the
one that allocated the ib_umem to release it, but in my case the kernel
stack trace of the freeing process looks like this:

Call Trace:
 [] dump_stack+0x19/0x1b
 [] ib_umem_release+0x1f5/0x200 [ib_core]
 [] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
 [] ib_destroy_qp+0x12c/0x170 [ib_core]
 [] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
 [] __fput+0xba/0x240
 [] ____fput+0xe/0x10
 [] task_work_run+0xc4/0xe0
 [] do_notify_resume+0x95/0xa0
 [] int_signal+0x12/0x17

The following patch fixes the issue by storing the mm_struct of the
process that calls ib_umem_get(), so that ib_umem_release() and/or
ib_umem_account() can decrement the pinned_vm count of the correct
mm_struct.

Signed-off-by: Shawn Bohrer
---
 drivers/infiniband/core/umem.c | 17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a3a2e9c..32699024 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -105,6 +105,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->length = size;
 	umem->offset = addr & ~PAGE_MASK;
 	umem->page_size = PAGE_SIZE;
+	umem->mm = get_task_mm(current);
 	/*
 	 * We ask for writable memory if any access flags other than
 	 * "remote read" are set. "Local write" and "remote write"
@@ -198,6 +199,7 @@ out:
 	if (ret < 0) {
 		if (need_release)
 			__ib_umem_release(context->device, umem, 0);
+		mmput(umem->mm);
 		kfree(umem);
 	} else
 		current->mm->pinned_vm = locked;
@@ -229,13 +231,11 @@ static void ib_umem_account(struct work_struct *work)
 void ib_umem_release(struct ib_umem *umem)
 {
 	struct ib_ucontext *context = umem->context;
-	struct mm_struct *mm;
 	unsigned long diff;
 
 	__ib_umem_release(umem->context->device, umem, 1);
 
-	mm = get_task_mm(current);
-	if (!mm) {
+	if (!umem->mm) {
 		kfree(umem);
 		return;
 	}
@@ -251,20 +251,19 @@ void ib_umem_release(struct ib_umem *umem)
 	 * we defer the vm_locked accounting to the system workqueue.
 	 */
 	if (context->closing) {
-		if (!down_write_trylock(&mm->mmap_sem)) {
+		if (!down_write_trylock(&umem->mm->mmap_sem)) {
 			INIT_WORK(&umem->work, ib_umem_account);
-			umem->mm = mm;
 			umem->diff = diff;
 
 			queue_work(ib_wq, &umem->work);
 			return;
 		}
 	} else
-		down_write(&mm->mmap_sem);
+		down_write(&umem->mm->mmap_sem);
 
-	current->mm->pinned_vm -= diff;
-	up_write(&mm->mmap_sem);
-	mmput(mm);
+	umem->mm->pinned_vm -= diff;
+	up_write(&umem->mm->mmap_sem);
+	mmput(umem->mm);
 	kfree(umem);
 }
 EXPORT_SYMBOL(ib_umem_release);