From patchwork Mon Jan 29 15:47:50 2024
X-Patchwork-Submitter: David Wysochanski
X-Patchwork-Id: 13535914
From: Dave Wysochanski
To: Anna Schumaker, Trond Myklebust
Cc: David Howells, Jeff Layton, linux-nfs@vger.kernel.org, linux-cachefs@redhat.com
Subject: [PATCH] NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt
Date: Mon, 29 Jan 2024 10:47:50 -0500
Message-Id: <20240129154750.1245317-1-dwysocha@redhat.com>

The loop inside nfs_netfs_issue_read() currently does not disable
interrupts while iterating through pages in the xarray to submit for
NFS read.
This is not safe: after the loop takes xa_lock, writeback for another
page in the mapping can complete in softirq context on the same CPU
and try to take the xa_lock the loop already holds, and the CPU
deadlocks. To fix this, use the irqsave/irqrestore primitives,
xas_lock_irqsave() and xas_unlock_irqrestore(), for the xa_lock.

The problem is easily reproduced with the following test:

  mount -o vers=3,fsc 127.0.0.1:/export /mnt/nfs
  dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/nfs/file1.bin of=/dev/null
  umount /mnt/nfs

On a lockdep-enabled kernel, a message similar to the following
appears on the console:

================================
WARNING: inconsistent lock state
6.7.0-lockdbg+ #10 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
test5/1708 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888127baa598 (&xa->xa_lock#4){+.?.}-{3:3}, at: nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
{IN-SOFTIRQ-W} state was registered at:
  lock_acquire+0x144/0x380
  _raw_spin_lock_irqsave+0x4e/0xa0
  __folio_end_writeback+0x17e/0x5c0
  folio_end_writeback+0x93/0x1b0
  iomap_finish_ioend+0xeb/0x6a0
  blk_update_request+0x204/0x7f0
  blk_mq_end_request+0x30/0x1c0
  blk_complete_reqs+0x7e/0xa0
  __do_softirq+0x113/0x544
  __irq_exit_rcu+0xfe/0x120
  irq_exit_rcu+0xe/0x20
  sysvec_call_function_single+0x6f/0x90
  asm_sysvec_call_function_single+0x1a/0x20
  pv_native_safe_halt+0xf/0x20
  default_idle+0x9/0x20
  default_idle_call+0x67/0xa0
  do_idle+0x2b5/0x300
  cpu_startup_entry+0x34/0x40
  start_secondary+0x19d/0x1c0
  secondary_startup_64_no_verify+0x18f/0x19b
irq event stamp: 176891
hardirqs last enabled at (176891): [] _raw_spin_unlock_irqrestore+0x44/0x60
hardirqs last disabled at (176890): [] _raw_spin_lock_irqsave+0x79/0xa0
softirqs last enabled at (176646): [] __irq_exit_rcu+0xfe/0x120
softirqs last disabled at (176633): [] __irq_exit_rcu+0xfe/0x120

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xa->xa_lock#4);
    lock(&xa->xa_lock#4);

 *** DEADLOCK ***

2 locks held by test5/1708:
 #0: ffff888127baa498 (&sb->s_type->i_mutex_key#22){++++}-{4:4}, at: nfs_start_io_read+0x28/0x90 [nfs]
 #1: ffff888127baa650 (mapping.invalidate_lock#3){.+.+}-{4:4}, at: page_cache_ra_unbounded+0xa4/0x280

stack backtrace:
CPU: 6 PID: 1708 Comm: test5 Kdump: loaded Not tainted 6.7.0-lockdbg+ #10
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014
Call Trace:
 dump_stack_lvl+0x5b/0x90
 mark_lock+0xb3f/0xd20
 __lock_acquire+0x77b/0x3360
 _raw_spin_lock+0x34/0x80
 nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
 netfs_begin_read+0x77f/0x980 [netfs]
 nfs_netfs_readahead+0x45/0x60 [nfs]
 nfs_readahead+0x323/0x5a0 [nfs]
 read_pages+0xf3/0x5c0
 page_cache_ra_unbounded+0x1c8/0x280
 filemap_get_pages+0x38c/0xae0
 filemap_read+0x206/0x5e0
 nfs_file_read+0xb7/0x140 [nfs]
 vfs_read+0x2a9/0x460
 ksys_read+0xb7/0x140

Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
Reviewed-by: Jeff Layton
Signed-off-by: Dave Wysochanski
---
 fs/nfs/fscache.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index b05717fe0d4e..de7ec89bfe8d 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -308,6 +308,7 @@ static void nfs_netfs_issue_read(struct netfs_io_subrequest *sreq)
 	struct nfs_open_context *ctx = sreq->rreq->netfs_priv;
 	struct page *page;
 	int err;
+	unsigned long flags;
 	pgoff_t start = (sreq->start + sreq->transferred) >> PAGE_SHIFT;
 	pgoff_t last = ((sreq->start + sreq->len -
 			 sreq->transferred - 1) >> PAGE_SHIFT);
@@ -322,19 +323,19 @@ static void nfs_netfs_issue_read(struct netfs_io_subrequest *sreq)
 
 	pgio.pg_netfs = netfs; /* used in completion */
 
-	xas_lock(&xas);
+	xas_lock_irqsave(&xas, flags);
 	xas_for_each(&xas, page, last) {
 		/* nfs_read_add_folio() may schedule() due to pNFS layout and other RPCs */
 		xas_pause(&xas);
-		xas_unlock(&xas);
+		xas_unlock_irqrestore(&xas, flags);
 		err = nfs_read_add_folio(&pgio, ctx, page_folio(page));
 		if (err < 0) {
 			netfs->error = err;
 			goto out;
 		}
-		xas_lock(&xas);
+		xas_lock_irqsave(&xas, flags);
 	}
-	xas_unlock(&xas);
+	xas_unlock_irqrestore(&xas, flags);
 out:
 	nfs_pageio_complete_read(&pgio);
 	nfs_netfs_put(netfs);