From patchwork Wed Jan 14 14:09:03 2015
X-Patchwork-Submitter: Peng Tao
X-Patchwork-Id: 5631771
From: Peng Tao <tao.peng@primarydata.com>
To: linux-nfs@vger.kernel.org
Cc: Trond Myklebust, Peng Tao
Subject: [PATCH] nfs: fix dio deadlock when O_DIRECT flag is flipped
Date: Wed, 14 Jan 2015 22:09:03 +0800
Message-Id: <1421244543-32539-1-git-send-email-tao.peng@primarydata.com>
X-Mailer: git-send-email 1.9.1

Running xfstests generic/036, we trip the VM_BUG_ON() in nfs_direct_IO().
generic/036 toggles the O_DIRECT flag while I/O is in flight, so the
VM_BUG_ON() should not be there. However, the same code path also
deadlocks: inode->i_mutex is already held when the VFS calls into
->direct_IO, and nfs_file_direct_write() grabs inode->i_mutex again.

Moreover, nfs_file_direct_write() does a lot of work that the VFS has
already done before ->direct_IO is called, so skip those duplicates in
this path. One exception is i_size_write: the VFS does not take i_lock
when setting i_size, but NFS appears to need i_lock when setting i_size.
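For context (not part of the patch): generic/036 drives exactly this race,
with one thread writing while a second thread flips O_DIRECT on the same
open file via fcntl(F_SETFL). A minimal stand-alone sketch of that scenario
follows; the NFS file path, buffer size and run time are illustrative
assumptions, and the xfstest itself remains the authoritative reproducer.
Build with "gcc -pthread".

/* Sketch only: hammer pwrite() on a file while a second thread toggles
 * O_DIRECT, mimicking what xfstests generic/036 exercises. The path below
 * is an assumption; point it at a file on an NFS mount. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int fd;
static volatile int stop;

static void *writer(void *arg)
{
	void *buf;

	/* page-aligned 4 KiB buffer so the write stays valid under O_DIRECT */
	if (posix_memalign(&buf, 4096, 4096))
		return NULL;
	memset(buf, 'a', 4096);
	while (!stop) {
		if (pwrite(fd, buf, 4096, 0) < 0)
			perror("pwrite");
	}
	free(buf);
	return NULL;
}

static void *toggler(void *arg)
{
	while (!stop) {
		int flags = fcntl(fd, F_GETFL);

		fcntl(fd, F_SETFL, flags ^ O_DIRECT);	/* flip O_DIRECT on/off */
	}
	return NULL;
}

int main(void)
{
	pthread_t w, t;

	fd = open("/mnt/nfs/testfile", O_RDWR | O_CREAT, 0644);	/* assumed NFS mount */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&t, NULL, toggler, NULL);
	sleep(30);	/* let the race run for a while */
	stop = 1;
	pthread_join(w, NULL);
	pthread_join(t, NULL);
	close(fd);
	return 0;
}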
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
---
 fs/nfs/direct.c | 92 ++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 65 insertions(+), 27 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index bfd9d49..9f88b13 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -105,6 +105,8 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops;
 static const struct nfs_commit_completion_ops nfs_direct_commit_completion_ops;
 static void nfs_direct_write_complete(struct nfs_direct_req *dreq, struct inode *inode);
 static void nfs_direct_write_schedule_work(struct work_struct *work);
+static ssize_t nfs_direct_write(struct kiocb *iocb, struct iov_iter *iter,
+				loff_t pos);
 
 static inline void get_dreq(struct nfs_direct_req *dreq)
 {
@@ -257,11 +259,9 @@ ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter, loff_t
 	return -EINVAL;
 #else
-	VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE);
-
 	if (rw == READ)
 		return nfs_file_direct_read(iocb, iter, pos);
-	return nfs_file_direct_write(iocb, iter, pos);
+	return nfs_direct_write(iocb, iter, pos);
 #endif /* CONFIG_NFS_SWAP */
 }
 
@@ -919,6 +919,66 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	return 0;
 }
 
+static ssize_t _nfs_direct_write(struct kiocb *iocb, struct iov_iter *iter,
+				 struct inode *inode, struct nfs_direct_req **dreqp,
+				 size_t count, loff_t pos)
+{
+	ssize_t result;
+	struct nfs_direct_req *dreq;
+	struct nfs_lock_context *l_ctx;
+
+	task_io_account_write(count);
+
+	result = -ENOMEM;
+	dreq = nfs_direct_req_alloc();
+	if (!dreq)
+		goto out;
+
+	dreq->inode = inode;
+	dreq->bytes_left = count;
+	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
+	l_ctx = nfs_get_lock_context(dreq->ctx);
+	if (IS_ERR(l_ctx)) {
+		result = PTR_ERR(l_ctx);
+		nfs_direct_req_release(dreq);
+		goto out;
+	}
+	dreq->l_ctx = l_ctx;
+	if (!is_sync_kiocb(iocb))
+		dreq->iocb = iocb;
+
+	result = nfs_direct_write_schedule_iovec(dreq, iter, pos);
+	*dreqp = dreq;
+
+out:
+	return result;
+}
+
+static ssize_t nfs_direct_write(struct kiocb *iocb, struct iov_iter *iter,
+				loff_t pos)
+{
+	struct file *file = iocb->ki_filp;
+	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
+	struct nfs_direct_req *uninitialized_var(dreq);
+	size_t count = iov_iter_count(iter);
+	ssize_t result;
+
+	result = _nfs_direct_write(iocb, iter, inode, &dreq, count, pos);
+	if (!result) {
+		result = nfs_direct_wait(dreq);
+		if (result > 0) {
+			iocb->ki_pos = pos + result;
+			spin_lock(&inode->i_lock);
+			if (i_size_read(inode) < iocb->ki_pos)
+				i_size_write(inode, iocb->ki_pos);
+			spin_unlock(&inode->i_lock);
+		}
+	}
+	nfs_direct_req_release(dreq);
+	return result;
+}
+
 /**
  * nfs_file_direct_write - file direct write operation for NFS files
  * @iocb: target I/O control block
@@ -947,8 +1007,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
 	struct inode *inode = mapping->host;
-	struct nfs_direct_req *dreq;
-	struct nfs_lock_context *l_ctx;
+	struct nfs_direct_req *uninitialized_var(dreq);
 	loff_t end;
 	size_t count = iov_iter_count(iter);
 	end = (pos + count - 1) >> PAGE_CACHE_SHIFT;
@@ -982,26 +1041,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
 			goto out_unlock;
 	}
 
-	task_io_account_write(count);
-
-	result = -ENOMEM;
-	dreq = nfs_direct_req_alloc();
-	if (!dreq)
-		goto out_unlock;
-
-	dreq->inode = inode;
-	dreq->bytes_left = count;
-	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
-	l_ctx = nfs_get_lock_context(dreq->ctx);
-	if (IS_ERR(l_ctx)) {
-		result = PTR_ERR(l_ctx);
-		goto out_release;
-	}
-	dreq->l_ctx = l_ctx;
-	if (!is_sync_kiocb(iocb))
-		dreq->iocb = iocb;
-
-	result = nfs_direct_write_schedule_iovec(dreq, iter, pos);
+	result = _nfs_direct_write(iocb, iter, inode, &dreq, count, pos);
 
 	if (mapping->nrpages) {
 		invalidate_inode_pages2_range(mapping,
@@ -1025,8 +1065,6 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
 	nfs_direct_req_release(dreq);
 	return result;
 
-out_release:
-	nfs_direct_req_release(dreq);
 out_unlock:
 	mutex_unlock(&inode->i_mutex);
 out: