From patchwork Fri Jul 13 16:56:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Wilck
X-Patchwork-Id: 10523893
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 4A320601C2
	for ; Fri, 13 Jul 2018 16:56:51 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 39598287FB
	for ; Fri, 13 Jul 2018 16:56:51 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 2D31329BDD; Fri, 13 Jul 2018 16:56:51 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,
	MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 451C829BA5
	for ; Fri, 13 Jul 2018 16:56:49 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2387872AbeGMRMO (ORCPT );
	Fri, 13 Jul 2018 13:12:14 -0400
Received: from smtp2.provo.novell.com ([137.65.250.81]:49275 "EHLO
	smtp2.provo.novell.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S2387881AbeGMRMO (ORCPT );
	Fri, 13 Jul 2018 13:12:14 -0400
Received: from [192.168.1.40] (prv-ext-foundry1int.gns.novell.com [137.65.251.240])
	by smtp2.provo.novell.com with ESMTP (TLS encrypted);
	Fri, 13 Jul 2018 10:56:42 -0600
Message-ID: <2311947c2f0f368bd10474edb0f0f5b51dde6b7d.camel@suse.com>
Subject: Re: Silent data corruption in blkdev_direct_IO()
From: Martin Wilck
To: Jens Axboe , Hannes Reinecke
Cc: Christoph Hellwig , "linux-block@vger.kernel.org"
Date: Fri, 13 Jul 2018 18:56:38 +0200
In-Reply-To:
References: <3419a3ae-da82-9c20-26e1-7c9ed14ff8ed@kernel.dk>
	<57a2b121-9805-8337-fb97-67943670f250@kernel.dk>
X-Mailer: Evolution 3.28.2
Mime-Version: 1.0
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote:
>
> Hence the patch I sent is wrong, the code actually looks fine. Which
> means we're back to trying to figure out what is going on here. It'd
> be great with a test case...

We don't have an easy test case yet. But the customer has confirmed
that the problem occurs with upstream 4.17.5, too. We also confirmed
again that the problem occurs when the kernel uses the kmalloc() code
path in __blkdev_direct_IO_simple().

My personal suggestion would be to ditch __blkdev_direct_IO_simple()
altogether. After all, it's not _that_ much simpler than
__blkdev_direct_IO(), and it seems to be broken in a subtle way.

However, so far I've only identified a minor problem, see below - it
doesn't explain the data corruption we're seeing.

Martin

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e..b82b516 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -218,8 +218,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	bio.bi_end_io = blkdev_bio_end_io_simple;
 	ret = bio_iov_iter_get_pages(&bio, iter);
-	if (unlikely(ret))
+	if (unlikely(ret)) {
+		if (vecs != inline_vecs)
+			kfree(vecs);
+		bio_uninit(&bio);
 		return ret;
+	}
 	ret = bio.bi_iter.bi_size;
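
For readers who want to see the minor problem in isolation: the following
is a rough userspace sketch of the same error-path pattern, not kernel
code. It assumes a small on-stack array with a heap fallback, as the
inline_vecs/kmalloc() split in the diff suggests. All names here
(direct_io_simple, get_pages, counted_alloc, counted_free, live_allocs,
INLINE_VECS, the fix_leak switch) are hypothetical stand-ins made up for
illustration, and bio_uninit() has no counterpart in the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_VECS 4	/* hypothetical inline capacity, not the kernel's value */

struct vec {
	void *page;
	size_t len;
};

/* Number of heap buffers currently allocated; makes a leak observable. */
static int live_allocs;

static void *counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Stand-in for bio_iov_iter_get_pages(): fails with -EFAULT on request. */
static int get_pages(struct vec *vecs, int nr, int fail)
{
	if (fail)
		return -EFAULT;
	memset(vecs, 0, (size_t)nr * sizeof(*vecs));
	return 0;
}

/*
 * Same shape as the patched function: use the on-stack array for small
 * requests and the heap for larger ones. fix_leak == 0 models the code
 * before the patch, fix_leak == 1 the code after it.
 */
static int direct_io_simple(int nr_pages, int fail, int fix_leak)
{
	struct vec inline_vecs[INLINE_VECS], *vecs;
	int ret;

	assert(nr_pages > 0);

	if (nr_pages <= INLINE_VECS) {
		vecs = inline_vecs;
	} else {
		vecs = counted_alloc((size_t)nr_pages * sizeof(*vecs));
		if (!vecs)
			return -ENOMEM;
	}

	ret = get_pages(vecs, nr_pages, fail);
	if (ret) {
		/* Without this free, the early return leaks the heap buffer. */
		if (fix_leak && vecs != inline_vecs)
			counted_free(vecs);
		return ret;
	}

	/* The actual I/O would be submitted and waited for here. */

	if (vecs != inline_vecs)
		counted_free(vecs);
	return 0;
}
```

Calling direct_io_simple(8, 1, 0) takes the heap path, fails in
get_pages() and leaves live_allocs at 1; the same call with fix_leak set
to 1 leaves it at 0. Requests that fit in inline_vecs never allocate, so
they are unaffected either way.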