Message ID | 20200116063601.39201-1-yukuai3@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] iomap: fix race between readahead and direct write |
On Thu 16-01-20 14:36:01, yu kuai wrote:
> I noticed that the generic/418 test may fail with small probability. And
> with further investigation, it can be reproduced with:
> 
> ./src/dio-invalidate-cache -wp -b 4096 -n 8 -i 1 -f filename
> ./src/dio-invalidate-cache -wt -b 4096 -n 8 -i 1 -f filename
> 
> The failure is because the direct write wrote non-zero data but the
> buffered read got zero.
> 
> In the process of a buffered read, if the page does not exist, readahead
> will be triggered. __do_page_cache_readahead() will allocate the page
> first. Next, if the file block is unwritten (or a hole), iomap_begin()
> will set iomap->type to IOMAP_UNWRITTEN (or IOMAP_HOLE). Then,
> iomap_readpages_actor() will add the page to the page cache. Finally,
> iomap_readpage_actor() will zero the page.
> 
> However, there is no lock or serialization between initializing the iomap
> and adding the page to the page cache against direct write. If the direct
> write happens to finish between them, the type of the iomap should be
> IOMAP_MAPPED instead of IOMAP_UNWRITTEN or IOMAP_HOLE. And the page will
> end up zeroed out in the page cache, while the on-disk page holds the
> data of the direct write.
> 
> | thread 1                   | thread 2                   |
> | -------------------------- | -------------------------- |
> | generic_file_buffered_read |                            |
> | ondemand_readahead         |                            |
> | read_pages                 |                            |
> | iomap_readpages            |                            |
> | iomap_apply                |                            |
> | xfs_read_iomap_begin       |                            |
> |                            | xfs_file_dio_aio_write     |
> |                            | iomap_dio_rw               |
> |                            | iomap_apply                |
> | iomap_readpages_actor      |                            |
> | iomap_next_page            |                            |
> | add_to_page_cache_lru      |                            |
> | iomap_readpage_actor       |                            |
> | zero_user                  |                            |
> | iomap_set_range_uptodate   |                            |
> |                            | generic_file_buffered_read |
> |                            | copy_page_to_iter          |

Thanks for the report and the patch. But the data integrity when mixing
buffered and direct IO like this is best effort only. We definitely do not
want to sacrifice performance of common cases or code complexity to make
cases like this work reliably.

								Honza

> As a consequence, the content in the page is zero while the content on
> disk is not.
> 
> I tried to fix the problem by moving "add to page cache" before
> iomap_begin(). However, performance might be worse since iomap_begin()
> will be called for each page. I tested the performance for sequential
> read with fio:
> 
> kernel version: v5.5-rc6
> platform: arm64, 96 cpu
> fio version: fio-3.15-2
> test cmd:
> fio -filename=/mnt/testfile -rw=read -bs=4k -size=20g -direct=0 -fsync=0
> -numjobs=1 / 32 -ioengine=libaio -name=test -ramp_time=10 -runtime=120
> 
> test result:
> |                  | without patch MiB/s | with patch MiB/s |
> | ---------------- | ------------------- | ---------------- |
> | ssd, numjobs=1   | 512                 | 512              |
> | ssd, numjobs=32  | 3615                | 3714             |
> | nvme, numjobs=1  | 1167                | 1118             |
> | nvme, numjobs=32 | 3679                | 3606             |
> 
> Test result shows that the impact on performance is minimal.
>
> Signed-off-by: yu kuai <yukuai3@huawei.com>
> ---
>  fs/iomap/buffered-io.c | 104 ++++++++++++++++++++---------------------
>  1 file changed, 52 insertions(+), 52 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 828444e14d09..ccfa1a52d966 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -329,26 +329,44 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	return pos - orig_pos + plen;
>  }
>
> -int
> -iomap_readpage(struct page *page, const struct iomap_ops *ops)
> +static int
> +do_iomap_readpage_apply(
> +	loff_t			offset,
> +	int			flag,
> +	const struct iomap_ops	*ops,
> +	struct iomap_readpage_ctx *ctx,
> +	iomap_actor_t		actor,
> +	bool			fatal)
>  {
> -	struct iomap_readpage_ctx ctx = { .cur_page = page };
> -	struct inode *inode = page->mapping->host;
> -	unsigned poff;
> -	loff_t ret;
> -
> -	trace_iomap_readpage(page->mapping->host, 1);
> +	unsigned int poff;
> +	loff_t ret;
> +	struct page *page = ctx->cur_page;
> +	struct inode *inode = page->mapping->host;
>
>  	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
> -		ret = iomap_apply(inode, page_offset(page) + poff,
> -				PAGE_SIZE - poff, 0, ops, &ctx,
> -				iomap_readpage_actor);
> +		ret = iomap_apply(inode, offset + poff, PAGE_SIZE - poff,
> +				flag, ops, ctx, actor);
>  		if (ret <= 0) {
>  			WARN_ON_ONCE(ret == 0);
> +			if (fatal)
> +				return ret;
>  			SetPageError(page);
> -			break;
> +			return 0;
>  		}
>  	}
> +	return ret;
> +}
> +
> +
> +int
> +iomap_readpage(struct page *page, const struct iomap_ops *ops)
> +{
> +	struct iomap_readpage_ctx ctx = { .cur_page = page };
> +
> +	trace_iomap_readpage(page->mapping->host, 1);
> +
> +	do_iomap_readpage_apply(page_offset(page), 0, ops, &ctx,
> +			iomap_readpage_actor, false);
>
>  	if (ctx.bio) {
>  		submit_bio(ctx.bio);
> @@ -395,34 +413,6 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
>  	return NULL;
>  }
>
> -static loff_t
> -iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap, struct iomap *srcmap)
> -{
> -	struct iomap_readpage_ctx *ctx = data;
> -	loff_t done, ret;
> -
> -	for (done = 0; done < length; done += ret) {
> -		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
> -			if (!ctx->cur_page_in_bio)
> -				unlock_page(ctx->cur_page);
> -			put_page(ctx->cur_page);
> -			ctx->cur_page = NULL;
> -		}
> -		if (!ctx->cur_page) {
> -			ctx->cur_page = iomap_next_page(inode, ctx->pages,
> -					pos, length, &done);
> -			if (!ctx->cur_page)
> -				break;
> -			ctx->cur_page_in_bio = false;
> -		}
> -		ret = iomap_readpage_actor(inode, pos + done, length - done,
> -				ctx, iomap, srcmap);
> -	}
> -
> -	return done;
> -}
> -
>  int
>  iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  		unsigned nr_pages, const struct iomap_ops *ops)
> @@ -433,22 +423,32 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  	};
>  	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
>  	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
> -	loff_t length = last - pos + PAGE_SIZE, ret = 0;
> +	loff_t length = last - pos + PAGE_SIZE, ret = 0, done;
>
>  	trace_iomap_readpages(mapping->host, nr_pages);
>
> -	while (length > 0) {
> -		ret = iomap_apply(mapping->host, pos, length, 0, ops,
> -				&ctx, iomap_readpages_actor);
> +	for (done = 0; done < length; done += PAGE_SIZE) {
> +		if (ctx.cur_page) {
> +			if (!ctx.cur_page_in_bio)
> +				unlock_page(ctx.cur_page);
> +			put_page(ctx.cur_page);
> +			ctx.cur_page = NULL;
> +		}
> +		ctx.cur_page = iomap_next_page(mapping->host, ctx.pages,
> +				pos, length, &done);
> +		if (!ctx.cur_page)
> +			break;
> +		ctx.cur_page_in_bio = false;
> +
> +		ret = do_iomap_readpage_apply(pos+done, 0, ops, &ctx,
> +				iomap_readpage_actor, true);
>  		if (ret <= 0) {
> -			WARN_ON_ONCE(ret == 0);
> -			goto done;
> +			done = ret;
> +			break;
>  		}
> -		pos += ret;
> -		length -= ret;
> +
>  	}
> -	ret = 0;
> -done:
> +
>  	if (ctx.bio)
>  		submit_bio(ctx.bio);
>  	if (ctx.cur_page) {
> @@ -461,8 +461,8 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  	 * Check that we didn't lose a page due to the arcance calling
>  	 * conventions..
>  	 */
> -	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
> -	return ret;
> +	WARN_ON_ONCE((done == length) && !list_empty(ctx.pages));
> +	return done;
>  }
>  EXPORT_SYMBOL_GPL(iomap_readpages);
>
> --
> 2.17.2
>
On 2020/1/16 23:32, Jan Kara wrote:
> Thanks for the report and the patch. But the data integrity when mixing
> buffered and direct IO like this is best effort only. We definitely do not
> want to sacrifice performance of common cases or code complexity to make
> cases like this work reliably.

In the patch, the only thing that is different is that iomap_begin() will
be called for each page. However, it seems the performance of sequential
read didn't get worse. Is there a specific case where the performance
will get worse?

Thanks!
Yu Kuai
On Fri 17-01-20 17:39:03, yukuai (C) wrote:
> On 2020/1/16 23:32, Jan Kara wrote:
> > Thanks for the report and the patch. But the data integrity when mixing
> > buffered and direct IO like this is best effort only. We definitely do not
> > want to sacrifice performance of common cases or code complexity to make
> > cases like this work reliably.
> 
> In the patch, the only thing that is different is that iomap_begin() will
> be called for each page. However, it seems the performance of sequential
> read didn't get worse. Is there a specific case where the performance
> will get worse?

Well, one of the big points of the iomap infrastructure is that you call
the filesystem once to give you a large extent instead of calling it to
provide allocation for each page separately. The additional CPU overhead
will be visible if you push the machine hard enough. So IMHO the overhead
just is not worth it for a corner case like you presented. But that's just
my opinion, Darrick and Christoph are the definitive arbiters here...

								Honza
On Fri, Jan 17, 2020 at 12:05:36PM +0100, Jan Kara wrote:
> On Fri 17-01-20 17:39:03, yukuai (C) wrote:
> > On 2020/1/16 23:32, Jan Kara wrote:
> > > Thanks for the report and the patch. But the data integrity when mixing
> > > buffered and direct IO like this is best effort only. We definitely do not
> > > want to sacrifice performance of common cases or code complexity to make
> > > cases like this work reliably.
> > 
> > In the patch, the only thing that is different is that iomap_begin() will
> > be called for each page. However, it seems the performance of sequential
> > read didn't get worse. Is there a specific case where the performance
> > will get worse?
> 
> Well, one of the big points of the iomap infrastructure is that you call
> the filesystem once to give you a large extent instead of calling it to
> provide allocation for each page separately. The additional CPU overhead
> will be visible if you push the machine hard enough. So IMHO the overhead
> just is not worth it for a corner case like you presented. But that's just
> my opinion, Darrick and Christoph are the definitive arbiters here...

Does the problem go away if you apply [1]?  If I understand the race
correctly, marking the extents unwritten and leaving them that way until
after we've written the disk should eliminate the exposure vector...? :)

[1] https://lore.kernel.org/linux-xfs/157915535059.2406747.264640456606868955.stgit@magnolia/

--D

> 								Honza
> 
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
On Thu, Jan 16, 2020 at 02:36:01PM +0800, yu kuai wrote:
> I noticed that the generic/418 test may fail with small probability. And
> with further investigation, it can be reproduced with:
> 
> ./src/dio-invalidate-cache -wp -b 4096 -n 8 -i 1 -f filename
> ./src/dio-invalidate-cache -wt -b 4096 -n 8 -i 1 -f filename
> 
> The failure is because the direct write wrote non-zero data but the
> buffered read got zero.
> 
> In the process of a buffered read, if the page does not exist, readahead
> will be triggered. __do_page_cache_readahead() will allocate the page
> first. Next, if the file block is unwritten (or a hole), iomap_begin()
> will set iomap->type to IOMAP_UNWRITTEN (or IOMAP_HOLE). Then,
> iomap_readpages_actor() will add the page to the page cache. Finally,
> iomap_readpage_actor() will zero the page.
> 
> However, there is no lock or serialization between initializing the iomap
> and adding the page to the page cache against direct write. If the direct
> write happens to finish between them, the type of the iomap should be
> IOMAP_MAPPED instead of IOMAP_UNWRITTEN or IOMAP_HOLE. And the page will
> end up zeroed out in the page cache, while the on-disk page holds the
> data of the direct write.

It's worth noting that my patch series from earlier this week to
redesign the readahead API will fix this problem. Direct write will block
on the locked pages in the page cache.
On 2020/1/17 19:05, Jan Kara wrote:
> provide
> allocation for each page separately

Thank you for your response!

I do understand there will be additional CPU overhead. But the page is
allocated in __do_page_cache_readahead(), which is called before
iomap_begin(). And I did not change that.

Thanks!
Yu Kuai
On 2020/1/18 0:24, Darrick J. Wong wrote:
> Does the problem go away if you apply [1]?  If I understand the race
> correctly, marking the extents unwritten and leaving them that way until
> after we've written the disk should eliminate the exposure vector...? :)

Thank you for your response.

The patch [1] fixed the stale data exposure problem, which is tested by
generic/042. I'm afraid it is a completely different case from
generic/418. In generic/418, the direct write wrote some data to disk
successfully, while the buffered read left the corresponding page
uptodate in the page cache with zeroed content.

Thanks!
Yu Kuai
On 2020/1/19 7:08, Matthew Wilcox wrote:
> It's worth noting that my patch series from earlier this week to
> redesign the readahead API will fix this problem. Direct write will block
> on the locked pages in the page cache.

Thank you for your response!

In this case, the direct write finishes while the page does not exist in
the page cache. This is the fundamental condition of the race, because
readahead won't allocate a page if the page already exists in the page
cache.

By the way, in the current logic, if the page exists in the page cache,
the direct write needs to hold the page lock in
invalidate_inode_pages2_range().

Thanks!
Yu Kuai
On Sun, Jan 19, 2020 at 09:34:32AM +0800, yukuai (C) wrote:
> On 2020/1/19 7:08, Matthew Wilcox wrote:
> > It's worth noting that my patch series from earlier this week to
> > redesign the readahead API will fix this problem. Direct write will block
> > on the locked pages in the page cache.
> 
> Thank you for your response!
> 
> In this case, the direct write finishes while the page does not exist in
> the page cache. This is the fundamental condition of the race, because
> readahead won't allocate a page if the page already exists in the page
> cache.
> 
> By the way, in the current logic, if the page exists in the page cache,
> the direct write needs to hold the page lock in
> invalidate_inode_pages2_range().

Did you read my patch series?  The current code allocates pages,
but does not put them in the page cache until after iomap is called.
My patch series changes that to put the pages in the page cache as soon
as they're allocated, and before iomap is called.
On 2020/1/19 9:42, Matthew Wilcox wrote:
> Did you read my patch series?  The current code allocates pages,
> but does not put them in the page cache until after iomap is called.
> My patch series changes that to put the pages in the page cache as soon
> as they're allocated, and before iomap is called.

The current code to determine whether a page needs to be allocated:

	page = xa_load(&mapping->i_pages, page_offset);
	if (page && !xa_is_value(page)) {
		/*
		 * Page already present?  Kick off the current batch of
		 * contiguous pages before continuing with the next
		 * batch.
		 */
		if (nr_pages)
			read_pages(mapping, filp, &page_pool, nr_pages,
					gfp_mask);
		nr_pages = 0;
		continue;
	}

	page = __page_cache_alloc(gfp_mask);
	if (!page)
		break;

A page will be allocated if the page does not exist in the page cache.
And I don't see your patch series changing that. Or am I missing
something?

Thanks!
Yu Kuai
On 2020/1/19 9:42, Matthew Wilcox wrote:
> Did you read my patch series?  The current code allocates pages,
> but does not put them in the page cache until after iomap is called.
> My patch series changes that to put the pages in the page cache as soon
> as they're allocated, and before iomap is called.

I just read your patch series again.

At first, if you try to add all pages to the page cache and lock them
before iomap_begin(): I thought about it before, but I threw away the
idea because every other operation that needs to lock the page would
have to wait for readahead to finish, and that might cause a performance
problem. And if you try to add each page to the page cache and call
iomap before adding the next page, then we are facing the same CPU
overhead issue.

Then, there might be a problem in your implementation.
If 'use_list' is set to true here:
+	bool use_list = mapping->a_ops->readpages;

Your code does not call add_to_page_cache_lru() for the page.
+	if (use_list) {
+		page->index = page_offset;
+		list_add(&page->lru, &page_pool);
+	} else if (!add_to_page_cache_lru(page, mapping, page_offset,
+				gfp_mask)) {
+		if (nr_pages)
+			read_pages(mapping, filp, &page_pool,
+					page_offset - nr_pages,
+					nr_pages);
+		nr_pages = 0;
+		continue;
+	}

And later, you replace 'iomap_next_page' with 'readahead_page':
+static inline
+struct page *readahead_page(struct address_space *mapping, loff_t pos)
+{
+	struct page *page = xa_load(&mapping->i_pages, pos / PAGE_SIZE);
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+
+	return page;
+}
+

It seems that the page will never be added to the page cache.

Thanks!
Yu Kuai
On Sun, Jan 19, 2020 at 10:51:37AM +0800, yukuai (C) wrote:
>
>
> Then, there might be a problem in your implementation.
> If 'use_list' is set to true here:
> +	bool use_list = mapping->a_ops->readpages;
>
> Your code does not call add_to_page_cache_lru() for the page.

IMO, if use_list == true, it will call read_pages -> .readpages and go
just like the current implementation. ->readahead just does alloc_page
and then add_to_page_cache_lru right after, and in principle it saves
some extra page allocation.

Thanks,
Gao Xiang

> +	if (use_list) {
> +		page->index = page_offset;
> +		list_add(&page->lru, &page_pool);
> +	} else if (!add_to_page_cache_lru(page, mapping, page_offset,
> +				gfp_mask)) {
> +		if (nr_pages)
> +			read_pages(mapping, filp, &page_pool,
> +					page_offset - nr_pages,
> +					nr_pages);
> +		nr_pages = 0;
> +		continue;
> +	}
>
On 2020/1/19 11:01, Gao Xiang wrote:
> IMO, if use_list == true, it will call read_pages -> .readpages and go
> just like the current implementation.

The problem is that in the current implementation, iomap_next_page()
adds the page to the page cache, and it is replaced with
readahead_page().

Thanks!
Yu Kuai
On Sun, Jan 19, 2020 at 10:51:37AM +0800, yukuai (C) wrote:
> At first, if you try to add all pages to the page cache and lock them
> before iomap_begin(): I thought about it before, but I threw away the
> idea because every other operation that needs to lock the page would
> have to wait for readahead to finish, and that might cause a performance
> problem. And if you try to add each page to the page cache and call
> iomap before adding the next page, then we are facing the same CPU
> overhead issue.

I don't understand your reasoning here.  If another process wants to
access a page of the file which isn't currently in cache, it would have
to first read the page in from storage.  If it's under readahead, it
has to wait for the read to finish.  Why is the second case worse than
the first?  It seems better to me.

The implementation doesn't call iomap for each page.  It allocates all
the pages and then calls iomap for the range.

> Then, there might be a problem in your implementation.
> If 'use_list' is set to true here:
> +	bool use_list = mapping->a_ops->readpages;
>
> Your code does not call add_to_page_cache_lru() for the page.

It can't.  The readpages implementation has to call add_to_page_cache_lru.
But for filesystems which use readpage or readahead, we can put the pages
in the page cache before calling readahead.

> And later, you replace 'iomap_next_page' with 'readahead_page':
> +static inline
> +struct page *readahead_page(struct address_space *mapping, loff_t pos)
> +{
> +	struct page *page = xa_load(&mapping->i_pages, pos / PAGE_SIZE);
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +
> +	return page;
> +}
> +
>
> It seems that the page will never be added to the page cache.

At the same time, the iomap code is switched from ->readpages to
->readahead, so yes, the pages are added to the page cache.
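The serialization being described here can be pictured with an ordinary
mutex standing in for the page lock. The following is a toy userspace
sketch, with entirely made-up names and contrived timing, of the ordering
the readahead redesign relies on: the page sits locked in the cache before
the read I/O is issued, so anything that must take the page lock (a second
reader, or the invalidation done by the direct-write path) waits until the
readahead completes. It is an illustration only, not kernel code.

/* toy_pagelock.c: toy model of "direct write blocks on locked cache pages"
 * Build: cc -o toy_pagelock toy_pagelock.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* stand-in for a page cache page; the mutex plays the role of the page lock */
struct page {
	pthread_mutex_t lock;
	int uptodate;
};

static struct page cached_page = { PTHREAD_MUTEX_INITIALIZER, 0 };

static void *readahead_thread(void *arg)
{
	/* the page goes into the cache already locked, before the read I/O */
	pthread_mutex_lock(&cached_page.lock);
	puts("readahead: page inserted locked, read I/O in flight");
	usleep(100 * 1000);			/* the read I/O */
	cached_page.uptodate = 1;
	puts("readahead: I/O complete, unlocking page");
	pthread_mutex_unlock(&cached_page.lock);
	return NULL;
}

static void *dio_write_thread(void *arg)
{
	usleep(10 * 1000);			/* direct write completes mid-readahead */
	puts("dio write: done, invalidating the page cache range");
	/* invalidation must take the page lock, so it waits for readahead */
	pthread_mutex_lock(&cached_page.lock);
	puts("dio write: got the page lock only after readahead finished");
	cached_page.uptodate = 0;		/* drop the now-stale copy */
	pthread_mutex_unlock(&cached_page.lock);
	return NULL;
}

int main(void)
{
	pthread_t ra, dio;

	pthread_create(&ra, NULL, readahead_thread, NULL);
	pthread_create(&dio, NULL, dio_write_thread, NULL);
	pthread_join(ra, NULL);
	pthread_join(dio, NULL);
	return 0;
}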
On 2020/1/19 14:14, Matthew Wilcox wrote:
> I don't understand your reasoning here.  If another process wants to
> access a page of the file which isn't currently in cache, it would have
> to first read the page in from storage.  If it's under readahead, it
> has to wait for the read to finish.  Why is the second case worse than
> the first?  It seems better to me.

Thanks for your response! My worry is that, for example:

We read page 0 and trigger readahead to read n pages (0 to n-1), while
in another thread we read page n-1.

In the current implementation, if readahead is in the process of reading
pages 0 to n-2, the later operation doesn't need to wait for the former
one to finish. However, the later operation will have to wait if we add
all pages to the page cache first. And that is why I said it might cause
a performance problem.

> At the same time, the iomap code is switched from ->readpages to
> ->readahead, so yes, the pages are added to the page cache.

Yes, it's not a problem in your implementation.

Thanks!
Yu Kuai
On Sun, Jan 19, 2020 at 02:55:14PM +0800, yukuai (C) wrote:
> On 2020/1/19 14:14, Matthew Wilcox wrote:
> > I don't understand your reasoning here.  If another process wants to
> > access a page of the file which isn't currently in cache, it would have
> > to first read the page in from storage.  If it's under readahead, it
> > has to wait for the read to finish.  Why is the second case worse than
> > the first?  It seems better to me.
> 
> Thanks for your response! My worry is that, for example:
> 
> We read page 0 and trigger readahead to read n pages (0 to n-1), while
> in another thread we read page n-1.
> 
> In the current implementation, if readahead is in the process of reading
> pages 0 to n-2, the later operation doesn't need to wait for the former
> one to finish. However, the later operation will have to wait if we add
> all pages to the page cache first. And that is why I said it might cause
> a performance problem.

OK, but let's put some numbers on that.  Imagine that we're using high
performance spinning rust so we have an access latency of 5ms (200
IOPS), we're accessing 20 consecutive pages which happen to have their
data contiguous on disk.  Our CPU is running at 2GHz and takes about
100,000 cycles to submit an I/O, plus 1,000 cycles to add an extra page
to the I/O.

Current implementation: Allocate 20 pages, place 19 of them in the cache,
fail to place the last one in the cache.  The later thread actually gets
to jump the queue and submit its bio first.  Its latency will be 100,000
cycles (20us) plus the 5ms access time.  But it only has 20,000 cycles
(4us) to hit this race, or it will end up behaving the same way as below.

New implementation: Allocate 20 pages, place them all in the cache,
then take 120,000 cycles to build & submit the I/O, and wait 5ms for
the I/O to complete.

But look how much more likely it is that it'll hit during the window
where we're waiting for the I/O to complete -- 5ms is 1250 times longer
than 4us.

If it _does_ get the latency benefit of jumping the queue, the readahead
will create one or two I/Os.  If it hit page 18 instead of page 19, we'd
end up doing three I/Os; the first for page 18, then one for pages 0-17,
and one for page 19.  And that means the disk is going to be busy for
15ms, delaying the next I/O for up to 10ms.  It's actually beneficial in
the long term for the second thread to wait for the readahead to finish.

Oh, and the current ->readpages code has a race where if the page tagged
with PageReadahead ends up not being inserted, we'll lose that bit, which
means the readahead will just stop and have to restart (because it will
look to the readahead code like it's not being effective).  That's a far
worse performance problem.
On 2020/1/19 15:58, Matthew Wilcox wrote:
> On Sun, Jan 19, 2020 at 02:55:14PM +0800, yukuai (C) wrote:
>> On 2020/1/19 14:14, Matthew Wilcox wrote:
>>> I don't understand your reasoning here.  If another process wants to
>>> access a page of the file which isn't currently in cache, it would have
>>> to first read the page in from storage.  If it's under readahead, it
>>> has to wait for the read to finish.  Why is the second case worse than
>>> the first?  It seems better to me.
>>
>> Thanks for your response! My worry is that, for example:
>>
>> We read page 0 and trigger readahead to read n pages (0 to n-1), while
>> in another thread we read page n-1.
>>
>> In the current implementation, if readahead is in the process of reading
>> pages 0 to n-2, the later operation doesn't need to wait for the former
>> one to finish. However, the later operation will have to wait if we add
>> all pages to the page cache first. And that is why I said it might cause
>> a performance problem.
> 
> OK, but let's put some numbers on that.  Imagine that we're using high
> performance spinning rust so we have an access latency of 5ms (200
> IOPS), we're accessing 20 consecutive pages which happen to have their
> data contiguous on disk.  Our CPU is running at 2GHz and takes about
> 100,000 cycles to submit an I/O, plus 1,000 cycles to add an extra page
> to the I/O.
> 
> Current implementation: Allocate 20 pages, place 19 of them in the cache,
> fail to place the last one in the cache.  The later thread actually gets
> to jump the queue and submit its bio first.  Its latency will be 100,000
> cycles (20us) plus the 5ms access time.  But it only has 20,000 cycles
> (4us) to hit this race, or it will end up behaving the same way as below.
> 
> New implementation: Allocate 20 pages, place them all in the cache,
> then take 120,000 cycles to build & submit the I/O, and wait 5ms for
> the I/O to complete.
> 
> But look how much more likely it is that it'll hit during the window
> where we're waiting for the I/O to complete -- 5ms is 1250 times longer
> than 4us.
> 
> If it _does_ get the latency benefit of jumping the queue, the readahead
> will create one or two I/Os.  If it hit page 18 instead of page 19, we'd
> end up doing three I/Os; the first for page 18, then one for pages 0-17,
> and one for page 19.  And that means the disk is going to be busy for
> 15ms, delaying the next I/O for up to 10ms.  It's actually beneficial in
> the long term for the second thread to wait for the readahead to finish.

Thank you very much for your detailed explanation; I was too focused on
my own one-sided view. And I do agree that your patch series is a better
solution for the problem.

Yu Kuai
On Sun 19-01-20 09:17:00, yukuai (C) wrote:
> 
> 
> On 2020/1/17 19:05, Jan Kara wrote:
> > provide
> > allocation for each page separately
> 
> Thank you for your response!
> 
> I do understand there will be additional CPU overhead. But the page is
> allocated in __do_page_cache_readahead(), which is called before
> iomap_begin(). And I did not change that.

Sorry, I didn't express myself clearly. In "...one of the big points of
the iomap infrastructure is that you call the filesystem once to give you
a large extent instead of calling it to provide allocation for each page
separately." I meant that with your patch, you call into the filesystem to
provide "block allocation information" for each page separately. And it
seems we both agree this is going to cause additional CPU usage in the
common case to improve a mostly unsupported corner case.

								Honza
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 828444e14d09..ccfa1a52d966 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -329,26 +329,44 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	return pos - orig_pos + plen;
 }
 
-int
-iomap_readpage(struct page *page, const struct iomap_ops *ops)
+static int
+do_iomap_readpage_apply(
+	loff_t			offset,
+	int			flag,
+	const struct iomap_ops	*ops,
+	struct iomap_readpage_ctx *ctx,
+	iomap_actor_t		actor,
+	bool			fatal)
 {
-	struct iomap_readpage_ctx ctx = { .cur_page = page };
-	struct inode *inode = page->mapping->host;
-	unsigned poff;
-	loff_t ret;
-
-	trace_iomap_readpage(page->mapping->host, 1);
+	unsigned int poff;
+	loff_t ret;
+	struct page *page = ctx->cur_page;
+	struct inode *inode = page->mapping->host;
 
 	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
-		ret = iomap_apply(inode, page_offset(page) + poff,
-				PAGE_SIZE - poff, 0, ops, &ctx,
-				iomap_readpage_actor);
+		ret = iomap_apply(inode, offset + poff, PAGE_SIZE - poff,
+				flag, ops, ctx, actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
+			if (fatal)
+				return ret;
 			SetPageError(page);
-			break;
+			return 0;
 		}
 	}
+	return ret;
+}
+
+
+int
+iomap_readpage(struct page *page, const struct iomap_ops *ops)
+{
+	struct iomap_readpage_ctx ctx = { .cur_page = page };
+
+	trace_iomap_readpage(page->mapping->host, 1);
+
+	do_iomap_readpage_apply(page_offset(page), 0, ops, &ctx,
+			iomap_readpage_actor, false);
 
 	if (ctx.bio) {
 		submit_bio(ctx.bio);
@@ -395,34 +413,6 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 	return NULL;
 }
 
-static loff_t
-iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
-{
-	struct iomap_readpage_ctx *ctx = data;
-	loff_t done, ret;
-
-	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
-			if (!ctx->cur_page_in_bio)
-				unlock_page(ctx->cur_page);
-			put_page(ctx->cur_page);
-			ctx->cur_page = NULL;
-		}
-		if (!ctx->cur_page) {
-			ctx->cur_page = iomap_next_page(inode, ctx->pages,
-					pos, length, &done);
-			if (!ctx->cur_page)
-				break;
-			ctx->cur_page_in_bio = false;
-		}
-		ret = iomap_readpage_actor(inode, pos + done, length - done,
-				ctx, iomap, srcmap);
-	}
-
-	return done;
-}
-
 int
 iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops)
@@ -433,22 +423,32 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
 	};
 	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
 	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
-	loff_t length = last - pos + PAGE_SIZE, ret = 0;
+	loff_t length = last - pos + PAGE_SIZE, ret = 0, done;
 
 	trace_iomap_readpages(mapping->host, nr_pages);
 
-	while (length > 0) {
-		ret = iomap_apply(mapping->host, pos, length, 0, ops,
-				&ctx, iomap_readpages_actor);
+	for (done = 0; done < length; done += PAGE_SIZE) {
+		if (ctx.cur_page) {
+			if (!ctx.cur_page_in_bio)
+				unlock_page(ctx.cur_page);
+			put_page(ctx.cur_page);
+			ctx.cur_page = NULL;
+		}
+		ctx.cur_page = iomap_next_page(mapping->host, ctx.pages,
+				pos, length, &done);
+		if (!ctx.cur_page)
+			break;
+		ctx.cur_page_in_bio = false;
+
+		ret = do_iomap_readpage_apply(pos+done, 0, ops, &ctx,
+				iomap_readpage_actor, true);
 		if (ret <= 0) {
-			WARN_ON_ONCE(ret == 0);
-			goto done;
+			done = ret;
+			break;
 		}
-		pos += ret;
-		length -= ret;
+
 	}
-	ret = 0;
-done:
+
 	if (ctx.bio)
 		submit_bio(ctx.bio);
 	if (ctx.cur_page) {
@@ -461,8 +461,8 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
 	 * Check that we didn't lose a page due to the arcance calling
 	 * conventions..
 	 */
-	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
-	return ret;
+	WARN_ON_ONCE((done == length) && !list_empty(ctx.pages));
+	return done;
 }
 EXPORT_SYMBOL_GPL(iomap_readpages);
I noticed that the generic/418 test may fail with small probability. And
with further investigation, it can be reproduced with:

./src/dio-invalidate-cache -wp -b 4096 -n 8 -i 1 -f filename
./src/dio-invalidate-cache -wt -b 4096 -n 8 -i 1 -f filename

The failure is because the direct write wrote non-zero data but the
buffered read got zero.

In the process of a buffered read, if the page does not exist, readahead
will be triggered. __do_page_cache_readahead() will allocate the page
first. Next, if the file block is unwritten (or a hole), iomap_begin()
will set iomap->type to IOMAP_UNWRITTEN (or IOMAP_HOLE). Then,
iomap_readpages_actor() will add the page to the page cache. Finally,
iomap_readpage_actor() will zero the page.

However, there is no lock or serialization between initializing the iomap
and adding the page to the page cache against direct write. If the direct
write happens to finish between them, the type of the iomap should be
IOMAP_MAPPED instead of IOMAP_UNWRITTEN or IOMAP_HOLE. And the page will
end up zeroed out in the page cache, while the on-disk page holds the
data of the direct write.

| thread 1                   | thread 2                   |
| -------------------------- | -------------------------- |
| generic_file_buffered_read |                            |
| ondemand_readahead         |                            |
| read_pages                 |                            |
| iomap_readpages            |                            |
| iomap_apply                |                            |
| xfs_read_iomap_begin       |                            |
|                            | xfs_file_dio_aio_write     |
|                            | iomap_dio_rw               |
|                            | iomap_apply                |
| iomap_readpages_actor      |                            |
| iomap_next_page            |                            |
| add_to_page_cache_lru      |                            |
| iomap_readpage_actor       |                            |
| zero_user                  |                            |
| iomap_set_range_uptodate   |                            |
|                            | generic_file_buffered_read |
|                            | copy_page_to_iter          |

As a consequence, the content in the page is zero while the content on
disk is not.

I tried to fix the problem by moving "add to page cache" before
iomap_begin(). However, performance might be worse since iomap_begin()
will be called for each page. I tested the performance for sequential
read with fio:

kernel version: v5.5-rc6
platform: arm64, 96 cpu
fio version: fio-3.15-2
test cmd:
fio -filename=/mnt/testfile -rw=read -bs=4k -size=20g -direct=0 -fsync=0
-numjobs=1 / 32 -ioengine=libaio -name=test -ramp_time=10 -runtime=120

test result:
|                  | without patch MiB/s | with patch MiB/s |
| ---------------- | ------------------- | ---------------- |
| ssd, numjobs=1   | 512                 | 512              |
| ssd, numjobs=32  | 3615                | 3714             |
| nvme, numjobs=1  | 1167                | 1118             |
| nvme, numjobs=32 | 3679                | 3606             |

Test result shows that the impact on performance is minimal.

Signed-off-by: yu kuai <yukuai3@huawei.com>
---
 fs/iomap/buffered-io.c | 104 ++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 52 deletions(-)
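For illustration, a minimal standalone reproducer in the spirit of the
dio-invalidate-cache invocations above might look like the sketch below.
It is only a sketch under assumptions, not the xfstests helper: the file
name, pattern, block count and iteration count are arbitrary, it should
run on a filesystem with unwritten-extent semantics such as XFS, error
handling is mostly omitted, and it only occasionally hits the narrow race
window. The idea: create a fresh sparse file, race an O_DIRECT writer of
a non-zero pattern against buffered reads that trigger readahead, then
re-read through the page cache after the writer has finished; a block
that still reads back as zero is a stale zeroed page left in the cache
while the data is on disk.

/* dio_vs_readahead.c: illustrative sketch only (not the xfstests helper).
 * Build: cc -O2 -o dio_vs_readahead dio_vs_readahead.c -lpthread
 * Run:   ./dio_vs_readahead /mnt/testfile   (file on e.g. XFS)
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BS	4096		/* matches -b 4096 above */
#define NBLK	8		/* matches -n 8 above */

static const char *path;

static void *dio_writer(void *arg)
{
	void *buf;
	int fd = open(path, O_WRONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, BS, BS)) {
		perror("O_DIRECT writer setup");
		exit(1);
	}
	memset(buf, 0x5a, BS);				/* non-zero pattern */
	for (int i = 0; i < NBLK; i++)
		pwrite(fd, buf, BS, (off_t)i * BS);	/* direct write over unwritten blocks */
	free(buf);
	close(fd);
	return NULL;
}

int main(int argc, char **argv)
{
	char buf[BS];

	path = argc > 1 ? argv[1] : "testfile";

	for (int iter = 0; iter < 1000; iter++) {
		pthread_t t;
		int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);

		ftruncate(fd, (off_t)NBLK * BS);	/* fresh sparse file */
		close(fd);

		pthread_create(&t, NULL, dio_writer, NULL);

		/* buffered reads racing with the direct write; may trigger readahead */
		fd = open(path, O_RDONLY);
		for (int i = 0; i < NBLK; i++)
			pread(fd, buf, BS, (off_t)i * BS);

		pthread_join(t, NULL);

		/* data is on disk now; a zero block here is a stale page cache copy */
		for (int i = 0; i < NBLK; i++) {
			if (pread(fd, buf, BS, (off_t)i * BS) == BS && buf[0] == 0)
				printf("iter %d: block %d is zero in the page cache but written on disk\n",
				       iter, i);
		}
		close(fd);
	}
	return 0;
}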