[RFC,06/12] netfs: Keep lists of pending, active, dirty and flushed regions

This looks nice, in theory, and has the following features:

 (*) Things are managed with write records.

     (-) A WRITE is a region defined by an outer bounding box that spans
         the pages that are involved and an inner region that contains the
         actual modifications.

     (-) The bounding box must encompass all the data that will be
     	 necessary to perform a write operation to the server (for example,
     	 if we want to encrypt with a 64K block size when we have 4K
     	 pages).

 (*) There are four list of write records:

     (-) The PENDING LIST holds writes that are blocked by another active
     	 write.  This list is in order of submission to avoid starvation
     	 and may overlap.

     (-) The ACTIVE LIST holds writes that have been granted exclusive
     	 access to a patch.  This is in order of starting position and
     	 regions held therein may not overlap.

     (-) The DIRTY LIST holds a list of regions that have been modified.
     	 This is also in order of starting position and regions may not
     	 overlap, though they can be merged.

     (-) The FLUSH LIST holds a list of regions that require writing.  This
     	 is in order of grouping.

 (*) An active region acts as an exclusion zone on part of the range,
     allowing the inode sem to be dropped once the region is on a list.

     (-) A DIO write creates its own exclusive region that must not overlap
         with any other dirty region.

     (-) An active write may overlap one or more dirty regions.

     (-) A dirty region may be overlapped by one or more writes.

     (-) If an active write overlaps with an incompatible dirty region,
     	 that region gets flushed, the active write has to wait for it to
     	 complete.

 (*) When an active write completes, the region is inserted or merged into
     the dirty list.

     (-) Merging can only happen between compatible regions.

     (-) Contiguous dirty regions can be merged.

     (-) If an inode has all new content, generated locally, dirty regions
         that have contiguous/ovelapping bounding boxes can be merged,
         bridging any gaps with zeros.

     (-) O_DSYNC causes the region to be flushed immediately.

 (*) There's a queue of groups of regions and those regions must be flushed
     in order.

     (-) If a region in a group needs flushing, then all prior groups must
     	 be flushed first.

TRICKY BITS
===========

 (*) The active and dirty lists are O(n) search time.  An interval tree
     might be a better option.

 (*) Having four list_heads is a lot of memory per inode.

 (*) Activating pending writes.

     (-) The pending list can contain a bunch of writes that can overlap.

     (-) When an active write completes, it is removed from the active
     	 queue and usually added to the dirty queue (except DIO, DSYNC).
     	 This makes a hole.

     (-) One or more pending writes can then be moved over, but care has to
     	 be taken not to misorder them to avoid starvation.

     (-) When a pending write is added to the active list, it may require
     	 part of the dirty list to be flushed.

 (*) A write that has been put onto the active queue may have to wait for
     flushing to complete.

 (*) How should an active write interact with a dirty region?

     (-) A dirty region may get flushed even whilst it is being modified on
         the assumption that the active write record will get added to the
         dirty list and cause a follow up write to the server.

 (*) RAM pinning.

     (-) An active write could pin a lot of pages, thereby causing a large
     	 write to run the system out of RAM.

     (-) Allow active writes to start being flushed whilst still being
     	 modified.

     (-) Use a scheduler hook to decant the modified portion into the dirty
     	 list when the modifying task is switched away from?

 (*) Bounding box and variably-sized pages/folios.

     (-) The bounding box needs to be rounded out to the page boundaries so
     	 that DIO writes can claim exclusivity on a series of pages so that
     	 they can be invalidated.

     (-) Allocation of higher-order folios could be limited in scope so
     	 that they don't escape the requested bounding box.

     (-) Bounding boxes could be enlarged to allow for larger folios.

     (-) Overlarge bounding boxes can be shrunk later, possibly on merging
     	 into the dirty list.

     (-) Ordinary writes can have overlapping bounding boxes, even if
     	 they're otherwise incompatible.
---

 fs/afs/file.c                |   30 +
 fs/afs/internal.h            |    7 
 fs/afs/write.c               |  166 --------
 fs/netfs/Makefile            |    8 
 fs/netfs/dio_helper.c        |  140 ++++++
 fs/netfs/internal.h          |   32 +
 fs/netfs/objects.c           |  113 +++++
 fs/netfs/read_helper.c       |   94 ++++
 fs/netfs/stats.c             |    5 
 fs/netfs/write_helper.c      |  908 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h        |   98 +++++
 include/trace/events/netfs.h |  180 ++++++++
 12 files changed, 1604 insertions(+), 177 deletions(-)
 create mode 100644 fs/netfs/dio_helper.c
 create mode 100644 fs/netfs/objects.c
 create mode 100644 fs/netfs/write_helper.c

Message ID	162687516812.276387.504081062999158040.stgit@warthog.procyon.org.uk (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-cifs-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B09C5C6377A for <linux-cifs@archiver.kernel.org>; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9176561241 for <linux-cifs@archiver.kernel.org>; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238648AbhGUNGc (ORCPT <rfc822;linux-cifs@archiver.kernel.org>); Wed, 21 Jul 2021 09:06:32 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:27679 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238763AbhGUNFn (ORCPT <rfc822;linux-cifs@vger.kernel.org>); Wed, 21 Jul 2021 09:05:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875179; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i721gV0aIu0rvLexN0pBqMbwBbkKzEhWVB3KEf7BlKk=; b=fjSJTgr714BYvtBpuw/fi8+jODXPkrOa2yvTDy1xsbURRnrCGe1gCtSGPHsPP5uFe++sOM uQl8LlvAzuyQzEDvxrLHisPh16DS6p1xEjfFHcSA9Iz6s653BnhTjKhayeovdNvGmQYf6X wAaN995DfHofJUBm3OsrcPAArEd5I8Q= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-199-KrkAGfI-N4-Yedd1kx3LBQ-1; Wed, 21 Jul 2021 09:46:15 -0400 X-MC-Unique: KrkAGfI-N4-Yedd1kx3LBQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1B59C93920; Wed, 21 Jul 2021 13:46:13 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id E905A6EF4F; Wed, 21 Jul 2021 13:46:08 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 06/12] netfs: Keep lists of pending, active, dirty and flushed regions From: David Howells <dhowells@redhat.com> To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton <jlayton@redhat.com>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Anna Schumaker <anna.schumaker@netapp.com>, Steve French <sfrench@samba.org>, Dominique Martinet <asmadeus@codewreck.org>, Mike Marshall <hubcap@omnibond.com>, David Wysochanski <dwysocha@redhat.com>, Shyam Prasad N <nspmangalore@gmail.com>, Miklos Szeredi <miklos@szeredi.hu>, Linus Torvalds <torvalds@linux-foundation.org>, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:08 +0100 Message-ID: <162687516812.276387.504081062999158040.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: <linux-cifs.vger.kernel.org> X-Mailing-List: linux-cifs@vger.kernel.org
Series	[RFC,01/12] afs: Sort out symlink reading \| expand [RFC,01/12] afs: Sort out symlink reading [RFC,02/12] netfs: Add an iov_iter to the read subreq for the network fs/cache to use [RFC,03/12] netfs: Remove netfs_read_subrequest::transferred [RFC,04/12] netfs: Use a buffer in netfs_read_request and add pages to it [RFC,05/12] netfs: Add a netfs inode context [RFC,06/12] netfs: Keep lists of pending, active, dirty and flushed regions [RFC,07/12] netfs: Initiate write request from a dirty region [RFC,08/12] netfs: Keep dirty mark for pages with more than one dirty region [RFC,09/12] netfs: Send write request to multiple destinations [RFC,10/12] netfs: Do encryption in write preparatory phase [RFC,11/12] netfs: Put a list of regions in /proc/fs/netfs/regions [RFC,12/12] netfs: Export some read-request ref functions [RFC,13/12] netfs: Do copy-to-cache-on-read through VM writeback

[RFC,06/12] netfs: Keep lists of pending, active, dirty and flushed regions

Commit Message

Patch