From patchwork Tue May 29 08:26:44 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10434523
From: Michal Hocko
To: Jonathan Corbet
Cc: Dave Chinner, Randy Dunlap, Mike Rapoport, LKML, Michal Hocko
Subject: [PATCH v2] doc: document scope NOFS, NOIO APIs
Date: Tue, 29 May 2018 10:26:44 +0200
Message-Id: <20180529082644.26192-1-mhocko@kernel.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180524114341.1101-1-mhocko@kernel.org>
References: <20180524114341.1101-1-mhocko@kernel.org>

From: Michal Hocko

Although the API is documented in the source code, Ted has pointed out
that there is no mention of it in the core-api Documentation, and there
are people looking there to find answers on how to use a specific API.

Changes since v1
- add kerneldoc for the API - suggested by Jonathan
- review feedback from Dave and Jonathan
- feedback from Dave about more general critical context rather than
  locking
- feedback from Mike
- typo fixed - Randy, Dave

Requested-by: "Theodore Y. Ts'o"
Signed-off-by: Michal Hocko
Reviewed-by: Dave Chinner
---
 .../core-api/gfp_mask-from-fs-io.rst | 61 +++++++++++++++++++
 Documentation/core-api/index.rst     |  1 +
 include/linux/sched/mm.h             | 38 ++++++++++++
 3 files changed, 100 insertions(+)
 create mode 100644 Documentation/core-api/gfp_mask-from-fs-io.rst

diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst
new file mode 100644
index 000000000000..2dc442b04a77
--- /dev/null
+++ b/Documentation/core-api/gfp_mask-from-fs-io.rst
@@ -0,0 +1,61 @@
+=================================
+GFP masks used from FS/IO context
+=================================
+
+:Date: May, 2018
+:Author: Michal Hocko
+
+Introduction
+============
+
+Code paths in the filesystem and IO stacks must be careful when
+allocating memory to prevent recursion deadlocks caused by direct
+memory reclaim calling back into the FS or IO paths and blocking on
+already held resources (e.g. locks - most commonly those used for the
+transaction context).
+
+The traditional way to avoid this deadlock problem is to clear __GFP_FS
+or __GFP_IO respectively (note the latter implies clearing the former as
+well) in the gfp mask when calling an allocator. GFP_NOFS or GFP_NOIO
+can be used as a shortcut. It turned out, though, that the above approach
+has led to abuses when the restricted gfp mask is used "just in case"
+without deeper consideration, which leads to problems because an excessive
+use of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
+reclaim issues.
+
+New API
+========
+
+Since 4.12 we do have a generic scope API for both NOFS and NOIO context:
+``memalloc_nofs_save``, ``memalloc_nofs_restore`` and ``memalloc_noio_save``,
+``memalloc_noio_restore`` respectively, which allow marking a scope as a
+critical section from a filesystem or I/O point of view. Any allocation
+from that scope will inherently drop __GFP_FS or __GFP_IO respectively
+from the given mask, so no memory allocation can recurse back into the FS/IO.
+
+FS/IO code then simply calls the appropriate save function before
+any critical section with respect to reclaim is started - e.g. before
+taking a lock shared with the reclaim context, or before a transaction
+context nesting would become possible via reclaim. The restore function
+should be called when the critical section ends. All that ideally along
+with an explanation of what the reclaim context is, for easier maintenance.
+
+Please note that the proper pairing of save/restore functions
+allows nesting, so it is safe to call ``memalloc_noio_save`` or
+``memalloc_nofs_save`` respectively from an existing NOIO or NOFS
+scope.
+
+What about __vmalloc(GFP_NOFS)
+==============================
+
+vmalloc doesn't support the GFP_NOFS semantic because there are hardcoded
+GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
+to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
+almost always a bug. The good news is that the NOFS/NOIO semantic can be
+achieved by the scope API instead.
+
+In an ideal world, upper layers should already mark dangerous contexts,
+so no special care is required and ``vmalloc`` can be called without any
+problems. Sometimes, if the context is not really clear or there are
+layering violations, the recommended way around that is to wrap ``vmalloc``
+in the scope API with a comment explaining the problem.
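
(The following is an illustrative sketch only; it is not part of the diff
above. It shows how the vmalloc recommendation from the new document might
look in practice. my_fs_alloc_bitmap is a made-up helper name, while
memalloc_nofs_save, memalloc_nofs_restore and vmalloc are the real kernel
interfaces.)

/*
 * Sketch: achieve NOFS semantics for a vmalloc-backed allocation by
 * wrapping it in the scope API, since vmalloc cannot honour GFP_NOFS
 * internally.  my_fs_alloc_bitmap is hypothetical.
 */
#include <linux/sched/mm.h>
#include <linux/vmalloc.h>

static void *my_fs_alloc_bitmap(unsigned long size)
{
	unsigned int nofs_flags;
	void *bitmap;

	/*
	 * We may be called with a transaction open, so reclaim must not
	 * recurse back into the filesystem: open a NOFS scope around
	 * vmalloc instead of passing GFP_NOFS to it.
	 */
	nofs_flags = memalloc_nofs_save();
	bitmap = vmalloc(size);
	memalloc_nofs_restore(nofs_flags);

	return bitmap;
}
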
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8a5f48ef16f2 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
    genalloc
    errseq
    printk-formats
+   gfp_mask-from-fs-io
 
 Interfaces for kernel debugging
 ===============================
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index e1f8411e6b80..af5ba077bbc4 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -166,6 +166,17 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
 static inline void fs_reclaim_release(gfp_t gfp_mask) { }
 #endif
 
+/**
+ * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
+ *
+ * This function marks the beginning of the GFP_NOIO allocation scope.
+ * All further allocations will implicitly drop the __GFP_IO flag and so
+ * they are safe for the IO critical section from the allocation recursion
+ * point of view. Use memalloc_noio_restore to end the scope with the flags
+ * returned by this function.
+ *
+ * This function is safe to be used from any context.
+ */
 static inline unsigned int memalloc_noio_save(void)
 {
 	unsigned int flags = current->flags & PF_MEMALLOC_NOIO;
@@ -173,11 +184,30 @@ static inline unsigned int memalloc_noio_save(void)
 	return flags;
 }
 
+/**
+ * memalloc_noio_restore - Ends the implicit GFP_NOIO scope.
+ * @flags: Flags to restore.
+ *
+ * Ends the implicit GFP_NOIO scope started by the pairing memalloc_noio_save.
+ * Always make sure that the given flags value is the return value from that
+ * memalloc_noio_save call.
+ */
 static inline void memalloc_noio_restore(unsigned int flags)
 {
 	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
 }
 
+/**
+ * memalloc_nofs_save - Marks implicit GFP_NOFS allocation scope.
+ *
+ * This function marks the beginning of the GFP_NOFS allocation scope.
+ * All further allocations will implicitly drop the __GFP_FS flag and so
+ * they are safe for the FS critical section from the allocation recursion
+ * point of view. Use memalloc_nofs_restore to end the scope with the flags
+ * returned by this function.
+ *
+ * This function is safe to be used from any context.
+ */
 static inline unsigned int memalloc_nofs_save(void)
 {
 	unsigned int flags = current->flags & PF_MEMALLOC_NOFS;
@@ -185,6 +215,14 @@ static inline unsigned int memalloc_nofs_save(void)
 	return flags;
 }
 
+/**
+ * memalloc_nofs_restore - Ends the implicit GFP_NOFS scope.
+ * @flags: Flags to restore.
+ *
+ * Ends the implicit GFP_NOFS scope started by the pairing memalloc_nofs_save.
+ * Always make sure that the given flags value is the return value from that
+ * memalloc_nofs_save call.
+ */
 static inline void memalloc_nofs_restore(unsigned int flags)
 {
 	current->flags = (current->flags & ~PF_MEMALLOC_NOFS) | flags;
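
(A similar illustrative sketch for the NOIO side, again not part of the
patch: struct my_dev, its queue_lock and the 256-byte command size are
invented for the example; memalloc_noio_save and memalloc_noio_restore are
the real helpers documented in the hunk above.)

#include <linux/mutex.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>

struct my_dev {				/* hypothetical driver data */
	struct mutex queue_lock;	/* also taken from the IO completion path */
};

static int my_dev_queue_command(struct my_dev *dev)
{
	unsigned int noio_flags;
	void *cmd;
	int ret = 0;

	mutex_lock(&dev->queue_lock);
	/* From here on, every allocation implicitly loses __GFP_IO. */
	noio_flags = memalloc_noio_save();

	cmd = kzalloc(256, GFP_KERNEL);
	if (!cmd) {
		ret = -ENOMEM;
	} else {
		/* ... build and submit the command ... */
		kfree(cmd);
	}

	/* Restore exactly the value returned by the pairing save call. */
	memalloc_noio_restore(noio_flags);
	mutex_unlock(&dev->queue_lock);
	return ret;
}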