From patchwork Wed Nov 8 19:00:58 2017
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10048997
From: Josef Bacik
To: hannes@cmpxchg.org, linux-mm@kvack.org, akpm@linux-foundation.org,
	jack@suse.cz, linux-fsdevel@vger.kernel.org
Cc: Josef Bacik
Subject: [PATCH 2/4] writeback: allow for dirty metadata accounting
Date: Wed, 8 Nov 2017 14:00:58 -0500
Message-Id: <1510167660-26196-2-git-send-email-josef@toxicpanda.com>
In-Reply-To: <1510167660-26196-1-git-send-email-josef@toxicpanda.com>
References: <1510167660-26196-1-git-send-email-josef@toxicpanda.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Josef Bacik

Provide a mechanism for file systems to indicate how much dirty metadata
they are holding. This introduces two things:

1) A node stat for dirty metadata (NR_METADATA_DIRTY), which is accounted
   the same way as NR_FILE_DIRTY.
2) A WB stat for dirty metadata (WB_METADATA_DIRTY). This way we know
   whether we need to call into the file system to write out metadata.
   This could potentially be used in the future to make balancing of
   dirty pages smarter.

Signed-off-by: Josef Bacik
---
 drivers/base/node.c              |   2 +
 fs/fs-writeback.c                |   1 +
 fs/proc/meminfo.c                |   2 +
 include/linux/backing-dev-defs.h |   1 +
 include/linux/mm.h               |   7 +++
 include/linux/mmzone.h           |   1 +
 include/trace/events/writeback.h |   7 ++-
 mm/backing-dev.c                 |   2 +
 mm/page-writeback.c              | 100 +++++++++++++++++++++++++++++++++++++--
 mm/page_alloc.c                  |   7 ++-
 mm/vmscan.c                      |   3 +-
 11 files changed, 125 insertions(+), 8 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 3855902f2c5b..39c031f44d4b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -99,6 +99,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 	n += sprintf(buf + n,
 		       "Node %d Dirty: %8lu kB\n"
+		       "Node %d MetadataDirty: %8lu kB\n"
 		       "Node %d Writeback: %8lu kB\n"
 		       "Node %d FilePages: %8lu kB\n"
 		       "Node %d Mapped: %8lu kB\n"
@@ -119,6 +120,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 			,
 		       nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
+		       nid, K(node_page_state(pgdat, NR_METADATA_DIRTY)),
 		       nid, K(node_page_state(pgdat, NR_WRITEBACK)),
 		       nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
 		       nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 245c430a2e41..c5374a4fb982 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1822,6 +1822,7 @@ static unsigned long get_nr_dirty_pages(void)
 {
 	return global_node_page_state(NR_FILE_DIRTY) +
 		global_node_page_state(NR_UNSTABLE_NFS) +
+		global_node_page_state(NR_METADATA_DIRTY) +
 		get_nr_dirty_inodes();
 }
 
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index cdd979724c74..f1cafc2aaade 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -98,6 +98,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "SwapFree: ", i.freeswap);
 	show_val_kb(m, "Dirty: ",
 		    global_node_page_state(NR_FILE_DIRTY));
+	show_val_kb(m, "MetadataDirty: ",
+		    global_node_page_state(NR_METADATA_DIRTY));
 	show_val_kb(m, "Writeback: ",
 		    global_node_page_state(NR_WRITEBACK));
 	show_val_kb(m, "AnonPages: ",
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 866c433e7d32..013e764d4b30 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -36,6 +36,7 @@ typedef int (congested_fn)(void *, int);
 enum wb_stat_item {
 	WB_RECLAIMABLE,
 	WB_WRITEBACK,
+	WB_METADATA_DIRTY,
 	WB_DIRTIED,
 	WB_WRITTEN,
 	NR_WB_STAT_ITEMS
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8c10d336e42..c6b4a6a62cc2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -32,6 +32,7 @@ struct file_ra_state;
 struct user_struct;
 struct writeback_control;
 struct bdi_writeback;
+struct backing_dev_info;
 
 void init_mm_internals(void);
 
@@ -1428,6 +1429,12 @@ int redirty_page_for_writepage(struct writeback_control *wbc,
 void account_page_dirtied(struct page *page, struct address_space *mapping);
 void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb);
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi);
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi);
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi);
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 void cancel_dirty_page(struct page *page);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 356a814e7c8e..090fce6b1195 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -179,6 +179,7 @@ enum node_stat_item {
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+	NR_METADATA_DIRTY,	/* Metadata dirty pages */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 9b57f014d79d..dd1564b5eab3 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -402,6 +402,7 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_STRUCT__entry(
 		__field(unsigned long,	nr_dirty)
+		__field(unsigned long,	nr_metadata_dirty)
 		__field(unsigned long,	nr_writeback)
 		__field(unsigned long,	nr_unstable)
 		__field(unsigned long,	background_thresh)
@@ -413,6 +414,7 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_fast_assign(
 		__entry->nr_dirty	= global_node_page_state(NR_FILE_DIRTY);
+		__entry->nr_metadata_dirty = global_node_page_state(NR_METADATA_DIRTY);
 		__entry->nr_writeback	= global_node_page_state(NR_WRITEBACK);
 		__entry->nr_unstable	= global_node_page_state(NR_UNSTABLE_NFS);
 		__entry->nr_dirtied	= global_node_page_state(NR_DIRTIED);
@@ -424,7 +426,7 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_printk("dirty=%lu writeback=%lu unstable=%lu "
 		  "bg_thresh=%lu thresh=%lu limit=%lu "
-		  "dirtied=%lu written=%lu",
+		  "dirtied=%lu written=%lu metadata_dirty=%lu",
 		  __entry->nr_dirty,
 		  __entry->nr_writeback,
 		  __entry->nr_unstable,
@@ -432,7 +434,8 @@ TRACE_EVENT(global_dirty_state,
 		  __entry->dirty_thresh,
 		  __entry->dirty_limit,
 		  __entry->nr_dirtied,
-		  __entry->nr_written
+		  __entry->nr_written,
+		  __entry->nr_metadata_dirty
 	)
 );
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e19606bb41a0..57f1dbc41f7e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -76,6 +76,7 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BackgroundThresh: %10lu kB\n"
 		   "BdiDirtied: %10lu kB\n"
 		   "BdiWritten: %10lu kB\n"
+		   "BdiMetadataDirty: %10lu kB\n"
 		   "BdiWriteBandwidth: %10lu kBps\n"
 		   "b_dirty: %10lu\n"
 		   "b_io: %10lu\n"
@@ -90,6 +91,7 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   K(background_thresh),
 		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
 		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
+		   (unsigned long) K(wb_stat(wb, WB_METADATA_DIRTY)),
 		   (unsigned long) K(wb->write_bandwidth),
 		   nr_dirty,
 		   nr_io,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 1a47d4296750..9539eae4f088 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -507,6 +507,7 @@ bool node_dirty_ok(struct pglist_data *pgdat)
 	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
 	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
 	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
+	nr_pages += node_page_state(pgdat, NR_METADATA_DIRTY);
 
 	return nr_pages <= limit;
 }
@@ -1595,7 +1596,8 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 		 * been flushed to permanent storage.
 		 */
 		nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
-					global_node_page_state(NR_UNSTABLE_NFS);
+					global_node_page_state(NR_UNSTABLE_NFS) +
+					global_node_page_state(NR_METADATA_DIRTY);
 		gdtc->avail = global_dirtyable_memory();
 		gdtc->dirty = nr_reclaimable + global_node_page_state(NR_WRITEBACK);
 
@@ -1936,7 +1938,8 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 	 */
 	gdtc->avail = global_dirtyable_memory();
 	gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
-		      global_node_page_state(NR_UNSTABLE_NFS);
+		      global_node_page_state(NR_UNSTABLE_NFS) +
+		      global_node_page_state(NR_METADATA_DIRTY);
 	domain_dirty_limits(gdtc);
 
 	if (gdtc->dirty > gdtc->bg_thresh)
@@ -1980,7 +1983,8 @@ void laptop_mode_timer_fn(unsigned long data)
 {
 	struct request_queue *q = (struct request_queue *)data;
 	int nr_pages = global_node_page_state(NR_FILE_DIRTY) +
-		global_node_page_state(NR_UNSTABLE_NFS);
+		global_node_page_state(NR_UNSTABLE_NFS) +
+		global_node_page_state(NR_METADATA_DIRTY);
 	struct bdi_writeback *wb;
 
 	/*
@@ -2444,6 +2448,96 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * account_metadata_dirtied
+ * @page - the page being dirtied
+ * @bdi - the bdi that owns this page
+ *
+ * Do the dirty page accounting for metadata pages that aren't backed by an
+ * address_space.
+ */
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__inc_node_page_state(page, NR_METADATA_DIRTY);
+	__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
+	__inc_node_page_state(page, NR_DIRTIED);
+	inc_wb_stat(&bdi->wb, WB_RECLAIMABLE);
+	inc_wb_stat(&bdi->wb, WB_DIRTIED);
+	inc_wb_stat(&bdi->wb, WB_METADATA_DIRTY);
+	current->nr_dirtied++;
+	task_io_account_write(PAGE_SIZE);
+	this_cpu_inc(bdp_ratelimits);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_dirtied);
+
+/*
+ * account_metadata_cleaned
+ * @page - the page being cleaned
+ * @bdi - the bdi that owns this page
+ *
+ * Called on a no longer dirty metadata page.
+ */
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__dec_node_page_state(page, NR_METADATA_DIRTY);
+	__dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
+	dec_wb_stat(&bdi->wb, WB_RECLAIMABLE);
+	dec_wb_stat(&bdi->wb, WB_METADATA_DIRTY);
+	task_io_account_cancelled_write(PAGE_SIZE);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_cleaned);
+
+/*
+ * account_metadata_writeback
+ * @page - the page being marked as writeback
+ * @bdi - the bdi that owns this page
+ *
+ * Called on a metadata page that has been marked writeback.
+ */
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	inc_wb_stat(&bdi->wb, WB_WRITEBACK);
+	__inc_node_page_state(page, NR_WRITEBACK);
+	__dec_node_page_state(page, NR_METADATA_DIRTY);
+	dec_wb_stat(&bdi->wb, WB_METADATA_DIRTY);
+	dec_wb_stat(&bdi->wb, WB_RECLAIMABLE);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_writeback);
+
+/*
+ * account_metadata_end_writeback
+ * @page - the page we are ending writeback on
+ * @bdi - the bdi that owns this page
+ *
+ * Called on a metadata page that has completed writeback.
+ */
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	dec_wb_stat(&bdi->wb, WB_WRITEBACK);
+	__dec_node_page_state(page, NR_WRITEBACK);
+	__dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
+	__inc_node_page_state(page, NR_WRITTEN);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_end_writeback);
+
+/*
  * Helper function for deaccounting dirty page without writeback.
  *
  * Caller must hold lock_page_memcg().
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c841af88836a..7f8eb1f861e5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4694,8 +4694,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
-		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
-		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
+		" unevictable:%lu dirty:%lu metadata_dirty:%lu writeback:%lu\n"
+		" unstable:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
 		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
 		" free:%lu free_pcp:%lu free_cma:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
@@ -4706,6 +4706,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		global_node_page_state(NR_ISOLATED_FILE),
 		global_node_page_state(NR_UNEVICTABLE),
 		global_node_page_state(NR_FILE_DIRTY),
+		global_node_page_state(NR_METADATA_DIRTY),
 		global_node_page_state(NR_WRITEBACK),
 		global_node_page_state(NR_UNSTABLE_NFS),
 		global_node_page_state(NR_SLAB_RECLAIMABLE),
@@ -4732,6 +4733,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			" isolated(file):%lukB"
 			" mapped:%lukB"
 			" dirty:%lukB"
+			" metadata_dirty:%lukB"
 			" writeback:%lukB"
 			" shmem:%lukB"
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -4753,6 +4755,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(node_page_state(pgdat, NR_ISOLATED_FILE)),
 			K(node_page_state(pgdat, NR_FILE_MAPPED)),
 			K(node_page_state(pgdat, NR_FILE_DIRTY)),
+			K(node_page_state(pgdat, NR_METADATA_DIRTY)),
 			K(node_page_state(pgdat, NR_WRITEBACK)),
 			K(node_page_state(pgdat, NR_SHMEM)),
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 13d711dd8776..0281abd62e87 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3827,7 +3827,8 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
 
 	/* If we can't clean pages, remove dirty pages from consideration */
 	if (!(node_reclaim_mode & RECLAIM_WRITE))
-		delta += node_page_state(pgdat, NR_FILE_DIRTY);
+		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
+			 node_page_state(pgdat, NR_METADATA_DIRTY);
 
 	/* Watch for any possible underflows due to delta */
 	if (unlikely(delta > nr_pagecache_reclaimable))
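
Illustration only, not part of the patch: the four account_metadata_*()
helpers added to mm/page-writeback.c move a metadata page through a small
state machine (clean, dirty, under writeback, clean again). The sketch below
is a user-space model of those counter transitions, written to make the
pairing rules easy to check. It is not kernel code. struct md_counters and
the model_*() names are hypothetical, each field folds a node/zone stat
together with its per-wb counterpart, and the per-task and ratelimit
accounting is left out.

```c
/*
 * User-space model of the counter transitions performed by the
 * account_metadata_*() helpers. Hypothetical names throughout; this only
 * mirrors which counters each helper raises or lowers.
 */
#include <assert.h>

struct md_counters {
	long metadata_dirty;	/* NR_METADATA_DIRTY and WB_METADATA_DIRTY */
	long reclaimable;	/* WB_RECLAIMABLE */
	long write_pending;	/* NR_ZONE_WRITE_PENDING */
	long writeback;		/* NR_WRITEBACK and WB_WRITEBACK */
	long dirtied;		/* NR_DIRTIED and WB_DIRTIED, never decremented */
	long written;		/* NR_WRITTEN, never decremented */
};

/* Mirrors account_metadata_dirtied(): clean -> dirty. */
static void model_dirtied(struct md_counters *c)
{
	c->metadata_dirty++;
	c->write_pending++;
	c->reclaimable++;
	c->dirtied++;
}

/* Mirrors account_metadata_cleaned(): dirty -> clean, nothing written. */
static void model_cleaned(struct md_counters *c)
{
	assert(c->metadata_dirty > 0);	/* must pair with an earlier dirtied */
	c->metadata_dirty--;
	c->write_pending--;
	c->reclaimable--;
}

/* Mirrors account_metadata_writeback(): dirty -> under writeback. */
static void model_writeback(struct md_counters *c)
{
	assert(c->metadata_dirty > 0);	/* only dirty pages start writeback */
	c->writeback++;
	c->metadata_dirty--;
	c->reclaimable--;
	/* write_pending stays raised until the I/O completes */
}

/* Mirrors account_metadata_end_writeback(): under writeback -> clean. */
static void model_end_writeback(struct md_counters *c)
{
	assert(c->writeback > 0);	/* must pair with an earlier writeback */
	c->writeback--;
	c->write_pending--;
	c->written++;
}
```

In this model, driving one page through model_dirtied(), model_writeback()
and model_end_writeback() returns every transient counter to zero and leaves
dirtied and written at 1. model_dirtied() followed by model_cleaned() also
returns the transient counters to zero but leaves written untouched. A file
system using the real helpers would need to keep its calls paired in the
same way.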