Message ID | ZCXipBvmhAC1+eRi@tpad (mailing list archive)
---|---
State | Mainlined, archived
Series | [v4] fs/buffer.c: update per-CPU bh_lru cache via RCU
Friendly ping ?

On Thu, Mar 30, 2023 at 04:27:32PM -0300, Marcelo Tosatti wrote:
>
> For certain types of applications (for example PLC software or
> RAN processing), upon occurrence of an event, it is necessary to
> complete a certain task in a maximum amount of time (deadline).
>
> One way to express this requirement is with a pair of numbers,
> deadline time and execution time, where:
>
>   * deadline time: length of time between event and deadline.
>   * execution time: length of time it takes for processing of event
>                     to occur on a particular hardware platform
>                     (uninterrupted).
>
> The particular values depend on use-case. For the case
> where the realtime application executes in a virtualized
> guest, an IPI which must be serviced in the host will cause
> the following sequence of events:
>
>   1) VM-exit
>   2) execution of IPI (and function call)
>   3) VM-entry
>
> Which causes an excess of 50us latency as observed by cyclictest
> (this violates the latency requirement of vRAN application with 1ms TTI,
> for example).
>
> invalidate_bh_lrus calls an IPI on each CPU that has non empty
> per-CPU cache:
>
>     on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
>
> To avoid the IPI, free the per-CPU caches remotely via RCU.
> Two bh_lrus structures for each CPU are allocated: one is being
> used (assigned to per-CPU bh_lru pointer), and the other is
> being freed (or idle).
>
> An alternative solution would be to protect the fast path
> (__find_get_block) with a per-CPU spinlock. Then grab the
> lock from invalidate_bh_lru, when evaluating whether a given
> CPUs buffer_head cache should be invalidated.
> This solution would slow down the fast path.
>
> Numbers (16 vCPU guest) for the following test:
>
>     for i in `seq 0 50`;
>         mount -o loop alpine-standard-3.17.1-x86_64.iso /mnt/loop
>         umount /mnt/loop
>     done
>
> Where the time being measured is time between invalidate_bh_lrus
> function call start and return.
>
> Unpatched: average is 2us
>
>              ┌                                        ┐
> [ 0.0,  2.0) ┤████████████████████████▊ 53
> [ 2.0,  4.0) ┤████████████████████████████████████ 77
> [ 4.0,  6.0) ┤████████▍ 18
> [ 6.0,  8.0) ┤▌ 1
> [ 8.0, 10.0) ┤ 0
> [10.0, 12.0) ┤ 0
> [12.0, 14.0) ┤▌ 1
> [14.0, 16.0) ┤ 0
> [16.0, 18.0) ┤▌ 1
>              └                                        ┘
>                              Frequency
>
> Patched: average is 16us
>
>              ┌                                        ┐
> [ 0.0, 10.0) ┤██████████████████▍ 35
> [10.0, 20.0) ┤████████████████████████████████████ 69
> [20.0, 30.0) ┤██████████████████▍ 35
> [30.0, 40.0) ┤████▎ 8
> [40.0, 50.0) ┤█▌ 3
> [50.0, 60.0) ┤█▏ 2
>              └                                        ┘
>                              Frequency
>
> The fact that invalidate_bh_lru() is now serialized should not be
> an issue, since invalidate_bdev does:
>
>     /* Invalidate clean unused buffers and pagecache. */
>     void invalidate_bdev(struct block_device *bdev)
>     {
>             struct address_space *mapping = bdev->bd_inode->i_mapping;
>
>             if (mapping->nrpages) {
>                     invalidate_bh_lrus();
>                     lru_add_drain_all();    /* make sure all lru add caches are flushed */
>                     invalidate_mapping_pages(mapping, 0, -1);
>             }
>     }
>
> Where lru_add_drain_all() is serialized by a single mutex lock
> (and there have been no reported use cases where this
> serialization is an issue).
>
> Regarding scalability, considering the results above where
> it takes 16us to execute invalidate_bh_lrus on 16 CPUs
> (where 8us are taken by synchronize_rcu_expedited),
> we can assume 500ns per CPU. For a system with
> 1024 CPUs, we can infer 8us + 1024*500ns ~= 500us
> (which seems acceptable).
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> ---
>
> v4: improved changelog, no code change (Dave Chinner)
> v3: fix CPU hotplug
> v2: fix sparse warnings (kernel test robot)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 9e1e2add541e..e9b4d579eff0 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1246,7 +1246,21 @@ struct bh_lru {
>  	struct buffer_head *bhs[BH_LRU_SIZE];
>  };
>  
> -static DEFINE_PER_CPU(struct bh_lru, bh_lrus) = {{ NULL }};
> +
> +/*
> + * Allocate two bh_lrus structures for each CPU. bh_lru points to the
> + * one that is currently in use, and the update path does
> + * (consider cpu->bh_lru = bh_lrus[0]).
> + *
> + * cpu->bh_lrup = bh_lrus[1]
> + * synchronize_rcu()
> + * free bh's in bh_lrus[0]
> + */
> +static unsigned int bh_lru_idx;
> +static DEFINE_PER_CPU(struct bh_lru, bh_lrus[2]) = {{{ NULL }}, {{NULL}}};
> +static DEFINE_PER_CPU(struct bh_lru __rcu *, bh_lrup);
> +
> +static DEFINE_MUTEX(bh_lru_invalidate_mutex);
>  
>  #ifdef CONFIG_SMP
>  #define bh_lru_lock()	local_irq_disable()
> @@ -1288,16 +1302,19 @@ static void bh_lru_install(struct buffer_head *bh)
>  		return;
>  	}
>  
> -	b = this_cpu_ptr(&bh_lrus);
> +	rcu_read_lock();
> +	b = rcu_dereference(per_cpu(bh_lrup, smp_processor_id()));
>  	for (i = 0; i < BH_LRU_SIZE; i++) {
>  		swap(evictee, b->bhs[i]);
>  		if (evictee == bh) {
> +			rcu_read_unlock();
>  			bh_lru_unlock();
>  			return;
>  		}
>  	}
>  
>  	get_bh(bh);
> +	rcu_read_unlock();
>  	bh_lru_unlock();
>  	brelse(evictee);
>  }
> @@ -1309,28 +1326,32 @@ static struct buffer_head *
>  lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
>  {
>  	struct buffer_head *ret = NULL;
> +	struct bh_lru *lru;
>  	unsigned int i;
>  
>  	check_irqs_on();
>  	bh_lru_lock();
> +	rcu_read_lock();
> +
> +	lru = rcu_dereference(per_cpu(bh_lrup, smp_processor_id()));
>  	for (i = 0; i < BH_LRU_SIZE; i++) {
> -		struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);
> +		struct buffer_head *bh = lru->bhs[i];
>  
>  		if (bh && bh->b_blocknr == block && bh->b_bdev == bdev &&
>  		    bh->b_size == size) {
>  			if (i) {
>  				while (i) {
> -					__this_cpu_write(bh_lrus.bhs[i],
> -						__this_cpu_read(bh_lrus.bhs[i - 1]));
> +					lru->bhs[i] = lru->bhs[i - 1];
>  					i--;
>  				}
> -				__this_cpu_write(bh_lrus.bhs[0], bh);
> +				lru->bhs[0] = bh;
>  			}
>  			get_bh(bh);
>  			ret = bh;
>  			break;
>  		}
>  	}
> +	rcu_read_unlock();
>  	bh_lru_unlock();
>  	return ret;
>  }
> @@ -1424,35 +1445,54 @@ static void __invalidate_bh_lrus(struct bh_lru *b)
>  		b->bhs[i] = NULL;
>  	}
>  }
> -/*
> - * invalidate_bh_lrus() is called rarely - but not only at unmount.
> - * This doesn't race because it runs in each cpu either in irq
> - * or with preempt disabled.
> - */
> -static void invalidate_bh_lru(void *arg)
> -{
> -	struct bh_lru *b = &get_cpu_var(bh_lrus);
> -
> -	__invalidate_bh_lrus(b);
> -	put_cpu_var(bh_lrus);
> -}
>  
>  bool has_bh_in_lru(int cpu, void *dummy)
>  {
> -	struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
> +	struct bh_lru *b;
>  	int i;
> -
> +
> +	rcu_read_lock();
> +	b = rcu_dereference(per_cpu(bh_lrup, cpu));
>  	for (i = 0; i < BH_LRU_SIZE; i++) {
> -		if (b->bhs[i])
> +		if (b->bhs[i]) {
> +			rcu_read_unlock();
>  			return true;
> +		}
>  	}
>  
> +	rcu_read_unlock();
>  	return false;
>  }
>  
> +/*
> + * invalidate_bh_lrus() is called rarely - but not only at unmount.
> + */
>  void invalidate_bh_lrus(void)
>  {
> -	on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
> +	int cpu, oidx;
> +
> +	mutex_lock(&bh_lru_invalidate_mutex);
> +	cpus_read_lock();
> +	oidx = bh_lru_idx;
> +	bh_lru_idx++;
> +	if (bh_lru_idx >= 2)
> +		bh_lru_idx = 0;
> +
> +	/* Assign the per-CPU bh_lru pointer */
> +	for_each_online_cpu(cpu)
> +		rcu_assign_pointer(per_cpu(bh_lrup, cpu),
> +				   per_cpu_ptr(&bh_lrus[bh_lru_idx], cpu));
> +	synchronize_rcu_expedited();
> +
> +	for_each_online_cpu(cpu) {
> +		struct bh_lru *b = per_cpu_ptr(&bh_lrus[oidx], cpu);
> +
> +		bh_lru_lock();
> +		__invalidate_bh_lrus(b);
> +		bh_lru_unlock();
> +	}
> +	cpus_read_unlock();
> +	mutex_unlock(&bh_lru_invalidate_mutex);
>  }
>  EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
>  
> @@ -1465,8 +1505,10 @@ void invalidate_bh_lrus_cpu(void)
>  	struct bh_lru *b;
>  
>  	bh_lru_lock();
> -	b = this_cpu_ptr(&bh_lrus);
> +	rcu_read_lock();
> +	b = rcu_dereference(per_cpu(bh_lrup, smp_processor_id()));
>  	__invalidate_bh_lrus(b);
> +	rcu_read_unlock();
>  	bh_lru_unlock();
>  }
>  
> @@ -2968,15 +3010,25 @@ void free_buffer_head(struct buffer_head *bh)
>  }
>  EXPORT_SYMBOL(free_buffer_head);
>  
> +static int buffer_cpu_online(unsigned int cpu)
> +{
> +	rcu_assign_pointer(per_cpu(bh_lrup, cpu),
> +			   per_cpu_ptr(&bh_lrus[bh_lru_idx], cpu));
> +	return 0;
> +}
> +
>  static int buffer_exit_cpu_dead(unsigned int cpu)
>  {
>  	int i;
> -	struct bh_lru *b = &per_cpu(bh_lrus, cpu);
> +	struct bh_lru *b;
>  
> +	rcu_read_lock();
> +	b = rcu_dereference(per_cpu(bh_lrup, cpu));
>  	for (i = 0; i < BH_LRU_SIZE; i++) {
>  		brelse(b->bhs[i]);
>  		b->bhs[i] = NULL;
>  	}
> +	rcu_read_unlock();
>  	this_cpu_add(bh_accounting.nr, per_cpu(bh_accounting, cpu).nr);
>  	per_cpu(bh_accounting, cpu).nr = 0;
>  	return 0;
> @@ -3069,7 +3121,7 @@ EXPORT_SYMBOL(__bh_read_batch);
>  void __init buffer_init(void)
>  {
>  	unsigned long nrpages;
> -	int ret;
> +	int ret, cpu;
>  
>  	bh_cachep = kmem_cache_create("buffer_head",
>  			sizeof(struct buffer_head), 0,
> @@ -3077,6 +3129,11 @@ void __init buffer_init(void)
>  				SLAB_MEM_SPREAD),
>  				NULL);
>  
> +	cpus_read_lock();
> +	for_each_online_cpu(cpu)
> +		rcu_assign_pointer(per_cpu(bh_lrup, cpu), per_cpu_ptr(&bh_lrus[0], cpu));
> +	cpus_read_unlock();
> +
>  	/*
>  	 * Limit the bh occupancy to 10% of ZONE_NORMAL
>  	 */
> @@ -3085,4 +3142,7 @@ void __init buffer_init(void)
>  	ret = cpuhp_setup_state_nocalls(CPUHP_FS_BUFF_DEAD, "fs/buffer:dead",
>  					NULL, buffer_exit_cpu_dead);
>  	WARN_ON(ret < 0);
> +	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "fs/buffer:online",
> +					NULL, buffer_cpu_online);
> +	WARN_ON(ret < 0);
>  }
>
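As background for the discussion below: the pattern the changelog describes is to keep two LRU structures per CPU, flip an RCU-protected pointer from the live one to the idle one, wait for a grace period, then purge the now-unreachable buffer. The following is a minimal single-threaded userspace sketch of that pattern, not the kernel code from the patch; it assumes the classic liburcu flavour (urcu.h, built with something like gcc sketch.c -lurcu -lpthread), and the names lru_cur, lookup(), invalidate() and LRU_SIZE are illustrative only.

    /*
     * Sketch of the double-buffer + RCU swap pattern (illustrative names,
     * liburcu assumed; this is not fs/buffer.c).
     */
    #include <stdio.h>
    #include <urcu.h>	/* rcu_read_lock(), rcu_assign_pointer(), synchronize_rcu() */

    #define LRU_SIZE 16

    struct lru {
    	void *slots[LRU_SIZE];
    };

    /* Two buffers; readers follow the RCU-protected pointer to the live one. */
    static struct lru lrus[2];
    static struct lru *lru_cur = &lrus[0];
    static unsigned int lru_idx;

    /* Fast path: no lock, only an RCU read-side critical section. */
    static void *lookup(int slot)
    {
    	struct lru *lru;
    	void *p;

    	rcu_read_lock();
    	lru = rcu_dereference(lru_cur);
    	p = lru->slots[slot];
    	rcu_read_unlock();
    	return p;
    }

    /*
     * Slow path: point readers at the idle buffer, wait for a grace period so
     * no reader can still see the old one, then purge it. The patch does this
     * per CPU and drops the buffer_head references via brelse().
     */
    static void invalidate(void)
    {
    	struct lru *old = &lrus[lru_idx];
    	int i;

    	lru_idx = !lru_idx;
    	rcu_assign_pointer(lru_cur, &lrus[lru_idx]);
    	synchronize_rcu();
    	for (i = 0; i < LRU_SIZE; i++)
    		old->slots[i] = NULL;
    }

    int main(void)
    {
    	rcu_register_thread();		/* liburcu readers must register */
    	lrus[0].slots[0] = "cached object";
    	printf("before invalidate: %p\n", lookup(0));
    	invalidate();
    	printf("after invalidate:  %p\n", lookup(0));
    	rcu_unregister_thread();
    	return 0;
    }

In the patch itself the same swap-and-purge sequence runs for every online CPU under bh_lru_invalidate_mutex and cpus_read_lock(), with synchronize_rcu_expedited() used to keep the grace-period wait short.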
On 30/03/23 16:27, Marcelo Tosatti wrote:
> +/*
> + * invalidate_bh_lrus() is called rarely - but not only at unmount.
> + */
>  void invalidate_bh_lrus(void)
>  {
> -	on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
> +	int cpu, oidx;
> +
> +	mutex_lock(&bh_lru_invalidate_mutex);
> +	cpus_read_lock();
> +	oidx = bh_lru_idx;
> +	bh_lru_idx++;
> +	if (bh_lru_idx >= 2)
> +		bh_lru_idx = 0;
> +

You could make this a bool and flip it:

	bh_lru_idx = !bh_lru_idx

> +	/* Assign the per-CPU bh_lru pointer */
> +	for_each_online_cpu(cpu)
> +		rcu_assign_pointer(per_cpu(bh_lrup, cpu),
> +				   per_cpu_ptr(&bh_lrus[bh_lru_idx], cpu));
> +	synchronize_rcu_expedited();
> +
> +	for_each_online_cpu(cpu) {
> +		struct bh_lru *b = per_cpu_ptr(&bh_lrus[oidx], cpu);
> +
> +		bh_lru_lock();
> +		__invalidate_bh_lrus(b);
> +		bh_lru_unlock();

Given the bh_lrup has been updated and we're past the synchronize_rcu(),
what is bh_lru_lock() used for here?

> +	}
> +	cpus_read_unlock();
> +	mutex_unlock(&bh_lru_invalidate_mutex);

Re scalability, this is shifting a set of per-CPU-IPI callbacks to a
single CPU, which isn't great. Can we consider doing something like [1],
i.e. in the general case send an IPI to:

	rcu_assign_pointer() + call_rcu(/* invalidation callback */)

and in the case we're NOHZ_FULL and the target CPU is not executing in
the kernel, we do that remotely to reduce interference.

We might want to batch the synchronize_rcu() for the remote invalidates,
maybe some abuse of the API like so?

	bool do_local_invalidate(int cpu, struct cpumask *mask)
	{
		if (cpu_in_kernel(cpu)) {
			__cpumask_clear_cpu(cpu, mask);
			return true;
		}

		return false;
	}

	void invalidate_bh_lrus(void)
	{
		cpumask_var_t cpumask;

		cpus_read_lock();
		cpumask_copy(&cpumask, cpu_online_mask);

		on_each_cpu_cond(do_local_invalidate, invalidate_bh_lru, &cpumask, 1);

		for_each_cpu(cpu, &cpumask)
			rcu_assign_pointer(per_cpu(bh_lrup, cpu),
					   per_cpu_ptr(&bh_lrus[bh_lru_idx], cpu));

		synchronize_rcu_expedited();

		for_each_cpu(cpu, &cpumask) {
			// Do remote invalidate here
		}
	}

[1]: https://lore.kernel.org/lkml/20230404134224.137038-4-ypodemsk@redhat.com/

>  }
>  EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
>  
> @@ -1465,8 +1505,10 @@ void invalidate_bh_lrus_cpu(void)
>  	struct bh_lru *b;
>  
>  	bh_lru_lock();
> -	b = this_cpu_ptr(&bh_lrus);
> +	rcu_read_lock();
> +	b = rcu_dereference(per_cpu(bh_lrup, smp_processor_id()));
>  	__invalidate_bh_lrus(b);
> +	rcu_read_unlock();
>  	bh_lru_unlock();
>  }
>  
> @@ -2968,15 +3010,25 @@ void free_buffer_head(struct buffer_head *bh)
>  }
>  EXPORT_SYMBOL(free_buffer_head);
>  
> +static int buffer_cpu_online(unsigned int cpu)
> +{
> +	rcu_assign_pointer(per_cpu(bh_lrup, cpu),
> +			   per_cpu_ptr(&bh_lrus[bh_lru_idx], cpu));
> +	return 0;
> +}

What serializes this against invalidate_bh_lrus()? Are you relying on this
running under cpus_write_lock()?