[2/2] radix-tree: Fix optimisation problem

Message ID	CA+55aFw9=wqyA4xO1KKJoH7xsj6poWFrWTddcNBR5tkDOn8SYg@mail.gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> MIME-Version: 1.0 In-Reply-To: <CA+55aFxOJTOvxhv+hECHuGV+=xBHMuQitu86J=qBNmMYQ1ACSg@mail.gmail.com> References: <1474570415-14938-1-git-send-email-mawilcox@linuxonhyperv.com> <1474570415-14938-3-git-send-email-mawilcox@linuxonhyperv.com> <CALXu0Ucx-6PeEk9nTD-4nZvwyVr9LLXcFGFzhctX-ucKfCygGA@mail.gmail.com> <CA+55aFyRG=us-EKnomo=QPE0GR1Qdxyw1Ozmuzw0EJcSr7U3hQ@mail.gmail.com> <CALXu0UfuwGM+H0YnfSNW6O=hgcUrmws+ihHLVB=OJWOp8YCwgw@mail.gmail.com> <CA+55aFzge97L-JLKZq0CTW1wtMOsnt8QzOw3b5qCMmzbKxZ5aw@mail.gmail.com> <CA+55aFxOJTOvxhv+hECHuGV+=xBHMuQitu86J=qBNmMYQ1ACSg@mail.gmail.com> From: Linus Torvalds <torvalds@linux-foundation.org> Date: Sun, 25 Sep 2016 12:56:17 -0700 Message-ID: <CA+55aFw9=wqyA4xO1KKJoH7xsj6poWFrWTddcNBR5tkDOn8SYg@mail.gmail.com> Subject: Re: [PATCH 2/2] radix-tree: Fix optimisation problem To: Cedric Blancher <cedric.blancher@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>, Matthew Wilcox <mawilcox@linuxonhyperv.com>, Konstantin Khlebnikov <koct9i@gmail.com>, Andrew Morton <akpm@linux-foundation.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, Linux MM <linux-mm@kvack.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Matthew Wilcox <mawilcox@microsoft.com> Content-Type: multipart/mixed; boundary=001a113cf2e29b8734053d5a673e Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk

Message ID

CA+55aFw9=wqyA4xO1KKJoH7xsj6poWFrWTddcNBR5tkDOn8SYg@mail.gmail.com (mailing list archive)

State

New, archived

Headers

MIME-Version: 1.0
In-Reply-To: <CA+55aFxOJTOvxhv+hECHuGV+=xBHMuQitu86J=qBNmMYQ1ACSg@mail.gmail.com>
References: <1474570415-14938-1-git-send-email-mawilcox@linuxonhyperv.com>
	<1474570415-14938-3-git-send-email-mawilcox@linuxonhyperv.com>
	<CALXu0Ucx-6PeEk9nTD-4nZvwyVr9LLXcFGFzhctX-ucKfCygGA@mail.gmail.com>
	<CA+55aFyRG=us-EKnomo=QPE0GR1Qdxyw1Ozmuzw0EJcSr7U3hQ@mail.gmail.com>
	<CALXu0UfuwGM+H0YnfSNW6O=hgcUrmws+ihHLVB=OJWOp8YCwgw@mail.gmail.com>
	<CA+55aFzge97L-JLKZq0CTW1wtMOsnt8QzOw3b5qCMmzbKxZ5aw@mail.gmail.com>
	<CA+55aFxOJTOvxhv+hECHuGV+=xBHMuQitu86J=qBNmMYQ1ACSg@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 25 Sep 2016 12:56:17 -0700
Message-ID: <CA+55aFw9=wqyA4xO1KKJoH7xsj6poWFrWTddcNBR5tkDOn8SYg@mail.gmail.com>
Subject: Re: [PATCH 2/2] radix-tree: Fix optimisation problem
To: Cedric Blancher <cedric.blancher@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Matthew Wilcox <mawilcox@linuxonhyperv.com>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <mawilcox@microsoft.com>
Content-Type: multipart/mixed; boundary=001a113cf2e29b8734053d5a673e
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk

Commit Message

Linus Torvalds Sept. 25, 2016, 7:56 p.m. UTC

On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>        It gets rid of
> the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
> be inside the is_sibling_entry() logic instead. Which got renamed and
> made to actually return the main sibling.

Sadly, it looks like gcc generates bad code for this approach. Looks
like it ends up testing the resulting sibling pointer twice (because
we explicitly disable -fno-delete-null-pointer-checks in the kernel,
and we have no way to say "look, I know this pointer I'm returning is
non-null").

So a smaller patch that keeps the old boolean "is_sibling_entry()" but
then actually *uses* that inside radix_tree_descend() and then tries
to make the nasty cast to "void **" more legible by making it use a
temporary variable seems to be a reasonable balance.

At least I feel like I can still read the code, but admittedly by now
that may be because I've stared at those few lines so much that I feel
like I know what's going on. So maybe the code isn't actually any more
legible after all.

.. and unlike my previous patch, it actually generates better code
than the original (while still passing the fixed test-suite, of
course). The reason seems to be exactly that temporary variable,
allowing us to just do

        entry = rcu_dereference_raw(*sibentry);

rather than doing

        entry = rcu_dereference_raw(parent->slots[offset]);

with the re-computed offset.

So I think I'll commit this unless somebody screams.

                     Linus

Comments

Matthew Wilcox Sept. 26, 2016, 9:28 p.m. UTC | #1

From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds

> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds

> <torvalds@linux-foundation.org> wrote:

> >        It gets rid of

> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that

> > be inside the is_sibling_entry() logic instead. Which got renamed and

> > made to actually return the main sibling.

> 

> Sadly, it looks like gcc generates bad code for this approach. Looks

> like it ends up testing the resulting sibling pointer twice (because

> we explicitly disable -fno-delete-null-pointer-checks in the kernel,

> and we have no way to say "look, I know this pointer I'm returning is

> non-null").

> 

> So a smaller patch that keeps the old boolean "is_sibling_entry()" but

> then actually *uses* that inside radix_tree_descend() and then tries

> to make the nasty cast to "void **" more legible by making it use a

> temporary variable seems to be a reasonable balance.

> 

> At least I feel like I can still read the code, but admittedly by now

> that may be because I've stared at those few lines so much that I feel

> like I know what's going on. So maybe the code isn't actually any more

> legible after all.

> 

> .. and unlike my previous patch, it actually generates better code

> than the original (while still passing the fixed test-suite, of

> course). The reason seems to be exactly that temporary variable,

> allowing us to just do

> 

>         entry = rcu_dereference_raw(*sibentry);

> 

> rather than doing

> 

>         entry = rcu_dereference_raw(parent->slots[offset]);

> 

> with the re-computed offset.

> 

> So I think I'll commit this unless somebody screams.


Acked-by: Matthew Wilcox <mawilcox@microsoft.com>


I don't love it.  But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot.

(Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).

Cedric Blancher Sept. 26, 2016, 9:48 p.m. UTC | #2

You might also try to use valid, plain ISO C99 instead of perverted
gcc extensions which only cause a lot of trouble in the long run.

Ced

On 26 September 2016 at 23:28, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
>> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >        It gets rid of
>> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
>> > be inside the is_sibling_entry() logic instead. Which got renamed and
>> > made to actually return the main sibling.
>>
>> Sadly, it looks like gcc generates bad code for this approach. Looks
>> like it ends up testing the resulting sibling pointer twice (because
>> we explicitly disable -fno-delete-null-pointer-checks in the kernel,
>> and we have no way to say "look, I know this pointer I'm returning is
>> non-null").
>>
>> So a smaller patch that keeps the old boolean "is_sibling_entry()" but
>> then actually *uses* that inside radix_tree_descend() and then tries
>> to make the nasty cast to "void **" more legible by making it use a
>> temporary variable seems to be a reasonable balance.
>>
>> At least I feel like I can still read the code, but admittedly by now
>> that may be because I've stared at those few lines so much that I feel
>> like I know what's going on. So maybe the code isn't actually any more
>> legible after all.
>>
>> .. and unlike my previous patch, it actually generates better code
>> than the original (while still passing the fixed test-suite, of
>> course). The reason seems to be exactly that temporary variable,
>> allowing us to just do
>>
>>         entry = rcu_dereference_raw(*sibentry);
>>
>> rather than doing
>>
>>         entry = rcu_dereference_raw(parent->slots[offset]);
>>
>> with the re-computed offset.
>>
>> So I think I'll commit this unless somebody screams.
>
> Acked-by: Matthew Wilcox <mawilcox@microsoft.com>
>
> I don't love it.  But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot.
>
> (Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).

 lib/radix-tree.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf7314141..91f0727e3cad 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -105,10 +105,10 @@  static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
-		if (siboff < RADIX_TREE_MAP_SIZE) {
-			offset = siboff;
-			entry = rcu_dereference_raw(parent->slots[offset]);
+		if (is_sibling_entry(parent, entry)) {
+			void **sibentry = (void **) entry_to_node(entry);
+			offset = get_slot_offset(parent, sibentry);
+			entry = rcu_dereference_raw(*sibentry);
 		}
 	}
 #endif

[2/2] radix-tree: Fix optimisation problem

Commit Message

Comments

Patch