diff mbox

[2/2] radix-tree: Fix optimisation problem

Message ID DM2PR21MB0089CA7DCF4845DB02E0E05FCBC80@DM2PR21MB0089.namprd21.prod.outlook.com (mailing list archive)
State New, archived
Headers show

Commit Message

Matthew Wilcox Sept. 23, 2016, 8:16 p.m. UTC
From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
> On Thu, Sep 22, 2016 at 11:53 AM, Matthew Wilcox
> <mawilcox@linuxonhyperv.com> wrote:
> >
> >           Change the test suite to compile with -O2, and
> > fix the optimisation problem by passing 'entry' through entry_to_node()
> > so gcc knows this isn't a plain pointer.
> 
> Ugh. I really don't like this patch very much.
> 
> Wouldn't it be cleaner to just fix "get_slot_offset()" instead? As it
> is, looking at the code, I suspect that it's really hard to convince
> people that there isn't some other place this might happen. Because
> the "pointer subtraction followed by pointer addition" pattern is all
> hidden in these inline functions.
> 
> Or at least add a big comment about why this is the only such case.
> 
> Because without that, the code now looks very bad.

That's fair.  I looked at all the other callers of get_slot_offset, and all the others are using a real slot pointer.  radix_tree_descend() really is the outlier here.  I think the real problem is that the types in the tree are wrong; instead of storing void *, we should be storing uintptr_t.  But fixing that is a little beyond the scope of -rc8.  Here's a slightly better version which asserts that the passed pointer really is a pointer.

(attached as well, I have no idea whether this patch will get mangled)

Comments

Linus Torvalds Sept. 24, 2016, 8:21 p.m. UTC | #1
On Fri, Sep 23, 2016 at 1:16 PM, Matthew Wilcox <mawilcox@microsoft.com> wrote:
>
>  #ifdef CONFIG_RADIX_TREE_MULTIORDER
>         if (radix_tree_is_internal_node(entry)) {
> -               unsigned long siboff = get_slot_offset(parent, entry);
> +               unsigned long siboff = get_slot_offset(parent,
> +                                               (void **)entry_to_node(entry));

I feel that it is *this* part that I think needs a huge honking comment.

If you are going to make get_slot_offset() different, then you could
just rewrite get_slot_offset() to do

        unsigned long diff = (unsigned long) slot - (unsigned
long)parent->slots;
        return diff / sizeof(void *);

and add a comment to say "don't do this as a pointer diff, because
'slot' may not be an aligned pointer". No BUG_ON() necessary, because
it "just works".

At that point, gcc should just generate the right code, because it
doesn't see it as a pointer subtraction followed by a pointer
addition.

And yes, that crazy " (void **)entry_to_node(entry)" fixes it *too*,
but it needs a *comment*.

Why is that special, when all the other uses of get_slot_offset()
don't have that? *That* is what should be explained. Not some internal
detail.

That said, if this code isn't even used, as Konstantin says (THP
selects it - doesn't THP use it?), then the fix really should be to
just remove the odd code instead of adding to it.

Looking around for uses that set "order" to anything but zero, I
really don't see it. So maybe we should just do *that* trivial thing
instead, and remove CONFIG_RADIX_TREE_MULTIORDER, since it's appears
to be buggy and always has been.

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirill A . Shutemov Sept. 24, 2016, 9:04 p.m. UTC | #2
On Sat, Sep 24, 2016 at 01:21:36PM -0700, Linus Torvalds wrote:
> On Fri, Sep 23, 2016 at 1:16 PM, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> >
> >  #ifdef CONFIG_RADIX_TREE_MULTIORDER
> >         if (radix_tree_is_internal_node(entry)) {
> > -               unsigned long siboff = get_slot_offset(parent, entry);
> > +               unsigned long siboff = get_slot_offset(parent,
> > +                                               (void **)entry_to_node(entry));
> 
> I feel that it is *this* part that I think needs a huge honking comment.
> 
> If you are going to make get_slot_offset() different, then you could
> just rewrite get_slot_offset() to do
> 
>         unsigned long diff = (unsigned long) slot - (unsigned
> long)parent->slots;
>         return diff / sizeof(void *);
> 
> and add a comment to say "don't do this as a pointer diff, because
> 'slot' may not be an aligned pointer". No BUG_ON() necessary, because
> it "just works".
> 
> At that point, gcc should just generate the right code, because it
> doesn't see it as a pointer subtraction followed by a pointer
> addition.
> 
> And yes, that crazy " (void **)entry_to_node(entry)" fixes it *too*,
> but it needs a *comment*.
> 
> Why is that special, when all the other uses of get_slot_offset()
> don't have that? *That* is what should be explained. Not some internal
> detail.
> 
> That said, if this code isn't even used, as Konstantin says (THP
> selects it - doesn't THP use it?), then the fix really should be to
> just remove the odd code instead of adding to it.
> 
> Looking around for uses that set "order" to anything but zero, I
> really don't see it. So maybe we should just do *that* trivial thing
> instead, and remove CONFIG_RADIX_TREE_MULTIORDER, since it's appears
> to be buggy and always has been.

Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
It also converts shmem-with-huge-pages and hugetlb to them.

I'm okay with converting it to other mechanism, but I need something.
(I looked into Konstantin's RFC patchset[2]. It looks okay, but I don't
feel myself qualified to review it as I don't know much about radix-tree
internals.)

[1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
[2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg
Linus Torvalds Sept. 24, 2016, 10:54 p.m. UTC | #3
On Sat, Sep 24, 2016 at 2:04 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
> It also converts shmem-with-huge-pages and hugetlb to them.

Ok, so that code actually has a chance of being used. I guess we'll
not remove it. But I *would* like this subtle issue to have a comment
around that odd cast/and/mask thing.

            Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf73..368f641 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -91,9 +91,15 @@  static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
 }
 #endif
 
+/*
+ * The slot pointer must be a real pointer as GCC will optimise
+ * through inlined functions and may deduce that
+ * parent->slots + get_slot_offset(parent, slot) == slot
+ */
 static inline unsigned long get_slot_offset(struct radix_tree_node *parent,
 						 void **slot)
 {
+	BUG_ON(radix_tree_exception(slot));
 	return slot - parent->slots;
 }
 
@@ -101,11 +107,12 @@  static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 			struct radix_tree_node **nodep, unsigned long index)
 {
 	unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
-	void **entry = rcu_dereference_raw(parent->slots[offset]);
+	void *entry = rcu_dereference_raw(parent->slots[offset]);
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
+		unsigned long siboff = get_slot_offset(parent,
+						(void **)entry_to_node(entry));
 		if (siboff < RADIX_TREE_MAP_SIZE) {
 			offset = siboff;
 			entry = rcu_dereference_raw(parent->slots[offset]);
@@ -113,7 +120,7 @@  static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 	}
 #endif
 
-	*nodep = (void *)entry;
+	*nodep = entry;
 	return offset;
 }
 
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 3b53046..9d0919ed 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -1,5 +1,5 @@ 
 
-CFLAGS += -I. -g -Wall -D_LGPL_SOURCE
+CFLAGS += -I. -g -O2 -Wall -D_LGPL_SOURCE
 LDFLAGS += -lpthread -lurcu
 TARGETS = main
 OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \