Message ID | CA+55aFw9=wqyA4xO1KKJoH7xsj6poWFrWTddcNBR5tkDOn8SYg@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds > On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > It gets rid of > > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that > > be inside the is_sibling_entry() logic instead. Which got renamed and > > made to actually return the main sibling. > > Sadly, it looks like gcc generates bad code for this approach. Looks > like it ends up testing the resulting sibling pointer twice (because > we explicitly disable -fno-delete-null-pointer-checks in the kernel, > and we have no way to say "look, I know this pointer I'm returning is > non-null"). > > So a smaller patch that keeps the old boolean "is_sibling_entry()" but > then actually *uses* that inside radix_tree_descend() and then tries > to make the nasty cast to "void **" more legible by making it use a > temporary variable seems to be a reasonable balance. > > At least I feel like I can still read the code, but admittedly by now > that may be because I've stared at those few lines so much that I feel > like I know what's going on. So maybe the code isn't actually any more > legible after all. > > .. and unlike my previous patch, it actually generates better code > than the original (while still passing the fixed test-suite, of > course). The reason seems to be exactly that temporary variable, > allowing us to just do > > entry = rcu_dereference_raw(*sibentry); > > rather than doing > > entry = rcu_dereference_raw(parent->slots[offset]); > > with the re-computed offset. > > So I think I'll commit this unless somebody screams. Acked-by: Matthew Wilcox <mawilcox@microsoft.com> I don't love it. But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot. (Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).
You might also try to use valid, plain ISO C99 instead of perverted gcc extensions which only cause a lot of trouble in the long run. Ced On 26 September 2016 at 23:28, Matthew Wilcox <mawilcox@microsoft.com> wrote: > From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds >> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds >> <torvalds@linux-foundation.org> wrote: >> > It gets rid of >> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that >> > be inside the is_sibling_entry() logic instead. Which got renamed and >> > made to actually return the main sibling. >> >> Sadly, it looks like gcc generates bad code for this approach. Looks >> like it ends up testing the resulting sibling pointer twice (because >> we explicitly disable -fno-delete-null-pointer-checks in the kernel, >> and we have no way to say "look, I know this pointer I'm returning is >> non-null"). >> >> So a smaller patch that keeps the old boolean "is_sibling_entry()" but >> then actually *uses* that inside radix_tree_descend() and then tries >> to make the nasty cast to "void **" more legible by making it use a >> temporary variable seems to be a reasonable balance. >> >> At least I feel like I can still read the code, but admittedly by now >> that may be because I've stared at those few lines so much that I feel >> like I know what's going on. So maybe the code isn't actually any more >> legible after all. >> >> .. and unlike my previous patch, it actually generates better code >> than the original (while still passing the fixed test-suite, of >> course). The reason seems to be exactly that temporary variable, >> allowing us to just do >> >> entry = rcu_dereference_raw(*sibentry); >> >> rather than doing >> >> entry = rcu_dereference_raw(parent->slots[offset]); >> >> with the re-computed offset. >> >> So I think I'll commit this unless somebody screams. > > Acked-by: Matthew Wilcox <mawilcox@microsoft.com> > > I don't love it. But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot. > > (Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).
lib/radix-tree.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 1b7bf7314141..91f0727e3cad 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -105,10 +105,10 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent, #ifdef CONFIG_RADIX_TREE_MULTIORDER if (radix_tree_is_internal_node(entry)) { - unsigned long siboff = get_slot_offset(parent, entry); - if (siboff < RADIX_TREE_MAP_SIZE) { - offset = siboff; - entry = rcu_dereference_raw(parent->slots[offset]); + if (is_sibling_entry(parent, entry)) { + void **sibentry = (void **) entry_to_node(entry); + offset = get_slot_offset(parent, sibentry); + entry = rcu_dereference_raw(*sibentry); } } #endif