Message ID | 20170108020200.GA16312@cmpxchg.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, Jan 7, 2017 at 6:02 PM, Johannes Weiner <hannes@cmpxchg.org> wrote: > > Linus? Andrew? Looks fine to me. Will apply. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat 07-01-17 21:02:00, Johannes Weiner wrote: > On Tue, Jan 03, 2017 at 01:28:25PM +0100, Jan Kara wrote: > > On Mon 02-01-17 16:11:36, Johannes Weiner wrote: > > > On Fri, Dec 23, 2016 at 03:33:29AM -0500, Johannes Weiner wrote: > > > > On Fri, Dec 23, 2016 at 02:32:41AM -0500, Johannes Weiner wrote: > > > > > On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote: > > > > > > On Wed, 21 Dec 2016, Linus Torvalds wrote: > > > > > > > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > I unmounted the fs, mkfs'd it again, ran the > > > > > > > > workload again and about a minute in this fired: > > > > > > > > > > > > > > > > [628867.607417] ------------[ cut here ]------------ > > > > > > > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220 > > > > > > > > > > > > > > Well, part of the changes during the merge window were the shadow > > > > > > > entry tracking changes that came in through Andrew's tree. Adding > > > > > > > Johannes Weiner to the participants. > > Okay, the below patch should address this problem. Dave Jones managed > to reproduce it with the added WARN_ONs, and they made it obvious. He > cannot trigger it anymore with this fix applied. Thanks Dave! FWIW the patch looks good to me. I'd just note that the need to pass the callback to deletion function and the fact that we do it only in cases where we think it is needed appears errorprone. With the warning you've added it should at least catch the cases where we got it wrong but more robust would be if the radix tree root contained a pointer to the callback function so that we would not rely on passing the callback to every place which can possibly free a node. Also conceptually this would make more sense to me since the fact that we may need to do some cleanup on node deletion is a property of the particular radix tree and how we use it. OTOH that would mean growing radix tree root by one pointer which would be unpopular I guess. Honza
On Mon, Jan 09, 2017 at 09:30:05PM +0100, Jan Kara wrote: > On Sat 07-01-17 21:02:00, Johannes Weiner wrote: > > On Tue, Jan 03, 2017 at 01:28:25PM +0100, Jan Kara wrote: > > > On Mon 02-01-17 16:11:36, Johannes Weiner wrote: > > > > On Fri, Dec 23, 2016 at 03:33:29AM -0500, Johannes Weiner wrote: > > > > > On Fri, Dec 23, 2016 at 02:32:41AM -0500, Johannes Weiner wrote: > > > > > > On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote: > > > > > > > On Wed, 21 Dec 2016, Linus Torvalds wrote: > > > > > > > > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > > I unmounted the fs, mkfs'd it again, ran the > > > > > > > > > workload again and about a minute in this fired: > > > > > > > > > > > > > > > > > > [628867.607417] ------------[ cut here ]------------ > > > > > > > > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220 > > > > > > > > > > > > > > > > Well, part of the changes during the merge window were the shadow > > > > > > > > entry tracking changes that came in through Andrew's tree. Adding > > > > > > > > Johannes Weiner to the participants. > > > > Okay, the below patch should address this problem. Dave Jones managed > > to reproduce it with the added WARN_ONs, and they made it obvious. He > > cannot trigger it anymore with this fix applied. Thanks Dave! > > FWIW the patch looks good to me. I'd just note that the need to pass the > callback to deletion function and the fact that we do it only in cases > where we think it is needed appears errorprone. With the warning you've > added it should at least catch the cases where we got it wrong but more > robust would be if the radix tree root contained a pointer to the callback > function so that we would not rely on passing the callback to every place > which can possibly free a node. Also conceptually this would make more > sense to me since the fact that we may need to do some cleanup on node > deletion is a property of the particular radix tree and how we use it. > OTOH that would mean growing radix tree root by one pointer which would be > unpopular I guess. The last sentence is the crux, unfortunately. The first iteration of the shadow shrinker linked up mappings that contained shadow entries, rather than nodes. The code would have been drastically simpler in pretty much all regards, without changes to the radix tree API. But Dave Chinner objected to adding a pointer to the inode when we could stick them into empty space (slab fragmentation) inside the nodes. A fair point, I guess, especially when you consider metadata-intense workloads. But now we're kind of stuck with the complexity of this. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 5dea8f6440e4..52bda854593b 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -306,7 +306,9 @@ void radix_tree_iter_replace(struct radix_tree_root *, void radix_tree_replace_slot(struct radix_tree_root *root, void **slot, void *item); void __radix_tree_delete_node(struct radix_tree_root *root, - struct radix_tree_node *node); + struct radix_tree_node *node, + radix_tree_update_node_t update_node, + void *private); void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *); void *radix_tree_delete(struct radix_tree_root *, unsigned long); void radix_tree_clear_tags(struct radix_tree_root *root, diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 6f382e07de77..0b92d605fb69 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -640,6 +640,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root, update_node(node, private); } + WARN_ON_ONCE(!list_empty(&node->private_list)); radix_tree_node_free(node); } } @@ -666,6 +667,7 @@ static void delete_node(struct radix_tree_root *root, root->rnode = NULL; } + WARN_ON_ONCE(!list_empty(&node->private_list)); radix_tree_node_free(node); node = parent; @@ -767,6 +769,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node) struct radix_tree_node *old = child; offset = child->offset + 1; child = child->parent; + WARN_ON_ONCE(!list_empty(&node->private_list)); radix_tree_node_free(old); if (old == entry_to_node(node)) return; @@ -1824,15 +1827,19 @@ EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot); * __radix_tree_delete_node - try to free node after clearing a slot * @root: radix tree root * @node: node containing @index + * @update_node: callback for changing leaf nodes + * @private: private data to pass to @update_node * * After clearing the slot at @index in @node from radix tree * rooted at @root, call this function to attempt freeing the * node and shrinking the tree. */ void __radix_tree_delete_node(struct radix_tree_root *root, - struct radix_tree_node *node) + struct radix_tree_node *node, + radix_tree_update_node_t update_node, + void *private) { - delete_node(root, node, NULL, NULL); + delete_node(root, node, update_node, private); } /** diff --git a/mm/workingset.c b/mm/workingset.c index 241fa5d6b3b2..abb58ffa3c64 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -473,7 +473,8 @@ static enum lru_status shadow_lru_isolate(struct list_head *item, if (WARN_ON_ONCE(node->exceptional)) goto out_invalid; inc_node_state(page_pgdat(virt_to_page(node)), WORKINGSET_NODERECLAIM); - __radix_tree_delete_node(&mapping->page_tree, node); + __radix_tree_delete_node(&mapping->page_tree, node, + workingset_update_node, mapping); out_invalid: spin_unlock(&mapping->tree_lock);