Message ID | 20231106024413.2801438-3-almasrymina@google.com
---|---
State | New, archived
Series | Device Memory TCP
On 2023/11/6 10:44, Mina Almasry wrote:
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 6fc5134095ed..d4bea053bb7e 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -60,6 +60,8 @@ struct page_pool_params {
>  	int nid;
>  	struct device *dev;
>  	struct napi_struct *napi;
> +	u8 memory_provider;
> +	void *mp_priv;
>  	enum dma_data_direction dma_dir;
>  	unsigned int max_len;
>  	unsigned int offset;
> @@ -118,6 +120,19 @@ struct page_pool_stats {
>  };
>  #endif
>
> +struct mem_provider;

The above doesn't seem to be used?

> +
> +enum pp_memory_provider_type {
> +	__PP_MP_NONE, /* Use system allocator directly */
> +};
> +
> +struct pp_memory_provider_ops {

Would it be better to rename the above to pp_memory_provider and put the
memory_provider and mp_priv fields in it, so that all the fields related
to pp_memory_provider are in one place?

It is probably also better to provide a register function so that a driver
can implement its own pp_memory_provider in the future.

> +	int (*init)(struct page_pool *pool);
> +	void (*destroy)(struct page_pool *pool);
> +	struct page *(*alloc_pages)(struct page_pool *pool, gfp_t gfp);
> +	bool (*release_page)(struct page_pool *pool, struct page *page);
> +};
> +
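For illustration only, a sketch of the grouping and registration hook suggested above; the names pp_memory_provider and pp_memory_provider_register are hypothetical and not part of the posted patch:

```c
/* Hypothetical: gather everything page_pool needs from a memory provider
 * into one structure, so page_pool_params only carries a single member.
 */
struct pp_memory_provider {
	enum pp_memory_provider_type type;
	const struct pp_memory_provider_ops *ops;
	void *priv;
};

/* Hypothetical registration hook so a driver (or another subsystem) can
 * supply its own ops for a given provider type in the future.
 */
int pp_memory_provider_register(enum pp_memory_provider_type type,
				const struct pp_memory_provider_ops *ops);
```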
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote:
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 6fc5134095ed..d4bea053bb7e 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -60,6 +60,8 @@ struct page_pool_params {
>  	int nid;
>  	struct device *dev;
>  	struct napi_struct *napi;
> +	u8 memory_provider;
> +	void *mp_priv;

Minor nit: swapping the above 2 fields should make the struct smaller.

>  	enum dma_data_direction dma_dir;
>  	unsigned int max_len;
>  	unsigned int offset;
> @@ -118,6 +120,19 @@ struct page_pool_stats {
>  };
>  #endif
>
> +struct mem_provider;
> +
> +enum pp_memory_provider_type {
> +	__PP_MP_NONE, /* Use system allocator directly */
> +};
> +
> +struct pp_memory_provider_ops {
> +	int (*init)(struct page_pool *pool);
> +	void (*destroy)(struct page_pool *pool);
> +	struct page *(*alloc_pages)(struct page_pool *pool, gfp_t gfp);
> +	bool (*release_page)(struct page_pool *pool, struct page *page);
> +};
> +
>  struct page_pool {
>  	struct page_pool_params p;
>
> @@ -165,6 +180,9 @@ struct page_pool {
>  	 */
>  	struct ptr_ring ring;
>
> +	const struct pp_memory_provider_ops *mp_ops;
> +	void *mp_priv;

Why are the mp_ops not part of page_pool_params? And why is mp_priv
duplicated here?

Cheers,

Paolo
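To make the sizing nit concrete, here is a rough sketch of the two field orders on a typical 64-bit build; the offsets and struct names are illustrative only:

```c
/* Order as posted: the 1-byte selector sits right before an 8-byte
 * pointer, so the compiler inserts 7 bytes of padding.
 */
struct layout_as_posted {
	struct napi_struct *napi;		/* 8 bytes */
	u8 memory_provider;			/* 1 byte + 7 bytes padding */
	void *mp_priv;				/* 8 bytes */
	enum dma_data_direction dma_dir;	/* 4 bytes */
};

/* Swapped: mp_priv packs directly after napi, and memory_provider only
 * needs 3 bytes of padding before the 4-byte-aligned dma_dir, so the
 * struct shrinks.
 */
struct layout_swapped {
	struct napi_struct *napi;		/* 8 bytes */
	void *mp_priv;				/* 8 bytes */
	u8 memory_provider;			/* 1 byte + 3 bytes padding */
	enum dma_data_direction dma_dir;	/* 4 bytes */
};
```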
On Sun, 5 Nov 2023 18:44:01 -0800 Mina Almasry wrote:
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 6fc5134095ed..d4bea053bb7e 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -60,6 +60,8 @@ struct page_pool_params {
>  	int nid;
>  	struct device *dev;
>  	struct napi_struct *napi;
> +	u8 memory_provider;
> +	void *mp_priv;
>  	enum dma_data_direction dma_dir;
>  	unsigned int max_len;
>  	unsigned int offset;

You should rebase on top of net-next.

More importantly, I was expecting those fields to be gone from params.
The fact that the page pool is configured to a specific provider should
be fully transparent to the driver; the driver should just tell the core
which queue it is creating the pool for, and if there is a dmabuf bound
for that queue - out pops a pp backed by the dmabuf.
My RFC had the page pool params fields here as a hack.
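As a rough sketch of that direction, the core could pick the provider itself when the pool is tied to an rx queue that has a dmabuf bound to it. Note that the netdev/queue_idx params fields, the mp_params member of netdev_rx_queue and dmabuf_devmem_ops are all hypothetical here, not part of this patch:

```c
/* Hypothetical core-side selection: the driver only says which netdev and
 * queue the pool serves; if a dmabuf is bound there, the core installs the
 * matching provider, otherwise the pool keeps the system allocator.
 */
static void page_pool_pick_memory_provider(struct page_pool *pool)
{
	struct netdev_rx_queue *rxq;

	if (!pool->p.netdev)			/* hypothetical params field */
		return;

	rxq = __netif_get_rx_queue(pool->p.netdev, pool->p.queue_idx);
	if (rxq->mp_params.mp_priv) {		/* hypothetical binding info */
		pool->mp_ops = &dmabuf_devmem_ops;	/* hypothetical ops */
		pool->mp_priv = rxq->mp_params.mp_priv;
	}
}
```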
On Fri, Nov 10, 2023 at 3:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 5 Nov 2023 18:44:01 -0800 Mina Almasry wrote:
> > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> > index 6fc5134095ed..d4bea053bb7e 100644
> > --- a/include/net/page_pool/types.h
> > +++ b/include/net/page_pool/types.h
> > @@ -60,6 +60,8 @@ struct page_pool_params {
> >  	int nid;
> >  	struct device *dev;
> >  	struct napi_struct *napi;
> > +	u8 memory_provider;
> > +	void *mp_priv;
> >  	enum dma_data_direction dma_dir;
> >  	unsigned int max_len;
> >  	unsigned int offset;
>
> you should rebase on top of net-next
>
> More importantly I was expecting those fields to be gone from params.
> The fact that the page pool is configured to a specific provider
> should be fully transparent to the driver, driver should just tell
> the core what queue its creating the pool from and if there's a dmabuf
> bound for that queue - out pops a pp backed by the dmabuf.
>

My issue with this is that if the driver doesn't support dmabuf then
the driver will accidentally use the pp backed by the dmabuf, allocate
a page from it, then call page_address() on it or something, and crash.

Currently I avoid that by having the driver be responsible for picking
up the dmabuf from the netdev_rx_queue and giving it to the page pool.
What would be the appropriate way to check for driver support in the
netlink API? Perhaps adding something to ndo_features_check?
On Sun, 12 Nov 2023 19:28:52 -0800 Mina Almasry wrote:
> My issue with this is that if the driver doesn't support dmabuf then
> the driver will accidentally use the pp backed by the dmabuf, allocate
> a page from it, then call page_address() on it or something, and
> crash.
>
> Currently I avoid that by having the driver be responsible for picking
> up the dmabuf from the netdev_rx_queue and giving it to the page pool.
> What would be the appropriate way to check for driver support in the
> netlink API? Perhaps adding something to ndo_features_check?

We need some form of capabilities. I was expecting to add that as part
of the queue API. Either a new field in struct net_device or in ndos.
I tend to put static driver caps of this nature into ops. See for
instance .supported_ring_params in ethtool ops.
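A minimal sketch of what such a static capability could look like; the cap field and flag below are hypothetical, with .supported_ring_params in struct ethtool_ops being the only existing precedent mentioned:

```c
/* Hypothetical: a capability bit the driver sets once, checked by the core
 * before a dmabuf is bound to one of its rx queues, so a driver that still
 * expects host memory (e.g. calls page_address()) is never handed a
 * provider-backed pool.
 */
#define NETDEV_QUEUE_CAP_DMABUF		BIT(0)

static int netdev_bind_dmabuf_rxq(struct net_device *dev, u32 rxq_idx)
{
	/* queue_mgmt_caps is a hypothetical field on struct net_device */
	if (!(dev->queue_mgmt_caps & NETDEV_QUEUE_CAP_DMABUF))
		return -EOPNOTSUPP;

	/* ... proceed to bind the dmabuf to rxq_idx ... */
	return 0;
}
```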
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 6fc5134095ed..d4bea053bb7e 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -60,6 +60,8 @@ struct page_pool_params {
 	int nid;
 	struct device *dev;
 	struct napi_struct *napi;
+	u8 memory_provider;
+	void *mp_priv;
 	enum dma_data_direction dma_dir;
 	unsigned int max_len;
 	unsigned int offset;
@@ -118,6 +120,19 @@ struct page_pool_stats {
 };
 #endif
 
+struct mem_provider;
+
+enum pp_memory_provider_type {
+	__PP_MP_NONE, /* Use system allocator directly */
+};
+
+struct pp_memory_provider_ops {
+	int (*init)(struct page_pool *pool);
+	void (*destroy)(struct page_pool *pool);
+	struct page *(*alloc_pages)(struct page_pool *pool, gfp_t gfp);
+	bool (*release_page)(struct page_pool *pool, struct page *page);
+};
+
 struct page_pool {
 	struct page_pool_params p;
 
@@ -165,6 +180,9 @@ struct page_pool {
 	 */
 	struct ptr_ring ring;
 
+	const struct pp_memory_provider_ops *mp_ops;
+	void *mp_priv;
+
 #ifdef CONFIG_PAGE_POOL_STATS
 	/* recycle stats are per-cpu to avoid locking */
 	struct page_pool_recycle_stats __percpu *recycle_stats;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 578b6f2eeb46..7ea1f4682479 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -23,6 +23,8 @@
 
 #include <trace/events/page_pool.h>
 
+static DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers);
+
 #define DEFER_TIME (msecs_to_jiffies(1000))
 #define DEFER_WARN_INTERVAL (60 * HZ)
 
@@ -172,6 +174,7 @@ static int page_pool_init(struct page_pool *pool,
 			  const struct page_pool_params *params)
 {
 	unsigned int ring_qsize = 1024; /* Default */
+	int err;
 
 	memcpy(&pool->p, params, sizeof(pool->p));
 
@@ -225,10 +228,34 @@ static int page_pool_init(struct page_pool *pool,
 	/* Driver calling page_pool_create() also call page_pool_destroy() */
 	refcount_set(&pool->user_cnt, 1);
 
+	switch (pool->p.memory_provider) {
+	case __PP_MP_NONE:
+		break;
+	default:
+		err = -EINVAL;
+		goto free_ptr_ring;
+	}
+
+	pool->mp_priv = pool->p.mp_priv;
+	if (pool->mp_ops) {
+		err = pool->mp_ops->init(pool);
+		if (err) {
+			pr_warn("%s() mem-provider init failed %d\n",
+				__func__, err);
+			goto free_ptr_ring;
+		}
+
+		static_branch_inc(&page_pool_mem_providers);
+	}
+
 	if (pool->p.flags & PP_FLAG_DMA_MAP)
 		get_device(pool->p.dev);
 
 	return 0;
+
+free_ptr_ring:
+	ptr_ring_cleanup(&pool->ring, NULL);
+	return err;
 }
 
 /**
@@ -490,7 +517,10 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
 		return page;
 
 	/* Slow-path: cache empty, do real allocation */
-	page = __page_pool_alloc_pages_slow(pool, gfp);
+	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+		page = pool->mp_ops->alloc_pages(pool, gfp);
+	else
+		page = __page_pool_alloc_pages_slow(pool, gfp);
 	return page;
 }
 EXPORT_SYMBOL(page_pool_alloc_pages);
@@ -542,10 +572,13 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page)
 void page_pool_return_page(struct page_pool *pool, struct page *page)
 {
 	int count;
+	bool put;
 
-	__page_pool_release_page_dma(pool, page);
-
-	page_pool_clear_pp_info(page);
+	put = true;
+	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+		put = pool->mp_ops->release_page(pool, page);
+	else
+		__page_pool_release_page_dma(pool, page);
 
 	/* This may be the last page returned, releasing the pool, so
 	 * it is not safe to reference pool afterwards.
@@ -553,7 +586,10 @@ void page_pool_return_page(struct page_pool *pool, struct page *page)
 	count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
 	trace_page_pool_state_release(pool, page, count);
 
-	put_page(page);
+	if (put) {
+		page_pool_clear_pp_info(page);
+		put_page(page);
+	}
 	/* An optimization would be to call __free_pages(page, pool->p.order)
 	 * knowing page is not part of page-cache (thus avoiding a
 	 * __page_cache_release() call).
@@ -821,6 +857,11 @@ static void __page_pool_destroy(struct page_pool *pool)
 	if (pool->disconnect)
 		pool->disconnect(pool);
 
+	if (pool->mp_ops) {
+		pool->mp_ops->destroy(pool);
+		static_branch_dec(&page_pool_mem_providers);
+	}
+
 	ptr_ring_cleanup(&pool->ring, NULL);
 
 	if (pool->p.flags & PP_FLAG_DMA_MAP)
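For context, a minimal sketch of how a future provider could implement the four hooks introduced above. The mp_example name and its behaviour are illustrative only; a later patch would still have to add an enum value plus a case in page_pool_init() that points pool->mp_ops at this ops table:

```c
/* Illustrative provider: only the pp_memory_provider_ops signatures come
 * from the patch; everything else is a placeholder.
 */
static int mp_example_init(struct page_pool *pool)
{
	/* e.g. take a reference on whatever was passed in via mp_priv */
	return pool->mp_priv ? 0 : -EINVAL;
}

static void mp_example_destroy(struct page_pool *pool)
{
	/* drop the reference taken in init */
}

static struct page *mp_example_alloc_pages(struct page_pool *pool, gfp_t gfp)
{
	/* hand out pages from the provider's own backing store; this stub
	 * simply falls back to the normal slow path to stay self-contained
	 */
	return __page_pool_alloc_pages_slow(pool, gfp);
}

static bool mp_example_release_page(struct page_pool *pool, struct page *page)
{
	/* return true if page_pool should clear pp info and put the page,
	 * false if the provider keeps ownership of it
	 */
	return true;
}

static const struct pp_memory_provider_ops mp_example_ops = {
	.init		= mp_example_init,
	.destroy	= mp_example_destroy,
	.alloc_pages	= mp_example_alloc_pages,
	.release_page	= mp_example_release_page,
};
```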