Message ID | 20220125051736.2981459-1-shakeelb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: io_uring: allow oom-killer from io_uring_setup | expand |
On Mon, 24 Jan 2022, Shakeel Butt wrote: > On an overcommitted system which is running multiple workloads of > varying priorities, it is preferred to trigger an oom-killer to kill a > low priority workload than to let the high priority workload receiving > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of > ENOMEMs instead of oom-kills because io_uring_setup callchain is using > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and > allow the oom-killer to kill a lower priority job. > What is the size of the allocations that io_mem_alloc() is doing? If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even without the __GFP_NORETRY. To make the guarantee that workloads are not receiving ENOMEM, it seems like we'd need to guarantee that allocations going through io_mem_alloc() are sufficiently small. (And if we're really serious about it, then even something like a BUILD_BUG_ON().) > Signed-off-by: Shakeel Butt <shakeelb@google.com> > --- > fs/io_uring.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index e54c4127422e..d9eeb202363c 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr) > > static void *io_mem_alloc(size_t size) > { > - gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP | > - __GFP_NORETRY | __GFP_ACCOUNT; > + gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; > > - return (void *) __get_free_pages(gfp_flags, get_order(size)); > + return (void *) __get_free_pages(gfp, get_order(size)); > } > > static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries, > -- > 2.35.0.rc0.227.g00780c9af4-goog > > >
On Tue, Jan 25, 2022 at 10:35 AM David Rientjes <rientjes@google.com> wrote: > > On Mon, 24 Jan 2022, Shakeel Butt wrote: > > > On an overcommitted system which is running multiple workloads of > > varying priorities, it is preferred to trigger an oom-killer to kill a > > low priority workload than to let the high priority workload receiving > > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of > > ENOMEMs instead of oom-kills because io_uring_setup callchain is using > > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and > > allow the oom-killer to kill a lower priority job. > > > > What is the size of the allocations that io_mem_alloc() is doing? > > If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even > without the __GFP_NORETRY. To make the guarantee that workloads are not > receiving ENOMEM, it seems like we'd need to guarantee that allocations > going through io_mem_alloc() are sufficiently small. > > (And if we're really serious about it, then even something like a > BUILD_BUG_ON().) > The test case provided to me for which the user was seeing ENOMEMs was io_uring_setup() with 64 entries (nothing else). If I understand rings_size() calculations correctly then the 0 order allocation was requested in io_mem_alloc(). For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use __GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim though I think that is a separate discussion. For this issue, we are seeing ENOMEMs even for order 0 allocations.
On Tue, 25 Jan 2022, Shakeel Butt wrote: > > > On an overcommitted system which is running multiple workloads of > > > varying priorities, it is preferred to trigger an oom-killer to kill a > > > low priority workload than to let the high priority workload receiving > > > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of > > > ENOMEMs instead of oom-kills because io_uring_setup callchain is using > > > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and > > > allow the oom-killer to kill a lower priority job. > > > > > > > What is the size of the allocations that io_mem_alloc() is doing? > > > > If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even > > without the __GFP_NORETRY. To make the guarantee that workloads are not > > receiving ENOMEM, it seems like we'd need to guarantee that allocations > > going through io_mem_alloc() are sufficiently small. > > > > (And if we're really serious about it, then even something like a > > BUILD_BUG_ON().) > > > > The test case provided to me for which the user was seeing ENOMEMs was > io_uring_setup() with 64 entries (nothing else). > > If I understand rings_size() calculations correctly then the 0 order > allocation was requested in io_mem_alloc(). > > For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use > __GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim > though I think that is a separate discussion. For this issue, we are > seeing ENOMEMs even for order 0 allocations. > Ah, gotcha, thanks for the background. IIUC, io_uring_setup() can be done with anything with CAP_SYS_NICE so my only concern would be whether this could be used maliciously on a system not using memcg, but in that case we can already fork many small processes that consume all memory and oom kill everything else on the system already.
On Mon, Jan 24, 2022 at 9:17 PM Shakeel Butt <shakeelb@google.com> wrote: > > On an overcommitted system which is running multiple workloads of > varying priorities, it is preferred to trigger an oom-killer to kill a > low priority workload than to let the high priority workload receiving > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of > ENOMEMs instead of oom-kills because io_uring_setup callchain is using > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and > allow the oom-killer to kill a lower priority job. > > Signed-off-by: Shakeel Butt <shakeelb@google.com> Jens, any comments or concerns on this patch? > --- > fs/io_uring.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index e54c4127422e..d9eeb202363c 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr) > > static void *io_mem_alloc(size_t size) > { > - gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP | > - __GFP_NORETRY | __GFP_ACCOUNT; > + gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; > > - return (void *) __get_free_pages(gfp_flags, get_order(size)); > + return (void *) __get_free_pages(gfp, get_order(size)); > } > > static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries, > -- > 2.35.0.rc0.227.g00780c9af4-goog >
On Mon, 24 Jan 2022 21:17:36 -0800, Shakeel Butt wrote: > On an overcommitted system which is running multiple workloads of > varying priorities, it is preferred to trigger an oom-killer to kill a > low priority workload than to let the high priority workload receiving > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of > ENOMEMs instead of oom-kills because io_uring_setup callchain is using > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and > allow the oom-killer to kill a lower priority job. > > [...] Applied, thanks! [1/1] mm: io_uring: allow oom-killer from io_uring_setup commit: 0a3f1e0beacf6cc8ae5f846b0641c1df476e83d6 Best regards,
diff --git a/fs/io_uring.c b/fs/io_uring.c index e54c4127422e..d9eeb202363c 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr) static void *io_mem_alloc(size_t size) { - gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP | - __GFP_NORETRY | __GFP_ACCOUNT; + gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; - return (void *) __get_free_pages(gfp_flags, get_order(size)); + return (void *) __get_free_pages(gfp, get_order(size)); } static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
On an overcommitted system which is running multiple workloads of varying priorities, it is preferred to trigger an oom-killer to kill a low priority workload than to let the high priority workload receiving ENOMEMs. On our memory overcommitted systems, we are seeing a lot of ENOMEMs instead of oom-kills because io_uring_setup callchain is using __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and allow the oom-killer to kill a lower priority job. Signed-off-by: Shakeel Butt <shakeelb@google.com> --- fs/io_uring.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)