Message ID | 20200709062603.18480-2-mhocko@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] doc, mm: sync up oom_score_adj documentation |
On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> The exported value includes oom_score_adj so the range is no [0, 1000]
> as described in the previous section but rather [0, 2000]. Mention that
> fact explicitly.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  Documentation/filesystems/proc.rst | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 8e3b5dffcfa8..78a0dec323a3 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
>  3.2 /proc/<pid>/oom_score - Display current oom-killer score
>  -------------------------------------------------------------
>
> +Please note that the exported value includes oom_score_adj so it is effectively
> +in range [0,2000].
> +

[0, 2000] may be not a proper range, see my reply in another thread.[1]
As this value hasn't been documented before and nobody notices that, I
think there might be no user really care about it before.
So we should discuss the proper range if we really think the user will
care about this value.

[1]. https://lore.kernel.org/linux-mm/CALOAHbAvj-gWZMLef=PuKTfDScwfM8gPPX0evzjoref1bG=zwA@mail.gmail.com/T/#m2332c3e6b7f869383cb74ab3a0f7b6c670b3b23b

>  This file can be used to check the current score used by the oom-killer is for
>  any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
>  process should be killed in an out-of-memory situation.
> --
> 2.27.0
>
On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > From: Michal Hocko <mhocko@suse.com>
> >
> > The exported value includes oom_score_adj so the range is no [0, 1000]
> > as described in the previous section but rather [0, 2000]. Mention that
> > fact explicitly.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  Documentation/filesystems/proc.rst | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 8e3b5dffcfa8..78a0dec323a3 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> >  -------------------------------------------------------------
> >
> > +Please note that the exported value includes oom_score_adj so it is effectively
> > +in range [0,2000].
> > +
>
> [0, 2000] may be not a proper range, see my reply in another thread.[1]
> As this value hasn't been documented before and nobody notices that, I
> think there might be no user really care about it before.
> So we should discuss the proper range if we really think the user will
> care about this value.

Even if we decide the range should change, I do not really assume this
will happen, it is good to have the existing behavior clarified.
On Thu, Jul 9, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> > On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > The exported value includes oom_score_adj so the range is no [0, 1000]
> > > as described in the previous section but rather [0, 2000]. Mention that
> > > fact explicitly.
> > >
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > ---
> > >  Documentation/filesystems/proc.rst | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > index 8e3b5dffcfa8..78a0dec323a3 100644
> > > --- a/Documentation/filesystems/proc.rst
> > > +++ b/Documentation/filesystems/proc.rst
> > > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> > >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> > >  -------------------------------------------------------------
> > >
> > > +Please note that the exported value includes oom_score_adj so it is effectively
> > > +in range [0,2000].
> > > +
> >
> > [0, 2000] may be not a proper range, see my reply in another thread.[1]
> > As this value hasn't been documented before and nobody notices that, I
> > think there might be no user really care about it before.
> > So we should discuss the proper range if we really think the user will
> > care about this value.
>
> Even if we decide the range should change, I do not really assume this
> will happen, it is good to have the existing behavior clarified.
>

But the existing behavior is not defined in the kernel documentation
before, so I don't think that the user has a clear understanding of
the existing behavior.

The way to use the result of proc_oom_score is to compare which
processes will be killed first by the OOM killer, IOW, the user should
always use it to compare different processes. For example,

if proc_oom_score(process_a) > proc_oom_score(process_b)
then
    process_a will be killed before process_b
fi

And then the user will "Use it together with
/proc/<pid>/oom_score_adj to tune which
process should be killed in an out-of-memory situation."

That means what the user really cares about is the relative value, and
they will not care about the range or the absolute value.
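
The relative comparison described in this message can be tried from userspace. The following is a minimal Python sketch, not part of the thread: the helper names `read_oom_score` and `rank_by_oom_score` and the `proc_root` parameter are hypothetical and introduced only for illustration. It assumes a Linux system where /proc/<pid>/oom_score is readable, and it treats a higher value only as "more likely to be selected first", since the scores are a snapshot and can change between reads.

```python
from pathlib import Path


def read_oom_score(pid, proc_root="/proc"):
    """Return the integer stored in <proc_root>/<pid>/oom_score.

    proc_root is a hypothetical parameter so the helper can be pointed
    at a test directory instead of the real /proc.
    """
    return int(Path(proc_root, str(pid), "oom_score").read_text().strip())


def rank_by_oom_score(pids, proc_root="/proc"):
    """Return (pid, score) pairs sorted with the highest score first.

    Processes that have exited or cannot be read are skipped silently.
    """
    scores = []
    for pid in pids:
        try:
            scores.append((pid, read_oom_score(pid, proc_root)))
        except (OSError, ValueError):
            continue
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, `rank_by_oom_score([pid_a, pid_b])` returns the two processes ordered by their current score, which is the only use the message above argues users should rely on.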
On Thu 09-07-20 17:01:06, Yafang Shao wrote:
> On Thu, Jul 9, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> > > On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > From: Michal Hocko <mhocko@suse.com>
> > > >
> > > > The exported value includes oom_score_adj so the range is no [0, 1000]
> > > > as described in the previous section but rather [0, 2000]. Mention that
> > > > fact explicitly.
> > > >
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  Documentation/filesystems/proc.rst | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > > index 8e3b5dffcfa8..78a0dec323a3 100644
> > > > --- a/Documentation/filesystems/proc.rst
> > > > +++ b/Documentation/filesystems/proc.rst
> > > > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> > > >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> > > >  -------------------------------------------------------------
> > > >
> > > > +Please note that the exported value includes oom_score_adj so it is effectively
> > > > +in range [0,2000].
> > > > +
> > >
> > > [0, 2000] may be not a proper range, see my reply in another thread.[1]
> > > As this value hasn't been documented before and nobody notices that, I
> > > think there might be no user really care about it before.
> > > So we should discuss the proper range if we really think the user will
> > > care about this value.
> >
> > Even if we decide the range should change, I do not really assume this
> > will happen, it is good to have the existing behavior clarified.
> >
>
> But the existing behavior is not defined in the kernel documentation
> before, so I don't think that the user has a clear understanding of
> the existing behavior.

Well, documentation is by no means authoritative, especially when it is
outdated or incomplete. What really matters is the observed behavior and
a lot of userspace depends on that or based on the specific
implementation.

> The way to use the result of proc_oom_score is to compare which
> processes will be killed first by the OOM killer, IOW, the user should
> always use it to compare different processes. For example,
>
> if proc_oom_score(process_a) > proc_oom_score(process_b)
> then
>     process_a will be killed before process_b
> fi
>
> And then the user will "Use it together with
> /proc/<pid>/oom_score_adj to tune which
> process should be killed in an out-of-memory situation."
>
> That means what the user really cares about is the relative value, and
> they will not care about the range or the absolute value.

In an ideal world yes. But the real life tells a different story. Many
times userspace (ab)uses certain undocumented/unintended (mis)features
and the hard rule is that we never break userspace. We've learned that
through many painful historical experiences. Especially vaguely defined
functionality suffers from the problem.
On Thu, Jul 9, 2020 at 5:58 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 09-07-20 17:01:06, Yafang Shao wrote:
> > On Thu, Jul 9, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> > > > On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > >
> > > > > The exported value includes oom_score_adj so the range is no [0, 1000]
> > > > > as described in the previous section but rather [0, 2000]. Mention that
> > > > > fact explicitly.
> > > > >
> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > > ---
> > > > >  Documentation/filesystems/proc.rst | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > > > index 8e3b5dffcfa8..78a0dec323a3 100644
> > > > > --- a/Documentation/filesystems/proc.rst
> > > > > +++ b/Documentation/filesystems/proc.rst
> > > > > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> > > > >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> > > > >  -------------------------------------------------------------
> > > > >
> > > > > +Please note that the exported value includes oom_score_adj so it is effectively
> > > > > +in range [0,2000].
> > > > > +
> > > >
> > > > [0, 2000] may be not a proper range, see my reply in another thread.[1]
> > > > As this value hasn't been documented before and nobody notices that, I
> > > > think there might be no user really care about it before.
> > > > So we should discuss the proper range if we really think the user will
> > > > care about this value.
> > >
> > > Even if we decide the range should change, I do not really assume this
> > > will happen, it is good to have the existing behavior clarified.
> > >
> >
> > But the existing behavior is not defined in the kernel documentation
> > before, so I don't think that the user has a clear understanding of
> > the existing behavior.
>
> Well, documentation is by no means authoritative, especially when it is
> outdated or incomplete. What really matters is the observed behavior and
> a lot of userspace depends on that or based on the specific
> implementation.
>
> > The way to use the result of proc_oom_score is to compare which
> > processes will be killed first by the OOM killer, IOW, the user should
> > always use it to compare different processes. For example,
> >
> > if proc_oom_score(process_a) > proc_oom_score(process_b)
> > then
> >     process_a will be killed before process_b
> > fi
> >
> > And then the user will "Use it together with
> > /proc/<pid>/oom_score_adj to tune which
> > process should be killed in an out-of-memory situation."
> >
> > That means what the user really cares about is the relative value, and
> > they will not care about the range or the absolute value.
>
> In an ideal world yes. But the real life tells a different story. Many
> times userspace (ab)uses certain undocumented/unintended (mis)features
> and the hard rule is that we never break userspace. We've learned that
> through many painful historical experiences. Especially vaguely defined
> functionality suffers from the problem.
> --

All right. I don't insist if we think the change in range may break
the userspace.
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 8e3b5dffcfa8..78a0dec323a3 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
 3.2 /proc/<pid>/oom_score - Display current oom-killer score
 -------------------------------------------------------------
 
+Please note that the exported value includes oom_score_adj so it is effectively
+in range [0,2000].
+
 This file can be used to check the current score used by the oom-killer is for
 any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
 process should be killed in an out-of-memory situation.
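
To observe how the two files named in the patch relate on a running system, the sketch below reads both for one process and checks the score against the range stated in the added documentation text. It is an illustration only: `read_oom_values`, `within_stated_range`, `PATCH_STATED_RANGE`, and the `proc_root` parameter are hypothetical names, and the [0, 2000] bound is taken from the patch text, which the thread itself questions, so it should be read as an assumption rather than a guarantee.

```python
from pathlib import Path

# Range as stated by the patch text; an assumption, since the thread
# debates whether this is the proper range.
PATCH_STATED_RANGE = (0, 2000)


def read_oom_values(pid, proc_root="/proc"):
    """Return (oom_score, oom_score_adj) for pid as integers.

    proc_root is a hypothetical parameter that allows pointing the
    helper at a test directory instead of the real /proc.
    """
    base = Path(proc_root, str(pid))
    score = int((base / "oom_score").read_text().strip())
    adj = int((base / "oom_score_adj").read_text().strip())
    return score, adj


def within_stated_range(score, bounds=PATCH_STATED_RANGE):
    """Check whether score lies inside the inclusive bounds."""
    low, high = bounds
    return low <= score <= high
```

Calling `read_oom_values(os.getpid())` after `import os` returns the pair for the current process. A score outside the stated bounds would be worth reporting, given that the thread relies on observed behavior rather than on the documentation.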