Message ID | 20211208044808.872554-4-pcc@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kernel: introduce uaccess logging | expand |
On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote: > > With uaccess logging the contract is that the kernel must not report > accessing more data than necessary, as this can lead to false positive > reports in downstream consumers. This generally works out of the box > when instrumenting copy_{from,to}_user(), but with the data argument > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as > much as we can, if the PAGE_SIZE sized access failed) and figure out > later how much we actually need. > > To prevent this from leading to a false positive report, use > copy_from_user_nolog(), which will prevent the access from being logged. > Recall that it is valid for the kernel to report accessing less > data than it actually accessed, as uaccess logging is a best-effort > mechanism for reporting uaccesses. > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6 > Signed-off-by: Peter Collingbourne <pcc@google.com> > --- > fs/namespace.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 659a8f39c61a..8f5f2aaca64e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -31,6 +31,7 @@ > #include <uapi/linux/mount.h> > #include <linux/fs_context.h> > #include <linux/shmem_fs.h> > +#include <linux/uaccess-buffer.h> > > #include "pnode.h" > #include "internal.h" > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data) > if (!copy) > return ERR_PTR(-ENOMEM); > > - left = copy_from_user(copy, data, PAGE_SIZE); > + /* > + * Use copy_from_user_nolog to avoid reporting overly large accesses in > + * the uaccess buffer, as this can lead to false positive reports in > + * downstream consumers. > + */ > + left = copy_from_user_nolog(copy, data, PAGE_SIZE); A late idea... Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD flag. Better for user-space, at least can detect UAFs by checking the first byte. And a more logical kernel annotation (maybe will be used in some other tools? or if we ever check user tags in the kernel). Probably not too important today since we use this only in 2 places, but longer term may be better. Btw, what's the story with BPF accesses? Can we log them theoretically? Previously the comment said: + /* + * Avoid copy_from_user() here as it may leak information about the BPF + * program to userspace via the uaccess buffer. + */ but now it says something very generic: /* * Avoid logging uaccesses here as the BPF program may not be following * the uaccess log rules. */ > /* > * Not all architectures have an exact copy_from_user(). Resort to > -- > 2.34.1.173.g76aa8bc2d0-goog
On Wed, Dec 8, 2021 at 1:35 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote: > > > > With uaccess logging the contract is that the kernel must not report > > accessing more data than necessary, as this can lead to false positive > > reports in downstream consumers. This generally works out of the box > > when instrumenting copy_{from,to}_user(), but with the data argument > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as > > much as we can, if the PAGE_SIZE sized access failed) and figure out > > later how much we actually need. > > > > To prevent this from leading to a false positive report, use > > copy_from_user_nolog(), which will prevent the access from being logged. > > Recall that it is valid for the kernel to report accessing less > > data than it actually accessed, as uaccess logging is a best-effort > > mechanism for reporting uaccesses. > > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6 > > Signed-off-by: Peter Collingbourne <pcc@google.com> > > --- > > fs/namespace.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/fs/namespace.c b/fs/namespace.c > > index 659a8f39c61a..8f5f2aaca64e 100644 > > --- a/fs/namespace.c > > +++ b/fs/namespace.c > > @@ -31,6 +31,7 @@ > > #include <uapi/linux/mount.h> > > #include <linux/fs_context.h> > > #include <linux/shmem_fs.h> > > +#include <linux/uaccess-buffer.h> > > > > #include "pnode.h" > > #include "internal.h" > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data) > > if (!copy) > > return ERR_PTR(-ENOMEM); > > > > - left = copy_from_user(copy, data, PAGE_SIZE); > > + /* > > + * Use copy_from_user_nolog to avoid reporting overly large accesses in > > + * the uaccess buffer, as this can lead to false positive reports in > > + * downstream consumers. > > + */ > > + left = copy_from_user_nolog(copy, data, PAGE_SIZE); > > A late idea... > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD > flag. Better for user-space, at least can detect UAFs by checking the > first byte. And a more logical kernel annotation (maybe will be used > in some other tools? or if we ever check user tags in the kernel). > > Probably not too important today since we use this only in 2 places, > but longer term may be better. I'm not sure about this. The overreads are basically an implementation detail of the kernel, so I'm not sure it makes sense to expose them. A scheme where we expose all overreads wouldn't necessarily help with UAF, because what if for example the kernel reads *behind* the user-provided pointer? I guess it could lead to false positives. > Btw, what's the story with BPF accesses? Can we log them theoretically? > > Previously the comment said: > > + /* > + * Avoid copy_from_user() here as it may leak information about the BPF > + * program to userspace via the uaccess buffer. > + */ > > but now it says something very generic: > > /* > * Avoid logging uaccesses here as the BPF program may not be following > * the uaccess log rules. > */ Yes we should be able to log them theoretically, but we don't need to do that now. See my reply here: https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly. Peter
On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote: > > > With uaccess logging the contract is that the kernel must not report > > > accessing more data than necessary, as this can lead to false positive > > > reports in downstream consumers. This generally works out of the box > > > when instrumenting copy_{from,to}_user(), but with the data argument > > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as > > > much as we can, if the PAGE_SIZE sized access failed) and figure out > > > later how much we actually need. > > > > > > To prevent this from leading to a false positive report, use > > > copy_from_user_nolog(), which will prevent the access from being logged. > > > Recall that it is valid for the kernel to report accessing less > > > data than it actually accessed, as uaccess logging is a best-effort > > > mechanism for reporting uaccesses. > > > > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6 > > > Signed-off-by: Peter Collingbourne <pcc@google.com> > > > --- > > > fs/namespace.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/namespace.c b/fs/namespace.c > > > index 659a8f39c61a..8f5f2aaca64e 100644 > > > --- a/fs/namespace.c > > > +++ b/fs/namespace.c > > > @@ -31,6 +31,7 @@ > > > #include <uapi/linux/mount.h> > > > #include <linux/fs_context.h> > > > #include <linux/shmem_fs.h> > > > +#include <linux/uaccess-buffer.h> > > > > > > #include "pnode.h" > > > #include "internal.h" > > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data) > > > if (!copy) > > > return ERR_PTR(-ENOMEM); > > > > > > - left = copy_from_user(copy, data, PAGE_SIZE); > > > + /* > > > + * Use copy_from_user_nolog to avoid reporting overly large accesses in > > > + * the uaccess buffer, as this can lead to false positive reports in > > > + * downstream consumers. > > > + */ > > > + left = copy_from_user_nolog(copy, data, PAGE_SIZE); > > > > A late idea... > > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD > > flag. Better for user-space, at least can detect UAFs by checking the > > first byte. And a more logical kernel annotation (maybe will be used > > in some other tools? or if we ever check user tags in the kernel). > > > > Probably not too important today since we use this only in 2 places, > > but longer term may be better. > > I'm not sure about this. The overreads are basically an implementation > detail of the kernel, so I'm not sure it makes sense to expose them. A > scheme where we expose all overreads wouldn't necessarily help with > UAF, because what if for example the kernel reads *behind* the > user-provided pointer? I guess it could lead to false positives. If user-space uses logging to check addressability, then it can safely check only the first byte (right? there must be at least 1 byte passed by user-space at that address). And that's enough to detect UAFs. > > Btw, what's the story with BPF accesses? Can we log them theoretically? > > > > Previously the comment said: > > > > + /* > > + * Avoid copy_from_user() here as it may leak information about the BPF > > + * program to userspace via the uaccess buffer. > > + */ > > > > but now it says something very generic: > > > > /* > > * Avoid logging uaccesses here as the BPF program may not be following > > * the uaccess log rules. > > */ > > Yes we should be able to log them theoretically, but we don't need to > do that now. See my reply here: > > https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly. I see. These could be marked with another flag. I don't have a strong opinion about this. But I am mentioning this because my experience is that it's better to expose more raw info from kernel in these cases, rather than hardcoding policies into kernel code (what's ignored/why/when) b/c a delay from another kernel change to wide deployment is 5+ years and user-space code may need to detect and deal with all various versions of the kernel logic. Say, fuzzing may still want to know about the mount options (rather than no signal that the kernel reads at least something at that address). But adding them later with a flag is not really a backwards compatible change b/c you now have addressability checking code that's not checking the new flag and will produce false positives.
On Thu, Dec 9, 2021 at 6:59 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote: > > > > With uaccess logging the contract is that the kernel must not report > > > > accessing more data than necessary, as this can lead to false positive > > > > reports in downstream consumers. This generally works out of the box > > > > when instrumenting copy_{from,to}_user(), but with the data argument > > > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as > > > > much as we can, if the PAGE_SIZE sized access failed) and figure out > > > > later how much we actually need. > > > > > > > > To prevent this from leading to a false positive report, use > > > > copy_from_user_nolog(), which will prevent the access from being logged. > > > > Recall that it is valid for the kernel to report accessing less > > > > data than it actually accessed, as uaccess logging is a best-effort > > > > mechanism for reporting uaccesses. > > > > > > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6 > > > > Signed-off-by: Peter Collingbourne <pcc@google.com> > > > > --- > > > > fs/namespace.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/fs/namespace.c b/fs/namespace.c > > > > index 659a8f39c61a..8f5f2aaca64e 100644 > > > > --- a/fs/namespace.c > > > > +++ b/fs/namespace.c > > > > @@ -31,6 +31,7 @@ > > > > #include <uapi/linux/mount.h> > > > > #include <linux/fs_context.h> > > > > #include <linux/shmem_fs.h> > > > > +#include <linux/uaccess-buffer.h> > > > > > > > > #include "pnode.h" > > > > #include "internal.h" > > > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data) > > > > if (!copy) > > > > return ERR_PTR(-ENOMEM); > > > > > > > > - left = copy_from_user(copy, data, PAGE_SIZE); > > > > + /* > > > > + * Use copy_from_user_nolog to avoid reporting overly large accesses in > > > > + * the uaccess buffer, as this can lead to false positive reports in > > > > + * downstream consumers. > > > > + */ > > > > + left = copy_from_user_nolog(copy, data, PAGE_SIZE); > > > > > > A late idea... > > > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD > > > flag. Better for user-space, at least can detect UAFs by checking the > > > first byte. And a more logical kernel annotation (maybe will be used > > > in some other tools? or if we ever check user tags in the kernel). > > > > > > Probably not too important today since we use this only in 2 places, > > > but longer term may be better. > > > > I'm not sure about this. The overreads are basically an implementation > > detail of the kernel, so I'm not sure it makes sense to expose them. A > > scheme where we expose all overreads wouldn't necessarily help with > > UAF, because what if for example the kernel reads *behind* the > > user-provided pointer? I guess it could lead to false positives. > > If user-space uses logging to check addressability, then it can safely > check only the first byte (right? there must be at least 1 byte passed > by user-space at that address). And that's enough to detect UAFs. I was thinking more e.g. what if the kernel reads an entire page with copy_from_user() and takes a subset of it later. Then the first byte could point to some other random allocation in the same page and lead to a false UAF report if we just consider the first byte. So I think the use cases for accesses with this flag set may be limited to things like fuzzers. > > > Btw, what's the story with BPF accesses? Can we log them theoretically? > > > > > > Previously the comment said: > > > > > > + /* > > > + * Avoid copy_from_user() here as it may leak information about the BPF > > > + * program to userspace via the uaccess buffer. > > > + */ > > > > > > but now it says something very generic: > > > > > > /* > > > * Avoid logging uaccesses here as the BPF program may not be following > > > * the uaccess log rules. > > > */ > > > > Yes we should be able to log them theoretically, but we don't need to > > do that now. See my reply here: > > > > https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly. > > I see. These could be marked with another flag. > I don't have a strong opinion about this. But I am mentioning this > because my experience is that it's better to expose more raw info from > kernel in these cases, rather than hardcoding policies into kernel > code (what's ignored/why/when) b/c a delay from another kernel change > to wide deployment is 5+ years and user-space code may need to detect > and deal with all various versions of the kernel logic. > Say, fuzzing may still want to know about the mount options (rather > than no signal that the kernel reads at least something at that > address). But adding them later with a flag is not really a backwards > compatible change b/c you now have addressability checking code that's > not checking the new flag and will produce false positives. I think this is a good point. I'll see about adding flags for the BPF and overread cases. Peter
On Fri, 10 Dec 2021 at 05:02, Peter Collingbourne <pcc@google.com> wrote: > > On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote: > > > > > With uaccess logging the contract is that the kernel must not report > > > > > accessing more data than necessary, as this can lead to false positive > > > > > reports in downstream consumers. This generally works out of the box > > > > > when instrumenting copy_{from,to}_user(), but with the data argument > > > > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as > > > > > much as we can, if the PAGE_SIZE sized access failed) and figure out > > > > > later how much we actually need. > > > > > > > > > > To prevent this from leading to a false positive report, use > > > > > copy_from_user_nolog(), which will prevent the access from being logged. > > > > > Recall that it is valid for the kernel to report accessing less > > > > > data than it actually accessed, as uaccess logging is a best-effort > > > > > mechanism for reporting uaccesses. > > > > > > > > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6 > > > > > Signed-off-by: Peter Collingbourne <pcc@google.com> > > > > > --- > > > > > fs/namespace.c | 8 +++++++- > > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/fs/namespace.c b/fs/namespace.c > > > > > index 659a8f39c61a..8f5f2aaca64e 100644 > > > > > --- a/fs/namespace.c > > > > > +++ b/fs/namespace.c > > > > > @@ -31,6 +31,7 @@ > > > > > #include <uapi/linux/mount.h> > > > > > #include <linux/fs_context.h> > > > > > #include <linux/shmem_fs.h> > > > > > +#include <linux/uaccess-buffer.h> > > > > > > > > > > #include "pnode.h" > > > > > #include "internal.h" > > > > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data) > > > > > if (!copy) > > > > > return ERR_PTR(-ENOMEM); > > > > > > > > > > - left = copy_from_user(copy, data, PAGE_SIZE); > > > > > + /* > > > > > + * Use copy_from_user_nolog to avoid reporting overly large accesses in > > > > > + * the uaccess buffer, as this can lead to false positive reports in > > > > > + * downstream consumers. > > > > > + */ > > > > > + left = copy_from_user_nolog(copy, data, PAGE_SIZE); > > > > > > > > A late idea... > > > > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD > > > > flag. Better for user-space, at least can detect UAFs by checking the > > > > first byte. And a more logical kernel annotation (maybe will be used > > > > in some other tools? or if we ever check user tags in the kernel). > > > > > > > > Probably not too important today since we use this only in 2 places, > > > > but longer term may be better. > > > > > > I'm not sure about this. The overreads are basically an implementation > > > detail of the kernel, so I'm not sure it makes sense to expose them. A > > > scheme where we expose all overreads wouldn't necessarily help with > > > UAF, because what if for example the kernel reads *behind* the > > > user-provided pointer? I guess it could lead to false positives. > > > > If user-space uses logging to check addressability, then it can safely > > check only the first byte (right? there must be at least 1 byte passed > > by user-space at that address). And that's enough to detect UAFs. > > I was thinking more e.g. what if the kernel reads an entire page with > copy_from_user() and takes a subset of it later. Then the first byte > could point to some other random allocation in the same page and lead > to a false UAF report if we just consider the first byte. Humm.. good point. As I said I am not strong about this. I don't know what's the right answer. > So I think the use cases for accesses with this flag set may be > limited to things like fuzzers. > > > > > Btw, what's the story with BPF accesses? Can we log them theoretically? > > > > > > > > Previously the comment said: > > > > > > > > + /* > > > > + * Avoid copy_from_user() here as it may leak information about the BPF > > > > + * program to userspace via the uaccess buffer. > > > > + */ > > > > > > > > but now it says something very generic: > > > > > > > > /* > > > > * Avoid logging uaccesses here as the BPF program may not be following > > > > * the uaccess log rules. > > > > */ > > > > > > Yes we should be able to log them theoretically, but we don't need to > > > do that now. See my reply here: > > > > > > https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly. > > > > I see. These could be marked with another flag. > > I don't have a strong opinion about this. But I am mentioning this > > because my experience is that it's better to expose more raw info from > > kernel in these cases, rather than hardcoding policies into kernel > > code (what's ignored/why/when) b/c a delay from another kernel change > > to wide deployment is 5+ years and user-space code may need to detect > > and deal with all various versions of the kernel logic. > > Say, fuzzing may still want to know about the mount options (rather > > than no signal that the kernel reads at least something at that > > address). But adding them later with a flag is not really a backwards > > compatible change b/c you now have addressability checking code that's > > not checking the new flag and will produce false positives. > > I think this is a good point. I'll see about adding flags for the BPF > and overread cases. > > Peter
diff --git a/fs/namespace.c b/fs/namespace.c index 659a8f39c61a..8f5f2aaca64e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -31,6 +31,7 @@ #include <uapi/linux/mount.h> #include <linux/fs_context.h> #include <linux/shmem_fs.h> +#include <linux/uaccess-buffer.h> #include "pnode.h" #include "internal.h" @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data) if (!copy) return ERR_PTR(-ENOMEM); - left = copy_from_user(copy, data, PAGE_SIZE); + /* + * Use copy_from_user_nolog to avoid reporting overly large accesses in + * the uaccess buffer, as this can lead to false positive reports in + * downstream consumers. + */ + left = copy_from_user_nolog(copy, data, PAGE_SIZE); /* * Not all architectures have an exact copy_from_user(). Resort to
With uaccess logging the contract is that the kernel must not report accessing more data than necessary, as this can lead to false positive reports in downstream consumers. This generally works out of the box when instrumenting copy_{from,to}_user(), but with the data argument to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as much as we can, if the PAGE_SIZE sized access failed) and figure out later how much we actually need. To prevent this from leading to a false positive report, use copy_from_user_nolog(), which will prevent the access from being logged. Recall that it is valid for the kernel to report accessing less data than it actually accessed, as uaccess logging is a best-effort mechanism for reporting uaccesses. Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6 Signed-off-by: Peter Collingbourne <pcc@google.com> --- fs/namespace.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)