Message ID | 20170302163834.2273519-3-arnd@arndb.de (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kalle Valo |
Headers | show |
On 03/02/2017 05:38 PM, Arnd Bergmann wrote: > When CONFIG_KASAN is enabled, the READ_ONCE/WRITE_ONCE macros cause > rather large kernel stacks, e.g.: > > mm/vmscan.c: In function 'shrink_page_list': > mm/vmscan.c:1333:1: error: the frame size of 3456 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] > block/cfq-iosched.c: In function 'cfqg_stats_add_aux': > block/cfq-iosched.c:750:1: error: the frame size of 4048 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] > fs/btrfs/disk-io.c: In function 'open_ctree': > fs/btrfs/disk-io.c:3314:1: error: the frame size of 3136 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] > fs/btrfs/relocation.c: In function 'build_backref_tree': > fs/btrfs/relocation.c:1193:1: error: the frame size of 4336 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] > fs/fscache/stats.c: In function 'fscache_stats_show': > fs/fscache/stats.c:287:1: error: the frame size of 6512 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] > fs/jbd2/commit.c: In function 'jbd2_journal_commit_transaction': > fs/jbd2/commit.c:1139:1: error: the frame size of 3760 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] > > This attempts a rewrite of the two macros, using a simpler implementation > for the most common case of having a naturally aligned 1, 2, 4, or (on > 64-bit architectures) 8 byte object that can be accessed with a single > instruction. For these, we go back to a volatile pointer dereference > that we had with the ACCESS_ONCE macro. We had changed that back then because gcc 4.6 and 4.7 had a bug that could removed the volatile statement on aggregate types like the following one union ipte_control { unsigned long val; struct { unsigned long k : 1; unsigned long kh : 31; unsigned long kg : 32; }; }; See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 If I see that right, your __ALIGNED_WORD(x) macro would say that for above structure sizeof(x) == sizeof(long)) is true, so it would fall back to the old volatile cast and might reintroduce the old compiler bug? Could you maybe you fence your simple macro for anything older than 4.9? After all there was no kasan support anyway on these older gcc version. Christian
On Thu, Mar 2, 2017 at 5:51 PM, Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 03/02/2017 05:38 PM, Arnd Bergmann wrote: >> >> This attempts a rewrite of the two macros, using a simpler implementation >> for the most common case of having a naturally aligned 1, 2, 4, or (on >> 64-bit architectures) 8 byte object that can be accessed with a single >> instruction. For these, we go back to a volatile pointer dereference >> that we had with the ACCESS_ONCE macro. > > We had changed that back then because gcc 4.6 and 4.7 had a bug that could > removed the volatile statement on aggregate types like the following one > > union ipte_control { > unsigned long val; > struct { > unsigned long k : 1; > unsigned long kh : 31; > unsigned long kg : 32; > }; > }; > > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 > > If I see that right, your __ALIGNED_WORD(x) > macro would say that for above structure sizeof(x) == sizeof(long)) is true, > so it would fall back to the old volatile cast and might reintroduce the > old compiler bug? Ah, right, that's the missing piece. For some reason I didn't find the reference in the source or the git log. > Could you maybe you fence your simple macro for anything older than 4.9? After > all there was no kasan support anyway on these older gcc version. Yes, that should work, thanks! Arnd
On 03/02/2017 06:55 PM, Arnd Bergmann wrote: > On Thu, Mar 2, 2017 at 5:51 PM, Christian Borntraeger > <borntraeger@de.ibm.com> wrote: >> On 03/02/2017 05:38 PM, Arnd Bergmann wrote: >>> >>> This attempts a rewrite of the two macros, using a simpler implementation >>> for the most common case of having a naturally aligned 1, 2, 4, or (on >>> 64-bit architectures) 8 byte object that can be accessed with a single >>> instruction. For these, we go back to a volatile pointer dereference >>> that we had with the ACCESS_ONCE macro. >> >> We had changed that back then because gcc 4.6 and 4.7 had a bug that could >> removed the volatile statement on aggregate types like the following one >> >> union ipte_control { >> unsigned long val; >> struct { >> unsigned long k : 1; >> unsigned long kh : 31; >> unsigned long kg : 32; >> }; >> }; >> >> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 >> >> If I see that right, your __ALIGNED_WORD(x) >> macro would say that for above structure sizeof(x) == sizeof(long)) is true, >> so it would fall back to the old volatile cast and might reintroduce the >> old compiler bug? Oh dear, I should double check my sentences in emails before sending...anyway the full story is referenced in commit 60815cf2e05057db5b78e398d9734c493560b11e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux which has a pointer to http://marc.info/?i=54611D86.4040306%40de.ibm.com which contains the full story. > > Ah, right, that's the missing piece. For some reason I didn't find > the reference in the source or the git log. > >> Could you maybe you fence your simple macro for anything older than 4.9? After >> all there was no kasan support anyway on these older gcc version. > > Yes, that should work, thanks!
On Thu, Mar 2, 2017 at 8:00 PM, Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 03/02/2017 06:55 PM, Arnd Bergmann wrote: >> On Thu, Mar 2, 2017 at 5:51 PM, Christian Borntraeger >> <borntraeger@de.ibm.com> wrote: >>> On 03/02/2017 05:38 PM, Arnd Bergmann wrote: >>>> >>>> This attempts a rewrite of the two macros, using a simpler implementation >>>> for the most common case of having a naturally aligned 1, 2, 4, or (on >>>> 64-bit architectures) 8 byte object that can be accessed with a single >>>> instruction. For these, we go back to a volatile pointer dereference >>>> that we had with the ACCESS_ONCE macro. >>> >>> We had changed that back then because gcc 4.6 and 4.7 had a bug that could >>> removed the volatile statement on aggregate types like the following one >>> >>> union ipte_control { >>> unsigned long val; >>> struct { >>> unsigned long k : 1; >>> unsigned long kh : 31; >>> unsigned long kg : 32; >>> }; >>> }; >>> >>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 >>> >>> If I see that right, your __ALIGNED_WORD(x) >>> macro would say that for above structure sizeof(x) == sizeof(long)) is true, >>> so it would fall back to the old volatile cast and might reintroduce the >>> old compiler bug? > > Oh dear, I should double check my sentences in emails before sending...anyway > the full story is referenced in > > commit 60815cf2e05057db5b78e398d9734c493560b11e > Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux > which has a pointer to > http://marc.info/?i=54611D86.4040306%40de.ibm.com > which contains the full story. Ok, got it. So I guess the behavior of forcing aligned accesses on aligned data is accidental, and allowing non-power-of-two arguments is also not the main purpose. Maybe we could just bail out on new compilers if we get either of those? That might catch code that accidentally does something that is inherently non-atomic or that causes a trap when the intention was to have a simple atomic access. Arnd
On 03/02/2017 10:45 PM, Arnd Bergmann wrote: > On Thu, Mar 2, 2017 at 8:00 PM, Christian Borntraeger > <borntraeger@de.ibm.com> wrote: >> On 03/02/2017 06:55 PM, Arnd Bergmann wrote: >>> On Thu, Mar 2, 2017 at 5:51 PM, Christian Borntraeger >>> <borntraeger@de.ibm.com> wrote: >>>> On 03/02/2017 05:38 PM, Arnd Bergmann wrote: >>>>> >>>>> This attempts a rewrite of the two macros, using a simpler implementation >>>>> for the most common case of having a naturally aligned 1, 2, 4, or (on >>>>> 64-bit architectures) 8 byte object that can be accessed with a single >>>>> instruction. For these, we go back to a volatile pointer dereference >>>>> that we had with the ACCESS_ONCE macro. >>>> >>>> We had changed that back then because gcc 4.6 and 4.7 had a bug that could >>>> removed the volatile statement on aggregate types like the following one >>>> >>>> union ipte_control { >>>> unsigned long val; >>>> struct { >>>> unsigned long k : 1; >>>> unsigned long kh : 31; >>>> unsigned long kg : 32; >>>> }; >>>> }; >>>> >>>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 >>>> >>>> If I see that right, your __ALIGNED_WORD(x) >>>> macro would say that for above structure sizeof(x) == sizeof(long)) is true, >>>> so it would fall back to the old volatile cast and might reintroduce the >>>> old compiler bug? >> >> Oh dear, I should double check my sentences in emails before sending...anyway >> the full story is referenced in >> >> commit 60815cf2e05057db5b78e398d9734c493560b11e >> Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux >> which has a pointer to >> http://marc.info/?i=54611D86.4040306%40de.ibm.com >> which contains the full story. > > Ok, got it. So I guess the behavior of forcing aligned accesses on aligned > data is accidental, and allowing non-power-of-two arguments is also not > the main purpose. Right. The main purpose is to read/write _ONCE_. You can assume a somewhat atomic access for sizes <= word size. And there are certainly places that rely on that. But the *ONCE thing is mostly used for things where we used barrier() 10 years ago. Maybe we could just bail out on new compilers if we get > either of those? That might catch code that accidentally does something > that is inherently non-atomic or that causes a trap when the intention was > to have a simple atomic access. I think Linus stated that its ok to assume that the compiler is smart enough to uses a single instruction to access aligned and properly sized scalar types for *ONCE. Back then when I changed ACCESS_ONCE there were many places that did use it for non-atomic, > word size accesses. For example on some architectures a pmd_t is a typedef to an array, for which there is no way to read that atomically. So the focus must be on the "ONCE" part. If some code uses a properly aligned, word sized object we can also assume atomic access. If the access is not properly sized/aligned we do not get atomicity, but we do get the "ONCE". But adding a check for alignment/size would break the compilation of some code.
On Fri, Mar 3, 2017 at 9:26 AM, Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 03/02/2017 10:45 PM, Arnd Bergmann wrote: >> Ok, got it. So I guess the behavior of forcing aligned accesses on aligned >> data is accidental, and allowing non-power-of-two arguments is also not >> the main purpose. > > > Right. The main purpose is to read/write _ONCE_. You can assume a somewhat > atomic access for sizes <= word size. And there are certainly places that > rely on that. But the *ONCE thing is mostly used for things where we used > barrier() 10 years ago. Ok > > Maybe we could just bail out on new compilers if we get >> either of those? That might catch code that accidentally does something >> that is inherently non-atomic or that causes a trap when the intention was >> to have a simple atomic access. > > I think Linus stated that its ok to assume that the compiler is smart enough > to uses a single instruction to access aligned and properly sized scalar types > for *ONCE. > > Back then when I changed ACCESS_ONCE there were many places that did use it > for non-atomic, > word size accesses. For example on some architectures a pmd_t > is a typedef to an array, for which there is no way to read that atomically. > So the focus must be on the "ONCE" part. > > If some code uses a properly aligned, word sized object we can also assume > atomic access. If the access is not properly sized/aligned we do not get > atomicity, but we do get the "ONCE". > But adding a check for alignment/size would break the compilation of some > code. So what should be the expected behavior for objects that have a smaller alignment? E.g. this structure struct fourbytes { char bytes[4]; } __packed; when passed into the current READ_ONCE() will be accessed with a 32-bit load, while reading it with struct fourbytes local = *(volatile struct fourbytes *)voidpointer; on architectures like ARMv5 or lower will turn into four single-byte reads to avoid an alignment trap when the pointer is actually unaligned. I can see arguments for and against either behavior, but what should I do when modifying it for newer compilers? The possible options that I see are - keep assuming that the pointer will be aligned at runtime and doesn't trap - use the regular gcc behavior and do byte-accesses on those architectures that otherwise might trap - add a runtime alignment check to do atomic accesses whenever possible, but never trap - fail the build Arnd
On Fri, Mar 03, 2017 at 09:26:50AM +0100, Christian Borntraeger wrote: > Right. The main purpose is to read/write _ONCE_. You can assume a somewhat > atomic access for sizes <= word size. And there are certainly places that > rely on that. But the *ONCE thing is mostly used for things where we used > barrier() 10 years ago. A lot of code relies on READ/WRITE_ONCE() to generate single instructions for naturally aligned machined word sized loads/stores (something GCC used to guarantee, but does no longer IIRC). So much so that I would say its a bug if READ/WRITE_ONCE() doesn't generate a single instruction under those conditions. However, every time I've tried to introduce stricter semantics/primitives to verify things Linus hated it.
On Fri, Mar 03, 2017 at 03:49:38PM +0100, Peter Zijlstra wrote: > On Fri, Mar 03, 2017 at 09:26:50AM +0100, Christian Borntraeger wrote: > > Right. The main purpose is to read/write _ONCE_. You can assume a somewhat > > atomic access for sizes <= word size. And there are certainly places that > > rely on that. But the *ONCE thing is mostly used for things where we used > > barrier() 10 years ago. > > A lot of code relies on READ/WRITE_ONCE() to generate single > instructions for naturally aligned machined word sized loads/stores > (something GCC used to guarantee, but does no longer IIRC). > > So much so that I would say its a bug if READ/WRITE_ONCE() doesn't > generate a single instruction under those conditions. > > However, every time I've tried to introduce stricter > semantics/primitives to verify things Linus hated it. See here for the last attempt: https://marc.info/?l=linux-virtualization&m=148007765918101&w=2
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index fcc5cd387fd1..0c243dd569fe 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -30,7 +30,7 @@ static inline void prepare_switch_to(struct task_struct *prev, * * To minimize cache pollution, just follow the stack pointer. */ - READ_ONCE(*(unsigned char *)next->thread.sp); + (void)READ_ONCE(*(unsigned char *)next->thread.sp); #endif } diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 952286f4826c..1c10632a48bb 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -222,8 +222,8 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry) void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper) { - WRITE_ONCE(inode->i_private, (unsigned long) realinode | - (is_upper ? OVL_ISUPPER_MASK : 0)); + WRITE_ONCE(inode->i_private, (void *)((unsigned long) realinode | + (is_upper ? OVL_ISUPPER_MASK : 0))); } void ovl_inode_update(struct inode *inode, struct inode *upperinode) @@ -231,7 +231,7 @@ void ovl_inode_update(struct inode *inode, struct inode *upperinode) WARN_ON(!upperinode); WARN_ON(!inode_unhashed(inode)); WRITE_ONCE(inode->i_private, - (unsigned long) upperinode | OVL_ISUPPER_MASK); + (void *)((unsigned long) upperinode | OVL_ISUPPER_MASK)); if (!S_ISDIR(upperinode->i_mode)) __insert_inode_hash(inode, (unsigned long) upperinode); } diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 56b90897a459..b619f5853af8 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -288,6 +288,10 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s } } +#define __ALIGNED_WORD(x) \ + ((sizeof(x) == 1 || sizeof(x) == 2 || sizeof(x) == 4 || \ + sizeof(x) == sizeof(long)) && (sizeof(x) == __alignof__(x))) \ + /* * Prevent the compiler from merging or refetching reads or writes. The * compiler is also forbidden from reordering successive instances of @@ -309,8 +313,13 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s * mutilate accesses that either do not require ordering or that interact * with an explicit memory barrier or atomic instruction that provides the * required ordering. + * + * Unaligned data is particularly tricky here: if the type that gets + * passed in is not naturally aligned, we cast to a type of higher + * alignment, which is not well-defined in C. This is fine as long + * as the actual data is aligned, but otherwise might require a trap + * to satisfy the load. */ - #define __READ_ONCE(x, check) \ ({ \ union { typeof(x) __val; char __c[1]; } __u; \ @@ -320,7 +329,32 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \ __u.__val; \ }) -#define READ_ONCE(x) __READ_ONCE(x, 1) + +#define __WRITE_ONCE(x, val) \ +({ \ + union { typeof(x) __val; char __c[1]; } __u = \ + { .__val = (__force typeof(x)) (val) }; \ + __write_once_size(&(x), __u.__c, sizeof(x)); \ + __u.__val; \ +}) + + +/* + * the common case is simple: x is naturally aligned, not an array, + * and accessible with a single load, avoiding the need for local + * variables. With KASAN, this is important as any call to + *__write_once_size(),__read_once_size_nocheck() or __read_once_size() + * uses significant amounts of stack space for checking that we don't + * overflow the union. + */ +#define __READ_ONCE_SIMPLE(x) \ + (typeof(x))(*(volatile typeof(&(x)))&(x)) + +#define __WRITE_ONCE_SIMPLE(x, val) \ + ({*(volatile typeof(&(x)))&(x) = (val); }) + +#define READ_ONCE(x) __builtin_choose_expr(__ALIGNED_WORD(x), \ + __READ_ONCE_SIMPLE(x), __READ_ONCE(x, 1)) /* * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need @@ -328,13 +362,8 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s */ #define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0) -#define WRITE_ONCE(x, val) \ -({ \ - union { typeof(x) __val; char __c[1]; } __u = \ - { .__val = (__force typeof(x)) (val) }; \ - __write_once_size(&(x), __u.__c, sizeof(x)); \ - __u.__val; \ -}) +#define WRITE_ONCE(x, val) do { __builtin_choose_expr(__ALIGNED_WORD(x), \ + __WRITE_ONCE_SIMPLE(x, val), __WRITE_ONCE(x, val)); } while (0) #endif /* __KERNEL__ */
When CONFIG_KASAN is enabled, the READ_ONCE/WRITE_ONCE macros cause rather large kernel stacks, e.g.: mm/vmscan.c: In function 'shrink_page_list': mm/vmscan.c:1333:1: error: the frame size of 3456 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] block/cfq-iosched.c: In function 'cfqg_stats_add_aux': block/cfq-iosched.c:750:1: error: the frame size of 4048 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] fs/btrfs/disk-io.c: In function 'open_ctree': fs/btrfs/disk-io.c:3314:1: error: the frame size of 3136 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] fs/btrfs/relocation.c: In function 'build_backref_tree': fs/btrfs/relocation.c:1193:1: error: the frame size of 4336 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] fs/fscache/stats.c: In function 'fscache_stats_show': fs/fscache/stats.c:287:1: error: the frame size of 6512 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] fs/jbd2/commit.c: In function 'jbd2_journal_commit_transaction': fs/jbd2/commit.c:1139:1: error: the frame size of 3760 bytes is larger than 3072 bytes [-Werror=frame-larger-than=] This attempts a rewrite of the two macros, using a simpler implementation for the most common case of having a naturally aligned 1, 2, 4, or (on 64-bit architectures) 8 byte object that can be accessed with a single instruction. For these, we go back to a volatile pointer dereference that we had with the ACCESS_ONCE macro. READ_ONCE/WRITE_ONCE also try to handle unaligned objects and objects of other sizes by forcing either a word-size access (which may trap on some architectures) or doing a non-atomic memcpy. I could not figure out what these are actually used for, but they appear to be done intentionally, so I'm leaving that code untouched. I had to fix up a couple of files that either use WRITE_ONCE() as an implicit typecast, or ignore the result of READ_ONCE(). In all cases, the modified code seems no worse to me than the original. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/x86/include/asm/switch_to.h | 2 +- fs/overlayfs/util.c | 6 ++--- include/linux/compiler.h | 47 ++++++++++++++++++++++++++++++++-------- 3 files changed, 42 insertions(+), 13 deletions(-)