diff mbox series

[1/4] EDAC: Fix calculation of returned address and next offset in edac_align_ptr()

Message ID 20220113100622.12783-2-farbere@amazon.com (mailing list archive)
State New, archived
Headers show
Series edac_align_ptr() bug fix and refactoring | expand

Commit Message

Eliav Farber Jan. 13, 2022, 10:06 a.m. UTC
Do alignment logic properly and use 'ptr' for calculating the remainder
of the alignment.

This became an issue because 'struct edac_mc_layer' has a size that is
not zero modulo eight, and the next offset that was prepared for the
private-data was unaligned, causing an alignment exception.

Signed-off-by: Eliav Farber <farbere@amazon.com>
---
 drivers/edac/edac_mc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Borislav Petkov Jan. 25, 2022, 2:37 p.m. UTC | #1
On Thu, Jan 13, 2022 at 10:06:19AM +0000, Eliav Farber wrote:
> Do alignment logic properly and use 'ptr' for calculating the remainder
> of the alignment.
> 
> This became an issue because 'struct edac_mc_layer' has a size that is
> not zero modulo eight, and the next offset that was prepared for the
> private-data was unaligned, causing an alignment exception.

How exactly did this "become an issue"?

I'm asking because I have been hearing about weird bugs with that
pointer alignment contraption and have never managed to reproduce them
myself or hear a proper explanation from people.

And that thing is an abomination, frankly, and I'd like to get rid of it
but maybe some other time...

So, please explain more verbosely, a specific example or how I could
reproduce it, would be even better.

Thx.
Eliav Farber Jan. 27, 2022, 9:58 a.m. UTC | #2
On 1/25/2022 4:37 PM, Borislav Petkov wrote:
> How exactly did this "become an issue"?
One of the fields in our private-data structure is 'struct notifier_block'
which has a next field of type 'struct notifier_block __rcu *'.
The size of our private-data structure is greater than 8, and it comes after
'struct edac_mc_layer' which has a size that is not zero modulo eight, and
also ends at an address that is not zero modulo eight.
Because of the bug in edac_align_ptr(), our private-data structure which
should have been aligned to 8 wasn't (it was aligned to 4), so
notifier_block was also not aligned to 8, and finally next wasn't aligned
to 8.

> So, please explain more verbosely, a specific example or how I could
> reproduce it, would be even better.

Our al_mc_edac driver calls atomic_notifier_chain_register() on probe, to
add the notifier_block to panic_notifier_list.
We probe the driver more than once, and each time we use the same value for
the priority field in the notifier_block (so the newer notifier_block should
come later in panic_notifier_list).
When the driver is probed for the second time, we get an unable to handle
kernel paging request panic at rcu_assign_pointer() which is called from
notifier_chain_register().
It happens when rcu_assign_pointer() tries to set the unaligned next pointer
from the first probe, to point to the new notifier_block of the second
probe.

Unable to handle kernel paging request at virtual address ffff8013e8037f4c
Mem abort info:
   ESR = 0x96000061
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
Data abort info:
   ISV = 0, ISS = 0x00000061
   CM = 0, WnR = 1
   swapper pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
   [ffff8013e8037f4c] pgd=00000013ffff8003, pud=00680013c0000711
   Internal error: Oops: 96000061 [#1] SMP
   Modules linked in:
   Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
   CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.19.191 #1016
   Hardware name: Annapurna Labs Alpine V3 EVP (DT)
   pstate: 20000085 (nzCv daIf -PAN -UAO)
   pc : atomic_notifier_chain_register+0x80/0xb8
   lr : atomic_notifier_chain_register+0x38/0xb8
sp : ffff0000097d3b10
x29: ffff0000097d3b10 x28: ffff000009108068
x27: ffff0000095ca000 x26: ffff8013ed0a8744
x25: ffff0000096e3000 x24: ffff000009199000
x23: ffff8013ed016810 x22: ffff8013ed016800
x21: 0000000000000000 x20: ffff8013ed0a8744
x19: ffff0000095ca6d8 x18: ffffffffffffffff
x17: 0000000000000000 x16: 0000000000000000
x15: ffff0000091996c8 x14: 2820564544203a63
x13: 6d5f6c612072656c x12: 6c6f72746e6f6320
x11: 636164655f636d5f x10: ffff000009199918
x9 : ffff000009173018 x8 : ffff00000878ec80
x7 : 676e69766947203a x6 : 00000000000002b1
x5 : 000000000000003f x4 : 0000000000000000
x3 : 000000007fffffff x2 : ffff8013e8037f4c
x1 : 0000000000000096 x0 : ffff0000091bacf8
Call trace:
  atomic_notifier_chain_register+0x80/0xb8
  al_mc_edac_probe+0x224/0x468
  platform_drv_probe+0x58/0xa8
really_probe+0x2cc/0x3b8
driver_probe_device+0x12c/0x148
__driver_attach+0x148/0x150
bus_for_each_dev+0x84/0xd8
driver_attach+0x30/0x40
bus_add_driver+0x174/0x2a8
driver_register+0x64/0x110
__platform_driver_register+0x54/0x60
al_mc_edac_driver_init+0x20/0x28
do_one_initcall+0x54/0x208
kernel_init_freeable+0x294/0x354
kernel_init+0x18/0x118
ret_from_fork+0x10/0x18
Code: 91002002 f9400400 b5ffff60 f9000680 (c89ffc54)
---[ end trace dba8c8c6291afa5b ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x0,20006008
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---
Borislav Petkov Feb. 15, 2022, 12:34 p.m. UTC | #3
On Thu, Jan 27, 2022 at 11:58:58AM +0200, Farber, Eliav wrote:
> One of the fields in our private-data structure is 'struct notifier_block'
> which has a next field of type 'struct notifier_block __rcu *'.

Just to make sure I understand you correctly: you're talking about some
internal version of al_mc_edac - not what's upstream?

> The size of our private-data structure is greater than 8, and it comes after
> 'struct edac_mc_layer' which has a size that is not zero modulo eight, and
> also ends at an address that is not zero modulo eight.
> Because of the bug in edac_align_ptr(), our private-data structure which
> should have been aligned to 8 wasn't (it was aligned to 4), so
> notifier_block was also not aligned to 8, and finally next wasn't aligned
> to 8.

Ok, I think I see what's going on. So this:

8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()")

changed that remainder computation thing and it was supposed to do what
your patch does. Hell, even the commit message says so:

"The logic was checking the sizeof the structure being allocated to
determine whether an alignment fixup was required. This isn't right;
what we actually care about is the alignment of the actual pointer
that's about to be returned."

So if we really do care about the alignment of the actual pointer we're
about to return, then yes, we should check ptr - not p.

Lemme add that newly discovered info to your patch and queue it.

Thx.
Eliav Farber Feb. 15, 2022, 1:06 p.m. UTC | #4
On 2/15/2022 2:34 PM, Borislav Petkov wrote:
> Just to make sure I understand you correctly: you're talking about some
> internal version of al_mc_edac - not what's upstream?

Yes, I'm talking about an internal version that wasn't up-streamed yet.

--
Regards, Eliav
diff mbox series

Patch

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index fd440b35d76e..61d72bd96754 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -265,7 +265,7 @@  void *edac_align_ptr(void **p, unsigned size, int n_elems)
 	else
 		return (char *)ptr;
 
-	r = (unsigned long)p % align;
+	r = (unsigned long)ptr % align;
 
 	if (r == 0)
 		return (char *)ptr;