Message ID | 20200618131632.32748-3-bstroesser@ts.fujitsu.com (mailing list archive)
---|---
State | Accepted |
Commit | 3145550a7f8b08356c8ff29feaa6c56aca12901d |
Series | scsi: target: tcmu: fix crashes on ARM
On 6/18/20 8:16 AM, Bodo Stroesser wrote:
> This patch fixes the following crash
> (see https://bugzilla.kernel.org/show_bug.cgi?id=208045)
>
> Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
> CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic #202004230533
> Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
> pstate: 80400005 (Nzcv daif +PAN -UAO)
> pc : flush_dcache_page+0x18/0x40
> lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
> sp : ffff000015123a80
> x29: ffff000015123a80 x28: 0000000000000000
> x27: 0000000000001000 x26: ffff000023ea5000
> x25: ffffcfa25bbe08b8 x24: 0000000000000078
> x23: ffff7e0000000000 x22: ffff000023ea5001
> x21: ffffcfa24b79c000 x20: 0000000000000fff
> x19: ffff7e00008fa940 x18: 0000000000000000
> x17: 0000000000000000 x16: ffff2d047e709138
> x15: 0000000000000000 x14: 0000000000000000
> x13: 0000000000000000 x12: ffff2d047fbd0a40
> x11: 0000000000000000 x10: 0000000000000030
> x9 : 0000000000000000 x8 : ffffc9a254820a00
> x7 : 00000000000013b0 x6 : 000000000000003f
> x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
> x3 : 0000000000001000 x2 : 0000000000000078
> x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
> Call trace:
>  flush_dcache_page+0x18/0x40
>  is_ring_space_avail+0x68/0x2f8 [target_core_user]
>  queue_cmd_ring+0x1f8/0x680 [target_core_user]
>  tcmu_queue_cmd+0xe4/0x158 [target_core_user]
>  __target_execute_cmd+0x30/0xf0 [target_core_mod]
>  target_execute_cmd+0x294/0x390 [target_core_mod]
>  transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
>  transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
>  iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
>  iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
>  iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
>  iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
>  iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
>  kthread+0x130/0x138
>  ret_from_fork+0x10/0x18
> Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
> ---[ end trace 1e451c73f4266776 ]---
>
> The solution is based on patch:
>
> "scsi: target: tcmu: Optimize use of flush_dcache_page"
>
> which restricts the use of tcmu_flush_dcache_range() to
> addresses from vmalloc'ed areas only.
>
> This patch now replaces the virt_to_page() call in
> tcmu_flush_dcache_range() - which is wrong for vmalloced addrs -
> by vmalloc_to_page().
>
> The patch was tested on ARM with kernel 4.19.118 and 5.7.2
>
> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> Tested-by: JiangYu <lnsyyj@hotmail.com>
> Tested-by: Daniel Meyerholt <dxm523@gmail.com>
> ---
>  drivers/target/target_core_user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
> index a65e9671ae7a..835d3001cb0e 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -601,7 +601,7 @@ static inline void tcmu_flush_dcache_range(void *vaddr, size_t size)
>  	size = round_up(size+offset, PAGE_SIZE);
>
>  	while (size) {
> -		flush_dcache_page(virt_to_page(start));
> +		flush_dcache_page(vmalloc_to_page(start));
>  		start += PAGE_SIZE;
>  		size -= PAGE_SIZE;
>  	}

For this bug we only need to change the flush in is_ring_space_avail
right? It's what is accessing the mb which is vmalloc'd.

Is it recommended to call vmalloc_to_page on a page we get with
alloc_page, and is the mm then always going to do the right thing for
us (I do not know the mm code well and just quickly scanned the
Documentation and comments, but could not find anything), or could we
hit similar issues where we are using the wrong call on different
types of allocated memory down the line?
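The distinction behind the fix is that virt_to_page() is only valid for addresses in the kernel's linear mapping, while vmalloc addresses need a page-table walk. A minimal illustrative sketch of that difference (editor's addition, not code from the patch; the helper name page_backing is made up):

```c
#include <linux/mm.h>      /* virt_to_page(), is_vmalloc_addr() */
#include <linux/vmalloc.h> /* vmalloc_to_page() */

/*
 * Illustrative only: pick the correct struct page lookup for an
 * address, depending on where it came from.
 */
static struct page *page_backing(void *addr)
{
	/*
	 * For vmalloc'ed memory (such as tcmu's mailbox + command ring),
	 * virt_to_page() is plain arithmetic against the linear map and
	 * yields a bogus struct page pointer - which is what
	 * flush_dcache_page() then dereferenced in the crash above.
	 * vmalloc_to_page() walks the kernel page tables and returns the
	 * real backing page.
	 */
	if (is_vmalloc_addr(addr))
		return vmalloc_to_page(addr);

	/* Linear-map (e.g. kmalloc'ed) addresses can use virt_to_page(). */
	return virt_to_page(addr);
}
```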
On 2020-06-18 17:00, Mike Christie wrote:
> On 6/18/20 8:16 AM, Bodo Stroesser wrote:
>> This patch fixes the following crash
>> (see https://bugzilla.kernel.org/show_bug.cgi?id=208045)
>>
>> [...]
>>
>> This patch now replaces the virt_to_page() call in
>> tcmu_flush_dcache_range() - which is wrong for vmalloced addrs -
>> by vmalloc_to_page().
>>
>> [...]
>
> For this bug we only need to change the flush in is_ring_space_avail
> right? It's what is accessing the mb which is vmalloc'd.

No, is_ring_space_avail was just the first caller of
tcmu_flush_dcache_range for vmalloc'ed pages, so it crashed there.

The entire address range exposed to userspace via uio consists of
two parts:
 1) mb + command ring are vmalloc'ed in a single vzalloc call
    during initialization of a tcmu device.
 2) the data area, which is allocated page by page calling
    alloc_page()

The second part is handled by (scatter|gather)_data_area. For the
calls from these routines I think usage of virt_to_page in
tcmu_flush_dcache_range was fine. But patch number 1 of this
series replaced these calls with direct calls to flush_dcache_page.

So all remaining calls to tcmu_flush_dcache_range after patch 1 are
calls for vmalloc'ed addresses. Therefore after patch 1 we can safely
replace virt_to_page with vmalloc_to_page.

So it turned out that patch 1, which in the beginning was thought to
be an optimization, now also simplifies the crash fix.

> Is it recommended to call vmalloc_to_page on a page we get with
> alloc_page, and is the mm then always going to do the right thing for
> us (I do not know the mm code well and just quickly scanned the
> Documentation and comments, but could not find anything), or could we
> hit similar issues where we are using the wrong call on different
> types of allocated memory down the line?
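To make the split Bodo describes concrete, here is a hedged sketch of the two kinds of memory and how each gets flushed after patch 1 (editor's addition, not the tcmu driver code; the function name, allocation size, and GFP flag are illustrative):

```c
#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/highmem.h>	/* flush_dcache_page() */

/* Illustrative only: not the actual driver code. */
static void flush_example(void)
{
	/* 1) mailbox + command ring: one contiguous vzalloc'ed region */
	void *mb = vzalloc(8 * PAGE_SIZE);
	/* 2) data area: individual pages from alloc_page() */
	struct page *dpage = alloc_page(GFP_KERNEL);

	if (!mb || !dpage)
		goto out;

	/*
	 * Ring/mailbox addresses are vmalloc'ed, so they must be
	 * translated with vmalloc_to_page() before flushing (only the
	 * first page is shown here; the driver loops over the range).
	 */
	flush_dcache_page(vmalloc_to_page(mb));

	/*
	 * Data-area pages are already struct page pointers; after
	 * patch 1, scatter_data_area/gather_data_area flush them
	 * directly, with no address translation at all.
	 */
	flush_dcache_page(dpage);
out:
	if (dpage)
		__free_page(dpage);
	vfree(mb);
}
```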
On 6/18/20 10:41 AM, Bodo Stroesser wrote:
> On 2020-06-18 17:00, Mike Christie wrote:
>> On 6/18/20 8:16 AM, Bodo Stroesser wrote:
>>> [...]
>>
>> For this bug we only need to change the flush in is_ring_space_avail
>> right? It's what is accessing the mb which is vmalloc'd.
> No, is_ring_space_avail was just the first caller of
> tcmu_flush_dcache_range for vmalloc'ed pages, so it crashed there.
>
> The entire address range exposed to userspace via uio consists of
> two parts:
>  1) mb + command ring are vmalloc'ed in a single vzalloc call
>     during initialization of a tcmu device.
>  2) the data area, which is allocated page by page calling
>     alloc_page()
>
> The second part is handled by (scatter|gather)_data_area. For the
> calls from these routines I think usage of virt_to_page in
> tcmu_flush_dcache_range was fine. But patch number 1 of this
> series replaced these calls with direct calls to flush_dcache_page.

That was my problem. Did not see the replacement. Looks good to me.
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index a65e9671ae7a..835d3001cb0e 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -601,7 +601,7 @@ static inline void tcmu_flush_dcache_range(void *vaddr, size_t size)
 	size = round_up(size+offset, PAGE_SIZE);
 
 	while (size) {
-		flush_dcache_page(virt_to_page(start));
+		flush_dcache_page(vmalloc_to_page(start));
 		start += PAGE_SIZE;
 		size -= PAGE_SIZE;
 	}
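For readability, a sketch of how the whole helper reads after this hunk. This is a reconstruction from the hunk context, not part of the diff; the two lines computing offset and start are assumed (they are only implied by the size + offset expression):

```c
/* Reconstruction for context; the offset/start lines are assumed. */
static inline void tcmu_flush_dcache_range(void *vaddr, size_t size)
{
	unsigned long offset = offset_in_page(vaddr);
	void *start = vaddr - offset;

	size = round_up(size + offset, PAGE_SIZE);

	while (size) {
		/* After patch 1, vaddr always comes from the vzalloc'ed
		 * ring, so translate via the kernel page tables rather
		 * than the linear map. */
		flush_dcache_page(vmalloc_to_page(start));
		start += PAGE_SIZE;
		size -= PAGE_SIZE;
	}
}
```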