diff mbox

[v4] mmc: sdio: check the buffer address for sdio API

Message ID 1486428890-28187-1-git-send-email-shawn.lin@rock-chips.com (mailing list archive)
State New, archived
Headers show

Commit Message

Shawn Lin Feb. 7, 2017, 12:54 a.m. UTC
It's fine if the host driver use PIO mode to transfer the
vmalloc area buffer but not for DMA mode. The sdio APIs haven't
provide the capability to tell the caller whether it will use DMA
to finish the IO transfer or not, so don't give the randomly
insmoded sdio function driver the possibility to break the kernel.
Also the APIs shouldn't take the liberty to do a copy for these
cases and just kick out these requests should be enough.

This issue is observed by doing insmod a downloaded wifi module
driver and the kernel panic right away. Unfortunately we don't have
the source code but adding this patch that it proves that the module
driver was passing on a vmalloc area buffer for sdio APIs.

It could very easy to be reproduced by writing a simple function
module driver and insmod it, and panic log looks like:

unsigned u8 __aligned(32) buf[PAGE_SIZE];

static int wifi_probe(struct sdio_func *func, const struct
sdio_device_id *id)
{
	sdio_claim_host(func);
	sdio_enable_func(func);
	sdio_memcpy_toio(func, 0x0, buf, 0x200);
}

[  236.748210] wifi_probe: buf = 0xffffff8000a40b80
[  236.748258] swiotlb_tbl_map_single: orig_addr = 0xfffffffff8c40b80,
tlb_addr = 0xf7eae000 //I added log here
[  236.748276] Unable to handle kernel paging request at virtual address
fffffffff8c40b80
[  236.776486] pgd = ffffffc0b3417000
[  236.776789] [fffffffff8c40b80] *pgd=00000000b3427003,
*pud=00000000b3427003, *pmd=0000000000000000
[  236.777601] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  236.778093] Modules linked in: drvtst(O+)
[  236.778463] CPU: 0 PID: 1918 Comm: insmod Tainted: G           O
4.4.36 #25
[  236.779096] Hardware name: Rockchip RK3399 Evaluation Board v3 edp
(Android) (DT)
[  236.779753] task: ffffffc0e3db0c40 ti: ffffffc0b342c000 task.ti:
ffffffc0b342c000
[  236.780418] PC is at __memcpy+0x100/0x180
[  236.780777] LR is at swiotlb_tbl_map_single+0x254/0x274
[  236.781237] pc : [<ffffff80083579c0>] lr : [<ffffff800837cf70>]
...

[  236.941392] f460: 3d20726464615f67 6666666666783020
[  236.941826] [<ffffff80083579c0>] __memcpy+0x100/0x180
[  236.942274] [<ffffff800837dc18>] swiotlb_map_sg_attrs+0xa8/0x170
[  236.942810] [<ffffff800809359c>] __swiotlb_map_sg_attrs+0x24/0x8c
[  236.943353] [<ffffff800871113c>]
dw_mci_pre_dma_transfer.isra.16+0xf0/0x11c
[  236.943967] [<ffffff80087129cc>] __dw_mci_start_request+0x17c/0x4d0
[  236.944520] [<ffffff80087132e0>] dw_mci_request+0xb8/0xf0
[  236.945002] [<ffffff80086f8b44>] __mmc_start_request+0x9c/0xc0
[  236.945520] [<ffffff80087ae0e8>]
mmc_start_request.part.17+0x100/0x11c
[  236.946097] [<ffffff80086f9c74>] mmc_wait_for_req+0x78/0x1a8
[  236.946600] [<ffffff800870421c>] mmc_io_rw_extended+0x27c/0x2fc
[  236.947124] [<ffffff8008705800>] sdio_io_rw_ext_helper+0x1e4/0x238
[  236.947670] [<ffffff8008705954>] sdio_memcpy_toio+0x24/0x2c
[  236.948169] [<ffffff8000a40108>] wifi_probe+0xa8/0x198 [drvtst]
[  236.948708] [<ffffff8008704734>] sdio_bus_probe+0xb0/0x140
[  236.949195] [<ffffff80084bfd6c>] driver_probe_device+0x118/0x2b0
[  236.949724] [<ffffff80084bff68>] __driver_attach+0x64/0x90
[  236.950212] [<ffffff80084bee18>] bus_for_each_dev+0x80/0xb0
[  236.950706] [<ffffff80084bf8ac>] driver_attach+0x20/0x28
[  236.951176] [<ffffff80084bf45c>] bus_add_driver+0xe8/0x1ec
[  236.951661] [<ffffff80084c0afc>] driver_register+0x98/0xe4
[  236.952147] [<ffffff8008704580>] sdio_register_driver+0x24/0x2c
[  236.952675] [<ffffff8000a40228>] wifi_sdio_init+0x30/0x68 [drvtst]
[  236.953241] [<ffffff8000a4400c>] wifi_drv_init+0xc/0x38 [drvtst]
[  236.953789] [<ffffff8008082ba4>] do_one_initcall+0x17c/0x198
[  236.954293] [<ffffff800815e564>] do_init_module+0x60/0x1b8
[  236.954779] [<ffffff80081150f0>] load_module+0x1660/0x1a50
[  236.955264] [<ffffff800811562c>] SyS_init_module+0x14c/0x180
[  236.955765] [<ffffff80080826f0>] el0_svc_naked+0x24/0x28

So the kernel crash since the vmalloc area buffer, on-stack buffer or
the on-module buffer aren't physical continuous and swiotlb couldn't find
the correct mapping address in fact. Anyway, don't give the chance to panic
kernel by using sdio APIs.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>

---

Changes in v4:
- fix build warning by gcc-6 (Debian 6.2.0-3) 6.2.0 20160901

Changes in v3:
- fix build issue reported by Kbuild Robot

Changes in v2:
- add new function to check the addr of vmalloc/module/stack

 drivers/mmc/core/sdio_io.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Ulf Hansson Feb. 14, 2017, 9:17 a.m. UTC | #1
+Russell, Jens, Christoph, linux-block (asking for help in review)

On 7 February 2017 at 01:54, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> It's fine if the host driver use PIO mode to transfer the
> vmalloc area buffer but not for DMA mode. The sdio APIs haven't
> provide the capability to tell the caller whether it will use DMA
> to finish the IO transfer or not, so don't give the randomly
> insmoded sdio function driver the possibility to break the kernel.
> Also the APIs shouldn't take the liberty to do a copy for these
> cases and just kick out these requests should be enough.
>
> This issue is observed by doing insmod a downloaded wifi module
> driver and the kernel panic right away. Unfortunately we don't have
> the source code but adding this patch that it proves that the module
> driver was passing on a vmalloc area buffer for sdio APIs.
>
> It could very easy to be reproduced by writing a simple function
> module driver and insmod it, and panic log looks like:
>
> unsigned u8 __aligned(32) buf[PAGE_SIZE];
>
> static int wifi_probe(struct sdio_func *func, const struct
> sdio_device_id *id)
> {
>         sdio_claim_host(func);
>         sdio_enable_func(func);
>         sdio_memcpy_toio(func, 0x0, buf, 0x200);
> }
>
> [  236.748210] wifi_probe: buf = 0xffffff8000a40b80
> [  236.748258] swiotlb_tbl_map_single: orig_addr = 0xfffffffff8c40b80,
> tlb_addr = 0xf7eae000 //I added log here
> [  236.748276] Unable to handle kernel paging request at virtual address
> fffffffff8c40b80
> [  236.776486] pgd = ffffffc0b3417000
> [  236.776789] [fffffffff8c40b80] *pgd=00000000b3427003,
> *pud=00000000b3427003, *pmd=0000000000000000
> [  236.777601] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  236.778093] Modules linked in: drvtst(O+)
> [  236.778463] CPU: 0 PID: 1918 Comm: insmod Tainted: G           O
> 4.4.36 #25
> [  236.779096] Hardware name: Rockchip RK3399 Evaluation Board v3 edp
> (Android) (DT)
> [  236.779753] task: ffffffc0e3db0c40 ti: ffffffc0b342c000 task.ti:
> ffffffc0b342c000
> [  236.780418] PC is at __memcpy+0x100/0x180
> [  236.780777] LR is at swiotlb_tbl_map_single+0x254/0x274
> [  236.781237] pc : [<ffffff80083579c0>] lr : [<ffffff800837cf70>]
> ...
>
> [  236.941392] f460: 3d20726464615f67 6666666666783020
> [  236.941826] [<ffffff80083579c0>] __memcpy+0x100/0x180
> [  236.942274] [<ffffff800837dc18>] swiotlb_map_sg_attrs+0xa8/0x170
> [  236.942810] [<ffffff800809359c>] __swiotlb_map_sg_attrs+0x24/0x8c
> [  236.943353] [<ffffff800871113c>]
> dw_mci_pre_dma_transfer.isra.16+0xf0/0x11c
> [  236.943967] [<ffffff80087129cc>] __dw_mci_start_request+0x17c/0x4d0
> [  236.944520] [<ffffff80087132e0>] dw_mci_request+0xb8/0xf0
> [  236.945002] [<ffffff80086f8b44>] __mmc_start_request+0x9c/0xc0
> [  236.945520] [<ffffff80087ae0e8>]
> mmc_start_request.part.17+0x100/0x11c
> [  236.946097] [<ffffff80086f9c74>] mmc_wait_for_req+0x78/0x1a8
> [  236.946600] [<ffffff800870421c>] mmc_io_rw_extended+0x27c/0x2fc
> [  236.947124] [<ffffff8008705800>] sdio_io_rw_ext_helper+0x1e4/0x238
> [  236.947670] [<ffffff8008705954>] sdio_memcpy_toio+0x24/0x2c
> [  236.948169] [<ffffff8000a40108>] wifi_probe+0xa8/0x198 [drvtst]
> [  236.948708] [<ffffff8008704734>] sdio_bus_probe+0xb0/0x140
> [  236.949195] [<ffffff80084bfd6c>] driver_probe_device+0x118/0x2b0
> [  236.949724] [<ffffff80084bff68>] __driver_attach+0x64/0x90
> [  236.950212] [<ffffff80084bee18>] bus_for_each_dev+0x80/0xb0
> [  236.950706] [<ffffff80084bf8ac>] driver_attach+0x20/0x28
> [  236.951176] [<ffffff80084bf45c>] bus_add_driver+0xe8/0x1ec
> [  236.951661] [<ffffff80084c0afc>] driver_register+0x98/0xe4
> [  236.952147] [<ffffff8008704580>] sdio_register_driver+0x24/0x2c
> [  236.952675] [<ffffff8000a40228>] wifi_sdio_init+0x30/0x68 [drvtst]
> [  236.953241] [<ffffff8000a4400c>] wifi_drv_init+0xc/0x38 [drvtst]
> [  236.953789] [<ffffff8008082ba4>] do_one_initcall+0x17c/0x198
> [  236.954293] [<ffffff800815e564>] do_init_module+0x60/0x1b8
> [  236.954779] [<ffffff80081150f0>] load_module+0x1660/0x1a50
> [  236.955264] [<ffffff800811562c>] SyS_init_module+0x14c/0x180
> [  236.955765] [<ffffff80080826f0>] el0_svc_naked+0x24/0x28
>
> So the kernel crash since the vmalloc area buffer, on-stack buffer or
> the on-module buffer aren't physical continuous and swiotlb couldn't find
> the correct mapping address in fact. Anyway, don't give the chance to panic
> kernel by using sdio APIs.
>
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>
> ---
>
> Changes in v4:
> - fix build warning by gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
>
> Changes in v3:
> - fix build issue reported by Kbuild Robot
>
> Changes in v2:
> - add new function to check the addr of vmalloc/module/stack
>
>  drivers/mmc/core/sdio_io.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/sdio_io.c b/drivers/mmc/core/sdio_io.c
> index 406e5f0..4df7c6f 100644
> --- a/drivers/mmc/core/sdio_io.c
> +++ b/drivers/mmc/core/sdio_io.c
> @@ -10,6 +10,7 @@
>   */
>
>  #include <linux/export.h>
> +#include <linux/mm.h>
>  #include <linux/mmc/host.h>
>  #include <linux/mmc/card.h>
>  #include <linux/mmc/sdio.h>
> @@ -298,6 +299,19 @@ unsigned int sdio_align_size(struct sdio_func *func, unsigned int sz)
>  }
>  EXPORT_SYMBOL_GPL(sdio_align_size);
>
> +static int sdio_io_addr_sanity_check(void *buf)
> +{
> +#if defined(CONFIG_MODULES) && defined(MODULES_VADDR)
> +       unsigned long addr = (unsigned long)buf;
> +
> +       if ((addr >= MODULES_VADDR && addr < MODULES_END) ||
> +           object_is_on_stack(buf))
> +               return 1;
> +#endif
> +       return (is_vmalloc_addr(buf) || object_is_on_stack(buf));
> +
> +}
> +
>  /* Split an arbitrarily sized data transfer into several
>   * IO_RW_EXTENDED commands. */
>  static int sdio_io_rw_ext_helper(struct sdio_func *func, int write,
> @@ -307,7 +321,7 @@ static int sdio_io_rw_ext_helper(struct sdio_func *func, int write,
>         unsigned max_blocks;
>         int ret;
>
> -       if (!func || (func->num > 7))
> +       if (!func || (func->num > 7) || sdio_io_addr_sanity_check(buf))
>                 return -EINVAL;
>
>         /* Do the bulk of the transfer using block mode (if supported). */
> --
> 1.9.1
>
>

I somewhat understand the issue you are trying to solve when
validating the buffer for DMA. However, I don't know what a proper fix
should be. Especially since I have seen nothing similar in the kernel
so far.

Therefore I have looped in Russell King, Jens Axboe, Christoph Hellwig
and linux-block to ask for help in reviewing this. Hopefully we can
get some advise here.

To give some more background to the folkz above, the SDIO func API
(part of the MMC subsystem) is intended to be used by for example WLAN
drivers, which may be built as kernel modules. Part of SDIO func API
is about writing/reading buffers to/from an SDIO card. That may
involve doing DMA transactions, depending on what the corresponding
MMC/SDIO host controller/driver supports.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe Feb. 14, 2017, 4:18 p.m. UTC | #2
On 02/14/2017 02:17 AM, Ulf Hansson wrote:
> +Russell, Jens, Christoph, linux-block (asking for help in review)
> 
> On 7 February 2017 at 01:54, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> It's fine if the host driver use PIO mode to transfer the
>> vmalloc area buffer but not for DMA mode. The sdio APIs haven't
>> provide the capability to tell the caller whether it will use DMA
>> to finish the IO transfer or not, so don't give the randomly
>> insmoded sdio function driver the possibility to break the kernel.
>> Also the APIs shouldn't take the liberty to do a copy for these
>> cases and just kick out these requests should be enough.
>>
>> This issue is observed by doing insmod a downloaded wifi module
>> driver and the kernel panic right away. Unfortunately we don't have
>> the source code but adding this patch that it proves that the module
>> driver was passing on a vmalloc area buffer for sdio APIs.
>>
>> It could very easy to be reproduced by writing a simple function
>> module driver and insmod it, and panic log looks like:
>>
>> unsigned u8 __aligned(32) buf[PAGE_SIZE];
>>
>> static int wifi_probe(struct sdio_func *func, const struct
>> sdio_device_id *id)
>> {
>>         sdio_claim_host(func);
>>         sdio_enable_func(func);
>>         sdio_memcpy_toio(func, 0x0, buf, 0x200);
>> }
>>
>> [  236.748210] wifi_probe: buf = 0xffffff8000a40b80
>> [  236.748258] swiotlb_tbl_map_single: orig_addr = 0xfffffffff8c40b80,
>> tlb_addr = 0xf7eae000 //I added log here
>> [  236.748276] Unable to handle kernel paging request at virtual address
>> fffffffff8c40b80
>> [  236.776486] pgd = ffffffc0b3417000
>> [  236.776789] [fffffffff8c40b80] *pgd=00000000b3427003,
>> *pud=00000000b3427003, *pmd=0000000000000000
>> [  236.777601] Internal error: Oops: 96000005 [#1] PREEMPT SMP
>> [  236.778093] Modules linked in: drvtst(O+)
>> [  236.778463] CPU: 0 PID: 1918 Comm: insmod Tainted: G           O
>> 4.4.36 #25
>> [  236.779096] Hardware name: Rockchip RK3399 Evaluation Board v3 edp
>> (Android) (DT)
>> [  236.779753] task: ffffffc0e3db0c40 ti: ffffffc0b342c000 task.ti:
>> ffffffc0b342c000
>> [  236.780418] PC is at __memcpy+0x100/0x180
>> [  236.780777] LR is at swiotlb_tbl_map_single+0x254/0x274
>> [  236.781237] pc : [<ffffff80083579c0>] lr : [<ffffff800837cf70>]
>> ...
>>
>> [  236.941392] f460: 3d20726464615f67 6666666666783020
>> [  236.941826] [<ffffff80083579c0>] __memcpy+0x100/0x180
>> [  236.942274] [<ffffff800837dc18>] swiotlb_map_sg_attrs+0xa8/0x170
>> [  236.942810] [<ffffff800809359c>] __swiotlb_map_sg_attrs+0x24/0x8c
>> [  236.943353] [<ffffff800871113c>]
>> dw_mci_pre_dma_transfer.isra.16+0xf0/0x11c
>> [  236.943967] [<ffffff80087129cc>] __dw_mci_start_request+0x17c/0x4d0
>> [  236.944520] [<ffffff80087132e0>] dw_mci_request+0xb8/0xf0
>> [  236.945002] [<ffffff80086f8b44>] __mmc_start_request+0x9c/0xc0
>> [  236.945520] [<ffffff80087ae0e8>]
>> mmc_start_request.part.17+0x100/0x11c
>> [  236.946097] [<ffffff80086f9c74>] mmc_wait_for_req+0x78/0x1a8
>> [  236.946600] [<ffffff800870421c>] mmc_io_rw_extended+0x27c/0x2fc
>> [  236.947124] [<ffffff8008705800>] sdio_io_rw_ext_helper+0x1e4/0x238
>> [  236.947670] [<ffffff8008705954>] sdio_memcpy_toio+0x24/0x2c
>> [  236.948169] [<ffffff8000a40108>] wifi_probe+0xa8/0x198 [drvtst]
>> [  236.948708] [<ffffff8008704734>] sdio_bus_probe+0xb0/0x140
>> [  236.949195] [<ffffff80084bfd6c>] driver_probe_device+0x118/0x2b0
>> [  236.949724] [<ffffff80084bff68>] __driver_attach+0x64/0x90
>> [  236.950212] [<ffffff80084bee18>] bus_for_each_dev+0x80/0xb0
>> [  236.950706] [<ffffff80084bf8ac>] driver_attach+0x20/0x28
>> [  236.951176] [<ffffff80084bf45c>] bus_add_driver+0xe8/0x1ec
>> [  236.951661] [<ffffff80084c0afc>] driver_register+0x98/0xe4
>> [  236.952147] [<ffffff8008704580>] sdio_register_driver+0x24/0x2c
>> [  236.952675] [<ffffff8000a40228>] wifi_sdio_init+0x30/0x68 [drvtst]
>> [  236.953241] [<ffffff8000a4400c>] wifi_drv_init+0xc/0x38 [drvtst]
>> [  236.953789] [<ffffff8008082ba4>] do_one_initcall+0x17c/0x198
>> [  236.954293] [<ffffff800815e564>] do_init_module+0x60/0x1b8
>> [  236.954779] [<ffffff80081150f0>] load_module+0x1660/0x1a50
>> [  236.955264] [<ffffff800811562c>] SyS_init_module+0x14c/0x180
>> [  236.955765] [<ffffff80080826f0>] el0_svc_naked+0x24/0x28
>>
>> So the kernel crash since the vmalloc area buffer, on-stack buffer or
>> the on-module buffer aren't physical continuous and swiotlb couldn't find
>> the correct mapping address in fact. Anyway, don't give the chance to panic
>> kernel by using sdio APIs.
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>
>> ---
>>
>> Changes in v4:
>> - fix build warning by gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
>>
>> Changes in v3:
>> - fix build issue reported by Kbuild Robot
>>
>> Changes in v2:
>> - add new function to check the addr of vmalloc/module/stack
>>
>>  drivers/mmc/core/sdio_io.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/core/sdio_io.c b/drivers/mmc/core/sdio_io.c
>> index 406e5f0..4df7c6f 100644
>> --- a/drivers/mmc/core/sdio_io.c
>> +++ b/drivers/mmc/core/sdio_io.c
>> @@ -10,6 +10,7 @@
>>   */
>>
>>  #include <linux/export.h>
>> +#include <linux/mm.h>
>>  #include <linux/mmc/host.h>
>>  #include <linux/mmc/card.h>
>>  #include <linux/mmc/sdio.h>
>> @@ -298,6 +299,19 @@ unsigned int sdio_align_size(struct sdio_func *func, unsigned int sz)
>>  }
>>  EXPORT_SYMBOL_GPL(sdio_align_size);
>>
>> +static int sdio_io_addr_sanity_check(void *buf)
>> +{
>> +#if defined(CONFIG_MODULES) && defined(MODULES_VADDR)
>> +       unsigned long addr = (unsigned long)buf;
>> +
>> +       if ((addr >= MODULES_VADDR && addr < MODULES_END) ||
>> +           object_is_on_stack(buf))
>> +               return 1;
>> +#endif
>> +       return (is_vmalloc_addr(buf) || object_is_on_stack(buf));
>> +
>> +}
>> +
>>  /* Split an arbitrarily sized data transfer into several
>>   * IO_RW_EXTENDED commands. */
>>  static int sdio_io_rw_ext_helper(struct sdio_func *func, int write,
>> @@ -307,7 +321,7 @@ static int sdio_io_rw_ext_helper(struct sdio_func *func, int write,
>>         unsigned max_blocks;
>>         int ret;
>>
>> -       if (!func || (func->num > 7))
>> +       if (!func || (func->num > 7) || sdio_io_addr_sanity_check(buf))
>>                 return -EINVAL;
>>
>>         /* Do the bulk of the transfer using block mode (if supported). */
>> --
>> 1.9.1
>>
>>
> 
> I somewhat understand the issue you are trying to solve when
> validating the buffer for DMA. However, I don't know what a proper fix
> should be. Especially since I have seen nothing similar in the kernel
> so far.
> 
> Therefore I have looped in Russell King, Jens Axboe, Christoph Hellwig
> and linux-block to ask for help in reviewing this. Hopefully we can
> get some advise here.
> 
> To give some more background to the folkz above, the SDIO func API
> (part of the MMC subsystem) is intended to be used by for example WLAN
> drivers, which may be built as kernel modules. Part of SDIO func API
> is about writing/reading buffers to/from an SDIO card. That may
> involve doing DMA transactions, depending on what the corresponding
> MMC/SDIO host controller/driver supports.

The current situation seems like a bit of a mess. Why don't you have two
entry points, one for DMA and one for PIO. If the caller doesn't know if
he can use DMA, he'd better call the PIO variant. Either that, or audit
all callers and ensure they do the right thing wrt having a dma capable
buffer.

A check for can-do-dma should be restricted to size/alignment
constraints based on the hardware, not try and catch all weird cases of
whether or not that buffer is on the stack, heap, etc.
Russell King (Oracle) Feb. 14, 2017, 7:34 p.m. UTC | #3
On Tue, Feb 14, 2017 at 09:18:43AM -0700, Jens Axboe wrote:
> The current situation seems like a bit of a mess. Why don't you have two
> entry points, one for DMA and one for PIO. If the caller doesn't know if
> he can use DMA, he'd better call the PIO variant. Either that, or audit
> all callers and ensure they do the right thing wrt having a dma capable
> buffer.

It really shouldn't matter.  MMC interfaces are just like USB - you
have a host controller, which interfaces what is a multi-lane serial
bus to the system.  The SDIO card shouldn't care one bit whether
the host controller is using DMA or PIO.

However, I think that the DMA vs PIO thing is actually misleading here,
that's really not the issue at all.

Looking at the oops, I see it uses sdio_memcpy_toio().  Tracing that
code leads me to here:

                for_each_sg(data.sg, sg_ptr, data.sg_len, i) {
                        sg_set_page(sg_ptr, virt_to_page(buf + (i * seg_size)),
                                        min(seg_size, left_size),
                                        offset_in_page(buf + (i * seg_size)));

so the buffer that is passed into sdio_memcpy_toio() gets passed into
virt_to_page().

Firstly, the fact that it's passed to virt_to_page() means that "buf"
must _only_ _ever_ be a lowmem address.  It can't ever be a vmalloc
address (virt_to_page() is invalid on anything but lowmem.)  Just like
certain kernel interfaces, passing pointers to memory of different types
from the one intended by the interface produces invalid results, and
that seems to be what's happening here.

Secondly, it's a scatterlist, and scatterlists can be passed to DMA
mapping operations, which also implies that _if_ a host driver decides
to use DMA on it, the buffer better be DMA-able.

Thirdly, while PIO may work (or even appear to work) because _maybe_
converting a vmalloc address to a ficticious struct page + offset, and
then converting that back again _might_ result in hitting the correct
memory, but it's not guaranteed to.

I suspect that virt_to_page() + kmap_atomic() is likely to try to
dereference a struct page pointer that does not point at a legal entry
in the memmap arrays, and result in scribbling over some random part
of kernel memory.

So every way I look at this, the binary driver that Shawn downloaded
is buggy, whether the host controller uses PIO or DMA.

I bet if Shawn tries running it against a modern kernel with
CONFIG_DEBUG_VIRTUAL enabled, Shawn will get complaints backing up
my claim.
Christoph Hellwig Feb. 14, 2017, 7:45 p.m. UTC | #4
On Tue, Feb 07, 2017 at 08:54:50AM +0800, Shawn Lin wrote:
> It's fine if the host driver use PIO mode to transfer the
> vmalloc area buffer but not for DMA mode. The sdio APIs haven't
> provide the capability to tell the caller whether it will use DMA
> to finish the IO transfer or not,

Wether you dma or pio does not matter.  Addressability requirements
are slightly different, but it's nothing your patch is going to help
with.

> so don't give the randomly
> insmoded sdio function driver the possibility to break the kernel.
> Also the APIs shouldn't take the liberty to do a copy for these
> cases and just kick out these requests should be enough.
> 
> This issue is observed by doing insmod a downloaded wifi module
> driver and the kernel panic right away. Unfortunately we don't have
> the source code but adding this patch that it proves that the module
> driver was passing on a vmalloc area buffer for sdio APIs.

So don't use that illegally redistributed driver.  Working around it
is certainly nothing the upstream kernel cares about at all.
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shawn Lin Feb. 15, 2017, 4:12 a.m. UTC | #5
Hi Russell,

On 2017/2/15 3:34, Russell King - ARM Linux wrote:
> On Tue, Feb 14, 2017 at 09:18:43AM -0700, Jens Axboe wrote:
>> The current situation seems like a bit of a mess. Why don't you have two
>> entry points, one for DMA and one for PIO. If the caller doesn't know if
>> he can use DMA, he'd better call the PIO variant. Either that, or audit
>> all callers and ensure they do the right thing wrt having a dma capable
>> buffer.
>
> It really shouldn't matter.  MMC interfaces are just like USB - you
> have a host controller, which interfaces what is a multi-lane serial
> bus to the system.  The SDIO card shouldn't care one bit whether
> the host controller is using DMA or PIO.
>
> However, I think that the DMA vs PIO thing is actually misleading here,
> that's really not the issue at all.
>
> Looking at the oops, I see it uses sdio_memcpy_toio().  Tracing that
> code leads me to here:
>
>                 for_each_sg(data.sg, sg_ptr, data.sg_len, i) {
>                         sg_set_page(sg_ptr, virt_to_page(buf + (i * seg_size)),
>                                         min(seg_size, left_size),
>                                         offset_in_page(buf + (i * seg_size)));
>
> so the buffer that is passed into sdio_memcpy_toio() gets passed into
> virt_to_page().
>
> Firstly, the fact that it's passed to virt_to_page() means that "buf"
> must _only_ _ever_ be a lowmem address.  It can't ever be a vmalloc
> address (virt_to_page() is invalid on anything but lowmem.)  Just like
> certain kernel interfaces, passing pointers to memory of different types
> from the one intended by the interface produces invalid results, and
> that seems to be what's happening here.
>
> Secondly, it's a scatterlist, and scatterlists can be passed to DMA
> mapping operations, which also implies that _if_ a host driver decides
> to use DMA on it, the buffer better be DMA-able.
>
> Thirdly, while PIO may work (or even appear to work) because _maybe_
> converting a vmalloc address to a ficticious struct page + offset, and
> then converting that back again _might_ result in hitting the correct
> memory, but it's not guaranteed to.
>

[1]:
If no DMA involved, the host drivers usually use memcpy or readl/writel
to transfer the data between MMIO address and buffer coming from the
caller. So, is it also not guaranteed when using memcpy or readl to
transfer data between MMIO address and vmalloc/heap buffer?

> I suspect that virt_to_page() + kmap_atomic() is likely to try to
> dereference a struct page pointer that does not point at a legal entry
> in the memmap arrays, and result in scribbling over some random part
> of kernel memory.

If that is the fact, so what I am concerned mostly is that by
seraching the APIs, sdio_writeb and sdio_readb, under the drivers/net
/wireless/, I could see almost all sdio based WLAN drivers passed in
a vmalloc area(actually when built as moudle, it should be located in
MODULE range which also be included as vmalloc area, no?) or heap
buffer.

I assume my question[1] above is fine, then thanks to none of the mmc
host drivers use DMA for sdio_writeb and sdio_readb since it only
request one byte which didn't be fetched from host FIFO and the host
controller HW didn't support this kind of request to use DMA(but may be
not in the future). Otherwise, it may result in scribbling over some
random part of kernel memory.

Actually we didn't see that issues before, so I think the actual
question should be if the buffer from heap or vmalloc will be used with 
DMA involved?


>
> So every way I look at this, the binary driver that Shawn downloaded
> is buggy, whether the host controller uses PIO or DMA.

So it's really more dangerous that we were/are taking risking of
scribbling over some memory belonging to other code even if using
PIO if it also not guaranteed to use heap/vmalloc buffer..

>
> I bet if Shawn tries running it against a modern kernel with
> CONFIG_DEBUG_VIRTUAL enabled, Shawn will get complaints backing up
> my claim.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King (Oracle) Feb. 15, 2017, 9:55 a.m. UTC | #6
On Wed, Feb 15, 2017 at 12:12:47PM +0800, Shawn Lin wrote:
> Hi Russell,
> 
> On 2017/2/15 3:34, Russell King - ARM Linux wrote:
> >On Tue, Feb 14, 2017 at 09:18:43AM -0700, Jens Axboe wrote:
> >>The current situation seems like a bit of a mess. Why don't you have two
> >>entry points, one for DMA and one for PIO. If the caller doesn't know if
> >>he can use DMA, he'd better call the PIO variant. Either that, or audit
> >>all callers and ensure they do the right thing wrt having a dma capable
> >>buffer.
> >
> >It really shouldn't matter.  MMC interfaces are just like USB - you
> >have a host controller, which interfaces what is a multi-lane serial
> >bus to the system.  The SDIO card shouldn't care one bit whether
> >the host controller is using DMA or PIO.
> >
> >However, I think that the DMA vs PIO thing is actually misleading here,
> >that's really not the issue at all.
> >
> >Looking at the oops, I see it uses sdio_memcpy_toio().  Tracing that
> >code leads me to here:
> >
> >                for_each_sg(data.sg, sg_ptr, data.sg_len, i) {
> >                        sg_set_page(sg_ptr, virt_to_page(buf + (i * seg_size)),
> >                                        min(seg_size, left_size),
> >                                        offset_in_page(buf + (i * seg_size)));
> >
> >so the buffer that is passed into sdio_memcpy_toio() gets passed into
> >virt_to_page().
> >
> >Firstly, the fact that it's passed to virt_to_page() means that "buf"
> >must _only_ _ever_ be a lowmem address.  It can't ever be a vmalloc
> >address (virt_to_page() is invalid on anything but lowmem.)  Just like
> >certain kernel interfaces, passing pointers to memory of different types
> >from the one intended by the interface produces invalid results, and
> >that seems to be what's happening here.
> >
> >Secondly, it's a scatterlist, and scatterlists can be passed to DMA
> >mapping operations, which also implies that _if_ a host driver decides
> >to use DMA on it, the buffer better be DMA-able.
> >
> >Thirdly, while PIO may work (or even appear to work) because _maybe_
> >converting a vmalloc address to a ficticious struct page + offset, and
> >then converting that back again _might_ result in hitting the correct
> >memory, but it's not guaranteed to.
> >
> 
> [1]:
> If no DMA involved, the host drivers usually use memcpy or readl/writel
> to transfer the data between MMIO address and buffer coming from the
> caller. So, is it also not guaranteed when using memcpy or readl to
> transfer data between MMIO address and vmalloc/heap buffer?

The point here is, if buf is a vmalloc address:

	v = kmap_atomic(virt_to_page(buf))

v may be _anything_ at all.  The kmap_atomic() will be done by the MMC
host driver, the virt_to_page() is done by the code I quoted above.

v may be a lowmem page address.  v may be somewhere in userspace.  v may
be some device mapping.  v may be (if you're lucky) the same address as
"buf".  There's no guarantees what it will be.

Note that the scatterlist above does _not_ store the virtual address
that was passed into it, but only the struct page, offset and length.
So, drivers can not know what the original virtual address was.

> >I suspect that virt_to_page() + kmap_atomic() is likely to try to
> >dereference a struct page pointer that does not point at a legal entry
> >in the memmap arrays, and result in scribbling over some random part
> >of kernel memory.
> 
> If that is the fact, so what I am concerned mostly is that by
> seraching the APIs, sdio_writeb and sdio_readb, under the drivers/net
> /wireless/, I could see almost all sdio based WLAN drivers passed in
> a vmalloc area(actually when built as moudle, it should be located in
> MODULE range which also be included as vmalloc area, no?) or heap
> buffer.

sdio_readb() and sdio_writeb() convey the data in the command stream,
not the data stream.  Firstly, sdio_writeb() takes the actual value to
be written, so the memory storage is irrelevant (on many platforms, it
will be passed as a value in CPU registers.)  Eventually, it's written
into the MMC command structure as the command argument, which typically
ends up being written to a host controller register.

In the case of sdio_readb(), the returned value comes from the command
status, which is typically read from registers in the host controller.
mmc_io_rw_direct_host() writes the value to the passed address.  Hence,
the host controller, again, never sees the address.

> I assume my question[1] above is fine, then thanks to none of the mmc
> host drivers use DMA for sdio_writeb and sdio_readb since it only
> request one byte which didn't be fetched from host FIFO and the host
> controller HW didn't support this kind of request to use DMA(but may be
> not in the future). Otherwise, it may result in scribbling over some
> random part of kernel memory.

See above, your understanding of how sdio_readb() and sdio_writeb() is
not correct.  These are single value transfer functions, using
mmc_io_rw_direct() as the underlying method of access, and do not
transfer anything over the data lines.

These functions are quite different from sdio_memcpy_toio(),
sdio_memcpy_fromio(), sdio_readsb() and sdio_writesb(), which are
multiple value transfer functions, and these perform the transfer using
the data lines.  These use sdio_io_rw_ext_helper(), which then uses
mmc_io_rw_extended() to perform the transfer.  The code I quoted in my
original email is from mmc_io_rw_extended().

The command path is entirely separate from the data path in most MMC
host controllers.  The command path is commonly PIO, the data path is
commonly DMA, but host controllers _may_ choose PIO for small transfers
or when they have no DMA support.
diff mbox

Patch

diff --git a/drivers/mmc/core/sdio_io.c b/drivers/mmc/core/sdio_io.c
index 406e5f0..4df7c6f 100644
--- a/drivers/mmc/core/sdio_io.c
+++ b/drivers/mmc/core/sdio_io.c
@@ -10,6 +10,7 @@ 
  */
 
 #include <linux/export.h>
+#include <linux/mm.h>
 #include <linux/mmc/host.h>
 #include <linux/mmc/card.h>
 #include <linux/mmc/sdio.h>
@@ -298,6 +299,19 @@  unsigned int sdio_align_size(struct sdio_func *func, unsigned int sz)
 }
 EXPORT_SYMBOL_GPL(sdio_align_size);
 
+static int sdio_io_addr_sanity_check(void *buf)
+{
+#if defined(CONFIG_MODULES) && defined(MODULES_VADDR)
+	unsigned long addr = (unsigned long)buf;
+
+	if ((addr >= MODULES_VADDR && addr < MODULES_END) ||
+	    object_is_on_stack(buf))
+		return 1;
+#endif
+	return (is_vmalloc_addr(buf) || object_is_on_stack(buf));
+
+}
+
 /* Split an arbitrarily sized data transfer into several
  * IO_RW_EXTENDED commands. */
 static int sdio_io_rw_ext_helper(struct sdio_func *func, int write,
@@ -307,7 +321,7 @@  static int sdio_io_rw_ext_helper(struct sdio_func *func, int write,
 	unsigned max_blocks;
 	int ret;
 
-	if (!func || (func->num > 7))
+	if (!func || (func->num > 7) || sdio_io_addr_sanity_check(buf))
 		return -EINVAL;
 
 	/* Do the bulk of the transfer using block mode (if supported). */