Message ID | 20191220063829.11873-1-wgong@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ath10k: select the min len of RX skb for sdio | expand |
Does this mess up alignment or anything, feels wrong to just take the smaller but I don't understand why it isn't the correct size, why is it? Should a warning be printed? On Thu, Dec 19, 2019 at 10:39 PM Wen Gong <wgong@codeaurora.org> wrote: > > When simulate random transfer fail for sdio write and read, it crash > sometimes. > > Test steps: > 1. Add config and update kernel: > CONFIG_FAIL_MMC_REQUEST=y > CONFIG_FAULT_INJECTION=y > CONFIG_FAULT_INJECTION_DEBUG_FS=y > > 2. run simulate fail: > cd /sys/kernel/debug/mmc1/fail_mmc_request > echo 10 > probability > echo 10 > times # repeat until hitting issues > > 3. it crash, the act len of ath10k_htc_hdr is higher than allocate len, it cause panic: > [ 99.723482] skbuff: skb_over_panic: text:00000000caa0f780 len:57013 put:57013 head:000000004116f24a data:0000000019ecb4dc tail:0xdef5 end:0x640 dev:<NULL> > [ 99.737697] ------------[ cut here ]------------ > [ 99.742327] kernel BUG at /mnt/host/source/src/third_party/kernel/v4.19/net/core/skbuff.c:104! > [ 99.750937] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > [ 99.831154] Process kworker/0:2 (pid: 151, stack limit = 0x00000000728010bf) > [ 99.838200] CPU: 0 PID: 151 Comm: kworker/0:2 Tainted: G W 4.19.85 #48 > [ 99.846022] Hardware name: MediaTek krane sku0 board (DT) > [ 99.851429] Workqueue: events sdio_irq_work > [ 99.855614] pstate: 60000005 (nZCv daif -PAN -UAO) > [ 99.860402] pc : skb_panic+0x64/0x68 > [ 99.863974] lr : skb_panic+0x64/0x68 > [ 99.867542] sp : ffffff8008833a90 > [ 99.870850] x29: ffffff8008833ac0 x28: ffffffe52e337370 > [ 99.876159] x27: ffffffe52e328a90 x26: 000000000000e0d0 > [ 99.881469] x25: ffffffe52e336b60 x24: 000000000000deb5 > [ 99.886779] x23: ffffffe52e340680 x22: ffffffe4efd47e00 > [ 99.892088] x21: 000000000000deb5 x20: ffffffa516d85b4c > [ 99.897397] x19: ffffffa526928037 x18: 0000000000000000 > [ 99.902706] x17: 000000000000003c x16: ffffffa5265b6c80 > [ 99.908015] x15: 0000000000000006 x14: 3a76656420303436 > [ 99.913325] x13: 0000000000029bf0 x12: 0000000000000000 > [ 99.918634] x11: 0000000000000000 x10: 0000000000000000 > [ 99.923943] x9 : a3b907e4b2783000 x8 : a3b907e4b2783000 > [ 99.929253] x7 : 0000000000000000 x6 : ffffffa526f66d76 > [ 99.934563] x5 : 0000000000000000 x4 : 0000000000000000 > [ 99.939872] x3 : 000000000002a5ab x2 : ffffffe53feed918 > [ 99.945182] x1 : ffffffe53fee4a08 x0 : 000000000000008e > [ 99.950491] Call trace: > [ 99.952937] skb_panic+0x64/0x68 > [ 99.956165] skb_put+0x7c/0x84 > [ 99.959224] ath10k_sdio_irq_handler+0x740/0xbb8 [ath10k_sdio] > [ 99.965055] process_sdio_pending_irqs+0x58/0x1a4 > [ 99.969758] sdio_run_irqs+0x34/0x60 > [ 99.973329] sdio_irq_work+0x1c/0x28 > [ 99.974930] cros-ec-spi spi2.0: SPI transfer timed out > [ 99.976904] process_one_work+0x210/0x410 > [ 99.976911] worker_thread+0x234/0x3dc > [ 99.976923] kthread+0x120/0x130 > [ 99.982090] cros-ec-spi spi2.0: spi transfer failed: -110 > [ 99.986054] ret_from_fork+0x10/0x18 > [ 99.986063] Code: aa1403e2 2a1503e4 a90023e9 97e37d1a (d4210000) > [ 99.986068] ---[ end trace cb6d948c5a0fd6c7 ]--- > [ 100.017250] Kernel panic - not syncing: Fatal exception > [ 100.018879] cros-ec-spi spi2.0: Command xfer error (err:-110) > [ 100.023659] SMP: stopping secondary CPUs > [ 100.023703] Kernel Offset: 0x251dc00000 from 0xffffff8008000000 > [ 100.023707] CPU features: 0x0,2188200c > [ 100.023709] Memory Limit: none > > The simulate fail of sdio is not a real sdio transter fail, it only > set an error status in mmc_should_fail_request after the transfer end, > actually the transfer is success, then sdio_io_rw_ext_helper will > return error status and stop transfer the left data. For example, > the really RX len is 286 bytes, then it will split to 2 blocks in > sdio_io_rw_ext_helper, one is 256 bytes, left is 30 bytes, if the > first 256 bytes get an error status by mmc_should_fail_request,then > the left 30 bytes will not read in this RX operation. Then when the > next RX arrive, the left 30 bytes will be considered as the header > of the read, the top 8 bytes will be considered as ath10k_htc_hdr, > but actually the 8 bytes is not the ath10k_htc_hdr, so the act_len > from this ath10k_htc_hdr is not correct, if it is a big value, such > as 57013, it will trigger skb_panic. > > Select the min value of act_len and alloc_len for skb_put and > skb_put_data is reasonable. > > This patch only effect sdio chips. > > Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029. > > Signed-off-by: Wen Gong <wgong@codeaurora.org> > --- > drivers/net/wireless/ath/ath10k/sdio.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c > index 7b894dcaad2e..71ce2674c543 100644 > --- a/drivers/net/wireless/ath/ath10k/sdio.c > +++ b/drivers/net/wireless/ath/ath10k/sdio.c > @@ -651,7 +651,7 @@ static int ath10k_sdio_mbox_rx_fetch(struct ath10k *ar) > > htc_hdr = (struct ath10k_htc_hdr *)skb->data; > pkt->act_len = le16_to_cpu(htc_hdr->len) + sizeof(*htc_hdr); > - skb_put(skb, pkt->act_len); > + skb_put(skb, min_t(size_t, pkt->alloc_len, pkt->act_len)); > > return ret; > } > @@ -687,7 +687,8 @@ static int ath10k_sdio_mbox_rx_fetch_bundle(struct ath10k *ar) > htc_hdr = (struct ath10k_htc_hdr *)(ar_sdio->vsg_buffer + pkt_offset); > pkt->act_len = le16_to_cpu(htc_hdr->len) + sizeof(*htc_hdr); > > - skb_put_data(pkt->skb, htc_hdr, pkt->act_len); > + skb_put_data(pkt->skb, htc_hdr, > + min_t(size_t, pkt->alloc_len, pkt->act_len)); > pkt_offset += pkt->alloc_len; > } > > -- > 2.23.0
On 2019-12-20 15:35, Justin Capella wrote: > Does this mess up alignment or anything, feels wrong to just take the > smaller but I don't understand why it isn't the correct size, why is > it? Should a warning be printed? > When tested with the simulate sdio transfer fail test, it will trigger the len is not correct, the commit log have description for the reason of wrong size. > On Thu, Dec 19, 2019 at 10:39 PM Wen Gong <wgong@codeaurora.org> wrote: >> >> When simulate random transfer fail for sdio write and read, it crash >> sometimes. >> >> Test steps: >> 1. Add config and update kernel: >> CONFIG_FAIL_MMC_REQUEST=y >> CONFIG_FAULT_INJECTION=y >> CONFIG_FAULT_INJECTION_DEBUG_FS=y >> >> 2. run simulate fail: >> cd /sys/kernel/debug/mmc1/fail_mmc_request >> echo 10 > probability >> echo 10 > times # repeat until hitting issues >> >> 3. it crash, the act len of ath10k_htc_hdr is higher than allocate >> len, it cause panic: >> [ 99.723482] skbuff: skb_over_panic: text:00000000caa0f780 len:57013 >> put:57013 head:000000004116f24a data:0000000019ecb4dc tail:0xdef5 >> end:0x640 dev:<NULL> >> [ 99.737697] ------------[ cut here ]------------ >> [ 99.742327] kernel BUG at >> /mnt/host/source/src/third_party/kernel/v4.19/net/core/skbuff.c:104! >> [ 99.750937] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP >> [ 99.831154] Process kworker/0:2 (pid: 151, stack limit = >> 0x00000000728010bf) >> [ 99.838200] CPU: 0 PID: 151 Comm: kworker/0:2 Tainted: G W 4.19.85 >> #48 >> [ 99.846022] Hardware name: MediaTek krane sku0 board (DT) >> [ 99.851429] Workqueue: events sdio_irq_work >> [ 99.855614] pstate: 60000005 (nZCv daif -PAN -UAO) >> [ 99.860402] pc : skb_panic+0x64/0x68 >> [ 99.863974] lr : skb_panic+0x64/0x68 >> [ 99.867542] sp : ffffff8008833a90 >> [ 99.870850] x29: ffffff8008833ac0 x28: ffffffe52e337370 >> [ 99.876159] x27: ffffffe52e328a90 x26: 000000000000e0d0 >> [ 99.881469] x25: ffffffe52e336b60 x24: 000000000000deb5 >> [ 99.886779] x23: ffffffe52e340680 x22: ffffffe4efd47e00 >> [ 99.892088] x21: 000000000000deb5 x20: ffffffa516d85b4c >> [ 99.897397] x19: ffffffa526928037 x18: 0000000000000000 >> [ 99.902706] x17: 000000000000003c x16: ffffffa5265b6c80 >> [ 99.908015] x15: 0000000000000006 x14: 3a76656420303436 >> [ 99.913325] x13: 0000000000029bf0 x12: 0000000000000000 >> [ 99.918634] x11: 0000000000000000 x10: 0000000000000000 >> [ 99.923943] x9 : a3b907e4b2783000 x8 : a3b907e4b2783000 >> [ 99.929253] x7 : 0000000000000000 x6 : ffffffa526f66d76 >> [ 99.934563] x5 : 0000000000000000 x4 : 0000000000000000 >> [ 99.939872] x3 : 000000000002a5ab x2 : ffffffe53feed918 >> [ 99.945182] x1 : ffffffe53fee4a08 x0 : 000000000000008e >> [ 99.950491] Call trace: >> [ 99.952937] skb_panic+0x64/0x68 >> [ 99.956165] skb_put+0x7c/0x84 >> [ 99.959224] ath10k_sdio_irq_handler+0x740/0xbb8 [ath10k_sdio] >> [ 99.965055] process_sdio_pending_irqs+0x58/0x1a4 >> [ 99.969758] sdio_run_irqs+0x34/0x60 >> [ 99.973329] sdio_irq_work+0x1c/0x28 >> [ 99.974930] cros-ec-spi spi2.0: SPI transfer timed out >> [ 99.976904] process_one_work+0x210/0x410 >> [ 99.976911] worker_thread+0x234/0x3dc >> [ 99.976923] kthread+0x120/0x130 >> [ 99.982090] cros-ec-spi spi2.0: spi transfer failed: -110 >> [ 99.986054] ret_from_fork+0x10/0x18 >> [ 99.986063] Code: aa1403e2 2a1503e4 a90023e9 97e37d1a (d4210000) >> [ 99.986068] ---[ end trace cb6d948c5a0fd6c7 ]--- >> [ 100.017250] Kernel panic - not syncing: Fatal exception >> [ 100.018879] cros-ec-spi spi2.0: Command xfer error (err:-110) >> [ 100.023659] SMP: stopping secondary CPUs >> [ 100.023703] Kernel Offset: 0x251dc00000 from 0xffffff8008000000 >> [ 100.023707] CPU features: 0x0,2188200c >> [ 100.023709] Memory Limit: none >> >> The simulate fail of sdio is not a real sdio transter fail, it only >> set an error status in mmc_should_fail_request after the transfer end, >> actually the transfer is success, then sdio_io_rw_ext_helper will >> return error status and stop transfer the left data. For example, >> the really RX len is 286 bytes, then it will split to 2 blocks in >> sdio_io_rw_ext_helper, one is 256 bytes, left is 30 bytes, if the >> first 256 bytes get an error status by mmc_should_fail_request,then >> the left 30 bytes will not read in this RX operation. Then when the >> next RX arrive, the left 30 bytes will be considered as the header >> of the read, the top 8 bytes will be considered as ath10k_htc_hdr, >> but actually the 8 bytes is not the ath10k_htc_hdr, so the act_len >> from this ath10k_htc_hdr is not correct, if it is a big value, such >> as 57013, it will trigger skb_panic. >> >> Select the min value of act_len and alloc_len for skb_put and >> skb_put_data is reasonable. >> >> This patch only effect sdio chips. >> >> Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029. >> >> Signed-off-by: Wen Gong <wgong@codeaurora.org> >> --- >> drivers/net/wireless/ath/ath10k/sdio.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/wireless/ath/ath10k/sdio.c >> b/drivers/net/wireless/ath/ath10k/sdio.c >> index 7b894dcaad2e..71ce2674c543 100644 >> --- a/drivers/net/wireless/ath/ath10k/sdio.c >> +++ b/drivers/net/wireless/ath/ath10k/sdio.c >> @@ -651,7 +651,7 @@ static int ath10k_sdio_mbox_rx_fetch(struct ath10k >> *ar) >> >> htc_hdr = (struct ath10k_htc_hdr *)skb->data; >> pkt->act_len = le16_to_cpu(htc_hdr->len) + sizeof(*htc_hdr); >> - skb_put(skb, pkt->act_len); >> + skb_put(skb, min_t(size_t, pkt->alloc_len, pkt->act_len)); >> >> return ret; >> } >> @@ -687,7 +687,8 @@ static int ath10k_sdio_mbox_rx_fetch_bundle(struct >> ath10k *ar) >> htc_hdr = (struct ath10k_htc_hdr >> *)(ar_sdio->vsg_buffer + pkt_offset); >> pkt->act_len = le16_to_cpu(htc_hdr->len) + >> sizeof(*htc_hdr); >> >> - skb_put_data(pkt->skb, htc_hdr, pkt->act_len); >> + skb_put_data(pkt->skb, htc_hdr, >> + min_t(size_t, pkt->alloc_len, >> pkt->act_len)); >> pkt_offset += pkt->alloc_len; >> } >> >> -- >> 2.23.0
If the data is invalid I think it should be discarded and a warning displayed, it is just as bad to pass too little data isn't it? Then you potentially have uninitialized data? Or pointers to nonsense? There is a good chance I am misunderstanding though, I just wanted to make sure you had considered those cases and the word alignment if that applies to this type of bus? On Fri, Dec 20, 2019 at 12:57 AM <wgong@codeaurora.org> wrote: > > On 2019-12-20 15:35, Justin Capella wrote: > > Does this mess up alignment or anything, feels wrong to just take the > > smaller but I don't understand why it isn't the correct size, why is > > it? Should a warning be printed? > > > When tested with the simulate sdio transfer fail test, it will trigger > the len is not correct, > the commit log have description for the reason of wrong size. > > > On Thu, Dec 19, 2019 at 10:39 PM Wen Gong <wgong@codeaurora.org> wrote: > >> > >> When simulate random transfer fail for sdio write and read, it crash > >> sometimes. > >> > >> Test steps: > >> 1. Add config and update kernel: > >> CONFIG_FAIL_MMC_REQUEST=y > >> CONFIG_FAULT_INJECTION=y > >> CONFIG_FAULT_INJECTION_DEBUG_FS=y > >> > >> 2. run simulate fail: > >> cd /sys/kernel/debug/mmc1/fail_mmc_request > >> echo 10 > probability > >> echo 10 > times # repeat until hitting issues > >> > >> 3. it crash, the act len of ath10k_htc_hdr is higher than allocate > >> len, it cause panic: > >> [ 99.723482] skbuff: skb_over_panic: text:00000000caa0f780 len:57013 > >> put:57013 head:000000004116f24a data:0000000019ecb4dc tail:0xdef5 > >> end:0x640 dev:<NULL> > >> [ 99.737697] ------------[ cut here ]------------ > >> [ 99.742327] kernel BUG at > >> /mnt/host/source/src/third_party/kernel/v4.19/net/core/skbuff.c:104! > >> [ 99.750937] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > >> [ 99.831154] Process kworker/0:2 (pid: 151, stack limit = > >> 0x00000000728010bf) > >> [ 99.838200] CPU: 0 PID: 151 Comm: kworker/0:2 Tainted: G W 4.19.85 > >> #48 > >> [ 99.846022] Hardware name: MediaTek krane sku0 board (DT) > >> [ 99.851429] Workqueue: events sdio_irq_work > >> [ 99.855614] pstate: 60000005 (nZCv daif -PAN -UAO) > >> [ 99.860402] pc : skb_panic+0x64/0x68 > >> [ 99.863974] lr : skb_panic+0x64/0x68 > >> [ 99.867542] sp : ffffff8008833a90 > >> [ 99.870850] x29: ffffff8008833ac0 x28: ffffffe52e337370 > >> [ 99.876159] x27: ffffffe52e328a90 x26: 000000000000e0d0 > >> [ 99.881469] x25: ffffffe52e336b60 x24: 000000000000deb5 > >> [ 99.886779] x23: ffffffe52e340680 x22: ffffffe4efd47e00 > >> [ 99.892088] x21: 000000000000deb5 x20: ffffffa516d85b4c > >> [ 99.897397] x19: ffffffa526928037 x18: 0000000000000000 > >> [ 99.902706] x17: 000000000000003c x16: ffffffa5265b6c80 > >> [ 99.908015] x15: 0000000000000006 x14: 3a76656420303436 > >> [ 99.913325] x13: 0000000000029bf0 x12: 0000000000000000 > >> [ 99.918634] x11: 0000000000000000 x10: 0000000000000000 > >> [ 99.923943] x9 : a3b907e4b2783000 x8 : a3b907e4b2783000 > >> [ 99.929253] x7 : 0000000000000000 x6 : ffffffa526f66d76 > >> [ 99.934563] x5 : 0000000000000000 x4 : 0000000000000000 > >> [ 99.939872] x3 : 000000000002a5ab x2 : ffffffe53feed918 > >> [ 99.945182] x1 : ffffffe53fee4a08 x0 : 000000000000008e > >> [ 99.950491] Call trace: > >> [ 99.952937] skb_panic+0x64/0x68 > >> [ 99.956165] skb_put+0x7c/0x84 > >> [ 99.959224] ath10k_sdio_irq_handler+0x740/0xbb8 [ath10k_sdio] > >> [ 99.965055] process_sdio_pending_irqs+0x58/0x1a4 > >> [ 99.969758] sdio_run_irqs+0x34/0x60 > >> [ 99.973329] sdio_irq_work+0x1c/0x28 > >> [ 99.974930] cros-ec-spi spi2.0: SPI transfer timed out > >> [ 99.976904] process_one_work+0x210/0x410 > >> [ 99.976911] worker_thread+0x234/0x3dc > >> [ 99.976923] kthread+0x120/0x130 > >> [ 99.982090] cros-ec-spi spi2.0: spi transfer failed: -110 > >> [ 99.986054] ret_from_fork+0x10/0x18 > >> [ 99.986063] Code: aa1403e2 2a1503e4 a90023e9 97e37d1a (d4210000) > >> [ 99.986068] ---[ end trace cb6d948c5a0fd6c7 ]--- > >> [ 100.017250] Kernel panic - not syncing: Fatal exception > >> [ 100.018879] cros-ec-spi spi2.0: Command xfer error (err:-110) > >> [ 100.023659] SMP: stopping secondary CPUs > >> [ 100.023703] Kernel Offset: 0x251dc00000 from 0xffffff8008000000 > >> [ 100.023707] CPU features: 0x0,2188200c > >> [ 100.023709] Memory Limit: none > >> > >> The simulate fail of sdio is not a real sdio transter fail, it only > >> set an error status in mmc_should_fail_request after the transfer end, > >> actually the transfer is success, then sdio_io_rw_ext_helper will > >> return error status and stop transfer the left data. For example, > >> the really RX len is 286 bytes, then it will split to 2 blocks in > >> sdio_io_rw_ext_helper, one is 256 bytes, left is 30 bytes, if the > >> first 256 bytes get an error status by mmc_should_fail_request,then > >> the left 30 bytes will not read in this RX operation. Then when the > >> next RX arrive, the left 30 bytes will be considered as the header > >> of the read, the top 8 bytes will be considered as ath10k_htc_hdr, > >> but actually the 8 bytes is not the ath10k_htc_hdr, so the act_len > >> from this ath10k_htc_hdr is not correct, if it is a big value, such > >> as 57013, it will trigger skb_panic. > >> > >> Select the min value of act_len and alloc_len for skb_put and > >> skb_put_data is reasonable. > >> > >> This patch only effect sdio chips. > >> > >> Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029. > >> > >> Signed-off-by: Wen Gong <wgong@codeaurora.org> > >> --- > >> drivers/net/wireless/ath/ath10k/sdio.c | 5 +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/net/wireless/ath/ath10k/sdio.c > >> b/drivers/net/wireless/ath/ath10k/sdio.c > >> index 7b894dcaad2e..71ce2674c543 100644 > >> --- a/drivers/net/wireless/ath/ath10k/sdio.c > >> +++ b/drivers/net/wireless/ath/ath10k/sdio.c > >> @@ -651,7 +651,7 @@ static int ath10k_sdio_mbox_rx_fetch(struct ath10k > >> *ar) > >> > >> htc_hdr = (struct ath10k_htc_hdr *)skb->data; > >> pkt->act_len = le16_to_cpu(htc_hdr->len) + sizeof(*htc_hdr); > >> - skb_put(skb, pkt->act_len); > >> + skb_put(skb, min_t(size_t, pkt->alloc_len, pkt->act_len)); > >> > >> return ret; > >> } > >> @@ -687,7 +687,8 @@ static int ath10k_sdio_mbox_rx_fetch_bundle(struct > >> ath10k *ar) > >> htc_hdr = (struct ath10k_htc_hdr > >> *)(ar_sdio->vsg_buffer + pkt_offset); > >> pkt->act_len = le16_to_cpu(htc_hdr->len) + > >> sizeof(*htc_hdr); > >> > >> - skb_put_data(pkt->skb, htc_hdr, pkt->act_len); > >> + skb_put_data(pkt->skb, htc_hdr, > >> + min_t(size_t, pkt->alloc_len, > >> pkt->act_len)); > >> pkt_offset += pkt->alloc_len; > >> } > >> > >> -- > >> 2.23.0
On 2019-12-20 20:11, Justin Capella wrote: > If the data is invalid I think it should be discarded and a warning > displayed, it is just as bad to pass too little data isn't it? Then > you potentially have uninitialized data? Or pointers to nonsense? > There is a good chance I am misunderstanding though, I just wanted to > make sure you had considered those cases and the word alignment if > that applies to this type of bus? yes, patch v2: https://patchwork.kernel.org/patch/11313849/
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c index 7b894dcaad2e..71ce2674c543 100644 --- a/drivers/net/wireless/ath/ath10k/sdio.c +++ b/drivers/net/wireless/ath/ath10k/sdio.c @@ -651,7 +651,7 @@ static int ath10k_sdio_mbox_rx_fetch(struct ath10k *ar) htc_hdr = (struct ath10k_htc_hdr *)skb->data; pkt->act_len = le16_to_cpu(htc_hdr->len) + sizeof(*htc_hdr); - skb_put(skb, pkt->act_len); + skb_put(skb, min_t(size_t, pkt->alloc_len, pkt->act_len)); return ret; } @@ -687,7 +687,8 @@ static int ath10k_sdio_mbox_rx_fetch_bundle(struct ath10k *ar) htc_hdr = (struct ath10k_htc_hdr *)(ar_sdio->vsg_buffer + pkt_offset); pkt->act_len = le16_to_cpu(htc_hdr->len) + sizeof(*htc_hdr); - skb_put_data(pkt->skb, htc_hdr, pkt->act_len); + skb_put_data(pkt->skb, htc_hdr, + min_t(size_t, pkt->alloc_len, pkt->act_len)); pkt_offset += pkt->alloc_len; }
When simulate random transfer fail for sdio write and read, it crash sometimes. Test steps: 1. Add config and update kernel: CONFIG_FAIL_MMC_REQUEST=y CONFIG_FAULT_INJECTION=y CONFIG_FAULT_INJECTION_DEBUG_FS=y 2. run simulate fail: cd /sys/kernel/debug/mmc1/fail_mmc_request echo 10 > probability echo 10 > times # repeat until hitting issues 3. it crash, the act len of ath10k_htc_hdr is higher than allocate len, it cause panic: [ 99.723482] skbuff: skb_over_panic: text:00000000caa0f780 len:57013 put:57013 head:000000004116f24a data:0000000019ecb4dc tail:0xdef5 end:0x640 dev:<NULL> [ 99.737697] ------------[ cut here ]------------ [ 99.742327] kernel BUG at /mnt/host/source/src/third_party/kernel/v4.19/net/core/skbuff.c:104! [ 99.750937] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 99.831154] Process kworker/0:2 (pid: 151, stack limit = 0x00000000728010bf) [ 99.838200] CPU: 0 PID: 151 Comm: kworker/0:2 Tainted: G W 4.19.85 #48 [ 99.846022] Hardware name: MediaTek krane sku0 board (DT) [ 99.851429] Workqueue: events sdio_irq_work [ 99.855614] pstate: 60000005 (nZCv daif -PAN -UAO) [ 99.860402] pc : skb_panic+0x64/0x68 [ 99.863974] lr : skb_panic+0x64/0x68 [ 99.867542] sp : ffffff8008833a90 [ 99.870850] x29: ffffff8008833ac0 x28: ffffffe52e337370 [ 99.876159] x27: ffffffe52e328a90 x26: 000000000000e0d0 [ 99.881469] x25: ffffffe52e336b60 x24: 000000000000deb5 [ 99.886779] x23: ffffffe52e340680 x22: ffffffe4efd47e00 [ 99.892088] x21: 000000000000deb5 x20: ffffffa516d85b4c [ 99.897397] x19: ffffffa526928037 x18: 0000000000000000 [ 99.902706] x17: 000000000000003c x16: ffffffa5265b6c80 [ 99.908015] x15: 0000000000000006 x14: 3a76656420303436 [ 99.913325] x13: 0000000000029bf0 x12: 0000000000000000 [ 99.918634] x11: 0000000000000000 x10: 0000000000000000 [ 99.923943] x9 : a3b907e4b2783000 x8 : a3b907e4b2783000 [ 99.929253] x7 : 0000000000000000 x6 : ffffffa526f66d76 [ 99.934563] x5 : 0000000000000000 x4 : 0000000000000000 [ 99.939872] x3 : 000000000002a5ab x2 : ffffffe53feed918 [ 99.945182] x1 : ffffffe53fee4a08 x0 : 000000000000008e [ 99.950491] Call trace: [ 99.952937] skb_panic+0x64/0x68 [ 99.956165] skb_put+0x7c/0x84 [ 99.959224] ath10k_sdio_irq_handler+0x740/0xbb8 [ath10k_sdio] [ 99.965055] process_sdio_pending_irqs+0x58/0x1a4 [ 99.969758] sdio_run_irqs+0x34/0x60 [ 99.973329] sdio_irq_work+0x1c/0x28 [ 99.974930] cros-ec-spi spi2.0: SPI transfer timed out [ 99.976904] process_one_work+0x210/0x410 [ 99.976911] worker_thread+0x234/0x3dc [ 99.976923] kthread+0x120/0x130 [ 99.982090] cros-ec-spi spi2.0: spi transfer failed: -110 [ 99.986054] ret_from_fork+0x10/0x18 [ 99.986063] Code: aa1403e2 2a1503e4 a90023e9 97e37d1a (d4210000) [ 99.986068] ---[ end trace cb6d948c5a0fd6c7 ]--- [ 100.017250] Kernel panic - not syncing: Fatal exception [ 100.018879] cros-ec-spi spi2.0: Command xfer error (err:-110) [ 100.023659] SMP: stopping secondary CPUs [ 100.023703] Kernel Offset: 0x251dc00000 from 0xffffff8008000000 [ 100.023707] CPU features: 0x0,2188200c [ 100.023709] Memory Limit: none The simulate fail of sdio is not a real sdio transter fail, it only set an error status in mmc_should_fail_request after the transfer end, actually the transfer is success, then sdio_io_rw_ext_helper will return error status and stop transfer the left data. For example, the really RX len is 286 bytes, then it will split to 2 blocks in sdio_io_rw_ext_helper, one is 256 bytes, left is 30 bytes, if the first 256 bytes get an error status by mmc_should_fail_request,then the left 30 bytes will not read in this RX operation. Then when the next RX arrive, the left 30 bytes will be considered as the header of the read, the top 8 bytes will be considered as ath10k_htc_hdr, but actually the 8 bytes is not the ath10k_htc_hdr, so the act_len from this ath10k_htc_hdr is not correct, if it is a big value, such as 57013, it will trigger skb_panic. Select the min value of act_len and alloc_len for skb_put and skb_put_data is reasonable. This patch only effect sdio chips. Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029. Signed-off-by: Wen Gong <wgong@codeaurora.org> --- drivers/net/wireless/ath/ath10k/sdio.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)