Message ID | 20240812125009.62635-1-dawid.osuchowski@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [iwl-net,v2] ice: Add netif_device_attach/detach into PF reset flow |
On Mon, Aug 12, 2024 at 02:50:09PM +0200, Dawid Osuchowski wrote:
> Ethtool callbacks can be executed while reset is in progress and try to
> access deleted resources, e.g. getting coalesce settings can result in a
> NULL pointer dereference seen below.
>
> Reproduction steps:
> Once the driver is fully initialized, trigger reset:
> 	# echo 1 > /sys/class/net/<interface>/device/reset
> when reset is in progress try to get coalesce settings using ethtool:
> 	# ethtool -c <interface>
>
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 11 PID: 19713 Comm: ethtool Tainted: G S 6.10.0-rc7+ #7
> RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice]
> RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206
> RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000
> R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40
> FS: 00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0
> Call Trace:
> <TASK>
> ice_get_coalesce+0x17/0x30 [ice]
> coalesce_prepare_data+0x61/0x80
> ethnl_default_doit+0xde/0x340
> genl_family_rcv_msg_doit+0xf2/0x150
> genl_rcv_msg+0x1b3/0x2c0
> netlink_rcv_skb+0x5b/0x110
> genl_rcv+0x28/0x40
> netlink_unicast+0x19c/0x290
> netlink_sendmsg+0x222/0x490
> __sys_sendto+0x1df/0x1f0
> __x64_sys_sendto+0x24/0x30
> do_syscall_64+0x82/0x160
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7faee60d8e27
>
> Calling netif_device_detach() before reset makes the net core not call
> the driver when ethtool command is issued, the attempt to execute an
> ethtool command during reset will result in the following message:
>
> 	netlink error: No such device
>
> instead of NULL pointer dereference. Once reset is done and
> ice_rebuild() is executing, the netif_device_attach() is called to allow
> for ethtool operations to occur again in a safe manner.
>
> Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>

Your SoB should be the last tag. Other than that
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>

> ---
> Changes since v1:
> * Changed Fixes tag to point to another commit
> * Minified the stacktrace
>
> Suggestion from Kuba: https://lore.kernel.org/netdev/20240610194756.5be5be90@kernel.org/
> Previous attempt: https://lore.kernel.org/netdev/20240722122839.51342-1-dawid.osuchowski@linux.intel.com/
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index eaa73cc200f4..16b4920741ff 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -608,6 +608,8 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
>  			memset(&vsi->mqprio_qopt, 0, sizeof(vsi->mqprio_qopt));
>  		}
>  	}
> +	if (vsi->netdev)
> +		netif_device_detach(vsi->netdev);
>  skip:
>
>  	/* clear SW filtering DB */
> @@ -7568,11 +7570,13 @@ static void ice_update_pf_netdev_link(struct ice_pf *pf)
>
>  		ice_get_link_status(pf->vsi[i]->port_info, &link_up);
>  		if (link_up) {
> +			netif_device_attach(pf->vsi[i]->netdev);
>  			netif_carrier_on(pf->vsi[i]->netdev);
>  			netif_tx_wake_all_queues(pf->vsi[i]->netdev);
>  		} else {
>  			netif_carrier_off(pf->vsi[i]->netdev);
>  			netif_tx_stop_all_queues(pf->vsi[i]->netdev);
> +			netif_device_detach(pf->vsi[i]->netdev);
>  		}
>  	}
>  }
> --
> 2.44.0
>
>
On Mon, Aug 12, 2024 at 02:50:09PM +0200, Dawid Osuchowski wrote:
> [...]
> Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")

What about other intel drivers tho?

> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> [...]
On 13.08.2024 13:49, Maciej Fijalkowski wrote:
> What about other intel drivers tho?
I have not performed detailed analysis of other intel ethernet drivers
in this regard, but it is surely a topic worth investigating.
--Dawid
On Tue, Aug 13, 2024 at 05:31:37PM +0200, Dawid Osuchowski wrote:
> On 13.08.2024 13:49, Maciej Fijalkowski wrote:
> > What about other intel drivers tho?
>
> I have not performed detailed analysis of other intel ethernet drivers in
> this regard, but it is surely a topic worth investigating.

If you could take some action upon this then it would be great. I'm always
hesitating with providing the review tag against a change that already
contains few of them, but given that I dedicated some time to look into
that:

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

>
> --Dawid
On 13.08.2024 21:24, Maciej Fijalkowski wrote:
> On Tue, Aug 13, 2024 at 05:31:37PM +0200, Dawid Osuchowski wrote:
>> On 13.08.2024 13:49, Maciej Fijalkowski wrote:
>>> What about other intel drivers tho?
>>
>> I have not performed detailed analysis of other intel ethernet drivers in
>> this regard, but it is surely a topic worth investigating.
>
> If you could take some action upon this then it would be great. I'm always
> hesitating with providing the review tag against a change that already
> contains few of them, but given that I dedicated some time to look into
> that:
>

I got a valid concern from Kalesh (CCd) on the v1 thread
(https://lore.kernel.org/netdev/CAH-L+nOFqs-K5YzfrfmpRHbhDGM-+1ahhWh4NXATX1FqZiPVLQ@mail.gmail.com/)
about attaching only if the link is up.

On 14.08.2024 05:19, Kalesh Anakkur Purayil wrote:
> [Kalesh] Is there any reason to attach back the netdev only if link is
> up? IMO, you should attach the device back irrespective of physical
> link status. In ice_prepare_for_reset(), you are detaching the device
> unconditionally.
>
> I may be missing something here.

I agree with his suggestion to do the netif_device_attach() irrespective
of link being up. Should I send a v3 with the change? I have already
tested that locally and it seems to fix the reported issue with NULL
pointer dereference as well.

--Dawid

> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>
>>
>> --Dawid
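
For readers following the attach/detach discussion, here is a small userspace sketch of Kalesh's concern. It is not ice driver code: the model_* names are hypothetical, and only a "present" flag is modelled. It assumes, as the v2 diff shows, that the reset preparation detaches unconditionally while the rebuild path re-attaches only when the link is up, and it contrasts that with re-attaching regardless of link state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a netdev; only the "present" flag is modelled. */
struct model_netdev {
	bool present;
};

static void model_detach(struct model_netdev *nd)
{
	assert(nd != NULL);
	nd->present = false;
}

static void model_attach(struct model_netdev *nd)
{
	assert(nd != NULL);
	nd->present = true;
}

/* v2 as posted: attach when link is up, detach again when it is down. */
static void model_rebuild_v2(struct model_netdev *nd, bool link_up)
{
	if (link_up)
		model_attach(nd);
	else
		model_detach(nd);
}

/* Suggested alternative: attach irrespective of physical link status. */
static void model_rebuild_unconditional(struct model_netdev *nd, bool link_up)
{
	(void)link_up; /* link state no longer decides presence */
	model_attach(nd);
}

/* Run "prepare for reset" then one rebuild variant; report final presence. */
static bool model_present_after_reset(bool link_up, bool unconditional)
{
	struct model_netdev nd = { .present = true };

	model_detach(&nd); /* unconditional detach before reset */
	if (unconditional)
		model_rebuild_unconditional(&nd, link_up);
	else
		model_rebuild_v2(&nd, link_up);
	return nd.present;
}
```

In this model, model_present_after_reset(false, false) leaves the device detached after a reset with the link down, while the unconditional variant brings it back in every case.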
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index eaa73cc200f4..16b4920741ff 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -608,6 +608,8 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
 			memset(&vsi->mqprio_qopt, 0, sizeof(vsi->mqprio_qopt));
 		}
 	}
+	if (vsi->netdev)
+		netif_device_detach(vsi->netdev);
 skip:
 
 	/* clear SW filtering DB */
@@ -7568,11 +7570,13 @@ static void ice_update_pf_netdev_link(struct ice_pf *pf)
 
 		ice_get_link_status(pf->vsi[i]->port_info, &link_up);
 		if (link_up) {
+			netif_device_attach(pf->vsi[i]->netdev);
 			netif_carrier_on(pf->vsi[i]->netdev);
 			netif_tx_wake_all_queues(pf->vsi[i]->netdev);
 		} else {
 			netif_carrier_off(pf->vsi[i]->netdev);
 			netif_tx_stop_all_queues(pf->vsi[i]->netdev);
+			netif_device_detach(pf->vsi[i]->netdev);
 		}
 	}
 }
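
The commit message says that detaching makes the net core stop calling into the driver, so ethtool gets "No such device" instead of hitting freed state. A minimal userspace sketch of that gating idea follows. It is an illustration under assumptions, not kernel code: the toy_* types and functions are hypothetical, and the presence check returning -ENODEV is assumed from the error message quoted in the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state that is torn down during reset. */
struct toy_ring {
	int itr;
};

/* Hypothetical netdev model: a presence flag plus driver-private state. */
struct toy_netdev {
	bool present;
	struct toy_ring *ring; /* NULL while reset is in progress */
};

/* "Driver" callback: trusts that ring is valid, like the crashing path. */
static int toy_driver_get_coalesce(const struct toy_netdev *nd)
{
	assert(nd->ring != NULL); /* stands in for the NULL dereference */
	return nd->ring->itr;
}

/* "Core" entry point: refuses to call the driver for a detached device. */
static int toy_core_get_coalesce(const struct toy_netdev *nd, int *itr)
{
	if (!nd->present)
		return -ENODEV; /* surfaces as "No such device" */
	*itr = toy_driver_get_coalesce(nd);
	return 0;
}

/* Detach first, then release state the callbacks depend on. */
static void toy_prepare_for_reset(struct toy_netdev *nd)
{
	nd->present = false;
	nd->ring = NULL;
}

/* Restore state first, then make the device visible again. */
static void toy_rebuild(struct toy_netdev *nd, struct toy_ring *ring)
{
	nd->ring = ring;
	nd->present = true;
}
```

The ordering in the two helpers is the point of the sketch: the device is marked absent before its state goes away and marked present only after the state is back, so the gated entry point never reaches the driver callback with a NULL ring.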