Message ID | 20230523144235.672290-1-manishc@marvell.com (mailing list archive)
---|---
State | Changes Requested
Delegated to | Netdev Maintainers
Series | [v5,net] qede: Fix scheduling while atomic
Tue, May 23, 2023 at 04:42:35PM CEST, manishc@marvell.com wrote:
>The bonding module collects statistics while holding a spinlock;
>beneath that, the qede->qed driver statistics flow gets scheduled out
>due to the usleep_range() used in the PTT acquire logic, which results
>in the bug and traces below:
>
>[ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021
>[ 3673.988878] Call Trace:
>[ 3673.988891] dump_stack_lvl+0x34/0x44
>[ 3673.988908] __schedule_bug.cold+0x47/0x53
>[ 3673.988918] __schedule+0x3fb/0x560
>[ 3673.988929] schedule+0x43/0xb0
>[ 3673.988932] schedule_hrtimeout_range_clock+0xbf/0x1b0
>[ 3673.988937] ? __hrtimer_init+0xc0/0xc0
>[ 3673.988950] usleep_range+0x5e/0x80
>[ 3673.988955] qed_ptt_acquire+0x2b/0xd0 [qed]
>[ 3673.988981] _qed_get_vport_stats+0x141/0x240 [qed]
>[ 3673.989001] qed_get_vport_stats+0x18/0x80 [qed]
>[ 3673.989016] qede_fill_by_demand_stats+0x37/0x400 [qede]
>[ 3673.989028] qede_get_stats64+0x19/0xe0 [qede]
>[ 3673.989034] dev_get_stats+0x5c/0xc0
>[ 3673.989045] netstat_show.constprop.0+0x52/0xb0
>[ 3673.989055] dev_attr_show+0x19/0x40
>[ 3673.989065] sysfs_kf_seq_show+0x9b/0xf0
>[ 3673.989076] seq_read_iter+0x120/0x4b0
>[ 3673.989087] new_sync_read+0x118/0x1a0
>[ 3673.989095] vfs_read+0xf3/0x180
>[ 3673.989099] ksys_read+0x5f/0xe0
>[ 3673.989102] do_syscall_64+0x3b/0x90
>[ 3673.989109] entry_SYSCALL_64_after_hwframe+0x44/0xae

You mention "bonding module" at the beginning of this description. Where
exactly is that shown in the trace?

I guess that the "spinlock" you talk about is "dev_base_lock", isn't it?

>[ 3673.989115] RIP: 0033:0x7f8467d0b082
>[ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
>[ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>[ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082
>[ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003
>[ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00
>[ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0
>[ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05
>[ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G W OE
>
>Fix this by collecting the statistics asynchronously from a periodic
>delayed work scheduled at the default stats coalescing interval and
>returning the most recent copy of the statistics from .ndo_get_stats64().
>Also add the ability to configure/retrieve the stats coalescing interval
>using the commands below:
>
>ethtool -C ethx stats-block-usecs <val>
>ethtool -c ethx
>
>Fixes: 133fac0eedc3 ("qede: Add basic ethtool support")
>Cc: Sudarsana Kalluru <skalluru@marvell.com>
>Cc: David Miller <davem@davemloft.net>
>Signed-off-by: Manish Chopra <manishc@marvell.com>
>---
>v1->v2:
> - Fixed checkpatch and kdoc warnings.
>v2->v3:
> - Moving the changelog after tags.
>v3->v4:
> - Changes to collect stats periodically using delayed work
>   and add ability to configure/retrieve stats coalescing
>   interval using ethtool
> - Modified commit description to reflect the changes
>v4->v5:
> - Renamed the variables (s/ticks/usecs and s/interval/ticks)
> - Relaxed the stats usecs coalescing configuration to allow
>   user to set any range of values and also while getting return
>   the exact value configured
> - Usage of usecs_to_jiffies() wherever applicable
> - Cosmetic change for logs/comments
>---
> drivers/net/ethernet/qlogic/qede/qede.h      |  4 +++
> .../net/ethernet/qlogic/qede/qede_ethtool.c  | 26 ++++++++++++--
> drivers/net/ethernet/qlogic/qede/qede_main.c | 35 ++++++++++++++++++-
> 3 files changed, 62 insertions(+), 3 deletions(-)

[...]

>@@ -830,6 +835,21 @@ int qede_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal,
> 	int i, rc = 0;
> 	u16 rxc, txc;
>
>+	if (edev->stats_coal_usecs != coal->stats_block_coalesce_usecs) {
>+		bool stats_coal_enabled;
>+
>+		stats_coal_enabled = edev->stats_coal_usecs ? true : false;
>+
>+		edev->stats_coal_usecs = coal->stats_block_coalesce_usecs;
>+		edev->stats_coal_ticks = usecs_to_jiffies(coal->stats_block_coalesce_usecs);
>+
>+		if (!stats_coal_enabled)
>+			schedule_delayed_work(&edev->periodic_task, 0);

What is the point of schedule here? Don't you want to rather schedule if
(stats_coal_enabled == true) ??
>+
>+		DP_INFO(edev, "Configured stats coal ticks=%lu jiffies\n",
>+			edev->stats_coal_ticks);
>+	}
>+
> 	if (!netif_running(dev)) {
> 		DP_INFO(edev, "Interface is down\n");
> 		return -EINVAL;

[...]

>+static void qede_periodic_task(struct work_struct *work)
>+{
>+	struct qede_dev *edev = container_of(work, struct qede_dev,
>+					     periodic_task.work);
>+
>+	if (test_bit(QEDE_SP_DISABLE, &edev->sp_flags))
>+		return;
>+
>+	if (edev->stats_coal_usecs) {

Why don't you cancel the work when you don't want this to happen?

>+		qede_fill_by_demand_stats(edev);
>+		schedule_delayed_work(&edev->periodic_task,
>+				      edev->stats_coal_ticks);
>+	}
>+}
>+
[...]

>@@ -2571,6 +2601,9 @@ static void qede_recovery_handler(struct qede_dev *edev)
>
> 	DP_NOTICE(edev, "Starting a recovery process\n");
>
>+	/* disable periodic stats */
>+	edev->stats_coal_usecs = 0;

You disable but never enable again. Why?

Also, why don't you do:
	cancel_delayed_work_sync(&edev->periodic_task)
here instead?

>+
> 	/* No need to acquire first the qede_lock since is done by qede_sp_task
> 	 * before calling this function.
> 	 */
>--
>2.27.0
>
>
Hi Jiri,

> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Wednesday, May 24, 2023 5:01 PM
> To: Manish Chopra <manishc@marvell.com>
> Cc: kuba@kernel.org; netdev@vger.kernel.org; Ariel Elior
> <aelior@marvell.com>; Alok Prasad <palok@marvell.com>; Sudarsana Reddy
> Kalluru <skalluru@marvell.com>; David Miller <davem@davemloft.net>
> Subject: [EXT] Re: [PATCH v5 net] qede: Fix scheduling while atomic
>
> External Email
>
> ----------------------------------------------------------------------
> Tue, May 23, 2023 at 04:42:35PM CEST, manishc@marvell.com wrote:
>
> [...]
>
> You mention "bonding module" at the beginning of this description. Where
> exactly is that shown in the trace?
>
> I guess that the "spinlock" you talk about is "dev_base_lock", isn't it?

The bonding functions somehow were not part of the traces, but this is the
flow from the bonding module, which calls dev_get_stats() under
spin_lock_nested(&bond->stats_lock, nest_level) and which results in this
issue.
> [...]
>
> >+		if (!stats_coal_enabled)
> >+			schedule_delayed_work(&edev->periodic_task, 0);
>
> What is the point of schedule here? Don't you want to rather schedule if
> (stats_coal_enabled == true) ??

This was for scheduling the periodic task ONLY if it was previously
disabled. But actually it does no harm to schedule it once whenever the
user sets a non-zero usecs value.

> [...]
> >+	if (edev->stats_coal_usecs) {
>
> Why don't you cancel the work when you don't want this to happen?

Ok.

> [...]
>
> >+	/* disable periodic stats */
> >+	edev->stats_coal_usecs = 0;
>
> You disable but never enable again. Why?

It gets enabled back when the recovery flow completes, in the part of the
patch below:

+	edev->stats_coal_usecs = USEC_PER_SEC;
+	edev->stats_coal_ticks = usecs_to_jiffies(USEC_PER_SEC);
+	schedule_delayed_work(&edev->periodic_task, 0);

Although on reset I am restoring the values to the load-time defaults,
maybe I will clean this up to retain the ethtool-configured values.

> Also, why don't you do:
> 	cancel_delayed_work_sync(&edev->periodic_task)
> here instead?

Sure.

> [...]
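[Editorial note: to make the agreements above concrete, here is a hypothetical sketch of the delayed-work lifecycle the review points toward: cancel the work when stats coalescing is disabled or recovery starts, and (re)arm it with the configured interval otherwise. The helper names (qede_stats_coal_*) are invented for illustration; this is not the code in this patch or in any later revision.]

/* Hypothetical helpers; only the qede_dev fields added by this patch
 * (periodic_task, stats_coal_usecs, stats_coal_ticks) are assumed.
 */
static void qede_stats_coal_config(struct qede_dev *edev, u32 usecs)
{
	edev->stats_coal_usecs = usecs;
	edev->stats_coal_ticks = usecs_to_jiffies(usecs);

	if (!usecs)
		/* 0 disables periodic collection: stop the work instead of
		 * letting it run once more just to bail out.
		 */
		cancel_delayed_work_sync(&edev->periodic_task);
	else
		/* (Re)arm immediately; mod_delayed_work() also covers the
		 * case where the work is already pending with the old delay.
		 */
		mod_delayed_work(system_wq, &edev->periodic_task, 0);
}

static void qede_stats_coal_recovery_start(struct qede_dev *edev)
{
	/* Stop the work synchronously instead of only zeroing
	 * stats_coal_usecs and letting the next run bail out.
	 */
	cancel_delayed_work_sync(&edev->periodic_task);
}

static void qede_stats_coal_recovery_done(struct qede_dev *edev)
{
	/* Re-arm with whatever interval the user configured via ethtool
	 * rather than resetting it to the probe-time default.
	 */
	if (edev->stats_coal_usecs)
		schedule_delayed_work(&edev->periodic_task,
				      edev->stats_coal_ticks);
}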
Thu, May 25, 2023 at 05:27:03PM CEST, manishc@marvell.com wrote:
>Hi Jiri,
>
>[...]
>
>> You mention "bonding module" at the beginning of this description. Where
>> exactly is that shown in the trace?
>>
>> I guess that the "spinlock" you talk about is "dev_base_lock", isn't it?
>
>The bonding functions somehow were not part of the traces, but this is the
>flow from the bonding module, which calls dev_get_stats() under
>spin_lock_nested(&bond->stats_lock, nest_level) and which results in this
>issue.

The trace you included is obviously from a sysfs read. Either change the
trace or the description.

[...]
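[Editorial note: to connect the commit message with the trace discussed above, below is a simplified sketch of the bonding statistics path Manish refers to, modeled on bond_get_stats() in drivers/net/bonding/bond_main.c. Stats folding and the lockdep nest-level computation are elided, so treat it as an illustration of the locking context, not the verbatim kernel code.]

static void bond_get_stats_sketch(struct net_device *bond_dev,
				  struct rtnl_link_stats64 *stats)
{
	struct bonding *bond = netdev_priv(bond_dev);
	struct rtnl_link_stats64 temp;
	struct list_head *iter;
	struct slave *slave;
	int nest_level = 0;	/* lockdep nesting computation elided */

	rcu_read_lock();
	spin_lock_nested(&bond->stats_lock, nest_level);

	bond_for_each_slave_rcu(bond, slave, iter)
		/* dev_get_stats() ends up in the slave driver's
		 * .ndo_get_stats64.  The caller holds a spinlock and an RCU
		 * read-side section here, so a callback that can sleep --
		 * qede's PTT acquire path calls usleep_range() -- triggers
		 * "scheduling while atomic".
		 */
		dev_get_stats(slave->dev, &temp);

	spin_unlock(&bond->stats_lock);
	rcu_read_unlock();
}

This bonding path does not appear in the pasted trace, which goes through the sysfs netstat_show() read instead; that mismatch is exactly what Jiri points out above.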
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index f90dcfe9ee68..8a63f99d499c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -271,6 +271,10 @@ struct qede_dev {
 #define QEDE_ERR_WARN			3
 
 	struct qede_dump_info		dump_info;
+	struct delayed_work		periodic_task;
+	unsigned long			stats_coal_ticks;
+	u32				stats_coal_usecs;
+	spinlock_t			stats_lock; /* lock for vport stats access */
 };
 
 enum QEDE_STATE {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 8284c4c1528f..a6498eb7cbd7 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -426,6 +426,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
 		}
 	}
 
+	spin_lock(&edev->stats_lock);
+
 	for (i = 0; i < QEDE_NUM_STATS; i++) {
 		if (qede_is_irrelevant_stat(edev, i))
 			continue;
@@ -435,6 +437,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
 		buf++;
 	}
 
+	spin_unlock(&edev->stats_lock);
+
 	__qede_unlock(edev);
 }
 
@@ -817,6 +821,7 @@ static int qede_get_coalesce(struct net_device *dev,
 
 	coal->rx_coalesce_usecs = rx_coal;
 	coal->tx_coalesce_usecs = tx_coal;
+	coal->stats_block_coalesce_usecs = edev->stats_coal_usecs;
 
 	return rc;
 }
@@ -830,6 +835,21 @@ int qede_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal,
 	int i, rc = 0;
 	u16 rxc, txc;
 
+	if (edev->stats_coal_usecs != coal->stats_block_coalesce_usecs) {
+		bool stats_coal_enabled;
+
+		stats_coal_enabled = edev->stats_coal_usecs ? true : false;
+
+		edev->stats_coal_usecs = coal->stats_block_coalesce_usecs;
+		edev->stats_coal_ticks = usecs_to_jiffies(coal->stats_block_coalesce_usecs);
+
+		if (!stats_coal_enabled)
+			schedule_delayed_work(&edev->periodic_task, 0);
+
+		DP_INFO(edev, "Configured stats coal ticks=%lu jiffies\n",
+			edev->stats_coal_ticks);
+	}
+
 	if (!netif_running(dev)) {
 		DP_INFO(edev, "Interface is down\n");
 		return -EINVAL;
@@ -2236,7 +2256,8 @@ static int qede_get_per_coalesce(struct net_device *dev,
 }
 
 static const struct ethtool_ops qede_ethtool_ops = {
-	.supported_coalesce_params = ETHTOOL_COALESCE_USECS,
+	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
+				     ETHTOOL_COALESCE_STATS_BLOCK_USECS,
 	.get_link_ksettings = qede_get_link_ksettings,
 	.set_link_ksettings = qede_set_link_ksettings,
 	.get_drvinfo = qede_get_drvinfo,
@@ -2287,7 +2308,8 @@ static const struct ethtool_ops qede_ethtool_ops = {
 };
 
 static const struct ethtool_ops qede_vf_ethtool_ops = {
-	.supported_coalesce_params = ETHTOOL_COALESCE_USECS,
+	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
+				     ETHTOOL_COALESCE_STATS_BLOCK_USECS,
 	.get_link_ksettings = qede_get_link_ksettings,
 	.get_drvinfo = qede_get_drvinfo,
 	.get_msglevel = qede_get_msglevel,
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 06c6a5813606..61cc10968988 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -308,6 +308,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
 
 	edev->ops->get_vport_stats(edev->cdev, &stats);
 
+	spin_lock(&edev->stats_lock);
+
 	p_common->no_buff_discards = stats.common.no_buff_discards;
 	p_common->packet_too_big_discard = stats.common.packet_too_big_discard;
 	p_common->ttl0_discard = stats.common.ttl0_discard;
@@ -405,6 +407,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
 		p_ah->tx_1519_to_max_byte_packets =
 			stats.ah.tx_1519_to_max_byte_packets;
 	}
+
+	spin_unlock(&edev->stats_lock);
 }
 
 static void qede_get_stats64(struct net_device *dev,
@@ -413,9 +417,10 @@ static void qede_get_stats64(struct net_device *dev,
 	struct qede_dev *edev = netdev_priv(dev);
 	struct qede_stats_common *p_common;
 
-	qede_fill_by_demand_stats(edev);
 	p_common = &edev->stats.common;
 
+	spin_lock(&edev->stats_lock);
+
 	stats->rx_packets = p_common->rx_ucast_pkts + p_common->rx_mcast_pkts +
 			    p_common->rx_bcast_pkts;
 	stats->tx_packets = p_common->tx_ucast_pkts + p_common->tx_mcast_pkts +
@@ -435,6 +440,8 @@ static void qede_get_stats64(struct net_device *dev,
 	stats->collisions = edev->stats.bb.tx_total_collisions;
 	stats->rx_crc_errors = p_common->rx_crc_errors;
 	stats->rx_frame_errors = p_common->rx_align_errors;
+
+	spin_unlock(&edev->stats_lock);
 }
 
 #ifdef CONFIG_QED_SRIOV
@@ -1000,6 +1007,21 @@ static void qede_unlock(struct qede_dev *edev)
 	rtnl_unlock();
 }
 
+static void qede_periodic_task(struct work_struct *work)
+{
+	struct qede_dev *edev = container_of(work, struct qede_dev,
+					     periodic_task.work);
+
+	if (test_bit(QEDE_SP_DISABLE, &edev->sp_flags))
+		return;
+
+	if (edev->stats_coal_usecs) {
+		qede_fill_by_demand_stats(edev);
+		schedule_delayed_work(&edev->periodic_task,
+				      edev->stats_coal_ticks);
+	}
+}
+
 static void qede_sp_task(struct work_struct *work)
 {
 	struct qede_dev *edev = container_of(work, struct qede_dev,
@@ -1208,7 +1230,9 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 	 * from there, although it's unlikely].
 	 */
 	INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
+	INIT_DELAYED_WORK(&edev->periodic_task, qede_periodic_task);
 	mutex_init(&edev->qede_lock);
+	spin_lock_init(&edev->stats_lock);
 
 	rc = register_netdev(edev->ndev);
 	if (rc) {
@@ -1233,6 +1257,11 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 	edev->rx_copybreak = QEDE_RX_HDR_SIZE;
 
 	qede_log_probe(edev);
+
+	edev->stats_coal_usecs = USEC_PER_SEC;
+	edev->stats_coal_ticks = usecs_to_jiffies(USEC_PER_SEC);
+	schedule_delayed_work(&edev->periodic_task, 0);
+
 	return 0;
 
 err4:
@@ -1301,6 +1330,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 	unregister_netdev(ndev);
 
 	cancel_delayed_work_sync(&edev->sp_task);
+	cancel_delayed_work_sync(&edev->periodic_task);
 
 	edev->ops->common->set_power_state(cdev, PCI_D0);
 
@@ -2571,6 +2601,9 @@ static void qede_recovery_handler(struct qede_dev *edev)
 
 	DP_NOTICE(edev, "Starting a recovery process\n");
 
+	/* disable periodic stats */
+	edev->stats_coal_usecs = 0;
+
 	/* No need to acquire first the qede_lock since is done by qede_sp_task
 	 * before calling this function.
 	 */
The bonding module collects statistics while holding a spinlock; beneath
that, the qede->qed driver statistics flow gets scheduled out due to the
usleep_range() used in the PTT acquire logic, which results in the bug and
traces below:

[ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021
[ 3673.988878] Call Trace:
[ 3673.988891] dump_stack_lvl+0x34/0x44
[ 3673.988908] __schedule_bug.cold+0x47/0x53
[ 3673.988918] __schedule+0x3fb/0x560
[ 3673.988929] schedule+0x43/0xb0
[ 3673.988932] schedule_hrtimeout_range_clock+0xbf/0x1b0
[ 3673.988937] ? __hrtimer_init+0xc0/0xc0
[ 3673.988950] usleep_range+0x5e/0x80
[ 3673.988955] qed_ptt_acquire+0x2b/0xd0 [qed]
[ 3673.988981] _qed_get_vport_stats+0x141/0x240 [qed]
[ 3673.989001] qed_get_vport_stats+0x18/0x80 [qed]
[ 3673.989016] qede_fill_by_demand_stats+0x37/0x400 [qede]
[ 3673.989028] qede_get_stats64+0x19/0xe0 [qede]
[ 3673.989034] dev_get_stats+0x5c/0xc0
[ 3673.989045] netstat_show.constprop.0+0x52/0xb0
[ 3673.989055] dev_attr_show+0x19/0x40
[ 3673.989065] sysfs_kf_seq_show+0x9b/0xf0
[ 3673.989076] seq_read_iter+0x120/0x4b0
[ 3673.989087] new_sync_read+0x118/0x1a0
[ 3673.989095] vfs_read+0xf3/0x180
[ 3673.989099] ksys_read+0x5f/0xe0
[ 3673.989102] do_syscall_64+0x3b/0x90
[ 3673.989109] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3673.989115] RIP: 0033:0x7f8467d0b082
[ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082
[ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003
[ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00
[ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0
[ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05
[ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G W OE

Fix this by collecting the statistics asynchronously from a periodic
delayed work scheduled at the default stats coalescing interval and
returning the most recent copy of the statistics from .ndo_get_stats64().
Also add the ability to configure/retrieve the stats coalescing interval
using the commands below:

ethtool -C ethx stats-block-usecs <val>
ethtool -c ethx

Fixes: 133fac0eedc3 ("qede: Add basic ethtool support")
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Manish Chopra <manishc@marvell.com>
---
v1->v2:
 - Fixed checkpatch and kdoc warnings.
v2->v3:
 - Moving the changelog after tags.
v3->v4:
 - Changes to collect stats periodically using delayed work
   and add ability to configure/retrieve stats coalescing
   interval using ethtool
 - Modified commit description to reflect the changes
v4->v5:
 - Renamed the variables (s/ticks/usecs and s/interval/ticks)
 - Relaxed the stats usecs coalescing configuration to allow
   user to set any range of values and also while getting return
   the exact value configured
 - Usage of usecs_to_jiffies() wherever applicable
 - Cosmetic change for logs/comments
---
 drivers/net/ethernet/qlogic/qede/qede.h      |  4 +++
 .../net/ethernet/qlogic/qede/qede_ethtool.c  | 26 ++++++++++++--
 drivers/net/ethernet/qlogic/qede/qede_main.c | 35 ++++++++++++++++++-
 3 files changed, 62 insertions(+), 3 deletions(-)
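[Editorial note: the new stats-block-usecs knob is exposed through the standard ethtool coalesce interface, so besides the ethtool commands quoted above it can be read programmatically. Below is a minimal user-space sketch using the legacy SIOCETHTOOL ioctl; the interface name "eth0" is a placeholder and error handling is kept to a minimum.]

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	struct ethtool_coalesce ecoal = { .cmd = ETHTOOL_GCOALESCE };
	struct ifreq ifr = { 0 };
	int fd;

	/* Any socket works as a handle for the ethtool ioctl. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return 1;

	/* "eth0" is a placeholder interface name. */
	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&ecoal;

	if (ioctl(fd, SIOCETHTOOL, &ifr) == 0)
		printf("stats-block-usecs: %u\n",
		       ecoal.stats_block_coalesce_usecs);

	close(fd);
	return 0;
}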