Message ID | CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Aquantia ethernet driver suspend/resume issues | expand |
Hi Linus, > and now the slab cache is corrupt and the system is dead. > > My *guess* is that what is going on is that when the kcalloc() failued > (because it tries to allocate a large area, and it has only been > tested at boot-time when it succeeds), we end up doing that You are probably right root causing the issue here, and the bad thing is that direct fix will not solve the problem. I'm trying to repro this on my side with some artificially increased structure sizes, but no success so far. Datapath initialization in driver requires normally 8 vectors each having rx/tx pair of rings, in total making 8*(4096+2048)*64 = ~ 3Mb of memory. ... > Anyway, I suspect a fix for the fatal error might be something like > the attached, but I think the *root* of the problem is how the > aquantia driver tried to allocate a humongous buff_ring with kmalloc, > which really doesn't work. You can see that "order:6", ie we're > talking an allocation > 100kB, and in low-memory situations that kind > of kmalloc space simply isn't available. It *will* fail. Correct. It seems after some series of pm-related changes the driver logic was changed to always reinit full sw structures during suspend/resume, which is obviously an overkill. ... > > I don't know what the right fix is, but *one* fix would certainly be > to not tear everything down at suspend time, only to build it up again > at resume. > > And please please please don't double-free things randomly (if that is > what was going on, but it does look like it was). You are right its also a potential double free problem (may be not manifesting), but aq_ring_free is being called from "same level" aq_ring_alloc, but also then from aq_ring_tx_alloc caller error handler. Attached is my draft patch to improve that. Still looking into the bigger problem of fixing suspend/resume. Thanks Igor From d753ba3bc5d18f0213833da7ceba3097d8db027a Mon Sep 17 00:00:00 2001 From: Igor Russkikh <irusskikh@marvell.com> Date: Mon, 27 Nov 2023 16:09:05 +0100 Subject: [PATCH] net: atlantic: fixed double free on reinit --- .../net/ethernet/aquantia/atlantic/aq_ptp.c | 17 ++---- .../net/ethernet/aquantia/atlantic/aq_ring.c | 60 ++++++------------- .../net/ethernet/aquantia/atlantic/aq_ring.h | 18 +++--- .../net/ethernet/aquantia/atlantic/aq_vec.c | 22 ++++--- 4 files changed, 42 insertions(+), 75 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c b/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c index 80b44043e6c5..1814375fecee 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c @@ -953,8 +953,6 @@ int aq_ptp_ring_alloc(struct aq_nic_s *aq_nic) { struct aq_ptp_s *aq_ptp = aq_nic->aq_ptp; unsigned int tx_ring_idx, rx_ring_idx; - struct aq_ring_s *hwts; - struct aq_ring_s *ring; int err; if (!aq_ptp) @@ -962,27 +960,24 @@ int aq_ptp_ring_alloc(struct aq_nic_s *aq_nic) tx_ring_idx = aq_ptp_ring_idx(aq_nic->aq_nic_cfg.tc_mode); - ring = aq_ring_tx_alloc(&aq_ptp->ptp_tx, aq_nic, + err = aq_ring_tx_alloc(&aq_ptp->ptp_tx, aq_nic, tx_ring_idx, &aq_nic->aq_nic_cfg); - if (!ring) { - err = -ENOMEM; + if (err) { goto err_exit; } rx_ring_idx = aq_ptp_ring_idx(aq_nic->aq_nic_cfg.tc_mode); - ring = aq_ring_rx_alloc(&aq_ptp->ptp_rx, aq_nic, + err = aq_ring_rx_alloc(&aq_ptp->ptp_rx, aq_nic, rx_ring_idx, &aq_nic->aq_nic_cfg); - if (!ring) { - err = -ENOMEM; + if (err) { goto err_exit_ptp_tx; } - hwts = aq_ring_hwts_rx_alloc(&aq_ptp->hwts_rx, aq_nic, PTP_HWST_RING_IDX, + err = aq_ring_hwts_rx_alloc(&aq_ptp->hwts_rx, aq_nic, PTP_HWST_RING_IDX, aq_nic->aq_nic_cfg.rxds, aq_nic->aq_nic_cfg.aq_hw_caps->rxd_size); - if (!hwts) { - err = -ENOMEM; + if (err) { goto err_exit_ptp_rx; } diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index 4de22eed099a..241472e56eb2 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -132,8 +132,8 @@ static int aq_get_rxpages(struct aq_ring_s *self, struct aq_ring_buff_s *rxbuf) return 0; } -static struct aq_ring_s *aq_ring_alloc(struct aq_ring_s *self, - struct aq_nic_s *aq_nic) +static int aq_ring_alloc(struct aq_ring_s *self, + struct aq_nic_s *aq_nic) { int err = 0; @@ -156,46 +156,29 @@ static struct aq_ring_s *aq_ring_alloc(struct aq_ring_s *self, err_exit: if (err < 0) { aq_ring_free(self); - self = NULL; } - return self; + return err; } -struct aq_ring_s *aq_ring_tx_alloc(struct aq_ring_s *self, +int aq_ring_tx_alloc(struct aq_ring_s *self, struct aq_nic_s *aq_nic, unsigned int idx, struct aq_nic_cfg_s *aq_nic_cfg) { - int err = 0; - self->aq_nic = aq_nic; self->idx = idx; self->size = aq_nic_cfg->txds; self->dx_size = aq_nic_cfg->aq_hw_caps->txd_size; - self = aq_ring_alloc(self, aq_nic); - if (!self) { - err = -ENOMEM; - goto err_exit; - } - -err_exit: - if (err < 0) { - aq_ring_free(self); - self = NULL; - } - - return self; + return aq_ring_alloc(self, aq_nic); } -struct aq_ring_s *aq_ring_rx_alloc(struct aq_ring_s *self, - struct aq_nic_s *aq_nic, - unsigned int idx, - struct aq_nic_cfg_s *aq_nic_cfg) +int aq_ring_rx_alloc(struct aq_ring_s *self, + struct aq_nic_s *aq_nic, + unsigned int idx, + struct aq_nic_cfg_s *aq_nic_cfg) { - int err = 0; - self->aq_nic = aq_nic; self->idx = idx; self->size = aq_nic_cfg->rxds; @@ -217,22 +200,10 @@ struct aq_ring_s *aq_ring_rx_alloc(struct aq_ring_s *self, self->tail_size = 0; } - self = aq_ring_alloc(self, aq_nic); - if (!self) { - err = -ENOMEM; - goto err_exit; - } - -err_exit: - if (err < 0) { - aq_ring_free(self); - self = NULL; - } - - return self; + return aq_ring_alloc(self, aq_nic); } -struct aq_ring_s * +int aq_ring_hwts_rx_alloc(struct aq_ring_s *self, struct aq_nic_s *aq_nic, unsigned int idx, unsigned int size, unsigned int dx_size) { @@ -250,10 +221,10 @@ aq_ring_hwts_rx_alloc(struct aq_ring_s *self, struct aq_nic_s *aq_nic, GFP_KERNEL); if (!self->dx_ring) { aq_ring_free(self); - return NULL; + return -ENOMEM; } - return self; + return 0; } int aq_ring_init(struct aq_ring_s *self, const enum atl_ring_type ring_type) @@ -932,11 +903,14 @@ void aq_ring_free(struct aq_ring_s *self) return; kfree(self->buff_ring); + self->buff_ring = NULL; - if (self->dx_ring) + if (self->dx_ring) { dma_free_coherent(aq_nic_get_dev(self->aq_nic), self->size * self->dx_size, self->dx_ring, self->dx_ring_pa); + self->dx_ring = NULL; + } } unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h index 0a6c34438c1d..00fbeccc16c8 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h @@ -183,14 +183,14 @@ static inline unsigned int aq_ring_avail_dx(struct aq_ring_s *self) self->sw_head - self->sw_tail - 1); } -struct aq_ring_s *aq_ring_tx_alloc(struct aq_ring_s *self, - struct aq_nic_s *aq_nic, - unsigned int idx, - struct aq_nic_cfg_s *aq_nic_cfg); -struct aq_ring_s *aq_ring_rx_alloc(struct aq_ring_s *self, - struct aq_nic_s *aq_nic, - unsigned int idx, - struct aq_nic_cfg_s *aq_nic_cfg); +int aq_ring_tx_alloc(struct aq_ring_s *self, + struct aq_nic_s *aq_nic, + unsigned int idx, + struct aq_nic_cfg_s *aq_nic_cfg); +int aq_ring_rx_alloc(struct aq_ring_s *self, + struct aq_nic_s *aq_nic, + unsigned int idx, + struct aq_nic_cfg_s *aq_nic_cfg); int aq_ring_init(struct aq_ring_s *self, const enum atl_ring_type ring_type); void aq_ring_rx_deinit(struct aq_ring_s *self); @@ -207,7 +207,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self, int budget); int aq_ring_rx_fill(struct aq_ring_s *self); -struct aq_ring_s *aq_ring_hwts_rx_alloc(struct aq_ring_s *self, +int aq_ring_hwts_rx_alloc(struct aq_ring_s *self, struct aq_nic_s *aq_nic, unsigned int idx, unsigned int size, unsigned int dx_size); void aq_ring_hwts_rx_clean(struct aq_ring_s *self, struct aq_nic_s *aq_nic); diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c index f5db1c44e9b9..d811106862ff 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c @@ -136,10 +136,9 @@ int aq_vec_ring_alloc(struct aq_vec_s *self, struct aq_nic_s *aq_nic, const unsigned int idx_ring = AQ_NIC_CFG_TCVEC2RING(aq_nic_cfg, i, idx); - ring = aq_ring_tx_alloc(&self->ring[i][AQ_VEC_TX_ID], aq_nic, - idx_ring, aq_nic_cfg); - if (!ring) { - err = -ENOMEM; + ring = &self->ring[i][AQ_VEC_TX_ID]; + err = aq_ring_tx_alloc(ring, aq_nic, idx_ring, aq_nic_cfg); + if (err) { goto err_exit; } @@ -147,24 +146,23 @@ int aq_vec_ring_alloc(struct aq_vec_s *self, struct aq_nic_s *aq_nic, aq_nic_set_tx_ring(aq_nic, idx_ring, ring); - if (xdp_rxq_info_reg(&self->ring[i][AQ_VEC_RX_ID].xdp_rxq, + ring = &self->ring[i][AQ_VEC_RX_ID]; + if (xdp_rxq_info_reg(&ring->xdp_rxq, aq_nic->ndev, idx, self->napi.napi_id) < 0) { err = -ENOMEM; goto err_exit; } - if (xdp_rxq_info_reg_mem_model(&self->ring[i][AQ_VEC_RX_ID].xdp_rxq, + if (xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL) < 0) { - xdp_rxq_info_unreg(&self->ring[i][AQ_VEC_RX_ID].xdp_rxq); + xdp_rxq_info_unreg(&ring->xdp_rxq); err = -ENOMEM; goto err_exit; } - ring = aq_ring_rx_alloc(&self->ring[i][AQ_VEC_RX_ID], aq_nic, - idx_ring, aq_nic_cfg); - if (!ring) { - xdp_rxq_info_unreg(&self->ring[i][AQ_VEC_RX_ID].xdp_rxq); - err = -ENOMEM; + err = aq_ring_rx_alloc(ring, aq_nic, idx_ring, aq_nic_cfg); + if (err) { + xdp_rxq_info_unreg(&ring->xdp_rxq); goto err_exit; }
On Mon, 27 Nov 2023 at 09:29, Igor Russkikh <irusskikh@marvell.com> wrote: > > I'm trying to repro this on my side with some artificially increased structure > sizes, but no success so far. So I suspect that one reason I triggered the problem was simply because the suspend/resume happened while I walked away from the computer when it was copying a few hundred gig of data from the old SSD (over USB, so not hugely fast). End result: the suspend/resume happened while the machine was actually quite busy, and presumably a *lot* of memory was just used for disk buffers etc. The "duspend when idle" logic doesn't really take background tasks into account, and my logs leading up to the suspend shows 13:54:09 systemd[1]: Starting systemd-suspend.service - System Suspend... 13:54:09 wpa_supplicant[2401]: wlo2: CTRL-EVENT-DSCP-POLICY clear_all 13:54:09 systemd-sleep[12477]: Entering sleep state 'suspend'... 13:54:09 kernel: PM: suspend entry (deep) 13:54:19 systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully. 13:54:22 kernel: Filesystems sync: 12.738 seconds 14:06:30 kernel: Freezing user space processes and while that last timestamp is bogus (the timestamp comes from syslogd logging, and it actually happens at *resume*), you can see that the filesystem activity was pretty significant with the sync taking a long time, because the copy process was still on-going the whole time. And it continued *after* the sync too. So I - accidentally - ended up hitting a lot of "that's not great" cases on this, that I wouldn't hit normally (because I obviously turn off suspend-at-idle). All on hardware that isn't normally used for suspend/resume anyway, so it probably has somewhat limited testing to begin with. For triggering it, you might try to change that self->buff_ring = kcalloc(self->size, sizeof(struct aq_ring_buff_s), GFP_KERNEL); to use GFP_NOWAIT instead of GFP_KERNEL. That makes allocation failures *much* more likely. It will still work at boot time. Or just artificially make it fail with a "fail the Nth time you hit it". Also, make sure you don't have ridiculous amounts of memory in your machine. I've got "only" 64GB in mine, which is small for a big machine, and presumably a lot of it was used for buffer cache, and I'm not sure what the device suspend/resume ordering was (ie disk might be resumed after ethernet). Linus
On Mon, 27 Nov 2023 18:29:19 +0100 Igor Russkikh wrote: > > Anyway, I suspect a fix for the fatal error might be something like > > the attached, but I think the *root* of the problem is how the > > aquantia driver tried to allocate a humongous buff_ring with kmalloc, > > which really doesn't work. You can see that "order:6", ie we're > > talking an allocation > 100kB, and in low-memory situations that kind > > of kmalloc space simply isn't available. It *will* fail. > > Correct. > It seems after some series of pm-related changes the driver logic was changed to always > reinit full sw structures during suspend/resume, which is obviously an overkill. Another option you can consider is lowering the default ring size. If I'm looking right you default to 4k descriptors for Tx. Is it based on real life experience? Longer term hopefully the queue management API will help drivers avoid freeing memory at suspend. We'll be able to disarm and unconfigure queues without freeing the memory.
On 11/27/2023 7:02 PM, Linus Torvalds wrote: > > So I suspect that one reason I triggered the problem was simply > because the suspend/resume happened while I walked away from the > computer when it was copying a few hundred gig of data from the old > SSD (over USB, so not hugely fast). ... > Also, make sure you don't have ridiculous amounts of memory in your > machine. I've got "only" 64GB in mine, which is small for a big > machine, and presumably a lot of it was used for buffer cache, and I'm > not sure what the device suspend/resume ordering was (ie disk might be > resumed after ethernet). With these details in mind I was able to repro this within seconds on my 16Gb machine, basically by doing a stress in parallel: stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000 while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done in 5-10 seconds I get [ 859.536856] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[6](0x30000) [ 859.563153] warn_alloc: 1 callbacks suppressed [ 859.563156] ifconfig: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 [ 859.563163] CPU: 13 PID: 48544 Comm: ifconfig Tainted: G OE 6.7.0-rc2igor1+ #1 [ 859.563165] Hardware name: ASUS System Product Name/PRIME Z590-P, BIOS 1017 07/12/2021 [ 859.563166] Call Trace: [ 859.563168] <TASK> [ 859.563170] dump_stack_lvl+0x48/0x70 [ 859.563175] dump_stack+0x10/0x20 [ 859.563177] warn_alloc+0x119/0x190 [ 859.563180] ? __alloc_pages_direct_compact+0xae/0x1f0 [ 859.563183] __alloc_pages_slowpath.constprop.0+0xd1a/0xdd0 [ 859.563188] __alloc_pages+0x304/0x350 [ 859.563192] ? aq_ring_alloc+0x29/0xe0 [atlantic] [ 859.563207] __kmalloc_large_node+0x7f/0x140 [ 859.563210] __kmalloc+0xc9/0x140 [ 859.563212] aq_ring_alloc+0x29/0xe0 [atlantic] [ 859.563221] aq_ring_rx_alloc+0x7d/0x90 [atlantic] [ 859.563230] aq_vec_ring_alloc+0xab/0x170 [atlantic] [ 859.563241] aq_nic_init+0x11c/0x1e0 [atlantic] [ 859.563250] aq_ndev_open+0x20/0x90 [atlantic] [ 859.563259] __dev_open+0xe9/0x190 [ 859.563261] __dev_change_flags+0x18c/0x1f0 [ 859.563263] dev_change_flags+0x26/0x70 [ 859.563265] devinet_ioctl+0x602/0x760 [ 859.563268] inet_ioctl+0x167/0x190 [ 859.563269] ? sk_ioctl+0x4b/0x110 [ 859.563271] ? inet_ioctl+0x95/0x190 [ 859.563273] sock_do_ioctl+0x44/0xf0 [ 859.563274] ? __check_object_size+0x51/0x2d0 [ 859.563277] ? _copy_to_user+0x25/0x40 [ 859.563279] sock_ioctl+0xf7/0x300 [ 859.563280] __x64_sys_ioctl+0x95/0xd0 [ 859.563283] do_syscall_64+0x5c/0xe0 [ 859.563286] ? exit_to_user_mode_prepare+0x45/0x1a0 [ 859.563289] ? syscall_exit_to_user_mode+0x34/0x50 [ 859.563291] ? do_syscall_64+0x6b/0xe0 [ 859.563293] ? do_syscall_64+0x6b/0xe0 [ 859.563295] ? syscall_exit_to_user_mode+0x34/0x50 [ 859.563296] ? __x64_sys_openat+0x20/0x30 [ 859.563298] ? do_syscall_64+0x6b/0xe0 [ 859.563300] ? syscall_exit_to_user_mode+0x34/0x50 [ 859.563301] ? __x64_sys_read+0x1a/0x20 [ 859.563303] ? do_syscall_64+0x6b/0xe0 [ 859.563305] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 859.563307] RIP: 0033:0x7f98499df3ab [ 859.563309] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48 [ 859.563310] RSP: 002b:00007fffba955138 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 859.563312] RAX: ffffffffffffffda RBX: 00007fffba955140 RCX: 00007f98499df3ab [ 859.563313] RDX: 00007fffba955140 RSI: 0000000000008914 RDI: 0000000000000004 [ 859.563314] RBP: 00007fffba9551f0 R08: 0000000000000008 R09: 0000000000000001 [ 859.563315] R10: 0000000000000011 R11: 0000000000000202 R12: 0000000000000041 [ 859.563316] R13: 00007fffba9554e8 R14: 0000000000000000 R15: 0000000000000000 [ 859.563318] </TASK> [ 859.563319] Mem-Info: [ 859.563320] active_anon:14091 inactive_anon:3805083 isolated_anon:2336 active_file:4601 inactive_file:5452 isolated_file:3 unevictable:2258 dirty:56 writeback:0 slab_reclaimable:35879 slab_unreclaimable:42730 mapped:8485 shmem:2635 pagetables:35066 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:56673 free_pcp:0 free_cma:0 [ 859.563323] Node 0 active_anon:56364kB inactive_anon:15220332kB active_file:18404kB inactive_file:21808kB unevictable:9032kB isolated(anon):9344kB isolated(file):12kB mapped:33940kB dirty:224kB writeback:0kB shmem:10540kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:42592kB pagetables:140264kB sec_pagetables:0kB all_unreclaimable? no [ 859.563326] Node 0 DMA free:13308kB boost:0kB min:64kB low:80kB high:96kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 859.563329] lowmem_reserve[]: 0 2305 15744 15744 15744 [ 859.563332] Node 0 DMA32 free:63608kB boost:0kB min:9884kB low:12352kB high:14820kB reserved_highatomic:0KB active_anon:16kB inactive_anon:2330560kB active_file:88kB inactive_file:36kB unevictable:0kB writepending:8kB present:2467796kB managed:2401988kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 859.563335] lowmem_reserve[]: 0 0 13439 13439 13439 [ 859.563338] Node 0 Normal free:149776kB boost:90112kB min:147740kB low:162144kB high:176548kB reserved_highatomic:2048KB active_anon:56024kB inactive_anon:12889984kB active_file:19608kB inactive_file:22216kB unevictable:9032kB writepending:0kB present:14098432kB managed:13769880kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 859.563342] lowmem_reserve[]: 0 0 0 0 0 [ 859.563344] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 2*2048kB (UM) 2*4096kB (M) = 13308kB [ 859.563353] Node 0 DMA32: 2*4kB (UM) 2*8kB (UM) 5*16kB (M) 5*32kB (UM) 6*64kB (UM) 17*128kB (UM) 22*256kB (UM) 4*512kB (UM) 6*1024kB (UM) 11*2048kB (UM) 6*4096kB (UM) = 63752kB [ 859.563362] Node 0 Normal: 7073*4kB (UMEH) 1922*8kB (UMEH) 756*16kB (UMEH) 464*32kB (UMEH) 830*64kB (UMEH) 197*128kB (UMH) 6*256kB (MH) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 150484kB [ 859.563371] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 859.563373] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 859.563374] 23673 total pagecache pages [ 859.563374] 11582 pages in swap cache [ 859.563375] Free swap = 7137276kB [ 859.563375] Total swap = 8003580kB [ 859.563376] 4145555 pages RAM [ 859.563376] 0 pages HighMem/MovableOnly [ 859.563377] 98748 pages reserved [ 859.563378] 0 pages hwpoisoned [ 859.563379] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[6](0x18000) [ 859.563381] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[6] FAILURE ============================= [ 859.563388] atlantic 0000:01:00.0 enp1s0: device open failure [ 860.996946] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[0](0x30000) [ 860.996961] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[0](0x18000) Thats already with the patch applied, so no panic and next "ifconfig up" recovers the device state. I will submit a bugfix patch for that solution, but will also continue looking into suspend/resume refactoring. Thanks, Igor
On Tue, 28 Nov 2023 at 09:54, Igor Russkikh <irusskikh@marvell.com> wrote: > > Thats already with the patch applied, so no panic and next "ifconfig up" recovers the device state. Ack. At least it's recoverable, I had to reboot the machine when it happened to me (and am very happy that at least all the logs made it to disk). > I will submit a bugfix patch for that solution, but will also continue looking into suspend/resume refactoring. Thanks, Linus
On 11/27/2023 11:59 PM, Jakub Kicinski wrote: > Another option you can consider is lowering the default ring size. > If I'm looking right you default to 4k descriptors for Tx. > Is it based on real life experience? Probably reducing default will help - but again not 100%. I remember these numbers where chosen mainly to show up good 10Gbps line speed in tests, like iperf udp/tcp flood. But these of course artificial. For sure "normal" user can survive even with lower digits. Thanks Igor
On Tue, 28 Nov 2023 20:18:49 +0100 Igor Russkikh wrote: > > Another option you can consider is lowering the default ring size. > > If I'm looking right you default to 4k descriptors for Tx. > > Is it based on real life experience? > > Probably reducing default will help - but again not 100%. > > I remember these numbers where chosen mainly to show up good 10Gbps > line speed in tests, like iperf udp/tcp flood. But these of course > artificial. > > For sure "normal" user can survive even with lower digits. For Rx under load larger rings are sometimes useful to avoid drops. But your Tx rings are larger than Rx, which is a bit odd. I was going to say that with BQL enabled you're very unlikely to ever use much of the 4k Tx ring, anyway. But you don't have BQL support :S My free advice is to recheck you really need these sizes and implement BQL :)
On 11/28/2023 10:09 PM, Jakub Kicinski wrote: > > For Rx under load larger rings are sometimes useful to avoid drops. > But your Tx rings are larger than Rx, which is a bit odd. Agree. Just looked into the history, and it looks like this size was chosen since the very first commit of this driver. > I was going to say that with BQL enabled you're very unlikely to ever > use much of the 4k Tx ring, anyway. But you don't have BQL support :S > > My free advice is to recheck you really need these sizes and implement > BQL :) Thanks for the hint, will consider this. Regards Igor
I see a fix for double free [0] landed in 6.7; I've been running that for a few days and have hit a resume from suspend issue twice. Stack trace looks a little different (via __iommu_dma_map instead of __iommu_dma_free), provided below. I've had resume issues with the atlantic driver since I've had this hardware, but it went away for a while and seems as though it may have come back with 6.7. (No crashes since logs begin on Dec 15 till Jan 12, Upgrade to 6.7; crashes 20th and 21st, though my usage style of the system has also varied, maybe crashes are associated with higher memory usage?). Possibly unrelated but I also see fairly frequent (1 to ten times per boot, since logs begin?) messages in my logs of the form "atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xffce8000 flags=0x0020]". [0] https://github.com/torvalds/linux/commit/7bb26ea74aa86fdf894b7dbd8c5712c5b4187da7 - Peter kworker/u65:2: page allocation failure: order:6, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 18 PID: 166017 Comm: kworker/u65:2 Not tainted 6.7.0 Hardware name: ASUS System Product [...] BIOS 1502 06/08/2023 Workqueue: events_unbound async_run_entry_fn Call Trace: <TASK> dump_stack_lvl+0x47/0x60 warn_alloc+0x165/0x1e0 ? srso_alias_return_thunk+0x5/0xfbef5 ? __alloc_pages_direct_compact+0xb3/0x290 __alloc_pages+0x109e/0x1130 ? iommu_dma_alloc_iova+0xd4/0x120 ? srso_alias_return_thunk+0x5/0xfbef5 ? __iommu_dma_map+0x84/0xf0 ? aq_ring_alloc+0x22/0x80 [atlantic] __kmalloc_large_node+0x77/0x130 __kmalloc+0xc6/0x150 aq_ring_alloc+0x22/0x80 [atlantic] aq_vec_ring_alloc+0xee/0x1a0 [atlantic] aq_nic_init+0x118/0x1d0 [atlantic] atl_resume_common+0x40/0xd0 [atlantic] On 30/11/2023 12:59, Igor Russkikh wrote: > > On 11/28/2023 10:09 PM, Jakub Kicinski wrote: >> For Rx under load larger rings are sometimes useful to avoid drops. >> But your Tx rings are larger than Rx, which is a bit odd. > Agree. Just looked into the history, and it looks like this size was chosen > since the very first commit of this driver. > >> I was going to say that with BQL enabled you're very unlikely to ever >> use much of the 4k Tx ring, anyway. But you don't have BQL support :S >> >> My free advice is to recheck you really need these sizes and implement >> BQL :) > Thanks for the hint, will consider this. > > Regards > Igor
On 1/21/2024 10:05 PM, Peter Waller wrote: > I see a fix for double free [0] landed in 6.7; I've been running that > for a few days and have hit a resume from suspend issue twice. Stack > trace looks a little different (via __iommu_dma_map instead of > __iommu_dma_free), provided below. > > I've had resume issues with the atlantic driver since I've had this > hardware, but it went away for a while and seems as though it may have > come back with 6.7. (No crashes since logs begin on Dec 15 till Jan 12, > Upgrade to 6.7; crashes 20th and 21st, though my usage style of the > system has also varied, maybe crashes are associated with higher memory > usage?). Hi Peter, Are these hard crashes, or just warnings in dmesg you see? From the log you provided it looks like a warning, meaning system is usable and driver can be restored with `if down/up` sequence. If so, then this is somewhat expected, because I'm still looking into how to refactor this suspend/resume cycle to reduce mem usage. Permanent workaround would be to reduce rx/tx ring sizes with something like ethtool -G rx 1024 tx 1024 If its a hard panic, we should look deeper into it. > Possibly unrelated but I also see fairly frequent (1 to ten times per > boot, since logs begin?) messages in my logs of the form "atlantic > 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 > address=0xffce8000 flags=0x0020]". Seems to be unrelated, but basically indicates HW or FW tries to access unmapped memory addresses, and iommu catches that. Full dmesg may help analyze this. Regards Igor
True, it is a warning rather than a hard crash, though shutdown hangs. Thanks for the workaround. I can provide more dmesg when I’m back at my computer. Do you need the whole thing or is there something in particular you want from it? From memory there isn’t much more in the way of messages that looked connected to me. Sent from my mobile, please excuse brevity > On 23 Jan 2024, at 14:59, Igor Russkikh <irusskikh@marvell.com> wrote: > > >> On 1/21/2024 10:05 PM, Peter Waller wrote: >> I see a fix for double free [0] landed in 6.7; I've been running that >> for a few days and have hit a resume from suspend issue twice. Stack >> trace looks a little different (via __iommu_dma_map instead of >> __iommu_dma_free), provided below. >> >> I've had resume issues with the atlantic driver since I've had this >> hardware, but it went away for a while and seems as though it may have >> come back with 6.7. (No crashes since logs begin on Dec 15 till Jan 12, >> Upgrade to 6.7; crashes 20th and 21st, though my usage style of the >> system has also varied, maybe crashes are associated with higher memory >> usage?). > > Hi Peter, > > Are these hard crashes, or just warnings in dmesg you see? > From the log you provided it looks like a warning, meaning system is usable > and driver can be restored with `if down/up` sequence. > > If so, then this is somewhat expected, because I'm still looking into > how to refactor this suspend/resume cycle to reduce mem usage. > Permanent workaround would be to reduce rx/tx ring sizes with something like > > ethtool -G rx 1024 tx 1024 > > If its a hard panic, we should look deeper into it. > >> Possibly unrelated but I also see fairly frequent (1 to ten times per >> boot, since logs begin?) messages in my logs of the form "atlantic >> 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 >> address=0xffce8000 flags=0x0020]". > > Seems to be unrelated, but basically indicates HW or FW tries to access unmapped > memory addresses, and iommu catches that. > Full dmesg may help analyze this. > > Regards > Igor
Here's part of the log, I can provide more off list if it helps. - Peter <previous boot> Filesystems sync: 0.014 seconds <n>.678271 Freezing user space processes <n>.678366 Freezing user space processes completed (elapsed 0.001 seconds) <n>.678383 OOM killer disabled. <n>.678397 Freezing remaining freezable tasks <n>.678407 Freezing remaining freezable tasks completed (elapsed 0.000 seconds) <n>.678423 printk: Suspending console(s) (use no_console_suspend to debug) <n>.678437 serial 00:04: disabled <n>.678654 queueing ieee80211 work while going to suspend <n>.678680 sd 9:0:0:0: [sda] Synchronizing SCSI cache <n>.678884 ata10.00: Entering standby power mode <n>.678900 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xfc80b000 flags=0x0020] <n>.679124 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xffeae520 flags=0x0020] <n>.679270 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xfc80c000 flags=0x0020] <n>.679411 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xffeae530 flags=0x0020] <n>.679541 ACPI: EC: interrupt blocked <n>.679554 amdgpu 0000:03:00.0: amdgpu: MODE1 reset <n>.679682 amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset <n>.679803 amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset <n>.679919 ACPI: PM: Preparing to enter system sleep state S3 <n>.679931 ACPI: EC: event blocked <n>.679942 ACPI: EC: EC stopped <n>.679952 ACPI: PM: Saving platform NVS memory <n>.679959 Disabling non-boot CPUs ... <snip> <n>.682471 atlantic 0000:0c:00.0 eno2: atlantic: link change old 1000 new 0 <snip> <n>.687497 PM: suspend exit On 23/01/2024 15:13, Peter Waller wrote: > True, it is a warning rather than a hard crash, though shutdown hangs. Thanks for the workaround. > > I can provide more dmesg when I’m back at my computer. Do you need the whole thing or is there something in particular you want from it? From memory there isn’t much more in the way of messages that looked connected to me. > > Sent from my mobile, please excuse brevity > >> On 23 Jan 2024, at 14:59, Igor Russkikh <irusskikh@marvell.com> wrote: >> >> >>> On 1/21/2024 10:05 PM, Peter Waller wrote: >>> I see a fix for double free [0] landed in 6.7; I've been running that >>> for a few days and have hit a resume from suspend issue twice. Stack >>> trace looks a little different (via __iommu_dma_map instead of >>> __iommu_dma_free), provided below. >>> >>> I've had resume issues with the atlantic driver since I've had this >>> hardware, but it went away for a while and seems as though it may have >>> come back with 6.7. (No crashes since logs begin on Dec 15 till Jan 12, >>> Upgrade to 6.7; crashes 20th and 21st, though my usage style of the >>> system has also varied, maybe crashes are associated with higher memory >>> usage?). >> Hi Peter, >> >> Are these hard crashes, or just warnings in dmesg you see? >> From the log you provided it looks like a warning, meaning system is usable >> and driver can be restored with `if down/up` sequence. >> >> If so, then this is somewhat expected, because I'm still looking into >> how to refactor this suspend/resume cycle to reduce mem usage. >> Permanent workaround would be to reduce rx/tx ring sizes with something like >> >> ethtool -G rx 1024 tx 1024 >> >> If its a hard panic, we should look deeper into it. >> >>> Possibly unrelated but I also see fairly frequent (1 to ten times per >>> boot, since logs begin?) messages in my logs of the form "atlantic >>> 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 >>> address=0xffce8000 flags=0x0020]". >> Seems to be unrelated, but basically indicates HW or FW tries to access unmapped >> memory addresses, and iommu catches that. >> Full dmesg may help analyze this. >> >> Regards >> Igor
On 1/23/2024 10:02 PM, Peter Waller wrote: > Here's part of the log, I can provide more off list if it helps. - Peter > > <n>.678900 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT > domain=0x0014 address=0xfc80b000 flags=0x0020] > <n>.679124 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT > domain=0x0014 address=0xffeae520 flags=0x0020] > <n>.679270 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT > domain=0x0014 address=0xfc80c000 flags=0x0020] > <n>.679411 atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT > domain=0x0014 address=0xffeae530 flags=0x0020] Thanks, these looks like descriptor prefetch accesses (those aligned by 0x10), and packet page accesses (aligned by 0x1000). Overall strange, because driver normally deinit all the device activities, not to access host memory after unmapping. I will check if there any potential flaws exist. Regards, Igor
Hi Peter, > I've been running with this for a day or two, and had a panic on resume > today, dmesg below. > > - Peter > > [65525.454687] atlantic: Boot code hanged > [65525.561024] ------------[ cut here ]------------ > [65525.561026] hw_atl2_shared_buffer_finish_ack > [65525.561042] WARNING: CPU: 8 PID: 797385 at > drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2_utils_fw.c:112 > aq_a2_fw_deinit+0xcd/0xe0 [atlantic] Unfortunately this seems to be a different issue, HW related. I suspect you have ASUS labeled NIC or MB? Driver is reporting here it can't not activate device and firmware is not responsive. There exists a BZ: https://bugzilla.kernel.org/show_bug.cgi?id=217260 exploring a similar problem. It has some workaround patch proposal, but its questionable. Igor
On 02/02/2024 10:35, Igor Russkikh wrote:
> Unfortunately this seems to be a different issue, HW related. I suspect you have ASUS labeled NIC or MB?
Thanks for the link. Yes, it's an ASUS motherboard with onboard NIC.
There are bios updates available which may also come with firmware updates.
Do you happen to know if there a way to identify/fingerprint/version the
firmware on the NIC via the dmesg/lshw output/bios screens so that I may
tell if updating the bios is also supplying new firmware?
drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index 4de22eed099a..472c7c08bfed 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -932,11 +932,14 @@ void aq_ring_free(struct aq_ring_s *self) return; kfree(self->buff_ring); + self->buff_ring = NULL; - if (self->dx_ring) + if (self->dx_ring) { dma_free_coherent(aq_nic_get_dev(self->aq_nic), self->size * self->dx_size, self->dx_ring, self->dx_ring_pa); + self->dx_ring = NULL; + } } unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data)