[iwl-net,v1] ice: fix NULL pointer access during resume

The ice_suspend()/ice_resume() cycle was not updated when the init path
was refactored, and I suspect this let a bug creep in where the driver
is not correctly reinitialized during resume.

I tested against a 6.1.77 kernel, and the ice driver in that kernel
suspends and resumes with no panic.

Instead of tearing down interrupts and freeing a bunch of memory during
suspend, just begin an internal reset event, which takes care of all the
correct steps during suspend.  Likewise during resume we'll just let the
reset complete and the driver comes right back to life. This mirrors the
behavior of other suspend/resume code in drivers like fm10k.
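
As a rough illustration of the flow described above, here is a small
userspace model. It is not driver code: struct model_pf and the
model_pf_*() functions are hypothetical names invented for this sketch,
and the "irq_table" field only stands in for whatever state the old
suspend path used to free. The point it tries to show is that suspend
marks a reset as pending and frees nothing, so the rebuild that runs
after resume finds the state it expects.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for per-function driver state. */
struct model_pf {
	int *irq_table;      /* stays allocated across suspend in this model */
	bool running;        /* true while traffic may flow */
	bool reset_pending;  /* set by suspend, cleared once the rebuild finishes */
};

/* Suspend: stop activity and mark a reset as pending; free nothing. */
void model_pf_suspend(struct model_pf *pf)
{
	pf->running = false;
	pf->reset_pending = true;
}

/* Resume: report whether the service task must finish a pending reset. */
bool model_pf_resume(const struct model_pf *pf)
{
	return pf->reset_pending;
}

/* Service task: finish the reset and rebuild on top of the kept state. */
int model_pf_service_task(struct model_pf *pf)
{
	if (!pf->reset_pending)
		return 0;
	if (!pf->irq_table)
		return -1; /* a real rebuild would hit a NULL pointer here */
	pf->reset_pending = false;
	pf->running = true;
	return 0;
}
```

In this model the failure case is a rebuild that runs after something
freed irq_table, which is only an analogy for the panic below.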

Older commits to this driver and to the i40e driver tried to fix "disk"
(hibernate) suspend events on systems with many CPUs. The PM subsystem
has been updated since then, but the drivers kept the old flows.
Testing with 'rtcwake -m [disk | mem] -s 10' passes, but note that my
system won't actually hibernate because it has too much RAM and not
enough swap.

This change also slightly refactors the code so that suspend and the PCI
error handler functions, which all do the same thing, share a common
"prep" path: ice_quiesce_before_reset().
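
The shape of that refactor can be sketched in userspace C. Everything
here is an assumption made for illustration: struct model_dev, its
fields, and the model_*() entry points are hypothetical, and the body of
the shared helper is a guess at what "prep" means (stop background work,
then prepare for reset once), not the contents of
ice_quiesce_before_reset().

```c
#include <stdbool.h>

/* Hypothetical model state; field names are illustrative only. */
struct model_dev {
	bool service_task_stopped;
	bool prepared_for_reset;
	int prepare_calls; /* counts how often the prep work actually ran */
};

/* Shared prep step: stop background work, then prepare for reset once. */
void model_quiesce_before_reset(struct model_dev *dev)
{
	dev->service_task_stopped = true;
	if (dev->prepared_for_reset)
		return; /* an earlier caller already did the prep work */
	dev->prepared_for_reset = true;
	dev->prepare_calls++;
}

/* Each entry point reduces to the same call instead of repeating the steps. */
void model_on_suspend(struct model_dev *dev)
{
	model_quiesce_before_reset(dev);
}

void model_on_pci_error_detected(struct model_dev *dev)
{
	model_quiesce_before_reset(dev);
}

void model_on_pci_reset_prepare(struct model_dev *dev)
{
	model_quiesce_before_reset(dev);
}
```

Keeping the helper idempotent in the sketch means it does not matter
which entry point runs first or how many of them run.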

While doing all this and compile testing, I ran across the pm.h changes
that get rid of compilation problems when CONFIG_PM=n (and similar
configs), so those small changes are included here as well.
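
The message above does not say which pm.h helpers are meant. Assuming it
refers to the pointer-selection idea behind pm_sleep_ptr(), the
following userspace model shows why that idea avoids #ifdef blocks: the
callbacks are always compiled and referenced, and only the pointer
handed to the driver core becomes NULL when sleep support is configured
out. All MODEL_* and model_* names are hypothetical stand-ins, not
kernel symbols.

```c
#include <stddef.h>

/* Stand-in for a Kconfig option; 0 models a build with sleep support off. */
#ifndef MODEL_CONFIG_PM_SLEEP
#define MODEL_CONFIG_PM_SLEEP 0
#endif

/* Select the pointer when the option is on, NULL otherwise. */
#define MODEL_PTR_IF(cond, ptr)  ((cond) ? (ptr) : NULL)
#define model_pm_sleep_ptr(ptr)  MODEL_PTR_IF(MODEL_CONFIG_PM_SLEEP, (ptr))

struct model_pm_ops {
	int (*suspend)(void);
	int (*resume)(void);
};

/* Always compiled, so they cannot bit-rot or warn as unused. */
int model_suspend_cb(void) { return 0; }
int model_resume_cb(void) { return 0; }

const struct model_pm_ops model_ops = {
	.suspend = model_suspend_cb,
	.resume = model_resume_cb,
};

/* What the driver would register: the ops table, or NULL if disabled. */
const struct model_pm_ops *model_driver_pm(void)
{
	return model_pm_sleep_ptr(&model_ops);
}
```

Building the sketch with -DMODEL_CONFIG_PM_SLEEP=1 makes
model_driver_pm() return the ops table instead of NULL, with no other
source change.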

PANIC from 6.8.0-rc1:

[1026674.915596] PM: suspend exit
[1026675.664697] ice 0000:17:00.1: PTP reset successful
[1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
[1026675.667660] ice 0000:b1:00.0: PTP reset successful
[1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
[1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
[1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
[1026677.192753] ice 0000:17:00.0: PTP reset successful
[1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
[1026677.197928] #PF: supervisor read access in kernel mode
[1026677.197933] #PF: error_code(0x0000) - not-present page
[1026677.197937] PGD 1557a7067 P4D 0
[1026677.212133] ice 0000:b1:00.1: PTP reset successful
[1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
[1026677.212575]
[1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G        W          6.8.0-rc1+ #1
[1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[1026677.269367] Workqueue: ice ice_service_task [ice]
[1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
[1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
[1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
[1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
[1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
[1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
[1026677.344472] FS:  0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
[1026677.353000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
[1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1026677.381952] PKRU: 55555554
[1026677.385116] Call Trace:
[1026677.388023]  <TASK>
[1026677.390589]  ? __die+0x20/0x70
[1026677.394105]  ? page_fault_oops+0x82/0x160
[1026677.398576]  ? do_user_addr_fault+0x65/0x6a0
[1026677.403307]  ? exc_page_fault+0x6a/0x150
[1026677.407694]  ? asm_exc_page_fault+0x22/0x30
[1026677.412349]  ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.418614]  ice_vsi_rebuild+0x34b/0x3c0 [ice]
[1026677.423583]  ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[1026677.429147]  ice_rebuild+0x18b/0x520 [ice]
[1026677.433746]  ? delay_tsc+0x8f/0xc0
[1026677.437630]  ice_do_reset+0xa3/0x190 [ice]
[1026677.442231]  ice_service_task+0x26/0x440 [ice]
[1026677.447180]  process_one_work+0x174/0x340
[1026677.451669]  worker_thread+0x27e/0x390
[1026677.455890]  ? __pfx_worker_thread+0x10/0x10
[1026677.460627]  kthread+0xee/0x120
[1026677.464235]  ? __pfx_kthread+0x10/0x10
[1026677.468445]  ret_from_fork+0x2d/0x50
[1026677.472476]  ? __pfx_kthread+0x10/0x10
[1026677.476671]  ret_from_fork_asm+0x1b/0x30
[1026677.481050]  </TASK>

Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Reported-by: Robert Elliott <elliott@hpe.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
NOTE:
Requires Amritha's patch:
https://patchwork.ozlabs.org/project/intel-wired-lan/patch/170785373072.3325.9129916579186572531.stgit@anambiarhost.jf.intel.com/
to be applied before this will pass testing cleanly.

Checkpatch warns on no "Closes:" but this was reported on a private
list, so there is nothing to close.

Testing Hints: 'rtcwake -m mem -s 10' should result in a 10 second sleep
and wake, with the interface fully functional afterward. Please also
test that magic packet wake can be enabled on an adapter that supports
it, and that the magic packet wakes the system.
---
 drivers/net/ethernet/intel/ice/ice_main.c | 179 +++-------------------
 1 file changed, 25 insertions(+), 154 deletions(-)

base-commit: 23f9c2c066e7e5052406fb8f04a115d3d0260b22

Message ID	20240220231720.14836-1-jesse.brandeburg@intel.com
State	Awaiting Upstream
Delegated to:	Netdev Maintainers
From	Jesse Brandeburg <jesse.brandeburg@intel.com>
To	intel-wired-lan@lists.osuosl.org
Cc	Jesse Brandeburg <jesse.brandeburg@intel.com>, netdev@vger.kernel.org, Robert Elliott <elliott@hpe.com>, Jacob Keller <jacob.e.keller@intel.com>, Tony Nguyen <anthony.l.nguyen@intel.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Subject	[PATCH iwl-net v1] ice: fix NULL pointer access during resume
Date	Tue, 20 Feb 2024 15:17:20 -0800
Series	[iwl-net,v1] ice: fix NULL pointer access during resume

Context	Check	Description
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for net
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 956 this patch: 956
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 8 of 8 maintainers
netdev/build_clang	success	Errors and warnings before: 973 this patch: 973
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 973 this patch: 973
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 261 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
