Message ID | 20180731003954.19962-4-lyude@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix connector probing deadlocks from RPM bugs | expand |
Hi Lyude, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.18-rc7] [cannot apply to next-20180731] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Lyude-Paul/Fix-connector-probing-deadlocks-from-RPM-bugs/20180801-153816 config: x86_64-randconfig-x014-201830 (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): In file included from drivers/gpu//drm/drm_crtc_helper.c:42:0: include/drm/drm_fb_helper.h: In function 'drm_fb_helper_suspend_hotplug': >> include/drm/drm_fb_helper.h:586:1: warning: no return statement in function returning non-void [-Wreturn-type] } ^ vim +586 include/drm/drm_fb_helper.h 578 579 static inline void 580 drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) 581 { 582 } 583 static inline bool 584 drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper) 585 { > 586 } 587 #endif 588 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Lyude, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.18-rc7] [cannot apply to next-20180731] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Lyude-Paul/Fix-connector-probing-deadlocks-from-RPM-bugs/20180801-153816 reproduce: make htmldocs All warnings (new ones prefixed by >>): WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org) mm/mempool.c:228: warning: Function parameter or member 'pool' not described in 'mempool_init' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ibss' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.connect' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.keys' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie_len' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.bssid' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ssid' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_key' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_mgmt_key' not described in 'wireless_dev' include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.prev_bssid_valid' not described in 'wireless_dev' include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.units_pos' not described in 'ieee80211_hw' include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.accuracy' not described in 'ieee80211_hw' include/net/mac80211.h:955: warning: Function parameter or member 'control.rates' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.use_rts' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.use_cts_prot' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.short_preamble' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.skip_table' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.jiffies' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.vif' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.hw_key' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.flags' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'control.enqueue_time' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'ack' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'ack.cookie' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.rates' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.ack_signal' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_ack_len' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_len' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.antenna' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.tx_time' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'status.status_driver_data' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'driver_rates' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'pad' not described in 'ieee80211_tx_info' include/net/mac80211.h:955: warning: Function parameter or member 'rate_driver_data' not described in 'ieee80211_tx_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.signal' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.chain_signal' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.filtered' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_failed' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_count' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.lost_packets' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_tdls_pkt_time' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_retries' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_failed' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack_signal' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.ack_signal_filled' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.avg_ack_signal' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.packets' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.bytes' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.last_rate' not described in 'sta_info' net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.msdu' not described in 'sta_info' kernel/sched/fair.c:3760: warning: Function parameter or member 'flags' not described in 'attach_entity_load_avg' include/linux/device.h:93: warning: bad line: this bus. include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.cb' not described in 'dma_buf' include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.poll' not described in 'dma_buf' include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.active' not described in 'dma_buf' include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.cb' not described in 'dma_buf' include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.poll' not described in 'dma_buf' include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.active' not described in 'dma_buf' include/linux/dma-fence-array.h:54: warning: Function parameter or member 'work' not described in 'dma_fence_array' include/linux/gpio/driver.h:142: warning: Function parameter or member 'request_key' not described in 'gpio_irq_chip' include/linux/iio/hw-consumer.h:1: warning: no structured comments found include/linux/device.h:94: warning: bad line: this bus. include/linux/input/sparse-keymap.h:46: warning: Function parameter or member 'sw' not described in 'key_entry' include/linux/regulator/driver.h:227: warning: Function parameter or member 'resume_early' not described in 'regulator_ops' drivers/regulator/core.c:4465: warning: Excess function parameter 'state' description in 'regulator_suspend_late' arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw0' not described in 'irb' arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw1' not described in 'irb' arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw2' not described in 'irb' arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw3' not described in 'irb' arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.eadm' not described in 'irb' drivers/usb/dwc3/gadget.c:510: warning: Excess function parameter 'dwc' description in 'dwc3_gadget_start_config' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_pin' not described in 'drm_driver' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_unpin' not described in 'drm_driver' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_res_obj' not described in 'drm_driver' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_get_sg_table' not described in 'drm_driver' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_import_sg_table' not described in 'drm_driver' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vmap' not described in 'drm_driver' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vunmap' not described in 'drm_driver' include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_mmap' not described in 'drm_driver' >> drivers/gpu/drm/drm_fb_helper.c:2750: warning: Function parameter or member 'fb_helper' not described in 'drm_fb_helper_resume_hotplug' drivers/gpu/drm/i915/i915_vma.h:48: warning: cannot understand function prototype: 'struct i915_vma ' drivers/gpu/drm/i915/i915_vma.h:1: warning: no structured comments found include/drm/tinydrm/tinydrm.h:34: warning: Function parameter or member 'fb_dirty' not described in 'tinydrm_device' drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'crtc_state' not described in 'mipi_dbi_enable_flush' drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'plane_state' not described in 'mipi_dbi_enable_flush' include/linux/skbuff.h:852: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'skb_mstamp' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'head_frag' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'encapsulation' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'csum_valid' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'csum_level' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'offload_fwd_mark' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'offload_mr_fwd_mark' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff' include/linux/skbuff.h:852: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff' include/net/sock.h:234: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_portpair' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_cookie' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_listener' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common' include/net/sock.h:234: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common' include/net/sock.h:493: warning: Function parameter or member 'sk_backlog.rmem_alloc' not described in 'sock' include/net/sock.h:493: warning: Function parameter or member 'sk_backlog.len' not described in 'sock' include/net/sock.h:493: warning: Function parameter or member 'sk_backlog.head' not described in 'sock' include/net/sock.h:493: warning: Function parameter or member 'sk_backlog.tail' not described in 'sock' include/net/sock.h:493: warning: Function parameter or member 'sk_wq_raw' not described in 'sock' include/net/sock.h:493: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock' include/net/sock.h:493: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock' include/net/sock.h:493: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock' include/linux/netdevice.h:1997: warning: Function parameter or member 'adj_list.upper' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'adj_list.lower' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'gso_partial_features' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'switchdev_ops' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'tlsdev_ops' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'name_assign_type' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'mpls_ptr' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'xdp_prog' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member '____cacheline_aligned_in_smp' not described in 'net_device' include/linux/netdevice.h:1997: warning: Function parameter or member 'qdisc_hash' not described in 'net_device' include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising' not described in 'phylink_link_state' include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising' not described in 'phylink_link_state' sound/soc/soc-core.c:2787: warning: Excess function parameter 'legacy_dai_naming' description in 'snd_soc_register_dais' include/linux/rcupdate.h:571: ERROR: Unexpected indentation. include/linux/rcupdate.h:575: ERROR: Unexpected indentation. include/linux/rcupdate.h:579: WARNING: Block quote ends without a blank line; unexpected unindent. include/linux/rcupdate.h:581: WARNING: Block quote ends without a blank line; unexpected unindent. include/linux/rcupdate.h:581: WARNING: Inline literal start-string without end-string. lib/reed_solomon/reed_solomon.c:287: ERROR: Unknown target name: "gfp". include/linux/wait.h:110: WARNING: Block quote ends without a blank line; unexpected unindent. include/linux/wait.h:113: ERROR: Unexpected indentation. include/linux/wait.h:115: WARNING: Block quote ends without a blank line; unexpected unindent. kernel/time/hrtimer.c:1129: WARNING: Block quote ends without a blank line; unexpected unindent. kernel/signal.c:327: WARNING: Inline literal start-string without end-string. drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string. drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string. drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string. drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string. drivers/ata/libata-core.c:5946: ERROR: Unknown target name: "hw". drivers/message/fusion/mptbase.c:5054: WARNING: Definition list ends without a blank line; unexpected unindent. drivers/tty/serial/serial_core.c:1892: WARNING: Definition list ends without a blank line; unexpected unindent. include/linux/mtd/rawnand.h:1446: WARNING: Inline strong start-string without end-string. include/linux/mtd/rawnand.h:1448: WARNING: Inline strong start-string without end-string. include/linux/regulator/driver.h:279: ERROR: Unknown target name: "regulator_regmap_x_voltage". Documentation/driver-api/soundwire/locking.rst:50: ERROR: Inconsistent literal block quoting. Documentation/driver-api/soundwire/locking.rst:51: WARNING: Line block ends without a blank line. Documentation/driver-api/soundwire/locking.rst:55: WARNING: Inline substitution_reference start-string without end-string. Documentation/driver-api/soundwire/locking.rst:56: WARNING: Line block ends without a blank line. Documentation/driver-api/soundwire/stream.rst:177: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/driver-api/soundwire/stream.rst:203: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/driver-api/soundwire/stream.rst:248: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/driver-api/soundwire/stream.rst:277: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/driver-api/soundwire/stream.rst:304: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/driver-api/soundwire/stream.rst:328: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/driver-api/soundwire/stream.rst:352: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/driver-api/soundwire/stream.rst:364: WARNING: Explicit markup ends without a blank line; unexpected unindent. include/linux/spi/spi.h:373: ERROR: Unexpected indentation. block/bio.c:958: WARNING: Inline emphasis start-string without end-string. Documentation/gpu/drivers.rst:5: WARNING: toctree contains reference to nonexisting document u'gpu/v3d' Documentation/misc-devices/ibmvmc.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent. net/core/dev.c:4650: ERROR: Unknown target name: "page_is". Documentation/networking/net_failover.rst:48: WARNING: Definition list ends without a blank line; unexpected unindent. Documentation/networking/net_failover.rst:50: ERROR: Unexpected indentation. vim +2750 drivers/gpu/drm/drm_fb_helper.c 2735 2736 /** 2737 * drm_fb_helper_resume_hotplug - Allow fb_helper to handle probing new 2738 * connectors again, and bring it up to date 2739 * with the current device's connector status 2740 * fb_helper: driver-allocated fbdev helper, can be NULL 2741 * 2742 * Allow fb_helper to react to connector status changes again after having 2743 * been disabled through drm_fb_helper_suspend_hotplug(). This also schedules 2744 * a fb_helper hotplug event to bring fb_helper up to date with the current 2745 * status of the DRM device's connectors. 2746 */ 2747 void 2748 drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) 2749 { > 2750 fb_helper->hotplug_suspended = false; 2751 drm_fb_helper_hotplug_event(fb_helper); 2752 } 2753 EXPORT_SYMBOL(drm_fb_helper_resume_hotplug); 2754 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Mon, Jul 30, 2018 at 08:39:48PM -0400, Lyude Paul wrote: > I'm sure I don't need to tell you that fb_helper's locking is a mess. > That being said; fb_helper's locking mess can seriously complicate the > runtime suspend/resume operations of drivers because it can invoke > atomic commits and connector probing from anywhere that calls > drm_fb_helper_hotplug_event(). Since most drivers use > drm_fb_helper_output_poll_changed() as their output_poll_changed > handler, this can happen in every single context that can fire off a > hotplug event. An example: Uh, I think this ever more looks like a serious rabbit hole between nouveau and rpm. It looks like nouveau tries to use rpm as the outermost lock, surrounding every other drm lock there is. That doesn't really work, and means you'll keep fighting drm core locking forever. All other rpm supporting drivers (i915, and the pile of armsoc ones) push their rpm_get/put dance way down into the low-level callbacks. That then allows you to fix all the trouble with probing, i2x, dp_aux and stuff abitrarily nesting (since rpm itself is refcounted). But I'm not really sure why nouveau is different here. -Daniel > > [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. > [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 > [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 > [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] > [ 246.678704] Call Trace: > [ 246.679753] __schedule+0x322/0xaf0 > [ 246.680916] schedule+0x33/0x90 > [ 246.681924] schedule_preempt_disabled+0x15/0x20 > [ 246.683023] __mutex_lock+0x569/0x9a0 > [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 > [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] > [ 246.686179] mutex_lock_nested+0x1b/0x20 > [ 246.687278] ? mutex_lock_nested+0x1b/0x20 > [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] > [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] > [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] > [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] > [ 246.692611] process_one_work+0x231/0x620 > [ 246.693725] worker_thread+0x214/0x3a0 > [ 246.694756] kthread+0x12b/0x150 > [ 246.695856] ? wq_pool_ids_show+0x140/0x140 > [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 > [ 246.697998] ret_from_fork+0x3a/0x50 > [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. > [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 > [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 > [ 246.703293] Workqueue: pm pm_runtime_work > [ 246.704393] Call Trace: > [ 246.705403] __schedule+0x322/0xaf0 > [ 246.706439] ? wait_for_completion+0x104/0x190 > [ 246.707393] schedule+0x33/0x90 > [ 246.708375] schedule_timeout+0x3a5/0x590 > [ 246.709289] ? mark_held_locks+0x58/0x80 > [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 > [ 246.711222] ? wait_for_completion+0x104/0x190 > [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 > [ 246.713094] ? wait_for_completion+0x104/0x190 > [ 246.713964] wait_for_completion+0x12c/0x190 > [ 246.714895] ? wake_up_q+0x80/0x80 > [ 246.715727] ? get_work_pool+0x90/0x90 > [ 246.716649] flush_work+0x1c9/0x280 > [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 > [ 246.718442] __cancel_work_timer+0x146/0x1d0 > [ 246.719247] cancel_delayed_work_sync+0x13/0x20 > [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] > [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] > [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 > [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 > [ 246.723737] __rpm_callback+0x7a/0x1d0 > [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 > [ 246.725607] rpm_callback+0x24/0x80 > [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 > [ 246.727376] rpm_suspend+0x142/0x6b0 > [ 246.728185] pm_runtime_work+0x97/0xc0 > [ 246.728938] process_one_work+0x231/0x620 > [ 246.729796] worker_thread+0x44/0x3a0 > [ 246.730614] kthread+0x12b/0x150 > [ 246.731395] ? wq_pool_ids_show+0x140/0x140 > [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 > [ 246.732878] ret_from_fork+0x3a/0x50 > [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. > [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 > [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 > [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] > [ 246.737665] Call Trace: > [ 246.738490] __schedule+0x322/0xaf0 > [ 246.739250] schedule+0x33/0x90 > [ 246.739908] rpm_resume+0x19c/0x850 > [ 246.740750] ? finish_wait+0x90/0x90 > [ 246.741541] __pm_runtime_resume+0x4e/0x90 > [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] > [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] > [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] > [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] > [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] > [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] > [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] > [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] > [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] > [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] > [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] > [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] > [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] > [ 246.752314] process_one_work+0x231/0x620 > [ 246.752979] worker_thread+0x44/0x3a0 > [ 246.753838] kthread+0x12b/0x150 > [ 246.754619] ? wq_pool_ids_show+0x140/0x140 > [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 > [ 246.756162] ret_from_fork+0x3a/0x50 > [ 246.756847] > Showing all locks held in the system: > [ 246.758261] 3 locks held by kworker/4:0/37: > [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 > [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 > [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] > [ 246.761516] 2 locks held by kworker/0:1/60: > [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 > [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 > [ 246.763890] 1 lock held by khungtaskd/64: > [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 > [ 246.765588] 5 locks held by kworker/4:2/422: > [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 > [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 > [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] > [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] > [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] > [ 246.770839] 1 lock held by dmesg/1038: > [ 246.771739] 2 locks held by zsh/1172: > [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 > [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 > > [ 246.775522] ============================================= > > Because of this, there's an unreasonable number of places that drm > drivers would need to insert special handling to prevent trying to > resume the device from all of these contexts that can deadlock. It's > difficult even to try synchronizing with fb_helper in these contexts as > well, since any of them could introduce a deadlock by waiting to acquire > the top-level fb_helper mutex, while it's being held by another thread > that might potentially call down to pm_runtime_get_sync(). > > Luckily-there's no actual reason we need to allow fb_helper to handle > hotplugging at all when runtime suspending a device. If a hotplug > happens during a runtime suspend operation, there's no reason the driver > can't just re-enable fbcon's hotplug handling and bring it up to speed > with hotplugging events it may have missed by calling > drm_fb_helper_hotplug_event(). > > So, let's make this easy and just add helpers to handle disabling and > enabling fb_helper connector probing() without having to potentially > wait on fb_helper to finish it's work. This will let us fix the runtime > suspend/resume deadlocks that we've been experiencing with nouveau, > along with being able to fix some of the incorrect runtime PM core > interaction that other DRM drivers currently perform to work around > these issues. > > Signed-off-by: Lyude Paul <lyude@redhat.com> > Cc: stable@vger.kernel.org > Cc: Lukas Wunner <lukas@wunner.de> > Cc: Karol Herbst <karolherbst@gmail.com> > --- > drivers/gpu/drm/drm_fb_helper.c | 65 +++++++++++++++++++++++++++++++++ > include/drm/drm_fb_helper.h | 20 ++++++++++ > 2 files changed, 85 insertions(+) > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c > index 2ee1eaa66188..28d59befbc92 100644 > --- a/drivers/gpu/drm/drm_fb_helper.c > +++ b/drivers/gpu/drm/drm_fb_helper.c > @@ -2733,6 +2733,66 @@ int drm_fb_helper_initial_config(struct drm_fb_helper *fb_helper, int bpp_sel) > } > EXPORT_SYMBOL(drm_fb_helper_initial_config); > > +/** > + * drm_fb_helper_resume_hotplug - Allow fb_helper to handle probing new > + * connectors again, and bring it up to date > + * with the current device's connector status > + * fb_helper: driver-allocated fbdev helper, can be NULL > + * > + * Allow fb_helper to react to connector status changes again after having > + * been disabled through drm_fb_helper_suspend_hotplug(). This also schedules > + * a fb_helper hotplug event to bring fb_helper up to date with the current > + * status of the DRM device's connectors. > + */ > +void > +drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) > +{ > + fb_helper->hotplug_suspended = false; > + drm_fb_helper_hotplug_event(fb_helper); > +} > +EXPORT_SYMBOL(drm_fb_helper_resume_hotplug); > + > +/** > + * drm_fb_helper_suspend_hotplug - Attempt to temporarily inhibit fb_helper's > + * ability to respond to connector changes > + * without waiting > + * @fb_helper: driver-allocated fbdev helper, can be NULL > + * > + * Temporarily prevent fb_helper from being able to handle connector changes, > + * but only if it isn't busy doing something else. The connector probing > + * routines in fb_helper can potentially grab any modesetting lock imaginable, > + * which dramatically complicates the runtime suspend process. This helper can > + * be used to simplify the process of runtime suspend by allowing the driver > + * to disable unpredictable fb_helper operations as early as possible, without > + * requiring that we try waiting on fb_helper (as this could lead to very > + * difficult to solve deadlocks with runtime suspend code if fb_helper ends up > + * needing to acquire a runtime PM reference). > + * > + * This call should be put at the very start of a driver's runtime suspend > + * operation if desired. The driver must be responsible for re-enabling > + * fb_helper hotplug handling when normal hotplug detection becomes available > + * on the device again. This will usually happen if runtime suspend is > + * aborted, or when the device is runtime resumed. > + * > + * Returns: true if hotplug handling was disabled, false if disabling hotplug > + * handling would mean waiting on fb_helper. > + */ > +bool > +drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper) > +{ > + int ret; > + > + ret = mutex_trylock(&fb_helper->lock); > + if (!ret) > + return false; > + > + fb_helper->hotplug_suspended = true; > + mutex_unlock(&fb_helper->lock); > + > + return true; > +} > +EXPORT_SYMBOL(drm_fb_helper_suspend_hotplug); > + > /** > * drm_fb_helper_hotplug_event - respond to a hotplug notification by > * probing all the outputs attached to the fb > @@ -2774,6 +2834,11 @@ int drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper) > return err; > } > > + if (fb_helper->hotplug_suspended) { > + mutex_unlock(&fb_helper->lock); > + return err; > + } > + > DRM_DEBUG_KMS("\n"); > > drm_setup_crtcs(fb_helper, fb_helper->fb->width, fb_helper->fb->height); > diff --git a/include/drm/drm_fb_helper.h b/include/drm/drm_fb_helper.h > index b069433e7fc1..30881131075c 100644 > --- a/include/drm/drm_fb_helper.h > +++ b/include/drm/drm_fb_helper.h > @@ -232,6 +232,14 @@ struct drm_fb_helper { > * See also: @deferred_setup > */ > int preferred_bpp; > + > + /** > + * @hotplug_suspended: > + * > + * Whether or not we can currently handle hotplug events, or if we > + * need to wait for the DRM device to uninhibit us. > + */ > + bool hotplug_suspended; > }; > > /** > @@ -330,6 +338,10 @@ void drm_fb_helper_fbdev_teardown(struct drm_device *dev); > > void drm_fb_helper_lastclose(struct drm_device *dev); > void drm_fb_helper_output_poll_changed(struct drm_device *dev); > + > +void drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper); > +bool drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper); > + > #else > static inline void drm_fb_helper_prepare(struct drm_device *dev, > struct drm_fb_helper *helper, > @@ -564,6 +576,14 @@ static inline void drm_fb_helper_output_poll_changed(struct drm_device *dev) > { > } > > +static inline void > +drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) > +{ > +} > +static inline bool > +drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper) > +{ > +} > #endif > > static inline int > -- > 2.17.1 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, 2018-08-06 at 10:43 +0200, Daniel Vetter wrote: > On Mon, Jul 30, 2018 at 08:39:48PM -0400, Lyude Paul wrote: > > I'm sure I don't need to tell you that fb_helper's locking is a mess. > > That being said; fb_helper's locking mess can seriously complicate the > > runtime suspend/resume operations of drivers because it can invoke > > atomic commits and connector probing from anywhere that calls > > drm_fb_helper_hotplug_event(). Since most drivers use > > drm_fb_helper_output_poll_changed() as their output_poll_changed > > handler, this can happen in every single context that can fire off a > > hotplug event. An example: > > Uh, I think this ever more looks like a serious rabbit hole between > nouveau and rpm. It looks like nouveau tries to use rpm as the outermost > lock, surrounding every other drm lock there is. That doesn't really work, > and means you'll keep fighting drm core locking forever. > > All other rpm supporting drivers (i915, and the pile of armsoc ones) push > their rpm_get/put dance way down into the low-level callbacks. That then > allows you to fix all the trouble with probing, i2x, dp_aux and stuff > abitrarily nesting (since rpm itself is refcounted). > > But I'm not really sure why nouveau is different here. A quick note: this isn't a problem unique to nouveau; I already have (apparently incorrect according to pm runtime maintainers) hacks that fix these in amdgpu that I'm planning on reapproaching with a proper fix. That being said: i915 is also more of a special case compared to other drivers, mainly because i915 does still have to deal with all these problems but has it's own handlers that simplify dealing with it. For runtime resume, you guys call down to disable_rpm_wakeref_asserts() during a portion of the runtime suspend process to avoid the potential of something trying to grab a runtime power reference and deadlocking the process, and additionally do the same thing in your irq handler in i915_irq.c: /* IRQs are synced during runtime_suspend, we don't require a wakeref */ disable_rpm_wakeref_asserts(dev_priv); This essentially translates to the same thing that we're doing here with nouveau: preventing any calls that would potentially call down to pm_runtime_resume() so that the runtime suspend handler won't deadlock while trying to disable HPD irqs. Likewise-this also ensures that you can call intel_pm_runtime_get()/intel_pm_runtime_put() much lower on the stack without having to worry about the potential for deadlocks. While I haven't addressed it in this patchset yet, I pretty much want to do the same thing for handling nouveau's current issues with i2c and dp aux: grab runtime power references deeper, and avoid resuming the device in runtime suspend/resume callbacks. You did mention in the review of one of my other patches that we should avoid disabling polling during runtime suspend, and you're definitely right. I feel a bit silly for not remembering that since I was the one who made it so that i915 does polling in runtime suspend for chips without RPM HPD detection in the first place because it was causing people's displays not to come up on vlv... Anyway: I think if we just leave output polling enabled during runtime suspend that might actually fix all of the fb_helper locking issues since we won't need to wait on any of the output poll workers to finish, at least I think it should: I'll confirm this when I get into the office > -Daniel > > > > > [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 > > seconds. > > [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 > > [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > > this message. > > [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 > > [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] > > [ 246.678704] Call Trace: > > [ 246.679753] __schedule+0x322/0xaf0 > > [ 246.680916] schedule+0x33/0x90 > > [ 246.681924] schedule_preempt_disabled+0x15/0x20 > > [ 246.683023] __mutex_lock+0x569/0x9a0 > > [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 > > [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 > > [drm_kms_helper] > > [ 246.686179] mutex_lock_nested+0x1b/0x20 > > [ 246.687278] ? mutex_lock_nested+0x1b/0x20 > > [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 > > [drm_kms_helper] > > [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 > > [drm_kms_helper] > > [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] > > [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] > > [ 246.692611] process_one_work+0x231/0x620 > > [ 246.693725] worker_thread+0x214/0x3a0 > > [ 246.694756] kthread+0x12b/0x150 > > [ 246.695856] ? wq_pool_ids_show+0x140/0x140 > > [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 > > [ 246.697998] ret_from_fork+0x3a/0x50 > > [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 > > seconds. > > [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 > > [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > > this message. > > [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 > > [ 246.703293] Workqueue: pm pm_runtime_work > > [ 246.704393] Call Trace: > > [ 246.705403] __schedule+0x322/0xaf0 > > [ 246.706439] ? wait_for_completion+0x104/0x190 > > [ 246.707393] schedule+0x33/0x90 > > [ 246.708375] schedule_timeout+0x3a5/0x590 > > [ 246.709289] ? mark_held_locks+0x58/0x80 > > [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 > > [ 246.711222] ? wait_for_completion+0x104/0x190 > > [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 > > [ 246.713094] ? wait_for_completion+0x104/0x190 > > [ 246.713964] wait_for_completion+0x12c/0x190 > > [ 246.714895] ? wake_up_q+0x80/0x80 > > [ 246.715727] ? get_work_pool+0x90/0x90 > > [ 246.716649] flush_work+0x1c9/0x280 > > [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 > > [ 246.718442] __cancel_work_timer+0x146/0x1d0 > > [ 246.719247] cancel_delayed_work_sync+0x13/0x20 > > [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] > > [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] > > [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 > > [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 > > [ 246.723737] __rpm_callback+0x7a/0x1d0 > > [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 > > [ 246.725607] rpm_callback+0x24/0x80 > > [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 > > [ 246.727376] rpm_suspend+0x142/0x6b0 > > [ 246.728185] pm_runtime_work+0x97/0xc0 > > [ 246.728938] process_one_work+0x231/0x620 > > [ 246.729796] worker_thread+0x44/0x3a0 > > [ 246.730614] kthread+0x12b/0x150 > > [ 246.731395] ? wq_pool_ids_show+0x140/0x140 > > [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 > > [ 246.732878] ret_from_fork+0x3a/0x50 > > [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 > > seconds. > > [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 > > [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > > this message. > > [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 > > [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work > > [drm_kms_helper] > > [ 246.737665] Call Trace: > > [ 246.738490] __schedule+0x322/0xaf0 > > [ 246.739250] schedule+0x33/0x90 > > [ 246.739908] rpm_resume+0x19c/0x850 > > [ 246.740750] ? finish_wait+0x90/0x90 > > [ 246.741541] __pm_runtime_resume+0x4e/0x90 > > [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] > > [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] > > [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] > > [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] > > [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 > > [drm_kms_helper] > > [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] > > [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 > > [drm_kms_helper] > > [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 > > [drm_kms_helper] > > [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] > > [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] > > [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] > > [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 > > [drm_kms_helper] > > [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] > > [ 246.752314] process_one_work+0x231/0x620 > > [ 246.752979] worker_thread+0x44/0x3a0 > > [ 246.753838] kthread+0x12b/0x150 > > [ 246.754619] ? wq_pool_ids_show+0x140/0x140 > > [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 > > [ 246.756162] ret_from_fork+0x3a/0x50 > > [ 246.756847] > > Showing all locks held in the system: > > [ 246.758261] 3 locks held by kworker/4:0/37: > > [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: > > process_one_work+0x1b3/0x620 > > [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev- > > >mode_config.output_poll_work)->work)){+.+.}, at: > > process_one_work+0x1b3/0x620 > > [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: > > drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] > > [ 246.761516] 2 locks held by kworker/0:1/60: > > [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: > > process_one_work+0x1b3/0x620 > > [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev- > > >power.work)){+.+.}, at: process_one_work+0x1b3/0x620 > > [ 246.763890] 1 lock held by khungtaskd/64: > > [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: > > debug_show_all_locks+0x23/0x185 > > [ 246.765588] 5 locks held by kworker/4:2/422: > > [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, > > at: process_one_work+0x1b3/0x620 > > [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr- > > >work)){+.+.}, at: process_one_work+0x1b3/0x620 > > [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: > > drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] > > [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: > > restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] > > [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: > > drm_modeset_backoff+0x8a/0x1b0 [drm] > > [ 246.770839] 1 lock held by dmesg/1038: > > [ 246.771739] 2 locks held by zsh/1172: > > [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x37/0x40 > > [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0xc1/0x870 > > > > [ 246.775522] ============================================= > > > > Because of this, there's an unreasonable number of places that drm > > drivers would need to insert special handling to prevent trying to > > resume the device from all of these contexts that can deadlock. It's > > difficult even to try synchronizing with fb_helper in these contexts as > > well, since any of them could introduce a deadlock by waiting to acquire > > the top-level fb_helper mutex, while it's being held by another thread > > that might potentially call down to pm_runtime_get_sync(). > > > > Luckily-there's no actual reason we need to allow fb_helper to handle > > hotplugging at all when runtime suspending a device. If a hotplug > > happens during a runtime suspend operation, there's no reason the driver > > can't just re-enable fbcon's hotplug handling and bring it up to speed > > with hotplugging events it may have missed by calling > > drm_fb_helper_hotplug_event(). > > > > So, let's make this easy and just add helpers to handle disabling and > > enabling fb_helper connector probing() without having to potentially > > wait on fb_helper to finish it's work. This will let us fix the runtime > > suspend/resume deadlocks that we've been experiencing with nouveau, > > along with being able to fix some of the incorrect runtime PM core > > interaction that other DRM drivers currently perform to work around > > these issues. > > > > Signed-off-by: Lyude Paul <lyude@redhat.com> > > Cc: stable@vger.kernel.org > > Cc: Lukas Wunner <lukas@wunner.de> > > Cc: Karol Herbst <karolherbst@gmail.com> > > --- > > drivers/gpu/drm/drm_fb_helper.c | 65 +++++++++++++++++++++++++++++++++ > > include/drm/drm_fb_helper.h | 20 ++++++++++ > > 2 files changed, 85 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_fb_helper.c > > b/drivers/gpu/drm/drm_fb_helper.c > > index 2ee1eaa66188..28d59befbc92 100644 > > --- a/drivers/gpu/drm/drm_fb_helper.c > > +++ b/drivers/gpu/drm/drm_fb_helper.c > > @@ -2733,6 +2733,66 @@ int drm_fb_helper_initial_config(struct > > drm_fb_helper *fb_helper, int bpp_sel) > > } > > EXPORT_SYMBOL(drm_fb_helper_initial_config); > > > > +/** > > + * drm_fb_helper_resume_hotplug - Allow fb_helper to handle probing new > > + * connectors again, and bring it up to > > date > > + * with the current device's connector > > status > > + * fb_helper: driver-allocated fbdev helper, can be NULL > > + * > > + * Allow fb_helper to react to connector status changes again after > > having > > + * been disabled through drm_fb_helper_suspend_hotplug(). This also > > schedules > > + * a fb_helper hotplug event to bring fb_helper up to date with the > > current > > + * status of the DRM device's connectors. > > + */ > > +void > > +drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) > > +{ > > + fb_helper->hotplug_suspended = false; > > + drm_fb_helper_hotplug_event(fb_helper); > > +} > > +EXPORT_SYMBOL(drm_fb_helper_resume_hotplug); > > + > > +/** > > + * drm_fb_helper_suspend_hotplug - Attempt to temporarily inhibit > > fb_helper's > > + * ability to respond to connector > > changes > > + * without waiting > > + * @fb_helper: driver-allocated fbdev helper, can be NULL > > + * > > + * Temporarily prevent fb_helper from being able to handle connector > > changes, > > + * but only if it isn't busy doing something else. The connector probing > > + * routines in fb_helper can potentially grab any modesetting lock > > imaginable, > > + * which dramatically complicates the runtime suspend process. This > > helper can > > + * be used to simplify the process of runtime suspend by allowing the > > driver > > + * to disable unpredictable fb_helper operations as early as possible, > > without > > + * requiring that we try waiting on fb_helper (as this could lead to very > > + * difficult to solve deadlocks with runtime suspend code if fb_helper > > ends up > > + * needing to acquire a runtime PM reference). > > + * > > + * This call should be put at the very start of a driver's runtime > > suspend > > + * operation if desired. The driver must be responsible for re-enabling > > + * fb_helper hotplug handling when normal hotplug detection becomes > > available > > + * on the device again. This will usually happen if runtime suspend is > > + * aborted, or when the device is runtime resumed. > > + * > > + * Returns: true if hotplug handling was disabled, false if disabling > > hotplug > > + * handling would mean waiting on fb_helper. > > + */ > > +bool > > +drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper) > > +{ > > + int ret; > > + > > + ret = mutex_trylock(&fb_helper->lock); > > + if (!ret) > > + return false; > > + > > + fb_helper->hotplug_suspended = true; > > + mutex_unlock(&fb_helper->lock); > > + > > + return true; > > +} > > +EXPORT_SYMBOL(drm_fb_helper_suspend_hotplug); > > + > > /** > > * drm_fb_helper_hotplug_event - respond to a hotplug notification by > > * probing all the outputs attached to the > > fb > > @@ -2774,6 +2834,11 @@ int drm_fb_helper_hotplug_event(struct > > drm_fb_helper *fb_helper) > > return err; > > } > > > > + if (fb_helper->hotplug_suspended) { > > + mutex_unlock(&fb_helper->lock); > > + return err; > > + } > > + > > DRM_DEBUG_KMS("\n"); > > > > drm_setup_crtcs(fb_helper, fb_helper->fb->width, fb_helper->fb- > > >height); > > diff --git a/include/drm/drm_fb_helper.h b/include/drm/drm_fb_helper.h > > index b069433e7fc1..30881131075c 100644 > > --- a/include/drm/drm_fb_helper.h > > +++ b/include/drm/drm_fb_helper.h > > @@ -232,6 +232,14 @@ struct drm_fb_helper { > > * See also: @deferred_setup > > */ > > int preferred_bpp; > > + > > + /** > > + * @hotplug_suspended: > > + * > > + * Whether or not we can currently handle hotplug events, or if we > > + * need to wait for the DRM device to uninhibit us. > > + */ > > + bool hotplug_suspended; > > }; > > > > /** > > @@ -330,6 +338,10 @@ void drm_fb_helper_fbdev_teardown(struct drm_device > > *dev); > > > > void drm_fb_helper_lastclose(struct drm_device *dev); > > void drm_fb_helper_output_poll_changed(struct drm_device *dev); > > + > > +void drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper); > > +bool drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper); > > + > > #else > > static inline void drm_fb_helper_prepare(struct drm_device *dev, > > struct drm_fb_helper *helper, > > @@ -564,6 +576,14 @@ static inline void > > drm_fb_helper_output_poll_changed(struct drm_device *dev) > > { > > } > > > > +static inline void > > +drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) > > +{ > > +} > > +static inline bool > > +drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper) > > +{ > > +} > > #endif > > > > static inline int > > -- > > 2.17.1 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > >
On Mon, Aug 06, 2018 at 03:15:31PM -0400, Lyude Paul wrote: > You did mention in the review of one of my other patches that we should avoid > disabling polling during runtime suspend, and you're definitely right. I feel > a bit silly for not remembering that since I was the one who made it so that > i915 does polling in runtime suspend for chips without RPM HPD detection in > the first place because it was causing people's displays not to come up on > vlv... > Anyway: I think if we just leave output polling enabled during runtime suspend > that might actually fix all of the fb_helper locking issues since we won't > need to wait on any of the output poll workers to finish, at least I think it > should: I'll confirm this when I get into the office Quoth Imre Deak: "In i915 polling is on during runtime suspend only if there are outputs without hotplug interrupt support. A special case is when an output has working HPD interrupts when in D0, but no interrupts when runtime suspended. For these we start polling (from a scheduled work) in the runtime suspend hook and stop it in the runtime resume hook (again from a scheduled work)." https://lkml.org/lkml/2018/2/12/330 nouveau only uses runtime PM on discrete GPUs in dual GPU laptops. Resuming the GPU from D3cold to D0 every few seconds to poll the outputs would waste too much power on such machines. The question is, why is polling running at all, since all modern laptops have HPD-capable ports such as DP? Thanks, Lukas
On Mon, Aug 06, 2018 at 09:34:57PM +0200, Lukas Wunner wrote: > On Mon, Aug 06, 2018 at 03:15:31PM -0400, Lyude Paul wrote: > > You did mention in the review of one of my other patches that we should avoid > > disabling polling during runtime suspend, and you're definitely right. I feel > > a bit silly for not remembering that since I was the one who made it so that > > i915 does polling in runtime suspend for chips without RPM HPD detection in > > the first place because it was causing people's displays not to come up on > > vlv... > > Anyway: I think if we just leave output polling enabled during runtime suspend > > that might actually fix all of the fb_helper locking issues since we won't > > need to wait on any of the output poll workers to finish, at least I think it > > should: I'll confirm this when I get into the office > > Quoth Imre Deak: > > "In i915 polling is on during runtime suspend only if there are outputs > without hotplug interrupt support. A special case is when an output has > working HPD interrupts when in D0, but no interrupts when runtime > suspended. For these we start polling (from a scheduled work) in the > runtime suspend hook and stop it in the runtime resume hook (again from > a scheduled work)." > https://lkml.org/lkml/2018/2/12/330 > > nouveau only uses runtime PM on discrete GPUs in dual GPU laptops. > Resuming the GPU from D3cold to D0 every few seconds to poll the > outputs would waste too much power on such machines. > > The question is, why is polling running at all, since all modern > laptops have HPD-capable ports such as DP? Note we don't fully sync with the poll worker here, we just update the polling flags and let the poll worker lazily notice the change (and stop itsefl). That avoids the deadlock. It also means that the poll worker must call pm_runtime_get() in all the right places though. On resume we need to kick it again, but that doesn't have a deadlock potential. -Daniel
On Mon, Aug 6, 2018 at 3:34 PM, Lukas Wunner <lukas@wunner.de> wrote: > On Mon, Aug 06, 2018 at 03:15:31PM -0400, Lyude Paul wrote: >> You did mention in the review of one of my other patches that we should avoid >> disabling polling during runtime suspend, and you're definitely right. I feel >> a bit silly for not remembering that since I was the one who made it so that >> i915 does polling in runtime suspend for chips without RPM HPD detection in >> the first place because it was causing people's displays not to come up on >> vlv... >> Anyway: I think if we just leave output polling enabled during runtime suspend >> that might actually fix all of the fb_helper locking issues since we won't >> need to wait on any of the output poll workers to finish, at least I think it >> should: I'll confirm this when I get into the office > > Quoth Imre Deak: > > "In i915 polling is on during runtime suspend only if there are outputs > without hotplug interrupt support. A special case is when an output has > working HPD interrupts when in D0, but no interrupts when runtime > suspended. For these we start polling (from a scheduled work) in the > runtime suspend hook and stop it in the runtime resume hook (again from > a scheduled work)." > https://lkml.org/lkml/2018/2/12/330 > > nouveau only uses runtime PM on discrete GPUs in dual GPU laptops. > Resuming the GPU from D3cold to D0 every few seconds to poll the > outputs would waste too much power on such machines. > > The question is, why is polling running at all, since all modern > laptops have HPD-capable ports such as DP? > At least on AMD GPUs, the GPU has to be powered up for HPD interrupts to work. On hybrid graphics laptops, depending on the OEM configuration, the hotplug events are caught by the ACPI controller on the motherboard when the GPU is powered down and an ACPI event is generated which the driver can listen for and then tell userspace. I'm not sure how nvidia handles this. Also, some display connections (e.g., analog DACs) don't support hotplug interrupts in the first place so the only way to go is polling. Alex
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index 2ee1eaa66188..28d59befbc92 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -2733,6 +2733,66 @@ int drm_fb_helper_initial_config(struct drm_fb_helper *fb_helper, int bpp_sel) } EXPORT_SYMBOL(drm_fb_helper_initial_config); +/** + * drm_fb_helper_resume_hotplug - Allow fb_helper to handle probing new + * connectors again, and bring it up to date + * with the current device's connector status + * fb_helper: driver-allocated fbdev helper, can be NULL + * + * Allow fb_helper to react to connector status changes again after having + * been disabled through drm_fb_helper_suspend_hotplug(). This also schedules + * a fb_helper hotplug event to bring fb_helper up to date with the current + * status of the DRM device's connectors. + */ +void +drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) +{ + fb_helper->hotplug_suspended = false; + drm_fb_helper_hotplug_event(fb_helper); +} +EXPORT_SYMBOL(drm_fb_helper_resume_hotplug); + +/** + * drm_fb_helper_suspend_hotplug - Attempt to temporarily inhibit fb_helper's + * ability to respond to connector changes + * without waiting + * @fb_helper: driver-allocated fbdev helper, can be NULL + * + * Temporarily prevent fb_helper from being able to handle connector changes, + * but only if it isn't busy doing something else. The connector probing + * routines in fb_helper can potentially grab any modesetting lock imaginable, + * which dramatically complicates the runtime suspend process. This helper can + * be used to simplify the process of runtime suspend by allowing the driver + * to disable unpredictable fb_helper operations as early as possible, without + * requiring that we try waiting on fb_helper (as this could lead to very + * difficult to solve deadlocks with runtime suspend code if fb_helper ends up + * needing to acquire a runtime PM reference). + * + * This call should be put at the very start of a driver's runtime suspend + * operation if desired. The driver must be responsible for re-enabling + * fb_helper hotplug handling when normal hotplug detection becomes available + * on the device again. This will usually happen if runtime suspend is + * aborted, or when the device is runtime resumed. + * + * Returns: true if hotplug handling was disabled, false if disabling hotplug + * handling would mean waiting on fb_helper. + */ +bool +drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper) +{ + int ret; + + ret = mutex_trylock(&fb_helper->lock); + if (!ret) + return false; + + fb_helper->hotplug_suspended = true; + mutex_unlock(&fb_helper->lock); + + return true; +} +EXPORT_SYMBOL(drm_fb_helper_suspend_hotplug); + /** * drm_fb_helper_hotplug_event - respond to a hotplug notification by * probing all the outputs attached to the fb @@ -2774,6 +2834,11 @@ int drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper) return err; } + if (fb_helper->hotplug_suspended) { + mutex_unlock(&fb_helper->lock); + return err; + } + DRM_DEBUG_KMS("\n"); drm_setup_crtcs(fb_helper, fb_helper->fb->width, fb_helper->fb->height); diff --git a/include/drm/drm_fb_helper.h b/include/drm/drm_fb_helper.h index b069433e7fc1..30881131075c 100644 --- a/include/drm/drm_fb_helper.h +++ b/include/drm/drm_fb_helper.h @@ -232,6 +232,14 @@ struct drm_fb_helper { * See also: @deferred_setup */ int preferred_bpp; + + /** + * @hotplug_suspended: + * + * Whether or not we can currently handle hotplug events, or if we + * need to wait for the DRM device to uninhibit us. + */ + bool hotplug_suspended; }; /** @@ -330,6 +338,10 @@ void drm_fb_helper_fbdev_teardown(struct drm_device *dev); void drm_fb_helper_lastclose(struct drm_device *dev); void drm_fb_helper_output_poll_changed(struct drm_device *dev); + +void drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper); +bool drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper); + #else static inline void drm_fb_helper_prepare(struct drm_device *dev, struct drm_fb_helper *helper, @@ -564,6 +576,14 @@ static inline void drm_fb_helper_output_poll_changed(struct drm_device *dev) { } +static inline void +drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper) +{ +} +static inline bool +drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper) +{ +} #endif static inline int
I'm sure I don't need to tell you that fb_helper's locking is a mess. That being said; fb_helper's locking mess can seriously complicate the runtime suspend/resume operations of drivers because it can invoke atomic commits and connector probing from anywhere that calls drm_fb_helper_hotplug_event(). Since most drivers use drm_fb_helper_output_poll_changed() as their output_poll_changed handler, this can happen in every single context that can fire off a hotplug event. An example: [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] [ 246.678704] Call Trace: [ 246.679753] __schedule+0x322/0xaf0 [ 246.680916] schedule+0x33/0x90 [ 246.681924] schedule_preempt_disabled+0x15/0x20 [ 246.683023] __mutex_lock+0x569/0x9a0 [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.686179] mutex_lock_nested+0x1b/0x20 [ 246.687278] ? mutex_lock_nested+0x1b/0x20 [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] [ 246.692611] process_one_work+0x231/0x620 [ 246.693725] worker_thread+0x214/0x3a0 [ 246.694756] kthread+0x12b/0x150 [ 246.695856] ? wq_pool_ids_show+0x140/0x140 [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.697998] ret_from_fork+0x3a/0x50 [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 [ 246.703293] Workqueue: pm pm_runtime_work [ 246.704393] Call Trace: [ 246.705403] __schedule+0x322/0xaf0 [ 246.706439] ? wait_for_completion+0x104/0x190 [ 246.707393] schedule+0x33/0x90 [ 246.708375] schedule_timeout+0x3a5/0x590 [ 246.709289] ? mark_held_locks+0x58/0x80 [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 [ 246.711222] ? wait_for_completion+0x104/0x190 [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 [ 246.713094] ? wait_for_completion+0x104/0x190 [ 246.713964] wait_for_completion+0x12c/0x190 [ 246.714895] ? wake_up_q+0x80/0x80 [ 246.715727] ? get_work_pool+0x90/0x90 [ 246.716649] flush_work+0x1c9/0x280 [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 246.718442] __cancel_work_timer+0x146/0x1d0 [ 246.719247] cancel_delayed_work_sync+0x13/0x20 [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.723737] __rpm_callback+0x7a/0x1d0 [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.725607] rpm_callback+0x24/0x80 [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.727376] rpm_suspend+0x142/0x6b0 [ 246.728185] pm_runtime_work+0x97/0xc0 [ 246.728938] process_one_work+0x231/0x620 [ 246.729796] worker_thread+0x44/0x3a0 [ 246.730614] kthread+0x12b/0x150 [ 246.731395] ? wq_pool_ids_show+0x140/0x140 [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.732878] ret_from_fork+0x3a/0x50 [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] [ 246.737665] Call Trace: [ 246.738490] __schedule+0x322/0xaf0 [ 246.739250] schedule+0x33/0x90 [ 246.739908] rpm_resume+0x19c/0x850 [ 246.740750] ? finish_wait+0x90/0x90 [ 246.741541] __pm_runtime_resume+0x4e/0x90 [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] [ 246.752314] process_one_work+0x231/0x620 [ 246.752979] worker_thread+0x44/0x3a0 [ 246.753838] kthread+0x12b/0x150 [ 246.754619] ? wq_pool_ids_show+0x140/0x140 [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.756162] ret_from_fork+0x3a/0x50 [ 246.756847] Showing all locks held in the system: [ 246.758261] 3 locks held by kworker/4:0/37: [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.761516] 2 locks held by kworker/0:1/60: [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.763890] 1 lock held by khungtaskd/64: [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 246.765588] 5 locks held by kworker/4:2/422: [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] [ 246.770839] 1 lock held by dmesg/1038: [ 246.771739] 2 locks held by zsh/1172: [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 246.775522] ============================================= Because of this, there's an unreasonable number of places that drm drivers would need to insert special handling to prevent trying to resume the device from all of these contexts that can deadlock. It's difficult even to try synchronizing with fb_helper in these contexts as well, since any of them could introduce a deadlock by waiting to acquire the top-level fb_helper mutex, while it's being held by another thread that might potentially call down to pm_runtime_get_sync(). Luckily-there's no actual reason we need to allow fb_helper to handle hotplugging at all when runtime suspending a device. If a hotplug happens during a runtime suspend operation, there's no reason the driver can't just re-enable fbcon's hotplug handling and bring it up to speed with hotplugging events it may have missed by calling drm_fb_helper_hotplug_event(). So, let's make this easy and just add helpers to handle disabling and enabling fb_helper connector probing() without having to potentially wait on fb_helper to finish it's work. This will let us fix the runtime suspend/resume deadlocks that we've been experiencing with nouveau, along with being able to fix some of the incorrect runtime PM core interaction that other DRM drivers currently perform to work around these issues. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Cc: Karol Herbst <karolherbst@gmail.com> --- drivers/gpu/drm/drm_fb_helper.c | 65 +++++++++++++++++++++++++++++++++ include/drm/drm_fb_helper.h | 20 ++++++++++ 2 files changed, 85 insertions(+)