diff mbox series

ALSA: hda: Avoid unsol event during RPM suspending

Message ID 20220328091411.31488-1-mkumard@nvidia.com (mailing list archive)
State Superseded

Commit Message

Mohan Kumar March 28, 2022, 9:14 a.m. UTC
There is a corner case in unsol event handling while the codec is in the
runtime suspending state. When the codec runtime suspend call is
initiated, the codec->in_pm atomic variable is 0. The codec runtime
suspend function then calls snd_hdac_enter_pm(), which just increments
codec->in_pm. Consider an unsol event that arrives just after this step
and before snd_hdac_leave_pm() in the codec runtime suspend function.
The snd_hdac_power_up_pm() call in the unsol event flow, in
hdmi_present_sense_via_verbs(), would just increment codec->in_pm
without calling pm_runtime_get_sync().
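
The window described above can be modelled in user space. The following
is a minimal sketch, not the kernel code: model_codec, model_enter_pm(),
model_leave_pm(), model_power_up_pm() and the resume_calls counter are
hypothetical stand-ins, and the rule that the power-up helper requests a
runtime resume only when in_pm is 0 is assumed from the description above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of the codec PM bookkeeping. */
struct model_codec {
	atomic_int in_pm;	/* stands in for codec->in_pm */
	int resume_calls;	/* counts modelled pm_runtime_get_sync() calls */
};

/* Models snd_hdac_enter_pm(): only bumps the counter. */
static void model_enter_pm(struct model_codec *c)
{
	atomic_fetch_add(&c->in_pm, 1);
}

/* Models snd_hdac_leave_pm(): drops the counter again. */
static void model_leave_pm(struct model_codec *c)
{
	atomic_fetch_sub(&c->in_pm, 1);
}

/*
 * Models the power-up helper as described in the commit message:
 * if in_pm is already non-zero, increment it and skip the resume;
 * otherwise request a runtime resume.
 * Returns true when a resume was requested.
 */
static bool model_power_up_pm(struct model_codec *c)
{
	int old = atomic_load(&c->in_pm);

	while (old != 0) {
		/* increment-if-not-zero; 'old' is refreshed on failure */
		if (atomic_compare_exchange_weak(&c->in_pm, &old, old + 1))
			return false;	/* no resume requested */
	}
	c->resume_calls++;
	return true;
}
```

Calling model_enter_pm() and then model_power_up_pm() on the same object
leaves resume_calls unchanged, which corresponds to the unsol handler
going on to send verbs while the suspend callback is still running.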

The codec runtime suspend flow is already in progress, and the unsol
event handler accesses the codec verbs in parallel. Because neither
function waits for the other, the suspend flow can complete and switch
off the clocks before the unsol event handling has finished. This
results in the errors below:

[  589.428020] tegra-hda 3510000.hda: azx_get_response timeout, switching to polling mode: last cmd=0x505f2f57
[  589.428344] tegra-hda 3510000.hda: spurious response 0x80000074:0x5, last cmd=0x505f2f57
[  589.428547] tegra-hda 3510000.hda: spurious response 0x80000065:0x5, last cmd=0x505f2f57

To avoid this, the unsol event flow should not perform any codec verb
operations while the device is in the RPM_SUSPENDING state.

Signed-off-by: Mohan Kumar <mkumard@nvidia.com>
---
 sound/pci/hda/patch_hdmi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Takashi Iwai March 28, 2022, 9:42 a.m. UTC | #1
On Mon, 28 Mar 2022 11:14:11 +0200,
Mohan Kumar wrote:
> 
> There is a corner case with unsol event handling during codec runtime
> suspending state. When the codec runtime suspend call initiated, the
> codec->in_pm atomic variable would be 0, currently the codec runtime
> suspend function calls snd_hdac_enter_pm() which will just increments
> the codec->in_pm atomic variable. Consider unsol event happened just
> after this step and before snd_hdac_leave_pm() in the codec runtime
> suspend function. The snd_hdac_power_up_pm() in the unsol event
> flow in hdmi_present_sense_via_verbs() function would just increment
> the codec->in_pm atomic variable without calling pm_runtime_get_sync
> function.
> 
> As codec runtime suspend flow is already in progress and in parallel
> unsol event is also accessing the codec verbs, as soon as codec
> suspend flow completes and clocks are  switched off before completing
> the unsol event handling as both functions doesn't wait for each other.
> This will result in below errors
> 
> [  589.428020] tegra-hda 3510000.hda: azx_get_response timeout, switching
> to polling mode: last cmd=0x505f2f57
> [  589.428344] tegra-hda 3510000.hda: spurious response 0x80000074:0x5,
> last cmd=0x505f2f57
> [  589.428547] tegra-hda 3510000.hda: spurious response 0x80000065:0x5,
> last cmd=0x505f2f57
> 
> To avoid this, the unsol event flow should not perform any codec verb
> related operations during RPM_SUSPENDING state.
> 
> Signed-off-by: Mohan Kumar <mkumard@nvidia.com>

Thanks, that's a hairy problem...

The logic sounds good, but can we check the PM state before calling
snd_hda_power_up_pm()?


Takashi
Mohan Kumar March 28, 2022, 10:19 a.m. UTC | #2
On 3/28/2022 3:12 PM, Takashi Iwai wrote:
> Thanks, that's a hairy problem...
>
> The logic sounds good, but can we check the PM state before calling
> snd_hda_power_up_pm()?

If I am not wrong, the exposed PM APIs report only the RPM_ACTIVE or
RPM_SUSPENDED status. I don't see anything that provides info on
RPM_SUSPENDING. We might need to know this state exactly to fix this issue.

>
> Takashi
Takashi Iwai March 28, 2022, 10:57 a.m. UTC | #3
On Mon, 28 Mar 2022 12:19:03 +0200,
Mohan Kumar D wrote:
> 
> 
> On 3/28/2022 3:12 PM, Takashi Iwai wrote:
> > Thanks, that's a hairy problem...
> >
> > The logic sounds good, but can we check the PM state before calling
> > snd_hda_power_up_pm()?
> 
> If am not wrong, PM apis exposed either provide RPM_ACTIVE or
> RPM_SUSPENDED status. Don't see anything which provides info on
> RPM_SUSPENDING. We might need to exactly know this state to fix this
> issue.

Well, maybe my question wasn't clear.  What I meant was that your
change below

>  	ret = snd_hda_power_up_pm(codec);
> -	if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
> +	if ((ret < 0 && pm_runtime_suspended(dev)) ||
> +		(dev->power.runtime_status == RPM_SUSPENDING))
>  		goto out;

can be rather like:

> +	if (dev->power.runtime_status == RPM_SUSPENDING)
> +		return;
>  	ret = snd_hda_power_up_pm(codec);
>	if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))

so that it skips unneeded power up/down calls.

Basically the state is set in rpm_suspend() in
drivers/base/power/runtime.c, just before calling the device's
runtime_suspend callback.  So the state is supposed to be the same
before and after snd_hda_power_up_pm() in that case.
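
The effect of checking the state first can be illustrated with a small
user-space model. This is only a sketch under assumptions: the
model_rpm_status enum, the model_dev structure and its call counters are
hypothetical and merely mirror the names used in this thread, and the
power-up failure path of the real function is left out.

```c
#include <stdbool.h>

/* Hypothetical mirror of the runtime PM states discussed in the thread. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_RESUMING,
	MODEL_RPM_SUSPENDED,
	MODEL_RPM_SUSPENDING,
};

struct model_dev {
	enum model_rpm_status runtime_status;
	int power_up_calls;	/* modelled snd_hda_power_up_pm() calls */
	int power_down_calls;	/* modelled snd_hda_power_down_pm() calls */
	int verbs_sent;		/* modelled codec verb accesses */
};

/*
 * Models the suggested flow: bail out before touching the power
 * references when a runtime suspend is in flight.
 * Returns true if the pin sense verbs were issued.
 */
static bool model_present_sense(struct model_dev *dev)
{
	if (dev->runtime_status == MODEL_RPM_SUSPENDING)
		return false;	/* no power up/down pair, no verbs */

	dev->power_up_calls++;
	dev->verbs_sent++;
	dev->power_down_calls++;
	return true;
}
```

With runtime_status set to MODEL_RPM_SUSPENDING, all three counters stay
at zero, which is the "skips unneeded power up/down calls" behaviour.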


thanks,

Takashi
kernel test robot March 28, 2022, 1:07 p.m. UTC | #4
Hi Mohan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tiwai-sound/for-next]
[also build test ERROR on v5.17 next-20220328]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Mohan-Kumar/ALSA-hda-Avoid-unsol-event-during-RPM-suspending/20220328-171517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-next
config: alpha-buildonly-randconfig-r001-20220327 (https://download.01.org/0day-ci/archive/20220328/202203282137.YsDmfIAj-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/80c4e21f5e97cd4b779806fa5da5bb7392e2874f
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Mohan-Kumar/ALSA-hda-Avoid-unsol-event-during-RPM-suspending/20220328-171517
        git checkout 80c4e21f5e97cd4b779806fa5da5bb7392e2874f
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=alpha SHELL=/bin/bash sound/pci/hda/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   sound/pci/hda/patch_hdmi.c: In function 'hdmi_present_sense_via_verbs':
>> sound/pci/hda/patch_hdmi.c:1644:28: error: 'struct dev_pm_info' has no member named 'runtime_status'
    1644 |                 (dev->power.runtime_status == RPM_SUSPENDING))
         |                            ^


vim +1644 sound/pci/hda/patch_hdmi.c

  1620	
  1621	/* update ELD and jack state via HD-audio verbs */
  1622	static void hdmi_present_sense_via_verbs(struct hdmi_spec_per_pin *per_pin,
  1623						 int repoll)
  1624	{
  1625		struct hda_codec *codec = per_pin->codec;
  1626		struct hdmi_spec *spec = codec->spec;
  1627		struct hdmi_eld *eld = &spec->temp_eld;
  1628		struct device *dev = hda_codec_dev(codec);
  1629		hda_nid_t pin_nid = per_pin->pin_nid;
  1630		int dev_id = per_pin->dev_id;
  1631		/*
  1632		 * Always execute a GetPinSense verb here, even when called from
  1633		 * hdmi_intrinsic_event; for some NVIDIA HW, the unsolicited
  1634		 * response's PD bit is not the real PD value, but indicates that
  1635		 * the real PD value changed. An older version of the HD-audio
  1636		 * specification worked this way. Hence, we just ignore the data in
  1637		 * the unsolicited response to avoid custom WARs.
  1638		 */
  1639		int present;
  1640		int ret;
  1641	
  1642		ret = snd_hda_power_up_pm(codec);
  1643		if ((ret < 0 && pm_runtime_suspended(dev)) ||
> 1644			(dev->power.runtime_status == RPM_SUSPENDING))
  1645			goto out;
  1646	
  1647		present = snd_hda_jack_pin_sense(codec, pin_nid, dev_id);
  1648	
  1649		mutex_lock(&per_pin->lock);
  1650		eld->monitor_present = !!(present & AC_PINSENSE_PRESENCE);
  1651		if (eld->monitor_present)
  1652			eld->eld_valid  = !!(present & AC_PINSENSE_ELDV);
  1653		else
  1654			eld->eld_valid = false;
  1655	
  1656		codec_dbg(codec,
  1657			"HDMI status: Codec=%d NID=0x%x Presence_Detect=%d ELD_Valid=%d\n",
  1658			codec->addr, pin_nid, eld->monitor_present, eld->eld_valid);
  1659	
  1660		if (eld->eld_valid) {
  1661			if (spec->ops.pin_get_eld(codec, pin_nid, dev_id,
  1662						  eld->eld_buffer, &eld->eld_size) < 0)
  1663				eld->eld_valid = false;
  1664		}
  1665	
  1666		update_eld(codec, per_pin, eld, repoll);
  1667		mutex_unlock(&per_pin->lock);
  1668	 out:
  1669		snd_hda_power_down_pm(codec);
  1670	}
  1671
Mohan Kumar March 28, 2022, 1:51 p.m. UTC | #5
On 3/28/2022 4:27 PM, Takashi Iwai wrote:
> On Mon, 28 Mar 2022 12:19:03 +0200,
> Mohan Kumar D wrote:
>>
>> On 3/28/2022 3:12 PM, Takashi Iwai wrote:
>>> Thanks, that's a hairy problem...
>>>
>>> The logic sounds good, but can we check the PM state before calling
>>> snd_hda_power_up_pm()?
>> If am not wrong, PM apis exposed either provide RPM_ACTIVE or
>> RPM_SUSPENDED status. Don't see anything which provides info on
>> RPM_SUSPENDING. We might need to exactly know this state to fix this
>> issue.
> Well, maybe my question wasn't clear.  What I meant was that your
> change below
>
>>        ret = snd_hda_power_up_pm(codec);
>> -     if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
>> +     if ((ret < 0 && pm_runtime_suspended(dev)) ||
>> +             (dev->power.runtime_status == RPM_SUSPENDING))
>>                goto out;
> can be rather like:
>
>> +     if (dev->power.runtime_status == RPM_SUSPENDING)
>> +             return;
>>        ret = snd_hda_power_up_pm(codec);
>>        if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
> so that it skips unneeded power up/down calls.
>
> Basically the state is set at drivers/base/power/runtime.c
> rpm_suspend() just before calling the device's runtime_suspend
> callback.  So the state is supposed to be same before and after
> snd_hda_power_up_pm() in that case.
Thanks! Makes sense. I will push the updated patch after testing with
the latest suggestion.
>
> thanks,
>
> Takashi
kernel test robot March 28, 2022, 4:09 p.m. UTC | #6
Hi Mohan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tiwai-sound/for-next]
[also build test ERROR on v5.17 next-20220328]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Mohan-Kumar/ALSA-hda-Avoid-unsol-event-during-RPM-suspending/20220328-171517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-next
config: s390-randconfig-c005-20220327 (https://download.01.org/0day-ci/archive/20220329/202203290053.emhsIdTK-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0f6d9501cf49ce02937099350d08f20c4af86f3d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/80c4e21f5e97cd4b779806fa5da5bb7392e2874f
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Mohan-Kumar/ALSA-hda-Avoid-unsol-event-during-RPM-suspending/20220328-171517
        git checkout 80c4e21f5e97cd4b779806fa5da5bb7392e2874f
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash sound/pci/hda/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from sound/pci/hda/patch_hdmi.c:21:
   In file included from include/linux/pci.h:39:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from sound/pci/hda/patch_hdmi.c:21:
   In file included from include/linux/pci.h:39:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from sound/pci/hda/patch_hdmi.c:21:
   In file included from include/linux/pci.h:39:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> sound/pci/hda/patch_hdmi.c:1644:15: error: no member named 'runtime_status' in 'struct dev_pm_info'
                   (dev->power.runtime_status == RPM_SUSPENDING))
                    ~~~~~~~~~~ ^
   12 warnings and 1 error generated.


vim +1644 sound/pci/hda/patch_hdmi.c

  1620	
  1621	/* update ELD and jack state via HD-audio verbs */
  1622	static void hdmi_present_sense_via_verbs(struct hdmi_spec_per_pin *per_pin,
  1623						 int repoll)
  1624	{
  1625		struct hda_codec *codec = per_pin->codec;
  1626		struct hdmi_spec *spec = codec->spec;
  1627		struct hdmi_eld *eld = &spec->temp_eld;
  1628		struct device *dev = hda_codec_dev(codec);
  1629		hda_nid_t pin_nid = per_pin->pin_nid;
  1630		int dev_id = per_pin->dev_id;
  1631		/*
  1632		 * Always execute a GetPinSense verb here, even when called from
  1633		 * hdmi_intrinsic_event; for some NVIDIA HW, the unsolicited
  1634		 * response's PD bit is not the real PD value, but indicates that
  1635		 * the real PD value changed. An older version of the HD-audio
  1636		 * specification worked this way. Hence, we just ignore the data in
  1637		 * the unsolicited response to avoid custom WARs.
  1638		 */
  1639		int present;
  1640		int ret;
  1641	
  1642		ret = snd_hda_power_up_pm(codec);
  1643		if ((ret < 0 && pm_runtime_suspended(dev)) ||
> 1644			(dev->power.runtime_status == RPM_SUSPENDING))
  1645			goto out;
  1646	
  1647		present = snd_hda_jack_pin_sense(codec, pin_nid, dev_id);
  1648	
  1649		mutex_lock(&per_pin->lock);
  1650		eld->monitor_present = !!(present & AC_PINSENSE_PRESENCE);
  1651		if (eld->monitor_present)
  1652			eld->eld_valid  = !!(present & AC_PINSENSE_ELDV);
  1653		else
  1654			eld->eld_valid = false;
  1655	
  1656		codec_dbg(codec,
  1657			"HDMI status: Codec=%d NID=0x%x Presence_Detect=%d ELD_Valid=%d\n",
  1658			codec->addr, pin_nid, eld->monitor_present, eld->eld_valid);
  1659	
  1660		if (eld->eld_valid) {
  1661			if (spec->ops.pin_get_eld(codec, pin_nid, dev_id,
  1662						  eld->eld_buffer, &eld->eld_size) < 0)
  1663				eld->eld_valid = false;
  1664		}
  1665	
  1666		update_eld(codec, per_pin, eld, repoll);
  1667		mutex_unlock(&per_pin->lock);
  1668	 out:
  1669		snd_hda_power_down_pm(codec);
  1670	}
  1671
Takashi Iwai March 28, 2022, 4:15 p.m. UTC | #7
On Mon, 28 Mar 2022 15:51:17 +0200,
Mohan Kumar D wrote:
> 
> 
> On 3/28/2022 4:27 PM, Takashi Iwai wrote:
> > On Mon, 28 Mar 2022 12:19:03 +0200,
> > Mohan Kumar D wrote:
> >>
> >> On 3/28/2022 3:12 PM, Takashi Iwai wrote:
> >>> Thanks, that's a hairy problem...
> >>>
> >>> The logic sounds good, but can we check the PM state before calling
> >>> snd_hda_power_up_pm()?
> >> If am not wrong, PM apis exposed either provide RPM_ACTIVE or
> >> RPM_SUSPENDED status. Don't see anything which provides info on
> >> RPM_SUSPENDING. We might need to exactly know this state to fix this
> >> issue.
> > Well, maybe my question wasn't clear.  What I meant was that your
> > change below
> >
> >>        ret = snd_hda_power_up_pm(codec);
> >> -     if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
> >> +     if ((ret < 0 && pm_runtime_suspended(dev)) ||
> >> +             (dev->power.runtime_status == RPM_SUSPENDING))
> >>                goto out;
> > can be rather like:
> >
> >> +     if (dev->power.runtime_status == RPM_SUSPENDING)
> >> +             return;
> >>        ret = snd_hda_power_up_pm(codec);
> >>        if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
> > so that it skips unneeded power up/down calls.
> >
> > Basically the state is set at drivers/base/power/runtime.c
> > rpm_suspend() just before calling the device's runtime_suspend
> > callback.  So the state is supposed to be same before and after
> > snd_hda_power_up_pm() in that case.
> Thanks!, Make sense, will push the updated patch after testing with
> latest suggestion.

Thanks.  Also, don't forget to cover the case the test bot complained
about: the reference to power.runtime_status needs #ifdef CONFIG_PM.
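
One way to keep such a reference buildable in both configurations is to
wrap it in a small helper. The following is a sketch only:
model_pm_info, model_is_suspending() and MODEL_RPM_SUSPENDING are
hypothetical names, and the layout merely imitates a structure whose
runtime_status member exists only when CONFIG_PM is defined.

```c
#include <stdbool.h>

/* Build with -DCONFIG_PM to model a configuration with runtime PM. */

enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_SUSPENDING,
};

/* Imitates a struct whose PM member is compiled out without CONFIG_PM. */
struct model_pm_info {
	int usage_count;
#ifdef CONFIG_PM
	enum model_rpm_status runtime_status;
#endif
};

/* Every access to runtime_status stays behind the same guard. */
static bool model_is_suspending(const struct model_pm_info *pm)
{
#ifdef CONFIG_PM
	return pm->runtime_status == MODEL_RPM_SUSPENDING;
#else
	(void)pm;	/* member does not exist; report "not suspending" */
	return false;
#endif
}
```

Without -DCONFIG_PM the helper compiles to a constant false, so callers
need no guard of their own.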


Takashi
Mohan Kumar March 28, 2022, 5:03 p.m. UTC | #8
On 3/28/2022 9:45 PM, Takashi Iwai wrote:
> On Mon, 28 Mar 2022 15:51:17 +0200,
> Mohan Kumar D wrote:
>>
>> On 3/28/2022 4:27 PM, Takashi Iwai wrote:
>>> On Mon, 28 Mar 2022 12:19:03 +0200,
>>> Mohan Kumar D wrote:
>>>> On 3/28/2022 3:12 PM, Takashi Iwai wrote:
>>>>> Thanks, that's a hairy problem...
>>>>>
>>>>> The logic sounds good, but can we check the PM state before calling
>>>>> snd_hda_power_up_pm()?
>>>> If am not wrong, PM apis exposed either provide RPM_ACTIVE or
>>>> RPM_SUSPENDED status. Don't see anything which provides info on
>>>> RPM_SUSPENDING. We might need to exactly know this state to fix this
>>>> issue.
>>> Well, maybe my question wasn't clear.  What I meant was that your
>>> change below
>>>
>>>>         ret = snd_hda_power_up_pm(codec);
>>>> -     if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
>>>> +     if ((ret < 0 && pm_runtime_suspended(dev)) ||
>>>> +             (dev->power.runtime_status == RPM_SUSPENDING))
>>>>                 goto out;
>>> can be rather like:
>>>
>>>> +     if (dev->power.runtime_status == RPM_SUSPENDING)
>>>> +             return;
>>>>         ret = snd_hda_power_up_pm(codec);
>>>>         if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
>>> so that it skips unneeded power up/down calls.
>>>
>>> Basically the state is set at drivers/base/power/runtime.c
>>> rpm_suspend() just before calling the device's runtime_suspend
>>> callback.  So the state is supposed to be same before and after
>>> snd_hda_power_up_pm() in that case.
>> Thanks!, Make sense, will push the updated patch after testing with
>> latest suggestion.
> Thanks.  Also don't forget to cover a case the test bot complained:
> the reference to power.runtime_status needs #ifdef CONFIG_PM.
Sure, I will take care of that in the next patch update.
>
> Takashi

Patch

diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index c85ed7bc121e..67870c8d84a5 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -1625,6 +1625,7 @@  static void hdmi_present_sense_via_verbs(struct hdmi_spec_per_pin *per_pin,
 	struct hda_codec *codec = per_pin->codec;
 	struct hdmi_spec *spec = codec->spec;
 	struct hdmi_eld *eld = &spec->temp_eld;
+	struct device *dev = hda_codec_dev(codec);
 	hda_nid_t pin_nid = per_pin->pin_nid;
 	int dev_id = per_pin->dev_id;
 	/*
@@ -1639,7 +1640,8 @@  static void hdmi_present_sense_via_verbs(struct hdmi_spec_per_pin *per_pin,
 	int ret;
 
 	ret = snd_hda_power_up_pm(codec);
-	if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec)))
+	if ((ret < 0 && pm_runtime_suspended(dev)) ||
+		(dev->power.runtime_status == RPM_SUSPENDING))
 		goto out;
 
 	present = snd_hda_jack_pin_sense(codec, pin_nid, dev_id);