mbox series

[RFC,0/4] Add support for detection

Message ID 20241016130228.1013227-1-amadeuszx.slawinski@linux.intel.com (mailing list archive)
Headers show
Series Add support for detection | expand

Message

Amadeusz Sławiński Oct. 16, 2024, 1:02 p.m. UTC
There are some scenarios when using DSP where one may want to have
partially active stream and fully enable it after some event occurs.

Following patchset adds new "detect" state to ALSA state machine to
allow waiting for condition to occur before fully starting a stream. In
further patches the state is propagated through ASoC components to allow
them to handling the state as necessary.

Main goal of this patchset is to allow handling scenarios like keyphrase
detection - where DSP analyses incoming signal and wakes userspace to
consume stream only when keyphrase is detected.

I'm sending this as RFC so we can discuss if this is the way to go or if
there is perhaps another preferred way of adding such interface.
Userspace part of implementation is available at
https://github.com/amadeuszslawinski-intel/alsa-lib/tree/rfc_detect

Amadeusz Sławiński (4):
  ALSA: core: Add support for running detect on capture stream
  ALSA: core: Allow polling for detection
  ASoC: pcm: Add support for running detect on capture stream
  ASoC: Propagate DETECT trigger

 include/sound/pcm.h         |  6 +++
 include/sound/soc-dpcm.h    |  2 +
 include/uapi/sound/asound.h |  4 +-
 sound/core/pcm_native.c     | 89 ++++++++++++++++++++++++++++++++++++-
 sound/soc/soc-component.c   |  2 +
 sound/soc/soc-dai.c         |  2 +
 sound/soc/soc-link.c        |  2 +
 sound/soc/soc-pcm.c         | 56 +++++++++++++++++++++++
 8 files changed, 160 insertions(+), 3 deletions(-)

Comments

Takashi Iwai Oct. 16, 2024, 1:11 p.m. UTC | #1
On Wed, 16 Oct 2024 15:02:24 +0200,
Amadeusz Sławiński wrote:
> 
> There are some scenarios when using DSP where one may want to have
> partially active stream and fully enable it after some event occurs.
> 
> Following patchset adds new "detect" state to ALSA state machine to
> allow waiting for condition to occur before fully starting a stream. In
> further patches the state is propagated through ASoC components to allow
> them to handling the state as necessary.
> 
> Main goal of this patchset is to allow handling scenarios like keyphrase
> detection - where DSP analyses incoming signal and wakes userspace to
> consume stream only when keyphrase is detected.
> 
> I'm sending this as RFC so we can discuss if this is the way to go or if
> there is perhaps another preferred way of adding such interface.
> Userspace part of implementation is available at
> https://github.com/amadeuszslawinski-intel/alsa-lib/tree/rfc_detect
> 
> Amadeusz Sławiński (4):
>   ALSA: core: Add support for running detect on capture stream
>   ALSA: core: Allow polling for detection
>   ASoC: pcm: Add support for running detect on capture stream
>   ASoC: Propagate DETECT trigger

Generally speaking, the addition of a new PCM state should be avoided.
It'll influence too badly on all user-space programs.  e.g. if an old
user-space program receives such a new state, what should it do?
How can it know it's a fatal error or it can be ignored / skipped?

And, if it's about the synchronization of the DSP readiness, can't it
be rather synced in each PCM open or prepare instead?


Takashi
Amadeusz Sławiński Oct. 16, 2024, 1:29 p.m. UTC | #2
On 10/16/2024 3:11 PM, Takashi Iwai wrote:
> On Wed, 16 Oct 2024 15:02:24 +0200,
> Amadeusz Sławiński wrote:
>>
>> There are some scenarios when using DSP where one may want to have
>> partially active stream and fully enable it after some event occurs.
>>
>> Following patchset adds new "detect" state to ALSA state machine to
>> allow waiting for condition to occur before fully starting a stream. In
>> further patches the state is propagated through ASoC components to allow
>> them to handling the state as necessary.
>>
>> Main goal of this patchset is to allow handling scenarios like keyphrase
>> detection - where DSP analyses incoming signal and wakes userspace to
>> consume stream only when keyphrase is detected.
>>
>> I'm sending this as RFC so we can discuss if this is the way to go or if
>> there is perhaps another preferred way of adding such interface.
>> Userspace part of implementation is available at
>> https://github.com/amadeuszslawinski-intel/alsa-lib/tree/rfc_detect
>>
>> Amadeusz Sławiński (4):
>>    ALSA: core: Add support for running detect on capture stream
>>    ALSA: core: Allow polling for detection
>>    ASoC: pcm: Add support for running detect on capture stream
>>    ASoC: Propagate DETECT trigger
> 
> Generally speaking, the addition of a new PCM state should be avoided.
> It'll influence too badly on all user-space programs.  e.g. if an old
> user-space program receives such a new state, what should it do?
> How can it know it's a fatal error or it can be ignored / skipped?

In this case it should not get into new state unless specifically 
requested from userspace, unless I'm missing something?

Goal is to allow something along the lines of following in arecord or 
similar:

         ret = snd_pcm_detect(handle); // here only parts of DSP FW 
needed for detection are running
         c = snd_pcm_poll_descriptors_count(handle);

         pfds = malloc(sizeof(struct pollfd) * c);

         snd_pcm_poll_descriptors(handle, pfds, c); // polling for 
detect event to happen

         while (!detected) {
                 ret = poll(pfds, c, -1);
                 snd_pcm_poll_descriptors_revents(handle, pfds, c, 
&revents);

                 if (revents == POLLERR) {
                         error(_("poll, revents == |POLLERR"));
                 }
                 if (revents == POLLIN) {
                         error(_("poll, revents == |POLLIN"));
                         detected = 1;
                 }
         }
         ret = snd_pcm_start(handle); // starts whatever else is needed 
for PCM to work

> 
> And, if it's about the synchronization of the DSP readiness, can't it
> be rather synced in each PCM open or prepare instead?
> 

It's too early. We need to do it after hw_params as it needs to create 
all paths needed for full stream. During detection it just activates 
ones needed for detection and only after receiving detection event we 
want to start ones needed for draining.
Jaroslav Kysela Oct. 16, 2024, 1:41 p.m. UTC | #3
On 16. 10. 24 15:02, Amadeusz Sławiński wrote:
> There are some scenarios when using DSP where one may want to have
> partially active stream and fully enable it after some event occurs.
> 
> Following patchset adds new "detect" state to ALSA state machine to
> allow waiting for condition to occur before fully starting a stream. In
> further patches the state is propagated through ASoC components to allow
> them to handling the state as necessary.
> 
> Main goal of this patchset is to allow handling scenarios like keyphrase
> detection - where DSP analyses incoming signal and wakes userspace to
> consume stream only when keyphrase is detected.
> 
> I'm sending this as RFC so we can discuss if this is the way to go or if
> there is perhaps another preferred way of adding such interface.
> Userspace part of implementation is available at
> https://github.com/amadeuszslawinski-intel/alsa-lib/tree/rfc_detect

It looks like a complex extension with a low benefit. We have already layer 
which can signalize stream readiness - an element in the control API. PCM 
driver may just return silence samples when the condition is not met for 
security reasons.

				Jaroslav
Takashi Iwai Oct. 16, 2024, 1:47 p.m. UTC | #4
On Wed, 16 Oct 2024 15:29:35 +0200,
Amadeusz Sławiński wrote:
> 
> On 10/16/2024 3:11 PM, Takashi Iwai wrote:
> > On Wed, 16 Oct 2024 15:02:24 +0200,
> > Amadeusz Sławiński wrote:
> >> 
> >> There are some scenarios when using DSP where one may want to have
> >> partially active stream and fully enable it after some event occurs.
> >> 
> >> Following patchset adds new "detect" state to ALSA state machine to
> >> allow waiting for condition to occur before fully starting a stream. In
> >> further patches the state is propagated through ASoC components to allow
> >> them to handling the state as necessary.
> >> 
> >> Main goal of this patchset is to allow handling scenarios like keyphrase
> >> detection - where DSP analyses incoming signal and wakes userspace to
> >> consume stream only when keyphrase is detected.
> >> 
> >> I'm sending this as RFC so we can discuss if this is the way to go or if
> >> there is perhaps another preferred way of adding such interface.
> >> Userspace part of implementation is available at
> >> https://github.com/amadeuszslawinski-intel/alsa-lib/tree/rfc_detect
> >> 
> >> Amadeusz Sławiński (4):
> >>    ALSA: core: Add support for running detect on capture stream
> >>    ALSA: core: Allow polling for detection
> >>    ASoC: pcm: Add support for running detect on capture stream
> >>    ASoC: Propagate DETECT trigger
> > 
> > Generally speaking, the addition of a new PCM state should be avoided.
> > It'll influence too badly on all user-space programs.  e.g. if an old
> > user-space program receives such a new state, what should it do?
> > How can it know it's a fatal error or it can be ignored / skipped?
> 
> In this case it should not get into new state unless specifically
> requested from userspace, unless I'm missing something?

Hmm, if the state is exclusive only for the requested program and
influences only on that program itself, why does it have to be a
global PCM state that essentially every program code has to deal with?

> Goal is to allow something along the lines of following in arecord or
> similar:
> 
>         ret = snd_pcm_detect(handle); // here only parts of DSP FW
> needed for detection are running
>         c = snd_pcm_poll_descriptors_count(handle);
> 
>         pfds = malloc(sizeof(struct pollfd) * c);
> 
>         snd_pcm_poll_descriptors(handle, pfds, c); // polling for
> detect event to happen
> 
>         while (!detected) {
>                 ret = poll(pfds, c, -1);
>                 snd_pcm_poll_descriptors_revents(handle, pfds, c,
> &revents);
> 
>                 if (revents == POLLERR) {
>                         error(_("poll, revents == |POLLERR"));
>                 }
>                 if (revents == POLLIN) {
>                         error(_("poll, revents == |POLLIN"));
>                         detected = 1;
>                 }
>         }

It's too complex if it's needed for each program.
If any, it'd be easier to implement an ioctl() for triggering the
detect and the sync...

>         ret = snd_pcm_start(handle); // starts whatever else is needed
> for PCM to work
> 
> > 
> > And, if it's about the synchronization of the DSP readiness, can't it
> > be rather synced in each PCM open or prepare instead?
> > 
> 
> It's too early. We need to do it after hw_params as it needs to create
> all paths needed for full stream.

It can be done in hw_params, too.  I don't mind where to put, but my
point is that the sync can be implemented internally without changing
the external API to user-space.

> During detection it just activates
> ones needed for detection and only after receiving detection event we
> want to start ones needed for draining.

So if the driver needs the sync, it can be done in hw_params or
prepare, too.  Something like stop_sync stuff we introduced.
In the case of stop_sync, hw_params waits for the pending task by the
previous stop trigger.  In this case, the sync can be performed after
hw_params (if requested) instead.


thanks,

Takashi
Amadeusz Sławiński Oct. 18, 2024, 2:16 p.m. UTC | #5
On 10/16/2024 3:47 PM, Takashi Iwai wrote:
 > On Wed, 16 Oct 2024 15:29:35 +0200,
 > Amadeusz Sławiński wrote:
 >>
 >> On 10/16/2024 3:11 PM, Takashi Iwai wrote:
 >>> On Wed, 16 Oct 2024 15:02:24 +0200,
 >>> Amadeusz Sławiński wrote:
 >>>>
 >>>> There are some scenarios when using DSP where one may want to have
 >>>> partially active stream and fully enable it after some event occurs.
 >>>>
 >>>> Following patchset adds new "detect" state to ALSA state machine to
 >>>> allow waiting for condition to occur before fully starting a 
stream. In
 >>>> further patches the state is propagated through ASoC components to 
allow
 >>>> them to handling the state as necessary.
 >>>>
 >>>> Main goal of this patchset is to allow handling scenarios like 
keyphrase
 >>>> detection - where DSP analyses incoming signal and wakes userspace to
 >>>> consume stream only when keyphrase is detected.
 >>>>
 >>>> I'm sending this as RFC so we can discuss if this is the way to go 
or if
 >>>> there is perhaps another preferred way of adding such interface.
 >>>> Userspace part of implementation is available at
 >>>> https://github.com/amadeuszslawinski-intel/alsa-lib/tree/rfc_detect
 >>>>
 >>>> Amadeusz Sławiński (4):
 >>>>     ALSA: core: Add support for running detect on capture stream
 >>>>     ALSA: core: Allow polling for detection
 >>>>     ASoC: pcm: Add support for running detect on capture stream
 >>>>     ASoC: Propagate DETECT trigger
 >>>
 >>> Generally speaking, the addition of a new PCM state should be avoided.
 >>> It'll influence too badly on all user-space programs.  e.g. if an old
 >>> user-space program receives such a new state, what should it do?
 >>> How can it know it's a fatal error or it can be ignored / skipped?
 >>
 >> In this case it should not get into new state unless specifically
 >> requested from userspace, unless I'm missing something?
 >
 > Hmm, if the state is exclusive only for the requested program and
 > influences only on that program itself, why does it have to be a
 > global PCM state that essentially every program code has to deal with?

It is stream state, so we can have stream in detect state and wait for 
some event on DSP, it can be for example like detecting someone saying 
something above background noise threshold and then we want to process 
it in userspace, only when it happens and when there is no more 
interesting data we want to be able to return back into detection state.

 >> Goal is to allow something along the lines of following in arecord or
 >> similar:
 >>
 >>          ret = snd_pcm_detect(handle); // here only parts of DSP FW
 >> needed for detection are running
 >>          c = snd_pcm_poll_descriptors_count(handle);
 >>
 >>          pfds = malloc(sizeof(struct pollfd) * c);
 >>
 >>          snd_pcm_poll_descriptors(handle, pfds, c); // polling for
 >> detect event to happen
 >>
 >>          while (!detected) {
 >>                  ret = poll(pfds, c, -1);
 >>                  snd_pcm_poll_descriptors_revents(handle, pfds, c,
 >> &revents);
 >>
 >>                  if (revents == POLLERR) {
 >>                          error(_("poll, revents == |POLLERR"));
 >>                  }
 >>                  if (revents == POLLIN) {
 >>                          error(_("poll, revents == |POLLIN"));
 >>                          detected = 1;
 >>                  }
 >>          }
 >
 > It's too complex if it's needed for each program.
 > If any, it'd be easier to implement an ioctl() for triggering the
 > detect and the sync...
 >

Well we can implement custom IOCTL in HWDEP, but I'd rather prefer to 
have this functionality within machine state, so there is no confusion 
between state machine and custom IOCTL.

 >>          ret = snd_pcm_start(handle); // starts whatever else is needed
 >> for PCM to work
 >>
 >>>
 >>> And, if it's about the synchronization of the DSP readiness, can't it
 >>> be rather synced in each PCM open or prepare instead?
 >>>
 >>
 >> It's too early. We need to do it after hw_params as it needs to create
 >> all paths needed for full stream.
 >
 > It can be done in hw_params, too.  I don't mind where to put, but my
 > point is that the sync can be implemented internally without changing
 > the external API to user-space.
 >

hw_params is too early, as not all paths may be ready, additionally we 
need to allow for sending additional configuration from userspace in 
some use cases.

 >> During detection it just activates
 >> ones needed for detection and only after receiving detection event we
 >> want to start ones needed for draining.
 >
 > So if the driver needs the sync, it can be done in hw_params or
 > prepare, too.  Something like stop_sync stuff we introduced.
 > In the case of stop_sync, hw_params waits for the pending task by the
 > previous stop trigger.  In this case, the sync can be performed after
 > hw_params (if requested) instead.

I'm not sure where sync comes in here, maybe we are talking about two 
different use cases?

Our use case is to:
1. Program DSP with all pipelines it needs to implement whole scenario 
(hw_params)
2. Perform additional configuration from usespace
3. Start detection pipelines
4. Wait for event to happen (it can take however long necessary, for 
example hours)
5. When event happens start drain pipelines and process data in 
userspace likes in standard capture scenario
6. When no more data is needed, either stop drain pipelines and go back 
to 3 or just close stream
Jaroslav Kysela Oct. 18, 2024, 5:15 p.m. UTC | #6
On 18. 10. 24 16:16, Amadeusz Sławiński wrote:

> Our use case is to:

Inlined my proposal:

> 1. Program DSP with all pipelines it needs to implement whole scenario
> (hw_params)

SNDRV_PCM_IOCTL_HW_PARAMS

> 2. Perform additional configuration from usespace

Can be done using control API (/dev/snd/controlC*). E.g. We have already:

numid=19,iface=PCM,name='ELD',device=3
numid=32,iface=PCM,name='Playback Channel Map',device=3

Those controls can be R/W as you like.

> 3. Start detection pipelines

SNDRV_PCM_IOCTL_PREPARE

> 4. Wait for event to happen (it can take however long necessary, for
> example hours)

Can be done using control API (/dev/snd/controlC*). There is a mechanism for 
event handling already.

Something like (boolean type):

iface=PCM,device=2,name="Event X"   # first event type
iface=PCM,device=2,name="Event Y"   # second event type ...

The PCM stream may be active only when at least one event from the above set 
is active.

> 5. When event happens start drain pipelines and process data in
> userspace likes in standard capture scenario

SNDRV_PCM_IOCTL_START

The driver may set the correct timestamp when the capturing was started. The 
sample buffers are already allocated and ready for transfers, so we can assume 
that the time window between the event notification (event) and START ioctl is 
small, so no samples should be lost.

> 6. When no more data is needed, either stop drain pipelines and go back
> to 3 or just close stream

SNDRV_PCM_IOCTL_STOP / DRAIN, SNDRV_PCM_IOCTL_HW_PARAMS or close(fd)

Basically, I think that this extension can be implemented with minimal changes 
to the PCM API - it's just about to slightly extend existing APIs.

				Jaroslav
Amadeusz Sławiński Oct. 22, 2024, 3:11 p.m. UTC | #7
On 10/18/2024 7:15 PM, Jaroslav Kysela wrote:
> On 18. 10. 24 16:16, Amadeusz Sławiński wrote:
> 
>> Our use case is to:
> 
> Inlined my proposal:
> 
>> 1. Program DSP with all pipelines it needs to implement whole scenario
>> (hw_params)
> 
> SNDRV_PCM_IOCTL_HW_PARAMS
> 
>> 2. Perform additional configuration from usespace
> 
> Can be done using control API (/dev/snd/controlC*). E.g. We have already:
> 
> numid=19,iface=PCM,name='ELD',device=3
> numid=32,iface=PCM,name='Playback Channel Map',device=3
> 
> Those controls can be R/W as you like.
> 
>> 3. Start detection pipelines
> 
> SNDRV_PCM_IOCTL_PREPARE
> 

Will check, but from what I recall streams should not be started in 
prepare due to it being used for xrun recovery, but maybe it doesn't 
matter for detection ones.

>> 4. Wait for event to happen (it can take however long necessary, for
>> example hours)
> 
> Can be done using control API (/dev/snd/controlC*). There is a mechanism 
> for event handling already.
> 
> Something like (boolean type):
> 
> iface=PCM,device=2,name="Event X"   # first event type
> iface=PCM,device=2,name="Event Y"   # second event type ...
> 
> The PCM stream may be active only when at least one event from the above 
> set is active.
> 
>> 5. When event happens start drain pipelines and process data in
>> userspace likes in standard capture scenario
> 
> SNDRV_PCM_IOCTL_START
> 
> The driver may set the correct timestamp when the capturing was started. 
> The sample buffers are already allocated and ready for transfers, so we 
> can assume that the time window between the event notification (event) 
> and START ioctl is small, so no samples should be lost.
> 
>> 6. When no more data is needed, either stop drain pipelines and go back
>> to 3 or just close stream
> 
> SNDRV_PCM_IOCTL_STOP / DRAIN, SNDRV_PCM_IOCTL_HW_PARAMS or close(fd)
> 
> Basically, I think that this extension can be implemented with minimal 
> changes to the PCM API - it's just about to slightly extend existing APIs.
> 
>                  Jaroslav
>