Message ID | 20240527071133.223066-1-perex@perex.cz (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RFC] ALSA: compress_offload: introduce passthrough operation mode | expand |
On 5/27/2024 9:11 AM, Jaroslav Kysela wrote: > There is a requirement to expose the audio hardware that accelerates various > tasks for user space such as sample rate converters, compressed > stream decoders, etc. > > This is description for the API extension for the compress ALSA API which > is able to handle "tasks" that are not bound to real-time operations > and allows for the serialization of operations. > > For details, refer to "compress-passthrough.rst" document. > > Note: This code is RFC (not tested, just to clearify the API requirements). > My goal is to add a test (loopback) driver and add a support to tinycompress > library in the next step. > > Cc: Mark Brown <broonie@kernel.org> > Cc: Shengjiu Wang <shengjiu.wang@gmail.com> > Cc: Nicolas Dufresne <nicolas@ndufresne.ca> > Cc: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> > Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > Cc: Vinod Koul <vkoul@kernel.org> > Signed-off-by: Jaroslav Kysela <perex@perex.cz> > --- Seems mostly ok to me, some nitpicks below. > .../sound/designs/compress-passthrough.rst | 125 +++++++ > include/sound/compress_driver.h | 32 ++ > include/uapi/sound/compress_offload.h | 44 ++- > sound/core/Kconfig | 4 + > sound/core/compress_offload.c | 314 +++++++++++++++++- > 5 files changed, 511 insertions(+), 8 deletions(-) > create mode 100644 Documentation/sound/designs/compress-passthrough.rst > > diff --git a/Documentation/sound/designs/compress-passthrough.rst b/Documentation/sound/designs/compress-passthrough.rst > new file mode 100644 > index 000000000000..bb5a0ae5c496 > --- /dev/null > +++ b/Documentation/sound/designs/compress-passthrough.rst > @@ -0,0 +1,125 @@ > +================================= > +ALSA Co-processor Passthrough API > +================================= > + > +Jaroslav Kysela <perex@perex.cz> > + > + > +Overview > +======== > + > +There is a requirement to expose the audio hardware that accelerates various > +tasks for user space such as sample rate converters, compressed > +stream decoders, etc. > + > +This is description for the API extension for the compress ALSA API which > +is able to handle "tasks" that are not bound to real-time operations > +and allows for the serialization of operations. > + > +Requirements > +============ > + > +The main requirements are: > + > +- serialization of multiple tasks for user space to allow multiple > + operations without user space intervention > + > +- separate buffers (input + output) for each operation > + > +- expose buffers using mmap to user space > + > +- signal user space when the task is finished (standard poll mechanism) > + > +Design > +====== > + > +A new direction SND_COMPRESS_PASSTHROUGH is introduced to identify > +the passthrough API. > + > +The API extension shares device enumeration and parameters handling from > +the main compressed API. All other realtime streaming ioctls are deactivated > +and a new set of task related ioctls are introduced. The standard > +read/write/mmap I/O operations are not supported in the passtrough device. passthrough > + > +Device ("stream") state handling is reduced to OPEN/SETUP. All other > +states are not available for the passthrough mode. > + > +Data I/O mechanism is using standard dma-buf interface with all advantages > +like mmap, standard I/O, buffer sharing etc. One buffer is used for the > +input data and second (separate) buffer is used for the ouput data. Each task output > +have separate I/O buffers. > + > +For the buffering parameters, the fragments means a limit of allocated tasks > +for given device. The fragment_size limits the input buffer size for the given > +device. The output buffer size is determined by the driver (may be different > +from the input buffer size). > + > +State Machine > +============= > + > +The passtrough audio stream state machine is described below : passthrough > + > + +----------+ > + | | > + | OPEN | > + | | > + +----------+ > + | > + | > + | compr_set_params() > + | > + v > + all passthrough task ops +----------+ > + +------------------------------------| | > + | | SETUP | > + | | > + | +----------+ > + | | > + +------------------------------------------+ > + > + > +Passthrough operations (ioctls) > +=============================== > + > +CREATE > +------ > +Creates a set of input/output buffers. The input buffer size is > +fragment_size. Allocates unique seqno. > + > +The hardware drivers allocate internal 'struct dma_buf' for both input and > +output buffers (using 'dma_buf_export()' function). The anonymous > +file descriptors for those buffers are passed to user space. > + > +FREE > +---- > +Free a set of input/output buffers. If an task is active, the stop _a_ task? > + > +static int snd_compr_task_start(struct snd_compr_stream *stream, struct snd_compr_task *utask) > +{ > + struct snd_compr_task_runtime *task; > + int retval; > + > + if (utask->origin_seqno > 0) { > + task = snd_compr_find_task(stream, utask->seqno); > + retval = snd_compr_task_start_prepare(task, utask); > + if (retval < 0) > + return retval; > + task->seqno = utask->seqno = snd_compr_seqno_next(stream); > + utask->origin_seqno = 0; > + list_move_tail(&task->list, &stream->runtime->tasks); > + } else { > + task = snd_compr_find_task(stream, utask->seqno); > + if (task && task->finished) > + return -EBUSY; > + retval = snd_compr_task_start_prepare(task, utask); > + if (retval < 0) > + return retval; > + } > + retval = stream->ops->task_start(stream, task); Should probably check somewhere that ops are set properly, or is it this way because they all considered mandatory? > + > +static int snd_compr_task_status(struct snd_compr_stream *stream, > + struct snd_compr_task_status *status) > +{ > + struct snd_compr_task_runtime *task; > + > + task = snd_compr_find_task(stream, status->seqno); > + if (task == NULL) > + return -EINVAL; > + status->output_size = task->output_size; > + status->active = task->active ? 1 : 0; > + status->finished = task->finished ? 1 : 0; Not sure, but access to ->active and ->finished, feels to me like something that may require locking.
On Mon, 27 May 2024 12:13:30 +0200, Amadeusz Sławiński wrote: > > > +static int snd_compr_task_status(struct snd_compr_stream *stream, > > + struct snd_compr_task_status *status) > > +{ > > + struct snd_compr_task_runtime *task; > > + > > + task = snd_compr_find_task(stream, status->seqno); > > + if (task == NULL) > > + return -EINVAL; > > + status->output_size = task->output_size; > > + status->active = task->active ? 1 : 0; > > + status->finished = task->finished ? 1 : 0; > > Not sure, but access to ->active and ->finished, feels to me like > something that may require locking. Or better to make it a single state field? thanks, Takashi
On Mon, 27 May 2024 09:11:33 +0200, Jaroslav Kysela wrote: > > There is a requirement to expose the audio hardware that accelerates various > tasks for user space such as sample rate converters, compressed > stream decoders, etc. > > This is description for the API extension for the compress ALSA API which > is able to handle "tasks" that are not bound to real-time operations > and allows for the serialization of operations. > > For details, refer to "compress-passthrough.rst" document. > > Note: This code is RFC (not tested, just to clearify the API requirements). > My goal is to add a test (loopback) driver and add a support to tinycompress > library in the next step. > > Cc: Mark Brown <broonie@kernel.org> > Cc: Shengjiu Wang <shengjiu.wang@gmail.com> > Cc: Nicolas Dufresne <nicolas@ndufresne.ca> > Cc: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> > Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > Cc: Vinod Koul <vkoul@kernel.org> > Signed-off-by: Jaroslav Kysela <perex@perex.cz> > --- (snip) Through a quick glance, the general idea looks simple enough and straightforward. If this satisfies the needs, we can go in this way. A few subtle things: - We might want to have a safe limit for the task creations. An evil user-space can fill up too easily. - The error checks of memdup_user() are missing. A caveat is that memdup_user() returns ERR_PTR() instead of NULL. So with the automatic cleanup, you'd need to fiddle with PTR_ERR(no_free_ptr(p)). - The use of __u64 is rather for uapi headers. The kernel internals can be with straight u64. - The task list add/removal might need some locks. It looks racy. Also the task runtime returned from snd_compr_find_task() might be freed. Need for some locking or refcounting? - A linear lookup is OK from performance POV? Maybe enough for small number of tasks. - Better to have a protocol version check and accept the R/W mode only with the newer version. thanks, Takashi
On 27. 05. 24 13:10, Takashi Iwai wrote: > On Mon, 27 May 2024 12:13:30 +0200, > Amadeusz Sławiński wrote: >> >>> +static int snd_compr_task_status(struct snd_compr_stream *stream, >>> + struct snd_compr_task_status *status) >>> +{ >>> + struct snd_compr_task_runtime *task; >>> + >>> + task = snd_compr_find_task(stream, status->seqno); >>> + if (task == NULL) >>> + return -EINVAL; >>> + status->output_size = task->output_size; >>> + status->active = task->active ? 1 : 0; >>> + status->finished = task->finished ? 1 : 0; >> >> Not sure, but access to ->active and ->finished, feels to me like >> something that may require locking. > > Or better to make it a single state field? Yes, it's an option. Those states cannot be used simultaneously. Note that the access to the runtime task fields is protected by the stream->device->lock mutex. Unless I overlooked something. Jaroslav
On 27. 05. 24 12:13, Amadeusz Sławiński wrote: >> +The API extension shares device enumeration and parameters handling from >> +the main compressed API. All other realtime streaming ioctls are deactivated >> +and a new set of task related ioctls are introduced. The standard >> +read/write/mmap I/O operations are not supported in the passtrough device. > > passthrough Thank you. I fixed those typos in my tree. >> + retval = stream->ops->task_start(stream, task); > > Should probably check somewhere that ops are set properly, or is it this > way because they all considered mandatory? Yes, all four callbacks are mandatory for the passthrough mode. Thank you for your comments, Jaroslav
On 27. 05. 24 13:23, Takashi Iwai wrote: > On Mon, 27 May 2024 09:11:33 +0200, > Jaroslav Kysela wrote: >> >> There is a requirement to expose the audio hardware that accelerates various >> tasks for user space such as sample rate converters, compressed >> stream decoders, etc. >> >> This is description for the API extension for the compress ALSA API which >> is able to handle "tasks" that are not bound to real-time operations >> and allows for the serialization of operations. >> >> For details, refer to "compress-passthrough.rst" document. >> >> Note: This code is RFC (not tested, just to clearify the API requirements). >> My goal is to add a test (loopback) driver and add a support to tinycompress >> library in the next step. >> >> Cc: Mark Brown <broonie@kernel.org> >> Cc: Shengjiu Wang <shengjiu.wang@gmail.com> >> Cc: Nicolas Dufresne <nicolas@ndufresne.ca> >> Cc: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> >> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> >> Cc: Vinod Koul <vkoul@kernel.org> >> Signed-off-by: Jaroslav Kysela <perex@perex.cz> >> --- > (snip) > > Through a quick glance, the general idea looks simple enough and > straightforward. If this satisfies the needs, we can go in this way. > > A few subtle things: > > - We might want to have a safe limit for the task creations. > An evil user-space can fill up too easily. The limit was proposed in snd_compress_check_input() but not used later (TASK_CREATE ioctl). I will send v2 shortly. > - The error checks of memdup_user() are missing. > A caveat is that memdup_user() returns ERR_PTR() instead of NULL. > So with the automatic cleanup, you'd need to fiddle with > PTR_ERR(no_free_ptr(p)). Thanks. > - The use of __u64 is rather for uapi headers. The kernel internals > can be with straight u64. Yes, thanks. > - The task list add/removal might need some locks. It looks racy. > Also the task runtime returned from snd_compr_find_task() might be > freed. Need for some locking or refcounting? All task operations are covered using stream->driver->lock unless I overlooked something. > - A linear lookup is OK from performance POV? Maybe enough for small > number of tasks. We can improve this later, but I would not expect too many queued tasks. I set the limit to 64 for the first implementation. > - Better to have a protocol version check and accept the R/W mode only > with the newer version. There is no R/W mode for passthrough direction. An error should be returned. The direction is returned from the driver using SNDRV_COMPRESS_GET_CAPS and also open mode must be O_RDWR so things seem to be separated enough. Jaroslav
Thanks Jaroslav, this is very interesting indeed. I added a set of comments to clarify the design. > +There is a requirement to expose the audio hardware that accelerates various > +tasks for user space such as sample rate converters, compressed > +stream decoders, etc. "passthrough" usually means 'no change to data, filter coefficients not applied' in the audio world. > +This is description for the API extension for the compress ALSA API which > +is able to handle "tasks" that are not bound to real-time operations > +and allows for the serialization of operations. not sure what "not bound to real-time operations" means. sample-rate conversion is probably the most dependent on accurate timing :-) > +Requirements > +============ > + > +The main requirements are: > + > +- serialization of multiple tasks for user space to allow multiple > + operations without user space intervention > + > +- separate buffers (input + output) for each operation > + > +- expose buffers using mmap to user space If every buffer is mmap'ed to userspace, what prevents userspace from interfering? I think userspace would only be involved at the source and sink of the processing chain, no? > +- signal user space when the task is finished (standard poll mechanism) > + > +Design > +====== > + > +A new direction SND_COMPRESS_PASSTHROUGH is introduced to identify > +the passthrough API. not sure what you meant by 'direction', is this a new concept in addition to PLAYBACK and CAPTURE? edit: this is indeed what the code does, probably the documentation can be clarified to explain why this is needed. > +The API extension shares device enumeration and parameters handling from > +the main compressed API. All other realtime streaming ioctls are deactivated > +and a new set of task related ioctls are introduced. The standard > +read/write/mmap I/O operations are not supported in the passtrough device. The compress API was geared to encoders/decoders. I am not sure how we would e.g. expose parameters for transcoders (decode-reencode) or even SRCs? > +Device ("stream") state handling is reduced to OPEN/SETUP. All other > +states are not available for the passthrough mode. > + > +Data I/O mechanism is using standard dma-buf interface with all advantages > +like mmap, standard I/O, buffer sharing etc. One buffer is used for the > +input data and second (separate) buffer is used for the ouput data. Each task > +have separate I/O buffers. > + > +For the buffering parameters, the fragments means a limit of allocated tasks > +for given device. The fragment_size limits the input buffer size for the given > +device. The output buffer size is determined by the driver (may be different > +from the input buffer size). > + > +State Machine > +============= > + > +The passtrough audio stream state machine is described below : > + > + +----------+ > + | | > + | OPEN | > + | | > + +----------+ > + | > + | > + | compr_set_params() > + | > + v > + all passthrough task ops +----------+ > + +------------------------------------| | > + | | SETUP | > + | | > + | +----------+ > + | | > + +------------------------------------------+ > + > + > +Passthrough operations (ioctls) > +=============================== > + > +CREATE > +------ > +Creates a set of input/output buffers. The input buffer size is > +fragment_size. Allocates unique seqno. > + > +The hardware drivers allocate internal 'struct dma_buf' for both input and for each input and output buffers? > +output buffers (using 'dma_buf_export()' function). The anonymous > +file descriptors for those buffers are passed to user space. > + > +FREE > +---- > +Free a set of input/output buffers. If an task is active, the stop > +operation is executed before. If seqno is zero, operation is executed for all > +tasks. > + > +START > +----- > +Starts (queues) a task. There are two cases of the task start - right after > +the task is created. In this case, origin_seqno must be zero. > +The second case is for reusing of already finished task. The origin_seqno > +must identify the task to be reused. In both cases, a new seqno value > +is allocated and returned to user space. > + > +The prerequisite is that application filled input dma buffer with > +new source data and set input_size to pass the real data size to the driver. > + > +The order of data processing is preserved (first started job must be > +finished at first). > + > +STOP > +---- > +Stop (dequeues) a task. If seqno is zero, operation is executed for all > +tasks. Don't you need a DRAIN? for a co-processor API, you would want all the input data to be consumed and the stop happens when all the resulting data is provided in output buffers. And presumably when the input task is stopped, the state changes are propagated to the next task by the framework? Or is userspace supposed to track each and every task and change their state? I also wonder if the state for a task should reflect that it's waiting on data on its input, or conversely is blocked because the output buffers were not consumed? Dealing with SRC, encoders or decoders mean that the buffers are going to be used at vastly different rates on input and outputs. > +STATUS > +------ > +Obtain the task status (active, finished). Also, the driver will set > +the real output data size (valid area in the output buffer). Is this assuming that the entire input buffer has valid data? There could be cases where the buffers are made of variable-length 'frames', it would be interesting to send such partial buffers to hardware. That's always been a problem with the existing compressed API, we couldn't deal with buffers that were partially filled.
On 27. 05. 24 17:35, Pierre-Louis Bossart wrote: > Hi Jaroslav, > did you intend to reply privately? > I don't mind having this thread in public. I also don't mind, it was just a mistake. Moving to public again. Thanks. > > Couple of additional questions below. > Regards, > -Pierre > > On 5/27/24 09:57, Jaroslav Kysela wrote: >> On 27. 05. 24 16:17, Pierre-Louis Bossart wrote: >>> Thanks Jaroslav, this is very interesting indeed. >>> I added a set of comments to clarify the design. >>> >>>> +There is a requirement to expose the audio hardware that accelerates >>>> various >>>> +tasks for user space such as sample rate converters, compressed >>>> +stream decoders, etc. >>> >>> "passthrough" usually means 'no change to data, filter coefficients not >>> applied' in the audio world. >> >> I am open to any better word here. The passthrough means data passing >> through a hardware as fast as possible in this contents. > > I am trying to find a better word. > >>>> +This is description for the API extension for the compress ALSA API >>>> which >>>> +is able to handle "tasks" that are not bound to real-time operations >>>> +and allows for the serialization of operations. >>> >>> not sure what "not bound to real-time operations" means. sample-rate >>> conversion is probably the most dependent on accurate timing :-) >> >> The meaning is that the data are not queued/processed with a real world >> timing constraints like standard audio data for standard playback or >> capture. It's limited just by hardware speed. I would appreciate >> rewording from a native English speaker, of course. > > What happens if one bit of hardware is actually tied to an audio clock? > Would it prevent the use of the API? No, the conversion will be just slow and the hardware driver should probably use a standard PCM for the data output or input. It will be more appropriate. > I think the only difference with traditional ALSA is that there's no > concept of xrun or time, but the hardware can implement the processing > however it wants. There is no restriction for this design, but I think that we have already interfaces for this. This is a proposal for the audio acceleration API. >>>> +Requirements >>>> +============ >>>> + >>>> +The main requirements are: >>>> + >>>> +- serialization of multiple tasks for user space to allow multiple >>>> + operations without user space intervention >>>> + >>>> +- separate buffers (input + output) for each operation > > I guess we're talking about a "pipeline" where all the buffers are > shared/chained. Mixers and splitters are probably out-of-scope. > > But I wonder if the last task could just consume data, or if the first > task could just generate data. The former case would be processing that > analyzes data and generates a 'score' or an event, and the latter would > be some sort of hardware synthesizer. > >>>> + >>>> +- expose buffers using mmap to user space >>> >>> If every buffer is mmap'ed to userspace, what prevents userspace from >>> interfering? >>> >>> I think userspace would only be involved at the source and sink of the >>> processing chain, no? >> >> We probably talk about same thing. Each task has own buffers which are >> exposed to user space. The interfering is similar like for any other API. > > What I meant if that if userspace is involved in the middle of a chain, > then it kind of breaks the idea that the processing is propagated as > fast as possible through hardware. I don't really see the point of > having intermediate buffers available to userspace. There are no intermediate buffers in this API just input and output buffers yet. You probably have a bigger picture with the stream/data routing through multiple components. It's not the case (yet). The only advantage of existing dma-buf interface is that we can share those data among more drivers and user space simultaneously. The arbitration must be set elsewhere (and it is not set by this simple API). >>>> +The API extension shares device enumeration and parameters handling >>>> from >>>> +the main compressed API. All other realtime streaming ioctls are >>>> deactivated >>>> +and a new set of task related ioctls are introduced. The standard >>>> +read/write/mmap I/O operations are not supported in the passtrough >>>> device. >>> >>> The compress API was geared to encoders/decoders. I am not sure how we >>> would e.g. expose parameters for transcoders (decode-reencode) or even >>> SRCs? >> >> I expect that the struct snd_codec structure will be modified (or usage >> will be clarified) for transcoding. > > What I meant is that the snd_codec took a lot of time to agree on, and > it was based on standards. I wonder if for a generic API the parameters > can be defined at all. It's probably hardware/vendor specific. My goal is to start with something simple and decoding of the compressed stream to a PCM stream or ASRC use is really simple. Almost all parameters are already available in this structure, it's just about to settle the output buffer format. >>>> +CREATE >>>> +------ >>>> +Creates a set of input/output buffers. The input buffer size is >>>> +fragment_size. Allocates unique seqno. >>>> + >>>> +The hardware drivers allocate internal 'struct dma_buf' for both >>>> input and >>> >>> for each input and output buffers? >> >> Yes, for one task you need one input and one output buffer. All tasks >> are separate (but serialized). >> >>>> +STOP >>>> +---- >>>> +Stop (dequeues) a task. If seqno is zero, operation is executed for all >>>> +tasks. >>> >>> Don't you need a DRAIN? >> >> I would expect that transcoding will be fast or the application may use >> smaller input buffers. We can add drain only when really required. >> >>> for a co-processor API, you would want all the input data to be consumed >>> and the stop happens when all the resulting data is provided in output >>> buffers. >>> >>> And presumably when the input task is stopped, the state changes are >>> propagated to the next task by the framework? Or is userspace supposed >>> to track each and every task and change their state? >>> >>> I also wonder if the state for a task should reflect that it's waiting >>> on data on its input, or conversely is blocked because the output >>> buffers were not consumed? Dealing with SRC, encoders or decoders mean >>> that the buffers are going to be used at vastly different rates on input >>> and outputs. >> >> I would suggest to study the proposed sources. The allocated buffer >> sizes may be different from the really used areas. The user space is >> able to pass lower number of filled bytes than allocated. Also the >> driver must ensure that the output buffer is large enough to store >> result for the allocated input buffer size. >> >> The START operation means that user space filled all input bytes for the >> conversion. > > I have a doubt here. Is the intent to fill a bunch of buffers, then wait > for the output to be ready? > > Or can userspace write additional buffers while the output is being > created (similar to what we do for ALSA today). > > Take the SRC for example. userspace provides 1024 samples, gets the > converted output. Now userspace provides a second 1024 sample buffer. > That second conversion needs to use the history of the first processing. > In other words, when a task is finished, it should not reset its > internal context, there is a history buffer that should only be cleared > if the input is a completely different track or content. Yes, I would expect that the hardware driver will keep this state until first STOP command is received. STOP will create a discontinuation. Then the conversion will start again when more buffers are queued. User space can stop all tasks together, too. >>>> +STATUS >>>> +------ >>>> +Obtain the task status (active, finished). Also, the driver will set >>>> +the real output data size (valid area in the output buffer). >>> >>> Is this assuming that the entire input buffer has valid data? >>> There could be cases where the buffers are made of variable-length >>> 'frames', it would be interesting to send such partial buffers to >>> hardware. That's always been a problem with the existing compressed API, >>> we couldn't deal with buffers that were partially filled. >> >> I expect that the initial handshake (params/metadata) should set all >> parameters for the driver to determine the right (maximal) output buffer >> size. > > I wasn't talking about max sizes, but really that each buffer passed by > userspace might contain different number of valid bytes. In ALSA, all > the bytes are assumed to follow each other, we can't tell than the > period N has 200 bytes and period N+1 has 106 bytes valid. That would be > a great addition if we can lose the concept of 'ring buffer' and instead > deal with independent buffers. The proposed task works with independent buffers which may be reused to queue next data when the output (result) is consumed. The sizes of used areas in those buffers may be different for each task (variable data chunks). Jaroslav
diff --git a/Documentation/sound/designs/compress-passthrough.rst b/Documentation/sound/designs/compress-passthrough.rst new file mode 100644 index 000000000000..bb5a0ae5c496 --- /dev/null +++ b/Documentation/sound/designs/compress-passthrough.rst @@ -0,0 +1,125 @@ +================================= +ALSA Co-processor Passthrough API +================================= + +Jaroslav Kysela <perex@perex.cz> + + +Overview +======== + +There is a requirement to expose the audio hardware that accelerates various +tasks for user space such as sample rate converters, compressed +stream decoders, etc. + +This is description for the API extension for the compress ALSA API which +is able to handle "tasks" that are not bound to real-time operations +and allows for the serialization of operations. + +Requirements +============ + +The main requirements are: + +- serialization of multiple tasks for user space to allow multiple + operations without user space intervention + +- separate buffers (input + output) for each operation + +- expose buffers using mmap to user space + +- signal user space when the task is finished (standard poll mechanism) + +Design +====== + +A new direction SND_COMPRESS_PASSTHROUGH is introduced to identify +the passthrough API. + +The API extension shares device enumeration and parameters handling from +the main compressed API. All other realtime streaming ioctls are deactivated +and a new set of task related ioctls are introduced. The standard +read/write/mmap I/O operations are not supported in the passtrough device. + +Device ("stream") state handling is reduced to OPEN/SETUP. All other +states are not available for the passthrough mode. + +Data I/O mechanism is using standard dma-buf interface with all advantages +like mmap, standard I/O, buffer sharing etc. One buffer is used for the +input data and second (separate) buffer is used for the ouput data. Each task +have separate I/O buffers. + +For the buffering parameters, the fragments means a limit of allocated tasks +for given device. The fragment_size limits the input buffer size for the given +device. The output buffer size is determined by the driver (may be different +from the input buffer size). + +State Machine +============= + +The passtrough audio stream state machine is described below : + + +----------+ + | | + | OPEN | + | | + +----------+ + | + | + | compr_set_params() + | + v + all passthrough task ops +----------+ + +------------------------------------| | + | | SETUP | + | | + | +----------+ + | | + +------------------------------------------+ + + +Passthrough operations (ioctls) +=============================== + +CREATE +------ +Creates a set of input/output buffers. The input buffer size is +fragment_size. Allocates unique seqno. + +The hardware drivers allocate internal 'struct dma_buf' for both input and +output buffers (using 'dma_buf_export()' function). The anonymous +file descriptors for those buffers are passed to user space. + +FREE +---- +Free a set of input/output buffers. If an task is active, the stop +operation is executed before. If seqno is zero, operation is executed for all +tasks. + +START +----- +Starts (queues) a task. There are two cases of the task start - right after +the task is created. In this case, origin_seqno must be zero. +The second case is for reusing of already finished task. The origin_seqno +must identify the task to be reused. In both cases, a new seqno value +is allocated and returned to user space. + +The prerequisite is that application filled input dma buffer with +new source data and set input_size to pass the real data size to the driver. + +The order of data processing is preserved (first started job must be +finished at first). + +STOP +---- +Stop (dequeues) a task. If seqno is zero, operation is executed for all +tasks. + +STATUS +------ +Obtain the task status (active, finished). Also, the driver will set +the real output data size (valid area in the output buffer). + +Credits +======= +- ... diff --git a/include/sound/compress_driver.h b/include/sound/compress_driver.h index bcf872c17dd3..acfe79dd43f9 100644 --- a/include/sound/compress_driver.h +++ b/include/sound/compress_driver.h @@ -19,6 +19,23 @@ struct snd_compr_ops; +/** + * struct snd_compr_task_runtime: task runtime description + * + */ +struct snd_compr_task_runtime { + struct list_head list; + struct dma_buf *input; + struct dma_buf *output; + __u64 seqno; + __u64 input_size; + __u64 output_size; + bool active; + bool finished; + void *private_value; +}; + + /** * struct snd_compr_runtime: runtime stream description * @state: stream state @@ -54,6 +71,10 @@ struct snd_compr_runtime { dma_addr_t dma_addr; size_t dma_bytes; struct snd_dma_buffer *dma_buffer_p; + + u32 active_tasks; + u64 task_seqno; + struct list_head tasks; }; /** @@ -132,6 +153,12 @@ struct snd_compr_ops { struct snd_compr_caps *caps); int (*get_codec_caps) (struct snd_compr_stream *stream, struct snd_compr_codec_caps *codec); +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH) + int (*task_create) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task); + int (*task_start) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task); + int (*task_stop) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task); + int (*task_free) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task); +#endif }; /** @@ -242,4 +269,9 @@ int snd_compr_free_pages(struct snd_compr_stream *stream); int snd_compr_stop_error(struct snd_compr_stream *stream, snd_pcm_state_t state); +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH) +void snd_compr_task_finished(struct snd_compr_stream *stream, + struct snd_compr_task_runtime *task); +#endif + #endif diff --git a/include/uapi/sound/compress_offload.h b/include/uapi/sound/compress_offload.h index d185957f3fe0..f6cc721adeb2 100644 --- a/include/uapi/sound/compress_offload.h +++ b/include/uapi/sound/compress_offload.h @@ -14,7 +14,7 @@ #include <sound/compress_params.h> -#define SNDRV_COMPRESS_VERSION SNDRV_PROTOCOL_VERSION(0, 2, 0) +#define SNDRV_COMPRESS_VERSION SNDRV_PROTOCOL_VERSION(0, 3, 0) /** * struct snd_compressed_buffer - compressed buffer * @fragment_size: size of buffer fragment in bytes @@ -68,7 +68,8 @@ struct snd_compr_avail { enum snd_compr_direction { SND_COMPRESS_PLAYBACK = 0, - SND_COMPRESS_CAPTURE + SND_COMPRESS_CAPTURE, + SND_COMPRESS_PASSTHROUGH }; /** @@ -127,6 +128,37 @@ struct snd_compr_metadata { __u32 value[8]; } __attribute__((packed, aligned(4))); +/** + * struct snd_compr_task - task primitive for non-realtime operation + * @seqno: sequence number (task identifier) + * @origin_seqno: previous sequence number (task identifier) - for reuse + * @input_fd: data input file descriptor (dma-buf) + * @output_fd: data output file descriptor (dma-buf) + * @input_size: filled data in bytes (from caller, must not exceed fragment size) + */ +struct snd_compr_task { + __u64 seqno; + __u64 origin_seqno; + int input_fd; + int output_fd; + __u64 input_size; + __u8 reserved[16]; +} __attribute__((packed, aligned(4))); + +/** + * struct snd_compr_task_status - task status + * @seqno: sequence number (task identifier) + * @output_size: filled data in bytes (from driver) + * @finished: non-zero means that the task was finished + */ +struct snd_compr_task_status { + __u64 seqno; + __u64 output_size; + __u8 active; + __u8 finished; + __u8 reserved[14]; +} __attribute__((packed, aligned(4))); + /* * compress path ioctl definitions * SNDRV_COMPRESS_GET_CAPS: Query capability of DSP @@ -164,6 +196,14 @@ struct snd_compr_metadata { #define SNDRV_COMPRESS_DRAIN _IO('C', 0x34) #define SNDRV_COMPRESS_NEXT_TRACK _IO('C', 0x35) #define SNDRV_COMPRESS_PARTIAL_DRAIN _IO('C', 0x36) + + +#define SNDRV_COMPRESS_TASK_CREATE _IOWR('C', 0x60, struct snd_compr_task) +#define SNDRV_COMPRESS_TASK_FREE _IOW('C', 0x61, __u64) +#define SNDRV_COMPRESS_TASK_START _IOWR('C', 0x62, struct snd_compr_task) +#define SNDRV_COMPRESS_TASK_STOP _IOW('C', 0x63, __u64) +#define SNDRV_COMPRESS_TASK_STATUS _IOWR('C', 0x68, struct snd_compr_task_status) + /* * TODO * 1. add mmap support diff --git a/sound/core/Kconfig b/sound/core/Kconfig index 8077f481d84f..3541fe6d477f 100644 --- a/sound/core/Kconfig +++ b/sound/core/Kconfig @@ -59,6 +59,10 @@ config SND_CORE_TEST config SND_COMPRESS_OFFLOAD tristate +config SND_COMPRESS_PASSTHROUGH + select DMA_BUF + tristate + config SND_JACK bool diff --git a/sound/core/compress_offload.c b/sound/core/compress_offload.c index f0008fa2d839..47fb7c5ad564 100644 --- a/sound/core/compress_offload.c +++ b/sound/core/compress_offload.c @@ -24,6 +24,7 @@ #include <linux/types.h> #include <linux/uio.h> #include <linux/uaccess.h> +#include <linux/dma-buf.h> #include <linux/module.h> #include <linux/compat.h> #include <sound/core.h> @@ -85,6 +86,8 @@ static int snd_compr_open(struct inode *inode, struct file *f) dirn = SND_COMPRESS_PLAYBACK; else if ((f->f_flags & O_ACCMODE) == O_RDONLY) dirn = SND_COMPRESS_CAPTURE; + else if ((f->f_flags & O_ACCMODE) == O_RDWR) + dirn = SND_COMPRESS_PASSTHROUGH; else return -EINVAL; @@ -226,6 +229,9 @@ snd_compr_ioctl_avail(struct snd_compr_stream *stream, unsigned long arg) struct snd_compr_avail ioctl_avail; size_t avail; + if (stream->direction == SND_COMPRESS_PASSTHROUGH) + return -EBADFD; + avail = snd_compr_calc_avail(stream, &ioctl_avail); ioctl_avail.avail = avail; @@ -287,6 +293,8 @@ static ssize_t snd_compr_write(struct file *f, const char __user *buf, return -EFAULT; stream = &data->stream; + if (stream->direction == SND_COMPRESS_PASSTHROUGH) + return -EBADFD; guard(mutex)(&stream->device->lock); /* write is allowed when stream is running or has been steup */ switch (stream->runtime->state) { @@ -336,6 +344,8 @@ static ssize_t snd_compr_read(struct file *f, char __user *buf, return -EFAULT; stream = &data->stream; + if (stream->direction == SND_COMPRESS_PASSTHROUGH) + return -EBADFD; guard(mutex)(&stream->device->lock); /* read is allowed when stream is running, paused, draining and setup @@ -385,6 +395,8 @@ static __poll_t snd_compr_poll(struct file *f, poll_table *wait) { struct snd_compr_file *data = f->private_data; struct snd_compr_stream *stream; + struct snd_compr_runtime *runtime; + struct snd_compr_task_runtime *task; size_t avail; __poll_t retval = 0; @@ -392,6 +404,7 @@ static __poll_t snd_compr_poll(struct file *f, poll_table *wait) return EPOLLERR; stream = &data->stream; + runtime = stream->runtime; guard(mutex)(&stream->device->lock); @@ -405,6 +418,18 @@ static __poll_t snd_compr_poll(struct file *f, poll_table *wait) poll_wait(f, &stream->runtime->sleep, wait); + if (stream->direction == SND_COMPRESS_PASSTHROUGH) { + if (runtime->fragments > runtime->active_tasks) + retval |= EPOLLOUT | EPOLLWRNORM; + task = list_first_entry_or_null(&runtime->tasks, + struct snd_compr_task_runtime, + list); + if (task && task->finished) + retval |= EPOLLIN | EPOLLRDNORM; + return retval; + } + + avail = snd_compr_get_avail(stream); pr_debug("avail is %ld\n", (unsigned long)avail); /* check if we have at least one fragment to fill */ @@ -521,6 +546,9 @@ static int snd_compr_allocate_buffer(struct snd_compr_stream *stream, unsigned int buffer_size; void *buffer = NULL; + if (stream->direction == SND_COMPRESS_PASSTHROUGH) + goto params; + buffer_size = params->buffer.fragment_size * params->buffer.fragments; if (stream->ops->copy) { buffer = NULL; @@ -543,18 +571,30 @@ static int snd_compr_allocate_buffer(struct snd_compr_stream *stream, if (!buffer) return -ENOMEM; } - stream->runtime->fragment_size = params->buffer.fragment_size; - stream->runtime->fragments = params->buffer.fragments; + stream->runtime->buffer = buffer; stream->runtime->buffer_size = buffer_size; +params: + stream->runtime->fragment_size = params->buffer.fragment_size; + stream->runtime->fragments = params->buffer.fragments; return 0; } -static int snd_compress_check_input(struct snd_compr_params *params) +static int +snd_compress_check_input(struct snd_compr_stream *stream, struct snd_compr_params *params) { + u32 max_fragments; + /* first let's check the buffer parameter's */ - if (params->buffer.fragment_size == 0 || - params->buffer.fragments > U32_MAX / params->buffer.fragment_size || + if (params->buffer.fragment_size == 0) + return -EINVAL; + + if (stream->direction == SND_COMPRESS_PASSTHROUGH) + max_fragments = 64; /* safe value */ + else + max_fragments = U32_MAX / params->buffer.fragment_size; + + if (params->buffer.fragments > max_fragments || params->buffer.fragments == 0) return -EINVAL; @@ -583,7 +623,7 @@ snd_compr_set_params(struct snd_compr_stream *stream, unsigned long arg) if (IS_ERR(params)) return PTR_ERR(no_free_ptr(params)); - retval = snd_compress_check_input(params); + retval = snd_compress_check_input(stream, params); if (retval) return retval; @@ -939,6 +979,247 @@ static int snd_compr_partial_drain(struct snd_compr_stream *stream) return snd_compress_wait_for_drain(stream); } +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH) + +static struct snd_compr_task_runtime * + snd_compr_find_task(struct snd_compr_stream *stream, __u64 seqno) +{ + struct snd_compr_task_runtime *task; + + list_for_each_entry(task, &stream->runtime->tasks, list) { + if (task->seqno == seqno) + return task; + } + return NULL; +} + +static void snd_compr_task_free(struct snd_compr_task_runtime *task) +{ + if (task->output) + dma_buf_put(task->output); + if (task->input) + dma_buf_put(task->input); + kfree(task); +} + +static u64 snd_compr_seqno_next(struct snd_compr_stream *stream) +{ + u64 seqno = ++stream->runtime->task_seqno; + if (seqno == 0) + seqno = ++stream->runtime->task_seqno; + return seqno; +} + +static int snd_compr_task_new(struct snd_compr_stream *stream, struct snd_compr_task *utask) +{ + struct snd_compr_task_runtime *task; + int retval; + + if (utask->origin_seqno != 0 || utask->input_size != 0) + return -EINVAL; + if (snd_compr_find_task(stream, utask->seqno)) + return -EALREADY; + task = kzalloc(sizeof(*task), GFP_KERNEL); + if (task == NULL) + return -ENOMEM; + task->seqno = snd_compr_seqno_next(stream); + task->input_size = utask->input_size; + retval = stream->ops->task_create(stream, task); + if (retval < 0) + goto cleanup; + utask->input_fd = dma_buf_fd(task->input, O_WRONLY|O_CLOEXEC); + if (utask->input_fd < 0) { + retval = utask->input_fd; + goto cleanup; + } + utask->output_fd = dma_buf_fd(task->output, O_RDONLY|O_CLOEXEC); + if (utask->output_fd < 0) { + retval = utask->output_fd; + goto cleanup; + } + list_add_tail(&task->list, &stream->runtime->tasks); + return 0; +cleanup: + snd_compr_task_free(task); + return retval; +} + +static int snd_compr_task_create(struct snd_compr_stream *stream, unsigned long arg) +{ + struct snd_compr_task *task __free(kfree) = NULL; + int retval; + + if (stream->runtime->state != SNDRV_PCM_STATE_SETUP) + return -EPERM; + task = memdup_user((void __user *)arg, sizeof(*task)); + retval = snd_compr_task_new(stream, task); + if (retval >= 0) + if (copy_to_user((void __user *)arg, task, sizeof(*task))) + retval = -EFAULT; + return retval; +} + +static int snd_compr_task_start_prepare(struct snd_compr_task_runtime *task, + struct snd_compr_task *utask) +{ + if (task == NULL) + return -EINVAL; + if (task->active) + return -EBUSY; + if (utask->input_size > task->input->size) + return -EINVAL; + task->input_size = utask->input_size; + task->finished = false; + return 0; +} + +static int snd_compr_task_start(struct snd_compr_stream *stream, struct snd_compr_task *utask) +{ + struct snd_compr_task_runtime *task; + int retval; + + if (utask->origin_seqno > 0) { + task = snd_compr_find_task(stream, utask->seqno); + retval = snd_compr_task_start_prepare(task, utask); + if (retval < 0) + return retval; + task->seqno = utask->seqno = snd_compr_seqno_next(stream); + utask->origin_seqno = 0; + list_move_tail(&task->list, &stream->runtime->tasks); + } else { + task = snd_compr_find_task(stream, utask->seqno); + if (task && task->finished) + return -EBUSY; + retval = snd_compr_task_start_prepare(task, utask); + if (retval < 0) + return retval; + } + retval = stream->ops->task_start(stream, task); + if (retval >= 0) { + task->active = true; + stream->runtime->active_tasks++; + } + return retval; +} + +static int snd_compr_task_start_ioctl(struct snd_compr_stream *stream, unsigned long arg) +{ + struct snd_compr_task *task __free(kfree) = NULL; + int retval; + + if (stream->runtime->state != SNDRV_PCM_STATE_SETUP) + return -EPERM; + task = memdup_user((void __user *)arg, sizeof(*task)); + retval = snd_compr_task_start(stream, task); + if (retval >= 0) + if (copy_to_user((void __user *)arg, task, sizeof(*task))) + retval = -EFAULT; + return retval; +} + +static void snd_compr_task_stop_one(struct snd_compr_stream *stream, + struct snd_compr_task_runtime *task) +{ + if (!task->finished) + stream->ops->task_stop(stream, task); + if (task->active) { + task->active = false; + if (!snd_BUG_ON(stream->runtime->active_tasks == 0)) + stream->runtime->active_tasks--; + list_move_tail(&task->list, &stream->runtime->tasks); + } +} + +static void snd_compr_task_free_one(struct snd_compr_stream *stream, + struct snd_compr_task_runtime *task) +{ + if (task->active) + snd_compr_task_stop_one(stream, task); + stream->ops->task_free(stream, task); + list_del(&task->list); + snd_compr_task_free(task); +} + +typedef void (*snd_compr_seq_func_t)(struct snd_compr_stream *stream, + struct snd_compr_task_runtime *task); + +static int snd_compr_task_seq(struct snd_compr_stream *stream, unsigned long arg, + snd_compr_seq_func_t fcn) +{ + struct snd_compr_task_runtime *task; + __u64 seqno; + int retval; + + if (stream->runtime->state != SNDRV_PCM_STATE_SETUP) + return -EPERM; + retval = get_user(seqno, (__u64 __user *)arg); + if (retval < 0) + return retval; + retval = 0; + if (seqno == 0) { + list_for_each_entry(task, &stream->runtime->tasks, list) + fcn(stream, task); + } else { + task = snd_compr_find_task(stream, seqno); + if (task == NULL) { + retval = -EINVAL; + } else { + fcn(stream, task); + } + } + return retval; +} + +static int snd_compr_task_status(struct snd_compr_stream *stream, + struct snd_compr_task_status *status) +{ + struct snd_compr_task_runtime *task; + + task = snd_compr_find_task(stream, status->seqno); + if (task == NULL) + return -EINVAL; + status->output_size = task->output_size; + status->active = task->active ? 1 : 0; + status->finished = task->finished ? 1 : 0; + return 0; +} + +static int snd_compr_task_status_ioctl(struct snd_compr_stream *stream, unsigned long arg) +{ + struct snd_compr_task_status *status __free(kfree) = NULL; + int retval; + + if (stream->runtime->state != SNDRV_PCM_STATE_SETUP) + return -EPERM; + status = memdup_user((void __user *)arg, sizeof(*status)); + retval = snd_compr_task_status(stream, status); + if (retval >= 0) + if (copy_to_user((void __user *)arg, status, sizeof(*status))) + retval = -EFAULT; + return retval; +} + +/** + * snd_compr_task_finished: Notify that the task was finished + * @stream: pointer to stream + * @task: runtime task structure + * + * Set the finished task state and notify waiters. + */ +void snd_compr_task_finished(struct snd_compr_stream *stream, + struct snd_compr_task_runtime *task) +{ + guard(mutex)(&stream->device->lock); + if (!snd_BUG_ON(stream->runtime->active_tasks == 0)) + stream->runtime->active_tasks--; + task->active = false; + task->finished = true; + wake_up(&stream->runtime->sleep); +} +EXPORT_SYMBOL(snd_compr_task_finished); + +#endif /* CONFIG_COMPRESS_PASSTHROUGH */ + static long snd_compr_ioctl(struct file *f, unsigned int cmd, unsigned long arg) { struct snd_compr_file *data = f->private_data; @@ -968,6 +1249,27 @@ static long snd_compr_ioctl(struct file *f, unsigned int cmd, unsigned long arg) return snd_compr_set_metadata(stream, arg); case _IOC_NR(SNDRV_COMPRESS_GET_METADATA): return snd_compr_get_metadata(stream, arg); + } + + if (stream->direction == SND_COMPRESS_PASSTHROUGH) { +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH) + switch (_IOC_NR(cmd)) { + case _IOC_NR(SNDRV_COMPRESS_TASK_CREATE): + return snd_compr_task_create(stream, arg); + case _IOC_NR(SNDRV_COMPRESS_TASK_FREE): + return snd_compr_task_seq(stream, arg, snd_compr_task_free_one); + case _IOC_NR(SNDRV_COMPRESS_TASK_START): + return snd_compr_task_start_ioctl(stream, arg); + case _IOC_NR(SNDRV_COMPRESS_TASK_STOP): + return snd_compr_task_seq(stream, arg, snd_compr_task_stop_one); + case _IOC_NR(SNDRV_COMPRESS_TASK_STATUS): + return snd_compr_task_status_ioctl(stream, arg); + } +#endif + return -ENOTTY; + } + + switch (_IOC_NR(cmd)) { case _IOC_NR(SNDRV_COMPRESS_TSTAMP): return snd_compr_tstamp(stream, arg); case _IOC_NR(SNDRV_COMPRESS_AVAIL):
There is a requirement to expose the audio hardware that accelerates various tasks for user space such as sample rate converters, compressed stream decoders, etc. This is description for the API extension for the compress ALSA API which is able to handle "tasks" that are not bound to real-time operations and allows for the serialization of operations. For details, refer to "compress-passthrough.rst" document. Note: This code is RFC (not tested, just to clearify the API requirements). My goal is to add a test (loopback) driver and add a support to tinycompress library in the next step. Cc: Mark Brown <broonie@kernel.org> Cc: Shengjiu Wang <shengjiu.wang@gmail.com> Cc: Nicolas Dufresne <nicolas@ndufresne.ca> Cc: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Vinod Koul <vkoul@kernel.org> Signed-off-by: Jaroslav Kysela <perex@perex.cz> --- .../sound/designs/compress-passthrough.rst | 125 +++++++ include/sound/compress_driver.h | 32 ++ include/uapi/sound/compress_offload.h | 44 ++- sound/core/Kconfig | 4 + sound/core/compress_offload.c | 314 +++++++++++++++++- 5 files changed, 511 insertions(+), 8 deletions(-) create mode 100644 Documentation/sound/designs/compress-passthrough.rst