Message ID | 20161129170055.27608-1-robert@sixbynine.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> == Series Details == > > Series: drm/i915/perf: More documentation hooked to i915.rst > URL : https://patchwork.freedesktop.org/series/16114/ > State : warning > > == Summary == > > Series 16114v1 drm/i915/perf: More documentation hooked to i915.rst > https://patchwork.freedesktop.org/api/1.0/series/16114/revisions/1/mbo > x/ > > Test kms_flip: > Subgroup basic-flip-vs-wf_vblank: > pass -> DMESG-WARN (fi-skl-6770hq) Two issues seen: *ERROR* failed to inform PCU about cdclk change => https://bugs.freedesktop.org/show_bug.cgi?id=97929 *ERROR* CPU pipe A FIFO underrun => https://bugs.freedesktop.org/show_bug.cgi?id=94605 > > fi-bdw-5557u total:245 pass:230 dwarn:0 dfail:0 fail:0 skip:15 > fi-bsw-n3050 total:245 pass:205 dwarn:0 dfail:0 fail:0 skip:40 > fi-bxt-t5700 total:245 pass:217 dwarn:0 dfail:0 fail:0 skip:28 > fi-byt-j1900 total:245 pass:217 dwarn:0 dfail:0 fail:0 skip:28 > fi-byt-n2820 total:245 pass:213 dwarn:0 dfail:0 fail:0 skip:32 > fi-hsw-4770 total:245 pass:225 dwarn:0 dfail:0 fail:0 skip:20 > fi-hsw-4770r total:245 pass:225 dwarn:0 dfail:0 fail:0 skip:20 > fi-ilk-650 total:245 pass:192 dwarn:0 dfail:0 fail:0 skip:53 > fi-ivb-3520m total:245 pass:223 dwarn:0 dfail:0 fail:0 skip:22 > fi-ivb-3770 total:245 pass:223 dwarn:0 dfail:0 fail:0 skip:22 > fi-kbl-7500u total:245 pass:223 dwarn:0 dfail:0 fail:0 skip:22 > fi-skl-6260u total:245 pass:231 dwarn:0 dfail:0 fail:0 skip:14 > fi-skl-6700hq total:245 pass:224 dwarn:0 dfail:0 fail:0 skip:21 > fi-skl-6700k total:245 pass:223 dwarn:1 dfail:0 fail:0 skip:21 > fi-skl-6770hq total:245 pass:230 dwarn:1 dfail:0 fail:0 skip:14 > fi-snb-2520m total:245 pass:213 dwarn:0 dfail:0 fail:0 skip:32 > fi-snb-2600 total:245 pass:212 dwarn:0 dfail:0 fail:0 skip:33 > > 0d603327c63630327b2a4a09ae6ea521dd21513d drm-tip: 2016y-11m- > 29d-18h-00m-53s UTC integration manifest > d10f3f3 drm/i915/perf: More documentation hooked to i915.rst > > == Logs == > > For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3146/ Jani Saarinen Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
On 29 November 2016 at 17:00, Robert Bragg <robert@sixbynine.org> wrote: > This adds a 'Perf' section to i915.rst with the following sub sections: > - Overview > - Comparison with Core Perf > - i915 Driver Entry Points > - i915 Perf Stream > - i915 Perf Observation Architecture Stream > - All i915 Perf Internals > > Signed-off-by: Robert Bragg <robert@sixbynine.org> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > --- > Documentation/gpu/i915.rst | 92 +++++++++++++ > drivers/gpu/drm/i915/i915_drv.h | 151 ++++++++++++++++---- > drivers/gpu/drm/i915/i915_perf.c | 289 +++++++++++++++++++++++++++++++++++---- > 3 files changed, 478 insertions(+), 54 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 117d2ab..714bd4b 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -356,4 +356,96 @@ switch_mm > .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h > :doc: switch_mm tracepoint > > +Perf > +==== > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :doc: i915 Perf Overview > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :doc: i915 Perf History and Comparison with Core Perf > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :doc: i915 Perf File Operations I see no such :DOC entry in i915_perf.c. > + > +i915 Driver Entry Points > +------------------------ > + > +This section covers the entrypoints exported outside of i915_perf.c to > +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_init > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_fini > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_register > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_unregister > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_open_ioctl > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_release I see no kernel doc for this. > + > +i915 Perf Stream > +---------------- > + > +This section covers the stream-semantics-agnostic structures and functions > +for representing an i915 perf stream FD and associated file operations. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > + :functions: i915_perf_stream > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > + :functions: i915_perf_stream_ops > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: read_properties_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_open_ioctl_unlocked i915_perf_open_ioctl_locked, and there is no kernel doc for it. > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_destroy_unlocked i915_perf_destroy_locked, and there is no kernel doc for it. > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_read > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_ioctl I see no kernel doc for this. > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_enable_unlocked i915_perf_enable_locked, and there is no kernel doc for it. > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_disable_unlocked i915_perf_disable_locked, and there is no kernel doc for it. > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_poll > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_poll_unlocked i915_perf_poll_locked, and there is no kernel doc for it > + > +i915 Perf Observation Architecture Stream > +----------------------------------------- > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: OA_BUFFER_SIZE hmm, but this is not a function, enum, union, struct or typedef, so can we actually add kernel doc for preprocessor directives ? Also OA_BUFFER_SIZE lacks the proper formatting to get picked up anyway. > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > + :functions: i915_oa_ops > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_stream_init > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_read > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_stream_enable > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_stream_disable > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_wait_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_poll_wait > + > +All i915 Perf Internals > +----------------------- > + > +This section simply includes all currently documented i915 perf internals, in > +no particular order, but may include some more minor utilities or platform > +specific details than found in the more high-level sections. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :internal: > + > .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 1ec9619..9f92755 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1827,89 +1827,186 @@ struct i915_oa_reg { > > struct i915_perf_stream; > > +/** > + * struct i915_perf_stream_ops - the OPs to support a specific stream type > + */ > struct i915_perf_stream_ops { > - /* Enables the collection of HW samples, either in response to > - * I915_PERF_IOCTL_ENABLE or implicitly called when stream is > - * opened without I915_PERF_FLAG_DISABLED. > + /** > + * @enable: Enables the collection of HW samples, either in response to > + * I915_PERF_IOCTL_ENABLE or implicitly called when stream is opened > + * without I915_PERF_FLAG_DISABLED. > */ > void (*enable)(struct i915_perf_stream *stream); > > - /* Disables the collection of HW samples, either in response to > - * I915_PERF_IOCTL_DISABLE or implicitly called before > - * destroying the stream. > + /** > + * @disable: Disables the collection of HW samples, either in response > + * to I915_PERF_IOCTL_DISABLE or implicitly called before destroying > + * the stream. > */ > void (*disable)(struct i915_perf_stream *stream); > > - /* Call poll_wait, passing a wait queue that will be woken > + /** > + * @poll_wait: Call poll_wait, passing a wait queue that will be woken > * once there is something ready to read() for the stream > */ > void (*poll_wait)(struct i915_perf_stream *stream, > struct file *file, > poll_table *wait); > > - /* For handling a blocking read, wait until there is something > - * to ready to read() for the stream. E.g. wait on the same > + /** > + * @wait_unlocked: For handling a blocking read, wait until there is > + * something to ready to read() for the stream. E.g. wait on the same > * wait queue that would be passed to poll_wait(). > */ > int (*wait_unlocked)(struct i915_perf_stream *stream); > > - /* read - Copy buffered metrics as records to userspace > - * @buf: the userspace, destination buffer > - * @count: the number of bytes to copy, requested by userspace > - * @offset: zero at the start of the read, updated as the read > - * proceeds, it represents how many bytes have been > - * copied so far and the buffer offset for copying the > - * next record. > + /** > + * @read: Copy buffered metrics as records to userspace > + * **buf**: the userspace, destination buffer > + * **count**: the number of bytes to copy, requested by userspace > + * **offset**: zero at the start of the read, updated as the read > + * proceeds, it represents how many bytes have been copied so far and > + * the buffer offset for copying the next record. So there's no proper way of documenting the parameters of a function pointer which happens to also be a member of a struct ? I guess you would have to do a typedef, but then that's a bit ugly... > * > - * Copy as many buffered i915 perf samples and records for > - * this stream to userspace as will fit in the given buffer. > + * Copy as many buffered i915 perf samples and records for this stream > + * to userspace as will fit in the given buffer. > * > - * Only write complete records; returning -ENOSPC if there > - * isn't room for a complete record. > + * Only write complete records; returning -ENOSPC if there isn't room > + * for a complete record. > * > - * Return any error condition that results in a short read > - * such as -ENOSPC or -EFAULT, even though these may be > - * squashed before returning to userspace. > + * Return any error condition that results in a short read such as > + * -ENOSPC or -EFAULT, even though these may be squashed before > + * returning to userspace. > */ > int (*read)(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > size_t *offset); > > - /* Cleanup any stream specific resources. > + /** > + * @destroy: Cleanup any stream specific resources. > * > * The stream will always be disabled before this is called. > */ > void (*destroy)(struct i915_perf_stream *stream); > }; > > +/** > + * struct i915_perf_stream - state for a single open stream FD > + */ > struct i915_perf_stream { > + /** > + * @dev_priv: i915 drm device > + */ > struct drm_i915_private *dev_priv; > > + /** > + * @link: Links the stream into ``&drm_i915_private->streams`` > + */ > struct list_head link; > > + /** > + * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` > + * properties given when opening a stream, representing the contents > + * of a single sample as read() by userspace. > + */ > u32 sample_flags; > + > + /** > + * @sample_size: Considering the configured contents of a sample > + * combined with the required header size, this is the total size > + * of a single sample record. > + */ > int sample_size; > > + /** > + * @ctx: %NULL if measuring system-wide across all contexts or a > + * specific context that is being monitored. > + */ > struct i915_gem_context *ctx; > + > + /** > + * @enabled: Whether the stream is currently enabled, considering > + * whether the stream was opened in a disabled state and based > + * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls. > + */ > bool enabled; > > + /** > + * @ops: The callbacks providing the implementation of this specific > + * type of configured stream. > + */ > const struct i915_perf_stream_ops *ops; > }; > > +/** > + * struct i915_oa_ops - Gen specific implementation of an OA unit stream > + */ > struct i915_oa_ops { > + /** > + * @init_oa_buffer: Resets the head and tail pointers of the > + * circular buffer for periodic OA reports. > + * > + * Called when first opening a stream for OA metrics, but also may be > + * called in response to an OA buffer overflow or other error > + * condition. > + * > + * Note it may be necessary to clear the full OA buffer here as part of > + * maintaining the invariable that new reports must be written to > + * zeroed memory for us to be able to reliable detect if an expected > + * report has not yet landed in memory. (At least on Haswell the OA > + * buffer tail pointer is not synchronized with reports being visible > + * to the CPU) > + */ > void (*init_oa_buffer)(struct drm_i915_private *dev_priv); > + > + /** > + * @enable_metric_set: Applies any MUX configuration to set up the > + * Boolean and Custom (B/C) counters that are part of the counter > + * reports being sampled. May apply system constraints such as > + * disabling EU clock gating as required. > + */ > int (*enable_metric_set)(struct drm_i915_private *dev_priv); > + > + /** > + * @disable_metric_set: Remove system constraints associated with using > + * the OA unit. > + */ > void (*disable_metric_set)(struct drm_i915_private *dev_priv); > + > + /** > + * @oa_enable: Enable periodic sampling > + */ > void (*oa_enable)(struct drm_i915_private *dev_priv); > + > + /** > + * @oa_disable: Disable periodic sampling > + */ > void (*oa_disable)(struct drm_i915_private *dev_priv); > - void (*update_oacontrol)(struct drm_i915_private *dev_priv); > - void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, > - u32 ctx_id); > + > + /** > + * @read: Copy data from the circular OA buffer into a given userspace > + * buffer. > + */ > int (*read)(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > size_t *offset); > + > + /** > + * @oa_buffer_is_empty: Check if OA buffer empty (false positives OK) > + * > + * This is either called via fops or the poll check hrtimer (atomic > + * ctx) without any locks taken. > + * > + * It's safe to read OA config state here unlocked, assuming that this > + * is only called while the stream is enabled, while the global OA > + * configuration can't be modified. > + * > + * Efficiency is more important than avoiding some false positives > + * here, which will be handled gracefully - likely resulting in an > + * EAGAIN error for userspace. > + */ > bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); > }; > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index 68b7c27..0be8cef 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -26,7 +26,10 @@ > > > /** > - * DOC: i915 Perf, streaming API for GPU metrics > + * DOC: i915 Perf Overview > + * > + * Overview > + * -------- > * > * Gen graphics supports a large number of performance counters that can help > * driver and application developers understand and optimize their use of the > @@ -45,6 +48,13 @@ > * privileges by default, unless changed via the dev.i915.perf_event_paranoid > * sysctl option. > * > + */ > + > +/** > + * DOC: i915 Perf History and Comparison with Core Perf > + * > + * Comparison with Core Perf > + * ------------------------- > * > * The interface was initially inspired by the core Perf infrastructure but > * some notable differences are: > @@ -75,8 +85,8 @@ > * gets copied from the GPU mapped buffers to userspace buffers. > * > * > - * Some notes regarding Linux Perf: > - * -------------------------------- > + * Issues hit with first prototype based on Core Perf > + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > * > * The first prototype of this driver was based on the core perf > * infrastructure, and while we did make that mostly work, with some changes to > @@ -135,7 +145,7 @@ > * for combining with the side-band raw reports it captures using > * MI_REPORT_PERF_COUNT commands. > * > - * _ As a side note on perf's grouping feature; there was also some concern > + * - As a side note on perf's grouping feature; there was also some concern > * that using PERF_FORMAT_GROUP as a way to pack together counter values > * would quite drastically inflate our sample sizes, which would likely > * lower the effective sampling resolutions we could use when the available > @@ -277,6 +287,20 @@ static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = { > > #define SAMPLE_OA_REPORT (1<<0) > > +/** > + * struct perf_open_properties - for validated properties given to open a stream > + * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags > + * @single_context: Whether a single or all gpu contexts should be monitored > + * @ctx_handle: A gem ctx handle for use with @single_context > + * @metrics_set: An ID for an OA unit metric set advertised via sysfs > + * @oa_format: An OA unit HW report format > + * @oa_periodic: Whether to enable periodic OA unit sampling > + * @oa_period_exponent: The OA unit sampling period is derived from this > + * > + * As read_properties_unlocked() enumerates and validates the properties given > + * to open a stream of metrics the configuration is built up in the structure > + * which starts out zero initialized. > + */ > struct perf_open_properties { > u32 sample_flags; > > @@ -314,7 +338,19 @@ static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_pr > } > > /** > - * Appends a status record to a userspace read() buffer. > + * append_oa_status - Appends a status record to a userspace read() buffer. > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * @type: The kind of status to report to userspace > + * > + * Writes a status record (such as `DRM_I915_PERF_RECORD_OA_REPORT_LOST`) > + * into the userspace read() buffer. > + * > + * The @buf @offset will only be updated on success. > + * > + * Returns: 0 on success, negative error code on failure. > */ > static int append_oa_status(struct i915_perf_stream *stream, > char __user *buf, > @@ -336,7 +372,21 @@ static int append_oa_status(struct i915_perf_stream *stream, > } > > /** > - * Copies single OA report into userspace read() buffer. > + * append_oa_sample - Copies single OA report into userspace read() buffer. > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * @report: A single OA report to (optionally) include as part of the sample > + * > + * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*` > + * properties when opening a stream, tracked as `stream->sample_flags`. This > + * function copies the requested components of a single sample to the given > + * read() @buf. > + * > + * The @buf @offset will only be updated on success. > + * > + * Returns: 0 on success, negative error code on failure. > */ > static int append_oa_sample(struct i915_perf_stream *stream, > char __user *buf, > @@ -380,8 +430,6 @@ static int append_oa_sample(struct i915_perf_stream *stream, > * @head_ptr: (inout): the current oa buffer cpu read position > * @tail: the current oa buffer gpu write position > * > - * Returns 0 on success, negative error code on failure. > - * > * Notably any error condition resulting in a short read (-ENOSPC or > * -EFAULT) will be returned even though one or more records may > * have been successfully copied. In this case it's up to the caller > @@ -392,6 +440,8 @@ static int append_oa_sample(struct i915_perf_stream *stream, > * tail, so the head chases the tail?... If you think that's mad > * and back-to-front you're not alone, but this follows the > * Gen PRM naming convention. > + * > + * Returns: 0 on success, negative error code on failure. > */ > static int gen7_append_oa_reports(struct i915_perf_stream *stream, > char __user *buf, > @@ -496,6 +546,22 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, > return ret; > } > > +/** > + * gen7_oa_read - copy status records then buffered OA reports > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * > + * Checks Gen 7 specific OA unit status registers and if necessary appends > + * corresponding status records for userspace (such as for a buffer full > + * condition) and then initiate appending any buffered OA reports. > + * > + * Updates @offset according to the number of bytes successfully copied into > + * the userspace buffer. > + * > + * Returns: zero on success or a negative error code > + */ > static int gen7_oa_read(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > @@ -597,6 +663,20 @@ static int gen7_oa_read(struct i915_perf_stream *stream, > return ret; > } > > +/** > + * i915_oa_wait_unlocked - handles blocking IO until OA data available > + * @stream: An i915-perf stream opened for OA metrics > + * > + * Called when userspace tries to read() from a blocking stream FD opened > + * for OA metrics. It waits until the hrtimer callback finds a non-empty > + * OA buffer and wakes us. > + * > + * Note: it's acceptable to have this return with some false positives > + * since any subsequent read handling will return EAGAIN if there isn't s/EAGAIN/-EAGAIN/ > + * really data ready for userspace yet. > + * > + * Returns: zero on success or a negative error code > + */ > static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -615,6 +695,16 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) > !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv)); > } > > +/** > + * i915_oa_poll_wait - call poll_wait() for an OA stream poll() > + * @stream: An i915-perf stream opened for OA metrics > + * @file: An i915 perf stream file > + * @wait: poll() state table > + * > + * For handling userspace polling on an i915 perf stream opened for OA metrics, > + * this starts a poll_wait with the wait queue that our hrtimer callback wakes > + * when it sees data ready to read in the circular OA buffer. > + */ > static void i915_oa_poll_wait(struct i915_perf_stream *stream, > struct file *file, > poll_table *wait) > @@ -624,6 +714,18 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream, > poll_wait(file, &dev_priv->perf.oa.poll_wq, wait); > } > > +/** > + * i915_oa_read - just calls through to ``&i915_oa_ops->read`` I've got no idea what ``&i915_oa_ops->read`` is supposed to generate, but it spews out :c:type:`i915_oa_ops->read <i915_oa_ops>` for me... > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * > + * Updates @offset according to the number of bytes successfully copied into > + * the userspace buffer. > + * > + * Returns: zero on success or a negative error code > + */ > static int i915_oa_read(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > @@ -634,9 +736,15 @@ static int i915_oa_read(struct i915_perf_stream *stream, > return dev_priv->perf.oa.ops.read(stream, buf, count, offset); > } > > -/* Determine the render context hw id, and ensure it remains fixed for the > +/** > + * oa_get_render_ctx_id - determine and hold ctx hw id > + * @stream: An i915-perf stream opened for OA metrics > + * > + * Determine the render context hw id, and ensure it remains fixed for the > * lifetime of the stream. This ensures that we don't have to worry about > * updating the context ID in OACONTROL on the fly. > + * > + * Returns: zero on success or a negative error code > */ > static int oa_get_render_ctx_id(struct i915_perf_stream *stream) > { > @@ -673,6 +781,13 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) > return ret; > } > > +/** > + * oa_put_render_ctx_id - counterpart to oa_get_render_ctx_id releases hold > + * @stream: An i915-perf stream opened for OA metrics > + * > + * In case anything needed doing to ensure the context HW ID would remain valid > + * for the lifetime of the stream, then that can be undone here. > + */ > static void oa_put_render_ctx_id(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -945,6 +1060,15 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv) > spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags); > } > > +/** > + * i915_oa_stream_enable - handle I915_PERF_IOCTL_ENABLE for OA stream > + * @stream: An i915 perf stream opened for OA metrics > + * > + * [Re]enables hardware periodic sampling according to the period configured > + * when opening the stream. This also starts a hrtimer that will periodically > + * check for data in the circular OA buffer for notifying userspace (e.g. > + * during a read() or poll()). > + */ > static void i915_oa_stream_enable(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -962,6 +1086,14 @@ static void gen7_oa_disable(struct drm_i915_private *dev_priv) > I915_WRITE(GEN7_OACONTROL, 0); > } > > +/** > + * i915_oa_stream_disable - handle I915_PERF_IOCTL_DISABLE for OA stream > + * @stream: An i915 perf stream opened for OA metrics > + * > + * Stops the OA unit from periodically writing counter reports into the > + * circular OA buffer. This also stops the hrtimer that periodically checks for > + * data in the circular OA buffer, for notifying userspace. > + */ > static void i915_oa_stream_disable(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -987,6 +1119,24 @@ static const struct i915_perf_stream_ops i915_oa_stream_ops = { > .read = i915_oa_read, > }; > > +/** > + * i915_oa_stream_init - validate combined props for OA stream and init > + * @stream: An i915 perf stream > + * @param: The open parameters passed to 'DRM_I915_PERF_OPEN` > + * @props: The property state that configures stream (individually validated) > + * > + * While read_properties_unlocked() validates properties in isolation it > + * doesn't ensure that the combination necessarily makes sense. > + * > + * At this point it has been determined that userspace wants a stream of > + * OA metrics, but still we need to further validate the combined > + * properties are OK. > + * > + * If the configuration makes sense then we can allocate memory for > + * a circular OA buffer and apply the requested metric set configuration. > + * > + * Returns: zero on success or a negative error code. > + */ > static int i915_oa_stream_init(struct i915_perf_stream *stream, > struct drm_i915_perf_open_param *param, > struct perf_open_properties *props) > @@ -1110,6 +1260,31 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, > return ret; > } > > +/** > + * i915_perf_read_locked - wrap stream->ops->read with error notmalisation notmalisation? normalisation? nationalisation? > + * @stream: An i915 perf stream > + * @file: An i915 perf stream file > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @ppos: (inout) file seek position (unused) > + * > + * Besides wrapping stream->ops->read() this provides a common place to ensure This generates stream->ops->:c:func:read() for me...hmm, so maybe we don't want the brackets ? > + * that if we've successfully copied any data then reporting that takes > + * precedence over any internal error status, so the data isn't lost. > + * > + * For example ret will be -ENOSPC whenever there is more buffered data than > + * can be copied to userspace, but that's only interesting if we weren't able > + * to copy some data because it implies the userspace buffer is too small to > + * receive a single record (and we never split records). > + * > + * Another case with ret == -EFAULT is more of a grey area since it would seem > + * like bad form for userspace to ask us to overrun its buffer, but the user > + * knows best: > + * > + * http://yarchive.net/comp/linux/partial_reads_writes.html > + * > + * Returns: The number of bytes copied or a negative error code on failure. > + */ > static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, > struct file *file, > char __user *buf, > @@ -1125,25 +1300,27 @@ static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, > size_t offset = 0; > int ret = stream->ops->read(stream, buf, count, &offset); > > - /* If we've successfully copied any data then reporting that > - * takes precedence over any internal error status, so the > - * data isn't lost. > - * > - * For example ret will be -ENOSPC whenever there is more > - * buffered data than can be copied to userspace, but that's > - * only interesting if we weren't able to copy some data > - * because it implies the userspace buffer is too small to > - * receive a single record (and we never split records). > - * > - * Another case with ret == -EFAULT is more of a grey area > - * since it would seem like bad form for userspace to ask us > - * to overrun its buffer, but the user knows best: > - * > - * http://yarchive.net/comp/linux/partial_reads_writes.html > - */ > return offset ?: (ret ?: -EAGAIN); > } > > +/** > + * i915_perf_read - handles read() FOP for i915 perf stream FDs > + * @file: An i915 perf stream file > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @ppos: (inout) file seek position (unused) > + * > + * The entry point for handling a read() on a stream file descriptor from > + * userspace. Most of the work is left to the i915_perf_read_locked() and > + * stream->op->read() but to save having stream implementations (of which s/stream->op->read()/stream->ops->read > + * we might have multiple later) we handle blocking read here. > + * > + * We can also consistently treat trying to read from a disable stream s/disable/disabled > + * as an IO error so implementations can assume the stream is enabled > + * while reading. > + * > + * Returns: The number of bytes copied or a negative error code on failure. > + */ > static ssize_t i915_perf_read(struct file *file, > char __user *buf, > size_t count, > @@ -1458,12 +1635,20 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, > return ret; > } > > -/* Note we copy the properties from userspace outside of the i915 perf > - * mutex to avoid an awkward lockdep with mmap_sem. > +/** > + * read_properties_unlocked - validate + copy userspace stream open properties > + * @dev_priv: i915 device instance > + * @uprops: The array of u64 key value pairs given by userspace > + * @n_props: The number of key value pairs expected in @uprops > + * @props: The stream configuration built up while validating properties > * > * Note this function only validates properties in isolation it doesn't > * validate that the combination of properties makes sense or that all > * properties necessary for a particular kind of stream have been set. > + * > + * Note that there currently aren't any ordering requirements for properties so > + * we shouldn't validate or assume anything about ordering here. This doesn't > + * rule out defining new properties with ordering requirements in the future. > */ > static int read_properties_unlocked(struct drm_i915_private *dev_priv, > u64 __user *uprops, > @@ -1585,6 +1770,26 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, > return 0; > } > > +/** > + * i915_perf_open_ioctl - DRM ioctl() for userspace to open a stream FD > + * @dev: drm device > + * @data: ioctl data copied from userspace (unvalidated) > + * @file: drm file > + * > + * Validates the stream open parameters given by userspace including flags > + * and an array of u64 key, value pair properties. > + * > + * Very little is assumed up front about the nature of the stream being > + * opened (for instance we don't assume it's for periodic OA unit metrics). An > + * i915-perf stream is expected to be a suitable interface for other forms of > + * buffered data written by the GPU besides periodic OA metrics. > + * > + * Note we copy the properties from userspace outside of the i915 perf > + * mutex to avoid an awkward lockdep with mmap_sem. > + * > + * Return: A newly opened i915 Perf stream file descriptor or negative > + * error code on failure. > + */ > int i915_perf_open_ioctl(struct drm_device *dev, void *data, > struct drm_file *file) > { > @@ -1621,6 +1826,14 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data, > return ret; > } > > +/** > + * i915_perf_register - exposes i915-perf to userspace > + * @dev_priv: i915 device instance > + * > + * In particular OA metric sets are advertised under a sysfs metrics/ > + * directory allowing userspace to enumerate valid IDs that can be > + * used to open an i915-perf stream. > + */ > void i915_perf_register(struct drm_i915_private *dev_priv) > { > if (!IS_HASWELL(dev_priv)) > @@ -1650,6 +1863,15 @@ void i915_perf_register(struct drm_i915_private *dev_priv) > mutex_unlock(&dev_priv->perf.lock); > } > > +/** > + * i915_perf_register - hide i915-perf from userspace i915_perf_unregister > + * @dev_priv: i915 device instance > + * > + * i915-perf state cleanup is split up into an 'unregister' and > + * 'deinit' phase where the interface is first hidden from > + * userspace by i915_perf_unregister() before cleaning up > + * remaining state in i915_perf_fini(). > + */ > void i915_perf_unregister(struct drm_i915_private *dev_priv) > { > if (!IS_HASWELL(dev_priv)) > @@ -1706,6 +1928,15 @@ static struct ctl_table dev_root[] = { > {} > }; > > +/** > + * i915_perf_init - initialize i915-perf state on module load > + * @dev_priv: i915 device instance > + * > + * Initializes state i915-perf state without exposing anything to userspace. s/state i915-perf state/i915-perf state/ Other than that it all looks good.
On Tue, Nov 29, 2016 at 05:00:55PM +0000, Robert Bragg wrote: > This adds a 'Perf' section to i915.rst with the following sub sections: > - Overview > - Comparison with Core Perf > - i915 Driver Entry Points > - i915 Perf Stream > - i915 Perf Observation Architecture Stream > - All i915 Perf Internals > > Signed-off-by: Robert Bragg <robert@sixbynine.org> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Two style bikesheds below, feel free to ignore. > --- > Documentation/gpu/i915.rst | 92 +++++++++++++ > drivers/gpu/drm/i915/i915_drv.h | 151 ++++++++++++++++---- > drivers/gpu/drm/i915/i915_perf.c | 289 +++++++++++++++++++++++++++++++++++---- > 3 files changed, 478 insertions(+), 54 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 117d2ab..714bd4b 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -356,4 +356,96 @@ switch_mm > .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h > :doc: switch_mm tracepoint > > +Perf > +==== > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :doc: i915 Perf Overview > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :doc: i915 Perf History and Comparison with Core Perf > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :doc: i915 Perf File Operations You have the headings in the DOC comments itself, which works until someone reorganizes stuff. Then it tends to fall apart badly. > + > +i915 Driver Entry Points > +------------------------ > + > +This section covers the entrypoints exported outside of i915_perf.c to > +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_init > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_fini > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_register > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_unregister > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_open_ioctl > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_release One potential issue with listing everything explicitly is that if someone ever (and this is bound to happen) adds a new function, they'll forget to add it. Hence we just pull them all in, and if you want to refernce some specifically, do that in the overview sections. And also sprinkle lots of cross-references all over to make groups of functions easier to discover. But in the end your docs, your turf ;-) -Daniel > + > +i915 Perf Stream > +---------------- > + > +This section covers the stream-semantics-agnostic structures and functions > +for representing an i915 perf stream FD and associated file operations. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > + :functions: i915_perf_stream > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > + :functions: i915_perf_stream_ops > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: read_properties_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_open_ioctl_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_destroy_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_read > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_ioctl > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_enable_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_disable_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_poll > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_perf_poll_unlocked > + > +i915 Perf Observation Architecture Stream > +----------------------------------------- > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: OA_BUFFER_SIZE > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > + :functions: i915_oa_ops > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_stream_init > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_read > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_stream_enable > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_stream_disable > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_wait_unlocked > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :functions: i915_oa_poll_wait > + > +All i915 Perf Internals > +----------------------- > + > +This section simply includes all currently documented i915 perf internals, in > +no particular order, but may include some more minor utilities or platform > +specific details than found in the more high-level sections. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > + :internal: > + > .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 1ec9619..9f92755 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1827,89 +1827,186 @@ struct i915_oa_reg { > > struct i915_perf_stream; > > +/** > + * struct i915_perf_stream_ops - the OPs to support a specific stream type > + */ > struct i915_perf_stream_ops { > - /* Enables the collection of HW samples, either in response to > - * I915_PERF_IOCTL_ENABLE or implicitly called when stream is > - * opened without I915_PERF_FLAG_DISABLED. > + /** > + * @enable: Enables the collection of HW samples, either in response to > + * I915_PERF_IOCTL_ENABLE or implicitly called when stream is opened > + * without I915_PERF_FLAG_DISABLED. > */ > void (*enable)(struct i915_perf_stream *stream); > > - /* Disables the collection of HW samples, either in response to > - * I915_PERF_IOCTL_DISABLE or implicitly called before > - * destroying the stream. > + /** > + * @disable: Disables the collection of HW samples, either in response > + * to I915_PERF_IOCTL_DISABLE or implicitly called before destroying > + * the stream. > */ > void (*disable)(struct i915_perf_stream *stream); > > - /* Call poll_wait, passing a wait queue that will be woken > + /** > + * @poll_wait: Call poll_wait, passing a wait queue that will be woken > * once there is something ready to read() for the stream > */ > void (*poll_wait)(struct i915_perf_stream *stream, > struct file *file, > poll_table *wait); > > - /* For handling a blocking read, wait until there is something > - * to ready to read() for the stream. E.g. wait on the same > + /** > + * @wait_unlocked: For handling a blocking read, wait until there is > + * something to ready to read() for the stream. E.g. wait on the same > * wait queue that would be passed to poll_wait(). > */ > int (*wait_unlocked)(struct i915_perf_stream *stream); > > - /* read - Copy buffered metrics as records to userspace > - * @buf: the userspace, destination buffer > - * @count: the number of bytes to copy, requested by userspace > - * @offset: zero at the start of the read, updated as the read > - * proceeds, it represents how many bytes have been > - * copied so far and the buffer offset for copying the > - * next record. > + /** > + * @read: Copy buffered metrics as records to userspace > + * **buf**: the userspace, destination buffer > + * **count**: the number of bytes to copy, requested by userspace > + * **offset**: zero at the start of the read, updated as the read > + * proceeds, it represents how many bytes have been copied so far and > + * the buffer offset for copying the next record. > * > - * Copy as many buffered i915 perf samples and records for > - * this stream to userspace as will fit in the given buffer. > + * Copy as many buffered i915 perf samples and records for this stream > + * to userspace as will fit in the given buffer. > * > - * Only write complete records; returning -ENOSPC if there > - * isn't room for a complete record. > + * Only write complete records; returning -ENOSPC if there isn't room > + * for a complete record. > * > - * Return any error condition that results in a short read > - * such as -ENOSPC or -EFAULT, even though these may be > - * squashed before returning to userspace. > + * Return any error condition that results in a short read such as > + * -ENOSPC or -EFAULT, even though these may be squashed before > + * returning to userspace. > */ > int (*read)(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > size_t *offset); > > - /* Cleanup any stream specific resources. > + /** > + * @destroy: Cleanup any stream specific resources. > * > * The stream will always be disabled before this is called. > */ > void (*destroy)(struct i915_perf_stream *stream); > }; > > +/** > + * struct i915_perf_stream - state for a single open stream FD > + */ > struct i915_perf_stream { > + /** > + * @dev_priv: i915 drm device > + */ > struct drm_i915_private *dev_priv; > > + /** > + * @link: Links the stream into ``&drm_i915_private->streams`` > + */ > struct list_head link; > > + /** > + * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` > + * properties given when opening a stream, representing the contents > + * of a single sample as read() by userspace. > + */ > u32 sample_flags; > + > + /** > + * @sample_size: Considering the configured contents of a sample > + * combined with the required header size, this is the total size > + * of a single sample record. > + */ > int sample_size; > > + /** > + * @ctx: %NULL if measuring system-wide across all contexts or a > + * specific context that is being monitored. > + */ > struct i915_gem_context *ctx; > + > + /** > + * @enabled: Whether the stream is currently enabled, considering > + * whether the stream was opened in a disabled state and based > + * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls. > + */ > bool enabled; > > + /** > + * @ops: The callbacks providing the implementation of this specific > + * type of configured stream. > + */ > const struct i915_perf_stream_ops *ops; > }; > > +/** > + * struct i915_oa_ops - Gen specific implementation of an OA unit stream > + */ > struct i915_oa_ops { > + /** > + * @init_oa_buffer: Resets the head and tail pointers of the > + * circular buffer for periodic OA reports. > + * > + * Called when first opening a stream for OA metrics, but also may be > + * called in response to an OA buffer overflow or other error > + * condition. > + * > + * Note it may be necessary to clear the full OA buffer here as part of > + * maintaining the invariable that new reports must be written to > + * zeroed memory for us to be able to reliable detect if an expected > + * report has not yet landed in memory. (At least on Haswell the OA > + * buffer tail pointer is not synchronized with reports being visible > + * to the CPU) > + */ > void (*init_oa_buffer)(struct drm_i915_private *dev_priv); > + > + /** > + * @enable_metric_set: Applies any MUX configuration to set up the > + * Boolean and Custom (B/C) counters that are part of the counter > + * reports being sampled. May apply system constraints such as > + * disabling EU clock gating as required. > + */ > int (*enable_metric_set)(struct drm_i915_private *dev_priv); > + > + /** > + * @disable_metric_set: Remove system constraints associated with using > + * the OA unit. > + */ > void (*disable_metric_set)(struct drm_i915_private *dev_priv); > + > + /** > + * @oa_enable: Enable periodic sampling > + */ > void (*oa_enable)(struct drm_i915_private *dev_priv); > + > + /** > + * @oa_disable: Disable periodic sampling > + */ > void (*oa_disable)(struct drm_i915_private *dev_priv); > - void (*update_oacontrol)(struct drm_i915_private *dev_priv); > - void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, > - u32 ctx_id); > + > + /** > + * @read: Copy data from the circular OA buffer into a given userspace > + * buffer. > + */ > int (*read)(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > size_t *offset); > + > + /** > + * @oa_buffer_is_empty: Check if OA buffer empty (false positives OK) > + * > + * This is either called via fops or the poll check hrtimer (atomic > + * ctx) without any locks taken. > + * > + * It's safe to read OA config state here unlocked, assuming that this > + * is only called while the stream is enabled, while the global OA > + * configuration can't be modified. > + * > + * Efficiency is more important than avoiding some false positives > + * here, which will be handled gracefully - likely resulting in an > + * EAGAIN error for userspace. > + */ > bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); > }; > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index 68b7c27..0be8cef 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -26,7 +26,10 @@ > > > /** > - * DOC: i915 Perf, streaming API for GPU metrics > + * DOC: i915 Perf Overview > + * > + * Overview > + * -------- > * > * Gen graphics supports a large number of performance counters that can help > * driver and application developers understand and optimize their use of the > @@ -45,6 +48,13 @@ > * privileges by default, unless changed via the dev.i915.perf_event_paranoid > * sysctl option. > * > + */ > + > +/** > + * DOC: i915 Perf History and Comparison with Core Perf > + * > + * Comparison with Core Perf > + * ------------------------- > * > * The interface was initially inspired by the core Perf infrastructure but > * some notable differences are: > @@ -75,8 +85,8 @@ > * gets copied from the GPU mapped buffers to userspace buffers. > * > * > - * Some notes regarding Linux Perf: > - * -------------------------------- > + * Issues hit with first prototype based on Core Perf > + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > * > * The first prototype of this driver was based on the core perf > * infrastructure, and while we did make that mostly work, with some changes to > @@ -135,7 +145,7 @@ > * for combining with the side-band raw reports it captures using > * MI_REPORT_PERF_COUNT commands. > * > - * _ As a side note on perf's grouping feature; there was also some concern > + * - As a side note on perf's grouping feature; there was also some concern > * that using PERF_FORMAT_GROUP as a way to pack together counter values > * would quite drastically inflate our sample sizes, which would likely > * lower the effective sampling resolutions we could use when the available > @@ -277,6 +287,20 @@ static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = { > > #define SAMPLE_OA_REPORT (1<<0) > > +/** > + * struct perf_open_properties - for validated properties given to open a stream > + * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags > + * @single_context: Whether a single or all gpu contexts should be monitored > + * @ctx_handle: A gem ctx handle for use with @single_context > + * @metrics_set: An ID for an OA unit metric set advertised via sysfs > + * @oa_format: An OA unit HW report format > + * @oa_periodic: Whether to enable periodic OA unit sampling > + * @oa_period_exponent: The OA unit sampling period is derived from this > + * > + * As read_properties_unlocked() enumerates and validates the properties given > + * to open a stream of metrics the configuration is built up in the structure > + * which starts out zero initialized. > + */ > struct perf_open_properties { > u32 sample_flags; > > @@ -314,7 +338,19 @@ static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_pr > } > > /** > - * Appends a status record to a userspace read() buffer. > + * append_oa_status - Appends a status record to a userspace read() buffer. > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * @type: The kind of status to report to userspace > + * > + * Writes a status record (such as `DRM_I915_PERF_RECORD_OA_REPORT_LOST`) > + * into the userspace read() buffer. > + * > + * The @buf @offset will only be updated on success. > + * > + * Returns: 0 on success, negative error code on failure. > */ > static int append_oa_status(struct i915_perf_stream *stream, > char __user *buf, > @@ -336,7 +372,21 @@ static int append_oa_status(struct i915_perf_stream *stream, > } > > /** > - * Copies single OA report into userspace read() buffer. > + * append_oa_sample - Copies single OA report into userspace read() buffer. > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * @report: A single OA report to (optionally) include as part of the sample > + * > + * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*` > + * properties when opening a stream, tracked as `stream->sample_flags`. This > + * function copies the requested components of a single sample to the given > + * read() @buf. > + * > + * The @buf @offset will only be updated on success. > + * > + * Returns: 0 on success, negative error code on failure. > */ > static int append_oa_sample(struct i915_perf_stream *stream, > char __user *buf, > @@ -380,8 +430,6 @@ static int append_oa_sample(struct i915_perf_stream *stream, > * @head_ptr: (inout): the current oa buffer cpu read position > * @tail: the current oa buffer gpu write position > * > - * Returns 0 on success, negative error code on failure. > - * > * Notably any error condition resulting in a short read (-ENOSPC or > * -EFAULT) will be returned even though one or more records may > * have been successfully copied. In this case it's up to the caller > @@ -392,6 +440,8 @@ static int append_oa_sample(struct i915_perf_stream *stream, > * tail, so the head chases the tail?... If you think that's mad > * and back-to-front you're not alone, but this follows the > * Gen PRM naming convention. > + * > + * Returns: 0 on success, negative error code on failure. > */ > static int gen7_append_oa_reports(struct i915_perf_stream *stream, > char __user *buf, > @@ -496,6 +546,22 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, > return ret; > } > > +/** > + * gen7_oa_read - copy status records then buffered OA reports > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * > + * Checks Gen 7 specific OA unit status registers and if necessary appends > + * corresponding status records for userspace (such as for a buffer full > + * condition) and then initiate appending any buffered OA reports. > + * > + * Updates @offset according to the number of bytes successfully copied into > + * the userspace buffer. > + * > + * Returns: zero on success or a negative error code > + */ > static int gen7_oa_read(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > @@ -597,6 +663,20 @@ static int gen7_oa_read(struct i915_perf_stream *stream, > return ret; > } > > +/** > + * i915_oa_wait_unlocked - handles blocking IO until OA data available > + * @stream: An i915-perf stream opened for OA metrics > + * > + * Called when userspace tries to read() from a blocking stream FD opened > + * for OA metrics. It waits until the hrtimer callback finds a non-empty > + * OA buffer and wakes us. > + * > + * Note: it's acceptable to have this return with some false positives > + * since any subsequent read handling will return EAGAIN if there isn't > + * really data ready for userspace yet. > + * > + * Returns: zero on success or a negative error code > + */ > static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -615,6 +695,16 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) > !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv)); > } > > +/** > + * i915_oa_poll_wait - call poll_wait() for an OA stream poll() > + * @stream: An i915-perf stream opened for OA metrics > + * @file: An i915 perf stream file > + * @wait: poll() state table > + * > + * For handling userspace polling on an i915 perf stream opened for OA metrics, > + * this starts a poll_wait with the wait queue that our hrtimer callback wakes > + * when it sees data ready to read in the circular OA buffer. > + */ > static void i915_oa_poll_wait(struct i915_perf_stream *stream, > struct file *file, > poll_table *wait) > @@ -624,6 +714,18 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream, > poll_wait(file, &dev_priv->perf.oa.poll_wq, wait); > } > > +/** > + * i915_oa_read - just calls through to ``&i915_oa_ops->read`` > + * @stream: An i915-perf stream opened for OA metrics > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @offset: (inout): the current position for writing into @buf > + * > + * Updates @offset according to the number of bytes successfully copied into > + * the userspace buffer. > + * > + * Returns: zero on success or a negative error code > + */ > static int i915_oa_read(struct i915_perf_stream *stream, > char __user *buf, > size_t count, > @@ -634,9 +736,15 @@ static int i915_oa_read(struct i915_perf_stream *stream, > return dev_priv->perf.oa.ops.read(stream, buf, count, offset); > } > > -/* Determine the render context hw id, and ensure it remains fixed for the > +/** > + * oa_get_render_ctx_id - determine and hold ctx hw id > + * @stream: An i915-perf stream opened for OA metrics > + * > + * Determine the render context hw id, and ensure it remains fixed for the > * lifetime of the stream. This ensures that we don't have to worry about > * updating the context ID in OACONTROL on the fly. > + * > + * Returns: zero on success or a negative error code > */ > static int oa_get_render_ctx_id(struct i915_perf_stream *stream) > { > @@ -673,6 +781,13 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) > return ret; > } > > +/** > + * oa_put_render_ctx_id - counterpart to oa_get_render_ctx_id releases hold > + * @stream: An i915-perf stream opened for OA metrics > + * > + * In case anything needed doing to ensure the context HW ID would remain valid > + * for the lifetime of the stream, then that can be undone here. > + */ > static void oa_put_render_ctx_id(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -945,6 +1060,15 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv) > spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags); > } > > +/** > + * i915_oa_stream_enable - handle I915_PERF_IOCTL_ENABLE for OA stream > + * @stream: An i915 perf stream opened for OA metrics > + * > + * [Re]enables hardware periodic sampling according to the period configured > + * when opening the stream. This also starts a hrtimer that will periodically > + * check for data in the circular OA buffer for notifying userspace (e.g. > + * during a read() or poll()). > + */ > static void i915_oa_stream_enable(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -962,6 +1086,14 @@ static void gen7_oa_disable(struct drm_i915_private *dev_priv) > I915_WRITE(GEN7_OACONTROL, 0); > } > > +/** > + * i915_oa_stream_disable - handle I915_PERF_IOCTL_DISABLE for OA stream > + * @stream: An i915 perf stream opened for OA metrics > + * > + * Stops the OA unit from periodically writing counter reports into the > + * circular OA buffer. This also stops the hrtimer that periodically checks for > + * data in the circular OA buffer, for notifying userspace. > + */ > static void i915_oa_stream_disable(struct i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > @@ -987,6 +1119,24 @@ static const struct i915_perf_stream_ops i915_oa_stream_ops = { > .read = i915_oa_read, > }; > > +/** > + * i915_oa_stream_init - validate combined props for OA stream and init > + * @stream: An i915 perf stream > + * @param: The open parameters passed to 'DRM_I915_PERF_OPEN` > + * @props: The property state that configures stream (individually validated) > + * > + * While read_properties_unlocked() validates properties in isolation it > + * doesn't ensure that the combination necessarily makes sense. > + * > + * At this point it has been determined that userspace wants a stream of > + * OA metrics, but still we need to further validate the combined > + * properties are OK. > + * > + * If the configuration makes sense then we can allocate memory for > + * a circular OA buffer and apply the requested metric set configuration. > + * > + * Returns: zero on success or a negative error code. > + */ > static int i915_oa_stream_init(struct i915_perf_stream *stream, > struct drm_i915_perf_open_param *param, > struct perf_open_properties *props) > @@ -1110,6 +1260,31 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, > return ret; > } > > +/** > + * i915_perf_read_locked - wrap stream->ops->read with error notmalisation > + * @stream: An i915 perf stream > + * @file: An i915 perf stream file > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @ppos: (inout) file seek position (unused) > + * > + * Besides wrapping stream->ops->read() this provides a common place to ensure > + * that if we've successfully copied any data then reporting that takes > + * precedence over any internal error status, so the data isn't lost. > + * > + * For example ret will be -ENOSPC whenever there is more buffered data than > + * can be copied to userspace, but that's only interesting if we weren't able > + * to copy some data because it implies the userspace buffer is too small to > + * receive a single record (and we never split records). > + * > + * Another case with ret == -EFAULT is more of a grey area since it would seem > + * like bad form for userspace to ask us to overrun its buffer, but the user > + * knows best: > + * > + * http://yarchive.net/comp/linux/partial_reads_writes.html > + * > + * Returns: The number of bytes copied or a negative error code on failure. > + */ > static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, > struct file *file, > char __user *buf, > @@ -1125,25 +1300,27 @@ static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, > size_t offset = 0; > int ret = stream->ops->read(stream, buf, count, &offset); > > - /* If we've successfully copied any data then reporting that > - * takes precedence over any internal error status, so the > - * data isn't lost. > - * > - * For example ret will be -ENOSPC whenever there is more > - * buffered data than can be copied to userspace, but that's > - * only interesting if we weren't able to copy some data > - * because it implies the userspace buffer is too small to > - * receive a single record (and we never split records). > - * > - * Another case with ret == -EFAULT is more of a grey area > - * since it would seem like bad form for userspace to ask us > - * to overrun its buffer, but the user knows best: > - * > - * http://yarchive.net/comp/linux/partial_reads_writes.html > - */ > return offset ?: (ret ?: -EAGAIN); > } > > +/** > + * i915_perf_read - handles read() FOP for i915 perf stream FDs > + * @file: An i915 perf stream file > + * @buf: destination buffer given by userspace > + * @count: the number of bytes userspace wants to read > + * @ppos: (inout) file seek position (unused) > + * > + * The entry point for handling a read() on a stream file descriptor from > + * userspace. Most of the work is left to the i915_perf_read_locked() and > + * stream->op->read() but to save having stream implementations (of which > + * we might have multiple later) we handle blocking read here. > + * > + * We can also consistently treat trying to read from a disable stream > + * as an IO error so implementations can assume the stream is enabled > + * while reading. > + * > + * Returns: The number of bytes copied or a negative error code on failure. > + */ > static ssize_t i915_perf_read(struct file *file, > char __user *buf, > size_t count, > @@ -1458,12 +1635,20 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, > return ret; > } > > -/* Note we copy the properties from userspace outside of the i915 perf > - * mutex to avoid an awkward lockdep with mmap_sem. > +/** > + * read_properties_unlocked - validate + copy userspace stream open properties > + * @dev_priv: i915 device instance > + * @uprops: The array of u64 key value pairs given by userspace > + * @n_props: The number of key value pairs expected in @uprops > + * @props: The stream configuration built up while validating properties > * > * Note this function only validates properties in isolation it doesn't > * validate that the combination of properties makes sense or that all > * properties necessary for a particular kind of stream have been set. > + * > + * Note that there currently aren't any ordering requirements for properties so > + * we shouldn't validate or assume anything about ordering here. This doesn't > + * rule out defining new properties with ordering requirements in the future. > */ > static int read_properties_unlocked(struct drm_i915_private *dev_priv, > u64 __user *uprops, > @@ -1585,6 +1770,26 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, > return 0; > } > > +/** > + * i915_perf_open_ioctl - DRM ioctl() for userspace to open a stream FD > + * @dev: drm device > + * @data: ioctl data copied from userspace (unvalidated) > + * @file: drm file > + * > + * Validates the stream open parameters given by userspace including flags > + * and an array of u64 key, value pair properties. > + * > + * Very little is assumed up front about the nature of the stream being > + * opened (for instance we don't assume it's for periodic OA unit metrics). An > + * i915-perf stream is expected to be a suitable interface for other forms of > + * buffered data written by the GPU besides periodic OA metrics. > + * > + * Note we copy the properties from userspace outside of the i915 perf > + * mutex to avoid an awkward lockdep with mmap_sem. > + * > + * Return: A newly opened i915 Perf stream file descriptor or negative > + * error code on failure. > + */ > int i915_perf_open_ioctl(struct drm_device *dev, void *data, > struct drm_file *file) > { > @@ -1621,6 +1826,14 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data, > return ret; > } > > +/** > + * i915_perf_register - exposes i915-perf to userspace > + * @dev_priv: i915 device instance > + * > + * In particular OA metric sets are advertised under a sysfs metrics/ > + * directory allowing userspace to enumerate valid IDs that can be > + * used to open an i915-perf stream. > + */ > void i915_perf_register(struct drm_i915_private *dev_priv) > { > if (!IS_HASWELL(dev_priv)) > @@ -1650,6 +1863,15 @@ void i915_perf_register(struct drm_i915_private *dev_priv) > mutex_unlock(&dev_priv->perf.lock); > } > > +/** > + * i915_perf_register - hide i915-perf from userspace > + * @dev_priv: i915 device instance > + * > + * i915-perf state cleanup is split up into an 'unregister' and > + * 'deinit' phase where the interface is first hidden from > + * userspace by i915_perf_unregister() before cleaning up > + * remaining state in i915_perf_fini(). > + */ > void i915_perf_unregister(struct drm_i915_private *dev_priv) > { > if (!IS_HASWELL(dev_priv)) > @@ -1706,6 +1928,15 @@ static struct ctl_table dev_root[] = { > {} > }; > > +/** > + * i915_perf_init - initialize i915-perf state on module load > + * @dev_priv: i915 device instance > + * > + * Initializes state i915-perf state without exposing anything to userspace. > + * > + * Note: i915-perf initialization is split into an 'init' and 'register' > + * phase with the i915_perf_register() exposing state to userspace. > + */ > void i915_perf_init(struct drm_i915_private *dev_priv) > { > if (!IS_HASWELL(dev_priv)) > @@ -1741,6 +1972,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv) > dev_priv->perf.initialized = true; > } > > +/** > + * i915_perf_fini - Counter part to i915_perf_init() > + * @dev_priv: i915 device instance > + */ > void i915_perf_fini(struct drm_i915_private *dev_priv) > { > if (!dev_priv->perf.initialized) > -- > 2.10.2 >
On Wed, 30 Nov 2016, Daniel Vetter <daniel@ffwll.ch> wrote: > On Tue, Nov 29, 2016 at 05:00:55PM +0000, Robert Bragg wrote: >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c >> + :functions: i915_perf_init >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c >> + :functions: i915_perf_fini >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c >> + :functions: i915_perf_register >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c >> + :functions: i915_perf_unregister >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c >> + :functions: i915_perf_open_ioctl >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c >> + :functions: i915_perf_release > > One potential issue with listing everything explicitly is that if someone > ever (and this is bound to happen) adds a new function, they'll forget to > add it. Hence we just pull them all in, and if you want to refernce some > specifically, do that in the overview sections. One real issue with listing everything separately is that kernel-doc parses the source file once per every kernel-doc directive. Also, doesn't Sphinx complain about not having a blank line to end the indented block after the directive? It might not, but I thought it might. BR, Jani.
On Nov 30, 2016 19:41, "Daniel Vetter" <daniel@ffwll.ch> wrote: > > On Tue, Nov 29, 2016 at 05:00:55PM +0000, Robert Bragg wrote: > > This adds a 'Perf' section to i915.rst with the following sub sections: > > - Overview > > - Comparison with Core Perf > > - i915 Driver Entry Points > > - i915 Perf Stream > > - i915 Perf Observation Architecture Stream > > - All i915 Perf Internals > > > > Signed-off-by: Robert Bragg <robert@sixbynine.org> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > Two style bikesheds below, feel free to ignore. > > > --- > > Documentation/gpu/i915.rst | 92 +++++++++++++ > > drivers/gpu/drm/i915/i915_drv.h | 151 ++++++++++++++++---- > > drivers/gpu/drm/i915/i915_perf.c | 289 ++++++++++++++++++++++++++++++ +++++---- > > 3 files changed, 478 insertions(+), 54 deletions(-) > > > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > > index 117d2ab..714bd4b 100644 > > --- a/Documentation/gpu/i915.rst > > +++ b/Documentation/gpu/i915.rst > > @@ -356,4 +356,96 @@ switch_mm > > .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h > > :doc: switch_mm tracepoint > > > > +Perf > > +==== > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :doc: i915 Perf Overview > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :doc: i915 Perf History and Comparison with Core Perf > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :doc: i915 Perf File Operations > > You have the headings in the DOC comments itself, which works until > someone reorganizes stuff. Then it tends to fall apart badly. Yeah, could be better. > > > + > > +i915 Driver Entry Points > > +------------------------ > > + > > +This section covers the entrypoints exported outside of i915_perf.c to > > +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl. > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_init > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_fini > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_register > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_unregister > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_open_ioctl > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_release > > One potential issue with listing everything explicitly is that if someone > ever (and this is bound to happen) adds a new function, they'll forget to > add it. Hence we just pull them all in, and if you want to refernce some > specifically, do that in the overview sections. And also sprinkle lots of > cross-references all over to make groups of functions easier to discover. Without any structure it just didn't seem like documentation; just a dump of internals info which didn't look that helpful to me. There are some fairly noteworthy separations of responsibilities between functions providing the i915 perf stream infrastructure and then the code for OA unit streams, and then code specifically for Haswell. Splitting them into sections seems worthwhile to me. I don't think it's that big a deal listing symbols individually and maintaining this when adding new symbols. Having maintained Cogl using gtk-doc where symbols had to be explicitly listed in a -sections.txt file to added them to the documentation, that wasn't too bad. It's true that slip ups happen, but having control over the order symbols are presented seems good to me. It could be nice if an ordered list could be passed to :functions: to reduce how many times the corresponding perl script runs. It could maybe be good if there was some way to tag/label symbols in kerneldoc for selection in restructured text. It would also be nice if the tooling understood what symbols were in the document so far, so it could somehow be possible to have a dumping ground section for 'everything else' at the end. Regards, - Robert > > But in the end your docs, your turf ;-) > -Daniel > > > + > > +i915 Perf Stream > > +---------------- > > + > > +This section covers the stream-semantics-agnostic structures and functions > > +for representing an i915 perf stream FD and associated file operations. > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > > + :functions: i915_perf_stream > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > > + :functions: i915_perf_stream_ops > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: read_properties_unlocked > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_open_ioctl_unlocked > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_destroy_unlocked > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_read > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_ioctl > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_enable_unlocked > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_disable_unlocked > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_poll > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_perf_poll_unlocked > > + > > +i915 Perf Observation Architecture Stream > > +----------------------------------------- > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: OA_BUFFER_SIZE > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h > > + :functions: i915_oa_ops > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_oa_stream_init > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_oa_read > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_oa_stream_enable > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_oa_stream_disable > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_oa_wait_unlocked > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :functions: i915_oa_poll_wait > > + > > +All i915 Perf Internals > > +----------------------- > > + > > +This section simply includes all currently documented i915 perf internals, in > > +no particular order, but may include some more minor utilities or platform > > +specific details than found in the more high-level sections. > > + > > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > > + :internal: > > + > > .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > index 1ec9619..9f92755 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1827,89 +1827,186 @@ struct i915_oa_reg { > > > > struct i915_perf_stream; > > > > +/** > > + * struct i915_perf_stream_ops - the OPs to support a specific stream type > > + */ > > struct i915_perf_stream_ops { > > - /* Enables the collection of HW samples, either in response to > > - * I915_PERF_IOCTL_ENABLE or implicitly called when stream is > > - * opened without I915_PERF_FLAG_DISABLED. > > + /** > > + * @enable: Enables the collection of HW samples, either in response to > > + * I915_PERF_IOCTL_ENABLE or implicitly called when stream is opened > > + * without I915_PERF_FLAG_DISABLED. > > */ > > void (*enable)(struct i915_perf_stream *stream); > > > > - /* Disables the collection of HW samples, either in response to > > - * I915_PERF_IOCTL_DISABLE or implicitly called before > > - * destroying the stream. > > + /** > > + * @disable: Disables the collection of HW samples, either in response > > + * to I915_PERF_IOCTL_DISABLE or implicitly called before destroying > > + * the stream. > > */ > > void (*disable)(struct i915_perf_stream *stream); > > > > - /* Call poll_wait, passing a wait queue that will be woken > > + /** > > + * @poll_wait: Call poll_wait, passing a wait queue that will be woken > > * once there is something ready to read() for the stream > > */ > > void (*poll_wait)(struct i915_perf_stream *stream, > > struct file *file, > > poll_table *wait); > > > > - /* For handling a blocking read, wait until there is something > > - * to ready to read() for the stream. E.g. wait on the same > > + /** > > + * @wait_unlocked: For handling a blocking read, wait until there is > > + * something to ready to read() for the stream. E.g. wait on the same > > * wait queue that would be passed to poll_wait(). > > */ > > int (*wait_unlocked)(struct i915_perf_stream *stream); > > > > - /* read - Copy buffered metrics as records to userspace > > - * @buf: the userspace, destination buffer > > - * @count: the number of bytes to copy, requested by userspace > > - * @offset: zero at the start of the read, updated as the read > > - * proceeds, it represents how many bytes have been > > - * copied so far and the buffer offset for copying the > > - * next record. > > + /** > > + * @read: Copy buffered metrics as records to userspace > > + * **buf**: the userspace, destination buffer > > + * **count**: the number of bytes to copy, requested by userspace > > + * **offset**: zero at the start of the read, updated as the read > > + * proceeds, it represents how many bytes have been copied so far and > > + * the buffer offset for copying the next record. > > * > > - * Copy as many buffered i915 perf samples and records for > > - * this stream to userspace as will fit in the given buffer. > > + * Copy as many buffered i915 perf samples and records for this stream > > + * to userspace as will fit in the given buffer. > > * > > - * Only write complete records; returning -ENOSPC if there > > - * isn't room for a complete record. > > + * Only write complete records; returning -ENOSPC if there isn't room > > + * for a complete record. > > * > > - * Return any error condition that results in a short read > > - * such as -ENOSPC or -EFAULT, even though these may be > > - * squashed before returning to userspace. > > + * Return any error condition that results in a short read such as > > + * -ENOSPC or -EFAULT, even though these may be squashed before > > + * returning to userspace. > > */ > > int (*read)(struct i915_perf_stream *stream, > > char __user *buf, > > size_t count, > > size_t *offset); > > > > - /* Cleanup any stream specific resources. > > + /** > > + * @destroy: Cleanup any stream specific resources. > > * > > * The stream will always be disabled before this is called. > > */ > > void (*destroy)(struct i915_perf_stream *stream); > > }; > > > > +/** > > + * struct i915_perf_stream - state for a single open stream FD > > + */ > > struct i915_perf_stream { > > + /** > > + * @dev_priv: i915 drm device > > + */ > > struct drm_i915_private *dev_priv; > > > > + /** > > + * @link: Links the stream into ``&drm_i915_private->streams`` > > + */ > > struct list_head link; > > > > + /** > > + * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` > > + * properties given when opening a stream, representing the contents > > + * of a single sample as read() by userspace. > > + */ > > u32 sample_flags; > > + > > + /** > > + * @sample_size: Considering the configured contents of a sample > > + * combined with the required header size, this is the total size > > + * of a single sample record. > > + */ > > int sample_size; > > > > + /** > > + * @ctx: %NULL if measuring system-wide across all contexts or a > > + * specific context that is being monitored. > > + */ > > struct i915_gem_context *ctx; > > + > > + /** > > + * @enabled: Whether the stream is currently enabled, considering > > + * whether the stream was opened in a disabled state and based > > + * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls. > > + */ > > bool enabled; > > > > + /** > > + * @ops: The callbacks providing the implementation of this specific > > + * type of configured stream. > > + */ > > const struct i915_perf_stream_ops *ops; > > }; > > > > +/** > > + * struct i915_oa_ops - Gen specific implementation of an OA unit stream > > + */ > > struct i915_oa_ops { > > + /** > > + * @init_oa_buffer: Resets the head and tail pointers of the > > + * circular buffer for periodic OA reports. > > + * > > + * Called when first opening a stream for OA metrics, but also may be > > + * called in response to an OA buffer overflow or other error > > + * condition. > > + * > > + * Note it may be necessary to clear the full OA buffer here as part of > > + * maintaining the invariable that new reports must be written to > > + * zeroed memory for us to be able to reliable detect if an expected > > + * report has not yet landed in memory. (At least on Haswell the OA > > + * buffer tail pointer is not synchronized with reports being visible > > + * to the CPU) > > + */ > > void (*init_oa_buffer)(struct drm_i915_private *dev_priv); > > + > > + /** > > + * @enable_metric_set: Applies any MUX configuration to set up the > > + * Boolean and Custom (B/C) counters that are part of the counter > > + * reports being sampled. May apply system constraints such as > > + * disabling EU clock gating as required. > > + */ > > int (*enable_metric_set)(struct drm_i915_private *dev_priv); > > + > > + /** > > + * @disable_metric_set: Remove system constraints associated with using > > + * the OA unit. > > + */ > > void (*disable_metric_set)(struct drm_i915_private *dev_priv); > > + > > + /** > > + * @oa_enable: Enable periodic sampling > > + */ > > void (*oa_enable)(struct drm_i915_private *dev_priv); > > + > > + /** > > + * @oa_disable: Disable periodic sampling > > + */ > > void (*oa_disable)(struct drm_i915_private *dev_priv); > > - void (*update_oacontrol)(struct drm_i915_private *dev_priv); > > - void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, > > - u32 ctx_id); > > + > > + /** > > + * @read: Copy data from the circular OA buffer into a given userspace > > + * buffer. > > + */ > > int (*read)(struct i915_perf_stream *stream, > > char __user *buf, > > size_t count, > > size_t *offset); > > + > > + /** > > + * @oa_buffer_is_empty: Check if OA buffer empty (false positives OK) > > + * > > + * This is either called via fops or the poll check hrtimer (atomic > > + * ctx) without any locks taken. > > + * > > + * It's safe to read OA config state here unlocked, assuming that this > > + * is only called while the stream is enabled, while the global OA > > + * configuration can't be modified. > > + * > > + * Efficiency is more important than avoiding some false positives > > + * here, which will be handled gracefully - likely resulting in an > > + * EAGAIN error for userspace. > > + */ > > bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); > > }; > > > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > > index 68b7c27..0be8cef 100644 > > --- a/drivers/gpu/drm/i915/i915_perf.c > > +++ b/drivers/gpu/drm/i915/i915_perf.c > > @@ -26,7 +26,10 @@ > > > > > > /** > > - * DOC: i915 Perf, streaming API for GPU metrics > > + * DOC: i915 Perf Overview > > + * > > + * Overview > > + * -------- > > * > > * Gen graphics supports a large number of performance counters that can help > > * driver and application developers understand and optimize their use of the > > @@ -45,6 +48,13 @@ > > * privileges by default, unless changed via the dev.i915.perf_event_paranoid > > * sysctl option. > > * > > + */ > > + > > +/** > > + * DOC: i915 Perf History and Comparison with Core Perf > > + * > > + * Comparison with Core Perf > > + * ------------------------- > > * > > * The interface was initially inspired by the core Perf infrastructure but > > * some notable differences are: > > @@ -75,8 +85,8 @@ > > * gets copied from the GPU mapped buffers to userspace buffers. > > * > > * > > - * Some notes regarding Linux Perf: > > - * -------------------------------- > > + * Issues hit with first prototype based on Core Perf > > + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > * > > * The first prototype of this driver was based on the core perf > > * infrastructure, and while we did make that mostly work, with some changes to > > @@ -135,7 +145,7 @@ > > * for combining with the side-band raw reports it captures using > > * MI_REPORT_PERF_COUNT commands. > > * > > - * _ As a side note on perf's grouping feature; there was also some concern > > + * - As a side note on perf's grouping feature; there was also some concern > > * that using PERF_FORMAT_GROUP as a way to pack together counter values > > * would quite drastically inflate our sample sizes, which would likely > > * lower the effective sampling resolutions we could use when the available > > @@ -277,6 +287,20 @@ static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = { > > > > #define SAMPLE_OA_REPORT (1<<0) > > > > +/** > > + * struct perf_open_properties - for validated properties given to open a stream > > + * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags > > + * @single_context: Whether a single or all gpu contexts should be monitored > > + * @ctx_handle: A gem ctx handle for use with @single_context > > + * @metrics_set: An ID for an OA unit metric set advertised via sysfs > > + * @oa_format: An OA unit HW report format > > + * @oa_periodic: Whether to enable periodic OA unit sampling > > + * @oa_period_exponent: The OA unit sampling period is derived from this > > + * > > + * As read_properties_unlocked() enumerates and validates the properties given > > + * to open a stream of metrics the configuration is built up in the structure > > + * which starts out zero initialized. > > + */ > > struct perf_open_properties { > > u32 sample_flags; > > > > @@ -314,7 +338,19 @@ static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_pr > > } > > > > /** > > - * Appends a status record to a userspace read() buffer. > > + * append_oa_status - Appends a status record to a userspace read() buffer. > > + * @stream: An i915-perf stream opened for OA metrics > > + * @buf: destination buffer given by userspace > > + * @count: the number of bytes userspace wants to read > > + * @offset: (inout): the current position for writing into @buf > > + * @type: The kind of status to report to userspace > > + * > > + * Writes a status record (such as `DRM_I915_PERF_RECORD_OA_ REPORT_LOST`) > > + * into the userspace read() buffer. > > + * > > + * The @buf @offset will only be updated on success. > > + * > > + * Returns: 0 on success, negative error code on failure. > > */ > > static int append_oa_status(struct i915_perf_stream *stream, > > char __user *buf, > > @@ -336,7 +372,21 @@ static int append_oa_status(struct i915_perf_stream *stream, > > } > > > > /** > > - * Copies single OA report into userspace read() buffer. > > + * append_oa_sample - Copies single OA report into userspace read() buffer. > > + * @stream: An i915-perf stream opened for OA metrics > > + * @buf: destination buffer given by userspace > > + * @count: the number of bytes userspace wants to read > > + * @offset: (inout): the current position for writing into @buf > > + * @report: A single OA report to (optionally) include as part of the sample > > + * > > + * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*` > > + * properties when opening a stream, tracked as `stream->sample_flags`. This > > + * function copies the requested components of a single sample to the given > > + * read() @buf. > > + * > > + * The @buf @offset will only be updated on success. > > + * > > + * Returns: 0 on success, negative error code on failure. > > */ > > static int append_oa_sample(struct i915_perf_stream *stream, > > char __user *buf, > > @@ -380,8 +430,6 @@ static int append_oa_sample(struct i915_perf_stream *stream, > > * @head_ptr: (inout): the current oa buffer cpu read position > > * @tail: the current oa buffer gpu write position > > * > > - * Returns 0 on success, negative error code on failure. > > - * > > * Notably any error condition resulting in a short read (-ENOSPC or > > * -EFAULT) will be returned even though one or more records may > > * have been successfully copied. In this case it's up to the caller > > @@ -392,6 +440,8 @@ static int append_oa_sample(struct i915_perf_stream *stream, > > * tail, so the head chases the tail?... If you think that's mad > > * and back-to-front you're not alone, but this follows the > > * Gen PRM naming convention. > > + * > > + * Returns: 0 on success, negative error code on failure. > > */ > > static int gen7_append_oa_reports(struct i915_perf_stream *stream, > > char __user *buf, > > @@ -496,6 +546,22 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, > > return ret; > > } > > > > +/** > > + * gen7_oa_read - copy status records then buffered OA reports > > + * @stream: An i915-perf stream opened for OA metrics > > + * @buf: destination buffer given by userspace > > + * @count: the number of bytes userspace wants to read > > + * @offset: (inout): the current position for writing into @buf > > + * > > + * Checks Gen 7 specific OA unit status registers and if necessary appends > > + * corresponding status records for userspace (such as for a buffer full > > + * condition) and then initiate appending any buffered OA reports. > > + * > > + * Updates @offset according to the number of bytes successfully copied into > > + * the userspace buffer. > > + * > > + * Returns: zero on success or a negative error code > > + */ > > static int gen7_oa_read(struct i915_perf_stream *stream, > > char __user *buf, > > size_t count, > > @@ -597,6 +663,20 @@ static int gen7_oa_read(struct i915_perf_stream *stream, > > return ret; > > } > > > > +/** > > + * i915_oa_wait_unlocked - handles blocking IO until OA data available > > + * @stream: An i915-perf stream opened for OA metrics > > + * > > + * Called when userspace tries to read() from a blocking stream FD opened > > + * for OA metrics. It waits until the hrtimer callback finds a non-empty > > + * OA buffer and wakes us. > > + * > > + * Note: it's acceptable to have this return with some false positives > > + * since any subsequent read handling will return EAGAIN if there isn't > > + * really data ready for userspace yet. > > + * > > + * Returns: zero on success or a negative error code > > + */ > > static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) > > { > > struct drm_i915_private *dev_priv = stream->dev_priv; > > @@ -615,6 +695,16 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) > > !dev_priv->perf.oa.ops.oa_ buffer_is_empty(dev_priv)); > > } > > > > +/** > > + * i915_oa_poll_wait - call poll_wait() for an OA stream poll() > > + * @stream: An i915-perf stream opened for OA metrics > > + * @file: An i915 perf stream file > > + * @wait: poll() state table > > + * > > + * For handling userspace polling on an i915 perf stream opened for OA metrics, > > + * this starts a poll_wait with the wait queue that our hrtimer callback wakes > > + * when it sees data ready to read in the circular OA buffer. > > + */ > > static void i915_oa_poll_wait(struct i915_perf_stream *stream, > > struct file *file, > > poll_table *wait) > > @@ -624,6 +714,18 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream, > > poll_wait(file, &dev_priv->perf.oa.poll_wq, wait); > > } > > > > +/** > > + * i915_oa_read - just calls through to ``&i915_oa_ops->read`` > > + * @stream: An i915-perf stream opened for OA metrics > > + * @buf: destination buffer given by userspace > > + * @count: the number of bytes userspace wants to read > > + * @offset: (inout): the current position for writing into @buf > > + * > > + * Updates @offset according to the number of bytes successfully copied into > > + * the userspace buffer. > > + * > > + * Returns: zero on success or a negative error code > > + */ > > static int i915_oa_read(struct i915_perf_stream *stream, > > char __user *buf, > > size_t count, > > @@ -634,9 +736,15 @@ static int i915_oa_read(struct i915_perf_stream *stream, > > return dev_priv->perf.oa.ops.read(stream, buf, count, offset); > > } > > > > -/* Determine the render context hw id, and ensure it remains fixed for the > > +/** > > + * oa_get_render_ctx_id - determine and hold ctx hw id > > + * @stream: An i915-perf stream opened for OA metrics > > + * > > + * Determine the render context hw id, and ensure it remains fixed for the > > * lifetime of the stream. This ensures that we don't have to worry about > > * updating the context ID in OACONTROL on the fly. > > + * > > + * Returns: zero on success or a negative error code > > */ > > static int oa_get_render_ctx_id(struct i915_perf_stream *stream) > > { > > @@ -673,6 +781,13 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) > > return ret; > > } > > > > +/** > > + * oa_put_render_ctx_id - counterpart to oa_get_render_ctx_id releases hold > > + * @stream: An i915-perf stream opened for OA metrics > > + * > > + * In case anything needed doing to ensure the context HW ID would remain valid > > + * for the lifetime of the stream, then that can be undone here. > > + */ > > static void oa_put_render_ctx_id(struct i915_perf_stream *stream) > > { > > struct drm_i915_private *dev_priv = stream->dev_priv; > > @@ -945,6 +1060,15 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv) > > spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags); > > } > > > > +/** > > + * i915_oa_stream_enable - handle I915_PERF_IOCTL_ENABLE for OA stream > > + * @stream: An i915 perf stream opened for OA metrics > > + * > > + * [Re]enables hardware periodic sampling according to the period configured > > + * when opening the stream. This also starts a hrtimer that will periodically > > + * check for data in the circular OA buffer for notifying userspace (e.g. > > + * during a read() or poll()). > > + */ > > static void i915_oa_stream_enable(struct i915_perf_stream *stream) > > { > > struct drm_i915_private *dev_priv = stream->dev_priv; > > @@ -962,6 +1086,14 @@ static void gen7_oa_disable(struct drm_i915_private *dev_priv) > > I915_WRITE(GEN7_OACONTROL, 0); > > } > > > > +/** > > + * i915_oa_stream_disable - handle I915_PERF_IOCTL_DISABLE for OA stream > > + * @stream: An i915 perf stream opened for OA metrics > > + * > > + * Stops the OA unit from periodically writing counter reports into the > > + * circular OA buffer. This also stops the hrtimer that periodically checks for > > + * data in the circular OA buffer, for notifying userspace. > > + */ > > static void i915_oa_stream_disable(struct i915_perf_stream *stream) > > { > > struct drm_i915_private *dev_priv = stream->dev_priv; > > @@ -987,6 +1119,24 @@ static const struct i915_perf_stream_ops i915_oa_stream_ops = { > > .read = i915_oa_read, > > }; > > > > +/** > > + * i915_oa_stream_init - validate combined props for OA stream and init > > + * @stream: An i915 perf stream > > + * @param: The open parameters passed to 'DRM_I915_PERF_OPEN` > > + * @props: The property state that configures stream (individually validated) > > + * > > + * While read_properties_unlocked() validates properties in isolation it > > + * doesn't ensure that the combination necessarily makes sense. > > + * > > + * At this point it has been determined that userspace wants a stream of > > + * OA metrics, but still we need to further validate the combined > > + * properties are OK. > > + * > > + * If the configuration makes sense then we can allocate memory for > > + * a circular OA buffer and apply the requested metric set configuration. > > + * > > + * Returns: zero on success or a negative error code. > > + */ > > static int i915_oa_stream_init(struct i915_perf_stream *stream, > > struct drm_i915_perf_open_param *param, > > struct perf_open_properties *props) > > @@ -1110,6 +1260,31 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, > > return ret; > > } > > > > +/** > > + * i915_perf_read_locked - wrap stream->ops->read with error notmalisation > > + * @stream: An i915 perf stream > > + * @file: An i915 perf stream file > > + * @buf: destination buffer given by userspace > > + * @count: the number of bytes userspace wants to read > > + * @ppos: (inout) file seek position (unused) > > + * > > + * Besides wrapping stream->ops->read() this provides a common place to ensure > > + * that if we've successfully copied any data then reporting that takes > > + * precedence over any internal error status, so the data isn't lost. > > + * > > + * For example ret will be -ENOSPC whenever there is more buffered data than > > + * can be copied to userspace, but that's only interesting if we weren't able > > + * to copy some data because it implies the userspace buffer is too small to > > + * receive a single record (and we never split records). > > + * > > + * Another case with ret == -EFAULT is more of a grey area since it would seem > > + * like bad form for userspace to ask us to overrun its buffer, but the user > > + * knows best: > > + * > > + * http://yarchive.net/comp/linux/partial_reads_writes.html > > + * > > + * Returns: The number of bytes copied or a negative error code on failure. > > + */ > > static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, > > struct file *file, > > char __user *buf, > > @@ -1125,25 +1300,27 @@ static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, > > size_t offset = 0; > > int ret = stream->ops->read(stream, buf, count, &offset); > > > > - /* If we've successfully copied any data then reporting that > > - * takes precedence over any internal error status, so the > > - * data isn't lost. > > - * > > - * For example ret will be -ENOSPC whenever there is more > > - * buffered data than can be copied to userspace, but that's > > - * only interesting if we weren't able to copy some data > > - * because it implies the userspace buffer is too small to > > - * receive a single record (and we never split records). > > - * > > - * Another case with ret == -EFAULT is more of a grey area > > - * since it would seem like bad form for userspace to ask us > > - * to overrun its buffer, but the user knows best: > > - * > > - * http://yarchive.net/comp/linux/partial_reads_writes.html > > - */ > > return offset ?: (ret ?: -EAGAIN); > > } > > > > +/** > > + * i915_perf_read - handles read() FOP for i915 perf stream FDs > > + * @file: An i915 perf stream file > > + * @buf: destination buffer given by userspace > > + * @count: the number of bytes userspace wants to read > > + * @ppos: (inout) file seek position (unused) > > + * > > + * The entry point for handling a read() on a stream file descriptor from > > + * userspace. Most of the work is left to the i915_perf_read_locked() and > > + * stream->op->read() but to save having stream implementations (of which > > + * we might have multiple later) we handle blocking read here. > > + * > > + * We can also consistently treat trying to read from a disable stream > > + * as an IO error so implementations can assume the stream is enabled > > + * while reading. > > + * > > + * Returns: The number of bytes copied or a negative error code on failure. > > + */ > > static ssize_t i915_perf_read(struct file *file, > > char __user *buf, > > size_t count, > > @@ -1458,12 +1635,20 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, > > return ret; > > } > > > > -/* Note we copy the properties from userspace outside of the i915 perf > > - * mutex to avoid an awkward lockdep with mmap_sem. > > +/** > > + * read_properties_unlocked - validate + copy userspace stream open properties > > + * @dev_priv: i915 device instance > > + * @uprops: The array of u64 key value pairs given by userspace > > + * @n_props: The number of key value pairs expected in @uprops > > + * @props: The stream configuration built up while validating properties > > * > > * Note this function only validates properties in isolation it doesn't > > * validate that the combination of properties makes sense or that all > > * properties necessary for a particular kind of stream have been set. > > + * > > + * Note that there currently aren't any ordering requirements for properties so > > + * we shouldn't validate or assume anything about ordering here. This doesn't > > + * rule out defining new properties with ordering requirements in the future. > > */ > > static int read_properties_unlocked(struct drm_i915_private *dev_priv, > > u64 __user *uprops, > > @@ -1585,6 +1770,26 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, > > return 0; > > } > > > > +/** > > + * i915_perf_open_ioctl - DRM ioctl() for userspace to open a stream FD > > + * @dev: drm device > > + * @data: ioctl data copied from userspace (unvalidated) > > + * @file: drm file > > + * > > + * Validates the stream open parameters given by userspace including flags > > + * and an array of u64 key, value pair properties. > > + * > > + * Very little is assumed up front about the nature of the stream being > > + * opened (for instance we don't assume it's for periodic OA unit metrics). An > > + * i915-perf stream is expected to be a suitable interface for other forms of > > + * buffered data written by the GPU besides periodic OA metrics. > > + * > > + * Note we copy the properties from userspace outside of the i915 perf > > + * mutex to avoid an awkward lockdep with mmap_sem. > > + * > > + * Return: A newly opened i915 Perf stream file descriptor or negative > > + * error code on failure. > > + */ > > int i915_perf_open_ioctl(struct drm_device *dev, void *data, > > struct drm_file *file) > > { > > @@ -1621,6 +1826,14 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data, > > return ret; > > } > > > > +/** > > + * i915_perf_register - exposes i915-perf to userspace > > + * @dev_priv: i915 device instance > > + * > > + * In particular OA metric sets are advertised under a sysfs metrics/ > > + * directory allowing userspace to enumerate valid IDs that can be > > + * used to open an i915-perf stream. > > + */ > > void i915_perf_register(struct drm_i915_private *dev_priv) > > { > > if (!IS_HASWELL(dev_priv)) > > @@ -1650,6 +1863,15 @@ void i915_perf_register(struct drm_i915_private *dev_priv) > > mutex_unlock(&dev_priv->perf.lock); > > } > > > > +/** > > + * i915_perf_register - hide i915-perf from userspace > > + * @dev_priv: i915 device instance > > + * > > + * i915-perf state cleanup is split up into an 'unregister' and > > + * 'deinit' phase where the interface is first hidden from > > + * userspace by i915_perf_unregister() before cleaning up > > + * remaining state in i915_perf_fini(). > > + */ > > void i915_perf_unregister(struct drm_i915_private *dev_priv) > > { > > if (!IS_HASWELL(dev_priv)) > > @@ -1706,6 +1928,15 @@ static struct ctl_table dev_root[] = { > > {} > > }; > > > > +/** > > + * i915_perf_init - initialize i915-perf state on module load > > + * @dev_priv: i915 device instance > > + * > > + * Initializes state i915-perf state without exposing anything to userspace. > > + * > > + * Note: i915-perf initialization is split into an 'init' and 'register' > > + * phase with the i915_perf_register() exposing state to userspace. > > + */ > > void i915_perf_init(struct drm_i915_private *dev_priv) > > { > > if (!IS_HASWELL(dev_priv)) > > @@ -1741,6 +1972,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv) > > dev_priv->perf.initialized = true; > > } > > > > +/** > > + * i915_perf_fini - Counter part to i915_perf_init() > > + * @dev_priv: i915 device instance > > + */ > > void i915_perf_fini(struct drm_i915_private *dev_priv) > > { > > if (!dev_priv->perf.initialized) > > -- > > 2.10.2 > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Thu, Dec 1, 2016 at 12:12 PM, Jani Nikula <jani.nikula@linux.intel.com> wrote: > On Wed, 30 Nov 2016, Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Nov 29, 2016 at 05:00:55PM +0000, Robert Bragg wrote: > >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > >> + :functions: i915_perf_init > >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > >> + :functions: i915_perf_fini > >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > >> + :functions: i915_perf_register > >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > >> + :functions: i915_perf_unregister > >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > >> + :functions: i915_perf_open_ioctl > >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c > >> + :functions: i915_perf_release > > > > One potential issue with listing everything explicitly is that if someone > > ever (and this is bound to happen) adds a new function, they'll forget to > > add it. Hence we just pull them all in, and if you want to refernce some > > specifically, do that in the overview sections. > > One real issue with listing everything separately is that kernel-doc > parses the source file once per every kernel-doc directive. > Yeah, this is unfortunate and I'd originally hoped I could pass an ordered list which could reduce how often kernel-doc is run. In practice I haven't seen a performance problem with doing this though. > > Also, doesn't Sphinx complain about not having a blank line to end the > indented block after the directive? It might not, but I thought it > might. > Apparently it's ok, I've been generating and previewing the documentation and haven't seen a warning about this. From the restructure text spec, regarding white space: "Blank lines may be omitted when the markup makes element separation unambiguous, in conjunction with indentation." Regards, - Robert > > BR, > Jani. > > > -- > Jani Nikula, Intel Open Source Technology Center >
diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst index 117d2ab..714bd4b 100644 --- a/Documentation/gpu/i915.rst +++ b/Documentation/gpu/i915.rst @@ -356,4 +356,96 @@ switch_mm .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h :doc: switch_mm tracepoint +Perf +==== + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :doc: i915 Perf Overview + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :doc: i915 Perf History and Comparison with Core Perf + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :doc: i915 Perf File Operations + +i915 Driver Entry Points +------------------------ + +This section covers the entrypoints exported outside of i915_perf.c to +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl. + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_init +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_fini +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_register +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_unregister +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_open_ioctl +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_release + +i915 Perf Stream +---------------- + +This section covers the stream-semantics-agnostic structures and functions +for representing an i915 perf stream FD and associated file operations. + +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h + :functions: i915_perf_stream +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h + :functions: i915_perf_stream_ops + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: read_properties_unlocked +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_open_ioctl_unlocked +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_destroy_unlocked +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_read +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_ioctl +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_enable_unlocked +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_disable_unlocked +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_poll +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_perf_poll_unlocked + +i915 Perf Observation Architecture Stream +----------------------------------------- + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: OA_BUFFER_SIZE +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h + :functions: i915_oa_ops + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_oa_stream_init +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_oa_read +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_oa_stream_enable +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_oa_stream_disable +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_oa_wait_unlocked +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :functions: i915_oa_poll_wait + +All i915 Perf Internals +----------------------- + +This section simply includes all currently documented i915 perf internals, in +no particular order, but may include some more minor utilities or platform +specific details than found in the more high-level sections. + +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c + :internal: + .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1ec9619..9f92755 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1827,89 +1827,186 @@ struct i915_oa_reg { struct i915_perf_stream; +/** + * struct i915_perf_stream_ops - the OPs to support a specific stream type + */ struct i915_perf_stream_ops { - /* Enables the collection of HW samples, either in response to - * I915_PERF_IOCTL_ENABLE or implicitly called when stream is - * opened without I915_PERF_FLAG_DISABLED. + /** + * @enable: Enables the collection of HW samples, either in response to + * I915_PERF_IOCTL_ENABLE or implicitly called when stream is opened + * without I915_PERF_FLAG_DISABLED. */ void (*enable)(struct i915_perf_stream *stream); - /* Disables the collection of HW samples, either in response to - * I915_PERF_IOCTL_DISABLE or implicitly called before - * destroying the stream. + /** + * @disable: Disables the collection of HW samples, either in response + * to I915_PERF_IOCTL_DISABLE or implicitly called before destroying + * the stream. */ void (*disable)(struct i915_perf_stream *stream); - /* Call poll_wait, passing a wait queue that will be woken + /** + * @poll_wait: Call poll_wait, passing a wait queue that will be woken * once there is something ready to read() for the stream */ void (*poll_wait)(struct i915_perf_stream *stream, struct file *file, poll_table *wait); - /* For handling a blocking read, wait until there is something - * to ready to read() for the stream. E.g. wait on the same + /** + * @wait_unlocked: For handling a blocking read, wait until there is + * something to ready to read() for the stream. E.g. wait on the same * wait queue that would be passed to poll_wait(). */ int (*wait_unlocked)(struct i915_perf_stream *stream); - /* read - Copy buffered metrics as records to userspace - * @buf: the userspace, destination buffer - * @count: the number of bytes to copy, requested by userspace - * @offset: zero at the start of the read, updated as the read - * proceeds, it represents how many bytes have been - * copied so far and the buffer offset for copying the - * next record. + /** + * @read: Copy buffered metrics as records to userspace + * **buf**: the userspace, destination buffer + * **count**: the number of bytes to copy, requested by userspace + * **offset**: zero at the start of the read, updated as the read + * proceeds, it represents how many bytes have been copied so far and + * the buffer offset for copying the next record. * - * Copy as many buffered i915 perf samples and records for - * this stream to userspace as will fit in the given buffer. + * Copy as many buffered i915 perf samples and records for this stream + * to userspace as will fit in the given buffer. * - * Only write complete records; returning -ENOSPC if there - * isn't room for a complete record. + * Only write complete records; returning -ENOSPC if there isn't room + * for a complete record. * - * Return any error condition that results in a short read - * such as -ENOSPC or -EFAULT, even though these may be - * squashed before returning to userspace. + * Return any error condition that results in a short read such as + * -ENOSPC or -EFAULT, even though these may be squashed before + * returning to userspace. */ int (*read)(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset); - /* Cleanup any stream specific resources. + /** + * @destroy: Cleanup any stream specific resources. * * The stream will always be disabled before this is called. */ void (*destroy)(struct i915_perf_stream *stream); }; +/** + * struct i915_perf_stream - state for a single open stream FD + */ struct i915_perf_stream { + /** + * @dev_priv: i915 drm device + */ struct drm_i915_private *dev_priv; + /** + * @link: Links the stream into ``&drm_i915_private->streams`` + */ struct list_head link; + /** + * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` + * properties given when opening a stream, representing the contents + * of a single sample as read() by userspace. + */ u32 sample_flags; + + /** + * @sample_size: Considering the configured contents of a sample + * combined with the required header size, this is the total size + * of a single sample record. + */ int sample_size; + /** + * @ctx: %NULL if measuring system-wide across all contexts or a + * specific context that is being monitored. + */ struct i915_gem_context *ctx; + + /** + * @enabled: Whether the stream is currently enabled, considering + * whether the stream was opened in a disabled state and based + * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls. + */ bool enabled; + /** + * @ops: The callbacks providing the implementation of this specific + * type of configured stream. + */ const struct i915_perf_stream_ops *ops; }; +/** + * struct i915_oa_ops - Gen specific implementation of an OA unit stream + */ struct i915_oa_ops { + /** + * @init_oa_buffer: Resets the head and tail pointers of the + * circular buffer for periodic OA reports. + * + * Called when first opening a stream for OA metrics, but also may be + * called in response to an OA buffer overflow or other error + * condition. + * + * Note it may be necessary to clear the full OA buffer here as part of + * maintaining the invariable that new reports must be written to + * zeroed memory for us to be able to reliable detect if an expected + * report has not yet landed in memory. (At least on Haswell the OA + * buffer tail pointer is not synchronized with reports being visible + * to the CPU) + */ void (*init_oa_buffer)(struct drm_i915_private *dev_priv); + + /** + * @enable_metric_set: Applies any MUX configuration to set up the + * Boolean and Custom (B/C) counters that are part of the counter + * reports being sampled. May apply system constraints such as + * disabling EU clock gating as required. + */ int (*enable_metric_set)(struct drm_i915_private *dev_priv); + + /** + * @disable_metric_set: Remove system constraints associated with using + * the OA unit. + */ void (*disable_metric_set)(struct drm_i915_private *dev_priv); + + /** + * @oa_enable: Enable periodic sampling + */ void (*oa_enable)(struct drm_i915_private *dev_priv); + + /** + * @oa_disable: Disable periodic sampling + */ void (*oa_disable)(struct drm_i915_private *dev_priv); - void (*update_oacontrol)(struct drm_i915_private *dev_priv); - void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, - u32 ctx_id); + + /** + * @read: Copy data from the circular OA buffer into a given userspace + * buffer. + */ int (*read)(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset); + + /** + * @oa_buffer_is_empty: Check if OA buffer empty (false positives OK) + * + * This is either called via fops or the poll check hrtimer (atomic + * ctx) without any locks taken. + * + * It's safe to read OA config state here unlocked, assuming that this + * is only called while the stream is enabled, while the global OA + * configuration can't be modified. + * + * Efficiency is more important than avoiding some false positives + * here, which will be handled gracefully - likely resulting in an + * EAGAIN error for userspace. + */ bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); }; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 68b7c27..0be8cef 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -26,7 +26,10 @@ /** - * DOC: i915 Perf, streaming API for GPU metrics + * DOC: i915 Perf Overview + * + * Overview + * -------- * * Gen graphics supports a large number of performance counters that can help * driver and application developers understand and optimize their use of the @@ -45,6 +48,13 @@ * privileges by default, unless changed via the dev.i915.perf_event_paranoid * sysctl option. * + */ + +/** + * DOC: i915 Perf History and Comparison with Core Perf + * + * Comparison with Core Perf + * ------------------------- * * The interface was initially inspired by the core Perf infrastructure but * some notable differences are: @@ -75,8 +85,8 @@ * gets copied from the GPU mapped buffers to userspace buffers. * * - * Some notes regarding Linux Perf: - * -------------------------------- + * Issues hit with first prototype based on Core Perf + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * The first prototype of this driver was based on the core perf * infrastructure, and while we did make that mostly work, with some changes to @@ -135,7 +145,7 @@ * for combining with the side-band raw reports it captures using * MI_REPORT_PERF_COUNT commands. * - * _ As a side note on perf's grouping feature; there was also some concern + * - As a side note on perf's grouping feature; there was also some concern * that using PERF_FORMAT_GROUP as a way to pack together counter values * would quite drastically inflate our sample sizes, which would likely * lower the effective sampling resolutions we could use when the available @@ -277,6 +287,20 @@ static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = { #define SAMPLE_OA_REPORT (1<<0) +/** + * struct perf_open_properties - for validated properties given to open a stream + * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags + * @single_context: Whether a single or all gpu contexts should be monitored + * @ctx_handle: A gem ctx handle for use with @single_context + * @metrics_set: An ID for an OA unit metric set advertised via sysfs + * @oa_format: An OA unit HW report format + * @oa_periodic: Whether to enable periodic OA unit sampling + * @oa_period_exponent: The OA unit sampling period is derived from this + * + * As read_properties_unlocked() enumerates and validates the properties given + * to open a stream of metrics the configuration is built up in the structure + * which starts out zero initialized. + */ struct perf_open_properties { u32 sample_flags; @@ -314,7 +338,19 @@ static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_pr } /** - * Appends a status record to a userspace read() buffer. + * append_oa_status - Appends a status record to a userspace read() buffer. + * @stream: An i915-perf stream opened for OA metrics + * @buf: destination buffer given by userspace + * @count: the number of bytes userspace wants to read + * @offset: (inout): the current position for writing into @buf + * @type: The kind of status to report to userspace + * + * Writes a status record (such as `DRM_I915_PERF_RECORD_OA_REPORT_LOST`) + * into the userspace read() buffer. + * + * The @buf @offset will only be updated on success. + * + * Returns: 0 on success, negative error code on failure. */ static int append_oa_status(struct i915_perf_stream *stream, char __user *buf, @@ -336,7 +372,21 @@ static int append_oa_status(struct i915_perf_stream *stream, } /** - * Copies single OA report into userspace read() buffer. + * append_oa_sample - Copies single OA report into userspace read() buffer. + * @stream: An i915-perf stream opened for OA metrics + * @buf: destination buffer given by userspace + * @count: the number of bytes userspace wants to read + * @offset: (inout): the current position for writing into @buf + * @report: A single OA report to (optionally) include as part of the sample + * + * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*` + * properties when opening a stream, tracked as `stream->sample_flags`. This + * function copies the requested components of a single sample to the given + * read() @buf. + * + * The @buf @offset will only be updated on success. + * + * Returns: 0 on success, negative error code on failure. */ static int append_oa_sample(struct i915_perf_stream *stream, char __user *buf, @@ -380,8 +430,6 @@ static int append_oa_sample(struct i915_perf_stream *stream, * @head_ptr: (inout): the current oa buffer cpu read position * @tail: the current oa buffer gpu write position * - * Returns 0 on success, negative error code on failure. - * * Notably any error condition resulting in a short read (-ENOSPC or * -EFAULT) will be returned even though one or more records may * have been successfully copied. In this case it's up to the caller @@ -392,6 +440,8 @@ static int append_oa_sample(struct i915_perf_stream *stream, * tail, so the head chases the tail?... If you think that's mad * and back-to-front you're not alone, but this follows the * Gen PRM naming convention. + * + * Returns: 0 on success, negative error code on failure. */ static int gen7_append_oa_reports(struct i915_perf_stream *stream, char __user *buf, @@ -496,6 +546,22 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, return ret; } +/** + * gen7_oa_read - copy status records then buffered OA reports + * @stream: An i915-perf stream opened for OA metrics + * @buf: destination buffer given by userspace + * @count: the number of bytes userspace wants to read + * @offset: (inout): the current position for writing into @buf + * + * Checks Gen 7 specific OA unit status registers and if necessary appends + * corresponding status records for userspace (such as for a buffer full + * condition) and then initiate appending any buffered OA reports. + * + * Updates @offset according to the number of bytes successfully copied into + * the userspace buffer. + * + * Returns: zero on success or a negative error code + */ static int gen7_oa_read(struct i915_perf_stream *stream, char __user *buf, size_t count, @@ -597,6 +663,20 @@ static int gen7_oa_read(struct i915_perf_stream *stream, return ret; } +/** + * i915_oa_wait_unlocked - handles blocking IO until OA data available + * @stream: An i915-perf stream opened for OA metrics + * + * Called when userspace tries to read() from a blocking stream FD opened + * for OA metrics. It waits until the hrtimer callback finds a non-empty + * OA buffer and wakes us. + * + * Note: it's acceptable to have this return with some false positives + * since any subsequent read handling will return EAGAIN if there isn't + * really data ready for userspace yet. + * + * Returns: zero on success or a negative error code + */ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; @@ -615,6 +695,16 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv)); } +/** + * i915_oa_poll_wait - call poll_wait() for an OA stream poll() + * @stream: An i915-perf stream opened for OA metrics + * @file: An i915 perf stream file + * @wait: poll() state table + * + * For handling userspace polling on an i915 perf stream opened for OA metrics, + * this starts a poll_wait with the wait queue that our hrtimer callback wakes + * when it sees data ready to read in the circular OA buffer. + */ static void i915_oa_poll_wait(struct i915_perf_stream *stream, struct file *file, poll_table *wait) @@ -624,6 +714,18 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream, poll_wait(file, &dev_priv->perf.oa.poll_wq, wait); } +/** + * i915_oa_read - just calls through to ``&i915_oa_ops->read`` + * @stream: An i915-perf stream opened for OA metrics + * @buf: destination buffer given by userspace + * @count: the number of bytes userspace wants to read + * @offset: (inout): the current position for writing into @buf + * + * Updates @offset according to the number of bytes successfully copied into + * the userspace buffer. + * + * Returns: zero on success or a negative error code + */ static int i915_oa_read(struct i915_perf_stream *stream, char __user *buf, size_t count, @@ -634,9 +736,15 @@ static int i915_oa_read(struct i915_perf_stream *stream, return dev_priv->perf.oa.ops.read(stream, buf, count, offset); } -/* Determine the render context hw id, and ensure it remains fixed for the +/** + * oa_get_render_ctx_id - determine and hold ctx hw id + * @stream: An i915-perf stream opened for OA metrics + * + * Determine the render context hw id, and ensure it remains fixed for the * lifetime of the stream. This ensures that we don't have to worry about * updating the context ID in OACONTROL on the fly. + * + * Returns: zero on success or a negative error code */ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) { @@ -673,6 +781,13 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) return ret; } +/** + * oa_put_render_ctx_id - counterpart to oa_get_render_ctx_id releases hold + * @stream: An i915-perf stream opened for OA metrics + * + * In case anything needed doing to ensure the context HW ID would remain valid + * for the lifetime of the stream, then that can be undone here. + */ static void oa_put_render_ctx_id(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; @@ -945,6 +1060,15 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv) spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags); } +/** + * i915_oa_stream_enable - handle I915_PERF_IOCTL_ENABLE for OA stream + * @stream: An i915 perf stream opened for OA metrics + * + * [Re]enables hardware periodic sampling according to the period configured + * when opening the stream. This also starts a hrtimer that will periodically + * check for data in the circular OA buffer for notifying userspace (e.g. + * during a read() or poll()). + */ static void i915_oa_stream_enable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; @@ -962,6 +1086,14 @@ static void gen7_oa_disable(struct drm_i915_private *dev_priv) I915_WRITE(GEN7_OACONTROL, 0); } +/** + * i915_oa_stream_disable - handle I915_PERF_IOCTL_DISABLE for OA stream + * @stream: An i915 perf stream opened for OA metrics + * + * Stops the OA unit from periodically writing counter reports into the + * circular OA buffer. This also stops the hrtimer that periodically checks for + * data in the circular OA buffer, for notifying userspace. + */ static void i915_oa_stream_disable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; @@ -987,6 +1119,24 @@ static const struct i915_perf_stream_ops i915_oa_stream_ops = { .read = i915_oa_read, }; +/** + * i915_oa_stream_init - validate combined props for OA stream and init + * @stream: An i915 perf stream + * @param: The open parameters passed to 'DRM_I915_PERF_OPEN` + * @props: The property state that configures stream (individually validated) + * + * While read_properties_unlocked() validates properties in isolation it + * doesn't ensure that the combination necessarily makes sense. + * + * At this point it has been determined that userspace wants a stream of + * OA metrics, but still we need to further validate the combined + * properties are OK. + * + * If the configuration makes sense then we can allocate memory for + * a circular OA buffer and apply the requested metric set configuration. + * + * Returns: zero on success or a negative error code. + */ static int i915_oa_stream_init(struct i915_perf_stream *stream, struct drm_i915_perf_open_param *param, struct perf_open_properties *props) @@ -1110,6 +1260,31 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return ret; } +/** + * i915_perf_read_locked - wrap stream->ops->read with error notmalisation + * @stream: An i915 perf stream + * @file: An i915 perf stream file + * @buf: destination buffer given by userspace + * @count: the number of bytes userspace wants to read + * @ppos: (inout) file seek position (unused) + * + * Besides wrapping stream->ops->read() this provides a common place to ensure + * that if we've successfully copied any data then reporting that takes + * precedence over any internal error status, so the data isn't lost. + * + * For example ret will be -ENOSPC whenever there is more buffered data than + * can be copied to userspace, but that's only interesting if we weren't able + * to copy some data because it implies the userspace buffer is too small to + * receive a single record (and we never split records). + * + * Another case with ret == -EFAULT is more of a grey area since it would seem + * like bad form for userspace to ask us to overrun its buffer, but the user + * knows best: + * + * http://yarchive.net/comp/linux/partial_reads_writes.html + * + * Returns: The number of bytes copied or a negative error code on failure. + */ static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, struct file *file, char __user *buf, @@ -1125,25 +1300,27 @@ static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, size_t offset = 0; int ret = stream->ops->read(stream, buf, count, &offset); - /* If we've successfully copied any data then reporting that - * takes precedence over any internal error status, so the - * data isn't lost. - * - * For example ret will be -ENOSPC whenever there is more - * buffered data than can be copied to userspace, but that's - * only interesting if we weren't able to copy some data - * because it implies the userspace buffer is too small to - * receive a single record (and we never split records). - * - * Another case with ret == -EFAULT is more of a grey area - * since it would seem like bad form for userspace to ask us - * to overrun its buffer, but the user knows best: - * - * http://yarchive.net/comp/linux/partial_reads_writes.html - */ return offset ?: (ret ?: -EAGAIN); } +/** + * i915_perf_read - handles read() FOP for i915 perf stream FDs + * @file: An i915 perf stream file + * @buf: destination buffer given by userspace + * @count: the number of bytes userspace wants to read + * @ppos: (inout) file seek position (unused) + * + * The entry point for handling a read() on a stream file descriptor from + * userspace. Most of the work is left to the i915_perf_read_locked() and + * stream->op->read() but to save having stream implementations (of which + * we might have multiple later) we handle blocking read here. + * + * We can also consistently treat trying to read from a disable stream + * as an IO error so implementations can assume the stream is enabled + * while reading. + * + * Returns: The number of bytes copied or a negative error code on failure. + */ static ssize_t i915_perf_read(struct file *file, char __user *buf, size_t count, @@ -1458,12 +1635,20 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, return ret; } -/* Note we copy the properties from userspace outside of the i915 perf - * mutex to avoid an awkward lockdep with mmap_sem. +/** + * read_properties_unlocked - validate + copy userspace stream open properties + * @dev_priv: i915 device instance + * @uprops: The array of u64 key value pairs given by userspace + * @n_props: The number of key value pairs expected in @uprops + * @props: The stream configuration built up while validating properties * * Note this function only validates properties in isolation it doesn't * validate that the combination of properties makes sense or that all * properties necessary for a particular kind of stream have been set. + * + * Note that there currently aren't any ordering requirements for properties so + * we shouldn't validate or assume anything about ordering here. This doesn't + * rule out defining new properties with ordering requirements in the future. */ static int read_properties_unlocked(struct drm_i915_private *dev_priv, u64 __user *uprops, @@ -1585,6 +1770,26 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, return 0; } +/** + * i915_perf_open_ioctl - DRM ioctl() for userspace to open a stream FD + * @dev: drm device + * @data: ioctl data copied from userspace (unvalidated) + * @file: drm file + * + * Validates the stream open parameters given by userspace including flags + * and an array of u64 key, value pair properties. + * + * Very little is assumed up front about the nature of the stream being + * opened (for instance we don't assume it's for periodic OA unit metrics). An + * i915-perf stream is expected to be a suitable interface for other forms of + * buffered data written by the GPU besides periodic OA metrics. + * + * Note we copy the properties from userspace outside of the i915 perf + * mutex to avoid an awkward lockdep with mmap_sem. + * + * Return: A newly opened i915 Perf stream file descriptor or negative + * error code on failure. + */ int i915_perf_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -1621,6 +1826,14 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data, return ret; } +/** + * i915_perf_register - exposes i915-perf to userspace + * @dev_priv: i915 device instance + * + * In particular OA metric sets are advertised under a sysfs metrics/ + * directory allowing userspace to enumerate valid IDs that can be + * used to open an i915-perf stream. + */ void i915_perf_register(struct drm_i915_private *dev_priv) { if (!IS_HASWELL(dev_priv)) @@ -1650,6 +1863,15 @@ void i915_perf_register(struct drm_i915_private *dev_priv) mutex_unlock(&dev_priv->perf.lock); } +/** + * i915_perf_register - hide i915-perf from userspace + * @dev_priv: i915 device instance + * + * i915-perf state cleanup is split up into an 'unregister' and + * 'deinit' phase where the interface is first hidden from + * userspace by i915_perf_unregister() before cleaning up + * remaining state in i915_perf_fini(). + */ void i915_perf_unregister(struct drm_i915_private *dev_priv) { if (!IS_HASWELL(dev_priv)) @@ -1706,6 +1928,15 @@ static struct ctl_table dev_root[] = { {} }; +/** + * i915_perf_init - initialize i915-perf state on module load + * @dev_priv: i915 device instance + * + * Initializes state i915-perf state without exposing anything to userspace. + * + * Note: i915-perf initialization is split into an 'init' and 'register' + * phase with the i915_perf_register() exposing state to userspace. + */ void i915_perf_init(struct drm_i915_private *dev_priv) { if (!IS_HASWELL(dev_priv)) @@ -1741,6 +1972,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv) dev_priv->perf.initialized = true; } +/** + * i915_perf_fini - Counter part to i915_perf_init() + * @dev_priv: i915 device instance + */ void i915_perf_fini(struct drm_i915_private *dev_priv) { if (!dev_priv->perf.initialized)
This adds a 'Perf' section to i915.rst with the following sub sections: - Overview - Comparison with Core Perf - i915 Driver Entry Points - i915 Perf Stream - i915 Perf Observation Architecture Stream - All i915 Perf Internals Signed-off-by: Robert Bragg <robert@sixbynine.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> --- Documentation/gpu/i915.rst | 92 +++++++++++++ drivers/gpu/drm/i915/i915_drv.h | 151 ++++++++++++++++---- drivers/gpu/drm/i915/i915_perf.c | 289 +++++++++++++++++++++++++++++++++++---- 3 files changed, 478 insertions(+), 54 deletions(-)