Message ID | 1510748034-14034-5-git-send-email-sagar.a.kamble@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Sagar Arun Kamble (2017-11-15 12:13:54) > From: Sourab Gupta <sourab.gupta@intel.com> > > Currently, we have the ability to only forward the GPU timestamps in the > samples (which are generated via OA reports). This limits the ability to > correlate these samples with the system events. > > An ability is therefore needed to report timestamps in different clock > domains, such as CLOCK_MONOTONIC, in the perf samples to be of more > practical use to the userspace. This ability becomes important > when we want to correlate/plot GPU events/samples with other system events > on the same timeline (e.g. vblank events, or timestamps when work was > submitted to kernel, etc.) > > The patch here proposes a mechanism to achieve this. The correlation > between gpu time and system time is established using the timestamp clock > associated with the command stream, abstracted as timecounter/cyclecounter > to retrieve gpu/system time correlated values. > > v2: Added i915_driver_init_late() function to capture the new late init > phase for perf (Chris) > > v3: Removed cross-timestamp changes. > > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> > Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Sourab Gupta <sourab.gupta@intel.com> > Cc: Matthew Auld <matthew.auld@intel.com> > --- > drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++ > include/uapi/drm/i915_drm.h | 7 +++++++ > 2 files changed, 34 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index 3b721d7..94ee924 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -336,6 +336,7 @@ > > #define SAMPLE_OA_REPORT BIT(0) > #define SAMPLE_GPU_TS BIT(1) > +#define SAMPLE_SYSTEM_TS BIT(2) > > /** > * struct perf_open_properties - for validated properties given to open a stream > @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, > struct drm_i915_perf_record_header header; > u32 sample_flags = stream->sample_flags; > u64 gpu_ts = 0; > + u64 system_ts = 0; > > header.type = DRM_I915_PERF_RECORD_SAMPLE; > header.pad = 0; > @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream, > > if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE)) > return -EFAULT; > + buf += I915_PERF_TS_SAMPLE_SIZE; > + } > + > + if (sample_flags & SAMPLE_SYSTEM_TS) { > + gpu_ts = get_gpu_ts_from_oa_report(stream, report); Scope your variables. Stops us from being confused as to where else gpu_ts or sys_ts may be reused. For instance I first thought you were using SAMPLE_GPU_TS to initialise gpu_ts > + /* > + * XXX: timecounter_cyc2time considers time backwards if delta > + * timestamp is more than half the max ns time covered by > + * counter. It will be ~35min for 36 bit counter. If this much > + * sampling duration is needed we will have to update tc->nsec > + * by explicitly reading the timecounter (timecounter_read) > + * before this duration. > + */ > + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts); > + > + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE)) > + return -EFAULT; Advance buf.
On 11/15/2017 6:01 PM, Chris Wilson wrote: > Quoting Sagar Arun Kamble (2017-11-15 12:13:54) >> From: Sourab Gupta <sourab.gupta@intel.com> >> >> Currently, we have the ability to only forward the GPU timestamps in the >> samples (which are generated via OA reports). This limits the ability to >> correlate these samples with the system events. >> >> An ability is therefore needed to report timestamps in different clock >> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more >> practical use to the userspace. This ability becomes important >> when we want to correlate/plot GPU events/samples with other system events >> on the same timeline (e.g. vblank events, or timestamps when work was >> submitted to kernel, etc.) >> >> The patch here proposes a mechanism to achieve this. The correlation >> between gpu time and system time is established using the timestamp clock >> associated with the command stream, abstracted as timecounter/cyclecounter >> to retrieve gpu/system time correlated values. >> >> v2: Added i915_driver_init_late() function to capture the new late init >> phase for perf (Chris) >> >> v3: Removed cross-timestamp changes. >> >> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> >> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Sourab Gupta <sourab.gupta@intel.com> >> Cc: Matthew Auld <matthew.auld@intel.com> >> --- >> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++ >> include/uapi/drm/i915_drm.h | 7 +++++++ >> 2 files changed, 34 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c >> index 3b721d7..94ee924 100644 >> --- a/drivers/gpu/drm/i915/i915_perf.c >> +++ b/drivers/gpu/drm/i915/i915_perf.c >> @@ -336,6 +336,7 @@ >> >> #define SAMPLE_OA_REPORT BIT(0) >> #define SAMPLE_GPU_TS BIT(1) >> +#define SAMPLE_SYSTEM_TS BIT(2) >> >> /** >> * struct perf_open_properties - for validated properties given to open a stream >> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, >> struct drm_i915_perf_record_header header; >> u32 sample_flags = stream->sample_flags; >> u64 gpu_ts = 0; >> + u64 system_ts = 0; >> >> header.type = DRM_I915_PERF_RECORD_SAMPLE; >> header.pad = 0; >> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream, >> >> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE)) >> return -EFAULT; >> + buf += I915_PERF_TS_SAMPLE_SIZE; >> + } >> + >> + if (sample_flags & SAMPLE_SYSTEM_TS) { >> + gpu_ts = get_gpu_ts_from_oa_report(stream, report); > Scope your variables. Stops us from being confused as to where else > gpu_ts or sys_ts may be reused. For instance I first thought you were > using SAMPLE_GPU_TS to initialise gpu_ts Yes >> + /* >> + * XXX: timecounter_cyc2time considers time backwards if delta >> + * timestamp is more than half the max ns time covered by >> + * counter. It will be ~35min for 36 bit counter. If this much >> + * sampling duration is needed we will have to update tc->nsec >> + * by explicitly reading the timecounter (timecounter_read) >> + * before this duration. >> + */ >> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts); >> + >> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE)) >> + return -EFAULT; > Advance buf. Had kept advance logic only if there is new field to be added as this advance is missing for SAMPLE_OA_REPORT currently in drm-tip. Will fix. Thanks for the review :)
On 11/15/2017 5:43 PM, Sagar Arun Kamble wrote: > From: Sourab Gupta <sourab.gupta@intel.com> > > Currently, we have the ability to only forward the GPU timestamps in the > samples (which are generated via OA reports). This limits the ability to > correlate these samples with the system events. > > An ability is therefore needed to report timestamps in different clock > domains, such as CLOCK_MONOTONIC, in the perf samples to be of more > practical use to the userspace. This ability becomes important > when we want to correlate/plot GPU events/samples with other system events > on the same timeline (e.g. vblank events, or timestamps when work was > submitted to kernel, etc.) > > The patch here proposes a mechanism to achieve this. The correlation > between gpu time and system time is established using the timestamp clock > associated with the command stream, abstracted as timecounter/cyclecounter > to retrieve gpu/system time correlated values. > > v2: Added i915_driver_init_late() function to capture the new late init > phase for perf (Chris) > > v3: Removed cross-timestamp changes. > > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> > Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Sourab Gupta <sourab.gupta@intel.com> > Cc: Matthew Auld <matthew.auld@intel.com> > --- > drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++ > include/uapi/drm/i915_drm.h | 7 +++++++ > 2 files changed, 34 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index 3b721d7..94ee924 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -336,6 +336,7 @@ > > #define SAMPLE_OA_REPORT BIT(0) > #define SAMPLE_GPU_TS BIT(1) > +#define SAMPLE_SYSTEM_TS BIT(2) > > /** > * struct perf_open_properties - for validated properties given to open a stream > @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, > struct drm_i915_perf_record_header header; > u32 sample_flags = stream->sample_flags; > u64 gpu_ts = 0; > + u64 system_ts = 0; > > header.type = DRM_I915_PERF_RECORD_SAMPLE; > header.pad = 0; > @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream, > > if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE)) > return -EFAULT; > + buf += I915_PERF_TS_SAMPLE_SIZE; > + } > + > + if (sample_flags & SAMPLE_SYSTEM_TS) { > + gpu_ts = get_gpu_ts_from_oa_report(stream, report); > + /* > + * XXX: timecounter_cyc2time considers time backwards if delta > + * timestamp is more than half the max ns time covered by > + * counter. It will be ~35min for 36 bit counter. If this much > + * sampling duration is needed we will have to update tc->nsec > + * by explicitly reading the timecounter (timecounter_read) > + * before this duration. This is implemented as overflow watchdog in mlx5e_timestamp_init (drivers/net/ethernet/mellanox). Will need similar mechanism here. > + */ > + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts); > + > + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE)) > + return -EFAULT; > } > > (*offset) += header.size; > @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, > stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; > } > > + if (props->sample_flags & SAMPLE_SYSTEM_TS) { > + stream->sample_flags |= SAMPLE_SYSTEM_TS; > + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; > + } > + > dev_priv->perf.oa.oa_buffer.format_size = format_size; > if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0)) > return -EINVAL; > @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, > case DRM_I915_PERF_PROP_SAMPLE_GPU_TS: > props->sample_flags |= SAMPLE_GPU_TS; > break; > + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS: > + props->sample_flags |= SAMPLE_SYSTEM_TS; > + break; > case DRM_I915_PERF_PROP_OA_METRICS_SET: > if (value == 0) { > DRM_DEBUG("Unknown OA metric set ID\n"); > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index 0b9249e..283859c 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id { > DRM_I915_PERF_PROP_SAMPLE_GPU_TS, > > /** > + * This property requests inclusion of CLOCK_MONOTONIC system time in > + * the perf sample data. > + */ > + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS, > + > + /** > * The value specifies which set of OA unit metrics should be > * be configured, defining the contents of any OA unit reports. > */ > @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type { > * > * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA > * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS > + * { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS > * }; > */ > DRM_I915_PERF_RECORD_SAMPLE = 1,
On 15/11/17 12:13, Sagar Arun Kamble wrote: > From: Sourab Gupta <sourab.gupta@intel.com> > > Currently, we have the ability to only forward the GPU timestamps in the > samples (which are generated via OA reports). This limits the ability to > correlate these samples with the system events. > > An ability is therefore needed to report timestamps in different clock > domains, such as CLOCK_MONOTONIC, in the perf samples to be of more > practical use to the userspace. This ability becomes important > when we want to correlate/plot GPU events/samples with other system events > on the same timeline (e.g. vblank events, or timestamps when work was > submitted to kernel, etc.) > > The patch here proposes a mechanism to achieve this. The correlation > between gpu time and system time is established using the timestamp clock > associated with the command stream, abstracted as timecounter/cyclecounter > to retrieve gpu/system time correlated values. > > v2: Added i915_driver_init_late() function to capture the new late init > phase for perf (Chris) > > v3: Removed cross-timestamp changes. > > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> > Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Sourab Gupta <sourab.gupta@intel.com> > Cc: Matthew Auld <matthew.auld@intel.com> > --- > drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++ > include/uapi/drm/i915_drm.h | 7 +++++++ > 2 files changed, 34 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index 3b721d7..94ee924 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -336,6 +336,7 @@ > > #define SAMPLE_OA_REPORT BIT(0) > #define SAMPLE_GPU_TS BIT(1) > +#define SAMPLE_SYSTEM_TS BIT(2) > > /** > * struct perf_open_properties - for validated properties given to open a stream > @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, > struct drm_i915_perf_record_header header; > u32 sample_flags = stream->sample_flags; > u64 gpu_ts = 0; > + u64 system_ts = 0; > > header.type = DRM_I915_PERF_RECORD_SAMPLE; > header.pad = 0; > @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream, > > if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE)) > return -EFAULT; > + buf += I915_PERF_TS_SAMPLE_SIZE; This is a ridiculous nit, but I think using sizeof(u64) would be more readable than this I915_PERF_TS_SAMPLE_SIZE define. > + } > + > + if (sample_flags & SAMPLE_SYSTEM_TS) { > + gpu_ts = get_gpu_ts_from_oa_report(stream, report); > + /* > + * XXX: timecounter_cyc2time considers time backwards if delta > + * timestamp is more than half the max ns time covered by > + * counter. It will be ~35min for 36 bit counter. If this much > + * sampling duration is needed we will have to update tc->nsec > + * by explicitly reading the timecounter (timecounter_read) > + * before this duration. > + */ > + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts); > + > + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE)) > + return -EFAULT; > } > > (*offset) += header.size; > @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, > stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; > } > > + if (props->sample_flags & SAMPLE_SYSTEM_TS) { > + stream->sample_flags |= SAMPLE_SYSTEM_TS; > + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; > + } > + > dev_priv->perf.oa.oa_buffer.format_size = format_size; > if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0)) > return -EINVAL; > @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, > case DRM_I915_PERF_PROP_SAMPLE_GPU_TS: > props->sample_flags |= SAMPLE_GPU_TS; > break; > + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS: > + props->sample_flags |= SAMPLE_SYSTEM_TS; > + break; > case DRM_I915_PERF_PROP_OA_METRICS_SET: > if (value == 0) { > DRM_DEBUG("Unknown OA metric set ID\n"); > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index 0b9249e..283859c 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id { > DRM_I915_PERF_PROP_SAMPLE_GPU_TS, > > /** > + * This property requests inclusion of CLOCK_MONOTONIC system time in > + * the perf sample data. > + */ > + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS, > + > + /** > * The value specifies which set of OA unit metrics should be > * be configured, defining the contents of any OA unit reports. > */ > @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type { > * > * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA > * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS > + * { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS I would just add those u64 fields before oa_report. Since the report sizes are dictated by the hardware, I'm afraid that one day someone might come up with a non 64bit aligned format (however unlikely). And since the new properties mean you need to be aware of the potential new offsets, it's not breaking existing userspace. > * }; > */ > DRM_I915_PERF_RECORD_SAMPLE = 1,
On 12/5/2017 7:52 PM, Lionel Landwerlin wrote: > On 15/11/17 12:13, Sagar Arun Kamble wrote: >> From: Sourab Gupta <sourab.gupta@intel.com> >> >> Currently, we have the ability to only forward the GPU timestamps in the >> samples (which are generated via OA reports). This limits the ability to >> correlate these samples with the system events. >> >> An ability is therefore needed to report timestamps in different clock >> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more >> practical use to the userspace. This ability becomes important >> when we want to correlate/plot GPU events/samples with other system >> events >> on the same timeline (e.g. vblank events, or timestamps when work was >> submitted to kernel, etc.) >> >> The patch here proposes a mechanism to achieve this. The correlation >> between gpu time and system time is established using the timestamp >> clock >> associated with the command stream, abstracted as >> timecounter/cyclecounter >> to retrieve gpu/system time correlated values. >> >> v2: Added i915_driver_init_late() function to capture the new late init >> phase for perf (Chris) >> >> v3: Removed cross-timestamp changes. >> >> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> >> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Sourab Gupta <sourab.gupta@intel.com> >> Cc: Matthew Auld <matthew.auld@intel.com> >> --- >> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++ >> include/uapi/drm/i915_drm.h | 7 +++++++ >> 2 files changed, 34 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c >> b/drivers/gpu/drm/i915/i915_perf.c >> index 3b721d7..94ee924 100644 >> --- a/drivers/gpu/drm/i915/i915_perf.c >> +++ b/drivers/gpu/drm/i915/i915_perf.c >> @@ -336,6 +336,7 @@ >> #define SAMPLE_OA_REPORT BIT(0) >> #define SAMPLE_GPU_TS BIT(1) >> +#define SAMPLE_SYSTEM_TS BIT(2) >> /** >> * struct perf_open_properties - for validated properties given to >> open a stream >> @@ -622,6 +623,7 @@ static int append_oa_sample(struct >> i915_perf_stream *stream, >> struct drm_i915_perf_record_header header; >> u32 sample_flags = stream->sample_flags; >> u64 gpu_ts = 0; >> + u64 system_ts = 0; >> header.type = DRM_I915_PERF_RECORD_SAMPLE; >> header.pad = 0; >> @@ -647,6 +649,23 @@ static int append_oa_sample(struct >> i915_perf_stream *stream, >> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE)) >> return -EFAULT; >> + buf += I915_PERF_TS_SAMPLE_SIZE; > > This is a ridiculous nit, but I think using sizeof(u64) would be more > readable than this I915_PERF_TS_SAMPLE_SIZE define. Will update. Thanks. > >> + } >> + >> + if (sample_flags & SAMPLE_SYSTEM_TS) { >> + gpu_ts = get_gpu_ts_from_oa_report(stream, report); >> + /* >> + * XXX: timecounter_cyc2time considers time backwards if delta >> + * timestamp is more than half the max ns time covered by >> + * counter. It will be ~35min for 36 bit counter. If this much >> + * sampling duration is needed we will have to update tc->nsec >> + * by explicitly reading the timecounter (timecounter_read) >> + * before this duration. >> + */ >> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts); >> + >> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE)) >> + return -EFAULT; >> } >> (*offset) += header.size; >> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct >> i915_perf_stream *stream, >> stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; >> } >> + if (props->sample_flags & SAMPLE_SYSTEM_TS) { >> + stream->sample_flags |= SAMPLE_SYSTEM_TS; >> + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; >> + } >> + >> dev_priv->perf.oa.oa_buffer.format_size = format_size; >> if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0)) >> return -EINVAL; >> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct >> drm_i915_private *dev_priv, >> case DRM_I915_PERF_PROP_SAMPLE_GPU_TS: >> props->sample_flags |= SAMPLE_GPU_TS; >> break; >> + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS: >> + props->sample_flags |= SAMPLE_SYSTEM_TS; >> + break; >> case DRM_I915_PERF_PROP_OA_METRICS_SET: >> if (value == 0) { >> DRM_DEBUG("Unknown OA metric set ID\n"); >> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h >> index 0b9249e..283859c 100644 >> --- a/include/uapi/drm/i915_drm.h >> +++ b/include/uapi/drm/i915_drm.h >> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id { >> DRM_I915_PERF_PROP_SAMPLE_GPU_TS, >> /** >> + * This property requests inclusion of CLOCK_MONOTONIC system >> time in >> + * the perf sample data. >> + */ >> + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS, >> + >> + /** >> * The value specifies which set of OA unit metrics should be >> * be configured, defining the contents of any OA unit reports. >> */ >> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type { >> * >> * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA >> * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS >> + * { u64 system_timestamp; } && >> DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS > > I would just add those u64 fields before oa_report. > Since the report sizes are dictated by the hardware, I'm afraid that > one day someone might come up with a non 64bit aligned format (however > unlikely). > And since the new properties mean you need to be aware of the > potential new offsets, it's not breaking existing userspace. Sure. Will update. > >> * }; >> */ >> DRM_I915_PERF_RECORD_SAMPLE = 1, > >
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 3b721d7..94ee924 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -336,6 +336,7 @@ #define SAMPLE_OA_REPORT BIT(0) #define SAMPLE_GPU_TS BIT(1) +#define SAMPLE_SYSTEM_TS BIT(2) /** * struct perf_open_properties - for validated properties given to open a stream @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, struct drm_i915_perf_record_header header; u32 sample_flags = stream->sample_flags; u64 gpu_ts = 0; + u64 system_ts = 0; header.type = DRM_I915_PERF_RECORD_SAMPLE; header.pad = 0; @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream, if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE)) return -EFAULT; + buf += I915_PERF_TS_SAMPLE_SIZE; + } + + if (sample_flags & SAMPLE_SYSTEM_TS) { + gpu_ts = get_gpu_ts_from_oa_report(stream, report); + /* + * XXX: timecounter_cyc2time considers time backwards if delta + * timestamp is more than half the max ns time covered by + * counter. It will be ~35min for 36 bit counter. If this much + * sampling duration is needed we will have to update tc->nsec + * by explicitly reading the timecounter (timecounter_read) + * before this duration. + */ + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts); + + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE)) + return -EFAULT; } (*offset) += header.size; @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; } + if (props->sample_flags & SAMPLE_SYSTEM_TS) { + stream->sample_flags |= SAMPLE_SYSTEM_TS; + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE; + } + dev_priv->perf.oa.oa_buffer.format_size = format_size; if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0)) return -EINVAL; @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, case DRM_I915_PERF_PROP_SAMPLE_GPU_TS: props->sample_flags |= SAMPLE_GPU_TS; break; + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS: + props->sample_flags |= SAMPLE_SYSTEM_TS; + break; case DRM_I915_PERF_PROP_OA_METRICS_SET: if (value == 0) { DRM_DEBUG("Unknown OA metric set ID\n"); diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 0b9249e..283859c 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id { DRM_I915_PERF_PROP_SAMPLE_GPU_TS, /** + * This property requests inclusion of CLOCK_MONOTONIC system time in + * the perf sample data. + */ + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS, + + /** * The value specifies which set of OA unit metrics should be * be configured, defining the contents of any OA unit reports. */ @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type { * * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS + * { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS * }; */ DRM_I915_PERF_RECORD_SAMPLE = 1,