
[RFC,05/22] drm/i915: Implement command parsing

Message ID 1385484699-51596-6-git-send-email-bradley.d.volkin@intel.com (mailing list archive)
State New, archived

Commit Message

bradley.d.volkin@intel.com Nov. 26, 2013, 4:51 p.m. UTC
From: Brad Volkin <bradley.d.volkin@intel.com>

At this point the parser just implements the mechanics of finding
commands in the batch buffer and looking them up in the parser tables.

It rejects poorly formed batches (e.g. batches with no MI_BATCH_BUFFER_END
command or containing unrecognized commands) but does no other checks.
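
The core of that check, stripped of the driver specifics, is a walk over the batch that decodes each command's length and bails out on an overrun or a missing terminator. A simplified userspace sketch (the fixed/variable length split and LENGTH_BIAS mirror the patch; the single 8-bit length field here is an assumption for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MI_BATCH_BUFFER_END 0x05000000u	/* 0x0a << 23 */
#define LENGTH_BIAS 2

/* Walk the batch one command at a time. Every command's length is taken
 * from its low 8 bits plus LENGTH_BIAS -- a simplification of the
 * per-descriptor fixed/masked lengths in the patch. Returns 0 for a
 * well-formed batch, -1 on an overrun or a missing MI_BATCH_BUFFER_END. */
static int parse_batch(const uint32_t *batch, size_t ndwords)
{
	const uint32_t *cmd = batch;
	const uint32_t *end = batch + ndwords;

	while (cmd < end) {
		uint32_t length;

		if (*cmd == MI_BATCH_BUFFER_END)
			return 0;

		length = (*cmd & 0xff) + LENGTH_BIAS;
		if ((size_t)(end - cmd) < length)
			return -1;	/* command overruns the batch */

		cmd += length;
	}

	return -1;	/* ran off the end without a BBE */
}
```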

The parser is optionally enabled by a module parameter and currently
defaults to off. We'll enable it by default at the end of the series.

The code to look up commands in the per-ring tables implements a linear
search. The tables are small so in practice this does not appear to cause
a performance issue. However, the tables are sorted by command opcode such
that we can move to something like binary search if this code becomes a
bottleneck in the future.

OTC-Tracker: AXIA-4631
Change-Id: If71f50d28578d27325c3438abf4b229c7cf695cd
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 148 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.c            |   5 +
 drivers/gpu/drm/i915/i915_drv.h            |   4 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +++
 4 files changed, 172 insertions(+)

Comments

Chris Wilson Nov. 26, 2013, 5:29 p.m. UTC | #1
On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> +		  u32 cmd_header)
> +{
> +	int i;
> +
> +	for (i = 0; i < table->count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> +
> +		if (masked_cmd == masked_value)
> +			return desc;

Maybe pre-sort the cmd table and use bsearch? And a runtime test on
module load to check that the table is sorted correctly.
-Chris
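
The module-load check suggested here could be sketched roughly as follows (a hypothetical userspace model; it assumes a descriptor's sort key is its masked command value, which is what a later bsearch comparator would use):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the patch's drm_i915_cmd_descriptor. */
struct cmd_descriptor {
	uint32_t mask;
	uint32_t value;
};

/* Hypothetical module-load sanity check: verify a table is sorted by
 * masked command value, so a switch to bsearch stays valid as new
 * commands are added. */
static bool cmd_table_is_sorted(const struct cmd_descriptor *table,
				int count)
{
	int i;

	for (i = 1; i < count; i++) {
		uint32_t prev = table[i - 1].value & table[i - 1].mask;
		uint32_t curr = table[i].value & table[i].mask;

		if (curr < prev)
			return false;
	}

	return true;
}
```
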
bradley.d.volkin@intel.com Nov. 26, 2013, 5:38 p.m. UTC | #2
On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > +		  u32 cmd_header)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < table->count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > +
> > +		if (masked_cmd == masked_value)
> > +			return desc;
> 
> Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> module load to check that the table is sorted correctly.

So far it doesn't look like the search is a bottleneck, but I've tried to keep
the tables sorted so that we could easily switch to bsearch if needed. Would
you prefer to just use bsearch from the start?

I agree that the module load check makes sense if we do switch.

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
Chris Wilson Nov. 26, 2013, 5:56 p.m. UTC | #3
On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > +static const struct drm_i915_cmd_descriptor*
> > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > +		  u32 cmd_header)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < table->count; i++) {
> > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > +
> > > +		if (masked_cmd == masked_value)
> > > +			return desc;
> > 
> > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > module load to check that the table is sorted correctly.
> 
> So far it doesn't look like the search is a bottleneck, but I've tried to keep
> the tables sorted so that we could easily switch to bsearch if needed. Would
> you prefer to just use bsearch from the start?

Yes. I think it will be easier (especially if the code does the runtime
check) to keep the table sorted as commands are incrementally added in the
future, rather than having to retrofit the bsearch when it becomes a
significant problem.
-Chris
bradley.d.volkin@intel.com Nov. 26, 2013, 6:55 p.m. UTC | #4
On Tue, Nov 26, 2013 at 09:56:09AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > > +		  u32 cmd_header)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < table->count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > > +
> > > > +		if (masked_cmd == masked_value)
> > > > +			return desc;
> > > 
> > > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > > module load to check that the table is sorted correctly.
> > 
> > So far it doesn't look like the search is a bottleneck, but I've tried to keep
> > the tables sorted so that we could easily switch to bsearch if needed. Would
> > you prefer to just use bsearch from the start?
> 
> Yes. I think it will be easier (especially if the code does the runtime
> check) to keep the table sorted as commands are incrementally added in the
> future, rather than having to retrofit the bsearch when it becomes a
> significant problem.

Ok, makes sense. I'll assume the same comment applies to the register whitelists and
make similar changes there.

Brad

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
bradley.d.volkin@intel.com Dec. 5, 2013, 9:10 p.m. UTC | #5
On Tue, Nov 26, 2013 at 09:56:09AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > > +		  u32 cmd_header)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < table->count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > > +
> > > > +		if (masked_cmd == masked_value)
> > > > +			return desc;
> > > 
> > > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > > module load to check that the table is sorted correctly.
> > 
> > So far it doesn't look like the search is a bottleneck, but I've tried to keep
> > the tables sorted so that we could easily switch to bsearch if needed. Would
> > you prefer to just use bsearch from the start?
> 
> Yes. I think it will be easier (especially if the code does the runtime
> check) to keep the table sorted as commands are incrementally added in the
> future, rather than having to retrofit the bsearch when it becomes a
> significant problem.
> -Chris

A quick test is showing that bsearch() with the current table definitions is worse
than the linear search by ~7%. If I flatten the tables so there's one table with
all of the commands for a given ring, bsearch() is ~2% better than linear search.

So I'm inclined to add the code to check that the list is sorted but stick with linear
search for now. I'll revisit when we have more complete performance data.
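
For reference, the flattened-table variant measured above could look roughly like this, with the C library's bsearch() standing in for the kernel's lib/bsearch helper (a sketch, not the patch's code; it is only valid when the single per-ring table is sorted by value & mask):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's drm_i915_cmd_descriptor. */
struct cmd_descriptor {
	uint32_t mask;
	uint32_t value;
};

/* bsearch() comparator: key is the raw command header, elt a
 * descriptor. Only valid if the flattened table is sorted by
 * (value & mask) and the masks consistently cover the opcode bits. */
static int cmd_desc_cmp(const void *key, const void *elt)
{
	uint32_t header = *(const uint32_t *)key;
	const struct cmd_descriptor *desc = elt;
	uint32_t masked_cmd = header & desc->mask;
	uint32_t masked_value = desc->value & desc->mask;

	if (masked_cmd < masked_value)
		return -1;
	if (masked_cmd > masked_value)
		return 1;
	return 0;
}

static const struct cmd_descriptor *
find_cmd_bsearch(const struct cmd_descriptor *table, size_t count,
		 uint32_t cmd_header)
{
	return bsearch(&cmd_header, table, count, sizeof(*table),
		       cmd_desc_cmp);
}
```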

Brad

> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

Patch

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 247d530..b01628e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -232,3 +232,151 @@  void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 			  ring->id);
 	}
 }
+
+static const struct drm_i915_cmd_descriptor*
+find_cmd_in_table(const struct drm_i915_cmd_table *table,
+		  u32 cmd_header)
+{
+	int i;
+
+	for (i = 0; i < table->count; i++) {
+		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
+		u32 masked_cmd = desc->cmd.mask & cmd_header;
+		u32 masked_value = desc->cmd.value & desc->cmd.mask;
+
+		if (masked_cmd == masked_value)
+			return desc;
+	}
+
+	return NULL;
+}
+
+/* Returns a pointer to a descriptor for the command specified by cmd_header.
+ *
+ * The caller must supply space for a default descriptor via the default_desc
+ * parameter. If no descriptor for the specified command exists in the ring's
+ * command parser tables, this function fills in default_desc based on the
+ * ring's default length encoding and returns default_desc.
+ */
+static const struct drm_i915_cmd_descriptor*
+find_cmd(struct intel_ring_buffer *ring,
+	 u32 cmd_header,
+	 struct drm_i915_cmd_descriptor *default_desc)
+{
+	u32 mask;
+	int i;
+
+	for (i = 0; i < ring->cmd_table_count; i++) {
+		const struct drm_i915_cmd_descriptor *desc;
+
+		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
+		if (desc)
+			return desc;
+	}
+
+	mask = ring->get_cmd_length_mask(cmd_header);
+	if (!mask)
+		return NULL;
+
+	BUG_ON(!default_desc);
+	default_desc->flags = CMD_DESC_SKIP;
+	default_desc->length.mask = mask;
+
+	return default_desc;
+}
+
+static u32 *vmap_batch(struct drm_i915_gem_object *obj)
+{
+	int i;
+	void *addr = NULL;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG("Failed to get space for pages\n");
+		goto finish;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
+		pages[i] = sg_page_iter_page(&sg_iter);
+		i++;
+	}
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	if (addr == NULL) {
+		DRM_DEBUG("Failed to vmap pages\n");
+		goto finish;
+	}
+
+finish:
+	if (pages)
+		drm_free_large(pages);
+	return (u32 *)addr;
+}
+
+#define LENGTH_BIAS 2
+
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset)
+{
+	int ret = 0;
+	u32 *cmd, *batch_base, *batch_end;
+	struct drm_i915_cmd_descriptor default_desc = { 0 };
+
+	/* No command tables currently indicates a platform without parsing */
+	if (!ring->cmd_tables)
+		return 0;
+
+	batch_base = vmap_batch(batch_obj);
+	if (!batch_base) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		return -ENOMEM;
+	}
+
+	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
+	batch_end = batch_base + (batch_obj->base.size / sizeof(*batch_end));
+
+	while (cmd < batch_end) {
+		const struct drm_i915_cmd_descriptor *desc;
+		u32 length;
+
+		if (*cmd == MI_BATCH_BUFFER_END)
+			break;
+
+		desc = find_cmd(ring, *cmd, &default_desc);
+		if (!desc) {
+			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+					 *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_FIXED)
+			length = desc->length.fixed;
+		else
+			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+
+		if ((batch_end - cmd) < length) {
+			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
+					 *cmd,
+					 length,
+					 batch_end - cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		cmd += length;
+	}
+
+	if (cmd >= batch_end) {
+		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
+		ret = -EINVAL;
+	}
+
+	vunmap(batch_base);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 13076db..90d7db0 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -154,6 +154,11 @@  module_param_named(prefault_disable, i915_prefault_disable, bool, 0600);
 MODULE_PARM_DESC(prefault_disable,
 		"Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only.");
 
+int i915_enable_cmd_parser __read_mostly = 0;
+module_param_named(enable_cmd_parser, i915_enable_cmd_parser, int, 0600);
+MODULE_PARM_DESC(enable_cmd_parser,
+		"Enable command parsing (default: false)");
+
 static struct drm_driver driver;
 
 static const struct intel_device_info intel_i830_info = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7de4e59..81ef047 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1926,6 +1926,7 @@  extern bool i915_fastboot __read_mostly;
 extern int i915_enable_pc8 __read_mostly;
 extern int i915_pc8_timeout __read_mostly;
 extern bool i915_prefault_disable __read_mostly;
+extern int i915_enable_cmd_parser __read_mostly;
 
 extern int i915_suspend(struct drm_device *dev, pm_message_t state);
 extern int i915_resume(struct drm_device *dev);
@@ -2374,6 +2375,9 @@  const char *i915_cache_level_str(int type);
 
 /* i915_cmd_parser.c */
 void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset);
 
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b800fe4..06975c7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1143,6 +1143,21 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
+	if (i915_enable_cmd_parser && !(flags & I915_DISPATCH_SECURE)) {
+		ret = i915_parse_cmds(ring,
+				      batch_obj,
+				      args->batch_start_offset);
+		if (ret)
+			goto err;
+
+		/* Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
+		 * from MI_BATCH_BUFFER_START commands issued in the
+		 * dispatch_execbuffer implementations. We specifically don't
+		 * want that set when the command parser is enabled.
+		 */
+		flags |= I915_DISPATCH_SECURE;
+	}
+
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */