| Field | Value |
|---|---|
| Message ID | 1385484699-51596-6-git-send-email-bradley.d.volkin@intel.com (mailing list archive) |
| State | New, archived |
On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> +		  u32 cmd_header)
> +{
> +	int i;
> +
> +	for (i = 0; i < table->count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> +
> +		if (masked_cmd == masked_value)
> +			return desc;

Maybe pre-sort the cmd table and use bsearch? And a runtime test on
module load to check that the table is sorted correctly.
-Chris
On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > +		  u32 cmd_header)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < table->count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > +
> > +		if (masked_cmd == masked_value)
> > +			return desc;
>
> Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> module load to check that the table is sorted correctly.

So far it doesn't look like the search is a bottleneck, but I've tried to keep
the tables sorted so that we could easily switch to bsearch if needed. Would
you prefer to just use bsearch from the start? I agree that the module load
check makes sense if we do switch.

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > +static const struct drm_i915_cmd_descriptor*
> > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > +		  u32 cmd_header)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < table->count; i++) {
> > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > +
> > > +		if (masked_cmd == masked_value)
> > > +			return desc;
> >
> > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > module load to check that the table is sorted correctly.
>
> So far it doesn't look like the search is a bottleneck, but I've tried to keep
> the tables sorted so that we could easily switch to bsearch if needed. Would
> you prefer to just use bsearch from the start?

Yes. I think it will be easier (especially if the code does the runtime
check) to keep the table sorted as commands are incrementally added in the
future, rather than having to retrofit the bsearch when it becomes a
significant problem.
-Chris
On Tue, Nov 26, 2013 at 09:56:09AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > > +		  u32 cmd_header)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < table->count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > > +
> > > > +		if (masked_cmd == masked_value)
> > > > +			return desc;
> > >
> > > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > > module load to check that the table is sorted correctly.
> >
> > So far it doesn't look like the search is a bottleneck, but I've tried to keep
> > the tables sorted so that we could easily switch to bsearch if needed. Would
> > you prefer to just use bsearch from the start?
>
> Yes. I think it will be easier (especially if the code does the runtime
> check) to keep the table sorted as commands are incrementally added in the
> future, rather than having to retrofit the bsearch when it becomes a
> significant problem.

Ok, makes sense. I'll assume the same comment applies to the register
whitelists and make similar changes there.

Brad

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
On Tue, Nov 26, 2013 at 09:56:09AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > > +		  u32 cmd_header)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < table->count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > > +
> > > > +		if (masked_cmd == masked_value)
> > > > +			return desc;
> > >
> > > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > > module load to check that the table is sorted correctly.
> >
> > So far it doesn't look like the search is a bottleneck, but I've tried to keep
> > the tables sorted so that we could easily switch to bsearch if needed. Would
> > you prefer to just use bsearch from the start?
>
> Yes. I think it will be easier (especially if the code does the runtime
> check) to keep the table sorted as commands are incrementally added in the
> future, rather than having to retrofit the bsearch when it becomes a
> significant problem.
> -Chris

A quick test is showing that bsearch() with the current table definitions is
worse than the linear search by ~7%. If I flatten the tables so there's one
table with all of the commands for a given ring, bsearch() is ~2% better than
linear search. So I'm inclined to add the code to check that the list is
sorted but stick with linear search for now. I'll revisit when we have more
complete performance data.

Brad

> --
> Chris Wilson, Intel Open Source Technology Centre
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 247d530..b01628e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -232,3 +232,151 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 			 ring->id);
 	}
 }
+
+static const struct drm_i915_cmd_descriptor*
+find_cmd_in_table(const struct drm_i915_cmd_table *table,
+		  u32 cmd_header)
+{
+	int i;
+
+	for (i = 0; i < table->count; i++) {
+		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
+		u32 masked_cmd = desc->cmd.mask & cmd_header;
+		u32 masked_value = desc->cmd.value & desc->cmd.mask;
+
+		if (masked_cmd == masked_value)
+			return desc;
+	}
+
+	return NULL;
+}
+
+/* Returns a pointer to a descriptor for the command specified by cmd_header.
+ *
+ * The caller must supply space for a default descriptor via the default_desc
+ * parameter. If no descriptor for the specified command exists in the ring's
+ * command parser tables, this function fills in default_desc based on the
+ * ring's default length encoding and returns default_desc.
+ */
+static const struct drm_i915_cmd_descriptor*
+find_cmd(struct intel_ring_buffer *ring,
+	 u32 cmd_header,
+	 struct drm_i915_cmd_descriptor *default_desc)
+{
+	u32 mask;
+	int i;
+
+	for (i = 0; i < ring->cmd_table_count; i++) {
+		const struct drm_i915_cmd_descriptor *desc;
+
+		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
+		if (desc)
+			return desc;
+	}
+
+	mask = ring->get_cmd_length_mask(cmd_header);
+	if (!mask)
+		return NULL;
+
+	BUG_ON(!default_desc);
+	default_desc->flags = CMD_DESC_SKIP;
+	default_desc->length.mask = mask;
+
+	return default_desc;
+}
+
+static u32 *vmap_batch(struct drm_i915_gem_object *obj)
+{
+	int i;
+	void *addr = NULL;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG("Failed to get space for pages\n");
+		goto finish;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
+		pages[i] = sg_page_iter_page(&sg_iter);
+		i++;
+	}
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	if (addr == NULL) {
+		DRM_DEBUG("Failed to vmap pages\n");
+		goto finish;
+	}
+
+finish:
+	if (pages)
+		drm_free_large(pages);
+	return (u32*)addr;
+}
+
+#define LENGTH_BIAS 2
+
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset)
+{
+	int ret = 0;
+	u32 *cmd, *batch_base, *batch_end;
+	struct drm_i915_cmd_descriptor default_desc = { 0 };
+
+	/* No command tables currently indicates a platform without parsing */
+	if (!ring->cmd_tables)
+		return 0;
+
+	batch_base = vmap_batch(batch_obj);
+	if (!batch_base) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		return -ENOMEM;
+	}
+
+	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
+	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
+
+	while (cmd < batch_end) {
+		const struct drm_i915_cmd_descriptor *desc;
+		u32 length;
+
+		if (*cmd == MI_BATCH_BUFFER_END)
+			break;
+
+		desc = find_cmd(ring, *cmd, &default_desc);
+		if (!desc) {
+			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+					 *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_FIXED)
+			length = desc->length.fixed;
+		else
+			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+
+		if ((batch_end - cmd) < length) {
+			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
+					 *cmd,
+					 length,
+					 batch_end - cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		cmd += length;
+	}
+
+	if (cmd >= batch_end) {
+		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
+		ret = -EINVAL;
+	}
+
+	vunmap(batch_base);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 13076db..90d7db0 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -154,6 +154,11 @@ module_param_named(prefault_disable, i915_prefault_disable, bool, 0600);
 MODULE_PARM_DESC(prefault_disable,
 		"Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only.");
 
+int i915_enable_cmd_parser __read_mostly = 0;
+module_param_named(enable_cmd_parser, i915_enable_cmd_parser, int, 0600);
+MODULE_PARM_DESC(enable_cmd_parser,
+		 "Enable command parsing (default: false)");
+
 static struct drm_driver driver;
 
 static const struct intel_device_info intel_i830_info = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7de4e59..81ef047 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1926,6 +1926,7 @@ extern bool i915_fastboot __read_mostly;
 extern int i915_enable_pc8 __read_mostly;
 extern int i915_pc8_timeout __read_mostly;
 extern bool i915_prefault_disable __read_mostly;
+extern int i915_enable_cmd_parser __read_mostly;
 
 extern int i915_suspend(struct drm_device *dev, pm_message_t state);
 extern int i915_resume(struct drm_device *dev);
@@ -2374,6 +2375,9 @@ const char *i915_cache_level_str(int type);
 
 /* i915_cmd_parser.c */
 void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset);
 
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b800fe4..06975c7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1143,6 +1143,21 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
+	if (i915_enable_cmd_parser && !(flags & I915_DISPATCH_SECURE)) {
+		ret = i915_parse_cmds(ring,
+				      batch_obj,
+				      args->batch_start_offset);
+		if (ret)
+			goto err;
+
+		/* Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
+		 * from MI_BATCH_BUFFER_START commands issued in the
+		 * dispatch_execbuffer implementations. We specifically don't
+		 * want that set when the command parser is enabled.
+		 */
+		flags |= I915_DISPATCH_SECURE;
+	}
+
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */