Message ID | 1301621509-23107-1-git-send-email-ben@bwidawsk.net (mailing list archive)
---|---
State | New, archived
On Fri, 01 Apr 2011 11:51:23 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Fri, 01 Apr 2011 08:32:09 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > I'm just not happy about haphazard locking. Can we do simple and safe
> > locking and revisit it if a real use-case for brute-forcing the read/write
> > is found?
>
> The concern we had was "I want to be sure that when the GPU is hung we
> can still reg_dump so people can write bug reports. The mutex lock try
> should have a timeout, and we should go ahead and just try forcewaking
> and reading the reg anyway after a while."

If the GPU has hung (and, more importantly, in such a way as to block the
mutex - nowadays that is an OOPS with the lock held), we can simply do the
reg read/write from userspace without worry. But for bug reports, I'd much
rather we automatically capture any register that we might need than rely
on the frustrated user doing the right thing. So I don't see the haphazard
locking in the kernel, for everyone to criticise us for, as justified.

> > It's not the performance I care about, but the atomicity. There seems to be
> > a growing abundance of messaging systems within the chip driven by
> > combinations of registers. I'd feel happier if we could send a message
> > without fouling or being fouled by the kernel.
>
> Generally for those things, you end up needing to poll in the middle.
> Let's not build a more complicated interface than required for current
> use-cases in userland (reading many registers, or writing an arbitrary
> one), without solving a specific problem.

What I guess I was trying to express was that we need to be very clear
what the interface is for and the limitations on its use.

For the more complicated sets of registers, we can and should expose knobs
in debugfs to read and write them - for instance, to control the render
clock frequencies and thresholds.

But perhaps we do need to reconsider the performance aspect. intel_gpu_top
samples the ring HEAD and TAIL at around 10KHz and forcing gt-wake is
about 50 microseconds... I hope I'm mistaken, because even batched that is
doomed. Ben, do you mind checking that thought experiment with a little
hard fact?
-Chris
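
[For reference, the back-of-envelope arithmetic behind that worry, assuming
one forcewake cycle per sample:

    10,000 samples/s x 50 us/wake = 0.5 s of stall per second of sampling

i.e. roughly half the time goes to waking the GT before a single register is
read, and double that if the HEAD and TAIL reads each pay the wake-up cost
separately.]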
On Sat, Apr 02, 2011 at 07:46:31AM +0100, Chris Wilson wrote:
>
> But perhaps we do need to reconsider the performance aspect. intel_gpu_top
> samples the ring HEAD and TAIL at around 10KHz and forcing gt-wake is
> about 50 microseconds... I hope I'm mistaken, because even batched that is
> doomed. Ben, do you mind checking that thought experiment with a little
> hard fact?

I can get some numbers for it... but I'd like to further the discussion a
bit first, since I have to go to the office to get a SNB, and typing is
easier than doing that right now :).

I too think we might be doomed. At the very least we have the POSTING_READ,
which is expensive. The udelays may or may not actually occur (I'll find
out). Let's not forget too that we do fix other tools, not just
intel_gpu_top, and those tools don't poll and don't care about timing
(granted, they're mostly less interesting too).

I think what we really need is to try to defer the forcewake_put, as well
as something like the last patch I sent
<1301105269-23970-2-git-send-email-ben@bwidawsk.net> to remove the need to
protect force_wake_get with struct_mutex, and keep a refcount.

I'd rather get the current stuff accepted, port the tools, and then make
improvements to the performance. However, if you feel the order must be
another way, I can work with that.

> -Chris
>

Thanks.
Ben
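
[A minimal sketch of the refcounted, deferred-put idea Ben describes. The
dev_priv fields forcewake_lock, forcewake_count and forcewake_work are
assumptions made up for illustration and are not part of the posted patch:

/* Sketch only: forcewake_lock (spinlock_t), forcewake_count (unsigned) and
 * forcewake_work (struct delayed_work) are assumed new drm_i915_private
 * fields, not part of the posted patch. */

void gen6_gt_force_wake_get(struct drm_i915_private *dev_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&dev_priv->forcewake_lock, flags);
	if (dev_priv->forcewake_count++ == 0)
		__gen6_gt_force_wake_get(dev_priv); /* the expensive wait */
	spin_unlock_irqrestore(&dev_priv->forcewake_lock, flags);
}

void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&dev_priv->forcewake_lock, flags);
	if (--dev_priv->forcewake_count == 0)
		/* don't drop the wake immediately; let closely spaced
		 * accesses reuse it instead of paying ~50us each time */
		schedule_delayed_work(&dev_priv->forcewake_work,
				      msecs_to_jiffies(1));
	spin_unlock_irqrestore(&dev_priv->forcewake_lock, flags);
}

static void gen6_gt_force_wake_put_work(struct work_struct *work)
{
	struct drm_i915_private *dev_priv =
		container_of(work, struct drm_i915_private,
			     forcewake_work.work);
	unsigned long flags;

	spin_lock_irqsave(&dev_priv->forcewake_lock, flags);
	/* only drop the hardware wake if nobody re-acquired it meanwhile */
	if (dev_priv->forcewake_count == 0)
		__gen6_gt_force_wake_put(dev_priv);
	spin_unlock_irqrestore(&dev_priv->forcewake_lock, flags);
}

The tradeoff: the very first access after idle still busy-waits for the GT
with the spinlock held, but everything that follows within the deferral
window skips both struct_mutex and the wake-up cost.]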
On Sat, Apr 02, 2011 at 07:46:31AM +0100, Chris Wilson wrote:
> What I guess I was trying to express was that we need to be very clear
> what the interface is for and the limitations on its use.
>
> For the more complicated sets of registers, we can and should expose knobs
> in debugfs to read and write them - for instance, to control the render
> clock frequencies and thresholds.
>
> But perhaps we do need to reconsider the performance aspect. intel_gpu_top
> samples the ring HEAD and TAIL at around 10KHz and forcing gt-wake is
> about 50 microseconds... I hope I'm mistaken, because even batched that is
> doomed. Ben, do you mind checking that thought experiment with a little
> hard fact?

Here is the data from ~100 samples taken while playing Armacycles Advanced,
measured off of d-i-f 7f58aabc369014fda3a4a33604ba0a1b63b941ac:

min    02.775us
max    19.402us
avg    07.057us
stddev 02.819us

When I do a cat /sys/kernel/debug/dri/0/i915_gem_interrupt, I always get
3 reads, in a similar pattern to this:

 6) ! 285.852 us  |  __gen6_gt_force_wake_get();
 6)   1.944 us    |  __gen6_gt_force_wake_get();
 6)   1.854 us    |  __gen6_gt_force_wake_get();

Not sure why that case is so different.

> -Chris

Ben
On Sun, 3 Apr 2011 18:35:04 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> Here is the data from ~100 samples taken while playing Armacycles Advanced,
> measured off of d-i-f 7f58aabc369014fda3a4a33604ba0a1b63b941ac:
>
> min    02.775us
> max    19.402us
> avg    07.057us
> stddev 02.819us
>
> When I do a cat /sys/kernel/debug/dri/0/i915_gem_interrupt, I always get
> 3 reads, in a similar pattern to this:
>
>  6) ! 285.852 us  |  __gen6_gt_force_wake_get();
>  6)   1.944 us    |  __gen6_gt_force_wake_get();
>  6)   1.854 us    |  __gen6_gt_force_wake_get();
>
> Not sure why that case is so different.

Presumably the high cost is incurred when we need to wait for the GT to
power up; the hardware has its own hysteresis and will delay before
powering down again. [It also looks like we always have to wait for at
least one loop. Perhaps a posting read is in order?]

So at least we don't have to worry about doing that ourselves - adding a
spinlock just for a performance optimisation on a seldom-used debugging
ioctl is painful.
-Chris
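
[To make that bracketed suggestion concrete, a rough sketch - not the actual
upstream function, which may differ in detail - of a gen6 wake that flushes
the FORCEWAKE write with a posting read before polling the ack, so the first
poll has a chance of seeing the ack already asserted:

/* Rough sketch for illustration only. */
static void gen6_gt_force_wake_get_sketch(struct drm_i915_private *dev_priv)
{
	int count = 0;

	I915_WRITE_NOTRACE(FORCEWAKE, 1);
	POSTING_READ(FORCEWAKE);	/* flush the write before polling the ack */

	/* The ~285us first call is dominated by this poll while the GT
	 * powers up; once awake (hardware hysteresis), later calls find
	 * the ack already set and return in a microsecond or two. */
	while (count++ < 50 && (I915_READ_NOTRACE(FORCEWAKE_ACK) & 1) == 0)
		udelay(10);
}

Whether the posting read actually saves a loop iteration, or the GT simply
takes that long to power up regardless, is the kind of thing Ben's tracing
should show.]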
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 7273037..8b5165d 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1846,6 +1846,133 @@ out_unlock:
 }
 EXPORT_SYMBOL_GPL(i915_gpu_turbo_disable);
 
+
+/* If we assume registers are dword aligned, second check is redundant */
+#define REG_IN_RANGE(reg, begin, end) \
+	((reg <= end && reg >= begin) || \
+	 ((reg+3) <= end && (reg+3) >= begin))
+
+/*
+ * This is here to avoid hanging the GPU from userspace.
+ * The ranges are defined in the "Register Address Map" section of the
+ * documents.
+ */
+static inline int
+reg_in_rsvd_range(struct drm_device *dev, u32 reg)
+{
+	if (INTEL_INFO(dev)->gen < 6) {
+		if (REG_IN_RANGE(reg, 0x1000, 0x1fff) ||
+		    ((IS_BROADWATER(dev) || IS_CRESTLINE(dev)) ?
+		     REG_IN_RANGE(reg, 0x4000, 0x4fff) : 0) ||
+		    REG_IN_RANGE(reg, 0x9000, 0x9fff) ||
+		    REG_IN_RANGE(reg, 0xb000, 0xffff) ||
+		    REG_IN_RANGE(reg, 0x14000, 0x2ffff) ||
+		    REG_IN_RANGE(reg, 0x40000, 0x5ffff))
+			return 1;
+
+		if (reg > 0x73fff)
+			return 1;
+	} else {
+		if (REG_IN_RANGE(reg, 0x1000, 0x1fff) ||
+		    REG_IN_RANGE(reg, 0x9000, 0x9fff) ||
+		    REG_IN_RANGE(reg, 0xb000, 0xffff) ||
+		    REG_IN_RANGE(reg, 0x15000, 0x21fff) ||
+		    REG_IN_RANGE(reg, 0x23000, 0x23fff) ||
+		    REG_IN_RANGE(reg, 0x25000, 0x2ffff) ||
+		    REG_IN_RANGE(reg, 0x40000, 0xfffff) ||
+		    REG_IN_RANGE(reg, 0x108000, 0x13ffff))
+			return 1;
+
+		if (reg > 0x17ffff)
+			return 1;
+	}
+
+	return 0;
+}
+
+
+#define REG_IN_READ_BLACKLIST(reg) 0
+#define REG_IN_WRITE_BLACKLIST(reg) 0
+
+static inline int
+reg_read_allowed(struct drm_device *dev, u32 offset)
+{
+	if (offset & 0x3)
+		return 0;
+
+	if (reg_in_rsvd_range(dev, offset))
+		return 0;
+
+	if (REG_IN_READ_BLACKLIST(offset))
+		return 0;
+
+	return 1;
+}
+
+static inline int
+reg_write_allowed(struct drm_device *dev, u32 offset)
+{
+	if (offset & 0x3)
+		return 0;
+
+	if (reg_in_rsvd_range(dev, offset))
+		return 0;
+
+	if (REG_IN_WRITE_BLACKLIST(offset))
+		return 0;
+
+	return 1;
+}
+
+static int
+i915_read_register_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file_priv)
+{
+	struct drm_intel_read_reg *args = data;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	if (!reg_read_allowed(dev, args->offset)) {
+		args->value = 0xffffffff;
+		return -EINVAL;
+	}
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		DRM_DEBUG_DRIVER("Couldn't acquire mutex, reading anyway\n");
+
+	args->value = (u32)i915_gt_read(dev_priv, args->offset);
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
+static int
+i915_write_register_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct drm_intel_write_reg *args = data;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	if (!reg_write_allowed(dev, args->offset))
+		return -EINVAL;
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		DRM_DEBUG_DRIVER("Couldn't acquire mutex, writing anyway\n");
+
+	DRM_INFO("User space write %x %x\n", args->offset, (u32)args->value);
+	dev_priv->user_tainted = true;
+
+	i915_gt_write(dev_priv, args->offset, (u32)args->value);
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 /**
  * Tells the intel_ips driver that the i915 driver is now loaded, if
  * IPS got loaded first.
@@ -2276,6 +2403,8 @@ struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_OVERLAY_PUT_IMAGE, intel_overlay_put_image, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_OVERLAY_ATTRS, intel_overlay_attrs, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_READ_REGISTER, i915_read_register_ioctl, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_WRITE_REGISTER, i915_write_register_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
 };
 
 int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5004724..4c66734 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -703,6 +703,8 @@ typedef struct drm_i915_private {
 	struct intel_fbdev *fbdev;
 
 	struct drm_property *broadcast_rgb_property;
+
+	bool user_tainted;
 } drm_i915_private_t;
 
 struct drm_i915_gem_object {
diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index c4d6dbf..68eeaa3 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -198,6 +198,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_OVERLAY_PUT_IMAGE	0x27
 #define DRM_I915_OVERLAY_ATTRS	0x28
 #define DRM_I915_GEM_EXECBUFFER2	0x29
+#define DRM_I915_READ_REGISTER	0x2a
+#define DRM_I915_WRITE_REGISTER	0x2b
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -239,6 +241,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_MADVISE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MADVISE, struct drm_i915_gem_madvise)
 #define DRM_IOCTL_I915_OVERLAY_PUT_IMAGE	DRM_IOW(DRM_COMMAND_BASE + DRM_IOCTL_I915_OVERLAY_ATTRS, struct drm_intel_overlay_put_image)
 #define DRM_IOCTL_I915_OVERLAY_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_OVERLAY_ATTRS, struct drm_intel_overlay_attrs)
+#define DRM_IOCTL_I915_READ_REGISTER	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_READ_REGISTER, struct drm_intel_read_reg)
+#define DRM_IOCTL_I915_WRITE_REGISTER	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_WRITE_REGISTER, struct drm_intel_write_reg)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -844,4 +848,27 @@ struct drm_intel_overlay_attrs {
 	__u32 gamma5;
 };
 
+struct drm_intel_read_reg {
+	/* register offset to read */
+	__u32 offset;
+
+	/* register size, RFU */
+	__u8 size;
+
+	/* return value, high 4 bytes RFU */
+	__u64 value;
+};
+
+struct drm_intel_write_reg {
+	/* register offset to write */
+	__u32 offset;
+
+	/* register size, RFU */
+	__u8 size;
+
+	/* value to write, high 4 bytes RFU */
+	__u64 value;
+
+};
+
 #endif /* _I915_DRM_H_ */
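
[For completeness, an untested sketch of how a tool could drive the proposed
read ioctl through libdrm once the updated header is installed (error
handling trimmed; the helper name intel_read_reg is made up here):

#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "i915_drm.h"	/* the updated header from this patch */

/* Untested sketch of using the proposed DRM_IOCTL_I915_READ_REGISTER. */
static int intel_read_reg(int fd, uint32_t offset, uint64_t *value)
{
	struct drm_intel_read_reg arg;

	memset(&arg, 0, sizeof(arg));
	arg.offset = offset;

	if (drmIoctl(fd, DRM_IOCTL_I915_READ_REGISTER, &arg))
		return -errno;

	*value = arg.value;
	return 0;
}

intel_gpu_top could then call something like intel_read_reg(fd, 0x2034, &head)
(the render ring HEAD, if I have the offset right) in its sampling loop, which
is exactly the path whose forcewake cost is being measured above.]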