Message ID | 20171013204929.1748-2-antonio.argenziano@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Antonio Argenziano (2017-10-13 21:49:29) > A client that submits 'bad' contexts will be banned eventually while > other clients are not affected. Add a test for this. > > v2 > - Do not use a fixed number of contexts to ban client (Chris) > > Signed-off-by: Antonio Argenziano <antonio.argenziano@intel.com> > > Cc: Michel Thierry <michel.thierry@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > --- > tests/gem_reset_stats.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 58 insertions(+) > > diff --git a/tests/gem_reset_stats.c b/tests/gem_reset_stats.c > index edc40767..da309237 100644 > --- a/tests/gem_reset_stats.c > +++ b/tests/gem_reset_stats.c > @@ -438,6 +438,61 @@ static void test_ban_ctx(const struct intel_execution_engine *e) > close(fd); > } > > +static void test_client_ban(const struct intel_execution_engine *e) > +{ > + int fd_bad,fd_good; > + struct local_drm_i915_reset_stats rs_bad, rs_good; > + int ban, ctx_bans_retry = 10; > + int client_ban, client_bans_retry = 10; > + uint32_t ctx_bad; > + uint32_t test_ctx; > + int active_count = 0; /* * We try to prevent DoS GPU hang attacks from one client affecting the * service of other clients by banning the malicious client. * * To simulate this, we have one client submit a sequence of GPU hangs * that we expect to be banned (the client will no longer be allowed * to submit new work). Meanwhile a second client will not be affected * and allowed to continue. * * Note this is an extremely simplistic mechanism. For example, each * context inside an fd may represent a separate client (e.g. WebGL) * in which case the per-fd ban allow one client to DoS a second. * On the other hand, a malicious client may be creating a new context * for each hang to circumvent per-context restrictions. It may even be * creating a new fd each time to circumvent the per-fd restrictions. * Finally, the most malicious of all clients may succeed in wedging * the driver, bringing everyone to a standstill. In short, this DoS * prevention is itself suspect. * * So given the above what exactly to we want to define as the * expected behaviour? The answer is none of the above; use a * timeslicing scheduler and no banning. Any context is then allowed to * use as much as its time slice spinning as it wants; it won't be * allowed to steal GPU time from other processes. DoS averted. */ I keep coming to the conclusion that banning is a temporary hack, and not mandatory uABI, like hangcheck. The usual answer is that we then describe the implementation's behaviour through say a CONTEXT_GETPARAM and let userspace factor that into account. But again, I to have ask what requirements has userspace? https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_robustness.txt (And how are the requirements going to change?) The implication from that is that we could ban a context after the first hang, but mesa isn't ready for that! Maybe you should try writing an explanation of exactly what behaviour you are trying to elicit here and why? :) One thing that seems nice on the surface for execlists is repriotising after a hang to allow all innocent parties to make progress at the expense of the guilty. You will probably think of a few other approaches that make sense as you think about what userspace expects. > + fd_bad = drm_open_driver(DRIVER_INTEL); > + fd_good = drm_open_driver(DRIVER_INTEL); > + > + assert_reset_status(fd_bad, fd_bad, 0, RS_NO_ERROR); > + assert_reset_status(fd_good, fd_good, 0, RS_NO_ERROR); > + > + while (client_bans_retry--) { > + client_ban = __gem_context_create(fd_bad, &ctx_bad); > + if (client_ban == -EIO) > + break; > + > + noop(fd_bad, ctx_bad, e); > + assert_reset_status(fd_bad, fd_bad, ctx_bad, RS_NO_ERROR); > + > + ctx_bans_retry = 10; > + active_count = 0; > + while (ctx_bans_retry--) { > + inject_hang(fd_bad, ctx_bad, e, BAN); > + active_count++; > + > + ban = noop(fd_bad, ctx_bad, e); > + if (ban == -EIO) > + break; There's a major variation in submitting a queue of hangs that also needs to be taken into account (with different code paths in the kernel). -Chris
diff --git a/tests/gem_reset_stats.c b/tests/gem_reset_stats.c index edc40767..da309237 100644 --- a/tests/gem_reset_stats.c +++ b/tests/gem_reset_stats.c @@ -438,6 +438,61 @@ static void test_ban_ctx(const struct intel_execution_engine *e) close(fd); } +static void test_client_ban(const struct intel_execution_engine *e) +{ + int fd_bad,fd_good; + struct local_drm_i915_reset_stats rs_bad, rs_good; + int ban, ctx_bans_retry = 10; + int client_ban, client_bans_retry = 10; + uint32_t ctx_bad; + uint32_t test_ctx; + int active_count = 0; + + fd_bad = drm_open_driver(DRIVER_INTEL); + fd_good = drm_open_driver(DRIVER_INTEL); + + assert_reset_status(fd_bad, fd_bad, 0, RS_NO_ERROR); + assert_reset_status(fd_good, fd_good, 0, RS_NO_ERROR); + + while (client_bans_retry--) { + client_ban = __gem_context_create(fd_bad, &ctx_bad); + if (client_ban == -EIO) + break; + + noop(fd_bad, ctx_bad, e); + assert_reset_status(fd_bad, fd_bad, ctx_bad, RS_NO_ERROR); + + ctx_bans_retry = 10; + active_count = 0; + while (ctx_bans_retry--) { + inject_hang(fd_bad, ctx_bad, e, BAN); + active_count++; + + ban = noop(fd_bad, ctx_bad, e); + if (ban == -EIO) + break; + } + + igt_assert_eq(ban, -EIO); + + assert_reset_status(fd_bad, fd_bad, ctx_bad, RS_BATCH_ACTIVE); + igt_assert_eq(gem_reset_stats(fd_bad, ctx_bad, &rs_bad), 0); + igt_assert_eq(rs_bad.batch_active, active_count); + + igt_debug("retrying for client ban (%d)\n", client_bans_retry); + } + + igt_assert_eq(__gem_context_create(fd_bad, &test_ctx), -EIO); + + igt_assert_lt(0, noop(fd_good, 0, e)); + assert_reset_status(fd_good, fd_good, 0, RS_NO_ERROR); + igt_assert_eq(gem_reset_stats(fd_good, 0, &rs_good), 0); + igt_assert_eq(rs_good.batch_active, 0); + + close(fd_bad); + close(fd_good); +} + static void test_unrelated_ctx(const struct intel_execution_engine *e) { int fd1,fd2; @@ -817,6 +872,9 @@ igt_main igt_subtest_f("ban-ctx-%s", e->name) RUN_CTX_TEST(test_ban_ctx(e)); + igt_subtest_f("ban-client-%s", e->name) + RUN_CTX_TEST(test_client_ban(e)); + igt_subtest_f("reset-count-%s", e->name) RUN_TEST(test_reset_count(e, false));
A client that submits 'bad' contexts will be banned eventually while other clients are not affected. Add a test for this. v2 - Do not use a fixed number of contexts to ban client (Chris) Signed-off-by: Antonio Argenziano <antonio.argenziano@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> --- tests/gem_reset_stats.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+)