Message ID | 20190918185041.22738-5-volodymyr_babchuk@epam.com (mailing list archive) |
---|---|
State | Superseded |
Series | arch/arm: optee: fix TODOs and change status to "Tech Preview" |
Hi,

On 18/09/2019 19:51, Volodymyr Babchuk wrote:
> +/* Handles return from Xen-issued RPC */
> +static void handle_xen_rpc_return(struct optee_domain *ctx,
> +                                  struct cpu_user_regs *regs,
> +                                  struct optee_std_call *call,
> +                                  struct shm_rpc *shm_rpc)
> +{
> +    call->state = OPTEE_CALL_NORMAL;
> +
> +    /*
> +     * Right now we have only one reason to be there - we asked guest
> +     * to free shared buffer and it did it. Now we can tell OP-TEE
> +     * that buffer allocation failed. We are not storing exact command
> +     * type, only type of RPC return. So, this is the only check we
> +     * can perform there.
> +     */
> +    ASSERT(call->rpc_op == OPTEE_SMC_RPC_FUNC_CMD);

As I pointed out in v1, ASSERT() is probably the least desirable
solution here, as this is an error path.

Can you explain why you chose it over the three solutions I suggested?

Cheers,
Hi Julien,

Julien Grall writes:

> Hi,
>
> On 18/09/2019 19:51, Volodymyr Babchuk wrote:
>> +/* Handles return from Xen-issued RPC */
>> +static void handle_xen_rpc_return(struct optee_domain *ctx,
>> +                                  struct cpu_user_regs *regs,
>> +                                  struct optee_std_call *call,
>> +                                  struct shm_rpc *shm_rpc)
>> +{
>> +    call->state = OPTEE_CALL_NORMAL;
>> +
>> +    /*
>> +     * Right now we have only one reason to be there - we asked guest
>> +     * to free shared buffer and it did it. Now we can tell OP-TEE
>> +     * that buffer allocation failed. We are not storing exact command
>> +     * type, only type of RPC return. So, this is the only check we
>> +     * can perform there.
>> +     */
>> +    ASSERT(call->rpc_op == OPTEE_SMC_RPC_FUNC_CMD);
>
> As I pointed out in v1, ASSERT() is probably the least desirable
> solution here, as this is an error path.

Looks like I misunderstood you :(

> Can you explain why you chose it over the three solutions I suggested?

I think I need some clarification here. ASSERT() is the accepted way to state
an invariant: clearly, it is a programming error if an ASSERT() fails.

But I understand that ASSERT() is compiled in only for debug builds, so in a
release build it will never fire. As this is a very unlikely error path, a
bug there could live forever. Okay, so ASSERT() is the least desirable way.

A warning is not enough, as we are already in an incorrect state.

So the best way is to crash the guest, is this correct? I'll do this in the
next version then.
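[Editor's note: for context, Xen's ASSERT() is compiled out of release
(non-debug) builds. The paraphrased sketch below follows the shape of the
definition in xen/include/xen/lib.h; exact details vary between Xen versions.]

/* Approximate paraphrase of Xen's ASSERT() (see xen/include/xen/lib.h). */
#ifndef NDEBUG
/* Debug build: a failed check stops the hypervisor with a message. */
#define ASSERT(p) \
    do { if ( unlikely(!(p)) ) assert_failed(#p); } while (0)
#else
/* Release build: the condition is never evaluated, so the check vanishes. */
#define ASSERT(p) do { if ( 0 && (p) ) {} } while (0)
#endif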
On 24/09/2019 12:37, Volodymyr Babchuk wrote:
>
> Hi Julien,
>
> Julien Grall writes:
>
>> Hi,
>>
>> On 18/09/2019 19:51, Volodymyr Babchuk wrote:
>>> +/* Handles return from Xen-issued RPC */
>>> +static void handle_xen_rpc_return(struct optee_domain *ctx,
>>> +                                  struct cpu_user_regs *regs,
>>> +                                  struct optee_std_call *call,
>>> +                                  struct shm_rpc *shm_rpc)
>>> +{
>>> +    call->state = OPTEE_CALL_NORMAL;
>>> +
>>> +    /*
>>> +     * Right now we have only one reason to be there - we asked guest
>>> +     * to free shared buffer and it did it. Now we can tell OP-TEE
>>> +     * that buffer allocation failed. We are not storing exact command
>>> +     * type, only type of RPC return. So, this is the only check we
>>> +     * can perform there.
>>> +     */
>>> +    ASSERT(call->rpc_op == OPTEE_SMC_RPC_FUNC_CMD);
>>
>> As I pointed out in v1, ASSERT() is probably the least desirable
>> solution here, as this is an error path.
> Looks like I misunderstood you :(
>
>> Can you explain why you chose it over the three solutions I suggested?
> I think I need some clarification here. ASSERT() is the accepted way to state
> an invariant: clearly, it is a programming error if an ASSERT() fails.

That's correct.

>
> But I understand that ASSERT() is compiled in only for debug builds, so in a
> release build it will never fire. As this is a very unlikely error path, a
> bug there could live forever. Okay, so ASSERT() is the least desirable way.

This is my concern: ASSERT()s are fine in paths that can be exercised quite
well. From experience, error paths are not so well tested, so any verbosity
in a non-debug build is always helpful.

>
> A warning is not enough, as we are already in an incorrect state.

Incorrect state for whom? The guest?

>
> So the best way is to crash the guest, is this correct? I'll do this in the
> next version then.

A rule of thumb is:

 - BUG_ON() can be replaced by domain_crash() when this is a fatal error you
   can't recover from (the scope depends on the function call).

 - ASSERT() can be replaced by WARN_ON(), as the former will be a NOP in a
   non-debug build. In both cases the error is not fatal and continuing will
   not cause further damage.

You used ASSERT() in your code, so it feels to me that WARN_ON() would be a
suitable replacement.

However, if the state inconsistency is for Xen, then domain_crash() is the
best. If this is for the guest, then this is not really our business and it
may be best to let it continue, as that could provide more debug information
(domain_crash() will only dump the stack and registers).

Cheers,
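[Editor's note: a purely hypothetical sketch of the two alternatives named in
the rule of thumb above; it is not code from this patch, and "cond" and "d"
are placeholders for the checked invariant and the affected domain.]

/* Alternative 1: non-fatal - keep going, but stay verbose in release builds. */
WARN_ON(!cond);                     /* instead of ASSERT(cond) */

/* Alternative 2: fatal for this domain - Xen cannot keep serving it safely. */
if ( !cond )
{
    gprintk(XENLOG_ERR, "unrecoverable OP-TEE mediator state\n");
    domain_crash(d);                /* instead of BUG_ON(!cond) */
    return;
}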
Julien Grall writes:

> On 24/09/2019 12:37, Volodymyr Babchuk wrote:
>>
>> Hi Julien,
>>
>> Julien Grall writes:
>>
>>> Hi,
>>>
>>> On 18/09/2019 19:51, Volodymyr Babchuk wrote:
>>>> +/* Handles return from Xen-issued RPC */
>>>> +static void handle_xen_rpc_return(struct optee_domain *ctx,
>>>> +                                  struct cpu_user_regs *regs,
>>>> +                                  struct optee_std_call *call,
>>>> +                                  struct shm_rpc *shm_rpc)
>>>> +{
>>>> +    call->state = OPTEE_CALL_NORMAL;
>>>> +
>>>> +    /*
>>>> +     * Right now we have only one reason to be there - we asked guest
>>>> +     * to free shared buffer and it did it. Now we can tell OP-TEE
>>>> +     * that buffer allocation failed. We are not storing exact command
>>>> +     * type, only type of RPC return. So, this is the only check we
>>>> +     * can perform there.
>>>> +     */
>>>> +    ASSERT(call->rpc_op == OPTEE_SMC_RPC_FUNC_CMD);
>>>
>>> As I pointed out in v1, ASSERT() is probably the least desirable
>>> solution here, as this is an error path.
>> Looks like I misunderstood you :(
>>
>>> Can you explain why you chose it over the three solutions I suggested?
>> I think I need some clarification here. ASSERT() is the accepted way to state
>> an invariant: clearly, it is a programming error if an ASSERT() fails.
>
> That's correct.
>
>>
>> But I understand that ASSERT() is compiled in only for debug builds, so in a
>> release build it will never fire. As this is a very unlikely error path, a
>> bug there could live forever. Okay, so ASSERT() is the least desirable way.
>
> This is my concern: ASSERT()s are fine in paths that can be exercised quite
> well. From experience, error paths are not so well tested, so any verbosity
> in a non-debug build is always helpful.

Yep, I see.

>>
>> A warning is not enough, as we are already in an incorrect state.
> Incorrect state for whom? The guest?

Yes, for the current call of the current guest. The state of other calls and
other guests should not be affected. But it is possible that our view of the
OP-TEE state for that guest is not correct either.

>>
>> So the best way is to crash the guest, is this correct? I'll do this in the
>> next version then.
>
> A rule of thumb is:
>
>  - BUG_ON() can be replaced by domain_crash() when this is a fatal error you
>    can't recover from (the scope depends on the function call).

This seems correct.

>
>  - ASSERT() can be replaced by WARN_ON(), as the former will be a NOP in a
>    non-debug build. In both cases the error is not fatal and continuing will
>    not cause further damage.

I can't agree with this in general, but maybe it makes sense for Xen. As I
said, generally ASSERT() is used to, well, assert that a condition is true
for any correct state of a program. So it can't be replaced with WARN_ON().
If we detect an invalid state, we should either try to correct it (if we know
how) or immediately stop the program.

But I can see why this behaviour is not desired for a hypervisor, especially
in production use.

> You used ASSERT() in your code, so it feels to me that WARN_ON() would be a
> suitable replacement.

Well, honestly I believe that it is better to crash the guest to prevent any
additional harm. Look, we have already detected that something is wrong with
the mediator state for a certain guest. We can pretend that all is fine and
try to continue the call, but who knows which consequences that will have?

> However, if the state inconsistency is for Xen, then domain_crash() is the
> best. If this is for the guest, then this is not really our business and it
> may be best to let it continue, as that could provide more debug information
> (domain_crash() will only dump the stack and registers).

I'm looking at this from a different point of view: we promised to provide
some service for a guest and screwed up. It is not the guest's fault. Now we
know that we can't provide a reliable service for that guest anymore. From a
safety point of view we should shut down the service (but this is a job for
another patch). For now, we should at least crash the guest. This is the
safest way. What do you think?

--
Volodymyr Babchuk at EPAM
On 24/09/2019 14:30, Volodymyr Babchuk wrote:
>
> Julien Grall writes:
>
>> On 24/09/2019 12:37, Volodymyr Babchuk wrote:
>>>
>>> Hi Julien,
>>>
>>> Julien Grall writes:
>>>
>>>> Hi,
>>>>
>>>> On 18/09/2019 19:51, Volodymyr Babchuk wrote:
>>>>> +/* Handles return from Xen-issued RPC */
>>>>> +static void handle_xen_rpc_return(struct optee_domain *ctx,
>>>>> +                                  struct cpu_user_regs *regs,
>>>>> +                                  struct optee_std_call *call,
>>>>> +                                  struct shm_rpc *shm_rpc)
>>>>> +{
>>>>> +    call->state = OPTEE_CALL_NORMAL;
>>>>> +
>>>>> +    /*
>>>>> +     * Right now we have only one reason to be there - we asked guest
>>>>> +     * to free shared buffer and it did it. Now we can tell OP-TEE
>>>>> +     * that buffer allocation failed. We are not storing exact command
>>>>> +     * type, only type of RPC return. So, this is the only check we
>>>>> +     * can perform there.
>>>>> +     */
>>>>> +    ASSERT(call->rpc_op == OPTEE_SMC_RPC_FUNC_CMD);
>>>>
>>>> As I pointed out in v1, ASSERT() is probably the least desirable
>>>> solution here, as this is an error path.
>>> Looks like I misunderstood you :(
>>>
>>>> Can you explain why you chose it over the three solutions I suggested?
>>> I think I need some clarification here. ASSERT() is the accepted way to state
>>> an invariant: clearly, it is a programming error if an ASSERT() fails.
>>
>> That's correct.
>>
>>>
>>> But I understand that ASSERT() is compiled in only for debug builds, so in a
>>> release build it will never fire. As this is a very unlikely error path, a
>>> bug there could live forever. Okay, so ASSERT() is the least desirable way.
>>
>> This is my concern: ASSERT()s are fine in paths that can be exercised quite
>> well. From experience, error paths are not so well tested, so any verbosity
>> in a non-debug build is always helpful.
> Yep, I see.
>
>>>
>>> A warning is not enough, as we are already in an incorrect state.
>> Incorrect state for whom? The guest?
> Yes, for the current call of the current guest. The state of other calls and
> other guests should not be affected. But it is possible that our view of the
> OP-TEE state for that guest is not correct either.
>
>>>
>>> So the best way is to crash the guest, is this correct? I'll do this in the
>>> next version then.
>>
>> A rule of thumb is:
>>
>>  - BUG_ON() can be replaced by domain_crash() when this is a fatal error you
>>    can't recover from (the scope depends on the function call).
> This seems correct.
>
>>
>>  - ASSERT() can be replaced by WARN_ON(), as the former will be a NOP in a
>>    non-debug build. In both cases the error is not fatal and continuing will
>>    not cause further damage.
> I can't agree with this in general, but maybe it makes sense for Xen. As I
> said, generally ASSERT() is used to, well, assert that a condition is true
> for any correct state of a program. So it can't be replaced with WARN_ON().
> If we detect an invalid state, we should either try to correct it (if we know
> how) or immediately stop the program.

I agree that asserts are here to check that a given condition is true.
However, as I pointed out, ASSERT()s are turned into NOPs in production
builds. So if there were a problem in that path, you would have happily
continued with the error.

In other words, ASSERT() is only here to help you catch any mistake during
development, by crashing the hypervisor so you get information on what
happened. But your code should be able to cope with any mistake in a
production build (or you know the mistake can't happen thanks to code
coverage).

This is very different from domain_crash(), where you know that your code is
not able to cope with the situation and you don't want the guest to continue.
Note that domain_crash() is asynchronous, so you will still execute some part
of the hypervisor (until leave_hypervisor_head() is called).

>
> But I can see why this behaviour is not desired for a hypervisor, especially
> in production use.
>
>> You used ASSERT() in your code, so it feels to me that WARN_ON() would be a
>> suitable replacement.
> Well, honestly I believe that it is better to crash the guest to prevent any
> additional harm. Look, we have already detected that something is wrong with
> the mediator state for a certain guest. We can pretend that all is fine and
> try to continue the call, but who knows which consequences that will have?
>
>> However, if the state inconsistency is for Xen, then domain_crash() is the
>> best. If this is for the guest, then this is not really our business and it
>> may be best to let it continue, as that could provide more debug information
>> (domain_crash() will only dump the stack and registers).
> I'm looking at this from a different point of view: we promised to provide
> some service for a guest and screwed up. It is not the guest's fault. Now we
> know that we can't provide a reliable service for that guest anymore. From a
> safety point of view we should shut down the service (but this is a job for
> another patch). For now, we should at least crash the guest. This is the
> safest way. What do you think?

I am happy with domain_crash().

Cheers,
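[Editor's note: following the agreement above, a minimal sketch of how
handle_xen_rpc_return() might look with the ASSERT() replaced by an explicit
check plus domain_crash(). This only illustrates the discussed direction and
is not necessarily the exact code of the next revision.]

/* Sketch only: the discussed rework of handle_xen_rpc_return(). */
static void handle_xen_rpc_return(struct optee_domain *ctx,
                                  struct cpu_user_regs *regs,
                                  struct optee_std_call *call,
                                  struct shm_rpc *shm_rpc)
{
    call->state = OPTEE_CALL_NORMAL;

    /*
     * The only RPC Xen issues is the "free shared buffer" command, so any
     * other return type means the mediator state is inconsistent. Crash
     * the guest instead of asserting, so the error is visible (and
     * contained) in release builds as well.
     */
    if ( call->rpc_op != OPTEE_SMC_RPC_FUNC_CMD )
    {
        gprintk(XENLOG_ERR, "Unexpected RPC return type %d\n", call->rpc_op);
        domain_crash(current->domain);
        return;
    }

    /* Report the failed buffer allocation back to OP-TEE. */
    shm_rpc->xen_arg->ret = TEEC_ERROR_GENERIC;
    shm_rpc->xen_arg->num_params = 0;
}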
diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 88be959819..0a3205f9e8 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -98,6 +98,11 @@
                               OPTEE_SMC_SEC_CAP_UNREGISTERED_SHM | \
                               OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)
 
+enum optee_call_state {
+    OPTEE_CALL_NORMAL,
+    OPTEE_CALL_XEN_RPC,
+};
+
 static unsigned int __read_mostly max_optee_threads;
 
 /*
@@ -114,6 +119,9 @@ struct optee_std_call {
     paddr_t guest_arg_ipa;
     int optee_thread_id;
     int rpc_op;
+    /* Saved buffer type for the current buffer allocate request */
+    unsigned int rpc_buffer_type;
+    enum optee_call_state state;
     uint64_t rpc_data_cookie;
     bool in_flight;
     register_t rpc_params[2];
@@ -301,6 +309,7 @@ static struct optee_std_call *allocate_std_call(struct optee_domain *ctx)
 
     call->optee_thread_id = -1;
     call->in_flight = true;
+    call->state = OPTEE_CALL_NORMAL;
 
     spin_lock(&ctx->lock);
     list_add_tail(&call->list, &ctx->call_list);
@@ -1086,6 +1095,10 @@ static int handle_rpc_return(struct optee_domain *ctx,
         ret = -ERESTART;
     }
 
+    /* Save the buffer type in case we will want to free it */
+    if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
+        call->rpc_buffer_type = shm_rpc->xen_arg->params[0].u.value.a;
+
     unmap_domain_page(shm_rpc->xen_arg);
 }
 
@@ -1250,18 +1263,107 @@ err:
     return;
 }
 
+/*
+ * Prepare RPC request to free shared buffer in the same way, as
+ * OP-TEE does this.
+ *
+ * Return values:
+ *  true  - successfully prepared RPC request
+ *  false - there was an error
+ */
+static bool issue_rpc_cmd_free(struct optee_domain *ctx,
+                               struct cpu_user_regs *regs,
+                               struct optee_std_call *call,
+                               struct shm_rpc *shm_rpc,
+                               uint64_t cookie)
+{
+    register_t r1, r2;
+
+    /* In case if guest will forget to update it with meaningful value */
+    shm_rpc->xen_arg->ret = TEEC_ERROR_GENERIC;
+    shm_rpc->xen_arg->cmd = OPTEE_RPC_CMD_SHM_FREE;
+    shm_rpc->xen_arg->num_params = 1;
+    shm_rpc->xen_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT;
+    shm_rpc->xen_arg->params[0].u.value.a = call->rpc_buffer_type;
+    shm_rpc->xen_arg->params[0].u.value.b = cookie;
+
+    if ( access_guest_memory_by_ipa(current->domain,
+                                    gfn_to_gaddr(shm_rpc->gfn),
+                                    shm_rpc->xen_arg,
+                                    OPTEE_MSG_GET_ARG_SIZE(1),
+                                    true) )
+    {
+        /*
+         * Well, this is quite bad. We have error in the error
+         * path. This can happen only if guest behaves badly, so all
+         * we can do is to return error to OP-TEE and leave guest's
+         * memory leaked. We already have freed all resources
+         * allocated for this buffer, but guest will never receive
+         * OPTEE_RPC_CMD_SHM_FREE request, so it will not know that it
+         * can release allocated buffer.
+         */
+        shm_rpc->xen_arg->ret = TEEC_ERROR_GENERIC;
+        shm_rpc->xen_arg->num_params = 0;
+
+        return false;
+    }
+
+    uint64_to_regpair(&r1, &r2, shm_rpc->cookie);
+
+    call->state = OPTEE_CALL_XEN_RPC;
+    call->rpc_op = OPTEE_SMC_RPC_FUNC_CMD;
+    call->rpc_params[0] = r1;
+    call->rpc_params[1] = r2;
+    call->optee_thread_id = get_user_reg(regs, 3);
+
+    set_user_reg(regs, 0, OPTEE_SMC_RETURN_RPC_CMD);
+    set_user_reg(regs, 1, r1);
+    set_user_reg(regs, 2, r2);
+
+    return true;
+}
+
+/* Handles return from Xen-issued RPC */
+static void handle_xen_rpc_return(struct optee_domain *ctx,
+                                  struct cpu_user_regs *regs,
+                                  struct optee_std_call *call,
+                                  struct shm_rpc *shm_rpc)
+{
+    call->state = OPTEE_CALL_NORMAL;
+
+    /*
+     * Right now we have only one reason to be there - we asked guest
+     * to free shared buffer and it did it. Now we can tell OP-TEE
+     * that buffer allocation failed. We are not storing exact command
+     * type, only type of RPC return. So, this is the only check we
+     * can perform there.
+     */
+    ASSERT(call->rpc_op == OPTEE_SMC_RPC_FUNC_CMD);
+
+    /*
+     * We are not checking return value from a guest because we assume
+     * that OPTEE_RPC_CMD_SHM_FREE never fails.
+     */
+    shm_rpc->xen_arg->ret = TEEC_ERROR_GENERIC;
+    shm_rpc->xen_arg->num_params = 0;
+}
+
 /*
  * This function is called when guest is finished processing RPC
  * request from OP-TEE and wished to resume the interrupted standard
  * call.
+ *
+ * Return values:
+ *  false - there was an error, do not call OP-TEE
+ *  true  - success, proceed as normal
  */
-static void handle_rpc_cmd_alloc(struct optee_domain *ctx,
+static bool handle_rpc_cmd_alloc(struct optee_domain *ctx,
                                  struct cpu_user_regs *regs,
                                  struct optee_std_call *call,
                                  struct shm_rpc *shm_rpc)
 {
     if ( shm_rpc->xen_arg->ret || shm_rpc->xen_arg->num_params != 1 )
-        return;
+        return true;
 
     if ( shm_rpc->xen_arg->params[0].attr != (OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT |
                                               OPTEE_MSG_ATTR_NONCONTIG) )
@@ -1269,7 +1371,7 @@ static void handle_rpc_cmd_alloc(struct optee_domain *ctx,
         gdprintk(XENLOG_WARNING,
                  "Invalid attrs for shared mem buffer: %"PRIx64"\n",
                  shm_rpc->xen_arg->params[0].attr);
-        return;
+        return true;
     }
 
     /* Free pg list for buffer */
@@ -1285,21 +1387,14 @@ static void handle_rpc_cmd_alloc(struct optee_domain *ctx,
     {
         call->rpc_data_cookie = 0;
         /*
-         * Okay, so there was problem with guest's buffer and we need
-         * to tell about this to OP-TEE.
-         */
-        shm_rpc->xen_arg->ret = TEEC_ERROR_GENERIC;
-        shm_rpc->xen_arg->num_params = 0;
-
-        /*
-         * TODO: With current implementation, OP-TEE will not issue
-         * RPC to free this buffer. Guest and OP-TEE will be out of
-         * sync: guest believes that it provided buffer to OP-TEE,
-         * while OP-TEE thinks of opposite. Ideally, we need to
-         * emulate RPC with OPTEE_MSG_RPC_CMD_SHM_FREE command.
+         * We are unable to translate guest's buffer, so we need tell guest
+         * to free it, before reporting an error to OP-TEE.
          */
-        gprintk(XENLOG_WARNING,
-                "translate_noncontig() failed, OP-TEE/guest state is out of sync.\n");
+        return !issue_rpc_cmd_free(ctx, regs, call, shm_rpc,
+                                   shm_rpc->xen_arg->params[0].u.tmem.shm_ref);
     }
+
+    return true;
 }
 
 static void handle_rpc_cmd(struct optee_domain *ctx, struct cpu_user_regs *regs,
@@ -1349,22 +1444,37 @@ static void handle_rpc_cmd(struct optee_domain *ctx, struct cpu_user_regs *regs,
         goto out;
     }
 
-    switch (shm_rpc->xen_arg->cmd)
+    if ( call->state == OPTEE_CALL_NORMAL )
     {
-    case OPTEE_RPC_CMD_GET_TIME:
-    case OPTEE_RPC_CMD_WAIT_QUEUE:
-    case OPTEE_RPC_CMD_SUSPEND:
-        break;
-    case OPTEE_RPC_CMD_SHM_ALLOC:
-        handle_rpc_cmd_alloc(ctx, regs, call, shm_rpc);
-        break;
-    case OPTEE_RPC_CMD_SHM_FREE:
-        free_optee_shm_buf(ctx, shm_rpc->xen_arg->params[0].u.value.b);
-        if ( call->rpc_data_cookie == shm_rpc->xen_arg->params[0].u.value.b )
-            call->rpc_data_cookie = 0;
-        break;
-    default:
-        break;
+        switch (shm_rpc->xen_arg->cmd)
+        {
+        case OPTEE_RPC_CMD_GET_TIME:
+        case OPTEE_RPC_CMD_WAIT_QUEUE:
+        case OPTEE_RPC_CMD_SUSPEND:
+            break;
+        case OPTEE_RPC_CMD_SHM_ALLOC:
+            if ( !handle_rpc_cmd_alloc(ctx, regs, call, shm_rpc) )
+            {
+                /* We failed to translate buffer, report back to guest */
+                unmap_domain_page(shm_rpc->xen_arg);
+                put_std_call(ctx, call);
+
+                return;
+            }
+            break;
+        case OPTEE_RPC_CMD_SHM_FREE:
+            free_optee_shm_buf(ctx, shm_rpc->xen_arg->params[0].u.value.b);
+            if ( call->rpc_data_cookie ==
+                 shm_rpc->xen_arg->params[0].u.value.b )
+                call->rpc_data_cookie = 0;
+            break;
+        default:
+            break;
+        }
+    }
+    else
+    {
+        handle_xen_rpc_return(ctx, regs, call, shm_rpc);
     }
 
 out:
There is a possible case when OP-TEE asks the guest to allocate a shared
buffer, but Xen for some reason can't translate the buffer's addresses. In
this situation we should do two things:

1. Tell the guest to free the allocated buffer, so there will be no memory
   leak for the guest.

2. Tell OP-TEE that the buffer allocation failed.

To ask the guest to free the allocated buffer we do the same thing OP-TEE
does - issue an RPC request. This is done by filling the request buffer
(luckily we can reuse the same buffer that OP-TEE used to issue the original
request) and then returning to the guest with a special return code.

Then we need to handle the next call from the guest in a special way: as the
RPC was issued by Xen, not by OP-TEE, it should be handled by Xen. Basically,
this is the mechanism to preempt the OP-TEE mediator.

The same mechanism can be used in the future to preempt the mediator during
translation of large (>512 pages) shared buffers.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
Changes from v1:
 - Renamed OPTEEM_CALL_* to OPTEE_CALL_*
 - Fixed comments
 - Added ASSERT() in handle_xen_rpc_return()
---
 xen/arch/arm/tee/optee.c | 172 ++++++++++++++++++++++++++++++++-------
 1 file changed, 141 insertions(+), 31 deletions(-)