Message ID | 839258138.49105.1564003328543.JavaMail.zimbra@nod.at (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Series | Backlog support for CAAM? |
On 7/25/2019 12:22 AM, Richard Weinberger wrote:
> Hi!
>
> Recently I had the pleasure to debug a lockup on an imx6 based platform.
> It turned out that the lockup was caused by the CAAM driver because it
> just returns -EBUSY upon a full job ring.
>
> Then I found commits:
> 0618764cb25f ("dm crypt: fix deadlock when async crypto algorithm returns -EBUSY")
> c0403ec0bb5a ("Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"")
>
Truly sorry for the inconvenience.
Indeed this is a caam driver issue, and not a dm-crypt one.

> Is there a reason why the driver has still no proper backlog support?
>
We've been rejected a few times or the implementation had performance issues:
v1: https://patchwork.kernel.org/patch/7144701
v2: https://patchwork.kernel.org/patch/7199241
v3: https://patchwork.kernel.org/patch/7221941
v4: https://patchwork.kernel.org/patch/7230241
v5: https://patchwork.kernel.org/patch/9033121

and we haven't been persistent enough.

> If it is just a matter of -ENOPATCH, I have some cycles left and can help.
> But before working on this topic I'd like to figure out what the current state
> or plans are. :-)
>
Right now we're evaluating two options:
- reworking v5 above
- using crypto engine (crypto/crypto_engine.c)

Ideally crypto engine should be the way to go.
However we need to make sure performance degradation is negligible,
which unfortunately is not the case.

Currently it seems that crypto engine has an issue with sending
multiple crypto requests from (SW) engine queue -> (HW) caam queue.

More exactly, crypto_pump_requests() performs this check:
	/* Make sure we are not already running a request */
	if (engine->cur_req)
		goto out;

thus it's not possible to add more crypto requests to the caam queue
until HW finishes the work on the current crypto request and
calls crypto_finalize_request():
	if (finalize_cur_req) {
		[...]
		engine->cur_req = NULL;

Horia
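To make the limitation described above concrete, here is a small user-space model of the "one request at a time" gate. It is only a sketch: `toy_engine`, `toy_pump()`, `toy_enqueue()`, `toy_finalize()` and `HW_RING_DEPTH` are hypothetical names invented for illustration, and nothing here is taken from crypto/crypto_engine.c beyond the shape of the quoted `cur_req` check.

```c
#include <assert.h>

#define HW_RING_DEPTH 4   /* hypothetical ring depth, for illustration only */

/* Toy stand-in for an engine with a software queue in front of hardware. */
struct toy_engine {
	int cur_req;       /* id of the request in flight, 0 when idle */
	int queued;        /* requests waiting in the software queue */
	int next_id;       /* last request id handed out */
	int hw_in_flight;  /* requests currently owned by the "hardware" */
};

/* Mirrors the quoted check: do nothing while a request is running. */
static void toy_pump(struct toy_engine *e)
{
	if (e->cur_req)
		return;
	if (e->queued == 0)
		return;

	e->queued--;
	e->cur_req = ++e->next_id;
	e->hw_in_flight++;
	assert(e->hw_in_flight <= HW_RING_DEPTH);
}

/* Queue one request and try to start it. */
static void toy_enqueue(struct toy_engine *e)
{
	e->queued++;
	toy_pump(e);
}

/* Hardware finished the current request: clear the gate, pump again. */
static void toy_finalize(struct toy_engine *e)
{
	assert(e->hw_in_flight > 0);
	e->hw_in_flight--;
	e->cur_req = 0;
	toy_pump(e);
}
```

In this model, calling `toy_enqueue()` eight times on a zero-initialized engine leaves one request in flight and seven waiting, even though the ring could hold four. Each `toy_finalize()` then admits exactly one more, which is the serialization the mail describes.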
----- Original Message -----
> From: "horia geanta" <horia.geanta@nxp.com>
> To: "richard" <richard@nod.at>, "Linux Crypto Mailing List" <linux-crypto@vger.kernel.org>, "linux-kernel"
> <linux-kernel@vger.kernel.org>
> CC: "aymen sghaier" <aymen.sghaier@nxp.com>, "david" <david@sigma-star.at>, "Baolin Wang" <baolin.wang@linaro.org>
> Sent: Thursday, 25 July 2019 07:57:28
> Subject: Re: Backlog support for CAAM?

> On 7/25/2019 12:22 AM, Richard Weinberger wrote:
>> Hi!
>>
>> Recently I had the pleasure to debug a lockup on an imx6 based platform.
>> It turned out that the lockup was caused by the CAAM driver because it
>> just returns -EBUSY upon a full job ring.
>>
>> Then I found commits:
>> 0618764cb25f ("dm crypt: fix deadlock when async crypto algorithm returns
>> -EBUSY")
>> c0403ec0bb5a ("Revert "dm crypt: fix deadlock when async crypto algorithm
>> returns -EBUSY"")
>>
> Truly sorry for the inconvenience.

No need to worry. Nobody got hurt. :-)

> Indeed this is a caam driver issue, and not a dm-crypt one.
>
>> Is there a reason why the driver has still no proper backlog support?
>>
> We've been rejected a few times or the implementation had performance issues:
> v1: https://patchwork.kernel.org/patch/7144701
> v2: https://patchwork.kernel.org/patch/7199241
> v3: https://patchwork.kernel.org/patch/7221941
> v4: https://patchwork.kernel.org/patch/7230241
> v5: https://patchwork.kernel.org/patch/9033121
>
> and we haven't been persistent enough.
>
>> If it is just a matter of -ENOPATCH, I have some cycles left and can help.
>> But before working on this topic I'd like to figure out what the current state
>> or plans are. :-)
>>
> Right now we're evaluating two options:
> - reworking v5 above
> - using crypto engine (crypto/crypto_engine.c)

I'll look into that to get a better understanding.

> Ideally crypto engine should be the way to go.
> However we need to make sure performance degradation is negligible,
> which unfortunately is not the case.
>
> Currently it seems that crypto engine has an issue with sending
> multiple crypto requests from (SW) engine queue -> (HW) caam queue.
>
> More exactly, crypto_pump_requests() performs this check:
>	/* Make sure we are not already running a request */
>	if (engine->cur_req)
>		goto out;
>
> thus it's not possible to add more crypto requests to the caam queue
> until HW finishes the work on the current crypto request and
> calls crypto_finalize_request():
>	if (finalize_cur_req) {
>		[...]
>		engine->cur_req = NULL;

Let me also dig into this.
Thanks for all the pointers!

Thanks,
//richard
----- Original Message -----
> Right now we're evaluating two options:
> - reworking v5 above
> - using crypto engine (crypto/crypto_engine.c)
>
> Ideally crypto engine should be the way to go.
> However we need to make sure performance degradation is negligible,
> which unfortunately is not the case.
>
> Currently it seems that crypto engine has an issue with sending
> multiple crypto requests from (SW) engine queue -> (HW) caam queue.
>
> More exactly, crypto_pump_requests() performs this check:
>	/* Make sure we are not already running a request */
>	if (engine->cur_req)
>		goto out;
>
> thus it's not possible to add more crypto requests to the caam queue
> until HW finishes the work on the current crypto request and
> calls crypto_finalize_request():
>	if (finalize_cur_req) {
>		[...]
>		engine->cur_req = NULL;

Did you consider using a hybrid approach?

Please let me sketch my idea:

- Let's have a worker thread which serves a software queue.
- The software queue is a linked list of requests.
- Upon job submission the driver checks whether the software queue is empty.
- If the software queue is empty, the regular submission continues.
- If the hardware queue is full at this point, the request is put on the software
  queue and we return -EBUSY.
- If upon job submission the software queue is not empty, the new job is also put
  on the software queue.
- The worker thread is woken up every time a new job is put on the software
  queue and every time CAAM has processed a job.

That way we can keep the fast path fast. If the hardware queue is not full, the
software queue can be bypassed completely.
Once the software queue has been used, it will drain as soon as jobs are
submitted at a slower rate, and the fast path will be used again.

What do you think?

Thanks,
//richard
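The hybrid scheme proposed in this mail can be modelled in user space roughly as follows. This is a minimal sketch under stated assumptions, not driver code: `toy_jr`, `toy_submit()`, `toy_worker()`, `toy_complete()`, `HW_DEPTH` and `SW_DEPTH` are all hypothetical names, the backlog is a bounded array FIFO rather than the linked list suggested above, the worker is an ordinary function call instead of a thread, and returning -ENOSPC when the backlog itself is full is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define HW_DEPTH 2   /* hypothetical hardware ring depth */
#define SW_DEPTH 8   /* hypothetical bound on the software backlog */

struct toy_jr {
	int hw_used;           /* slots occupied in the hardware ring */
	int sw[SW_DEPTH];      /* FIFO backlog of request ids */
	size_t sw_head;        /* index of the oldest backlogged request */
	size_t sw_len;         /* number of backlogged requests */
};

static int sw_empty(const struct toy_jr *jr)
{
	return jr->sw_len == 0;
}

static int sw_push(struct toy_jr *jr, int id)
{
	if (jr->sw_len == SW_DEPTH)
		return -ENOSPC;
	jr->sw[(jr->sw_head + jr->sw_len) % SW_DEPTH] = id;
	jr->sw_len++;
	return 0;
}

static int sw_pop(struct toy_jr *jr)
{
	int id = jr->sw[jr->sw_head];

	jr->sw_head = (jr->sw_head + 1) % SW_DEPTH;
	jr->sw_len--;
	return id;
}

/*
 * Submit one request.
 * Returns 0 if it went straight to hardware (fast path), -EBUSY if it
 * was backlogged, -ENOSPC if even the backlog is full (assumption).
 */
static int toy_submit(struct toy_jr *jr, int id)
{
	/* Fast path only when nothing is backlogged, to preserve ordering. */
	if (sw_empty(jr) && jr->hw_used < HW_DEPTH) {
		jr->hw_used++;
		return 0;
	}
	if (sw_push(jr, id))
		return -ENOSPC;
	return -EBUSY;
}

/* Worker: move backlogged requests to hardware while there is room. */
static int toy_worker(struct toy_jr *jr)
{
	int moved = 0;

	while (!sw_empty(jr) && jr->hw_used < HW_DEPTH) {
		(void)sw_pop(jr);
		jr->hw_used++;
		moved++;
	}
	return moved;
}

/* Hardware finished one job: free its slot and run the worker. */
static void toy_complete(struct toy_jr *jr)
{
	assert(jr->hw_used > 0);
	jr->hw_used--;
	toy_worker(jr);
}
```

With `HW_DEPTH` set to 2, the first two submissions take the fast path and the next ones are backlogged with -EBUSY. A new submission keeps going to the backlog while older requests are still waiting there, so requests are not reordered, and the fast path resumes once the backlog has drained.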
On 7/28/2019 11:50 PM, Richard Weinberger wrote:
> ----- Original Message -----
>> Right now we're evaluating two options:
>> - reworking v5 above
>> - using crypto engine (crypto/crypto_engine.c)
>>
>> Ideally crypto engine should be the way to go.
>> However we need to make sure performance degradation is negligible,
>> which unfortunately is not the case.
>>
>> Currently it seems that crypto engine has an issue with sending
>> multiple crypto requests from (SW) engine queue -> (HW) caam queue.
>>
>> More exactly, crypto_pump_requests() performs this check:
>>	/* Make sure we are not already running a request */
>>	if (engine->cur_req)
>>		goto out;
>>
>> thus it's not possible to add more crypto requests to the caam queue
>> until HW finishes the work on the current crypto request and
>> calls crypto_finalize_request():
>>	if (finalize_cur_req) {
>>		[...]
>>		engine->cur_req = NULL;
>
> Did you consider using a hybrid approach?
>
Yes, this is on our plate, though we haven't tried it yet.

> Please let me sketch my idea:
>
> - Let's have a worker thread which serves a software queue.
> - The software queue is a linked list of requests.
> - Upon job submission the driver checks whether the software queue is empty.
> - If the software queue is empty, the regular submission continues.
> - If the hardware queue is full at this point, the request is put on the software
>   queue and we return -EBUSY.
> - If upon job submission the software queue is not empty, the new job is also put
>   on the software queue.
> - The worker thread is woken up every time a new job is put on the software
>   queue and every time CAAM has processed a job.
>
> That way we can keep the fast path fast. If the hardware queue is not full, the
> software queue can be bypassed completely.
> Once the software queue has been used, it will drain as soon as jobs are
> submitted at a slower rate, and the fast path will be used again.
>
> What do you think?
>
The optimization mentioned above, bypassing the SW queue (i.e. trying to
enqueue to the HW queue if the SW queue is empty), should probably be added
to the crypto engine implementation itself, e.g. in crypto_transfer_request().

Thanks,
Horia
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -339,6 +339,7 @@ int caam_jr_enqueue(struct device *dev, u32 *desc,
 		return -EIO;
 	}
 
+again:
 	spin_lock_bh(&jrp->inplock);
 
 	head = jrp->head;
@@ -347,8 +348,8 @@ int caam_jr_enqueue(struct device *dev, u32 *desc,
 	if (!rd_reg32(&jrp->rregs->inpring_avail) ||
 	    CIRC_SPACE(head, tail, JOBR_DEPTH) <= 0) {
 		spin_unlock_bh(&jrp->inplock);
-		dma_unmap_single(dev, desc_dma, desc_size, DMA_TO_DEVICE);
-		return -EBUSY;
+		msleep(100);
+		goto again;
 	}
 
 	head_entry = &jrp->entinfo[head];