[1/2] drivers: qcom: rpmh-rsc: simplify TCS locking

Message ID 20190701152907.16407-1-ilina@codeaurora.org (mailing list archive)
State Not Applicable, archived
Series [1/2] drivers: qcom: rpmh-rsc: simplify TCS locking

Commit Message

Lina Iyer July 1, 2019, 3:29 p.m. UTC
From: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>

tcs->lock was introduced to serialize access within a TCS group. But
even without tcs->lock, drv->lock serves the same purpose. So use a
single drv->lock.

Other optimizations include -
 - Remove locking around clear_bit() in IRQ handler. clear_bit() is
   atomic.
 - Remove redundant read of TCS registers.
 - Use spin_lock instead of _irq variants as the locks are not held
   in interrupt context.

Fixes: 658628 ("drivers: qcom: rpmh-rsc: add RPMH controller for QCOM SoCs")
Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
---
 drivers/soc/qcom/rpmh-internal.h |  2 --
 drivers/soc/qcom/rpmh-rsc.c      | 37 +++++++++++---------------------
 drivers/soc/qcom/rpmh.c          | 20 +++++++----------
 3 files changed, 21 insertions(+), 38 deletions(-)
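
As a quick orientation for the changes below, here is a minimal, hypothetical
sketch of the locking model the patch moves to (the demo_* names and the TCS
count are illustrative, not the driver's actual symbols): a single driver-wide
spinlock guards TCS selection and slot bookkeeping, while the per-TCS busy
flag lives in a bitmap touched only with atomic bitops, so the IRQ handler can
clear it without taking any lock.

#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/spinlock.h>
#include <linux/types.h>

#define DEMO_NUM_TCS 16

struct demo_drv {
	spinlock_t lock;				/* protects TCS allocation state */
	DECLARE_BITMAP(tcs_in_use, DEMO_NUM_TCS);	/* touched via atomic bitops */
};

static void demo_init(struct demo_drv *drv)
{
	spin_lock_init(&drv->lock);
	bitmap_zero(drv->tcs_in_use, DEMO_NUM_TCS);
}

/* Process context: claim a free TCS under the single drv lock. */
static int demo_claim_tcs(struct demo_drv *drv)
{
	int i, id = -EBUSY;

	spin_lock(&drv->lock);
	for (i = 0; i < DEMO_NUM_TCS; i++) {
		if (!test_bit(i, drv->tcs_in_use)) {
			set_bit(i, drv->tcs_in_use);
			id = i;
			break;
		}
	}
	spin_unlock(&drv->lock);

	return id;
}

/* IRQ context: clear_bit() is atomic, so no lock is taken here. */
static void demo_release_tcs(struct demo_drv *drv, int id)
{
	clear_bit(id, drv->tcs_in_use);
}

With this model, two CPUs writing to different TCSes contend on the one
drv->lock, which is the trade-off discussed in the review comments below.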

Comments

Lina Iyer July 1, 2019, 3:52 p.m. UTC | #1
Switching Andy's email address.

Stephen Boyd July 19, 2019, 6:20 p.m. UTC | #2
Quoting Lina Iyer (2019-07-01 08:29:06)
> From: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>
> 
> tcs->lock was introduced to serialize access with in TCS group. But
> even without tcs->lock, drv->lock is serving the same purpose. So
> use a single drv->lock.

Isn't the downside now that we're going to be serializing access to the
different TCSes when two are being written in parallel or waited on? I
thought that was the whole point of splitting the lock into a TCS lock
and a general "driver" lock that protects the global driver state vs.
the specific TCS state.

> 
> Other optimizations include -
>  - Remove locking around clear_bit() in IRQ handler. clear_bit() is
>    atomic.
>  - Remove redundant read of TCS registers.
>  - Use spin_lock instead of _irq variants as the locks are not held
>    in interrupt context.

Can you please split this patch up into 3 or 4 different patches? I'm
not sure why any of these patches are marked with Fixes either. It's an
optimization patch, not a fix patch, unless the optimization is really
large somehow?

> 
> Fixes: 658628 ("drivers: qcom: rpmh-rsc: add RPMH controller for QCOM
> SoCs")
> Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
> Signed-off-by: Lina Iyer <ilina@codeaurora.org>
> ---
>  drivers/soc/qcom/rpmh-internal.h |  2 --
>  drivers/soc/qcom/rpmh-rsc.c      | 37 +++++++++++---------------------
>  drivers/soc/qcom/rpmh.c          | 20 +++++++----------
>  3 files changed, 21 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
> index a7bbbb67991c..969d5030860e 100644
> --- a/drivers/soc/qcom/rpmh-internal.h
> +++ b/drivers/soc/qcom/rpmh-internal.h
> diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
> index e278fc11fe5c..92461311aef3 100644
> --- a/drivers/soc/qcom/rpmh-rsc.c
> +++ b/drivers/soc/qcom/rpmh-rsc.c
> @@ -93,8 +93,7 @@ static void write_tcs_reg_sync(struct rsc_drv *drv, int reg, int tcs_id,
>  
>  static bool tcs_is_free(struct rsc_drv *drv, int tcs_id)
>  {
> -       return !test_bit(tcs_id, drv->tcs_in_use) &&
> -              read_tcs_reg(drv, RSC_DRV_STATUS, tcs_id, 0);
> +       return !test_bit(tcs_id, drv->tcs_in_use);

This can be a different patch. Why is reading the tcs register
redundant? Please put that information in the commit text.
Lina Iyer July 22, 2019, 4:20 p.m. UTC | #3
On Fri, Jul 19 2019 at 12:20 -0600, Stephen Boyd wrote:
>Quoting Lina Iyer (2019-07-01 08:29:06)
>> From: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>
>>
>> tcs->lock was introduced to serialize access with in TCS group. But
>> even without tcs->lock, drv->lock is serving the same purpose. So
>> use a single drv->lock.
>
>Isn't the downside now that we're going to be serializing access to the
>different TCSes when two are being written in parallel or waited on? I
>thought that was the whole point of splitting the lock into a TCS lock
>and a general "driver" lock that protects the global driver state vs.
>the specific TCS state.
>
Yes, but we were holding drv->lock as well as tcs->lock for the most
critical part of the path anyway (writing to the TCS). The added
complexity doesn't seem to help reduce the latency it was expected to
reduce.
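
To put that in context, here is a condensed, hypothetical sketch of the
tcs_write() locking before and after this patch (the sketch_* structs and
function names are stand-ins, locks are assumed initialized elsewhere, and the
checks and register writes are elided to comments, following the diff in this
thread):

#include <linux/spinlock.h>

struct sketch_tcs { spinlock_t lock; };
struct sketch_drv { spinlock_t lock; };

/* Before the patch: the per-TCS lock nests around drv->lock, which is
 * dropped before the buffer write and trigger. */
static void sketch_write_before(struct sketch_drv *drv, struct sketch_tcs *tcs)
{
	unsigned long flags;

	spin_lock_irqsave(&tcs->lock, flags);
	spin_lock(&drv->lock);
	/* check for an in-flight request, find a free TCS, mark it in use */
	spin_unlock(&drv->lock);
	/* write the command buffer and trigger the TCS */
	spin_unlock_irqrestore(&tcs->lock, flags);
}

/* After the patch: the single drv->lock covers the whole sequence. */
static void sketch_write_after(struct sketch_drv *drv)
{
	spin_lock(&drv->lock);
	/* same checks and bookkeeping, then write and trigger the TCS */
	spin_unlock(&drv->lock);
}

Since drv->lock was already taken inside the tcs->lock critical section, the
per-TCS lock mostly added nesting rather than extra concurrency on this path.
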
>>
>> Other optimizations include -
>>  - Remove locking around clear_bit() in IRQ handler. clear_bit() is
>>    atomic.
>>  - Remove redundant read of TCS registers.
>>  - Use spin_lock instead of _irq variants as the locks are not held
>>    in interrupt context.
>
>Can you please split this patch up into 3 or 4 different patches? I'm
>not sure why any of these patches are marked with Fixes either. It's an
>optimization patch, not a fix patch, unless the optimization is really
>large somehow?
>
Okay. I will try that.
>>
>> Fixes: 658628 ("drivers: qcom: rpmh-rsc: add RPMH controller for QCOM
>> SoCs")
>> Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
>> Signed-off-by: Lina Iyer <ilina@codeaurora.org>
>> ---
>>  drivers/soc/qcom/rpmh-internal.h |  2 --
>>  drivers/soc/qcom/rpmh-rsc.c      | 37 +++++++++++---------------------
>>  drivers/soc/qcom/rpmh.c          | 20 +++++++----------
>>  3 files changed, 21 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
>> index a7bbbb67991c..969d5030860e 100644
>> --- a/drivers/soc/qcom/rpmh-internal.h
>> +++ b/drivers/soc/qcom/rpmh-internal.h
>> diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
>> index e278fc11fe5c..92461311aef3 100644
>> --- a/drivers/soc/qcom/rpmh-rsc.c
>> +++ b/drivers/soc/qcom/rpmh-rsc.c
>> @@ -93,8 +93,7 @@ static void write_tcs_reg_sync(struct rsc_drv *drv, int reg, int tcs_id,
>>
>>  static bool tcs_is_free(struct rsc_drv *drv, int tcs_id)
>>  {
>> -       return !test_bit(tcs_id, drv->tcs_in_use) &&
>> -              read_tcs_reg(drv, RSC_DRV_STATUS, tcs_id, 0);
>> +       return !test_bit(tcs_id, drv->tcs_in_use);
>
>This can be a different patch. Why is reading the tcs register
>redundant? Please put that information in the commit text.
>
The tcs_in_use bit is adjusted along with the DRV_STS register, so reading
tcs_in_use should be enough.

Thanks for your review Stephen.

--Lina
Stephen Boyd July 22, 2019, 6:18 p.m. UTC | #4
Quoting Lina Iyer (2019-07-22 09:20:03)
> On Fri, Jul 19 2019 at 12:20 -0600, Stephen Boyd wrote:
> >Quoting Lina Iyer (2019-07-01 08:29:06)
> >> From: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>
> >>
> >> tcs->lock was introduced to serialize access with in TCS group. But
> >> even without tcs->lock, drv->lock is serving the same purpose. So
> >> use a single drv->lock.
> >
> >Isn't the downside now that we're going to be serializing access to the
> >different TCSes when two are being written in parallel or waited on? I
> >thought that was the whole point of splitting the lock into a TCS lock
> >and a general "driver" lock that protects the global driver state vs.
> >the specific TCS state.
> >
> Yes, but we were holding drv->lock as well as tcs->lock for the most
> critical part of the path anyway (writing to the TCS). The added
> complexity doesn't seem to help reduce the latency it was expected to
> reduce.

Ok. That sort of information should be in the commit text to explain why
it's not helping with reducing the latency or throughput of the API.
Lina Iyer July 22, 2019, 7:46 p.m. UTC | #5
On Mon, Jul 22 2019 at 12:18 -0600, Stephen Boyd wrote:
>Quoting Lina Iyer (2019-07-22 09:20:03)
>> On Fri, Jul 19 2019 at 12:20 -0600, Stephen Boyd wrote:
>> >Quoting Lina Iyer (2019-07-01 08:29:06)
>> >> From: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>
>> >>
>> >> tcs->lock was introduced to serialize access with in TCS group. But
>> >> even without tcs->lock, drv->lock is serving the same purpose. So
>> >> use a single drv->lock.
>> >
>> >Isn't the downside now that we're going to be serializing access to the
>> >different TCSes when two are being written in parallel or waited on? I
>> >thought that was the whole point of splitting the lock into a TCS lock
>> >and a general "driver" lock that protects the global driver state vs.
>> >the specific TCS state.
>> >
>> Yes, but we were holding drv->lock as well as tcs->lock for the most
>> critical part of the path anyway (writing to the TCS). The added
>> complexity doesn't seem to help reduce the latency it was expected to
>> reduce.
>
>Ok. That sort of information should be in the commit text to explain why
>it's not helping with reducing the latency or throughput of the API.
>
Will add.

--Lina

Patch

diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
index a7bbbb67991c..969d5030860e 100644
--- a/drivers/soc/qcom/rpmh-internal.h
+++ b/drivers/soc/qcom/rpmh-internal.h
@@ -28,7 +28,6 @@  struct rsc_drv;
  * @offset:    start of the TCS group relative to the TCSes in the RSC
  * @num_tcs:   number of TCSes in this type
  * @ncpt:      number of commands in each TCS
- * @lock:      lock for synchronizing this TCS writes
  * @req:       requests that are sent from the TCS
  * @cmd_cache: flattened cache of cmds in sleep/wake TCS
  * @slots:     indicates which of @cmd_addr are occupied
@@ -40,7 +39,6 @@  struct tcs_group {
 	u32 offset;
 	int num_tcs;
 	int ncpt;
-	spinlock_t lock;
 	const struct tcs_request *req[MAX_TCS_PER_TYPE];
 	u32 *cmd_cache;
 	DECLARE_BITMAP(slots, MAX_TCS_SLOTS);
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
index e278fc11fe5c..92461311aef3 100644
--- a/drivers/soc/qcom/rpmh-rsc.c
+++ b/drivers/soc/qcom/rpmh-rsc.c
@@ -93,8 +93,7 @@  static void write_tcs_reg_sync(struct rsc_drv *drv, int reg, int tcs_id,
 
 static bool tcs_is_free(struct rsc_drv *drv, int tcs_id)
 {
-	return !test_bit(tcs_id, drv->tcs_in_use) &&
-	       read_tcs_reg(drv, RSC_DRV_STATUS, tcs_id, 0);
+	return !test_bit(tcs_id, drv->tcs_in_use);
 }
 
 static struct tcs_group *get_tcs_of_type(struct rsc_drv *drv, int type)
@@ -104,29 +103,28 @@  static struct tcs_group *get_tcs_of_type(struct rsc_drv *drv, int type)
 
 static int tcs_invalidate(struct rsc_drv *drv, int type)
 {
-	int m;
+	int m, ret = 0;
 	struct tcs_group *tcs;
 
 	tcs = get_tcs_of_type(drv, type);
 
-	spin_lock(&tcs->lock);
-	if (bitmap_empty(tcs->slots, MAX_TCS_SLOTS)) {
-		spin_unlock(&tcs->lock);
-		return 0;
-	}
+	spin_lock(&drv->lock);
+	if (bitmap_empty(tcs->slots, MAX_TCS_SLOTS))
+		goto done;
 
 	for (m = tcs->offset; m < tcs->offset + tcs->num_tcs; m++) {
 		if (!tcs_is_free(drv, m)) {
-			spin_unlock(&tcs->lock);
-			return -EAGAIN;
+			ret = -EAGAIN;
+			goto done;
 		}
 		write_tcs_reg_sync(drv, RSC_DRV_CMD_ENABLE, m, 0);
 		write_tcs_reg_sync(drv, RSC_DRV_CMD_WAIT_FOR_CMPL, m, 0);
 	}
 	bitmap_zero(tcs->slots, MAX_TCS_SLOTS);
-	spin_unlock(&tcs->lock);
 
-	return 0;
+done:
+	spin_unlock(&drv->lock);
+	return ret;
 }
 
 /**
@@ -242,9 +240,7 @@  static irqreturn_t tcs_tx_done(int irq, void *p)
 		write_tcs_reg(drv, RSC_DRV_CMD_ENABLE, i, 0);
 		write_tcs_reg(drv, RSC_DRV_CMD_WAIT_FOR_CMPL, i, 0);
 		write_tcs_reg(drv, RSC_DRV_IRQ_CLEAR, 0, BIT(i));
-		spin_lock(&drv->lock);
 		clear_bit(i, drv->tcs_in_use);
-		spin_unlock(&drv->lock);
 		if (req)
 			rpmh_tx_done(req, err);
 	}
@@ -349,14 +345,12 @@  static int tcs_write(struct rsc_drv *drv, const struct tcs_request *msg)
 {
 	struct tcs_group *tcs;
 	int tcs_id;
-	unsigned long flags;
 	int ret;
 
 	tcs = get_tcs_for_msg(drv, msg);
 	if (IS_ERR(tcs))
 		return PTR_ERR(tcs);
 
-	spin_lock_irqsave(&tcs->lock, flags);
 	spin_lock(&drv->lock);
 	/*
 	 * The h/w does not like if we send a request to the same address,
@@ -364,26 +358,23 @@  static int tcs_write(struct rsc_drv *drv, const struct tcs_request *msg)
 	 */
 	ret = check_for_req_inflight(drv, tcs, msg);
 	if (ret) {
-		spin_unlock(&drv->lock);
 		goto done_write;
 	}
 
 	tcs_id = find_free_tcs(tcs);
 	if (tcs_id < 0) {
 		ret = tcs_id;
-		spin_unlock(&drv->lock);
 		goto done_write;
 	}
 
 	tcs->req[tcs_id - tcs->offset] = msg;
 	set_bit(tcs_id, drv->tcs_in_use);
-	spin_unlock(&drv->lock);
 
 	__tcs_buffer_write(drv, tcs_id, 0, msg);
 	__tcs_trigger(drv, tcs_id);
 
 done_write:
-	spin_unlock_irqrestore(&tcs->lock, flags);
+	spin_unlock(&drv->lock);
 	return ret;
 }
 
@@ -481,19 +472,18 @@  static int tcs_ctrl_write(struct rsc_drv *drv, const struct tcs_request *msg)
 {
 	struct tcs_group *tcs;
 	int tcs_id = 0, cmd_id = 0;
-	unsigned long flags;
 	int ret;
 
 	tcs = get_tcs_for_msg(drv, msg);
 	if (IS_ERR(tcs))
 		return PTR_ERR(tcs);
 
-	spin_lock_irqsave(&tcs->lock, flags);
+	spin_lock(&drv->lock);
 	/* find the TCS id and the command in the TCS to write to */
 	ret = find_slots(tcs, msg, &tcs_id, &cmd_id);
 	if (!ret)
 		__tcs_buffer_write(drv, tcs_id, cmd_id, msg);
-	spin_unlock_irqrestore(&tcs->lock, flags);
+	spin_unlock(&drv->lock);
 
 	return ret;
 }
@@ -584,7 +574,6 @@  static int rpmh_probe_tcs_config(struct platform_device *pdev,
 		tcs->type = tcs_cfg[i].type;
 		tcs->num_tcs = tcs_cfg[i].n;
 		tcs->ncpt = ncpt;
-		spin_lock_init(&tcs->lock);
 
 		if (!tcs->num_tcs || tcs->type == CONTROL_TCS)
 			continue;
diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
index 035091fd44b8..12f830610b94 100644
--- a/drivers/soc/qcom/rpmh.c
+++ b/drivers/soc/qcom/rpmh.c
@@ -118,9 +118,8 @@  static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
 					   struct tcs_cmd *cmd)
 {
 	struct cache_req *req;
-	unsigned long flags;
 
-	spin_lock_irqsave(&ctrlr->cache_lock, flags);
+	spin_lock(&ctrlr->cache_lock);
 	req = __find_req(ctrlr, cmd->addr);
 	if (req)
 		goto existing;
@@ -154,7 +153,7 @@  static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
 
 	ctrlr->dirty = true;
 unlock:
-	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
+	spin_unlock(&ctrlr->cache_lock);
 
 	return req;
 }
@@ -283,23 +282,21 @@  EXPORT_SYMBOL(rpmh_write);
 
 static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
 {
-	unsigned long flags;
 
-	spin_lock_irqsave(&ctrlr->cache_lock, flags);
+	spin_lock(&ctrlr->cache_lock);
 	list_add_tail(&req->list, &ctrlr->batch_cache);
-	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
+	spin_unlock(&ctrlr->cache_lock);
 }
 
 static int flush_batch(struct rpmh_ctrlr *ctrlr)
 {
 	struct batch_cache_req *req;
 	const struct rpmh_request *rpm_msg;
-	unsigned long flags;
 	int ret = 0;
 	int i;
 
 	/* Send Sleep/Wake requests to the controller, expect no response */
-	spin_lock_irqsave(&ctrlr->cache_lock, flags);
+	spin_lock(&ctrlr->cache_lock);
 	list_for_each_entry(req, &ctrlr->batch_cache, list) {
 		for (i = 0; i < req->count; i++) {
 			rpm_msg = req->rpm_msgs + i;
@@ -309,7 +306,7 @@  static int flush_batch(struct rpmh_ctrlr *ctrlr)
 				break;
 		}
 	}
-	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
+	spin_unlock(&ctrlr->cache_lock);
 
 	return ret;
 }
@@ -317,13 +314,12 @@  static int flush_batch(struct rpmh_ctrlr *ctrlr)
 static void invalidate_batch(struct rpmh_ctrlr *ctrlr)
 {
 	struct batch_cache_req *req, *tmp;
-	unsigned long flags;
 
-	spin_lock_irqsave(&ctrlr->cache_lock, flags);
+	spin_lock(&ctrlr->cache_lock);
 	list_for_each_entry_safe(req, tmp, &ctrlr->batch_cache, list)
 		kfree(req);
 	INIT_LIST_HEAD(&ctrlr->batch_cache);
-	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
+	spin_unlock(&ctrlr->cache_lock);
 }
 
 /**