From patchwork Tue Oct 11 23:15:59 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: subhashj@codeaurora.org
X-Patchwork-Id: 9371853
From: Subhash Jadavani <subhashj@codeaurora.org>
To: vinholikatti@gmail.com, jejb@linux.vnet.ibm.com,
	martin.petersen@oracle.com
Cc: Subhash Jadavani <subhashj@codeaurora.org>,
	linux-scsi@vger.kernel.org (open list:UNIVERSAL FLASH STORAGE HOST
	CONTROLLER DRIVER), linux-kernel@vger.kernel.org (open list)
Subject: [RESEND PATCH] scsi: ufs: fix race between clock gating and devfreq
	scaling work
Date: Tue, 11 Oct 2016 16:15:59 -0700
Message-Id: <1476227759-22389-1-git-send-email-subhashj@codeaurora.org>
X-Mailer: git-send-email 1.9.1
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-scsi.vger.kernel.org>
X-Mailing-List: linux-scsi@vger.kernel.org

UFS devfreq clock scaling work may need the clocks to be ON to execute
some UFS commands, so it may request a clock hold before issuing them.
But if the UFS clock gating work is already running in parallel, the
ungate work ends up waiting for the gating work to finish. The gating
work in turn blocks on the devfreq mutex in devfreq_monitor_suspend(),
which is held by the scaling work, and the scaling work waits for the
ungate work in ufshcd_hold(). None of the three can make progress, so we
deadlock. Here is the call trace during this deadlock state:

Workqueue: devfreq_wq devfreq_monitor
	__switch_to
	__schedule
	schedule
	schedule_timeout
	wait_for_common
	wait_for_completion
	flush_work
	ufshcd_hold
	ufshcd_send_uic_cmd
	ufshcd_dme_get_attr
	ufs_qcom_set_dme_vs_core_clk_ctrl_clear_div
	ufs_qcom_clk_scale_notify
	ufshcd_scale_clks
	ufshcd_devfreq_target
	update_devfreq
	devfreq_monitor
	process_one_work
	worker_thread
	kthread
	ret_from_fork

Workqueue: events ufshcd_gate_work
	__switch_to
	__schedule
	schedule
	schedule_preempt_disabled
	__mutex_lock_slowpath
	mutex_lock
	devfreq_monitor_suspend
	devfreq_simple_ondemand_handler
	devfreq_suspend_device
	ufshcd_gate_work
	process_one_work
	worker_thread
	kthread
	ret_from_fork

Workqueue: events ufshcd_ungate_work
	__switch_to
	__schedule
	schedule
	schedule_timeout
	wait_for_common
	wait_for_completion
	flush_work
	__cancel_work_timer
	cancel_delayed_work_sync
	ufshcd_ungate_work
	process_one_work
	worker_thread
	kthread
	ret_from_fork

This change fixes the deadlock by doing the following in the devfreq work
(devfreq_wq), under the host lock: try to cancel the clock gating work.
If the gating work was pending and is cancelled, hold a clock reference
until scaling completes. If the clocks are already ON, scale without
taking an extra reference. Otherwise (for example, the gating work is
already running in parallel), skip the frequency scaling this time; it
will be retried when the next scaling window expires. Scaling is also
skipped while error handling is in progress.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 571a2f6..77700ee 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6323,15 +6323,47 @@ static int ufshcd_devfreq_target(struct device *dev,
 {
 	int err = 0;
 	struct ufs_hba *hba = dev_get_drvdata(dev);
+	bool release_clk_hold = false;
+	unsigned long irq_flags;
 
 	if (!ufshcd_is_clkscaling_enabled(hba))
 		return -EINVAL;
 
+	spin_lock_irqsave(hba->host->host_lock, irq_flags);
+	if (ufshcd_eh_in_progress(hba)) {
+		spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
+		return 0;
+	}
+
+	if (ufshcd_is_clkgating_allowed(hba) &&
+	    (hba->clk_gating.state != CLKS_ON)) {
+		if (cancel_delayed_work(&hba->clk_gating.gate_work)) {
+			/* hold the vote until the scaling work is completed */
+			hba->clk_gating.active_reqs++;
+			release_clk_hold = true;
+			hba->clk_gating.state = CLKS_ON;
+		} else {
+			/*
+			 * Clock gating work seems to be running in parallel
+			 * hence skip scaling work to avoid deadlock between
+			 * current scaling work and gating work.
+			 */
+			spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
+			return 0;
+		}
+	}
+	spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
+
 	if (*freq == UINT_MAX)
 		err = ufshcd_scale_clks(hba, true);
 	else if (*freq == 0)
 		err = ufshcd_scale_clks(hba, false);
 
+	spin_lock_irqsave(hba->host->host_lock, irq_flags);
+	if (release_clk_hold)
+		__ufshcd_release(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
+
 	return err;
 }