mmc: core: Set card as removed in mmc_remove_card()

Message ID	1523431471-204238-1-git-send-email-shawn.lin@rock-chips.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-mmc-owner@kernel.org> From: Shawn Lin <shawn.lin@rock-chips.com> To: Ulf Hansson <ulf.hansson@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com>, linux-mmc@vger.kernel.org, Shawn Lin <shawn.lin@rock-chips.com> Subject: [PATCH] mmc: core: Set card as removed in mmc_remove_card() Date: Wed, 11 Apr 2018 15:24:31 +0800 Message-Id: <1523431471-204238-1-git-send-email-shawn.lin@rock-chips.com> Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk

Shawn Lin April 11, 2018, 7:24 a.m. UTC

A simply continueous background I/O could 100% make the system stuck for
a long time in my system when a unbind for the controller driver happens
simultaneously. See:

dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind

The reason is all pending requests wait for timeout one by one, but never
propagates BLK_STS_IOERR in the first place when kicked from the queue.
Set the card as removed immediately in mmc_remove_card() to solve it.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---

 drivers/mmc/core/bus.c | 1 +
 1 file changed, 1 insertion(+)

Adrian Hunter April 11, 2018, 8:46 a.m. UTC | #1

On 11/04/18 10:24, Shawn Lin wrote:
> A simply continueous background I/O could 100% make the system stuck for
> a long time in my system when a unbind for the controller driver happens
> simultaneously. See:
> 
> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
> 
> The reason is all pending requests wait for timeout one by one, but never
> propagates BLK_STS_IOERR in the first place when kicked from the queue.
> Set the card as removed immediately in mmc_remove_card() to solve it.
> 
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> ---
> 
>  drivers/mmc/core/bus.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
> index fc92c6c..c6c5dfe 100644
> --- a/drivers/mmc/core/bus.c
> +++ b/drivers/mmc/core/bus.c
> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>  			pr_info("%s: card %04x removed\n",
>  				mmc_hostname(card->host), card->rca);
>  		}
> +		mmc_card_set_removed(card);

Pedantically we should not call mmc_card_set_removed() if we have not
claimed the host.  Of course we can't claim the host because the block
driver already has it, but I am not sure this is the right place to do this.
 My first question is how come the I/O times out if the card is still
present i.e. you are only unbinding the host controller, so you should
remove the card while the host controller and card are still functional?

>  		device_del(&card->dev);
>  		of_node_put(card->dev.of_node);
>  	}
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Lin April 11, 2018, 9:13 a.m. UTC | #2

On 2018/4/11 16:46, Adrian Hunter wrote:
> On 11/04/18 10:24, Shawn Lin wrote:
>> A simply continueous background I/O could 100% make the system stuck for
>> a long time in my system when a unbind for the controller driver happens
>> simultaneously. See:
>>
>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>
>> The reason is all pending requests wait for timeout one by one, but never
>> propagates BLK_STS_IOERR in the first place when kicked from the queue.
>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>> ---
>>
>>   drivers/mmc/core/bus.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>> index fc92c6c..c6c5dfe 100644
>> --- a/drivers/mmc/core/bus.c
>> +++ b/drivers/mmc/core/bus.c
>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>   			pr_info("%s: card %04x removed\n",
>>   				mmc_hostname(card->host), card->rca);
>>   		}
>> +		mmc_card_set_removed(card);
> 
> Pedantically we should not call mmc_card_set_removed() if we have not
> claimed the host.  Of course we can't claim the host because the block

yep.

> driver already has it, but I am not sure this is the right place to do this.
>   My first question is how come the I/O times out if the card is still
> present i.e. you are only unbinding the host controller, so you should
> remove the card while the host controller and card are still functional?

The card is still functional for my host(but maybe not if ->remove()
touchs the vmmc or vqmmc parts), but the host controller isn't, as the
->remove() calls.

> 
>>   		device_del(&card->dev);
>>   		of_node_put(card->dev.of_node);
>>   	}
>>
> 
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter April 11, 2018, 10:26 a.m. UTC | #3

On 11/04/18 12:13, Shawn Lin wrote:
> On 2018/4/11 16:46, Adrian Hunter wrote:
>> On 11/04/18 10:24, Shawn Lin wrote:
>>> A simply continueous background I/O could 100% make the system stuck for
>>> a long time in my system when a unbind for the controller driver happens
>>> simultaneously. See:
>>>
>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>
>>> The reason is all pending requests wait for timeout one by one, but never
>>> propagates BLK_STS_IOERR in the first place when kicked from the queue.
>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>
>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>> ---
>>>
>>>   drivers/mmc/core/bus.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>> index fc92c6c..c6c5dfe 100644
>>> --- a/drivers/mmc/core/bus.c
>>> +++ b/drivers/mmc/core/bus.c
>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>               pr_info("%s: card %04x removed\n",
>>>                   mmc_hostname(card->host), card->rca);
>>>           }
>>> +        mmc_card_set_removed(card);
>>
>> Pedantically we should not call mmc_card_set_removed() if we have not
>> claimed the host.  Of course we can't claim the host because the block
> 
> yep.
> 
>> driver already has it, but I am not sure this is the right place to do this.
>>   My first question is how come the I/O times out if the card is still
>> present i.e. you are only unbinding the host controller, so you should
>> remove the card while the host controller and card are still functional?
> 
> The card is still functional for my host(but maybe not if ->remove()
> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
> ->remove() calls.

I don't understand why the host controller isn't functional.  I tried it
with sdhci and it just waits for dd to finish but there are no I/O errors.

Is there a use-case for unbinding quickly?

> 
>>
>>>           device_del(&card->dev);
>>>           of_node_put(card->dev.of_node);
>>>       }
>>>
>>
>>
>>
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Lin April 12, 2018, 12:43 a.m. UTC | #4

On 2018/4/11 18:26, Adrian Hunter wrote:
> On 11/04/18 12:13, Shawn Lin wrote:
>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>> A simply continueous background I/O could 100% make the system stuck for
>>>> a long time in my system when a unbind for the controller driver happens
>>>> simultaneously. See:
>>>>
>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>
>>>> The reason is all pending requests wait for timeout one by one, but never
>>>> propagates BLK_STS_IOERR in the first place when kicked from the queue.
>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>
>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>> ---
>>>>
>>>>    drivers/mmc/core/bus.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>> index fc92c6c..c6c5dfe 100644
>>>> --- a/drivers/mmc/core/bus.c
>>>> +++ b/drivers/mmc/core/bus.c
>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>                pr_info("%s: card %04x removed\n",
>>>>                    mmc_hostname(card->host), card->rca);
>>>>            }
>>>> +        mmc_card_set_removed(card);
>>>
>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>> claimed the host.  Of course we can't claim the host because the block
>>
>> yep.
>>
>>> driver already has it, but I am not sure this is the right place to do this.
>>>    My first question is how come the I/O times out if the card is still
>>> present i.e. you are only unbinding the host controller, so you should
>>> remove the card while the host controller and card are still functional?
>>
>> The card is still functional for my host(but maybe not if ->remove()
>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>> ->remove() calls.
> 
> I don't understand why the host controller isn't functional.  I tried it

I guess we have misunderstanding about what the "functional" means. What
I want to say is when unbinding the host driver, mmc_remove_host() is
called, so the target host isn't present from the view of core layer,
right?

> with sdhci and it just waits for dd to finish but there are no I/O errors.
> 
> Is there a use-case for unbinding quickly?

Oh, I think I made a mistake in commit msg, the repro steps should be
(1) run dd in background:
dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
(2) unbind you driver, for instance:
echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind

Then yes, there is no I/O errors and it waits for dd to finish. But
when waiting for dd to finish, console got stuck, and can't respond to
any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
the above setp(1) make it stuck for 2 miniutes for my system.

> 
>>
>>>
>>>>            device_del(&card->dev);
>>>>            of_node_put(card->dev.of_node);
>>>>        }
>>>>
>>>
>>>
>>>
>>>
>>
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter April 12, 2018, 7:03 a.m. UTC | #5

On 12/04/18 03:43, Shawn Lin wrote:
> On 2018/4/11 18:26, Adrian Hunter wrote:
>> On 11/04/18 12:13, Shawn Lin wrote:
>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>> A simply continueous background I/O could 100% make the system stuck for
>>>>> a long time in my system when a unbind for the controller driver happens
>>>>> simultaneously. See:
>>>>>
>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>
>>>>> The reason is all pending requests wait for timeout one by one, but never
>>>>> propagates BLK_STS_IOERR in the first place when kicked from the queue.
>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>
>>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>> ---
>>>>>
>>>>>    drivers/mmc/core/bus.c | 1 +
>>>>>    1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>> index fc92c6c..c6c5dfe 100644
>>>>> --- a/drivers/mmc/core/bus.c
>>>>> +++ b/drivers/mmc/core/bus.c
>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>                pr_info("%s: card %04x removed\n",
>>>>>                    mmc_hostname(card->host), card->rca);
>>>>>            }
>>>>> +        mmc_card_set_removed(card);
>>>>
>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>> claimed the host.  Of course we can't claim the host because the block
>>>
>>> yep.
>>>
>>>> driver already has it, but I am not sure this is the right place to do
>>>> this.
>>>>    My first question is how come the I/O times out if the card is still
>>>> present i.e. you are only unbinding the host controller, so you should
>>>> remove the card while the host controller and card are still functional?
>>>
>>> The card is still functional for my host(but maybe not if ->remove()
>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>>> ->remove() calls.
>>
>> I don't understand why the host controller isn't functional.  I tried it
> 
> I guess we have misunderstanding about what the "functional" means. What
> I want to say is when unbinding the host driver, mmc_remove_host() is
> called, so the target host isn't present from the view of core layer,
> right?
> 
>> with sdhci and it just waits for dd to finish but there are no I/O errors.
>>
>> Is there a use-case for unbinding quickly?
> 
> Oh, I think I made a mistake in commit msg, the repro steps should be
> (1) run dd in background:
> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
> (2) unbind you driver, for instance:
> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
> 
> Then yes, there is no I/O errors and it waits for dd to finish. But
> when waiting for dd to finish, console got stuck, and can't respond to
> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
> the above setp(1) make it stuck for 2 miniutes for my system.

Job control on a console might not always work, especially with busybox.
I guess if you run the unbind in the background, then there is not a problem.

Given that unbinding the host controller while it is still in use is a bad
thing, why do we care?
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Lin April 12, 2018, 8:53 a.m. UTC | #6

On 2018/4/12 15:03, Adrian Hunter wrote:
> On 12/04/18 03:43, Shawn Lin wrote:
>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>> A simply continueous background I/O could 100% make the system stuck for
>>>>>> a long time in my system when a unbind for the controller driver happens
>>>>>> simultaneously. See:
>>>>>>
>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>
>>>>>> The reason is all pending requests wait for timeout one by one, but never
>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the queue.
>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>>
>>>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>>> ---
>>>>>>
>>>>>>     drivers/mmc/core/bus.c | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>                 pr_info("%s: card %04x removed\n",
>>>>>>                     mmc_hostname(card->host), card->rca);
>>>>>>             }
>>>>>> +        mmc_card_set_removed(card);
>>>>>
>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>> claimed the host.  Of course we can't claim the host because the block
>>>>
>>>> yep.
>>>>
>>>>> driver already has it, but I am not sure this is the right place to do
>>>>> this.
>>>>>     My first question is how come the I/O times out if the card is still
>>>>> present i.e. you are only unbinding the host controller, so you should
>>>>> remove the card while the host controller and card are still functional?
>>>>
>>>> The card is still functional for my host(but maybe not if ->remove()
>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>>>> ->remove() calls.
>>>
>>> I don't understand why the host controller isn't functional.  I tried it
>>
>> I guess we have misunderstanding about what the "functional" means. What
>> I want to say is when unbinding the host driver, mmc_remove_host() is
>> called, so the target host isn't present from the view of core layer,
>> right?
>>
>>> with sdhci and it just waits for dd to finish but there are no I/O errors.
>>>
>>> Is there a use-case for unbinding quickly?
>>
>> Oh, I think I made a mistake in commit msg, the repro steps should be
>> (1) run dd in background:
>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>> (2) unbind you driver, for instance:
>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>
>> Then yes, there is no I/O errors and it waits for dd to finish. But
>> when waiting for dd to finish, console got stuck, and can't respond to
>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
>> the above setp(1) make it stuck for 2 miniutes for my system.
> 
> Job control on a console might not always work, especially with busybox.
> I guess if you run the unbind in the background, then there is not a problem.

The dd case is just a quick reproduce but the real case is the
Android system has many vold(similar to udev) driven partition scan
tools/applications that fire I/Os once sd card mounted. The applications
is busy waiting there if we check background job by "top", which make
the user experience very bad in this way.

> 
> Given that unbinding the host controller while it is still in use is a bad
> thing, why do we care?

I just checked the log and wonder why mmc_mq_queue_rq still allows
request to dispatch?  Maybe it's a bad thing to unbind the controller
when it's used, but that's not forbade, at least it should behave
the same with real physically hot-plug.

Another reason I do care about this is because I usually can't access to
my test devices, but only remotely do job on their console. So if I need
to reset/hot-plug the card, I usually do unbind/bind, but the partition
scan applications + unbind make the console stuck for a longer time
than expected which is very painful for me to wait for finishment. :(


> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter April 12, 2018, 9:19 a.m. UTC | #7

On 12/04/18 11:53, Shawn Lin wrote:
> On 2018/4/12 15:03, Adrian Hunter wrote:
>> On 12/04/18 03:43, Shawn Lin wrote:
>>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>>> A simply continueous background I/O could 100% make the system stuck for
>>>>>>> a long time in my system when a unbind for the controller driver happens
>>>>>>> simultaneously. See:
>>>>>>>
>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>
>>>>>>> The reason is all pending requests wait for timeout one by one, but
>>>>>>> never
>>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the queue.
>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>>>
>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>>>> ---
>>>>>>>
>>>>>>>     drivers/mmc/core/bus.c | 1 +
>>>>>>>     1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>>                 pr_info("%s: card %04x removed\n",
>>>>>>>                     mmc_hostname(card->host), card->rca);
>>>>>>>             }
>>>>>>> +        mmc_card_set_removed(card);
>>>>>>
>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>>> claimed the host.  Of course we can't claim the host because the block
>>>>>
>>>>> yep.
>>>>>
>>>>>> driver already has it, but I am not sure this is the right place to do
>>>>>> this.
>>>>>>     My first question is how come the I/O times out if the card is still
>>>>>> present i.e. you are only unbinding the host controller, so you should
>>>>>> remove the card while the host controller and card are still functional?
>>>>>
>>>>> The card is still functional for my host(but maybe not if ->remove()
>>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>>>>> ->remove() calls.
>>>>
>>>> I don't understand why the host controller isn't functional.  I tried it
>>>
>>> I guess we have misunderstanding about what the "functional" means. What
>>> I want to say is when unbinding the host driver, mmc_remove_host() is
>>> called, so the target host isn't present from the view of core layer,
>>> right?
>>>
>>>> with sdhci and it just waits for dd to finish but there are no I/O errors.
>>>>
>>>> Is there a use-case for unbinding quickly?
>>>
>>> Oh, I think I made a mistake in commit msg, the repro steps should be
>>> (1) run dd in background:
>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>>> (2) unbind you driver, for instance:
>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>
>>> Then yes, there is no I/O errors and it waits for dd to finish. But
>>> when waiting for dd to finish, console got stuck, and can't respond to
>>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
>>> the above setp(1) make it stuck for 2 miniutes for my system.
>>
>> Job control on a console might not always work, especially with busybox.
>> I guess if you run the unbind in the background, then there is not a problem.
> 
> The dd case is just a quick reproduce but the real case is the
> Android system has many vold(similar to udev) driven partition scan
> tools/applications that fire I/Os once sd card mounted. The applications
> is busy waiting there if we check background job by "top", which make
> the user experience very bad in this way.

Is this related to unbinding the host controller?

> 
>>
>> Given that unbinding the host controller while it is still in use is a bad
>> thing, why do we care?
> 
> I just checked the log and wonder why mmc_mq_queue_rq still allows
> request to dispatch?  Maybe it's a bad thing to unbind the controller
> when it's used, but that's not forbade, at least it should behave
> the same with real physically hot-plug.

Should it?  From memory I vaguely recall seeing EXT4 doing a sync before its
partition gets yanked from beneath it.  If that is right it would tend to
indicate that there is not an expectation that I/O should fail immediately.

> Another reason I do care about this is because I usually can't access to
> my test devices, but only remotely do job on their console. So if I need
> to reset/hot-plug the card, I usually do unbind/bind, but the partition
> scan applications + unbind make the console stuck for a longer time
> than expected which is very painful for me to wait for finishment. :(

Why not do the unbind in the background and then kill the partition scan?
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Lin April 13, 2018, 12:53 a.m. UTC | #8

On 2018/4/12 17:19, Adrian Hunter wrote:
> On 12/04/18 11:53, Shawn Lin wrote:
>> On 2018/4/12 15:03, Adrian Hunter wrote:
>>> On 12/04/18 03:43, Shawn Lin wrote:
>>>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>>>> A simply continueous background I/O could 100% make the system stuck for
>>>>>>>> a long time in my system when a unbind for the controller driver happens
>>>>>>>> simultaneously. See:
>>>>>>>>
>>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>>
>>>>>>>> The reason is all pending requests wait for timeout one by one, but
>>>>>>>> never
>>>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the queue.
>>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>>>>
>>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>>>>> ---
>>>>>>>>
>>>>>>>>      drivers/mmc/core/bus.c | 1 +
>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>>>                  pr_info("%s: card %04x removed\n",
>>>>>>>>                      mmc_hostname(card->host), card->rca);
>>>>>>>>              }
>>>>>>>> +        mmc_card_set_removed(card);
>>>>>>>
>>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>>>> claimed the host.  Of course we can't claim the host because the block
>>>>>>
>>>>>> yep.
>>>>>>
>>>>>>> driver already has it, but I am not sure this is the right place to do
>>>>>>> this.
>>>>>>>      My first question is how come the I/O times out if the card is still
>>>>>>> present i.e. you are only unbinding the host controller, so you should
>>>>>>> remove the card while the host controller and card are still functional?
>>>>>>
>>>>>> The card is still functional for my host(but maybe not if ->remove()
>>>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>>>>>> ->remove() calls.
>>>>>
>>>>> I don't understand why the host controller isn't functional.  I tried it
>>>>
>>>> I guess we have misunderstanding about what the "functional" means. What
>>>> I want to say is when unbinding the host driver, mmc_remove_host() is
>>>> called, so the target host isn't present from the view of core layer,
>>>> right?
>>>>
>>>>> with sdhci and it just waits for dd to finish but there are no I/O errors.
>>>>>
>>>>> Is there a use-case for unbinding quickly?
>>>>
>>>> Oh, I think I made a mistake in commit msg, the repro steps should be
>>>> (1) run dd in background:
>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>>>> (2) unbind you driver, for instance:
>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>
>>>> Then yes, there is no I/O errors and it waits for dd to finish. But
>>>> when waiting for dd to finish, console got stuck, and can't respond to
>>>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
>>>> the above setp(1) make it stuck for 2 miniutes for my system.
>>>
>>> Job control on a console might not always work, especially with busybox.
>>> I guess if you run the unbind in the background, then there is not a problem.
>>
>> The dd case is just a quick reproduce but the real case is the
>> Android system has many vold(similar to udev) driven partition scan
>> tools/applications that fire I/Os once sd card mounted. The applications
>> is busy waiting there if we check background job by "top", which make
>> the user experience very bad in this way.
> 
> Is this related to unbinding the host controller?
> 
>>
>>>
>>> Given that unbinding the host controller while it is still in use is a bad
>>> thing, why do we care?
>>
>> I just checked the log and wonder why mmc_mq_queue_rq still allows
>> request to dispatch?  Maybe it's a bad thing to unbind the controller
>> when it's used, but that's not forbade, at least it should behave
>> the same with real physically hot-plug.
> 
> Should it?  From memory I vaguely recall seeing EXT4 doing a sync before its
> partition gets yanked from beneath it.  If that is right it would tend to
> indicate that there is not an expectation that I/O should fail immediately.

I have little knowledge about EXT4 :),but that happend for cards with
different fs format.

> 
>> Another reason I do care about this is because I usually can't access to
>> my test devices, but only remotely do job on their console. So if I need
>> to reset/hot-plug the card, I usually do unbind/bind, but the partition
>> scan applications + unbind make the console stuck for a longer time
>> than expected which is very painful for me to wait for finishment. :(
> 
> Why not do the unbind in the background and then kill the partition scan?

Yes, I could :). But the conversation is deviating from core issue that
why the MMC_CARD_REMOVED flag is left apart for the card but the card
is removed. That being said, don't you think "mmc1: card 1234 removed"
means mmc_card_removed(card) should be true?

> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter April 13, 2018, 7:03 a.m. UTC | #9

On 13/04/18 03:53, Shawn Lin wrote:
> On 2018/4/12 17:19, Adrian Hunter wrote:
>> On 12/04/18 11:53, Shawn Lin wrote:
>>> On 2018/4/12 15:03, Adrian Hunter wrote:
>>>> On 12/04/18 03:43, Shawn Lin wrote:
>>>>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>>>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>>>>> A simply continueous background I/O could 100% make the system
>>>>>>>>> stuck for
>>>>>>>>> a long time in my system when a unbind for the controller driver
>>>>>>>>> happens
>>>>>>>>> simultaneously. See:
>>>>>>>>>
>>>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>>>
>>>>>>>>> The reason is all pending requests wait for timeout one by one, but
>>>>>>>>> never
>>>>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the
>>>>>>>>> queue.
>>>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>>      drivers/mmc/core/bus.c | 1 +
>>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>>>>                  pr_info("%s: card %04x removed\n",
>>>>>>>>>                      mmc_hostname(card->host), card->rca);
>>>>>>>>>              }
>>>>>>>>> +        mmc_card_set_removed(card);
>>>>>>>>
>>>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>>>>> claimed the host.  Of course we can't claim the host because the block
>>>>>>>
>>>>>>> yep.
>>>>>>>
>>>>>>>> driver already has it, but I am not sure this is the right place to do
>>>>>>>> this.
>>>>>>>>      My first question is how come the I/O times out if the card is
>>>>>>>> still
>>>>>>>> present i.e. you are only unbinding the host controller, so you should
>>>>>>>> remove the card while the host controller and card are still
>>>>>>>> functional?
>>>>>>>
>>>>>>> The card is still functional for my host(but maybe not if ->remove()
>>>>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>>>>>>> ->remove() calls.
>>>>>>
>>>>>> I don't understand why the host controller isn't functional.  I tried it
>>>>>
>>>>> I guess we have misunderstanding about what the "functional" means. What
>>>>> I want to say is when unbinding the host driver, mmc_remove_host() is
>>>>> called, so the target host isn't present from the view of core layer,
>>>>> right?
>>>>>
>>>>>> with sdhci and it just waits for dd to finish but there are no I/O
>>>>>> errors.
>>>>>>
>>>>>> Is there a use-case for unbinding quickly?
>>>>>
>>>>> Oh, I think I made a mistake in commit msg, the repro steps should be
>>>>> (1) run dd in background:
>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>>>>> (2) unbind you driver, for instance:
>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>
>>>>> Then yes, there is no I/O errors and it waits for dd to finish. But
>>>>> when waiting for dd to finish, console got stuck, and can't respond to
>>>>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
>>>>> the above setp(1) make it stuck for 2 miniutes for my system.
>>>>
>>>> Job control on a console might not always work, especially with busybox.
>>>> I guess if you run the unbind in the background, then there is not a
>>>> problem.
>>>
>>> The dd case is just a quick reproduce but the real case is the
>>> Android system has many vold(similar to udev) driven partition scan
>>> tools/applications that fire I/Os once sd card mounted. The applications
>>> is busy waiting there if we check background job by "top", which make
>>> the user experience very bad in this way.
>>
>> Is this related to unbinding the host controller?
>>
>>>
>>>>
>>>> Given that unbinding the host controller while it is still in use is a bad
>>>> thing, why do we care?
>>>
>>> I just checked the log and wonder why mmc_mq_queue_rq still allows
>>> request to dispatch?  Maybe it's a bad thing to unbind the controller
>>> when it's used, but that's not forbade, at least it should behave
>>> the same with real physically hot-plug.
>>
>> Should it?  From memory I vaguely recall seeing EXT4 doing a sync before its
>> partition gets yanked from beneath it.  If that is right it would tend to
>> indicate that there is not an expectation that I/O should fail immediately.
> 
> I have little knowledge about EXT4 :),but that happend for cards with
> different fs format.
> 
>>
>>> Another reason I do care about this is because I usually can't access to
>>> my test devices, but only remotely do job on their console. So if I need
>>> to reset/hot-plug the card, I usually do unbind/bind, but the partition
>>> scan applications + unbind make the console stuck for a longer time
>>> than expected which is very painful for me to wait for finishment. :(
>>
>> Why not do the unbind in the background and then kill the partition scan?
> 
> Yes, I could :). But the conversation is deviating from core issue that
> why the MMC_CARD_REMOVED flag is left apart for the card but the card
> is removed. That being said, don't you think "mmc1: card 1234 removed"
> means mmc_card_removed(card) should be true?

No.  It means the message is misleading for the unbind case.

If you want a way to mark the card as removed, couldn't you add something to
debugfs.
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Lin April 13, 2018, 7:21 a.m. UTC | #10

On 2018/4/13 15:03, Adrian Hunter wrote:
> On 13/04/18 03:53, Shawn Lin wrote:
>> On 2018/4/12 17:19, Adrian Hunter wrote:
>>> On 12/04/18 11:53, Shawn Lin wrote:
>>>> On 2018/4/12 15:03, Adrian Hunter wrote:
>>>>> On 12/04/18 03:43, Shawn Lin wrote:
>>>>>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>>>>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>>>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>>>>>> A simply continueous background I/O could 100% make the system
>>>>>>>>>> stuck for
>>>>>>>>>> a long time in my system when a unbind for the controller driver
>>>>>>>>>> happens
>>>>>>>>>> simultaneously. See:
>>>>>>>>>>
>>>>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>>>>
>>>>>>>>>> The reason is all pending requests wait for timeout one by one, but
>>>>>>>>>> never
>>>>>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the
>>>>>>>>>> queue.
>>>>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>>       drivers/mmc/core/bus.c | 1 +
>>>>>>>>>>       1 file changed, 1 insertion(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>>>>>                   pr_info("%s: card %04x removed\n",
>>>>>>>>>>                       mmc_hostname(card->host), card->rca);
>>>>>>>>>>               }
>>>>>>>>>> +        mmc_card_set_removed(card);
>>>>>>>>>
>>>>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>>>>>> claimed the host.  Of course we can't claim the host because the block
>>>>>>>>
>>>>>>>> yep.
>>>>>>>>
>>>>>>>>> driver already has it, but I am not sure this is the right place to do
>>>>>>>>> this.
>>>>>>>>>       My first question is how come the I/O times out if the card is
>>>>>>>>> still
>>>>>>>>> present i.e. you are only unbinding the host controller, so you should
>>>>>>>>> remove the card while the host controller and card are still
>>>>>>>>> functional?
>>>>>>>>
>>>>>>>> The card is still functional for my host(but maybe not if ->remove()
>>>>>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>>>>>>>> ->remove() calls.
>>>>>>>
>>>>>>> I don't understand why the host controller isn't functional.  I tried it
>>>>>>
>>>>>> I guess we have misunderstanding about what the "functional" means. What
>>>>>> I want to say is when unbinding the host driver, mmc_remove_host() is
>>>>>> called, so the target host isn't present from the view of core layer,
>>>>>> right?
>>>>>>
>>>>>>> with sdhci and it just waits for dd to finish but there are no I/O
>>>>>>> errors.
>>>>>>>
>>>>>>> Is there a use-case for unbinding quickly?
>>>>>>
>>>>>> Oh, I think I made a mistake in commit msg, the repro steps should be
>>>>>> (1) run dd in background:
>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>>>>>> (2) unbind you driver, for instance:
>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>
>>>>>> Then yes, there is no I/O errors and it waits for dd to finish. But
>>>>>> when waiting for dd to finish, console got stuck, and can't respond to
>>>>>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
>>>>>> the above setp(1) make it stuck for 2 miniutes for my system.
>>>>>
>>>>> Job control on a console might not always work, especially with busybox.
>>>>> I guess if you run the unbind in the background, then there is not a
>>>>> problem.
>>>>
>>>> The dd case is just a quick reproduce but the real case is the
>>>> Android system has many vold(similar to udev) driven partition scan
>>>> tools/applications that fire I/Os once sd card mounted. The applications
>>>> is busy waiting there if we check background job by "top", which make
>>>> the user experience very bad in this way.
>>>
>>> Is this related to unbinding the host controller?
>>>
>>>>
>>>>>
>>>>> Given that unbinding the host controller while it is still in use is a bad
>>>>> thing, why do we care?
>>>>
>>>> I just checked the log and wonder why mmc_mq_queue_rq still allows
>>>> request to dispatch?  Maybe it's a bad thing to unbind the controller
>>>> when it's used, but that's not forbade, at least it should behave
>>>> the same with real physically hot-plug.
>>>
>>> Should it?  From memory I vaguely recall seeing EXT4 doing a sync before its
>>> partition gets yanked from beneath it.  If that is right it would tend to
>>> indicate that there is not an expectation that I/O should fail immediately.
>>
>> I have little knowledge about EXT4 :),but that happend for cards with
>> different fs format.
>>
>>>
>>>> Another reason I do care about this is because I usually can't access to
>>>> my test devices, but only remotely do job on their console. So if I need
>>>> to reset/hot-plug the card, I usually do unbind/bind, but the partition
>>>> scan applications + unbind make the console stuck for a longer time
>>>> than expected which is very painful for me to wait for finishment. :(
>>>
>>> Why not do the unbind in the background and then kill the partition scan?
>>
>> Yes, I could :). But the conversation is deviating from core issue that
>> why the MMC_CARD_REMOVED flag is left apart for the card but the card
>> is removed. That being said, don't you think "mmc1: card 1234 removed"
>> means mmc_card_removed(card) should be true?
> 
> No.  It means the message is misleading for the unbind case.

Hmm.. I don't think so. The unbind case means the controller/slot has
gone, so system can never access to the card any more. I admit it's
physcially present, but not logically. So it looks same to me that
I have a card in slot but the kernel lacks driver to support the card.

> 
> If you want a way to mark the card as removed, couldn't you add something to
> debugfs.

It's irrelevant, as no in-kernel scenario check the debugfs. They only
care about status propagated from MMC block layer. If the card could
never be access any more, you shouldn't still allow requests pass
through the first check of mmc_card_removed in mmc_mq_queue_rq. Do you
agree? :)

> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter April 13, 2018, 7:55 a.m. UTC | #11

On 13/04/18 10:21, Shawn Lin wrote:
> On 2018/4/13 15:03, Adrian Hunter wrote:
>> On 13/04/18 03:53, Shawn Lin wrote:
>>> On 2018/4/12 17:19, Adrian Hunter wrote:
>>>> On 12/04/18 11:53, Shawn Lin wrote:
>>>>> On 2018/4/12 15:03, Adrian Hunter wrote:
>>>>>> On 12/04/18 03:43, Shawn Lin wrote:
>>>>>>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>>>>>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>>>>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>>>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>>>>>>> A simply continueous background I/O could 100% make the system
>>>>>>>>>>> stuck for
>>>>>>>>>>> a long time in my system when a unbind for the controller driver
>>>>>>>>>>> happens
>>>>>>>>>>> simultaneously. See:
>>>>>>>>>>>
>>>>>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>>>>>>> echo fe320000.dwmmc >
>>>>>>>>>>> /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>>>>>
>>>>>>>>>>> The reason is all pending requests wait for timeout one by one, but
>>>>>>>>>>> never
>>>>>>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the
>>>>>>>>>>> queue.
>>>>>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve
>>>>>>>>>>> it.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>>       drivers/mmc/core/bus.c | 1 +
>>>>>>>>>>>       1 file changed, 1 insertion(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>>>>>>                   pr_info("%s: card %04x removed\n",
>>>>>>>>>>>                       mmc_hostname(card->host), card->rca);
>>>>>>>>>>>               }
>>>>>>>>>>> +        mmc_card_set_removed(card);
>>>>>>>>>>
>>>>>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>>>>>>> claimed the host.  Of course we can't claim the host because the
>>>>>>>>>> block
>>>>>>>>>
>>>>>>>>> yep.
>>>>>>>>>
>>>>>>>>>> driver already has it, but I am not sure this is the right place
>>>>>>>>>> to do
>>>>>>>>>> this.
>>>>>>>>>>       My first question is how come the I/O times out if the card is
>>>>>>>>>> still
>>>>>>>>>> present i.e. you are only unbinding the host controller, so you
>>>>>>>>>> should
>>>>>>>>>> remove the card while the host controller and card are still
>>>>>>>>>> functional?
>>>>>>>>>
>>>>>>>>> The card is still functional for my host(but maybe not if ->remove()
>>>>>>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the
>>>>>>>>> ->remove() calls.
>>>>>>>>
>>>>>>>> I don't understand why the host controller isn't functional.  I
>>>>>>>> tried it
>>>>>>>
>>>>>>> I guess we have misunderstanding about what the "functional" means. What
>>>>>>> I want to say is when unbinding the host driver, mmc_remove_host() is
>>>>>>> called, so the target host isn't present from the view of core layer,
>>>>>>> right?
>>>>>>>
>>>>>>>> with sdhci and it just waits for dd to finish but there are no I/O
>>>>>>>> errors.
>>>>>>>>
>>>>>>>> Is there a use-case for unbinding quickly?
>>>>>>>
>>>>>>> Oh, I think I made a mistake in commit msg, the repro steps should be
>>>>>>> (1) run dd in background:
>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>>>>>>> (2) unbind you driver, for instance:
>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>
>>>>>>> Then yes, there is no I/O errors and it waits for dd to finish. But
>>>>>>> when waiting for dd to finish, console got stuck, and can't respond to
>>>>>>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits,
>>>>>>> the above setp(1) make it stuck for 2 miniutes for my system.
>>>>>>
>>>>>> Job control on a console might not always work, especially with busybox.
>>>>>> I guess if you run the unbind in the background, then there is not a
>>>>>> problem.
>>>>>
>>>>> The dd case is just a quick reproduce but the real case is the
>>>>> Android system has many vold(similar to udev) driven partition scan
>>>>> tools/applications that fire I/Os once sd card mounted. The applications
>>>>> is busy waiting there if we check background job by "top", which make
>>>>> the user experience very bad in this way.
>>>>
>>>> Is this related to unbinding the host controller?
>>>>
>>>>>
>>>>>>
>>>>>> Given that unbinding the host controller while it is still in use is a
>>>>>> bad
>>>>>> thing, why do we care?
>>>>>
>>>>> I just checked the log and wonder why mmc_mq_queue_rq still allows
>>>>> request to dispatch?  Maybe it's a bad thing to unbind the controller
>>>>> when it's used, but that's not forbade, at least it should behave
>>>>> the same with real physically hot-plug.
>>>>
>>>> Should it?  From memory I vaguely recall seeing EXT4 doing a sync before
>>>> its
>>>> partition gets yanked from beneath it.  If that is right it would tend to
>>>> indicate that there is not an expectation that I/O should fail immediately.
>>>
>>> I have little knowledge about EXT4 :),but that happend for cards with
>>> different fs format.
>>>
>>>>
>>>>> Another reason I do care about this is because I usually can't access to
>>>>> my test devices, but only remotely do job on their console. So if I need
>>>>> to reset/hot-plug the card, I usually do unbind/bind, but the partition
>>>>> scan applications + unbind make the console stuck for a longer time
>>>>> than expected which is very painful for me to wait for finishment. :(
>>>>
>>>> Why not do the unbind in the background and then kill the partition scan?
>>>
>>> Yes, I could :). But the conversation is deviating from core issue that
>>> why the MMC_CARD_REMOVED flag is left apart for the card but the card
>>> is removed. That being said, don't you think "mmc1: card 1234 removed"
>>> means mmc_card_removed(card) should be true?
>>
>> No.  It means the message is misleading for the unbind case.
> 
> Hmm.. I don't think so. The unbind case means the controller/slot has
> gone, so system can never access to the card any more. I admit it's
> physcially present, but not logically. So it looks same to me that
> I have a card in slot but the kernel lacks driver to support the card.

No we always remove children (card) before the parent (host controller).

> 
>>
>> If you want a way to mark the card as removed, couldn't you add something to
>> debugfs.
> 
> It's irrelevant, as no in-kernel scenario check the debugfs. They only

I am not sure that is true.  For a removable card, setting the card as
removed and scheduling a rescan would probably remove it.

> care about status propagated from MMC block layer. If the card could
> never be access any more, you shouldn't still allow requests pass
> through the first check of mmc_card_removed in mmc_mq_queue_rq. Do you
> agree? :)

No, I think we have established that the card is fully functional.
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Lin April 24, 2018, 9:13 a.m. UTC | #12

On 2018/4/13 15:55, Adrian Hunter wrote:
> On 13/04/18 10:21, Shawn Lin wrote:

> 
> No we always remove children (card) before the parent (host controller).
> 

Ok, I agree with this. One thing now come to my mind is could we allow
mmc_remove_host() to help remove children in advance or is it okay to
add a new helper for host drivers to remove children in their
->remove()?

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter April 26, 2018, 10:09 a.m. UTC | #13

On 24/04/18 12:13, Shawn Lin wrote:
> 
> On 2018/4/13 15:55, Adrian Hunter wrote:
>> On 13/04/18 10:21, Shawn Lin wrote:
> 
>>
>> No we always remove children (card) before the parent (host controller).
>>
> 
> Ok, I agree with this. One thing now come to my mind is could we allow
> mmc_remove_host() to help remove children in advance or is it okay to
> add a new helper for host drivers to remove children in their
> ->remove()?

Lets' step back a bit.  At what point does the remove end up waiting?  Does
it get to mmc_blk_remove()?


--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Lin April 26, 2018, 11:28 a.m. UTC | #14

On 2018/4/26 18:09, Adrian Hunter wrote:
> On 24/04/18 12:13, Shawn Lin wrote:
>>
>> On 2018/4/13 15:55, Adrian Hunter wrote:
>>> On 13/04/18 10:21, Shawn Lin wrote:
>>
>>>
>>> No we always remove children (card) before the parent (host controller).
>>>
>>
>> Ok, I agree with this. One thing now come to my mind is could we allow
>> mmc_remove_host() to help remove children in advance or is it okay to
>> add a new helper for host drivers to remove children in their
>> ->remove()?
> 
> Lets' step back a bit.  At what point does the remove end up waiting?  Does
> it get to mmc_blk_remove()?

I guess the remove might get to mmc_blk_remove for sd/(e)MMC cards.

> 
> 
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

mmc: core: Set card as removed in mmc_remove_card()

Commit Message

Comments

Patch