diff mbox series

[RFC] target: tcmu: clean the nl_cmd of the udev when nl send fails

Message ID 20190802103830.8881-1-lizhongfs@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] target: tcmu: clean the nl_cmd of the udev when nl send fails | expand

Commit Message

Li Zhong Aug. 2, 2019, 10:38 a.m. UTC
If the userspace process crashes while we send the nl msg, it is possible
that the cmd in curr_nl_cmd of tcmu_dev never gets reset to 0, and
and returns busy for other commands after the userspace process is
restartd.

More details below:

/backstores/user:file/file> set attribute dev_size=2048
Cannot set attribute dev_size: [Errno 3] No such process
/backstores/user:file/file> set attribute dev_size=2048
Cannot set attribute dev_size: [Errno 16] Device or resource busy

with following kernel messages:
[173605.747169] Unable to reconfigure device
[173616.686674] tcmu daemon: command reply support 1.
[173623.866978] netlink cmd 3 already executing on file
[173623.866984] Unable to reconfigure device

Also, it is not safe to leave the nl_cmd in the list, and not get
deleted.

This patch removes the nl_cmd from the list, and clear its data if
it is not sent successfully.

Signed-off-by: Li Zhong <lizhongfs@gmail.com>
---
 drivers/target/target_core_user.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Mike Christie Aug. 2, 2019, 4:58 p.m. UTC | #1
On 08/02/2019 05:38 AM, Li Zhong wrote:
> If the userspace process crashes while we send the nl msg, it is possible
> that the cmd in curr_nl_cmd of tcmu_dev never gets reset to 0, and
> and returns busy for other commands after the userspace process is
> restartd.
> 
> More details below:
> 
> /backstores/user:file/file> set attribute dev_size=2048
> Cannot set attribute dev_size: [Errno 3] No such process
> /backstores/user:file/file> set attribute dev_size=2048
> Cannot set attribute dev_size: [Errno 16] Device or resource busy
> 
> with following kernel messages:
> [173605.747169] Unable to reconfigure device
> [173616.686674] tcmu daemon: command reply support 1.
> [173623.866978] netlink cmd 3 already executing on file
> [173623.866984] Unable to reconfigure device
> 
> Also, it is not safe to leave the nl_cmd in the list, and not get
> deleted.
> 
> This patch removes the nl_cmd from the list, and clear its data if
> it is not sent successfully.
> 
> Signed-off-by: Li Zhong <lizhongfs@gmail.com>
> ---
>  drivers/target/target_core_user.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
> index 04eda111920e..4ae3103e204c 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -1708,6 +1708,24 @@ static int tcmu_init_genl_cmd_reply(struct tcmu_dev *udev, int cmd)
>  	return 0;
>  }
>  
> +static void tcmu_destroy_genl_cmd_reply(struct tcmu_dev *udev)
> +{
> +	struct tcmu_nl_cmd *nl_cmd = &udev->curr_nl_cmd;
> +
> +	if (!tcmu_kern_cmd_reply_supported)
> +		return;
> +
> +	if (udev->nl_reply_supported <= 0)
> +		return;
> +
> +	mutex_lock(&tcmu_nl_cmd_mutex);
> +
> +	list_del(&nl_cmd->nl_list);
> +	memset(nl_cmd, 0, sizeof(*nl_cmd));
> +
> +	mutex_unlock(&tcmu_nl_cmd_mutex);
> +}
> +
>  static int tcmu_wait_genl_cmd_reply(struct tcmu_dev *udev)
>  {
>  	struct tcmu_nl_cmd *nl_cmd = &udev->curr_nl_cmd;
> @@ -1788,6 +1806,9 @@ static int tcmu_netlink_event_send(struct tcmu_dev *udev,
>  	if (ret == 0 ||
>  	   (ret == -ESRCH && cmd == TCMU_CMD_ADDED_DEVICE))
>  		return tcmu_wait_genl_cmd_reply(udev);
> +	else
> +		/* If failure, remove from the list and clear the nl_cmd */

Drop the comment. We know it is in the failure path already and the
function name tells us it cleans up the command.

> +		tcmu_destroy_genl_cmd_reply(udev);
>
Li Zhong Aug. 5, 2019, 12:36 a.m. UTC | #2
On Sat, Aug 3, 2019 at 12:58 AM Mike Christie <mchristi@redhat.com> wrote:
>
> On 08/02/2019 05:38 AM, Li Zhong wrote:
> > If the userspace process crashes while we send the nl msg, it is possible
> > that the cmd in curr_nl_cmd of tcmu_dev never gets reset to 0, and
> > and returns busy for other commands after the userspace process is
> > restartd.
> >
> > More details below:
> >
> > /backstores/user:file/file> set attribute dev_size=2048
> > Cannot set attribute dev_size: [Errno 3] No such process
> > /backstores/user:file/file> set attribute dev_size=2048
> > Cannot set attribute dev_size: [Errno 16] Device or resource busy
> >
> > with following kernel messages:
> > [173605.747169] Unable to reconfigure device
> > [173616.686674] tcmu daemon: command reply support 1.
> > [173623.866978] netlink cmd 3 already executing on file
> > [173623.866984] Unable to reconfigure device
> >
> > Also, it is not safe to leave the nl_cmd in the list, and not get
> > deleted.
> >
> > This patch removes the nl_cmd from the list, and clear its data if
> > it is not sent successfully.
> >
> > Signed-off-by: Li Zhong <lizhongfs@gmail.com>
> > ---
> >  drivers/target/target_core_user.c | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
> > index 04eda111920e..4ae3103e204c 100644
> > --- a/drivers/target/target_core_user.c
> > +++ b/drivers/target/target_core_user.c
> > @@ -1708,6 +1708,24 @@ static int tcmu_init_genl_cmd_reply(struct tcmu_dev *udev, int cmd)
> >       return 0;
> >  }
> >
> > +static void tcmu_destroy_genl_cmd_reply(struct tcmu_dev *udev)
> > +{
> > +     struct tcmu_nl_cmd *nl_cmd = &udev->curr_nl_cmd;
> > +
> > +     if (!tcmu_kern_cmd_reply_supported)
> > +             return;
> > +
> > +     if (udev->nl_reply_supported <= 0)
> > +             return;
> > +
> > +     mutex_lock(&tcmu_nl_cmd_mutex);
> > +
> > +     list_del(&nl_cmd->nl_list);
> > +     memset(nl_cmd, 0, sizeof(*nl_cmd));
> > +
> > +     mutex_unlock(&tcmu_nl_cmd_mutex);
> > +}
> > +
> >  static int tcmu_wait_genl_cmd_reply(struct tcmu_dev *udev)
> >  {
> >       struct tcmu_nl_cmd *nl_cmd = &udev->curr_nl_cmd;
> > @@ -1788,6 +1806,9 @@ static int tcmu_netlink_event_send(struct tcmu_dev *udev,
> >       if (ret == 0 ||
> >          (ret == -ESRCH && cmd == TCMU_CMD_ADDED_DEVICE))
> >               return tcmu_wait_genl_cmd_reply(udev);
> > +     else
> > +             /* If failure, remove from the list and clear the nl_cmd */
>
> Drop the comment. We know it is in the failure path already and the
> function name tells us it cleans up the command.

Thank you for the review, Will drop it.

>
> > +             tcmu_destroy_genl_cmd_reply(udev);
> >
>
diff mbox series

Patch

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 04eda111920e..4ae3103e204c 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -1708,6 +1708,24 @@  static int tcmu_init_genl_cmd_reply(struct tcmu_dev *udev, int cmd)
 	return 0;
 }
 
+static void tcmu_destroy_genl_cmd_reply(struct tcmu_dev *udev)
+{
+	struct tcmu_nl_cmd *nl_cmd = &udev->curr_nl_cmd;
+
+	if (!tcmu_kern_cmd_reply_supported)
+		return;
+
+	if (udev->nl_reply_supported <= 0)
+		return;
+
+	mutex_lock(&tcmu_nl_cmd_mutex);
+
+	list_del(&nl_cmd->nl_list);
+	memset(nl_cmd, 0, sizeof(*nl_cmd));
+
+	mutex_unlock(&tcmu_nl_cmd_mutex);
+}
+
 static int tcmu_wait_genl_cmd_reply(struct tcmu_dev *udev)
 {
 	struct tcmu_nl_cmd *nl_cmd = &udev->curr_nl_cmd;
@@ -1788,6 +1806,9 @@  static int tcmu_netlink_event_send(struct tcmu_dev *udev,
 	if (ret == 0 ||
 	   (ret == -ESRCH && cmd == TCMU_CMD_ADDED_DEVICE))
 		return tcmu_wait_genl_cmd_reply(udev);
+	else
+		/* If failure, remove from the list and clear the nl_cmd */
+		tcmu_destroy_genl_cmd_reply(udev);
 
 	return ret;
 }