
[v17,08/15] s390/vfio-ap: keep track of active guests

Message ID 20211021152332.70455-9-akrowiak@linux.ibm.com (mailing list archive)
State New, archived
Series s390/vfio-ap: dynamic configuration support

Commit Message

Anthony Krowiak Oct. 21, 2021, 3:23 p.m. UTC
The vfio_ap device driver registers for notification when the pointer to
the KVM object for a guest is set. Let's store the KVM pointer as well as
the pointer to the mediated device when the KVM pointer is set.

The reason for storing the KVM and mediated device pointers is to
facilitate hot plug/unplug of AP queues for a KVM guest when a queue device
is probed or removed. When a guest's APCB is hot plugged into the guest,
the kvm->lock must be taken prior to taking the matrix_dev->lock, or there
is potential for a lockdep splat (see below). Unfortunately, when a queue
is probed or removed, we have no idea whether it is assigned to a guest or
which KVM object is associated with the guest. If we take the
matrix_dev->lock to determine whether the APQN is assigned to a running
guest and subsequently take the kvm->lock, in certain situations that will
result in a lockdep splat:

* see commit 0cc00c8d4050 ("Fix circular lockdep when setting/clearing
  crypto masks")

* see commit 86956e70761b ("replace open coded locks for
  VFIO_GROUP_NOTIFY_SET_KVM notification")

The reason a lockdep splat can occur has to do with the fact that the
kvm->lock has to be taken before the vcpu->lock; so, for example, when a
secure execution guest is started, you may end up with the following
scenario:

        Interception of PQAP(AQIC) instruction executed on the guest:
        ------------------------------------------------------------
        handle_pqap:                    matrix_dev->lock
        kvm_vcpu_ioctl:                 vcpu_mutex

        Start of secure execution guest:
        -------------------------------
        kvm_s390_cpus_to_pv:            vcpu->mutex
        kvm_arch_vm_ioctl:              kvm->lock

        Queue is unbound from vfio_ap device driver:
        -------------------------------------------
                                        kvm->lock
        vfio_ap_mdev_remove_queue:      matrix_dev->lock

This patch introduces a new ap_guest structure into which the pointers to
the kvm and matrix_mdev can be stored. It also introduces two new fields
in the struct ap_matrix_dev:

struct ap_matrix_dev {
        ...
        struct rw_semaphore guests_lock;
        struct list_head guests;
       ...
}

The 'guests_lock' field is a r/w semaphore to control access to the
'guests' field. The 'guests' field is a list of ap_guest
structures containing the KVM and matrix_mdev pointers for each active
guest. An ap_guest structure will be stored into the list whenever the
vfio_ap device driver is notified that the KVM pointer has been set and
removed when notified that the KVM pointer has been cleared.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  2 ++
 drivers/s390/crypto/vfio_ap_ops.c     | 44 +++++++++++++++++++++++++--
 drivers/s390/crypto/vfio_ap_private.h | 10 ++++++
 3 files changed, 53 insertions(+), 3 deletions(-)

Comments

Halil Pasic Dec. 30, 2021, 2:04 a.m. UTC | #1
On Thu, 21 Oct 2021 11:23:25 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The reason a lockdep splat can occur has to do with the fact that the
> kvm->lock has to be taken before the vcpu->lock; so, for example, when a
> secure execution guest is started, you may end up with the following
> scenario:
> 
>         Interception of PQAP(AQIC) instruction executed on the guest:
>         ------------------------------------------------------------
>         handle_pqap:                    matrix_dev->lock                
>         kvm_vcpu_ioctl:                 vcpu_mutex                      
> 
>         Start of secure execution guest:
>         -------------------------------
>         kvm_s390_cpus_to_pv:            vcpu->mutex                     
>         kvm_arch_vm_ioctl:              kvm->lock                    
> 
>         Queue is unbound from vfio_ap device driver:
>         -------------------------------------------
>                                         kvm->lock
>         vfio_ap_mdev_remove_queue:      matrix_dev->lock

The way you describe your scenario is a little ambiguous. It
seems you chose a stack-trace-like description, in the sense that, for
example, for PQAP first vcpu->mutex is taken and then matrix_dev->lock,
but you write the latter first and the former second. I think it is more
usual to describe such things as a sequence of events, in the sense that
if A precedes B in the text (from the top towards the bottom), then the
execution of A precedes the execution of B in time.

Also you are inconsistent with vcpu_mutex vs vcpu->mutex.

I can't say I understand the need for this yet. I have been staring
at the end result for a while. Let me see if I can come up with an
alternate proposal for some things.

Regards,
Halil
Halil Pasic Dec. 30, 2021, 3:33 a.m. UTC | #2
On Thu, 21 Oct 2021 11:23:25 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The vfio_ap device driver registers for notification when the pointer to
> the KVM object for a guest is set. Let's store the KVM pointer as well as
> the pointer to the mediated device when the KVM pointer is set.

[..]


> struct ap_matrix_dev {
>         ...
>         struct rw_semaphore guests_lock;
>         struct list_head guests;
>        ...
> }
> 
> The 'guests_lock' field is a r/w semaphore to control access to the
> 'guests' field. The 'guests' field is a list of ap_guest
> structures containing the KVM and matrix_mdev pointers for each active
> guest. An ap_guest structure will be stored into the list whenever the
> vfio_ap device driver is notified that the KVM pointer has been set and
> removed when notified that the KVM pointer has been cleared.
> 

Is this about the field or about the list including all the nodes? This
reads like guests_lock only protects the head element, which makes no
sense to me, given how these lists work.

The narrowest scope that could make sense is all the list_head stuff
in the entire list. I.e. one would only need the lock to traverse or
manipulate the list, while the payload would still be subject to
the matrix_dev->lock mutex.
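
Something like the following rough sketch is what I have in mind (just
an illustration, with the bodies elided):

struct ap_guest *guest;

/* guests_lock would only cover membership of, and traversal over, the list */
down_read(&matrix_dev->guests_lock);
list_for_each_entry(guest, &matrix_dev->guests, node) {
        /*
         * ... while the payload (e.g. guest->kvm) would still be
         * protected by the matrix_dev->lock mutex.
         */
        mutex_lock(&matrix_dev->lock);
        /* use guest->kvm here */
        mutex_unlock(&matrix_dev->lock);
}
up_read(&matrix_dev->guests_lock);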

[..]

> +struct ap_guest {
> +	struct kvm *kvm;
> +	struct list_head node;
> +};
> +
>  /**
>   * struct ap_matrix_dev - Contains the data for the matrix device.
>   *
> @@ -39,6 +44,9 @@
>   *		single ap_matrix_mdev device. It's quite coarse but we don't
>   *		expect much contention.
>   * @vfio_ap_drv: the vfio_ap device driver
> + * @guests_lock: r/w semaphore for protecting access to @guests
> + * @guests:	list of guests (struct ap_guest) using AP devices bound to the
> + *		vfio_ap device driver.

Please compare the above. Also if it is only about the access to the
list, then you could drop the lock right after create, and not keep it
till the very end of vfio_ap_mdev_set_kvm(). Right?

In any case I'm skeptical about this whole struct ap_guest business. To
me, it looks like something that just makes things more obscure and
complicated without any real benefit.

Regards,
Halil

>   */
>  struct ap_matrix_dev {
>  	struct device device;
> @@ -47,6 +55,8 @@ struct ap_matrix_dev {
>  	struct list_head mdev_list;
>  	struct mutex lock;
>  	struct ap_driver  *vfio_ap_drv;
> +	struct rw_semaphore guests_lock;
> +	struct list_head guests;
>  };
>  
>  extern struct ap_matrix_dev *matrix_dev;
Anthony Krowiak Jan. 11, 2022, 9:27 p.m. UTC | #3
On 12/29/21 21:04, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:25 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The reason a lockdep splat can occur has to do with the fact that the
>> kvm->lock has to be taken before the vcpu->lock; so, for example, when a
>> secure execution guest is started, you may end up with the following
>> scenario:
>>
>>          Interception of PQAP(AQIC) instruction executed on the guest:
>>          ------------------------------------------------------------
>>          handle_pqap:                    matrix_dev->lock
>>          kvm_vcpu_ioctl:                 vcpu_mutex
>>
>>          Start of secure execution guest:
>>          -------------------------------
>>          kvm_s390_cpus_to_pv:            vcpu->mutex
>>          kvm_arch_vm_ioctl:              kvm->lock
>>
>>          Queue is unbound from vfio_ap device driver:
>>          -------------------------------------------
>>                                          kvm->lock
>>          vfio_ap_mdev_remove_queue:      matrix_dev->lock
> The way you describe your scenario is a little ambiguous. It
> seems you choose a stack-trace like description, in a sense that for
> example for PQAP: first vcpu->mutex is taken and then matrix_dev->lock
> but you write the latter first and the former second. I think it is more
> usual to describe such stuff a a sequence of event in a sense that
> if A precedes B in the text (from the top towards the bottom), then
> execution of a A precedes the execution of B in time.

I wrote it the way it is displayed in the lockdep splat trace.
I'd be happy to re-arrange it if you'd prefer.

>
> Also you are inconsistent with vcpu_mutex vs vcpu->mutex.
>
> I can't say I understand the need for this yet. I have been starring
> at the end result for a while. Let me see if I can come up with an
> alternate proposal for some things.

Go for it, and may the force be with you.

>
> Regards,
> Halil
>
>
Anthony Krowiak Jan. 11, 2022, 9:58 p.m. UTC | #4
On 12/29/21 22:33, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:25 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The vfio_ap device driver registers for notification when the pointer to
>> the KVM object for a guest is set. Let's store the KVM pointer as well as
>> the pointer to the mediated device when the KVM pointer is set.
> [..]
>
>
>> struct ap_matrix_dev {
>>          ...
>>          struct rw_semaphore guests_lock;
>>          struct list_head guests;
>>         ...
>> }
>>
>> The 'guests_lock' field is a r/w semaphore to control access to the
>> 'guests' field. The 'guests' field is a list of ap_guest
>> structures containing the KVM and matrix_mdev pointers for each active
>> guest. An ap_guest structure will be stored into the list whenever the
>> vfio_ap device driver is notified that the KVM pointer has been set and
>> removed when notified that the KVM pointer has been cleared.
>>
> Is this about the field or about the list including all the nodes? This
> reads lie guests_lock only protects the head element, which makes no
> sense to me. Because of how these lists work.

It locks the list; I can rewrite the description.

>
> The narrowest scope that could make sense is all the list_head stuff
> in the entire list. I.e. one would only need the lock to traverse or
> manipulate the list, while the payload would still be subject to
> the matrix_dev->lock mutex.

The matrix_dev->guests_lock is needed whenever the kvm->lock
is needed because the struct ap_guest object is created and the
struct kvm assigned to it when the kvm pointer is set
(vfio_ap_mdev_set_kvm function). So, in order to access the
ap_guest object and retrieve the kvm pointer, we have to ensure
the ap_guest object is still available. The fact that we can get the
kvm pointer from the ap_matrix_mdev object just makes things
more efficient - i.e., we won't have to traverse the list.

Whenever the kvm->lock and matrix_dev->lock mutexes must
be held, the order is:

     matrix_dev->guests_lock
     matrix_dev->guests->kvm->lock
     matrix_dev->lock

There are times when all three locks are not required; for example,
the handle_pqap and vfio_ap_mdev_probe/remove functions only
require the matrix_dev->lock because they do not need to lock kvm.
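
Roughly, as a sketch (bodies elided; this mirrors the ordering used in
vfio_ap_mdev_set_kvm in this patch):

/* Full ordering, e.g. when the KVM pointer is set or cleared: */
down_write(&matrix_dev->guests_lock);
mutex_lock(&kvm->lock);
mutex_lock(&matrix_dev->lock);
/* ... update the guests list, the guest's APCB and matrix_mdev->kvm ... */
mutex_unlock(&matrix_dev->lock);
mutex_unlock(&kvm->lock);
up_write(&matrix_dev->guests_lock);

/* Paths that never need kvm->lock, e.g. handle_pqap: */
mutex_lock(&matrix_dev->lock);
/* ... */
mutex_unlock(&matrix_dev->lock);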

>
> [..]
>
>> +struct ap_guest {
>> +	struct kvm *kvm;
>> +	struct list_head node;
>> +};
>> +
>>   /**
>>    * struct ap_matrix_dev - Contains the data for the matrix device.
>>    *
>> @@ -39,6 +44,9 @@
>>    *		single ap_matrix_mdev device. It's quite coarse but we don't
>>    *		expect much contention.
>>    * @vfio_ap_drv: the vfio_ap device driver
>> + * @guests_lock: r/w semaphore for protecting access to @guests
>> + * @guests:	list of guests (struct ap_guest) using AP devices bound to the
>> + *		vfio_ap device driver.
> Please compare the above. Also if it is only about the access to the
> list, then you could drop the lock right after create, and not keep it
> till the very end of vfio_ap_mdev_set_kvm(). Right?

That would be true if it only controlled access to the list, but as I
explained above, that is not its sole purpose.

>
> In any case I'm skeptical about this whole struct ap_guest business. To
> me, it looks like something that just makes things more obscure and
> complicated without any real benefit.

I'm open to other ideas, but you'll have to come up with a way
to take the kvm->lock before the matrix_dev->lock in the
vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
functions, where we don't have access to the ap_matrix_mdev
object to which the APQN is assigned and which holds the pointer
to the kvm object.

In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
In order to hot plug/unplug the queue, we need the kvm->lock.
There's your catch-22 that needs to be solved. This design is my
attempt to solve that.
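
For example, the queue probe path could end up looking something like
this sketch (not code from this patch; the function name is made up, and
it assumes the ap_guest also carries the matrix_mdev pointer as described
in the commit message):

static void vfio_ap_mdev_probe_queue_sketch(unsigned long apqn)
{
        struct ap_guest *guest;

        /* Taking guests_lock first makes it safe to take kvm->lock next. */
        down_read(&matrix_dev->guests_lock);

        list_for_each_entry(guest, &matrix_dev->guests, node) {
                mutex_lock(&guest->kvm->lock);
                mutex_lock(&matrix_dev->lock);

                /*
                 * If @apqn is assigned to this guest's matrix_mdev, hot
                 * plug the queue into the guest's APCB here.
                 */

                mutex_unlock(&matrix_dev->lock);
                mutex_unlock(&guest->kvm->lock);
        }

        up_read(&matrix_dev->guests_lock);
}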

>
> Regards,
> Halil
>
>>    */
>>   struct ap_matrix_dev {
>>   	struct device device;
>> @@ -47,6 +55,8 @@ struct ap_matrix_dev {
>>   	struct list_head mdev_list;
>>   	struct mutex lock;
>>   	struct ap_driver  *vfio_ap_drv;
>> +	struct rw_semaphore guests_lock;
>> +	struct list_head guests;
>>   };
>>   
>>   extern struct ap_matrix_dev *matrix_dev;
Anthony Krowiak Jan. 11, 2022, 10:19 p.m. UTC | #5
On 1/11/22 16:58, Tony Krowiak wrote:
>
>
> On 12/29/21 22:33, Halil Pasic wrote:
>> On Thu, 21 Oct 2021 11:23:25 -0400
>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>
>>> The vfio_ap device driver registers for notification when the 
>>> pointer to
>>> the KVM object for a guest is set. Let's store the KVM pointer as 
>>> well as
>>> the pointer to the mediated device when the KVM pointer is set.
>> [..]
>>
>>
>>> struct ap_matrix_dev {
>>>          ...
>>>          struct rw_semaphore guests_lock;
>>>          struct list_head guests;
>>>         ...
>>> }
>>>
>>> The 'guests_lock' field is a r/w semaphore to control access to the
>>> 'guests' field. The 'guests' field is a list of ap_guest
>>> structures containing the KVM and matrix_mdev pointers for each active
>>> guest. An ap_guest structure will be stored into the list whenever the
>>> vfio_ap device driver is notified that the KVM pointer has been set and
>>> removed when notified that the KVM pointer has been cleared.
>>>
>> Is this about the field or about the list including all the nodes? This
>> reads lie guests_lock only protects the head element, which makes no
>> sense to me. Because of how these lists work.
>
> It locks the list, I can rewrite the description.

Ignore this response and read the answers to your comments below.

>
>
>>
>> The narrowest scope that could make sense is all the list_head stuff
>> in the entire list. I.e. one would only need the lock to traverse or
>> manipulate the list, while the payload would still be subject to
>> the matrix_dev->lock mutex.
>
> The matrix_dev->guests lock is needed whenever the kvm->lock
> is needed because the struct ap_guest object is created and the
> struct kvm assigned to it when the kvm pointer is set
> (vfio_ap_mdev_set_kvm function). So, in order to access the
> ap_guest object and retrieve the kvm pointer, we have to ensure
> the ap_guest_object is still available. The fact we can get the
> kvm pointer from the ap_matrix_mdev object just makes things
> more efficient - i.e., we won't have to traverse the list.
>
> Whenever the kvm->lock and matrix_dev->lock mutexes must
> be held, the order is:
>
>     matrix_dev->guests_lock
>     matrix_dev->guests->kvm->lock
>     matrix_dev->lock
>
> There are times where all three locks are not required; for example,
> the handle_pqap and vfio_ap_mdev_probe/remove functions only
> require the matrix_dev->lock because it does not need to lock kvm.
>
>>
>> [..]
>>
>>> +struct ap_guest {
>>> +    struct kvm *kvm;
>>> +    struct list_head node;
>>> +};
>>> +
>>>   /**
>>>    * struct ap_matrix_dev - Contains the data for the matrix device.
>>>    *
>>> @@ -39,6 +44,9 @@
>>>    *        single ap_matrix_mdev device. It's quite coarse but we 
>>> don't
>>>    *        expect much contention.
>>>    * @vfio_ap_drv: the vfio_ap device driver
>>> + * @guests_lock: r/w semaphore for protecting access to @guests
>>> + * @guests:    list of guests (struct ap_guest) using AP devices 
>>> bound to the
>>> + *        vfio_ap device driver.
>> Please compare the above. Also if it is only about the access to the
>> list, then you could drop the lock right after create, and not keep it
>> till the very end of vfio_ap_mdev_set_kvm(). Right?
>
> That would be true if it only controlled access to the list, but as I
> explained above, that is not its sole purpose.
>
>>
>> In any case I'm skeptical about this whole struct ap_guest business. To
>> me, it looks like something that just makes things more obscure and
>> complicated without any real benefit.
>
> I'm open to other ideas, but you'll have to come up with a way
> to take the kvm->lock before the matrix_mdev->lock in the
> vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
> functions where we don't have access to the ap_matrix_mdev
> object to which the APQN is assigned and has the pointer to the
> kvm object.
>
> In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
> In order to hot plug/unplug the queue, we need the kvm->lock.
> There's your catch-22 that needs to be solved. This design is my
> attempt to solve that.
>
>>
>> Regards,
>> Halil
>>
>>>    */
>>>   struct ap_matrix_dev {
>>>       struct device device;
>>> @@ -47,6 +55,8 @@ struct ap_matrix_dev {
>>>       struct list_head mdev_list;
>>>       struct mutex lock;
>>>       struct ap_driver  *vfio_ap_drv;
>>> +    struct rw_semaphore guests_lock;
>>> +    struct list_head guests;
>>>   };
>>>     extern struct ap_matrix_dev *matrix_dev;
>
Halil Pasic Jan. 12, 2022, 2:25 p.m. UTC | #6
On Tue, 11 Jan 2022 16:58:13 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 12/29/21 22:33, Halil Pasic wrote:
> > On Thu, 21 Oct 2021 11:23:25 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> The vfio_ap device driver registers for notification when the pointer to
> >> the KVM object for a guest is set. Let's store the KVM pointer as well as
> >> the pointer to the mediated device when the KVM pointer is set.  
> > [..]
> >
> >  
> >> struct ap_matrix_dev {
> >>          ...
> >>          struct rw_semaphore guests_lock;
> >>          struct list_head guests;
> >>         ...
> >> }
> >>
> >> The 'guests_lock' field is a r/w semaphore to control access to the
> >> 'guests' field. The 'guests' field is a list of ap_guest
> >> structures containing the KVM and matrix_mdev pointers for each active
> >> guest. An ap_guest structure will be stored into the list whenever the
> >> vfio_ap device driver is notified that the KVM pointer has been set and
> >> removed when notified that the KVM pointer has been cleared.
> >>  
> > Is this about the field or about the list including all the nodes? This
> > reads lie guests_lock only protects the head element, which makes no
> > sense to me. Because of how these lists work.  
> 
> It locks the list, I can rewrite the description.
> 
> >
> > The narrowest scope that could make sense is all the list_head stuff
> > in the entire list. I.e. one would only need the lock to traverse or
> > manipulate the list, while the payload would still be subject to
> > the matrix_dev->lock mutex.  
> 
> The matrix_dev->guests lock is needed whenever the kvm->lock
> is needed because the struct ap_guest object is created and the
> struct kvm assigned to it when the kvm pointer is set
> (vfio_ap_mdev_set_kvm function). 

Yes, reading the code, my impression was that this is more about
ap_guest.kvm than about the list.

My understanding is that struct ap_guest is basically about the
marriage between a matrix_mdev and a kvm. Basically, a link between the
two.

But then, it probably does not make sense for this link to outlive
either kvm or matrix_mdev.

Thus I don't quite understand why we need the extra allocation. If
we want a list, why don't we just use pointers to matrix_mdev?

We could still protect that stuff with a separate lock.

> So, in order to access the
> ap_guest object and retrieve the kvm pointer, we have to ensure
> the ap_guest_object is still available. The fact we can get the
> kvm pointer from the ap_matrix_mdev object just makes things
> more efficient - i.e., we won't have to traverse the list.

Well, if the guests_lock is only protecting the list, then that should
not be true. In that case, you can only be sure about the nodes that you
reached by traversing the list with the lock held. Right?

If only the list is protected, then one could do

down_write(guests_lock)
list_del(element)
up_write(guests_lock)
fancy_free(element)


> 
> Whenever the kvm->lock and matrix_dev->lock mutexes must
> be held, the order is:
> 
>      matrix_dev->guests_lock
>      matrix_dev->guests->kvm->lock
>      matrix_dev->lock
> 
> There are times where all three locks are not required; for example,
> the handle_pqap and vfio_ap_mdev_probe/remove functions only
> require the matrix_dev->lock because it does not need to lock kvm.
> 

Yeah, that is what gets rid of the circular lock dependency. If we had
to take the guests_lock there, we would have the guests_lock in the same
role that matrix_dev->lock had before.

But the thing is you do 
kvm = q->matrix_mdev->guest->kvm;
in the pqap_handler (more precisely in a function called by it).

So you do access the struct ap_guest object and its kvm member
without the guests_lock being held. That is where things become very
muddy to me.

It looks to me that the kvm pointer is changed with both the
guests_lock and the matrix_dev->lock held in write mode. And accessing
such stuff read only is safe with either of the two locks held.

Thus I do believe that the general idea is viable. I've pointed that out
in a later email.

But the information you give the unsuspecting reader to aid him in
understanding our new locking scheme is severely lacking.
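
To spell out the invariant I think is intended (my reading, not
something the patch states):

/* Writer: the kvm pointer is only changed with both locks held: */
down_write(&matrix_dev->guests_lock);
mutex_lock(&matrix_dev->lock);
guest->kvm = kvm;       /* or NULL when the pointer is cleared */
mutex_unlock(&matrix_dev->lock);
up_write(&matrix_dev->guests_lock);

/*
 * Reader: holding either lock alone is then enough for a read-only
 * access, e.g. in the pqap handler, which already holds matrix_dev->lock:
 */
mutex_lock(&matrix_dev->lock);
kvm = q->matrix_mdev->guest->kvm;
mutex_unlock(&matrix_dev->lock);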

> >
> > [..]
> >  
> >> +struct ap_guest {
> >> +	struct kvm *kvm;
> >> +	struct list_head node;
> >> +};
> >> +
> >>   /**
> >>    * struct ap_matrix_dev - Contains the data for the matrix device.
> >>    *
> >> @@ -39,6 +44,9 @@
> >>    *		single ap_matrix_mdev device. It's quite coarse but we don't
> >>    *		expect much contention.
> >>    * @vfio_ap_drv: the vfio_ap device driver
> >> + * @guests_lock: r/w semaphore for protecting access to @guests
> >> + * @guests:	list of guests (struct ap_guest) using AP devices bound to the
> >> + *		vfio_ap device driver.  
> > Please compare the above. Also if it is only about the access to the
> > list, then you could drop the lock right after create, and not keep it
> > till the very end of vfio_ap_mdev_set_kvm(). Right?  
> 
> That would be true if it only controlled access to the list, but as I
> explained above, that is not its sole purpose.

Well, but guests is a member of struct ap_matrix_dev and not the whole
list including all the nodes.

> 
> >
> > In any case I'm skeptical about this whole struct ap_guest business. To
> > me, it looks like something that just makes things more obscure and
> > complicated without any real benefit.  
> 
> I'm open to other ideas, but you'll have to come up with a way
> to take the kvm->lock before the matrix_mdev->lock in the
> vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
> functions where we don't have access to the ap_matrix_mdev
> object to which the APQN is assigned and has the pointer to the
> kvm object.
> 
> In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
> In order to hot plug/unplug the queue, we need the kvm->lock.
> There's your catch-22 that needs to be solved. This design is my
> attempt to solve that.
> 

I agree that having a lock that we take before kvm->lock is taken,
and another one that we take with the kvm->lock taken is a good idea.

I was referring to having ap_guest objects which are separately
allocated, and have a decoupled lifecycle. Please see above!

Regards,
Halil
[..]
Anthony Krowiak Jan. 15, 2022, 12:29 a.m. UTC | #7
On 1/12/22 09:25, Halil Pasic wrote:
> On Tue, 11 Jan 2022 16:58:13 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 12/29/21 22:33, Halil Pasic wrote:
>>> On Thu, 21 Oct 2021 11:23:25 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> The vfio_ap device driver registers for notification when the pointer to
>>>> the KVM object for a guest is set. Let's store the KVM pointer as well as
>>>> the pointer to the mediated device when the KVM pointer is set.
>>> [..]
>>>
>>>   
>>>> struct ap_matrix_dev {
>>>>           ...
>>>>           struct rw_semaphore guests_lock;
>>>>           struct list_head guests;
>>>>          ...
>>>> }
>>>>
>>>> The 'guests_lock' field is a r/w semaphore to control access to the
>>>> 'guests' field. The 'guests' field is a list of ap_guest
>>>> structures containing the KVM and matrix_mdev pointers for each active
>>>> guest. An ap_guest structure will be stored into the list whenever the
>>>> vfio_ap device driver is notified that the KVM pointer has been set and
>>>> removed when notified that the KVM pointer has been cleared.
>>>>   
>>> Is this about the field or about the list including all the nodes? This
>>> reads lie guests_lock only protects the head element, which makes no
>>> sense to me. Because of how these lists work.
>> It locks the list, I can rewrite the description.
>>
>>> The narrowest scope that could make sense is all the list_head stuff
>>> in the entire list. I.e. one would only need the lock to traverse or
>>> manipulate the list, while the payload would still be subject to
>>> the matrix_dev->lock mutex.
>> The matrix_dev->guests lock is needed whenever the kvm->lock
>> is needed because the struct ap_guest object is created and the
>> struct kvm assigned to it when the kvm pointer is set
>> (vfio_ap_mdev_set_kvm function).
> Yes reading the code, my impression was, that this is more about the
> ap_guest.kvm that about the list.
>
> My understanding is that struct ap_gurest is basically about the
> marriage between a matrix_mdev and a kvm. Basically a link between the
> two.
>
> But then, it probably does not make a sense for this link to outlive
> either kvm or matrix_mdev.
>
> Thus I don't quite understand why do we need the extra allocation? If
> we want a list, why don't we just the pointers to matrix_mdev?
>
> We could still protect that stuff with a separate lock.

I think this may be a good idea. We already have a list of matrix_mdev
stored in matrix_dev. I'll explore this further.
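
Something along these lines, perhaps, reusing the existing mdev_list and
keeping only the rwsem (very rough; it assumes m->kvm is only changed
with that rwsem held for write, so reading it under the read lock is
safe):

struct ap_matrix_mdev *m;

down_read(&matrix_dev->guests_lock);
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
        if (!m->kvm)
                continue;

        mutex_lock(&m->kvm->lock);
        mutex_lock(&matrix_dev->lock);
        /* hot (un)plug the queue if the APQN is assigned to m */
        mutex_unlock(&matrix_dev->lock);
        mutex_unlock(&m->kvm->lock);
}
up_read(&matrix_dev->guests_lock);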

>
>> So, in order to access the
>> ap_guest object and retrieve the kvm pointer, we have to ensure
>> the ap_guest_object is still available. The fact we can get the
>> kvm pointer from the ap_matrix_mdev object just makes things
>> more efficient - i.e., we won't have to traverse the list.
> Well if the guests_lock is only protecting the list, then that should not
> be true. In that case, you can be only sure about the nodes that you
> reached by traversing the list with he lock held. Right.
>
> If only the list is protected, then one could do
>
> down_write(guests_lock)
> list_del(element)
> up_write(guests_lock)
> fancy_free(element)
>
>
>> Whenever the kvm->lock and matrix_dev->lock mutexes must
>> be held, the order is:
>>
>>       matrix_dev->guests_lock
>>       matrix_dev->guests->kvm->lock
>>       matrix_dev->lock
>>
>> There are times where all three locks are not required; for example,
>> the handle_pqap and vfio_ap_mdev_probe/remove functions only
>> require the matrix_dev->lock because it does not need to lock kvm.
>>
> Yeah, that is what gets rid of the circular lock dependency. If we had
> to take guests_lock there we would have guests_lock in the same role
> as matrix_dev->lock before.
>
> But the thing is you do
> kvm = q->matrix_mdev->guest->kvm;
> in the pqap_handler (more precisely in a function called by it).
>
> So you do access the struct ap_guest object and its kvm member
> without the guests_lock being held. That is where things become very
> muddy to me.

I was thinking about this the other day: the kvm pointer is
needed when the IRQ is disabled, to clean up the gisa-related state and
the pinned memory. I'm going to revisit this.

>
> It looks to me that the kvm pointer is changed with both the
> guests_lock and the matrix_dev->lock held in write mode. And accessing
> such stuff read only is safe with either of the two locks held.
>
> Thus I do believe that the general idea is viable. I've pointed that out
> in a later email.
>
> But the information you give the unsuspecting reader to aid him in
> understanding our new locking scheme is severely lacking.

I'll try to clear up the patch description.

>
>>> [..]
>>>   
>>>> +struct ap_guest {
>>>> +	struct kvm *kvm;
>>>> +	struct list_head node;
>>>> +};
>>>> +
>>>>    /**
>>>>     * struct ap_matrix_dev - Contains the data for the matrix device.
>>>>     *
>>>> @@ -39,6 +44,9 @@
>>>>     *		single ap_matrix_mdev device. It's quite coarse but we don't
>>>>     *		expect much contention.
>>>>     * @vfio_ap_drv: the vfio_ap device driver
>>>> + * @guests_lock: r/w semaphore for protecting access to @guests
>>>> + * @guests:	list of guests (struct ap_guest) using AP devices bound to the
>>>> + *		vfio_ap device driver.
>>> Please compare the above. Also if it is only about the access to the
>>> list, then you could drop the lock right after create, and not keep it
>>> till the very end of vfio_ap_mdev_set_kvm(). Right?
>> That would be true if it only controlled access to the list, but as I
>> explained above, that is not its sole purpose.
> Well, but guests is a member of struct ap_matrix_dev and not the whole
> list including all the nodes.
>
>>> In any case I'm skeptical about this whole struct ap_guest business. To
>>> me, it looks like something that just makes things more obscure and
>>> complicated without any real benefit.
>> I'm open to other ideas, but you'll have to come up with a way
>> to take the kvm->lock before the matrix_mdev->lock in the
>> vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
>> functions where we don't have access to the ap_matrix_mdev
>> object to which the APQN is assigned and has the pointer to the
>> kvm object.
>>
>> In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
>> In order to hot plug/unplug the queue, we need the kvm->lock.
>> There's your catch-22 that needs to be solved. This design is my
>> attempt to solve that.
>>
> I agree that having a lock that we take before kvm->lock is taken,
> and another one that we take with the kvm->lock taken is a good idea.
>
> I was referring to having ap_guest objects which are separately
> allocated, and have a decoupled lifecycle. Please see above!

I'm thinking about looking into getting rid of the struct ap_guest and
the guests list as I said above. I think I can rework this.

>
> Regards,
> Halil
> [..]

Patch

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 5255e338591d..1d1746fe50ea 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -98,6 +98,8 @@  static int vfio_ap_matrix_dev_create(void)
 
 	mutex_init(&matrix_dev->lock);
 	INIT_LIST_HEAD(&matrix_dev->mdev_list);
+	init_rwsem(&matrix_dev->guests_lock);
+	INIT_LIST_HEAD(&matrix_dev->guests);
 
 	dev_set_name(&matrix_dev->device, "%s", VFIO_AP_DEV_NAME);
 	matrix_dev->device.parent = root_device;
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6b40db6dab3c..a2875cf79091 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1086,6 +1086,20 @@  static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 	NULL
 };
 
+static int vfio_ap_mdev_create_guest(struct kvm *kvm,
+				     struct ap_matrix_mdev *matrix_mdev)
+{
+	struct ap_guest *guest;
+
+	guest = kzalloc(sizeof(*guest), GFP_KERNEL);
+	if (!guest)
+		return -ENOMEM;
+
+	list_add(&guest->node, &matrix_dev->guests);
+
+	return 0;
+}
+
 /**
  * vfio_ap_mdev_set_kvm - sets all data for @matrix_mdev that are needed
  * to manage AP resources for the guest whose state is represented by @kvm
@@ -1106,16 +1120,23 @@  static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 				struct kvm *kvm)
 {
+	int ret;
 	struct ap_matrix_mdev *m;
 
 	if (kvm->arch.crypto.crycbd) {
+		down_write(&matrix_dev->guests_lock);
+		ret = vfio_ap_mdev_create_guest(kvm, matrix_mdev);
+		if (WARN_ON(ret))
+			return ret;
+
 		mutex_lock(&kvm->lock);
 		mutex_lock(&matrix_dev->lock);
 
 		list_for_each_entry(m, &matrix_dev->mdev_list, node) {
 			if (m != matrix_mdev && m->kvm == kvm) {
-				mutex_unlock(&kvm->lock);
 				mutex_unlock(&matrix_dev->lock);
+				mutex_unlock(&kvm->lock);
+				up_write(&matrix_dev->guests_lock);
 				return -EPERM;
 			}
 		}
@@ -1127,8 +1148,9 @@  static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 					  matrix_mdev->shadow_apcb.aqm,
 					  matrix_mdev->shadow_apcb.adm);
 
-		mutex_unlock(&kvm->lock);
 		mutex_unlock(&matrix_dev->lock);
+		mutex_unlock(&kvm->lock);
+		up_write(&matrix_dev->guests_lock);
 	}
 
 	return 0;
@@ -1164,6 +1186,18 @@  static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
+static void vfio_ap_mdev_remove_guest(struct kvm *kvm)
+{
+	struct ap_guest *guest, *tmp;
+
+	list_for_each_entry_safe(guest, tmp, &matrix_dev->guests, node) {
+		if (guest->kvm == kvm) {
+			list_del(&guest->node);
+			break;
+		}
+	}
+}
+
 /**
  * vfio_ap_mdev_unset_kvm - performs clean-up of resources no longer needed
  * by @matrix_mdev.
@@ -1182,6 +1216,9 @@  static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
 				   struct kvm *kvm)
 {
 	if (kvm && kvm->arch.crypto.crycbd) {
+		down_write(&matrix_dev->guests_lock);
+		vfio_ap_mdev_remove_guest(kvm);
+
 		mutex_lock(&kvm->lock);
 		mutex_lock(&matrix_dev->lock);
 
@@ -1191,8 +1228,9 @@  static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
 		kvm->arch.crypto.data = NULL;
 		matrix_mdev->kvm = NULL;
 
-		mutex_unlock(&kvm->lock);
 		mutex_unlock(&matrix_dev->lock);
+		mutex_unlock(&kvm->lock);
+		up_write(&matrix_dev->guests_lock);
 	}
 }
 
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 6dc0ebbf7a06..6d28b287d7bf 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -26,6 +26,11 @@ 
 #define VFIO_AP_MODULE_NAME "vfio_ap"
 #define VFIO_AP_DRV_NAME "vfio_ap"
 
+struct ap_guest {
+	struct kvm *kvm;
+	struct list_head node;
+};
+
 /**
  * struct ap_matrix_dev - Contains the data for the matrix device.
  *
@@ -39,6 +44,9 @@ 
  *		single ap_matrix_mdev device. It's quite coarse but we don't
  *		expect much contention.
  * @vfio_ap_drv: the vfio_ap device driver
+ * @guests_lock: r/w semaphore for protecting access to @guests
+ * @guests:	list of guests (struct ap_guest) using AP devices bound to the
+ *		vfio_ap device driver.
  */
 struct ap_matrix_dev {
 	struct device device;
@@ -47,6 +55,8 @@  struct ap_matrix_dev {
 	struct list_head mdev_list;
 	struct mutex lock;
 	struct ap_driver  *vfio_ap_drv;
+	struct rw_semaphore guests_lock;
+	struct list_head guests;
 };
 
 extern struct ap_matrix_dev *matrix_dev;