Message ID | 20190130132212.7376-2-cohuck@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio-ccw: support hsch/csch (kernel part) | expand |
On Wed, 30 Jan 2019 14:22:07 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > When we get a solicited interrupt, the start function may have > been cleared by a csch, but we still have a channel program > structure allocated. Make it safe to call the cp accessors in > any case, so we can call them unconditionally. I read this like it is supposed to be safe regardless of parallelism and threads. However I don't see any explicit synchronization done for cp->initialized. I've managed to figure out how is that supposed to be safe for the cp_free() (which is probably our main concern) in vfio_ccw_sch_io_todo(), but if fail when it comes to the one in vfio_ccw_mdev_notifier(). Can you explain us how does the synchronization work? Regards, Halil
On Wed, 30 Jan 2019 19:51:27 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Wed, 30 Jan 2019 14:22:07 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > When we get a solicited interrupt, the start function may have > > been cleared by a csch, but we still have a channel program > > structure allocated. Make it safe to call the cp accessors in > > any case, so we can call them unconditionally. > > I read this like it is supposed to be safe regardless of > parallelism and threads. However I don't see any explicit > synchronization done for cp->initialized. > > I've managed to figure out how is that supposed to be safe > for the cp_free() (which is probably our main concern) in > vfio_ccw_sch_io_todo(), but if fail when it comes to the one > in vfio_ccw_mdev_notifier(). > > Can you explain us how does the synchronization work? You read that wrong, I don't add synchronization, I just add a check.
On Thu, 31 Jan 2019 12:52:20 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Wed, 30 Jan 2019 19:51:27 +0100 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > On Wed, 30 Jan 2019 14:22:07 +0100 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > When we get a solicited interrupt, the start function may have > > > been cleared by a csch, but we still have a channel program > > > structure allocated. Make it safe to call the cp accessors in > > > any case, so we can call them unconditionally. > > > > I read this like it is supposed to be safe regardless of > > parallelism and threads. However I don't see any explicit > > synchronization done for cp->initialized. > > > > I've managed to figure out how is that supposed to be safe > > for the cp_free() (which is probably our main concern) in > > vfio_ccw_sch_io_todo(), but if fail when it comes to the one > > in vfio_ccw_mdev_notifier(). > > > > Can you explain us how does the synchronization work? > > You read that wrong, I don't add synchronization, I just add a check. > Now I'm confused. Does that mean we don't need synchronization for this? Regards, Halil
On Thu, 31 Jan 2019 13:34:55 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Thu, 31 Jan 2019 12:52:20 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > On Wed, 30 Jan 2019 19:51:27 +0100 > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > On Wed, 30 Jan 2019 14:22:07 +0100 > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > When we get a solicited interrupt, the start function may have > > > > been cleared by a csch, but we still have a channel program > > > > structure allocated. Make it safe to call the cp accessors in > > > > any case, so we can call them unconditionally. > > > > > > I read this like it is supposed to be safe regardless of > > > parallelism and threads. However I don't see any explicit > > > synchronization done for cp->initialized. > > > > > > I've managed to figure out how is that supposed to be safe > > > for the cp_free() (which is probably our main concern) in > > > vfio_ccw_sch_io_todo(), but if fail when it comes to the one > > > in vfio_ccw_mdev_notifier(). > > > > > > Can you explain us how does the synchronization work? > > > > You read that wrong, I don't add synchronization, I just add a check. > > > > Now I'm confused. Does that mean we don't need synchronization for this? If we lack synchronization (that is not provided by the current state machine handling, or the rework here), we should do a patch on top (preferably on top of the whole series, so this does not get even more tangled up.) This is really just about the extra check.
On 01/30/2019 08:22 AM, Cornelia Huck wrote: > When we get a solicited interrupt, the start function may have > been cleared by a csch, but we still have a channel program > structure allocated. Make it safe to call the cp accessors in > any case, so we can call them unconditionally. > > While at it, also make sure that functions called from other parts > of the code return gracefully if the channel program structure > has not been initialized (even though that is a bug in the caller). > > Signed-off-by: Cornelia Huck <cohuck@redhat.com> > --- > drivers/s390/cio/vfio_ccw_cp.c | 20 +++++++++++++++++++- > drivers/s390/cio/vfio_ccw_cp.h | 2 ++ > drivers/s390/cio/vfio_ccw_fsm.c | 5 +++++ > 3 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c > index ba08fe137c2e..0bc0c38edda7 100644 > --- a/drivers/s390/cio/vfio_ccw_cp.c > +++ b/drivers/s390/cio/vfio_ccw_cp.c > @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp) > struct ccwchain *chain, *temp; > int i; > > + cp->initialized = false; > list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) { > for (i = 0; i < chain->ch_len; i++) { > pfn_array_table_unpin_free(chain->ch_pat + i, > @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) > */ > cp->orb.cmd.c64 = 1; > > + cp->initialized = true; > + Not seen in this hunk, but we call ccwchain_loop_tic() just prior to this point. If that returns non-zero, we call cp_unpin_free()[1] (and set initailized to false), and then fall through to here. So this is going to set initialized to true, even though we're taking an error path. :-( [1] Wait, why is it calling cp_unpin_free()? Oh, I had proposed squashing cp_free() and cp_unpin_free() back in November[2], got an r-b from Pierre but haven't gotten back to tidy up the series for a v2. Okay, I'll try to do that again soon. :-) [2] https://patchwork.kernel.org/patch/10675261/ > return ret; > } > > @@ -715,7 +718,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) > */ > void cp_free(struct channel_program *cp) > { > - cp_unpin_free(cp); > + if (cp->initialized) > + cp_unpin_free(cp); > } > > /** > @@ -760,6 +764,10 @@ int cp_prefetch(struct channel_program *cp) > struct ccwchain *chain; > int len, idx, ret; > > + /* this is an error in the caller */ > + if (!cp || !cp->initialized) > + return -EINVAL; > + > list_for_each_entry(chain, &cp->ccwchain_list, next) { > len = chain->ch_len; > for (idx = 0; idx < len; idx++) { > @@ -795,6 +803,10 @@ union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm) > struct ccwchain *chain; > struct ccw1 *cpa; > > + /* this is an error in the caller */ > + if (!cp || !cp->initialized) > + return NULL; > + > orb = &cp->orb; > > orb->cmd.intparm = intparm; > @@ -831,6 +843,9 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw) > u32 cpa = scsw->cmd.cpa; > u32 ccw_head, ccw_tail; > > + if (!cp->initialized) > + return; > + > /* > * LATER: > * For now, only update the cmd.cpa part. We may need to deal with > @@ -869,6 +884,9 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova) > struct ccwchain *chain; > int i; > > + if (!cp->initialized) So, two of the checks added above look for a nonzero cp pointer prior to checking initialized, while two don't. I guess cp can't be NULL, since it's embedded in the private struct directly and that's only free'd when we do vfio_ccw_sch_remove() ... But I guess some consistency in how we look would be nice. > + return false; > + > list_for_each_entry(chain, &cp->ccwchain_list, next) { > for (i = 0; i < chain->ch_len; i++) > if (pfn_array_table_iova_pinned(chain->ch_pat + i, > diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h > index a4b74fb1aa57..3c20cd208da5 100644 > --- a/drivers/s390/cio/vfio_ccw_cp.h > +++ b/drivers/s390/cio/vfio_ccw_cp.h > @@ -21,6 +21,7 @@ > * @ccwchain_list: list head of ccwchains > * @orb: orb for the currently processed ssch request > * @mdev: the mediated device to perform page pinning/unpinning > + * @initialized: whether this instance is actually initialized > * > * @ccwchain_list is the head of a ccwchain list, that contents the > * translated result of the guest channel program that pointed out by > @@ -30,6 +31,7 @@ struct channel_program { > struct list_head ccwchain_list; > union orb orb; > struct device *mdev; > + bool initialized; > }; > > extern int cp_init(struct channel_program *cp, struct device *mdev, > diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c > index cab17865aafe..e7c9877c9f1e 100644 > --- a/drivers/s390/cio/vfio_ccw_fsm.c > +++ b/drivers/s390/cio/vfio_ccw_fsm.c > @@ -31,6 +31,10 @@ static int fsm_io_helper(struct vfio_ccw_private *private) > private->state = VFIO_CCW_STATE_BUSY; > > orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm); > + if (!orb) { > + ret = -EIO; > + goto out; > + } > > /* Issue "Start Subchannel" */ > ccode = ssch(sch->schid, orb); > @@ -64,6 +68,7 @@ static int fsm_io_helper(struct vfio_ccw_private *private) > default: > ret = ccode; > } > +out: > spin_unlock_irqrestore(sch->lock, flags); > return ret; > } >
On Mon, 4 Feb 2019 16:31:02 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Thu, 31 Jan 2019 13:34:55 +0100 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > On Thu, 31 Jan 2019 12:52:20 +0100 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > On Wed, 30 Jan 2019 19:51:27 +0100 > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > > On Wed, 30 Jan 2019 14:22:07 +0100 > > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > > > When we get a solicited interrupt, the start function may have > > > > > been cleared by a csch, but we still have a channel program > > > > > structure allocated. Make it safe to call the cp accessors in > > > > > any case, so we can call them unconditionally. > > > > > > > > I read this like it is supposed to be safe regardless of > > > > parallelism and threads. However I don't see any explicit > > > > synchronization done for cp->initialized. > > > > > > > > I've managed to figure out how is that supposed to be safe > > > > for the cp_free() (which is probably our main concern) in > > > > vfio_ccw_sch_io_todo(), but if fail when it comes to the one > > > > in vfio_ccw_mdev_notifier(). > > > > > > > > Can you explain us how does the synchronization work? > > > > > > You read that wrong, I don't add synchronization, I just add a check. > > > > > > > Now I'm confused. Does that mean we don't need synchronization for this? > > If we lack synchronization (that is not provided by the current state > machine handling, or the rework here), we should do a patch on top > (preferably on top of the whole series, so this does not get even more > tangled up.) This is really just about the extra check. > I'm not a huge fan of keeping or introducing races -- it makes things difficult to reason about, but I do have some understanging your position. This patch-series is AFAICT a big improvement over what we have. I would like Farhan confirming that it makes these hick-ups when he used to hit BUSY with another ssch request disappear. If it does (I hope it does) it's definitely a good thing for anybody who wants to use vfio-ccw. Yet I find it difficult to slap my r-b over racy code, or partial solutions. In the latter case, when I lack conceptual clarity, I find it difficult to tell if we are heading into the right direction, or is what we build today going to turn against us tomorrow. Sorry for being a drag. Regards, Halil
On Mon, 4 Feb 2019 14:25:34 -0500 Eric Farman <farman@linux.ibm.com> wrote: > On 01/30/2019 08:22 AM, Cornelia Huck wrote: > > When we get a solicited interrupt, the start function may have > > been cleared by a csch, but we still have a channel program > > structure allocated. Make it safe to call the cp accessors in > > any case, so we can call them unconditionally. > > > > While at it, also make sure that functions called from other parts > > of the code return gracefully if the channel program structure > > has not been initialized (even though that is a bug in the caller). > > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com> > > --- > > drivers/s390/cio/vfio_ccw_cp.c | 20 +++++++++++++++++++- > > drivers/s390/cio/vfio_ccw_cp.h | 2 ++ > > drivers/s390/cio/vfio_ccw_fsm.c | 5 +++++ > > 3 files changed, 26 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c > > index ba08fe137c2e..0bc0c38edda7 100644 > > --- a/drivers/s390/cio/vfio_ccw_cp.c > > +++ b/drivers/s390/cio/vfio_ccw_cp.c > > @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp) > > struct ccwchain *chain, *temp; > > int i; > > > > + cp->initialized = false; > > list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) { > > for (i = 0; i < chain->ch_len; i++) { > > pfn_array_table_unpin_free(chain->ch_pat + i, > > @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) > > */ > > cp->orb.cmd.c64 = 1; > > > > + cp->initialized = true; > > + > > Not seen in this hunk, but we call ccwchain_loop_tic() just prior to > this point. If that returns non-zero, we call cp_unpin_free()[1] (and > set initailized to false), and then fall through to here. So this is > going to set initialized to true, even though we're taking an error > path. :-( Eek, setting c64 unconditionally threw me off. This needs to check for !ret, of course. > > [1] Wait, why is it calling cp_unpin_free()? Oh, I had proposed > squashing cp_free() and cp_unpin_free() back in November[2], got an r-b > from Pierre but haven't gotten back to tidy up the series for a v2. > Okay, I'll try to do that again soon. :-) :) > [2] https://patchwork.kernel.org/patch/10675261/ > > > return ret; > > } > > > > @@ -715,7 +718,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) > > */ > > void cp_free(struct channel_program *cp) > > { > > - cp_unpin_free(cp); > > + if (cp->initialized) > > + cp_unpin_free(cp); > > } > > > > /** > > @@ -760,6 +764,10 @@ int cp_prefetch(struct channel_program *cp) > > struct ccwchain *chain; > > int len, idx, ret; > > > > + /* this is an error in the caller */ > > + if (!cp || !cp->initialized) > > + return -EINVAL; > > + > > list_for_each_entry(chain, &cp->ccwchain_list, next) { > > len = chain->ch_len; > > for (idx = 0; idx < len; idx++) { > > @@ -795,6 +803,10 @@ union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm) > > struct ccwchain *chain; > > struct ccw1 *cpa; > > > > + /* this is an error in the caller */ > > + if (!cp || !cp->initialized) > > + return NULL; > > + > > orb = &cp->orb; > > > > orb->cmd.intparm = intparm; > > @@ -831,6 +843,9 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw) > > u32 cpa = scsw->cmd.cpa; > > u32 ccw_head, ccw_tail; > > > > + if (!cp->initialized) > > + return; > > + > > /* > > * LATER: > > * For now, only update the cmd.cpa part. We may need to deal with > > @@ -869,6 +884,9 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova) > > struct ccwchain *chain; > > int i; > > > > + if (!cp->initialized) > > So, two of the checks added above look for a nonzero cp pointer prior to > checking initialized, while two don't. I guess cp can't be NULL, since > it's embedded in the private struct directly and that's only free'd when > we do vfio_ccw_sch_remove() ... But I guess some consistency in how we > look would be nice. The idea was: In which context is this called? Is there a legitimate reason for the caller to pass in an uninitialized cp, or would that mean the caller had messed up (and we should not trust cp to be !NULL either?) But you're right, that does look inconsistent. Always checking for cp != NULL probably looks least odd, although it is overkill. Opinions? > > > + return false; > > + > > list_for_each_entry(chain, &cp->ccwchain_list, next) { > > for (i = 0; i < chain->ch_len; i++) > > if (pfn_array_table_iova_pinned(chain->ch_pat + i,
On Tue, 5 Feb 2019 12:52:29 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Mon, 4 Feb 2019 16:31:02 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > On Thu, 31 Jan 2019 13:34:55 +0100 > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > On Thu, 31 Jan 2019 12:52:20 +0100 > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > On Wed, 30 Jan 2019 19:51:27 +0100 > > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > > > > On Wed, 30 Jan 2019 14:22:07 +0100 > > > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > > > > > When we get a solicited interrupt, the start function may have > > > > > > been cleared by a csch, but we still have a channel program > > > > > > structure allocated. Make it safe to call the cp accessors in > > > > > > any case, so we can call them unconditionally. > > > > > > > > > > I read this like it is supposed to be safe regardless of > > > > > parallelism and threads. However I don't see any explicit > > > > > synchronization done for cp->initialized. > > > > > > > > > > I've managed to figure out how is that supposed to be safe > > > > > for the cp_free() (which is probably our main concern) in > > > > > vfio_ccw_sch_io_todo(), but if fail when it comes to the one > > > > > in vfio_ccw_mdev_notifier(). > > > > > > > > > > Can you explain us how does the synchronization work? > > > > > > > > You read that wrong, I don't add synchronization, I just add a check. > > > > > > > > > > Now I'm confused. Does that mean we don't need synchronization for this? > > > > If we lack synchronization (that is not provided by the current state > > machine handling, or the rework here), we should do a patch on top > > (preferably on top of the whole series, so this does not get even more > > tangled up.) This is really just about the extra check. > > > > I'm not a huge fan of keeping or introducing races -- it makes things > difficult to reason about, but I do have some understanging your > position. The only thing I want to avoid is knowingly making things worse than before, and I don't think this patch does that. > > This patch-series is AFAICT a big improvement over what we have. I would > like Farhan confirming that it makes these hick-ups when he used to hit > BUSY with another ssch request disappear. If it does (I hope it does) > it's definitely a good thing for anybody who wants to use vfio-ccw. Yep. There remains a lot to be done, but it's a first step. > > Yet I find it difficult to slap my r-b over racy code, or partial > solutions. In the latter case, when I lack conceptual clarity, I find it > difficult to tell if we are heading into the right direction, or is what > we build today going to turn against us tomorrow. Sorry for being a drag. As long as we don't introduce bad user space interfaces we have to drag around forever, I think anything is fair game if we think it's a good idea at that moment. We can rewrite things if it turned out to be a bad idea (although I'm not arguing for doing random crap, of course :)
On 02/05/2019 07:03 AM, Cornelia Huck wrote: > On Mon, 4 Feb 2019 14:25:34 -0500 > Eric Farman <farman@linux.ibm.com> wrote: > >> On 01/30/2019 08:22 AM, Cornelia Huck wrote: >>> When we get a solicited interrupt, the start function may have >>> been cleared by a csch, but we still have a channel program >>> structure allocated. Make it safe to call the cp accessors in >>> any case, so we can call them unconditionally. >>> >>> While at it, also make sure that functions called from other parts >>> of the code return gracefully if the channel program structure >>> has not been initialized (even though that is a bug in the caller). >>> >>> Signed-off-by: Cornelia Huck <cohuck@redhat.com> >>> --- >>> drivers/s390/cio/vfio_ccw_cp.c | 20 +++++++++++++++++++- >>> drivers/s390/cio/vfio_ccw_cp.h | 2 ++ >>> drivers/s390/cio/vfio_ccw_fsm.c | 5 +++++ >>> 3 files changed, 26 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c >>> index ba08fe137c2e..0bc0c38edda7 100644 >>> --- a/drivers/s390/cio/vfio_ccw_cp.c >>> +++ b/drivers/s390/cio/vfio_ccw_cp.c >>> @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp) >>> struct ccwchain *chain, *temp; >>> int i; >>> >>> + cp->initialized = false; >>> list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) { >>> for (i = 0; i < chain->ch_len; i++) { >>> pfn_array_table_unpin_free(chain->ch_pat + i, >>> @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) >>> */ >>> cp->orb.cmd.c64 = 1; >>> >>> + cp->initialized = true; >>> + >> >> Not seen in this hunk, but we call ccwchain_loop_tic() just prior to >> this point. If that returns non-zero, we call cp_unpin_free()[1] (and >> set initailized to false), and then fall through to here. So this is >> going to set initialized to true, even though we're taking an error >> path. :-( > > Eek, setting c64 unconditionally threw me off. This needs to check > for !ret, of course. > >> >> [1] Wait, why is it calling cp_unpin_free()? Oh, I had proposed >> squashing cp_free() and cp_unpin_free() back in November[2], got an r-b >> from Pierre but haven't gotten back to tidy up the series for a v2. >> Okay, I'll try to do that again soon. :-) > > :) > >> [2] https://patchwork.kernel.org/patch/10675261/ >> >>> return ret; >>> } >>> >>> @@ -715,7 +718,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) >>> */ >>> void cp_free(struct channel_program *cp) >>> { >>> - cp_unpin_free(cp); >>> + if (cp->initialized) >>> + cp_unpin_free(cp); >>> } >>> >>> /** >>> @@ -760,6 +764,10 @@ int cp_prefetch(struct channel_program *cp) >>> struct ccwchain *chain; >>> int len, idx, ret; >>> >>> + /* this is an error in the caller */ >>> + if (!cp || !cp->initialized) >>> + return -EINVAL; >>> + >>> list_for_each_entry(chain, &cp->ccwchain_list, next) { >>> len = chain->ch_len; >>> for (idx = 0; idx < len; idx++) { >>> @@ -795,6 +803,10 @@ union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm) >>> struct ccwchain *chain; >>> struct ccw1 *cpa; >>> >>> + /* this is an error in the caller */ >>> + if (!cp || !cp->initialized) >>> + return NULL; >>> + >>> orb = &cp->orb; >>> >>> orb->cmd.intparm = intparm; >>> @@ -831,6 +843,9 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw) >>> u32 cpa = scsw->cmd.cpa; >>> u32 ccw_head, ccw_tail; >>> >>> + if (!cp->initialized) >>> + return; >>> + >>> /* >>> * LATER: >>> * For now, only update the cmd.cpa part. We may need to deal with >>> @@ -869,6 +884,9 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova) >>> struct ccwchain *chain; >>> int i; >>> >>> + if (!cp->initialized) >> >> So, two of the checks added above look for a nonzero cp pointer prior to >> checking initialized, while two don't. I guess cp can't be NULL, since >> it's embedded in the private struct directly and that's only free'd when >> we do vfio_ccw_sch_remove() ... But I guess some consistency in how we >> look would be nice. > > The idea was: In which context is this called? Is there a legitimate > reason for the caller to pass in an uninitialized cp, or would that > mean the caller had messed up (and we should not trust cp to be !NULL > either?) > > But you're right, that does look inconsistent. Always checking for > cp != NULL probably looks least odd, although it is overkill. Opinions? My opinion? Since cp is embedded in vfio_ccw_private, rather than a pointer to a separately malloc'd struct, we pass &private->cp to those functions. So a check for !cp doesn't really buy us anything because what we are actually concerned about is whether or not private is NULL, which only changes on the probe/remove boundaries. > >> >>> + return false; >>> + >>> list_for_each_entry(chain, &cp->ccwchain_list, next) { >>> for (i = 0; i < chain->ch_len; i++) >>> if (pfn_array_table_iova_pinned(chain->ch_pat + i, >
On 02/05/2019 07:35 AM, Cornelia Huck wrote: > On Tue, 5 Feb 2019 12:52:29 +0100 > Halil Pasic <pasic@linux.ibm.com> wrote: > >> On Mon, 4 Feb 2019 16:31:02 +0100 >> Cornelia Huck <cohuck@redhat.com> wrote: >> >>> On Thu, 31 Jan 2019 13:34:55 +0100 >>> Halil Pasic <pasic@linux.ibm.com> wrote: >>> >>>> On Thu, 31 Jan 2019 12:52:20 +0100 >>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>> >>>>> On Wed, 30 Jan 2019 19:51:27 +0100 >>>>> Halil Pasic <pasic@linux.ibm.com> wrote: >>>>> >>>>>> On Wed, 30 Jan 2019 14:22:07 +0100 >>>>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>>>> >>>>>>> When we get a solicited interrupt, the start function may have >>>>>>> been cleared by a csch, but we still have a channel program >>>>>>> structure allocated. Make it safe to call the cp accessors in >>>>>>> any case, so we can call them unconditionally. >>>>>> >>>>>> I read this like it is supposed to be safe regardless of >>>>>> parallelism and threads. However I don't see any explicit >>>>>> synchronization done for cp->initialized. >>>>>> >>>>>> I've managed to figure out how is that supposed to be safe >>>>>> for the cp_free() (which is probably our main concern) in >>>>>> vfio_ccw_sch_io_todo(), but if fail when it comes to the one >>>>>> in vfio_ccw_mdev_notifier(). >>>>>> >>>>>> Can you explain us how does the synchronization work? >>>>> >>>>> You read that wrong, I don't add synchronization, I just add a check. >>>>> >>>> >>>> Now I'm confused. Does that mean we don't need synchronization for this? >>> >>> If we lack synchronization (that is not provided by the current state >>> machine handling, or the rework here), we should do a patch on top >>> (preferably on top of the whole series, so this does not get even more >>> tangled up.) This is really just about the extra check. >>> >> >> I'm not a huge fan of keeping or introducing races -- it makes things >> difficult to reason about, but I do have some understanging your >> position. > > The only thing I want to avoid is knowingly making things worse than > before, and I don't think this patch does that. > >> >> This patch-series is AFAICT a big improvement over what we have. I would >> like Farhan confirming that it makes these hick-ups when he used to hit >> BUSY with another ssch request disappear. If it does (I hope it does) >> it's definitely a good thing for anybody who wants to use vfio-ccw. > > Yep. There remains a lot to be done, but it's a first step. s/a first step/an excellent first step/ :) Can't speak for Farhan, but this makes things somewhat better for me. I'm still getting some periodic errors, but they happen infrequently enough now that debugging them is frustrating. ;-) - Eric > >> >> Yet I find it difficult to slap my r-b over racy code, or partial >> solutions. In the latter case, when I lack conceptual clarity, I find it >> difficult to tell if we are heading into the right direction, or is what >> we build today going to turn against us tomorrow. Sorry for being a drag. > > As long as we don't introduce bad user space interfaces we have to drag > around forever, I think anything is fair game if we think it's a good > idea at that moment. We can rewrite things if it turned out to be a bad > idea (although I'm not arguing for doing random crap, of course :) >
On 02/05/2019 09:48 AM, Eric Farman wrote: > > > On 02/05/2019 07:35 AM, Cornelia Huck wrote: >> On Tue, 5 Feb 2019 12:52:29 +0100 >> Halil Pasic <pasic@linux.ibm.com> wrote: >> >>> On Mon, 4 Feb 2019 16:31:02 +0100 >>> Cornelia Huck <cohuck@redhat.com> wrote: >>> >>>> On Thu, 31 Jan 2019 13:34:55 +0100 >>>> Halil Pasic <pasic@linux.ibm.com> wrote: >>>>> On Thu, 31 Jan 2019 12:52:20 +0100 >>>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>>>> On Wed, 30 Jan 2019 19:51:27 +0100 >>>>>> Halil Pasic <pasic@linux.ibm.com> wrote: >>>>>>> On Wed, 30 Jan 2019 14:22:07 +0100 >>>>>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>>>>>> When we get a solicited interrupt, the start function may have >>>>>>>> been cleared by a csch, but we still have a channel program >>>>>>>> structure allocated. Make it safe to call the cp accessors in >>>>>>>> any case, so we can call them unconditionally. >>>>>>> >>>>>>> I read this like it is supposed to be safe regardless of >>>>>>> parallelism and threads. However I don't see any explicit >>>>>>> synchronization done for cp->initialized. >>>>>>> >>>>>>> I've managed to figure out how is that supposed to be safe >>>>>>> for the cp_free() (which is probably our main concern) in >>>>>>> vfio_ccw_sch_io_todo(), but if fail when it comes to the one >>>>>>> in vfio_ccw_mdev_notifier(). >>>>>>> >>>>>>> Can you explain us how does the synchronization work? >>>>>> >>>>>> You read that wrong, I don't add synchronization, I just add a check. >>>>> >>>>> Now I'm confused. Does that mean we don't need synchronization for >>>>> this? >>>> >>>> If we lack synchronization (that is not provided by the current state >>>> machine handling, or the rework here), we should do a patch on top >>>> (preferably on top of the whole series, so this does not get even more >>>> tangled up.) This is really just about the extra check. >>> >>> I'm not a huge fan of keeping or introducing races -- it makes things >>> difficult to reason about, but I do have some understanging your >>> position. >> >> The only thing I want to avoid is knowingly making things worse than >> before, and I don't think this patch does that. >> >>> >>> This patch-series is AFAICT a big improvement over what we have. I would >>> like Farhan confirming that it makes these hick-ups when he used to hit >>> BUSY with another ssch request disappear. If it does (I hope it does) >>> it's definitely a good thing for anybody who wants to use vfio-ccw. >> >> Yep. There remains a lot to be done, but it's a first step. > > s/a first step/an excellent first step/ :) > > Can't speak for Farhan, but this makes things somewhat better for me. > I'm still getting some periodic errors, but they happen infrequently > enough now that debugging them is frustrating. ;-) > > - Eric > I ran the my workloads/tests with the patches and like Eric I notice the errors I previously hit less frequently. >> >>> >>> Yet I find it difficult to slap my r-b over racy code, or partial >>> solutions. In the latter case, when I lack conceptual clarity, I find it >>> difficult to tell if we are heading into the right direction, or is what >>> we build today going to turn against us tomorrow. Sorry for being a >>> drag. >> >> As long as we don't introduce bad user space interfaces we have to drag >> around forever, I think anything is fair game if we think it's a good >> idea at that moment. We can rewrite things if it turned out to be a bad >> idea (although I'm not arguing for doing random crap, of course :) >> >
On Tue, 5 Feb 2019 10:14:46 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > On 02/05/2019 09:48 AM, Eric Farman wrote: > >>> This patch-series is AFAICT a big improvement over what we have. I would > >>> like Farhan confirming that it makes these hick-ups when he used to hit > >>> BUSY with another ssch request disappear. If it does (I hope it does) > >>> it's definitely a good thing for anybody who wants to use vfio-ccw. > >> > >> Yep. There remains a lot to be done, but it's a first step. > > > > s/a first step/an excellent first step/ :) > > > > Can't speak for Farhan, but this makes things somewhat better for me. > > I'm still getting some periodic errors, but they happen infrequently > > enough now that debugging them is frustrating. ;-) > > > > - Eric > > > > I ran the my workloads/tests with the patches and like Eric I notice the > errors I previously hit less frequently. Great, thanks for testing!
On Tue, 5 Feb 2019 09:41:15 -0500 Eric Farman <farman@linux.ibm.com> wrote: > On 02/05/2019 07:03 AM, Cornelia Huck wrote: > > On Mon, 4 Feb 2019 14:25:34 -0500 > > Eric Farman <farman@linux.ibm.com> wrote: > > > >> On 01/30/2019 08:22 AM, Cornelia Huck wrote: > >>> @@ -760,6 +764,10 @@ int cp_prefetch(struct channel_program *cp) > >>> struct ccwchain *chain; > >>> int len, idx, ret; > >>> > >>> + /* this is an error in the caller */ > >>> + if (!cp || !cp->initialized) > >>> + return -EINVAL; > >>> + > >>> list_for_each_entry(chain, &cp->ccwchain_list, next) { > >>> len = chain->ch_len; > >>> for (idx = 0; idx < len; idx++) { > >>> @@ -795,6 +803,10 @@ union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm) > >>> struct ccwchain *chain; > >>> struct ccw1 *cpa; > >>> > >>> + /* this is an error in the caller */ > >>> + if (!cp || !cp->initialized) > >>> + return NULL; > >>> + > >>> orb = &cp->orb; > >>> > >>> orb->cmd.intparm = intparm; > >>> @@ -831,6 +843,9 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw) > >>> u32 cpa = scsw->cmd.cpa; > >>> u32 ccw_head, ccw_tail; > >>> > >>> + if (!cp->initialized) > >>> + return; > >>> + > >>> /* > >>> * LATER: > >>> * For now, only update the cmd.cpa part. We may need to deal with > >>> @@ -869,6 +884,9 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova) > >>> struct ccwchain *chain; > >>> int i; > >>> > >>> + if (!cp->initialized) > >> > >> So, two of the checks added above look for a nonzero cp pointer prior to > >> checking initialized, while two don't. I guess cp can't be NULL, since > >> it's embedded in the private struct directly and that's only free'd when > >> we do vfio_ccw_sch_remove() ... But I guess some consistency in how we > >> look would be nice. > > > > The idea was: In which context is this called? Is there a legitimate > > reason for the caller to pass in an uninitialized cp, or would that > > mean the caller had messed up (and we should not trust cp to be !NULL > > either?) > > > > But you're right, that does look inconsistent. Always checking for > > cp != NULL probably looks least odd, although it is overkill. Opinions? > > My opinion? Since cp is embedded in vfio_ccw_private, rather than a > pointer to a separately malloc'd struct, we pass &private->cp to those > functions. So a check for !cp doesn't really buy us anything because > what we are actually concerned about is whether or not private is NULL, > which only changes on the probe/remove boundaries. I guess if we pass in crap (or NULL) instead of &private->cp, it's our own fault and we can disregard fencing that case. The probe/remove path does not really bother me, for the reasons you said.
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c index ba08fe137c2e..0bc0c38edda7 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c +++ b/drivers/s390/cio/vfio_ccw_cp.c @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp) struct ccwchain *chain, *temp; int i; + cp->initialized = false; list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) { for (i = 0; i < chain->ch_len; i++) { pfn_array_table_unpin_free(chain->ch_pat + i, @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) */ cp->orb.cmd.c64 = 1; + cp->initialized = true; + return ret; } @@ -715,7 +718,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb) */ void cp_free(struct channel_program *cp) { - cp_unpin_free(cp); + if (cp->initialized) + cp_unpin_free(cp); } /** @@ -760,6 +764,10 @@ int cp_prefetch(struct channel_program *cp) struct ccwchain *chain; int len, idx, ret; + /* this is an error in the caller */ + if (!cp || !cp->initialized) + return -EINVAL; + list_for_each_entry(chain, &cp->ccwchain_list, next) { len = chain->ch_len; for (idx = 0; idx < len; idx++) { @@ -795,6 +803,10 @@ union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm) struct ccwchain *chain; struct ccw1 *cpa; + /* this is an error in the caller */ + if (!cp || !cp->initialized) + return NULL; + orb = &cp->orb; orb->cmd.intparm = intparm; @@ -831,6 +843,9 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw) u32 cpa = scsw->cmd.cpa; u32 ccw_head, ccw_tail; + if (!cp->initialized) + return; + /* * LATER: * For now, only update the cmd.cpa part. We may need to deal with @@ -869,6 +884,9 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova) struct ccwchain *chain; int i; + if (!cp->initialized) + return false; + list_for_each_entry(chain, &cp->ccwchain_list, next) { for (i = 0; i < chain->ch_len; i++) if (pfn_array_table_iova_pinned(chain->ch_pat + i, diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h index a4b74fb1aa57..3c20cd208da5 100644 --- a/drivers/s390/cio/vfio_ccw_cp.h +++ b/drivers/s390/cio/vfio_ccw_cp.h @@ -21,6 +21,7 @@ * @ccwchain_list: list head of ccwchains * @orb: orb for the currently processed ssch request * @mdev: the mediated device to perform page pinning/unpinning + * @initialized: whether this instance is actually initialized * * @ccwchain_list is the head of a ccwchain list, that contents the * translated result of the guest channel program that pointed out by @@ -30,6 +31,7 @@ struct channel_program { struct list_head ccwchain_list; union orb orb; struct device *mdev; + bool initialized; }; extern int cp_init(struct channel_program *cp, struct device *mdev, diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c index cab17865aafe..e7c9877c9f1e 100644 --- a/drivers/s390/cio/vfio_ccw_fsm.c +++ b/drivers/s390/cio/vfio_ccw_fsm.c @@ -31,6 +31,10 @@ static int fsm_io_helper(struct vfio_ccw_private *private) private->state = VFIO_CCW_STATE_BUSY; orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm); + if (!orb) { + ret = -EIO; + goto out; + } /* Issue "Start Subchannel" */ ccode = ssch(sch->schid, orb); @@ -64,6 +68,7 @@ static int fsm_io_helper(struct vfio_ccw_private *private) default: ret = ccode; } +out: spin_unlock_irqrestore(sch->lock, flags); return ret; }
When we get a solicited interrupt, the start function may have been cleared by a csch, but we still have a channel program structure allocated. Make it safe to call the cp accessors in any case, so we can call them unconditionally. While at it, also make sure that functions called from other parts of the code return gracefully if the channel program structure has not been initialized (even though that is a bug in the caller). Signed-off-by: Cornelia Huck <cohuck@redhat.com> --- drivers/s390/cio/vfio_ccw_cp.c | 20 +++++++++++++++++++- drivers/s390/cio/vfio_ccw_cp.h | 2 ++ drivers/s390/cio/vfio_ccw_fsm.c | 5 +++++ 3 files changed, 26 insertions(+), 1 deletion(-)