diff mbox

lightnvm: pblk: Add read memory barrier when reading from rb

Message ID 1529535298-15465-1-git-send-email-hlitz@ucsc.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Heiner Litz June 20, 2018, 10:54 p.m. UTC
READ_ONCE does not imply a read memory barrier in the presence of control
dependencies between two separate memory locations (flags and data). On x86
TSO, reading from the data page might be reordered before the flags read.
See chapter CONTROL DEPENDENCIES in
https://www.kernel.org/doc/Documentation/memory-barriers.txt

Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
---
 drivers/lightnvm/pblk-rb.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Javier Gonzalez June 21, 2018, 7:46 a.m. UTC | #1
> On 21 Jun 2018, at 00.54, Heiner Litz <hlitz@ucsc.edu> wrote:
> 
> READ_ONCE does not imply a read memory barrier in the presence of control
> dependencies between two separate memory locations (flags and data). On x86
> TSO, reading from the data page might be reordered before the flags read.
> See chapter CONTROL DEPENDENCIES in
> https://www.kernel.org/doc/Documentation/memory-barriers.txt
> 
> Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
> ---
> drivers/lightnvm/pblk-rb.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index a81a97e..5f09983 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -545,6 +545,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
> 			goto try;
> 		}
> 
> +		/* Observe control dependency between flags and data read */
> +		smp_rmb();
> +
> 		page = virt_to_page(entry->data);
> 		if (!page) {
> 			pr_err("pblk: could not allocate write bio page\n");
> --
> 2.7.4

Thanks for sending the patch; I was wondering where it went.


Reviewed-by: Javier González <javier@cnexlabs.com>
Matias Bjorling June 22, 2018, 6:02 p.m. UTC | #2
On 06/21/2018 12:54 AM, Heiner Litz wrote:
> READ_ONCE does not imply a read memory barrier in the presence of control
> dependencies between two separate memory locations (flags and data). On x86
> TSO, reading from the data page might be reordered before the flags read.
> See chapter CONTROL DEPENDENCIES in
> https://www.kernel.org/doc/Documentation/memory-barriers.txt
> 
> Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
> ---
>   drivers/lightnvm/pblk-rb.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index a81a97e..5f09983 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -545,6 +545,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>   			goto try;
>   		}
>   
> +		/* Observe control dependency between flags and data read */
> +		smp_rmb();
> +
>   		page = virt_to_page(entry->data);
>   		if (!page) {
>   			pr_err("pblk: could not allocate write bio page\n");
> 

Hi Heiner,

Can you help explain how it is a control dependency? What case am I missing?

The way I read the code, there is the case where a read of entry->data 
happens before entry->w_ctx.flags, but I see that as a memory 
reordering, which the smp_rmb() fixes. Would that be more accurate?

Thank you,
Matias
Matias Bjorling June 28, 2018, 7:59 a.m. UTC | #3
On 06/28/2018 01:31 AM, Heiner Litz wrote:
> There is a control dependency between two disjoint variables (only read 
> data if flags == WRITTEN). Because x86-TSO allows re-ordering of loads 
> the control dependency can be violated.

I'm sorry, I do not see it :)

Here is my understanding:

entry->w_ctx.flags is used as a flagging mechanism to wait for data to 
become available when "flags" has PBLK_WRITTEN_DATA.

The later access to entry->data is independent of this. It is assumed 
that entry->w_ctx.flags is the guarding barrier.

Here is the titbit from the control dependency section that makes me say 
that the entry->data loads is not possible to be done before 
entry->w_ctx.flags:

" (*) Control dependencies apply only to the then-clause and else-clause
       of the if-statement containing the control dependency, including
       any functions that these two clauses call.  Control dependencies
       do -not- apply to code following the if-statement containing the
       control dependency."

The read of entry->data must come after the READ_ONCE of 
entry->w_ctx.flags.

> 
> https://www.kernel.org/doc/Documentation/memory-barriers.txt refers to 
> the above scenario as control dependency but your description is of 
> course also correct. Let me know if we should clarify the comment.
> 
> BTW: ARM has an even more relaxed memory model and also allows to 
> re-order writes. If we want to support ARM correctly, I think we also 
> need to insert a memory barrier between writing data and writing flags 
> (and there might be a couple other places where we need to check).
> 
> On Fri, Jun 22, 2018, 11:02 AM Matias Bjørling <mb@lightnvm.io 
> <mailto:mb@lightnvm.io>> wrote:
> 
>     On 06/21/2018 12:54 AM, Heiner Litz wrote:
>      > READ_ONCE does not imply a read memory barrier in the presence of
>     control
>      > dependencies between two separate memory locations (flags and
>     data). On x86
>      > TSO, reading from the data page might be reordered before the
>     flags read.
>      > See chapter CONTROL DEPENDENCIES in
>      > https://www.kernel.org/doc/Documentation/memory-barriers.txt
>      >
>      > Signed-off-by: Heiner Litz <hlitz@ucsc.edu <mailto:hlitz@ucsc.edu>>
>      > ---
>      >   drivers/lightnvm/pblk-rb.c | 3 +++
>      >   1 file changed, 3 insertions(+)
>      >
>      > diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>      > index a81a97e..5f09983 100644
>      > --- a/drivers/lightnvm/pblk-rb.c
>      > +++ b/drivers/lightnvm/pblk-rb.c
>      > @@ -545,6 +545,9 @@ unsigned int pblk_rb_read_to_bio(struct
>     pblk_rb *rb, struct nvm_rq *rqd,
>      >                       goto try;
>      >               }
>      >
>      > +             /* Observe control dependency between flags and
>     data read */
>      > +             smp_rmb();
>      > +
>      >               page = virt_to_page(entry->data);
>      >               if (!page) {
>      >                       pr_err("pblk: could not allocate write bio
>     page\n");
>      >
> 
>     Hi Heiner,
> 
>     Can you help explain how it is a control dependency? What case am I
>     missing?
> 
>     The way I read the code, there is the case where a read of entry->data
>     happens before entry->w_ctx.flags, but I see that as a memory
>     reordering, which the smp_rmb() fixes. Would that be more accurate?
> 
>     Thank you,
>     Matias
>
Javier Gonzalez June 28, 2018, 8:15 a.m. UTC | #4
> On 28 Jun 2018, at 09.59, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 06/28/2018 01:31 AM, Heiner Litz wrote:
>> There is a control dependency between two disjoint variables (only read data if flags == WRITTEN). Because x86-TSO allows re-ordering of loads the control dependency can be violated.
> 
> I'm sorry, I do not see it :)
> 
> Here is my understanding:
> 
> entry->w_ctx.flags is used as a flagging mechanism to wait for data to become available when "flags" has PBLK_WRITTEN_DATA.
> 
> The later access to entry->data is independent of this. It is assumed that entry->w_ctx.flags is the guarding barrier.

This is correct. The motivation is to allow several producers to fill the
buffer simultaneously.

> 
> Here is the titbit from the control dependency section that makes me
> say that the entry->data loads is not possible to be done before
> entry->w_ctx.flags:
> 
> " (*) Control dependencies apply only to the then-clause and else-clause
>      of the if-statement containing the control dependency, including
>      any functions that these two clauses call.  Control dependencies
>      do -not- apply to code following the if-statement containing the
>      control dependency."
> 
> The read of entry->data must come after the READ_ONCE of entry->w_ctx.flags.
> 

I also understood it this way when implementing the barrier, but after
an offline discussion with Heiner, he convinced me that reordering could
occur.

>> BTW: ARM has an even more relaxed memory model and also allows to
>> re-order writes. If we want to support ARM correctly, I think we also
>> need to insert a memory barrier between writing data and writing
>> flags (and there might be a couple other places where we need to
>> check).

Heiner: Can you develop on this?

Javier
Heiner Litz July 3, 2018, 10:14 a.m. UTC | #5
Hi Matias,

I think you may be right. I created some test benches and could
produce the error I was expecting. My reasoning was based on the
following:

"This will not have the desired effect because there is no actual data
dependency, but rather a control dependency that the CPU may short-circuit
by attempting to predict the outcome in advance, so that other CPUs see
the load from b as having happened before the load from a.  In such a
case what's actually required is:

q = READ_ONCE(a);
if (q) {
<read barrier>
p = READ_ONCE(b);
}"

As I understand above, the software gives no guarantee that above
works without the read_barrier. The section somewhat contradicts
Matias's reference. Reading more on x86-TSO it seems as if the
hardware does not allow reordering of read(a) and read(b) so we should
be safe, however, ARM hardware would definitely allow reordering of
the two reads. Summary: The proposed smp_rmb() is not required for x86
but it may be for ARM.

On Thu, Jun 28, 2018 at 10:15 AM Javier Gonzalez <javier@cnexlabs.com> wrote:
>
> > On 28 Jun 2018, at 09.59, Matias Bjørling <mb@lightnvm.io> wrote:
> >
> > On 06/28/2018 01:31 AM, Heiner Litz wrote:
> >> There is a control dependency between two disjoint variables (only read data if flags == WRITTEN). Because x86-TSO allows re-ordering of loads the control dependency can be violated.
> >
> > I'm sorry, I do not see it :)
> >
> > Here is my understanding:
> >
> > entry->w_ctx.flags is used as a flagging mechanism to wait for data to become available when "flags" has PBLK_WRITTEN_DATA.
> >
> > The later access to entry->data is independent of this. It is assumed that entry->w_ctx.flags is the guarding barrier.
>
> This is correct. The motivation is to allow several producers to fill the
> buffer simultaneously.
>
> >
> > Here is the titbit from the control dependency section that makes me
> > say that the entry->data loads is not possible to be done before
> > entry->w_ctx.flags:
> >
> > " (*) Control dependencies apply only to the then-clause and else-clause
> >      of the if-statement containing the control dependency, including
> >      any functions that these two clauses call.  Control dependencies
> >      do -not- apply to code following the if-statement containing the
> >      control dependency."
> >
> > The read of entry->data must come after the READ_ONCE of entry->w_ctx.flags.
> >
>
> I also understood it this way when implementing the barrier, but after
> an offline discussion with Heiner, he convinced me that reordering could
> occur.
>
> >> BTW: ARM has an even more relaxed memory model and also allows to
> >> re-order writes. If we want to support ARM correctly, I think we also
> >> need to insert a memory barrier between writing data and writing
> >> flags (and there might be a couple other places where we need to
> >> check).
>
> Heiner: Can you develop on this?
>
> Javier
diff mbox

Patch

diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index a81a97e..5f09983 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -545,6 +545,9 @@  unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
 			goto try;
 		}
 
+		/* Observe control dependency between flags and data read */
+		smp_rmb();
+
 		page = virt_to_page(entry->data);
 		if (!page) {
 			pr_err("pblk: could not allocate write bio page\n");