Message ID | 20170717132646.3020-7-laurentiu.tudor@nxp.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 17/07/17 14:26, laurentiu.tudor@nxp.com wrote: > From: Laurentiu Tudor <laurentiu.tudor@nxp.com> > > Split the 64-bit accesses in 32-bit accesses because there's no real > constrain in MC to do only atomic 64-bit. There's only one place > where ordering is important: when writing the MC command header the > first 32-bit part of the header must be written last. > We do this switch in order to allow compiling the driver on 32-bit. #include <linux/io-64-nonatomic-hi-lo.h> Then you can just use writeq()/writeq_relaxed()/readq_relaxed() on 32-bit platforms as well. Robin. > Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> > --- > drivers/staging/fsl-mc/bus/mc-sys.c | 31 ++++++++++++------------------- > 1 file changed, 12 insertions(+), 19 deletions(-) > > diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c > index 195d9f3..dd2828e 100644 > --- a/drivers/staging/fsl-mc/bus/mc-sys.c > +++ b/drivers/staging/fsl-mc/bus/mc-sys.c > @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command __iomem *portal, > { > int i; > > - /* copy command parameters into the portal */ > - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) > - __raw_writeq(cmd->params[i], &portal->params[i]); > - /* ensure command params are committed before submitting it */ > - wmb(); > - > - /* submit the command by writing the header */ > - __raw_writeq(cmd->header, &portal->header); > + /* > + * copy command parameters into the portal. Final write > + * triggers the submission of the command. > + */ > + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { > + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); > + /* ensure command params are committed before submitting it */ > + wmb(); > + } > } > > /** > @@ -148,19 +149,11 @@ static inline enum mc_cmd_status mc_read_response(struct mc_command __iomem * > struct mc_command *resp) > { > int i; > - enum mc_cmd_status status; > - > - /* Copy command response header from MC portal: */ > - resp->header = __raw_readq(&portal->header); > - status = mc_cmd_hdr_read_status(resp); > - if (status != MC_CMD_STATUS_OK) > - return status; > > - /* Copy command response data from MC portal: */ > - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) > - resp->params[i] = __raw_readq(&portal->params[i]); > + for (i = 0; i < sizeof(struct mc_command) / sizeof(u32); i++) > + ((u32 *)resp)[i] = __raw_readl(&((u32 *)portal)[i]); > > - return status; > + return mc_cmd_hdr_read_status(resp); > } > > /** >
On Mon, Jul 17, 2017 at 3:26 PM, <laurentiu.tudor@nxp.com> wrote: > From: Laurentiu Tudor <laurentiu.tudor@nxp.com> > > Split the 64-bit accesses in 32-bit accesses because there's no real > constrain in MC to do only atomic 64-bit. There's only one place > where ordering is important: when writing the MC command header the > first 32-bit part of the header must be written last. > We do this switch in order to allow compiling the driver on 32-bit. > > Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> > --- > drivers/staging/fsl-mc/bus/mc-sys.c | 31 ++++++++++++------------------- > 1 file changed, 12 insertions(+), 19 deletions(-) > > diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c > index 195d9f3..dd2828e 100644 > --- a/drivers/staging/fsl-mc/bus/mc-sys.c > +++ b/drivers/staging/fsl-mc/bus/mc-sys.c > @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command __iomem *portal, > { > int i; > > - /* copy command parameters into the portal */ > - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) > - __raw_writeq(cmd->params[i], &portal->params[i]); > - /* ensure command params are committed before submitting it */ > - wmb(); > - > - /* submit the command by writing the header */ > - __raw_writeq(cmd->header, &portal->header); > + /* > + * copy command parameters into the portal. Final write > + * triggers the submission of the command. > + */ > + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { > + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); > + /* ensure command params are committed before submitting it */ > + wmb(); > + } > } What is the byte order requirement on this buffer? If this is a byte string rather than individual registers, you should probably just use memcpy_toio(), but if these are separate registers, then using the __raw_* accessors is still wrong, at least on kernels that have a different byteorder from the hardware. Also, are you sure that adding those six extra barriers has no performance impact? Arnd
Hi Arnd, On 07/17/2017 04:45 PM, Arnd Bergmann wrote: > On Mon, Jul 17, 2017 at 3:26 PM, <laurentiu.tudor@nxp.com> wrote: >> From: Laurentiu Tudor <laurentiu.tudor@nxp.com> >> >> Split the 64-bit accesses in 32-bit accesses because there's no real >> constrain in MC to do only atomic 64-bit. There's only one place >> where ordering is important: when writing the MC command header the >> first 32-bit part of the header must be written last. >> We do this switch in order to allow compiling the driver on 32-bit. >> >> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> >> --- >> drivers/staging/fsl-mc/bus/mc-sys.c | 31 ++++++++++++------------------- >> 1 file changed, 12 insertions(+), 19 deletions(-) >> >> diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c >> index 195d9f3..dd2828e 100644 >> --- a/drivers/staging/fsl-mc/bus/mc-sys.c >> +++ b/drivers/staging/fsl-mc/bus/mc-sys.c >> @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command __iomem *portal, >> { >> int i; >> >> - /* copy command parameters into the portal */ >> - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) >> - __raw_writeq(cmd->params[i], &portal->params[i]); >> - /* ensure command params are committed before submitting it */ >> - wmb(); >> - >> - /* submit the command by writing the header */ >> - __raw_writeq(cmd->header, &portal->header); >> + /* >> + * copy command parameters into the portal. Final write >> + * triggers the submission of the command. >> + */ >> + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { >> + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); >> + /* ensure command params are committed before submitting it */ >> + wmb(); >> + } >> } > > What is the byte order requirement on this buffer? Endianness is handled by the callers so this function must leave the binary blob intact. > If this is a byte string > rather than individual registers, you should probably just use > memcpy_toio() It's a header followed by an opaque command. The protocol for queueing a command says that the first 32-bit half of the header must be written last, this triggering the command handling in the MC. > but if these are separate registers, then using the > __raw_* accessors is still wrong, at least on kernels that have a > different byteorder from the hardware. As mentioned above, endianness is handled by the caller. This function takes raw data and must leave it unchanged. > Also, are you sure that adding those six extra barriers has no > performance impact? This is a slow interface used in slow paths, so i don't think those extra barriers will have any performance impact. --- Thanks & Best Regards, Laurentiu
Hi Robin, On 07/17/2017 04:43 PM, Robin Murphy wrote: > On 17/07/17 14:26, laurentiu.tudor@nxp.com wrote: >> From: Laurentiu Tudor <laurentiu.tudor@nxp.com> >> >> Split the 64-bit accesses in 32-bit accesses because there's no real >> constrain in MC to do only atomic 64-bit. There's only one place >> where ordering is important: when writing the MC command header the >> first 32-bit part of the header must be written last. >> We do this switch in order to allow compiling the driver on 32-bit. > > #include <linux/io-64-nonatomic-hi-lo.h> > > Then you can just use writeq()/writeq_relaxed()/readq_relaxed() on > 32-bit platforms as well. > Thanks, didn't knew of the header. This should take care of the compilation errors i was seeing. There's one problem though: these handle byte-ordering and i need raw accessors. :-( --- Best Regards, Laurentiu >> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> >> --- >> drivers/staging/fsl-mc/bus/mc-sys.c | 31 ++++++++++++------------------- >> 1 file changed, 12 insertions(+), 19 deletions(-) >> >> diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c >> index 195d9f3..dd2828e 100644 >> --- a/drivers/staging/fsl-mc/bus/mc-sys.c >> +++ b/drivers/staging/fsl-mc/bus/mc-sys.c >> @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command __iomem *portal, >> { >> int i; >> >> - /* copy command parameters into the portal */ >> - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) >> - __raw_writeq(cmd->params[i], &portal->params[i]); >> - /* ensure command params are committed before submitting it */ >> - wmb(); >> - >> - /* submit the command by writing the header */ >> - __raw_writeq(cmd->header, &portal->header); >> + /* >> + * copy command parameters into the portal. Final write >> + * triggers the submission of the command. >> + */ >> + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { >> + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); >> + /* ensure command params are committed before submitting it */ >> + wmb(); >> + } >> } >> >> /** >> @@ -148,19 +149,11 @@ static inline enum mc_cmd_status mc_read_response(struct mc_command __iomem * >> struct mc_command *resp) >> { >> int i; >> - enum mc_cmd_status status; >> - >> - /* Copy command response header from MC portal: */ >> - resp->header = __raw_readq(&portal->header); >> - status = mc_cmd_hdr_read_status(resp); >> - if (status != MC_CMD_STATUS_OK) >> - return status; >> >> - /* Copy command response data from MC portal: */ >> - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) >> - resp->params[i] = __raw_readq(&portal->params[i]); >> + for (i = 0; i < sizeof(struct mc_command) / sizeof(u32); i++) >> + ((u32 *)resp)[i] = __raw_readl(&((u32 *)portal)[i]); >> >> - return status; >> + return mc_cmd_hdr_read_status(resp); >> } >> >> /** >> >
On Mon, Jul 17, 2017 at 4:27 PM, Laurentiu Tudor <laurentiu.tudor@nxp.com> wrote: > Hi Arnd, > > On 07/17/2017 04:45 PM, Arnd Bergmann wrote: >> On Mon, Jul 17, 2017 at 3:26 PM, <laurentiu.tudor@nxp.com> wrote: >>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com> >>> >>> Split the 64-bit accesses in 32-bit accesses because there's no real >>> constrain in MC to do only atomic 64-bit. There's only one place >>> where ordering is important: when writing the MC command header the >>> first 32-bit part of the header must be written last. >>> We do this switch in order to allow compiling the driver on 32-bit. >>> >>> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> >>> --- >>> drivers/staging/fsl-mc/bus/mc-sys.c | 31 ++++++++++++------------------- >>> 1 file changed, 12 insertions(+), 19 deletions(-) >>> >>> diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c >>> index 195d9f3..dd2828e 100644 >>> --- a/drivers/staging/fsl-mc/bus/mc-sys.c >>> +++ b/drivers/staging/fsl-mc/bus/mc-sys.c >>> @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command __iomem *portal, >>> { >>> int i; >>> >>> - /* copy command parameters into the portal */ >>> - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) >>> - __raw_writeq(cmd->params[i], &portal->params[i]); >>> - /* ensure command params are committed before submitting it */ >>> - wmb(); >>> - >>> - /* submit the command by writing the header */ >>> - __raw_writeq(cmd->header, &portal->header); >>> + /* >>> + * copy command parameters into the portal. Final write >>> + * triggers the submission of the command. >>> + */ >>> + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { >>> + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); >>> + /* ensure command params are committed before submitting it */ >>> + wmb(); >>> + } >>> } >> >> What is the byte order requirement on this buffer? > > Endianness is handled by the callers so this function must leave > the binary blob intact. Got it, the endianess looks correct indeed. >> If this is a byte string >> rather than individual registers, you should probably just use >> memcpy_toio() > > It's a header followed by an opaque command. The protocol for queueing a > command says that the first 32-bit half of the header must be written > last, this triggering the command handling in the MC. Strictly speaking the __raw_writel() won't guarantee that the data is written as a single word, the compiler might decide to split it up into byte-sized writes if it believes the destination pointer is unaligned and the CPU has no efficient writes. I think this cannot happen on arm or powerpc, as we go through inline assembly for the store, but it's not completely portable. Before your patch, both the compiler and the CPU could also reorder the stores in theory (but normally won't), but the wmb() will prevent that now. >> but if these are separate registers, then using the >> __raw_* accessors is still wrong, at least on kernels that have a >> different byteorder from the hardware. > > As mentioned above, endianness is handled by the caller. This function > takes raw data and must leave it unchanged. > >> Also, are you sure that adding those six extra barriers has no >> performance impact? > > This is a slow interface used in slow paths, so i don't think those > extra barriers will have any performance impact. Ok. One other problem remains: most developers looking at this code like Robin and me will immediately think there might be an endianess bug here. As this is a slow path, how about using an explicit conversion using writeq(le64_to_cpup(buffer), pointer); with a comment? That would explain what's going on and escape people looking for incorrect __raw_writeq() uses. I would expect that the compiler can actually drop the double byteswap here by itself, and even if it doesn't, the extra swaps won't cause noticeable overhead compared to the barriers. Arnd
On 07/17/2017 06:00 PM, Arnd Bergmann wrote: > On Mon, Jul 17, 2017 at 4:27 PM, Laurentiu Tudor > <laurentiu.tudor@nxp.com> wrote: >> Hi Arnd, >> >> On 07/17/2017 04:45 PM, Arnd Bergmann wrote: >>> On Mon, Jul 17, 2017 at 3:26 PM, <laurentiu.tudor@nxp.com> wrote: >>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com> >>>> >>>> Split the 64-bit accesses in 32-bit accesses because there's no real >>>> constrain in MC to do only atomic 64-bit. There's only one place >>>> where ordering is important: when writing the MC command header the >>>> first 32-bit part of the header must be written last. >>>> We do this switch in order to allow compiling the driver on 32-bit. >>>> >>>> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> >>>> --- >>>> drivers/staging/fsl-mc/bus/mc-sys.c | 31 ++++++++++++------------------- >>>> 1 file changed, 12 insertions(+), 19 deletions(-) >>>> >>>> diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c >>>> index 195d9f3..dd2828e 100644 >>>> --- a/drivers/staging/fsl-mc/bus/mc-sys.c >>>> +++ b/drivers/staging/fsl-mc/bus/mc-sys.c >>>> @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command __iomem *portal, >>>> { >>>> int i; >>>> >>>> - /* copy command parameters into the portal */ >>>> - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) >>>> - __raw_writeq(cmd->params[i], &portal->params[i]); >>>> - /* ensure command params are committed before submitting it */ >>>> - wmb(); >>>> - >>>> - /* submit the command by writing the header */ >>>> - __raw_writeq(cmd->header, &portal->header); >>>> + /* >>>> + * copy command parameters into the portal. Final write >>>> + * triggers the submission of the command. >>>> + */ >>>> + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { >>>> + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); >>>> + /* ensure command params are committed before submitting it */ >>>> + wmb(); >>>> + } >>>> } >>> >>> What is the byte order requirement on this buffer? >> >> Endianness is handled by the callers so this function must leave >> the binary blob intact. > > Got it, the endianess looks correct indeed. > >>> If this is a byte string >>> rather than individual registers, you should probably just use >>> memcpy_toio() >> >> It's a header followed by an opaque command. The protocol for queueing a >> command says that the first 32-bit half of the header must be written >> last, this triggering the command handling in the MC. > > Strictly speaking the __raw_writel() won't guarantee that the > data is written as a single word, the compiler might decide to > split it up into byte-sized writes if it believes the destination pointer > is unaligned and the CPU has no efficient writes. > > I think this cannot happen on arm or powerpc, as we go through > inline assembly for the store, but it's not completely portable. Should i worry about portability? Slim changes that this driver will ever run on anything else other than ARM & ARM64. My current goal was just to make it compile on other arches. > Before your patch, both the compiler and the CPU could also > reorder the stores in theory (but normally won't), but the wmb() > will prevent that now. Ok, thanks for the info. >>> but if these are separate registers, then using the >>> __raw_* accessors is still wrong, at least on kernels that have a >>> different byteorder from the hardware. >> >> As mentioned above, endianness is handled by the caller. This function >> takes raw data and must leave it unchanged. >> >>> Also, are you sure that adding those six extra barriers has no >>> performance impact? >> >> This is a slow interface used in slow paths, so i don't think those >> extra barriers will have any performance impact. > > Ok. > > One other problem remains: most developers looking at this > code like Robin and me will immediately think there might be > an endianess bug here. As this is a slow path, how about > using an explicit conversion using > > writeq(le64_to_cpup(buffer), pointer); Sure, sounds perfect. I'll do that in the next respin. --- Thanks & Best Regards, Laurentiu
On Tue, Jul 18, 2017 at 1:08 PM, Laurentiu Tudor <laurentiu.tudor@nxp.com> wrote: > On 07/17/2017 06:00 PM, Arnd Bergmann wrote: >> Strictly speaking the __raw_writel() won't guarantee that the >> data is written as a single word, the compiler might decide to >> split it up into byte-sized writes if it believes the destination pointer >> is unaligned and the CPU has no efficient writes. >> >> I think this cannot happen on arm or powerpc, as we go through >> inline assembly for the store, but it's not completely portable. > > Should i worry about portability? Slim changes that this driver > will ever run on anything else other than ARM & ARM64. > My current goal was just to make it compile on other arches. I always recommend writing any driver in the most portable way out of principle, since you never know who looks at it for reference when writing another driver. I wouldn't expect the driver itself to be used on other architectures, but of course you never know what CPU becomes fashionable 10 years from now. Arnd
diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c index 195d9f3..dd2828e 100644 --- a/drivers/staging/fsl-mc/bus/mc-sys.c +++ b/drivers/staging/fsl-mc/bus/mc-sys.c @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command __iomem *portal, { int i; - /* copy command parameters into the portal */ - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) - __raw_writeq(cmd->params[i], &portal->params[i]); - /* ensure command params are committed before submitting it */ - wmb(); - - /* submit the command by writing the header */ - __raw_writeq(cmd->header, &portal->header); + /* + * copy command parameters into the portal. Final write + * triggers the submission of the command. + */ + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); + /* ensure command params are committed before submitting it */ + wmb(); + } } /** @@ -148,19 +149,11 @@ static inline enum mc_cmd_status mc_read_response(struct mc_command __iomem * struct mc_command *resp) { int i; - enum mc_cmd_status status; - - /* Copy command response header from MC portal: */ - resp->header = __raw_readq(&portal->header); - status = mc_cmd_hdr_read_status(resp); - if (status != MC_CMD_STATUS_OK) - return status; - /* Copy command response data from MC portal: */ - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) - resp->params[i] = __raw_readq(&portal->params[i]); + for (i = 0; i < sizeof(struct mc_command) / sizeof(u32); i++) + ((u32 *)resp)[i] = __raw_readl(&((u32 *)portal)[i]); - return status; + return mc_cmd_hdr_read_status(resp); } /**