diff mbox

usb: dwc3: host: inherit dma configuration from parent dev

Message ID 877ffjro0w.fsf@ti.com (mailing list archive)
State New, archived
Headers show

Commit Message

Felipe Balbi April 27, 2016, 4:53 p.m. UTC
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Wednesday 27 April 2016 16:50:19 Catalin Marinas wrote:
>> On Wed, Apr 27, 2016 at 04:11:17PM +0200, Arnd Bergmann wrote:
>> > On Wednesday 27 April 2016 14:59:00 Catalin Marinas wrote:
>> > > 
>> > > I would be in favour of a dma_inherit() function as well. We could hack
>> > > something up in the arch code (like below) but I would rather prefer an
>> > > explicit dma_inherit() call by drivers creating such devices.
>> > > 
>> > > diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
>> > > index ba437f090a74..ea6fb9b0e8fa 100644
>> > > --- a/arch/arm64/include/asm/dma-mapping.h
>> > > +++ b/arch/arm64/include/asm/dma-mapping.h
>> > > @@ -29,8 +29,11 @@ extern struct dma_map_ops dummy_dma_ops;
>> > >  
>> > >  static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
>> > >  {
>> > > -       if (dev && dev->archdata.dma_ops)
>> > > -               return dev->archdata.dma_ops;
>> > > +       while (dev) {
>> > > +               if (dev->archdata.dma_ops)
>> > > +                       return dev->archdata.dma_ops;
>> > > +               dev = dev->parent;
>> > > +       }
>> > 
>> > I think this would be a very bad idea: we don't want to have random
>> > devices be able to perform DMA just because their parent devices
>> > have been set up that way.
>> 
>> I agree, it's a big hack. It would be nice to have a simpler way to do
>> this in driver code rather than explicitly calling
>> of_dma_configure/arch_setup_dma_ops as per the original patch in this
>> thread.
>
> I haven't followed the entire discussion, but what's wrong with passing
> around a pointer to a 'struct device *hwdev' that represents the physical
> device that does the DMA?

that will likely create duplicated solutions in several drivers and
it'll be a pain to maintain. There's another complication, dwc3 can be
integrated in many different ways. See the device child-parent tree
representations below:

a) with a parent PCI device:

pci_bus_type
 - dwc3-pci
   - dwc3
     - xhci-plat

b) with a parent platform_device (OF):

platform_bus_type
 - dwc3-${omap,st,of-simple,exynos,keystone}
   - dwc3
     - xhci-plat

c) without a parent at all (thanks Grygorii):

platform_bus_type
 - dwc3
   - xhci-plat

(a) and (b) above are the common cases. The DMA-capable device is
clearly dwc3-${pci,omap,st,of-simple,exynos,keystone} with dwc3 only
having proper DMA configuration in OF platforms (because of the
unconditional of_dma_configure() during OF device creation) and
xhci-plat not knowing about DMA at all and hardcoding some crappy
defaults.

(c) is the uncommon case which creates some problems. In this case, dwc3
itself is the DMA-capable device and dwc3->dev->parent is the
platform_bus_type itself. Now consider the problem this creates:

i. the patch that I wrote [1] becomes invalid for (c), thanks to
Grygorii for pointing this out before it was too late.

ii. xhci-plat can also be described directly in DT (and is in some
cases). This means that assuming xhci-plat's parent's parent to be the
DMA-capable device is also an invalid assumption.

iii. one might argue that for DT-based platforms *with* a glue layer
((b) above), OF already "copies" some sensible DMA defaults during
device creation. PCI-based systems just don't have the luxury of
creating random PCI devices like that :-) I say it copies because I can
pass *any* struct device_node pointer and it'll just copy that to the
struct device argument. Here's of_dma_configure() to make your life
easier:

void of_dma_configure(struct device *dev, struct device_node *np)
{
	u64 dma_addr, paddr, size;
	int ret;
	bool coherent;
	unsigned long offset;
	struct iommu_ops *iommu;

	/*
	 * Set default coherent_dma_mask to 32 bit.  Drivers are expected to
	 * setup the correct supported mask.
	 */
	if (!dev->coherent_dma_mask)
		dev->coherent_dma_mask = DMA_BIT_MASK(32);

	/*
	 * Set it to coherent_dma_mask by default if the architecture
	 * code has not set it.
	 */
	if (!dev->dma_mask)
		dev->dma_mask = &dev->coherent_dma_mask;

	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
	if (ret < 0) {
		dma_addr = offset = 0;
		size = dev->coherent_dma_mask + 1;
	} else {
		offset = PFN_DOWN(paddr - dma_addr);

		/*
		 * Add a work around to treat the size as mask + 1 in case
		 * it is defined in DT as a mask.
		 */
		if (size & 1) {
			dev_warn(dev, "Invalid size 0x%llx for dma-range\n",
				 size);
			size = size + 1;
		}

		if (!size) {
			dev_err(dev, "Adjusted size 0x%llx invalid\n", size);
			return;
		}
		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", offset);
	}

	dev->dma_pfn_offset = offset;

	/*
	 * Limit coherent and dma mask based on size and default mask
	 * set by the driver.
	 */
	dev->coherent_dma_mask = min(dev->coherent_dma_mask,
				     DMA_BIT_MASK(ilog2(dma_addr + size)));
	*dev->dma_mask = min((*dev->dma_mask),
			     DMA_BIT_MASK(ilog2(dma_addr + size)));

	coherent = of_dma_is_coherent(np);
	dev_dbg(dev, "device is%sdma coherent\n",
		coherent ? " " : " not ");

	iommu = of_iommu_configure(dev, np);
	dev_dbg(dev, "device is%sbehind an iommu\n",
		iommu ? " " : " not ");

	arch_setup_dma_ops(dev, dma_addr, size, iommu, coherent);
}

The only clean way to fix this up is with a dma_inherit()-like API which
would allow dwc3's glue layers ((a) and (b) above) to initialize child's
(dwc3) DMA configuration during child's creation. Something like below:


that's all I'm asking for :-) dma_inherit() should, probably, be
arch-specific to handle details like IOMMU (which today sits under
archdata).

without something like this, we're gonna have to resort to tons of
checks trying to find the correct DMA configuration to use in all
possible usage scenarios of dwc3.

Note, however, that I'm using dwc3 as an example, but anywhere you see
manual platform_device creation, there's a potential problem WRT DMA
and/or IOMMU.

[1] https://git.kernel.org/cgit/linux/kernel/git/balbi/usb.git/commit/?h=testing/next&id=2725d6f974c4c268ae5fb746f8e3b33b76135aa8

Comments

Arnd Bergmann April 27, 2016, 5:42 p.m. UTC | #1
On Wednesday 27 April 2016 19:53:51 Felipe Balbi wrote:
> Arnd Bergmann <arnd@arndb.de> writes:
> > On Wednesday 27 April 2016 16:50:19 Catalin Marinas wrote:
> >> On Wed, Apr 27, 2016 at 04:11:17PM +0200, Arnd Bergmann wrote:
> >> > On Wednesday 27 April 2016 14:59:00 Catalin Marinas wrote:
> >> > > 
> >> > > I would be in favour of a dma_inherit() function as well. We could hack
> >> > > something up in the arch code (like below) but I would rather prefer an
> >> > > explicit dma_inherit() call by drivers creating such devices.
> >> > > 
> >> > > diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
> >> > > index ba437f090a74..ea6fb9b0e8fa 100644
> >> > > --- a/arch/arm64/include/asm/dma-mapping.h
> >> > > +++ b/arch/arm64/include/asm/dma-mapping.h
> >> > > @@ -29,8 +29,11 @@ extern struct dma_map_ops dummy_dma_ops;
> >> > >  
> >> > >  static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
> >> > >  {
> >> > > -       if (dev && dev->archdata.dma_ops)
> >> > > -               return dev->archdata.dma_ops;
> >> > > +       while (dev) {
> >> > > +               if (dev->archdata.dma_ops)
> >> > > +                       return dev->archdata.dma_ops;
> >> > > +               dev = dev->parent;
> >> > > +       }
> >> > 
> >> > I think this would be a very bad idea: we don't want to have random
> >> > devices be able to perform DMA just because their parent devices
> >> > have been set up that way.
> >> 
> >> I agree, it's a big hack. It would be nice to have a simpler way to do
> >> this in driver code rather than explicitly calling
> >> of_dma_configure/arch_setup_dma_ops as per the original patch in this
> >> thread.
> >
> > I haven't followed the entire discussion, but what's wrong with passing
> > around a pointer to a 'struct device *hwdev' that represents the physical
> > device that does the DMA?
> 
> that will likely create duplicated solutions in several drivers and
> it'll be a pain to maintain. There's another complication, dwc3 can be
> integrated in many different ways. See the device child-parent tree
> representations below:
> 
> a) with a parent PCI device:
> 
> pci_bus_type
>  - dwc3-pci
>    - dwc3
>      - xhci-plat
> 
> b) with a parent platform_device (OF):
> 
> platform_bus_type
>  - dwc3-${omap,st,of-simple,exynos,keystone}
>    - dwc3
>      - xhci-plat
> 
> c) without a parent at all (thanks Grygorii):
> 
> platform_bus_type
>  - dwc3
>    - xhci-plat
> 
> (a) and (b) above are the common cases. The DMA-capable device is
> clearly dwc3-${pci,omap,st,of-simple,exynos,keystone} with dwc3 only
> having proper DMA configuration in OF platforms (because of the
> unconditional of_dma_configure() during OF device creation) and
> xhci-plat not knowing about DMA at all and hardcoding some crappy
> defaults.
> 
> (c) is the uncommon case which creates some problems. In this case, dwc3
> itself is the DMA-capable device and dwc3->dev->parent is the
> platform_bus_type itself. Now consider the problem this creates:
> 
> i. the patch that I wrote [1] becomes invalid for (c), thanks to
> Grygorii for pointing this out before it was too late.
> 
> ii. xhci-plat can also be described directly in DT (and is in some
> cases). This means that assuming xhci-plat's parent's parent to be the
> DMA-capable device is also an invalid assumption.
> 
> iii. one might argue that for DT-based platforms *with* a glue layer
> ((b) above), OF already "copies" some sensible DMA defaults during
> device creation.

But that is exactly the point I was trying to make: instead of copying
all the data into the xhci-plat device, just assign one pointer
inside the xhci-plat device from whoever creates it.

> The only clean way to fix this up is with a dma_inherit()-like API which
> would allow dwc3's glue layers ((a) and (b) above) to initialize child's
> (dwc3) DMA configuration during child's creation. Something like below:
> 
> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
> index adc1e8a624cb..74b599269e2c 100644
> --- a/drivers/usb/dwc3/dwc3-pci.c
> +++ b/drivers/usb/dwc3/dwc3-pci.c
> @@ -152,6 +152,8 @@ static int dwc3_pci_probe(struct pci_dev *pci,
>  		return -ENOMEM;
>  	}
>  
> +	dma_inherit(&dwc->dev, dev);
> +
>  	memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));
>  
>  	res[0].start	= pci_resource_start(pci, 0);
> 
> that's all I'm asking for :-) dma_inherit() should, probably, be
> arch-specific to handle details like IOMMU (which today sits under
> archdata).

It's still wrong: the archdata in the device can contain all sorts of
additional information about how to do DMA, and some of that information
is bus specific, e.g. when your dma_map_ops look like the on sparc:

static inline struct dma_map_ops *get_dma_ops(struct device *dev)
{
#ifdef CONFIG_SPARC_LEON
        if (sparc_cpu_model == sparc_leon)
                return leon_dma_ops;
#endif
#if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
        if (dev->bus == &pci_bus_type)
                return &pci32_dma_ops;
#endif
        return dma_ops;
}

There is no way for a device to "inherit" the bus_type of its parent,
so it becomes an architecture specific hack. However, if you change
all the dma operations to work on the actual device that the dma
mapping API knows about, it just works.

I've looked at the usb HCD code now and see this:

struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver,
                struct device *dev, const char *bus_name,
                struct usb_hcd *primary_hcd)
{
	...
        hcd->self.controller = dev;
        hcd->self.uses_dma = (dev->dma_mask != NULL);
	...
}

What I think we need to do here is ensure that the device that gets
passed here and assigned to hcd->self.controller is the actual DMA
master device, i.e. the pci_device or platform_device that was created
from outside of the xhci stack. This is after all the pointer that
gets passed into all the dma_map_*/dma_sync_*/dma_alloc_*/...
functions.

	Arnd
Alan Stern April 27, 2016, 5:59 p.m. UTC | #2
On Wed, 27 Apr 2016, Arnd Bergmann wrote:

> I've looked at the usb HCD code now and see this:
> 
> struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver,
>                 struct device *dev, const char *bus_name,
>                 struct usb_hcd *primary_hcd)
> {
> 	...
>         hcd->self.controller = dev;
>         hcd->self.uses_dma = (dev->dma_mask != NULL);
> 	...
> }
> 
> What I think we need to do here is ensure that the device that gets
> passed here and assigned to hcd->self.controller is the actual DMA
> master device, i.e. the pci_device or platform_device that was created
> from outside of the xhci stack. This is after all the pointer that
> gets passed into all the dma_map_*/dma_sync_*/dma_alloc_*/...
> functions.

It would be better to add a new field, since self.controller is also
used for lots of other purposes.  Something like hcd->self.dma_dev.

Alan Stern
Arnd Bergmann April 27, 2016, 6:08 p.m. UTC | #3
On Wednesday 27 April 2016 13:59:13 Alan Stern wrote:
> On Wed, 27 Apr 2016, Arnd Bergmann wrote:
> 
> > I've looked at the usb HCD code now and see this:
> > 
> > struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver,
> >                 struct device *dev, const char *bus_name,
> >                 struct usb_hcd *primary_hcd)
> > {
> >       ...
> >         hcd->self.controller = dev;
> >         hcd->self.uses_dma = (dev->dma_mask != NULL);
> >       ...
> > }
> > 
> > What I think we need to do here is ensure that the device that gets
> > passed here and assigned to hcd->self.controller is the actual DMA
> > master device, i.e. the pci_device or platform_device that was created
> > from outside of the xhci stack. This is after all the pointer that
> > gets passed into all the dma_map_*/dma_sync_*/dma_alloc_*/...
> > functions.
> 
> It would be better to add a new field, since self.controller is also
> used for lots of other purposes.  Something like hcd->self.dma_dev.
> 

Ok, fair enough. I only took a brief look and all uses I found were
either for the DMA mapping API or some printk logging.

	Arnd
Felipe Balbi April 27, 2016, 8:05 p.m. UTC | #4
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Wednesday 27 April 2016 13:59:13 Alan Stern wrote:
>> On Wed, 27 Apr 2016, Arnd Bergmann wrote:
>> 
>> > I've looked at the usb HCD code now and see this:
>> > 
>> > struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver,
>> >                 struct device *dev, const char *bus_name,
>> >                 struct usb_hcd *primary_hcd)
>> > {
>> >       ...
>> >         hcd->self.controller = dev;
>> >         hcd->self.uses_dma = (dev->dma_mask != NULL);
>> >       ...
>> > }
>> > 
>> > What I think we need to do here is ensure that the device that gets
>> > passed here and assigned to hcd->self.controller is the actual DMA
>> > master device, i.e. the pci_device or platform_device that was created
>> > from outside of the xhci stack. This is after all the pointer that
>> > gets passed into all the dma_map_*/dma_sync_*/dma_alloc_*/...
>> > functions.
>> 
>> It would be better to add a new field, since self.controller is also
>> used for lots of other purposes.  Something like hcd->self.dma_dev.
>
> Ok, fair enough. I only took a brief look and all uses I found were
> either for the DMA mapping API or some printk logging.

I have a feeling you guys are not considering how the patch to implement
this will look like.

How are you expecting dwc3 to pass a pointer to the DMA device from
dwc3.ko to xhci-plat ? platform_data ? That's gonna be horrible :-)

Also, remember that the DMA device for dwc3 is not always
dwc3->dev->parent. It might be dwc3->dev itself. How are you expecting
us to figure that one out ?

I still think dma_inherit() (or something along those lines) is
necessary. Specially when you consider that, as I said previously,
that's pretty much what of_dma_configure() does.

Anyway, I'd really like to see a patch implementing this
hcd->self.dma_dev logic. Consider all the duplication with this
approach, btw. struct dwc3 will *also* need a dwc->dma_dev of its
own. Will that be passed to dwc3 as platform_data from glue layer ? What
about platforms which don't even use a glue layer ?
Felipe Balbi April 27, 2016, 8:57 p.m. UTC | #5
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Wednesday 27 April 2016 19:53:51 Felipe Balbi wrote:
>> Arnd Bergmann <arnd@arndb.de> writes:
>> > On Wednesday 27 April 2016 16:50:19 Catalin Marinas wrote:
>> >> On Wed, Apr 27, 2016 at 04:11:17PM +0200, Arnd Bergmann wrote:
>> >> > On Wednesday 27 April 2016 14:59:00 Catalin Marinas wrote:
>> >> > > 
>> >> > > I would be in favour of a dma_inherit() function as well. We could hack
>> >> > > something up in the arch code (like below) but I would rather prefer an
>> >> > > explicit dma_inherit() call by drivers creating such devices.
>> >> > > 
>> >> > > diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
>> >> > > index ba437f090a74..ea6fb9b0e8fa 100644
>> >> > > --- a/arch/arm64/include/asm/dma-mapping.h
>> >> > > +++ b/arch/arm64/include/asm/dma-mapping.h
>> >> > > @@ -29,8 +29,11 @@ extern struct dma_map_ops dummy_dma_ops;
>> >> > >  
>> >> > >  static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
>> >> > >  {
>> >> > > -       if (dev && dev->archdata.dma_ops)
>> >> > > -               return dev->archdata.dma_ops;
>> >> > > +       while (dev) {
>> >> > > +               if (dev->archdata.dma_ops)
>> >> > > +                       return dev->archdata.dma_ops;
>> >> > > +               dev = dev->parent;
>> >> > > +       }
>> >> > 
>> >> > I think this would be a very bad idea: we don't want to have random
>> >> > devices be able to perform DMA just because their parent devices
>> >> > have been set up that way.
>> >> 
>> >> I agree, it's a big hack. It would be nice to have a simpler way to do
>> >> this in driver code rather than explicitly calling
>> >> of_dma_configure/arch_setup_dma_ops as per the original patch in this
>> >> thread.
>> >
>> > I haven't followed the entire discussion, but what's wrong with passing
>> > around a pointer to a 'struct device *hwdev' that represents the physical
>> > device that does the DMA?
>> 
>> that will likely create duplicated solutions in several drivers and
>> it'll be a pain to maintain. There's another complication, dwc3 can be
>> integrated in many different ways. See the device child-parent tree
>> representations below:
>> 
>> a) with a parent PCI device:
>> 
>> pci_bus_type
>>  - dwc3-pci
>>    - dwc3
>>      - xhci-plat
>> 
>> b) with a parent platform_device (OF):
>> 
>> platform_bus_type
>>  - dwc3-${omap,st,of-simple,exynos,keystone}
>>    - dwc3
>>      - xhci-plat
>> 
>> c) without a parent at all (thanks Grygorii):
>> 
>> platform_bus_type
>>  - dwc3
>>    - xhci-plat
>> 
>> (a) and (b) above are the common cases. The DMA-capable device is
>> clearly dwc3-${pci,omap,st,of-simple,exynos,keystone} with dwc3 only
>> having proper DMA configuration in OF platforms (because of the
>> unconditional of_dma_configure() during OF device creation) and
>> xhci-plat not knowing about DMA at all and hardcoding some crappy
>> defaults.
>> 
>> (c) is the uncommon case which creates some problems. In this case, dwc3
>> itself is the DMA-capable device and dwc3->dev->parent is the
>> platform_bus_type itself. Now consider the problem this creates:
>> 
>> i. the patch that I wrote [1] becomes invalid for (c), thanks to
>> Grygorii for pointing this out before it was too late.
>> 
>> ii. xhci-plat can also be described directly in DT (and is in some
>> cases). This means that assuming xhci-plat's parent's parent to be the
>> DMA-capable device is also an invalid assumption.
>> 
>> iii. one might argue that for DT-based platforms *with* a glue layer
>> ((b) above), OF already "copies" some sensible DMA defaults during
>> device creation.
>
> But that is exactly the point I was trying to make: instead of copying
> all the data into the xhci-plat device, just assign one pointer
> inside the xhci-plat device from whoever creates it.

and how do you pass that pointer to xhci-plat from another driver ?
No, platform_data is not an option ;-)

>> The only clean way to fix this up is with a dma_inherit()-like API which
>> would allow dwc3's glue layers ((a) and (b) above) to initialize child's
>> (dwc3) DMA configuration during child's creation. Something like below:
>> 
>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>> index adc1e8a624cb..74b599269e2c 100644
>> --- a/drivers/usb/dwc3/dwc3-pci.c
>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>> @@ -152,6 +152,8 @@ static int dwc3_pci_probe(struct pci_dev *pci,
>>  		return -ENOMEM;
>>  	}
>>  
>> +	dma_inherit(&dwc->dev, dev);
>> +
>>  	memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));
>>  
>>  	res[0].start	= pci_resource_start(pci, 0);
>> 
>> that's all I'm asking for :-) dma_inherit() should, probably, be
>> arch-specific to handle details like IOMMU (which today sits under
>> archdata).
>
> It's still wrong: the archdata in the device can contain all sorts of
> additional information about how to do DMA, and some of that information

yes, inherit all of that ;-)

> is bus specific, e.g. when your dma_map_ops look like the on sparc:

okay, let me be clear. The underlying bus is still pci_bus_type or
platform_bus_type; that won't change.

> static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> {
> #ifdef CONFIG_SPARC_LEON
>         if (sparc_cpu_model == sparc_leon)
>                 return leon_dma_ops;
> #endif
> #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
>         if (dev->bus == &pci_bus_type)
>                 return &pci32_dma_ops;
> #endif
>         return dma_ops;
> }

this seems to be a rather fragile assumption, no ? Only works because
*_bus_type declarations are exported. I wonder if this is the only
reason *why* they are exported, probably not...

> There is no way for a device to "inherit" the bus_type of its parent,

I don't want to inherit bus_type, I just want DMA configuration to not
depend on bus_type directly ;-)
Arnd Bergmann April 27, 2016, 9:05 p.m. UTC | #6
On Wednesday 27 April 2016 23:05:42 Felipe Balbi wrote:
> Arnd Bergmann <arnd@arndb.de> writes:
> > On Wednesday 27 April 2016 13:59:13 Alan Stern wrote:
> >> On Wed, 27 Apr 2016, Arnd Bergmann wrote:
> >> 
> >> > I've looked at the usb HCD code now and see this:
> >> > 
> >> > struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver,
> >> >                 struct device *dev, const char *bus_name,
> >> >                 struct usb_hcd *primary_hcd)
> >> > {
> >> >       ...
> >> >         hcd->self.controller = dev;
> >> >         hcd->self.uses_dma = (dev->dma_mask != NULL);
> >> >       ...
> >> > }
> >> > 
> >> > What I think we need to do here is ensure that the device that gets
> >> > passed here and assigned to hcd->self.controller is the actual DMA
> >> > master device, i.e. the pci_device or platform_device that was created
> >> > from outside of the xhci stack. This is after all the pointer that
> >> > gets passed into all the dma_map_*/dma_sync_*/dma_alloc_*/...
> >> > functions.
> >> 
> >> It would be better to add a new field, since self.controller is also
> >> used for lots of other purposes.  Something like hcd->self.dma_dev.
> >
> > Ok, fair enough. I only took a brief look and all uses I found were
> > either for the DMA mapping API or some printk logging.
> 
> I have a feeling you guys are not considering how the patch to implement
> this will look like.
> 
> How are you expecting dwc3 to pass a pointer to the DMA device from
> dwc3.ko to xhci-plat ? platform_data ? That's gonna be horrible 

Not any worse than it already is really. It already uses platform_data
for the exact case that is causing the problem here.

> Also, remember that the DMA device for dwc3 is not always
> dwc3->dev->parent. It might be dwc3->dev itself. How are you expecting
> us to figure that one out ?

Do you have an example for this? The ones I found here either
create the dwc3 device from PCI or from a platform glue driver.

> I still think dma_inherit() (or something along those lines) is
> necessary. Specially when you consider that, as I said previously,
> that's pretty much what of_dma_configure() does.

As I said, this is not an API that can work in general, because
it makes the assumption that everything related to DMA in a
device can be set in that device itself.

The simplest case where this does not work is a PCI device behind
an IOMMU: when you call dma_map_single() or dma_alloc_coherent(),
the dma_map_ops implementation for the IOMMU has to look at the
PCI device to find out the association with an IOMMU context,
and on most architectures you cannot bind an IOMMU context to
a platform device at all.

> Anyway, I'd really like to see a patch implementing this
> hcd->self.dma_dev logic. Consider all the duplication with this
> approach, btw. struct dwc3 will *also* need a dwc->dma_dev of its
> own. Will that be passed to dwc3 as platform_data from glue layer ? What
> about platforms which don't even use a glue layer ?

Let's separate the three problems here.

a) dwc3 creating a "xhci-hcd" platform_device that is not connected
   to any proper bus. We can work around that by adding the "self.dma_dev"
   pointer and pass that in platform_data. This is really easy, it's
   actually less code (and less iffy) than the current implementation of
   copying the parent dma mask.
   In the long run, this could be solved by doing away with the extra
   platform_device, by calling a variant of xhci_probe() from
   xhci_plat_probe() like we do for the normal *HCI drivers.

b) dwc3-pci creating a "dwc3" platform_device under the covers. This
   is pretty much the exact same problem as a) on another layer. In
   the short run, we can pass the device pointer as part of
   struct dwc3_platform_data (dwc3-pci is the only such user anway),
   and in the long run, it should be easy enough to get rid of the
   extra platform device by just calling a variant of dwc3_probe,
   which will again simplify the driver

c) some SoCs that have two separate device nodes to describe their
   dwc3 xhci. I don't think this is causing any additional problems,
   but if we want to make this behave more like other drivers in the
   long run or deal with machines that are missing a "dma-ranges"
   property in the parent node, we can kill off the
   of_platform_populate() hack and instead call dwc3_probe()
   directly from the glue drivers as in b), and have that
   do for_each_child_of_node() or similar to find the child node.

	Arnd
Felipe Balbi April 28, 2016, 6:37 a.m. UTC | #7
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Wednesday 27 April 2016 23:05:42 Felipe Balbi wrote:
>> Arnd Bergmann <arnd@arndb.de> writes:
>> > On Wednesday 27 April 2016 13:59:13 Alan Stern wrote:
>> >> On Wed, 27 Apr 2016, Arnd Bergmann wrote:
>> >> 
>> >> > I've looked at the usb HCD code now and see this:
>> >> > 
>> >> > struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver,
>> >> >                 struct device *dev, const char *bus_name,
>> >> >                 struct usb_hcd *primary_hcd)
>> >> > {
>> >> >       ...
>> >> >         hcd->self.controller = dev;
>> >> >         hcd->self.uses_dma = (dev->dma_mask != NULL);
>> >> >       ...
>> >> > }
>> >> > 
>> >> > What I think we need to do here is ensure that the device that gets
>> >> > passed here and assigned to hcd->self.controller is the actual DMA
>> >> > master device, i.e. the pci_device or platform_device that was created
>> >> > from outside of the xhci stack. This is after all the pointer that
>> >> > gets passed into all the dma_map_*/dma_sync_*/dma_alloc_*/...
>> >> > functions.
>> >> 
>> >> It would be better to add a new field, since self.controller is also
>> >> used for lots of other purposes.  Something like hcd->self.dma_dev.
>> >
>> > Ok, fair enough. I only took a brief look and all uses I found were
>> > either for the DMA mapping API or some printk logging.
>> 
>> I have a feeling you guys are not considering how the patch to implement
>> this will look like.
>> 
>> How are you expecting dwc3 to pass a pointer to the DMA device from
>> dwc3.ko to xhci-plat ? platform_data ? That's gonna be horrible 
>
> Not any worse than it already is really. It already uses platform_data
> for the exact case that is causing the problem here.

there's no use of platform_data for passing around DMA configuration. By
platform_data I really mean platform_device_add_data().

>> Also, remember that the DMA device for dwc3 is not always
>> dwc3->dev->parent. It might be dwc3->dev itself. How are you expecting
>> us to figure that one out ?
>
> Do you have an example for this? The ones I found here either
> create the dwc3 device from PCI or from a platform glue driver.

arch/arm64/boot/dts/xilinx/zynqmp.dtsi

>> I still think dma_inherit() (or something along those lines) is
>> necessary. Specially when you consider that, as I said previously,
>> that's pretty much what of_dma_configure() does.
>
> As I said, this is not an API that can work in general, because
> it makes the assumption that everything related to DMA in a
> device can be set in that device itself.
>
> The simplest case where this does not work is a PCI device behind
> an IOMMU: when you call dma_map_single() or dma_alloc_coherent(),
> the dma_map_ops implementation for the IOMMU has to look at the
> PCI device to find out the association with an IOMMU context,
> and on most architectures you cannot bind an IOMMU context to
> a platform device at all.

no, it relies on dev->archdata for IOMMU. In fact, the first "patch"
(more of a hack) I wrote to fix IOMMU with dwc3 on Intel platforms was
to literally memcpy() pci's archdata to dwc3->dev and it worked just
fine with and without IOMMU enabled.

>> Anyway, I'd really like to see a patch implementing this
>> hcd->self.dma_dev logic. Consider all the duplication with this
>> approach, btw. struct dwc3 will *also* need a dwc->dma_dev of its
>> own. Will that be passed to dwc3 as platform_data from glue layer ? What
>> about platforms which don't even use a glue layer ?
>
> Let's separate the three problems here.
>
> a) dwc3 creating a "xhci-hcd" platform_device that is not connected
>    to any proper bus. We can work around that by adding the "self.dma_dev"

platform_bus_type *is* a proper bus.

>    pointer and pass that in platform_data. This is really easy, it's

Sorry but passing a struct device pointer in platform_data is
ridiculous. Not to mention that, as I said before, we can't assume which
device to pass to xhci_plat in the first place. It might be dwc->dev and
it might be dwc->dev->parent.

>    actually less code (and less iffy) than the current implementation of
>    copying the parent dma mask.
>    In the long run, this could be solved by doing away with the extra
>    platform_device, by calling a variant of xhci_probe() from
>    xhci_plat_probe() like we do for the normal *HCI drivers.

no, that's not something I'll do for dwc3. We have had this talk before
and I'm not giving up the benefits of splitting things to separate
devices.

> b) dwc3-pci creating a "dwc3" platform_device under the covers. This

it's not under the covers at all. It's pretty similar to what MFD
drivers do. It's just not wrapped in a nice API because there's no need
for that.

>    is pretty much the exact same problem as a) on another layer. In
>    the short run, we can pass the device pointer as part of
>    struct dwc3_platform_data (dwc3-pci is the only such user anway),

It's incredible that you'd even suggest this at all. Did we not have a
big trouble to solve on ARM land because of different mach-* passing
function pointers and other pointers from arch/arch/mach-* instead of
abstracting things away. Then came DT to the rescue, a setup where you
can't even pass code or kernel objects ;-)

>    and in the long run, it should be easy enough to get rid of the
>    extra platform device by just calling a variant of dwc3_probe,
>    which will again simplify the driver

also again definitely not something I'll do

> c) some SoCs that have two separate device nodes to describe their
>    dwc3 xhci. I don't think this is causing any additional problems,
>    but if we want to make this behave more like other drivers in the
>    long run or deal with machines that are missing a "dma-ranges"
>    property in the parent node, we can kill off the
>    of_platform_populate() hack and instead call dwc3_probe()
>    directly from the glue drivers as in b), and have that
>    do for_each_child_of_node() or similar to find the child node.

no, we cannot.
Russell King - ARM Linux April 28, 2016, 2:16 p.m. UTC | #8
On Thu, Apr 28, 2016 at 09:37:08AM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Arnd Bergmann <arnd@arndb.de> writes:
> >    pointer and pass that in platform_data. This is really easy, it's
> 
> Sorry but passing a struct device pointer in platform_data is
> ridiculous. Not to mention that, as I said before, we can't assume which
> device to pass to xhci_plat in the first place. It might be dwc->dev and
> it might be dwc->dev->parent.

+1.  Passing an unref-counted struct device through platform data is
totally mad, Arnd you're off your rocker if you think that's a good
idea.  What's more is that there's no way to properly refcount the
thing.
Arnd Bergmann April 28, 2016, 2:23 p.m. UTC | #9
On Thursday 28 April 2016 15:16:12 Russell King - ARM Linux wrote:
> On Thu, Apr 28, 2016 at 09:37:08AM +0300, Felipe Balbi wrote:
> > 
> > Hi,
> > 
> > Arnd Bergmann <arnd@arndb.de> writes:
> > >    pointer and pass that in platform_data. This is really easy, it's
> > 
> > Sorry but passing a struct device pointer in platform_data is
> > ridiculous. Not to mention that, as I said before, we can't assume which
> > device to pass to xhci_plat in the first place. It might be dwc->dev and
> > it might be dwc->dev->parent.
> 
> +1.  Passing an unref-counted struct device through platform data is
> totally mad, Arnd you're off your rocker if you think that's a good
> idea.  What's more is that there's no way to properly refcount the
> thing.

It's the parent device (or NULL), there is no way it can ever go away as
it's already refcounted through the device subsystem by the creation
of the child device.

I do realize that it's a hack, but the idea is to get rid of that
as soon as possibly by fixing the way the xhci device is probe so
we no longer need to fake a platform_device as the child here and
can just use the device itself.

	Arnd
Felipe Balbi April 28, 2016, 2:27 p.m. UTC | #10
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Thursday 28 April 2016 15:16:12 Russell King - ARM Linux wrote:
>> On Thu, Apr 28, 2016 at 09:37:08AM +0300, Felipe Balbi wrote:
>> > 
>> > Hi,
>> > 
>> > Arnd Bergmann <arnd@arndb.de> writes:
>> > >    pointer and pass that in platform_data. This is really easy, it's
>> > 
>> > Sorry but passing a struct device pointer in platform_data is
>> > ridiculous. Not to mention that, as I said before, we can't assume which
>> > device to pass to xhci_plat in the first place. It might be dwc->dev and
>> > it might be dwc->dev->parent.
>> 
>> +1.  Passing an unref-counted struct device through platform data is
>> totally mad, Arnd you're off your rocker if you think that's a good
>> idea.  What's more is that there's no way to properly refcount the
>> thing.
>
> It's the parent device (or NULL), there is no way it can ever go away as
> it's already refcounted through the device subsystem by the creation
> of the child device.

you're assuming that based on what we have today. We could get into a
situation where we need to use a completely unrelated device and the
problem exists again.

> I do realize that it's a hack, but the idea is to get rid of that
> as soon as possibly by fixing the way the xhci device is probe so
> we no longer need to fake a platform_device as the child here and
> can just use the device itself.

okay, let me try to be extra clear here:

We will *not* remove the extra platform_device because it actually
*does* exist and helps me hide/abstract a bunch of details and make
assumptions about order of certain events. We have already gone through
that in the past when I explained why I wrote dwc3 the way it is; if you
need a refresher, there are mailing list archives for that.

Moreover, this same problem exists for anything under drivers/mfd. It
just so happens that they're usually some i2c or spi device which don't
do DMA by themselves.
Yang Li Sept. 1, 2016, 10:14 p.m. UTC | #11
On Thu, Apr 28, 2016 at 9:27 AM, Felipe Balbi <balbi@kernel.org> wrote:
>
> Hi,
>
> Arnd Bergmann <arnd@arndb.de> writes:
>> On Thursday 28 April 2016 15:16:12 Russell King - ARM Linux wrote:
>>> On Thu, Apr 28, 2016 at 09:37:08AM +0300, Felipe Balbi wrote:
>>> >
>>> > Hi,
>>> >
>>> > Arnd Bergmann <arnd@arndb.de> writes:
>>> > >    pointer and pass that in platform_data. This is really easy, it's
>>> >
>>> > Sorry but passing a struct device pointer in platform_data is
>>> > ridiculous. Not to mention that, as I said before, we can't assume which
>>> > device to pass to xhci_plat in the first place. It might be dwc->dev and
>>> > it might be dwc->dev->parent.
>>>
>>> +1.  Passing an unref-counted struct device through platform data is
>>> totally mad, Arnd you're off your rocker if you think that's a good
>>> idea.  What's more is that there's no way to properly refcount the
>>> thing.
>>
>> It's the parent device (or NULL), there is no way it can ever go away as
>> it's already refcounted through the device subsystem by the creation
>> of the child device.
>
> you're assuming that based on what we have today. We could get into a
> situation where we need to use a completely unrelated device and the
> problem exists again.
>
>> I do realize that it's a hack, but the idea is to get rid of that
>> as soon as possibly by fixing the way the xhci device is probe so
>> we no longer need to fake a platform_device as the child here and
>> can just use the device itself.
>
> okay, let me try to be extra clear here:
>
> We will *not* remove the extra platform_device because it actually
> *does* exist and helps me hide/abstract a bunch of details and make
> assumptions about order of certain events. We have already gone through
> that in the past when I explained why I wrote dwc3 the way it is; if you
> need a refresher, there are mailing list archives for that.
>
> Moreover, this same problem exists for anything under drivers/mfd. It
> just so happens that they're usually some i2c or spi device which don't
> do DMA by themselves.

Hi Felipe and Arnd,

It has been a while since the last response to this discussion, but we
haven't reached an agreement yet!  Can we get to a conclusion on if it
is valid to create child platform device for abstraction purpose?  If
yes, can this child device do DMA by itself?

- Leo
Arnd Bergmann Sept. 2, 2016, 10:43 a.m. UTC | #12
On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
> 
> Hi Felipe and Arnd,
> 
> It has been a while since the last response to this discussion, but we
> haven't reached an agreement yet!  Can we get to a conclusion on if it
> is valid to create child platform device for abstraction purpose?  If
> yes, can this child device do DMA by itself?

I'd say it's no problem for a driver to create child devices in order
to represent different aspects of a device, but you should not rely on
those devices working when used with the dma-mapping interfaces.

This used to be simpler back when we could configure the kernel for
only one SoC platform at a time, and the platforms could provide their
own overrides for the dma-mapping interfaces. These days, we rely on
firmware or bootloader to describe various aspects of how DMA is done,
so you can't assume that passing a device without an of_node pointer
or ACPI data into those functions will do the right thing.

	Arnd
Russell King (Oracle) Sept. 2, 2016, 10:47 a.m. UTC | #13
On Fri, Sep 02, 2016 at 12:43:39PM +0200, Arnd Bergmann wrote:
> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
> > 
> > Hi Felipe and Arnd,
> > 
> > It has been a while since the last response to this discussion, but we
> > haven't reached an agreement yet!  Can we get to a conclusion on if it
> > is valid to create child platform device for abstraction purpose?  If
> > yes, can this child device do DMA by itself?
> 
> I'd say it's no problem for a driver to create child devices in order
> to represent different aspects of a device, but you should not rely on
> those devices working when used with the dma-mapping interfaces.

That's absolutely right.  Consider the USB model - only the USB host
controller can perform DMA, not the USB devices themselves.  All DMA
mappings need to be mapped using the USB host controller device struct
not the USB device struct.

The same _should_ be true everywhere else: the struct device representing
the device performing DMA must be the one used to map the transfer.
Felipe Balbi Sept. 2, 2016, 10:53 a.m. UTC | #14
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>> 
>> Hi Felipe and Arnd,
>> 
>> It has been a while since the last response to this discussion, but we
>> haven't reached an agreement yet!  Can we get to a conclusion on if it
>> is valid to create child platform device for abstraction purpose?  If
>> yes, can this child device do DMA by itself?
>
> I'd say it's no problem for a driver to create child devices in order
> to represent different aspects of a device, but you should not rely on
> those devices working when used with the dma-mapping interfaces.

heh, that looks like an excuse to me :-)

This will always be a problem for e.g. MFD, for example. Are you saying
MFD child-devices shouldn't be allowed to do DMA? It becomes silly when
you read it that way, right?

> This used to be simpler back when we could configure the kernel for
> only one SoC platform at a time, and the platforms could provide their
> own overrides for the dma-mapping interfaces. These days, we rely on

right, so we have a very old regression that just took a complex driver
such as dwc3 to trigger ;-)

> firmware or bootloader to describe various aspects of how DMA is done,

there's no DMA description in DT. Every OF device gets the same 32-bit
DMA mask and that is, itself, wrong for several devices.

> so you can't assume that passing a device without an of_node pointer
> or ACPI data into those functions will do the right thing.

That's not the problem, however. We can very easily pass along
ACPI_COMPANION() to any platform_device we want, but that's not enough
because DMA-related bits are passed along with archdata; but archdata
isn't generic in any way. Some arches (like x86) _do_ use it for DMA,
but some don't.
Felipe Balbi Sept. 2, 2016, 11:08 a.m. UTC | #15
Hi,

Russell King - ARM Linux <linux@armlinux.org.uk> writes:
> On Fri, Sep 02, 2016 at 12:43:39PM +0200, Arnd Bergmann wrote:
>> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>> > 
>> > Hi Felipe and Arnd,
>> > 
>> > It has been a while since the last response to this discussion, but we
>> > haven't reached an agreement yet!  Can we get to a conclusion on if it
>> > is valid to create child platform device for abstraction purpose?  If
>> > yes, can this child device do DMA by itself?
>> 
>> I'd say it's no problem for a driver to create child devices in order
>> to represent different aspects of a device, but you should not rely on
>> those devices working when used with the dma-mapping interfaces.
>
> That's absolutely right.  Consider the USB model - only the USB host
> controller can perform DMA, not the USB devices themselves.  All DMA
> mappings need to be mapped using the USB host controller device struct
> not the USB device struct.
>
> The same _should_ be true everywhere else: the struct device representing
> the device performing DMA must be the one used to map the transfer.

How do we fix dwc3 in dual-role, then?

Peripheral-side dwc3 is easy, we just require a glue-layer to be present
and use dwc3.ko's parent device (which will be the PCI device or OF
device). But for host side dwc3, the problem is slightly more complex
because we're using xhci-plat.ko by just instantiating a xhci-platform
device so xhci-plat can probe.

xhci core has no means to know if its own device or the parent of its
parent should be used for DMA. Any ideas?
Robin Murphy Sept. 2, 2016, 11:55 a.m. UTC | #16
On 02/09/16 11:53, Felipe Balbi wrote:
> 
> Hi,
> 
> Arnd Bergmann <arnd@arndb.de> writes:
>> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>>>
>>> Hi Felipe and Arnd,
>>>
>>> It has been a while since the last response to this discussion, but we
>>> haven't reached an agreement yet!  Can we get to a conclusion on if it
>>> is valid to create child platform device for abstraction purpose?  If
>>> yes, can this child device do DMA by itself?
>>
>> I'd say it's no problem for a driver to create child devices in order
>> to represent different aspects of a device, but you should not rely on
>> those devices working when used with the dma-mapping interfaces.
> 
> heh, that looks like an excuse to me :-)
> 
> This will always be a problem for e.g. MFD, for example. Are you saying
> MFD child-devices shouldn't be allowed to do DMA? It becomes silly when
> you read it that way, right?
> 
>> This used to be simpler back when we could configure the kernel for
>> only one SoC platform at a time, and the platforms could provide their
>> own overrides for the dma-mapping interfaces. These days, we rely on
> 
> right, so we have a very old regression that just took a complex driver
> such as dwc3 to trigger ;-)
> 
>> firmware or bootloader to describe various aspects of how DMA is done,
> 
> there's no DMA description in DT. Every OF device gets the same 32-bit
> DMA mask and that is, itself, wrong for several devices.

Huh? There's only no DMA description in DT if the device can be assumed
to be happy with the defaults. Anything else should be using
"dma-ranges", "dma-coherent", etc. to describe non-default integration
aspects. For devices with an inherent fixed addressing capability !=32
bits, then it's down to the driver to call dma_set_mask() appropriately
to override the default 32-bit mask (which is not unique to OF-probed
devices either).

Sure, it's by no means a perfect API, but you're railing against
untruths here.

Robin.

>> so you can't assume that passing a device without an of_node pointer
>> or ACPI data into those functions will do the right thing.
> 
> That's not the problem, however. We can very easily pass along
> ACPI_COMPANION() to any platform_device we want, but that's not enough
> because DMA-related bits are passed along with archdata; but archdata
> isn't generic in any way. Some arches (like x86) _do_ use it for DMA,
> but some don't.
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Felipe Balbi Sept. 2, 2016, 12:56 p.m. UTC | #17
Hi,

Robin Murphy <robin.murphy@arm.com> writes:
>>>> It has been a while since the last response to this discussion, but we
>>>> haven't reached an agreement yet!  Can we get to a conclusion on if it
>>>> is valid to create child platform device for abstraction purpose?  If
>>>> yes, can this child device do DMA by itself?
>>>
>>> I'd say it's no problem for a driver to create child devices in order
>>> to represent different aspects of a device, but you should not rely on
>>> those devices working when used with the dma-mapping interfaces.
>> 
>> heh, that looks like an excuse to me :-)
>> 
>> This will always be a problem for e.g. MFD, for example. Are you saying
>> MFD child-devices shouldn't be allowed to do DMA? It becomes silly when
>> you read it that way, right?
>> 
>>> This used to be simpler back when we could configure the kernel for
>>> only one SoC platform at a time, and the platforms could provide their
>>> own overrides for the dma-mapping interfaces. These days, we rely on
>> 
>> right, so we have a very old regression that just took a complex driver
>> such as dwc3 to trigger ;-)
>> 
>>> firmware or bootloader to describe various aspects of how DMA is done,
>> 
>> there's no DMA description in DT. Every OF device gets the same 32-bit
>> DMA mask and that is, itself, wrong for several devices.
>
> Huh? There's only no DMA description in DT if the device can be assumed
> to be happy with the defaults. Anything else should be using
> "dma-ranges", "dma-coherent", etc. to describe non-default integration

heh, guilty as charged. I never noticed we had dma-ranges or
dma-coherent.
Arnd Bergmann Sept. 2, 2016, 1:10 p.m. UTC | #18
On Friday, September 2, 2016 12:55:33 PM CEST Robin Murphy wrote:
> 
> Huh? There's only no DMA description in DT if the device can be assumed
> to be happy with the defaults. Anything else should be using
> "dma-ranges", "dma-coherent", etc. to describe non-default integration
> aspects. For devices with an inherent fixed addressing capability !=32
> bits, then it's down to the driver to call dma_set_mask() appropriately
> to override the default 32-bit mask (which is not unique to OF-probed
> devices either).

The iommu configuration would be the main other one worth mentioning.

Note that there is a known bug with dma_set_mask(), which always succeeds
at the moment, even if the dma-ranges limit the possible addresses
in a way that should fail.

	Arnd
Felipe Balbi Sept. 2, 2016, 2:11 p.m. UTC | #19
Hi,

Felipe Balbi <balbi@kernel.org> writes:
> Hi,
>
> Russell King - ARM Linux <linux@armlinux.org.uk> writes:
>> On Fri, Sep 02, 2016 at 12:43:39PM +0200, Arnd Bergmann wrote:
>>> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>>> > 
>>> > Hi Felipe and Arnd,
>>> > 
>>> > It has been a while since the last response to this discussion, but we
>>> > haven't reached an agreement yet!  Can we get to a conclusion on if it
>>> > is valid to create child platform device for abstraction purpose?  If
>>> > yes, can this child device do DMA by itself?
>>> 
>>> I'd say it's no problem for a driver to create child devices in order
>>> to represent different aspects of a device, but you should not rely on
>>> those devices working when used with the dma-mapping interfaces.
>>
>> That's absolutely right.  Consider the USB model - only the USB host
>> controller can perform DMA, not the USB devices themselves.  All DMA
>> mappings need to be mapped using the USB host controller device struct
>> not the USB device struct.
>>
>> The same _should_ be true everywhere else: the struct device representing
>> the device performing DMA must be the one used to map the transfer.
>
> How do we fix dwc3 in dual-role, then?
>
> Peripheral-side dwc3 is easy, we just require a glue-layer to be present
> and use dwc3.ko's parent device (which will be the PCI device or OF
> device). But for host side dwc3, the problem is slightly more complex
> because we're using xhci-plat.ko by just instantiating a xhci-platform
> device so xhci-plat can probe.
>
> xhci core has no means to know if its own device or the parent of its
> parent should be used for DMA. Any ideas?

another thing to consider is that dwc3 only works on omap because DT
defaults to 32-bit DMA mask for anything described in DT that doesn't
provide dma-ranges. Isn't that somewhat odd as well?

Based on your reply, Russell, dwc3-omap should be the DMA device, but
dwc3 works just as well because of the whole 32-bit default.
Alan Stern Sept. 2, 2016, 2:21 p.m. UTC | #20
On Fri, 2 Sep 2016, Felipe Balbi wrote:

> Hi,
> 
> Russell King - ARM Linux <linux@armlinux.org.uk> writes:
> > On Fri, Sep 02, 2016 at 12:43:39PM +0200, Arnd Bergmann wrote:
> >> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
> >> > 
> >> > Hi Felipe and Arnd,
> >> > 
> >> > It has been a while since the last response to this discussion, but we
> >> > haven't reached an agreement yet!  Can we get to a conclusion on if it
> >> > is valid to create child platform device for abstraction purpose?  If
> >> > yes, can this child device do DMA by itself?
> >> 
> >> I'd say it's no problem for a driver to create child devices in order
> >> to represent different aspects of a device, but you should not rely on
> >> those devices working when used with the dma-mapping interfaces.
> >
> > That's absolutely right.  Consider the USB model - only the USB host
> > controller can perform DMA, not the USB devices themselves.  All DMA
> > mappings need to be mapped using the USB host controller device struct
> > not the USB device struct.
> >
> > The same _should_ be true everywhere else: the struct device representing
> > the device performing DMA must be the one used to map the transfer.
> 
> How do we fix dwc3 in dual-role, then?
> 
> Peripheral-side dwc3 is easy, we just require a glue-layer to be present
> and use dwc3.ko's parent device (which will be the PCI device or OF
> device). But for host side dwc3, the problem is slightly more complex
> because we're using xhci-plat.ko by just instantiating a xhci-platform
> device so xhci-plat can probe.
> 
> xhci core has no means to know if its own device or the parent of its
> parent should be used for DMA. Any ideas?

In theory, you can store a flag somewhere in the platform device,
something that would tell xhci-hcd that it has to use the parent's
parent for DMA purposes.

I know it would be somewhat of a hack, but ought to work.

Alan Stern
Arnd Bergmann Sept. 2, 2016, 3:51 p.m. UTC | #21
On Friday, September 2, 2016 10:21:23 AM CEST Alan Stern wrote:
> On Fri, 2 Sep 2016, Felipe Balbi wrote:
> 
> > Hi,
> > 
> > Russell King - ARM Linux <linux@armlinux.org.uk> writes:
> > > On Fri, Sep 02, 2016 at 12:43:39PM +0200, Arnd Bergmann wrote:
> > >> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
> > >> > 
> > >> > Hi Felipe and Arnd,
> > >> > 
> > >> > It has been a while since the last response to this discussion, but we
> > >> > haven't reached an agreement yet!  Can we get to a conclusion on if it
> > >> > is valid to create child platform device for abstraction purpose?  If
> > >> > yes, can this child device do DMA by itself?
> > >> 
> > >> I'd say it's no problem for a driver to create child devices in order
> > >> to represent different aspects of a device, but you should not rely on
> > >> those devices working when used with the dma-mapping interfaces.
> > >
> > > That's absolutely right.  Consider the USB model - only the USB host
> > > controller can perform DMA, not the USB devices themselves.  All DMA
> > > mappings need to be mapped using the USB host controller device struct
> > > not the USB device struct.
> > >
> > > The same _should_ be true everywhere else: the struct device representing
> > > the device performing DMA must be the one used to map the transfer.
> > 
> > How do we fix dwc3 in dual-role, then?
> > 
> > Peripheral-side dwc3 is easy, we just require a glue-layer to be present
> > and use dwc3.ko's parent device (which will be the PCI device or OF
> > device). But for host side dwc3, the problem is slightly more complex
> > because we're using xhci-plat.ko by just instantiating a xhci-platform
> > device so xhci-plat can probe.
> > 
> > xhci core has no means to know if its own device or the parent of its
> > parent should be used for DMA. Any ideas?
> 
> In theory, you can store a flag somewhere in the platform device,
> something that would tell xhci-hcd that it has to use the parent's
> parent for DMA purposes.
> 
> I know it would be somewhat of a hack, but ought to work.

Speaking of that flag, I suppose we need the same logic to know where
to look for USB devices attached to a dwc3 host when we need to describe
them in DT. By default we look for child device nodes under the
node of the HCD device node, but that would be wrong here too.

	Arnd
Grygorii Strashko Sept. 2, 2016, 4:23 p.m. UTC | #22
On 09/02/2016 02:08 PM, Felipe Balbi wrote:
> 
> Hi,
> 
> Russell King - ARM Linux <linux@armlinux.org.uk> writes:
>> On Fri, Sep 02, 2016 at 12:43:39PM +0200, Arnd Bergmann wrote:
>>> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>>>>
>>>> Hi Felipe and Arnd,
>>>>
>>>> It has been a while since the last response to this discussion, but we
>>>> haven't reached an agreement yet!  Can we get to a conclusion on if it
>>>> is valid to create child platform device for abstraction purpose?  If
>>>> yes, can this child device do DMA by itself?
>>>
>>> I'd say it's no problem for a driver to create child devices in order
>>> to represent different aspects of a device, but you should not rely on
>>> those devices working when used with the dma-mapping interfaces.
>>
>> That's absolutely right.  Consider the USB model - only the USB host
>> controller can perform DMA, not the USB devices themselves.  All DMA
>> mappings need to be mapped using the USB host controller device struct
>> not the USB device struct.
>>
>> The same _should_ be true everywhere else: the struct device representing
>> the device performing DMA must be the one used to map the transfer.
> 
> How do we fix dwc3 in dual-role, then?
> 
> Peripheral-side dwc3 is easy, we just require a glue-layer to be present
> and use dwc3.ko's parent device (which will be the PCI device or OF
> device). But for host side dwc3, the problem is slightly more complex
> because we're using xhci-plat.ko by just instantiating a xhci-platform
> device so xhci-plat can probe.
> 
> xhci core has no means to know if its own device or the parent of its
> parent should be used for DMA. Any ideas?
> 

Wouldn't be possible to use dma_mask for such purposes?
Like, case 1:
 dwc3-omap (dma_mask=X) -> dwc3 (dma_mask = NULL) -> xhci-plat (NULL)
and then it might be possible to find proper parent by traversing DD
hierarchy.

or :
dwc3 (dma_mask = X) -> xhci-plat (NULL)

or :
xhci-plat (dma_mask = X)

of course, it might be needed to skip DMA configuration for devices
which parents have been configured for dma already (easy for xhci-plat,
but can be not easy for dwc3).

just thinking..

Also, I'd like to note that problem become more complex when scsi layer is used
on top USB ;(
Yang Li Sept. 2, 2016, 10:16 p.m. UTC | #23
On Fri, Sep 2, 2016 at 5:43 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>>
>> Hi Felipe and Arnd,
>>
>> It has been a while since the last response to this discussion, but we
>> haven't reached an agreement yet!  Can we get to a conclusion on if it
>> is valid to create child platform device for abstraction purpose?  If
>> yes, can this child device do DMA by itself?
>
> I'd say it's no problem for a driver to create child devices in order
> to represent different aspects of a device, but you should not rely on
> those devices working when used with the dma-mapping interfaces.
>
> This used to be simpler back when we could configure the kernel for
> only one SoC platform at a time, and the platforms could provide their
> own overrides for the dma-mapping interfaces. These days, we rely on
> firmware or bootloader to describe various aspects of how DMA is done,
> so you can't assume that passing a device without an of_node pointer
> or ACPI data into those functions will do the right thing.

Can we use the firmware or bootloader information to provide the
default dma-mapping attributes for devices that doesn't have an
of_node pointer or ACPI data?  This will at least restore what we had
previously provided .  I'm concerned that changing all the drivers
that are creating child device will be a big effort.  Like I mentioned
in another thread, there are many instances of platform_device_add()
under the drivers/ directory.

- Leo
Arnd Bergmann Sept. 5, 2016, 3:39 p.m. UTC | #24
On Friday, September 2, 2016 5:16:31 PM CEST Leo Li wrote:
> 
> Can we use the firmware or bootloader information to provide the
> default dma-mapping attributes for devices that doesn't have an
> of_node pointer or ACPI data?  This will at least restore what we had
> previously provided .  I'm concerned that changing all the drivers
> that are creating child device will be a big effort.  Like I mentioned
> in another thread, there are many instances of platform_device_add()
> under the drivers/ directory.

Fortunately, there are not too many drivers that call platform_device_add
*and* try to set up a dma mask for the child device:

git grep -wl dma_mask drivers | xargs grep -wl 'platform_device_\(add\|register\)'

drivers/base/platform.c
drivers/bcma/main.c
drivers/eisa/virtual_root.c
drivers/mfd/mfd-core.c
drivers/mfd/omap-usb-host.c
drivers/misc/mic/card/mic_x100.c
drivers/platform/goldfish/pdev_bus.c
drivers/ssb/main.c
drivers/usb/chipidea/core.c
drivers/usb/dwc3/dwc3-exynos.c
drivers/usb/dwc3/host.c
drivers/usb/gadget/udc/bdc/bdc_pci.c
drivers/usb/host/bcma-hcd.c
drivers/usb/host/fsl-mph-dr-of.c
drivers/usb/host/ssb-hcd.c
drivers/usb/misc/ftdi-elan.c
drivers/usb/musb/blackfin.c
drivers/usb/musb/musb_dsps.c
drivers/usb/musb/omap2430.c
drivers/usb/musb/ux500.c

Most of these are probably never used with any nonstandard
DMA settings (IOMMU, cache coherency, offset, ...).

One thing we could possibly do is to go through these and
replace the hardcoded dma mask setup with of_dma_configure()
in all cases in which we actually use DT for probing, which
should cover the interesting cases.

	Arnd
Peter Chen Sept. 6, 2016, 6:35 a.m. UTC | #25
On Mon, Sep 05, 2016 at 05:39:27PM +0200, Arnd Bergmann wrote:
> On Friday, September 2, 2016 5:16:31 PM CEST Leo Li wrote:
> > 
> > Can we use the firmware or bootloader information to provide the
> > default dma-mapping attributes for devices that doesn't have an
> > of_node pointer or ACPI data?  This will at least restore what we had
> > previously provided .  I'm concerned that changing all the drivers
> > that are creating child device will be a big effort.  Like I mentioned
> > in another thread, there are many instances of platform_device_add()
> > under the drivers/ directory.
> 
> Fortunately, there are not too many drivers that call platform_device_add
> *and* try to set up a dma mask for the child device:
> 
> git grep -wl dma_mask drivers | xargs grep -wl 'platform_device_\(add\|register\)'
> 
> drivers/base/platform.c
> drivers/bcma/main.c
> drivers/eisa/virtual_root.c
> drivers/mfd/mfd-core.c
> drivers/mfd/omap-usb-host.c
> drivers/misc/mic/card/mic_x100.c
> drivers/platform/goldfish/pdev_bus.c
> drivers/ssb/main.c
> drivers/usb/chipidea/core.c
> drivers/usb/dwc3/dwc3-exynos.c
> drivers/usb/dwc3/host.c
> drivers/usb/gadget/udc/bdc/bdc_pci.c
> drivers/usb/host/bcma-hcd.c
> drivers/usb/host/fsl-mph-dr-of.c
> drivers/usb/host/ssb-hcd.c
> drivers/usb/misc/ftdi-elan.c
> drivers/usb/musb/blackfin.c
> drivers/usb/musb/musb_dsps.c
> drivers/usb/musb/omap2430.c
> drivers/usb/musb/ux500.c
> 
> Most of these are probably never used with any nonstandard
> DMA settings (IOMMU, cache coherency, offset, ...).
> 
> One thing we could possibly do is to go through these and
> replace the hardcoded dma mask setup with of_dma_configure()
> in all cases in which we actually use DT for probing, which
> should cover the interesting cases.
> 

One case I am going to work is to let USB chipidea driver support iommu,
the chipidea core device is no of_node, and created by
platform_add_device on the runtime. Using of_dma_configure with parent
of_node is a solution from my point, like [1].

https://lkml.org/lkml/2016/2/22/7
Felipe Balbi Sept. 6, 2016, 6:40 a.m. UTC | #26
Hi,

Peter Chen <hzpeterchen@gmail.com> writes:
> On Mon, Sep 05, 2016 at 05:39:27PM +0200, Arnd Bergmann wrote:
>> On Friday, September 2, 2016 5:16:31 PM CEST Leo Li wrote:
>> > 
>> > Can we use the firmware or bootloader information to provide the
>> > default dma-mapping attributes for devices that doesn't have an
>> > of_node pointer or ACPI data?  This will at least restore what we had
>> > previously provided .  I'm concerned that changing all the drivers
>> > that are creating child device will be a big effort.  Like I mentioned
>> > in another thread, there are many instances of platform_device_add()
>> > under the drivers/ directory.
>> 
>> Fortunately, there are not too many drivers that call platform_device_add
>> *and* try to set up a dma mask for the child device:
>> 
>> git grep -wl dma_mask drivers | xargs grep -wl 'platform_device_\(add\|register\)'
>> 
>> drivers/base/platform.c
>> drivers/bcma/main.c
>> drivers/eisa/virtual_root.c
>> drivers/mfd/mfd-core.c
>> drivers/mfd/omap-usb-host.c
>> drivers/misc/mic/card/mic_x100.c
>> drivers/platform/goldfish/pdev_bus.c
>> drivers/ssb/main.c
>> drivers/usb/chipidea/core.c
>> drivers/usb/dwc3/dwc3-exynos.c
>> drivers/usb/dwc3/host.c
>> drivers/usb/gadget/udc/bdc/bdc_pci.c
>> drivers/usb/host/bcma-hcd.c
>> drivers/usb/host/fsl-mph-dr-of.c
>> drivers/usb/host/ssb-hcd.c
>> drivers/usb/misc/ftdi-elan.c
>> drivers/usb/musb/blackfin.c
>> drivers/usb/musb/musb_dsps.c
>> drivers/usb/musb/omap2430.c
>> drivers/usb/musb/ux500.c
>> 
>> Most of these are probably never used with any nonstandard
>> DMA settings (IOMMU, cache coherency, offset, ...).
>> 
>> One thing we could possibly do is to go through these and
>> replace the hardcoded dma mask setup with of_dma_configure()
>> in all cases in which we actually use DT for probing, which
>> should cover the interesting cases.
>> 
>
> One case I am going to work is to let USB chipidea driver support iommu,
> the chipidea core device is no of_node, and created by
> platform_add_device on the runtime. Using of_dma_configure with parent
> of_node is a solution from my point, like [1].
>
> https://lkml.org/lkml/2016/2/22/7

this only solves the problem for DT devices. Legacy devices and
PCI-based systems will still suffer from the same problem. At least for
dwc3, I will only be taking patches that solve the problem for all
users, not a subset of them.
Arnd Bergmann Sept. 6, 2016, 10:38 a.m. UTC | #27
On Tuesday, September 6, 2016 2:35:29 PM CEST Peter Chen wrote:
> On Mon, Sep 05, 2016 at 05:39:27PM +0200, Arnd Bergmann wrote:
> > On Friday, September 2, 2016 5:16:31 PM CEST Leo Li wrote:

> > 
> > Most of these are probably never used with any nonstandard
> > DMA settings (IOMMU, cache coherency, offset, ...).
> > 
> > One thing we could possibly do is to go through these and
> > replace the hardcoded dma mask setup with of_dma_configure()
> > in all cases in which we actually use DT for probing, which
> > should cover the interesting cases.
> > 
> 
> One case I am going to work is to let USB chipidea driver support iommu,
> the chipidea core device is no of_node, and created by
> platform_add_device on the runtime. Using of_dma_configure with parent
> of_node is a solution from my point, like [1].
> 
> https://lkml.org/lkml/2016/2/22/7

Right, that should make it work with iommu as well. However, it does
not solve the other issue I mentioned above, with boards that have
USB devices hardwired to a chipidea host controller that need
configuration from DT. For that, we still need to come up with another
way to associate the DT hierarchy in the host bridge node with
the Linux platform_device.

	Arnd
Arnd Bergmann Sept. 6, 2016, 10:46 a.m. UTC | #28
On Tuesday, September 6, 2016 9:40:19 AM CEST Felipe Balbi wrote:
> 
> this only solves the problem for DT devices. Legacy devices and
> PCI-based systems will still suffer from the same problem. At least for
> dwc3, I will only be taking patches that solve the problem for all
> users, not a subset of them.

I don't think legacy devices are a worry, because they wouldn't
have this problem. For the PCI case, you are right that it cannot
work, in particular for machines that have complex IOMMU setup.

Some architectures (at least arm64 and sparc) check the bus_type of
a device in order to find the correct set of dma_map_ops for that
device, so there is no real way to handle this as long as you
pass a platform_device into an API that expects a pci_device.

	Arnd
Felipe Balbi Sept. 6, 2016, 10:50 a.m. UTC | #29
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Tuesday, September 6, 2016 9:40:19 AM CEST Felipe Balbi wrote:
>> 
>> this only solves the problem for DT devices. Legacy devices and
>> PCI-based systems will still suffer from the same problem. At least for
>> dwc3, I will only be taking patches that solve the problem for all
>> users, not a subset of them.
>
> I don't think legacy devices are a worry, because they wouldn't
> have this problem. For the PCI case, you are right that it cannot
> work, in particular for machines that have complex IOMMU setup.
>
> Some architectures (at least arm64 and sparc) check the bus_type of
> a device in order to find the correct set of dma_map_ops for that
> device, so there is no real way to handle this as long as you
> pass a platform_device into an API that expects a pci_device.

Then I guess we're left with adding a "struct device *dma_dev" to struct
dwc3 and trying to figure out if we should use parent or self. Does
anybody see any problems with that?

Note, we would NOT be passing device pointers are platform_data, we
would have dwc3.ko figure out if it should use self or its parent device
for dma.
Arnd Bergmann Sept. 6, 2016, 1:27 p.m. UTC | #30
On Tuesday, September 6, 2016 1:50:48 PM CEST Felipe Balbi wrote:
> Hi,
> 
> Arnd Bergmann <arnd@arndb.de> writes:
> > On Tuesday, September 6, 2016 9:40:19 AM CEST Felipe Balbi wrote:
> >> 
> >> this only solves the problem for DT devices. Legacy devices and
> >> PCI-based systems will still suffer from the same problem. At least for
> >> dwc3, I will only be taking patches that solve the problem for all
> >> users, not a subset of them.
> >
> > I don't think legacy devices are a worry, because they wouldn't
> > have this problem. For the PCI case, you are right that it cannot
> > work, in particular for machines that have complex IOMMU setup.
> >
> > Some architectures (at least arm64 and sparc) check the bus_type of
> > a device in order to find the correct set of dma_map_ops for that
> > device, so there is no real way to handle this as long as you
> > pass a platform_device into an API that expects a pci_device.
> 
> Then I guess we're left with adding a "struct device *dma_dev" to struct
> dwc3 and trying to figure out if we should use parent or self. Does
> anybody see any problems with that?

I think we actually need the device pointer in the usb_hcd structure,
so it can be passed in these API calls from the USB core

drivers/usb/core/buffer.c:      return dma_alloc_coherent(hcd->self.controller, size, dma, mem_flags);
drivers/usb/core/buffer.c:      dma_free_coherent(hcd->self.controller, size, addr, dma);
drivers/usb/core/buffer.c:          (!hcd->self.controller->dma_mask &&
drivers/usb/core/buffer.c:              hcd->pool[i] = dma_pool_create(name, hcd->self.controller,
drivers/usb/core/hcd.c:                 urb->setup_dma = dma_map_single(
drivers/usb/core/hcd.c:                 if (dma_mapping_error(hcd->self.controller,
drivers/usb/core/hcd.c:                         n = dma_map_sg(
drivers/usb/core/hcd.c:                         urb->transfer_dma = dma_map_page(
drivers/usb/core/hcd.c:                         if (dma_mapping_error(hcd->self.controller,
drivers/usb/core/hcd.c:                         urb->transfer_dma = dma_map_single(
drivers/usb/core/hcd.c:                         if (dma_mapping_error(hcd->self.controller,
drivers/usb/core/usb.c:         urb->transfer_dma = dma_map_single(controller,
drivers/usb/core/usb.c: return dma_map_sg(controller, sg, nents,
drivers/usb/core/usb.c:         dma_sync_single_for_cpu(controller,
drivers/usb/core/usb.c:                 dma_sync_single_for_cpu(controller,
drivers/usb/core/usb.c: dma_sync_sg_for_cpu(controller, sg, n_hw_ents,

as these are all called on behalf of the host controller node.
Looking for more instances of hcd->self.controller, I find this
instance:

drivers/usb/core/hcd.c:         struct usb_phy *phy = usb_get_phy_dev(hcd->self.controller, 0);
drivers/usb/core/hcd.c:         struct phy *phy = phy_get(hcd->self.controller, "usb");

I'm unsure which device pointer we want here, but I suspect this also
needs to be the one that has the device node in order to make the lookup
of the phy structure by device node work right. Can you clarify how
this works today?

We probably also need to add the of_node of the host controller device
to struct usb_hcd in order to make usb_of_get_child_node() work
in the case where the hcd itself is not device that is listed
in DT. It might be a good idea to use 'struct fwnode_handle' for that,
so we can in the future also allow ACPI platforms to specify 

> Note, we would NOT be passing device pointers are platform_data, we
> would have dwc3.ko figure out if it should use self or its parent device
> for dma.

Ok, sounds good.

	Arnd
Peter Chen Sept. 7, 2016, 6:33 a.m. UTC | #31
On Tue, Sep 06, 2016 at 12:38:29PM +0200, Arnd Bergmann wrote:
> On Tuesday, September 6, 2016 2:35:29 PM CEST Peter Chen wrote:
> > On Mon, Sep 05, 2016 at 05:39:27PM +0200, Arnd Bergmann wrote:
> > > On Friday, September 2, 2016 5:16:31 PM CEST Leo Li wrote:
> 
> > > 
> > > Most of these are probably never used with any nonstandard
> > > DMA settings (IOMMU, cache coherency, offset, ...).
> > > 
> > > One thing we could possibly do is to go through these and
> > > replace the hardcoded dma mask setup with of_dma_configure()
> > > in all cases in which we actually use DT for probing, which
> > > should cover the interesting cases.
> > > 
> > 
> > One case I am going to work is to let USB chipidea driver support iommu,
> > the chipidea core device is no of_node, and created by
> > platform_add_device on the runtime. Using of_dma_configure with parent
> > of_node is a solution from my point, like [1].
> > 
> > https://lkml.org/lkml/2016/2/22/7
> 
> Right, that should make it work with iommu as well. However, it does
> not solve the other issue I mentioned above, with boards that have
> USB devices hardwired to a chipidea host controller that need
> configuration from DT. For that, we still need to come up with another
> way to associate the DT hierarchy in the host bridge node with
> the Linux platform_device.
> 

Why? The DMA configuration is for host controller, not for USB device.
No matter there is hardwired or hotplug devices, the DMA configuration
for host controller are both inherited from glue layer platform devices,
current implementation is at function ci_hdrc_add_device,
drivers/usb/chipidea/core.c.
Felipe Balbi Sept. 7, 2016, 6:51 a.m. UTC | #32
Arnd Bergmann <arnd@arndb.de> writes:

> On Tuesday, September 6, 2016 1:50:48 PM CEST Felipe Balbi wrote:
>> Hi,
>> 
>> Arnd Bergmann <arnd@arndb.de> writes:
>> > On Tuesday, September 6, 2016 9:40:19 AM CEST Felipe Balbi wrote:
>> >> 
>> >> this only solves the problem for DT devices. Legacy devices and
>> >> PCI-based systems will still suffer from the same problem. At least for
>> >> dwc3, I will only be taking patches that solve the problem for all
>> >> users, not a subset of them.
>> >
>> > I don't think legacy devices are a worry, because they wouldn't
>> > have this problem. For the PCI case, you are right that it cannot
>> > work, in particular for machines that have complex IOMMU setup.
>> >
>> > Some architectures (at least arm64 and sparc) check the bus_type of
>> > a device in order to find the correct set of dma_map_ops for that
>> > device, so there is no real way to handle this as long as you
>> > pass a platform_device into an API that expects a pci_device.
>> 
>> Then I guess we're left with adding a "struct device *dma_dev" to struct
>> dwc3 and trying to figure out if we should use parent or self. Does
>> anybody see any problems with that?
>
> I think we actually need the device pointer in the usb_hcd structure,
> so it can be passed in these API calls from the USB core

that's for host side. I'm concerned about peripheral side

> as these are all called on behalf of the host controller node.
> Looking for more instances of hcd->self.controller, I find this
> instance:
>
> drivers/usb/core/hcd.c:         struct usb_phy *phy = usb_get_phy_dev(hcd->self.controller, 0);
> drivers/usb/core/hcd.c:         struct phy *phy = phy_get(hcd->self.controller, "usb");
>
> I'm unsure which device pointer we want here, but I suspect this also
> needs to be the one that has the device node in order to make the lookup
> of the phy structure by device node work right. Can you clarify how
> this works today?

sounds correct to me.
Roger Quadros Sept. 7, 2016, 7:17 a.m. UTC | #33
Hi Arnd,

On 02/09/16 18:51, Arnd Bergmann wrote:
> On Friday, September 2, 2016 10:21:23 AM CEST Alan Stern wrote:
>> On Fri, 2 Sep 2016, Felipe Balbi wrote:
>>
>>> Hi,
>>>
>>> Russell King - ARM Linux <linux@armlinux.org.uk> writes:
>>>> On Fri, Sep 02, 2016 at 12:43:39PM +0200, Arnd Bergmann wrote:
>>>>> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>>>>>>
>>>>>> Hi Felipe and Arnd,
>>>>>>
>>>>>> It has been a while since the last response to this discussion, but we
>>>>>> haven't reached an agreement yet!  Can we get to a conclusion on if it
>>>>>> is valid to create child platform device for abstraction purpose?  If
>>>>>> yes, can this child device do DMA by itself?
>>>>>
>>>>> I'd say it's no problem for a driver to create child devices in order
>>>>> to represent different aspects of a device, but you should not rely on
>>>>> those devices working when used with the dma-mapping interfaces.
>>>>
>>>> That's absolutely right.  Consider the USB model - only the USB host
>>>> controller can perform DMA, not the USB devices themselves.  All DMA
>>>> mappings need to be mapped using the USB host controller device struct
>>>> not the USB device struct.
>>>>
>>>> The same _should_ be true everywhere else: the struct device representing
>>>> the device performing DMA must be the one used to map the transfer.
>>>
>>> How do we fix dwc3 in dual-role, then?
>>>
>>> Peripheral-side dwc3 is easy, we just require a glue-layer to be present
>>> and use dwc3.ko's parent device (which will be the PCI device or OF
>>> device). But for host side dwc3, the problem is slightly more complex
>>> because we're using xhci-plat.ko by just instantiating a xhci-platform
>>> device so xhci-plat can probe.
>>>
>>> xhci core has no means to know if its own device or the parent of its
>>> parent should be used for DMA. Any ideas?
>>
>> In theory, you can store a flag somewhere in the platform device,
>> something that would tell xhci-hcd that it has to use the parent's
>> parent for DMA purposes.
>>
>> I know it would be somewhat of a hack, but ought to work.
> 
> Speaking of that flag, I suppose we need the same logic to know where
> to look for USB devices attached to a dwc3 host when we need to describe
> them in DT. By default we look for child device nodes under the
> node of the HCD device node, but that would be wrong here too.

I didn't get this part. Information about USB devices attached to a USB host
is never provided in DT because they are always dynamically created via
usb_new_device(), whether they are hard-wired on the board or hot-plugged.

These USB devices inherit their DMA masks in the usb_alloc_dev() routine
whereas each interface within the USB device inherits its DMA mask in
usb_set_configuration().

There is a bug  in the USB core because of which the ISB device and interfaces
do not inherit dma_pfn_offset correctly for which I've sent a patch
https://lkml.org/lkml/2016/8/17/275

cheers,
-roger
Peter Chen Sept. 7, 2016, 7:44 a.m. UTC | #34
On Tue, Sep 06, 2016 at 03:27:43PM +0200, Arnd Bergmann wrote:
> On Tuesday, September 6, 2016 1:50:48 PM CEST Felipe Balbi wrote:
> > Hi,
> > 
> > Arnd Bergmann <arnd@arndb.de> writes:
> > > On Tuesday, September 6, 2016 9:40:19 AM CEST Felipe Balbi wrote:
> > >> 
> > >> this only solves the problem for DT devices. Legacy devices and
> > >> PCI-based systems will still suffer from the same problem. At least for
> > >> dwc3, I will only be taking patches that solve the problem for all
> > >> users, not a subset of them.
> > >
> > > I don't think legacy devices are a worry, because they wouldn't
> > > have this problem. For the PCI case, you are right that it cannot
> > > work, in particular for machines that have complex IOMMU setup.
> > >
> > > Some architectures (at least arm64 and sparc) check the bus_type of
> > > a device in order to find the correct set of dma_map_ops for that
> > > device, so there is no real way to handle this as long as you
> > > pass a platform_device into an API that expects a pci_device.
> > 
> > Then I guess we're left with adding a "struct device *dma_dev" to struct
> > dwc3 and trying to figure out if we should use parent or self. Does
> > anybody see any problems with that?
> 
> I think we actually need the device pointer in the usb_hcd structure,
> so it can be passed in these API calls from the USB core
> 
> drivers/usb/core/buffer.c:      return dma_alloc_coherent(hcd->self.controller, size, dma, mem_flags);
> drivers/usb/core/buffer.c:      dma_free_coherent(hcd->self.controller, size, addr, dma);
> drivers/usb/core/buffer.c:          (!hcd->self.controller->dma_mask &&
> drivers/usb/core/buffer.c:              hcd->pool[i] = dma_pool_create(name, hcd->self.controller,
> drivers/usb/core/hcd.c:                 urb->setup_dma = dma_map_single(
> drivers/usb/core/hcd.c:                 if (dma_mapping_error(hcd->self.controller,
> drivers/usb/core/hcd.c:                         n = dma_map_sg(
> drivers/usb/core/hcd.c:                         urb->transfer_dma = dma_map_page(
> drivers/usb/core/hcd.c:                         if (dma_mapping_error(hcd->self.controller,
> drivers/usb/core/hcd.c:                         urb->transfer_dma = dma_map_single(
> drivers/usb/core/hcd.c:                         if (dma_mapping_error(hcd->self.controller,
> drivers/usb/core/usb.c:         urb->transfer_dma = dma_map_single(controller,
> drivers/usb/core/usb.c: return dma_map_sg(controller, sg, nents,
> drivers/usb/core/usb.c:         dma_sync_single_for_cpu(controller,
> drivers/usb/core/usb.c:                 dma_sync_single_for_cpu(controller,
> drivers/usb/core/usb.c: dma_sync_sg_for_cpu(controller, sg, n_hw_ents,
> 
> as these are all called on behalf of the host controller node.

The USB HCD core uses the struct device pointer passed by usb_create_hcd
which is called by each host controller driver, the host controller driver
needs to make sure the information (DMA configurations, of_node, etc)
in struct device are correct before calling usb_create_hcd.

> Looking for more instances of hcd->self.controller, I find this
> instance:
> 
> drivers/usb/core/hcd.c:         struct usb_phy *phy = usb_get_phy_dev(hcd->self.controller, 0);
> drivers/usb/core/hcd.c:         struct phy *phy = phy_get(hcd->self.controller, "usb");
> 
> I'm unsure which device pointer we want here, but I suspect this also
> needs to be the one that has the device node in order to make the lookup
> of the phy structure by device node work right. Can you clarify how
> this works today?
> 

The above codes are only called when the host controller driver does not
the code which try to get USB PHY. Once the PHY drivers is probed, the
above code can work no matter DT or non-DT.

But by looking at the code, I am wondering how dwc3 host get its USB PHY at
xhci-platform.c, it seems there is no of_node at xhci-hcd for dwc3.

> We probably also need to add the of_node of the host controller device
> to struct usb_hcd in order to make usb_of_get_child_node() work
> in the case where the hcd itself is not device that is listed
> in DT.

The pre-condition of DT function at USB HCD core works is the host
controller device has of_node, since it is the root node for USB tree
described at DT. If the host controller device is not at DT, it needs
to try to get its of_node, the chipidea driver gets it through its
parent node [1]

> It might be a good idea to use 'struct fwnode_handle' for that,
> so we can in the future also allow ACPI platforms to specify 
> 
> > Note, we would NOT be passing device pointers are platform_data, we
> > would have dwc3.ko figure out if it should use self or its parent device
> > for dma.
> 
> Ok, sounds good.
> 
> 	Arnd

[1] https://lkml.org/lkml/2016/8/8/119
Arnd Bergmann Sept. 7, 2016, 8:29 a.m. UTC | #35
On Wednesday, September 7, 2016 10:17:31 AM CEST Roger Quadros wrote:
> > 
> > Speaking of that flag, I suppose we need the same logic to know where
> > to look for USB devices attached to a dwc3 host when we need to describe
> > them in DT. By default we look for child device nodes under the
> > node of the HCD device node, but that would be wrong here too.
> 
> I didn't get this part. Information about USB devices attached to a USB host
> is never provided in DT because they are always dynamically created via
> usb_new_device(), whether they are hard-wired on the board or hot-plugged.
> 
> These USB devices inherit their DMA masks in the usb_alloc_dev() routine
> whereas each interface within the USB device inherits its DMA mask in
> usb_set_configuration().

We had talked about adding support for this for at least six years (probably
much more), but Peter Chen finally added it this year in commit 69bec72598
("USB: core: let USB device know device node").

The main use for it is to let you specify a MAC address for on-board
ethernet devices that lack an EPROM, but any other information can be
added that way too.

> There is a bug  in the USB core because of which the ISB device and interfaces
> do not inherit dma_pfn_offset correctly for which I've sent a patch
> https://lkml.org/lkml/2016/8/17/275

I'm a bit skeptical about this. Clearly if we set the dma_mask, we should
also set the dma_pfn_offset, but what exactly is this used for in USB
devices?

As I understand it, the dma_mask/dma_pfn_offset etc is used for the DMA
mapping interface, but that can't really be used on USB devices, which
I assume use usb_alloc_coherent() and the URB interfaces for passing data
between a USB driver and the HCD. My knowledge of USB device drivers
is a bit lacking, so it's possible I'm misunderstanding things here.

	Arnd
Arnd Bergmann Sept. 7, 2016, 8:48 a.m. UTC | #36
On Wednesday, September 7, 2016 2:33:13 PM CEST Peter Chen wrote:
> > 
> > Right, that should make it work with iommu as well. However, it does
> > not solve the other issue I mentioned above, with boards that have
> > USB devices hardwired to a chipidea host controller that need
> > configuration from DT. For that, we still need to come up with another
> > way to associate the DT hierarchy in the host bridge node with
> > the Linux platform_device.
> > 
> 
> Why? The DMA configuration is for host controller, not for USB device.
> No matter there is hardwired or hotplug devices, the DMA configuration
> for host controller are both inherited from glue layer platform devices,
> current implementation is at function ci_hdrc_add_device,
> drivers/usb/chipidea/core.c.

I wasn't referring to DMA configuration there, but only to how we
set the of_node pointer in register_root_hub, which you added in
dc5878abf49 ("usb: core: move root hub's device node assignment after
it is added to bus") as:

	usb_dev->dev.of_node = parent_dev->of_node;

As I understand, parent_dev (aka hcd->self.controller) here
refers to the device that you add in ci_hdrc_add_device(),
which does not have an of_node pointer, so we actually
want parent_dev->parent->of_node.

I'm sure you understand that code better than me, so let me
know what my mistake is if this indeed works correctly.



Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
I think we should replace 

        pdev->dev.dma_mask = dev->dma_mask;
        pdev->dev.dma_parms = dev->dma_parms;
        dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);

with of_dma_configure(), which has the chance to configure more than
just those three, as the dma API might look into different aspects:

- iommu specific configuration
- cache coherency information
- bus type
- dma offset
- dma_map_ops pointer

We try to handle everything in of_dma_configure() at configuration
time, and that would be the place to add anything else that we might
need in the future.

	Arnd
Arnd Bergmann Sept. 7, 2016, 8:52 a.m. UTC | #37
On Wednesday, September 7, 2016 3:44:28 PM CEST Peter Chen wrote:
> 
> The pre-condition of DT function at USB HCD core works is the host
> controller device has of_node, since it is the root node for USB tree
> described at DT. If the host controller device is not at DT, it needs
> to try to get its of_node, the chipidea driver gets it through its
> parent node [1]

> 
> [1] https://lkml.org/lkml/2016/8/8/119
> 

Ah, this is what I was referring to in the other mail.

However, the way you set the of_node might be dangerous too:
We should generally not have two platform_device structures with
the same of_node pointer, most importantly it may cause the
child device to be bound to the same driver as the parent
device since the probing is done by compatible string.

As you tested it successfully, it must work at the moment on your
machine, but it could easily break depending on deferred probing
or module load order.

	Arnd
Peter Chen Sept. 7, 2016, 9:29 a.m. UTC | #38
On Wed, Sep 07, 2016 at 10:52:46AM +0200, Arnd Bergmann wrote:
> On Wednesday, September 7, 2016 3:44:28 PM CEST Peter Chen wrote:
> > 
> > The pre-condition of DT function at USB HCD core works is the host
> > controller device has of_node, since it is the root node for USB tree
> > described at DT. If the host controller device is not at DT, it needs
> > to try to get its of_node, the chipidea driver gets it through its
> > parent node [1]
> 
> > 
> > [1] https://lkml.org/lkml/2016/8/8/119
> > 
> 
> Ah, this is what I was referring to in the other mail.
> 
> However, the way you set the of_node might be dangerous too:
> We should generally not have two platform_device structures with
> the same of_node pointer, most importantly it may cause the
> child device to be bound to the same driver as the parent
> device since the probing is done by compatible string.
> 
> As you tested it successfully, it must work at the moment on your
> machine, but it could easily break depending on deferred probing
> or module load order.
> 

Currently, I work around above problems by setting core device of_node
as NULL at both probe error path and platform driver .remove routine.

I admit it is not a good way, but if we only have of_node at device's
life periods after probe, it seems ok currently. It is hard to create
of_node dynamically when create device, and keep some contents
of parent's of_node, and some are not.
Russell King (Oracle) Sept. 7, 2016, 9:35 a.m. UTC | #39
On Wed, Sep 07, 2016 at 05:29:01PM +0800, Peter Chen wrote:
> On Wed, Sep 07, 2016 at 10:52:46AM +0200, Arnd Bergmann wrote:
> > On Wednesday, September 7, 2016 3:44:28 PM CEST Peter Chen wrote:
> > > 
> > > The pre-condition of DT function at USB HCD core works is the host
> > > controller device has of_node, since it is the root node for USB tree
> > > described at DT. If the host controller device is not at DT, it needs
> > > to try to get its of_node, the chipidea driver gets it through its
> > > parent node [1]
> > 
> > > 
> > > [1] https://lkml.org/lkml/2016/8/8/119
> > > 
> > 
> > Ah, this is what I was referring to in the other mail.
> > 
> > However, the way you set the of_node might be dangerous too:
> > We should generally not have two platform_device structures with
> > the same of_node pointer, most importantly it may cause the
> > child device to be bound to the same driver as the parent
> > device since the probing is done by compatible string.
> > 
> > As you tested it successfully, it must work at the moment on your
> > machine, but it could easily break depending on deferred probing
> > or module load order.
> > 
> 
> Currently, I work around above problems by setting core device of_node
> as NULL at both probe error path and platform driver .remove routine.
> 
> I admit it is not a good way, but if we only have of_node at device's
> life periods after probe, it seems ok currently. It is hard to create
> of_node dynamically when create device, and keep some contents
> of parent's of_node, and some are not.

How about turning dwc3 into a library which can be used by a range of
platform devices?  Wouldn't that solve all the current problems, and
completely avoid the need to copy resources from one platform device
to another?
Peter Chen Sept. 7, 2016, 9:55 a.m. UTC | #40
On Wed, Sep 07, 2016 at 10:48:06AM +0200, Arnd Bergmann wrote:
> On Wednesday, September 7, 2016 2:33:13 PM CEST Peter Chen wrote:
> > > 
> > > Right, that should make it work with iommu as well. However, it does
> > > not solve the other issue I mentioned above, with boards that have
> > > USB devices hardwired to a chipidea host controller that need
> > > configuration from DT. For that, we still need to come up with another
> > > way to associate the DT hierarchy in the host bridge node with
> > > the Linux platform_device.
> > > 
> > 
> > Why? The DMA configuration is for host controller, not for USB device.
> > No matter there is hardwired or hotplug devices, the DMA configuration
> > for host controller are both inherited from glue layer platform devices,
> > current implementation is at function ci_hdrc_add_device,
> > drivers/usb/chipidea/core.c.
> 
> I wasn't referring to DMA configuration there, but only to how we
> set the of_node pointer in register_root_hub, which you added in
> dc5878abf49 ("usb: core: move root hub's device node assignment after
> it is added to bus") as:
> 
> 	usb_dev->dev.of_node = parent_dev->of_node;
> 
> As I understand, parent_dev (aka hcd->self.controller) here
> refers to the device that you add in ci_hdrc_add_device(),
> which does not have an of_node pointer, so we actually
> want parent_dev->parent->of_node.

For platform devices, like chipidea and dwc3, it is correct, since
they don't have of_node. If the host controller has of_node, it
is controller's of_node. In order to let USB HCD core life be easy,
we need to let hcd->self.controller own of_node when it is added
to HCD core, no matter it has from the dts or get it dynamically.

> 
> I'm sure you understand that code better than me, so let me
> know what my mistake is if this indeed works correctly.
> 

Your understanding is correct, just some have of_node from dts, some
(chipidea/dwc3) need to have assignment at its driver.

> 
> 
> Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
> I think we should replace 
> 
>         pdev->dev.dma_mask = dev->dma_mask;
>         pdev->dev.dma_parms = dev->dma_parms;
>         dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);
> 
> with of_dma_configure(), which has the chance to configure more than
> just those three, as the dma API might look into different aspects:
> 
> - iommu specific configuration
> - cache coherency information
> - bus type
> - dma offset
> - dma_map_ops pointer
> 
> We try to handle everything in of_dma_configure() at configuration
> time, and that would be the place to add anything else that we might
> need in the future.
> 

Yes, I agree with you, but just like Felipe mentioned, we also need to
consider PCI device, can we do something like gpiod_get_index does? Are
there any similar APIs like of_dma_configure for ACPI?
Felipe Balbi Sept. 7, 2016, 10:18 a.m. UTC | #41
Hi,

Russell King - ARM Linux <linux@armlinux.org.uk> writes:
> On Wed, Sep 07, 2016 at 05:29:01PM +0800, Peter Chen wrote:
>> On Wed, Sep 07, 2016 at 10:52:46AM +0200, Arnd Bergmann wrote:
>> > On Wednesday, September 7, 2016 3:44:28 PM CEST Peter Chen wrote:
>> > > 
>> > > The pre-condition of DT function at USB HCD core works is the host
>> > > controller device has of_node, since it is the root node for USB tree
>> > > described at DT. If the host controller device is not at DT, it needs
>> > > to try to get its of_node, the chipidea driver gets it through its
>> > > parent node [1]
>> > 
>> > > 
>> > > [1] https://lkml.org/lkml/2016/8/8/119
>> > > 
>> > 
>> > Ah, this is what I was referring to in the other mail.
>> > 
>> > However, the way you set the of_node might be dangerous too:
>> > We should generally not have two platform_device structures with
>> > the same of_node pointer, most importantly it may cause the
>> > child device to be bound to the same driver as the parent
>> > device since the probing is done by compatible string.
>> > 
>> > As you tested it successfully, it must work at the moment on your
>> > machine, but it could easily break depending on deferred probing
>> > or module load order.
>> > 
>> 
>> Currently, I work around above problems by setting core device of_node
>> as NULL at both probe error path and platform driver .remove routine.
>> 
>> I admit it is not a good way, but if we only have of_node at device's
>> life periods after probe, it seems ok currently. It is hard to create
>> of_node dynamically when create device, and keep some contents
>> of parent's of_node, and some are not.
>
> How about turning dwc3 into a library which can be used by a range of
> platform devices?  Wouldn't that solve all the current problems, and
> completely avoid the need to copy resources from one platform device
> to another?

This will break all existing DTs out there. Also, there are other
benefits from keeping current design, these have been discussed before
but here's a short summary:

. PM callbacks are kept simple
. We avoid abuse of internal dwc3 functions
. It's a lot less work to "port" dwc3 to "your SoC"
. We prevent another MUSB (drivers/usb/musb/)

And few others. Sure, they are rather subjective benefits, but it has
worked well so far. Also, breaking DT ABI is kind of a big deal.
Felipe Balbi Sept. 7, 2016, 10:24 a.m. UTC | #42
Hi,

Arnd Bergmann <arnd@arndb.de> writes:

[...]

> Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
> I think we should replace 
>
>         pdev->dev.dma_mask = dev->dma_mask;
>         pdev->dev.dma_parms = dev->dma_parms;
>         dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);
>
> with of_dma_configure(), which has the chance to configure more than
> just those three, as the dma API might look into different aspects:
>
> - iommu specific configuration
> - cache coherency information
> - bus type
> - dma offset
> - dma_map_ops pointer
>
> We try to handle everything in of_dma_configure() at configuration
> time, and that would be the place to add anything else that we might
> need in the future.

There are a couple problems with this:

1) won't work for PCI-based systems.

DWC3 is used in production PCI-based HW and also in Synopsys HAPS DX
platform (FPGA that appears like a PCI card to host PC)


2) not very robust solution

of_dma_configure() will hardcode 32-bit DMA dmask for xhci-plat because
that's not created by DT. The only reason why this works at all is
because of the default 32-bit dma mask thing :-) So, how is it any
different than copying 32-bit dma mask from parent?
Robin Murphy Sept. 7, 2016, 10:33 a.m. UTC | #43
On 07/09/16 10:55, Peter Chen wrote:
[...]
>> Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
>> I think we should replace 
>>
>>         pdev->dev.dma_mask = dev->dma_mask;
>>         pdev->dev.dma_parms = dev->dma_parms;
>>         dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);
>>
>> with of_dma_configure(), which has the chance to configure more than
>> just those three, as the dma API might look into different aspects:
>>
>> - iommu specific configuration
>> - cache coherency information
>> - bus type
>> - dma offset
>> - dma_map_ops pointer
>>
>> We try to handle everything in of_dma_configure() at configuration
>> time, and that would be the place to add anything else that we might
>> need in the future.
>>
> 
> Yes, I agree with you, but just like Felipe mentioned, we also need to
> consider PCI device, can we do something like gpiod_get_index does? Are
> there any similar APIs like of_dma_configure for ACPI?

Not yet, but Lorenzo has one in progress[1], primarily for the sake of
abstracting away the IOMMU configuration.

Robin.

[1]:http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1209911.html
Felipe Balbi Sept. 7, 2016, 10:47 a.m. UTC | #44
Hi,

Robin Murphy <robin.murphy@arm.com> writes:
> On 07/09/16 10:55, Peter Chen wrote:
> [...]
>>> Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
>>> I think we should replace 
>>>
>>>         pdev->dev.dma_mask = dev->dma_mask;
>>>         pdev->dev.dma_parms = dev->dma_parms;
>>>         dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);
>>>
>>> with of_dma_configure(), which has the chance to configure more than
>>> just those three, as the dma API might look into different aspects:
>>>
>>> - iommu specific configuration
>>> - cache coherency information
>>> - bus type
>>> - dma offset
>>> - dma_map_ops pointer
>>>
>>> We try to handle everything in of_dma_configure() at configuration
>>> time, and that would be the place to add anything else that we might
>>> need in the future.
>>>
>> 
>> Yes, I agree with you, but just like Felipe mentioned, we also need to
>> consider PCI device, can we do something like gpiod_get_index does? Are
>> there any similar APIs like of_dma_configure for ACPI?
>
> Not yet, but Lorenzo has one in progress[1], primarily for the sake of
> abstracting away the IOMMU configuration.
>
> Robin.
>
> [1]:http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1209911.html

not exported for drivers to use. If Lorenzo is trying to making a
matching API for ACPI systems, then it needs to follow what
of_dma_configure() is doing, and add an EXPORT_SYMBOL_GPL()
Roger Quadros Sept. 7, 2016, 1:04 p.m. UTC | #45
On 07/09/16 11:29, Arnd Bergmann wrote:
> On Wednesday, September 7, 2016 10:17:31 AM CEST Roger Quadros wrote:
>>>
>>> Speaking of that flag, I suppose we need the same logic to know where
>>> to look for USB devices attached to a dwc3 host when we need to describe
>>> them in DT. By default we look for child device nodes under the
>>> node of the HCD device node, but that would be wrong here too.
>>
>> I didn't get this part. Information about USB devices attached to a USB host
>> is never provided in DT because they are always dynamically created via
>> usb_new_device(), whether they are hard-wired on the board or hot-plugged.
>>
>> These USB devices inherit their DMA masks in the usb_alloc_dev() routine
>> whereas each interface within the USB device inherits its DMA mask in
>> usb_set_configuration().
> 
> We had talked about adding support for this for at least six years (probably
> much more), but Peter Chen finally added it this year in commit 69bec72598
> ("USB: core: let USB device know device node").

OK. Thanks for this pointer.
> 
> The main use for it is to let you specify a MAC address for on-board
> ethernet devices that lack an EPROM, but any other information can be
> added that way too.
> 
>> There is a bug  in the USB core because of which the ISB device and interfaces
>> do not inherit dma_pfn_offset correctly for which I've sent a patch
>> https://lkml.org/lkml/2016/8/17/275
> 
> I'm a bit skeptical about this. Clearly if we set the dma_mask, we should
> also set the dma_pfn_offset, but what exactly is this used for in USB
> devices?

Consider the mass storage device case.
USB storage driver creates a scsi host for the mass storage interface in
drivers/usb/storage/usb.c
The scsi host parent device is nothing but the the USB interface device.

Now, __scsi_init_queue() calls scsi_calculate_bounce_limit() to find out
and set the block layer bounce limit.

scsi_calculate_bounce_limit() uses dma_max_pfn(host_dev) to get the bounce_limit.

host_dev is nothing but the device representing the mass storage interface.

If that device doesn't have the right dma_pfn_offset, then dma_max_pfn()
is messed up and the bounce buffer limit is wrong.

> 
> As I understand it, the dma_mask/dma_pfn_offset etc is used for the DMA
> mapping interface, but that can't really be used on USB devices, which
> I assume use usb_alloc_coherent() and the URB interfaces for passing data
> between a USB driver and the HCD. My knowledge of USB device drivers
> is a bit lacking, so it's possible I'm misunderstanding things here.
> 

cheers,
-roger
Arnd Bergmann Sept. 7, 2016, 2:38 p.m. UTC | #46
On Wednesday, September 7, 2016 4:04:52 PM CEST Roger Quadros wrote:
> > The main use for it is to let you specify a MAC address for on-board
> > ethernet devices that lack an EPROM, but any other information can be
> > added that way too.
> > 
> >> There is a bug  in the USB core because of which the ISB device and interfaces
> >> do not inherit dma_pfn_offset correctly for which I've sent a patch
> >> https://lkml.org/lkml/2016/8/17/275
> > 
> > I'm a bit skeptical about this. Clearly if we set the dma_mask, we should
> > also set the dma_pfn_offset, but what exactly is this used for in USB
> > devices?
> 
> Consider the mass storage device case.
> USB storage driver creates a scsi host for the mass storage interface in
> drivers/usb/storage/usb.c
> The scsi host parent device is nothing but the the USB interface device.
> 
> Now, __scsi_init_queue() calls scsi_calculate_bounce_limit() to find out
> and set the block layer bounce limit.
> 
> scsi_calculate_bounce_limit() uses dma_max_pfn(host_dev) to get the bounce_limit.
> 
> host_dev is nothing but the device representing the mass storage interface.
> 
> If that device doesn't have the right dma_pfn_offset, then dma_max_pfn()
> is messed up and the bounce buffer limit is wrong.

I see. The same thing probably happens in the network and mmc subsystems,
which have similar code.

This shows the inconsistencies we have in the handling for bounce buffers
in the kernel, which are sometimes handled by subsystems but sometimes
rely on swiotlb instead. I don't have any better idea than your patch
here, but maybe we should add a comment explaining that next to the
code.

	Arnd
Lorenzo Pieralisi Sept. 14, 2016, 4:31 p.m. UTC | #47
On Wed, Sep 07, 2016 at 01:47:22PM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Robin Murphy <robin.murphy@arm.com> writes:
> > On 07/09/16 10:55, Peter Chen wrote:
> > [...]
> >>> Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
> >>> I think we should replace 
> >>>
> >>>         pdev->dev.dma_mask = dev->dma_mask;
> >>>         pdev->dev.dma_parms = dev->dma_parms;
> >>>         dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);
> >>>
> >>> with of_dma_configure(), which has the chance to configure more than
> >>> just those three, as the dma API might look into different aspects:
> >>>
> >>> - iommu specific configuration
> >>> - cache coherency information
> >>> - bus type
> >>> - dma offset
> >>> - dma_map_ops pointer
> >>>
> >>> We try to handle everything in of_dma_configure() at configuration
> >>> time, and that would be the place to add anything else that we might
> >>> need in the future.
> >>>
> >> 
> >> Yes, I agree with you, but just like Felipe mentioned, we also need to
> >> consider PCI device, can we do something like gpiod_get_index does? Are
> >> there any similar APIs like of_dma_configure for ACPI?
> >
> > Not yet, but Lorenzo has one in progress[1], primarily for the sake of
> > abstracting away the IOMMU configuration.
> >
> > Robin.
> >
> > [1]:http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1209911.html
> 
> not exported for drivers to use. If Lorenzo is trying to making a
> matching API for ACPI systems, then it needs to follow what
> of_dma_configure() is doing, and add an EXPORT_SYMBOL_GPL()

That's easy enough, not sure I understand though why
of_dma_deconfigure() is not exported then. The second question mark
is about the dma-ranges equivalent in ACPI world; the _DMA method
seems to be the exact equivalent but to the best of my knowledge
it is ignored by the kernel, to really have an of_dma_configure()
equivalent that's really necessary, unless we want to resort to
arch specific methods (is that what x86 is currently doing ?) to
retrieve/build the dma masks.

Lorenzo
Arnd Bergmann Sept. 14, 2016, 9:50 p.m. UTC | #48
On Wednesday, September 14, 2016 5:31:36 PM CEST Lorenzo Pieralisi wrote:
> On Wed, Sep 07, 2016 at 01:47:22PM +0300, Felipe Balbi wrote:
> > 
> > Hi,
> > 
> > Robin Murphy <robin.murphy@arm.com> writes:
> > > On 07/09/16 10:55, Peter Chen wrote:
> > > [...]
> > >>> Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
> > >>> I think we should replace 
> > >>>
> > >>>         pdev->dev.dma_mask = dev->dma_mask;
> > >>>         pdev->dev.dma_parms = dev->dma_parms;
> > >>>         dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);
> > >>>
> > >>> with of_dma_configure(), which has the chance to configure more than
> > >>> just those three, as the dma API might look into different aspects:
> > >>>
> > >>> - iommu specific configuration
> > >>> - cache coherency information
> > >>> - bus type
> > >>> - dma offset
> > >>> - dma_map_ops pointer
> > >>>
> > >>> We try to handle everything in of_dma_configure() at configuration
> > >>> time, and that would be the place to add anything else that we might
> > >>> need in the future.
> > >>>
> > >> 
> > >> Yes, I agree with you, but just like Felipe mentioned, we also need to
> > >> consider PCI device, can we do something like gpiod_get_index does? Are
> > >> there any similar APIs like of_dma_configure for ACPI?
> > >
> > > Not yet, but Lorenzo has one in progress[1], primarily for the sake of
> > > abstracting away the IOMMU configuration.
> > >
> > > Robin.
> > >
> > > [1]:http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1209911.html
> > 
> > not exported for drivers to use. If Lorenzo is trying to making a
> > matching API for ACPI systems, then it needs to follow what
> > of_dma_configure() is doing, and add an EXPORT_SYMBOL_GPL()
> 
> That's easy enough, not sure I understand though why
> of_dma_deconfigure() is not exported then. The second question mark
> is about the dma-ranges equivalent in ACPI world; the _DMA method
> seems to be the exact equivalent but to the best of my knowledge
> it is ignored by the kernel, to really have an of_dma_configure()
> equivalent that's really necessary, unless we want to resort to
> arch specific methods (is that what x86 is currently doing ?) to
> retrieve/build the dma masks.

Please see the follow-up emails after my proposed patch: if we add
a pointer to the device that firmware knows about in the USB core layer,
there is no longer a problem to be solved and the DMA operations will
do the right thing.

	Arnd
diff mbox

Patch

diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
index adc1e8a624cb..74b599269e2c 100644
--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -152,6 +152,8 @@  static int dwc3_pci_probe(struct pci_dev *pci,
 		return -ENOMEM;
 	}
 
+	dma_inherit(&dwc->dev, dev);
+
 	memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));
 
 	res[0].start	= pci_resource_start(pci, 0);