| Message ID | 50DD6311.9000002@gmail.com (mailing list archive) |
|---|---|
| State | New, archived |
On 28.12.2012 11:14, Mark Zhang wrote:
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index a936902..c3ded60 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context *context,
> 	if (err)
> 		goto fail;
>
> +	/* We define CMA as a temporary solution in the host1x driver.
> +	   That's also why we have a CMA kernel config in it.
> +	   But here, in tegradrm, we hardcode CMA.
> +	   So what do we do when host1x changes to IOMMU?
> +	   Do we also need to define a CMA config in the tegradrm driver,
> +	   and then, after host1x changes to IOMMU, add another IOMMU
> +	   config in tegradrm? Or should we invent a more generic
> +	   wrapper layer here? */
> 	cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);

Lucas is working on the host1x allocator, so let's defer this comment
until we have Lucas' code.

> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index cc8ca2f..0867b72 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
> 	mem += num_unpins * sizeof(dma_addr_t);
> 	job->pin_ids = num_unpins ? mem : NULL;
>
> +	/* I think this is a somewhat ugly design.
> +	   "addr_phys" actually consists of "reloc_addr_phys" and
> +	   "gather_addr_phys".
> +	   Why don't we just declare "reloc_addr_phys" and
> +	   "gather_addr_phys" separately?
> +	   In the current design, say an nvhost newbie swaps the order
> +	   of these two blocks of code in the function "pin_job_mem":
> +
> +	   for (i = 0; i < job->num_relocs; i++) {
> +		   struct host1x_reloc *reloc = &job->relocarray[i];
> +		   job->pin_ids[count] = reloc->target;
> +		   count++;
> +	   }
> +
> +	   for (i = 0; i < job->num_gathers; i++) {
> +		   struct host1x_job_gather *g = &job->gathers[i];
> +		   job->pin_ids[count] = g->mem_id;
> +		   count++;
> +	   }
> +
> +	   Then likely something weird occurs... */

We do this because this way we can implement batch pinning of memory
handles. This way we can reduce memory handle management a lot, as we
need to do locking only once per submit.

Reducing memory management overhead is really important, because in
complex graphics cases there might be hundreds of buffer references per
submit, and several submits per frame. Any extra overhead translates
directly into reduced performance.

> 	job->reloc_addr_phys = job->addr_phys;
> 	job->gather_addr_phys = &job->addr_phys[num_relocs];
>
> @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
> 		}
> 	}
>
> +	/* I have a question here. Does this mean the address that needs
> +	   to be relocated (according to the libdrm code, this address
> +	   seems to be "0xDEADBEEF") always stays at the beginning of a
> +	   page? */
> 	__raw_writel(
> 		(job->reloc_addr_phys[i] +
> 			reloc->target_offset) >> reloc->shift,

No - the slot can be anywhere. That's why we have cmdbuf_offset in the
reloc struct.

> @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
> 		}
> 	}
>
> +	/* if (host1x_firewall && !err) { */
> 	if (host1x_firewall) {
> 		err = copy_gathers(job, pdev);
> 		if (err) {

Will add.

> @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
> 		}
> 	}
>
> +/* Rename this label to "out" or something else, because when
> +   everything goes right, the code under this label also gets
> +   executed. */
>  fail:
> 	wmb();

Will do.

> diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
> index f3954f7..bb5763d 100644
> --- a/drivers/gpu/host1x/memmgr.c
> +++ b/drivers/gpu/host1x/memmgr.c
> @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct platform_device *dev,
> 					count, &unpin_data[pin_count],
> 					phys_addr);
>
> +		/* I don't think the current "host1x_cma_pin_array_ids"
> +		   is able to return a negative value, so this "if"
> +		   doesn't make sense... */
> 		if (cma_count < 0) {
> 			/* clean up previous handles */
> 			while (pin_count) {

It should return negative in case of error.

Terje
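Terje's point that the relocation slot can live at any offset is easy to model in isolation. The following is a minimal userspace sketch, not the driver's code: the field names (cmdbuf_offset, target_offset, shift) come from the thread, while patch_reloc() and the struct layout are hypothetical stand-ins.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a host1x reloc entry; only the three
 * fields discussed in the thread are modeled. */
struct reloc {
	uint32_t cmdbuf_offset;	/* byte offset of the 0xDEADBEEF slot */
	uint32_t target_offset;	/* offset inside the target buffer */
	uint32_t shift;		/* address shift required by the engine */
};

static void patch_reloc(uint32_t *cmdbuf, const struct reloc *r,
			uint32_t target_phys)
{
	/* The slot can sit at any word in the buffer - cmdbuf_offset
	 * says where - so it need not start a page. */
	cmdbuf[r->cmdbuf_offset / sizeof(uint32_t)] =
		(target_phys + r->target_offset) >> r->shift;
}

int main(void)
{
	uint32_t cmdbuf[4] = { 0, 0xDEADBEEF, 0, 0 };
	struct reloc r = { .cmdbuf_offset = 4, .target_offset = 0x10, .shift = 0 };

	patch_reloc(cmdbuf, &r, 0x80001000u);
	printf("slot now 0x%08x\n", (unsigned)cmdbuf[1]); /* 0x80001010 */
	return 0;
}
```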
On 01/02/2013 05:42 PM, Terje Bergström wrote:
> On 28.12.2012 11:14, Mark Zhang wrote:
>> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
>> index a936902..c3ded60 100644
>> --- a/drivers/gpu/drm/tegra/gr2d.c
>> +++ b/drivers/gpu/drm/tegra/gr2d.c
>> @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context *context,
>> 	if (err)
>> 		goto fail;
>>
>> +	/* We define CMA as a temporary solution in the host1x driver.
>> +	   That's also why we have a CMA kernel config in it.
>> +	   But here, in tegradrm, we hardcode CMA.
>> +	   So what do we do when host1x changes to IOMMU?
>> +	   Do we also need to define a CMA config in the tegradrm driver,
>> +	   and then, after host1x changes to IOMMU, add another IOMMU
>> +	   config in tegradrm? Or should we invent a more generic
>> +	   wrapper layer here? */
>> 	cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);
>
> Lucas is working on the host1x allocator, so let's defer this comment
> until we have Lucas' code.

OK.

>> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
>> index cc8ca2f..0867b72 100644
>> --- a/drivers/gpu/host1x/job.c
>> +++ b/drivers/gpu/host1x/job.c
>> @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
>> 	mem += num_unpins * sizeof(dma_addr_t);
>> 	job->pin_ids = num_unpins ? mem : NULL;
>>
>> +	/* I think this is a somewhat ugly design.
>> +	   "addr_phys" actually consists of "reloc_addr_phys" and
>> +	   "gather_addr_phys".
>> +	   Why don't we just declare "reloc_addr_phys" and
>> +	   "gather_addr_phys" separately?
>> +	   In the current design, say an nvhost newbie swaps the order
>> +	   of these two blocks of code in the function "pin_job_mem":
>> +
>> +	   for (i = 0; i < job->num_relocs; i++) {
>> +		   struct host1x_reloc *reloc = &job->relocarray[i];
>> +		   job->pin_ids[count] = reloc->target;
>> +		   count++;
>> +	   }
>> +
>> +	   for (i = 0; i < job->num_gathers; i++) {
>> +		   struct host1x_job_gather *g = &job->gathers[i];
>> +		   job->pin_ids[count] = g->mem_id;
>> +		   count++;
>> +	   }
>> +
>> +	   Then likely something weird occurs... */
>
> We do this because this way we can implement batch pinning of memory
> handles. This way we can reduce memory handle management a lot, as we
> need to do locking only once per submit.

Sorry, I didn't get it. Yes, in the current design you can pin all mem
handles at one time, but I haven't found anything related to "locking
only once per submit".

My idea is:
- remove "job->addr_phys"
- assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
- in "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
  fill "reloc_addr_phys" & "gather_addr_phys" (sketched below)

Did I misunderstand anything?

> Reducing memory management overhead is really important, because in
> complex graphics cases there might be hundreds of buffer references per
> submit, and several submits per frame. Any extra overhead translates
> directly into reduced performance.
>
>> 	job->reloc_addr_phys = job->addr_phys;
>> 	job->gather_addr_phys = &job->addr_phys[num_relocs];
>>
>> @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
>> 		}
>> 	}
>>
>> +	/* I have a question here. Does this mean the address that needs
>> +	   to be relocated (according to the libdrm code, this address
>> +	   seems to be "0xDEADBEEF") always stays at the beginning of a
>> +	   page? */
>> 	__raw_writel(
>> 		(job->reloc_addr_phys[i] +
>> 			reloc->target_offset) >> reloc->shift,
>
> No - the slot can be anywhere. That's why we have cmdbuf_offset in the
> reloc struct.

Yes, sorry - I must have misread the code before.

>> @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
>> 		}
>> 	}
>>
>> +	/* if (host1x_firewall && !err) { */
>> 	if (host1x_firewall) {
>> 		err = copy_gathers(job, pdev);
>> 		if (err) {
>
> Will add.
>
>> @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
>> 		}
>> 	}
>>
>> +/* Rename this label to "out" or something else, because when
>> +   everything goes right, the code under this label also gets
>> +   executed. */
>>  fail:
>> 	wmb();
>
> Will do.
>
>> diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
>> index f3954f7..bb5763d 100644
>> --- a/drivers/gpu/host1x/memmgr.c
>> +++ b/drivers/gpu/host1x/memmgr.c
>> @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct platform_device *dev,
>> 					count, &unpin_data[pin_count],
>> 					phys_addr);
>>
>> +		/* I don't think the current "host1x_cma_pin_array_ids"
>> +		   is able to return a negative value, so this "if"
>> +		   doesn't make sense... */
>> 		if (cma_count < 0) {
>> 			/* clean up previous handles */
>> 			while (pin_count) {
>
> It should return negative in case of error.

"host1x_cma_pin_array_ids" doesn't return a negative value right now, so
maybe you need to take a look at it.

> Terje
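For reference, here is a rough userspace sketch of the two-call alternative Mark proposes above. pin_array_ids() is a hypothetical stand-in for host1x_memmgr_pin_array_ids(), and the physical addresses are invented for the demo; this is not the actual host1x API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for host1x_memmgr_pin_array_ids(): "pins" each
 * id and records a (fake) physical address for it. */
static int pin_array_ids(const unsigned int *ids, size_t count,
			 unsigned long *phys)
{
	for (size_t i = 0; i < count; i++) {
		phys[i] = 0x80000000ul + ids[i] * 0x1000ul; /* invented mapping */
		printf("pinned id %u -> 0x%lx\n", ids[i], phys[i]);
	}
	return (int)count;
}

int main(void)
{
	unsigned int reloc_ids[] = { 3, 7 };
	unsigned int gather_ids[] = { 9, 3 };
	unsigned long reloc_addr_phys[2], gather_addr_phys[2];

	/* Two independent pins: no shared addr_phys table and no ordering
	 * dependency between relocs and gathers... */
	pin_array_ids(reloc_ids, 2, reloc_addr_phys);
	pin_array_ids(gather_ids, 2, gather_addr_phys);
	/* ...but id 3 is pinned twice, since neither call can see the
	 * other's handles - the duplicate-detection point Terje makes in
	 * the next mail. */
	return 0;
}
```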
On 03.01.2013 05:31, Mark Zhang wrote:
> Sorry, I didn't get it. Yes, in the current design you can pin all mem
> handles at one time, but I haven't found anything related to "locking
> only once per submit".
>
> My idea is:
> - remove "job->addr_phys"
> - assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
> - in "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
>   fill "reloc_addr_phys" & "gather_addr_phys"
>
> Did I misunderstand anything?

The current design uses CMA, which makes pinning basically a no-op. When
we have IOMMU support, pinning requires calling into the IOMMU. Updating
the SMMU tables requires locking, and the bookkeeping in front of the
SMMU code probably also requires its own locked data tables. For
example, preventing duplicate pinning might require a global table of
handles.

Putting all of the handles in one table allows doing duplicate detection
across the cmdbuf and reloc tables. This allows pinning each buffer
exactly once, which reduces the number of calls into the IOMMU.

> "host1x_cma_pin_array_ids" doesn't return a negative value right now, so
> maybe you need to take a look at it.

True, and that is also a consequence of using CMA: pinning can never
fail. With IOMMU, pinning can fail.

Terje
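What Terje describes can be sketched in a few lines. Below is a standalone model, not the host1x implementation: pin_handle() stands in for the expensive IOMMU mapping, and the single pin_ids array holds reloc targets and gather buffers together so duplicates across the two groups are pinned only once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the expensive per-handle IOMMU/SMMU mapping. */
static void pin_handle(unsigned int handle)
{
	printf("pinning handle %u\n", handle);
}

/* Pin every unique handle in one merged table exactly once. */
static void pin_job_mem(const unsigned int *pin_ids, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		bool seen = false;

		/* O(n^2) scan for clarity; a real driver would sort the
		 * table or hash the handles. */
		for (size_t j = 0; j < i; j++) {
			if (pin_ids[j] == pin_ids[i]) {
				seen = true;
				break;
			}
		}
		if (!seen)
			pin_handle(pin_ids[i]);
	}
}

int main(void)
{
	/* Reloc targets first, then gather buffers, as in one pin_ids
	 * table; handles 3 and 7 appear in both groups. */
	unsigned int pin_ids[] = { 3, 7, 9, 3, 7 };

	pin_job_mem(pin_ids, sizeof(pin_ids) / sizeof(pin_ids[0]));
	return 0; /* 3, 7 and 9 are each pinned exactly once */
}
```

With two separate arrays, the same duplicate detection would need an extra cross-table pass or a shared lookup structure, which is the overhead the single-table design avoids.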
On 01/03/2013 01:50 PM, Terje Bergström wrote:
> On 03.01.2013 05:31, Mark Zhang wrote:
>> Sorry, I didn't get it. Yes, in the current design you can pin all mem
>> handles at one time, but I haven't found anything related to "locking
>> only once per submit".
>>
>> My idea is:
>> - remove "job->addr_phys"
>> - assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
>> - in "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
>>   fill "reloc_addr_phys" & "gather_addr_phys"
>>
>> Did I misunderstand anything?
>
> The current design uses CMA, which makes pinning basically a no-op. When
> we have IOMMU support, pinning requires calling into the IOMMU. Updating
> the SMMU tables requires locking, and the bookkeeping in front of the
> SMMU code probably also requires its own locked data tables. For
> example, preventing duplicate pinning might require a global table of
> handles.
>
> Putting all of the handles in one table allows doing duplicate detection
> across the cmdbuf and reloc tables. This allows pinning each buffer
> exactly once, which reduces the number of calls into the IOMMU.

Thanks, Terje. Yes, I understand that IOMMU will bring in more
operations, but I'm not convinced that two separate arrays have much
more overhead than one combined array. It doesn't matter, though: once
the version with IOMMU support is released, I can read this part of the
code again. Let's talk about it then.

>> "host1x_cma_pin_array_ids" doesn't return a negative value right now,
>> so maybe you need to take a look at it.
>
> True, and that is also a consequence of using CMA: pinning can never
> fail. With IOMMU, pinning can fail.

Got it. Agreed.

> Terje
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index a936902..c3ded60 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context *context,
 	if (err)
 		goto fail;
 
+	/* We define CMA as a temporary solution in the host1x driver.
+	   That's also why we have a CMA kernel config in it.
+	   But here, in tegradrm, we hardcode CMA.
+	   So what do we do when host1x changes to IOMMU?
+	   Do we also need to define a CMA config in the tegradrm driver,
+	   and then, after host1x changes to IOMMU, add another IOMMU
+	   config in tegradrm? Or should we invent a more generic
+	   wrapper layer here? */
 	cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);
 	if (!cmdbuf.mem)
 		goto fail;

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index cc8ca2f..0867b72 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	mem += num_unpins * sizeof(dma_addr_t);
 	job->pin_ids = num_unpins ? mem : NULL;
 
+	/* I think this is a somewhat ugly design.
+	   "addr_phys" actually consists of "reloc_addr_phys" and
+	   "gather_addr_phys".
+	   Why don't we just declare "reloc_addr_phys" and
+	   "gather_addr_phys" separately?
+	   In the current design, say an nvhost newbie swaps the order
+	   of these two blocks of code in the function "pin_job_mem":
+
+	   for (i = 0; i < job->num_relocs; i++) {
+		   struct host1x_reloc *reloc = &job->relocarray[i];
+		   job->pin_ids[count] = reloc->target;
+		   count++;
+	   }
+
+	   for (i = 0; i < job->num_gathers; i++) {
+		   struct host1x_job_gather *g = &job->gathers[i];
+		   job->pin_ids[count] = g->mem_id;
+		   count++;
+	   }
+
+	   Then likely something weird occurs... */
 	job->reloc_addr_phys = job->addr_phys;
 	job->gather_addr_phys = &job->addr_phys[num_relocs];
 
@@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
 		}
 	}
 
+	/* I have a question here. Does this mean the address that needs
+	   to be relocated (according to the libdrm code, this address
+	   seems to be "0xDEADBEEF") always stays at the beginning of a
+	   page? */
 	__raw_writel(
 		(job->reloc_addr_phys[i] +
 			reloc->target_offset) >> reloc->shift,
@@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
 		}
 	}
 
+	/* if (host1x_firewall && !err) { */
 	if (host1x_firewall) {
 		err = copy_gathers(job, pdev);
 		if (err) {
@@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
 		}
 	}
 
+/* Rename this label to "out" or something else, because when
+   everything goes right, the code under this label also gets
+   executed. */
 fail:
 	wmb();

diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
index f3954f7..bb5763d 100644
--- a/drivers/gpu/host1x/memmgr.c
+++ b/drivers/gpu/host1x/memmgr.c
@@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct platform_device *dev,
 					count, &unpin_data[pin_count],
 					phys_addr);
 
+		/* I don't think the current "host1x_cma_pin_array_ids"
+		   is able to return a negative value, so this "if"
+		   doesn't make sense... */
 		if (cma_count < 0) {
 			/* clean up previous handles */
 			while (pin_count) {