mbox series

[RFT,v3,0/4] ACPI: ACPICA / OSL: Avoid unmapping ACPI memory inside of the AML interpreter

Message ID 2788992.3K7huLjdjL@kreacher (mailing list archive)
Headers show
Series ACPI: ACPICA / OSL: Avoid unmapping ACPI memory inside of the AML interpreter | expand

Message

Rafael J. Wysocki June 26, 2020, 5:28 p.m. UTC
Hi All,

On Monday, June 22, 2020 3:50:42 PM CEST Rafael J. Wysocki wrote:
> Hi All,
> 
> This series is to address the problem with RCU synchronization occurring,
> possibly relatively often, inside of acpi_ex_system_memory_space_handler(),
> when the namespace and interpreter mutexes are held.
> 
> Like I said before, I had decided to change the approach used in the previous
> iteration of this series and to allow the unmap operations carried out by 
> acpi_ex_system_memory_space_handler() to be deferred in the first place,
> which is done in patches [1-2/4].

In the meantime I realized that calling syncrhonize_rcu_expedited() under the
"tables" mutex within ACPICA is not quite a good idea too and that there is no
reason for any users of acpi_os_unmap_memory() in the tree to use the "sync"
variant of unmapping.

So, unless I'm missing something, acpi_os_unmap_memory() can be changed to
always defer the final unmapping and the only ACPICA change needed to support
that is the addition of the acpi_os_release_unused_mappings() call to get rid
of the unused mappings when leaving the interpreter (module the extra call in
the debug code for consistency).

So patches [1-2/4] have been changed accordingly.

> However, it turns out that the "fast-path" mapping is still useful on top of
> the above to reduce the number of ioremap-iounmap cycles for the same address
> range and so it is introduced by patches [3-4/4].

Patches [3-4/4] still do what they did, but they have been simplified a bit
after rebasing on top of the new [1-2/4].

The below information is still valid, but it applies to the v3, of course.

> For details, please refer to the patch changelogs.
> 
> The series is available from the git branch at
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
>  acpica-osl
> 
> for easier testing.

Also the series have been tested locally.

Thanks,
Rafael

Comments

Dan Williams June 26, 2020, 6:41 p.m. UTC | #1
On Fri, Jun 26, 2020 at 10:34 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> Hi All,
>
> On Monday, June 22, 2020 3:50:42 PM CEST Rafael J. Wysocki wrote:
> > Hi All,
> >
> > This series is to address the problem with RCU synchronization occurring,
> > possibly relatively often, inside of acpi_ex_system_memory_space_handler(),
> > when the namespace and interpreter mutexes are held.
> >
> > Like I said before, I had decided to change the approach used in the previous
> > iteration of this series and to allow the unmap operations carried out by
> > acpi_ex_system_memory_space_handler() to be deferred in the first place,
> > which is done in patches [1-2/4].
>
> In the meantime I realized that calling syncrhonize_rcu_expedited() under the
> "tables" mutex within ACPICA is not quite a good idea too and that there is no
> reason for any users of acpi_os_unmap_memory() in the tree to use the "sync"
> variant of unmapping.
>
> So, unless I'm missing something, acpi_os_unmap_memory() can be changed to
> always defer the final unmapping and the only ACPICA change needed to support
> that is the addition of the acpi_os_release_unused_mappings() call to get rid
> of the unused mappings when leaving the interpreter (module the extra call in
> the debug code for consistency).
>
> So patches [1-2/4] have been changed accordingly.
>
> > However, it turns out that the "fast-path" mapping is still useful on top of
> > the above to reduce the number of ioremap-iounmap cycles for the same address
> > range and so it is introduced by patches [3-4/4].
>
> Patches [3-4/4] still do what they did, but they have been simplified a bit
> after rebasing on top of the new [1-2/4].
>
> The below information is still valid, but it applies to the v3, of course.
>
> > For details, please refer to the patch changelogs.
> >
> > The series is available from the git branch at
> >
> >  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> >  acpica-osl
> >
> > for easier testing.
>
> Also the series have been tested locally.

Ok, I'm still trying to get the original reporter to confirm this
reduces the execution time for ASL routines with a lot of OpRegion
touches. Shall I rebuild that test kernel with these changes, or are
the results from the original RFT still interesting?
Rafael J. Wysocki June 28, 2020, 5:09 p.m. UTC | #2
On Fri, Jun 26, 2020 at 8:41 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Jun 26, 2020 at 10:34 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > Hi All,
> >
> > On Monday, June 22, 2020 3:50:42 PM CEST Rafael J. Wysocki wrote:
> > > Hi All,
> > >
> > > This series is to address the problem with RCU synchronization occurring,
> > > possibly relatively often, inside of acpi_ex_system_memory_space_handler(),
> > > when the namespace and interpreter mutexes are held.
> > >
> > > Like I said before, I had decided to change the approach used in the previous
> > > iteration of this series and to allow the unmap operations carried out by
> > > acpi_ex_system_memory_space_handler() to be deferred in the first place,
> > > which is done in patches [1-2/4].
> >
> > In the meantime I realized that calling syncrhonize_rcu_expedited() under the
> > "tables" mutex within ACPICA is not quite a good idea too and that there is no
> > reason for any users of acpi_os_unmap_memory() in the tree to use the "sync"
> > variant of unmapping.
> >
> > So, unless I'm missing something, acpi_os_unmap_memory() can be changed to
> > always defer the final unmapping and the only ACPICA change needed to support
> > that is the addition of the acpi_os_release_unused_mappings() call to get rid
> > of the unused mappings when leaving the interpreter (module the extra call in
> > the debug code for consistency).
> >
> > So patches [1-2/4] have been changed accordingly.
> >
> > > However, it turns out that the "fast-path" mapping is still useful on top of
> > > the above to reduce the number of ioremap-iounmap cycles for the same address
> > > range and so it is introduced by patches [3-4/4].
> >
> > Patches [3-4/4] still do what they did, but they have been simplified a bit
> > after rebasing on top of the new [1-2/4].
> >
> > The below information is still valid, but it applies to the v3, of course.
> >
> > > For details, please refer to the patch changelogs.
> > >
> > > The series is available from the git branch at
> > >
> > >  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > >  acpica-osl
> > >
> > > for easier testing.
> >
> > Also the series have been tested locally.
>
> Ok, I'm still trying to get the original reporter to confirm this
> reduces the execution time for ASL routines with a lot of OpRegion
> touches. Shall I rebuild that test kernel with these changes, or are
> the results from the original RFT still interesting?

I'm mostly interested in the results with the v3 applied.

Also it would be good to check the impact of the first two patches
alone relative to all four.

Thanks!
Dan Williams June 29, 2020, 8:46 p.m. UTC | #3
On Sun, Jun 28, 2020 at 10:09 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Jun 26, 2020 at 8:41 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Fri, Jun 26, 2020 at 10:34 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > Hi All,
> > >
> > > On Monday, June 22, 2020 3:50:42 PM CEST Rafael J. Wysocki wrote:
> > > > Hi All,
> > > >
> > > > This series is to address the problem with RCU synchronization occurring,
> > > > possibly relatively often, inside of acpi_ex_system_memory_space_handler(),
> > > > when the namespace and interpreter mutexes are held.
> > > >
> > > > Like I said before, I had decided to change the approach used in the previous
> > > > iteration of this series and to allow the unmap operations carried out by
> > > > acpi_ex_system_memory_space_handler() to be deferred in the first place,
> > > > which is done in patches [1-2/4].
> > >
> > > In the meantime I realized that calling syncrhonize_rcu_expedited() under the
> > > "tables" mutex within ACPICA is not quite a good idea too and that there is no
> > > reason for any users of acpi_os_unmap_memory() in the tree to use the "sync"
> > > variant of unmapping.
> > >
> > > So, unless I'm missing something, acpi_os_unmap_memory() can be changed to
> > > always defer the final unmapping and the only ACPICA change needed to support
> > > that is the addition of the acpi_os_release_unused_mappings() call to get rid
> > > of the unused mappings when leaving the interpreter (module the extra call in
> > > the debug code for consistency).
> > >
> > > So patches [1-2/4] have been changed accordingly.
> > >
> > > > However, it turns out that the "fast-path" mapping is still useful on top of
> > > > the above to reduce the number of ioremap-iounmap cycles for the same address
> > > > range and so it is introduced by patches [3-4/4].
> > >
> > > Patches [3-4/4] still do what they did, but they have been simplified a bit
> > > after rebasing on top of the new [1-2/4].
> > >
> > > The below information is still valid, but it applies to the v3, of course.
> > >
> > > > For details, please refer to the patch changelogs.
> > > >
> > > > The series is available from the git branch at
> > > >
> > > >  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > >  acpica-osl
> > > >
> > > > for easier testing.
> > >
> > > Also the series have been tested locally.
> >
> > Ok, I'm still trying to get the original reporter to confirm this
> > reduces the execution time for ASL routines with a lot of OpRegion
> > touches. Shall I rebuild that test kernel with these changes, or are
> > the results from the original RFT still interesting?
>
> I'm mostly interested in the results with the v3 applied.
>

Ok, I just got feedback on v2 and it still showed the 30 minute
execution time where 7 minutes was achieved previously.

> Also it would be good to check the impact of the first two patches
> alone relative to all four.

I'll start with the full set and see if they can also support the
"first 2" experiment.
Rafael J. Wysocki June 30, 2020, 11:04 a.m. UTC | #4
On Mon, Jun 29, 2020 at 10:46 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Sun, Jun 28, 2020 at 10:09 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Fri, Jun 26, 2020 at 8:41 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Fri, Jun 26, 2020 at 10:34 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > On Monday, June 22, 2020 3:50:42 PM CEST Rafael J. Wysocki wrote:
> > > > > Hi All,
> > > > >
> > > > > This series is to address the problem with RCU synchronization occurring,
> > > > > possibly relatively often, inside of acpi_ex_system_memory_space_handler(),
> > > > > when the namespace and interpreter mutexes are held.
> > > > >
> > > > > Like I said before, I had decided to change the approach used in the previous
> > > > > iteration of this series and to allow the unmap operations carried out by
> > > > > acpi_ex_system_memory_space_handler() to be deferred in the first place,
> > > > > which is done in patches [1-2/4].
> > > >
> > > > In the meantime I realized that calling syncrhonize_rcu_expedited() under the
> > > > "tables" mutex within ACPICA is not quite a good idea too and that there is no
> > > > reason for any users of acpi_os_unmap_memory() in the tree to use the "sync"
> > > > variant of unmapping.
> > > >
> > > > So, unless I'm missing something, acpi_os_unmap_memory() can be changed to
> > > > always defer the final unmapping and the only ACPICA change needed to support
> > > > that is the addition of the acpi_os_release_unused_mappings() call to get rid
> > > > of the unused mappings when leaving the interpreter (module the extra call in
> > > > the debug code for consistency).
> > > >
> > > > So patches [1-2/4] have been changed accordingly.
> > > >
> > > > > However, it turns out that the "fast-path" mapping is still useful on top of
> > > > > the above to reduce the number of ioremap-iounmap cycles for the same address
> > > > > range and so it is introduced by patches [3-4/4].
> > > >
> > > > Patches [3-4/4] still do what they did, but they have been simplified a bit
> > > > after rebasing on top of the new [1-2/4].
> > > >
> > > > The below information is still valid, but it applies to the v3, of course.
> > > >
> > > > > For details, please refer to the patch changelogs.
> > > > >
> > > > > The series is available from the git branch at
> > > > >
> > > > >  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > > >  acpica-osl
> > > > >
> > > > > for easier testing.
> > > >
> > > > Also the series have been tested locally.
> > >
> > > Ok, I'm still trying to get the original reporter to confirm this
> > > reduces the execution time for ASL routines with a lot of OpRegion
> > > touches. Shall I rebuild that test kernel with these changes, or are
> > > the results from the original RFT still interesting?
> >
> > I'm mostly interested in the results with the v3 applied.
> >
>
> Ok, I just got feedback on v2 and it still showed the 30 minute
> execution time where 7 minutes was achieved previously.

This probably means that "transient" memory opregions, which appear
and go away during the AML execution, are involved and so moving the
RCU synchronization outside of the interpreter and namespace locks is
not enough to cover this case.

It should be covered by the v4
(https://lore.kernel.org/linux-acpi/1666722.UopIai5n7p@kreacher/T/#u),
though, because the unmapping is completely asynchronous in there and
it doesn't add any significant latency to the interpreter exit path.
So I would expect to see much better results with the v4, so I'd
recommend testing this one next.

> > Also it would be good to check the impact of the first two patches
> > alone relative to all four.
>
> I'll start with the full set and see if they can also support the
> "first 2" experiment.

In the v4 there are just two patches, so it should be straightforward
enough to test with and without the top-most one. :-)