diff mbox series

[v18,12/21] dm: add finalize hook to target_type

Message ID 1714775551-22384-13-git-send-email-wufan@linux.microsoft.com (mailing list archive)
State Superseded
Delegated to: Paul Moore
Headers show
Series Integrity Policy Enforcement LSM (IPE) | expand

Commit Message

Fan Wu May 3, 2024, 10:32 p.m. UTC
This patch adds a target finalize hook.

The hook is triggered just before activating an inactive table of a
mapped device. If it returns an error the __bind get cancelled.

The dm-verity target will use this hook to attach the dm-verity's
roothash metadata to the block_device struct of the mapped device.

Signed-off-by: Fan Wu <wufan@linux.microsoft.com>

---
v1-v10:
  + Not present

v11:
  + Introduced

v12:
  + No changes

v13:
  + No changes

v14:
  + Add documentation

v15:
  + No changes

v16:
  + No changes

v17:
  + No changes

v18:
  + No changes
---
 drivers/md/dm.c               | 12 ++++++++++++
 include/linux/device-mapper.h |  9 +++++++++
 2 files changed, 21 insertions(+)

Comments

Mikulas Patocka May 8, 2024, 5:17 p.m. UTC | #1
On Fri, 3 May 2024, Fan Wu wrote:

> This patch adds a target finalize hook.
> 
> The hook is triggered just before activating an inactive table of a
> mapped device. If it returns an error the __bind get cancelled.
> 
> The dm-verity target will use this hook to attach the dm-verity's
> roothash metadata to the block_device struct of the mapped device.
> 
> Signed-off-by: Fan Wu <wufan@linux.microsoft.com>

Hi

Why not use the preresume callback?

Is there some reason why do we need a new callback instead of using the 
existing one?

Mikulas
Fan Wu May 8, 2024, 10:30 p.m. UTC | #2
On 5/8/2024 10:17 AM, Mikulas Patocka wrote:
> 
> 
> On Fri, 3 May 2024, Fan Wu wrote:
> 
>> This patch adds a target finalize hook.
>>
>> The hook is triggered just before activating an inactive table of a
>> mapped device. If it returns an error the __bind get cancelled.
>>
>> The dm-verity target will use this hook to attach the dm-verity's
>> roothash metadata to the block_device struct of the mapped device.
>>
>> Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
> 
> Hi
> 
> Why not use the preresume callback?
> 
> Is there some reason why do we need a new callback instead of using the
> existing one?
> 
> Mikulas
Thanks for the suggestion.

Mike suggested the new finalize() callback. I think the reason for not 
using the preresume() callback is that there are multiple points that 
can fail before activating an inactive table of a mapped device which 
can potentially lead to inconsistent state.

In our specific case, we are trying to associate dm-verity's roothash 
metadata with the block_device struct of the mapped device inside the 
callback.

If we use the preresume() callback for the work and an error occurs 
between the callback and the table activation, this leave the 
block_device struct in an inconsistent state.

This is because now the block device contains the roothash metadata of 
it's inactive table due to the preresume() callback, but the activation 
failed so the mapped device is still using the old table.

The new finalize() callback guarantees that the callback happens just 
before the table activation, thus avoiding the inconsistency.

-Fan
Mikulas Patocka May 9, 2024, 5:07 p.m. UTC | #3
On Wed, 8 May 2024, Fan Wu wrote:

> 
> 
> On 5/8/2024 10:17 AM, Mikulas Patocka wrote:
> > 
> > 
> > On Fri, 3 May 2024, Fan Wu wrote:
> > 
> >> This patch adds a target finalize hook.
> >>
> >> The hook is triggered just before activating an inactive table of a
> >> mapped device. If it returns an error the __bind get cancelled.
> >>
> >> The dm-verity target will use this hook to attach the dm-verity's
> >> roothash metadata to the block_device struct of the mapped device.
> >>
> >> Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
> > 
> > Hi
> > 
> > Why not use the preresume callback?
> > 
> > Is there some reason why do we need a new callback instead of using the
> > existing one?
> > 
> > Mikulas
> Thanks for the suggestion.
> 
> Mike suggested the new finalize() callback. I think the reason for not using
> the preresume() callback is that there are multiple points that can fail
> before activating an inactive table of a mapped device which can potentially
> lead to inconsistent state.
> 
> In our specific case, we are trying to associate dm-verity's roothash metadata
> with the block_device struct of the mapped device inside the callback.
> 
> If we use the preresume() callback for the work and an error occurs between
> the callback and the table activation, this leave the block_device struct in
> an inconsistent state.

The preresume callback is the final GO/NO-GO decision point. If all the 
targets return zero in their preresume callback, then there's no turning 
back and the table will be activated.

> This is because now the block device contains the roothash metadata of it's
> inactive table due to the preresume() callback, but the activation failed so
> the mapped device is still using the old table.
> 
> The new finalize() callback guarantees that the callback happens just before
> the table activation, thus avoiding the inconsistency.

In your patch, it doesn't guarantee that.

do_resume calls dm_swap_table, dm_swap_table calls __bind, __bind calls 
ti->type->finalize. Then we go back to do_resume and call dm_resume which 
calls __dm_resume which calls dm_table_resume_targets which calls the 
preresume callback on all the targets. If some of them fails, it returns a 
failure (despite the fact that ti->type->finalize succeeded), if all of 
them succeed, it calls the resume callback on all of them.

So, it seems that the preresume callback provides the guarantee that you 
looking for.

> -Fan

Mikulas
Fan Wu May 17, 2024, 7:13 p.m. UTC | #4
On 5/9/2024 10:07 AM, Mikulas Patocka wrote:
> 
> 
> On Wed, 8 May 2024, Fan Wu wrote:
> 
>>
>>
>> On 5/8/2024 10:17 AM, Mikulas Patocka wrote:
>>>
>>>
>>> On Fri, 3 May 2024, Fan Wu wrote:
>>>
>>>> This patch adds a target finalize hook.
>>>>
>>>> The hook is triggered just before activating an inactive table of a
>>>> mapped device. If it returns an error the __bind get cancelled.
>>>>
>>>> The dm-verity target will use this hook to attach the dm-verity's
>>>> roothash metadata to the block_device struct of the mapped device.
>>>>
>>>> Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
>>>
>>> Hi
>>>
>>> Why not use the preresume callback?
>>>
>>> Is there some reason why do we need a new callback instead of using the
>>> existing one?
>>>
>>> Mikulas
>> Thanks for the suggestion.
>>
>> Mike suggested the new finalize() callback. I think the reason for not using
>> the preresume() callback is that there are multiple points that can fail
>> before activating an inactive table of a mapped device which can potentially
>> lead to inconsistent state.
>>
>> In our specific case, we are trying to associate dm-verity's roothash metadata
>> with the block_device struct of the mapped device inside the callback.
>>
>> If we use the preresume() callback for the work and an error occurs between
>> the callback and the table activation, this leave the block_device struct in
>> an inconsistent state.
> 
> The preresume callback is the final GO/NO-GO decision point. If all the
> targets return zero in their preresume callback, then there's no turning
> back and the table will be activated.
> 
>> This is because now the block device contains the roothash metadata of it's
>> inactive table due to the preresume() callback, but the activation failed so
>> the mapped device is still using the old table.
>>
>> The new finalize() callback guarantees that the callback happens just before
>> the table activation, thus avoiding the inconsistency.
> 
> In your patch, it doesn't guarantee that.
> 
> do_resume calls dm_swap_table, dm_swap_table calls __bind, __bind calls
> ti->type->finalize. Then we go back to do_resume and call dm_resume which
> calls __dm_resume which calls dm_table_resume_targets which calls the
> preresume callback on all the targets. If some of them fails, it returns a
> failure (despite the fact that ti->type->finalize succeeded), if all of
> them succeed, it calls the resume callback on all of them.
> 
> So, it seems that the preresume callback provides the guarantee that you
> looking for.
> 
>> -Fan
> 
> Mikulas

Thanks for the info. I have tested and verified that the preresume() 
hook can also work for our case.

 From the source code 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-ioctl.c#n1149, 
the whole resume process appears to be:

1. Check if there is a new map for the device. If so, attempt to 
activate the new map using dm_swap_table() (where the finalize() 
callback occurs).

2. Check if the device is suspended. If so, use dm_resume() (where the 
preresume() callback occurs) to resume the device.

3. If a new map is activated, use dm_table_destroy() to destroy the old map.

For our case:

- Using the finalize() callback, the metadata of the dm-verity target 
inside the table is attached to the mapped device every time a new table 
is activated.
- Using the preresume() callback, the same metadata is attached every 
time the device resumes from suspension.

If I understand the code correctly, resuming from suspension is a 
necessary step for loading a new mapping table. Thus, the preresume() 
callback covers all conditions where the finalize() callback would be 
triggered. However, the preresume() callback can also be triggered when 
the device resumes from suspension without loading a new table, in which 
case there is no new metadata in the table to attach to the mapped device.

In the scenario where the finalize() callback succeeds but the 
preresume() callback fails, it seems the device will remain in a 
suspended state, the newly activated table will be kept, and the old 
table will be destroyed, so it seems there is no inconsistency using 
finalize() even preresume() potentially fails.

I believe both the finalize() callback proposed by Mike and the 
preresume() callback suggested by Mikulas can work for our case. I am 
fine with either approach, but I would like to know which one is 
preferred by the maintainers and would appreciate an ACK for the chosen 
approach.

-Fan
Mikulas Patocka May 20, 2024, 12:31 p.m. UTC | #5
On Fri, 17 May 2024, Fan Wu wrote:

> > So, it seems that the preresume callback provides the guarantee that you
> > looking for.
> > 
> >> -Fan
> > 
> > Mikulas
> 
> Thanks for the info. I have tested and verified that the preresume() hook can
> also work for our case.
> 
> From the source code
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-ioctl.c#n1149,
> the whole resume process appears to be:
> 
> 1. Check if there is a new map for the device. If so, attempt to activate the
> new map using dm_swap_table() (where the finalize() callback occurs).
> 
> 2. Check if the device is suspended. If so, use dm_resume() (where the
> preresume() callback occurs) to resume the device.
> 
> 3. If a new map is activated, use dm_table_destroy() to destroy the old map.
> 
> For our case:
> 
> - Using the finalize() callback, the metadata of the dm-verity target inside
> the table is attached to the mapped device every time a new table is
> activated.
> - Using the preresume() callback, the same metadata is attached every time the
> device resumes from suspension.
> 
> If I understand the code correctly, resuming from suspension is a necessary
> step for loading a new mapping table. Thus, the preresume() callback covers
> all conditions where the finalize() callback would be triggered.

Yes.

> However, the preresume() callback can also be triggered when the device 
> resumes from suspension without loading a new table, in which case there 
> is no new metadata in the table to attach to the mapped device.

Yes.

> In the scenario where the finalize() callback succeeds but the preresume()
> callback fails, it seems the device will remain in a suspended state, the
> newly activated table will be kept, and the old table will be destroyed, so it
> seems there is no inconsistency using finalize() even preresume() potentially
> fails.

What does your security module do when the verification of the dm-verity 
hash fails? Does it halt the whole system? Does it destroy just the 
failing dm device? Or does it attempt to recover somehow from this 
situation?

> I believe both the finalize() callback proposed by Mike and the preresume()
> callback suggested by Mikulas can work for our case. I am fine with either
> approach, but I would like to know which one is preferred by the maintainers
> and would appreciate an ACK for the chosen approach.
> 
> -Fan

I would prefer preresume - we shouldn't add new callbacks unless it's 
necessary.

Mikulas
Fan Wu May 21, 2024, 9:42 p.m. UTC | #6
On 5/20/2024 5:31 AM, Mikulas Patocka wrote:
> 
> 
> On Fri, 17 May 2024, Fan Wu wrote:
> 
>>> So, it seems that the preresume callback provides the guarantee that you
>>> looking for.
>>>
>>>> -Fan
>>>
>>> Mikulas
>>
>> Thanks for the info. I have tested and verified that the preresume() hook can
>> also work for our case.
>>
>>  From the source code
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-ioctl.c#n1149,
>> the whole resume process appears to be:
>>
>> 1. Check if there is a new map for the device. If so, attempt to activate the
>> new map using dm_swap_table() (where the finalize() callback occurs).
>>
>> 2. Check if the device is suspended. If so, use dm_resume() (where the
>> preresume() callback occurs) to resume the device.
>>
>> 3. If a new map is activated, use dm_table_destroy() to destroy the old map.
>>
>> For our case:
>>
>> - Using the finalize() callback, the metadata of the dm-verity target inside
>> the table is attached to the mapped device every time a new table is
>> activated.
>> - Using the preresume() callback, the same metadata is attached every time the
>> device resumes from suspension.
>>
>> If I understand the code correctly, resuming from suspension is a necessary
>> step for loading a new mapping table. Thus, the preresume() callback covers
>> all conditions where the finalize() callback would be triggered.
> 
> Yes.
> 
>> However, the preresume() callback can also be triggered when the device
>> resumes from suspension without loading a new table, in which case there
>> is no new metadata in the table to attach to the mapped device.
> 
> Yes.
> 
>> In the scenario where the finalize() callback succeeds but the preresume()
>> callback fails, it seems the device will remain in a suspended state, the
>> newly activated table will be kept, and the old table will be destroyed, so it
>> seems there is no inconsistency using finalize() even preresume() potentially
>> fails.
> 
> What does your security module do when the verification of the dm-verity
> hash fails? Does it halt the whole system? Does it destroy just the
> failing dm device? Or does it attempt to recover somehow from this
> situation?
>

I'm not sure which hash verification is being referred to here, but it 
could be either root hash signature verification or block-level hash 
verification. Our security module does not intervene in these processes, 
so the behavior remains as dm-verity currently handles it.

Within the device mapper, our security module uses the device mapper 
callback to duplicate the root hash of a dm-verity target and record the 
signature verification state of the dm-verity target, then attach this 
information to the security field of the block_device structure. This 
process can only fail if the system is out of memory.

With the root hash and signature verification state attached to the 
security field of the block device, the security system can access this 
important metadata to enforce policies. For example, these policies can 
include only allowing files from a dm-verity volume specified by its 
root hash to execute or only allowing files from a verified signed 
dm-verity volume to execute.

>> I believe both the finalize() callback proposed by Mike and the preresume()
>> callback suggested by Mikulas can work for our case. I am fine with either
>> approach, but I would like to know which one is preferred by the maintainers
>> and would appreciate an ACK for the chosen approach.
>>
>> -Fan
> 
> I would prefer preresume - we shouldn't add new callbacks unless it's
> necessary.
> 
> Mikulas
>

Thanks for the confirmation. I will switch to use prereume and I will 
send a new version later this week.

-Fan
diff mbox series

Patch

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7d0746b37c8e..a748c3735156 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2276,6 +2276,18 @@  static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 		goto out;
 	}
 
+	for (unsigned int i = 0; i < t->num_targets; i++) {
+		struct dm_target *ti = dm_table_get_target(t, i);
+
+		if (ti->type->finalize) {
+			ret = ti->type->finalize(ti);
+			if (ret) {
+				old_map = ERR_PTR(ret);
+				goto out;
+			}
+		}
+	}
+
 	old_map = rcu_dereference_protected(md->map, lockdep_is_held(&md->suspend_lock));
 	rcu_assign_pointer(md->map, (void *)t);
 	md->immutable_target_type = dm_table_get_immutable_target_type(t);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 82b2195efaca..ad368904b1d5 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -160,6 +160,14 @@  typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
  */
 typedef size_t (*dm_dax_recovery_write_fn)(struct dm_target *ti, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i);
+/*
+ * This hook allows DM targets in an inactive table to complete their setup
+ * before the table is made active.
+ * Returns:
+ *  < 0 : error
+ *  = 0 : success
+ */
+typedef int (*dm_finalize_fn) (struct dm_target *target);
 
 void dm_error(const char *message);
 
@@ -210,6 +218,7 @@  struct target_type {
 	dm_dax_direct_access_fn direct_access;
 	dm_dax_zero_page_range_fn dax_zero_page_range;
 	dm_dax_recovery_write_fn dax_recovery_write;
+	dm_finalize_fn finalize;
 
 	/* For internal device-mapper use. */
 	struct list_head list;