[v5,00/14] integrity: Introduce the Integrity Digest Cache

Message ID	20240905150543.3766895-1-roberto.sassu@huaweicloud.com (mailing list archive)
From: Roberto Sassu <roberto.sassu@huaweicloud.com>
To: zohar@linux.ibm.com, dmitry.kasatkin@gmail.com, eric.snowberg@oracle.com, corbet@lwn.net, akpm@linux-foundation.org, paul@paul-moore.com, jmorris@namei.org, serge@hallyn.com, shuah@kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com
Cc: linux-integrity@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kselftest@vger.kernel.org, wufan@linux.microsoft.com, pbrobinson@gmail.com, zbyszek@in.waw.pl, hch@lst.de, mjg59@srcf.ucam.org, pmatilai@redhat.com, jannh@google.com, dhowells@redhat.com, jikos@kernel.org, mkoutny@suse.com, ppavlu@suse.com, petr.vorel@gmail.com, mzerqung@0pointer.de, kgold@linux.ibm.com, Roberto Sassu <roberto.sassu@huawei.com>
Subject: [PATCH v5 00/14] integrity: Introduce the Integrity Digest Cache
Date: Thu, 5 Sep 2024 17:05:29 +0200

From: Roberto Sassu <roberto.sassu@huawei.com>

Integrity detection and protection has long been a desirable feature, to
reach a large user base and mitigate the risk of flaws in the software
and attacks.

However, while solutions exist, they struggle to reach a large user base,
due to requiring higher than desired constraints on performance,
flexibility and configurability, that only security conscious people are
willing to accept.

For example, IMA measurement requires the target platform to collect
integrity measurements, and to protect them with the TPM, which
introduces a noticeable overhead (up to 10x slower in a microbenchmark)
on frequently used system calls, like open().

IMA Appraisal currently requires individual files to be signed and
verified, and Linux distributions to rebuild all packages to include
file signatures (this approach has been adopted since Fedora 39). Like
TPM operations, signature verification also introduces a significant
overhead, especially if it is used to check the integrity of many files.

This is where the new Integrity Digest Cache comes into play: it offers
additional support for new and existing integrity solutions, to make
them faster and easier to deploy.

The Integrity Digest Cache can help IMA to reduce the number of TPM
operations and to make them happen in a deterministic way. If IMA knows
that a file comes from a Linux distribution, it can measure files in a
different way: measure the list of digests coming from the distribution
(e.g. RPM package headers), and subsequently measure a file if it is not
found in that list.

The performance improvement comes at the cost of IMA not reporting which
files from installed packages were accessed, and in which temporal
sequence. This approach might not be suitable for all use cases.

The Integrity Digest Cache can also help IMA for appraisal. IMA can
simply look up the calculated digest of an accessed file in the list of
digests extracted from package headers, after verifying the header
signature. It is sufficient to verify only one signature for all files
in the package, as opposed to verifying a signature for each file.

The same approach can be followed by other LSMs, such as Integrity
Policy Enforcement (IPE), and BPF LSM.

The Integrity Digest Cache is not tied to a specific package format.
While it currently supports a TLV-based format and the RPM format, it
can be easily extended to support more formats, such as DEBs. Focusing
on just extracting digests keeps these parsers minimal and reasonably
simple (e.g. the RPM parser has ~220 LOC). Included parsers have been
verified for memory safety with the Frama-C static analyzer. The parsers
with the Frama-C assertions are available here:

https://github.com/robertosassu/rpm-formal/

Integrating the Integrity Digest Cache in IMA brings significant
performance improvements: up to 67% and 79% for measurement respectively
in sequential and parallel file reads; up to 65% and 43% for appraisal
respectively in sequential and parallel file reads.

The performance can be further enhanced by using fsverity digests
instead of conventional file digests, which would make IMA verify only
the portion of the file to be read. However, at the moment, fsverity
digests are not included in RPM packages. In this case, once rpm is
extended to include them, Linux distributions still have to rebuild
their packages.

The Integrity Digest Cache can support both digest types, so that the
functionality is immediately available without waiting for Linux
distributions to make the transition.

This patch set only includes the patches necessary to extract digests
from TLV-based and RPM data formats, and exposes an API for LSMs to
query them. A separate patch set will be provided to integrate it in
IMA.
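
The appraisal flow described above (verify one signature over the digest list, then answer per-file queries with a lookup) can be modelled in userspace roughly as follows. This is a minimal sketch, not the kernel API from this patch set: the `algo:hexdigest` line format, the function names and the HMAC used as a stand-in for the package header signature are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Dict, Set

# Hypothetical stand-in for the distribution's packaging key. A real system
# would verify an asymmetric signature on the package header, not an HMAC.
PACKAGING_KEY = b"example-packaging-key"


def sign_digest_list(digest_list: bytes, key: bytes = PACKAGING_KEY) -> bytes:
    """Produce the (stand-in) signature a distribution would ship."""
    return hmac.new(key, digest_list, hashlib.sha256).digest()


def build_digest_cache(digest_list: bytes, signature: bytes,
                       key: bytes = PACKAGING_KEY) -> Dict[str, Set[bytes]]:
    """Verify the list once, then index its digests per algorithm."""
    expected = hmac.new(key, digest_list, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("digest list signature does not verify")
    cache: Dict[str, Set[bytes]] = {}
    # Assumed line format: "<algorithm>:<hex digest>", one entry per line.
    for line in digest_list.decode().splitlines():
        algo, hexdigest = line.split(":", 1)
        cache.setdefault(algo, set()).add(bytes.fromhex(hexdigest))
    return cache


def appraise(cache: Dict[str, Set[bytes]], algo: str, file_data: bytes) -> bool:
    """Calculate the file digest and look it up; no per-file signature."""
    digest = hashlib.new(algo, file_data).digest()
    return digest in cache.get(algo, set())
```

In this model the cost of signature verification is paid once per digest list, while each file access still pays for calculating the file digest plus a set lookup.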
This patch set and the follow-up IMA integration can be tested by
following the instructions at:

https://github.com/linux-integrity/digest-cache-tools

This patch set applies on top of:

https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git/log/?h=next-integrity

with commit fa8a4ce432e8 ("ima: fix buffer overrun in
ima_eventdigest_init_common").

Changelog

v4:
- Rename digest_cache LSM to Integrity Digest Cache (suggested by Paul
  Moore)
- Update documentation
- Remove forward declaration of struct digest_cache in
  include/linux/digest_cache.h (suggested by Jarkko)
- Add DIGEST_CACHE_FREE digest cache event for notification
- Remove digest_cache_found_t typedef and use uintptr_t instead
- Add header callback in TLV parser and unexport tlv_parse_hdr() and
  tlv_parse_data()
- Plug the Integrity Digest Cache into the 'ima' LSM
- Switch from constructor to zeroing the object cache
- Remove notifier and detect digest cache changes by comparing pointers
- Rename digest_cache_dir_create() to digest_cache_dir_add_entries()
- Introduce digest_cache_dir_create() to create and initialize a
  directory digest cache
- Introduce digest_cache_dir_update_dig_user() to update dig_user with a
  file digest cache on positive digest lookup
- Use up to date directory digest cache, to take into account possible
  inode eviction for the old ones
- Introduce digest_cache_dir_prefetch() to prefetch digest lists
- Adjust component name in debug messages (suggested by Jarkko)
- Add FILE_PREFETCH and FILE_READ digest cache flags, remove RESET_USER
- Reintroduce spin lock for digest cache verification data (needed for
  the selftests)
- Get inode and file descriptor security blob offsets from outside (IMA)
- Avoid use-after-free in digest_cache_unref() by decrementing the ref.
  count after printing the debug message
- Check for digest list lookup loops also for the parent directory
- Put and clear dig_owner directly in digest_cache_reset_clear_owner()
- Move digest cache initialization code from digest_cache_create() to
  digest_cache_init()
- Hold the digest list path until the digest cache is initialized (to
  avoid premature inode eviction)
- Avoid race condition on setting DIR_PREFETCH in the directory digest
  cache
- Introduce digest_cache_dir_prefetch() and do it between digest cache
  creation and initialization (to avoid lock inversion)
- Avoid unnecessary length check in digest_list_parse_rpm()
- Declare arrays of strings in tlv parser as static
- Emit reset for parent directory on directory entry modification
- Rename digest_cache_reset_owner() to digest_cache_reset_clear_owner()
  and digest_cache_reset_user() to digest_cache_clear_user()
- Execute digest_cache_file_release() if either FMODE_WRITE or
  FMODE_CREATED is set in the file descriptor f_mode
- Determine in digest_cache_verif_set() which gfp flag to use depending
  on verifier ID
- Update selftests

v3:
- Rewrite documentation, and remove the installation instructions since
  they are now included in the README of digest-cache-tools
- Add digest cache event notifier
- Drop digest_cache_was_reset(), and send asynchronous notifications
  instead
- Fix digest_cache LSM Kconfig style issues (suggested by Randy Dunlap)
- Propagate digest cache reset to directory entries
- Destroy per directory entry mutex
- Introduce RESET_USER bit, to clear the dig_user pointer on
  set/removexattr
- Replace 'file content' with 'file data' (suggested by Mimi)
- Introduce per digest cache mutex and replace verif_data_lock spinlock
- Track changes of security.digest_list xattr
- Stop tracking file_open and use file_release instead also for file
  writes
- Add error messages in digest_cache_create()
- Load/unload testing kernel module automatically during execution of
  test
- Add tests for digest cache event notifier
- Add test for ftruncate()
- Remove DIGEST_CACHE_RESET_PREFETCH_BUF command in test and clear the
  buffer on read instead

v2:
- Include the TLV parser in this patch set (from user asymmetric keys
  and signatures)
- Move from IMA and make an independent LSM
- Remove IMA-specific stuff from this patch set
- Add per algorithm hash table
- Expect all digest lists to be in the same directory and allow changing
  the default directory
- Support digest lookup on directories, when there is no
  security.digest_list xattr
- Add seq num to digest list file name, to impose ordering on directory
  iteration
- Add a new data type DIGEST_LIST_ENTRY_DATA for the nested data in the
  tlv digest list format
- Add the concept of verification data attached to digest caches
- Add the reset mechanism to track changes on digest lists and directory
  containing the digest lists
- Add kernel selftests

v1:
- Add documentation in Documentation/security/integrity-digest-cache.rst
- Pass the mask of IMA actions to digest_cache_alloc()
- Add a reference count to the digest cache
- Remove the path parameter from digest_cache_get(), and rely on the
  reference count to avoid the digest cache disappearing while being
  used
- Rename the dentry_to_check parameter of digest_cache_get() to dentry
- Rename digest_cache_get() to digest_cache_new() and add
  digest_cache_get() to set the digest cache in the iint of the inode
  for which the digest cache was requested
- Add dig_owner and dig_user to the iint, to distinguish from which
  inode the digest cache was created, and which is using it;
  consequently it makes the digest cache usable to measure/appraise
  other digest caches (support not yet enabled)
- Add dig_owner_mutex and dig_user_mutex to serialize accesses to
  dig_owner and dig_user until they are initialized
- Enforce strong synchronization and make the contenders wait until
  dig_owner and dig_user are assigned to the iint the first time
- Move checking IMA actions on the digest list earlier, and fail if no
  action was performed (digest cache not usable)
- Remove digest_cache_put(), not needed anymore with the introduction of
  the reference count
- Fail immediately in digest_cache_lookup() if the digest algorithm is
  not set in the digest cache
- Use 64 bit mask for IMA actions on the digest list instead of 8 bit
- Return NULL in the inline version of digest_cache_get()
- Use list_add_tail() instead of list_add() in the iterator
- Copy the digest list path to a separate buffer in
  digest_cache_iter_dir()
- Use digest list parsers verified with Frama-C
- Explicitly disable (for now) the possibility in the IMA policy to use
  the digest cache to measure/appraise other digest lists
- Replace exit(<value>) with return <value> in manage_digest_lists.c

Roberto Sassu (14):
  lib: Add TLV parser
  integrity: Introduce the Integrity Digest Cache
  digest_cache: Initialize digest caches
  digest_cache: Add securityfs interface
  digest_cache: Add hash tables and operations
  digest_cache: Populate the digest cache from a digest list
  digest_cache: Parse tlv digest lists
  digest_cache: Parse rpm digest lists
  digest_cache: Add management of verification data
  digest_cache: Add support for directories
  digest cache: Prefetch digest lists if requested
  digest_cache: Reset digest cache on file/directory change
  selftests/digest_cache: Add selftests for the Integrity Digest Cache
  docs: Add documentation of the Integrity Digest Cache

 Documentation/security/digest_cache.rst       | 814 ++++++++++++++++++
 Documentation/security/index.rst              |   1 +
 MAINTAINERS                                   |  10 +
 include/linux/digest_cache.h                  |  58 ++
 include/linux/kernel_read_file.h              |   1 +
 include/linux/tlv_parser.h                    |  48 ++
 include/uapi/linux/tlv_digest_list.h          |  72 ++
 include/uapi/linux/tlv_parser.h               |  62 ++
 include/uapi/linux/xattr.h                    |   6 +
 lib/Kconfig                                   |   3 +
 lib/Makefile                                  |   2 +
 lib/tlv_parser.c                              | 221 +++++
 lib/tlv_parser.h                              |  17 +
 security/integrity/Kconfig                    |   1 +
 security/integrity/Makefile                   |   1 +
 security/integrity/digest_cache/Kconfig       |  33 +
 security/integrity/digest_cache/Makefile      |  11 +
 security/integrity/digest_cache/dir.c         | 397 +++++++++
 security/integrity/digest_cache/htable.c      | 254 ++++++
 security/integrity/digest_cache/internal.h    | 277 ++++++
 security/integrity/digest_cache/main.c        | 559 ++++++++++++
 security/integrity/digest_cache/modsig.c      |  66 ++
 .../integrity/digest_cache/parsers/parsers.h  |  15 +
 security/integrity/digest_cache/parsers/rpm.c | 220 +++++
 security/integrity/digest_cache/parsers/tlv.c | 341 ++++++++
 security/integrity/digest_cache/populate.c    | 157 ++++
 security/integrity/digest_cache/reset.c       | 227 +++++
 security/integrity/digest_cache/secfs.c       | 104 +++
 security/integrity/digest_cache/verif.c       | 131 +++
 security/integrity/ima/ima.h                  |   1 +
 security/integrity/ima/ima_fs.c               |   6 +
 security/integrity/ima/ima_main.c             |  11 +-
 tools/testing/selftests/Makefile              |   1 +
 .../testing/selftests/digest_cache/.gitignore |   3 +
 tools/testing/selftests/digest_cache/Makefile |  24 +
 .../testing/selftests/digest_cache/all_test.c | 749 ++++++++++++++++
 tools/testing/selftests/digest_cache/common.c |  78 ++
 tools/testing/selftests/digest_cache/common.h | 134 +++
 .../selftests/digest_cache/common_user.c      |  47 +
 .../selftests/digest_cache/common_user.h      |  17 +
 tools/testing/selftests/digest_cache/config   |   1 +
 .../selftests/digest_cache/generators.c       | 248 ++++++
 .../selftests/digest_cache/generators.h       |  19 +
 .../selftests/digest_cache/testmod/Makefile   |  16 +
 .../selftests/digest_cache/testmod/kern.c     | 501 +++++++++++
 45 files changed, 5964 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/security/digest_cache.rst
 create mode 100644 include/linux/digest_cache.h
 create mode 100644 include/linux/tlv_parser.h
 create mode 100644 include/uapi/linux/tlv_digest_list.h
 create mode 100644 include/uapi/linux/tlv_parser.h
 create mode 100644 lib/tlv_parser.c
 create mode 100644 lib/tlv_parser.h
 create mode 100644 security/integrity/digest_cache/Kconfig
 create mode 100644 security/integrity/digest_cache/Makefile
 create mode 100644 security/integrity/digest_cache/dir.c
 create mode 100644 security/integrity/digest_cache/htable.c
 create mode 100644 security/integrity/digest_cache/internal.h
 create mode 100644 security/integrity/digest_cache/main.c
 create mode 100644 security/integrity/digest_cache/modsig.c
 create mode 100644 security/integrity/digest_cache/parsers/parsers.h
 create mode 100644 security/integrity/digest_cache/parsers/rpm.c
 create mode 100644 security/integrity/digest_cache/parsers/tlv.c
 create mode 100644 security/integrity/digest_cache/populate.c
 create mode 100644 security/integrity/digest_cache/reset.c
 create mode 100644 security/integrity/digest_cache/secfs.c
 create mode 100644 security/integrity/digest_cache/verif.c
 create mode 100644 tools/testing/selftests/digest_cache/.gitignore
 create mode 100644 tools/testing/selftests/digest_cache/Makefile
 create mode 100644 tools/testing/selftests/digest_cache/all_test.c
 create mode 100644 tools/testing/selftests/digest_cache/common.c
 create mode 100644 tools/testing/selftests/digest_cache/common.h
 create mode 100644 tools/testing/selftests/digest_cache/common_user.c
 create mode 100644 tools/testing/selftests/digest_cache/common_user.h
 create mode 100644 tools/testing/selftests/digest_cache/config
 create mode 100644 tools/testing/selftests/digest_cache/generators.c
 create mode 100644 tools/testing/selftests/digest_cache/generators.h
 create mode 100644 tools/testing/selftests/digest_cache/testmod/Makefile
 create mode 100644 tools/testing/selftests/digest_cache/testmod/kern.c

On Thu, Sep 05, 2024 at 05:05:29PM +0200, Roberto Sassu wrote:

Good morning, I hope the week is starting well for everyone.

Apologies for the delay in getting these thoughts out, scrambling to
catch up on my e-mail backlog.

I looped Linus in, secondary to the conversations surrounding the PGP
verification infrastructure in the kernel, given that the primary use
case at this time appears to be the digest cache and his concerns
regarding that use.

Our proposed TSEM LSM, most recent submission here:

https://lore.kernel.org/linux-security-module/20240826103728.3378-1-greg@enjellic.com/T/#t

is a superset of IMA functionality and depends heavily on file
checksums, hence our interest in and reflections on your efforts with
this.

> From: Roberto Sassu <roberto.sassu@huawei.com>
>
> Integrity detection and protection has long been a desirable feature, to
> reach a large user base and mitigate the risk of flaws in the software
> and attacks.
>
> However, while solutions exist, they struggle to reach a large user base,
> due to requiring higher than desired constraints on performance,
> flexibility and configurability, that only security conscious people are
> willing to accept.

No argument here, inherent in better and more effective security
architectures is better usability, pure and simple.

> For example, IMA measurement requires the target platform to collect
> integrity measurements, and to protect them with the TPM, which
> introduces a noticeable overhead (up to 10x slower in a
> microbenchmark) on frequently used system calls, like open().

The future for trusted systems will not be in TPMs, as unpopular a
notion as that may be in some circles. They represent a design from a
quarter century ago that struggles to have relevance with our current
system architectures.

If a TPM is present, TSEM will extend the security coefficients for the
root modeling namespace into a PCR to establish a root of trust that the
rest of the trust orchestration system can be built on.
Ours is a worst case scenario beyond IMA since there is a coefficient
generated for each LSM call that is being modeled. We had to go to
asynchronous updates through an ordered workqueue in order to have
something less than abysmal performance, even with vTPMs running in a
Xen hypervisor domain.

This is without the current performance impacts being discussed with
respect to HMAC based TPM session authentication.

> IMA Appraisal currently requires individual files to be signed and
> verified, and Linux distributions to rebuild all packages to include
> file signatures (this approach has been adopted since Fedora 39).
> Like TPM operations, signature verification also introduces a
> significant overhead, especially if it is used to check the
> integrity of many files.
>
> This is where the new Integrity Digest Cache comes into play: it
> offers additional support for new and existing integrity solutions,
> to make them faster and easier to deploy.
>
> The Integrity Digest Cache can help IMA to reduce the number of TPM
> operations and to make them happen in a deterministic way. If IMA
> knows that a file comes from a Linux distribution, it can measure
> files in a different way: measure the list of digests coming from
> the distribution (e.g. RPM package headers), and subsequently
> measure a file if it is not found in that list.
>
> The performance improvement comes at the cost of IMA not reporting
> which files from installed packages were accessed, and in which
> temporal sequence. This approach might not be suitable for all use
> cases.

That, in and of itself, is certainly not the end of the world.

With TSEM we offer the notion of the 'state' of a security namespace,
which is the extension sum of the security coefficients after they have
been sorted in natural (big-endian) hash order. In this model you know
what files have been accessed but you do not have a statement on
temporal ordering of access.
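
The difference between a linear extension and a sorted 'state' value can be illustrated with a short sketch. This is only a model of the idea as described above, assuming SHA-256 and a PCR-style extend of the form hash(accumulator || value); the function names are hypothetical and TSEM's actual computation may differ in its details.

```python
import hashlib
from typing import Iterable, List


def extend(accumulator: bytes, coefficient: bytes) -> bytes:
    """PCR-style extend: new = SHA-256(old || coefficient)."""
    return hashlib.sha256(accumulator + coefficient).digest()


def linear_extension(coefficients: Iterable[bytes]) -> bytes:
    """Fold coefficients in arrival order; the result depends on that order."""
    value = bytes(32)  # start from an all-zero register
    for coefficient in coefficients:
        value = extend(value, coefficient)
    return value


def namespace_state(coefficients: List[bytes]) -> bytes:
    """Sort first, then extend; the result no longer depends on arrival order.

    For equal-length digests, Python's bytewise comparison matches
    big-endian numeric order.
    """
    return linear_extension(sorted(coefficients))
```

Two runs that touch the same set of files in a different order yield the same `namespace_state()` value but, in general, different `linear_extension()` values, which is the trade-off being described: membership is attested, temporal ordering is not.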
Given scheduling artifacts, let alone the almost absolute ubiquity of
multi-core, the simple TPM/TCG linear extension model seems to struggle
with respect to any relevancy as a security metric.

> The Integrity Digest Cache can also help IMA for appraisal. IMA can simply
> look up the calculated digest of an accessed file in the list of digests
> extracted from package headers, after verifying the header signature. It is
> sufficient to verify only one signature for all files in the package, as
> opposed to verifying a signature for each file.
>
> The same approach can be followed by other LSMs, such as Integrity Policy
> Enforcement (IPE), and BPF LSM.

As we've noted above, TSEM would also be a potential consumer, which is
why we wanted to seek clarifications on the architecture.

We've reviewed the patch set and the documentation, and will freely
admit that we may still misunderstand all of this, but it would seem
that the architecture, as it stands, would be subject to Time Of
Measurement/Time Of Use (TOMTOU) challenges.

The Time Of Measurement will be when the distribution generates an RPM,
or equivalent construct, i.e. a .deb, and signs the digest list with
their packaging key.

What is elusive to us is how there can be an expectation that the file,
on medium, when accessed (Time Of Use), matches the digest of the file
that was signed by the distribution.

At a minimum, there would seem to be a need to have the kernel read and
validate the on medium checksum of the file, as the in-kernel RPM parser
reads each signature from the package list.

At that point, as long as the kernel is running, the digest cache will
represent a valid statement on the cryptographic checksum of a file held
in the digest cache, as your patch series seems to have invalidation
support well in hand.
After a system reboot, it would seem that all bets are off, and from a
security perspective, there would be a need to re-verify that the on
medium file checksums match those from a signed digest list.

IMA has the ability to do protection against offline modification but
you are then back to a possibly expensive operation on each file access.

We see in the thread on PGP infrastructure in the kernel you make the
following statement:

"If the calculated digest of a file being accessed matches one extracted
from the RPM header, access is granted otherwise it is denied."

Which would seem to imply that you do compute the on-medium checksum of
each file and verify it against a reference value from the RPM header,
but it isn't clear where that happens in the patch series. The only
kernel based file read operation we could find is what appears to be a
call to read the digest list files.

IMA already has the concept of a digest cache, as does TSEM. If you need
to read a file in order to match its medium based checksum against the
value from a package list, in order to avoid a TOMTOU condition, it is
unclear how one gains a performance improvement.

Unless of course the objective is to prime the digest cache at boot so
that all subsequent integrity verifications are answered from cache
rather than by computing the checksum at file access time.

In the thread on PGP access you indicate that all of this needs to be in
the kernel in order to be tamper proof.

FWIW, the kernel has the ability to know if kernel + userspace should be
trusted at any given time, that is one of the security statements that
we seek to offer with TSEM.

If the kernel can make a judgement that, in a limited execution context
such as system boot and initialization, userspace has not acted in an
untrusted manner, it can punt verification and parsing of RPM headers
and priming of something like the digest cache to userspace.
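
The check in the quoted statement ("if the calculated digest of a file being accessed matches one extracted from the RPM header, access is granted otherwise it is denied") can be modelled as below. This is a userspace sketch under stated assumptions: the function names are hypothetical, SHA-256 is assumed, and the reference set stands in for digests taken from an already verified digest list. It does not show where the calculation happens in the patch series, which is exactly the open question above.

```python
import hashlib
import os
import tempfile
from typing import Set, Tuple


def file_digest(path: str, algo: str = "sha256", chunk_size: int = 65536) -> bytes:
    """Hash the file as it exists on the medium right now (Time Of Use)."""
    hasher = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()


def access_allowed(path: str, reference_digests: Set[bytes]) -> bool:
    """Grant access only if the current on-medium digest is a known one."""
    return file_digest(path) in reference_digests


def demo() -> Tuple[bool, bool]:
    """Return the decision before and after an offline-style modification."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "tool")
        with open(path, "wb") as f:
            f.write(b"packaged contents")
        # Reference value as it would have been recorded at packaging time.
        reference = {hashlib.sha256(b"packaged contents").digest()}
        before = access_allowed(path, reference)
        with open(path, "wb") as f:
            f.write(b"modified after packaging")
        after = access_allowed(path, reference)
    return before, after
```

In this model the file is still read and hashed on access; what the reference set replaces is a per-file signature verification, not the digest calculation itself.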
Again, apologies if we misunderstand the architecture, any
clarifications would be appreciated.

Have a good week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project
