Message ID | 20240221182020.1086096-9-nifan.cxl@gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | Enabling DCD emulation support in Qemu | expand |
On 2024-02-22 오전 3:16, nifan.cxl@gmail.com wrote: > From: Fan Ni <fan.ni@samsung.com> > > Per CXL spec 3.1, two mailbox commands are implemented: > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4. > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > --- > hw/cxl/cxl-mailbox-utils.c | 288 ++++++++++++++++++++++++++++++++++++ > include/hw/cxl/cxl_device.h | 2 + > 2 files changed, 290 insertions(+) > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c > index dae7fe00ed..65ed28f700 100644 > --- a/hw/cxl/cxl-mailbox-utils.c > +++ b/hw/cxl/cxl-mailbox-utils.c [snip] > +/* > + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload > + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload > + */ > +typedef struct CXLUpdateDCExtentListInPl { > + uint32_t num_entries_updated; > + uint8_t flags; > + uint8_t rsvd[3]; > + /* CXL r3.1 Table 8-169: Updated Extent List */ I'm not sure why this was changed, but in r3.1 the table is simply renamed to "Updated Extent" ... > +/* > + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (opcode 4802h) It's too trivial, but it's written as "Opcode" in other comments. : Opcode 4802h ... > +/* > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 4803h) And here, too : Opcode 4803h ... > @@ -1462,6 +1744,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = { > [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = { > "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list, > 8, 0 }, > + [DCD_CONFIG][ADD_DYN_CAP_RSP] = { > + "ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp, > + ~0, IMMEDIATE_DATA_CHANGE }, > + [DCD_CONFIG][RELEASE_DYN_CAP] = { > + "RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap, > + ~0, IMMEDIATE_DATA_CHANGE }, For consistency, how about setting the names of commands as follows? : DCD_ADD_DYNAMIC_CAPACITY_RESPONSE : DCD_RELEASE_DYNAMIC_CAPACITY Thanks, Wonjae
On Wed, 21 Feb 2024 10:16:01 -0800 nifan.cxl@gmail.com wrote: > From: Fan Ni <fan.ni@samsung.com> > > Per CXL spec 3.1, two mailbox commands are implemented: > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4. > > Signed-off-by: Fan Ni <fan.ni@samsung.com> Hi Fan, Comments on this are all about corner cases. If we can I think we need to cover a few more. Linux won't hit them (I think) so it will be a bit of a pain to test but maybe raw commands enabled and some userspace code will let us exercise the corner cases? Jonathan > + > +/* > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 4803h) > + */ > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd, > + uint8_t *payload_in, > + size_t len_in, > + uint8_t *payload_out, > + size_t *len_out, > + CXLCCI *cci) > +{ > + CXLUpdateDCExtentListInPl *in = (void *)payload_in; > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d); > + CXLDCExtentList *extent_list = &ct3d->dc.extents; > + CXLDCExtent *ent; > + uint32_t i; > + uint64_t dpa, len; > + CXLRetCode ret; > + > + if (in->num_entries_updated == 0) { > + return CXL_MBOX_INVALID_INPUT; > + } > + > + ret = cxl_detect_malformed_extent_list(ct3d, in); > + if (ret != CXL_MBOX_SUCCESS) { > + return ret; > + } > + > + for (i = 0; i < in->num_entries_updated; i++) { > + bool found = false; > + > + dpa = in->updated_entries[i].start_dpa; > + len = in->updated_entries[i].len; > + > + QTAILQ_FOREACH(ent, extent_list, node) { > + if (ent->start_dpa <= dpa && > + dpa + len <= ent->start_dpa + ent->len) { > + /* > + * If an incoming extent covers a portion of an extent > + * in the device extent list, remove only the overlapping > + * portion, meaning > + * 1. the portions that are not covered by the incoming > + * extent at both end of the original extent will become > + * new extents and inserted to the extent list; and > + * 2. the original extent is removed from the extent list; > + * 3. dc extent count is updated accordingly. > + */ > + uint64_t ent_start_dpa = ent->start_dpa; > + uint64_t ent_len = ent->len; > + uint64_t len1 = dpa - ent_start_dpa; > + uint64_t len2 = ent_start_dpa + ent_len - dpa - len; > + > + found = true; > + cxl_remove_extent_from_extent_list(extent_list, ent); > + ct3d->dc.total_extent_count -= 1; > + > + if (len1) { > + cxl_insert_extent_to_extent_list(extent_list, > + ent_start_dpa, len1, > + NULL, 0); > + ct3d->dc.total_extent_count += 1; > + } > + if (len2) { > + cxl_insert_extent_to_extent_list(extent_list, dpa + len, > + len2, NULL, 0); > + ct3d->dc.total_extent_count += 1; There is a non zero chance that we'll overflow however many extents we claim to support. So we need to check that and fail the remove if it happens. Could ignore this for now though as that value is (I think!) conservative to allow for complex extent list tracking implementations. Succeeding when a naive solution would fail due to running out of extents that it can manage is not (I think) a bug. > + } > + break; > + /*Currently we reject the attempt to remove a superset*/ Space after /* and before */ I think we need to fix this. Linux isn't going to do it any time soon, but I think it's allowed to allocate two extents next to each other then free them in one go. Isn't this case easy to do or are there awkward corners? If it's sufficiently nasty (maybe because only part of extent provided exists?) then maybe we can leave it for now. I worry about something like | EXTENT TO FREE | | Exists | gap | Exists | Where we have to check for gap before removing anything? Does the spec address this? Not that I can find. I think the implication is we have to do a validation pass, then a free pass after we know whole of requested extent is valid. Nasty to test if nothing else :( Would look much like your check on malformed extent lists. > + } else if ((dpa < ent->start_dpa + ent->len && > + dpa + len > ent->start_dpa + ent->len) || > + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) { > + return CXL_MBOX_INVALID_EXTENT_LIST; > + } > + } > + > + if (!found) { > + /* Try to remove a non-existing extent */ > + return CXL_MBOX_INVALID_PA; > + } > + } > + > + return CXL_MBOX_SUCCESS; > +}
On Mon, Feb 26, 2024 at 06:04:17PM +0000, Jonathan Cameron wrote: > On Wed, 21 Feb 2024 10:16:01 -0800 > nifan.cxl@gmail.com wrote: > > > From: Fan Ni <fan.ni@samsung.com> > > > > Per CXL spec 3.1, two mailbox commands are implemented: > > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and > > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4. > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > > Hi Fan, > > Comments on this are all about corner cases. If we can I think we need > to cover a few more. Linux won't hit them (I think) so it will be > a bit of a pain to test but maybe raw commands enabled and some > userspace code will let us exercise the corner cases? > > Jonathan > > > > > + > > +/* > > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 4803h) > > + */ > > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd, > > + uint8_t *payload_in, > > + size_t len_in, > > + uint8_t *payload_out, > > + size_t *len_out, > > + CXLCCI *cci) > > +{ > > + CXLUpdateDCExtentListInPl *in = (void *)payload_in; > > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d); > > + CXLDCExtentList *extent_list = &ct3d->dc.extents; > > + CXLDCExtent *ent; > > + uint32_t i; > > + uint64_t dpa, len; > > + CXLRetCode ret; > > + > > + if (in->num_entries_updated == 0) { > > + return CXL_MBOX_INVALID_INPUT; > > + } > > + > > + ret = cxl_detect_malformed_extent_list(ct3d, in); > > + if (ret != CXL_MBOX_SUCCESS) { > > + return ret; > > + } > > + > > + for (i = 0; i < in->num_entries_updated; i++) { > > + bool found = false; > > + > > + dpa = in->updated_entries[i].start_dpa; > > + len = in->updated_entries[i].len; > > + > > + QTAILQ_FOREACH(ent, extent_list, node) { > > + if (ent->start_dpa <= dpa && > > + dpa + len <= ent->start_dpa + ent->len) { > > + /* > > + * If an incoming extent covers a portion of an extent > > + * in the device extent list, remove only the overlapping > > + * portion, meaning > > + * 1. the portions that are not covered by the incoming > > + * extent at both end of the original extent will become > > + * new extents and inserted to the extent list; and > > + * 2. the original extent is removed from the extent list; > > + * 3. dc extent count is updated accordingly. > > + */ > > + uint64_t ent_start_dpa = ent->start_dpa; > > + uint64_t ent_len = ent->len; > > + uint64_t len1 = dpa - ent_start_dpa; > > + uint64_t len2 = ent_start_dpa + ent_len - dpa - len; > > + > > + found = true; > > + cxl_remove_extent_from_extent_list(extent_list, ent); > > + ct3d->dc.total_extent_count -= 1; > > + > > + if (len1) { > > + cxl_insert_extent_to_extent_list(extent_list, > > + ent_start_dpa, len1, > > + NULL, 0); > > + ct3d->dc.total_extent_count += 1; > > + } > > + if (len2) { > > + cxl_insert_extent_to_extent_list(extent_list, dpa + len, > > + len2, NULL, 0); > > + ct3d->dc.total_extent_count += 1; > > There is a non zero chance that we'll overflow however many extents we claim > to support. So we need to check that and fail the remove if it happens. > Could ignore this for now though as that value is (I think!) conservative > to allow for complex extent list tracking implementations. Succeeding > when a naive solution would fail due to running out of extents that it can > manage is not (I think) a bug. Yeah. spec r3.1 mentioned about the overflow issue that adding/releasing extent requests can raise. We should fail the operation if running out of extents and report resource exhausted. > > > + } > > + break; > > + /*Currently we reject the attempt to remove a superset*/ > > Space after /* and before */ > > I think we need to fix this. Linux isn't going to do it any time soon, but > I think it's allowed to allocate two extents next to each other then free them > in one go. Isn't this case easy to do or are there awkward corners? > If it's sufficiently nasty (maybe because only part of extent provided exists?) > then maybe we can leave it for now. > > I worry about something like > > | EXTENT TO FREE | > | Exists | gap | Exists | > Where we have to check for gap before removing anything? > Does the spec address this? Not that I can find. > I think the implication is we have to do a validation pass, then a free > pass after we know whole of requested extent is valid. > Nasty to test if nothing else :( Would look much like your check > on malformed extent lists. > I cannot find anything specific to this in the specification either. Since we have already detected the case where the extent range across multiple regions, the only case we need to capture here is one/multiple portions of an extents getting released and causing extent overflow. I think we can handle it after we introduce the bitmaps (PATCH 10) which indicates DPA ranges mapped by valid extents in the device. With that, The release workflow would be 1) detecting malformed extent lists; if passed 2) do cxl_detect_extent_overflow { delta = 0; make a copy of the bitmap as bitmap_copy; for each extent in the updated_extent_list; do if (extent range not fully set in the bitmap_copy) return error; else { if gap at the front based on the bitmap_copy: delta += 1; if gap at the end based on the bitmap_copy: delta += 1; delta -= 1; // NOTE: current_extent_count will not be updated in the // loop since delta will track the whole loop if (delta + current_extent_count > max_extent_count) return resource exhausted; update bitmap_copy to clear the range covered by the extent under consideration; } done }; if pass 3. do real release: in the pass, we will not need to detect extent errors; Does the above solution sound reasonable? If so, do we want to go this way? do we need to introduce the bitmap earlier in the series? Thanks, Fan > > > + } else if ((dpa < ent->start_dpa + ent->len && > > + dpa + len > ent->start_dpa + ent->len) || > > + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) { > > + return CXL_MBOX_INVALID_EXTENT_LIST; > > + } > > + } > > + > > + if (!found) { > > + /* Try to remove a non-existing extent */ > > + return CXL_MBOX_INVALID_PA; > > + } > > + } > > + > > + return CXL_MBOX_SUCCESS; > > +} > >
On Mon, Feb 26, 2024 at 06:04:17PM +0000, Jonathan Cameron wrote: > On Wed, 21 Feb 2024 10:16:01 -0800 > nifan.cxl@gmail.com wrote: > > > From: Fan Ni <fan.ni@samsung.com> > > > > Per CXL spec 3.1, two mailbox commands are implemented: > > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and > > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4. > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > > Hi Fan, > > Comments on this are all about corner cases. If we can I think we need > to cover a few more. Linux won't hit them (I think) so it will be > a bit of a pain to test but maybe raw commands enabled and some > userspace code will let us exercise the corner cases? > > Jonathan > > > > > + > > +/* > > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 4803h) > > + */ > > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd, > > + uint8_t *payload_in, > > + size_t len_in, > > + uint8_t *payload_out, > > + size_t *len_out, > > + CXLCCI *cci) > > +{ > > + CXLUpdateDCExtentListInPl *in = (void *)payload_in; > > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d); > > + CXLDCExtentList *extent_list = &ct3d->dc.extents; > > + CXLDCExtent *ent; > > + uint32_t i; > > + uint64_t dpa, len; > > + CXLRetCode ret; > > + > > + if (in->num_entries_updated == 0) { > > + return CXL_MBOX_INVALID_INPUT; > > + } > > + > > + ret = cxl_detect_malformed_extent_list(ct3d, in); > > + if (ret != CXL_MBOX_SUCCESS) { > > + return ret; > > + } > > + > > + for (i = 0; i < in->num_entries_updated; i++) { > > + bool found = false; > > + > > + dpa = in->updated_entries[i].start_dpa; > > + len = in->updated_entries[i].len; > > + > > + QTAILQ_FOREACH(ent, extent_list, node) { > > + if (ent->start_dpa <= dpa && > > + dpa + len <= ent->start_dpa + ent->len) { > > + /* > > + * If an incoming extent covers a portion of an extent > > + * in the device extent list, remove only the overlapping > > + * portion, meaning > > + * 1. the portions that are not covered by the incoming > > + * extent at both end of the original extent will become > > + * new extents and inserted to the extent list; and > > + * 2. the original extent is removed from the extent list; > > + * 3. dc extent count is updated accordingly. > > + */ > > + uint64_t ent_start_dpa = ent->start_dpa; > > + uint64_t ent_len = ent->len; > > + uint64_t len1 = dpa - ent_start_dpa; > > + uint64_t len2 = ent_start_dpa + ent_len - dpa - len; > > + > > + found = true; > > + cxl_remove_extent_from_extent_list(extent_list, ent); > > + ct3d->dc.total_extent_count -= 1; > > + > > + if (len1) { > > + cxl_insert_extent_to_extent_list(extent_list, > > + ent_start_dpa, len1, > > + NULL, 0); > > + ct3d->dc.total_extent_count += 1; > > + } > > + if (len2) { > > + cxl_insert_extent_to_extent_list(extent_list, dpa + len, > > + len2, NULL, 0); > > + ct3d->dc.total_extent_count += 1; > > There is a non zero chance that we'll overflow however many extents we claim > to support. So we need to check that and fail the remove if it happens. > Could ignore this for now though as that value is (I think!) conservative > to allow for complex extent list tracking implementations. Succeeding > when a naive solution would fail due to running out of extents that it can > manage is not (I think) a bug. > > > + } > > + break; > > + /*Currently we reject the attempt to remove a superset*/ > > Space after /* and before */ > > I think we need to fix this. Linux isn't going to do it any time soon, but > I think it's allowed to allocate two extents next to each other then free them > in one go. Isn't this case easy to do or are there awkward corners? If we use the bitmap (indicating each range is filled by valid extents) in PATCH 10, it should not be that difficult to do. Fan > If it's sufficiently nasty (maybe because only part of extent provided exists?) > then maybe we can leave it for now. > > I worry about something like > > | EXTENT TO FREE | > | Exists | gap | Exists | > Where we have to check for gap before removing anything? > Does the spec address this? Not that I can find. > I think the implication is we have to do a validation pass, then a free > pass after we know whole of requested extent is valid. > Nasty to test if nothing else :( Would look much like your check > on malformed extent lists. > > > > + } else if ((dpa < ent->start_dpa + ent->len && > > + dpa + len > ent->start_dpa + ent->len) || > > + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) { > > + return CXL_MBOX_INVALID_EXTENT_LIST; > > + } > > + } > > + > > + if (!found) { > > + /* Try to remove a non-existing extent */ > > + return CXL_MBOX_INVALID_PA; > > + } > > + } > > + > > + return CXL_MBOX_SUCCESS; > > +} > >
On Mon, 26 Feb 2024 17:01:22 -0800 fan <nifan.cxl@gmail.com> wrote: > On Mon, Feb 26, 2024 at 06:04:17PM +0000, Jonathan Cameron wrote: > > On Wed, 21 Feb 2024 10:16:01 -0800 > > nifan.cxl@gmail.com wrote: > > > > > From: Fan Ni <fan.ni@samsung.com> > > > > > > Per CXL spec 3.1, two mailbox commands are implemented: > > > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and > > > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4. > > > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > > > > Hi Fan, > > > > Comments on this are all about corner cases. If we can I think we need > > to cover a few more. Linux won't hit them (I think) so it will be > > a bit of a pain to test but maybe raw commands enabled and some > > userspace code will let us exercise the corner cases? > > > > Jonathan > > > > > > > > > + > > > +/* > > > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 4803h) > > > + */ > > > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd, > > > + uint8_t *payload_in, > > > + size_t len_in, > > > + uint8_t *payload_out, > > > + size_t *len_out, > > > + CXLCCI *cci) > > > +{ > > > + CXLUpdateDCExtentListInPl *in = (void *)payload_in; > > > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d); > > > + CXLDCExtentList *extent_list = &ct3d->dc.extents; > > > + CXLDCExtent *ent; > > > + uint32_t i; > > > + uint64_t dpa, len; > > > + CXLRetCode ret; > > > + > > > + if (in->num_entries_updated == 0) { > > > + return CXL_MBOX_INVALID_INPUT; > > > + } > > > + > > > + ret = cxl_detect_malformed_extent_list(ct3d, in); > > > + if (ret != CXL_MBOX_SUCCESS) { > > > + return ret; > > > + } > > > + > > > + for (i = 0; i < in->num_entries_updated; i++) { > > > + bool found = false; > > > + > > > + dpa = in->updated_entries[i].start_dpa; > > > + len = in->updated_entries[i].len; > > > + > > > + QTAILQ_FOREACH(ent, extent_list, node) { > > > + if (ent->start_dpa <= dpa && > > > + dpa + len <= ent->start_dpa + ent->len) { > > > + /* > > > + * If an incoming extent covers a portion of an extent > > > + * in the device extent list, remove only the overlapping > > > + * portion, meaning > > > + * 1. the portions that are not covered by the incoming > > > + * extent at both end of the original extent will become > > > + * new extents and inserted to the extent list; and > > > + * 2. the original extent is removed from the extent list; > > > + * 3. dc extent count is updated accordingly. > > > + */ > > > + uint64_t ent_start_dpa = ent->start_dpa; > > > + uint64_t ent_len = ent->len; > > > + uint64_t len1 = dpa - ent_start_dpa; > > > + uint64_t len2 = ent_start_dpa + ent_len - dpa - len; > > > + > > > + found = true; > > > + cxl_remove_extent_from_extent_list(extent_list, ent); > > > + ct3d->dc.total_extent_count -= 1; > > > + > > > + if (len1) { > > > + cxl_insert_extent_to_extent_list(extent_list, > > > + ent_start_dpa, len1, > > > + NULL, 0); > > > + ct3d->dc.total_extent_count += 1; > > > + } > > > + if (len2) { > > > + cxl_insert_extent_to_extent_list(extent_list, dpa + len, > > > + len2, NULL, 0); > > > + ct3d->dc.total_extent_count += 1; > > > > There is a non zero chance that we'll overflow however many extents we claim > > to support. So we need to check that and fail the remove if it happens. > > Could ignore this for now though as that value is (I think!) conservative > > to allow for complex extent list tracking implementations. Succeeding > > when a naive solution would fail due to running out of extents that it can > > manage is not (I think) a bug. > > Yeah. spec r3.1 mentioned about the overflow issue that adding/releasing > extent requests can raise. We should fail the operation if running out of > extents and report resource exhausted. > > > > > > + } > > > + break; > > > + /*Currently we reject the attempt to remove a superset*/ > > > > Space after /* and before */ > > > > I think we need to fix this. Linux isn't going to do it any time soon, but > > I think it's allowed to allocate two extents next to each other then free them > > in one go. Isn't this case easy to do or are there awkward corners? > > If it's sufficiently nasty (maybe because only part of extent provided exists?) > > then maybe we can leave it for now. > > > > I worry about something like > > > > | EXTENT TO FREE | > > | Exists | gap | Exists | > > Where we have to check for gap before removing anything? > > Does the spec address this? Not that I can find. > > I think the implication is we have to do a validation pass, then a free > > pass after we know whole of requested extent is valid. > > Nasty to test if nothing else :( Would look much like your check > > on malformed extent lists. > > > > I cannot find anything specific to this in the specification either. > Since we have already detected the case where the extent range across > multiple regions, the only case we need to capture here is one/multiple > portions of an extents getting released and causing extent overflow. > I think we can handle it after we introduce the bitmaps (PATCH 10) which > indicates DPA ranges mapped by valid extents in the device. > > With that, The release workflow would be > > 1) detecting malformed extent lists; if passed > 2) do cxl_detect_extent_overflow { > delta = 0; > make a copy of the bitmap as bitmap_copy; > for each extent in the updated_extent_list; do > if (extent range not fully set in the bitmap_copy) > return error; > else { > if gap at the front based on the bitmap_copy: > delta += 1; > if gap at the end based on the bitmap_copy: > delta += 1; > delta -= 1; > // NOTE: current_extent_count will not be updated in the > // loop since delta will track the whole loop > if (delta + current_extent_count > max_extent_count) > return resource exhausted; > update bitmap_copy to clear the range covered by the extent > under consideration; > } > done > > }; if pass > 3. do real release: in the pass, we will not need to detect extent > errors; > > Does the above solution sound reasonable? If so, do we want to go this > way? do we need to introduce the bitmap earlier in the series? Yes, something along these lines should work nicely. Jonathan > > Thanks, > Fan > > > > > > > > + } else if ((dpa < ent->start_dpa + ent->len && > > > + dpa + len > ent->start_dpa + ent->len) || > > > + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) { > > > + return CXL_MBOX_INVALID_EXTENT_LIST; > > > + } > > > + } > > > + > > > + if (!found) { > > > + /* Try to remove a non-existing extent */ > > > + return CXL_MBOX_INVALID_PA; > > > + } > > > + } > > > + > > > + return CXL_MBOX_SUCCESS; > > > +} > > > >
On Tue, Feb 27, 2024 at 10:39:09AM +0000, Jonathan Cameron wrote: > On Mon, 26 Feb 2024 17:01:22 -0800 > fan <nifan.cxl@gmail.com> wrote: > > > On Mon, Feb 26, 2024 at 06:04:17PM +0000, Jonathan Cameron wrote: > > > On Wed, 21 Feb 2024 10:16:01 -0800 > > > nifan.cxl@gmail.com wrote: > > > > > > > From: Fan Ni <fan.ni@samsung.com> > > > > > > > > Per CXL spec 3.1, two mailbox commands are implemented: > > > > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and > > > > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4. > > > > > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > > > > > > Hi Fan, > > > > > > Comments on this are all about corner cases. If we can I think we need > > > to cover a few more. Linux won't hit them (I think) so it will be > > > a bit of a pain to test but maybe raw commands enabled and some > > > userspace code will let us exercise the corner cases? > > > > > > Jonathan > > > > > > > > > > > > > + > > > > +/* > > > > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 4803h) > > > > + */ > > > > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd, > > > > + uint8_t *payload_in, > > > > + size_t len_in, > > > > + uint8_t *payload_out, > > > > + size_t *len_out, > > > > + CXLCCI *cci) > > > > +{ > > > > + CXLUpdateDCExtentListInPl *in = (void *)payload_in; > > > > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d); > > > > + CXLDCExtentList *extent_list = &ct3d->dc.extents; > > > > + CXLDCExtent *ent; > > > > + uint32_t i; > > > > + uint64_t dpa, len; > > > > + CXLRetCode ret; > > > > + > > > > + if (in->num_entries_updated == 0) { > > > > + return CXL_MBOX_INVALID_INPUT; > > > > + } > > > > + > > > > + ret = cxl_detect_malformed_extent_list(ct3d, in); > > > > + if (ret != CXL_MBOX_SUCCESS) { > > > > + return ret; > > > > + } > > > > + > > > > + for (i = 0; i < in->num_entries_updated; i++) { > > > > + bool found = false; > > > > + > > > > + dpa = in->updated_entries[i].start_dpa; > > > > + len = in->updated_entries[i].len; > > > > + > > > > + QTAILQ_FOREACH(ent, extent_list, node) { > > > > + if (ent->start_dpa <= dpa && > > > > + dpa + len <= ent->start_dpa + ent->len) { > > > > + /* > > > > + * If an incoming extent covers a portion of an extent > > > > + * in the device extent list, remove only the overlapping > > > > + * portion, meaning > > > > + * 1. the portions that are not covered by the incoming > > > > + * extent at both end of the original extent will become > > > > + * new extents and inserted to the extent list; and > > > > + * 2. the original extent is removed from the extent list; > > > > + * 3. dc extent count is updated accordingly. > > > > + */ > > > > + uint64_t ent_start_dpa = ent->start_dpa; > > > > + uint64_t ent_len = ent->len; > > > > + uint64_t len1 = dpa - ent_start_dpa; > > > > + uint64_t len2 = ent_start_dpa + ent_len - dpa - len; > > > > + > > > > + found = true; > > > > + cxl_remove_extent_from_extent_list(extent_list, ent); > > > > + ct3d->dc.total_extent_count -= 1; > > > > + > > > > + if (len1) { > > > > + cxl_insert_extent_to_extent_list(extent_list, > > > > + ent_start_dpa, len1, > > > > + NULL, 0); > > > > + ct3d->dc.total_extent_count += 1; > > > > + } > > > > + if (len2) { > > > > + cxl_insert_extent_to_extent_list(extent_list, dpa + len, > > > > + len2, NULL, 0); > > > > + ct3d->dc.total_extent_count += 1; > > > > > > There is a non zero chance that we'll overflow however many extents we claim > > > to support. So we need to check that and fail the remove if it happens. > > > Could ignore this for now though as that value is (I think!) conservative > > > to allow for complex extent list tracking implementations. Succeeding > > > when a naive solution would fail due to running out of extents that it can > > > manage is not (I think) a bug. > > > > Yeah. spec r3.1 mentioned about the overflow issue that adding/releasing > > extent requests can raise. We should fail the operation if running out of > > extents and report resource exhausted. > > > > > > > > > + } > > > > + break; > > > > + /*Currently we reject the attempt to remove a superset*/ > > > > > > Space after /* and before */ > > > > > > I think we need to fix this. Linux isn't going to do it any time soon, but > > > I think it's allowed to allocate two extents next to each other then free them > > > in one go. Isn't this case easy to do or are there awkward corners? > > > If it's sufficiently nasty (maybe because only part of extent provided exists?) > > > then maybe we can leave it for now. > > > > > > I worry about something like > > > > > > | EXTENT TO FREE | > > > | Exists | gap | Exists | > > > Where we have to check for gap before removing anything? > > > Does the spec address this? Not that I can find. > > > I think the implication is we have to do a validation pass, then a free > > > pass after we know whole of requested extent is valid. > > > Nasty to test if nothing else :( Would look much like your check > > > on malformed extent lists. > > > > > > > I cannot find anything specific to this in the specification either. > > Since we have already detected the case where the extent range across > > multiple regions, the only case we need to capture here is one/multiple > > portions of an extents getting released and causing extent overflow. > > I think we can handle it after we introduce the bitmaps (PATCH 10) which > > indicates DPA ranges mapped by valid extents in the device. > > > > With that, The release workflow would be > > > > 1) detecting malformed extent lists; if passed > > 2) do cxl_detect_extent_overflow { > > delta = 0; > > make a copy of the bitmap as bitmap_copy; > > for each extent in the updated_extent_list; do > > if (extent range not fully set in the bitmap_copy) > > return error; > > else { > > if gap at the front based on the bitmap_copy: > > delta += 1; > > if gap at the end based on the bitmap_copy: > > delta += 1; > > delta -= 1; > > // NOTE: current_extent_count will not be updated in the > > // loop since delta will track the whole loop > > if (delta + current_extent_count > max_extent_count) > > return resource exhausted; > > update bitmap_copy to clear the range covered by the extent > > under consideration; > > } > > done > > > > }; if pass > > 3. do real release: in the pass, we will not need to detect extent > > errors; > > > > Does the above solution sound reasonable? If so, do we want to go this > > way? do we need to introduce the bitmap earlier in the series? > > Yes, something along these lines should work nicely. > > Jonathan Hi Jonathan, I updated the code based on your feedback and now we can process extent release request more flexible. We can now support superset release (actually it can do even more, as long as the DPA range is coverd by accepted extents, we can release). I have run following tests and the code works as expected, 1. Add multiple extents, and removing them one by one, passed; 2. Superset release: add multiple extents with continuous DPA ranges, and remove all of them with a single release request with an extent covering the whole DPA range, passed; 3. Partial extent release: add a large extent and release only part of it, passed; 4. Partial+superset release: add multiple extents,and release it with some leftover with one request with an extent. For example, add extents [0-128M] and [128M-256M], release [64M-256M]. Passed; 5. Release extent not aligned to block size, failed as expected; 6. Extents have overlaps, fail the request as expected; 7. Extent has uncovered DPA range, skip the extent as expected; The only limitation is that for superset release case, if we find part of its DPA range is still pending to add, while the other is accepted, we reject it through QMP interface. The latest code is https://github.com/moking/qemu/tree/dcd-v5. The main changes are in the last three commits. Btw, in the last commit, I introduce new QMP interfaces to print out accepted and pending-to-add list in the device to a file "/tmp/qmp.txt", do we want it? If yes, I can polish it a little bit, otherwise I will keep it for my own test purpose. I will test more and send out v5 if the above looks reasonable to you. Fan > > > > > > Thanks, > > Fan > > > > > > > > > > > > > + } else if ((dpa < ent->start_dpa + ent->len && > > > > + dpa + len > ent->start_dpa + ent->len) || > > > > + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) { > > > > + return CXL_MBOX_INVALID_EXTENT_LIST; > > > > + } > > > > + } > > > > + > > > > + if (!found) { > > > > + /* Try to remove a non-existing extent */ > > > > + return CXL_MBOX_INVALID_PA; > > > > + } > > > > + } > > > > + > > > > + return CXL_MBOX_SUCCESS; > > > > +} > > > > > > >
... > > > I cannot find anything specific to this in the specification either. > > > Since we have already detected the case where the extent range across > > > multiple regions, the only case we need to capture here is one/multiple > > > portions of an extents getting released and causing extent overflow. > > > I think we can handle it after we introduce the bitmaps (PATCH 10) which > > > indicates DPA ranges mapped by valid extents in the device. > > > > > > With that, The release workflow would be > > > > > > 1) detecting malformed extent lists; if passed > > > 2) do cxl_detect_extent_overflow { > > > delta = 0; > > > make a copy of the bitmap as bitmap_copy; > > > for each extent in the updated_extent_list; do > > > if (extent range not fully set in the bitmap_copy) > > > return error; > > > else { > > > if gap at the front based on the bitmap_copy: > > > delta += 1; > > > if gap at the end based on the bitmap_copy: > > > delta += 1; > > > delta -= 1; > > > // NOTE: current_extent_count will not be updated in the > > > // loop since delta will track the whole loop > > > if (delta + current_extent_count > max_extent_count) > > > return resource exhausted; > > > update bitmap_copy to clear the range covered by the extent > > > under consideration; > > > } > > > done > > > > > > }; if pass > > > 3. do real release: in the pass, we will not need to detect extent > > > errors; > > > > > > Does the above solution sound reasonable? If so, do we want to go this > > > way? do we need to introduce the bitmap earlier in the series? > > > > Yes, something along these lines should work nicely. > > > > Jonathan > > Hi Jonathan, > I updated the code based on your feedback and now we can process extent > release request more flexible. Excellent! > We can now support superset release (actually it can do even more, > as long as the DPA range is coverd by accepted extents, we can release). > > I have run following tests and the code works as expected, > 1. Add multiple extents, and removing them one by one, passed; > 2. Superset release: add multiple extents with continuous DPA ranges, and > remove all of them with a single release request with an extent covering the > whole DPA range, passed; > 3. Partial extent release: add a large extent and release only part of it, > passed; > 4. Partial+superset release: add multiple extents,and release it with some > leftover with one request with an extent. For example, add extents [0-128M] > and [128M-256M], release [64M-256M]. Passed; > 5. Release extent not aligned to block size, failed as expected; > 6. Extents have overlaps, fail the request as expected; > 7. Extent has uncovered DPA range, skip the extent as expected; > > The only limitation is that for superset release case, if we find > part of its DPA range is still pending to add, while the other is > accepted, we reject it through QMP interface. I think that is a reasonable limitation as we don't expect people to do that crazy on QMP side. Maybe long term we'll want a 'release all' type command (I'm thinking virtualized device usecases) but we can deal with that later. > > The latest code is https://github.com/moking/qemu/tree/dcd-v5. > > The main changes are in the last three commits. > Btw, in the last commit, I introduce new QMP interfaces to print out > accepted and pending-to-add list in the device to a file "/tmp/qmp.txt", > do we want it? If yes, I can polish it a little bit, otherwise I will > keep it for my own test purpose. Ah. I missed this mail and replied directly. That needs a rethink as the thread has concluded I think. I'll carry it on my tree, but not look to upstream it. > > I will test more and send out v5 if the above looks reasonable to you. > Sorry for slow reply - I'm a bit behind with mailing lists. Great you sent it out in the meantime. Jonathan
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c index dae7fe00ed..65ed28f700 100644 --- a/hw/cxl/cxl-mailbox-utils.c +++ b/hw/cxl/cxl-mailbox-utils.c @@ -84,6 +84,8 @@ enum { DCD_CONFIG = 0x48, #define GET_DC_CONFIG 0x0 #define GET_DYN_CAP_EXT_LIST 0x1 + #define ADD_DYN_CAP_RSP 0x2 + #define RELEASE_DYN_CAP 0x3 PHYSICAL_SWITCH = 0x51, #define IDENTIFY_SWITCH_DEVICE 0x0 #define GET_PHYSICAL_PORT_STATE 0x1 @@ -1412,6 +1414,286 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd, return CXL_MBOX_SUCCESS; } +/* + * Check whether any bit between addr[nr, nr+size) is set, + * return true if any bit is set, otherwise return false + */ +static bool test_any_bits_set(const unsigned long *addr, int nr, int size) +{ + unsigned long res = find_next_bit(addr, size + nr, nr); + + return res < nr + size; +} + +CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len) +{ + int i; + CXLDCDRegion *region = &ct3d->dc.regions[0]; + + if (dpa < region->base || + dpa >= region->base + ct3d->dc.total_capacity) { + return NULL; + } + + /* + * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD) + * + * Regions are used in increasing-DPA order, with Region 0 being used for + * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA. + * So check from the last region to find where the dpa belongs. Extents that + * cross multiple regions are not allowed. + */ + for (i = ct3d->dc.num_regions - 1; i >= 0; i--) { + region = &ct3d->dc.regions[i]; + if (dpa >= region->base) { + if (dpa + len > region->base + region->len) { + return NULL; + } + return region; + } + } + + return NULL; +} + +static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, + uint64_t dpa, + uint64_t len, + uint8_t *tag, + uint16_t shared_seq) +{ + CXLDCExtent *extent; + + extent = g_new0(CXLDCExtent, 1); + extent->start_dpa = dpa; + extent->len = len; + if (tag) { + memcpy(extent->tag, tag, 0x10); + } + extent->shared_seq = shared_seq; + + QTAILQ_INSERT_TAIL(list, extent, node); +} + +static void cxl_remove_extent_from_extent_list(CXLDCExtentList *list, + CXLDCExtent *extent) +{ + QTAILQ_REMOVE(list, extent, node); + g_free(extent); +} + +/* + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload + */ +typedef struct CXLUpdateDCExtentListInPl { + uint32_t num_entries_updated; + uint8_t flags; + uint8_t rsvd[3]; + /* CXL r3.1 Table 8-169: Updated Extent List */ + struct { + uint64_t start_dpa; + uint64_t len; + uint8_t rsvd[8]; + } QEMU_PACKED updated_entries[]; +} QEMU_PACKED CXLUpdateDCExtentListInPl; + +/* + * For the extents in the extent list to operate, check whether they are valid + * 1. The extent should be in the range of a valid DC region; + * 2. The extent should not cross multiple regions; + * 3. The start DPA and the length of the extent should align with the block + * size of the region; + * 4. The address range of multiple extents in the list should not overlap. + */ +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d, + const CXLUpdateDCExtentListInPl *in) +{ + uint64_t min_block_size = UINT64_MAX; + CXLDCDRegion *region = &ct3d->dc.regions[0]; + CXLDCDRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1]; + g_autofree unsigned long *blk_bitmap = NULL; + uint64_t dpa, len; + uint32_t i; + + for (i = 0; i < ct3d->dc.num_regions; i++) { + region = &ct3d->dc.regions[i]; + min_block_size = MIN(min_block_size, region->block_size); + } + + blk_bitmap = bitmap_new((lastregion->base + lastregion->len - + ct3d->dc.regions[0].base) / min_block_size); + + for (i = 0; i < in->num_entries_updated; i++) { + dpa = in->updated_entries[i].start_dpa; + len = in->updated_entries[i].len; + + region = cxl_find_dc_region(ct3d, dpa, len); + if (!region) { + return CXL_MBOX_INVALID_PA; + } + + dpa -= ct3d->dc.regions[0].base; + if (dpa % region->block_size || len % region->block_size) { + return CXL_MBOX_INVALID_EXTENT_LIST; + } + /* the dpa range already covered by some other extents in the list */ + if (test_any_bits_set(blk_bitmap, dpa / min_block_size, + len / min_block_size)) { + return CXL_MBOX_INVALID_EXTENT_LIST; + } + bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size); + } + + return CXL_MBOX_SUCCESS; +} + +/* + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (opcode 4802h) + * An extent is added to the extent list and becomes usable only after the + * response is processed successfully + */ +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd, + uint8_t *payload_in, + size_t len_in, + uint8_t *payload_out, + size_t *len_out, + CXLCCI *cci) +{ + CXLUpdateDCExtentListInPl *in = (void *)payload_in; + CXLType3Dev *ct3d = CXL_TYPE3(cci->d); + CXLDCExtentList *extent_list = &ct3d->dc.extents; + CXLDCExtent *ent; + uint32_t i; + uint64_t dpa, len; + CXLRetCode ret; + + if (in->num_entries_updated == 0) { + return CXL_MBOX_SUCCESS; + } + + ret = cxl_detect_malformed_extent_list(ct3d, in); + if (ret != CXL_MBOX_SUCCESS) { + return ret; + } + + for (i = 0; i < in->num_entries_updated; i++) { + dpa = in->updated_entries[i].start_dpa; + len = in->updated_entries[i].len; + + /* + * Check if the DPA range of the to-be-added extent overlaps with + * existing extent list maintained by the device. + */ + QTAILQ_FOREACH(ent, extent_list, node) { + if (ent->start_dpa <= dpa && + dpa + len <= ent->start_dpa + ent->len) { + return CXL_MBOX_INVALID_PA; + /* Overlapping one end of the other */ + } else if ((dpa < ent->start_dpa + ent->len && + dpa + len > ent->start_dpa + ent->len) || + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) { + return CXL_MBOX_INVALID_PA; + } + } + + /* + * TODO: we will add a pending extent list based on event log record + * and verify the input response; also, the "More" flag is not + * considered at the moment. + */ + + cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0); + ct3d->dc.total_extent_count += 1; + } + + return CXL_MBOX_SUCCESS; +} + +/* + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 4803h) + */ +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd, + uint8_t *payload_in, + size_t len_in, + uint8_t *payload_out, + size_t *len_out, + CXLCCI *cci) +{ + CXLUpdateDCExtentListInPl *in = (void *)payload_in; + CXLType3Dev *ct3d = CXL_TYPE3(cci->d); + CXLDCExtentList *extent_list = &ct3d->dc.extents; + CXLDCExtent *ent; + uint32_t i; + uint64_t dpa, len; + CXLRetCode ret; + + if (in->num_entries_updated == 0) { + return CXL_MBOX_INVALID_INPUT; + } + + ret = cxl_detect_malformed_extent_list(ct3d, in); + if (ret != CXL_MBOX_SUCCESS) { + return ret; + } + + for (i = 0; i < in->num_entries_updated; i++) { + bool found = false; + + dpa = in->updated_entries[i].start_dpa; + len = in->updated_entries[i].len; + + QTAILQ_FOREACH(ent, extent_list, node) { + if (ent->start_dpa <= dpa && + dpa + len <= ent->start_dpa + ent->len) { + /* + * If an incoming extent covers a portion of an extent + * in the device extent list, remove only the overlapping + * portion, meaning + * 1. the portions that are not covered by the incoming + * extent at both end of the original extent will become + * new extents and inserted to the extent list; and + * 2. the original extent is removed from the extent list; + * 3. dc extent count is updated accordingly. + */ + uint64_t ent_start_dpa = ent->start_dpa; + uint64_t ent_len = ent->len; + uint64_t len1 = dpa - ent_start_dpa; + uint64_t len2 = ent_start_dpa + ent_len - dpa - len; + + found = true; + cxl_remove_extent_from_extent_list(extent_list, ent); + ct3d->dc.total_extent_count -= 1; + + if (len1) { + cxl_insert_extent_to_extent_list(extent_list, + ent_start_dpa, len1, + NULL, 0); + ct3d->dc.total_extent_count += 1; + } + if (len2) { + cxl_insert_extent_to_extent_list(extent_list, dpa + len, + len2, NULL, 0); + ct3d->dc.total_extent_count += 1; + } + break; + /*Currently we reject the attempt to remove a superset*/ + } else if ((dpa < ent->start_dpa + ent->len && + dpa + len > ent->start_dpa + ent->len) || + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) { + return CXL_MBOX_INVALID_EXTENT_LIST; + } + } + + if (!found) { + /* Try to remove a non-existing extent */ + return CXL_MBOX_INVALID_PA; + } + } + + return CXL_MBOX_SUCCESS; +} + #define IMMEDIATE_CONFIG_CHANGE (1 << 1) #define IMMEDIATE_DATA_CHANGE (1 << 2) #define IMMEDIATE_POLICY_CHANGE (1 << 3) @@ -1462,6 +1744,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = { [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = { "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list, 8, 0 }, + [DCD_CONFIG][ADD_DYN_CAP_RSP] = { + "ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp, + ~0, IMMEDIATE_DATA_CHANGE }, + [DCD_CONFIG][RELEASE_DYN_CAP] = { + "RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap, + ~0, IMMEDIATE_DATA_CHANGE }, }; static const struct cxl_cmd cxl_cmd_set_sw[256][256] = { diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h index 12a6fb47a9..6178416cbb 100644 --- a/include/hw/cxl/cxl_device.h +++ b/include/hw/cxl/cxl_device.h @@ -548,4 +548,6 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d); void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d); +CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len); + #endif