Message ID | 20240306-mmc-partswitch-v1-1-bf116985d950@codewreck.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mmc: part_switch: fixes switch on gp3 partition | expand |
On Wed, Mar 6, 2024 at 2:45 AM Dominique Martinet <asmadeus@codewreck.org> wrote: > From: Dominique Martinet <dominique.martinet@atmark-techno.com> > > Commit e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB > partitions.") added a mask check for 'part_type', but the mask used was > wrong leading to the code intended for rpmb also being executed for GP3. > > On some MMCs (but not all) this would make gp3 partition inaccessible: > armadillo:~# head -c 1 < /dev/mmcblk2gp3 > head: standard input: I/O error > armadillo:~# dmesg -c > [ 422.976583] mmc2: running CQE recovery > [ 423.058182] mmc2: running CQE recovery > [ 423.137607] mmc2: running CQE recovery > [ 423.137802] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0 > [ 423.237125] mmc2: running CQE recovery > [ 423.318206] mmc2: running CQE recovery > [ 423.397680] mmc2: running CQE recovery > [ 423.397837] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 > [ 423.408287] Buffer I/O error on dev mmcblk2gp3, logical block 0, async page read > > the part_type values of interest here are defined as follow: > main 0 > boot0 1 > boot1 2 > rpmb 3 > gp0 4 > gp1 5 > gp2 6 > gp3 7 > > so mask with EXT_CSD_PART_CONFIG_ACC_MASK (7) to correctly identify rpmb > > Fixes: e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB partitions.") > Cc: stable@vger.kernel.org > Cc: Jorge Ramirez-Ortiz <jorge@foundries.io> > Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> The patch: Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > A couple of notes: > - this doesn't fail on all eMMCs, I can still access gp3 on some models > but it seems to fail reliably with micron's "G1M15L" > - I've encountered this on the 5.10 backport (in 5.10.208), so that'll > need to be backported everywhere the fix was taken... Which device is this? I have never seen an eMMC using the GP:s in my life. Or did you create the GP manually? Yours, Linus Walleij
Thanks for the review! Linus Walleij wrote on Wed, Mar 06, 2024 at 09:03:45AM +0100: > > A couple of notes: > > - this doesn't fail on all eMMCs, I can still access gp3 on some models > > but it seems to fail reliably with micron's "G1M15L" > > - I've encountered this on the 5.10 backport (in 5.10.208), so that'll > > need to be backported everywhere the fix was taken... > > Which device is this? Sorry I gave the 'name' as seen in linux, the actual reference is MTFC32GAZAQHD-IT (but we're creating the GP manually) > I have never seen an eMMC using the GP:s in my life. > > Or did you create the GP manually? Yes, we're creating them on all our devices: (japanese only, but you might get something out of google translate or similar if interested: https://armadillo.atmark-techno.com/about ) I'd be hard pressed to explain why other than "it's always been done" and I've been replicating that pattern, but as far as I understand the gp partitions are always created in "write reliable" mode so it might help a bit with lifetime if the main area is left as default. We're using them for rare, important logs and things like that (minimal trace of updates, hard reset etc for diagnostic)
On 06/03/24 10:44:38, Dominique Martinet wrote: > From: Dominique Martinet <dominique.martinet@atmark-techno.com> > > Commit e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB > partitions.") added a mask check for 'part_type', but the mask used was > wrong leading to the code intended for rpmb also being executed for GP3. > > On some MMCs (but not all) this would make gp3 partition inaccessible: > armadillo:~# head -c 1 < /dev/mmcblk2gp3 > head: standard input: I/O error > armadillo:~# dmesg -c > [ 422.976583] mmc2: running CQE recovery > [ 423.058182] mmc2: running CQE recovery > [ 423.137607] mmc2: running CQE recovery > [ 423.137802] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0 > [ 423.237125] mmc2: running CQE recovery > [ 423.318206] mmc2: running CQE recovery > [ 423.397680] mmc2: running CQE recovery > [ 423.397837] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 > [ 423.408287] Buffer I/O error on dev mmcblk2gp3, logical block 0, async page read > > the part_type values of interest here are defined as follow: > main 0 > boot0 1 > boot1 2 > rpmb 3 > gp0 4 > gp1 5 > gp2 6 > gp3 7 right, the patch I originally sent didn't consider anything after GP0 as per the definitions below. #define EXT_CSD_PART_CONFIG_ACC_MASK (0x7) #define EXT_CSD_PART_CONFIG_ACC_BOOT0 (0x1) #define EXT_CSD_PART_CONFIG_ACC_RPMB (0x3) #define EXT_CSD_PART_CONFIG_ACC_GP0 (0x4) That looked strange as there should be support for 4 GP but this code kind of convinced me of the opposite. if (idata->rpmb) { /* Support multiple RPMB partitions */ target_part = idata->rpmb->part_index; target_part |= EXT_CSD_PART_CONFIG_ACC_RPMB; } So if we apply the fix that you propose, how are multiple RPMB partitions (ie, 4) going to be identified as RPMB? Unless there can't be more than 3? But sure, your patch makes sense to me. > > so mask with EXT_CSD_PART_CONFIG_ACC_MASK (7) to correctly identify rpmb > > Fixes: e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB partitions.") > Cc: stable@vger.kernel.org > Cc: Jorge Ramirez-Ortiz <jorge@foundries.io> > Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> > --- > A couple of notes: > - this doesn't fail on all eMMCs, I can still access gp3 on some models > but it seems to fail reliably with micron's "G1M15L" > - I've encountered this on the 5.10 backport (in 5.10.208), so that'll > need to be backported everywhere the fix was taken... > > Thanks! > --- > drivers/mmc/core/block.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c > index 32d49100dff5..86efa6084696 100644 > --- a/drivers/mmc/core/block.c > +++ b/drivers/mmc/core/block.c > @@ -874,10 +874,11 @@ static const struct block_device_operations mmc_bdops = { > static int mmc_blk_part_switch_pre(struct mmc_card *card, > unsigned int part_type) > { > - const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB; > + const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_MASK; > + const unsigned int rpmb = EXT_CSD_PART_CONFIG_ACC_RPMB; > int ret = 0; > > - if ((part_type & mask) == mask) { > + if ((part_type & mask) == rpmb) { > if (card->ext_csd.cmdq_en) { > ret = mmc_cmdq_disable(card); > if (ret) > @@ -892,10 +893,11 @@ static int mmc_blk_part_switch_pre(struct mmc_card *card, > static int mmc_blk_part_switch_post(struct mmc_card *card, > unsigned int part_type) > { > - const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB; > + const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_MASK; > + const unsigned int rpmb = EXT_CSD_PART_CONFIG_ACC_RPMB; > int ret = 0; > > - if ((part_type & mask) == mask) { > + if ((part_type & mask) == rpmb) { > mmc_retune_unpause(card->host); > if (card->reenable_cmdq && !card->ext_csd.cmdq_en) > ret = mmc_cmdq_enable(card); > > --- > base-commit: 5847c9777c303a792202c609bd761dceb60f4eed > change-id: 20240306-mmc-partswitch-c3a50b5084ae > > Best regards, > -- > Dominique Martinet | Asmadeus >
Jorge Ramirez-Ortiz, Foundries wrote on Wed, Mar 06, 2024 at 10:05:40AM +0100: > > the part_type values of interest here are defined as follow: > > main 0 > > boot0 1 > > boot1 2 > > rpmb 3 > > gp0 4 > > gp1 5 > > gp2 6 > > gp3 7 > > right, the patch I originally sent didn't consider anything after GP0 as per > the definitions below. > > #define EXT_CSD_PART_CONFIG_ACC_MASK (0x7) > #define EXT_CSD_PART_CONFIG_ACC_BOOT0 (0x1) > #define EXT_CSD_PART_CONFIG_ACC_RPMB (0x3) > #define EXT_CSD_PART_CONFIG_ACC_GP0 (0x4) Yes, as far as I can see these are used in drivers/mmc/core/mmc.c for example for GP0, below snippet: mmc_part_add(card, part_size << 19, EXT_CSD_PART_CONFIG_ACC_GP0 + idx, "gp%d", idx, false, MMC_BLK_DATA_AREA_GP); where idx is in [0;3], so we've got 4-7 for GP partitions in the part's part_cfg. (similarly, boot has BOOT0 + [0-1], and RPMB has RPMB without anything added -- so as far as this field is concerned there seems to be a single RPMB) > That looked strange as there should be support for 4 GP but this code > kind of convinced me of the opposite. > > if (idata->rpmb) { > /* Support multiple RPMB partitions */ > target_part = idata->rpmb->part_index; > target_part |= EXT_CSD_PART_CONFIG_ACC_RPMB; > } > > So if we apply the fix that you propose, how are multiple RPMB > partitions (ie, 4) going to be identified as RPMB? Unless there can't be > more than 3? Hmm, that code is definitely odd. Reading this I'd normally assume that idata->rpmb->part_index ought to be in a range separate fom EXT_CSD_PART_CONFIG_ACC_MASK -- so we've got the ACC_RPMB "flag" that identifies it as RPMB within the mask, and then it can target a given index within that. But as far as I'm seeing in the code, rpmb->part_index always comes from mmc_blk_alloc_rpmb_part (set to part's part_cfg), which in turn is only called if area_type is MMC_BLK_DATA_AREA_RPMB, which is only set for mmc_part_add() for rpmb with ACC_RPMB as part_index.. So we've got target_part = 3 and then target_part |= 3 which will leave the value unchanged. Even assuming part_index was something else than 3 (let's say 1 or 2), we'd end up with target_part = 4 or 5 which won't match the PART_CONFIG_ACC_MASK check (&3 != 3), so it doesn't make sense until something is shifted somewhere outside of the mask, and I see no trace of part_index being shifted. So the if (idata->rpmb) itself makes sense as per the comment, but we could just have target_part take either values here as far as I understand. I've never actually used the rpmb partition of my MMCs so I'm not sure how multiple RPMB partitions is supposed to work in the first place, sorry. That code is authored by Linus W (in 2017), perhaps he'll remember something?
On Wed, Mar 6, 2024 at 10:05 AM Jorge Ramirez-Ortiz, Foundries <jorge@foundries.io> wrote: > That looked strange as there should be support for 4 GP but this code > kind of convinced me of the opposite. > > if (idata->rpmb) { > /* Support multiple RPMB partitions */ > target_part = idata->rpmb->part_index; > target_part |= EXT_CSD_PART_CONFIG_ACC_RPMB; > } > > So if we apply the fix that you propose, how are multiple RPMB > partitions (ie, 4) going to be identified as RPMB? Unless there can't be > more than 3? Sorry for writing bad code comments. This comment means: "support multiple RPMB partitions [on the same Linux system]" not: "support multiple RPMB partitions [on the same eMMC device]" Yours, Linus Walleij
On Wed, Mar 6, 2024 at 10:05 AM Jorge Ramirez-Ortiz, Foundries <jorge@foundries.io> wrote: > That looked strange as there should be support for 4 GP but this code > kind of convinced me of the opposite. > > if (idata->rpmb) { > /* Support multiple RPMB partitions */ > target_part = idata->rpmb->part_index; > target_part |= EXT_CSD_PART_CONFIG_ACC_RPMB; > } > > So if we apply the fix that you propose, how are multiple RPMB > partitions (ie, 4) going to be identified as RPMB? Unless there can't be > more than 3? As far as I can tell there can only be one RPMB partition per device. The v5.1A spec says (section 6.2.1): "Two Boot Area Partitions, (...)" "One RPMB Partition accessed through a trusted mechanism, (...)" "Four General Purpose Area Partitions (...)" implying there can be only one RPMB. Also I have never seen more than one in practice. I paged in Jens Wiklander and Tomas Winkler who used it much more than I have, to confirm. I think my linked list of RPMB partitions is a case of overdesign and could be simplified. I blame the fact that I didn't have the (non-public) spec at the time. Yours, Linus Walleij
On 06/03/24 14:18:49, Linus Walleij wrote: > On Wed, Mar 6, 2024 at 10:05 AM Jorge Ramirez-Ortiz, Foundries > <jorge@foundries.io> wrote: > > > That looked strange as there should be support for 4 GP but this code > > kind of convinced me of the opposite. > > > > if (idata->rpmb) { > > /* Support multiple RPMB partitions */ > > target_part = idata->rpmb->part_index; > > target_part |= EXT_CSD_PART_CONFIG_ACC_RPMB; > > } > > > > So if we apply the fix that you propose, how are multiple RPMB > > partitions (ie, 4) going to be identified as RPMB? Unless there can't be > > more than 3? > > As far as I can tell there can only be one RPMB partition per device. that matches everything I have seen in the field too (and we have been supporting RPMB on many designs lately (# > 30). > > The v5.1A spec says (section 6.2.1): > > "Two Boot Area Partitions, (...)" > "One RPMB Partition accessed through a trusted mechanism, (...)" > "Four General Purpose Area Partitions (...)" > > implying there can be only one RPMB. > > Also I have never seen more than one in practice. +1 so I think it is safe to conclude that my commit did indeed cause these regressions as it ignored the support for multiple GP. Sorry about it!. I still cant grasp how "target_part = idata->rpmb->part_index" is helping in the design. What happens when: 1) EXT_CSD_PART_CONFIG_ACC_MASK > part_index > EXT_CSD_PART_CONFIG_ACC_RPMB target_part now could be indicating a GP instead of an RPMB leading to failures. 2) part_index <= EXT_CSD_PART_CONFIG_ACC_RPMB loses the part_index value . So part_index should be larger than EXT_CSD_PART_CONFIG_ACC_MASK even though the comment indicates it starts at 0? /** * struct mmc_rpmb_data - special RPMB device type for these areas * @dev: the device for the RPMB area * @chrdev: character device for the RPMB area * @id: unique device ID number * @part_index: partition index (0 on first) <--------------------- * @md: parent MMC block device * @node: list item, so we can put this device on a list */ struct mmc_rpmb_data { struct device dev; struct cdev chrdev; int id; is it just possible that "target_part = idata->rpmb->part_index" just needs to be shifted to avoid issues? I think the fix to the regression I introduced could perhaps address this as well.
On Wed, Mar 6, 2024 at 3:22 PM Jorge Ramirez-Ortiz, Foundries <jorge@foundries.io> wrote: > I still cant grasp how "target_part = idata->rpmb->part_index" is > helping in the design. > > What happens when: > 1) EXT_CSD_PART_CONFIG_ACC_MASK > part_index > EXT_CSD_PART_CONFIG_ACC_RPMB > target_part now could be indicating a GP instead of an RPMB leading to failures. > > 2) part_index <= EXT_CSD_PART_CONFIG_ACC_RPMB > loses the part_index value . > > So part_index should be larger than EXT_CSD_PART_CONFIG_ACC_MASK even > though the comment indicates it starts at 0? > > /** > * struct mmc_rpmb_data - special RPMB device type for these areas > * @dev: the device for the RPMB area > * @chrdev: character device for the RPMB area > * @id: unique device ID number > * @part_index: partition index (0 on first) <--------------------- > * @md: parent MMC block device > * @node: list item, so we can put this device on a list > */ > struct mmc_rpmb_data { > struct device dev; > struct cdev chrdev; > int id; > > is it just possible that "target_part = idata->rpmb->part_index" just > needs to be shifted to avoid issues? > > I think the fix to the regression I introduced could perhaps address > this as well. I have no clue how the regression happened really ... heh. We should probably rename it part_cfg because that is what we store in it, it's assigned from card->part[idx].part_cfg. Then the id field in mmc_rpmb_data should be deleted along with all the IDA counter code etc and the partition name hardcoded to be "0" as there will never be anything else. Yours, Linus Walleij
On Wed, 6 Mar 2024 at 15:38, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Wed, Mar 6, 2024 at 3:22 PM Jorge Ramirez-Ortiz, Foundries > <jorge@foundries.io> wrote: > > > I still cant grasp how "target_part = idata->rpmb->part_index" is > > helping in the design. > > > > What happens when: > > 1) EXT_CSD_PART_CONFIG_ACC_MASK > part_index > EXT_CSD_PART_CONFIG_ACC_RPMB > > target_part now could be indicating a GP instead of an RPMB leading to failures. > > > > 2) part_index <= EXT_CSD_PART_CONFIG_ACC_RPMB > > loses the part_index value . > > > > So part_index should be larger than EXT_CSD_PART_CONFIG_ACC_MASK even > > though the comment indicates it starts at 0? > > > > /** > > * struct mmc_rpmb_data - special RPMB device type for these areas > > * @dev: the device for the RPMB area > > * @chrdev: character device for the RPMB area > > * @id: unique device ID number > > * @part_index: partition index (0 on first) <--------------------- > > * @md: parent MMC block device > > * @node: list item, so we can put this device on a list > > */ > > struct mmc_rpmb_data { > > struct device dev; > > struct cdev chrdev; > > int id; > > > > is it just possible that "target_part = idata->rpmb->part_index" just > > needs to be shifted to avoid issues? > > > > I think the fix to the regression I introduced could perhaps address > > this as well. > > I have no clue how the regression happened really ... heh. > > We should probably rename it part_cfg because that is what we > store in it, it's assigned from card->part[idx].part_cfg. > > Then the id field in mmc_rpmb_data should be deleted along > with all the IDA counter code etc and the partition name hardcoded > to be "0" as there will never be anything else. Seems reasonable to me. Are you thinking of sending a cleanup patch on top of $subject patch? Kind regards Uffe
On Wed, Mar 6, 2024 at 4:57 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > On Wed, 6 Mar 2024 at 15:38, Linus Walleij <linus.walleij@linaro.org> wrote: > > We should probably rename it part_cfg because that is what we > > store in it, it's assigned from card->part[idx].part_cfg. > > > > Then the id field in mmc_rpmb_data should be deleted along > > with all the IDA counter code etc and the partition name hardcoded > > to be "0" as there will never be anything else. > > Seems reasonable to me. Are you thinking of sending a cleanup patch on > top of $subject patch? Yes I can do that once this patch is finalized and merged. Yours, Linus Walleij
On Wed, 6 Mar 2024 at 20:49, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Wed, Mar 6, 2024 at 4:57 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Wed, 6 Mar 2024 at 15:38, Linus Walleij <linus.walleij@linaro.org> wrote: > > > > We should probably rename it part_cfg because that is what we > > > store in it, it's assigned from card->part[idx].part_cfg. > > > > > > Then the id field in mmc_rpmb_data should be deleted along > > > with all the IDA counter code etc and the partition name hardcoded > > > to be "0" as there will never be anything else. > > > > Seems reasonable to me. Are you thinking of sending a cleanup patch on > > top of $subject patch? >? > Yes I can do that once this patch is finalized and merged. Great! I don't think the $subject patch is perfect, but it's slim and suitable for stable - and of course it seems to do the job well. I am therefore queuing it for fixes, thanks everyone for helping out! Kind regards Uffe
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 32d49100dff5..86efa6084696 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -874,10 +874,11 @@ static const struct block_device_operations mmc_bdops = { static int mmc_blk_part_switch_pre(struct mmc_card *card, unsigned int part_type) { - const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB; + const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_MASK; + const unsigned int rpmb = EXT_CSD_PART_CONFIG_ACC_RPMB; int ret = 0; - if ((part_type & mask) == mask) { + if ((part_type & mask) == rpmb) { if (card->ext_csd.cmdq_en) { ret = mmc_cmdq_disable(card); if (ret) @@ -892,10 +893,11 @@ static int mmc_blk_part_switch_pre(struct mmc_card *card, static int mmc_blk_part_switch_post(struct mmc_card *card, unsigned int part_type) { - const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB; + const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_MASK; + const unsigned int rpmb = EXT_CSD_PART_CONFIG_ACC_RPMB; int ret = 0; - if ((part_type & mask) == mask) { + if ((part_type & mask) == rpmb) { mmc_retune_unpause(card->host); if (card->reenable_cmdq && !card->ext_csd.cmdq_en) ret = mmc_cmdq_enable(card);