Message ID | 20221107025359.2911028-1-jiaqiyan@google.com (mailing list archive) |
---|---|
Series | Memory poison recovery in khugepaged collapsing |
On Sun, 6 Nov 2022 18:53:57 -0800 Jiaqi Yan <jiaqiyan@google.com> wrote:

> Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> As memory size and density increase, the chances of and number of
> memory errors increase. The increasing size and density of server
> RAM in the data center and cloud have shown increased uncorrectable
> memory errors. There are already mechanisms in the kernel to recover
> from uncorrectable memory errors. This series of patches provides
> the recovery mechanism for the particular kernel agent khugepaged
> when it collapses memory pages.

Thanks, I'll toss v6 into mm-unstable for some testing, pending further review.

When resending a patchset, please try to also cc the people who have
commented on previous versions.

Thanks for ccing Oscar, Andrew.

After getting this patch into our internal production environment, I
recently found a regression bug introduced by my commit a0157a2c735b
("mm/khugepaged: recover from poisoned file-backed memory").
Given it is only in mm-unstable, I wonder should I put out a v7 with
the fix, or should I make it a new and separate commit?

Sorry for the bug.

On Mon, Nov 7, 2022 at 12:53 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sun, 6 Nov 2022 18:53:57 -0800 Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> > Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> > As memory size and density increase, the chances of and number of
> > memory errors increase. The increasing size and density of server
> > RAM in the data center and cloud have shown increased uncorrectable
> > memory errors. There are already mechanisms in the kernel to recover
> > from uncorrectable memory errors. This series of patches provides
> > the recovery mechanism for the particular kernel agent khugepaged
> > when it collapses memory pages.
>
> Thanks, I'll toss v6 into mm-unstable for some testing, pending further review.
>
> When resending a patchset, please try to also cc the people who have
> commented on previous versions.
>

On Wed, 16 Nov 2022 09:58:23 -0800 Jiaqi Yan <jiaqiyan@google.com> wrote:

> Thanks for ccing Oscar, Andrew.
>
> After getting this patch into our internal production environment, I
> recently found a regression bug introduced by my commit a0157a2c735b
> ("mm/khugepaged: recover from poisoned file-backed memory").
> Given it is only in mm-unstable, I wonder should I put out a v7 with
> the fix, or should I make it a new and separate commit?

Either approach is OK. I usually convert replacement patches into
deltas so I and others can see what changed. But people like to see
the whole patch for review purposes, so I guess that if you email out a
new version of the patch, we get to see both.

On Wed, Nov 16, 2022 at 1:52 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 16 Nov 2022 09:58:23 -0800 Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> > Thanks for ccing Oscar, Andrew.
> >
> > After getting this patch into our internal production environment, I
> > recently found a regression bug introduced by my commit a0157a2c735b
> > ("mm/khugepaged: recover from poisoned file-backed memory").
> > Given it is only in mm-unstable, I wonder should I put out a v7 with
> > the fix, or should I make it a new and separate commit?
>
> Either approach is OK. I usually convert replacement patches into
> deltas so I and others can see what changed. But people like to see
> the whole patch for review purposes, so I guess that if you email out a
> new version of the patch, we get to see both.
>
>

Thanks Andrew. I just sent out v7 with the fix.