Message ID: oriounafd8.fsf@livre.home (mailing list archive)
State: New, archived
On Tue, Dec 17, 2013 at 7:25 PM, Alexandre Oliva <oliva@gnu.org> wrote:
> This is a (probably half-baked) solution for the problem of cephfs
> clusters encountering recovery problems when clients are accessing files
> that don't have a parent attribute. It enables clients to request that
> the parent attribute be updated right away, by a simple setxattr call:
>
> # setfattr -n ceph.parent /cephfs/mount/path/name
>
> I had to relax the assert because there's no reason I can think of to
> force the object and its parent dirty just to set this internal
> bookkeeping xattr. The operation is not journaled, as it takes effect
> immediately. Although there's no assurance that the operation has
> completed before success is returned, my tests indicate that running
> rados getxattr right after setfattr already finds the attribute, even for
> objects that had been created before the introduction of the parent
> attribute.
>
> I realize Zheng Yan posted a patch that would mark missing or too-old
> parent attributes for update on the fly, when inodes were brought into
> the cache, so that the parent attribute would be updated when the inodes
> were to be expired from the MDS log. I had the mds running with that
> patch for a while, and I even explicitly touched and linked files and
> dirs that were missing the parent attribute; many, but not all, of the
> files and dirs got the attribute, and I'm having some difficulty getting
> it to work on the remaining ones, in part because it takes so long to
> take effect (as in: perform an operation, then wait for several hours
> until the then-current MDS log segment gets expired, then check whether
> the attribute was set). This patch causes the parent attribute to be set
> right away.
>
> I'm not sure this immediate behavior would be appropriate for use in
> production (as in, I'm not sure whether creating an inode and immediately
> trying to set the parent attribute might fail because the inode object
> isn't there yet by the time we try to set an attribute on it), but it
> should be ok for retrofitting ancient inode objects so that they don't
> cause recovery problems due to the lack of the parent xattr.

Hi,

This seems like a good solution for fixing a cephfs that was created
before dumpling. But I'm not sure whether we should merge this patch
(how many cephfs filesystems created before dumpling still exist?).

Regards
Yan, Zheng

> BTW, Zheng Yan, thanks for the patch that fixed mds readdir with dirs
> ending in remote (hard) links; this one had annoyed me for a long time,
> and I was just about to start actually digging into it when I saw the
> 0.73 announcement that mentioned what appeared to be a fix for the
> problem I was running into, and indeed, it was. I merged it into my
> 0.72.1 build and it's been working great!
>
> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist      Red Hat Brazil Compiler Engineer
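(For reference, the setfattr invocation above reduces to an ordinary
setxattr(2) call with an empty value against the CephFS mount. The snippet
below is a minimal, illustrative sketch of that client side only -- it is
not part of the patch, and the path is made up.)

#include <sys/xattr.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main(int argc, char **argv)
{
  // Hypothetical path on a CephFS mount; pass a real one as argv[1].
  const char *path = argc > 1 ? argv[1] : "/cephfs/mount/path/name";

  // Empty value: the MDS-side handler added by this patch matches
  // name == "ceph.parent" && value == "".
  if (setxattr(path, "ceph.parent", "", 0, 0) < 0) {
    fprintf(stderr, "setxattr(%s, ceph.parent): %s\n", path, strerror(errno));
    return 1;
  }
  return 0;
}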
On Dec 18, 2013, "Yan, Zheng" <ukernel@gmail.com> wrote: > On Tue, Dec 17, 2013 at 7:25 PM, Alexandre Oliva <oliva@gnu.org> wrote: >> # setfattr -n ceph.parent /cephfs/mount/path/name > This seems like a good solution for fixing cephfs that was created > before dumpling. There's more to it than just that, actually. Renaming an entire subtree won't update the parent attribute of files in there, so they will appear to be incorrect (*). This patch introduces a mechanism that could be used to force them to be updated. (*) I'm well aware that they contain enough information to find the updated information, so the redundant info in this attribute can be harmlessly out-of-date, but if someone plans to use the data for other purposes (like I sometimes do), it's useful to have them fully up to date. I also move large trees around, which makes this issue visible.
On Wed, 18 Dec 2013, Alexandre Oliva wrote:
> On Dec 18, 2013, "Yan, Zheng" <ukernel@gmail.com> wrote:
>
> > On Tue, Dec 17, 2013 at 7:25 PM, Alexandre Oliva <oliva@gnu.org> wrote:
> >> # setfattr -n ceph.parent /cephfs/mount/path/name

Can we add an additional prefix indicating that this is a debug/developer
kludge that is not intended to be supported in the long term?
ceph.dev.force_rewrite_backtrace or something?

sage
On Wed, Dec 18, 2013 at 9:09 AM, Sage Weil <sage@inktank.com> wrote:
> On Wed, 18 Dec 2013, Alexandre Oliva wrote:
>> On Dec 18, 2013, "Yan, Zheng" <ukernel@gmail.com> wrote:
>>
>> > On Tue, Dec 17, 2013 at 7:25 PM, Alexandre Oliva <oliva@gnu.org> wrote:
>> >> # setfattr -n ceph.parent /cephfs/mount/path/name
>
> Can we add an additional prefix indicating that this is a debug/developer
> kludge that is not intended to be supported in the long term?
> ceph.dev.force_rewrite_backtrace or something?

While the "ceph.dev" namespace might be a good idea for other things, I'm
not sure we should merge anything that we can't support long-term. This
probably wouldn't be too hard to get working properly, but if it's only
going to impact people who have been using CephFS for so long that they
can build their own clusters, I'm not sure it's worth the effort.

In particular, I don't think this interacts properly with projection, so
we can't make any guarantees that it has actually done what the user wants
if there are ongoing changes, and it's sort of weird not to pay attention
to whether the update actually succeeds...
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Dec 18, 2013, Gregory Farnum <greg@inktank.com> wrote:
> This probably wouldn't be too hard to get working properly,
For some value of properly ;-)
The current state of affairs is that the parent attribute only gets
updated when the log segment is about to be expired. Worst case, using
the proposed setxattr extension will force it to be updated earlier.
How could that end up being a bad thing? It's not like we even use the
parent attribute for anything while the inode remains in the mds
journal.
So we have the following possibilities of divergence:
a) the inode is created or moved, and then someone calls
setxattr(parent), and the file remains in place until the inode gets
expired from the journal. the parent attribute will be updated at the
time of the setxattr request, but it won't ever be used before the inode
gets expired from the journal, at which point it would have been updated
to the same value.
b) the inode is absent from the journal, and someone calls
setxattr(parent), and then moves the inode to a different location. the
parent attribute will be updated (a nop unless the attribute is missing
or wrong) at the time of the setxattr request, and then the move
operation will cause the attribute to be overwritten at the time the
inode is about to be expired from the journal
c) the inode is moved, then setxattr(parent)ed, then moved again, before
the initial move gets expired from the journal. the setxattr will be
performed at the time it is requested, and it will be correct at that
point; when the first inode move is expired from the journal, the parent
attribute may or may not be updated (I'm not sure), but if it is, then
we're back to the original behavior, and anyway, this incorrect value
won't ever be used as long as the subsequent move remains in the journal
Did I miss any case?
Now, I've just run into another scenario in which this parent-setting is
useful. I had to resort to --reset-journal (for reasons unknown), but
any files and directories created recently, whose create operations
hadn't been expired from the journal yet, won't get a parent attribute
from ceph unless I actually moved them about to force an update. This
means caps on them won't recover properly until I find out what they are
and perform corrective action.
Moving a bunch of objects is somewhat tricky, because if the mds
restarts just at the wrong time, the move operation will seem to fail
because the new mds won't recover that transaction correctly, precisely
because the object is absent from the journal and missing the parent
attribute. This sort of problem will often get a client stuck, or
signal an error that may or may not indicate the operation failed.
Plus, if I have to do that move dance on a large number of objects, odds
are the mds will get slow enough that a standby-replay mds will decide
it's dead and take over, and then fail to recover the ongoing
operations. See where I'm going? :-)
Having some means to update the internal bookkeeping parent attribute
without actually touching the inodes, not even their ctimes, is a plus
for this case.
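One quick way to check that claim on a live mount would be something along
these lines (a sketch only; the path is hypothetical, and it assumes the
patch is applied on the mds):

#include <sys/stat.h>
#include <sys/xattr.h>
#include <cstdio>

int main()
{
  const char *path = "/cephfs/mount/some/old/file";  // hypothetical path
  struct stat before, after;

  if (stat(path, &before) < 0)
    return 1;
  // Force the backtrace ("parent") rewrite via the new vxattr.
  if (setxattr(path, "ceph.parent", "", 0, 0) < 0)
    return 1;
  if (stat(path, &after) < 0)
    return 1;

  printf("ctime %s\n",
         before.st_ctim.tv_sec == after.st_ctim.tv_sec &&
         before.st_ctim.tv_nsec == after.st_ctim.tv_nsec
             ? "unchanged" : "CHANGED");
  return 0;
}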
So now it's not just really old ceph nodes and a wish to have accurate
information in the parent nodes, it's recovering from a --reset-journal
required by some other failure I couldn't figure out.
(hmm... if I have 2*N replicas of PGs in the metadata pool and demand N
replicas to be up for the PG to be deemed complete, if I shut down the N
replicas that are up after they get an update and bring up the other N
replicas, they will know they're out of date, right? IIUC that's what
the down state is about, although I'm not sure where the OSDs get the
info from to decide to enter that state; I've always assumed it was from
pg versions known by the monitors)
On 12/19/2013 04:00 PM, Alexandre Oliva wrote:
> On Dec 18, 2013, Gregory Farnum <greg@inktank.com> wrote:
>
>> This probably wouldn't be too hard to get working properly,
>
> For some value of properly ;-)
>
> The current state of affairs is that the parent attribute only gets
> updated when the log segment is about to be expired. Worst case, using
> the proposed setxattr extension will force it to be updated earlier.
[...]
> Did I miss any case?

I think you are right. Setting the parent xattr directly won't compromise
the backtrace.

> Now, I've just run into another scenario in which this parent-setting is
> useful. I had to resort to --reset-journal (for reasons unknown), but
> any files and directories created recently, whose create operations
> hadn't been expired from the journal yet, won't get a parent attribute
> from ceph unless I actually moved them about to force an update.
[...]
> So now it's not just really old ceph nodes and a wish to have accurate
> information in the parent nodes, it's recovering from a --reset-journal
> required by some other failure I couldn't figure out.

Next time you encounter log corruption, please open a new issue at
http://tracker.ceph.com/

Regards
Yan, Zheng
On Dec 19, 2013, "Yan, Zheng" <zheng.z.yan@intel.com> wrote:
> Next time you encounter log corruption, please open a new issue at http://tracker.ceph.com/
I have saved the dumped journal, but it contains too much information
I'm not supposed to or willing to share, especially with a US-based
company in the post-Snowden era :-( (something like btrfs-image -s
might be good to have)
I'd be glad to try to dig info that might be useful out of it, if told
how to do so, but uploading the journal as taken from the cluster is not
an option.
On 12/20/2013 11:35 AM, Alexandre Oliva wrote:
> On Dec 19, 2013, "Yan, Zheng" <zheng.z.yan@intel.com> wrote:
>
>> Next time you encounter log corruption, please open a new issue at http://tracker.ceph.com/
>
> I have saved the dumped journal, but it contains too much information
> I'm not supposed to or willing to share, especially with a US-based
> company in the post-Snowden era :-( (something like btrfs-image -s
> might be good to have)
>
> I'd be glad to try to dig info that might be useful out of it, if told
> how to do so, but uploading the journal as taken from the cluster is not
> an option.

Can you send the mds log (the last few hundred lines) from when the mds
crashed?

Regards
Yan, Zheng
On Dec 20, 2013, "Yan, Zheng" <zheng.z.yan@intel.com> wrote: > On 12/20/2013 11:35 AM, Alexandre Oliva wrote: >> On Dec 19, 2013, "Yan, Zheng" <zheng.z.yan@intel.com> wrote: >> >>> next time you encountered log corruption, please open "new issues" at http://tracker.ceph.com/ >> >> I have saved the dumped journal, but it contains too much information >> I'm not supposed to or willing to share, especially with a US-based >> company in the post-Snowden era :-( (something like btrfs-image -s >> might be good to have) >> >> I'd be glad to try to dig info that might be useful out of it, if told >> how to do so, but uploading the journal as taken from the cluster is not >> an option. > can you send out the mds log (the latest few hundreds of lines) when the mds crashed. It didn't quite crash. Here's what it spit out: mds.0.cache creating system inode with ino:1 mds.0.journaler(ro) try_read_entry got 0 len entry at offset 1248878082748 mds.0.log _replay journaler got error -22, aborting mds.0.1307 boot_start encountered an error, failing mds.0.1307 suicide. wanted down:dne, now up:replay Before this, the mds was brought down along with the rest of the cluster: mds.0.1273 *** got signal Terminated *** mds.0.1273 suicide. wanted down:dne, now up:active The --reset-journal session, that aborted because of the assertion failure I mentioned in the patch, added the recent message log, but there's nothing useful there: the mds asked osds for each one of the log segments, got successful responses for them all, and then proceeded to reset the journal. The only oddity I found was this: client.411305.journaler(ro) _finish_probe_end write_pos = 1248890415494 (header had 1248890413881). recovered. For reference, here's the complete loghead info: client.411305.journaler(ro) _finish_read_head loghead(trim 1248761741312, expire 1248761744503, write 1248890413881). probing for end of log (from 1248890413881)... Note how the offset at which we got the zero-len entry is not close to either end of the (non-trimmed portion of) the log. 200.00048b1b, where the zero-len entry was, was not only read successfully, the corresponding file in the osds had the correct 4MB size, and the md5sum of that file was the same on all 4 osds that contained it. So, whatever it was that corrupted the log happened long before the mds shutdown; 200.00048b1b's timestamp is 1387397117, while 200.00048b1e's (the tail before the journal reset) is 1387430417; the 2 other files in between are spaced at a nearly constant rate of 10000 seconds per 4MB. The mds log has nothing special for that time range, but from the mon and the osd logs I can tell that was about the time in which I rolled back many of the osds to recent snapshots thereof, from which I'd cleaned all traces of the user.ceph._parent. I intended to roll back only 2 out of the 4 replicas of each pg at a time, and wait for them to recover, but I recall that during this process, two osds came down before they were meant to, because of timeouts caused by the metadata-heavy recursive setfattr. I guess this may have caused me to lose changes that had already been committed, by rolling back all the osds that had those changes. So I think by now I'm happy to announce that it was an IO error (where IO stands for Incompetence of the Operator ;-) Sorry about this disturbance, and thanks for asking me to investigate it further and find a probable cause that involves no fault of Ceph's.
On Dec 20, 2013, Alexandre Oliva <oliva@gnu.org> wrote:
> back many of the osds to recent snapshots thereof, from which I'd
> cleaned all traces of the user.ceph._parent. I intended to roll back

Err, I meant user.ceph._path, of course ;-)

> So I think by now I'm happy to announce that it was an IO error (where
> IO stands for Incompetence of the Operator ;-)

> Sorry about this disturbance, and thanks for asking me to investigate it
> further and find a probable cause that involves no fault of Ceph's.

I guess after the successful --reset-journal, I get to clean up on my
own the journal files that are no longer used but that apparently won't
get cleaned up by the mds any more. Right?
On Fri, Dec 20, 2013 at 4:50 PM, Alexandre Oliva <oliva@gnu.org> wrote:
> On Dec 20, 2013, Alexandre Oliva <oliva@gnu.org> wrote:
>
>> back many of the osds to recent snapshots thereof, from which I'd
>> cleaned all traces of the user.ceph._parent. I intended to roll back
>
> Err, I meant user.ceph._path, of course ;-)
>
>> So I think by now I'm happy to announce that it was an IO error (where
>> IO stands for Incompetence of the Operator ;-)
>
>> Sorry about this disturbance, and thanks for asking me to investigate it
>> further and find a probable cause that involves no fault of Ceph's.
>
> I guess after the successful --reset-journal, I get to clean up on my
> own the journal files that are no longer used but that apparently won't
> get cleaned up by the mds any more. Right?

If they're in the "future" of the mds journal they'll get cleared out
automatically as the MDS gets up to them (this is the pre-zeroing thing).
If they're in the "past", yeah, you'll need to clear them up.

Did you do that rollback via your cluster snapshot thing, or just local
btrfs snaps? I don't think I want to add anything that makes it easy for
people to break their filesystem like this. :p
-Greg
On Jan 6, 2014, Gregory Farnum <greg@inktank.com> wrote:
> On Fri, Dec 20, 2013 at 4:50 PM, Alexandre Oliva <oliva@gnu.org> wrote:
>> I guess after the successful --reset-journal, I get to clean up on my
>> own the journal files that are no longer used but that apparently won't
>> get cleaned up by the mds any more. Right?

> If they're in the "future" of the mds journal they'll get cleared out
> automatically as the MDS gets up to them (this is the pre-zeroing
> thing). If they're in the "past", yeah, you'll need to clear them up.

After a ceph-mds --reset-journal, I don't see how they could be in the
"future", but what do I know?

> Did you do that rollback via your cluster snapshot thing, or just
> local btrfs snaps?

Local btrfs snaps. Had I rolled back to a cluster snapshot proper,
there's no way this problem could have happened.
On Mon, Jan 6, 2014 at 8:15 PM, Alexandre Oliva <oliva@gnu.org> wrote:
> On Jan 6, 2014, Gregory Farnum <greg@inktank.com> wrote:
>
>> If they're in the "future" of the mds journal they'll get cleared out
>> automatically as the MDS gets up to them (this is the pre-zeroing
>> thing). If they're in the "past", yeah, you'll need to clear them up.
>
> After a ceph-mds --reset-journal, I don't see how they could be in the
> "future", but what do I know?

I wasn't entirely clear on the ordering of things -- if, for instance,
you rolled back OSDs such that the journal head pointed to an offset, and
there was a hole following that offset, but you didn't roll back every
OSD, then there could still be some log segments which didn't get deleted
in your rollback or by the reset-journal operation (because it stopped
once it found a hole).
-Greg
mds: handle setxattr ceph.parent

From: Alexandre Oliva <oliva@gnu.org>

Enable clients to setxattr ceph.parent to update the parent xattr.

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
---
 src/mds/CInode.cc |    2 +-
 src/mds/Server.cc |    5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index 1fc57fe..7c692d4 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -1009,7 +1009,7 @@ struct C_Inode_StoredBacktrace : public Context {
 void CInode::store_backtrace(Context *fin)
 {
   dout(10) << "store_backtrace on " << *this << dendl;
-  assert(is_dirty_parent());
+  assert(!fin || is_dirty_parent());
 
   auth_pin(this);
 
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 6bb3aef..2afb6d7 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -3615,6 +3615,11 @@ void Server::handle_set_vxattr(MDRequest *mdr, CInode *cur,
     journal_and_reply(mdr, cur, 0, le, new C_MDS_inode_update_finish(mds, mdr, cur));
     return;
   }
+  else if (name == "ceph.parent" && value == "") {
+    cur->store_backtrace(NULL);
+    reply_request(mdr, 0);
+    return;
+  }
 
   dout(10) << " unknown vxattr " << name << dendl;
   reply_request(mdr, -EINVAL);
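(As a usage note: with this patch on the mds, retrofitting an existing
tree amounts to walking it and issuing the empty-value ceph.parent
setxattr on every entry. The sketch below illustrates one way that might
look; the mount path is hypothetical, and pacing/error policy would need
care on a real cluster -- the thread above notes that a metadata-heavy
recursive setfattr can already cause timeouts.)

#include <ftw.h>
#include <sys/xattr.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Called by nftw() for every entry under the root; ask the mds to rewrite
// the backtrace ("parent") xattr for it.
static int touch_parent(const char *path, const struct stat *, int, struct FTW *)
{
  if (setxattr(path, "ceph.parent", "", 0, 0) < 0)
    fprintf(stderr, "%s: %s\n", path, strerror(errno));
  return 0;  // keep walking even if one entry fails
}

int main(int argc, char **argv)
{
  // Hypothetical CephFS mount point; pass the real one as argv[1].
  const char *root = argc > 1 ? argv[1] : "/cephfs/mount";
  return nftw(root, touch_parent, 64, FTW_PHYS) ? 1 : 0;
}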