Message ID | 20180518002214.5657-10-longli@linuxonhyperv.com (mailing list archive) |
---|---|
State | Not Applicable |
On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> From: Long Li <longli@microsoft.com>
>
> When cache=rdma is enabled on mount options, CIFS do not allocate internal data
> buffer pages for I/O, data is read/writen directly to user memory via RDMA.

I don't think this should be an option.  For direct I/O without signing or
encryption CIFS should always use get_user_pages, with or without RDMA.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
>
> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > When cache=rdma is enabled on mount options, CIFS do not allocate
> > internal data buffer pages for I/O, data is read/writen directly to user
> memory via RDMA.
>
> I don't think this should be an option.  For direct I/O without signing or
> encryption CIFS should always use get_user_pages, with or without RDMA.

Yes this should be done for all transport. If there are no objections, I'll send patches to change this.
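(The get_user_pages approach discussed above can be sketched roughly as follows. This is an illustrative fragment only, not code from the series; the helper name `cifs_pin_user_pages` is invented for the example, and it uses the 4.17-era `get_user_pages_fast()` signature.)

```c
/*
 * Illustrative sketch only -- not from the patch series. The helper name
 * is invented; it shows roughly what "always use get_user_pages for
 * direct I/O" means: pin the caller's buffer so the transport can
 * read/write it in place instead of copying through CIFS-allocated pages.
 */
static int cifs_pin_user_pages(unsigned long uaddr, size_t len, int write,
			       struct page **pages)
{
	unsigned long first = uaddr >> PAGE_SHIFT;
	unsigned long last = (uaddr + len - 1) >> PAGE_SHIFT;
	int npages = last - first + 1;
	int pinned;

	/* 4.17-era signature: (start, nr_pages, write, pages) */
	pinned = get_user_pages_fast(uaddr, npages, write, pages);
	if (pinned < 0)
		return pinned;
	if (pinned < npages) {
		/* Partial pin: release what we got and let the caller
		 * fall back or fail the I/O. */
		while (pinned--)
			put_page(pages[pinned]);
		return -EFAULT;
	}
	return 0;
}
```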
On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical
<samba-technical@lists.samba.org> wrote:
>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
>>
>> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
>> > From: Long Li <longli@microsoft.com>
>> >
>> > When cache=rdma is enabled on mount options, CIFS do not allocate
>> > internal data buffer pages for I/O, data is read/writen directly to user
>> memory via RDMA.
>>
>> I don't think this should be an option.  For direct I/O without signing or
>> encryption CIFS should always use get_user_pages, with or without RDMA.
>
> Yes this should be done for all transport. If there are no objections, I'll send patches to change this.

Would this help/change performance much?
> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
>
> On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical <samba-
> technical@lists.samba.org> wrote:
> >> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
> >>
> >> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> >> > From: Long Li <longli@microsoft.com>
> >> >
> >> > When cache=rdma is enabled on mount options, CIFS do not allocate
> >> > internal data buffer pages for I/O, data is read/writen directly to
> >> > user
> >> memory via RDMA.
> >>
> >> I don't think this should be an option.  For direct I/O without
> >> signing or encryption CIFS should always use get_user_pages, with or
> without RDMA.
> >
> > Yes this should be done for all transport. If there are no objections, I'll send
> patches to change this.
>
> Would this help/change performance much?

On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns.

But I haven't tested on TCP. Maybe it will help a little bit.

> --
> Thanks,
>
> Steve
On 5/18/2018 1:58 PM, Long Li wrote:
>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
>>
>> On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical <samba-
>> technical@lists.samba.org> wrote:
>>>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
>>>>
>>>> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
>>>>> From: Long Li <longli@microsoft.com>
>>>>>
>>>>> When cache=rdma is enabled on mount options, CIFS do not allocate
>>>>> internal data buffer pages for I/O, data is read/writen directly to
>>>>> user
>>>> memory via RDMA.
>>>>
>>>> I don't think this should be an option.  For direct I/O without
>>>> signing or encryption CIFS should always use get_user_pages, with or
>> without RDMA.
>>>
>>> Yes this should be done for all transport. If there are no objections, I'll send
>> patches to change this.
>>
>> Would this help/change performance much?
>
> On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns.
>
> But I haven't tested on TCP. Maybe it will help a little bit.

Well, when the application requests direct i/o on a TCP connection, you
definitely don't want to cache it! So even if the performance is different,
correctness would dictate doing this. You probably don't need to pin the
buffer in the TCP case, which might be worth avoiding.

Tom.
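(The point about not needing to pin on TCP can be illustrated with a hypothetical fragment: the socket layer copies out of the user's iov_iter itself, so the pages never have to be pinned the way an RDMA transfer requires. The function name below is invented for this example.)

```c
/*
 * Hypothetical sketch -- not from the patch series. On a TCP transport
 * the send path can hand the user's iov_iter straight to the socket
 * layer, which copies from it internally; a fault in the user buffer
 * simply fails the send, so no get_user_pages() is needed.
 */
static int smb_send_direct_tcp(struct socket *sock, struct iov_iter *from)
{
	struct msghdr msg = { .msg_iter = *from };

	return sock_sendmsg(sock, &msg);
}
```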
```diff
diff --git a/fs/cifs/cifs_fs_sb.h b/fs/cifs/cifs_fs_sb.h
index 350fa55..7c28dc3 100644
--- a/fs/cifs/cifs_fs_sb.h
+++ b/fs/cifs/cifs_fs_sb.h
@@ -51,6 +51,8 @@
  */
 #define CIFS_MOUNT_UID_FROM_ACL 0x2000000 /* try to get UID via special SID */
 
+#define CIFS_MOUNT_DIRECT_RDMA 0x4000000
+
 struct cifs_sb_info {
 	struct rb_root tlink_tree;
 	spinlock_t tlink_tree_lock;
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index a51855c..3bec63f 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -518,6 +518,7 @@ struct smb_vol {
 	bool server_ino:1; /* use inode numbers from server ie UniqueId */
 	bool direct_io:1;
 	bool strict_io:1; /* strict cache behavior */
+	bool rdma_io:1;
 	bool remap:1;      /* set to remap seven reserved chars in filenames */
 	bool sfu_remap:1;  /* remap seven reserved chars ala SFU */
 	bool posix_paths:1; /* unset to not ask for posix pathnames. */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 46d0cf4..c92b3d8 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -282,6 +282,7 @@ enum {
 	Opt_cache_loose,
 	Opt_cache_strict,
 	Opt_cache_none,
+	Opt_cache_rdma,
 	Opt_cache_err
 };
 
@@ -289,6 +290,7 @@ static const match_table_t cifs_cacheflavor_tokens = {
 	{ Opt_cache_loose, "loose" },
 	{ Opt_cache_strict, "strict" },
 	{ Opt_cache_none, "none" },
+	{ Opt_cache_rdma, "rdma" },
 	{ Opt_cache_err, NULL }
 };
 
@@ -1128,6 +1130,9 @@ cifs_parse_cache_flavor(char *value, struct smb_vol *vol)
 		vol->direct_io = true;
 		vol->strict_io = false;
 		break;
+	case Opt_cache_rdma:
+		vol->rdma_io = true;
+		break;
 	default:
 		cifs_dbg(VFS, "bad cache= option: %s\n", value);
 		return 1;
@@ -3612,6 +3617,10 @@ int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
 		cifs_dbg(FYI, "mounting share using direct i/o\n");
 		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_IO;
 	}
+	if (pvolume_info->rdma_io) {
+		cifs_dbg(VFS, "mounting share using rdma i/o\n");
+		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_RDMA;
+	}
 	if (pvolume_info->mfsymlinks) {
 		if (pvolume_info->sfu_emul) {
 			/*
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 81ba6e0..ce69b1c 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -557,6 +557,11 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry,
 			file->f_op = &cifs_file_direct_ops;
 	}
 
+	if (file->f_flags & O_DIRECT &&
+	    CIFS_SB(inode->i_sb)->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+		file->f_op = &cifs_file_direct_rdma_ops;
+
+
 	file_info = cifs_new_fileinfo(&fid, file, tlink, oplock);
 	if (file_info == NULL) {
 		if (server->ops->close)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0b394db..30ccf6a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -492,6 +492,10 @@ int cifs_open(struct inode *inode, struct file *file)
 			file->f_op = &cifs_file_direct_ops;
 	}
 
+	if (file->f_flags & O_DIRECT &&
+	    cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+		file->f_op = &cifs_file_direct_rdma_ops;
+
 	if (server->oplocks)
 		oplock = REQ_OPLOCK;
 	else
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 3c371f7..7991298 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -44,7 +44,9 @@ static void cifs_set_ops(struct inode *inode)
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &cifs_file_inode_ops;
-		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) {
+		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+			inode->i_fop = &cifs_file_direct_rdma_ops;
+		else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) {
 			if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL)
 				inode->i_fop = &cifs_file_direct_nobrl_ops;
 			else
```
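(For reference, with the `cache=rdma` flavor added by the connect.c hunks, a mount using the option would look something like the following; server, share, mount point, and credentials are placeholders.)

```
mount -t cifs //server/share /mnt/cifs -o username=user,vers=3.02,rdma,cache=rdma
```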