From patchwork Tue Oct 9 10:48:02 2018
X-Patchwork-Submitter: Luis Henriques
X-Patchwork-Id: 10632223
From: Luis Henriques
To: "Yan, Zheng", Sage Weil, Ilya Dryomov, Gregory Farnum
Cc: ceph-devel@vger.kernel.org, Luis Henriques
Subject: [RFC PATCH v5 0/4] copy_file_range in cephfs kernel client
Date: Tue, 9 Oct 2018 11:48:02 +0100
Message-Id: <20181009104806.6821-1-lhenriques@suse.com>

Hi,

finally, here's a new iteration of my copy_file_range patchset.

I have an extra patch (not included in this RFC) that adds tracepoints to
this syscall.  I wanted to know whether that's something we would like to
start including in the kernel cephfs client; if so, I can prepare a new
revision that includes this extra patch.

And here's the changelog since v4:

- Complete rewrite of the ceph_copy_file_range function.  The copy loop
  now includes only the remote object copies, while do_splice_direct is
  invoked (at most) twice -- before and after the object copy loop.  (A
  rough sketch of the resulting flow follows this list.)
- Check file sizes after every put/get caps cycle
- Added new checks to ensure the files being copied have the same layouts
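To make the new structure a bit easier to picture, here is a rough sketch of
the flow described in the first item above.  This is not the actual patch
code: copy_object_range() is a made-up placeholder for the real 'copy-from'
request, the object_size arithmetic is simplified, and all the caps, locking
and layout checks are left out.

#include <linux/fs.h>
#include <linux/splice.h>

/* hypothetical stand-in for one RADOS 'copy-from' OSD request */
static ssize_t copy_object_range(struct file *src, loff_t src_off,
                                 struct file *dst, loff_t dst_off,
                                 u64 len);

static ssize_t sketch_copy_file_range(struct file *src, loff_t *src_off,
                                      struct file *dst, loff_t *dst_off,
                                      size_t len, u64 object_size)
{
        ssize_t ret, copied = 0;
        u64 rem, head;

        /*
         * 1) Splice the unaligned head with do_splice_direct() so that the
         *    remote copies below start on an object boundary.
         */
        rem = *src_off % object_size;
        head = rem ? object_size - rem : 0;
        if (head > len)
                head = len;
        if (head) {
                ret = do_splice_direct(src, src_off, dst, dst_off, head, 0);
                if (ret <= 0)
                        return ret;
                copied += ret;
                len -= ret;
        }

        /*
         * 2) Copy loop: only full objects are copied remotely, one OSD
         *    'copy-from' request per object.
         */
        while (len >= object_size) {
                ret = copy_object_range(src, *src_off, dst, *dst_off,
                                        object_size);
                if (ret < 0)
                        /* return what was copied so far, if anything */
                        return copied ? copied : ret;
                *src_off += ret;
                *dst_off += ret;
                copied += ret;
                len -= ret;
        }

        /* 3) Splice whatever is left (less than one object). */
        if (len) {
                ret = do_splice_direct(src, src_off, dst, dst_off, len, 0);
                if (ret > 0)
                        copied += ret;
        }

        return copied;
}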
Changes since v3:

- release/get caps before doing the do_splice_direct calls
- if an error occurs after some data has already been copied, return the
  number of bytes already copied instead of the error
- fix oid init/destroy (was broken since a pre-v1 version of the patch)
- always mark Fw as dirty, without FILE_BUFFER
- call file_update_time on the destination file
- added an extra mount option (nocopyfrom) which allows an admin to force
  a fallback to the VFS copy_file_range
- added some more debug messages

Changes since v2:

- File size checks are now done after we have all the required caps

Here are the main changes since v1, after Zheng's review:

1. ceph_osdc_copy_from() now receives source and destination snapids
   instead of ceph_vino structs
2. Also get FILE_RD capabilities in ceph_copy_file_range() for the source
   file, as other clients may have dirty data in their cache
3. Fall back to the VFS copy_file_range default implementation if we're
   copying beyond the source file's EOF

Note that 2. required an extra patch modifying ceph_try_get_caps() so that
it can perform a non-blocking attempt at getting CEPH_CAP_FILE_RD
capabilities.

And here's the original (v1) RFC cover letter, just for reference:

This series is my initial attempt at getting a copy_file_range syscall
implementation in the kernel cephfs client using the 'copy-from' RADOS
operation.  The idea of getting this implemented came from Greg -- or, at
least, he created a feature in the tracker [1].  I just decided to give it
a try as the feature wasn't assigned to anyone ;-)

I've had this patchset sitting on my laptop for a while already, waiting
for me to revisit it and review some of its TODOs... but I finally decided
to send it out as-is instead, to get some early feedback.

The first patch implements the copy-from operation in the libceph module.
Unfortunately, the documentation for this operation is nonexistent and I
had to do a lot of digging to figure out the details (and I probably
missed something!).  For example, initially I was hoping that this
operation could be used to copy more than one object at a time.  Doing an
OSD request per object copy is not ideal, but unfortunately it seems to be
the only way.  Anyway, my expectation is that this new operation will be
useful for other features in the future.

The 2nd patch is where copy_file_range is implemented; it could probably
be optimised, but I didn't bother with that for now.  The important bit is
that we may still need to do some manual copies if the offsets aren't
object aligned or if the length is smaller than the object size.  I'm
using do_splice_direct() for the manual copies as it was the easiest way
to get a PoC running, but maybe there are better ways.

I've done some functional testing on this PoC, and it also passes the
generic xfstests suite, in particular the copy_file_range-specific tests
(430-434).  But I haven't done any benchmarks to measure any performance
changes from using this syscall.

Any feedback is welcome, especially regarding the TODOs in the code.

[1] https://tracker.ceph.com/issues/21944

(A trivial userspace exerciser for copy_file_range(2) is appended after
the diffstat, for reference.)

Luis Henriques (4):
  ceph: add non-blocking parameter to ceph_try_get_caps()
  ceph: support the RADOS copy-from operation
  ceph: support copy_file_range file operation
  ceph: new mount option to disable usage of RADOS 'copy-from' op

 Documentation/filesystems/ceph.txt |   5 +
 fs/ceph/addr.c                     |   2 +-
 fs/ceph/caps.c                     |   7 +-
 fs/ceph/file.c                     | 302 ++++++++++++++++++++++++++++-
 fs/ceph/super.c                    |  13 ++
 fs/ceph/super.h                    |   3 +-
 include/linux/ceph/osd_client.h    |  17 ++
 include/linux/ceph/rados.h         |  19 ++
 net/ceph/osd_client.c              |  72 +++++++
 9 files changed, 434 insertions(+), 6 deletions(-)
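For completeness, here is the kind of minimal userspace exerciser that can
be used to poke at this path by hand, as a counterpart to the xfstests runs
mentioned above.  It is only an illustration, not part of the patchset: the
paths are made up and assume a cephfs mount at /mnt/cephfs, and the
copy_file_range() wrapper needs glibc >= 2.27 (older systems can call
syscall(__NR_copy_file_range, ...) directly instead).

/* Copy /mnt/cephfs/src to /mnt/cephfs/dst with copy_file_range(2). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
        int fd_in, fd_out;
        struct stat st;
        ssize_t ret;
        size_t len;

        fd_in = open("/mnt/cephfs/src", O_RDONLY);
        if (fd_in < 0) {
                perror("open src");
                return EXIT_FAILURE;
        }
        if (fstat(fd_in, &st) < 0) {
                perror("fstat");
                return EXIT_FAILURE;
        }
        len = st.st_size;

        fd_out = open("/mnt/cephfs/dst", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd_out < 0) {
                perror("open dst");
                return EXIT_FAILURE;
        }

        /* NULL offsets: use and update the files' own file offsets. */
        while (len > 0) {
                ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
                if (ret < 0) {
                        perror("copy_file_range");
                        return EXIT_FAILURE;
                }
                if (ret == 0)   /* hit EOF on the source */
                        break;
                len -= ret;
        }

        close(fd_in);
        close(fd_out);
        return EXIT_SUCCESS;
}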