Date: Tue, 20 Nov 2012 11:44:14 -0800
Subject: Hadoop and Ceph client/mds view of modification time
From: Noah Watkins
To: ceph-devel
Cc: Gregory Farnum, Sage Weil

This is a description of the clock synchronization issue we are facing
in Hadoop:

Components of Hadoop use mtime as a versioning mechanism. Here is an
example where Client B tests the expected 'version' of a file created
by Client A:

  Client A: create file, write data into file.
  Client A: expected_mtime <-- lstat(file)
  Client A: broadcast expected_mtime to client B
  ...
  Client B: mtime <-- lstat(file)
  Client B: test expected_mtime == mtime

Since mtime may be set in Ceph by both the client and the MDS, an
inconsistent view of mtime is possible when clocks are not adequately
synchronized.

Here is a test that reproduces the problem. In the following output,
issdm-18 has the MDS, and issdm-22 is a non-Ceph node with its time set
to an hour earlier than the MDS node.

  nwatkins@issdm-22:~$ ssh issdm-18 date && ./test
  Tue Nov 20 11:40:28 PST 2012          // MDS TIME
  local time: Tue Nov 20 10:42:47 2012  // Client TIME
  fstat time: Tue Nov 20 11:40:28 2012  // mtime seen after file creation (MDS time)
  lstat time: Tue Nov 20 10:42:47 2012  // mtime seen after file write (client time)

Here is the code used to produce that output.

/* Headers needed by the calls below; libcephfs.h is assumed to be
 * installed under cephfs/. */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <cephfs/libcephfs.h>

int main(int argc, char **argv)
{
  struct stat st;
  struct ceph_mount_info *cmount;
  struct timeval tv;
  int ret;

  /* setup */
  ret = ceph_create(&cmount, "admin");
  assert(ret == 0);
  ret = ceph_conf_read_file(cmount, "/users/nwatkins/Projects/ceph.conf");
  assert(ret == 0);
  ret = ceph_mount(cmount, "/");
  assert(ret == 0);

  /* print local time for reference */
  gettimeofday(&tv, NULL);
  printf("local time: %s", ctime(&tv.tv_sec));

  /* create a file; buf is zeroed so the later write sends defined bytes */
  char buf[256] = {0};
  snprintf(buf, sizeof(buf), "/somefile.%d", getpid());
  int fd = ceph_open(cmount, buf, O_WRONLY|O_CREAT, 0);
  assert(fd >= 0);

  /* get mtime for this new file */
  memset(&st, 0, sizeof(st));
  ret = ceph_fstat(cmount, fd, &st);
  assert(ret == 0);
  printf("fstat time: %s", ctime(&st.st_mtime));

  /* write some data into the file (negative offset: current position) */
  ret = ceph_write(cmount, fd, buf, sizeof(buf), -1);
  assert(ret == sizeof(buf));

  ceph_close(cmount, fd);

  /* stat by path after the write and close */
  memset(&st, 0, sizeof(st));
  ret = ceph_lstat(cmount, buf, &st);
  assert(ret == 0);
  printf("lstat time: %s", ctime(&st.st_mtime));

  ceph_shutdown(cmount);
  return 0;
}

Note that this output is currently using the short patch from
http://marc.info/?l=ceph-devel&m=133178637520337&w=2 which forces
getattr to always go to the MDS.

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 4a9ae3c..2bb24b7 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -3858,7 +3858,7 @@ int Client::readlink(const char *relpath, char *buf, loff_t size)

 int Client::_getattr(Inode *in, int mask, int uid, int gid)
 {
-  bool yes = in->caps_issued_mask(mask);
+  bool yes = false; //in->caps_issued_mask(mask);

   ldout(cct, 10) << "_getattr mask " << ccap_string(mask) << " issued=" << yes << dendl;
   if (yes)
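
For reference, the mtime-as-version check described at the top of this
message can be sketched with plain POSIX calls. This is only an
illustration under stated assumptions: the helper names
publish_version() and version_matches() are hypothetical (they are not
Hadoop or Ceph APIs), and a local filesystem stands in for CephFS. With
a single clock involved the check is expected to pass; on Ceph, where
the client and the MDS may each stamp mtime from their own clock, the
same comparison could fail in the way the test output above suggests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical "Client A" step: create the file, write the data, then
 * record the mtime that other clients are expected to observe.
 * Returns 0 on success, -1 on any error. */
static int publish_version(const char *path, const void *data, size_t len,
                           time_t *expected_mtime)
{
  struct stat st;

  assert(expected_mtime != NULL);

  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;

  ssize_t n = write(fd, data, len);
  /* close() is evaluated first, so the descriptor is always released */
  if (close(fd) != 0 || n < 0 || (size_t)n != len)
    return -1;

  if (lstat(path, &st) != 0)
    return -1;

  *expected_mtime = st.st_mtime;  /* this value is broadcast to Client B */
  return 0;
}

/* Hypothetical "Client B" step: compare the broadcast mtime with the
 * mtime seen locally. Returns 1 on match, 0 on mismatch, -1 on error. */
static int version_matches(const char *path, time_t expected_mtime)
{
  struct stat st;

  if (lstat(path, &st) != 0)
    return -1;

  return st.st_mtime == expected_mtime ? 1 : 0;
}
```

A caller would run publish_version() on one node, send the returned
mtime to the other node by whatever channel it already uses, and call
version_matches() there. The comparison is at one-second granularity,
the same st_mtime field the test program prints.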