From: "Jim Schutt"
To: ceph-devel@vger.kernel.org
Cc: "Jim Schutt"
Subject: [PATCH] PG: Do not discard op data too early
Date: Thu, 27 Sep 2012 15:56:15 -0600
Message-ID: <1348782975-7082-1-git-send-email-jaschut@sandia.gov>
X-Mailer: git-send-email 1.7.8.2
X-Patchwork-Submitter: Jim Schutt
X-Patchwork-Id: 1515871

Under a sustained cephfs write load where the offered load is higher
than the storage cluster write throughput, a backlog of replication ops
that arrive via the cluster messenger builds up.  The client message
policy throttler, which should be limiting the total write workload
accepted by the storage cluster, is unable to prevent it, for any
value of osd_client_message_size_cap, under such an overload condition.

The root cause is that op data is released too early, in op_applied().
If instead the op data is released at op deletion, then the limit
imposed by the client policy throttler applies over the entire
lifetime of the op, including commits of replication ops.

That makes the policy throttler an effective means for an OSD to
protect itself from a sustained high offered load, because it can
effectively limit the total, cluster-wide resources needed to process
in-progress write ops.

Signed-off-by: Jim Schutt
---
 src/osd/ReplicatedPG.cc |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index a64abda..80bec2a 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -3490,10 +3490,6 @@ void ReplicatedPG::op_applied(RepGather *repop)
   dout(10) << "op_applied " << *repop << dendl;
   if (repop->ctx->op)
     repop->ctx->op->mark_event("op_applied");
-
-  // discard my reference to the buffer
-  if (repop->ctx->op)
-    repop->ctx->op->request->clear_data();
 
   repop->applying = false;
   repop->applied = true;