From patchwork Fri Feb 28 06:27:12 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thorsten Glaser
X-Patchwork-Id: 13995776
Date: Fri, 28 Feb 2025 07:27:12 +0100 (CET)
From: Thorsten Glaser
To: git@vger.kernel.org
Subject: gitweb encoding issues (partial patch)
Message-ID: <8e89b5d0-c913-0dcf-7c3a-62de5af02282@debian.org>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

Hi again,

I also noticed a mojibake issue: a git diff that contains UTF-8 is
double-encoded (converted to UTF-8 as if it were latin1 or something,
even though it was already UTF-8), and this seems to be independent of
the locale.

I *think* using the to_utf8 sub on the content is the right fix, as it
seems to check whether the content is UTF-8, pass it through if it is,
and encode it to UTF-8 (which the HTTP headers say is sent) if it’s not.

The patch below fixes this for commitdiff_plain and patch for me.

This probably needs fixing in more places. While scrolling, I saw
blobdiff; I have not identified all the places that need it and would
appreciate it if a maintainer on your side could do so and fix them.

Thanks in advance,
//mirabilos

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index b5490dfecf..434b1c01cd 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -8063,12 +8063,12 @@ sub git_commitdiff {
 
 	} elsif ($format eq 'plain') {
 		local $/ = undef;
-		print <$fd>;
+		print to_utf8(<$fd>);
 		close $fd
 			or print "Reading git-diff-tree failed\n";
 	} elsif ($format eq 'patch') {
 		local $/ = undef;
-		print <$fd>;
+		print to_utf8(<$fd>);
 		close $fd
 			or print "Reading git-format-patch failed\n";
 	}
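
PS: The double encoding described above can be reproduced outside
gitweb. The following is only a minimal sketch under stated assumptions:
decode_if_utf8 is a hypothetical stand-in for the "check, pass through,
else convert" behaviour described above, not gitweb's actual to_utf8,
and latin1 is an assumed fallback encoding. It compares the bytes
produced when raw UTF-8 input is encoded a second time with the bytes
produced when the input is decoded first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (an assumption for illustration, not gitweb code):
# return a character string, decoding the input only if it needs it.
sub decode_if_utf8 {
	my ($bytes) = @_;
	return undef unless defined $bytes;
	# Already a character string, or well-formed UTF-8 decoded in place.
	return $bytes if utf8::is_utf8($bytes) || utf8::decode($bytes);
	# Not valid UTF-8: assume a legacy single-byte encoding (latin1 here).
	return decode('iso-8859-1', $bytes);
}

# Raw bytes as they would arrive from a pipe: "ä" encoded as UTF-8.
my $raw = "\xC3\xA4";

# Encoding the undecoded bytes treats each byte as a latin1 character
# and encodes it again; this is the mojibake case.
my $double = encode('UTF-8', $raw);

# Decoding first and encoding once on output keeps the original bytes.
my $fixed = encode('UTF-8', decode_if_utf8($raw));

printf "raw:    %s\n", unpack('H*', $raw);
printf "double: %s\n", unpack('H*', $double);
printf "fixed:  %s\n", unpack('H*', $fixed);
```

The "double" line should show four bytes (c383c3a4) where the input had
two (c3a4), and the "fixed" line should match the input again.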