
[linux-4.1,test] 79008: regressions - FAIL

Message ID 1454422203.28781.164.camel@citrix.com (mailing list archive)
State New, archived

Commit Message

Ian Campbell Feb. 2, 2016, 2:10 p.m. UTC
On Wed, 2016-01-27 at 12:05 +0000, Ian Campbell wrote:
> On Wed, 2016-01-27 at 11:18 +0000, Ian Campbell wrote:
> > On Tue, 2016-01-26 at 13:11 +0000, osstest service owner wrote:
> > > flight 79008 linux-4.1 real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/79008/
> > > 
> > > Regressions :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-armhf-armhf-xl-credit2  15 guest-start/debian.repeat fail REGR.
> > > vs. 66399
> > >  test-armhf-armhf-xl-xsm      15 guest-start/debian.repeat fail REGR.
> > > vs. 66399
> > 
> > These were both:
> > 
> > 2016-01-26 01:20:33 Z executing ssh ... root@172.16.147.101 echo guest
> > debian.guest.osstest: ok 
> > Warning: Permanently added '172.16.147.101' (ECDSA) to the list of
> > known hosts.
> > key_verify failed for server_host_key

So I've narrowed this down a bit, but not yet sufficiently to actually
diagnose.

The issue only occurs when the userspace is Debian Jessie. Debian Wheezy
does not, for some reason, expose it. It seems unlikely (although not
impossible) that this is a real Jessie vs Wheezy issue; more likely some
different behaviour in Jessie's sshd just exposes an issue somewhere
else.

When running Jessie userspace the issue only appeared somewhere between
Linux v3.18 and v3.19. I'm currently bisecting that range in case the
commit which exposed the issue gives a hint (I fear it won't though).

The attached ts-fetch-check-file exposes this pretty readily against either
dom0 or domU (with slightly differing symptoms) on affected versions.
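
For anyone wanting to try the same idea without osstest, here is a minimal
shell sketch of the check: checksum the file on the target, stream the same
file back over the transport, checksum it locally, and compare. It is not
the attached script. run_on_target is a hypothetical stand-in for the ssh
transport (here it just runs the command locally), and reading the file via
stdin on the target is my own choice so that neither side prints a file name.

```shell
#!/bin/sh
# Sketch of the fetch-and-compare idea behind ts-fetch-check-file.
#
# ASSUMPTION: run_on_target is a hypothetical helper standing in for the
# ssh transport. Here it runs the command locally so the sketch can be
# tried without a target. For a real target it might look like:
#     run_on_target() { ssh "root@$TARGET" "$1"; }
run_on_target() {
    sh -c "$1"
}

# Returns 0 if the checksums match, 1 if the fetched data differs,
# 2 if one of the commands failed. File names containing single quotes
# are not handled.
fetch_and_check_file() {
    fn=$1

    # Checksum computed on the target itself. Reading from stdin keeps
    # the file name out of the output, so both sides print the same fields.
    expect=$(run_on_target "sum < '$fn'") || return 2

    # Stream the whole file over the transport and checksum it locally.
    got=$(run_on_target "cat '$fn'" | sum) || return 2

    echo "expected $expect"
    echo "got      $got"

    [ "$expect" = "$got" ]
}
```

Calling fetch_and_check_file /bin/bash in a loop mirrors what the attached
test does in a single pass; a large file matters because short ssh
interactions may not move enough data to trip over the corruption.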

I can reproduce on the cubietruck on my desk as well as in the COLO.

I cannot reproduce on the same cubietruck on my desk when running native
4.1; it only happens when running under Xen.

I cannot reproduce on the arndale on my desk. I've not tried in the COLO
since the test results suggest there would be no point: there is no sign
of this class of failure in the COLO on arndale, nor on any x86 box.

Ian.

Comments

Ian Campbell Feb. 5, 2016, 2:51 p.m. UTC | #1
On Tue, 2016-02-02 at 14:10 +0000, Ian Campbell wrote:

> When running Jessie userspace the issue only appeared somewhere between
> Linux v3.18 and v3.19. I'm currently bisecting that range in case the
> commit which exposed the issue gives a hint (I fear it won't though).

Bisecting the dom0 failure led me to:

3567258d281b5b515d5165ed23851d9f84087e7d is the first bad commit
commit 3567258d281b5b515d5165ed23851d9f84087e7d
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Fri Nov 21 11:05:39 2014 +0000

    xen/arm: use hypercall to flush caches in map_page
    
    In xen_dma_map_page, if the page is a local page, call the native
    map_page dma_ops. If the page is foreign, call __xen_dma_map_page that
    issues any required cache maintenane operations via hypercall.
    
    The reason for doing this is that the native dma_ops map_page could
    allocate buffers than need to be freed. If the page is foreign we don't
    call the native unmap_page dma_ops function, resulting in a memory leak.
    
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

However AIUI this commit is supposed to be a NOP for all dom0-initiated
I/O, which is all that should be occurring in a test which only involves
ssh to dom0.

Something to do with cache flushes, DMA and/or barriers does seem like a
plausible candidate for the error though.

Ian.

NB: the last few steps were combined with
    git cherry-pick --no-commit 28603d13997e2ef47f18589cc9a44553aad49c86
since otherwise the NIC driver just crashes on boot.

git bisect start
# bad: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect bad bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d
# bad: [54850e73e86e3bc092680d1bdb84eb322f982ab1] zram: change parameter from vaild_io_request()
git bisect bad 54850e73e86e3bc092680d1bdb84eb322f982ab1
# good: [6b9e2cea428cf7af93a84bcb865e478d8bf1c165] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
git bisect good 6b9e2cea428cf7af93a84bcb865e478d8bf1c165
# good: [b5f185f33d0432cef6ff78765e033dfa8f4de068] Merge tag 'master-2014-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
git bisect good b5f185f33d0432cef6ff78765e033dfa8f4de068
# good: [bae41e45b7400496b9bf0c70c6004419d9987819] Merge tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good bae41e45b7400496b9bf0c70c6004419d9987819
# good: [c0222ac086669a631814bbf857f8c8023452a4d7] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect good c0222ac086669a631814bbf857f8c8023452a4d7
# bad: [a7cb7bb664543e4562ab0e9a072470d2d18c761f] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
git bisect bad a7cb7bb664543e4562ab0e9a072470d2d18c761f
# bad: [9bfccec24e31f4f83445cfe0c1b0a5ef97900628] Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect bad 9bfccec24e31f4f83445cfe0c1b0a5ef97900628
# bad: [4e8790f77f051d4cc745a57b48a73052521e8dfc] Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
git bisect bad 4e8790f77f051d4cc745a57b48a73052521e8dfc
# skip: [b1df4a56bf4a61113e8928f932d346bed6eef553] xen/pciback: Restore configuration space when detaching from a guest.
git bisect skip b1df4a56bf4a61113e8928f932d346bed6eef553
# bad: [9d050966e2eb37a643ac15904b6a8fda7fcfabe9] Merge tag 'stable/for-linus-3.19-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect bad 9d050966e2eb37a643ac15904b6a8fda7fcfabe9
# bad: [9490c6c67e2f41760de8ece4e4f56f75f84ceb9e] swiotlb-xen: call xen_dma_sync_single_for_device when appropriate
git bisect bad 9490c6c67e2f41760de8ece4e4f56f75f84ceb9e
# good: [a0f2dee0cd651efb5fac6a1d35b0a14460ebcdd4] xen: add a dma_addr_t dev_addr argument to xen_dma_map_page
git bisect good a0f2dee0cd651efb5fac6a1d35b0a14460ebcdd4
# bad: [a4dba130891271084344c12537731542ec77cb85] xen/arm/arm64: introduce xen_arch_need_swiotlb
git bisect bad a4dba130891271084344c12537731542ec77cb85
# bad: [3567258d281b5b515d5165ed23851d9f84087e7d] xen/arm: use hypercall to flush caches in map_page
git bisect bad 3567258d281b5b515d5165ed23851d9f84087e7d
# first bad commit: [3567258d281b5b515d5165ed23851d9f84087e7d] xen/arm: use hypercall to flush caches in map_page
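
A bisection like the one logged above could in principle be driven by
git bisect run, which treats exit status 0 as good, 125 as "skip this
commit" and any other status from 1 to 127 as bad. The following is only
a sketch of a per-commit step under that scheme, not what was actually
run here. All four helper names are hypothetical: build_and_boot and
run_fetch_check are site-specific placeholders, and applying the NIC fix
at every step (rather than only the last few) is an assumption that may
not apply cleanly across the whole range.

```shell
#!/bin/sh
# Sketch of a per-commit step for: git bisect run ./bisect-step.sh
#
# ASSUMPTIONS (hypothetical helpers, not part of osstest or this thread):
#   apply_nic_fix   - applies the NIC fix to the working tree, uncommitted
#   cleanup_tree    - discards that uncommitted change afterwards
#   build_and_boot  - builds the kernel and boots it as dom0 under Xen
#   run_fetch_check - runs the fetch-and-checksum test against the target

apply_nic_fix() {
    git cherry-pick --no-commit 28603d13997e2ef47f18589cc9a44553aad49c86
}

cleanup_tree() {
    # Drop the uncommitted fix so bisect can check out the next commit.
    git reset --hard
}

build_and_boot() {
    echo "build_and_boot: site-specific, not implemented" >&2
    return 1
}

run_fetch_check() {
    echo "run_fetch_check: site-specific, not implemented" >&2
    return 1
}

bisect_step() {
    # A commit that cannot be patched, built or booted says nothing about
    # the corruption itself, so ask bisect to skip it (exit status 125).
    if ! apply_nic_fix; then
        cleanup_tree
        return 125
    fi
    if ! build_and_boot; then
        cleanup_tree
        return 125
    fi

    # 0 marks the commit good, 1 marks it bad.
    if run_fetch_check; then
        rc=0
    else
        rc=1
    fi

    cleanup_tree
    return "$rc"
}
```

A real script would end with "bisect_step; exit $?". Mapping build and
boot failures to 125 corresponds to the manual "git bisect skip" in the
log above.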

Patch

From 337f663c13e46f815ce1f13b070b492f8d248b0c Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Tue, 2 Feb 2016 10:54:42 +0000
Subject: [PATCH] ts-fetch-check-file: new ts to fetch a file and check for
 corruption

Compares a checksum computed on the target with one computed after
cat'ting the file over ssh. Picks up on network corruption errors etc
which might be missed with smaller interactions.

Works for guests or hosts.

To support this add a variant of target_cmd_output which returns the
file descriptor instead of the actual data (which could be large),
allowing us to pipe it to the local sum.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 Osstest/TestSupport.pm | 10 ++++++--
 ts-fetch-check-file    | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 2 deletions(-)
 create mode 100755 ts-fetch-check-file

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 2141905..3c287b2 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -51,6 +51,7 @@  BEGIN {
 
                       target_cmd_root target_cmd target_cmd_build
                       target_cmd_output_root target_cmd_output
+                      target_cmd_stdoutfd target_cmd_stdoutfd_root
                       target_cmd_inputfh_root sshuho
                       target_getfile target_getfile_root
                       target_putfile target_putfile_root
@@ -646,9 +647,11 @@  sub target_cmd ($$;$$) { tcmd(undef,undef,'osstest',@_); }
 sub target_cmd_root ($$;$$) { tcmd(undef,undef,'root',@_); }
 
 sub tcmdout {
+    my $wantfd = shift;
     my $stdout= IO::File::new_tmpfile();
     tcmd(undef,$stdout,@_);
     $stdout->seek(0,0) or die "$stdout $!";
+    return $stdout if $wantfd;
     my $r;
     { local ($/) = undef;
       $r= <$stdout>; }
@@ -657,8 +660,11 @@  sub tcmdout {
     return $r;
 }
 
-sub target_cmd_output ($$;$) { tcmdout('osstest',@_); }
-sub target_cmd_output_root ($$;$) { tcmdout('root',@_); }
+sub target_cmd_output ($$;$) { tcmdout(0,'osstest',@_); }
+sub target_cmd_output_root ($$;$) { tcmdout(0,'root',@_); }
+
+sub target_cmd_stdoutfd ($$;$$) { tcmdout(1,'osstest',@_); }
+sub target_cmd_stdoutfd_root ($$;$$) { tcmdout(1,'root',@_); }
 
 sub target_cmd_inputfh_root ($$$;$$) {
     my ($tho,$stdinfh,$tcmd,@rest) = @_;
diff --git a/ts-fetch-check-file b/ts-fetch-check-file
new file mode 100755
index 0000000..bfceb6b
--- /dev/null
+++ b/ts-fetch-check-file
@@ -0,0 +1,68 @@ 
+#!/usr/bin/perl -w
+# This is part of "osstest", an automated testing framework for Xen.
+# Copyright (C) 2016 Citrix Inc.
+# 
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+# 
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+use strict qw(vars);
+use DBI;
+use Osstest;
+use Osstest::TestSupport;
+
+use IO::Pipe;
+
+tsreadconfig();
+
+our ($whhost,$guest) = @ARGV;
+$whhost ||= 'host';
+
+our ($ho,$gho);
+our $fn = "/bin/bash"; # reasonable size, present in most guests and hosts
+
+$ho= selecthost($whhost);
+$gho= selectguest($guest,$ho) if $guest;
+
+sub fetch_and_check_file ($$) {
+    my ($t,$fn) = @_;
+
+    target_check_ip($t);
+
+    target_cmd_root($t, "ls -lH $fn");
+
+    my $expect = target_cmd_output_root($t,"sum $fn");
+    logm($expect);
+
+    my $stdout= target_cmd_stdoutfd_root($t,"cat $fn",5,[qw(-v)]);
+
+    my $pipe= IO::Pipe->new();
+    my $child= fork;  die $! unless defined $child;
+    if (!$child) {
+	$pipe->writer();
+	open STDIN, "<&", $stdout or die "STDIN $!";
+	open STDOUT, ">&", $pipe or die "STDOUT $!";
+	exec("sum") or die "pipe writer $!";
+    }
+
+    $pipe->reader();
+
+    my $got = <$pipe>;
+    chomp($got);
+
+    logm("expected $expect");
+    logm("got      $got");
+
+    die unless $expect eq $got;
+}
+
+fetch_and_check_file($gho ? $gho : $ho, $fn);
-- 
2.6.1