Message ID | 53AABCF5.4050403@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On 6/25/2014 5:13 AM, Tushar Behera wrote:
> On 06/25/2014 03:59 AM, Laura Abbott wrote:
>> On 6/24/2014 10:47 AM, Laura Abbott wrote:
>>> On 6/23/2014 11:32 AM, Kevin Hilman wrote:
>>>> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>>>>
>>>>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>>>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>>>>> Sachin,
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Tree/Branch: mainline
>>>>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>>>>> ===========================================
>>>>>>>>>>> exynos5420-arndale-octa: FAIL: arm-exynos_defconfig
>>>>>>>>>>> ste-snowball: FAIL: arm-u8500_defconfig
>>>>>>>>>>
>>>>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>>>>> but still not failing every time.
>>>>>>>>>>
>>>>>>>>>> Kevin
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Kevin,
>>>>>>>>>
>>>>>>>>> Same here.
>>>>>>>>>
>>>>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>>>>> next iteration.
>>>>>>>>
>>>>>>>> I don't ever use the soft-reset, I only toggle the wall power. I
>>>>>>>> don't ever actually remove the power cord though, I'm using a
>>>>>>>> USB-controlled relay to toggle the wall power.
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>
>>>>>>> Laura,
>>>>>>>
>>>>>>> We are getting following kernel panic [1] (not always, but quite
>>>>>>> regularly) while booting Arndale-Octa (based on Samsung's Exynos5420)
>>>>>>> board with upstream kernel. I haven't observed this issue with other
>>>>>>> boards yet.
>>>>>>>
>>>>>>> This issue is observed when I am booting with uImage + dtb (within
>>>>>>> roughly ~10 iterations).
>>>>>>>
>>>>>>
>>>>>> Some more information:
>>>>>>
>>>>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>>>>
>>>>>> In case of boot failures, I am getting a higher value for vm_total_pages
>>>>>> (684424 in [3]). In case of successful boot on my board, it is always
>>>>>> 521232 [2] on my board.
>>>>
>>>> I can confirm that reverting the "Get rid of meminfo" patch gets the
>>>> Octa board booting reliably again for me also.
>>>>
>>>> In case it helps, some boot logs for failures from the last couple
>>>> linux-next build/boot cycles can be seen here:
>>>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>>
>>>
>>> Sorry, I missed this yesterday. I'm going to take a look.
>>>
>>
>> Were all of
>>
>> http://pastebin.com/1iLaizuL
>> http://pastebin.com/5tdDt4GL
>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>
>> collected on the same type of board with the same amount of DRAM? I'm seeing a
>> different amount of total pages across all those logs. All the logs have the
>> same lowmem limit so it seems like the upper bound was being calculated
>> incorrectly for passing to free_area_init_node. Nothing is immediately jumping
>> out at me so can you boot up with a small debug patch?
>>
>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> index 659c75d..88eac1f 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>>         unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
>>         struct memblock_region *reg;
>>
>> +       pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
>> +       __memblock_dump_all();
>>         /*
>>          * initialise the zones.
>>          */
>>
>> It would be helpful to do this across a few bootups to see if the values are
>> actually consistent. I'll keep looking in the meantime.
>>
>> Thanks,
>> Laura
>>
>
> Thanks Laura for the pointer. In case of error, I am getting some random
> memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
>
> The issue seems to be from u-boot, where it is not updating the memory
> subnode properly. I have got a fix for the u-boot, which I am testing
> right now. I will update tomorrow after I do some more test.
>

I'm concerned my change can stay as is if this is exposing an issue
in u-boot. Asking people to change bootloaders rarely ends well. Can
you elaborate on what u-boot is doing that would be exposing this
issue?

Thanks,
Laura

On 06/26/2014 03:27 AM, Laura Abbott wrote:
> On 6/25/2014 5:13 AM, Tushar Behera wrote:

[...]

>> Thanks Laura for the pointer. In case of error, I am getting some random
>> memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
>>
>> The issue seems to be from u-boot, where it is not updating the memory
>> subnode properly. I have got a fix for the u-boot, which I am testing
>> right now. I will update tomorrow after I do some more test.
>>
>
> I'm concerned my change can stay as is if this is exposing an issue
> in u-boot. Asking people to change bootloaders rarely ends well. Can
> you elaborate on what u-boot is doing that would be exposing this
> issue?
>
> Thanks,
> Laura
>

Laura,

Here is my assessment of the current situation.

*Bug in the u-boot*
Current u-boot for Arndale-octa board has defined NR_BANKS as 12 and the
core uses a global structure (gd->bd) to maintain the start and size of
individual banks. Depending on the revision of SoC used on the board,
the board file [1] updates the start/size for either 8 or 12 banks. In
case of current revision of Arndale-Octa boards, the board file always
updates start/size for 8 banks, leaving the start/size data for
remaining 4 banks uninitialized.

But the u-boot core[2] updates the value of all the 12 banks, thus
potentially updating invalid data for last 4 banks.

The issue can be fixed by resetting the start/size for unused memory
banks to 0/0.[3]

*Before migration to memblock*
The path for adding DRAM banks was done through [4]. For Exynos systems,
NR_BANKS was defined as 8. The initial check for rejecting any banks
beyond NR_BANKS was good enough for fixing this issue. The bootlog[5]
(with some debug messages) shows the invalid data, both in u-boot and
kernel. Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.

*After migration to memblock*
Now that the memory banks are added through [6], all the memory banks
are getting updated unconditionally resulting in the panic.

IMO, the bug is in u-boot and we should fix that.

[1] https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/board/samsung/smdk5420/smdk5420.c#L158
[2] https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/arch/arm/lib/bootm.c#L80
[3] https://github.com/tusharbehera/u-boot/commit/9be794e886603a80f2c8686a75187ae67ac2158d
[4] https://github.com/tusharbehera/linux/blob/v3.15-rc1/arch/arm/kernel/setup.c#L629
[5] http://pastebin.com/vLP2oG1mP
[6] https://github.com/tusharbehera/linux/blob/v3.16-rc1/drivers/of/fdt.c#L878

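The u-boot failure mode described in the message above can be modelled
in a few lines of C. This is only a sketch under stated assumptions, not
the u-boot code: struct dram_bank, poison_banks(), board_fill_banks()
and count_reported_banks() are hypothetical names, the addresses and
sizes are illustrative, and it assumes that the consumer of the table
skips entries whose size is 0 (which is what would make resetting the
unused banks to 0/0 an effective fix).

```c
#include <stddef.h>
#include <string.h>

#define NR_DRAM_BANKS 12	/* size of the bank table, as in the report */
#define BANKS_IN_USE   8	/* banks the board file fills in on this board */

/* Hypothetical stand-in for one entry of the bootloader's bank table. */
struct dram_bank {
	unsigned long start;
	unsigned long size;
};

/* Simulate stale memory: fill the whole table with a non-zero pattern. */
void poison_banks(struct dram_bank *banks)
{
	memset(banks, 0xA5, NR_DRAM_BANKS * sizeof(*banks));
}

/*
 * Hypothetical board setup. Only the first BANKS_IN_USE entries get real
 * values (illustrative addresses). With clear_unused == 0 the remaining
 * entries keep whatever was in memory; with clear_unused != 0 they are
 * reset to 0/0.
 */
void board_fill_banks(struct dram_bank *banks, int clear_unused)
{
	size_t i;

	for (i = 0; i < BANKS_IN_USE; i++) {
		banks[i].start = 0x20000000UL + i * 0x10000000UL;
		banks[i].size = 0x10000000UL;
	}
	if (!clear_unused)
		return;
	for (i = BANKS_IN_USE; i < NR_DRAM_BANKS; i++) {
		banks[i].start = 0;
		banks[i].size = 0;
	}
}

/* Walk all NR_DRAM_BANKS entries and count the ones with a non-zero size. */
size_t count_reported_banks(const struct dram_bank *banks)
{
	size_t i, n = 0;

	for (i = 0; i < NR_DRAM_BANKS; i++)
		if (banks[i].size != 0)
			n++;
	return n;
}
```

Calling poison_banks() and then board_fill_banks(banks, 0) leaves twelve
non-empty entries, four of them junk; with board_fill_banks(banks, 1)
only the eight real banks remain visible to the consumer.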
Hi Tushar,

> Here is my assessment of the current situation.

Thanks for digging into this and the detailed diagnosis.

> *Bug in the u-boot*
> Current u-boot for Arndale-octa board has defined NR_BANKS as 12 and the
> core uses a global structure (gd->bd) to maintain the start and size of
> individual banks. Depending on the revision of SoC used on the board,
> the board file [1] updates the start/size for either 8 or 12 banks. In
> case of current revision of Arndale-Octa boards, the board file always
> updates start/size for 8 banks, leaving the start/size data for
> remaining 4 banks uninitialized.
>
> But the u-boot core[2] updates the value of all the 12 banks, thus
> potentially updating invalid data for last 4 banks.
>
> The issue can be fixed by resetting the start/size for unused memory
> banks to 0/0.[3]
>
> *Before migration to memblock*
> The path for adding DRAM banks was done through [4]. For Exynos systems,
> NR_BANKS was defined as 8. The initial check for rejecting any banks
> beyond NR_BANKS was good enough for fixing this issue. The bootlog[5]
> (with some debug messages) shows the invalid data, both in u-boot and
> kernel. Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.
>
> *After migration to memblock*
> Now that the memory banks are added through [6], all the memory banks
> are getting updated unconditionally resulting in the panic.
>
> IMO, the bug is in u-boot and we should fix that.

I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
u-boot and haven't seen the boot failure yet after several boots with
next-20140625.

That being said, since it's not always feasible/practical to update
u-boot, and when it comes down to it, this is still a kernel
regression, we should also fix the kernel to sanity check the values
coming from u-boot, like it was doing before.

Could you (or Laura) come up with a way to recreate the sanity check
that was detecting this problem before and ignoring those banks?

Thanks,

Kevin

On Thu, Jun 26, 2014 at 07:59:19AM -0700, Kevin Hilman wrote:
> I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
> u-boot and haven't seen the boot failure yet after several boots with
> next-20140625.
>
> That being said, since it's not always feasible/practical to update
> u-boot, and when it comes down to it, this is still a kernel
> regression, we should also fix the kernel to sanity check the values
> coming from u-boot, like it was doing before.

It wasn't sanity checking the values (there is some sanity checking,
but the sanity checking doesn't catch this). What caught it was that
the kernel was configured to only look at the first 8 of the 12 meminfo
entries with ATAGs. Since we no longer have that limit, all meminfo
entries are now looked at (since the kernel doesn't need the limit.)

We could add back a soft-limit on the number of meminfo entries, but
this has to be platform specific. Another entry to go into the
mach_info structures?

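The soft limit discussed in the message above could look roughly like
the following. This is a standalone sketch, not kernel code:
struct mem_range, accept_banks() and the max_banks parameter are
hypothetical, and where a per-platform limit would actually be stored
is exactly the open question raised in the thread. Note that a limit of
this kind validates nothing; it only helps because the junk entries
happen to come after the valid ones.

```c
#include <stddef.h>
#include <stdint.h>

/* One (base, size) pair as handed over by the bootloader. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical helper, not a kernel API: copy ranges from in[] to out[],
 * skipping empty ones and accepting at most max_banks of them.
 * max_banks == 0 means "no limit". out[] must have room for n_in entries.
 * Returns the number of ranges accepted; *ignored is set to the number of
 * non-empty ranges dropped because of the limit.
 */
size_t accept_banks(const struct mem_range *in, size_t n_in,
		    struct mem_range *out, size_t max_banks,
		    size_t *ignored)
{
	size_t accepted = 0;
	size_t i;

	*ignored = 0;
	for (i = 0; i < n_in; i++) {
		if (in[i].size == 0)
			continue;	/* empty bank, nothing to add */
		if (max_banks != 0 && accepted >= max_banks) {
			(*ignored)++;	/* beyond the platform limit */
			continue;
		}
		out[accepted++] = in[i];
	}
	return accepted;
}
```

With twelve non-empty input entries and max_banks set to 8, the first
eight are accepted and four are counted as ignored, which mirrors the
old behaviour described above; with max_banks set to 0 all twelve are
accepted, junk included.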
Hi Kevin and Tushar,

On 26.06.2014 16:59, Kevin Hilman wrote:
>> IMO, the bug is in u-boot and we should fix that.
>
> I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
> u-boot and haven't seen the boot failure yet after several boots with
> next-20140625.

Could you clarify your test setup: Are you using the original InSignal
SPL [1] with just your own u-boot.bin? Or do you have access to some
newer Samsung-signed SPL?

> That being said, since it's not always feasible/practical to update
> u-boot, and when it comes down to it, this is still a kernel
> regression, we should also fix the kernel to sanity check the values
> coming from u-boot, like it was doing before.

Sounds good.

Apart from this memory issue here, I noticed that CPUs don't appear to
be in HYP mode for virtualization, which had required a signed SPL
update for the ODROID-XU [2]. And to me it looks as if there's no
Arndale Octa support in upstream U-Boot [3], no real maintenance on the
InSignal fork [4] and a policy of not cooperating with others [5].

Thanks,
Andreas

[1] http://forum.insignal.co.kr/viewtopic.php?f=6&t=3199
[2] http://forum.odroid.com/viewtopic.php?f=64&t=2778&start=40#p32581
[3] http://git.denx.de/?p=u-boot.git;a=blob;f=boards.cfg;h=947f2bc5ba2794c94b3b2cea04664f005e025f9f;hb=HEAD#l286
[4] http://git.insignal.co.kr/insignal/arndale_octa-jb_mr1.1/u-boot/
[5] http://forum.insignal.co.kr/viewtopic.php?f=40&t=3613

On 06/26/2014 10:34 PM, Andreas Färber wrote:
> Hi Kevin and Tushar,
>
> On 26.06.2014 16:59, Kevin Hilman wrote:
>>> IMO, the bug is in u-boot and we should fix that.
>>
>> I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
>> u-boot and haven't seen the boot failure yet after several boots with
>> next-20140625.
>
> Could you clarify your test setup: Are you using the original InSignal
> SPL [1] with just your own u-boot.bin? Or do you have access to some
> newer Samsung-signed SPL?
>

The u-boot changes for Arndale-Octa were done as part of an activity
within Linaro. Insignal had signed the SPL binary for us. You can
extract the signed SPL binary from the following hwpack[6] (tar xfz and
then within u_boot folder[7]). The source code for this u-boot can be
found here.[8]

Just in case, commands to flash u-boot binaries are listed here.[9]

>> That being said, since it's not always feasible/practical to update
>> u-boot, and when it comes down to it, this is still a kernel
>> regression, we should also fix the kernel to sanity check the values
>> coming from u-boot, like it was doing before.
>
> Sounds good.
>
> Apart from this memory issue here, I noticed that CPUs don't appear to
> be in HYP mode for virtualization, which had required a signed SPL
> update for the ODROID-XU [2]. And to me it looks as if there's no
> Arndale Octa support in upstream U-Boot [3], no real maintenance on the
> InSignal fork [4] and a policy of not cooperating with others [5].
>

Adding Arndale-Octa support to upstream U-Boot was on a TODO list, but
that didn't materialize because of some other reasons.

> Thanks,
> Andreas
>
> [1] http://forum.insignal.co.kr/viewtopic.php?f=6&t=3199
> [2] http://forum.odroid.com/viewtopic.php?f=64&t=2778&start=40#p32581
> [3]
> http://git.denx.de/?p=u-boot.git;a=blob;f=boards.cfg;h=947f2bc5ba2794c94b3b2cea04664f005e025f9f;hb=HEAD#l286
> [4] http://git.insignal.co.kr/insignal/arndale_octa-jb_mr1.1/u-boot/
> [5] http://forum.insignal.co.kr/viewtopic.php?f=40&t=3613
>

[6] http://snapshots.linaro.org/kernel-hwpack/linux-linaro-tracking-ll-arndale-octa/442/hwpack_linaro-arndale-octa_20140626-442_armhf_supported.tar.gz
[7] <path_to_extracted_folder>/u_boot/usr/lib/u-boot/arndale_octa
[8] git.linaro.org/landing-teams/working/samsung/u-boot.git/shortlog/refs/heads/tracking-arndale_octa
[9] http://pastebin.com/pfGF2giq

Thanks,

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c4cddf0..bca82b3 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -817,7 +817,7 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
 
 	endp = reg + (l / sizeof(__be32));
 
-	pr_debug("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
+	pr_err("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
 	    uname, l, reg[0], reg[1], reg[2], reg[3]);