eMMC partition becomes "read only" / boot into initramfs

Discussion in 'UDOO X86' started by tojo2503, Jul 16, 2017.

  1. tojo2503

    tojo2503 New Member

    Joined:
    May 29, 2017
    Messages:
    24
    Likes Received:
    5
    Hi,
    my server works fine but I have a strange problem and don't know where to start debugging. Maybe some of you have an idea...

    - Server runs fine, but at unpredictable intervals (after 1-3 days) the eMMC partition on which Ubuntu is installed becomes read-only
    - After a reboot, the system boots into initramfs. I have to type in "reboot" again.
    - Again it reboots into initramfs, but advises me to manually run fsck on /dev/mmcblk0p1
    - fsck finds some errors that can be repaired
    - If I type "exit", it boots normally
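
    The read-only state described in the list above can be confirmed from the mount table before rebooting. A minimal sketch, assuming the root filesystem is the one affected; mount_is_readonly is a hypothetical helper name, and it reads a /proc/mounts style table from stdin so it can be tried on sample data first:

```shell
# Succeeds (exit 0) if the given mount point is currently mounted read-only.
# $1 = mount point; the mount table (same format as /proc/mounts) comes from stdin.
mount_is_readonly() {
    awk -v mp="$1" '
        # Field 2 is the mount point, field 4 the comma-separated options.
        # If a mount point is listed more than once, the last entry wins.
        $2 == mp { found = 1; ro = ($4 ~ /(^|,)ro(,|$)/) }
        END { exit (found && ro) ? 0 : 1 }
    '
}
```

    Usage on a live system would be something like: mount_is_readonly / < /proc/mounts && echo "root is read-only". Note that an option such as errors=remount-ro is not matched as "ro"; it only tells ext4 to switch to read-only once it hits an error.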

    This sounds like a hardware problem, but I don't know where to start debugging because it's not reproducible. Any advice on how we could narrow it down?
    The system is a Udoo Advanced with 32 GB eMMC running Xubuntu, which boots into Kodi (see this guide: https://forum.kodi.tv/showthread.php?tid=282593).

    I'm willing to provide some logfiles, but syslog shows nothing interesting and I can't narrow down the time it failed...

    Thanks a lot!
     
  2. Markus Laire

    Markus Laire Active Member

    Joined:
    Mar 9, 2017
    Messages:
    225
    Likes Received:
    91
    Do you have a recent version of Xubuntu?

    Not sure if this is related, but with Debian Jessie installed on the eMMC I had a lot of I/O errors, so many that the system eventually remounted the filesystem read-only. I was able to reproduce this easily by doing something that caused a lot of disk activity (e.g. watching a YouTube video in a browser).
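
    For a more controlled way to generate disk activity than a browser, a plain write load can be used. This is only a sketch: write_load is a hypothetical helper name, the directory and size are up to you, and it assumes GNU dd (for bs=1M and conv=fsync), which Debian and Ubuntu ship:

```shell
# Write size_mb megabytes of zeros into a scratch file in dir, force the data
# to the device with fsync, then delete the file. Returns dd's exit status.
write_load() {
    dir=$1
    size_mb=$2
    target="$dir/write_load.$$"
    dd if=/dev/zero of="$target" bs=1M count="$size_mb" conv=fsync 2>/dev/null
    status=$?
    rm -f "$target"
    return "$status"
}
```

    Calling it in a loop against a directory on the eMMC (for example write_load /var/tmp 256) while watching the kernel log should show whether the errors follow disk activity.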

    This was most likely a Jessie bug since Stretch works without problems. I didn't really debug this much as I moved to Stretch quite soon.

    To check if you have the same problem, run "sudo dmesg" when the filesystem becomes read-only and check for any I/O errors - there are two images of what I got in my short thread about this.
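
    Since dmesg output is long, it helps to filter it. A small sketch; filter_storage_errors is a hypothetical helper name, and the patterns are assumptions based on typical mmc and ext4 kernel messages, so adjust them to what your log actually contains:

```shell
# Keep only kernel log lines that look related to the eMMC or to filesystem errors.
# Reads the log from stdin so it works with dmesg output or a saved syslog.
filter_storage_errors() {
    grep -iE 'mmc[0-9]+:|mmcblk[0-9]+|I/O error|EXT4-fs error|Remounting filesystem read-only'
}
```

    Typical use would be: sudo dmesg | filter_storage_errors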
     
    tojo2503 likes this.
  3. tojo2503

    tojo2503 New Member

    Hi,
    it's based on the newest Ubuntu 16.04.2 LTS as a minimal Xubuntu installation with SSH server. It actually looks like a kernel problem to me.
    Here are the last lines of the log (I suppose it becomes read-only afterwards):

    https://hastebin.com/uwulecixuc.xml

    This looks pretty much like the output Markus posted in the other thread.
    This goes too deep into kernel / hardware knowledge for me. Any advice on how to debug or track down this error? Would an updated kernel help? Given the experience in the other thread, I'd say it's a kernel problem and not a faulty eMMC. But then I'm wondering why more users don't run into this.

    The first lines in the log (time-adjustment) also look strange to me...

    Thanks for your support!

    edit: if it helps: Kernel Version is 4.4.0-83-generic
     
    Last edited: Jul 17, 2017
  4. tojo2503

    tojo2503 New Member

    OK, it looks like it's actually a kernel problem.
    I updated the kernel to 4.8.0-58 with the following command:
    Code:
    sudo apt install --install-recommends linux-image-generic-hwe-16.04 xserver-xorg-hwe-16.04
    This seems to resolve the problem: no I/O errors, no interrupt errors, no reboots into initramfs and no need for a manual fsck after 1 day of heavy use.
    This means a relevant change must have been made between 4.4.0-83 and 4.8.0-58. I'm surprised that no one else ran into this problem, because Ubuntu 16.04 LTS ships with kernel 4.4, which does not work properly here, 16.04 is the release most Ubuntu users are likely to run, and I doubt that there are different eMMC types around. This would also explain why it didn't work with Debian Jessie and its old kernel.
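
    To check whether a machine is still on a kernel older than the one that worked here, the version strings can be compared. A minimal sketch, assuming GNU sort with the -V (version sort) option; kernel_older_than is a hypothetical helper name, and the 4.8 threshold comes only from the versions reported in this thread, not from a known fix commit:

```shell
# Succeeds (exit 0) if version $1 sorts strictly before version $2.
kernel_older_than() {
    running=$1
    wanted=$2
    # Equal versions are not "older".
    [ "$running" = "$wanted" ] && return 1
    # Version sort puts the lower version first.
    lowest=$(printf '%s\n%s\n' "$running" "$wanted" | sort -V | head -n 1)
    [ "$lowest" = "$running" ]
}
```

    For example: kernel_older_than "$(uname -r)" 4.8 && echo "still on a pre-4.8 kernel"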

    Well, at least it's solved. :) Thanks a lot for the help @MarkusLaire.
     
    Markus Laire likes this.
  5. Markus Laire

    Markus Laire Active Member

    Debian Stretch uses kernel 4.9, so this also fits with the bug having been fixed by 4.8 (Jessie had 3.16).
     
    LDighera likes this.
  6. Jeff

    Jeff New Member

    Joined:
    Jun 30, 2016
    Messages:
    2
    Likes Received:
    0
    I was having the same problem with the eMMC and also thought that it was hardware related, until I found a solution in this thread.
    I would see random I/O errors and interrupt errors for no apparent reason. The eMMC was not particularly heavily used, I think. Ubuntu runs on the eMMC. Logging makes up the majority of activity on the eMMC, because I use a SATA disk and a USB disk for data storage.
    Ubuntu 16.04.2 Server
    Samba / UPNP-server / NextCloud / BitcoinCore / Deluged with OpenVPN / OpenVPN Access Server / UPS-monitor
    Kernel-update done: 4.4.0-87 to 4.10.0-27
    Only once did I see it boot slower than usual after errors were generated.
    Syslog now shows other errors, but at least the system reboots fine when needed. Here are 2 snippets from that log:
    Code:
    Jul 29 19:20:10 udoox86 kernel: [14726.071462] mmc0: Tuning timeout, falling back to fixed sampling clock
    Jul 29 19:20:20 udoox86 kernel: [14736.139694] mmc0: Timeout waiting for hardware interrupt.
    Jul 29 19:20:20 udoox86 kernel: [14736.147299] sdhci: =========== REGISTER DUMP (mmc0)===========
    Jul 29 19:20:20 udoox86 kernel: [14736.155006] sdhci: Sys addr: 0x00000008 | Version:  0x00001002
    Jul 29 19:20:20 udoox86 kernel: [14736.162812] sdhci: Blk size: 0x00007200 | Blk cnt:  0x00000008
    Jul 29 19:20:20 udoox86 kernel: [14736.170662] sdhci: Argument: 0x015d35e8 | Trn mode: 0x0000002b
    Jul 29 19:20:20 udoox86 kernel: [14736.178466] sdhci: Present:  0x1fff0106 | Host ctl: 0x00000034
    Jul 29 19:20:20 udoox86 kernel: [14736.186317] sdhci: Power:    0x0000000b | Blk gap:  0x00000080
    Jul 29 19:20:20 udoox86 kernel: [14736.194131] sdhci: Wake-up:  0x00000000 | Clock:    0x00000007
    Jul 29 19:20:20 udoox86 kernel: [14736.201971] sdhci: Timeout:  0x00000006 | Int stat: 0x00000000
    Jul 29 19:20:20 udoox86 kernel: [14736.209734] sdhci: Int enab: 0x02ff000b | Sig enab: 0x02ff000b
    Jul 29 19:20:20 udoox86 kernel: [14736.217416] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
    Jul 29 19:20:20 udoox86 kernel: [14736.225104] sdhci: Caps:     0x446cc8b2 | Caps_1:   0x00000807
    Jul 29 19:20:20 udoox86 kernel: [14736.232667] sdhci: Cmd:      0x0000193a | Max curr: 0x00000000
    Jul 29 19:20:20 udoox86 kernel: [14736.240039] sdhci: Host ctl2: 0x0000000b
    Jul 29 19:20:20 udoox86 kernel: [14736.247336] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x35000200
    Jul 29 19:20:20 udoox86 kernel: [14736.254658] sdhci: ===========================================
    Jul 29 19:20:20 udoox86 kernel: [14736.268786] mmcblk0: error -84 sending status command, retrying
    Jul 29 19:20:20 udoox86 kernel: [14736.275646] mmcblk0: response CRC error sending r/w cmd command, card status 0x900
    
    Jul 29 22:43:54 udoox86 kernel: [26950.712158] mmc0: Tuning timeout, falling back to fixed sampling clock
    Jul 29 22:44:04 udoox86 kernel: [26960.796581] mmc0: Timeout waiting for hardware interrupt.
    Jul 29 22:44:04 udoox86 kernel: [26960.798723] sdhci: =========== REGISTER DUMP (mmc0)===========
    Jul 29 22:44:04 udoox86 kernel: [26960.800879] sdhci: Sys addr: 0x00000008 | Version:  0x00001002
    Jul 29 22:44:04 udoox86 kernel: [26960.803093] sdhci: Blk size: 0x00007200 | Blk cnt:  0x00000008
    Jul 29 22:44:04 udoox86 kernel: [26960.805312] sdhci: Argument: 0x015d37b0 | Trn mode: 0x0000002b
    Jul 29 22:44:04 udoox86 kernel: [26960.807528] sdhci: Present:  0x1fff0001 | Host ctl: 0x00000034
    Jul 29 22:44:04 udoox86 kernel: [26960.809727] sdhci: Power:    0x0000000b | Blk gap:  0x00000080
    Jul 29 22:44:04 udoox86 kernel: [26960.811952] sdhci: Wake-up:  0x00000000 | Clock:    0x00000007
    Jul 29 22:44:04 udoox86 kernel: [26960.814149] sdhci: Timeout:  0x00000006 | Int stat: 0x00000000
    Jul 29 22:44:04 udoox86 kernel: [26960.816324] sdhci: Int enab: 0x02ff000b | Sig enab: 0x02ff000b
    Jul 29 22:44:04 udoox86 kernel: [26960.818500] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
    Jul 29 22:44:04 udoox86 kernel: [26960.820603] sdhci: Caps:     0x446cc8b2 | Caps_1:   0x00000807
    Jul 29 22:44:04 udoox86 kernel: [26960.822647] sdhci: Cmd:      0x0000193a | Max curr: 0x00000000
    Jul 29 22:44:04 udoox86 kernel: [26960.824674] sdhci: Host ctl2: 0x0000000b
    Jul 29 22:44:04 udoox86 kernel: [26960.826676] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x35000200
    Jul 29 22:44:04 udoox86 kernel: [26960.828708] sdhci: ===========================================
    Jul 29 22:44:04 udoox86 kernel: [26960.835077] mmcblk0: error -110 sending stop command, original cmd response 0x0, card status 0x400900
    Jul 29 22:44:04 udoox86 kernel: [26960.835130] mmcblk0: error -110 transferring data, sector 22886320, nr 8, cmd response 0x0, card status 0x0
    
    Does this mean that the eMMC is indeed defective?
    Or is there another software solution?
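
    To see which mmcblk errors dominate in a log like the one above, the error codes can be tallied. A rough sketch, assuming the message format shown in the snippets; count_mmc_errors is a hypothetical helper name. In Linux errno terms, -84 is EILSEQ (which the MMC layer reports for CRC errors) and -110 is ETIMEDOUT; neither code by itself shows whether the eMMC is defective.

```shell
# Read syslog or dmesg lines from stdin and print "<errno> <count>" for every
# "mmcblkN: error <errno> ..." message found.
count_mmc_errors() {
    sed -n 's/.*mmcblk[0-9]*: error \(-[0-9][0-9]*\) .*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

    Typical use would be: grep mmcblk /var/log/syslog | count_mmc_errors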
     
  7. tojo2503

    tojo2503 New Member

    Hi,

    Good to hear that the solution worked for you. I'm seeing the same messages you posted in my syslog. This doesn't look good, but I'm not experiencing major issues (like the read-only problem). It does not look like a defective eMMC to me, but I'm not that deep into hardware-related stuff...

    Best regards
     
