Hi,

My system sometimes crashes suddenly and reboots itself. It’s random: browsing the web, idling, checking mail. I couldn’t find the trigger. This is the only log I could find about the crash:

mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: baa0000000030150
microcode: CPU23: patch_level=0x0a201025
fbcon: Taking over console
mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000002 IPID 500b000000000
mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1689019332 SOCKET 0 APIC 2 microcode a201025

EDIT: my thermals are fine btw, 40 °C at idle and 70 °C max under heavy load.

  • mvirts@lemmy.world
    1 year ago

    Woohoo, an MCE. If it’s always the same core, you could disable it (as root) with something like ‘echo 0 > /sys/devices/system/cpu/cpu3/online’, with cpu3 swapped for whichever core keeps failing.

    This would have to be run on every boot; there may be kernel options to do the same thing.
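
    A rough sketch of both steps, for what it’s worth: tally the MCE lines per logical CPU from the kernel log, then take one CPU offline through sysfs. The function names and the SYSFS_CPU_ROOT variable are made up for illustration (the variable is only there so the write can be pointed at a scratch directory), and the pattern assumes log lines shaped like the one in the post.

```shell
# Hypothetical helpers; the names and SYSFS_CPU_ROOT are assumptions for this sketch.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# Read kernel log text on stdin and print "cpuN: count" for every logical CPU
# that appears in a "mce: [Hardware Error]: CPU N: Machine Check" line.
count_mce_by_cpu() {
    sed -n 's/.*mce: \[Hardware Error\]: CPU \([0-9][0-9]*\): Machine Check.*/\1/p' |
        sort -n | uniq -c | awk '{ print "cpu" $2 ": " $1 }'
}

# Take logical CPU $1 offline by writing 0 to its "online" file.
# Needs root on a real system; returns 1 if the file is missing or not
# writable (cpu0 often has no "online" file on x86).
offline_cpu() {
    online_file="$SYSFS_CPU_ROOT/cpu$1/online"
    if [ ! -w "$online_file" ]; then
        echo "cannot write $online_file" >&2
        return 1
    fi
    echo 0 > "$online_file"
}
```

    Usage would be something like ‘journalctl -k | count_mce_by_cpu’ and then, as root, ‘offline_cpu 1’ if cpu1 is the one that keeps showing up. On a chip with SMT the sibling thread of the same physical core is listed in /sys/devices/system/cpu/cpu1/topology/thread_siblings_list, so it probably makes sense to offline that one as well.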

      • mvirts@lemmy.world
        1 year ago

        Lol those cores are totally there for redundancy… Right? :P

        I have an old Itanium server that ‘boots’ with like 3/8 working cores… Unfortunately the hardware has some other unknown issues that panic Linux shortly after loading. Somehow the EFI system seems to be stable…

    • lnxtx@feddit.nl
      1 year ago

      TIL

      I can save like 20 W per real core. Nice tip for a home server.