Desktop freezes with an amdgpu ERROR

When I run applications such as VLC or Slack, the desktop freezes and nothing short of a reset brings it back.
The mouse and keyboard still work, but the desktop does not respond.

Environment

  • Ubuntu 22.04 LTS
  • Linux 5.19.0-35-generic
  • AMD Ryzen 5 5600G with Radeon Graphics
  • Displays: Full HD display + 4K display

syslog

The following errors were logged while running VLC.

After the VLC-related errors, kernel errors like the following were recorded.

Mar 13 08:29:20 nanbuwks-B550M-S2H kernel: [226780.597406] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=517415, emitted seq=517417
Mar 13 08:29:20 nanbuwks-B550M-S2H kernel: [226780.597555] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vlc pid 479471 thread vlc:cs0 pid 479492
Mar 13 08:29:20 nanbuwks-B550M-S2H kernel: [226780.597675] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!

I was watching a recorded TV file, and the freeze seems to happen when VLC tries to display a corrupted frame caused by a signal reception error.
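The timeout messages in the VLC log above can be picked apart mechanically. The following is a minimal sketch, not part of the original troubleshooting: the function name `summarize_timeouts` and the regular expressions are my own, and the sample lines are the kernel messages above with the syslog prefix trimmed. It reports which ring timed out, how many emitted fences were still unsignaled, and which process owned the job. The ring name `vcn_dec` suggests the video decoder, which would be consistent with the broken-frame observation.

```python
import re

# Matches the "ring ... timeout" message from amdgpu_job_timedout.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches the follow-up "Process information" message.
PROCESS_RE = re.compile(
    r"Process information: process (?P<name>\S+) pid (?P<pid>\d+)"
)


def summarize_timeouts(lines):
    """Return one dict per ring timeout, with the owning process if logged."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            events.append({
                "ring": m.group("ring"),
                # Fences emitted to the ring but not yet signaled as complete.
                "unsignaled": int(m.group("emitted")) - int(m.group("signaled")),
                "process": None,
            })
            continue
        m = PROCESS_RE.search(line)
        if m and events and events[-1]["process"] is None:
            events[-1]["process"] = (m.group("name"), int(m.group("pid")))
    return events


# Kernel messages from the log above, syslog prefix trimmed.
sample = [
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=517415, emitted seq=517417",
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vlc pid 479471 thread vlc:cs0 pid 479492",
    "amdgpu 0000:05:00.0: amdgpu: GPU reset begin!",
]
events = summarize_timeouts(sample)
print(events)
```

To run it against a real log, the lines of `/var/log/syslog` could be passed in instead of `sample`.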

However, the problem is not limited to that case: the same symptom also appears when using the Slack app.

The following errors were logged while running Slack.

Mar 16 12:58:44 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaWindowGroup>:0x55efdd174370] is on because it needs an allocation.
Mar 16 12:58:44 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x55efdfa2bf00] is on because it needs an allocation.
Mar 16 12:58:44 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x55efdfb78350] is on because it needs an allocation.
Mar 16 12:58:52 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaWindowGroup>:0x55efdd174370] is on because it needs an allocation.
Mar 16 12:58:52 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x55efdfa2a760] is on because it needs an allocation.
Mar 16 12:58:52 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x55efdfb786d0] is on because it needs an allocation.
Mar 16 12:58:52 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x55efe07f1ed0] is on because it needs an allocation.
Mar 16 12:58:52 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x55efdfb79150] is on because it needs an allocation.
Mar 16 12:59:57 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaWindowGroup>:0x55efdd174370] is on because it needs an allocation.
Mar 16 12:59:57 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x55efdfa2a760] is on because it needs an allocation.
Mar 16 12:59:57 nanbuwks-B550M-S2H gnome-shell[1440]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x55efdfb786d0] is on because it needs an allocation.
Mar 16 13:00:01 nanbuwks-B550M-S2H zeitgeist-fts[1838]: Unable to get info on application://zoom.desktop
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290347] amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290354] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107228000 from IH client 0x12 (VMC)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290358] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140051
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290360] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290361] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290362] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290363] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290364] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290364] amdgpu 0000:05:00.0: amdgpu:          RW: 0x1
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290365] amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290367] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107229000 from IH client 0x12 (VMC)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290370] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290371] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290372] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290373] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290373] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290374] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290374] amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290597] amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290600] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107228000 from IH client 0x12 (VMC)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290602] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140051
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290603] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290604] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290605] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290605] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290606] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290607] amdgpu 0000:05:00.0: amdgpu:          RW: 0x1
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290607] amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290609] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107229000 from IH client 0x12 (VMC)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290611] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290611] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290612] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290613] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290613] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290614] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290614] amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290855] amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290860] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107228000 from IH client 0x12 (VMC)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290864] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140051
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290865] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290867] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290868] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290870] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290871] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290872] amdgpu 0000:05:00.0: amdgpu:          RW: 0x1
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290874] amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290877] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107229000 from IH client 0x12 (VMC)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290880] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290882] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290884] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290885] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290886] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Mar 16 13:00:09 nanbuwks-B550M-S2H kernel: [131134.290887] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
・
・
・
Mar 16 13:00:14 nanbuwks-B550M-S2H kernel: [131139.294917] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Mar 16 13:00:14 nanbuwks-B550M-S2H kernel: [131139.294918] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 16 13:00:14 nanbuwks-B550M-S2H kernel: [131139.294919] amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Mar 16 13:00:14 nanbuwks-B550M-S2H kernel: [131139.493647] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131139.493653] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.297863] gmc_v9_0_process_interrupt: 38898 callbacks suppressed
・
・
・
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.298906] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.298907] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.298908] amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.357644] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3217946, emitted seq=3217948
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.357819] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process slack pid 3984 thread slack:cs0 pid 4012
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.357959] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.617823] [drm] free PSP TMR buffer
・
・
・
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.978156] amdgpu 0000:05:00.0: amdgpu: SMU is resumed successfully!
Mar 16 13:00:19 nanbuwks-B550M-S2H kernel: [131144.979493] [drm] DMUB hardware initialized: version=0x0101001F
Mar 16 13:00:20 nanbuwks-B550M-S2H kernel: [131145.268036] [drm] kiq ring mec 2 pipe 1 q 0
Mar 16 13:00:20 nanbuwks-B550M-S2H kernel: [131145.425119] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
Mar 16 13:00:20 nanbuwks-B550M-S2H kernel: [131145.425216] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
Mar 16 13:00:20 nanbuwks-B550M-S2H kernel: [131145.425295] amdgpu 0000:05:00.0: amdgpu: GPU reset(1) failed
Mar 16 13:00:20 nanbuwks-B550M-S2H kernel: [131145.425322] amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
Mar 16 13:00:20 nanbuwks-B550M-S2H kernel: [131145.425324] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Mar 16 13:00:21 nanbuwks-B550M-S2H discord.desktop[134589]: 2023-03-16T04:00:21.559Z [Modules] Checking for host updates.
Mar 16 13:00:22 nanbuwks-B550M-S2H discord.desktop[134589]: 2023-03-16T04:00:22.121Z [Modules] Host is up to date.
Mar 16 13:00:22 nanbuwks-B550M-S2H discord.desktop[134589]: 2023-03-16T04:00:22.121Z [Modules] Checking for module updates at https://discord.com/api/modules/stable/versions.json
Mar 16 13:00:22 nanbuwks-B550M-S2H discord.desktop[134589]: 2023-03-16T04:00:22.135Z [Modules] No module updates available.
Mar 16 13:00:24 nanbuwks-B550M-S2H kernel: [131149.301644] gmc_v9_0_process_interrupt: 33789 callbacks suppressed
・
・
・
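Because the Slack log repeats the same page fault thousands of times (note the "callbacks suppressed" lines), a short summary is easier to read than the raw output. The sketch below is an illustration under my own assumptions: the function name `summarize_faults` and the regular expressions are hypothetical, and the sample lines are copied from the log above with the syslog prefix trimmed. It counts faults per process, collects the distinct faulting page addresses, and extracts the return code of the GPU reset. On Linux, error number 110 is `ETIMEDOUT`, so `-110` indicates that the reset itself timed out.

```python
import errno
import re
from collections import Counter

FAULT_RE = re.compile(
    r"no-retry page fault .*for process (?P<name>\S+) pid (?P<pid>\d+)"
)
ADDR_RE = re.compile(r"in page starting at address (?P<addr>0x[0-9a-fA-F]+)")
RESET_RE = re.compile(r"GPU reset end with ret = (?P<ret>-?\d+)")


def summarize_faults(lines):
    """Return (fault counts per process, sorted addresses, reset return code)."""
    faults = Counter()
    addresses = set()
    reset_ret = None
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults[(m.group("name"), int(m.group("pid")))] += 1
            continue
        m = ADDR_RE.search(line)
        if m:
            addresses.add(m.group("addr"))
            continue
        m = RESET_RE.search(line)
        if m:
            reset_ret = int(m.group("ret"))
    return faults, sorted(addresses), reset_ret


# Kernel messages from the log above, syslog prefix trimmed.
sample = [
    "amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)",
    "amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107228000 from IH client 0x12 (VMC)",
    "amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:1 pasid:32773, for process slack pid 3984 thread slack:cs0 pid 4012)",
    "amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800107229000 from IH client 0x12 (VMC)",
    "amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110",
]
faults, addresses, reset_ret = summarize_faults(sample)
print(dict(faults))
print(addresses)
if reset_ret is not None and reset_ret < 0:
    # Kernel return codes are negated errno values.
    print(reset_ret, errno.errorcode.get(-reset_ret, "unknown"))
```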

Adding a kernel boot parameter

"Latest Linux Kernel dislikes Rzyen Vega Mobile / Kernel & Hardware / Arch Linux Forums"
https://bbs.archlinux.org/viewtopic.php?id=250297

Following the thread above, I tried setting

iommu=pt

as a kernel parameter, but it did not seem to have any effect.

I set it using the procedure described here:

"Adding a kernel boot parameter to grub"
https://qiita.com/nanbuwks/items/657ce5677c58570a4f1c
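As a rough illustration of what adding the parameter involves, here is a minimal sketch. It assumes the usual Ubuntu layout, where boot parameters live in the `GRUB_CMDLINE_LINUX_DEFAULT` line of `/etc/default/grub`. The helper names `add_kernel_param` and `has_kernel_param` are hypothetical, and the sample file contents and command line are made up for the example; this is not necessarily the exact procedure from the linked article. The functions work on strings only, so nothing on the system is modified.

```python
import re


def add_kernel_param(grub_text, param):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there."""
    def patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + '"' + " ".join(params) + '"'

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        patch,
        grub_text,
        flags=re.MULTILINE,
    )


def has_kernel_param(cmdline, param):
    """Check a /proc/cmdline-style string for an exact parameter."""
    return param in cmdline.split()


# Hypothetical contents of /etc/default/grub before the change.
before = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
after = add_kernel_param(before, "iommu=pt")
print(after)

# Hypothetical /proc/cmdline contents after rebooting with the new setting.
cmdline = "BOOT_IMAGE=/boot/vmlinuz-5.19.0-35-generic root=/dev/sda1 ro quiet splash iommu=pt"
print(has_kernel_param(cmdline, "iommu=pt"))
```

On a real system the edited file only takes effect after running `sudo update-grub` and rebooting. Reading `/proc/cmdline` afterwards shows whether the parameter was actually passed to the kernel.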

Rolling the kernel back to the pre-update version

Come to think of it, the errors only started appearing within the last month, so I tried rolling the kernel back.

Before: 'Ubuntu, with Linux 5.19.0-35-generic'

After: 'Ubuntu, with Linux 5.15.0-60-generic'

I switched to 5.15.0-60 using the following method.

"Reverting to an older kernel on Ubuntu when an update causes problems"
https://qiita.com/nanbuwks/items/f3d3949be8116b62bd5a

For now, this seems to have resolved the problem.
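After a rollback like this, it is worth confirming which kernel is actually running. The sketch below rests on assumptions: the helper names are hypothetical, and the `GRUB_DEFAULT` value follows the common Ubuntu convention of joining the submenu title and the menu entry title with `>`, which may differ from the method in the linked article.

```python
import platform


def grub_default_entry(kernel_release, distro="Ubuntu"):
    """Build a GRUB_DEFAULT value that selects a kernel inside the submenu.

    Assumes the default Ubuntu menu titles ("Advanced options for Ubuntu").
    """
    return f"Advanced options for {distro}>{distro}, with Linux {kernel_release}"


def is_running(kernel_release, current=None):
    """Compare the wanted release with the running one (same as `uname -r`)."""
    if current is None:
        current = platform.release()
    return current == kernel_release


target = "5.15.0-60-generic"
entry = grub_default_entry(target)
print(entry)
print("running target kernel:", is_running(target))
```

If `is_running` returns False after a reboot, the boot entry selection did not take effect and the system is still on a different kernel.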
