[pve-devel] [PATCH pve-kernel] cherry-pick/backport amd{gpu, _sfh} fixes from ubuntu-jammy

Fabian Ebner f.ebner at proxmox.com
Fri Dec 10 10:24:13 CET 2021


Some users reported boot failures caused by a crash in amdgpu after
updating to the latest 5.13 kernel[0].

The patch
    drm/amdgpu: fix uvd crash on Polaris12 during driver unloading
fixes
    d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver
    unload")
which is present in the impish tree as backport 838dfb5888ff. Since
this patch supplements the original fix and addresses a crash with a
backtrace similar to the ones in the forum thread[0], it seems to be
the most promising candidate.

The patch
    drm/amd/pm: avoid duplicate powergate/ungate setting
is related, as it fixes
    bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12
    UVD/VCE on suspend")
which is the same commit that 838dfb5888ff fixes. The patch also has a
Cc for stable, and a very slight adaptation of the surrounding code was
necessary for it to apply.
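
The duplicate-setting check that this patch adds to
amdgpu_dpm_set_powergating_by_smu() can be sketched as a userspace
model. This is a sketch under stated assumptions, not the driver code.
C11 atomic_int stands in for the kernel's atomic_t, and smu_set_gate(),
struct pm_model and the reduced enum ip_block are hypothetical. As in
the patch, the cached state is only updated when the (simulated)
hardware call succeeds. The read and the later store are separate
operations, not one compare-and-exchange.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reduced IP block list; the _NUM sentinel mirrors the one
 * the patch adds to enum amd_ip_block_type. */
enum ip_block { IP_UVD, IP_VCE, IP_BLOCK_NUM };

/* Mirrors enum ip_power_state from the patch. */
enum ip_power_state { POWER_STATE_UNKNOWN, POWER_STATE_ON, POWER_STATE_OFF };

struct pm_model {
	atomic_int pwr_state[IP_BLOCK_NUM]; /* cached state per IP block */
	int hw_calls;  /* calls that reached the simulated SMU */
	int fail_next; /* hypothetical knob: make the next SMU call fail */
};

static void pm_init(struct pm_model *pm)
{
	for (int i = 0; i < IP_BLOCK_NUM; i++)
		atomic_init(&pm->pwr_state[i], POWER_STATE_UNKNOWN);
	pm->hw_calls = 0;
	pm->fail_next = 0;
}

/* Hypothetical stand-in for the per-block SMU/powerplay call. */
static int smu_set_gate(struct pm_model *pm, enum ip_block b, bool gate)
{
	(void)b;
	(void)gate;
	pm->hw_calls++;
	if (pm->fail_next) {
		pm->fail_next = 0;
		return -EIO;
	}
	return 0;
}

static int set_powergating(struct pm_model *pm, enum ip_block b, bool gate)
{
	enum ip_power_state target = gate ? POWER_STATE_OFF : POWER_STATE_ON;

	assert(b < IP_BLOCK_NUM);
	/* Bail out early if the block is already in the requested state. */
	if (atomic_load(&pm->pwr_state[b]) == (int)target)
		return 0;

	int ret = smu_set_gate(pm, b, gate);
	/* Remember the new state only if the hardware call succeeded. */
	if (!ret)
		atomic_store(&pm->pwr_state[b], target);
	return ret;
}
```

Because a failed call leaves the cached state untouched, the next
request for the same state is retried instead of being skipped.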

The patch
    drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works
    on vga and dvi connectors
is likely unrelated, but it is simple enough, has a Cc for stable, and
applied cleanly.

The patch (with the same title as the one it fixes)
    HID: amd_sfh: Fix potential NULL pointer dereference
fixes
    d46ef750ed58 ("HID: amd_sfh: Fix potential NULL pointer
    dereference")
which is present in the impish tree as backport 56559d7910e7 and seems
to be the most likely culprit for a different issue reported in the
same forum thread[1]. A very slight adaptation of the surrounding code
was necessary for the patch to apply.
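
The ordering problem in amd_mp2_pci_probe() can be illustrated with a
small userspace model. It uses hypothetical stand-ins: struct
privdata_model, client_init() in place of amd_sfh_hid_client_init(),
and calloc() in place of devm_kzalloc(). It reports the NULL access as
-EFAULT instead of crashing. It is not the driver code, and it leaves
out mp2_select_ops() and the devm_add_action_or_reset() registration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; only the cl_data field mirrors the driver. */
struct cl_data_model { int initialized; };
struct privdata_model { struct cl_data_model *cl_data; };

/* Stand-in for amd_sfh_hid_client_init(), which uses privdata->cl_data.
 * Here a NULL cl_data is reported instead of being dereferenced. */
static int client_init(struct privdata_model *p)
{
	assert(p != NULL);
	if (!p->cl_data)
		return -EFAULT; /* the kernel would dereference NULL here */
	p->cl_data->initialized = 1;
	return 0;
}

/* Order before the fix: initialize first, allocate cl_data afterwards. */
static int probe_old_order(struct privdata_model *p)
{
	int rc = client_init(p);
	if (rc)
		return rc;
	p->cl_data = calloc(1, sizeof(*p->cl_data));
	return p->cl_data ? 0 : -ENOMEM;
}

/* Order after the fix: allocate cl_data first, then initialize. */
static int probe_fixed_order(struct privdata_model *p)
{
	p->cl_data = calloc(1, sizeof(*p->cl_data));
	if (!p->cl_data)
		return -ENOMEM;
	return client_init(p);
}
```

With the old order, client_init() always sees a NULL cl_data. With the
fixed order, the allocation is in place before it is used.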

[0]: https://forum.proxmox.com/threads/100825/
[1]: https://forum.proxmox.com/threads/100825/post-435329

Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
---
 ...x-potential-NULL-pointer-dereference.patch |  52 ++++++++
 ...vd-crash-on-Polaris12-during-driver-.patch |  71 +++++++++++
 ...et-scaling-mode-Full-Full-aspect-Cen.patch |  45 +++++++
 ...d-duplicate-powergate-ungate-setting.patch | 119 ++++++++++++++++++
 4 files changed, 287 insertions(+)
 create mode 100644 patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
 create mode 100644 patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
 create mode 100644 patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
 create mode 100644 patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch

diff --git a/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch b/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
new file mode 100644
index 0000000..993328e
--- /dev/null
+++ b/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
@@ -0,0 +1,52 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Basavaraj Natikar <Basavaraj.Natikar at amd.com>
+Date: Thu, 23 Sep 2021 17:59:27 +0530
+Subject: [PATCH] HID: amd_sfh: Fix potential NULL pointer dereference
+
+The cl_data field of a privdata must be allocated and updated before
+using in amd_sfh_hid_client_init() function.
+
+Hence handling NULL pointer cl_data accordingly.
+
+Fixes: d46ef750ed58 ("HID: amd_sfh: Fix potential NULL pointer dereference")
+Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar at amd.com>
+Signed-off-by: Jiri Kosina <jkosina at suse.cz>
+[trivial backport]
+Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
+---
+ drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 12 ++++--------
+ 1 file changed, 4 insertions(+), 8 deletions(-)
+
+diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+index 9a1824757aae..05c007b213f2 100644
+--- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
++++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+@@ -235,21 +235,17 @@ static int amd_mp2_pci_probe(struct pci_dev *pdev, const struct pci_device_id *i
+ 		return rc;
+ 	}
+ 
+-	rc = amd_sfh_hid_client_init(privdata);
+-	if (rc)
+-		return rc;
+-
+ 	privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
+ 	if (!privdata->cl_data)
+ 		return -ENOMEM;
+ 
+-	rc = devm_add_action_or_reset(&pdev->dev, amd_mp2_pci_remove, privdata);
++	mp2_select_ops(privdata);
++
++	rc = amd_sfh_hid_client_init(privdata);
+ 	if (rc)
+ 		return rc;
+ 
+-	mp2_select_ops(privdata);
+-
+-	return 0;
++	return devm_add_action_or_reset(&pdev->dev, amd_mp2_pci_remove, privdata);
+ }
+ 
+ static const struct pci_device_id amd_mp2_pci_tbl[] = {
+-- 
+2.30.2
+
diff --git a/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch b/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
new file mode 100644
index 0000000..59a4f57
--- /dev/null
+++ b/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
@@ -0,0 +1,71 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Evan Quan <evan.quan at amd.com>
+Date: Sat, 9 Oct 2021 17:35:36 +0800
+Subject: [PATCH] drm/amdgpu: fix uvd crash on Polaris12 during driver
+ unloading
+
+BugLink: https://bugs.launchpad.net/bugs/1951822
+
+[ Upstream commit 4fc30ea780e0a5c1c019bc2e44f8523e1eed9051 ]
+
+There was a change(below) target for such issue:
+d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload")
+But the fix for VI ASICs was missing there. This is a supplement for
+that.
+
+Fixes: d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload")
+
+Signed-off-by: Evan Quan <evan.quan at amd.com>
+Acked-by: Alex Deucher <alexander.deucher at amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
+Signed-off-by: Sasha Levin <sashal at kernel.org>
+Signed-off-by: Paolo Pisati <paolo.pisati at canonical.com>
+---
+ drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 24 +++++++++++++-----------
+ 1 file changed, 13 insertions(+), 11 deletions(-)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+index bc571833632e..72f876290768 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+@@ -543,6 +543,19 @@ static int uvd_v6_0_hw_fini(void *handle)
+ {
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ 
++	cancel_delayed_work_sync(&adev->uvd.idle_work);
++
++	if (RREG32(mmUVD_STATUS) != 0)
++		uvd_v6_0_stop(adev);
++
++	return 0;
++}
++
++static int uvd_v6_0_suspend(void *handle)
++{
++	int r;
++	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
++
+ 	/*
+ 	 * Proper cleanups before halting the HW engine:
+ 	 *   - cancel the delayed idle work
+@@ -567,17 +580,6 @@ static int uvd_v6_0_hw_fini(void *handle)
+ 						       AMD_CG_STATE_GATE);
+ 	}
+ 
+-	if (RREG32(mmUVD_STATUS) != 0)
+-		uvd_v6_0_stop(adev);
+-
+-	return 0;
+-}
+-
+-static int uvd_v6_0_suspend(void *handle)
+-{
+-	int r;
+-	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+-
+ 	r = uvd_v6_0_hw_fini(adev);
+ 	if (r)
+ 		return r;
+-- 
+2.30.2
+
diff --git a/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch b/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
new file mode 100644
index 0000000..b904bbd
--- /dev/null
+++ b/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
@@ -0,0 +1,45 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: hongao <hongao at uniontech.com>
+Date: Thu, 11 Nov 2021 11:32:07 +0800
+Subject: [PATCH] drm/amdgpu: fix set scaling mode Full/Full aspect/Center not
+ works on vga and dvi connectors
+
+BugLink: https://bugs.launchpad.net/bugs/1952579
+
+commit bf552083916a7f8800477b5986940d1c9a31b953 upstream.
+
+amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode
+which assign amdgpu_encoder->native_mode with *preferred_mode result in
+amdgpu_encoder->native_mode.clock always be 0. That will cause
+amdgpu_connector_set_property returned early on:
+if ((rmx_type != DRM_MODE_SCALE_NONE) &&
+	(amdgpu_encoder->native_mode.clock == 0))
+when we try to set scaling mode Full/Full aspect/Center.
+Add the missing function to amdgpu_connector_vga_get_mode can fix this.
+It also works on dvi connectors because
+amdgpu_connector_dvi_helper_funcs.get_mode use the same method.
+
+Signed-off-by: hongao <hongao at uniontech.com>
+Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
+Cc: stable at vger.kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
+Signed-off-by: Andrea Righi <andrea.righi at canonical.com>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+index b9c11c2b2885..0de66f59adb8 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+@@ -827,6 +827,7 @@ static int amdgpu_connector_vga_get_modes(struct drm_connector *connector)
+ 
+ 	amdgpu_connector_get_edid(connector);
+ 	ret = amdgpu_connector_ddc_get_modes(connector);
++	amdgpu_get_native_mode(connector);
+ 
+ 	return ret;
+ }
+-- 
+2.30.2
+
diff --git a/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch b/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch
new file mode 100644
index 0000000..8e638ae
--- /dev/null
+++ b/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch
@@ -0,0 +1,119 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Evan Quan <evan.quan at amd.com>
+Date: Fri, 5 Nov 2021 15:25:30 +0800
+Subject: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting
+
+BugLink: https://bugs.launchpad.net/bugs/1952579
+
+commit 6ee27ee27ba8b2e725886951ba2d2d87f113bece upstream.
+
+Just bail out if the target IP block is already in the desired
+powergate/ungate state. This can avoid some duplicate settings
+which sometimes may cause unexpected issues.
+
+Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/
+Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921
+Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025
+Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789
+Fixes: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
+Signed-off-by: Evan Quan <evan.quan at amd.com>
+Tested-by: Borislav Petkov <bp at suse.de>
+Reviewed-by: Lijo Lazar <lijo.lazar at amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
+Cc: stable at vger.kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
+Signed-off-by: Andrea Righi <andrea.righi at canonical.com>
+[trivial backport]
+Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
+ drivers/gpu/drm/amd/include/amd_shared.h   |  3 ++-
+ drivers/gpu/drm/amd/pm/amdgpu_dpm.c        | 10 ++++++++++
+ drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h    |  8 ++++++++
+ 4 files changed, 23 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+index c1e34aa5925b..96ca42bcfdbf 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+@@ -3387,6 +3387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
+ 		adev->rmmio_size = pci_resource_len(adev->pdev, 2);
+ 	}
+ 
++	for (i = 0; i < AMD_IP_BLOCK_TYPE_NUM; i++)
++		atomic_set(&adev->pm.pwr_state[i], POWER_STATE_UNKNOWN);
++
+ 	adev->rmmio = ioremap(adev->rmmio_base, adev->rmmio_size);
+ 	if (adev->rmmio == NULL) {
+ 		return -ENOMEM;
+diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
+index 257f280d3d53..bd077ea224a4 100644
+--- a/drivers/gpu/drm/amd/include/amd_shared.h
++++ b/drivers/gpu/drm/amd/include/amd_shared.h
+@@ -97,7 +97,8 @@ enum amd_ip_block_type {
+ 	AMD_IP_BLOCK_TYPE_ACP,
+ 	AMD_IP_BLOCK_TYPE_VCN,
+ 	AMD_IP_BLOCK_TYPE_MES,
+-	AMD_IP_BLOCK_TYPE_JPEG
++	AMD_IP_BLOCK_TYPE_JPEG,
++	AMD_IP_BLOCK_TYPE_NUM,
+ };
+ 
+ enum amd_clockgating_state {
+diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+index 03581d5b1836..08362d506534 100644
+--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
++++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+@@ -927,6 +927,13 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block
+ {
+ 	int ret = 0;
+ 	const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
++	enum ip_power_state pwr_state = gate ? POWER_STATE_OFF : POWER_STATE_ON;
++
++	if (atomic_read(&adev->pm.pwr_state[block_type]) == pwr_state) {
++		dev_dbg(adev->dev, "IP block%d already in the target %s state!",
++				block_type, gate ? "gate" : "ungate");
++		return 0;
++	}
+ 
+ 	switch (block_type) {
+ 	case AMD_IP_BLOCK_TYPE_UVD:
+@@ -979,6 +986,9 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block
+ 		break;
+ 	}
+ 
++	if (!ret)
++		atomic_set(&adev->pm.pwr_state[block_type], pwr_state);
++
+ 	return ret;
+ }
+ 
+diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+index 98f1b3d8c1d5..16e3f72d31b9 100644
+--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
++++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+@@ -417,6 +417,12 @@ struct amdgpu_dpm {
+ 	enum amd_dpm_forced_level forced_level;
+ };
+ 
++enum ip_power_state {
++	POWER_STATE_UNKNOWN,
++	POWER_STATE_ON,
++	POWER_STATE_OFF,
++};
++
+ struct amdgpu_pm {
+ 	struct mutex		mutex;
+ 	u32                     current_sclk;
+@@ -451,6 +457,8 @@ struct amdgpu_pm {
+ 	/* Used for I2C access to various EEPROMs on relevant ASICs */
+ 	struct i2c_adapter smu_i2c;
+ 	struct list_head	pm_attr_list;
++
++	atomic_t		pwr_state[AMD_IP_BLOCK_TYPE_NUM];
+ };
+ 
+ #define R600_SSTU_DFLT                               0
+-- 
+2.30.2
+
-- 
2.30.2
