[pve-devel] [PATCH pve-kernel] cherry-pick/backport amd{gpu, _sfh} fixes from ubuntu-jammy
Fabian Ebner
f.ebner at proxmox.com
Fri Dec 10 10:24:13 CET 2021
Some users reported boot failures after updating to the latest 5.13
kernel[0] because of a crash in amdgpu.
The patch
drm/amdgpu: fix uvd crash on Polaris12 during driver unloading
fixes
d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver
unload")
which is present as backport 838dfb5888ff in the impish tree. As it
supplements the original fix and addresses a crash with a backtrace
similar to the ones in the forum thread[0], it seems to be the most
promising candidate.
The patch
drm/amd/pm: avoid duplicate powergate/ungate setting
is related as it fixes
bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12
UVD/VCE on suspend")
which is the same commit that was fixed by 838dfb5888ff and has a Cc
for stable. A very slight adaptation of the surrounding code was
necessary for the patch to apply.
The patch
drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works
on vga and dvi connectors
is likely not related, but it seems simple enough, has a Cc for stable
and applied cleanly.
The patch (with the same title as the one it fixes)
HID: amd_sfh: Fix potential NULL pointer dereference
fixes
d46ef750ed58 ("HID: amd_sfh: Fix potential NULL pointer
dereference")
which is present as a backport 56559d7910e7 in the impish tree and
seems like the most likely culprit for a different issue reported in
the same forum thread[1]. A very slight adaptation of the surrounding
code was necessary for the patch to apply.
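
For illustration only (not part of the patch): a minimal userspace
sketch of the ordering problem described above, assuming the client
init step needs cl_data to be allocated first. All names here
(struct priv, client_init, probe_broken, probe_fixed) are hypothetical
stand-ins rather than the amd_sfh driver's API, and the NULL check in
client_init replaces what would be a NULL dereference in a real driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's private data. */
struct client_data {
    int num_hid_devices;
};

struct priv {
    struct client_data *cl_data;
};

/* Needs cl_data; returns -EINVAL where a driver would oops on NULL. */
static int client_init(struct priv *p)
{
    if (!p->cl_data)
        return -EINVAL;
    p->cl_data->num_hid_devices = 1;
    return 0;
}

/* Old ordering: client init runs before cl_data is allocated. */
static int probe_broken(struct priv *p)
{
    int rc;

    rc = client_init(p);
    if (rc)
        return rc;

    p->cl_data = calloc(1, sizeof(*p->cl_data));
    if (!p->cl_data)
        return -ENOMEM;

    return 0;
}

/* New ordering: allocate first, then run client init. */
static int probe_fixed(struct priv *p)
{
    int rc;

    p->cl_data = calloc(1, sizeof(*p->cl_data));
    if (!p->cl_data)
        return -ENOMEM;

    rc = client_init(p);
    if (rc) {
        free(p->cl_data);
        p->cl_data = NULL;
    }
    return rc;
}
```

With a zero-initialized struct priv, probe_broken() fails before any
allocation happens, while probe_fixed() succeeds and leaves cl_data
populated.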
[0]: https://forum.proxmox.com/threads/100825/
[1]: https://forum.proxmox.com/threads/100825/post-435329
Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
---
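
Note for reviewers (illustration only, not part of the patch): the
powergate patch caches the last power state per IP block and bails out
when a request matches it. Below is a minimal userspace sketch of that
pattern using C11 atomics. The names fake_dev, fake_hw_set_powergating
and the two-entry block enum are hypothetical; only the UNKNOWN/ON/OFF
state idea is taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the amdgpu definitions. */
enum block_type { BLOCK_UVD, BLOCK_VCE, BLOCK_NUM };
enum ip_power_state { POWER_STATE_UNKNOWN, POWER_STATE_ON, POWER_STATE_OFF };

struct fake_dev {
    atomic_int pwr_state[BLOCK_NUM]; /* last state set per block */
    int hw_calls;                    /* how often "hardware" was touched */
};

static void fake_dev_init(struct fake_dev *dev)
{
    int i;

    /* Start as UNKNOWN so the first request always reaches the hardware. */
    for (i = 0; i < BLOCK_NUM; i++)
        atomic_init(&dev->pwr_state[i], POWER_STATE_UNKNOWN);
    dev->hw_calls = 0;
}

/* Placeholder for the real hardware call; always succeeds here. */
static int fake_hw_set_powergating(struct fake_dev *dev,
                                   enum block_type block, bool gate)
{
    (void)block;
    (void)gate;
    dev->hw_calls++;
    return 0;
}

static int set_powergating(struct fake_dev *dev,
                           enum block_type block, bool gate)
{
    int target = gate ? POWER_STATE_OFF : POWER_STATE_ON;
    int ret;

    assert(block < BLOCK_NUM);

    /* Already in the requested state: skip the duplicate setting. */
    if (atomic_load(&dev->pwr_state[block]) == target)
        return 0;

    ret = fake_hw_set_powergating(dev, block, gate);
    if (!ret)
        atomic_store(&dev->pwr_state[block], target);

    return ret;
}
```

In this sketch, gating the same block twice in a row reaches the fake
hardware only once, and the cached state is updated only when the
hardware call succeeds, so a failed request can be retried.
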
...x-potential-NULL-pointer-dereference.patch | 52 ++++++++
...vd-crash-on-Polaris12-during-driver-.patch | 71 +++++++++++
...et-scaling-mode-Full-Full-aspect-Cen.patch | 45 +++++++
...d-duplicate-powergate-ungate-setting.patch | 119 ++++++++++++++++++
4 files changed, 287 insertions(+)
create mode 100644 patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
create mode 100644 patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
create mode 100644 patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
create mode 100644 patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch
diff --git a/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch b/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
new file mode 100644
index 0000000..993328e
--- /dev/null
+++ b/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
@@ -0,0 +1,52 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Basavaraj Natikar <Basavaraj.Natikar at amd.com>
+Date: Thu, 23 Sep 2021 17:59:27 +0530
+Subject: [PATCH] HID: amd_sfh: Fix potential NULL pointer dereference
+
+The cl_data field of a privdata must be allocated and updated before
+using in amd_sfh_hid_client_init() function.
+
+Hence handling NULL pointer cl_data accordingly.
+
+Fixes: d46ef750ed58 ("HID: amd_sfh: Fix potential NULL pointer dereference")
+Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar at amd.com>
+Signed-off-by: Jiri Kosina <jkosina at suse.cz>
+[trivial backport]
+Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
+---
+ drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 12 ++++--------
+ 1 file changed, 4 insertions(+), 8 deletions(-)
+
+diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+index 9a1824757aae..05c007b213f2 100644
+--- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
++++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+@@ -235,21 +235,17 @@ static int amd_mp2_pci_probe(struct pci_dev *pdev, const struct pci_device_id *i
+ return rc;
+ }
+
+- rc = amd_sfh_hid_client_init(privdata);
+- if (rc)
+- return rc;
+-
+ privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
+ if (!privdata->cl_data)
+ return -ENOMEM;
+
+- rc = devm_add_action_or_reset(&pdev->dev, amd_mp2_pci_remove, privdata);
++ mp2_select_ops(privdata);
++
++ rc = amd_sfh_hid_client_init(privdata);
+ if (rc)
+ return rc;
+
+- mp2_select_ops(privdata);
+-
+- return 0;
++ return devm_add_action_or_reset(&pdev->dev, amd_mp2_pci_remove, privdata);
+ }
+
+ static const struct pci_device_id amd_mp2_pci_tbl[] = {
+--
+2.30.2
+
diff --git a/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch b/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
new file mode 100644
index 0000000..59a4f57
--- /dev/null
+++ b/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
@@ -0,0 +1,71 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Evan Quan <evan.quan at amd.com>
+Date: Sat, 9 Oct 2021 17:35:36 +0800
+Subject: [PATCH] drm/amdgpu: fix uvd crash on Polaris12 during driver
+ unloading
+
+BugLink: https://bugs.launchpad.net/bugs/1951822
+
+[ Upstream commit 4fc30ea780e0a5c1c019bc2e44f8523e1eed9051 ]
+
+There was a change(below) target for such issue:
+d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload")
+But the fix for VI ASICs was missing there. This is a supplement for
+that.
+
+Fixes: d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload")
+
+Signed-off-by: Evan Quan <evan.quan at amd.com>
+Acked-by: Alex Deucher <alexander.deucher at amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
+Signed-off-by: Sasha Levin <sashal at kernel.org>
+Signed-off-by: Paolo Pisati <paolo.pisati at canonical.com>
+---
+ drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 24 +++++++++++++-----------
+ 1 file changed, 13 insertions(+), 11 deletions(-)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+index bc571833632e..72f876290768 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+@@ -543,6 +543,19 @@ static int uvd_v6_0_hw_fini(void *handle)
+ {
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
++ cancel_delayed_work_sync(&adev->uvd.idle_work);
++
++ if (RREG32(mmUVD_STATUS) != 0)
++ uvd_v6_0_stop(adev);
++
++ return 0;
++}
++
++static int uvd_v6_0_suspend(void *handle)
++{
++ int r;
++ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
++
+ /*
+ * Proper cleanups before halting the HW engine:
+ * - cancel the delayed idle work
+@@ -567,17 +580,6 @@ static int uvd_v6_0_hw_fini(void *handle)
+ AMD_CG_STATE_GATE);
+ }
+
+- if (RREG32(mmUVD_STATUS) != 0)
+- uvd_v6_0_stop(adev);
+-
+- return 0;
+-}
+-
+-static int uvd_v6_0_suspend(void *handle)
+-{
+- int r;
+- struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+-
+ r = uvd_v6_0_hw_fini(adev);
+ if (r)
+ return r;
+--
+2.30.2
+
diff --git a/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch b/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
new file mode 100644
index 0000000..b904bbd
--- /dev/null
+++ b/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
@@ -0,0 +1,45 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: hongao <hongao at uniontech.com>
+Date: Thu, 11 Nov 2021 11:32:07 +0800
+Subject: [PATCH] drm/amdgpu: fix set scaling mode Full/Full aspect/Center not
+ works on vga and dvi connectors
+
+BugLink: https://bugs.launchpad.net/bugs/1952579
+
+commit bf552083916a7f8800477b5986940d1c9a31b953 upstream.
+
+amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode
+which assign amdgpu_encoder->native_mode with *preferred_mode result in
+amdgpu_encoder->native_mode.clock always be 0. That will cause
+amdgpu_connector_set_property returned early on:
+if ((rmx_type != DRM_MODE_SCALE_NONE) &&
+ (amdgpu_encoder->native_mode.clock == 0))
+when we try to set scaling mode Full/Full aspect/Center.
+Add the missing function to amdgpu_connector_vga_get_mode can fix this.
+It also works on dvi connectors because
+amdgpu_connector_dvi_helper_funcs.get_mode use the same method.
+
+Signed-off-by: hongao <hongao at uniontech.com>
+Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
+Cc: stable at vger.kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
+Signed-off-by: Andrea Righi <andrea.righi at canonical.com>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+index b9c11c2b2885..0de66f59adb8 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+@@ -827,6 +827,7 @@ static int amdgpu_connector_vga_get_modes(struct drm_connector *connector)
+
+ amdgpu_connector_get_edid(connector);
+ ret = amdgpu_connector_ddc_get_modes(connector);
++ amdgpu_get_native_mode(connector);
+
+ return ret;
+ }
+--
+2.30.2
+
diff --git a/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch b/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch
new file mode 100644
index 0000000..8e638ae
--- /dev/null
+++ b/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch
@@ -0,0 +1,119 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Evan Quan <evan.quan at amd.com>
+Date: Fri, 5 Nov 2021 15:25:30 +0800
+Subject: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting
+
+BugLink: https://bugs.launchpad.net/bugs/1952579
+
+commit 6ee27ee27ba8b2e725886951ba2d2d87f113bece upstream.
+
+Just bail out if the target IP block is already in the desired
+powergate/ungate state. This can avoid some duplicate settings
+which sometimes may cause unexpected issues.
+
+Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/
+Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921
+Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025
+Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789
+Fixes: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
+Signed-off-by: Evan Quan <evan.quan at amd.com>
+Tested-by: Borislav Petkov <bp at suse.de>
+Reviewed-by: Lijo Lazar <lijo.lazar at amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
+Cc: stable at vger.kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
+Signed-off-by: Andrea Righi <andrea.righi at canonical.com>
+[trivial backport]
+Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
+ drivers/gpu/drm/amd/include/amd_shared.h | 3 ++-
+ drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 10 ++++++++++
+ drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 8 ++++++++
+ 4 files changed, 23 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+index c1e34aa5925b..96ca42bcfdbf 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+@@ -3387,6 +3387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
+ adev->rmmio_size = pci_resource_len(adev->pdev, 2);
+ }
+
++ for (i = 0; i < AMD_IP_BLOCK_TYPE_NUM; i++)
++ atomic_set(&adev->pm.pwr_state[i], POWER_STATE_UNKNOWN);
++
+ adev->rmmio = ioremap(adev->rmmio_base, adev->rmmio_size);
+ if (adev->rmmio == NULL) {
+ return -ENOMEM;
+diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
+index 257f280d3d53..bd077ea224a4 100644
+--- a/drivers/gpu/drm/amd/include/amd_shared.h
++++ b/drivers/gpu/drm/amd/include/amd_shared.h
+@@ -97,7 +97,8 @@ enum amd_ip_block_type {
+ AMD_IP_BLOCK_TYPE_ACP,
+ AMD_IP_BLOCK_TYPE_VCN,
+ AMD_IP_BLOCK_TYPE_MES,
+- AMD_IP_BLOCK_TYPE_JPEG
++ AMD_IP_BLOCK_TYPE_JPEG,
++ AMD_IP_BLOCK_TYPE_NUM,
+ };
+
+ enum amd_clockgating_state {
+diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+index 03581d5b1836..08362d506534 100644
+--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
++++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+@@ -927,6 +927,13 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block
+ {
+ int ret = 0;
+ const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
++ enum ip_power_state pwr_state = gate ? POWER_STATE_OFF : POWER_STATE_ON;
++
++ if (atomic_read(&adev->pm.pwr_state[block_type]) == pwr_state) {
++ dev_dbg(adev->dev, "IP block%d already in the target %s state!",
++ block_type, gate ? "gate" : "ungate");
++ return 0;
++ }
+
+ switch (block_type) {
+ case AMD_IP_BLOCK_TYPE_UVD:
+@@ -979,6 +986,9 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block
+ break;
+ }
+
++ if (!ret)
++ atomic_set(&adev->pm.pwr_state[block_type], pwr_state);
++
+ return ret;
+ }
+
+diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+index 98f1b3d8c1d5..16e3f72d31b9 100644
+--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
++++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+@@ -417,6 +417,12 @@ struct amdgpu_dpm {
+ enum amd_dpm_forced_level forced_level;
+ };
+
++enum ip_power_state {
++ POWER_STATE_UNKNOWN,
++ POWER_STATE_ON,
++ POWER_STATE_OFF,
++};
++
+ struct amdgpu_pm {
+ struct mutex mutex;
+ u32 current_sclk;
+@@ -451,6 +457,8 @@ struct amdgpu_pm {
+ /* Used for I2C access to various EEPROMs on relevant ASICs */
+ struct i2c_adapter smu_i2c;
+ struct list_head pm_attr_list;
++
++ atomic_t pwr_state[AMD_IP_BLOCK_TYPE_NUM];
+ };
+
+ #define R600_SSTU_DFLT 0
+--
+2.30.2
+
--
2.30.2