Author: Longxuan Yu <ylong030@ucr.edu>
Date: Fri May 8 20:49:07 2026 -0400
8021q: delete cleared egress QoS mappings
[ Upstream commit 7dddc74af369478ba7f9bc136d0fc1dc4570cb66 ]
vlan_dev_set_egress_priority() currently keeps cleared egress
priority mappings in the hash as tombstones. Repeated set/clear cycles
with distinct skb priorities therefore accumulate mapping nodes until
device teardown and leak memory.
Delete mappings when vlan_prio is cleared instead of keeping tombstones.
Now that the egress mapping lists are RCU protected, the node can be
unlinked safely and freed after a grace period.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@kernel.org
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Link: https://patch.msgid.link/ecfa6f6ce2467a42647ff4c5221238ae85b79a59.1776647968.git.yuantan098@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Longxuan Yu <ylong030@ucr.edu>
Date: Fri May 8 20:49:06 2026 -0400
8021q: use RCU for egress QoS mappings
[ Upstream commit fc69decc811b155a0ed8eef17ee940f28c4f6dbc ]
The TX fast path and reporting paths walk egress QoS mappings without
RTNL. Convert the mapping lists to RCU-protected pointers, use RCU
reader annotations in readers, and defer freeing mapping nodes with an
embedded rcu_head.
This prepares the egress QoS mapping code for safe removal of mapping
nodes in a follow-up change while preserving the current behavior.
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Link: https://patch.msgid.link/9136768189f8c6d3f824f476c62d2fa1111688e8.1776647968.git.yuantan098@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 7dddc74af369 ("8021q: delete cleared egress QoS mappings")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Karol Wachowski <karol.wachowski@linux.intel.com>
Date: Thu Apr 30 11:56:44 2026 +0200
accel/ivpu: Disallow re-exporting imported GEM objects
commit 7dd57d7a6350770dfc283287125c409e995200e0 upstream.
Prevent re-exporting of imported GEM buffers by adding a custom
prime_handle_to_fd callback that checks if the object is imported
and returns -EOPNOTSUPP if so.
Re-exporting imported GEM buffers causes loss of buffer flags settings,
leading to incorrect device access and data corruption.
Reported-by: Yametsu <yam3tsu@gmail.com>
Fixes: 57557964b582 ("accel/ivpu: Add support for userptr buffer objects")
Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Breno Leitao <leitao@debian.org>
Date: Mon Apr 20 02:27:13 2026 -0700
ACPI: arm64: cpuidle: Tolerate platforms with no deep PSCI idle states
commit 3ea4415015d690a51a3fb1f98dfc9a02f88f7bc4 upstream.
Commit cac173bea57d ("ACPI: processor: idle: Rework the handling of
acpi_processor_ffh_lpi_probe()") moved the acpi_processor_ffh_lpi_probe()
call from acpi_processor_setup_cpuidle_dev(), where its return value was
ignored, to acpi_processor_get_power_info(), where it is now treated as
a hard failure. As a result, platforms where psci_acpi_cpu_init_idle()
returned -ENODEV stopped registering any cpuidle states, forcing CPUs to
busy-poll when idle.
On NVIDIA Grace (aarch64) systems with PSCIv1.1, pr->power.count is 1
(only WFI, no deep PSCI states beyond it), so the previous
"count = pr->power.count - 1; if (count <= 0) return -ENODEV;" check
returned -ENODEV for all 72 CPUs and disabled cpuidle entirely.
The lpi_states count is already validated in acpi_processor_get_lpi_info(),
so the check here is redundant. Simplify the loop to iterate over
lpi_states[1..power.count). When only WFI is present, the loop body
simply does not execute and the function returns 0, which is the correct
outcome: there is nothing to validate for FFH and no error to report.
Suggested-by: Huisong Li <lihuisong@huawei.com>
Cc: stable@vger.kernel.org
Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Sudeep Holla <sudeep.holla@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jinjie Ruan <ruanjinjie@huawei.com>
Date: Fri Apr 17 12:01:12 2026 +0800
ACPI: CPPC: Fix related_cpus inconsistency during CPU hotplug
commit 75141a770f4f8225d316f6c7e146723a32e9720e upstream.
When concurrently bringing up and down two SMT threads of a physical
core, many warning call traces occur as below:
The issue timeline is as follows:
1. When the system starts,
cpufreq: CPU: 220, policy->related_cpus: 220-221, policy->cpus: 220-221
2. Offline CPU 220 and CPU 221.
3. Online CPU 220
- CPU 221 is now offline, as acpi_get_psd_map() use
for_each_online_cpu(), so the cpu_data->shared_cpu_map,
policy->cpus, and related_cpus has only CPU 220.
cpufreq: CPU: 220, policy->related_cpus: 220, policy->cpus: 220
4. Offline CPU 220
5. Online CPU 221, the below call trace occurs:
- Since CPU 220 and CPU 221 share one policy, and
policy->related_cpus = 220 after step 3, so CPU 221
is not in policy->related_cpus but
per_cpu(cpufreq_cpu_data, cpu221) is not NULL.
After reverting commit 56eb0c0ed345 ("ACPI: CPPC: Fix remaining
for_each_possible_cpu() to use online CPUs"), the issue disappeared.
The _PSD (P-State Dependency) defines the hardware-level dependency of
frequency control across CPU cores. Since this relationship is a physical
attribute of the hardware topology, it remains constant regardless of the
online or offline status of the CPUs.
Using for_each_online_cpu() in acpi_get_psd_map() is problematic. If a
CPU is offline, it will be excluded from the shared_cpu_map.
Consequently, if that CPU is brought online later, the kernel will fail
to recognize it as part of any shared frequency domain.
Switch back to for_each_possible_cpu() to ensure that all cores defined
in the ACPI tables are correctly mapped into their respective performance
domains from the start. This aligns with the logic of policy->related_cpus,
which must encompass all potentially available cores in the domain to
prevent logic gaps during CPU hotplug operations.
To resolve the original issue regarding the "nosmt" or "nosmt=force"
boot parameter, as send_pcc_cmd() function already does if (!desc)
continue, so reverting that loop back to for_each_possible_cpu() is ok,
only need to change the match_cpc_ptr NULL case in acpi_get_psd_map() to
continue as Sean suggested.
How to reproduce, on arm64 machine with SMT support which use acpi cppc
cpufreq driver:
bash test.sh 220 & bash test.sh 221 &
The test.sh is as below:
while true
do
echo 0 > /sys/devices/system/cpu/cpu${1}/online
sleep 0.5
cat /sys/devices/system/cpu/cpu${1}/cpufreq/related_cpus
echo 1 > /sys/devices/system/cpu/cpu${1}/online
cat /sys/devices/system/cpu/cpu${1}/cpufreq/related_cpus
done
CPU: 221 PID: 1119 Comm: cpuhp/221 Kdump: loaded Not tainted 6.6.0debug+ #5
Hardware name: To be filled by O.E.M. S920X20/BC83AMDA01-7270Z, BIOS 20.39 09/04/2024
pstate: a1400009 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : cpufreq_online+0x8ac/0xa90
lr : cpuhp_cpufreq_online+0x18/0x30
sp : ffff80008739bce0
x29: ffff80008739bce0 x28: 0000000000000000 x27: ffff28400ca32200
x26: 0000000000000000 x25: 0000000000000003 x24: ffffd483503ff000
x23: ffffd483504051a0 x22: ffffd48350024a00 x21: 00000000000000dd
x20: 000000000000001d x19: ffff28400ca32000 x18: 0000000000000000
x17: 0000000000000020 x16: ffffd4834e6a3fc8 x15: 0000000000000020
x14: 0000000000000008 x13: 0000000000000001 x12: 00000000ffffffff
x11: 0000000000000040 x10: ffffd48350430728 x9 : ffffd4834f087c78
x8 : 0000000000000001 x7 : ffff2840092bdf00 x6 : ffffd483504264f0
x5 : ffffd48350405000 x4 : ffff283f7f95cc60 x3 : 0000000000000000
x2 : ffff53bc2f94b000 x1 : 00000000000000dd x0 : 0000000000000000
Call trace:
cpufreq_online+0x8ac/0xa90
cpuhp_cpufreq_online+0x18/0x30
cpuhp_invoke_callback+0x128/0x580
cpuhp_thread_fun+0x110/0x1b0
smpboot_thread_fn+0x140/0x190
kthread+0xec/0x100
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
Cc: All applicable <stable@vger.kernel.org>
Fixes: 56eb0c0ed345 ("ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs")
Co-developed-by: Sean Kelley <skelley@nvidia.com>
Signed-off-by: Sean Kelley <skelley@nvidia.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20260417040112.3727756-1-ruanjinjie@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guangshuo Li <lgs201920130244@gmail.com>
Date: Mon Apr 13 21:53:43 2026 +0800
ACPI: scan: Use acpi_dev_put() in object add error paths
commit 9c0acc169ac71535477caedea8315f7041c5f07c upstream.
After acpi_init_device_object(), the lifetime of struct acpi_device is
managed by the driver core through reference counting.
Both acpi_add_power_resource() and acpi_add_single_object() call
acpi_init_device_object() and then invoke acpi_device_add(). If that
fails, their error paths call the release callback directly instead of
dropping the device reference through acpi_dev_put().
This bypasses the normal device lifetime rules and frees the object
without releasing the reference acquired by device_initialize(), which
may lead to a refcount leak.
The issue was identified by a static analysis tool I developed and
confirmed by manual review.
Fix both error paths by using acpi_dev_put() and let the release
callback handle the final cleanup.
Fixes: 781d737c7466 ("ACPI: Drop power resources driver")
Fixes: 718fb0de8ff88 ("ACPI: fix NULL bug for HID/UID string")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Link: https://patch.msgid.link/20260413135343.2884481-1-lgs201920130244@gmail.com
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jan Schär <jan@jschaer.ch>
Date: Sat Apr 11 11:26:06 2026 +0200
ACPI: video: Add backlight=native quirk for Dell OptiPlex 7770 AIO
commit ad7997f5a01af6f711fe6b6a2df578b964109d49 upstream.
The Dell OptiPlex 7770 AIO needs the same quirk as the 7760 AIO. The
backlight can be controlled with the native controller, intel_backlight,
but not with dell_uart_backlight.
I dumped the DSDT using acpidump, acpixtract and iasl, and confirmed
that it contains the DELL0501 device. When loading the
dell_uart_backlight driver with `rmmod dell_uart_backlight`, `modprobe
dell_uart_backlight dyndbg`, it reports "Firmware version: GL_Re_V18".
Fixes: cd8e468efb4f ("ACPI: video: Add Dell UART backlight controller detection")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Jan Schär <jan@jschaer.ch>
Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Link: https://patch.msgid.link/20260411092606.47925-1-jan@jschaer.ch
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shivam Kalra <shivamkalra98@zohomail.in>
Date: Sun Apr 26 19:38:41 2026 +0530
ACPI: video: force native backlight on HP OMEN 16 (8A44)
commit 4b506ea5351a1f5937ac632a4a5c35f6f796cc41 upstream.
The HP OMEN 16 Gaming Laptop (board name 8A44) has a mux-less hybrid
GPU configuration with AMD Rembrandt (Radeon 680M) and NVIDIA GA104
(RTX 3070 Ti). The internal eDP panel is wired to the AMD iGPU.
When Nouveau loads without GSP firmware, the ACPI video backlight
device (acpi_video0) gets registered alongside the native AMD
backlight (amdgpu_bl2). In this state, writes to amdgpu_bl2 update
the software brightness value but fail to change the physical panel
brightness.
Force native backlight to prevent acpi_video0 from registering.
Confirmed that booting with acpi_backlight=native resolves the
issue.
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
Link: https://patch.msgid.link/20260426-omen-16-backlight-fix-v1-1-62364f268ea6@zohomail.in
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jiexun Wang <wangjiexun2025@gmail.com>
Date: Wed May 6 22:08:23 2026 +0800
af_unix: Reject SIOCATMARK on non-stream sockets
commit d119775f2bad827edc28071c061fdd4a91f889a5 upstream.
SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.
In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.
Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Date: Wed May 6 00:34:47 2026 -0300
ALSA: core: Serialize deferred fasync state checks
commit 5337213381df578058e2e41da93cbd0e4639935f upstream.
snd_fasync_helper() updates fasync->on under snd_fasync_lock, and
snd_fasync_work_fn() now also evaluates fasync->on under the same
lock. snd_kill_fasync() still tests the flag before taking the lock,
leaving an unsynchronized read against FASYNC enable/disable updates.
Move the enabled-state check into the locked section.
Also clear fasync->on under snd_fasync_lock in snd_fasync_free()
before unlinking the pending entry. Together with the locked sender-side
check, this publishes teardown before flushing the deferred work and
prevents a racing sender from requeueing the entry after free has
started.
Fixes: ef34a0ae7a26 ("ALSA: core: Add async signal helpers")
Fixes: 8146cd333d23 ("ALSA: core: Fix potential data race at fasync handling")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260506-alsa-core-fasync-on-lock-v1-1-ea48c77d6ca4@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Date: Sun May 3 21:55:52 2026 -0300
ALSA: firewire-tascam: Do not drop unread control events
commit 0749daa8eb5ab90334aaad3b0671efd7150d43b1 upstream.
tscm_hwdep_read_queue() copies as many queued control events as fit in
the userspace buffer. When the buffer is smaller than the current
contiguous queue segment, length is rounded down to the number of bytes
that can be copied.
However, after copying that shortened length, the code advances pull_pos
to the original tail_pos, marking the whole contiguous segment as
consumed. Any events between the copied portion and tail_pos are lost.
Limit tail_pos to the position after the entries actually copied before
updating pull_pos. When the whole segment fits, this is equivalent to the
old tail_pos update; when the buffer is smaller, the remaining events
stay queued for the next read.
Fixes: a8c0d13267a4 ("ALSA: firewire-tascam: notify events of change of state for userspace applications")
Cc: stable@vger.kernel.org
Suggested-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Co-developed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260503-alsa-firewire-tascam-read-queue-v2-1-126c6efd7642@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yuriy Padlyak <yuriypadlyak@gmail.com>
Date: Thu Apr 30 01:09:03 2026 +0300
ALSA: hda/realtek: Fix speaker silence after S3 resume on Xiaomi Mi Laptop Pro 15
commit 92a8b5e2eff6920bf815cd6a80b088ec3fdf01a3 upstream.
The Xiaomi Mi Laptop Pro 15 (TM1905, subsystem 1d72:1905) ships with the
Realtek ALC256 codec on Intel Comet Lake PCH-LP. After S3 resume the
codec sets coefficient register 0x10 to 0x0220 instead of 0x0020 — bit 9
is erroneously set, which silences the internal speaker. Bluetooth and
HDMI audio are unaffected because they use different paths.
This is the same mechanism fixed for Clevo NJ51CU by commit edca7cc4b0ac
("ALSA: hda/realtek: Fix quirk for Clevo NJ51CU"), but the existing
ALC256_FIXUP_MIC_NO_PRESENCE_AND_RESUME also reconfigures pin 0x19 as a
front mic, which is wrong for this Xiaomi where pin 0x19 default is
0x411111f0 (disabled). Add a minimal fixup that only clears the stuck
coef bit, and add the Xiaomi SSID to the quirk table.
Verified by reading coef 0x10 with hda-verb after resume (returns
0x0220), writing 0x0020, and confirming the internal speaker resumes
output. With this fixup applied the bit is cleared on every codec init,
including post-resume.
Signed-off-by: Yuriy Padlyak <yuriypadlyak@gmail.com>
Cc: <stable@vger.kernel.org>
Tested-by: Yuriy Padlyak <yuriypadlyak@gmail.com>
Link: https://patch.msgid.link/20260429220903.14918-1-yuriypadlyak@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Date: Thu Apr 23 10:11:31 2026 -0300
ALSA: hda: cs35l56: Propagate ASP TX source control errors
commit 0faacc0841d66f3cf51989c10a83f3a82d52ff2c upstream.
cs35l56_hda_mixer_get() ignores regmap_read() and
cs35l56_hda_mixer_put() ignores regmap_update_bits_check().
This makes the ASP TX source controls report success when a regmap
access fails. The write path returns no change instead of an error,
and the read path continues after a failed read instead of aborting
the control callback.
Propagate the regmap errors, matching the posture and volume controls
in this driver.
Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260423-alsa-cs35l56-asp-tx-source-errors-v1-1-17ea7c62ec31@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Takashi Iwai <tiwai@suse.de>
Date: Fri Apr 24 13:21:55 2026 +0200
ALSA: pcm: oss: Fix data race at accessing runtime.oss.trigger
commit 901ac0ff15edf9503162e2cf6579bd11a30f1ed4 upstream.
Currently the runtime.oss.trigger field may be accessed concurrently
without protection, which may lead to the data race. And, in this
case, it may lead to more severe problem because it's a bit field; as
writing the data, it may overwrite other bit fields as well, which
confuses the operation completely, as spotted by fuzzing.
Fix it by covering runtime.oss.trigger bit fled also with the existing
params_lock mutex in both snd_pcm_oss_get_trigger() and
snd_pcm_oss_poll().
Reported-and-tested-by: Jaeyoung Chung <jjy600901@snu.ac.kr>
Closes: https://lore.kernel.org/20260423145330.210035-1-jjy600901@snu.ac.kr
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260424112205.123703-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Date: Wed May 6 00:15:48 2026 -0300
ALSA: seq: Fix UMP group 16 filtering
commit 92429ca999db99febced82f23362a71b2ba4c1d8 upstream.
The sequencer UAPI defines group_filter as an unsigned int bitmap.
Bit 0 filters groupless messages and bits 1-16 filter UMP groups 1-16.
The internal snd_seq_client storage is only unsigned short, so bit 16
is truncated when userspace sets the filter. The same truncation affects
the automatic UMP client filter used to avoid delivery to inactive
groups, so events for group 16 cannot be filtered.
Store the internal bitmap as unsigned int and keep both userspace-provided
and automatically generated values limited to the defined UAPI bits.
Fixes: d2b706077792 ("ALSA: seq: Add UMP group filter")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260506-alsa-seq-ump-group16-filter-v1-1-b75160bf6993@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Takashi Iwai <tiwai@suse.de>
Date: Mon Apr 27 17:22:15 2026 +0200
ALSA: usb-audio: Avoid potential endless loop in convert_chmap_v3()
commit 6e7247d8f5fefeceb0bb9cc80a5388a636b219cd upstream.
The convert_chmap_v3() has a loop with its increment size of
cs_desc->wLength, but we forgot to validate cs_desc->wLength itself,
which may lead to potential endless loop by a malformed descriptor.
Add a proper size check to abort the loop for plugging the hole.
Fixes: ecfd41166b72 ("ALSA: usb-audio: Validate UAC3 cluster segment descriptors")
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260427152224.15276-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Date: Fri Apr 24 18:50:10 2026 -0300
ALSA: usb-audio: Fix UAC3 cluster descriptor size check
commit 26265dd69da32d88a88d21987853cec899d9e21f upstream.
The UAC3 cluster descriptor length check in
snd_usb_get_audioformat_uac3()was added to
make sure that the buffer is large enough for
a struct uac3_cluster_header_descriptor before the
returned data is cast and used.
However, the check uses sizeof(cluster), where cluster
is a pointer, not the size of the descriptor header.
This makes the validation depend on the architecture
pointer size and does not match the intended object size.
Check against sizeof(*cluster) instead.
Fixes: fb4e2a6e8f28 ("ALSA: usb-audio: Fix out-of-bounds read in snd_usb_get_audioformat_uac3()")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260424-alsa-usb-uac3-cluster-size-v1-1-99a5808898a3@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Date: Mon May 4 11:08:45 2026 -0300
ALSA: usb-audio: midi2: Restart output URBs on resume
commit f3c57c9c2a49a21d784b7c04a2c883bffc070659 upstream.
USB MIDI 2.0 suspend saves the endpoint running state, clears it and
kills all endpoint URBs. Resume restores the running state, but only
restarts input endpoints.
For a running output endpoint, this leaves the endpoint marked running
with an empty URB queue. Output transfer progress depends on either the
rawmidi trigger path starting the queue or an output completion refilling
it. After suspend there is no completion left, and output data that
remains queued in the raw UMP or legacy rawmidi buffer can stay stalled
until userspace happens to trigger the stream again.
Restore the saved state with atomic accessors, keep input endpoints
restarted as before, and restart output endpoints that were running before
suspend. Clear the saved suspend state after restoring it.
Fixes: ff49d1df79ae ("ALSA: usb-audio: USB MIDI 2.0 UMP support")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260504-usb-midi2-output-resume-v1-1-c089cc8ad3c6@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Breno Leitao <leitao@debian.org>
Date: Tue May 5 09:02:13 2026 -0700
arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's
commit 5cbb61bf4168859d97c068d88d364f4f1f440325 upstream.
sve_set_common() is the backend for PTRACE_SETREGSET(NT_ARM_SVE) and
PTRACE_SETREGSET(NT_ARM_SSVE). Every write in the function operates on
the tracee (target) - except a single memset that uses current instead,
zeroing the tracer's saved V0-V31 / FPSR / FPCR shadow on every ptrace
SETREGSET call.
The memset is meant to give the tracee a defined zero register image
before the user-supplied payload is copied in (for partial writes,
header-only writes, and FPSIMD<->SVE format switches). Aiming it at
current both denies the tracee that clean slate and silently corrupts
the tracer.
The corruption of the tracer's saved FPSIMD state is not always
observable. Where the tracer's state is live on a CPU, this may be
reused without loading the corrupted state from memory, and will
eventually be written back over the corrupted state. Where the tracer's
state is saved in SVE_PT_REGS_SVE format, only the FPSR and FPCR are
clobbered, and the effective copy of the vectors is in the task's
sve_state.
Reproducible on an arm64 kernel with SVE: a single-threaded tracer that
loads a known pattern into V0-V31, issues PTRACE_SETREGSET(NT_ARM_SVE)
on a child, and reads V0-V31 back observes them all zeroed within tens
of thousands of iterations when a sibling thread keeps stealing the
FPSIMD CPU binding.
Fixes: 316283f276eb ("arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Kevin Brodsky <kevin.brodsky@arm.com>
Date: Mon Apr 27 13:03:33 2026 +0100
arm64: signal: Preserve POR_EL0 if poe_context is missing
commit 030e8a40fff65ca6ac1c04a4d3c08afe72438922 upstream.
Commit 2e8a1acea859 ("arm64: signal: Improve POR_EL0 handling to
avoid uaccess failures") delayed the write to POR_EL0 in
rt_sigreturn to avoid spurious uaccess failures. This change however
relies on the poe_context frame record being present: on a system
supporting POE, calling sigreturn without a poe_context record now
results in writing arbitrary data from the kernel stack into POR_EL0.
Fix this by adding a __valid_fields member to struct
user_access_state, and zeroing the struct on allocation.
restore_poe_context() then indicates that the por_el0 field is valid
by setting the corresponding bit in __valid_fields, and
restore_user_access_state() only touches POR_EL0 if there is a valid
value to set it to. This is in line with how POR_EL0 was originally
handled; all frame records are currently optional, except
fpsimd_context.
To ensure that __valid_fields is kept in sync, fields (currently
just por_el0) are now accessed via accessors and prefixed with __ to
discourage direct access.
Fixes: 2e8a1acea859 ("arm64: signal: Improve POR_EL0 handling to avoid uaccess failures")
Cc: <stable@vger.kernel.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tommaso Soncin <soncintommaso@gmail.com>
Date: Wed Apr 29 18:08:57 2026 +0200
ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
commit d63c219b7ff39f897da10c160a2edef76320f16c upstream.
Add a DMI quirk for the HP OMEN Gaming Laptop 16-ap0xxx line fixing the
issue where the internal microphone was not detected.
Cc: stable@vger.kernel.org
Signed-off-by: Tommaso Soncin <soncintommaso@gmail.com>
Link: https://patch.msgid.link/20260429160858.538986-1-soncintommaso@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Li Jian <lazycat-xiao@foxmail.com>
Date: Fri Apr 17 18:53:14 2026 +0800
ASoC: ES8389: convert to devm_clk_get_optional() to get clock
commit 8ed3311131077712cdd0b3afec6909b9388ad3e4 upstream.
When enabling ES8390 via ACPI description, es8389 would fail to
obtain a clock source, causing the driver to fail to initialize.
This was not an issue with older kernels, but since commit
abae8e57e49a ("clk: generalize devm_clk_get() a bit"),
devm_clk_get() would return an error pointer when a clock source
was not detected (instead of falling back to a static clock),
causing the driver to fail early.
Use devm_clk_get_optional() instead to return to the previous
behaviour, allowing the use of a static clock source.
Cc: stable@vger.kernel.org
Signed-off-by: Li Jian <lazycat-xiao@foxmail.com>
Link: https://patch.msgid.link/tencent_7C78374FB9F4B3A37101E5C719715D8BC40A@qq.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Joseph Salisbury <joseph.salisbury@oracle.com>
Date: Mon Mar 16 14:05:45 2026 -0400
ASoC: fsl_easrc: fix comment typo
commit 804dce6c73fdfa44184ee4e8b09abad7f5da408f upstream.
The file contains a spelling error in a source comment (funciton).
Typos in comments reduce readability and make text searches less reliable
for developers and maintainers.
Replace 'funciton' with 'function' in the affected comment. This is a
comment-only cleanup and does not change behavior.
Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Cc: stable@vger.kernel.org
Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>
Link: https://patch.msgid.link/20260316180545.144032-1-joseph.salisbury@oracle.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Date: Mon Apr 27 23:38:41 2026 -0300
ASoC: Intel: bytcr_wm5102: Fix MCLK leak on platform_clock_control error
commit 13d30682e8dee191ac04e93642f0372a723e8b0c upstream.
If byt_wm5102_prepare_and_enable_pll1() fails in the
SND_SOC_DAPM_EVENT_ON() path, platform_clock_control() returns after
clk_prepare_enable(priv->mclk) without disabling the clock again.
This leaks an MCLK enable reference on failed power-up attempts. Add the
missing clk_disable_unprepare() on the error path, matching the unwind
used by the other Intel platform_clock_control() implementations.
Fixes: 9a87fc1e0619 ("ASoC: Intel: bytcr_wm5102: Add machine driver for BYT/WM5102")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Link: https://patch.msgid.link/20260427-bytcr-wm5102-mclk-leak-v1-1-02b96d08e99c@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Date: Thu Apr 2 08:11:10 2026 +0000
ASoC: qcom: q6apm-dai: reset queue ptr on trigger stop
commit cab45ab95ce7600fc0ff84585c77fd45b7b0d67c upstream.
Reset queue pointer on SNDRV_PCM_TRIGGER_STOP event to be inline
with resetting appl_ptr. Without this we will end up with a queue_ptr
out of sync and driver could try to send data that is not ready yet.
Fix this by resetting the queue_ptr.
Fixes: 3d4a4411aa8bb ("ASoC: q6apm-dai: schedule all available frames to avoid dsp under-runs")
Cc: Stable@vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260402081118.348071-6-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Date: Thu Apr 2 08:11:09 2026 +0000
ASoC: qcom: q6apm-lpass-dai: Fix multiple graph opens
commit 69acc488aaf39d0ddf6c3cf0e47c1873d39919a2 upstream.
As prepare can be called mulitple times, this can result in multiple
graph opens for playback path.
This will result in a memory leaks, fix this by adding a check before
opening.
Fixes: be1fae62cf25 ("ASoC: q6apm-lpass-dai: close graph on prepare errors")
Cc: Stable@vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260402081118.348071-5-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Date: Thu Apr 2 08:11:07 2026 +0000
ASoC: qcom: q6apm: remove child devices when apm is removed
commit 4a0e1bcc98f7281d1605768bd2fe71eacc34f9b7 upstream.
looks like q6apm driver does not remove the child driver q6apm-dai and
q6apm-bedais when the this driver is removed.
Fix this by depopulating them in remove callback.
With this change when the dsp is shutdown all the devices associated with
q6apm will now be removed.
Fixes: 5477518b8a0e ("ASoC: qdsp6: audioreach: add q6apm support")
Cc: Stable@vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260402081118.348071-3-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mark Brown <broonie@kernel.org>
Date: Thu Mar 26 14:52:41 2026 +0000
ASoC: SOF: Don't allow pointer operations on unconfigured streams
commit c5b6285aae050ff1c3ea824ca3d88ac4be1e69c8 upstream.
When reporting the pointer for a compressed stream we report the current
I/O frame position by dividing the position by the number of channels
multiplied by the number of container bytes. These values default to 0 and
are only configured as part of setting the stream parameters so this allows
a divide by zero to be configured. Validate that they are non zero,
returning an error if not
Fixes: c1a731c71359 ("ASoC: SOF: compress: Add support for computing timestamps")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260326-asoc-compress-tstamp-params-v1-1-3dc735b3d599@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Naman Jain <namjain@linux.microsoft.com>
Date: Fri Apr 10 15:34:13 2026 +0000
block: add pgmap check to biovec_phys_mergeable
commit 13920e4b7b784b40cf4519ff1f0f3e513476a499 upstream.
biovec_phys_mergeable() is used by the request merge, DMA mapping,
and integrity merge paths to decide if two physically contiguous
bvec segments can be coalesced into one. It currently has no check
for whether the segments belong to different dev_pagemaps.
When zone device memory is registered in multiple chunks, each chunk
gets its own dev_pagemap. A single bio can legitimately contain
bvecs from different pgmaps -- iov_iter_extract_bvecs() breaks at
pgmap boundaries but the outer loop in bio_iov_iter_get_pages()
continues filling the same bio. If such bvecs are physically
contiguous, biovec_phys_mergeable() will coalesce them, making it
impossible to recover the correct pgmap for the merged segment
via page_pgmap().
Add a zone_device_pages_have_same_pgmap() check to prevent merging
bvec segments that span different pgmaps.
Fixes: 49580e690755 ("block: add check when merging zone device pages")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Link: https://patch.msgid.link/20260410153414.4159050-2-namjain@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Fri Feb 27 22:19:44 2026 +0900
block: fix zone write plug removal
commit b7d4ffb510373cc6ecf16022dd0e510a023034fb upstream.
Commit 7b295187287e ("block: Do not remove zone write plugs still in
use") modified disk_should_remove_zone_wplug() to add a check on the
reference count of a zone write plug to prevent removing zone write
plugs from a disk hash table when the plugs are still being referenced
by BIOs or requests in-flight. However, this check does not take into
account that a BIO completion may happen right after its submission by
a zone write plug BIO work, and before the zone write plug BIO work
releases the zone write plug reference count. This situation leads to
disk_should_remove_zone_wplug() returning false as in this case the zone
write plug reference count is at least equal to 3. If the BIO that
completes in such manner transitioned the zone to the FULL condition,
the zone write plug for the FULL zone will remain in the disk hash
table.
Furthermore, relying on a particular value of a zone write plug
reference count to set the BLK_ZONE_WPLUG_UNHASHED flag is fragile as
reading the atomic reference count and doing a comparison with some
value is not overall atomic at all.
Address these issues by reworking the reference counting of zone write
plugs so that removing plugs from a disk hash table can be done
directly from disk_put_zone_wplug() when the last reference on a plug
is dropped.
To do so, replace the function disk_remove_zone_wplug() with
disk_mark_zone_wplug_dead(). This new function sets the zone write plug
flag BLK_ZONE_WPLUG_DEAD (which replaces BLK_ZONE_WPLUG_UNHASHED) and
drops the initial reference on the zone write plug taken when the plug
was added to the disk hash table. This function is called either for
zones that are empty or full, or directly in the case of a forced plug
removal (e.g. when the disk hash table is being destroyed on disk
removal). With this change, disk_should_remove_zone_wplug() is also
removed.
disk_put_zone_wplug() is modified to call the function
disk_free_zone_wplug() to remove a zone write plug from a disk hash
table and free the plug structure (with a call_rcu()), when the last
reference on a zone write plug is dropped. disk_free_zone_wplug()
always checks that the BLK_ZONE_WPLUG_DEAD flag is set.
In order to avoid having multiple zone write plugs for the same zone in
the disk hash table, disk_get_and_lock_zone_wplug() checked for the
BLK_ZONE_WPLUG_UNHASHED flag. This check is removed and a check for
the new BLK_ZONE_WPLUG_DEAD flag is added to
blk_zone_wplug_handle_write(). With this change, we continue preventing
adding multiple zone write plugs for the same zone and at the same time
re-inforce checks on the user behavior by failing new incoming write
BIOs targeting a zone that is marked as dead. This case can happen only
if the user erroneously issues write BIOs to zones that are full, or to
zones that are currently being reset or finished.
Fixes: 7b295187287e ("block: Do not remove zone write plugs still in use")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon May 4 08:34:32 2026 -0600
block: only read from sqe on initial invocation of blkdev_uring_cmd()
commit 212ec34e4e726e8cd4af7bea4740db24de8a9dab upstream.
This passthrough helper currently only supports discards. Part of that
command is the start and length, which is read from the SQE. It does
so on every invocation, where it really should just make it stable
on the first invocation. This avoids needing to copy the SQE upfront,
as we only really need those two 8b values stored in our per-req
payload.
Cc: stable@vger.kernel.org # 6.17+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tristan Madani <tristan@talencesecurity.com>
Date: Tue Apr 21 11:14:54 2026 +0000
Bluetooth: btmtk: validate WMT event SKB length before struct access
commit 634a4408c0615c523cf7531790f4f14a422b9206 upstream.
btmtk_usb_hci_wmt_sync() casts the WMT event response SKB data to
struct btmtk_hci_wmt_evt (7 bytes) and struct btmtk_hci_wmt_evt_funcc
(9 bytes) without first checking that the SKB contains enough data.
A short firmware response causes out-of-bounds reads from SKB tailroom.
Use skb_pull_data() to validate and advance past the base WMT event
header. For the FUNC_CTRL case, pull the additional status field bytes
before accessing them.
Fixes: d019930b0049 ("Bluetooth: btmtk: move btusb_mtk_hci_wmt_sync to btmtk.c")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Carlier <devnexen@gmail.com>
Date: Sun Apr 12 21:29:16 2026 +0100
Bluetooth: hci_conn: fix potential UAF in create_big_sync
commit 0beddb0c380bed5f5b8e61ddbe14635bb73d0b41 upstream.
Add hci_conn_valid() check in create_big_sync() to detect stale
connections before proceeding with BIG creation. Handle the
resulting -ECANCELED in create_big_complete() and re-validate the
connection under hci_dev_lock() before dereferencing, matching the
pattern used by create_le_conn_complete() and create_pa_complete().
Keep the hci_conn object alive across the async boundary by taking
a reference via hci_conn_get() when queueing create_big_sync(), and
dropping it in the completion callback. The refcount and the lock
are complementary: the refcount keeps the object allocated, while
hci_dev_lock() serializes hci_conn_hash_del()'s list_del_rcu() on
hdev->conn_hash, as required by hci_conn_del().
hci_conn_put() is called outside hci_dev_unlock() so the final put
(which resolves to kfree() via bt_link_release) does not run under
hdev->lock, though the release path would be safe either way.
Without this, create_big_complete() would unconditionally
dereference the conn pointer on error, causing a use-after-free
via hci_connect_cfm() and hci_conn_del().
Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections")
Cc: stable@vger.kernel.org
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date: Fri Apr 10 15:29:52 2026 -0400
Bluetooth: hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
commit 5ddb8014261137cadaf83ab5617a588d80a22586 upstream.
hci_le_create_big_complete_evt() iterates over BT_BOUND connections for
a BIG handle using a while loop, accessing ev->bis_handle[i++] on each
iteration. However, there is no check that i stays within ev->num_bis
before the array access.
When a controller sends a LE_Create_BIG_Complete event with fewer
bis_handle entries than there are BT_BOUND connections for that BIG,
or with num_bis=0, the loop reads beyond the valid bis_handle[] flex
array into adjacent heap memory. Since the out-of-bounds values
typically exceed HCI_CONN_HANDLE_MAX (0x0EFF), hci_conn_set_handle()
rejects them and the connection remains in BT_BOUND state. The same
connection is then found again by hci_conn_hash_lookup_big_state(),
creating an infinite loop with hci_dev_lock held.
Fix this by terminating the BIG if in case not all BIS could be setup
properly.
Fixes: a0bfde167b50 ("Bluetooth: ISO: Add support for connecting multiple BISes")
Cc: stable@vger.kernel.org
Signed-off-by: ZhiTao Ou <hkbinbinbin@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Siwei Zhang <oss@fourdim.xyz>
Date: Wed Apr 15 16:53:36 2026 -0400
Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
commit 78a88d43dab8d23aeef934ed8ce34d40e6b3d613 upstream.
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().
Fixes: 8d836d71e222 ("Bluetooth: Access sk_sndtimeo indirectly in l2cap_core.c")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Siwei Zhang <oss@fourdim.xyz>
Date: Wed Apr 15 16:49:59 2026 -0400
Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
commit 0a120d96166301d7a95be75b52f843837dbd1219 upstream.
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().
Fixes: 80808e431e1e ("Bluetooth: Add l2cap_chan_ops abstraction")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Siwei Zhang <oss@fourdim.xyz>
Date: Wed Apr 15 16:51:36 2026 -0400
Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
commit 2ff1a41a912de8517b4482e946dd951b7d80edbf upstream.
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().
Fixes: 89bc500e41fc ("Bluetooth: Add state tracking to struct l2cap_chan")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Tue Apr 21 13:08:44 2026 -0400
Bluetooth: virtio_bt: clamp rx length before skb_put
commit 21bd244b6de5d2fe1063c23acc93fbdd2b20d112 upstream.
virtbt_rx_work() calls skb_put(skb, len) where len comes directly
from virtqueue_get_buf() with no validation against the buffer we
posted to the device. The RX skb is allocated in virtbt_add_inbuf()
and exposed to virtio as exactly 1000 bytes via sg_init_one().
Checking len against skb_tailroom(skb) is not sufficient because
alloc_skb() can leave more tailroom than the 1000 bytes actually
handed to the device. A malicious or buggy backend can therefore
report used.len between 1001 and skb_tailroom(skb), causing skb_put()
to include uninitialized kernel heap bytes that were never written by
the device.
The same path also accepts len == 0, in which case skb_put(skb, 0)
leaves the skb empty but virtbt_rx_handle() still reads the pkt_type
byte from skb->data, consuming uninitialized memory.
Define VIRTBT_RX_BUF_SIZE once and reuse it in alloc_skb() and
sg_init_one(), and gate virtbt_rx_work() on that same constant so
the bound checked matches the buffer actually exposed to the device.
Reject used.len == 0 in the same gate so an empty completion can
no longer reach virtbt_rx_handle().
Use bt_dev_err_ratelimited() because the length value comes from an
untrusted backend that can otherwise flood the kernel log.
Same class of bug as commit c04db81cd028 ("net/9p: Fix buffer
overflow in USB transport layer"), which hardened the USB 9p
transport against unchecked device-reported length.
Fixes: 160fbcf3bfb9 ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Tue Apr 21 13:08:45 2026 -0400
Bluetooth: virtio_bt: validate rx pkt_type header length
commit daf23014e5d975e72ea9c02b5160d3fcf070ea47 upstream.
virtbt_rx_handle() reads the leading pkt_type byte from the RX skb
and forwards the remainder to hci_recv_frame() for every
event/ACL/SCO/ISO type, without checking that the remaining payload
is at least the fixed HCI header for that type.
After the preceding patch bounds the backend-supplied used.len to
[1, VIRTBT_RX_BUF_SIZE], a one-byte completion still reaches
hci_recv_frame() with skb->len already pulled to 0. If the byte
happened to be HCI_ACLDATA_PKT, the ACL-vs-ISO classification
fast-path in hci_dev_classify_pkt_type() dereferences
hci_acl_hdr(skb)->handle whenever the HCI device has an active
CIS_LINK, BIS_LINK, or PA_LINK connection, reading two bytes of
uninitialized RX-buffer data. The same hazard exists for every
packet type the driver accepts because none of the switch cases in
virtbt_rx_handle() check skb->len against the per-type minimum HCI
header size before handing the frame to the core.
After stripping pkt_type, require skb->len to cover the fixed
header size for the selected type (event 2, ACL 4, SCO 3, ISO 4)
before calling hci_recv_frame(); drop ratelimited otherwise.
Unknown pkt_type values still take the original kfree_skb() default
path.
Use bt_dev_err_ratelimited() because both the length and pkt_type
values come from an untrusted backend that can otherwise flood the
kernel log.
Fixes: 160fbcf3bfb9 ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Alexei Starovoitov <ast@kernel.org>
Date: Mon Apr 13 12:42:45 2026 -0700
bpf: Fix use-after-free in arena_vm_close on fork
commit 4fddde2a732de60bb97e3307d4eb69ac5f1d2b74 upstream.
arena_vm_open() only bumps vml->mmap_count but never registers the
child VMA in arena->vma_list. The vml->vma always points at the
parent VMA, so after parent munmap the pointer dangles. If the child
then calls bpf_arena_free_pages(), zap_pages() reads the stale
vml->vma triggering use-after-free.
Fix this by preventing the arena VMA from being inherited across
fork with VM_DONTCOPY, and preventing VMA splits via the may_split
callback.
Also reject mremap with a .mremap callback returning -EINVAL. A
same-size mremap(MREMAP_FIXED) on the full arena VMA reaches
copy_vma() through the following path:
check_prep_vma() - returns 0 early: new_len == old_len
skips VM_DONTEXPAND check
prep_move_vma() - vm_start == old_addr and
vm_end == old_addr + old_len
so may_split is never called
move_vma()
copy_vma_and_data()
copy_vma()
vm_area_dup() - copies vm_private_data (vml pointer)
vm_ops->open() - bumps vml->mmap_count
vm_ops->mremap() - returns -EINVAL, rollback unmaps new VMA
The refcount ensures the rollback's arena_vm_close does not free
the vml shared with the original VMA.
Reported-by: Weiming Shi <bestswngs@gmail.com>
Reported-by: Xiang Mei <xmei5@asu.edu>
Fixes: 317460317a02 ("bpf: Introduce bpf_arena.")
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260413194245.21449-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Qu Wenruo <wqu@suse.com>
Date: Mon Feb 16 13:19:38 2026 +1030
btrfs: do not mark inode incompressible after inline attempt fails
commit 2e0e3716c7b6f8d71df2fbe709b922e54700f71b upstream.
[BUG]
The following sequence will set the file with nocompress flag:
# mkfs.btrfs -f $dev
# mount $dev $mnt -o max_inline=4,compress
# xfs_io -f -c "pwrite 0 2k" -c sync $mnt/foobar
The inode will have NOCOMPRESS flag, even if the content itself (all 0xcd)
can still be compressed very well:
item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
generation 9 transid 10 size 2097152 nbytes 1052672
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 257 flags 0x8(NOCOMPRESS)
Please note that, this behavior is there even before commit 59615e2c1f63
("btrfs: reject single block sized compression early").
[CAUSE]
At compress_file_range(), after btrfs_compress_folios() call, we try
making an inlined extent by calling cow_file_range_inline().
But cow_file_range_inline() calls can_cow_file_range_inline() which has
more accurate checks on if the range can be inlined.
One of the user configurable conditions is the "max_inline=" mount
option. If that value is set low (like the example, 4 bytes, which
cannot store any header), or the compressed content is just slightly
larger than 2K (the default value, meaning a 50% compression ratio),
cow_file_range_inline() will return 1 immediately.
And since we're here only to try inline the compressed data, the range
is no larger than a single fs block.
Thus compression is never going to make it a win, we fall back to
marking the inode incompressible unavoidably.
[FIX]
Just add an extra check after inline attempt, so that if the inline
attempt failed, do not set the nocompress flag.
As there is no way to remove that flag, and the default 50% compression
ratio is way too strict for the whole inode.
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yochai Eisenrich <yochaie@sweet.security>
Date: Sun Mar 22 08:39:35 2026 +0200
btrfs: fix btrfs_ioctl_space_info() slot_count TOCTOU which can lead to info-leak
commit 973e57c726c1f8e77259d1c8e519519f1e9aea77 upstream.
btrfs_ioctl_space_info() has a TOCTOU race between two passes over the
block group RAID type lists. The first pass counts entries to determine
the allocation size, then the second pass fills the buffer. The
groups_sem rwlock is released between passes, allowing concurrent block
group removal to reduce the entry count.
When the second pass fills fewer entries than the first pass counted,
copy_to_user() copies the full alloc_size bytes including trailing
uninitialized kmalloc bytes to userspace.
Fix by copying only total_spaces entries (the actually-filled count from
the second pass) instead of alloc_size bytes, and switch to kzalloc so
any future copy size mismatch cannot leak heap data.
Fixes: 7fde62bffb57 ("Btrfs: buffer results in the space_info ioctl")
CC: stable@vger.kernel.org # 3.0
Signed-off-by: Yochai Eisenrich <echelonh@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guangshuo Li <lgs201920130244@gmail.com>
Date: Wed Apr 1 18:56:19 2026 +0800
btrfs: fix double free in create_space_info() error path
commit 3f487be81292702a59ea9dbc4088b3360a50e837 upstream.
When kobject_init_and_add() fails, the call chain is:
create_space_info()
-> btrfs_sysfs_add_space_info_type()
-> kobject_init_and_add()
-> failure
-> kobject_put(&space_info->kobj)
-> space_info_release()
-> kfree(space_info)
Then control returns to create_space_info():
btrfs_sysfs_add_space_info_type() returns error
-> goto out_free
-> kfree(space_info)
This causes a double free.
Keep the direct kfree(space_info) for the earlier failure path, but
after btrfs_sysfs_add_space_info_type() has called kobject_put(), let
the kobject release callback handle the cleanup.
Fixes: a11224a016d6d ("btrfs: fix memory leaks in create_space_info() error paths")
CC: stable@vger.kernel.org # 6.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guangshuo Li <lgs201920130244@gmail.com>
Date: Wed Apr 1 19:02:19 2026 +0800
btrfs: fix double free in create_space_info_sub_group() error path
commit a7449edf96143f192606ec8647e3167e1ecbd728 upstream.
When kobject_init_and_add() fails, the call chain is:
create_space_info_sub_group()
-> btrfs_sysfs_add_space_info_type()
-> kobject_init_and_add()
-> failure
-> kobject_put(&sub_group->kobj)
-> space_info_release()
-> kfree(sub_group)
Then control returns to create_space_info_sub_group(), where:
btrfs_sysfs_add_space_info_type() returns error
-> kfree(sub_group)
Thus, sub_group is freed twice.
Keep parent->sub_group[index] = NULL for the failure path, but after
btrfs_sysfs_add_space_info_type() has called kobject_put(), let the
kobject release callback handle the cleanup.
Fixes: f92ee31e031c ("btrfs: introduce btrfs_space_info sub-group")
CC: stable@vger.kernel.org # 6.18+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Filipe Manana <fdmanana@suse.com>
Date: Thu Apr 9 15:46:51 2026 +0100
btrfs: fix missing last_unlink_trans update when removing a directory
commit 999757231c49376cd1a37308d2c8c4c9932571e1 upstream.
When removing a directory we are not updating its last_unlink_trans field,
which can result in incorrect fsync behaviour in case some one fsyncs the
directory after it was removed because it's holding a file descriptor on
it.
Example scenario:
mkdir /mnt/dir1
mkdir /mnt/dir1/dir2
mkdir /mnt/dir3
sync -f /mnt
# Do some change to the directory and fsync it.
chmod 700 /mnt/dir1
xfs_io -c fsync /mnt/dir1
# Move dir2 out of dir1 so that dir1 becomes empty.
mv /mnt/dir1/dir2 /mnt/dir3/
open fd on /mnt/dir1
call rmdir(2) on path "/mnt/dir1"
fsync fd
<trigger power failure>
When attempting to mount the filesystem, the log replay will fail with
an -EIO error and dmesg/syslog has the following:
[445771.626482] BTRFS info (device dm-0): first mount of filesystem 0368bbea-6c5e-44b5-b409-09abe496e650
[445771.626486] BTRFS info (device dm-0): using crc32c checksum algorithm
[445771.627912] BTRFS info (device dm-0): start tree-log replay
[445771.628335] page: refcount:2 mapcount:0 mapping:0000000061443ddc index:0x1d00 pfn:0x7072a5
[445771.629453] memcg:ffff89f400351b00
[445771.629892] aops:btree_aops [btrfs] ino:1
[445771.630737] flags: 0x17fffc00000402a(uptodate|lru|private|writeback|node=0|zone=2|lastcpupid=0x1ffff)
[445771.632359] raw: 017fffc00000402a fffff47284d950c8 fffff472907b7c08 ffff89f458e412b8
[445771.633713] raw: 0000000000001d00 ffff89f6c51d1a90 00000002ffffffff ffff89f400351b00
[445771.635029] page dumped because: eb page dump
[445771.635825] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=10 ino=258, invalid nlink: has 2 expect no more than 1 for dir
[445771.638088] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14878 owner 5
[445771.638091] BTRFS info (device dm-0): refs 4 lock_owner 0 current 3581087
[445771.638094] item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
[445771.638097] inode generation 3 transid 9 size 16 nbytes 16384
[445771.638098] block group 0 mode 40755 links 1 uid 0 gid 0
[445771.638100] rdev 0 sequence 2 flags 0x0
[445771.638102] atime 1775744884.0
[445771.660056] ctime 1775744885.645502983
[445771.660058] mtime 1775744885.645502983
[445771.660060] otime 1775744884.0
[445771.660062] item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
[445771.660064] index 0 name_len 2
[445771.660066] item 2 key (256 DIR_ITEM 1843588421) itemoff 16077 itemsize 34
[445771.660068] location key (259 1 0) type 2
[445771.660070] transid 9 data_len 0 name_len 4
[445771.660075] item 3 key (256 DIR_ITEM 2363071922) itemoff 16043 itemsize 34
[445771.660076] location key (257 1 0) type 2
[445771.660077] transid 9 data_len 0 name_len 4
[445771.660078] item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34
[445771.660079] location key (257 1 0) type 2
[445771.660080] transid 9 data_len 0 name_len 4
[445771.660081] item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34
[445771.660082] location key (259 1 0) type 2
[445771.660083] transid 9 data_len 0 name_len 4
[445771.660084] item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160
[445771.660086] inode generation 9 transid 9 size 8 nbytes 0
[445771.660087] block group 0 mode 40777 links 1 uid 0 gid 0
[445771.660088] rdev 0 sequence 2 flags 0x0
[445771.660089] atime 1775744885.641174097
[445771.660090] ctime 1775744885.645502983
[445771.660091] mtime 1775744885.645502983
[445771.660105] otime 1775744885.641174097
[445771.660106] item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14
[445771.660107] index 2 name_len 4
[445771.660108] item 8 key (257 DIR_ITEM 2676584006) itemoff 15767 itemsize 34
[445771.660109] location key (258 1 0) type 2
[445771.660110] transid 9 data_len 0 name_len 4
[445771.660111] item 9 key (257 DIR_INDEX 2) itemoff 15733 itemsize 34
[445771.660112] location key (258 1 0) type 2
[445771.660113] transid 9 data_len 0 name_len 4
[445771.660114] item 10 key (258 INODE_ITEM 0) itemoff 15573 itemsize 160
[445771.660115] inode generation 9 transid 10 size 0 nbytes 0
[445771.660116] block group 0 mode 40755 links 2 uid 0 gid 0
[445771.660117] rdev 0 sequence 0 flags 0x0
[445771.660118] atime 1775744885.645502983
[445771.660119] ctime 1775744885.645502983
[445771.660120] mtime 1775744885.645502983
[445771.660121] otime 1775744885.645502983
[445771.660122] item 11 key (258 INODE_REF 257) itemoff 15559 itemsize 14
[445771.660123] index 2 name_len 4
[445771.660124] item 12 key (258 INODE_REF 259) itemoff 15545 itemsize 14
[445771.660125] index 2 name_len 4
[445771.660126] item 13 key (259 INODE_ITEM 0) itemoff 15385 itemsize 160
[445771.660127] inode generation 9 transid 10 size 8 nbytes 0
[445771.660128] block group 0 mode 40755 links 1 uid 0 gid 0
[445771.660129] rdev 0 sequence 1 flags 0x0
[445771.660130] atime 1775744885.645502983
[445771.660130] ctime 1775744885.645502983
[445771.660131] mtime 1775744885.645502983
[445771.660132] otime 1775744885.645502983
[445771.660133] item 14 key (259 INODE_REF 256) itemoff 15371 itemsize 14
[445771.660134] index 3 name_len 4
[445771.660135] item 15 key (259 DIR_ITEM 2676584006) itemoff 15337 itemsize 34
[445771.660136] location key (258 1 0) type 2
[445771.660137] transid 10 data_len 0 name_len 4
[445771.660138] item 16 key (259 DIR_INDEX 2) itemoff 15303 itemsize 34
[445771.660139] location key (258 1 0) type 2
[445771.660140] transid 10 data_len 0 name_len 4
[445771.660144] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected
[445771.661650] ------------[ cut here ]------------
[445771.662358] WARNING: fs/btrfs/disk-io.c:326 at btree_csum_one_bio+0x217/0x230 [btrfs], CPU#8: mount/3581087
[445771.663588] Modules linked in: btrfs f2fs xfs (...)
[445771.671229] CPU: 8 UID: 0 PID: 3581087 Comm: mount Tainted: G W 7.0.0-rc6-btrfs-next-230+ #2 PREEMPT(full)
[445771.672575] Tainted: [W]=WARN
[445771.672987] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[445771.674460] RIP: 0010:btree_csum_one_bio+0x217/0x230 [btrfs]
[445771.675222] Code: 89 44 24 (...)
[445771.677364] RSP: 0018:ffffd23882247660 EFLAGS: 00010246
[445771.678029] RAX: 0000000000000000 RBX: ffff89f6c51d1a90 RCX: 0000000000000000
[445771.678975] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff89f406020000
[445771.679983] RBP: ffff89f821204000 R08: 0000000000000000 R09: 00000000ffefffff
[445771.680905] R10: ffffd23882247448 R11: 0000000000000003 R12: ffffd23882247668
[445771.681978] R13: ffff89f458e40fc0 R14: ffff89f737f4f500 R15: ffff89f737f4f500
[445771.682912] FS: 00007f0447a98840(0000) GS:ffff89fb9771d000(0000) knlGS:0000000000000000
[445771.684393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[445771.685230] CR2: 00007f0447bf1330 CR3: 000000017cb02002 CR4: 0000000000370ef0
[445771.686273] Call Trace:
[445771.686646] <TASK>
[445771.686969] btrfs_submit_bbio+0x83f/0x860 [btrfs]
[445771.687750] ? write_one_eb+0x28f/0x340 [btrfs]
[445771.688428] btree_writepages+0x2e3/0x550 [btrfs]
[445771.689180] ? kmem_cache_alloc_noprof+0x12a/0x490
[445771.689963] ? alloc_extent_state+0x19/0x120 [btrfs]
[445771.690801] ? kmem_cache_free+0x135/0x380
[445771.691328] ? preempt_count_add+0x69/0xa0
[445771.691831] ? set_extent_bit+0x252/0x8e0 [btrfs]
[445771.692468] ? xas_load+0x9/0xc0
[445771.692873] ? xas_find+0x14d/0x1a0
[445771.693304] do_writepages+0xc6/0x160
[445771.693756] filemap_writeback+0xb8/0xe0
[445771.694274] btrfs_write_marked_extents+0x61/0x170 [btrfs]
[445771.694999] btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs]
[445771.695818] btrfs_commit_transaction+0x5c8/0xd10 [btrfs]
[445771.696530] ? kmem_cache_free+0x135/0x380
[445771.697120] ? release_extent_buffer+0x34/0x160 [btrfs]
[445771.697786] btrfs_recover_log_trees+0x7be/0x7e0 [btrfs]
[445771.698525] ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
[445771.699206] open_ctree+0x11e5/0x1810 [btrfs]
[445771.699776] btrfs_get_tree.cold+0xb/0x162 [btrfs]
[445771.700463] ? fscontext_read+0x165/0x180
[445771.701146] ? rw_verify_area+0x50/0x180
[445771.701866] vfs_get_tree+0x25/0xd0
[445771.702491] vfs_cmd_create+0x59/0xe0
[445771.703125] __do_sys_fsconfig+0x303/0x610
[445771.703603] do_syscall_64+0xe9/0xf20
[445771.703974] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[445771.704700] RIP: 0033:0x7f0447cbd4aa
[445771.705108] Code: 73 01 c3 (...)
[445771.707263] RSP: 002b:00007ffc4e528318 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
[445771.708107] RAX: ffffffffffffffda RBX: 00005561585d8c20 RCX: 00007f0447cbd4aa
[445771.708931] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
[445771.709744] RBP: 00005561585d9120 R08: 0000000000000000 R09: 0000000000000000
[445771.710674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[445771.711477] R13: 00007f0447e4f580 R14: 00007f0447e5126c R15: 00007f0447e36a23
[445771.712277] </TASK>
[445771.712541] ---[ end trace 0000000000000000 ]---
[445771.713382] BTRFS error (device dm-0): error while writing out transaction: -5
[445771.714679] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
[445771.715562] BTRFS error (device dm-0 state A): Transaction aborted (error -5)
[445771.716459] BTRFS: error (device dm-0 state A) in cleanup_transaction:2068: errno=-5 IO failure
[445771.717936] BTRFS error (device dm-0 state EA): failed to recover log trees with error: -5
[445771.719681] BTRFS error (device dm-0 state EA): open_ctree failed: -5
The problem is that such a fsync should have result in a fallback to a
transaction commit, but that did not happen because through the
btrfs_rmdir() we never update the directory's last_unlink_trans field.
Any inode that had a link removed must have its last_unlink_trans updated
to the ID of transaction used for the operation, otherwise fsync and log
replay will not work correctly.
btrfs_rmdir() calls btrfs_unlink_inode() and through that call chain we
never call btrfs_record_unlink_dir() in order to update last_unlink_trans.
However btrfs_unlink(), which is used for unlinking regular files, calls
btrfs_record_unlink_dir() and then calls btrfs_unlink_inode(). So fix
this by moving the call to btrfs_record_unlink_dir() from btrfs_unlink()
to btrfs_unlink_inode().
A test case for fstests will follow soon.
Reported-by: Slava0135 <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAAJYhww5ov62Hm+n+tmhcL-e_4cBobg+OWogKjOJxVUXivC=MQ@mail.gmail.com/
CC: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shyam Prasad N <sprasad@microsoft.com>
Date: Tue Apr 28 21:37:47 2026 +0530
cifs: abort open_cached_dir if we don't request leases
commit d68ce834f8cf6cb2e77f3331df65166b35466b53 upstream.
It is possible that SMB2_open_init may not set lease context based
on the requested oplock level. This can happen when leases have been
temporarily or permanently disabled. When this happens, we will have
open_cached_dir making an open without lease context and the response
will anyway be rejected by open_cached_dir (thereby forcing a close to
discard this open). That's unnecessary two round-trips to the server.
This change adds a check before making the open request to the server
to make sure that SMB2_open_init did add the expected lease context
to the open in open_cached_dir.
Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shyam Prasad N <sprasad@microsoft.com>
Date: Mon Mar 30 16:19:59 2026 +0530
cifs: change_conf needs to be called for session setup
commit c208a2b95811d6e1ebae65d0d2fc13f73707f8e7 upstream.
Today we skip calling change_conf for negotiates and session setup
requests. This can be a problem for mchan as the immediate next call
after session setup could be due to an I/O that is made on the
mount point. For single channel, this is not a problem as
there will be several calls after setting up session.
This change enforces calling change_conf when the total credits contain
enough for reservations for echoes and oplocks. We expect this to happen
during the last session setup response. This way, echoes and oplocks are
not disabled before the first request to the server. So if that first
request is an open, it does not need to disable requesting leases.
Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Date: Thu Feb 12 16:57:50 2026 +0800
clk: imx: imx8-acm: fix flags for acm clocks
commit f2c2fc93b4a3efdfcf3805ab74741826d343ff2c upstream.
Currently, the flags for the ACM clocks are set to 0. This configuration
causes the fsl-sai audio driver to fail when attempting to set the
sysclk, returning an EINVAL error. The following error messages
highlight the issue:
fsl-sai 59090000.sai: ASoC: error at snd_soc_dai_set_sysclk on 59090000.sai: -22
imx-hdmi sound-hdmi: failed to set cpu sysclk: -22
By setting the flag CLK_SET_RATE_NO_REPARENT, we signal that the ACM
driver does not support reparenting and instead relies on the clock tree
as defined in the device tree. This change resolves the issue with the
fsl-sai audio driver.
CC: stable@vger.kernel.org
Fixes: d3a0946d7ac9 ("clk: imx: imx8: add audio clock mux driver")
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Link: https://patch.msgid.link/20260212085750.3253187-1-shengjiu.wang@nxp.com
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Conor Dooley <conor.dooley@microchip.com>
Date: Tue Feb 24 09:35:25 2026 +0000
clk: microchip: mpfs-ccc: fix out of bounds access during output registration
commit 2f7ae8ab6aa73daaf080d5332110357c29df9c36 upstream.
UBSAN reported an out of bounds access during registration of the last
two outputs. This out of bounds access occurs because space is only
allocated in the hws array for two PLLs and the four output dividers
that each has, but the defined IDs contain two DLLS and their two
outputs each, which are not supported by the driver. The ID order is
PLLs -> DLLs -> PLL outputs -> DLL outputs. Decrement the PLL output IDs
by two while adding them to the array to avoid the problem.
Fixes: d39fb172760e ("clk: microchip: add PolarFire SoC fabric clock support")
CC: stable@vger.kernel.org
Reviewed-by: Brian Masney <bmasney@redhat.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Tue Apr 7 11:50:27 2026 +0200
clk: rk808: fix OF node reference imbalance
commit de019f203b0d472c98ead4081ad4f05d92c9b826 upstream.
The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.
Fix this by using the intended helper for reusing OF nodes.
Fixes: 2dc51ca822e4 ("clk: RK808: Reduce 'struct rk808' usage")
Cc: stable@vger.kernel.org # 6.5
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shrikanth Hegde <sshegde@linux.ibm.com>
Date: Wed Mar 11 11:47:09 2026 +0530
cpuidle: powerpc: avoid double clear when breaking snooze
commit 64ed1e3e728afb57ba9acb59e69de930ead847d9 upstream.
snooze_loop is done often in any system which has fair bit of
idle time. So it qualifies for even micro-optimizations.
When breaking the snooze due to timeout, TIF_POLLING_NRFLAG is cleared
twice. Clearing the bit invokes atomics. Avoid double clear and thereby
avoid one atomic write.
dev->poll_time_limit indicates whether the loop was broken due to
timeout. Use that instead of defining a new variable.
Fixes: 7ded429152e8 ("cpuidle: powerpc: no memory barrier after break from idle")
Cc: stable@vger.kernel.org
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260311061709.1230440-1-sshegde@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Sat May 9 14:59:29 2026 -0400
crypto: caam - guard HMAC key hex dumps in hash_digest_key
[ Upstream commit 177730a273b18e195263ed953853273e901b5064 ]
Use print_hex_dump_devel() for dumping sensitive HMAC key bytes in
hash_digest_key() to avoid leaking secrets at runtime when
CONFIG_DYNAMIC_DEBUG is enabled.
Fixes: 045e36780f11 ("crypto: caam - ahash hmac support")
Fixes: 3f16f6c9d632 ("crypto: caam/qi2 - add support for ahash algorithms")
Cc: stable@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Date: Tue May 5 06:17:38 2026 -0400
crypto: qat - fix firmware loading failure for GEN6 devices
[ Upstream commit e7dcb722bb75bb3f3992f580a8728a794732fd7a ]
QAT GEN6 hardware requires a minimum 3 us delay during the acceleration
engine reset sequence to ensure the hardware fully settles.
Without this delay, the firmware load may fail intermittently.
Add a delay after placing the AE into reset and before clearing the reset,
matching the hardware requirements and ensuring stable firmware loading.
Earlier generations remain unaffected.
Fixes: 17fd7514ae68 ("crypto: qat - add qat_6xxx driver")
Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Date: Tue May 5 06:17:37 2026 -0400
crypto: qat - fix indentation of macros in qat_hal.c
[ Upstream commit 4963b39e3a3feed07fbf4d5cc2b5df8498888285 ]
The macros in qat_hal.c were using a mixture of tabs and spaces.
Update all macro indentation to use tabs consistently, matching the
predominant style.
This does not introduce any functional change.
Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stable-dep-of: e7dcb722bb75 ("crypto: qat - fix firmware loading failure for GEN6 devices")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Mon Apr 20 19:56:44 2026 +0200
dm-thin: fix metadata refcount underflow
commit 09a65adc7d8bbfce06392cb6d375468e2728ead5 upstream.
There's a bug in dm-thin in the function rebalance_children. If the
internal btree node has one entry, the code tries to copy all btree
entries from the node's child to the node itself and then decrement the
child's reference count.
If the child node is shared (it has reference count > 1), we won't free
it, so there would be two pointers to each of the grandchildren nodes.
But the reference counts of the grandchildren is not increased, thus the
reference count doesn't match the number of pointers that point to the
grandchildren. This results in "device mapper: space map common: unable
to decrement block" errors.
Fix this bug by incrementing reference counts on the grandchildren if the
btree node is shared.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 3241b1d3e0aa ("dm: add persistent data library")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Eric Biggers <ebiggers@kernel.org>
Date: Thu Feb 5 20:59:20 2026 -0800
dm-verity-fec: correctly reject too-small FEC devices
commit 2b14e0bb63cc671120e7791658f5c494fc66d072 upstream.
Fix verity_fec_ctr() to reject too-small FEC devices by correctly
computing the number of parity blocks as 'f->rounds * f->roots'.
Previously it incorrectly used 'div64_u64(f->rounds * f->roots,
v->fec->roots << SECTOR_SHIFT)' which is a much smaller value.
Note that the units of 'rounds' are blocks, not bytes. This matches the
units of the value returned by dm_bufio_get_device_size(), which are
also blocks. A later commit will give 'rounds' a clearer name.
Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Eric Biggers <ebiggers@kernel.org>
Date: Thu Feb 5 20:59:21 2026 -0800
dm-verity-fec: correctly reject too-small hash devices
commit 4355142245f7e55336dcc005ec03592df4d546f8 upstream.
Fix verity_fec_ctr() to reject too-small hash devices by correctly
taking hash_start into account.
Note that this is necessary because dm-verity doesn't call
dm_bufio_set_sector_offset() on the hash device's bufio client
(v->bufio). Thus, dm_bufio_get_device_size(v->bufio) returns a size
relative to 0 rather than hash_start. An alternative fix would be to
call dm_bufio_set_sector_offset() on v->bufio, but then all the code
that reads from the hash device would have to be adjusted accordingly.
Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Eric Biggers <ebiggers@kernel.org>
Date: Thu Feb 5 20:59:22 2026 -0800
dm-verity-fec: fix corrected block count stat
commit 48640c88a8ddd482b6456fcbc084b08dd2bac083 upstream.
dm_verity_fec::corrected seems to have been intended to count the number
of corrected blocks. However, it actually counted the number of calls
to fec_decode_bufs() that corrected at least one error. That's not the
same thing. For example, in low-memory situations correcting a single
block can require many calls to fec_decode_bufs().
Fix it to count corrected blocks instead.
Fixes: ae97648e14f7 ("dm verity fec: Expose corrected block count via status")
Cc: Shubhankar Mishra <shubhankarm@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Eric Biggers <ebiggers@kernel.org>
Date: Thu Feb 5 20:59:24 2026 -0800
dm-verity-fec: fix reading parity bytes split across blocks (take 3)
commit 430a05cb926f6bdf53e81460a2c3a553257f3f61 upstream.
fec_decode_bufs() assumes that the parity bytes of the first RS codeword
it decodes are never split across parity blocks.
This assumption is false. Consider v->fec->block_size == 4096 &&
v->fec->roots == 17 && fio->nbufs == 1, for example. In that case, each
call to fec_decode_bufs() consumes v->fec->roots * (fio->nbufs <<
DM_VERITY_FEC_BUF_RS_BITS) = 272 parity bytes.
Considering that the parity data for each message block starts on a
block boundary, the byte alignment in the parity data will iterate
through 272*i mod 4096 until the 3 parity blocks have been consumed. On
the 16th call (i=15), the alignment will be 4080 bytes into the first
block. Only 16 bytes remain in that block, but 17 parity bytes will be
needed. The code reads out-of-bounds from the parity block buffer.
Fortunately this doesn't normally happen, since it can occur only for
certain non-default values of fec_roots *and* when the maximum number of
buffers couldn't be allocated due to low memory. For example with
block_size=4096 only the following cases are affected:
fec_roots=17: nbufs in [1, 3, 5, 15]
fec_roots=19: nbufs in [1, 229]
fec_roots=21: nbufs in [1, 3, 5, 13, 15, 39, 65, 195]
fec_roots=23: nbufs in [1, 89]
Regardless, fix it by refactoring how the parity blocks are read.
Fixes: 6df90c02bae4 ("dm-verity FEC: Fix RS FEC repair for roots unaligned to block size (take 2)")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Eric Biggers <ebiggers@kernel.org>
Date: Thu Feb 5 20:59:23 2026 -0800
dm-verity-fec: fix the size of dm_verity_fec_io::erasures
commit a7fca324d7d90f7b139d4d32747c83a629fdb446 upstream.
At most 25 entries in dm_verity_fec_io::erasures are used: the maximum
number of FEC roots plus one. Therefore, set the array size
accordingly. This reduces the size of dm_verity_fec_io by 912 bytes.
Note: a later commit introduces a constant DM_VERITY_FEC_MAX_ROOTS,
which allows the size to be more clearly expressed as
DM_VERITY_FEC_MAX_ROOTS + 1. This commit just fixes the size first.
Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Mon Mar 16 15:04:15 2026 +0100
dm: don't report warning when doing deferred remove
commit b7cce3e2cca9cd78418f3c3784474b778e7996fe upstream.
If dm_hash_remove_all was called from dm_deferred_remove, it would write
a warning "remove_all left %d open device(s)" if there are some other
devices active.
The warning is bogus, so let's disable it in this case.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 2c140a246dc0 ("dm: allow remove to be deferred")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Thu Apr 9 17:49:58 2026 +0200
dm: fix a buffer overflow in ioctl processing
commit 2fa49cc884f6496a915c35621ba4da35649bf159 upstream.
Tony Asleson (using Claude) found a buffer overflow in dm-ioctl in the
function retrieve_status:
1. The code in retrieve_status checks that the output string fits into
the output buffer and writes the output string there
2. Then, the code aligns the "outptr" variable to the next 8-byte
boundary:
outptr = align_ptr(outptr);
3. The alignment doesn't check overflow, so outptr could point past the
buffer end
4. The "for" loop is iterated again, it executes:
remaining = len - (outptr - outbuf);
5. If "outptr" points past "outbuf + len", the arithmetics wraps around
and the variable "remaining" contains unusually high number
6. With "remaining" being high, the code writes more data past the end of
the buffer
Luckily, this bug has no security implications because:
1. Only root can issue device mapper ioctls
2. The commonly used libraries that communicate with device mapper
(libdevmapper and devicemapper-rs) use buffer size that is aligned to
8 bytes - thus, "outptr = align_ptr(outptr)" can't overshoot the input
buffer and the bug can't happen accidentally
Reported-by: Tony Asleson <tasleson@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Bryn M. Reeves <bmr@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Carlier <devnexen@gmail.com>
Date: Sat Apr 18 20:17:37 2026 +0100
eventfs: Hold eventfs_mutex and SRCU when remount walks events
commit 07004a8c4b572171934390148ee48c4175c77eed upstream.
Commit 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the
events descriptor") had eventfs_set_attrs() recurse through ei->children
on remount. The walk only holds the rcu_read_lock() taken by
tracefs_apply_options() over tracefs_inodes, which is wrong:
- list_for_each_entry over ei->children races with the list_del_rcu()
in eventfs_remove_rec() -- LIST_POISON1 deref, same shape as
d2603279c7d6.
- eventfs_inodes are freed via call_srcu(&eventfs_srcu, ...).
rcu_read_lock() does not extend an SRCU grace period, so ti->private
can be reclaimed under the walk.
- The writes to ei->attr race with eventfs_set_attr(), which holds
eventfs_mutex.
Reproducer:
while :; do mount -o remount,uid=$((RANDOM%1000)) /sys/kernel/tracing; done &
while :; do
echo "p:kp submit_bio" > /sys/kernel/tracing/kprobe_events
echo > /sys/kernel/tracing/kprobe_events
done
Wrap the events portion of tracefs_apply_options() in
eventfs_remount_lock()/_unlock() that take eventfs_mutex and
srcu_read_lock(&eventfs_srcu). eventfs_set_attrs() doesn't sleep so the
nested rcu_read_lock() is fine; lockdep_assert_held() pins the contract.
Comment in tracefs_drop_inode() said "RCU cycle" -- it is SRCU.
Fixes: 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the events descriptor")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260418191737.10289-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jann Horn <jannh@google.com>
Date: Mon May 11 08:55:11 2026 -0700
exit: prevent preemption of oopsing TASK_DEAD task
commit c1fa0bb633e4a6b11e83ffc57fa5abe8ebb87891 upstream.
When an already-exiting task oopses, make_task_dead() currently calls
do_task_dead() with preemption enabled. That is forbidden:
do_task_dead() calls __schedule(), which has a comment saying "WARNING:
must be called with preemption disabled!".
If an oopsing task is preempted in do_task_dead(), between becoming
TASK_DEAD and entering the scheduler explicitly, bad things happen:
finish_task_switch() assumes that once the scheduler has switched away
from a TASK_DEAD task, the task can never run again and its stack is no
longer needed; but that assumption apparently doesn't hold if the dead
task was preempted (the SM_PREEMPT case).
This means that the scheduler ends up repeatedly dropping references on
the dead task's stack, which can lead to use-after-free or double-free
of the entire task stack; in other words, two tasks can end up running
on the same stack, resulting in various kinds of memory corruption.
(This does not just affect "recursively oopsing" tasks; it is enough to
oops once during task exit, for example in a file_operations::release
handler)
Fixes: 7f80a2fd7db9 ("exit: Stop poorly open coding do_task_dead in make_task_dead")
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Xu Yang <xu.yang_2@nxp.com>
Date: Sat Nov 15 10:59:05 2025 +0800
extcon: ptn5150: handle pending IRQ events during system resume
commit 4652fefcda3c604c83d1ae28ede94544e2142f06 upstream.
When the system is suspended and ptn5150 wakeup interrupt is disabled,
any changes on ptn5150 will only be record in interrupt status
registers and won't fire an IRQ since its trigger type is falling
edge. So the HW interrupt line will keep at low state and any further
changes won't trigger IRQ anymore. To fix it, this will schedule a
work to check whether any IRQ are pending and handle it accordingly.
Fixes: 4ed754de2d66 ("extcon: Add support for ptn5150 extcon driver")
Cc: stable@vger.kernel.org
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://lore.kernel.org/lkml/20251115025905.1395347-1-xu.yang_2@nxp.com/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Cen Zhang <zzzccc427@gmail.com>
Date: Wed Mar 18 15:32:53 2026 +0800
f2fs: add READ_ONCE() for i_blocks in f2fs_update_inode()
commit 5471834a96fb697874be2ca0b052e74bcf3c23d1 upstream.
f2fs_update_inode() reads inode->i_blocks without holding i_lock to
serialize it to the on-disk inode, while concurrent truncate or
allocation paths may modify i_blocks under i_lock. Since blkcnt_t is
u64, this risks torn reads on 32-bit architectures.
Following the approach in ext4_inode_blocks_set(), add READ_ONCE() to prevent
potential compiler-induced tearing.
Fixes: 19f99cee206c ("f2fs: add core inode operations")
Cc: stable@vger.kernel.org
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Chao Yu <chao@kernel.org>
Date: Fri Mar 6 12:24:20 2026 +0000
f2fs: fix false alarm of lockdep on cp_global_sem lock
commit 6a5e3de9c2bb0b691d16789a5d19e9276a09b308 upstream.
lockdep reported a potential deadlock:
a) TCMU device removal context:
- call del_gendisk() to get q->q_usage_counter
- call start_flush_work() to get work_completion of wb->dwork
b) f2fs writeback context:
- in wb_workfn(), which holds work_completion of wb->dwork
- call f2fs_balance_fs() to get sbi->gc_lock
c) f2fs vfs_write context:
- call f2fs_gc() to get sbi->gc_lock
- call f2fs_write_checkpoint() to get sbi->cp_global_sem
d) f2fs mount context:
- call recover_fsync_data() to get sbi->cp_global_sem
- call f2fs_check_and_fix_write_pointer() to call blkdev_report_zones()
that goes down to blk_mq_alloc_request and get q->q_usage_counter
Original callstack is in Closes tag.
However, I think this is a false alarm due to before mount returns
successfully (context d), we can not access file therein via vfs_write
(context c).
Let's introduce per-sb cp_global_sem_key, and assign the key for
cp_global_sem, so that lockdep can recognize cp_global_sem from
different super block correctly.
A lot of work are done by Shin'ichiro Kawasaki, thanks a lot for
the work.
Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones")
Cc: stable@kernel.org
Reported-and-tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20260218125237.3340441-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Mon Mar 23 20:06:24 2026 +0800
f2fs: fix fiemap boundary handling when read extent cache is incomplete
commit 95e159ad3e52f7478cfd22e44ec37c9f334f8993 upstream.
f2fs_fiemap() calls f2fs_map_blocks() to obtain the block mapping a
file, and then merges contiguous mappings into extents. If the mapping
is found in the read extent cache, node blocks do not need to be read.
However, in the following scenario, a contiguous extent can be split
into two extents:
$ dd if=/dev/zero of=data.128M bs=1M count=128
$ losetup -f data.128M
$ mkfs.f2fs /dev/loop0 -f
$ mount -o mode=lfs /dev/loop0 /mnt/f2fs/
$ cd /mnt/f2fs/
$ dd if=/dev/zero of=data.72M bs=1M count=72 && sync
$ dd if=/dev/zero of=data.4M bs=1M count=4 && sync
$ dd if=/dev/zero of=data.4M bs=1M count=2 seek=2 conv=notrunc && sync
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/zero of=data.4M bs=1M count=2 seek=0 conv=notrunc && sync
$ dd if=/dev/zero of=data.4M bs=1M count=2 seek=0 conv=notrunc && sync
$ f2fs_io fiemap 0 1024 data.4M
Fiemap: offset = 0 len = 1024
logical addr. physical addr. length flags
0 0000000000000000 0000000006400000 0000000000200000 00001000
1 0000000000200000 0000000006600000 0000000000200000 00001001
Although the physical addresses of the ranges 0~2MB and 2M~4MB are
contiguous, the mapping for the 2M~4MB range is not present in memory.
When the physical addresses for the 0~2MB range are updated, no merge
happens because the adjacent mapping is missing from the in-memory
cache. As a result, fiemap reports two separate extents instead of a
single contiguous one.
The root cause is that the read extent cache does not guarantee that all
blocks of an extent are present in memory. Therefore, when the extent
length returned by f2fs_map_blocks_cached() is smaller than maxblocks,
the remaining mappings are retrieved via f2fs_get_dnode_of_data() to
ensure correct fiemap extent boundary handling.
Cc: stable@kernel.org
Fixes: cd8fc5226bef ("f2fs: remove the create argument to f2fs_map_blocks")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Wed Mar 18 16:45:34 2026 +0800
f2fs: fix fsck inconsistency caused by FGGC of node block
commit c3e238bd1f56993f205ef83889d406dfeaf717a8 upstream.
During FGGC node block migration, fsck may incorrectly treat the
migrated node block as fsync-written data.
The reproduction scenario:
root@vm:/mnt/f2fs# seq 1 2048 | xargs -n 1 ./test_sync // write inline inode and sync
root@vm:/mnt/f2fs# rm -f 1
root@vm:/mnt/f2fs# sync
root@vm:/mnt/f2fs# f2fs_io gc_range // move data block in sync mode and not write CP
SPO, "fsck --dry-run" find inode has already checkpointed but still
with DENT_BIT_SHIFT set
The root cause is that GC does not clear the dentry mark and fsync mark
during node block migration, leading fsck to misinterpret them as
user-issued fsync writes.
In BGGC mode, node block migration is handled by f2fs_sync_node_pages(),
which guarantees the dentry and fsync marks are cleared before writing.
This patch move the set/clear of the fsync|dentry marks into
__write_node_folio to make the logic clearer, and ensures the
fsync|dentry mark is cleared in FGGC.
Cc: stable@kernel.org
Fixes: da011cc0da8c ("f2fs: move node pages only in victim section during GC")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Tue Mar 10 17:36:12 2026 +0800
f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage
commit 019f9dda7f66e55eb94cd32e1d3fff5835f73fbc upstream.
f2fs_need_dentry_mark() reads nat_entry flags without mutual exclusion
with the checkpoint path, which can result in an incorrect inode block
marking state. The scenario is as follows:
create & write & fsync 'file A' write checkpoint
- f2fs_do_sync_file // inline inode
- f2fs_write_inode // inode folio is dirty
- f2fs_write_checkpoint
- f2fs_flush_merged_writes
- f2fs_sync_node_pages
- f2fs_fsync_node_pages // no dirty node
- f2fs_need_inode_block_update // return true
- f2fs_fsync_node_pages // inode dirtied
- f2fs_need_dentry_mark //return true
- f2fs_flush_nat_entries
- f2fs_write_checkpoint end
- __write_node_folio // inode with DENT_BIT_SHIFT set
SPO, "fsck --dry-run" find inode has already checkpointed but still
with DENT_BIT_SHIFT set
The state observed by f2fs_need_dentry_mark() can differ from the state
observed in __write_node_folio() after acquiring sbi->node_write. The
root cause is that the semantics of IS_CHECKPOINTED and
HAS_FSYNCED_INODE are only guaranteed after the checkpoint write has
fully completed.
This patch moves set_dentry_mark() into __write_node_folio() and
protects it with the sbi->node_write lock.
Cc: stable@kernel.org
Fixes: 88bd02c9472a ("f2fs: fix conditions to remain recovery information in f2fs_sync_file")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Tue Feb 3 21:36:35 2026 +0800
f2fs: fix incorrect file address mapping when inline inode is unwritten
commit 68a0178981a0f493295afa29f8880246e561494c upstream.
When `fileinfo->fi_flags` does not have the `FIEMAP_FLAG_SYNC` bit set
and inline data has not been persisted yet, the physical address of the
extent is calculated incorrectly for unwritten inline inodes.
root@vm:/mnt/f2fs# dd if=/dev/zero of=data.3k bs=3k count=1
root@vm:/mnt/f2fs# f2fs_io fiemap 0 100 data.3k
Fiemap: offset = 0 len = 100
logical addr. physical addr. length flags
0 0000000000000000 00000ffffffff16c 0000000000000c00 00000301
This patch fixes the issue by checking if the inode's address is valid.
If the inline inode is unwritten, set the physical address to 0 and
mark the extent with `FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_DELALLOC`
flags.
Cc: stable@kernel.org
Fixes: 67f8cf3cee6f ("f2fs: support fiemap for inline_data")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Mon Mar 23 20:06:22 2026 +0800
f2fs: fix incorrect multidevice info in trace_f2fs_map_blocks()
commit eb2ca3ca983551a80e16a4a25df5a4ce59df8484 upstream.
When f2fs_map_blocks()->f2fs_map_blocks_cached() hits the read extent
cache, map->m_multidev_dio is not updated, which leads to incorrect
multidevice information being reported by trace_f2fs_map_blocks().
This patch updates map->m_multidev_dio in f2fs_map_blocks_cached() when
the read extent cache is hit.
Cc: stable@kernel.org
Fixes: 0094e98bd147 ("f2fs: factor a f2fs_map_blocks_cached helper")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Wed Mar 18 16:46:35 2026 +0800
f2fs: fix inline data not being written to disk in writeback path
commit fe9b8b30b97102859a9102be7bd2a09803bd90bd upstream.
When f2fs_fiemap() is called with `fileinfo->fi_flags` containing the
FIEMAP_FLAG_SYNC flag, it attempts to write data to disk before
retrieving file mappings via filemap_write_and_wait(). However, there is
an issue where the file does not get mapped as expected. The following
scenario can occur:
root@vm:/mnt/f2fs# dd if=/dev/zero of=data.3k bs=3k count=1
root@vm:/mnt/f2fs# xfs_io data.3k -c "fiemap -v 0 4096"
data.3k:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..5]: 0..5 6 0x307
The root cause of this issue is that f2fs_write_single_data_page() only
calls f2fs_write_inline_data() to copy data from the data folio to the
inode folio, and it clears the dirty flag on the data folio. However, it
does not mark the data folio as writeback. When
__filemap_fdatawait_range() checks for folios with the writeback flag,
it returns early, causing f2fs_fiemap() to report that the file has no
mapping.
To fix this issue, the solution is to call
f2fs_write_single_node_folio() in f2fs_inline_data_fiemap() when
getting fiemap with FIEMAP_FLAG_SYNC flags. This patch ensures that the
inode folio is written back and the writeback process completes before
proceeding.
Cc: stable@kernel.org
Fixes: 9ffe0fb5f3bb ("f2fs: handle inline data operations")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Fri Apr 3 22:40:17 2026 +0800
f2fs: fix node_cnt race between extent node destroy and writeback
commit ed78aeebef05212ef7dca93bd931e4eff67c113f upstream.
f2fs_destroy_extent_node() does not set FI_NO_EXTENT before clearing
extent nodes. When called from f2fs_drop_inode() with I_SYNC set,
concurrent kworker writeback can insert new extent nodes into the same
extent tree, racing with the destroy and triggering f2fs_bug_on() in
__destroy_extent_node(). The scenario is as follows:
drop inode writeback
- iput
- f2fs_drop_inode // I_SYNC set
- f2fs_destroy_extent_node
- __destroy_extent_node
- while (node_cnt) {
write_lock(&et->lock)
__free_extent_tree
write_unlock(&et->lock)
- __writeback_single_inode
- f2fs_outplace_write_data
- f2fs_update_read_extent_cache
- __update_extent_tree_range
// FI_NO_EXTENT not set,
// insert new extent node
} // node_cnt == 0, exit while
- f2fs_bug_on(node_cnt) // node_cnt > 0
Additionally, __update_extent_tree_range() only checks FI_NO_EXTENT for
EX_READ type, leaving EX_BLOCK_AGE updates completely unprotected.
This patch set FI_NO_EXTENT under et->lock in __destroy_extent_node(),
consistent with other callers (__update_extent_tree_range and
__drop_extent_tree) and check FI_NO_EXTENT for both EX_READ and
EX_BLOCK_AGE tree.
Fixes: 3fc5d5a182f6 ("f2fs: fix to shrink read extent node in batches")
Cc: stable@vger.kernel.org
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guangshuo Li <lgs201920130244@gmail.com>
Date: Fri Apr 10 20:47:26 2026 +0800
f2fs: fix uninitialized kobject put in f2fs_init_sysfs()
commit b635f2ecdb5ad34f9c967cabb704d6bed9382fd0 upstream.
In f2fs_init_sysfs(), all failure paths after kset_register() jump to
put_kobject, which unconditionally releases both f2fs_tune and
f2fs_feat.
If kobject_init_and_add(&f2fs_feat, ...) fails, f2fs_tune has not been
initialized yet, so calling kobject_put(&f2fs_tune) is invalid.
Fix this by splitting the unwind path so each error path only releases
objects that were successfully initialized.
Fixes: a907f3a68ee26ba4 ("f2fs: add a sysfs entry to reclaim POSIX_FADV_NOREUSE pages")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date: Wed Mar 18 16:45:32 2026 +0800
f2fs: refactor f2fs_move_node_folio function
commit 92c20989366e023b74fa0c1028af9436c1917dbf upstream.
This patch refactor the f2fs_move_node_folio() function. No logical
changes.
Cc: stable@kernel.org
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Miklos Szeredi <mszeredi@redhat.com>
Date: Fri Apr 10 16:49:47 2026 +0200
fanotify: fix false positive on permission events
commit 7746e3bd4cc19b5092e00d32d676e329bfcb6900 upstream.
fsnotify_get_mark_safe() may return false for a mark on an unrelated group,
which results in bypassing the permission check.
Fix by skipping over detached marks that are not in the current group.
CC: stable@vger.kernel.org
Fixes: abc77577a669 ("fsnotify: Provide framework for dropping SRCU lock in ->handle_event")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260410144950.156160-1-mszeredi@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date: Tue Apr 7 11:23:12 2026 +0200
fbcon: Avoid OOB font access if console rotation fails
commit e4ef723d8975a2694cc90733a6b888a5e2841842 upstream.
Clear the font buffer if the reallocation during console rotation fails
in fbcon_rotate_font(). The putcs implementations for the rotated buffer
will return early in this case. See [1] for an example.
Currently, fbcon_rotate_font() keeps the old buffer, which is too small
for the rotated font. Printing to the rotated console with a high-enough
character code will overflow the font buffer.
v2:
- fix typos in commit message
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 6cc50e1c5b57 ("[PATCH] fbcon: Console Rotation - Add support to rotate font bitmap")
Cc: stable@vger.kernel.org # v2.6.15+
Link: https://elixir.bootlin.com/linux/v6.19/source/drivers/video/fbdev/core/fbcon_ccw.c#L144 # [1]
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Rajat Gupta <rajgupt@qti.qualcomm.com>
Date: Sun May 3 20:51:10 2026 -0700
fbdev: udlfb: add vm_ops to dlfb_ops_mmap to prevent use-after-free
commit 8de779dc40d35d39fa07387b6f921eb11df0f511 upstream.
dlfb_ops_mmap() uses remap_pfn_range() to map vmalloc framebuffer pages
to userspace but sets no vm_ops on the VMA. This means the kernel cannot
track active mmaps. When dlfb_realloc_framebuffer() replaces the backing
buffer via FBIOPUT_VSCREENINFO, existing mmap PTEs are not invalidated.
On USB disconnect, dlfb_ops_destroy() calls vfree() on the old pages
while userspace PTEs still reference them, resulting in a use-after-free:
the process retains read/write access to freed kernel pages.
Add vm_operations_struct with open/close callbacks that maintain an
atomic mmap_count on struct dlfb_data. In dlfb_realloc_framebuffer(),
check mmap_count and return -EBUSY if the buffer is currently mapped,
preventing buffer replacement while userspace holds stale PTEs.
Tested with PoC using dummy_hcd + raw_gadget USB device emulation.
Signed-off-by: Rajat Gupta <rajgupt@qti.qualcomm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Qingfang Deng <qingfang.deng@linux.dev>
Date: Wed Apr 15 10:24:50 2026 +0800
flow_dissector: do not dissect PPPoE PFC frames
[ Upstream commit d6c19b31a3c1d519fabdcf0aa239e6b6109b9473 ]
RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
PFC for PPPoE sessions, and the flow dissector driver has assumed an
uncompressed frame until the blamed commit.
During the review process of that commit [1], support for PFC is
suggested. However, having a compressed (1-byte) protocol field means
the subsequent PPP payload is shifted by one byte, causing 4-byte
misalignment for the network header and an unaligned access exception
on some architectures.
The exception can be reproduced by sending a PPPoE PFC frame to an
ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE
session is active on that interface:
$ 0 : 00000000 80c40000 00000000 85144817
$ 4 : 00000008 00000100 80a75758 81dc9bb8
$ 8 : 00000010 8087ae2c 0000003d 00000000
$12 : 000000e0 00000039 00000000 00000000
$16 : 85043240 80a75758 81dc9bb8 00006488
$20 : 0000002f 00000007 85144810 80a70000
$24 : 81d1bda0 00000000
$28 : 81dc8000 81dc9aa8 00000000 805ead08
Hi : 00009d51
Lo : 2163358a
epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50
ra : 805ead08 __skb_get_hash_net+0x74/0x12c
Status: 11000403 KERNEL EXL IE
Cause : 40800010 (ExcCode 04)
BadVA : 85144817
PrId : 0001992f (MIPS 1004Kc)
Call Trace:
[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50
[<805ead08>] __skb_get_hash_net+0x74/0x12c
[<805ef330>] get_rps_cpu+0x1b8/0x3fc
[<805fca70>] netif_receive_skb_list_internal+0x324/0x364
[<805fd120>] napi_complete_done+0x68/0x2a4
[<8058de5c>] mtk_napi_rx+0x228/0xfec
[<805fd398>] __napi_poll+0x3c/0x1c4
[<805fd754>] napi_threaded_poll_loop+0x234/0x29c
[<805fd848>] napi_threaded_poll+0x8c/0xb0
[<80053544>] kthread+0x104/0x12c
[<80002bd8>] ret_from_kernel_thread+0x14/0x1c
Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000
To reduce the attack surface and maintain performance, do not process
PPPoE PFC frames.
[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home
Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors")
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260415022456.141758-1-qingfang.deng@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Date: Mon Mar 9 13:42:37 2026 +0100
gpio: of: clear OF_POPULATED on hog nodes in remove path
commit bbee90e750262bfb406d66dc65c46d616d2b6673 upstream.
The previously set OF_POPULATED flag should be cleared on the hog nodes
when removing the chip.
Cc: stable@vger.kernel.org
Fixes: 63636d956c455 ("gpio: of: Add DT overlay support for GPIO hogs")
Acked-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260309-gpio-hog-fwnode-v2-1-4e61f3dbf06a@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Zilin Guan <zilin@seu.edu.cn>
Date: Fri May 8 20:00:00 2026 -0400
hfsplus: fix held lock freed on hfsplus_fill_super()
[ Upstream commit 90c500e4fd83fa33c09bc7ee23b6d9cc487ac733 ]
hfsplus_fill_super() calls hfs_find_init() to initialize a search
structure, which acquires tree->tree_lock. If the subsequent call to
hfsplus_cat_build_key() fails, the function jumps to the out_put_root
error label without releasing the lock. The later cleanup path then
frees the tree data structure with the lock still held, triggering a
held lock freed warning.
Fix this by adding the missing hfs_find_exit(&fd) call before jumping
to the out_put_root error label. This ensures that tree->tree_lock is
properly released on the error path.
The bug was originally detected on v6.13-rc1 using an experimental
static analysis tool we are developing, and we have verified that the
issue persists in the latest mainline kernel. The tool is specifically
designed to detect memory management issues. It is currently under active
development and not yet publicly available.
We confirmed the bug by runtime testing under QEMU with x86_64 defconfig,
lockdep enabled, and CONFIG_HFSPLUS_FS=y. To trigger the error path, we
used GDB to dynamically shrink the max_unistr_len parameter to 1 before
hfsplus_asc2uni() is called. This forces hfsplus_asc2uni() to naturally
return -ENAMETOOLONG, which propagates to hfsplus_cat_build_key() and
exercises the faulty error path. The following warning was observed
during mount:
=========================
WARNING: held lock freed!
7.0.0-rc3-00016-gb4f0dd314b39 #4 Not tainted
-------------------------
mount/174 is freeing memory ffff888103f92000-ffff888103f92fff, with a lock still held there!
ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0
2 locks held by mount/174:
#0: ffff888103f960e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.constprop.0+0x167/0xa40
#1: ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0
stack backtrace:
CPU: 2 UID: 0 PID: 174 Comm: mount Not tainted 7.0.0-rc3-00016-gb4f0dd314b39 #4 PREEMPT(lazy)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x82/0xd0
debug_check_no_locks_freed+0x13a/0x180
kfree+0x16b/0x510
? hfsplus_fill_super+0xcb4/0x18a0
hfsplus_fill_super+0xcb4/0x18a0
? __pfx_hfsplus_fill_super+0x10/0x10
? srso_return_thunk+0x5/0x5f
? bdev_open+0x65f/0xc30
? srso_return_thunk+0x5/0x5f
? pointer+0x4ce/0xbf0
? trace_contention_end+0x11c/0x150
? __pfx_pointer+0x10/0x10
? srso_return_thunk+0x5/0x5f
? bdev_open+0x79b/0xc30
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? vsnprintf+0x6da/0x1270
? srso_return_thunk+0x5/0x5f
? __mutex_unlock_slowpath+0x157/0x740
? __pfx_vsnprintf+0x10/0x10
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? mark_held_locks+0x49/0x80
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? irqentry_exit+0x17b/0x5e0
? trace_irq_disable.constprop.0+0x116/0x150
? __pfx_hfsplus_fill_super+0x10/0x10
? __pfx_hfsplus_fill_super+0x10/0x10
get_tree_bdev_flags+0x302/0x580
? __pfx_get_tree_bdev_flags+0x10/0x10
? vfs_parse_fs_qstr+0x129/0x1a0
? __pfx_vfs_parse_fs_qstr+0x3/0x10
vfs_get_tree+0x89/0x320
fc_mount+0x10/0x1d0
path_mount+0x5c5/0x21c0
? __pfx_path_mount+0x10/0x10
? trace_irq_enable.constprop.0+0x116/0x150
? trace_irq_enable.constprop.0+0x116/0x150
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? kmem_cache_free+0x307/0x540
? user_path_at+0x51/0x60
? __x64_sys_mount+0x212/0x280
? srso_return_thunk+0x5/0x5f
__x64_sys_mount+0x212/0x280
? __pfx___x64_sys_mount+0x10/0x10
? srso_return_thunk+0x5/0x5f
? trace_irq_enable.constprop.0+0x116/0x150
? srso_return_thunk+0x5/0x5f
do_syscall_64+0x111/0x680
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ffacad55eae
Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 8
RSP: 002b:00007fff1ab55718 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffacad55eae
RDX: 000055740c64e5b0 RSI: 000055740c64e630 RDI: 000055740c651ab0
RBP: 000055740c64e380 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000055740c64e5b0 R14: 000055740c651ab0 R15: 000055740c64e380
</TASK>
After applying this patch, the warning no longer appears.
Fixes: 89ac9b4d3d1a ("hfsplus: fix longname handling")
CC: stable@vger.kernel.org
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Deepanshu Kartikey <kartikey406@gmail.com>
Date: Fri May 8 19:59:59 2026 -0400
hfsplus: fix uninit-value by validating catalog record size
[ Upstream commit b6b592275aeff184aa82fcf6abccd833fb71b393 ]
Syzbot reported a KMSAN uninit-value issue in hfsplus_strcasecmp(). The
root cause is that hfs_brec_read() doesn't validate that the on-disk
record size matches the expected size for the record type being read.
When mounting a corrupted filesystem, hfs_brec_read() may read less data
than expected. For example, when reading a catalog thread record, the
debug output showed:
HFSPLUS_BREC_READ: rec_len=520, fd->entrylength=26
HFSPLUS_BREC_READ: WARNING - entrylength (26) < rec_len (520) - PARTIAL READ!
hfs_brec_read() only validates that entrylength is not greater than the
buffer size, but doesn't check if it's less than expected. It successfully
reads 26 bytes into a 520-byte structure and returns success, leaving 494
bytes uninitialized.
This uninitialized data in tmp.thread.nodeName then gets copied by
hfsplus_cat_build_key_uni() and used by hfsplus_strcasecmp(), triggering
the KMSAN warning when the uninitialized bytes are used as array indices
in case_fold().
Fix by introducing hfsplus_brec_read_cat() wrapper that:
1. Calls hfs_brec_read() to read the data
2. Validates the record size based on the type field:
- Fixed size for folder and file records
- Variable size for thread records (depends on string length)
3. Returns -EIO if size doesn't match expected
For thread records, check against HFSPLUS_MIN_THREAD_SZ before reading
nodeName.length to avoid reading uninitialized data at call sites that
don't zero-initialize the entry structure.
Also initialize the tmp variable in hfsplus_find_cat() as defensive
programming to ensure no uninitialized data even if validation is
bypassed.
Reported-by: syzbot+d80abb5b890d39261e72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d80abb5b890d39261e72
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: syzbot+d80abb5b890d39261e72@syzkaller.appspotmail.com
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Viacheslav Dubeyko <slava@dubeyko.com>
Suggested-by: Charalampos Mitrodimas <charmitro@posteo.net>
Link: https://lore.kernel.org/all/20260120051114.1281285-1-kartikey406@gmail.com/ [v1]
Link: https://lore.kernel.org/all/20260121063109.1830263-1-kartikey406@gmail.com/ [v2]
Link: https://lore.kernel.org/all/20260212014233.2422046-1-kartikey406@gmail.com/ [v3]
Link: https://lore.kernel.org/all/20260214002100.436125-1-kartikey406@gmail.com/T/ [v4]
Link: https://lore.kernel.org/all/20260221061626.15853-1-kartikey406@gmail.com/T/ [v5]
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20260307010302.41547-1-kartikey406@gmail.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Stable-dep-of: 90c500e4fd83 ("hfsplus: fix held lock freed on hfsplus_fill_super()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date: Thu Apr 2 11:09:15 2026 +0200
hv: Select CONFIG_SYSFB only for CONFIG_HYPERV_VMBUS
commit d33db956c9618e7cb08c2520ce708437914214ec upstream.
Hyperv's sysfb access only exists in the VMBUS support. Therefore
only select CONFIG_SYSFB for CONFIG_HYPERV_VMBUS. Avoids sysfb code
on systems that don't need it.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 96959283a58d ("Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests")
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Saurabh Sengar <ssengar@linux.microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-hyperv@vger.kernel.org
Cc: <stable@vger.kernel.org> # v6.16+
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://patch.msgid.link/20260402092305.208728-2-tzimmermann@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Date: Tue Apr 28 08:53:39 2026 -0400
hv_sock: fix ARM64 support
commit b31681206e3f527970a7c7ed807fbf6a028fc25b upstream.
VMBUS ring buffers must be page aligned. Therefore, the current value of
24K presents a challenge on ARM64 kernels (with 64K pages). So, use
VMBUS_RING_SIZE() to ensure they are always aligned and large enough to
hold all of the relevant data.
Cc: stable@vger.kernel.org
Fixes: 77ffe33363c0 ("hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication")
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260428125339.13963-1-hamzamahfooz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dexuan Cui <decui@microsoft.com>
Date: Thu Apr 16 12:14:33 2026 -0700
hv_sock: Report EOF instead of -EIO for FIN
commit f6315295899415f1ddcf39f7c9cb46d25e2c6c6a upstream.
Commit f0c5827d07cb unluckily causes a regression for the FIN packet,
and the final read syscall gets an error rather than 0.
Ideally, we would want to fix hvs_channel_readable_payload() so that it
could return 0 in the FIN scenario, but it's not good for the hv_sock
driver to use the VMBus ringbuffer's cached priv_read_index, which is
internal data in the VMBus driver.
Fix the regression in hv_sock by returning 0 rather than -EIO.
Fixes: f0c5827d07cb ("hv_sock: Return the readable bytes in hvs_stream_has_data()")
Cc: stable@vger.kernel.org
Reported-by: Ben Hillis <Ben.Hillis@microsoft.com>
Reported-by: Mitchell Levy <levymitchell0@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260416191433.840637-1-decui@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dexuan Cui <decui@microsoft.com>
Date: Wed Apr 22 23:48:11 2026 -0700
hv_sock: Return -EIO for malformed/short packets
commit 3d1f20727a635811f6b77801a7b57b8995268abd upstream.
Commit f63152958994 fixes a regression, however it fails to report an
error for malformed/short packets -- normally we should never see such
packets, but let's report an error for them just in case.
Fixes: f63152958994 ("hv_sock: Report EOF instead of -EIO for FIN")
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260423064811.1371749-1-decui@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Myeonghun Pak <mhun512@gmail.com>
Date: Fri Apr 24 22:50:51 2026 +0900
hwmon: (corsair-psu) Close HID device on probe errors
commit 174606451fbb17db506ebaacdd5e203e57773d5f upstream.
corsairpsu_probe() opens the HID device before sending the device init
and firmware-info commands. If either command fails, the error path jumps
directly to fail_and_stop and skips hid_hw_close().
Use the existing fail_and_close label for those post-open failures so the
open count and low-level close callback are balanced before hid_hw_stop().
Fixes: d115b51e0e56 ("hwmon: add Corsair PSU HID controller driver")
Cc: stable@vger.kernel.org
Signed-off-by: Myeonghun Pak <mhun512@gmail.com>
Reviewed-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
Link: https://lore.kernel.org/r/20260424135107.13720-1-mhun512@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sanman Pradhan <psanman@juniper.net>
Date: Thu Apr 16 21:59:30 2026 +0000
hwmon: (ltc2992) Clamp threshold writes to hardware range
commit d6cc7c99bf1f73eda7d565d224d791d16239bb41 upstream.
ltc2992_set_voltage(), ltc2992_set_current(), and ltc2992_set_power()
do not validate the user-supplied value before converting it to a
register value. This can result in:
1. Negative input values wrapping to large positive register values.
For power, the negative long is implicitly cast to u64 in
mul_u64_u32_div(), producing an incorrect value. For voltage and
current, the negative converted value wraps when passed to
ltc2992_write_reg() as a u32.
2. Intermediate arithmetic exceeding the range representable in u64 on
64-bit platforms. In ltc2992_set_voltage(), (u64)val * 1000 can
exceed U64_MAX when val is a large positive long. In
ltc2992_set_current(), (u64)val * r_sense_uohm can overflow
similarly. In ltc2992_set_power(), the computed value may not fit
in u64.
3. Register values exceeding the hardware field width. Voltage and
current threshold registers are 12-bit (stored left-justified in
16 bits), and power threshold registers are 24-bit. Without
clamping, bits above the field width are truncated in
ltc2992_write_reg().
Fix by clamping negative values to zero, clamping positive values to
the rounded hardware-representable maximum (the value returned by the
read path for a full-scale register) to prevent intermediate overflow,
and clamping the converted register value to the hardware field width
before writing. The existing conversion formula and rounding behavior
are preserved.
In the power write path, cancel the factor of 1000 from both the
numerator (r_sense_uohm * 1000) and the denominator
(VADC_UV_LSB * IADC_NANOV_LSB) to also eliminate a u32 overflow of
r_sense_uohm * 1000 when r_sense_uohm exceeds about 4.29 ohms.
Fixes: b0bd407e94b03 ("hwmon: (ltc2992) Add support")
Cc: stable@vger.kernel.org
Signed-off-by: Sanman Pradhan <psanman@juniper.net>
Link: https://lore.kernel.org/r/20260416215904.101969-2-sanman.pradhan@hpe.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sanman Pradhan <psanman@juniper.net>
Date: Thu Apr 16 21:59:40 2026 +0000
hwmon: (ltc2992) Fix u32 overflow in power read path
commit 2da0c1fd01dbd6b22844e8676585153dfc660cbe upstream.
ltc2992_get_power() computes the divisor for mul_u64_u32_div() as
r_sense_uohm * 1000. This multiplication overflows u32 when
r_sense_uohm exceeds about 4.29 ohms (4294967 micro-ohms), producing
a truncated divisor and an incorrect power reading.
Cancel the factor of 1000 from both the numerator
(VADC_UV_LSB * IADC_NANOV_LSB = 312500000) and the divisor
(r_sense_uohm * 1000), giving (VADC_UV_LSB / 1000) * IADC_NANOV_LSB
= 312500 as the numerator and plain r_sense_uohm as the divisor.
The cancellation is exact because LTC2992_VADC_UV_LSB (25000) is
divisible by 1000.
This is the read-path counterpart of the write-path fix applied in
the preceding patch.
Fixes: b0bd407e94b03 ("hwmon: (ltc2992) Add support")
Cc: stable@vger.kernel.org
Signed-off-by: Sanman Pradhan <psanman@juniper.net>
Link: https://lore.kernel.org/r/20260416215904.101969-3-sanman.pradhan@hpe.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mingming Cao <mmc@linux.ibm.com>
Date: Fri Apr 24 09:29:17 2026 -0700
ibmveth: Disable GSO for packets with small MSS
commit cc427d24ac6442ffdeafd157a63c7c5b73ed4de4 upstream.
Some physical adapters on Power systems do not support segmentation
offload when the MSS is less than 224 bytes. Attempting to send such
packets causes the adapter to freeze, stopping all traffic until
manually reset.
Implement ndo_features_check to disable GSO for packets with small MSS
values. The network stack will perform software segmentation instead.
The 224-byte minimum matches ibmvnic
commit <f10b09ef687f> ("ibmvnic: Enforce stronger sanity checks
on GSO packets")
which uses the same physical adapters in SEA configurations.
The issue occurs specifically when the hardware attempts to perform
segmentation (gso_segs > 1) with a small MSS. Single-segment GSO packets
(gso_segs == 1) do not trigger the problematic LSO code path and are
transmitted normally without segmentation.
Add an ndo_features_check callback to disable GSO when MSS < 224 bytes.
Also call vlan_features_check() to ensure proper handling of VLAN packets,
particularly QinQ (802.1ad) configurations where the hardware parser may
not support certain offload features.
Validated using iptables to force small MSS values. Without the fix,
the adapter freezes. With the fix, packets are segmented in software
and transmission succeeds. Comprehensive regression testing completedd
(MSS tests, performance, stability).
Fixes: 8641dd85799f ("ibmveth: Add support for TSO")
Cc: stable@vger.kernel.org
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Tested-by: Shaik Abdulla <shaik.abdulla1@ibm.com>
Tested-by: Naveed Ahmed <naveedaus@in.ibm.com>
Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Link: https://patch.msgid.link/20260424162917.65725-1-mmc@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guangshuo Li <lgs201920130244@gmail.com>
Date: Thu Apr 16 17:53:27 2026 -0700
ice: fix double free in ice_sf_eth_activate() error path
commit 9aab1c3d7299285e2569cbc0ed5892d631a241b2 upstream.
When auxiliary_device_add() fails, ice_sf_eth_activate() jumps to
aux_dev_uninit and calls auxiliary_device_uninit(&sf_dev->adev).
The device release callback ice_sf_dev_release() frees sf_dev, but
the current error path falls through to sf_dev_free and calls
kfree(sf_dev) again, causing a double free.
Keep kfree(sf_dev) for the auxiliary_device_init() failure path, but
avoid falling through to sf_dev_free after auxiliary_device_uninit().
Fixes: 13acc5c4cdbe ("ice: subfunction activation and base devlink ops")
Cc: stable@vger.kernel.org
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-3-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Martin Michaelis <code@mgjm.de>
Date: Thu Apr 23 15:54:11 2026 -0600
io_uring/kbuf: support min length left for incremental buffers
commit 7deba791ad495ce1d7921683f4f7d1190fa210d1 upstream.
Incrementally consumed buffer rings are generally fully consumed, but
it's quite possible that the application has a minimum size it needs to
meet to avoid truncation. Currently that minimum limit is 1 byte, but
this should be a setting that is the hands of the application. For
recvmsg multishot, a prime use case for incrementally consumed buffers,
the application may get spurious -EFAULT returned at the end of an
incrementally consumed buffer, as less space is available than the
headers need.
Grab a u32 field in struct io_uring_buf_reg, which the application can
use to inform the kernel of the minimum size that should be available
in an incrementally consumed buffer. If less than that is available,
the current buffer is fully processed and the next one will be picked.
Cc: stable@vger.kernel.org
Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
Link: https://github.com/axboe/liburing/issues/1433
Signed-off-by: Martin Michaelis <code@mgjm.de>
[axboe: write commit message, change io_buffer_list member name]
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon Apr 27 14:29:02 2026 -0600
io_uring/tw: serialize ctx->retry_llist with ->uring_lock
commit 17666e2d7592c3e85260cafd3950121524acc2c5 upstream.
The DEFER_TASKRUN local task work paths all run under ctx->uring_lock,
which serializes them with each other and with the rest of the ring's
hot paths. io_move_task_work_from_local() is the exception - it's called
from io_ring_exit_work() on a kworker without holding the lock and from
the iopoll cancelation side right after dropping it.
->work_llist is fine with this, as it's only ever updated via the
expected paths. But the ->retry_llist is updated while runing, and hence
it could potentially race between normal task_work running and the
task-has-exited shutdown path.
Simply grab ->uring_lock while moving the local work to the fallback
list for exit purposes, which nicely serializes it across both the
normal additions and the exit prune path.
Cc: stable@vger.kernel.org
Fixes: f46b9cdb22f7 ("io_uring: limit local tw done")
Reported-by: Robert Femmer <robert.femmer@x41-dsec.de>
Reported-by: Christian Reitter <invd@inhq.net>
Reported-by: Michael Rodler <michael.rodler@x41-dsec.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Nicolin Chen <nicolinc@nvidia.com>
Date: Tue Mar 17 00:59:16 2026 -0700
iommu/arm-smmu-v3: Add a missing dma_wmb() for hitless STE update
commit 6fabce53f6b9c2419012a9103e1a46d40888cefa upstream.
When writing a new (previously invalid) valid IOPTE to a page table, then
installing the page table into an STE hitlesslessly (e.g. in S2TTB field),
there is a window before an STE invalidation, where the page-table may be
accessed by SMMU but the new IOPTE is still siting in the CPU cache.
This could occur when we allocate an iommu_domain and immediately install
it hitlessly, while there would be no dma_wmb() for the page table memory
prior to the earliest point of HW reading the STE.
Fix it by adding a dma_wmb() prior to updating the STE.
Fixes: 56e1a4cc2588 ("iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry")
Cc: stable@vger.kernel.org
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/linux-iommu/aXdlnLLFUBwjT0V5@willie-the-truck/
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Zhenzhong Duan <zhenzhong.duan@intel.com>
Date: Thu Apr 2 14:57:24 2026 +0800
iommu/vt-d: Block PASID attachment to nested domain with dirty tracking
commit cc5bd898ff70710ffc41cd8e5c2741cb64750047 upstream.
Kernel lacks dirty tracking support on nested domain attached to PASID,
fails the attachment early if nesting parent domain is dirty tracking
configured, otherwise dirty pages would be lost.
Cc: stable@vger.kernel.org
Fixes: 67f6f56b5912 ("iommu/vt-d: Add set_dev_pasid callback for nested domain")
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20260330101108.12594-2-zhenzhong.duan@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Fixes: 67f6f56b5912 ("iommu/vt-d: Add set_dev_pasid callback for nested domain")
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sina Hassani <sina@openai.com>
Date: Fri Apr 10 11:32:44 2026 -0700
iommufd: Fix a race with concurrent allocation and unmap
commit 8602018b1f17fbdaa5e5d79f4c8603ad20640c12 upstream.
iopt_unmap_iova_range() releases the lock on iova_rwsem inside the loop
body when getting to the more expensive unmap operations. This is fine on
its own, except the loop condition is based on the first area that matches
the unmap address range. If a concurrent call to map picks an area that
was unmapped in previous iterations, the loop mistakenly tries to unmap
it.
This is reproducible by having one userspace thread map buffers and pass
them to another thread that unmaps them. The problem manifests as EBUSY
errors with single page mappings.
Fix this by advancing the start pointer after unmapping an area. This
ensures each iteration only examines the IOVA range that remains mapped,
which is guaranteed not to have overlaps.
Cc: stable@vger.kernel.org
Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping")
Link: https://patch.msgid.link/r/CAAJpGJSR4r_ds1JOjmkqHtsBPyxu8GntoeW08Sk5RNQPmgi+tg@mail.gmail.com
Signed-off-by: Sina Hassani <sina@openai.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Zhenzhong Duan <zhenzhong.duan@intel.com>
Date: Sun Mar 29 23:07:55 2026 -0400
iommufd: Fix return value of iommufd_fault_fops_write()
commit aaca2aa92785a6ab8e3183e7184bca447a99cd76 upstream.
copy_from_user() may return number of bytes failed to copy, we should
not pass over this number to user space to cheat that write() succeed.
Instead, -EFAULT should be returned.
Link: https://patch.msgid.link/r/20260330030755.12856-1-zhenzhong.duan@intel.com
Cc: stable@vger.kernel.org
Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Maoyi Xie <maoyixie.tju@gmail.com>
Date: Thu Apr 30 18:33:18 2026 +0800
ip6_gre: Use cached t->net in ip6erspan_changelink().
commit 1d324c2f43f70c965f25c58cc3611c779adbe47e upstream.
After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
ip6gre hash via link_net. ip6erspan_changelink() was not converted in
that series and still uses dev_net(dev), which diverges from the
device's creation netns after IFLA_NET_NS_FD migration.
This re-inserts the tunnel into the wrong per-netns hash. The
original netns keeps a stale entry. When that netns is later
destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
slab-use-after-free reported by KASAN, followed by a kernel BUG at
net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
Reachable from an unprivileged user namespace (unshare --user
--map-root-user --net).
ip6gre_changelink() earlier in the same file already uses the cached
t->net; only ip6erspan_changelink() has the wrong shape.
Fixes: 2d665034f239 ("net: ip6_gre: Fix ip6erspan hlen calculation")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260430103318.3206018-1-maoyi.xie@ntu.edu.sg
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Corey Minyard <corey@minyard.net>
Date: Mon Apr 20 12:39:51 2026 -0500
ipmi: Add limits to event and receive message requests
commit c4cca236968683eb0d59abfb12d5c7e4d8514227 upstream.
The driver would just fetch events and receive messages until the
BMC said it was done. To avoid issues with BMCs that never say they are
done, add a limit of 10 fetches at a time.
In addition, an si interface has an attn state it can return from the
hardware which is supposed to cause a flag fetch to see if the driver
needs to fetch events or message or a few other things. If the attn
bit gets stuck, it's a similar problem. So allow messages in between
flag fetches so the driver itself doesn't get stuck.
This is a more general fix than the previous fix for the specific bad
BMC, but should fix the more general issue of a BMC that won't stop
saying it has data.
This has been there from the beginning of the driver. It's not a bug
per-se, but it is accounting for bugs in BMCs.
Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/lkml/20260415115930.3428942-1-matt@readmodwrite.com/
Fixes: <1da177e4c3f4> ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Corey Minyard <corey@minyard.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Corey Minyard <corey@minyard.net>
Date: Mon Apr 20 12:50:09 2026 -0500
ipmi: Check event message buffer response for bad data
commit 36920f30e78e69df01f9691c470b6f3ba8aebf98 upstream.
The event message buffer response data size got checked later when
processing, but check it right after the response comes back. It
appears some BMCs may return an empty message instead of an error
when fetching events.
There are apparently some new BMCs that make this error, so we need to
compensate.
Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/lkml/20260415115930.3428942-1-matt@readmodwrite.com/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corey Minyard <corey@minyard.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Corey Minyard <corey@minyard.net>
Date: Mon Apr 20 13:02:18 2026 -0500
ipmi:si: Return state to normal if message allocation fails
commit 09dd798270ff582d7309f285d4aaf5dbebae01cb upstream.
There were places where nothing would get started if a message
allocation failed, so the driver needs to return to normal state.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corey Minyard <corey@minyard.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yilin Zhu <zylzyl2333@gmail.com>
Date: Sun Apr 12 13:07:54 2026 +0800
ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()
commit bc0fcb9823cd0894934cf968b525c575833d7078 upstream.
xfrm6_rcv_encap() performs an IPv6 route lookup when the skb does not
already have a dst attached. ip6_route_input_lookup() returns a
referenced dst entry even when the lookup resolves to an error route.
If dst->error is set, xfrm6_rcv_encap() drops the skb without attaching
the dst to the skb and without releasing the reference returned by the
lookup. Repeated packets hitting this path therefore leak dst entries.
Release the dst before jumping to the drop path.
Fixes: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP")
Cc: stable@kernel.org
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Ruide Cao <caoruide123@gmail.com>
Signed-off-by: Yilin Zhu <zylzyl2333@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Sun Apr 19 17:21:55 2026 -0400
isofs: validate block number from NFS file handle in isofs_export_iget
commit 24376458138387fb251e782e624c7776e9826796 upstream.
isofs_fh_to_dentry() and isofs_fh_to_parent() pass an attacker-
controlled block number (ifid->block or ifid->parent_block) from
the NFS file handle to isofs_export_iget(), which only rejects
block == 0 before calling isofs_iget() and ultimately sb_bread().
A crafted file handle with fh_len sufficient to pass the check
added by commit 0405d4b63d08 ("isofs: Prevent the use of too small
fid") can still drive the server to read any in-range block on the
backing device as if it were an iso_directory_record. That earlier
fix was assigned CVE-2025-37780.
sb_bread() on an out-of-range block returns NULL cleanly via the
EIO path, so there is no memory-safety violation. For in-range
reads of adjacent-partition data on the same block device, the
unrelated bytes end up in iso_inode_info fields that reach the NFS
client as dentry metadata. The deployment surface (isofs exported
over NFS from loop-mounted images) is narrow and requires an
authenticated NFS peer, but the malformed-file-handle class is
reportable as hardening next to the existing CVE-2025-37780 fix.
Reject block >= ISOFS_SB(sb)->s_nzones in isofs_export_iget() so
the check covers both isofs_fh_to_dentry() and isofs_fh_to_parent()
call sites with a single line.
Fixes: 0405d4b63d08 ("isofs: Prevent the use of too small fid")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260419212155.2169382-3-michael.bommarito@gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Sun Apr 19 17:21:54 2026 -0400
isofs: validate Rock Ridge CE continuation extent against volume size
commit a36d990f591320e9dd379ab30063ebfe91d47e1f upstream.
rock_continue() reads rs->cont_extent verbatim from the Rock Ridge CE
record and passes it to sb_bread() without checking that the block
number is within the mounted ISO 9660 volume. commit e595447e177b
("[PATCH] rock.c: handle corrupted directories") added cont_offset
and cont_size rejection for the CE continuation but did not validate
the extent block number itself. commit f54e18f1b831 ("isofs: Fix
infinite looping over CE entries") later capped the CE chain length
at RR_MAX_CE_ENTRIES = 32 but again left the block number unchecked.
With a crafted ISO mounted via udisks2 (desktop optical auto-mount)
or via CAP_SYS_ADMIN mount, rs->cont_extent can therefore point at
an out-of-range block or at blocks belonging to an adjacent
filesystem on the same block device. sb_bread() on an out-of-range
block returns NULL cleanly via the block layer EIO path, so there
is no memory-safety violation. For in-range reads of adjacent-
filesystem data, the CE buffer is parsed as Rock Ridge records and
only the text of SL sub-records reaches userspace through
readlink(), which makes the info-leak channel narrow and difficult
to exploit; still, rejecting the malformed CE outright matches the
rejection shape already present in the same function for
cont_offset and cont_size.
Add an ISOFS_SB(sb)->s_nzones bounds check to rock_continue() next
to the existing offset/size rejection, printing the same
corrupted-directory-entry notice.
Fixes: f54e18f1b831 ("isofs: Fix infinite looping over CE entries")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260419212155.2169382-2-michael.bommarito@gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: DaeMyung Kang <charsyam@gmail.com>
Date: Sat Apr 25 18:38:28 2026 +0900
ksmbd: rewrite stop_sessions() with restartable iteration
commit c444139cb747bf6de1922b39900fdf02281490f4 upstream.
stop_sessions() walks conn_list with hash_for_each() and, for every
entry, drops conn_list_lock across the transport ->shutdown() call
before re-acquiring the read lock to continue the loop. The hash
walk relies on cross-iteration state (the current bucket and the
hlist position), which is not preserved across unlock/relock: if
another thread performs a list mutation during the unlocked window,
the ongoing iteration becomes unreliable and can re-visit
connections that have already been handled or skip connections that
have not. The outer `if (!hash_empty(conn_list)) goto again;` retry
masks the symptom in the common case but does not address the
unsafe iteration itself.
Reframe the loop so it never relies on iterator state across
unlock/relock. Under conn_list_lock held for read, pick the first
connection whose ->shutdown() has not yet been issued by this path,
pin it by taking an extra reference, record that fact on the
connection and mark it EXITING while still inside the locked walk,
then drop the lock. Then call ->shutdown() outside the lock, drop
the pin (freeing the connection if the handler already released its
reference), and restart from the top.
Use a new per-connection flag, conn->stop_called, as the "shutdown
issued from stop_sessions()" marker rather than reusing the status
state. ksmbd_conn_set_exiting() is also invoked by
ksmbd_sessions_deregister() on sibling channels of a multichannel
session without issuing a transport shutdown, so treating
KSMBD_SESS_EXITING as "already handled here" would skip connections
that still need shutdown() to wake their handler out of recv(),
leaving the outer retry waiting indefinitely for the hash to drain.
stop_sessions() is serialised by init_lock in
ksmbd_conn_transport_destroy(), so writing stop_called under the
read lock has no other writer.
Set EXITING inside the locked walk so the selection, the stop_called
marker, and the status transition all happen together, and guard
against regressing a connection that has already advanced to
KSMBD_SESS_RELEASING on its own (for example, if the handler exited
its receive loop for an unrelated reason between teardown steps).
When the pin drop is the last put, release the transport and pair
ida_destroy(&target->async_ida) with the ida_init() done in
ksmbd_conn_alloc(), so stop_sessions() retiring a connection on its
own does not leak the xarray backing of the embedded async_ida.
The outer retry with msleep() is kept to wait for handler threads to
reach ksmbd_conn_free() and drain the hash.
Observed with an instrumented build that logs one line per visit and
widens the unlocked window before ->shutdown() by 200 ms, under
five concurrent cifs mounts (nosharesock, one connection each):
* Current code: the same connection address is revisited many
times during a single stop_sessions() call and ->shutdown() is
invoked well beyond the number of live connections before the
hash finally drains.
* Rewritten code: each live connection produces exactly one
->shutdown() call; the function returns as soon as the hash is
empty.
Functional teardown via `ksmbd.control --shutdown` with the same
five mounts completes cleanly on the rewritten path.
Performance is observably unchanged. Tearing down N concurrent
nosharesock cifs connections with `ksmbd.control --shutdown` +
`rmmod ksmbd` takes essentially the same wall time before and after
the rewrite:
N before after
10 4.93s 5.34s
30 7.34s 7.03s
50 7.31s 7.01s (3-run avg: 7.04s vs 7.25s)
100 6.98s 6.78s
200 6.77s 6.89s
and the number of ->shutdown() calls equals the number of live
connections on both paths when the race is not widened. The
teardown is dominated by the msleep(100)-based outer retry waiting
for handler threads to run ksmbd_conn_free(), not by the iteration
itself; the restartable loop's worst-case O(N^2) visit cost is in
the microseconds even at N=200 and sits far below the msleep(100)
granularity.
Applied alone on top of ksmbd-for-next-next, this patch does not
introduce a new leak site. Under the same reproducer (10x
concurrent-holders + ss -K + ksmbd.control --shutdown + rmmod), the
tree still shows the pre-existing per-connection transport leak
count that arises when the last refcount drop lands in one of
ksmbd_conn_r_count_dec(), __free_opinfo() or session_fd_check() -
all of which end with a bare kfree() today. kmemleak backtraces
for the unreferenced objects point into the TCP accept path
(sk_clone -> inet_csk_clone_lock, sock_alloc_inode) and none
involve stop_sessions(). Plugging those bare-kfree sites is the
responsibility of the follow-up patch.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shota Zaizen <s@zaizen.me>
Date: Tue Apr 28 19:02:55 2026 +0900
ksmbd: validate inherited ACE SID length
commit 996454bc0da84d5a1dedb1a7861823087e01a7ae upstream.
smb_inherit_dacl() walks the parent directory DACL loaded from the
security descriptor xattr. It verifies that each ACE contains the fixed
SID header before using it, but does not verify that the variable-length
SID described by sid.num_subauth is fully contained in the ACE.
A malformed inheritable ACE can advertise more subauthorities than are
present in the ACE. compare_sids() may then read past the ACE.
smb_set_ace() also clamps the copied destination SID, but used the
unchecked source SID count to compute the inherited ACE size. That could
advance the temporary inherited ACE buffer pointer and nt_size accounting
past the allocated buffer.
Fix this by validating the parent ACE SID count and SID length before
using the SID during inheritance. Compute the inherited ACE size from the
copied SID so the size matches the bounded destination SID. Reject the
inherited DACL if size accumulation would overflow smb_acl.size or the
security descriptor allocation size.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Shota Zaizen <s@zaizen.me>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fuad Tabba <tabba@google.com>
Date: Fri Apr 24 09:49:03 2026 +0100
KVM: arm64: Fix FEAT_Debugv8p9 to check DebugVer, not PMUVer
commit 7fe2cd4e1a3ad230d8fcc00cc99c4bcce4412a75 upstream.
FEAT_Debugv8p9 is incorrectly defined against ID_AA64DFR0_EL1.PMUVer
instead of ID_AA64DFR0_EL1.DebugVer. All three consumers of the macro
gate features that are architecturally tied to FEAT_Debugv8p9
(DebugVer = 0b1011, DDI0487 M.b A2.2.10):
- HDFGRTR2_EL2.nMDSELR_EL1, HDFGWTR2_EL2.nMDSELR_EL1: MDSELR_EL1
is present only when FEAT_Debugv8p9 is implemented (D24.3.21).
- MDCR_EL2.EBWE: the Extended Breakpoint and Watchpoint Enable bit
is RES0 unless FEAT_Debugv8p9 is implemented (D24.3.17).
Neither register has any dependency on PMUVer.
FEAT_Debugv8p9 and FEAT_PMUv3p9 are independent. Per DDI0487 M.b
A2.2.10, FEAT_Debugv8p9 is unconditionally mandatory from Armv8.9,
whereas FEAT_PMUv3p9 is mandatory only when FEAT_PMUv3 is implemented.
An Armv8.9 CPU without a PMU has DebugVer = 0b1011 but PMUVer = 0b0000,
so the wrong field check would cause KVM to incorrectly treat EBWE and
MDSELR_EL1 as RES0 on such hardware.
Fixes: 4bc0fe089840 ("KVM: arm64: Add sanitisation for FEAT_FGT2 registers")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fuad Tabba <tabba@google.com>
Date: Fri Apr 24 09:49:05 2026 +0100
KVM: arm64: Fix FEAT_SPE_FnE to use PMSIDR_EL1.FnE, not PMSVer
commit 08d715338287a1affb4c7ad5733decef4558a5c8 upstream.
FEAT_SPE_FnE is architecturally detected via PMSIDR_EL1.FnE [6], not
ID_AA64DFR0_EL1.PMSVer. The FEAT_X macro form (register, field, value)
cannot encode a PMSIDR_EL1-based feature, so FEAT_SPE_FnE was defined
identically to FEAT_SPEv1p2 (ID_AA64DFR0_EL1, PMSVer, V1P2), producing
a duplicate that used PMSVer >= V1P2 as a proxy.
Replace the macro with feat_spe_fne(), following the same pattern as
the sibling feat_spe_fds(): guard on FEAT_SPEv1p2 and read
PMSIDR_EL1.FnE [6] directly. Wire the two NEEDS_FEAT consumers to use
the new function.
Remove the now-unused FEAT_SPE_FnE macro.
Fixes: 63d423a7635b ("KVM: arm64: Switch to table-driven FGU configuration")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Quentin Perret <qperret@google.com>
Date: Fri Apr 24 09:49:08 2026 +0100
KVM: arm64: Fix initialisation order in __pkvm_init_finalise()
commit 5bb0aed57ba944f8c201e4e82ec066e0187e0f85 upstream.
fix_host_ownership() walks the hypervisor's stage-1 page-table to
adjust the host's stage-2 accordingly. Any such adjustment that
requires cache maintenance operations depends on the per-CPU hyp
fixmap being present. However, fix_host_ownership() is currently
called before fix_hyp_pgtable_refcnt() and hyp_create_fixmap(), so
the fixmap does not yet exist when it runs.
This is benign today because the host stage-2 starts empty and no
CMOs are needed, but it becomes a latent crash as soon as
fix_host_ownership() is extended to operate on a non-empty
page-table.
Reorder the calls so that fix_hyp_pgtable_refcnt() and
hyp_create_fixmap() complete before fix_host_ownership() is invoked.
Fixes: 0d16d12eb26e ("KVM: arm64: Fix-up hyp stage-1 refcounts for all pages mapped at EL2")
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fuad Tabba <tabba@google.com>
Date: Fri Apr 24 09:49:06 2026 +0100
KVM: arm64: Fix kvm_vcpu_initialized() macro parameter
commit d89fdda7dd8a488f922e1175e6782f781ba8a23b upstream.
The macro is defined with parameter 'v' but the body references the
literal token 'vcpu' instead, causing it to silently operate on whatever
'vcpu' resolves to in the caller's scope rather than the value passed by
the caller. All current call sites happen to use a variable named 'vcpu',
so the bug is latent.
Fixes: e016333745c7 ("KVM: arm64: Only reset vCPU-scoped feature ID regs once")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fuad Tabba <tabba@google.com>
Date: Fri Apr 24 09:49:07 2026 +0100
KVM: arm64: Fix pin leak and publication ordering in __pkvm_init_vcpu()
commit 73b9c1e5da84cd69b1a86e374e450817cd051371 upstream.
Two bugs exist in the vCPU initialisation path:
1. If a check fails after hyp_pin_shared_mem() succeeds, the cleanup
path jumps to 'unlock' without calling unpin_host_vcpu() or
unpin_host_sve_state(), permanently leaking pin references on the
host vCPU and SVE state pages.
Extract a register_hyp_vcpu() helper that performs the checks and
the store. When register_hyp_vcpu() returns an error, call
unpin_host_vcpu() and unpin_host_sve_state() inline before falling
through to the existing 'unlock' label.
2. register_hyp_vcpu() publishes the new vCPU pointer into
'hyp_vm->vcpus[]' with a bare store, allowing a concurrent caller
of pkvm_load_hyp_vcpu() to observe a partially initialised vCPU
object.
Ensure the store uses smp_store_release() and the load uses
smp_load_acquire(). While 'vm_table_lock' currently serialises the
store and the load, these barriers ensure the reader sees the fully
initialised 'hyp_vcpu' object even if there were a lockless path or
if the lock's own ordering guarantees were insufficient for nested
object initialization.
Fixes: 49af6ddb8e5c ("KVM: arm64: Add infrastructure to create and track pKVM instances at EL2")
Reported-by: Ben Simner <ben.simner@cl.cam.ac.uk>
Co-developed-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Woodhouse <dwmw@amazon.co.uk>
Date: Tue Apr 7 21:27:02 2026 +0100
KVM: arm64: vgic: Fix IIDR revision field extracted from wrong value
commit a0e6ae45af17e8b27958830595799c702ffbab8d upstream.
The uaccess write handlers for GICD_IIDR in both GICv2 and GICv3
extract the revision field from 'reg' (the current IIDR value read back
from the emulated distributor) instead of 'val' (the value userspace is
trying to write). This means userspace can never actually change the
implementation revision — the extracted value is always the current one.
Fix the FIELD_GET to use 'val' so that userspace can select a different
revision for migration compatibility.
Fixes: 49a1a2c70a7f ("KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20260407210949.2076251-2-dwmw2@infradead.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Marc Zyngier <maz@kernel.org>
Date: Thu Apr 23 17:36:07 2026 +0100
KVM: arm64: Wake-up from WFI when iqrchip is in userspace
commit 4ce98bf0865c349e7026ad9c14f48da264920953 upstream.
It appears that there is nothing in the wake-up path that
evaluates whether the in-kernel interrupts are pending unless
we have a vgic.
This means that the userspace irqchip support has been broken for
about four years, and nobody noticed. It was also broken before
as we wouldn't wake-up on a PMU interrupt, but hey, who cares...
It is probably time to remove the feature altogether, because it
was a terrible idea 10 years ago, and it still is.
Fixes: b57de4ffd7c6d ("KVM: arm64: Simplify kvm_cpu_has_pending_timer()")
Link: https://patch.msgid.link/20260423163607.486345-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon Apr 27 14:25:40 2026 +0200
KVM: x86: check for nEPT/nNPT in slow flush hypercalls
commit 464af6fc2b1dcc74005b7f58ee3812b17777efee upstream.
Checking is_guest_mode(vcpu) is incorrect, because translate_nested_gpa()
is only valid if an L2 guest is running *with nested EPT/NPT enabled*.
Instead use the same condition as translate_nested_gpa() itself.
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Fixes: aee738236dca ("KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs", 2022-11-18)
Link: https://patch.msgid.link/20260503200905.106077-1-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Sun May 3 19:19:32 2026 +0200
KVM: x86: Do IRR scan in __kvm_apic_update_irr even if PIR is empty
commit 33fd0ccd2590b470b65adcca288615ad3b5e3e06 upstream.
Fall back to apic_find_highest_vector() when PID.ON is set but PIR
turns out to be empty, to correctly report the highest pending interrupt
from the existing IRR.
In a nested VM stress test, the following WARNING fires in
vmx_check_nested_events() when kvm_cpu_has_interrupt() reports a pending
interrupt but the subsequent kvm_apic_has_interrupt() (which invokes
vmx_sync_pir_to_irr() again) returns -1:
WARNING: CPU: 99 PID: 57767 at arch/x86/kvm/vmx/nested.c:4449 vmx_check_nested_events+0x6bf/0x6e0 [kvm_intel]
Call Trace:
kvm_check_and_inject_events
vcpu_enter_guest.constprop.0
vcpu_run
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
__x64_sys_ioctl
do_syscall_64
entry_SYSCALL_64_after_hwframe
The root cause is a race between vmx_sync_pir_to_irr() on the target vCPU
and __vmx_deliver_posted_interrupt() on a sender vCPU. The sender
performs two individually-atomic operations that are not a single
transaction:
1. pi_test_and_set_pir(vector) -- sets the PIR bit
2. pi_test_and_set_on() -- sets PID.ON
The following interleaving triggers the bug:
Sender vCPU (IPI): Target vCPU (1st sync_pir_to_irr):
B1: set PIR[vector]
A1: pi_clear_on()
A2: pi_harvest_pir() -> sees B1 bit
A3: xchg() -> consumes bit, PIR=0
(1st sync returns correct max_irr)
B2: set PID.ON = 1
Target vCPU (2nd sync_pir_to_irr):
C1: pi_test_on() -> TRUE (from B2)
C2: pi_clear_on() -> ON=0
C3: pi_harvest_pir() -> PIR empty
C4: *max_irr = -1, early return
IRR NOT SCANNED
The interrupt is not lost (it resides in the IRR from the first sync and
is recovered on the next vcpu_enter_guest() iteration), but the incorrect
max_irr causes a spurious WARNING and a wasted L2 VM-Enter/VM-Exit cycle.
Fixes: b41f8638b9d3 ("KVM: VMX: Isolate pure loads from atomic XCHG when processing PIR")
Reported-by: Farrah Chen <farrah.chen@intel.com>
Analyzed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvm/20260428070349.1633238-1-chenyi.qiang@intel.com/T/
Link: https://patch.msgid.link/20260503201703.108231-2-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sean Christopherson <seanjc@google.com>
Date: Tue May 5 08:57:15 2026 +0200
KVM: x86: Fix shadow paging use-after-free due to unexpected GFN
commit 0cb2af2ea66ad8ff195c156ea690f11216285bdf upstream.
The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus
the SPTE index. This assumption breaks for shadow paging if the guest
page tables are modified between VM entries (similar to commit
aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even
when creating an MMIO SPTE", 2026-03-27). The flow is as follows:
- a PDE is installed for a 2MB mapping, and a page in that area is
accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages;
the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because
the guest's mapping is a huge page (and thus contiguous).
- the PDE mapping is changed from outside the guest.
- the guest accesses another page in the same 2MB area. KVM installs
a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN
(i.e. based on the new mapping, as changed in the previous step) but
that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore
the rmap entry cannot be found and removed when the kvm_mmu_page
is zapped.
- the memslot that covers the first 2MB mapping is deleted, and the
kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove()
only looks at the [sp->gfn, sp->gfn + 511] range established in step 1,
and fails to find the rmap entry that was recorded by step 3.
- any operation that causes an rmap walk for the same page accessed
by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page.
This includes dirty logging or MMU notifier invalidations (e.g., from
MADV_DONTNEED).
The underlying issue is that KVM's walking of shadow PTEs assumes that
if a SPTE is present when KVM wants to install a non-leaf SPTE, then the
existing kvm_mmu_page must be for the correct gfn. Because the only way
for the gfn to be wrong is if KVM messed up and failed to zap a SPTE...
which shouldn't happen, but *actually* only happens in response to a
guest write.
That bug dates back literally forever, as even the first version of KVM
assumes that the GFN matches and walks into the "wrong" shadow page.
However, that was only an imprecision until 2032a93d66fa ("KVM: MMU:
Don't allocate gfns page for direct mmu pages") came along.
Fix it by checking for a target gfn mismatch and zapping the existing
SPTE. That way the old SP and rmap entries are gone, KVM installs
the rmap in the right location, and everyone is happy.
Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages")
Fixes: 6aa8b732ca01 ("kvm: userspace interface")
Reported-by: Alexander Bulekov <bkov@amazon.com>
Reported-by: Fred Griffoul <fgriffo@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Eric Biggers <ebiggers@kernel.org>
Date: Thu Mar 5 19:35:55 2026 -0800
lib/crc: tests: Make crc_kunit test only the enabled CRC variants
commit 85c9f3a2b805eb96d899da7bcc38a16459aa3c16 upstream.
Like commit 4478e8eeb871 ("lib/crypto: tests: Depend on library options
rather than selecting them") did with the crypto library tests, make
crc_kunit depend on the code it tests rather than selecting it. This
follows the standard convention for KUnit and fixes an issue where
enabling KUNIT_ALL_TESTS enabled non-test code.
crc_kunit does differ from the crypto library tests in that it
consolidates the tests for multiple CRC variants, with 5 kconfig
options, into one KUnit suite. Since depending on *all* of these
kconfig options would greatly restrict the ability to enable crc_kunit,
instead just depend on *any* of these options. Update crc_kunit
accordingly to test only the reachable code.
Alternatively we could split crc_kunit into 5 test suites. But keeping
it as one is simpler for now.
Fixes: e47d9b1a76ed ("lib/crc_kunit.c: add KUnit test suite for CRC library functions")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260306033557.250499-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lukas Wunner <lukas@wunner.de>
Date: Sun Apr 12 16:19:47 2026 +0200
lib/crypto: mpi: Fix integer underflow in mpi_read_raw_from_sgl()
commit 8c2f1288250a90a4b5cabed5d888d7e3aeed4035 upstream.
Yiming reports an integer underflow in mpi_read_raw_from_sgl() when
subtracting "lzeros" from the unsigned "nbytes".
For this to happen, the scatterlist "sgl" needs to occupy more bytes
than the "nbytes" parameter and the first "nbytes + 1" bytes of the
scatterlist must be zero. Under these conditions, the while loop
iterating over the scatterlist will count more zeroes than "nbytes",
subtract the number of zeroes from "nbytes" and cause the underflow.
When commit 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") originally
introduced the bug, it couldn't be triggered because all callers of
mpi_read_raw_from_sgl() passed a scatterlist whose length was equal to
"nbytes".
However since commit 63ba4d67594a ("KEYS: asymmetric: Use new crypto
interface without scatterlists"), the underflow can now actually be
triggered. When invoking a KEYCTL_PKEY_ENCRYPT system call with a
larger "out_len" than "in_len" and filling the "in" buffer with zeroes,
crypto_akcipher_sync_prep() will create an all-zero scatterlist used for
both the "src" and "dst" member of struct akcipher_request and thereby
fulfil the conditions to trigger the bug:
sys_keyctl()
keyctl_pkey_e_d_s()
asymmetric_key_eds_op()
software_key_eds_op()
crypto_akcipher_sync_encrypt()
crypto_akcipher_sync_prep()
crypto_akcipher_encrypt()
rsa_enc()
mpi_read_raw_from_sgl()
To the user this will be visible as a DoS as the kernel spins forever,
causing soft lockup splats as a side effect.
Fix it.
Reported-by: Yiming Qian <yimingqian591@gmail.com> # off-list
Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.4+
Reviewed-by: Ignat Korchagin <ignat@linux.win>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/59eca92ff4f87e2081777f1423a0efaaadcfdb39.1776003111.git.lukas@wunner.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Christian A. Ehrhardt <lk@c--e.de>
Date: Thu Mar 26 22:49:01 2026 +0100
lib/scatterlist: fix length calculations in extract_kvec_to_sg
commit 07b7d66e65d9cfe6b9c2c34aa22cfcaac37a5c45 upstream.
Patch series "Fix bugs in extract_iter_to_sg()", v3.
Fix bugs in the kvec and user variants of extract_iter_to_sg. This series
is growing due to useful remarks made by sashiko.dev.
The main bugs are:
- The length for an sglist entry when extracting from
a kvec can exceed the number of bytes in the page. This
is obviously not intended.
- When extracting a user buffer the sglist is temporarily
used as a scratch buffer for extracted page pointers.
If the sglist already contains some elements this scratch
buffer could overlap with existing entries in the sglist.
The series adds test cases to the kunit_iov_iter test that demonstrate all
of these bugs. Additionally, there is a memory leak fix for the test
itself.
The bugs were orignally introduced into kernel v6.3 where the function
lived in fs/netfs/iterator.c. It was later moved to lib/scatterlist.c in
v6.5. Thus the actual fix is only marked for backports to v6.5+.
This patch (of 5):
When extracting from a kvec to a scatterlist, do not cross page
boundaries. The required length was already calculated but not used as
intended.
Adjust the copied length if the loop runs out of sglist entries without
extracting everything.
While there, return immediately from extract_iter_to_sg if there are no
sglist entries at all.
A subsequent commit will add kunit test cases that demonstrate that the
patch is necessary.
Link: https://lkml.kernel.org/r/20260326214905.818170-1-lk@c--e.de
Link: https://lkml.kernel.org/r/20260326214905.818170-2-lk@c--e.de
Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Cc: David Gow <davidgow@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: <stable@vger.kernel.org> [v6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Christian A. Ehrhardt <lk@c--e.de>
Date: Thu Mar 26 22:49:02 2026 +0100
lib/scatterlist: fix temp buffer in extract_user_to_sg()
commit 118cf3f55975352ac357fb194405031458186819 upstream.
Instead of allocating a temporary buffer for extracted user pages
extract_user_to_sg() uses the end of the to be filled scatterlist as a
temporary buffer.
Fix the calculation of the start address if the scatterlist already
contains elements. The unused space starts at sgtable->sgl +
sgtable->nents not directly at sgtable->nents and the temporary buffer is
placed at the end of this unused space.
A subsequent commit will add kunit test cases that demonstrate that the
patch is necessary.
Pointed out by sashiko.dev on a previous iteration of this series.
Link: https://lkml.kernel.org/r/20260326214905.818170-3-lk@c--e.de
Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Cc: David Howells <dhowells@redhat.com>
Cc: David Gow <davidgow@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: <stable@vger.kernel.org> [v6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Date: Tue Apr 21 10:27:01 2026 +0200
libceph: Fix slab-out-of-bounds access in auth message processing
commit 1c439de70b1c3eb3c6bffa8245c16b9fc318f114 upstream.
If a (potentially corrupted) message of type CEPH_MSG_AUTH_REPLY
contains a positive value in its result field, it is treated as an
error code by ceph_handle_auth_reply() and returned to
handle_auth_reply(). Thereafter, an attempt is made to send the
preallocated message of type CEPH_MSG_AUTH, where the returned value is
interpreted as the size of the front segment to send. If the result
value in the message is greater than the size of the memory buffer
allocated for the front segment, an out-of-bounds access occurs, and
the content of the memory region beyond this buffer is sent out.
This patch fixes the issue by treating only negative values in the
result field as errors. Positive values are therefore treated as success
in the same way as a zero value. Additionally, a BUG_ON is added to
__send_prepared_auth_request() comparing the len parameter to
front_alloc_len to prevent sending the message if it exceeds the bounds
of the allocation and to make it easier to catch any logic flaws leading
to this.
Cc: stable@vger.kernel.org
Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Thu May 14 15:31:20 2026 +0200
Linux 7.0.7
Link: https://lore.kernel.org/r/20260512173940.117428952@linuxfoundation.org
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Brett A C Sheffield <bacs@librecast.net>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20260513153754.934923793@linuxfoundation.org
Tested-by: Ronald Warsow <rwarsow@gmx.de>
Tested-by: Brett A C Sheffield <bacs@librecast.net>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Stephano Cetola <stephano@cetola.net>
Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
Tested-by: Barry K. Nathan <barryn@pobox.com>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Wentao Guan <guanwentao@uniontech.com>
Date: Mon May 4 09:00:20 2026 +0800
LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()
commit 8dfa2f8780e486d05b9a0ffce70b8f5fbd62053e upstream.
The switch case in loongson_gpu_fixup_dma_hang() may not DC2 or DC3, and
readl(crtc_reg) will access with random address, because the "device" is
from "base+PCI_DEVICE_ID", "base" is from "pdev->devfn+1". This is wrong
when my platform inserts a discrete GPU:
lspci -tv
-[0000:00]-+-00.0 Loongson Technology LLC Hyper Transport Bridge Controller
...
+-06.0 Loongson Technology LLC LG100 GPU
+-06.2 Loongson Technology LLC Device 7a37
...
Add a default switch case to fix the panic as below:
Kernel ade access[#1]:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.136-loong64-desktop-hwe+ #4
pc 90000000017e5534 ra 90000000017e54c0 tp 90000001002f8000 sp 90000001002fb6c0
a0 80000efe00003100 a1 0000000000003100 a2 0000000000000000 a3 0000000000000002
a4 90000001002fb6b4 a5 900000087cdb58fd a6 90000000027af000 a7 0000000000000001
t0 00000000000085b9 t1 000000000000ffff t2 0000000000000000 t3 0000000000000000
t4 fffffffffffffffd t5 00000000fffb6d9c t6 0000000000083b00 t7 00000000000070c0
t8 900000087cdb4d94 u0 900000087cdb58fd s9 90000001002fb826 s0 90000000031c12c8
s1 7fffffffffffff00 s2 90000000031c12d0 s3 0000000000002710 s4 0000000000000000
s5 0000000000000000 s6 9000000100053000 s7 7fffffffffffff00 s8 90000000030d4000
ra: 90000000017e54c0 loongson_gpu_fixup_dma_hang+0x40/0x210
ERA: 90000000017e5534 loongson_gpu_fixup_dma_hang+0xb4/0x210
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 00000004 (PPLV0 +PIE -PWE)
EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00480000 [ADEM] (IS= ECode=8 EsubCode=1)
BADV: 7fffffffffffff00
PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
Modules linked in:
Process swapper/0 (pid: 1, threadinfo=(____ptrval____), task=(____ptrval____))
Stack : 0000000000000006 90000001002fb778 90000001002fb704 0000000000000007
0000000016a65700 90000000017e5690 000000000000ffff ffffffffffffffff
900000000209f7c0 9000000100053000 900000000209f7a8 9000000000eebc08
0000000000000000 0000000000000000 0000000000000006 90000001002fb778
90000001000530b8 90000000027af000 0000000000000000 9000000100054000
9000000100053000 9000000000ebb70c 9000000100004c00 9000000004000001
90000001002fb7e4 bae765461f31cb12 0000000000000000 0000000000000000
0000000000000006 90000000027af000 0000000000000030 90000000027af000
900000087cd6f800 9000000100053000 0000000000000000 9000000000ebc560
7a2500147cdaf720 bae765461f31cb12 0000000000000001 0000000000000030
...
Call Trace:
[<90000000017e5534>] loongson_gpu_fixup_dma_hang+0xb4/0x210
[<9000000000eebc08>] pci_fixup_device+0x108/0x280
[<9000000000ebb70c>] pci_setup_device+0x24c/0x690
[<9000000000ebc560>] pci_scan_single_device+0xe0/0x140
[<9000000000ebc684>] pci_scan_slot+0xc4/0x280
[<9000000000ebdd00>] pci_scan_child_bus_extend+0x60/0x3f0
[<9000000000f5bc94>] acpi_pci_root_create+0x2b4/0x420
[<90000000017e5e74>] pci_acpi_scan_root+0x2d4/0x440
[<9000000000f5b02c>] acpi_pci_root_add+0x21c/0x3a0
[<9000000000f4ee54>] acpi_bus_attach+0x1a4/0x3c0
[<90000000010e200c>] device_for_each_child+0x6c/0xe0
[<9000000000f4bbf4>] acpi_dev_for_each_child+0x44/0x70
[<9000000000f4ef40>] acpi_bus_attach+0x290/0x3c0
[<90000000010e200c>] device_for_each_child+0x6c/0xe0
[<9000000000f4bbf4>] acpi_dev_for_each_child+0x44/0x70
[<9000000000f4ef40>] acpi_bus_attach+0x290/0x3c0
[<9000000000f5211c>] acpi_bus_scan+0x6c/0x280
[<900000000189c028>] acpi_scan_init+0x194/0x310
[<900000000189bc6c>] acpi_init+0xcc/0x140
[<9000000000220cdc>] do_one_initcall+0x4c/0x310
[<90000000018618fc>] kernel_init_freeable+0x258/0x2d4
[<900000000184326c>] kernel_init+0x28/0x13c
[<9000000000222008>] ret_from_kernel_thread+0xc/0xa4
Cc: stable@vger.kernel.org
Fixes: 95db0c9f526d ("LoongArch: Workaround LS2K/LS7A GPU DMA hang bug")
Link: https://gist.github.com/opsiff/ebf2dac51b4013d22462f2124c55f807
Link: https://gist.github.com/opsiff/a62f2a73db0492b3c49bf223a339b133
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Huacai Chen <chenhuacai@kernel.org>
Date: Mon May 4 09:00:01 2026 +0800
LoongArch: Fix SYM_SIGFUNC_START definition for 32BIT
commit 98b8aebb14fdc0133939fd8fe07d0d98333dc976 upstream.
The SYM_SIGFUNC_START definition should match sigcontext that the length
of GPRs are 8 bytes for both 32BIT and 64BIT. So replace SZREG with 8 to
fix it.
Cc: stable@vger.kernel.org
Fixes: e4878c37f6679fde ("LoongArch: vDSO: Emit GNU_EH_FRAME correctly")
Suggested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Qiang Ma <maqianga@uniontech.com>
Date: Mon May 4 09:00:37 2026 +0800
LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
commit b3e31a6650d4cab63f0814c37c0b360372c6ee9e upstream.
It doesn't make sense to return the recommended maximum number of vCPUs
which exceeds the maximum possible number of vCPUs.
Other architectures have already done this, such as commit 57a2e13ebdda
("KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS")
Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Xianglai Li <lixianglai@loongson.cn>
Date: Mon May 4 09:00:37 2026 +0800
LoongArch: KVM: Compile switch.S directly into the kernel
commit 5203012fa6045aac4b69d4e7c212e16dcf38ef10 upstream.
If we directly compile the switch.S file into the kernel, the address of
the kvm_exc_entry function will definitely be within the DMW memory area.
Therefore, we will no longer need to perform a copy relocation of the
kvm_exc_entry.
So this patch compiles switch.S directly into the kernel, and then remove
the copy relocation execution logic for the kvm_exc_entry function.
Cc: stable@vger.kernel.org
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Xianglai Li <lixianglai@loongson.cn>
Date: Mon May 4 09:00:37 2026 +0800
LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry
commit b323a441da602dfdfc24f30d3190cac786ffebf2 upstream.
Insert the appropriate UNWIND hint into the kvm_exc_entry assembly
function to guide the generation of correct ORC table entries, thereby
solving the timeout problem ("unreliable stack") while loading the
livepatch-sample module on a physical machine running virtual machines
with multiple vcpus.
Cc: stable@vger.kernel.org
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bibo Mao <maobibo@loongson.cn>
Date: Mon May 4 09:00:48 2026 +0800
LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software
commit 2433f3f5724b3af569d9fb411ba728629524738b upstream.
With passthrough HW timer, timer interrupt is injected by HW. When
inject emulated CPU interrupt by software such SIP0/SIP1/IPI, HW timer
interrupt may be lost.
Here check whether there is timer tick value inversion before and after
injecting emulated CPU interrupt by software, timer enabling by reading
timer cfg register is skipped. If the timer tick value is detected with
changing, then timer should be enabled. And inject a timer interrupt by
software if there is.
Cc: <stable@vger.kernel.org>
Fixes: f45ad5b8aa93 ("LoongArch: KVM: Implement vcpu interrupt operations").
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tao Cui <cuitao@kylinos.cn>
Date: Mon May 4 09:00:38 2026 +0800
LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()
commit f26faae96c411a70641e4d21b759475caa6122d5 upstream.
In the ldptr (0x24...0x27) opcode decoding path, the default case only
breaks out but without setting "ret" value to EMULATE_FAIL. This leaves
run->mmio.len uninitialized (stale from a previous MMIO operation) while
"ret" value remains EMULATE_DO_MMIO, causing the code to proceed with an
incorrect MMIO length.
Add "ret = EMULATE_FAIL" to match the other default branches in the same
function (e.g. the 0x28...0x2e and 0x38 cases).
Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bibo Mao <maobibo@loongson.cn>
Date: Mon May 4 09:00:48 2026 +0800
LoongArch: KVM: Move unconditional delay into timer clear scenery
commit 5a873d77ba792410a796595a917be6a440f9b7d2 upstream.
When timer interrupt arrives in guest kernel, guest kernel clears the
timer interrupt and program timer with the next incoming event.
During this stage, timer tick is -1 and timer interrupt status is
disabled in ESTAT register. KVM hypervisor need write zero with timer
tick register and wait timer interrupt injection from HW side, and
then clear timer interrupt.
So there is 2 cycle delay in KVM hypervisor to emulate such scenery,
and the delay is unnecessary if there is no need to clear the timer
interrupt.
Here move 2 cycle delay into timer clear scenery and add timer ESTAT
checking after delay, and set max timer expire value if timer interrupt
does not arrive still.
Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tao Cui <cuitao@kylinos.cn>
Date: Mon May 4 09:00:38 2026 +0800
LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()
commit 81e18777d61440511451866c7c80b34a8bdd6b33 upstream.
kvm_flush_pte() is the only caller that directly assigns *pte instead
of using the kvm_set_pte() wrapper. Use the wrapper for consistency with
the rest of the file.
No functional change intended.
Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Huacai Chen <chenhuacai@kernel.org>
Date: Mon May 4 09:00:20 2026 +0800
LoongArch: Use per-root-bridge PCIH flag to skip mem resource fixup
commit 49f33840dcc907d21313d369e34872880846b61c upstream.
When firmware enables 64-bit PCI host bridge support, some root bridges
already provide valid 64-bit mem resource windows through ACPI.
In this case, the LoongArch-specific mem resource high-bits fixup in
acpi_prepare_root_resources() should not be applied unconditionally.
Otherwise, the kernel may override the native resource layout derived
from firmware, and later BAR assignment can fail to place device BARs
into the intended 64-bit address space correctly.
Add a per-root-bridge ACPI flag, PCIH, and evaluate it from the current
root bridge device scope. When PCIH is set, skip the mem resource high-
bits fixup path and let the kernel use the firmware-provided resource
description directly. When PCIH is absent or cleared, keep the existing
behavior and continue filling the high address bits from the host bridge
address.
This makes the behavior per-root-bridge configurable and avoids breaking
valid 64-bit BAR space allocation on bridges whose 64-bit windows have
already been fully described by firmware.
Cc: stable@vger.kernel.org
Suggested-by: Chao Li <lichao@loongson.cn>
Tested-by: Dongyan Qian <qiandongyan@loongson.cn>
Signed-off-by: Dongyan Qian <qiandongyan@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Junrui Luo <moonafterrain@outlook.com>
Date: Thu Apr 16 11:39:56 2026 +0800
md/raid10: fix divide-by-zero in setup_geo() with zero far_copies
commit 9aa6d860b0930e2f72795665c42c44252a558a0c upstream.
setup_geo() extracts near_copies (nc) and far_copies (fc) from the
user-provided layout parameter without checking for zero. When fc=0
with the "improved" far set layout selected, 'geo->far_set_size =
disks / fc' triggers a divide-by-zero.
Validate nc and fc immediately after extraction, returning -1 if
either is zero.
Fixes: 475901aff158 ("MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)")
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://lore.kernel.org/linux-raid/SYBPR01MB7881A5E2556806CC1D318582AF232@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeongJae Park <sj@kernel.org>
Date: Sun Apr 19 09:10:01 2026 -0700
mm/damon/lru_sort: detect and use fresh enabled and kdamond_pid values
commit b98b7ff6025ae82570d4915e083f0cbd8d48b3cf upstream.
DAMON_LRU_SORT updates 'enabled' and 'kdamond_pid' parameter values, which
represents the running status of its kdamond, when the user explicitly
requests start/stop of the kdamond. The kdamond can, however, be stopped
in events other than the explicit user request in the following three
events.
1. ctx->regions_score_histogram allocation failure at beginning of the
execution,
2. damon_commit_ctx() failure due to invalid user input, and
3. damon_commit_ctx() failure due to its internal allocation failures.
Hence, if the kdamond is stopped by the above three events, the values of
the status parameters can be stale. Users could show the stale values and
be confused. This is already bad, but the real consequence is worse.
DAMON_LRU_SORT avoids unnecessary damon_start() and damon_stop() calls
based on the 'enabled' parameter value. And the update of 'enabled'
parameter value depends on the damon_start() and damon_stop() call
results. Hence, once the kdamond has stopped by the unintentional events,
the user cannot restart the kdamond before the system reboot. For
example, the issue can be reproduced via below steps.
# cd /sys/module/damon_lru_sort/parameters
#
# # start DAMON_LRU_SORT
# echo Y > enabled
# ps -ef | grep kdamond
root 806 2 0 17:53 ? 00:00:00 [kdamond.0]
root 808 803 0 17:53 pts/4 00:00:00 grep kdamond
#
# # commit wrong input to stop kdamond withou explicit stop request
# echo 3 > addr_unit
# echo Y > commit_inputs
bash: echo: write error: Invalid argument
#
# # confirm kdamond is stopped
# ps -ef | grep kdamond
root 811 803 0 17:53 pts/4 00:00:00 grep kdamond
#
# # users casn now show stable status
# cat enabled
Y
# cat kdamond_pid
806
#
# # even after fixing the wrong parameter,
# # kdamond cannot be restarted.
# echo 1 > addr_unit
# echo Y > enabled
# ps -ef | grep kdamond
root 815 803 0 17:54 pts/4 00:00:00 grep kdamond
The problem will only rarely happen in real and common setups for the
following reasons. The allocation failures are unlikely in such setups
since those allocations are arguably too small to fail. Also sane users
on real production environments may not commit wrong input parameters.
But once it happens, the consequence is quite bad. And the bug is a bug.
The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when those
are requested.
Link: https://lore.kernel.org/20260419161003.79176-3-sj@kernel.org
Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeongJae Park <sj@kernel.org>
Date: Sun Apr 19 09:10:00 2026 -0700
mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
commit 64a140afa5ed1c6f5ba6d451512cbdbbab1ba339 upstream.
Patch series "mm/damon/modules: detect and use fresh status", v3.
DAMON modules including DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT
commonly expose the kdamond running status via their parameters. Under
certain scenarios including wrong user inputs and memory allocation
failures, those parameter values can be stale. It can confuse users. For
DAMON_RECLAIM and DAMON_LRU_SORT, it even makes the kdamond unable to be
restarted before the system reboot.
The problem comes from the fact that there are multiple events for the
status changes and it is difficult to follow up all the scenarios. Fix
the issue by detecting and using the status on demand, instead of using a
cached status that is difficult to be updated.
Patches 1-3 fix the bugs in DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT
in the order.
This patch (of 3):
DAMON_RECLAIM updates 'enabled' and 'kdamond_pid' parameter values, which
represents the running status of its kdamond, when the user explicitly
requests start/stop of the kdamond. The kdamond can, however, be stopped
in events other than the explicit user request in the following three
events.
1. ctx->regions_score_histogram allocation failure at beginning of the
execution,
2. damon_commit_ctx() failure due to invalid user input, and
3. damon_commit_ctx() failure due to its internal allocation failures.
Hence, if the kdamond is stopped by the above three events, the values of
the status parameters can be stale. Users could show the stale values and
be confused. This is already bad, but the real consequence is worse.
DAMON_RECLAIM avoids unnecessary damon_start() and damon_stop() calls
based on the 'enabled' parameter value. And the update of 'enabled'
parameter value depends on the damon_start() and damon_stop() call
results. Hence, once the kdamond has stopped by the unintentional events,
the user cannot restart the kdamond before the system reboot. For
example, the issue can be reproduced via below steps.
# cd /sys/module/damon_reclaim/parameters
#
# # start DAMON_RECLAIM
# echo Y > enabled
# ps -ef | grep kdamond
root 806 2 0 17:53 ? 00:00:00 [kdamond.0]
root 808 803 0 17:53 pts/4 00:00:00 grep kdamond
#
# # commit wrong input to stop kdamond withou explicit stop request
# echo 3 > addr_unit
# echo Y > commit_inputs
bash: echo: write error: Invalid argument
#
# # confirm kdamond is stopped
# ps -ef | grep kdamond
root 811 803 0 17:53 pts/4 00:00:00 grep kdamond
#
# # users casn now show stable status
# cat enabled
Y
# cat kdamond_pid
806
#
# # even after fixing the wrong parameter,
# # kdamond cannot be restarted.
# echo 1 > addr_unit
# echo Y > enabled
# ps -ef | grep kdamond
root 815 803 0 17:54 pts/4 00:00:00 grep kdamond
The problem will only rarely happen in real and common setups for the
following reasons. The allocation failures are unlikely in such setups
since those allocations are arguably too small to fail. Also sane users
on real production environments may not commit wrong input parameters.
But once it happens, the consequence is quite bad. And the bug is a bug.
The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when those
are requested.
Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org
Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org
Fixes: e035c280f6df ("mm/damon/reclaim: support online inputs update")
Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.19.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeongJae Park <sj@kernel.org>
Date: Sun Apr 19 09:10:02 2026 -0700
mm/damon/stat: detect and use fresh enabled value
commit f98590bc08d4aea435e1c2213e38bae0d9e9a7bb upstream.
DAMON_STAT updates 'enabled' parameter value, which represents the running
status of its kdamond, when the user explicitly requests start/stop of the
kdamond. The kdamond can, however, be stopped even if the user explicitly
requested the stop, if ctx->regions_score_histogram allocation failure at
beginning of the execution of the kdamond. Hence, if the kdamond is
stopped by the allocation failure, the value of the parameter can be
stale.
Users could show the stale value and be confused. The problem will only
rarely happen in real and common setups because the allocation is arguably
too small to fail. Also, unlike the similar bugs that are now fixed in
DAMON_RECLAIM and DAMON_LRU_SORT, kdamond can be restarted in this case,
because DAMON_STAT force-updates the enabled parameter value for user
inputs. The bug is a bug, though.
The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when those
are requested.
The issue was dicovered [1] by Sashiko.
Link: https://lore.kernel.org/20260419161003.79176-4-sj@kernel.org
Link: https://lore.kernel.org/20260416040602.88665-1-sj@kernel.org [1]
Fixes: 369c415e6073 ("mm/damon: introduce DAMON_STAT module")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Liew Rui Yan <aethernet65535@gmail.com>
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeongJae Park <sj@kernel.org>
Date: Thu Apr 23 08:02:51 2026 -0700
mm/damon/sysfs-schemes: protect memcg_path kfree() with damon_sysfs_lock
commit 1e68eb96e8beb1abefd12dd22c5637795d8a877e upstream.
Patch series "mm/damon/sysfs-schemes: fix use-after-free for [memcg_]path".
Reads of 'memcg_path' and 'path' files in DAMON sysfs interface could race
with their writes, results in use-after-free. Fix those.
This patch (of 2):
damon_sysfs_scheme_filter->mmecg_path can be read and written by users,
via DAMON sysfs memcg_path file. It can also be indirectly read, for the
parameters {on,off}line committing to DAMON. The reads for parameters
committing are protected by damon_sysfs_lock to avoid the sysfs files
being destroyed while any of the parameters are being read. But the
user-driven direct reads and writes are not protected by any lock, while
the write is deallocating the memcg_path-pointing buffer. As a result,
the readers could read the already freed buffer (user-after-free). Note
that the user-reads don't race when the same open file is used by the
writer, due to kernfs's open file locking. Nonetheless, doing the reads
and writes with separate open files would be common. Fix it by protecting
both the user-direct reads and writes with damon_sysfs_lock.
Link: https://lore.kernel.org/20260423150253.111520-1-sj@kernel.org
Link: https://lore.kernel.org/20260423150253.111520-2-sj@kernel.org
Fixes: 4f489fe6afb3 ("mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write")
Co-developed-by: Junxi Qian <qjx1298677004@gmail.com>
Signed-off-by: Junxi Qian <qjx1298677004@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeongJae Park <sj@kernel.org>
Date: Thu Apr 23 08:02:52 2026 -0700
mm/damon/sysfs-schemes: protect path kfree() with damon_sysfs_lock
commit cf3b71421ca00807328c6d9cd242f9de3b77a4bf upstream.
damon_sysfs_quot_goal->path can be read and written by users, via DAMON
sysfs 'path' file. It can also be indirectly read, for the parameters
{on,off}line committing to DAMON. The reads for parameters committing are
protected by damon_sysfs_lock to avoid the sysfs files being destroyed
while any of the parameters are being read. But the user-driven direct
reads and writes are not protected by any lock, while the write is
deallocating the path-pointing buffer. As a result, the readers could
read the already freed buffer (user-after-free). Note that the user-reads
don't race when the same open file is used by the writer, due to kernfs's
open file locking. Nonetheless, doing the reads and writes with separate
open files would be common. Fix it by protecting both the user-direct
reads and writes with damon_sysfs_lock.
Link: https://lore.kernel.org/20260423150253.111520-3-sj@kernel.org
Fixes: c41e253a411e ("mm/damon/sysfs-schemes: implement path file under quota goal directory")
Co-developed-by: Junxi Qian <qjx1298677004@gmail.com>
Signed-off-by: Junxi Qian <qjx1298677004@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.19.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sang-Heon Jeon <ekffu200098@gmail.com>
Date: Wed Apr 22 23:33:53 2026 +0900
mm/hugetlb_cma: round up per_node before logging it
commit 8f5ce56b76303c55b78a87af996e2e0f8535f979 upstream.
When the user requests a total hugetlb CMA size without per-node
specification, hugetlb_cma_reserve() computes per_node from
hugetlb_cma_size and the number of nodes that have memory
per_node = DIV_ROUND_UP(hugetlb_cma_size,
nodes_weight(hugetlb_bootmem_nodes));
The reservation loop later computes
size = round_up(min(per_node, hugetlb_cma_size - reserved),
PAGE_SIZE << order);
So the actually reserved per_node size is multiple of (PAGE_SIZE <<
order), but the logged per_node is not rounded up, so it may be smaller
than the actual reserved size.
For example, as the existing comment describes, if a 3 GB area is
requested on a machine with 4 NUMA nodes that have memory, 1 GB is
allocated on the first three nodes, but the printed log is
hugetlb_cma: reserve 3072 MiB, up to 768 MiB per node
Round per_node up to (PAGE_SIZE << order) before logging so that the
printed log always matches the actual reserved size. No functional change
to the actual reservation size, as the following case analysis shows
1. remaining (hugetlb_cma_size - reserved) >= rounded per_node
- AS-IS: min() picks unrounded per_node;
round_up() returns rounded per_node
- TO-BE: min() picks rounded per_node;
round_up() returns rounded per_node (no-op)
2. remaining < unrounded per_node
- AS-IS: min() picks remaining;
round_up() returns round_up(remaining)
- TO-BE: min() picks remaining;
round_up() returns round_up(remaining)
3. unrounded per_node <= remaining < rounded per_node
- AS-IS: min() picks unrounded per_node;
round_up() returns rounded per_node
- TO-BE: min() picks remaining;
round_up() returns round_up(remaining) equals rounded per_node
Link: https://lore.kernel.org/20260422143353.852257-1-ekffu200098@gmail.com
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") # 5.7
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Avri Altman <avri.altman@sandisk.com>
Date: Tue May 5 06:02:49 2026 -0400
mmc: core: Add quirk for incorrect manufacturing date
[ Upstream commit 263ff314cc5602599d481b0912a381555fcbad28 ]
Some eMMC vendors need to report manufacturing dates beyond 2025 but are
reluctant to update the EXT_CSD revision from 8 to 9. Changing the
Updating the EXT_CSD revision may involve additional testing or
qualification steps with customers. To ease this transition and avoid a
full re-qualification process, a workaround is needed. This
patch introduces a temporary quirk that re-purposes the year codes
corresponding to 2010, 2011, and 2012 to represent the years 2026, 2027,
and 2028, respectively. This solution is only valid for this three-year
period.
After 2028, vendors must update their firmware to set EXT_CSD_REV=9 to
continue reporting the correct manufacturing date in compliance with the
JEDEC standard.
The `MMC_QUIRK_BROKEN_MDT` is introduced and enabled for all Sandisk
devices to handle this behavior.
Signed-off-by: Avri Altman <avri.altman@sandisk.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Stable-dep-of: d6bf2e64dec8 ("mmc: core: Optimize time for secure erase/trim for some Kingston eMMCs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Avri Altman <avri.altman@sandisk.com>
Date: Tue May 5 06:02:48 2026 -0400
mmc: core: Adjust MDT beyond 2025
[ Upstream commit 3e487a634bc019166e452ea276f7522710eda9f4 ]
JEDEC JESD84-B51B which was released in September 2025, increases the
manufacturing year limit for eMMC devices. The eMMC manufacturing year
is stored in a 4-bit field in the CID register. Originally, it covered
1997–2012. Later, with EXT_CSD_REV=8, it was extended up to 2025. Now,
with EXT_CSD_REV=9, the range is rolled over by another 16 years, up to
2038.
The mapping is as follows:
cid[8..11] | rev ≤ 4 | 8 ≥ rev > 4 | rev > 8
---------------------------------------------
0 | 1997 | 2013 | 2029
1 | 1998 | 2014 | 2030
2 | 1999 | 2015 | 2031
3 | 2000 | 2016 | 2032
4 | 2001 | 2017 | 2033
5 | 2002 | 2018 | 2034
6 | 2003 | 2019 | 2035
7 | 2004 | 2020 | 2036
8 | 2005 | 2021 | 2037
9 | 2006 | 2022 | 2038
10 | 2007 | 2023 |
11 | 2008 | 2024 |
12 | 2009 | 2025 |
13 | 2010 | | 2026
14 | 2011 | | 2027
15 | 2012 | | 2028
Signed-off-by: Avri Altman <avri.altman@sandisk.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Stable-dep-of: d6bf2e64dec8 ("mmc: core: Optimize time for secure erase/trim for some Kingston eMMCs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Luke Wang <ziniu.wang_1@nxp.com>
Date: Tue May 5 06:02:50 2026 -0400
mmc: core: Optimize time for secure erase/trim for some Kingston eMMCs
[ Upstream commit d6bf2e64dec87322f2b11565ddb59c0e967f96e3 ]
Kingston eMMC IY2964 and IB2932 takes a fixed ~2 seconds for each secure
erase/trim operation regardless of size - that is, a single secure
erase/trim operation of 1MB takes the same time as 1GB. With default
calculated 3.5MB max discard size, secure erase 1GB requires ~300 separate
operations taking ~10 minutes total.
Add a card quirk, MMC_QUIRK_FIXED_SECURE_ERASE_TRIM_TIME, to set maximum
secure erase size for those devices. This allows 1GB secure erase to
complete in a single operation, reducing time from 10 minutes to just 2
seconds.
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Mon Apr 27 21:54:35 2026 +0200
mptcp: fastclose msk when linger time is 0
commit f14d6e9c3678a067f304abba561e0c5446c7e845 upstream.
The SO_LINGER socket option has been supported for a while with MPTCP
sockets [1], but it didn't cause the equivalent of a TCP reset as
expected when enabled and its time was set to 0. This was causing some
behavioural differences with TCP where some connections were not
promptly stopped as expected.
To fix that, an extra condition is checked at close() time before
sending an MP_FASTCLOSE, the MPTCP equivalent of a TCP reset.
Note that backporting up to [1] will be difficult as more changes are
needed to be able to send MP_FASTCLOSE. It seems better to stop at [2],
which was supposed to already imitate TCP.
Validated with MPTCP packetdrill tests [3].
Fixes: 268b12387460 ("mptcp: setsockopt: support SO_LINGER") [1]
Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") [2]
Cc: stable@vger.kernel.org
Reported-by: Lance Tuller <lance@lance0.com>
Closes: https://github.com/lance0/xfr/pull/67
Link: https://github.com/multipath-tcp/packetdrill/pull/196 [3]
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-3-7432b7f279fa@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Paolo Abeni <pabeni@redhat.com>
Date: Fri May 1 21:35:36 2026 +0200
mptcp: fix rx timestamp corruption on fastopen
commit 6254a16d6f0c672e3809ca5d7c9a28a55d71f764 upstream.
The skb cb offset containing the timestamp presence flag is cleared
before loading such information. Cache such value before MPTCP CB
initialization.
Fixes: 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-3-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Gang Yan <yangang@kylinos.cn>
Date: Mon Apr 27 21:54:34 2026 +0200
mptcp: fix scheduling with atomic in timestamp sockopt
commit b5c52908d52c6c8eb8933264aa6087a0600fd892 upstream.
Using lock_sock_fast() (atomic context) around sock_set_timestamp()
and sock_set_timestamping() is unsafe, as both helpers can sleep.
Replace lock_sock_fast() with sleepable lock_sock()/release_sock()
to avoid scheduling while atomic panic.
Fixes: 9061f24bf82e ("mptcp: sockopt: propagate timestamp request to subflows")
Cc: stable@vger.kernel.org
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260420093343.16443-1-gang.yan@linux.dev
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-2-7432b7f279fa@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:50 2026 +0200
mptcp: pm: ADD_ADDR rtx: allow ID 0
commit 03f324f3f1f7619a47b9c91282cb12775ab0a2f1 upstream.
ADD_ADDR can be sent for the ID 0, which corresponds to the local
address and port linked to the initial subflow.
Indeed, this address could be removed, and re-added later on, e.g. what
is done in the "delete re-add signal" MPTCP Join selftests. So no reason
to ignore it.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-2-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:52 2026 +0200
mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
commit 9634cb35af17019baec21ca648516ce376fa10e6 upstream.
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer().
It should then be released in all cases at the end.
Some (unlikely) checks were returning directly instead of calling
sock_put() to decrease the refcount. Jump to a new 'exit' label to call
__sock_put() (which will become sock_put() in the next commit) to fix
this potential leak.
While at it, drop the '!msk' check which cannot happen because it is
never reset, and explicitly mark the remaining one as "unlikely".
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-4-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:51 2026 +0200
mptcp: pm: ADD_ADDR rtx: fix potential data-race
commit 5cd6e0ad79d2615264f63929f8b457ad97ae550d upstream.
This mptcp_pm_add_timer() helper is executed as a timer callback in
softirq context. To avoid any data races, the socket lock needs to be
held with bh_lock_sock().
If the socket is in use, retry again soon after, similar to what is done
with the keepalive timer.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-3-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:53 2026 +0200
mptcp: pm: ADD_ADDR rtx: free sk if last
commit b7b9a461569734d33d3259d58d2507adfac107ed upstream.
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(),
and released at the end.
If at that moment, it was the last reference being held, the sk would
not be freed. sock_put() should then be called instead of __sock_put().
But that's not enough: if it is the last reference, sock_put() will call
sk_free(), which will end up calling sk_stop_timer_sync() on the same
timer, and waiting indefinitely to finish. So it is needed to mark that
the timer is done at the end of the timer handler when it has not been
rescheduled, not to call sk_stop_timer_sync() on "itself".
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-5-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:54 2026 +0200
mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
commit 3cf12492891c4b5ff54dda404a2de4ec54c9e1b5 upstream.
When an ADD_ADDR needs to be retransmitted and another one has already
been prepared -- e.g. multiple ADD_ADDRs have been sent in a row and
need to be retransmitted later -- this additional retransmission will
need to wait.
In this case, the timer was reset to TCP_RTO_MAX / 8, which is ~15
seconds. This delay is unnecessary long: it should just be rescheduled
at the next opportunity, e.g. after the retransmission timeout.
Without this modification, some issues can be seen from time to time in
the selftests when multiple ADD_ADDRs are sent, and the host takes time
to process them, e.g. the "signal addresses, ADD_ADDR timeout" MPTCP
Join selftest, especially with a debug kernel config.
Note that on older kernels, 'timeout' is not available. It should be
enough to replace it by one second (HZ).
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-6-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:56 2026 +0200
mptcp: pm: ADD_ADDR rtx: return early if no retrans
commit 62a9b19dce77e72426f049fb99b9d1d032b9a8ea upstream.
No need to iterate over all subflows if there is no retransmission
needed.
Exit early in this case then.
Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-8-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:55 2026 +0200
mptcp: pm: ADD_ADDR rtx: skip inactive subflows
commit c6d395e2de1306b5fef0344a3c3835fbbfaa18be upstream.
When looking at the maximum RTO amongst the subflows, inactive subflows
were taken into account: that includes stale ones, and the initial one
if it has been already been closed.
Unusable subflows are now simply skipped. Stale ones are used as an
alternative: if there are only stale ones, to take their maximum RTO and
avoid to eventually fallback to net.mptcp.add_addr_timeout, which is set
to 2 minutes by default.
Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-7-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:49 2026 +0200
mptcp: pm: kernel: correctly retransmit ADD_ADDR ID 0
commit b12014d2d36eaed4e4bec5f1ac7e91110eeb100d upstream.
When adding the ADD_ADDR to the list, the address including the IP, port
and ID are copied. On the other hand, when the endpoint corresponds to
the one from the initial subflow, the ID is set to 0, as specified by
the MPTCP protocol.
The issue is that the ID was reset after having copied the ID in the
ADD_ADDR entry. So the retransmission was done, but using a different ID
than the initial one.
Fixes: 8b8ed1b429f8 ("mptcp: pm: reuse ID 0 after delete and re-add")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-1-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Mon Apr 27 21:54:36 2026 +0200
mptcp: pm: kernel: reset fullmesh counter after flush
commit 1774d3cf3cf17baaf30c095606cda496268283b3 upstream.
This variable counts how many MPTCP endpoints have a 'fullmesh' flag
set. After having flushed all MPTCP endpoints, it is then needed to
reset this counter.
Without this reset, this counter exposed to the userspace is wrong, but
also non-fullmesh endpoints added after the flush will not be taken into
account to create subflows in reaction to ADD_ADDRs.
Fixes: f88191c7f361 ("mptcp: pm: in-kernel: record fullmesh endp nb")
Cc: stable@vger.kernel.org
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260422-mptcp-inc-limits-v6-0-903181771530%40kernel.org?part=15
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-4-7432b7f279fa@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:57 2026 +0200
mptcp: pm: prio: skip closed subflows
commit 166b78344031bf7ac9f55cb5282776cfd85f220e upstream.
When sending an MP_PRIO, closed subflows need to be skipped.
This fixes the case where the initial subflow got closed, re-opened
later, then an MP_PRIO is needed for the same local address.
Note that explicit MP_PRIO cannot be sent during the 3WHS, so it is fine
to use __mptcp_subflow_active().
Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support")
Cc: stable@vger.kernel.org
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-9-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri May 1 21:35:37 2026 +0200
mptcp: sockopt: increase seq in mptcp_setsockopt_all_sf
commit 70ece9d7021c54cf40c72b31b066e9088f5f75f5 upstream.
mptcp_setsockopt_all_sf() was missing a call to sockopt_seq_inc(). This
is required not to cause missing synchronization for newer subflows
created later on.
This helper is called each time a socket option is set on subflows, and
future ones will need to inherit this option after their creation.
Fixes: 51c5fd09e1b4 ("mptcp: add TCP_MAXSEG sockopt support")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-4-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Gang Yan <yangang@kylinos.cn>
Date: Mon Apr 27 21:54:33 2026 +0200
mptcp: sockopt: set timestamp flags on subflow socket, not msk
commit 5f95c21fc23a7ef22b4d27d1ed9bb55557ffb926 upstream.
Both mptcp_setsockopt_sol_socket_tstamp() and
mptcp_setsockopt_sol_socket_timestamping() iterate over subflows,
acquire the subflow socket lock, but then erroneously pass the MPTCP
msk socket to sock_set_timestamp() / sock_set_timestamping() instead
of the subflow ssk. As a result, the timestamp flags are set on the
wrong socket and have no effect on the actual subflows.
Pass ssk instead of sk to both helpers.
Fixes: 9061f24bf82e ("mptcp: sockopt: propagate timestamp request to subflows")
Cc: stable@vger.kernel.org
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-1-7432b7f279fa@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shardul Bankar <shardul.b@mpiricsoftware.com>
Date: Fri May 1 21:35:34 2026 +0200
mptcp: use MPJoinSynAckHMacFailure for SynAck HMAC failure
commit c4a99a921949cddc590b22bb14eeb23dffcc3ba6 upstream.
In subflow_finish_connect(), HMAC validation of the server's HMAC
in SYN/ACK + MP_JOIN increments MPTCP_MIB_JOINACKMAC ("HMAC was
wrong on ACK + MP_JOIN") on failure. The function processes the
SYN/ACK, not the ACK; the matching MPTCP_MIB_JOINSYNACKMAC counter
("HMAC was wrong on SYN/ACK + MP_JOIN") exists but is not
incremented anywhere in the tree.
The mirror site on the server, subflow_syn_recv_sock(), already
uses JOINACKMAC correctly for ACK HMAC failure. Use JOINSYNACKMAC
at the SYN/ACK validation site so each counter reflects the packet
whose HMAC actually failed.
Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable@vger.kernel.org
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-1-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shardul Bankar <shardul.b@mpiricsoftware.com>
Date: Fri May 1 21:35:35 2026 +0200
mptcp: use MPTCP_RST_EMPTCP for ACK HMAC validation failure
commit a6da02d4c00fdda2417e42ad2b762a9209e6cc49 upstream.
When HMAC validation fails on a received ACK + MP_JOIN in
subflow_syn_recv_sock(), the subflow is reset with reason
MPTCP_RST_EPROHIBIT ("Administratively prohibited"). This is
incorrect: HMAC validation failure is an MPTCP protocol-level
error, not an administrative policy denial.
The mirror site on the client, in subflow_finish_connect(), already
uses MPTCP_RST_EMPTCP ("MPTCP-specific error") for the same kind of
HMAC failure on the SYN/ACK + MP_JOIN. Use the same reason on the
server side for symmetry and accuracy.
Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Fixes: 443041deb5ef ("mptcp: fix NULL pointer in can_accept_new_subflow")
Cc: stable@vger.kernel.org
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-2-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tudor Ambarus <tudor.ambarus@linaro.org>
Date: Fri Apr 17 15:24:39 2026 +0000
mtd: spi-nor: debugfs: fix out-of-bounds read in spi_nor_params_show()
commit e47029b977e747cb3a9174308fd55762cce70147 upstream.
Sashiko noticed an out-of-bounds read [1].
In spi_nor_params_show(), the snor_f_names array is passed to
spi_nor_print_flags() using sizeof(snor_f_names).
Since snor_f_names is an array of pointers, sizeof() returns the total
number of bytes occupied by the pointers
(element_count * sizeof(void *))
rather than the element count itself. On 64-bit systems, this makes the
passed length 8x larger than intended.
Inside spi_nor_print_flags(), the 'names_len' argument is used to
bounds-check the 'names' array access. An out-of-bounds read occurs
if a flag bit is set that exceeds the array's actual element count
but is within the inflated byte-size count.
Correct this by using ARRAY_SIZE() to pass the actual number of
string pointers in the array.
Cc: stable@vger.kernel.org
Fixes: 0257be79fc4a ("mtd: spi-nor: expose internal parameters via debugfs")
Closes: https://sashiko.dev/#/patchset/20260417-die-erase-fix-v2-1-73bb7004ebad%40infineon.com [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Takahiro Kuwano <takahiro.kuwano@infineon.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Nan Li <tonanli66@gmail.com>
Date: Fri May 1 09:08:44 2026 +0800
net/rds: handle zerocopy send cleanup before the message is queued
commit 44b550d88b267320459d518c0743a241ab2108fa upstream.
A zerocopy send can fail after user pages have been pinned but before
the message is attached to the sending socket.
The purge path currently infers zerocopy state from rm->m_rs, so an
unqueued message can be cleaned up as if it owned normal payload pages.
However, zerocopy ownership is really determined by the presence of
op_mmp_znotifier, regardless of whether the message has reached the
socket queue.
Capture op_mmp_znotifier up front in rds_message_purge() and use it as
the cleanup discriminator. If the message is already associated with a
socket, keep the existing completion path. Otherwise, drop the pinned
page accounting directly and release the notifier before putting the
payload pages.
This keeps early send failure cleanup consistent with the zerocopy
lifetime rules without changing the normal queued completion path.
Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Co-developed-by: Xiao Liu <lx24@stu.ynu.edu.cn>
Signed-off-by: Xiao Liu <lx24@stu.ynu.edu.cn>
Signed-off-by: Nan Li <tonanli66@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/d2ea98a6313d5467bac00f7c9fef8c7acddb9258.1777550074.git.tonanli66@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Thu Apr 30 11:29:55 2026 -0400
net/sched: sch_red: Replace direct dequeue call with peek and qdisc_dequeue_peeked
commit 458d5615272d3de535748342eb68ca492343048c upstream.
When red qdisc has children (eg qfq qdisc) whose peek() callback is
qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such
qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from
its child (red in this case), it will do the following:
1a. do a peek() - and when sensing there's an skb the child can offer, then
- the child in this case(red) calls its child's (qfq) peek.
qfq does the right thing and will return the gso_skb queue packet.
Note: if there wasnt a gso_skb entry then qfq will store it there.
1b. invoke a dequeue() on the child (red). And herein lies the problem.
- red will call the child's dequeue() which will essentially just
try to grab something of qfq's queue.
[ 78.667668][ T363] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 78.667927][ T363] CPU: 1 UID: 0 PID: 363 Comm: ping Not tainted 7.1.0-rc1-00033-g46f74a3f7d57-dirty #790 PREEMPT(full)
[ 78.668263][ T363] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 78.668486][ T363] RIP: 0010:qfq_dequeue+0x446/0xc90 [sch_qfq]
[ 78.668718][ T363] Code: 54 c0 e8 dd 90 00 f1 48 c7 c7 e0 03 54 c0 48 89 de e8 ce 90 00 f1 48 8d 7b 48 b8 ff ff 37 00 48 89 fa 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 05 e8 ef a1 e1 f1 48 8b 7b 48 48 8d 54 24 58 48 8d
[ 78.669312][ T363] RSP: 0018:ffff88810de573e0 EFLAGS: 00010216
[ 78.669533][ T363] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 78.669790][ T363] RDX: 0000000000000009 RSI: 0000000000000004 RDI: 0000000000000048
[ 78.670044][ T363] RBP: ffff888110dc4000 R08: ffffffffb1b0885a R09: fffffbfff6ba9078
[ 78.670297][ T363] R10: 0000000000000003 R11: ffff888110e31c80 R12: 0000001880000000
[ 78.670560][ T363] R13: ffff888110dc4150 R14: ffff888110dc42b8 R15: 0000000000000200
[ 78.670814][ T363] FS: 00007f66a8f09c40(0000) GS:ffff888163428000(0000) knlGS:0000000000000000
[ 78.671110][ T363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.671324][ T363] CR2: 000055db4c6a30a8 CR3: 000000010da67000 CR4: 0000000000750ef0
[ 78.671585][ T363] PKRU: 55555554
[ 78.671713][ T363] Call Trace:
[ 78.671843][ T363] <TASK>
[ 78.671936][ T363] ? __pfx_qfq_dequeue+0x10/0x10 [sch_qfq]
[ 78.672148][ T363] ? __pfx__printk+0x10/0x10
[ 78.672322][ T363] ? srso_alias_return_thunk+0x5/0xfbef5
[ 78.672496][ T363] ? lockdep_hardirqs_on_prepare+0xa8/0x1a0
[ 78.672706][ T363] ? srso_alias_return_thunk+0x5/0xfbef5
[ 78.672875][ T363] ? trace_hardirqs_on+0x19/0x1a0
[ 78.673047][ T363] red_dequeue+0x65/0x270 [sch_red]
[ 78.673217][ T363] ? srso_alias_return_thunk+0x5/0xfbef5
[ 78.673385][ T363] tbf_dequeue.cold+0xb0/0x70c [sch_tbf]
[ 78.673566][ T363] __qdisc_run+0x169/0x1900
The right thing to do in #1b is to grab the skb off gso_skb queue.
This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked()
method instead.
Fixes: 77be155cba4e ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.")
Reported-by: Manas <ghandatmanas@gmail.com>
Reported-by: Rakshit Awasthi <rakshitawasthi17@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430152957.194015-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jiawen Wu <jiawenwu@trustnetic.com>
Date: Wed Apr 29 16:37:42 2026 +0800
net: libwx: fix VF illegal register access
commit 694de316f607fe2473d52ca0707e3918e72c1562 upstream.
Register WX_CFG_PORT_ST is a PF restricted register. When a VF is
initialized, attempting to read this register triggers an illegal
register access, which lead to a system hang.
When the device is VF, the bus function ID can be obtained directly from
the PCI_FUNC(pdev->devfn).
Fixes: a04ea57aae37 ("net: libwx: fix device bus LAN ID")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/4D1F4452D21DE107+20260429083743.88961-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jiawen Wu <jiawenwu@trustnetic.com>
Date: Wed Apr 29 16:37:43 2026 +0800
net: libwx: use request_irq for VF misc interrupt
commit 7a33345153eeeda195c55f15be27074e4c3b5109 upstream.
Currently, request_threaded_irq() is used with a primary handler but a
NULL threaded handler, while also setting the IRQF_ONESHOT flag. This
specific combination triggers a WARNING since the commit aef30c8d569c
("genirq: Warn about using IRQF_ONESHOT without a threaded handler").
WARNING: kernel/irq/manage.c:1502 at __setup_irq+0x4fa/0x760
Fix the issue by switching to request_irq(), which is the appropriate
interface or a non-threaded interrupt handler, and removing the
unnecessary IRQF_ONESHOT flag.
Fixes: eb4898fde1de ("net: libwx: add wangxun vf common api")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/786DDC7D5CCA6D0A+20260429083743.88961-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Kai Zen <kai.aizen.dev@gmail.com>
Date: Thu Apr 30 18:26:48 2026 +0300
net: rtnetlink: zero ifla_vf_broadcast to avoid stack infoleak in rtnl_fill_vfinfo
commit 4b9e327991815e128ad3af75c3a04630a63ce3e0 upstream.
rtnl_fill_vfinfo() declares struct ifla_vf_broadcast on the stack
without initialisation:
struct ifla_vf_broadcast vf_broadcast;
The struct contains a single fixed 32-byte field:
/* include/uapi/linux/if_link.h */
struct ifla_vf_broadcast {
__u8 broadcast[32];
};
The function then copies dev->broadcast into it using dev->addr_len
as the length:
memcpy(vf_broadcast.broadcast, dev->broadcast, dev->addr_len);
On Ethernet devices (the overwhelming majority of SR-IOV NICs)
dev->addr_len is 6, so only the first 6 bytes of broadcast[] are
written. The remaining 26 bytes retain whatever was previously on
the kernel stack. The full struct is then handed to userspace via:
nla_put(skb, IFLA_VF_BROADCAST,
sizeof(vf_broadcast), &vf_broadcast)
leaking up to 26 bytes of uninitialised kernel stack per VF per
RTM_GETLINK request, repeatable.
The other vf_* structs in the same function are explicitly zeroed
for exactly this reason - see the memset() calls for ivi,
vf_vlan_info, node_guid and port_guid a few lines above.
vf_broadcast was simply missed when it was added.
Reachability: any unprivileged local process can open AF_NETLINK /
NETLINK_ROUTE without capabilities and send RTM_GETLINK with an
IFLA_EXT_MASK attribute carrying RTEXT_FILTER_VF. The kernel walks
each VF and emits IFLA_VF_BROADCAST, leaking 26 bytes of stack per
VF per request. Stack residue at this call site can include return
addresses and transient sensitive data; KASAN with stack
instrumentation, or KMSAN, will flag the nla_put() when reproduced.
Zero the on-stack struct before the partial memcpy, matching the
existing pattern used for the other vf_* structs in the same
function.
Fixes: 75345f888f70 ("ipoib: show VF broadcast address")
Cc: stable@vger.kernel.org
Signed-off-by: Kai Zen <kai.aizen.dev@gmail.com>
Link: https://patch.msgid.link/3c506e8f936e52b57620269b55c348af05d413a2.1777557228.git.kai.aizen.dev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sam Edwards <cfsworks@gmail.com>
Date: Sun May 10 09:37:57 2026 -0400
net: stmmac: Prevent NULL deref when RX memory exhausted
[ Upstream commit 0bb05e6adfa99a2ea1fee1125cc0953409f83ed8 ]
The CPU receives frames from the MAC through conventional DMA: the CPU
allocates buffers for the MAC, then the MAC fills them and returns
ownership to the CPU. For each hardware RX queue, the CPU and MAC
coordinate through a shared ring array of DMA descriptors: one
descriptor per DMA buffer. Each descriptor includes the buffer's
physical address and a status flag ("OWN") indicating which side owns
the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set
the flag and the MAC is only allowed to clear it, and both must move
through the ring in sequence: thus the ring is used for both
"submissions" and "completions."
In the stmmac driver, stmmac_rx() bookmarks its position in the ring
with the `cur_rx` index. The main receive loop in that function checks
for rx_descs[cur_rx].own=0, gives the corresponding buffer to the
network stack (NULLing the pointer), and increments `cur_rx` modulo the
ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its
position with `dirty_rx`, allocates fresh buffers and rearms the
descriptors (setting OWN=1). If it fails any allocation, it simply stops
early (leaving OWN=0) and will retry where it left off when next called.
This means descriptors have a three-stage lifecycle (terms my own):
- `empty` (OWN=1, buffer valid)
- `full` (OWN=0, buffer valid and populated)
- `dirty` (OWN=0, buffer NULL)
But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In
the past (see 'Fixes:'), there was a bug where the loop could cycle
`cur_rx` all the way back to the first descriptor it dirtied, resulting
in a NULL dereference when mistaken for `full`. The aforementioned
commit resolved that *specific* failure by capping the loop's iteration
limit at `dma_rx_size - 1`, but this is only a partial fix: if the
previous stmmac_rx_refill() didn't complete, then there are leftover
`dirty` descriptors that the loop might encounter without needing to
cycle fully around. The current code therefore panics (see 'Closes:')
when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to
catch up to `dirty_rx`.
Fix this by explicitly checking, before advancing `cur_rx`, if the next
entry is dirty; exit the loop if so. This prevents processing of the
final, used descriptor until stmmac_rx_refill() succeeds, but
fully prevents the `cur_rx == dirty_rx` ambiguity as the previous bugfix
intended: so remove the clamp as well. Since stmmac_rx_zc() is a
copy-paste-and-tweak of stmmac_rx() and the code structure is identical,
any fix to stmmac_rx() will also need a corresponding fix for
stmmac_rx_zc(). Therefore, apply the same check there.
In stmmac_rx() (not stmmac_rx_zc()), a related bug remains: after the
MAC sets OWN=0 on the final descriptor, it will be unable to send any
further DMA-complete IRQs until it's given more `empty` descriptors.
Currently, the driver simply *hopes* that the next stmmac_rx_refill()
succeeds, risking an indefinite stall of the receive process if not. But
this is not a regression, so it can be addressed in a future change.
Fixes: b6cb4541853c7 ("net: stmmac: avoid rx queue overrun")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221010
Cc: stable@vger.kernel.org
Suggested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Link: https://patch.msgid.link/20260422044503.5349-1-CFSworks@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Date: Sun May 10 09:37:56 2026 -0400
net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()
[ Upstream commit 6b4286e0550814cdc4b897f881ec1fa8b0313227 ]
STMMAC_GET_ENTRY() doesn't describe what this macro is doing - it is
incrementing the provided index for the circular array of descriptors.
Replace "GET" with "NEXT" as this better describes the action here.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w2vba-0000000DbWo-1oL5@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 0bb05e6adfa9 ("net: stmmac: Prevent NULL deref when RX memory exhausted")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Pavitra Jha <jhapavitra98@gmail.com>
Date: Fri May 1 07:07:12 2026 -0400
net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
commit 0e7c074cfcd9bd93765505f9eb8b42f03ed2a744 upstream.
t7xx_port_enum_msg_handler() uses the modem-supplied port_count field as
a loop bound over port_msg->data[] without checking that the message buffer
contains sufficient data. A modem sending port_count=65535 in a 12-byte
buffer triggers a slab-out-of-bounds read of up to 262140 bytes.
Add a sizeof(*port_msg) check before accessing the port message header
fields to guard against undersized messages.
Add a struct_size() check after extracting port_count and before the loop.
In t7xx_parse_host_rt_data(), guard the rt_feature header read with a
remaining-buffer check before accessing data_len, validate feat_data_len
against the actual remaining buffer to prevent OOB reads and signed
integer overflow on offset.
Pass msg_len from both call sites: skb->len at the DPMAIF path after
skb_pull(), and the validated feat_data_len at the handshake path.
Fixes: da45d2566a1d ("net: wwan: t7xx: Add control port")
Cc: stable@vger.kernel.org
Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>
Link: https://patch.msgid.link/20260501110713.145563-1-jhapavitra98@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Breno Leitao <leitao@debian.org>
Date: Fri May 1 02:58:41 2026 -0700
netpoll: pass buffer size to egress_dev() to avoid MAC truncation
commit 76b93a8107574006b25495664304ea9237494d70 upstream.
egress_dev() formats np->dev_mac via snprintf() but receives buf as
a bare char *, so it cannot derive the buffer size from the pointer. The
size argument was hardcoded to MAC_ADDR_STR_LEN (3 * ETH_ALEN - 1 = 17),
which is silly wrong in two ways:
1) misleading kernel log output on the MAC-selected target path
(np->dev_name[0] == '\0'); for example "aa:bb:cc:dd:ee:ff doesn't
exist, aborting" was logged as "aa:bb:cc:dd:ee:f doesn't exist,
aborting".
2) the second argument of snprintf is the size of the buffer, not the
size of what you want to write.
Add a bufsz parameter to egress_dev() and pass sizeof(buf) from each
caller, matching the standard snprintf() idiom and removing the
hardcoded size from the helper.
Every caller already declares "char buf[MAC_ADDR_STR_LEN + 1]" so the
formatted MAC continues to fit.
Tested by booting with
netconsole=6665@/aa:bb:cc:dd:ee:ff,6666@10.0.0.1/00:11:22:33:44:55
on a kernel without a matching device. Pre-fix dmesg shows
"aa:bb:cc:dd:ee:f doesn't exist, aborting"; post-fix shows the full
"aa:bb:cc:dd:ee:ff doesn't exist, aborting".
Fixes: f8a10bed32f5 ("netconsole: allow selection of egress interface via MAC address")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260501-netpoll_snprintf_fix-v1-1-84b0566e6597@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fedor Pchelkin <pchelkin@ispras.ru>
Date: Wed Apr 8 17:18:14 2026 +0300
nvme-apple: drop invalid put of admin queue reference count
commit ba9d308ccd6732dd97ed8080d834a4a89e758e14 upstream.
Commit 03b3bcd319b3 ("nvme: fix admin request_queue lifetime") moved the
admin queue reference ->put call into nvme_free_ctrl() - a controller
device release callback performed for every nvme driver doing
nvme_init_ctrl().
nvme-apple sets refcount of the admin queue to 1 at allocation during the
probe function and then puts it twice now:
nvme_free_ctrl()
blk_put_queue(ctrl->admin_q) // #1
->free_ctrl()
apple_nvme_free_ctrl()
blk_put_queue(anv->ctrl.admin_q) // #2
Note that there is a commit 941f7298c70c ("nvme-apple: remove an extra
queue reference") which intended to drop taking an extra admin queue
reference. Looks like at that moment it accidentally fixed a refcount
leak, which existed since the driver's introduction. There were two ->get
calls at driver's probe function and a single ->put inside
apple_nvme_free_ctrl().
However now after commit 03b3bcd319b3 ("nvme: fix admin request_queue
lifetime") the refcount is imbalanced again. Fix it by removing extra
->put call from apple_nvme_free_ctrl(). anv->dev and ctrl->dev point to
the same device, so use ctrl->dev directly for simplification. Compile
tested only.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 03b3bcd319b3 ("nvme: fix admin request_queue lifetime")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Chaitanya Kulkarni <kch@nvidia.com>
Date: Wed Apr 8 00:51:31 2026 -0700
nvmet-tcp: fix race between ICReq handling and queue teardown
commit 5293a8882c549fab4a878bc76b0b6c951f980a61 upstream.
nvmet_tcp_handle_icreq() updates queue->state after sending an
Initialization Connection Response (ICResp), but it does so without
serializing against target-side queue teardown.
If an NVMe/TCP host sends an Initialization Connection Request
(ICReq) and immediately closes the connection, target-side teardown
may start in softirq context before io_work drains the already
buffered ICReq. In that case, nvmet_tcp_schedule_release_queue()
sets queue->state to NVMET_TCP_Q_DISCONNECTING and drops the queue
reference under state_lock.
If io_work later processes that ICReq, nvmet_tcp_handle_icreq() can
still overwrite the state back to NVMET_TCP_Q_LIVE. That defeats the
DISCONNECTING-state guard in nvmet_tcp_schedule_release_queue() and
allows a later socket state change to re-enter teardown and issue a
second kref_put() on an already released queue.
The ICResp send failure path has the same problem. If teardown has
already moved the queue to DISCONNECTING, a send error can still
overwrite the state with NVMET_TCP_Q_FAILED, again reopening the
window for a second teardown path to drop the queue reference.
Fix this by serializing both post-send state transitions with
state_lock and bailing out if teardown has already started.
Use -ESHUTDOWN as an internal sentinel for that bail-out path rather
than propagating it as a transport error like -ECONNRESET. Keep
nvmet_tcp_socket_error() setting rcv_state to NVMET_TCP_RECV_ERR before
honoring that sentinel so receive-side parsing stays quiesced until the
existing release path completes.
Fixes: c46a6465bac2 ("nvmet-tcp: add NVMe over TCP target driver")
Cc: stable@vger.kernel.org
Reported-by: Shivam Kumar <skumar47@syr.edu>
Tested-by: Shivam Kumar <kumar.shivam43666@gmail.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Chaitanya Kulkarni <kch@nvidia.com>
Date: Wed Apr 8 17:56:47 2026 -0700
nvmet: avoid recursive nvmet-wq flush in nvmet_ctrl_free
commit aade8abd8b868b6ffa9697aadaea28ec7f65bee6 upstream.
nvmet_tcp_release_queue_work() runs on nvmet-wq and can drop the
final controller reference through nvmet_cq_put(). If that triggers
nvmet_ctrl_free(), the teardown path flushes ctrl->async_event_work on
the same nvmet-wq.
Call chain:
nvmet_tcp_schedule_release_queue()
kref_put(&queue->kref, nvmet_tcp_release_queue)
nvmet_tcp_release_queue()
queue_work(nvmet_wq, &queue->release_work) <--- nvmet_wq
process_one_work()
nvmet_tcp_release_queue_work()
nvmet_cq_put(&queue->nvme_cq)
nvmet_cq_destroy()
nvmet_ctrl_put(cq->ctrl)
nvmet_ctrl_free()
flush_work(&ctrl->async_event_work) <--- nvmet_wq
Previously Scheduled by :-
nvmet_add_async_event
queue_work(nvmet_wq, &ctrl->async_event_work);
This trips lockdep with a possible recursive locking warning.
[ 5223.015876] run blktests nvme/003 at 2026-04-07 20:53:55
[ 5223.061801] loop0: detected capacity change from 0 to 2097152
[ 5223.072206] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 5223.088368] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 5223.126086] nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 5223.128453] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 5233.199447] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 5233.227718] ============================================
[ 5233.231283] WARNING: possible recursive locking detected
[ 5233.234696] 7.0.0-rc3nvme+ #20 Tainted: G O N
[ 5233.238434] --------------------------------------------
[ 5233.241852] kworker/u192:6/2413 is trying to acquire lock:
[ 5233.245429] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
[ 5233.251438]
but task is already holding lock:
[ 5233.255254] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.261125]
other info that might help us debug this:
[ 5233.265333] Possible unsafe locking scenario:
[ 5233.269217] CPU0
[ 5233.270795] ----
[ 5233.272436] lock((wq_completion)nvmet-wq);
[ 5233.275241] lock((wq_completion)nvmet-wq);
[ 5233.278020]
*** DEADLOCK ***
[ 5233.281793] May be due to missing lock nesting notation
[ 5233.286195] 3 locks held by kworker/u192:6/2413:
[ 5233.289192] #0: ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.294569] #1: ffffc9000e2a7e40 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x6e0
[ 5233.300128] #2: ffffffff82d7dc40 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530
[ 5233.304290]
stack backtrace:
[ 5233.306520] CPU: 4 UID: 0 PID: 2413 Comm: kworker/u192:6 Tainted: G O N 7.0.0-rc3nvme+ #20 PREEMPT(full)
[ 5233.306524] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 5233.306525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 5233.306527] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 5233.306532] Call Trace:
[ 5233.306534] <TASK>
[ 5233.306536] dump_stack_lvl+0x73/0xb0
[ 5233.306552] print_deadlock_bug+0x225/0x2f0
[ 5233.306556] __lock_acquire+0x13f0/0x2290
[ 5233.306563] lock_acquire+0xd0/0x300
[ 5233.306565] ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306571] ? __flush_work+0x20b/0x530
[ 5233.306573] ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306577] touch_wq_lockdep_map+0x3b/0x90
[ 5233.306580] ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306583] ? __flush_work+0x20b/0x530
[ 5233.306585] __flush_work+0x268/0x530
[ 5233.306588] ? __pfx_wq_barrier_func+0x10/0x10
[ 5233.306594] ? xen_error_entry+0x30/0x60
[ 5233.306600] nvmet_ctrl_free+0x140/0x310 [nvmet]
[ 5233.306617] nvmet_cq_put+0x74/0x90 [nvmet]
[ 5233.306629] nvmet_tcp_release_queue_work+0x19f/0x360 [nvmet_tcp]
[ 5233.306634] process_one_work+0x206/0x6e0
[ 5233.306640] worker_thread+0x184/0x320
[ 5233.306643] ? __pfx_worker_thread+0x10/0x10
[ 5233.306646] kthread+0xf1/0x130
[ 5233.306648] ? __pfx_kthread+0x10/0x10
[ 5233.306651] ret_from_fork+0x355/0x450
[ 5233.306653] ? __pfx_kthread+0x10/0x10
[ 5233.306656] ret_from_fork_asm+0x1a/0x30
[ 5233.306664] </TASK>
There is also no need to flush async_event_work from controller
teardown. The admin queue teardown already fails outstanding AER
requests before the final controller put :-
nvmet_sq_destroy(admin sq)
nvmet_async_events_failall(ctrl)
The controller has already been removed from the subsystem list before
nvmet_ctrl_free() quiesces outstanding work.
Replace flush_work() with cancel_work_sync() so a pending
async_event_work item is canceled and a running instance is waited on
without recursing into the same workqueue.
Fixes: 06406d81a2d7 ("nvmet: cancel fatal error and flush async work before free controller")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Carlier <devnexen@gmail.com>
Date: Tue May 5 03:38:09 2026 -0400
octeon_ep_vf: add NULL check for napi_build_skb()
[ Upstream commit dd66b42854705e4e4ee7f14d260f86c578bed3e3 ]
napi_build_skb() can return NULL on allocation failure. In
__octep_vf_oq_process_rx(), the result is used directly without a NULL
check in both the single-buffer and multi-fragment paths, leading to a
NULL pointer dereference.
Add NULL checks after both napi_build_skb() calls, properly advancing
descriptors and consuming remaining fragments on failure.
Fixes: 1cd3b407977c ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260409184009.930359-3-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ inlined missing octep_vf_oq_next_idx() helper as read_idx++ with wraparound ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Fri May 1 01:38:37 2026 +0200
openvswitch: vport: fix self-deadlock on release of tunnel ports
commit aa69918bd418e700309fdd08509dba324fb24296 upstream.
vports are used concurrently and protected by RCU, so netdev_put()
must happen after the RCU grace period. So, either in an RCU call or
after the synchronize_net(). The rtnl_delete_link() must happen under
RTNL and so can't be executed in RCU context. Calling synchronize_net()
while holding RTNL is not a good idea for performance and system
stability under load in general, so calling netdev_put() in RCU call
is the right solution here.
However,
when the device is deleted, rtnl_unlock() will call netdev_run_todo()
and block until all the references are gone. In the current code this
means that we never reach the call_rcu() and the vport is never freed
and the reference is never released, causing a self-deadlock on device
removal.
Fix that by moving the rcu_call() before the rtnl_unlock(), so the
scheduled RCU callback will be executed when synchronize_net() is
called from the rtnl_unlock()->netdev_run_todo() while the RTNL itself
is already released.
Fixes: 6931d21f87bc ("openvswitch: defer tunnel netdev_put to RCU release")
Cc: stable@vger.kernel.org
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20260430233848.440994-2-i.maximets@ovn.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Colin Walters <walters@verbum.org>
Date: Tue May 5 15:42:57 2026 -0700
ovl: fix verity lazy-load guard broken by fsverity_active() semantic change
commit 0c8c88b8eb82a2a41bec5f17c076d6312dc40316 upstream.
Commit f77f281b6118 ("fsverity: use a hashtable to find the
fsverity_info") made fsverity_active() check whether the inode has the
verity flag, rather than whether the inode's fsverity_info is loaded.
This broke ovl_ensure_verity_loaded(), which wants to load the
fsverity_info for any verity inodes that haven't had it loaded yet.
Therefore, to check that the fsverity_info hasn't yet been loaded, use
fsverity_get_info(inode) == NULL instead of !fsverity_active(inode).
Also, since fsverity_get_info() now involves a hash table lookup, put
the more lightweight IS_VERITY() flag check first.
Fixes: f77f281b6118 ("fsverity: use a hashtable to find the fsverity_info")
Cc: stable@vger.kernel.org
Link: https://github.com/bootc-dev/bootc/issues/2174
Signed-off-by: Colin Walters <walters@verbum.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260505224257.23213-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Hongling Zeng <zenghongling@kylinos.cn>
Date: Sun May 3 12:17:44 2026 +0800
parisc: Fix IRQ leak in LASI driver
commit 37b0dc5e279f35036fb638d1e187197b6c05a76d upstream.
When request_irq() succeeds but gsc_common_setup() fails later,
the IRQ is never released. Fix this by adding proper error handling
with goto labels to ensure resources are released in LIFO order.
Detected by Smatch:
drivers/parisc/lasi.c:216 lasi_init_chip() warn: 'lasi->gsc_irq.irq'
from request_irq() not released on lines: 207.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202604180957.4QdAIxP6-lkp@intel.com/
Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shuai Xue <xueshuai@linux.alibaba.com>
Date: Wed Feb 11 20:46:24 2026 +0800
PCI/AER: Clear only error bits in PCIe Device Status
commit a8aeea1bf3c80cc87983689e0118770e019bd4f3 upstream.
Currently, pcie_clear_device_status() clears the entire PCIe Device Status
register (PCI_EXP_DEVSTA) by writing back the value read from the register,
which affects not only the error status bits but also other writable bits.
According to PCIe r7.0, sec 7.5.3.5, this register contains:
- RW1C error status bits (CED, NFED, FED, URD at bits 0-3): These are the
four error status bits that need to be cleared.
- Read-only bits (AUXPD at bit 4, TRPND at bit 5): Writing to these has
no effect.
- Emergency Power Reduction Detected (bit 6): A RW1C non-error bit
introduced in PCIe r5.0 (2019). This is currently the only writable
non-error bit in the Device Status register. Unconditionally clearing
this bit can interfere with other software components that rely on this
power management indication.
- Reserved bits (RsvdZ): These bits are required to be written as zero.
Writing 1s to them (as the current implementation may do) violates the
specification.
To prevent unintended side effects, modify pcie_clear_device_status() to
only write 1s to the four error status bits (CED, NFED, FED, URD), leaving
the Emergency Power Reduction Detected bit and reserved bits unaffected.
Fixes: ec752f5d54d7 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260211124624.49656-1-xueshuai@linux.alibaba.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lukas Wunner <lukas@wunner.de>
Date: Fri Mar 27 10:56:43 2026 +0100
PCI/AER: Stop ruling out unbound devices as error source
commit 1ab4a3c805084d752ec571efc78272295a9f2f74 upstream.
When searching for the error source, the AER driver rules out devices whose
enable_cnt is zero. This was introduced in 2009 by commit 28eb27cf0839
("PCI AER: support invalid error source IDs") without providing a
rationale.
Drivers typically call pci_enable_device() on probe, hence the enable_cnt
check essentially filters out unbound devices. At the time of the commit,
drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
and so any AER-enabled device could be assumed to be bound to a driver.
The check thus made sense because it allowed skipping config space accesses
to devices which were known not to be the error source.
But since 2022, AER is universally enabled on all devices when they are
enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
AER is native").
Errors may very well be reported by unbound devices, e.g. due to link
instability. By ruling them out as error source, errors reported by them
are neither logged nor cleared. When they do get bound and another error
occurs, the earlier error is reported together with the new error, which
may confuse users. Stop doing so.
Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stefan Roese <stefan.roese@mailbox.org>
Cc: stable@vger.kernel.org # v6.0+
Link: https://patch.msgid.link/734338c2e8b669db5a5a3b45d34131b55ffebfca.1774605029.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lukas Wunner <lukas@wunner.de>
Date: Mon Feb 16 08:46:13 2026 +0100
PCI/ASPM: Fix pci_clear_and_set_config_dword() usage
commit cc33985d26c92a5c908c0185239c59ec35b8637c upstream.
When aspm_calc_l12_info() programs the L1 PM Substates Control 1 register
fields Common_Mode_Restore_Time, LTR_L1.2_THRESHOLD_Value and _Scale, it
invokes pci_clear_and_set_config_dword() in an incorrect way:
For the bits to clear it selects those corresponding to the field. So far
so good. But for the bits to set it passes a full register value.
pci_clear_and_set_config_dword() performs a boolean OR operation which
sets all bits of that value, not just the ones that were just cleared.
Thus, when setting the LTR_L1.2_THRESHOLD_Value and _Scale on the child of
an ASPM link, aspm_calc_l12_info() also sets the Common_Mode_Restore_Time.
That's a spec violation: PCIe r7.0 sec 7.8.3.3 says this field is RsvdP
for Upstream Ports. On Adrià's Pixelbook Eve, Common_Mode_Restore_Time
of the Intel 7265 "Stone Peak" wifi card is zero, yet aspm_calc_l12_info()
does not preserve the zero bits but instead programs the value calculated
for the Root Port into the wifi card.
Likewise, when setting the Common_Mode_Restore_Time on the Root Port,
aspm_calc_l12_info() also changes the LTR_L1.2_THRESHOLD_Value and _Scale
from the initial 163840 nsec to 237568 nsec (due to ORing those fields),
only to reduce it afterwards to 106496 nsec.
Amend all invocations of pci_clear_and_set_config_dword() to only set bits
which are cleared.
Finally, when setting the T_POWER_ON_Value and _Scale on the Root Port and
the wifi card, aspm_calc_l12_info() fails to preserve bits declared RsvdP
and instead overwrites them with zeroes. Replace pci_write_config_dword()
with pci_clear_and_set_config_dword() to avoid this.
Fixes: aeda9adebab8 ("PCI/ASPM: Configure L1 substate settings")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220705#c22
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Adrià Vilanova Martínez <me@avm99963.com>
Cc: stable@vger.kernel.org # v4.11+
Link: https://patch.msgid.link/5c1752d7512eed0f4ea57b84b12d7ee08ca61fc5.1771226659.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lukas Wunner <lukas@wunner.de>
Date: Wed Apr 15 17:56:06 2026 +0200
PCI: Update saved_config_space upon resource assignment
commit 909f7bf9b080c10df3c3b38533906dbf09ff1d8b upstream.
Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):
ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
ddbridge 0000:05:00.0: cannot read registers
ddbridge 0000:05:00.0: fail
BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window. The kernel corrects the BAR assignment:
pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned
Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:
https://git.kernel.org/tglx/history/c/a06a30144bbc
No other architecture performs such a late BAR correction.
Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.
Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction. Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:
vfio_pci_core_register_device()
vfio_pci_set_power_state()
pci_restore_state()
But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7. Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.
Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well. The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.
Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.
To robustly fix the issue, always update saved_config_space upon resource
assignment.
Reported-by: Bernd Schumacher <bernd@bschu.de>
Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/
Reported-by: Alexandre N. <an.tech@mailo.com>
Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/
Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bernd Schumacher <bernd@bschu.de>
Tested-by: Alexandre N. <an.tech@mailo.com>
Cc: stable@vger.kernel.org # v6.12+
Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Date: Thu Apr 30 08:25:55 2026 +0800
perf/x86/intel: Always reprogram ACR events to prevent stale masks
commit 8ba0b706a485b1e607594cf4210786d517ad1611 upstream.
Members of an ACR group are logically linked via a bitmask of their
hardware counter indices. If some members of the group are assigned new
hardware counters during rescheduling, even events that keep their
original counter index must be updated with a new mask.
Without this, an event will continue to use a stale acr_mask that
references the old indices of its group peers. Ensure all ACR events are
reprogrammed during the scheduling path to maintain consistency across
the group.
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260430002558.712334-3-dapeng1.mi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Date: Thu Apr 30 08:25:56 2026 +0800
perf/x86/intel: Disable PMI for self-reloaded ACR events
commit 1271aeccc307066315b2d3b0d5af2510e27018b5 upstream.
On platforms with Auto Counter Reload (ACR) support, such as NVL, a
"NMI received for unknown reason 30" warning is observed when running
multiple events in a group with ACR enabled:
$ perf record -e '{instructions/period=20000,acr_mask=0x2/u,\
cycles/period=40000,acr_mask=0x3/u}' ./test
The warning occurs because the Performance Monitoring Interrupt (PMI)
is enabled for the self-reloaded event (the cycles event in this case).
According to the Intel SDM, the overflow bit
(IA32_PERF_GLOBAL_STATUS.PMCn_OVF) is never set for self-reloaded events.
Since the bit is not set, the perf NMI handler cannot identify the source
of the interrupt, leading to the "unknown reason" message.
Furthermore, enabling PMI for self-reloaded events is unnecessary and
can lead to extraneous records that pollute the user's requested data.
Disable the interrupt bit for all events configured with ACR self-reload.
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260430002558.712334-4-dapeng1.mi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Date: Thu Apr 30 08:25:57 2026 +0800
perf/x86/intel: Enable auto counter reload for DMR
commit aa4384bc8f4360167f3c3d5322121fe892289ea2 upstream.
Panther cove µarch starts to support auto counter reload (ACR), but the
static_call intel_pmu_enable_acr_event() is not updated for the Panther
Cove µarch used by DMR. It leads to the auto counter reload is not
really enabled on DMR.
Update static_call intel_pmu_enable_acr_event() in intel_pmu_init_pnc().
Fixes: d345b6bb8860 ("perf/x86/intel: Add core PMU support for DMR")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260430002558.712334-5-dapeng1.mi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Date: Thu Apr 30 08:25:54 2026 +0800
perf/x86/intel: Improve validation and configuration of ACR masks
commit 5ad732a56be46aabf158c16aa0c095291727aaef upstream.
Currently there are several issues on the user space ACR mask validation
and configuration.
- The validation for user space ACR mask (attr.config2) is incomplete,
e.g., the ACR mask could include the index which belongs to another
ACR events group, but it's not validated.
- An early return on an invalid ACR mask caused all subsequent ACR groups
to be skipped.
- The stale hardware ACR mask (hw.config1) is not cleared before setting
new hardware ACR mask.
The following changes address all of the above issues.
- Figure out the event index group of an ACR group. Any bits in the
user-space mask not present in the index group are now dropped.
- Instead of an early return on invalid bits, drop only the invalid
portions and continue iterating through all ACR events to ensure full
configuration.
- Explicitly clear the stale hardware ACR mask for each event prior to
writing the new configuration.
Besides, a non-leader event member of ACR group could be disabled in
theory. This could cause bit-shifting errors in the acr_mask of remaining
group members. But since ACR sampling requires all events to be active,
this should not be a big concern in real use case. Add a "FIXME" comment
to notice this risk.
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260430002558.712334-2-dapeng1.mi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tzung-Bi Shih <tzungbi@kernel.org>
Date: Tue May 5 05:34:03 2026 +0000
platform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration
commit 525cb7ba6661074c1c5cc3772bccc6afab6791ef upstream.
cros_typec_register_thunderbolt() missed initializing the `adata->lock`
mutex. This leads to a NULL dereference when the mutex is later
acquired (e.g. in cros_typec_altmode_work()).
Initialize the mutex in cros_typec_register_thunderbolt() to fix the
issue.
Cc: stable@vger.kernel.org
Fixes: 3b00be26b16a ("platform/chrome: cros_ec_typec: Thunderbolt support")
Reviewed-by: Benson Leung <bleung@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Link: https://lore.kernel.org/r/20260505053403.3335740-1-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ulf Hansson <ulfh@kernel.org>
Date: Fri Apr 17 13:13:31 2026 +0200
pmdomain: core: Fix detach procedure for virtual devices in genpd
commit 26735dfdd8930d9ef1fa92e590a9bf77726efdf6 upstream.
If a device is attached to a PM domain through genpd_dev_pm_attach_by_id(),
genpd calls pm_runtime_enable() for the corresponding virtual device that
it registers. While this avoids boilerplate code in drivers, there is no
corresponding call to pm_runtime_disable() in genpd_dev_pm_detach().
This means these virtual devices are typically detached from its genpd,
while runtime PM remains enabled for them, which is not how things are
designed to work. In worst cases it may lead to critical errors, like a
NULL pointer dereference bug in genpd_runtime_suspend(), which was recently
reported. For another case, we may end up keeping an unnecessary vote for a
performance state for the device.
To fix these problems, let's add this missing call to pm_runtime_disable()
in genpd_dev_pm_detach().
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/CAMuHMdWapT40hV3c+CSBqFOW05aWcV1a6v_NiJYgoYi0i9_PDQ@mail.gmail.com/
Fixes: 3c095f32a92b ("PM / Domains: Add support for multi PM domains per device to genpd")
Cc: stable@vger.kernel.org
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Wentao Liang <vulab@iscas.ac.cn>
Date: Wed Apr 8 14:11:21 2026 +0000
pmdomain: mediatek: fix use-after-free in scpsys_get_bus_protection_legacy()
commit ec1fcddb3117d9452210e838fd37389ee61e10e8 upstream.
In scpsys_get_bus_protection_legacy(), of_find_node_with_property()
returns a device node with its reference count incremented. The function
then calls of_node_put(node) before checking whether
syscon_regmap_lookup_by_phandle() returns an error. If an error occurs,
dev_err_probe() dereferences the node pointer to print diagnostic
information, but the node memory may have already been freed due to the
earlier of_node_put(), leading to a use-after-free vulnerability.
Fix this by moving the of_node_put() call after the error check, ensuring
the node is still valid when accessed in the error path.
Fixes: c29345fa5f66 ("pmdomain: mediatek: Refactor bus protection regmaps retrieval")
Cc: stable@vger.kernel.org
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: André Draszik <andre.draszik@linaro.org>
Date: Mon Mar 2 13:32:05 2026 +0000
power: supply: max17042: avoid overflow when determining health
commit 9a44949da669708f19d29141e65b3ac774d08f5a upstream.
If vmax has the default value of INT_MAX (e.g. because not specified in
DT), battery health is reported as over-voltage. This is because adding
any value to vmax (the vmax tolerance in this case) causes it to wrap
around, making it negative and smaller than the measured battery
voltage.
Avoid that by using size_add().
Fixes: edd4ab055931 ("power: max17042_battery: add HEALTH and TEMP_* properties support")
Cc: stable@vger.kernel.org
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20260302-max77759-fg-v3-6-3c5f01dbda23@linaro.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sourabh Jain <sourabhjain@linux.ibm.com>
Date: Tue Apr 7 18:13:44 2026 +0530
powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
commit b3a97f9484080c6e71db9e803e3cc1bb372a9bc7 upstream.
KASAN instrumentation is intended to be disabled for the kexec core
code, but the existing Makefile entry misses the object suffix. As a
result, the flag is not applied correctly to core_$(BITS).o.
So when KASAN is enabled, kexec_copy_flush and copy_segments in
kexec/core_64.c are instrumented, which can result in accesses to
shadow memory via normal address translation paths. Since these run
with the MMU disabled, such accesses may trigger page faults
(bad_page_fault) that cannot be handled in the kdump path, ultimately
causing a hang and preventing the kdump kernel from booting. The same
is true for kexec as well, since the same functions are used there.
Update the entry to include the “.o” suffix so that KASAN
instrumentation is properly disabled for this object file.
Fixes: 2ab2d5794f14 ("powerpc/kasan: Disable address sanitization in kexec paths")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/1dee8891-8bcc-46b4-93f3-fc3a774abd5b@linux.ibm.com/
Cc: stable@vger.kernel.org
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260407124349.1698552-1-sourabhjain@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Nilay Shroff <nilay@linux.ibm.com>
Date: Wed Mar 11 19:13:31 2026 +0530
powerpc/xive: fix kmemleak caused by incorrect chip_data lookup
commit 6771c54728c278bf1e4bfdab4fddbbb186e33498 upstream.
The kmemleak reports the following memory leak:
Unreferenced object 0xc0000002a7fbc640 (size 64):
comm "kworker/8:1", pid 540, jiffies 4294937872
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 09 04 00 04 00 00 ................
00 00 a7 81 00 00 0a c0 00 00 08 04 00 04 00 00 ................
backtrace (crc 177d48f6):
__kmalloc_cache_noprof+0x520/0x730
xive_irq_alloc_data.constprop.0+0x40/0xe0
xive_irq_domain_alloc+0xd0/0x1b0
irq_domain_alloc_irqs_parent+0x44/0x6c
pseries_irq_domain_alloc+0x1cc/0x354
irq_domain_alloc_irqs_parent+0x44/0x6c
msi_domain_alloc+0xb0/0x220
irq_domain_alloc_irqs_locked+0x138/0x4d0
__irq_domain_alloc_irqs+0x8c/0xfc
__msi_domain_alloc_irqs+0x214/0x4d8
msi_domain_alloc_irqs_all_locked+0x70/0xf8
pci_msi_setup_msi_irqs+0x60/0x78
__pci_enable_msix_range+0x54c/0x98c
pci_alloc_irq_vectors_affinity+0x16c/0x1d4
nvme_pci_enable+0xac/0x9c0 [nvme]
nvme_probe+0x340/0x764 [nvme]
This occurs when allocating MSI-X vectors for an NVMe device. During
allocation the XIVE code creates a struct xive_irq_data and stores it
in irq_data->chip_data.
When the MSI-X irqdomain is later freed, xive_irq_free_data() is
responsible for retrieving this structure and freeing it. However,
after commit cc0cc23babc9 ("powerpc/xive: Untangle xive from child
interrupt controller drivers"), xive_irq_free_data() retrieves the
chip_data using irq_get_chip_data(), which looks up the data through
the child domain.
This is incorrect because the XIVE-specific irq data is associated with
the XIVE (parent) domain. As a result the lookup fails and the allocated
struct xive_irq_data is never freed, leading to the kmemleak report
shown above.
Fix this by retrieving the irq_data from the correct domain using
irq_domain_get_irq_data() and then accessing the chip_data via
irq_data_get_irq_chip_data().
Cc: stable@vger.kernel.org
Fixes: cc0cc23babc9 ("powerpc/xive: Untangle xive from child interrupt controller drivers")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260311134336.326996-1-nilay@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Sat May 9 14:59:28 2026 -0400
printk: add print_hex_dump_devel()
[ Upstream commit d134feeb5df33fbf77f482f52a366a44642dba09 ]
Add print_hex_dump_devel() as the hex dump equivalent of pr_devel(),
which emits output only when DEBUG is enabled, but keeps call sites
compiled otherwise.
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stable-dep-of: 177730a273b1 ("crypto: caam - guard HMAC key hex dumps in hash_digest_key")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date: Fri May 1 09:41:43 2026 +0530
pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
commit 713e468cdbc2277db6ce949c32c1acbd83501733 upstream.
Remove such 3 levels of nesting patterns to check success return values
from function calls.
ret = enable_hvpipe_IRQ()
if (!ret)
ret = set_hvpipe_sys_param(1)
if (!ret)
ret = misc_register()
Instead just bail out to "out*:" labels, in case of any error. This
simplifies the init flow.
While at it let's also fix the following error handling logic:
We have already enabled interrupt sources and enabled hvpipe to received
interrupts, if misc_register() fails, we will destroy the workqueue, but
the HMC might send us a msg via hvpipe which will call, queue work on
the workqueue which might be destroyed.
So instead, let's reverse the order of enabling set_hvpipe_sys_param(1)
and in case of an error let's remove the misc dev by calling
misc_deregister().
Cc: stable@vger.kernel.org
Fixes: 39a08a4f94980 ("powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/f2141eafb80e7780395e03aa9a22e8a37be80513.1777606826.git.ritesh.list@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date: Fri May 1 09:41:42 2026 +0530
pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
commit 1b9f7aafa44f5ce852c00509104d10fd9eb0f402 upstream.
commit 6d3789d347a7 ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()"),
changed the create handle to FD_PREPARE(), but it caused kernel
null-ptr-deref because after call to retain_and_null_ptr(src_info),
src_info is re-used for adding it to the global list.
Getting the following kernel panic in papr_hvpipe_dev_create_handle()
when trying to add src_info to the list.
Kernel attempted to write user page (0) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on write at 0x00000000
Faulting instruction address: 0xc0000000001b44a0
Oops: Kernel access of bad area, sig: 11 [#1]
...
Call Trace:
papr_hvpipe_dev_ioctl+0x1f4/0x48c (unreliable)
sys_ioctl+0x528/0x1064
system_call_exception+0x128/0x360
system_call_vectored_common+0x15c/0x2ec
Now, the error handling with FD_PREPARE's file cleanup and __free(kfree) auto
cleanup is getting too convoluted. This is mainly because we need to
ensure only 1 user get the srcID handle. To simplify this, we allocate
prepare the src_info in the beginning and add it to the global list
under a spinlock after checking that no duplicates exist.
This simplify the error handling where if the FD_ADD fails, we can
simply remove the src_info from the list and consume any pending msg in
hvpipe to be cleared, after src_info became visible in the global list.
Cc: stable@vger.kernel.org
Fixes: 6d3789d347a7 ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()")
Reported-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/31ad94bc89d44156ee700c5bd006cb47a748e3cb.1777606826.git.ritesh.list@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date: Fri May 1 09:41:40 2026 +0530
pseries/papr-hvpipe: Fix race with interrupt handler
commit 7a4f0846ee6cc8cf44ae0046ed42e3259d1dd45b upstream.
While executing ->ioctl handler or ->release handler, if an interrupt
fires on the same cpu, then we can enter into a deadlock.
This patch fixes both these handlers to take spin_lock_irq{save|restore}
versions of the lock to prevent this deadlock.
Cc: stable@vger.kernel.org
Fixes: 814ef095f12c9 ("powerpc/pseries: Add papr-hvpipe char driver for HVPIPE interfaces")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/e4ed435c44fc191f2eb23c7907ba6f72f193e6aa.1777606826.git.ritesh.list@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date: Fri May 1 09:41:44 2026 +0530
pseries/papr-hvpipe: Fix the usage of copy_to_user()
commit d48654bd8b1a75f662e224d257db54de475120dc upstream.
copy_to_user() return bytes_not_copied to the user buffer. If there was
an error writing bytes into the user buffer, i.e. if copy_to_user
returns a non-zero value, then we should simply return -EFAULT from the
->read() call.
Otherwise, in the non-patched version, we may end up mixing
"bytes_not_copied + bytes_copied (HVPIPE_HDR_LEN)" as the return value
to the user in ->read() call
Also let's make sure we clear the hvpipe_status flag, if we have
consumed the hvpipe msg by making the rtas call. ret = -EFAULT means
copy_to_user has failed but that still means that the msg was read from
the hvpipe, hence for both cases, success & -EFAULT, we should clear the
HVPIPE_MSG_AVAILABLE flag in hvpipe_status.
Cc: stable@vger.kernel.org
Fixes: cebdb522fd3edd1 ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8fda3212a1ad48879c174e92f67472d9b9f1c3b7.1777606826.git.ritesh.list@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date: Fri May 1 09:41:41 2026 +0530
pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
commit cefeed44296261173a806bef988b26bc565da4be upstream.
The hdr variable is allocated on the stack and only hdr.version and
hdr.flags are initialized explicitly. Because the struct papr_hvpipe_hdr
contains reserved padding bytes (reserved[3] and reserved2[40]), these
could leak the uninitialized bytes to userspace after copy_to_user().
This patch fixes that by initializing the whole struct to 0.
Cc: stable@vger.kernel.org
Fixes: cebdb522fd3ed ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/7bfe03b65a282c856ed8182d1871bb973c0b78f2.1777606826.git.ritesh.list@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Carlier <devnexen@gmail.com>
Date: Sat May 2 15:19:45 2026 +0100
psp: strip variable-length PSP header in psp_dev_rcv()
commit 30cb24f97d44f6b81c14b85c5323de62eef1fb7f upstream.
psp_dev_rcv() unconditionally removes a fixed PSP_ENCAP_HLEN, even
when psph->hdrlen indicates that the PSP header carries optional
fields. A frame whose PSP header advertises a non-zero VC or any
extension would therefore be silently mis-decapsulated: option bytes
would spill into the inner packet head and downstream parsing would
fail on a corrupted skb.
Compute the full PSP header length from psph->hdrlen, pull the
optional bytes into the linear region, and strip the whole header
when decapsulating. Optional fields (VC, ...) are still ignored,
just discarded with the rest of the header instead of leaking.
crypt_offset and the VIRT flag are intentionally not validated here
- callers know their device's PSP implementation and can decide.
Both in-tree callers gate on hardware-validated PSP, so this is a
correctness fix rather than a reachable corruption path under
current configurations.
Fixes: 0eddb8023cee ("psp: provide decapsulation and receive helper for drivers")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260502141945.14484-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:48 2026 -0300
RDMA/hns: Fix unlocked call to hns_roce_qp_remove()
commit 0c99acbc8b6c6dd526ae475a48ee1897b61072fb upstream.
Sashiko points out that hns_roce_qp_remove() requires the caller to hold
locks. The error flow in hns_roce_create_qp_common() doesn't hold those
locks for the error unwind so it risks corrupting memory.
Grab the same locks the other two callers use.
Cc: stable@vger.kernel.org
Fixes: e088a685eae9 ("RDMA/hns: Support rq record doorbell for the user space")
Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=9
Link: https://patch.msgid.link/r/15-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Kai Zen <kai.aizen.dev@gmail.com>
Date: Tue Apr 7 12:20:22 2026 +0300
RDMA/ionic: bound node_desc sysfs read with %.64s
commit 654a27f25530d052eeedf086e6c3e2d585c203bd upstream.
node_desc[64] in struct ib_device is not guaranteed to be NUL-
terminated. The core IB sysfs handler uses "%.64s" for exactly this
reason (drivers/infiniband/core/sysfs.c:1307), since node_desc_store()
performs a raw memcpy of up to IB_DEVICE_NODE_DESC_MAX bytes with no NUL
termination:
memcpy(desc.node_desc, buf, min_t(int, count, IB_DEVICE_NODE_DESC_MAX));
If exactly 64 bytes are written via the node_desc sysfs file, the array
contains no NUL byte. The ionic hca_type_show() handler uses unbounded
"%s" and will read past the end of node_desc into adjacent fields of
struct ib_device until it encounters a NUL.
ionic supports IB_DEVICE_MODIFY_NODE_DESC, so this is triggerable by
userspace.
Match the core handler and bound the format specifier.
Cc: stable@vger.kernel.org
Fixes: 2075bbe8ef03 ("RDMA/ionic: Register device ops for miscellaneous functionality")
Link: https://patch.msgid.link/r/CALynFi7NAbhDCt1tdaDbf6TnLvAqbaHa6-Wqf6OkzREbA_PAfg@mail.gmail.com
Signed-off-by: Kai Aizen <kai.aizen.dev@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:34 2026 -0300
RDMA/ionic: Fix typo in format string
commit 70f780edcd1e86350202d8a409de026b2d2e2067 upstream.
Applying the corrupted patch by hand mangled the format string, put the s
in the right place.
Cc: stable@vger.kernel.org
Fixes: 654a27f25530 ("RDMA/ionic: bound node_desc sysfs read with %.64s")
Link: https://patch.msgid.link/r/1-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Reported-by: Brad Spengler <brad.spengler@opensrcsec.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:40 2026 -0300
RDMA/mana: Fix error unwind in mana_ib_create_qp_rss()
commit 6aaa978c6b6218cfac15fe1dab17c76fe229ce3f upstream.
Sashiko points out that mana_ib_cfg_vport_steering() is leaked, the normal
destroy path cleans it up.
Cc: stable@vger.kernel.org
Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4
Link: https://patch.msgid.link/r/7-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:39 2026 -0300
RDMA/mana: Fix mana_destroy_wq_obj() cleanup in mana_ib_create_qp_rss()
commit 34ecf795692ee57c393109f4a24ccc313091e137 upstream.
Sashiko points out there are two bugs here in the error unwind flow, both
related to how the WQ table is unwound.
First there is a double i-- on the first failure path due to the while loop
having a i--, remove it.
Second if mana_ib_install_cq_cb() fails then mana_create_wq_obj() is not
undone due to the above i--.
Cc: stable@vger.kernel.org
Fixes: c15d7802a424 ("RDMA/mana_ib: Add CQ interrupt support for RAW QP")
Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1
Link: https://patch.msgid.link/r/6-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:38 2026 -0300
RDMA/mana: Remove user triggerable WARN_ON() in mana_ib_create_qp_rss()
commit 159f2efabc89d3f931d38f2d35876535d4abf0a3 upstream.
Sashiko points out that the user can specify WQs sharing the same CQ as a
part of the uAPI and this will trigger the WARN_ON() then go on to corrupt
the kernel.
Just reject it outright and fail the QP creation.
Cc: stable@vger.kernel.org
Fixes: c15d7802a424 ("RDMA/mana_ib: Add CQ interrupt support for RAW QP")
Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1
Link: https://patch.msgid.link/r/5-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:37 2026 -0300
RDMA/mana: Validate rx_hash_key_len
commit 6dd2d4ad9c8429523b1c220c5132bd551c006425 upstream.
Sashiko points out that rx_hash_key_len comes from a uAPI structure and is
blindly passed to memcpy, allowing the userspace to trash kernel
memory. Bounds check it so the memcpy cannot overflow.
Cc: stable@vger.kernel.org
Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1
Link: https://patch.msgid.link/r/4-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:45 2026 -0300
RDMA/mlx4: Fix mis-use of RCU in mlx4_srq_event()
commit c9341307ea16b9395c2e4c9c94d8499d91fe31d0 upstream.
Sashiko points out the radix_tree itself is RCU safe, but nothing ever
frees the mlx4_srq struct with RCU, and it isn't even accessed within the
RCU critical section. It also will crash if an event is delivered before
the srq object is finished initializing.
Use the spinlock since it isn't easy to make RCU work, use
refcount_inc_not_zero() to protect against partially initialized objects,
and order the refcount_set() to be after the srq is fully initialized.
Cc: stable@vger.kernel.org
Fixes: 30353bfc43a1 ("net/mlx4_core: Use RCU to perform radix tree lookup for SRQ")
Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=5
Link: https://patch.msgid.link/r/12-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:44 2026 -0300
RDMA/mlx4: Fix resource leak on error in mlx4_ib_create_srq()
commit c54c7e4cb679c0aaa1cb489b9c3f2cd98e63a44c upstream.
Sashiko points out that mlx4_srq_alloc() was not undone during error
unwind, add the missing call to mlx4_srq_free().
Cc: stable@vger.kernel.org
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=8
Link: https://patch.msgid.link/r/11-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Junrui Luo <moonafterrain@outlook.com>
Date: Fri Apr 24 13:51:02 2026 +0800
RDMA/mlx5: Fix error path fall-through in mlx5_ib_dev_res_srq_init()
commit c488df06bd552bb8b6e14fa0cfd5ad986c6e9525 upstream.
mlx5_ib_dev_res_srq_init() allocates two SRQs, s0 and s1. When
ib_create_srq() fails for s1, the error branch destroys s0 but falls
through and unconditionally assigns the freed s0 and the ERR_PTR s1 to
devr->s0 and devr->s1.
This leads to several problems: the lock-free fast path checks
"if (devr->s1) return 0;" and treats the ERR_PTR as already initialised;
users in mlx5_ib_create_qp() dereference the freed SRQ or ERR_PTR via
to_msrq(devr->s0)->msrq.srqn; and mlx5_ib_dev_res_cleanup() dereferences
the ERR_PTR and double-frees s0 on teardown.
Fix by adding the same `goto unlock` in the s1 failure path.
Cc: stable@vger.kernel.org
Fixes: 5895e70f2e6e ("IB/mlx5: Allocate resources just before first QP/SRQ is created")
Link: https://patch.msgid.link/r/SYBPR01MB7881E1E0970268BD69C0BA75AF2B2@SYBPR01MB7881.ausprd01.prod.outlook.com
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:42 2026 -0300
RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()
commit 34fbf48cf3b410d2a6e8c586fa952a36331ca5ba upstream.
Sashiko points out that pd->uctx isn't initialized until late in the
function so all these error flow references are NULL and will crash. Use
the uctx that isn't NULL.
Cc: stable@vger.kernel.org
Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4
Link: https://patch.msgid.link/r/9-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Sat Apr 18 12:21:41 2026 -0400
RDMA/rxe: Reject non-8-byte ATOMIC_WRITE payloads
commit 1114c87aa6f195cf07da55a27b2122ae26557b26 upstream.
atomic_write_reply() at drivers/infiniband/sw/rxe/rxe_resp.c
unconditionally dereferences 8 bytes at payload_addr(pkt):
value = *(u64 *)payload_addr(pkt);
check_rkey() previously accepted an ATOMIC_WRITE request with pktlen ==
resid == 0 because the length validation only compared pktlen against
resid. A remote initiator that sets the RETH length to 0 therefore reaches
atomic_write_reply() with a zero-byte logical payload, and the responder
reads sizeof(u64) bytes from past the logical end of the packet into
skb->head tailroom, then writes those 8 bytes into the attacker's MR via
rxe_mr_do_atomic_write(). That is a remote disclosure of 4 bytes of kernel
tailroom per probe (the other 4 bytes are the packet's own trailing ICRC).
IBA oA19-28 defines ATOMIC_WRITE as exactly 8 bytes. Anything else is
protocol-invalid. Hoist a strict length check into check_rkey() so the
responder never reaches the unchecked dereference, and keep the existing
WRITE-family length logic for the normal RDMA WRITE path.
Reproduced on mainline with an unmodified rxe driver: a sustained
zero-length ATOMIC_WRITE probe repeatedly leaks adjacent skb head-buffer
bytes into the attacker's MR, including recognisable kernel strings and
partial kernel-direct-map pointer words. With this patch applied the
responder rejects the PDU and the MR stays all-zero.
Cc: stable@vger.kernel.org
Fixes: 034e285f8b99 ("RDMA/rxe: Make responder support atomic write on RC service")
Link: https://patch.msgid.link/r/20260418162141.3610201-1-michael.bommarito@gmail.com
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Tue Apr 14 07:15:55 2026 -0400
RDMA/rxe: Reject unknown opcodes before ICRC processing
commit 4c6f86d85d03cdb33addce86aa69aa795ca6c47a upstream.
Even after applying commit 7244491dab34 ("RDMA/rxe: Validate pad and ICRC
before payload_size() in rxe_rcv"), a single unauthenticated UDP packet
can still trigger panic. That patch handled payload_size() underflow only
for valid opcodes with short packets, not for packets carrying an unknown
opcode. The unknown-opcode OOB read described below predates that commit
and reaches back to the initial Soft RoCE driver.
The check added there reads
pkt->paylen < header_size(pkt) + bth_pad(pkt) + RXE_ICRC_SIZE
where header_size(pkt) expands to rxe_opcode[pkt->opcode].length. The
rxe_opcode[] array has 256 entries but is only populated for defined IB
opcodes; any other entry (for example opcode 0xff) is zero-initialized, so
length == 0 and the check degenerates to
pkt->paylen < 0 + bth_pad(pkt) + RXE_ICRC_SIZE
which does not constrain pkt->paylen enough. rxe_icrc_hdr() then computes
rxe_opcode[pkt->opcode].length - RXE_BTH_BYTES
which underflows when length == 0 and passes a huge value to rxe_crc32(),
causing an out-of-bounds read of the skb payload.
Reproduced on v7.0-rc7 with that fix applied, QEMU/KVM with
CONFIG_RDMA_RXE=y and CONFIG_KASAN=y, after
rdma link add rxe0 type rxe netdev eth0
A single 48-byte UDP packet to port 4791 with BTH opcode=0xff and
QPN=IB_MULTICAST_QPN triggers:
BUG: KASAN: slab-out-of-bounds in crc32_le+0x115/0x170
Read of size 1 at addr ...
The buggy address is located 0 bytes to the right of
allocated 704-byte region
Call Trace:
crc32_le+0x115/0x170
rxe_icrc_hdr.isra.0+0x226/0x300
rxe_icrc_check+0x13f/0x3a0
rxe_rcv+0x6e1/0x16e0
rxe_udp_encap_recv+0x20a/0x320
udp_queue_rcv_one_skb+0x7ed/0x12c0
Subsequent packets with the same shape fault on unmapped memory and panic
the kernel. The trigger requires only module load and "rdma link add"; no
QP, no connection, and no authentication.
Fix this by rejecting packets whose opcode has no rxe_opcode[] entry,
detected via the zero mask or zero length, before any length arithmetic
runs.
Cc: stable@vger.kernel.org
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://patch.msgid.link/r/20260414111555.3386793-1-michael.bommarito@gmail.com
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue Apr 28 13:17:43 2026 -0300
RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() error path
commit e38e86995df27f1f854063dab1f0c6a513db3faf upstream.
Sashiko points out that pvrdma_uar_free() is already called within
pvrdma_dealloc_ucontext(), so calling it before triggers a double free.
Cc: stable@vger.kernel.org
Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver")
Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4
Link: https://patch.msgid.link/r/10-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Chen Ni <nichen@iscas.ac.cn>
Date: Fri Feb 27 17:15:46 2026 +0800
remoteproc: imx_rproc: Fix NULL vs IS_ERR() bug in imx_rproc_addr_init()
commit 665eebebb029690a5b2f92e481020877cc6c8d36 upstream.
The devm_ioremap_resource_wc() function never returns NULL, it returns
error pointers. Update the error checking to match.
Fixes: 67a7bc7f0358 ("remoteproc: Use of_reserved_mem_region_* functions for "memory-region"")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260227091546.4044246-1-nichen@iscas.ac.cn
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Chen Ni <nichen@iscas.ac.cn>
Date: Fri Feb 27 17:21:10 2026 +0800
remoteproc: k3: Fix NULL vs IS_ERR() bug in k3_reserved_mem_init()
commit 5b1f4b5c72cc40e676293b8609cacef7e1545beb upstream.
The devm_ioremap_resource_wc() function never returns NULL, it returns
error pointers. Update the error checking to match.
Fixes: 67a7bc7f0358 ("remoteproc: Use of_reserved_mem_region_* functions for "memory-region"")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260227092110.4044313-1-nichen@iscas.ac.cn
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Osama Abdelkader <osama.abdelkader@gmail.com>
Date: Mon Mar 16 16:16:11 2026 +0100
riscv: kvm: fix vector context allocation leak
commit b7c958d7c1eb1cb9b2be7b5ee4129fcd66cec978 upstream.
When the second kzalloc (host_context.vector.datap) fails in
kvm_riscv_vcpu_alloc_vector_context, the first allocation
(guest_context.vector.datap) is leaked. Free it before returning.
Fixes: 0f4b82579716 ("riscv: KVM: Add vector lazy save/restore support")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20260316151612.13305-1-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Tue Apr 28 09:34:45 2026 +0200
rseq: Don't advertise time slice extensions if disabled
commit 010b7723c0a3b9ad58f50b715dbe2e7781d29400 upstream.
If time slice extensions have been disabled on the kernel command line,
then advertising them in RSEQ flags is wrong.
Adjust the conditionals to reflect reality, fixup the misleading comments
about the gap of these flags and the rseq::flags field.
Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.437059375%40kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Tue Apr 28 10:14:41 2026 +0200
rseq: Protect rseq_reset() against interrupts
commit e9766e6f7d330dce7530918d8c6e3ec96d6c6e24 upstream.
rseq_reset() uses memset() to clear the tasks rseq data. That's racy
against membarrier() and preemption.
Guard it with irqsave to cure this.
Fixes: faba9d250eae ("rseq: Introduce struct rseq_data")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.353887714%40kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Tue Apr 28 10:10:19 2026 +0200
rseq: Set rseq::cpu_id_start to 0 on unregistration
commit 2cb68e45120dfc66404c7547d95b8ac6ff0b25ce upstream.
The RSEQ rework changed that to RSEQ_CPU_UNINITILIZED, which is obviously
incompatible. Revert back to the original behavior.
Fixes: 0f085b41880e ("rseq: Provide and use rseq_set_ids()")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.271566313%40kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Miguel Ojeda <ojeda@kernel.org>
Date: Sun Apr 26 16:42:01 2026 +0200
rust: allow `clippy::collapsible_if` globally
commit 2adc8664018c1cc595c7c0c98474a33c7fe32a85 upstream.
Similar to `clippy::collapsible_match` (globally allowed in the previous
commit), the `clippy::collapsible_if` lint [1] can make code harder to
read in certain cases.
Thus just let developers decide on their own.
In addition, remove the existing `expect` we had.
Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/rust-for-linux/DGROP5CHU1QZ.1OKJRAUZXE9WC@garyguo.net/
Link: https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_if [1]
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260426144201.227108-2-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Miguel Ojeda <ojeda@kernel.org>
Date: Sun Apr 26 16:42:00 2026 +0200
rust: allow `clippy::collapsible_match` globally
commit 838d852da8503372f3a1779bfbd1ccb93153ab4e upstream.
The `clippy::collapsible_match` lint [1] can make code harder to read
in certain cases [2], e.g.
CLIPPY P rust/libmacros.so - due to command line change
warning: this `if` can be collapsed into the outer `match`
--> rust/pin-init/internal/src/helpers.rs:91:17
|
91 | / if nesting == 1 {
92 | | impl_generics.push(tt.clone());
93 | | impl_generics.push(tt);
94 | | skip_until_comma = false;
95 | | }
| |_________________^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_match
= note: `-W clippy::collapsible-match` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::collapsible_match)]`
help: collapse nested if block
|
90 ~ TokenTree::Punct(p) if skip_until_comma && p.as_char() == ','
91 ~ && nesting == 1 => {
92 | impl_generics.push(tt.clone());
93 | impl_generics.push(tt);
94 | skip_until_comma = false;
95 ~ }
|
The lint does not have much upside -- when the suggestion may be a good
one, it would still read fine when nested anyway. And it is the kind of
lint that may easily bias people to just apply the suggestion instead
of allowing it.
[ In addition, as Gary points out [3], the suggestion is also wrong [4] and
in the process of being fixed [5], possibly for Rust 1.97.0:
Link: https://lore.kernel.org/rust-for-linux/DI3YV94TH9I3.1SOHW51552497@garyguo.net/ [3]
Link: https://github.com/rust-lang/rust-clippy/issues/16875 [4]
Link: https://github.com/rust-lang/rust-clippy/pull/16878 [5]
- Miguel ]
Thus just let developers decide on their own.
Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_match [1]
Link: https://lore.kernel.org/rust-for-linux/CANiq72nWYJna_hdFxjQCQZK6yJBrr1Mb86iKavivV0U0BgufeA@mail.gmail.com/ [2]
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260426144201.227108-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Eliot Courtney <ecourtney@nvidia.com>
Date: Thu Apr 23 21:36:52 2026 +0900
rust: drm: gem: clean up GEM state in init failure case
commit 2e42a17b8f6bc3c0cd69d7556b588011d3ec2394 upstream.
Currently, if `drm_gem_object_init` fails, the object is freed without
any cleanup. Perform the cleanup in that case.
Cc: stable@vger.kernel.org
Fixes: c284d3e42338 ("rust: drm: gem: Add GEM object abstraction")
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Onur Özkan <work@onurozkan.dev>
Link: https://patch.msgid.link/20260423-fix-gem-1-v1-1-e12e35f7bba9@nvidia.com
[ Move safety comment closer to unsafe block to avoid a clippy warning.
- Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Gary Guo <gary@garyguo.net>
Date: Tue May 12 16:17:20 2026 +0100
rust: pin-init: fix incorrect accessor reference lifetime
commit 68bf102226cf2199dc609b67c1e847cad4de4b57 upstream
When a field has been initialized, `init!`/`pin_init!` create a reference
or pinned reference to the field so it can be accessed later during the
initialization of other fields. However, the reference it created is
incorrectly `&'static` rather than just the scope of the initializer.
This means that you can do
init!(Foo {
a: 1,
_: {
let b: &'static u32 = a;
}
})
which is unsound.
This is caused by `&mut (*#slot).#ident`, which actually allows arbitrary
lifetime, so this is effectively `'static`. Somewhat ironically, the safety
justification of creating the accessor is.. "SAFETY: TODO".
Fix it by adding `let_binding` method on `DropGuard` to shorten lifetime.
This results exactly what we want for these accessors. The safety and
invariant comments of `DropGuard` have been reworked; instead of reasoning
about what caller can do with the guard, express it in a way that the
ownership is transferred to the guard and `forget` takes it back, so the
unsafe operations within the `DropGuard` can be more easily justified.
Fixes: db96c5103ae6 ("add references to previously initialized fields")
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Gary Guo <gary@garyguo.net>
Date: Mon Apr 27 16:43:00 2026 +0100
rust: pin-init: internal: move alignment check to `make_field_check`
commit 83ac2870310b694775ab7e8f0244fdd94fc21926 upstream.
Instead of having the reference creation serving dual-purpose as both for
let bindings and alignment check, detangle them so that the alignment check
is done explicitly in `make_field_check`. This is more robust against
refactors that may change the way let bindings are created.
Cc: stable@vger.kernel.org
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260427-pin-init-fix-v3-1-496a699674dd@garyguo.net
[ Reworded for typo. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Pengpeng Hou <pengpeng@iscas.ac.cn>
Date: Fri Apr 17 15:35:30 2026 +0800
s390/debug: Reject zero-length input before trimming a newline
commit c366a7b5ed7564e41345c380285bd3f6cb98971b upstream.
debug_get_user_string() duplicates the userspace buffer with
memdup_user_nul() and then unconditionally looks at buffer[user_len - 1]
to strip a trailing newline.
A zero-length write reaches this helper unchanged, so the newline trim
reads before the start of the allocated buffer.
Reject empty writes before accessing the last input byte.
Fixes: 66a464dbc8e0 ("[PATCH] s390: debug feature changes")
Cc: stable@vger.kernel.org
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20260417073530.96002-1-pengpeng@iscas.ac.cn
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Vasily Gorbik <gor@linux.ibm.com>
Date: Fri Apr 17 14:33:43 2026 +0200
s390/debug: Reject zero-length input in debug_input_flush_fn()
commit e14622a7584f9608927c59a7d6ae4a0999dc545e upstream.
debug_input_flush_fn() always copies one byte from the userspace buffer
with copy_from_user() regardless of the supplied write length. A
zero-length write therefore reads one byte beyond the caller's buffer.
If the stale byte happens to be '-' or a digit the debug log is
silently flushed. With an unmapped buffer the call returns -EFAULT.
Reject zero-length writes before copying from userspace.
Cc: stable@vger.kernel.org # v5.10+
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Carlier <devnexen@gmail.com>
Date: Thu Apr 30 10:27:47 2026 +0100
sched_ext: idle: Recheck prev_cpu after narrowing allowed mask
commit b34c82777a2c0648ee053595f4b290fd5249b093 upstream.
scx_select_cpu_dfl() narrows @allowed to @cpus_allowed & @p->cpus_ptr
when the BPF caller supplies a @cpus_allowed that differs from
@p->cpus_ptr and @p doesn't have full affinity. However,
@is_prev_allowed was computed against the original (wider)
@cpus_allowed, so the prev_cpu fast paths could pick a @prev_cpu that
is in @cpus_allowed but not in @p->cpus_ptr, violating the intended
invariant that the returned CPU is always usable by @p. The kernel
masks this via the SCX_EV_SELECT_CPU_FALLBACK fallback, but the
behavior contradicts the documented contract.
Move the @is_prev_allowed evaluation past the narrowing block so it
tests against the final @allowed mask.
Fixes: ee9a4e92799d ("sched_ext: idle: Properly handle invalid prev_cpu during idle selection")
Cc: stable@vger.kernel.org # v6.16+
Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tejun Heo <tj@kernel.org>
Date: Fri Apr 24 14:31:35 2026 -1000
sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters
commit 80afd4c84bc8f5e80145ce35279f5ce53f6043db upstream.
scx_group_set_{weight,idle,bandwidth}() cache scx_root before acquiring
scx_cgroup_ops_rwsem, so the pointer can be stale by the time the op runs.
If the loaded scheduler is disabled and freed (via RCU work) and another is
enabled between the naked load and the rwsem acquire, the reader sees
scx_cgroup_enabled=true (the new scheduler's) but dereferences the freed one
- UAF on SCX_HAS_OP(sch, ...) / SCX_CALL_OP(sch, ...).
scx_cgroup_enabled is toggled only under scx_cgroup_ops_rwsem write
(scx_cgroup_{init,exit}), so reading scx_root inside the rwsem read section
correlates @sch with the enabled snapshot.
Fixes: a5bd6ba30b33 ("sched_ext: Use cgroup_lock/unlock() to synchronize against cgroup operations")
Cc: stable@vger.kernel.org # v6.18+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tejun Heo <tj@kernel.org>
Date: Fri Apr 24 14:31:35 2026 -1000
sched_ext: Use dsq->first_task instead of list_empty() in dispatch_enqueue() FIFO-tail
commit 2f2ea77092660b53bfcbc4acc590b57ce9ab5dce upstream.
dispatch_enqueue()'s FIFO-tail path used list_empty(&dsq->list) to decide
whether to set dsq->first_task on enqueue. dsq->list can contain parked BPF
iterator cursors (SCX_DSQ_LNODE_ITER_CURSOR), so list_empty() is not a
reliable "no real task" check. If the last real task is unlinked while a
cursor is parked, first_task becomes NULL; the next FIFO-tail enqueue then
sees list_empty() == false and skips the first_task update, leaving
scx_bpf_dsq_peek() returning NULL for a non-empty DSQ.
Test dsq->first_task directly, which already tracks only real tasks and is
maintained under dsq->lock.
Fixes: 44f5c8ec5b9a ("sched_ext: Add lockless peek operation for DSQs")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: Ryan Newton <newton@meta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ranjan Kumar <ranjan.kumar@broadcom.com>
Date: Tue Apr 14 16:38:11 2026 +0530
scsi: mpt3sas: Limit NVMe request size to 2 MiB
commit 04631f55afc543d5431a2bdee7f6cc0f2c0debe7 upstream.
The HBA firmware reports NVMe MDTS values based on the underlying drive
capability. However, because the driver allocates a fixed 4K buffer for
the PRP list, accommodating at most 512 entries, the driver supports a
maximum I/O transfer size of 2 MiB.
Limit max_hw_sectors to the smaller of the reported MDTS and the 2 MiB
driver limit to prevent issuing oversized I/O that may lead to a kernel
oops.
Cc: stable@vger.kernel.org
Fixes: 9b8b84879d4a ("block: Increase BLK_DEF_MAX_SECTORS_CAP")
Reported-by: Mira Limbeck <m.limbeck@proxmox.com>
Closes: https://lore.kernel.org/r/291f78bf-4b4a-40dd-867d-053b36c564b3@proxmox.com
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b8b84879d4a
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Tested-by: Mira Limbeck <m.limbeck@proxmox.com>
Link: https://patch.msgid.link/20260414110811.85156-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Sat Apr 11 14:06:00 2026 +0200
scsi: target: configfs: Bound snprintf() return in tg_pt_gp_members_show()
commit 772a896a56e0e3ef9424a025cec9176f9d8f4552 upstream.
target_tg_pt_gp_members_show() formats LUN paths with snprintf() into a
256-byte stack buffer, then will memcpy() cur_len bytes from that
buffer. snprintf() returns the length the output would have had, which
can exceed the buffer size when the fabric WWN is long because iSCSI IQN
names can be up to 223 bytes. The check at the memcpy() site only
guards the destination page write, not the source read, so memcpy() will
read past the stack buffer and copy adjacent stack contents to the sysfs
reader, which when CONFIG_FORTIFY_SOURCE is enabled, fortify_panic()
will be triggered.
Commit 27e06650a5ea ("scsi: target: target_core_configfs: Add length
check to avoid buffer overflow") added the same bound to the
target_lu_gp_members_show() but the tg_pt_gp variant was missed so
resolve that here.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026041159-garter-theft-3be0@gregkh
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mark Brown <broonie@kernel.org>
Date: Thu Apr 23 20:17:45 2026 +0100
selftests/rseq: Don't run tests with runner scripts outside of the scripts
commit cb48828f06afa232cc330f0f4d6be101067810b3 upstream.
The rseq selftests include two runner scripts run_param_test.sh and
run_syscall_errors_test.sh which set up the environment for test binaries
and run them with various parameters. Currently we list these test binaries
in TEST_GEN_PROGS but this results in the kselftest framework running them
directly as well as via the runners, resulting in duplication and spurious
failures when the environment is not correctly set up (eg, if glibc tries
to use rseq).
Move the binaries the runners invoke to TEST_GEN_PROGS_EXTENDED, binaries
listed there are built but not run by the framework. The param_test
benchmarks are not moved since they are not run by run_param_test.sh.
Fixes: 830969e7821a ("selftests/rseq: Implement time slice extension test")
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260423-selftests-rseq-use-runner-v1-1-e13a133754c1@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Sat Apr 25 14:48:23 2026 +0200
selftests/rseq: Expand for optimized RSEQ ABI v2
commit e744060076871eebc2647b24420b550ff44b2b65 upstream.
Update the selftests so they are executed for legacy (32 bytes RSEQ region)
and optimized RSEQ ABI v2 mode.
Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224428.009121296%40kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Sun Apr 26 18:13:54 2026 +0200
selftests/rseq: Make registration flexible for legacy and optimized mode
commit d97cb2ef0b221b068e90b6058aa97faa0626bdab upstream.
rseq_register_current_thread() either uses the glibc registered RSEQ region
or registers it's own region with the legacy size of 32 bytes.
That worked so far, but becomes a problem when the kernel implements a
distinction between legacy and performance optimized behavior based on the
registration size as that does not allow to test both modes with the self
test suite.
Add two arguments to the function. One to enforce that the registration is
not using libc provided mode and one to tell the registration to use the
legacy size and not the kernel advertised size.
Rename it and make the original one a inline wrapper which preserves the
existing behavior.
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.677889423%40kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Sat Apr 25 15:46:06 2026 +0200
selftests/rseq: Skip tests if time slice extensions are not available
commit 02b44d943b3adddc3a15c1da97045e205b7d14c1 upstream.
Don't fail, skip the test if the extensions are not enabled at compile or
runtime.
Fixes: 830969e7821a ("selftests/rseq: Implement time slice extension test")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.597838491%40kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Sun Apr 26 17:51:07 2026 +0200
selftests/rseq: Validate legacy behavior
commit fdf4eb632683bfc2840acebe62716cb468d43e10 upstream.
The RSEQ legacy mode behavior requires that the ID fields in the rseq
region are unconditionally updated on every context switch and before
signal delivery even if not required by the ABI specification.
To ensure that this behavior is preserved for legacy users in the future,
add a test which validates that with a sleep() and a signal sent to self.
Provide a run script which prevents GLIBC from registering a RSEQ region,
so that the test can register it's own legacy sized region.
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.764705536%40kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:58 2026 +0200
selftests: mptcp: check output: catch cmd errors
commit 65db7b27b90e2ea8d4966935aa9a50b6a60c31ac upstream.
Using '${?}' inside the if-statement to check the returned value from
the command that was evaluated as part of the if-statement is not
correct: here, '${?}' will be linked to the previous instruction, not
the one that is expected here (${cmd}).
Instead, simply mark the error, except if an error is expected. If
that's the case, 1 can be passed as the 4th argument of this helper.
Three checks from pm_netlink.sh expect an error.
While at it, improve the error message when the command unexpectedly
fails or succeeds.
Note that we could expect a specific returned value, but the checks
currently expecting an error can be used with 'ip mptcp' or 'pm_nl_ctl',
and these two tools don't return the same error code.
Fixes: 2d0c1d27ea4e ("selftests: mptcp: add mptcp_lib_check_output helper")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-10-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue May 5 17:00:59 2026 +0200
selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl
commit 53705ddfa18408f8e1f064331b6387509fa19f7f upstream.
When pm_netlink.sh is executed with '-i', 'ip mptcp' is used instead of
'pm_nl_ctl'. IPRoute2 doesn't support the 'unknown' flag, which has only
been added to 'pm_nl_ctl' for this specific check: to ensure that the
kernel ignores such unsupported flag.
No reason to add this flag to 'ip mptcp'. Then, this check should be
skipped when 'ip mptcp' is used.
Fixes: 0cef6fcac24d ("selftests: mptcp: ip_mptcp option for more scripts")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-11-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stephen Smalley <stephen.smalley.work@gmail.com>
Date: Tue May 5 10:06:38 2026 -0400
selinux: allow multiple opens of /sys/fs/selinux/policy
commit a02cd6805562305f936e807da83e253b719dd965 upstream.
Currently there can only be a single open of /sys/fs/selinux/policy at
any time. This allows any process to block any other process from
reading the kernel policy. The original motivation seems to have been
a mix of preventing an inconsistent view of the policy size and
preventing userspace from allocating kernel memory without bound, but
this is arguably equally bad. Eliminate the policy_opened flag and
shrink the critical section that the policy mutex is held. While we
are making changes here, drop a couple of extraneous BUG_ONs.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/selinux/20100726193414.19538.64028.stgit@paris.rdu.redhat.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Windsor <dwindsor@gmail.com>
Date: Sun Apr 26 19:23:49 2026 -0400
selinux: don't reserve xattr slot when we won't fill it
commit 1e5a8eed7821e7a43a31b4c1b3675a91be6bc6f6 upstream.
Move lsm_get_xattr_slot() below the SBLABEL_MNT check so we don't leave
a NULL-named slot in the array when returning -EOPNOTSUPP; filesystem
initxattrs() callbacks stop iterating at the first NULL ->name, silently
dropping xattrs installed by later LSMs.
Cc: stable@vger.kernel.org
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stephen Smalley <stephen.smalley.work@gmail.com>
Date: Fri Apr 10 15:29:50 2026 -0400
selinux: fix avdcache auditing
commit f92d542577db878acfd21cc18dab23d03023b217 upstream.
The per-task avdcache was incorrectly saving and reusing the
audited vector computed by avc_audit_required() rather than
recomputing based on the currently requested permissions and
distinguishing the denied versus allowed cases. As a result,
some permission checks were not being audited, e.g.
directory write checks after a previously cached directory
search check.
Cc: stable@vger.kernel.org
Fixes: dde3a5d0f4dce ("selinux: move avdcache to per-task security struct")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line wrap tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stephen Smalley <stephen.smalley.work@gmail.com>
Date: Tue May 5 08:49:48 2026 -0400
selinux: prune /sys/fs/selinux/checkreqprot
commit 644132a48f4e28a1d949d162160869286f3e75de upstream.
commit a7e4676e8e2cb ("selinux: remove the 'checkreqprot'
functionality") removed the ability to modify the checkreqprot setting
but left everything except the updating of the checkreqprot value
intact. Aside from unnecessary processing, this could produce a local
DoS from log spam and incorrectly calls selinux_ima_measure_state() on
each write even though no state has changed. Prune it to just log an
error message once and return count (i.e. all bytes written
successfully) so that userspace never breaks.
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stephen Smalley <stephen.smalley.work@gmail.com>
Date: Tue May 5 08:49:49 2026 -0400
selinux: prune /sys/fs/selinux/disable
commit 19cfa0099024bb9cd40f6d950caa7f47ff8e77f6 upstream.
Commit f22f9aaf6c3d ("selinux: remove the runtime disable
functionality") removed the underlying SELinux runtime disable
functionality but left everything else intact and started logging an
error message to warn any residual users.
Prune it to just log an error message once and to return count
(i.e. all bytes written successfully) to avoid breaking
userspace. This also fixes a local DoS from logspam.
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stephen Smalley <stephen.smalley.work@gmail.com>
Date: Tue May 5 08:49:50 2026 -0400
selinux: prune /sys/fs/selinux/user
commit ad1ac3d740cc6b858a99ab9c45c8c0574be7d1d3 upstream.
Remove the previously deprecated /sys/fs/selinux/user interface aside
from a residual stub for userspace compatibility.
Commit d7b6918e22c7 ("selinux: Deprecate /sys/fs/selinux/user") started
the deprecation process for /sys/fs/selinux/user:
The selinuxfs "user" node allows userspace to request a list
of security contexts that can be reached for a given SELinux
user from a given starting context. This was used by libselinux
when various login-style programs requested contexts for
users, but libselinux stopped using it in 2020.
Kernel support will be removed no sooner than Dec 2025.
A pr_warn() message has been in place since Linux v6.13, and a 5
second sleep was introduced since Linux v6.17 to help make it more
noticeable.
We are now past the stated deadline of Dec 2025, so remove the
underlying functionality and replace it with a stub that returns a
'0\0' buffer to avoid breaking userspace. This also avoids a local DoS
from logspam and an uninterruptible sleep delay.
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stephen Smalley <stephen.smalley.work@gmail.com>
Date: Thu Apr 30 14:36:52 2026 -0400
selinux: shrink critical section in sel_write_load()
commit 868f31e4061eca8c3cd607d79d954d5e54f204aa upstream.
Currently sel_write_load() takes the policy mutex earlier than
necessary. Move the taking of the mutex later. This avoids
holding it unnecessarily across the vmalloc() and copy_from_user()
of the policy data.
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Zongyao Chen <ZongYao.Chen@linux.alibaba.com>
Date: Fri Apr 24 15:37:53 2026 +0800
selinux: use sk blob accessor in socket permission helpers
commit 032e70aff025d7c519af9ab791cd084380619263 upstream.
SELinux socket state lives in the composite LSM socket blob.
sock_has_perm() and nlmsg_sock_has_extended_perms() currently
dereference sk->sk_security directly, which assumes the SELinux socket
blob is at offset zero.
In stacked configurations that assumption does not hold. If another LSM
allocates socket blob storage before SELinux, these helpers may read the
wrong blob and feed invalid SID and class values into AVC checks.
Use selinux_sock() instead of accessing sk->sk_security directly.
Fixes: d1d991efaf34 ("selinux: Add netlink xperm support")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Zongyao Chen <ZongYao.Chen@linux.alibaba.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Zisen Ye <zisenye@stu.xidian.edu.cn>
Date: Wed May 6 11:49:08 2026 +0800
smb/client: fix out-of-bounds read in smb2_compound_op()
commit 8d09328dfda089675e4c049f3f256064a1d1996b upstream.
If a server sends a truncated response but a large OutputBufferLength, and
terminates the EA list early, check_wsl_eas() returns success without
validating that the entire OutputBufferLength fits within iov_len.
Then smb2_compound_op() does:
memcpy(idata->wsl.eas, data[0], size[0]);
Where size[0] is OutputBufferLength. If iov_len is smaller than size[0],
memcpy can read beyond the end of the rsp_iov allocation and leak adjacent
kernel heap memory.
Link: https://lore.kernel.org/linux-cifs/d998240c-aca9-420d-9dbd-f5ba24af19e0@chenxiaosong.com/
Fixes: ea41367b2a60 ("smb: client: introduce SMB2_OP_QUERY_WSL_EA")
Cc: stable@vger.kernel.org
Signed-off-by: Zisen Ye <zisenye@stu.xidian.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Zisen Ye <zisenye@stu.xidian.edu.cn>
Date: Sat May 2 18:48:36 2026 +0800
smb/client: fix out-of-bounds read in symlink_data()
commit d62b8d236fab503c6fec1d3e9a38bea71feaca20 upstream.
Since smb2_check_message() returns success without length validation for
the symlink error response, in symlink_data() it is possible for
iov->iov_len to be smaller than sizeof(struct smb2_err_rsp). If the buffer
only contains the base SMB2 header (64 bytes), accessing
err->ErrorContextCount (at offset 66) or err->ByteCount later in
symlink_data() will cause an out-of-bounds read.
Link: https://lore.kernel.org/linux-cifs/297d8d9b-adf7-42fd-a1c2-5b1f230032bc@chenxiaosong.com/
Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+")
Cc: Stable@vger.kernel.org
Signed-off-by: Zisen Ye <zisenye@stu.xidian.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yi Kuo <yi@yikuo.dev>
Date: Fri May 8 10:15:47 2026 +0200
smb: client/smbdirect: fix MR registration for coalesced SG lists
commit 9900b9fee5a0e0f72d7c744b37c7c851d5785ac6 upstream.
The stable backport to < 7.1 patches a different file. Also
the Fixes tag below is adjusted for the old code path.
ib_dma_map_sg() modifies the provided scatterlist and returns the
number of mapped entries, which can be fewer than the requested
mr->sgt.nents if the DMA controller coalesces contiguous memory
segments. Passing the original, uncoalesced count to ib_map_mr_sg()
causes memory registration failures if coalescing actually occurs.
Capture the actual mapped count returned by ib_dma_map_sg() and pass it
to ib_map_mr_sg() to ensure correct MR registration.
Also update the ib_dma_map_sg() error logging to drop the error
pointer formatting, since the return value is an integer count
rather than an error code.
Ensure a proper error code (-EIO) is assigned when DMA mapping or
MR registration fails.
Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221408
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Yi Kuo <yi@yikuo.dev>
Signed-off-by: Steve French <stfrench@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bjoern Doebel <doebel@amazon.de>
Date: Thu Apr 30 08:57:17 2026 +0000
smb: client: use kzalloc to zero-initialize security descriptor buffer
commit 5e489c6c47a2ac15edbaca153b9348e42c1eacab upstream.
Commit 62e7dd0a39c2d ("smb: common: change the data type of num_aces
to le16") split struct smb_acl's __le32 num_aces field into __le16
num_aces and __le16 reserved. The reserved field corresponds to Sbz2
in the MS-DTYP ACL wire format, which must be zero [1].
When building an ACL descriptor in build_sec_desc(), we are using a
kmalloc()'ed descriptor buffer and writing the fields explicitly using
le16() writes now. This never writes to the 2 byte reserved field,
leaving it as uninitialized heap data.
When the reserved field happens to contain non-zero slab garbage,
Samba rejects the security descriptor with "ndr_pull_security_descriptor
failed: Range Error", causing chmod to fail with EINVAL.
Change kmalloc() to kzalloc() to ensure the entire buffer is
zero-initialized.
Fixes: 62e7dd0a39c2d ("smb: common: change the data type of num_aces to le16")
Cc: stable@vger.kernel.org
Signed-off-by: Bjoern Doebel <doebel@amazon.de>
Assisted-by: Kiro:claude-opus-4.6
[1] https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/20233ed8-a6c6-4097-aafa-dd545ed24428
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Mon Apr 20 10:47:47 2026 -0400
smb: client: validate dacloffset before building DACL pointers
commit f98b48151cc502ada59d9778f0112d21f2586ca3 upstream.
parse_sec_desc(), build_sec_desc(), and the chown path in
id_mode_to_cifs_acl() all add the server-supplied dacloffset to pntsd
before proving a DACL header fits inside the returned security
descriptor.
On 32-bit builds a malicious server can return dacloffset near
U32_MAX, wrap the derived DACL pointer below end_of_acl, and then slip
past the later pointer-based bounds checks. build_sec_desc() and
id_mode_to_cifs_acl() can then dereference DACL fields from the wrapped
pointer in the chmod/chown rewrite paths.
Validate dacloffset numerically before building any DACL pointer and
reuse the same helper at the three DACL entry points.
Fixes: bc3e9dd9d104 ("cifs: Change SIDs in ACEs while transferring file ownership.")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeungJu Cheon <suunj1331@gmail.com>
Date: Sun Apr 26 20:12:39 2026 +0900
sound: ua101: fix division by zero at probe
commit d1f73f169c1014463b5060e3f60813e13ddc7b87 upstream.
Add a missing sanity check for bNrChannels in detect_usb_format()
to prevent a division by zero in playback_urb_complete() and
capture_urb_complete().
USB core does not validate class-specific descriptor fields such
as bNrChannels, so drivers must verify them before use. If a
device provides bNrChannels = 0, frame_bytes becomes zero and is
later used as a divisor in the URB completion handlers, leading
to a kernel crash.
Fixes: 63978ab3e3e9 ("sound: add Edirol UA-101 support")
Cc: stable@vger.kernel.org
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Link: https://patch.msgid.link/20260426111239.103296-1-suunj1331@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Conor Dooley <conor.dooley@microchip.com>
Date: Thu Apr 30 11:10:18 2026 +0100
spi: microchip-core-qspi: control built-in cs manually
commit 7672749e1496215e8683ce57cf323119033954cf upstream.
The coreQSPI IP supports only a single chip select, which is
automagically operated by the hardware - set low when the transmit
buffer first gets written to and set high when the number of bytes
written to the TOTALBYTES field of the FRAMES register have been sent on
the bus. Additional devices must use GPIOs for their chip selects.
It was reported to me that if there are two devices attached to this
QSPI controller that the in-built chip select is set low while linux
tries to access the device attached to the GPIO.
This went undetected as the boards that connected multiple devices to
the SPI controller all exclusively used GPIOs for chip selects, not
relying on the built-in chip select at all. It turns out that this was
because the built-in chip select, when controlled automagically, is set
low when active and high when inactive, thereby ruling out its use for
active-high devices or devices that need to transmit with the chip
select disabled.
Modify the driver so that it controls chip select directly, retaining
the behaviour for mem_ops of setting the chip select active for the
entire duration of the transfer in the exec_op callback. For regular
transfers, implement the set_cs callback for the core to use.
As part of this, the existing setup callback, mchp_coreqspi_setup_op(),
is removed. Modifying the CLKIDLE field is not safe to do during
operation when there are multiple devices, so this code is removed
entirely. Setting the MASTER and ENABLE fields is something that can be
done once at probe, it doesn't need to be re-run for each device.
Instead the new setup callback sets the built-in chip select to its
inactive state for active-low devices, as the reset value of the chip
select in software controlled mode is low.
Fixes: 8f9cf02c88528 ("spi: microchip-core-qspi: Add regular transfers")
Fixes: 8596124c4c1bc ("spi: microchip-core-qspi: Add support for microchip fpga qspi controllers")
CC: stable@vger.kernel.org
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260430-hamstring-busload-f941d0347b5e@spud
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Conor Dooley <conor.dooley@microchip.com>
Date: Thu Apr 30 11:10:19 2026 +0100
spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
commit eb56deaabf127e8985fc91fa6c97bf8a3b062844 upstream.
The core will deal with reads by creating clock cycles itself, there's
no need to generate clock cycles by transmitting garbage data at the
driver level. Further, transmitting garbage data just bricks the transfer
since QSPI doesn't have a dedicated master-out line like MOSI in regular
SPI. I'm not entirely sure if the transfer is bricked because of the
garbage data being transmitted on the bus or because the core loses
track of whether it is supposed to be sending or receiving data.
Fixes: 8f9cf02c88528 ("spi: microchip-core-qspi: Add regular transfers")
CC: stable@vger.kernel.org
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260430-freezing-saloon-95b1f3d9dad0@spud
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Thu Apr 9 14:04:17 2026 +0200
spi: microchip-core-qspi: fix controller deregistration
commit e6464140d439f2d42f072eb422a5b1fec470c5a6 upstream.
Make sure to deregister the controller before disabling underlying
resources like interrupts during driver unbind.
Fixes: 8596124c4c1b ("spi: microchip-core-qspi: Add support for microchip fpga qspi controllers")
Cc: stable@vger.kernel.org # 6.1
Cc: Naga Sureshkumar Relli <nagasuresh.relli@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260409120419.388546-19-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Thu Apr 9 14:04:18 2026 +0200
spi: microchip-core-spi: fix controller deregistration
commit d00d722ebad46cf7a9886684f26a26337b5ee3f4 upstream.
Make sure to deregister the controller before disabling underlying
resources like interrupts during driver unbind.
Fixes: 059f545832be ("spi: add support for microchip "soft" spi controller")
Cc: stable@vger.kernel.org # 6.19
Cc: Prajna Rajendra Kumar <prajna.rajendrakumar@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260409120419.388546-20-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Tue Mar 24 09:23:23 2026 +0100
spi: rockchip: fix controller deregistration
commit 53e7a16070feb7d1d4d81a583eaac5e25048b9c3 upstream.
Make sure to deregister the controller before freeing underlying
resources like DMA channels during driver unbind.
Fixes: 64e36824b32b ("spi/rockchip: add driver for Rockchip RK3xxx SoCs integrated SPI")
Cc: stable@vger.kernel.org # 3.17
Cc: addy ke <addy.ke@rock-chips.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260324082326.901043-3-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 11:49:25 2026 +0200
spi: s3c64xx: fix NULL-deref on driver unbind
commit 45daacbead8a009844bd5dba6cfa731332184d17 upstream.
A change moving DMA channel allocation from probe() back to
s3c64xx_spi_prepare_transfer() failed to remove the corresponding
deallocation from remove().
Drop the bogus DMA channel release from remove() to avoid triggering a
NULL-pointer dereference on driver unbind.
This issue was flagged by Sashiko when reviewing a controller
deregistration fix.
Fixes: f52b03c70744 ("spi: s3c64xx: requests spi-dma channel only during data transfer")
Cc: stable@vger.kernel.org # 6.0
Cc: Adithya K V <adithya.kv@samsung.com>
Link: https://sashiko.dev/#/patchset/20260410081757.503099-1-johan%40kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410094925.518343-1-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:48 2026 +0200
spi: sun4i: fix controller deregistration
commit 42108a2f03e0fdeabe9d02d085bdb058baa1189f upstream.
Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.
Fixes: b5f6517948cc ("spi: sunxi: Add Allwinner A10 SPI controller driver")
Cc: stable@vger.kernel.org # 3.15
Cc: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-19-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:49 2026 +0200
spi: sun6i: fix controller deregistration
commit d874a1c33aee0d88fb4ba2f8aeadaa9f1965209a upstream.
Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.
Fixes: 3558fe900e8a ("spi: sunxi: Add Allwinner A31 SPI controller driver")
Cc: stable@vger.kernel.org # 3.15
Cc: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-20-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:50 2026 +0200
spi: syncuacer: fix controller deregistration
commit 75d849c3452e9611de031db45b3149ba9a99035f upstream.
Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.
Fixes: b0823ee35cf9 ("spi: Add spi driver for Socionext SynQuacer platform")
Cc: stable@vger.kernel.org # 5.3
Cc: Masahisa Kojima <masahisa.kojima@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-21-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:51 2026 +0200
spi: tegra114: fix controller deregistration
commit 9c9c27ff2058142d8f800de3186d6864184958de upstream.
Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.
Fixes: 5c8096439600 ("spi: tegra114: use devm_spi_register_master()")
Cc: stable@vger.kernel.org # 3.13
Cc: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-22-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:52 2026 +0200
spi: tegra20-sflash: fix controller deregistration
commit ad7310e983327f939dd6c4e801eab13238992572 upstream.
Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.
Fixes: f12f7318c44a ("spi: tegra20-sflash: use devm_spi_register_master()")
Cc: stable@vger.kernel.org # 3.13
Cc: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-23-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:53 2026 +0200
spi: ti-qspi: fix controller deregistration
commit 0c18a1bacbb1d8b8aa34d3d004a2cb8226c8b1ea upstream.
Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.
Note that the controller is suspended before disabling and releasing
resources since commit 3ac066e2227c ("spi: spi-ti-qspi: Suspend the
queue before removing the device") which avoids issues like unclocked
accesses but prevents SPI device drivers from doing I/O during
deregistration.
Fixes: 3b3a80019ff1 ("spi: ti-qspi: one only one interrupt handler")
Cc: stable@vger.kernel.org # 3.13
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-24-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Tue Apr 14 15:43:18 2026 +0200
spi: topcliff-pch: fix controller deregistration
commit 5d6f477d6fc0767c57c5e1e6f55a1662820eef87 upstream.
Make sure to deregister the controller before disabling and releasing
underlying resources like interrupts and DMA during driver unbind.
Fixes: e8b17b5b3f30 ("spi/topcliff: Add topcliff platform controller hub (PCH) spi bus driver")
Cc: stable@vger.kernel.org # 2.6.37
Cc: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260414134319.978196-8-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Tue Apr 14 15:43:19 2026 +0200
spi: topcliff-pch: fix use-after-free on unbind
commit 9d72732fe70c11424bc90ed466c7ccfa58b42a9a upstream.
Give the driver a chance to flush its queue before releasing the DMA
buffers on driver unbind
Fixes: c37f3c2749b5 ("spi/topcliff_pch: DMA support")
Cc: stable@vger.kernel.org # 3.1
Cc: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260414134319.978196-9-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:56 2026 +0200
spi: zynq-qspi: fix controller deregistration
commit c9c012706c9fa8ca6d129a9161caf92ab625a3fd upstream.
Make sure to deregister the controller before disabling it during driver
unbind.
Note that clocks were also disabled before the recent commit
1f8fd9490e31 ("spi: zynq-qspi: Simplify clock handling with
devm_clk_get_enabled()").
Fixes: 67dca5e580f1 ("spi: spi-mem: Add support for Zynq QSPI controller")
Cc: stable@vger.kernel.org # 5.2: 8eb2fd00f65a
Cc: stable@vger.kernel.org # 5.2
Cc: Naga Sureshkumar Relli <naga.sureshkumar.relli@xilinx.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-27-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 10 10:17:55 2026 +0200
spi: zynqmp-gqspi: fix controller deregistration
commit 6895fc4faafc9082e15e4e624b23dd5f0c98feb5 upstream.
Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.
Fixes: dfe11a11d523 ("spi: Add support for Zynq Ultrascale+ MPSoC GQSPI controller")
Cc: stable@vger.kernel.org # 4.2: 64640f6c972e
Cc: stable@vger.kernel.org # 4.2
Cc: Ranjit Waghmode <ranjit.waghmode@xilinx.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260410081757.503099-26-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shyam Sunder Reddy Padira <shyamsunderreddypadira@gmail.com>
Date: Tue Apr 14 12:43:06 2026 +0530
staging: rtl8723bs: os_dep: avoid NULL pointer dereference in rtw_cbuf_alloc
commit bc851db06045a40c18233dd76ef0562d7f8bb6db upstream.
The return value of kzalloc_flex() is used without
ensuring that the allocation succeeded, and the
pointer is dereferenced unconditionally.
Guard the access to the allocated structure to
avoid a potential NULL pointer dereference if the
allocation fails.
Fixes: 980cd426a257 ("staging: rtl8723bs: replace rtw_zmalloc() with kzalloc()")
Cc: stable <stable@kernel.org>
Signed-off-by: Shyam Sunder Reddy Padira <shyamsunderreddypadira@gmail.com>
Reviewed-by: Dan Carpenter <error27@gmail.com>
Link: https://patch.msgid.link/20260414071308.4781-2-shyamsunderreddypadira@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Apr 24 12:49:10 2026 +0200
staging: vme_user: fix root device leak on init failure
commit 32c91e8ee039777d0b95b914633fc6a42607959c upstream.
Make sure to deregister and free the root device in case module
initialisation fails.
Fixes: 658bcdae9c67 ("vme: Adding Fake VME driver")
Cc: stable@vger.kernel.org # 4.9
Cc: Martyn Welch <martyn@welchs.me.uk>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260424104910.2619349-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Sat Mar 7 11:24:21 2026 +0100
thermal/drivers/sprd: Fix raw temperature clamping in sprd_thm_rawdata_to_temp
commit b3414148bbc1f9cd56217e58a558c6ac4fd1b4a6 upstream.
The raw temperature data was never clamped to SPRD_THM_RAW_DATA_LOW or
SPRD_THM_RAW_DATA_HIGH because the return value of clamp() was not used.
Fix this by assigning the clamped value to 'rawdata'.
Casting SPRD_THM_RAW_DATA_LOW and SPRD_THM_RAW_DATA_HIGH to u32 is also
redundant and can be removed.
Fixes: 554fdbaf19b1 ("thermal: sprd: Add Spreadtrum thermal driver support")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260307102422.306055-2-thorsten.blum@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Sat Mar 7 11:24:20 2026 +0100
thermal/drivers/sprd: Fix temperature clamping in sprd_thm_temp_to_rawdata
commit 83c0f9a5d679a6f8d84fc49b2f62ea434ccab4b6 upstream.
The temperature was never clamped to SPRD_THM_TEMP_LOW or
SPRD_THM_TEMP_HIGH because the return value of clamp() was not used. Fix
this by assigning the clamped value to 'temp'.
Casting SPRD_THM_TEMP_LOW and SPRD_THM_TEMP_HIGH to int is also
redundant and can be removed.
Fixes: 554fdbaf19b1 ("thermal: sprd: Add Spreadtrum thermal driver support")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260307102422.306055-1-thorsten.blum@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Tue Apr 7 15:58:34 2026 +0200
thermal: core: Free thermal zone ID later during removal
commit daae9c18feec74566e023fc88cfb0ce26e39d868 upstream.
The thermal zone removal ordering is different from the thermal zone
registration rollback path ordering and the former is arguably
problematic because freeing a thermal zone ID prematurely may cause
it to be used during the registration of another thermal zone which
may fail as a result.
Prevent that from occurring by changing the thermal zone removal
ordering to reflect the thermal zone registration rollback path
ordering.
Also more the ida_destroy() call from thermal_zone_device_unregister()
to thermal_release() for consistency.
Fixes: b31ef8285b19 ("thermal core: convert ID allocation to IDA")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5063934.GXAFRqVoOG@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Carlier <devnexen@gmail.com>
Date: Sat Apr 4 14:47:47 2026 +0100
tracefs: Fix default permissions not being applied on initial mount
commit e8368d1f4bedbb0cce4cfe33a1d2664bb0fd4f27 upstream.
Commit e4d32142d1de ("tracing: Fix tracefs mount options") moved the
option application from tracefs_fill_super() to tracefs_reconfigure()
called from tracefs_get_tree(). This fixed mount options being ignored
on user-space mounts when the superblock already exists, but introduced
a regression for the initial kernel-internal mount.
On the first mount (via simple_pin_fs during init), sget_fc() transfers
fc->s_fs_info to sb->s_fs_info and sets fc->s_fs_info to NULL. When
tracefs_get_tree() then calls tracefs_reconfigure(), it sees a NULL
fc->s_fs_info and returns early without applying any options. The root
inode keeps mode 0755 from simple_fill_super() instead of the intended
TRACEFS_DEFAULT_MODE (0700).
Furthermore, even on subsequent user-space mounts without an explicit
mode= option, tracefs_apply_options(sb, true) gates the mode behind
fsi->opts & BIT(Opt_mode), which is unset for the defaults. So the
mode is never corrected unless the user explicitly passes mode=0700.
Restore the tracefs_apply_options(sb, false) call in tracefs_fill_super()
to apply default permissions on initial superblock creation, matching
what debugfs does in debugfs_fill_super().
Cc: stable@vger.kernel.org
Fixes: e4d32142d1de ("tracing: Fix tracefs mount options")
Link: https://patch.msgid.link/20260404134747.98867-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Carlier <devnexen@gmail.com>
Date: Mon Apr 13 20:06:01 2026 +0100
tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func()
commit fad217e16fded7f3c09f8637b0f6a224d58b5f2e upstream.
When a tracepoint goes through the 0 -> 1 transition, tracepoint_add_func()
invokes the subsystem's ext->regfunc() before attempting to install the
new probe via func_add(). If func_add() then fails (for example, when
allocate_probes() cannot allocate a new probe array under memory pressure
and returns -ENOMEM), the function returns the error without calling the
matching ext->unregfunc(), leaving the side effects of regfunc() behind
with no installed probe to justify them.
For syscall tracepoints this is particularly unpleasant: syscall_regfunc()
bumps sys_tracepoint_refcount and sets SYSCALL_TRACEPOINT on every task.
After a leaked failure, the refcount is stuck at a non-zero value with no
consumer, and every task continues paying the syscall trace entry/exit
overhead until reboot. Other subsystems providing regfunc()/unregfunc()
pairs exhibit similarly scoped persistent state.
Mirror the existing 1 -> 0 cleanup and call ext->unregfunc() in the
func_add() error path, gated on the same condition used there so the
unwind is symmetric with the registration.
Fixes: 8cf868affdc4 ("tracing: Have the reg function allow to fail")
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260413190601.21993-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: Mon Apr 20 23:01:12 2026 +0900
tracing/fprobe: Avoid kcalloc() in rcu_read_lock section
commit aa72812b49104bb5a38272fc9541feb62ca6fd32 upstream.
fprobe_remove_node_in_module() is called under RCU read locked, but
this invokes kcalloc() if there are more than 8 fprobes installed
on the module. Sashiko warns it because kcalloc() can sleep [1].
[1] https://sashiko.dev/#/patchset/177552432201.853249.5125045538812833325.stgit%40mhiramat.tok.corp.google.com
To fix this issue, expand the batch size to 128 and do not expand
the fprobe_addr_list, but just cancel walking on fprobe_ip_table,
update fgraph/ftrace_ops and retry the loop again.
Link: https://lore.kernel.org/all/177669367206.132053.1493637946869032744.stgit@mhiramat.tok.corp.google.com/
Fixes: 0de4c70d04a4 ("tracing: fprobe: use rhltable for fprobe_ip_table")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: Mon Apr 20 23:01:20 2026 +0900
tracing/fprobe: Check the same type fprobe on table as the unregistered one
commit 0ac0058a74ac5765c7ce09ea630f4fdeaf4d80fa upstream.
Commit 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
introduced a different ftrace_ops for entry-only fprobes.
However, when unregistering an fprobe, the kernel only checks if another
fprobe exists at the same address, without checking which type of fprobe
it is.
If different fprobes are registered at the same address, the same address
will be registered in both fgraph_ops and ftrace_ops, but only one of
them will be deleted when unregistering. (the one removed first will not
be deleted from the ops).
This results in junk entries remaining in either fgraph_ops or ftrace_ops.
For example:
=======
cd /sys/kernel/tracing
# 'Add entry and exit events on the same place'
echo 'f:event1 vfs_read' >> dynamic_events
echo 'f:event2 vfs_read%return' >> dynamic_events
# 'Enable both of them'
echo 1 > events/fprobes/enable
cat enabled_functions
vfs_read (2) ->arch_ftrace_ops_list_func+0x0/0x210
# 'Disable and remove exit event'
echo 0 > events/fprobes/event2/enable
echo -:event2 >> dynamic_events
# 'Disable and remove all events'
echo 0 > events/fprobes/enable
echo > dynamic_events
# 'Add another event'
echo 'f:event3 vfs_open%return' > dynamic_events
cat dynamic_events
f:fprobes/event3 vfs_open%return
echo 1 > events/fprobes/enable
cat enabled_functions
vfs_open (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
vfs_read (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
=======
As you can see, an entry for the vfs_read remains.
To fix this issue, when unregistering, the kernel should also check if
there is the same type of fprobes still exist at the same address, and
if not, delete its entry from either fgraph_ops or ftrace_ops.
Link: https://lore.kernel.org/all/177669367993.132053.10553046138528674802.stgit@mhiramat.tok.corp.google.com/
Fixes: 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: Mon Apr 20 23:01:04 2026 +0900
tracing/fprobe: Remove fprobe from hash in failure path
commit 845947aca6814f5723ed65e556eb5ee09493f05b upstream.
When register_fprobe_ips() fails, it tries to remove a list of
fprobe_hash_node from fprobe_ip_table, but it missed to remove
fprobe itself from fprobe_table. Moreover, when removing
the fprobe_hash_node which is added to rhltable once, it must
use kfree_rcu() after removing from rhltable.
To fix these issues, this reuses unregister_fprobe() internal
code to rollback the half-way registered fprobe.
Link: https://lore.kernel.org/all/177669366417.132053.17874946321744910456.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: Mon Apr 20 23:00:56 2026 +0900
tracing/fprobe: Unregister fprobe even if memory allocation fails
commit 1aec9e5c3e31ce1e28f914427fb7f90b91d310df upstream.
unregister_fprobe() can fail under memory pressure because of memory
allocation failure, but this maybe called from module unloading, and
usually there is no way to retry it. Moreover. trace_fprobe does not
check the return value.
To fix this problem, unregister fprobe and fprobe_hash_node even if
working memory allocation fails.
Anyway, if the last fprobe is removed, the filter will be freed.
Link: https://lore.kernel.org/all/177669365629.132053.8433032896213721288.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Steven Rostedt <rostedt@goodmis.org>
Date: Tue Apr 28 12:23:02 2026 -0400
tracing/probes: Limit size of event probe to 3K
commit b2aa3b4d64e460ac606f386c24e7d8a873ce6f1a upstream.
There currently isn't a max limit an event probe can be. One could make an
event greater than PAGE_SIZE, which makes the event useless because if
it's bigger than the max event that can be recorded into the ring buffer,
then it will never be recorded.
A event probe should never need to be greater than 3K, so make that the
max size. As long as the max is less than the max that can be recorded
onto the ring buffer, it should be fine.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 93ccae7a22274 ("tracing/kprobes: Support basic types on dynamic events")
Link: https://patch.msgid.link/20260428122302.706610ba@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Mon Apr 13 17:12:40 2026 -0400
udf: reject descriptors with oversized CRC length
commit 55d41b0a20128e86b9e960dd2e3f0a2d69a18df7 upstream.
udf_read_tagged() skips CRC verification when descCRCLength +
sizeof(struct tag) exceeds the block size. A crafted UDF image can
set descCRCLength to an oversized value to bypass CRC validation
entirely; the descriptor is then accepted based solely on the 8-bit
tag checksum, which is trivially recomputable.
Reject such descriptors instead of silently accepting them. A
legitimate single-block descriptor should never have a CRC length that
exceeds the block.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260413211240.853662-1-michael.bommarito@gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Selvarasu Ganesan <selvarasu.g@samsung.com>
Date: Fri Apr 17 12:03:11 2026 +0530
usb: dwc3: Move GUID programming after PHY initialization
commit aad35f9c926ec220b0742af1ada45666ae667956 upstream.
The Linux Version Code is currently written to the GUID register before
PHY initialization. Certain PHY implementations (such as Synopsys eUSB
PHY performing link_sw_reset) clear the GUID register to its default
value during initialization, causing the kernel version information to
be lost.
Move the GUID register programming to occur after PHY initialization
completes to ensure the Linux version information persists.
Fixes: fa0ea13e9f1c ("usb: dwc3: core: write LINUX_VERSION_CODE to our GUID register")
Cc: stable <stable@kernel.org>
Reported-by: Pritam Manohar Sutar <pritam.sutar@samsung.com>
Signed-off-by: Selvarasu Ganesan <selvarasu.g@samsung.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://patch.msgid.link/20260417063314.2359-1-selvarasu.g@samsung.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Mon Apr 13 21:49:12 2026 +0300
USB: omap_udc: DMA: Don't enable burst 4 mode
commit 3f91484f6c13c434bd573ca6b6779c26adb0ddab upstream.
Commit 65111084c63d7 ("USB: more omap_udc updates (dma and omap1710)")
added setting for DMA burst 4 mode. But I think this should be undone for
two reasons:
- It breaks DMA on 15xx boards - transfers just silently stall.
- On newer OMAP1 boards, like Nokia 770 (omap1710), there is no measurable
performance impact when testing TCP throughput with g_ether with large
15000 byte MTU size.
It's also worth noting that when the original change was made, the
OMAP_DMA_DATA_BURST_4 handling in arch/arm/plat-omap/dma.c was broken, and
actually resulted in the same as the OMAP_DMA_DATA_BURST_DIS i.e. burst
disabled. This was fixed not until a couple kernel releases later in an
unrelated commit 1a8bfa1eb998a ("[ARM] 3142/1: OMAP 2/5: Update files
common to omap1 and omap2").
So based on this it seems there was never really a very good reason to
enable this burst mode in omap_udc, so remove it now to allow 15xx DMA
to work again (it provides 2x throughput compared to PIO mode).
Fixes: 65111084c63d ("[PATCH] USB: more omap_udc updates (dma and omap1710)")
Cc: stable <stable@kernel.org>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Link: https://patch.msgid.link/ad06qHLclWHeSGnV@darkstar.musicnaut.iki.fi
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fabio Porcedda <fabio.porcedda@gmail.com>
Date: Mon Apr 27 11:17:46 2026 +0200
USB: serial: option: add Telit Cinterion LE910Cx compositions
commit 100201d349edd226ca3470c894c92dccc67ee7a8 upstream.
Add the following Telit Cinterion LE910Cx compositions:
0x1251: RNDIS + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (SAP)
T: Bus=01 Lev=01 Prnt=21 Port=06 Cnt=01 Dev#=108 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1251 Rev=03.18
S: Manufacturer=Android
S: Product=LE910C1-EU
S: SerialNumber=0123456789ABCDEF
C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=02 Prot=ff Driver=rndis_host
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
0x1253: ECM + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (SAP)
T: Bus=01 Lev=01 Prnt=21 Port=06 Cnt=01 Dev#=121 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1253 Rev=03.18
S: Manufacturer=Android
S: Product=LE910C1-EU
S: SerialNumber=0123456789ABCDEF
C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=06 Prot=00 Driver=cdc_ether
E: Ad=82(I) Atr=03(Int.) MxPS= 16 Ivl=32ms
I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
0x1254: tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=21 Port=06 Cnt=01 Dev#=122 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1254 Rev=03.18
S: Manufacturer=Android
S: Product=LE910C1-EU
S: SerialNumber=0123456789ABCDEF
C: #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
0x1255: tty (AT/NMEA) + tty (AT) + tty (AT) + tty (SAP)
T: Bus=01 Lev=01 Prnt=21 Port=06 Cnt=01 Dev#=123 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1255 Rev=03.18
S: Manufacturer=Android
S: Product=LE910C1-EU
S: SerialNumber=0123456789ABCDEF
C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Xu Yang <xu.yang_2@nxp.com>
Date: Fri Apr 24 15:40:09 2026 +0800
usb: typec: tcpm: fix debug accessory mode detection for sink ports
commit f6ec9bb4acc7182b25a793ad094a764e1cb819a7 upstream.
The port in debug accessory mode can be either a source or sink. The
previous tcpm_port_is_debug() function only checked for source port.
Commit 8db73e6a42b6 ("usb: typec: tcpm: allow sink (ufp) to toggle into
accessory mode debug") changed the detection logic to support both roles,
but left some logic in _tcpm_cc_change() unchanged, This causes the state
machine to transition to an incorrect state when operating as a sink in
debug accessory mode. Log as below:
[ 978.637541] CC1: 0 -> 5, CC2: 0 -> 5 [state TOGGLING, polarity 0, connected]
[ 978.637567] state change TOGGLING -> SRC_ATTACH_WAIT [rev1 NONE_AMS]
[ 978.637596] pending state change SRC_ATTACH_WAIT -> DEBUG_ACC_ATTACHED @ 180 ms [rev1 NONE_AMS]
[ 978.647098] CC1: 5 -> 0, CC2: 5 -> 5 [state SRC_ATTACH_WAIT, polarity 0, connected]
[ 978.647115] state change SRC_ATTACH_WAIT -> SRC_ATTACH_WAIT [rev1 NONE_AMS]
It should go to SNK_ATTACH_WAIT instead of SRC_ATTACH_WAIT state.
To fix this, add tcpm_port_is_debug_source() and tcpm_port_is_debug_sink()
helper to explicitly identify the power mode in debug accessory mode.
Update the state transition logic in _tcpm_cc_change() to ensure the state
machine transitions comply with Type-C specification. Also update the logic
in run_state_machine() to keep consistency.
Fixes: 8db73e6a42b6 ("usb: typec: tcpm: allow sink (ufp) to toggle into accessory mode debug")
Cc: stable <stable@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Amit Sunil Dhamne <amitsd@google.com>
Link: https://patch.msgid.link/20260424074009.2979266-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Felix Gu <ustc.gu@gmail.com>
Date: Tue Apr 7 21:21:22 2026 +0800
usb: ulpi: fix memory leak on ulpi_register() error paths
commit 0b9fcab1b8608d429e5f239afb197de928d4de7d upstream.
Commit 01af542392b5 ("usb: ulpi: fix double free in
ulpi_register_interface() error path") removed kfree(ulpi) from
ulpi_register_interface() to fix a double-free when device_register()
fails.
But when ulpi_of_register() or ulpi_read_id() fail before
device_register() is called, the ulpi allocation is leaked.
Add kfree(ulpi) on both error paths to properly clean up the allocation.
Fixes: 01af542392b5 ("usb: ulpi: fix double free in ulpi_register_interface() error path")
Cc: stable <stable@kernel.org>
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20260407-ulpi-v1-1-f3fafe53f7b2@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Apr 20 18:11:03 2026 +0200
usb: usblp: fix heap leak in IEEE 1284 device ID via short response
commit 7a400c6fe3617e31e690e3f7ca37bb335e0498f3 upstream.
usblp_ctrl_msg() collapses the usb_control_msg() return value to
0/-errno, discarding the actual number of bytes transferred. A broken
printer can complete the GET_DEVICE_ID control transfer short and the
driver has no way to know.
usblp_cache_device_id_string() reads the 2-byte big-endian length prefix
from the response and trusts it (clamped only to the buffer bounds).
The buffer is kmalloc(1024) at probe time. A device that sends exactly
two bytes (e.g. 0x03 0xFF, claiming a 1023-byte ID) leaves
device_id_string[2..1022] holding stale kmalloc heap.
That stale data is then exposed:
- via the ieee1284_id sysfs attribute (sprintf("%s", buf+2), truncated
at the first NUL in the stale heap), and
- via the IOCNR_GET_DEVICE_ID ioctl, which copy_to_user()s the full
claimed length regardless of NULs, up to 1021 bytes of uninitialized
heap, with the leak size chosen by the device.
Fix this up by just zapping the buffer with zeros before each request
sent to the device.
Cc: Pete Zaitcev <zaitcev@redhat.com>
Assisted-by: gkh_clanker_t1000
Cc: stable <stable@kernel.org>
Link: https://patch.msgid.link/2026042002-unicorn-greedily-3c63@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Apr 20 18:11:04 2026 +0200
usb: usblp: fix uninitialized heap leak via LPGETSTATUS ioctl
commit b38e53cbfb9d84732e5984fbd73e128d592415c5 upstream.
Just like in a previous problem in this driver, usblp_ctrl_msg() will
collapse the usb_control_msg() return value to 0/-errno, discarding the
actual number of bytes transferred.
Ideally that short command should be detected and error out, but many
printers are known to send "incorrect" responses back so we can't just
do that.
statusbuf is kmalloc(8) at probe time and never filled before the first
LPGETSTATUS ioctl.
usblp_read_status() requests 1 byte. If a malicious printer responds
with zero bytes, *statusbuf is one byte of stale kmalloc heap,
sign-extended into the local int status, which the LPGETSTATUS path then
copy_to_user()s directly to the ioctl caller.
Fix this all by just zapping out the memory buffer when allocated at
probe time. If a later call does a short read, the data will be
identical to what the device sent it the last time, so there is no
"leak" of information happening.
Cc: Pete Zaitcev <zaitcev@redhat.com>
Assisted-by: gkh_clanker_t1000
Cc: stable <stable@kernel.org>
Link: https://patch.msgid.link/2026042011-shredder-savage-48c6@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Date: Tue Dec 9 11:04:59 2025 +0100
wifi: ath5k: do not access array OOB
commit d748603f12baff112caa3ab7d39f50100f010dbd upstream.
Vincent reports:
> The ath5k driver seems to do an array-index-out-of-bounds access as
> shown by the UBSAN kernel message:
> UBSAN: array-index-out-of-bounds in drivers/net/wireless/ath/ath5k/base.c:1741:20
> index 4 is out of range for type 'ieee80211_tx_rate [4]'
> ...
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5d/0x80
> ubsan_epilogue+0x5/0x2b
> __ubsan_handle_out_of_bounds.cold+0x46/0x4b
> ath5k_tasklet_tx+0x4e0/0x560 [ath5k]
> tasklet_action_common+0xb5/0x1c0
It is real. 'ts->ts_final_idx' can be 3 on 5212, so:
info->status.rates[ts->ts_final_idx + 1].idx = -1;
with the array defined as:
struct ieee80211_tx_rate rates[IEEE80211_TX_MAX_RATES];
while the size is:
#define IEEE80211_TX_MAX_RATES 4
is indeed bogus.
Set this 'idx = -1' sentinel only if the array index is less than the
array size. As mac80211 will not look at rates beyond the size
(IEEE80211_TX_MAX_RATES).
Note: The effect of the OOB write is negligible. It just overwrites the
next member of info->status, i.e. ack_signal.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Reported-by: Vincent Danjean <vdanjean@debian.org>
Link: https://lore.kernel.org/all/aQYUkIaT87ccDCin@eldamar.lan
Closes: https://bugs.debian.org/1119093
Fixes: 6d7b97b23e11 ("ath5k: fix tx status reporting issues")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251209100459.2253198-1-jirislaby@kernel.org
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tristan Madani <tristan@talencesecurity.com>
Date: Fri Apr 17 11:11:44 2026 +0000
wifi: b43: enforce bounds check on firmware key index in b43_rx()
commit 1f4f78bf8549e6ac4f04fba4176854f3a6e0c332 upstream.
The firmware-controlled key index in b43_rx() can exceed the dev->key[]
array size (58 entries). The existing B43_WARN_ON is non-enforcing in
production builds, allowing an out-of-bounds read.
Make the B43_WARN_ON check enforcing by dropping the frame when the
firmware returns an invalid key index.
Suggested-by: Jonas Gorski <jonas.gorski@gmail.com>
Acked-by: Michael Büsch <m@bues.ch>
Fixes: e4d6b7951812 ("[B43]: add mac80211-based driver for modern BCM43xx devices")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Link: https://patch.msgid.link/20260417111145.2694196-1-tristmd@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tristan Madani <tristan@talencesecurity.com>
Date: Fri Apr 17 11:11:45 2026 +0000
wifi: b43legacy: enforce bounds check on firmware key index in RX path
commit a035766f970bde2d4298346a31a80685be5c0205 upstream.
Same fix as b43: the firmware-controlled key index in b43legacy_rx()
can exceed dev->max_nr_keys. The existing B43legacy_WARN_ON is
non-enforcing in production builds, allowing an out-of-bounds read of
dev->key[].
Make the check enforcing by dropping the frame for invalid indices.
Fixes: 75388acd0cd8 ("[B43LEGACY]: add mac80211-based driver for legacy BCM43xx devices")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Link: https://patch.msgid.link/20260417111145.2694196-2-tristmd@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Marek Szyprowski <m.szyprowski@samsung.com>
Date: Thu Apr 16 11:33:39 2026 +0200
wifi: brcmfmac: Fix potential use-after-free issue when stopping watchdog task
commit c623b63580880cc742255eaed3d79804c1b91143 upstream.
Watchdog task might end between send_sig() and kthread_stop() calls, what
results in the use-after-free issue. Fix this by increasing watchdog task
reference count before calling send_sig() and dropping it by switching to
kthread_stop_put().
Cc: stable@vger.kernel.org
Fixes: 373c83a801f1 ("brcmfmac: stop watchdog before detach and free everything")
Fixes: a9ffda88be74 ("brcm80211: fmac: abstract bus_stop interface function pointer")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://patch.msgid.link/20260416093339.2066829-1-m.szyprowski@samsung.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Catherine <enderaoelyther@gmail.com>
Date: Fri Apr 24 21:14:36 2026 +0800
wifi: mac80211: drop stray 'static' from fast-RX rx_result
commit 7a5b81e0c87a075afd572f659d8eb68c9c4cd2ba upstream.
ieee80211_invoke_fast_rx() is documented as safe for parallel RX, but
its per-invocation rx_result is declared static. Concurrent callers then
share one instance and can overwrite each other's result between
ieee80211_rx_mesh_data() and the switch on res.
That can make a packet that was queued or consumed by
ieee80211_rx_mesh_data() fall through into ieee80211_rx_8023(), or make
a packet that should continue return as queued.
Make res an automatic variable so each invocation keeps its own result.
Fixes: 3468e1e0c639 ("wifi: mac80211: add mesh fast-rx support")
Cc: stable@vger.kernel.org
Signed-off-by: Catherine <enderaoelyther@gmail.com>
Link: https://patch.msgid.link/20260424131435.83212-2-enderaoelyther@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johannes Berg <johannes.berg@intel.com>
Date: Tue May 5 15:15:34 2026 +0200
wifi: mac80211: remove station if connection prep fails
commit 283fc9e44ff5b5ac967439b4951b80bd4299f4e4 upstream.
If connection preparation fails for MLO connections, then the
interface is completely reset to non-MLD. In this case, we must
not keep the station since it's related to the link of the vif
being removed. Delete an existing station. Any "new_sta" is
already being removed, so that doesn't need changes.
This fixes a use-after-free/double-free in debugfs if that's
enabled, because a vif going from MLD (and to MLD, but that's
not relevant here) recreates its entire debugfs.
Cc: stable@vger.kernel.org
Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260505151533.c4e52deb06ad.Iafe56cec7de8512626169496b134bce3a6c17010@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Benjamin Berg <benjamin.berg@intel.com>
Date: Tue May 5 15:15:40 2026 +0200
wifi: mac80211: use safe list iteration in radar detect work
commit ac8eb3e18f41e2cc8492cc1d358bcb786c850270 upstream.
The call to ieee80211_dfs_cac_cancel can cause the iterated chanctx to
be freed and removed from the list. Guard against this to avoid a
slab-use-after-free error.
Cc: stable@vger.kernel.org
Fixes: bca8bc0399ac ("wifi: mac80211: handle ieee80211_radar_detected() for MLO")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20260505151539.236d63a1b736.I35dbb9e96a2d4a480be208770fdd99ba3b817b79@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Leon Yen <leon.yen@mediatek.com>
Date: Thu Oct 9 10:01:58 2025 +0800
wifi: mt76: mt7921: fix a potential clc buffer length underflow
commit 5373f8b19e568b5c217832b9bbef165bd2b2df14 upstream.
The buf_len is used to limit the iterations for retrieving the country
power setting and may underflow under certain conditions due to changes
in the power table in CLC.
This underflow leads to an almost infinite loop or an invalid power
setting resulting in driver initialization failure.
Cc: stable@vger.kernel.org
Fixes: fa6ad88e023d ("wifi: mt76: mt7921: fix country count limitation for CLC")
Signed-off-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Link: https://patch.msgid.link/20251009020158.1923429-1-mingyen.hsieh@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Quan Zhou <quan.zhou@mediatek.com>
Date: Fri Jan 23 10:16:25 2026 +0800
wifi: mt76: mt7921: fix ROC abort flow interruption in mt7921_roc_work
commit fdfa39f9f4fbae532b162da913a67b2410caf38f upstream.
The mt7921_set_roc API may be executed concurrently with mt7921_roc_work,
specifically between the following code paths:
- The check and clear of MT76_STATE_ROC in mt7921_roc_work:
if (!test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
return;
- The execution of ieee80211_iterate_active_interfaces.
This race condition can interrupt the ROC abort flow, resulting in
the ROC process failing to abort as expected.
To address this defect, the modification of MT76_STATE_ROC is now
protected by mt792x_mutex_acquire(phy->dev). This ensures that
changes to the ROC state are properly synchronized, preventing
race conditions and ensuring the ROC abort flow is not interrupted.
Fixes: 034ae28b56f1 ("wifi: mt76: mt7921: introduce remain_on_channel support")
Cc: stable@vger.kernel.org
Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
Reviewed-by: Sean Wang <sean.wang@mediatek.com>
Link: https://patch.msgid.link/2568ece8b557e5dda79391414c834ef3233049b6.1769133724.git.quan.zhou@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Quan Zhou <quan.zhou@mediatek.com>
Date: Thu Nov 27 15:49:11 2025 +0800
wifi: mt76: mt7925: fix AMPDU state handling in mt7925_tx_check_aggr
commit bb8e38fcdbf7290d7f0cd572d2d8fdb2b641b492 upstream.
Previously, the AMPDU state bit for a given TID was set before attempting
to start a BA session, which could result in the AMPDU state being marked
active even if ieee80211_start_tx_ba_session() failed. This patch changes
the logic to only set the AMPDU state bit after successfully starting a BA
session, ensuring proper synchronization between AMPDU state and BA session
status.
This fixes potential issues with aggregation state tracking and improves
compatibility with mac80211 BA session management.
Fixes: 44eb173bdd4f ("wifi: mt76: mt7925: add link handling in mt7925_txwi_free")
Cc: stable@vger.kernel.org
Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
Reviewed-by: Sean Wang <sean.wang@mediatek.com>
Link: https://patch.msgid.link/d5960fbced0beaf33c30203f7f8fb91d0899c87b.1764228973.git.quan.zhou@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Date: Mon Sep 8 15:25:26 2025 +0800
wifi: mt76: mt7925: fix incorrect length field in txpower command
commit ccb186326bb6b7f20d77982f855568e7087ad0d7 upstream.
Set `tx_power_tlv->len` to `msg_len` instead of `sizeof(*tx_power_tlv)`
to ensure the correct message length is sent to firmware.
Cc: stable@vger.kernel.org
Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips")
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Link: https://patch.msgid.link/20250908072526.1833938-1-mingyen.hsieh@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Quan Zhou <quan.zhou@mediatek.com>
Date: Wed Feb 25 17:47:22 2026 +0800
wifi: mt76: mt7925: fix incorrect TLV length in CLC command
commit 62e037aa8cf5a69b7ea63336705a35c897b9db2b upstream.
The previous implementation of __mt7925_mcu_set_clc() set the TLV length
field (.len) incorrectly during CLC command construction. The length was
initialized as sizeof(req) - 4, regardless of the actual segment length.
This could cause the WiFi firmware to misinterpret the command payload,
resulting in command execution errors.
This patch moves the TLV length assignment to after the segment is
selected, and sets .len to sizeof(req) + seg->len - 4, matching the
actual command content. This ensures the firmware receives the
correct TLV length and parses the command properly.
Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips")
Cc: stable@vger.kernel.org
Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
Acked-by: Sean Wang <sean.wang@mediatek.com>
Link: https://patch.msgid.link/f56ae0e705774dfa8aab3b99e5bbdc92cd93523e.1772011204.git.quan.zhou@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jeongjun Park <aha310510@gmail.com>
Date: Thu Apr 23 02:38:46 2026 +0900
wifi: rsi: fix kthread lifetime race between self-exit and external-stop
commit db57a1aa54ff68669781976e4edb045e09e2b65b upstream.
RSI driver use both self-exit(kthread_complete_and_exit) and external-stop
(kthread_stop) when killing a kthread. Generally, kthread_stop() is called
first, and in this case, no particular issues occur.
However, in rare instances where kthread_complete_and_exit() is called
first and then kthread_stop() is called, a UAF occurs because the kthread
object, which has already exited and been freed, is accessed again.
Therefore, to prevent this with minimal modification, you must remove
kthread_stop() and change the code to wait until the self-exit operation
is completed.
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+5de83f57cd8531f55596@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69e5d03b.a00a0220.1bd0ca.0064.GAE@google.com/
Fixes: 4c62764d0fc2 ("rsi: improve kernel thread handling to fix kernel panic")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Link: https://patch.msgid.link/20260422173846.37640-1-aha310510@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Gow <david@davidgow.net>
Date: Thu Apr 16 14:57:43 2026 +0800
x86/boot/e820: Re-enable BIOS fallback if e820 table is empty
commit 5772f6535227ebd104065d80afa8ed3478d34c5c upstream.
In commit:
157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables")
the check on the number of entries in the e820 table was removed. The intention
was to support single-entry maps, but by removing the check entirely, we also
skip the fallback (to, e.g., the BIOS 88h function).
This means that if no E820 map is passed in from the bootloader (which is the
case on some bootloaders, like linld), we end up with an empty memory map, and
the kernel fails to boot (either by deadlocking on OOM, or by failing to
allocate the real mode trampoline, or similar).
Re-instate the check in append_e820_table(), but only check that nr_entries is
non-zero. This allows e820__memory_setup_default() to fall back to other memory
size sources, and doesn't affect e820__memory_setup_extended(), as the latter
ignores the return value from append_e820_table().
In doing so, we also update the return values to be proper error codes, with
-ENOENT for this case (there are no entries), and -EINVAL for the case where an
entry appears invalid. Given none of the callers check the actual value -- just
whether it's nonzero -- this is largely aesthetic in practice.
Tested against linld, and the kernel boots again fine.
[ mingo: Readability edits to the comment and the changelog. ]
Fixes: 157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables")
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://patch.msgid.link/20260416065746.1896647-1-david@davidgow.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Prathyushi Nangia <prathyushi.nangia@amd.com>
Date: Tue Dec 9 10:01:33 2025 -0600
x86/CPU/AMD: Prevent improper isolation of shared resources in Zen2's op cache
commit c21b90f77687075115d989e53a8ec5e2bb427ab1 upstream.
Make sure resources are not improperly shared in the op cache and
cause instruction corruption this way.
Signed-off-by: Prathyushi Nangia <prathyushi.nangia@amd.com>
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ivan Hu <ivan.hu@canonical.com>
Date: Thu Apr 30 15:41:07 2026 +0800
x86/efi: Fix graceful fault handling after FPU softirq changes
commit 088f65e206087bf903743bd18417261d7a4c9644 upstream.
Since commit d02198550423 ("x86/fpu: Improve crypto performance by
making kernel-mode FPU reliably usable in softirqs"), kernel_fpu_begin()
calls fpregs_lock() which uses local_bh_disable() instead of the
previous preempt_disable(). This sets SOFTIRQ_OFFSET in preempt_count
during the entire EFI runtime service call, causing in_interrupt() to
return true in normal task context.
The graceful page fault handler efi_crash_gracefully_on_page_fault()
uses in_interrupt() to bail out for faults in real interrupt context.
With SOFTIRQ_OFFSET now set, the handler always bails out, leaving EFI
firmware page faults unhandled. This escalates to die() which also sees
in_interrupt() as true and calls panic("Fatal exception in interrupt"),
resulting in a hard system freeze. On systems with buggy firmware that
triggers page faults during EFI runtime calls (e.g., accessing unmapped
memory in GetTime()), this causes an unrecoverable hang instead of the
expected graceful EFI_ABORTED recovery.
Fix by replacing in_interrupt() with !in_task(). This preserves the
original intent of bailing for interrupts or NMI faults, while no longer
falsely triggering from the FPU code path's local_bh_disable().
Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ivan Hu <ivan.hu@canonical.com>
[ardb: Sashiko spotted that using 'in_hardirq() || in_nmi()' leaves a
window where a softirq may be taken before fpregs_lock() is
called, but after efi_rts_work.efi_rts_id has been assigned,
and any page faults occurring in that window will then be
misidentified as having been caused by the firmware. Instead,
use !in_task(), which incorporates in_serving_softirq(). ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ard Biesheuvel <ardb@kernel.org>
Date: Fri May 1 09:16:38 2026 +0200
x86/efi: Restore IRQ state in EFI page fault handler
commit 2c340aab5485ebe9e33c01437dd4815ef33c8df5 upstream.
The kernel's softirq API does not permit re-enabling softirqs while IRQs
are disabled. The reason for this is that local_bh_enable() will not
only re-enable delivery of softirqs over the back of IRQs, it will also
handle any pending softirqs immediately, regardless of whether IRQs are
enabled at that point.
For this reason, commit
d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
disables softirqs only when IRQs are enabled, as it is not permitted
otherwise, but also unnecessary, given that asynchronous softirq
delivery never happens to begin with while IRQs are disabled.
However, this does mean that entering a kernel mode FPU section with
IRQs enabled and leaving it with IRQs disabled leads to problems, as
identified by Sashiko [0]: the EFI page fault handler is called from
page_fault_oops() with IRQs disabled, and thus ends the kernel mode FPU
section with IRQs disabled as well, regardless of whether IRQs were
enabled when it was started. This may result in schedule() being called
with a non-zero preempt_count, causing a BUG().
So take care to re-enable IRQs when handling any EFI page faults if they
were taken with IRQs enabled.
[0] https://sashiko.dev/#/patchset/20260430074107.27051-1-ivan.hu%40canonical.com
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ivan Hu <ivan.hu@canonical.com>
Cc: x86@kernel.org
Cc: <stable@vger.kernel.org>
Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michael Bommarito <michael.bommarito@gmail.com>
Date: Sun Apr 19 18:35:42 2026 -0400
xfrm: ah: account for ESN high bits in async callbacks
commit ec54093e6a8f87e800bb6aa15eb7fc1e33faa524 upstream.
AH allocates its temporary auth/ICV layout differently when ESN is enabled:
the async ahash setup appends a 4-byte seqhi slot before the ICV or
auth_data area, but the async completion callbacks still reconstruct the
temporary layout as if seqhi were absent.
With an async AH implementation selected, that makes AH copy or compare
the wrong bytes on both the IPv4 and IPv6 paths. In UML repro on IPv4 AH
with ESN and forced async hmac(sha1), ping fails with 100% packet loss,
and the callback logs show the pre-fix drift:
ah4 output_done: esn=1 err=0 icv_off=20 expected_off=24
ah4 input_done: esn=1 auth_off=20 expected_auth_off=24 icv_off=32 expected_icv_off=36
Reconstruct the callback-side layout the same way the setup path built it
by skipping the ESN seqhi slot before locating the saved auth_data or ICV.
Per RFC 4302, the ESN high-order 32 bits participate in the AH ICV
computation, so the async callbacks must account for the seqhi slot.
Post-fix, the same IPv4 AH+ESN+forced-async-hmac(sha1) UML repro shows
the corrected offset (ah4 output_done: esn=1 err=0 icv_off=24
expected_off=24) and ping succeeds; net/ipv4/ah4.o and net/ipv6/ah6.o
build clean at W=1. IPv6 AH+ESN was not exercised at runtime, and the
change has not been tested against a real async hardware AH engine.
Fixes: d4d573d0334d ("{IPv4,xfrm} Add ESN support for AH egress part")
Fixes: d8b2a8600b0e ("{IPv4,xfrm} Add ESN support for AH ingress part")
Fixes: 26dd70c3fad3 ("{IPv6,xfrm} Add ESN support for AH egress part")
Fixes: 8d6da6f32557 ("{IPv6,xfrm} Add ESN support for AH ingress part")
Cc: stable@vger.kernel.org
Assisted-by: Codex:gpt-5-4
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Michal Kosiorek <mkosiorek121@gmail.com>
Date: Wed Apr 29 10:54:51 2026 +0200
xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
commit 14acf9652e5690de3c7486c6db5fb8dafd0a32a3 upstream.
KASAN reproduces a slab-use-after-free in __xfrm_state_delete()'s
hlist_del_rcu calls under syzkaller load on linux-6.12.y stable
(reproduced on 6.12.47, also reachable via the same code path on
torvalds/master and on the ipsec tree). Nine unique signatures cluster
in the xfrm_state lifecycle, the load-bearing one being:
BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_rcu include/linux/rculist.h:516 [inline]
BUG: KASAN: slab-use-after-free in __xfrm_state_delete net/xfrm/xfrm_state.c
Write of size 8 at addr ffff8881198bcb70 by task kworker/u8:9/435
Workqueue: netns cleanup_net
Call Trace:
__hlist_del / hlist_del_rcu
__xfrm_state_delete
xfrm_state_delete
xfrm_state_flush
xfrm_state_fini
ops_exit_list
cleanup_net
The other observed signatures hit the same slab object from
__xfrm_state_lookup, xfrm_alloc_spi, __xfrm_state_insert and an OOB
write variant of __xfrm_state_delete, all on the byseq/byspi
hash chains.
__xfrm_state_delete() guards its byseq and byspi unhashes with
value-based predicates:
if (x->km.seq)
hlist_del_rcu(&x->byseq);
if (x->id.spi)
hlist_del_rcu(&x->byspi);
while everywhere else in the file (e.g. state_cache, state_cache_input)
the safer hlist_unhashed() check is used. xfrm_alloc_spi() sets
x->id.spi = newspi inside xfrm_state_lock and then immediately inserts
into byspi, but a path that observes x->id.spi != 0 outside of
xfrm_state_lock can still skip-or-hit the byspi unhash inconsistently
with whether x is actually on the list. The same holds for x->km.seq
versus byseq, and the bydst/bysrc unhashes have no predicate at all,
so a second __xfrm_state_delete() on the same object writes through
LIST_POISON pprev.
The defensive change here:
- Use hlist_del_init_rcu() instead of hlist_del_rcu() on bydst,
bysrc, byseq and byspi so a second deletion is a no-op rather
than a write through LIST_POISON pprev. The byseq/byspi nodes
are already initialised in xfrm_state_alloc().
- Test hlist_unhashed() rather than the value predicate for
byseq/byspi, so the unhash decision tracks list state rather than
mutable scalar fields.
Empirical verification: applied this patch on top of v6.12.47, rebuilt,
and re-ran the same syzkaller harness for 1h16m on a previously-crashy
configuration that produced ~100 hits each of slab-use-after-free
Read in xfrm_alloc_spi / Read in __xfrm_state_lookup / Write in
__xfrm_state_delete. After the patch, 7.1M execs across 32 VMs at
~1550 exec/sec produced zero xfrm_state UAF/OOB hits. /proc/slabinfo
confirms the xfrm_state slab is actively allocated and freed during
the run (~143 KiB resident), so the fuzzer is still exercising those
code paths -- they just no longer crash.
Reproduction:
- Linux 6.12.47 x86_64 + KASAN_GENERIC + KASAN_INLINE + KCOV
- syzkaller @ 746545b8b1e4c3a128db8652b340d3df90ce61db
- 32 QEMU/KVM VMs x 2 vCPU on AWS c5.metal bare metal
- 9 unique signatures collected in ~9h, all within xfrm_state
lifecycle
Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
Fixes: 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI is zero to the SPI hash.")
Reported-by: Michal Kosiorek <mkosiorek121@gmail.com>
Tested-by: Michal Kosiorek <mkosiorek121@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Michal Kosiorek <mkosiorek121@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ruijie Li <ruijieli51@gmail.com>
Date: Wed Apr 29 00:41:43 2026 +0800
xfrm: provide message size for XFRM_MSG_MAPPING
commit 28465227c80fe417b4013c432be1f3737cb9f9a3 upstream.
The compat 64=>32 translation path handles XFRM_MSG_MAPPING, but
xfrm_msg_min[] does not provide the native payload size for this
message type.
Add the missing XFRM_MSG_MAPPING entry so compat translation can size
and translate mapping notifications correctly.
Fixes: 5461fc0c8d9f ("xfrm/compat: Add 64=>32-bit messages translator")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Ruijie Li <ruijieli51@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>