Author: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Date: Sat Aug 30 08:37:49 2025 +0900
ALSA: firewire-motu: drop EPOLLOUT from poll return values as write is not supported
[ Upstream commit aea3493246c474bc917d124d6fb627663ab6bef0 ]
The ALSA HwDep character device of the firewire-motu driver incorrectly
returns EPOLLOUT in poll(2), even though the driver implements no operation
for write(2). This misleads userspace applications to believe write() is
allowed, potentially resulting in unnecessarily wakeups.
This issue dates back to the driver's initial code added by a commit
71c3797779d3 ("ALSA: firewire-motu: add hwdep interface"), and persisted
when POLLOUT was updated to EPOLLOUT by a commit a9a08845e9ac ('vfs: do
bulk POLL* -> EPOLL* replacement("").').
This commit fixes the bug.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://patch.msgid.link/20250829233749.366222-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Praful Adiga <praful.adiga@gmail.com>
Date: Thu Sep 18 12:40:18 2025 -0400
ALSA: hda/realtek: Fix mute led for HP Laptop 15-dw4xx
commit d33c3471047fc54966621d19329e6a23ebc8ec50 upstream.
This laptop uses the ALC236 codec with COEF 0x7 and idx 1 to
control the mute LED. Enable the existing quirk for this device.
Signed-off-by: Praful Adiga <praful.adiga@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Takashi Iwai <tiwai@suse.de>
Date: Wed Sep 17 15:09:01 2025 +0200
ALSA: usb: qcom: Fix false-positive address space check
[ Upstream commit 44499ecb4f2817743c37d861bdb3e95f37d3d9cd ]
The sanity check previously added to uaudio_transfer_buffer_setup()
assumed the allocated buffer being linear-mapped. But the buffer
allocated via usb_alloc_coherent() isn't always so, rather to be used
with (SG-)DMA API. This leaded to a false-positive warning and the
driver failed to work.
Actually uaudio_transfer_buffer_setup() deals only with the DMA-API
addresses for MEM_XFER_BUF type, while other callers of
uaudio_iommu_map() are with pages with physical addresses for
MEM_EVENT_RING and MEM_XFER_RING types. So this patch splits the
mapping helper function to two different ones, uaudio_iommu_map() for
the DMA pages and uaudio_iommu_map_pa() for the latter, in order to
handle mapping differently for each type. Along with it, the
unnecessary address check that caused probe error is dropped, too.
Fixes: 3335a1bbd624 ("ALSA: qc_audio_offload: try to reduce address space confusion")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reported-and-tested-by: Luca Weiss <luca.weiss@fairphone.com>
Closes: https://lore.kernel.org/DBR2363A95M1.L9XBNC003490@fairphone.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Venkata Prasad Potturu <venkataprasad.potturu@amd.com>
Date: Wed Sep 10 22:43:59 2025 +0530
ASoC: amd: acp: Fix incorrect retrival of acp_chip_info
[ Upstream commit d7871f400cad1da376f1d7724209a1c49226c456 ]
Use dev_get_drvdata(dev->parent) instead of dev_get_platdata(dev)
to correctly obtain acp_chip_info members in the acp I2S driver.
Previously, some members were not updated properly due to incorrect
data access, which could potentially lead to null pointer
dereferences.
This issue was missed in the earlier commit
("ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot"),
which only addressed set_tdm_slot(). This change ensures that all
relevant functions correctly retrieve acp_chip_info, preventing
further null pointer dereference issues.
Fixes: e3933683b25e ("ASoC: amd: acp: Remove redundant acp_dev_data structure")
Signed-off-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20250910171419.3682468-1-venkataprasad.potturu@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Dan Carpenter <dan.carpenter@linaro.org>
Date: Fri Aug 29 15:57:34 2025 +0300
ASoC: codec: sma1307: Fix memory corruption in sma1307_setting_loaded()
[ Upstream commit 78338108b5a856dc98223a335f147846a8a18c51 ]
The sma1307->set.header_size is how many integers are in the header
(there are 8 of them) but instead of allocating space of 8 integers
we allocate 8 bytes. This leads to memory corruption when we copy data
it on the next line:
memcpy(sma1307->set.header, data,
sma1307->set.header_size * sizeof(int));
Also since we're immediately copying over the memory in ->set.header,
there is no need to zero it in the allocator. Use devm_kmalloc_array()
to allocate the memory instead.
Fixes: 576c57e6b4c1 ("ASoC: sma1307: Add driver for Iron Device SMA1307")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aLGjvjpueVstekXP@stanley.mountain
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Date: Tue Sep 9 11:28:29 2025 +0200
ASoC: Intel: catpt: Expose correct bit depth to userspace
[ Upstream commit 690aa09b1845c0d5c3c29dabd50a9d0488c97c48 ]
Currently wrong bit depth is exposed in hw params, causing clipped
volume during playback. Expose correct parameters.
Fixes: a126750fc865 ("ASoC: Intel: catpt: PCM operations")
Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Message-ID: <20250909092829.375953-1-amadeuszx.slawinski@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>
Date: Mon Sep 8 11:06:29 2025 +0530
ASoC: qcom: audioreach: Fix lpaif_type configuration for the I2S interface
commit 5f1af203ef964e7f7bf9d32716dfa5f332cc6f09 upstream.
Fix missing lpaif_type configuration for the I2S interface.
The proper lpaif interface type required to allow DSP to vote
appropriate clock setting for I2S interface.
Fixes: 25ab80db6b133 ("ASoC: qdsp6: audioreach: add module configuration command helpers")
Cc: stable@vger.kernel.org
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>
Message-ID: <20250908053631.70978-2-mohammad.rafi.shaik@oss.qualcomm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>
Date: Mon Sep 8 11:06:30 2025 +0530
ASoC: qcom: q6apm-lpass-dais: Fix missing set_fmt DAI op for I2S
commit 33b55b94bca904ca25a9585e3cd43d15f0467969 upstream.
The q6i2s_set_fmt() function was defined but never linked into the
I2S DAI operations, resulting DAI format settings is being ignored
during stream setup. This change fixes the issue by properly linking
the .set_fmt handler within the DAI ops.
Fixes: 30ad723b93ade ("ASoC: qdsp6: audioreach: add q6apm lpass dai support")
Cc: stable@vger.kernel.org
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>
Message-ID: <20250908053631.70978-3-mohammad.rafi.shaik@oss.qualcomm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date: Thu Sep 4 12:18:50 2025 +0200
ASoC: qcom: q6apm-lpass-dais: Fix NULL pointer dereference if source graph failed
commit 68f27f7c7708183e7873c585ded2f1b057ac5b97 upstream.
If earlier opening of source graph fails (e.g. ADSP rejects due to
incorrect audioreach topology), the graph is closed and
"dai_data->graph[dai->id]" is assigned NULL. Preparing the DAI for sink
graph continues though and next call to q6apm_lpass_dai_prepare()
receives dai_data->graph[dai->id]=NULL leading to NULL pointer
exception:
qcom-apm gprsvc:service:2:1: Error (1) Processing 0x01001002 cmd
qcom-apm gprsvc:service:2:1: DSP returned error[1001002] 1
q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: fail to start APM port 78
q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: ASoC: error at snd_soc_pcm_dai_prepare on TX_CODEC_DMA_TX_3: -22
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
...
Call trace:
q6apm_graph_media_format_pcm+0x48/0x120 (P)
q6apm_lpass_dai_prepare+0x110/0x1b4
snd_soc_pcm_dai_prepare+0x74/0x108
__soc_pcm_prepare+0x44/0x160
dpcm_be_dai_prepare+0x124/0x1c0
Fixes: 30ad723b93ad ("ASoC: qdsp6: audioreach: add q6apm lpass dai support")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Message-ID: <20250904101849.121503-2-krzysztof.kozlowski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Maciej Strozek <mstrozek@opensource.cirrus.com>
Date: Mon Sep 1 16:15:07 2025 +0100
ASoC: SDCA: Add quirk for incorrect function types for 3 systems
commit 28edfaa10ca1b370b1a27fde632000d35c43402c upstream.
Certain systems have CS42L43 DisCo that claims to conform to version 0.6.28
but uses the function types from the 1.0 spec. Add a quirk as a workaround.
Closes: https://github.com/thesofproject/linux/issues/5515
Cc: stable@vger.kernel.org
Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev>
Link: https://patch.msgid.link/20250901151518.3197941-1-mstrozek@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Charles Keepax <ckeepax@opensource.cirrus.com>
Date: Wed Aug 20 17:37:15 2025 +0100
ASoC: SDCA: Fix return value in sdca_regmap_mbq_size()
[ Upstream commit f81e63047600d023cbfda372b6de8f2821ff6839 ]
The MBQ size function returns an integer representing the size of a
Control. Currently if the Control is not found the function will return
false which makes little sense. Correct this typo to return -EINVAL.
Fixes: e3f7caf74b79 ("ASoC: SDCA: Add generic regmap SDCA helpers")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Message-ID: <20250820163717.1095846-2-ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Colin Ian King <colin.i.king@gmail.com>
Date: Tue Sep 2 13:06:39 2025 +0100
ASoC: SOF: Intel: hda-stream: Fix incorrect variable used in error message
[ Upstream commit 35fc531a59694f24a2456569cf7d1a9c6436841c ]
The dev_err message is reporting an error about capture streams however
it is using the incorrect variable num_playback instead of num_capture.
Fix this by using the correct variable num_capture.
Fixes: a1d1e266b445 ("ASoC: SOF: Intel: Add Intel specific HDA stream operations")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20250902120639.2626861-1-colin.i.king@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Charles Keepax <ckeepax@opensource.cirrus.com>
Date: Thu Aug 21 09:26:37 2025 +0100
ASoC: wm8940: Correct PLL rate rounding
[ Upstream commit d05afb53c683ef7ed1228b593c3360f4d3126c58 ]
Using a single value of 22500000 for both 48000Hz and 44100Hz audio
will sometimes result in returning wrong dividers due to rounding.
Update the code to use the actual value for both.
Fixes: 294833fc9eb4 ("ASoC: wm8940: Rewrite code to set proper clocks")
Reported-by: Ankur Tyagi <ankur.tyagi85@gmail.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Tested-by: Ankur Tyagi <ankur.tyagi85@gmail.com>
Link: https://patch.msgid.link/20250821082639.1301453-2-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Charles Keepax <ckeepax@opensource.cirrus.com>
Date: Thu Aug 21 09:26:38 2025 +0100
ASoC: wm8940: Correct typo in control name
[ Upstream commit b4799520dcd6fe1e14495cecbbe9975d847cd482 ]
Fixes: 0b5e92c5e020 ("ASoC WM8940 Driver")
Reported-by: Ankur Tyagi <ankur.tyagi85@gmail.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Tested-by: Ankur Tyagi <ankur.tyagi85@gmail.com>
Link: https://patch.msgid.link/20250821082639.1301453-3-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Charles Keepax <ckeepax@opensource.cirrus.com>
Date: Thu Aug 21 09:26:39 2025 +0100
ASoC: wm8974: Correct PLL rate rounding
[ Upstream commit 9b17d3724df55ecc2bc67978822585f2b023be48 ]
Using a single value of 22500000 for both 48000Hz and 44100Hz audio
will sometimes result in returning wrong dividers due to rounding.
Update the code to use the actual value for both.
Fixes: 51b2bb3f2568 ("ASoC: wm8974: configure pll and mclk divider automatically")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250821082639.1301453-4-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hangbin Liu <liuhangbin@gmail.com>
Date: Tue Sep 16 08:01:26 2025 +0000
bonding: don't set oif to bond dev when getting NS target destination
[ Upstream commit a8ba87f04ca9cdec06776ce92dce1395026dc3bb ]
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder@us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv@jvosburgh.net>
Tested-by: David Wilder <wilder@us.ibm.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hangbin Liu <liuhangbin@gmail.com>
Date: Wed Sep 10 02:43:34 2025 +0000
bonding: set random address only when slaves already exist
[ Upstream commit 35ae4e86292ef7dfe4edbb9942955c884e984352 ]
After commit 5c3bf6cba791 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.
Correct this by additionally testing for the bond being empty before
randomizing the MAC.
Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren <qren@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Filipe Manana <fdmanana@suse.com>
Date: Wed Sep 3 16:53:21 2025 +0100
btrfs: fix invalid extref key setup when replaying dentry
[ Upstream commit b62fd63ade7cb573b114972ef8f9fa505be8d74a ]
The offset for an extref item's key is not the object ID of the parent
dir, otherwise we would not need the extref item and would use plain ref
items. Instead the offset is the result of a hash computation that uses
the object ID of the parent dir and the name associated to the entry.
So fix this by setting the key offset at replay_one_name() to be the
result of calling btrfs_extref_hash().
Fixes: 725af92a6251 ("btrfs: Open-code name_in_log_ref in replay_one_name")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: austinchang <austinchang@synology.com>
Date: Thu Sep 11 06:06:29 2025 +0000
btrfs: initialize inode::file_extent_tree after i_mode has been set
commit 8679d2687c351824d08cf1f0e86f3b65f22a00fe upstream.
btrfs_init_file_extent_tree() uses S_ISREG() to determine if the file is
a regular file. In the beginning of btrfs_read_locked_inode(), the i_mode
hasn't been read from inode item, then file_extent_tree won't be used at
all in volumes without NO_HOLES.
Fix this by calling btrfs_init_file_extent_tree() after i_mode is
initialized in btrfs_read_locked_inode().
Fixes: 3d7db6e8bd22e6 ("btrfs: don't allocate file extent tree for non regular files")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: austinchang <austinchang@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Qu Wenruo <wqu@suse.com>
Date: Tue Sep 16 07:54:06 2025 +0930
btrfs: tree-checker: fix the incorrect inode ref size check
commit 96fa515e70f3e4b98685ef8cac9d737fc62f10e1 upstream.
[BUG]
Inside check_inode_ref(), we need to make sure every structure,
including the btrfs_inode_extref header, is covered by the item. But
our code is incorrectly using "sizeof(iref)", where @iref is just a
pointer.
This means "sizeof(iref)" will always be "sizeof(void *)", which is much
smaller than "sizeof(struct btrfs_inode_extref)".
This will allow some bad inode extrefs to sneak in, defeating tree-checker.
[FIX]
Fix the typo by calling "sizeof(*iref)", which is the same as
"sizeof(struct btrfs_inode_extref)", and will be the correct behavior we
want.
Fixes: 71bf92a9b877 ("btrfs: tree-checker: Add check for INODE_REF")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Date: Fri Sep 5 15:54:43 2025 +0200
btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()
[ Upstream commit 5b8d2964754102323ca24495ba94892426284e3a ]
When moving a block-group to the dedicated data relocation space-info in
btrfs_zoned_reserve_data_reloc_bg() it is asserted that the newly
created block group for data relocation does not contain any
zone_unusable bytes.
But on disks with zone_capacity < zone_size, the difference between
zone_size and zone_capacity is accounted as zone_unusable.
Instead of asserting that the block-group does not contain any
zone_unusable bytes, remove them from the block-groups total size.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs8-cS2E+-xQ-d2Bj6vMJZ+CwT_cbdWBTju4BV35LsvEYw@mail.gmail.com/
Fixes: daa0fde322350 ("btrfs: zoned: fix data relocation block group reservation")
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Chen Ridong <chenridong@huawei.com>
Date: Tue Aug 19 01:07:24 2025 +0000
cgroup: split cgroup_destroy_wq into 3 workqueues
[ Upstream commit 79f919a89c9d06816dbdbbd168fa41d27411a7f9 ]
A hung task can occur during [1] LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.
Related case:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio
Call Trace:
cgroup_lock_and_drain_offline+0x14c/0x1e8
cgroup_destroy_root+0x3c/0x2c0
css_free_rwork_fn+0x248/0x338
process_one_work+0x16c/0x3b8
worker_thread+0x22c/0x3b0
kthread+0xec/0x100
ret_from_fork+0x10/0x20
Root Cause:
CPU0 CPU1
mount perf_event umount net_prio
cgroup1_get_tree cgroup_kill_sb
rebind_subsystems // root destruction enqueues
// cgroup_destroy_wq
// kill all perf_event css
// one perf_event css A is dying
// css A offline enqueues cgroup_destroy_wq
// root destruction will be executed first
css_free_rwork_fn
cgroup_destroy_root
cgroup_lock_and_drain_offline
// some perf descendants are dying
// cgroup_destroy_wq max_active = 1
// waiting for css A to die
Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event CSS gets queued for offline after root destruction
4. Root destruction waits for offline completion, but offline work is
blocked behind root destruction in cgroup_destroy_wq (max_active=1)
Solution:
Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation
This separation eliminates blocking in the CSS free path while waiting for
offline operations to complete.
[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Suggested-by: Teju Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Chen-Yu Tsai <wens@csie.org>
Date: Sun Aug 31 01:08:56 2025 +0800
clk: sunxi-ng: mp: Fix dual-divider clock rate readback
commit 25fbbaf515acd13399589bd5ee6de5f35740cef2 upstream.
When dual-divider clock support was introduced, the P divider offset was
left out of the .recalc_rate readback function. This causes the clock
rate to become bogus or even zero (possibly due to the P divider being
1, leading to a divide-by-zero).
Fix this by incorporating the P divider offset into the calculation.
Fixes: 45717804b75e ("clk: sunxi-ng: mp: introduce dual-divider clock")
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://patch.msgid.link/20250830170901.1996227-4-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Duoming Zhou <duoming@zju.edu.cn>
Date: Wed Sep 17 13:46:02 2025 +0800
cnic: Fix use-after-free bugs in cnic_delete_task
[ Upstream commit cfa7d9b1e3a8604afc84e9e51d789c29574fb216 ]
The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempt to
dereference cnic_dev in cnic_delete_task().
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
cnic_netdev_event() |
cnic_stop_hw() | cnic_delete_task()
cnic_cm_stop_bnx2x_hw() | ...
cancel_delayed_work() | /* the queue_delayed_work()
flush_workqueue() | executes after flush_workqueue()*/
| queue_delayed_work()
cnic_free_dev(dev)//free | cnic_delete_task() //new instance
| dev = cp->dev; //use
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.
This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays — such as inserting calls to ssleep()
within the cnic_delete_task() function — to increase the likelihood
of triggering the bug.
Fixes: fdf24086f475 ("cnic: Defer iscsi connection cleanup")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Sep 16 17:20:59 2025 +0800
crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg
commit 1b34cbbf4f011a121ef7b2d7d6e6920a036d5285 upstream.
Issuing two writes to the same af_alg socket is bogus as the
data will be interleaved in an unpredictable fashion. Furthermore,
concurrent writes may create inconsistencies in the internal
socket state.
Disallow this by adding a new ctx->write field that indiciates
exclusive ownership for writing.
Fixes: 8ff590903d5 ("crypto: algif_skcipher - User-space interface for skcipher operations")
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Sep 16 15:42:41 2025 +0800
crypto: af_alg - Set merge to zero early in af_alg_sendmsg
[ Upstream commit 9574b2330dbd2b5459b74d3b5e9619d39299fc6f ]
If an error causes af_alg_sendmsg to abort, ctx->merge may contain
a garbage value from the previous loop. This may then trigger a
crash on the next entry into af_alg_sendmsg when it attempts to do
a merge that can't be done.
Fix this by setting ctx->merge to zero near the start of the loop.
Fixes: 8ff590903d5 ("crypto: algif_skcipher - User-space interface for skcipher operations")
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Borislav Petkov (AMD) <bp@alien8.de>
Date: Sat Sep 6 14:21:45 2025 +0200
crypto: ccp - Always pass in an error pointer to __sev_platform_shutdown_locked()
commit 46834d90a9a13549264b9581067d8f746b4b36cc upstream.
When
9770b428b1a2 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
moved the error messages dumping so that they don't need to be issued by
the callers, it missed the case where __sev_firmware_shutdown() calls
__sev_platform_shutdown_locked() with a NULL argument which leads to
a NULL ptr deref on the shutdown path, during suspend to disk:
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 983 Comm: hib.sh Not tainted 6.17.0-rc4+ #1 PREEMPT(voluntary)
Hardware name: Supermicro Super Server/H12SSL-i, BIOS 2.5 09/08/2022
RIP: 0010:__sev_platform_shutdown_locked.cold+0x0/0x21 [ccp]
That rIP is:
00000000000006fd <__sev_platform_shutdown_locked.cold>:
6fd: 8b 13 mov (%rbx),%edx
6ff: 48 8b 7d 00 mov 0x0(%rbp),%rdi
703: 89 c1 mov %eax,%ecx
Code: 74 05 31 ff 41 89 3f 49 8b 3e 89 ea 48 c7 c6 a0 8e 54 a0 41 bf 92 ff ff ff e8 e5 2e 09 e1 c6 05 2a d4 38 00 01 e9 26 af ff ff <8b> 13 48 8b 7d 00 89 c1 48 c7 c6 18 90 54 a0 89 44 24 04 e8 c1 2e
RSP: 0018:ffffc90005467d00 EFLAGS: 00010282
RAX: 00000000ffffff92 RBX: 0000000000000000 RCX: 0000000000000000
^^^^^^^^^^^^^^^^
and %rbx is nice and clean.
Call Trace:
<TASK>
__sev_firmware_shutdown.isra.0
sev_dev_destroy
psp_dev_destroy
sp_destroy
pci_device_shutdown
device_shutdown
kernel_power_off
hibernate.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
entry_SYSCALL_64_after_hwframe
Pass in a pointer to the function-local error var in the caller.
With that addressed, suspending the ccp shows the error properly at
least:
ccp 0000:47:00.1: sev command 0x2 timed out, disabling PSP
ccp 0000:47:00.1: SEV: failed to SHUTDOWN error 0x0, rc -110
SEV-SNP: Leaking PFN range 0x146800-0x146a00
SEV-SNP: PFN 0x146800 unassigned, dumping non-zero entries in 2M PFN region: [0x146800 - 0x146a00]
...
ccp 0000:47:00.1: SEV-SNP firmware shutdown failed, rc -16, error 0x0
ACPI: PM: Preparing to enter system sleep state S5
kvm: exiting hardware virtualization
reboot: Power down
Btw, this driver is crying to be cleaned up to pass in a proper I/O
struct which can be used to store information between the different
functions, otherwise stuff like that will happen in the future again.
Fixes: 9770b428b1a2 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Reviewed-by: Ashish Kalra <ashish.kalra@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Mon Sep 15 16:12:40 2025 +0200
dm-raid: don't set io_min and io_opt for raid1
commit a86556264696b797d94238d99d8284d0d34ed960 upstream.
These commands
modprobe brd rd_size=1048576
vgcreate vg /dev/ram*
lvcreate -m4 -L10 -n lv vg
trigger the following warnings:
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
The warnings are caused by the fact that io_min is 512 and physical block
size is 4096.
If there's chunk-less raid, such as raid1, io_min shouldn't be set to zero
because it would be raised to 512 and it would trigger the warning.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Mon Aug 11 13:17:32 2025 +0200
dm-stripe: fix a possible integer overflow
commit 1071d560afb4c245c2076494226df47db5a35708 upstream.
There's a possible integer overflow in stripe_io_hints if we have too
large chunk size. Test if the overflow happened, and if it did, don't set
limits->io_min and limits->io_opt;
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Suggested-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Remy D. Farley <one-d-wide@protonmail.com>
Date: Sat Sep 13 14:05:28 2025 +0000
doc/netlink: Fix typos in operation attributes
[ Upstream commit 109f8b51543d106aee50dfe911f439e43fb30c7a ]
I'm trying to generate Rust bindings for netlink using the yaml spec.
It looks like there's a typo in conntrack spec: attribute set conntrack-attrs
defines attributes "counters-{orig,reply}" (plural), while get operation
references "counter-{orig,reply}" (singular). The latter should be fixed, as it
denotes multiple counters (packet and byte). The corresonding C define is
CTA_COUNTERS_ORIG.
Also, dump request references "nfgen-family" attribute, which neither exists in
conntrack-attrs attrset nor ctattr_type enum. There's member of nfgenmsg struct
with the same name, which is where family value is actually taken from.
> static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
> struct sk_buff *skb,
> const struct nlmsghdr *nlh,
> const struct nlattr * const cda[],
> struct netlink_ext_ack *extack)
> {
> int err;
> struct nfgenmsg *nfmsg = nlmsg_data(nlh);
> u_int8_t u3 = nfmsg->nfgen_family;
^^^^^^^^^^^^
Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Fixes: 23fc9311a526 ("netlink: specs: add conntrack dump and stats dump support")
Link: https://patch.msgid.link/20250913140515.1132886-1-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ioana Ciornei <ioana.ciornei@nxp.com>
Date: Wed Sep 10 17:48:25 2025 +0300
dpaa2-switch: fix buffer pool seeding for control traffic
[ Upstream commit 2690cb089502b80b905f2abdafd1bf2d54e1abef ]
Starting with commit c50e7475961c ("dpaa2-switch: Fix error checking in
dpaa2_switch_seed_bp()"), the probing of a second DPSW object errors out
like below.
fsl_dpaa2_switch dpsw.1: fsl_mc_driver_probe failed: -12
fsl_dpaa2_switch dpsw.1: probe with driver fsl_dpaa2_switch failed with error -12
The aforementioned commit brought to the surface the fact that seeding
buffers into the buffer pool destined for control traffic is not
successful and an access violation recoverable error can be seen in the
MC firmware log:
[E, qbman_rec_isr:391, QBMAN] QBMAN recoverable event 0x1000000
This happens because the driver incorrectly used the ID of the DPBP
object instead of the hardware buffer pool ID when trying to release
buffers into it.
This is because any DPSW object uses two buffer pools, one managed by
the Linux driver and destined for control traffic packet buffers and the
other one managed by the MC firmware and destined only for offloaded
traffic. And since the buffer pool managed by the MC firmware does not
have an external facing DPBP equivalent, any subsequent DPBP objects
created after the first DPSW will have a DPBP id different to the
underlying hardware buffer ID.
The issue was not caught earlier because these two numbers can be
identical when all DPBP objects are created before the DPSW objects are.
This is the case when the DPL file is used to describe the entire DPAA2
object layout and objects are created at boot time and it's also true
for the first DPSW being created dynamically using ls-addsw.
Fix this by using the buffer pool ID instead of the DPBP id when
releasing buffers into the pool.
Fixes: 2877e4f7e189 ("staging: dpaa2-switch: setup buffer pool and RX path rings")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20250910144825.2416019-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ivan Vecera <ivecera@redhat.com>
Date: Fri Sep 12 11:33:31 2025 +0200
dpll: fix clock quality level reporting
[ Upstream commit 70d99623d5c11e1a9bcc564b8fbad6fa916913d8 ]
The DPLL_CLOCK_QUALITY_LEVEL_ITU_OPT1_EPRC is not reported via netlink
due to bug in dpll_msg_add_clock_quality_level(). The usage of
DPLL_CLOCK_QUALITY_LEVEL_MAX for both DECLARE_BITMAP() and
for_each_set_bit() is not correct because these macros requires bitmap
size and not the highest valid bit in the bitmap.
Use correct bitmap size to fix this issue.
Fixes: a1afb959add1 ("dpll: add clock quality level attribute and op")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250912093331.862333-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ivan Lipski <ivan.lipski@amd.com>
Date: Tue Sep 2 16:20:09 2025 -0400
drm/amd/display: Allow RX6xxx & RX7700 to invoke amdgpu_irq_get/put
commit 29a2f430475357f760679b249f33e7282688e292 upstream.
[Why&How]
As reported on https://gitlab.freedesktop.org/drm/amd/-/issues/3936,
SMU hang can occur if the interrupts are not enabled appropriately,
causing a vblank timeout.
This patch reverts commit 5009628d8509 ("drm/amd/display: Remove unnecessary
amdgpu_irq_get/put"), but only for RX6xxx & RX7700 GPUs, on which the
issue was observed.
This will re-enable interrupts regardless of whether the user space needed
it or not.
Fixes: 5009628d8509 ("drm/amd/display: Remove unnecessary amdgpu_irq_get/put")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3936
Suggested-by: Sun peng Li <sunpeng.li@amd.com>
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 95d168b367aa28a59f94fc690ff76ebf69312c6d)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Mon Sep 15 20:59:02 2025 -0500
drm/amd: Only restore cached manual clock settings in restore if OD enabled
commit f9b80514a7227c589291792cb6743b0ddf41c2bc upstream.
If OD is not enabled then restoring cached clock settings doesn't make
sense and actually leads to errors in resume.
Check if enabled before restoring settings.
Fixes: 4e9526924d09 ("drm/amd: Restore cached manual clock settings during resume")
Reported-by: Jérôme Lécuyer <jerome.4a4c@gmail.com>
Closes: https://lore.kernel.org/amd-gfx/0ffe2692-7bfa-4821-856e-dd0f18e2c32b@amd.com/T/#me6db8ddb192626360c462b7570ed7eba0c6c9733
Suggested-by: Jérôme Lécuyer <jerome.4a4c@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1a4dd33cc6e1baaa81efdbe68227a19f51c50f20)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Wed Sep 17 12:42:11 2025 -0400
drm/amdgpu: suspend KFD and KGD user queues for S0ix
commit 9272bb34b066993f5f468b219b4a26ba3f2b25a1 upstream.
We need to make sure the user queues are preempted so
GFX can enter gfxoff.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f8b367e6fa1716cab7cc232b9e3dff29187fc99d)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Wed Sep 17 12:42:09 2025 -0400
drm/amdkfd: add proper handling for S0ix
commit 2ade36eaa9ac05e4913e9785df19c2cde8f912fb upstream.
When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4bfa8609934dbf39bbe6e75b4f971469384b50b1)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date: Wed Jun 25 13:54:06 2025 -0700
drm/xe/guc: Enable extended CAT error reporting
[ Upstream commit a7ffcea8631af91479cab10aa7fbfd0722f01d9a ]
On newer HW (Xe2 onwards + PVC) it is possible to get extra information
when a CAT error occurs, specifically a dword reporting the error type.
To enable this extra reporting, we need to opt-in with the GuC, which is
done via a specific per-VF feature opt-in H2G.
On platforms where the HW does not support the extra reporting, the GuC
will set the type to 0xdeadbeef, so we can keep the code simple and
opt-in to the feature on every platform and then just discard the data
if it is invalid.
Note that on native/PF we're guaranteed that the opt in is available
because we don't support any GuC old enough to not have it, but if we're
a VF we might be running on a non-XE PF with an older GuC, so we need to
handle that case. We can re-use the invalid type above to handle this
scenario the same way as if the feature was not supported in HW.
Given that this patch is the first user of the guc_buf_cache on native
and VF, it also extends that feature to non-PF use-cases.
v2: simpler print for the error type (John), rebase
v3: use guc_buf_cache instead of new alloc, simpler doc (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com
Stable-dep-of: 26caeae9fb48 ("drm/xe/guc: Set RCS/CCS yield policy")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date: Fri Sep 5 16:56:33 2025 -0700
drm/xe/guc: Set RCS/CCS yield policy
[ Upstream commit 26caeae9fb482ec443753b4e3307e5122b60b850 ]
All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose from which of the 2 queues to
pick the next workload to execute. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.
v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize
Fixes: d9a1ae0d17bd ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
(cherry picked from commit 88434448438e4302e272b2a2b810b42e05ea024b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo added #include "xe_guc_submit.h" while backporting]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
Date: Thu Sep 11 00:24:39 2025 +0200
drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation
[ Upstream commit fef8b64e48e836344574b85132a1c317f4260022 ]
This effectively reverts commit 4c3fe5eae46b ("drm/xe/pf: Limit
fair VF LMEM provisioning") since we don't need it any more after
non-contig VRAM allocations were fixed. This allows larger LMEM
auto-provisioning for VFs, so instead:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M)
[ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM
or
[ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M)
[ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM
we may get:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M)
[ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM
and
[ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M)
[ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM
Fixes: 1e32ffbc9dc8 ("drm/xe/sriov: support non-contig VRAM provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com
(cherry picked from commit 95c1cfa306087142989bff34ea0e05dcd95ddc58)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date: Tue Aug 19 15:39:51 2025 +0000
drm/xe/tile: Release kobject for the failure path
[ Upstream commit 013e484dbd687a9174acf8f4450217bdb86ad788 ]
Call kobject_put() for the failure path to release the kobject
v2: remove extra newline. (Matt)
Fixes: e3d0839aa501 ("drm/xe/tile: Abort driver load for sysfs creation failure")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit b98775bca99511cc22ab459a2de646cd2fa7241f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Dan Carpenter <dan.carpenter@linaro.org>
Date: Thu Aug 7 18:53:41 2025 +0300
drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()
[ Upstream commit cbc7f3b4f6ca19320e2eacf8fc1403d6f331ce14 ]
The xe_preempt_fence_create() function returns error pointers. It
never returns NULL. Update the error checking to match.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 75cc23ffe5b422bc3cbd5cf0956b8b86e4b0e162)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date: Tue Sep 9 15:12:40 2025 -0700
drm/xe: Fix error handling if PXP fails to start
[ Upstream commit ae5fbbda341f92e605a9508a0fb18456155517f0 ]
Since the PXP start comes after __xe_exec_queue_init() has completed,
we need to cleanup what was done in that function in case of a PXP
start error.
__xe_exec_queue_init calls the submission backend init() function,
so we need to introduce an opposite for that. Unfortunately, while
we already have a fini() function pointer, it performs other
operations in addition to cleaning up what was done by the init().
Therefore, for clarity, the existing fini() has been renamed to
destroy(), while a new fini() has been added to only clean up what was
done by the init(), with the latter being called by the former (via
xe_exec_queue_fini).
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
(cherry picked from commit 626667321deb4c7a294725406faa3dd71c3d445d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Loic Poulain <loic.poulain@oss.qualcomm.com>
Date: Wed Jul 9 10:54:38 2025 +0200
drm: bridge: anx7625: Fix NULL pointer dereference with early IRQ
[ Upstream commit a10f910c77f280327b481e77eab909934ec508f0 ]
If the interrupt occurs before resource initialization is complete, the
interrupt handler/worker may access uninitialized data such as the I2C
tcpc_client device, potentially leading to NULL pointer dereference.
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Fixes: 8bdfc5dae4e3 ("drm/bridge: anx7625: Add anx7625 MIPI DSI/DPI to DP")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250709085438.56188-1-loic.poulain@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Qi Xi <xiqi2@huawei.com>
Date: Thu Sep 4 11:44:47 2025 +0800
drm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path
[ Upstream commit 288dac9fb6084330d968459c750c838fd06e10e6 ]
Add missing mutex unlock before returning from the error path in
cdns_mhdp_atomic_enable().
Fixes: 935a92a1c400 ("drm: bridge: cdns-mhdp8546: Fix possible null pointer dereference")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20250904034447.665427-1-xiqi2@huawei.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Frank Li <Frank.Li@nxp.com>
Date: Wed Sep 17 07:57:44 2025 -0400
dt-bindings: serial: 8250: allow clock 'uartclk' and 'reg' for nxp,lpc1850-uart
[ Upstream commit d2db0d78154442fb89165edf8836bf2644c6c58d ]
Allow clock 'uartclk' and 'reg' for nxp,lpc1850-uart to align existed
driver and dts. It is really old platform. Keep the same restriction for
others.
Allow dmas and dma-names property, which allow maxItems 4 because very old
platform (arch/arm/boot/dts/nxp/lpc/lpc18xx.dtsi) use duplicate "tx", "rx",
"tx", "rx" as dma-names.
Fix below CHECK_DTB warnings:
arch/arm/boot/dts/nxp/lpc/lpc4337-ciaa.dtb: serial@40081000 (nxp,lpc1850-uart): clock-names: ['uartclk', 'reg'] is too long
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Acked-by: "Rob Herring (Arm)" <robh@kernel.org>
Link: https://lore.kernel.org/r/20250602142745.942568-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 387d00028ccc ("dt-bindings: serial: 8250: move a constraint")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Alex Elder <elder@riscstar.com>
Date: Wed Sep 17 07:57:46 2025 -0400
dt-bindings: serial: 8250: move a constraint
[ Upstream commit 387d00028cccee7575f6416953bef62f849d83e3 ]
A block that required a "spacemit,k1-uart" compatible node to
specify two clocks was placed in the wrong spot in the binding.
Conor Dooley pointed out it belongs earlier in the file, as part
of the initial "allOf".
Fixes: 2c0594f9f0629 ("dt-bindings: serial: 8250: support an optional second clock")
Cc: stable <stable@kernel.org>
Reported-by: Conor Dooley <conor@kernel.org>
Closes: https://lore.kernel.org/lkml/20250729-reshuffle-contented-e6def76b540b@spud/
Signed-off-by: Alex Elder <elder@riscstar.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20250813032151.2330616-1-elder@riscstar.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yixun Lan <dlan@gentoo.org>
Date: Wed Sep 17 07:57:45 2025 -0400
dt-bindings: serial: 8250: spacemit: set clocks property as required
[ Upstream commit 48f9034e024a4c6e279b0d040e1f5589bb544806 ]
In SpacemiT's K1 SoC, the clocks for UART are mandatory needed, so
for DT, both clocks and clock-names property should be set as required.
Fixes: 2c0594f9f062 ("dt-bindings: serial: 8250: support an optional second clock")
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Acked-by: "Rob Herring (Arm)" <robh@kernel.org>
Link: https://lore.kernel.org/r/20250718-01-k1-uart-binding-v1-1-a92e1e14c836@gentoo.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 387d00028ccc ("dt-bindings: serial: 8250: move a constraint")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Date: Fri Sep 12 22:18:50 2025 +0200
gpiolib: acpi: initialize acpi_gpio_info struct
commit 19c839a98c731169f06d32e7c9e00c78a0086ebe upstream.
Since commit 7c010d463372 ("gpiolib: acpi: Make sure we fill struct
acpi_gpio_info"), uninitialized acpi_gpio_info struct are passed to
__acpi_find_gpio() and later in the call stack info->quirks is used in
acpi_populate_gpio_lookup. This breaks the i2c_hid_cpi driver:
[ 58.122916] i2c_hid_acpi i2c-UNIW0001:00: HID over i2c has not been provided an Int IRQ
[ 58.123097] i2c_hid_acpi i2c-UNIW0001:00: probe with driver i2c_hid_acpi failed with error -22
Fix this by initializing the acpi_gpio_info pass to __acpi_find_gpio()
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220388
Fixes: 7c010d463372 ("gpiolib: acpi: Make sure we fill struct acpi_gpio_info")
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Tested-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-By: Calvin Owens <calvin@wbinvd.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Li Zhe <lizhe.67@bytedance.com>
Date: Fri Jun 6 10:37:42 2025 +0800
gup: optimize longterm pin_user_pages() for large folio
commit a03db236aebfaeadf79396dbd570896b870bda01 upstream.
In the current implementation of longterm pin_user_pages(), we invoke
collect_longterm_unpinnable_folios(). This function iterates through the
list to check whether each folio belongs to the "longterm_unpinnabled"
category. The folios in this list essentially correspond to a contiguous
region of userspace addresses, with each folio representing a physical
address in increments of PAGESIZE.
If this userspace address range is mapped with large folio, we can
optimize the performance of function collect_longterm_unpinnable_folios()
by reducing the using of READ_ONCE() invoked in
pofs_get_folio()->page_folio()->_compound_head().
Also, we can simplify the logic of collect_longterm_unpinnable_folios().
Instead of comparing with prev_folio after calling pofs_get_folio(), we
can check whether the next page is within the same folio.
The performance test results, based on v6.15, obtained through the
gup_test tool from the kernel source tree are as follows. We achieve an
improvement of over 66% for large folio with pagesize=2M. For small
folio, we have only observed a very slight degradation in performance.
Without this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
With this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[lizhe.67@bytedance.com: whitespace fix, per David]
Link: https://lkml.kernel.org/r/20250606091917.91384-1-lizhe.67@bytedance.com
Link: https://lkml.kernel.org/r/20250606023742.58344-1-lizhe.67@bytedance.com
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Fri Aug 22 17:16:17 2025 +0200
i40e: remove redundant memory barrier when cleaning Tx descs
[ Upstream commit e37084a26070c546ae7961ee135bbfb15fbe13fd ]
i40e has a feature which writes to memory location last descriptor
successfully sent. Memory barrier in i40e_clean_tx_irq() was used to
avoid forward-reading descriptor fields in case DD bit was not set.
Having mentioned feature in place implies that such situation will not
happen as we know in advance how many descriptors HW has dealt with.
Besides, this barrier placement was wrong. Idea is to have this
protection *after* reading DD bit from HW descriptor, not before.
Digging through git history showed me that indeed barrier was before DD
bit check, anyways the commit introducing i40e_get_head() should have
wiped it out altogether.
Also, there was one commit doing s/read_barrier_depends/smp_rmb when get
head feature was already in place, but it was only theoretical based on
ixgbe experiences, which is different in these terms as that driver has
to read DD bit from HW descriptor.
Fixes: 1943d8ba9507 ("i40e/i40evf: enable hardware feature head write back")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jacob Keller <jacob.e.keller@intel.com>
Date: Mon Aug 25 16:00:14 2025 -0700
ice: fix Rx page leak on multi-buffer frames
[ Upstream commit 84bf1ac85af84d354c7a2fdbdc0d4efc8aaec34b ]
The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each
buffer in the current frame. This function was introduced as part of
handling multi-buffer XDP support in the ice driver.
It works by iterating over the buffers from first_desc up to 1 plus the
total number of fragments in the frame, cached from before the XDP program
was executed.
If the hardware posts a descriptor with a size of 0, the logic used in
ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added
as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a
fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't
call ice_put_rx_buf().
Because we don't call ice_put_rx_buf(), we don't attempt to re-use the
page or free it. This leaves a stale page in the ring, as we don't
increment next_to_alloc.
The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented
properly, and that it always points to a buffer with a NULL page. Since
this function doesn't check, it will happily recycle a page over the top
of the next_to_alloc buffer, losing track of the old page.
Note that this leak only occurs for multi-buffer frames. The
ice_put_rx_mbuf() function always handles at least one buffer, so a
single-buffer frame will always get handled correctly. It is not clear
precisely why the hardware hands us descriptors with a size of 0 sometimes,
but it happens somewhat regularly with "jumbo frames" used by 9K MTU.
To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on
all buffers between first_desc and next_to_clean. Borrow the logic of a
similar function in i40e used for this same purpose. Use the same logic
also in ice_get_pgcnts().
Instead of iterating over just the number of fragments, use a loop which
iterates until the current index reaches to the next_to_clean element just
past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does
call ice_put_rx_buf() on the last buffer of the frame indicating the end of
packet.
For non-linear (multi-buffer) frames, we need to take care when adjusting
the pagecnt_bias. An XDP program might release fragments from the tail of
the frame, in which case that fragment page is already released. Only
update the pagecnt_bias for the first descriptor and fragments still
remaining post-XDP program. Take care to only access the shared info for
fragmented buffers, as this avoids a significant cache miss.
The xdp_xmit value only needs to be updated if an XDP program is run, and
only once per packet. Drop the xdp_xmit pointer argument from
ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function
directly. This avoids needing to pass the argument and avoids an extra
bit-wise OR for each buffer in the frame.
Move the increment of the ntc local variable to ensure its updated *before*
all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic
requires the index of the element just after the current frame.
Now that we use an index pointer in the ring to identify the packet, we no
longer need to track or cache the number of fragments in the rx_ring.
Cc: Christoph Petrausch <christoph.petrausch@deepl.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/
Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Priya Singh <priyax.singh@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kohei Enju <enjuk@amazon.com>
Date: Wed Sep 10 22:47:21 2025 +0900
igc: don't fail igc_probe() on LED setup error
[ Upstream commit 528eb4e19ec0df30d0c9ae4074ce945667dde919 ]
When igc_led_setup() fails, igc_probe() fails and triggers kernel panic
in free_netdev() since unregister_netdev() is not called. [1]
This behavior can be tested using fault-injection framework, especially
the failslab feature. [2]
Since LED support is not mandatory, treat LED setup failures as
non-fatal and continue probe with a warning message, consequently
avoiding the kernel panic.
[1]
kernel BUG at net/core/dev.c:12047!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:free_netdev+0x278/0x2b0
[...]
Call Trace:
<TASK>
igc_probe+0x370/0x910
local_pci_probe+0x3a/0x80
pci_device_probe+0xd1/0x200
[...]
[2]
#!/bin/bash -ex
FAILSLAB_PATH=/sys/kernel/debug/failslab/
DEVICE=0000:00:05.0
START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \
| awk '{printf("0x%s", $1)}')
END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100)))
echo $START_ADDR > $FAILSLAB_PATH/require-start
echo $END_ADDR > $FAILSLAB_PATH/require-end
echo 1 > $FAILSLAB_PATH/times
echo 100 > $FAILSLAB_PATH/probability
echo N > $FAILSLAB_PATH/ignore-gfp-wait
echo $DEVICE > /sys/bus/pci/drivers/igc/bind
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Max Kellermann <max.kellermann@ionos.com>
Date: Fri Sep 12 02:06:09 2025 +0200
io_uring/io-wq: fix `max_workers` breakage and `nr_workers` underflow
commit cd4ea81be3eb94047ad023c631afd9bd6c295400 upstream.
Commit 88e6c42e40de ("io_uring/io-wq: add check free worker before
create new worker") reused the variable `do_create` for something
else, abusing it for the free worker check.
This caused the value to effectively always be `true` at the time
`nr_workers < max_workers` was checked, but it should really be
`false`. This means the `max_workers` setting was ignored, and worse:
if the limit had already been reached, incrementing `nr_workers` was
skipped even though another worker would be created.
When later lots of workers exit, the `nr_workers` field could easily
underflow, making the problem worse because more and more workers
would be created without incrementing `nr_workers`.
The simple solution is to use a different variable for the free worker
check instead of using one variable for two different things.
Cc: stable@vger.kernel.org
Fixes: 88e6c42e40de ("io_uring/io-wq: add check free worker before create new worker")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jens Axboe <axboe@kernel.dk>
Date: Thu Sep 18 13:59:15 2025 -0600
io_uring/msg_ring: kill alloc_cache for io_kiocb allocations
[ Upstream commit df8922afc37aa2111ca79a216653a629146763ad ]
A recent commit:
fc582cd26e88 ("io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU")
fixed an issue with not deferring freeing of io_kiocb structs that
msg_ring allocates to after the current RCU grace period. But this only
covers requests that don't end up in the allocation cache. If a request
goes into the alloc cache, it can get reused before it is sane to do so.
A recent syzbot report would seem to indicate that there's something
there, however it may very well just be because of the KASAN poisoning
that the alloc_cache handles manually.
Rather than attempt to make the alloc_cache sane for that use case, just
drop the usage of the alloc_cache for msg_ring request payload data.
Fixes: 50cf5f3842af ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
Link: https://lore.kernel.org/io-uring/68cc2687.050a0220.139b6.0005.GAE@google.com/
Reported-by: syzbot+baa2e0f4e02df602583e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yang Xiuwei <yangxiuwei@kylinos.cn>
Date: Fri Sep 19 17:03:52 2025 +0800
io_uring: fix incorrect io_kiocb reference in io_link_skb
[ Upstream commit 2c139a47eff8de24e3350dadb4c9d5e3426db826 ]
In io_link_skb function, there is a bug where prev_notif is incorrectly
assigned using 'nd' instead of 'prev_nd'. This causes the context
validation check to compare the current notification with itself instead
of comparing it with the previous notification.
Fix by using the correct prev_nd parameter when obtaining prev_notif.
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: 6fe4220912d19 ("io_uring/notif: implement notification stacking")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jens Axboe <axboe@kernel.dk>
Date: Thu Sep 18 10:21:14 2025 -0600
io_uring: include dying ring in task_work "should cancel" state
commit 3539b1467e94336d5854ebf976d9627bfb65d6c3 upstream.
When running task_work for an exiting task, rather than perform the
issue retry attempt, the task_work is canceled. However, this isn't
done for a ring that has been closed. This can lead to requests being
successfully completed post the ring being closed, which is somewhat
confusing and surprising to an application.
Rather than just check the task exit state, also include the ring
ref state in deciding whether or not to terminate a given request when
run from task_work.
Cc: stable@vger.kernel.org # 6.1+
Link: https://github.com/axboe/liburing/discussions/1459
Reported-by: Benedek Thaler <thaler@thaler.hu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Vasant Hegde <vasant.hegde@amd.com>
Date: Sat Sep 13 06:26:57 2025 +0000
iommu/amd/pgtbl: Fix possible race while increase page table level
commit 1e56310b40fd2e7e0b9493da9ff488af145bdd0c upstream.
The AMD IOMMU host page table implementation supports dynamic page table levels
(up to 6 levels), starting with a 3-level configuration that expands based on
IOVA address. The kernel maintains a root pointer and current page table level
to enable proper page table walks in alloc_pte()/fetch_pte() operations.
The IOMMU IOVA allocator initially starts with 32-bit address and onces its
exhuasted it switches to 64-bit address (max address is determined based
on IOMMU and device DMA capability). To support larger IOVA, AMD IOMMU
driver increases page table level.
But in unmap path (iommu_v1_unmap_pages()), fetch_pte() reads
pgtable->[root/mode] without lock. So its possible that in exteme corner case,
when increase_address_space() is updating pgtable->[root/mode], fetch_pte()
reads wrong page table level (pgtable->mode). It does compare the value with
level encoded in page table and returns NULL. This will result is
iommu_unmap ops to fail and upper layer may retry/log WARN_ON.
CPU 0 CPU 1
------ ------
map pages unmap pages
alloc_pte() -> increase_address_space() iommu_v1_unmap_pages() -> fetch_pte()
pgtable->root = pte (new root value)
READ pgtable->[mode/root]
Reads new root, old mode
Updates mode (pgtable->mode += 1)
Since Page table level updates are infrequent and already synchronized with a
spinlock, implement seqcount to enable lock-free read operations on the read path.
Fixes: 754265bcab7 ("iommu/amd: Fix race in increase_address_space()")
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: stable@vger.kernel.org
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Vasant Hegde <vasant.hegde@amd.com>
Date: Thu Sep 11 13:14:06 2025 +0000
iommu/amd: Fix alias device DTE setting
[ Upstream commit a0c17ed907ac3326cf3c9d6007ea222a746f5cc2 ]
Commit 7bea695ada0 restructured DTE flag handling but inadvertently changed
the alias device configuration logic. This may cause incorrect DTE settings
for certain devices.
Add alias flag check before calling set_dev_entry_from_acpi(). Also move the
device iteration loop inside the alias check to restrict execution to cases
where alias devices are present.
Fixes: 7bea695ada0 ("iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags")
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Zhen Ni <zhen.ni@easystack.cn>
Date: Fri Aug 22 10:49:15 2025 +0800
iommu/amd: Fix ivrs_base memleak in early_amd_iommu_init()
commit 923b70581cb6acede90f8aaf4afe5d1c58c67b71 upstream.
Fix a permanent ACPI table memory leak in early_amd_iommu_init() when
CMPXCHG16B feature is not supported
Fixes: 82582f85ed22 ("iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20250822024915.673427-1-zhen.ni@easystack.cn
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Wed Aug 27 17:08:27 2025 -0400
iommu/s390: Fix memory corruption when using identity domain
commit b3506e9bcc777ed6af2ab631c86a9990ed97b474 upstream.
zpci_get_iommu_ctrs() returns counter information to be reported as part
of device statistics; these counters are stored as part of the s390_domain.
The problem, however, is that the identity domain is not backed by an
s390_domain and so the conversion via to_s390_domain() yields a bad address
that is zero'd initially and read on-demand later via a sysfs read.
These counters aren't necessary for the identity domain; just return NULL
in this case.
This issue was discovered via KASAN with reports that look like:
BUG: KASAN: global-out-of-bounds in zpci_fmb_enable_device
when using the identity domain for a device on s390.
Cc: stable@vger.kernel.org
Fixes: 64af12c6ec3a ("iommu/s390: implement iommu passthrough via identity domain")
Reported-by: Cam Miller <cam@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Cam Miller <cam@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20250827210828.274527-1-mjrosato@linux.ibm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Niklas Schnelle <schnelle@linux.ibm.com>
Date: Thu Sep 4 10:59:49 2025 +0200
iommu/s390: Make attach succeed when the device was surprise removed
commit 9ffaf5229055fcfbb3b3d6f1c7e58d63715c3f73 upstream.
When a PCI device is removed with surprise hotplug, there may still be
attempts to attach the device to the default domain as part of tear down
via (__iommu_release_dma_ownership()), or because the removal happens
during probe (__iommu_probe_device()). In both cases zpci_register_ioat()
fails with a cc value indicating that the device handle is invalid. This
is because the device is no longer part of the instance as far as the
hypervisor is concerned.
Currently this leads to an error return and s390_iommu_attach_device()
fails. This triggers the WARN_ON() in __iommu_group_set_domain_nofail()
because attaching to the default domain must never fail.
With the device fenced by the hypervisor no DMAs to or from memory are
possible and the IOMMU translations have no effect. Proceed as if the
registration was successful and let the hotplug event handling clean up
the device.
This is similar to how devices in the error state are handled since
commit 59bbf596791b ("iommu/s390: Make attach succeed even if the device
is in error state") except that for removal the domain will not be
registered later. This approach was also previously discussed at the
link.
Handle both cases, error state and removal, in a helper which checks if
the error needs to be propagated or ignored. Avoid magic number
condition codes by using the pre-existing, but never used, defines for
PCI load/store condition codes and rename them to reflect that they
apply to all PCI instructions.
Cc: stable@vger.kernel.org # v6.2
Link: https://lore.kernel.org/linux-iommu/20240808194155.GD1985367@ziepe.ca/
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Link: https://lore.kernel.org/r/20250904-iommu_succeed_attach_removed-v1-1-e7f333d2f80f@linux.ibm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Eugene Koira <eugkoira@amazon.com>
Date: Wed Sep 3 13:53:29 2025 +0800
iommu/vt-d: Fix __domain_mapping()'s usage of switch_to_super_page()
commit dce043c07ca1ac19cfbe2844a6dc71e35c322353 upstream.
switch_to_super_page() assumes the memory range it's working on is aligned
to the target large page level. Unfortunately, __domain_mapping() doesn't
take this into account when using it, and will pass unaligned ranges
ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400,
0x4c0600), which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
to switch_to_super_page() at a 1 GiB granularity, which will in turn
free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to
switch_to_super_page() in __domain_mapping()
to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
Signed-off-by: Eugene Koira <eugkoira@amazon.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20250826143816.38686-1-eugkoira@amazon.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Date: Mon Sep 8 13:26:29 2025 +0200
ixgbe: destroy aci.lock later within ixgbe_remove path
[ Upstream commit 316ba68175b04a9f6f75295764789ea94e31d48c ]
There's another issue with aci.lock and previous patch uncovers it.
aci.lock is being destroyed during removing ixgbe while some of the
ixgbe closing routines are still ongoing. These routines use Admin
Command Interface which require taking aci.lock which has been already
destroyed what leads to call trace.
[ +0.000004] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ +0.000007] WARNING: CPU: 12 PID: 10277 at kernel/locking/mutex.c:155 mutex_lock+0x5f/0x70
[ +0.000002] Call Trace:
[ +0.000003] <TASK>
[ +0.000006] ixgbe_aci_send_cmd+0xc8/0x220 [ixgbe]
[ +0.000049] ? try_to_wake_up+0x29d/0x5d0
[ +0.000009] ixgbe_disable_rx_e610+0xc4/0x110 [ixgbe]
[ +0.000032] ixgbe_disable_rx+0x3d/0x200 [ixgbe]
[ +0.000027] ixgbe_down+0x102/0x3b0 [ixgbe]
[ +0.000031] ixgbe_close_suspend+0x28/0x90 [ixgbe]
[ +0.000028] ixgbe_close+0xfb/0x100 [ixgbe]
[ +0.000025] __dev_close_many+0xae/0x220
[ +0.000005] dev_close_many+0xc2/0x1a0
[ +0.000004] ? kernfs_should_drain_open_files+0x2a/0x40
[ +0.000005] unregister_netdevice_many_notify+0x204/0xb00
[ +0.000006] ? __kernfs_remove.part.0+0x109/0x210
[ +0.000006] ? kobj_kset_leave+0x4b/0x70
[ +0.000008] unregister_netdevice_queue+0xf6/0x130
[ +0.000006] unregister_netdev+0x1c/0x40
[ +0.000005] ixgbe_remove+0x216/0x290 [ixgbe]
[ +0.000021] pci_device_remove+0x42/0xb0
[ +0.000007] device_release_driver_internal+0x19c/0x200
[ +0.000008] driver_detach+0x48/0x90
[ +0.000003] bus_remove_driver+0x6d/0xf0
[ +0.000006] pci_unregister_driver+0x2e/0xb0
[ +0.000005] ixgbe_exit_module+0x1c/0xc80 [ixgbe]
Same as for the previous commit, the issue has been highlighted by the
commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path").
Move destroying aci.lock to the end of ixgbe_remove(), as this
simply fixes the issue.
Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Date: Mon Sep 8 13:26:28 2025 +0200
ixgbe: initialize aci.lock before it's used
[ Upstream commit b85936e95a4bd2a07e134af71e2c0750a69d2b8b ]
Currently aci.lock is initialized too late. A bunch of ACI callbacks
using the lock are called prior it's initialized.
Commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path")
highlights that issue what results in call trace.
[ 4.092899] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 4.092910] WARNING: CPU: 0 PID: 578 at kernel/locking/mutex.c:154 mutex_lock+0x6d/0x80
[ 4.098757] Call Trace:
[ 4.098847] <TASK>
[ 4.098922] ixgbe_aci_send_cmd+0x8c/0x1e0 [ixgbe]
[ 4.099108] ? hrtimer_try_to_cancel+0x18/0x110
[ 4.099277] ixgbe_aci_get_fw_ver+0x52/0xa0 [ixgbe]
[ 4.099460] ixgbe_check_fw_error+0x1fc/0x2f0 [ixgbe]
[ 4.099650] ? usleep_range_state+0x69/0xd0
[ 4.099811] ? usleep_range_state+0x8c/0xd0
[ 4.099964] ixgbe_probe+0x3b0/0x12d0 [ixgbe]
[ 4.100132] local_pci_probe+0x43/0xa0
[ 4.100267] work_for_cpu_fn+0x13/0x20
[ 4.101647] </TASK>
Move aci.lock mutex initialization to ixgbe_sw_init() before any ACI
command is sent. Along with that move also related SWFW semaphore in
order to reduce size of ixgbe_probe() and that way all locks are
initialized in ixgbe_sw_init().
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Namjae Jeon <linkinjeon@kernel.org>
Date: Wed Sep 10 11:22:52 2025 +0900
ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer
commit 5282491fc49d5614ac6ddcd012e5743eecb6a67c upstream.
If data_offset and data_length of smb_direct_data_transfer struct are
invalid, out of bounds issue could happen.
This patch validate data_offset and data_length field in recv_done.
Cc: stable@vger.kernel.org
Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct")
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Reported-by: Luigino Camastra, Aisle Research <luigino.camastra@aisle.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Thu Sep 11 10:05:23 2025 +0900
ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size
commit e1868ba37fd27c6a68e31565402b154beaa65df0 upstream.
This is inspired by the check for data_offset + data_length.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: stable@vger.kernel.org
Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct")
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Date: Mon Aug 25 18:44:28 2025 +0200
KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
commit d02e48830e3fce9701265f6c5a58d9bdaf906a76 upstream.
Commit 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
inhibited pre-VMRUN sync of TPR from LAPIC into VMCB::V_TPR in
sync_lapic_to_cr8() when AVIC is active.
AVIC does automatically sync between these two fields, however it does
so only on explicit guest writes to one of these fields, not on a bare
VMRUN.
This meant that when AVIC is enabled host changes to TPR in the LAPIC
state might not get automatically copied into the V_TPR field of VMCB.
This is especially true when it is the userspace setting LAPIC state via
KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
VMCB.
Practice shows that it is the V_TPR that is actually used by the AVIC to
decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
so any leftover value in V_TPR will cause serious interrupt delivery issues
in the guest when AVIC is enabled.
Fix this issue by doing pre-VMRUN TPR sync from LAPIC into VMCB::V_TPR
even when AVIC is enabled.
Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@vger.kernel.org
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/c231be64280b1461e854e1ce3595d70cde3a2e9d.1756139678.git.maciej.szmigiero@oracle.com
[sean: tag for stable@]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Thu Sep 25 11:16:54 2025 +0200
Linux 6.16.9
Link: https://lore.kernel.org/r/20250922192412.885919229@linuxfoundation.org
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
Tested-by: Brett A C Sheffield <bacs@librecast.net>
Tested-by: Pascal Ernster <git@hardfalcon.net>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Ronald Warsow <rwarsow@gmx.de>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Hardik Garg <hargar@linux.microsoft.com>
Tested-by: Dileep Malepu <dileep.debian@gmail.com>
Tested-By: Achill Gilgenast <achill@achill.org>=
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Huacai Chen <chenhuacai@kernel.org>
Date: Thu Sep 18 19:44:01 2025 +0800
LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled
commit a9d13433fe17be0e867e51e71a1acd2731fbef8d upstream.
ARCH_STRICT_ALIGN is used for hardware without UAL, now it only control
the -mstrict-align flag. However, ACPI structures are packed by default
so will cause unaligned accesses.
To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to
align ACPI structures if ARCH_STRICT_ALIGN enabled.
Cc: stable@vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Suggested-by: Xi Ruoyao <xry111@xry111.site>
Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tao Cui <cuitao@kylinos.cn>
Date: Thu Sep 18 19:44:04 2025 +0800
LoongArch: Check the return value when creating kobj
commit 51adb03e6b865c0c6790f29659ff52d56742de2e upstream.
Add a check for the return value of kobject_create_and_add(), to ensure
that the kobj allocation succeeds for later use.
Cc: stable@vger.kernel.org
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Thu Sep 18 19:44:08 2025 +0800
LoongArch: Fix unreliable stack for live patching
commit 677d4a52d4dc4a147d5e84af9ff207832578be70 upstream.
When testing the kernel live patching with "modprobe livepatch-sample",
there is a timeout over 15 seconds from "starting patching transition"
to "patching complete". The dmesg command shows "unreliable stack" for
user tasks in debug mode, here is one of the messages:
livepatch: klp_try_switch_task: bash:1193 has an unreliable stack
The "unreliable stack" is because it can not unwind from do_syscall()
to its previous frame handle_syscall(). It should use fp to find the
original stack top due to secondary stack in do_syscall(), but fp is
not used for some other functions, then fp can not be restored by the
next frame of do_syscall(), so it is necessary to save fp if task is
not current, in order to get the stack top of do_syscall().
Here are the call chains:
klp_enable_patch()
klp_try_complete_transition()
klp_try_switch_task()
klp_check_and_switch_task()
klp_check_stack()
stack_trace_save_tsk_reliable()
arch_stack_walk_reliable()
When executing "rmmod livepatch-sample", there exists a similar issue.
With this patch, it takes a short time for patching and unpatching.
Before:
# modprobe livepatch-sample
# dmesg -T | tail -3
[Sat Sep 6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
[Sat Sep 6 11:00:35 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
rmmod: ERROR: Module livepatch_sample is in use
# rmmod livepatch_sample
# dmesg -T | tail -3
[Sat Sep 6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Sat Sep 6 11:06:20 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
After:
# modprobe livepatch-sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:30 2025] livepatch: 'livepatch_sample': starting patching transition
[Tue Sep 16 16:19:31 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:36 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Tue Sep 16 16:19:37 2025] livepatch: 'livepatch_sample': unpatching complete
Cc: stable@vger.kernel.org # v6.9+
Fixes: 199cc14cb4f1 ("LoongArch: Add kernel livepatching support")
Reported-by: Xi Zhang <zhangxi@kylinos.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Thu Sep 18 19:43:42 2025 +0800
LoongArch: Handle jump tables options for RUST
commit 74f8295c6fb8436bec9995baf6ba463151b6fb68 upstream.
When compiling with LLVM and CONFIG_RUST is set, there exist objtool
warnings in rust/core.o and rust/kernel.o, like this:
rust/core.o: warning: objtool:
_RNvXs1_NtNtCs5QSdWC790r4_4core5ascii10ascii_charNtB5_9AsciiCharNtNtB9_3fmt5Debug3fmt+0x54:
sibling call from callable instruction with modified stack frame
For this special case, the related object file shows that there is no
generated relocation section '.rela.discard.tablejump_annotate' for the
table jump instruction jirl, thus objtool can not know that what is the
actual destination address.
If rustc has the option "-Cllvm-args=--loongarch-annotate-tablejump",
pass the option to enable jump tables for objtool, otherwise it should
pass "-Zno-jump-tables" to keep compatibility with older rustc.
How to test:
$ rustup component add rust-src
$ make LLVM=1 rustavailable
$ make ARCH=loongarch LLVM=1 clean defconfig
$ scripts/config -d MODVERSIONS \
-e RUST -e SAMPLES -e SAMPLES_RUST \
-e SAMPLE_RUST_CONFIGFS -e SAMPLE_RUST_MINIMAL \
-e SAMPLE_RUST_MISC_DEVICE -e SAMPLE_RUST_PRINT \
-e SAMPLE_RUST_DMA -e SAMPLE_RUST_DRIVER_PCI \
-e SAMPLE_RUST_DRIVER_PLATFORM -e SAMPLE_RUST_DRIVER_FAUX \
-e SAMPLE_RUST_DRIVER_AUXILIARY -e SAMPLE_RUST_HOSTPROGS
$ make ARCH=loongarch LLVM=1 olddefconfig all
Cc: stable@vger.kernel.org
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/rust-for-linux/CANiq72mNeCuPkCDrG2db3w=AX+O-zYrfprisDPmRac_qh65Dmg@mail.gmail.com/
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bibo Mao <maobibo@loongson.cn>
Date: Thu Sep 18 19:44:22 2025 +0800
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()
commit 47256c4c8b1bfbc63223a0da2d4fa90b6ede5cbb upstream.
Function copy_from_user() and copy_to_user() may sleep because of page
fault, and they cannot be called in spin_lock hold context. Here move
function calling of copy_from_user() and copy_to_user() before spinlock
context in function kvm_eiointc_ctrl_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_eiointc_ctrl_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org
Fixes: 1ad7efa552fd5 ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bibo Mao <maobibo@loongson.cn>
Date: Thu Sep 18 19:44:22 2025 +0800
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_regs_access()
commit 62f11796a0dfa1a2ef5f50a2d1bc81c81628fb8e upstream.
Function copy_from_user() and copy_to_user() may sleep because of page
fault, and they cannot be called in spin_lock hold context. Here move
function calling of copy_from_user() and copy_to_user() before spinlock
context in function kvm_eiointc_ctrl_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_eiointc_regs_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org
Fixes: 1ad7efa552fd5 ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bibo Mao <maobibo@loongson.cn>
Date: Thu Sep 18 19:44:22 2025 +0800
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_sw_status_access()
commit 01a8e68396a6d51f5ba92021ad1a4b8eaabdd0e7 upstream.
Function copy_from_user() and copy_to_user() may sleep because of page
fault, and they cannot be called in spin_lock hold context. Here move
funtcion calling of copy_from_user() and copy_to_user() out of function
kvm_eiointc_sw_status_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_eiointc_sw_status_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org
Fixes: 1ad7efa552fd5 ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bibo Mao <maobibo@loongson.cn>
Date: Thu Sep 18 19:44:25 2025 +0800
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_pch_pic_regs_access()
commit 8dc5245673cf7f33743e5c0d2a4207c0b8df3067 upstream.
Function copy_from_user() and copy_to_user() may sleep because of page
fault, and they cannot be called in spin_lock hold context. Here move
function calling of copy_from_user() and copy_to_user() out of spinlock
context in function kvm_pch_pic_regs_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_pch_pic_regs_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org
Fixes: d206d95148732 ("LoongArch: KVM: Add PCHPIC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bibo Mao <maobibo@loongson.cn>
Date: Thu Sep 18 19:44:22 2025 +0800
LoongArch: KVM: Fix VM migration failure with PTW enabled
commit f58c9aa1065f73d243904b267c71f6a9d1e9f90e upstream.
With PTW disabled system, bit _PAGE_DIRTY is a HW bit for page writing.
However with PTW enabled system, bit _PAGE_WRITE is also a "HW bit" for
page writing, because hardware synchronizes _PAGE_WRITE to _PAGE_DIRTY
automatically. Previously, _PAGE_WRITE is treated as a SW bit to record
the page writeable attribute for the fast page fault handling in the
secondary MMU, however with PTW enabled machine, this bit is used by HW
already (so setting it will silence the TLB modify exception).
Here define KVM_PAGE_WRITEABLE with the SW bit _PAGE_MODIFIED, so that
it can work on both PTW disabled and enabled machines. And for HW write
bits, both _PAGE_DIRTY and _PAGE_WRITE are set or clear together.
Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Thu Sep 18 19:43:42 2025 +0800
LoongArch: Make LTO case independent in Makefile
commit b15212824a01cb0b62f7b522f4ee334622cf982a upstream.
LTO is not only used for Clang, but maybe also used for Rust, make LTO
case out of CONFIG_CC_HAS_ANNOTATE_TABLEJUMP in Makefile.
This is preparation for later patch, no function changes.
Cc: stable@vger.kernel.org
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Thu Sep 18 19:43:42 2025 +0800
LoongArch: Update help info of ARCH_STRICT_ALIGN
commit f5003098e2f337d8e8a87dc636250e3fa978d9ad upstream.
Loongson-3A6000 and 3C6000 CPUs also support unaligned memory access, so
the current description is out of date to some extent.
Actually, all of Loongson-3 series processors based on LoongArch support
unaligned memory access, this hardware capability is indicated by the bit
20 (UAL) of CPUCFG1 register, update the help info to reflect the reality.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guangshuo Li <202321181@mail.sdu.edu.cn>
Date: Thu Sep 18 19:44:10 2025 +0800
LoongArch: vDSO: Check kcalloc() result in init_vdso()
commit ac398f570724c41e5e039d54e4075519f6af7408 upstream.
Add a NULL-pointer check after the kcalloc() call in init_vdso(). If
allocation fails, return -ENOMEM to prevent a possible dereference of
vdso_info.code_mapping.pages when it is NULL.
Cc: stable@vger.kernel.org
Fixes: 2ed119aef60d ("LoongArch: Set correct size for vDSO code mapping")
Signed-off-by: Guangshuo Li <202321181@mail.sdu.edu.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Hugh Dickins <hughd@google.com>
Date: Mon Sep 8 15:15:03 2025 -0700
mm/gup: check ref_count instead of lru before migration
commit 98c6d259319ecf6e8d027abd3f14b81324b8c0ad upstream.
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup
migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not be
longterm pinned, for example because they reside in a CMA region or
movable zone. This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the list
off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is
relatively easy to observe migration failures when running pKVM (which
uses pin_user_pages() on crosvm's virtual address space to resolve stage-2
page faults from the guest) on a 6.15-based Pixel 6 device and this
results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its
mapping of guest memory prior to the pinning. Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not present
and so the faulting logic allocates a new folio, mlocks it with
mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch
by pagevec"), mlock/munlock operations on a folio (formerly page), are
deferred. For example, mlock_folio() takes an additional reference on the
target folio before placing it into a per-cpu 'folio_batch' for later
processing by mlock_folio_batch(), which drops the refcount once the
operation is complete. Processing of the batches is coupled with the LRU
batch logic and can be forcefully drained with lru_add_drain_all() but as
long as a folio remains unprocessed on the batch, its refcount will be
elevated.
This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration code
fails to migrate a folio due to the elevated refcount from the pending
mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an
lru_add_drain_all() is worth calling, to remove LRU cache references to
make the folio migratable: the LRU flag may be set even while the folio is
held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11
commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding
to LRU batch") tried to make it reliable, by moving LRU flag clearing; but
missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag
clearing to the mlock/munlock batches: if they do benefit from batching,
mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable
all along: some more false positives (unhelpful drains) than before, and
never a guarantee that the folio will prove migratable, but better.
Note on PG_private_2: ceph and nfs are still using the deprecated
PG_private_2 flag, with the aid of netfs and filemap support functions.
Although it is consistently matched by an increment of folio ref_count,
folio_expected_ref_count() intentionally does not recognize it, and ceph
folio migration currently depends on that for PG_private_2 folios to be
rejected. New references to the deprecated flag are discouraged, so do
not add it into the collect_longterm_unpinnable_folios() calculation: but
longterm pinning of transiently PG_private_2 ceph and nfs folios (an
uncommon case) may invoke a redundant lru_add_drain_all(). And this makes
easy the backport to earlier releases: up to and including 6.12, btrfs
also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm:
add folio_expected_ref_count() for reference count calculation").
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com
Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Hugh Dickins <hughd@google.com>
Date: Mon Sep 8 15:16:53 2025 -0700
mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
commit a09a8a1fbb374e0053b97306da9dbc05bd384685 upstream.
In many cases, if collect_longterm_unpinnable_folios() does need to drain
the LRU cache to release a reference, the cache in question is on this
same CPU, and much more efficiently drained by a preliminary local
lru_add_drain(), than the later cross-CPU lru_add_drain_all().
Marked for stable, to counter the increase in lru_add_drain_all()s from
"mm/gup: check ref_count instead of lru before migration". Note for clean
backports: can take 6.16 commit a03db236aebf ("gup: optimize longterm
pin_user_pages() for large folio") first.
Link: https://lkml.kernel.org/r/66f2751f-283e-816d-9530-765db7edc465@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Hugh Dickins <hughd@google.com>
Date: Mon Sep 8 15:23:15 2025 -0700
mm: folio_may_be_lru_cached() unless folio_test_large()
commit 2da6de30e60dd9bb14600eff1cc99df2fa2ddae3 upstream.
mm/swap.c and mm/mlock.c agree to drain any per-CPU batch as soon as a
large folio is added: so collect_longterm_unpinnable_folios() just wastes
effort when calling lru_add_drain[_all]() on a large folio.
But although there is good reason not to batch up PMD-sized folios, we
might well benefit from batching a small number of low-order mTHPs (though
unclear how that "small number" limitation will be implemented).
So ask if folio_may_be_lru_cached() rather than !folio_test_large(), to
insulate those particular checks from future change. Name preferred to
"folio_is_batchable" because large folios can well be put on a batch: it's
just the per-CPU LRU caches, drained much later, which need care.
Marked for stable, to counter the increase in lru_add_drain_all()s from
"mm/gup: check ref_count instead of lru before migration".
Link: https://lkml.kernel.org/r/57d2eaf8-3607-f318-e0c5-be02dce61ad0@google.com
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Hugh Dickins <hughd@google.com>
Date: Mon Sep 8 15:19:17 2025 -0700
mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch"
commit afb99e9f500485160f34b8cad6d3763ada3e80e8 upstream.
This reverts commit 33dfe9204f29: now that
collect_longterm_unpinnable_folios() is checking ref_count instead of lru,
and mlock/munlock do not participate in the revised LRU flag clearing,
those changes are misleading, and enlarge the window during which
mlock/munlock may miss an mlock_count update.
It is possible (I'd hesitate to claim probable) that the greater
likelihood of missed mlock_count updates would explain the "Realtime
threads delayed due to kcompactd0" observed on 6.12 in the Link below. If
that is the case, this reversion will help; but a complete solution needs
also a further patch, beyond the scope of this series.
Included some 80-column cleanup around folio_batch_add_and_move().
The role of folio_test_clear_lru() (before taking per-memcg lru_lock) is
questionable since 6.13 removed mem_cgroup_move_account() etc; but perhaps
there are still some races which need it - not examined here.
Link: https://lore.kernel.org/linux-mm/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
Link: https://lkml.kernel.org/r/05905d7b-ed14-68b1-79d8-bdec30367eba@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Hugh Dickins <hughd@google.com>
Date: Mon Sep 8 15:21:12 2025 -0700
mm: revert "mm: vmscan.c: fix OOM on swap stress test"
commit 8d79ed36bfc83d0583ab72216b7980340478cdfb upstream.
This reverts commit 0885ef470560: that was a fix to the reverted
33dfe9204f29b415bbc0abb1a50642d1ba94f5e9.
Link: https://lkml.kernel.org/r/aa0e9d67-fbcd-9d79-88a1-641dfbe1d9d1@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thomas Fourier <fourier.thomas@gmail.com>
Date: Tue Aug 26 09:58:08 2025 +0200
mmc: mvsdio: Fix dma_unmap_sg() nents value
commit 8ab2f1c35669bff7d7ed1bb16bf5cc989b3e2e17 upstream.
The dma_unmap_sg() functions should be called with the same nents as the
dma_map_sg(), not the value the map function returned.
Fixes: 236caa7cc351 ("mmc: SDIO driver for Marvell SoCs")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Date: Thu Sep 11 10:42:42 2025 +0800
mmc: sdhci-pci-gli: GL9767: Fix initializing the UHS-II interface during a power-on
commit 77a436c93d10d68201bfd4941d1ca3230dfd1f40 upstream.
According to the power structure of IC hardware design for UHS-II
interface, reset control and timing must be added to the initialization
process of powering on the UHS-II interface.
Fixes: 27dd3b82557a ("mmc: sdhci-pci-gli: enable UHS-II mode for GL9767")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Date: Thu Sep 11 10:41:01 2025 +0800
mmc: sdhci-uhs2: Fix calling incorrect sdhci_set_clock() function
commit 09c2b628f6403ad467fc73326a50020590603871 upstream.
Fix calling incorrect sdhci_set_clock() in __sdhci_uhs2_set_ios() when the
vendor defines its own sdhci_set_clock().
Fixes: 10c8298a052b ("mmc: sdhci-uhs2: add set_ios()")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Date: Thu Sep 11 10:40:20 2025 +0800
mmc: sdhci: Move the code related to setting the clock from sdhci_set_ios_common() into sdhci_set_ios()
commit 7b7e71683b4ccbe0dbd7d434707623327e852f20 upstream.
The sdhci_set_clock() is called in sdhci_set_ios_common() and
__sdhci_uhs2_set_ios(). According to Section 3.13.2 "Card Interface
Detection Sequence" of the SD Host Controller Standard Specification
Version 7.00, the SD clock is supplied after power is supplied, so we only
need one in __sdhci_uhs2_set_ios(). Let's move the code related to setting
the clock from sdhci_set_ios_common() into sdhci_set_ios() and modify
the parameters passed to sdhci_set_clock() in __sdhci_uhs2_set_ios().
Fixes: 10c8298a052b ("mmc: sdhci-uhs2: add set_ios()")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri Sep 19 23:49:22 2025 +0200
mptcp: pm: nl: announce deny-join-id0 flag
commit 2293c57484ae64c9a3c847c8807db8c26a3a4d41 upstream.
During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't.
The RFC8684 [1] is strict about that:
(...) therefore the receiver MUST NOT try to open any additional
subflows toward this address and port.
So it is important to tell the userspace about that as it is responsible
for the respect of this flag.
When a new connection is created and established, the Netlink events
now contain the existing but not currently used 'flags' attribute. When
MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows
to the initial IP address and port -- info that are also part of the
event -- can be established.
Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6 [1]
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflicts in mptcp_pm.yaml, because the indentation has been modified
in commit ec362192aa9e ("netlink: specs: fix up indentation errors"),
which is not in this version. Applying the same modifications, but at
a different level. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri Sep 12 14:25:50 2025 +0200
mptcp: propagate shutdown to subflows when possible
commit f755be0b1ff429a2ecf709beeb1bcd7abc111c2b upstream.
When the MPTCP DATA FIN have been ACKed, there is no more MPTCP related
metadata to exchange, and all subflows can be safely shutdown.
Before this patch, the subflows were actually terminated at 'close()'
time. That's certainly fine most of the time, but not when the userspace
'shutdown()' a connection, without close()ing it. When doing so, the
subflows were staying in LAST_ACK state on one side -- and consequently
in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP
socket.
Now, when the DATA FIN have been ACKed, all subflows are shutdown. A
consequence of this is that the TCP 'FIN' flag can be set earlier now,
but the end result is the same. This affects the packetdrill tests
looking at the end of the MPTCP connections, but for a good reason.
Note that tcp_shutdown() will check the subflow state, so no need to do
that again before calling it.
Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Cc: stable@vger.kernel.org
Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-1-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri Sep 12 14:52:20 2025 +0200
mptcp: set remote_deny_join_id0 on SYN recv
[ Upstream commit 96939cec994070aa5df852c10fad5fc303a97ea3 ]
When a SYN containing the 'C' flag (deny join id0) was received, this
piece of information was not propagated to the path-manager.
Even if this flag is mainly set on the server side, a client can also
tell the server it cannot try to establish new subflows to the client's
initial IP address and port. The server's PM should then record such
info when received, and before sending events about the new connection.
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-1-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri Sep 12 14:52:23 2025 +0200
mptcp: tfo: record 'deny join id0' info
[ Upstream commit 92da495cb65719583aa06bc946aeb18a10e1e6e2 ]
When TFO is used, the check to see if the 'C' flag (deny join id0) was
set was bypassed.
This flag can be set when TFO is used, so the check should also be done
when TFO is used.
Note that the set_fully_established label is also used when a 4th ACK is
received. In this case, deny_join_id0 will not be set.
Fixes: dfc8d0603033 ("mptcp: implement delayed seq generation for passive fastopen")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-4-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Li Tian <litian@redhat.com>
Date: Wed Sep 10 08:37:32 2025 +0800
net/mlx5: Not returning mlx5_link_info table when speed is unknown
[ Upstream commit 5577352b55833d0f4350eb5d62eda2df09e84922 ]
Because mlx5e_link_info and mlx5e_ext_link_info have holes
e.g. Azure mlx5 reports PTYS 19. Do not return it unless speed
is retrieved successfully.
Fixes: 65a5d35571849 ("net/mlx5: Refactor link speed handling with mlx5_link_info struct")
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Li Tian <litian@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250910003732.5973-1-litian@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lama Kayal <lkayal@nvidia.com>
Date: Mon Sep 15 15:24:34 2025 +0300
net/mlx5e: Add a miss level for ipsec crypto offload
[ Upstream commit 7601a0a46216f4ba05adff2de75923b4e8e585c2 ]
The cited commit adds a miss table for switchdev mode. But it
uses the same level as policy table. Will hit the following error
when running command:
# ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \
esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \
0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \
mode tunnel offload dev enp8s0f0 dir in
Error: mlx5_core: Device failed to offload this state.
The dmesg error is:
mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22
Fix it by adding a new miss level to avoid the error.
Fixes: 7d9e292ecd67 ("net/mlx5e: Move IPSec policy check after decryption")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jianbo Liu <jianbol@nvidia.com>
Date: Mon Sep 15 15:24:32 2025 +0300
net/mlx5e: Harden uplink netdev access against device unbind
[ Upstream commit 6b4be64fd9fec16418f365c2d8e47a7566e9eba5 ]
The function mlx5_uplink_netdev_get() gets the uplink netdevice
pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can
be removed and its pointer cleared when unbound from the mlx5_core.eth
driver. This results in a NULL pointer, causing a kernel panic.
BUG: unable to handle page fault for address: 0000000000001300
at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core]
Call Trace:
<TASK>
mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core]
esw_offloads_enable+0x593/0x910 [mlx5_core]
mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core]
devlink_nl_eswitch_set_doit+0x60/0xd0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x183/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
do_syscall_64+0x53/0x1d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Ensure the pointer is valid before use by checking it for NULL. If it
is valid, immediately call netdev_hold() to take a reference, and
preventing the netdevice from being freed while it is in use.
Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Anderson Nascimento <anderson@allelesecurity.com>
Date: Thu Sep 11 20:07:44 2025 -0300
net/tcp: Fix a NULL pointer dereference when using TCP-AO with TCP_REPAIR
[ Upstream commit 2e7bba08923ebc675b1f0e0e0959e68e53047838 ]
A NULL pointer dereference can occur in tcp_ao_finish_connect() during a
connect() system call on a socket with a TCP-AO key added and TCP_REPAIR
enabled.
The function is called with skb being NULL and attempts to dereference it
on tcp_hdr(skb)->seq without a prior skb validation.
Fix this by checking if skb is NULL before dereferencing it.
The commentary is taken from bpf_skops_established(), which is also called
in the same flow. Unlike the function being patched,
bpf_skops_established() validates the skb before dereferencing it.
int main(void){
struct sockaddr_in sockaddr;
struct tcp_ao_add tcp_ao;
int sk;
int one = 1;
memset(&sockaddr,'\0',sizeof(sockaddr));
memset(&tcp_ao,'\0',sizeof(tcp_ao));
sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
sockaddr.sin_family = AF_INET;
memcpy(tcp_ao.alg_name,"cmac(aes128)",12);
memcpy(tcp_ao.key,"ABCDEFGHABCDEFGH",16);
tcp_ao.keylen = 16;
memcpy(&tcp_ao.addr,&sockaddr,sizeof(sockaddr));
setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &tcp_ao,
sizeof(tcp_ao));
setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one));
sockaddr.sin_family = AF_INET;
sockaddr.sin_port = htobe16(123);
inet_aton("127.0.0.1", &sockaddr.sin_addr);
connect(sk,(struct sockaddr *)&sockaddr,sizeof(sockaddr));
return 0;
}
$ gcc tcp-ao-nullptr.c -o tcp-ao-nullptr -Wall
$ unshare -Urn
BUG: kernel NULL pointer dereference, address: 00000000000000b6
PGD 1f648d067 P4D 1f648d067 PUD 1982e8067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:tcp_ao_finish_connect (net/ipv4/tcp_ao.c:1182)
Fixes: 7c2ffaf21bd6 ("net/tcp: Calculate TCP-AO traffic keys")
Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250911230743.2551-3-anderson@allelesecurity.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Sep 17 13:53:37 2025 +0000
net: clear sk->sk_ino in sk_set_socket(sk, NULL)
[ Upstream commit 87ebb628a5acb892eba41ef1d8989beb8f036034 ]
Andrei Vagin reported that blamed commit broke CRIU.
Indeed, while we want to keep sk_uid unchanged when a socket
is cloned, we want to clear sk->sk_ino.
Otherwise, sock_diag might report multiple sockets sharing
the same inode number.
Move the clearing part from sock_orphan() to sk_set_socket(sk, NULL),
called both from sock_orphan() and sk_clone_lock().
Fixes: 5d6b58c932ec ("net: lockless sock_i_ino()")
Closes: https://lore.kernel.org/netdev/aMhX-VnXkYDpKd9V@google.com/
Closes: https://github.com/checkpoint-restore/criu/issues/2744
Reported-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@google.com>
Link: https://patch.msgid.link/20250917135337.1736101-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Sep 9 18:54:15 2025 +0200
net: dst_metadata: fix IP_DF bit not extracted from tunnel headers
[ Upstream commit a9888628cb2c768202a4530e2816da1889cc3165 ]
Both OVS and TC flower allow extracting and matching on the DF bit of
the outer IP header via OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT in the
OVS_KEY_ATTR_TUNNEL and TCA_FLOWER_KEY_FLAGS_TUNNEL_DONT_FRAGMENT in
the TCA_FLOWER_KEY_ENC_FLAGS respectively. Flow dissector extracts
this information as FLOW_DIS_F_TUNNEL_DONT_FRAGMENT from the tunnel
info key.
However, the IP_TUNNEL_DONT_FRAGMENT_BIT in the tunnel key is never
actually set, because the tunneling code doesn't actually extract it
from the IP header. OAM and CRIT_OPT are extracted by the tunnel
implementation code, same code also sets the KEY flag, if present.
UDP tunnel core takes care of setting the CSUM flag if the checksum
is present in the UDP header, but the DONT_FRAGMENT is not handled at
any layer.
Fix that by checking the bit and setting the corresponding flag while
populating the tunnel info in the IP layer where it belongs.
Not using __assign_bit as we don't really need to clear the bit in a
just initialized field. It also doesn't seem like using __assign_bit
will make the code look better.
Clearly, users didn't rely on this functionality for anything very
important until now. The reason why this doesn't break OVS logic is
that it only matches on what kernel previously parsed out and if kernel
consistently reports this bit as zero, OVS will only match on it to be
zero, which sort of works. But it is still a bug that the uAPI reports
and allows matching on the field that is not actually checked in the
packet. And this is causing misleading -df reporting in OVS datapath
flows, while the tunnel traffic actually has the bit set in most cases.
This may also cause issues if a hardware properly implements support
for tunnel flag matching as it will disagree with the implementation
in a software path of TC flower.
Fixes: 7d5437c709de ("openvswitch: Add tunneling interface.")
Fixes: 1d17568e74de ("net/sched: cls_flower: add support for matching tunnel control flags")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250909165440.229890-2-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Alexey Nepomnyashih <sdl@nppct.ru>
Date: Wed Sep 17 15:30:58 2025 +0000
net: liquidio: fix overflow in octeon_init_instr_queue()
[ Upstream commit cca7b1cfd7b8a0eff2a3510c5e0f10efe8fa3758 ]
The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.
Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yeounsu Moon <yyyynoom@gmail.com>
Date: Sat Sep 13 15:01:36 2025 +0900
net: natsemi: fix `rx_dropped` double accounting on `netif_rx()` failure
[ Upstream commit 93ab4881a4e2b9657bdce4b8940073bfb4ed5eab ]
`netif_rx()` already increments `rx_dropped` core stat when it fails.
The driver was also updating `ndev->stats.rx_dropped` in the same path.
Since both are reported together via `ip -s -s` command, this resulted
in drops being counted twice in user-visible stats.
Keep the driver update on `if (unlikely(!skb))`, but skip it after
`netif_rx()` errors.
Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250913060135.35282-3-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hans de Goede <hansg@kernel.org>
Date: Sat Sep 13 13:35:15 2025 +0200
net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
commit b6f56a44e4c1014b08859dcf04ed246500e310e5 upstream.
Since commit 7d5e9737efda ("net: rfkill: gpio: get the name and type from
device property") rfkill_find_type() gets called with the possibly
uninitialized "const char *type_name;" local variable.
On x86 systems when rfkill-gpio binds to a "BCM4752" or "LNV4752"
acpi_device, the rfkill->type is set based on the ACPI acpi_device_id:
rfkill->type = (unsigned)id->driver_data;
and there is no "type" property so device_property_read_string() will fail
and leave type_name uninitialized, leading to a potential crash.
rfkill_find_type() does accept a NULL pointer, fix the potential crash
by initializing type_name to NULL.
Note likely sofar this has not been caught because:
1. Not many x86 machines actually have a "BCM4752"/"LNV4752" acpi_device
2. The stack happened to contain NULL where type_name is stored
Fixes: 7d5e9737efda ("net: rfkill: gpio: get the name and type from device property")
Cc: stable@vger.kernel.org
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20250913113515.21698-1-hansg@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Nathan Chancellor <nathan@kernel.org>
Date: Sat Sep 6 23:43:34 2025 +0900
nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
commit 025e87f8ea2ae3a28bf1fe2b052bfa412c27ed4a upstream.
When accessing one of the files under /sys/fs/nilfs2/features when
CONFIG_CFI_CLANG is enabled, there is a CFI violation:
CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d)
...
Call Trace:
<TASK>
sysfs_kf_seq_show+0x2a6/0x390
? __cfi_kobj_attr_show+0x10/0x10
kernfs_seq_show+0x104/0x15b
seq_read_iter+0x580/0xe2b
...
When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype
is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops. When
nilfs_feature_attr_group is added to that kobject via
sysfs_create_group(), the kernfs_ops of each files is sysfs_file_kfops_rw,
which will call sysfs_kf_seq_show() when ->seq_show() is called.
sysfs_kf_seq_show() in turn calls kobj_attr_show() through
->sysfs_ops->show(). kobj_attr_show() casts the provided attribute out to
a 'struct kobj_attribute' via container_of() and calls ->show(), resulting
in the CFI violation since neither nilfs_feature_revision_show() nor
nilfs_feature_README_show() match the prototype of ->show() in 'struct
kobj_attribute'.
Resolve the CFI violation by adjusting the second parameter in
nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct
kobj_attribute' to match the expected prototype.
Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com
Fixes: aebe17f68444 ("nilfs2: add /sys/fs/nilfs2/features group")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Aug 25 15:32:49 2025 +0200
nvme: fix PI insert on write
[ Upstream commit 7ac3c2889bc060c3f67cf44df0dbb093a835c176 ]
I recently ran into an issue where the PI generated using the block layer
integrity code differs from that from a kernel using the PRACT fallback
when the block layer integrity code is disabled, and I tracked this down
to us using PRACT incorrectly.
The NVM Command Set Specification (section 5.33 in 1.2, similar in older
versions) specifies the PRACT insert behavior as:
Inserted protection information consists of the computed CRC for the
protection information format (refer to section 5.3.1) in the Guard
field, the LBAT field value in the Application Tag field, the LBST
field value in the Storage Tag field, if defined, and the computed
reference tag in the Logical Block Reference Tag.
Where the computed reference tag is defined as following for type 1 and
type 2 using the text below that is duplicated in the respective bullet
points:
the value of the computed reference tag for the first logical block of
the command is the value contained in the Initial Logical Block
Reference Tag (ILBRT) or Expected Initial Logical Block Reference Tag
(EILBRT) field in the command, and the computed reference tag is
incremented for each subsequent logical block.
So we need to set ILBRT field, but we currently don't. Interestingly
this works fine on my older type 1 formatted SSD, but Qemu trips up on
this. We already set ILBRT for Write Same since commit aeb7bb061be5
("nvme: set the PRACT bit when using Write Zeroes with T10 PI").
To ease this, move the PI type check into nvme_set_ref_tag.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Thu Sep 18 19:43:36 2025 +0800
objtool/LoongArch: Mark special atomic instruction as INSN_BUG type
commit 539d7344d4feaea37e05863e9aa86bd31f28e46f upstream.
When compiling with LLVM and CONFIG_RUST is set, there exists the
following objtool warning:
rust/compiler_builtins.o: warning: objtool: __rust__unordsf2(): unexpected end of section .text.unlikely.
objdump shows that the end of section .text.unlikely is an atomic
instruction:
amswap.w $zero, $ra, $zero
According to the LoongArch Reference Manual, if the amswap.w atomic
memory access instruction has the same register number as rd and rj,
the execution will trigger an Instruction Non-defined Exception, so
mark the above instruction as INSN_BUG type to fix the warning.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Thu Sep 18 19:43:36 2025 +0800
objtool/LoongArch: Mark types based on break immediate code
commit baad7830ee9a56756b3857348452fe756cb0a702 upstream.
If the break immediate code is 0, it should mark the type as
INSN_TRAP. If the break immediate code is 1, it should mark the
type as INSN_BUG.
While at it, format the code style and add the code comment for nop.
Cc: stable@vger.kernel.org
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sathesh B Edara <sedara@marvell.com>
Date: Tue Sep 16 06:32:07 2025 -0700
octeon_ep: fix VF MAC address lifecycle handling
[ Upstream commit a72175c985132885573593222a7b088cf49b07ae ]
Currently, VF MAC address info is not updated when the MAC address is
configured from VF, and it is not cleared when the VF is removed. This
leads to stale or missing MAC information in the PF, which may cause
incorrect state tracking or inconsistencies when VFs are hot-plugged
or reassigned.
Fix this by:
- storing the VF MAC address in the PF when it is set from VF
- clearing the stored VF MAC address when the VF is removed
This ensures that the PF always has correct VF MAC state.
Fixes: cde29af9e68e ("octeon_ep: add PF-VF mailbox communication")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250916133207.21737-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kamal Heib <kheib@redhat.com>
Date: Thu Sep 11 18:36:10 2025 -0400
octeon_ep: Validate the VF ID
[ Upstream commit af82e857df5dd883a4867bcaf5dde041e57a4e33 ]
Add a helper to validate the VF ID and use it in the VF ndo ops to
prevent accessing out-of-range entries.
Without this check, users can run commands such as:
# ip link show dev enp135s0
2: enp135s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:00:00:01:01:00 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off
vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off
# ip link set dev enp135s0 vf 4 mac 00:00:00:00:00:14
# echo $?
0
even though VF 4 does not exist, which results in silent success instead
of returning an error.
Fixes: 8a241ef9b9b8 ("octeon_ep: add ndo ops for VFs in PF driver")
Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250911223610.1803144-1-kheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Duoming Zhou <duoming@zju.edu.cn>
Date: Wed Sep 17 14:38:53 2025 +0800
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
[ Upstream commit f8b4687151021db61841af983f1cb7be6915d4ef ]
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, the synctstamp_work is cyclic, the likelihood of triggering
the bug is nonnegligible.
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
otx2_remove() |
otx2_ptp_destroy() | otx2_sync_tstamp()
cancel_delayed_work() |
kfree(ptp) |
| ptp = container_of(...); //UAF
| ptp-> //UAF
This is confirmed by a KASAN report:
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_report+0xcf/0x610
? __run_timer_base.part.0+0x7d7/0x8c0
kasan_report+0xb8/0xf0
? __run_timer_base.part.0+0x7d7/0x8c0
__run_timer_base.part.0+0x7d7/0x8c0
? __pfx___run_timer_base.part.0+0x10/0x10
? __pfx_read_tsc+0x10/0x10
? ktime_get+0x60/0x140
? lapic_next_event+0x11/0x20
? clockevents_program_event+0x1d4/0x2a0
run_timer_softirq+0xd1/0x190
handle_softirqs+0x16a/0x550
irq_exit_rcu+0xaf/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
</IRQ>
...
Allocated by task 1:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x7f/0x90
otx2_ptp_init+0xb1/0x860
otx2_probe+0x4eb/0xc30
local_pci_probe+0xdc/0x190
pci_device_probe+0x2fe/0x470
really_probe+0x1ca/0x5c0
__driver_probe_device+0x248/0x310
driver_probe_device+0x44/0x120
__driver_attach+0xd2/0x310
bus_for_each_dev+0xed/0x170
bus_add_driver+0x208/0x500
driver_register+0x132/0x460
do_one_initcall+0x89/0x300
kernel_init_freeable+0x40d/0x720
kernel_init+0x1a/0x150
ret_from_fork+0x10c/0x1a0
ret_from_fork_asm+0x1a/0x30
Freed by task 136:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3a/0x60
__kasan_slab_free+0x3f/0x50
kfree+0x137/0x370
otx2_ptp_destroy+0x38/0x80
otx2_remove+0x10d/0x4c0
pci_device_remove+0xa6/0x1d0
device_release_driver_internal+0xf8/0x210
pci_stop_bus_device+0x105/0x150
pci_stop_and_remove_bus_device_locked+0x15/0x30
remove_store+0xcc/0xe0
kernfs_fop_write_iter+0x2c3/0x440
vfs_write+0x871/0xd70
ksys_write+0xee/0x1c0
do_syscall_64+0xac/0x280
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.
This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.
Fixes: 2958d17a8984 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Wed Aug 13 17:50:14 2025 +0200
pcmcia: omap_cf: Mark driver struct with __refdata to prevent section mismatch
[ Upstream commit d1dfcdd30140c031ae091868fb5bed084132bca1 ]
As described in the added code comment, a reference to .exit.text is ok
for drivers registered via platform_driver_probe(). Make this explicit
to prevent the following section mismatch warning
WARNING: modpost: drivers/pcmcia/omap_cf: section mismatch in reference: omap_cf_driver+0x4 (section: .data) -> omap_cf_remove (section: .exit.text)
that triggers on an omap1_defconfig + CONFIG_OMAP_CF=m build.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ian Rogers <irogers@google.com>
Date: Sun Sep 14 11:18:08 2025 -0700
perf maps: Ensure kmap is set up for all inserts
[ Upstream commit 20c9ccffccd61b37325a0519fb6d485caeecf7fa ]
__maps__fixup_overlap_and_insert may split or directly insert a map,
when doing this the map may need to have a kmap set up for the sake of
the kmaps. The missing kmap set up fails the check_invariants test in
maps, later "Internal error" reports from map__kmap and ultimately
causes segfaults.
Similar fixes were added in commit e0e4e0b8b7fa ("perf maps: Add
missing map__set_kmap_maps() when replacing a kernel map") and commit
25d9c0301d36 ("perf maps: Set the kmaps for newly created/added kernel
maps") but they missed cases. To try to reduce the risk of this,
update the kmap directly following any manual insert. This identified
another problem in maps__copy_from.
Fixes: e0e4e0b8b7fa ("perf maps: Add missing map__set_kmap_maps() when replacing a kernel map")
Fixes: 25d9c0301d36 ("perf maps: Set the kmaps for newly created/added kernel maps")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Antheas Kapenekakis <lkml@antheas.dev>
Date: Fri Aug 8 17:47:10 2025 +0200
platform/x86: asus-wmi: Fix ROG button mapping, tablet mode on ASUS ROG Z13
commit 132bfcd24925d4d4531a19b87acb8474be82a017 upstream.
On commit 9286dfd5735b ("platform/x86: asus-wmi: Fix spurious rfkill on
UX8406MA"), Mathieu adds a quirk for the Zenbook Duo to ignore the code
0x5f (WLAN button disable). On that laptop, this code is triggered when
the device keyboard is attached.
On the ASUS ROG Z13 2025, this code is triggered when pressing the side
button of the device, which is used to open Armoury Crate in Windows.
As this is becoming a pattern, where newer Asus laptops use this keycode
for emitting events, let's convert the wlan ignore quirk to instead
allow emitting codes, so that userspace programs can listen to it and
so that it does not interfere with the rfkill state.
With this patch, the Z13 wil emit KEY_PROG3 and the Duo will remain
unchanged and emit no event. While at it, add a quirk for the Z13 to
switch into tablet mode when removing the keyboard.
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://lore.kernel.org/r/20250808154710.8981-2-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Antheas Kapenekakis <lkml@antheas.dev>
Date: Tue Sep 16 09:28:18 2025 +0200
platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
commit 225d1ee0f5ba3218d1814d36564fdb5f37b50474 upstream.
It turns out that the dual screen models use 0x5E for attaching and
detaching the keyboard instead of 0x5F. So, re-add the codes by
reverting commit cf3940ac737d ("platform/x86: asus-wmi: Remove extra
keys from ignore_key_wlan quirk"). For our future reference, add a
comment next to 0x5E indicating that it is used for that purpose.
Fixes: cf3940ac737d ("platform/x86: asus-wmi: Remove extra keys from ignore_key_wlan quirk")
Reported-by: Rahul Chandra <rahul@chandra.net>
Closes: https://lore.kernel.org/all/10020-68c90c80-d-4ac6c580@106290038/
Cc: stable@kernel.org
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://patch.msgid.link/20250916072818.196462-1-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: H. Nikolaus Schaller <hns@goldelico.com>
Date: Sat Aug 23 12:34:56 2025 +0200
power: supply: bq27xxx: fix error return in case of no bq27000 hdq battery
commit 2c334d038466ac509468fbe06905a32d202117db upstream.
Since commit
commit f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy")
the console log of some devices with hdq enabled but no bq27000 battery
(like e.g. the Pandaboard) is flooded with messages like:
[ 34.247833] power_supply bq27000-battery: driver failed to report 'status' property: -1
as soon as user-space is finding a /sys entry and trying to read the
"status" property.
It turns out that the offending commit changes the logic to now return the
value of cache.flags if it is <0. This is likely under the assumption that
it is an error number. In normal errors from bq27xxx_read() this is indeed
the case.
But there is special code to detect if no bq27000 is installed or accessible
through hdq/1wire and wants to report this. In that case, the cache.flags
are set historically by
commit 3dd843e1c26a ("bq27000: report missing device better.")
to constant -1 which did make reading properties return -ENODEV. So everything
appeared to be fine before the return value was passed upwards.
Now the -1 is returned as -EPERM instead of -ENODEV, triggering the error
condition in power_supply_format_property() which then floods the console log.
So we change the detection of missing bq27000 battery to simply set
cache.flags = -ENODEV
instead of -1.
Fixes: f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy")
Cc: Jerry Lv <Jerry.Lv@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/692f79eb6fd541adb397038ea6e750d4de2deddf.1755945297.git.hns@goldelico.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: H. Nikolaus Schaller <hns@goldelico.com>
Date: Sat Aug 23 12:34:57 2025 +0200
power: supply: bq27xxx: restrict no-battery detection to bq27000
commit 1e451977e1703b6db072719b37cd1b8e250b9cc9 upstream.
There are fuel gauges in the bq27xxx series (e.g. bq27z561) which may in some
cases report 0xff as the value of BQ27XXX_REG_FLAGS that should not be
interpreted as "no battery" like for a disconnected battery with some built
in bq27000 chip.
So restrict the no-battery detection originally introduced by
commit 3dd843e1c26a ("bq27000: report missing device better.")
to the bq27000.
There is no need to backport further because this was hidden before
commit f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy")
Fixes: f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy")
Suggested-by: Jerry Lv <Jerry.Lv@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/dd979fa6855fd051ee5117016c58daaa05966e24.1755945297.git.hns@goldelico.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Date: Wed Sep 10 16:29:16 2025 +1000
qed: Don't collect too many protection override GRC elements
[ Upstream commit 56c0a2a9ddc2f5b5078c5fb0f81ab76bbc3d4c37 ]
In the protection override dump path, the firmware can return far too
many GRC elements, resulting in attempting to write past the end of the
previously-kmalloc'ed dump buffer.
This will result in a kernel panic with reason:
BUG: unable to handle kernel paging request at ADDRESS
where "ADDRESS" is just past the end of the protection override dump
buffer. The start address of the buffer is:
p_hwfn->cdev->dbg_features[DBG_FEATURE_PROTECTION_OVERRIDE].dump_buf
and the size of the buffer is buf_size in the same data structure.
The panic can be arrived at from either the qede Ethernet driver path:
[exception RIP: qed_grc_dump_addr_range+0x108]
qed_protection_override_dump at ffffffffc02662ed [qed]
qed_dbg_protection_override_dump at ffffffffc0267792 [qed]
qed_dbg_feature at ffffffffc026aa8f [qed]
qed_dbg_all_data at ffffffffc026b211 [qed]
qed_fw_fatal_reporter_dump at ffffffffc027298a [qed]
devlink_health_do_dump at ffffffff82497f61
devlink_health_report at ffffffff8249cf29
qed_report_fatal_error at ffffffffc0272baf [qed]
qede_sp_task at ffffffffc045ed32 [qede]
process_one_work at ffffffff81d19783
or the qedf storage driver path:
[exception RIP: qed_grc_dump_addr_range+0x108]
qed_protection_override_dump at ffffffffc068b2ed [qed]
qed_dbg_protection_override_dump at ffffffffc068c792 [qed]
qed_dbg_feature at ffffffffc068fa8f [qed]
qed_dbg_all_data at ffffffffc0690211 [qed]
qed_fw_fatal_reporter_dump at ffffffffc069798a [qed]
devlink_health_do_dump at ffffffff8aa95e51
devlink_health_report at ffffffff8aa9ae19
qed_report_fatal_error at ffffffffc0697baf [qed]
qed_hw_err_notify at ffffffffc06d32d7 [qed]
qed_spq_post at ffffffffc06b1011 [qed]
qed_fcoe_destroy_conn at ffffffffc06b2e91 [qed]
qedf_cleanup_fcport at ffffffffc05e7597 [qedf]
qedf_rport_event_handler at ffffffffc05e7bf7 [qedf]
fc_rport_work at ffffffffc02da715 [libfc]
process_one_work at ffffffff8a319663
Resolve this by clamping the firmware's return value to the maximum
number of legal elements the firmware should return.
Fixes: d52c89f120de8 ("qed*: Utilize FW 8.37.2.0")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/f8e1182934aa274c18d0682a12dbaf347595469c.1757485536.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Håkon Bugge <haakon.bugge@oracle.com>
Date: Thu Sep 11 15:33:34 2025 +0200
rds: ib: Increment i_fastreg_wrs before bailing out
commit 4351ca3fcb3ffecf12631b4996bf085a2dad0db6 upstream.
We need to increment i_fastreg_wrs before we bail out from
rds_ib_post_reg_frmr().
We have a fixed budget of how many FRWR operations that can be
outstanding using the dedicated QP used for memory registrations and
de-registrations. This budget is enforced by the atomic_t
i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will
"leak" the possibility of posting an FRWR operation, and if that
accumulates, no FRWR operation can be carried out.
Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Fixes: 3a2886cca703 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250911133336.451212-1-haakon.bugge@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tariq Toukan <tariqt@nvidia.com>
Date: Wed Sep 17 16:48:54 2025 +0300
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
[ Upstream commit 3fbfe251cc9f6d391944282cdb9bcf0bd02e01f8 ]
This reverts commit d24341740fe48add8a227a753e68b6eedf4b385a.
It causes errors when trying to configure QoS, as well as
loss of L2 connectivity (on multi-host devices).
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/20250910170011.70528106@kernel.org
Fixes: d24341740fe4 ("net/mlx5e: Update and set Xon/Xoff upon port speed set")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Andrea Righi <arighi@nvidia.com>
Date: Fri Sep 12 18:14:38 2025 +0200
Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
commit 0b47b6c3543efd65f2e620e359b05f4938314fbd upstream.
scx_bpf_reenqueue_local() can be called from ops.cpu_release() when a
CPU is taken by a higher scheduling class to give tasks queued to the
CPU's local DSQ a chance to be migrated somewhere else, instead of
waiting indefinitely for that CPU to become available again.
In doing so, we decided to skip migration-disabled tasks, under the
assumption that they cannot be migrated anyway.
However, when a higher scheduling class preempts a CPU, the running task
is always inserted at the head of the local DSQ as a migration-disabled
task. This means it is always skipped by scx_bpf_reenqueue_local(), and
ends up being confined to the same CPU even if that CPU is heavily
contended by other higher scheduling class tasks.
As an example, let's consider the following scenario:
$ schedtool -a 0,1, -e yes > /dev/null
$ sudo schedtool -F -p 99 -a 0, -e \
stress-ng -c 1 --cpu-load 99 --cpu-load-slice 1000
The first task (SCHED_EXT) can run on CPU0 or CPU1. The second task
(SCHED_FIFO) is pinned to CPU0 and consumes ~99% of it. If the SCHED_EXT
task initially runs on CPU0, it will remain there because it always sees
CPU0 as "idle" in the short gaps left by the RT task, resulting in ~1%
utilization while CPU1 stays idle:
0[||||||||||||||||||||||100.0%] 8[ 0.0%]
1[ 0.0%] 9[ 0.0%]
2[ 0.0%] 10[ 0.0%]
3[ 0.0%] 11[ 0.0%]
4[ 0.0%] 12[ 0.0%]
5[ 0.0%] 13[ 0.0%]
6[ 0.0%] 14[ 0.0%]
7[ 0.0%] 15[ 0.0%]
PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command
1067 root RT 0 R 0 99.0 0.2 0:31.16 stress-ng-cpu [run]
975 arighi 20 0 R 0 1.0 0.0 0:26.32 yes
By allowing scx_bpf_reenqueue_local() to re-enqueue migration-disabled
tasks, the scheduler can choose to migrate them to other CPUs (CPU1 in
this case) via ops.enqueue(), leading to better CPU utilization:
0[||||||||||||||||||||||100.0%] 8[ 0.0%]
1[||||||||||||||||||||||100.0%] 9[ 0.0%]
2[ 0.0%] 10[ 0.0%]
3[ 0.0%] 11[ 0.0%]
4[ 0.0%] 12[ 0.0%]
5[ 0.0%] 13[ 0.0%]
6[ 0.0%] 14[ 0.0%]
7[ 0.0%] 15[ 0.0%]
PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command
577 root RT 0 R 0 100.0 0.2 0:23.17 stress-ng-cpu [run]
555 arighi 20 0 R 1 100.0 0.0 0:28.67 yes
It's debatable whether per-CPU tasks should be re-enqueued as well, but
doing so is probably safer: the scheduler can recognize re-enqueued
tasks through the %SCX_ENQ_REENQ flag, reassess their placement, and
either put them back at the head of the local DSQ or let another task
attempt to take the CPU.
This also prevents giving per-CPU tasks an implicit priority boost,
which would otherwise make them more likely to reclaim CPUs preempted by
higher scheduling classes.
Fixes: 97e13ecb02668 ("sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: David Howells <dhowells@redhat.com>
Date: Thu Sep 11 23:58:16 2025 +0100
rxrpc: Fix unhandled errors in rxgk_verify_packet_integrity()
[ Upstream commit 64863f4ca4945bdb62ce2b30823f39ea9fe95415 ]
rxgk_verify_packet_integrity() may get more errors than just -EPROTO from
rxgk_verify_mic_skb(). Pretty much anything other than -ENOMEM constitutes
an unrecoverable error. In the case of -ENOMEM, we can just drop the
packet and wait for a retransmission.
Similar happens with rxgk_decrypt_skb() and its callers.
Fix rxgk_decrypt_skb() or rxgk_verify_mic_skb() to return a greater variety
of abort codes and fix their callers to abort the connection on any error
apart from -ENOMEM.
Also preclear the variables used to hold the abort code returned from
rxgk_decrypt_skb() or rxgk_verify_mic_skb() to eliminate uninitialised
variable warnings.
Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lists.infradead.org/pipermail/linux-afs/2025-April/009739.html
Closes: https://lists.infradead.org/pipermail/linux-afs/2025-April/009740.html
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/2038804.1757631496@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: David Howells <dhowells@redhat.com>
Date: Fri Sep 12 00:06:17 2025 +0100
rxrpc: Fix untrusted unsigned subtract
[ Upstream commit 2429a197648178cd4dc930a9d87c13c547460564 ]
Fix the following Smatch static checker warning:
net/rxrpc/rxgk_app.c:65 rxgk_yfs_decode_ticket()
warn: untrusted unsigned subtract. 'ticket_len - 10 * 4'
by prechecking the length of what we're trying to extract in two places in
the token and decoding for a response packet.
Also use sizeof() on the struct we're extracting rather specifying the size
numerically to be consistent with the other related statements.
Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lists.infradead.org/pipermail/linux-afs/2025-September/010135.html
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/2039268.1757631977@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: SeongJae Park <sj@kernel.org>
Date: Sun Sep 21 10:11:19 2025 -0400
samples/damon/mtier: avoid starting DAMON before initialization
[ Upstream commit c62cff40481c037307a13becbda795f7afdcfebd ]
Commit 964314344eab ("samples/damon/mtier: support boot time enable
setup") is somehow incompletely applying the origin patch [1]. It is
missing the part that avoids starting DAMON before module initialization.
Probably a mistake during a merge has happened. Fix it by applying the
missed part again.
Link: https://lkml.kernel.org/r/20250909022238.2989-4-sj@kernel.org
Link: https://lore.kernel.org/20250706193207.39810-4-sj@kernel.org [1]
Fixes: 964314344eab ("samples/damon/mtier: support boot time enable setup")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeongJae Park <sj@kernel.org>
Date: Mon Sep 8 19:22:37 2025 -0700
samples/damon/prcl: avoid starting DAMON before initialization
commit e6b733ca2f99e968d696c2e812c8eb8e090bf37b upstream.
Commit 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash") is
somehow incompletely applying the origin patch [1]. It is missing the
part that avoids starting DAMON before module initialization. Probably a
mistake during a merge has happened. Fix it by applying the missed part
again.
Link: https://lkml.kernel.org/r/20250909022238.2989-3-sj@kernel.org
Link: https://lore.kernel.org/20250706193207.39810-3-sj@kernel.org [1]
Fixes: 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: SeongJae Park <sj@kernel.org>
Date: Sun Sep 21 10:11:17 2025 -0400
samples/damon/prcl: fix boot time enable crash
[ Upstream commit 2780505ec2b42c07853b34640bc63279ac2bb53b ]
If 'enable' parameter of the 'prcl' DAMON sample module is set at boot
time via the kernel command line, memory allocation is tried before the
slab is initialized. As a result kernel NULL pointer dereference BUG can
happen. Fix it by checking the initialization status.
Link: https://lkml.kernel.org/r/20250706193207.39810-3-sj@kernel.org
Fixes: 2aca254620a8 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: c62cff40481c ("samples/damon/mtier: avoid starting DAMON before initialization")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Honggyu Kim <honggyu.kim@sk.com>
Date: Sun Sep 21 10:11:18 2025 -0400
samples/damon: change enable parameters to enabled
[ Upstream commit 793020545cea0c9e2a79de6ad5c9746ec4f5bd7e ]
The damon_{lru_sort,reclaim,stat} kernel modules use "enabled" parameter
knobs as follows.
/sys/module/damon_lru_sort/parameters/enabled
/sys/module/damon_reclaim/parameters/enabled
/sys/module/damon_stat/parameters/enabled
However, other sample modules of damon use "enable" parameter knobs so
it'd be better to rename them from "enable" to "enabled" to keep the
consistency with other damon modules.
Before:
/sys/module/damon_sample_wsse/parameters/enable
/sys/module/damon_sample_prcl/parameters/enable
/sys/module/damon_sample_mtier/parameters/enable
After:
/sys/module/damon_sample_wsse/parameters/enabled
/sys/module/damon_sample_prcl/parameters/enabled
/sys/module/damon_sample_mtier/parameters/enabled
There is no functional changes in this patch.
Link: https://lkml.kernel.org/r/20250707024548.1964-1-honggyu.kim@sk.com
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: c62cff40481c ("samples/damon/mtier: avoid starting DAMON before initialization")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri Sep 12 14:25:52 2025 +0200
selftests: mptcp: avoid spurious errors on TCP disconnect
commit 8708c5d8b3fb3f6d5d3b9e6bfe01a505819f519a upstream.
The disconnect test-case, with 'plain' TCP sockets generates spurious
errors, e.g.
07 ns1 TCP -> ns1 (dead:beef:1::1:10006) MPTCP
read: Connection reset by peer
read: Connection reset by peer
(duration 155ms) [FAIL] client exit code 3, server 3
netns ns1-FloSdv (listener) socket stat for 10006:
TcpActiveOpens 2 0.0
TcpPassiveOpens 2 0.0
TcpEstabResets 2 0.0
TcpInSegs 274 0.0
TcpOutSegs 276 0.0
TcpOutRsts 3 0.0
TcpExtPruneCalled 2 0.0
TcpExtRcvPruned 1 0.0
TcpExtTCPPureAcks 104 0.0
TcpExtTCPRcvCollapsed 2 0.0
TcpExtTCPBacklogCoalesce 42 0.0
TcpExtTCPRcvCoalesce 43 0.0
TcpExtTCPChallengeACK 1 0.0
TcpExtTCPFromZeroWindowAdv 42 0.0
TcpExtTCPToZeroWindowAdv 41 0.0
TcpExtTCPWantZeroWindowAdv 13 0.0
TcpExtTCPOrigDataSent 164 0.0
TcpExtTCPDelivered 165 0.0
TcpExtTCPRcvQDrop 1 0.0
In the failing scenarios (TCP -> MPTCP), the involved sockets are
actually plain TCP ones, as fallbacks for passive sockets at 2WHS time
cause the MPTCP listeners to actually create 'plain' TCP sockets.
Similar to commit 218cc166321f ("selftests: mptcp: avoid spurious errors
on disconnect"), the root cause is in the user-space bits: the test
program tries to disconnect as soon as all the pending data has been
spooled, generating an RST. If such option reaches the peer before the
connection has reached the closed status, the TCP socket will report an
error to the user-space, as per protocol specification, causing the
above failure. Note that it looks like this issue got more visible since
the "tcp: receiver changes" series from commit 06baf9bfa6ca ("Merge
branch 'tcp-receiver-changes'").
Address the issue by explicitly waiting for the TCP sockets (-t) to
reach a closed status before performing the disconnect. More precisely,
the test program now waits for plain TCP sockets or TCP subflows in
addition to the MPTCP sockets that were already monitored.
While at it, use 'ss' with '-n' to avoid resolving service names, which
is not needed here.
Fixes: 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-3-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri Sep 12 14:25:51 2025 +0200
selftests: mptcp: connect: catch IO errors on listen side
commit 14e22b43df25dbd4301351b882486ea38892ae4f upstream.
IO errors were correctly printed to stderr, and propagated up to the
main loop for the server side, but the returned value was ignored. As a
consequence, the program for the listener side was no longer exiting
with an error code in case of IO issues.
Because of that, some issues might not have been seen. But very likely,
most issues either had an effect on the client side, or the file
transfer was not the expected one, e.g. the connection got reset before
the end. Still, it is better to fix this.
The main consequence of this issue is the error that was reported by the
selftests: the received and sent files were different, and the MIB
counters were not printed. Also, when such errors happened during the
'disconnect' tests, the program tried to continue until the timeout.
Now when an IO error is detected, the program exits directly with an
error.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-2-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Geliang Tang <geliang@kernel.org>
Date: Fri Sep 12 14:52:24 2025 +0200
selftests: mptcp: sockopt: fix error messages
[ Upstream commit b86418beade11d45540a2d20c4ec1128849b6c27 ]
This patch fixes several issues in the error reporting of the MPTCP sockopt
selftest:
1. Fix diff not printed: The error messages for counter mismatches had
the actual difference ('diff') as argument, but it was missing in the
format string. Displaying it makes the debugging easier.
2. Fix variable usage: The error check for 'mptcpi_bytes_acked' incorrectly
used 'ret2' (sent bytes) for both the expected value and the difference
calculation. It now correctly uses 'ret' (received bytes), which is the
expected value for bytes_acked.
3. Fix off-by-one in diff: The calculation for the 'mptcpi_rcv_delta' diff
was 's.mptcpi_rcv_delta - ret', which is off-by-one. It has been
corrected to 's.mptcpi_rcv_delta - (ret + 1)' to match the expected
value in the condition above it.
Fixes: 5dcff89e1455 ("selftests: mptcp: explicitly tests aggregate counters")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-5-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Fri Sep 12 14:52:22 2025 +0200
selftests: mptcp: userspace pm: validate deny-join-id0 flag
[ Upstream commit 24733e193a0d68f20d220e86da0362460c9aa812 ]
The previous commit adds the MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 flag. Make
sure it is correctly announced by the other peer when it has been
received.
pm_nl_ctl will now display 'deny_join_id0:1' when monitoring the events,
and when this flag was set by the other peer.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-3-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Paulo Alcantara <pc@manguebit.org>
Date: Thu Sep 18 12:30:32 2025 -0300
smb: client: fix file open check in __cifs_unlink()
[ Upstream commit 251090e2c2c1be60607d1c521af2c993f04d4f61 ]
Fix the file open check to decide whether or not silly-rename the file
in SMB2+.
Fixes: c5ea3065586d ("smb: client: fix data loss due to broken rename(2)")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: Frank Sorenson <sorenson@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Paulo Alcantara <pc@manguebit.org>
Date: Wed Sep 17 16:03:22 2025 -0300
smb: client: fix filename matching of deferred files
[ Upstream commit 93ed9a2951308db374cba4562533dde97bac70d3 ]
Fix the following case where the client would end up closing both
deferred files (foo.tmp & foo) after unlink(foo) due to strstr() call
in cifs_close_deferred_file_under_dentry():
fd1 = openat(AT_FDCWD, "foo", O_WRONLY|O_CREAT|O_TRUNC, 0666);
fd2 = openat(AT_FDCWD, "foo.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666);
close(fd1);
close(fd2);
unlink("foo");
Fixes: e3fc065682eb ("cifs: Deferred close performance improvements")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Thu Sep 18 03:06:46 2025 +0200
smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error path
[ Upstream commit daac51c7032036a0ca5f1aa419ad1b0471d1c6e0 ]
During tests of another unrelated patch I was able to trigger this
error: Objects remaining on __kmem_cache_shutdown()
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Wed Sep 10 11:49:05 2025 +0200
smb: client: let recv_done verify data_offset, data_length and remaining_data_length
[ Upstream commit f57e53ea252363234f86674db475839e5b87102e ]
This is inspired by the related server fixes.
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Tue Aug 12 13:03:19 2025 +0200
smb: client: let smbd_destroy() call disable_work_sync(&info->post_send_credits_work)
[ Upstream commit d9dcbbcf9145b68aa85c40947311a6907277e097 ]
In smbd_destroy() we may destroy the memory so we better
wait until post_send_credits_work is no longer pending
and will never be started again.
I actually just hit the case using rxe:
WARNING: CPU: 0 PID: 138 at drivers/infiniband/sw/rxe/rxe_verbs.c:1032 rxe_post_recv+0x1ee/0x480 [rdma_rxe]
...
[ 5305.686979] [ T138] smbd_post_recv+0x445/0xc10 [cifs]
[ 5305.687135] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5305.687149] [ T138] ? __kasan_check_write+0x14/0x30
[ 5305.687185] [ T138] ? __pfx_smbd_post_recv+0x10/0x10 [cifs]
[ 5305.687329] [ T138] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 5305.687356] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5305.687368] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5305.687378] [ T138] ? _raw_spin_unlock_irqrestore+0x11/0x60
[ 5305.687389] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5305.687399] [ T138] ? get_receive_buffer+0x168/0x210 [cifs]
[ 5305.687555] [ T138] smbd_post_send_credits+0x382/0x4b0 [cifs]
[ 5305.687701] [ T138] ? __pfx_smbd_post_send_credits+0x10/0x10 [cifs]
[ 5305.687855] [ T138] ? __pfx___schedule+0x10/0x10
[ 5305.687865] [ T138] ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 5305.687875] [ T138] ? queue_delayed_work_on+0x8e/0xa0
[ 5305.687889] [ T138] process_one_work+0x629/0xf80
[ 5305.687908] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5305.687917] [ T138] ? __kasan_check_write+0x14/0x30
[ 5305.687933] [ T138] worker_thread+0x87f/0x1570
...
It means rxe_post_recv was called after rdma_destroy_qp().
This happened because put_receive_buffer() was triggered
by ib_drain_qp() and called:
queue_work(info->workqueue, &info->post_send_credits_work);
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Tue Aug 5 18:11:31 2025 +0200
smb: client: make use of smbdirect_socket->recv_io.expected
[ Upstream commit bbdbd9ae47155da65aa0c1641698a44d85c2faa2 ]
The expected incoming message type can be per connection.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Tue Aug 5 18:11:33 2025 +0200
smb: client: make use of struct smbdirect_recv_io
[ Upstream commit 5dddf0497445d247e995306daf3b76dd0633831c ]
This is the shared structure that will be used in
the server too and will allow us to move helper functions
into common code soon.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Tue Aug 12 12:58:21 2025 +0200
smb: client: use disable[_delayed]_work_sync in smbdirect.c
[ Upstream commit bac28f604c7699727b2fecf14c3a54668bbe458e ]
This makes it safer during the disconnect and avoids
requeueing.
It's ok to call disable[delayed_]work[_sync]() more than once.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 050b8c374019 ("smbd: Make upper layer decide when to destroy the transport")
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Thu Sep 4 13:06:41 2025 +0200
smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
[ Upstream commit d162694037215fe25f1487999c58d70df809a2fd ]
We should not use more sges for ib_post_send() than we told the rdma
device in rdma_create_qp()!
Otherwise ib_post_send() will return -EINVAL, so we disconnect the
connection. Or with the current siw.ko we'll get 0 from ib_post_send(),
but will never ever get a completion for the request. I've already sent a
fix for siw.ko...
So we need to make sure smb_direct_writev() limits the number of vectors
we pass to individual smb_direct_post_send_data() calls, so that we
don't go over the queue pair limits.
Commit 621433b7e25d ("ksmbd: smbd: relax the count of sges required")
was very strange and I guess only needed because
SMB_DIRECT_MAX_SEND_SGES was 8 at that time. It basically removed the
check that the rdma device is able to handle the number of sges we try
to use.
While the real problem was added by commit ddbdc861e37c ("ksmbd: smbd:
introduce read/write credits for RDMA read/write") as it used the
minumun of device->attrs.max_send_sge and device->attrs.max_sge_rd, with
the problem that device->attrs.max_sge_rd is always 1 for iWarp. And
that limitation should only apply to RDMA Read operations. For now we
keep that limitation for RDMA Write operations too, fixing that is a
task for another day as it's not really required a bug fix.
Commit 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect
server SGEs") lowered SMB_DIRECT_MAX_SEND_SGES to 6, which is also used
by our client code. And that client code enforces
device->attrs.max_send_sge >= 6 since commit d2e81f92e5b7 ("Decrease the
number of SMB3 smbdirect client SGEs") and (briefly looking) only the
i40w driver provides only 3, see I40IW_MAX_WQ_FRAGMENT_COUNT. But
currently we'd require 4 anyway, so that would not work anyway, but now
it fails early.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: linux-rdma@vger.kernel.org
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Fixes: ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write")
Fixes: 621433b7e25d ("ksmbd: smbd: relax the count of sges required")
Fixes: 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs")
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Tue Aug 5 18:11:30 2025 +0200
smb: smbdirect: introduce smbdirect_socket.recv_io.expected
[ Upstream commit 33dd53a90e3419ea260e9ff2b4aa107385cdf7fa ]
The expected message type can be global as they never change
during the after negotiation process.
This will replace smbd_response->type and smb_direct_recvmsg->type
in future.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Metzmacher <metze@samba.org>
Date: Tue Aug 5 18:11:32 2025 +0200
smb: smbdirect: introduce struct smbdirect_recv_io
[ Upstream commit 60812d20da82606f0620904c281579a9af0ab452 ]
This will be used in client and server soon
in order to replace smbd_response/smb_direct_recvmsg.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kuniyuki Iwashima <kuniyu@google.com>
Date: Mon Sep 15 17:56:46 2025 +0000
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
[ Upstream commit 45c8a6cc2bcd780e634a6ba8e46bffbdf1fc5c01 ]
syzbot reported the splat below where a socket had tcp_sk(sk)->fastopen_rsk
in the TCP_ESTABLISHED state. [0]
syzbot reused the server-side TCP Fast Open socket as a new client before
the TFO socket completes 3WHS:
1. accept()
2. connect(AF_UNSPEC)
3. connect() to another destination
As of accept(), sk->sk_state is TCP_SYN_RECV, and tcp_disconnect() changes
it to TCP_CLOSE and makes connect() possible, which restarts timers.
Since tcp_disconnect() forgot to clear tcp_sk(sk)->fastopen_rsk, the
retransmit timer triggered the warning and the intended packet was not
retransmitted.
Let's call reqsk_fastopen_remove() in tcp_disconnect().
[0]:
WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Modules linked in:
CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.17.0-rc5-g201825fb4278 #62 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Code: 41 55 41 54 55 53 48 8b af b8 08 00 00 48 89 fb 48 85 ed 0f 84 55 01 00 00 0f b6 47 12 3c 03 74 0c 0f b6 47 12 3c 04 74 04 90 <0f> 0b 90 48 8b 85 c0 00 00 00 48 89 ef 48 8b 40 30 e8 6a 4f 06 3e
RSP: 0018:ffffc900002f8d40 EFLAGS: 00010293
RAX: 0000000000000002 RBX: ffff888106911400 RCX: 0000000000000017
RDX: 0000000002517619 RSI: ffffffff83764080 RDI: ffff888106911400
RBP: ffff888106d5c000 R08: 0000000000000001 R09: ffffc900002f8de8
R10: 00000000000000c2 R11: ffffc900002f8ff8 R12: ffff888106911540
R13: ffff888106911480 R14: ffff888106911840 R15: ffffc900002f8de0
FS: 0000000000000000(0000) GS:ffff88907b768000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8044d69d90 CR3: 0000000002c30003 CR4: 0000000000370ef0
Call Trace:
<IRQ>
tcp_write_timer (net/ipv4/tcp_timer.c:738)
call_timer_fn (kernel/time/timer.c:1747)
__run_timers (kernel/time/timer.c:1799 kernel/time/timer.c:2372)
timer_expire_remote (kernel/time/timer.c:2385 kernel/time/timer.c:2376 kernel/time/timer.c:2135)
tmigr_handle_remote_up (kernel/time/timer_migration.c:944 kernel/time/timer_migration.c:1035)
__walk_groups.isra.0 (kernel/time/timer_migration.c:533 (discriminator 1))
tmigr_handle_remote (kernel/time/timer_migration.c:1096)
handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:580)
irq_exit_rcu (kernel/softirq.c:614 kernel/softirq.c:453 kernel/softirq.c:680 kernel/softirq.c:696)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1050 (discriminator 35) arch/x86/kernel/apic/apic.c:1050 (discriminator 35))
</IRQ>
Fixes: 8336886f786f ("tcp: TCP Fast Open Server - support TFO listeners")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250915175800.118793-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Sep 16 17:28:13 2025 -0700
tls: make sure to abort the stream if headers are bogus
[ Upstream commit 0aeb54ac4cd5cf8f60131b4d9ec0b6dc9c27b20d ]
Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.
Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.
Make sure that tls_rx_msg_size() aborts strp, if we reach
an invalid record there's really no way to recover.
Reported-by: Lee Jones <lee@kernel.org>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tiwei Bie <tiwei.btw@antgroup.com>
Date: Mon Sep 1 08:27:15 2025 +0800
um: Fix FD copy size in os_rcv_fd_msg()
[ Upstream commit df447a3b4a4b961c9979b4b3ffb74317394b9b40 ]
When copying FDs, the copy size should not include the control
message header (cmsghdr). Fix it.
Fixes: 5cde6096a4dd ("um: generalize os_rcv_fd")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Miaoqian Lin <linmq006@gmail.com>
Date: Thu Aug 28 15:00:51 2025 +0800
um: virtio_uml: Fix use-after-free after put_device in probe
[ Upstream commit 7ebf70cf181651fe3f2e44e95e7e5073d594c9c0 ]
When register_virtio_device() fails in virtio_uml_probe(),
the code sets vu_dev->registered = 1 even though
the device was not successfully registered.
This can lead to use-after-free or other issues.
Fixes: 04e5b1fb0183 ("um: virtio: Remove device on disconnect")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Liao Yuanhong <liaoyuanhong@vivo.com>
Date: Mon Aug 25 10:29:11 2025 +0800
wifi: mac80211: fix incorrect type for ret
[ Upstream commit a33b375ab5b3a9897a0ab76be8258d9f6b748628 ]
The variable ret is declared as a u32 type, but it is assigned a value
of -EOPNOTSUPP. Since unsigned types cannot correctly represent negative
values, the type of ret should be changed to int.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Link: https://patch.msgid.link/20250825022911.139377-1-liaoyuanhong@vivo.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Date: Tue Aug 26 18:54:37 2025 +1000
wifi: mac80211: increase scan_ies_len for S1G
[ Upstream commit 7e2f3213e85eba00acb4cfe6d71647892d63c3a1 ]
Currently the S1G capability element is not taken into account
for the scan_ies_len, which leads to a buffer length validation
failure in ieee80211_prep_hw_scan() and subsequent WARN in
__ieee80211_start_scan(). This prevents hw scanning from functioning.
To fix ensure we accommodate for the S1G capability length.
Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20250826085437.3493-1-lachlan.hodges@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Felix Fietkau <nbd@nbd.name>
Date: Wed Aug 27 10:53:48 2025 +0200
wifi: mt76: do not add non-sta wcid entries to the poll list
[ Upstream commit a3c99ef88a084e1c2b99dd56bbfa7f89c9be3e92 ]
Polling and airtime reporting is valid for station entries only
Link: https://patch.msgid.link/20250827085352.51636-2-nbd@nbd.name
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ajay.Kathat@microchip.com <Ajay.Kathat@microchip.com>
Date: Fri Aug 29 22:58:43 2025 +0000
wifi: wilc1000: avoid buffer overflow in WID string configuration
[ Upstream commit fe9e4d0c39311d0f97b024147a0d155333f388b5 ]
Fix the following copy overflow warning identified by Smatch checker.
drivers/net/wireless/microchip/wilc1000/wlan_cfg.c:184 wilc_wlan_parse_response_frame()
error: '__memcpy()' 'cfg->s[i]->str' copy overflow (512 vs 65537)
This patch introduces size check before accessing the memory buffer.
The checks are base on the WID type of received data from the firmware.
For WID string configuration, the size limit is determined by individual
element size in 'struct wilc_cfg_str_vals' that is maintained in 'len' field
of 'struct wilc_cfg_str'.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-wireless/aLFbr9Yu9j_TQTey@stanley.mountain
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Link: https://patch.msgid.link/20250829225829.5423-1-ajay.kathat@microchip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tom Lendacky <thomas.lendacky@amd.com>
Date: Mon Sep 15 11:04:12 2025 -0500
x86/sev: Guard sev_evict_cache() with CONFIG_AMD_MEM_ENCRYPT
commit 7f830e126dc357fc086905ce9730140fd4528d66 upstream.
The sev_evict_cache() is guest-related code and should be guarded by
CONFIG_AMD_MEM_ENCRYPT, not CONFIG_KVM_AMD_SEV.
CONFIG_AMD_MEM_ENCRYPT=y is required for a guest to run properly as an SEV-SNP
guest, but a guest kernel built with CONFIG_KVM_AMD_SEV=n would get the stub
function of sev_evict_cache() instead of the version that performs the actual
eviction. Move the function declarations under the appropriate #ifdef.
Fixes: 7b306dfa326f ("x86/sev: Evict cache lines during SNP memory validation")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org # 6.16.x
Link: https://lore.kernel.org/r/70e38f2c4a549063de54052c9f64929705313526.1757708959.git.thomas.lendacky@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sergey Senozhatsky <senozhatsky@chromium.org>
Date: Tue Sep 9 13:48:35 2025 +0900
zram: fix slot write race condition
commit ce4be9e4307c5a60701ff6e0cafa74caffdc54ce upstream.
Parallel concurrent writes to the same zram index result in leaked
zsmalloc handles. Schematically we can have something like this:
CPU0 CPU1
zram_slot_lock()
zs_free(handle)
zram_slot_lock()
zram_slot_lock()
zs_free(handle)
zram_slot_lock()
compress compress
handle = zs_malloc() handle = zs_malloc()
zram_slot_lock
zram_set_handle(handle)
zram_slot_lock
zram_slot_lock
zram_set_handle(handle)
zram_slot_lock
Either CPU0 or CPU1 zsmalloc handle will leak because zs_free() is done
too early. In fact, we need to reset zram entry right before we set its
new handle, all under the same slot lock scope.
Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org
Fixes: 71268035f5d7 ("zram: free slot memory early during write")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf30A@mail.gmail.com/
Tested-by: Changhui Zhong <czhong@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>