Changelog in Linux kernel 6.12.6

acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl

Author: Suraj Sonawane <surajsonawane0215@gmail.com>
Date:   Mon Nov 18 21:56:09 2024 +0530

    acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl
    
    [ Upstream commit 265e98f72bac6c41a4492d3e30a8e5fd22fe0779 ]
    
    Fix an issue detected by syzbot with KASAN:
    
    BUG: KASAN: vmalloc-out-of-bounds in cmd_to_func
    drivers/acpi/nfit/core.c:416 [inline]
    BUG: KASAN: vmalloc-out-of-bounds in acpi_nfit_ctl+0x20e8/0x24a0
    drivers/acpi/nfit/core.c:459
    
    The issue occurs in cmd_to_func when the call_pkg->nd_reserved2
    array is accessed without verifying that call_pkg points to a buffer
    that is appropriately sized as a struct nd_cmd_pkg. This can lead
    to out-of-bounds access and undefined behavior if the buffer does not
    have sufficient space.
    
    To address this, a check was added in acpi_nfit_ctl() to ensure that
    buf is not NULL and that buf_len is at least sizeof(*call_pkg)
    before accessing it. This ensures safe access to the members of
    call_pkg, including the nd_reserved2 array.
    
    Reported-by: syzbot+7534f060ebda6b8b51b3@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=7534f060ebda6b8b51b3
    Tested-by: syzbot+7534f060ebda6b8b51b3@syzkaller.appspotmail.com
    Fixes: ebe9f6f19d80 ("acpi/nfit: Fix bus command validation")
    Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com>
    Reviewed-by: Alison Schofield <alison.schofield@intel.com>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://patch.msgid.link/20241118162609.29063-1-surajsonawane0215@gmail.com
    Signed-off-by: Ira Weiny <ira.weiny@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
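
The validation described in this entry can be sketched in userspace as follows. This is only an illustration: `struct pkg_header` and `read_reserved_word()` are hypothetical stand-ins, not the kernel's `struct nd_cmd_pkg` or `acpi_nfit_ctl()`. Only the shape of the check (reject a NULL or undersized buffer before treating it as the struct) is taken from the description above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical package header; stands in for a struct such as nd_cmd_pkg. */
struct pkg_header {
    uint64_t family;
    uint64_t command;
    uint32_t reserved2[9];
};

/*
 * Copy reserved2[0] out of buf, but only after confirming that buf is
 * non-NULL and large enough to hold a whole struct pkg_header.
 * Returns 0 on success or -EINVAL if the buffer cannot be trusted.
 */
int read_reserved_word(const void *buf, size_t buf_len, uint32_t *out)
{
    struct pkg_header hdr;

    if (!buf || buf_len < sizeof(hdr))
        return -EINVAL;

    memcpy(&hdr, buf, sizeof(hdr)); /* avoids alignment assumptions */
    *out = hdr.reserved2[0];
    return 0;
}
```

A caller passing a buffer one byte shorter than the struct gets -EINVAL instead of an out-of-bounds read.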

ACPI: resource: Fix memory resource type union access

Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Mon Dec 2 12:06:13 2024 +0200

    ACPI: resource: Fix memory resource type union access
    
    [ Upstream commit 7899ca9f3bd2b008e9a7c41f2a9f1986052d7e96 ]
    
    In acpi_decode_space(), addr->info.mem.caching is checked at the top
    level for any resource type, but addr->info.mem is part of a union and
    is thus valid only if the resource type is a memory range.
    
    Move the check inside the preceding switch/case so that it executes
    only when the union holds the correct type.
    
    Fixes: fcb29bbcd540 ("ACPI: Add prefetch decoding to the address space parser")
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Link: https://patch.msgid.link/20241202100614.20731-1-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
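
The pattern behind this fix, reading a union member only under the case that makes it valid, might look like the sketch below. All type, field and constant names here are hypothetical and chosen for illustration. They are not the ACPICA definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource kinds and per-kind attributes. */
enum res_type { RES_MEMORY, RES_IO, RES_BUS };

struct mem_attr { uint8_t caching; };
struct io_attr  { uint8_t range_type; };

struct addr_desc {
    enum res_type type;
    union {
        struct mem_attr mem; /* valid only when type == RES_MEMORY */
        struct io_attr io;   /* valid only when type == RES_IO */
    } info;
};

#define CACHING_PREFETCHABLE 3 /* illustrative value */

/* Inspect info.mem only inside the case where the union holds it. */
bool is_prefetchable(const struct addr_desc *addr)
{
    switch (addr->type) {
    case RES_MEMORY:
        return addr->info.mem.caching == CACHING_PREFETCHABLE;
    case RES_IO:
    case RES_BUS:
    default:
        return false;
    }
}
```

An I/O descriptor whose bytes happen to match the prefetchable value is no longer misread as prefetchable memory.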

ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired

Author: Daniil Tatianin <d-tatianin@yandex-team.ru>
Date:   Fri Nov 22 11:29:54 2024 +0300

    ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
    
    [ Upstream commit c53d96a4481f42a1635b96d2c1acbb0a126bfd54 ]
    
    This bug was first introduced in c27f3d011b08, where the author of the
    patch probably meant to do DeleteMutex instead of ReleaseMutex. The
    mutex leak was noticed later on and fixed in e4dfe108371, but the bogus
    MutexRelease line was never removed, so do it now.
    
    Link: https://github.com/acpica/acpica/pull/982
    Fixes: c27f3d011b08 ("ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling")
    Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
    Link: https://patch.msgid.link/20241122082954.658356-1-d-tatianin@yandex-team.ru
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: control: Avoid WARN() for symlink errors

Author: Takashi Iwai <tiwai@suse.de>
Date:   Mon Dec 9 10:56:12 2024 +0100

    ALSA: control: Avoid WARN() for symlink errors
    
    [ Upstream commit b2e538a9827dd04ab5273bf4be8eb2edb84357b0 ]
    
    Using WARN() to report symlink creation errors gives no more
    information than that something went wrong, since the usual code
    path is an lregister callback from each control element creation.
    Worse, the use of WARN() confuses fuzzers into treating these as
    serious issues.
    
    This patch downgrades the warning messages to the normal dev_err()
    instead of WARN().  To make them clearer, add the function name to
    the prefix, too.
    
    Fixes: a135dfb5de15 ("ALSA: led control - add sysfs kcontrol LED marking layer")
    Reported-by: syzbot+4e7919b09c67ffd198ae@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/675664c7.050a0220.a30f1.018c.GAE@google.com
    Link: https://patch.msgid.link/20241209095614.4273-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda/realtek: Fix headset mic on Acer Nitro 5

Author: Hridesh MG <hridesh699@gmail.com>
Date:   Thu Dec 5 22:48:42 2024 +0530

    ALSA: hda/realtek: Fix headset mic on Acer Nitro 5
    
    commit 5a69e3d0a1b0f07e58c353560cfcb1ea20a6f040 upstream.
    
    Add a PCI quirk to enable microphone input on the headphone jack on
    the Acer Nitro 5 AN515-58 laptop.
    
    Signed-off-by: Hridesh MG <hridesh699@gmail.com>
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/20241205171843.7787-1-hridesh699@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: Add implicit feedback quirk for Yamaha THR5

Author: Jaakko Salo <jaakkos@gmail.com>
Date:   Fri Dec 6 18:44:48 2024 +0200

    ALSA: usb-audio: Add implicit feedback quirk for Yamaha THR5
    
    commit 82fdcf9b518b205da040046fbe7747fb3fd18657 upstream.
    
    Use implicit feedback from the capture endpoint to fix popping
    sounds during playback.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=219567
    Signed-off-by: Jaakko Salo <jaakkos@gmail.com>
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/20241206164448.8136-1-jaakkos@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

amdgpu/uvd: get ring reference from rq scheduler

Author: David (Ming Qiang) Wu <David.Wu3@amd.com>
Date:   Wed Dec 4 11:30:01 2024 -0500

    amdgpu/uvd: get ring reference from rq scheduler
    
    [ Upstream commit 47f402a3e08113e0f5d8e1e6fcc197667a16022f ]
    
    base.sched may not be set for each instance and should not
    be used for cases such as non-IB tests.
    
    Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
    Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: amd: yc: Fix the wrong return value

Author: Venkata Prasad Potturu <venkataprasad.potturu@amd.com>
Date:   Tue Dec 10 14:40:25 2024 +0530

    ASoC: amd: yc: Fix the wrong return value
    
    [ Upstream commit 984795e76def5c903724b8d6a8228e356bbdf2af ]
    
    With the current implementation, when the ACP driver fails to read
    the ACPI _WOV entry, the DMI override code is not invoked, which
    may cause regressions for some BIOS versions.
    
    Add a condition check to jump to the DMI entry check in case the
    ACP driver fails to read the ACPI _WOV method.
    
    Fixes: 4095cf872084 (ASoC: amd: yc: Fix for enabling DMIC on acp6x via _DSD entry)
    
    Signed-off-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com>
    Link: https://patch.msgid.link/20241210091026.996860-1-venkataprasad.potturu@amd.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: fsl_spdif: change IFACE_PCM to IFACE_MIXER

Author: Shengjiu Wang <shengjiu.wang@nxp.com>
Date:   Tue Nov 26 13:32:54 2024 +0800

    ASoC: fsl_spdif: change IFACE_PCM to IFACE_MIXER
    
    [ Upstream commit bb76e82bfe57fdd1fe595cb0ccd33159df49ed09 ]
    
    The snd_soc_card_get_kcontrol() helper was updated to use
    snd_ctl_find_id_mixer() in
    commit 897cc72b0837 ("ASoC: soc-card: Use
    snd_ctl_find_id_mixer() instead of open-coding"),
    which pins the iface to IFACE_MIXER.
    
    Fixes: 897cc72b0837 ("ASoC: soc-card: Use snd_ctl_find_id_mixer() instead of open-coding")
    Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
    Link: https://patch.msgid.link/20241126053254.3657344-3-shengjiu.wang@nxp.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: fsl_xcvr: change IFACE_PCM to IFACE_MIXER

Author: Shengjiu Wang <shengjiu.wang@nxp.com>
Date:   Tue Nov 26 13:32:53 2024 +0800

    ASoC: fsl_xcvr: change IFACE_PCM to IFACE_MIXER
    
    [ Upstream commit 7c17f7780a48b5ed36b6d13a06004fac993e75af ]
    
    The snd_soc_card_get_kcontrol() helper was updated to use
    snd_ctl_find_id_mixer() in
    commit 897cc72b0837 ("ASoC: soc-card: Use
    snd_ctl_find_id_mixer() instead of open-coding"),
    which pins the iface to IFACE_MIXER.
    
    Fixes: 897cc72b0837 ("ASoC: soc-card: Use snd_ctl_find_id_mixer() instead of open-coding")
    Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
    Link: https://patch.msgid.link/20241126053254.3657344-2-shengjiu.wang@nxp.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: Intel: sof_sdw: Add space for a terminator into DAIs array

Author: Charles Keepax <ckeepax@opensource.cirrus.com>
Date:   Thu Dec 12 10:57:42 2024 +0000

    ASoC: Intel: sof_sdw: Add space for a terminator into DAIs array
    
    [ Upstream commit 255cc582e6e16191a20d54bcdbca6c91d3e90c5e ]
    
    The code uses the initialised member of the asoc_sdw_dailink struct to
    determine if a member of the array is in use. However, if the array
    is completely full, this leads to an access one past the end of
    the array. Expand the array by one entry to include space for a
    terminator.
    
    Fixes: 27fd36aefa00 ("ASoC: Intel: sof-sdw: Add new code for parsing the snd_soc_acpi structs")
    Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
    Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
    Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
    Link: https://patch.msgid.link/20241212105742.1508574-1-ckeepax@opensource.cirrus.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
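
A minimal sketch of the terminator idea follows, assuming an array scanned by a per-entry "in use" flag. `MAX_LINKS`, `struct dai_link`, `struct dai_table` and `count_links()` are hypothetical names used only for illustration. They are not the driver's identifiers.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 4 /* hypothetical maximum number of real entries */

struct dai_link {
    bool initialised; /* false marks an unused entry or the terminator */
    int id;
};

/* One extra slot so a zeroed terminator always follows the last used entry. */
struct dai_table {
    struct dai_link links[MAX_LINKS + 1];
};

/* Count entries by scanning until the first entry that is not in use. */
size_t count_links(const struct dai_table *t)
{
    size_t n = 0;

    while (t->links[n].initialised)
        n++;
    return n;
}
```

With all MAX_LINKS entries in use, the scan stops at index MAX_LINKS, which is still inside the array. Without the extra slot, the same scan would read one element past the end.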

ASoC: tas2781: Fix calibration issue in stress test

Author: Shenghao Ding <shenghao-ding@ti.com>
Date:   Wed Dec 11 12:38:59 2024 +0800

    ASoC: tas2781: Fix calibration issue in stress test
    
    [ Upstream commit 2aa13da97e2b92d20a8ad4ead10da89f880b64e7 ]
    
    One specific test condition: the default registers of p[j].reg ~
    p[j+3].reg are 0, TASDEVICE_REG(0x00, 0x14, 0x38)(PLT_FLAG_REG),
    TASDEVICE_REG(0x00, 0x14, 0x40)(SINEGAIN_REG), and
    TASDEVICE_REG(0x00, 0x14, 0x44)(SINEGAIN2_REG). After the first
    calibration, they are refreshed to TASDEVICE_REG(0x00, 0x1a, 0x20),
    TASDEVICE_REG(0x00, 0x16, 0x58)(PLT_FLAG_REG), TASDEVICE_REG(0x00, 0x14,
    0x44)(SINEGAIN_REG), and TASDEVICE_REG(0x00, 0x16, 0x64)(SINEGAIN2_REG)
    via the "Calibration Start" kcontrol. In the second calibration,
    p[j].reg ~ p[j+3].reg have already become tas2781_cali_start_reg.
    However, p[j+2].reg, TASDEVICE_REG(0x00, 0x14, 0x44)(SINEGAIN_REG), will
    be refreshed to TASDEVICE_REG(0x00, 0x16, 0x64), which is the third
    register in the input params of the kcontrol. This is why only the
    first calibration works; the second and any later calibration always
    fails without a reboot. Of course, if no p[j].reg is in the list of
    tas2781_cali_start_reg, this stress test works.
    
    Fixes: 49e2e353fb0d ("ASoC: tas2781: Add Calibration Kcontrols for Chromebook")
    Signed-off-by: Shenghao Ding <shenghao-ding@ti.com>
    Link: https://patch.msgid.link/20241211043859.1328-1-shenghao-ding@ti.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()

Author: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Date:   Thu Dec 5 19:30:14 2024 +0900

    ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()
    
    commit 676fe1f6f74db988191dab5df3bf256908177072 upstream.
    
    The OF node reference obtained by of_parse_phandle_with_args() is not
    released on early return. Add an of_node_put() call before returning.
    
    Fixes: 8996b89d6bc9 ("ata: add platform driver for Calxeda AHCI controller")
    Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
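
The leak pattern fixed here can be mimicked in userspace with a plain counter. In this sketch, `struct ref_node`, `ref_node_get()`, `ref_node_put()` and `init_port()` are hypothetical. They only imitate a lookup that takes a reference and an early return that has to drop it, and they are not the kernel's OF API.

```c
/* Hypothetical reference-counted node; stands in for a device tree node. */
struct ref_node {
    int refcount;
};

static struct ref_node *ref_node_get(struct ref_node *n)
{
    n->refcount++;
    return n;
}

static void ref_node_put(struct ref_node *n)
{
    n->refcount--;
}

/*
 * Take a reference, validate an argument, and use the node. Every exit
 * path, including the early error return, drops the reference taken at
 * the top of the function.
 */
int init_port(struct ref_node *target, unsigned int port,
              unsigned int max_ports)
{
    struct ref_node *node = ref_node_get(target);

    if (port >= max_ports) {
        ref_node_put(node); /* release before the early return */
        return -1;
    }

    /* ... the node would be used here ... */

    ref_node_put(node);
    return 0;
}
```

After either outcome the counter is back at its starting value. Omitting the put on the error path would leave it one too high on every failed call.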

batman-adv: Do not let TT changes list grows indefinitely

Author: Remi Pommarel <repk@triplefau.lt>
Date:   Fri Nov 22 16:52:50 2024 +0100

    batman-adv: Do not let TT changes list grows indefinitely
    
    [ Upstream commit fff8f17c1a6fc802ca23bbd3a276abfde8cc58e6 ]
    
    When the TT changes list is too big to fit in a packet due to MTU size,
    an empty OGM is sent, expecting other nodes to send a TT request to get
    the changes. The issue is that tt.last_changeset was not built, thus the
    originator was responding with previous changes to those TT requests
    (see batadv_send_my_tt_response). Also, the changes list was never
    cleaned up, so it effectively grew without bound from this point
    onwards, repeatedly sending the same TT response changes over and over,
    and creating a new empty OGM every OGM interval in expectation of the
    local changes being purged.
    
    When there are more TT changes than can fit in a packet, drop all
    changes, send an empty OGM and wait for a TT request so we can respond
    with a full table instead.
    
    Fixes: e1bf0c14096f ("batman-adv: tvlv - convert tt data sent within OGMs")
    Signed-off-by: Remi Pommarel <repk@triplefau.lt>
    Acked-by: Antonio Quartulli <Antonio@mandelbit.com>
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: Do not send uninitialized TT changes

Author: Remi Pommarel <repk@triplefau.lt>
Date:   Fri Nov 22 16:52:48 2024 +0100

    batman-adv: Do not send uninitialized TT changes
    
    [ Upstream commit f2f7358c3890e7366cbcb7512b4bc8b4394b2d61 ]
    
    The number of TT changes can be less than initially expected in
    batadv_tt_tvlv_container_update() (changes can be removed by
    batadv_tt_local_event() in ADD+DEL sequence between reading
    tt_diff_entries_num and actually iterating the change list under lock).
    
    Thus tt_diff_len could be bigger than the actual changes size that needs
    to be sent. Because batadv_send_my_tt_response sends the whole
    packet, uninitialized data can be interpreted as TT changes on other
    nodes leading to weird TT global entries on those nodes such as:
    
     * 00:00:00:00:00:00   -1 [....] (  0) 88:12:4e:ad:7e:ba (179) (0x45845380)
     * 00:00:00:00:78:79 4092 [.W..] (  0) 88:12:4e:ad:7e:3c (145) (0x8ebadb8b)
    
    All of the above also applies to OGM tvlv container buffer's tvlv_len.
    
    Remove the extra allocated space to avoid sending uninitialized TT
    changes in batadv_send_my_tt_response() and batadv_v_ogm_send_softif().
    
    Fixes: e1bf0c14096f ("batman-adv: tvlv - convert tt data sent within OGMs")
    Signed-off-by: Remi Pommarel <repk@triplefau.lt>
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
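
The idea of reporting only the bytes that were actually filled can be sketched as below. `struct change`, `struct packet` and `build_packet()` are hypothetical and much simpler than the batman-adv structures. The sketch assumes a buffer sized for an expected entry count that is then filled from a list that may have shrunk in the meantime.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical change record and packet container. */
struct change {
    uint8_t flags;
    uint8_t addr[6];
    uint16_t vid;
};

struct packet {
    size_t len;             /* bytes of valid payload to transmit */
    struct change *changes; /* buffer sized for the expected count */
};

/*
 * Allocate room for `expected` changes, copy the `available` ones that
 * still exist, and set len to cover only the copied entries so the
 * uninitialized tail of the buffer is never sent.
 * Returns 0 on success or -1 if allocation fails.
 */
int build_packet(struct packet *pkt, const struct change *src,
                 size_t available, size_t expected)
{
    size_t filled = available < expected ? available : expected;
    struct change *buf;
    size_t i;

    buf = malloc((expected ? expected : 1) * sizeof(*buf));
    if (!buf)
        return -1;

    for (i = 0; i < filled; i++)
        buf[i] = src[i];

    pkt->changes = buf;
    pkt->len = filled * sizeof(*buf); /* not expected * sizeof(*buf) */
    return 0;
}
```

The caller owns `pkt->changes` and releases it with `free()` once the packet has been sent.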

batman-adv: Remove uninitialized data in full table TT response

Author: Remi Pommarel <repk@triplefau.lt>
Date:   Fri Nov 22 16:52:49 2024 +0100

    batman-adv: Remove uninitialized data in full table TT response
    
    [ Upstream commit 8038806db64da15721775d6b834990cacbfcf0b2 ]
    
    The number of entries filled by batadv_tt_tvlv_generate() can be less
    than initially expected in batadv_tt_prepare_tvlv_{global,local}_data()
    (changes can be removed by batadv_tt_local_event() in ADD+DEL sequence
    in the meantime as the lock held during the whole tvlv global/local data
    generation).
    
    Thus tvlv_len could be bigger than the actual TT entry size that needs
    to be sent so full table TT_RESPONSE could hold invalid TT entries such
    as below.
    
     * 00:00:00:00:00:00   -1 [....] (  0) 88:12:4e:ad:7e:ba (179) (0x45845380)
     * 00:00:00:00:78:79 4092 [.W..] (  0) 88:12:4e:ad:7e:3c (145) (0x8ebadb8b)
    
    Remove the extra allocated space to avoid sending uninitialized entries
    for full table TT_RESPONSE in both batadv_send_other_tt_response() and
    batadv_send_my_tt_response().
    
    Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")
    Signed-off-by: Remi Pommarel <repk@triplefau.lt>
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

blk-cgroup: Fix UAF in blkcg_unpin_online()

Author: Tejun Heo <tj@kernel.org>
Date:   Fri Dec 6 07:59:51 2024 -1000

    blk-cgroup: Fix UAF in blkcg_unpin_online()
    
    commit 86e6ca55b83c575ab0f2e105cf08f98e58d3d7af upstream.
    
    blkcg_unpin_online() walks up the blkcg hierarchy putting the online pin. To
    walk up, it uses blkcg_parent(blkcg) but it was calling that after
    blkcg_destroy_blkgs(blkcg) which could free the blkcg, leading to the
    following UAF:
    
      ==================================================================
      BUG: KASAN: slab-use-after-free in blkcg_unpin_online+0x15a/0x270
      Read of size 8 at addr ffff8881057678c0 by task kworker/9:1/117
    
      CPU: 9 UID: 0 PID: 117 Comm: kworker/9:1 Not tainted 6.13.0-rc1-work-00182-gb8f52214c61a-dirty #48
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 02/02/2022
      Workqueue: cgwb_release cgwb_release_workfn
      Call Trace:
       <TASK>
       dump_stack_lvl+0x27/0x80
       print_report+0x151/0x710
       kasan_report+0xc0/0x100
       blkcg_unpin_online+0x15a/0x270
       cgwb_release_workfn+0x194/0x480
       process_scheduled_works+0x71b/0xe20
       worker_thread+0x82a/0xbd0
       kthread+0x242/0x2c0
       ret_from_fork+0x33/0x70
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      ...
      Freed by task 1944:
       kasan_save_track+0x2b/0x70
       kasan_save_free_info+0x3c/0x50
       __kasan_slab_free+0x33/0x50
       kfree+0x10c/0x330
       css_free_rwork_fn+0xe6/0xb30
       process_scheduled_works+0x71b/0xe20
       worker_thread+0x82a/0xbd0
       kthread+0x242/0x2c0
       ret_from_fork+0x33/0x70
       ret_from_fork_asm+0x1a/0x30
    
    Note that the UAF is not easy to trigger as the free path is indirected
    behind a couple RCU grace periods and a work item execution. I could only
    trigger it with artificial msleep() injected in blkcg_unpin_online().
    
    Fix it by reading the parent pointer before destroying the blkcg's blkg's.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: Abagail ren <renzezhongucas@gmail.com>
    Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
    Fixes: 4308a434e5e0 ("blkcg: don't offline parent blkcg first")
    Cc: stable@vger.kernel.org # v5.7+
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
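
The ordering fix, reading the parent pointer before the current object can be freed, can be illustrated with a small userspace sketch. `struct node`, `node_unpin()` and `unpin_chain()` are hypothetical names. The kernel code uses reference counts and deferred freeing, which this sketch collapses into a plain counter and an immediate `free()`.

```c
#include <stdlib.h>

/* Hypothetical hierarchy node with a pin count. */
struct node {
    struct node *parent;
    int pins;
};

/* Drop one pin; free the node when the last pin goes away. Returns 1 if freed. */
static int node_unpin(struct node *n)
{
    if (--n->pins == 0) {
        free(n);
        return 1;
    }
    return 0;
}

/*
 * Walk up the hierarchy, dropping one pin per level and stopping at the
 * first node that survives. The parent pointer is saved before the pin
 * is dropped, because the node may no longer exist afterwards.
 * Returns the number of nodes that were freed.
 */
int unpin_chain(struct node *n)
{
    int freed = 0;

    while (n) {
        struct node *parent = n->parent; /* read before n may be freed */

        if (!node_unpin(n))
            break;
        freed++;
        n = parent;
    }
    return freed;
}
```

Reading `n->parent` after `node_unpin(n)` returned 1 would be exactly the use-after-free described above.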

blk-iocost: Avoid using clamp() on inuse in __propagate_weights()

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Thu Dec 12 10:13:29 2024 -0700

    blk-iocost: Avoid using clamp() on inuse in __propagate_weights()
    
    [ Upstream commit 57e420c84f9ab55ba4c5e2ae9c5f6c8e1ea834d2 ]
    
    After a recent change to clamp() and its variants [1] that increases the
    coverage of the check that high is greater than low because it can be
    done through inlining, certain build configurations (such as s390
    defconfig) fail to build with clang with:
    
      block/blk-iocost.c:1101:11: error: call to '__compiletime_assert_557' declared with 'error' attribute: clamp() low limit 1 greater than high limit active
       1101 |                 inuse = clamp_t(u32, inuse, 1, active);
            |                         ^
      include/linux/minmax.h:218:36: note: expanded from macro 'clamp_t'
        218 | #define clamp_t(type, val, lo, hi) __careful_clamp(type, val, lo, hi)
            |                                    ^
      include/linux/minmax.h:195:2: note: expanded from macro '__careful_clamp'
        195 |         __clamp_once(type, val, lo, hi, __UNIQUE_ID(v_), __UNIQUE_ID(l_), __UNIQUE_ID(h_))
            |         ^
      include/linux/minmax.h:188:2: note: expanded from macro '__clamp_once'
        188 |         BUILD_BUG_ON_MSG(statically_true(ulo > uhi),                            \
            |         ^
    
    __propagate_weights() is called with an active value of zero in
    ioc_check_iocgs(), which results in the high value being less than the
    low value, which is undefined because the value returned depends on the
    order of the comparisons.
    
    The purpose of this expression is to ensure inuse is not more than
    active and at least 1. This could be written more simply with a ternary
    expression that uses min(inuse, active) as the condition so that the
    value of that condition can be used if it is not zero and one if it is.
    Do this conversion to resolve the error and add a comment to deter
    people from turning this back into clamp().
    
    Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
    Link: https://lore.kernel.org/r/34d53778977747f19cce2abb287bb3e6@AcuMS.aculab.com/ [1]
    Suggested-by: David Laight <david.laight@aculab.com>
    Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Closes: https://lore.kernel.org/llvm/CA+G9fYsD7mw13wredcZn0L-KBA3yeoVSTuxnss-AEWMN3ha0cA@mail.gmail.com/
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202412120322.3GfVe3vF-lkp@intel.com/
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
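
The replacement expression described above can be sketched in portable C as follows. `min_u32()` and `bound_inuse()` are hypothetical helper names. The kernel uses its own `min()` macro, and this sketch only shows the "cap at active, but never below 1" logic.

```c
#include <stdint.h>

/* Hypothetical helper; the kernel has its own type-checked min(). */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Keep inuse no larger than active and at least 1. Unlike a clamp with
 * bounds (1, active), this stays well defined when active is 0.
 */
uint32_t bound_inuse(uint32_t inuse, uint32_t active)
{
    uint32_t capped = min_u32(inuse, active);

    return capped ? capped : 1;
}
```

For example, `bound_inuse(5, 3)` yields 3 and `bound_inuse(5, 0)` yields 1, whereas a clamp with a low limit of 1 and a high limit of 0 has no well-defined result.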

blk-mq: move cpuhp callback registering out of q->sysfs_lock

Author: Ming Lei <ming.lei@redhat.com>
Date:   Fri Dec 6 19:16:07 2024 +0800

    blk-mq: move cpuhp callback registering out of q->sysfs_lock
    
    [ Upstream commit 22465bbac53c821319089016f268a2437de9b00a ]
    
    Registering and unregistering a cpuhp callback requires the global cpu
    hotplug lock, which is used everywhere. Meanwhile, q->sysfs_lock is used
    almost everywhere in the block layer.
    
    It is easy to trigger lockdep warning[1] by connecting the two locks.
    
    Fix the warning by moving blk-mq's cpuhp callback registering out of
    q->sysfs_lock. Add one dedicated global lock for covering registering &
    unregistering hctx's cpuhp, and it is safe to do so because hctx is
    guaranteed to be live if our request_queue is live.
    
    [1] https://lore.kernel.org/lkml/Z04pz3AlvI4o0Mr8@agluck-desk3/
    
    Cc: Reinette Chatre <reinette.chatre@intel.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Peter Newman <peternewman@google.com>
    Cc: Babu Moger <babu.moger@amd.com>
    Reported-by: Luck Tony <tony.luck@intel.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Tested-by: Tony Luck <tony.luck@intel.com>
    Link: https://lore.kernel.org/r/20241206111611.978870-3-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stable-dep-of: be26ba96421a ("block: Fix potential deadlock while freezing queue and acquiring sysfs_lock")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

block: Fix potential deadlock while freezing queue and acquiring sysfs_lock

Author: Nilay Shroff <nilay@linux.ibm.com>
Date:   Tue Dec 10 20:11:43 2024 +0530

    block: Fix potential deadlock while freezing queue and acquiring sysfs_lock
    
    [ Upstream commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2 ]
    
    When storing a value to a queue attribute, the queue_attr_store function
    first freezes the queue (->q_usage_counter(io)) and then acquires
    ->sysfs_lock. This does not seem correct, as the usual ordering is to
    acquire ->sysfs_lock before freezing the queue. This incorrect ordering
    causes the following lockdep splat, which we can always reproduce
    simply by accessing the /sys/kernel/debug file using the ls command:
    
    [   57.597146] WARNING: possible circular locking dependency detected
    [   57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G        W
    [   57.597162] ------------------------------------------------------
    [   57.597168] ls/4605 is trying to acquire lock:
    [   57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0
    [   57.597200]
                   but task is already holding lock:
    [   57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
    [   57.597226]
                   which lock already depends on the new lock.
    
    [   57.597233]
                   the existing dependency chain (in reverse order) is:
    [   57.597241]
                   -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
    [   57.597255]        down_write+0x6c/0x18c
    [   57.597264]        start_creating+0xb4/0x24c
    [   57.597274]        debugfs_create_dir+0x2c/0x1e8
    [   57.597283]        blk_register_queue+0xec/0x294
    [   57.597292]        add_disk_fwnode+0x2e4/0x548
    [   57.597302]        brd_alloc+0x2c8/0x338
    [   57.597309]        brd_init+0x100/0x178
    [   57.597317]        do_one_initcall+0x88/0x3e4
    [   57.597326]        kernel_init_freeable+0x3cc/0x6e0
    [   57.597334]        kernel_init+0x34/0x1cc
    [   57.597342]        ret_from_kernel_user_thread+0x14/0x1c
    [   57.597350]
                   -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
    [   57.597362]        __mutex_lock+0xfc/0x12a0
    [   57.597370]        blk_register_queue+0xd4/0x294
    [   57.597379]        add_disk_fwnode+0x2e4/0x548
    [   57.597388]        brd_alloc+0x2c8/0x338
    [   57.597395]        brd_init+0x100/0x178
    [   57.597402]        do_one_initcall+0x88/0x3e4
    [   57.597410]        kernel_init_freeable+0x3cc/0x6e0
    [   57.597418]        kernel_init+0x34/0x1cc
    [   57.597426]        ret_from_kernel_user_thread+0x14/0x1c
    [   57.597434]
                   -> #3 (&q->sysfs_lock){+.+.}-{4:4}:
    [   57.597446]        __mutex_lock+0xfc/0x12a0
    [   57.597454]        queue_attr_store+0x9c/0x110
    [   57.597462]        sysfs_kf_write+0x70/0xb0
    [   57.597471]        kernfs_fop_write_iter+0x1b0/0x2ac
    [   57.597480]        vfs_write+0x3dc/0x6e8
    [   57.597488]        ksys_write+0x84/0x140
    [   57.597495]        system_call_exception+0x130/0x360
    [   57.597504]        system_call_common+0x160/0x2c4
    [   57.597516]
                   -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}:
    [   57.597530]        __submit_bio+0x5ec/0x828
    [   57.597538]        submit_bio_noacct_nocheck+0x1e4/0x4f0
    [   57.597547]        iomap_readahead+0x2a0/0x448
    [   57.597556]        xfs_vm_readahead+0x28/0x3c
    [   57.597564]        read_pages+0x88/0x41c
    [   57.597571]        page_cache_ra_unbounded+0x1ac/0x2d8
    [   57.597580]        filemap_get_pages+0x188/0x984
    [   57.597588]        filemap_read+0x13c/0x4bc
    [   57.597596]        xfs_file_buffered_read+0x88/0x17c
    [   57.597605]        xfs_file_read_iter+0xac/0x158
    [   57.597614]        vfs_read+0x2d4/0x3b4
    [   57.597622]        ksys_read+0x84/0x144
    [   57.597629]        system_call_exception+0x130/0x360
    [   57.597637]        system_call_common+0x160/0x2c4
    [   57.597647]
                   -> #1 (mapping.invalidate_lock#2){++++}-{4:4}:
    [   57.597661]        down_read+0x6c/0x220
    [   57.597669]        filemap_fault+0x870/0x100c
    [   57.597677]        xfs_filemap_fault+0xc4/0x18c
    [   57.597684]        __do_fault+0x64/0x164
    [   57.597693]        __handle_mm_fault+0x1274/0x1dac
    [   57.597702]        handle_mm_fault+0x248/0x484
    [   57.597711]        ___do_page_fault+0x428/0xc0c
    [   57.597719]        hash__do_page_fault+0x30/0x68
    [   57.597727]        do_hash_fault+0x90/0x35c
    [   57.597736]        data_access_common_virt+0x210/0x220
    [   57.597745]        _copy_from_user+0xf8/0x19c
    [   57.597754]        sel_write_load+0x178/0xd54
    [   57.597762]        vfs_write+0x108/0x6e8
    [   57.597769]        ksys_write+0x84/0x140
    [   57.597777]        system_call_exception+0x130/0x360
    [   57.597785]        system_call_common+0x160/0x2c4
    [   57.597794]
                   -> #0 (&mm->mmap_lock){++++}-{4:4}:
    [   57.597806]        __lock_acquire+0x17cc/0x2330
    [   57.597814]        lock_acquire+0x138/0x400
    [   57.597822]        __might_fault+0x7c/0xc0
    [   57.597830]        filldir64+0xe8/0x390
    [   57.597839]        dcache_readdir+0x80/0x2d4
    [   57.597846]        iterate_dir+0xd8/0x1d4
    [   57.597855]        sys_getdents64+0x88/0x2d4
    [   57.597864]        system_call_exception+0x130/0x360
    [   57.597872]        system_call_common+0x160/0x2c4
    [   57.597881]
                   other info that might help us debug this:
    
    [   57.597888] Chain exists of:
                     &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
    
    [   57.597905]  Possible unsafe locking scenario:
    
    [   57.597911]        CPU0                    CPU1
    [   57.597917]        ----                    ----
    [   57.597922]   rlock(&sb->s_type->i_mutex_key#3);
    [   57.597932]                                lock(&q->debugfs_mutex);
    [   57.597940]                                lock(&sb->s_type->i_mutex_key#3);
    [   57.597950]   rlock(&mm->mmap_lock);
    [   57.597958]
                    *** DEADLOCK ***
    
    [   57.597965] 2 locks held by ls/4605:
    [   57.597971]  #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154
    [   57.597989]  #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
    
    Prevent the above lockdep warning by acquiring ->sysfs_lock before
    freezing the queue while storing a queue attribute in the
    queue_attr_store function. Later, we also found[1] another function,
    __blk_mq_update_nr_hw_queues, where we first freeze the queue and then
    acquire ->sysfs_lock. So we've also updated the lock ordering in the
    __blk_mq_update_nr_hw_queues function and ensured that all code paths
    follow the correct lock ordering, i.e. acquire ->sysfs_lock before
    freezing the queue.
    
    [1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/
    
    Reported-by: kjain@linux.ibm.com
    Fixes: af2814149883 ("block: freeze the queue in queue_attr_store")
    Tested-by: kjain@linux.ibm.com
    Cc: hch@lst.de
    Cc: axboe@kernel.dk
    Cc: ritesh.list@gmail.com
    Cc: ming.lei@redhat.com
    Cc: gjoyce@linux.ibm.com
    Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

block: get wp_offset by bdev_offset_from_zone_start [+ + +]

Author: LongPing Wei <weilongping@oppo.com>
Date:   Thu Nov 7 10:04:41 2024 +0800

    block: get wp_offset by bdev_offset_from_zone_start
    
    [ Upstream commit 790eb09e59709a1ffc1c64fe4aae2789120851b0 ]
    
    Call bdev_offset_from_zone_start() instead of open-coding it.
    
    Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
    Signed-off-by: LongPing Wei <weilongping@oppo.com>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20241107020439.1644577-1-weilongping@oppo.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

block: Ignore REQ_NOWAIT for zone reset and zone finish operations [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Dec 9 21:23:55 2024 +0900

    block: Ignore REQ_NOWAIT for zone reset and zone finish operations
    
    commit 5eb3317aa5a2ffe4574ab1a12cf9bc9447ca26c0 upstream.
    
    There are currently no issuers of REQ_OP_ZONE_RESET and
    REQ_OP_ZONE_FINISH operations that set REQ_NOWAIT. However, as we cannot
    handle this flag correctly due to the potential request allocation
    failure that may happen in blk_mq_submit_bio() after blk_zone_plug_bio()
    has handled the zone write plug write pointer updates for the targeted
    zones, modify blk_zone_wplug_handle_reset_or_finish() to warn if this
    flag is set and ignore it.
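
    A minimal sketch of the "warn and ignore" behaviour described above,
    written as a userspace model. The flag values, struct model_bio and
    model_strip_nowait() are assumptions made for illustration; the
    kernel's real bit positions and helper names differ.

```c
#include <stdio.h>

/* Illustrative flag values; not the kernel's real bit positions. */
#define MODEL_REQ_NOWAIT (1u << 0)
#define MODEL_REQ_SYNC   (1u << 1)

/* Hypothetical stand-in for a BIO; only the operation flags are modelled. */
struct model_bio {
	unsigned int opf;
};

/*
 * Warn if the no-wait flag is set on a zone reset/finish BIO and clear it.
 * Returns 1 if the flag was set and has been stripped, 0 otherwise.
 */
int model_strip_nowait(struct model_bio *bio)
{
	if (!(bio->opf & MODEL_REQ_NOWAIT))
		return 0;

	fprintf(stderr, "warning: no-wait flag set on zone reset/finish, ignoring\n");
	bio->opf &= ~MODEL_REQ_NOWAIT;
	return 1;
}
```

    All other flags are left untouched, so the operation proceeds exactly
    as it would have without the no-wait request.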
    
    Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20241209122357.47838-3-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: Prevent potential deadlocks in zone write plug error recovery [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Dec 9 21:23:57 2024 +0900

    block: Prevent potential deadlocks in zone write plug error recovery
    
    commit fe0418eb9bd69a19a948b297c8de815e05f3cde1 upstream.
    
    Zone write plugging for handling writes to zones of a zoned block
    device always executes a zone report whenever a write BIO to a zone
    fails. The intent of this is to ensure that the tracking of a zone write
    pointer is always correct to ensure that the alignment to a zone write
    pointer of write BIOs can be checked on submission and that we can
    always correctly emulate zone append operations using regular write
    BIOs.
    
    However, this error recovery scheme introduces a potential deadlock if a
    device queue freeze is initiated while BIOs are still plugged in a zone
    write plug and one of these write operations fails. In such a case, the
    disk zone write plug error recovery work is scheduled and executes a
    report zone. This in turn can result in a request allocation in the
    underlying driver to issue the report zones command to the device. But
    with the device queue freeze already started, this allocation will
    block, preventing the report zone execution and the continuation of the
    processing of the plugged BIOs. As plugged BIOs hold a queue usage
    reference, the queue freeze itself will never complete, resulting in a
    deadlock.
    
    Avoid this problem by completely removing from the zone write plugging
    code the use of report zones operations after a failed write operation,
    instead relying on the device user to either execute a report zones,
    reset the zone, finish the zone, or give up writing to the device (which
    is a fairly common pattern for file systems which degrade to read-only
    after write failures). This is not an unreasonable requirement as all
    well-behaved applications, FSes and device mapper already use report
    zones to recover from write errors whenever possible by comparing the
    current position of a zone write pointer with what their assumption
    about the position is.
    
    The changes to remove the automatic error recovery are as follows:
     - Completely remove the error recovery work and its associated
       resources (zone write plug list head, disk error list, and disk
       zone_wplugs_work work struct). This also removes the functions
       disk_zone_wplug_set_error() and disk_zone_wplug_clear_error().
    
     - Change the BLK_ZONE_WPLUG_ERROR zone write plug flag into
       BLK_ZONE_WPLUG_NEED_WP_UPDATE. This new flag is set for a zone write
       plug whenever a write operation targeting the zone of the zone write
       plug fails. This flag indicates that the zone write pointer offset is
       not reliable and that it must be updated when the next report zone,
       reset zone, finish zone or disk revalidation is executed.
    
     - Modify blk_zone_write_plug_bio_endio() to set the
       BLK_ZONE_WPLUG_NEED_WP_UPDATE flag for the target zone of a failed
       write BIO.
    
     - Modify the function disk_zone_wplug_set_wp_offset() to clear this
       new flag, thus implementing recovery of a correct write pointer
       offset with the reset (all) zone and finish zone operations.
    
     - Modify blkdev_report_zones() to always use the disk_report_zones_cb()
       callback so that disk_zone_wplug_sync_wp_offset() can be called for
       any zone marked with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag.
       This implements recovery of a correct write pointer offset for zone
       write plugs marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE and within
       the range of the report zones operation executed by the user.
    
     - Modify blk_revalidate_seq_zone() to call
       disk_zone_wplug_sync_wp_offset() for all sequential write required
       zones when a zoned block device is revalidated, thus always resolving
       any inconsistency between the write pointer offset of zone write
       plugs and the actual write pointer position of sequential zones.
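
    The flag life cycle listed above can be sketched as a small userspace
    model. Everything here is hypothetical (struct model_zone_wplug, the
    model_* functions and the flag's bit value); it only mirrors the
    described behaviour: a failed write marks the tracked offset as
    unreliable, and setting a known-good offset clears the mark.

```c
#include <stdbool.h>

#define MODEL_WPLUG_NEED_WP_UPDATE (1u << 0) /* illustrative bit value */

/* Hypothetical stand-in for a zone write plug. */
struct model_zone_wplug {
	unsigned int flags;
	unsigned int wp_offset; /* tracked write pointer offset */
};

/* Models the write completion path: a failure makes wp_offset unreliable. */
void model_write_endio(struct model_zone_wplug *zwplug, bool failed)
{
	if (failed)
		zwplug->flags |= MODEL_WPLUG_NEED_WP_UPDATE;
}

/* Models setting a known-good offset (reset, finish): the flag is cleared. */
void model_set_wp_offset(struct model_zone_wplug *zwplug, unsigned int wp_offset)
{
	zwplug->wp_offset = wp_offset;
	zwplug->flags &= ~MODEL_WPLUG_NEED_WP_UPDATE;
}

/*
 * Models the sync done while reporting zones: only plugs marked as needing
 * an update take the reported position. Returns true if a sync happened.
 */
bool model_sync_wp_offset(struct model_zone_wplug *zwplug, unsigned int reported_wp)
{
	if (!(zwplug->flags & MODEL_WPLUG_NEED_WP_UPDATE))
		return false;

	model_set_wp_offset(zwplug, reported_wp);
	return true;
}
```

    In this model no work is scheduled from the completion path, which is
    the property that removes the deadlock with a queue freeze.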
    
    Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20241209122357.47838-5-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: Switch to using refcount_t for zone write plugs [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Thu Nov 7 15:54:38 2024 +0900

    block: Switch to using refcount_t for zone write plugs
    
    commit 4122fef16b172f7c1838fcf74340268c86ed96db upstream.
    
    Replace the raw atomic_t reference counting of zone write plugs with a
    refcount_t.  No functional changes.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202411050650.ilIZa8S7-lkp@intel.com/
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20241107065438.236348-1-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: Use a zone write plug BIO work for REQ_NOWAIT BIOs [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Dec 9 21:23:54 2024 +0900

    block: Use a zone write plug BIO work for REQ_NOWAIT BIOs
    
    commit cae005670887cb07ceafc25bb32e221e56286488 upstream.
    
    For zoned block devices, a write BIO issued to a zone that has no
    on-going writes will be prepared for execution and allowed to execute
    immediately by blk_zone_wplug_handle_write() (called from
    blk_zone_plug_bio()). However, if this BIO specifies REQ_NOWAIT, the
    allocation of a request for its execution in blk_mq_submit_bio() may
    fail after blk_zone_plug_bio() completed, marking the target zone of the
    BIO as plugged. When this BIO is retried later on, it will be blocked as
    the zone write plug of the target zone is in a plugged state without any
    on-going write operation (completion of write operations triggers
    unplugging of the next write BIOs for a zone). This leads to a BIO that
    is stuck in a zone write plug and never completes, which results in
    various issues such as hung tasks.
    
    Avoid this problem by always executing REQ_NOWAIT write BIOs using the
    BIO work of a zone write plug. This ensures that we never block the BIO
    issuer and can thus safely ignore the REQ_NOWAIT flag when executing the
    BIO from the zone write plug BIO work.
    
    Since such BIO may be the first write BIO issued to a zone with no
    on-going write, modify disk_zone_wplug_add_bio() to schedule the zone
    write plug BIO work if the write plug is not already marked with the
    BLK_ZONE_WPLUG_PLUGGED flag. This scheduling is otherwise not necessary
    as the completion of the on-going write for the zone will schedule the
    execution of the next plugged BIOs.
    
    blk_zone_wplug_handle_write() is also fixed to better handle zone write
    plug allocation failures for REQ_NOWAIT BIOs by failing a write BIO
    using bio_wouldblock_error() instead of bio_io_error().
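
    The resulting decision for an incoming write BIO can be summarised with
    the following sketch. The enum and model_handle_write() are
    hypothetical names; the function only encodes the three cases described
    above and is not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a write BIO arriving at a zone write plug. */
enum model_action {
	MODEL_SUBMIT_NOW,             /* idle zone, blocking allowed */
	MODEL_PLUG_ONLY,              /* zone busy: wait for a completion */
	MODEL_PLUG_AND_SCHEDULE_WORK  /* idle zone, no-wait BIO */
};

enum model_action model_handle_write(bool zone_plugged, bool nowait)
{
	/*
	 * Zone already has an on-going write: queue the BIO. Completion of
	 * that write triggers execution of the next plugged BIO.
	 */
	if (zone_plugged)
		return MODEL_PLUG_ONLY;

	/*
	 * Idle zone and a no-wait BIO: hand it to the BIO work. No completion
	 * is pending, so the work has to be scheduled explicitly.
	 */
	if (nowait)
		return MODEL_PLUG_AND_SCHEDULE_WORK;

	/* Idle zone and a regular BIO: execute immediately. */
	return MODEL_SUBMIT_NOW;
}
```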
    
    Reported-by: Bart Van Assche <bvanassche@acm.org>
    Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20241209122357.47838-2-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: btmtk: avoid UAF in btmtk_process_coredump [+ + +]

Author: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Date:   Tue Dec 10 16:36:10 2024 -0300

    Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
    
    [ Upstream commit b548f5e9456c568155499d9ebac675c0d7a296e8 ]
    
    hci_devcd_append may lead to the release of the skb, so it cannot be
    accessed once it is called.
    
    ==================================================================
    BUG: KASAN: slab-use-after-free in btmtk_process_coredump+0x2a7/0x2d0 [btmtk]
    Read of size 4 at addr ffff888033cfabb0 by task kworker/0:3/82
    
    CPU: 0 PID: 82 Comm: kworker/0:3 Tainted: G     U             6.6.40-lockdep-03464-g1d8b4eb3060e #1 b0b3c1cc0c842735643fb411799d97921d1f688c
    Hardware name: Google Yaviks_Ufs/Yaviks_Ufs, BIOS Google_Yaviks_Ufs.15217.552.0 05/07/2024
    Workqueue: events btusb_rx_work [btusb]
    Call Trace:
     <TASK>
     dump_stack_lvl+0xfd/0x150
     print_report+0x131/0x780
     kasan_report+0x177/0x1c0
     btmtk_process_coredump+0x2a7/0x2d0 [btmtk 03edd567dd71a65958807c95a65db31d433e1d01]
     btusb_recv_acl_mtk+0x11c/0x1a0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
     btusb_rx_work+0x9e/0xe0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
     worker_thread+0xe44/0x2cc0
     kthread+0x2ff/0x3a0
     ret_from_fork+0x51/0x80
     ret_from_fork_asm+0x1b/0x30
     </TASK>
    
    Allocated by task 82:
     stack_trace_save+0xdc/0x190
     kasan_set_track+0x4e/0x80
     __kasan_slab_alloc+0x4e/0x60
     kmem_cache_alloc+0x19f/0x360
     skb_clone+0x132/0xf70
     btusb_recv_acl_mtk+0x104/0x1a0 [btusb]
     btusb_rx_work+0x9e/0xe0 [btusb]
     worker_thread+0xe44/0x2cc0
     kthread+0x2ff/0x3a0
     ret_from_fork+0x51/0x80
     ret_from_fork_asm+0x1b/0x30
    
    Freed by task 1733:
     stack_trace_save+0xdc/0x190
     kasan_set_track+0x4e/0x80
     kasan_save_free_info+0x28/0xb0
     ____kasan_slab_free+0xfd/0x170
     kmem_cache_free+0x183/0x3f0
     hci_devcd_rx+0x91a/0x2060 [bluetooth]
     worker_thread+0xe44/0x2cc0
     kthread+0x2ff/0x3a0
     ret_from_fork+0x51/0x80
     ret_from_fork_asm+0x1b/0x30
    
    The buggy address belongs to the object at ffff888033cfab40
     which belongs to the cache skbuff_head_cache of size 232
    The buggy address is located 112 bytes inside of
     freed 232-byte region [ffff888033cfab40, ffff888033cfac28)
    
    The buggy address belongs to the physical page:
    page:00000000a174ba93 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x33cfa
    head:00000000a174ba93 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
    anon flags: 0x4000000000000840(slab|head|zone=1)
    page_type: 0xffffffff()
    raw: 4000000000000840 ffff888100848a00 0000000000000000 0000000000000001
    raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    
    Memory state around the buggy address:
     ffff888033cfaa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
     ffff888033cfab00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
    >ffff888033cfab80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                         ^
     ffff888033cfac00: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
     ffff888033cfac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================
    
    Check if we need to call hci_devcd_complete before calling
    hci_devcd_append. That requires that we check data->cd_info.cnt >=
    MTK_COREDUMP_NUM instead of data->cd_info.cnt > MTK_COREDUMP_NUM, as we
    increment data->cd_info.cnt only once the call to hci_devcd_append
    succeeds.
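
    The reordering can be sketched in userspace as follows. All names are
    hypothetical, the threshold is an illustrative value rather than the
    real MTK_COREDUMP_NUM, and strstr() is a simplification of the real
    end-marker comparison. The point is that the buffer is inspected only
    while it is still owned, and that the counter test uses >= because the
    counter has not yet been incremented for the current packet.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_COREDUMP_NUM 3 /* illustrative threshold only */

/* Hypothetical stand-ins for the skb and the coredump state. */
struct model_skb {
	const char *data;
	bool consumed; /* set once ownership has been handed over */
};

struct model_dump {
	int cnt;
	bool complete;
	int appended;
};

/* Models the append call: it takes ownership of the buffer. */
int model_append(struct model_dump *d, struct model_skb *skb)
{
	skb->consumed = true;
	d->appended++;
	return 0;
}

int model_process(struct model_dump *d, struct model_skb *skb)
{
	/* Look at the data before append, while the buffer is still ours. */
	if (d->cnt >= MODEL_COREDUMP_NUM && strstr(skb->data, "END"))
		d->complete = true;

	if (model_append(d, skb) < 0)
		return -1;

	/* The counter only advances once the append has succeeded. */
	d->cnt++;
	return 0;
}
```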
    
    Fixes: 0b7015132878 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support")
    Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Wed Dec 4 11:40:59 2024 -0500

    Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
    
    [ Upstream commit 581dd2dc168fe0ed2a7a5534a724f0d3751c93ae ]
    
    The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is
    not safe since for the most part entries fetched this way shall be
    treated as rcu_dereference:
    
            Note that the value returned by rcu_dereference() is valid
            only within the enclosing RCU read-side critical section [1]_.
            For example, the following is **not** legal::
    
                    rcu_read_lock();
                    p = rcu_dereference(head.next);
                    rcu_read_unlock();
                    x = p->address; /* BUG!!! */
                    rcu_read_lock();
                    y = p->data;    /* BUG!!! */
                    rcu_read_unlock();
    
    Fixes: a0bfde167b50 ("Bluetooth: ISO: Add support for connecting multiple BISes")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: Improve setsockopt() handling of malformed user input [+ + +]

Author: Michal Luczaj <mhal@rbox.co>
Date:   Tue Nov 19 14:31:40 2024 +0100

    Bluetooth: Improve setsockopt() handling of malformed user input
    
    [ Upstream commit 3e643e4efa1e87432204b62f9cfdea3b2508c830 ]
    
    The bt_copy_from_sockptr() return value is being misinterpreted by most
    users: a non-zero result is mistakenly assumed to represent an error code,
    but actually indicates the number of bytes that could not be copied.
    
    Remove bt_copy_from_sockptr() and adapt callers to use
    copy_safe_from_sockptr().
    
    For sco_sock_setsockopt() (case BT_CODEC) use copy_struct_from_sockptr() to
    scrub parts of uninitialized buffer.
    
    Opportunistically, rename `len` to `optlen` in hci_sock_setsockopt_old()
    and hci_sock_setsockopt().
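
    The return-value mix-up can be illustrated with a userspace model. The
    function names and the fault_after parameter are hypothetical and exist
    only to simulate a partial copy; the sketch assumes a safe helper that
    rejects short input and otherwise returns 0 or a negative errno, which
    is the contract callers were expecting all along.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Models a user copy primitive: returns the number of bytes that could NOT
 * be copied (0 on success). fault_after simulates a fault after that many
 * bytes have been copied.
 */
size_t model_copy_from_user(void *dst, const void *src, size_t len,
			    size_t fault_after)
{
	size_t ok = len < fault_after ? len : fault_after;

	memcpy(dst, src, ok);
	return len - ok;
}

/*
 * Models a safe wrapper: validates the user-supplied length and converts
 * "bytes not copied" into a proper error code.
 */
int model_copy_safe(void *dst, size_t ksize, const void *src, size_t optlen,
		    size_t fault_after)
{
	if (optlen < ksize)
		return -EINVAL;
	if (model_copy_from_user(dst, src, ksize, fault_after))
		return -EFAULT;
	return 0;
}
```

    A caller that returned the raw result of model_copy_from_user() would
    hand a positive byte count back to user space instead of an error.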
    
    Fixes: 51eda36d33e4 ("Bluetooth: SCO: Fix not validating setsockopt user input")
    Fixes: a97de7bff13b ("Bluetooth: RFCOMM: Fix not validating setsockopt user input")
    Fixes: 4f3951242ace ("Bluetooth: L2CAP: Fix not validating setsockopt user input")
    Fixes: 9e8742cdfc4b ("Bluetooth: ISO: Fix not validating setsockopt user input")
    Fixes: b2186061d604 ("Bluetooth: hci_sock: Fix not validating setsockopt user input")
    Reviewed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Reviewed-by: David Wei <dw@davidwei.uk>
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: iso: Always release hdev at the end of iso_listen_bis [+ + +]

Author: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Date:   Wed Dec 4 14:28:48 2024 +0200

    Bluetooth: iso: Always release hdev at the end of iso_listen_bis
    
    [ Upstream commit 9c76fff747a73ba01d1d87ed53dd9c00cb40ba05 ]
    
    Since hci_get_route holds the device before returning, the hdev
    should be released with hci_dev_put at the end of iso_listen_bis
    even if the function returns with an error.
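
    A minimal sketch of the "always release" pattern, using a hypothetical
    reference-counted struct model_hdev and model_* helpers. It is not the
    actual iso_listen_bis code; it only shows a single exit label that
    drops the reference on both the success and the error path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device. */
struct model_hdev {
	int refcnt;
};

/* Models a lookup that returns the device with a reference held. */
struct model_hdev *model_get_route(struct model_hdev *hdev)
{
	if (hdev)
		hdev->refcnt++;
	return hdev;
}

void model_dev_put(struct model_hdev *hdev)
{
	hdev->refcnt--;
}

int model_listen_bis(struct model_hdev *dev, bool setup_fails)
{
	struct model_hdev *hdev = model_get_route(dev);
	int err = 0;

	if (!hdev)
		return -EHOSTUNREACH;

	if (setup_fails) {
		err = -EINVAL;
		goto put; /* the error path still drops the reference */
	}

	/* ... successful setup would happen here ... */

put:
	model_dev_put(hdev);
	return err;
}
```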
    
    Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk")
    Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: iso: Fix circular lock in iso_conn_big_sync [+ + +]

Author: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Date:   Mon Dec 9 11:42:18 2024 +0200

    Bluetooth: iso: Fix circular lock in iso_conn_big_sync
    
    [ Upstream commit 7a17308c17880d259105f6e591eb1bc77b9612f0 ]
    
    This fixes the circular locking dependency warning below, by reworking
    iso_sock_recvmsg, to ensure that the socket lock is always released
    before calling a function that locks hdev.
    
    [  561.670344] ======================================================
    [  561.670346] WARNING: possible circular locking dependency detected
    [  561.670349] 6.12.0-rc6+ #26 Not tainted
    [  561.670351] ------------------------------------------------------
    [  561.670353] iso-tester/3289 is trying to acquire lock:
    [  561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3},
                   at: iso_conn_big_sync+0x73/0x260 [bluetooth]
    [  561.670405]
                   but task is already holding lock:
    [  561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0},
                   at: iso_sock_recvmsg+0xbf/0x500 [bluetooth]
    [  561.670450]
                   which lock already depends on the new lock.
    
    [  561.670452]
                   the existing dependency chain (in reverse order) is:
    [  561.670453]
                   -> #2 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}:
    [  561.670458]        lock_acquire+0x7c/0xc0
    [  561.670463]        lock_sock_nested+0x3b/0xf0
    [  561.670467]        bt_accept_dequeue+0x1a5/0x4d0 [bluetooth]
    [  561.670510]        iso_sock_accept+0x271/0x830 [bluetooth]
    [  561.670547]        do_accept+0x3dd/0x610
    [  561.670550]        __sys_accept4+0xd8/0x170
    [  561.670553]        __x64_sys_accept+0x74/0xc0
    [  561.670556]        x64_sys_call+0x17d6/0x25f0
    [  561.670559]        do_syscall_64+0x87/0x150
    [  561.670563]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  561.670567]
                   -> #1 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
    [  561.670571]        lock_acquire+0x7c/0xc0
    [  561.670574]        lock_sock_nested+0x3b/0xf0
    [  561.670577]        iso_sock_listen+0x2de/0xf30 [bluetooth]
    [  561.670617]        __sys_listen_socket+0xef/0x130
    [  561.670620]        __x64_sys_listen+0xe1/0x190
    [  561.670623]        x64_sys_call+0x2517/0x25f0
    [  561.670626]        do_syscall_64+0x87/0x150
    [  561.670629]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  561.670632]
                   -> #0 (&hdev->lock){+.+.}-{3:3}:
    [  561.670636]        __lock_acquire+0x32ad/0x6ab0
    [  561.670639]        lock_acquire.part.0+0x118/0x360
    [  561.670642]        lock_acquire+0x7c/0xc0
    [  561.670644]        __mutex_lock+0x18d/0x12f0
    [  561.670647]        mutex_lock_nested+0x1b/0x30
    [  561.670651]        iso_conn_big_sync+0x73/0x260 [bluetooth]
    [  561.670687]        iso_sock_recvmsg+0x3e9/0x500 [bluetooth]
    [  561.670722]        sock_recvmsg+0x1d5/0x240
    [  561.670725]        sock_read_iter+0x27d/0x470
    [  561.670727]        vfs_read+0x9a0/0xd30
    [  561.670731]        ksys_read+0x1a8/0x250
    [  561.670733]        __x64_sys_read+0x72/0xc0
    [  561.670736]        x64_sys_call+0x1b12/0x25f0
    [  561.670738]        do_syscall_64+0x87/0x150
    [  561.670741]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  561.670744]
                   other info that might help us debug this:
    
    [  561.670745] Chain exists of:
    &hdev->lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> sk_lock-AF_BLUETOOTH
    
    [  561.670751]  Possible unsafe locking scenario:
    
    [  561.670753]        CPU0                    CPU1
    [  561.670754]        ----                    ----
    [  561.670756]   lock(sk_lock-AF_BLUETOOTH);
    [  561.670758]                                lock(sk_lock
                                                  AF_BLUETOOTH-BTPROTO_ISO);
    [  561.670761]                                lock(sk_lock-AF_BLUETOOTH);
    [  561.670764]   lock(&hdev->lock);
    [  561.670767]
                    *** DEADLOCK ***
    
    Fixes: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync")
    Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: iso: Fix circular lock in iso_listen_bis [+ + +]

Author: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Date:   Mon Dec 9 11:42:17 2024 +0200

    Bluetooth: iso: Fix circular lock in iso_listen_bis
    
    [ Upstream commit 168e28305b871d8ec604a8f51f35467b8d7ba05b ]
    
    This fixes the circular locking dependency warning below, by
    releasing the socket lock before entering iso_listen_bis, to
    avoid any potential deadlock with hdev lock.
    
    [   75.307983] ======================================================
    [   75.307984] WARNING: possible circular locking dependency detected
    [   75.307985] 6.12.0-rc6+ #22 Not tainted
    [   75.307987] ------------------------------------------------------
    [   75.307987] kworker/u81:2/2623 is trying to acquire lock:
    [   75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO)
                   at: iso_connect_cfm+0x253/0x840 [bluetooth]
    [   75.308021]
                   but task is already holding lock:
    [   75.308022] ffff8fdd61a10078 (&hdev->lock)
                   at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
    [   75.308053]
                   which lock already depends on the new lock.
    
    [   75.308054]
                   the existing dependency chain (in reverse order) is:
    [   75.308055]
                   -> #1 (&hdev->lock){+.+.}-{3:3}:
    [   75.308057]        __mutex_lock+0xad/0xc50
    [   75.308061]        mutex_lock_nested+0x1b/0x30
    [   75.308063]        iso_sock_listen+0x143/0x5c0 [bluetooth]
    [   75.308085]        __sys_listen_socket+0x49/0x60
    [   75.308088]        __x64_sys_listen+0x4c/0x90
    [   75.308090]        x64_sys_call+0x2517/0x25f0
    [   75.308092]        do_syscall_64+0x87/0x150
    [   75.308095]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [   75.308098]
                   -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
    [   75.308100]        __lock_acquire+0x155e/0x25f0
    [   75.308103]        lock_acquire+0xc9/0x300
    [   75.308105]        lock_sock_nested+0x32/0x90
    [   75.308107]        iso_connect_cfm+0x253/0x840 [bluetooth]
    [   75.308128]        hci_connect_cfm+0x6c/0x190 [bluetooth]
    [   75.308155]        hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth]
    [   75.308180]        hci_le_meta_evt+0xe7/0x200 [bluetooth]
    [   75.308206]        hci_event_packet+0x21f/0x5c0 [bluetooth]
    [   75.308230]        hci_rx_work+0x3ae/0xb10 [bluetooth]
    [   75.308254]        process_one_work+0x212/0x740
    [   75.308256]        worker_thread+0x1bd/0x3a0
    [   75.308258]        kthread+0xe4/0x120
    [   75.308259]        ret_from_fork+0x44/0x70
    [   75.308261]        ret_from_fork_asm+0x1a/0x30
    [   75.308263]
                   other info that might help us debug this:
    
    [   75.308264]  Possible unsafe locking scenario:
    
    [   75.308264]        CPU0                CPU1
    [   75.308265]        ----                ----
    [   75.308265]   lock(&hdev->lock);
    [   75.308267]                            lock(sk_lock-
                                                    AF_BLUETOOTH-BTPROTO_ISO);
    [   75.308268]                            lock(&hdev->lock);
    [   75.308269]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
    [   75.308270]
                    *** DEADLOCK ***
    
    [   75.308271] 4 locks held by kworker/u81:2/2623:
    [   75.308272]  #0: ffff8fdd66e52148 ((wq_completion)hci0#2){+.+.}-{0:0},
                    at: process_one_work+0x443/0x740
    [   75.308276]  #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work)),
                    at: process_one_work+0x1ce/0x740
    [   75.308280]  #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3}
                    at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
    [   75.308304]  #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2},
                    at: hci_connect_cfm+0x29/0x190 [bluetooth]
    
    Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk")
    Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: iso: Fix recursive locking warning [+ + +]

Author: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Date:   Wed Dec 4 14:28:49 2024 +0200

    Bluetooth: iso: Fix recursive locking warning
    
    [ Upstream commit 9bde7c3b3ad0e1f39d6df93dd1c9caf63e19e50f ]
    
    This updates iso_sock_accept to use nested locking for the parent
    socket, to avoid lockdep warnings caused because the parent and
    child sockets are locked by the same thread:
    
    [   41.585683] ============================================
    [   41.585688] WARNING: possible recursive locking detected
    [   41.585694] 6.12.0-rc6+ #22 Not tainted
    [   41.585701] --------------------------------------------
    [   41.585705] iso-tester/3139 is trying to acquire lock:
    [   41.585711] ffff988b29530a58 (sk_lock-AF_BLUETOOTH)
                   at: bt_accept_dequeue+0xe3/0x280 [bluetooth]
    [   41.585905]
                   but task is already holding lock:
    [   41.585909] ffff988b29533a58 (sk_lock-AF_BLUETOOTH)
                   at: iso_sock_accept+0x61/0x2d0 [bluetooth]
    [   41.586064]
                   other info that might help us debug this:
    [   41.586069]  Possible unsafe locking scenario:
    
    [   41.586072]        CPU0
    [   41.586076]        ----
    [   41.586079]   lock(sk_lock-AF_BLUETOOTH);
    [   41.586086]   lock(sk_lock-AF_BLUETOOTH);
    [   41.586093]
                    *** DEADLOCK ***
    
    [   41.586097]  May be due to missing lock nesting notation
    
    [   41.586101] 1 lock held by iso-tester/3139:
    [   41.586107]  #0: ffff988b29533a58 (sk_lock-AF_BLUETOOTH)
                    at: iso_sock_accept+0x61/0x2d0 [bluetooth]
    
    Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
    Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: SCO: Add support for 16 bits transparent voice setting [+ + +]

Author: Frédéric Danis <frederic.danis@collabora.com>
Date:   Thu Dec 5 16:51:59 2024 +0100

    Bluetooth: SCO: Add support for 16 bits transparent voice setting
    
    [ Upstream commit 29a651451e6c264f58cd9d9a26088e579d17b242 ]
    
    The voice setting is used by sco_connect() or sco_conn_defer_accept()
    after being set by sco_sock_setsockopt().
    
    The PCM part of the voice setting is used for offload mode through PCM
    chipset port.
    This commit adds support for mSBC 16 bits offloading, i.e. audio data
    not transported over HCI.
    
    The BCM4349B1 supports 16 bits transparent data on its I2S port.
    If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this
    gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives
    correct audio.
    This has been tested with connection to iPhone 14 and Samsung S24.
    
    Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option")
    Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips [+ + +]

Author: Michael Chan <michael.chan@broadcom.com>
Date:   Sun Dec 8 17:54:48 2024 -0800

    bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips
    
    [ Upstream commit 24c6843b7393ebc80962b59d7ae71af91bf0dcc1 ]
    
    The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of
    the previous generation (5750X or P5).  However, the aggregation ID
    fields in the completion structures on P7 have been redefined from
    16 bits to 12 bits.  The freed up 4 bits are redefined for part of the
    metadata such as the VLAN ID.  The aggregation ID mask was not modified
    when adding support for P7 chips.  Including the extra 4 bits for the
    aggregation ID can potentially cause the driver to store or fetch the
    packet header of GRO/LRO packets in the wrong TPA buffer.  It may hit
    the BUG() condition in __skb_pull() because the SKB contains no valid
    packet header:
    
    kernel BUG at include/linux/skbuff.h:2766!
    Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI
    CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G           OE      6.12.0-rc2+ #7
    Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
    Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022
    RIP: 0010:eth_type_trans+0xda/0x140
    Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48
    RSP: 0018:ff615003803fcc28 EFLAGS: 00010283
    RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040
    RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000
    RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001
    R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0
    R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000
    FS:  0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <IRQ>
     ? die+0x33/0x90
     ? do_trap+0xd9/0x100
     ? eth_type_trans+0xda/0x140
     ? do_error_trap+0x65/0x80
     ? eth_type_trans+0xda/0x140
     ? exc_invalid_op+0x4e/0x70
     ? eth_type_trans+0xda/0x140
     ? asm_exc_invalid_op+0x16/0x20
     ? eth_type_trans+0xda/0x140
     bnxt_tpa_end+0x10b/0x6b0 [bnxt_en]
     ? bnxt_tpa_start+0x195/0x320 [bnxt_en]
     bnxt_rx_pkt+0x902/0xd90 [bnxt_en]
     ? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en]
     ? kmem_cache_free+0x343/0x440
     ? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en]
     __bnxt_poll_work+0x193/0x370 [bnxt_en]
     bnxt_poll_p5+0x9a/0x300 [bnxt_en]
     ? try_to_wake_up+0x209/0x670
     __napi_poll+0x29/0x1b0
    
    Fix it by redefining the aggregation ID mask for P5_PLUS chips to be
    12 bits.  This will work because the maximum aggregation ID is less
    than 4096 on all P5_PLUS chips.
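    
    The effect of the mask width can be sketched in plain C.  This is only
    an illustration: the macro and function names are hypothetical (the
    driver's real definitions are not quoted in this changelog), and it
    assumes the aggregation ID sits in the low 12 bits with the metadata in
    the upper 4 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not the driver's actual macros. */
#define AGG_ID_MASK_16BIT 0xffffu /* old layout: all 16 bits are the ID */
#define AGG_ID_MASK_12BIT 0x0fffu /* assumed P5_PLUS layout: low 12 bits */

/* Extract the aggregation ID from a raw 16-bit completion field. */
static inline uint16_t agg_id_from_field(uint16_t raw_field, uint16_t mask)
{
	assert(mask == AGG_ID_MASK_16BIT || mask == AGG_ID_MASK_12BIT);
	return (uint16_t)(raw_field & mask);
}
```

    With a raw field of 0x3005, the 16-bit mask yields 0x3005 (metadata
    bits leak into the ID), while the 12-bit mask yields 0x005, which is
    the kind of mismatch that would select the wrong TPA buffer.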
    
    Fixes: 13d2d3d381ee ("bnxt_en: Add new P7 hardware interface definitions")
    Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bnxt_en: Fix GSO type for HW GRO packets on 5750X chips [+ + +]

Author: Michael Chan <michael.chan@broadcom.com>
Date:   Wed Dec 4 13:59:17 2024 -0800

    bnxt_en: Fix GSO type for HW GRO packets on 5750X chips
    
    [ Upstream commit de37faf41ac55619dd329229a9bd9698faeabc52 ]
    
    The existing code is using RSS profile to determine IPV4/IPV6 GSO type
    on all chips older than 5760X.  This won't work on 5750X chips that may
    be using modified RSS profiles.  This commit from 2018 has updated the
    driver to not use RSS profile for HW GRO packets on newer chips:
    
    50f011b63d8c ("bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.")
    
    However, a recent commit to add support for the newest 5760X chip broke
    the logic.  If the GRO packet needs to be re-segmented by the stack, the
    wrong GSO type will cause the packet to be dropped.
    
    Fix it to only use RSS profile to determine GSO type on the oldest
    5730X/5740X chips, which cannot use the new method and for which the
    RSS profiles are safe to use.
    
    Also fix the L3/L4 hash type for RX packets by not using the RSS
    profile for the same reason.  Use the ITYPE field in the RX completion
    to determine L3/L4 hash types correctly.
    
    Fixes: a7445d69809f ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
    Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com>
    Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Link: https://patch.msgid.link/20241204215918.1692597-2-michael.chan@broadcom.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Dec 10 15:12:43 2024 +0100

    bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
    
    [ Upstream commit 77b11c8bf3a228d1c63464534c2dcc8d9c8bf7ff ]
    
    Drivers like mlx5 expose NIC's vlan_features such as
    NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are
    later not propagated when the underlying devices are bonded and
    a vlan device created on top of the bond.
    
    Right now, the more cumbersome workaround for this is to create
    the vlan on top of the mlx5 and then enslave the vlan devices
    to a bond.
    
    To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES
    such that bond_compute_features() can probe and propagate the
    vlan_features from the slave devices up to the vlan device.
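    
    A deliberately simplified model of why the mask matters is sketched
    below.  It is not the kernel's bond_compute_features(): the flag
    values and the function name are made up for illustration, and the
    real feature merging has more rules than a plain AND.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bits only; these are not the kernel's NETIF_F_* values. */
#define F_SG           (1u << 0)
#define F_UDP_TNL      (1u << 1)
#define F_UDP_TNL_CSUM (1u << 2)
#define F_ENCAP_ALL    (F_UDP_TNL | F_UDP_TNL_CSUM)

/* Keep only the features inside bond_mask that every slave advertises. */
static inline uint32_t compute_vlan_features(uint32_t bond_mask,
					     const uint32_t *slave_features,
					     size_t nslaves)
{
	uint32_t features = bond_mask;

	assert(slave_features != NULL || nslaves == 0);
	for (size_t i = 0; i < nslaves; i++)
		features &= slave_features[i];
	return features;
}
```

    In this model, a feature missing from bond_mask can never reach the
    vlan device even when every slave supports it, which matches the
    "off [requested on]" lines in the Before output below.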
    
    Given the following bond:
    
      # ethtool -i enp2s0f{0,1}np{0,1}
      driver: mlx5_core
      [...]
    
      # ethtool -k enp2s0f0np0 | grep udp
      tx-udp_tnl-segmentation: on
      tx-udp_tnl-csum-segmentation: on
      tx-udp-segmentation: on
      rx-udp_tunnel-port-offload: on
      rx-udp-gro-forwarding: off
    
      # ethtool -k enp2s0f1np1 | grep udp
      tx-udp_tnl-segmentation: on
      tx-udp_tnl-csum-segmentation: on
      tx-udp-segmentation: on
      rx-udp_tunnel-port-offload: on
      rx-udp-gro-forwarding: off
    
      # ethtool -k bond0 | grep udp
      tx-udp_tnl-segmentation: on
      tx-udp_tnl-csum-segmentation: on
      tx-udp-segmentation: on
      rx-udp_tunnel-port-offload: off [fixed]
      rx-udp-gro-forwarding: off
    
    Before:
    
      # ethtool -k bond0.100 | grep udp
      tx-udp_tnl-segmentation: off [requested on]
      tx-udp_tnl-csum-segmentation: off [requested on]
      tx-udp-segmentation: on
      rx-udp_tunnel-port-offload: off [fixed]
      rx-udp-gro-forwarding: off
    
    After:
    
      # ethtool -k bond0.100 | grep udp
      tx-udp_tnl-segmentation: on
      tx-udp_tnl-csum-segmentation: on
      tx-udp-segmentation: on
      rx-udp_tunnel-port-offload: off [fixed]
      rx-udp-gro-forwarding: off
    
    Various users have run into this, reporting performance issues when
    configuring Cilium in vxlan tunneling mode and having the combination
    of bond & vlan for the core devices connecting the Kubernetes cluster
    to the outside world.
    
    Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Nikolay Aleksandrov <razor@blackwall.org>
    Cc: Ido Schimmel <idosch@idosch.org>
    Cc: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Dec 10 15:12:42 2024 +0100

    bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features
    
    [ Upstream commit d064ea7fe2a24938997b5e88e6b61cbb0a4bb906 ]
    
    If a bonding device has slave devices, then the current logic to derive
    the feature set for the master bond device is limited in that flags which
    are fully supported by the underlying slave devices cannot be propagated
    up to vlan devices which sit on top of bond devices. Instead, these get
    blindly masked out via current NETIF_F_ALL_FOR_ALL logic.
    
    vlan_features and mpls_features should reuse netdev_base_features() in
    order to derive the set in the same way as ndo_fix_features before iterating
    through the slave devices to refine the feature set.
    
    Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing")
    Fixes: 2e770b507ccd ("net: bonding: Inherit MPLS features from slave devices")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Nikolay Aleksandrov <razor@blackwall.org>
    Cc: Ido Schimmel <idosch@idosch.org>
    Cc: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Link: https://patch.msgid.link/20241210141245.327886-2-daniel@iogearbox.net
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf, sockmap: Fix race between element replace and close() [+ + +]

Author: Michal Luczaj <mhal@rbox.co>
Date:   Mon Dec 2 12:29:25 2024 +0100

    bpf, sockmap: Fix race between element replace and close()
    
    commit ed1fc5d76b81a4d681211333c026202cad4d5649 upstream.
    
    Element replace (with a socket different from the one stored) may race
    with socket's close() link popping & unlinking. __sock_map_delete()
    unconditionally unrefs the (wrong) element:
    
    // set map[0] = s0
    map_update_elem(map, 0, s0)
    
    // drop fd of s0
    close(s0)
      sock_map_close()
        lock_sock(sk)               (s0!)
        sock_map_remove_links(sk)
          link = sk_psock_link_pop()
          sock_map_unlink(sk, link)
            sock_map_delete_from_link
                                            // replace map[0] with s1
                                            map_update_elem(map, 0, s1)
                                              sock_map_update_elem
                                    (s1!)       lock_sock(sk)
                                                sock_map_update_common
                                                  psock = sk_psock(sk)
                                                  spin_lock(&stab->lock)
                                                  osk = stab->sks[idx]
                                                  sock_map_add_link(..., &stab->sks[idx])
                                                  sock_map_unref(osk, &stab->sks[idx])
                                                    psock = sk_psock(osk)
                                                    sk_psock_put(sk, psock)
                                                      if (refcount_dec_and_test(&psock))
                                                        sk_psock_drop(sk, psock)
                                                  spin_unlock(&stab->lock)
                                                unlock_sock(sk)
              __sock_map_delete
                spin_lock(&stab->lock)
                sk = *psk                        // s1 replaced s0; sk == s1
                if (!sk_test || sk_test == sk)   // sk_test (s0) != sk (s1); no branch
                  sk = xchg(psk, NULL)
                if (sk)
                  sock_map_unref(sk, psk)        // unref s1; sks[idx] will dangle
                    psock = sk_psock(sk)
                    sk_psock_put(sk, psock)
                      if (refcount_dec_and_test())
                        sk_psock_drop(sk, psock)
                spin_unlock(&stab->lock)
        release_sock(sk)
    
    Then close(map) enqueues bpf_map_free_deferred, which finally calls
    sock_map_free(). This results in some refcount_t warnings along with
    a KASAN splat [1].
    
    Fix __sock_map_delete() so that it does not call sock_map_unref() on
    elements that may have been replaced.
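    
    The idea can be sketched in user-space C as a "clear only if the slot
    still holds what we expect" helper.  This is a sketch under
    assumptions, not the kernel patch: struct sock is a stand-in, the
    function name is hypothetical, and the caller is assumed to hold the
    lock protecting the slot.

```c
#include <assert.h>
#include <stddef.h>

struct sock { int id; }; /* stand-in for the kernel's struct sock */

/*
 * Clear *slot and return the removed entry, but only if the slot still
 * holds `expected` (or expected is NULL, meaning "remove whatever is
 * there").  Returns NULL if the slot was replaced in the meantime, so
 * the caller knows it must not drop a reference.
 */
static inline struct sock *slot_delete_if_match(struct sock **slot,
						struct sock *expected)
{
	struct sock *removed = NULL;

	assert(slot != NULL);
	if (expected == NULL || *slot == expected) {
		removed = *slot;
		*slot = NULL;
	}
	return removed;
}
```

    In the race above, close(s0) would call this with expected == s0 after
    the slot already holds s1; the helper returns NULL and leaves s1 and
    its reference untouched.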
    
    [1]:
    BUG: KASAN: slab-use-after-free in sock_map_free+0x10e/0x330
    Write of size 4 at addr ffff88811f5b9100 by task kworker/u64:12/1063
    
    CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Not tainted 6.12.0+ #125
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
    Workqueue: events_unbound bpf_map_free_deferred
    Call Trace:
     <TASK>
     dump_stack_lvl+0x68/0x90
     print_report+0x174/0x4f6
     kasan_report+0xb9/0x190
     kasan_check_range+0x10f/0x1e0
     sock_map_free+0x10e/0x330
     bpf_map_free_deferred+0x173/0x320
     process_one_work+0x846/0x1420
     worker_thread+0x5b3/0xf80
     kthread+0x29e/0x360
     ret_from_fork+0x2d/0x70
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    
    Allocated by task 1202:
     kasan_save_stack+0x1e/0x40
     kasan_save_track+0x10/0x30
     __kasan_slab_alloc+0x85/0x90
     kmem_cache_alloc_noprof+0x131/0x450
     sk_prot_alloc+0x5b/0x220
     sk_alloc+0x2c/0x870
     unix_create1+0x88/0x8a0
     unix_create+0xc5/0x180
     __sock_create+0x241/0x650
     __sys_socketpair+0x1ce/0x420
     __x64_sys_socketpair+0x92/0x100
     do_syscall_64+0x93/0x180
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
    Freed by task 46:
     kasan_save_stack+0x1e/0x40
     kasan_save_track+0x10/0x30
     kasan_save_free_info+0x37/0x60
     __kasan_slab_free+0x4b/0x70
     kmem_cache_free+0x1a1/0x590
     __sk_destruct+0x388/0x5a0
     sk_psock_destroy+0x73e/0xa50
     process_one_work+0x846/0x1420
     worker_thread+0x5b3/0xf80
     kthread+0x29e/0x360
     ret_from_fork+0x2d/0x70
     ret_from_fork_asm+0x1a/0x30
    
    The buggy address belongs to the object at ffff88811f5b9080
     which belongs to the cache UNIX-STREAM of size 1984
    The buggy address is located 128 bytes inside of
     freed 1984-byte region [ffff88811f5b9080, ffff88811f5b9840)
    
    The buggy address belongs to the physical page:
    page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11f5b8
    head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
    memcg:ffff888127d49401
    flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
    page_type: f5(slab)
    raw: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
    raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
    head: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
    head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
    head: 0017ffffc0000003 ffffea00047d6e01 ffffffffffffffff 0000000000000000
    head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    
    Memory state around the buggy address:
     ffff88811f5b9000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88811f5b9080: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
     ffff88811f5b9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff88811f5b9200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    Disabling lock debugging due to kernel taint
    
    refcount_t: addition on 0; use-after-free.
    WARNING: CPU: 14 PID: 1063 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
    CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G    B              6.12.0+ #125
    Tainted: [B]=BAD_PAGE
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
    Workqueue: events_unbound bpf_map_free_deferred
    RIP: 0010:refcount_warn_saturate+0xce/0x150
    Code: 34 73 eb 03 01 e8 82 53 ad fe 0f 0b eb b1 80 3d 27 73 eb 03 00 75 a8 48 c7 c7 80 bd 95 84 c6 05 17 73 eb 03 01 e8 62 53 ad fe <0f> 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05
    RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
    RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcde6349
    R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
    R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
    FS:  0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
    PKRU: 55555554
    Call Trace:
     <TASK>
     ? __warn.cold+0x5f/0x1ff
     ? refcount_warn_saturate+0xce/0x150
     ? report_bug+0x1ec/0x390
     ? handle_bug+0x58/0x90
     ? exc_invalid_op+0x13/0x40
     ? asm_exc_invalid_op+0x16/0x20
     ? refcount_warn_saturate+0xce/0x150
     sock_map_free+0x2e5/0x330
     bpf_map_free_deferred+0x173/0x320
     process_one_work+0x846/0x1420
     worker_thread+0x5b3/0xf80
     kthread+0x29e/0x360
     ret_from_fork+0x2d/0x70
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    irq event stamp: 10741
    hardirqs last  enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
    hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
    softirqs last  enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
    softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
    
    refcount_t: underflow; use-after-free.
    WARNING: CPU: 14 PID: 1063 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
    CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G    B   W          6.12.0+ #125
    Tainted: [B]=BAD_PAGE, [W]=WARN
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
    Workqueue: events_unbound bpf_map_free_deferred
    RIP: 0010:refcount_warn_saturate+0xee/0x150
    Code: 17 73 eb 03 01 e8 62 53 ad fe 0f 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05 f6 72 eb 03 01 e8 42 53 ad fe <0f> 0b e9 6e ff ff ff 80 3d e6 72 eb 03 00 0f 85 61 ff ff ff 48 c7
    RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
    RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcde6349
    R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
    R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
    FS:  0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
    PKRU: 55555554
    Call Trace:
     <TASK>
     ? __warn.cold+0x5f/0x1ff
     ? refcount_warn_saturate+0xee/0x150
     ? report_bug+0x1ec/0x390
     ? handle_bug+0x58/0x90
     ? exc_invalid_op+0x13/0x40
     ? asm_exc_invalid_op+0x16/0x20
     ? refcount_warn_saturate+0xee/0x150
     sock_map_free+0x2d3/0x330
     bpf_map_free_deferred+0x173/0x320
     process_one_work+0x846/0x1420
     worker_thread+0x5b3/0xf80
     kthread+0x29e/0x360
     ret_from_fork+0x2d/0x70
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    irq event stamp: 10741
    hardirqs last  enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
    hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
    softirqs last  enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
    softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
    
    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20241202-sockmap-replace-v1-3-1e88579e7bd5@rbox.co
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf, sockmap: Fix update element with same [+ + +]

Author: Michal Luczaj <mhal@rbox.co>
Date:   Mon Dec 2 12:29:23 2024 +0100

    bpf, sockmap: Fix update element with same
    
    commit 75e072a390da9a22e7ae4a4e8434dfca5da499fb upstream.
    
    Consider a sockmap entry being updated with the same socket:
    
            osk = stab->sks[idx];
            sock_map_add_link(psock, link, map, &stab->sks[idx]);
            stab->sks[idx] = sk;
            if (osk)
                    sock_map_unref(osk, &stab->sks[idx]);
    
    Due to sock_map_unref(), which invokes sock_map_del_link(), all the
    psock's links for stab->sks[idx] are torn:
    
            list_for_each_entry_safe(link, tmp, &psock->link, list) {
                    if (link->link_raw == link_raw) {
                            ...
                            list_del(&link->list);
                            sk_psock_free_link(link);
                    }
            }
    
    And that includes the new link sock_map_add_link() added just before
    the unref.
    
    This results in a sockmap holding a socket, but without the respective
    link. This in turn means that close(sock) won't trigger the cleanup,
    i.e. a closed socket will not be automatically removed from the sockmap.
    
    Stop tearing the links when a matching link_raw is found.
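    
    A minimal sketch of "remove one matching link and stop", using a
    simple singly linked list instead of the kernel's list_head.  The
    types and names are hypothetical, and the sketch assumes new links
    are appended at the tail, so the first match is the older link.

```c
#include <assert.h>
#include <stddef.h>

struct link {
	struct link *next;
	void *link_raw; /* address of the map slot this link refers to */
};

/*
 * Unlink the first node whose link_raw matches, then stop.
 * Returns the removed node (the caller frees it) or NULL if none matched.
 */
static inline struct link *unlink_first_match(struct link **head,
					      void *link_raw)
{
	assert(head != NULL);
	for (struct link **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->link_raw == link_raw) {
			struct link *found = *pp;

			*pp = found->next;
			found->next = NULL;
			return found; /* do not keep scanning */
		}
	}
	return NULL;
}
```

    When the same socket is written to the same slot, two links with an
    identical link_raw exist briefly; stopping at the first match leaves
    the newly added one in place.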
    
    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20241202-sockmap-replace-v1-1-1e88579e7bd5@rbox.co
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog [+ + +]

Author: Jiri Olsa <jolsa@kernel.org>
Date:   Sun Dec 8 15:25:07 2024 +0100

    bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
    
    commit 978c4486cca5c7b9253d3ab98a88c8e769cb9bbd upstream.
    
    Syzbot reported [1] a crash that happens in the following tracing scenario:
    
      - create tracepoint perf event with attr.inherit=1, attach it to the
        process and set bpf program to it
      - attached process forks -> child creates inherited event
    
        the new child event shares the parent's bpf program and tp_event
        (hence prog_array) which is global for tracepoint
    
      - exit both process and its child -> release both events
      - first perf_event_detach_bpf_prog call will release tp_event->prog_array
        and second perf_event_detach_bpf_prog will crash, because
        tp_event->prog_array is NULL
    
    The fix makes sure the perf_event_detach_bpf_prog checks prog_array
    is valid before it tries to remove the bpf program from it.
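    
    The shape of the guard can be sketched as follows.  This is a toy
    model, not the kernel function: the structures are stand-ins, the
    name detach_one() is hypothetical, and "release" is modelled by
    setting the shared pointer to NULL.

```c
#include <assert.h>
#include <stddef.h>

struct prog_array { int nr_progs; };               /* stand-in */
struct tp_event { struct prog_array *prog_array; }; /* shared by events */

/* Returns 1 if a program was detached, 0 if there was nothing to do. */
static inline int detach_one(struct tp_event *tp)
{
	assert(tp != NULL);
	if (tp->prog_array == NULL)
		return 0; /* already released through another event */
	if (--tp->prog_array->nr_progs == 0)
		tp->prog_array = NULL; /* models releasing the array */
	return 1;
}
```

    Calling detach_one() twice on the same tp_event, as the parent and
    inherited child events would, makes the second call a no-op instead
    of a NULL dereference.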
    
    [1] https://lore.kernel.org/bpf/Z1MR6dCIKajNS6nU@krava/T/#m91dbf0688221ec7a7fc95e896a7ef9ff93b0b8ad
    
    Fixes: 0ee288e69d03 ("bpf,perf: Fix perf_event_detach_bpf_prog error handling")
    Reported-by: syzbot+2e0d2840414ce817aaac@syzkaller.appspotmail.com
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20241208142507.1207698-1-jolsa@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf: Augment raw_tp arguments with PTR_MAYBE_NULL [+ + +]

Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Dec 13 14:19:28 2024 -0800

    bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
    
    commit 838a10bd2ebfe11a60dd67687533a7cfc220cc86 upstream.
    
    Arguments to a raw tracepoint are tagged as trusted, which carries the
    semantics that the pointer will be non-NULL.  However, in certain cases,
    a raw tracepoint argument may end up being NULL. More context about this
    issue is available in [0].
    
    Thus, there is a discrepancy between the reality, that raw_tp arguments can
    actually be NULL, and the verifier's knowledge, that they are never NULL,
    causing explicit NULL check branches to be eliminated as dead code.
    
    A previous attempt [1], i.e. the second fixed commit, was made to
    simulate symbolic execution as if in most accesses, the argument is a
    non-NULL raw_tp, except for conditional jumps.  This tried to suppress
    branch prediction while preserving compatibility, but surfaced issues
    with production programs that were difficult to solve without increasing
    verifier complexity. A more complete discussion of issues and fixes is
    available at [2].
    
    Fix this by maintaining an explicit list of tracepoints where the
    arguments are known to be NULL, and mark the positional arguments as
    PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments
    are known to be ERR_PTR, and mark these arguments as scalar values to
    prevent potential dereference.
    
    Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2),
    shifted by the zero-indexed argument number x 4. This can be represented
    as follows:
    1st arg: 0x1
    2nd arg: 0x10
    3rd arg: 0x100
    ... and so on (likewise for ERR_PTR case).
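    
    The nibble-per-argument encoding described above can be sketched as
    two small helpers.  The macro and function names are hypothetical and
    chosen for illustration; only the 0x1/0x2 values and the shift by
    argument index times 4 come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARG_NULL    0x1u /* argument may be NULL */
#define ARG_ERR_PTR 0x2u /* argument may be an ERR_PTR */

/* Mask contribution of one zero-indexed argument. */
static inline uint64_t arg_flag(unsigned int arg, unsigned int flag)
{
	assert(arg < 16); /* 16 hex digits fit in a 64-bit mask */
	return (uint64_t)flag << (arg * 4);
}

/* Test whether `flag` is set for the zero-indexed argument `arg`. */
static inline bool arg_has_flag(uint64_t mask, unsigned int arg,
				unsigned int flag)
{
	assert(arg < 16);
	return ((mask >> (arg * 4)) & flag) != 0;
}
```

    A tracepoint whose first and third arguments may be NULL would then
    be described by arg_flag(0, ARG_NULL) | arg_flag(2, ARG_NULL), that
    is 0x101.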
    
    In the future, an automated pass will be used to produce such a list, or
    insert __nullable annotations automatically for tracepoints. Each
    compilation unit will be analyzed and results will be collated to find
    whether a tracepoint pointer is definitely not null, maybe null, or an
    unknown state where verifier conservatively marks it PTR_MAYBE_NULL.
    A proof of concept of this tool from Eduard is available at [3].
    
    Note that in case we don't find a specification in the raw_tp_null_args
    array and the tracepoint belongs to a kernel module, we will
    conservatively mark the arguments as PTR_MAYBE_NULL. This is because
    unlike for in-tree modules, out-of-tree module tracepoints may pass NULL
    freely to the tracepoint. We don't protect against such tracepoints
    passing ERR_PTR (which is uncommon anyway), lest we mark all such
    arguments as SCALAR_VALUE.
    
    While we are at it, let's adjust the test raw_tp_null to not perform
    dereference of the skb->mark, as that won't be allowed anymore, and make
    it more robust by using inline assembly to test the dead code
    elimination behavior, which should still stay the same.
    
      [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
      [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
      [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
      [3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
    
    Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
    Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
    Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
    Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
    Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
    Co-developed-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf: Check size for BTF-based ctx access of pointer members [+ + +]

Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Thu Dec 12 01:20:49 2024 -0800

    bpf: Check size for BTF-based ctx access of pointer members
    
    commit 659b9ba7cb2d7adb64618b87ddfaa528a143766e upstream.
    
    Robert Morris reported the following program type which passes the
    verifier in [0]:
    
    SEC("struct_ops/bpf_cubic_init")
    void BPF_PROG(bpf_cubic_init, struct sock *sk)
    {
            asm volatile("r2 = *(u16*)(r1 + 0)");     // verifier should demand u64
            asm volatile("*(u32 *)(r2 +1504) = 0");   // 1280 in some configs
    }
    
    The second line may or may not work, but the first instruction shouldn't
    pass, as it's a narrow load into the context structure of the struct ops
    callback. The code falls back to btf_ctx_access to ensure correctness
    and obtaining the types of pointers. Ensure that the size of the access
    is correctly checked to be 8 bytes, otherwise the verifier thinks the
    narrow load obtained a trusted BTF pointer and will permit loads/stores
    as it sees fit.
    
    Perform the check on size after we've verified that the load is for a
    pointer field, as for scalar values narrow loads are fine. Access to
    structs passed as arguments to a BPF program are also treated as
    scalars, therefore no adjustment is needed in their case.
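    
    The rule can be summarised by a small predicate.  This is a sketch of
    the policy only, not the verifier code: the function name is
    hypothetical, and whether a context slot has pointer type is passed
    in as a flag rather than derived from BTF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a context load of `access_size` bytes is acceptable. */
static inline bool ctx_load_allowed(bool is_pointer, size_t access_size)
{
	assert(access_size == 1 || access_size == 2 ||
	       access_size == 4 || access_size == 8);
	if (is_pointer)
		return access_size == 8; /* pointers need a full 8-byte load */
	return true; /* narrow loads of scalars are fine */
}
```

    Under this rule the u16 load in the reported program is rejected
    because the first context slot holds a pointer, while the same narrow
    load from a scalar slot remains valid.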
    
    Existing verifier selftests are broken by this change, but only because
    they were incorrect. Verifier tests for d_path were performing a narrow
    load into the context to obtain the path pointer; had this program
    actually run, it would have caused a crash. The same holds for the
    verifier_btf_ctx_access tests.
    
      [0]: https://lore.kernel.org/bpf/51338.1732985814@localhost
    
    Fixes: 9e15db66136a ("bpf: Implement accurate raw_tp context access via BTF")
    Reported-by: Robert Morris <rtm@mit.edu>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20241212092050.3204165-2-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf: Fix theoretical prog_array UAF in __uprobe_perf_func() [+ + +]

Author: Jann Horn <jannh@google.com>
Date:   Tue Dec 10 20:08:14 2024 +0100

    bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
    
    commit 7d0d673627e20cfa3b21a829a896ce03b58a4f1c upstream.
    
    Currently, the pointer stored in call->prog_array is loaded in
    __uprobe_perf_func(), with no RCU annotation and no immediately visible
    RCU protection, so it looks as if the loaded pointer can immediately be
    dangling.
    Later, bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical
    section, but this is too late. It then uses rcu_dereference_check(), but
    this use of rcu_dereference_check() does not actually dereference anything.
    
    Fix it by aligning the semantics to bpf_prog_run_array(): Let the caller
    provide rcu_read_lock_trace() protection and then load call->prog_array
    with rcu_dereference_check().
    
    This issue seems to be theoretical: I don't know of any way to reach this
    code without having handle_swbp() further up the stack, which is already
    holding a rcu_read_lock_trace() lock, so where we take
    rcu_read_lock_trace() in __uprobe_perf_func()/bpf_prog_run_array_uprobe()
    doesn't actually have any effect.
    
    Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jann Horn <jannh@google.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20241210-bpf-fix-uprobe-uaf-v4-1-5fc8959b2b74@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors [+ + +]

Author: Jann Horn <jannh@google.com>
Date:   Tue Dec 10 17:32:13 2024 +0100

    bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
    
    commit ef1b808e3b7c98612feceedf985c2fbbeb28f956 upstream.
    
    Uprobes always use bpf_prog_run_array_uprobe() under tasks-trace-RCU
    protection. But it is possible to attach a non-sleepable BPF program to a
    uprobe, and non-sleepable BPF programs are freed via normal RCU (see
    __bpf_prog_put_noref()). This leads to UAF of the bpf_prog because a normal
    RCU grace period does not imply a tasks-trace-RCU grace period.
    
    Fix it by explicitly waiting for a tasks-trace-RCU grace period after
    removing the attachment of a bpf_prog to a perf_event.
    
    Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Jann Horn <jannh@google.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20241210-bpf-fix-actual-uprobe-uaf-v1-1-19439849dd44@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL" [+ + +]

Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Dec 13 14:19:27 2024 -0800

    bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
    
    commit c00d738e1673ab801e1577e4e3c780ccf88b1a5b upstream.
    
    This patch reverts commit
    cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The
    patch was well-intended and meant to be a stop-gap fixing branch
    prediction when the pointer may actually be NULL at runtime. Eventually,
    it was supposed to be replaced by an automated script or compiler pass
    detecting possibly NULL arguments and marking them accordingly.
    
    However, it caused two main issues observed for production programs and
    failed to preserve backwards compatibility. First, programs relied on
    the verifier not exploring == NULL branch when pointer is not NULL, thus
    they started failing with a 'dereference of scalar' error.  Next,
    allowing raw_tp arguments to be modified surfaced the warning in the
    verifier that warns against reg->off when PTR_MAYBE_NULL is set.
    
    More information, context, and discussion on both problems is available
    in [0]. Overall, this approach had several shortcomings, and the fixes
    would further complicate the verifier's logic, and the entire masking
    scheme would have to be removed eventually anyway.
    
    Hence, revert the patch in preparation of a better fix avoiding these
    issues to replace this commit.
    
      [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
    
    Reported-by: Manu Bretelle <chantra@meta.com>
    Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cifs: Fix rmdir failure due to ongoing I/O on deleted file [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Fri Dec 6 11:06:32 2024 +0000

    cifs: Fix rmdir failure due to ongoing I/O on deleted file
    
    [ Upstream commit bb57c81e97e0082abfb0406ed6f67c615c3d206c ]
    
    The cifs_io_request struct (a wrapper around netfs_io_request) holds open
    the file on the server, even beyond the local Linux file being closed.
    This can cause problems with Windows-based filesystems as the file's name
    still exists after deletion until the file is closed, preventing the parent
    directory from being removed and causing spurious test failures in xfstests
    due to inability to remove a directory.  The symptom looks something like
    this in the test output:
    
       rm: cannot remove '/mnt/scratch/test/p0/d3': Directory not empty
       rm: cannot remove '/mnt/scratch/test/p1/dc/dae': Directory not empty
    
    Fix this by waiting in unlink and rename for any outstanding I/O requests
    to be completed on the target file before removing that file.
    
    Note that this doesn't prevent Linux from trying to start new requests
    after deletion if it still has the file open locally - something that's
    perfectly acceptable on a UNIX system.
    
    Note also that whilst I've marked this as fixing the commit to make cifs
    use netfslib, I don't know that it won't occur before that.
    
    Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: linux-cifs@vger.kernel.org
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

clk: en7523: Fix wrong BUS clock for EN7581 [+ + +]

Author: Christian Marangi <ansuelsmth@gmail.com>
Date:   Sat Nov 16 11:56:53 2024 +0100

    clk: en7523: Fix wrong BUS clock for EN7581
    
    commit 2eb75f86d52565367211c51334d15fe672633085 upstream.
    
    The Documentation for EN7581 had a typo and still referenced the EN7523
    BUS base source frequency. This was in conflict with a different page in
    the Documentation that states that the BUS runs at 300MHz (600MHz source
    with divisor set to 2) and the actual watchdog that ticks at half the BUS
    clock (150MHz). This was verified with the watchdog by timing the
    seconds that the system takes to reboot (due to the watchdog) and by
    operating on different values of the BUS divisor.
    
    The correct values for source of BUS clock are 600MHz and 540MHz.
    
    This was also confirmed by Airoha.
    
    Cc: stable@vger.kernel.org
    Fixes: 66bc47326ce2 ("clk: en7523: Add EN7581 support")
    Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
    Link: https://lore.kernel.org/r/20241116105710.19748-1-ansuelsmth@gmail.com
    Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Stephen Boyd <sboyd@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
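
The frequency relationships quoted in the message above can be checked with a little arithmetic. This is only an illustrative sketch: the helper names demo_bus_hz() and demo_watchdog_hz() are made up here and are not part of the clk-en7523 driver.

```c
/* Hypothetical helper: BUS clock derived from its source and a divisor. */
static unsigned long demo_bus_hz(unsigned long source_hz, unsigned int divisor)
{
	return source_hz / divisor;
}

/* Hypothetical helper: the message says the watchdog ticks at half the BUS clock. */
static unsigned long demo_watchdog_hz(unsigned long bus_hz)
{
	return bus_hz / 2;
}
```

With a 600MHz source and a divisor of 2 this gives a 300MHz BUS clock and a 150MHz watchdog tick, matching the figures in the commit message.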

crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem [+ + +]

Author: Chenghai Huang <huangchenghai2@huawei.com>
Date:   Sat Nov 30 16:01:31 2024 +0800

    crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem
    
    commit cd26cd65476711e2c69e0a049c0eeef4b743f5ac upstream.
    
    Offset based on (id * size) is wrong for sqc and cqc.
    (*sqc/*cqc + 1) can already offset sizeof(struct(Xqc)) length.
    
    Fixes: 15f112f9cef5 ("crypto: hisilicon/debugfs - mask the unnecessary info from the dump")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
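
The C rule behind this fix is that arithmetic on a typed pointer is already scaled by the size of the pointed-to type, so multiplying an index by the structure size as well scales the offset twice. A minimal user-space sketch follows; struct demo_qc and demo_byte_distance() are hypothetical stand-ins, not the driver's sqc/cqc definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queue context structure such as sqc or cqc. */
struct demo_qc {
	unsigned int w[8];
};

/* Number of bytes between two elements of the same array. */
static size_t demo_byte_distance(const struct demo_qc *from,
				 const struct demo_qc *to)
{
	return (size_t)((const char *)to - (const char *)from);
}
```

For an array arr, demo_byte_distance(arr, arr + 1) equals sizeof(struct demo_qc), so indexing element id only needs arr + id.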

cxgb4: use port number to set mac addr [+ + +]

Author: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Date:   Fri Dec 6 11:50:14 2024 +0530

    cxgb4: use port number to set mac addr
    
    [ Upstream commit 356983f569c1f5991661fc0050aa263792f50616 ]
    
    t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl()
    uses port number to get mac addr. This leads to an error when attempting
    to set the MAC address on VFs of PF2 and PF3.
    This patch fixes the issue by using port number to set mac address.
    
    Fixes: e0cdac65ba26 ("cxgb4vf: configure ports accessible by the VF")
    Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
    Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dm: Fix dm-zoned-reclaim zone write pointer alignment [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Dec 9 21:23:56 2024 +0900

    dm: Fix dm-zoned-reclaim zone write pointer alignment
    
    commit b76b840fd93374240b59825f1ab8e2f5c9907acb upstream.
    
    The zone reclaim processing of the dm-zoned device mapper uses
    blkdev_issue_zeroout() to align the write pointer of a zone being used
    for reclaiming another zone, to write the valid data blocks from the
    zone being reclaimed at the same position relative to the zone start in
    the reclaim target zone.
    
    The first call to blkdev_issue_zeroout() will try to use hardware
    offload using a REQ_OP_WRITE_ZEROES operation if the device reports a
    non-zero max_write_zeroes_sectors queue limit. If this operation fails
    because of the lack of hardware support, blkdev_issue_zeroout() falls
    back to using a regular write operation with the zero-page as buffer.
    Currently, such REQ_OP_WRITE_ZEROES failure is automatically handled by
    the block layer zone write plugging code which will execute a report
    zones operation to ensure that the write pointer of the target zone of
    the failed operation has not changed and to "rewind" the zone write
    pointer offset of the target zone as it was advanced when the write zero
    operation was submitted. So the REQ_OP_WRITE_ZEROES failure does not
    cause any issue and blkdev_issue_zeroout() works as expected.
    
    However, since the automatic recovery of zone write pointers by the zone
    write plugging code can potentially cause deadlocks with queue freeze
    operations, a different recovery must be implemented in preparation for
    the removal of zone write plugging report zones based recovery.
    
    Do this by introducing the new function blk_zone_issue_zeroout(). This
    function first calls blkdev_issue_zeroout() with the flag
    BLKDEV_ZERO_NOFALLBACK to intercept failures on the first execution
    which attempt to use the device hardware offload with the
    REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zone
    operation is issued to restore the zone write pointer offset of the
    target zone to the correct position and blkdev_issue_zeroout() is called
    again without the BLKDEV_ZERO_NOFALLBACK flag. The report zones
    operation performing this recovery is implemented using the helper
    function disk_zone_sync_wp_offset() which calls the gendisk report_zones
    file operation with the callback disk_report_zones_cb(). This callback
    updates the target write pointer offset of the target zone using the new
    function disk_zone_wplug_sync_wp_offset().
    
    dmz_reclaim_align_wp() is modified to change its call to
    blkdev_issue_zeroout() to a call to blk_zone_issue_zeroout() without any
    other change needed as the two functions are functionally equivalent.
    
    Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Mike Snitzer <snitzer@kernel.org>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20241209122357.47838-4-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Documentation: networking: Add a caveat to nexthop_compat_mode sysctl [+ + +]

Author: Petr Machata <petrm@nvidia.com>
Date:   Mon Dec 9 12:05:31 2024 +0100

    Documentation: networking: Add a caveat to nexthop_compat_mode sysctl
    
    [ Upstream commit bbe4b41259a3e255a16d795486d331c1670b4e75 ]
    
    net.ipv4.nexthop_compat_mode was added when nexthop objects were added to
    provide the view of nexthop objects through the usual lens of the route
    UAPI. As nexthop objects evolved, the information provided through this
    lens became incomplete. For example, details of resilient nexthop groups
    are obviously omitted.
    
    Now that 16-bit nexthop group weights are a thing, the 8-bit UAPI cannot
    convey the >8-bit weight accurately. Instead of inventing workarounds for
    an obsolete interface, just document the expectations of inaccuracy.
    
    Fixes: b72a6a7ab957 ("net: nexthop: Increase weight to u16")
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/b575e32399ccacd09079b2a218255164535123bd.1733740749.git.petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
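
Why an 8-bit field cannot carry a 16-bit weight can be shown with plain C integer narrowing. This sketch only demonstrates the loss of information; demo_weight_fits_u8() is a made-up helper, and how the kernel actually fills the legacy field is not described in the message above.

```c
#include <stdint.h>

/* Returns 1 if a 16-bit weight survives a round trip through an 8-bit field. */
static int demo_weight_fits_u8(uint16_t weight)
{
	uint8_t narrow = (uint8_t)weight;  /* keeps only the low 8 bits */

	return narrow == weight;
}
```

A weight of 200 survives the round trip, while a weight of 300 does not, since only its low 8 bits are kept.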

Documentation: PM: Clarify pm_runtime_resume_and_get() return value [+ + +]

Author: Paul Barker <paul.barker.ct@bp.renesas.com>
Date:   Tue Dec 3 14:37:29 2024 +0000

    Documentation: PM: Clarify pm_runtime_resume_and_get() return value
    
    [ Upstream commit ccb84dc8f4a02e7d30ffd388522996546b4d00e1 ]
    
    Update the documentation to match the behaviour of the code.
    
    pm_runtime_resume_and_get() always returns 0 on success, even if
    __pm_runtime_resume() returns 1.
    
    Fixes: 2c412337cfe6 ("PM: runtime: Add documentation for pm_runtime_resume_and_get()")
    Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
    Link: https://patch.msgid.link/20241203143729.478-1-paul.barker.ct@bp.renesas.com
    [ rjw: Subject and changelog edits, adjusted new comment formatting ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
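
The documented behaviour can be pictured with a small user-space mock. The names below (mock_device, mock_resume_get(), mock_resume_and_get()) are hypothetical and only imitate the shape of the helper: a positive result from the resume step is reported as 0, and, as an assumption of this sketch, the usage reference is dropped again on failure.

```c
/* Hypothetical mock of a device with a runtime PM usage counter. */
struct mock_device {
	int usage_count;    /* stands in for the usage counter */
	int resume_result;  /* value the mocked resume step returns */
};

/* Mocked resume step: takes a reference, then reports a result.
 * 1 models "already active", 0 "resumed", negative an error. */
static int mock_resume_get(struct mock_device *dev)
{
	dev->usage_count++;
	return dev->resume_result;
}

/* Wrapper in the style described above: success is always reported as 0. */
static int mock_resume_and_get(struct mock_device *dev)
{
	int ret = mock_resume_get(dev);

	if (ret < 0) {
		dev->usage_count--;  /* drop the reference taken above */
		return ret;
	}
	return 0;                    /* both 0 and 1 collapse to 0 */
}
```

A caller can therefore treat any non-zero return value as an error without checking for a positive success code.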

drm/amd/pm: Set SMU v13.0.7 default workload type [+ + +]

Author: Kenneth Feng <kenneth.feng@amd.com>
Date:   Wed Dec 4 13:22:10 2024 +0530

    drm/amd/pm: Set SMU v13.0.7 default workload type
    
    commit 3912a78cf72eb45f8153a395162b08fef9c5ec3d upstream.
    
    Set the default workload type to bootup type on smu v13.0.7.
    This is because of the constraint on smu v13.0.7.
    Gfx activity has an even higher set point on 3D fullscreen
    mode than the one on bootup mode. This causes the 3D fullscreen
    mode's performance to be worse than the bootup mode's performance
    for lightweight/medium workloads. For the high workload,
    the performance is the same between 3D fullscreen mode and bootup
    mode.
    
    v2: set the default workload in ASIC specific file
    
    Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
    Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org # 6.11.x
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: fix UVD contiguous CS mapping problem [+ + +]

Author: Christian König <christian.koenig@amd.com>
Date:   Fri Nov 29 14:19:21 2024 +0100

    drm/amdgpu: fix UVD contiguous CS mapping problem
    
    commit 12f325bcd2411e571dbb500bf6862c812c479735 upstream.
    
    When starting the mpv player, Radeon R9 users are observing
    the below error in dmesg.
    
    [drm:amdgpu_uvd_cs_pass2 [amdgpu]]
    *ERROR* msg/fb buffer ff00f7c000-ff00f7e000 out of 256MB segment!
    
    The patch tries to set TTM_PL_FLAG_CONTIGUOUS both when the user flag
    (AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) is set and when it is not set.
    
    v2: Make the TTM_PL_FLAG_CONTIGUOUS mandatory for user BO's.
    v3: revert back to v1, but fix the check instead (chk).
    
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3599
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501
    Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org # 6.10+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: fix when the cleaner shader is emitted [+ + +]

Author: Christian König <christian.koenig@amd.com>
Date:   Fri Dec 6 14:46:06 2024 +0100

    drm/amdgpu: fix when the cleaner shader is emitted
    
    commit f4df208177d02f1c90f3644da3a2453080b8c24f upstream.
    
    Emitting the cleaner shader must come after the check if a VM switch is
    necessary or not.
    
    Otherwise we will emit the cleaner shader every time and not just when it is
    necessary because we switched between applications.
    
    This can otherwise crash on gang submit and probably decreases performance
    quite a bit.
    
    v2: squash in fix from Srini (Alex)
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Fixes: ee7a846ea27b ("drm/amdgpu: Emit cleaner shader at end of IB submission")
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdkfd: Dereference null return value [+ + +]

Author: Andrew Martin <Andrew.Martin@amd.com>
Date:   Tue Nov 26 12:10:59 2024 -0500

    drm/amdkfd: Dereference null return value
    
    commit a592bb19abdc2072875c87da606461bfd7821b08 upstream.
    
    In the function pqm_uninit there is a call-assignment of "pdd =
    kfd_get_process_device_data" which could be null, and this value was
    later dereferenced without checking.
    
    Fixes: fb91065851cd ("drm/amdkfd: Refactor queue wptr_bo GART mapping")
    Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
    Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdkfd: hard-code cacheline size for gfx11 [+ + +]

Author: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Date:   Wed Nov 27 14:01:35 2024 -0500

    drm/amdkfd: hard-code cacheline size for gfx11
    
    commit 321048c4a3e375416b51b4093978f9ce2aa4d391 upstream.
    
    This information is not available in ip discovery table.
    
    Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
    Reviewed-by: David Belanger <david.belanger@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12 [+ + +]

Author: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Date:   Thu Nov 28 11:07:57 2024 -0500

    drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12
    
    commit d50bf3f0fab636574c163ba8b5863e12b1ed19bd upstream.
    
    This information is not available in ip discovery table.
    
    Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
    Reviewed-by: David Belanger <david.belanger@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdkfd: pause autosuspend when creating pdd [+ + +]

Author: Jesse.zhang@amd.com <Jesse.zhang@amd.com>
Date:   Thu Dec 5 17:41:26 2024 +0800

    drm/amdkfd: pause autosuspend when creating pdd
    
    commit 438b39ac74e2a9dc0a5c9d653b7d8066877e86b1 upstream.
    
    When using MES creating a pdd will require talking to the GPU to
    setup the relevant context. The code here forgot to wake up the GPU
    in case it was in suspend, this causes KVM to EFAULT for passthrough
    GPU for example. This issue can be masked if the GPU was woken up by
    other things (e.g. opening the KMS node) first and has not yet gone to sleep.
    
    v4: do the allocation of proc_ctx_bo in a lazy fashion
    when the first queue is created in a process (Felix)
    
    Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
    Reviewed-by: Yunxiang Li <Yunxiang.Li@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915/color: Stop using non-posted DSB writes for legacy LUT [+ + +]

Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Nov 20 18:41:21 2024 +0200

    drm/i915/color: Stop using non-posted DSB writes for legacy LUT
    
    commit cd3da567e2e46b8f75549637b960a83b024d6b6e upstream.
    
    DSB LUT register writes vs. palette anti-collision logic
    appear to interact in interesting ways:
    - posted DSB writes simply vanish into thin air while
      anti-collision is active
    - non-posted DSB writes actually get blocked by the anti-collision
      logic, but unfortunately this ends up hogging the bus for
      long enough that unrelated parallel CPU MMIO accesses start
      to disappear instead
    
    Even though we are updating the LUT during vblank we aren't
    immune to the anti-collision logic because it kicks in briefly
    for pipe prefill (initiated at frame start). The safe time
    window for performing the LUT update is thus between the
    undelayed vblank and frame start. Turns out that with low
    enough CDCLK frequency (DSB execution speed depends on CDCLK)
    we can exceed that.
    
    We are currently using non-posted writes for the legacy LUT
    updates, so we can hit the far more severe failure
    mode. The problem is exacerbated by the fact that non-posted
    writes are much slower than posted writes (~4x it seems).
    
    To mitigate the problem let's switch to using posted DSB
    writes for legacy LUT updates (which will involve using the
    double write approach to avoid other problems with DSB
    vs. legacy LUT writes). Despite writing each register twice
    this will in fact make the legacy LUT update faster when
    compared to the non-posted write approach, making the
    problem less likely to appear. The failure mode is also
    less severe.
    
    This isn't the 100% solution we need though. That will involve
    estimating how long the LUT update will take, and pushing
    frame start and/or delayed vblank forward to guarantee that
    the update will have finished by the time the pipe prefill
    starts...
    
    Cc: stable@vger.kernel.org
    Fixes: 34d8311f4a1c ("drm/i915/dsb: Re-instate DSB for LUT updates")
    Fixes: 25ea3411bd23 ("drm/i915/dsb: Use non-posted register writes for legacy LUT")
    Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12494
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241120164123.12706-3-ville.syrjala@linux.intel.com
    Reviewed-by: Uma Shankar <uma.shankar@intel.com>
    (cherry picked from commit 2504a316b35d49522f39cf0dc01830d7c36a9be4)
    Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915: Fix memory leak by correcting cache object name in error handler [+ + +]

Author: Jiasheng Jiang <jiashengjiangcool@outlook.com>
Date:   Wed Nov 27 20:10:42 2024 +0000

    drm/i915: Fix memory leak by correcting cache object name in error handler
    
    commit 2828e5808bcd5aae7fdcd169cac1efa2701fa2dd upstream.
    
    Replace "slab_priorities" with "slab_dependencies" in the error handler
    to avoid memory leak.
    
    Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
    Cc: <stable@vger.kernel.org> # v5.2+
    Signed-off-by: Jiasheng Jiang <jiashengjiangcool@outlook.com>
    Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
    Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
    Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241127201042.29620-1-jiashengjiangcool@gmail.com
    (cherry picked from commit 9bc5e7dc694d3112bbf0fa4c46ef0fa0f114937a)
    Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
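
The class of bug fixed here is an error path that releases the wrong object: when the second of two allocations fails, it is the first one that still needs to be freed. The user-space sketch below imitates that pattern with malloc(); all names (demo_cache_create(), demo_cache_destroy(), demo_init(), cache_a, cache_b) are hypothetical and do not mirror the i915 code.

```c
#include <stdlib.h>

static int live_caches;          /* number of caches currently allocated */
static void *cache_a, *cache_b;  /* hypothetical stand-ins for two caches */

/* Creates a cache, or returns NULL when a failure is injected. */
static void *demo_cache_create(int fail)
{
	void *c;

	if (fail)
		return NULL;
	c = malloc(1);
	if (c)
		live_caches++;
	return c;
}

/* Destroys a cache; a NULL argument is ignored. */
static void demo_cache_destroy(void *c)
{
	if (!c)
		return;
	live_caches--;
	free(c);
}

/* Creates both caches; on failure of the second, releases the first. */
static int demo_init(int fail_second)
{
	cache_a = demo_cache_create(0);
	if (!cache_a)
		return -1;

	cache_b = demo_cache_create(fail_second);
	if (!cache_b)
		goto err_b;

	return 0;

err_b:
	/* Destroying cache_b here would be a no-op and would leak cache_a. */
	demo_cache_destroy(cache_a);
	cache_a = NULL;
	return -1;
}
```

If the error label destroyed cache_b instead, live_caches would stay at 1 after a failed demo_init(1), which is the leak in miniature.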

drm/i915: Fix NULL pointer dereference in capture_engine [+ + +]

Author: Eugene Kobyak <eugene.kobyak@intel.com>
Date:   Tue Dec 3 14:54:06 2024 +0000

    drm/i915: Fix NULL pointer dereference in capture_engine
    
    commit da0b986256ae9a78b0215214ff44f271bfe237c1 upstream.
    
    When the intel_context structure contains NULL,
    it raises a NULL pointer dereference error in drm_info().
    
    Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
    Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
    Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
    Cc: John Harrison <John.C.Harrison@Intel.com>
    Cc: <stable@vger.kernel.org> # v6.3+
    Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>
    Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/xmsgfynkhycw3cf56akp4he2ffg44vuratocsysaowbsnhutzi@augnqbm777at
    (cherry picked from commit 754302a5bc1bd8fd3b7d85c168b0a1af6d4bba4d)
    Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/panic: remove spurious empty line to clean warning [+ + +]

Author: Miguel Ojeda <ojeda@kernel.org>
Date:   Tue Nov 26 00:33:32 2024 +0100

    drm/panic: remove spurious empty line to clean warning
    
    commit 4011b351b1b5a953aaa7c6b3915f908b3cc1be96 upstream.
    
    Clippy in the upcoming Rust 1.83.0 spots a spurious empty line since the
    `clippy::empty_line_after_doc_comments` warning is now enabled by default
    given it is part of the `suspicious` group [1]:
    
        error: empty line after doc comment
           --> drivers/gpu/drm/drm_panic_qr.rs:931:1
            |
        931 | / /// They must remain valid for the duration of the function call.
        932 | |
            | |_
        933 |   #[no_mangle]
        934 | / pub unsafe extern "C" fn drm_panic_qr_generate(
        935 | |     url: *const i8,
        936 | |     data: *mut u8,
        937 | |     data_len: usize,
        ...   |
        940 | |     tmp_size: usize,
        941 | | ) -> u8 {
            | |_______- the comment documents this function
            |
            = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments
            = note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings`
            = help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]`
            = help: if the empty line is unintentional remove it
    
    Thus remove the empty line.
    
    Cc: stable@vger.kernel.org
    Fixes: cb5164ac43d0 ("drm/panic: Add a QR code panic screen")
    Link: https://github.com/rust-lang/rust-clippy/pull/13091 [1]
    Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
    Link: https://lore.kernel.org/r/20241125233332.697497-1-ojeda@kernel.org
    Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/xe/reg_sr: Remove register pool [+ + +]

Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date:   Mon Dec 9 15:27:35 2024 -0800

    drm/xe/reg_sr: Remove register pool
    
    [ Upstream commit d7b028656c29b22fcde1c6ee1df5b28fbba987b5 ]
    
    That pool implementation doesn't really work: if the krealloc happens to
    move the memory and return another address, the entries in the xarray
    become invalid, leading to use-after-free later:
    
            BUG: KASAN: slab-use-after-free in xe_reg_sr_apply_mmio+0x570/0x760 [xe]
            Read of size 4 at addr ffff8881244b2590 by task modprobe/2753
    
            Allocated by task 2753:
             kasan_save_stack+0x39/0x70
             kasan_save_track+0x14/0x40
             kasan_save_alloc_info+0x37/0x60
             __kasan_kmalloc+0xc3/0xd0
             __kmalloc_node_track_caller_noprof+0x200/0x6d0
             krealloc_noprof+0x229/0x380
    
    Simplify the code to fix the bug. A better pooling strategy may be added
    back later if needed.
    
    Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
    Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-2-lucas.demarchi@intel.com
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
    (cherry picked from commit e5283bd4dfecbd3335f43b62a68e24dae23f59e4)
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
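
The hazard described above, pointers into a growable buffer going stale when the buffer is reallocated, also exists in user space with realloc(). The sketch below shows one generic way to stay safe, which is to hand out indices instead of pointers. It is not the approach taken by this commit (which removes the pool), and demo_entry, demo_pool, demo_pool_add() and demo_pool_get() are hypothetical names.

```c
#include <stdlib.h>

struct demo_entry {
	unsigned int reg;
	unsigned int val;
};

/* Hypothetical growable pool; callers keep indices, never pointers. */
struct demo_pool {
	struct demo_entry *items;
	size_t count;
	size_t cap;
};

/* Appends an entry and returns its index, or (size_t)-1 on allocation failure. */
static size_t demo_pool_add(struct demo_pool *p, unsigned int reg,
			    unsigned int val)
{
	if (p->count == p->cap) {
		size_t new_cap = p->cap ? p->cap * 2 : 4;
		struct demo_entry *tmp;

		/* realloc() may move the block; old pointers become invalid. */
		tmp = realloc(p->items, new_cap * sizeof(*tmp));
		if (!tmp)
			return (size_t)-1;
		p->items = tmp;
		p->cap = new_cap;
	}
	p->items[p->count].reg = reg;
	p->items[p->count].val = val;
	return p->count++;
}

/* Looks an entry up by index; the pointer is valid only until the next add. */
static struct demo_entry *demo_pool_get(struct demo_pool *p, size_t idx)
{
	return idx < p->count ? &p->items[idx] : NULL;
}
```

An index stays meaningful across any number of reallocations, whereas a pointer saved before a later demo_pool_add() call may refer to freed memory.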

drm/xe: Call invalidation_fence_fini for PT inval fences in error state [+ + +]

Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date:   Thu Dec 5 17:50:22 2024 -0800

    drm/xe: Call invalidation_fence_fini for PT inval fences in error state
    
    commit cefade70f346160f47cc24776160329e2ee63653 upstream.
    
    Invalidation_fence_init takes a PM reference, which is released in its
    _fini counterpart, so we need to make sure that the latter is called,
    even if the fence is in an error state.
    
    Since we already have a function that calls _fini() and signals the
    fence in the tlb inval code, we can expose that and call it from the PT
    code.
    
    Fixes: f002702290fc ("drm/xe: Hold a PM ref when GT TLB invalidations are inflight")
    Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: <stable@vger.kernel.org> # v6.11+
    Cc: Matthew Brost <matthew.brost@intel.com>
    Cc: Nirmoy Das <nirmoy.das@intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
    Reviewed-by: Matthew Brost <matthew.brost@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241206015022.1567113-1-daniele.ceraolospurio@intel.com
    (cherry picked from commit 65338639b79ce88aef5263cd518cde570a3c7c8e)
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt [+ + +]

Author: Mirsad Todorovac <mtodorovac69@gmail.com>
Date:   Thu Nov 21 22:20:58 2024 +0100

    drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt
    
    [ Upstream commit ed69b28b3a5e39871ba5599992f80562d6ee59db ]
    
    Running coccinelle spatch gave the following warning:
    
    ./drivers/gpu/drm/xe/tests/xe_migrate.c:226:5-11: inconsistent IS_ERR
    and PTR_ERR on line 228.
    
    The code reports PTR_ERR(pt) when IS_ERR(tiny) is checked:
    
    → 211  pt = xe_bo_create_pin_map(xe, tile, m->q->vm, XE_PAGE_SIZE,
      212                            ttm_bo_type_kernel,
      213                            XE_BO_FLAG_VRAM_IF_DGFX(tile) |
      214                            XE_BO_FLAG_PINNED);
      215  if (IS_ERR(pt)) {
      216          KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
      217                     PTR_ERR(pt));
      218          goto free_big;
      219  }
      220
      221  tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
    → 222                              2 * SZ_4K,
      223                              ttm_bo_type_kernel,
      224                              XE_BO_FLAG_VRAM_IF_DGFX(tile) |
      225                              XE_BO_FLAG_PINNED);
    → 226  if (IS_ERR(tiny)) {
    → 227          KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
    → 228                     PTR_ERR(pt));
      229          goto free_pt;
      230  }
    
    Now, the IS_ERR(tiny) and the corresponding PTR_ERR(pt) do not match.
    
    Returning PTR_ERR(tiny), as the last failed function call, seems logical.
    
    Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
    Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241121212057.1526634-2-mtodorovac69@gmail.com
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    (cherry picked from commit cb57c75098c1c449a007ba301f9073f96febaaa9)
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
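
To see why reporting the error code of the wrong pointer is misleading, the error-pointer convention can be approximated in user space. The helpers below (demo_err_ptr(), demo_ptr_err(), demo_is_err(), DEMO_MAX_ERRNO) are simplified, hypothetical imitations written for this sketch and are not the kernel's definitions.

```c
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encodes a negative error code in a pointer value. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Decodes a pointer value back into a number. */
static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer value lies in the range reserved for error codes. */
static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

If pt is a valid pointer and tiny holds an encoded error, decoding pt yields a number derived from an address rather than an error code; only decoding tiny returns the code of the allocation that actually failed.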

gpio: graniterapids: Check if GPIO line can be used for IRQs [+ + +]

Author: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
Date:   Wed Dec 4 09:04:14 2024 +0200

    gpio: graniterapids: Check if GPIO line can be used for IRQs
    
    commit c0ec4890d6454980c53c3cc164140115c4a671f2 upstream.
    
    GPIO line can only be used as interrupt if its INTSEL register is
    programmed by the BIOS.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Acked-by: Andy Shevchenko <andy@kernel.org>
    Link: https://lore.kernel.org/r/20241204070415.1034449-7-mika.westerberg@linux.intel.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpio: graniterapids: Determine if GPIO pad can be used by driver [+ + +]

Author: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
Date:   Wed Dec 4 09:04:13 2024 +0200

    gpio: graniterapids: Determine if GPIO pad can be used by driver
    
    commit 0588504d28dedde6789aec17a6ece6fa8e477725 upstream.
    
    Add check of HOSTSW_MODE bit to determine if GPIO pad can be used by the
    driver.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Acked-by: Andy Shevchenko <andy@kernel.org>
    Link: https://lore.kernel.org/r/20241204070415.1034449-6-mika.westerberg@linux.intel.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpio: graniterapids: Fix GPIO Ack functionality [+ + +]

Author: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
Date:   Wed Dec 4 09:04:15 2024 +0200

    gpio: graniterapids: Fix GPIO Ack functionality
    
    commit 0bb18e34abdde7bf58fca8542e2dcf621924ea19 upstream.
    
    Interrupt status (GPI_IS) register is cleared by writing 1 to it, not 0.
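    
    A minimal userspace sketch of write-1-to-clear semantics, assuming a
    hypothetical 32-bit status register model (struct w1c_reg and the
    helper names are illustrative and are not taken from the driver):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear (W1C) status register. */
struct w1c_reg {
	uint32_t status;	/* latched interrupt status bits */
};

/*
 * Model of a register write: every bit written as 1 clears the matching
 * status bit, every bit written as 0 leaves it untouched.
 */
static void w1c_write(struct w1c_reg *reg, uint32_t value)
{
	reg->status &= ~value;
}

/* Acknowledge one pending line by writing 1 to its bit position. */
static void ack_line(struct w1c_reg *reg, unsigned int line)
{
	assert(line < 32);
	w1c_write(reg, UINT32_C(1) << line);
}
```

    In this model, writing 0 is a no-op, which is why an ack that writes 0
    leaves the status bit pending.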
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Acked-by: Andy Shevchenko <andy@kernel.org>
    Link: https://lore.kernel.org/r/20241204070415.1034449-8-mika.westerberg@linux.intel.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpio: graniterapids: Fix incorrect BAR assignment [+ + +]

Author: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
Date:   Wed Dec 4 09:04:10 2024 +0200

    gpio: graniterapids: Fix incorrect BAR assignment
    
    commit 7382d2f0e802077c36495e325da8d253a15fb441 upstream.
    
    The base address of the vGPIO MMIO register is provided directly by the
    BIOS instead of using offsets. Update the address assignment in the
    driver to reflect this change.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Acked-by: Andy Shevchenko <andy@kernel.org>
    Link: https://lore.kernel.org/r/20241204070415.1034449-3-mika.westerberg@linux.intel.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpio: graniterapids: Fix invalid GPI_IS register offset [+ + +]

Author: Shankar Bandal <shankar.bandal@intel.com>
Date:   Wed Dec 4 09:04:11 2024 +0200

    gpio: graniterapids: Fix invalid GPI_IS register offset
    
    commit 0fe329b55231cca489f9bed1db0e778d077fdaf9 upstream.
    
    Update GPI Interrupt Status register offset to correct value.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Shankar Bandal <shankar.bandal@intel.com>
    Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Acked-by: Andy Shevchenko <andy@kernel.org>
    Link: https://lore.kernel.org/r/20241204070415.1034449-4-mika.westerberg@linux.intel.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpio: graniterapids: Fix invalid RXEVCFG register bitmask [+ + +]

Author: Shankar Bandal <shankar.bandal@intel.com>
Date:   Wed Dec 4 09:04:12 2024 +0200

    gpio: graniterapids: Fix invalid RXEVCFG register bitmask
    
    commit 15636b00a055474033426b94b6372728b2163a1e upstream.
    
    Correct RX Level/Edge Configuration register (RXEVCFG) bitmask.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Shankar Bandal <shankar.bandal@intel.com>
    Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Acked-by: Andy Shevchenko <andy@kernel.org>
    Link: https://lore.kernel.org/r/20241204070415.1034449-5-mika.westerberg@linux.intel.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpio: graniterapids: Fix vGPIO driver crash [+ + +]

Author: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
Date:   Wed Dec 4 09:04:09 2024 +0200

    gpio: graniterapids: Fix vGPIO driver crash
    
    commit eb9640fd1ce666610b77f5997596e9570a36378f upstream.
    
    Move setting irq_chip.name from probe() function to the initialization
    of "irq_chip" struct in order to fix vGPIO driver crash during bootup.
    
    The crash was caused by a write to the irq_chip.name field of an
    irq_chip struct that was initialized as const.
    
    This behavior is a consequence of suboptimal implementation of
    gpio_irq_chip_set_chip(), which should be changed to avoid
    casting away const qualifier.
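    
    A minimal userspace sketch of the pattern, assuming a hypothetical
    struct demo_irq_chip as a stand-in for the kernel's struct irq_chip
    (the field set and the "demo-gpio" name are illustrative only):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct irq_chip. */
struct demo_irq_chip {
	const char *name;
	unsigned long flags;
};

/*
 * The name is set in the initializer. The object is const and may be
 * placed in read-only memory, so nothing may write to it later, for
 * example from a probe function through a cast that drops const.
 */
static const struct demo_irq_chip demo_chip = {
	.name = "demo-gpio",
	.flags = 0,
};

/* Read-only accessor: compares the chip name without modifying the chip. */
static int demo_chip_is(const char *name)
{
	assert(name != NULL);
	return strcmp(demo_chip.name, name) == 0;
}
```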
    
    Crash log:
    BUG: unable to handle page fault for address: ffffffffc0ba81c0
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0003) - permissions violation
    CPU: 33 UID: 0 PID: 1075 Comm: systemd-udevd Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7 #1
    Hardware name: Intel Corporation Kaseyville RP/Kaseyville RP, BIOS KVLDCRB1.PGS.0026.D73.2410081258 10/08/2024
    RIP: 0010:gnr_gpio_probe+0x171/0x220 [gpio_graniterapids]
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Acked-by: Andy Shevchenko <andy@kernel.org>
    Link: https://lore.kernel.org/r/20241204070415.1034449-2-mika.westerberg@linux.intel.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpio: ljca: Initialize num before accessing item in ljca_gpio_config [+ + +]

Author: Haoyu Li <lihaoyu499@gmail.com>
Date:   Tue Dec 3 22:14:51 2024 +0800

    gpio: ljca: Initialize num before accessing item in ljca_gpio_config
    
    commit 3396995f9fb6bcbe0004a68118a22f98bab6e2b9 upstream.
    
    With the new __counted_by annotation in ljca_gpio_packet, the "num"
    struct member must be set before accessing the "item" array. Failing to
    do so will trigger a runtime warning when enabling CONFIG_UBSAN_BOUNDS
    and CONFIG_FORTIFY_SOURCE.
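    
    A minimal userspace sketch of the ordering requirement, assuming
    hypothetical demo_packet/demo_item types (not the driver's real
    layout) and a DEMO_COUNTED_BY macro that expands to nothing on
    compilers without the counted_by attribute:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Use the attribute where available; otherwise expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define DEMO_COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(m)
#endif

/* Hypothetical item and packet types, for illustration only. */
struct demo_item {
	uint8_t index;
	uint8_t value;
};

struct demo_packet {
	uint8_t num;
	struct demo_item item[] DEMO_COUNTED_BY(num);
};

static struct demo_packet *demo_packet_alloc(uint8_t n)
{
	struct demo_packet *p;
	uint8_t i;

	p = calloc(1, sizeof(*p) + (size_t)n * sizeof(struct demo_item));
	if (!p)
		return NULL;

	/* Set the count first: bounds checks on item[] are based on num. */
	p->num = n;

	for (i = 0; i < n; i++) {
		p->item[i].index = i;
		p->item[i].value = 0;
	}
	return p;
}
```

    Writing item[i] while num is still zero is what a bounds-checking
    build would flag, since every index is then out of range.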
    
    Fixes: 1034cc423f1b ("gpio: update Intel LJCA USB GPIO driver")
    Cc: stable@vger.kernel.org
    Signed-off-by: Haoyu Li <lihaoyu499@gmail.com>
    Link: https://lore.kernel.org/stable/20241203141451.342316-1-lihaoyu499%40gmail.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context [+ + +]

Author: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Date:   Fri Dec 6 10:01:14 2024 -0300

    iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context
    
    commit 1f806218164d1bb93f3db21eaf61254b08acdf03 upstream.
    
    During boot some of the calls to tegra241_cmdqv_get_cmdq() will happen
    in preemptible context. As this function calls smp_processor_id(), if
    CONFIG_DEBUG_PREEMPT is enabled, these calls will trigger a series of
    "BUG: using smp_processor_id() in preemptible" backtraces.
    
    As tegra241_cmdqv_get_cmdq() only calls smp_processor_id() to use the
    CPU number as a factor to balance out traffic on cmdq usage, it is safe
    to use raw_smp_processor_id() here.
    
    Cc: <stable@vger.kernel.org>
    Fixes: 918eb5c856f6 ("iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV")
    Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
    Tested-by: Nicolin Chen <nicolinc@nvidia.com>
    Link: https://lore.kernel.org/r/Z1L1mja3nXzsJ0Pk@uudg.org
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain [+ + +]

Author: Yi Liu <yi.l.liu@intel.com>
Date:   Fri Dec 13 09:17:51 2024 +0800

    iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
    
    commit 74536f91962d5f6af0a42414773ce61e653c10ee upstream.
    
    The qi_batch is allocated when assigning a cache tag for a domain, but
    this allocation is missed for a nested parent domain. Hence, mapping
    pages to the nested parent triggers a NULL pointer dereference. There
    is also a potential memory leak, since there is no lock around the
    domain->qi_batch allocation.
    
    To solve it, add a helper for qi_batch allocation, and call it in both
    __cache_tag_assign_domain() and __cache_tag_assign_parent_domain().
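    
    A minimal userspace sketch of such a helper, assuming hypothetical
    demo_* types and a pthread mutex in place of whatever lock the kernel
    code uses (none of these names come from the driver):

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the batch buffer and the domain. */
struct demo_qi_batch {
	unsigned int index;
};

struct demo_domain {
	pthread_mutex_t lock;
	struct demo_qi_batch *qi_batch;
};

/*
 * Allocate the batch buffer once. Checking and assigning under the lock
 * prevents two callers from both allocating and leaking one buffer.
 */
static int demo_qi_batch_alloc(struct demo_domain *d)
{
	int ret = 0;

	pthread_mutex_lock(&d->lock);
	if (!d->qi_batch) {
		d->qi_batch = calloc(1, sizeof(*d->qi_batch));
		if (!d->qi_batch)
			ret = -ENOMEM;
	}
	pthread_mutex_unlock(&d->lock);
	return ret;
}

/* Both assignment paths go through the same helper. */
static int demo_assign_domain(struct demo_domain *d)
{
	return demo_qi_batch_alloc(d);
}

static int demo_assign_parent_domain(struct demo_domain *d)
{
	return demo_qi_batch_alloc(d);
}
```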
    
      BUG: kernel NULL pointer dereference, address: 0000000000000200
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 8104795067 P4D 0
      Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
      Call Trace:
       ? __die+0x24/0x70
       ? page_fault_oops+0x80/0x150
       ? do_user_addr_fault+0x63/0x7b0
       ? exc_page_fault+0x7c/0x220
       ? asm_exc_page_fault+0x26/0x30
       ? cache_tag_flush_range_np+0x13c/0x260
       intel_iommu_iotlb_sync_map+0x1a/0x30
       iommu_map+0x61/0xf0
       batch_to_domain+0x188/0x250
       iopt_area_fill_domains+0x125/0x320
       ? rcu_is_watching+0x11/0x50
       iopt_map_pages+0x63/0x100
       iopt_map_common.isra.0+0xa7/0x190
       iopt_map_user_pages+0x6a/0x80
       iommufd_ioas_map+0xcd/0x1d0
       iommufd_fops_ioctl+0x118/0x1c0
       __x64_sys_ioctl+0x93/0xc0
       do_syscall_64+0x71/0x140
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
    Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
    Cc: stable@vger.kernel.org
    Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Yi Liu <yi.l.liu@intel.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Link: https://lore.kernel.org/r/20241210130322.17175-1-yi.l.liu@intel.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iommu/vt-d: Remove cache tags before disabling ATS [+ + +]

Author: Lu Baolu <baolu.lu@linux.intel.com>
Date:   Fri Dec 13 09:17:50 2024 +0800

    iommu/vt-d: Remove cache tags before disabling ATS
    
    commit 1f2557e08a617a4b5e92a48a1a9a6f86621def18 upstream.
    
    The current implementation removes cache tags after disabling ATS,
    leading to potential memory leaks and kernel crashes. Specifically,
    CACHE_TAG_DEVTLB type cache tags may still remain in the list even
    after the domain is freed, causing a use-after-free condition.
    
    This issue really shows up when multiple VFs from different PFs are
    passed through to a single user-space process via vfio-pci. In such
    cases, the kernel may crash with kernel messages like:
    
     BUG: kernel NULL pointer dereference, address: 0000000000000014
     PGD 19036a067 P4D 1940a3067 PUD 136c9b067 PMD 0
     Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
     CPU: 74 UID: 0 PID: 3183 Comm: testCli Not tainted 6.11.9 #2
     RIP: 0010:cache_tag_flush_range+0x9b/0x250
     Call Trace:
      <TASK>
      ? __die+0x1f/0x60
      ? page_fault_oops+0x163/0x590
      ? exc_page_fault+0x72/0x190
      ? asm_exc_page_fault+0x22/0x30
      ? cache_tag_flush_range+0x9b/0x250
      ? cache_tag_flush_range+0x5d/0x250
      intel_iommu_tlb_sync+0x29/0x40
      intel_iommu_unmap_pages+0xfe/0x160
      __iommu_unmap+0xd8/0x1a0
      vfio_unmap_unpin+0x182/0x340 [vfio_iommu_type1]
      vfio_remove_dma+0x2a/0xb0 [vfio_iommu_type1]
      vfio_iommu_type1_ioctl+0xafa/0x18e0 [vfio_iommu_type1]
    
    Move cache_tag_unassign_domain() before iommu_disable_pci_caps() to fix
    it.
    
    Fixes: 3b1d9e2b2d68 ("iommu/vt-d: Add cache tag assignment interface")
    Cc: stable@vger.kernel.org
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Link: https://lore.kernel.org/r/20241129020506.576413-1-baolu.lu@linux.intel.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kselftest/arm64: abi: fix SVCR detection [+ + +]

Author: Weizhao Ouyang <o451686892@gmail.com>
Date:   Wed Dec 11 19:16:39 2024 +0800

    kselftest/arm64: abi: fix SVCR detection
    
    [ Upstream commit ce03573a1917532da06057da9f8e74a2ee9e2ac9 ]
    
    When using svcr_in to check ZA and Streaming Mode, we should make sure
    that the value in x2 is correct, otherwise it may trigger an Illegal
    instruction if FEAT_SVE and !FEAT_SME.
    
    Fixes: 43e3f85523e4 ("kselftest/arm64: Add SME support to syscall ABI test")
    Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
    Reviewed-by: Mark Brown <broonie@kernel.org>
    Link: https://lore.kernel.org/r/20241211111639.12344-1-o451686892@gmail.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ksmbd: fix racy issue from session lookup and expire [+ + +]

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Thu Dec 5 21:38:47 2024 +0900

    ksmbd: fix racy issue from session lookup and expire
    
    commit b95629435b84b9ecc0c765995204a4d8a913ed52 upstream.
    
    Increment the session reference count within the lock during lookup to
    avoid a race with session expiry.
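    
    A minimal userspace sketch of taking the reference inside the lookup
    lock, assuming a hypothetical fixed-size session table and a pthread
    mutex (the demo_* names are illustrative and are not ksmbd's API):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical session object with a plain reference count. */
struct demo_session {
	unsigned long id;
	int refcnt;
};

#define DEMO_MAX_SESSIONS 4

static struct demo_session *demo_table[DEMO_MAX_SESSIONS];
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Look up a session and take a reference before dropping the lock, so
 * an expiry path that also takes demo_lock cannot free the session
 * between the lookup and the reference increment.
 */
static struct demo_session *demo_session_lookup(unsigned long id)
{
	struct demo_session *sess = NULL;
	size_t i;

	pthread_mutex_lock(&demo_lock);
	for (i = 0; i < DEMO_MAX_SESSIONS; i++) {
		if (demo_table[i] && demo_table[i]->id == id) {
			sess = demo_table[i];
			sess->refcnt++;
			break;
		}
	}
	pthread_mutex_unlock(&demo_lock);
	return sess;
}

/* Drop a reference taken by demo_session_lookup(). */
static void demo_session_put(struct demo_session *sess)
{
	pthread_mutex_lock(&demo_lock);
	assert(sess->refcnt > 0);
	sess->refcnt--;
	pthread_mutex_unlock(&demo_lock);
}
```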
    
    Cc: stable@vger.kernel.org
    Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-25737
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: Disable MPAM visibility by default and ignore VMM writes [+ + +]

Author: James Morse <james.morse@arm.com>
Date:   Wed Oct 30 16:03:16 2024 +0000

    KVM: arm64: Disable MPAM visibility by default and ignore VMM writes
    
    commit 6685f5d572c22e1003e7c0d089afe1c64340ab1f upstream.
    
    commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in
    ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
    but didn't add trap handling. A previous patch supplied the missing trap
    handling.
    
    Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to
    be migratable, but there is little point enabling the MPAM CPU
    interface on new VMs until there is something a guest can do with it.
    
    Clear the MPAM field from the guest's ID_AA64PFR0_EL1 and on hardware
    that supports MPAM, politely ignore the VMM's attempts to set this bit.
    
    Guests exposed to this bug have the sanitised value of the MPAM field,
    so only the correct value needs to be ignored. This means the field
    can continue to be used to block migration to incompatible hardware
    (between MPAM=1 and MPAM=5), and the VMM can't rely on the field
    being ignored.
    
    Signed-off-by: James Morse <james.morse@arm.com>
    Co-developed-by: Joey Gouly <joey.gouly@arm.com>
    Signed-off-by: Joey Gouly <joey.gouly@arm.com>
    Reviewed-by: Gavin Shan <gshan@redhat.com>
    Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com
    Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    [maz: adapted to lack of ID_FILTERED()]
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libperf: evlist: Fix --cpu argument on hybrid platform [+ + +]

Author: James Clark <james.clark@linaro.org>
Date:   Thu Nov 14 16:04:48 2024 +0000

    libperf: evlist: Fix --cpu argument on hybrid platform
    
    [ Upstream commit f7e36d02d771ee14acae1482091718460cffb321 ]
    
    Since the linked fixes: commit, specifying a CPU on hybrid platforms
    results in an error because Perf tries to open an extended type event
    on "any" CPU which isn't valid. Extended type events can only be opened
    on CPUs that match the type.
    
    Before (working):
    
      $ perf record --cpu 1 -- true
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 2.385 MB perf.data (7 samples) ]
    
    After (not working):
    
      $ perf record -C 1 -- true
      WARNING: A requested CPU in '1' is not supported by PMU 'cpu_atom' (CPUs 16-27) for event 'cycles:P'
      Error:
      The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu_atom/cycles:P/).
      /bin/dmesg | grep -i perf may provide additional information.
    
    (Ignore the warning message, that's expected and not particularly
    relevant to this issue).
    
    This is because perf_cpu_map__intersect() of the user specified CPU (1)
    and one of the PMU's CPUs (16-27) correctly results in an empty (NULL)
    CPU map. However for the purposes of opening an event, libperf converts
    empty CPU maps into an any CPU (-1) which the kernel rejects.
    
    Fix it by deleting evsels with empty CPU maps in the specific case where
    user requested CPU maps are evaluated.
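    
    A minimal sketch of the idea, assuming CPU maps modelled as 64-bit
    masks and a hypothetical struct demo_evsel (libperf's real types and
    functions are not used here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event selector; bit N of a mask stands for CPU N. */
struct demo_evsel {
	const char *pmu;
	uint64_t pmu_cpus;	/* CPUs the PMU supports */
	uint64_t cpus;		/* CPUs the event will be opened on */
	bool removed;		/* dropped because no requested CPU matched */
};

/*
 * Intersect each event's PMU CPUs with the user-requested CPUs. An empty
 * intersection marks the event as removed instead of letting it fall
 * back to "any CPU". Returns the number of events kept.
 */
static size_t demo_apply_user_cpus(struct demo_evsel *evsels, size_t n,
				   uint64_t user_cpus)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		evsels[i].cpus = evsels[i].pmu_cpus & user_cpus;
		if (evsels[i].cpus == 0)
			evsels[i].removed = true;
		else
			kept++;
	}
	return kept;
}
```

    With a request for CPU 1 and a PMU covering CPUs 16-27, the
    intersection is empty, so that event is dropped rather than opened.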
    
    Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events")
    Reviewed-by: Ian Rogers <irogers@google.com>
    Tested-by: Thomas Falcon <thomas.falcon@intel.com>
    Signed-off-by: James Clark <james.clark@linaro.org>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Link: https://lore.kernel.org/r/20241114160450.295844-2-james.clark@linaro.org
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Linux: Linux 6.12.6 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Dec 19 18:13:24 2024 +0100

    Linux 6.12.6
    
    Link: https://lore.kernel.org/r/20241217170546.209657098@linuxfoundation.org
    Tested-by: Ronald Warsow <rwarsow@gmx.de>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Salvatore Bonaccorso <carnil@debian.org>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
    Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

memcg: slub: fix SUnreclaim for post charged objects [+ + +]

Author: Shakeel Butt <shakeel.butt@linux.dev>
Date:   Mon Dec 9 20:06:57 2024 -0800

    memcg: slub: fix SUnreclaim for post charged objects
    
    commit b7ffecbe198e2dfc44abf92ceb90f46150f7527a upstream.
    
    Large kmalloc allocates directly from the page allocator and then uses
    lruvec_stat_mod_folio() to increment the unreclaimable slab stats for
    global and memcg. However, when post memcg charging of slab objects was
    added in commit 9028cdeb38e1 ("memcg: add charging of already allocated
    slab objects"), it failed to correctly handle the unreclaimable slab
    stats for memcg.
    
    One user-visible effect of that bug is that the node-level
    unreclaimable slab stat will work correctly but the memcg-level stat can
    underflow, as the kernel correctly handles the free path but the charge
    path fails to increment the memcg-level unreclaimable slab stat. Fix
    this by handling it correctly in the post charge code path.
    
    Fixes: 9028cdeb38e1 ("memcg: add charging of already allocated slab objects")
    Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net, team, bonding: Add netdev_base_features helper [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Dec 10 15:12:41 2024 +0100

    net, team, bonding: Add netdev_base_features helper
    
    [ Upstream commit d2516c3a53705f783bb6868df0f4a2b977898a71 ]
    
    Both the bonding and team drivers have logic to derive the base feature
    flags before iterating over their slave devices to refine the set
    via netdev_increment_features().
    
    Add a small helper netdev_base_features() so this can be reused
    instead of having it open-coded multiple times.
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Nikolay Aleksandrov <razor@blackwall.org>
    Cc: Ido Schimmel <idosch@idosch.org>
    Cc: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://patch.msgid.link/20241210141245.327886-1-daniel@iogearbox.net
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Stable-dep-of: d064ea7fe2a2 ("bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: DR, prevent potential error pointer dereference [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Wed Dec 4 15:06:41 2024 +0300

    net/mlx5: DR, prevent potential error pointer dereference
    
    [ Upstream commit 11776cff0b563c8b8a4fa76cab620bfb633a8cb8 ]
    
    The dr_domain_add_vport_cap() function generally returns NULL on error
    but sometimes we want it to return ERR_PTR(-EBUSY) so the caller can
    retry.  The problem here is that "ret" can be either -EBUSY or -ENOMEM
    and if it's -ENOMEM then the error pointer is propagated back and
    eventually dereferenced in dr_ste_v0_build_src_gvmi_qpn_tag().
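    
    A minimal userspace sketch of that return convention, assuming local
    re-implementations of the error-pointer helpers and a hypothetical
    demo_add_cap() function (this is not the mlx5 code):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR/PTR_ERR/IS_ERR helpers. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err)
{
	return (void *)err;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical capability object and backing storage. */
struct demo_cap {
	int vport;
};

static struct demo_cap demo_cap_storage;

/*
 * "ret" models the result of a hypothetical insert step: 0 on success or
 * a negative errno. Only -EBUSY is reported as an error pointer so the
 * caller can retry; every other failure returns NULL.
 */
static struct demo_cap *demo_add_cap(int vport, int ret)
{
	if (ret == -EBUSY)
		return demo_err_ptr(-EBUSY);
	if (ret)
		return NULL;

	demo_cap_storage.vport = vport;
	return &demo_cap_storage;
}
```

    A caller that only tests for NULL and ERR_PTR(-EBUSY) then never sees
    an -ENOMEM error pointer that it might dereference.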
    
    Fixes: 11a45def2e19 ("net/mlx5: DR, Add support for SF vports")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/07477254-e179-43e2-b1b3-3b9db4674195@stanley.mountain
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/sched: netem: account for backlog updates from child qdisc [+ + +]

Author: Martin Ottens <martin.ottens@fau.de>
Date:   Tue Dec 10 14:14:11 2024 +0100

    net/sched: netem: account for backlog updates from child qdisc
    
    [ Upstream commit f8d4bc455047cf3903cd6f85f49978987dbb3027 ]
    
    In general, 'qlen' of any classful qdisc should keep track of the
    number of packets that the qdisc itself and all of its children hold.
    In case of netem, 'qlen' only accounts for the packets in its internal
    tfifo. When netem is used with a child qdisc, the child qdisc can use
    'qdisc_tree_reduce_backlog' to inform its parent, netem, about created
    or dropped SKBs. This function updates 'qlen' and the backlog statistics
    of netem, but netem does not account for changes made by a child qdisc.
    'qlen' then indicates the wrong number of packets in the tfifo.
    If a child qdisc creates new SKBs during enqueue and informs its parent
    about this, netem's 'qlen' value is increased. When netem dequeues the
    newly created SKBs from the child, the 'qlen' in netem is not updated.
    If 'qlen' reaches the configured sch->limit, the enqueue function stops
    working, even though the tfifo is not full.
    
    Reproduce the bug:
    Ensure that the sender machine has GSO enabled. Configure netem as root
    qdisc and tbf as its child on the outgoing interface of the machine
    as follows:
    $ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
    $ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
    
    Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
    client on the machine. Check the qdisc statistics:
    $ tc -s qdisc show dev <oif>
    
    Statistics after 10s of iPerf3 TCP test before the fix (note that
    netem's backlog > limit, netem stopped accepting packets):
    qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
     Sent 2767766 bytes 1848 pkt (dropped 652, overlimits 0 requeues 0)
     backlog 4294528236b 1155p requeues 0
    qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
     Sent 2767766 bytes 1848 pkt (dropped 327, overlimits 7601 requeues 0)
     backlog 0b 0p requeues 0
    
    Statistics after the fix:
    qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
     Sent 37766372 bytes 24974 pkt (dropped 9, overlimits 0 requeues 0)
     backlog 0b 0p requeues 0
    qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
     Sent 37766372 bytes 24974 pkt (dropped 327, overlimits 96017 requeues 0)
     backlog 0b 0p requeues 0
    
    tbf segments the GSO SKBs (tbf_segment) and updates the netem's 'qlen'.
    The interface fully stops transferring packets and "locks". In this case,
    the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
    its limit and no more packets are accepted.
    
    This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
    only decreased when a packet is returned by its dequeue function, and not
    during enqueuing into the child qdisc. External updates to 'qlen' are thus
    accounted for and only the behavior of the backlog statistics changes. As
    in other qdiscs, 'qlen' then keeps track of how many packets are held in
    netem and all of its children. As before, sch->limit remains as the
    maximum number of packets in the tfifo. The same applies to netem's
    backlog statistics.
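    
    A minimal sketch of the accounting described above, assuming a
    hypothetical struct demo_netem with a separate tfifo counter (the
    field and function names are illustrative, not the qdisc's real ones):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counters involved. */
struct demo_netem {
	unsigned int limit;	/* maximum packets in the internal tfifo */
	unsigned int t_len;	/* packets currently in the tfifo */
	unsigned int qlen;	/* packets held by netem and its children */
};

/* Enqueue is limited by the tfifo fill level, not by qlen. */
static bool demo_enqueue(struct demo_netem *q)
{
	if (q->t_len >= q->limit)
		return false;
	q->t_len++;
	q->qlen++;
	return true;
}

/* Moving a packet into the child empties a tfifo slot; qlen is unchanged. */
static void demo_move_to_child(struct demo_netem *q)
{
	assert(q->t_len > 0);
	q->t_len--;
}

/* The child reports created (positive) or dropped (negative) packets. */
static void demo_child_adjust(struct demo_netem *q, int delta)
{
	q->qlen += delta;
}

/* qlen only decreases when the dequeue function returns a packet. */
static void demo_dequeue(struct demo_netem *q)
{
	assert(q->qlen > 0);
	q->qlen--;
}
```

    Because the limit check uses the tfifo counter, packets created by a
    child no longer make enqueue refuse packets while the tfifo has room.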
    
    Fixes: 50612537e9ab ("netem: fix classful handling")
    Signed-off-by: Martin Ottens <martin.ottens@fau.de>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://patch.msgid.link/20241210131412.1837202-1-martin.ottens@fau.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: defer final 'struct net' free in netns dismantle [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Dec 4 12:54:55 2024 +0000

    net: defer final 'struct net' free in netns dismantle
    
    [ Upstream commit 0f6ede9fbc747e2553612271bce108f7517e7a45 ]
    
    Ilya reported a slab-use-after-free in dst_destroy [1]
    
    Issue is in xfrm6_net_init() and xfrm4_net_init() :
    
    They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
    
    But net structure might be freed before all the dst callbacks are
    called. So when dst_destroy() calls later :
    
    if (dst->ops->destroy)
        dst->ops->destroy(dst);
    
    dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
    
    See a relevant issue fixed in :
    
    ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
    
    A fix is to queue the 'struct net' to be freed after one
    more cleanup_net() round (and the existing rcu_barrier())
    
    [1]
    
    BUG: KASAN: slab-use-after-free in dst_destroy (net/core/dst.c:112)
    Read of size 8 at addr ffff8882137ccab0 by task swapper/37/0
    Dec 03 05:46:18 kernel:
    CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.12.0 #67
    Hardware name: Red Hat KVM/RHEL, BIOS 1.16.1-1.el9 04/01/2014
    Call Trace:
     <IRQ>
    dump_stack_lvl (lib/dump_stack.c:124)
    print_address_description.constprop.0 (mm/kasan/report.c:378)
    ? dst_destroy (net/core/dst.c:112)
    print_report (mm/kasan/report.c:489)
    ? dst_destroy (net/core/dst.c:112)
    ? kasan_addr_to_slab (mm/kasan/common.c:37)
    kasan_report (mm/kasan/report.c:603)
    ? dst_destroy (net/core/dst.c:112)
    ? rcu_do_batch (kernel/rcu/tree.c:2567)
    dst_destroy (net/core/dst.c:112)
    rcu_do_batch (kernel/rcu/tree.c:2567)
    ? __pfx_rcu_do_batch (kernel/rcu/tree.c:2491)
    ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4339 kernel/locking/lockdep.c:4406)
    rcu_core (kernel/rcu/tree.c:2825)
    handle_softirqs (kernel/softirq.c:554)
    __irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637)
    irq_exit_rcu (kernel/softirq.c:651)
    sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
     </IRQ>
     <TASK>
    asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
    RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:743)
    Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d c7 c9 27 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
    RSP: 0018:ffff888100d2fe00 EFLAGS: 00000246
    RAX: 00000000001870ed RBX: 1ffff110201a5fc2 RCX: ffffffffb61a3e46
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb3d4d123
    RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed11c7e1835d
    R10: ffff888e3f0c1aeb R11: 0000000000000000 R12: 0000000000000000
    R13: ffff888100d20000 R14: dffffc0000000000 R15: 0000000000000000
    ? ct_kernel_exit.constprop.0 (kernel/context_tracking.c:148)
    ? cpuidle_idle_call (kernel/sched/idle.c:186)
    default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
    cpuidle_idle_call (kernel/sched/idle.c:186)
    ? __pfx_cpuidle_idle_call (kernel/sched/idle.c:168)
    ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5848)
    ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4347 kernel/locking/lockdep.c:4406)
    ? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:59)
    do_idle (kernel/sched/idle.c:326)
    cpu_startup_entry (kernel/sched/idle.c:423 (discriminator 1))
    start_secondary (arch/x86/kernel/smpboot.c:202 arch/x86/kernel/smpboot.c:282)
    ? __pfx_start_secondary (arch/x86/kernel/smpboot.c:232)
    ? soft_restart_cpu (arch/x86/kernel/head_64.S:452)
    common_startup_64 (arch/x86/kernel/head_64.S:414)
     </TASK>
    Dec 03 05:46:18 kernel:
    Allocated by task 12184:
    kasan_save_stack (mm/kasan/common.c:48)
    kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
    __kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345)
    kmem_cache_alloc_noprof (mm/slub.c:4085 mm/slub.c:4134 mm/slub.c:4141)
    copy_net_ns (net/core/net_namespace.c:421 net/core/net_namespace.c:480)
    create_new_namespaces (kernel/nsproxy.c:110)
    unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
    ksys_unshare (kernel/fork.c:3313)
    __x64_sys_unshare (kernel/fork.c:3382)
    do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
    Dec 03 05:46:18 kernel:
    Freed by task 11:
    kasan_save_stack (mm/kasan/common.c:48)
    kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
    kasan_save_free_info (mm/kasan/generic.c:582)
    __kasan_slab_free (mm/kasan/common.c:271)
    kmem_cache_free (mm/slub.c:4579 mm/slub.c:4681)
    cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:647)
    process_one_work (kernel/workqueue.c:3229)
    worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
    kthread (kernel/kthread.c:389)
    ret_from_fork (arch/x86/kernel/process.c:147)
    ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
    Dec 03 05:46:18 kernel:
    Last potentially related work creation:
    kasan_save_stack (mm/kasan/common.c:48)
    __kasan_record_aux_stack (mm/kasan/generic.c:541)
    insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
    __queue_work (kernel/workqueue.c:2340)
    queue_work_on (kernel/workqueue.c:2391)
    xfrm_policy_insert (net/xfrm/xfrm_policy.c:1610)
    xfrm_add_policy (net/xfrm/xfrm_user.c:2116)
    xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
    netlink_rcv_skb (net/netlink/af_netlink.c:2536)
    xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
    netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
    netlink_sendmsg (net/netlink/af_netlink.c:1886)
    sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
    vfs_write (fs/read_write.c:590 fs/read_write.c:683)
    ksys_write (fs/read_write.c:736)
    do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
    Dec 03 05:46:18 kernel:
    Second to last potentially related work creation:
    kasan_save_stack (mm/kasan/common.c:48)
    __kasan_record_aux_stack (mm/kasan/generic.c:541)
    insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
    __queue_work (kernel/workqueue.c:2340)
    queue_work_on (kernel/workqueue.c:2391)
    __xfrm_state_insert (./include/linux/workqueue.h:723 net/xfrm/xfrm_state.c:1150 net/xfrm/xfrm_state.c:1145 net/xfrm/xfrm_state.c:1513)
    xfrm_state_update (./include/linux/spinlock.h:396 net/xfrm/xfrm_state.c:1940)
    xfrm_add_sa (net/xfrm/xfrm_user.c:912)
    xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
    netlink_rcv_skb (net/netlink/af_netlink.c:2536)
    xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
    netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
    netlink_sendmsg (net/netlink/af_netlink.c:1886)
    sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
    vfs_write (fs/read_write.c:590 fs/read_write.c:683)
    ksys_write (fs/read_write.c:736)
    do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
    
    Fixes: a8a572a6b5f2 ("xfrm: dst_entries_init() per-net dst_ops")
    Reported-by: Ilya Maximets <i.maximets@ovn.org>
    Closes: https://lore.kernel.org/netdev/CANn89iKKYDVpB=MtmfH7nyv2p=rJWSLedO5k7wSZgtY_tO8WQg@mail.gmail.com/T/#m02c98c3009fe66382b73cfb4db9cf1df6fab3fbf
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://patch.msgid.link/20241204125455.3871859-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: felix: fix stuck CPU-injected packets with short taprio windows [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Tue Dec 10 15:26:40 2024 +0200

    net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
    
    [ Upstream commit acfcdb78d5d4cdb78e975210c8825b9a112463f6 ]
    
    With this port schedule:
    
    tc qdisc replace dev $send_if parent root handle 100 taprio \
            num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
            map 0 1 2 3 4 5 6 7 \
            base-time 0 cycle-time 10000 \
            sched-entry S 01 1250 \
            sched-entry S 02 1250 \
            sched-entry S 04 1250 \
            sched-entry S 08 1250 \
            sched-entry S 10 1250 \
            sched-entry S 20 1250 \
            sched-entry S 40 1250 \
            sched-entry S 80 1250 \
            flags 2
    
    ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
    
    increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
    ptp4l[4134.168]: port 2: send peer delay response failed
    
    It turns out that the driver can't take their TX timestamps because it
    can't transmit them in the first place. And there's nothing special
    about the Pdelay_Resp packets - they're just regular 68 byte packets.
    But with this taprio configuration, the switch would refuse to send even
    the ETH_ZLEN minimum packet size.
    
    This should definitely not have been the case. When applying the taprio
    config, the driver prints:
    
    mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
    
    and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
    without problems. Yet it's not.
    
    For the forwarding path, the configuration is fine, yet packets injected
    from Linux get stuck with this schedule no matter what.
    
    The first hint that the static guard bands are the cause of the problem
    is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix:
    re-enable TAS guard band mode") made things work. It must be that the
    guard bands are calculated incorrectly.
    
    I remembered that there is a magic constant in the driver, set to 33 ns
    for no logical reason other than experimentation, which says "never let
    the static guard bands get so large as to leave less than this amount of
    remaining space in the time slot, because the queue system will refuse
    to schedule packets otherwise, and they will get stuck". I had a hunch
    that my previous experimentally-determined value was only good for
    packets coming from the forwarding path, and that the CPU injection path
    needed more.
    
    I came to the new value of 35 ns through binary search, after seeing
    that with 544 ns (the bit time required to send the Pdelay_Resp packet
    at gigabit) it works. Again, this is purely experimental, there's no
    logic and the manual doesn't say anything.
    
    The new driver prints for this schedule look like this:
    
    mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
    
    So yes, the maximum MTU is now even smaller by 1 byte than before.
    This is maybe counter-intuitive, but makes more sense with a diagram of
    one time slot.
    
    Before:
    
     Gate open                                   Gate close
     |                                                    |
     v           1250 ns total time slot duration         v
     <---------------------------------------------------->
     <----><---------------------------------------------->
      33 ns            1217 ns static guard band
      useful
    
    After:
    
     Gate open                                   Gate close
     |                                                    |
     v           1250 ns total time slot duration         v
     <---------------------------------------------------->
     <-----><--------------------------------------------->
      35 ns            1215 ns static guard band
      useful
    
    The static guard band implemented by this switch hardware directly
    determines the maximum allowable MTU for that traffic class. The larger
    it is, the earlier the switch will stop scheduling frames for
    transmission, because otherwise they might overrun the gate close time
    (and avoiding that is the entire purpose of Michael's patch).
    So, we now have guard bands smaller by 2 ns, thus, in this particular
    case, we lose a byte of the maximum MTU.
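    
    The arithmetic behind the two sets of driver prints can be sketched as
    follows. This is an illustrative model, not the driver code: the
    function name, the 20 octet L1 overhead (preamble, SFD and inter-frame
    gap) and the rounding are assumptions chosen because they reproduce
    the 132 and 131 octet limits printed above.

```python
# Illustrative model only; names and constants are assumptions chosen to
# reproduce the numbers printed in the driver messages above.
PSEC_PER_NSEC = 1000
L1_OVERHEAD_OCTETS = 20  # assumed: preamble + SFD (8) and inter-frame gap (12)


def max_frame_octets(gate_len_ns: int, min_useful_ns: int, speed_mbps: int) -> int:
    """Largest frame (including FCS) whose wire time fits in the guard band."""
    # Time needed to send one octet, e.g. 8000 ps at 1000 Mbps.
    picos_per_byte = 8_000_000 // speed_mbps
    # The guard band is the time slot minus the part that must stay useful.
    guard_band_ps = (gate_len_ns - min_useful_ns) * PSEC_PER_NSEC
    # Round down to whole octets, then discount the L1 overhead.
    octets_on_wire = guard_band_ps // picos_per_byte
    return octets_on_wire - L1_OVERHEAD_OCTETS


before = max_frame_octets(1250, 33, 1000)  # 1217 ns guard band
after = max_frame_octets(1250, 35, 1000)   # 1215 ns guard band
```

    Under these assumptions, 1217 ns still covers 152 octets on the wire,
    while 1215 ns covers only 151, which is where the lost byte comes from.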
    
    Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Michael Walle <mwalle@kernel.org>
    Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries [+ + +]

Author: Jesse Van Gavere <jesseevg@gmail.com>
Date:   Wed Dec 11 10:29:32 2024 +0100

    net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries
    
    [ Upstream commit 5af53577c64fa84da032d490b701127fe8d1a6aa ]
    
    Commit 8d7ae22ae9f8 ("net: dsa: microchip: KSZ9477 register regmap
    alignment to 32 bit boundaries") fixed an issue whereby regmap_reg_range
    did not allow writes as 32 bit words to KSZ9477 PHY registers. This fix
    for KSZ9896 is adapted from there, as the same erratum is present in
    KSZ9896C as "Module 5: Certain PHY registers must be written as pairs
    instead of singly". The explanation below is likewise taken from this
    commit.
    
    The commit provided code to apply the "Module 6: Certain PHY registers
    must be written as pairs instead of singly" erratum for KSZ9477, as on
    this chip certain PHY registers (0xN120 to 0xN13F, N=1,2,3,4,5) must be
    accessed as 32 bit words instead of with 16 or 8 bit accesses.
    Otherwise, adjacent registers (no matter if reserved or not) are
    overwritten with 0x0.
    
    Without this patch, some registers (e.g. 0x113c or 0x1134) required for
    32 bit access are outside the valid regmap ranges.
    
    As a result, the following errors are observed and KSZ9896 is not
    properly configured:
    
    ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
    ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
    ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
    ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0
    
    The solution is to modify regmap_reg_range to allow accesses on 4-byte
    boundaries.
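    
    A minimal sketch of the idea, assuming a range table that is checked
    against the start address of each access. The two "unaligned" ranges
    below are hypothetical and are not the real KSZ9896 table; only the
    register addresses 0x1134 and 0x113c come from the error messages
    above.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) register addresses


def align_range_to_32bit(first: int, last: int) -> Range:
    """Widen a range so that it starts and ends on a 4-byte boundary."""
    return first & ~0x3, last | 0x3


def in_ranges(reg: int, ranges: List[Range]) -> bool:
    """True if the access start address falls inside any valid range."""
    return any(first <= reg <= last for first, last in ranges)


# Hypothetical ranges that are valid for 16 bit accesses only.
unaligned = [(0x1136, 0x1139), (0x113E, 0x113F)]
aligned = [align_range_to_32bit(first, last) for first, last in unaligned]
```

    With the hypothetical table, a 32 bit access starting at 0x113c or
    0x1134 is rejected before the widening and accepted after it.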
    
    Fixes: 5c844d57aa78 ("net: dsa: microchip: fix writes to phy registers >= 0x10")
    Signed-off-by: Jesse Van Gavere <jesse.vangavere@scioteq.com>
    Link: https://patch.msgid.link/20241211092932.26881-1-jesse.vangavere@scioteq.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: tag_ocelot_8021q: fix broken reception [+ + +]

Author: Robert Hodaszi <robert.hodaszi@digi.com>
Date:   Wed Dec 11 15:47:41 2024 +0100

    net: dsa: tag_ocelot_8021q: fix broken reception
    
    [ Upstream commit 36ff681d2283410742489ce77e7b01419eccf58c ]
    
    The blamed commit changed the dsa_8021q_rcv() calling convention to
    accept pre-populated source_port and switch_id arguments. If those are
    not available, as in the case of tag_ocelot_8021q, the arguments must be
    pre-initialized with -1.
    
    Due to the bug of passing uninitialized arguments in tag_ocelot_8021q,
    dsa_8021q_rcv() does not detect that it needs to populate the
    source_port and switch_id, and this makes dsa_conduit_find_user() fail,
    which leads to packet loss on reception.
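    
    The calling convention can be modelled roughly as below. This is a
    hypothetical simplification, not the kernel function: rcv_8021q() and
    its parameters are made-up names, and the VLAN decoding is reduced to
    two already-decoded values.

```python
UNKNOWN = -1  # sentinel meaning "the tagger has no precise information"


def rcv_8021q(vid_source_port, vid_switch_id,
              source_port=UNKNOWN, switch_id=UNKNOWN):
    """Fill in only the fields that the caller left at the sentinel."""
    if source_port == UNKNOWN:
        source_port = vid_source_port
    if switch_id == UNKNOWN:
        switch_id = vid_switch_id
    return source_port, switch_id


# Caller that pre-initializes with -1: the VLAN-derived values are used.
good = rcv_8021q(2, 0, UNKNOWN, UNKNOWN)

# Caller that passes stack garbage (any value other than -1): the garbage
# is treated as precise information and kept, so the later user port
# lookup would be done with a bogus port and switch.
bad = rcv_8021q(2, 0, 0x5A5A, 0x5A5A)
```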
    
    Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()")
    Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20241211144741.1415758-1-robert.hodaszi@digi.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: lapb: increase LAPB_HEADER_LEN [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Dec 4 14:10:31 2024 +0000

    net: lapb: increase LAPB_HEADER_LEN
    
    [ Upstream commit a6d75ecee2bf828ac6a1b52724aba0a977e4eaf4 ]
    
    It is unclear if net/lapb code is supposed to be ready for 8021q.
    
    We can at least avoid crashes like the following :
    
    skbuff: skb_under_panic: text:ffffffff8aabe1f6 len:24 put:20 head:ffff88802824a400 data:ffff88802824a3fe tail:0x16 end:0x140 dev:nr0.2
    ------------[ cut here ]------------
     kernel BUG at net/core/skbuff.c:206 !
    Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
    CPU: 1 UID: 0 PID: 5508 Comm: dhcpcd Not tainted 6.12.0-rc7-syzkaller-00144-g66418447d27b #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
     RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
     RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
    Code: 0d 8d 48 c7 c6 2e 9e 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 1a 6f 37 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
    RSP: 0018:ffffc90002ddf638 EFLAGS: 00010282
    RAX: 0000000000000086 RBX: dffffc0000000000 RCX: 7a24750e538ff600
    RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
    RBP: ffff888034a86650 R08: ffffffff8174b13c R09: 1ffff920005bbe60
    R10: dffffc0000000000 R11: fffff520005bbe61 R12: 0000000000000140
    R13: ffff88802824a400 R14: ffff88802824a3fe R15: 0000000000000016
    FS:  00007f2a5990d740(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000110c2631fd CR3: 0000000029504000 CR4: 00000000003526f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
      skb_push+0xe5/0x100 net/core/skbuff.c:2636
      nr_header+0x36/0x320 net/netrom/nr_dev.c:69
      dev_hard_header include/linux/netdevice.h:3148 [inline]
      vlan_dev_hard_header+0x359/0x480 net/8021q/vlan_dev.c:83
      dev_hard_header include/linux/netdevice.h:3148 [inline]
      lapbeth_data_transmit+0x1f6/0x2a0 drivers/net/wan/lapbether.c:257
      lapb_data_transmit+0x91/0xb0 net/lapb/lapb_iface.c:447
      lapb_transmit_buffer+0x168/0x1f0 net/lapb/lapb_out.c:149
     lapb_establish_data_link+0x84/0xd0
     lapb_device_event+0x4e0/0x670
      notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
     __dev_notify_flags+0x207/0x400
      dev_change_flags+0xf0/0x1a0 net/core/dev.c:8922
      devinet_ioctl+0xa4e/0x1aa0 net/ipv4/devinet.c:1188
      inet_ioctl+0x3d7/0x4f0 net/ipv4/af_inet.c:1003
      sock_do_ioctl+0x158/0x460 net/socket.c:1227
      sock_ioctl+0x626/0x8e0 net/socket.c:1346
      vfs_ioctl fs/ioctl.c:51 [inline]
      __do_sys_ioctl fs/ioctl.c:907 [inline]
      __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
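    
    The skb_under_panic line at the top of the report can be decoded with
    a little arithmetic. The sketch below only uses the head, data and put
    values printed there, and assumes (as skb_push() does) that the data
    pointer is reported after the push has already been applied; the
    function name is made up for illustration.

```python
def decode_under_panic(head: int, data: int, put: int):
    """Return (bytes below the buffer start, headroom before the push)."""
    underrun = head - data               # how far data ended up below head
    headroom_before = put - underrun     # what was actually available
    return underrun, headroom_before


underrun, headroom = decode_under_panic(
    head=0xFFFF88802824A400, data=0xFFFF88802824A3FE, put=20)
```

    In other words, the 20 byte push found only 18 bytes of headroom and
    ran 2 bytes past the start of the buffer.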
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzbot+fb99d1b0c0f81d94a5e2@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/67506220.050a0220.17bd51.006c.GAE@google.com/T/#u
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20241204141031.4030267-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mana: Fix irq_contexts memory leak in mana_gd_setup_irqs [+ + +]

Author: Maxim Levitsky <mlevitsk@redhat.com>
Date:   Mon Dec 9 12:57:51 2024 -0500

    net: mana: Fix irq_contexts memory leak in mana_gd_setup_irqs
    
    [ Upstream commit 9a5beb6ca6305de5c5210efab0702ea79b62eb39 ]
    
    gc->irq_contexts is not freed if one of the later operations
    fails.
    
    Suggested-by: Michael Kelley <mhklinux@outlook.com>
    Fixes: 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores")
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
    Reviewed-by: Yury Norov <yury.norov@gmail.com>
    Link: https://patch.msgid.link/20241209175751.287738-3-mlevitsk@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mana: Fix memory leak in mana_gd_setup_irqs [+ + +]

Author: Maxim Levitsky <mlevitsk@redhat.com>
Date:   Mon Dec 9 12:57:50 2024 -0500

    net: mana: Fix memory leak in mana_gd_setup_irqs
    
    [ Upstream commit bb1e3eb57d2cc38951f9a9f1b8c298ced175798f ]
    
    Commit 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores")
    added memory allocation in mana_gd_setup_irqs of 'irqs' but the code
    doesn't free this temporary array in the success path.
    
    This was caught by kmemleak.
    
    Fixes: 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores")
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
    Reviewed-by: Yury Norov <yury.norov@gmail.com>
    Link: https://patch.msgid.link/20241209175751.287738-2-mlevitsk@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: be resilient to loss of PTP packets during transmission [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Dec 5 16:55:18 2024 +0200

    net: mscc: ocelot: be resilient to loss of PTP packets during transmission
    
    [ Upstream commit b454abfab52543c44b581afc807b9f97fc1e7a3a ]
    
    The Felix DSA driver presents unique challenges that make the simplistic
    ocelot PTP TX timestamping procedure unreliable: any transmitted packet
    may be lost in hardware before it ever leaves our local system.
    
    This may happen because there is congestion on the DSA conduit, the
    switch CPU port or even user port (Qdiscs like taprio may delay packets
    indefinitely by design).
    
    The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(),
    runs out of timestamp IDs eventually, because it never detects that
    packets are lost, and keeps the IDs of the lost packets on hold
    indefinitely. The manifestation of the issue once the entire timestamp
    ID range becomes busy looks like this in dmesg:
    
    mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp
    mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp
    
    At the surface level, we need a timeout timer so that the kernel knows a
    timestamp ID is available again. But there is a deeper problem with the
    implementation, which is the monotonically increasing ocelot_port->ts_id.
    In the presence of packet loss, it will be impossible to detect that and
    reuse one of the holes created in the range of free timestamp IDs.
    
    What we actually need is a bitmap of 63 timestamp IDs tracking which one
    is available. That is able to use up holes caused by packet loss, but
    also gives us a unique opportunity to not implement an actual timer_list
    for the timeout timer (very complicated in terms of locking).
    
    We could only declare a timestamp ID stale on demand (lazily), aka when
    there's no other timestamp ID available. There are pros and cons to this
    approach: the implementation is much simpler than per-packet timers
    would be, but most of the stale packets would be quasi-leaked - not
    really leaked, but blocked in driver memory, since this algorithm sees
    no reason to free them.
    
    An improved technique would be to check for stale timestamp IDs every
    time we allocate a new one. Assuming a constant flux of PTP packets,
    this avoids stale packets being blocked in memory, but of course,
    packets lost at the end of the flux are still blocked until the flux
    resumes (nobody left to kick them out).
    
    Since implementing per-packet timers is way too complicated, this should
    be good enough.
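    
    The allocation scheme described above can be sketched as a small
    model. It is not the driver code: the class and method names are
    hypothetical, and the 5 second staleness threshold is an assumption
    (the log further down happens to show ID 0 being invalidated about
    5 seconds after it was handed out).

```python
MAX_PTP_ID = 63        # number of timestamp IDs tracked by the bitmap
STALE_AFTER_S = 5.0    # assumed timeout, hypothetical value


class TimestampIdPool:
    def __init__(self):
        self.busy = 0                          # bit n set: ID n is in flight
        self.alloc_time = [0.0] * MAX_PTP_ID   # when each ID was handed out

    def alloc(self, now):
        # Check for stale IDs every time a new one is requested.
        for ts_id in range(MAX_PTP_ID):
            in_flight = (self.busy >> ts_id) & 1
            if in_flight and now - self.alloc_time[ts_id] >= STALE_AFTER_S:
                self.busy &= ~(1 << ts_id)     # packet seems lost, reuse ID
        # Hand out the lowest free ID, which fills holes in the range.
        for ts_id in range(MAX_PTP_ID):
            if not (self.busy >> ts_id) & 1:
                self.busy |= 1 << ts_id
                self.alloc_time[ts_id] = now
                return ts_id
        return None                            # every ID is busy and fresh

    def complete(self, ts_id):
        # Timestamp was delivered: the ID is immediately available again.
        self.busy &= ~(1 << ts_id)


blocked = TimestampIdPool()
lost = [blocked.alloc(t) for t in (70.0, 71.0, 72.0, 73.0, 74.0)]
reused = blocked.alloc(75.0)   # ID 0 is now stale and gets handed out again
```

    A port whose timestamps are always acknowledged calls complete()
    before the next alloc(), so in this model it keeps receiving ID 0.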
    
    Testing procedure:
    
    Persistently block traffic class 5 and try to run PTP on it:
    $ tc qdisc replace dev swp3 parent root taprio num_tc 8 \
            map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
            base-time 0 sched-entry S 0xdf 100000 flags 0x2
    [  126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
    $ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3
    ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
    [   70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    ptp4l[70.406]: timed out while polling for tx timestamp
    ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
    ptp4l[70.406]: port 1 (swp3): send peer delay response failed
    ptp4l[70.407]: port 1 (swp3): clearing fault immediately
    ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
    [   71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1
    ptp4l[71.400]: timed out while polling for tx timestamp
    ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
    ptp4l[71.401]: port 1 (swp3): send peer delay response failed
    ptp4l[71.401]: port 1 (swp3): clearing fault immediately
    [   72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2
    ptp4l[72.401]: timed out while polling for tx timestamp
    ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
    ptp4l[72.402]: port 1 (swp3): send peer delay response failed
    ptp4l[72.402]: port 1 (swp3): clearing fault immediately
    ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
    [   73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3
    ptp4l[73.400]: timed out while polling for tx timestamp
    ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
    ptp4l[73.400]: port 1 (swp3): send peer delay response failed
    ptp4l[73.400]: port 1 (swp3): clearing fault immediately
    [   74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4
    ptp4l[74.400]: timed out while polling for tx timestamp
    ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
    ptp4l[74.401]: port 1 (swp3): send peer delay response failed
    ptp4l[74.401]: port 1 (swp3): clearing fault immediately
    ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
    [   75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
    [   75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    ptp4l[75.410]: timed out while polling for tx timestamp
    ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
    ptp4l[75.411]: port 1 (swp3): send peer delay response failed
    ptp4l[75.411]: port 1 (swp3): clearing fault immediately
    (...)
    
    Remove the blocking condition and see that the port recovers:
    $ same tc command as above, but use "sched-entry S 0xff" instead
    $ same ptp4l command as above
    ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
    [  100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
    [  100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost
    [  100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost
    [  100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost
    [  100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost
    [  100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
    [  101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d
    ptp4l[104.966]: port 1 (swp3): assuming the grand master role
    ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER
    [  105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    [  105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0
    (...)
    
    Notice that in this new usage pattern, a non-congested port should
    basically use timestamp ID 0 all the time, progressing to higher numbers
    only if there are unacknowledged timestamps in flight. Compare this to
    the old usage, where the timestamp ID used to monotonically increase
    modulo OCELOT_MAX_PTP_ID.
    
    In terms of implementation, this simplifies the bookkeeping of the
    ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse
    the list of two-step timestampable skbs for each new packet anyway, the
    information can already be computed and does not need to be stored.
    Also, ocelot_port->tx_skbs is always accessed under the switch-wide
    ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's
    lock and can use the unlocked primitives safely.
    
    This problem was actually detected using the tc-taprio offload, and is
    causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959)
    supports but Ocelot (VSC7514) does not. Thus, I've selected the commit
    to blame as the one adding initial timestamping support for the Felix
    switch.
    
    Fixes: c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20241205145519.1236778-5-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: fix memory leak on ocelot_port_add_txtstamp_skb() [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Dec 5 16:55:15 2024 +0200

    net: mscc: ocelot: fix memory leak on ocelot_port_add_txtstamp_skb()
    
    [ Upstream commit 4b01bec25bef62544228bce06db6a3afa5d3d6bb ]
    
    If ocelot_port_add_txtstamp_skb() fails, for example due to a full PTP
    timestamp FIFO, we must undo the skb_clone_sk() call with kfree_skb().
    Otherwise, the reference to the skb clone is lost.
    
    Fixes: 52849bcf0029 ("net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20241205145519.1236778-2-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: improve handling of TX timestamp for unknown skb [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Dec 5 16:55:16 2024 +0200

    net: mscc: ocelot: improve handling of TX timestamp for unknown skb
    
    [ Upstream commit b6fba4b3f0becb794e274430f3a0839d8ba31262 ]
    
    This condition, theoretically impossible to trigger, is not really
    handled well. By "continuing", we are skipping the write to SYS_PTP_NXT
    which advances the timestamp FIFO to the next entry. So we are reading
    the same FIFO entry all over again, printing stack traces and eventually
    killing the kernel.
    
    No real problem has been observed here. This is part of a larger rework
    of the timestamp IRQ procedure, with this logical change split out into
    a patch of its own. We will need to "goto next_ts" for other conditions
    as well.
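    
    The control flow problem can be illustrated with a toy model of the
    FIFO loop. Everything here is hypothetical: the deque stands in for
    the hardware FIFO, and popleft() stands in for the SYS_PTP_NXT write
    that advances it.

```python
from collections import deque


def drain_timestamp_fifo(fifo, known_ids):
    """Process every FIFO entry, always advancing after each one."""
    delivered, unknown = [], []
    while fifo:
        ts_id = fifo[0]            # peek at the entry currently exposed
        if ts_id in known_ids:
            delivered.append(ts_id)
        else:
            # A bare `continue` here would skip the advance below and
            # re-read the same entry forever. Recording the problem and
            # falling through plays the role of "goto next_ts".
            unknown.append(ts_id)
        fifo.popleft()             # next_ts: advance to the next entry
    return delivered, unknown


result = drain_timestamp_fifo(deque([1, 9, 2]), known_ids={1, 2})
```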
    
    Fixes: 9fde506e0c53 ("net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20241205145519.1236778-3-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: ocelot->ts_id_lock and ocelot_port->tx_skbs.lock are not IRQ-safe [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Dec 5 16:55:17 2024 +0200

    net: mscc: ocelot: ocelot->ts_id_lock and ocelot_port->tx_skbs.lock are not IRQ-safe
    
    [ Upstream commit 0c53cdb95eb4a604062e326636971d96dd9b1b26 ]
    
    ocelot_get_txtstamp() is a threaded IRQ handler, requested explicitly as
    such by both ocelot_ptp_rdy_irq_handler() and vsc9959_irq_handler().
    
    As such, it runs with IRQs enabled, and not in hardirq context. Thus,
    ocelot_port_add_txtstamp_skb() has no reason to turn off IRQs, it cannot
    be preempted by ocelot_get_txtstamp(). For the same reason,
    dev_kfree_skb_any_reason() will always evaluate as kfree_skb_reason() in
    this calling context, so just simplify the dev_kfree_skb_any() call to
    kfree_skb().
    
    Also, ocelot_port_txtstamp_request() runs from NET_TX softirq context,
    not from hardirq context. Thus, ocelot_get_txtstamp() which shares the
    ocelot_port->tx_skbs.lock lock with it, has no reason to disable hardirqs.
    
    This is part of a larger rework of the TX timestamping procedure.
    A logical subportion of the rework has been split into a separate
    change.
    
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20241205145519.1236778-4-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: b454abfab525 ("net: mscc: ocelot: be resilient to loss of PTP packets during transmission")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set() [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Dec 5 16:55:19 2024 +0200

    net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set()
    
    [ Upstream commit 43a4166349a254446e7a3db65f721c6a30daccf3 ]
    
    An unsupported RX filter will leave the port with TX timestamping still
    applied as per the new request, rather than the old setting. When
    parsing the tx_type, don't apply it just yet, but delay that until after
    we've parsed the rx_filter as well (and potentially returned -ERANGE for
    that).
    
    Similarly, copy_to_user() may fail, which is a rare occurrence, but
    should still be treated by unwinding what was done.
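    
    Editorial sketch, not taken from the patch: the validate-then-apply
    ordering described above can be modelled in plain C. The enum values,
    struct port_state, hwstamp_set() and the deliver_ok flag (standing in
    for copy_to_user() succeeding) are hypothetical names chosen for
    illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the timestamping modes; not the kernel's enums. */
enum tx_mode { TX_OFF, TX_ON, TX_ONESTEP_SYNC };
enum rx_mode { RX_NONE, RX_PTP_V2 };

struct port_state {
	enum tx_mode tx;
	enum rx_mode rx;
};

/*
 * Validate both halves of the request before touching the port, so that a
 * rejected RX filter cannot leave a new TX mode behind. deliver_ok models
 * the final copy back to the caller: if it fails, the old state is restored.
 */
static int hwstamp_set(struct port_state *port, int tx_req, int rx_req,
		       bool deliver_ok)
{
	struct port_state old = *port;
	struct port_state new_state;

	switch (tx_req) {
	case TX_OFF:
	case TX_ON:
	case TX_ONESTEP_SYNC:
		new_state.tx = (enum tx_mode)tx_req;
		break;
	default:
		return -ERANGE;
	}

	switch (rx_req) {
	case RX_NONE:
	case RX_PTP_V2:
		new_state.rx = (enum rx_mode)rx_req;
		break;
	default:
		return -ERANGE;	/* nothing has been applied yet */
	}

	*port = new_state;	/* commit only after both checks passed */

	if (!deliver_ok) {
		*port = old;	/* unwind what was done */
		return -EFAULT;
	}
	return 0;
}
```

    In this model a request with a valid tx_type but an unsupported
    rx_filter returns -ERANGE and leaves the port exactly as it was.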
    
    Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20241205145519.1236778-6-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: renesas: rswitch: avoid use-after-put for a device tree node [+ + +]

Author: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date:   Sun Dec 8 14:50:04 2024 +0500

    net: renesas: rswitch: avoid use-after-put for a device tree node
    
    [ Upstream commit 66b7e9f85b8459c823b11e9af69dbf4be5eb6be8 ]
    
    The device tree node saved in the rswitch_device structure is used at
    several driver locations. So passing this node to of_node_put() after
    the first use is wrong.
    
    Move of_node_put() for this node to exit paths.
    
    Fixes: b46f1e579329 ("net: renesas: rswitch: Simplify struct phy * handling")
    Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Link: https://patch.msgid.link/20241208095004.69468-5-nikita.yoush@cogentembedded.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: renesas: rswitch: fix initial MPIC register setting [+ + +]

Author: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date:   Wed Dec 11 10:30:12 2024 +0500

    net: renesas: rswitch: fix initial MPIC register setting
    
    [ Upstream commit fb9e6039c325cc205a368046dc03c56c87df2310 ]
    
    MPIC.PIS must be set per phy interface type.
    MPIC.LSC must be set per speed.
    
    Do that strictly per datasheet, instead of hardcoding MPIC.PIS to GMII.
    
    Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
    Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Link: https://patch.msgid.link/20241211053012.368914-1-nikita.yoush@cogentembedded.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: renesas: rswitch: fix leaked pointer on error path [+ + +]

Author: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date:   Sun Dec 8 14:50:03 2024 +0500

    net: renesas: rswitch: fix leaked pointer on error path
    
    [ Upstream commit bb617328bafa1023d8e9c25a25345a564c66c14f ]
    
    If the error path is taken while filling a descriptor for a frame, the
    skb pointer is left in the entry. Later, when the ring entry is reused,
    the same entry could be used as part of a multi-descriptor frame, and
    the skb for that new frame could be stored in a different entry.
    
    Then, the stale pointer will reach the completion routine and be passed
    to the release operation.
    
    Fix that by clearing the saved skb pointer at the error path.
    
    Fixes: d2c96b9d5f83 ("net: rswitch: Add jumbo frames handling for TX")
    Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Link: https://patch.msgid.link/20241208095004.69468-4-nikita.yoush@cogentembedded.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: renesas: rswitch: fix possible early skb release [+ + +]

Author: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date:   Sun Dec 8 14:50:01 2024 +0500

    net: renesas: rswitch: fix possible early skb release
    
    [ Upstream commit 5cb099902b6b6292b3a85ffa1bb844e0ba195945 ]
    
    When sending frame split into multiple descriptors, hardware processes
    descriptors one by one, including writing back DT values. The first
    descriptor could be already marked as completed when processing of
    next descriptors for the same frame is still in progress.
    
    Although only the last descriptor is configured to generate interrupt,
    completion of the first descriptor could be noticed by the driver when
    handling interrupt for the previous frame.
    
    Currently, the driver stores the skb in the entry that corresponds to
    the first descriptor. As a result, the skb could be unmapped and freed
    before the hardware has completed the send. This opens a window for
    corrupting the data being sent.
    
    Fix this by saving skb in the entry that corresponds to the last
    descriptor used to send the frame.
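    
    Editorial sketch, not the driver's code: a simplified ring in which the
    buffer pointer is attached to the last descriptor of a frame. The names
    struct ring_entry, hw_done, queue_frame() and reap() are hypothetical;
    hw_done stands in for the descriptor type written back by hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical, simplified ring entry; not the rswitch driver's layout. */
struct ring_entry {
	bool hw_done;	/* set by "hardware" when this descriptor is written back */
	void *skb;	/* buffer to release once the whole frame has been sent */
};

/*
 * Queue a frame spanning nr_desc descriptors starting at index first.
 * The buffer pointer is stored in the LAST descriptor only, so the
 * completion path cannot see it before the end of the frame.
 */
static void queue_frame(struct ring_entry *ring, unsigned int first,
			unsigned int nr_desc, void *skb)
{
	assert(nr_desc >= 1 && nr_desc <= RING_SIZE);

	for (unsigned int i = 0; i < nr_desc; i++) {
		struct ring_entry *e = &ring[(first + i) % RING_SIZE];

		e->hw_done = false;
		e->skb = (i == nr_desc - 1) ? skb : NULL;
	}
}

/*
 * Reap completed descriptors starting at *dirty. Returns how many buffers
 * were released (they are only counted, not freed, in this sketch).
 */
static unsigned int reap(struct ring_entry *ring, unsigned int *dirty)
{
	unsigned int released = 0;

	while (ring[*dirty % RING_SIZE].hw_done) {
		struct ring_entry *e = &ring[*dirty % RING_SIZE];

		if (e->skb) {
			e->skb = NULL;
			released++;
		}
		e->hw_done = false;
		(*dirty)++;
	}
	return released;
}
```

    With a three-descriptor frame, an early write-back of the first
    descriptor advances the reap position but releases nothing; the buffer
    is released only once the last descriptor is marked done.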
    
    Fixes: d2c96b9d5f83 ("net: rswitch: Add jumbo frames handling for TX")
    Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Link: https://patch.msgid.link/20241208095004.69468-2-nikita.yoush@cogentembedded.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: renesas: rswitch: fix race window between tx start and complete [+ + +]

Author: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date:   Sun Dec 8 14:50:02 2024 +0500

    net: renesas: rswitch: fix race window between tx start and complete
    
    [ Upstream commit 0c9547e6ccf40455b0574cf589be3b152a3edf5b ]
    
    If hardware is already transmitting, it can start handling the
    descriptor being written to immediately after it observes updated DT
    field, before the queue is kicked by a write to GWTRC.
    
    If the start_xmit() execution is preempted at an unfortunate moment,
    this transmission can complete, and the interrupt can be handled, before
    gq->cur gets updated. With the current implementation of completion,
    this will leave the last entry uncompleted.
    
    Fix that by changing completion loop to check DT values directly, instead
    of depending on gq->cur.
    
    Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
    Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Link: https://patch.msgid.link/20241208095004.69468-3-nikita.yoush@cogentembedded.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: renesas: rswitch: handle stop vs interrupt race [+ + +]

Author: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date:   Mon Dec 9 16:32:04 2024 +0500

    net: renesas: rswitch: handle stop vs interrupt race
    
    [ Upstream commit 3dd002f20098b9569f8fd7f8703f364571e2e975 ]
    
    Currently the stop routine of rswitch driver does not immediately
    prevent hardware from continuing to update descriptors and requesting
    interrupts.
    
    It can happen that when rswitch_stop() executes the masking of
    interrupts from the queues of the port being closed, napi poll for
    that port is already scheduled or running on a different CPU. When
    execution of this napi poll completes, it will unmask the interrupts.
    An unmasked interrupt can then fire after rswitch_stop() returns from
    the napi_disable() call. The handler won't mask it, because
    napi_schedule_prep() will return false, and an interrupt storm will
    happen.
    
    This can't be fixed by making rswitch_stop() call napi_disable() before
    masking interrupts. In this case, the interrupt storm will happen if
    interrupt fires between napi_disable() and masking.
    
    Fix this by checking for priv->opened_ports bit when unmasking
    interrupts after napi poll. For that to be consistent, move
    priv->opened_ports changes into spinlock-protected areas, and reorder
    other operations in rswitch_open() and rswitch_stop() accordingly.
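    
    Editorial sketch, not the driver's code: a user-space model of the
    "only unmask while the port is still open" rule. struct sw_priv, the
    irq_unmasked bitmap and all function names are hypothetical, and a C11
    atomic_flag spin loop stands in for the driver's spinlock.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

struct sw_priv {
	atomic_flag lock;		/* stand-in for the driver's spinlock */
	unsigned long opened_ports;	/* bit n set while port n is open */
	unsigned long irq_unmasked;	/* bit n set while port n's IRQs are unmasked */
};

static void priv_init(struct sw_priv *p)
{
	atomic_flag_clear(&p->lock);
	p->opened_ports = 0;
	p->irq_unmasked = 0;
}

static void priv_lock(struct sw_priv *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void priv_unlock(struct sw_priv *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static unsigned long port_bit(unsigned int port)
{
	assert(port < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << port;
}

/* Open: mark the port open and unmask its interrupts under the lock. */
static void port_open(struct sw_priv *p, unsigned int port)
{
	priv_lock(p);
	p->opened_ports |= port_bit(port);
	p->irq_unmasked |= port_bit(port);
	priv_unlock(p);
}

/* Stop: clear the open bit and mask interrupts in one locked section. */
static void port_stop(struct sw_priv *p, unsigned int port)
{
	priv_lock(p);
	p->opened_ports &= ~port_bit(port);
	p->irq_unmasked &= ~port_bit(port);
	priv_unlock(p);
}

/* End of a poll: re-enable interrupts only if the port is still open. */
static void poll_done(struct sw_priv *p, unsigned int port)
{
	priv_lock(p);
	if (p->opened_ports & port_bit(port))
		p->irq_unmasked |= port_bit(port);
	priv_unlock(p);
}
```

    A poll that finishes after port_stop() finds the open bit cleared and
    leaves the interrupt masked, which is the property the fix relies on.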
    
    Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
    Link: https://patch.msgid.link/20241209113204.175015-1-nikita.yoush@cogentembedded.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: sparx5: fix FDMA performance issue [+ + +]

Author: Daniel Machon <daniel.machon@microchip.com>
Date:   Thu Dec 5 14:54:26 2024 +0100

    net: sparx5: fix FDMA performance issue
    
    [ Upstream commit f004f2e535e2b66ccbf5ac35f8eaadeac70ad7b7 ]
    
    The FDMA handler is responsible for scheduling a NAPI poll, which will
    eventually fetch RX packets from the FDMA queue. Currently, the FDMA
    handler is run in a threaded context. For some reason, this kills
    performance.  Admittedly, I did not do a thorough investigation to see
    exactly what causes the issue, however, I noticed that in the other
    driver utilizing the same FDMA engine, we run the FDMA handler in hard
    IRQ context.
    
    Fix this performance issue by running the FDMA handler in hard IRQ
    context, not deferring any work to a thread.
    
    Prior to this change, the RX UDP performance was:
    
    Interval           Transfer     Bitrate         Jitter
    0.00-10.20  sec    44.6 MBytes  36.7 Mbits/sec  0.027 ms
    
    After this change, the RX UDP performance is:
    
    Interval           Transfer     Bitrate         Jitter
    0.00-9.12   sec    1.01 GBytes  953 Mbits/sec   0.020 ms
    
    Fixes: 10615907e9b5 ("net: sparx5: switchdev: adding frame DMA functionality")
    Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: sparx5: fix the maximum frame length register [+ + +]

Author: Daniel Machon <daniel.machon@microchip.com>
Date:   Thu Dec 5 14:54:28 2024 +0100

    net: sparx5: fix the maximum frame length register
    
    [ Upstream commit ddd7ba006078a2bef5971b2dc5f8383d47f96207 ]
    
    On port initialization, we configure the maximum frame length accepted
    by the receive module associated with the port. This value is currently
    written to the MAX_LEN field of the DEV10G_MAC_ENA_CFG register, when in
    fact, it should be written to the DEV10G_MAC_MAXLEN_CFG register. Fix
    this.
    
    Fixes: 946e7fd5053a ("net: sparx5: add port module support")
    Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: IDLETIMER: Fix for possible ABBA deadlock [+ + +]

Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Dec 6 19:32:29 2024 +0100

    netfilter: IDLETIMER: Fix for possible ABBA deadlock
    
    [ Upstream commit f36b01994d68ffc253c8296e2228dfe6e6431c03 ]
    
    Deletion of the last rule referencing a given idletimer may happen at
    the same time as a read of its file in sysfs:
    
    | ======================================================
    | WARNING: possible circular locking dependency detected
    | 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
    | ------------------------------------------------------
    | iptables/3303 is trying to acquire lock:
    | ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20
    |
    | but task is already holding lock:
    | ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v]
    |
    | which lock already depends on the new lock.
    
    A simple reproducer is:
    
    | #!/bin/bash
    |
    | while true; do
    |         iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
    |         iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
    | done &
    | while true; do
    |         cat /sys/class/xt_idletimer/timers/testme >/dev/null
    | done
    
    Avoid this by releasing list_mutex right after deleting the element from
    the list, then continuing with the teardown.
    
    Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_tables: do not defer rule destruction via call_rcu [+ + +]

Author: Florian Westphal <fw@strlen.de>
Date:   Sat Dec 7 12:14:48 2024 +0100

    netfilter: nf_tables: do not defer rule destruction via call_rcu
    
    [ Upstream commit b04df3da1b5c6f6dc7cdccc37941740c078c4043 ]
    
    nf_tables_chain_destroy can sleep, it can't be used from call_rcu
    callbacks.
    
    Moreover, nf_tables_rule_release() is only safe for error unwinding,
    while transaction mutex is held and the to-be-destroyed rule was not
    exposed to either dataplane or dumps, as it deactivates+frees without
    the required synchronize_rcu() in-between.
    
    nft_rule_expr_deactivate() callbacks will change ->use counters
    of other chains/sets, see e.g. nft_lookup .deactivate callback, these
    must be serialized via transaction mutex.
    
    Also add a few lockdep asserts to make this more explicit.
    
    Calling synchronize_rcu() isn't ideal, but fixing this without is hard
    and way more intrusive.  As-is, we can get:
    
    WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x..
    Workqueue: events nf_tables_trans_destroy_work
    RIP: 0010:nft_set_destroy+0x3fe/0x5c0
    Call Trace:
     <TASK>
     nf_tables_trans_destroy_work+0x6b7/0xad0
     process_one_work+0x64a/0xce0
     worker_thread+0x613/0x10d0
    
    In case the synchronize_rcu becomes an issue, we can explore alternatives.
    
    One way would be to allocate nft_trans_rule objects + one nft_trans_chain
    object, deactivate the rules + the chain and then defer the freeing to the
    nft destroy workqueue.  We'd still need to keep the synchronize_rcu path as
    a fallback to handle -ENOMEM corner cases though.
    
    Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/all/67478d92.050a0220.253251.0062.GAE@google.com/T/
    Fixes: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

objtool/x86: allow syscall instruction [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Fri Nov 29 15:47:49 2024 +0100

    objtool/x86: allow syscall instruction
    
    commit dda014ba59331dee4f3b773a020e109932f4bd24 upstream.
    
    The syscall instruction is used in Xen PV mode for doing hypercalls.
    Allow syscall to be used in the kernel in case it is tagged with an
    unwind hint for objtool.
    
    This is part of XSA-466 / CVE-2024-53241.
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Co-developed-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf ftrace: Fix undefined behavior in cmp_profile_data() [+ + +]

Author: Kuan-Wei Chiu <visitorckw@gmail.com>
Date:   Mon Dec 9 21:42:26 2024 +0800

    perf ftrace: Fix undefined behavior in cmp_profile_data()
    
    commit 246dfe3dc199246bd64635163115f2691623fc53 upstream.
    
    The comparison function cmp_profile_data() violates the C standard's
    requirements for qsort() comparison functions, which mandate symmetry
    and transitivity:
    
    * Symmetry: If x < y, then y > x.
    * Transitivity: If x < y and y < z, then x < z.
    
    When v1 and v2 are equal, the function incorrectly returns 1, breaking
    symmetry and transitivity. This causes undefined behavior, which can
    lead to memory corruption in certain versions of glibc [1].
    
    Fix the issue by returning 0 when v1 and v2 are equal, ensuring
    compliance with the C standard and preventing undefined behavior.
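    
    Editorial sketch, not the perf source: a qsort() comparator that
    returns 0 for equal keys. struct profile_entry and cmp_by_time_desc()
    are hypothetical names used only for illustration.

```c
#include <stdlib.h>

/* Hypothetical record type; the real perf structure differs. */
struct profile_entry {
	const char *name;
	unsigned long long time;
};

/*
 * Sort in descending order of time. The expression yields -1, 0 or 1,
 * and in particular 0 when the two keys are equal, so that
 * cmp(a, b) == -cmp(b, a) holds for every pair.
 */
static int cmp_by_time_desc(const void *a, const void *b)
{
	const struct profile_entry *pa = a;
	const struct profile_entry *pb = b;

	return (pa->time < pb->time) - (pa->time > pb->time);
}
```

    A comparator that returned 1 for ties would report both "a sorts after
    b" and "b sorts after a" for the same pair, which is the inconsistency
    described above.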
    
    Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
    Fixes: 0f223813edd0 ("perf ftrace: Add 'profile' command")
    Fixes: 74ae366c37b7 ("perf ftrace profile: Add -s/--sort option")
    Cc: stable@vger.kernel.org
    Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
    Reviewed-by: Namhyung Kim <namhyung@kernel.org>
    Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: jserv@ccns.ncku.edu.tw
    Cc: chuang@cs.nycu.edu.tw
    Link: https://lore.kernel.org/r/20241209134226.1939163-1-visitorckw@gmail.com
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf machine: Initialize machine->env to address a segfault [+ + +]

Author: Arnaldo Carvalho de Melo <acme@kernel.org>
Date:   Tue Nov 26 11:47:25 2024 -0300

    perf machine: Initialize machine->env to address a segfault
    
    [ Upstream commit 88a6e2f67cc94f751a74409ab4c21e5fc8ea6757 ]
    
    It's used from trace__run(), for the 'perf trace' live mode, i.e. its
    strace-like, non-perf.data file processing mode, the most common one.
    
    The trace__run() function will set trace->host using machine__new_host()
    that is supposed to give a machine instance representing the running
    machine, and since we'll use perf_env__arch_strerrno() to get the right
    errno -> string table, we need to use machine->env, so initialize it in
    machine__new_host().
    
    Before the patch:
    
      (gdb) run trace --errno-summary -a sleep 1
      <SNIP>
       Summary of events:
    
       gvfs-afc-volume (3187), 2 events, 0.0%
    
         syscall            calls  errors  total       min       avg       max       stddev
                                           (msec)    (msec)    (msec)    (msec)        (%)
         --------------- --------  ------ -------- --------- --------- ---------     ------
         pselect6               1      0     0.000     0.000     0.000     0.000      0.00%
    
       GUsbEventThread (3519), 2 events, 0.0%
    
         syscall            calls  errors  total       min       avg       max       stddev
                                           (msec)    (msec)    (msec)    (msec)        (%)
         --------------- --------  ------ -------- --------- --------- ---------     ------
         poll                   1      0     0.000     0.000     0.000     0.000      0.00%
      <SNIP>
      Program received signal SIGSEGV, Segmentation fault.
      0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
      478           if (env->arch_strerrno == NULL)
      (gdb) bt
      #0  0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
      #1  0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673
      #2  0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708
      #3  0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747
      #4  0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456
      #5  0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487
      #6  0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351
      #7  0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404
      #8  0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448
      #9  0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560
      (gdb)
    
    After:
    
      root@number:~# perf trace -a --errno-summary sleep 1
      <SNIP>
         pw-data-loop (2685), 1410 events, 16.0%
    
         syscall            calls  errors  total       min       avg       max       stddev
                                           (msec)    (msec)    (msec)    (msec)        (%)
         --------------- --------  ------ -------- --------- --------- ---------     ------
         epoll_wait           188      0   983.428     0.000     5.231    15.595      8.68%
         ioctl                 94      0     0.811     0.004     0.009     0.016      2.82%
         read                 188      0     0.322     0.001     0.002     0.006      5.15%
         write                141      0     0.280     0.001     0.002     0.018      8.39%
         timerfd_settime       94      0     0.138     0.001     0.001     0.007      6.47%
    
       gnome-control-c (179406), 1848 events, 20.9%
    
         syscall            calls  errors  total       min       avg       max       stddev
                                           (msec)    (msec)    (msec)    (msec)        (%)
         --------------- --------  ------ -------- --------- --------- ---------     ------
         poll                 222      0   959.577     0.000     4.322    21.414     11.40%
         recvmsg              150      0     0.539     0.001     0.004     0.013      5.12%
         write                300      0     0.442     0.001     0.001     0.007      3.29%
         read                 150      0     0.183     0.001     0.001     0.009      5.53%
         getpid               102      0     0.101     0.000     0.001     0.008      7.82%
    
      root@number:~#
    
    Fixes: 54373b5d53c1f6aa ("perf env: Introduce perf_env__arch_strerrno()")
    Reported-by: Veronika Molnarova <vmolnaro@redhat.com>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Acked-by: Veronika Molnarova <vmolnaro@redhat.com>
    Acked-by: Michael Petlan <mpetlan@redhat.com>
    Tested-by: Michael Petlan <mpetlan@redhat.com>
    Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf tools: Fix build-id event recording [+ + +]

Author: Namhyung Kim <namhyung@kernel.org>
Date:   Tue Nov 26 19:13:31 2024 -0800

    perf tools: Fix build-id event recording
    
    [ Upstream commit 23c44f6c83257923b179461694edcf62749bedd5 ]
    
    The build-id events written at the end of the record session are broken
    due to unexpected data.  The write_buildid() writes the fixed length
    event first and then variable length filename.
    
    But a recent change made it write more data in the padding area
    accidentally.  So readers of the event see zero-filled data for the
    next entry and treat it incorrectly.  This resulted in wrong kernel
    symbols because the kernel DSO loaded a random vmlinux image in the
    path as it didn't have a valid build-id.
    
    Fixes: ae39ba16554e ("perf inject: Fix build ID injection")
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Reviewed-by: Ian Rogers <irogers@google.com>
    Link: https://lore.kernel.org/r/Z0aRFFW9xMh3mqKB@google.com
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/x86/intel/ds: Unconditionally drain PEBS DS when changing PEBS_DATA_CFG [+ + +]

Author: Kan Liang <kan.liang@linux.intel.com>
Date:   Tue Nov 19 05:55:01 2024 -0800

    perf/x86/intel/ds: Unconditionally drain PEBS DS when changing PEBS_DATA_CFG
    
    commit 9f3de72a0c37005f897d69e4bdd59c25b8898447 upstream.
    
    The PEBS kernel warnings can still be observed when the commands below
    are run in parallel for a while.
    
      while true;
      do
            perf record --no-buildid -a --intr-regs=AX  \
                        -e cpu/event=0xd0,umask=0x81/pp \
                        -c 10003 -o /dev/null ./triad;
      done &
    
      while true;
      do
            perf record -e 'cpu/mem-loads,ldlat=3/uP' -W -d -- ./dtlb
      done
    
    The commit b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing
    PEBS_DATA_CFG") intends to flush the entire PEBS buffer before the
    hardware is reprogrammed. However, it fails in the above case.
    
    The first perf command utilizes the large PEBS, while the second perf
    command only utilizes a single PEBS. When the second perf event is
    added, only n_pebs is incremented. intel_pmu_pebs_enable() is invoked
    after intel_pmu_pebs_add(), so the cpuc->n_pebs == cpuc->n_large_pebs
    check in intel_pmu_drain_large_pebs() fails and the PEBS DS is not
    flushed.
    The new PEBS event should not be taken into account when flushing the
    existing PEBS DS.
    
    The check is unnecessary here. Before the hardware is reprogrammed, all
    the stale records must be drained unconditionally.
    
    For single PEBS or PEBS-via-PT, the DS must be empty. The drain_pebs()
    can handle the empty case. There is no harm in unconditionally draining
    the PEBS DS.
    
    Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20241119135504.1463839-2-kan.liang@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ptp: kvm: x86: Return EOPNOTSUPP instead of ENODEV from kvm_arch_ptp_init() [+ + +]

Author: Thomas Weißschuh <linux@weissschuh.net>
Date:   Tue Dec 3 18:09:55 2024 +0100

    ptp: kvm: x86: Return EOPNOTSUPP instead of ENODEV from kvm_arch_ptp_init()
    
    [ Upstream commit 5e7aa97c7acf171275ac02a8bb018c31b8918d13 ]
    
    The caller, ptp_kvm_init(), emits a warning if kvm_arch_ptp_init() exits
    with any error which is not EOPNOTSUPP:
    
            "fail to initialize ptp_kvm"
    
    Replace ENODEV with EOPNOTSUPP to avoid this spurious warning,
    aligning with the ARM implementation.
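    
    Editorial sketch, not the kernel source: a user-space model of a
    caller that stays quiet for "not supported" but warns on any other
    error. arch_ptp_init() and ptp_init_warns() are hypothetical helpers;
    only the warning text is quoted from the message above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical arch hook: reports "not supported" when no KVM clock exists. */
static int arch_ptp_init(bool have_kvm_clock)
{
	if (!have_kvm_clock)
		return -EOPNOTSUPP;
	return 0;
}

/*
 * Model of the caller's policy: success and -EOPNOTSUPP are silent, any
 * other error prints the warning. Returns true if a warning was printed.
 */
static bool ptp_init_warns(int ret)
{
	if (ret && ret != -EOPNOTSUPP) {
		fprintf(stderr, "fail to initialize ptp_kvm\n");
		return true;
	}
	return false;
}
```

    Under this policy a hook returning -ENODEV on an unsupported platform
    would trigger the warning, while -EOPNOTSUPP does not.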
    
    Fixes: a86ed2cfa13c ("ptp: Don't print an error if ptp_kvm is not supported")
    Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
    Link: https://patch.msgid.link/20241203-kvm_ptp-eopnotsuppp-v2-1-d1d060f27aa6@weissschuh.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

qca_spi: Fix clock speed for multiple QCA7000 [+ + +]

Author: Stefan Wahren <wahrenst@gmx.net>
Date:   Fri Dec 6 19:46:42 2024 +0100

    qca_spi: Fix clock speed for multiple QCA7000
    
    [ Upstream commit 4dba406fac06b009873fe7a28231b9b7e4288b09 ]
    
    Storing the maximum clock speed in module parameter qcaspi_clkspeed
    has the unintended side effect that the first probed instance
    defines the value for all other instances. Fix this issue by storing
    it in max_speed_hz of the relevant SPI device.
    
    This fix keeps the priority of the speed parameter (module parameter,
    device tree property, driver default). It also takes the opportunity
    to get rid of the unused member clkspeed.
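    
    Editorial sketch, not the driver's code: the stated priority order
    applied per device rather than through a shared variable. The struct
    spi_dev_model type, pick_clock() and the DEFAULT_CLK_HZ value are
    assumptions made for illustration; 0 is used here to mean "not set".

```c
#include <stdint.h>

#define DEFAULT_CLK_HZ 8000000u	/* hypothetical driver default */

/* Minimal stand-in for the per-device SPI state. */
struct spi_dev_model {
	uint32_t max_speed_hz;
};

/*
 * Choose the clock for ONE device: module parameter first, then the
 * device tree property, then the driver default. The result is stored
 * in the device itself, so probing one instance cannot change another.
 */
static void pick_clock(struct spi_dev_model *spi, uint32_t module_param_hz,
		       uint32_t dt_hz)
{
	if (module_param_hz)
		spi->max_speed_hz = module_param_hz;
	else if (dt_hz)
		spi->max_speed_hz = dt_hz;
	else
		spi->max_speed_hz = DEFAULT_CLK_HZ;
}
```

    Two devices with different device tree values each keep their own
    speed, and a device with neither source set falls back to the default.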
    
    Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
    Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
    Link: https://patch.msgid.link/20241206184643.123399-2-wahrenst@gmx.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

qca_spi: Make driver probing reliable [+ + +]

Author: Stefan Wahren <wahrenst@gmx.net>
Date:   Fri Dec 6 19:46:43 2024 +0100

    qca_spi: Make driver probing reliable
    
    [ Upstream commit becc6399ce3b724cffe9ccb7ef0bff440bb1b62b ]
    
    The module parameter qcaspi_pluggable controls whether the QCA7000
    signature should be checked at driver probe (current default) or not.
    Unfortunately this could fail in case the chip is temporarily in reset,
    which isn't under total control by the Linux host. So disable this check
    by default in order to avoid unexpected probe failures.
    
    Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
    Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
    Link: https://patch.msgid.link/20241206184643.123399-3-wahrenst@gmx.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

regulator: axp20x: AXP717: set ramp_delay [+ + +]

Author: Philippe Simons <simons.philippe@gmail.com>
Date:   Sun Dec 8 13:43:08 2024 +0100

    regulator: axp20x: AXP717: set ramp_delay
    
    [ Upstream commit f07ae52f5cf6a5584fdf7c8c652f027d90bc8b74 ]
    
    AXP717 datasheet says that regulator ramp delay is 15.625 us/step,
    which is 10mV in our case.
    
    Add an AXP_DESC_RANGES_DELAY macro and update the AXP_DESC_RANGES macro
    to expand to AXP_DESC_RANGES_DELAY with ramp_delay = 0.
    
    For DCDC4, the step is 100 mV.
    
    Add an AXP_DESC_DELAY macro and update the AXP_DESC macro to
    expand to AXP_DESC_DELAY with ramp_delay = 0.
    
    This patch fixes crashes when using CPU DVFS.
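    
    Editorial sketch, not taken from the patch: the arithmetic that turns
    the quoted 15.625 us/step figure into a ramp rate, assuming the rate is
    expressed in uV/us. ramp_delay_uv_per_us() is a hypothetical helper,
    and the time is passed in nanoseconds to keep the division in integers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ramp rate in uV/us for a regulator that moves one voltage step of
 * step_uv microvolts every step_time_ns nanoseconds:
 *
 *     rate = step_uv / (step_time_ns / 1000)
 *          = step_uv * 1000 / step_time_ns
 */
static uint32_t ramp_delay_uv_per_us(uint32_t step_uv, uint32_t step_time_ns)
{
	assert(step_time_ns != 0);
	return (uint32_t)(((uint64_t)step_uv * 1000u) / step_time_ns);
}
```

    With 15.625 us per step, a 10 mV step works out to
    10000 * 1000 / 15625 = 640 uV/us, and a 100 mV step to 6400 uV/us.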
    
    Signed-off-by: Philippe Simons <simons.philippe@gmail.com>
    Tested-by: Hironori KIKUCHI <kikuchan98@gmail.com>
    Tested-by: Chris Morgan <macromorgan@hotmail.com>
    Reviewed-by: Chen-Yu Tsai <wens@csie.org>
    Fixes: d2ac3df75c3a ("regulator: axp20x: add support for the AXP717")
    Link: https://patch.msgid.link/20241208124308.5630-1-simons.philippe@gmail.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: Fix IPIs usage in kfence_protect_page() [+ + +]

Author: Alexandre Ghiti <alexghiti@rivosinc.com>
Date:   Mon Dec 9 08:41:25 2024 +0100

    riscv: Fix IPIs usage in kfence_protect_page()
    
    commit b3431a8bb336cece8adc452437befa7d4534b2fd upstream.
    
    flush_tlb_kernel_range() may use IPIs to flush the TLBs of all the
    cores, which triggers the following warning when the irqs are disabled:
    
    [    3.455330] WARNING: CPU: 1 PID: 0 at kernel/smp.c:815 smp_call_function_many_cond+0x452/0x520
    [    3.456647] Modules linked in:
    [    3.457218] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7-00010-g91d3de7240b8 #1
    [    3.457416] Hardware name: QEMU QEMU Virtual Machine, BIOS
    [    3.457633] epc : smp_call_function_many_cond+0x452/0x520
    [    3.457736]  ra : on_each_cpu_cond_mask+0x1e/0x30
    [    3.457786] epc : ffffffff800b669a ra : ffffffff800b67c2 sp : ff2000000000bb50
    [    3.457824]  gp : ffffffff815212b8 tp : ff6000008014f080 t0 : 000000000000003f
    [    3.457859]  t1 : ffffffff815221e0 t2 : 000000000000000f s0 : ff2000000000bc10
    [    3.457920]  s1 : 0000000000000040 a0 : ffffffff815221e0 a1 : 0000000000000001
    [    3.457953]  a2 : 0000000000010000 a3 : 0000000000000003 a4 : 0000000000000000
    [    3.458006]  a5 : 0000000000000000 a6 : ffffffffffffffff a7 : 0000000000000000
    [    3.458042]  s2 : ffffffff815223be s3 : 00fffffffffff000 s4 : ff600001ffe38fc0
    [    3.458076]  s5 : ff600001ff950d00 s6 : 0000000200000120 s7 : 0000000000000001
    [    3.458109]  s8 : 0000000000000001 s9 : ff60000080841ef0 s10: 0000000000000001
    [    3.458141]  s11: ffffffff81524812 t3 : 0000000000000001 t4 : ff60000080092bc0
    [    3.458172]  t5 : 0000000000000000 t6 : ff200000000236d0
    [    3.458203] status: 0000000200000100 badaddr: ffffffff800b669a cause: 0000000000000003
    [    3.458373] [<ffffffff800b669a>] smp_call_function_many_cond+0x452/0x520
    [    3.458593] [<ffffffff800b67c2>] on_each_cpu_cond_mask+0x1e/0x30
    [    3.458625] [<ffffffff8000e4ca>] __flush_tlb_range+0x118/0x1ca
    [    3.458656] [<ffffffff8000e6b2>] flush_tlb_kernel_range+0x1e/0x26
    [    3.458683] [<ffffffff801ea56a>] kfence_protect+0xc0/0xce
    [    3.458717] [<ffffffff801e9456>] kfence_guarded_free+0xc6/0x1c0
    [    3.458742] [<ffffffff801e9d6c>] __kfence_free+0x62/0xc6
    [    3.458764] [<ffffffff801c57d8>] kfree+0x106/0x32c
    [    3.458786] [<ffffffff80588cf2>] detach_buf_split+0x188/0x1a8
    [    3.458816] [<ffffffff8058708c>] virtqueue_get_buf_ctx+0xb6/0x1f6
    [    3.458839] [<ffffffff805871da>] virtqueue_get_buf+0xe/0x16
    [    3.458880] [<ffffffff80613d6a>] virtblk_done+0x5c/0xe2
    [    3.458908] [<ffffffff8058766e>] vring_interrupt+0x6a/0x74
    [    3.458930] [<ffffffff800747d8>] __handle_irq_event_percpu+0x7c/0xe2
    [    3.458956] [<ffffffff800748f0>] handle_irq_event+0x3c/0x86
    [    3.458978] [<ffffffff800786cc>] handle_simple_irq+0x9e/0xbe
    [    3.459004] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a
    [    3.459027] [<ffffffff804bf87c>] imsic_handle_irq+0xba/0x120
    [    3.459056] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a
    [    3.459080] [<ffffffff804bdb76>] riscv_intc_aia_irq+0x24/0x34
    [    3.459103] [<ffffffff809d0452>] handle_riscv_irq+0x2e/0x4c
    [    3.459133] [<ffffffff809d923e>] call_on_irq_stack+0x32/0x40
    
    So only flush the local TLB and let the lazy kfence page fault handling
    deal with the faults which could happen when a core has an old protected
    pte version cached in its TLB. That leads to potential inaccuracies which
    can be tolerated when using kfence.
    
    Fixes: 47513f243b45 ("riscv: Enable KFENCE for riscv64")
    Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20241209074125.52322-1-alexghiti@rivosinc.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: Fix wrong usage of __pa() on a fixmap address [+ + +]

Author: Alexandre Ghiti <alexghiti@rivosinc.com>
Date:   Mon Dec 9 08:45:08 2024 +0100

    riscv: Fix wrong usage of __pa() on a fixmap address
    
    commit c796e187201242992d6d292bfeff41aadfdf3f29 upstream.
    
    riscv uses fixmap addresses to map the dtb so we can't use __pa() which
    is reserved for linear mapping addresses.
    
    Fixes: b2473a359763 ("of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify")
    Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Link: https://lore.kernel.org/r/20241209074508.53037-1-alexghiti@rivosinc.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: mm: Do not call pmd dtor on vmemmap page table teardown [+ + +]

Author: Björn Töpel <bjorn@rivosinc.com>
Date:   Wed Nov 20 14:12:02 2024 +0100

    riscv: mm: Do not call pmd dtor on vmemmap page table teardown
    
    commit 21f1b85c8912262adf51707e63614a114425eb10 upstream.
    
    The vmemmap page tables, which are used for RV64 with
    SPARSEMEM_VMEMMAP, are populated using pmd (page middle directory)
    hugetables. However, the pmd allocation does not use the generic
    mechanism used by the VMA code (e.g. pmd_alloc()), or the RISC-V
    specific create_pgd_mapping()/alloc_pmd_late(). Instead, the vmemmap
    page table code allocates a page and calls vmemmap_set_pmd(). As a
    result, the pmd ctor is *not* called, nor would it make sense to do so.
    
    Now, when tearing down a vmemmap page table pmd, the cleanup code
    would unconditionally, and incorrectly, call the pmd dtor, which
    results in a crash (best case).
    
    This issue was found when running the HMM selftests:
    
      | tools/testing/selftests/mm# ./test_hmm.sh smoke
      | ... # when unloading the test_hmm.ko module
      | page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10915b
      | flags: 0x1000000000000000(node=0|zone=1)
      | raw: 1000000000000000 0000000000000000 dead000000000122 0000000000000000
      | raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      | page dumped because: VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte)
      | ------------[ cut here ]------------
      | kernel BUG at include/linux/mm.h:3080!
      | Kernel BUG [#1]
      | Modules linked in: test_hmm(-) sch_fq_codel fuse drm drm_panel_orientation_quirks backlight dm_mod
      | CPU: 1 UID: 0 PID: 514 Comm: modprobe Tainted: G        W          6.12.0-00982-gf2a4f1682d07 #2
      | Tainted: [W]=WARN
      | Hardware name: riscv-virtio qemu/qemu, BIOS 2024.10 10/01/2024
      | epc : remove_pgd_mapping+0xbec/0x1070
      |  ra : remove_pgd_mapping+0xbec/0x1070
      | epc : ffffffff80010a68 ra : ffffffff80010a68 sp : ff20000000a73940
      |  gp : ffffffff827b2d88 tp : ff6000008785da40 t0 : ffffffff80fbce04
      |  t1 : 0720072007200720 t2 : 706d756420656761 s0 : ff20000000a73a50
      |  s1 : ff6000008915cff8 a0 : 0000000000000039 a1 : 0000000000000008
      |  a2 : ff600003fff0de20 a3 : 0000000000000000 a4 : 0000000000000000
      |  a5 : 0000000000000000 a6 : c0000000ffffefff a7 : ffffffff824469b8
      |  s2 : ff1c0000022456c0 s3 : ff1ffffffdbfffff s4 : ff6000008915c000
      |  s5 : ff6000008915c000 s6 : ff6000008915c000 s7 : ff1ffffffdc00000
      |  s8 : 0000000000000001 s9 : ff1ffffffdc00000 s10: ffffffff819a31f0
      |  s11: ffffffffffffffff t3 : ffffffff8000c950 t4 : ff60000080244f00
      |  t5 : ff60000080244000 t6 : ff20000000a73708
      | status: 0000000200000120 badaddr: ffffffff80010a68 cause: 0000000000000003
      | [<ffffffff80010a68>] remove_pgd_mapping+0xbec/0x1070
      | [<ffffffff80fd238e>] vmemmap_free+0x14/0x1e
      | [<ffffffff8032e698>] section_deactivate+0x220/0x452
      | [<ffffffff8032ef7e>] sparse_remove_section+0x4a/0x58
      | [<ffffffff802f8700>] __remove_pages+0x7e/0xba
      | [<ffffffff803760d8>] memunmap_pages+0x2bc/0x3fe
      | [<ffffffff02a3ca28>] dmirror_device_remove_chunks+0x2ea/0x518 [test_hmm]
      | [<ffffffff02a3e026>] hmm_dmirror_exit+0x3e/0x1018 [test_hmm]
      | [<ffffffff80102c14>] __riscv_sys_delete_module+0x15a/0x2a6
      | [<ffffffff80fd020c>] do_trap_ecall_u+0x1f2/0x266
      | [<ffffffff80fde0a2>] _new_vmalloc_restore_context_a0+0xc6/0xd2
      | Code: bf51 7597 0184 8593 76a5 854a 4097 0029 80e7 2c00 (9002) 7597
      | ---[ end trace 0000000000000000 ]---
      | Kernel panic - not syncing: Fatal exception in interrupt
    
    Add a check to avoid calling the pmd dtor, if the calling context is
    vmemmap_free().
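    
    Illustrative userspace model of the idea (not the kernel code): a table
    is handed to the destructor only if it went through the constructor.
    The names toy_pmd, toy_alloc_with_ctor(), toy_alloc_vmemmap(),
    toy_dtor() and toy_free_pmd(), as well as the is_vmemmap flag, are
    assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a pmd page table page; not the kernel's ptdesc. */
struct toy_pmd {
    bool constructed;   /* set by the ctor path, required by toy_dtor() */
};

static int toy_dtor_calls;

/* Allocation path that runs the constructor (models pmd_alloc()). */
static struct toy_pmd *toy_alloc_with_ctor(void)
{
    struct toy_pmd *pmd = calloc(1, sizeof(*pmd));

    if (pmd)
        pmd->constructed = true;
    return pmd;
}

/* Allocation path that skips the constructor (models the vmemmap path). */
static struct toy_pmd *toy_alloc_vmemmap(void)
{
    return calloc(1, sizeof(struct toy_pmd));
}

/* The destructor is only valid on a constructed table. */
static void toy_dtor(struct toy_pmd *pmd)
{
    assert(pmd->constructed);
    pmd->constructed = false;
    toy_dtor_calls++;
}

/* Teardown: skip the destructor when called for a vmemmap table. */
static void toy_free_pmd(struct toy_pmd *pmd, bool is_vmemmap)
{
    if (!is_vmemmap)
        toy_dtor(pmd);
    free(pmd);
}
```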
    
    Fixes: c75a74f4ba19 ("riscv: mm: Add memory hotplugging support")
    Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Link: https://lore.kernel.org/r/20241120131203.1859787-1-bjorn@kernel.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rust: kbuild: set `bindgen`'s Rust target version [+ + +]

Author: Miguel Ojeda <ojeda@kernel.org>
Date:   Sat Nov 23 19:03:23 2024 +0100

    rust: kbuild: set `bindgen`'s Rust target version
    
    commit 7a5f93ea5862da91488975acaa0c7abd508f192b upstream.
    
    Each `bindgen` release may upgrade the list of Rust targets. For instance,
    currently, in their master branch [1], the latest ones are:
    
        Nightly => {
            vectorcall_abi: #124485,
            ptr_metadata: #81513,
            layout_for_ptr: #69835,
        },
        Stable_1_77(77) => { offset_of: #106655 },
        Stable_1_73(73) => { thiscall_abi: #42202 },
        Stable_1_71(71) => { c_unwind_abi: #106075 },
        Stable_1_68(68) => { abi_efiapi: #105795 },
    
    By default, the highest stable release in their list is used, and users
    are expected to set one if they need to support older Rust versions
    (e.g. see [2]).
    
    Thus, over time, new Rust features are used by default, and at some
    point, it is likely that `bindgen` will emit Rust code that requires a
    Rust version higher than our minimum (or perhaps enabling an unstable
    feature). Currently, there is no problem because the maximum they have,
    as seen above, is Rust 1.77.0, and our current minimum is Rust 1.78.0.
    
    Therefore, set a Rust target explicitly now to prevent going forward in
    time too much and thus getting potential build failures at some point.
    
    Since we also support a minimum `bindgen` version, and since `bindgen`
    does not support passing unknown Rust target versions, we need to use
    the list of our minimum `bindgen` version, rather than the latest. So,
    since `bindgen` 0.65.1 had this list [3], we need to use Rust 1.68.0:
    
        /// Rust stable 1.64
        ///  * `core_ffi_c` ([Tracking issue](https://github.com/rust-lang/rust/issues/94501))
        => Stable_1_64 => 1.64;
        /// Rust stable 1.68
        ///  * `abi_efiapi` calling convention ([Tracking issue](https://github.com/rust-lang/rust/issues/65815))
        => Stable_1_68 => 1.68;
        /// Nightly rust
        ///  * `thiscall` calling convention ([Tracking issue](https://github.com/rust-lang/rust/issues/42202))
        ///  * `vectorcall` calling convention (no tracking issue)
        ///  * `c_unwind` calling convention ([Tracking issue](https://github.com/rust-lang/rust/issues/74990))
        => Nightly => nightly;
    
        ...
    
        /// Latest stable release of Rust
        pub const LATEST_STABLE_RUST: RustTarget = RustTarget::Stable_1_68;
    
    Thus add the `--rust-target 1.68` parameter. Add a comment as well
    explaining this.
    
    An alternative would be to use the currently running (i.e. actual) `rustc`
    and `bindgen` versions to pick a "better" Rust target version. However,
    that would introduce more moving parts depending on the user setup and
    is also more complex to implement.
    
    Starting with `bindgen` 0.71.0 [4], we will be able to set any future
    Rust version instead, i.e. we will be able to set here our minimum
    supported Rust version. Christian implemented it [5] after seeing this
    patch. Thanks!
    
    Cc: Christian Poveda <git@pvdrz.com>
    Cc: Emilio Cobos Álvarez <emilio@crisal.io>
    Cc: stable@vger.kernel.org # needed for 6.12.y; unneeded for 6.6.y; do not apply to 6.1.y
    Fixes: c844fa64a2d4 ("rust: start supporting several `bindgen` versions")
    Link: https://github.com/rust-lang/rust-bindgen/blob/21c60f473f4e824d4aa9b2b508056320d474b110/bindgen/features.rs#L97-L105 [1]
    Link: https://github.com/rust-lang/rust-bindgen/issues/2960 [2]
    Link: https://github.com/rust-lang/rust-bindgen/blob/7d243056d335fdc4537f7bca73c06d01aae24ddc/bindgen/features.rs#L131-L150 [3]
    Link: https://github.com/rust-lang/rust-bindgen/blob/main/CHANGELOG.md#0710-2024-12-06 [4]
    Link: https://github.com/rust-lang/rust-bindgen/pull/2993 [5]
    Reviewed-by: Alice Ryhl <aliceryhl@google.com>
    Link: https://lore.kernel.org/r/20241123180323.255997-1-ojeda@kernel.org
    Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sched/deadline: Fix replenish_dl_new_period dl_server condition [+ + +]

Author: Juri Lelli <juri.lelli@redhat.com>
Date:   Wed Nov 27 07:37:40 2024 +0100

    sched/deadline: Fix replenish_dl_new_period dl_server condition
    
    commit 22368fe1f9bbf39db2b5b52859589883273e80ce upstream.
    
    The condition in replenish_dl_new_period() that checks if a reservation
    (dl_server) is deferred and is not handling a starvation case is
    obviously wrong.
    
    Fix it.
    
    Fixes: a110a81c52a9 ("sched/deadline: Deferrable dl server")
    Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20241127063740.8278-1-juri.lelli@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe [+ + +]

Author: liuderong <liuderong@oppo.com>
Date:   Fri Dec 6 15:29:42 2024 +0800

    scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe
    
    commit f103396ae31851d00b561ff9f8a32a441953ff8b upstream.
    
    lrbp->compl_time_stamp_local_clock is set to zero after sending a sqe
    but it is not updated after completing a cqe.  Thus the printed
    information in ufshcd_print_tr() will always be zero.
    
    Update lrbp->compl_time_stamp_local_clock after completing a cqe.
    
    Log sample:
    
    ufshcd-qcom 1d84000.ufshc: UPIU[8] - issue time 8750227249 us
    ufshcd-qcom 1d84000.ufshc: UPIU[8] - complete time 0 us
    
    Fixes: c30d8d010b5e ("scsi: ufs: core: Prepare for completion in MCQ")
    Reviewed-by: Bean Huo <beanhuo@micron.com>
    Reviewed-by: Peter Wang <peter.wang@mediatek.com>
    Signed-off-by: liuderong <liuderong@oppo.com>
    Link: https://lore.kernel.org/r/1733470182-220841-1-git-send-email-liuderong@oppo.com
    Reviewed-by: Avri Altman <avri.altman@wdc.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mlxsw: sharedbuffer: Ensure no extra packets are counted [+ + +]

Author: Danielle Ratson <danieller@nvidia.com>
Date:   Thu Dec 5 17:36:01 2024 +0100

    selftests: mlxsw: sharedbuffer: Ensure no extra packets are counted
    
    [ Upstream commit 5f2c7ab15fd806043db1a7d54b5ec36be0bd93b1 ]
    
    The test assumes that the packet it is sending is the only packet being
    passed to the device.
    
    However, it is not the case and so other packets are filling the buffers
    as well. Therefore, the test sometimes fails because it is reading a
    maximum occupancy that is larger than expected.
    
    Add egress filters on $h1 and $h2 that will guarantee the above.
    
    Fixes: a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test")
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Link: https://patch.msgid.link/64c28bc9b1cc1d78c4a73feda7cedbe9526ccf8b.1733414773.git.petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: mlxsw: sharedbuffer: Remove duplicate test cases [+ + +]

Author: Danielle Ratson <danieller@nvidia.com>
Date:   Thu Dec 5 17:36:00 2024 +0100

    selftests: mlxsw: sharedbuffer: Remove duplicate test cases
    
    [ Upstream commit 6c46ad4d1bb2e8ec2265296e53765190f6e32f33 ]
    
    On both port_tc_ip_test() and port_tc_arp_test(), the max occupancy is
    checked on $h2 twice, when only the error message is different and does not
    match the check itself.
    
    Remove the two duplicated test cases from the test.
    
    Fixes: a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test")
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Link: https://patch.msgid.link/d9eb26f6fc16a06a30b5c2c16ad80caf502bc561.1733414773.git.petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: mlxsw: sharedbuffer: Remove h1 ingress test case [+ + +]

Author: Danielle Ratson <danieller@nvidia.com>
Date:   Thu Dec 5 17:35:59 2024 +0100

    selftests: mlxsw: sharedbuffer: Remove h1 ingress test case
    
    [ Upstream commit cf3515c556907b4da290967a2a6cbbd9ee0ee723 ]
    
    The test is sending only one packet generated with mausezahn from $h1 to
    $h2. However, for some reason, it is testing for non-zero maximum occupancy
    in both the ingress pool of $h1 and $h2. The former only passes when $h2
    happens to send a packet.
    
    Avoid intermittent failures by removing the unintentional test case
    regarding the ingress pool of $h1.
    
    Fixes: a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test")
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Link: https://patch.msgid.link/5b7344608d5e06f38209e48d8af8c92fa11b6742.1733414773.git.petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: netfilter: Stabilize rpath.sh [+ + +]

Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Dec 6 15:08:40 2024 +0100

    selftests: netfilter: Stabilize rpath.sh
    
    [ Upstream commit d92906fd1b940681b4509f7bb8ae737789fb4695 ]
    
    On some systems, neighbor discoveries from ns1 for fec0:42::1 (i.e., the
    martian trap address) would happen at the wrong time and cause a
    false-negative test result.
    
    Problem analysis also discovered that the IPv6 martian ping test was
    broken in that sent neighbor discoveries, not echo requests, were
    inadvertently trapped.
    
    Avoid the race condition by introducing the neighbors to each other
    upfront. Also pin down the firewall rules to matching on echo requests
    only.
    
    Fixes: efb056e5f1f0 ("netfilter: ip6t_rpfilter: Fix regression with VRF interfaces")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

serial: sh-sci: Check if TX data was written to device in .tx_empty() [+ + +]

Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Date:   Mon Nov 25 13:58:56 2024 +0200

    serial: sh-sci: Check if TX data was written to device in .tx_empty()
    
    commit 7cc0e0a43a91052477c2921f924a37d9c3891f0c upstream.
    
    On the Renesas RZ/G3S, when doing suspend to RAM, uart_suspend_port()
    is called. uart_suspend_port() calls
    struct uart_port::ops::tx_empty() three times before shutting down the
    port.
    
    According to the documentation, the struct uart_port::ops::tx_empty()
    API tests whether the transmitter FIFO and shifter for the port is
    empty.
    
    The Renesas RZ/G3S SCIFA IP reports the number of data units stored in the
    transmit FIFO through the FDR (FIFO Data Count Register). The data units
    in the FIFOs are written in the shift register and transmitted from there.
    The TEND bit in the Serial Status Register reports if the data was
    transmitted from the shift register.
    
    In the previous code, in the tx_empty() API implemented by the sh-sci
    driver, it is considered that the TX is empty if the hardware reports the
    TEND bit set and the number of data units in the FIFO is zero.
    
    According to the HW manual, the TEND bit has the following meaning:
    
    0: Transmission is in the waiting state or in progress.
    1: Transmission is completed.
    
    It has been noticed that when opening the serial device without using it
    and then switching to a power saving mode, the tx_empty() call in the
    uart_suspend_port() function fails, leading to the "Unable to drain
    transmitter" message being printed on the console. This is because
    TEND=0 if nothing has been transmitted and the FIFOs are empty. As
    TEND=0 has a double meaning (waiting state, in progress), we can't
    determine the scenario described above.
    
    Add a software workaround for this. This sets a variable if any data has
    been sent on the serial console (when using PIO) or if the DMA callback has
    been called (meaning something has been transmitted). In the tx_empty()
    API the status of the DMA transaction is also checked, and if it is
    completed or in progress the code falls back to checking the hardware
    registers instead of relying on the software variable.
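    
    Illustrative userspace model of the workaround for the PIO case only
    (the DMA status check is left out). This is a sketch, not the sh-sci
    driver code; toy_port, its fields and the toy_*() helpers are
    hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the transmitter state; field names are hypothetical. */
struct toy_port {
    bool tend;              /* models the TEND status bit */
    unsigned int fifo_fill; /* models the FIFO data count */
    bool tx_occurred;       /* software flag: set once data was sent */
};

/* Hardware-only check: "empty" requires TEND=1 and an empty FIFO. */
static bool toy_hw_tx_empty(const struct toy_port *p)
{
    return p->tend && p->fifo_fill == 0;
}

/* With the workaround: a port that never transmitted counts as empty. */
static bool toy_tx_empty(const struct toy_port *p)
{
    if (!p->tx_occurred)
        return true;
    return toy_hw_tx_empty(p);
}

/* Writing a byte fills the FIFO and records that TX happened. */
static void toy_write_byte(struct toy_port *p)
{
    p->fifo_fill++;
    p->tend = false;
    p->tx_occurred = true;
}

/* The hardware drains the FIFO and raises TEND when it is done. */
static void toy_drain(struct toy_port *p)
{
    assert(p->tx_occurred);
    p->fifo_fill = 0;
    p->tend = true;
}
```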
    
    Fixes: 73a19e4c0301 ("serial: sh-sci: Add DMA support.")
    Cc: stable@vger.kernel.org
    Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Link: https://lore.kernel.org/r/20241125115856.513642-1-claudiu.beznea.uj@bp.renesas.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user() [+ + +]

Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Tue Nov 19 22:30:29 2024 +0100

    spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user()
    
    [ Upstream commit c84dda3751e945a67d71cbe3af4474aad24a5794 ]
    
    An aspeed_spi_start_user() call is not balanced by a corresponding
    aspeed_spi_stop_user() call in an error handling path.
    Add the missing call.
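    
    Illustrative userspace model of the pattern (not the driver code): every
    exit taken after the start call passes through the stop call. The
    toy_*() names and the fail parameter are assumptions made for this
    sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int toy_user_mode_depth; /* +1 on start, -1 on stop */

static void toy_start_user(void)
{
    toy_user_mode_depth++;
}

static void toy_stop_user(void)
{
    assert(toy_user_mode_depth > 0);
    toy_user_mode_depth--;
}

/* Hypothetical helper that can fail partway through a transfer. */
static int toy_send_cmd_addr(bool fail)
{
    return fail ? -EINVAL : 0;
}

/* Every exit after toy_start_user() goes through toy_stop_user(). */
static int toy_read_user(bool fail)
{
    int ret;

    toy_start_user();
    ret = toy_send_cmd_addr(fail);
    if (ret < 0)
        goto stop_user;

    /* ... the data phase would go here ... */

stop_user:
    toy_stop_user();
    return ret;
}
```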
    
    Fixes: e3228ed92893 ("spi: spi-mem: Convert Aspeed SMC driver to spi-mem")
    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Link: https://patch.msgid.link/4052aa2f9a9ea342fa6af83fa991b55ce5d5819e.1732051814.git.christophe.jaillet@wanadoo.fr
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: rockchip: Fix PM runtime count on no-op cs [+ + +]

Author: Christian Loehle <christian.loehle@arm.com>
Date:   Fri Dec 6 19:50:55 2024 +0000

    spi: rockchip: Fix PM runtime count on no-op cs
    
    commit 0bb394067a792e7119abc9e0b7158ef19381f456 upstream.
    
    The early bail out that caused an out-of-bounds write was removed with
    commit 5c018e378f91 ("spi: spi-rockchip: Fix out of bounds array
    access").
    Unfortunately, that caused the PM runtime count to be unbalanced and
    to underflow on the first call. To fix that, reintroduce a no-op check
    by reading the register directly.
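    
    Illustrative userspace model of a no-op check that keeps a usage count
    balanced (not the driver code). The register variable, the counter and
    all toy_*() names are hypothetical; the sketch assumes the count is
    taken when chip select is asserted and dropped when it is deasserted.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int toy_ser_reg; /* models the chip-select register */
static int toy_pm_usage;         /* models the runtime PM usage count */

static void toy_pm_get(void)
{
    toy_pm_usage++;
}

static void toy_pm_put(void)
{
    assert(toy_pm_usage > 0);    /* an underflow here is the bug */
    toy_pm_usage--;
}

/* Read the current state from the register, not from a cached array. */
static bool toy_cs_is_asserted(unsigned int cs)
{
    return toy_ser_reg & (1u << cs);
}

static void toy_set_cs(unsigned int cs, bool assert_cs)
{
    /* No-op check: nothing changes, so the PM count must not change. */
    if (assert_cs == toy_cs_is_asserted(cs))
        return;

    if (assert_cs) {
        toy_pm_get();                /* stay powered while selected */
        toy_ser_reg |= 1u << cs;
    } else {
        toy_ser_reg &= ~(1u << cs);
        toy_pm_put();                /* balances the get from assertion */
    }
}
```

    In this model, a first call that deasserts an already deasserted chip
    select returns early and leaves the count at zero.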
    
    Cc: stable@vger.kernel.org
    Fixes: 5c018e378f91 ("spi: spi-rockchip: Fix out of bounds array access")
    Signed-off-by: Christian Loehle <christian.loehle@arm.com>
    Link: https://patch.msgid.link/1f2b3af4-2b7a-4ac8-ab95-c80120ebf44c@arm.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

splice: do not checksum AF_UNIX sockets [+ + +]

Author: Frederik Deweerdt <deweerdt.lkml@gmail.com>
Date:   Mon Dec 9 21:06:48 2024 -0800

    splice: do not checksum AF_UNIX sockets
    
    commit 6bd8614fc2d076fc21b7488c9f279853960964e2 upstream.
    
    When `skb_splice_from_iter` was introduced, it inadvertently added
    checksumming for AF_UNIX sockets. This resulted in significant
    slowdowns, for example when using sendfile over unix sockets.
    
    Using the test code in [1] in my test setup (2G single core qemu),
    the client receives a 1000M file in:
    - without the patch: 1482ms (+/- 36ms)
    - with the patch: 652.5ms (+/- 22.9ms)
    
    This commit addresses the issue by marking checksumming as unnecessary in
    `unix_stream_sendmsg`.
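    
    For orientation, a minimal Linux userspace sketch of the affected
    workload: sendfile() from a regular file into an AF_UNIX stream socket.
    It is not the test code referenced as [1] and does not measure timing;
    the function name toy_sendfile_over_unix() and the buffer size are
    assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes of 'x' to a temporary file, send them over an
 * AF_UNIX stream socket pair with sendfile(), then read them back.
 * Returns the number of bytes received, or -1 on error.
 */
static ssize_t toy_sendfile_over_unix(size_t len)
{
    char buf[4096];
    int sv[2];
    off_t off = 0;
    ssize_t sent, got = -1;
    FILE *f;

    assert(len <= sizeof(buf));
    memset(buf, 'x', len);

    f = tmpfile();
    if (!f)
        return -1;
    if (fwrite(buf, 1, len, f) != len || fflush(f) != 0)
        goto out_file;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        goto out_file;

    /* The file pages are spliced into the socket by the kernel. */
    sent = sendfile(sv[0], fileno(f), &off, len);
    if (sent == (ssize_t)len)
        got = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
out_file:
    fclose(f);
    return got;
}
```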
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Frederik Deweerdt <deweerdt.lkml@gmail.com>
    Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Joe Damato <jdamato@fastly.com>
    Link: https://patch.msgid.link/Z1fMaHkRf8cfubuE@xiberoa
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tcp: check space before adding MPTCP SYN options [+ + +]

Author: MoYuanhao <moyuanhao3676@163.com>
Date:   Mon Dec 9 13:28:14 2024 +0100

    tcp: check space before adding MPTCP SYN options
    
    commit 06d64ab46f19ac12f59a1d2aa8cd196b2e4edb5b upstream.
    
    Ensure there is enough space before adding MPTCP options in
    tcp_syn_options().
    
    Without this check, 'remaining' could underflow and cause issues. If
    there is not enough space, MPTCP should not be used.
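    
    Illustrative userspace model of the check (not the kernel code): an
    unsigned byte budget is only decremented when the option fits. The
    toy_syn_opts structure and toy_add_mptcp() are hypothetical names; the
    40-byte limit is the maximum TCP option space.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_TCP_OPTION_SPACE 40u /* a TCP header has 40 option bytes */

struct toy_syn_opts {
    bool mptcp;             /* whether the MPTCP option was added */
    unsigned int remaining; /* option bytes still free */
};

/*
 * Add an MPTCP option of `size` bytes only if it fits. Without the
 * comparison, `remaining -= size` on an unsigned value would wrap
 * around to a huge number whenever size > remaining.
 */
static void toy_add_mptcp(struct toy_syn_opts *o, unsigned int size)
{
    assert(o->remaining <= TOY_MAX_TCP_OPTION_SPACE);
    if (o->remaining >= size) {
        o->mptcp = true;
        o->remaining -= size;
    }
}
```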
    
    Signed-off-by: MoYuanhao <moyuanhao3676@163.com>
    Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections")
    Cc: stable@vger.kernel.org
    Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    [ Matt: Add Fixes, cc Stable, update Description ]
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20241209-net-mptcp-check-space-syn-v1-1-2da992bb6f74@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Dec 10 15:12:45 2024 +0100

    team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
    
    [ Upstream commit 98712844589e06d9aa305b5077169942139fd75c ]
    
    Similar to the bonding driver, add NETIF_F_GSO_ENCAP_ALL to
    TEAM_VLAN_FEATURES in order to support slave devices which propagate
    NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM as vlan_features.
    
    Fixes: 3625920b62c3 ("teaming: fix vlan_features computing")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Nikolay Aleksandrov <razor@blackwall.org>
    Cc: Ido Schimmel <idosch@idosch.org>
    Cc: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Link: https://patch.msgid.link/20241210141245.327886-5-daniel@iogearbox.net
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

team: Fix initial vlan_feature set in __team_compute_features [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Dec 10 15:12:44 2024 +0100

    team: Fix initial vlan_feature set in __team_compute_features
    
    [ Upstream commit 396699ac2cb1bc4e3485abb48a1e3e41956de0cd ]
    
    Similarly as with bonding, fix the calculation of vlan_features to reuse
    netdev_base_features() in order to derive the set in the same way as
    ndo_fix_features before iterating through the slave devices to refine the
    feature set.
    
    Fixes: 3625920b62c3 ("teaming: fix vlan_features computing")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Nikolay Aleksandrov <razor@blackwall.org>
    Cc: Ido Schimmel <idosch@idosch.org>
    Cc: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Link: https://patch.msgid.link/20241210141245.327886-4-daniel@iogearbox.net
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tipc: fix NULL deref in cleanup_bearer() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Dec 4 17:05:48 2024 +0000

    tipc: fix NULL deref in cleanup_bearer()
    
    [ Upstream commit b04d86fff66b15c07505d226431f808c15b1703c ]
    
    syzbot found [1] that after blamed commit, ub->ubsock->sk
    was NULL when attempting the atomic_dec() :
    
    atomic_dec(&tipc_net(sock_net(ub->ubsock->sk))->wq_count);
    
    Fix this by caching the tipc_net pointer.
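    
    Illustrative userspace model of the approach (not the tipc code): take
    the pointer that is needed later while the socket is still intact, and
    use only the cached copy after the release. All toy_* names are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are not the kernel's structures. */
struct toy_net  { int wq_count; };
struct toy_sk   { struct toy_net *net; };
struct toy_sock { struct toy_sk *sk; };

/* Releasing the socket detaches sk, as observed in the report. */
static void toy_sock_release(struct toy_sock *sock)
{
    sock->sk = NULL;
}

/* Cache the net pointer first, then release, then use the cached copy. */
static void toy_cleanup(struct toy_sock *sock)
{
    struct toy_net *net;

    assert(sock->sk != NULL);   /* still valid before the release */
    net = sock->sk->net;

    toy_sock_release(sock);
    net->wq_count--;            /* no dereference of sock->sk here */
}
```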
    
    [1]
    
    Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN PTI
    KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
    CPU: 0 UID: 0 PID: 5896 Comm: kworker/0:3 Not tainted 6.13.0-rc1-next-20241203-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
    Workqueue: events cleanup_bearer
     RIP: 0010:read_pnet include/net/net_namespace.h:387 [inline]
     RIP: 0010:sock_net include/net/sock.h:655 [inline]
     RIP: 0010:cleanup_bearer+0x1f7/0x280 net/tipc/udp_media.c:820
    Code: 18 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 3c f7 99 f6 48 8b 1b 48 83 c3 30 e8 f0 e4 60 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 df e8 1a f7 99 f6 49 83 c7 e8 48 8b 1b
    RSP: 0018:ffffc9000410fb70 EFLAGS: 00010206
    RAX: 0000000000000006 RBX: 0000000000000030 RCX: ffff88802fe45a00
    RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffc9000410f900
    RBP: ffff88807e1f0908 R08: ffffc9000410f907 R09: 1ffff92000821f20
    R10: dffffc0000000000 R11: fffff52000821f21 R12: ffff888031d19980
    R13: dffffc0000000000 R14: dffffc0000000000 R15: ffff88807e1f0918
    FS:  0000000000000000(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000556ca050b000 CR3: 0000000031c0c000 CR4: 00000000003526f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    
    Fixes: 6a2fa13312e5 ("tipc: Fix use-after-free of kernel socket in cleanup_bearer().")
    Reported-by: syzbot+46aa5474f179dacd1a3b@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/67508b5f.050a0220.17bd51.0070.GAE@google.com/T/#u
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://patch.msgid.link/20241204170548.4152658-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: core: hcd: only check primary hcd skip_phy_initialization [+ + +]

Author: Xu Yang <xu.yang_2@nxp.com>
Date:   Tue Nov 5 17:01:20 2024 +0800

    usb: core: hcd: only check primary hcd skip_phy_initialization
    
    commit d2ec94fbc431cc77ed53d4480bdc856669c2b5aa upstream.
    
    Before commit 53a2d95df836 ("usb: core: add phy notify connect and
    disconnect"), phy initialization was skipped even when the shared hcd
    didn't set the skip_phy_initialization flag. However, the situation
    changed after that commit: hcd.c now initializes the phy when adding a
    shared hcd. This behavior is unexpected for some platforms which handle
    phy initialization by themselves. To avoid the issue, only check the
    skip_phy_initialization flag of the primary hcd, since the shared hcd
    normally follows the primary hcd setting.
    
    Fixes: 53a2d95df836 ("usb: core: add phy notify connect and disconnect")
    Cc: stable@vger.kernel.org
    Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
    Link: https://lore.kernel.org/r/20241105090120.2438366-1-xu.yang_2@nxp.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: dwc2: Fix HCD port connection race [+ + +]

Author: Stefan Wahren <wahrenst@gmx.net>
Date:   Mon Dec 2 01:16:31 2024 +0100

    usb: dwc2: Fix HCD port connection race
    
    commit 1cf1bd88f129f3bd647fead4dca270a5894274bb upstream.
    
    On Raspberry Pis without an onboard USB hub, frequent device reconnects
    can trigger an interrupt storm after DWC2 has entered host clock gating.
    This is caused by a race between _dwc2_hcd_suspend() and the port
    interrupt, which sets port_connect_status. The issue occurs if
    port_connect_status is still 1, but there is no connection anymore:
    
    usb 1-1: USB disconnect, device number 25
    dwc2 3f980000.usb: _dwc2_hcd_suspend: port_connect_status: 1
    dwc2 3f980000.usb: Entering host clock gating.
    Disabling IRQ #66
    irq 66: nobody cared (try booting with the "irqpoll" option)
    CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-gc1bb81b13202-dirty #322
    Hardware name: BCM2835
    Call trace:
     unwind_backtrace from show_stack+0x10/0x14
     show_stack from dump_stack_lvl+0x50/0x64
     dump_stack_lvl from __report_bad_irq+0x38/0xc0
     __report_bad_irq from note_interrupt+0x2ac/0x2f4
     note_interrupt from handle_irq_event+0x88/0x8c
     handle_irq_event from handle_level_irq+0xb4/0x1ac
     handle_level_irq from generic_handle_domain_irq+0x24/0x34
     generic_handle_domain_irq from bcm2836_chained_handle_irq+0x24/0x28
     bcm2836_chained_handle_irq from generic_handle_domain_irq+0x24/0x34
     generic_handle_domain_irq from generic_handle_arch_irq+0x34/0x44
     generic_handle_arch_irq from __irq_svc+0x88/0xb0
     Exception stack(0xc1d01f20 to 0xc1d01f68)
     1f20: 0004ef3c 00000001 00000000 00000000 c1d09780 c1f6bb5c c1d04e54 c1c60ca8
     1f40: c1d04e94 00000000 00000000 c1d092a8 c1f6af20 c1d01f70 c1211b98 c1212f40
     1f60: 60000013 ffffffff
     __irq_svc from default_idle_call+0x1c/0xb0
     default_idle_call from do_idle+0x21c/0x284
     do_idle from cpu_startup_entry+0x28/0x2c
     cpu_startup_entry from kernel_init+0x0/0x12c
    handlers:
     [<e3a25c00>] dwc2_handle_common_intr
     [<58bf98a3>] usb_hcd_irq
    Disabling IRQ #66
    
    So avoid this by reading the connection status directly.
    
    Fixes: 113f86d0c302 ("usb: dwc2: Update partial power down entering by system suspend")
    Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
    Link: https://lore.kernel.org/r/20241202001631.75473-4-wahrenst@gmx.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: dwc2: Fix HCD resume [+ + +]

Author: Stefan Wahren <wahrenst@gmx.net>
Date:   Mon Dec 2 01:16:29 2024 +0100

    usb: dwc2: Fix HCD resume
    
    commit 336f72d3cbf5cc17df2947bbbd2ba6e2509f17e8 upstream.
    
    The Raspberry Pi can suffer from interrupt storms on HCD resume. The dwc2
    driver sometimes fails to set HCD_FLAG_HW_ACCESSIBLE before re-enabling
    the interrupts. As a result, both handlers ignore an incoming port
    interrupt, which forces the upper layers to disable the dwc2 interrupt
    line. This leaves the USB interface in an unusable state:
    
    irq 66: nobody cared (try booting with the "irqpoll" option)
    CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W          6.10.0-rc3
    Hardware name: BCM2835
    Call trace:
    unwind_backtrace from show_stack+0x10/0x14
    show_stack from dump_stack_lvl+0x50/0x64
    dump_stack_lvl from __report_bad_irq+0x38/0xc0
    __report_bad_irq from note_interrupt+0x2ac/0x2f4
    note_interrupt from handle_irq_event+0x88/0x8c
    handle_irq_event from handle_level_irq+0xb4/0x1ac
    handle_level_irq from generic_handle_domain_irq+0x24/0x34
    generic_handle_domain_irq from bcm2836_chained_handle_irq+0x24/0x28
    bcm2836_chained_handle_irq from generic_handle_domain_irq+0x24/0x34
    generic_handle_domain_irq from generic_handle_arch_irq+0x34/0x44
    generic_handle_arch_irq from __irq_svc+0x88/0xb0
    Exception stack(0xc1b01f20 to 0xc1b01f68)
    1f20: 0005c0d4 00000001 00000000 00000000 c1b09780 c1d6b32c c1b04e54 c1a5eae8
    1f40: c1b04e90 00000000 00000000 00000000 c1d6a8a0 c1b01f70 c11d2da8 c11d4160
    1f60: 60000013 ffffffff
    __irq_svc from default_idle_call+0x1c/0xb0
    default_idle_call from do_idle+0x21c/0x284
    do_idle from cpu_startup_entry+0x28/0x2c
    cpu_startup_entry from kernel_init+0x0/0x12c
    handlers:
    [<f539e0f4>] dwc2_handle_common_intr
    [<75cd278b>] usb_hcd_irq
    Disabling IRQ #66
    
    So enable the HCD_FLAG_HW_ACCESSIBLE flag in case there is a port
    connection.
    
    Fixes: c74c26f6e398 ("usb: dwc2: Fix partial power down exiting by system resume")
    Closes: https://lore.kernel.org/linux-usb/3fd0c2fb-4752-45b3-94eb-42352703e1fd@gmx.net/T/
    Link: https://lore.kernel.org/all/5e8cbce0-3260-2971-484f-fc73a3b2bd28@synopsys.com/
    Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
    Link: https://lore.kernel.org/r/20241202001631.75473-2-wahrenst@gmx.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature [+ + +]

Author: Stefan Wahren <wahrenst@gmx.net>
Date:   Mon Dec 2 01:16:30 2024 +0100

    usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature
    
    commit a8d3e4a734599c7d0f6735f8db8a812e503395dd upstream.
    
    On Raspberry Pis without an onboard USB hub, the power cycle during
    power connect init only disables the port but never enables it again:
    
      usb usb1-port1: attempt power cycle
    
    The port-relevant part of dwc2_hcd_hub_control() is skipped when
    port_connect_status = 0, under the assumption that the core is, or soon
    will be, in device mode. This assumption is wrong: after ClearPortFeature
    USB_PORT_FEAT_POWER, port_connect_status is also 0, so SetPortFeature
    (incl. USB_PORT_FEAT_POWER) becomes a no-op.
    
    Fix the behavior of dwc2_hcd_hub_control() by replacing the
    port_connect_status check with dwc2_is_device_mode().
    
    Link: https://github.com/raspberrypi/linux/issues/6247
    Fixes: 7359d482eb4d ("staging: HCD files for the DWC2 driver")
    Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
    Link: https://lore.kernel.org/r/20241202001631.75473-3-wahrenst@gmx.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: dwc3: imx8mp: fix software node kernel dump [+ + +]

Author: Xu Yang <xu.yang_2@nxp.com>
Date:   Tue Nov 26 11:28:41 2024 +0800

    usb: dwc3: imx8mp: fix software node kernel dump
    
    commit a4faee01179a4d9cbad9ba6be2da8637c68c1438 upstream.
    
    When the device is unbound and then bound again, the kernel dumps the
    warning below:
    
    [  173.972130] sysfs: cannot create duplicate filename '/devices/platform/soc/4c010010.usb/software_node'
    [  173.981564] CPU: 2 UID: 0 PID: 536 Comm: sh Not tainted 6.12.0-rc6-06344-g2aed7c4a5c56 #144
    [  173.989923] Hardware name: NXP i.MX95 15X15 board (DT)
    [  173.995062] Call trace:
    [  173.997509]  dump_backtrace+0x90/0xe8
    [  174.001196]  show_stack+0x18/0x24
    [  174.004524]  dump_stack_lvl+0x74/0x8c
    [  174.008198]  dump_stack+0x18/0x24
    [  174.011526]  sysfs_warn_dup+0x64/0x80
    [  174.015201]  sysfs_do_create_link_sd+0xf0/0xf8
    [  174.019656]  sysfs_create_link+0x20/0x40
    [  174.023590]  software_node_notify+0x90/0x100
    [  174.027872]  device_create_managed_software_node+0xec/0x108
    ...
    
    The '4c010010.usb' device is a platform device created during the initcall
    and is never removed, which causes its associated software node to persist
    indefinitely.
    
    The existing device_create_managed_software_node() does not provide a
    corresponding removal function.
    
    Replace device_create_managed_software_node() with the
    device_add_software_node() and device_remove_software_node() pair to ensure
    proper addition and removal of software nodes, addressing this issue.
    
    Fixes: a9400f1979a0 ("usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode")
    Cc: stable@vger.kernel.org
    Reviewed-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
    Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
    Link: https://lore.kernel.org/r/20241126032841.2458338-1-xu.yang_2@nxp.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode [+ + +]

Author: Neal Frager <neal.frager@amd.com>
Date:   Mon Dec 2 23:41:51 2024 +0530

    usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode
    
    commit a48f744bef9ee74814a9eccb030b02223e48c76c upstream.
    
    When the USB3 PHY is not defined in the Linux device tree, a USB3 PHY
    may still be active on the board, enabled by the first stage
    bootloader. If the serdes clock is in use, USB will then fail to
    enumerate devices in 2.0-only mode.
    
    To solve this, make sure that the PIPE clock is deselected whenever the
    USB3 PHY is not defined. This guarantees that USB2-only mode works in
    all cases.
    
    Fixes: 9678f3361afc ("usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode")
    Cc: stable@vger.kernel.org
    Signed-off-by: Neal Frager <neal.frager@amd.com>
    Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
    Acked-by: Peter Korsgaard <peter@korsgaard.com>
    Link: https://lore.kernel.org/r/1733163111-1414816-1-git-send-email-radhey.shyam.pandey@amd.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: ehci-hcd: fix call balance of clocks handling routines [+ + +]

Author: Vitalii Mordan <mordan@ispras.ru>
Date:   Thu Nov 21 14:47:00 2024 +0300

    usb: ehci-hcd: fix call balance of clocks handling routines
    
    commit 97264eaaba0122a5b7e8ddd7bf4ff3ac57c2b170 upstream.
    
    If the clocks priv->iclk and priv->fclk were not enabled in ehci_hcd_sh_probe,
    they should not be disabled in any path.
    
    Conversely, if they were enabled in ehci_hcd_sh_probe, they must be disabled
    in all error paths to ensure proper cleanup.
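    
    The balance rule above can be sketched in user-space C. This is only an
    illustration under stated assumptions: struct clk, clk_enable_mock(),
    clk_disable_mock() and the fail_early/fail_late switches are
    hypothetical stand-ins, not the SuperH EHCI driver's code.

```c
#include <assert.h>

/* Hypothetical mock clock: counts enable/disable calls. */
struct clk {
	int enable_count;
};

static int clk_enable_mock(struct clk *c)
{
	c->enable_count++;
	return 0;
}

static void clk_disable_mock(struct clk *c)
{
	c->enable_count--;
}

/*
 * Probe sketch: a failure before the clocks are enabled must not disable
 * them, and a failure afterwards must disable exactly what was enabled.
 */
static int probe_sketch(struct clk *iclk, struct clk *fclk,
			int fail_early, int fail_late)
{
	int ret;

	if (fail_early)
		return -1;	/* clocks untouched, nothing to undo */

	ret = clk_enable_mock(fclk);
	if (ret)
		goto fail_fclk;
	ret = clk_enable_mock(iclk);
	if (ret)
		goto fail_iclk;

	if (fail_late) {
		ret = -1;
		goto fail_late_step;
	}
	return 0;

fail_late_step:
	clk_disable_mock(iclk);
fail_iclk:
	clk_disable_mock(fclk);
fail_fclk:
	return ret;
}
```

    Each label undoes only the steps that had already succeeded, so the
    enable count returns to zero on every error path.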
    
    Found by Linux Verification Center (linuxtesting.org) with Klever.
    
    Fixes: 63c845522263 ("usb: ehci-hcd: Add support for SuperH EHCI.")
    Cc: stable@vger.kernel.org # ff30bd6a6618: sh: clk: Fix clk_enable() to return 0 on NULL clk
    Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
    Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
    Link: https://lore.kernel.org/r/20241121114700.2100520-1-mordan@ispras.ru
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: gadget: midi2: Fix interpretation of is_midi1 bits [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Wed Nov 27 08:02:11 2024 +0100

    usb: gadget: midi2: Fix interpretation of is_midi1 bits
    
    commit 82937056967da052cbc04b4435c13db84192dc52 upstream.
    
    The UMP Function Block info m1.0 field (represented by is_midi1 sysfs
    entry) is an enumeration from 0 to 2, while the midi2 gadget driver
    incorrectly copies it to the corresponding snd_ump_block_info.flags
    bits as-is.  This caused the wrong flag bits to be set when m1.0 = 2.
    
    This patch corrects the wrong interpretation of is_midi1 bits.
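    
    The difference between copying an enumeration and translating it into
    flag bits can be sketched as follows. The flag names and bit positions
    are assumptions made for illustration (they are not taken from this
    commit), as is the reading of value 2 as "MIDI 1.0, low speed".

```c
#include <assert.h>

/* Hypothetical flag bits standing in for the snd_ump_block_info.flags bits. */
#define BLOCK_IS_MIDI1		(1u << 0)
#define BLOCK_IS_LOWSPEED	(1u << 1)

/* Buggy variant: the enumeration value is used as a bit mask as-is. */
static unsigned int midi1_flags_copied(unsigned int is_midi1)
{
	return is_midi1;
}

/* Translated variant: each enumeration value maps to an explicit flag set. */
static unsigned int midi1_flags_translated(unsigned int is_midi1)
{
	switch (is_midi1) {
	case 1:
		return BLOCK_IS_MIDI1;
	case 2:
		return BLOCK_IS_MIDI1 | BLOCK_IS_LOWSPEED;
	default:
		return 0;
	}
}
```

    Under these assumptions, the copied value 2 sets only the low-speed bit
    and loses the MIDI 1.0 bit, while the translated value sets both.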
    
    Fixes: 29ee7a4dddd5 ("usb: gadget: midi2: Add configfs support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Link: https://lore.kernel.org/r/20241127070213.8232-1-tiwai@suse.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer [+ + +]

Author: Lianqin Hu <hulianqin@vivo.com>
Date:   Tue Dec 3 12:14:16 2024 +0000

    usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer
    
    commit 4cfbca86f6a8b801f3254e0e3c8f2b1d2d64be2b upstream.
    
    In some extreme cases the u_serial driver is accessed by multiple
    threads: Thread A executes the open operation and calls gs_open(),
    while Thread B executes the disconnect operation and calls
    gserial_disconnect(), which sets the port->port_usb pointer to NULL.
    
    E.g.
        Thread A                                 Thread B
        gs_open()                                gadget_unbind_driver()
        gs_start_io()                            composite_disconnect()
        gs_start_rx()                            gserial_disconnect()
        ...                                      ...
        spin_unlock(&port->port_lock)
        status = usb_ep_queue()                  spin_lock(&port->port_lock)
        spin_lock(&port->port_lock)              port->port_usb = NULL
        gs_free_requests(port->port_usb->in)     spin_unlock(&port->port_lock)
        Crash
    
    This causes thread A to access a null pointer (port->port_usb is null)
    when calling the gs_free_requests function, causing a crash.
    
    If port_usb is NULL, freeing the requests is skipped, as it
    will be done by gserial_disconnect.
    
    So add a null pointer check to gs_start_io before attempting
    to access the value of the pointer port->port_usb.
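    
    The pattern of re-checking a pointer after the lock has been dropped and
    re-taken can be sketched in single-threaded C. All names here
    (struct usb_side, start_io_sketch(), simulate_disconnect()) are
    hypothetical; the lock is only marked by comments, and the callback
    stands in for whatever Thread B does during the unlocked window.

```c
#include <assert.h>
#include <stddef.h>

struct usb_side {
	int in_requests;	/* requests still allocated */
};

struct port {
	struct usb_side *port_usb;	/* NULL once disconnected */
};

/* Stand-in for Thread B: disconnect while Thread A has dropped the lock. */
static void simulate_disconnect(struct port *p)
{
	p->port_usb->in_requests = 0;	/* disconnect frees the requests */
	p->port_usb = NULL;
}

/* Stand-in for Thread A's failure path after queuing a request. */
static int start_io_sketch(struct port *p,
			   void (*unlocked_window)(struct port *))
{
	/* lock dropped here so the request can be queued */
	if (unlocked_window)
		unlocked_window(p);
	/* lock re-taken here; port_usb may have changed in between */

	if (!p->port_usb)
		return -1;	/* disconnected: nothing left to free */

	p->port_usb->in_requests = 0;	/* safe, pointer was re-checked */
	return 0;
}
```

    Without the NULL check, the last assignment would dereference a NULL
    pointer whenever the disconnect wins the race.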
    
    Call trace:
     gs_start_io+0x164/0x25c
     gs_open+0x108/0x13c
     tty_open+0x314/0x638
     chrdev_open+0x1b8/0x258
     do_dentry_open+0x2c4/0x700
     vfs_open+0x2c/0x3c
     path_openat+0xa64/0xc60
     do_filp_open+0xb8/0x164
     do_sys_openat2+0x84/0xf0
     __arm64_sys_openat+0x70/0x9c
     invoke_syscall+0x58/0x114
     el0_svc_common+0x80/0xe0
     do_el0_svc+0x1c/0x28
     el0_svc+0x38/0x68
    
    Fixes: c1dca562be8a ("usb gadget: split out serial core")
    Cc: stable@vger.kernel.org
    Suggested-by: Prashanth K <quic_prashk@quicinc.com>
    Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
    Acked-by: Prashanth K <quic_prashk@quicinc.com>
    Link: https://lore.kernel.org/r/TYUPR06MB62178DC3473F9E1A537DCD02D2362@TYUPR06MB6217.apcprd06.prod.outlook.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: host: max3421-hcd: Correctly abort a USB request. [+ + +]

Author: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Date:   Mon Nov 25 11:14:30 2024 +1300

    usb: host: max3421-hcd: Correctly abort a USB request.
    
    commit 0d2ada05227881f3d0722ca2364e3f7a860a301f upstream.
    
    If the current USB request was aborted, the spi thread would not respond
    to any further requests. This is because the "curr_urb" pointer would
    not become NULL, so no further requests would be taken off the queue.
    The solution here is to set the "urb_done" flag, as this will cause the
    correct handling of the URB. Also clear interrupts that should only be
    expected if a URB is in progress.
    
    Fixes: 2d53139f3162 ("Add support for using a MAX3421E chip as a host driver.")
    Cc: stable <stable@kernel.org>
    Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
    Link: https://lore.kernel.org/r/20241124221430.1106080-1-mark.tomlinson@alliedtelesis.co.nz
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: misc: onboard_usb_dev: skip suspend/resume sequence for USB5744 SMBus support [+ + +]

Author: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Date:   Tue Dec 3 00:18:22 2024 +0530

    usb: misc: onboard_usb_dev: skip suspend/resume sequence for USB5744 SMBus support
    
    commit ce15d6b3d5c3c6f78290066be0f0a4fd89cdeb5b upstream.
    
    USB5744 SMBus initialization is done once in probe(), and repeating it
    in resume is not supported, so avoid suspending and resetting the hub.
    
    The sysfs property 'always_powered_in_suspend' exists to implement this
    behavior, but since the default state should be a working
    configuration, override the property value.
    
    It fixes the suspend/resume testcase on Kria KR260 Robotics Starter Kit.
    
    Fixes: 6782311d04df ("usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
    Link: https://lore.kernel.org/r/1733165302-1694891-1-git-send-email-radhey.shyam.pandey@amd.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: typec: anx7411: fix fwnode_handle reference leak [+ + +]

Author: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Date:   Thu Nov 21 11:34:29 2024 +0900

    usb: typec: anx7411: fix fwnode_handle reference leak
    
    commit 645d56e4cc74e953284809d096532c1955918a28 upstream.
    
    An fwnode_handle and a usb_role_switch are obtained with an incremented
    refcount in anx7411_typec_port_probe(), but the refcounts are not
    decremented in the error path. The fwnode_handle refcount is also not
    decremented in the .remove() function. Therefore, call
    fwnode_handle_put() and usb_role_switch_put() accordingly.
    
    Fixes: fe6d8a9c8e64 ("usb: typec: anx7411: Add Analogix PD ANX7411 support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
    Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Link: https://lore.kernel.org/r/20241121023429.962848-1-joe@pf.is.s.u-tokyo.ac.jp
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe() [+ + +]

Author: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Date:   Tue Nov 26 10:49:09 2024 +0900

    usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe()
    
    commit ef42b906df5c57d0719b69419df9dfd25f25c161 upstream.
    
    The refcounts of the OF nodes obtained by of_get_child_by_name() calls
    in anx7411_typec_switch_probe() are not decremented. Replace them with
    device_get_named_child_node() calls, store the return values in the
    newly created fwnode_handle fields in anx7411_data, and call
    fwnode_handle_put() on them in the error path and in the unregister
    functions.
    
    Fixes: e45d7337dc0e ("usb: typec: anx7411: Use of_get_child_by_name() instead of of_find_node_by_name()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
    Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Link: https://lore.kernel.org/r/20241126014909.3687917-1-joe@pf.is.s.u-tokyo.ac.jp
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: typec: ucsi: Fix completion notifications [+ + +]

Author: Łukasz Bartosik <ukaszb@chromium.org>
Date:   Tue Dec 3 10:23:18 2024 +0000

    usb: typec: ucsi: Fix completion notifications
    
    commit e37b383df91ba9bde9c6a31bf3ea9072561c5126 upstream.
    
    OPM                         PPM                         LPM
     |        1.send cmd         |                           |
     |-------------------------->|                           |
     |                           |--                         |
     |                           |  | 2.set busy bit in CCI  |
     |                           |<-                         |
     |      3.notify the OPM     |                           |
     |<--------------------------|                           |
     |                           | 4.send cmd to be executed |
     |                           |-------------------------->|
     |                           |                           |
     |                           |      5.cmd completed      |
     |                           |<--------------------------|
     |                           |                           |
     |                           |--                         |
     |                           |  | 6.set cmd completed    |
     |                           |<-       bit in CCI        |
     |                           |                           |
     |     7.notify the OPM      |                           |
     |<--------------------------|                           |
     |                           |                           |
     |   8.handle notification   |                           |
     |   from point 3, read CCI  |                           |
     |<--------------------------|                           |
     |                           |                           |
    
    When the PPM receives a command from the OPM (p.1), it sets the busy bit
    in the CCI (p.2), sends a notification to the OPM (p.3), and forwards the
    command to the LPM for execution (p.4). When the PPM receives the command
    completion from the LPM (p.5), it sets the command completion bit in the
    CCI (p.6) and sends a notification to the OPM (p.7). If the LPM executes
    the command fast enough, then by the time the OPM handles the
    notification from p.3 in p.8 and reads the CCI value, it already sees the
    command completion bit set and calls complete(). complete() might then
    be called again when the OPM handles the notification from p.7.
    
    This fix replaces test_bit() with test_and_clear_bit()
    in ucsi_notify_common() in order to call complete() only
    once per request.
    
    This fix also reinitializes completion variable in
    ucsi_sync_control_common() before a command is sent.
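    
    The "complete only once per request" idea can be sketched with C11
    atomics in user space. This is a sketch under assumptions, not the UCSI
    driver code: CMD_PENDING, struct cmd_state and the helper names are
    hypothetical, and a plain counter stands in for the completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CMD_PENDING 0x1u	/* hypothetical "command in flight" bit */

struct cmd_state {
	atomic_uint flags;		/* CMD_PENDING while a command is in flight */
	unsigned int completions;	/* times the completion was signalled */
};

/* Atomically clear the bit and report whether it was set beforehand. */
static bool test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return atomic_fetch_and(flags, ~bit) & bit;
}

/* Re-arm the completion before a new command is sent. */
static void send_command(struct cmd_state *s)
{
	s->completions = 0;
	atomic_fetch_or(&s->flags, CMD_PENDING);
}

/* Notification handler: only the first "complete" notification counts. */
static void notify(struct cmd_state *s, bool cci_command_complete)
{
	if (cci_command_complete && test_and_clear(&s->flags, CMD_PENDING))
		s->completions++;
}
```

    A plain test of the bit would let both notifications (p.3 and p.7)
    signal completion; clearing it in the same atomic step lets only the
    first one through.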
    
    Fixes: 584e8df58942 ("usb: typec: ucsi: extract common code for command handling")
    Cc: stable@vger.kernel.org
    Signed-off-by: Łukasz Bartosik <ukaszb@chromium.org>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Reviewed-by: Benson Leung <bleung@chromium.org>
    Link: https://lore.kernel.org/r/20241203102318.3386345-1-ukaszb@chromium.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

virtio_net: correct netdev_tx_reset_queue() invocation point [+ + +]

Author: Koichiro Den <koichiro.den@canonical.com>
Date:   Fri Dec 6 10:10:42 2024 +0900

    virtio_net: correct netdev_tx_reset_queue() invocation point
    
    commit 3ddccbefebdbe0c4c72a248676e4d39ac66a8e26 upstream.
    
    When virtnet_close is followed by virtnet_open, some TX completions can
    possibly remain unconsumed, until they are finally processed during the
    first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
    [1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
    before RX napi enable") was not sufficient to eliminate all BQL crash
    cases for virtio-net.
    
    This issue can be reproduced with the latest net-next master by running:
    `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
    TX load from inside the machine.
    
    netdev_tx_reset_queue() can actually be dropped from the virtnet_open
    path; the device is not stopped in any case. For the BQL core, it is as
    if traffic nearly ceased to exist for some period. As for the stall
    detector added to BQL, even if virtnet_close somehow delays some TX
    completions for a long time before virtnet_open follows, we can simply
    treat that as a stall, as mentioned in commit 6025b9135f7a ("net: dqs:
    add NIC stall detector based on BQL"). Note also that users can still
    reset stall_max via sysfs.
    
    So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
    eliminates the BQL crashes. As a result, netdev_tx_reset_queue() is now
    explicitly required in the freeze/restore path. This patch adds it
    immediately after free_unused_bufs(), following the rule of thumb:
    netdev_tx_reset_queue() should follow any SKB freeing not followed by
    netdev_tx_completed_queue(). This seems the most consistent and
    streamlined approach, and now netdev_tx_reset_queue() runs whenever
    free_unused_bufs() is done.
    
    [1]:
    ------------[ cut here ]------------
    kernel BUG at lib/dynamic_queue_limits.c:99!
    Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G    N 6.12.0net-next_main+ #2
    Tainted: [N]=TEST
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
    BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
    RIP: 0010:dql_completed+0x26b/0x290
    Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
    4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
    d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
    RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297
    RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009
    RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000
    RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000
    R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001
    R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40
    FS:  00007f41ded69500(0000) GS:ffff888667dc0000(0000)
    knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0
    PKRU: 55555554
    Call Trace:
     <IRQ>
     ? die+0x32/0x80
     ? do_trap+0xd9/0x100
     ? dql_completed+0x26b/0x290
     ? dql_completed+0x26b/0x290
     ? do_error_trap+0x6d/0xb0
     ? dql_completed+0x26b/0x290
     ? exc_invalid_op+0x4c/0x60
     ? dql_completed+0x26b/0x290
     ? asm_exc_invalid_op+0x16/0x20
     ? dql_completed+0x26b/0x290
     __free_old_xmit+0xff/0x170 [virtio_net]
     free_old_xmit+0x54/0xc0 [virtio_net]
     virtnet_poll+0xf4/0xe30 [virtio_net]
     ? __update_load_avg_cfs_rq+0x264/0x2d0
     ? update_curr+0x35/0x260
     ? reweight_entity+0x1be/0x260
     __napi_poll.constprop.0+0x28/0x1c0
     net_rx_action+0x329/0x420
     ? enqueue_hrtimer+0x35/0x90
     ? trace_hardirqs_on+0x1d/0x80
     ? kvm_sched_clock_read+0xd/0x20
     ? sched_clock+0xc/0x30
     ? kvm_sched_clock_read+0xd/0x20
     ? sched_clock+0xc/0x30
     ? sched_clock_cpu+0xd/0x1a0
     handle_softirqs+0x138/0x3e0
     do_softirq.part.0+0x89/0xc0
     </IRQ>
     <TASK>
     __local_bh_enable_ip+0xa7/0xb0
     virtnet_open+0xc8/0x310 [virtio_net]
     __dev_open+0xfa/0x1b0
     __dev_change_flags+0x1de/0x250
     dev_change_flags+0x22/0x60
     do_setlink.isra.0+0x2df/0x10b0
     ? rtnetlink_rcv_msg+0x34f/0x3f0
     ? netlink_rcv_skb+0x54/0x100
     ? netlink_unicast+0x23e/0x390
     ? netlink_sendmsg+0x21e/0x490
     ? ____sys_sendmsg+0x31b/0x350
     ? avc_has_perm_noaudit+0x67/0xf0
     ? cred_has_capability.isra.0+0x75/0x110
     ? __nla_validate_parse+0x5f/0xee0
     ? __pfx___probestub_irq_enable+0x3/0x10
     ? __create_object+0x5e/0x90
     ? security_capable+0x3b/0x70
     rtnl_newlink+0x784/0xaf0
     ? avc_has_perm_noaudit+0x67/0xf0
     ? cred_has_capability.isra.0+0x75/0x110
     ? stack_depot_save_flags+0x24/0x6d0
     ? __pfx_rtnl_newlink+0x10/0x10
     rtnetlink_rcv_msg+0x34f/0x3f0
     ? do_syscall_64+0x6c/0x180
     ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
     ? __pfx_rtnetlink_rcv_msg+0x10/0x10
     netlink_rcv_skb+0x54/0x100
     netlink_unicast+0x23e/0x390
     netlink_sendmsg+0x21e/0x490
     ____sys_sendmsg+0x31b/0x350
     ? copy_msghdr_from_user+0x6d/0xa0
     ___sys_sendmsg+0x86/0xd0
     ? __pte_offset_map+0x17/0x160
     ? preempt_count_add+0x69/0xa0
     ? __call_rcu_common.constprop.0+0x147/0x610
     ? preempt_count_add+0x69/0xa0
     ? preempt_count_add+0x69/0xa0
     ? _raw_spin_trylock+0x13/0x60
     ? trace_hardirqs_on+0x1d/0x80
     __sys_sendmsg+0x66/0xc0
     do_syscall_64+0x6c/0x180
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    RIP: 0033:0x7f41defe5b34
    Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
    f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
    f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
    RSP: 002b:00007ffe5336ecc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f41defe5b34
    RDX: 0000000000000000 RSI: 00007ffe5336ed30 RDI: 0000000000000003
    RBP: 00007ffe5336eda0 R08: 0000000000000010 R09: 0000000000000001
    R10: 00007ffe5336f6f9 R11: 0000000000000202 R12: 0000000000000003
    R13: 0000000067452259 R14: 0000556ccc28b040 R15: 0000000000000000
     </TASK>
    [...]
    
    Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
    Cc: <stable@vger.kernel.org> # v6.11+
    Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    [ pabeni: trimmed possibly troublesome separator ]
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize [+ + +]

Author: Koichiro Den <koichiro.den@canonical.com>
Date:   Fri Dec 6 10:10:45 2024 +0900

    virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
    
    commit 1480f0f61b675567ca5d0943d6ef2e39172dcafd upstream.
    
    virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
    be reset when flushing has actually occurred. Add
    virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to
    handle this.
    
    Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
    Cc: <stable@vger.kernel.org> # v6.11+
    Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

virtio_ring: add a func argument 'recycle_done' to virtqueue_resize() [+ + +]

Author: Koichiro Den <koichiro.den@canonical.com>
Date:   Fri Dec 6 10:10:44 2024 +0900

    virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
    
    commit 8d6712c892019b9b9dc5c7039edd3c9d770b510b upstream.
    
    When virtqueue_resize() has actually recycled all unused buffers,
    additional work may be required in some cases. Relying solely on its
    return status is fragile, so introduce a new function argument
    'recycle_done', which is invoked when the recycle really occurs.
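    
    Why a return status alone is not enough can be sketched as follows. The
    types and functions are hypothetical simplifications, not the
    virtio_ring API: the resize returns 0 both when nothing had to be done
    and when buffers were actually recycled, and only the callback tells
    the two cases apart.

```c
#include <assert.h>
#include <stddef.h>

struct vq {
	int unused_bufs;	/* buffers still sitting in the ring */
	int size;		/* current ring size */
};

typedef void (*recycle_fn)(struct vq *vq, int buf);
typedef void (*recycle_done_fn)(struct vq *vq);

static int vq_resize_sketch(struct vq *vq, int new_size,
			    recycle_fn recycle, recycle_done_fn recycle_done)
{
	if (new_size <= 0)
		return -1;		/* rejected before touching buffers */
	if (new_size == vq->size)
		return 0;		/* success, but nothing was recycled */

	while (vq->unused_bufs > 0)
		recycle(vq, --vq->unused_bufs);
	if (recycle_done)
		recycle_done(vq);	/* runs only when recycling happened */

	vq->size = new_size;
	return 0;
}

/* Demo callbacks: count how often recycling really finished. */
static int done_calls;

static void drop_buf(struct vq *vq, int buf)
{
	(void)vq;
	(void)buf;
}

static void count_done(struct vq *vq)
{
	(void)vq;
	done_calls++;
}
```

    A caller that must reset per-queue accounting after a flush can do so
    in the callback instead of guessing from the return value.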
    
    Cc: <stable@vger.kernel.org> # v6.11+
    Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: cfg80211: sme: init n_channels before channels[] access [+ + +]

Author: Haoyu Li <lihaoyu499@gmail.com>
Date:   Tue Dec 3 23:20:49 2024 +0800

    wifi: cfg80211: sme: init n_channels before channels[] access
    
    [ Upstream commit f1d3334d604cc32db63f6e2b3283011e02294e54 ]
    
    With the __counted_by annotation in cfg80211_scan_request struct,
    the "n_channels" struct member must be set before accessing the
    "channels" array. Failing to do so will trigger a runtime warning
    when enabling CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.
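    
    The required ordering can be sketched in user-space C. The struct and
    function names are hypothetical stand-ins for cfg80211_scan_request,
    and the COUNTED_BY macro is an assumption that falls back to nothing on
    compilers without the counted_by attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute where the compiler knows it, else expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct scan_request_sketch {
	size_t n_channels;
	int channels[] COUNTED_BY(n_channels);
};

static struct scan_request_sketch *scan_request_alloc(size_t n)
{
	struct scan_request_sketch *req;

	req = calloc(1, sizeof(*req) + n * sizeof(req->channels[0]));
	if (!req)
		return NULL;

	/* Set the counter first: bounds checks read it on each access. */
	req->n_channels = n;

	for (size_t i = 0; i < n; i++)
		req->channels[i] = (int)i;	/* within the declared bounds */

	return req;
}
```

    If the loop ran while n_channels was still 0, a bounds-checking build
    would treat every channels[i] store as out of range.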
    
    Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by")
    Signed-off-by: Haoyu Li <lihaoyu499@gmail.com>
    Link: https://patch.msgid.link/20241203152049.348806-1-lihaoyu499@gmail.com
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211: fix a queue stall in certain cases of CSA [+ + +]

Author: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date:   Tue Nov 19 17:35:40 2024 +0200

    wifi: mac80211: fix a queue stall in certain cases of CSA
    
    [ Upstream commit 11ac0d7c3b5ba58232fb7dacb54371cbe75ec183 ]
    
    If we got an unprotected action frame with CSA and then heard the
    beacon with the CSA IE, we'd block the queues with the CSA reason
    twice. Since this reason is refcounted and we wake the queues up only
    once, the ref count never reaches 0 and the queues are never woken.
    This led to blocked queues that prevented any activity (even
    disconnection wouldn't reset the queue state, and the only way to
    recover would be to reload the kernel module).
    
    Fix this by not refcounting the CSA reason.
    It is now pointless to maintain the csa_blocked_queues state.
    Remove it.
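    
    The stall can be reproduced in a small model. This is a sketch with
    hypothetical names, not mac80211 code: each stop reason keeps a
    counter, and the refcounted flag selects between the old and the new
    handling of a reason.

```c
#include <assert.h>
#include <stdbool.h>

enum stop_reason { REASON_CSA, REASON_OTHER, NUM_REASONS };

struct queue_state {
	int refs[NUM_REASONS];	/* non-zero means "stopped for this reason" */
};

static void stop_queue(struct queue_state *q, enum stop_reason r,
		       bool refcounted)
{
	if (refcounted)
		q->refs[r]++;		/* every stop needs its own wake */
	else
		q->refs[r] = 1;		/* repeated stops collapse into one */
}

static void wake_queue(struct queue_state *q, enum stop_reason r,
		       bool refcounted)
{
	if (!refcounted)
		q->refs[r] = 0;
	else if (q->refs[r] > 0)
		q->refs[r]--;
}

static bool queue_stopped(const struct queue_state *q)
{
	for (int i = 0; i < NUM_REASONS; i++)
		if (q->refs[i])
			return true;
	return false;
}
```

    With refcounting, two stops followed by one wake leave the counter at 1
    and the queue stopped; without it, the single wake clears the reason.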
    
    Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Fixes: 414e090bc41d ("wifi: mac80211: restrict public action ECSA frame handling")
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219447
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20241119173108.5ea90828c2cc.I4f89e58572fb71ae48e47a81e74595cac410fbac@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211: fix station NSS capability initialization order [+ + +]

Author: Benjamin Lin <benjamin-jw.lin@mediatek.com>
Date:   Mon Nov 18 16:07:22 2024 +0800

    wifi: mac80211: fix station NSS capability initialization order
    
    [ Upstream commit 819e0f1e58e0ba3800cd9eb96b2a39e44e49df97 ]
    
    Station's spatial streaming capability should be initialized before
    handling VHT OMN, because the handling requires the capability information.
    
    Fixes: a8bca3e9371d ("wifi: mac80211: track capability/opmode NSS separately")
    Signed-off-by: Benjamin Lin <benjamin-jw.lin@mediatek.com>
    Link: https://patch.msgid.link/20241118080722.9603-1-benjamin-jw.lin@mediatek.com
    [rewrite subject]
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211: init cnt before accessing elem in ieee80211_copy_mbssid_beacon [+ + +]

Author: Haoyu Li <lihaoyu499@gmail.com>
Date:   Sun Nov 24 01:25:00 2024 +0800

    wifi: mac80211: init cnt before accessing elem in ieee80211_copy_mbssid_beacon
    
    [ Upstream commit 496db69fd860570145f7c266b31f3af85fca5b00 ]
    
    With the new __counted_by annotation in cfg80211_mbssid_elems,
    the "cnt" struct member must be set before accessing the "elem"
    array. Failing to do so will trigger a runtime warning when enabling
    CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.
    
    Fixes: c14679d7005a ("wifi: cfg80211: Annotate struct cfg80211_mbssid_elems with __counted_by")
    Signed-off-by: Haoyu Li <lihaoyu499@gmail.com>
    Link: https://patch.msgid.link/20241123172500.311853-1-lihaoyu499@gmail.com
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
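
Illustration (not part of the upstream patch): a minimal userspace sketch of
the ordering rule described above, where the counter is written before the
flexible array is touched. The struct demo_elems, demo_copy() and the
DEMO_COUNTED_BY macro are hypothetical stand-ins; the macro expands to
nothing on compilers without the counted_by attribute.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute when the compiler knows it, otherwise expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define DEMO_COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(m)
#endif

/* Hypothetical struct shaped like a counted element list. */
struct demo_elems {
	unsigned char cnt;
	struct {
		const unsigned char *data;
		size_t len;
	} elem[] DEMO_COUNTED_BY(cnt);
};

/* Copy n element descriptors; returns NULL on allocation failure. */
static struct demo_elems *demo_copy(const unsigned char *const *src,
				    const size_t *lens, unsigned char n)
{
	struct demo_elems *dst;
	size_t i;

	dst = calloc(1, sizeof(*dst) + n * sizeof(dst->elem[0]));
	if (!dst)
		return NULL;

	/* Set the counter first so bounds checks see the real array size. */
	dst->cnt = n;

	for (i = 0; i < n; i++) {
		dst->elem[i].data = src[i];
		dst->elem[i].len = lens[i];
	}
	return dst;
}
```

If the assignment to cnt came after the loop, a bounds-checking build would
see every elem[i] access as out of range, because cnt would still be 0.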

wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-one [+ + +]

Author: Lin Ma <linma@zju.edu.cn>
Date:   Sun Dec 1 01:05:26 2024 +0800

    wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-one
    
    [ Upstream commit 2e3dbf938656986cce73ac4083500d0bcfbffe24 ]
    
    Since the netlink attribute range validation provides inclusive
    checking, the *max* of attribute NL80211_ATTR_MLO_LINK_ID should be
    IEEE80211_MLD_MAX_NUM_LINKS - 1; otherwise an off-by-one occurs.
    
    One crash stack for demonstration:
    ==================================================================
    BUG: KASAN: wild-memory-access in ieee80211_tx_control_port+0x3b6/0xca0 net/mac80211/tx.c:5939
    Read of size 6 at addr 001102080000000c by task fuzzer.386/9508
    
    CPU: 1 PID: 9508 Comm: syz.1.386 Not tainted 6.1.70 #2
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0x177/0x231 lib/dump_stack.c:106
     print_report+0xe0/0x750 mm/kasan/report.c:398
     kasan_report+0x139/0x170 mm/kasan/report.c:495
     kasan_check_range+0x287/0x290 mm/kasan/generic.c:189
     memcpy+0x25/0x60 mm/kasan/shadow.c:65
     ieee80211_tx_control_port+0x3b6/0xca0 net/mac80211/tx.c:5939
     rdev_tx_control_port net/wireless/rdev-ops.h:761 [inline]
     nl80211_tx_control_port+0x7b3/0xc40 net/wireless/nl80211.c:15453
     genl_family_rcv_msg_doit+0x22e/0x320 net/netlink/genetlink.c:756
     genl_family_rcv_msg net/netlink/genetlink.c:833 [inline]
     genl_rcv_msg+0x539/0x740 net/netlink/genetlink.c:850
     netlink_rcv_skb+0x1de/0x420 net/netlink/af_netlink.c:2508
     genl_rcv+0x24/0x40 net/netlink/genetlink.c:861
     netlink_unicast_kernel net/netlink/af_netlink.c:1326 [inline]
     netlink_unicast+0x74b/0x8c0 net/netlink/af_netlink.c:1352
     netlink_sendmsg+0x882/0xb90 net/netlink/af_netlink.c:1874
     sock_sendmsg_nosec net/socket.c:716 [inline]
     __sock_sendmsg net/socket.c:728 [inline]
     ____sys_sendmsg+0x5cc/0x8f0 net/socket.c:2499
     ___sys_sendmsg+0x21c/0x290 net/socket.c:2553
     __sys_sendmsg net/socket.c:2582 [inline]
     __do_sys_sendmsg net/socket.c:2591 [inline]
     __se_sys_sendmsg+0x19e/0x270 net/socket.c:2589
     do_syscall_x64 arch/x86/entry/common.c:51 [inline]
     do_syscall_64+0x45/0x90 arch/x86/entry/common.c:81
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    Update the policy to ensure correct validation.
    
    Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs")
    Signed-off-by: Lin Ma <linma@zju.edu.cn>
    Suggested-by: Cengiz Can <cengiz.can@canonical.com>
    Link: https://patch.msgid.link/20241130170526.96698-1-linma@zju.edu.cn
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
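
Illustration (not part of the upstream patch): a minimal sketch of an
inclusive range check and of how using the array size as the maximum admits
one index too many. DEMO_MAX_NUM_LINKS and its value, struct demo_range and
demo_in_range() are assumptions made for this example, not the nl80211
policy code.

```c
#include <stdbool.h>

/* Assumed number of link slots; valid indices are 0 .. DEMO_MAX_NUM_LINKS - 1. */
#define DEMO_MAX_NUM_LINKS 15

/* Range with both bounds inclusive, like the validation described above. */
struct demo_range {
	unsigned int min;
	unsigned int max;
};

static bool demo_in_range(const struct demo_range *r, unsigned int v)
{
	return v >= r->min && v <= r->max;
}

/* Off by one: accepts DEMO_MAX_NUM_LINKS itself, which is past the last slot. */
static const struct demo_range demo_buggy = { 0, DEMO_MAX_NUM_LINKS };

/* Correct: the largest accepted value is the last valid index. */
static const struct demo_range demo_fixed = { 0, DEMO_MAX_NUM_LINKS - 1 };
```

With the buggy range, the value 15 passes validation and would index one
element past a 15-entry array; the fixed range rejects it.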

x86/static-call: fix 32-bit build [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Wed Dec 18 09:02:28 2024 +0100

    x86/static-call: fix 32-bit build
    
    commit 349f0086ba8b2a169877d21ff15a4d9da3a60054 upstream.
    
    In 32-bit x86 builds CONFIG_STATIC_CALL_INLINE isn't set, leading to
    static_call_initialized not being available.
    
    Define it as "0" in that case.
    
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates")
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
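
Illustration (not part of the upstream patch): a minimal sketch of the
general pattern of defining a symbol as a constant 0 when the option that
provides the real variable is disabled, so that callers still compile.
DEMO_HAVE_INLINE, demo_initialized and demo_is_early() are hypothetical
names; DEMO_HAVE_INLINE is deliberately left undefined here.

```c
/* DEMO_HAVE_INLINE stands in for a build option and is not defined here. */
#ifdef DEMO_HAVE_INLINE
extern int demo_initialized;	/* real variable, provided elsewhere */
#else
#define demo_initialized 0	/* constant fallback keeps callers building */
#endif

/* A caller that reads the symbol works in both configurations. */
static int demo_is_early(void)
{
	return !demo_initialized;
}
```

Because the fallback is a compile-time constant, the compiler can also drop
any branch that depends on the symbol being non-zero.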

x86/static-call: provide a way to do very early static-call updates [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Fri Nov 29 16:15:54 2024 +0100

    x86/static-call: provide a way to do very early static-call updates
    
    commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
    
    Add static_call_update_early() for updating static-call targets in
    very early boot.
    
    This will be needed for support of Xen guest type specific hypercall
    functions.
    
    This is part of XSA-466 / CVE-2024-53241.
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Co-developed-by: Peter Zijlstra <peterz@infradead.org>
    Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/xen: add central hypercall functions [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Thu Oct 17 11:00:52 2024 +0200

    x86/xen: add central hypercall functions
    
    commit b4845bb6383821a9516ce30af3a27dc873e37fd4 upstream.
    
    Add generic hypercall functions usable for all normal (i.e. not iret)
    hypercalls. Depending on the guest type and the processor vendor,
    different functions need to be used, because the instruction for
    entering the hypervisor differs:
    
    - PV guests need to use syscall
    - HVM/PVH guests on Intel need to use vmcall
    - HVM/PVH guests on AMD and Hygon need to use vmmcall
    
    As PVH guests need to issue hypercalls very early during boot, there
    is a 4th hypercall function needed for HVM/PVH which can be used on
    Intel and AMD processors. It will check the vendor type and then set
    the Intel or AMD specific function to use via static_call().
    
    This is part of XSA-466 / CVE-2024-53241.
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Co-developed-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
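
Illustration (not part of the upstream patch): a minimal sketch of the
"probe on first use" idea described above, using an ordinary function
pointer in place of the kernel's static_call() patching. All demo_* names
are hypothetical, and the vendor variable stands in for real CPU vendor
detection.

```c
/* Hypothetical vendor and instruction identifiers for the sketch. */
enum demo_vendor { DEMO_INTEL, DEMO_AMD, DEMO_HYGON };
enum demo_insn { DEMO_VMCALL, DEMO_VMMCALL };

/* Stand-in for vendor detection; the real code queries the CPU. */
static enum demo_vendor demo_cpu_vendor = DEMO_AMD;

typedef enum demo_insn (*demo_hcall_fn)(void);

static enum demo_insn demo_hcall_intel(void)
{
	return DEMO_VMCALL;
}

static enum demo_insn demo_hcall_amd(void)
{
	return DEMO_VMMCALL;
}

static enum demo_insn demo_hcall_probe(void);

/* The dispatch slot starts at the probe function, usable on any vendor. */
static demo_hcall_fn demo_hcall = demo_hcall_probe;

/* First call: pick the vendor-specific function, then forward to it. */
static enum demo_insn demo_hcall_probe(void)
{
	demo_hcall = (demo_cpu_vendor == DEMO_INTEL) ? demo_hcall_intel
						     : demo_hcall_amd;
	return demo_hcall();
}
```

After the first call through demo_hcall, later calls go straight to the
vendor-specific function without re-checking the vendor.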

x86/xen: don't do PV iret hypercall through hypercall page [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Wed Oct 16 10:40:26 2024 +0200

    x86/xen: don't do PV iret hypercall through hypercall page
    
    commit a2796dff62d6c6bfc5fbebdf2bee0d5ac0438906 upstream.
    
    Instead of jumping to the Xen hypercall page for doing the iret
    hypercall, directly code the required sequence in xen-asm.S.
    
    This is done in preparation for no longer using the hypercall page at
    all, as it has been shown to cause problems with speculation mitigations.
    
    This is part of XSA-466 / CVE-2024-53241.
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/xen: remove hypercall page [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Thu Oct 17 15:27:31 2024 +0200

    x86/xen: remove hypercall page
    
    commit 7fa0da5373685e7ed249af3fa317ab1e1ba8b0a6 upstream.
    
    The hypercall page is no longer needed. It can be removed, as from the
    Xen perspective it is optional.
    
    But, from Linux's perspective, it removes naked RET instructions that
    escape the speculative protections that Call Depth Tracking and/or
    Untrain Ret are trying to achieve.
    
    This is part of XSA-466 / CVE-2024-53241.
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/xen: use new hypercall functions instead of hypercall page [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Thu Oct 17 14:47:13 2024 +0200

    x86/xen: use new hypercall functions instead of hypercall page
    
    commit b1c2cb86f4a7861480ad54bb9a58df3cbebf8e92 upstream.
    
    Call the Xen hypervisor via the new xen_hypercall_func static-call
    instead of the hypercall page.
    
    This is part of XSA-466 / CVE-2024-53241.
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Co-developed-by: Peter Zijlstra <peterz@infradead.org>
    Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86: make get_cpu_vendor() accessible from Xen code [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Thu Oct 17 08:29:48 2024 +0200

    x86: make get_cpu_vendor() accessible from Xen code
    
    commit efbcd61d9bebb771c836a3b8bfced8165633db7c upstream.
    
    In order to be able to differentiate between AMD and Intel based
    systems for very early hypercalls without having to rely on the Xen
    hypercall page, make get_cpu_vendor() non-static.
    
    Refactor early_cpu_init() for the same reason by splitting out the
    loop initializing cpu_devs() into an externally callable function.
    
    This is part of XSA-466 / CVE-2024-53241.
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen/netfront: fix crash when removing device [+ + +]

Author: Juergen Gross <jgross@suse.com>
Date:   Thu Nov 7 16:17:00 2024 +0100

    xen/netfront: fix crash when removing device
    
    commit f9244fb55f37356f75c739c57323d9422d7aa0f8 upstream.
    
    When removing a netfront device directly after a suspend/resume cycle
    it might happen that the queues have not been set up again, causing a
    crash during the attempt to stop the queues another time.
    
    Fix that by checking that the queues exist before trying to stop
    them.
    
    This is XSA-465 / CVE-2024-53240.
    
    Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
    Fixes: d50b7914fae0 ("xen-netfront: Fix NULL sring after live migration")
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
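
Illustration (not part of the upstream patch): a minimal sketch of guarding
a teardown path against queues that were never set up again. struct
demo_netdev, struct demo_queue and demo_stop_queues() are hypothetical and
do not reflect the xen-netfront structures.

```c
#include <stddef.h>

/* Hypothetical per-queue and per-device state. */
struct demo_queue {
	int running;
};

struct demo_netdev {
	struct demo_queue *queues;	/* NULL until the queues are set up */
	unsigned int num_queues;
};

/* Stop all running queues; returns how many were stopped. */
static int demo_stop_queues(struct demo_netdev *dev)
{
	unsigned int i;
	int stopped = 0;

	/* Nothing to stop if setup never happened, e.g. right after resume. */
	if (!dev->queues)
		return 0;

	for (i = 0; i < dev->num_queues; i++) {
		if (dev->queues[i].running) {
			dev->queues[i].running = 0;
			stopped++;
		}
	}
	return stopped;
}
```

Without the NULL check, the loop would dereference a missing queue array
whenever num_queues is non-zero.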

xfs: don't drop errno values when we fail to ficlone the entire range [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:27 2024 -0800

    xfs: don't drop errno values when we fail to ficlone the entire range
    
    commit 7ce31f20a0771d71779c3b0ec9cdf474cc3c8e9a upstream.
    
    Way back when we first implemented FICLONE for XFS, life was simple --
    either the entire remapping completed, or something happened and we
    had to return an errno explaining what happened.  Neither of those
    ioctls support returning partial results, so it's all or nothing.
    
    Then things got complicated when copy_file_range came along, because it
    actually can return the number of bytes copied, so commit 3f68c1f562f1e4
    tried to make it so that we could return a partial result if the
    REMAP_FILE_CAN_SHORTEN flag is set.  This is also how FIDEDUPERANGE can
    indicate that the kernel performed a partial deduplication.
    
    Unfortunately, the logic is wrong if an error stops the remapping and
    CAN_SHORTEN is not set.  Because those callers cannot return partial
    results, it is an error for ->remap_file_range to return a positive
    quantity that is less than the @len passed in.  Implementations really
    should be returning a negative errno in this case, because that's what
    btrfs (which introduced FICLONE{,RANGE}) did.
    
    Therefore, ->remap_range implementations cannot silently drop an errno
    that they might have when the number of bytes remapped is less than the
    number of bytes requested and CAN_SHORTEN is not set.
    
    Found by running generic/562 on a 64k fsblock filesystem and wondering
    why it reported corrupt files.
    
    Cc: <stable@vger.kernel.org> # v4.20
    Fixes: 3fc9f5e409319e ("xfs: remove xfs_reflink_remap_range")
    Really-Fixes: 3f68c1f562f1e4 ("xfs: support returning partial reflink results")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
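
Illustration (not part of the upstream patch): a minimal sketch of the
return-value rule described above. A short remap with a pending error is
only reported as a byte count when the caller allowed shortening.
demo_remap_result() and its parameters are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * remapped:    bytes remapped before the operation stopped
 * len:         bytes the caller asked for
 * error:       0 or a negative errno recorded when the operation stopped
 * can_shorten: whether the caller can accept a partial result
 */
static long long demo_remap_result(long long remapped, long long len,
				   int error, bool can_shorten)
{
	/* All-or-nothing callers must be told why the remap fell short. */
	if (error && remapped < len && !can_shorten)
		return error;

	/* Otherwise report progress if there was any, else the error. */
	return remapped > 0 ? remapped : error;
}
```

For a FICLONE-style caller (can_shorten false) that stops halfway with -EIO,
this returns -EIO rather than a positive partial count.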

xfs: fix null bno_hint handling in xfs_rtallocate_rtg [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:30 2024 -0800

    xfs: fix null bno_hint handling in xfs_rtallocate_rtg
    
    commit af9f02457f461b23307fe826a37be61ba6e32c92 upstream.
    
    xfs_bmap_rtalloc initializes the bno_hint variable to NULLRTBLOCK (aka
    NULLFSBLOCK).  If the allocation request is for a file range that's
    adjacent to an existing mapping, it will then change bno_hint to the
    blkno hint in the bmalloca structure.
    
    In other words, bno_hint is either a rt block number, or it's all 1s.
    Unfortunately, commit ec12f97f1b8a8f didn't take the NULLRTBLOCK state
    into account, which means that it tries to translate that into a
    realtime extent number.  We then end up with an obnoxiously high rtx
    number and pointlessly feed that to the near allocator.  This often
    fails and falls back to the by-size allocator.  Seeing as we had no
    locality hint anyway, this is a waste of time.
    
    Fix the code to detect a lack of bno_hint correctly.  This was detected
    by running xfs/009 with metadir enabled and a 28k rt extent size.
    
    Cc: <stable@vger.kernel.org> # v6.12
    Fixes: ec12f97f1b8a8f ("xfs: make the rtalloc start hint a xfs_rtblock_t")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
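
Illustration (not part of the upstream patch): a minimal sketch of checking
for an "all 1s" sentinel before converting a block number into an extent
number. DEMO_NULLRTBLOCK, demo_hint_to_rtx() and the extent size of 7 blocks
used in the usage note are assumptions for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel meaning "no hint"; every bit set. */
#define DEMO_NULLRTBLOCK UINT64_MAX

/*
 * Convert a block-number hint to an extent number.
 * Returns false (and leaves *rtx untouched) when there is no usable hint.
 */
static bool demo_hint_to_rtx(uint64_t bno_hint, uint32_t rextsize,
			     uint64_t *rtx)
{
	if (bno_hint == DEMO_NULLRTBLOCK || rextsize == 0)
		return false;

	*rtx = bno_hint / rextsize;
	return true;
}
```

With an extent size of 7 blocks, a hint of 70 maps to extent 10, while the
sentinel is reported as "no hint" instead of being divided into a huge
extent number.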

xfs: fix scrub tracepoints when inode-rooted btrees are involved [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:32 2024 -0800

    xfs: fix scrub tracepoints when inode-rooted btrees are involved
    
    commit ffc3ea4f3c1cc83a86b7497b0c4b0aee7de5480d upstream.
    
    Fix minor mistakes in the scrub tracepoints that can manifest when
    inode-rooted btrees are enabled.  The existing code worked fine for bmap
    btrees, but we should tighten the code up to be less sloppy.
    
    Cc: <stable@vger.kernel.org> # v5.7
    Fixes: 92219c292af8dd ("xfs: convert btree cursor inode-private member names")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: only run precommits once per transaction object [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:33 2024 -0800

    xfs: only run precommits once per transaction object
    
    commit 44d9b07e52db25035680713c3428016cadcd2ea1 upstream.
    
    Committing a transaction tx0 with a defer ops chain of (A, B, C)
    creates a chain of transactions that looks like this:
    
    tx0 -> txA -> txB -> txC
    
    Prior to commit cb042117488dbf, __xfs_trans_commit would run precommits
    on tx0, then call xfs_defer_finish_noroll to convert A-C to tx[A-C].
    Unfortunately, after the finish_noroll loop we forgot to run precommits
    on txC.  That was fixed by adding the second precommit call.
    
    Unfortunately, none of us remembered that xfs_defer_finish_noroll
    calls __xfs_trans_commit a second time to commit tx0 before finishing
    work A in txA and committing that.  In other words, we run precommits
    twice on tx0:
    
    xfs_trans_commit(tx0)
        __xfs_trans_commit(tx0, false)
            xfs_trans_run_precommits(tx0)
            xfs_defer_finish_noroll(tx0)
                xfs_trans_roll(tx0)
                    txA = xfs_trans_dup(tx0)
                    __xfs_trans_commit(tx0, true)
                    xfs_trans_run_precommits(tx0)
    
    This currently isn't an issue because the inode item precommit is
    idempotent; the iunlink item precommit deletes itself so it can't be
    called again; and the buffer/dquot item precommits only check the incore
    objects for corruption.  However, it doesn't make sense to run
    precommits twice.
    
    Fix this situation by only running precommits after finish_noroll.
    
    Cc: <stable@vger.kernel.org> # v6.4
    Fixes: cb042117488dbf ("xfs: defered work could create precommits")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
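
Illustration (not part of the upstream patch): a toy model of the call
sequence described above, counting how often precommits run on each
transaction in the chain. It is a sketch under simplifying assumptions: each
deferred item is modelled as one roll, and struct demo_chain and
demo_commit() are hypothetical names, not the XFS functions.

```c
#include <stdbool.h>

#define DEMO_MAX_TX 8	/* the model supports at most DEMO_MAX_TX - 1 deferred items */

struct demo_chain {
	int precommit_runs[DEMO_MAX_TX];	/* precommit counter per transaction */
	int cur;				/* index of the live transaction */
	int deferred;				/* deferred work items still queued */
};

/*
 * fixed == false: precommits run before the deferred-work loop and again
 *                 after it (the ordering described as problematic).
 * fixed == true:  precommits run once, after the deferred-work loop.
 */
static void demo_commit(struct demo_chain *c, int idx, bool regrant, bool fixed)
{
	if (!fixed)
		c->precommit_runs[idx]++;

	if (!regrant && c->deferred > 0) {
		while (c->deferred > 0) {
			int old = c->cur;

			c->cur = old + 1;			/* "dup" the next transaction */
			demo_commit(c, old, true, fixed);	/* the roll commits the old one */
			c->deferred--;
		}
		idx = c->cur;	/* continue with the last transaction in the chain */
		if (!fixed)
			c->precommit_runs[idx]++;
	}

	if (fixed)
		c->precommit_runs[idx]++;
}
```

With three deferred items, the old ordering yields counts of 2, 1, 1, 1 for
tx0, txA, txB and txC in this model, while the new ordering yields 1 for
each.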

xfs: return a 64-bit block count from xfs_btree_count_blocks [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:26 2024 -0800

    xfs: return a 64-bit block count from xfs_btree_count_blocks
    
    commit bd27c7bcdca25ce8067ebb94ded6ac1bd7b47317 upstream.
    
    With the nrext64 feature enabled, it's possible for a data fork to have
    2^48 extent mappings.  Even with a 64k fsblock size, that maps out to
    a bmbt containing more than 2^32 blocks.  Therefore, this predicate must
    return a u64 count to avoid an integer wraparound that will cause scrub
    to do the wrong thing.
    
    It's unlikely that any such filesystem currently exists, because the
    incore bmbt would consume more than 64GB of kernel memory on its own,
    and so far nobody except me has driven a filesystem that far, judging
    from the lack of complaints.
    
    Cc: <stable@vger.kernel.org> # v5.19
    Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
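
Illustration (not part of the upstream patch): a minimal sketch of the
wraparound that a 32-bit block counter suffers once the total passes 2^32,
compared with a 64-bit counter. demo_sum32() and demo_sum64() are
hypothetical helpers, not the XFS btree code.

```c
#include <stdint.h>

/* Sum per-level block counts into a 32-bit total; wraps modulo 2^32. */
static uint32_t demo_sum32(const uint64_t *per_level, int levels)
{
	uint32_t total = 0;
	int i;

	for (i = 0; i < levels; i++)
		total += (uint32_t)per_level[i];
	return total;
}

/* Same sum with a 64-bit total; no wraparound for realistic counts. */
static uint64_t demo_sum64(const uint64_t *per_level, int levels)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < levels; i++)
		total += per_level[i];
	return total;
}
```

A tree with 2^32 blocks on one level and 5 on another sums to 5 in the
32-bit version, which is the kind of bogus count that would mislead a
consistency check.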

xfs: return from xfs_symlink_verify early on V4 filesystems [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:43 2024 -0800

    xfs: return from xfs_symlink_verify early on V4 filesystems
    
    commit 7f8b718c58783f3ff0810b39e2f62f50ba2549f6 upstream.
    
    V4 symlink blocks didn't have headers, so return early if this is a V4
    filesystem.
    
    Cc: <stable@vger.kernel.org> # v5.1
    Fixes: 39708c20ab5133 ("xfs: miscellaneous verifier magic value fixups")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:28 2024 -0800

    xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
    
    commit 6f4669708a69fd21f0299c2d5c4780a6ce358ab5 upstream.
    
    If we need to reset a symlink target to the "durr it's busted" string,
    then we clear the zapped flag as well.  However, this should be using
    the provided helper so that we don't set the zapped state on an
    otherwise ok symlink.
    
    Cc: <stable@vger.kernel.org> # v6.10
    Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: unlock inodes when erroring out of xfs_trans_alloc_dir [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:33 2024 -0800

    xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
    
    commit 53b001a21c9dff73b64e8c909c41991f01d5d00f upstream.
    
    Debugging a filesystem patch with generic/475 caused the system to hang
    after observing the following sequences in dmesg:
    
     XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x491520 len 32 error 5
     XFS (dm-0): metadata I/O error in "xfs_btree_read_buf_block+0xba/0x160 [xfs]" at daddr 0x3445608 len 8 error 5
     XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x138e1c0 len 32 error 5
     XFS (dm-0): log I/O error -5
     XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x1ea/0x4b0 [xfs] (fs/xfs/xfs_trans_buf.c:311).  Shutting down filesystem.
     XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
     XFS (dm-0): Internal error dqp->q_ino.reserved < dqp->q_ino.count at line 869 of file fs/xfs/xfs_trans_dquot.c.  Caller xfs_trans_dqresv+0x236/0x440 [xfs]
     XFS (dm-0): Corruption detected. Unmount and run xfs_repair
     XFS (dm-0): Unmounting Filesystem be6bcbcc-9921-4deb-8d16-7cc94e335fa7
    
    The system is stuck in unmount trying to lock a couple of inodes so that
    they can be purged.  The dquot corruption notice above is a clue to what
    happened -- a link() call tried to set up a transaction to link a child
    into a directory.  Quota reservation for the transaction failed after IO
    errors shut down the filesystem, but then we forgot to unlock the inodes
    on our way out.  Fix that.
    
    Cc: <stable@vger.kernel.org> # v6.10
    Fixes: bd5562111d5839 ("xfs: Hold inode locks in xfs_trans_alloc_dir")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
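
Illustration (not part of the upstream patch): a minimal sketch of an error
path that releases the locks it took before returning. The demo_* types and
functions are hypothetical; the quota reservation is reduced to a flag that
makes it fail.

```c
#include <errno.h>

/* Hypothetical inode with a trivial lock flag. */
struct demo_inode {
	int locked;
};

static void demo_lock(struct demo_inode *ip)
{
	ip->locked = 1;
}

static void demo_unlock(struct demo_inode *ip)
{
	ip->locked = 0;
}

/* Stand-in for a reservation that fails once the filesystem is shut down. */
static int demo_reserve_quota(int fs_shutdown)
{
	return fs_shutdown ? -EIO : 0;
}

/*
 * On success the caller owns both locks. On failure both locks are
 * dropped before the error is returned.
 */
static int demo_alloc_dir(struct demo_inode *dp, struct demo_inode *ip,
			  int fs_shutdown)
{
	int error;

	demo_lock(dp);
	demo_lock(ip);

	error = demo_reserve_quota(fs_shutdown);
	if (error)
		goto out_unlock;

	return 0;

out_unlock:
	demo_unlock(ip);
	demo_unlock(dp);
	return error;
}
```

Returning the error without the out_unlock step would leave both inodes
locked, which matches the hang in unmount described above.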

xfs: update btree keys correctly when _insrec splits an inode root block [+ + +]

Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Dec 2 10:57:31 2024 -0800

    xfs: update btree keys correctly when _insrec splits an inode root block
    
    commit 6d7b4bc1c3e00b1a25b7a05141a64337b4629337 upstream.
    
    In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
    would erroneously try to update the parent's key for a block that had
    been split if we decided to insert the new record into the new block.
    The solution was to detect this situation and update the in-core key
    value that we pass up to the caller so that the caller will (eventually)
    add the new block to the parent level of the tree with the correct key.
    
    However, I missed a subtlety about the way inode-rooted btrees work.  If
    the full block was a maximally sized inode root block, we'll solve that
    fullness by moving the root block's records to a new block, resizing the
    root block, and updating the root to point to the new block.  We don't
    pass a pointer to the new block to the caller because that work has
    already been done.  The new record will /always/ land in the new block,
    so in this case we need to use xfs_btree_update_keys to update the keys.
    
    This bug can theoretically manifest itself in the very rare case that we
    split a bmbt root block and the new record lands in the very first slot
    of the new block, though I've never managed to trigger it in practice.
    However, it is very easy to reproduce by running generic/522 with the
    realtime rmapbt patchset if rtinherit=1.
    
    Cc: <stable@vger.kernel.org> # v4.8
    Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys")
    Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>