Changelog in Linux kernel 6.1.141

__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock [+ + +]

Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Apr 27 15:41:51 2025 -0400

    __legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock
    
    [ Upstream commit 250cf3693060a5f803c5f1ddc082bb06b16112a9 ]
    
    ... or we risk stealing final mntput from sync umount - raising mnt_count
    after umount(2) has verified that victim is not busy, but before it
    has set MNT_SYNC_UMOUNT; in that case __legitimize_mnt() doesn't see
    that it's safe to quietly undo mnt_count increment and leaves dropping
    the reference to caller, where it'll be a full-blown mntput().
    
    Check under mount_lock is needed; leaving the current one done before
    taking that makes no sense - it's nowhere near common enough to bother
    with.
    
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: HED: Always initialize before evged [+ + +]

Author: Xiaofei Tan <tanxiaofei@huawei.com>
Date:   Wed Feb 12 14:34:08 2025 +0800

    ACPI: HED: Always initialize before evged
    
    [ Upstream commit cccf6ee090c8c133072d5d5b52ae25f3bc907a16 ]
    
    When the HED driver is built-in, it initializes after evged because they
    both are at the same initcall level, so the initialization ordering
    depends on the Makefile order.  However, this prevents RAS records
    coming in between the evged driver initialization and the HED driver
    initialization from being handled.
    
    If the number of such RAS records is above the APEI HEST error source
    number, the HEST resources may be exhausted, and that may affect
    subsequent RAS error reporting.
    
    To fix this issue, change the initcall level of HED to subsys_initcall
    and prevent the driver from being built as a module by changing ACPI_HED
    in Kconfig from "tristate" to "bool".
    
    Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
    Link: https://patch.msgid.link/20250212063408.927666-1-tanxiaofei@huawei.com
    [ rjw: Changelog edits ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

af_unix: Add dead flag to struct scm_fp_list. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:24 2025 +0100

    af_unix: Add dead flag to struct scm_fp_list.
    
    commit 7172dc93d621d5dc302d007e95ddd1311ec64283 upstream.
    
    Commit 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges()
    during GC.") fixed use-after-free by avoid accessing edge->successor while
    GC is in progress.
    
    However, there could be a small race window where another process could
    call unix_del_edges() while gc_in_progress is true and __skb_queue_purge()
    is on the way.
    
    So, we need another marker for struct scm_fp_list which indicates if the
    skb is garbage-collected.
    
    This patch adds dead flag in struct scm_fp_list and set it true before
    calling __skb_queue_purge().
    
    Fixes: 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:08 2025 +0100

    af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.
    
    commit 29b64e354029cfcf1eea4d91b146c7b769305930 upstream.
    
    As with the previous patch, we preallocate to skb's scm_fp_list an
    array of struct unix_edge in the number of inflight AF_UNIX fds.
    
    There we just preallocate memory and do not use immediately because
    sendmsg() could fail after this point.  The actual use will be in
    the next patch.
    
    When we queue skb with inflight edges, we will set the inflight
    socket's unix_sock as unix_edge->predecessor and the receiver's
    unix_sock as successor, and then we will link the edge to the
    inflight socket's unix_vertex.edges.
    
    Note that we set NULL to cloned scm_fp_list.edges in scm_fp_dup()
    so that MSG_PEEK does not change the shape of the directed graph.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-3-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:07 2025 +0100

    af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd.
    
    commit 1fbfdfaa590248c1d86407f578e40e5c65136330 upstream.
    
    We will replace the garbage collection algorithm for AF_UNIX, where
    we will consider each inflight AF_UNIX socket as a vertex and its file
    descriptor as an edge in a directed graph.
    
    This patch introduces a new struct unix_vertex representing a vertex
    in the graph and adds its pointer to struct unix_sock.
    
    When we send a fd using the SCM_RIGHTS message, we allocate struct
    scm_fp_list to struct scm_cookie in scm_fp_copy().  Then, we bump
    each refcount of the inflight fds' struct file and save them in
    scm_fp_list.fp.
    
    After that, unix_attach_fds() inexplicably clones scm_fp_list of
    scm_cookie and sets it to skb.  (We will remove this part after
    replacing GC.)
    
    Here, we add a new function call in unix_attach_fds() to preallocate
    struct unix_vertex per inflight AF_UNIX fd and link each vertex to
    skb's scm_fp_list.vertices.
    
    When sendmsg() succeeds later, if the socket of the inflight fd is
    still not inflight yet, we will set the preallocated vertex to struct
    unix_sock.vertex and link it to a global list unix_unvisited_vertices
    under spin_lock(&unix_gc_lock).
    
    If the socket is already inflight, we free the preallocated vertex.
    This is to avoid taking the lock unnecessarily when sendmsg() could
    fail later.
    
    In the following patch, we will similarly allocate another struct
    per edge, which will finally be linked to the inflight socket's
    unix_vertex.edges.
    
    And then, we will count the number of edges as unix_vertex.out_degree.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-2-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Assign a unique index to SCC. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:18 2025 +0100

    af_unix: Assign a unique index to SCC.
    
    commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2 upstream.
    
    The definition of the lowlink in Tarjan's algorithm is the
    smallest index of a vertex that is reachable with at most one
    back-edge in SCC.  This is not useful for a cross-edge.
    
    If we start traversing from A in the following graph, the final
    lowlink of D is 3.  The cross-edge here is one between D and C.
    
      A -> B -> D   D = (4, 3)  (index, lowlink)
      ^    |    |   C = (3, 1)
      |    V    |   B = (2, 1)
      `--- C <--'   A = (1, 1)
    
    This is because the lowlink of D is updated with the index of C.
    
    In the following patch, we detect a dead SCC by checking two
    conditions for each vertex.
    
      1) vertex has no edge directed to another SCC (no bridge)
      2) vertex's out_degree is the same as the refcount of its file
    
    If 1) is false, there is a receiver of all fds of the SCC and
    its ancestor SCC.
    
    To evaluate 1), we need to assign a unique index to each SCC and
    assign it to all vertices in the SCC.
    
    This patch changes the lowlink update logic for cross-edge so
    that in the example above, the lowlink of D is updated with the
    lowlink of C.
    
      A -> B -> D   D = (4, 1)  (index, lowlink)
      ^    |    |   C = (3, 1)
      |    V    |   B = (2, 1)
      `--- C <--'   A = (1, 1)
    
    Then, all vertices in the same SCC have the same lowlink, and we
    can quickly find the bridge connecting to different SCC if exists.
    
    However, it is no longer called lowlink, so we rename it to
    scc_index.  (It's sometimes called lowpoint.)
    
    Also, we add a global variable to hold the last index used in DFS
    so that we do not reset the initial index in each DFS.
    
    This patch can be squashed to the SCC detection patch but is
    split deliberately for anyone wondering why lowlink is not used
    as used in the original Tarjan's algorithm and many reference
    implementations.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Avoid Tarjan's algorithm if unnecessary. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:17 2025 +0100

    af_unix: Avoid Tarjan's algorithm if unnecessary.
    
    commit ad081928a8b0f57f269df999a28087fce6f2b6ce upstream.
    
    Once a cyclic reference is formed, we need to run GC to check if
    there is dead SCC.
    
    However, we do not need to run Tarjan's algorithm if we know that
    the shape of the inflight graph has not been changed.
    
    If an edge is added/updated/deleted and the edge's successor is
    inflight, we set false to unix_graph_grouped, which means we need
    to re-classify SCC.
    
    Once we finalise SCC, we set true to unix_graph_grouped.
    
    While unix_graph_grouped is true, we can iterate the grouped
    SCC using vertex->scc_entry in unix_walk_scc_fast().
    
    list_add() and list_for_each_entry_reverse() uses seem weird, but
    they are to keep the vertex order consistent and make writing test
    easier.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-12-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:10 2025 +0100

    af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
    
    commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315 upstream.
    
    Currently, we track the number of inflight sockets in two variables.
    unix_tot_inflight is the total number of inflight AF_UNIX sockets on
    the host, and user->unix_inflight is the number of inflight fds per
    user.
    
    We update them one by one in unix_inflight(), which can be done once
    in batch.  Also, sendmsg() could fail even after unix_inflight(), then
    we need to acquire unix_gc_lock only to decrement the counters.
    
    Let's bulk update the counters in unix_add_edges() and unix_del_edges(),
    which is called only for successfully passed fds.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-5-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Detect dead SCC. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:19 2025 +0100

    af_unix: Detect dead SCC.
    
    commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83 upstream.
    
    When iterating SCC, we call unix_vertex_dead() for each vertex
    to check if the vertex is close()d and has no bridge to another
    SCC.
    
    If both conditions are true for every vertex in SCC, we can
    execute garbage collection for all skb in the SCC.
    
    The actual garbage collection is done in the following patch,
    replacing the old implementation.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-14-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Detect Strongly Connected Components. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:12 2025 +0100

    af_unix: Detect Strongly Connected Components.
    
    commit 3484f063172dd88776b062046d721d7c2ae1af7c upstream.
    
    In the new GC, we use a simple graph algorithm, Tarjan's Strongly
    Connected Components (SCC) algorithm, to find cyclic references.
    
    The algorithm visits every vertex exactly once using depth-first
    search (DFS).
    
    DFS starts by pushing an input vertex to a stack and assigning it
    a unique number.  Two fields, index and lowlink, are initialised
    with the number, but lowlink could be updated later during DFS.
    
    If a vertex has an edge to an unvisited inflight vertex, we visit
    it and do the same processing.  So, we will have vertices in the
    stack in the order they appear and number them consecutively in
    the same order.
    
    If a vertex has a back-edge to a visited vertex in the stack,
    we update the predecessor's lowlink with the successor's index.
    
    After iterating edges from the vertex, we check if its index
    equals its lowlink.
    
    If the lowlink is different from the index, it shows there was a
    back-edge.  Then, we go backtracking and propagate the lowlink to
    its predecessor and resume the previous edge iteration from the
    next edge.
    
    If the lowlink is the same as the index, we pop vertices before
    and including the vertex from the stack.  Then, the set of vertices
    is SCC, possibly forming a cycle.  At the same time, we move the
    vertices to unix_visited_vertices.
    
    When we finish the algorithm, all vertices in each SCC will be
    linked via unix_vertex.scc_entry.
    
    Let's take an example.  We have a graph including five inflight
    vertices (F is not inflight):
    
      A -> B -> C -> D -> E (-> F)
           ^         |
           `---------'
    
    Suppose that we start DFS from C.  We will visit C, D, and B first
    and initialise their index and lowlink.  Then, the stack looks like
    this:
    
      > B = (3, 3)  (index, lowlink)
        D = (2, 2)
        C = (1, 1)
    
    When checking B's edge to C, we update B's lowlink with C's index
    and propagate it to D.
    
        B = (3, 1)  (index, lowlink)
      > D = (2, 1)
        C = (1, 1)
    
    Next, we visit E, which has no edge to an inflight vertex.
    
      > E = (4, 4)  (index, lowlink)
        B = (3, 1)
        D = (2, 1)
        C = (1, 1)
    
    When we leave from E, its index and lowlink are the same, so we
    pop E from the stack as single-vertex SCC.  Next, we leave from
    B and D but do nothing because their lowlink are different from
    their index.
    
        B = (3, 1)  (index, lowlink)
        D = (2, 1)
      > C = (1, 1)
    
    Then, we leave from C, whose index and lowlink are the same, so
    we pop B, D and C as SCC.
    
    Last, we do DFS for the rest of vertices, A, which is also a
    single-vertex SCC.
    
    Finally, each unix_vertex.scc_entry is linked as follows:
    
      A -.  B -> C -> D  E -.
      ^  |  ^         |  ^  |
      `--'  `---------'  `--'
    
    We use SCC later to decide whether we can garbage-collect the
    sockets.
    
    Note that we still cannot detect SCC properly if an edge points
    to an embryo socket.  The following two patches will sort it out.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-7-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Don't access successor in unix_del_edges() during GC. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:23 2025 +0100

    af_unix: Don't access successor in unix_del_edges() during GC.
    
    commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998 upstream.
    
    syzbot reported use-after-free in unix_del_edges().  [0]
    
    What the repro does is basically repeat the following quickly.
    
      1. pass a fd of an AF_UNIX socket to itself
    
        socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0
        sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
                                       cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0
    
      2. pass other fds of AF_UNIX sockets to the socket above
    
        socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0
        sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET,
                                       cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0
    
      3. close all sockets
    
    Here, two skb are created, and every unix_edge->successor is the first
    socket.  Then, __unix_gc() will garbage-collect the two skb:
    
      (a) free skb with self-referencing fd
      (b) free skb holding other sockets
    
    After (a), the self-referencing socket will be scheduled to be freed
    later by the delayed_fput() task.
    
    syzbot repeated the sequences above (1. ~ 3.) quickly and triggered
    the task concurrently while GC was running.
    
    So, at (b), the socket was already freed, and accessing it was illegal.
    
    unix_del_edges() accesses the receiver socket as edge->successor to
    optimise GC.  However, we should not do it during GC.
    
    Garbage-collecting sockets does not change the shape of the rest
    of the graph, so we need not call unix_update_graph() to update
    unix_graph_grouped when we purge skb.
    
    However, if we clean up all loops in the unix_walk_scc_fast() path,
    unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc()
    will call unix_walk_scc_fast() continuously even though there is no
    socket to garbage-collect.
    
    To keep that optimisation while fixing UAF, let's add the same
    updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast()
    as done in unix_walk_scc() and __unix_walk_scc().
    
    Note that when unix_del_edges() is called from other places, the
    receiver socket is always alive:
    
      - sendmsg: the successor's sk_refcnt is bumped by sock_hold()
                 unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM
    
      - recvmsg: the successor is the receiver, and its fd is alive
    
    [0]:
    BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline]
    BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline]
    BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237
    Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099
    
    CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    Workqueue: events_unbound __unix_gc
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
     print_address_description mm/kasan/report.c:377 [inline]
     print_report+0x169/0x550 mm/kasan/report.c:488
     kasan_report+0x143/0x180 mm/kasan/report.c:601
     unix_edge_successor net/unix/garbage.c:109 [inline]
     unix_del_edge net/unix/garbage.c:165 [inline]
     unix_del_edges+0x148/0x630 net/unix/garbage.c:237
     unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298
     unix_detach_fds net/unix/af_unix.c:1811 [inline]
     unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826
     skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127
     skb_release_all net/core/skbuff.c:1138 [inline]
     __kfree_skb net/core/skbuff.c:1154 [inline]
     kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190
     __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline]
     __skb_queue_purge include/linux/skbuff.h:3256 [inline]
     __unix_gc+0x1732/0x1830 net/unix/garbage.c:575
     process_one_work kernel/workqueue.c:3218 [inline]
     process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
     worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
     kthread+0x2f0/0x390 kernel/kthread.c:389
     ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
     </TASK>
    
    Allocated by task 14427:
     kasan_save_stack mm/kasan/common.c:47 [inline]
     kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
     unpoison_slab_object mm/kasan/common.c:312 [inline]
     __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
     kasan_slab_alloc include/linux/kasan.h:201 [inline]
     slab_post_alloc_hook mm/slub.c:3897 [inline]
     slab_alloc_node mm/slub.c:3957 [inline]
     kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964
     sk_prot_alloc+0x58/0x210 net/core/sock.c:2074
     sk_alloc+0x38/0x370 net/core/sock.c:2133
     unix_create1+0xb4/0x770
     unix_create+0x14e/0x200 net/unix/af_unix.c:1034
     __sock_create+0x490/0x920 net/socket.c:1571
     sock_create net/socket.c:1622 [inline]
     __sys_socketpair+0x33e/0x720 net/socket.c:1773
     __do_sys_socketpair net/socket.c:1822 [inline]
     __se_sys_socketpair net/socket.c:1819 [inline]
     __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    Freed by task 1805:
     kasan_save_stack mm/kasan/common.c:47 [inline]
     kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
     kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
     poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
     __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
     kasan_slab_free include/linux/kasan.h:184 [inline]
     slab_free_hook mm/slub.c:2190 [inline]
     slab_free mm/slub.c:4393 [inline]
     kmem_cache_free+0x145/0x340 mm/slub.c:4468
     sk_prot_free net/core/sock.c:2114 [inline]
     __sk_destruct+0x467/0x5f0 net/core/sock.c:2208
     sock_put include/net/sock.h:1948 [inline]
     unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665
     unix_release+0x91/0xc0 net/unix/af_unix.c:1049
     __sock_release net/socket.c:659 [inline]
     sock_close+0xbc/0x240 net/socket.c:1421
     __fput+0x406/0x8b0 fs/file_table.c:422
     delayed_fput+0x59/0x80 fs/file_table.c:445
     process_one_work kernel/workqueue.c:3218 [inline]
     process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
     worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
     kthread+0x2f0/0x390 kernel/kthread.c:389
     ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
    The buggy address belongs to the object at ffff888079c6e000
     which belongs to the cache UNIX of size 1920
    The buggy address is located 1600 bytes inside of
     freed 1920-byte region [ffff888079c6e000, ffff888079c6e780)
    
    Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593
    Fixes: 77e5593aebba ("af_unix: Skip GC if no cycle exists.")
    Fixes: fd86344823b5 ("af_unix: Try not to hold unix_gc_lock during accept().")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS [+ + +]

Author: Michal Luczaj <mhal@rbox.co>
Date:   Wed May 21 16:27:25 2025 +0100

    af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
    
    commit 041933a1ec7b4173a8e638cae4f8e394331d7e54 upstream.
    
    GC attempts to explicitly drop oob_skb's reference before purging the hit
    list.
    
    The problem is with embryos: kfree_skb(u->oob_skb) is never called on an
    embryo socket.
    
    The python script below [0] sends a listener's fd to its embryo as OOB
    data.  While GC does collect the embryo's queue, it fails to drop the OOB
    skb's refcount.  The skb which was in embryo's receive queue stays as
    unix_sk(sk)->oob_skb and keeps the listener's refcount [1].
    
    Tell GC to dispose embryo's oob_skb.
    
    [0]:
    from array import array
    from socket import *
    
    addr = '\x00unix-oob'
    lis = socket(AF_UNIX, SOCK_STREAM)
    lis.bind(addr)
    lis.listen(1)
    
    s = socket(AF_UNIX, SOCK_STREAM)
    s.connect(addr)
    scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()]))
    s.sendmsg([b'x'], [scm], MSG_OOB)
    lis.close()
    
    [1]
    $ grep unix-oob /proc/net/unix
    $ ./unix-oob.py
    $ grep unix-oob /proc/net/unix
    0000000000000000: 00000002 00000000 00000000 0001 02     0 @unix-oob
    0000000000000000: 00000002 00000000 00010000 0001 01  6072 @unix-oob
    
    Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Fix uninit-value in __unix_walk_scc() [+ + +]

Author: Shigeru Yoshida <syoshida@redhat.com>
Date:   Wed May 21 16:27:26 2025 +0100

    af_unix: Fix uninit-value in __unix_walk_scc()
    
    commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5 upstream.
    
    KMSAN reported uninit-value access in __unix_walk_scc() [1].
    
    In the list_for_each_entry_reverse() loop, when the vertex's index
    equals it's scc_index, the loop uses the variable vertex as a
    temporary variable that points to a vertex in scc. And when the loop
    is finished, the variable vertex points to the list head, in this case
    scc, which is a local variable on the stack (more precisely, it's not
    even scc and might underflow the call stack of __unix_walk_scc():
    container_of(&scc, struct unix_vertex, scc_entry)).
    
    However, the variable vertex is used under the label prev_vertex. So
    if the edge_stack is not empty and the function jumps to the
    prev_vertex label, the function will access invalid data on the
    stack. This causes the uninit-value access issue.
    
    Fix this by introducing a new temporary variable for the loop.
    
    [1]
    BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
    BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
    BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
     __unix_walk_scc net/unix/garbage.c:478 [inline]
     unix_walk_scc net/unix/garbage.c:526 [inline]
     __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
     process_one_work kernel/workqueue.c:3231 [inline]
     process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
     worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
     kthread+0x3c4/0x530 kernel/kthread.c:389
     ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
    Uninit was stored to memory at:
     unix_walk_scc net/unix/garbage.c:526 [inline]
     __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
     process_one_work kernel/workqueue.c:3231 [inline]
     process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
     worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
     kthread+0x3c4/0x530 kernel/kthread.c:389
     ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
    Local variable entries created at:
     ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
     netdev_tracker_free include/linux/netdevice.h:4058 [inline]
     netdev_put include/linux/netdevice.h:4075 [inline]
     dev_put include/linux/netdevice.h:4101 [inline]
     update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813
    
    CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
    Workqueue: events_unbound __unix_gc
    
    Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.")
    Reported-by: syzkaller <syzkaller@googlegroups.com>
    Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Fix up unix_edge.successor for embryo socket. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:14 2025 +0100

    af_unix: Fix up unix_edge.successor for embryo socket.
    
    commit dcf70df2048d27c5d186f013f101a4aefd63aa41 upstream.
    
    To garbage collect inflight AF_UNIX sockets, we must define the
    cyclic reference appropriately.  This is a bit tricky if the loop
    consists of embryo sockets.
    
    Suppose that the fd of AF_UNIX socket A is passed to D and the fd B
    to C and that C and D are embryo sockets of A and B, respectively.
    It may appear that there are two separate graphs, A (-> D) and
    B (-> C), but this is not correct.
    
         A --. .-- B
              X
         C <-' `-> D
    
    Now, D holds A's refcount, and C has B's refcount, so unix_release()
    will never be called for A and B when we close() them.  However, no
    one can call close() for D and C to free skbs holding refcounts of A
    and B because C/D is in A/B's receive queue, which should have been
    purged by unix_release() for A and B.
    
    So, here's another type of cyclic reference.  When a fd of an AF_UNIX
    socket is passed to an embryo socket, the reference is indirectly held
    by its parent listening socket.
    
      .-> A                            .-> B
      |   `- sk_receive_queue          |   `- sk_receive_queue
      |      `- skb                    |      `- skb
      |         `- sk == C             |         `- sk == D
      |            `- sk_receive_queue |           `- sk_receive_queue
      |               `- skb +---------'               `- skb +-.
      |                                                         |
      `---------------------------------------------------------'
    
    Technically, the graph must be denoted as A <-> B instead of A (-> D)
    and B (-> C) to find such a cyclic reference without touching each
    socket's receive queue.
    
      .-> A --. .-- B <-.
      |        X        |  ==  A <-> B
      `-- C <-' `-> D --'
    
    We apply this fixup during GC by fetching the real successor by
    unix_edge_successor().
    
    When we call accept(), we clear unix_sock.listener under unix_gc_lock
    not to confuse GC.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-9-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Iterate all vertices by DFS. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:11 2025 +0100

    af_unix: Iterate all vertices by DFS.
    
    commit 6ba76fd2848e107594ea4f03b737230f74bc23ea upstream.
    
    The new GC will use a depth first search graph algorithm to find
    cyclic references.  The algorithm visits every vertex exactly once.
    
    Here, we implement the DFS part without recursion so that no one
    can abuse it.
    
    unix_walk_scc() marks every vertex unvisited by initialising index
    as UNIX_VERTEX_INDEX_UNVISITED and iterates inflight vertices in
    unix_unvisited_vertices and call __unix_walk_scc() to start DFS from
    an arbitrary vertex.
    
    __unix_walk_scc() iterates all edges starting from the vertex and
    explores the neighbour vertices with DFS using edge_stack.
    
    After visiting all neighbours, __unix_walk_scc() moves the visited
    vertex to unix_visited_vertices so that unix_walk_scc() will not
    restart DFS from the visited vertex.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-6-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Kconfig: make CONFIG_UNIX bool [+ + +]

Author: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Date:   Wed May 21 16:27:00 2025 +0100

    af_unix: Kconfig: make CONFIG_UNIX bool
    
    commit 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b upstream.
    
    Let's make CONFIG_UNIX a bool instead of a tristate.
    We've decided to do that during discussion about SCM_PIDFD patchset [1].
    
    [1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/
    
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Cc: Leon Romanovsky <leon@kernel.org>
    Cc: David Ahern <dsahern@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
    Cc: Lennart Poettering <mzxreary@0pointer.de>
    Cc: Luca Boccassi <bluca@debian.org>
    Cc: linux-kernel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Link struct unix_edge when queuing skb. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:09 2025 +0100

    af_unix: Link struct unix_edge when queuing skb.
    
    commit 42f298c06b30bfe0a8cbee5d38644e618699e26e upstream.
    
    Just before queuing skb with inflight fds, we call scm_stat_add(),
    which is a good place to set up the preallocated struct unix_vertex
    and struct unix_edge in UNIXCB(skb).fp.
    
    Then, we call unix_add_edges() and construct the directed graph
    as follows:
    
      1. Set the inflight socket's unix_sock to unix_edge.predecessor.
      2. Set the receiver's unix_sock to unix_edge.successor.
      3. Set the preallocated vertex to inflight socket's unix_sock.vertex.
      4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices.
      5. Link unix_edge.vertex_entry to the inflight socket's unix_vertex.edges.
    
    Let's say we pass the fd of AF_UNIX socket A to B and the fd of B
    to C.  The graph looks like this:
    
      +-------------------------+
      | unix_unvisited_vertices | <-------------------------.
      +-------------------------+                           |
      +                                                     |
      |     +--------------+             +--------------+   |         +--------------+
      |     |  unix_sock A | <---. .---> |  unix_sock B | <-|-. .---> |  unix_sock C |
      |     +--------------+     | |     +--------------+   | | |     +--------------+
      | .-+ |    vertex    |     | | .-+ |    vertex    |   | | |     |    vertex    |
      | |   +--------------+     | | |   +--------------+   | | |     +--------------+
      | |                        | | |                      | | |
      | |   +--------------+     | | |   +--------------+   | | |
      | '-> |  unix_vertex |     | | '-> |  unix_vertex |   | | |
      |     +--------------+     | |     +--------------+   | | |
      `---> |    entry     | +---------> |    entry     | +-' | |
            |--------------|     | |     |--------------|     | |
            |    edges     | <-. | |     |    edges     | <-. | |
            +--------------+   | | |     +--------------+   | | |
                               | | |                        | | |
        .----------------------' | | .----------------------' | |
        |                        | | |                        | |
        |   +--------------+     | | |   +--------------+     | |
        |   |   unix_edge  |     | | |   |   unix_edge  |     | |
        |   +--------------+     | | |   +--------------+     | |
        `-> | vertex_entry |     | | `-> | vertex_entry |     | |
            |--------------|     | |     |--------------|     | |
            |  predecessor | +---' |     |  predecessor | +---' |
            |--------------|       |     |--------------|       |
            |   successor  | +-----'     |   successor  | +-----'
            +--------------+             +--------------+
    
    Henceforth, we denote such a graph as A -> B (-> C).
    
    Now, we can express all inflight fd graphs that do not contain
    embryo sockets.  We will support the particular case later.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-4-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Remove CONFIG_UNIX_SCM. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:06 2025 +0100

    af_unix: Remove CONFIG_UNIX_SCM.
    
    commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83 upstream.
    
    Originally, the code related to garbage collection was all in garbage.c.
    
    Commit f4e65870e5ce ("net: split out functions related to registering
    inflight socket files") moved some functions to scm.c for io_uring and
    added CONFIG_UNIX_SCM just in case AF_UNIX was built as module.
    
    However, since commit 97154bcf4d1b ("af_unix: Kconfig: make CONFIG_UNIX
    bool"), AF_UNIX is no longer built separately.  Also, io_uring does not
    support SCM_RIGHTS now.
    
    Let's move the functions back to garbage.c
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Jens Axboe <axboe@kernel.dk>
    Link: https://lore.kernel.org/r/20240129190435.57228-4-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Remove io_uring code for GC. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:05 2025 +0100

    af_unix: Remove io_uring code for GC.
    
    commit 11498715f266a3fb4caabba9dd575636cbcaa8f1 upstream.
    
    Since commit 705318a99a13 ("io_uring/af_unix: disable sending
    io_uring over sockets"), io_uring's unix socket cannot be passed
    via SCM_RIGHTS, so it does not contribute to cyclic reference and
    no longer be candidate for garbage collection.
    
    Also, commit 6e5e6d274956 ("io_uring: drop any code related to
    SCM_RIGHTS") cleaned up SCM_RIGHTS code in io_uring.
    
    Let's do it in AF_UNIX as well by reverting commit 0091bfc81741
    ("io_uring/af_unix: defer registered files gc to io_uring release")
    and commit 10369080454d ("net: reclaim skb->scm_io_uring bit").
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Jens Axboe <axboe@kernel.dk>
    Link: https://lore.kernel.org/r/20240129190435.57228-3-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Remove lock dance in unix_peek_fds(). [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:21 2025 +0100

    af_unix: Remove lock dance in unix_peek_fds().
    
    commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e upstream.
    
    In the previous GC implementation, the shape of the inflight socket
    graph was not expected to change while GC was in progress.
    
    MSG_PEEK was tricky because it could install inflight fd silently
    and transform the graph.
    
    Let's say we peeked a fd, which was a listening socket, and accept()ed
    some embryo sockets from it.  The garbage collection algorithm would
    have been confused because the set of sockets visited in scan_inflight()
    would change within the same GC invocation.
    
    That's why we placed spin_lock(&unix_gc_lock) and spin_unlock() in
    unix_peek_fds() with a fat comment.
    
    In the new GC implementation, we no longer garbage-collect the socket
    if it exists in another queue, that is, if it has a bridge to another
    SCC.  Also, accept() will require the lock if it has edges.
    
    Thus, we need not do the complicated lock dance.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Replace BUG_ON() with WARN_ON_ONCE(). [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:04 2025 +0100

    af_unix: Replace BUG_ON() with WARN_ON_ONCE().
    
    commit d0f6dc26346863e1f4a23117f5468614e54df064 upstream.
    
    This is a prep patch for the last patch in this series so that
    checkpatch will not warn about BUG_ON().
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Jens Axboe <axboe@kernel.dk>
    Link: https://lore.kernel.org/r/20240129190435.57228-2-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Replace garbage collection algorithm. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:20 2025 +0100

    af_unix: Replace garbage collection algorithm.
    
    commit 4090fa373f0e763c43610853d2774b5979915959 upstream.
    
    If we find a dead SCC during iteration, we call unix_collect_skb()
    to splice all skb in the SCC to the global sk_buff_head, hitlist.
    
    After iterating all SCC, we unlock unix_gc_lock and purge the queue.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-15-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Return struct unix_sock from unix_get_socket(). [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:01 2025 +0100

    af_unix: Return struct unix_sock from unix_get_socket().
    
    commit 5b17307bd0789edea0675d524a2b277b93bbde62 upstream.
    
    Currently, unix_get_socket() returns struct sock, but after calling
    it, we always cast it to unix_sk().
    
    Let's return struct unix_sock from unix_get_socket().
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Pavel Begunkov <asml.silence@gmail.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Run GC on only one CPU. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:02 2025 +0100

    af_unix: Run GC on only one CPU.
    
    commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec upstream.
    
    If more than 16000 inflight AF_UNIX sockets exist and the garbage
    collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
    Also, they wait for unix_gc() to complete.
    
    In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
    and more if they are the GC candidate.  Thus, sendmsg() significantly
    slows down with too many inflight AF_UNIX sockets.
    
    There is a small window to invoke multiple unix_gc() instances, which
    will then be blocked by the same spinlock except for one.
    
    Let's convert unix_gc() to use struct work so that it will not consume
    CPUs unnecessarily.
    
    Note WRITE_ONCE(gc_in_progress, true) is moved before running GC.
    If we leave the WRITE_ONCE() as is and use the following test to
    call flush_work(), a process might not call it.
    
        CPU 0                                     CPU 1
        ---                                       ---
                                                  start work and call __unix_gc()
        if (work_pending(&unix_gc_work) ||        <-- false
            READ_ONCE(gc_in_progress))            <-- false
                flush_work();                     <-- missed!
                                                  WRITE_ONCE(gc_in_progress, true)
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240123170856.41348-5-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Save listener for embryo socket. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:13 2025 +0100

    af_unix: Save listener for embryo socket.
    
    commit aed6ecef55d70de3762ce41c561b7f547dbaf107 upstream.
    
    This is a prep patch for the following change, where we need to
    fetch the listening socket from the successor embryo socket
    during GC.
    
    We add a new field to struct unix_sock to save a pointer to a
    listening socket.
    
    We set it when connect() creates a new socket, and clear it when
    accept() is called.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-8-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Save O(n) setup of Tarjan's algo. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:15 2025 +0100

    af_unix: Save O(n) setup of Tarjan's algo.
    
    commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a upstream.
    
    Before starting Tarjan's algorithm, we need to mark all vertices
    as unvisited.  We can save this O(n) setup by reserving two special
    indices (0, 1) and using two variables.
    
    The first time we link a vertex to unix_unvisited_vertices, we set
    unix_vertex_unvisited_index to index.
    
    During DFS, we can see that the index of unvisited vertices is the
    same as unix_vertex_unvisited_index.
    
    When we finalise SCC later, we set unix_vertex_grouped_index to each
    vertex's index.
    
    Then, we can know (i) that the vertex is on the stack if the index
    of a visited vertex is >= 2 and (ii) that it is not on the stack and
    belongs to a different SCC if the index is unix_vertex_grouped_index.
    
    After the whole algorithm, all indices of vertices are set as
    unix_vertex_grouped_index.
    
    Next time we start DFS, we know that all unvisited vertices have
    unix_vertex_grouped_index, and we can use unix_vertex_unvisited_index
    as the not-on-stack marker.
    
    To use the same variable in __unix_walk_scc(), we can swap
    unix_vertex_(grouped|unvisited)_index at the end of Tarjan's
    algorithm.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-10-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Skip GC if no cycle exists. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:16 2025 +0100

    af_unix: Skip GC if no cycle exists.
    
    commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8 upstream.
    
    We do not need to run GC if there is no possible cyclic reference.
    We use unix_graph_maybe_cyclic to decide if we should run GC.
    
    If a fd of an AF_UNIX socket is passed to an already inflight AF_UNIX
    socket, they could form a cyclic reference.  Then, we set true to
    unix_graph_maybe_cyclic and later run Tarjan's algorithm to group
    them into SCC.
    
    Once we run Tarjan's algorithm, we are 100% sure whether cyclic
    references exist or not.  If there is no cycle, we set false to
    unix_graph_maybe_cyclic and can skip the entire garbage collection
    next time.
    
    When finalising SCC, we set true to unix_graph_maybe_cyclic if SCC
    consists of multiple vertices.
    
    Even if SCC is a single vertex, a cycle might exist as self-fd passing.
    Given the corner case is rare, we detect it by checking all edges of
    the vertex and set true to unix_graph_maybe_cyclic.
    
    With this change, __unix_gc() is just a spin_lock() dance in the normal
    usage.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20240325202425.60930-11-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Try not to hold unix_gc_lock during accept(). [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:22 2025 +0100

    af_unix: Try not to hold unix_gc_lock during accept().
    
    commit fd86344823b521149bb31d91eba900ba3525efa6 upstream.
    
    Commit dcf70df2048d ("af_unix: Fix up unix_edge.successor for embryo
    socket.") added spin_lock(&unix_gc_lock) in accept() path, and it
    caused regression in a stress test as reported by kernel test robot.
    
    If the embryo socket is not part of the inflight graph, we need not
    hold the lock.
    
    To decide that in O(1) time and avoid the regression in the normal
    use case,
    
      1. add a new stat unix_sk(sk)->scm_stat.nr_unix_fds
    
      2. count the number of inflight AF_UNIX sockets in the receive
         queue under unix_state_lock()
    
      3. move unix_update_edges() call under unix_state_lock()
    
      4. avoid locking if nr_unix_fds is 0 in unix_update_edges()
    
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202404101427.92a08551-oliver.sang@intel.com
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240413021928.20946-1-kuniyu@amazon.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

af_unix: Try to run GC async. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed May 21 16:27:03 2025 +0100

    af_unix: Try to run GC async.
    
    commit d9f21b3613337b55cc9d4a6ead484dca68475143 upstream.
    
    If more than 16000 inflight AF_UNIX sockets exist and the garbage
    collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
    Also, they wait for unix_gc() to complete.
    
    In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
    and more if they are the GC candidate.  Thus, sendmsg() significantly
    slows down with too many inflight AF_UNIX sockets.
    
    However, if a process sends data with no AF_UNIX FD, the sendmsg() call
    does not need to wait for GC.  After this change, only the process that
    meets the condition below will be blocked under such a situation.
    
      1) cmsg contains AF_UNIX socket
      2) more than 32 AF_UNIX sent by the same user are still inflight
    
    Note that even a sendmsg() call that does not meet the condition but has
    AF_UNIX FD will be blocked later in unix_scm_to_skb() by the spinlock,
    but we allow that as a bonus for sane users.
    
    The results below are the time spent in unix_dgram_sendmsg() sending 1
    byte of data with no FD 4096 times on a host where 32K inflight AF_UNIX
    sockets exist.
    
    Without series: the sane sendmsg() needs to wait gc unreasonably.
    
      $ sudo /usr/share/bcc/tools/funclatency -p 11165 unix_dgram_sendmsg
      Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
      ^C
           nsecs               : count     distribution
      [...]
          524288 -> 1048575    : 0        |                                        |
         1048576 -> 2097151    : 3881     |****************************************|
         2097152 -> 4194303    : 214      |**                                      |
         4194304 -> 8388607    : 1        |                                        |
    
      avg = 1825567 nsecs, total: 7477526027 nsecs, count: 4096
    
    With series: the sane sendmsg() can finish much faster.
    
      $ sudo /usr/share/bcc/tools/funclatency -p 8702  unix_dgram_sendmsg
      Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
      ^C
           nsecs               : count     distribution
      [...]
             128 -> 255        : 0        |                                        |
             256 -> 511        : 4092     |****************************************|
             512 -> 1023       : 2        |                                        |
            1024 -> 2047       : 0        |                                        |
            2048 -> 4095       : 0        |                                        |
            4096 -> 8191       : 1        |                                        |
            8192 -> 16383      : 1        |                                        |
    
      avg = 410 nsecs, total: 1680510 nsecs, count: 4096
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Add quirk for HP Spectre x360 15-df1xxx [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Sun Apr 27 10:10:34 2025 +0200

    ALSA: hda/realtek: Add quirk for HP Spectre x360 15-df1xxx
    
    [ Upstream commit be0c40da888840fe91b45474cb70779e6cbaf7ca ]
    
    HP Spectre x360 15-df1xxx with SSID 13c:863e requires similar
    workarounds that were applied to another HP Spectre x360 models;
    it has a mute LED only, no micmute LEDs, and needs the speaker GPIO
    seup.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=220054
    Link: https://patch.msgid.link/20250427081035.11567-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10 [+ + +]

Author: Ed Burcher <git@edburcher.com>
Date:   Mon May 19 23:49:07 2025 +0100

    ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10
    
    commit 8d70503068510e6080c2c649cccb154f16de26c9 upstream.
    
    Lenovo Yoga Pro 7 (gen 10) with Realtek ALC3306 and combined CS35L56
    amplifiers need quirk ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN to
    enable bass
    
    Signed-off-by: Ed Burcher <git@edburcher.com>
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/20250519224907.31265-2-git@edburcher.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Enable PC beep passthrough for HP EliteBook 855 G7 [+ + +]

Author: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Date:   Sun Feb 16 22:31:03 2025 +0100

    ALSA: hda/realtek: Enable PC beep passthrough for HP EliteBook 855 G7
    
    [ Upstream commit aa85822c611aef7cd4dc17d27121d43e21bb82f0 ]
    
    PC speaker works well on this platform in BIOS and in Linux until sound
    card drivers are loaded. Then it stops working.
    
    There seems to be a beep generator node at 0x1a in this CODEC
    (ALC269_TYPE_ALC215) but it seems to be only connected to capture mixers
    at nodes 0x22 and 0x23.
    If I unmute the mixer input for 0x1a at node 0x23 and start recording
    from its "ALC285 Analog" capture device I can clearly hear beeps in that
    recording.
    
    So the beep generator is indeed working properly, however I wasn't able to
    figure out any way to connect it to speakers.
    
    However, the bits in the "Passthrough Control" register (0x36) seems to
    work at least partially: by zeroing "B" and "h" and setting "S" I can at
    least make the PIT PC speaker output appear either in this laptop speakers
    or headphones (depending on whether they are connected or not).
    
    There are some caveats, however:
    * If the CODEC gets runtime-suspended the beeps stop so it needs HDA beep
    device for keeping it awake during beeping.
    
    * If the beep generator node is generating any beep the PC beep passthrough
    seems to be temporarily inhibited, so the HDA beep device has to be
    prevented from using the actual beep generator node - but the beep device
    is still necessary due to the previous point.
    
    * In contrast with other platforms here beep amplification has to be
    disabled otherwise the beeps output are WAY louder than they were on pure
    BIOS setup.
    
    Unless someone (from Realtek probably) knows how to make the beep generator
    node output appear in speakers / headphones using PC beep passthrough seems
    to be the only way to make PC speaker beeping actually work on this
    platform.
    
    Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
    Acked-by: kailang@realtek.com
    Link: https://patch.msgid.link/7461f695b4daed80f2fc4b1463ead47f04f9ad05.1739741254.git.mail@maciej.szmigiero.name
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: pcm: Fix race of buffer access at PCM OSS layer [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Fri May 16 10:08:16 2025 +0200

    ALSA: pcm: Fix race of buffer access at PCM OSS layer
    
    commit 93a81ca0657758b607c3f4ba889ae806be9beb73 upstream.
    
    The PCM OSS layer tries to clear the buffer with the silence data at
    initialization (or reconfiguration) of a stream with the explicit call
    of snd_pcm_format_set_silence() with runtime->dma_area.  But this may
    lead to a UAF because the accessed runtime->dma_area might be freed
    concurrently, as it's performed outside the PCM ops.
    
    For avoiding it, move the code into the PCM core and perform it inside
    the buffer access lock, so that it won't be changed during the
    operation.
    
    Reported-by: syzbot+32d4647f551007595173@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/68164d8e.050a0220.11da1b.0019.GAE@google.com
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/20250516080817.20068-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: seq: Improve data consistency at polling [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Fri Mar 7 09:42:42 2025 +0100

    ALSA: seq: Improve data consistency at polling
    
    [ Upstream commit e3cd33ab17c33bd8f1a9df66ec83a15dd8f7afbb ]
    
    snd_seq_poll() calls snd_seq_write_pool_allocated() that reads out a
    field in client->pool object, while it can be updated concurrently via
    ioctls, as reported by syzbot.  The data race itself is harmless, as
    it's merely a poll() call, and the state is volatile.  OTOH, the read
    out of poll object info from the caller side is fragile, and we can
    leave it better in snd_seq_pool_poll_wait() alone.
    
    A similar pattern is seen in snd_seq_kernel_client_write_poll(), too,
    which is called from the OSS sequencer.
    
    This patch drops the pool checks from the caller side and add the
    pool->lock in snd_seq_pool_poll_wait() for better data consistency.
    
    Reported-by: syzbot+2d373c9936c00d7e120c@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/67c88903.050a0220.15b4b9.0028.GAE@google.com
    Link: https://patch.msgid.link/20250307084246.29271-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

arch/powerpc/perf: Check the instruction type before creating sample with perf_mem_data_src [+ + +]

Author: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Date:   Tue Jan 21 18:46:20 2025 +0530

    arch/powerpc/perf: Check the instruction type before creating sample with perf_mem_data_src
    
    [ Upstream commit 2ffb26afa64261139e608bf087a0c1fe24d76d4d ]
    
    perf mem report aborts as below sometimes (during some corner
    case) in powerpc:
    
       # ./perf mem report 1>out
       *** stack smashing detected ***: terminated
       Aborted (core dumped)
    
    The backtrace is as below:
       __pthread_kill_implementation ()
       raise ()
       abort ()
       __libc_message
       __fortify_fail
       __stack_chk_fail
       hist_entry.lvl_snprintf
       __sort__hpp_entry
       __hist_entry__snprintf
       hists.fprintf
       cmd_report
       cmd_mem
    
    Snippet of code which triggers the issue
    from tools/perf/util/sort.c
    
       static int hist_entry__lvl_snprintf(struct hist_entry *he, char *bf,
                                        size_t size, unsigned int width)
       {
            char out[64];
    
            perf_mem__lvl_scnprintf(out, sizeof(out), he->mem_info);
            return repsep_snprintf(bf, size, "%-*s", width, out);
       }
    
    The value of "out" is filled from perf_mem_data_src value.
    Debugging this further showed that for some corner cases, the
    value of "data_src" was pointing to wrong value. This resulted
    in bigger size of string and causing stack check fail.
    
    The perf mem data source values are captured in the sample via
    isa207_get_mem_data_src function. The initial check is to fetch
    the type of sampled instruction. If the type of instruction is
    not valid (not a load/store instruction), the function returns.
    
    Since 'commit e16fd7f2cb1a ("perf: Use sample_flags for data_src")',
    data_src field is not initialized by the perf_sample_data_init()
    function. If the PMU driver doesn't set the data_src value to zero if
    type is not valid, this will result in uninitailised value for data_src.
    The uninitailised value of data_src resulted in stack check fail
    followed by abort for "perf mem report".
    
    When requesting for data source information in the sample, the
    instruction type is expected to be load or store instruction.
    In ISA v3.0, due to hardware limitation, there are corner cases
    where the instruction type other than load or store is observed.
    In ISA v3.0 and before values "0" and "7" are considered reserved.
    In ISA v3.1, value "7" has been used to indicate "larx/stcx".
    Drop the sample if instruction type has reserved values for this
    field with a ISA version check. Initialize data_src to zero in
    isa207_get_mem_data_src if the instruction type is not load/store.
    
    Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
    Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
    Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
    Link: https://patch.msgid.link/20250121131621.39054-1-atrajeev@linux.vnet.ibm.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64/mm: Check PUD_TYPE_TABLE in pud_bad() [+ + +]

Author: Ryan Roberts <ryan.roberts@arm.com>
Date:   Fri Feb 21 10:12:25 2025 +0530

    arm64/mm: Check PUD_TYPE_TABLE in pud_bad()
    
    [ Upstream commit bfb1d2b9021c21891427acc86eb848ccedeb274e ]
    
    pud_bad() is currently defined in terms of pud_table(). Although for some
    configs, pud_table() is hard-coded to true i.e. when using 64K base pages
    or when page table levels are less than 3.
    
    pud_bad() is intended to check that the pud is configured correctly. Hence
    let's open-code the same check that the full version of pud_table() uses
    into pud_bad(). Then it always performs the check regardless of the config.
    
    Cc: Will Deacon <will@kernel.org>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Link: https://lore.kernel.org/r/20250221044227.1145393-7-anshuman.khandual@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: Add support for HIP09 Spectre-BHB mitigation [+ + +]

Author: Jinqian Yang <yangjinqian1@huawei.com>
Date:   Tue Mar 25 22:19:00 2025 +0800

    arm64: Add support for HIP09 Spectre-BHB mitigation
    
    [ Upstream commit e18c09b204e81702ea63b9f1a81ab003b72e3174 ]
    
    The HIP09 processor is vulnerable to the Spectre-BHB (Branch History
    Buffer) attack, which can be exploited to leak information through
    branch prediction side channels. This commit adds the MIDR of HIP09
    to the list for software mitigation.
    
    Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
    Link: https://lore.kernel.org/r/20250325141900.2057314-1-yangjinqian1@huawei.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: qcom: sm8350: Fix typo in pil_camera_mem node [+ + +]

Author: Alok Tiwari <alok.a.tiwari@oracle.com>
Date:   Wed May 14 04:46:51 2025 -0700

    arm64: dts: qcom: sm8350: Fix typo in pil_camera_mem node
    
    commit 295217420a44403a33c30f99d8337fe7b07eb02b upstream.
    
    There is a typo in sm8350.dts where the node label
    mmeory@85200000 should be memory@85200000.
    This patch corrects the typo for clarity and consistency.
    
    Fixes: b7e8f433a673 ("arm64: dts: qcom: Add basic devicetree support for SM8350 SoC")
    Cc: stable@vger.kernel.org
    Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
    Link: https://lore.kernel.org/r/20250514114656.2307828-1-alok.a.tiwari@oracle.com
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: tegra: p2597: Fix gpio for vdd-1v8-dis regulator [+ + +]

Author: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Date:   Mon Feb 24 12:17:36 2025 +0000

    arm64: tegra: p2597: Fix gpio for vdd-1v8-dis regulator
    
    [ Upstream commit f34621f31e3be81456c903287f7e4c0609829e29 ]
    
    According to the board schematics the enable pin of this regulator is
    connected to gpio line #9 of the first instance of the TCA9539
    GPIO expander, so adjust it.
    
    Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
    Link: https://lore.kernel.org/r/20250224-diogo-gpio_exp-v1-1-80fb84ac48c6@tecnico.ulisboa.pt
    Signed-off-by: Thierry Reding <treding@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: at91: pm: fix at91_suspend_finish for ZQ calibration [+ + +]

Author: Li Bin <bin.li@microchip.com>
Date:   Thu Feb 27 08:51:56 2025 -0700

    ARM: at91: pm: fix at91_suspend_finish for ZQ calibration
    
    [ Upstream commit bc4722c3598d0e2c2dbf9609a3d3198993093e2b ]
    
    For sama7g5 and sama7d65 backup mode, we encountered a "ZQ calibrate error"
    during recalibrating the impedance in BootStrap.
    We found that the impedance value saved in at91_suspend_finish() before
    the DDR entered self-refresh mode did not match the resistor values. The
    ZDATA field in the DDR3PHY_ZQ0CR0 register uses a modified gray code to
    select the different impedance setting.
    But these gray code are incorrect, a workaournd from design team fixed the
    bug in the calibration logic. The ZDATA contains four independent impedance
    elements, but the algorithm combined the four elements into one. The elements
    were fixed using properly shifted offsets.
    
    Signed-off-by: Li Bin <bin.li@microchip.com>
    [nicolas.ferre@microchip.com: fix indentation and combine 2 patches]
    Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
    Tested-by: Ryan Wanner <Ryan.Wanner@microchip.com>
    Tested-by: Durai Manickam KR <durai.manickamkr@microchip.com>
    Tested-by: Andrei Simion <andrei.simion@microchip.com>
    Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
    Link: https://lore.kernel.org/r/28b33f9bcd0ca60ceba032969fe054d38f2b9577.1740671156.git.Ryan.Wanner@microchip.com
    Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: tegra: Switch DSI-B clock parent to PLLD on Tegra114 [+ + +]

Author: Svyatoslav Ryhel <clamor95@gmail.com>
Date:   Wed Feb 26 12:56:11 2025 +0200

    ARM: tegra: Switch DSI-B clock parent to PLLD on Tegra114
    
    [ Upstream commit 2b3db788f2f614b875b257cdb079adadedc060f3 ]
    
    PLLD is usually used as parent clock for internal video devices, like
    DSI for example, while PLLD2 is used as parent for HDMI.
    
    Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
    Link: https://lore.kernel.org/r/20250226105615.61087-3-clamor95@gmail.com
    Signed-off-by: Thierry Reding <treding@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: codecs: pcm3168a: Allow for 24-bit in provider mode [+ + +]

Author: Cezary Rojewski <cezary.rojewski@intel.com>
Date:   Mon Feb 3 15:10:43 2025 +0100

    ASoC: codecs: pcm3168a: Allow for 24-bit in provider mode
    
    [ Upstream commit 7d92a38d67e5d937b64b20aa4fd14451ee1772f3 ]
    
    As per codec device specification, 24-bit is allowed in provider mode.
    Update the code to reflect that.
    
    Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
    Link: https://patch.msgid.link/20250203141051.2361323-4-cezary.rojewski@intel.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: imx-card: Adjust over allocation of memory in imx_card_parse_of() [+ + +]

Author: Chenyuan Yang <chenyuan0y@gmail.com>
Date:   Sun Apr 6 16:08:54 2025 -0500

    ASoC: imx-card: Adjust over allocation of memory in imx_card_parse_of()
    
    [ Upstream commit a9a69c3b38c89d7992fb53db4abb19104b531d32 ]
    
    Incorrect types are used as sizeof() arguments in devm_kcalloc().
    It should be sizeof(dai_link_data) for link_data instead of
    sizeof(snd_soc_dai_link).
    
    This is found by our static analysis tool.
    
    Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
    Link: https://patch.msgid.link/20250406210854.149316-1-chenyuan0y@gmail.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: Intel: bytcr_rt5640: Add DMI quirk for Acer Aspire SW3-013 [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Sun Apr 20 10:56:59 2025 +0200

    ASoC: Intel: bytcr_rt5640: Add DMI quirk for Acer Aspire SW3-013
    
    [ Upstream commit a549b927ea3f5e50b1394209b64e6e17e31d4db8 ]
    
    Acer Aspire SW3-013 requires the very same quirk as other Acer Aspire
    model for making it working.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=220011
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Link: https://patch.msgid.link/20250420085716.12095-1-tiwai@suse.de
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: mediatek: mt6359: Add stub for mt6359_accdet_enable_jack_detect [+ + +]

Author: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Date:   Thu Mar 6 16:52:17 2025 -0300

    ASoC: mediatek: mt6359: Add stub for mt6359_accdet_enable_jack_detect
    
    [ Upstream commit 0116a7d84b32537a10d9bea1fd1bfc06577ef527 ]
    
    Add a stub for mt6359_accdet_enable_jack_detect() to prevent linker
    failures in the machine sound drivers calling it when
    CONFIG_SND_SOC_MT6359_ACCDET is not enabled.
    
    Suggested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
    Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
    Link: https://patch.msgid.link/20250306-mt8188-accdet-v3-3-7828e835ff4b@collabora.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: ops: Enforce platform maximum on initial value [+ + +]

Author: Martin Povišer <povik+lin@cutebit.org>
Date:   Sat Feb 8 00:57:22 2025 +0000

    ASoC: ops: Enforce platform maximum on initial value
    
    [ Upstream commit 783db6851c1821d8b983ffb12b99c279ff64f2ee ]
    
    Lower the volume if it is violating the platform maximum at its initial
    value (i.e. at the time of the 'snd_soc_limit_volume' call).
    
    Signed-off-by: Martin Povišer <povik+lin@cutebit.org>
    [Cherry picked from the Asahi kernel with fixups -- broonie]
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Link: https://patch.msgid.link/20250208-asoc-volume-limit-v1-1-b98fcf4cdbad@kernel.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: qcom: sm8250: explicitly set format in sm8250_be_hw_params_fixup() [+ + +]

Author: Alexey Klimov <alexey.klimov@linaro.org>
Date:   Fri Feb 28 16:14:30 2025 +0000

    ASoC: qcom: sm8250: explicitly set format in sm8250_be_hw_params_fixup()
    
    [ Upstream commit 89be3c15a58b2ccf31e969223c8ac93ca8932d81 ]
    
    Setting format to s16le is required for compressed playback on compatible
    soundcards.
    
    Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
    Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
    Link: https://patch.msgid.link/20250228161430.373961-1-alexey.klimov@linaro.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: soc-dai: check return value at snd_soc_dai_set_tdm_slot() [+ + +]

Author: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date:   Wed Feb 12 02:24:38 2025 +0000

    ASoC: soc-dai: check return value at snd_soc_dai_set_tdm_slot()
    
    [ Upstream commit 7f1186a8d738661b941b298fd6d1d5725ed71428 ]
    
    snd_soc_dai_set_tdm_slot() calls .xlate_tdm_slot_mask() or
    snd_soc_xlate_tdm_slot_mask(), but didn't check its return value.
    Let's check it.
    
    This patch might break existing driver. In such case, let's makes
    each func to void instead of int.
    
    Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
    Link: https://patch.msgid.link/87o6z7yk61.wl-kuninori.morimoto.gx@renesas.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: sun4i-codec: support hp-det-gpios property [+ + +]

Author: Ryan Walklin <ryan@testtoast.com>
Date:   Sat Feb 15 11:02:25 2025 +1300

    ASoC: sun4i-codec: support hp-det-gpios property
    
    [ Upstream commit a149377c033afe6557c50892ebbfc0e8b7e2e253 ]
    
    Add support for GPIO headphone detection with the hp-det-gpios
    property. In order for this to properly disable the path upon
    removal of headphones, the output must be labelled Headphone which
    is a common sink in the driver.
    
    Describe a headphone jack and detection GPIO in the driver, check for
    a corresponding device tree node, and enable jack detection in a new
    machine init function if described.
    
    Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
    Signed-off-by: Ryan Walklin <ryan@testtoast.com>
    
    --
    Changelog v1..v2:
    - Separate DAPM changes into separate patch and add rationale.
    
    Tested-by: Philippe Simons <simons.philippe@gmail.com>
    Link: https://patch.msgid.link/20250214220247.10810-4-ryan@testtoast.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: tas2764: Add reg defaults for TAS2764_INT_CLK_CFG [+ + +]

Author: Hector Martin <marcan@marcan.st>
Date:   Sat Feb 8 01:03:27 2025 +0000

    ASoC: tas2764: Add reg defaults for TAS2764_INT_CLK_CFG
    
    [ Upstream commit d64c4c3d1c578f98d70db1c5e2535b47adce9d07 ]
    
    Signed-off-by: Hector Martin <marcan@marcan.st>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Link: https://patch.msgid.link/20250208-asoc-tas2764-v1-4-dbab892a69b5@kernel.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: tas2764: Mark SW_RESET as volatile [+ + +]

Author: Hector Martin <marcan@marcan.st>
Date:   Sat Feb 8 01:03:26 2025 +0000

    ASoC: tas2764: Mark SW_RESET as volatile
    
    [ Upstream commit f37f1748564ac51d32f7588bd7bfc99913ccab8e ]
    
    Since the bit is self-clearing.
    
    Signed-off-by: Hector Martin <marcan@marcan.st>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Link: https://patch.msgid.link/20250208-asoc-tas2764-v1-3-dbab892a69b5@kernel.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: tas2764: Power up/down amp on mute ops [+ + +]

Author: Hector Martin <marcan@marcan.st>
Date:   Sat Feb 8 01:03:24 2025 +0000

    ASoC: tas2764: Power up/down amp on mute ops
    
    [ Upstream commit 1c3b5f37409682184669457a5bdf761268eafbe5 ]
    
    The ASoC convention is that clocks are removed after codec mute, and
    power up/down is more about top level power management. For these chips,
    the "mute" state still expects a TDM clock, and yanking the clock in
    this state will trigger clock errors. So, do the full
    shutdown<->mute<->active transition on the mute operation, so the amp is
    in software shutdown by the time the clocks are removed.
    
    This fixes TDM clock errors when streams are stopped.
    
    Signed-off-by: Hector Martin <marcan@marcan.st>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Link: https://patch.msgid.link/20250208-asoc-tas2764-v1-1-dbab892a69b5@kernel.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

auxdisplay: charlcd: Partially revert "Move hwidth and bwidth to struct hd44780_common" [+ + +]

Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Mon Feb 24 19:27:38 2025 +0200

    auxdisplay: charlcd: Partially revert "Move hwidth and bwidth to struct hd44780_common"
    
    [ Upstream commit 09965a142078080fe7807bab0f6f1890cb5987a4 ]
    
    Commit 2545c1c948a6 ("auxdisplay: Move hwidth and bwidth to struct
    hd44780_common") makes charlcd_alloc() argument-less effectively dropping
    the single allocation for the struct charlcd_priv object along with
    the driver specific one. Restore that behaviour here.
    
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: L2CAP: Fix not checking l2cap_chan security level [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Wed May 7 15:00:30 2025 -0400

    Bluetooth: L2CAP: Fix not checking l2cap_chan security level
    
    [ Upstream commit 7af8479d9eb4319b4ba7b47a8c4d2c55af1c31e1 ]
    
    l2cap_check_enc_key_size shall check the security level of the
    l2cap_chan rather than the hci_conn since for incoming connection
    request that may be different as hci_conn may already been
    encrypted using a different security level.
    
    Fixes: 522e9ed157e3 ("Bluetooth: l2cap: Check encryption key size on incoming connection")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: report duplicate MAC address in all situations [+ + +]

Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Tue Feb 25 03:39:14 2025 +0000

    bonding: report duplicate MAC address in all situations
    
    [ Upstream commit 28d68d396a1cd21591e8c6d74afbde33a7ea107e ]
    
    Normally, a bond uses the MAC address of the first added slave as the bond’s
    MAC address. And the bond will set active slave’s MAC address to bond’s
    address if fail_over_mac is set to none (0) or follow (2).
    
    When the first slave is removed, the bond will still use the removed slave’s
    MAC address, which can lead to a duplicate MAC address and potentially cause
    issues with the switch. To avoid confusion, let's warn the user in all
    situations, including when fail_over_mac is set to 2 or not in active-backup
    mode.
    
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://patch.msgid.link/20250225033914.18617-1-liuhangbin@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf: fix possible endless loop in BPF map iteration [+ + +]

Author: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
Date:   Thu Apr 24 11:32:51 2025 -0400

    bpf: fix possible endless loop in BPF map iteration
    
    [ Upstream commit 75673fda0c557ae26078177dd14d4857afbf128d ]
    
    The _safe variant used here gets the next element before running the callback,
    avoiding the endless loop condition.
    
    Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
    Link: https://lore.kernel.org/r/20250424153246.141677-2-brandon.kammerdiener@intel.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf: Return prog btf_id without capable check [+ + +]

Author: Mykyta Yatsenko <yatsenko@meta.com>
Date:   Mon Mar 17 17:40:37 2025 +0000

    bpf: Return prog btf_id without capable check
    
    [ Upstream commit 07651ccda9ff10a8ca427670cdd06ce2c8e4269c ]
    
    Return prog's btf_id from bpf_prog_get_info_by_fd regardless of capable
    check. This patch enables scenario, when freplace program, running
    from user namespace, requires to query target prog's btf.
    
    Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/bpf/20250317174039.161275-3-mykyta.yatsenko5@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bpftool: Fix readlink usage in get_fd_type [+ + +]

Author: Viktor Malik <vmalik@redhat.com>
Date:   Wed Jan 29 08:18:57 2025 +0100

    bpftool: Fix readlink usage in get_fd_type
    
    [ Upstream commit 0053f7d39d491b6138d7c526876d13885cbb65f1 ]
    
    The `readlink(path, buf, sizeof(buf))` call reads at most sizeof(buf)
    bytes and *does not* append null-terminator to buf. With respect to
    that, fix two pieces in get_fd_type:
    
    1. Change the truncation check to contain sizeof(buf) rather than
       sizeof(path).
    2. Append null-terminator to buf.
    
    Reported by Coverity.
    
    Signed-off-by: Viktor Malik <vmalik@redhat.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Quentin Monnet <qmo@kernel.org>
    Link: https://lore.kernel.org/bpf/20250129071857.75182-1-vmalik@redhat.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bridge: netfilter: Fix forwarding of fragmented packets [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu May 15 11:48:48 2025 +0300

    bridge: netfilter: Fix forwarding of fragmented packets
    
    [ Upstream commit 91b6dbced0ef1d680afdd69b14fc83d50ebafaf3 ]
    
    When netfilter defrag hooks are loaded (due to the presence of conntrack
    rules, for example), fragmented packets entering the bridge will be
    defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
    ipv4_conntrack_defrag()).
    
    Later on, in the bridge's post-routing hook, the defragged packet will
    be fragmented again. If the size of the largest fragment is larger than
    what the kernel has determined as the destination MTU (using
    ip_skb_dst_mtu()), the defragged packet will be dropped.
    
    Before commit ac6627a28dbf ("net: ipv4: Consolidate ipv4_mtu and
    ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
    the destination MTU. Assuming the dst entry attached to the packet is
    the bridge's fake rtable one, this would simply be the bridge's MTU (see
    fake_mtu()).
    
    However, after above mentioned commit, ip_skb_dst_mtu() ends up
    returning the route's MTU stored in the dst entry's metrics. Ideally, in
    case the dst entry is the bridge's fake rtable one, this should be the
    bridge's MTU as the bridge takes care of updating this metric when its
    MTU changes (see br_change_mtu()).
    
    Unfortunately, the last operation is a no-op given the metrics attached
    to the fake rtable entry are marked as read-only. Therefore,
    ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
    defragged packets are dropped during fragmentation when dealing with
    large fragments and high MTU (e.g., 9k).
    
    Fix by moving the fake rtable entry's metrics to be per-bridge (in a
    similar fashion to the fake rtable entry itself) and marking them as
    writable, thereby allowing MTU changes to be reflected.
    
    Fixes: 62fa8a846d7d ("net: Implement read-only protection and COW'ing of metrics.")
    Fixes: 33eb9873a283 ("bridge: initialize fake_rtable metrics")
    Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
    Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
    Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://patch.msgid.link/20250515084848.727706-1-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: avoid linker error in btrfs_find_create_tree_block() [+ + +]

Author: Mark Harmstone <maharmstone@fb.com>
Date:   Thu Mar 6 10:58:46 2025 +0000

    btrfs: avoid linker error in btrfs_find_create_tree_block()
    
    [ Upstream commit 7ef3cbf17d2734ca66c4ed8573be45f4e461e7ee ]
    
    The inline function btrfs_is_testing() is hardcoded to return 0 if
    CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set. Currently we're relying on
    the compiler optimizing out the call to alloc_test_extent_buffer() in
    btrfs_find_create_tree_block(), as it's not been defined (it's behind an
     #ifdef).
    
    Add a stub version of alloc_test_extent_buffer() to avoid linker errors
    on non-standard optimization levels. This problem was seen on GCC 14
    with -O0 and is helps to see symbols that would be otherwise optimized
    out.
    
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Mark Harmstone <maharmstone@fb.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: check folio mapping after unlock in relocate_one_folio() [+ + +]

Author: Boris Burkov <boris@bur.io>
Date:   Fri Dec 13 12:22:32 2024 -0800

    btrfs: check folio mapping after unlock in relocate_one_folio()
    
    commit 3e74859ee35edc33a022c3f3971df066ea0ca6b9 upstream.
    
    When we call btrfs_read_folio() to bring a folio uptodate, we unlock the
    folio. The result of that is that a different thread can modify the
    mapping (like remove it with invalidate) before we call folio_lock().
    This results in an invalid page and we need to try again.
    
    In particular, if we are relocating concurrently with aborting a
    transaction, this can result in a crash like the following:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP
      CPU: 76 PID: 1411631 Comm: kworker/u322:5
      Workqueue: events_unbound btrfs_reclaim_bgs_work
      RIP: 0010:set_page_extent_mapped+0x20/0xb0
      RSP: 0018:ffffc900516a7be8 EFLAGS: 00010246
      RAX: ffffea009e851d08 RBX: ffffea009e0b1880 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffc900516a7b90 RDI: ffffea009e0b1880
      RBP: 0000000003573000 R08: 0000000000000001 R09: ffff88c07fd2f3f0
      R10: 0000000000000000 R11: 0000194754b575be R12: 0000000003572000
      R13: 0000000003572fff R14: 0000000000100cca R15: 0000000005582fff
      FS:  0000000000000000(0000) GS:ffff88c07fd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000407d00f002 CR4: 00000000007706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
      <TASK>
      ? __die+0x78/0xc0
      ? page_fault_oops+0x2a8/0x3a0
      ? __switch_to+0x133/0x530
      ? wq_worker_running+0xa/0x40
      ? exc_page_fault+0x63/0x130
      ? asm_exc_page_fault+0x22/0x30
      ? set_page_extent_mapped+0x20/0xb0
      relocate_file_extent_cluster+0x1a7/0x940
      relocate_data_extent+0xaf/0x120
      relocate_block_group+0x20f/0x480
      btrfs_relocate_block_group+0x152/0x320
      btrfs_relocate_chunk+0x3d/0x120
      btrfs_reclaim_bgs_work+0x2ae/0x4e0
      process_scheduled_works+0x184/0x370
      worker_thread+0xc6/0x3e0
      ? blk_add_timer+0xb0/0xb0
      kthread+0xae/0xe0
      ? flush_tlb_kernel_range+0x90/0x90
      ret_from_fork+0x2f/0x40
      ? flush_tlb_kernel_range+0x90/0x90
      ret_from_fork_asm+0x11/0x20
      </TASK>
    
    This occurs because cleanup_one_transaction() calls
    destroy_delalloc_inodes() which calls invalidate_inode_pages2() which
    takes the folio_lock before setting mapping to NULL. We fail to check
    this, and subsequently call set_extent_mapping(), which assumes that
    mapping != NULL (in fact it asserts that in debug mode)
    
    Note that the "fixes" patch here is not the one that introduced the
    race (the very first iteration of this code from 2009) but a more recent
    change that made this particular crash happen in practice.
    
    Fixes: e7f1326cc24e ("btrfs: set page extent mapped after read_folio in relocate_one_page")
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Zhaoyang Li <lizy04@hust.edu.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref [+ + +]

Author: Goldwyn Rodrigues <rgoldwyn@suse.de>
Date:   Fri Apr 25 09:25:06 2025 -0400

    btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref
    
    [ Upstream commit bc7e0975093567f51be8e1bdf4aa5900a3cf0b1e ]
    
    btrfs_prelim_ref() calls the old and new reference variables in the
    incorrect order. This causes a NULL pointer dereference because oldref
    is passed as NULL to trace_btrfs_prelim_ref_insert().
    
    Note, trace_btrfs_prelim_ref_insert() is being called with newref as
    oldref (and oldref as NULL) on purpose in order to print out
    the values of newref.
    
    To reproduce:
    echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable
    
    Perform some writeback operations.
    
    Backtrace:
    BUG: kernel NULL pointer dereference, address: 0000000000000018
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0000) - not-present page
     PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0
     Oops: Oops: 0000 [#1] SMP NOPTI
     CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ #47 PREEMPT(voluntary)  7ca2cef72d5e9c600f0c7718adb6462de8149622
     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
     RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130
     Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88
     RSP: 0018:ffffce44820077a0 EFLAGS: 00010286
     RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b
     RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010
     RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010
     R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000
     R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540
     FS:  00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0
     PKRU: 55555554
     Call Trace:
      <TASK>
      prelim_ref_insert+0x1c1/0x270
      find_parent_nodes+0x12a6/0x1ee0
      ? __entry_text_end+0x101f06/0x101f09
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? srso_alias_return_thunk+0x5/0xfbef5
      btrfs_is_data_extent_shared+0x167/0x640
      ? fiemap_process_hole+0xd0/0x2c0
      extent_fiemap+0xa5c/0xbc0
      ? __entry_text_end+0x101f05/0x101f09
      btrfs_fiemap+0x7e/0xd0
      do_vfs_ioctl+0x425/0x9d0
      __x64_sys_ioctl+0x75/0xc0
    
    Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: fix non-empty delayed iputs list on unmount due to async workers [+ + +]

Author: Filipe Manana <fdmanana@suse.com>
Date:   Thu Mar 6 14:25:38 2025 +0000

    btrfs: fix non-empty delayed iputs list on unmount due to async workers
    
    [ Upstream commit cda76788f8b0f7de3171100e3164ec1ce702292e ]
    
    At close_ctree() after we have ran delayed iputs either explicitly through
    calling btrfs_run_delayed_iputs() or later during the call to
    btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
    delayed iputs list is empty.
    
    We have (another) race where this assertion might fail because we have
    queued an async write into the fs_info->workers workqueue. Here's how it
    happens:
    
    1) We are submitting a data bio for an inode that is not the data
       relocation inode, so we call btrfs_wq_submit_bio();
    
    2) btrfs_wq_submit_bio() submits a work for the fs_info->workers queue
       that will run run_one_async_done();
    
    3) We enter close_ctree(), flush several work queues except
       fs_info->workers, explicitly run delayed iputs with a call to
       btrfs_run_delayed_iputs() and then again shortly after by calling
       btrfs_commit_super() or btrfs_error_commit_super(), which also run
       delayed iputs;
    
    4) run_one_async_done() is executed in the work queue, and because there
       was an IO error (bio->bi_status is not 0) it calls btrfs_bio_end_io(),
       which drops the final reference on the associated ordered extent by
       calling btrfs_put_ordered_extent() - and that adds a delayed iput for
       the inode;
    
    5) At close_ctree() we find that after stopping the cleaner and
       transaction kthreads the delayed iputs list is not empty, failing the
       following assertion:
    
          ASSERT(list_empty(&fs_info->delayed_iputs));
    
    Fix this by flushing the fs_info->workers workqueue before running delayed
    iputs at close_ctree().
    
    David reported this when running generic/648, which exercises IO error
    paths by using the DM error table.
    
    Reported-by: David Sterba <dsterba@suse.com>
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work() [+ + +]

Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Feb 21 16:12:15 2025 +0000

    btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work()
    
    [ Upstream commit 1283b8c125a83bf7a7dbe90c33d3472b6d7bf612 ]
    
    At btrfs_reclaim_bgs_work(), we are grabbing a block group's zone unusable
    bytes while not under the protection of the block group's spinlock, so
    this can trigger race reports from KCSAN (or similar tools) since that
    field is typically updated while holding the lock, such as at
    __btrfs_add_free_space_zoned() for example.
    
    Fix this by grabbing the zone unusable bytes while we are still in the
    critical section holding the block group's spinlock, which is right above
    where we are currently grabbing it.
    
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: make btrfs_discard_workfn() block_group ref explicit [+ + +]

Author: Boris Burkov <boris@bur.io>
Date:   Mon Mar 3 15:01:05 2025 -0800

    btrfs: make btrfs_discard_workfn() block_group ref explicit
    
    [ Upstream commit 895c6721d310c036dcfebb5ab845822229fa35eb ]
    
    Currently, the async discard machinery owns a ref to the block_group
    when the block_group is queued on a discard list. However, to handle
    races with discard cancellation and the discard workfn, we have a
    specific logic to detect that the block_group is *currently* running in
    the workfn, to protect the workfn's usage amidst cancellation.
    
    As far as I can tell, this doesn't have any overt bugs (though
    finish_discard_pass() and remove_from_discard_list() racing can have a
    surprising outcome for the caller of remove_from_discard_list() in that
    it is again added at the end).
    
    But it is needlessly complicated to rely on locking and the nullity of
    discard_ctl->block_group. Simplify this significantly by just taking a
    refcount while we are in the workfn and unconditionally drop it in both
    the remove and workfn paths, regardless of if they race.
    
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: run btrfs_error_commit_super() early [+ + +]

Author: Qu Wenruo <wqu@suse.com>
Date:   Fri Mar 7 14:36:10 2025 +1030

    btrfs: run btrfs_error_commit_super() early
    
    [ Upstream commit df94a342efb451deb0e32b495d1d6cd4bb3a1648 ]
    
    [BUG]
    Even after all the error fixes related the
    "ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can
    still hit it reliably with my experimental 2K block size.
    
    [CAUSE]
    In my case, all the error is triggered after the fs is already in error
    status.
    
    I find the following call trace to be the cause of race:
    
               Main thread                       |     endio_write_workers
    ---------------------------------------------+---------------------------
    close_ctree()                                |
    |- btrfs_error_commit_super()                |
    |  |- btrfs_cleanup_transaction()            |
    |  |  |- btrfs_destroy_all_ordered_extents() |
    |  |     |- btrfs_wait_ordered_roots()       |
    |  |- btrfs_run_delayed_iputs()              |
    |                                            | btrfs_finish_ordered_io()
    |                                            | |- btrfs_put_ordered_extent()
    |                                            |    |- btrfs_add_delayed_iput()
    |- ASSERT(list_empty(delayed_iputs))         |
       !!! Triggered !!!
    
    The root cause is that, btrfs_wait_ordered_roots() only wait for
    ordered extents to finish their IOs, not to wait for them to finish and
    removed.
    
    [FIX]
    Since btrfs_error_commit_super() will flush and wait for all ordered
    extents, it should be executed early, before we start flushing the
    workqueues.
    
    And since btrfs_error_commit_super() now runs early, there is no need to
    run btrfs_run_delayed_iputs() inside it, so just remove the
    btrfs_run_delayed_iputs() call from btrfs_error_commit_super().
    
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: send: return -ENAMETOOLONG when attempting a path that is too long [+ + +]

Author: Filipe Manana <fdmanana@suse.com>
Date:   Wed Feb 5 13:09:25 2025 +0000

    btrfs: send: return -ENAMETOOLONG when attempting a path that is too long
    
    [ Upstream commit a77749b3e21813566cea050bbb3414ae74562eba ]
    
    When attempting to build a too long path we are currently returning
    -ENOMEM, which is very odd and misleading. So update fs_path_ensure_buf()
    to return -ENAMETOOLONG instead. Also, while at it, move the WARN_ON()
    into the if statement's expression, as it makes it clear what is being
    tested and also has the effect of adding 'unlikely' to the statement,
    which allows the compiler to generate better code as this condition is
    never expected to happen.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

can: bcm: add locking for bcm_op runtime updates [+ + +]

Author: Oliver Hartkopp <socketcan@hartkopp.net>
Date:   Mon May 19 14:50:26 2025 +0200

    can: bcm: add locking for bcm_op runtime updates
    
    commit c2aba69d0c36a496ab4f2e81e9c2b271f2693fd7 upstream.
    
    The CAN broadcast manager (CAN BCM) can send a sequence of CAN frames via
    hrtimer. The content and also the length of the sequence can be changed
    resp reduced at runtime where the 'currframe' counter is then set to zero.
    
    Although this appeared to be a safe operation the updates of 'currframe'
    can be triggered from user space and hrtimer context in bcm_can_tx().
    Anderson Nascimento created a proof of concept that triggered a KASAN
    slab-out-of-bounds read access which can be prevented with a spin_lock_bh.
    
    At the rework of bcm_can_tx() the 'count' variable has been moved into
    the protected section as this variable can be modified from both contexts
    too.
    
    Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
    Reported-by: Anderson Nascimento <anderson@allelesecurity.com>
    Tested-by: Anderson Nascimento <anderson@allelesecurity.com>
    Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
    Link: https://patch.msgid.link/20250519125027.11900-1-socketcan@hartkopp.net
    Cc: stable@vger.kernel.org
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

can: bcm: add missing rcu read protection for procfs content [+ + +]

Author: Oliver Hartkopp <socketcan@hartkopp.net>
Date:   Mon May 19 14:50:27 2025 +0200

    can: bcm: add missing rcu read protection for procfs content
    
    commit dac5e6249159ac255dad9781793dbe5908ac9ddb upstream.
    
    When the procfs content is generated for a bcm_op which is in the process
    to be removed the procfs output might show unreliable data (UAF).
    
    As the removal of bcm_op's is already implemented with rcu handling this
    patch adds the missing rcu_read_lock() and makes sure the list entries
    are properly removed under rcu protection.
    
    Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
    Reported-by: Anderson Nascimento <anderson@allelesecurity.com>
    Suggested-by: Anderson Nascimento <anderson@allelesecurity.com>
    Tested-by: Anderson Nascimento <anderson@allelesecurity.com>
    Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
    Link: https://patch.msgid.link/20250519125027.11900-2-socketcan@hartkopp.net
    Cc: stable@vger.kernel.org # >= 5.4
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

can: c_can: Use of_property_present() to test existence of DT property [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Wed Feb 12 21:23:14 2025 +0100

    can: c_can: Use of_property_present() to test existence of DT property
    
    [ Upstream commit ab1bc2290fd8311d49b87c29f1eb123fcb581bee ]
    
    of_property_read_bool() should be used only on boolean properties.
    
    Cc: Rob Herring <robh@kernel.org>
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
    Link: https://patch.msgid.link/20250212-syscon-phandle-args-can-v2-3-ac9a1253396b@linaro.org
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

can: slcan: allow reception of short error messages [+ + +]

Author: Carlos Sanchez <carlossanchez@geotab.com>
Date:   Tue May 20 12:23:05 2025 +0200

    can: slcan: allow reception of short error messages
    
    commit ef0841e4cb08754be6cb42bf97739fce5d086e5f upstream.
    
    Allows slcan to receive short messages (typically errors) from the serial
    interface.
    
    When error support was added to slcan protocol in
    b32ff4668544e1333b694fcc7812b2d7397b4d6a ("can: slcan: extend the protocol
    with error info") the minimum valid message size changed from 5 (minimum
    standard can frame tIII0) to 3 ("e1a" is a valid protocol message, it is
    one of the examples given in the comments for slcan_bump_err() ), but the
    check for minimum message length prodicating all decoding was not adjusted.
    This makes short error messages discarded and error frames not being
    generated.
    
    This patch changes the minimum length to the new minimum (3 characters,
    excluding terminator, is now a valid message).
    
    Signed-off-by: Carlos Sanchez <carlossanchez@geotab.com>
    Fixes: b32ff4668544 ("can: slcan: extend the protocol with error info")
    Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
    Link: https://patch.msgid.link/20250520102305.1097494-1-carlossanchez@geotab.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cgroup: Fix compilation issue due to cgroup_mutex not being exported [+ + +]

Author: gaoxu <gaoxu2@honor.com>
Date:   Thu Apr 17 07:30:00 2025 +0000

    cgroup: Fix compilation issue due to cgroup_mutex not being exported
    
    [ Upstream commit 87c259a7a359e73e6c52c68fcbec79988999b4e6 ]
    
    When adding folio_memcg function call in the zram module for
    Android16-6.12, the following error occurs during compilation:
    ERROR: modpost: "cgroup_mutex" [../soc-repo/zram.ko] undefined!
    
    This error is caused by the indirect call to lockdep_is_held(&cgroup_mutex)
    within folio_memcg. The export setting for cgroup_mutex is controlled by
    the CONFIG_PROVE_RCU macro. If CONFIG_LOCKDEP is enabled while
    CONFIG_PROVE_RCU is not, this compilation error will occur.
    
    To resolve this issue, add a parallel macro CONFIG_LOCKDEP control to
    ensure cgroup_mutex is properly exported when needed.
    
    Signed-off-by: gao xu <gaoxu2@honor.com>
    Acked-by: Michal Koutný <mkoutny@suse.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES [+ + +]

Author: Pali Rohár <pali@kernel.org>
Date:   Mon Dec 9 20:44:23 2024 +0100

    cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES
    
    [ Upstream commit e255612b5ed9f179abe8196df7c2ba09dd227900 ]
    
    Some operations, like WRITE, does not require FILE_READ_ATTRIBUTES access.
    
    So when FILE_READ_ATTRIBUTES is not explicitly requested for
    smb2_open_file() then first try to do SMB2 CREATE with FILE_READ_ATTRIBUTES
    access (like it was before) and then fallback to SMB2 CREATE without
    FILE_READ_ATTRIBUTES access (less common case).
    
    This change allows to complete WRITE operation to a file when it does not
    grant FILE_READ_ATTRIBUTES permission and its parent directory does not
    grant READ_DATA permission (parent directory READ_DATA is implicit grant of
    child FILE_READ_ATTRIBUTES permission).
    
    Signed-off-by: Pali Rohár <pali@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cifs: Fix establishing NetBIOS session for SMB2+ connection [+ + +]

Author: Pali Rohár <pali@kernel.org>
Date:   Wed Oct 30 22:46:20 2024 +0100

    cifs: Fix establishing NetBIOS session for SMB2+ connection
    
    [ Upstream commit 781802aa5a5950f99899f13ff9d760f5db81d36d ]
    
    Function ip_rfc1001_connect() which establish NetBIOS session for SMB
    connections, currently uses smb_send() function for sending NetBIOS Session
    Request packet. This function expects that the passed buffer is SMB packet
    and for SMB2+ connections it mangles packet header, which breaks prepared
    NetBIOS Session Request packet. Result is that this function send garbage
    packet for SMB2+ connection, which SMB2+ server cannot parse. That function
    is not mangling packets for SMB1 connections, so it somehow works for SMB1.
    
    Fix this problem and instead of smb_send(), use smb_send_kvec() function
    which does not mangle prepared packet, this function send them as is. Just
    API of this function takes struct msghdr (kvec) instead of packet buffer.
    
    [MS-SMB2] specification allows SMB2 protocol to use NetBIOS as a transport
    protocol. NetBIOS can be used over TCP via port 139. So this is a valid
    configuration, just not so common. And even recent Windows versions (e.g.
    Windows Server 2022) still supports this configuration: SMB over TCP port
    139, including for modern SMB2 and SMB3 dialects.
    
    This change fixes SMB2 and SMB3 connections over TCP port 139 which
    requires establishing of NetBIOS session. Tested that this change fixes
    establishing of SMB2 and SMB3 connections with Windows Server 2022.
    
    Signed-off-by: Pali Rohár <pali@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cifs: Fix negotiate retry functionality [+ + +]

Author: Pali Rohár <pali@kernel.org>
Date:   Sat Nov 2 20:06:50 2024 +0100

    cifs: Fix negotiate retry functionality
    
    [ Upstream commit e94e882a6d69525c07589222cf3a6ff57ad12b5b ]
    
    SMB negotiate retry functionality in cifs_negotiate() is currently broken
    and does not work when doing socket reconnect. Caller of this function,
    which is cifs_negotiate_protocol() requires that tcpStatus after successful
    execution of negotiate callback stay in CifsInNegotiate. But if the
    CIFSSMBNegotiate() called from cifs_negotiate() fails due to connection
    issues then tcpStatus is changed as so repeated CIFSSMBNegotiate() call
    does not help.
    
    Fix this problem by moving retrying code from negotiate callback (which is
    either cifs_negotiate() or smb2_negotiate()) to cifs_negotiate_protocol()
    which is caller of those callbacks. This allows to properly handle and
    implement correct transistions between tcpStatus states as function
    cifs_negotiate_protocol() already handles it.
    
    With this change, cifs_negotiate_protocol() now handles also -EAGAIN error
    set by the RFC1002_NEGATIVE_SESSION_RESPONSE processing after reconnecting
    with NetBIOS session.
    
    Signed-off-by: Pali Rohár <pali@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cifs: Fix querying and creating MF symlinks over SMB1 [+ + +]

Author: Pali Rohár <pali@kernel.org>
Date:   Sat Dec 28 21:09:54 2024 +0100

    cifs: Fix querying and creating MF symlinks over SMB1
    
    [ Upstream commit 4236ac9fe5b8b42756070d4abfb76fed718e87c2 ]
    
    Old SMB1 servers without CAP_NT_SMBS do not support CIFS_open() function
    and instead SMBLegacyOpen() needs to be used. This logic is already handled
    in cifs_open_file() function, which is server->ops->open callback function.
    
    So for querying and creating MF symlinks use open callback function instead
    of CIFS_open() function directly.
    
    This change fixes querying and creating new MF symlinks on Windows 98.
    Currently cifs_query_mf_symlink() is not able to detect MF symlink and
    cifs_create_mf_symlink() is failing with EIO error.
    
    Signed-off-by: Pali Rohár <pali@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

clk: imx8mp: inform CCF of maximum frequency of clocks [+ + +]

Author: Ahmad Fatoum <a.fatoum@pengutronix.de>
Date:   Tue Feb 18 19:26:46 2025 +0100

    clk: imx8mp: inform CCF of maximum frequency of clocks
    
    [ Upstream commit 06a61b5cb6a8638fa8823cd09b17233b29696fa2 ]
    
    The IMX8MPCEC datasheet lists maximum frequencies allowed for different
    modules. Some of these limits are universal, but some depend on
    whether the SoC is operating in nominal or in overdrive mode.
    
    The imx8mp.dtsi currently assumes overdrive mode and configures some
    clocks in accordance with this. Boards wishing to make use of nominal
    mode will need to override some of the clock rates manually.
    
    As operating the clocks outside of their allowed range can lead to
    difficult to debug issues, it makes sense to register the maximum rates
    allowed in the driver, so the CCF can take them into account.
    
    Reviewed-by: Peng Fan <peng.fan@nxp.com>
    Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
    Link: https://lore.kernel.org/r/20250218-imx8m-clk-v4-6-b7697dc2dcd0@pengutronix.de
    Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

clk: qcom: camcc-sm8250: Use clk_rcg2_shared_ops for some RCGs [+ + +]

Author: Jordan Crouse <jorcrous@amazon.com>
Date:   Wed Jan 22 22:26:12 2025 +0000

    clk: qcom: camcc-sm8250: Use clk_rcg2_shared_ops for some RCGs
    
    [ Upstream commit 52b10b591f83dc6d9a1d6c2dc89433470a787ecd ]
    
    Update some RCGs on the sm8250 camera clock controller to use
    clk_rcg2_shared_ops. The shared_ops ensure the RCGs get parked
    to the XO during clock disable to prevent the clocks from locking up
    when the GDSC is enabled. These mirror similar fixes for other controllers
    such as commit e5c359f70e4b ("clk: qcom: camcc: Update the clock ops for
    the SC7180").
    
    Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
    Link: https://lore.kernel.org/r/20250122222612.32351-1-jorcrous@amazon.com
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

clk: qcom: clk-alpha-pll: Do not use random stack value for recalc rate [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Wed Feb 12 21:01:35 2025 +0100

    clk: qcom: clk-alpha-pll: Do not use random stack value for recalc rate
    
    [ Upstream commit 7a243e1b814a02ab40793026ef64223155d86395 ]
    
    If regmap_read() fails, random stack value was used in calculating new
    frequency in recalc_rate() callbacks.  Such failure is really not
    expected as these are all MMIO reads, however code should be here
    correct and bail out.  This also avoids possible warning on
    uninitialized value.
    
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Link: https://lore.kernel.org/r/20250212-b4-clk-qcom-clean-v3-1-499f37444f5d@linaro.org
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

clk: sunxi-ng: d1: Add missing divider for MMC mod clocks [+ + +]

Author: Andre Przywara <andre.przywara@arm.com>
Date:   Thu May 1 13:06:31 2025 +0100

    clk: sunxi-ng: d1: Add missing divider for MMC mod clocks
    
    [ Upstream commit 98e6da673cc6dd46ca9a599802bd2c8f83606710 ]
    
    The D1/R528/T113 SoCs have a hidden divider of 2 in the MMC mod clocks,
    just as other recent SoCs. So far we did not describe that, which led
    to the resulting MMC clock rate to be only half of its intended value.
    
    Use a macro that allows to describe a fixed post-divider, to compensate
    for that divisor.
    
    This brings the MMC performance on those SoCs to its expected level,
    so about 23 MB/s for SD cards, instead of the 11 MB/s measured so far.
    
    Fixes: 35b97bb94111 ("clk: sunxi-ng: Add support for the D1 SoC clocks")
    Reported-by: Kuba Szczodrzyński <kuba@szczodrzynski.pl>
    Signed-off-by: Andre Przywara <andre.przywara@arm.com>
    Link: https://patch.msgid.link/20250501120631.837186-1-andre.przywara@arm.com
    Signed-off-by: Chen-Yu Tsai <wens@csie.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

clocksource: mips-gic-timer: Enable counter when CPUs start [+ + +]

Author: Paul Burton <paulburton@kernel.org>
Date:   Wed Jan 29 13:32:47 2025 +0100

    clocksource: mips-gic-timer: Enable counter when CPUs start
    
    [ Upstream commit 3128b0a2e0cf6e07aa78e5f8cf7dd9cd59dc8174 ]
    
    In multi-cluster MIPS I6500 systems there is a GIC in each cluster,
    each with its own counter. When a cluster powers up the counter will
    be stopped, with the COUNTSTOP bit set in the GIC_CONFIG register.
    
    In single cluster systems, it has been fine to clear COUNTSTOP once
    in gic_clocksource_of_init() to start the counter. In multi-cluster
    systems, this will only have started the counter in the boot cluster,
    and any CPUs in other clusters will find their counter stopped which
    will break the GIC clock_event_device.
    
    Resolve this by having CPUs clear the COUNTSTOP bit when they come
    online, using the existing gic_starting_cpu() CPU hotplug callback. This
    will allow CPUs in secondary clusters to ensure that the cluster's GIC
    counter is running as expected.
    
    Signed-off-by: Paul Burton <paulburton@kernel.org>
    Signed-off-by: Chao-ying Fu <cfu@wavecomp.com>
    Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
    Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Tested-by: Serge Semin <fancer.lancer@gmail.com>
    Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

coredump: fix error handling for replace_fd() [+ + +]

Author: Christian Brauner <brauner@kernel.org>
Date:   Mon Apr 14 15:55:06 2025 +0200

    coredump: fix error handling for replace_fd()
    
    commit 95c5f43181fe9c1b5e5a4bd3281c857a5259991f upstream.
    
    The replace_fd() helper returns the file descriptor number on success
    and a negative error code on failure. The current error handling in
    umh_pipe_setup() only works because the file descriptor that is replaced
    is zero but that's pretty volatile. Explicitly check for a negative
    error code.
    
    Link: https://lore.kernel.org/20250414-work-coredump-v2-2-685bf231f828@kernel.org
    Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

coredump: hand a pidfd to the usermode coredump helper [+ + +]

Author: Christian Brauner <brauner@kernel.org>
Date:   Mon Apr 14 15:55:07 2025 +0200

    coredump: hand a pidfd to the usermode coredump helper
    
    commit b5325b2a270fcaf7b2a9a0f23d422ca8a5a8bdea upstream.
    
    Give userspace a way to instruct the kernel to install a pidfd into the
    usermode helper process. This makes coredump handling a lot more
    reliable for userspace. In parallel with this commit we already have
    systemd adding support for this in [1].
    
    We create a pidfs file for the coredumping process when we process the
    corename pattern. When the usermode helper process is forked we then
    install the pidfs file as file descriptor three into the usermode
    helpers file descriptor table so it's available to the exec'd program.
    
    Since usermode helpers are either children of the system_unbound_wq
    workqueue or kthreadd we know that the file descriptor table is empty
    and can thus always use three as the file descriptor number.
    
    Note, that we'll install a pidfd for the thread-group leader even if a
    subthread is calling do_coredump(). We know that task linkage hasn't
    been removed due to delay_group_leader() and even if this @current isn't
    the actual thread-group leader we know that the thread-group leader
    cannot be reaped until @current has exited.
    
    [brauner: This is a backport for the v6.1 series. Upstream has
    significantly changed and backporting all that infra is a non-starter.
    So simply backport the pidfd_prepare() helper and waste the file
    descriptor we allocated. Then we minimally massage the umh coredump
    setup code.]
    
    Link: https://github.com/systemd/systemd/pull/37125 [1]
    Link: https://lore.kernel.org/20250414-work-coredump-v2-3-685bf231f828@kernel.org
    Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: tegra186: Share policy per cluster [+ + +]

Author: Aaron Kling <luceoscutum@gmail.com>
Date:   Mon Mar 10 00:28:48 2025 -0500

    cpufreq: tegra186: Share policy per cluster
    
    [ Upstream commit be4ae8c19492cd6d5de61ccb34ffb3f5ede5eec8 ]
    
    This functionally brings tegra186 in line with tegra210 and tegra194,
    sharing a cpufreq policy between all cores in a cluster.
    
    Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
    Acked-by: Thierry Reding <treding@nvidia.com>
    Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cpuidle: menu: Avoid discarding useful information [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Thu Feb 6 15:29:05 2025 +0100

    cpuidle: menu: Avoid discarding useful information
    
    [ Upstream commit 85975daeaa4d6ec560bfcd354fc9c08ad7f38888 ]
    
    When giving up on making a high-confidence prediction,
    get_typical_interval() always returns UINT_MAX which means that the
    next idle interval prediction will be based entirely on the time till
    the next timer.  However, the information represented by the most
    recent intervals may not be completely useless in those cases.
    
    Namely, the largest recent idle interval is an upper bound on the
    recently observed idle duration, so it is reasonable to assume that
    the next idle duration is unlikely to exceed it.  Moreover, this is
    still true after eliminating the suspected outliers if the sample
    set still under consideration is at least as large as 50% of the
    maximum sample set size.
    
    Accordingly, make get_typical_interval() return the current maximum
    recent interval value in that case instead of UINT_MAX.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
    Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
    Reviewed-by: Christian Loehle <christian.loehle@arm.com>
    Tested-by: Christian Loehle <christian.loehle@arm.com>
    Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
    Link: https://patch.msgid.link/7770672.EvYhyI6sBW@rjwysocki.net
    Signed-off-by: Sasha Levin <sashal@kernel.org>

crypto: algif_hash - fix double free in hash_accept [+ + +]

Author: Ivan Pravdin <ipravdin.official@gmail.com>
Date:   Sun May 18 18:41:02 2025 -0400

    crypto: algif_hash - fix double free in hash_accept
    
    commit b2df03ed4052e97126267e8c13ad4204ea6ba9b6 upstream.
    
    If accept(2) is called on socket type algif_hash with
    MSG_MORE flag set and crypto_ahash_import fails,
    sk2 is freed. However, it is also freed in af_alg_release,
    leading to slab-use-after-free error.
    
    Fixes: fe869cdb89c9 ("crypto: algif_hash - User-space interface for hash operations")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: lzo - Fix compression buffer overrun [+ + +]

Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 27 17:04:46 2025 +0800

    crypto: lzo - Fix compression buffer overrun
    
    [ Upstream commit cc47f07234f72cbd8e2c973cdbf2a6730660a463 ]
    
    Unlike the decompression code, the compression code in LZO never
    checked for output overruns.  It instead assumes that the caller
    always provides enough buffer space, disregarding the buffer length
    provided by the caller.
    
    Add a safe compression interface that checks for the end of buffer
    before each write.  Use the safe interface in crypto/lzo.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

crypto: octeontx2 - suppress auth failure screaming due to negative tests [+ + +]

Author: Shashank Gupta <shashankg@marvell.com>
Date:   Wed Mar 5 13:27:05 2025 +0530

    crypto: octeontx2 - suppress auth failure screaming due to negative tests
    
    [ Upstream commit 64b7871522a4cba99d092e1c849d6f9092868aaa ]
    
    This patch addresses an issue where authentication failures were being
    erroneously reported due to negative test failures in the "ccm(aes)"
    selftest.
    pr_debug suppress unnecessary screaming of these tests.
    
    Signed-off-by: Shashank Gupta <shashankg@marvell.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dlm: make tcp still work in multi-link env [+ + +]

Author: Heming Zhao <heming.zhao@suse.com>
Date:   Mon Mar 10 15:36:21 2025 +0800

    dlm: make tcp still work in multi-link env
    
    [ Upstream commit 03d2b62208a336a3bb984b9465ef6d89a046ea22 ]
    
    This patch bypasses multi-link errors in TCP mode, allowing dlm
    to operate on the first tcp link.
    
    Signed-off-by: Heming Zhao <heming.zhao@suse.com>
    Signed-off-by: David Teigland <teigland@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dm cache: prevent BUG_ON by blocking retries on failed device resumes [+ + +]

Author: Ming-Hung Tsai <mtsai@redhat.com>
Date:   Thu Mar 6 16:41:50 2025 +0800

    dm cache: prevent BUG_ON by blocking retries on failed device resumes
    
    [ Upstream commit 5da692e2262b8f81993baa9592f57d12c2703dea ]
    
    A cache device failing to resume due to mapping errors should not be
    retried, as the failure leaves a partially initialized policy object.
    Repeating the resume operation risks triggering BUG_ON when reloading
    cache mappings into the incomplete policy object.
    
    Reproduce steps:
    
    1. create a cache metadata consisting of 512 or more cache blocks,
       with some mappings stored in the first array block of the mapping
       array. Here we use cache_restore v1.0 to build the metadata.
    
    cat <<EOF >> cmeta.xml
    <superblock uuid="" block_size="128" nr_cache_blocks="512" \
    policy="smq" hint_width="4">
      <mappings>
        <mapping cache_block="0" origin_block="0" dirty="false"/>
      </mappings>
    </superblock>
    EOF
    dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
    cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2
    dmsetup remove cmeta
    
    2. wipe the second array block of the mapping array to simulate
       data degradations.
    
    mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
    2>/dev/null | hexdump -e '1/8 "%u\n"')
    ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
    2>/dev/null | hexdump -e '1/8 "%u\n"')
    dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock
    
    3. try bringing up the cache device. The resume is expected to fail
       due to the broken array block.
    
    dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
    dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
    dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
    dmsetup create cache --notable
    dmsetup load cache --table "0 524288 cache /dev/mapper/cmeta \
    /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
    dmsetup resume cache
    
    4. try resuming the cache again. An unexpected BUG_ON is triggered
       while loading cache mappings.
    
    dmsetup resume cache
    
    Kernel logs:
    
    (snip)
    ------------[ cut here ]------------
    kernel BUG at drivers/md/dm-cache-policy-smq.c:752!
    Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
    CPU: 0 UID: 0 PID: 332 Comm: dmsetup Not tainted 6.13.4 #3
    RIP: 0010:smq_load_mapping+0x3e5/0x570
    
    Fix by disallowing resume operations for devices that failed the
    initial attempt.
    
    Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com>
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dm: fix unconditional IO throttle caused by REQ_PREFLUSH [+ + +]

Author: Jinliang Zheng <alexjlzheng@gmail.com>
Date:   Thu Feb 20 19:20:14 2025 +0800

    dm: fix unconditional IO throttle caused by REQ_PREFLUSH
    
    [ Upstream commit 88f7f56d16f568f19e1a695af34a7f4a6ce537a6 ]
    
    When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
    generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
    which causes the flush_bio to be throttled by wbt_wait().
    
    An example from v5.4, similar problem also exists in upstream:
    
        crash> bt 2091206
        PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
         #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
         #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
         #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
         #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
         #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
         #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
         #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
         #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
         #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
         #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
        #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
        #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
        #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
        #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
        #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
        #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
        #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
        #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
        #18 [ffff800084a2fe70] kthread at ffff800040118de4
    
    After commit 2def2845cc33 ("xfs: don't allow log IO to be throttled"),
    the metadata submitted by xlog_write_iclog() should not be throttled.
    But due to the existence of the dm layer, throttling flush_bio indirectly
    causes the metadata bio to be throttled.
    
    Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
    wbt_should_throttle() return false to avoid wbt_wait().
    
    Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
    Reviewed-by: Tianxiang Peng <txpeng@tencent.com>
    Reviewed-by: Hao Peng <flyingpeng@tencent.com>
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dm: restrict dm device size to 2^63-512 bytes [+ + +]

Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Fri Mar 14 13:51:32 2025 +0100

    dm: restrict dm device size to 2^63-512 bytes
    
    [ Upstream commit 45fc728515c14f53f6205789de5bfd72a95af3b8 ]
    
    The devices with size >= 2^63 bytes can't be used reliably by userspace
    because the type off_t is a signed 64-bit integer.
    
    Therefore, we limit the maximum size of a device mapper device to
    2^63-512 bytes.
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dma-mapping: avoid potential unused data compilation warning [+ + +]

Author: Marek Szyprowski <m.szyprowski@samsung.com>
Date:   Tue Apr 15 09:56:59 2025 +0200

    dma-mapping: avoid potential unused data compilation warning
    
    [ Upstream commit c9b19ea63036fc537a69265acea1b18dabd1cbd3 ]
    
    When CONFIG_NEED_DMA_MAP_STATE is not defined, dma-mapping clients might
    report unused data compilation warnings for dma_unmap_*() calls
    arguments. Redefine macros for those calls to let compiler to notice that
    it is okay when the provided arguments are not used.
    
    Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Link: https://lore.kernel.org/r/20250415075659.428549-1-m.szyprowski@samsung.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling [+ + +]

Author: Fenghua Yu <fenghua.yu@intel.com>
Date:   Fri Apr 7 13:31:35 2023 -0700

    dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling
    
    [ Upstream commit b022f59725f0ae846191abbd6d2e611d7f60f826 ]
    
    Define idxd_copy_cr() to copy completion record to fault address in
    user address that is found by work queue (wq) and PASID.
    
    It will be used to write the user's completion record that the hardware
    device is not able to write due to user completion record page fault.
    
    An xarray is added to associate the PASID and mm with the
    struct idxd_user_context so mm can be found by PASID and wq.
    
    It is called when handling the completion record fault in a kernel thread
    context. Switch to the mm using kthread_use_vm() and copy the
    completion record to the mm via copy_to_user(). Once the copy is
    completed, switch back to the current mm using kthread_unuse_mm().
    
    Suggested-by: Christoph Hellwig <hch@infradead.org>
    Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
    Suggested-by: Tony Luck <tony.luck@intel.com>
    Tested-by: Tony Zhu <tony.zhu@intel.com>
    Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20230407203143.2189681-9-fenghua.yu@intel.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Stable-dep-of: 8dfa57aabff6 ("dmaengine: idxd: Fix allowing write() from different address spaces")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dmaengine: idxd: add per DSA wq workqueue for processing cr faults [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Fri Apr 7 13:31:33 2023 -0700

    dmaengine: idxd: add per DSA wq workqueue for processing cr faults
    
    [ Upstream commit 2f30decd2f23a376d2ed73dfe4c601421edf501a ]
    
    Add a workqueue for user submitted completion record fault processing.
    The workqueue creation and destruction lifetime will be tied to the user
    sub-driver since it will only be used when the wq is a user type.
    
    Tested-by: Tony Zhu <tony.zhu@intel.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
    Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
    Link: https://lore.kernel.org/r/20230407203143.2189681-7-fenghua.yu@intel.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Stable-dep-of: 8dfa57aabff6 ("dmaengine: idxd: Fix allowing write() from different address spaces")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dmaengine: idxd: Fix ->poll() return value [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Thu May 8 10:05:48 2025 -0700

    dmaengine: idxd: Fix ->poll() return value
    
    [ Upstream commit ae74cd15ade833adc289279b5c6f12e78f64d4d7 ]
    
    The fix to block access from different address space did not return a
    correct value for ->poll() change.  kernel test bot reported that a
    return value of type __poll_t is expected rather than int. Fix to return
    POLLNVAL to indicate invalid request.
    
    Fixes: 8dfa57aabff6 ("dmaengine: idxd: Fix allowing write() from different address spaces")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202505081851.rwD7jVxg-lkp@intel.com/
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20250508170548.2747425-1-dave.jiang@intel.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dmaengine: idxd: Fix allowing write() from different address spaces [+ + +]

Author: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Date:   Mon Apr 21 10:03:37 2025 -0700

    dmaengine: idxd: Fix allowing write() from different address spaces
    
    [ Upstream commit 8dfa57aabff625bf445548257f7711ef294cd30e ]
    
    Check if the process submitting the descriptor belongs to the same
    address space as the one that opened the file, reject otherwise.
    
    Fixes: 6827738dc684 ("dmaengine: idxd: add a write() method for applications to submit work")
    Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20250421170337.3008875-1-dave.jiang@intel.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dmaengine: idxd: Fix passing freed memory in idxd_cdev_open() [+ + +]

Author: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Date:   Mon May 8 23:07:16 2023 -0700

    dmaengine: idxd: Fix passing freed memory in idxd_cdev_open()
    
    commit 0642287e3ecdd0d1f88e6a2e63768e16153a990c upstream.
    
    Smatch warns:
            drivers/dma/idxd/cdev.c:327:
                    idxd_cdev_open() warn: 'sva' was already freed.
    
    When idxd_wq_set_pasid() fails, the current code unbinds sva and then
    goes to 'failed_set_pasid' where iommu_sva_unbind_device is called
    again causing the above warning.
    [ device_user_pasid_enabled(idxd) is still true when calling
    failed_set_pasid ]
    
    Fix this by removing additional unbind when idxd_wq_set_pasid() fails
    
    Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
    Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    Acked-by: Fenghua Yu <fenghua.yu@intel.com>
    Acked-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20230509060716.2830630-1-harshit.m.mogalapalli@oracle.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dql: Fix dql->limit value when reset. [+ + +]

Author: Jing Su <jingsusu@didiglobal.com>
Date:   Wed Mar 19 16:57:51 2025 +0800

    dql: Fix dql->limit value when reset.
    
    [ Upstream commit 3a17f23f7c36bac3a3584aaf97d3e3e0b2790396 ]
    
    Executing dql_reset after setting a non-zero value for limit_min can
    lead to an unreasonable situation where dql->limit is less than
    dql->limit_min.
    
    For instance, after setting
    /sys/class/net/eth*/queues/tx-0/byte_queue_limits/limit_min,
    an ifconfig down/up operation might cause the ethernet driver to call
    netdev_tx_reset_queue, which in turn invokes dql_reset.
    
    In this case, dql->limit is reset to 0 while dql->limit_min remains
    non-zero value, which is unexpected. The limit should always be
    greater than or equal to limit_min.
    
    Signed-off-by: Jing Su <jingsusu@didiglobal.com>
    Link: https://patch.msgid.link/Z9qHD1s/NEuQBdgH@pilot-ThinkCentre-M930t-N000
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/display/dm: drop hw_support check in amdgpu_dm_i2c_xfer() [+ + +]

Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Tue Dec 17 09:25:18 2024 -0500

    drm/amd/display/dm: drop hw_support check in amdgpu_dm_i2c_xfer()
    
    [ Upstream commit 33da70bd1e115d7d73f45fb1c09f5ecc448f3f13 ]
    
    DC supports SW i2c as well.  Drop the check.
    
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/display: calculate the remain segments for all pipes [+ + +]

Author: Zhikai Zhai <zhikai.zhai@amd.com>
Date:   Thu Feb 27 20:09:14 2025 +0800

    drm/amd/display: calculate the remain segments for all pipes
    
    [ Upstream commit d3069feecdb5542604d29b59acfd1fd213bad95b ]
    
    [WHY]
    In some cases the remain de-tile buffer segments will be greater
    than zero if we don't add the non-top pipe to calculate, at
    this time the override de-tile buffer size will be valid and used.
    But it makes the de-tile buffer segments used finally for all of pipes
    exceed the maximum.
    
    [HOW]
    Add the non-top pipe to calculate the remain de-tile buffer segments.
    Don't set override size to use the average according to pipe count
    if the value exceed the maximum.
    
    Reviewed-by: Charlene Liu <charlene.liu@amd.com>
    Signed-off-by: Zhikai Zhai <zhikai.zhai@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/display: Guard against setting dispclk low for dcn31x [+ + +]

Author: Jing Zhou <Jing.Zhou@amd.com>
Date:   Tue Mar 4 23:15:56 2025 +0800

    drm/amd/display: Guard against setting dispclk low for dcn31x
    
    [ Upstream commit 9c2f4ae64bb6f6d83a54d88b9ee0f369cdbb9fa8 ]
    
    [WHY]
    We should never apply a minimum dispclk value while in
    prepare_bandwidth or while displays are active. This is
    always an optimizaiton for when all displays are disabled.
    
    [HOW]
    Defer dispclk optimization until safe_to_lower = true
    and display_count reaches 0.
    
    Since 0 has a special value in this logic (ie. no dispclk
    required) we also need adjust the logic that clamps it for
    the actual request to PMFW.
    
    Reviewed-by: Charlene Liu <charlene.liu@amd.com>
    Reviewed-by: Chris Park <chris.park@amd.com>
    Reviewed-by: Eric Yang <eric.yang@amd.com>
    Signed-off-by: Jing Zhou <Jing.Zhou@amd.com>
    Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
    Signed-off-by: Alex Hung <alex.hung@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/display: handle max_downscale_src_width fail check [+ + +]

Author: Yihan Zhu <Yihan.Zhu@amd.com>
Date:   Wed Feb 12 15:17:56 2025 -0500

    drm/amd/display: handle max_downscale_src_width fail check
    
    [ Upstream commit 02a940da2ccc0cc0299811379580852b405a0ea2 ]
    
    [WHY]
    If max_downscale_src_width check fails, we exit early from TAP calculation and left a NULL
    value to the scaling data structure to cause the zero divide in the DML validation.
    
    [HOW]
    Call set default TAP calculation before early exit in get_optimal_number_of_taps due to
    max downscale limit exceed.
    
    Reviewed-by: Samson Tam <samson.tam@amd.com>
    Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
    Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/display: Initial psr_version with correct setting [+ + +]

Author: Tom Chung <chiahsuan.chung@amd.com>
Date:   Mon Jan 13 14:22:31 2025 +0800

    drm/amd/display: Initial psr_version with correct setting
    
    [ Upstream commit d8c782cac5007e68e7484d420168f12d3490def6 ]
    
    [Why & How]
    The initial setting for psr_version is not correct while
    create a virtual link.
    
    The default psr_version should be DC_PSR_VERSION_UNSUPPORTED.
    
    Reviewed-by: Roman Li <roman.li@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amdgpu: Allow P2P access through XGMI [+ + +]

Author: Felix Kuehling <felix.kuehling@amd.com>
Date:   Wed Apr 16 00:19:13 2025 -0400

    drm/amdgpu: Allow P2P access through XGMI
    
    [ Upstream commit a92741e72f91b904c1d8c3d409ed8dbe9c1f2b26 ]
    
    If peer memory is accessible through XGMI, allow leaving it in VRAM
    rather than forcing its migration to GTT on DMABuf attachment.
    
    Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
    Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 372c8d72c3680fdea3fbb2d6b089f76b4a6d596a)
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c [+ + +]

Author: Victor Lu <victorchengchi.lu@amd.com>
Date:   Thu Feb 13 18:38:28 2025 -0500

    drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c
    
    [ Upstream commit 057fef20b8401110a7bc1c2fe9d804a8a0bf0d24 ]
    
    SRIOV VF does not have write access to AGP BAR regs.
    Skip the writes to avoid a dmesg warning.
    
    Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amdgpu: enlarge the VBIOS binary size limit [+ + +]

Author: Shiwu Zhang <shiwu.zhang@amd.com>
Date:   Tue Nov 19 15:58:39 2024 +0800

    drm/amdgpu: enlarge the VBIOS binary size limit
    
    [ Upstream commit 667b96134c9e206aebe40985650bf478935cbe04 ]
    
    Some chips have a larger VBIOS file so raise the size limit to support
    the flashing tool.
    
    Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amdgpu: reset psp->cmd to NULL after releasing the buffer [+ + +]

Author: Jiang Liu <gerry@linux.alibaba.com>
Date:   Fri Feb 7 14:28:49 2025 +0800

    drm/amdgpu: reset psp->cmd to NULL after releasing the buffer
    
    [ Upstream commit e92f3f94cad24154fd3baae30c6dfb918492278d ]
    
    Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini().
    
    Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
    Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amdkfd: KFD release_work possible circular locking [+ + +]

Author: Philip Yang <Philip.Yang@amd.com>
Date:   Mon Feb 17 20:08:29 2025 -0500

    drm/amdkfd: KFD release_work possible circular locking
    
    [ Upstream commit 1b9366c601039d60546794c63fbb83ce8e53b978 ]
    
    If waiting for gpu reset done in KFD release_work, thers is WARNING:
    possible circular locking dependency detected
    
      #2  kfd_create_process
            kfd_process_mutex
              flush kfd release work
    
      #1  kfd release work
            wait for amdgpu reset work
    
      #0  amdgpu_device_gpu_reset
            kgd2kfd_pre_reset
              kfd_process_mutex
    
      Possible unsafe locking scenario:
    
            CPU0                    CPU1
            ----                    ----
       lock((work_completion)(&p->release_work));
                      lock((wq_completion)kfd_process_wq);
                      lock((work_completion)(&p->release_work));
       lock((wq_completion)amdgpu-reset-dev);
    
    To fix this, KFD create process move flush release work outside
    kfd_process_mutex.
    
    Signed-off-by: Philip Yang <Philip.Yang@amd.com>
    Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/ast: Find VBIOS mode from regular display size [+ + +]

Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Fri Jan 31 10:21:08 2025 +0100

    drm/ast: Find VBIOS mode from regular display size
    
    [ Upstream commit c81202906b5cd56db403e95db3d29c9dfc8c74c1 ]
    
    The ast driver looks up supplied display modes from an internal list of
    display modes supported by the VBIOS.
    
    Do not use the crtc_-prefixed display values from struct drm_display_mode
    for looking up the VBIOS mode. The fields contain raw values that the
    driver programs to hardware. They are affected by display settings like
    double-scan or interlace.
    
    Instead use the regular vdisplay and hdisplay fields for lookup. As the
    programmed values can now differ from the values used for lookup, set
    struct drm_display_mode.crtc_vdisplay and .crtc_hdisplay from the VBIOS
    mode.
    
    Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
    Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20250131092257.115596-9-tzimmermann@suse.de
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/atomic: clarify the rules around drm_atomic_state->allow_modeset [+ + +]

Author: Simona Vetter <simona.vetter@ffwll.ch>
Date:   Wed Jan 8 18:24:16 2025 +0100

    drm/atomic: clarify the rules around drm_atomic_state->allow_modeset
    
    [ Upstream commit c5e3306a424b52e38ad2c28c7f3399fcd03e383d ]
    
    msm is automagically upgrading normal commits to full modesets, and
    that's a big no-no:
    
    - for one this results in full on->off->on transitions on all these
      crtc, at least if you're using the usual helpers. Which seems to be
      the case, and is breaking uapi
    
    - further even if the ctm change itself would not result in flicker,
      this can hide modesets for other reasons. Which again breaks the
      uapi
    
    v2: I forgot the case of adding unrelated crtc state. Add that case
    and link to the existing kerneldoc explainers. This has come up in an
    irc discussion with Manasi and Ville about intel's bigjoiner mode.
    Also cc everyone involved in the msm irc discussion, more people
    joined after I sent out v1.
    
    v3: Wording polish from Pekka and Thomas
    
    Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
    Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: David Airlie <airlied@gmail.com>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
    Cc: Rob Clark <robdclark@gmail.com>
    Cc: Simon Ser <contact@emersion.fr>
    Cc: Manasi Navare <navaremanasi@google.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Signed-off-by: Simona Vetter <simona.vetter@intel.com>
    Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
    Link: https://patchwork.freedesktop.org/patch/msgid/20250108172417.160831-1-simona.vetter@ffwll.ch
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/edid: fixed the bug that hdr metadata was not reset [+ + +]

Author: feijuan.li <feijuan.li@samsung.com>
Date:   Wed May 14 14:35:11 2025 +0800

    drm/edid: fixed the bug that hdr metadata was not reset
    
    commit 6692dbc15e5ed40a3aa037aced65d7b8826c58cd upstream.
    
    When DP connected to a device with HDR capability,
    the hdr structure was filled.Then connected to another
    sink device without hdr capability, but the hdr info
    still exist.
    
    Fixes: e85959d6cbe0 ("drm: Parse HDR metadata info from EDID")
    Cc: <stable@vger.kernel.org> # v5.3+
    Signed-off-by: "feijuan.li" <feijuan.li@samsung.com>
    Reviewed-by: Jani Nikula <jani.nikula@intel.com>
    Link: https://lore.kernel.org/r/20250514063511.4151780-1-feijuan.li@samsung.com
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/mediatek: mtk_dpi: Add checks for reg_h_fre_con existence [+ + +]

Author: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Date:   Mon Feb 17 16:47:58 2025 +0100

    drm/mediatek: mtk_dpi: Add checks for reg_h_fre_con existence
    
    [ Upstream commit 8c9da7cd0bbcc90ab444454fecf535320456a312 ]
    
    In preparation for adding support for newer DPI instances which
    do support direct-pin but do not have any H_FRE_CON register,
    like the one found in MT8195 and MT8188, add a branch to check
    if the reg_h_fre_con variable was declared in the mtk_dpi_conf
    structure for the probed SoC DPI version.
    
    As a note, this is useful specifically only for cases in which
    the support_direct_pin variable is true, so mt8195-dpintf is
    not affected by any issue.
    
    Reviewed-by: CK Hu <ck.hu@mediatek.com>
    Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
    Link: https://patchwork.kernel.org/project/dri-devel/patch/20250217154836.108895-6-angelogioacchino.delregno@collabora.com/
    Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/panel-edp: Add Starry 116KHD024006 [+ + +]

Author: Douglas Anderson <dianders@chromium.org>
Date:   Thu Jan 9 14:28:53 2025 -0800

    drm/panel-edp: Add Starry 116KHD024006
    
    [ Upstream commit 749b5b279e5636cdcef51e15d67b77162cca6caa ]
    
    We have a few reports of sc7180-trogdor-pompom devices that have a
    panel in them that IDs as STA 0x0004 and has the following raw EDID:
    
      00 ff ff ff ff ff ff 00  4e 81 04 00 00 00 00 00
      10 20 01 04 a5 1a 0e 78  0a dc dd 96 5b 5b 91 28
      1f 52 54 00 00 00 01 01  01 01 01 01 01 01 01 01
      01 01 01 01 01 01 8e 1c  56 a0 50 00 1e 30 28 20
      55 00 00 90 10 00 00 18  00 00 00 00 00 00 00 00
      00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 fe
      00 31 31 36 4b 48 44 30  32 34 30 30 36 0a 00 e6
    
    We've been unable to locate a datasheet for this panel and our partner
    has not been responsive, but all Starry eDP datasheets that we can
    find agree on the same timing (delay_100_500_e200) so it should be
    safe to use that here instead of the super conservative timings. We'll
    still go a little extra conservative and allow `hpd_absent` of 200
    instead of 100 because that won't add any real-world delay in most
    cases.
    
    We'll associate the string from the EDID ("116KHD024006") with this
    panel. Given that the ID is the suspicious value of 0x0004 it seems
    likely that Starry doesn't always update their IDs but the string will
    still work to differentiate if we ever need to in the future.
    
    Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
    Signed-off-by: Douglas Anderson <dianders@chromium.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20250109142853.1.Ibcc3009933fd19507cc9c713ad0c99c7a9e4fe17@changeid
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/rockchip: vop2: Add uv swap for cluster window [+ + +]

Author: Andy Yan <andy.yan@rock-chips.com>
Date:   Mon Mar 3 11:44:17 2025 +0800

    drm/rockchip: vop2: Add uv swap for cluster window
    
    [ Upstream commit e7aae9f6d762139f8d2b86db03793ae0ab3dd802 ]
    
    The Cluster windows of upcoming VOP on rk3576 also support
    linear YUV support, we need to set uv swap bit for it.
    
    As the VOP2_WIN_UV_SWA register defined on rk3568/rk3588 is
    0xffffffff, so this register will not be touched on these
    two platforms.
    
    Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
    Tested-by: Michael Riesch <michael.riesch@wolfvision.net> # on RK3568
    Tested-by: Detlev Casanova <detlev.casanova@collabora.com>
    Signed-off-by: Heiko Stuebner <heiko@sntech.de>
    Link: https://patchwork.freedesktop.org/patch/msgid/20250303034436.192400-4-andyshrk@163.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm: Add valid clones check [+ + +]

Author: Jessica Zhang <quic_jesszhan@quicinc.com>
Date:   Mon Dec 16 16:43:14 2024 -0800

    drm: Add valid clones check
    
    [ Upstream commit 41b4b11da02157c7474caf41d56baae0e941d01a ]
    
    Check that all encoders attached to a given CRTC are valid
    possible_clones of each other.
    
    Signed-off-by: Jessica Zhang <quic_jesszhan@quicinc.com>
    Reviewed-by: Maxime Ripard <mripard@kernel.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241216-concurrent-wb-v4-3-fe220297a7f0@quicinc.com
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

EDAC/ie31200: work around false positive build warning [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Jan 22 07:50:26 2025 +0100

    EDAC/ie31200: work around false positive build warning
    
    [ Upstream commit c29dfd661fe2f8d1b48c7f00590929c04b25bf40 ]
    
    gcc-14 produces a bogus warning in some configurations:
    
    drivers/edac/ie31200_edac.c: In function 'ie31200_probe1.isra':
    drivers/edac/ie31200_edac.c:412:26: error: 'dimm_info' is used uninitialized [-Werror=uninitialized]
      412 |         struct dimm_data dimm_info[IE31200_CHANNELS][IE31200_DIMMS_PER_CHANNEL];
          |                          ^~~~~~~~~
    drivers/edac/ie31200_edac.c:412:26: note: 'dimm_info' declared here
      412 |         struct dimm_data dimm_info[IE31200_CHANNELS][IE31200_DIMMS_PER_CHANNEL];
          |                          ^~~~~~~~~
    
    I don't see any way the unintialized access could really happen here,
    but I can see why the compiler gets confused by the two loops.
    
    Instead, rework the two nested loops to only read the addr_decode
    registers and then keep only one instance of the dimm info structure.
    
    [Tony: Qiuxu pointed out that the "populate DIMM info" comment was left
    behind in the refactor and suggested moving it. I deleted the comment
    as unnecessry in front os a call to populate_dimm_info(). That seems
    pretty self-describing.]
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Jason Baron <jbaron@akamai.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Link: https://lore.kernel.org/all/20250122065031.1321015-1-arnd@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

espintcp: remove encap socket caching to avoid reference leak [+ + +]

Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Wed Apr 9 15:59:57 2025 +0200

    espintcp: remove encap socket caching to avoid reference leak
    
    [ Upstream commit 028363685bd0b7a19b4a820f82dd905b1dc83999 ]
    
    The current scheme for caching the encap socket can lead to reference
    leaks when we try to delete the netns.
    
    The reference chain is: xfrm_state -> enacp_sk -> netns
    
    Since the encap socket is a userspace socket, it holds a reference on
    the netns. If we delete the espintcp state (through flush or
    individual delete) before removing the netns, the reference on the
    socket is dropped and the netns is correctly deleted. Otherwise, the
    netns may not be reachable anymore (if all processes within the ns
    have terminated), so we cannot delete the xfrm state to drop its
    reference on the socket.
    
    This patch results in a small (~2% in my tests) performance
    regression.
    
    A GC-type mechanism could be added for the socket cache, to clear
    references if the state hasn't been used "recently", but it's a lot
    more complex than just not caching the socket.
    
    Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

eth: mlx4: don't try to complete XDP frames in netpoll [+ + +]

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Feb 12 17:06:33 2025 -0800

    eth: mlx4: don't try to complete XDP frames in netpoll
    
    [ Upstream commit 8fdeafd66edaf420ea0063a1f13442fe3470fe70 ]
    
    mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
    using page pool until now, so it could run XDP completions
    in netpoll (NAPI budget == 0) just fine. Page pool has calling
    context requirements, make sure we don't try to call it from
    what is potentially HW IRQ context.
    
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ext4: reject the 'data_err=abort' option in nojournal mode [+ + +]

Author: Baokun Li <libaokun1@huawei.com>
Date:   Wed Jan 22 19:05:27 2025 +0800

    ext4: reject the 'data_err=abort' option in nojournal mode
    
    [ Upstream commit 26343ca0df715097065b02a6cddb4a029d5b9327 ]
    
    data_err=abort aborts the journal on I/O errors. However, this option is
    meaningless if journal is disabled, so it is rejected in nojournal mode
    to reduce unnecessary checks. Also, this option is ignored upon remount.
    
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://patch.msgid.link/20250122110533.4116662-4-libaokun@huaweicloud.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ext4: reorder capability check last [+ + +]

Author: Christian Göttsche <cgzones@googlemail.com>
Date:   Sun Mar 2 17:06:39 2025 +0100

    ext4: reorder capability check last
    
    [ Upstream commit 1b419c889c0767a5b66d0a6c566cae491f1cb0f7 ]
    
    capable() calls refer to enabled LSMs whether to permit or deny the
    request.  This is relevant in connection with SELinux, where a
    capability check results in a policy decision and by default a denial
    message on insufficient permission is issued.
    It can lead to three undesired cases:
      1. A denial message is generated, even in case the operation was an
         unprivileged one and thus the syscall succeeded, creating noise.
      2. To avoid the noise from 1. the policy writer adds a rule to ignore
         those denial messages, hiding future syscalls, where the task
         performs an actual privileged operation, leading to hidden limited
         functionality of that task.
      3. To avoid the noise from 1. the policy writer adds a rule to permit
         the task the requested capability, while it does not need it,
         violating the principle of least privilege.
    
    Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
    Reviewed-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://patch.msgid.link/20250302160657.127253-2-cgoettsche@seltendoof.de
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fbcon: Use correct erase colour for clearing in fbcon [+ + +]

Author: Zsolt Kajtar <soci@c64.rulez.org>
Date:   Sun Feb 2 21:33:46 2025 +0100

    fbcon: Use correct erase colour for clearing in fbcon
    
    [ Upstream commit 892c788d73fe4a94337ed092cb998c49fa8ecaf4 ]
    
    The erase colour calculation for fbcon clearing should use get_color instead
    of attr_col_ec, like everything else. The latter is similar but is not correct.
    For example it's missing the depth dependent remapping and doesn't care about
    blanking.
    
    The problem can be reproduced by setting up the background colour to grey
    (vt.color=0x70) and having an fbcon console set to 2bpp (4 shades of gray).
    Now the background attribute should be 1 (dark gray) on the console.
    
    If the screen is scrolled when pressing enter in a shell prompt at the bottom
    line then the new line is cleared using colour 7 instead of 1. That's not
    something fillrect likes (at 2bbp it expect 0-3) so the result is interesting.
    
    This patch switches to get_color with vc_video_erase_char to determine the
    erase colour from attr_col_ec. That makes the latter function redundant as
    no other users were left.
    
    Use correct erase colour for clearing in fbcon
    
    Signed-off-by: Zsolt Kajtar <soci@c64.rulez.org>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fbdev: core: tileblit: Implement missing margin clearing for tileblit [+ + +]

Author: Zsolt Kajtar <soci@c64.rulez.org>
Date:   Sat Feb 1 09:18:09 2025 +0100

    fbdev: core: tileblit: Implement missing margin clearing for tileblit
    
    [ Upstream commit 76d3ca89981354e1f85a3e0ad9ac4217d351cc72 ]
    
    I was wondering why there's garbage at the bottom of the screen when
    tile blitting is used with an odd mode like 1080, 600 or 200. Sure there's
    only space for half a tile but the same area is clean when the buffer
    is bitmap.
    
    Then later I found that it's supposed to be cleaned but that's not
    implemented. So I took what's in bitblit and adapted it for tileblit.
    
    This implementation was tested for both the horizontal and vertical case,
    and now does the same as what's done for bitmap buffers.
    
    If anyone is interested to reproduce the problem then I could bet that'd
    be on a S3 or Ark. Just set up a mode with an odd line count and make
    sure that the virtual size covers the complete tile at the bottom. E.g.
    for 600 lines that's 608 virtual lines for a 16 tall tile. Then the
    bottom area should be cleaned.
    
    For the right side it's more difficult as there the drivers won't let an
    odd size happen, unless the code is modified. But once it reports back a
    few pixel columns short then fbcon won't use the last column. With the
    patch that column is now clean.
    
    Btw. the virtual size should be rounded up by the driver for both axes
    (not only the horizontal) so that it's dividable by the tile size.
    That's a driver bug but correcting it is not in scope for this patch.
    
    Implement missing margin clearing for tileblit
    
    Signed-off-by: Zsolt Kajtar <soci@c64.rulez.org>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fbdev: fsl-diu-fb: add missing device_remove_file() [+ + +]

Author: Shixiong Ou <oushixiong@kylinos.cn>
Date:   Mon Mar 10 09:54:31 2025 +0800

    fbdev: fsl-diu-fb: add missing device_remove_file()
    
    [ Upstream commit 86d16cd12efa547ed43d16ba7a782c1251c80ea8 ]
    
    Call device_remove_file() when driver remove.
    
    Signed-off-by: Shixiong Ou <oushixiong@kylinos.cn>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

firmware: arm_ffa: Set dma_mask for ffa devices [+ + +]

Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Fri Jan 17 15:35:52 2025 +0530

    firmware: arm_ffa: Set dma_mask for ffa devices
    
    [ Upstream commit cc0aac7ca17e0ea3ca84b552fc79f3e86fd07f53 ]
    
    Set dma_mask for FFA devices, otherwise DMA allocation using the device pointer
    lead to following warning:
    
    WARNING: CPU: 1 PID: 1 at kernel/dma/mapping.c:597 dma_alloc_attrs+0xe0/0x124
    
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
    Message-Id: <e3dd8042ac680bd74b6580c25df855d092079c18.1737107520.git.viresh.kumar@linaro.org>
    Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fork: use pidfd_prepare() [+ + +]

Author: Christian Brauner <brauner@kernel.org>
Date:   Mon Mar 27 20:22:52 2023 +0200

    fork: use pidfd_prepare()
    
    commit ca7707f5430ad6b1c9cb7cee0a7f67d69328bb2d upstream.
    
    Stop open-coding get_unused_fd_flags() and anon_inode_getfile(). That's
    brittle just for keeping the flags between both calls in sync. Use the
    dedicated helper.
    
    Message-Id: <20230327-pidfd-file-api-v1-2-5c0e9a3158e4@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fpga: altera-cvp: Increase credit timeout [+ + +]

Author: Kuhanh Murugasen Krishnan <kuhanh.murugasen.krishnan@intel.com>
Date:   Thu Feb 13 06:12:49 2025 +0800

    fpga: altera-cvp: Increase credit timeout
    
    [ Upstream commit 0f05886a40fdc55016ba4d9ae0a9c41f8312f15b ]
    
    Increase the timeout for SDM (Secure device manager) data credits from
    20ms to 40ms. Internal stress tests running at 500 loops failed with the
    current timeout of 20ms. At the start of a FPGA configuration, the CVP
    host driver reads the transmit credits from SDM. It then sends bitstream
    FPGA data to SDM based on the total credits. Each credit allows the
    CVP host driver to send 4kBytes of data. There are situations whereby,
    the SDM did not respond in time during testing.
    
    Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
    Signed-off-by: Kuhanh Murugasen Krishnan <kuhanh.murugasen.krishnan@intel.com>
    Acked-by: Xu Yilun <yilun.xu@intel.com>
    Link: https://lore.kernel.org/r/20250212221249.2715929-1-tien.sung.ang@intel.com
    Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fuse: Return EPERM rather than ENOSYS from link() [+ + +]

Author: Matt Johnston <matt@codeconstruct.com.au>
Date:   Fri Feb 14 09:17:53 2025 +0800

    fuse: Return EPERM rather than ENOSYS from link()
    
    [ Upstream commit 8344213571b2ac8caf013cfd3b37bc3467c3a893 ]
    
    link() is documented to return EPERM when a filesystem doesn't support
    the operation, return that instead.
    
    Link: https://github.com/libfuse/libfuse/issues/925
    Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie [+ + +]

Author: Jason Gunthorpe <jgg@ziepe.ca>
Date:   Wed Feb 19 17:31:36 2025 -0800

    genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
    
    [ Upstream commit 1f7df3a691740a7736bbc99dc4ed536120eb4746 ]
    
    The IOMMU translation for MSI message addresses has been a 2-step process,
    separated in time:
    
     1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address
        is stored in the MSI descriptor when an MSI interrupt is allocated.
    
     2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a
        translated message address.
    
    This has an inherent lifetime problem for the pointer stored in the cookie
    that must remain valid between the two steps. However, there is no locking
    at the irq layer that helps protect the lifetime. Today, this works under
    the assumption that the iommu domain is not changed while MSI interrupts
    being programmed. This is true for normal DMA API users within the kernel,
    as the iommu domain is attached before the driver is probed and cannot be
    changed while a driver is attached.
    
    Classic VFIO type1 also prevented changing the iommu domain while VFIO was
    running as it does not support changing the "container" after starting up.
    
    However, iommufd has improved this so that the iommu domain can be changed
    during VFIO operation. This potentially allows userspace to directly race
    VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and
    VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()).
    
    This potentially causes both the cookie pointer and the unlocked call to
    iommu_get_domain_for_dev() on the MSI translation path to become UAFs.
    
    Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
    address is already known during iommu_dma_prepare_msi() and cannot change.
    Thus, it can simply be stored as an integer in the MSI descriptor.
    
    The other UAF related to iommu_get_domain_for_dev() will be addressed in
    patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by
    using the IOMMU group mutex.
    
    Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com
    Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

gfs2: Check for empty queue in run_queue [+ + +]

Author: Andreas Gruenbacher <agruenba@redhat.com>
Date:   Thu Feb 6 14:58:39 2025 +0100

    gfs2: Check for empty queue in run_queue
    
    [ Upstream commit d838605fea6eabae3746a276fd448f6719eb3926 ]
    
    In run_queue(), check if the queue of pending requests is empty instead
    of blindly assuming that it won't be.
    
    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

gpio: pca953x: Add missing header(s) [+ + +]

Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Fri Oct 7 16:44:44 2022 +0300

    gpio: pca953x: Add missing header(s)
    
    [ Upstream commit c20a395f9bf939ef0587ce5fa14316ac26252e9b ]
    
    Do not imply that some of the generic headers may be always included.
    Instead, include explicitly what we are direct user of.
    
    While at it, sort headers alphabetically.
    
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Stable-dep-of: 3e38f946062b ("gpio: pca953x: fix IRQ storm on system wake up")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

gpio: pca953x: fix IRQ storm on system wake up [+ + +]

Author: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Date:   Mon May 12 11:54:41 2025 +0200

    gpio: pca953x: fix IRQ storm on system wake up
    
    [ Upstream commit 3e38f946062b4845961ab86b726651b4457b2af8 ]
    
    If an input changes state during wake-up and is used as an interrupt
    source, the IRQ handler reads the volatile input register to clear the
    interrupt mask and deassert the IRQ line. However, the IRQ handler is
    triggered before access to the register is granted, causing the read
    operation to fail.
    
    As a result, the IRQ handler enters a loop, repeatedly printing the
    "failed reading register" message, until `pca953x_resume()` is eventually
    called, which restores the driver context and enables access to
    registers.
    
    Fix by disabling the IRQ line before entering suspend mode, and
    re-enabling it after the driver context is restored in `pca953x_resume()`.
    
    An IRQ can be disabled with disable_irq() and still wake the system as
    long as the IRQ has wake enabled, so the wake-up functionality is
    preserved.
    
    Fixes: b76574300504 ("gpio: pca953x: Restore registers after suspend/resume cycle")
    Cc: stable@vger.kernel.org
    Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
    Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Link: https://lore.kernel.org/r/20250512095441.31645-1-francesco@dolcini.it
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

gpio: pca953x: Simplify code with cleanup helpers [+ + +]

Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Fri Sep 1 16:40:36 2023 +0300

    gpio: pca953x: Simplify code with cleanup helpers
    
    [ Upstream commit 8e471b784a720f6f34f9fb449ba0744359dcaccb ]
    
    Use macros defined in linux/cleanup.h to automate resource lifetime
    control in gpio-pca953x.
    
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Stable-dep-of: 3e38f946062b ("gpio: pca953x: fix IRQ storm on system wake up")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

gpio: pca953x: Split pca953x_restore_context() and pca953x_save_context() [+ + +]

Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Fri Sep 1 16:40:35 2023 +0300

    gpio: pca953x: Split pca953x_restore_context() and pca953x_save_context()
    
    [ Upstream commit ec5bde62019b0a5300c67bd81b9864a8ea12274e ]
    
    Split regcache handling to the respective helpers. It will allow to
    have further refactoring with ease.
    
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Stable-dep-of: 3e38f946062b ("gpio: pca953x: fix IRQ storm on system wake up")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

HID: quirks: Add ADATA XPG alpha wireless mouse support [+ + +]

Author: Milton Barrera <miltonjosue2001@gmail.com>
Date:   Wed Apr 9 00:04:28 2025 -0600

    HID: quirks: Add ADATA XPG alpha wireless mouse support
    
    [ Upstream commit fa9fdeea1b7d6440c22efa6d59a769eae8bc89f1 ]
    
    This patch adds HID_QUIRK_ALWAYS_POLL for the ADATA XPG wireless gaming mouse (USB ID 125f:7505) and its USB dongle (USB ID 125f:7506). Without this quirk, the device does not generate input events properly.
    
    Signed-off-by: Milton Barrera <miltonjosue2001@gmail.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

HID: usbkbd: Fix the bit shift number for LED_KANA [+ + +]

Author: junan <junan76@163.com>
Date:   Thu Nov 28 10:35:18 2024 +0800

    HID: usbkbd: Fix the bit shift number for LED_KANA
    
    [ Upstream commit d73a4bfa2881a6859b384b75a414c33d4898b055 ]
    
    Since "LED_KANA" was defined as "0x04", the shift number should be "4".
    
    Signed-off-by: junan <junan76@163.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING [+ + +]

Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Sat Jan 18 00:24:33 2025 +0100

    hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
    
    commit 53dac345395c0d2493cbc2f4c85fe38aef5b63f5 upstream.
    
    hrtimers are migrated away from the dying CPU to any online target at
    the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
    handling tasks involved in the CPU hotplug forward progress.
    
    However wakeups can still be performed by the outgoing CPU after
    CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers being
    armed. Depending on several considerations (crystal ball power management
    based election, earliest timer already enqueued, timer migration enabled or
    not), the target may eventually be the current CPU even if offline. If that
    happens, the timer is eventually ignored.
    
    The most notable example is RCU which had to deal with each and every of
    those wake-ups by deferring them to an online CPU, along with related
    workarounds:
    
    _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
    _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
    _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
    
    The problem isn't confined to RCU though as the stop machine kthread
    (which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
    of its work through cpu_stop_signal_done() and performs a wake up that
    eventually arms the deadline server timer:
    
       WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
       CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
       Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
       RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
       Call Trace:
       <TASK>
         start_dl_timer
         enqueue_dl_entity
         dl_server_start
         enqueue_task_fair
         enqueue_task
         ttwu_do_activate
         try_to_wake_up
         complete
         cpu_stopper_thread
    
    Instead of providing yet another bandaid to work around the situation, fix
    it in the hrtimers infrastructure instead: always migrate away a timer to
    an online target whenever it is enqueued from an offline CPU.
    
    This will also allow to revert all the above RCU disgraceful hacks.
    
    Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
    Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
    Reported-by: Usama Arif <usamaarif642@gmail.com>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Tested-by: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lore.kernel.org/all/20250117232433.24027-1-frederic@kernel.org
    Closes: 20241213203739.1519801-1-usamaarif642@gmail.com
    Signed-off-by: Zhaoyang Li <lizy04@hust.edu.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (dell-smm) Increment the number of fans [+ + +]

Author: Kurt Borja <kuurtb@gmail.com>
Date:   Tue Mar 4 00:52:50 2025 -0500

    hwmon: (dell-smm) Increment the number of fans
    
    [ Upstream commit dbcfcb239b3b452ef8782842c36fb17dd1b9092f ]
    
    Some Alienware laptops that support the SMM interface, may have up to 4
    fans.
    
    Tested on an Alienware x15 r1.
    
    Signed-off-by: Kurt Borja <kuurtb@gmail.com>
    Link: https://lore.kernel.org/r/20250304055249.51940-2-kuurtb@gmail.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

hwmon: (gpio-fan) Add missing mutex locks [+ + +]

Author: Alexander Stein <alexander.stein@ew.tq-group.com>
Date:   Mon Feb 10 15:59:30 2025 +0100

    hwmon: (gpio-fan) Add missing mutex locks
    
    [ Upstream commit 9fee7d19bab635f89223cc40dfd2c8797fdc4988 ]
    
    set_fan_speed() is expected to be called with fan_data->lock being locked.
    Add locking for proper synchronization.
    
    Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
    Link: https://lore.kernel.org/r/20250210145934.761280-3-alexander.stein@ew.tq-group.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

hwmon: (xgene-hwmon) use appropriate type for the latency value [+ + +]

Author: Andrey Vatoropin <a.vatoropin@crpt.ru>
Date:   Tue Feb 4 09:54:08 2025 +0000

    hwmon: (xgene-hwmon) use appropriate type for the latency value
    
    [ Upstream commit 8df0f002827e18632dcd986f7546c1abf1953a6f ]
    
    The expression PCC_NUM_RETRIES * pcc_chan->latency is currently being
    evaluated using 32-bit arithmetic.
    
    Since a value of type 'u64' is used to store the eventual result,
    and this result is later sent to the function usecs_to_jiffies with
    input parameter unsigned int, the current data type is too wide to
    store the value of ctx->usecs_lat.
    
    Change the data type of "usecs_lat" to a more suitable (narrower) type.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
    Link: https://lore.kernel.org/r/20250204095400.95013-1-a.vatoropin@crpt.ru
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

i2c: pxa: fix call balance of i2c->clk handling routines [+ + +]

Author: Vitalii Mordan <mordan@ispras.ru>
Date:   Wed Feb 12 20:28:03 2025 +0300

    i2c: pxa: fix call balance of i2c->clk handling routines
    
    [ Upstream commit be7113d2e2a6f20cbee99c98d261a1fd6fd7b549 ]
    
    If the clock i2c->clk was not enabled in i2c_pxa_probe(), it should not be
    disabled in any path.
    
    Found by Linux Verification Center (linuxtesting.org) with Klever.
    
    Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Link: https://lore.kernel.org/r/20250212172803.1422136-1-mordan@ispras.ru
    Signed-off-by: Sasha Levin <sashal@kernel.org>

i2c: qup: Vote for interconnect bandwidth to DRAM [+ + +]

Author: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Date:   Tue Nov 28 10:48:37 2023 +0100

    i2c: qup: Vote for interconnect bandwidth to DRAM
    
    [ Upstream commit d4f35233a6345f62637463ef6e0708f44ffaa583 ]
    
    When the I2C QUP controller is used together with a DMA engine it needs
    to vote for the interconnect path to the DRAM. Otherwise it may be
    unable to access the memory quickly enough.
    
    The requested peak bandwidth is dependent on the I2C core clock.
    
    To avoid sending votes too often the bandwidth is always requested when
    a DMA transfer starts, but dropped only on runtime suspend. Runtime
    suspend should only happen if no transfer is active. After resumption we
    can defer the next vote until the first DMA transfer actually happens.
    
    The implementation is largely identical to the one introduced for
    spi-qup in commit ecdaa9473019 ("spi: qup: Vote for interconnect
    bandwidth to DRAM") since both drivers represent the same hardware
    block.
    
    Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Link: https://lore.kernel.org/r/20231128-i2c-qup-dvfs-v1-3-59a0e3039111@kernkonzept.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

i3c: master: svc: Fix implicit fallthrough in svc_i3c_master_ibi_work() [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Wed Mar 19 09:08:01 2025 -0700

    i3c: master: svc: Fix implicit fallthrough in svc_i3c_master_ibi_work()
    
    commit e8d2d287e26d9bd9114cf258a123a6b70812442e upstream.
    
    Clang warns (or errors with CONFIG_WERROR=y):
    
      drivers/i3c/master/svc-i3c-master.c:596:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
        596 |         default:
            |         ^
      drivers/i3c/master/svc-i3c-master.c:596:2: note: insert 'break;' to avoid fall-through
        596 |         default:
            |         ^
            |         break;
      1 error generated.
    
    Clang is a little more pedantic than GCC, which does not warn when
    falling through to a case that is just break or return. Clang's version
    is more in line with the kernel's own stance in deprecated.rst, which
    states that all switch/case blocks must end in either break,
    fallthrough, continue, goto, or return. Add the missing break to silence
    the warning.
    
    Fixes: 0430bf9bc1ac ("i3c: master: svc: Fix missing STOP for master request")
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Link: https://lore.kernel.org/r/20250319-i3c-fix-clang-fallthrough-v1-1-d8e02be1ef5c@kernel.org
    Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i3c: master: svc: Fix missing STOP for master request [+ + +]

Author: Stanley Chu <yschu@nuvoton.com>
Date:   Tue Mar 18 13:36:06 2025 +0800

    i3c: master: svc: Fix missing STOP for master request
    
    [ Upstream commit 0430bf9bc1ac068c8b8c540eb93e5751872efc51 ]
    
    The controller driver nacked the master request but didn't emit a
    STOP to end the transaction. The driver shall refuse the unsupported
    requests and return the controller state to IDLE by emitting a STOP.
    
    Signed-off-by: Stanley Chu <yschu@nuvoton.com>
    Reviewed-by: Frank Li <Frank.Li@nxp.com>
    Link: https://lore.kernel.org/r/20250318053606.3087121-4-yschu@nuvoton.com
    Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

i3c: master: svc: Flush FIFO before sending Dynamic Address Assignment(DAA) [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Wed Jan 29 11:22:50 2025 -0500

    i3c: master: svc: Flush FIFO before sending Dynamic Address Assignment(DAA)
    
    [ Upstream commit a892ee4cf22a50e1d6988d0464a9a421f3e5db2f ]
    
    Ensure the FIFO is empty before issuing the DAA command to prevent
    incorrect command data from being sent. Align with other data transfers,
    such as svc_i3c_master_start_xfer_locked(), which flushes the FIFO before
    sending a command.
    
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Link: https://lore.kernel.org/r/20250129162250.3629189-1-Frank.Li@nxp.com
    Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: count combined queues using Rx/Tx count [+ + +]

Author: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Date:   Tue Dec 3 07:58:09 2024 +0100

    ice: count combined queues using Rx/Tx count
    
    [ Upstream commit c3a392bdd31adc474f1009ee85c13fdd01fe800d ]
    
    Previous implementation assumes that there is 1:1 matching between
    vectors and queues. It isn't always true.
    
    Get minimum value from Rx/Tx queues to determine combined queues number.
    
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
    Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: fix vf->num_mac count with port representors [+ + +]

Author: Jacob Keller <jacob.e.keller@intel.com>
Date:   Thu Apr 10 11:13:52 2025 -0700

    ice: fix vf->num_mac count with port representors
    
    [ Upstream commit bbd95160a03dbfcd01a541f25c27ddb730dfbbd5 ]
    
    The ice_vc_repr_add_mac() function indicates that it does not store the MAC
    address filters in the firmware. However, it still increments vf->num_mac.
    This is incorrect, as vf->num_mac should represent the number of MAC
    filters currently programmed to firmware.
    
    Indeed, we only perform this increment if the requested filter is a unicast
    address that doesn't match the existing vf->hw_lan_addr. In addition,
    ice_vc_repr_del_mac() does not decrement the vf->num_mac counter. This
    results in the counter becoming out of sync with the actual count.
    
    As it turns out, vf->num_mac is currently only used in legacy made without
    port representors. The single place where the value is checked is for
    enforcing a filter limit on untrusted VFs.
    
    Upcoming patches to support VF Live Migration will use this value when
    determining the size of the TLV for MAC address filters. Fix the
    representor mode function to stop incrementing the counter incorrectly.
    
    Fixes: ac19e03ef780 ("ice: allow process VF opcodes in different ways")
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ieee802154: ca8210: Use proper setters and getters for bitwise types [+ + +]

Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Wed Mar 5 12:55:34 2025 +0200

    ieee802154: ca8210: Use proper setters and getters for bitwise types
    
    [ Upstream commit 169b2262205836a5d1213ff44dca2962276bece1 ]
    
    Sparse complains that the driver doesn't respect the bitwise types:
    
    drivers/net/ieee802154/ca8210.c:1796:27: warning: incorrect type in assignment (different base types)
    drivers/net/ieee802154/ca8210.c:1796:27:    expected restricted __le16 [addressable] [assigned] [usertype] pan_id
    drivers/net/ieee802154/ca8210.c:1796:27:    got unsigned short [usertype]
    drivers/net/ieee802154/ca8210.c:1801:25: warning: incorrect type in assignment (different base types)
    drivers/net/ieee802154/ca8210.c:1801:25:    expected restricted __le16 [addressable] [assigned] [usertype] pan_id
    drivers/net/ieee802154/ca8210.c:1801:25:    got unsigned short [usertype]
    drivers/net/ieee802154/ca8210.c:1928:28: warning: incorrect type in argument 3 (different base types)
    drivers/net/ieee802154/ca8210.c:1928:28:    expected unsigned short [usertype] dst_pan_id
    drivers/net/ieee802154/ca8210.c:1928:28:    got restricted __le16 [addressable] [usertype] pan_id
    
    Use proper setters and getters for bitwise types.
    
    Note, in accordance with [1] the protocol is little endian.
    
    Link: https://www.cascoda.com/wp-content/uploads/2018/11/CA-8210_datasheet_0418.pdf [1]
    Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Link: https://lore.kernel.org/20250305105656.2133487-2-andriy.shevchenko@linux.intel.com
    Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

io_uring/fdinfo: annotate racy sq/cq head/tail reads [+ + +]

Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Apr 30 07:17:17 2025 -0600

    io_uring/fdinfo: annotate racy sq/cq head/tail reads
    
    [ Upstream commit f024d3a8ded0d8d2129ae123d7a5305c29ca44ce ]
    
    syzbot complains about the cached sq head read, and it's totally right.
    But we don't need to care, it's just reading fdinfo, and reading the
    CQ or SQ tail/head entries are known racy in that they are just a view
    into that very instant and may of course be outdated by the time they
    are reported.
    
    Annotate both the SQ head and CQ tail read with data_race() to avoid
    this syzbot complaint.
    
    Link: https://lore.kernel.org/io-uring/6811f6dc.050a0220.39e3a1.0d0e.GAE@google.com/
    Reported-by: syzbot+3e77fd302e99f5af9394@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

io_uring: fix overflow resched cqe reordering [+ + +]

Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Sat May 17 13:27:37 2025 +0100

    io_uring: fix overflow resched cqe reordering
    
    [ Upstream commit a7d755ed9ce9738af3db602eb29d32774a180bc7 ]
    
    Leaving the CQ critical section in the middle of a overflow flushing
    can cause cqe reordering since the cache cq pointers are reset and any
    new cqe emitters that might get called in between are not going to be
    forced into io_cqe_cache_refill().
    
    Fixes: eac2ca2d682f9 ("io_uring: check if we need to reschedule during overflow flush")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/amd/pgtbl_v2: Improve error handling [+ + +]

Author: Vasant Hegde <vasant.hegde@amd.com>
Date:   Thu Feb 27 16:23:16 2025 +0000

    iommu/amd/pgtbl_v2: Improve error handling
    
    [ Upstream commit 36a1cfd497435ba5e37572fe9463bb62a7b1b984 ]
    
    Return -ENOMEM if v2_alloc_pte() fails to allocate memory.
    
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/20250227162320.5805-4-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure(). [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Fri Feb 7 16:24:58 2025 +0900

    ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().
    
    [ Upstream commit 5a1ccffd30a08f5a2428cd5fbb3ab03e8eb6c66d ]
    
    The following patch will not set skb->sk from VRF path.
    
    Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk)
    in fib[46]_rule_configure().
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Tested-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config(). [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Thu Feb 27 20:23:27 2025 -0800

    ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().
    
    [ Upstream commit 254ba7e6032d3fc738050d500b0c1d8197af90ca ]
    
    fib_valid_key_len() is called in the beginning of fib_table_insert()
    or fib_table_delete() to check if the prefix length is valid.
    
    fib_table_insert() and fib_table_delete() are called from 3 paths
    
      - ip_rt_ioctl()
      - inet_rtm_newroute() / inet_rtm_delroute()
      - fib_magic()
    
    In the first ioctl() path, rtentry_to_fib_config() checks the prefix
    length with bad_mask().  Also, fib_magic() always passes the correct
    prefix: 32 or ifa->ifa_prefixlen, which is already validated.
    
    Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config().
    
    While at it, 2 direct returns in rtm_to_fib_config() are changed to
    goto to match other places in the same function
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: save dontfrag in cork [+ + +]

Author: Willem de Bruijn <willemb@google.com>
Date:   Thu Mar 6 22:34:09 2025 -0500

    ipv6: save dontfrag in cork
    
    [ Upstream commit a18dfa9925b9ef6107ea3aa5814ca3c704d34a8a ]
    
    When spanning datagram construction over multiple send calls using
    MSG_MORE, per datagram settings are configured on the first send.
    
    That is when ip(6)_setup_cork stores these settings for subsequent use
    in __ip(6)_append_data and others.
    
    The only flag that escaped this was dontfrag. As a result, a datagram
    could be constructed with df=0 on the first sendmsg, but df=1 on a
    next. Which is what cmsg_ip.sh does in an upcoming MSG_MORE test in
    the "diff" scenario.
    
    Changing datagram conditions in the middle of constructing an skb
    makes this already complex code path even more convoluted. It is here
    unintentional. Bring this flag in line with expected sockopt/cmsg
    behavior.
    
    And stop passing ipc6 to __ip6_append_data, to avoid such issues
    in the future. This is already the case for __ip_append_data.
    
    inet6_cork had a 6 byte hole, so the 1B flag has no impact.
    
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20250307033620.411611-3-willemdebruijn.kernel@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kbuild: Disable -Wdefault-const-init-unsafe [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Tue May 6 14:02:01 2025 -0700

    kbuild: Disable -Wdefault-const-init-unsafe
    
    commit d0afcfeb9e3810ec89d1ffde1a0e36621bb75dca upstream.
    
    A new on by default warning in clang [1] aims to flags instances where
    const variables without static or thread local storage or const members
    in aggregate types are not initialized because it can lead to an
    indeterminate value. This is quite noisy for the kernel due to
    instances originating from header files such as:
    
      drivers/gpu/drm/i915/gt/intel_ring.h:62:2: error: default initialization of an object of type 'typeof (ring->size)' (aka 'const unsigned int') leaves the object uninitialized [-Werror,-Wdefault-const-init-var-unsafe]
         62 |         typecheck(typeof(ring->size), next);
            |         ^
      include/linux/typecheck.h:10:9: note: expanded from macro 'typecheck'
         10 | ({      type __dummy; \
            |              ^
    
      include/net/ip.h:478:14: error: default initialization of an object of type 'typeof (rt->dst.expires)' (aka 'const unsigned long') leaves the object uninitialized [-Werror,-Wdefault-const-init-var-unsafe]
        478 |                 if (mtu && time_before(jiffies, rt->dst.expires))
            |                            ^
      include/linux/jiffies.h:138:26: note: expanded from macro 'time_before'
        138 | #define time_before(a,b)        time_after(b,a)
            |                                 ^
      include/linux/jiffies.h:128:3: note: expanded from macro 'time_after'
        128 |         (typecheck(unsigned long, a) && \
            |          ^
      include/linux/typecheck.h:11:12: note: expanded from macro 'typecheck'
         11 |         typeof(x) __dummy2; \
            |                   ^
    
      include/linux/list.h:409:27: warning: default initialization of an object of type 'union (unnamed union at include/linux/list.h:409:27)' with const member leaves the object uninitialized [-Wdefault-const-init-field-unsafe]
        409 |         struct list_head *next = smp_load_acquire(&head->next);
            |                                  ^
      include/asm-generic/barrier.h:176:29: note: expanded from macro 'smp_load_acquire'
        176 | #define smp_load_acquire(p) __smp_load_acquire(p)
            |                             ^
      arch/arm64/include/asm/barrier.h:164:59: note: expanded from macro '__smp_load_acquire'
        164 |         union { __unqual_scalar_typeof(*p) __val; char __c[1]; } __u;   \
            |                                                                  ^
      include/linux/list.h:409:27: note: member '__val' declared 'const' here
    
      crypto/scatterwalk.c:66:22: error: default initialization of an object of type 'struct scatter_walk' with const member leaves the object uninitialized [-Werror,-Wdefault-const-init-field-unsafe]
         66 |         struct scatter_walk walk;
            |                             ^
      include/crypto/algapi.h:112:15: note: member 'addr' declared 'const' here
        112 |                 void *const addr;
            |                             ^
    
      fs/hugetlbfs/inode.c:733:24: error: default initialization of an object of type 'struct vm_area_struct' with const member leaves the object uninitialized [-Werror,-Wdefault-const-init-field-unsafe]
        733 |         struct vm_area_struct pseudo_vma;
            |                               ^
      include/linux/mm_types.h:803:20: note: member 'vm_flags' declared 'const' here
        803 |                 const vm_flags_t vm_flags;
            |                                  ^
    
    Silencing the instances from typecheck.h is difficult because '= {}' is
    not available in older but supported compilers and '= {0}' would cause
    warnings about a literal 0 being treated as NULL. While it might be
    possible to come up with a local hack to silence the warning for
    clang-21+, it may not be worth it since -Wuninitialized will still
    trigger if an uninitialized const variable is actually used.
    
    In all audited cases of the "field" variant of the warning, the members
    are either not used in the particular call path, modified through other
    means such as memset() / memcpy() because the containing object is not
    const, or are within a union with other non-const members.
    
    Since this warning does not appear to have a high signal to noise ratio,
    just disable it.
    
    Cc: stable@vger.kernel.org
    Link: https://github.com/llvm/llvm-project/commit/576161cb6069e2c7656a8ef530727a0f4aefff30 [1]
    Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Closes: https://lore.kernel.org/CA+G9fYuNjKcxFKS_MKPRuga32XbndkLGcY-PVuoSwzv6VWbY=w@mail.gmail.com/
    Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com>
    Closes: https://github.com/ClangBuiltLinux/linux/issues/2088
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    [nathan: Apply change to Makefile instead of scripts/Makefile.extrawarn
             due to lack of e88ca24319e4 in older stable branches]
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kbuild: fix argument parsing in scripts/config [+ + +]

Author: Seyediman Seyedarab <imandevel@gmail.com>
Date:   Sat Mar 1 17:21:37 2025 -0500

    kbuild: fix argument parsing in scripts/config
    
    [ Upstream commit f757f6011c92b5a01db742c39149bed9e526478f ]
    
    The script previously assumed --file was always the first argument,
    which caused issues when it appeared later. This patch updates the
    parsing logic to scan all arguments to find --file, sets the config
    file correctly, and resets the argument list with the remaining
    commands.
    
    It also fixes --refresh to respect --file by passing KCONFIG_CONFIG=$FN
    to make oldconfig.
    
    Signed-off-by: Seyediman Seyedarab <imandevel@gmail.com>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kconfig: merge_config: use an empty file as initfile [+ + +]

Author: Daniel Gomez <da.gomez@samsung.com>
Date:   Fri Mar 28 14:28:37 2025 +0000

    kconfig: merge_config: use an empty file as initfile
    
    [ Upstream commit a26fe287eed112b4e21e854f173c8918a6a8596d ]
    
    The scripts/kconfig/merge_config.sh script requires an existing
    $INITFILE (or the $1 argument) as a base file for merging Kconfig
    fragments. However, an empty $INITFILE can serve as an initial starting
    point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
    if -m is not used. This variable can point to any configuration file
    containing preset config symbols (the merged output) as stated in
    Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
    contain just the merge output requiring the user to run make (i.e.
    KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
    olddefconfig).
    
    Instead of failing when `$INITFILE` is missing, create an empty file and
    use it as the starting point for merges.
    
    Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ksmbd: fix stream write failure [+ + +]

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Thu May 8 16:46:11 2025 +0900

    ksmbd: fix stream write failure
    
    [ Upstream commit 1f4bbedd4e5a69b01cde2cc21d01151ab2d0884f ]
    
    If there is no stream data in file, v_len is zero.
    So, If position(*pos) is zero, stream write will fail
    due to stream write position validation check.
    This patch reorganize stream write position validation.
    
    Fixes: 0ca6df4f40cf ("ksmbd: prevent out-of-bounds stream writes by validating *pos")
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kunit: tool: Use qboot on QEMU x86_64 [+ + +]

Author: Brendan Jackman <jackmanb@google.com>
Date:   Fri Jan 24 11:01:42 2025 +0000

    kunit: tool: Use qboot on QEMU x86_64
    
    [ Upstream commit 08fafac4c9f289a9d9a22d838921e4b3eb22c664 ]
    
    As noted in [0], SeaBIOS (QEMU default) makes a mess of the terminal,
    qboot does not.
    
    It turns out this is actually useful with kunit.py, since the user is
    exposed to this issue if they set --raw_output=all.
    
    qboot is also faster than SeaBIOS, but it's is marginal for this
    usecase.
    
    [0] https://lore.kernel.org/all/CA+i-1C0wYb-gZ8Mwh3WSVpbk-LF-Uo+njVbASJPe1WXDURoV7A@mail.gmail.com/
    
    Both SeaBIOS and qboot are x86-specific.
    
    Link: https://lore.kernel.org/r/20250124-kunit-qboot-v1-1-815e4d4c6f7c@google.com
    Signed-off-by: Brendan Jackman <jackmanb@google.com>
    Reviewed-by: David Gow <davidgow@google.com>
    Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

leds: pwm-multicolor: Add check for fwnode_property_read_u32 [+ + +]

Author: Yuanjun Gong <ruc_gongyuanjun@163.com>
Date:   Sun Feb 23 20:14:59 2025 +0800

    leds: pwm-multicolor: Add check for fwnode_property_read_u32
    
    [ Upstream commit 6d91124e7edc109f114b1afe6d00d85d0d0ac174 ]
    
    Add a check to the return value of fwnode_property_read_u32()
    in case it fails.
    
    Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
    Link: https://lore.kernel.org/r/20250223121459.2889484-1-ruc_gongyuanjun@163.com
    Signed-off-by: Lee Jones <lee@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

libbpf: Fix out-of-bound read [+ + +]

Author: Nandakumar Edamana <nandakumar@nandakumar.co.in>
Date:   Sat Feb 22 02:31:11 2025 +0530

    libbpf: Fix out-of-bound read
    
    [ Upstream commit 236d3910117e9f97ebf75e511d8bcc950f1a4e5f ]
    
    In `set_kcfg_value_str`, an untrusted string is accessed with the assumption
    that it will be at least two characters long due to the presence of checks for
    opening and closing quotes. But the check for the closing quote
    (value[len - 1] != '"') misses the fact that it could be checking the opening
    quote itself in case of an invalid input that consists of just the opening
    quote.
    
    This commit adds an explicit check to make sure the string is at least two
    characters long.
    
    Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20250221210110.3182084-1-nandakumar@nandakumar.co.in
    Signed-off-by: Sasha Levin <sashal@kernel.org>

libnvdimm/labels: Fix divide error in nd_label_data_init() [+ + +]

Author: Robert Richter <rrichter@amd.com>
Date:   Thu Mar 20 12:22:22 2025 +0100

    libnvdimm/labels: Fix divide error in nd_label_data_init()
    
    [ Upstream commit ef1d3455bbc1922f94a91ed58d3d7db440652959 ]
    
    If a faulty CXL memory device returns a broken zero LSA size in its
    memory device information (Identify Memory Device (Opcode 4000h), CXL
    spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm
    driver:
    
     Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
     RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm]
    
    Code and flow:
    
    1) CXL Command 4000h returns LSA size = 0
    2) config_size is assigned to zero LSA size (CXL pmem driver):
    
    drivers/cxl/pmem.c:             .config_size = mds->lsa_size,
    
    3) max_xfer is set to zero (nvdimm driver):
    
    drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size);
    
    4) A subsequent DIV_ROUND_UP() causes a division by zero:
    
    drivers/nvdimm/label.c: /* Make our initial read size a multiple of max_xfer size */
    drivers/nvdimm/label.c: read_size = min(DIV_ROUND_UP(read_size, max_xfer) * max_xfer,
    drivers/nvdimm/label.c-                 config_size);
    
    Fix this by checking the config size parameter by extending an
    existing check.
    
    Signed-off-by: Robert Richter <rrichter@amd.com>
    Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
    Reviewed-by: Ira Weiny <ira.weiny@intel.com>
    Link: https://patch.msgid.link/20250320112223.608320-1-rrichter@amd.com
    Signed-off-by: Ira Weiny <ira.weiny@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Linux: Linux 6.1.141 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Jun 4 14:40:26 2025 +0200

    Linux 6.1.141
    
    Link: https://lore.kernel.org/r/20250602134319.723650984@linuxfoundation.org
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Miguel Ojeda <ojeda@kernel.org>
    Tested-by: Hardik Garg <hargar@linux.microsoft.com>
    Tested-by: Salvatore Bonaccorso <carnil@debian.org>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

llc: fix data loss when reading from a socket in llc_ui_recvmsg() [+ + +]

Author: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Date:   Thu May 15 12:20:15 2025 +0000

    llc: fix data loss when reading from a socket in llc_ui_recvmsg()
    
    commit 239af1970bcb039a1551d2c438d113df0010c149 upstream.
    
    For SOCK_STREAM sockets, if user buffer size (len) is less
    than skb size (skb->len), the remaining data from skb
    will be lost after calling kfree_skb().
    
    To fix this, move the statement for partial reading
    above skb deletion.
    
    Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org)
    
    Fixes: 30a584d944fb ("[LLX]: SOCK_DGRAM interface fixes")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lockdep: Fix wait context check on softirq for PREEMPT_RT [+ + +]

Author: Ryo Takakura <ryotkkr98@gmail.com>
Date:   Fri Mar 21 07:33:22 2025 -0700

    lockdep: Fix wait context check on softirq for PREEMPT_RT
    
    [ Upstream commit 61c39d8c83e2077f33e0a2c8980a76a7f323f0ce ]
    
    Since:
    
      0c1d7a2c2d32 ("lockdep: Remove softirq accounting on PREEMPT_RT.")
    
    the wait context test for mutex usage within "in softirq context" fails
    as it references @softirq_context:
    
        | wait context tests |
        --------------------------------------------------------------------------
                                       | rcu  | raw  | spin |mutex |
        --------------------------------------------------------------------------
                     in hardirq context:  ok  |  ok  |  ok  |  ok  |
      in hardirq context (not threaded):  ok  |  ok  |  ok  |  ok  |
                     in softirq context:  ok  |  ok  |  ok  |FAILED|
    
    As a fix, add lockdep map for BH disabled section. This fixes the
    issue by letting us catch cases when local_bh_disable() gets called
    with preemption disabled where local_lock doesn't get acquired.
    In the case of "in softirq context" selftest, local_bh_disable() was
    being called with preemption disable as it's early in the boot.
    
    [ boqun: Move the lockdep annotations into __local_bh_*() to avoid false
             positives because of unpaired local_bh_disable() reported by
             Borislav Petkov and Peter Zijlstra, and make bh_lock_map
             only exist for PREEMPT_RT. ]
    
    [ mingo: Restored authorship and improved the bh_lock_map definition. ]
    
    Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20250321143322.79651-1-boqun.feng@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mailbox: use error ret code of of_parse_phandle_with_args() [+ + +]

Author: Tudor Ambarus <tudor.ambarus@linaro.org>
Date:   Mon Feb 24 08:27:13 2025 +0000

    mailbox: use error ret code of of_parse_phandle_with_args()
    
    [ Upstream commit 24fdd5074b205cfb0ef4cd0751a2d03031455929 ]
    
    In case of error, of_parse_phandle_with_args() returns -EINVAL when the
    passed index is negative, or -ENOENT when the index is for an empty
    phandle. The mailbox core overwrote the error return code with a less
    precise -ENODEV. Use the error returned code from
    of_parse_phandle_with_args().
    
    Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
    Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: adv7180: Disable test-pattern control on adv7180 [+ + +]

Author: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Date:   Sat Feb 22 00:09:07 2025 +0100

    media: adv7180: Disable test-pattern control on adv7180
    
    [ Upstream commit a980bc5f56b0292336e408f657f79e574e8067c0 ]
    
    The register that enables selecting a test-pattern to be outputted in
    free-run mode (FREE_RUN_PAT_SEL[2:0]) is only available on adv7280 based
    devices, not the adv7180 based ones.
    
    Add a flag to mark devices that are capable of generating test-patterns,
    and those that are not. And only register the control on supported
    devices.
    
    Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
    Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: c8sectpfe: Call of_node_put(i2c_bus) only once in c8sectpfe_probe() [+ + +]

Author: Markus Elfring <elfring@users.sourceforge.net>
Date:   Fri Oct 4 15:50:15 2024 +0200

    media: c8sectpfe: Call of_node_put(i2c_bus) only once in c8sectpfe_probe()
    
    [ Upstream commit b773530a34df0687020520015057075f8b7b4ac4 ]
    
    An of_node_put(i2c_bus) call was immediately used after a pointer check
    for an of_find_i2c_adapter_by_node() call in this function implementation.
    Thus call such a function only once instead directly before the check.
    
    This issue was transformed by using the Coccinelle software.
    
    Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
    Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: cx231xx: set device_caps for 417 [+ + +]

Author: Hans Verkuil <hverkuil@xs4all.nl>
Date:   Mon Feb 24 14:13:24 2025 +0100

    media: cx231xx: set device_caps for 417
    
    [ Upstream commit a79efc44b51432490538a55b9753a721f7d3ea42 ]
    
    The video_device for the MPEG encoder did not set device_caps.
    
    Add this, otherwise the video device can't be registered (you get a
    WARN_ON instead).
    
    Not seen before since currently 417 support is disabled, but I found
    this while experimenting with it.
    
    Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: qcom: camss: csid: Only add TPG v4l2 ctrl if TPG hardware is available [+ + +]

Author: Depeng Shao <quic_depengs@quicinc.com>
Date:   Mon Jan 13 10:01:28 2025 +0530

    media: qcom: camss: csid: Only add TPG v4l2 ctrl if TPG hardware is available
    
    [ Upstream commit 2f1361f862a68063f37362f1beb400e78e289581 ]
    
    There is no CSID TPG on some SoCs, so the v4l2 ctrl in CSID driver
    shouldn't be registered. Checking the supported TPG modes to indicate
    if the TPG hardware exists or not and only registering v4l2 ctrl for
    CSID only when the TPG hardware is present.
    
    Signed-off-by: Depeng Shao <quic_depengs@quicinc.com>
    Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: test-drivers: vivid: don't call schedule in loop [+ + +]

Author: Hans Verkuil <hverkuil@xs4all.nl>
Date:   Mon Dec 9 16:00:16 2024 +0100

    media: test-drivers: vivid: don't call schedule in loop
    
    [ Upstream commit e4740118b752005cbed339aec9a1d1c43620e0b9 ]
    
    Artem reported that the CPU load was 100% when capturing from
    vivid at low resolution with ffmpeg.
    
    This was caused by:
    
    while (time_is_after_jiffies(cur_jiffies + wait_jiffies) &&
           !kthread_should_stop())
            schedule();
    
    If there are no other processes running that can be scheduled,
    then this is basically a busy-loop.
    
    Change it to wait_event_interruptible_timeout() which doesn't
    have that problem.
    
    Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
    Reported-by: Artem S. Tashkinov <aros@gmx.com>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219570
    Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: uvcvideo: Add sanity check to uvc_ioctl_xu_ctrl_map [+ + +]

Author: Ricardo Ribalda <ribalda@chromium.org>
Date:   Mon Feb 3 11:55:51 2025 +0000

    media: uvcvideo: Add sanity check to uvc_ioctl_xu_ctrl_map
    
    [ Upstream commit 990262fdfce24d6055df9711424343d94d829e6a ]
    
    Do not process unknown data types.
    
    Tested-by: Yunke Cao <yunkec@google.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
    Link: https://lore.kernel.org/r/20250203-uvc-roi-v17-15-5900a9fed613@chromium.org
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

memcg: always call cond_resched() after fn() [+ + +]

Author: Breno Leitao <leitao@debian.org>
Date:   Fri May 23 10:21:06 2025 -0700

    memcg: always call cond_resched() after fn()
    
    commit 06717a7b6c86514dbd6ab322e8083ffaa4db5712 upstream.
    
    I am seeing soft lockup on certain machine types when a cgroup OOMs.  This
    is happening because killing the process in certain machine might be very
    slow, which causes the soft lockup and RCU stalls.  This happens usually
    when the cgroup has MANY processes and memory.oom.group is set.
    
    Example I am seeing in real production:
    
           [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) ....
           ....
           [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) ....
           [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982]
           ....
    
    Quick look at why this is so slow, it seems to be related to serial flush
    for certain machine types.  For all the crashes I saw, the target CPU was
    at console_flush_all().
    
    In the case above, there are thousands of processes in the cgroup, and it
    is soft locking up before it reaches the 1024 limit in the code (which
    would call the cond_resched()).  So, cond_resched() in 1024 blocks is not
    sufficient.
    
    Remove the counter-based conditional rescheduling logic and call
    cond_resched() unconditionally after each task iteration, after fn() is
    called.  This avoids the lockup independently of how slow fn() is.
    
    Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org
    Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Suggested-by: Rik van Riel <riel@surriel.com>
    Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
    Cc: Michael van der Westhuizen <rmikey@meta.com>
    Cc: Usama Arif <usamaarif642@gmail.com>
    Cc: Pavel Begunkov <asml.silence@gmail.com>
    Cc: Chen Ridong <chenridong@huawei.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: pm-cps: Use per-CPU variables as per-CPU, not per-core [+ + +]

Author: Paul Burton <paulburton@kernel.org>
Date:   Wed Jan 29 13:32:48 2025 +0100

    MIPS: pm-cps: Use per-CPU variables as per-CPU, not per-core
    
    [ Upstream commit 00a134fc2bb4a5f8fada58cf7ff4259149691d64 ]
    
    The pm-cps code has up until now used per-CPU variables indexed by core,
    rather than CPU number, in order to share data amongst sibling CPUs (ie.
    VPs/threads in a core). This works fine for single cluster systems, but
    with multi-cluster systems a core number is no longer unique in the
    system, leading to sharing between CPUs that are not actually siblings.
    
    Avoid this issue by using per-CPU variables as they are more generally
    used - ie. access them using CPU numbers rather than core numbers.
    Sharing between siblings is then accomplished by:
     - Assigning the same pointer to entries for each sibling CPU for the
       nc_asm_enter & ready_count variables, which allow this by virtue of
       being per-CPU pointers.
    
     - Indexing by the first CPU set in a CPUs cpu_sibling_map in the case
       of pm_barrier, for which we can't use the previous approach because
       the per-CPU variable is not a pointer.
    
    Signed-off-by: Paul Burton <paulburton@kernel.org>
    Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
    Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com>
    Tested-by: Serge Semin <fancer.lancer@gmail.com>
    Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

MIPS: Use arch specific syscall name match function [+ + +]

Author: Bibo Mao <maobibo@loongson.cn>
Date:   Tue Jun 9 10:54:35 2020 +0800

    MIPS: Use arch specific syscall name match function
    
    [ Upstream commit 756276ce78d5624dc814f9d99f7d16c8fd51076e ]
    
    On MIPS system, most of the syscall function name begin with prefix
    sys_. Some syscalls are special such as clone/fork, function name of
    these begin with __sys_. Since scratch registers need be saved in
    stack when these system calls happens.
    
    With ftrace system call method, system call functions are declared with
    SYSCALL_DEFINEx, metadata of the system call symbol name begins with
    sys_. Here mips specific function arch_syscall_match_sym_name is used to
    compare function name between sys_call_table[] and metadata of syscall
    symbol.
    
    Signed-off-by: Bibo Mao <maobibo@loongson.cn>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mm/page_alloc.c: avoid infinite retries caused by cpuset race [+ + +]

Author: Tianyang Zhang <zhangtianyang@loongson.cn>
Date:   Wed Apr 16 16:24:05 2025 +0800

    mm/page_alloc.c: avoid infinite retries caused by cpuset race
    
    commit e05741fb10c38d70bbd7ec12b23c197b6355d519 upstream.
    
    __alloc_pages_slowpath has no change detection for ac->nodemask in the
    part of retry path, while cpuset can modify it in parallel.  For some
    processes that set mempolicy as MPOL_BIND, this results ac->nodemask
    changes, and then the should_reclaim_retry will judge based on the latest
    nodemask and jump to retry, while the get_page_from_freelist only
    traverses the zonelist from ac->preferred_zoneref, which selected by a
    expired nodemask and may cause infinite retries in some cases
    
    cpu 64:
    __alloc_pages_slowpath {
            /* ..... */
    retry:
            /* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
            if (alloc_flags & ALLOC_KSWAPD)
                    wake_all_kswapds(order, gfp_mask, ac);
            /* cpu 1:
            cpuset_write_resmask
                update_nodemask
                    update_nodemasks_hier
                        update_tasks_nodemask
                            mpol_rebind_task
                             mpol_rebind_policy
                              mpol_rebind_nodemask
                    // mempolicy->nodes has been modified,
                    // which ac->nodemask point to
    
            */
            /* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
            if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
                                     did_some_progress > 0, &no_progress_loops))
                    goto retry;
    }
    
    Simultaneously starting multiple cpuset01 from LTP can quickly reproduce
    this issue on a multi node server when the maximum memory pressure is
    reached and the swap is enabled
    
    Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn
    Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
    Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
    Reviewed-by: Suren Baghdasaryan <surenb@google.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Brendan Jackman <jackmanb@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: dw_mmc: add exynos7870 DW MMC support [+ + +]

Author: Kaustabh Chakraborty <kauschluss@disroot.org>
Date:   Wed Feb 19 00:17:49 2025 +0530

    mmc: dw_mmc: add exynos7870 DW MMC support
    
    [ Upstream commit 7cbe799ac10fd8be85af5e0615c4337f81e575f3 ]
    
    Add support for Exynos7870 DW MMC controllers, for both SMU and non-SMU
    variants. These controllers require a quirk to access 64-bit FIFO in 32-bit
    accesses (DW_MMC_QUIRK_FIFO64_32).
    
    Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
    Link: https://lore.kernel.org/r/20250219-exynos7870-mmc-v2-3-b4255a3e39ed@disroot.org
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: host: Wait for Vdd to settle on card power off [+ + +]

Author: Erick Shepherd <erick.shepherd@ni.com>
Date:   Fri Mar 14 14:50:21 2025 -0500

    mmc: host: Wait for Vdd to settle on card power off
    
    [ Upstream commit 31e75ed964582257f59156ce6a42860e1ae4cc39 ]
    
    The SD spec version 6.0 section 6.4.1.5 requires that Vdd must be
    lowered to less than 0.5V for a minimum of 1 ms when powering off a
    card. Increase wait to 15 ms so that voltage has time to drain down
    to 0.5V and cards can power off correctly. Issues with voltage drain
    time were only observed on Apollo Lake and Bay Trail host controllers
    so this fix is limited to those devices.
    
    Signed-off-by: Erick Shepherd <erick.shepherd@ni.com>
    Acked-by: Adrian Hunter <adrian.hunter@intel.com>
    Link: https://lore.kernel.org/r/20250314195021.1588090-1-erick.shepherd@ni.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: sdhci: Disable SD card clock before changing parameters [+ + +]

Author: Erick Shepherd <erick.shepherd@ni.com>
Date:   Tue Feb 11 15:46:45 2025 -0600

    mmc: sdhci: Disable SD card clock before changing parameters
    
    [ Upstream commit fb3bbc46c94f261b6156ee863c1b06c84cf157dc ]
    
    Per the SD Host Controller Simplified Specification v4.20 §3.2.3, change
    the SD card clock parameters only after first disabling the external card
    clock. Doing this fixes a spurious clock pulse on Baytrail and Apollo Lake
    SD controllers which otherwise breaks voltage switching with a specific
    Swissbit SD card.
    
    Signed-off-by: Kyle Roeschley <kyle.roeschley@ni.com>
    Signed-off-by: Brad Mouring <brad.mouring@ni.com>
    Signed-off-by: Erick Shepherd <erick.shepherd@ni.com>
    Acked-by: Adrian Hunter <adrian.hunter@intel.com>
    Link: https://lore.kernel.org/r/20250211214645.469279-1-erick.shepherd@ni.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mana: fix warning in the writer of client oob [+ + +]

Author: Konstantin Taranov <kotaranov@microsoft.com>
Date:   Mon Jan 20 09:27:14 2025 -0800

    net/mana: fix warning in the writer of client oob
    
    [ Upstream commit 5ec7e1c86c441c46a374577bccd9488abea30037 ]
    
    Do not warn on missing pad_data when oob is in sgl.
    
    Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
    Link: https://patch.msgid.link/1737394039-28772-9-git-send-email-kotaranov@linux.microsoft.com
    Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com>
    Reviewed-by: Long Li <longli@microsoft.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx4_core: Avoid impossible mlx4_db_alloc() order value [+ + +]

Author: Kees Cook <kees@kernel.org>
Date:   Mon Feb 10 09:45:05 2025 -0800

    net/mlx4_core: Avoid impossible mlx4_db_alloc() order value
    
    [ Upstream commit 4a6f18f28627e121bd1f74b5fcc9f945d6dbeb1e ]
    
    GCC can see that the value range for "order" is capped, but this leads
    it to consider that it might be negative, leading to a false positive
    warning (with GCC 15 with -Warray-bounds -fdiagnostics-details):
    
    ../drivers/net/ethernet/mellanox/mlx4/alloc.c:691:47: error: array subscript -1 is below array bounds of 'long unsigned int *[2]' [-Werror=array-bounds=]
      691 |                 i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o);
          |                                    ~~~~~~~~~~~^~~
      'mlx4_alloc_db_from_pgdir': events 1-2
      691 |                 i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o);                        |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                     |                         |                                                   |                     |                         (2) out of array bounds here
          |                     (1) when the condition is evaluated to true                             In file included from ../drivers/net/ethernet/mellanox/mlx4/mlx4.h:53,
                     from ../drivers/net/ethernet/mellanox/mlx4/alloc.c:42:
    ../include/linux/mlx4/device.h:664:33: note: while referencing 'bits'
      664 |         unsigned long          *bits[2];
          |                                 ^~~~
    
    Switch the argument to unsigned int, which removes the compiler needing
    to consider negative values.
    
    Signed-off-by: Kees Cook <kees@kernel.org>
    Link: https://patch.msgid.link/20250210174504.work.075-kees@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Apply rate-limiting to high temperature warning [+ + +]

Author: Shahar Shitrit <shshitrit@nvidia.com>
Date:   Thu Feb 13 11:46:38 2025 +0200

    net/mlx5: Apply rate-limiting to high temperature warning
    
    [ Upstream commit 9dd3d5d258aceb37bdf09c8b91fa448f58ea81f0 ]
    
    Wrap the high temperature warning in a temperature event with
    a call to net_ratelimit() to prevent flooding the kernel log
    with repeated warning messages when temperature exceeds the
    threshold multiple times within a short duration.
    
    Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
    Link: https://patch.msgid.link/20250213094641.226501-2-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Avoid report two health errors on same syndrome [+ + +]

Author: Moshe Shemesh <moshe@nvidia.com>
Date:   Wed Feb 26 14:25:40 2025 +0200

    net/mlx5: Avoid report two health errors on same syndrome
    
    [ Upstream commit b5d7b2f04ebcff740f44ef4d295b3401aeb029f4 ]
    
    In case health counter has not increased for few polling intervals, miss
    counter will reach max misses threshold and health report will be
    triggered for FW health reporter. In case syndrome found on same health
    poll another health report will be triggered.
    
    Avoid two health reports on same syndrome by marking this syndrome as
    already known.
    
    Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB [+ + +]

Author: Alexei Lazar <alazar@nvidia.com>
Date:   Sun Feb 9 12:17:15 2025 +0200

    net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
    
    [ Upstream commit 95b9606b15bb3ce1198d28d2393dd0e1f0a5f3e9 ]
    
    Current loopback test validation ignores non-linear SKB case in
    the SKB access, which can lead to failures in scenarios such as
    when HW GRO is enabled.
    Linearize the SKB so both cases will be handled.
    
    Signed-off-by: Alexei Lazar <alazar@nvidia.com>
    Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20250209101716.112774-15-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Modify LSB bitmask in temperature event to include only the first bit [+ + +]

Author: Shahar Shitrit <shshitrit@nvidia.com>
Date:   Thu Feb 13 11:46:40 2025 +0200

    net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
    
    [ Upstream commit 633f16d7e07c129a36b882c05379e01ce5bdb542 ]
    
    In the sensor_count field of the MTEWE register, bits 1-62 are
    supported only for unmanaged switches, not for NICs, and bit 63
    is reserved for internal use.
    
    To prevent confusing output that may include set bits that are
    not relevant to NIC sensors, we update the bitmask to retain only
    the first bit, which corresponds to the sensor ASIC.
    
    Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
    Link: https://patch.msgid.link/20250213094641.226501-4-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: reduce rep rxq depth to 256 for ECPF [+ + +]

Author: William Tu <witu@nvidia.com>
Date:   Sun Feb 9 12:17:08 2025 +0200

    net/mlx5e: reduce rep rxq depth to 256 for ECPF
    
    [ Upstream commit b9cc8f9d700867aaa77aedddfea85e53d5e5d584 ]
    
    By experiments, a single queue representor netdev consumes kernel
    memory around 2.8MB, and 1.8MB out of the 2.8MB is due to page
    pool for the RXQ. Scaling to a thousand representors consumes 2.8GB,
    which becomes a memory pressure issue for embedded devices such as
    BlueField-2 16GB / BlueField-3 32GB memory.
    
    Since representor netdevs mostly handles miss traffic, and ideally,
    most of the traffic will be offloaded, reduce the default non-uplink
    rep netdev's RXQ default depth from 1024 to 256 if mdev is ecpf eswitch
    manager. This saves around 1MB of memory per regular RQ,
    (1024 - 256) * 2KB, allocated from page pool.
    
    With rxq depth of 256, the netlink page pool tool reports
    $./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
             --dump page-pool-get
     {'id': 277,
      'ifindex': 9,
      'inflight': 128,
      'inflight-mem': 786432,
      'napi-id': 775}]
    
    This is due to mtu 1500 + headroom consumes half pages, so 256 rxq
    entries consumes around 128 pages (thus create a page pool with
    size 128), shown above at inflight.
    
    Note that each netdev has multiple types of RQs, including
    Regular RQ, XSK, PTP, Drop, Trap RQ. Since non-uplink representor
    only supports regular rq, this patch only changes the regular RQ's
    default depth.
    
    Signed-off-by: William Tu <witu@nvidia.com>
    Reviewed-by: Bodong Wang <bodong@nvidia.com>
    Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Link: https://patch.msgid.link/20250209101716.112774-8-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: set the tx_queue_len for pfifo_fast [+ + +]

Author: William Tu <witu@nvidia.com>
Date:   Sun Feb 9 12:17:09 2025 +0200

    net/mlx5e: set the tx_queue_len for pfifo_fast
    
    [ Upstream commit a38cc5706fb9f7dc4ee3a443f61de13ce1e410ed ]
    
    By default, the mq netdev creates a pfifo_fast qdisc. On a
    system with 16 core, the pfifo_fast with 3 bands consumes
    16 * 3 * 8 (size of pointer) * 1024 (default tx queue len)
    = 393KB. The patch sets the tx qlen to representor default
    value, 128 (1<<MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE), which
    consumes 16 * 3 * 8 * 128 = 49KB, saving 344KB for each
    representor at ECPF.
    
    Signed-off-by: William Tu <witu@nvidia.com>
    Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Link: https://patch.msgid.link/20250209101716.112774-9-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/smc: use the correct ndev to find pnetid by pnetid table [+ + +]

Author: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Date:   Tue Mar 4 20:43:04 2025 +0800

    net/smc: use the correct ndev to find pnetid by pnetid table
    
    [ Upstream commit bfc6c67ec2d64d0ca4e5cc3e1ac84298a10b8d62 ]
    
    When using smc_pnet in SMC, it will only search the pnetid in the
    base_ndev of the netdev hierarchy(both HW PNETID and User-defined
    sw pnetid). This may not work for some scenarios when using SMC in
    container on cloud environment.
    In container, there have choices of different container network,
    such as directly using host network, virtual network IPVLAN, veth,
    etc. Different choices of container network have different netdev
    hierarchy. Examples of netdev hierarchy show below. (eth0 and eth1
    in host below is the netdev directly related to the physical device).
                _______________________________
               |   _________________           |
               |  |POD              |          |
               |  |                 |          |
               |  | eth0_________   |          |
               |  |____|         |__|          |
               |       |         |             |
               |       |         |             |
               |   eth1|base_ndev| eth0_______ |
               |       |         |    | RDMA  ||
               | host  |_________|    |_______||
               ---------------------------------
         netdev hierarchy if directly using host network
               ________________________________
               |   _________________           |
               |  |POD  __________  |          |
               |  |    |upper_ndev| |          |
               |  |eth0|__________| |          |
               |  |_______|_________|          |
               |          |lower netdev        |
               |        __|______              |
               |   eth1|         | eth0_______ |
               |       |base_ndev|    | RDMA  ||
               | host  |_________|    |_______||
               ---------------------------------
                netdev hierarchy if using IPVLAN
                _______________________________
               |   _____________________       |
               |  |POD        _________ |      |
               |  |          |base_ndev||      |
               |  |eth0(veth)|_________||      |
               |  |____________|________|      |
               |               |pairs          |
               |        _______|_              |
               |       |         | eth0_______ |
               |   veth|base_ndev|    | RDMA  ||
               |       |_________|    |_______||
               |        _________              |
               |   eth1|base_ndev|             |
               | host  |_________|             |
               ---------------------------------
                 netdev hierarchy if using veth
    Due to some reasons, the eth1 in host is not RDMA attached netdevice,
    pnetid is needed to map the eth1(in host) with RDMA device so that POD
    can do SMC-R. Because the eth1(in host) is managed by CNI plugin(such
    as Terway, network management plugin in container environment), and in
    cloud environment the eth(in host) can dynamically be inserted by CNI
    when POD create and dynamically be removed by CNI when POD destroy and
    no POD related to the eth(in host) anymore. It is hard to config the
    pnetid to the eth1(in host). But it is easy to config the pnetid to the
    netdevice which can be seen in POD. When do SMC-R, both the container
    directly using host network and the container using veth network can
    successfully match the RDMA device, because the configured pnetid netdev
    is a base_ndev. But the container using IPVLAN can not successfully
    match the RDMA device and 0x03030000 fallback happens, because the
    configured pnetid netdev is not a base_ndev. Additionally, if config
    pnetid to the eth1(in host) also can not work for matching RDMA device
    when using veth network and doing SMC-R in POD.
    
    To resolve the problems list above, this patch extends to search user
    -defined sw pnetid in the clc handshake ndev when no pnetid can be found
    in the base_ndev, and the base_ndev take precedence over ndev for backward
    compatibility. This patch also can unify the pnetid setup of different
    network choices list above in container(Config user-defined sw pnetid in
    the netdevice can be seen in POD).
    
    Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
    Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
    Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done [+ + +]

Author: Wang Liang <wangliang74@huawei.com>
Date:   Tue May 20 18:14:04 2025 +0800

    net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
    
    [ Upstream commit e279024617134c94fd3e37470156534d5f2b3472 ]
    
    Syzbot reported a slab-use-after-free with the following call trace:
    
      ==================================================================
      BUG: KASAN: slab-use-after-free in tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
      Read of size 8 at addr ffff88807a733000 by task kworker/1:0/25
    
      Call Trace:
       kasan_report+0xd9/0x110 mm/kasan/report.c:601
       tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
       crypto_request_complete include/crypto/algapi.h:266
       aead_request_complete include/crypto/internal/aead.h:85
       cryptd_aead_crypt+0x3b8/0x750 crypto/cryptd.c:772
       crypto_request_complete include/crypto/algapi.h:266
       cryptd_queue_worker+0x131/0x200 crypto/cryptd.c:181
       process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
    
      Allocated by task 8355:
       kzalloc_noprof include/linux/slab.h:778
       tipc_crypto_start+0xcc/0x9e0 net/tipc/crypto.c:1466
       tipc_init_net+0x2dd/0x430 net/tipc/core.c:72
       ops_init+0xb9/0x650 net/core/net_namespace.c:139
       setup_net+0x435/0xb40 net/core/net_namespace.c:343
       copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
       create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
       ksys_unshare+0x419/0x970 kernel/fork.c:3323
       __do_sys_unshare kernel/fork.c:3394
    
      Freed by task 63:
       kfree+0x12a/0x3b0 mm/slub.c:4557
       tipc_crypto_stop+0x23c/0x500 net/tipc/crypto.c:1539
       tipc_exit_net+0x8c/0x110 net/tipc/core.c:119
       ops_exit_list+0xb0/0x180 net/core/net_namespace.c:173
       cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
       process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
    
    After freed the tipc_crypto tx by delete namespace, tipc_aead_encrypt_done
    may still visit it in cryptd_queue_worker workqueue.
    
    I reproduce this issue by:
      ip netns add ns1
      ip link add veth1 type veth peer name veth2
      ip link set veth1 netns ns1
      ip netns exec ns1 tipc bearer enable media eth dev veth1
      ip netns exec ns1 tipc node set key this_is_a_master_key master
      ip netns exec ns1 tipc bearer disable media eth dev veth1
      ip netns del ns1
    
    The key of reproduction is that, simd_aead_encrypt is interrupted, leading
    to crypto_simd_usable() return false. Thus, the cryptd_queue_worker is
    triggered, and the tipc_crypto tx will be visited.
    
      tipc_disc_timeout
        tipc_bearer_xmit_skb
          tipc_crypto_xmit
            tipc_aead_encrypt
              crypto_aead_encrypt
                // encrypt()
                simd_aead_encrypt
                  // crypto_simd_usable() is false
                  child = &ctx->cryptd_tfm->base;
    
      simd_aead_encrypt
        crypto_aead_encrypt
          // encrypt()
          cryptd_aead_encrypt_enqueue
            cryptd_aead_enqueue
              cryptd_enqueue_request
                // trigger cryptd_queue_worker
                queue_work_on(smp_processor_id(), cryptd_wq, &cpu_queue->work)
    
    Fix this by holding net reference count before encrypt.
    
    Reported-by: syzbot+55c12726619ff85ce1f6@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=55c12726619ff85ce1f6
    Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
    Signed-off-by: Wang Liang <wangliang74@huawei.com>
    Link: https://patch.msgid.link/20250520101404.1341730-1-wangliang74@huawei.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dwmac-sun8i: Use parsed internal PHY address instead of 1 [+ + +]

Author: Paul Kocialkowski <paulk@sys-base.io>
Date:   Mon May 19 18:49:36 2025 +0200

    net: dwmac-sun8i: Use parsed internal PHY address instead of 1
    
    [ Upstream commit 47653e4243f2b0a26372e481ca098936b51ec3a8 ]
    
    While the MDIO address of the internal PHY on Allwinner sun8i chips is
    generally 1, of_mdio_parse_addr is used to cleanly parse the address
    from the device-tree instead of hardcoding it.
    
    A commit reworking the code ditched the parsed value and hardcoded the
    value 1 instead, which didn't really break anything but is more fragile
    and not future-proof.
    
    Restore the initial behavior using the parsed address returned from the
    helper.
    
    Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs")
    Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Acked-by: Corentin LABBE <clabbe.montjoie@gmail.com>
    Tested-by: Corentin LABBE <clabbe.montjoie@gmail.com>
    Link: https://patch.msgid.link/20250519164936.4172658-1-paulk@sys-base.io
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: enetc: refactor bulk flipping of RX buffers to separate function [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Apr 17 15:00:04 2025 +0300

    net: enetc: refactor bulk flipping of RX buffers to separate function
    
    [ Upstream commit 1d587faa5be7e9785b682cc5f58ba8f4100c13ea ]
    
    This small snippet of code ensures that we do something with the array
    of RX software buffer descriptor elements after passing the skb to the
    stack. In this case, we see if the other half of the page is reusable,
    and if so, we "turn around" the buffers, making them directly usable by
    enetc_refill_rx_ring() without going to enetc_new_page().
    
    We will need to perform this kind of buffer flipping from a new code
    path, i.e. from XDP_PASS. Currently, enetc_build_skb() does it there
    buffer by buffer, but in a subsequent change we will stop using
    enetc_build_skb() for XDP_PASS.
    
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Wei Fang <wei.fang@nxp.com>
    Link: https://patch.msgid.link/20250417120005.3288549-3-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only [+ + +]

Author: Eric Woudstra <ericwouds@gmail.com>
Date:   Tue Feb 25 21:15:09 2025 +0100

    net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only
    
    [ Upstream commit 7fe0353606d77a32c4c7f2814833dd1c043ebdd2 ]
    
    mtk_foe_entry_set_vlan() in mtk_ppe.c already supports double vlan
    tagging, but mtk_flow_offload_replace() in mtk_ppe_offload.c only allows
    for 1 vlan tag, optionally in combination with pppoe and dsa tags.
    
    However, mtk_foe_entry_set_vlan() only allows for setting the vlan id.
    The protocol cannot be set, it is always ETH_P_8021Q, for inner and outer
    tag. This patch adds QinQ support to mtk_flow_offload_replace(), only in
    the case that both inner and outer tags are ETH_P_8021Q.
    
    Only PPPoE-in-Q (as before) and Q-in-Q are allowed. A combination
    of PPPoE and Q-in-Q is not allowed.
    
    Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
    Link: https://patch.msgid.link/20250225201509.20843-1-ericwouds@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ethernet: ti: am65-cpsw: Lower random mac address error print to info [+ + +]

Author: Nishanth Menon <nm@ti.com>
Date:   Fri May 16 07:26:55 2025 -0500

    net: ethernet: ti: am65-cpsw: Lower random mac address error print to info
    
    [ Upstream commit 50980d8da71a0c2e045e85bba93c0099ab73a209 ]
    
    Using random mac address is not an error since the driver continues to
    function, it should be informative that the system has not assigned
    a MAC address. This is inline with other drivers such as ax88796c,
    dm9051 etc. Drop the error level to info level.
    
    Signed-off-by: Nishanth Menon <nm@ti.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Roger Quadros <rogerq@kernel.org>
    Link: https://patch.msgid.link/20250516122655.442808-1-nm@ti.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ethernet: ti: cpsw_new: populate netdev of_node [+ + +]

Author: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Date:   Mon Mar 3 08:46:57 2025 +0100

    net: ethernet: ti: cpsw_new: populate netdev of_node
    
    [ Upstream commit 7ff1c88fc89688c27f773ba956f65f0c11367269 ]
    
    So that of_find_net_device_by_node() can find CPSW ports and other DSA
    switches can be stacked downstream. Tested in conjunction with KSZ8873.
    
    Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
    Link: https://patch.msgid.link/20250303074703.1758297-1-alexander.sverdlin@siemens.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: lan743x: Restore SGMII CTRL register on resume [+ + +]

Author: Thangaraj Samynathan <thangaraj.s@microchip.com>
Date:   Fri May 16 09:27:19 2025 +0530

    net: lan743x: Restore SGMII CTRL register on resume
    
    [ Upstream commit 293e38ff4e4c2ba53f3fd47d8a4a9f0f0414a7a6 ]
    
    SGMII_CTRL register, which specifies the active interface, was not
    properly restored when resuming from suspend. This led to incorrect
    interface selection after resume particularly in scenarios involving
    the FPGA.
    
    To fix this:
    - Move the SGMII_CTRL setup out of the probe function.
    - Initialize the register in the hardware initialization helper function,
    which is called during both device initialization and resume.
    
    This ensures the interface configuration is consistently restored after
    suspend/resume cycles.
    
    Fixes: a46d9d37c4f4f ("net: lan743x: Add support for SGMII interface")
    Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
    Link: https://patch.msgid.link/20250516035719.117960-1-thangaraj.s@microchip.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: phylink: use pl->link_interface in phylink_expects_phy() [+ + +]

Author: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Date:   Thu Feb 27 20:15:17 2025 +0800

    net: phylink: use pl->link_interface in phylink_expects_phy()
    
    [ Upstream commit b63263555eaafbf9ab1a82f2020bbee872d83759 ]
    
    The phylink_expects_phy() function allows MAC drivers to check if they are
    expecting a PHY to attach. The checking condition in phylink_expects_phy()
    aims to achieve the same result as the checking condition in
    phylink_attach_phy().
    
    However, the checking condition in phylink_expects_phy() uses
    pl->link_config.interface, while phylink_attach_phy() uses
    pl->link_interface.
    
    Initially, both pl->link_interface and pl->link_config.interface are set
    to SGMII, and pl->cfg_link_an_mode is set to MLO_AN_INBAND.
    
    When the interface switches from SGMII to 2500BASE-X,
    pl->link_config.interface is updated by phylink_major_config().
    At this point, pl->cfg_link_an_mode remains MLO_AN_INBAND, and
    pl->link_config.interface is set to 2500BASE-X.
    Subsequently, when the STMMAC interface is taken down
    administratively and brought back up, it is blocked by
    phylink_expects_phy().
    
    Since phylink_expects_phy() and phylink_attach_phy() aim to achieve the
    same result, phylink_expects_phy() should check pl->link_interface,
    which never changes, instead of pl->link_config.interface, which is
    updated by phylink_major_config().
    
    Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
    Link: https://patch.msgid.link/20250227121522.1802832-2-yong.liang.choong@linux.intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: pktgen: fix access outside of user given buffer in pktgen_thread_write() [+ + +]

Author: Peter Seiderer <ps.report@gmx.net>
Date:   Wed Feb 19 09:45:27 2025 +0100

    net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
    
    [ Upstream commit 425e64440ad0a2f03bdaf04be0ae53dededbaa77 ]
    
    Honour the user given buffer size for the strn_len() calls (otherwise
    strn_len() will access memory outside of the user given buffer).
    
    Signed-off-by: Peter Seiderer <ps.report@gmx.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20250219084527.20488-8-ps.report@gmx.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: pktgen: fix mpls maximum labels list parsing [+ + +]

Author: Peter Seiderer <ps.report@gmx.net>
Date:   Thu Feb 27 14:56:00 2025 +0100

    net: pktgen: fix mpls maximum labels list parsing
    
    [ Upstream commit 2b15a0693f70d1e8119743ee89edbfb1271b3ea8 ]
    
    Fix mpls maximum labels list parsing up to MAX_MPLS_LABELS entries (instead
    of up to MAX_MPLS_LABELS - 1).
    
    Addresses the following:
    
            $ echo "mpls 00000f00,00000f01,00000f02,00000f03,00000f04,00000f05,00000f06,00000f07,00000f08,00000f09,00000f0a,00000f0b,00000f0c,00000f0d,00000f0e,00000f0f" > /proc/net/pktgen/lo\@0
            -bash: echo: write error: Argument list too long
    
    Signed-off-by: Peter Seiderer <ps.report@gmx.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: xgene-v2: remove incorrect ACPI_PTR annotation [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Feb 25 17:33:33 2025 +0100

    net: xgene-v2: remove incorrect ACPI_PTR annotation
    
    [ Upstream commit 01358e8fe922f716c05d7864ac2213b2440026e7 ]
    
    Building with W=1 shows a warning about xge_acpi_match being unused when
    CONFIG_ACPI is disabled:
    
    drivers/net/ethernet/apm/xgene-v2/main.c:723:36: error: unused variable 'xge_acpi_match' [-Werror,-Wunused-const-variable]
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Link: https://patch.msgid.link/20250225163341.4168238-2-arnd@kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net_sched: hfsc: Address reentrant enqueue adding class to eltree twice [+ + +]

Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Thu May 22 15:14:47 2025 -0300

    net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
    
    commit ac9fe7dd8e730a103ae4481147395cc73492d786 upstream.
    
    Savino says:
        "We are writing to report that this recent patch
        (141d34391abbb315d68556b7c67ad97885407547) [1]
        can be bypassed, and a UAF can still occur when HFSC is utilized with
        NETEM.
    
        The patch only checks the cl->cl_nactive field to determine whether
        it is the first insertion or not [2], but this field is only
        incremented by init_vf [3].
    
        By using HFSC_RSC (which uses init_ed) [4], it is possible to bypass the
        check and insert the class twice in the eltree.
        Under normal conditions, this would lead to an infinite loop in
        hfsc_dequeue for the reasons we already explained in this report [5].
    
        However, if TBF is added as root qdisc and it is configured with a
        very low rate,
        it can be utilized to prevent packets from being dequeued.
        This behavior can be exploited to perform subsequent insertions in the
        HFSC eltree and cause a UAF."
    
    To fix both the UAF and the infinite loop, with netem as an hfsc child,
    check explicitly in hfsc_enqueue whether the class is already in the eltree
    whenever the HFSC_RSC flag is set.
    
    [1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=141d34391abbb315d68556b7c67ad97885407547
    [2] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1572
    [3] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L677
    [4] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1574
    [5] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/T/#u
    
    Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs")
    Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
    Reported-by: William Liu <will@willsroot.io>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Tested-by: Victor Nogueira <victor@mojatatu.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Link: https://patch.msgid.link/20250522181448.1439717-2-pctammela@mojatatu.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: conntrack: Bound nf_conntrack sysctl writes [+ + +]

Author: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Date:   Wed Jan 29 18:06:30 2025 +0100

    netfilter: conntrack: Bound nf_conntrack sysctl writes
    
    [ Upstream commit 8b6861390ffee6b8ed78b9395e3776c16fec6579 ]
    
    nf_conntrack_max and nf_conntrack_expect_max sysctls were authorized to
    be written any negative value, which would then be stored in the
    unsigned int variables nf_conntrack_max and nf_ct_expect_max variables.
    
    While the do_proc_dointvec_conv function is supposed to limit writing
    handled by proc_dointvec proc_handler to INT_MAX. Such a negative value
    being written in an unsigned int leads to a very high value, exceeding
    this limit.
    
    Moreover, the nf_conntrack_expect_max sysctl documentation specifies the
    minimum value is 1.
    
    The proc_handlers have thus been updated to proc_dointvec_minmax in
    order to specify the following write bounds :
    
    * Bound nf_conntrack_max sysctl writings between SYSCTL_ZERO
      and SYSCTL_INT_MAX.
    
    * Bound nf_conntrack_expect_max sysctl writings between SYSCTL_ONE
      and SYSCTL_INT_MAX as defined in the sysctl documentation.
    
    With this patch applied, sysctl writes outside the defined in the bound
    will thus lead to a write error :
    
    ```
    sysctl -w net.netfilter.nf_conntrack_expect_max=-1
    sysctl: setting key "net.netfilter.nf_conntrack_expect_max": Invalid argument
    ```
    
    Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

NFS: Avoid flushing data while holding directory locks in nfs_rename() [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Sun Apr 27 18:21:06 2025 -0400

    NFS: Avoid flushing data while holding directory locks in nfs_rename()
    
    [ Upstream commit dcd21b609d4abc7303f8683bce4f35d78d7d6830 ]
    
    The Linux client assumes that all filehandles are non-volatile for
    renames within the same directory (otherwise sillyrename cannot work).
    However, the existence of the Linux 'subtree_check' export option has
    meant that nfs_rename() has always assumed it needs to flush writes
    before attempting to rename.
    
    Since NFSv4 does allow the client to query whether or not the server
    exhibits this behaviour, and since knfsd does actually set the
    appropriate flag when 'subtree_check' is enabled on an export, it
    should be OK to optimise away the write flushing behaviour in the cases
    where it is clearly not needed.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

NFS: Don't allow waiting for exiting tasks [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Fri Mar 28 13:19:18 2025 -0400

    NFS: Don't allow waiting for exiting tasks
    
    [ Upstream commit 8d3ca331026a7f9700d3747eed59a67b8f828cdc ]
    
    Once a task calls exit_signals() it can no longer be signalled. So do
    not allow it to do killable waits.
    
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nfs: don't share pNFS DS connections between net namespaces [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Thu Apr 10 16:42:03 2025 -0400

    nfs: don't share pNFS DS connections between net namespaces
    
    [ Upstream commit 6b9785dc8b13d9fb75ceec8cf4ea7ec3f3b1edbc ]
    
    Currently, different NFS clients can share the same DS connections, even
    when they are in different net namespaces. If a containerized client
    creates a DS connection, another container can find and use it. When the
    first client exits, the connection will close which can lead to stalls
    in other clients.
    
    Add a net namespace pointer to struct nfs4_pnfs_ds, and compare those
    value to the caller's netns in _data_server_lookup_locked() when
    searching for a nfs4_pnfs_ds to match.
    
    Reported-by: Omar Sandoval <osandov@osandov.com>
    Reported-by: Sargun Dillon <sargun@sargun.me>
    Closes: https://lore.kernel.org/linux-nfs/Z_ArpQC_vREh_hEA@telecaster/
    Tested-by: Sargun Dillon <sargun@sargun.me>
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Link: https://lore.kernel.org/r/20250410-nfs-ds-netns-v2-1-f80b7979ba80@kernel.org
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

NFSv4: Check for delegation validity in nfs_start_delegation_return_locked() [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Mar 27 19:20:53 2025 -0400

    NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
    
    [ Upstream commit 9e8f324bd44c1fe026b582b75213de4eccfa1163 ]
    
    Check that the delegation is still attached after taking the spin lock
    in nfs_start_delegation_return_locked().
    
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

NFSv4: Treat ENETUNREACH errors as fatal for state recovery [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Mon Mar 24 20:35:33 2025 -0400

    NFSv4: Treat ENETUNREACH errors as fatal for state recovery
    
    [ Upstream commit 0af5fb5ed3d2fd9e110c6112271f022b744a849a ]
    
    If a containerised process is killed and causes an ENETUNREACH or
    ENETDOWN error to be propagated to the state manager, then mark the
    nfs_client as being dead so that we don't loop in functions that are
    expecting recovery to succeed.
    
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro [+ + +]

Author: Ilya Guterman <amfernusus@gmail.com>
Date:   Sat May 10 19:21:30 2025 +0900

    nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro
    
    [ Upstream commit e765bf89f42b5c82132a556b630affeb82b2a21f ]
    
    This commit adds the NVME_QUIRK_NO_DEEPEST_PS quirk for device
    [126f:2262], which belongs to device SOLIDIGM P44 Pro SSDPFKKW020X7
    
    The device frequently have trouble exiting the deepest power state (5),
    resulting in the entire disk being unresponsive.
    
    Verified by setting nvme_core.default_ps_max_latency_us=10000 and
    observing the expected behavior.
    
    Signed-off-by: Ilya Guterman <amfernusus@gmail.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nvmet-tcp: don't restore null sk_state_change [+ + +]

Author: Alistair Francis <alistair.francis@wdc.com>
Date:   Wed Apr 23 16:06:21 2025 +1000

    nvmet-tcp: don't restore null sk_state_change
    
    [ Upstream commit 46d22b47df2741996af277a2838b95f130436c13 ]
    
    queue->state_change is set as part of nvmet_tcp_set_queue_sock(), but if
    the TCP connection isn't established when nvmet_tcp_set_queue_sock() is
    called then queue->state_change isn't set and sock->sk->sk_state_change
    isn't replaced.
    
    As such we don't need to restore sock->sk->sk_state_change if
    queue->state_change is NULL.
    
    This avoids NULL pointer dereferences such as this:
    
    [  286.462026][    C0] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [  286.462814][    C0] #PF: supervisor instruction fetch in kernel mode
    [  286.463796][    C0] #PF: error_code(0x0010) - not-present page
    [  286.464392][    C0] PGD 8000000140620067 P4D 8000000140620067 PUD 114201067 PMD 0
    [  286.465086][    C0] Oops: Oops: 0010 [#1] SMP KASAN PTI
    [  286.465559][    C0] CPU: 0 UID: 0 PID: 1628 Comm: nvme Not tainted 6.15.0-rc2+ #11 PREEMPT(voluntary)
    [  286.466393][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
    [  286.467147][    C0] RIP: 0010:0x0
    [  286.467420][    C0] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    [  286.467977][    C0] RSP: 0018:ffff8883ae008580 EFLAGS: 00010246
    [  286.468425][    C0] RAX: 0000000000000000 RBX: ffff88813fd34100 RCX: ffffffffa386cc43
    [  286.469019][    C0] RDX: 1ffff11027fa68b6 RSI: 0000000000000008 RDI: ffff88813fd34100
    [  286.469545][    C0] RBP: ffff88813fd34160 R08: 0000000000000000 R09: ffffed1027fa682c
    [  286.470072][    C0] R10: ffff88813fd34167 R11: 0000000000000000 R12: ffff88813fd344c3
    [  286.470585][    C0] R13: ffff88813fd34112 R14: ffff88813fd34aec R15: ffff888132cdd268
    [  286.471070][    C0] FS:  00007fe3c04c7d80(0000) GS:ffff88840743f000(0000) knlGS:0000000000000000
    [  286.471644][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  286.472543][    C0] CR2: ffffffffffffffd6 CR3: 000000012daca000 CR4: 00000000000006f0
    [  286.473500][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  286.474467][    C0] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
    [  286.475453][    C0] Call Trace:
    [  286.476102][    C0]  <IRQ>
    [  286.476719][    C0]  tcp_fin+0x2bb/0x440
    [  286.477429][    C0]  tcp_data_queue+0x190f/0x4e60
    [  286.478174][    C0]  ? __build_skb_around+0x234/0x330
    [  286.478940][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.479659][    C0]  ? __pfx_tcp_data_queue+0x10/0x10
    [  286.480431][    C0]  ? tcp_try_undo_loss+0x640/0x6c0
    [  286.481196][    C0]  ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90
    [  286.482046][    C0]  ? kvm_clock_get_cycles+0x14/0x30
    [  286.482769][    C0]  ? ktime_get+0x66/0x150
    [  286.483433][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.484146][    C0]  tcp_rcv_established+0x6e4/0x2050
    [  286.484857][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.485523][    C0]  ? ipv4_dst_check+0x160/0x2b0
    [  286.486203][    C0]  ? __pfx_tcp_rcv_established+0x10/0x10
    [  286.486917][    C0]  ? lock_release+0x217/0x2c0
    [  286.487595][    C0]  tcp_v4_do_rcv+0x4d6/0x9b0
    [  286.488279][    C0]  tcp_v4_rcv+0x2af8/0x3e30
    [  286.488904][    C0]  ? raw_local_deliver+0x51b/0xad0
    [  286.489551][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.490198][    C0]  ? __pfx_tcp_v4_rcv+0x10/0x10
    [  286.490813][    C0]  ? __pfx_raw_local_deliver+0x10/0x10
    [  286.491487][    C0]  ? __pfx_nf_confirm+0x10/0x10 [nf_conntrack]
    [  286.492275][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.492900][    C0]  ip_protocol_deliver_rcu+0x8f/0x370
    [  286.493579][    C0]  ip_local_deliver_finish+0x297/0x420
    [  286.494268][    C0]  ip_local_deliver+0x168/0x430
    [  286.494867][    C0]  ? __pfx_ip_local_deliver+0x10/0x10
    [  286.495498][    C0]  ? __pfx_ip_local_deliver_finish+0x10/0x10
    [  286.496204][    C0]  ? ip_rcv_finish_core+0x19a/0x1f20
    [  286.496806][    C0]  ? lock_release+0x217/0x2c0
    [  286.497414][    C0]  ip_rcv+0x455/0x6e0
    [  286.497945][    C0]  ? __pfx_ip_rcv+0x10/0x10
    [  286.498550][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.499137][    C0]  ? __pfx_ip_rcv_finish+0x10/0x10
    [  286.499763][    C0]  ? lock_release+0x217/0x2c0
    [  286.500327][    C0]  ? dl_scaled_delta_exec+0xd1/0x2c0
    [  286.500922][    C0]  ? __pfx_ip_rcv+0x10/0x10
    [  286.501480][    C0]  __netif_receive_skb_one_core+0x166/0x1b0
    [  286.502173][    C0]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
    [  286.502903][    C0]  ? lock_acquire+0x2b2/0x310
    [  286.503487][    C0]  ? process_backlog+0x372/0x1350
    [  286.504087][    C0]  ? lock_release+0x217/0x2c0
    [  286.504642][    C0]  process_backlog+0x3b9/0x1350
    [  286.505214][    C0]  ? process_backlog+0x372/0x1350
    [  286.505779][    C0]  __napi_poll.constprop.0+0xa6/0x490
    [  286.506363][    C0]  net_rx_action+0x92e/0xe10
    [  286.506889][    C0]  ? __pfx_net_rx_action+0x10/0x10
    [  286.507437][    C0]  ? timerqueue_add+0x1f0/0x320
    [  286.507977][    C0]  ? sched_clock_cpu+0x68/0x540
    [  286.508492][    C0]  ? lock_acquire+0x2b2/0x310
    [  286.509043][    C0]  ? kvm_sched_clock_read+0xd/0x20
    [  286.509607][    C0]  ? handle_softirqs+0x1aa/0x7d0
    [  286.510187][    C0]  handle_softirqs+0x1f2/0x7d0
    [  286.510754][    C0]  ? __pfx_handle_softirqs+0x10/0x10
    [  286.511348][    C0]  ? irqtime_account_irq+0x181/0x290
    [  286.511937][    C0]  ? __dev_queue_xmit+0x85d/0x3450
    [  286.512510][    C0]  do_softirq.part.0+0x89/0xc0
    [  286.513100][    C0]  </IRQ>
    [  286.513548][    C0]  <TASK>
    [  286.513953][    C0]  __local_bh_enable_ip+0x112/0x140
    [  286.514522][    C0]  ? __dev_queue_xmit+0x85d/0x3450
    [  286.515072][    C0]  __dev_queue_xmit+0x872/0x3450
    [  286.515619][    C0]  ? nft_do_chain+0xe16/0x15b0 [nf_tables]
    [  286.516252][    C0]  ? __pfx___dev_queue_xmit+0x10/0x10
    [  286.516817][    C0]  ? selinux_ip_postroute+0x43c/0xc50
    [  286.517433][    C0]  ? __pfx_selinux_ip_postroute+0x10/0x10
    [  286.518061][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.518606][    C0]  ? ip_output+0x164/0x4a0
    [  286.519149][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.519671][    C0]  ? ip_finish_output2+0x17d5/0x1fb0
    [  286.520258][    C0]  ip_finish_output2+0xb4b/0x1fb0
    [  286.520787][    C0]  ? __pfx_ip_finish_output2+0x10/0x10
    [  286.521355][    C0]  ? __ip_finish_output+0x15d/0x750
    [  286.521890][    C0]  ip_output+0x164/0x4a0
    [  286.522372][    C0]  ? __pfx_ip_output+0x10/0x10
    [  286.522872][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.523402][    C0]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
    [  286.524031][    C0]  ? __pfx_ip_finish_output+0x10/0x10
    [  286.524605][    C0]  ? __ip_queue_xmit+0x999/0x2260
    [  286.525200][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.525744][    C0]  ? ipv4_dst_check+0x16a/0x2b0
    [  286.526279][    C0]  ? lock_release+0x217/0x2c0
    [  286.526793][    C0]  __ip_queue_xmit+0x1883/0x2260
    [  286.527324][    C0]  ? __skb_clone+0x54c/0x730
    [  286.527827][    C0]  __tcp_transmit_skb+0x209b/0x37a0
    [  286.528374][    C0]  ? __pfx___tcp_transmit_skb+0x10/0x10
    [  286.528952][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.529472][    C0]  ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90
    [  286.530152][    C0]  ? trace_hardirqs_on+0x12/0x120
    [  286.530691][    C0]  tcp_write_xmit+0xb81/0x88b0
    [  286.531224][    C0]  ? mod_memcg_state+0x4d/0x60
    [  286.531736][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.532253][    C0]  __tcp_push_pending_frames+0x90/0x320
    [  286.532826][    C0]  tcp_send_fin+0x141/0xb50
    [  286.533352][    C0]  ? __pfx_tcp_send_fin+0x10/0x10
    [  286.533908][    C0]  ? __local_bh_enable_ip+0xab/0x140
    [  286.534495][    C0]  inet_shutdown+0x243/0x320
    [  286.535077][    C0]  nvme_tcp_alloc_queue+0xb3b/0x2590 [nvme_tcp]
    [  286.535709][    C0]  ? do_raw_spin_lock+0x129/0x260
    [  286.536314][    C0]  ? __pfx_nvme_tcp_alloc_queue+0x10/0x10 [nvme_tcp]
    [  286.536996][    C0]  ? do_raw_spin_unlock+0x54/0x1e0
    [  286.537550][    C0]  ? _raw_spin_unlock+0x29/0x50
    [  286.538127][    C0]  ? do_raw_spin_lock+0x129/0x260
    [  286.538664][    C0]  ? __pfx_do_raw_spin_lock+0x10/0x10
    [  286.539249][    C0]  ? nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp]
    [  286.539892][    C0]  ? __wake_up+0x40/0x60
    [  286.540392][    C0]  nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp]
    [  286.541047][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.541589][    C0]  nvme_tcp_setup_ctrl+0x8b/0x7a0 [nvme_tcp]
    [  286.542254][    C0]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
    [  286.542887][    C0]  ? __pfx_nvme_tcp_setup_ctrl+0x10/0x10 [nvme_tcp]
    [  286.543568][    C0]  ? trace_hardirqs_on+0x12/0x120
    [  286.544166][    C0]  ? _raw_spin_unlock_irqrestore+0x35/0x60
    [  286.544792][    C0]  ? nvme_change_ctrl_state+0x196/0x2e0 [nvme_core]
    [  286.545477][    C0]  nvme_tcp_create_ctrl+0x839/0xb90 [nvme_tcp]
    [  286.546126][    C0]  nvmf_dev_write+0x3db/0x7e0 [nvme_fabrics]
    [  286.546775][    C0]  ? rw_verify_area+0x69/0x520
    [  286.547334][    C0]  vfs_write+0x218/0xe90
    [  286.547854][    C0]  ? do_syscall_64+0x9f/0x190
    [  286.548408][    C0]  ? trace_hardirqs_on_prepare+0xdb/0x120
    [  286.549037][    C0]  ? syscall_exit_to_user_mode+0x93/0x280
    [  286.549659][    C0]  ? __pfx_vfs_write+0x10/0x10
    [  286.550259][    C0]  ? do_syscall_64+0x9f/0x190
    [  286.550840][    C0]  ? syscall_exit_to_user_mode+0x8e/0x280
    [  286.551516][    C0]  ? trace_hardirqs_on_prepare+0xdb/0x120
    [  286.552180][    C0]  ? syscall_exit_to_user_mode+0x93/0x280
    [  286.552834][    C0]  ? ksys_read+0xf5/0x1c0
    [  286.553386][    C0]  ? __pfx_ksys_read+0x10/0x10
    [  286.553964][    C0]  ksys_write+0xf5/0x1c0
    [  286.554499][    C0]  ? __pfx_ksys_write+0x10/0x10
    [  286.555072][    C0]  ? trace_hardirqs_on_prepare+0xdb/0x120
    [  286.555698][    C0]  ? syscall_exit_to_user_mode+0x93/0x280
    [  286.556319][    C0]  ? do_syscall_64+0x54/0x190
    [  286.556866][    C0]  do_syscall_64+0x93/0x190
    [  286.557420][    C0]  ? rcu_read_unlock+0x17/0x60
    [  286.557986][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.558526][    C0]  ? lock_release+0x217/0x2c0
    [  286.559087][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.559659][    C0]  ? count_memcg_events.constprop.0+0x4a/0x60
    [  286.560476][    C0]  ? exc_page_fault+0x7a/0x110
    [  286.561064][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.561647][    C0]  ? lock_release+0x217/0x2c0
    [  286.562257][    C0]  ? do_user_addr_fault+0x171/0xa00
    [  286.562839][    C0]  ? do_user_addr_fault+0x4a2/0xa00
    [  286.563453][    C0]  ? irqentry_exit_to_user_mode+0x84/0x270
    [  286.564112][    C0]  ? rcu_is_watching+0x11/0xb0
    [  286.564677][    C0]  ? irqentry_exit_to_user_mode+0x84/0x270
    [  286.565317][    C0]  ? trace_hardirqs_on_prepare+0xdb/0x120
    [  286.565922][    C0]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  286.566542][    C0] RIP: 0033:0x7fe3c05e6504
    [  286.567102][    C0] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
    [  286.568931][    C0] RSP: 002b:00007fff76444f58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
    [  286.569807][    C0] RAX: ffffffffffffffda RBX: 000000003b40d930 RCX: 00007fe3c05e6504
    [  286.570621][    C0] RDX: 00000000000000cf RSI: 000000003b40d930 RDI: 0000000000000003
    [  286.571443][    C0] RBP: 0000000000000003 R08: 00000000000000cf R09: 000000003b40d930
    [  286.572246][    C0] R10: 0000000000000000 R11: 0000000000000202 R12: 000000003b40cd60
    [  286.573069][    C0] R13: 00000000000000cf R14: 00007fe3c07417f8 R15: 00007fe3c073502e
    [  286.573886][    C0]  </TASK>
    
    Closes: https://lore.kernel.org/linux-nvme/5hdonndzoqa265oq3bj6iarwtfk5dewxxjtbjvn5uqnwclpwt6@a2n6w3taxxex/
    Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

objtool: Properly disable uaccess validation [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Mon Mar 24 14:55:58 2025 -0700

    objtool: Properly disable uaccess validation
    
    [ Upstream commit e1a9dda74dbffbc3fa2069ff418a1876dc99fb14 ]
    
    If opts.uaccess isn't set, the uaccess validation is disabled, but only
    partially: it doesn't read the uaccess_safe_builtin list but still tries
    to do the validation.  Disable it completely to prevent false warnings.
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/0e95581c1d2107fb5f59418edf2b26bba38b0cbb.1742852846.git.jpoimboe@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG [+ + +]

Author: Geetha sowjanya <gakula@marvell.com>
Date:   Wed May 21 11:38:34 2025 +0530

    octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG
    
    [ Upstream commit a6ae7129819ad20788e610261246e71736543b8b ]
    
    The current implementation maps the APR table using a fixed size,
    which can lead to incorrect mapping when the number of PFs and VFs
    varies.
    This patch corrects the mapping by calculating the APR table
    size dynamically based on the values configured in the
    APR_LMT_CFG register, ensuring accurate representation
    of APR entries in debugfs.
    
    Fixes: 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table").
    Signed-off-by: Geetha sowjanya <gakula@marvell.com>
    Link: https://patch.msgid.link/20250521060834.19780-3-gakula@marvell.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-af: Set LMT_ENA bit for APR table entries [+ + +]

Author: Subbaraya Sundeep <sbhatta@marvell.com>
Date:   Wed May 21 11:38:33 2025 +0530

    octeontx2-af: Set LMT_ENA bit for APR table entries
    
    [ Upstream commit 0eefa27b493306928d88af6368193b134c98fd64 ]
    
    This patch enables the LMT line for a PF/VF by setting the
    LMT_ENA bit in the APR_LMT_MAP_ENTRY_S structure.
    
    Additionally, it simplifies the logic for calculating the
    LMTST table index by consistently using the maximum
    number of hw supported VFs (i.e., 256).
    
    Fixes: 873a1e3d207a ("octeontx2-af: cn10k: Setting up lmtst map table").
    Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
    Signed-off-by: Geetha sowjanya <gakula@marvell.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Link: https://patch.msgid.link/20250521060834.19780-2-gakula@marvell.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-pf: Add AF_XDP non-zero copy support [+ + +]

Author: Suman Ghosh <sumang@marvell.com>
Date:   Thu Feb 13 11:01:37 2025 +0530

    octeontx2-pf: Add AF_XDP non-zero copy support
    
    [ Upstream commit b4164de5041b51cda3438e75bce668e2556057c3 ]
    
    Set xdp rx ring memory type as MEM_TYPE_PAGE_POOL for
    af-xdp to work. This is needed since xdp_return_frame
    internally will use page pools.
    
    Fixes: 06059a1a9a4a ("octeontx2-pf: Add XDP support to netdev PF")
    Signed-off-by: Suman Ghosh <sumang@marvell.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-pf: Add support for page pool [+ + +]

Author: Ratheesh Kannoth <rkannoth@marvell.com>
Date:   Mon May 22 07:34:04 2023 +0530

    octeontx2-pf: Add support for page pool
    
    [ Upstream commit b2e3406a38f0f48b1dfb81e5bb73d243ff6af179 ]
    
    Page pool for each rx queue enhance rx side performance
    by reclaiming buffers back to each queue specific pool. DMA
    mapping is done only for first allocation of buffers.
    As subsequent buffers allocation avoid DMA mapping,
    it results in performance improvement.
    
    Image        |  Performance
    ------------ | ------------
    Vannila      |   3Mpps
                 |
    with this    |   42Mpps
    change       |
    ---------------------------
    
    Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
    Link: https://lore.kernel.org/r/20230522020404.152020-1-rkannoth@marvell.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Stable-dep-of: b4164de5041b ("octeontx2-pf: Add AF_XDP non-zero copy support")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-pf: Fix page pool cache index corruption. [+ + +]

Author: Ratheesh Kannoth <rkannoth@marvell.com>
Date:   Fri Sep 8 08:23:09 2023 +0530

    octeontx2-pf: Fix page pool cache index corruption.
    
    commit 88e69af061f2e061a68751ef9cad47a674527a1b upstream.
    
    The access to page pool `cache' array and the `count' variable
    is not locked. Page pool cache access is fine as long as there
    is only one consumer per pool.
    
    octeontx2 driver fills in rx buffers from page pool in NAPI context.
    If system is stressed and could not allocate buffers, refiiling work
    will be delegated to a delayed workqueue. This means that there are
    two cosumers to the page pool cache.
    
    Either workqueue or IRQ/NAPI can be run on other CPU. This will lead
    to lock less access, hence corruption of cache pool indexes.
    
    To fix this issue, NAPI is rescheduled from workqueue context to refill
    rx buffers.
    
    Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
    Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
    Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

octeontx2-pf: Fix page pool frag allocation warning [+ + +]

Author: Ratheesh Kannoth <rkannoth@marvell.com>
Date:   Tue Oct 10 09:18:42 2023 +0530

    octeontx2-pf: Fix page pool frag allocation warning
    
    commit 50e492143374c17ad89c865a1a44837b3f5c8226 upstream.
    
    Since page pool param's "order" is set to 0, will result
    in below warn message if interface is configured with higher
    rx buffer size.
    
    Steps to reproduce the issue.
    1. devlink dev param set pci/0002:04:00.0 name receive_buffer_size \
       value 8196 cmode runtime
    2. ifconfig eth0 up
    
    [   19.901356] ------------[ cut here ]------------
    [   19.901361] WARNING: CPU: 11 PID: 12331 at net/core/page_pool.c:567 page_pool_alloc_frag+0x3c/0x230
    [   19.901449] pstate: 82401009 (Nzcv daif +PAN -UAO +TCO -DIT +SSBS BTYPE=--)
    [   19.901451] pc : page_pool_alloc_frag+0x3c/0x230
    [   19.901453] lr : __otx2_alloc_rbuf+0x60/0xbc [rvu_nicpf]
    [   19.901460] sp : ffff80000f66b970
    [   19.901461] x29: ffff80000f66b970 x28: 0000000000000000 x27: 0000000000000000
    [   19.901464] x26: ffff800000d15b68 x25: ffff000195b5c080 x24: ffff0002a5a32dc0
    [   19.901467] x23: ffff0001063c0878 x22: 0000000000000100 x21: 0000000000000000
    [   19.901469] x20: 0000000000000000 x19: ffff00016f781000 x18: 0000000000000000
    [   19.901472] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    [   19.901474] x14: 0000000000000000 x13: ffff0005ffdc9c80 x12: 0000000000000000
    [   19.901477] x11: ffff800009119a38 x10: 4c6ef2e3ba300519 x9 : ffff800000d13844
    [   19.901479] x8 : ffff0002a5a33cc8 x7 : 0000000000000030 x6 : 0000000000000030
    [   19.901482] x5 : 0000000000000005 x4 : 0000000000000000 x3 : 0000000000000a20
    [   19.901484] x2 : 0000000000001080 x1 : ffff80000f66b9d4 x0 : 0000000000001000
    [   19.901487] Call trace:
    [   19.901488]  page_pool_alloc_frag+0x3c/0x230
    [   19.901490]  __otx2_alloc_rbuf+0x60/0xbc [rvu_nicpf]
    [   19.901494]  otx2_rq_aura_pool_init+0x1c4/0x240 [rvu_nicpf]
    [   19.901498]  otx2_open+0x228/0xa70 [rvu_nicpf]
    [   19.901501]  otx2vf_open+0x20/0xd0 [rvu_nicvf]
    [   19.901504]  __dev_open+0x114/0x1d0
    [   19.901507]  __dev_change_flags+0x194/0x210
    [   19.901510]  dev_change_flags+0x2c/0x70
    [   19.901512]  devinet_ioctl+0x3a4/0x6c4
    [   19.901515]  inet_ioctl+0x228/0x240
    [   19.901518]  sock_ioctl+0x2ac/0x480
    [   19.901522]  __arm64_sys_ioctl+0x564/0xe50
    [   19.901525]  invoke_syscall.constprop.0+0x58/0xf0
    [   19.901529]  do_el0_svc+0x58/0x150
    [   19.901531]  el0_svc+0x30/0x140
    [   19.901533]  el0t_64_sync_handler+0xe8/0x114
    [   19.901535]  el0t_64_sync+0x1a0/0x1a4
    [   19.901537] ---[ end trace 678c0bf660ad8116 ]---
    
    Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
    Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231010034842.3807816-1-rkannoth@marvell.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

octeontx2-pf: fix page_pool creation fail for rings > 32k [+ + +]

Author: Ratheesh Kannoth <rkannoth@marvell.com>
Date:   Thu Aug 24 08:33:01 2023 +0530

    octeontx2-pf: fix page_pool creation fail for rings > 32k
    
    commit 49fa4b0d06705a24a81bb8be6eb175059b77f0a7 upstream.
    
    octeontx2 driver calls page_pool_create() during driver probe()
    and fails if queue size > 32k. Page pool infra uses these buffers
    as shock absorbers for burst traffic. These pages are pinned down
    over time as working sets varies, due to the recycling nature
    of page pool, given page pool (currently) don't have a shrinker
    mechanism, the pages remain pinned down in ptr_ring.
    Instead of clamping page_pool size to 32k at
    most, limit it even more to 2k to avoid wasting memory.
    
    This have been tested on octeontx2 CN10KA hardware.
    TCP and UDP tests using iperf shows no performance regressions.
    
    Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
    Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
    Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

orangefs: Do not truncate file size [+ + +]

Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Mar 5 20:47:25 2025 +0000

    orangefs: Do not truncate file size
    
    [ Upstream commit 062e8093592fb866b8e016641a8b27feb6ac509d ]
    
    'len' is used to store the result of i_size_read(), so making 'len'
    a size_t results in truncation to 4GiB on 32-bit systems.
    
    Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Link: https://lore.kernel.org/r/20250305204734.1475264-2-willy@infradead.org
    Tested-by: Mike Marshall <hubcap@omnibond.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

padata: do not leak refcount in reorder_work [+ + +]

Author: Dominik Grzegorzek <dominik.grzegorzek@oracle.com>
Date:   Sun May 18 19:45:31 2025 +0200

    padata: do not leak refcount in reorder_work
    
    commit d6ebcde6d4ecf34f8495fb30516645db3aea8993 upstream.
    
    A recent patch that addressed a UAF introduced a reference count leak:
    the parallel_data refcount is incremented unconditionally, regardless
    of the return value of queue_work(). If the work item is already queued,
    the incremented refcount is never decremented.
    
    Fix this by checking the return value of queue_work() and decrementing
    the refcount when necessary.
    
    Resolves:
    
    Unreferenced object 0xffff9d9f421e3d80 (size 192):
      comm "cryptomgr_probe", pid 157, jiffies 4294694003
      hex dump (first 32 bytes):
        80 8b cf 41 9f 9d ff ff b8 97 e0 89 ff ff ff ff  ...A............
        d0 97 e0 89 ff ff ff ff 19 00 00 00 1f 88 23 00  ..............#.
      backtrace (crc 838fb36):
        __kmalloc_cache_noprof+0x284/0x320
        padata_alloc_pd+0x20/0x1e0
        padata_alloc_shell+0x3b/0xa0
        0xffffffffc040a54d
        cryptomgr_probe+0x43/0xc0
        kthread+0xf6/0x1f0
        ret_from_fork+0x2f/0x50
        ret_from_fork_asm+0x1a/0x30
    
    Fixes: dd7d37ccf6b1 ("padata: avoid UAF for reorder_work")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@oracle.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI: brcmstb: Add a softdep to MIP MSI-X driver [+ + +]

Author: Stanimir Varbanov <svarbanov@suse.de>
Date:   Mon Feb 24 10:35:56 2025 +0200

    PCI: brcmstb: Add a softdep to MIP MSI-X driver
    
    [ Upstream commit 2294059118c550464dd8906286324d90c33b152b ]
    
    Then the brcmstb PCIe driver and MIP MSI-X interrupt controller
    drivers are built as modules there could be a race in probing.
    
    To avoid this, add a softdep to MIP driver to guarantee that
    MIP driver will be load first.
    
    Signed-off-by: Stanimir Varbanov <svarbanov@suse.de>
    Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Ivan T. Ivanov <iivanov@suse.de>
    Link: https://lore.kernel.org/r/20250224083559.47645-5-svarbanov@suse.de
    [kwilczynski: commit log]
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

PCI: brcmstb: Expand inbound window size up to 64GB [+ + +]

Author: Stanimir Varbanov <svarbanov@suse.de>
Date:   Mon Feb 24 10:35:58 2025 +0200

    PCI: brcmstb: Expand inbound window size up to 64GB
    
    [ Upstream commit 25a98c727015638baffcfa236e3f37b70cedcf87 ]
    
    The BCM2712 memory map can support up to 64GB of system memory, thus
    expand the inbound window size in calculation helper function.
    
    The change is safe for the currently supported SoCs that have smaller
    inbound window sizes.
    
    Signed-off-by: Stanimir Varbanov <svarbanov@suse.de>
    Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Reviewed-by: Jim Quinlan <james.quinlan@broadcom.com>
    Tested-by: Ivan T. Ivanov <iivanov@suse.de>
    Link: https://lore.kernel.org/r/20250224083559.47645-7-svarbanov@suse.de
    [kwilczynski: commit log]
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

PCI: dwc: ep: Ensure proper iteration over outbound map windows [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Sat Mar 15 15:15:46 2025 -0500

    PCI: dwc: ep: Ensure proper iteration over outbound map windows
    
    [ Upstream commit f3e1dccba0a0833fc9a05fb838ebeb6ea4ca0e1a ]
    
    Most systems' PCIe outbound map windows have non-zero physical addresses,
    but the possibility of encountering zero increased after following commit
    ("PCI: dwc: Use parent_bus_offset").
    
    'ep->outbound_addr[n]', representing 'parent_bus_address', might be 0 on
    some hardware, which trims high address bits through bus fabric before
    sending to the PCIe controller.
    
    Replace the iteration logic with 'for_each_set_bit()' to ensure only
    allocated map windows are iterated when determining the ATU index from a
    given address.
    
    Link: https://lore.kernel.org/r/20250315201548.858189-12-helgaas@kernel.org
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

PCI: Fix old_size lower bound in calculate_iosize() too [+ + +]

Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Mon Dec 16 19:56:12 2024 +0200

    PCI: Fix old_size lower bound in calculate_iosize() too
    
    [ Upstream commit ff61f380de5652e723168341480cc7adf1dd6213 ]
    
    Commit 903534fa7d30 ("PCI: Fix resource double counting on remove &
    rescan") fixed double counting of mem resources because of old_size being
    applied too early.
    
    Fix a similar counting bug on the io resource side.
    
    Link: https://lore.kernel.org/r/20241216175632.4175-6-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Xiaochun Lee <lixc17@lenovo.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

PCI: vmd: Disable MSI remapping bypass under Xen [+ + +]

Author: Roger Pau Monne <roger.pau@citrix.com>
Date:   Wed Feb 19 10:20:56 2025 +0100

    PCI: vmd: Disable MSI remapping bypass under Xen
    
    [ Upstream commit 6c4d5aadf5df31ea0ac025980670eee9beaf466b ]
    
    MSI remapping bypass (directly configuring MSI entries for devices on the
    VMD bus) won't work under Xen, as Xen is not aware of devices in such bus,
    and hence cannot configure the entries using the pIRQ interface in the PV
    case, and in the PVH case traps won't be setup for MSI entries for such
    devices.
    
    Until Xen is aware of devices in the VMD bus prevent the
    VMD_FEAT_CAN_BYPASS_MSI_REMAP capability from being used when running as
    any kind of Xen guest.
    
    The MSI remapping bypass is an optional feature of VMD bridges, and hence
    when running under Xen it will be masked and devices will be forced to
    redirect its interrupts from the VMD bridge.  That mode of operation must
    always be supported by VMD bridges and works when Xen is not aware of
    devices behind the VMD bridge.
    
    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Message-ID: <20250219092059.90850-3-roger.pau@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/amd/ibs: Fix perf_ibs_op.cnt_mask for CurCnt [+ + +]

Author: Ravi Bangoria <ravi.bangoria@amd.com>
Date:   Wed Jan 15 05:44:33 2025 +0000

    perf/amd/ibs: Fix perf_ibs_op.cnt_mask for CurCnt
    
    [ Upstream commit 46dcf85566170d4528b842bf83ffc350d71771fa ]
    
    IBS Op uses two counters: MaxCnt and CurCnt. MaxCnt is programmed with
    the desired sample period. IBS hw generates sample when CurCnt reaches
    to MaxCnt. The size of these counter used to be 20 bits but later they
    were extended to 27 bits. The 7 bit extension is indicated by CPUID
    Fn8000_001B_EAX[6 / OpCntExt].
    
    perf_ibs->cnt_mask variable contains bit masks for MaxCnt and CurCnt.
    But IBS driver does not set upper 7 bits of CurCnt in cnt_mask even
    when OpCntExt CPUID bit is set. Fix this.
    
    IBS driver uses cnt_mask[CurCnt] bits only while disabling an event.
    Fortunately, CurCnt bits are not read from MSR while re-enabling the
    event, instead MaxCnt is programmed with desired period and CurCnt is
    set to 0. Hence, we did not see any issues so far.
    
    Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Namhyung Kim <namhyung@kernel.org>
    Link: https://lkml.kernel.org/r/20250115054438.1021-5-ravi.bangoria@amd.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/arm-cmn: Fix REQ2/SNP2 mixup [+ + +]

Author: Robin Murphy <robin.murphy@arm.com>
Date:   Thu May 8 16:16:40 2025 +0100

    perf/arm-cmn: Fix REQ2/SNP2 mixup
    
    commit 11b0f576e0cbde6a12258f2af6753b17b8df342b upstream.
    
    Somehow the encodings for REQ2/SNP2 channels in XP events
    got mixed up... Unmix them.
    
    CC: stable@vger.kernel.org
    Fixes: 23760a014417 ("perf/arm-cmn: Add CMN-700 support")
    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Link: https://lore.kernel.org/r/087023e9737ac93d7ec7a841da904758c254cb01.1746717400.git.robin.murphy@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/arm-cmn: Initialise cmn->cpu earlier [+ + +]

Author: Robin Murphy <robin.murphy@arm.com>
Date:   Mon May 12 18:11:54 2025 +0100

    perf/arm-cmn: Initialise cmn->cpu earlier
    
    commit 597704e201068db3d104de3c7a4d447ff8209127 upstream.
    
    For all the complexity of handling affinity for CPU hotplug, what we've
    apparently managed to overlook is that arm_cmn_init_irqs() has in fact
    always been setting the *initial* affinity of all IRQs to CPU 0, not the
    CPU we subsequently choose for event scheduling. Oh dear.
    
    Cc: stable@vger.kernel.org
    Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
    Link: https://lore.kernel.org/r/b12fccba6b5b4d2674944f59e4daad91cd63420b.1747069914.git.robin.murphy@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/hw_breakpoint: Return EOPNOTSUPP for unsupported breakpoint type [+ + +]

Author: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Date:   Mon Mar 3 14:54:51 2025 +0530

    perf/hw_breakpoint: Return EOPNOTSUPP for unsupported breakpoint type
    
    [ Upstream commit 061c991697062f3bf87b72ed553d1d33a0e370dd ]
    
    Currently, __reserve_bp_slot() returns -ENOSPC for unsupported
    breakpoint types on the architecture. For example, powerpc
    does not support hardware instruction breakpoints. This causes
    the perf_skip BPF selftest to fail, as neither ENOENT nor
    EOPNOTSUPP is returned by perf_event_open for unsupported
    breakpoint types. As a result, the test that should be skipped
    for this arch is not correctly identified.
    
    To resolve this, hw_breakpoint_event_init() should exit early by
    checking for unsupported breakpoint types using
    hw_breakpoint_slots_cached() and return the appropriate error
    (-EOPNOTSUPP).
    
    Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Marco Elver <elver@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Link: https://lore.kernel.org/r/20250303092451.1862862-1-skb99@linux.ibm.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf: Avoid the read if the count is already updated [+ + +]

Author: Peter Zijlstra (Intel) <peterz@infradead.org>
Date:   Tue Jan 21 07:23:02 2025 -0800

    perf: Avoid the read if the count is already updated
    
    [ Upstream commit 8ce939a0fa194939cc1f92dbd8bc1a7806e7d40a ]
    
    The event may have been updated in the PMU-specific implementation,
    e.g., Intel PEBS counters snapshotting. The common code should not
    read and overwrite the value.
    
    The PERF_SAMPLE_READ in the data->sample_type can be used to detect
    whether the PMU-specific value is available. If yes, avoid the
    pmu->read() in the common code. Add a new flag, skip_read, to track the
    case.
    
    Factor out a perf_pmu_read() to clean up the code.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20250121152303.3128733-3-kan.liang@linux.intel.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

phy: core: don't require set_mode() callback for phy_get_mode() to work [+ + +]

Author: Dmitry Baryshkov <lumag@kernel.org>
Date:   Sun Feb 9 14:31:45 2025 +0200

    phy: core: don't require set_mode() callback for phy_get_mode() to work
    
    [ Upstream commit d58c04e305afbaa9dda7969151f06c4efe2c98b0 ]
    
    As reported by Damon Ding, the phy_get_mode() call doesn't work as
    expected unless the PHY driver has a .set_mode() call. This prompts PHY
    drivers to have empty stubs for .set_mode() for the sake of being able
    to get the mode.
    
    Make .set_mode() callback truly optional and update PHY's mode even if
    it there is none.
    
    Cc: Damon Ding <damon.ding@rock-chips.com>
    Link: https://lore.kernel.org/r/96f8310f-93f1-4bcb-8637-137e1159ff83@rock-chips.com
    Tested-by: Damon Ding <damon.ding@rock-chips.com>
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Link: https://lore.kernel.org/r/20250209-phy-fix-set-moe-v2-1-76e248503856@linaro.org
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

phy: renesas: rcar-gen3-usb2: Add support to initialize the bus [+ + +]

Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Date:   Thu Aug 22 18:27:55 2024 +0300

    phy: renesas: rcar-gen3-usb2: Add support to initialize the bus
    
    [ Upstream commit 4eae16375357a2a7e8501be5469532f7636064b3 ]
    
    The Renesas RZ/G3S need to initialize the USB BUS before transferring data
    due to hardware limitation. As the register that need to be touched for
    this is in the address space of the USB PHY, and the UBS PHY need to be
    initialized before any other USB drivers handling data transfer, add
    support to initialize the USB BUS.
    
    As the USB PHY is probed before any other USB drivers that enables
    clocks and de-assert the reset signals and the BUS initialization is done
    in the probe phase, we need to add code to de-assert reset signal and
    runtime resume the device (which enables its clocks) before accessing
    the registers.
    
    As the reset signals are not required by the USB PHY driver for the other
    USB PHY hardware variants, the reset signals and runtime PM was handled
    only in the function that initialize the USB BUS.
    
    The PHY initialization was done right after runtime PM enable to have
    all in place when the PHYs are registered.
    
    Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Link: https://lore.kernel.org/r/20240822152801.602318-11-claudiu.beznea.uj@bp.renesas.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Stable-dep-of: 9ce71e85b29e ("phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off [+ + +]

Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Date:   Wed May 7 15:50:31 2025 +0300

    phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off
    
    [ Upstream commit 9ce71e85b29eb63e48e294479742e670513f03a0 ]
    
    Assert PLL reset on PHY power off. This saves power.
    
    Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver")
    Cc: stable@vger.kernel.org
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Link: https://lore.kernel.org/r/20250507125032.565017-5-claudiu.beznea.uj@bp.renesas.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

phy: renesas: rcar-gen3-usb2: Lock around hardware registers and driver data [+ + +]

Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Date:   Wed May 7 15:50:30 2025 +0300

    phy: renesas: rcar-gen3-usb2: Lock around hardware registers and driver data
    
    [ Upstream commit 55a387ebb9219cbe4edfa8ba9996ccb0e7ad4932 ]
    
    The phy-rcar-gen3-usb2 driver exposes four individual PHYs that are
    requested and configured by PHY users. The struct phy_ops APIs access the
    same set of registers to configure all PHYs. Additionally, PHY settings can
    be modified through sysfs or an IRQ handler. While some struct phy_ops APIs
    are protected by a driver-wide mutex, others rely on individual
    PHY-specific mutexes.
    
    This approach can lead to various issues, including:
    1/ the IRQ handler may interrupt PHY settings in progress, racing with
       hardware configuration protected by a mutex lock
    2/ due to msleep(20) in rcar_gen3_init_otg(), while a configuration thread
       suspends to wait for the delay, another thread may try to configure
       another PHY (with phy_init() + phy_power_on()); re-running the
       phy_init() goes to the exact same configuration code, re-running the
       same hardware configuration on the same set of registers (and bits)
       which might impact the result of the msleep for the 1st configuring
       thread
    3/ sysfs can configure the hardware (though role_store()) and it can
       still race with the phy_init()/phy_power_on() APIs calling into the
       drivers struct phy_ops
    
    To address these issues, add a spinlock to protect hardware register access
    and driver private data structures (e.g., calls to
    rcar_gen3_is_any_rphy_initialized()). Checking driver-specific data remains
    necessary as all PHY instances share common settings. With this change,
    the existing mutex protection is removed and the cleanup.h helpers are
    used.
    
    While at it, to keep the code simpler, do not skip
    regulator_enable()/regulator_disable() APIs in
    rcar_gen3_phy_usb2_power_on()/rcar_gen3_phy_usb2_power_off() as the
    regulators enable/disable operations are reference counted anyway.
    
    Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver")
    Cc: stable@vger.kernel.org
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Link: https://lore.kernel.org/r/20250507125032.565017-4-claudiu.beznea.uj@bp.renesas.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Stable-dep-of: 9ce71e85b29e ("phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

phy: renesas: rcar-gen3-usb2: Move IRQ request in probe [+ + +]

Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Date:   Wed May 7 15:50:29 2025 +0300

    phy: renesas: rcar-gen3-usb2: Move IRQ request in probe
    
    [ Upstream commit de76809f60cc938d3580bbbd5b04b7d12af6ce3a ]
    
    Commit 08b0ad375ca6 ("phy: renesas: rcar-gen3-usb2: move IRQ registration
    to init") moved the IRQ request operation from probe to
    struct phy_ops::phy_init API to avoid triggering interrupts (which lead to
    register accesses) while the PHY clocks (enabled through runtime PM APIs)
    are not active. If this happens, it results in a synchronous abort.
    
    One way to reproduce this issue is by enabling CONFIG_DEBUG_SHIRQ, which
    calls free_irq() on driver removal.
    
    Move the IRQ request and free operations back to probe, and take the
    runtime PM state into account in IRQ handler. This commit is preparatory
    for the subsequent fixes in this series.
    
    Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Link: https://lore.kernel.org/r/20250507125032.565017-3-claudiu.beznea.uj@bp.renesas.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Stable-dep-of: 9ce71e85b29e ("phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pid: add pidfd_prepare() [+ + +]

Author: Christian Brauner <brauner@kernel.org>
Date:   Mon Mar 27 20:22:51 2023 +0200

    pid: add pidfd_prepare()
    
    commit 6ae930d9dbf2d093157be33428538c91966d8a9f upstream.
    
    Add a new helper that allows to reserve a pidfd and allocates a new
    pidfd file that stashes the provided struct pid. This will allow us to
    remove places that either open code this function or that call
    pidfd_create() but then have to call close_fd() because there are still
    failure points after pidfd_create() has been called.
    
    Reviewed-by: Jan Kara <jack@suse.cz>
    Message-Id: <20230327-pidfd-file-api-v1-1-5c0e9a3158e4@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl-tegra: Restore SFSEL bit when freeing pins [+ + +]

Author: Prathamesh Shete <pshete@nvidia.com>
Date:   Wed Mar 5 16:19:39 2025 +0530

    pinctrl-tegra: Restore SFSEL bit when freeing pins
    
    [ Upstream commit c12bfa0fee65940b10ff5187349f76c6f6b1df9c ]
    
    Each pin can be configured as a Special Function IO (SFIO) or GPIO,
    where the SFIO enables the pin to operate in alternative modes such as
    I2C, SPI, etc.
    
    The current implementation sets all the pins back to SFIO mode
    even if they were initially in GPIO mode. This can cause glitches
    on the pins when pinctrl_gpio_free() is called.
    
    Avoid these undesired glitches by storing the pin's SFIO/GPIO
    state on GPIO request and restoring it on GPIO free.
    
    Signed-off-by: Prathamesh Shete <pshete@nvidia.com>
    Link: https://lore.kernel.org/20250305104939.15168-2-pshete@nvidia.com
    Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pinctrl: bcm281xx: Use "unsigned int" instead of bare "unsigned" [+ + +]

Author: Artur Weber <aweber.kernel@gmail.com>
Date:   Mon Mar 3 21:54:47 2025 +0100

    pinctrl: bcm281xx: Use "unsigned int" instead of bare "unsigned"
    
    [ Upstream commit 07b5a2a13f4704c5eae3be7277ec54ffdba45f72 ]
    
    Replace uses of bare "unsigned" with "unsigned int" to fix checkpatch
    warnings. No functional change.
    
    Signed-off-by: Artur Weber <aweber.kernel@gmail.com>
    Link: https://lore.kernel.org/20250303-bcm21664-pinctrl-v3-2-5f8b80e4ab51@gmail.com
    Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pinctrl: devicetree: do not goto err when probing hogs in pinctrl_dt_to_map [+ + +]

Author: Valentin Caron <valentin.caron@foss.st.com>
Date:   Thu Jan 16 18:00:09 2025 +0100

    pinctrl: devicetree: do not goto err when probing hogs in pinctrl_dt_to_map
    
    [ Upstream commit c98868e816209e568c9d72023ba0bc1e4d96e611 ]
    
    Cross case in pinctrl framework make impossible to an hogged pin and
    another, not hogged, used within the same device-tree node. For example
    with this simplified device-tree :
    
      &pinctrl {
        pinctrl_pin_1: pinctrl-pin-1 {
          pins = "dummy-pinctrl-pin";
        };
      };
    
      &rtc {
        pinctrl-names = "default"
        pinctrl-0 = <&pinctrl_pin_1 &rtc_pin_1>
    
        rtc_pin_1: rtc-pin-1 {
          pins = "dummy-rtc-pin";
        };
      };
    
    "pinctrl_pin_1" configuration is never set. This produces this path in
    the code:
    
      really_probe()
        pinctrl_bind_pins()
        | devm_pinctrl_get()
        |   pinctrl_get()
        |     create_pinctrl()
        |       pinctrl_dt_to_map()
        |         // Hog pin create an abort for all pins of the node
        |         ret = dt_to_map_one_config()
        |         | /* Do not defer probing of hogs (circular loop) */
        |         | if (np_pctldev == p->dev->of_node)
        |         |   return -ENODEV;
        |         if (ret)
        |           goto err
        |
        call_driver_probe()
          stm32_rtc_probe()
            pinctrl_enable()
              pinctrl_claim_hogs()
                create_pinctrl()
                  for_each_maps(maps_node, i, map)
                    // Not hog pin is skipped
                    if (pctldev && strcmp(dev_name(pctldev->dev),
                                          map->ctrl_dev_name))
                      continue;
    
    At the first call of create_pinctrl() the hogged pin produces an abort to
    avoid a defer of hogged pins. All other pin configurations are trashed.
    
    At the second call, create_pinctrl is now called with pctldev parameter to
    get hogs, but in this context only hogs are set. And other pins are
    skipped.
    
    To handle this, do not produce an abort in the first call of
    create_pinctrl(). Classic pin configuration will be set in
    pinctrl_bind_pins() context. And the hogged pin configuration will be set
    in pinctrl_claim_hogs() context.
    
    Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
    Link: https://lore.kernel.org/20250116170009.2075544-1-valentin.caron@foss.st.com
    Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pinctrl: meson: define the pull up/down resistor value as 60 kOhm [+ + +]

Author: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Date:   Sat Mar 29 20:01:32 2025 +0100

    pinctrl: meson: define the pull up/down resistor value as 60 kOhm
    
    [ Upstream commit e56088a13708757da68ad035269d69b93ac8c389 ]
    
    The public datasheets of the following Amlogic SoCs describe a typical
    resistor value for the built-in pull up/down resistor:
    - Meson8/8b/8m2: not documented
    - GXBB (S905): 60 kOhm
    - GXL (S905X): 60 kOhm
    - GXM (S912): 60 kOhm
    - G12B (S922X): 60 kOhm
    - SM1 (S905D3): 60 kOhm
    
    The public G12B and SM1 datasheets additionally state min and max
    values:
    - min value: 50 kOhm for both, pull-up and pull-down
    - max value for the pull-up: 70 kOhm
    - max value for the pull-down: 130 kOhm
    
    Use 60 kOhm in the pinctrl-meson driver as well so it's shown in the
    debugfs output. It may not be accurate for Meson8/8b/8m2 but in reality
    60 kOhm is closer to the actual value than 1 Ohm.
    
    Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
    Link: https://lore.kernel.org/20250329190132.855196-1-martin.blumenstingl@googlemail.com
    Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pinctrl: tegra: Fix off by one in tegra_pinctrl_get_group() [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Wed Mar 19 10:05:47 2025 +0300

    pinctrl: tegra: Fix off by one in tegra_pinctrl_get_group()
    
    commit 5a062c3c3b82004766bc3ece82b594d337076152 upstream.
    
    This should be >= pmx->soc->ngroups instead of > to avoid an out of
    bounds access.  The pmx->soc->groups[] array is allocated in
    tegra_pinctrl_probe().
    
    Fixes: c12bfa0fee65 ("pinctrl-tegra: Restore SFSEL bit when freeing pins")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev>
    Link: https://lore.kernel.org/82b40d9d-b437-42a9-9eb3-2328aa6877ac@stanley.mountain
    Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store() [+ + +]

Author: Vladimir Moskovkin <Vladimir.Moskovkin@kaspersky.com>
Date:   Wed May 14 12:12:55 2025 +0000

    platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store()
    
    commit 4e89a4077490f52cde652d17e32519b666abf3a6 upstream.
    
    If the 'buf' array received from the user contains an empty string, the
    'length' variable will be zero. Accessing the 'buf' array element with
    index 'length - 1' will result in a buffer overflow.
    
    Add a check for an empty string.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Fixes: e8a60aa7404b ("platform/x86: Introduce support for Systems Management Driver over WMI for Dell Systems")
    Cc: stable@vger.kernel.org
    Signed-off-by: Vladimir Moskovkin <Vladimir.Moskovkin@kaspersky.com>
    Link: https://lore.kernel.org/r/39973642a4f24295b4a8fad9109c5b08@kaspersky.com
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys [+ + +]

Author: Valtteri Koskivuori <vkoskiv@gmail.com>
Date:   Fri May 9 21:42:49 2025 +0300

    platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys
    
    [ Upstream commit a7e255ff9fe4d9b8b902023aaf5b7a673786bb50 ]
    
    The S2110 has an additional set of media playback control keys enabled
    by a hardware toggle button that switches the keys between "Application"
    and "Player" modes. Toggling "Player" mode just shifts the scancode of
    each hotkey up by 4.
    
    Add defines for new scancodes, and a keymap and dmi id for the S2110.
    
    Tested on a Fujitsu Lifebook S2110.
    
    Signed-off-by: Valtteri Koskivuori <vkoskiv@gmail.com>
    Acked-by: Jonathan Woithe <jwoithe@just42.net>
    Link: https://lore.kernel.org/r/20250509184251.713003-1-vkoskiv@gmail.com
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

platform/x86: thinkpad_acpi: Ignore battery threshold change event notification [+ + +]

Author: Mark Pearson <mpearson-lenovo@squebb.ca>
Date:   Fri May 16 22:33:37 2025 -0400

    platform/x86: thinkpad_acpi: Ignore battery threshold change event notification
    
    [ Upstream commit 29e4e6b4235fefa5930affb531fe449cac330a72 ]
    
    If user modifies the battery charge threshold an ACPI event is generated.
    Confirmed with Lenovo FW team this is only generated on user event. As no
    action is needed, ignore the event and prevent spurious kernel logs.
    
    Reported-by: Derek Barbosa <debarbos@redhat.com>
    Closes: https://lore.kernel.org/platform-driver-x86/7e9a1c47-5d9c-4978-af20-3949d53fb5dc@app.fastmail.com/T/#m5f5b9ae31d3fbf30d7d9a9d76c15fb3502dfd903
    Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Reviewed-by: Armin Wolf <W_Armin@gmx.de>
    Link: https://lore.kernel.org/r/20250517023348.2962591-1-mpearson-lenovo@squebb.ca
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

platform/x86: thinkpad_acpi: Support also NEC Lavie X1475JAS [+ + +]

Author: John Chau <johnchau@0atlas.com>
Date:   Mon May 5 01:55:13 2025 +0900

    platform/x86: thinkpad_acpi: Support also NEC Lavie X1475JAS
    
    [ Upstream commit a032f29a15412fab9f4352e0032836d51420a338 ]
    
    Change get_thinkpad_model_data() to check for additional vendor name
    "NEC" in order to support NEC Lavie X1475JAS notebook (and perhaps
    more).
    
    The reason of this works with minimal changes is because NEC Lavie
    X1475JAS is a Thinkpad inside. ACPI dumps reveals its OEM ID to be
    "LENOVO", BIOS version "R2PET30W" matches typical Lenovo BIOS version,
    the existence of HKEY of LEN0268, with DMI fw string is "R2PHT24W".
    
    I compiled and tested with my own machine, attached the dmesg
    below as proof of work:
    [    6.288932] thinkpad_acpi: ThinkPad ACPI Extras v0.26
    [    6.288937] thinkpad_acpi: http://ibm-acpi.sf.net/
    [    6.288938] thinkpad_acpi: ThinkPad BIOS R2PET30W (1.11 ), EC R2PHT24W
    [    6.307000] thinkpad_acpi: radio switch found; radios are enabled
    [    6.307030] thinkpad_acpi: This ThinkPad has standard ACPI backlight brightness control, supported by the ACPI video driver
    [    6.307033] thinkpad_acpi: Disabling thinkpad-acpi brightness events by default...
    [    6.320322] thinkpad_acpi: rfkill switch tpacpi_bluetooth_sw: radio is unblocked
    [    6.371963] thinkpad_acpi: secondary fan control detected & enabled
    [    6.391922] thinkpad_acpi: battery 1 registered (start 0, stop 85, behaviours: 0x7)
    [    6.398375] input: ThinkPad Extra Buttons as /devices/platform/thinkpad_acpi/input/input13
    
    Signed-off-by: John Chau <johnchau@0atlas.com>
    Link: https://lore.kernel.org/r/20250504165513.295135-1-johnchau@0atlas.com
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pmdomain: imx: gpcv2: use proper helper for property detection [+ + +]

Author: Ahmad Fatoum <a.fatoum@pengutronix.de>
Date:   Tue Feb 18 20:18:32 2025 +0100

    pmdomain: imx: gpcv2: use proper helper for property detection
    
    [ Upstream commit 6568cb40e73163fa25e2779f7234b169b2e1a32e ]
    
    Starting with commit c141ecc3cecd7 ("of: Warn when of_property_read_bool()
    is used on non-boolean properties"), probing the gpcv2 device on i.MX8M
    SoCs leads to warnings when LOCKDEP is enabled.
    
    Fix this by checking property presence with of_property_present as
    intended.
    
    Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
    Link: https://lore.kernel.org/r/20250218-gpcv2-of-property-present-v1-1-3bb1a9789654@pengutronix.de
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pNFS/flexfiles: Report ENETDOWN as a connection error [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Mar 20 12:45:01 2025 -0400

    pNFS/flexfiles: Report ENETDOWN as a connection error
    
    [ Upstream commit aa42add73ce9b9e3714723d385c254b75814e335 ]
    
    If the client should see an ENETDOWN when trying to connect to the data
    server, it might still be able to talk to the metadata server through
    another NIC. If so, report the error.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Tested-by: Jeff Layton <jlayton@kernel.org>
    Acked-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

posix-timers: Add cond_resched() to posix_timer_add() search loop [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Mar 8 17:48:17 2025 +0100

    posix-timers: Add cond_resched() to posix_timer_add() search loop
    
    [ Upstream commit 5f2909c6cd13564a07ae692a95457f52295c4f22 ]
    
    With a large number of POSIX timers the search for a valid ID might cause a
    soft lockup on PREEMPT_NONE/VOLUNTARY kernels.
    
    Add cond_resched() to the loop to prevent that.
    
    [ tglx: Split out from Eric's series ]
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Link: https://lore.kernel.org/all/20250214135911.2037402-2-edumazet@google.com
    Link: https://lore.kernel.org/all/20250308155623.635612865@linutronix.de
    Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7 [+ + +]

Author: Andreas Schwab <schwab@linux-m68k.org>
Date:   Mon Jan 13 18:19:09 2025 +0100

    powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
    
    [ Upstream commit 7e67ef889c9ab7246547db73d524459f47403a77 ]
    
    Similar to the PowerMac3,1, the PowerBook6,7 is missing the #size-cells
    property on the i2s node.
    
    Depends-on: commit 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells handling")
    Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
    Acked-by: Rob Herring (Arm) <robh@kernel.org>
    [maddy: added "commit" work in depends-on to avoid checkpatch error]
    Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
    Link: https://patch.msgid.link/875xmizl6a.fsf@igel.home
    Signed-off-by: Sasha Levin <sashal@kernel.org>

r8152: add vendor/device ID pair for Dell Alienware AW1022z [+ + +]

Author: Aleksander Jan Bajkowski <olek2@wp.pl>
Date:   Thu Feb 6 23:40:33 2025 +0100

    r8152: add vendor/device ID pair for Dell Alienware AW1022z
    
    [ Upstream commit 848b09d53d923b4caee5491f57a5c5b22d81febc ]
    
    The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller.
    
    Add the vendor and product ID values to the driver. This makes Ethernet
    work with the adapter.
    
    Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
    Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

r8169: don't scan PHY addresses > 0 [+ + +]

Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Tue Feb 4 07:58:17 2025 +0100

    r8169: don't scan PHY addresses > 0
    
    [ Upstream commit faac69a4ae5abb49e62c79c66b51bb905c9aa5ec ]
    
    The PHY address is a dummy, because r8169 PHY access registers
    don't support a PHY address. Therefore scan address 0 only.
    
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Link: https://patch.msgid.link/830637dd-4016-4a68-92b3-618fcac6589d@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

rcu: fix header guard for rcu_all_qs() [+ + +]

Author: Ankur Arora <ankur.a.arora@oracle.com>
Date:   Thu Dec 12 20:06:52 2024 -0800

    rcu: fix header guard for rcu_all_qs()
    
    [ Upstream commit ad6b5b73ff565e88aca7a7d1286788d80c97ba71 ]
    
    rcu_all_qs() is defined for !CONFIG_PREEMPT_RCU but the declaration
    is conditioned on CONFIG_PREEMPTION.
    
    With CONFIG_PREEMPT_LAZY, CONFIG_PREEMPTION=y does not imply
    CONFIG_PREEMPT_RCU=y.
    
    Decouple the two.
    
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y [+ + +]

Author: Ankur Arora <ankur.a.arora@oracle.com>
Date:   Thu Dec 12 20:06:56 2024 -0800

    rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
    
    [ Upstream commit 83b28cfe796464ebbde1cf7916c126da6d572685 ]
    
    With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
    states for read-side critical sections via rcu_all_qs().
    One reason why this was needed: lacking preempt-count, the tick
    handler has no way of knowing whether it is executing in a
    read-side critical section or not.
    
    With (PREEMPT_LAZY=y, PREEMPT_DYNAMIC=n), we get (PREEMPT_COUNT=y,
    PREEMPT_RCU=n). In this configuration cond_resched() is a stub and
    does not provide quiescent states via rcu_all_qs().
    (PREEMPT_RCU=y provides this information via rcu_read_unlock() and
    its nesting counter.)
    
    So, use the availability of preempt_count() to report quiescent states
    in rcu_flavor_sched_clock_irq().
    
    Suggested-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

rcu: handle unstable rdp in rcu_read_unlock_strict() [+ + +]

Author: Ankur Arora <ankur.a.arora@oracle.com>
Date:   Thu Dec 12 20:06:55 2024 -0800

    rcu: handle unstable rdp in rcu_read_unlock_strict()
    
    [ Upstream commit fcf0e25ad4c8d14d2faab4d9a17040f31efce205 ]
    
    rcu_read_unlock_strict() can be called with preemption enabled
    which can make for an unstable rdp and a racy norm value.
    
    Fix this by dropping the preempt-count in __rcu_read_unlock()
    after the call to rcu_read_unlock_strict(), adjusting the
    preempt-count check appropriately.
    
    Suggested-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

RDMA/core: Fix best page size finding when it can cross SG entries [+ + +]

Author: Michael Margolin <mrgolin@amazon.com>
Date:   Mon Feb 17 14:16:23 2025 +0000

    RDMA/core: Fix best page size finding when it can cross SG entries
    
    [ Upstream commit 486055f5e09df959ad4e3aa4ee75b5c91ddeec2e ]
    
    A single scatter-gather entry is limited by a 32 bits "length" field
    that is practically 4GB - PAGE_SIZE. This means that even when the
    memory is physically contiguous, we might need more than one entry to
    represent it. Additionally when using dmabuf, the sg_table might be
    originated outside the subsystem and optimized for other needs.
    
    For instance an SGT of 16GB GPU continuous memory might look like this:
    (a real life example)
    
    dma_address 34401400000, length fffff000
    dma_address 345013ff000, length fffff000
    dma_address 346013fe000, length fffff000
    dma_address 347013fd000, length fffff000
    dma_address 348013fc000, length 4000
    
    Since ib_umem_find_best_pgsz works within SG entries, in the above case
    we will result with the worst possible 4KB page size.
    
    Fix this by taking into consideration only the alignment of addresses of
    real discontinuity points rather than treating SG entries as such, and
    adjust the page iterator to correctly handle cross SG entry pages.
    
    There is currently an assumption that drivers do not ask for pages
    bigger than maximal DMA size supported by their devices.
    
    Reviewed-by: Firas Jahjah <firasj@amazon.com>
    Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
    Signed-off-by: Michael Margolin <mrgolin@amazon.com>
    Link: https://patch.msgid.link/20250217141623.12428-1-mrgolin@amazon.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject() [+ + +]

Author: Maher Sanalla <msanalla@nvidia.com>
Date:   Wed Feb 26 15:54:13 2025 +0200

    RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject()
    
    [ Upstream commit 81f8f7454ad9e0bf95efdec6542afdc9a6ab1e24 ]
    
    Currently, the IB uverbs API calls uobj_get_uobj_read(), which in turn
    uses the rdma_lookup_get_uobject() helper to retrieve user objects.
    In case of failure, uobj_get_uobj_read() returns NULL, overriding the
    error code from rdma_lookup_get_uobject(). The IB uverbs API then
    translates this NULL to -EINVAL, masking the actual error and
    complicating debugging. For example, applications calling ibv_modify_qp
    that fails with EBUSY when retrieving the QP uobject will see the
    overridden error code EINVAL instead, masking the actual error.
    
    Furthermore, based on rdma-core commit:
    "2a22f1ced5f3 ("Merge pull request #1568 from jakemoroni/master")"
    Kernel's IB uverbs return values are either ignored and passed on as is
    to application or overridden with other errnos in a few cases.
    
    Thus, to improve error reporting and debuggability, propagate the
    original error from rdma_lookup_get_uobject() instead of replacing it
    with EINVAL.
    
    Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
    Link: https://patch.msgid.link/64f9d3711b183984e939962c2f83383904f97dfb.1740577869.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

regulator: ad5398: Add device tree support [+ + +]

Author: Isaac Scott <isaac.scott@ideasonboard.com>
Date:   Tue Jan 28 17:31:43 2025 +0000

    regulator: ad5398: Add device tree support
    
    [ Upstream commit 5a6a461079decea452fdcae955bccecf92e07e97 ]
    
    Previously, the ad5398 driver used only platform_data, which is
    deprecated in favour of device tree. This caused the AD5398 to fail to
    probe as it could not load its init_data. If the AD5398 has a device
    tree node, pull the init_data from there using
    of_get_regulator_init_data.
    
    Signed-off-by: Isaac Scott <isaac.scott@ideasonboard.com>
    Acked-by: Michael Hennerich <michael.hennerich@analog.com>
    Link: https://patch.msgid.link/20250128173143.959600-4-isaac.scott@ideasonboard.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: qcom_wcnss: Fix on platforms without fallback regulators [+ + +]

Author: Matti Lehtimäki <matti.lehtimaki@gmail.com>
Date:   Mon May 12 02:40:15 2025 +0300

    remoteproc: qcom_wcnss: Fix on platforms without fallback regulators
    
    [ Upstream commit 4ca45af0a56d00b86285d6fdd720dca3215059a7 ]
    
    Recent change to handle platforms with only single power domain broke
    pronto-v3 which requires power domains and doesn't have fallback voltage
    regulators in case power domains are missing. Add a check to verify
    the number of fallback voltage regulators before using the code which
    handles single power domain situation.
    
    Fixes: 65991ea8a6d1 ("remoteproc: qcom_wcnss: Handle platforms with only single power domain")
    Signed-off-by: Matti Lehtimäki <matti.lehtimaki@gmail.com>
    Tested-by: Luca Weiss <luca.weiss@fairphone.com> # sdm632-fairphone-fp3
    Link: https://lore.kernel.org/r/20250511234026.94735-1-matti.lehtimaki@gmail.com
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: qcom_wcnss: Handle platforms with only single power domain [+ + +]

Author: Matti Lehtimäki <matti.lehtimaki@gmail.com>
Date:   Thu Feb 6 20:56:48 2025 +0100

    remoteproc: qcom_wcnss: Handle platforms with only single power domain
    
    [ Upstream commit 65991ea8a6d1e68effdc01d95ebe39f1653f7b71 ]
    
    Both MSM8974 and MSM8226 have only CX as power domain with MX & PX being
    handled as regulators. Handle this case by reodering pd_names to have CX
    first, and handling that the driver core will already attach a single
    power domain internally.
    
    Signed-off-by: Matti Lehtimäki <matti.lehtimaki@gmail.com>
    [luca: minor changes]
    Signed-off-by: Luca Weiss <luca@lucaweiss.eu>
    Link: https://lore.kernel.org/r/20250206-wcnss-singlepd-v2-2-9a53ee953dee@lucaweiss.eu
    [bjorn: Added missing braces to else after multi-statement if]
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection" [+ + +]

Author: Jernej Skrabec <jernej.skrabec@gmail.com>
Date:   Sun Apr 13 15:58:48 2025 +0200

    Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection"
    
    [ Upstream commit 573f99c7585f597630f14596550c79e73ffaeef4 ]
    
    This reverts commit 531fdbeedeb89bd32018a35c6e137765c9cc9e97.
    
    Hardware that uses I2C wasn't designed with high speeds in mind, so
    communication with PMIC via RSB can intermittently fail. Go back to I2C
    as higher speed and efficiency isn't worth the trouble.
    
    Fixes: 531fdbeedeb8 ("arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection")
    Link: https://github.com/LibreELEC/LibreELEC.tv/issues/7731
    Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
    Link: https://patch.msgid.link/20250413135848.67283-1-jernej.skrabec@gmail.com
    Signed-off-by: Chen-Yu Tsai <wens@csie.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "drm/amd: Keep display off while going into S4" [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Thu May 22 09:13:28 2025 -0500

    Revert "drm/amd: Keep display off while going into S4"
    
    commit 7e7cb7a13c81073d38a10fa7b450d23712281ec4 upstream.
    
    commit 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4")
    attempted to keep displays off during the S4 sequence by not resuming
    display IP.  This however leads to hangs because DRM clients such as the
    console can try to access registers and cause a hang.
    
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155
    Fixes: 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4")
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit e485502c37b097b0bd773baa7e2741bf7bd2909a)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtc: ds1307: stop disabling alarms on probe [+ + +]

Author: Alexandre Belloni <alexandre.belloni@bootlin.com>
Date:   Mon Mar 3 23:37:44 2025 +0100

    rtc: ds1307: stop disabling alarms on probe
    
    [ Upstream commit dcec12617ee61beed928e889607bf37e145bf86b ]
    
    It is a bad practice to disable alarms on probe or remove as this will
    prevent alarms across reboots.
    
    Link: https://lore.kernel.org/r/20250303223744.1135672-1-alexandre.belloni@bootlin.com
    Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

rtc: rv3032: fix EERD location [+ + +]

Author: Alexandre Belloni <alexandre.belloni@bootlin.com>
Date:   Thu Mar 6 22:42:41 2025 +0100

    rtc: rv3032: fix EERD location
    
    [ Upstream commit b0f9cb4a0706b0356e84d67e48500b77b343debe ]
    
    EERD is bit 2 in CTRL1
    
    Link: https://lore.kernel.org/r/20250306214243.1167692-1-alexandre.belloni@bootlin.com
    Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log [+ + +]

Author: Anthony Krowiak <akrowiak@linux.ibm.com>
Date:   Tue Mar 11 06:32:57 2025 -0400

    s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
    
    [ Upstream commit d33d729afcc8ad2148d99f9bc499b33fd0c0d73b ]
    
    An erroneous message is written to the kernel log when either of the
    following actions are taken by a user:
    
    1. Assign an adapter or domain to a vfio_ap mediated device via its sysfs
       assign_adapter or assign_domain attributes that would result in one or
       more AP queues being assigned that are already assigned to a different
       mediated device. Sharing of queues between mdevs is not allowed.
    
    2. Reserve an adapter or domain for the host device driver via the AP bus
       driver's sysfs apmask or aqmask attribute that would result in providing
       host access to an AP queue that is in use by a vfio_ap mediated device.
       Reserving a queue for a host driver that is in use by an mdev is not
       allowed.
    
    In both cases, the assignment will return an error; however, a message like
    the following is written to the kernel log:
    
    vfio_ap_mdev e1839397-51a0-4e3c-91e0-c3b9c3d3047d: Userspace may not
    re-assign queue 00.0028 already assigned to \
    e1839397-51a0-4e3c-91e0-c3b9c3d3047d
    
    Notice the mdev reporting the error is the same as the mdev identified
    in the message as the one to which the queue is being assigned.
    It is perfectly okay to assign a queue to an mdev to which it is
    already assigned; the assignment is simply ignored by the vfio_ap device
    driver.
    
    This patch logs more descriptive and accurate messages for both 1 and 2
    above to the kernel log:
    
    Example for 1:
    vfio_ap_mdev 0fe903a0-a323-44db-9daf-134c68627d61: Userspace may not assign
    queue 00.0033 to mdev: already assigned to \
    62177883-f1bb-47f0-914d-32a22e3a8804
    
    Example for 2:
    vfio_ap_mdev 62177883-f1bb-47f0-914d-32a22e3a8804: Can not reserve queue
    00.0033 for host driver: in use by mdev
    
    Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
    Link: https://lore.kernel.org/r/20250311103304.1539188-1-akrowiak@linux.ibm.com
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora [+ + +]

Author: Haoran Jiang <jianghaoran@kylinos.cn>
Date:   Fri Apr 25 17:50:42 2025 +0800

    samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora
    
    [ Upstream commit 548762f05d19c5542db7590bcdfb9be1fb928376 ]
    
    When building the latest samples/bpf on LoongArch Fedora
    
         make M=samples/bpf
    
    There are compilation errors as follows:
    
    In file included from ./linux/samples/bpf/sockex2_kern.c:2:
    In file included from ./include/uapi/linux/in.h:25:
    In file included from ./include/linux/socket.h:8:
    In file included from ./include/linux/uio.h:9:
    In file included from ./include/linux/thread_info.h:60:
    In file included from ./arch/loongarch/include/asm/thread_info.h:15:
    In file included from ./arch/loongarch/include/asm/processor.h:13:
    In file included from ./arch/loongarch/include/asm/cpu-info.h:11:
    ./arch/loongarch/include/asm/loongarch.h:13:10: fatal error: 'larchintrin.h' file not found
             ^~~~~~~~~~~~~~~
    1 error generated.
    
    larchintrin.h is included in /usr/lib64/clang/14.0.6/include,
    and the header file location is specified at compile time.
    
    Test on LoongArch Fedora:
    https://github.com/fedora-remix-loongarch/releases-info
    
    Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
    Signed-off-by: zhangxi <zhangxi@kylinos.cn>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20250425095042.838824-1-jianghaoran@kylinos.cn
    Signed-off-by: Sasha Levin <sashal@kernel.org>

sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue() [+ + +]

Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Sun May 18 15:20:37 2025 -0700

    sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
    
    [ Upstream commit 3f981138109f63232a5fb7165938d4c945cc1b9d ]
    
    When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
    child qdisc's peek() operation before incrementing sch->q.qlen and
    sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
    trigger an immediate dequeue and potential packet drop. In such cases,
    qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
    have not yet been updated, leading to inconsistent queue accounting. This
    can leave an empty HFSC class in the active list, causing further
    consequences like use-after-free.
    
    This patch fixes the bug by moving the increment of sch->q.qlen and
    sch->qstats.backlog before the call to the child qdisc's peek() operation.
    This ensures that queue length and backlog are always accurate when packet
    drops or dequeues are triggered during the peek.
    
    Fixes: 12d0ad3be9c3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline")
    Reported-by: Mingi Cho <mincho@theori.io>
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com
    Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: lpfc: Free phba irq in lpfc_sli4_enable_msi() when pci_irq_vector() fails [+ + +]

Author: Justin Tee <justin.tee@broadcom.com>
Date:   Thu Jan 30 16:05:20 2025 -0800

    scsi: lpfc: Free phba irq in lpfc_sli4_enable_msi() when pci_irq_vector() fails
    
    [ Upstream commit f0842902b383982d1f72c490996aa8fc29a7aa0d ]
    
    Fix smatch warning regarding missed calls to free_irq().  Free the phba IRQ
    in the failed pci_irq_vector cases.
    
    lpfc_init.c: lpfc_sli4_enable_msi() warn: 'phba->pcidev->irq' from
                 request_irq() not released.
    
    Signed-off-by: Justin Tee <justin.tee@broadcom.com>
    Link: https://lore.kernel.org/r/20250131000524.163662-3-justintee8345@gmail.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: lpfc: Handle duplicate D_IDs in ndlp search-by D_ID routine [+ + +]

Author: Justin Tee <justin.tee@broadcom.com>
Date:   Thu Jan 30 16:05:22 2025 -0800

    scsi: lpfc: Handle duplicate D_IDs in ndlp search-by D_ID routine
    
    [ Upstream commit 56c3d809b7b450379162d0b8a70bbe71ab8db706 ]
    
    After a port swap between separate fabrics, there may be multiple nodes in
    the vport's fc_nodes list with the same fabric well known address.
    Duplication is temporary and eventually resolves itself after dev_loss_tmo
    expires, but nameserver queries may still occur before dev_loss_tmo.  This
    possibly results in returning stale fabric ndlp objects.  Fix by adding an
    nlp_state check to ensure the ndlp search routine returns the correct newer
    allocated ndlp fabric object.
    
    Signed-off-by: Justin Tee <justin.tee@broadcom.com>
    Link: https://lore.kernel.org/r/20250131000524.163662-5-justintee8345@gmail.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: mpi3mr: Add level check to control event logging [+ + +]

Author: Ranjan Kumar <ranjan.kumar@broadcom.com>
Date:   Tue Apr 15 15:45:46 2025 +0530

    scsi: mpi3mr: Add level check to control event logging
    
    [ Upstream commit b0b7ee3b574a72283399b9232f6190be07f220c0 ]
    
    Ensure event logs are only generated when the debug logging level
    MPI3_DEBUG_EVENT is enabled. This prevents unnecessary logging.
    
    Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
    Link: https://lore.kernel.org/r/20250415101546.204018-1-ranjan.kumar@broadcom.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: mpt3sas: Send a diag reset if target reset fails [+ + +]

Author: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Date:   Wed Feb 12 17:26:55 2025 -0800

    scsi: mpt3sas: Send a diag reset if target reset fails
    
    [ Upstream commit 5612d6d51ed2634a033c95de2edec7449409cbb9 ]
    
    When an IOCTL times out and driver issues a target reset, if firmware
    fails the task management elevate the recovery by issuing a diag reset to
    controller.
    
    Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
    Link: https://lore.kernel.org/r/1739410016-27503-5-git-send-email-shivasharan.srikanteshwara@broadcom.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: st: ERASE does not change tape location [+ + +]

Author: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Date:   Tue Mar 11 13:25:15 2025 +0200

    scsi: st: ERASE does not change tape location
    
    [ Upstream commit ad77cebf97bd42c93ab4e3bffd09f2b905c1959a ]
    
    The SCSI ERASE command erases from the current position onwards.  Don't
    clear the position variables.
    
    Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
    Link: https://lore.kernel.org/r/20250311112516.5548-3-Kai.Makisara@kolumbus.fi
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: st: Restore some drive settings after reset [+ + +]

Author: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Date:   Mon Jan 20 21:49:22 2025 +0200

    scsi: st: Restore some drive settings after reset
    
    [ Upstream commit 7081dc75df79696d8322d01821c28e53416c932c ]
    
    Some of the allowed operations put the tape into a known position to
    continue operation assuming only the tape position has changed.  But reset
    sets partition, density and block size to drive default values. These
    should be restored to the values before reset.
    
    Normally the current block size and density are stored by the drive.  If
    the settings have been changed, the changed values have to be saved by the
    driver across reset.
    
    Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
    Link: https://lore.kernel.org/r/20250120194925.44432-2-Kai.Makisara@kolumbus.fi
    Reviewed-by: John Meneghini <jmeneghi@redhat.com>
    Tested-by: John Meneghini <jmeneghi@redhat.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: st: Tighten the page format heuristics with MODE SELECT [+ + +]

Author: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Date:   Tue Mar 11 13:25:16 2025 +0200

    scsi: st: Tighten the page format heuristics with MODE SELECT
    
    [ Upstream commit 8db816c6f176321e42254badd5c1a8df8bfcfdb4 ]
    
    In the days when SCSI-2 was emerging, some drives did claim SCSI-2 but did
    not correctly implement it. The st driver first tries MODE SELECT with the
    page format bit set to set the block descriptor.  If not successful, the
    non-page format is tried.
    
    The test only tests the sense code and this triggers also from illegal
    parameter in the parameter list. The test is limited to "old" devices and
    made more strict to remove false alarms.
    
    Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
    Link: https://lore.kernel.org/r/20250311112516.5548-4-Kai.Makisara@kolumbus.fi
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: target: iscsi: Fix timeout on deleted connection [+ + +]

Author: Dmitry Bogdanov <d.bogdanov@yadro.com>
Date:   Tue Dec 24 13:17:57 2024 +0300

    scsi: target: iscsi: Fix timeout on deleted connection
    
    [ Upstream commit 7f533cc5ee4c4436cee51dc58e81dfd9c3384418 ]
    
    NOPIN response timer may expire on a deleted connection and crash with
    such logs:
    
    Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus (null),i,0x00023d000125,iqn.2017-01.com.iscsi.target,t,0x3d
    
    BUG: Kernel NULL pointer dereference on read at 0x00000000
    NIP  strlcpy+0x8/0xb0
    LR iscsit_fill_cxn_timeout_err_stats+0x5c/0xc0 [iscsi_target_mod]
    Call Trace:
     iscsit_handle_nopin_response_timeout+0xfc/0x120 [iscsi_target_mod]
     call_timer_fn+0x58/0x1f0
     run_timer_softirq+0x740/0x860
     __do_softirq+0x16c/0x420
     irq_exit+0x188/0x1c0
     timer_interrupt+0x184/0x410
    
    That is because nopin response timer may be re-started on nopin timer
    expiration.
    
    Stop nopin timer before stopping the nopin response timer to be sure
    that no one of them will be re-started.
    
    Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
    Link: https://lore.kernel.org/r/20241224101757.32300-1-d.bogdanov@yadro.com
    Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure [+ + +]

Author: Ihor Solodrai <ihor.solodrai@linux.dev>
Date:   Wed Apr 16 10:02:46 2025 -0700

    selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure
    
    [ Upstream commit f2858f308131a09e33afb766cd70119b5b900569 ]
    
    "sockmap_ktls disconnect_after_delete" test has been failing on BPF CI
    after recent merges from netdev:
    * https://github.com/kernel-patches/bpf/actions/runs/14458537639
    * https://github.com/kernel-patches/bpf/actions/runs/14457178732
    
    It happens because disconnect has been disabled for TLS [1], and it
    renders the test case invalid.
    
    Removing all the test code creates a conflict between bpf and
    bpf-next, so for now only remove the offending assert [2].
    
    The test will be removed later on bpf-next.
    
    [1] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/
    [2] https://lore.kernel.org/bpf/cfc371285323e1a3f3b006bfcf74e6cf7ad65258@linux.dev/
    
    Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
    Link: https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests/net: have `gro.sh -t` return a correct exit code [+ + +]

Author: Kevin Krakauer <krakauer@google.com>
Date:   Wed Feb 26 11:27:23 2025 -0800

    selftests/net: have `gro.sh -t` return a correct exit code
    
    [ Upstream commit 784e6abd99f24024a8998b5916795f0bec9d2fd9 ]
    
    Modify gro.sh to return a useful exit code when the -t flag is used. It
    formerly returned 0 no matter what.
    
    Tested: Ran `gro.sh -t large` and verified that test failures return 1.
    Signed-off-by: Kevin Krakauer <krakauer@google.com>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Link: https://patch.msgid.link/20250226192725.621969-2-krakauer@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

serial: mctrl_gpio: split disable_ms into sync and no_sync APIs [+ + +]

Author: Alexis Lothoré <alexis.lothore@bootlin.com>
Date:   Mon Feb 17 07:21:53 2025 +0100

    serial: mctrl_gpio: split disable_ms into sync and no_sync APIs
    
    [ Upstream commit 1bd2aad57da95f7f2d2bb52f7ad15c0f4993a685 ]
    
    The following splat has been observed on a SAMA5D27 platform using
    atmel_serial:
    
    BUG: sleeping function called from invalid context at kernel/irq/manage.c:738
    in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 27, name: kworker/u5:0
    preempt_count: 1, expected: 0
    INFO: lockdep is turned off.
    irq event stamp: 0
    hardirqs last  enabled at (0): [<00000000>] 0x0
    hardirqs last disabled at (0): [<c01588f0>] copy_process+0x1c4c/0x7bec
    softirqs last  enabled at (0): [<c0158944>] copy_process+0x1ca0/0x7bec
    softirqs last disabled at (0): [<00000000>] 0x0
    CPU: 0 UID: 0 PID: 27 Comm: kworker/u5:0 Not tainted 6.13.0-rc7+ #74
    Hardware name: Atmel SAMA5
    Workqueue: hci0 hci_power_on [bluetooth]
    Call trace:
      unwind_backtrace from show_stack+0x18/0x1c
      show_stack from dump_stack_lvl+0x44/0x70
      dump_stack_lvl from __might_resched+0x38c/0x598
      __might_resched from disable_irq+0x1c/0x48
      disable_irq from mctrl_gpio_disable_ms+0x74/0xc0
      mctrl_gpio_disable_ms from atmel_disable_ms.part.0+0x80/0x1f4
      atmel_disable_ms.part.0 from atmel_set_termios+0x764/0x11e8
      atmel_set_termios from uart_change_line_settings+0x15c/0x994
      uart_change_line_settings from uart_set_termios+0x2b0/0x668
      uart_set_termios from tty_set_termios+0x600/0x8ec
      tty_set_termios from ttyport_set_flow_control+0x188/0x1e0
      ttyport_set_flow_control from wilc_setup+0xd0/0x524 [hci_wilc]
      wilc_setup [hci_wilc] from hci_dev_open_sync+0x330/0x203c [bluetooth]
      hci_dev_open_sync [bluetooth] from hci_dev_do_open+0x40/0xb0 [bluetooth]
      hci_dev_do_open [bluetooth] from hci_power_on+0x12c/0x664 [bluetooth]
      hci_power_on [bluetooth] from process_one_work+0x998/0x1a38
      process_one_work from worker_thread+0x6e0/0xfb4
      worker_thread from kthread+0x3d4/0x484
      kthread from ret_from_fork+0x14/0x28
    
    This warning is emitted when trying to toggle, at the highest level,
    some flow control (with serdev_device_set_flow_control) in a device
    driver. At the lowest level, the atmel_serial driver is using
    serial_mctrl_gpio lib to enable/disable the corresponding IRQs
    accordingly.  The warning emitted by CONFIG_DEBUG_ATOMIC_SLEEP is due to
    disable_irq (called in mctrl_gpio_disable_ms) being possibly called in
    some atomic context (some tty drivers perform modem lines configuration
    in regions protected by port lock).
    
    Split mctrl_gpio_disable_ms into two differents APIs, a non-blocking one
    and a blocking one. Replace mctrl_gpio_disable_ms calls with the
    relevant version depending on whether the call is protected by some port
    lock.
    
    Suggested-by: Jiri Slaby <jirislaby@kernel.org>
    Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
    Acked-by: Richard Genoud <richard.genoud@bootlin.com>
    Link: https://lore.kernel.org/r/20250217-atomic_sleep_mctrl_serial_gpio-v3-1-59324b313eef@bootlin.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

serial: sh-sci: Save and restore more registers [+ + +]

Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Tue Mar 4 20:06:11 2025 +0100

    serial: sh-sci: Save and restore more registers
    
    commit 81100b9a7b0515132996d62a7a676a77676cb6e3 upstream.
    
    On (H)SCIF with a Baud Rate Generator for External Clock (BRG), there
    are multiple ways to configure the requested serial speed.  If firmware
    uses a different method than Linux, and if any debug info is printed
    after the Bit Rate Register (SCBRR) is restored, but before termios is
    reconfigured (which configures the alternative method), the system may
    lock-up during resume.
    
    Fix this by saving and restoring the contents of the BRG Frequency
    Division (SCDL) and Clock Select (SCCKS) registers as well.
    
    Also save and restore the HSCIF's Sampling Rate Register (HSSRR), which
    configures the sampling point, and the SCIFA/SCIFB's Serial Port Control
    and Data Registers (SCPCR/SCPDR), which configure the optional control
    flow signals.
    
    After this, all registers that are not saved/restored are either:
      - read-only,
      - write-only,
      - status registers containing flags with clear-after-set semantics,
      - FIFO Data Count Trigger registers, which do not matter much for
        the serial console.
    
    Fixes: 22a6984c5b5df8ea ("serial: sh-sci: Update the suspend/resume support")
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Reviewed-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Link: https://lore.kernel.org/r/11c2eab45d48211e75d8b8202cce60400880fe55.1741114989.git.geert+renesas@glider.be
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

serial: sh-sci: Update the suspend/resume support [+ + +]

Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Date:   Fri Feb 7 13:33:13 2025 +0200

    serial: sh-sci: Update the suspend/resume support
    
    [ Upstream commit 22a6984c5b5df8eab864d7f3e8b94d5a554d31ab ]
    
    The Renesas RZ/G3S supports a power saving mode where power to most of the
    SoC components is turned off. When returning from this power saving mode,
    SoC components need to be re-configured.
    
    The SCIFs on the Renesas RZ/G3S need to be re-configured as well when
    returning from this power saving mode. The sh-sci code already configures
    the SCIF clocks, power domain and registers by calling uart_resume_port()
    in sci_resume(). On suspend path the SCIF UART ports are suspended
    accordingly (by calling uart_suspend_port() in sci_suspend()). The only
    missing setting is the reset signal. For this assert/de-assert the reset
    signal on driver suspend/resume.
    
    In case the no_console_suspend is specified by the user, the registers need
    to be saved on suspend path and restore on resume path. To do this the
    sci_console_save()/sci_console_restore() functions were added. There is no
    need to cache/restore the status or FIFO registers. Only the control
    registers. The registers that will be saved/restored on suspend/resume are
    specified by the struct sci_suspend_regs data structure.
    
    Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Link: https://lore.kernel.org/r/20250207113313.545432-1-claudiu.beznea.uj@bp.renesas.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

smack: recognize ipv4 CIPSO w/o categories [+ + +]

Author: Konstantin Andreev <andreev@swemel.ru>
Date:   Fri Jan 17 02:40:34 2025 +0300

    smack: recognize ipv4 CIPSO w/o categories
    
    [ Upstream commit a158a937d864d0034fea14913c1f09c6d5f574b8 ]
    
    If SMACK label has CIPSO representation w/o categories, e.g.:
    
    | # cat /smack/cipso2
    | foo  10
    | @ 250/2
    | ...
    
    then SMACK does not recognize such CIPSO in input ipv4 packets
    and substitues '*' label instead. Audit records may look like
    
    | lsm=SMACK fn=smack_socket_sock_rcv_skb action=denied
    |   subject="*" object="_" requested=w pid=0 comm="swapper/1" ...
    
    This happens in two steps:
    
    1) security/smack/smackfs.c`smk_set_cipso
       does not clear NETLBL_SECATTR_MLS_CAT
       from (struct smack_known *)skp->smk_netlabel.flags
       on assigning CIPSO w/o categories:
    
    | rcu_assign_pointer(skp->smk_netlabel.attr.mls.cat, ncats.attr.mls.cat);
    | skp->smk_netlabel.attr.mls.lvl = ncats.attr.mls.lvl;
    
    2) security/smack/smack_lsm.c`smack_from_secattr
       can not match skp->smk_netlabel with input packet's
       struct netlbl_lsm_secattr *sap
       because sap->flags have not NETLBL_SECATTR_MLS_CAT (what is correct)
       but skp->smk_netlabel.flags have (what is incorrect):
    
    | if ((sap->flags & NETLBL_SECATTR_MLS_CAT) == 0) {
    |       if ((skp->smk_netlabel.flags &
    |                NETLBL_SECATTR_MLS_CAT) == 0)
    |               found = 1;
    |       break;
    | }
    
    This commit sets/clears NETLBL_SECATTR_MLS_CAT in
    skp->smk_netlabel.flags according to the presense of CIPSO categories.
    The update of smk_netlabel is not atomic, so input packets processing
    still may be incorrect during short time while update proceeds.
    
    Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
    Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

smb: client: Fix use-after-free in cifs_fill_dirent [+ + +]

Author: Wang Zhaolong <wangzhaolong1@huawei.com>
Date:   Fri May 16 17:12:55 2025 +0800

    smb: client: Fix use-after-free in cifs_fill_dirent
    
    commit a7a8fe56e932a36f43e031b398aef92341bf5ea0 upstream.
    
    There is a race condition in the readdir concurrency process, which may
    access the rsp buffer after it has been released, triggering the
    following KASAN warning.
    
     ==================================================================
     BUG: KASAN: slab-use-after-free in cifs_fill_dirent+0xb03/0xb60 [cifs]
     Read of size 4 at addr ffff8880099b819c by task a.out/342975
    
     CPU: 2 UID: 0 PID: 342975 Comm: a.out Not tainted 6.15.0-rc6+ #240 PREEMPT(full)
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
     Call Trace:
      <TASK>
      dump_stack_lvl+0x53/0x70
      print_report+0xce/0x640
      kasan_report+0xb8/0xf0
      cifs_fill_dirent+0xb03/0xb60 [cifs]
      cifs_readdir+0x12cb/0x3190 [cifs]
      iterate_dir+0x1a1/0x520
      __x64_sys_getdents+0x134/0x220
      do_syscall_64+0x4b/0x110
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
     RIP: 0033:0x7f996f64b9f9
     Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89
     f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
     f0 ff ff  0d f7 c3 0c 00 f7 d8 64 89 8
     RSP: 002b:00007f996f53de78 EFLAGS: 00000207 ORIG_RAX: 000000000000004e
     RAX: ffffffffffffffda RBX: 00007f996f53ecdc RCX: 00007f996f64b9f9
     RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
     RBP: 00007f996f53dea0 R08: 0000000000000000 R09: 0000000000000000
     R10: 0000000000000000 R11: 0000000000000207 R12: ffffffffffffff88
     R13: 0000000000000000 R14: 00007ffc8cd9a500 R15: 00007f996f51e000
      </TASK>
    
     Allocated by task 408:
      kasan_save_stack+0x20/0x40
      kasan_save_track+0x14/0x30
      __kasan_slab_alloc+0x6e/0x70
      kmem_cache_alloc_noprof+0x117/0x3d0
      mempool_alloc_noprof+0xf2/0x2c0
      cifs_buf_get+0x36/0x80 [cifs]
      allocate_buffers+0x1d2/0x330 [cifs]
      cifs_demultiplex_thread+0x22b/0x2690 [cifs]
      kthread+0x394/0x720
      ret_from_fork+0x34/0x70
      ret_from_fork_asm+0x1a/0x30
    
     Freed by task 342979:
      kasan_save_stack+0x20/0x40
      kasan_save_track+0x14/0x30
      kasan_save_free_info+0x3b/0x60
      __kasan_slab_free+0x37/0x50
      kmem_cache_free+0x2b8/0x500
      cifs_buf_release+0x3c/0x70 [cifs]
      cifs_readdir+0x1c97/0x3190 [cifs]
      iterate_dir+0x1a1/0x520
      __x64_sys_getdents64+0x134/0x220
      do_syscall_64+0x4b/0x110
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
     The buggy address belongs to the object at ffff8880099b8000
      which belongs to the cache cifs_request of size 16588
     The buggy address is located 412 bytes inside of
      freed 16588-byte region [ffff8880099b8000, ffff8880099bc0cc)
    
     The buggy address belongs to the physical page:
     page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99b8
     head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
     anon flags: 0x80000000000040(head|node=0|zone=1)
     page_type: f5(slab)
     raw: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001
     raw: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000
     head: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001
     head: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000
     head: 0080000000000003 ffffea0000266e01 00000000ffffffff 00000000ffffffff
     head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
     page dumped because: kasan: bad access detected
    
     Memory state around the buggy address:
      ffff8880099b8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ffff8880099b8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     >ffff8880099b8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                 ^
      ffff8880099b8200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ffff8880099b8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ==================================================================
    
    POC is available in the link [1].
    
    The problem triggering process is as follows:
    
    Process 1                       Process 2
    -----------------------------------------------------------------
    cifs_readdir
      /* file->private_data == NULL */
      initiate_cifs_search
        cifsFile = kzalloc(sizeof(struct cifsFileInfo), GFP_KERNEL);
        smb2_query_dir_first ->query_dir_first()
          SMB2_query_directory
            SMB2_query_directory_init
            cifs_send_recv
            smb2_parse_query_directory
              srch_inf->ntwrk_buf_start = (char *)rsp;
              srch_inf->srch_entries_start = (char *)rsp + ...
              srch_inf->last_entry = (char *)rsp + ...
              srch_inf->smallBuf = true;
      find_cifs_entry
        /* if (cfile->srch_inf.ntwrk_buf_start) */
        cifs_small_buf_release(cfile->srch_inf // free
    
                            cifs_readdir  ->iterate_shared()
                              /* file->private_data != NULL */
                              find_cifs_entry
                                /* in while (...) loop */
                                smb2_query_dir_next  ->query_dir_next()
                                  SMB2_query_directory
                                    SMB2_query_directory_init
                                    cifs_send_recv
                                      compound_send_recv
                                        smb_send_rqst
                                        __smb_send_rqst
                                          rc = -ERESTARTSYS;
                                          /* if (fatal_signal_pending()) */
                                          goto out;
                                          return rc
                                /* if (cfile->srch_inf.last_entry) */
                                cifs_save_resume_key()
                                  cifs_fill_dirent // UAF
                                /* if (rc) */
                                return -ENOENT;
    
    Fix this by ensuring the return code is checked before using pointers
    from the srch_inf.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=220131 [1]
    Fixes: a364bc0b37f1 ("[CIFS] fix saving of resume key before CIFSFindNext")
    Cc: stable@vger.kernel.org
    Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: Reset all search buffer pointers when releasing buffer [+ + +]

Author: Wang Zhaolong <wangzhaolong1@huawei.com>
Date:   Fri May 16 17:12:56 2025 +0800

    smb: client: Reset all search buffer pointers when releasing buffer
    
    commit e48f9d849bfdec276eebf782a84fd4dfbe1c14c0 upstream.
    
    Multiple pointers in struct cifs_search_info (ntwrk_buf_start,
    srch_entries_start, and last_entry) point to the same allocated buffer.
    However, when freeing this buffer, only ntwrk_buf_start was set to NULL,
    while the other pointers remained pointing to freed memory.
    
    This is defensive programming to prevent potential issues with stale
    pointers. While the active UAF vulnerability is fixed by the previous
    patch, this change ensures consistent pointer state and more robust error
    handling.
    
    Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

soc: apple: rtkit: Implement OSLog buffers properly [+ + +]

Author: Hector Martin <marcan@marcan.st>
Date:   Wed Feb 26 19:00:04 2025 +0000

    soc: apple: rtkit: Implement OSLog buffers properly
    
    [ Upstream commit a06398687065e0c334dc5fc4d2778b5b87292e43 ]
    
    Apparently nobody can figure out where the old logic came from, but it
    seems like it has never been actually used on any supported firmware to
    this day. OSLog buffers were apparently never requested.
    
    But starting with 13.3, we actually need this implemented properly for
    MTP (and later AOP) to work, so let's actually do that.
    
    Signed-off-by: Hector Martin <marcan@marcan.st>
    Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
    Link: https://lore.kernel.org/r/20250226-apple-soc-misc-v2-2-c3ec37f9021b@svenpeter.dev
    Signed-off-by: Sven Peter <sven@svenpeter.dev>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

soc: apple: rtkit: Use high prio work queue [+ + +]

Author: Janne Grunau <j@jannau.net>
Date:   Wed Feb 26 19:00:05 2025 +0000

    soc: apple: rtkit: Use high prio work queue
    
    [ Upstream commit 22af2fac88fa5dbc310bfe7d0b66d4de3ac47305 ]
    
    rtkit messages as communication with the DCP firmware for framebuffer
    swaps or input events are time critical so use WQ_HIGHPRI to prevent
    user space CPU load to increase latency.
    With kwin_wayland 6's explicit sync mode user space load was able to
    delay the IOMFB rtkit communication enough to miss vsync for surface
    swaps. Minimal test scenario is constantly resizing a glxgears
    Xwayland window.
    
    Signed-off-by: Janne Grunau <j@jannau.net>
    Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
    Link: https://lore.kernel.org/r/20250226-apple-soc-misc-v2-3-c3ec37f9021b@svenpeter.dev
    Signed-off-by: Sven Peter <sven@svenpeter.dev>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

soc: ti: k3-socinfo: Do not use syscon helper to build regmap [+ + +]

Author: Andrew Davis <afd@ti.com>
Date:   Thu Jan 23 12:17:26 2025 -0600

    soc: ti: k3-socinfo: Do not use syscon helper to build regmap
    
    [ Upstream commit a5caf03188e44388e8c618dcbe5fffad1a249385 ]
    
    The syscon helper device_node_to_regmap() is used to fetch a regmap
    registered to a device node. It also currently creates this regmap
    if the node did not already have a regmap associated with it. This
    should only be used on "syscon" nodes. This driver is not such a
    device and instead uses device_node_to_regmap() on its own node as
    a hacky way to create a regmap for itself.
    
    This will not work going forward and so we should create our regmap
    the normal way by defining our regmap_config, fetching our memory
    resource, then using the normal regmap_init_mmio() function.
    
    Signed-off-by: Andrew Davis <afd@ti.com>
    Link: https://lore.kernel.org/r/20250123181726.597144-1-afd@ti.com
    Signed-off-by: Nishanth Menon <nm@ti.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: spi-fsl-dspi: Halt the module after a new message transfer [+ + +]

Author: Bogdan-Gabriel Roman <bogdan-gabriel.roman@nxp.com>
Date:   Thu May 22 15:51:31 2025 +0100

    spi: spi-fsl-dspi: Halt the module after a new message transfer
    
    [ Upstream commit 8a30a6d35a11ff5ccdede7d6740765685385a917 ]
    
    The XSPI mode implementation in this driver still uses the EOQ flag to
    signal the last word in a transmission and deassert the PCS signal.
    However, at speeds lower than ~200kHZ, the PCS signal seems to remain
    asserted even when SR[EOQF] = 1 indicates the end of a transmission.
    This is a problem for target devices which require the deassertation of
    the PCS signal between transfers.
    
    Hence, this commit 'forces' the deassertation of the PCS by stopping the
    module through MCR[HALT] after completing a new transfer. According to
    the reference manual, the module stops or transitions from the Running
    state to the Stopped state after the current frame, when any one of the
    following conditions exist:
    - The value of SR[EOQF] = 1.
    - The chip is in Debug mode and the value of MCR[FRZ] = 1.
    - The value of MCR[HALT] = 1.
    
    This shouldn't be done if the last transfer in the message has cs_change
    set.
    
    Fixes: ea93ed4c181b ("spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode")
    Signed-off-by: Bogdan-Gabriel Roman <bogdan-gabriel.roman@nxp.com>
    Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
    Signed-off-by: James Clark <james.clark@linaro.org>
    Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-2-bea884630cfb@linaro.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: spi-fsl-dspi: Reset SR flags before sending a new message [+ + +]

Author: Larisa Grigore <larisa.grigore@nxp.com>
Date:   Thu May 22 15:51:32 2025 +0100

    spi: spi-fsl-dspi: Reset SR flags before sending a new message
    
    [ Upstream commit 7aba292eb15389073c7f3bd7847e3862dfdf604d ]
    
    If, in a previous transfer, the controller sends more data than expected
    by the DSPI target, SR.RFDF (RX FIFO is not empty) will remain asserted.
    When flushing the FIFOs at the beginning of a new transfer (writing 1
    into MCR.CLR_TXF and MCR.CLR_RXF), SR.RFDF should also be cleared.
    Otherwise, when running in target mode with DMA, if SR.RFDF remains
    asserted, the DMA callback will be fired before the controller sends any
    data.
    
    Take this opportunity to reset all Status Register fields.
    
    Fixes: 5ce3cc567471 ("spi: spi-fsl-dspi: Provide support for DSPI slave mode operation (Vybryd vf610)")
    Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
    Signed-off-by: James Clark <james.clark@linaro.org>
    Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-3-bea884630cfb@linaro.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: spi-fsl-dspi: restrict register range for regmap access [+ + +]

Author: Larisa Grigore <larisa.grigore@nxp.com>
Date:   Thu May 22 15:51:30 2025 +0100

    spi: spi-fsl-dspi: restrict register range for regmap access
    
    [ Upstream commit 283ae0c65e9c592f4a1ba4f31917f5e766da7f31 ]
    
    DSPI registers are NOT continuous, some registers are reserved and
    accessing them from userspace will trigger external abort, add regmap
    register access table to avoid below abort.
    
      For example on S32G:
    
      # cat /sys/kernel/debug/regmap/401d8000.spi/registers
    
      Internal error: synchronous external abort: 96000210 1 PREEMPT SMP
      ...
      Call trace:
      regmap_mmio_read32le+0x24/0x48
      regmap_mmio_read+0x48/0x70
      _regmap_bus_reg_read+0x38/0x48
      _regmap_read+0x68/0x1b0
      regmap_read+0x50/0x78
      regmap_read_debugfs+0x120/0x338
    
    Fixes: 1acbdeb92c87 ("spi/fsl-dspi: Convert to use regmap and add big-endian support")
    Co-developed-by: Xulin Sun <xulin.sun@windriver.com>
    Signed-off-by: Xulin Sun <xulin.sun@windriver.com>
    Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
    Signed-off-by: James Clark <james.clark@linaro.org>
    Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-1-bea884630cfb@linaro.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: spi-sun4i: fix early activation [+ + +]

Author: Alessandro Grassi <alessandro.grassi@mailbox.org>
Date:   Fri May 2 11:55:20 2025 +0200

    spi: spi-sun4i: fix early activation
    
    [ Upstream commit fb98bd0a13de2c9d96cb5c00c81b5ca118ac9d71 ]
    
    The SPI interface is activated before the CPOL setting is applied. In
    that moment, the clock idles high and CS goes low. After a short delay,
    CPOL and other settings are applied, which may cause the clock to change
    state and idle low. This transition is not part of a clock cycle, and it
    can confuse the receiving device.
    
    To prevent this unexpected transition, activate the interface while CPOL
    and the other settings are being applied.
    
    Signed-off-by: Alessandro Grassi <alessandro.grassi@mailbox.org>
    Link: https://patch.msgid.link/20250502095520.13825-1-alessandro.grassi@mailbox.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: zynqmp-gqspi: Always acknowledge interrupts [+ + +]

Author: Sean Anderson <sean.anderson@linux.dev>
Date:   Thu Jan 16 17:41:30 2025 -0500

    spi: zynqmp-gqspi: Always acknowledge interrupts
    
    [ Upstream commit 89785306453ce6d949e783f6936821a0b7649ee2 ]
    
    RXEMPTY can cause an IRQ, even though we may not do anything about it
    (such as if we are waiting for more received data). We must still handle
    these IRQs because we can tell they were caused by the device.
    
    Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
    Link: https://patch.msgid.link/20250116224130.2684544-6-sean.anderson@linux.dev
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

SUNRPC: Don't allow waiting for exiting tasks [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Fri Mar 28 12:52:52 2025 -0400

    SUNRPC: Don't allow waiting for exiting tasks
    
    [ Upstream commit 14e41b16e8cb677bb440dca2edba8b041646c742 ]
    
    Once a task calls exit_signals() it can no longer be signalled. So do
    not allow it to do killable waits.
    
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

SUNRPC: rpc_clnt_set_transport() must not change the autobind setting [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Mon Mar 24 19:35:01 2025 -0400

    SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
    
    [ Upstream commit bf9be373b830a3e48117da5d89bb6145a575f880 ]
    
    The autobind setting was supposed to be determined in rpc_create(),
    since commit c2866763b402 ("SUNRPC: use sockaddr + size when creating
    remote transport endpoints").
    
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

SUNRPC: rpcbind should never reset the port to the value '0' [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Mon Mar 24 19:05:48 2025 -0400

    SUNRPC: rpcbind should never reset the port to the value '0'
    
    [ Upstream commit 214c13e380ad7636631279f426387f9c4e3c14d9 ]
    
    If we already had a valid port number for the RPC service, then we
    should not allow the rpcbind client to set it to the invalid value '0'.
    
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Mar 5 13:05:50 2025 +0000

    tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()
    
    [ Upstream commit f8ece40786c9342249aa0a1b55e148ee23b2a746 ]
    
    We have platforms with 6 NUMA nodes and 480 cpus.
    
    inet_ehash_locks_alloc() currently allocates a single 64KB page
    to hold all ehash spinlocks. This adds more pressure on a single node.
    
    Change inet_ehash_locks_alloc() to use vmalloc() to spread
    the spinlocks on all online nodes, driven by NUMA policies.
    
    At boot time, NUMA policy is interleave=all, meaning that
    tcp_hashinfo.ehash_locks gets hash dispersion on all nodes.
    
    Tested:
    
    lack5:~# grep inet_ehash_locks_alloc /proc/vmallocinfo
    0x00000000d9aec4d1-0x00000000a828b652   69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
    
    lack5:~# echo 8192 >/proc/sys/net/ipv4/tcp_child_ehash_entries
    lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
    0x000000004e99d30c-0x00000000763f3279   36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=1 N1=2 N2=2 N3=1 N4=1 N5=1
    0x00000000d9aec4d1-0x00000000a828b652   69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
    
    lack5:~# numactl --interleave=0,5 unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
    0x00000000fd73a33e-0x0000000004b9a177   36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=4 N5=4
    0x00000000d9aec4d1-0x00000000a828b652   69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
    
    lack5:~# echo 1024 >/proc/sys/net/ipv4/tcp_child_ehash_entries
    lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
    0x00000000db07d7a2-0x00000000ad697d29    8192 inet_ehash_locks_alloc+0x90/0x100 pages=1 vmalloc N2=1
    0x00000000d9aec4d1-0x00000000a828b652   69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Tested-by: Jason Xing <kerneljasonxing@gmail.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://patch.msgid.link/20250305130550.1865988-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() [+ + +]

Author: Ilpo Järvinen <ij@kernel.org>
Date:   Wed Mar 5 23:38:41 2025 +0100

    tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()
    
    [ Upstream commit 149dfb31615e22271d2525f078c95ea49bc4db24 ]
    
    - Move tcp_count_delivered() earlier and split tcp_count_delivered_ce()
      out of it
    - Move tcp_in_ack_event() later
    - While at it, remove the inline from tcp_in_ack_event() and let
      the compiler to decide
    
    Accurate ECN's heuristics does not know if there is going
    to be ACE field based CE counter increase or not until after
    rtx queue has been processed. Only then the number of ACKed
    bytes/pkts is available. As CE or not affects presence of
    FLAG_ECE, that information for tcp_in_ack_event is not yet
    available in the old location of the call to tcp_in_ack_event().
    
    Signed-off-by: Ilpo Järvinen <ij@kernel.org>
    Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal/drivers/qoriq: Power down TMU on system suspend [+ + +]

Author: Alice Guo <alice.guo@nxp.com>
Date:   Mon Dec 9 11:48:59 2024 -0500

    thermal/drivers/qoriq: Power down TMU on system suspend
    
    [ Upstream commit 229f3feb4b0442835b27d519679168bea2de96c2 ]
    
    Enable power-down of TMU (Thermal Management Unit) for TMU version 2 during
    system suspend to save power. Save approximately 4.3mW on VDD_ANA_1P8 on
    i.MX93 platforms.
    
    Signed-off-by: Alice Guo <alice.guo@nxp.com>
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Link: https://lore.kernel.org/r/20241209164859.3758906-2-Frank.Li@nxp.com
    Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thunderbolt: Do not add non-active NVM if NVM upgrade is disabled for retimer [+ + +]

Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Wed Mar 5 14:56:20 2025 +0200

    thunderbolt: Do not add non-active NVM if NVM upgrade is disabled for retimer
    
    [ Upstream commit ad79c278e478ca8c1a3bf8e7a0afba8f862a48a1 ]
    
    This is only used to write a new NVM in order to upgrade the retimer
    firmware. It does not make sense to expose it if upgrade is disabled.
    This also makes it consistent with the router NVM upgrade.
    
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

timer_list: Don't use %pK through printk() [+ + +]

Author: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Date:   Tue Mar 11 10:54:47 2025 +0100

    timer_list: Don't use %pK through printk()
    
    [ Upstream commit a52067c24ccf6ee4c85acffa0f155e9714f9adce ]
    
    This reverts commit f590308536db ("timer debug: Hide kernel addresses via
    %pK in /proc/timer_list")
    
    The timer list helper SEQ_printf() uses either the real seq_printf() for
    procfs output or vprintk() to print to the kernel log, when invoked from
    SysRq-q. It uses %pK for printing pointers.
    
    In the past %pK was prefered over %p as it would not leak raw pointer
    values into the kernel log. Since commit ad67b74d2469 ("printk: hash
    addresses printed with %p") the regular %p has been improved to avoid this
    issue.
    
    Furthermore, restricted pointers ("%pK") were never meant to be used
    through printk(). They can still unintentionally leak raw pointers or
    acquire sleeping looks in atomic contexts.
    
    Switch to the regular pointer formatting which is safer, easier to reason
    about and sufficient here.
    
    Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/
    Link: https://lore.kernel.org/all/20250311-restricted-pointers-timer-v1-1-6626b91e54ab@linutronix.de
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tools/build: Don't pass test log files to linker [+ + +]

Author: Ian Rogers <irogers@google.com>
Date:   Tue Mar 11 14:36:23 2025 -0700

    tools/build: Don't pass test log files to linker
    
    [ Upstream commit 935e7cb5bb80106ff4f2fe39640f430134ef8cd8 ]
    
    Separate test log files from object files. Depend on test log output
    but don't pass to the linker.
    
    Reviewed-by: James Clark <james.clark@linaro.org>
    Signed-off-by: Ian Rogers <irogers@google.com>
    Link: https://lore.kernel.org/r/20250311213628.569562-2-irogers@google.com
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tracing: Mark binary printing functions with __printf() attribute [+ + +]

Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Fri Mar 21 16:40:49 2025 +0200

    tracing: Mark binary printing functions with __printf() attribute
    
    [ Upstream commit 196a062641fe68d9bfe0ad36b6cd7628c99ad22c ]
    
    Binary printing functions are using printf() type of format, and compiler
    is not happy about them as is:
    
    kernel/trace/trace.c:3292:9: error: function ‘trace_vbprintk’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
    kernel/trace/trace_seq.c:182:9: error: function ‘trace_seq_bprintf’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
    
    Fix the compilation errors by adding __printf() attribute.
    
    While at it, move existing __printf() attributes from the implementations
    to the declarations. IT also fixes incorrect attribute parameters that are
    used for trace_array_printk().
    
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Reviewed-by: Kees Cook <kees@kernel.org>
    Reviewed-by: Petr Mladek <pmladek@suse.com>
    Link: https://lore.kernel.org/r/20250321144822.324050-4-andriy.shevchenko@linux.intel.com
    Signed-off-by: Petr Mladek <pmladek@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

um: let 'make clean' properly clean underlying SUBARCH as well [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed May 7 16:49:33 2025 +0900

    um: let 'make clean' properly clean underlying SUBARCH as well
    
    [ Upstream commit ab09da75700e9d25c7dfbc7f7934920beb5e39b9 ]
    
    Building the kernel with O= is affected by stale in-tree build artifacts.
    
    So, if the source tree is not clean, Kbuild displays the following:
    
      $ make ARCH=um O=build defconfig
      make[1]: Entering directory '/.../linux/build'
      ***
      *** The source tree is not clean, please run 'make ARCH=um mrproper'
      *** in /.../linux
      ***
      make[2]: *** [/.../linux/Makefile:673: outputmakefile] Error 1
      make[1]: *** [/.../linux/Makefile:248: __sub-make] Error 2
      make[1]: Leaving directory '/.../linux/build'
      make: *** [Makefile:248: __sub-make] Error 2
    
    Usually, running 'make mrproper' is sufficient for cleaning the source
    tree for out-of-tree builds.
    
    However, building UML generates build artifacts not only in arch/um/,
    but also in the SUBARCH directory (i.e., arch/x86/). If in-tree stale
    files remain under arch/x86/, Kbuild will reuse them instead of creating
    new ones under the specified build directory.
    
    This commit makes 'make ARCH=um clean' recurse into the SUBARCH directory.
    
    Reported-by: Shuah Khan <skhan@linuxfoundation.org>
    Closes: https://lore.kernel.org/lkml/20250502172459.14175-1-skhan@linuxfoundation.org/
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Acked-by: Johannes Berg <johannes@sipsolutions.net>
    Reviewed-by: David Gow <davidgow@google.com>
    Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

um: Store full CSGSFS and SS register from mcontext [+ + +]

Author: Benjamin Berg <benjamin@sipsolutions.net>
Date:   Mon Feb 24 19:18:19 2025 +0100

    um: Store full CSGSFS and SS register from mcontext
    
    [ Upstream commit cef721e0d53d2b64f2ba177c63a0dfdd7c0daf17 ]
    
    Doing this allows using registers as retrieved from an mcontext to be
    pushed to a process using PTRACE_SETREGS.
    
    It is not entirely clear to me why CSGSFS was masked. Doing so creates
    issues when using the mcontext as process state in seccomp and simply
    copying the register appears to work perfectly fine for ptrace.
    
    Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
    Link: https://patch.msgid.link/20250224181827.647129-2-benjamin@sipsolutions.net
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

um: Update min_low_pfn to match changes in uml_reserved [+ + +]

Author: Tiwei Bie <tiwei.btw@antgroup.com>
Date:   Fri Feb 21 12:18:55 2025 +0800

    um: Update min_low_pfn to match changes in uml_reserved
    
    [ Upstream commit e82cf3051e6193f61e03898f8dba035199064d36 ]
    
    When uml_reserved is updated, min_low_pfn must also be updated
    accordingly. Otherwise, min_low_pfn will not accurately reflect
    the lowest available PFN.
    
    Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
    Link: https://patch.msgid.link/20250221041855.1156109-1-tiwei.btw@antgroup.com
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

vfio/pci: Handle INTx IRQ_NOTCONNECTED [+ + +]

Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Tue Mar 11 17:06:21 2025 -0600

    vfio/pci: Handle INTx IRQ_NOTCONNECTED
    
    [ Upstream commit 860be250fc32de9cb24154bf21b4e36f40925707 ]
    
    Some systems report INTx as not routed by setting pdev->irq to
    IRQ_NOTCONNECTED, resulting in a -ENOTCONN error when trying to
    setup eventfd signaling.  Include this in the set of conditions
    for which the PIN register is virtualized to zero.
    
    Additionally consolidate vfio_pci_get_irq_count() to use this
    virtualized value in reporting INTx support via ioctl and sanity
    checking ioctl paths since pdev->irq is re-used when the device
    is in MSI mode.
    
    The combination of these results in both the config space of the
    device and the ioctl interface behaving as if the device does not
    support INTx.
    
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Link: https://lore.kernel.org/r/20250311230623.1264283-1-alex.williamson@redhat.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN [+ + +]

Author: Zhongqiu Han <quic_zhonhan@quicinc.com>
Date:   Wed Mar 12 21:04:12 2025 +0800

    virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN
    
    [ Upstream commit 2e2f925fe737576df2373931c95e1a2b66efdfef ]
    
    syzbot reports a data-race when accessing the event_triggered, here is the
    simplified stack when the issue occurred:
    
    ==================================================================
    BUG: KCSAN: data-race in virtqueue_disable_cb / virtqueue_enable_cb_delayed
    
    write to 0xffff8881025bc452 of 1 bytes by task 3288 on cpu 0:
     virtqueue_enable_cb_delayed+0x42/0x3c0 drivers/virtio/virtio_ring.c:2653
     start_xmit+0x230/0x1310 drivers/net/virtio_net.c:3264
     __netdev_start_xmit include/linux/netdevice.h:5151 [inline]
     netdev_start_xmit include/linux/netdevice.h:5160 [inline]
     xmit_one net/core/dev.c:3800 [inline]
    
    read to 0xffff8881025bc452 of 1 bytes by interrupt on cpu 1:
     virtqueue_disable_cb_split drivers/virtio/virtio_ring.c:880 [inline]
     virtqueue_disable_cb+0x92/0x180 drivers/virtio/virtio_ring.c:2566
     skb_xmit_done+0x5f/0x140 drivers/net/virtio_net.c:777
     vring_interrupt+0x161/0x190 drivers/virtio/virtio_ring.c:2715
     __handle_irq_event_percpu+0x95/0x490 kernel/irq/handle.c:158
     handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
    
    value changed: 0x01 -> 0x00
    ==================================================================
    
    When the data race occurs, the function virtqueue_enable_cb_delayed() sets
    event_triggered to false, and virtqueue_disable_cb_split/packed() reads it
    as false due to the race condition. Since event_triggered is an unreliable
    hint used for optimization, this should only cause the driver temporarily
    suggest that the device not send an interrupt notification when the event
    index is used.
    
    Fix this KCSAN reported data-race issue by explicitly tagging the access as
    data_racy.
    
    Reported-by: syzbot+efe683d57990864b8c8e@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/all/67c7761a.050a0220.15b4b9.0018.GAE@google.com/
    Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
    Message-Id: <20250312130412.3516307-1-quic_zhonhan@quicinc.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

vxlan: Annotate FDB data races [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Feb 4 16:55:42 2025 +0200

    vxlan: Annotate FDB data races
    
    [ Upstream commit f6205f8215f12a96518ac9469ff76294ae7bd612 ]
    
    The 'used' and 'updated' fields in the FDB entry structure can be
    accessed concurrently by multiple threads, leading to reports such as
    [1]. Can be reproduced using [2].
    
    Suppress these reports by annotating these accesses using
    READ_ONCE() / WRITE_ONCE().
    
    [1]
    BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit
    
    write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
     vxlan_xmit+0xb29/0x2380
     dev_hard_start_xmit+0x84/0x2f0
     __dev_queue_xmit+0x45a/0x1650
     packet_xmit+0x100/0x150
     packet_sendmsg+0x2114/0x2ac0
     __sys_sendto+0x318/0x330
     __x64_sys_sendto+0x76/0x90
     x64_sys_call+0x14e8/0x1c00
     do_syscall_64+0x9e/0x1a0
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
     vxlan_xmit+0xadf/0x2380
     dev_hard_start_xmit+0x84/0x2f0
     __dev_queue_xmit+0x45a/0x1650
     packet_xmit+0x100/0x150
     packet_sendmsg+0x2114/0x2ac0
     __sys_sendto+0x318/0x330
     __x64_sys_sendto+0x76/0x90
     x64_sys_call+0x14e8/0x1c00
     do_syscall_64+0x9e/0x1a0
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
    
    [2]
     #!/bin/bash
    
     set +H
     echo whitelist > /sys/kernel/debug/kcsan
     echo !vxlan_xmit > /sys/kernel/debug/kcsan
    
     ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
     bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
     taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
     taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
    
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

vxlan: Join / leave MC group after remote changes [+ + +]

Author: Petr Machata <petrm@nvidia.com>
Date:   Fri Feb 14 17:18:21 2025 +0100

    vxlan: Join / leave MC group after remote changes
    
    [ Upstream commit d42d543368343c0449a4e433b5f02e063a86209c ]
    
    When a vxlan netdevice is brought up, if its default remote is a multicast
    address, the device joins the indicated group.
    
    Therefore when the multicast remote address changes, the device should
    leave the current group and subscribe to the new one. Similarly when the
    interface used for endpoint communication is changed in a situation when
    multicast remote is configured. This is currently not done.
    
    Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
    possible that with such fix, the netdevice will end up in an inconsistent
    situation where the old group is not joined anymore, but joining the new
    group fails. Should we join the new group first, and leave the old one
    second, we might end up in the opposite situation, where both groups are
    joined. Undoing any of this during rollback is going to be similarly
    problematic.
    
    One solution would be to just forbid the change when the netdevice is up.
    However in vnifilter mode, changing the group address is allowed, and these
    problems are simply ignored (see vxlan_vni_update_group()):
    
     # ip link add name br up type bridge vlan_filtering 1
     # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
     # bridge vni add dev vx1 vni 200 group 224.0.0.1
     # tcpdump -i lo &
     # bridge vni add dev vx1 vni 200 group 224.0.0.2
     18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
     18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
     # bridge vni
     dev               vni                group/remote
     vx1               200                224.0.0.2
    
    Having two different modes of operation for conceptually the same interface
    is silly, so in this patch, just do what the vnifilter code does and deal
    with the errors by crossing fingers real hard.
    
    The vnifilter code leaves old before joining new, and in case of join /
    leave failures does not roll back the configuration changes that have
    already been applied, but bails out of joining if it could not leave. Do
    the same here: leave before join, apply changes unconditionally and do not
    attempt to join if we couldn't leave.
    
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: ath9k: return by of_get_mac_address [+ + +]

Author: Rosen Penev <rosenp@gmail.com>
Date:   Tue Nov 5 14:23:26 2024 -0800

    wifi: ath9k: return by of_get_mac_address
    
    [ Upstream commit dfffb317519f88534bb82797f055f0a2fd867e7b ]
    
    When using nvmem, ath9k could potentially be loaded before nvmem, which
    loads after mtd. This is an issue if DT contains an nvmem mac address.
    
    If nvmem is not ready in time for ath9k, -EPROBE_DEFER is returned. Pass
    it to _probe so that ath9k can properly grab a potentially present MAC
    address.
    
    Signed-off-by: Rosen Penev <rosenp@gmail.com>
    Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
    Link: https://patch.msgid.link/20241105222326.194417-1-rosenp@gmail.com
    Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: iwlwifi: add support for Killer on MTL [+ + +]

Author: Johannes Berg <johannes.berg@intel.com>
Date:   Tue May 6 21:42:59 2025 +0200

    wifi: iwlwifi: add support for Killer on MTL
    
    [ Upstream commit ebedf8b7f05b9c886d68d63025db8d1b12343157 ]
    
    For now, we need another entry for these devices, this
    will be changed completely for 6.16.
    
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219926
    Link: https://patch.msgid.link/20250506214258.2efbdc9e9a82.I31915ec252bd1c74bd53b89a0e214e42a74b6f2e@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211: don't unconditionally call drv_mgd_complete_tx() [+ + +]

Author: Johannes Berg <johannes.berg@intel.com>
Date:   Wed Feb 5 11:39:22 2025 +0200

    wifi: mac80211: don't unconditionally call drv_mgd_complete_tx()
    
    [ Upstream commit 1798271b3604b902d45033ec569f2bf77e94ecc2 ]
    
    We might not have called drv_mgd_prepare_tx(), so only call
    drv_mgd_complete_tx() under the same conditions.
    
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20250205110958.e091fc39a351.Ie6a3cdca070612a0aa4b3c6914ab9ed602d1f456@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211: remove misplaced drv_mgd_complete_tx() call [+ + +]

Author: Johannes Berg <johannes.berg@intel.com>
Date:   Wed Feb 5 11:39:21 2025 +0200

    wifi: mac80211: remove misplaced drv_mgd_complete_tx() call
    
    [ Upstream commit f4995cdc4d02d0abc8e9fcccad5c71ce676c1e3f ]
    
    In the original commit 15fae3410f1d ("mac80211: notify driver on
    mgd TX completion") I evidently made a mistake and placed the
    call in the "associated" if, rather than the "assoc_data". Later
    I noticed the missing call and placed it in commit c042600c17d8
    ("wifi: mac80211: adding missing drv_mgd_complete_tx() call"),
    but didn't remove the wrong one. Remove it now.
    
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20250205110958.6ed954179bbf.Id8ef8835b7e6da3bf913c76f77d201017dc8a3c9@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtl8xxxu: retry firmware download on error [+ + +]

Author: Soeren Moch <smoch@web.de>
Date:   Mon Jan 27 20:48:28 2025 +0100

    wifi: rtl8xxxu: retry firmware download on error
    
    [ Upstream commit 3d3e28feca7ac8c6cf2a390dbbe1f97e3feb7f36 ]
    
    Occasionally there is an EPROTO error during firmware download.
    This error is converted to EAGAIN in the download function.
    But nobody tries again and so device probe fails.
    
    Implement download retry to fix this.
    
    This error was observed (and fix tested) on a tbs2910 board [1]
    with an embedded RTL8188EU (0bda:8179) device behind a USB hub.
    
    [1] arch/arm/boot/dts/nxp/imx/imx6q-tbs2910.dts
    
    Signed-off-by: Soeren Moch <smoch@web.de>
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/20250127194828.599379-1-smoch@web.de
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtw88: Don't use static local variable in rtw8822b_set_tx_power_index_by_rate [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Sun Jan 26 16:03:11 2025 +0200

    wifi: rtw88: Don't use static local variable in rtw8822b_set_tx_power_index_by_rate
    
    [ Upstream commit 00451eb3bec763f708e7e58326468c1e575e5a66 ]
    
    Some users want to plug two identical USB devices at the same time.
    This static variable could theoretically cause them to use incorrect
    TX power values.
    
    Move the variable to the caller and pass a pointer to it to
    rtw8822b_set_tx_power_index_by_rate().
    
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/8a60f581-0ab5-4d98-a97d-dd83b605008f@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtw88: Fix download_firmware_validate() for RTL8814AU [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Tue Feb 4 20:37:36 2025 +0200

    wifi: rtw88: Fix download_firmware_validate() for RTL8814AU
    
    [ Upstream commit 9e8243025cc06abc975c876dffda052073207ab3 ]
    
    After the firmware is uploaded, download_firmware_validate() checks some
    bits in REG_MCUFW_CTRL to see if everything went okay. The
    RTL8814AU power on sequence sets bits 13 and 12 to 2, which this
    function does not expect, so it thinks the firmware upload failed.
    
    Make download_firmware_validate() ignore bits 13 and 12.
    
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/049d2887-22fc-47b7-9e59-62627cb525f8@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtw88: Fix rtw_desc_to_mcsrate() to handle MCS16-31 [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Tue Feb 18 01:29:52 2025 +0200

    wifi: rtw88: Fix rtw_desc_to_mcsrate() to handle MCS16-31
    
    [ Upstream commit 86d04f8f991a0509e318fe886d5a1cf795736c7d ]
    
    This function translates the rate number reported by the hardware into
    something mac80211 can understand. It was ignoring the 3SS and 4SS HT
    rates. Translate them too.
    
    Also set *nss to 0 for the HT rates, just to make sure it's
    initialised.
    
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/d0a5a86b-4869-47f6-a5a7-01c0f987cc7f@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtw88: Fix rtw_init_ht_cap() for RTL8814AU [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Tue Feb 18 01:30:22 2025 +0200

    wifi: rtw88: Fix rtw_init_ht_cap() for RTL8814AU
    
    [ Upstream commit c7eea1ba05ca5b0dbf77a27cf2e1e6e2fb3c0043 ]
    
    Set the RX mask and the highest RX rate according to the number of
    spatial streams the chip can receive. For RTL8814AU that is 3.
    
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/4e786f50-ed1c-4387-8b28-e6ff00e35e81@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtw88: Fix rtw_init_vht_cap() for RTL8814AU [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Tue Feb 18 01:30:48 2025 +0200

    wifi: rtw88: Fix rtw_init_vht_cap() for RTL8814AU
    
    [ Upstream commit 6be7544d19fcfcb729495e793bc6181f85bb8949 ]
    
    Set the MCS maps and the highest rates according to the number of
    spatial streams the chip has. For RTL8814AU that is 3.
    
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/e86aa009-b5bf-4b3a-8112-ea5e3cd49465@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet [+ + +]

Author: Ping-Ke Shih <pkshih@realtek.com>
Date:   Wed Jan 22 14:03:01 2025 +0800

    wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet
    
    [ Upstream commit ebfc9199df05d37b67f4d1b7ee997193f3d2e7c8 ]
    
    To ensure where are protected by driver mutex can also be protected by
    wiphy_lock(), so afterward we can remove driver mutex safely.
    
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/20250122060310.31976-2-pkshih@realtek.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: rtw89: fw: propagate error code from rtw89_h2c_tx() [+ + +]

Author: Ping-Ke Shih <pkshih@realtek.com>
Date:   Mon Feb 17 14:43:06 2025 +0800

    wifi: rtw89: fw: propagate error code from rtw89_h2c_tx()
    
    [ Upstream commit 56e1acaa0f80620b8e2c3410db35b4b975782b0a ]
    
    The error code should be propagated to callers during downloading firmware
    header and body. Remove unnecessary assignment of -1.
    
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/20250217064308.43559-4-pkshih@realtek.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2 [+ + +]

Author: Breno Leitao <leitao@debian.org>
Date:   Thu Oct 31 04:06:17 2024 -0700

    x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
    
    [ Upstream commit 98fdaeb296f51ef08e727a7cc72e5b5c864c4f4d ]
    
    Change the default value of spectre v2 in user mode to respect the
    CONFIG_MITIGATION_SPECTRE_V2 config option.
    
    Currently, user mode spectre v2 is set to auto
    (SPECTRE_V2_USER_CMD_AUTO) by default, even if
    CONFIG_MITIGATION_SPECTRE_V2 is disabled.
    
    Set the spectre_v2 value to auto (SPECTRE_V2_USER_CMD_AUTO) if the
    Spectre v2 config (CONFIG_MITIGATION_SPECTRE_V2) is enabled, otherwise
    set the value to none (SPECTRE_V2_USER_CMD_NONE).
    
    Important to say the command line argument "spectre_v2_user" overwrites
    the default value in both cases.
    
    When CONFIG_MITIGATION_SPECTRE_V2 is not set, users have the flexibility
    to opt-in for specific mitigations independently. In this scenario,
    setting spectre_v2= will not enable spectre_v2_user=, and command line
    options spectre_v2_user and spectre_v2 are independent when
    CONFIG_MITIGATION_SPECTRE_V2=n.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: David Kaplan <David.Kaplan@amd.com>
    Link: https://lore.kernel.org/r/20241031-x86_bugs_last_v2-v2-2-b7ff1dab840e@debian.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/build: Fix broken copy command in genimage.sh when making isoimage [+ + +]

Author: Nir Lichtman <nir@lichtman.org>
Date:   Fri Jan 10 12:05:00 2025 +0000

    x86/build: Fix broken copy command in genimage.sh when making isoimage
    
    [ Upstream commit e451630226bd09dc730eedb4e32cab1cc7155ae8 ]
    
    Problem: Currently when running the "make isoimage" command there is an
    error related to wrong parameters passed to the cp command:
    
      "cp: missing destination file operand after 'arch/x86/boot/isoimage/'"
    
    This is caused because FDINITRDS is an empty array.
    
    Solution: Check if FDINITRDS is empty before executing the "cp" command,
    similar to how it is done in the case of hdimage.
    
    Signed-off-by: Nir Lichtman <nir@lichtman.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
    Cc: Michal Marek <michal.lkml@markovi.net>
    Link: https://lore.kernel.org/r/20250110120500.GA923218@lichtman.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/kaslr: Reduce KASLR entropy on most x86 systems [+ + +]

Author: Balbir Singh <balbirs@nvidia.com>
Date:   Fri Feb 7 10:42:34 2025 +1100

    x86/kaslr: Reduce KASLR entropy on most x86 systems
    
    [ Upstream commit 7ffb791423c7c518269a9aad35039ef824a40adb ]
    
    When CONFIG_PCI_P2PDMA=y (which is basically enabled on all
    large x86 distros), it maps the PFN's via a ZONE_DEVICE
    mapping using devm_memremap_pages(). The mapped virtual
    address range corresponds to the pci_resource_start()
    of the BAR address and size corresponding to the BAR length.
    
    When KASLR is enabled, the direct map range of the kernel is
    reduced to the size of physical memory plus additional padding.
    If the BAR address is beyond this limit, PCI peer to peer DMA
    mappings fail.
    
    Fix this by not shrinking the size of the direct map when
    CONFIG_PCI_P2PDMA=y.
    
    This reduces the total available entropy, but it's better than
    the current work around of having to disable KASLR completely.
    
    [ mingo: Clarified the changelog to point out the broad impact ... ]
    
    Signed-off-by: Balbir Singh <balbirs@nvidia.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Kees Cook <kees@kernel.org>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/Kconfig
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Link: https://lore.kernel.org/lkml/20250206023201.1481957-1-balbirs@nvidia.com/
    Link: https://lore.kernel.org/r/20250206234234.1912585-1-balbirs@nvidia.com
    --
     arch/x86/mm/kaslr.c | 10 ++++++++--
     drivers/pci/Kconfig |  6 ++++++
     2 files changed, 14 insertions(+), 2 deletions(-)
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers bounce buffers [+ + +]

Author: Balbir Singh <balbirs@nvidia.com>
Date:   Tue Apr 1 11:07:52 2025 +1100

    x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers bounce buffers
    
    commit 7170130e4c72ce0caa0cb42a1627c635cc262821 upstream.
    
    As Bert Karwatzki reported, the following recent commit causes a
    performance regression on AMD iGPU and dGPU systems:
    
      7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
    
    It exposed a bug with nokaslr and zone device interaction.
    
    The root cause of the bug is that, the GPU driver registers a zone
    device private memory region. When KASLR is disabled or the above commit
    is applied, the direct_map_physmem_end is set to much higher than 10 TiB
    typically to the 64TiB address. When zone device private memory is added
    to the system via add_pages(), it bumps up the max_pfn to the same
    value. This causes dma_addressing_limited() to return true, since the
    device cannot address memory all the way up to max_pfn.
    
    This caused a regression for games played on the iGPU, as it resulted in
    the DMA32 zone being used for GPU allocations.
    
    Fix this by not bumping up max_pfn on x86 systems, when pgmap is passed
    into add_pages(). The presence of pgmap is used to determine if device
    private memory is being added via add_pages().
    
    More details:
    
    devm_request_mem_region() and request_free_mem_region() request for
    device private memory. iomem_resource is passed as the base resource
    with start and end parameters. iomem_resource's end depends on several
    factors, including the platform and virtualization. On x86 for example
    on bare metal, this value is set to boot_cpu_data.x86_phys_bits.
    boot_cpu_data.x86_phys_bits can change depending on support for MKTME.
    By default it is set to the same as log2(direct_map_physmem_end) which
    is 46 to 52 bits depending on the number of levels in the page table.
    The allocation routines used iomem_resource's end and
    direct_map_physmem_end to figure out where to allocate the region.
    
    [ arch/powerpc is also impacted by this problem, but this patch does not fix
      the issue for PowerPC. ]
    
    Testing:
    
     1. Tested on a virtual machine with test_hmm for zone device inseration
    
     2. A previous version of this patch was tested by Bert, please see:
        https://lore.kernel.org/lkml/d87680bab997fdc9fb4e638983132af235d9a03a.camel@web.de/
    
    [ mingo: Clarified the comments and the changelog. ]
    
    Reported-by: Bert Karwatzki <spasswolf@web.de>
    Tested-by: Bert Karwatzki <spasswolf@web.de>
    Fixes: 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
    Signed-off-by: Balbir Singh <balbirs@nvidia.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: David Airlie <airlied@gmail.com>
    Cc: Simona Vetter <simona@ffwll.ch>
    Link: https://lore.kernel.org/r/20250401000752.249348-1-balbirs@nvidia.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/mm: Check return value from memblock_phys_alloc_range() [+ + +]

Author: Philip Redkin <me@rarity.fan>
Date:   Fri Nov 15 20:36:59 2024 +0300

    x86/mm: Check return value from memblock_phys_alloc_range()
    
    [ Upstream commit 631ca8909fd5c62b9fda9edda93924311a78a9c4 ]
    
    At least with CONFIG_PHYSICAL_START=0x100000, if there is < 4 MiB of
    contiguous free memory available at this point, the kernel will crash
    and burn because memblock_phys_alloc_range() returns 0 on failure,
    which leads memblock_phys_free() to throw the first 4 MiB of physical
    memory to the wolves.
    
    At a minimum it should fail gracefully with a meaningful diagnostic,
    but in fact everything seems to work fine without the weird reserve
    allocation.
    
    Signed-off-by: Philip Redkin <me@rarity.fan>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Link: https://lore.kernel.org/r/94b3e98f-96a7-3560-1f76-349eb95ccf7f@rarity.fan
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() [+ + +]

Author: Waiman Long <longman@redhat.com>
Date:   Thu Feb 6 14:18:44 2025 -0500

    x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
    
    [ Upstream commit fe37c699ae3eed6e02ee55fbf5cb9ceb7fcfd76c ]
    
    Depending on the type of panics, it was found that the
    __register_nmi_handler() function can be called in NMI context from
    nmi_shootdown_cpus() leading to a lockdep splat:
    
      WARNING: inconsistent lock state
      inconsistent {INITIAL USE} -> {IN-NMI} usage.
    
       lock(&nmi_desc[0].lock);
       <Interrupt>
         lock(&nmi_desc[0].lock);
    
      Call Trace:
        _raw_spin_lock_irqsave
        __register_nmi_handler
        nmi_shootdown_cpus
        kdump_nmi_shootdown_cpus
        native_machine_crash_shutdown
        __crash_kexec
    
    In this particular case, the following panic message was printed before:
    
      Kernel panic - not syncing: Fatal hardware error!
    
    This message seemed to be given out from __ghes_panic() running in
    NMI context.
    
    The __register_nmi_handler() function which takes the nmi_desc lock
    with irq disabled shouldn't be called from NMI context as this can
    lead to deadlock.
    
    The nmi_shootdown_cpus() function can only be invoked once. After the
    first invocation, all other CPUs should be stuck in the newly added
    crash_nmi_callback() and cannot respond to a second NMI.
    
    Fix it by adding a new emergency NMI handler to the nmi_desc
    structure and provide a new set_emergency_nmi_handler() helper to set
    crash_nmi_callback() in any context. The new emergency handler will
    preempt other handlers in the linked list. That will eliminate the need
    to take any lock and serve the panic in NMI use case.
    
    Signed-off-by: Waiman Long <longman@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Rik van Riel <riel@surriel.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20250206191844.131700-1-longman@redhat.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xen: Add support for XenServer 6.1 platform device [+ + +]

Author: Frediano Ziglio <frediano.ziglio@cloud.com>
Date:   Thu Feb 27 14:50:15 2025 +0000

    xen: Add support for XenServer 6.1 platform device
    
    [ Upstream commit 2356f15caefc0cc63d9cc5122641754f76ef9b25 ]
    
    On XenServer on Windows machine a platform device with ID 2 instead of
    1 is used.
    
    This device is mainly identical to device 1 but due to some Windows
    update behaviour it was decided to use a device with a different ID.
    
    This causes compatibility issues with Linux which expects, if Xen
    is detected, to find a Xen platform device (5853:0001) otherwise code
    will crash due to some missing initialization (specifically grant
    tables). Specifically from dmesg
    
        RIP: 0010:gnttab_expand+0x29/0x210
        Code: 90 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 89 fd
              41 54 53 48 83 ec 10 48 8b 05 7e 9a 49 02 44 8b 35 a7 9a 49 02
              <8b> 48 04 8d 44 39 ff f7 f1 45 8d 24 06 89 c3 e8 43 fe ff ff
              44 39
        RSP: 0000:ffffba34c01fbc88 EFLAGS: 00010086
        ...
    
    The device 2 is presented by Xapi adding device specification to
    Qemu command line.
    
    Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
    Acked-by: Juergen Gross <jgross@suse.com>
    Message-ID: <20250227145016.25350-1-frediano.ziglio@cloud.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xenbus: Allow PVH dom0 a non-local xenstore [+ + +]

Author: Jason Andryuk <jason.andryuk@amd.com>
Date:   Tue May 6 16:44:56 2025 -0400

    xenbus: Allow PVH dom0 a non-local xenstore
    
    [ Upstream commit 90989869baae47ee2aa3bcb6f6eb9fbbe4287958 ]
    
    Make xenbus_init() allow a non-local xenstore for a PVH dom0 - it is
    currently forced to XS_LOCAL.  With Hyperlaunch booting dom0 and a
    xenstore stubdom, dom0 can be handled as a regular XS_HVM following the
    late init path.
    
    Ideally we'd drop the use of xen_initial_domain() and just check for the
    event channel instead.  However, ARM has a xen,enhanced no-xenstore
    mode, where the event channel and PFN would both be 0.  Retain the
    xen_initial_domain() check, and use that for an additional check when
    the event channel is 0.
    
    Check the full 64bit HVM_PARAM_STORE_EVTCHN value to catch the off
    chance that high bits are set for the 32bit event channel.
    
    Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
    Change-Id: I5506da42e4c6b8e85079fefb2f193c8de17c7437
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Message-ID: <20250506204456.5220-1-jason.andryuk@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xfrm: Sanitize marks before insert [+ + +]

Author: Paul Chaignon <paul.chaignon@gmail.com>
Date:   Wed May 7 13:31:58 2025 +0200

    xfrm: Sanitize marks before insert
    
    [ Upstream commit 0b91fda3a1f044141e1e615456ff62508c32b202 ]
    
    Prior to this patch, the mark is sanitized (applying the state's mask to
    the state's value) only on inserts when checking if a conflicting XFRM
    state or policy exists.
    
    We discovered in Cilium that this same sanitization does not occur
    in the hot-path __xfrm_state_lookup. In the hot-path, the sk_buff's mark
    is simply compared to the state's value:
    
        if ((mark & x->mark.m) != x->mark.v)
            continue;
    
    Therefore, users can define unsanitized marks (ex. 0xf42/0xf00) which will
    never match any packet.
    
    This commit updates __xfrm_state_insert and xfrm_policy_insert to store
    the sanitized marks, thus removing this footgun.
    
    This has the side effect of changing the ip output, as the
    returned mark will have the mask applied to it when printed.
    
    Fixes: 3d6acfa7641f ("xfrm: SA lookups with mark")
    Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
    Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
    Co-developed-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>