Only processes with an effective user ID of 0 or the CAP_NET_RAW capability are allowed to open raw sockets.
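The privilege check above surfaces as an error from socket(2). A minimal sketch of opening a raw socket and reporting that failure follows; the helper name `open_raw_socket` and the convention of returning a negative errno value are illustrative assumptions, not part of any API.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open an IPv4 raw socket for the given protocol.
 * Returns the descriptor on success, or a negative errno value on failure
 * (typically -EPERM when the process lacks CAP_NET_RAW). */
int open_raw_socket(int protocol)
{
    int fd = socket(AF_INET, SOCK_RAW, protocol);
    if (fd < 0)
        return -errno;
    return fd;
}
```

A caller might use `open_raw_socket(IPPROTO_ICMP)` and treat a negative result as a hint to check privileges before retrying.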
The kernel prepends an IP header to the data passed by the user (unless the IP_HDRINCL option is set, in which case the user must supply the IP header) and sends the packet to the specified destination address. Raw sockets use the standard sockaddr_in address structure defined in ip(4). The sin_port field can be used to specify the protocol number; otherwise the protocol specified in the initial socket(2) call is used. For incoming packets, sin_port is set to the protocol of the packet.
When the IP_HDRINCL socket option is enabled on a socket, no IP header is generated on sending. The user must supply the IP header at the front of the packet.
|IP header fields modified on sending when IP_HDRINCL is specified|
|IP Checksum||Always filled in.|
|Source Address||Filled in when zero.|
|Packet Id||Filled in when passed as 0.|
|Total Length||Always filled in.|
Sending fragments with IP_HDRINCL is currently not supported.
If IP_HDRINCL is specified and the IP header has a nonzero destination address, the destination address of the socket is used to route the packet. When MSG_DONTROUTE is specified, the destination address must refer to a local interface; otherwise a routing table lookup is done.
Other IP header fields and options can be set in the usual way with the standard IP control messages: IP_TTL for the time-to-live field, IP_TOS for the TOS field, IP_PKTINFO for the interface, and IP_OPTIONS for IP options. See ip(4) for more information. These options are illegal when IP_HDRINCL is set.
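The IP_HDRINCL rules above can be illustrated with a short sketch that prepares a header and leaves at zero the fields the table says the kernel fills in. The helper names `build_ip_header` and `enable_hdrincl`, the TTL of 64, and the absence of IP options (header length of 5 words) are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill a minimal 20-byte IPv4 header for IP_HDRINCL.
 * Returns 0 on success, -1 if dst is not a valid dotted-quad address. */
int build_ip_header(struct iphdr *ip, uint8_t protocol, const char *dst)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, dst, &addr) != 1)
        return -1;

    memset(ip, 0, sizeof(*ip));
    ip->version  = 4;
    ip->ihl      = 5;            /* 5 x 32-bit words, no IP options */
    ip->ttl      = 64;           /* assumed default, pick as needed */
    ip->protocol = protocol;
    ip->daddr    = addr.s_addr;  /* already in network byte order */
    /* Left at zero on purpose:
     *   tot_len, check  always filled in by the kernel
     *   id, saddr       filled in by the kernel when zero */
    return 0;
}

/* Hypothetical helper: tell the kernel that we supply the IP header. */
int enable_hdrincl(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));
}
```

The header would then be placed in front of the payload in a single buffer passed to sendto(2) on a raw socket for which `enable_hdrincl` succeeded.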
In Linux 2.2 all IP header fields and options can be sent and received using IP control messages. This means raw sockets are only needed for new protocols or protocols with no user interface (like ICMP). Generation of custom TCP or UDP packets using raw sockets is unnecessary in many cases.
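Since ICMP is the typical case where a raw socket is still needed, the following sketch shows how a program might build an ICMP echo request payload for such a socket. On an IPv4 raw socket the kernel fills in the IP header but not the ICMP checksum, so the sketch computes it with the Internet checksum from RFC 1071. The names `inet_checksum` and `build_echo_request` are illustrative, not library functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071) over a byte buffer.
 * Returns the checksum as a host-order value; store it big-endian. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                       /* pad an odd trailing byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                  /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Write an 8-byte ICMP echo request (type 8, code 0, no payload) into buf.
 * Returns the number of bytes written. */
size_t build_echo_request(uint8_t *buf, uint16_t id, uint16_t seq)
{
    uint16_t ck;

    memset(buf, 0, 8);
    buf[0] = 8;                        /* ICMP type: echo request */
    buf[4] = (uint8_t)(id >> 8);
    buf[5] = (uint8_t)(id & 0xFF);
    buf[6] = (uint8_t)(seq >> 8);
    buf[7] = (uint8_t)(seq & 0xFF);
    ck = inet_checksum(buf, 8);        /* checksum field is zero while summing */
    buf[2] = (uint8_t)(ck >> 8);
    buf[3] = (uint8_t)(ck & 0xFF);
    return 8;
}
```

The resulting buffer could be passed to sendto(2) on a socket opened with IPPROTO_ICMP; a receiver can validate a message by checking that the checksum over the whole message is zero.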
When a packet is received Linux first checks if a raw socket has been bound to the protocol of the packet. If this is true the packet is first passed to the raw socket(s) and then passed to other receivers of this protocol (e.g. kernel protocol modules).
ICMP_FILTER Enable a special filter for raw sockets bound to the IPPROTO_ICMP protocol. The passed value is a long word (32-bit) mask in which each bit represents an ICMP type. All incoming ICMP messages whose type equals the number of a bit set in this mask are not passed to the socket. This can be used to filter out uninteresting ICMP messages. The default is to pass all ICMP messages.
Additionally, raw sockets support all ip(4) SOL_IP socket options. One SOL_IP socket option specific to raw sockets is IP_HDRINCL. When this option is enabled, the user has to supply the IP header. Linux changes this IP header only in the cases documented above.
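As a sketch of how the ICMP_FILTER mask might be built, the code below blocks every ICMP type except a chosen few. Two details are assumptions taken from the Linux headers rather than from the text above: the option is set at the SOL_RAW level, and ICMP_FILTER has the value 1 (defined here as a fallback because <netinet/ip_icmp.h> does not provide it). The helper names are illustrative.

```c
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef ICMP_FILTER
#define ICMP_FILTER 1   /* assumed value, as defined in <linux/icmp.h> */
#endif

/* Hypothetical helper: build a mask that filters out every ICMP type
 * except those listed. A set bit means "do not pass this type". */
uint32_t icmp_mask_pass_only(const uint8_t *types, size_t n)
{
    uint32_t mask = 0xFFFFFFFFu;
    size_t i;

    for (i = 0; i < n; i++)
        if (types[i] < 32)             /* the mask covers types 0..31 only */
            mask &= ~(UINT32_C(1) << types[i]);
    return mask;
}

/* Hypothetical helper: install the mask on a raw IPPROTO_ICMP socket. */
int set_icmp_filter(int fd, uint32_t mask)
{
    return setsockopt(fd, SOL_RAW, ICMP_FILTER, &mask, sizeof(mask));
}
```

For example, a ping-like program might keep only ICMP_ECHOREPLY and ICMP_DEST_UNREACH and let the kernel discard everything else before it reaches the socket.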
Raw sockets fragment a packet when its total length exceeds the interface MTU. A more network-friendly alternative is path MTU discovery. If the IP_MTU_DISCOVER option is enabled, the network stack automatically saves the MTU of targets that have been successfully communicated with in the routing cache. While path MTU discovery is in progress, packets may be dropped if the initial MTU guess was too large. The application has to implement its own retransmission strategy to handle this situation (packets may always be dropped for other reasons too, so this has to be handled anyway). When the socket is connected to a specific peer with connect(2), the path MTU can be retrieved conveniently using the IP_MTU socket option after an EMSGSIZE error has occurred. For connectionless sockets with many destinations, the new MTU can be accessed using the error queue (see IP_RECVERR in ip(4)). The application should then lower its packet sizes. To get an initial PMTU estimate, it is possible to connect a temporary socket to the destination and retrieve the currently known PMTU using the IP_MTU getsockopt.
Linux 2.0 enabled some bug-to-bug compatibility with BSD in the raw socket code when the SO_BSDCOMPAT flag was set; this has been removed in 2.2.
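The temporary-socket approach for an initial PMTU estimate might look like the sketch below. It assumes a UDP socket is acceptable as the temporary socket, uses the arbitrary port 9 (connect(2) on a datagram socket sends nothing), enables discovery with IP_PMTUDISC_DO, and uses the level IPPROTO_IP, which has the same value as SOL_IP. The helper name `probe_pmtu` is illustrative.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the currently known path MTU towards
 * dst_ip (dotted-quad string), or -1 on any failure. */
int probe_pmtu(const char *dst_ip)
{
    struct sockaddr_in dst;
    int fd, mtu = -1;
    int val = IP_PMTUDISC_DO;          /* always set DF, never fragment locally */
    socklen_t len = sizeof(mtu);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);           /* arbitrary; no packet is sent */
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val)) < 0 ||
        connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        mtu = -1;

    close(fd);
    return mtu;
}
```

The value returned is only the kernel's current estimate; an application would still need to handle EMSGSIZE and lower its packet size when the estimate turns out to be too large.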
When the IP_HDRINCL option is set packets are not fragmented on sending. This is a limitation in Linux 2.2.
If you want to receive all ICMP packets for a socket, it is better to use IP_RECVERR on that particular socket (this only works for datagram-oriented sockets).
In Linux, raw sockets may tap all single-packet protocols, even protocols like ICMP which have a protocol module in the kernel. In this case the packets are passed to both the kernel module and the raw socket(s). This is not portable; many other stacks have limitations here.
Linux does not mangle outgoing raw socket headers except in the cases documented above. Some other stacks make more changes (such as changing the byte order of some IP header fields).
RFC 1191 for path MTU discovery.