Why TCP needs 3 handshakes

  sonic0002        2024-09-28 08:57:52

Prerequisite Knowledge

First, let's look at the control bits and state machine of TCP, which form the basis for understanding the three-way handshake of TCP.

TCP Packet Control Bits

The control bits in the TCP packet header are used to control the status of the TCP connection and can indicate various control information such as connection establishment, termination, reset, etc.

There are seven common control bits:

  • SYN (Synchronize Sequence Numbers): Requests to establish a connection (part of the three-way handshake). It is set in the initial packet during connection establishment, indicating that the sender wishes to establish a connection and synchronize sequence numbers. Since TCP is bidirectional, both sides need to send a SYN when establishing a connection. Although SYN packets normally carry no data, they do consume a sequence number.
  • ACK (ACKnowledgment Field Significant): Acknowledges received data. When the ACK flag is set, the receiver fills in the next expected sequence number in the acknowledgment field. An ACK packet that does not carry data does not consume a sequence number.
  • FIN (No More Data From Sender): Requests to terminate the connection (part of the four-way wave-off). It is set once the sender has no more data to send, notifying the receiver that the sender has finished sending. As TCP is bidirectional, both sides need to send a FIN when closing the connection. Even when a FIN packet carries no data, it consumes a sequence number.
  • RST (Reset the Connection): Resets the connection, used for abnormal or erroneous connection termination. When the receiver gets a packet with the RST flag set, it terminates the connection immediately, without acknowledging it. One example is the TIME_WAIT issue mentioned in previous articles. Note: The presence of RST packets in a production environment often indicates potential problems.
  • PSH (Push Function): Indicates to the receiver that received data should be delivered to the application layer immediately, suggesting that the data should be processed quickly rather than waiting for more subsequent data.
  • URG (Urgent Pointer Field Significant): Indicates that the data has high priority and should be processed by the receiver as soon as possible.
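  • ECE (ECN-Echo): Used for Explicit Congestion Notification (ECN). When set on a SYN packet, it signals that the sender supports ECN, so that both parties can negotiate support during the three-way handshake; on later packets, it reports that the receiver saw a congestion mark.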
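
To make the control bits concrete, here is a minimal Python sketch that decodes the flag byte of a TCP header into flag names. The bit positions follow the TCP header layout; the function name decode_flags is made up for this example, and the table also includes CWR, the eighth bit of the byte, which is not covered in the list above.

```python
# Bit masks for the flag byte of the TCP header.
# CWR is included only to complete the byte; it is not discussed in the article.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
    "ECE": 0x40,
    "CWR": 0x80,
}


def decode_flags(flag_byte: int) -> list[str]:
    """Return the names of the control bits set in flag_byte (hypothetical helper)."""
    return [name for name, mask in TCP_FLAGS.items() if flag_byte & mask]


print(decode_flags(0x02))  # ['SYN'], as in the first handshake
print(decode_flags(0x12))  # ['SYN', 'ACK'], as in the second handshake
print(decode_flags(0x10))  # ['ACK'], as in the third handshake
```

The three values printed correspond to the flag combinations seen in the three handshake packets discussed later.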

Sequence Number (Seq)

Since TCP is bidirectional, both sides of a single connection can send data to each other, so each side must maintain its own Seq field.

Each side's initial sequence number (ISN) is generated dynamically and randomly, which helps prevent forged packets from resetting the connection (RST attack).

TCP provides ordered transmission, so each data segment must include a Seq number field:

  • When the receiver gets out-of-order packets, it can reorder them based on the Seq.
  • When the receiver gets duplicate packets, it can deduplicate them based on the Seq.
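
The reordering and deduplication described above can be sketched in a few lines of Python. This is a simplified illustration, not how a real TCP stack is implemented: the function name reassemble is hypothetical, segments are modelled as (seq, payload) pairs, and sequence number wraparound is ignored.

```python
def reassemble(segments: list[tuple[int, bytes]], start_seq: int) -> bytes:
    """Rebuild the byte stream from (seq, payload) pairs that may arrive
    out of order or duplicated. Hypothetical helper; ignores Seq wraparound."""
    stream = bytearray()
    expected = start_seq
    # Sorting by Seq restores the original order.
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if seq + len(payload) <= expected:
            continue  # duplicate: every byte was already delivered
        if seq > expected:
            break  # gap: an earlier segment is still missing
        # Skip any overlapping prefix, then append the new bytes.
        stream += payload[expected - seq:]
        expected = seq + len(payload)
    return bytes(stream)


arrived = [(6, b"world"), (1, b"hello"), (1, b"hello")]
print(reassemble(arrived, start_seq=1))  # b'helloworld'
```

The second copy of the segment with Seq 1 is dropped, and the segment with Seq 6 is placed after it even though it arrived first.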

As shown in the figure, the sequence number advances as follows (important):

  • For segment 1, if the starting Seq number is 1 and the length is 1448 bytes, then the Seq number of segment 2 would be 1 + 1448 = 1449.
  • If the length of segment 2 is also 1448, then the Seq number of segment 3 would be 1449 + 1448 = 2897.

In other words, a segment's Seq number is the previous segment's Seq number plus the previous segment's length.

Seq = Previous Segment's Seq + Length (Len)

Therefore, in the process of TCP data transmission, the segments sent by either party should be continuous: the Seq number of the next packet equals the Seq of the previous packet + Length (except for the three-way handshake and four-way wave-off).
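
The rule can be written as a small Python function. This is a sketch with made-up names (next_seq, SEQ_MODULUS); it also covers the handshake and wave-off exception, assuming the usual rule that a SYN or FIN flag consumes one sequence number, and it wraps at 2**32 because the Seq field is 32 bits wide.

```python
SEQ_MODULUS = 2**32  # the Seq field is 32 bits wide, so it wraps around


def next_seq(seq: int, length: int, syn: bool = False, fin: bool = False) -> int:
    """Seq of the following segment: previous Seq + Len,
    plus one for a SYN or FIN flag, each of which consumes a sequence number."""
    return (seq + length + int(syn) + int(fin)) % SEQ_MODULUS


seg1 = 1
seg2 = next_seq(seg1, 1448)
seg3 = next_seq(seg2, 1448)
print(seg2, seg3)  # 1449 2897
```

The printed values match the two segments in the example above.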

Len (Segment Length)

It's important to note that the Length (Len) does not include the length of the TCP header; therefore, don't assume that a packet with Len = 0 is meaningless. The TCP header itself carries a lot of information.

ACK (Acknowledgment Number)

The receiver tells the sender which segments (Seq numbers) it has received.

  • If the sender sends a segment with Seq:1 and Len:100 to the receiver, then the ACK reply from the receiver would be 1 + 100 = 101, indicating that it has received every byte before Seq 101 and expects Seq 101 next.
  • If the sender sends a segment with Seq:101 and Len:50 to the receiver, then the ACK reply from the receiver would be 101 + 50 = 151, indicating that it has received every byte before Seq 151 and expects Seq 151 next.

For example, if Party A sends a segment "Seq:x Len:y" to Party B, then Party B’s ACK response would be x + y, meaning it has received every byte before x + y.

Conclusion: The ACK number that the receiver sends back is exactly equal to the Seq number of the next segment the sender will send. For example, the ACK number of packet 10377 is exactly equal to the Seq number of packet 10378.

If one party sends no data during communication, then the ACK number returned by the other party will not change (that is, it remains the value set during the three-way handshake).
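
The receiver's side of this bookkeeping can be sketched as a tiny Python class. The class name Receiver and its method are hypothetical, and the model only handles in-order data; anything else leaves the ACK number unchanged.

```python
class Receiver:
    """Tracks the cumulative ACK number for one direction of a connection."""

    def __init__(self, next_expected: int) -> None:
        self.next_expected = next_expected

    def on_segment(self, seq: int, length: int) -> int:
        """Process a segment and return the ACK number to send back."""
        if seq == self.next_expected:
            # In-order data: advance past the bytes just received.
            self.next_expected = (seq + length) % 2**32
        # Otherwise (duplicate or gap) the ACK number stays where it is.
        return self.next_expected


rx = Receiver(next_expected=1)
print(rx.on_segment(seq=1, length=100))   # 101
print(rx.on_segment(seq=101, length=50))  # 151
print(rx.on_segment(seq=151, length=0))   # 151
```

The last call shows that a segment with Len = 0 leaves the ACK number unchanged.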

TCP State Machine

Below is the classic TCP state machine diagram, which encompasses all states of a TCP connection from its initial establishment to its final termination throughout its entire lifecycle.

Here is the color-coded version from Wikipedia:

Three-Way Handshake Example

Below is a typical diagram illustrating the three-way handshake.

  • First Handshake: The client initiates a connection request, setting the SYN bit to 1, and initializes a random sequence number (ISN) Seq = x. After sending the SYN packet, the client enters the SYN-SENT state.
  • Second Handshake: The server, upon receiving the connection request from the client, initializes a random sequence number (ISN) Seq = y, sets the SYN and ACK bits to 1, and the ACK number to x + 1. After sending the SYN-ACK packet, the server enters the SYN-RECEIVED state.
  • Third Handshake: The client, upon receiving the server's response, sets the ACK bit to 1 and the ACK number to y + 1, and sends the ACK packet. The client then enters the ESTABLISHED state. The server, upon receiving the ACK packet with the ACK number y + 1, also enters the ESTABLISHED state, completing the TCP connection establishment.

During the connection process, the state transitions of the client and server are as follows:

From the diagram, we can see that for the ESTABLISHED (connection established) state, the client and server perceive the completion of the connection at different times:

  • For the client, the connection is considered established once it has received the second handshake (SYN-ACK) and sent its ACK.
  • For the server, the connection is considered established only once it has received the third handshake (ACK).
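
The state transitions can be traced with a short Python sketch. It is a simplified model that assumes no packet is lost; the function name three_way_handshake and the ISN values 100 and 300 are chosen only for illustration.

```python
def three_way_handshake(x: int, y: int) -> list[tuple[str, str, str]]:
    """Trace the handshake for client ISN x and server ISN y.
    Each entry is (event, client_state, server_state) after that event."""
    trace = []
    client, server = "CLOSED", "LISTEN"

    # First handshake: the client sends SYN with Seq = x.
    client = "SYN-SENT"
    trace.append((f"client sends SYN seq={x}", client, server))

    # Second handshake: the server replies SYN-ACK with Seq = y, Ack = x + 1.
    server = "SYN-RECEIVED"
    ack_x = (x + 1) % 2**32
    trace.append((f"server sends SYN-ACK seq={y} ack={ack_x}", client, server))

    # Third handshake: the client sends ACK with Ack = y + 1
    # and is ESTABLISHED as soon as it has sent it.
    client = "ESTABLISHED"
    ack_y = (y + 1) % 2**32
    trace.append((f"client sends ACK seq={ack_x} ack={ack_y}", client, server))

    # The server becomes ESTABLISHED only when that ACK arrives.
    server = "ESTABLISHED"
    trace.append(("server receives ACK", client, server))
    return trace


for event, client_state, server_state in three_way_handshake(100, 300):
    print(f"{event:40} client={client_state:12} server={server_state}")
```

The third row of the output shows the moment when the client is already ESTABLISHED while the server is still in SYN-RECEIVED.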

Wireshark Packet Capture

To gain a more intuitive understanding of the TCP three-way handshake process, we use Wireshark for packet capture:

Open Wireshark and start capturing packets, then execute the following command in the terminal:

curl -I -H "Connection: close" https://dbwu.tech

Switch to the Wireshark interface and use TCP filtering to view the packets. You can observe the TCP three-way handshake process, where 192.168.3.68 is the local IP address within the LAN.

  • First Handshake: The client initiates a connection request with Seq = x = 3123802190.
  • Second Handshake: The server responds with Seq = y = 1071295171 and ACK = x + 1, which is 3123802191.
  • Third Handshake: The client responds with ACK = y + 1, which is 1071295172.
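
If capturing traffic to a remote host is inconvenient, a handshake can also be generated locally. The following is a minimal sketch using Python's standard socket module; the loopback address and the OS-assigned port are choices made for this example, not part of the original walkthrough, and seeing the packets would require capturing on the loopback interface.

```python
import socket

# Listening socket on the loopback interface; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.settimeout(5)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# The kernel performs the handshake; the call returns once the client side
# of the connection is established.
client = socket.create_connection((host, port), timeout=5)

# accept() hands over a connection whose handshake the kernel has already handled.
conn, peer = server.accept()
handshake_ok = peer == client.getsockname()
print("connected:", handshake_ok)

for s in (conn, client, server):
    s.close()
```

Note that the application code never sends SYN or ACK packets itself; the operating system's TCP stack does that on its behalf.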

Tip: By default, Wireshark displays relative Seq values (starting from 0). If you want to see the actual random Seq values of the client and server, you can adjust this in Wireshark's settings:

Edit - Preferences - Protocols - TCP

Uncheck: Relative Sequence numbers
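
A relative value is simply the offset from the connection's initial sequence number. The sketch below shows one plausible way to compute it, using the numbers from the capture above; the function name relative_seq is made up for this example.

```python
def relative_seq(absolute: int, isn: int) -> int:
    """Offset of an absolute sequence number from the connection's ISN,
    computed modulo 2**32 so it stays correct across wraparound."""
    return (absolute - isn) % 2**32


client_isn = 3123802190  # Seq of the first handshake in the capture
server_isn = 1071295171  # Seq of the second handshake in the capture

print(relative_seq(client_isn, client_isn))  # 0: the client's SYN
print(relative_seq(3123802191, client_isn))  # 1: the server's ACK number
print(relative_seq(1071295172, server_isn))  # 1: the client's ACK number
```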

Proof (Rough Version)

After explaining the theoretical foundation of the TCP three-way handshake, we can analyze and prove the proposition:

At least three handshakes are required to establish a TCP connection.

Using a basic mathematical proof method called Proof by Contradiction, we posit the following:

A TCP connection can be established without three handshakes.

Specifically, suppose that TCP can establish a connection with N handshakes, where N is fewer than three. We divide N into three cases:

N < 1 | N == 1 | N == 2

We will prove each case separately.

1. N < 1

When N < 1, it means that no one initiates the first handshake. In this scenario, neither side knows of the other's existence, making it impossible to establish a connection. Thus, N < 1 is invalid, leading us to the next hypothesis: N == 1.

2. N == 1

When N == 1, it implies that only one handshake occurs, i.e., the sender sends a connection request (SYN) to the receiver. However, no further action happens after the request is sent, so the connection cannot be established because:

  • The sender cannot confirm whether its transmission function works properly (whether it can send data normally to the receiver).
  • The receiver cannot confirm whether its reception function works properly (including whether it is listening on the correct port and whether its buffer is functioning correctly).

Without these confirmations, the sender considers the connection unestablished and will not continue to send data. Thus, N == 1 is invalid, leading us to the next hypothesis: N == 2.

3. N == 2

When N == 2, it implies that two handshakes occur. Based on the proof for N == 1, we continue our examination.

After the first handshake, the receiver acknowledges the sender's request with a response (SYN-ACK). Similar to the sender's message, the receiver's response is sent but nothing else happens, so the connection still cannot be established. Now, two things are confirmed:

  • The sender's transmission function is normal.
  • The receiver's reception function is normal.

However, the receiver cannot confirm:

  • Whether its transmission function is working correctly (whether it can send data normally to the sender).
  • Whether the sender's reception function is working correctly.

Thus, N == 2 is also insufficient for establishing a connection, demonstrating that at least three handshakes are necessary to fully establish a TCP connection.

Original Proposition Proof

Through the method of proof by contradiction provided earlier, we can prove that at least three handshakes are needed to establish a TCP connection.

Following the reasoning outlined previously, let's examine the state changes of the sender and receiver when N == 3.

After the second handshake, the sender receives the connection establishment response packet (SYN-ACK) from the receiver and replies with an acknowledgement packet (ACK).

Once this ACK arrives, the receiver can determine the two things it could not confirm before:

  1. Whether the receiver's transmission function is working correctly (whether it can send data normally to the sender).
  2. Whether the sender's reception function is working correctly.

The two issues present when N == 2 are now resolved:

  • The receiver's transmission function is normal.
  • The sender's reception function is normal.
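
The rough argument can be tabulated in Python. This is only one way to model it, under the assumption that each of the first n handshake packets is delivered; the names confirmed_after and both_sides_confirmed and the wording of the facts are invented for this sketch.

```python
ALL_FACTS = {
    "sender can send",
    "receiver can receive",
    "receiver can send",
    "sender can receive",
}


def confirmed_after(n: int) -> dict[str, set[str]]:
    """What each side has verified after n handshake packets (n from 0 to 3).
    'sender' is the side that sends the first SYN."""
    facts: dict[str, set[str]] = {"sender": set(), "receiver": set()}
    if n >= 1:
        # The SYN arrived: the receiver knows the sender-to-receiver path works.
        facts["receiver"] |= {"sender can send", "receiver can receive"}
    if n >= 2:
        # The SYN-ACK arrived: the sender knows both directions work.
        facts["sender"] |= ALL_FACTS
    if n >= 3:
        # The final ACK arrived: the receiver knows the return path works too.
        facts["receiver"] |= {"receiver can send", "sender can receive"}
    return facts


def both_sides_confirmed(n: int) -> bool:
    """True when both sides have verified all four facts."""
    return all(known == ALL_FACTS for known in confirmed_after(n).values())


print([n for n in range(4) if both_sides_confirmed(n)])  # [3]
```

Under this model, three is the smallest number of handshakes after which both sides have confirmed all four facts.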

Proof (Correct Version)

Previously, we used a rough method of proof by contradiction along with the state of TCP packets to prove that at least three handshakes are required to establish a TCP connection. Additionally, we can rigorously prove this using TCP sequence numbers (Seq).

To achieve reliable data transmission, both parties in a TCP communication must maintain a sequence number (Seq) to identify which packets have already been received by the other party.

The three-way handshake consists of both parties exchanging their initial sequence numbers and each confirming that the other party has received its initial sequence number.

  • If there is only one handshake, the receiver does not send its initial sequence number.
  • If there are only two handshakes, only the sender's initial sequence number can be confirmed, while the receiver's initial sequence number cannot be verified.

Moreover, with only two handshakes, there might be an exceptional situation caused by delayed packets resulting in invalid connections.

As shown in the diagram, there are multiple paths in a network. The first packet sent by the client to establish a connection might end up on a path with severe delays, causing it to take a long time to reach the server.

After the client's timeout, it assumes the first packet was lost and reinitiates the request. The second request takes a normal path and completes the connection quickly.

From the client's perspective, the communication seems to have ended, but then its first packet arrives late at the server. Since the server does not know that this is an old and invalid request, it responds normally.

If TCP had only two handshakes, an invalid (expired duplicate) connection would be established on the server.

However, under the three-way handshake mechanism, when the client receives the server's response, it realizes that this is an outdated connection and responds with an RST packet. Upon receiving the RST, the server closes the connection.
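
The difference between the two designs can be illustrated with a small Python model of the delayed-SYN scenario. It is a deliberately simplified sketch: the function name server_outcome, its parameters, and the ISN value 1000 are hypothetical, and the two-handshake variant is the imagined protocol discussed above, not something TCP implements.

```python
from typing import Optional


def server_outcome(stale_isn: int, client_pending_isn: Optional[int], handshakes: int) -> str:
    """Final server state after a delayed duplicate SYN with ISN stale_isn arrives.

    client_pending_isn is the ISN of a SYN the client is still waiting on,
    or None if the client has no connection attempt in progress.
    """
    ack = (stale_isn + 1) % 2**32  # the server acknowledges the old SYN

    if handshakes == 2:
        # Imagined two-handshake design: the server commits as soon as it replies.
        return "ESTABLISHED"

    # Three-way handshake: the server waits in SYN-RECEIVED for the client's verdict.
    if client_pending_isn is not None and ack == (client_pending_isn + 1) % 2**32:
        return "ESTABLISHED"  # the client recognises the ACK number and confirms
    return "CLOSED"  # the client answers with RST and the server gives up


print(server_outcome(1000, None, handshakes=2))  # ESTABLISHED (an invalid connection)
print(server_outcome(1000, None, handshakes=3))  # CLOSED
print(server_outcome(1000, 1000, handshakes=3))  # ESTABLISHED (a genuine request)
```

In the first call the server is left holding a connection nobody wants; in the second, the client's RST lets it clean up.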

Key Point: What the three-way handshake confirms is the initial sequence number (ISN, Initial Sequence Number) of both the sender and receiver. This value distinguishes the current connection from historical old connections.

Conclusion

Theoretically, even more than three handshakes would not guarantee a "completely reliable" TCP connection. However, through three handshakes, it can at least be confirmed that the connection is "basically usable." Increasing the number of handshakes would merely increase the confidence level in the "connection availability."

After the three-way handshake of TCP, the state changes of the sender and receiver are as follows:

  • The sender confirms: its own transmission and reception are normal, and the other party's transmission and reception are normal.
  • The receiver confirms: its own transmission and reception are normal, and the other party's transmission and reception are normal.

In summary, the three-way handshake in TCP is a classic example of a "trade-off" in software engineering.

Translated from https://dbwu.tech/posts/network/why-tcp-does-needs-three-way-handshake/



  1 COMMENT


John Day [Reply]@ 2024-09-29 06:27:12

The 3-way handshake is not necessary. It does what it is supposed to, but that is not why it works. In 1978, published in 1980, Richard Watson of LLL proved that the necessary and sufficient condition for synchronization for reliable data transfer is to impose an upper bound on 3 times: MPL, Maximum Packet Life time; A, maximum time to wait before sending an Ack; and R, maximum time to exhaust retries.

The message exchange has nothing to do with it. TCP sort of works because TTL bounds MPL and the other two are bounded by the assumptions about performance and the fact that the TCB has to be held for 120 seconds before the port-ids can be re-used. This is less secure and less robust. Watson and team defined a protocol called delta-t, implemented it and used it at LLL for many years.

This is a hard result to wrap one's head around. It isn't that the 3-way handshake is wrong, it just has nothing to do with why it works.  Any sequence of 3 or packets will have the same effect as long as the bounds are enforced.

There is much more to this result. For more information and the sources, you can contact me at day@bu.edu.

Of the 3 or 4 protocols considered in the mid-70s, TCP was by far the worst design and when there was a vote, it was not chosen by a margin of 2-1.

Take care,

John Day


