When working with Internet transports, either for analyzing them or designing a new one, it is important to keep in mind that the Internet can do a lot of strange things with your packets.
Disclaimer: I did a similar presentation at my former employer but I did not keep anything, notes or slides, when I left so this blog entry is a re-creation from memory.
For this discussion, we will reason about a very simplified Internet that still have all the properties of the real thing. Our first simplification will be that there is only 10 endpoints in this network, numbered from 0 to 9 (the real Internet has 6e57 possible addresses if IPv4, IPv6 and NATs are taken in account). Our second simplification will be that we can send only 26 different messages between these endpoints, from a to z (the real Internet can exchange 10e3612 different messages). With these limitations, we can now use a simple notation based on regular expressions (regex) to express sending a message from endpoint 0 to endpoint 9:
messages --> regex
Here “messages” is a list of messages that are sent by endpoint 0 one after the other, and “regex” is a description of what endpoint 9 receives from endpoint 0.
So the question we are trying to answer here is “what regular expression can fully describe what endpoint 9 can receive if endpoint 0 sends message a, then message b to it?”
A first description could look like this:
ab --> ab
Meaning that if endpoint 0 sends message a then message b, then endpoint 9 will receive first message a and then message b.
Obviously that is not true as not only there is no guarantee that a message sent will be received, but dropping messages is one of the fundamental mechanism used by the Internet to ask the sender to reduce its sending rate. So a second description can take this in account:
ab --> a?b?
But there is also no guarantee that when sending two messages in succession they will arrive in the same order. The reason for this is that two messages can take different paths through routers, and so the first message can be delayed enough to arrive after the second one. Let’s try a better description:
ab --> (a|b|ab|ba)?
But if the Internet can drop a message, the Internet can also duplicate it. This is a rare condition that can happen for different technical reasons, but the fact is that one should be ready to receive multiple identical messages:
ab --> (a|b)*
Now that seems to cover everything: Messages can be delayed, dropped or duplicated.
The reality is that the complete answer to our question is this:
ab --> (a|b|.)*
What is means is that, in addition of messing with messages sent by endpoint 0, the Internet can make endpoint 9 receive messages from endpoint 0 that endpoint 0 never sent.
This is one of the most often forgotten rule when analyzing or designing a transport, which is that it is really easy to fake a message as originating by another endpoint.
Nothing in the Internet Protocol can be used to detect any of these conditions, so it is up to the transport built on top to take care of these. Acknowledgments, timeout and retransmissions can take care of the lost messages; sequence number and bufferization can take care of the delays and duplications; and some kind of cryptographic signature can permit to detect fake message.