During the technical plenary of the 73th IETF meeting in Minneapolis MN, Dave Thaler made the interesting point that most DNS resolver API do not return the TTL of the resource resolved, e.g. an IP address. At the time he was proposing a modification of the existing APIs, and that made me thinking since.
The problem is that programmers generally think that resolving a domain name and then storing the IP address(es) resulting of this resolution is a good idea, as this technique could yield a better responsiveness for an application. But doing that without having a mechanism to know that the result of the resolution is invalid creates an even bigger problem. During my time at 8×8, I worked on the problem of designing a scalable Internet service, and my design was focused on having the scalability driven by the client (in other words: the whole load balancers idea is evil). Such techniques are very cheap and effective, but only if the client strictly obeys the TTL value received in the DNS response (I still have two pending patent applications about this techniques). Another well known problem is documented in RFC 4472 section 8.2, as keeping an IP address for too long prevents renumbering an IPv6 network, but there is plenty of other cases.
So the idea of passing the TTL together with the result of the DNS query seems like a good one, until you realize that in fact what developer have now to do is to implement a DNS cache in their application, and every evidence shows that this is not a simple task. As can be seen by the number of security vulnerabilities found during the years, even people who do read the RFC seem to have an hard time doing it right. Internet could probably do without another wave of DNS cache implemented incorrectly.
So in my opinion, adding the TTL to the API is not the solution – it will just exchange one problem with another. The correct solution is to do the resolution each time the resource is needed and do not store the result at all. If performances are too much impacted (after scientifically measuring them, we are between professionals here) then using an external DNS cache will fix the problem. The DNS cache can be in your network (for example having two DNS caches per data center), can be on each server (dnscache from the djbdns suite is easy to install and configure and has a good security track), or even directly in your application (for example dnsjava contains such a cache).
200% agreed!
Caching is just one thing that applications should not implement. Take DNSSEC for example. I think it is easier to just run a lightweight caching nameserver such as Unbound directly on the local host, and have the application trust it. This would take care of caching properly as well as deal with the much more complicated issue of DNSSEC.