Jarc: Now running Java 1.7

At last, installable packages for OpenJDK 7 are available in the experimental Debian repository. So I released a new version of jarc that is now directly supporting Java 7 code but, because the openjdk-7 packages are in an experimental repository, I uploaded jarc 0.2.21 in an experimental repository so people would not be hit with dependency issues. I updated my Ubuntu/Debian repository page to explain how to add my experimental repository to the repository configuration file.

The reason to release this experimental version of jarc is that I need two the new features of the new JVM for my RELOAD implementation. The first one is the support of TLS 1.2, aka RFC 5246, which is used by RELOAD to authenticate and encrypt all the TCP connections. The only thing missing to completely implement RELOAD will be an implementation of DTLS 1.0 (which is to datagrams what TLS is to streams). I guess I will have to bite the bullet and write this one myself.

The second feature that I needed is an implementation of the SCTP protocol. TCP is great for transferring files but it has one major flaw when used with multiplexed client/server transactions, which is called the Head-Of-Line problem – a message lost will block all the subsequent messages, even if they belong to a different transaction, until the lost message is retransmitted and received. UDP does not have this issue but comes with its own set of problems. SCTP is somewhere between UDP and TCP, and this is why, in my opinion, it is a good transport protocol for something like RELOAD (well, between RELOAD nodes with a public IP address, as most NATs – one exception been the one I supervised the development at 8×8 – do not NAT SCTP).

So the plan will be to add SCTP to RELOAD, using either TLS (RFC 3436) or DTLS (RFC 6083) depending on the DTLS implementation described above, and to write an I-D, perhaps in time for IETF 81 in Quebec City.

RELOAD: Access Control Policy distribution

This blog had been quiet for the last month because I was working on an implementation of RELOAD, the peer-to-peer protocol developed at the IETF as a component of peer-to-peer SIP (I do not plan to develop a peer-to-peer SIP implementation, but I needed a RELOAD implementation for another project). The plan is to release a complete implementation of RELOAD as a Java library under an Affero GPL 3 license and to let the company I co-founded, Stonyfish Inc, sell commercial licenses to people in need of a less restrictive license. The code will be released progressively between now and July, so keep an eye on this blog for progress.

Anyway during the development I came to see some limitations in the RELOAD specification (which is still not an RFC). Most of them were solved by the P2PSIP Working Group and there is still few that are waiting for discussions and decisions by the WG. Hopefully all will be solved before the final release. But there was one specific problem that required a somewhat different treatment, and this is the subject of this blog entry.

RELOAD is, among other things, a Storage protocol. Any user of a network of RELOAD servers (an overlay) can store data in it, and these data are automatically replicated and available to any other user. Because the overlay is designed to be secure even in presence of users with less than nice goals, only the user that stored a piece of data can modify it. The rules that are used to decide who can or cannot write or modify a piece of data in an overlay are grouped into what is called an Access Control Policy. There is four different Access Control Policies defined in the RELOAD specification, and the intent was that these four policies would cover most of the future needs. And even if there is a way to add new Access Control Policies, only a limited number would be defined.

Unfortunately, it turns out that there is a need for more than the four policies already existing. After a survey of all the existing proposals for new types of data to be stored in an overlay (this is called an Usage), I discovered that more than 50% of the new Usage are requiring a new Access Control Policy. In my opinion that creates a problem that could kill the usefulness of RELOAD before it even start to be deployed.

Let’s say that I start a new overlay and that I distribute my software to hundred of thousand of users, each of them using the overlay to store their data in the way that was defined in this first version. Everything works fine until I decide to introduce a new feature that require a new Access Control Policy. The problem is that it is not only the users that will use this new feature that will have to upgrade their copy of the software. No, to be able to even start deploying this new feature, I will have to wait that ALL the users upgrade the software. If the story of IE6 teaches us anything, it is that it will never happen. And the problem is even worse if the software used comes from different vendors.

So the proposal I made to the P2PSIP Working Group is to automatically distribute the code of a new Access Control Policy, without having to upgrade the software. This way instead of waiting months or years to deploy a new feature, it will take only 24 hours or so to have all the users ready to store the new data.

Obviously this code has to be portable so it can be executed on any language used to write the RELOAD implementations in an overlay. So I chose JavaScript (or more precisely ECMAScript) to do that – not because I like the language (JavaScript is the rubber band that hold the Web together, and I do not mean that in a nice way) but because, thanks to the current Web Browsers war, there is very good implementations available.

I am presenting this concept at IETF 80 in Prague on March 31th. You can read the Internet-Draft or the slides for the presentation if you cannot attend.

Jarc: Junit errors as compilation errors

The jarc tool supports running Junit tests as part of a jar build since version 0.2.5. Since then when a unit test fails during a jar build the output of the jarc tool looked like this:

$ jarc reload
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.fail(Assert.java:98)
at org.implementers.nio.channels.reload.ReloadNodeTest.testInit(ReloadNodeTest.java:150)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)

The problem with this output is that it is not well integrated with other tools. Gvim for example cannot parse this output to display the current junit test file and line and worst, the output is lost so I had to remember the line number to be able to find where the test failed.
One way to fix this could have been to change the error format string in Gvim to parse the output, but that would not helped for other tools. So instead of this, jarc v0.2.20 is now formatting Junit errors with the same format than Java compilation errors, so the error above looks like this now:

$ jarc reload
../src/share/classes/org/implementers/nio/channels/reload/ReloadNodeTest.java:150: java.lang.AssertionError:

Now Gvim and other tools can parse the output and display directly the line that failed.

Sharpening the axe

“If I had six hours to chop down a tree, I’d spend the first four hours sharpening the axe – Abraham Lincoln”

Developing good software requires to have the right tools and a good part of a job of a developer is to find what are the best tools for the job. It takes years of research and experimentation to assemble the right toolset. Obviously it starts with the operating system – my quest started with DOS, then OS/2 (the first OS I used that had multithreading) then Windows NT until daily BSOD made it unbearable. I switched for a few months to MacOS (when it was still non preemptive and so daily BSOD were simply replaced by daily freezes) to finally settle with Linux. That was back in 1999 and I never looked back – the number of available tools for developers was simply overwhelming.

In a way writing software on Windows is like working on your motorcycle in your living room. You certainly can, but it’s probably a better idea to do it in a workshop equipped with the right tools, most of them freely modifiable to fit your needs, and that exactly what Linux is. To push the analogy a little bit further, developing on MacOS would be like working on your motorcycle in an hotel suite.

Sure it was not an easy switch – I was still addicted to using IDE, which were simply nonexistent on Linux at the time, so I tried Emacs (I still have the GNU Emacs book) but then definitively switched to gvim. I also tried various Window managers – Gnome at the beginning, Enlightenment for a long time until I settled with Awesome, which give me a fast and minimal window system that can be totally controlled from the keyboard, like gvim (the mouse, like IDEs, are fine when learning. After some time you outgrew then and just want to be more productive, and they simply get on the way).

One of the most fantastic tool offered by the Debian GNU/Linux distribution was the packaging system. I still think that this is one of the most important tool I use today. Packaging software is a concern since a long time for me, and I started doing packaging back when I was developing for the AS/400, in the beginning of the 90s. The reason to do it at the time is exactly the same than now: Controlling the environment that is used to run the software I develop. I would say that half of the bugs found in a software comes from an environment that is different from the environment used by the developer during the testing (The famous “but it works on my computer”). Forcing the deployment of the software in a controlled environment cut down on the quantity of testing that need to be done and at the end on the number of bugs. All the teams I managed since 2000 had to follow this rule: Software does not exist until it is packaged.

Source control is another very important tool – after some painful experience with SourceSafe (a special hell should be reserved to whoever thought that locking source was a good idea), I used CVS, then Subversion, and for the last two years Git – which alone would justify switching to Linux – and with Gerrit as my current favorite peer review tool.

But all of this is merely choosing the right tool, not sharpening it.

Sometimes who cannot find the tool that you need. One solution is to write it from scratch because there is nothing like it (like I did for jarc), but that’s possible with any operating system. Often there is an existing tool that is close to what you need, but not enough to be useful. This is where using FOSS pays off, because now what a developer can do is to modify an existing tool so it fits its own needs – and this is what I really call sharpening the axe. Few years ago I worked intensively on multiple implementation of the TURN protocol, so my team wrote a Wireshark dissector for TURN. Now at Stonyfish were are working on an implementation of RELOAD, so we wrote a Wireshark RELOAD dissector, so we are distributing it as a plugin, so it can help other RELOAD implementers:

Wireshark dissector for RELOAD

Jarc: Support for servlets

I have been busy during all 2010 with Stonyfish (a start-up I founded with a former colleague from 8×8), but I did not stop improving the jarc tool. The most important new feature is the ability of generating war files (packaged servlets):

A war file can be created by adding an X-Jarc-Web-App attribute in the manifest header that point to the servlet descriptor file. e.g.:

Manifest-Version: 1.0
X-Jarc-Web-App: org/implementers/servlet/http/web.xml

The syntax of the web.xml file has also been extended to permit to list file and jar files that need to be copied into the WEB-INF/classes directory:

  <lib copy="true">file2.jar</lib>

The latest version (0.2.19) also run the junit tests with the assertions enabled and fixes a lot of bugs found during 2010.

See the previous blog entries on this subject to find how to install the jarc tool. as always the source code is available under an Affero GPL 3 license.

Application developers and DNS TTL

During the technical plenary of the 73th IETF meeting in Minneapolis MN, Dave Thaler made the interesting point that most DNS resolver API do not return the TTL of the resource resolved, e.g. an IP address. At the time he was proposing a modification of the existing APIs, and that made me thinking since.

The problem is that programmers generally think that resolving a domain name and then storing the IP address(es) resulting of this resolution is a good idea, as this technique could yield a better responsiveness for an application. But doing that without having a mechanism to know that the result of the resolution is invalid creates an even bigger problem. During my time at 8×8, I worked on the problem of designing a scalable Internet service, and my design was focused on having the scalability driven by the client (in other words: the whole load balancers idea is evil). Such techniques are very cheap and effective, but only if the client strictly obeys the TTL value received in the DNS response (I still have two pending patent applications about this techniques). Another well known problem is documented in RFC 4472 section 8.2, as keeping an IP address for too long prevents renumbering an IPv6 network, but there is plenty of other cases.

So the idea of passing the TTL together with the result of the DNS query seems like a good one, until you realize that in fact what developer have now to do is to implement a DNS cache in their application, and every evidence shows that this is not a simple task. As can be seen by the number of security vulnerabilities found during the years, even people who do read the RFC seem to have an hard time doing it right. Internet could probably do without another wave of DNS cache implemented incorrectly.

So in my opinion, adding the TTL to the API is not the solution – it will just exchange one problem with another. The correct solution is to do the resolution each time the resource is needed and do not store the result at all. If performances are too much impacted (after scientifically measuring them, we are between professionals here) then using an external DNS cache will fix the problem. The DNS cache can be in your network (for example having two DNS caches per data center), can be on each server (dnscache from the djbdns suite is easy to install and configure and has a good security track), or even directly in your application (for example dnsjava contains such a cache).

Client certificates for a distributed development team

It’s that time of the decade again: I am building an environment for the development of a new Internet service. The previous time was at 8×8 and the environment was designed by trials and errors over a period of 6 years, between 2001 and 2006. Because I am now rebuilding an environment from scratch, I can spend a little bit of time fixing some of the issues that were not easy to fix in an existing environment.

One of the basic principle of the environment at 8×8 and of the new one is that they are designed to permit people to work remotely. My very first team at 8×8 had some developers in France, some in Canada and the remaining in the West coast of the USA so there was no other choice than build a distributed environment. I know that most startups try to keep developers as close as possible but having worked in both type of environment, I am now convinced that this setting gives a false impression of been efficient. For one this kind of setting increases the risk of micro-management, which I irrevocably dislike (that’s a story for another time, but escaping micro-management was one of the reasons I moved to the USA in the first place). But a distributed environment has one nice side effect which at the end makes all the difference between a good service and the usual crap that most people mistake for good engineering: documentation. Shouting explanations over the wall of a cubicle is probably appealing for lazy people, but it does not make a product or service better. Been forced to write down stuff (in an email, in a wiki, in an IRC channel…) is probably a little more work but it represents an immediate gain for the product. A recent Slashdot question was seeking an advice about seating arrangement for a team of developers. My advice would be simple: Be sure that there is at least a distance of 5 miles between each desk.

Working in a distributed environment probably means a VPN for most people. It happens that I do not like VPN very much, mostly for two reasons. Firstly VPN are generally proprietary products that work only on Windows and although my developers are free to use whatever OS and tools they prefer, Windows and MacOS are banned from my computers (or only running in a virtual machine, for testing purpose. VMWare is the condom of the Internet). The second reason is that having the developers working inside a secure environment does not make them good at handling security issues. If a developer is not capable of making her computers secure, how can she develop secure software? So the solution we used at 8×8 was SSH tunnels. That was working fine, but it is still a kind of VPN and it is not easy to deploy under Windows. The solution for this new environment is to use client certificates.

Client certificates have the advantage of working everywhere SSL/TLS is working and it is a great improvement from using passwords. Passwords are generally too easy to guess and when they are good, people reuse them on multiple websites, websites that can leak the good passwords. Client certificates does not have this issue as the key is not stored in the website and they are not guessable. The certificate itself must be protected by a password but as this password does not have to leave the computer it can be a good, unique password that never change so people do not have to write it down.

The first step to install a client certificate infrastructure is to install a server certificate. The easiest way is to buy one from the registrar of the domain (I use a domain name for development that is different from the corporate domain name. Code name does not have to change but product names and even company names can and will). The configuration in Apache is as follow:

SSLEngine on
SSLProtocol all -SSLv2
SSLCertificateFile /etc/ssl/private/server.crt
SSLCertificateKeyFile /etc/ssl/private/server.key

Using a real (i.e. not self-signed) server certificate is easier because we do not have to distribute the files for the CA for each developer.

The next step is to create a CA for the client certificates:

$ openssl genrsa -out ca.key 1024
$ openssl req -new -key ca.key -out ca.csr
$ openssl x509 -req -days 365 -in ca.csr -signkey ca.key -out ca.crt

and install it in Apache:

SSLCACertificateFile /etc/ssl/private/ca.crt
SSLVerifyClient require

After this, each developer can create a Certificate Signing Request and send it to the administrator of the server:

$ openssl genrsa -out client.key 1024
$ openssl req -new -key client.key -out client.csr

The administrator checks the request and then creates the client certificate and send it back to the developer:

$ openssl req -noout -text -in client.csr
$ openssl x509 -req -days 365 -CA /etc/ssl/private/ca.crt -CAkey /etc/ssl/private/ca.key -CAcreateserial -in client.csr -out client.crt

The developer can then use the certificate directly (e.g. with wget or Debian repositories) or can convert it to a PKCS#12 file that can be imported in a web browser:

$ openssl pkcs12 -export -clcerts -in client.crt -inkey client.key -out client.p12