A better analogy in Oracle vs Google: API is jargon

As a software developer I know that in Oracle vs Google, Google and Judge Alsup are right and Oracle and the White House are wrong. But perhaps we need a better analogy to explain to non-developers that APIs are not copyrightable.

A word of caution: I am definitively not a lawyer, and you will not find even the beginning of a legal argument there. This is just an attempt of explaining by analogy what I think APIs are, not only as someone who make a living from using them, implementing them and when forced to do so, inventing new one, but also as someone who will greatly suffer if the minefield that is software development is made even more dangerous if API are deemed copyrightable.

When I read some of the arguments from Oracle and in particular the book chapter titles analogy, I felt it contrived and only superficially explaining what APIs are – the analogy breaks down quickly when you try to look at it more deeply. So I came up with something a little bit better, which is that an API is a jargon.

Jargon, as defined by The Concise Oxford Dictionary is “words or expressions used by a particular profession or group that are difficult for others to understand.” Note that it is not a set of new words that are specific to a group or profession, but words that are used with a different meaning for the purpose of improving communication of ideas inside that group or profession.

As an example taken from the Computing group or profession, the word “protocol” as a very specific meaning. In the same dictionary it has this separate definition: “Computing; a set of rules governing the exchange or transmission of data between devices.” Note also that even inside a group or profession, sub-groups can develop their own jargon, making it difficult to communicate to each other, but greatly improving it inside these sub-groups: The IETF and the ITU can be a good example of that.

An API is very close to that: Ordinary words or arrangements of words (expressions) that have a different meaning. If we take the java.net package as an example, each of the word used (can be verbs or nouns, etc…) have a very different sense than is used generally: Socket.isConnected() does not mean to test that an electrical socket has a plug into it, it means something different in this jargon (namely that an identifier is associated with some state resulting of a successful protocol exchange).

And if we want to push the analogy a bit further, one can imagine a dictionary that explain each of the words and expressions used by a jargon (The New Hacker’s Dictionary is a good example of that). The definition of each word or expression would be copyrightable, and is equivalent to the implementation of an API. Someone can write a different dictionary for the same jargon, but could not copy word for word the definitions for the first one – that would be the equivalent of Google, taking the jargon known as the standard Java library (developed by Sun and many others through the JCP), and writing a new implementation.

As I said there is not a legal argument there, but the question that can now be asked is this: Although it is perfectly clear that in a jargon dictionary the text of the definitions is copyrighted, is the list of words and expressions defined by this dictionary also copyrighted? I believe that the same answer is applicable to APIs.

Improving standard compliance with transclusion

Inserting fragments of a standard specification – IETF RFC or other – as comments in the source code that implement it seems to be a simple way to assure a good conformance. Unfortunately doing so can create legal issues if not done carefully.

These days I am spending a lot of time implementing IETF’s (and other SDO’s) protocols for the Nephelion Project. I am no stranger to network protocol implementation, as this is mostly what I am doing since more than 25 years, but this time the very specific code that is needed for this project is required to be as close as possible to the standard. So I am constantly referring to the text inside the various RFCs to verify that my code is conformant. Obviously copying the text fragments as comments would greatly simplify the development and at the end make the translation between the English text that is used to describe the protocol and the programming language I use to implement it a lot more faithful.

At this point I should insert the usual IANAL but, at least to my understanding, that is something that is simply not possible. My intent is to someday release this code under a Free Software license, but even if it was not the case, I believe that all software should be built with the goal of licensing it in the future, this license being a commercial one or a FOSS one. The issue here is that the RFCs are copyrighted and that modifying is simply not permitted by the IETF Trust and, in my opinion, rightly so as a standard that anybody can freely modify is not much of a standard. But publishing my code under a FOSS license would give everyone the right to modify it (under the terms of the license), and that would apply too to the RFC fragments inserted in the source code.

So the solution I use to at the same time keep the specification and the implementation as close as possible and to not have to worry about code licensing is to use transclusion. Here’s an example of comment in the code source for the UDP module:

% @transclude file:///home/petithug/rsync/ietf/rfc/rfc768.txt#line=48,51

The syntax follows the Javadoc (and Pldoc, and Scaladoc) syntax. The @transclude tag indicates that the text referenced by the URL must be inserted in the source code but only when displayed in a text editor. Here’s what the same code looks like when loaded in VIM (the fragment for RFC 768 is reproduced here under fair use):

% @transclude file:///home/petithug/rsync/ietf/rfc/rfc768.txt#line=48,51
% {@transcluded
% Source Port is an optional field, when meaningful, it indicates the port
% of the sending process, and may be assumed to be the port to which a
% reply should be addressed in the absence of any other information. If
% not used, a value of zero is inserted.
% @@@}

(I chose this example because, until few days ago, I did not even know that using a UDP source port of 0 was conformant).

The @transcluded inline tag is dynamically generated by a VIM plugin but this tag will never appear anywhere else than in the VIM buffer, even after saving it to the disk. The fragment syntax is from RFC 5147, and permits to select the lines that must be copied (An RFC will never changes, so hardcoding the line number in the code source cannot break in the future).

The plugin can be installed from my Debian repository with the usual “apt-get install vim-transclusion”. The plugin is kind of rough for now: only the #line=<from>,<to> syntax is supported, hardcoding the full path is not very friendly, curl is required, etc… But that is still a huge improvement over having to keep specification and implementation separate.

An Undecidable Problem in SIP

A few years back, one of my colleague at 8×8 made an interesting suggestion. We were at the time discussing a recurring problem of SIP call loops in the Packet8 service and his suggestion was to write a program that would analyze all the various forwarding rules installed in the system, and simply remove those that were the cause for the loops. I wish I remember what I responded at the time and that I had the insight to say that writing such program is, well, impossible, but that’s probably not what happened.

Now “impossible” is a very strong word and I must admit that I have spent most of my career in computers thinking that nothing was impossible to code and that, worst case, I just needed a better computer. It just happens that there is a whole class of problems that are impossible to code – not just difficult to code, or not that the best possible code will take forever to return an answer without using a quantum computer, but that it is impossible to write a program that always return correct answers for these problems. The SIP call loop problem is one of them.

To make sense of this we will need to define a lot of concepts, so lets start with a SIP call. SIP is, for better or for worse, the major session establishment protocol for VoIP. A SIP client (e.g. a phone or a PSTN gateway) establishes a call more or less like how a Web browser contacts a web site, with the difference that in the case of SIP the relationship with the server lasts for the duration of the call. One of the (deeply broken) features of SIP is that there may exist intermediate network elements called SIP Proxies that can, during the establishment of a call, redirect the call to a new destination. In this case the server, instead of answering the call itself, creates a new connection to a different destination which can itself creates a new connection and so on. This mechanism obviously can create loops, especially when the forwarding rules in these servers are under the control of the end-user – which was the problem we encountered at 8×8. There is many mechanisms a SIP proxy can use to decide if, when, and where to redirect a call, but for the sake of simplicity we will consider only a subset of all these possibilities, and assume that all the SIP proxies involved use the Call Processing Language (CPL), a standard XML-based language designed to permit end-users to control, among other things, how a SIP proxy forwards calls on their behalf. Jive’s users can think of a CPL script as equivalent to the dialplan editor, but for just one user and with the additional constraint that it is not possible to create loops inside a CPL script.

Here an example of CPL script taken from the standard (RFC 3880) that will forward incoming calls to the user’s voicemail if she does not answer the calls on her computer:

<cpl xmlns="urn:ietf:params:xml:ns:cpl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:ietf:params:xml:ns:cpl cpl.xsd">
  <incoming>
    <location url="sip:jones@jonespc.example.com">
      <proxy>
        <redirection>
          <redirect/>
        </redirection>
        <default>
          <location url="sip:jones@voicemail.example.com">
            <proxy/>
          </location>
        </default>
      </proxy>
    </location>
  </incoming>
</cpl>

One interesting feature of CPL is that it is extensible, so even if only a subset of all the capabilities of a standard compliant SIP proxy can be implemented using baseline CPL, it is possible to add extensions to be able to use any legal feature of the SIP standard. As an example, the following CPL script (also taken from RFC 3880) rejects calls from specific callers by using such an extension:

<cpl xmlns="urn:ietf:params:xml:ns:cpl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:ietf:params:xml:ns:cpl cpl.xsd">
  <incoming>
    <address-switch field="origin" subfield="user" xmlns:re="http://www.example.com/regex">
      <address re:regex="(.*.smith|.*.jones)">
        <reject status="reject" reason="I don't want to talk to Smiths or Joneses"/>
      </address>
    </address-switch>
  </incoming>
</cpl>

For the rest of this discussion, we will consider that we can only use CPL extensions that are legal in a standard SIP proxy.

Now that we established the perimeter of our problem, we can formally formulate the question as follows: If we consider a VoIP system made only of endpoints (phones) and SIP proxies that run CPL scripts, and with the complete knowledge of all the (legal) CPL extensions in use, is it possible to write a program that takes as input all these scripts and an initial call destination (known as a SIP URI) and that can tell us if the resulting call will loop or not?

That seems simple enough: Try to build an oriented graph of all the forwarding rules, and if the graph is not a directed acyclic graph, then it is looping.

Let’s say that my former colleague took up the challenge and wrote this program. Using Java, he would have written something like this:

boolean isLooping(Map<URI, Document> configuration, URI initial) {
  // some clever code here...
  }

The “configuration” parameter carries the whole configuration of our SIP system as a list of mappings between an Address-Of-Record (AOR, the identifier for a user) and the CPL script that is run for this user. The “initial” parameter represents the initial destination for a call (which may contain the AOR of a user). The Java function returns true if a call to this user will loop, or false if the call is guaranteed to reach someone or something, other then the originator of the call, without looping.

Now let’s change the subject for a little bit and talk about something called the Cyclic Tag System (CTS). CTS is a mechanism to create new bit strings from an initial bit string and an ordered list of bit strings called productions, using the following rules:

1. Remove the leftmost bit of the current bit string.
2. If the removed bit was equal to 1 then concatenate the next production in the list to the current bit string (restarting at the first production when all are used).
3. Continue at rule (1) unless the current bit string is empty.

The following example from Wikipedia uses an initial bit string of 11001 and a list of productions of { 010, 000, 1111 }. Following the rules above, we generate the following bit strings:

11001
1001010
001010000
01010000
1010000
010000000
10000000
...

Simple enough.

Now, let’s say that we want to implement CTS in SIP. The CPL language is not powerful enough for that but we can define a very simple extension (still conforming to the SIP standard) that permits to copy a character substring into a parameter for the next destination of a call, e.g.:

<location m:url="sip:c.org;p={destination.string:2:10}" />

Here we extract the “string” parameter from the original destination and copy ten characters starting in second position into the new destination. With this new extension, we can now write three CPL scripts that will implement the CTS example above:

Script attached to user `sip:p1@jive.com`:

<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p2@jive.com;bitstring={destination.bitstring:1}010” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p2@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

Script attached to user `sip:p2@jive.com`:


<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p3@jive.com;bitstring={destination.bitstring:1}000” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p3@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

Script attached to user `sip:p3@jive.com`:

<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p1@jive.com;bitstring={destination.bitstring:1}1111” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p1@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

On our Java function isLooping(), these three scripts would go on the first parameter (indexed by the address of the proxy) and the initial parameter would contain “`sip:p1@jive.com;bitstring=11001`”.

It is trivial to write a program that can generate the CPL scripts needed to implement any list of productions. No need to even write a program, a simple XSLT stylesheet can do the job.

The reason we implemented CTS in SIP is that in 2000, Matthew Cook published a proof that CTS is Turing-Complete. Turing-Complete is a fancy expression that means that it is computationally equivalent to a computer, which basically means that any computation that can be done on a computer can also be done by the CTS system – obviously doing it multiple orders of magnitude slower than a computer, but both a computer (or a quantum computer, or a Turing machine, or lambda calculus, or any other Turing-Complete system) and the CTS system can compute exactly the same things. Because only a Turing-Complete system can simulate another Turing-Complete system, by showing that we can implement any CTS production list in SIP we just proved that SIP is equivalent to a computer. In turn that means that it is possible to convert any existing program unto a list of CPL scripts (with our small extension) and an initial SIP URI. So knowing this and knowing that the Java bytecode is also Turing-Complete (it is trivial to implement CTS in Java), we now can write this new function:


Map<URI, Document> convert(Runnable runnable) {
  // complex, but definitively implementable
  }

The function takes a Java class (i.e. a list of bytecodes) as parameter and returns a list of { SIP URI, CPL scripts } mappings as result. The class to convert implements Runnable so we have a unique entry point in it (i.e. its run() method), an entry point that by convention becomes the SIP URI on the first mapping returned by the function.

Now that we have these two basic functions, let’s write a complete Java class that is using them:

class Paradox implements Runnable {
  boolean isLooping(Map<URI, Document> configuration, URI initial) {
    // some clever code here
    }

  Map<URI, Document> convert(Runnable runnable) {
    // complex, but definitively implementable
    }

  public void run() {
    Map<URI, Document> program = convert(this);
    if (!isLooping(program, program.keySet().iterator().next())) {
      while (true);
      }
    }
  }

What we are doing here is converting the whole class into a SIP proxy configuration, and passing it to the code that can predict if this program will loop or not. Then we do something a bit tricky with the result, which is to immediately exit if we detect that it will loop. But because this is exactly the code that was under the scrutiny of the isLooping() implementation, it should have returned false, which should have executed the “while (true);” code. But executing the “while (true);” code means that the code is looping, which again contradicts the result we have.

Both cases are impossible which can have only one explanation: The isLooping() implementation is unable to find if this program loops or not. Thus, we have shown that it is possible to write a Java program that loops but cannot detect that it is looping. And because we previously proved that SIP with CPL is Turing-Complete, we know that if it is possible to write such a program in Java, it is theoretically possible to do so in SIP configuration. Because of this, we now know that it is impossible to write a program that can reliably predict if a SIP configuration loops or not. More precisely, for any implementation of isLooping(), it is always possible to find an instance of the “configuration” and “initial” parameters for which this version of isLooping() will return an incorrect response.

Now that we have the answer to our question, let’s have a better look at what really happen in a VoIP system. A SIP call never really loop forever (well, unless the PSTN is involved, but that’s a different story), because there is a mechanism to prevent that. Each call contains a counter (Max-Forwards) that is decremented when it traverses a SIP proxy, and the call ends when this counter reaches zero. Max-Forwards is a little like CPU quota. It does not improve the quality of a program; rather it just prevents it from making things worse. Putting aside the fact that we are in the business of establishing communications between people, not finding fancy ways to prevent that, the Turing-Completeness of SIP still gets in the way as, for the same reasons than before, it is also impossible to write a program that will reliably predict if a call will fail because the Max-Forwards value will reach zero.

This is a sad state of affairs that the only way to reliably predict a possible failure is to let this failure happen, but the point of this article is that this is not really the fault of the programmer, but just the consequence of the limitations of computation in this universe.

Note that at least some SIP systems are not necessarily Turing-Complete – here we had to add an extension to our very limited system based on CPL to make it Turing-Complete. But it is far easier to prove that something is Turing-Complete than proving it is not, so even if I could not find a way to prove that a pure CPL system is Turing-Complete, that does not prove it is not. But worse, we saw that there is not much difference between a Turing-Complete system and one that is not, so even a small and seemingly irrelevant modification to a system can make it Turing-Complete. So basically writing a program that tries to reach definitive conclusions on the behavior of a system moderately complex – like SIP – is an exercise in futility.

Many thanks to Matt Ryan for his review of this article.

Jarc: Generated service

It is not permitted for an annotation processor to modify a Java source file, so a processor willing to add code to an existing class is left with only two solutions (if we exclude method instrumentation): Generating a superclass or generating a subclass.

Generating a superclass has the advantage that the constructors of the annotated class can be used directly. Let say that we have a annotation processor that is designed to help implement class composition, as described in Effective Java, item #16. Instead of writing the whole ForwardingSet class, an annotation processor could generate it automatically from this code fragment:

@Forwarding(ForwardingSet.class)
public class InstrumentedSet<E> extends ForwardingSet<E> {
  InstrumentedSet(Set<E> s) {
    super(new HashSet<>());
  }

But generating a superclass is not always possible. For example let’s imagine an annotation processor that generates the JMX boilerplate necessary to export attributes. An existing class with such annotation could look something like this:

public class MyData {
  @Attribute int counter;
  }

In this case the processor for the @Attribute annotation will generate a JMX interface (Let’s call it MyDataMXBean) that declares the getcounter and setcounter methods, and a class extending MyData and implementing the JMX interface (let’s call it MyDataImpl).

The code generated would take care of the boring stuff, like synchronization and so on, which is certainly an improvement over writing and maintaining it. But the problem with subclasses is that we do not know the name of the class that was generated. Note that we have no other choice than to know the name of the superclass because we have to inherit from it. For subclasses it is better to let the processor choose the name, but now we need a way to be able to instantiate the generated class without knowing this name (in our example to register it in the MBean server).

The obvious way of doing this is to use a ServiceLoader. We can add a factory method in the MyData class to instantiate the generated class, something like this:

static MyData newInstance() {
  return ServiceLoader.load(MyData.class).next();
  }

But for this technique to work, we need to describe the service in the jar file. Using the method explained in a previous post does not help in this case, because we still do not know what will be the name of the generated class.

The version 0.2.30 of jarc provides a solution to this problem. This new version contains a new annotation, @Service, that can be used to annotate a generated class. A processor integrated in jarc will read this annotation at compile time, and automatically generates the service entry in the built jar file, as if an X-Jarc-Service attribute has been added to the manifest file. This works because this processor will be invoked after the @Attribute processor, and so knows the name of the class that has been generated. Here is for example the code fragment that the code generator would have generated for MyDataImpl:

@Service(MyData.class)
@MXBean
class MyDataImpl extends MyData implements MyDataMXBean {

Note that classes used as services require an empty constructor and that can be a problem if the class it extend does not have an empty constructor itself. The solution in this case is to define an additional factory class as the service.

First we define our factory as an abstract class:

abstract class Factory {
  MyData newInstance(int init);
  }

We adjust our factory method accordingly:

static MyData newInstance(int init) {
  return ServiceLoader.load(Factory.class).next().newInstance(init);
  }

The @Attribute processor must generate an additional class that extends the factory class and this is the class which is declared as a service:

@MXBean
class MyDataImpl extends MyData implements MyDataMXBean {
  @Service(Factory.class)
  static class FactoryImpl extends Factory {
  MyDataImpl newInstance(int init) {
    return new MyDataImpl(init);
    }
  }

  MyDataImpl(init init) {
    super(init);
  }

Decrypting SSL sessions in Wireshark

Debugging secure communication programs is no fun. One thing I learned during all these years developing VoIP applications is to never, ever trust the logs to tell the truth, and especially the one I put myself in the code. The truth is on the wire, so capturing the packets and been able to analyze them, for example with Wireshark, is one of the most important tool that a communication developer can use. But the need for secure communications makes this tool difficult to use.

As said before, the first solution would be to display the packets directly in the logs before they are encrypted or after they are decrypted but as this is probably the same idiot who wrote the code to be debugged that will write the code to generate the logs, there is little chance to have something useful as a result.

Another solution is to have a switch that permits to use the software under test in a debug mode which does not require to encrypt and decrypt the communications. A first problem is that it increases the probability of forgetting to turn the switch off after debugging or to use the wrong setting in production. Another problem is that the secure and unsecure modes may very well use different code paths, and so may behave differently. And lastly some communication protocols, especially the modern one, do not have an unsecure setting (for example RELOAD and RTCWEB), and for very good reasons.

A slightly better solution that can reduce the difference between code paths and work with secure-only protocols is to use a null cipher but for security reasons both sides must agree beforehand to use a null cipher. That in fact probably increases the probability that someone forget to switch off the null cipher after a test.

So the only remaining solution is to somehow decrypt the packets in Wireshark. The standard way of doing that is to install the private RSA in Wireshark, which immediately creates even more problems:

  • This cannot be used to debug programs in production, because the sysadmin will never accept to give the private key.
  • This does not work if the private key is stored in an hardware token, like a smartcard as, by design, it is impossible to read the private key from these devices.
  • Even with the private key, the SSL modes that can be used are limited. For example a Diffie–Hellman key exchange cannot be decrypted by Wireshark.

Fortunately since version 1.6, Wireshark can use SSL session keys instead of the private key. Session keys are keys that are created from the private key but that are used only for a specific session, and disclosing them does not disclose the private key. This solves most of the problems listed above:

  • Sysadmins can disclose only the session key related to a specific session.
  • Session keys are available even if the private key is stored in an hardware token.
  • Sessions keys are the result of, e.g., a Diffie–Hellman key exchange, so there is no need to restrict the SSL modes for debugging.

Now we just need to have the program under test storing the session keys somewhere so Wireshark can use them. For example the next version of NSS (the security module used by Firefox, Thunderbird and Chrome) will have an environment variable that can be used to generate a file that can be directly used by Wireshark (see this link for more details).

Adding the support for this format in Java requires to maintain a modified build of Java, which can be inconvenient. A simpler solution is to process the output of the -Djavax.net.debug=ssl,keygen debug option. The just uploaded Debian package named keygen2keylog contains a program that does this for you. After installation, start Wireshark in capture mode with the name of the SSL session key file that will be generated as parameter, something like this:

$ wireshark -o ssl.keylog_file:mykeys.log -k -i any -f "host implementers.org"

(Remember that you do not need to run Wireshark under root to capture packets if you run the following command after each update of the package: sudo setcap ‘CAP_NET_RAW+eip CAP_NET_ADMIN+eip’ /usr/bin/dumpcap)

Then you just need to pipe the debug output of your Java program to keygen2keylog to see the packets been decrypted in Wireshark, e.g.:

$ java -Djavax.net.debug=ssl,keygen -jar mycode.jar | keygen2keylog mykeys.log

And the beauty of this technique is that the packets are decrypted as they are captured.

Jarc: Annotation processors

Version 0.2.27 of jarc now supports to run annotations processors when a jar file is built. The syntax is simple, just add a X-Jarc-Processors attribute in the manifest file header with a list of jar files, and jarc will automatically run the processors found by querying the META-INF/services/javax.annotation.processing.Processor file inside the jar files:

Manifest-Version: 1.0
Main-Class: org.implementers.apps.turnuri.Main
X-Jarc-Target: 1.7
X-Jarc-Processors: /usr/share/lib/processor.jar

Name: org/implementers/apps/turnuri/Main.class

In turn, it is easy to build an annotation processor with jarc, here’s the manifest file for a processor I am currently developing:

Manifest-Version: 1.0
Class-Path: /usr/share/java/jcip.jar
X-Jarc-Target: 1.6

Name: org/implementers/management/annotations/processing/Main.class
X-Jarc-Service: javax.annotation.processing.Processor

Note that the Java compiler does not run processors on dependent files, so you need to add a “Name:” attribute for all Java files that need to be processed.

Also starting with this version, the JDK 1.7 is required to run jarc – but jarc can still cross-compile for all the versions of Java starting with 1.5.

Jarc: Now running Java 1.8

I like to learn new Java features before they are officially released, and that requires using unstable builds. The difficulty is to integrate the new compiler into a build – for the JDK 1.7 I released jarc as an experimental package, but that was not a very good solution.

Since version 0.2.26, jarc can use an experimental compiler, like the one supporting lambda. If you installed the new JDK at /home/petithug/jdk8, you will only need to add the following lines to the /etc/jarc.conf file to be able to build jar files that use closures:

jdk-java_8-openjdk=/home/petithug/jdk8/bin/java
jdk-tools_8-openjdk=/home/petithug/jdk8/lib/tools.jar
canonical_8=1.8-openjdk
canonical_1.8=1.8-openjdk
canonical_8-openjdk=1.8-openjdk
canonical_1.8-openjdk=1.8-openjdk
jre-check_1.8-openjdk=/home/petithug/jdk8/jre/bin/java
jre-bootclasspath_1.8-openjdk=/home/petithug/jdk8/jre/lib/rt.jar:/home/petithug/jdk8/jre/lib/jce.jar
jre-source_1.8-openjdk=1.8
jre-exec_1.8-openjdk=/home/petithug/jdk8/jre/bin/java

Jarc always use by default the most recent compiler, but you can override this with the -Jjdk=7 or -Jjdk=6 option.

The new version of jarc also support passing parameters to the JVM – either at build time or at run time – by using the -J option.

Finally it is now possible to add an X-Jarc-Debug parameter at the manifest level. This option works just like the -g option in javac. I added this option to be able to build programs for aparapi – more about this in a future post.

RELOAD: Configuration and Enrollment

The libreload-java package version 0.2.0 was released few minutes ago. Because this version starts to have bits and pieces related to RELOAD itself, the access control policy script tester that was explained in a previous post is now in a separate package named libreload-java-dev. There is nothing new in this package as I am waiting for a new version of draft-knauf-p2psip-share to update it.

The libreload-java package itself now contains the APIs required to build a RELOAD PKI. The package does not contains the configuration and enrollment servers themselves, as there is many different implementations possible, from command line tools to a deployable servlets. The best way to understand how this API can be used is to walk through the various components of a complete RELOAD PKI:

1. The enrollment server

The enrollment server provides X.509 certificates to RELOAD node (clients or peers). It acts as the Certificate Authority (CA) for a whole RELOAD overlay without requiring the service of a top-level CA. That means that anybody can deploy and manage its own RELOAD overlay (note that the enrollment server itself requires an X.509 certificate signed by a CA for the HTTPS connection, but it is independent from the PKI function itself). Creating the CA private key and certificate is simple, for example with openssl:

$ openssl genrsa -out ca.rsa 2048
$ openssl pkcs8 -in ca.rsa -out ca.key -topk8 -nocrypt
$ openssl req -new -key ca.key -out ca.csr
$ openssl x509 -req -days 365 -in ca.csr -signkey ca.key -out ca.crt

Obviously it is very important to keep the secret key, hmm, secret so using a physical token like TPM or a smartcard can be a good idea.

The second thing that the enrollment server must do is to restrict the creation of certificates to people authorized to do so. The simplest way is to issue login/passwords, but there is obviously other ways.

The enrollment server will receive a certificate request (formatted with PKCS#10), and will send back a X.509 certificate containing one or more URLs each containing a Node-IDs. The RELOAD API provides a static method to build a valid Node-ID that can be used like this:

NodeId nodeId = NodeId.newInstance(16, new SecureRandom());
new URI("reload", nodeId.toUserInfo(), "example.org", -1, "/", null, null);

The resulting URI looks something like this:

reload://0110534f8a94ec317834a9bb0d66bd3ae708@example.org/

That’s the only API required for the enrollment server. All the other stuff – certificate request parsing, certificate generation, can be easily done by using a standard library like Bouncycastle.

Note that an enrollment server is not required when self signed certificates are used.

2. The configuration server

The configuration server provides configurations files, that are XML documents that contain all the information required to join a specific overlay. A configuration file is represented in the API as a Configuration object, which is immutable. A Configuration object can be created either by parsing an existing document (using the parse static method) or by creating one from scratch by using the Builder helper class. The Builder helper class currently supports only the subset needed for implementing a PKI, i.e. it permits to set the CA certificate, the URL to the enrollment server and the signer of the configuration document. The API can be used like this to generate an initial configuration document:

Configuration.Builder builder = new Configuration.Builder("example.org");
builder.rootCertificates().add(caCert);
builder.enrollmentServers().add(new URI("https://example.org/enrollment"));
builder.kindSigners().add(nodeId);
builder.bootstrapNodes().add(new InetSocketAddress("10.1.1.0", 6084));
Configuration configuration = builder.build(signer, signerPrivateKey);
byte[] config = configuration.toByteArray();

The signerPrivateKey object is the private key that was created for the signer certificate, and the signer object is an instance of the SignerIdentity class that was created from the same certificate:

SignerIdentity signer = SignerIdentity.identities(certificate).get(0);

Subsequent versions of the configuration document for a specific overlay will need to be signed by the same signer.

It is easy to create a new version of a configuration document by using the API. For example to change the no-ice value and resign the document:

conf = new Configuration.Builder(conf).noIce(true).build(signer, signerPrivateKey);

The sequence number will automatically increase by one (modulo 65535) in the new document.

Note that this API can change in future versions. It will be frozen only in version 1.0.0, which cannot happen until the RELOAD specification is approved by the IESG for publication.

Jarc: Javadoc generation

During the preparation of the libreload-java package, I discovered that the javadoc tool – the JDK tool used to generate an HTML documentation of a Java API – does not generate the javadoc based on Java dependency. It can either generate the javadoc for whole packages, or for a subset of Java files (e.g. File*.java). But we cannot ask it to generate the documentation for one specific class and all the classes depending on this class.

I guess that my way of doing things is different: I do not have one source tree per project, but one source tree that contains the code for all of them (and much, much more code that is not released). A particular Debian package contains only a subset of the code available in my source tree – the code needed for this particular package. One way to see this is that a package is a view on my source repository. Ant or javadoc cannot easily extract only the sources files needed for a particular view (unless I list all the files individually, which would be unmaintainable), so the jarc tool relies on Java dependency.

When I tried to build the javadoc for the libreload-java package, I noticed that the test units files (used by Junit) were part of the javadoc, although they clearly are not part of the API. This is because the javadoc tool had to process the whole directory. I cannot move these two files in another directory, because they need to have access to constructors and methods that are not public.

So what I did is that I integrated the javadoc tool inside jarc. Because jarc knows the list of dependent Java files during a jar file build, I just have to feed it to the javadoc tool to have the exact API documentation corresponding to the jar file built. The new version of jarc (0.2.22) supports a new parameter (-javadoc ) that can is used to request the generation of the javadoc at the same time the package is build and unit tested.

One can notice that the Debian source packages that are distributed do not contain the whole tree, but only the subset of the files needed to rebuild this package. To do this (I obviously did not want to release my whole source tree), I had to rely on a trick. My hard disk is mounted with the strictatime options, which means that each time a file is accessed, the data/time of this access is kept. When I build a Debian package I start to build it from my whole source repository:

$ touch run-stamp
$ SRC=../src dpkg-buildpackage -D -b -us -uc

Then I run this command to copy only the files that where used for this build into a new directory:

$ find ../src -anewer run-stamp -type f -exec cp --parents -t src {} ;

I can then build again the packages, but this time with a source package that contains only these files (and that can be used to eventually rebuild the binary packages):

$ SRC=src dpkg-buildpackage -I

Note the difference in the $SRC variable content: ../src is my complete source repository and src is just the subset for this package.

RELOAD: Access control policy script tester

There was some interest at the last IETF meeting in my presentation of Access Control Policy script, so I spent some time working in improving this. The first step was to release a new version of the I-D. To encourage adoption of this draft, I also released a library permitting to develop and test new access control scripts without having a complete implementation of RELOAD. This was released as a Debian package named libreload-java in my Debian/Ubuntu repository. This package is also a reference implementation for the I-D but, as it contains a subset of the RELOAD library I am current developing, it can also be considered as an early release of this library. As always, this is released under an Affero GPL 3 license, and the code source is available.

To explain how to develop a new access control script, I will now go through the example which is also in the libreload-java package:

import static org.implementers.net.reload.AccessControlPolicyTester.kind;
import static org.implementers.net.reload.DataModel.Standard.*;

These two lines permit to simplify the code by importing the kind() static method and the available Data Models.

private static final long KIND_ID = 4026531841l;

Here we declare a constant defining a new Kind-Id in the private range.

AccessControlPolicyTester tester = new AccessControlPolicyTester("myusername",
  kind(KIND_ID, SINGLE,
    "String.prototype['bytes'] = function() {n" +
    "  var bytes = [];n" +
    "    for (var i = 0; i < this.length; i++) {n" +
    "      bytes[i] = this.charCodeAt(i);n" +
    "    }n" +
    "    return bytes;n" +
    "};n" +
    "return resource.equalsHash(signature.user_name.bytes());n"));

This code creates a new instance of the tester by passing a user name (that will be used to generate an X509 certificate) and a new kind. The new kind uses the Kind-ID declared previously, uses the SINGLE Data Model and will use an Access Control Policy as described in the ECMAScript code. Note that more calls to the kind() method can be added to the constructor of the tester.

Map<Long, DataModel<?>> kinds = new HashMap<Long, DataModel<?>>();

This code declares a variable that will contains the data items meant to be stored. The key of the map is a long that contains a Kind-Id. The value is either an instance of the DataModel.Single class, the DataModel.Array class or the DataModel.Dictionnary if the Data Model is respectively SINGLE, ARRAY or DICTIONARY. In our case, we use the SINGLE Data Model, meaning that only one item can be stored at a specific Resource-ID for this specific Kind-ID.

kinds.put(KIND_ID, new DataModel.Single(KIND_ID, 0l, new Value.NotExists(System.currentTimeMillis(), 60, tester.signer(), tester.privateKey())));

This piece of code does multiple things, so let’s decompose:

First it retrieves the signer (an Identity instance that consist of a X509 Certificate, a user name and a Node-ID) and the private key that was used for the certificate of the signer. These two objects are automatically created by the constructor of the tester object.

Then it creates a new Value instance with a store_time equal to the current time, a lifetime of one minute, an exists value of FALSE and the identity and private key just retrieved. This is the object that we want to store (it is more or less equivalent to StoredData).

Then a new instance of DataModel.Single is created with our new Kind-ID, a generation_counter of 0 and the value to store (it is more or less equivalent to StoreKindData).

Finally this object is stored in the kinds map, with the same Kind-ID as key.

tester.store(tester.signer().userName().getBytes(UTF_8), 0, kinds)

In this code, we finally try to store the items that we just created. The first parameter is the Resource-Name, which in our case is the user name (converted to a byte array). The second parameter is the replica number, 0 for the original replica. The third parameter is the set of items. The method will return false if one items fails the access control policy verification, and true if all the items pass the verification.

The DataModel.Array and DataModel.Dictionary works in the same way, excepted that DataModel.Array is a sparse array implementing java.util.List and DataModel.Dictionary is implementing java.util.Map.

The program can be compiled and executed with the following commands:

$ javac -classpath /usr/lib/reload/reload.jar Main.java
$ java -classpath usr/lib/reload/reload.jar:. Main

In addition to the jar file and the example, the package contains the Javadoc for all the classes and methods accessible to the tester, and a copy of the I-D.

Update 2011/05/14:

The command lines above do not work. I uploaded a new package (0.1.1) so the following command lines should work now:

$ javac -classpath /usr/lib/reload/reload.jar AcpTest.java
$ java -classpath usr/lib/reload/reload.jar:. AcpTest