A java implementation for Shannon Entropy

For more information on Shannon Entropy, please check his article on wolfram.

In physics, the word entropy has important physical implications as the amount of “disorder” of a system. In mathematics, a more abstract definition is used. The (Shannon) entropy of a variable X is defined as

 H(X)=-sum_(x)P(x)log_2[P(x)]

bits, where P(x) is the probability that X is in the state x, and Plog_2P is defined as 0 if P=0. The joint entropy of variables X_1, …, X_n is then defined by

  H(X_1,...,X_n)=-sum_(x_1)...sum_(x_n)P(x_1,...,x_n)log_2[P(x_1,...,x_n)].

It is actually pretty simple to calculate this in java. First you need to count the occurrences of each value. Then, you can calculate the entropy of it.


public static Double calculateShannonEntropy(List<String> values) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  // count the occurrences of each value
  for (String sequence : values) {
    if (!map.containsKey(sequence)) {
      map.put(sequence, 0);
    }
    map.put(sequence, map.get(sequence) + 1);
  }

  // calculate the entropy
  Double result = 0.0;
  for (String sequence : map.keySet()) {
    Double frequency = (double) map.get(sequence) / values.size();
    result -= frequency * (Math.log(frequency) / Math.log(2));
  }

  return result;
}

About these ads

13 thoughts on “A java implementation for Shannon Entropy

  1. This was a nice post.I need similar code for implementation of Kullback-leibler divergence in java.

  2. Do we need to – every time?
    result -= frequency * (Math.log(frequency) / Math.log(2));

    shall we do as follows?
    public static Double calculateShannonEntropy(List values)
    {
    for (String sequence : map.keySet()) {

    Double frequency = (double) map.get(sequence) / values.size();
    result += frequency * (Math.log(frequency) / Math.log(2));
    }

    return -result;
    }

  3. you didn’t do summation.

    it should be:

    result -= result + frequency * (Math.log(frequency) / Math.log(2));

  4. Nice, but you can generify the algorithm to be able to throw just about everything into it:

    public static double entropy(List values) {

    final Map valueOccurances = new HashMap();

    for (Comparable value : values) {
    Long valueOccurance = valueOccurances.get(value);
    valueOccurances.put(value, valueOccurance == null ? 1L : ++valueOccurance);
    }

    double combinedEntropy = 0.0d;

    for (Comparable value : valueOccurances.keySet()) {
    double entropy = valueOccurances.get(value) / (double) values.size();
    combinedEntropy += entropy * (Math.log(entropy) / Math.log(2));
    }

    return -combinedEntropy;
    }

    Optimization points: A .containsKey(…) is essentially the same as a .get(…) != null, except for the fact that .containsKey(..) has to support null as key and is thus more complex (expensive). By definition, null makes no sense in an entropy analysis algorithm so we might as well just make one call alltogether. Rather than adding 0 (for initial item), retrieving and then adding; we can just add 1. Avoid boxing by using primitive types when possible (null not a possibility). There’s no reason to factor out stuff like i.e. Math.log(2) as one of the comments says, since this is basic compiler low-hanging fruit. Consider adding an initial capacity argument to be passed to HashMap’s constructor, to avoid rehashing when the alphabet size is known (I.e. all letters from A-Z).

  5. Highlights are accessible via the “Jump To” and other list
    interfaces. It would be almost naive to go to other sites looking for a better selection
    and unless you prefer the heat of Oxford Circus on a warm July day – unlikely – it would be silly of you
    to trudge around the shops looking for a better bargain
    that you’ll get with Amazon. Amazon has achieved great success using this model with their Amazon promotional codes.

  6. You made some decent points there. I looked on the net to
    learn more about the issue and found most people will go along with
    your views on this website.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s