For more information on Shannon entropy, see the article on Wolfram MathWorld.

In physics, the word entropy has important physical implications as the amount of “disorder” of a system. In mathematics, a more abstract definition is used. The (Shannon) entropy of a variable X is defined as

H(X) = −Σᵢ p(xᵢ) log₂ p(xᵢ)

bits, where p(xᵢ) is the probability that X is in the state xᵢ, and p log₂ p is defined as 0 if p = 0. The joint entropy of variables X₁, …, Xₙ is then defined by

H(X₁, …, Xₙ) = −Σ p(x₁, …, xₙ) log₂ p(x₁, …, xₙ)
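As a quick worked example (my own numbers, not from the original post), take a variable with three states of probability 1/2, 1/4, and 1/4:

```latex
H(X) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4}
     = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = 1.5 \text{ bits}
```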

It is actually pretty simple to calculate this in Java. First, count the occurrences of each value; then compute the entropy from the resulting frequencies.

public static Double calculateShannonEntropy(List<String> values) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    // count the occurrences of each value
    for (String sequence : values) {
        if (!map.containsKey(sequence)) {
            map.put(sequence, 0);
        }
        map.put(sequence, map.get(sequence) + 1);
    }
    // calculate the entropy
    Double result = 0.0;
    for (String sequence : map.keySet()) {
        Double frequency = (double) map.get(sequence) / values.size();
        result -= frequency * (Math.log(frequency) / Math.log(2));
    }
    return result;
}
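As a quick sanity check, the method can be exercised on small lists with known entropy (the wrapper class `EntropyDemo` and the example values here are mine, added for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntropyDemo {

    // the same method as in the post
    public static Double calculateShannonEntropy(List<String> values) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String sequence : values) {
            if (!map.containsKey(sequence)) {
                map.put(sequence, 0);
            }
            map.put(sequence, map.get(sequence) + 1);
        }
        Double result = 0.0;
        for (String sequence : map.keySet()) {
            Double frequency = (double) map.get(sequence) / values.size();
            result -= frequency * (Math.log(frequency) / Math.log(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // two equally likely symbols -> exactly 1 bit
        System.out.println(calculateShannonEntropy(Arrays.asList("a", "b", "a", "b"))); // 1.0
        // distribution (1/2, 1/4, 1/4) -> 1.5 bits
        System.out.println(calculateShannonEntropy(Arrays.asList("a", "a", "b", "c"))); // ≈ 1.5
    }
}
```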


This was a nice post. I need similar code for an implementation of Kullback-Leibler divergence in Java.
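For the Kullback-Leibler request, a minimal sketch in the same spirit as the entropy code above. The class name, and the choice to represent both distributions as probability maps over a shared alphabet (with Q nonzero wherever P is), are my own assumptions, not something from the original post:

```java
import java.util.Map;

public class KLDivergenceSketch {

    // D(P || Q) = sum over i of p_i * log2(p_i / q_i).
    // Assumes both maps cover the same outcomes and q_i > 0 wherever p_i > 0;
    // terms with p_i == 0 contribute 0 by convention.
    public static double klDivergence(Map<String, Double> p, Map<String, Double> q) {
        double result = 0.0;
        for (Map.Entry<String, Double> entry : p.entrySet()) {
            double pi = entry.getValue();
            if (pi == 0.0) {
                continue;
            }
            double qi = q.get(entry.getKey());
            result += pi * (Math.log(pi / qi) / Math.log(2));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> p = Map.of("a", 0.5, "b", 0.5);
        Map<String, Double> q = Map.of("a", 0.25, "b", 0.75);
        System.out.println(klDivergence(p, q)); // positive: P and Q differ
        System.out.println(klDivergence(p, p)); // 0.0: a distribution does not diverge from itself
    }
}
```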

Do we need to do the `-=` every time?

result -= frequency * (Math.log(frequency) / Math.log(2));

shall we do as follows?

public static Double calculateShannonEntropy(List<String> values)
{
    // ... count occurrences into `map` and initialize `result` as above ...
    for (String sequence : map.keySet()) {
        Double frequency = (double) map.get(sequence) / values.size();
        result += frequency * (Math.log(frequency) / Math.log(2));
    }
    return -result;
}

Please ignore my comment.

no problem ;)

you didn’t do summation.

it should be:

result -= result + frequency * (Math.log(frequency) / Math.log(2));

Can’t we factor out the “/ log(2)” to save instructions in the loop?

Indeed we can :)
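For illustration, hoisting that constant might look like the sketch below (the class and field names are mine; as a later comment notes, a JIT compiler will usually perform this kind of hoisting on its own):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntropyFactored {

    // Math.log(2) hoisted out of the loop into a constant
    private static final double LOG2 = Math.log(2);

    public static double calculateShannonEntropy(List<String> values) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String sequence : values) {
            Integer count = map.get(sequence);
            map.put(sequence, count == null ? 1 : count + 1);
        }
        double result = 0.0;
        for (Integer count : map.values()) {
            double frequency = (double) count / values.size();
            result -= frequency * Math.log(frequency) / LOG2;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(calculateShannonEntropy(Arrays.asList("a", "b", "a", "b"))); // 1.0
    }
}
```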

Would you please help me with the full implementation?

Nice, but you can generify the algorithm to be able to throw just about everything into it:

public static double entropy(List<? extends Comparable<?>> values) {
    final Map<Comparable<?>, Long> valueOccurances = new HashMap<Comparable<?>, Long>();
    for (Comparable<?> value : values) {
        Long valueOccurance = valueOccurances.get(value);
        valueOccurances.put(value, valueOccurance == null ? 1L : ++valueOccurance);
    }
    double combinedEntropy = 0.0d;
    for (Comparable<?> value : valueOccurances.keySet()) {
        double entropy = valueOccurances.get(value) / (double) values.size();
        combinedEntropy += entropy * (Math.log(entropy) / Math.log(2));
    }
    return -combinedEntropy;
}

Optimization points:

- A `.containsKey(…)` is essentially the same as a `.get(…) != null`, except that `.containsKey(…)` has to support `null` as a key and is thus more complex (expensive). By definition, `null` makes no sense in an entropy analysis algorithm, so we might as well just make one call altogether.
- Rather than adding 0 (for the initial item), retrieving, and then adding, we can just add 1.
- Avoid boxing by using primitive types when possible (`null` not being a possibility).
- There’s no reason to factor out things like `Math.log(2)`, as one of the comments suggests, since this is basic compiler low-hanging fruit.
- Consider passing an initial capacity argument to `HashMap`’s constructor to avoid rehashing when the alphabet size is known (e.g. all letters from A–Z).

Ah, the blog ate angle braces… proper code here: http://pastebin.com/1SC7PYvM
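In case the pastebin link rots, here is a sketch along the same lines, with the angle braces restored. It is my own reconstruction under the assumptions above, not necessarily identical to the pastebin contents, and it accepts any value type with sensible `equals()`/`hashCode()`:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericEntropy {

    // Works for any value type, not just String.
    public static double entropy(List<?> values) {
        final Map<Object, Long> valueOccurances = new HashMap<Object, Long>();
        for (Object value : values) {
            Long valueOccurance = valueOccurances.get(value);
            valueOccurances.put(value, valueOccurance == null ? 1L : valueOccurance + 1L);
        }
        double combinedEntropy = 0.0d;
        for (Long occurances : valueOccurances.values()) {
            double p = occurances / (double) values.size();
            combinedEntropy += p * (Math.log(p) / Math.log(2));
        }
        return -combinedEntropy;
    }

    public static void main(String[] args) {
        System.out.println(entropy(Arrays.asList(1, 2, 1, 2))); // 1.0
        System.out.println(entropy(Arrays.asList("a", "a", "a"))); // constant data carries no information (entropy 0)
    }
}
```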

