I've been implementing a number of entropy encoding schemes. To analyze their performance, I wrote a class that calculates the 'zero-order' information entropy of data. Because the data sets I've been working on are quite large, it needed to be fast. I couldn't find any existing code that suited the task, so I wrote a single Java class that does the job very effectively.
I have released the source code into the public domain: CodingFrequencies.java
Using the class is very simple:
import java.util.Arrays;

//define some data
int[] values = {7, 7, 3, 3, 3, 2, 7};
//analyze its frequencies
CodingFrequencies freqs = CodingFrequencies.fromValues(values);
//outputs: 1.4488156357251847
System.out.println( freqs.binaryEntropy() );
//outputs: 3
System.out.println( freqs.getFrequency(7) );
//outputs: [1, 3, 3]
System.out.println( Arrays.toString(freqs.getFrequencies()) );
The source code is fully documented. For more information see the comments in the supplied code.
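For readers who want to see what the 'zero-order' figure measures, here is a minimal sketch of the calculation. It is not the code from CodingFrequencies.java: the class name `EntropySketch` and its method are made up for illustration, and it uses a plain `HashMap` with no attempt at speed. It counts how often each value occurs, turns the counts into probabilities p, and sums -p * log2(p).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the CodingFrequencies implementation.
class EntropySketch {

    /** Zero-order Shannon entropy of the values, in bits per symbol. */
    static double binaryEntropy(int[] values) {
        if (values.length == 0) {
            return 0.0; // no data, no information
        }

        // Count the occurrences of each distinct value.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : values) {
            counts.merge(v, 1, Integer::sum);
        }

        // Sum -p * log2(p) over the distinct values.
        double n = values.length;
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = count / n;
            entropy -= p * Math.log(p) / Math.log(2);
        }
        return entropy;
    }

    public static void main(String[] args) {
        int[] values = {7, 7, 3, 3, 3, 2, 7};
        System.out.println(binaryEntropy(values));
    }
}
```

For the sample data from the usage example, this should come out at roughly 1.4488 bits per symbol; the last few digits may differ from the class's output because of floating-point rounding.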
Works with byte[] and int[] data.
[...] int values that can be analyzed.
Once a CodingFrequencies class has been constructed, information entropy can be returned in any base with no further computation.
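One way to get entropy in any base without touching the data again is to compute it once in natural-log units and divide by ln(base) on request, since H_b = H_e / ln(b). The sketch below assumes that approach; I am not claiming it is how CodingFrequencies does it internally. The class `CachedEntropy` and its `entropy(double base)` method are hypothetical names, not part of the released class.

```java
// Hypothetical illustration only; not the CodingFrequencies API.
class CachedEntropy {

    // Entropy in natural-log units (nats), computed once in the constructor.
    private final double nats;

    CachedEntropy(int[] frequencies) {
        long total = 0;
        for (int f : frequencies) {
            total += f;
        }

        double h = 0.0;
        for (int f : frequencies) {
            if (f > 0) { // zero counts contribute nothing
                double p = (double) f / total;
                h -= p * Math.log(p);
            }
        }
        this.nats = h;
    }

    /** Entropy in the given base: a single division, no pass over the data. */
    double entropy(double base) {
        return nats / Math.log(base);
    }

    public static void main(String[] args) {
        // Frequencies of the sample data from the usage example.
        CachedEntropy e = new CachedEntropy(new int[] {1, 3, 3});
        System.out.println(e.entropy(2));      // bits per symbol
        System.out.println(e.entropy(Math.E)); // nats per symbol
        System.out.println(e.entropy(256));    // bytes per symbol
    }
}
```

The base-256 figure is the base-2 figure divided by 8, which is handy when estimating compressed size in bytes.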