JMinHEP - data mining framework for high-energy physics
JMinHEP is a framework for clustering analysis, i.e. for unsupervised
learning in which the classification process does not depend on a
priori information. It was designed mainly for the high-energy physics
community, but everyone is welcome to use it.
The program is a pure Java application and includes the
following algorithms:
- K-means clustering analysis (single and multi pass)
- C-means (fuzzy) algorithm
- Agglomerative hierarchical clustering
- more will be included soon
More information can be found on
en.wikipedia.org
or in this tutorial.
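To make the first algorithm in the list concrete, here is a minimal sketch of standard K-means (alternating assignment and update steps) started from user-supplied seed positions. It is an illustration only and not JMinHEP's own code: the class name KMeansSketch and all method names are hypothetical, and the sketch makes no claim about how JMinHEP implements its single-pass or multi-pass variants.

```java
import java.util.Arrays;

/** Minimal K-means sketch (illustrative only; not JMinHEP's implementation). */
final class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Index of the center closest to point p. */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (dist2(p, centers[k]) < dist2(p, centers[best])) best = k;
        }
        return best;
    }

    /**
     * Runs K-means from the given seed positions and returns the cluster
     * index of every point. The centers array is updated in place and
     * holds the final cluster centers on return.
     */
    static int[] cluster(double[][] data, double[][] centers, int maxIter) {
        int[] label = new int[data.length];
        Arrays.fill(label, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest center.
            for (int i = 0; i < data.length; i++) {
                int k = nearest(data[i], centers);
                if (k != label[i]) {
                    label[i] = k;
                    changed = true;
                }
            }
            if (!changed) break; // no point moved: converged
            // Update step: move each center to the mean of its points.
            for (int k = 0; k < centers.length; k++) {
                double[] sum = new double[centers[k].length];
                int n = 0;
                for (int i = 0; i < data.length; i++) {
                    if (label[i] != k) continue;
                    for (int j = 0; j < sum.length; j++) sum[j] += data[i][j];
                    n++;
                }
                if (n == 0) continue; // empty cluster: keep the old center
                for (int j = 0; j < sum.length; j++) centers[k][j] = sum[j] / n;
            }
        }
        return label;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        double[][] seeds = {{0, 0}, {10, 10}};
        int[] label = cluster(data, seeds, 100);
        System.out.println(Arrays.toString(label));     // cluster index per point
        System.out.println(Arrays.deepToString(seeds)); // final centers
    }
}
```

With the two well-separated groups in main, the first three points end up in cluster 0 and the last three in cluster 1.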
The algorithms can run in a fixed-cluster mode or in a best-estimate
mode, in which the number of clusters is not given a priori but
is found from an estimate of the cluster compactness. The data
points can be defined in a multidimensional space. At present, the
distance measure is Euclidean.
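As a rough illustration of the two quantities mentioned above, the sketch below computes the Euclidean distance and one possible compactness measure. The exact compactness definition used by JMinHEP is not stated here, so the sketch assumes the mean squared distance of the points to their assigned cluster centers; the class and method names are hypothetical.

```java
/** Euclidean distance and an assumed compactness measure (illustrative only). */
final class CompactnessSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /**
     * Assumed compactness: mean squared distance of every point to the
     * center of the cluster it is assigned to (label[i] is the cluster
     * index of data[i]). Smaller values mean tighter clusters.
     */
    static double compactness(double[][] data, int[] label, double[][] centers) {
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            double d = euclidean(data[i], centers[label[i]]);
            sum += d * d;
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {2, 0}};
        int[] label = {0, 0};
        double[][] centers = {{1, 0}};
        System.out.println(euclidean(new double[]{0, 0}, new double[]{3, 4}));
        System.out.println(compactness(data, label, centers));
    }
}
```

A measure of this kind generally keeps shrinking as more clusters are added, so a best-estimate search would have to compare it across different cluster counts with some additional criterion, for example the point at which the improvement levels off.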
Download: JMinHEP.tar.gz
Then unpack it (tar -zvxf JMinHEP.tar.gz under Linux/Unix).
This creates a directory jminhep containing the JMinHEP.jar file. You can
run it as usual, i.e. java -jar JMinHEP.jar
The program can be run:
- 1) In GUI mode. In this mode one can set all clustering
parameters via the user interface, and the output is displayed. The data
can be loaded in the Attribute-Relation
File Format (ARFF). The cluster centers and
the seed positions can also be shown.
- 2) In embedded mode, as a library inside your own application.
JMinHEP GUI mode
Just run it as:
java -jar JMinHEP.jar
and load any ARFF file. Some example files can be found here: iris.arff or my.arff.
You can also load the data from the command line:
java -jar JMinHEP.jar iris.arff
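For readers unfamiliar with ARFF, the sketch below shows what a small file looks like and how its numeric rows can be read. An ARFF file consists of header lines (@relation, @attribute) followed by an @data section with comma-separated values; lines starting with % are comments. The toy data and the ArffSketch class are hypothetical, and this is not the reader used by JMinHEP. It handles numeric attributes only, so files with nominal or string attributes need additional handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal reader for the numeric rows of an ARFF file (illustrative only). */
final class ArffSketch {

    /** A tiny two-dimensional example file, built line by line. */
    static final String SAMPLE = String.join("\n",
            "% two-dimensional toy data",
            "@relation toy",
            "@attribute x numeric",
            "@attribute y numeric",
            "@data",
            "1.0,2.0",
            "3.5,4.5");

    /** Returns the rows of the @data section as arrays of doubles. */
    static double[][] readNumericData(String arff) {
        List<double[]> rows = new ArrayList<>();
        boolean inData = false;
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank or comment
            if (!inData) {
                // Header keywords are case-insensitive; skip until @data.
                if (line.equalsIgnoreCase("@data")) inData = true;
                continue;
            }
            String[] fields = line.split(",");
            double[] row = new double[fields.length];
            for (int j = 0; j < fields.length; j++) {
                row[j] = Double.parseDouble(fields[j].trim());
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(readNumericData(SAMPLE)));
    }
}
```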

JMinHEP embedded mode
You can include JMinHEP.jar in your application. Look at the example application located in the
"example" directory. You
need to add JMinHEP.jar to the Java classpath to compile it.
In short, you just need this
statement in your code:
import jminhep.clanalyse.*;
Then load the data into the dataHolder. The Partition class
does the clustering: you can run any clustering algorithm,
depending on the input mode (the correct mode is shown in the status bar of
the GUI). You can access all output information by calling the methods
getName(), getCompactness(), getNclusters(), getCenters() and
getClusterNumber(). The example program runs over all
possible
clustering modes and then prints the final result. Read the API documentation to learn
more about the Partition and
dataHolder classes here.
Note: JMinHEP is not completely free software. Read the
JMinHEP License.
The package is based on the free JFreeChart package
by Object Refinery Limited and Contributors.
JFreeChart is licensed under the terms of the GNU Lesser General
Public Licence (LGPL).
You can send
your algorithm to me for inclusion, provided it follows the coding
standard given by the dataHolder and Partitioner standard.
S.Chekanov