edu.umbc.cs.maple.utils
Class WekaUtils

java.lang.Object
  extended by edu.umbc.cs.maple.utils.WekaUtils

public class WekaUtils
extends java.lang.Object

Various utility functions for Weka.

Copyright (c) 2008 Eric Eaton

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Version:
0.1
Author:
Eric Eaton (EricEaton@umbc.edu)
University of Maryland Baltimore County

Nested Class Summary
static class WekaUtils.SVMLightLabelFormat
          Defines the format of the SVMLight labels
 
Constructor Summary
WekaUtils()
           
 
Method Summary
static java.lang.String arffToSVMLight(weka.core.Instances data, WekaUtils.SVMLightLabelFormat labelFormat)
          Converts a set of instances to svm-light format.
static java.lang.String arffToSVMLight(weka.core.Instance data, WekaUtils.SVMLightLabelFormat labelFormat)
          Converts a single instance to svm-light format.
static void convertMulticlassToBinary(weka.core.Instances data, java.lang.String positiveClassValue)
          Converts the instances in the given dataset to binary, treating the specified class value as positive.
static boolean equalClassPriors(weka.core.Instances data)
          Determines whether a data set has equal class priors.
static double[] getClassValues(weka.core.Instances data)
          Gets the class values for a set of instances.
static int[] getLabels(weka.core.Instances data)
          Gets the class labels for a set of instances.
static double[] getWeights(weka.core.Instances data)
          Gets the weights of each instance in a dataset as an array.
static double[][] instancesToDoubleArrays(weka.core.Instances instances)
          Converts a set of instances to an array of feature vectors.
static double[] instanceToDoubleArray(weka.core.Instance instance)
          Converts an instance to a feature vector excluding the class attribute.
static weka.core.Instances mergeInstances(weka.core.Instances instances1, weka.core.Instances instances2)
          Merge two instance sets.
static int[] predictClasses(weka.classifiers.Classifier model, weka.core.Instances data)
          Uses the given model to predict the classes of the data.
static weka.core.Instances subsetInstances(weka.core.Instances instances, int startIdx, int numInstancesToRetrieve)
          Extract a particular subset of the instances.
static weka.core.Instances trimInstances(weka.core.Instances instances, double percentage)
          Take a certain percentage of a set of instances.
static weka.core.Instances trimInstances(weka.core.Instances instances, int numInstances)
          Take a certain number of a set of instances.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WekaUtils

public WekaUtils()

Method Detail

trimInstances

public static weka.core.Instances trimInstances(weka.core.Instances instances,
                                                double percentage)
Take a certain percentage of a set of instances.

Parameters:
instances - the set of instances to trim
percentage - the percentage of instances to keep
Returns:
a reduced set of instances according to the given percentage

trimInstances

public static weka.core.Instances trimInstances(weka.core.Instances instances,
                                                int numInstances)
Take a certain number of a set of instances.

Parameters:
instances - the set of instances to trim
numInstances - the number of instances to keep
Returns:
a reduced set of instances according to the given number to keep

subsetInstances

public static weka.core.Instances subsetInstances(weka.core.Instances instances,
                                                  int startIdx,
                                                  int numInstancesToRetrieve)
Extract a particular subset of the instances.

Parameters:
instances - the set of instances to take the subset from
startIdx - the start instance index
numInstancesToRetrieve - the number of instances to retrieve
Returns:
the specified subset of the instances.
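
The start-index-plus-count contract described above can be illustrated with a plain-Java sketch that does not depend on Weka. The class and method names (SubsetSketch, subset) are hypothetical, rows are modeled as double[] vectors rather than weka.core.Instance objects, and the bounds check is an assumption; this is not the WekaUtils implementation.

```java
final class SubsetSketch {
    /**
     * Returns count rows starting at startIdx.
     * The row references are copied, not the row contents (shallow copy).
     */
    static double[][] subset(double[][] rows, int startIdx, int count) {
        // Assumed behavior: reject ranges that fall outside the data.
        if (startIdx < 0 || count < 0 || startIdx + count > rows.length) {
            throw new IndexOutOfBoundsException(
                "range [" + startIdx + ", " + (startIdx + count) + ") exceeds " + rows.length + " rows");
        }
        double[][] out = new double[count][];
        System.arraycopy(rows, startIdx, out, 0, count);
        return out;
    }
}
```

Under this reading, a call with startIdx = 1 and count = 2 on four rows yields the second and third rows.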

mergeInstances

public static weka.core.Instances mergeInstances(weka.core.Instances instances1,
                                                 weka.core.Instances instances2)
Merge two instance sets.

Parameters:
instances1 - the first set of instances
instances2 - the second set of instances
Returns:
the merged set of instances

instanceToDoubleArray

public static double[] instanceToDoubleArray(weka.core.Instance instance)
Converts an instance to a feature vector excluding the class attribute.

Parameters:
instance - The instance.
Returns:
A vector representation of the instance excluding the class attribute
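
As a rough illustration of what "excluding the class attribute" means, the following plain-Java sketch drops one position from an array of attribute values. It is independent of Weka; the names FeatureVectorSketch and dropClassValue, and the explicit classIndex parameter, are hypothetical and are not part of WekaUtils.

```java
final class FeatureVectorSketch {
    /**
     * Copies every attribute value except the one at classIndex,
     * preserving the order of the remaining values.
     */
    static double[] dropClassValue(double[] values, int classIndex) {
        double[] features = new double[values.length - 1];
        // Values before the class attribute keep their positions.
        System.arraycopy(values, 0, features, 0, classIndex);
        // Values after the class attribute shift left by one.
        System.arraycopy(values, classIndex + 1, features, classIndex,
                         values.length - classIndex - 1);
        return features;
    }
}
```

The resulting vector has one entry fewer than the instance has attributes.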

instancesToDoubleArrays

public static double[][] instancesToDoubleArrays(weka.core.Instances instances)
Converts a set of instances to an array of feature vectors.

Parameters:
instances - The set of instances.
Returns:
The array of feature vectors.

predictClasses

public static int[] predictClasses(weka.classifiers.Classifier model,
                                   weka.core.Instances data)
Uses the given model to predict the classes of the data.

Parameters:
model - the classifier used to make the predictions
data - the instances to classify
Returns:
An array of the class predictions.

getLabels

public static int[] getLabels(weka.core.Instances data)
Gets the class labels for a set of instances.

Parameters:
data - the set of instances
Returns:
a vector of the class labels for the data set, with one entry per instance

getClassValues

public static double[] getClassValues(weka.core.Instances data)
Gets the class values for a set of instances.

Parameters:
data - the set of instances
Returns:
a vector of the class values for the data set, with one entry per instance

convertMulticlassToBinary

public static void convertMulticlassToBinary(weka.core.Instances data,
                                             java.lang.String positiveClassValue)
Converts the instances in the given dataset to binary, treating the specified class value as positive. Note that this method is destructive: it modifies the contents of data directly.

Parameters:
data - the multiclass dataset to be converted to binary.
positiveClassValue - the class value to treat as positive.
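
The in-place (destructive) relabeling described above can be sketched in plain Java without Weka. Everything here is an assumption made for illustration: the names BinaryLabelSketch and toBinaryInPlace are hypothetical, labels are modeled as a String array, and the output label names "positive" and "negative" are placeholders, since this page does not say how WekaUtils names the resulting classes.

```java
final class BinaryLabelSketch {
    /**
     * Overwrites each label in place: labels equal to positiveClassValue
     * become "positive", all others become "negative".
     * The caller's array is modified; no copy is made.
     */
    static void toBinaryInPlace(String[] labels, String positiveClassValue) {
        for (int i = 0; i < labels.length; i++) {
            labels[i] = labels[i].equals(positiveClassValue) ? "positive" : "negative";
        }
    }
}
```

Because the array is modified directly, a caller that still needs the multiclass labels would have to copy them before the call.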

equalClassPriors

public static boolean equalClassPriors(weka.core.Instances data)
Determines whether a data set has equal class priors.

Parameters:
data - the data set to check
Returns:
whether the data set has equal class priors
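
One plausible reading of "equal class priors" is that every class occurs equally often in the data set. The plain-Java sketch below checks that condition on an array of class indices. It is independent of Weka, the names ClassPriorSketch and hasEqualClassCounts are hypothetical, and the exact criterion used by WekaUtils is not stated on this page.

```java
final class ClassPriorSketch {
    /**
     * Returns true if each of the numClasses classes appears the same
     * number of times in labels. Labels are class indices in [0, numClasses).
     */
    static boolean hasEqualClassCounts(int[] labels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int label : labels) {
            counts[label]++;
        }
        // Compare every class count against the count of class 0.
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] != counts[0]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that under this reading a class with no instances at all makes the priors unequal unless every class is empty.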

getWeights

public static double[] getWeights(weka.core.Instances data)
Gets the weights of each instance in a dataset as an array.

Parameters:
data - the dataset of instances
Returns:
the weights of the instances as an array.

arffToSVMLight

public static java.lang.String arffToSVMLight(weka.core.Instances data,
                                              WekaUtils.SVMLightLabelFormat labelFormat)
Converts a set of instances to svm-light format.

Parameters:
data - the weka instances
labelFormat - the format of the svm-light labels
Returns:
the weka instances in svm-light format
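
For readers unfamiliar with the target format: an svm-light line consists of a target value followed by index:value pairs with 1-based feature indices in increasing order, and zero-valued features may be omitted. The plain-Java sketch below builds one such line for a binary label. It is independent of Weka; the names SvmLightSketch and toSvmLightLine are hypothetical, the mapping of every non-positive target to -1 is an assumption, and the sketch does not model the SVMLightLabelFormat options.

```java
final class SvmLightSketch {
    /**
     * Formats one example as "<target> <index>:<value> ...".
     * Feature indices are 1-based and zero-valued features are skipped.
     */
    static String toSvmLightLine(int target, double[] features) {
        StringBuilder sb = new StringBuilder();
        // Assumption: any non-positive target is written as -1.
        sb.append(target > 0 ? "+1" : "-1");
        for (int i = 0; i < features.length; i++) {
            if (features[i] != 0.0) {
                sb.append(' ').append(i + 1).append(':').append(features[i]);
            }
        }
        return sb.toString();
    }
}
```

For example, a positive example with features {0.5, 0.0, 2.0} is written as "+1 1:0.5 3:2.0".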

arffToSVMLight

public static java.lang.String arffToSVMLight(weka.core.Instance data,
                                              WekaUtils.SVMLightLabelFormat labelFormat)
Converts a single instance to svm-light format.

Parameters:
data - the weka instance
labelFormat - the format of the svm-light labels
Returns:
the weka instance in svm-light format