Applied machine learning developers have many open-source machine learning frameworks to choose from, e.g., scikit-learn, Spark MLlib, and ML.NET. These frameworks provide a user-friendly, high-level interface to the algorithms, but their implementations resort to low-level languages and optimizations. Such low-level C/C++, C#, or Java code is far from the original mathematical notation that is preferable for machine learning research and development. The Semagle framework makes the most of low-level C#-like constructs for performance optimization and of high-level, semi-mathematical F# notation for joining the algorithm blocks. Modularization of algorithms into fine-grained blocks makes research and development of new implementations for the same family of problems straightforward.

The Semagle framework includes the following principal F# libraries:

Metrics Library

The Semagle.MachineLearning.Metrics library implements performance evaluation metrics for classification, regression, and information retrieval:

Accuracy is used for two-class and one-class classification problems.

\[accuracy = \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{predicted_i = expected_i}\]
let accuracy (expected: 'Y[]) (predicted: 'Y[]) =
    ...
Classification.accuracy test_y predict_y
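
For reference, a minimal sketch of the accuracy computation that follows the formula above; the library's actual implementation may differ:

let accuracy (expected: 'Y[]) (predicted: 'Y[]) =
    // count the samples where the prediction matches the expectation
    let correct = Array.fold2 (fun n e p -> if e = p then n + 1 else n) 0 expected predicted
    float correct / float expected.Length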

Recall is used for classification and sequence labeling problems.

\[recall = \frac{\sum_{i=1}^n \mathbb{1}_{predicted_i=expected_i=+1}}{\sum_{i=1}^n \mathbb{1}_{expected_i=+1}}\]
let recall (expected: 'Y[]) (predicted: 'Y[]) =
    ...
Classification.recall test_y predict_y

Precision is used for classification and sequence labeling problems.

\[precision = \frac{\sum_{i=1}^n \mathbb{1}_{predicted_i=expected_i=+1}}{\sum_{i=1}^n \mathbb{1}_{predicted_i=+1}}\]
let precision (expected: 'Y[]) (predicted: 'Y[]) =
    ...
Classification.precision test_y predict_y

F1 is used for classification and sequence labeling problems.

\[F_1 = 2 \times \frac{recall \times precision}{recall + precision}\]
let f1 (expected: 'Y[]) (predicted: 'Y[]) =
    ...
Classification.f1 test_y predict_y
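
Recall, precision, and F1 share the same counting logic. A minimal sketch for float32 +1/-1 labels that follows the three formulas above; the library's generic implementations may differ:

let counts (expected: float32[]) (predicted: float32[]) =
    let count f = Array.fold2 (fun n e p -> if f e p then n + 1 else n) 0 expected predicted
    // true positives, false positives, false negatives
    count (fun e p -> e = 1.0f && p = 1.0f),
    count (fun e p -> e <> 1.0f && p = 1.0f),
    count (fun e p -> e = 1.0f && p <> 1.0f)

let recall expected predicted =
    let tp, _, fn = counts expected predicted
    float tp / float (tp + fn)

let precision expected predicted =
    let tp, fp, _ = counts expected predicted
    float tp / float (tp + fp)

let f1 expected predicted =
    let r, p = recall expected predicted, precision expected predicted
    2.0 * (r * p) / (r + p)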

Mean squared error (MSE) is used for regression problems.

\[MSE = \frac{1}{n}\sum_{i=1}^n (predicted_i - expected_i)^2\]
let mse (expected: float32[]) (predicted: float32[]) =
    ...
Regression.mse test_y predict_y
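
A minimal sketch that follows the MSE formula above; again, the shipped implementation may differ:

let mse (expected: float32[]) (predicted: float32[]) =
    let sse = Array.fold2 (fun sum e p -> let d = float (p - e) in sum + d * d) 0.0 expected predicted
    sse / float expected.Length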

Vectors Library

The Semagle.Numerics.Vectors library provides the DenseVector and SparseVector types. These data types implement the operations required by standard SVM kernels. SparseVector is also the basic data type for the Structured SVM implementation.

DenseVector is constructed from an array of float32 values. This type of vector stores both zero and non-zero values and is well suited for feature vectors with many non-zero elements.

let a = DenseVector([| 1.0f; 3.0f; -3.0f; 4.0f; 8.0f |])

SparseVector is constructed from an array of int indices of the non-zero elements and an array of their float32 values. This type of vector is well suited for feature vectors with few non-zero elements.

let a = SparseVector([|0; 1; 3; 5|], [|1.0f; -2.0f; 4.0f; -6.0f|])

DenseVector and SparseVector support the following operations (a short usage sketch follows the list):

  • indexing a.[1] and slicing a.[1..3], a.[..3], a.[2..], but the SparseVector implementations of these operations are more expensive because they require a binary search to find item indices;
  • element-wise addition a + b, subtraction a - b, multiplication a * b, and division a / b;
  • unary negation -a and multiplication a * 1.5f and division a / 2.0f by a scalar;
  • dot product a .* b and Euclidean distance a ||-|| b.
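
A short usage sketch of these operations; the vectors are assumed to have equal lengths:

let a = DenseVector([| 1.0f; 3.0f; -3.0f; 4.0f; 8.0f |])
let b = DenseVector([| 2.0f; 0.0f; 1.0f; -1.0f; 3.0f |])
let s = a + b        // element-wise addition
let m = a * 1.5f     // multiplication by a scalar
let d = a .* b       // dot product
let e = a ||-|| b    // Euclidean distance
let t = a.[1..3]     // slicing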

SVM Library

The Semagle.MachineLearning.SVM library provides a high-level interface for classification and regression support vector machines and a low-level interface for sequential minimal optimization [1, 2]. All interfaces are data-type agnostic and are parameterized by kernel functions only:

type Kernel<'X> = 'X -> 'X -> float
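
For illustration, standard linear and RBF kernels might be written against this type as follows. This is a sketch, not the library's shipped kernels, and it assumes that the (.* ) and (||-||) operators of Semagle.Numerics.Vectors return float32:

/// Linear kernel: the dot product of the two feature vectors
let linear : Kernel<SparseVector> = fun x y -> float (x .* y)

/// RBF kernel: exp(-gamma * ||x - y||^2)
let rbf (gamma : float) : Kernel<DenseVector> =
    fun x y ->
        let d = float (x ||-|| y)
        exp (-gamma * d * d)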

A high-level interface includes the following models:

  • TwoClass for two-class classification;
  • OneClass for two-class classification with training on positive samples only;
  • MultiClass for multi-class classification;
  • Regression for real-valued regression.
type SVM<'X,'Y> =
    /// SVM model for two class classification
    | TwoClass of Kernel<'X> * ('X[]) * (float[]) * float
    /// SVM model for one class classification
    | OneClass of Kernel<'X> * ('X[]) * (float[]) * float
    /// SVM model for regression
    | Regression of Kernel<'X> * ('X[]) * (float[]) * float
    /// SVM model for multi-class classification
    | MultiClass of Kernel<'X> * (('Y * 'Y * ('X[]) * (float[]) * float)[])

Prediction functions for different problems are similar:

module TwoClass =
    let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...
module OneClass =
    let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...
module Regression =
    let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...
module MultiClass =
    let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...

They are all based on a weighted sum of kernel function values over the support vectors:

module SVM =
    let inline predict (x : 'X) (K : Kernel<'X>) (X : 'X[]) (A : float[]) (b : float) =
        Array.fold2 (fun sum x_i a_i -> sum + a_i * (float (K x_i x))) b X A

The matching problem solvers show how to apply the generic sequential minimal optimization solver to specific problems:

The two-class and multi-class problem solvers build classification models for data type X. Labels for two-class classifiers are +1/-1; for multi-class, they have an arbitrary data type Y:

/// Optimization parameters for C_SVC problem
type C_SVC = {
    /// The penalty for +1 class instances
    C_p : float;
    /// The penalty for -1 class instances
    C_n : float;
}

/// Two class C Support Vector Classification (SVC) problem solver
let C_SVC (X : 'X[]) (Y : float32[]) (K : Kernel<'X>) (parameters : C_SVC) (options : OptimizationOptions) = ...

/// Multi-class C Support Vector Classification (SVC) problem solver
let C_SVC_M (X : 'X[]) (Y : 'Y[]) (K : Kernel<'X>) (parameters : C_SVC) (options : OptimizationOptions) = ...
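
Putting the pieces together, a hypothetical end-to-end run could look as follows; the rbf kernel is the sketch from above, train_x/train_y/test_x/test_y are assumed datasets, and OptimizationOptions.defaults stands for an assumed default configuration:

let K = rbf 0.1
let model = C_SVC train_x train_y K { C_p = 1.0; C_n = 1.0 } OptimizationOptions.defaults
let predict_y = Array.map (TwoClass.predict model) test_x
Classification.accuracy test_y predict_y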

The one-class problem solver builds classification models for data type X with +1/-1 labels from an input dataset containing positive samples only.

/// Optimization parameters for One-Class problem
type OneClass = {
    /// The fraction of support vectors
    nu : float;
}

/// One-Class problem solver
let OneClass (X : 'X[]) (K : Kernel<'X>) (parameters : OneClass) (options : OptimizationOptions) = ...
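
Hypothetical usage, with the kernel and datasets assumed as in the previous sketch; nu bounds the fraction of support vectors:

let model = OneClass train_x K { nu = 0.1 } OptimizationOptions.defaults
let predict_y = Array.map (OneClass.predict model) test_x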

The regression problem solver builds regression models for data type X and float32 labels.

/// Optimization parameters for C_SVR problem
type C_SVR = {
    /// The boundary of the approximated function
    eta : float;
    /// The penalty
    C : float;
}

/// C Support Vector Regression (SVR) problem solver
let C_SVR (X : 'X[]) (Y : float32[]) (K : Kernel<'X>) (parameters : C_SVR) (options : OptimizationOptions) = ...
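
Hypothetical usage; note that Regression.predict refers to the SVM module above, while Regression.mse comes from Semagle.MachineLearning.Metrics:

let model = C_SVR train_x train_y K { eta = 0.1; C = 1.0 } OptimizationOptions.defaults
let predict_y = Array.map (Regression.predict model) test_x
Regression.mse test_y predict_y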

The sequential minimal optimization solver is a low-level building block for the problem-specific solvers above and for the Structured SVM optimization below:

/// General parameters for C_SMO problem
type C_SMO = {
    /// Initial feasible values of optimization variables
    A : float[];
    /// Per-sample penalties
    C : float[];
    /// The linear term of the optimized function
    p: float[];
}

/// Sequential Minimal Optimization (SMO) problem solver
let C_SMO (X : 'X[]) (Y : float32[]) (Q : Q) (parameters : C_SMO) (options : OptimizationOptions) = ...

Structural SVM Library

The Semagle.MachineLearning.SSVM library provides a high-level interface for multi-class classification and sequence labeling, and a low-level cutting-plane solver for the “1-slack” structural SVM formulation [3]. All high-level interfaces require two functions:

  • FeatureFunction creates a feature vector for an arbitrary data type X;
  • LossFunction defines the loss value for a predicted label y and an expected label y' of an arbitrary data type Y.
/// Simple feature function
type FeatureFunction<'X> = 'X -> Vector

/// Structured SVM loss function type
type LossFunction<'Y> = (* y *) 'Y -> (* y' *)'Y -> float
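
For example, the standard 0/1 loss fits the LossFunction<'Y> type; this sketch can be replaced by any task-specific loss:

/// 0/1 loss: 0 for a correct prediction, 1 otherwise
let inline zeroOneLoss (y : 'Y) (y' : 'Y) : float =
    if y = y' then 0.0 else 1.0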

The multi-class classification model is a linear classification model in the combined feature and label space:

/// Structured SVM model for multi-class classification
type MultiClass<'X,'Y> = MultiClass of FeatureFunction<'X> * (* W *) float[] * 'Y[]

module MultiClass =
    /// Optimization parameters for the multi-class structured SVM
    type MultiClass<'Y> = {
         /// Penalty for slack variables
         C : float;
         /// Loss function
         loss : LossFunction<'Y>
    }

    /// Learn structured SVM model for multi-class classification
    let learn (X : 'X[]) (Y : 'Y[]) (F : FeatureFunction<'X>) (parameters : MultiClass<'Y>) (options : OneSlack.OptimizationOptions) = ...

    /// Predict class by multi-class structured model
    let predict (MultiClass(F, W, Y) : MultiClass<'X,'Y>) (x : 'X) : 'Y = ...
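
Hypothetical usage, assuming a feature function features : FeatureFunction<'X>, the zeroOneLoss sketch above, and assumed default optimization options:

let model = MultiClass.learn train_x train_y features { C = 1.0; loss = zeroOneLoss } OneSlack.OptimizationOptions.defaults
let y = MultiClass.predict model x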

The sequence labeling model is a linear classification model in the combined feature and state space:

/// Structured SVM model for sequence labeling
type Sequence<'X,'Y> = Sequence of FeatureFunction<'X> * (* W *) float[] * 'Y[]

module Sequence =
    /// Optimization parameters for the sequence structured SVM
    type Sequence<'Y> = {
        /// Penalty for slack variables
        C : float;
        /// Loss function
        loss : LossFunction<'Y>
    }

    /// Learn structured SVM model for sequence labeling
    let learn (X : 'X[][]) (Y : 'Y[][]) (F : FeatureFunction<'X>) (parameters : Sequence<'Y>) (options : OneSlack.OptimizationOptions) = ...

    /// Predict the label sequence by the sequence structured SVM
    let predict (Sequence(F,W,Y) : Sequence<'X, 'Y>) (X : 'X[]) : 'Y[] = ...
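
Hypothetical usage; here train_X is an array of token sequences and train_Y holds the matching label sequences:

let model = Sequence.learn train_X train_Y features { C = 1.0; loss = zeroOneLoss } OneSlack.OptimizationOptions.defaults
let labels = Sequence.predict model tokens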

The “1-slack” cutting-plane solver is a low-level building block for the problem-specific Structural SVM solvers above. It requires a JointFeatureFunction that maps the $\mathcal{X} \times \mathcal{Y}$ space into a sparse vector space (vectors with predominantly zero elements). As a result of the optimization, the solver returns a weight vector for the sparse vector space features:

/// Joint feature function type
type JointFeatureFunction<'X,'Y> = (* x *) 'X -> (* y *) 'Y -> SparseVector

/// 1-slack optimization problem solver
let oneSlack (X : 'X[]) (Y : 'Y[]) (JF : JointFeatureFunction<'X,'Y>) (parameters : OneSlack<'X,'Y>) (options : OptimizationOptions) = ...
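
As an illustration, the standard multi-class joint feature map places the feature vector of x in the block of the combined space that corresponds to the label y. The sketch below assumes that SparseVector exposes its index and value arrays as Indices and Values, and that dim (features per class), index (label-to-block mapping), and F are supplied by the caller:

let jointFeatures (dim : int) (index : 'Y -> int) (F : 'X -> SparseVector) : JointFeatureFunction<'X,'Y> =
    fun x y ->
        let v = F x
        let offset = dim * (index y)
        // shift the indices of F(x) into the block of label y
        SparseVector(Array.map (fun i -> i + offset) v.Indices, v.Values)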

Check the code, documentation, samples, and API reference for more details. Happy hacking!

References

  1. Chang, C.-C. & Lin, C.-J. LIBSVM: A Library for Support Vector Machines // ACM Trans. Intell. Syst. Technol. v.2, 2011 - p. 1–39.

  2. Fan, R.-E., Chen, P.-H. & Lin, C.-J. Working Set Selection Using Second Order Information for Training Support Vector Machines // J. Mach. Learn. Res. v.6, 2005 - p. 1889–1918.

  3. Joachims, T., Finley, T. & Yu, C.-N. J. Cutting-Plane Training of Structural SVMs // Mach. Learn. v.77, 2009 - p. 27–59.