Support Vector Machines (SVMs) are a very popular machine learning method for classification, regression, distribution estimation, etc. An exceptional feature of this method is its ability to handle objects of a diverse nature, as long as a suitable kernel function exists. Nonetheless, popular software libraries like LIBSVM [1] and SVMlight [2] are designed for vector data, and it is hard to adapt them to other object types. An F# implementation seems promising in terms of readability and extensibility.

Readability and extensibility are quite often opposing qualities of a software design: increasing one tends to decrease the other. This F# implementation of SVM attempts to address the following design questions without degrading either readability or extensibility:

• What data structure should represent the SVM model?
• How should kernel functions be implemented?
• How should the decision function be implemented?
• How should the optimization problem solver be implemented?

# SVM model data structure

SVM classification, regression and distribution estimation seem to be different problems, and they could be represented by different data structures, each designed for a specific problem. However, their formulations share common elements, such as the training samples $\{\textbf{x}_i: i=1,l\}$ and the kernel function $K$, and differ only in their decision/approximation functions:

• two-class classification problem:

$sign(\sum_{i=1}^l y_i \alpha_i K(\textbf{x}_i,\textbf{x}) + b)$,

• regression problem:

$\sum_{i=1}^l (-\alpha_i + \alpha_i^*) K(\textbf{x}_i,\textbf{x}) + b$,

• distribution estimation problem:

$sign(\sum_{i=1}^l \alpha_i K(\textbf{x}_i,\textbf{x}) - \rho)$.

where $y_i \in \{-1,+1\}$ is the label of the sample $\textbf{x}_i$, and $\alpha_i \in R$ and $\alpha_i^* \in R$ are solutions of the dual SVM optimization problem [3].

Thus, the model needs to store the set of pairs $\{(\textbf{x}_i, \beta_i) : \beta_i \neq 0, i=1,l\}$ (where $\beta_i$ stands for $y_i\alpha_i$, $-\alpha_i+\alpha_i^*$ or $\alpha_i$, depending on the problem), the bias term $b$ or $-\rho$, and the kernel function $K$. The F# definition of this data structure is very simple:
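A minimal sketch of such a type; the record and field names here are illustrative assumptions, not necessarily the article's original code:

```fsharp
// Sketch of the SVM model: support vectors paired with their nonzero
// coefficients, the bias term and the kernel function.
type SVM<'X> =
    { Vectors : ('X * float)[]        // pairs (x_i, beta_i) with beta_i <> 0
      Bias    : float                 // b for classification/regression, -rho for distribution estimation
      Kernel  : ('X * 'X -> float) }  // kernel function K
```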

# Kernels

Kernels are just functions with the type 'X * 'X -> float, but any function of the form P_1 * ... * P_N * 'X * 'X -> float also becomes a kernel once the parameters P_1 * ... * P_N are fixed:
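A sketch of possible kernel definitions using statically resolved type parameters; writing the parameters in curried form is a design choice made here so that fixing them (e.g. Kernel.rbf 0.5) yields a plain 'X * 'X -> float kernel:

```fsharp
module Kernel =
    /// linear kernel: K(x, x') = <x, x'>
    let inline linear (x, x') : float = x .* x'

    /// polynomial kernel: K(x, x') = (gamma * <x, x'> + c) ** degree
    let inline polynomial gamma c degree (x, x') : float =
        (gamma * (x .* x') + c) ** degree

    /// RBF kernel: K(x, x') = exp (-gamma * ||x - x'||^2)
    let inline rbf gamma (x, x') : float =
        let d = x - x' in exp (-gamma * (d .* d))
```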

Note: these kernels require the data type ^X to support two static operations: - (difference) and .* (dot product).

# Classification Decision Function

Mathematical definition of the classification decision function

$sign(\sum_{i=1}^l y_i \alpha_i K(\textbf{x}_i,\textbf{x}) + b)$,

can be directly translated to the F# implementation:
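A sketch of this translation, assuming the SVM record sketched above (here $\beta_i$ plays the role of $y_i \alpha_i$):

```fsharp
// sign(sum_i beta_i * K(x_i, x) + b)
let predict (model : SVM<'X>) (x : 'X) =
    model.Vectors
    |> Array.sumBy (fun (x_i, beta_i) -> beta_i * model.Kernel (x_i, x))
    |> fun sum -> sign (sum + model.Bias)
```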

# Optimization Problem Solver

Kernels and the decision function are easy to implement in F#. However, the main task in training an SVM is to solve the following optimization problem:

$\min_{\mathbf{\alpha}} \frac{1}{2}\mathbf{\alpha}^T Q \mathbf{\alpha} - \mathbf{e}^T \mathbf{\alpha}$ subject to $\mathbf{y}^T\mathbf{\alpha}=0$, $0 \leq \alpha_i \leq C$, $i=1,l$,

where $\mathbf{e}$ is the vector of ones, $C$ is the upper bound of all variables, $Q$ is a symmetric $l \times l$ matrix with $Q_{ij}=y_iy_jK(\mathbf{x}_i, \mathbf{x}_j)$, and $\mathbf{x}_i, i=1,l$ are training instances with labels $y_i \in \{-1, +1\}$.

The matrix $Q$ is usually dense and too large to be stored in memory. Thus, it is hard to use classical optimization methods that update the whole vector $\mathbf{\alpha}$ on each step, but there are decomposition methods that update only a small subset of $\mathbf{\alpha}$ per iteration. An extreme case is the Sequential Minimal Optimization (SMO) method, which updates only two elements:

1. Find the initial feasible solution $\mathbf{\alpha}^1$ (e.g., for the classification problem $\mathbf{\alpha}^1=\mathbf{0}$).
2. If $\mathbf{\alpha}^k$ is an optimal solution of the optimization problem then stop. Otherwise, find a two element working set $B=\{i, j\} \subset \{1,...,l\}$ and define $N=\{1,...,l\} \setminus B$.
3. Solve the optimization sub-problem involving selected working set elements and find $\mathbf{\alpha}_B$.
4. Update the solution to $\mathbf{\alpha}_B^{k+1}=\mathbf{\alpha}_B$ and $\mathbf{\alpha}_N^{k+1}=\mathbf{\alpha}_N^k$. Proceed to step 2.

The F# implementation of this algorithm is the following:
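A compact sketch of one possible implementation under the conventions above: Q i j computes $y_i y_j K(\textbf{x}_i, \textbf{x}_j)$, C is the box bound and epsilon the stopping tolerance. The nested helper bodies are minimal illustrations, refined in the subsections below:

```fsharp
let smo (y : float[]) (Q : int -> int -> float) (C : float) (epsilon : float) =
    let l = y.Length
    let a = Array.zeroCreate l             // step 1: alpha^1 = 0 is feasible
    let g = Array.create l (-1.0)          // gradient of 1/2*a'Qa - e'a at alpha = 0

    // indices that may still move up/down without leaving the box [0, C]
    let iUp ()  = { 0 .. l - 1 } |> Seq.filter (fun t -> (y.[t] > 0.0 && a.[t] < C) || (y.[t] < 0.0 && a.[t] > 0.0))
    let iLow () = { 0 .. l - 1 } |> Seq.filter (fun t -> (y.[t] < 0.0 && a.[t] < C) || (y.[t] > 0.0 && a.[t] > 0.0))

    // maximal violating pair
    let selectWorkingSet () =
        let i = iUp ()  |> Seq.maxBy (fun t -> -y.[t] * g.[t])
        let j = iLow () |> Seq.maxBy (fun t ->  y.[t] * g.[t])   // = argmin of -y_t * g_t
        i, j

    // optimality up to the tolerance epsilon
    let stopCriterion (i, j) = -y.[i] * g.[i] + y.[j] * g.[j] <= epsilon

    // two-variable sub-problem: a Newton step along the feasible
    // direction (u_i = y_i, u_j = -y_j), clipped to the box
    let solve (i, j) =
        let quad = max 1e-12 (Q i i + Q j j - 2.0 * y.[i] * y.[j] * Q i j)
        let step = (y.[j] * g.[j] - y.[i] * g.[i]) / quad
        let range yt at = if yt > 0.0 then -at, C - at else at - C, at
        let loI, hiI = range y.[i] a.[i]
        let loJ, hiJ = range (-y.[j]) a.[j]
        let lambda = max (max loI loJ) (min (min hiI hiJ) step)
        a.[i] + lambda * y.[i], a.[j] - lambda * y.[j]

    // only alpha_i and alpha_j changed, so the gradient update is cheap
    let updateG (i, j) (dai, daj) =
        for t in 0 .. l - 1 do
            g.[t] <- g.[t] + Q t i * dai + Q t j * daj

    let rec loop () =
        let i, j = selectWorkingSet ()                 // step 2
        if stopCriterion (i, j) then a
        else
            let ai, aj = solve (i, j)                  // step 3
            updateG (i, j) (ai - a.[i], aj - a.[j])
            a.[i] <- ai                                // step 4
            a.[j] <- aj
            loop ()
    loop ()
```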

The function recursively updates the solution vector until no violated conditions are found or the stop criterion is reached. The key elements of this implementation are the following:

• Working set selection function selectWorkingSet
• Optimization sub-problem solver function solve
• Gradient update function updateG
• Stop criterion function stopCriterion

## Working Set Selection Function

Core elements of working set selection strategies are the following subsets of optimization variable indices [3]:

$I_{up}(\mathbf{\alpha}) = \{t \mid \alpha_t < C, y_t = 1 \text{ or } \alpha_t > 0, y_t = -1\}$,

$I_{low}(\mathbf{\alpha}) = \{t \mid \alpha_t < C, y_t = -1 \text{ or } \alpha_t > 0, y_t = 1\}$.

With the F# module Seq, their implementations become just one line of code each:
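A sketch, written as standalone versions of the helpers nested in the solver above:

```fsharp
let iUp  (y : float[]) (a : float[]) C =
    { 0 .. y.Length - 1 } |> Seq.filter (fun t -> (y.[t] > 0.0 && a.[t] < C) || (y.[t] < 0.0 && a.[t] > 0.0))

let iLow (y : float[]) (a : float[]) C =
    { 0 .. y.Length - 1 } |> Seq.filter (fun t -> (y.[t] < 0.0 && a.[t] < C) || (y.[t] > 0.0 && a.[t] > 0.0))
```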

As a consequence, working set selection strategies become applications of Seq.maxBy (see the sketch after this list):

• Maximal violating pair working set selection
• Second order information working set selection
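A sketch of both strategies, assuming iUp/iLow from above, the gradient g, and for the second order variant access to the labeled matrix Q and a small positive constant tau (the names are illustrative):

```fsharp
// maximal violating pair: the most violating indices of I_up and I_low
let maximalViolatingPair (y : float[]) a (g : float[]) C =
    let i = iUp  y a C |> Seq.maxBy (fun t -> -y.[t] * g.[t])
    let j = iLow y a C |> Seq.maxBy (fun t ->  y.[t] * g.[t])   // = argmin of -y_t * g_t
    i, j

// second order information: i as above, j maximizing the estimated
// objective decrease b_it^2 / a_it (see [3]); assumes the stop
// criterion has not been met yet, so the filtered set is nonempty
let secondOrderPair (y : float[]) a (g : float[]) (Q : int -> int -> float) C tau =
    let i = iUp y a C |> Seq.maxBy (fun t -> -y.[t] * g.[t])
    let j =
        iLow y a C
        |> Seq.filter (fun t -> y.[t] * g.[t] > y.[i] * g.[i])   // -y_t*g_t < -y_i*g_i
        |> Seq.maxBy (fun t ->
            let b = -y.[i] * g.[i] + y.[t] * g.[t]
            let a_it = max tau (Q i i + Q t t - 2.0 * y.[i] * y.[t] * Q i t)
            b * b / a_it)
    i, j
```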

## Optimization Sub-Problem Solver

The two-variable optimization problem has an analytical solution [3], but a large number of special cases makes its implementation problematic. Fortunately, F# provides powerful pattern matching that dramatically shortens the implementation:
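A sketch of the analytical solution for the working pair (i, j), following the case analysis in [3]; qii, qjj, qij denote the corresponding entries of $Q$ and tau is a small positive constant guarding the division (all names are illustrative assumptions):

```fsharp
let solveSubProblem (y : float[]) (a : float[]) (g : float[])
                    (i, j) (qii, qjj, qij) (C : float) (tau : float) =
    let clip lo hi v = max lo (min hi v)
    match y.[i] = y.[j] with
    | false ->
        // y_i <> y_j: alpha_i - alpha_j is constant along the feasible line
        let quad = max tau (qii + qjj + 2.0 * qij)
        let ai   = a.[i] + (-g.[i] - g.[j]) / quad
        let diff = a.[i] - a.[j]
        let ai   = clip (max 0.0 diff) (min C (C + diff)) ai
        ai, ai - diff
    | true ->
        // y_i = y_j: alpha_i + alpha_j is constant along the feasible line
        let quad = max tau (qii + qjj - 2.0 * qij)
        let ai   = a.[i] - (g.[i] - g.[j]) / quad
        let sum  = a.[i] + a.[j]
        let ai   = clip (max 0.0 (sum - C)) (min C sum) ai
        ai, sum - ai
```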

Both the working set selection strategies and the optimization sub-problem solver use the gradient $\mathbf{G} = Q\mathbf{\alpha} - \mathbf{e}$. Fortunately, it is easy to obtain the next gradient vector from the two-variable optimization problem solution, because only $\alpha_i$ and $\alpha_j$ change:

$G_t^{k+1} = G_t^k + Q_{ti}(\alpha_i^{k+1} - \alpha_i^k) + Q_{tj}(\alpha_j^{k+1} - \alpha_j^k), \quad t=1,l$.

An elegant implementation of the gradient update function in F# would use Array.map, but the cost of this elegance is the allocation of a new array:
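A sketch with Array.mapi (the indexed variant of Array.map); Q, dai and daj are the assumed names for the matrix accessor and the two deltas:

```fsharp
// returns a fresh gradient array on every call
let updateG (Q : int -> int -> float) (i, j) (dai, daj) (g : float[]) =
    g |> Array.mapi (fun t gt -> gt + Q t i * dai + Q t j * daj)
```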

Therefore, it is better to rely on the mutability of arrays:
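An in-place sketch that mutates the existing array instead:

```fsharp
// updates the gradient in place, no allocation
let updateG (Q : int -> int -> float) (i, j) (dai, daj) (g : float[]) =
    for t in 0 .. g.Length - 1 do
        g.[t] <- g.[t] + Q t i * dai + Q t j * daj
```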

## Stop Criterion

The F# implementation of the stop criterion is even simpler than its mathematical definition [3]:
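The criterion from [3] stops when $m(\mathbf{\alpha}) \leq M(\mathbf{\alpha}) + \epsilon$, where $m(\mathbf{\alpha}) = \max_{t \in I_{up}} -y_t G_t$ and $M(\mathbf{\alpha}) = \min_{t \in I_{low}} -y_t G_t$. A sketch, reusing the iUp/iLow helpers from above:

```fsharp
let stopCriterion y a (g : float[]) C epsilon =
    let m = iUp  y a C |> Seq.map (fun t -> -y.[t] * g.[t]) |> Seq.max
    let M = iLow y a C |> Seq.map (fun t -> -y.[t] * g.[t]) |> Seq.min
    m <= M + epsilon
```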

# Examples

A simple way to test the classifier implementation is to use an existing dataset, e.g., a breast cancer prediction dataset:
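A hypothetical sketch of loading such a dataset in the LIBSVM sparse text format; the article's actual loading code and data source may differ:

```fsharp
open System

// parses "label index:value index:value ..." lines into dense feature arrays
let loadDataset dim (path : string) =
    [| for line in IO.File.ReadLines path do
         let parts = line.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
         let features : float[] = Array.zeroCreate dim
         for p in parts.[1..] do
             let kv = p.Split ':'
             features.[int kv.[0] - 1] <- float kv.[1]   // indices are 1-based
         yield float parts.[0], features |]
```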

However, an attempt to train the classifier on Vector data with Kernel.rbf will give the error The type 'Vector' does not support the operator '-'. This error means that the data type must define the static operations - and .* in order to be used with Kernel.rbf:
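A sketch of such a data type carrying the two required static operators; the article's actual Vector definition may differ:

```fsharp
type Vector(values : float[]) =
    member v.Values = values
    // difference, required by Kernel.rbf
    static member ( - ) (a : Vector, b : Vector) =
        Vector(Array.map2 (-) a.Values b.Values)
    // dot product, required by all of the kernels above
    static member ( .* ) (a : Vector, b : Vector) =
        Array.fold2 (fun acc x y -> acc + x * y) 0.0 a.Values b.Values
```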

Now it is possible to train the cancer model and get predictions:
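A hypothetical end-to-end sketch tying the pieces above together; the train wrapper, the file name and the hyperparameter values are assumptions for illustration only:

```fsharp
// hypothetical glue: run SMO, recover the bias from a free support
// vector and pack the model record sketched earlier
let train kernel (C : float) epsilon (xs : Vector[]) (ys : float[]) =
    let Q i j = ys.[i] * ys.[j] * kernel (xs.[i], xs.[j])
    let alpha = smo ys Q C epsilon
    let f x = alpha |> Array.mapi (fun t a_t -> ys.[t] * a_t * kernel (xs.[t], x)) |> Array.sum
    let i = alpha |> Array.findIndex (fun a_t -> a_t > 0.0 && a_t < C)   // a free support vector
    { Vectors = Array.zip xs (Array.map2 (*) ys alpha) |> Array.filter (fun (_, beta) -> beta <> 0.0)
      Bias    = ys.[i] - f xs.[i]
      Kernel  = kernel }

let data = loadDataset 10 "breast-cancer_scale.txt"   // hypothetical file name
let xs = data |> Array.map (fun (_, fs) -> Vector fs)
let ys = data |> Array.map (fun (label, _) -> if label > 3.0 then 1.0 else -1.0)  // remap the file's labels to +/-1 as needed
let model = train (Kernel.rbf 0.5) 1.0 0.001 xs ys    // illustrative gamma, C, epsilon
let accuracy =
    Array.map2 (fun x y -> if float (predict model x) = y then 1.0 else 0.0) xs ys
    |> Array.average
printfn "training accuracy: %f" accuracy
```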

Results should be the following:

# Conclusions

The nature of the F# language encourages an academic coding style, which is often opposed to the industrial coding style in terms of readability and maintainability. However, this F# implementation of Sequential Minimal Optimization shows that one function with a dozen nested functions inside can be much more readable than a bunch of classes. Furthermore, only a small amount of additional code is needed to apply it to a real problem. Though the performance of this implementation is far from that of the native C implementation, it can be tuned and improved by rewriting some parts of the code in an imperative style alongside ongoing experiments.

1. Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines // ACM Transactions on Intelligent Systems and Technology (TIST). 2011. 2(3). pp. 1-27.