Support Vector Machines (SVMs) are a very popular machine learning method for classification, regression, distribution estimation, etc. An exceptional feature of this method is its ability to handle objects of a diverse nature as long as there is a suitable kernel function. Nonetheless, popular software libraries like LIBSVM ^{1} and SVM^{light} ^{2} are designed for vector data, and it is hard to adapt them to other object types. An F# implementation seems promising in terms of readability and extensibility.
Readability and extensibility are quite often opposing qualities of a software design: increasing one leads to decreasing the other. Nevertheless, this F# implementation of SVM attempts to address the following design questions without degrading either readability or extensibility:
 What data structure to use for SVM model?
 How to implement kernel functions?
 How to implement decision function?
 How to implement optimization problem solving?
SVM model data structure
SVM classification, regression and distribution estimation seem to be different problems, and they should be represented by different data structures designed for each specific problem. However, their formulations share common elements like the training samples \(\{\textbf{x}_i: i=1,l\}\) and the kernel function \(K\), and differ mainly in their decision/approximation functions:

two-class classification problem:
\(sign(\sum_{i=1}^l y_i \alpha_i K(\textbf{x}_i,\textbf{x}) + b)\),

regression problem:
\(\sum_{i=1}^l (-\alpha_i + \alpha_i^*) K(\textbf{x}_i,\textbf{x}) + b\),

distribution estimation problem:
\(sign(\sum_{i=1}^l \alpha_i K(\textbf{x}_i,\textbf{x}) - \rho)\),
where \(y_i \in \{-1,+1\}\) is the label of the sample \(\textbf{x}_i\), and \(\alpha_i \in R\) and \(\alpha_i^* \in R\) are solutions of the dual SVM optimization problem ^{3}.
Thus, the model needs to store the set of pairs \(\{(\textbf{x}_i, \beta_i) : \beta_i \neq 0, i=1,l\}\), the bias \(b\) (or \(-\rho\)) and the kernel function \(K\). The F# definition of this data structure is very simple:
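A minimal sketch of such a record type (the field and type names here are illustrative assumptions, not the article's exact code):

```fsharp
// A possible model record: support vectors paired with their non-zero
// coefficients beta_i, the bias term, and the kernel function K.
type SVM<'X> =
    { Vectors : ('X * float) []      // pairs (x_i, beta_i) with beta_i <> 0
      Bias    : float                // b for classification/regression, -rho for distribution estimation
      Kernel  : 'X -> 'X -> float }  // kernel function K
```

For classification \(\beta_i = y_i \alpha_i\), so one structure can serve all three problems.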
Kernels
Kernels are just functions of the type 'X * 'X -> float, but any function of the form P_1 * ... * P_N * 'X * 'X -> float also becomes a kernel once the parameters P_1, ..., P_N are defined:
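For concreteness, here is one possible rendering over float [] data (a sketch; the article's generic version instead constrains a type ^X to provide the needed operators):

```fsharp
module Kernel =
    // Linear kernel: K(x, z) = <x, z>
    let linear (x : float []) (z : float []) =
        Array.fold2 (fun acc a b -> acc + a * b) 0.0 x z

    // Polynomial kernel with parameters gamma, c, d: K(x, z) = (gamma * <x, z> + c) ** d
    let poly gamma c d (x : float []) (z : float []) =
        (gamma * linear x z + c) ** d

    // RBF kernel with parameter gamma: K(x, z) = exp (-gamma * ||x - z||^2);
    // it only needs subtraction and the dot product of the underlying data type.
    let rbf gamma (x : float []) (z : float []) =
        let d = Array.map2 (-) x z
        exp (-gamma * linear d d)
```

Partial application turns a parameterized function into a plain kernel: Kernel.rbf 0.5 has the type float [] -> float [] -> float.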
Note: these kernels require the data type ^X to support two operations, - and .* .
Classification Decision Function
The mathematical definition of the classification decision function
\(sign(\sum_{i=1}^l y_i \alpha_i K(\textbf{x}_i,\textbf{x}) + b)\)
can be directly translated into an F# implementation:
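A sketch of such a translation; here the kernel, the support vectors with their coefficients \(\beta_i = y_i \alpha_i\), and the bias are passed explicitly, and the names are illustrative:

```fsharp
// Decision function: sign (sum_i beta_i * K(x_i, x) + b)
let predict (kernel : 'X -> 'X -> float) (vectors : ('X * float) []) (bias : float) (x : 'X) =
    let sum = vectors |> Array.sumBy (fun (xi, beta) -> beta * kernel xi x)
    sign (sum + bias)
```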
Optimization Problem Solver
Kernels and the decision function are easy to implement in F#. However, the main task in training an SVM is to solve the following optimization problem:

\(\min_{\mathbf{\alpha}} \frac{1}{2}\mathbf{\alpha}^T Q \mathbf{\alpha} - \mathbf{e}^T \mathbf{\alpha}\) subject to \(\mathbf{y}^T \mathbf{\alpha} = 0\), \(0 \leq \alpha_i \leq C, i=1,l\),

where \(\textbf{e}\) is the vector of ones, \(C\) is the upper bound of all variables, \(Q\) is a symmetric \(l\) by \(l\) matrix with \(Q_{ij}=y_iy_jK(\mathbf{x}_i, \mathbf{x}_j)\), and \(\mathbf{x}_i, i=1,l\) are instances with labels \(y_i \in \{-1, +1\}\).
The matrix \(Q\) is usually dense and too large to be stored in memory. Thus, it is hard to use classical optimization methods that update the whole vector \(\mathbf{\alpha}\) on each step, but there are decomposition methods that update only a small subset of \(\mathbf{\alpha}\) on each iteration. An extreme case is the Sequential Minimal Optimization (SMO) method, which updates only two elements:
 1. Find the initial feasible solution \(\mathbf{\alpha}^1\) (e.g., for the classification problem \(\mathbf{\alpha}^1=\mathbf{0}\)).
 2. If \(\mathbf{\alpha}^k\) is an optimal solution of the optimization problem then stop. Otherwise, find a two element working set \(B=\{i, j\} \subset \{1,...,l\}\) and define \(N=\{1,...,l\} \setminus B\).
 3. Solve the optimization subproblem involving the selected working set elements and find \(\mathbf{\alpha}_B\).
 4. Update the solution: \(\mathbf{\alpha}_B^{k+1}=\mathbf{\alpha}_B\) and \(\mathbf{\alpha}_N^{k+1}=\mathbf{\alpha}_N^k\). Proceed to step 2.
The F# implementation of this algorithm is the following:
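The scheme above can be sketched as one self-contained recursive function for the classification problem (all names are illustrative; maximal violating pair selection, the analytical two-variable step and the gradient update are inlined here and discussed separately below):

```fsharp
// A self-contained SMO sketch for the two-class problem; labels y.[t] are
// -1.0 or +1.0 and Q_ts = y_t * y_s * K(x_t, x_s) is evaluated on the fly.
let smo (kernel : 'X -> 'X -> float) (x : 'X []) (y : float []) (C : float) (eps : float) =
    let l = x.Length
    let K i j = kernel x.[i] x.[j]
    let alpha = Array.zeroCreate l                  // initial feasible solution: alpha = 0
    let G = Array.create l (-1.0)                   // gradient of (1/2) a'Qa - e'a at alpha = 0
    let inUp t = (y.[t] > 0.0 && alpha.[t] < C) || (y.[t] < 0.0 && alpha.[t] > 0.0)
    let inLow t = (y.[t] > 0.0 && alpha.[t] > 0.0) || (y.[t] < 0.0 && alpha.[t] < C)
    let rec loop iter =
        if iter = 0 then alpha                      // safety cap on the number of iterations
        else
            // working set selection: maximal violating pair {i, j}
            let i = seq { 0 .. l - 1 } |> Seq.filter inUp |> Seq.maxBy (fun t -> -y.[t] * G.[t])
            let j = seq { 0 .. l - 1 } |> Seq.filter inLow |> Seq.minBy (fun t -> -y.[t] * G.[t])
            if -y.[i] * G.[i] + y.[j] * G.[j] < eps then alpha   // stop criterion
            else
                // two-variable subproblem: new alpha_j is clipped to the feasible segment [lo, hi]
                let lo, hi =
                    if y.[i] <> y.[j]
                    then max 0.0 (alpha.[j] - alpha.[i]), min C (C + alpha.[j] - alpha.[i])
                    else max 0.0 (alpha.[i] + alpha.[j] - C), min C (alpha.[i] + alpha.[j])
                let eta = max 1e-12 (K i i + K j j - 2.0 * K i j)
                let aj = alpha.[j] + y.[j] * (y.[i] * G.[i] - y.[j] * G.[j]) / eta |> max lo |> min hi
                let ai = alpha.[i] + y.[i] * y.[j] * (alpha.[j] - aj)
                let di, dj = ai - alpha.[i], aj - alpha.[j]
                alpha.[i] <- ai
                alpha.[j] <- aj
                // gradient update: G_t <- G_t + Q_ti * di + Q_tj * dj
                for t in 0 .. l - 1 do
                    let qti = y.[t] * y.[i] * K t i
                    let qtj = y.[t] * y.[j] * K t j
                    G.[t] <- G.[t] + qti * di + qtj * dj
                loop (iter - 1)
    loop 10000
```

The immutable recursion of the driver loop and the mutable alpha and G arrays are a deliberate compromise: the control flow stays declarative while the per-iteration updates avoid allocations.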
The function recursively updates the solution vector until no violated conditions are found or the stop criterion is reached. Key elements of this implementation are the following:
 Working set selection function selectWorkingSet
 Optimization subproblem solver function solve
 Gradient update function updateG
 Stop criterion function stopCriterion
Working Set Selection Function
Core elements of working set selection strategies are the following subsets of the optimization variable indices ^{3}:

\(I_{up}(\mathbf{\alpha}) = \{t \mid \alpha_t < C, y_t = 1 \text{ or } \alpha_t > 0, y_t = -1\}\),

\(I_{low}(\mathbf{\alpha}) = \{t \mid \alpha_t < C, y_t = -1 \text{ or } \alpha_t > 0, y_t = 1\}\).
With the F# module Seq, their implementations become just one line of code:
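A sketch of how these one-liners could look (the signatures are assumptions; each function returns the lazy sequence of indices in the corresponding set):

```fsharp
// I_up(a)  = { t : a_t < C, y_t = +1  or  a_t > 0, y_t = -1 }
let iUp C (y : float []) (alpha : float []) =
    seq { 0 .. alpha.Length - 1 } |> Seq.filter (fun t -> (y.[t] > 0.0 && alpha.[t] < C) || (y.[t] < 0.0 && alpha.[t] > 0.0))

// I_low(a) = { t : a_t < C, y_t = -1  or  a_t > 0, y_t = +1 }
let iLow C (y : float []) (alpha : float []) =
    seq { 0 .. alpha.Length - 1 } |> Seq.filter (fun t -> (y.[t] < 0.0 && alpha.[t] < C) || (y.[t] > 0.0 && alpha.[t] > 0.0))
```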
As a consequence, working set selection strategies become applications of Seq.maxBy:
 Maximal violating pair working set selection
 Second order information working set selection
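The maximal violating pair strategy, for instance, could be sketched as follows (illustrative names; iUp and iLow are assumed to be the index sequences defined by the sets above, and G is the current gradient). The second order variant keeps the same i but picks j by minimizing the second order gain instead of Seq.minBy over \(-y_t G_t\):

```fsharp
// Maximal violating pair: the worst violator from I_up paired with
// the worst violator from I_low, found by Seq.maxBy / Seq.minBy.
let selectWorkingSet iUp iLow (y : float []) (G : float []) =
    let i = iUp  |> Seq.maxBy (fun t -> -y.[t] * G.[t])
    let j = iLow |> Seq.minBy (fun t -> -y.[t] * G.[t])
    i, j
```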
Optimization Subproblem Solver
The two-variable optimization problem has an analytical solution ^{3}, but the big number of special cases makes its implementation problematic. Fortunately, F# provides powerful pattern matching expressions that dramatically shorten the implementation:
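A sketch of the analytical step for the classification case (illustrative names; matching on whether the two labels coincide selects the feasible segment for the new \(\alpha_j\), and the new \(\alpha_i\) follows from the equality constraint):

```fsharp
// Analytical solution of the two-variable subproblem, clipped to the box [0, C].
// Kii, Kjj, Kij are kernel values, (Gi, Gj) the gradient and (ai, aj) the current alphas.
let solveSubproblem C (Kii, Kjj, Kij) (yi, yj) (Gi, Gj) (ai, aj) =
    let lo, hi =
        match yi = yj with
        | false -> max 0.0 (aj - ai), min C (C + aj - ai)
        | true  -> max 0.0 (ai + aj - C), min C (ai + aj)
    let eta = max 1e-12 (Kii + Kjj - 2.0 * Kij)      // second derivative along the constraint
    let aj' = aj + yj * (yi * Gi - yj * Gj) / eta |> max lo |> min hi
    let ai' = ai + yi * yj * (aj - aj')              // keep y_i*a_i + y_j*a_j constant
    ai', aj'
```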
Gradient Update
Working set selection strategies and the optimization subproblem solver both use the gradient. Fortunately, the next gradient vector is easy to obtain from the two-variable optimization problem solution: \(G_t \leftarrow G_t + Q_{ti} \Delta\alpha_i + Q_{tj} \Delta\alpha_j\), \(t=1,l\).
An elegant implementation of the gradient update function in F# would use Array.map, but the cost of this elegance is the allocation of a new array:
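For instance (a sketch; Array.mapi is used here because the index is needed alongside the value, and the result is a freshly allocated gradient array):

```fsharp
// Allocating variant: builds a brand-new gradient array on every call.
// di and dj are the changes of alpha_i and alpha_j.
let updateG kernel (x : 'X []) (y : float []) (G : float []) (i, di) (j, dj) =
    G |> Array.mapi (fun t g ->
        let qti = y.[t] * y.[i] * kernel x.[t] x.[i]   // Q_ti
        let qtj = y.[t] * y.[j] * kernel x.[t] x.[j]   // Q_tj
        g + qti * di + qtj * dj)
```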
Therefore, it is better to rely on the mutability of arrays:
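A sketch of the in-place variant (illustrative signature; the gradient array G is overwritten element by element instead of being re-allocated):

```fsharp
// In-place gradient update: G_t <- G_t + Q_ti * di + Q_tj * dj.
let updateG kernel (x : 'X []) (y : float []) (G : float []) (i, di) (j, dj) =
    for t in 0 .. G.Length - 1 do
        let qti = y.[t] * y.[i] * kernel x.[t] x.[i]   // Q_ti
        let qtj = y.[t] * y.[j] * kernel x.[t] x.[j]   // Q_tj
        G.[t] <- G.[t] + qti * di + qtj * dj
```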
Stop Criterion
The F# implementation of the stop criterion is even simpler than its mathematical definition ^{3}:
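One way it could look (a sketch under the same assumptions as above: iUp and iLow are the index sequences, G the gradient, and the criterion is \(m(\mathbf{\alpha}) \leq M(\mathbf{\alpha}) + \varepsilon\)):

```fsharp
// Stop when the largest violation over I_up does not exceed
// the smallest over I_low by more than eps.
let stopCriterion eps (y : float []) (G : float []) iUp iLow =
    let m = iUp  |> Seq.map (fun t -> -y.[t] * G.[t]) |> Seq.max
    let M = iLow |> Seq.map (fun t -> -y.[t] * G.[t]) |> Seq.min
    m <= M + eps
```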
Examples
A simple way to test the classifier implementation is to use an existing dataset, e.g., a breast cancer prediction dataset:
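A hypothetical loader for such a dataset in the common LIBSVM text format ("label index:value index:value ..."); the function name and the sparse representation are assumptions, not the article's code:

```fsharp
open System
open System.IO

// Reads "label idx:value ..." lines into (label, sparse feature array) pairs.
let readDataset (path : string) =
    [| for line in File.ReadLines path do
         let parts = line.Split ([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
         let label = float parts.[0]
         let features =
             parts.[1..]
             |> Array.map (fun p ->
                 let kv = p.Split ([| ':' |])
                 int kv.[0], float kv.[1])
         yield label, features |]
```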
However, an attempt to train the classifier on Vector data with Kernel.rbf will give the error: The type Vector does not support the operator '-'. This error means that the data type should define the static operators - and .* in order to be used with Kernel.rbf:
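One way to satisfy this requirement is a thin wrapper type carrying the two static members (the record shape is an assumption; only the operator names matter):

```fsharp
// A vector type providing the subtraction and dot-product operators
// that the generic RBF kernel resolves statically.
type Vector =
    { Values : float [] }
    static member (-) (a : Vector, b : Vector) =
        { Values = Array.map2 (-) a.Values b.Values }
    static member (.*) (a : Vector, b : Vector) =
        Array.fold2 (fun acc u v -> acc + u * v) 0.0 a.Values b.Values
```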
Now it is possible to train the cancer model and get predictions:
Results should be the following:
Conclusions
The nature of the F# language encourages writing code in the academic coding style, which is often opposed to the industrial coding style in terms of readability and maintainability. However, this F# implementation of Sequential Minimal Optimization shows that one function with a dozen nested functions inside is much more readable than a bunch of classes. Furthermore, only a small amount of additional code is needed to apply it to a real problem. Though the performance of this implementation is far from that of the native C implementation, it can be tuned and improved by rewriting some parts of the code in an imperative style alongside ongoing experiments.

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines // ACM Transactions on Intelligent Systems and Technology (TIST). 2011. 2(3). pp. 1-27. ↩