Support Vector Machines (SVMs) are a very popular machine learning method for classification, regression, distribution estimation, etc. An exceptional feature of this method is its ability to handle objects of a diverse nature, as long as a suitable kernel function exists. Nonetheless, popular software libraries like LIBSVM [1] and SVMlight [2] are designed for vector data, and it is hard to adapt them to other object types. An F# implementation seems promising in terms of readability and extensibility.
Readability and extensibility are quite often opposing qualities of software design: increasing one tends to decrease the other. This F# implementation of SVM attempts to address the following design questions without degrading either readability or extensibility:
- What data structure to use for the SVM model?
- How to implement kernel functions?
- How to implement decision function?
- How to implement optimization problem solving?
SVM Model Data Structure
SVM classification, regression and distribution estimation seem to be different problems, and they should be represented by different data structures designed for each specific problem. However, their formulations share common elements, such as training samples and the kernel function $K$, apart from different decision/approximation functions:

two-class classification problem:

$$ f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b\right) $$

distribution estimation problem:

$$ f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i K(x_i, x) - \rho\right) $$

where $y_i \in \{+1, -1\}$ is the label of the sample $x_i$, and $\alpha_i$ are solutions of a dual SVM optimization problem [3].

Thus, the model needs to store the set of pairs $(\alpha_i, x_i)$, the bias $b$ or $-\rho$, and the kernel function $K$. The F# definition of this data structure is very simple:
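The original listing is not present in this copy; a minimal sketch consistent with the surrounding text, with illustrative field names, could be:

```fsharp
// Sketch of the model data structure (field names are assumptions):
type SVM<'X> =
    { Pairs  : (float * 'X)[]      // pairs (y_i * alpha_i, x_i) for support vectors
      Bias   : float               // b for classification, -rho for estimation
      Kernel : 'X * 'X -> float }  // kernel function K

// a trivial model over floats with a linear kernel
let model = { Pairs = [| (1.0, 2.0) |]; Bias = 0.5; Kernel = fun (a, b) -> a * b }
```

The record is generic in `'X`, so any object type with a suitable kernel can be plugged in.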
Kernels are just functions of the type `'X * 'X -> float`, and any function of type `P_1 * ... * P_N * 'X * 'X -> float` also becomes a kernel once the parameters `P_1 * ... * P_N` are fixed (e.g., by partial application):
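For illustration (the article's original listing is missing here; the function names are assumptions), kernels of this shape might look like:

```fsharp
// Kernels over floats for illustration; with 'inline' the same bodies
// generalize to any type supporting the needed operators.
let linear (x: float, y: float) = x * y

let rbf gamma (x: float, y: float) =
    let d = x - y
    exp (-gamma * d * d)

// Partial application fixes the parameter and yields a kernel 'X * 'X -> float:
let k = rbf 0.5
```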
Note: these kernels require the data type `^X` to support two static operations.
Classification Decision Function
The mathematical definition of the classification decision function

$$ f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b\right) $$

can be directly translated into an F# implementation:
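As a sketch of such a translation (the model shape is assumed: each pair holds $y_i \alpha_i$ together with $x_i$):

```fsharp
// f(x) = sgn(sum_i y_i*alpha_i*K(x_i, x) + b)
let decide (pairs: (float * 'X)[]) (bias: float) (kernel: 'X * 'X -> float) (x: 'X) =
    let sum = pairs |> Array.sumBy (fun (a, xi) -> a * kernel (xi, x))
    sign (sum + bias)
```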
Optimization Problem Solver
Kernels and the decision function are easy to implement in F#. However, the main task in training an SVM is to solve the following optimization problem:

$$ \min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \quad \text{subject to} \quad y^T \alpha = 0, \; 0 \le \alpha_i \le C, \; i = 1, \dots, l, $$

where $e$ is the vector of all ones, $C$ is the upper bound of all variables, $Q$ is an $l \times l$ symmetric matrix with $Q_{ij} = y_i y_j K(x_i, x_j)$, and $x_i$ are training instances with labels $y_i$.
The matrix $Q$ is usually dense and too large to be stored in memory. Thus, it is hard to use classical optimization methods that update the whole vector $\alpha$ on each step, but there are decomposition methods that update only a small subset of $\alpha$ on each iteration. An extreme case is the Sequential Minimal Optimization (SMO) method, which updates only two elements:
- Find an initial feasible solution $\alpha^0$ (e.g., for the classification problem $\alpha^0 = 0$).
- If $\alpha^k$ is an optimal solution of the optimization problem, then stop. Otherwise, find a two-element working set $B = \{i, j\} \subset \{1, \dots, l\}$ and define $N = \{1, \dots, l\} \setminus B$.
- Solve the optimization sub-problem involving the selected working set elements and find $\alpha_B^{k+1}$.
- Update the solution to $\alpha^{k+1} = (\alpha_B^{k+1}, \alpha_N^k)$ and $k \leftarrow k + 1$. Proceed to step 2.
The F# implementation of this algorithm is the following:
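The article's listing is missing from this copy; a self-contained sketch of such a recursive SMO loop for the classification dual (maximal violating pair selection, simple clipped two-variable update; all names are illustrative, not the article's code) might look like:

```fsharp
// SMO sketch: recursively update two elements of alpha until no pair
// violates the optimality conditions by more than eps.
let smo (x: float[]) (y: float[]) (C: float) (kernel: float -> float -> float) eps maxIter =
    let l = x.Length
    let q = Array2D.init l l (fun i j -> y.[i] * y.[j] * kernel x.[i] x.[j])
    let alpha : float[] = Array.zeroCreate l          // alpha^0 = 0 is feasible
    let grad = Array.create l (-1.0)                  // grad f(0) = -e
    let inUp t = (alpha.[t] < C && y.[t] > 0.0) || (alpha.[t] > 0.0 && y.[t] < 0.0)
    let inLow t = (alpha.[t] < C && y.[t] < 0.0) || (alpha.[t] > 0.0 && y.[t] > 0.0)
    let rec loop k =
        if k < maxIter then
            let up = List.filter inUp [ 0 .. l - 1 ]
            let low = List.filter inLow [ 0 .. l - 1 ]
            if not (List.isEmpty up) && not (List.isEmpty low) then
                let i = List.maxBy (fun t -> -y.[t] * grad.[t]) up
                let j = List.minBy (fun t -> -y.[t] * grad.[t]) low
                let violation = -y.[i] * grad.[i] + y.[j] * grad.[j]
                if violation > eps then
                    // analytical two-variable solution, clipped to the box
                    let a = max 1e-12 (q.[i, i] + q.[j, j] - 2.0 * y.[i] * y.[j] * q.[i, j])
                    let s = y.[i] * alpha.[i] + y.[j] * alpha.[j]   // preserved by the equality constraint
                    let lo, hi =
                        if y.[i] * y.[j] > 0.0
                        then max 0.0 (y.[j] * s - C), min C (y.[j] * s)
                        else max 0.0 (-y.[j] * s), min C (C - y.[j] * s)
                    let ai = min hi (max lo (alpha.[i] + y.[i] * violation / a))
                    let aj = y.[j] * (s - y.[i] * ai)
                    let di, dj = ai - alpha.[i], aj - alpha.[j]
                    alpha.[i] <- ai
                    alpha.[j] <- aj
                    for t in 0 .. l - 1 do            // in-place gradient update
                        grad.[t] <- grad.[t] + q.[t, i] * di + q.[t, j] * dj
                    loop (k + 1)
    loop 0
    alpha
```

On a trivially separable pair of points (`x = [|1.0; -1.0|]`, `y = [|1.0; -1.0|]`, linear kernel), this converges in one update.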
The function recursively updates the solution vector until no violated optimality conditions are found or the stop criterion is achieved. The key elements of this implementation are the following:
- Working set selection function
- Optimization sub-problem solver function
- Gradient update function
- Stop criterion function
Working Set Selection Function
The core elements of working set selection strategies are the following subsets of the optimization variable indices [3]:

$$ I_{up}(\alpha) = \{ t \mid \alpha_t < C, y_t = 1 \text{ or } \alpha_t > 0, y_t = -1 \} $$

$$ I_{low}(\alpha) = \{ t \mid \alpha_t < C, y_t = -1 \text{ or } \alpha_t > 0, y_t = 1 \} $$

With the F# module Seq, their implementations become just one line of code each:
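A sketch of such one-liners (function names are illustrative):

```fsharp
// Index sets I_up and I_low as Seq filters over variable indices:
let iUp (alpha: float[]) (y: float[]) C =
    seq { 0 .. alpha.Length - 1 }
    |> Seq.filter (fun t -> (alpha.[t] < C && y.[t] > 0.0) || (alpha.[t] > 0.0 && y.[t] < 0.0))

let iLow (alpha: float[]) (y: float[]) C =
    seq { 0 .. alpha.Length - 1 }
    |> Seq.filter (fun t -> (alpha.[t] < C && y.[t] < 0.0) || (alpha.[t] > 0.0 && y.[t] > 0.0))
```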
As a consequence, working set selection strategies become simple aggregations over these index sets:
- Maximal violating pair working set selection
- Second order information working set selection
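For example, maximal violating pair selection can be sketched as follows (names are illustrative; the second order strategy additionally weights candidates by the sub-problem's quadratic term):

```fsharp
// Pick the most violating pair (i, j); return None when no pair violates
// the optimality conditions by more than eps.
let selectMvp (grad: float[]) (y: float[]) (iUp: seq<int>) (iLow: seq<int>) eps =
    let i = iUp |> Seq.maxBy (fun t -> -y.[t] * grad.[t])
    let j = iLow |> Seq.minBy (fun t -> -y.[t] * grad.[t])
    if -y.[i] * grad.[i] + y.[j] * grad.[j] > eps then Some (i, j) else None
```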
Optimization Sub-Problem Solver
The two-variable optimization problem has an analytical solution [3], but the large number of special cases makes its implementation problematic. Fortunately, F# provides powerful pattern matching statements that dramatically shorten the implementation:
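As an illustration (not the article's code), the case analysis on whether the two labels agree maps naturally onto a match expression:

```fsharp
// Two-variable sub-problem sketch: solve along the feasible direction,
// then clip alpha_i to the segment allowed by the box constraints.
let solvePair (qii, qjj, qij) (gi, gj) (yi: float, yj: float) (ai, aj) (C: float) =
    let a = max 1e-12 (qii + qjj - 2.0 * yi * yj * qij)
    let delta = (-yi * gi + yj * gj) / a
    let s = yi * ai + yj * aj                 // preserved by the equality constraint
    let lo, hi =
        match yi * yj with
        | p when p > 0.0 -> max 0.0 (yj * s - C), min C (yj * s)
        | _ -> max 0.0 (-yj * s), min C (C - yj * s)
    let ai' = min hi (max lo (ai + yi * delta))
    ai', yj * (s - yi * ai')
```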
Working set selection strategies and the optimization sub-problem solver both use the gradient $\nabla f(\alpha)$. Fortunately, it is easy to obtain the next gradient vector from the two-variable optimization problem solution:

$$ \nabla f(\alpha^{k+1})_t = \nabla f(\alpha^k)_t + Q_{ti}\,\Delta\alpha_i + Q_{tj}\,\Delta\alpha_j, \quad t = 1, \dots, l $$
An elegant implementation of the gradient update function in F# would use an array-producing combinator, but the cost of the elegance is the allocation of a new array on every iteration. Therefore, it is better to rely on the mutability of arrays:
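Both variants can be sketched as follows (illustrative; `q` is the matrix $Q$, `di`/`dj` are $\Delta\alpha_i$ and $\Delta\alpha_j$):

```fsharp
// Allocating version: elegant but creates a fresh array each call.
let updateGrad (q: float[,]) (grad: float[]) i j di dj =
    grad |> Array.mapi (fun t g -> g + q.[t, i] * di + q.[t, j] * dj)

// In-place version: mutates the existing gradient array.
let updateGradInPlace (q: float[,]) (grad: float[]) i j di dj =
    for t in 0 .. grad.Length - 1 do
        grad.[t] <- grad.[t] + q.[t, i] * di + q.[t, j] * dj
```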
The F# implementation of the stop criterion

$$ m(\alpha^k) - M(\alpha^k) \le \varepsilon, \quad \text{where } m(\alpha) = \max_{t \in I_{up}(\alpha)} -y_t \nabla f(\alpha)_t, \; M(\alpha) = \min_{t \in I_{low}(\alpha)} -y_t \nabla f(\alpha)_t, $$

is even simpler than its mathematical definition [3]:
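A sketch of such an implementation (names are illustrative):

```fsharp
// Stop when the maximal violation m(alpha) - M(alpha) drops below eps.
let shouldStop (grad: float[]) (y: float[]) (iUp: seq<int>) (iLow: seq<int>) eps =
    let m = iUp |> Seq.map (fun t -> -y.[t] * grad.[t]) |> Seq.max
    let M = iLow |> Seq.map (fun t -> -y.[t] * grad.[t]) |> Seq.min
    m - M <= eps
```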
A simple way to test the classifier implementation is to use an existing dataset, e.g., a breast cancer prediction dataset:
However, an attempt to train the classifier on Vector data will give the error:

The type vector does not support operator '-'.

This error means that the data type should define the required static operators in order to be used with the inline kernel functions.
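For illustration, a minimal vector type with static operators of this kind (a sketch; the article's actual Vector type may differ) could be:

```fsharp
// A toy vector type defining static (-) and (*) so inline kernels can use it:
type Vec =
    | Vec of float[]
    static member (-) (Vec a, Vec b) = Vec (Array.map2 (-) a b)                     // elementwise difference
    static member (*) (Vec a, Vec b) = Array.fold2 (fun s x y -> s + x * y) 0.0 a b // inner product
```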
Now it is possible to train the cancer model and get predictions:
Results should be the following:
The nature of the F# language encourages writing code in an academic style, which is often opposed to the industrial coding style in terms of readability and maintainability. However, this F# implementation of Sequential Minimal Optimization shows that one function with a dozen nested functions inside can be much more readable than a bunch of classes. Furthermore, only a small amount of additional code is needed to apply it to a real problem. Though the performance of this implementation is far from that of the native C implementation, it can be tuned and improved by rewriting some parts of the code in an imperative way alongside ongoing experiments.