Semagle

Random Numbers in F#

Many problems in engineering, finance and statistics cannot be solved by direct methods, but a great number of them can be solved approximately using randomized algorithms. All of those algorithms need flexible and efficient pseudo-random number generators (PRNGs). Implementing a PRNG effectively in F# is somewhat tricky. F# is based on the .NET Framework, which already provides a fast pseudo-random generator class, System.Random, but it meets only certain statistical requirements for randomness. However, there are two classes of generators that have good performance: combined multiple recursive generators (e.g., L’Ecuyer’s MRG32k3a) and twisted generalized feedback shift register generators (e.g.,...

Summary statistics in F#

Summary statistics are commonly used to build a simple quantitative description of a set of observations. Simple descriptions include mean, variance, skewness and kurtosis, which are quantitative measures of the location, spread and shape of the data distribution. However, straightforward implementations of these measures in F# do not scale to large amounts of data. There are more sophisticated methods, but imperative implementations of those methods use mutable variables. Nonetheless, the mathematical definitions of those methods make it possible to build efficient functional implementations using higher-order functions in F#. Simple Mean and Variance: The F# implementation of mean \(\bar{x}=\frac{1}{N}\sum_{i=1}^N x_i\) for a sequence of any...

Data Sources in F#

Three popular data formats, CSV (Comma Separated Values), JSON (JavaScript Object Notation) and XML (Extensible Markup Language), are used very frequently in data science. The F# Data library (FSharp.Data) implements almost everything you need to access data stored in CSV, JSON and XML formats. Moreover, FSharp.Data implements F# type providers that infer the record structure from a sample document and thus allow the record structure to be checked at compile time. CSV Files: For reading and writing CSV files, the FSharp.Data package implements CsvProvider. This provider can be initialized by passing the sample CSV parameter. Sample parameter value is...

F# for Data Science

How can functional programming and type inference help you manage large amounts of structured and unstructured data, merge multiple data sources and APIs, create visualizations for data interpretation, build mathematical models based on data, and present the resulting insights and findings? Data science is an attempt to understand and interpret domain data using statistical analysis, machine learning, visualization and programming technologies and tools. A typical project starts with a shallow definition of the problem (e.g., forecast sales using store, promotion, and competitor data). The next stage is data collection, which includes data acquisition, consolidation and management. Collected data is analysed...