Evelina is a machine learning researcher working in bioinformatics and statistical genomics. She is developing mathematical models which integrate different types of genomic data to distinguish cancer subtypes.
She studied computational statistics and machine learning at University College London and is currently finishing her PhD at Cambridge University.
Evelina has used many different languages to implement machine learning algorithms, including Matlab, R and Python. In the end, F# is her favourite and she uses it frequently for data manipulation and exploratory analysis.
She writes a blog on F# in data science at http://www.evelinag.com.
Data science is emerging as a hot topic across many areas in both industry and academia. In my research, I’m using machine learning methods to build mathematical models of cancer cell behaviour. But using today’s data science tools is hard: we waste a lot of time figuring out what format different CSV files use or how JSON or XML files are structured. Often, we need to switch between Python, Matlab, R and other tools to call functions that are missing elsewhere. And why do many programming languages used in data science lack tools that are standard in modern software engineering?
In this talk I’ll look at data science tools in F# and how they simplify the life of a modern scientist who relies heavily on data analytics. F# provides a unique way of integrating external data sources and tools into a single environment. This means that you can seamlessly access not only data, but also R statistical and visualization packages, without leaving F#. Compile-time static checking and rich interactive tooling give you many of the standard tools known from software engineering, while keeping the exploratory nature of simple scripting languages.
Using examples from my own research in bioinformatics, I’ll show how to apply F# to data analysis with various type providers and other available tools.