Whilst no single definition of data science exists, it broadly concerns the application of the scientific method to data. Data science is used to obtain insights from data through a combination of advanced analytic techniques.

Typically, data science brings together techniques from several fields, acting at the intersection of statistics, mathematics, machine learning and programming. Machine learning in particular has seen a huge growth in interest recently, and whilst the reality may not always live up to the hype, recent advances in this field allow insights to be drawn from data that were previously out of reach.

Knowledge of these techniques means little without the power to implement them. Fortunately, many of these techniques can be easily implemented in Python and R, open source programming languages that benefit from a wide community of experts who contribute code to these projects and make it publicly available. A little bit of knowledge is a dangerous thing, however, and it would be unwise to implement these techniques without knowing how they work, and when and when not to use them.

The best data scientists have a sound knowledge of machine learning and statistical techniques, combined with strong programming capability.


The demands of data science in a corporate setting are very different to those in an immunological research setting. For example, the practice of web scraping (automating the retrieval of information from web pages) is commonly required in a business setting, but in the context of immunology, this would be an irrelevant skill.

Courses and textbooks that focus on data science as a general skill set are not particularly relevant for the researcher in immunology. For this area of research, data science requires a heavy focus on frequentist statistics (e.g. statistical significance testing) and visualisation techniques (e.g. dimension reduction, clustering and alternative methods to display information).

Rather than learning about clustering as a general method for grouping similar objects together, it's far more useful to learn about and apply clustering techniques in a more relevant context such as flow cytometry. Likewise, machine learning concepts such as classification models are easier to understand when introduced in the context of developing a clinical patient model.
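To make the clustering idea concrete, here is a minimal sketch in Python of k-means grouping events by two marker intensities. Everything in it is an assumption made for illustration: the data are synthetic (two simulated populations, not real flow cytometry measurements), the marker names and the function names `kmeans` and `dist2` are hypothetical, and k-means is only one of many clustering methods that could be applied to this kind of data.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, n_iter=20):
    """Plain k-means on a list of (x, y) tuples; returns (centroids, labels).

    Starting centroids are chosen deterministically: the first point, then
    repeatedly the point farthest from all centroids chosen so far. This is
    an illustrative choice, not the only way to initialise k-means.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each event joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Synthetic data (made up for this sketch): 100 "marker-low" events and
# 100 "marker-high" events, each with two marker intensities in arbitrary units.
rng = random.Random(42)
low = [(rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)) for _ in range(100)]
high = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(100)]
events = low + high

centroids, labels = kmeans(events, k=2)
print("Cluster centres (marker A, marker B):", centroids)
print("Events per cluster:", [labels.count(j) for j in range(2)])
```

Real cytometry data are rarely this well separated, so a sketch like this is best read as a way to see the assignment and update steps at work rather than as a gating strategy.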


Data science for immunologists places more emphasis on relevant techniques, ignoring methods that will likely never be needed and instead focussing on those that must form the backbone of an immunologist's analytical toolkit.


Take a look at Examples to learn how you can set your machine up ready for coding.