... what is Data Science and why the world needs you ...

This is a rather long post, which goes against all the rules of effective internet communication. But since you have reached this page, you are probably planning to make a choice that will affect your whole life, and we therefore believe it is worth investing a little time in understanding whether that choice will be a good one.

1. What data science is not

Data Science is often labeled "the sexiest job of the century", and it is estimated that in the coming decade many millions of data scientists will be needed in Europe alone. Data science jobs are not only multiplying rapidly but are also paid about 30% more than other jobs in the ICT galaxy. Hence the first question to ask is: what is data science? To answer it, let us start in the modern way which, right or wrong, is always Wikipedia:
"... data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data. Data science is related to data mining, machine learning and big data. Data science is a 'concept to unify statistics, data analysis and their related methods' in order to 'understand and analyze actual phenomena' with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, domain knowledge and information science."

A quick search on the internet turns up scores of Venn diagrams that combine the above areas of expertise in different ways and always place Data Science at the intersection of something else, the specific combinations depending on the cultural backgrounds and biases of the people who drew them. For instance, some include "business analytics" as a component, others "process automation", and so on. This is clearly wrong, since these two examples are just two of the hundreds of fields that make use of data science.
It is therefore much safer to recognize that these interpretations are biased and that, although of some historical value, they are obsolete and have very little to do with what data science actually is. At most, they can be used to highlight the competences that different experts regard as the necessary cultural background for a Data Scientist. We shall come back to this later.
Unfortunately, these diagrams are not innocuous, since they help spread the (absolutely) false idea that, to become a data scientist, it is enough to take a couple of online courses or a short master's program to fill in a few gaps. The reasoning is simple: "I already have a degree in something, for instance economics, and with little effort I can acquire the skills needed to become a data scientist just by taking a course in Python or in introductory machine learning". Behind this carefully steered misunderstanding lie both widespread multi-million-dollar speculation and a mountain of frustration experienced by the many who realize that there is much more to it. Not convinced? Think about this: ... no one would ever think of becoming an engineer, a physicist or a chemist just by taking a couple of short online courses...

2. What data science is

Data are the engine of human scientific culture. From the earliest times, human beings have collected data and used them to understand, simplify and order the plethora of phenomena and events affecting their lives. Along the way, this drive to understand and control nature through measurement led to the birth of new sciences. The arithmetic of counting sheep led to the birth of mathematics, the attempt to explain measured phenomena (data) led to the birth of physics, and so on. Statistics stemmed from mathematics out of the need to analyse relatively large amounts of data, etc. It is worth noting that, while these processes were taking place, no one realised that he/she was witnessing the birth of a new science. The early "Philosophers of Nature" did not realise that they were sowing the seeds of Biology, Physics, etc. The idea of mathematics, for instance, came long after its first instances: it came when the body of knowledge needed to perform "mathematics" had become so large as to require ad hoc training, and a whole life was not enough to master the field. Only at that point did Mathematics and Geometry break away from the Philosophy of Nature to become separate fields.

Nowadays, we are all in the same position as those early philosophers of nature, and we are witnessing the birth of the new discipline of "Data Science". But, unlike in the past, when the transition could take hundreds of years, this new science is blooming at an incredible pace right before our eyes. Twenty years ago, data science could still be taught within the framework of ordinary university degrees just by adding a couple of courses. Nowadays this has become impossible. There is too much to know and too many competences to acquire: the new science of data can be learned only through years of hard study, and no one can hope to master all its aspects even in a whole life of passionate work. In other words, Data Science is no longer at the intersection of other fields or, if it is, it is so only in the same way that physics is at the intersection of mathematics, philosophy of nature, statistics, metrology, epistemology, etc., or that "computer science" is at the intersection of electronics, mathematics, logic and programming.
For Data Science, the turning point came at the beginning of the 1990s, when the internet revolution ushered humankind into the so-called "big data" and "artificial intelligence" era, and the need to extract useful information from an ever-growing ocean of data drastically changed the scenario. In other words, Big Data and the Internet stand to Data Science as Galileo's work stands to Physics. This transition from "intersection of something else" to science was splendidly captured in the visionary and foundational book "The Fourth Paradigm" [1]. The book builds on the vision of Jim Gray (a pioneer of relational database systems and Turing Award winner), who clearly realised that everything was about to change on a very short time scale because of the data deluge produced by new information technologies and new generations of sensors and detectors. He therefore introduced "Data Driven Discovery", or Data Science, as the "fourth methodological paradigm" of science and human knowledge. These methodological paradigms were driven by the increasing complexity of our representation of the world and are:
  1. empirical: the careful collection of data about natural phenomena and other events (3rd millennium BC).
  2. theoretical: the use of geometry and mathematics to describe the trends, correlations, etc. present in the data (17th century).
  3. computational: the advent of computers and numerical simulations as tools to understand and describe phenomena too complex to be approached analytically (1950s).
  4. data driven: the capability to capture directly from data an empirical knowledge of the world much deeper and more complex than what is accessible to the human brain (now).
Nowadays, Data Science is a new science in its own right. It has its own methodology and tools, its well-defined scope, its journals and professional societies, and its specific problems. And, as happens to all newborn "babies", Data Science is growing incredibly fast. In this specific historical moment, as happened to other disciplines in their early stages, the pioneers of this new field come from all walks of life: economics, physics, mathematics, statistics, engineering, geology, the life sciences, etc. They come from different backgrounds but all share a common need to cope with an avalanche of data. But we are not "Data Scientists"; rather, we are the makers of "Data Science".
In the future, you and your academic and professional siblings will be the Data Scientists. Not professionals borrowed from engineering, physics or chemistry, but data scientists: the most sought-after profession of this century.

3. Why "data science" and not "data sciences"

Data Science applications cover almost every field of human endeavour: finance, marketing, logistics, automation of industrial processes, remote sensing, the internet of things, digital libraries, the human and life sciences, fundamental sciences, meteorology, geophysics and so on. Wherever there are lots of data, there are data science applications. In the recent past, domain experts coped with the growing amount of data by gradually incorporating into their know-how bits and pieces of machine learning, statistical learning, etc. This led to the birth of the so-called X-informatics (where X stands for bio, astro, geo, etc., according to the specific domain of application). More recently, it has become clear that these fields use the same methods and tools, adapting them to their own specific needs. The methods and the methodology, however, are the same, and this is made increasingly evident by so-called "methodological transfer": the export of methods developed for the needs of one domain to other, very different domains, a practice that is becoming increasingly common.
This common ground is what makes data science a science in its own right.

4. What makes a good "data scientist"

Here the Venn diagrams are useful. Data Science needs a strong foundational and methodological background in statistics, computer science and machine learning. Statistics stands to Data Science as Mathematics stands to Physics: in some sense, statistics, and in particular statistical learning, is the language of Data Science. Computer science provides the environment in which data are accessed and distributed, and in which algorithms are implemented and data science is actually performed. Machine learning, in all its countless flavors, is the core of the business. But this is only part of what a data scientist ought to know.
1. Among the many tasks a data scientist can perform, a common one is "... to provide understandable support to the decisional process". In other words, data scientists provide crucial information to companies, public administrations and governments: information used to make decisions that are likely to affect people's lives and that, when wrong, can cause enormous damage. Knowing the law is not enough: this responsibility requires the adoption of a code of conduct grounded not only in laws but in Ethics. To quote Spider-Man: "with great power comes great responsibility" [2].
2. In the modern world, the problems to be solved are often complex and difficult to formalize. In a working environment, be it academic or private, the data scientist stands at the boundary between a specific domain and the data. To find the optimal solution and to evaluate the quality of the results, a data scientist must be able to talk to domain experts and to understand the overall nature and meaning of the problems. This requires additional skills, which must also be acquired.
3. The outcome of data-driven methods is often complex. The Data Scientist needs to know how to communicate the results of his/her work effectively. To do so, he/she must exploit all available tools: from effective infographics to virtual reality to careful reporting.
4. Last but most important: a data scientist needs to be passionate about data science. The field evolves at an incredible pace, and remaining competitive requires a continuous effort to stay up to date.
The Master's program you are about to enroll in tries to achieve all the above goals. To find out how, keep browsing the site.


[1] - The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey, Stewart Tansley and Kristin Tolle (eds.), Microsoft Research, 2009
[2] - Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Cathy O'Neil, Broadway Books, 2016