We live in an era flooded with data. It has never been more possible to explore, uncover and learn about the world through the billions of records that appear every second in the thousands of data centers, data marts and data lakes spread around the globe.
Quantity does not mean quality. This is a maxim that applies to many things in life, and it also applies to data.
What is a “raw” piece of data worth? What is the average of a data set worth? What is an isolated average worth, compared with an average set against some reference? What about correlations? Predictions? Is the value in a single piece of data, or in a piece of data that, when combined with other data, can generate ideas, businesses, feelings or meanings?
We still have a lot to do.
Despite the current abundance of data, driven first by the Industrial Revolution and, more precisely, by the new technologies of the 21st century, we still make the same mistakes over and over again. Why is that?
One of the reasons is the lack of past data, or the lack of accurate data. How many times have we come across a problem that seems new at first, only to realize, when we stop and think about it, that many people must already have gone through the same thing and solved it, and that this history was lost or never properly recorded in the data?
Another reason: incorrect analysis of correct data. Many countries do not even have an educational system that values or invests in improving the teaching of mathematics and languages, which are fundamental for performing adequate data analysis.
Not to mention the many biased analyses that we find out there. In companies. In medicine. At Google. All of this because people want a monopoly on power.
I could mention two or three more reasons. Collecting data, sampling, exploring data, correlation, data analytics, data mining, data modeling, regression, randomness, factor analysis, parametric and nonparametric tests, test statistics, p-values, confidence intervals, the Z table, normality, mean, standard deviation, variability, alpha, beta, gamma, volatility, statistical tests, econometrics, contingency tables, stem-and-leaf plots, design of experiments, Kolmogorov-Smirnov, machine learning, deep learning: there are so many concepts and practices out there that are superficially explained and shamelessly applied, encapsulated in algorithms that influence people's decisions and leave them abandoned to their own luck or to the “luck” of others.
I have a dream. To collect the right data. To obtain the right analysis. Without bias. To have an army of people engaged in collecting the right data (anyone can do this) and being paid for it. To have another army of people capable of analyzing the data correctly. So that the market can access this data democratically and securely, and can “say” how much it would pay for it. And through this whole package bring, why not, the truth.
Nothing but The Truth.
