Behind the data scientist
Welcome back to Behind the Data Scientist
Hello everyone, welcome again to this series of interviews by Shapelets!
As you know, we will be interviewing experts in the field in order to explore different topics in the data science community. Today, we are pleased to introduce you to Adrián Carrio, our Lead Data Scientist at Shapelets. Adrián has also co-founded ThermoHuman and Dronomy, and he is the author of more than 30 scientific publications and one patent.
During this interview, we will discuss the main challenges data scientists face, their roles, and tips for getting the most out of their work.
Watch the interview below. Don’t miss it!
Could you please share your professional background with us?
Adrián Carrio – I studied Industrial Engineering at the University of Oviedo, in Spain, and worked there as a research scholar for about a year, working on computer vision applied to the steel industry. Then, in 2013, I moved to Madrid to pursue a Ph.D. in Automation and Robotics. This was a very fruitful period: I had the opportunity to learn a lot during my Ph.D. and to apply AI to several tech applications for projects in various industries.
Also during this period, I co-founded ThermoHuman, which applies AI to thermography in order to predict injuries in sports, and I worked as CTO for the Spanish Fintech company Accurate-Quant. Then, in 2021, I joined Shapelets.
Why Data Science?
A.C. – Data Science is quite a powerful set of tools in the sense that it renders the invisible visible. I personally like it very much because it exemplifies very well how the scientific method works. We typically have observations of something we know little about: it may be the way an industrial process evolves, the way people’s opinions are affected by certain events, or the way the human body operates at a very low level.
We then hypothesize about possible relationships between these variables by building models. And at the end, we want to check whether our hypotheses are statistically significant, basically.
So the only thing that modern data science adds to all of this is visualization techniques, which help present and understand the insights we have discovered.
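As an illustration of the hypothesis-checking step described above, here is a minimal sketch of a two-sided permutation test for a difference in means, using only the Python standard library. The function name `permutation_test`, the two measurement groups, and the number of permutations are assumptions made up for this example; they do not come from the interview.

```python
import random
from statistics import mean

def permutation_test(a, b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations

# Hypothetical measurements, invented for illustration only.
group_a = [12.1, 11.8, 12.4, 12.0, 12.3]
group_b = [13.0, 13.4, 12.9, 13.2, 13.1]

p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests that the observed difference would be unlikely if the group labels did not matter. With real data, the choice of test and of significance threshold depends on the problem.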
What do you like the most about your job?
A.C. – What I like the most about being a Data Scientist is being able to address very different problems and create impact in very heterogeneous sectors. Data Science is very interdisciplinary and I have a curious mind, so I really like to dig into many different problems: health, energy, manufacturing industries, marketing. Anything is possible with Data Science.
Who is your role model?
A.C. – Well, I did not really have a role model. To be honest, I do not follow many other data scientists or their work, but rather academic profiles that I consider relevant. There are many of these, of course, and each of them is applying data science to specific fields, but if I had to mention a few, it would probably be the fathers of AI: Geoffrey Hinton, Yoshua Bengio, Yann LeCun, and also Ian Goodfellow.
What advice would you give to data scientists?
A.C. – My advice is to keep things simple. Many of us data scientists, because we are techies, tend to overcomplicate things. We try to use more complex or sophisticated models, and this is usually the opposite of what one is supposed to do. So, as Occam’s razor states: “the simplest explanation is usually the most probable”.
Another piece of advice would be to always try to split large analysis problems into small ones. Start from the easiest approach in order to test and build your baseline, and then increase the complexity gradually, in a controlled way. That way, you control the complexity of the problem instead of letting it control you.
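The "baseline first, then add complexity" advice above can be sketched in a few lines of standard-library Python. The toy data, the helper names `accuracy` and `best_threshold`, and the single-threshold rule are all assumptions invented for this illustration. The sketch also scores models on the same data used to fit them, which a real project would replace with held-out data.

```python
from collections import Counter

# Hypothetical toy data: (feature_value, label) pairs, invented for illustration.
samples = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 1),
           (3.0, 0), (3.5, 0), (4.0, 1), (4.5, 1), (5.0, 1)]

def accuracy(predict, data):
    """Fraction of samples where predict(x) equals the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

# Step 1: the simplest possible baseline, always predict the most common label.
majority_label = Counter(y for _, y in samples).most_common(1)[0][0]
baseline_acc = accuracy(lambda x: majority_label, samples)

# Step 2: one small step up in complexity, a single threshold on the feature.
def best_threshold(data):
    """Pick the observed feature value that gives the best accuracy as a cut-off."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), data))

threshold = best_threshold(samples)
rule_acc = accuracy(lambda x: int(x >= threshold), samples)

print(f"baseline accuracy:  {baseline_acc:.2f}")
print(f"threshold accuracy: {rule_acc:.2f} (cut-off at {threshold})")
```

Any more sophisticated model then has to beat both numbers to justify its extra complexity.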
What are the main challenges for a Data Scientist?
A.C. – One of the biggest challenges for a data scientist who is presented with a new problem is finding high-quality, relevant data. In many cases, however, the data is already given, and you are required to get a proper understanding of the problem you are trying to solve. This is usually an important challenge. Acquiring this domain knowledge is especially challenging if you are not familiar with the specific industry in which you are solving the problem.
What do you value the most in a data analysis technology or software?
A.C. – The features I value the most are the ones that prevent me from wasting time or doing low-value tasks. For example, fixing format issues is something I personally hate to do, so if I come across a tool that prevents me from doing those types of tasks, that’s great.
I also really appreciate libraries that bring a lot of functionality with a few lines of code, and I love all kinds of visualization tools that let you do things easily and that are highly customizable. These are really nice tools to use.
What are the main issues when communicating insights to the business area and stakeholders?
A.C. – For me, the main issues are related to communication. Since you have built your solution using a data scientist’s mindset, it is sometimes hard to explain what you are doing and how you’ve arrived at the results without sounding technical.
Communicating is especially important if you are using black-box-like models. In this case, you usually have to make an extra effort for the stakeholders to actually understand what you are doing and actually believe in the results. In general, you have to be able to put yourself in the stakeholders’ shoes and try to see the problem through their eyes.
You want to use the same jargon that they use and provide the results using the same units and the same metrics if you want them to respect the work that you are doing.
What approach do you follow when working on a data analysis-based project?
A.C. – First, I’d probably want to get a proper understanding of the problem with the stakeholders. So you want to try to understand things at the lowest possible level and learn about the data sources, data relationships, systems, and sensors that are behind them.
Then you look at the data and check for the obvious insights and for missing data. You also have to pay attention to data set imbalance, which is an extremely common issue in data science. Then comes the model-building stage, where you want to test a bunch of simple models and make sure they don’t overfit.
Additionally, if you have a lot of data, you can use more complex models, but it is basically a matter of taking different models and evaluating them. Once you arrive at the right model, you can simply build the visualizations to make your insights easy to grasp. That would be my approach.
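The first data checks mentioned above (missing values and data set imbalance) can be sketched as follows. This is only an illustration in standard-library Python: the records, the column names, and the helper names `missing_fraction` and `class_balance` are hypothetical. In practice a library such as pandas would usually do this work.

```python
from collections import Counter

# Hypothetical records, invented for illustration; None marks a missing value.
records = [
    {"temperature": 21.5, "pressure": 1.01, "label": "ok"},
    {"temperature": None, "pressure": 1.02, "label": "ok"},
    {"temperature": 22.1, "pressure": None, "label": "ok"},
    {"temperature": 23.0, "pressure": 0.99, "label": "fault"},
    {"temperature": None, "pressure": 1.00, "label": "ok"},
]

def missing_fraction(rows):
    """Share of missing (None) values per column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}

def class_balance(rows, target="label"):
    """Share of each class in the target column."""
    counts = Counter(r[target] for r in rows)
    return {cls: n / len(rows) for cls, n in counts.items()}

missing = missing_fraction(records)
balance = class_balance(records)

print("missing per column:", missing)
print("class balance:", balance)
```

A strongly skewed class balance like the one in this toy data is a signal to look beyond plain accuracy when evaluating models.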
What skills does a data scientist need for 2022?
A.C. – Ok, well, with respect to skills, in my opinion a data scientist should have a very strong mathematical background in statistics and linear algebra, and be familiar with things like significance tests and these kinds of techniques. Of course, programming skills in Python, being fluent with libraries like pandas and NumPy. And also, if possible, have some familiarity with machine learning libraries such as scikit-learn, TensorFlow, and PyTorch.
It is also important to be very familiar with visualization libraries like Matplotlib, Plotly, and Altair, or tools like Microsoft Power BI. And you also want to know how to use Excel. It might seem funny, but Excel can be quite useful for quick checks on your data.
I think one of the most relevant skills is communication, having proper communication skills, and being able to explain complex concepts without being too technical. That would be my set of skills.
Digital Marketing Specialist
Say hello to our Digital Marketing Specialist! Fátima’s role at Shapelets is to plan and execute digital marketing strategies and content to creatively develop and optimize our business on different platforms. She specializes in SEO, social media and digital content.