Lessons learned from leading SciSports’ Data Analytics team
The Data Analytics team combines best practices and tools from data analysis, machine learning and software engineering to produce actionable football metrics for the Insight platform. This blog post presents our approach to developing novel football analytics tools and sheds some light on our current technology stack.
Developing football analytics tools requires skills and expertise from three different domains: machine learning, programming, and football. To build tools that address the needs of football practitioners, the Data Analytics team adopts best practices from each of these domains. In this blog post, we present two important lessons that we have learned along the way.
Start building an end-to-end solution
A crucial aspect of developing useful and insightful analytics tools is to involve the intended end users from the very beginning. In the case of our Insight platform, the end users are mostly scouts, technical directors, and player agents. To build impactful analytics tools for them, we need to address two big challenges. The first challenge is to genuinely understand the needs of our end users to ensure we build the right tools. The second challenge is to present the results produced by our tools in an intuitive way, preferably in a clear and appealing visualization or animation.
To ensure that we build the right tools in the right way, we adopt a rapid-prototyping process, which aims to turn a candidate solution to a certain need into a prototype in a short period of time. This approach allows us to incorporate feedback and address comments from end users in an iterative fashion from the beginning. A machine learning project typically consists of several steps, such as collecting, cleaning and preprocessing the data, developing the machine learning models, and visualizing the results. To speed up the process, we first develop a minimalist component for each of these steps, and then improve the components iteratively based on the feedback we receive.
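To make the idea of minimalist components concrete, here is a rough sketch of what a first end-to-end prototype could look like. It is an illustration only, not our production code: the toy shot records, the function names, and the 16.5-metre threshold are all assumptions made up for this example. Each step is deliberately naive so that the whole chain runs from data to output and can be shown to end users early.

```python
from collections import defaultdict

# Hypothetical toy data: distance to goal in metres and whether the shot was a goal.
RAW_SHOTS = [
    {"distance": 8.0, "goal": 1},
    {"distance": 10.0, "goal": 0},
    {"distance": 25.0, "goal": 0},
    {"distance": None, "goal": 1},  # incomplete record
    {"distance": 30.0, "goal": 0},
    {"distance": 6.0, "goal": 1},
]


def collect():
    """Step 1: collect the data (here: return the hard-coded toy records)."""
    return list(RAW_SHOTS)


def clean(shots):
    """Step 2: clean the data by dropping records with a missing distance."""
    return [shot for shot in shots if shot["distance"] is not None]


def fit(shots, threshold=16.5):
    """Step 3: a deliberately naive model, the goal rate for close and far shots."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [goals, shots]
    for shot in shots:
        bucket = "close" if shot["distance"] <= threshold else "far"
        totals[bucket][0] += shot["goal"]
        totals[bucket][1] += 1
    return {bucket: goals / count for bucket, (goals, count) in totals.items()}


def report(model):
    """Step 4: present the results (plain text stands in for a visualization)."""
    return "\n".join(f"{bucket:>5}: {rate:.2f}" for bucket, rate in sorted(model.items()))


def run_pipeline():
    """Chain the four minimalist components into one end-to-end prototype."""
    return report(fit(clean(collect())))


if __name__ == "__main__":
    print(run_pipeline())
```

Because every step sits behind its own small function, any one of them can later be swapped for a better version, for example a real data source or a proper model, without touching the others.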
Use the right tools for the job
Developing analytics tools in a rapid-prototyping fashion requires appropriate tools that support our desired workflow. Since developing analytics tools involves both programming and machine learning, we aim to combine the best-suited tools from both domains. As a result, we have built our tool and technology stack around the flexible yet powerful Python ecosystem for machine learning and data science. The Python ecosystem offers a plethora of well-maintained libraries and frameworks that often allow us to address challenging tasks in just a few lines of code.
The most popular tools in the Data Analytics team are Jupyter Notebook, pandas, and scikit-learn. Jupyter Notebook combines Python code and visual elements such as figures and tables in a single document, which simplifies the task of gathering feedback from end users. The pandas library offers easy-to-use data structures and data analysis tools, whereas the scikit-learn library provides a uniform interface to machine learning and data analysis tools. We also increasingly leverage the power of deep learning frameworks such as PyTorch to better handle the large amounts of data that we have at our disposal. Once a prototype has become sufficiently mature, we turn it into a Python module using PyCharm or Visual Studio Code before it gets deployed to Amazon Web Services (AWS).
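As a small illustration of what the uniform scikit-learn interface means in practice, the sketch below trains two different model types on the same pandas DataFrame through identical `fit` and `predict_proba` calls. The data, the column names, and the choice of models are hypothetical and picked purely for this example. They do not describe the models behind our platform.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one row per shot.
shots = pd.DataFrame({
    "distance": [6.0, 8.0, 10.0, 12.0, 18.0, 22.0, 25.0, 30.0],
    "angle": [60.0, 45.0, 40.0, 35.0, 20.0, 15.0, 12.0, 10.0],
    "goal": [1, 1, 0, 1, 0, 0, 0, 0],
})
X = shots[["distance", "angle"]]
y = shots["goal"]

# Two very different estimators that share the same fit/predict_proba interface.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=2, random_state=0),
}

# A new, unseen shot to score (values are made up).
new_shot = pd.DataFrame({"distance": [9.0], "angle": [42.0]})

predictions = {}
for name, model in models.items():
    model.fit(X, y)
    # Column 1 of predict_proba holds the probability of the positive class (goal = 1).
    predictions[name] = float(model.predict_proba(new_shot)[0, 1])

for name, probability in predictions.items():
    print(f"{name}: estimated goal probability {probability:.2f}")
```

Swapping in another estimator only requires changing the entry in the `models` dictionary, which is what makes it quick to compare candidate models while prototyping.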
The lessons discussed in this blog post have enabled our team to speed up its research and development efforts. Our tutorial on building an expected-goals model already highlighted some of these lessons in a practical use case. The tutorial presents a working end-to-end solution, albeit with a minimal version of each of the steps that need to be taken. Furthermore, the tutorial shows how we leverage the power of Jupyter Notebook and the Python ecosystem for machine learning and data science.