Matrix Profile


Adrián Carrio

31 August 2021 | 5 minutes

When it comes to data science, the ability to process large amounts of data quickly and efficiently is essential.

That’s where the Matrix Profile can be useful. For every subsequence of a time series, it records the distance to that subsequence's nearest neighbor elsewhere in the series, which makes it possible to:

  • Identify new patterns and similarities in your data
  • Visualize data in new ways
  • Make more informed decisions using your knowledge

Matrix Profile is a technique for finding recurring and unusual patterns in large time series data sets. Once found, these patterns can be used to improve model performance in a variety of ways. They can help in selecting effective strategies for solving problems or making decisions in finance, industry, science, or any other field.

Matrix profiles are commonly used in time series analysis, for example on daily or monthly weather records or company financial reports. They can also be used in conversion analysis to predict how an individual will react to various offers, e.g., a unique offer within a product or service. A matrix profile consists of two vectors: the profile itself and the profile index. For each subsequence of a chosen length, the profile holds the distance to its nearest neighbor elsewhere in the series, and the profile index holds the position of that neighbor.
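To make the idea concrete, here is a minimal brute-force sketch of a matrix profile under its usual definition: for every subsequence of length m, the z-normalized Euclidean distance to its nearest non-overlapping neighbor, plus the position of that neighbor. The function names and the half-window exclusion rule are assumptions made for illustration. Production libraries use much faster algorithms than this quadratic loop.

```python
import math


def znormalize(window):
    """Return a z-normalized copy of a window (zero mean, unit variance)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        # Constant window: map it to zeros to avoid dividing by zero.
        return [0.0] * n
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Brute-force matrix profile, O(n^2 * m). For illustration only."""
    count = len(series) - m + 1  # number of subsequences of length m
    if m < 2 or count < 2:
        raise ValueError("need m >= 2 and at least two subsequences")
    windows = [znormalize(series[i:i + m]) for i in range(count)]
    # Assumed rule: ignore neighbors closer than m // 2 positions,
    # so a subsequence is not matched with itself or a near-copy.
    exclusion = max(1, m // 2)
    profile = [math.inf] * count
    index = [-1] * count
    for i in range(count):
        for j in range(count):
            if abs(i - j) < exclusion:
                continue
            d = math.dist(windows[i], windows[j])
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


# A short periodic series: every window repeats four steps later.
series = [0, 1, 0, -1] * 3
profile, index = naive_matrix_profile(series, m=4)
print(len(profile))  # 12 - 4 + 1 = 9 values
```

Low profile values mark subsequences that repeat somewhere else in the series. High values mark subsequences with no close match.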

The main idea is that we can use machine learning techniques to automatically analyze large amounts of information (e.g., data in domains not covered by human expertise) and provide insights into hidden patterns. This is particularly useful for extracting insights from difficult data sets, such as cancer diagnoses or insurance claims. For example, using current data from anonymized clinical trials, we can infer how patients are doing with respect to their underlying medical condition.

There are three general types of matrix profiles:

  • static
  • dynamic
  • interactive (2D or 3D)

A static matrix is composed mainly of log aggregates, that is, values with a stable statistical distribution. A dynamic matrix contains several transformation rules, evaluated mainly with respect to the initial conditions: the values occur with probabilities p_1, …, p_n, so the final value is a function of the initial conditions, influenced by (i) the clustering criterion and (ii) the distance function.

An interesting property of the Matrix Profile is that it grows with the number of data elements it covers: a series of n points analyzed with subsequences of length m yields a profile of n − m + 1 values. We can use this property to design new algorithms. For example, algorithms that relate user, place, and event semantically can all be proposed using only the first three dimensions of the matrix. If you give someone “location-tagged” data points, their profile will be much more relevant to them, because they can locate those points quickly.

Matrix profiles are a powerful statistical tool for any business. As more data flows in, it becomes easier to spot trends. Disrupting a trend can be as simple as identifying which customers are jumping ship. Using data-driven analysis, you can get a feel for what your audience likes, and then launch campaigns focused on those themes. The possibilities are nearly endless, and profitable.

Time series data allow past events to be reconstructed by identifying non-zero residuals in the data. Potential users of time series analysis include historians, economists, political scientists, and statisticians. A matrix profile is an efficient way to extract features from a time series, such as repeated patterns (motifs) and unusual subsequences (discords), without resorting to manual filtering or weighting.
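As an illustration of this kind of feature extraction, the sketch below reads two common features off an already computed profile: the best-matching pair of subsequences (the lowest value) and the most isolated subsequence (the highest value). The function name and the sample numbers are hypothetical and chosen only for demonstration. They are not results from real data.

```python
import math


def motif_and_discord(profile, index):
    """Return ((i, j), k): the closest pair of subsequences and the most isolated one.

    profile[i] is the distance from subsequence i to its nearest neighbor,
    and index[i] is the position of that neighbor.
    """
    finite = [i for i, d in enumerate(profile) if math.isfinite(d)]
    if not finite:
        raise ValueError("profile has no finite values")
    # Lowest distance: this subsequence has a near-identical twin.
    motif_start = min(finite, key=lambda i: profile[i])
    # Highest distance: nothing else in the series resembles it.
    discord_start = max(finite, key=lambda i: profile[i])
    return (motif_start, index[motif_start]), discord_start


# Hypothetical profile and profile index, for demonstration only.
profile = [0.9, 0.2, 1.1, 3.4, 1.0, 0.2, 0.8]
index = [4, 5, 6, 0, 0, 1, 2]

pair, discord = motif_and_discord(profile, index)
print(pair, discord)  # (1, 5) 3
```

No thresholds or hand-tuned weights are involved: both features come straight from the minimum and maximum of the profile.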

If you wish to learn more or have any questions, please do not hesitate to contact us here. Let’s level up your time series data analysis!

 

Daniel Ramírez

Engineer

Daniel is a Data Engineer for Shapelets. He is an integral part of the back-end development team, where we develop a high-quality platform ensuring the best product design with valuable functionalities for data scientists. Daniel supports the team with his diverse background in Software Engineering.