Matrix profile, an approximation

 

CATEGORY

Algorithms

DATE

31 August

TIME

5 Minutes

When it comes to data science, the ability to process large amounts of data quickly and efficiently is essential.

That’s where Matrix Profiles can be useful. By comparing every fixed-length window of a time series with its most similar window elsewhere in the same series, it becomes possible to:

 

  •  Identify new patterns and similarities in your data

  •  Visualize data in new ways

  •  Make more informed decisions using your knowledge

Matrix Profile

 

CONCEPT

The matrix profile is a technique for identifying recurring and unusual patterns in large data sets, most often time series. Once found, these patterns can be used to improve model performance, for example as input features or as flags for unusual stretches of data. They can help in the selection of effective strategies for solving problems or making decisions in finance, industry, science or any other field.

Matrix profiles are commonly used in time series analysis, for example on daily or monthly weather records or company financial reports. They can also be used in conversion analysis to study how an individual reacts to various offers, e.g., a unique offer within a product or service. A matrix profile consists of two vectors: the profile and the profile index. For each fixed-length window of the series, the profile stores the distance to that window's most similar window elsewhere in the series, and the profile index stores where that nearest neighbor is located.
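As a rough illustration of how a matrix profile can be computed, here is a minimal brute-force sketch. In the standard formulation, each window of length m is compared with every other window using the z-normalized Euclidean distance, and the smallest distance and its position are recorded. The function names, the toy series, and the exclusion zone of m // 2 are assumptions made for this example; production libraries use much faster algorithms.

```python
import numpy as np

def znorm(x):
    """Scale a window to zero mean and unit variance (flat windows become zeros)."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)

def naive_matrix_profile(series, m):
    """Brute-force matrix profile, O(n^2 * m). For illustration only."""
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - m + 1
    windows = np.array([znorm(series[i:i + m]) for i in range(n_windows)])
    exclusion = m // 2  # assumed radius for skipping trivial self-matches
    profile = np.full(n_windows, np.inf)
    index = np.full(n_windows, -1)
    for i in range(n_windows):
        # Distance from window i to every window in the series.
        dists = np.linalg.norm(windows - windows[i], axis=1)
        # Ignore the window itself and its close, overlapping neighbors.
        lo, hi = max(0, i - exclusion), min(n_windows, i + exclusion + 1)
        dists[lo:hi] = np.inf
        j = int(np.argmin(dists))
        profile[i], index[i] = dists[j], j
    return profile, index

# Toy series: the shape [0, 1, 3, 1, 0] occurs at positions 2 and 12.
series = [5, 2, 0, 1, 3, 1, 0, 7, 4, 9, 6, 8, 0, 1, 3, 1, 0, 2, 6, 3]
profile, index = naive_matrix_profile(series, m=5)
print(index[2], index[12])  # each repeated window points at the other
```

The two copies of the repeated shape find each other as nearest neighbors, which is exactly the kind of pattern a low profile value signals.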

The main idea is that we can use machine learning techniques to automatically analyze a large amount of information (i.e., data in domains not covered by human expertise) and surface hidden patterns. This is particularly valuable for difficult data sets, such as cancer diagnosis records or insurance claims. For example, using current data from anonymized clinical trials, we can infer how patients are faring with respect to their underlying medical condition.

There are three general types of matrix profiles:

  • static
  • dynamic
  • interactive (2D or 3D)

A static matrix is mainly composed of log aggregates, that is, a stable statistical distribution of values. A dynamic matrix contains several transformation rules, evaluated mainly with respect to the initial conditions: the values occur with probabilities p_1, …, p_n, so the final value is a function of the initial conditions, influenced by (i) the clustering criterion and (ii) the distance function.

An interesting property of the matrix profile is that it grows with the number of data elements it covers: each additional data point adds one more window to compare. We can use this property to design new algorithms. For example, algorithms that relate user, place, and event semantically can be proposed using only the first three dimensions of the matrix. If you give someone “location-tagged” data points, the resulting profile is more relevant to them, because they can quickly locate those points.
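The growth of the profile with the data can be made concrete. The notation here is an assumption introduced for this note: T is a series of length n, m is the window length, T_{i,m} is the window of length m starting at position i, d is the z-normalized Euclidean distance, and e is the radius of an exclusion zone that rules out trivial matches between overlapping windows.

```latex
\[
|P| = n - m + 1,
\qquad
P_i = \min_{j \,:\, |i - j| > e} d\left(T_{i,m},\, T_{j,m}\right),
\qquad
I_i = \operatorname*{arg\,min}_{j \,:\, |i - j| > e} d\left(T_{i,m},\, T_{j,m}\right)
\]
```

Under these definitions, a brute-force computation compares on the order of (n - m + 1)^2 window pairs at a cost of O(m) each, which is why practical implementations rely on faster algorithms as n grows.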

Matrix profiles are a powerful statistical tool for any business. As more data flows in, it becomes easier to spot trends. Disrupting a trend can be as simple as identifying which customers are jumping ship. Using data-driven analysis, you can get a feel for what your audience likes, and then you can launch campaigns focused on those themes. The possibilities are nearly endless—and profitable.

Time series data allow for the reconstruction of past events by identifying non-zero residuals in the data. Potential users of time series analysis include historians, economists, political scientists, and statisticians. A matrix profile is an efficient method for extracting features from time series without having to resort to manual filtering or weighting.
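To show what feature extraction can look like in practice, the sketch below pulls two common features out of a matrix profile: the motif (the window with the closest match, at the profile minimum) and the discord (the window least like any other, at the profile maximum). The synthetic sine wave, the injected anomaly, the function name, and the exclusion zone of m // 2 are all assumptions made for this example; the vectorized brute-force computation is only suitable for small inputs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def matrix_profile(series, m):
    """Vectorized brute-force profile. Memory is O(n^2 * m): toy sizes only."""
    w = sliding_window_view(np.asarray(series, dtype=float), m)
    std = w.std(axis=1, keepdims=True)
    # z-normalize each window; guard against division by zero on flat windows.
    z = (w - w.mean(axis=1, keepdims=True)) / np.where(std > 0, std, 1.0)
    # Pairwise Euclidean distances between all z-normalized windows.
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    i, j = np.indices(d.shape)
    d[np.abs(i - j) <= m // 2] = np.inf  # assumed exclusion zone
    return d.min(axis=1), d.argmin(axis=1)

# Synthetic data: a sine wave with period 20 and a flat anomaly injected
# at positions 100..109.
t = np.arange(200)
series = np.sin(2 * np.pi * t / 20)
series[100:110] = 2.0

profile, index = matrix_profile(series, m=20)
motif = int(np.argmin(profile))    # start of the best-matched window
discord = int(np.argmax(profile))  # start of the most unusual window
print("motif at", motif, "matches window at", int(index[motif]))
print("discord at", discord)
```

Because the undisturbed sine repeats exactly, its windows have near-zero profile values, while the discord falls on a window that overlaps the injected anomaly. No manual filtering or weighting is involved: the only parameter chosen by hand is the window length m.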