# Unpacking Sequences¶

When working with time series, a good set of visualization techniques are quite important as, unfortunately, very few processes beat the human eye when it comes to recognising patterns and trends.

Whilst common visualization techniques, line charts and derivates, are usually satisfactory for quick exploration, a few techniques could help us improving discovery workflows with tools already present in our tool belt.

Let’s use a couple of series that ship with *Shapelets-Compute* as a driving example: Both series
correspond to energy forecasts for the Spanish market in 2016 1; one describes day-ahead energy prices
and the other corresponds to the solar production forecast. Both series are defined in 1h intervals
and describe the entire 2016. To load the series:

```
>>> import shapelets.compute as sc
>>> from shapelets.data import load_dataset
>>> day_ahead_prices = load_dataset('day_ahead_prices')
>>> solar_forecast = load_dataset('solar_forecast')
```

The first thing you notice on these series is that they are dense columnar vectors, which is the
default representation for series in *Shapelets-Compute*. To check it, query the shape property
of the arrays which will return how many elements per dimension each of the arrays have.

```
>>> day_ahead_prices.shape
(8760, 1)
>>> solar_forecast.shape
(8760, 1)
```

Plotting these sequences with matplotlib is a straightforward exercise as follows:

```
import matplotlib.pyplot as plt
import shapelets.compute as sc
from shapelets.data import load_dataset
day_ahead_prices = load_dataset('day_ahead_prices')
solar_forecast = load_dataset('solar_forecast')
fig, ax = plt.subplots(2, figsize=(18, 16), sharex=True)
ax[0].plot(day_ahead_prices)
ax[0].set_title('Day-ahead energy prices')
ax[0].set_ylabel("EUR/MW")
ax[1].plot(solar_forecast)
ax[1].set_title('Solar production forecast')
ax[1].set_ylabel("MW")
ax[1].set_xlabel("Hour of the year")
plt.show()
```

The **day-ahead market** is an exchange for short-term energy contracts where the trades are very much
driven by a closed, small set of participants who are frequently also energy producers. The
**solar production forecast** is an aggregated view of the energy production capabilities and in the
case of solar energy, it is very much affected by atmospheric factors and the effective number of
daylight hours.

Our goal with this *visualization exercise* is to quickly ascertain if there is any form of influence in
the day-ahead market due to the built-in expectation of solar energy production. There are many tools
one could use to perform this task, from descriptive statistics to more complex and elaborated analytical
pipelines based on `matrix_profile`

or causality analysis. These
are not of concern for this tutorial as our objective is to quickly visualize the possible relationship
between both signals.

The key idea for this visual analysis is to move from one-dimensional representation to a
two dimensional, matrix, representation, by folding hours and days into two different axes. This allows
us to interpret time series or sequences as images, making it easier to spot subtle or faint features. Moving
to a matrix representation opens the door to use conventional linear algebra techniques,
like `svd`

, to reason quickly about sequence dynamics and weights.

Since both signals are defined in 1h intervals and the energy market exhibits a lot of patterns
linked with human activities, it seems only natural to group data of the same day together, leaving us
with hours in the `y`

axis and days in the `x`

axis.

In Shapelets, you can use the method `unpack`

to create rolling and batching
windows with any form of specification. For a bached window, the window size and strides along the first
dimension are always the same (in this case, 24); thus, creating these matrices is as simple as:

```
>>> day_ahead_prices_by_day = sc.unpack(day_ahead_prices, 24, 1, 24, 1)
>>> day_ahead_prices_by_day.shape
(24, 365)
>>> solar_forecast_by_day = sc.unpack(solar_forecast, 24, 1, 24, 1)
>>> solar_forecast_by_day.shape
(24, 365)
```

Now, one could simply use matplotlib’s imshow to represent sequences as images:

```
import matplotlib.pyplot as plt
import shapelets.compute as sc
from shapelets.data import load_dataset
day_ahead_prices = load_dataset('day_ahead_prices')
solar_forecast = load_dataset('solar_forecast')
gb = 24
day_ahead_prices_by_day = sc.unpack(day_ahead_prices, gb, 1, gb, 1)
solar_forecast_by_day = sc.unpack(solar_forecast, gb, 1, gb, 1)
fig, ax = plt.subplots(1, 2, figsize=(20, 8), sharex=True)
img1 = ax[0].imshow(day_ahead_prices_by_day, cmap='binary', aspect='auto')
ax[0].set_title('Day-ahead enery prices')
ax[0].set_ylabel("Hour slot")
ax[0].set_xlabel("Day of year")
fig.colorbar(img1, ax=ax[0])
img2 = ax[1].imshow(solar_forecast_by_day, cmap='binary', aspect='auto')
ax[1].set_title('Solar Production forecast')
ax[1].set_ylabel("Hour slot")
ax[1].set_xlabel("Day of year")
fig.colorbar(img2, ax=ax[1])
plt.show()
```

There is so much already to learn from this visualization, **especially in comparison to** the information
we were able to extract from the usual line charts!

For example, just by looking at the *Solar production forecast* image, one could appreciate:

The production is clearly determined by the times in which the sun rises and sets; bear in mind the timezone of this data is set to UTC and there are daylight saving changes in Spain. Summer days (middle of the image) are longer than in winter periods (sides of the image).

There is less production in the central hours of the day during winter than in summer (color scale in the 12h horizontal line).

There are more weather events impacting production during winter than summer (days with low production are dark vertical lines in the image).

There is a clear contribution of solar production during night hours in summer, due to the excess production during daytime hours and the usage of batteries and other storage technologies.

I do recommend scrolling up and review the original line chart for the solar production forecast. With a simple transformation, we have been able to identify with easy a large amount of features that were hidden in the original chart.

Interestingly, day-ahead prices also show a faint figure, similar to the central oval dominating the solar production forecast.

A simple technique to acerbate these visual clues are derivatives, to find *peaks and valleys* through numerical
differentiation.

In Shapelets, two functions, `diff1`

and `diff2`

, perform
first and second-order differences; we’ll describe a more general setup through convolutions and spectral
differentiation later on when we discuss smoothing.

Applying first and second order differentiation to each day is as simple as invoking the methods:

```
>>> first = sc.diff1(day_ahead_prices_by_day)
>>> second = sc.diff2(day_ahead_prices_by_day)
```

```
import matplotlib.pyplot as plt
import shapelets.compute as sc
from shapelets.data import load_dataset
day_ahead_prices = load_dataset('day_ahead_prices')
solar_forecast = load_dataset('solar_forecast')
gb = 24
day_ahead_prices_by_day = sc.unpack(day_ahead_prices, gb, 1, gb, 1)
solar_forecast_by_day = sc.unpack(solar_forecast, gb, 1, gb, 1)
fig, ax = plt.subplots(2, 2, figsize=(24, 16), sharex=True)
img1 = ax[0, 0].imshow(sc.diff1(day_ahead_prices_by_day), cmap='magma', aspect='auto')
ax[0, 0].set_title('Diff1 day-ahead prices')
ax[0, 0].set_ylabel("Hour slot")
fig.colorbar(img1, ax=ax[0, 0])
img2 = ax[0, 1].imshow(sc.diff2(day_ahead_prices_by_day), cmap='magma', aspect='auto')
ax[0, 1].set_title('Diff2 day-ahead prices')
fig.colorbar(img2, ax=ax[0, 1])
img3 = ax[1, 0].imshow(sc.diff1(solar_forecast_by_day), cmap='magma', aspect='auto')
ax[1, 0].set_title('Diff1 solar forecast')
ax[1, 0].set_ylabel("Hour slot")
ax[1, 0].set_xlabel("Day of year")
fig.colorbar(img3, ax=ax[1, 0])
img4 = ax[1, 1].imshow(sc.diff2(solar_forecast_by_day), cmap='magma', aspect='auto')
ax[1, 1].set_title('Diff2 solar forecast')
ax[1, 1].set_xlabel("Day of year")
fig.colorbar(img4, ax=ax[1, 1])
plt.show()
```

From these visualizations, we can quickly educate our initial guess that both series are related, despite the initial impression the line charts produced. It is not the scope of this intro to visualization techniques to delve into the quantitative aspects of the relationship between both signals; we’ll leave that discussion for the near future.

## Other series¶

Before we continue describing other properties of this simple matrix representation, it will be good to point out this technique also applies rather well to sequences that exhibit mixed behaviours or do not follow strong periodic behaviours.

For example, let’s take the following data corresponding to an electrocardiogram, disconnected at the beginning of the series, left recording whilst being attached to the skin of the patient and, finally, recording heart beats:

```
>>> heartbeat = load_dataset('ecg_heartbeat_av')
>>> heartbeat_matrix = sc.unpack(heartbeat, 32, 1, 32, 1)
>>> heartbeat.shape
(3001, 1)
```

```
import matplotlib.pyplot as plt
import shapelets.compute as sc
from shapelets.data import load_dataset
heartbeat = load_dataset('ecg_heartbeat_av')
heartbeat_matrix = sc.unpack(heartbeat, 32, 1, 32, 1)
fig = plt.figure(figsize=(18, 10))
ax0 = plt.subplot2grid((2,3), (0,0), colspan=3)
ax0.plot(heartbeat)
ax1 = plt.subplot2grid((2,3), (1,0))
ax1.imshow(heartbeat_matrix, cmap='magma', aspect='auto')
ax1.set_title("Matrix")
ax2 = plt.subplot2grid((2,3), (1,1))
ax2.imshow(sc.diff1(heartbeat_matrix), cmap='magma', aspect='auto')
ax2.set_title("Diff1")
ax3 = plt.subplot2grid((2,3), (1,2))
ax3.imshow(sc.diff2(heartbeat_matrix), cmap='magma', aspect='auto')
ax3.set_title("Diff2")
plt.tight_layout()
plt.show()
```

Another interesting example is the accelerometer connected to a dog robot, recording whilst walking on concrete, carpet and concrete again.

```
>>> dog = load_dataset('robot_dog')
>>> dog_matrix = sc.unpack(dog, 60, 1, 60, 1)
>>> dog_matrix.shape
(60, 216)
```

```
import matplotlib.pyplot as plt
import shapelets.compute as sc
from shapelets.data import load_dataset
dog = load_dataset('robot_dog')
dog_matrix = sc.unpack(dog, 60, 1, 60, 1)
fig = plt.figure(figsize=(18, 10))
ax0 = plt.subplot2grid((2,3), (0,0), colspan=3)
ax0.plot(dog)
ax1 = plt.subplot2grid((2,3), (1,0))
ax1.imshow(dog_matrix, cmap='magma', aspect='auto')
ax1.set_title("Matrix 32")
ax2 = plt.subplot2grid((2,3), (1,1))
ax2.imshow(sc.diff1(dog_matrix), cmap='magma', aspect='auto')
ax2.set_title("Diff1")
ax3 = plt.subplot2grid((2,3), (1,2))
ax3.imshow(sc.diff2(dog_matrix), cmap='magma', aspect='auto')
ax3.set_title("Diff2")
plt.tight_layout()
plt.show()
```

Whilst identifying the regime change is possible by looking at a sufficiently enlarged line chart, the matrix visualization makes those changes immediately visible, with the bonus of seeing quite clearly the movement cycle of the robot.

## Making further use of matrices¶

Following the two initial examples, let’s study two other ways in which we can exploit the dimensionality of
the transformation to derive smooth series and decomposition analysis based on `svd`

.

### Smooth Series: Convolution¶

Conventional smooth algorithms usually work over sequences as one-dimensional vector, by applying a form of linear operator over the sequence. Examples of such are moving and exponential averages or, more generally, polynomial interpolators, from simple ones like linear interpolators to complex ones, like cubic splines.

Extending sequences to two dimensions permits the use of convolution operators that are not just
restricted to previous / next values, but to those values n steps away from the current one, being n the number of
points we grouped in the `y`

axis.

In our example with solar forecasting, we could use the values for the previous and next day at the same hour, along side with the values in the previous and next hour to smooth the series by using a 2D convolution operator:

```
>>> filter = sc.array([
... [0, 1, 0],
... [1, 1, 1],
... [0, 1, 0]
... ], dtype= "float32")
>>> filter /= sc.sum(filter) # equalize all weights
>>> r = sc.convolve2(solar_forecast_by_day, filter, 'default') # apply the filter
>>> rr = sc.pack(r, r.size, 1, 24, 1, 24, 1) # reconstruct the signal
```

The centre point in the filter corresponds to the current value, the values at their side correspond to the previous and
next 24h and the values on the middle column corresponds to the previous and next hour (same day) values.
`convolve2`

applies the two-dimensional filter to the signal. In Shapelets, convolve operations
are batched, which means that you can apply simultaneously n number of filters to m number of signals in a single operation
in your CUDA or OpenCL device!

Applying this type of transformation produces a smooth series, whose values have better differentiable profile than the original series, which may be a really good property when using numerical algorithms sensitive to the presence of abrupt changes and discontinuities.

```
import matplotlib.pyplot as plt
import shapelets.compute as sc
from shapelets.data import load_dataset
solar_forecast = load_dataset('solar_forecast')
gb = 24
solar_forecast_by_day = sc.unpack(solar_forecast, gb, 1, gb, 1)
filter = sc.array([
[0, 5, 0],
[5, 1, 5],
[0, 5, 0]
], dtype= "float32")
filter /= sc.sum(filter)
r = sc.convolve2(solar_forecast_by_day, filter, 'default')
rr = sc.pack(r, r.size, 1, gb, 1, gb, 1)
fig, ax = plt.subplots(2, 1, figsize=(18, 10))
ax[0].plot(solar_forecast)
ax[0].set_title("Original Series")
ax[1].plot(rr)
ax[1].set_title("Smooth Series")
plt.show()
points = 24*7
fig, ax = plt.subplots(figsize=(18, 8))
ax.plot(solar_forecast_by_day[:points], label='Original')
ax.plot(rr[:points], label='Smooth')
plt.legend()
plt.show()
fig, ax = plt.subplots(1, 2, figsize=(18, 8), sharey=True)
ax[0].plot(sc.fft.spectral_derivative(solar_forecast_by_day[100:175]))
ax[0].set_title("Original Series")
ax[1].plot(sc.fft.spectral_derivative(rr[100:175]))
ax[1].set_title("Smooth Series")
plt.show()
```

The last two charts show the result of computing the derivative, using `spectral_derivative`

,
zooming the chart on a small section of the signal. It is quite clear that the behaviour of the smoothed signal is quite
good compared with the raw original series, as we hardly can appreciate oscillations and the peaks have fewer edges.

### Applying svd¶

Another advantage of representing sequences as matrices is the usage of linear algebra techniques to perform decomposition,
allowing us to study the inherent structure of the data. `svd`

is a great example of
such transformation as the singular values produced by this transformation highlights the importance of each component and,
by truncating it or ‘lowering the rank’, we could reconstruct the original series without some features that do not provide
a huge amount of information.

Computing the SVD transformation in Shapelets is quite straightforward, benefiting from the acceleration of your GPU or OpenCL device:

```
>>> svd_results = sc.svd(day_ahead_prices)
```

`svd_results`

is an object containing the results of the decomposition. In our case, the matrix \(U\) will contain daily
features, whilst the matrix \(V^T\), will contain yearly features. The diagonal matrix \(S\) contains a sorted
list with the weight of each factor.

```
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import shapelets.compute as sc
import numpy as np
from shapelets.data import load_dataset
day_ahead = load_dataset('day_ahead_prices')
gb = 24
day_ahead_by_day = sc.unpack(day_ahead, gb, 1, gb, 1)
svd_results = sc.svd(day_ahead_by_day)
fig = plt.figure(figsize=(18, 12))
ax0 = plt.subplot2grid((2,3), (0,0), colspan=1)
ax0.bar(range(svd_results.s.size), np.array(svd_results.pct), label='weight')
ax0.plot(svd_results.acc_pct, marker='o', label='accumulated', color='red')
ax0.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))
ax0.set_ylim([0, 1.05])
ax0.set_title('Factor Contribution')
ax0.axhline(y = .9, color = 'g', linestyle='--')
ax0.legend()
ax1 = plt.subplot2grid((2,3), (0,1), colspan=3)
ax1.plot(svd_results.u[:,0])
ax1.set_title('Dominant Hourly Profile')
ax2 = plt.subplot2grid((2,3), (1,0), colspan=4)
ax2.set_title('Dominant Yearly Profile')
ax2.plot(svd_results.vt[0,:].T)
plt.tight_layout()
plt.show()
```

The `SVDResult`

instance returned by `svd`

, has built in support for reconstruct
the original sequence, using fewer factors if desired. For example, reconstructing the original signal using only the first
factor, will result in the following approximation:

```
>>> lr = svd_results.low_rank(1)
>>> reconstructed = sc.pack(lr, lr.size, 1, 24, 1, 24, 1)
```

```
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import shapelets.compute as sc
from shapelets.data import load_dataset
day_ahead = load_dataset('day_ahead_prices')
gb = 24
day_ahead_by_day = sc.unpack(day_ahead, gb, 1, gb, 1)
svd_results = sc.svd(day_ahead_by_day)
lr = svd_results.low_rank(1)
reconstructed = sc.pack(lr, lr.size, 1, gb, 1, gb, 1)
fig, ax = plt.subplots(2, 1, figsize=(18, 10))
ax[0].plot(day_ahead)
ax[0].set_title("Original Series")
ax[1].plot(reconstructed)
ax[1].set_title("From 1st Factor")
plt.show()
points = 24*7
fig, ax = plt.subplots(figsize=(18, 8))
ax.plot(day_ahead_by_day[:points], label='Original')
ax.plot(reconstructed[:points], label='From 1st Factor')
plt.legend()
plt.show()
```

- 1
Energy data is from the ENTSO-E Transparency Platform. If you are looking for a Python interface for their API, entsoe-py. Login and private key is required to access their data services.