Time Series Similarity Search Using Textual Approximation

We developed a novel time series search and classification method built on textual approximation. The motivation was to develop a single technique that produces good results across many domains. The method reuses existing information retrieval and document classification models to retrieve and classify time series. A time series is a sequence of numbers; our model represents each time series as a text document and then applies existing document retrieval and classification algorithms and data structures to find similar documents, each of which represents a time series.
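To make the idea of a time series as a text document concrete, here is a minimal Python sketch. It is not the method's actual encoding: the function name `to_document`, the four-letter alphabet, the equal-width bins over z-scores, and the sliding-window "words" are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def to_document(series, alphabet="abcd", word_len=3):
    """Turn a numeric series into a space-separated string of symbol 'words'.

    Illustrative only: the binning scheme and word length are assumptions,
    not the encoding used in the original work.
    """
    mu = mean(series)
    sigma = pstdev(series) or 1.0  # fall back to 1.0 for a flat series
    k = len(alphabet)
    symbols = []
    for x in series:
        z = (x - mu) / sigma
        # Equal-width bins over z in [-2, 2]; values outside are clamped.
        idx = int((z + 2.0) / 4.0 * k)
        symbols.append(alphabet[min(max(idx, 0), k - 1)])
    # Each sliding window of word_len symbols plays the role of one word.
    words = ["".join(symbols[i:i + word_len])
             for i in range(len(symbols) - word_len + 1)]
    return " ".join(words)


if __name__ == "__main__":
    print(to_document([1, 2, 3, 4, 5, 6]))
```

Once every series is a string of such words, any standard document index or classifier can be applied to it unchanged, which is the appeal of the approach.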

One of the biggest problems with time series is managing their high data volume, and building a scalable system that can handle massive data is difficult. We used several algorithms from image processing to filter the data and select a small set of important points that characterize the original time series. This reduces the high dimensionality of a time series to a lower one without losing its characteristics. Modern search engines use such techniques and heuristics to retrieve and classify documents.
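The filtering step can be sketched as follows, assuming a simple moving average for smoothing and the magnitude of the discrete second difference as the measure of how important a point is. The helper names and the `keep` parameter are hypothetical, and the actual selection rule used in this work may differ.

```python
def moving_average(series, window=3):
    """Simple moving average; the output is shorter than the input by window - 1."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def second_difference(series):
    """Discrete second difference x[i+1] - 2*x[i] + x[i-1] for interior points."""
    return [series[i + 1] - 2 * series[i] + series[i - 1]
            for i in range(1, len(series) - 1)]


def important_points(series, keep=5):
    """Indices of both endpoints plus the `keep` interior points with the
    largest absolute second difference (an assumed selection rule)."""
    d2 = second_difference(series)
    ranked = sorted(range(len(d2)), key=lambda i: abs(d2[i]), reverse=True)[:keep]
    interior = [i + 1 for i in ranked]  # d2[i] belongs to series[i + 1]
    return sorted(set([0, len(series) - 1] + interior))


if __name__ == "__main__":
    raw = [0, 1, 2, 3, 2, 1, 0]
    print(important_points(raw, keep=1))
```

Connecting the retained points with straight lines gives a crude piecewise linear approximation of the original series using far fewer values.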

We used semi-supervised and unsupervised machine learning algorithms to train our system, mainly applying different types of clustering for automatic document classification. We then used the trained system to classify and retrieve time series documents from different domains. Our focus was on improving time series classification accuracy, and our method achieved high recall in the classification task.

Our proposed method combines algorithms from several fields. For data filtering, we used the second difference, moving average, and piecewise linear approximation. For the textual representation of time series, we used tf-idf and symbol sequences, both of which are widely used in information retrieval. Clustering is used for automatic document classification.
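As a rough illustration of how tf-idf supports similarity search over symbol-sequence documents, the sketch below weights each token and ranks documents by cosine similarity. The function names are hypothetical, the idf formula is one common smoothed variant rather than necessarily the one used here, and the token lists stand in for whatever symbol words the encoding produces.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """documents: list of token lists. Returns one {token: weight} dict each."""
    n = len(documents)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in documents for tok in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({
            # Smoothed idf (an assumed variant): log((1 + n) / (1 + df)) + 1
            tok: (count / len(doc)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(query_index, vectors):
    """Indices of the other documents, most similar to the query first."""
    others = [i for i in range(len(vectors)) if i != query_index]
    return sorted(others,
                  key=lambda i: cosine(vectors[query_index], vectors[i]),
                  reverse=True)


if __name__ == "__main__":
    docs = [["abb", "bbc"], ["abb", "bbc"], ["ddc", "dcc"]]
    print(most_similar(0, tfidf_vectors(docs)))
```

In a larger system the same vectors could feed a clustering algorithm, so that retrieval and classification share a single representation.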

We ran our classification tests on twenty different domains. Our method performed well compared to other existing methods.

Last update: August 2, 2014