Modeling Recurring Concepts in Single-label and Multi-label Streams

Ahmadi, Zahra

Files

100003220.pdf (4.04 MB)

Date issued

2019

Authors

Ahmadi, Zahra

License

InC-1.0
https://rightsstatements.org/vocab/InC/1.0/

Item

Dissertation

Open Access

Abstract

Today, we have access to vast amounts of data in the form of images, speech signals, structured and unstructured text, and sensor-based signals. Our digital universe is growing quickly. Statistics indicate that 500 million tweets are posted every day, 65 billion messages are sent on WhatsApp per day, and 294 billion emails are sent daily via different platforms. Each self-driving car creates 4 terabytes of data per day. According to a study by Digital Universe, the amount of data produced by humans and machines will exceed 44 billion terabytes by 2020, which means that there will be 5,200 gigabytes of data for every person on earth. It is estimated that by 2025, the amount of data created will rise to 463 million terabytes per day. Processing these data sources and extracting knowledge from them requires proper infrastructure and efficient methods that can analyze them in real time. Data stream mining is the field concerned with developing such scalable and efficient methods, which process data incrementally. Incremental induction from a limited set of observations of an unknown distribution has been the topic of many studies for a long time. Depending on the application, the target can be a single label or a set of labels among which unknown dependencies exist. This problem is challenging enough on its own, yet in many stream mining applications the statistical properties of the input and target variable(s) may change over time in unforeseen ways. This phenomenon is called concept drift. If drift is not detected and handled properly, trained online models quickly become obsolete. However, drifts are not well defined and may involve any change in the statistical properties of the data, which makes the prediction problem harder. In this thesis, our overall focus is on modeling one type of drift, called recurrent concepts.
Recurrent concepts should be captured separately, because most stream mining methods employ a forgetting mechanism during learning and discard outdated knowledge. To this end, we propose the GraphPool and multi-label GraphPool frameworks for single-label and multi-label data streams, respectively. These frameworks keep a pool of concepts and their transitions in a first-order Markov chain to recover quickly from drifts in streams with periodic behavior. In the course of designing such a framework for multi-label streams, we also develop an efficient algorithm for classifying stationary multi-label streams. To show the effectiveness of our methods, we conduct an extensive set of experiments on both synthetic and real-world data.
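
The idea of keeping a pool of concepts together with their transitions in a first-order Markov chain can be illustrated with a minimal sketch. This is not the GraphPool implementation from the thesis; the class `ConceptTransitionPool`, its method names, and the string concept identifiers are hypothetical and chosen only for illustration. The sketch assumes that some external drift detector has already decided which concept is currently active, and it only counts transitions between concepts to estimate which stored concept is most likely to come next.

```python
from collections import defaultdict


class ConceptTransitionPool:
    """Toy pool of concept ids with first-order Markov transition counts.

    Hypothetical illustration only; not the thesis's GraphPool algorithm.
    """

    def __init__(self):
        # counts[a][b] = number of times concept b directly followed concept a
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None

    def observe(self, concept_id):
        """Record that the stream switched to concept_id (e.g. after a detected drift)."""
        if self.current is not None and concept_id != self.current:
            self.counts[self.current][concept_id] += 1
        self.current = concept_id

    def transition_probs(self, concept_id):
        """Maximum-likelihood estimate of P(next concept | concept_id)."""
        row = self.counts.get(concept_id, {})
        total = sum(row.values())
        return {nxt: n / total for nxt, n in row.items()} if total else {}

    def most_likely_next(self, concept_id):
        """Return the most frequent successor of concept_id, or None if unseen."""
        probs = self.transition_probs(concept_id)
        return max(probs, key=probs.get) if probs else None


# A stream that mostly alternates between concepts A and B, with one visit to C.
pool = ConceptTransitionPool()
for concept in ["A", "B", "A", "B", "A", "C", "A", "B"]:
    pool.observe(concept)

print(pool.transition_probs("A"))   # successors of A with estimated probabilities
print(pool.most_likely_next("B"))   # most frequent successor of B
```

In a stream with periodic behavior, such transition estimates would let a learner pre-select the stored model of the most likely next concept instead of relearning it from scratch after each drift.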

DOI

http://doi.org/10.25358/openscience-3587

URI

https://openscience.ub.uni-mainz.de/handle/20.500.12030/3589

Collections

JGU-Hochschulschriften
