IoT
Large Scale
Learning from Data Streams
ECML-PKDD 2017
18 September 2017
The Workshop
2nd ECML/PKDD 2017 Workshop on
Large-scale Learning from Data Streams in Evolving Environments
Workshop + Tutorial
18 September 2017
09:00 am
The volume of data is rapidly increasing owing to advances in information and communication technology. This data comes mostly in the form of streams. Learning from this ever-growing amount of data requires flexible learning models that self-adapt over time. In addition, these models must take into account many constraints: (pseudo) real-time processing, high velocity, and dynamic multi-form change such as concept drift and novelty. This workshop welcomes novel research about learning from data streams in evolving environments. It will provide researchers and participants with a forum for exchanging ideas, presenting recent advances and discussing challenges related to data stream processing. It solicits original work, already completed or in progress. Position papers are also considered. The workshop is combined with a tutorial on the same topic, which will be presented on the same day.
Motivation and focus
The volume of data is rapidly increasing owing to advances in information and communication technology. This data comes mostly in the form of streams. Learning from this ever-growing amount of data requires flexible learning models that self-adapt over time. In addition, these models must take into account many constraints: (pseudo) real-time processing, high velocity, and dynamic multi-form change such as concept drift and novelty. Consequently, learning from streams of evolving and unbounded data requires developing new algorithms and methods able to learn under the following constraints:
- random access to observations is not feasible or has a high cost;
- memory is small with respect to the size of the data;
- the data distribution or the phenomena generating the data may evolve over time, which is known as concept drift;
- the number of classes may evolve over time.
Therefore, efficient data stream processing requires particular drivers and learning techniques:
- Incremental learning, in order to integrate the information carried by each newly arriving data sample;
- Decremental learning, in order to forget or unlearn the data samples that are no longer useful;
- Novelty detection, in order to learn new concepts.
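The three techniques listed above can be illustrated with a minimal Python sketch. It is not taken from any workshop paper: the class name `FadingGaussian`, the forgetting factor `decay` and the z-score `threshold` are hypothetical choices made only for illustration. The model keeps a running mean and variance in constant memory, sees each sample once (no random access), down-weights old samples exponentially, and flags samples that lie far from what it currently knows.

```python
import math
from dataclasses import dataclass


@dataclass
class FadingGaussian:
    """Running mean/variance with exponential forgetting (illustrative helper)."""
    decay: float = 0.95      # assumed forgetting factor; closer to 1 forgets more slowly
    threshold: float = 4.0   # assumed z-score above which a sample counts as novel
    weight: float = 0.0      # effective number of remembered samples
    mean: float = 0.0
    var: float = 0.0

    def is_novel(self, x: float) -> bool:
        """Novelty detection: is x far from what the model currently knows?"""
        if self.weight < 5.0 or self.var == 0.0:
            return False  # too little evidence to judge
        return abs(x - self.mean) / math.sqrt(self.var) > self.threshold

    def update(self, x: float) -> None:
        """One-pass update in O(1) memory."""
        # Decremental step: all past evidence is down-weighted by `decay`.
        self.weight = self.decay * self.weight + 1.0
        # Incremental step: fold the new sample into the weighted statistics.
        step = 1.0 / self.weight
        delta = x - self.mean
        self.mean += step * delta
        self.var = (1.0 - step) * (self.var + step * delta * delta)


if __name__ == "__main__":
    model = FadingGaussian()
    # Toy stream: values near 1.0, then the source moves to values near 10.0.
    stream = [0.9, 1.1] * 25 + [9.9, 10.1] * 100
    for t, x in enumerate(stream):
        if model.is_novel(x):
            print(f"t={t}: novel value {x}")
        model.update(x)
    print(f"final mean is about {model.mean:.2f}")
```

Because old samples fade out, the estimate follows the stream after a change instead of averaging over its whole history. A hard sliding window would be an alternative way to forget, at the cost of storing the window.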
It is worth emphasizing that streams are very often generated by distributed sources, especially with the advent of the Internet of Things. Processing them centrally may therefore be inefficient, particularly when the infrastructure is large and complex. Scalable and decentralized learning algorithms are potentially more suitable and efficient.
Aim and scope
This workshop welcomes novel research about learning from data streams in evolving environments. It will provide researchers and participants with a forum for exchanging ideas, presenting recent advances and discussing challenges related to data stream processing. It solicits original work, already completed or in progress. Position papers are also considered. The scope of the workshop includes, but is not limited to:
- Online and incremental learning
- Online classification, clustering and regression
- Online dimension reduction
- Data drift and shift handling
- Online active and semi-supervised learning
- Online transfer learning
- Adaptive data pre-processing and knowledge discovery
- Applications in
  - Monitoring
  - Quality control
  - Fault detection, isolation and prognosis
  - Internet analytics
  - Decision support systems
  - etc.
Submission and Review process
Regular and short papers presenting work completed or in progress are invited. Regular papers should not exceed 12 pages, while short papers are limited to 6 pages. Papers must be written in English and submitted in PDF format online via the EasyChair submission interface:
https://easychair.org/conferences/?conf=iotstreaming2017
Each submission will be evaluated by at least two members of the program committee on the basis of relevance, significance of contribution, quality of presentation and technical quality. All accepted papers will be included in the workshop proceedings and will be publicly available on the conference website. At least one author of each accepted paper is required to attend the workshop to present it.
Important dates
Paper submission deadline: Monday, July 17, 2017 (extended deadline)
Paper acceptance notification: Monday, July 30, 2017
Paper camera-ready submission: Monday, August 7, 2017
Program Committee members (to be confirmed)
- Carlos Ferreira, LIAAD INESC Porto LA, ISEP, Portugal
- Edwin Lughofer, Johannes Kepler University of Linz, Austria
- Sylvie Charbonnier, Université Joseph Fourier-Grenoble, France
- Bruno Sielly Jales Costa, IFRN, Natal, Brazil
- Fernando Gomide, University of Campinas, Brazil
- José A. Iglesias, Universidad Carlos III de Madrid, Spain
- Anthony Fleury, Mines-Douai, Institut Mines-Télécom, France
- Teng Teck Hou, Nanyang Technological University, Singapore
- Plamen Angelov, Lancaster University, UK
- Igor Skrjanc, University of Ljubljana, Slovenia
- Indre Zliobaite, Aalto University, Finland
- Elaine Faria, Univ. Uberlandia, Brazil
- Mykola Pechenizkiy, TU Eindhoven, Netherlands
- Raquel Sebastião, Univ. Aveiro, Portugal
Workshop Organizers
Moamar Sayed-Mouchaweh
Computer Science and Automatic Control Labs, High Engineering School of Mines, Douai, France
moamar.sayed-mouchaweh@mines-douai.fr
Albert Bifet
Telecom-ParisTech, Paris, France
albert.bifet@telecom-paristech.fr
Hamid Bouchachia
Department of Computing & Informatics, University of Bournemouth, Bournemouth, UK
abouchachia@bournemouth.ac.uk
João Gama
Laboratory of Artificial Intelligence and Decision Support, University of Porto, Porto, Portugal
jgama@fep.up.pt
Rita Ribeiro
Laboratory of Artificial Intelligence and Decision Support, University of Porto, Porto, Portugal
rpribeiro@dcc.fc.up.pt
The Tutorial
Tutorial: IoT Big Data Stream Mining
The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution of such data streams over time, i.e., with concepts that drift or change completely, is one of the core issues in IoT stream mining. This tutorial is a gentle introduction to mining IoT big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part deals with scalability issues inherent in IoT applications, and discusses how to mine data streams on distributed engines such as Spark, Flink, Storm, and Samza.
Content:
1. IoT Fundamentals and Stream Mining Algorithms
– IoT Stream mining setting
– Concept drift
– Classification and Regression
– Clustering
– Frequent Pattern mining
– Concept Evolution
– Limited Labeled Learning
2. IoT Distributed Big Data Stream Mining
– Distributed Stream Processing Engines
– Classification
– Regression
– Open Source Tools
– Applications
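To make the "Concept drift" item of the outline above concrete, here is a small Python sketch of the Page-Hinkley test, one classical change detector for streams. It is offered only as an illustration and is not necessarily the method presented in the tutorial. The class layout, the helper `detect_drifts` and the default values of `delta`, `threshold` and `min_samples` are assumptions chosen for this example. The detector watches a numeric signal (for instance a model's error) and raises an alarm when the signal's mean increases.

```python
from typing import Iterable, List


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0,
                 min_samples: int = 30) -> None:
        self.delta = delta              # tolerated magnitude of change (assumed value)
        self.threshold = threshold      # alarm level, often written lambda (assumed value)
        self.min_samples = min_samples  # warm-up length before alarms are allowed
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0       # cumulative deviation from the running mean
        self.cum_min = 0.0   # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one value; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_samples and self.cum - self.cum_min > self.threshold:
            self.reset()  # start afresh on the new concept
            return True
        return False


def detect_drifts(stream: Iterable[float]) -> List[int]:
    """Return the positions in the stream at which a drift was signalled."""
    detector = PageHinkley()
    return [t for t, x in enumerate(stream) if detector.update(x)]


if __name__ == "__main__":
    # Toy signal: a low error level for 200 steps, then a sudden jump.
    errors = [0.1] * 200 + [0.9] * 200
    print("drift signalled at positions:", detect_drifts(errors))
```

In a stream learner, such an alarm would typically trigger a model reset or retraining on recent data. A larger `threshold` gives fewer false alarms at the price of slower detection.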
​
Presenters:
- Gianmarco De Francisci Morales
- Albert Bifet
- Latifur Khan
- Moamar Sayed-Mouchaweh
- João Gama
- Wei Fan
Program
9:00 - 10:40 Tutorial: 1. IoT Fundamentals and Stream Mining Algorithms
10:40 - 11:00 Morning coffee break
11:00 - 12:40 Tutorial: 2. IoT Distributed Big Data Stream Mining and Applications
12:40 - 14:00 Lunch break
14:00 - 15:40 SESSION 1 Chair: Albert Bifet
14:00 - 14:45 Invited talk: Geoff Webb. Learning from non-stationary distributions
15:00 - 15:15 A Sliding Window Filter for Time Series Streams
Gordon Lesti and Stephan Spiegel
15:20 - 15:35 Evolutive deep models for online learning on data streams with no storage
Andrey Besedin, Pierre Blanchart, Michel Crucianu and Marin Ferecatu
15:40 - 16:00 Coffee Break
16:00 - 17:00 SESSION 2 Chair: Moamar Sayed-Mouchaweh
16:00 - 16:15 Hybrid Self Adaptive Learning Scheme for Simple and Multiple Drift-like Fault Diagnosis in Wind Turbine Pitch Sensors
Houari Toubakh and Moamar Sayed-Mouchaweh
16:20 - 16:35 Comparison between Co-training and Self-training for single-target regression in data streams using AMRules
Ricardo Sousa and João Gama
16:40 - 16:55 Self-Adaptive Ensemble Classifier for Handling Complex Concept Drift
Imen Khamassi and Moamar Sayed-Mouchaweh
17:00 - 17:15 Summary Extraction on Data Streams in Embedded Systems
Sebastian Buschjäger and Katharina Morik
Keynote Talk
Geoff Webb
Geoff Webb is Director of the Monash Centre for Data Science. He is a technical advisor to data science startup BigML. He has been Editor in Chief of the premier data mining journal, Data Mining and Knowledge Discovery (2005 to 2014) and Program Committee Chair of the two top data mining conferences, ACM SIGKDD (2015) and IEEE ICDM (2010), as well as General Chair of ICDM (2012). His primary research areas are machine learning, data mining, user modelling and computational structural biology. Many of his learning algorithms are included in the widely used BigML, R and Weka machine learning workbenches. He is an IEEE Fellow and received the inaugural Eureka Prize for Excellence in Data Science in 2017, the 2013 IEEE ICDM Service Award, a 2014 Australian Research Council Discovery Outstanding Researcher Award, the 2016 Australian Computer Society ICT Researcher of the Year Award and the 2016 Australasian Artificial Intelligence Distinguished Research Contributions Award.
Learning from non-stationary distributions
The world is dynamic – in a constant state of flux – but most learned models are static. Models learned from historical data are likely to decline in accuracy over time. This talk presents formal tools for analyzing non-stationary distributions and some insights that they provide. Shortcomings of standard approaches to learning from non-stationary distributions are discussed together with strategies for developing more effective techniques.