Tutorial I: Medical Mining
Myra Spiliopoulou, Pedro Pereira Rodrigues, Ernestina Menasalvas
Tutorial II: Population Informatics using Big Data
Peter Christen, Hye-Chung Kum, Qing Wang, Dinusha Vatsalan
Social genomes are the digital footprints of individuals. They consist of records about people’s interactions with governments, businesses, and other individuals, as collected and linked from many data sources. Social genomes are the basis of Population Informatics, the emerging discipline of studying populations by analysing large population databases that contain detailed information about people, such as the health, education, financial, census, location, shopping, employment, or social networking records of a large proportion of individuals in a population. Population Informatics is a crucial enabling technology for understanding our rapidly changing, dynamic societies and for improving the health of a society using the social genome, much as the human genome is used to improve the health of a person. In this tutorial we illustrate the importance of Population Informatics through several case studies and provide an overview of the key challenges, techniques, and methodologies of Population Informatics, including advanced data integration and privacy technologies.
Peter Christen is an Associate Professor at the Research School of Computer Science at the Australian National University. He received his Diploma in Computer Science Engineering from ETH Zurich in 1995 and his PhD in Computer Science from the University of Basel in 1999. His interests are in data mining and data matching (record linkage). He has published over 130 articles in these areas, including the book ‘Data Matching’ (Springer, 2012), and he is co-editor of the book ‘Population Reconstruction’ (Springer, 2015). He is the principal organiser of the workshop ‘Population Informatics for Big Data’ held at ACM SIGKDD in Sydney in August 2015.
Hye-Chung Kum is an Associate Professor in the Department of Health Policy and Management at Texas A&M University (TAMU) with a joint appointment in the Department of Computer Science and Engineering and the Department of Industrial and Systems Engineering. Her research interests include population informatics, data science, health services research, and health informatics. Kum received a PhD in Computer Science and a Master of Social Work in the Policy and Management track at the University of North Carolina at Chapel Hill (UNC-CH). She leads the Population Informatics Research Group, a joint effort between TAMU and UNC-CH. She coined the terms ‘Population Informatics’ and ‘Social Genome’ in the vision paper on Population Informatics published in IEEE Computer in 2014.
Qing Wang is a Research Fellow in the Research School of Computer Science at the Australian National University. She received her Master of Economics from Jinan University in 2000, her Master of Information Systems from Massey University in 2006, and her PhD in Computer Science from Christian-Albrechts-University Kiel in 2010. Her current research interests are data management, conceptual modelling, and data mining. She has over 12 years of IT industry experience, with a focus on the financial sector, and has published 50 articles, including in well-known journals such as Theoretical Computer Science, IEEE Transactions on Knowledge and Data Engineering, and World Wide Web.
Dinusha Vatsalan is a Research Fellow in the Research School of Computer Science at the Australian National University. She received her BSc (Hons) in Information and Communication Technology from the University of Colombo School of Computing, Sri Lanka, in 2009 and her PhD in Computer Science from the Australian National University in 2014. Her research interests are in data mining, privacy-preserving record linkage, and health informatics. Dinusha received an Endeavour Postgraduate Research Award in 2011 from the Australian Government.
Tutorial III: Deep Learning Implementations and Frameworks
Seiya Tokui, Kenta Oono, Atsunori Kanemura, Toshihiro Kamishima
Deep learning is becoming more and more popular in a wide variety of machine learning research. Since its early successes, deep learning has mainly been used for cognitive tasks such as speech and visual recognition, and lately it is being applied to a wider range of areas, including natural language processing and reinforcement learning. In parallel with the progress of deep learning methods, many programming frameworks are being developed to satisfy the demands of researchers and practitioners in the field. These frameworks are generally compared in terms of performance and the programming paradigms they are based on, both of which have seen rapid advancement in the last few years. Thus, a proper understanding of the different features and capabilities of these frameworks becomes very important when selecting the appropriate framework for implementing the desired neural networks.
In this tutorial, we introduce some common core concepts shared among various deep learning frameworks, and then go through a comparison of the important differences, which we hope can help you select the most appropriate framework for your own applications.
Seiya Tokui is a researcher at Preferred Networks, Inc., Japan. He received his master’s degree in mathematical informatics from the University of Tokyo in 2012. He is the lead developer of the deep learning framework Chainer. His research interests include deep learning, its software design, computer vision, and natural language processing.
Kenta Oono is an engineer at Preferred Networks, Inc., Japan. He received his master’s degree in mathematics from the University of Tokyo in 2011. He is a core developer of Chainer. His research interests are deep learning, bioinformatics, and theoretical analysis of machine learning.
Atsunori Kanemura is a Research Scientist in the Mathematical Neuroinformatics Group, AIST, Japan. He obtained his Ph.D. degree in Informatics from Kyoto University in 2009. He won a JNNS Best Paper Award in 2010 and an IEEE TrustCom Best Paper Award in 2015. His research interests include machine learning, statistical signal processing, and analysis of human data.
Toshihiro Kamishima was born in 1968. He received his master’s degree in Engineering from Kyoto University in 1994 and his Doctor of Informatics degree from Kyoto University in 2001. He joined the Electrotechnical Laboratory in 1994, which was later reorganized into the National Institute of Advanced Industrial Science and Technology (AIST). He received Japanese Society for Artificial Intelligence Annual Conference Awards in 2003, 2008, 2011, and 2014, and the Japanese Society for Artificial Intelligence Distinguished Service Award in 2009. His research interests are recommender systems, data mining, and machine learning. He is a member of AAAI, ACM, IEICE, and JSAI.
Tutorial IV: Geo-Social Media Analytics
Cheng-Te Li, Hsun-Ping Hsieh
With the maturity of wireless communication techniques, GPS-equipped mobile devices (e.g. mobile phones and tablets) have become ubiquitous, and location-acquisition technologies and services are flourishing. These location applications and mobile devices, developed and combined with social networking services, foster the emergence of geo-social media, a novel type of user-generated spatio-social data. Typical geo-social media services include Facebook, Twitter, and Foursquare. In geo-social media, the social connections and geo-location information of users are the essential elements, which keep track of user interactions and spatial-temporal activities. Social interactions are depicted by online network structures, while geographical activities are usually represented as check-in records, which consist of sequences of data points with latitude-longitude records, time stamps, and venue information. Because the pervasive mobility of users leads to ubiquitous social interactions, a huge amount of user-generated geo-social data is rapidly produced and accumulated. Such big geo-social data not only collectively represents diverse kinds of real-world human activities, but also serves as a handy resource for various geo-social applications.
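The check-in records described above can be pictured with a minimal Python sketch. The class name `CheckIn`, its field names, and the helper `haversine_km` are hypothetical names chosen for illustration; they are not taken from the tutorial or from any particular geo-social service. The sketch stores one data point per check-in and computes the great-circle distance between two check-ins, assuming a spherical Earth with a mean radius of 6371 km.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


@dataclass(frozen=True)
class CheckIn:
    """One check-in data point (hypothetical field names)."""
    user_id: str
    lat: float          # latitude in degrees
    lon: float          # longitude in degrees
    timestamp: datetime
    venue: str


def haversine_km(a: CheckIn, b: CheckIn) -> float:
    """Great-circle distance between two check-ins in kilometres."""
    phi1, phi2 = radians(a.lat), radians(b.lat)
    dphi = phi2 - phi1
    dlam = radians(b.lon - a.lon)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# A user's activity is a time-ordered sequence of check-ins (made-up values).
trail = [
    CheckIn("u1", 0.0, 0.0, datetime(2016, 4, 19, 9, 0), "venue A"),
    CheckIn("u1", 0.0, 1.0, datetime(2016, 4, 19, 12, 30), "venue B"),
]
distance = haversine_km(trail[0], trail[1])
```

Distances and time gaps between consecutive check-ins of this kind are typical raw inputs for mobility analysis; the social network structure would be kept separately, for example as an adjacency list keyed by user.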
Cheng-Te Li is an Assistant Research Fellow at the Research Center for Information Technology Innovation (CITI) at Academia Sinica, Taipei, Taiwan. He received his M.S. and Ph.D. degrees from the Graduate Institute of Networking and Multimedia, National Taiwan University, in 2009 and 2013, respectively. His research interests include social and information networks, big data mining, and geo-social media analytics. His international recognition includes the Facebook Fellowship 2012 Finalist Award, ACM KDD Cup 2012 First Prize (member of NTU team), IEEE/ACM ASONAM 2011 Best Paper Award, and Microsoft Research Asia Fellowship 2010. He has given conference tutorials on social media analytics at WWW 2015, on route planning at ICWSM 2014 and ASONAM 2014, on sampling and summarization in social networks at PAKDD 2013 and SDM 2013, and on mining heterogeneous social networks at PAKDD 2009. In addition, he has published several papers on geo-social media in premier conferences and journals, including WWW, KDD, ICDM, UbiComp, CIKM, ICWSM, ASONAM, TIST, and KAIS.
Hsun-Ping Hsieh is a Postdoctoral Researcher at the Research Center for Information Technology Innovation (CITI) at Academia Sinica, Taipei, Taiwan. He received his Ph.D. degree from the Graduate Institute of Networking and Multimedia at National Taiwan University, and his M.S. degree in Information Management, also from National Taiwan University. His research interests include big data mining, urban computing, geo-social computing, and geographical information systems. His international recognition includes ACM KDD Cup 2010 First Prize (member of NTU team), ACM SIGGRAPH 2010 Student Research Competition Travel Award, AAAI-ICWSM 2012 Student Travel Award, Garmin Fellowship 2015 and 2014, Award of Excellence (Stars of Tomorrow Internship Program) at Microsoft Research Asia in 2013, and NTU Outstanding College Youth in 2014. In recent years, he has published a series of papers on location-based services and urban computing in venues including TIST, WWW, ICDM, KDD, SocialCom, ICWSM, and UbiComp. He was the main tutorial presenter on social media analytics at WWW 2015, and on route planning at ICWSM 2014 and ASONAM 2014.
Tutorial V: Succinct Data Structure for Scalable Knowledge Discoveries
Yasuo Tabei, Rossano Venturini
Massive datasets, so-called big data, are ubiquitous in research and industry. Data mining researchers and practitioners face the problem of processing and analyzing such huge datasets for knowledge discovery in various fields. However, coping with big data is a challenge because of its huge computational cost. One important approach to this bottleneck in the big data era is to (i) build indexes from datasets as a preprocessing step by using space-efficient data structures and (ii) process the datasets on the indexes.
A succinct data structure (SDS) is a space-efficient representation of a data structure that still supports fast operations directly on the representation. Recently, various types of SDSs have been proposed for compactly representing and indexing strings, trees, graphs, sets of integers, and so on. In addition, several important applications of SDSs for scalable knowledge discovery have been presented.
In this tutorial, we will start by introducing the basic concept of SDSs and various types of SDSs for representing data structures such as strings, trees, and graphs. We then present various applications of SDSs for scalable knowledge discovery. In particular, we focus on similarity search and information retrieval, which are important research topics in data mining. Finally, we detail several techniques and libraries for applying SDSs to various problems in data mining.
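As a rough illustration of the idea behind an SDS, the following Python sketch implements a bit vector with a rank directory, a common building block in this area. It is a sketch under stated assumptions, not material from the tutorial: the class name `RankBitVector` and the block size of 64 bits are arbitrary choices for illustration, and Python integers do not deliver the space bounds of a real succinct implementation, which would typically use a two-level directory over packed machine words.

```python
class RankBitVector:
    """Bit vector with a one-level rank directory (illustrative sketch)."""

    WORD = 64  # bits per block; an arbitrary choice for this sketch

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []        # packed bits, one integer per block
        self.block_rank = [0]  # number of 1 bits before each block
        for start in range(0, self.n, self.WORD):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.WORD]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
            ones = bin(word).count("1")
            self.block_rank.append(self.block_rank[-1] + ones)

    def access(self, i):
        """Return the bit stored at position i."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return (self.words[i // self.WORD] >> (i % self.WORD)) & 1

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        block, offset = divmod(i, self.WORD)
        if offset == 0:
            return self.block_rank[block]
        # Precomputed count for whole blocks, plus a popcount inside one block.
        mask = (1 << offset) - 1
        return self.block_rank[block] + bin(self.words[block] & mask).count("1")


bits = [1, 0, 1, 1, 0, 0, 1, 0] * 20  # 160 bits, made-up data
bv = RankBitVector(bits)
```

A query such as `bv.rank1(70)` reads one directory entry and inspects one block instead of scanning all preceding bits. Rank queries of this kind (together with select) underlie many compact representations of strings and trees.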
Yasuo Tabei is a researcher at the Japan Science and Technology Agency (JST) and at Tokyo Institute of Technology. Before his current position, he was a postdoctoral researcher at the ERATO Minato project, JST. His research focuses on the foundations of succinct data structures and algorithms for string processing, and on their applications in various fields, including data mining, machine learning, bioinformatics, and chemo-informatics. He obtained his B.S. degree in computer science in 2003 from Tokyo Denki University. He received an M.S. degree in Bioinformatics in 2005 and a Ph.D. degree in Bioinformatics in 2009.
Rossano Venturini is a Researcher at the Computer Science Department, University of Pisa. He received his Ph.D. from the Computer Science Department of the University of Pisa in 2010 with a thesis titled “On Searching and Extracting Strings from Compressed Textual Data”. His research interests are mainly focused on the design and analysis of algorithms and data structures, with special attention to problems of indexing and searching large textual collections.