Program
Talks
Adaptivity of deep ReLU network and its generalization error analysis
Taiji Suzuki (The University of Tokyo / RIKEN AIP)
Deep learning has shown high performance in various types of tasks, from visual recognition to natural language processing, which indicates the superior flexibility and adaptivity of deep learning. To understand this phenomenon theoretically, we develop a new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness. The Besov space is a quite general function space that includes the Hölder space and the Sobolev space, and in particular can capture spatial inhomogeneity of smoothness. Through the analysis, it is shown that deep learning can achieve the minimax optimal rate and outperform any non-adaptive (linear) estimator such as kernel ridge regression, which shows that deep learning has higher adaptivity to the spatial inhomogeneity of the target function than other estimators such as linear ones. The essence behind the theory is the fact that deep learning can construct bases adaptively for each target function. This point shares several similarities with sparse learning methods such as L0 regularization and low-rank matrix estimation. In particular, the nonconvexity and sparsity of the model space are important. I will talk about the performance analysis of deep learning, emphasizing the connection to sparse estimation problems.
Graph-based Semi-Supervised Learning for Genome, Diseasome, and Drugome
Hyunjung (Helen) Shin (Ajou University)
In this talk, we present several applications of graph-based machine learning algorithms to networks of the genome, diseasome, and drugome. In the genome application, we introduce Gene Ranker, which produces scores for genes using a graph-based semi-supervised algorithm. On a genome network built from protein interaction data and WGCNA data of immune patients, it ranks key genes of immune diseases. In the diseasome application, we present an approach for disease co-occurrence scoring based on a gene-disease-symptom network. For this, an algorithm for the hierarchical structure of networks is proposed. The similarity matrix of the hierarchical structure is huge, sparse, and tridiagonal. To make matters worse, the submatrices are not necessarily square. Therefore, we propose an algorithm that not only alleviates the problems of non-squareness and sparseness but also solves the scalability problem.
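Graph-based semi-supervised scoring of the kind used in Gene Ranker can be illustrated with a standard label-propagation iteration. This is a generic sketch under assumed details, not the authors' implementation: the toy graph, seed scores, and parameter names below are hypothetical.

```python
import numpy as np

def propagate_scores(W, y, alpha=0.9, iters=200):
    """Generic graph-based semi-supervised scoring: diffuse seed scores y
    over a graph with symmetric adjacency W, pulling back toward the seeds.
    Converges to the closed form (1 - alpha) * (I - alpha * S)^-1 @ y."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))        # symmetric normalization D^-1/2 W D^-1/2
    f = y.astype(float).copy()
    for _ in range(iters):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f

# Toy 4-node chain; node 0 is a known (labeled) disease gene.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
scores = propagate_scores(W, np.array([1., 0., 0., 0.]))
ranking = np.argsort(-scores)              # genes ranked by propagated score
```

Scores decay with graph distance from the labeled seed, so unlabeled genes close to known disease genes rank highest, which is the intuition behind this family of methods.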
Counterfactual Policy Evaluation and Optimization in Reproducing Kernel Hilbert Spaces
Krikamol Muandet (Max Planck Institute for Intelligent Systems)
In this talk, I will discuss the problem of evaluating and learning optimal policies directly from observational (i.e., non-randomized) data using a novel framework called counterfactual mean embedding (CME). Identifying optimal policies is crucial for improving decision-making in online advertisement, finance, economics, and medical diagnosis. The classical approach, considered the gold standard for identifying optimal policies in these domains, is randomization. For example, A/B testing has become one of the standard tools in online advertisement and recommendation systems. In medical domains, the development of new medical treatments depends exclusively on controlled clinical trials. Unfortunately, randomization in A/B testing and controlled clinical trials may be expensive, time-consuming, unethical, or even impossible to implement in practice. To evaluate a policy from observational data, the CME maps the counterfactual distributions of potential outcomes under different treatments into a reproducing kernel Hilbert space (RKHS). Based on this representation, causal reasoning about the outcomes of different treatments can be performed over the entire landscape of the counterfactual distribution using the kernel arsenal. Under some technical assumptions, we can also give a causal explanation of the resulting policies.
Joint work with Sorawit Saengkyongam, Motonobu Kanagawa, and Sanparith Marukatat.
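The kernel machinery behind the CME can be illustrated with its simplest ingredient: the empirical kernel mean embedding, and the maximum mean discrepancy (MMD) between two outcome samples. This is a generic sketch, not the CME estimator itself, and the Gaussian outcome data is invented for illustration.

```python
import numpy as np

def rbf_gram(x, y, gamma=0.5):
    """RBF kernel Gram matrix between 1-D samples x and y."""
    return np.exp(-gamma * (x[:, None] - y[None, :]) ** 2)

def mmd2(x, y, gamma=0.5):
    """Biased estimate of the squared MMD, i.e. the squared RKHS distance
    between the empirical mean embeddings of the two samples."""
    return (rbf_gram(x, x, gamma).mean()
            + rbf_gram(y, y, gamma).mean()
            - 2 * rbf_gram(x, y, gamma).mean())

rng = np.random.default_rng(0)
control   = rng.normal(0.0, 1.0, 300)   # outcomes under one treatment
treated_0 = rng.normal(0.0, 1.0, 300)   # same outcome distribution
treated_2 = rng.normal(2.0, 1.0, 300)   # shifted outcome distribution
d_same = mmd2(control, treated_0)
d_diff = mmd2(control, treated_2)
```

Because a characteristic kernel makes the mean embedding injective, a large MMD between embedded outcome distributions indicates genuinely different treatment effects; the CME extends this idea to counterfactual distributions estimated from observational data.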
Inference and Estimation using Nearest Neighbors
YungKyun Noh (Seoul National University)
Despite the theoretical consistency property of nearest neighbor methods, which relates the algorithm to the theoretical minimum error, the Bayes error, algorithms using nearest neighbors are not preferred by researchers because they look too simple and old-fashioned. However, due to this simplicity, the analysis of nearest neighbor methods is tractable and can produce non-asymptotic theories. These methods have simply not yet seen datasets large enough to enjoy their theoretical predictions, and the current algorithmic and system technologies are immature. In this talk, I will introduce some of my recent works implementing models that modify the geometry around the points of interest and perform nearest neighbor methods with many data points as if we were effectively using even more data than is actually given.
We derive equations to take advantage of the entire information within finite but large datasets and achieve inference and estimation results seemingly as if we had used infinite data. By doing this, we believe nearest neighbor methods can be considered a breakthrough showing asymptotic performance through the smart usage of extremely large datasets.
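As a reminder of how simple the base method is, here is a minimal k-nearest-neighbor classifier with a plain Euclidean metric (the geometry modifications discussed in the talk are not included; the toy data is invented for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points nearest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(y_train[nearest]).argmax()

# Two well-separated toy classes.
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])
pred_a = knn_predict(X, y, np.array([0.5, 0.5]))   # near the first cluster -> 0
pred_b = knn_predict(X, y, np.array([5.5, 5.5]))   # near the second cluster -> 1
```

The talk's contribution can be read as replacing the plain `np.linalg.norm` metric with a locally adapted one, so that finite data behaves as if it were effectively larger.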
Riemannian geometry and machine learning for non-Euclidean data
Frank C. Park (Seoul National University)
A growing number of problems in machine learning involve data that is non-Euclidean. A naive application of existing learning algorithms to such data often produces results that depend on the choice of local coordinates used to parametrize the data. At the same time, many problems in machine learning eventually reduce to an optimization problem, in which the objective is to find a mapping from one curved space into another that best preserves distances and angles. We show that these and other problems can be naturally formulated as the minimization of a coordinate-invariant functional that measures the proximity to an isometry of a mapping between two Riemannian manifolds. We first show how to construct general coordinate-invariant functionals of mappings between Riemannian manifolds, and propose a family of functionals that measures how close a mapping is to being an isometry. We then formulate coordinate-invariant distortion measures for manifold learning of non-Euclidean data, and derive gradient-based optimization algorithms that accompany these measures. We also address the problem of autoencoder training for non-Euclidean data using our Riemannian geometric perspective. Both manifold learning and autoencoder case studies involving non-Euclidean datasets illustrate the underlying geometric intuition and the advantages of a Riemannian distortion minimization framework.
Knowledge Tracing Machines – Factorization Machines for Educational Data Mining
JillJênn Vie (RIKEN AIP)
Knowledge tracing is a sequence prediction problem where the goal is to predict the outcomes of students on questions as they interact with a learning platform. By tracking the evolution of a student's knowledge, one can provide feedback and optimize instruction accordingly. Existing methods are based either on temporal latent variable models such as deep knowledge tracing (LSTMs), or on factor analysis such as item response theory (online logistic regression). We present factorization machines (FMs), a model for regression or classification that encompasses several existing models in the educational data mining literature as special cases, notably the additive factor model, the performance factor model, and multidimensional item response theory. We show, using several real datasets of tens of thousands of users and items, that FMs can estimate student knowledge accurately and quickly even when student data is sparsely observed, and can handle side information such as multiple knowledge components and the number of attempts at the item or skill level. To reproduce our experiments, a tutorial is available: https://github.com/jilljenn/ktm. Our article is on arXiv: https://arxiv.org/abs/1811.03388
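The FM prediction itself is compact: a global bias, linear weights, and pairwise interactions factorized through k-dimensional embeddings, computable in O(kn) time via a well-known algebraic identity. A sketch with random parameters, purely illustrative:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine:
    y = w0 + w.x + sum_{i<j} <V[i], V[j]> x_i x_j,
    computed in O(k*n) as 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]."""
    s = V.T @ x                   # (k,) per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)    # (k,) per-factor sums of squares
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

# Sanity check against the naive O(n^2) pairwise sum.
rng = np.random.default_rng(1)
n, k = 8, 3
x = rng.normal(size=n)
w0, w, V = 0.5, rng.normal(size=n), rng.normal(size=(n, k))
naive = w0 + w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                         for i in range(n) for j in range(i + 1, n))
fast = fm_predict(x, w0, w, V)
```

In the knowledge-tracing setting, x would be a sparse encoding of the user, the item, its knowledge components, and attempt counts, which is how FMs subsume the factor-analysis models named above.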
Modeling (cancer) cells using multi-omics data
Sun Kim (Seoul National University)
Modeling (cancer) cells using multi-omics data is the ultimate research goal in my lab, and we have been making slow but steady progress toward this goal over the past decade. In this talk, I will present three recently submitted (unpublished) manuscripts toward this goal. (1) Sequence level: a ranked k-spectrum kernel for comparative and evolutionary comparison of exons, introns, and CpG islands; (2) transcript level: cancer subtype classification and modeling by pathway attention and propagation; and (3) epigenetic level: PRISM, methylation pattern-based, reference-free inference of subclonal makeup.
The ranked string kernel paper proposes a new string kernel for comparative and evolutionary genomics. Existing string kernel methods have limitations for comparative and evolutionary studies due to their sensitivity to over-represented \(k\)-mers when applied on a genome scale. With this over-representation bias, comparing multiple genomes simultaneously is even more challenging. To address these issues, we propose a novel ranked \(k\)-spectrum string (RKSS) kernel. First, our RKSS kernel utilizes a common \(k\)-mer set across species, or landmarks, that can be used for comparing an arbitrary number of genomes. Second, using landmarks, we can use ranks of \(k\)-mers rather than \(k\)-mer frequencies, which produces more reliable distances between genomes. Specifically, the RKSS kernel is robust when the \(k\)-mer pattern is highly biased by repetitive elements or copy number variations, as shown in our experiments. To demonstrate the power of the RKSS kernel for comparative and evolutionary sequence comparison, we conducted two experiments using 10 mammalian species.
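The ranked-spectrum idea can be sketched as: count landmark k-mers, replace counts by ranks, and compare sequences by rank distance. This toy sketch is our own illustration of the concept, not the RKSS kernel as defined in the manuscript:

```python
from collections import Counter
from itertools import product

def kmer_ranks(seq, k, landmarks):
    """Rank landmark k-mers by frequency in seq (rank 0 = most frequent;
    ties broken by landmark order)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    order = sorted(landmarks, key=lambda m: -counts[m])
    return {m: r for r, m in enumerate(order)}

def rank_distance(seq1, seq2, k=2):
    """Squared rank difference over all landmark k-mers (Spearman-style)."""
    landmarks = ["".join(p) for p in product("ACGT", repeat=k)]
    r1 = kmer_ranks(seq1, k, landmarks)
    r2 = kmer_ranks(seq2, k, landmarks)
    return sum((r1[m] - r2[m]) ** 2 for m in landmarks)

d_self = rank_distance("ACGTACGTACGT", "ACGTACGTACGT")   # identical sequences
d_rep  = rank_distance("ACGTACGTACGT", "AAAAAAAAAAAA")   # repeat-dominated sequence
```

Replacing raw counts by ranks caps the influence of any single k-mer, which is why a rank-based spectrum is less distorted by repetitive elements or copy number inflation than a frequency-based one.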
The two remaining manuscripts will be presented at this meeting as posters by their first authors, so I will briefly explain the research problems and the core concepts of our approaches. The cancer pathway attention and propagation paper models cancer cells using an ensemble of several hundred deep learning pathway models. There are two main ideas for this modeling: when combining hundreds of pathway models, we need to capture context-dependent mechanisms by highlighting which pathways are important, hence attention mechanisms; and to achieve explainable cancer cell modeling, we use pathway network propagation. The last paper is about PRISM, a computational method to infer cancer subclones from DNA methylation data. Unknown clonal populations are mixed together in the methylation data, and the goal is to decompose the subclonal populations. The main computational challenges are that the number of dimensions is huge, several hundred million, and that the data is error-prone. We address this daunting problem by utilizing the characteristics of DNA methylation data. First, errors are corrected by modeling the methylation patterns of the DNMT1 enzyme using an HMM. Then, with error-corrected methylation patterns, PRISM focuses on small individual genomic regions, each of which represents the abundance of a subclone. A set of statistics collected from each genomic region is modeled with a beta-binomial mixture. Fitting the mixture with the expectation-maximization algorithm finally provides the inferred composition of subclones.
Weakly Supervised Classification, Robust Learning and More: Overview of Our Recent Advances
Masashi Sugiyama (The University of Tokyo / RIKEN AIP)
Machine learning has been successfully used to solve various real-world problems. However, we are still facing many technical challenges: for example, we want to train machine learning systems without big labeled data, and we want to reliably deploy machine learning systems in noisy environments. In this talk, I will overview our recent advances in tackling these problems.
Deep Variational Inference with Common Information Extraction
YoungHan Kim (UC San Diego)
Inspired by the channel synthesis problem in network information theory, we propose a new variational statistical inference approach that learns a succinct generative model for two data variables based on the notion of Wyner's common information. This generative model includes two groups of latent variables: first, the common latent variable that captures the common information (e.g., a shared concept) of the two data variables, and second, the local latent variables that capture the remaining randomness (e.g., texture and style) in the respective data variables. A simple training scheme for the variational model is presented, as well as methods for controlling the amount of common information extracted. The utility of the proposed approach and the accompanying training techniques is demonstrated through experiments on joint generation, conditional generation, and style transfer using synthetic data and real images.
Conditioning Factors Determination for Landslide Susceptibility Mapping Using Support Vector Machine Learning
Bahareh Kalantar (RIKEN AIP)
This study investigates the effectiveness of two sets of landslide conditioning variables. Fourteen landslide conditioning variables were considered and divided into two sets, G1 and G2. Two Support Vector Machine (SVM) classifiers were constructed, one for each set (SVM-G1 and SVM-G2), in order to determine which set is more suitable for landslide susceptibility prediction. In total, 160 landslide inventory datasets of the study area were used, of which 70% were used for SVM training and 30% for testing. The interrelationships between parameters were explored based on variance inflation factors (VIF), Pearson's correlation, and Cohen's kappa analysis. Another evaluation metric is the area under the curve (AUC).
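As a generic illustration of this experimental protocol (70/30 split, SVM classifier, AUC), here is a self-contained sketch on synthetic two-dimensional "conditioning variables". The real study uses fourteen variables and an off-the-shelf SVM, so everything below (data, subgradient training, parameter values) is an assumption for illustration only:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Linear SVM fit by subgradient descent on the regularized hinge loss.
    Labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # margin-violating points
        w -= lr * (lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n)
        b -= lr * (-y[viol].sum() / n)
    return w, b

def auc(scores, labels):
    """AUC = P(score of a positive > score of a negative); ties count 1/2."""
    pos, neg = scores[labels == 1], scores[labels == -1]
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return gt + 0.5 * eq

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 1, (100, 2)), rng.normal(-2, 1, (100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)]).astype(int)
idx = rng.permutation(200)
tr, te = idx[:140], idx[140:]                    # 70% train / 30% test
w, b = train_linear_svm(X[tr], y[tr])
test_auc = auc(X[te] @ w + b, y[te])
```

AUC is computed here directly from its rank interpretation, which matches the area under the ROC curve without having to sweep thresholds explicitly.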
Designing Materials with Machine Learning and Quantum Annealing
Koji Tsuda (RIKEN AIP / The University of Tokyo)
The scientific process of discovering new knowledge is often characterized as a search over a vast space of candidates, and machine learning can accelerate it by properly modeling the data and suggesting candidates for the next experiment. In many cases, experiments can be substituted by simulations such as first-principles calculations. After reviewing basic machine learning techniques for materials design, I will present successful case studies including the design of SiGe nanostructures and layered thermal radiators. Finally, I will show how a D-Wave quantum annealer can be used for complex materials design.
Reshuffled Tensor Decomposition with Exact Recovery of Low-rank Components
Chao Li (RIKEN AIP)
Low-rank tensor decomposition (TD) is a promising approach for the analysis and understanding of real-world data. Many such analyses require correct recovery of the true components from the observed tensor, but this property is not achieved by many existing TD methods. To exactly recover the true components, we introduce a general class of tensor decomposition models in which the tensor is decomposed as a sum of reshuffled low-rank components. The reshuffling operation generalizes the conventional folding (tensorization) operation and also provides additional flexibility to recover the true components of complex data structures. We develop a simple convex algorithm called Reshuffled-TD, and theoretically prove that exact recovery is guaranteed when a certain incoherence measure is upper bounded. Results on image steganography show that our method obtains state-of-the-art performance and demonstrate its effectiveness in practice.
Accelerating DNNs using Heterogeneous Clusters
Jaejin Lee (Seoul National University)
Heterogeneous systems based on GPUs and FPGAs are widening their user base. In fact, GPU-based heterogeneous systems are the de facto standard for running deep learning applications. Especially in the post-Moore era, the role of accelerator-based heterogeneous systems is becoming more important. The high-performance computing community quickly recognized that deep learning applications have very similar characteristics to large-scale HPC applications. There are many ways to parallelize and accelerate DNN models depending on the type of target architecture. In this talk, we first introduce current trends in heterogeneous computing systems. Then, we introduce ongoing research efforts in the multicore computing research laboratory at Seoul National University to achieve ease of programming and high performance targeting GPU- and FPGA-based heterogeneous clusters.
Deep Learning and Tree Search Find New Molecules
Kazuki Yoshizoe (RIKEN AIP)
De novo molecular generation is the problem of finding new chemical compounds. We tackle this problem by combining deep neural networks and search algorithms, the same combination used in the first version of the AlphaGo program, which achieved superhuman strength in the game of Go. We defined a search space based on the Simplified Molecular-Input Line-Entry System (SMILES), a way of describing molecules as ASCII strings that is popular among chemists. Our tool ChemTS combines a recurrent neural network (RNN) model that generates SMILES strings, the Monte-Carlo tree search algorithm, and computational chemistry simulators to find SMILES strings with high "scores", which are candidates for promising new molecules. This AlphaGo-like approach obtained sufficiently promising results as a proof-of-concept study.
Learning for Single-Shot Confidence Calibration in Deep Neural Networks through Stochastic Inferences
Bohyung Han (Seoul National University)
I present a generic framework to calibrate the accuracy and confidence (score) of a prediction through stochastic inferences in deep neural networks. Our algorithm is motivated by the observation that the accuracy and score of a prediction are highly correlated with the variance of multiple stochastic inferences given by stochastic depth or dropout. We design a novel variance-weighted confidence-integrated loss function composed of the standard cross-entropy loss and the KL divergence from the uniform distribution, where the two terms are balanced based on the variance of the stochastic prediction scores. The proposed algorithm achieves outstanding confidence calibration performance and improved classification accuracy with two popular stochastic regularization techniques, stochastic depth and dropout, in multiple models and datasets; it significantly alleviates the overconfidence issue in deep neural networks by training the networks to achieve prediction accuracy proportional to the confidence of the prediction.
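The shape of such a loss can be sketched as follows. This is a simplified illustration of the idea described above (a variance-balanced mix of cross-entropy and a KL term against the uniform distribution); the exact normalization and weighting in the paper differ, so treat the constants here as assumptions:

```python
import numpy as np

def vwci_loss_sketch(stoch_probs, label, eps=1e-12):
    """Variance-weighted mix of cross-entropy and KL(uniform || p).
    stoch_probs: (T, C) softmax outputs from T stochastic forward passes."""
    mean_p = stoch_probs.mean(axis=0)
    C = stoch_probs.shape[1]
    # normalized variance of the true-class score across passes (in [0, 1]);
    # 0.25 is the maximum possible variance of a [0, 1]-valued variable
    alpha = min(1.0, stoch_probs[:, label].var() / 0.25)
    ce = -np.log(mean_p[label] + eps)                    # cross-entropy term
    kl_u = -np.log(C) - np.log(mean_p + eps).mean()      # KL(U || mean_p)
    return (1 - alpha) * ce + alpha * kl_u

consistent = np.tile([0.9, 0.05, 0.05], (5, 1))          # low-variance passes
noisy = np.array([[0.9, 0.05, 0.05], [0.1, 0.45, 0.45]] * 2
                 + [[0.9, 0.05, 0.05]])                  # high-variance passes
loss_lo = vwci_loss_sketch(consistent, label=0)
loss_hi = vwci_loss_sketch(noisy, label=0)
```

When the stochastic passes agree (low variance), the cross-entropy term dominates and sharp predictions are rewarded; when they disagree, the KL term pulls the prediction toward the uniform distribution, discouraging overconfidence.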
Covariance matrices and covariance operators: Theory and Applications
Ha Quang Minh (RIKEN AIP)
Symmetric positive definite (SPD) matrices, in particular covariance matrices, play important roles in many areas of mathematics and statistics, with numerous applications in various fields, including machine learning, brain imaging, and computer vision. The set of SPD matrices is not a subspace of Euclidean space, and consequently algorithms using the Euclidean metric tend to be suboptimal in practice. Much recent research has therefore focused on exploiting the intrinsic geometrical structure of SPD matrices, in particular the view of this set as a Riemannian manifold. In this talk, we will present a survey of some recent developments in the generalization of finite-dimensional covariance matrices to infinite-dimensional covariance operators via kernel methods, along with the corresponding geometrical structures. This direction exploits the power of kernel methods from machine learning in a geometrical framework, both mathematically and algorithmically. The theoretical formulation will be illustrated with applications in computer vision, which demonstrate the power both of kernel covariance operators and of the algorithms based on their intrinsic geometry.
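One concrete example of such an intrinsic geometry is the log-Euclidean metric on SPD matrices, sketched below via eigendecomposition. This covers only the finite-dimensional case; the infinite-dimensional operator versions discussed in the talk require additional regularization and care:

```python
import numpy as np

def spd_log(A):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.T

def log_euclidean_dist(A, B):
    """Log-Euclidean distance ||log A - log B||_F between SPD matrices."""
    return np.linalg.norm(spd_log(A) - spd_log(B))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.eye(2)
d = log_euclidean_dist(A, B)
# Unlike the Euclidean metric, this distance is invariant under inversion,
# since log(A^-1) = -log(A):
d_inv = log_euclidean_dist(np.linalg.inv(A), np.linalg.inv(B))
```

Flattening the curved SPD manifold through the matrix logarithm is what lets standard Euclidean algorithms be applied while respecting, at least partially, the intrinsic geometry.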
Posters

Ryuichiro Hataya (RIKEN AIP)
Investigating CNNs' Learning Representation under label noise 
Gunsoo Yoon (POSTECH)
Universality of localized explosive transport in magnetically confined plasma confirmed by deep learning technique 
Chao Li (RIKEN AIP)
Reshuffled Tensor Decomposition with Exact Recovery of Low-rank Components 
Cheongjae Jang (Seoul National University)
Riemannian Distortion Measures for Non-Euclidean Data 
Tanuj Kr Aasawat (RIKEN AIP)
TBD 
Sangwoong Yoon (Seoul National University)
Kullback-Leibler Divergence Estimation using Variationally Weighted Kernel Density Estimators 
Ming Hou (RIKEN AIP)
Generalizing Deep Multi-task Learning with Heterogeneous Structured Networks via Tensor Ring Nets 
YuJung Heo (Seoul National University)
Answerer in questioner's mind: Information theoretic approach to goal-oriented visual dialog 
Tsuyoshi Okita (RIKEN AIP)
Deep transfer learning with point clouds: earthquake frequency prediction 
Takeshi Teshima (RIKEN AIP)
Clipped matrix completion: A remedy for ceiling effects 
YungKyun Noh (Seoul National University)
Generative Local Metric Learning for Nadaraya-Watson Kernel Regression 
Jungtaek Kim (POSTECH)
On local optimizers of acquisition functions in Bayesian optimization 
Yoonho Lee (KAKAO)
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks 
Haixia Zheng (RIKEN AIP)
Early mild cognitive impairment identification using support vector machine-based analysis of resting-state functional connectivity 
Sangseon Lee (Seoul National University)
Cancer cell modeling by deep pathway attention and propagation 
Isao Ishikawa (RIKEN AIP)
Metric on nonlinear dynamical systems with Perron-Frobenius operators 
Dohoon Lee (Seoul National University)
PRISM: Methylation Pattern-based, Reference-free Inference of Subclonal Makeup 
Masahiro Ikeda (RIKEN AIP)
Finding Cheeger Cuts on Hypergraph via Heat Equation 
Kyoung Woon On (Seoul National University)
Sequential Structure Learning with Temporal Dependency Networks for Video Understanding 
Tomohisa Okazaki (RIKEN AIP)
Synthesis of earthquake ground motions using embedding and neural networks 
Jinwon Choi (Seoul National University)
Trajectory-planning-based reinforcement learning 
Saehoon Kim (AITRICS)
Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout 
Seunghyeon Kim (Seoul National University)
Efficient Neural Network Compression via Transfer Learning for Industrial Optical Inspection 
Seongho Son (NCSOFT)
Applying Deep Reinforcement Learning to One-on-one Fighting of a Complex Commercial Game 'Blade and Soul' 
Tam Le (RIKEN AIP)
Tree-Sliced Approximation of Wasserstein Distances / Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams 
Jongmin Lee (KAIST)
Monte-Carlo Tree Search for Constrained POMDPs
Participants
[Volunteer] Jinwon Choi (Department of Mechanical Engineering, Seoul National University)
reinforcement learning, guided policy search, trajectory planning
[Volunteer] Cheongjae Jang (Seoul National University)
manifold learning for non-Euclidean data, robotics
[Volunteer] Sangwoong Yoon (Seoul National University)
Nonparametric methods, decision making under uncertainty, ML for natural science (chemistry, meteorology), ML for finance
[Volunteer] Yonghyun Nam (Ajou University)
Biomedical informatics, semi-supervised learning
[Volunteer] YuJung Heo (Seoul National University)
Multimodal Learning, Intersection between Computer Vision and Natural Language Processing, Video Learning
[Volunteer] Seunghyeon Kim (Seoul National University Robotics Laboratory)
Industrial optical inspection, Transfer learning, Model compression
WooYoung Ahn (Seoul National University, Department of Psychology & Biology (Adjunct))
Decision neuroscience, reinforcement learning, computational modeling, neuroimaging
Frank Chongwoo Park (Mechanical and Aerospace Engineering, Seoul National University)
Robotics, Vision and Image Processing, Machine Learning
Sangseon Lee (Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea)
Bioinformatics, Machine Learning, Information theory
Bohyung Han (ECE, Seoul National University)
Computer vision, machine learning, deep learning
Ha Quang Minh (RIKEN AIP)
Functional analytic methods, reproducing kernel Hilbert spaces, covariance matrices and operators, Riemannian geometry
Jaeyoung Kim (Seoul National University Human Computer Interaction Lab)
Recommender systems, Explainable AI
EunSol Kim (Kakao Brain)
Graph Neural Networks, Multimodal Learning
Jungtaek Kim (POSTECH)
Bayesian optimization, set-input models
Sungbin Lim (Kakao Brain)
AutoML / Uncertainty / RL
Jongmin Lee (KAIST)
reinforcement learning, sequential decision-making problems
Bahareh Kalantar (RIKEN AIP, Goal-Oriented Technology Research Group, Disaster Resilience Science Team)
Landslide, Machine learning, Conditioning factors, Factor correlation
Juhan Kim (Korea Institute for Advanced Study)
Cosmological simulations with N-body and hydrodynamics
Chao Li (Tensor Learning Unit, RIKEN AIP)
Tensor decomposition, linear inverse problem, and machine learning
Runa Eschenhagen (Approximate Bayesian Inference Team at RIKEN AIP)
Bayesian inference, Bayesian deep learning, and continual learning
Seungjin Choi (POSTECH)
Bayesian models, probabilistic inference, meta-learning
Hyunjoo Jung (Samsung Research)
DNN model compression, Video/Image recognition
Inkwon Choi (Samsung Research, AI Center, AI Core Team)
deep learning, neural network compression, neural architecture search, automl, reinforcement learning, machine learning
Yoonho Lee (Kakao Corporation)
meta-learning, few-shot learning, permutation-invariant models
JillJênn Vie (RIKEN AIP)
matrix factorization, deep generative models, educational data mining
Dohoon Lee (Seoul National University)
Bioinformatics, Machine learning, Cancer biology
Geonhyeong Kim (KAIST)
reinforcement learning, variational inference
Saehoon Kim (AITRICS)
Bayesian Deep Learning, Machine Learning for Healthcare
Tomohisa Okazaki (RIKEN AIP)
Seismology, Machine learning
Kyoung Woon On (Department of Computer Science and Engineering, Seoul National University)
Machine learning, Video processing, Data structure learning
Tam Le (RIKEN AIP)
optimal transport, Riemannian manifold, geometric machine learning, topological data analysis, kernel methods, parametric optimization, metric learning
Takeshi Teshima (RIKEN AIP / Sugiyama-Sato-Honda Lab at the University of Tokyo)
Matrix completion, causal inference, semi-supervised learning
Kanghoon Lee (SK TBrain)
Reinforcement learning, (partially observable) Markov decision process, and spoken dialogue system
Krikamol Muandet (Max Planck Institute for Intelligent Systems)
Counterfactual Policy Evaluation and Optimization in Reproducing Kernel Hilbert Spaces
Ryuichiro Hataya (RIKEN AIP and The University of Tokyo)
deep learning under weak supervision
Taiji Suzuki (The University of Tokyo / RIKEN AIP)
statistical learning theory, stochastic optimization, deep learning, kernel method, high dimensional statistics, concentration inequality
Koji Tsuda (The University of Tokyo / RIKEN AIP)
Kazuki Yoshizoe (RIKEN AIP)
Search algorithms and parallel computing
Tsuyoshi Okita (RIKEN AIP)
deep learning for disaster resilience
YungKyun Noh (Seoul National University → Hanyang University)
Nonparametric methods, learning theory
YounSung Lee (Korea Electronics Technology Institute (KETI))
Machine Learning, Signal Processing, VLSI architecture
Hojin Jung (Samsung Research)
Natural language processing
Seongho Son (NCSOFT Reinforcement Learning Team)
Reinforcement Learning for video games, Bayesian Deep Learning
Jinwook Seo (Seoul National University)
Visualization, HCI
Nayeong Kim (POSTECH)
Interpretable machine learning, Explainable machine learning (XAI), Disentangled representation, Anomaly detection
Tanuj Kr Aasawat (RIKEN AIP)
HPC, Large-scale Graph Processing
Francis Hamilton (Technical University of Denmark / RIKEN AIP)
Deep RL for robotic mapping
Jaejin Lee (Seoul National University)
Masashi Sugiyama (The University of Tokyo / RIKEN AIP)