Adaptivity of deep ReLU network and its generalization error analysis

Taiji Suzuki (The University of Tokyo / RIKEN-AIP)

Deep learning has shown high performance in various types of tasks, from visual recognition to natural language processing, which indicates the superior flexibility and adaptivity of deep learning. To understand this phenomenon theoretically, we develop a new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness. The Besov space is a considerably general function space that includes the Hölder space and the Sobolev space, and in particular can capture spatial inhomogeneity of smoothness. Through the analysis, it is shown that deep learning can achieve the minimax optimal rate and outperform any non-adaptive (linear) estimator such as kernel ridge regression, which shows that deep learning adapts better to the spatial inhomogeneity of the target function than other estimators such as linear ones. The essence behind the theory is the fact that deep learning can construct bases adaptively for each target function. This point shares several similarities with sparse learning methods such as L0 regularization and low-rank matrix estimation. In particular, the non-convexity and sparsity of the model space are important. I will talk about the performance analysis of deep learning, emphasizing the connection to sparse estimation problems.

Graph-based Semi-Supervised Learning for Genome, Diseasome, and Drugome

Hyunjung (Helen) Shin (Ajou University)

In this talk, we present several applications of graph-based machine learning algorithms to networks of the genome, diseasome, and drugome. In the genome application, we introduce Gene Ranker, which produces scores for genes using a graph-based semi-supervised algorithm. On a genome network built from protein interaction data and WGCNA data of immune patients, it ranks key genes of immune diseases. In the diseasome application, we present an approach for disease co-occurrence scoring based on a gene-disease-symptom network. For this, an algorithm for hierarchically structured networks is proposed. The similarity matrix of the hierarchical structure is huge, sparse, and tri-diagonal. To make matters worse, the sub-matrices are not necessarily square. Therefore, we propose an algorithm that not only alleviates the problems of non-squareness and sparseness but also solves the scalability problem.

Counterfactual Policy Evaluation and Optimization in Reproducing Kernel Hilbert Spaces

Krikamol Muandet (Max Planck Institute for Intelligent Systems)

In this talk, I will discuss the problem of evaluating and learning optimal policies directly from observational (i.e., non-randomized) data using a novel framework called counterfactual mean embedding (CME). Identifying optimal policies is crucial for improving decision-making in online advertisement, finance, economics, and medical diagnosis. The classical approach, which is considered the gold standard for identifying optimal policies in these domains, is randomization. For example, A/B testing has become one of the standard tools in online advertisement and recommendation systems. In medical domains, the development of new medical treatments depends exclusively on controlled clinical trials. Unfortunately, randomization in A/B testing and controlled clinical trials may be expensive, time-consuming, unethical, or even impossible to implement in practice. To evaluate a policy from observational data, the CME maps the counterfactual distributions of potential outcomes under different treatments into a reproducing kernel Hilbert space (RKHS). Based on this representation, causal reasoning about the outcomes of different treatments can be performed over the entire landscape of counterfactual distributions using the kernel arsenal. Under some technical assumptions, we can also give a causal explanation of the resulting policies.
Joint work with Sorawit Saengkyongam, Motonobu Kanagawa, and Sanparith Marukatat.

Inference and Estimation using Nearest Neighbors

Yung-Kyun Noh (Seoul National University)

Nearest neighbor methods enjoy a consistency property in theory, which relates the algorithm to the theoretical minimum error, the Bayes error. In spite of this, algorithms using nearest neighbors are not preferred by researchers because they are considered too simple and old-fashioned. However, due to this simplicity, the analysis of nearest neighbor methods is tractable and can produce non-asymptotic theories. These methods have simply not yet been given enough data to realize their theoretical predictions, and the current algorithmic and system technologies are immature. In this talk, I will introduce some of my recent work on models that modify the geometry around the points of interest and perform nearest neighbor methods with many data as if we were effectively using even more data than what is actually given.
We derive equations that take advantage of all the information within finite but many data and achieve inference and estimation results as if we had used infinite data. By doing this, we believe nearest neighbor methods can be considered a breakthrough that attains asymptotic performance through the smart usage of extremely many data.

Riemannian geometry and machine learning for non-Euclidean data

Frank C. Park (Seoul National University)

A growing number of problems in machine learning involve data that is non-Euclidean. A naive application of existing learning algorithms to such data often produces results that depend on the choice of local coordinates used to parametrize the data. At the same time, many problems in machine learning eventually reduce to an optimization problem, in which the objective is to find a mapping from one curved space into another that best preserves distances and angles. We show that these and other problems can be naturally formulated as the minimization of a coordinate-invariant functional that measures the proximity to an isometry of a mapping between two Riemannian manifolds. We first show how to construct general coordinate-invariant functionals of mappings between Riemannian manifolds, and propose a family of functionals that measures how close a mapping is to being an isometry. We then formulate coordinate-invariant distortion measures for manifold learning of non-Euclidean data, and derive gradient-based optimization algorithms that accompany these measures. We also address the problem of autoencoder training for non-Euclidean data using our Riemannian geometric perspective. Case studies in manifold learning and autoencoder training involving non-Euclidean datasets illustrate both the underlying geometric intuition and the advantages of a Riemannian distortion minimization framework.

Knowledge Tracing Machines – Factorization Machines for Educational Data Mining

Jill-Jênn Vie (RIKEN-AIP)

Knowledge tracing is a sequence prediction problem where the goal is to predict the outcomes of students over questions as they interact with a learning platform. By tracking the evolution of a student's knowledge, one can provide feedback and optimize instruction accordingly. Existing methods are based either on temporal latent variable models like deep knowledge tracing (LSTMs), or on factor analysis like item response theory (online logistic regression). We present factorization machines (FMs), a model for regression or classification that encompasses several existing models in the educational data mining literature as special cases, notably the additive factor model, the performance factor model, and multidimensional item response theory. We show, using several real datasets of tens of thousands of users and items, that FMs can estimate student knowledge accurately and fast even when student data is sparsely observed, and can handle side information such as multiple knowledge components and the number of attempts at item or skill level. A tutorial for reproducing our experiments is available, and our article is on arXiv.
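As a rough illustration of the model class named in this abstract, the sketch below evaluates a second-order factorization machine on one sparse student-item encoding and passes the score through a sigmoid to obtain a probability of a correct answer. The feature layout, the parameter values, and the function names `fm_score` and `fm_probability` are assumptions made for this example; they are not taken from the speaker's tutorial or article.

```python
import math

def fm_score(x, w0, w, V):
    """Second-order factorization machine score for one sample.

    x  : feature vector (e.g. one-hot student and item indicators)
    w0 : global bias
    w  : one linear weight per feature
    V  : one k-dimensional embedding per feature (list of lists)
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_probability(x, w0, w, V):
    """Probability of a correct answer via a logistic link."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))

# Hypothetical toy encoding: [student A, student B, item 1, item 2].
x = [1.0, 0.0, 0.0, 1.0]          # student A attempts item 2
w0 = 0.1
w = [0.5, -0.2, 0.3, -0.4]        # made-up ability / easiness weights
V = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.4]]  # made-up embeddings
p = fm_probability(x, w0, w, V)
```

With only the student and item indicators active and the embeddings set to zero, the score reduces to a bias plus a student term plus an item term, which is the sense in which simpler logistic models appear as special cases.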

Modeling (cancer) cells using multi-omics data

Sun Kim (Seoul National University)

Modeling (cancer) cells using multi-omics data is the ultimate research goal in my lab, and we have been making slow but steady progress toward this goal for over a decade. In this talk, I will present three recently submitted (unpublished) manuscripts toward this goal. (1) sequence level: Ranked k-spectrum kernel for comparative and evolutionary comparison of exons, introns, and CpG islands, (2) transcript level: Cancer subtype classification and modeling by pathway attention and propagation, and (3) epigenetic level: PRISM: Methylation Pattern-based, Reference-free Inference of Subclonal Makeup.
The ranked string kernel paper proposes a new string kernel for comparative and evolutionary genomics. Existing string kernel methods have limitations for comparative and evolutionary studies due to their sensitivity to over-represented \(k\)-mers when applied on a genome scale. With this over-representation bias, comparing multiple genomes simultaneously is even more challenging. To address these issues, we propose a novel ranked \(k\)-spectrum string (RKSS) kernel. First, our RKSS kernel utilizes a common \(k\)-mer set across species, or landmarks, that can be used for comparing an arbitrary number of genomes. Second, using landmarks, we can use ranks of \(k\)-mers rather than \(k\)-mer frequencies, which can produce more reliable distances between genomes. Specifically, the RKSS kernel is robust when the \(k\)-mer pattern is highly biased by repetitive elements or copy number variations, as shown in our experiment. To demonstrate the power of the RKSS kernel for comparative and evolutionary sequence comparison, we conducted two experiments using 10 mammalian species.
The two remaining manuscripts will be presented at this meeting as posters by the first authors, so I will explain the research problems and the core concepts of our approaches briefly. The cancer pathway attention and propagation paper models cancer cells using an ensemble of several hundred deep learning pathway models. The two main ideas for this modeling are: when combining hundreds of pathway models, we need to capture context-dependent mechanisms that highlight which pathways are important, hence attention mechanisms; and to achieve explainable cancer cell modeling, we use pathway network propagation. The last paper is about PRISM, a computational method to infer cancer subclones from DNA methylation data. The unknown clone populations are mixed and represented in methylation data, and the goal is to decompose the subclonal populations. The main computational challenges are that the number of dimensions is huge, several hundred million, and the data is error-prone. We address this daunting problem by utilizing the characteristics of DNA methylation data. First, errors are corrected by modeling methylation patterns of the DNMT1 enzyme using an HMM. Then, with error-corrected methylation patterns, PRISM focuses on small individual genomic regions, each of which represents the abundance of a subclone. A set of statistics collected from each genomic region is modeled with a beta-binomial mixture. Fitting the mixture with the expectation-maximization algorithm finally provides the inferred composition of subclones.
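The rank-based idea in the ranked \(k\)-spectrum paragraph can be sketched in a few lines: count \(k\)-mers, keep only those shared by both sequences (the "landmarks"), replace counts by ranks, and compare the rank vectors. This is only an illustration of why ranks damp over-represented \(k\)-mers. The sum of absolute rank differences used here is a stand-in chosen for this example, the function names are hypothetical, and nothing below is the RKSS kernel itself.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def landmark_ranks(counts, landmarks):
    """Rank landmark k-mers by frequency (rank 1 = most frequent).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ordered = sorted(landmarks, key=lambda m: (-counts[m], m))
    return {m: r for r, m in enumerate(ordered, start=1)}

def rank_distance(seq_a, seq_b, k):
    """Sum of absolute rank differences over the shared k-mer set.

    Assumption: this distance is an illustrative choice, not the
    kernel proposed in the talk.
    """
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    landmarks = sorted(set(ca) & set(cb))   # common k-mers only
    ra = landmark_ranks(ca, landmarks)
    rb = landmark_ranks(cb, landmarks)
    return sum(abs(ra[m] - rb[m]) for m in landmarks)

# Made-up toy sequences; the second contains a repetitive "AC" stretch.
d = rank_distance("ACGTACGTAC", "ACACACGTGT", 2)
```

Because a heavily repeated \(k\)-mer can move up by at most a few rank positions, however many extra copies it has, a rank vector changes far less than a raw frequency vector under repeats or copy number changes.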

Weakly Supervised Classification, Robust Learning and More: Overview of Our Recent Advances

Masashi Sugiyama (The University of Tokyo / RIKEN-AIP)

Machine learning has been successfully used to solve various real-world problems. However, we still face many technical challenges: for example, we want to train machine learning systems without big labeled data, and we want to deploy machine learning systems reliably in noisy environments. In this talk, I will give an overview of our recent advances in tackling these problems.

Deep Variational Inference with Common Information Extraction

Young-Han Kim (UC San Diego)

Inspired by the channel synthesis problem in network information theory, we propose a new variational statistical inference approach that learns a succinct generative model for two data variables based on the notion of Wyner's common information. This generative model includes two groups of latent variables: first, the common latent variable that captures the common information (e.g., a shared concept) of the two data variables, and second, the local latent variables that capture the remaining randomness (e.g., texture and style) in the respective data variables. A simple training scheme for the variational model is presented, as well as methods for controlling the amount of common information extraction. The utility of the proposed approach and the accompanying training techniques is demonstrated through experiments on joint generation, conditional generation, and style transfer using synthetic data and real images.


Bahareh Kalantar (RIKEN-AIP)

This study investigates the effectiveness of two sets of landslide conditioning variables. Fourteen landslide conditioning variables were considered in this study and divided into two sets, G1 and G2. Two Support Vector Machine (SVM) classifiers, SVM-G1 and SVM-G2, were constructed, one on each set, in order to determine which set is more suitable for landslide susceptibility prediction. In total, 160 landslide inventory datasets of the study area were used, of which 70% were used for SVM training and 30% for testing. The intra-relationships between parameters were explored based on variance inflation factors (VIF), Pearson’s correlation, and Cohen’s kappa analysis. The area under the curve (AUC) was used as an additional evaluation metric.
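To make the collinearity checks mentioned in this abstract concrete, the sketch below computes Pearson's correlation between two conditioning variables and the corresponding variance inflation factor. It relies on the fact that with exactly two predictors the VIF equals 1 / (1 - r^2); with more predictors, r^2 would be replaced by the R^2 of regressing one variable on all the others. The variable names and values are made up for illustration and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def vif_two_predictors(x, y):
    """VIF of either variable when x and y are the only two predictors.

    Raises ZeroDivisionError when the variables are perfectly collinear.
    """
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical values of two conditioning variables at four sites.
var_a = [1.0, 2.0, 3.0, 4.0]
var_b = [1.0, 3.0, 2.0, 4.0]
r = pearson(var_a, var_b)
vif = vif_two_predictors(var_a, var_b)
```

A VIF close to 1 indicates little linear redundancy between the variables, while large values signal that one variable is nearly a linear function of the other.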

Designing Materials with Machine Learning and Quantum Annealing

Koji Tsuda (RIKEN-AIP / The University of Tokyo)

The scientific process of discovering new knowledge is often characterized as a search over a vast space of candidates, and machine learning can accelerate it by properly modeling the data and suggesting candidates for the next experiment. In many cases, experiments can be substituted by simulations such as first-principles calculations. After reviewing basic machine learning techniques for materials design, I will present successful case studies including the design of Si-Ge nanostructures and layered thermal radiators. Finally, I will show how a D-Wave quantum annealer can be used for complex materials design.

Reshuffled Tensor Decomposition with Exact Recovery of Low-rank Components


Low-rank tensor decomposition (TD) is a promising approach for the analysis and understanding of real-world data. Many such analyses require correct recovery of the true components from the observed tensor, but this property is not achieved by many existing TD methods. To exactly recover the true components, we introduce a general class of tensor decomposition models where the tensor is decomposed as the sum of reshuffled low-rank components. The reshuffling operation generalizes the conventional folding (tensorization) operation, and also provides additional flexibility to recover the true components of complex data structures. We develop a simple convex algorithm called Reshuffled-TD, and theoretically prove that exact recovery is guaranteed when a type of incoherence measure is upper bounded. The results on image steganography show that our method obtains state-of-the-art performance and demonstrate the effectiveness of our method in practice.

Accelerating DNNs using Heterogeneous Clusters

Jaejin Lee (Seoul National University)

Heterogeneous systems that are based on GPUs and FPGAs are widening their user base. In fact, GPU-based heterogeneous systems are the de facto standard for running deep learning applications. Especially in the post-Moore era, the role of accelerator-based heterogeneous systems is becoming more important. The high-performance computing community quickly recognized that deep learning applications have very similar characteristics to large-scale HPC applications. There are many ways to parallelize and accelerate DNN models depending on the type of target architecture. In this talk, we first introduce current trends in heterogeneous computing systems. Then, we will introduce ongoing research efforts in the multicore computing research laboratory at Seoul National University to achieve ease of programming and high performance on GPU- and FPGA-based heterogeneous clusters.

Deep Learning and Tree Search Finds New Molecules

Kazuki Yoshizoe (RIKEN AIP)

De novo molecular generation is the problem of finding new chemical compounds. We tackle this problem by combining deep neural networks and search algorithms, the same combination that was used for the first version of the AlphaGo program, which achieved super-human strength in the game of Go. We have defined a search space based on the Simplified Molecular-Input Line-Entry System (SMILES), which is a way of describing molecules as ASCII strings that is popular among chemists. Our tool ChemTS utilizes a Recurrent Neural Network (RNN) model that generates SMILES strings, a Monte-Carlo Tree Search algorithm, and computational chemistry simulators to find SMILES strings with high "scores", which are the candidates for promising new molecules. This AlphaGo-like approach obtained sufficiently promising results as a proof-of-concept study.

Learning for Single-Shot Confidence Calibration in Deep Neural Networks through Stochastic Inferences

Bohyung Han (Seoul National University)

I present a generic framework to calibrate the accuracy and confidence (score) of a prediction through stochastic inferences in deep neural networks. Our algorithm is motivated by the fact that the accuracy and score of a prediction are highly correlated with the variance of multiple stochastic inferences given by stochastic depth or dropout. We design a novel variance-weighted confidence-integrated loss function that is composed of the standard cross-entropy loss and the KL-divergence from the uniform distribution, where the two terms are balanced based on the variance of stochastic prediction scores. The proposed algorithm presents outstanding confidence calibration performance and improved classification accuracy with two popular stochastic regularization techniques, stochastic depth and dropout, in multiple models and datasets; it alleviates the overconfidence issue in deep neural networks significantly by training the networks to achieve prediction accuracy proportional to the confidence of the prediction.
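One way to read the loss described in this abstract is sketched below: several stochastic forward passes give a per-example variance, which sets a weight between the cross-entropy term and a KL term that pulls the prediction toward the uniform distribution. The direction of the KL term (uniform relative to the prediction), the use of the mean prediction, the clipping of the variance into a weight `alpha`, and the `alpha_scale` parameter are all assumptions made for this illustration and may differ from the speaker's formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def variance_weighted_loss(stochastic_logits, label, alpha_scale=1.0):
    """Blend cross-entropy and KL(uniform || prediction) by predictive variance.

    stochastic_logits : T logit vectors from T stochastic passes
                        (e.g. different dropout masks) for one example
    label             : index of the ground-truth class
    alpha_scale       : hypothetical factor mapping variance to a weight
    Returns (loss, alpha).
    """
    probs = [softmax(l) for l in stochastic_logits]
    T, C = len(probs), len(probs[0])
    mean = [sum(p[c] for p in probs) / T for c in range(C)]
    # Variance across passes, averaged over classes.
    var = sum(
        sum((p[c] - mean[c]) ** 2 for p in probs) / T for c in range(C)
    ) / C
    alpha = min(1.0, alpha_scale * var)  # assumed normalization into [0, 1]
    ce = -math.log(mean[label])
    kl_uniform = sum(
        (1.0 / C) * math.log((1.0 / C) / mean[c]) for c in range(C)
    )
    return (1.0 - alpha) * ce + alpha * kl_uniform, alpha

# Passes that agree give zero variance, so only cross-entropy remains.
loss_agree, alpha_agree = variance_weighted_loss([[0.0, 0.0], [0.0, 0.0]], 0)
# Passes that disagree raise the weight on the uniform-KL term.
loss_split, alpha_split = variance_weighted_loss([[5.0, 0.0], [0.0, 5.0]], 0)
```

The design intent captured here is that examples on which the stochastic passes disagree are pushed toward less confident predictions, while stable examples are trained with ordinary cross-entropy.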

Covariance matrices and covariance operators: Theory and Applications

Ha Quang Minh (RIKEN-AIP)

Symmetric positive definite (SPD) matrices, in particular covariance matrices, play important roles in many areas of mathematics and statistics, with numerous applications in various fields, including machine learning, brain imaging, and computer vision. The set of SPD matrices is not a subspace of Euclidean space, and consequently algorithms utilizing the Euclidean metric tend to be suboptimal in practice. Much recent research has therefore focused on exploiting the intrinsic geometrical structures of SPD matrices, in particular the view of this set as a Riemannian manifold. In this talk, we will present a survey of some of the recent developments in the generalization of finite-dimensional covariance matrices to infinite-dimensional covariance operators via kernel methods, along with the corresponding geometrical structures. This direction exploits the power of kernel methods from machine learning in a geometrical framework, both mathematically and algorithmically. The theoretical formulation will be illustrated with applications in computer vision, which demonstrate the power of both kernel covariance operators and the algorithms based on their intrinsic geometry.


  1. Ryuichiro Hataya (RIKEN-AIP)
    Investigating CNNs' Learning Representation under label noise
  2. Gunsoo Yoon (POSTECH)
    Universality of localized explosive transport in magnetically confined plasma confirmed by deep learning technique
  3. Chao Li (RIKEN-AIP)
    Reshuffled Tensor Decomposition with Exact Recovery of Low-rank Components
  4. Cheongjae Jang (Seoul National University)
    Riemannian Distortion Measures for Non-Euclidean Data
  5. Tanuj Kr Aasawat (RIKEN-AIP)
  6. Sangwoong Yoon (Seoul National University)
    Kullback-Leibler Divergence Estimation using Variationally Weighted Kernel Density Estimators
  7. Ming Hou (RIKEN-AIP)
    Generalizing Deep Multi-task Learning with Heterogenous Structured Networks via Tensor Ring Nets
  8. Yu-Jung Heo (Seoul National University)
    Answerer in questioner's mind: Information theoretic approach to goal-oriented visual dialog
  9. Tsuyoshi Okita (RIKEN-AIP)
    Deep transfer learning with point clouds: earthquake frequency prediction
  10. Takeshi Teshima (RIKEN-AIP)
    Clipped matrix completion: A remedy for ceiling effects
  11. Yung-Kyun Noh (Seoul National University)
    Generative Local Metric Learning for Nadaraya-Watson Kernel Regression
  12. Jungtaek Kim (POSTECH)
    On local optimizers of acquisition functions in Bayesian optimization
  13. Yoonho Lee (KAKAO)
    Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
  14. Haixia Zheng (RIKEN-AIP)
    Early mild cognitive impairment identification using support vector machine-based analysis of resting-state functional connectivity
  15. Sangseon Lee (Seoul National University)
    Cancer cell modeling by deep pathway attention and propagation
  16. Isao Ishikawa (RIKEN-AIP)
    Metric on nonlinear dynamical systems with Perron-Frobenius operators
  17. Dohoon Lee (Seoul National University)
    PRISM: Methylation Pattern-based, Reference-free Inference of Subclonal Makeup
  18. Masahiro Ikeda (RIKEN-AIP)
    Finding Cheeger Cuts on Hypergraph via Heat Equation
  19. Kyoung Woon On (Seoul National University)
    Sequential Structure Learning with Temporal Dependency Networks for Video Understanding
  20. Tomohisa Okazaki (RIKEN-AIP)
    Synthesis of earthquake ground motions using embedding and neural networks
  21. Jinwon Choi (Seoul National University)
    Trajectory planning based reinforcement learning
  22. Saehoon Kim (AITRICS)
    Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout
  23. Seunghyeon Kim (Seoul National University)
    Efficient Neural Network Compression via Transfer Learning for Industrial Optical Inspection
  24. Seongho Son (NCSOFT)
    Applying Deep Reinforcement Learning to One-on-one Fighting of a Complex Commercial Game 'Blade and Soul'
  25. Tam Le (RIKEN-AIP)
    Tree-Sliced Approximation of Wasserstein Distances / Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams
  26. Jongmin Lee (KAIST)
    Monte-Carlo Tree Search for Constrained POMDPs



[Volunteer] Jinwon Choi (Department of Mechanical Engineering, Seoul National University)

reinforcement learning, guided policy search, trajectory planning

[Volunteer] Cheongjae Jang (Seoul National University)

manifold learning for non-Euclidean data, robotics

[Volunteer] Sangwoong Yoon (Seoul National University)

Nonparametric methods, decision making under uncertainty, ML for natural science (chemistry, meteorology), ML for finance

[Volunteer] Yonghyun Nam (Ajou University)

Biomedical informatics, semi-supervised learning

[Volunteer] Yu-Jung Heo (Seoul National University)

Multimodal Learning, Intersection between Computer Vision and Natural Language Processing, Video Learning

[Volunteer] Seunghyeon Kim (Seoul National University Robotics Laboratory)

Industrial optical inspection, Transfer learning, Model compression

Woo-Young Ahn (Seoul National University, Department of Psychology & Biology (Adjunct))

Decision neuroscience, reinforcement learning, computational modeling, neuroimaging

Frank Chongwoo Park (Mechanical and Aerospace Engineering, Seoul National University)

Robotics, Vision and Image Processing, Machine Learning

Sangseon Lee (Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea)

Bioinformatics, Machine Learning, Information theory

Bohyung Han (ECE, Seoul National University)

Computer vision, machine learning, deep learning

Ha Quang Minh (RIKEN-AIP)

Functional analytic methods, reproducing kernel Hilbert spaces, covariance matrices and operators, Riemannian geometry

Jaeyoung Kim (Seoul National University Human Computer Interaction Lab)

Recommender system, Explainable AI

Eun-Sol Kim (Kakao Brain)

Graph Neural Networks, Multimodal Learning

Jungtaek Kim (POSTECH)

Bayesian optimization, set-input models

Sungbin Lim (Kakao Brain)

AutoML / Uncertainty / RL

Jongmin Lee (KAIST)

reinforcement learning, sequential decision-making problems

Bahareh Kalantar (RIKEN-AIP, Goal-Oriented Technology Research Group, Disaster Resilience Science Team)

Landslide, Machine learning, Conditioning factors, Factor correlation

Juhan Kim (Korea Institute for Advanced Study)

Cosmological simulations with N-body and hydrodynamics

Chao Li (Tensor Learning Unit, RIKEN-AIP)

Tensor decomposition, linear inverse problem, and machine learning

Runa Eschenhagen (Approximate Bayesian Inference Team at RIKEN AIP)

Bayesian inference, Bayesian deep learning, and continual learning

Seungjin Choi (POSTECH)

Bayesian models, probabilistic inference, meta-learning

Hyunjoo Jung (Samsung Research)

DNN model compression, Video/Image recognition

Inkwon Choi (Samsung Research, AI Center, AI Core Team)

deep learning, neural network compression, neural architecture search, automl, reinforcement learning, machine learning

Yoonho Lee (Kakao Corporation)

meta-learning, few-shot learning, permutation-invariant models

Jill-Jênn Vie (RIKEN AIP)

matrix factorization, deep generative models, educational data mining

Dohoon Lee (Seoul National University)

Bioinformatics, Machine learning, Cancer biology

Geonhyeong Kim (KAIST)

reinforcement learning, variational inference

Saehoon Kim (AITRICS)

Bayesian Deep Learning, Machine Learning for Healthcare

Tomohisa Okazaki (RIKEN-AIP)

Seismology, Machine learning

Kyoung Woon On (Department of Computer Science and Engineering, Seoul National University)

Machine learning, Video processing, Data structure learning


optimal transport, Riemannian manifold, geometric machine learning, topological data analysis, kernel methods, parametric optimization, metric learning

Takeshi Teshima (RIKEN-AIP / Sugiyama-Sato-Honda Lab at the University of Tokyo)

Matrix completion, causal inference, semi-supervised learning

Kanghoon Lee (SK T-Brain)

Reinforcement learning, (partially observable) Markov decision process, and spoken dialogue system

Krikamol Muandet (Max Planck Institute for Intelligent Systems)

Counterfactual Policy Evaluation and Optimization in Reproducing Kernel Hilbert Spaces

Ryuichiro Hataya (RIKEN AIP and The University of Tokyo)

deep learning under weak supervision

Taiji Suzuki (The University of Tokyo / RIKEN-AIP)

statistical learning theory, stochastic optimization, deep learning, kernel method, high dimensional statistics, concentration inequality

Koji Tsuda (The University of Tokyo / RIKEN-AIP)

Kazuki Yoshizoe (RIKEN AIP)

Search algorithms and parallel computing

Tsuyoshi Okita (RIKEN AIP)

deep learning for disaster resilience

Yung-Kyun Noh (Seoul National University → Hanyang University)

Nonparametric methods, learning theory

Youn-Sung Lee (Korea Electronics Technology Institute (KETI))

Machine Learning, Signal Processing, VLSI architecture

Hojin Jung (Samsung Research)

Natural language processing

Seongho Son (NCSOFT Reinforcement Learning Team)

Reinforcement Learning for video games, Bayesian Deep Learning

Jinwook Seo (Seoul National University)

Visualization, HCI

Nayeong Kim (POSTECH)

Interpretable machine learning, Explainable machine learning (XAI), Disentangled representation, Anomaly Detection

Tanuj Kr Aasawat (RIKEN AIP)

HPC, Large-scale Graph Processing

Francis Hamilton (Technical University of Denmark / RIKEN-AIP)

Deep RL for robotic mapping

Jaejin Lee (Seoul National University)

Masashi Sugiyama (The University of Tokyo / RIKEN-AIP)