Research Overview

I am broadly interested in designing efficient machine learning (ML) algorithms and systems. In particular, I focus on 1) characterizing the fundamental limits on the amount and quality of data required for reliable learning, and designing computationally efficient algorithms, 2) developing solutions to the ‘learning with scarce data’ problem, and 3) developing coding-inspired algorithms for scalable ML systems. Put simply, my research aims to answer the following three questions:

Q1. How much data is needed for reliable ML?
Q2. What should I do if I don't have enough data?
Q3. How can we design scalable ML systems?

I also develop new ML algorithms and apply them to real-world problems.

Q4. How can we solve real-world problems using ML?

Some of my recent work is grouped by question below.

Q1. How much data is needed for reliable ML? How can we design efficient algorithms?

  • SAFFRON: Sparse-Graph Code Framework for Group Testing
    K. Lee, R. Pedarsani, and K. Ramchandran
    IEEE Transactions on Signal Processing 2019

  • Community Recovery in Hypergraphs
    K. Ahn*, K. Lee*, and C. Suh
    IEEE Transactions on Information Theory 2019

  • Hypergraph Spectral Clustering in the Weighted Stochastic Block Model
    K. Ahn, K. Lee, and C. Suh
    IEEE Journal of Selected Topics in Signal Processing October 2018

  • Information-theoretic Limits of Subspace Clustering
    K. Ahn, K. Lee, and C. Suh
    IEEE ISIT 2017

  • PhaseCode: Fast and Efficient Compressive Phase Retrieval based on Sparse-Graph-Codes
    R. Pedarsani, D. Yin, K. Lee, and K. Ramchandran
    IEEE Transactions on Information Theory June 2017

Q2. What should I do if I don't have enough data?

  • Binary Rating Estimation with Graph Side Information
    K. Ahn, K. Lee, H. Cha, and C. Suh
    NeurIPS 2018

  • Crash to Not Crash: Learn to Identify Dangerous Vehicles using a Simulator
    H. Kim, K. Lee, G. Hwang, and C. Suh
    AAAI 2019, ICML Workshop on Machine Learning for Autonomous Vehicles 2017

  • Simulated+Unsupervised Learning With Adaptive Data Generation and Bidirectional Mappings
    K. Lee*, H. Kim*, and C. Suh
    ICLR 2018

Q3. How can we design scalable ML systems?

  • UberShuffle: Communication-efficient Data Shuffling for SGD via Coding Theory
    J. Chung, K. Lee, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran
    SysML 2018, NIPS Workshop on Machine Learning Systems 2017

  • Speeding Up Distributed Machine Learning Using Codes
    K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran
    IEEE Transactions on Information Theory January 2018

  • High-Dimensional Coded Matrix Multiplication
    K. Lee, C. Suh, and K. Ramchandran
    IEEE ISIT 2017

  • The MDS Queue: Analysing the Latency Performance of Erasure Codes
    K. Lee, N. Shah, L. Huang, and K. Ramchandran
    IEEE Transactions on Information Theory May 2017

  • On Scheduling Redundant Requests With Cancellation Overheads
    K. Lee, R. Pedarsani, and K. Ramchandran
    IEEE/ACM Transactions on Networking April 2017

  • When Do Redundant Requests Reduce Latency?
    N. Shah, K. Lee, and K. Ramchandran
    IEEE Transactions on Communications February 2016

  • A VoD System for Massively Scaled, Heterogeneous Environments: Design and Implementation
    K. Lee, L. Yan, A. Parekh, and K. Ramchandran
    IEEE MASCOTS 2013

Q4. How can we solve real-world problems using ML?

  • Improving Model Robustness via Automatically Incorporating Self-supervision Tasks
    D. Kim, K. Lee, and C. Suh
    NeurIPS 2019 Workshop on Meta-Learning (MetaLearn 2019)

  • Large-scale and Interpretable Collaborative Filtering for Educational Data
    K. Lee, J. Chung, and C. Suh
    KDD Workshop on Advancing Education with Data 2017

  • Machine Learning Approaches for Learning Analytics: Collaborative Filtering or Regression With Experts?
    K. Lee, J. Chung, Y. Cha, and C. Suh
    NIPS Workshop on Machine Learning for Education 2016