Publications

You can also find my articles on my Google Scholar profile.

Selected Publications


  • "PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL", SIGMOD
  • "Multi-Objective Agentic Rewrites for Unstructured Data Processing", VLDB 2026
  • "Fine-Grained Table Retrieval Through the Lens of Complex Queries", arXiv preprint
  • "Architecting the AI-Powered Agentic Data Cloud"
  • "100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models", SIGMOD
  • "LLMs and Databases: A Synergistic Approach to Data Utilization", IEEE Data Engineering Bulletin
  • "Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL", VLDB
  • "High-Fidelity And Complex Test Data Generation For Real-World SQL Code Generation Services", arXiv preprint
  • "Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving", arXiv preprint
  • "CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL", ICLR
  • "Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach", TKDE
  • "Slice Finder: Automated Data Slicing for Model Validation", ICDE
  • "Quantifying Uncertainty in Data Exploration", Brown University
  • "Democratizing Data Science through Interactive Curation of ML Pipelines", SIGMOD
  • "Towards Quantifying Uncertainty in Data Analysis & Exploration", IEEE Data Engineering Bulletin 2018
  • "Towards Interactive Curation & Automatic Tuning of ML Pipelines", MLSys
  • "Improved Neighborhood Search for Collaborative Filtering", IJFIS
  • "Estimating the Impact of Unknown Unknowns on Aggregate Query Results", TODS
  • "Unknown examples & machine learning model generalization", arXiv preprint 2018
  • "Towards Interactive Data Exploration", BIRTE
  • "A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets", VLDB
  • "Estimating the Impact of Unknown Unknowns on Aggregate Query Results", SIGMOD
  • "Using RDMA for Lock Management", arXiv preprint
  • "Estimating the Impact of Unknown Unknowns on Aggregate Query Results", arXiv preprint
  • "TV program recommendation method using LDA clustering", HCI Korea 2014
  • "Semi-supervised learning for sentiment analysis in mass social media", Journal of Korean Institute of Intelligent Systems
  • "Distributed Twitter Opinion Mining System Using MongoDB Aggregation Framework", HCI Korea
  • "Sentiment Analysis Using News Comments for Public Opinion Mining", Journal of Korean Institute of Intelligent Systems
  • "Personalized Expert-Based Recommender System: Training C-SVM for Personalized Expert Identification", MLDM
  • "Game bot detection approach based on behavior analysis and consideration of various play styles", ETRI Journal
  • "BitTorrent Network Traffic Analysis for Peer Link Prediction", Korea Information Processing Society Conference
  • "Torrent Crawler: a tool for collecting information from BitTorrent networks", Cornell University

Technical Reports


  • "High-Performance Llama 2 Training and Inference with PyTorch/XLA on Cloud TPUs", PyTorch Blog 2023
  • "PyTorch/XLA SPMD: Scale Up Model Training and Serving with Automatic Parallelization", PyTorch Blog 2023