Publications

You can also find my articles on my Google Scholar profile.

Selected Publications


  • "PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL"
  • "Fine-Grained Table Retrieval Through the Lens of Complex Queries."
  • "100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models"
  • "Multi-Objective Agentic Rewrites for Unstructured Data Processing."
  • "LLMs and Databases: A Synergistic Approach to Data Utilization.", IEEE Data Engineering Bulletin
  • "Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL.", VLDB
  • "High-Fidelity And Complex Test Data Generation For Real-World SQL Code Generation Services.", arXiv
  • "Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving."
  • "CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL.", ICLR
  • "CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL."
  • "Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach.", TKDE
  • "Slice Finder: Automated Data Slicing for Model Validation.", ICDE
  • "Quantifying Uncertainty in Data Exploration.", Brown University
  • "Democratizing Data Science through Interactive Curation of ML Pipelines.", SIGMOD
  • "Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach"
  • "Unknown Examples & Machine Learning Model Generalization.", arXiv
  • "Towards Quantifying Uncertainty in Data Analysis & Exploration.", IEEE Data Engineering Bulletin
  • "Towards Interactive Curation & Automatic Tuning of ML Pipelines.", MLSys
  • "Improved Neighborhood Search for Collaborative Filtering.", IJFIS
  • "Towards Quantifying Uncertainty in Data Analysis & Exploration", IEEE Data Engineering Bulletin 2018
  • "Estimating the Impact of Unknown Unknowns on Aggregate Query Results.", TODS
  • "Unknown Examples & Machine Learning Model Generalization", arXiv 2018
  • "Towards Interactive Data Exploration."
  • "A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets."
  • "Estimating the Impact of Unknown Unknowns on Aggregate Query Results.", SIGMOD
  • "A Data Quality Metric (DQM): How to Estimate The Number of Undetected Errors in Data Sets.", VLDB
  • "Using RDMA for Lock Management."
  • "Towards Interactive Data Exploration"
  • "Estimating the Impact of Unknown Unknowns on Aggregate Query Results."
  • "A Behavior Analysis-Based Game Bot Detection Approach Considering Various Play Styles."
  • "Semi-supervised learning for sentiment analysis in mass social media"
  • "Distributed Twitter Opinion Mining System Using MongoDB Aggregation Framework"
  • "TV program recommendation method using LDA clustering"
  • "Sentiment Analysis Using News Comments for Public Opinion Mining"
  • "Personalized Expert-Based Recommender System: Training C-SVM for Personalized Expert Identification."
  • "Game bot detection approach based on behavior analysis and consideration of various play styles"
  • "BitTorrent Network Traffic Analysis for Peer Link Prediction"
  • "Torrent Crawler: a tool for collecting information from BitTorrent networks"

Technical Reports


  • "High-Performance Llama 2 Training and Inference with PyTorch/XLA on Cloud TPUs", PyTorch Blog 2023
  • "PyTorch/XLA SPMD: Scale Up Model Training and Serving with Automatic Parallelization", PyTorch Blog 2023