Kevin Tan
Statistics and Data Science PhD Student, The Wharton School, University of Pennsylvania

Hi! I’m Kevin, a second-year Statistics and Data Science PhD student at Penn (Wharton), co-advised by two amazing supervisors, Yuting Wei and Giles Hooker. Previously, I was an undergraduate at Michigan (go blue!), where I graduated with four majors: statistics, honors mathematics, economics, and data science.
My work spans bandits and reinforcement learning, diffusion models, high-dimensional statistics, and methods for stochastic dynamical systems.
What have I done?
With collaborators including but not limited to {Wei Fan, Chinmaya Kausik, Ziping Xu, Zhihan Huang, Haimo Fang} and senior(er) faculty {Yuting Wei, Giles Hooker, Ambuj Tewari, and Edward Ionides}:
- Hybrid reinforcement learning. If offline RL is learning by watching and online RL is learning by doing, then hybrid RL studies what happens when machines can learn by both watching and doing. We’ve shown that, under certain conditions, machines need fewer data samples when they can access both offline and online data, and can also achieve computational speedups in settings like actor-critic methods and diffusion policies. (A toy sketch of the warm-starting flavor of this idea appears after this list.)
- Actor-critic algorithms and diffusion policies. We’ve recently solved an open problem on whether actor-critic algorithms can be sample-efficient in general. This has practical implications, such as for the efficiency of exact policy gradients (as in DDPG and TD3) relative to on-policy sampling (as in PPO). In unreleased work submitted to NeurIPS, we show that diffusion policies can indeed achieve sublinear regret.
- Heterogeneous data in sequential decision-making. Chinmaya and I have a long-running collaboration on mixtures of sequential decision-making processes. We first tackled learning mixtures of Markov chains and MDPs from logged data, presented as a short oral at ICML 2023. Recently, we used a similar method to accelerate personalization to a new user by exploiting low-dimensional structure in high-dimensional embeddings from other users, amid heterogeneity in user tastes and preferences.
- Statistical inference for gradient boosting. We’ve been developing methods for constructing confidence and prediction intervals for gradient boosting regression, along with hypothesis tests for variable importance. Our approach supports dropout and parallelized bootstrapping by leveraging a key insight: with appropriate (Boulevard-style) regularization, gradient boosting converges to kernel ridge regression. (A generic bootstrap baseline, for contrast, is sketched after this list.) This line of work includes an unreleased NeurIPS submission and two additional papers currently in preparation.
- Inference for partially observed Markov processes. Say you can’t fully observe the environment around you, but you maintain an internal belief about its state. Given only a model/simulator for the environment dynamics and a likelihood for what you observe given the latent state, we’ve developed improved algorithms for estimating and performing inference on the model’s parameters. Our work has applications to disease modeling, such as a study of cholera in Haiti, and has spawned a Python package in development, for which I wrote most of the guts (a minimal particle filter, the core primitive behind these methods, is sketched after this list). This is a long-running collaboration with Edward Ionides and his lab, dating back to my undergraduate years at Michigan.
- Applications to causal inference. I don’t want to get scooped, but I’m happy to elaborate offline.
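To make the “watching and doing” idea concrete, here is a minimal, purely illustrative sketch of warm-started tabular Q-learning: replay logged offline transitions first, then continue learning online. The environment interface `env_step` and all parameters are hypothetical, and this is a toy, not the algorithm from our papers.

```python
import numpy as np

def hybrid_q_learning(offline_data, env_step, n_states, n_actions,
                      n_online_steps=1000, alpha=0.1, gamma=0.99, eps=0.1):
    """Toy hybrid RL: seed Q-values from offline data, then learn online."""
    Q = np.zeros((n_states, n_actions))
    # Offline phase ("watching"): replay logged (s, a, r, s') transitions.
    for s, a, r, s_next in offline_data:
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    # Online phase ("doing"): epsilon-greedy interaction with the environment.
    rng = np.random.default_rng(0)
    s = 0
    for _ in range(n_online_steps):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        r, s_next = env_step(s, a)  # hypothetical environment interface
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q
```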
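For the gradient boosting work, the simplest comparison point is a generic parallelized bootstrap, sketched below with scikit-learn and joblib: refit the ensemble on resampled data and read off percentile intervals. This baseline is not our Boulevard-style method, which instead exploits the convergence of regularized boosting to kernel ridge regression.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.ensemble import GradientBoostingRegressor

def bootstrap_intervals(X, y, X_new, B=200, alpha=0.1, seed=0):
    """Percentile bootstrap intervals for boosted regression (baseline sketch)."""
    seeds = np.random.default_rng(seed).integers(0, 2**31 - 1, size=B)

    def fit_one(s):
        r = np.random.default_rng(int(s))
        idx = r.integers(0, len(y), size=len(y))  # resample training rows
        model = GradientBoostingRegressor(
            n_estimators=300, learning_rate=0.05, max_depth=3, random_state=int(s)
        ).fit(X[idx], y[idx])
        return model.predict(X_new)

    # Each bootstrap fit is independent, so they parallelize embarrassingly well.
    preds = np.stack(Parallel(n_jobs=-1)(delayed(fit_one)(s) for s in seeds))
    return np.quantile(preds, [alpha / 2, 1 - alpha / 2], axis=0)
```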
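The POMP bullet above boils down to a simulator plus an observation likelihood, and the core primitive the estimation methods build on is the bootstrap particle filter, which estimates the likelihood of the data under a given parameter. Below is a minimal sketch; `simulate_step` and `obs_density` are user-supplied placeholders standing in for the model/simulator and the measurement likelihood.

```python
import numpy as np

def bootstrap_filter(y_obs, simulate_step, obs_density, init_particles, seed=0):
    """Log-likelihood estimate for a partially observed Markov process.

    simulate_step(particles, rng) -> propagated particles   (process model)
    obs_density(y, particles)     -> per-particle likelihoods (measurement model)
    """
    rng = np.random.default_rng(seed)
    particles = np.asarray(init_particles).copy()
    n = len(particles)
    loglik = 0.0
    for y in y_obs:
        particles = simulate_step(particles, rng)   # push belief through dynamics
        w = obs_density(y, particles)               # weight by fit to observation
        loglik += np.log(w.mean() + 1e-300)         # conditional log-likelihood
        idx = rng.choice(n, size=n, p=w / w.sum())  # multinomial resampling
        particles = particles[idx]
    return loglik
```

Iterated filtering, the workhorse in this line of work, wraps a filter like this in a loop that perturbs the parameters and follows the surviving particles toward the maximum likelihood estimate.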
How have I contributed to capitalism?
- Amazon. I’m currently an Applied Scientist intern at Amazon, on the MOSS (Materials, Handling, and Equipment Optimization, Systems, and Science) Science team within Amazon Fulfillment Technologies and Robotics. My focus is on continuous-time behavior cloning, dynamics model estimation, and policy learning under sparse observations, all in service of optimizing package flow and staffing in the next generation of Amazon fulfillment centers.
- Shade. During my final winter at Michigan and the following summer, I worked with Brandon (co-founder and CEO) and his brother Jonathan to build the core ML/AI systems behind Shade, an AI-driven platform for managing creative assets that serves as an all-in-one hub for media storage, collaboration, and workflow automation. Brandon and I delivered features including a text-audio search engine powered by multimodal embeddings (a bare-bones sketch of embedding search appears below), a similar search engine for 3D textures, jersey number recognition, fast online face clustering for facial recognition, and lightweight asset description generation from a small set of tags. I also built a framework for fast, lightweight fine-tuning of embeddings for large multimodal models and trained more than a few deep classifiers, and the three of us distilled BLIP, speeding up inference by over 7x while maintaining a similar level of performance.
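The essence of an embedding-based search engine like the ones above fits in a few lines: embed all assets offline, embed the query at request time, and rank by cosine similarity. The sketch below uses plain numpy and assumes embeddings from some text-audio encoder; it is a toy illustration, not Shade’s production system.

```python
import numpy as np

def build_index(asset_embeddings):
    """Normalize rows once so each search reduces to a single matrix product."""
    norms = np.linalg.norm(asset_embeddings, axis=1, keepdims=True)
    return asset_embeddings / np.clip(norms, 1e-12, None)

def search(index, query_embedding, k=5):
    """Return indices and scores of the k assets most similar to the query."""
    q = query_embedding / max(np.linalg.norm(query_embedding), 1e-12)
    scores = index @ q                 # cosine similarity via dot product
    top = np.argsort(-scores)[:k]      # k best matches, highest score first
    return top, scores[top]
```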
In my spare time, I build earphones and guitars, cook (including but not limited to Singaporean food, which I really miss), and bake (including but not limited to pineapple tarts and sourdough).
news
| Date | News |
|---|---|
| May 19, 2025 | I’ve started as an Applied Scientist intern at Amazon. |
| Apr 20, 2025 | Two papers accepted to ICML 2025! These are on leveraging offline data in linear latent bandits and on how actor-critics can achieve optimal sample efficiency. Spending too much time writing reviews paid off: I’ve received a Top Reviewer award with complimentary registration. |
| Aug 09, 2024 | Two papers accepted to NeurIPS 2024! These are on how hybrid RL can break sample size barriers in linear MDPs and on solving distributed least squares problems in small space with randomized algorithms for matrix sketching. I’ve also received a NeurIPS Travel Award! |