Special Seminar: Yagiz Savas - Entropy maximization for Markov decision processes

Event Type
Seminar/Symposium
Sponsor
Decision and Control Laboratory, Coordinated Science Laboratory
Location
141 Coordinated Science Laboratory
Date
April 25, 2019 3:00 PM - 4:00 PM
Speaker
Yagiz Savas, University of Texas at Austin
Cost
Registration
Contact
Angie Ellis
Email
amellis@illinois.edu
Phone
217-300-1910

SPECIAL SEMINAR

 

Entropy maximization for Markov decision processes


Yagiz Savas

Ph.D. Candidate

University of Texas at Austin

 

Thursday, April 25, 2019

3:00pm – 4:00pm

141 CSL

__________________________________________________

This talk focuses on the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the trajectories an agent follows while completing a task with a desired probability. In a dual sense, it maximizes the average number of queries an outside observer needs to infer the agent’s trajectory.

In the first part of the talk, I will consider the entropy maximization problem in the absence of constraints. I will show that, under stationary policies, the maximum entropy of an MDP can be finite, infinite, or unbounded, and present a polynomial-time algorithm that determines which of these cases holds for a given MDP. I will then present a polynomial-time algorithm, based on a convex optimization problem, to synthesize a policy that maximizes the entropy of an MDP. In the second part, I will add temporal logic and expected reward constraints to the framework and show how to incorporate them into the algorithms presented in the first part. Finally, I will briefly discuss extensions of the entropy maximization problem to partially observable environments.
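For readers unfamiliar with the quantity being maximized, the following is a minimal sketch of how the entropy of an agent's trajectories could be evaluated for one fixed stationary policy. It is not the speaker's algorithm (the talk describes a convex-optimization approach). It only applies the chain rule for entropy, H(s) = L(s) + sum over t of P(s,t) H(t), where L(s) is the entropy of the next-state distribution at s, and it assumes the induced Markov chain is absorbing so that the entropy is finite. The toy MDP, the state and action names, and the helper functions are all hypothetical.

```python
import math

def induced_chain(mdp, policy):
    """Markov chain induced by a stationary policy.
    mdp[s][a] maps next states to probabilities; policy[s] maps actions to probabilities."""
    chain = {}
    for s, actions in mdp.items():
        row = {}
        for a, pa in policy[s].items():
            for t, p in actions[a].items():
                row[t] = row.get(t, 0.0) + pa * p
        chain[s] = row
    return chain

def local_entropy(row):
    """Entropy (in bits) of a single next-state distribution."""
    return -sum(p * math.log2(p) for p in row.values() if p > 0)

def trajectory_entropy(chain, start, iters=10_000, tol=1e-12):
    """Fixed-point iteration on H(s) = L(s) + sum_t P(s,t) H(t).
    Assumes an absorbing chain; otherwise the iteration need not converge."""
    H = {s: 0.0 for s in chain}
    for _ in range(iters):
        new_H = {s: local_entropy(row) + sum(p * H[t] for t, p in row.items())
                 for s, row in chain.items()}
        done = max(abs(new_H[s] - H[s]) for s in chain) < tol
        H = new_H
        if done:
            break
    return H[start]

# Hypothetical toy MDP: from s0 the agent reaches the goal via s1 or s2.
mdp = {
    "s0": {"a": {"s1": 1.0}, "b": {"s2": 1.0}},
    "s1": {"go": {"goal": 1.0}},
    "s2": {"go": {"goal": 1.0}},
    "goal": {"stay": {"goal": 1.0}},
}
fixed = {"s1": {"go": 1.0}, "s2": {"go": 1.0}, "goal": {"stay": 1.0}}
deterministic = {"s0": {"a": 1.0}, **fixed}
uniform = {"s0": {"a": 0.5, "b": 0.5}, **fixed}

deterministic_entropy = trajectory_entropy(induced_chain(mdp, deterministic), "s0")
uniform_entropy = trajectory_entropy(induced_chain(mdp, uniform), "s0")
print(deterministic_entropy, uniform_entropy)
```

In this toy example both policies reach the goal with probability one, but the deterministic policy yields zero entropy (a fully predictable trajectory), while the uniform policy yields one bit.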

Bio:

Yagiz Savas joined the Department of Aerospace Engineering at the University of Texas at Austin as a Ph.D. student in Fall 2017. He received his B.S. degree in Mechanical Engineering from Bogazici University in 2017. His research focuses on developing theory and algorithms that guarantee desirable behavior of autonomous systems operating in adversarial environments.