site:www.cs.utexas.edu

www.cs.utexas.edu · 2d

The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications

In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often sparse. For example, a true task metric might encode a reward of 1 upon success and ...

www.cs.utexas.edu · 2d

Generative Adversarial Imitation from Observation

Imitation from observation (IfO) is the problem of learning directly from state-only demonstrations without having access to the demonstrator's actions. The lack of action information both ...

www.cs.utexas.edu · 5d

Research Poster Printing Services

At UTCS Publications Office, we operate in alignment with the academic calendar, including closures during holidays and breaks when no classes are held. Please be aware of these closures when planning ...

www.cs.utexas.edu · 5d

Representing Knowledge of Large-Scale Space (1977)

This dissertation presents a model of the knowledge a person has about the spatial structure of a large-scale environment: the "cognitive map." The functions of the cognitive map are to assimilate ...

www.cs.utexas.edu · 6d

My Course Schedule

CS 309: AI Literacy (Essentials of AI) Web-based (Zoom) - GDC TTH 9:30am-11:00am Peter Stone ...

www.cs.utexas.edu · 6d

Michael Quinlan

Michael researches various aspects of robotic systems, including motion, vision and localization. He currently teaches a class on Autonomous Vehicles and competes as part of the Austin Villa team at ...

www.cs.utexas.edu · 6d

Multi-Robot Human Guidance: Human Experiments and Multiple Concurrent Requests

In the multi-robot human guidance problem, a centralized controller makes use of multiple robots to provide navigational assistance to a human in order to reach a goal location. Previous work used ...

www.cs.utexas.edu · 6d

Raymond J. Mooney

Professor of Computer Sciences, University of Texas at Austin. B.S. in Computer Engineering, University of Illinois at Urbana/Champaign, 1983; M.S. in Computer Science, University of Illinois at Urbana ...

www.cs.utexas.edu · 7d

TAMER: Training an Agent Manually via Evaluative Reinforcement

Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is ...

www.cs.utexas.edu · 7d

Reinforcement Learning from Simultaneous Human and MDP Reward

Reinforcement Learning from Simultaneous Human and MDP Reward. W. Bradley Knox and Peter Stone. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), ...

www.cs.utexas.edu · 7d

The right music at the right time: adaptive personalized playlists based on sequence modeling

The right music at the right time: adaptive personalized playlists based on sequence modeling. Elad Liebman, Maytal Saar-Tsechansky, and Peter Stone. Management Information Systems ...

www.cs.utexas.edu · 7d

Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning

Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning. Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone. In IEEE Symposium on Adaptive Dynamic Programming and ...
