Machine learning methods fall mainly into three categories: supervised learning, unsupervised learning, and reinforcement learning. My research focuses on reinforcement learning, in which an agent achieves optimal performance by interacting with its environment.
Serial Interactions in Imperfect Information Games Applied to Complex Military Decision Making (SI3-CMD)
Serial Interactions in Imperfect Information Games Applied to Complex Military Decision Making (SI3-CMD) builds on recent developments in artificial intelligence and game theory to enable more effective decisions in adversarial domains. SI3-CMD will explore several military decision making applications at strategic, tactical, and operational levels and develop AI/game theory techniques appropriate for their problem characteristics. These applications will extend current AI/game theory techniques to be effective when there are multiple interacting agents, extremely large search spaces, sequential revelation of information, use of deception, continuous resource quantities, stochastic outcomes, and the ability to learn from past iterations. The program will produce new techniques and assessments of their effectiveness for military uses.
Reinforcement Learning with Temporal Logic Constraints
The objective is to develop a model-free reinforcement learning method for stochastic planning under temporal logic constraints. In recent work, we propose an approach that translates high-level system specifications, expressed in a subclass of Probabilistic Computational Tree Logic (PCTL), into chance constraints. We devise a variant of approximate dynamic programming, approximate value iteration, to solve for the optimal policy while guaranteeing satisfaction of the PCTL formula.
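As a minimal illustration of the approximate value iteration step (without the chance-constraint machinery that our method adds), the sketch below runs fitted value iteration on a hypothetical five-state chain MDP; the dynamics, features, and all names are illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical toy chain MDP with 5 states; arriving at the last state
# pays reward 1. All names and dynamics here are illustrative.
N_STATES, GAMMA = 5, 0.95

def step(s, a):
    """Deterministic dynamics: action a in {-1, +1}."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, float(s2 == N_STATES - 1)

def features(s):
    """One-hot features; a real approximator would use a coarser basis."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi

w = np.zeros(N_STATES)              # value-function weights
Phi = np.vstack([features(s) for s in range(N_STATES)])
for _ in range(500):                # approximate value iteration
    # Bellman backup at every state: max_a [ r + gamma * V(s') ]
    targets = [max(r + GAMMA * features(s2) @ w
                   for s2, r in (step(s, a) for a in (-1, +1)))
               for s in range(N_STATES)]
    # Project the backed-up values onto the feature space (least squares)
    w = np.linalg.lstsq(Phi, np.array(targets), rcond=None)[0]

print(np.round(w, 2))               # approximate optimal values per state
```

With one-hot features the projection is exact, so the iteration recovers the true optimal values; a coarser basis trades accuracy for scalability.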
In subsequent work, we study model-free reinforcement learning to maximize the probability of satisfying high-level system specifications expressed in a subclass of temporal logic formulas, syntactically co-safe linear temporal logic. To address the sparse reward induced by satisfaction of a temporal logic formula, we propose topological approximate dynamic programming, which proceeds in two steps. First, we decompose the planning problem into a sequence of sub-problems based on the topological structure of the task automaton translated from the temporal logic formula. Second, we extend a model-free approximate dynamic programming method to solve the value functions, one for each state of the task automaton, in an order reverse to their causal dependency. In particular, we show that the run-time of the proposed algorithm does not grow exponentially with the size of the specification. The correctness and efficiency of the algorithm are demonstrated on a robotic motion-planning example.
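The two steps above can be sketched as follows, assuming a hypothetical three-state task automaton (not from the paper) and a stub in place of the model-free ADP sub-problem solver; the sketch orders the automaton states topologically and computes one value per state in reverse order:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task automaton for "reach a, then reach b":
# q0 --a--> q1 --b--> q2 (accepting). The structure is illustrative only.
succ = {"q0": {"q1"}, "q1": {"q2"}, "q2": set()}
ACCEPTING = {"q2"}

# graphlib expects a node -> predecessors map
preds = {q: {p for p, ss in succ.items() if q in ss} for q in succ}
topo = list(TopologicalSorter(preds).static_order())

# One value function per automaton state, solved in reverse topological
# order: the sub-problem at q only needs the already-computed values of
# its successor automaton states as terminal rewards.
values = {}
for q in reversed(topo):
    if q in ACCEPTING:
        values[q] = 1.0   # accepting: satisfaction probability 1
    elif not succ[q]:
        values[q] = 0.0   # dead end: the specification can no longer be met
    else:
        # Stub for the model-free ADP solve of the sub-problem at q;
        # here we simply propagate the best successor value.
        values[q] = max(values[s] for s in succ[q])

print(topo, values)
```

Because each sub-problem is solved once, over the product of the MDP with a single automaton state, the overall cost scales with the number of automaton states rather than exponentially in the specification size.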
Lening Li, Jie Fu, “Approximate Dynamic Programming with Probabilistic Temporal Logic Constraints”, arXiv:1810.02199, Annual American Control Conference, 2019.
Lening Li, Jie Fu, “Topological Approximate Dynamic Programming under Temporal Logic Constraints”, arXiv:1907.10510, IEEE Conference on Decision and Control, accepted, 2019.
The Department of Defense’s strategic plan calls for the Joint Force to conduct humanitarian, disaster relief, and related operations. Some disasters, due to grave risks to the health and wellbeing of rescue and aid workers, prove too great in scale or scope for timely and effective human response. The DARPA Robotics Challenge (DRC) seeks to address this problem by promoting innovation in human-supervised robotic technology for disaster-response operations. The primary technical goal of the DRC is to develop human-supervised ground robots capable of executing complex tasks in dangerous, degraded, human-engineered environments. Competitors in the DRC are developing robots that can utilize standard tools and equipment commonly available in human environments, ranging from hand tools to vehicles. To achieve its goal, the DRC is advancing the state of the art of supervised autonomy, mounted and dismounted mobility, and platform dexterity, strength, and endurance. Improvements in supervised autonomy, in particular, aim to enable better control of robots by non-expert supervisors and allow effective operation despite degraded communications (low bandwidth, high latency, intermittent connection). The DRC program website provides program highlights, including the DRC Trials held in December 2013 and the DRC Finals in June 2015.
Anytime and scalable planning: Function approximation meets importance sampling
Optimal planning for nonlinear robotic systems is computationally intractable in general. Thus, approximate solutions have been investigated, such as discretization-based methods (A*) and sampling-based methods (RRT*). We explore a different approximation scheme, function approximation, which transforms the planning problem in the state space or workspace into a planning problem in a parameter space of policy function approximations. This achieves a dimensionality reduction because the parameter space can be low-dimensional compared to the state space. In our preliminary work, we introduced importance sampling to efficiently search for an optimal feedback policy within the approximating class. The left figure shows the sampling-based algorithm applied to motion planning for a Dubins-car-like robot performing goal reaching and obstacle avoidance; trajectories from the initial sample to the final sample upon convergence shade from the lightest to the darkest grey.
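A generic version of this parameter-space search, in the style of cross-entropy/importance-sampling optimization rather than the exact algorithm from our paper, can be sketched on a hypothetical one-dimensional tracking task with a single policy parameter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D task: drive x from 0 to GOAL under the feedback policy
# u = k * (GOAL - x); the policy parameter space is just the gain k.
GOAL = 1.0

def rollout_cost(k, x0=0.0, horizon=20):
    x = x0
    for _ in range(horizon):
        x = x + k * (GOAL - x)              # closed-loop dynamics
    return (GOAL - x) ** 2 + 0.01 * k ** 2  # terminal error + control effort

# Importance-sampling search over the parameter space (cross-entropy
# style): sample gains from a proposal, keep the elites, refit.
mu, sigma = 0.0, 1.0
for _ in range(30):
    ks = rng.normal(mu, sigma, size=64)     # candidate policy parameters
    costs = np.array([rollout_cost(k) for k in ks])
    elite = ks[np.argsort(costs)[:8]]       # best 8 of 64 samples
    mu, sigma = elite.mean(), elite.std() + 1e-6

print(round(mu, 3), round(rollout_cost(mu), 5))
```

Each rollout is independent, which is why this kind of sampling scheme parallelizes naturally, and the current proposal mean gives a usable policy at any time.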
The algorithm supports parallel computation and anytime planning, so it has the potential to take full advantage of scalable parallel computing schemes in cloud robotics and GPU-accelerated robotics.
Lening Li, Jie Fu, “Sampling-based approximate optimal temporal logic planning”, 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 2017, pp. 1328-1335. [pdf]