Date of Award


Document Type


Degree Name

Master of Science in Electrical and Computer Engineering (MSECE)

First Advisor

Dr. Abdollah Homaifar


Abstract

The past few decades have produced many successful applications of machine learning. In the area of automated control, machine learning controllers have been developed to take advantage of computers' tendency to stumble upon solutions that people either overlooked or lack the computational capability to explore. Reinforcement learning is a machine learning technique for learning how to choose actions in an environment so as to maximize expected reward. The reward is the signal that indicates whether proper actions were chosen; a behavioral goal is therefore achieved through implicit rewards rather than explicit direction. The components of reinforcement learning are defined in terms of the components of an automated control system to produce a controller. The learned controller takes the form of a multi-layered neural network that receives the observed state as input and outputs the chosen action. The parameters of the neural network are updated with the proximal policy optimization (PPO) technique to maximize the expected reward designed for the system. We explore the use of model-free reinforcement learning for control problems through the design of the reward signal. Focusing on lower-level control, specifically for unmanned aerial vehicle applications, the learning algorithm is expected to control for performance metrics similar to those of a classical control problem. A general reward signal is defined that takes several common control concerns, such as error, time to target operation, and energy consumption, and uses them as individual terms, each scaled by a coefficient. Two case studies explore how this reward structure affects the policy learned through reinforcement learning. Both studies are implemented with the OpenAI Gym and Baselines frameworks.
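The coefficient-scaled reward described above can be sketched as a simple weighted cost. This is a minimal illustration, not the thesis's actual implementation; the function name, term definitions, and default coefficient values (`w_error`, `w_time`, `w_energy`) are hypothetical choices for the sketch.

```python
def control_reward(error, time_step, energy,
                   w_error=1.0, w_time=0.1, w_energy=0.01):
    """Reward as the negative of a weighted sum of control concerns.

    Each concern (tracking error, elapsed time, energy consumption) is one
    term; its coefficient scales how strongly the learned policy is pushed
    to minimize it.  Coefficient values here are illustrative defaults.
    """
    return -(w_error * abs(error) + w_time * time_step + w_energy * energy)
```

Raising one coefficient relative to the others shifts the priority of the resulting policy, e.g. a larger `w_energy` penalizes thrust usage more heavily at the cost of tracking error.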
The first is a continuous-action version of the mountain car problem, in which individual reward elements are applied and compared against several combinations and a complete function of all the elements. The results show how, in this simpler control case, a solution can be learned whose specific concerns are controlled by the reward structure. The second case study features a quadrotor flight controller that produces a 3-dimensional target rotational velocity from the four motor thrust actions. This controller is compared to a traditionally tuned PID controller designed for the same system. The results of this study demonstrate that by shaping the reward structure to resemble the control metrics, the resulting controllers can be indirectly designed to prioritize different concerns; in particular, the error and energy elements can be traded off slightly by balancing their individual reward magnitudes. The direct connection between the coefficients and their effect on the output controller policies is not completely defined in this thesis, but an analysis of the responses of controllers trained in this fashion shows a correlation between how these coefficients are defined and the external control characteristics they are designed to influence.