University Park, Pa. Penn State researchers are looking for a safer and more efficient way to use machine learning in the real world. Using a simulated high-rise office building, they developed and tested a new reinforcement learning algorithm aimed at reducing energy consumption while maintaining occupant comfort in real-world settings.
Greg Pawlak, assistant professor of architectural engineering at Penn State and co-author of "Constrained Differentiable Cross-Entropy Method for Safe Model-Based Reinforcement Learning," presented the results at the Association for Computing Machinery (ACM) International Conference on Systems for Energy-Efficient Built Environments (BuildSys), held in Boston on Nov. 9–10.
“Reinforcement learning agents explore their environment through trial and error to learn optimal actions,” Pawlak said. “Due to the challenges in simulating the complexities of the real world, there is a growing trend to train reinforcement learning agents directly in the real world rather than through simulations.”
However, according to the researchers, deploying reinforcement learning in real environments presents its own challenges.
"Two critical requirements for real-world reinforcement learning are efficient learning and safety," said paper co-author Sam Mottaheedi, who was a Penn State doctoral student in architectural engineering during the study. "Some reinforcement learning systems require millions of interactions over many years to learn the optimal policy, which is not practical in real-world scenarios. In addition, there is the potential for them to make bad decisions that produce undesirable or unsafe results."
This concern prompted the researchers to ask: how do we develop algorithms that allow these reinforcement learning agents to learn safely in the real world, without making so many bad decisions that equipment breaks or people get hurt?
The researchers used an existing model-based reinforcement learning approach to train their model for decision making. This artificial intelligence agent, a control algorithm, interacts with the environment through trial and error, and it served as the building block for their project.
"The safety-critical factor of our research was, at a minimum, not to break anything in the building and to ensure that the occupants are always comfortable," Pawlak said. "While we don't have to worry about someone getting hit by a car, which is a concern for reinforcement learning in self-driving cars, we do have to worry about respecting equipment operating constraints."
The researchers wanted to reduce energy use without violating thermal comfort, measured on a scale that ranges from -3, too cold, to +3, too hot. If the control algorithm took an action that produced a result outside the -0.5 to +0.5 range, it was penalized. The control algorithm was able to stay within -0.5 to +0.5, which is an accepted standard in the building industry.
"If the controller is set up to find the best energy consumption, for example, it will be rewarded for achieving this good behavior," Pawlak said. "Alternatively, if it does something that increases energy consumption, it will be penalized for bad behavior. This trial-and-error approach reinforces learning by gathering information so that the controller can decide what to do next."
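The article does not give the controller's actual reward function; a minimal sketch of the reward-and-penalty scheme described above, with illustrative weights and function names (none of them from the paper), might look like this:

```python
def reward(energy_kwh, comfort_pmv, comfort_limit=0.5, penalty_weight=10.0):
    """Illustrative reward: negative energy use, minus a penalty whenever
    thermal comfort (on the -3 too-cold to +3 too-hot scale) leaves the
    accepted -0.5 to +0.5 band. Weights and units are assumptions."""
    r = -energy_kwh  # lower energy consumption -> higher reward
    violation = max(0.0, abs(comfort_pmv) - comfort_limit)
    return r - penalty_weight * violation  # penalize comfort violations

# Inside the comfort band, only energy matters:
reward(100.0, 0.3)   # -100.0
# Outside the band, an extra penalty applies:
reward(100.0, 1.0)   # -105.0
```

The agent maximizes this quantity, so cutting energy while keeping occupants inside the comfort band is the only way to score well.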
For this project, the researchers simulated a large office building in the Chicago climate zone. An equipment concern in a real 30-story building could include anything with a large motor, such as the chillers that are used to cool the building.
"Big motors don't like to cycle on and off quickly," Pawlak said. "For example, a large chiller may be turned on once a day and turned off once a day, a total of two events, to avoid damaging the equipment. If our agent's actions result in more than two chiller events in a day, it is penalized."
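One way to encode such an equipment constraint, sketched here with hypothetical helper names and an assumed penalty weight, is to count the on/off transitions in a day's chiller schedule and penalize anything beyond the allowed two events:

```python
def chiller_events(schedule):
    """Count on/off transitions ("events") in a day's binary chiller schedule."""
    return sum(1 for prev, curr in zip(schedule, schedule[1:]) if prev != curr)

def cycling_penalty(schedule, max_events=2, weight=5.0):
    """Penalize any events beyond the allowed limit; the weight is illustrative."""
    return weight * max(0, chiller_events(schedule) - max_events)

# One turn-on and one turn-off per day: within the limit, no penalty.
cycling_penalty([0, 1, 1, 1, 0])  # 0.0
# Rapid cycling, four events: two over the limit.
cycling_penalty([0, 1, 0, 1, 0])  # 10.0
```

A penalty like this would be folded into the agent's reward alongside the energy and comfort terms, discouraging plans that cycle large motors.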
The researchers compared their model-based approach with other common reinforcement learning methods, including model-free algorithms. A model-based agent can plan its course of action because it can predict the reward for an action before taking it. A model-free agent must actually take an action in order to learn from it.
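The paper's title names the cross-entropy method (CEM), a common planner for model-based agents. As a generic, unconstrained sketch in plain Python (the paper's constrained, differentiable variant is more involved, and all hyperparameters below are assumptions), CEM planning looks like this: sample candidate action sequences, score them with the model's predicted reward, and refit the sampling distribution to the best candidates.

```python
import random
import statistics

def cem_plan(predict_reward, horizon=4, n_samples=200, n_elite=20,
             n_iters=8, seed=0):
    """Cross-entropy method planning sketch: sample candidate action
    sequences from a Gaussian, score them with the learned model's
    predicted reward, and refit the Gaussian to the elite samples."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(n_iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        samples.sort(key=predict_reward, reverse=True)  # best sequences first
        elite = samples[:n_elite]
        mean = [statistics.fmean(dim) for dim in zip(*elite)]
        std = [statistics.stdev(dim) + 1e-6 for dim in zip(*elite)]
    return mean  # planned action sequence; typically only the first action runs

# Toy stand-in for a learned model: reward peaks when every action equals 0.7.
plan = cem_plan(lambda seq: -sum((a - 0.7) ** 2 for a in seq))
```

In a model-based building controller, `predict_reward` would come from the learned dynamics model rather than the toy function used here; this ability to score imagined action sequences before executing them is what lets a model-based agent plan.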
"The model-free algorithm works well but violates some safety constraints," Pawlak said. "And it takes a long time to learn good behavior, sometimes years or even decades."
The researchers' model learned about 50 times faster than the traditional model-free method, accomplishing in a month what other approaches would have required years to do. And because of the way the researchers incorporated safety factors, their models resulted in fewer, sometimes zero, violations of the safety constraints.
According to Pawlak, adding safety constraints makes reinforcement learning a game of balancing trade-offs. The reinforcement learning agent can minimize energy consumption, which looks like good behavior, by simply shutting everything off. However, doing so has a negative impact on occupant comfort, which is bad behavior.