Neural network AI - generating training data

Feb 28 2016

Making the AI better has been a top focus for me over the past two weeks. This is a key feature of JEDI FIGHTER beta 3.

What is the AI going to be and why?

JEDI FIGHTER beta 2 used the stock Jedi Academy AI, which doesn't know how to pick from the constrained set of moves available to each fighter. That, coupled with the fact that there won't be many servers, necessitates creating a proper AI so people can actually enjoy the game when playing by themselves or on sparsely populated servers. Just like playing against the computer at arcades...

My main objectives with the AI are to:

  1. create a compelling opponent that isn't entirely predictable,
  2. minimize the man-hours spent retuning the AI when the move set is modified,
  3. and polish my skills in machine learning.

The general approach for the AI is going to be one multi-layer perceptron (MLP, a type of neural network) per fighter. Basically, an MLP takes inputs, processes them, and produces outputs based on how it has been trained to respond. The outputs will be a set of numbers from 0 to 1, with one number per move or block. The output with the highest value corresponds to the move that gets chosen. Since each fighter has a different move set, there will be one MLP per fighter.

More precisely, an MLP takes inputs, processes them through cascading layers of weighted activation functions, and combines the results into a set of outputs through a final layer of activation functions. There will be one hidden layer to make the mapping non-linear.
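
To make that concrete, here's a minimal sketch of the forward pass in Python (NumPy). The layer sizes are made up for illustration, not the final network:

  import numpy as np

  def sigmoid(x):
      # Logistic activation: squashes any value into (0, 1).
      return 1.0 / (1.0 + np.exp(-x))

  def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
      # One hidden layer, then an output layer; both use sigmoid units,
      # so every output lands in (0, 1) -- one value per move/block.
      hidden = sigmoid(x @ w_hidden + b_hidden)
      return sigmoid(hidden @ w_out + b_out)

  # Hypothetical sizes: 8 inputs, 12 hidden units, 6 moves.
  rng = np.random.default_rng(0)
  w_h, b_h = rng.normal(scale=0.1, size=(8, 12)), np.zeros(12)
  w_o, b_o = rng.normal(scale=0.1, size=(12, 6)), np.zeros(6)
  print(mlp_forward(rng.random(8), w_h, b_h, w_o, b_o))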

I chose MLPs because:

  1. they can be trained to be a little unpredictable (more on this later),
  2. if training data can be generated automatically, they can be retrained with little manual effort,
  3. they are a type of machine learning,
  4. and this is a practical approach that people have implemented successfully before.

For each major system in JEDI FIGHTER, I try to come up with a few concepts and choose between them based on feasibility, simplicity, and maintainability. This is usually done on paper or through minimal viable products (MVPs) where that's practical in my free time.

The AI is going to get a few MVPs to experiment with the key components of training and decision making.

What are the inputs?

The inputs (the data the MLPs will use to make decisions) need to convey the key information needed to choose the next move (or block...). Consider fighting games: the view is 2D, you can see both fighters and exactly what they are doing, and you know both health bars and special meters. This means the AI can be effectively omniscient without cheating.

Most people take into account the distance to the opponent, their own health and the opponent's health, as well as how devastating the opponent's attacks are, when deciding whether to block or use a particular attack. The AI should behave the same way, so it needs a few key inputs:

  • stats about themselves and the opponent (e.g. health, force points, relative position, velocity)
  • information about the opponent's fighting state (e.g. classifying the opponent's current move as very harmful, harmful, or not harmful, rather than hardcoding knowledge of each individual move for each fighter)

Currently, I am experimenting with these inputs:

  • self and opponent health
  • relative displacement vector to the opponent
  • self and opponent velocity vectors

Soon I'll be adding the classification of the opponent's move (very harmful, harmful, not harmful) as an input.
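
As a rough sketch, assembling those inputs might look like this (the field names and normalization constants are placeholders, not actual game values):

  import numpy as np

  # Placeholder normalization constants -- not the real game values.
  MAX_HEALTH = 100.0
  MAX_DIST = 2000.0   # rough arena extent in game units
  MAX_SPEED = 500.0

  def build_inputs(me, opp):
      # Normalize each input to roughly [-1, 1] so no single value
      # dominates the network. Field names here are illustrative.
      return np.array([
          me["health"] / MAX_HEALTH,
          opp["health"] / MAX_HEALTH,
          (opp["x"] - me["x"]) / MAX_DIST,   # relative displacement
          (opp["y"] - me["y"]) / MAX_DIST,
          me["vx"] / MAX_SPEED,  me["vy"] / MAX_SPEED,
          opp["vx"] / MAX_SPEED, opp["vy"] / MAX_SPEED,
      ])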

What are the outputs?

My first experiment will have the outputs of the MLPs as a vector of scalars from 0 to 1, where a subset of them will have a one-to-one mapping to the moves list for that fighter. The output element with the highest value will be chosen as the next move (including blocking).
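
In code, choosing the move is just an argmax over the outputs. A minimal sketch, with an invented move list for illustration:

  import numpy as np

  # A hypothetical move list -- each fighter would have its own.
  MOVES = ["block", "jab", "heavy_swing", "force_push"]

  def choose_move(outputs):
      # One output scalar in [0, 1] per entry in MOVES; pick the largest.
      return MOVES[int(np.argmax(outputs))]

  print(choose_move(np.array([0.2, 0.7, 0.4, 0.1])))  # -> jab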

I might experiment with adding fighter combat states as outputs that correspond to being aggressive or defensive, which would then also become inputs. The fighter would also need to attempt to classify the opponent into being aggressive or defensive and use that as an input, but this leads to greater complexity than needed.

More on this later when the network gets more fully implemented.

What is training, and how will it be done?

Training an MLP consists of passing a set of inputs into the network, having it compute its output, and then using the difference between the actual output and the ideal output to adjust the weights so that the network gets closer to producing the ideal output.

Think of it as reinforcement in the loose sense: the weights are modified so the network produces the desired outputs more often than undesired ones. If you are mathematically inclined, it's gradient descent optimization of the weights, where the objective is to minimize the error between the ideal and actual outputs. If you're already familiar with MLPs, I'm going to be doing backpropagation.
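
For the curious, here's a minimal sketch of a single backpropagation step with a squared-error loss and sigmoid activations (the learning rate and shapes are placeholders):

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def train_step(x, ideal, w_h, b_h, w_o, b_o, lr=0.1):
      # Forward pass through the hidden and output layers.
      h = sigmoid(x @ w_h + b_h)
      y = sigmoid(h @ w_o + b_o)
      # Backward pass: gradient of the squared error 0.5 * sum((y - ideal)^2).
      d_out = (y - ideal) * y * (1.0 - y)       # output-layer delta
      d_hid = (d_out @ w_o.T) * h * (1.0 - h)   # hidden-layer delta
      # Nudge weights and biases against the gradient (in place).
      w_o -= lr * np.outer(h, d_out)
      b_o -= lr * d_out
      w_h -= lr * np.outer(x, d_hid)
      b_h -= lr * d_hid
      return 0.5 * np.sum((y - ideal) ** 2)     # error, for monitoring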

Training requires a large dataset, which of course doesn't exist for JEDI FIGHTER, so a training set needs to be created without a lot of manual effort.

The approach I am experimenting with is to start a server full of bots and let them fight, choosing their own moves randomly with random delays between moves. Each bot will capture the inputs (self and opponent health, relative position, velocities) as soon as it decides on a new move, and will capture the same values again after the move has ended. I'll then post-process the resulting datasets to create the ideal outputs.
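
As a sketch, a recorded sample might look like this (the field names and JSON format are illustrative; the real capture happens inside the mod code):

  import json

  def make_sample(fighter, move, before, after):
      # 'before' is captured when the move is chosen, 'after' when it ends.
      return {
          "fighter": fighter,
          "move": move,
          "before": before,
          "after": after,
          # The health deltas are what post-processing will score.
          "self_health_delta": after["self_health"] - before["self_health"],
          "opp_health_delta": after["opp_health"] - before["opp_health"],
      }

  # Example: a jab that cost the opponent 8 health and us nothing.
  sample = make_sample(
      "vader", "jab",
      {"self_health": 100, "opp_health": 100, "dx": 1.5, "dy": 0.0},
      {"self_health": 100, "opp_health": 92, "dx": 1.2, "dy": 0.0},
  )
  print(json.dumps(sample))  # one JSON line per sample, easy to post-process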

I am going to experiment with post-processing by clustering (quantizing) the random move samples into ranges of relative position and health. Within each cluster, I'll evaluate which moves were the most devastating and self-preserving by looking at the change in health before and after each move. The most devastating move becomes the ideal move for that cluster, so for every input that falls into the cluster, the ideal output vector gets a 1 in the element corresponding to that move and 0 everywhere else. This should give a decent training dataset where the ideal moves are reinforced.
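
A rough sketch of that post-processing, assuming samples shaped like the records above (the bucket sizes are placeholders I'd have to tune, and "damage dealt minus damage taken" is one possible scoring of "devastating and self-preserving"):

  from collections import defaultdict
  import numpy as np

  MOVES = ["block", "jab", "heavy_swing", "force_push"]  # same hypothetical list

  def cluster_key(sample, dist_step=1.0, health_step=25):
      # Quantize distance-to-opponent and own health into coarse buckets.
      b = sample["before"]
      dist = (b["dx"] ** 2 + b["dy"] ** 2) ** 0.5
      return (int(dist / dist_step), int(b["self_health"] / health_step))

  def build_training_set(samples):
      clusters = defaultdict(list)
      for s in samples:
          clusters[cluster_key(s)].append(s)
      training = []
      for group in clusters.values():
          # Score each move in this cluster: damage dealt minus damage taken.
          scores = defaultdict(float)
          for s in group:
              scores[s["move"]] += -s["opp_health_delta"] + s["self_health_delta"]
          best = max(scores, key=scores.get)
          # Ideal output: 1 for the best move in this cluster, 0 elsewhere.
          ideal = np.zeros(len(MOVES))
          ideal[MOVES.index(best)] = 1.0
          training.extend((s["before"], ideal) for s in group)
      return training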

This YouTube video shows the bots going through their moves randomly and recording their effectiveness, building up the raw (not yet post-processed) training dataset.


Two things are evident from this experiment:

  1. Darth Vader is overpowered (he has only a few moves, and a high percentage of them are devastating), so things are inherently unbalanced
  2. Because Darth is overpowered and kept winning, many more training samples were generated for him than for the others, which means I need to ensure fairness in training set generation (i.e. an equal number of samples used to train each fighter; see the sketch below)
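
One simple fix would be downsampling to the smallest fighter's sample count before training. A sketch, assuming each sample is tagged with the fighter that generated it (as in the record format above):

  import random
  from collections import defaultdict

  def balance_by_fighter(samples, seed=0):
      # Group samples by which fighter generated them.
      by_fighter = defaultdict(list)
      for s in samples:
          by_fighter[s["fighter"]].append(s)
      # Downsample every fighter to the smallest group's size.
      n = min(len(group) for group in by_fighter.values())
      rng = random.Random(seed)
      balanced = []
      for group in by_fighter.values():
          balanced.extend(rng.sample(group, n))
      return balanced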

Stay tuned!

I'll post here with more information as soon as progress is made!

--

skew

