Gaming with a keyboard is fun, but controlling a game with your webcam makes it even better.
In this article, I will walk you through how to build your own AI-powered game.

Project Demo
As usual, we will apply the divide-and-conquer approach to build our system. Basically, we need to break our problem (creating an AI-powered game) down into smaller sub-problems. I came up with the following set:
- We need a game
- Since we are using a webcam, we need a way to capture our hand movements
- Once we capture our hand gestures, we need a tool that classifies them into some sort of logic
- Lastly, we need a controller that drives the game's actions based on that logic
You can find the source code below:
So first we need a game to get started. After a few minutes of wondering, I came up with an elite weapon: “copy & paste”. It’s better to use someone else’s code than to write it from scratch. So I did just that: I went to pygames.org, browsed through some pages, and found a game called Space War. Since I was not planning to use complex movements within the game, Space War fits the requirement perfectly. It has basic actions like move up, down, forward, backward, and shoot.
Now we have a game to play. The next thing we need is to capture our hand movements from the webcam. Since I had some experience with facial recognition, I was already familiar with OpenCV, which will help us capture hand gestures. So far, we have a game and a subsystem that captures our hand movements.
Similarly, we need a mechanism that converts our hand movements into some sort of logic. While getting familiar with computer vision, I had built a classifier that sorts an image into 3 distinct classes, i.e. Rock, Paper, and Scissors, using a convolutional neural network (CNN). For simplicity, I will use only two of these classes:
Rock -> move down
Paper -> move up
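The mapping above can be sketched as a small lookup that works independently of the model. This is a minimal illustration, not the project's actual code: the name `action_for` and the `threshold` parameter are hypothetical, and treating index 2 as Scissors is an assumption. The order Paper = 0, Rock = 1 follows the script shown later in this article.

```python
# Hypothetical sketch: map a 3-class prediction to a game action.
# Assumed class order: 0 = Paper, 1 = Rock, 2 = Scissors (index 2 is a guess).
ACTIONS = {0: "up", 1: "down"}  # Paper -> move up, Rock -> move down


def action_for(prediction, threshold=0.9):
    """Return "up", "down", or None for a list of three class scores."""
    # Index of the highest score
    best = max(range(len(prediction)), key=lambda i: prediction[i])
    if prediction[best] < threshold:
        return None  # not confident enough, so do nothing
    return ACTIONS.get(best)  # Scissors has no action, so this gives None


print(action_for([0.97, 0.02, 0.01]))  # up
print(action_for([0.05, 0.93, 0.02]))  # down
print(action_for([0.10, 0.10, 0.80]))  # None
```

Note that the script itself compares each score with exactly 1; the threshold used here is a looser variant of the same idea.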
Lastly, we need a controller. After doing some exploration (googling), I came across an awesome Python library called pynput, which lets us simulate keyboard activity from our script. All I need now is to combine the solutions to these sub-problems into one fully working system.
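To illustrate how such a controller can be driven, here is a minimal sketch of a helper that holds one key at a time. `KeyHolder` and `FakeKeyboard` are hypothetical names and are not part of pynput. The only things assumed about the controller are the `press(key)` and `release(key)` methods that pynput's `Controller` provides. The fake keyboard simply records calls, so the example runs without sending real key events.

```python
class FakeKeyboard:
    """Stand-in for pynput.keyboard.Controller that just records calls."""

    def __init__(self):
        self.events = []

    def press(self, key):
        self.events.append(("press", key))

    def release(self, key):
        self.events.append(("release", key))


class KeyHolder:
    """Hypothetical helper: keeps at most one key held down at a time."""

    def __init__(self, keyboard):
        self.keyboard = keyboard
        self.held = None

    def hold(self, key):
        if key == self.held:
            return  # already holding this key
        self.release()
        if key is not None:
            self.keyboard.press(key)
        self.held = key

    def release(self):
        if self.held is not None:
            self.keyboard.release(self.held)
            self.held = None


kb = FakeKeyboard()
holder = KeyHolder(kb)
holder.hold("up")
holder.hold("up")    # no new event, the key is already held
holder.hold("down")  # releases "up" before pressing "down"
holder.release()
print(kb.events)
```

With pynput installed, passing `Controller()` from `pynput.keyboard` and keys such as `Key.up` in place of the strings should work the same way.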
So I wrote a script that merges the solutions to our four problems:
import cv2
import numpy as np
import tensorflow as tf
from pynput.keyboard import Key, Controller

monitor_height = 1080
monitor_width = 1920
keyboard = Controller()


def start():
    # Key pressed on each iteration; starts as space so it is always defined
    c = Key.space
    cap = cv2.VideoCapture(0)
    model = tf.keras.models.load_model("Model/rps2.h5")
    while True:
        ret, img = cap.read()
        if not ret:
            # Stop if the webcam did not return a frame
            break
        # Hold down the space bar (shoot)
        keyboard.press(Key.space)
        predict_image = img.copy()
        predict_image = cv2.flip(predict_image, 1)
        height, width = predict_image.shape[:2]
        # Bounding box in which the hand should be placed
        x1, y1 = int(width * 0.25), int(height * 0.25)
        x2, y2 = int(width * 0.75), int(height * 0.8)
        cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)
        predict_image = img[y1:y2, x1:x2]
        cv2.imshow("Hand", cv2.flip(img, 1))
        cv2.moveWindow("Hand", int(monitor_width / 2), int(monitor_height / 2))
        # Resize the crop to the network's input size
        predict_image = cv2.resize(predict_image, (150, 150), interpolation=cv2.INTER_AREA)
        cv2.imwrite("a.png", predict_image)
        predict_image = tf.keras.preprocessing.image.img_to_array(predict_image)
        # Add a batch dimension; the scaling must match what the model was trained with
        predict_image = np.expand_dims(predict_image / 2, axis=0)
        prediction = model.predict(predict_image)
        if prediction[0][0] == 1:
            c = Key.up
            print("Paper:UP")
        elif prediction[0][1] == 1:
            c = Key.down
            print("Rock:DOWN")
        keyboard.release(Key.space)
        keyboard.press(c)
        # Wait 60 ms; quit if Esc (key code 27) is pressed
        key = cv2.waitKey(60)
        if key == 27:
            break
        keyboard.release(c)
    keyboard.release(c)
    cap.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    start()
Code Explanation:
We start by capturing images from our webcam:
cap = cv2.VideoCapture(0)
Then we load our pre-trained model, which classifies images into 3 classes:
model = tf.keras.models.load_model("Model/rps2.h5")
Capture and predict the images:
while True:
    # read a frame from the webcam
    ret, img = cap.read()
    # stop if the webcam did not return a frame
    if not ret:
        break
    # hold down the space bar (shoot)
    keyboard.press(Key.space)
    # create a copy of the frame received
    predict_image = img.copy()
    # flip the image for the right orientation
    predict_image = cv2.flip(predict_image, 1)
    # get the height and width of the image
    height, width = predict_image.shape[:2]
    # define the coordinates of the bounding box
    x1, y1 = int(width * 0.25), int(height * 0.25)
    x2, y2 = int(width * 0.75), int(height * 0.8)
    # draw the bounding box in which we place our hand
    cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)
    # crop the image within the bounding box
    predict_image = img[y1:y2, x1:x2]
    # show the mirrored frame and position the preview window
    cv2.imshow("Hand", cv2.flip(img, 1))
    cv2.moveWindow("Hand", int(monitor_width / 2), int(monitor_height / 2))
    # resize the image so that it matches our NN's input
    predict_image = cv2.resize(predict_image, (150, 150), interpolation=cv2.INTER_AREA)
    # write the image to disk
    cv2.imwrite("a.png", predict_image)
    # convert the image into an array
    predict_image = tf.keras.preprocessing.image.img_to_array(predict_image)
    # add a batch dimension, since the NN takes 4-D input (pixel values are also divided by 2)
    predict_image = np.expand_dims(predict_image / 2, axis=0)
    # predict the class of the image
    prediction = model.predict(predict_image)
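The bounding-box arithmetic above can be checked on its own, without a webcam. The helper below is a small sketch (the name `roi_box` is hypothetical) that uses the same fractions as the script: the box spans 25% to 75% of the frame width and 25% to 80% of its height. The 640x480 frame size is only an example value.

```python
def roi_box(width, height):
    """Return (x1, y1, x2, y2) for the hand region, using the script's fractions."""
    x1, y1 = int(width * 0.25), int(height * 0.25)
    x2, y2 = int(width * 0.75), int(height * 0.8)
    return x1, y1, x2, y2


# Example with a 640x480 frame (an assumed resolution)
x1, y1, x2, y2 = roi_box(640, 480)
print((x1, y1, x2, y2))    # (160, 120, 480, 384)
print((x2 - x1, y2 - y1))  # (320, 264): crop size before resizing to 150x150
```

Because the crop is not square, resizing it to 150x150 stretches the image slightly.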
Perform an action according to the prediction (still inside the loop):
    if prediction[0][0] == 1:
        # move up with the up arrow if it is paper
        c = Key.up
        print("Paper:UP")
    elif prediction[0][1] == 1:
        # move down with the down arrow if it is rock
        c = Key.down
        print("Rock:DOWN")
    # release the space bar (shoot), which was pressed at the top of the loop
    keyboard.release(Key.space)
    # press the key chosen above
    keyboard.press(c)
    # wait 60 ms and quit if the user presses Esc
    key = cv2.waitKey(60)
    if key == 27:
        break
    # release the key after the key press
    keyboard.release(c)
Finally, all we need to do is fire up the game along with this script.
To learn how to run the game, go through the GitHub repo mentioned above or visit here. Now we have a game that can be played from a webcam, with AI sprinkled on top!
