
Advanced Headpose Estimation optimisation on a Raspberry Pi

The hardest portion of my latest project was the image recognition component. The challenge was to carry out Headpose Estimation on a Raspberry Pi at a usable framerate. I will talk about why I chose the RPi 3 Model A+ as the SBC (Single Board Computer) and the changes I made to the base code to optimise it for the Raspberry Pi.

Choosing the SBC

I first identified the SBC specifications that mattered most to the project. For the image processing, I figured that an SBC with the highest possible clock speed would work best: given the nature of the processing, it would only run on one processor core, so multi-threading and extra cores would be of little help. The program would be written in Python, so for ease of development I wanted to run Linux, which meant I needed an x86 or ARM processor. As for the RAM, the pattern recognition models I would be using are quite large, and I couldn't use smaller models because the images I would be processing were of low resolution, so the models had to be robust. From this I established that I would need at least 512MB of RAM.

I also needed the SBC to have at least one set of UART pins, one set of I2C pins, and a USB port for the webcam.

I spent a number of evenings looking at all the market offerings, and yet again you end up choosing a Raspberry Pi: the alternatives either don't meet the required specs, have little to no documentation, or are far too expensive. I decided to use a Raspberry Pi 3 Model A+. Apart from being super cheap, it offered a beefy processor, 512MB of RAM, and the possibility of using the Camera Serial Interface (CSI).

Getting experience with Image Recognition

Before this project I had never done any image-processing programming, so I had no clue which libraries to use. I did a couple of simple tutorials from the amazing Python Programming, where I was introduced to the OpenCV library. This library is almost always used when working with images in Python; it can do everything from rotating images to applying some crazy filters. However, the processing I wanted to do was a bit more advanced than what these tutorials were teaching.

I broke the problem down into discrete parts and identified what I wanted from the code: to calculate the Euler angles between the face and the image plane, from which I would be able to tell where the face was looking. I then found out the technical name of my problem, Headpose Estimation. This is an application of the commonly known Pose Estimation, which refers to the use of computers to estimate an object's orientation and position in an image domain. The process can be split into 2 main parts:

  1. Detection of the facial features, for example the eye pupils and the ends of the lips. Each of these features is given a corresponding 2D coordinate.
  2. Computing the Euler angles by creating and solving a Perspective-n-Point system with the data points of the facial features (a sketch of this step is shown below).
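
To make step 2 concrete, here is a minimal sketch of solving a Perspective-n-Point system with OpenCV. The 3D model points, 2D pixel coordinates and camera intrinsics are illustrative placeholders rather than the values from my code:

```python
import cv2
import numpy as np

# Generic 3D face model points (nose tip, chin, eye corners, mouth corners)
model_points = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye, left corner
    (225.0, 170.0, -135.0),    # right eye, right corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
])

# Matching 2D pixel coordinates from the landmark detector (example values)
image_points = np.array([
    (359, 391), (399, 561), (337, 297),
    (513, 301), (345, 465), (453, 469),
], dtype="double")

# Approximate camera intrinsics from the image size (no lens distortion)
w, h = 640, 480
camera_matrix = np.array([[w, 0, w / 2],
                          [0, w, h / 2],
                          [0, 0, 1]], dtype="double")

# Solve the PnP system for the head's rotation and translation
ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, None)
rotation_matrix, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
# The Euler angles can then be extracted from the rotation matrix
```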

More specifically, for my application the process would follow the steps below:

Figure 1

The Codebase I used

I came across this very cool GitHub repository that solved my problem quite effectively, and I thought this would be the end of my troubles. But after running it on my MacBook, I ran it on the RPi and got very poor performance. In my tests with positive face detection, the system took an average of 658ms to process one frame; in other words, it processed about 1.5 frames per second (fps), versus the 19 fps my MacBook processes. This frame rate would be too low for smooth operation of the ErgoScreen, as the user's face can move a considerable amount during this time interval.

Figure 2 (“wf” refers to test with positive face detection & “wof” refers to test with no face detection)

Exploring the results from the test with the base code (Figure 2) further, it is clear that the component of the process that takes the most time is the Bounding Box Estimation: it encompassed around 94% of the processing time. Those knowledgeable in the subject will understand that this is expected, as the system has to search the entire image and compare a large number of data points in order to establish the boundaries of the face. Knowing this, I began to work on reducing the bounding box time.

Interestingly, this data also shows how poorly integrated the FaceTime camera is on Macs compared to the Raspberry Pi: image capture takes up to 35% of the processing time. This is because the FaceTime camera is just a built-in USB webcam.

Optimising the Bounding Box algorithm

The base code uses Dlib's get_frontal_face_detector() function to calculate the bounding box. This function uses a Histogram of Oriented Gradients (HOG) algorithm, which in theory is one of the most accurate methods for object recognition. However, in this case we only need to consider faces that are looking straight towards the screen, and this is where HOG loses its edge over other algorithms.
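
For reference, the base code's detection step boils down to something like this sketch (the input image path is a placeholder):

```python
import cv2
import dlib

# Dlib's HOG-based frontal face detector, as used by the base code
detector = dlib.get_frontal_face_detector()

frame = cv2.imread("frame.jpg")                 # placeholder input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # detector runs on grayscale
rects = detector(gray, 0)  # 0 = no upsampling, keeps the pass as fast as possible
```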

HOG is mostly used in applications where object recognition is needed across a variety of environments and backgrounds, e.g. pedestrian detection. It works by detecting image details, like edges, and discarding the remaining background; these image details are segmented and analysed more closely in order to determine their gradients. The gradients are then analysed collectively to determine whether they form the desired object.

Whilst this may sound perfect and efficient, it is very processing-power intensive and has to be trained on a varied and complete dataset to work. In our case we are only interested in a discrete set of possibilities for where the object (the face) resides, so we don't need all this processing. This is why simple algorithms like Haar cascade classifiers work much faster for my application. Haar classifiers convert the image to monochrome and compare the light and dark regions of the image to establish the edges and shapes that belong to a face. This process is quick, very efficient and, in this case, as accurate as HOG.

OpenCV has very well trained and optimised Haar classifier models for faces, so I decided to try these out.

OpenCV's detectMultiScale() function carries out the facial detection using a pre-defined CascadeClassifier class, which loads the desired model. This function has a very useful parameter, scaleFactor, which shrinks the image between detection passes so the classifier only has to look for bigger objects, in this case faces; this speeds up processing time.

The output of the detectMultiScale() function is an array with, for every detected face, the top-left x,y coordinates of the bounding box plus its width and height. However, the function used to detect the facial landmarks, Dlib's shape_predictor(), requires the bounding box to be defined by its top-left and bottom-right x,y coordinates. I was able to convert between the two by using Dlib's rectangle() function.
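
Putting the pieces together, here is a minimal sketch of the Haar-based replacement, assuming OpenCV's bundled frontal-face cascade and Dlib's standard 68-point landmark model (the file names are the stock ones, not necessarily those in my final code):

```python
import cv2
import dlib

# OpenCV's pre-trained frontal-face Haar cascade, bundled with the library
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# Dlib's 68-point landmark model (downloaded separately)
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("frame.jpg")                 # placeholder input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Haar works on grayscale

# scaleFactor shrinks the image between passes; minNeighbors filters noise
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    # Convert (top-left, width, height) into Dlib's corner-based rectangle
    rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
    landmarks = predictor(gray, rect)  # the 68 facial landmark points
```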

Figure 3 (“wf” refers to test with positive face detection & “wof” refers to test with no face detection)

The effect of these changes can be seen in the figure above. The runs using the Haar bounding box framework (“RPi_wmodel_wf” & “RPi_wmodel_wof”) take >70% less time than those with the HOG approach from Dlib, with almost no impact on accuracy. Moreover, with this change, the program achieved >5fps (Table 1) in the positive face detection test.

Optimising the Webcam Stream

Having seen and understood the results of Figure 3, the only remaining simple optimisation would be modifying the image capture portion of the process. Even though the “Landmark” portion of the program encompasses the largest share of the processing time after “Bounding Box”, optimising it would prove a great challenge, as it is already very efficient and only detects a set number of facial features. Similar to how “Bounding Box” works, “Landmarks” must process the bounding box portion of the image for every chosen facial landmark, and this takes time.

After “Landmarks”, “Image Capture” takes the most time, and this is something that can be worked on.

The base code uses OpenCV's standard approach for webcam streams, VideoCapture(), which reads frames from the available webcams/cameras on the system. This function is very useful, as it is universal and has many parameters and options, but its method of retrieving images is relatively slow. It works by first creating a stream with the webcam; in other words, it tells the webcam to turn on and get ready to send frames. Then, when we want a frame, we call VideoCapture.read(), which tells the webcam to take a frame and return it. Inherently, this takes a lot of time, as we have to wait for the webcam to take an image and send it.
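
The capture pattern looks roughly like this (a simplified sketch, not the exact base code):

```python
import cv2

cap = cv2.VideoCapture(0)   # open the default webcam and start the stream
while cap.isOpened():
    ok, frame = cap.read()  # blocks until the webcam captures and sends a frame
    if not ok:
        break
    # ... run bounding box + landmark detection on `frame` here ...
cap.release()
```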

Considering I am using an RPi 3 Model A+, which has a quad-core processor, I can take advantage of this and use multi-threading to save time when retrieving the image frame. The imutils Python package offers a very useful class, PiVideoStream, which in itself uses a process similar to VideoCapture() but runs a separate thread that keeps reading and storing images, so when PiVideoStream.read() is called the last captured image is returned immediately and there is no standby. Additionally, PiVideoStream defaults to a 320×240 resolution, which is perfect for reducing the processing time of the later stages, as a smaller number of pixels needs to be examined.
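
Swapping it in is straightforward; a minimal sketch, assuming the CSI camera module and imutils' defaults:

```python
import time
from imutils.video import PiVideoStream

# Start the camera on its own thread; frames are grabbed continuously
# in the background at 320x240 (the class's default resolution)
vs = PiVideoStream(resolution=(320, 240)).start()
time.sleep(2.0)  # give the camera sensor a moment to warm up

for _ in range(100):       # e.g. process 100 frames
    frame = vs.read()      # returns the most recent frame immediately, no waiting
    # ... run bounding box + landmark detection on `frame` here ...

vs.stop()  # stop the capture thread when done
```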

Figure 4 (“wf” refers to test with positive face detection & “wof” refers to test with no face detection)

Even though this change delivered less of an impact, it is still visible in the results in Figure 4. “RPi_wmodel_cam_wf” resulted in 5.81 fps, comfortably above the 5fps mark. The “RPi_wmodel_cam_wof” test resulted in 7.14fps; this is a very good achievement, as the system will be able to respond more quickly when a new face is detected.

It is evident that optimisation has led to better performance of this system: in positive face tests on the RPi, the frame rate rose to roughly 3.8 times that of the base code. Using the base code was very useful, as it sped up development and allowed me to focus on improving components rather than building them. Without this optimisation, my project would have increased dramatically in cost, as I would have had to include a more powerful processing unit. This is a perfect example of how small adjustments in code can deliver greater performance at a lower cost in money and time. Further work could be done to decrease the “Landmarks” processing time by folding it into the bounding box process, which in turn could be reduced even further by threading the processing of the individual classifiers.

Appendix

The final code can be found here

Thermometer using thermistor and Raspberry Pi (no ADC)

In this tutorial I'll explain how to make a thermometer using a microcontroller (an RPi 3 Model B) and a thermistor, without using an ADC.

I bet you have seen millions of tutorials online on how to make a thermometer using a microcontroller, a thermistor, and an ADC. Some microcontrollers even have ADCs integrated on their SoC. Well, here you can learn how to make a thermometer using just a microcontroller and basic circuit components.

For this project you will need the following:

  • Microcontroller (I'll be using a Raspberry Pi 3 Model B)
  • Thermistor
  • 2 × resistors, ~220 Ω
  • NPN transistor (PN2222A)
  • Capacitor, ~100 µF
  • Breadboard with at least 6 lanes
  • Jumper wires (around 4)

First, I'll describe how it works. Then I'll show you how to set up the hardware. After that (the boring part), I will describe the maths needed to make it work. Finally, I will explain the program that makes it all run.

How it works

The special element here is the capacitor. A capacitor opposes the voltage across it as it charges, and its charging speed depends on the initial voltage across it. The thermistor's resistance decreases as its temperature increases, so with the capacitor and the thermistor connected in series, a hotter thermistor means a higher initial voltage across the capacitor, and the capacitor takes a shorter time to charge. This is how we will find the temperature: by measuring the difference in the time needed to charge the capacitor.

The Raspberry Pi header has a couple of pins that output 5V DC power; we will use those to power the circuit. Like most digital inputs, the pin reads one logic level below a threshold voltage and the other above it, so we will measure the time it takes for the sensed voltage to drop below 3.3V. That time will vary with temperature. Below is a proportionality table of what happens when the temperature (the independent variable) changes.

ΔT    ΔRt    ΔiVc    ΔTtC
+     −      +       −
−     +      −       +
(where ΔT is the change in temperature, ΔRt is the change in resistance of the thermistor, ΔiVc is the change in initial voltage across the capacitor, and ΔTtC is the change in the time taken for the sensed voltage to reach 3.3V)
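
To put a rough equation behind the table: in an idealised series RC circuit (ignoring the transistor and the exact sensing arrangement, so this is a simplification rather than the exact maths of this circuit), the time for the voltage across the thermistor to decay below the threshold is proportional to the thermistor resistance times the capacitance:

```python
import math

# Idealised series RC model: while the capacitor charges, the voltage across
# the thermistor decays as V(t) = V_SUPPLY * exp(-t / (Rt * C)), so the time
# to fall below the threshold grows with Rt (and shrinks as temperature rises).
V_SUPPLY = 5.0     # volts, from the Pi's 5 V power pins
V_THRESHOLD = 3.3  # volts, the level we time against
C = 100e-6         # farads, the ~100 uF capacitor

def time_to_threshold(r_thermistor):
    """Seconds for the thermistor voltage to decay below V_THRESHOLD."""
    return r_thermistor * C * math.log(V_SUPPLY / V_THRESHOLD)

# A hotter thermistor has lower resistance, so it crosses the threshold sooner:
print(time_to_threshold(10_000))  # ~0.42 s at 10 kOhm (cooler)
print(time_to_threshold(5_000))   # ~0.21 s at 5 kOhm (hotter)
```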

How to set it up