Nothing could be more widespread than the words ‘AI’ (Artificial Intelligence) and ‘Deep Learning’ in this 2018 era. They appear in all kinds of media and fields, you name it: medicine, stock trading, economics, government, online business, even the AV industry!
It’s my boy’s 2nd birthday, and I was trying to figure out what to give him as a present. I got this project idea from the O’Reilly blog, and I mostly followed the same idea and structure. Although it may seem like too much for a 2-year-old boy, he really enjoys watching the robot car drive around and talk.
The blog post gives a really good outline, though I had to dig into the details of various subjects. I try to give clearer details here for those who don’t have much experience. Basically, the robot car we build can move in any direction we choose (of course!), and it is equipped with eyes and a brain so it can roughly tell us (by speaking) what it sees.
Thanks to the current boom in AI, we can now get things done a lot more easily than 5 years ago. The hottest area of AI is Deep Learning. I won’t go into much detail here, but it is widely used as a tool for image classification, where the computer tells us what is in an image. I used the ResNet50 model, from the ResNet family that won the ImageNet and COCO challenges in 2015. ImageNet is an image classification challenge made up of 1,000 categories of natural pictures (such as animals, vehicles, and things in nature). ResNet was among the first models to beat the estimated human error rate on this task (its top-5 error was 3.57%).
Raspberry Pi 3 is a good choice for building a mobile robot that uses Deep Learning, although it may lag at some points. Deep Learning requires a lot of computation, so it needs a fair amount of processing power and RAM. The 1 GB of RAM in the Raspberry Pi 3 is OK. You will need a 16 GB SD card to install TensorFlow for Deep Learning; an 8 GB SD card is not enough.
The image in Figure 1 comes from Lukas Biewald’s blog. Unlike him, I didn’t use a web app (Flask/Gunicorn) to control the robot. I just connect to the Pi over VNC from my laptop or iPad and use the command console to control it. I also changed the TTS (text-to-speech) software from Flite to Pico TTS.
A robot chassis is easy to find on online shopping sites such as eBay, Amazon, or Alibaba, and it’s amazingly cheap. You can also find wheels, motors, a breadboard, and male-to-male jumper wires there. I bought a 6-battery pack from Ban Mo (a famous electronics market in Bangkok); how much voltage you need depends on the motor controller. I use Cytron’s HAT-MDD10 for Raspberry Pi. It’s easy to connect to the Pi: just stack it on top, without manually wiring it to the Pi’s GPIO.
I didn’t use the official Pi camera because it’s too pricey. I found an unofficial camera that looks just the same, and I didn’t need such sophisticated pictures and video. It cost only $9, compared to $30 for the official Pi camera.
The HC-SR04 sonar sensors are good enough for the robot. They help prevent collisions and are also cheap to buy.
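To give a feel for how a sonar reading can turn into a "stop" decision, here is a minimal sketch. It only covers the arithmetic: the sensor reports how long the echo took, and the distance is half the round trip at the speed of sound. The function names and the 20 cm stop distance are my own assumptions for illustration, and reading the echo time from the GPIO pins is left out.

```python
# Approximate speed of sound in air at room temperature, in cm per second.
SPEED_OF_SOUND_CM_PER_S = 34300.0


def echo_to_distance_cm(echo_seconds):
    """Convert an echo round-trip time (seconds) to a distance in centimetres."""
    if echo_seconds < 0:
        raise ValueError("echo duration cannot be negative")
    # The sound travels to the obstacle and back, so halve the round trip.
    return echo_seconds * SPEED_OF_SOUND_CM_PER_S / 2.0


def too_close(echo_seconds, stop_distance_cm=20.0):
    """Return True when the obstacle is nearer than the (hypothetical) stop distance."""
    return echo_to_distance_cm(echo_seconds) < stop_distance_cm


if __name__ == "__main__":
    # A 1 ms echo corresponds to roughly 17 cm.
    print(round(echo_to_distance_cm(0.001), 2))
    print(too_close(0.001))
```

In a real robot you would call something like `too_close()` in the driving loop and stop or turn the motors when it returns True.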
The powerbank and speakers are what I invested in more than anything else, because I can use them after the project. The Pi needs a current of around 1.5 A to start up. For normal use, a 2.1 A or regular mobile powerbank should be enough. However, our robot has both a camera and speakers attached, so it needs more current than that. I use an Anker powerbank as the power supply. The speakers were the last thing I picked up. I ended up with a JVC Clipper 2, which has its own battery, so the robot may not need such a high-current (3 A) powerbank after all.
To start, we do the hardware job first by putting the wheels and motors on the chassis. Then connect the motors to the motor HAT controller (please refer to the manual of your motor controller), and connect it to the power source you prefer. I used 6 alkaline batteries because they’re easy to find, give a stable current, and are well suited for testing the robot. You may want to use rechargeable batteries such as NiMH (the type once common in cellphones), depending on how often you will use the robot.
After you have connected the motors to the controller, you may want to test that they spin properly. My controller has 2 small switches for testing the motors, along with an LED showing that the power is on. Then connect the camera and the motor controller to the Raspberry Pi.
Start your Raspberry Pi
Now it’s time to start your Pi. There are many online tutorials on how to get started and set up Raspbian on an SD card. One warning: the SD card is quite vulnerable to damage. It’s wise to back up or clone your SD card every time you make a major change to your programs or OS. Some of the installations might take an hour, so you don’t want to spend your time (and tears) doing them again. You can use GitHub to store your Python code.
Deep Learning what!?
Now we start with the robot’s eyes and brain, which means image classification of pictures fed from the camera. There are some terms you need to know and be familiar with before going to the next step. You can use Google or Wikipedia to look up the details, but I will give you a bit of explanation here:
- Deep Learning or Neural Network: This is a subfield of AI and Machine Learning. Basically, it is a stack of layers; each layer computes on the data and feeds the result to the next layer, and it can receive feedback to update itself and perform better.
- Convolutional Neural Network: A type of neural network that is widely used for image classification. It works by compressing the image and detecting features in it (such as a nose, eyes, letters, and many more). It then compares those features to its memory (what it learned in training) and tells us what the image is likely to be. It can also describe an image, segment an image, or even generate an image that matches the style of the images it was trained on.
- TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. Other libraries include Theano, PyTorch, etc.
- Keras is a library that wraps TensorFlow and is easier to work with. I tested both with and without Keras, and found that it was faster to get answers with Keras.
- Pre-trained model: a model created by someone else to solve a similar problem. Instead of building a model from scratch, you use a model trained on another problem as a starting point. Usually, we use a model trained on ImageNet. ImageNet is a set of millions of natural pictures, categorized into 1,000 classes, so these models know those 1,000 classes best. There are many pre-trained models available, such as VGG, ResNet, MobileNet, etc. After multiple tests, I ended up choosing ResNet50 for the robot.
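To make the idea of a pre-trained model more concrete, here is a rough sketch of how ResNet50 could be asked what is in a picture. It assumes the 2018-era standalone Keras package with the TensorFlow backend (import paths differ in newer releases). The function names `classify` and `describe` and the wording of the sentence are my own, not from the original blog post. The first call downloads the ImageNet weights, and on a Pi loading the model takes a while.

```python
def describe(predictions):
    """Turn [(class_id, label, score), ...] into a short sentence to speak."""
    if not predictions:
        return "I am not sure what I see."
    _, label, score = predictions[0]
    name = label.replace("_", " ")  # ImageNet labels use underscores
    return "I think I see a {} ({:.0%} sure).".format(name, score)


def classify(image_path, top=3):
    """Return the top ImageNet guesses for one image file."""
    # Imported here so describe() can be used without TensorFlow installed.
    import numpy as np
    from keras.applications.resnet50 import (
        ResNet50, preprocess_input, decode_predictions)
    from keras.preprocessing import image

    model = ResNet50(weights="imagenet")  # pre-trained on the 1,000 classes
    img = image.load_img(image_path, target_size=(224, 224))
    batch = np.expand_dims(image.img_to_array(img), axis=0)
    preds = model.predict(preprocess_input(batch))
    return decode_predictions(preds, top=top)[0]


if __name__ == "__main__":
    # Example with a made-up prediction, no model needed:
    print(describe([("n02099601", "golden_retriever", 0.87)]))
```

In practice you would load the model once at startup and reuse it for every frame, because building it on each call would be far too slow on a Raspberry Pi.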
Installing Keras and TensorFlow
OK, that is quite a lot to dig into. I hope you now have a rough idea of how Deep Learning works. What we need to install are TensorFlow and Keras; the pre-trained models are available in the Keras library for us to use. I found this blog post from
Before we get too far, we should check the contents of our keras.json configuration file. You can find this file at ~/.keras/keras.json.
Open it using your favorite text editor such as
and see the default values, it should look something like this:
Specifically, you’ll want to ensure that image_dim_ordering is set to tf and that the backend is properly set to tensorflow.
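If you prefer to check these two settings from Python instead of by eye, a small sketch like the one below could do it. The function name `check_keras_config` is my own. It also accepts `image_data_format` set to `channels_last`, which is the key I believe newer Keras releases use in place of `image_dim_ordering`; treat that part as an assumption and compare with your own file.

```python
import json
import os


def check_keras_config(path="~/.keras/keras.json"):
    """Return a list of problems found in the Keras config (empty if it looks OK)."""
    with open(os.path.expanduser(path)) as f:
        cfg = json.load(f)

    problems = []
    if cfg.get("backend") != "tensorflow":
        problems.append("backend should be 'tensorflow'")

    # Older Keras: image_dim_ordering = "tf".
    # Newer Keras (assumed): image_data_format = "channels_last".
    ordering_ok = (cfg.get("image_dim_ordering") == "tf"
                   or cfg.get("image_data_format") == "channels_last")
    if not ordering_ok:
        problems.append("image ordering should be 'tf' (or 'channels_last')")
    return problems


if __name__ == "__main__":
    if os.path.isfile(os.path.expanduser("~/.keras/keras.json")):
        print(check_keras_config() or "keras.json looks fine")
```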
Installing NGINX, RPi-Cam-Web-Interface, and Pico TTS
Next we install NGINX to act as the web server, and RPi-Cam-Web-Interface to stream the camera. RPi-Cam-Web-Interface has a lot of features, is easily configurable, and by default puts the latest image from the camera in a RAM disk at /dev/shm/mjpeg/cam.jpg
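As a rough illustration of how a Python script could later pick up that latest frame, here is a minimal sketch. It simply waits until the file exists and is not empty, then copies it somewhere safe so the classifier does not read a file that is about to be overwritten. The function name `grab_frame`, the timeout, and the polling interval are my own assumptions.

```python
import os
import shutil
import time

# Default location of the latest camera frame, as mentioned above.
CAM_FRAME = "/dev/shm/mjpeg/cam.jpg"


def grab_frame(dst, src=CAM_FRAME, timeout=5.0, poll=0.1):
    """Copy the newest camera frame to dst. Return True on success, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # The file may be missing or empty while the camera is starting up.
        if os.path.isfile(src) and os.path.getsize(src) > 0:
            shutil.copyfile(src, dst)
            return True
        time.sleep(poll)
    return False


if __name__ == "__main__":
    ok = grab_frame("/tmp/snapshot.jpg", timeout=1.0)
    print("frame saved" if ok else "no frame available")
```

This does not guard against the camera rewriting the file in the middle of the copy, so a real script may want to retry if the copied image fails to load.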
To start: go into the directory and run the script file by preceding its name with ./
** DO NOT install from the nginx tab in ./RPi_Cam_Web_Interface_Installer.sh; just use the shell script
./RPi_Cam_Web_Interface_Installer.sh to configure it.
*AGAIN: Do not touch the nginx tab. It will cause a lot of trouble, and you’ll end up restoring your SD card.
After you have installed RPi-Cam-Web-Interface and chosen NGINX as the server, you can navigate to localhost:80 and set up your camera. Basically, the video snapshot will be saved into a file at
Install Pico Text-to-speech
The last thing is Pico text-to-speech; follow the Pico section on the eLinux site. You can then try connecting speakers to your Pi. I suggest using the 3.5 mm audio jack or USB instead of Bluetooth. Over Bluetooth the voice will be delayed and laggy, because the Pi has to send the signal via Bluetooth during heavy computation.
Pico turns your text into a .wav file, and you can use the Pi’s default player ‘aplay’ to play the wav file. I also found that the small ‘OMXplayer’ does a great job. Combine them, run it on the command line, and hear how it sounds.
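For later use from Python, the two steps (text to .wav, then play the .wav) could be wrapped roughly like this. It is only a sketch: it assumes the Pico package provides the `pico2wave` command with its `-l` (language) and `-w` (output file) options, and that `aplay` is on the path. The function names and the temporary file location are my own choices.

```python
import subprocess


def build_speak_commands(text, wav_path="/tmp/robot_speech.wav", lang="en-US"):
    """Return two command lines: synthesize with pico2wave, then play with aplay."""
    synth = ["pico2wave", "-l", lang, "-w", wav_path, text]
    play = ["aplay", wav_path]
    return synth, play


def speak(text):
    """Say the text out loud (needs pico2wave and aplay installed on the Pi)."""
    for cmd in build_speak_commands(text):
        # check=True raises an error if either command fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works anywhere.
    for cmd in build_speak_commands("Hello, I am your robot."):
        print(" ".join(cmd))
```

Passing the commands as lists (rather than one shell string) avoids trouble when the spoken text contains quotes or other special characters.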
Right now you should clone or back up your SD card. It would cost a lot of time to reinstall all of this if the SD card gets corrupted.
In the next post, it’s time to start coding in Python!
Here is Part 2