Hardware Accelerated Image Stitching Camera
ECE 445: Design Document
Cole Herrmann (colewh2)
Gautum Pakala (gpakala2)
Jake Xiong (yuangx2)
Fall 2022
1. Introduction
1.1. Problem
1.2. Solution
1.3. Physical Design
1.4. High-Level Requirements
1.4.1. Camera link to File System
1.4.2. Image Processing
1.4.3. Output Presentation
2. Design
2.1. Block Diagram
2.2. Subsystems
2.2.1. FAST Key Point Detection
2.2.2. Key Point Description and Matching
2.2.3. Homography Transformation and Warping
2.2.4. Web Server Hosted Image Gallery
2.2.5. Camera
2.2.6. PCB Light Exposure Circuit
2.3. Tolerance Analysis
3. Verification
3.1. FAST Keypoint Detection
3.2. Homography Transformation and Blending
3.3. Web Server Hosted Image Gallery
3.4. Camera
3.5. Light Exposure Circuit
4. Cost and Schedule
4.1. Cost Analysis
5. Ethics
6. Safety
7. Conclusions
8. References
9. Appendix A: Requirements and Verification Table
1. Introduction
1.1 Problem
Time and energy are resources that are not plentiful on UAVs. Traditionally, when a UAV is
used for aerial mapping, it takes a picture every time it flies a predetermined distance interval.
Since UAVs must be kept lightweight, it is uncommon to find any with enough onboard
processing hardware and energy reserves to stitch hundreds of frames into a map. That is why
most mapping UAVs perform map generation offsite, on hardware more powerful than the
onboard camera and flight controller. In time-sensitive emergencies (open combat, search and
rescue, etc.), it may not be possible to land the UAV to render an aerial map, and it would be
much more convenient if the drone could render the map itself.
1.2 Solution
We designed a camera that has onboard hardware acceleration capability to stitch images
together. When stitching images together into a panorama or map, several repetitive operations
are required to "prep" the images for stitching. Operations to grayscale, blur, and convolve
images can be performed on a traditional CPU, but the processing time and power consumption
improve when such repetitive operations are pipelined through an FPGA. With Cole's
ECE 397 funding from last semester, he acquired a Digilent Embedded Vision Bundle
(https://digilent.com/shop/embedded-vision-bundle/), from which we used the Zybo Z7020 and
PCAM 5C as the basis for the camera. After completing this project, Cole plans to integrate the
camera into one of his drones, including adding serial communication between the flight
controller and the Zybo board (another benefit of building on PetaLinux), which would give access
to a plethora of sensors such as GPS and airspeed that could bring a live-rendering aerial
mapping drone into reality!
1.3 Physical Design
Zybo Z7020 Development Board with Zynq-7000 SOC
1.4 High-Level Requirements
1.4.1 A picture is taken by pressing a button on the external camera control PCB. The
camera stores all pictures as YUV files on the Ext4 filesystem (with 32 GB of space),
accessible by the PetaLinux OS.
1.4.2 After at least two images have been saved to the filesystem, the user can press a
button on the external camera control PCB, which stitches the captured images into a
panorama. Because the stitching process is hardware accelerated, the image processing takes only
a few milliseconds, depending on how many images are being stitched together.
1.4.3 After all images have been processed and the stitched result has been produced, a
client computer can access a local web server on the development board to view an HTML
image gallery.
2. Design
2.1 Block Diagram
The FAST algorithm uses computer vision techniques to check pixel intensity and
identify key points in images. Key points are significant pixels in an image, such as the
tip of a building or the license plate on a car. The FAST keypoint detection subsystem
accelerates this process through hardware comparisons on the FPGA.
After the key points are identified in the images, they are sent back to the Image Warp
subsystem on the FPGA. This subsystem performs calculations on the images in order to warp and
blend them into a proper panorama.
The final stitched images are hosted by the web server subsystem over a TCP
connection between the SOC and the local subnet.
While the images are being taken, the light-sensing circuit on the PCB sends a signal to
the FPGA, and the FPGA tells the PCB whether to flash the LEDs to improve light exposure in
the images.
2.2 Subsystems
2.2.1 FAST Key Point Detection
Keypoint Detection is the process of identifying key points in an image that are
recognizable from different angles, lighting, and scale. Many computer vision algorithms
accomplish this goal, such as SIFT, SURF, and FAST, to name a few. We chose to
implement the FAST algorithm for keypoint detection, not just because it is faster than most
other algorithms, but also because it is the least resource intensive for the FPGA to execute.
These algorithms already account for scale and rotational invariance in the images.
FAST uses only the intensity of a pixel (its grayscale value) to decide whether that
point is a keypoint. The algorithm samples the pixel values around the pixel under test on a
circle called Bresenham's circle, then checks whether 12 or more pixels on this circle have a
greater intensity than the tested pixel plus a threshold. The threshold determines how significant
the detected key points must be.
Bresenham’s Circle
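As a minimal software sketch of this segment test (assuming a grayscale image in a NumPy array; the function name and default threshold are illustrative, not our HLS implementation):

    import numpy as np

    # Offsets of the 16 pixels on a radius-3 Bresenham circle around the candidate pixel.
    CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
              (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

    def is_fast_keypoint(img, y, x, threshold=20):
        """Count circle pixels brighter than the center plus the threshold.

        Follows the test described above; the canonical FAST-12 test additionally
        requires the 12 pixels to be contiguous on the circle and also checks the
        darker direction.
        """
        center = int(img[y, x])
        brighter = sum(1 for dy, dx in CIRCLE if int(img[y + dy, x + dx]) > center + threshold)
        return brighter >= 12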
2.2.2 Key Point Description and Matching
Keypoint description assigns each identified keypoint a unique descriptor that can be used
to identify it in the image. There are many ways to do this, but the simplest is
to compile a matrix of the gradient vectors around each keypoint, which can be obtained by
convolving the image with specific filters.
Keypoint matching occurs after keypoints have been detected and described in each image.
If the difference between two descriptors is below a certain error threshold, the key points are
said to be a match, as shown by the matches in the image below. Typically, a minimum
of 4 keypoint matches is needed for the homography transformation. The method we used for
keypoint matching is the brute-force algorithm, which, as the name suggests, compares every
pair of descriptors and keeps the highest-confidence match.
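A minimal sketch of this brute-force matching, assuming each descriptor is a fixed-length vector stored as a NumPy row; the distance metric and threshold are illustrative:

    import numpy as np

    def brute_force_match(desc_a, desc_b, max_dist=0.7):
        """Match each descriptor in image A to its closest descriptor in image B.

        desc_a: (N, D) array of descriptors from image A.
        desc_b: (M, D) array of descriptors from image B.
        Returns (i, j) index pairs whose distance falls below the error threshold.
        """
        matches = []
        for i, d in enumerate(desc_a):
            dists = np.linalg.norm(desc_b - d, axis=1)  # compare against every candidate
            j = int(np.argmin(dists))                   # highest-confidence match
            if dists[j] < max_dist:
                matches.append((i, j))
        return matches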
2.2.3 Homography Transformation and Warping
When stitching images, the angle of the images needs to be rectified to create a clean
output panorama. A homography transformation maps the coordinate system of one image into
the plane of the reference image through a 3x3 homography matrix. The homography matrix can
be calculated from the matched keypoints by solving a constrained least squares problem to find
the eigenvector with the lowest eigenvalue. The method of calculating this matrix is shown
below, with x being the source image pixels and y being the destination image pixels. This
transformation is then applied to the destination image. One issue with the homography
transformation is that the result can be skewed by outliers in the keypoint matching process,
where keypoint matches are detected but are not really matches. A common solution to this kind
of outlier problem is the RANSAC algorithm, which makes the computation of the homography
matrix more robust. After the homography is computed, the images are warped (transformed)
through matrix multiplication and overlapped to form the panorama.
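Written out in standard homogeneous coordinates (a generic direct linear transform formulation; the notation here is a stand-in, with lowercase x a source pixel and y its matched destination pixel):

    y \sim H x, \qquad
    \begin{bmatrix} y_1 \\ y_2 \\ 1 \end{bmatrix} \sim
    \begin{bmatrix} h_{11} & h_{12} & h_{13} \\
                    h_{21} & h_{22} & h_{23} \\
                    h_{31} & h_{32} & h_{33} \end{bmatrix}
    \begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix}

Stacking the constraints from at least 4 keypoint matches into a matrix A gives the constrained least squares problem

    \hat{h} = \arg\min_{\lVert h \rVert = 1} \lVert A h \rVert^2,

whose solution is the eigenvector of A^T A with the smallest eigenvalue, reshaped into the 3x3 matrix H.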
2.2.4 Web Server Image Gallery
We configured our development board to run PetaLinux 2020.2, with the network card's
default IP configuration set to static before building the filesystem. Features such as static IP
addressing must be built into the filesystem image because the development board uses a volatile
filesystem that is recreated on each boot. A volatile filesystem helps prevent file corruption from
misbehaving programs or sudden power loss, and is a common feature on embedded devices. The
development board was also configured with the Busybox HTTP daemon built into the kernel and
filesystem. Upon boot, the development board takes its static IP (10.10.10.3 in our
configuration) and starts the Busybox daemon once a valid network connection is established. We
configured Busybox to serve all HTTP requests from the '/srv/www/gallery/' directory, where we
hosted the output images from the camera and a single-file HTML page with built-in JavaScript
functions and CSS styling. Any computer on the same local network was able to access the
camera's image gallery by visiting 'http://10.10.10.3/gallery/gallery.html' in a modern web
browser.
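As a quick illustration of client access (not part of the camera software itself), a machine on the same subnet can verify that the gallery is being served with a few lines of Python:

    import urllib.request

    # URL served by the Busybox HTTP daemon on the development board
    GALLERY_URL = "http://10.10.10.3/gallery/gallery.html"

    with urllib.request.urlopen(GALLERY_URL, timeout=5) as response:
        html = response.read().decode("utf-8", errors="replace")
        print(f"HTTP {response.status}: received {len(html)} bytes of gallery HTML")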
An HTTP web server is the ideal method for viewing the output from the camera because
it offers flexibility in the quantity and size of the images being displayed. For example, rather
than displaying a single image at a fixed resolution through the HDMI out port, viewing many
images in a single webpage is more versatile for camera applications. The diagram below shows
the high-level process of how any client machine can send an HTTP request to the FPGA [6].
Please note that the Zybo development board takes the Raspberry Pi's place in our implementation.
Two clients are on the same local network if they share a local subnet, which is likely if
they are connected to the same router. With this local connection, they can send HTTP
requests in one of two ways: through the local network using IP addresses as the naming
scheme, or through a local-address DNS server using a registered domain name.
To determine the better method, one must consider whether static IP or DHCP is being used
for IP address assignment. A static IP means that the development board's IP address will not
change, while DHCP (Dynamic Host Configuration Protocol) generally assigns a new IP each
time the device connects. Registering a domain on a DNS server for the
development board is only necessary if DHCP is used, because a constant URL is needed
for accessing the image gallery. If static IP addressing is used, the URL remains constant and
the client computer connecting to the web server can always use the IP of the development board
to access the webpage.
2.2.5 Camera
To capture the images to be stitched together, we had to interface a camera into the
system that can capture an image and save it in RAM so the stitching application can manipulate
it. After some complications interfacing a MIPI camera into the Linux system, we
decided to use a Logitech C920 1080p USB webcam. The USB webcam offers autofocus
and auto exposure, which are important features when stitching images into panoramas. To
integrate the camera into the system, we had to compile the PetaLinux OS with V4L2 drivers,
which instantiate a USB webcam as the '/dev/video0' device node. We also
had to instantiate a USB PHY host controller in the Linux device tree so that USB
peripherals could be controlled. When the user triggers an image capture, the stitching program
launches a Python GStreamer application that captures a frame from '/dev/video0' and saves it
to RAM.
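A minimal sketch of such a capture step, using a gst-launch-1.0 pipeline invoked from Python; the pipeline elements and output path are illustrative rather than our exact application:

    import subprocess

    def capture_frame(device="/dev/video0", out_path="/tmp/capture0.jpg"):
        """Grab a single frame from the USB webcam and save it as a JPEG."""
        pipeline = [
            "gst-launch-1.0",
            "v4l2src", f"device={device}", "num-buffers=1",  # one frame from the V4L2 device
            "!", "videoconvert",                             # convert the raw webcam format
            "!", "jpegenc",                                   # encode the frame as JPEG
            "!", "filesink", f"location={out_path}",          # write it out (RAM-backed path)
        ]
        subprocess.run(pipeline, check=True)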
To control the camera, we interfaced the PetaLinux OS with the onboard push buttons
and LEDs on the Zybo Z7020 development board. In our block diagram, we instantiated two
AXI-GPIO interfaces, one linking the onboard LEDs to an AXI memory address and the
other linking the push buttons to an AXI memory address. We then added these AXI-GPIO
interfaces to the Linux device tree so the PetaLinux OS knows which memory addresses the I/O
is located at. The image stitching application uses the push buttons as inputs to trigger images to
be stitched, and uses the LEDs to show the status of the image capture and stitching process.
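A minimal sketch of polling a push button and driving a status LED through the Linux sysfs GPIO interface; the GPIO numbers are placeholders that depend on how the AXI-GPIO blocks are mapped, and the pins are assumed to be exported already:

    import time

    BUTTON_1 = 960  # placeholder sysfs number for AXI-GPIO push button 1
    LED_1 = 961     # placeholder sysfs number for AXI-GPIO status LED 1

    def read_gpio(num):
        """Read a 0/1 value from an exported GPIO."""
        with open(f"/sys/class/gpio/gpio{num}/value") as f:
            return int(f.read().strip())

    def write_gpio(num, value):
        """Drive an exported GPIO high or low."""
        with open(f"/sys/class/gpio/gpio{num}/value", "w") as f:
            f.write("1" if value else "0")

    # Wait for the user to press the capture button, then light the status LED.
    while read_gpio(BUTTON_1) == 0:
        time.sleep(0.05)
    write_gpio(LED_1, 1)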
2.2.6 PCB Light Exposure Circuit
The light exposure subsystem is a simple 5 V LED-and-resistor circuit connected to the
FPGA through the GPIO power pin. This way the FPGA can monitor the circuit and determine
when the LEDs should be turned on, for better energy efficiency. The circuit is currently
configured to turn on at startup.
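As a worked sizing example for the series resistor (illustrative LED parameters, not measured values from our board), with a 5 V supply, an LED forward voltage of roughly 2 V, and a target current of 20 mA:

    R = \frac{V_{\text{supply}} - V_f}{I_f} = \frac{5\,\text{V} - 2\,\text{V}}{20\,\text{mA}} = 150\,\Omega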
2.3 Tolerance Analysis
Two approaches are commonly applied for image stitching, parallel and sequential, each
with its own tradeoffs.
Keypoint matching can use the difference-of-Gaussians approach, in which the image is
blurred at several levels of Gaussian blur and the blurred versions are subtracted to find the
differences. The key points are the pixels that are locally distinct, and a Gaussian pyramid is
used to find key points at multiple scales. The next step is computing the descriptor: the
gradients around each keypoint are collected into a histogram, which is used to find similar local
key points.
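A minimal sketch of the difference-of-Gaussians step, assuming SciPy is available; the sigma values are illustrative:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def difference_of_gaussians(img, sigma1=1.0, sigma2=2.0):
        """Blur at two scales and subtract; locally distinct pixels become keypoint candidates."""
        img = img.astype(np.float32)
        return gaussian_filter(img, sigma1) - gaussian_filter(img, sigma2)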
Computing the Gaussian pyramid is usually time consuming, and different approaches can
be used in these steps. Gaussian filtering benefits from parallelism, which reduces the processing
time to a few seconds [4].
Another challenge of Gaussian filtering is the memory constraint. Although the actual
computation on a CUDA device takes only 0.43 s, transferring the images to the CUDA device
takes 0.73 s, which is almost twice the computation time [4].
The sequential image stitching approach stitches images with optimal seam finding and
transition smoothing. The sequential panorama stitching procedure enables us to process large
source images and create high-resolution panoramic images on resource-limited devices such as
mobile phones [3].
During panorama stitching, this approach only needs to keep the panoramic image and the
current source image in memory, rather than all source images, which makes it well suited to
implementation on mobile devices [3].
3. Verification
3.1 FAST Keypoint Detection
The FAST algorithm is suitable for our real-time video processing application because of
its high-speed performance. With a threshold of 20 for the Bresenham circle test, we achieved an
average algorithm latency of 20 ms, including I/O, with an average of 131 keypoints detected per
image. This speed comes from our optimization of the algorithm and our choice of threshold. As
an additional comparison, running a software FAST detection algorithm took 85 ms on average,
so our acceleration provided a 4.25x speedup over the software.
3.2 Homography Transformation and Blending
At least 4 matched points are necessary for the calculation, but our FAST detector is robust
enough to provide over 100 key points. With our system, the latency of blending and overlay
varied from 3 to 10 seconds depending on the number of keypoints. Artifacts in the images are
mainly noticeable as differences in light exposure, which is a tradeoff we made for cost; a more
expensive camera with light balancing would reduce these exposure artifacts.
3.3 Web Server Hosted Image Gallery
After the camera application finishes stitching the completed panorama, it sends the three
original pictures (left.jpg, middle.jpg, and right.jpg), grayscale keypoint descriptors, grayscale
matched keypoints, and the resultant panorama to the image viewer directory. The images can
then be viewed by visiting ‘http://10.10.10.3/gallery/gallery.html’, with 10.10.10.3 being the IP
of the development board. The image layout in ‘gallery.html’ is shown below.
‘gallery.html’ image layout
3.4 Camera
The camera system consists of a USB webcam, GPIO push buttons, GPIO LEDs, and
drivers to control the image capturing process. The USB webcam is instantiated upon boot as the
device '/dev/video0' in the PetaLinux filesystem. The camera application uses a Python program
that runs a GStreamer pipeline to capture a frame from the webcam. However, the camera
application waits for GPIO input before executing the Python script. Once the program receives
GPIO input, the Python script is run and the program drives a GPIO LED to signify that a
picture has been captured. The diagram below shows this process.
The robustness of the camera application can also be tested by purposely taking three faulty
pictures (for example, by covering the lens with a piece of paper) and letting the camera
application attempt to stitch them together. Since images of a blank piece of paper will not
produce any key points from the FAST algorithm, the stitching will fail and the program moves
to the 'Stitching Failed' state in the block diagram. In normal operation, the stitching succeeds
and the program moves to the 'Stitching Complete' state.
3.5 Light Exposure Circuit
The PCB was verified through its GPIO connection to the FPGA. The LEDs on the
PCB turned on only when the FPGA was powered, because we configured the GPIO pin to
assert only while the FPGA is on. Before connecting to the FPGA, the PCB was tested on battery
power with 5 volts and a switch. The circuit operated correctly under these conditions, with
none of the LEDs turning on when the switch was off. These tests confirmed that the PCB was
functional and could be connected to the FPGA.
PCB layout
4. Cost and Schedule
The cost of our project is shown in the table below. We do not require much hardware, as our
project is mainly FPGA-based. We also take the labor cost into consideration. We expect the
team members to work at least 1 hour per day at a salary of 45 dollars per hour. Therefore, the
labor cost is 45 dollars/hour × 3 members × 1 hour/day × 90 days = 12,150 dollars.
The total cost = $12,150 + $460 = $12,610.
5. Ethics
Our project, an accelerated panorama image stitching camera, upholds the IEEE Code of
Ethics I, as it has positive societal implications and great potential applications in dangerous
situations [5].
The image stitching camera can be used for traffic control and cartography, as it provides
fast, high-quality image output. The camera also has commercial prospects because it reduces the
burden on communication systems: only one panorama is sent rather than several separate
images through the wireless system.
Beyond these societal implications, drones equipped with our camera can be deployed
during urgent and perilous natural hazards such as forest fires and earthquakes, when wireless
stations are shut down or disabled. The information can be processed quickly, allowing a faster
response to danger than traditional methods of communication.
6. Safety
We were aware of the safety and ethical problems that could come with the project. Since
there are few tall buildings in the Champaign-Urbana area, it is safer to fly the drone in an
open area away from crowds. We were also mindful of trespassing on private property and of
privacy and portrait rights when experimenting with the camera, as such intrusions would conflict
with the IEEE Code of Ethics: "hold paramount the safety, health, and welfare of the public" [5].
The PCB requires only a low, safe voltage, but we still paid attention while testing the
board, since a short circuit could damage it. We would follow the laboratory fire procedure if a
fire ever got out of control.
Personal safety is also important when working in the lab. We did not engage in
experiments with hazardous materials or high-voltage electronics, and we abided by the
laboratory safety guidance provided by the University of Illinois: for example, not working
alone in the laboratory, not touching live wires with both hands, and not eating, drinking, or
applying cosmetics in the lab.
7. Conclusions
While we are thrilled that our camera did function as intended, we also found a few areas
where we can improve our design. One flaw of our design is our utilization of the FPGA.
Granted, we did hardware accelerate the FAST algorithm which took up a decent portion of the
FPGA resources. However, we could have also accelerated the matching algorithm and the
image warping algorithm which would have significantly decreased stitching time. We did
attempt to do this, but ran into issues with buffer data type sizes. But theoretically, accelerating
the matching and warping algorithms is possible. We also could have chosen a much faster
matching algorithm. Our matching algorithm, the brute force matcher, is one of the main
bottlenecks in our design. Switching to an optimized algorithm such as the Fast Library for
Approximate Nearest Neighbors-Based Matcher (FLANN) would remove the bottleneck,
especially since it’s optimized for the FAST keypoint detection algorithm.
We also discussed expanding the design to support multi-spectral sensor inputs. This
would allow imaging techniques such as radar and NDVI that provide more useful data than 2D
imaging alone. We have a base design that can be used for drone aerial mapping once we add a
larger frame buffer to support larger maps. Adding more sensors would also
justify the use of the FPGA in an aerial mapping drone, since the FPGA is able to process large
amounts of data quickly.
8. References
[1] Xilinx, "A Zynq Accelerator for Floating Point Matrix Multiplication Designed with Vivado
HLS," XAPP1170. [Online]. Available: https://docs.xilinx.com/v/u/en-US/xapp1170-zynq-hls
[2] Digilent, "Zybo Reference Manual." [Online]. Available:
https://digilent.com/refrence/programmable-logic/zybo/reference-manual
[3] Y. Xiong and K. Pulli, "Sequential image stitching for mobile panoramas," in Proc. Int. Conf.
on Information, Communications and Signal Processing, 2010. [Online]. Available:
https://www.researchgate.net/publication/224107399_Sequential_image_stitching_for_mobile_panoramas
[4] X. Xu and Z. Chen, "Parallelizing Image Stitching." [Online]. Available:
https://github.com/JamesOnEarth/Parallel-Image-Stitching
[5] IEEE, "IEEE Code of Ethics," IEEE Policies, Section 7 - Professional Activities (Part A -
IEEE Policies). [Online]. Available: https://www.ieee.org/about/corporate/governance/p7-8.html
[6] J. Crawford, "Easy access to my Pi on a local network," 31-Oct-2016. [Online]. Available:
https://jordancrawford.kiwi/local-address-dns/. [Accessed: 07-Dec-2022].
9. Appendix A: Requirements and Verification Table
Subsystem 1: FAST Keypoint Detection

Requirement 1: The FAST algorithm should be hardware accelerated to output more than 100
identifiable keypoints per image.
Verification 1: The equipment for verification would be the Xilinx FPGA. The camera inputs
image arrays to the FAST keypoint accelerator. The output of the algorithm is sent to the web
server for visibility and accuracy checks. The keypoint count is printed to the terminal and
recorded.

Requirement 2: The FAST algorithm should be able to return all images' keypoint arrays in
under 30 ms for a clock speed of 200 MHz.
Verification 2: The equipment for verification would be the Xilinx FPGA. The code is edited to
include the built-in Linux chrono library for timing. The FPGA runs the program and prints the
time taken by the FAST functions to the terminal.
Subsystem 2: Image Warp

Requirement 1: From the keypoint arrays returned by the hardware, the SOC is able to compute
the homography, warp, and stitch the panorama in under 10 seconds.
Verification 1: The equipment for verification would be the Xilinx FPGA. The code is edited to
include the built-in Linux chrono library for timing. The FPGA runs the program and prints the
time taken by the homography and warping functions to the terminal.

Requirement 2: From the identifiable key points, the SOC calculates the homography matrix and
performs the transformation. The software should be able to warp and blend the panoramas to
the point where end users could only notice the difference if they stepped close and searched for
artifacts in the image, disregarding black space in the background.
Verification 2: The web server hosted by the SOC would be used for this test. The image is
input into the Linux environment from the accelerator, sent to the web host, and viewed on the
monitor. The results can be qualitatively observed and recorded by the users. The image on the
web server can be used to verify the projective transformation. Some signs of error we would
pay attention to:
1. Differences in light exposure leading to color mismatches.
2. Differences in rotation of images leading to correctional warping rather than distorting the
image.
Subsystem 3: Web Server

Requirement 1: Users can access a web page that displays all the images taken by the camera.
Users can view the original images captured, the greyscale images with defined key points,
images tracing the matched keypoints, and the final output panorama. Users should also be able
to click on an image to enlarge it and inspect the details.
Verification 1: To connect to the web server, users must set their computer's NIC to any IP in
the 10.10.10.X subnet. Then, after connecting an Ethernet cord between the computer's NIC and
the FPGA's NIC, they can access the image gallery web page by going to the following URL:
10.10.10.3/gallery/gallery.html, where 10.10.10.3 is the static IP of the FPGA.

Requirement 2: All images that are successfully stitched are sent to the web server from the
stitching program; any images that fail the stitching process should not be sent, and the stitching
process should start from the beginning.
Verification 2: The user can take three pictures, moving the camera left to right after each
picture. Then, assuming the stitch was successful, the results are copied to the web server
directory. If the user covers the camera lens with their finger and takes three pictures, the
program won't be able to stitch the images and restarts. Since the stitching failed, the images are
not copied to the web server directory.
Subsystem 4: Camera

Requirement 1: Users can use the push buttons to trigger the camera to capture a picture; once
all three images are taken, the images are stitched together and the program waits for the user to
take the next round of pictures.
Verification 1: Once the program loads, all 4 status LEDs are lit, signifying that the application
is ready to capture pictures. The user can then press pushbutton 1 to start the capture of image 1.
Once the capture is over, only status LED 1 is lit, signifying the image is saved in RAM. The
same process is repeated for capturing images two and three, except the user will use
pushbuttons and LEDs 2 and 3 to finish capturing the pictures.

Requirement 2: If the user captures images that can't be stitched together, the application notifies
the user by turning on LED 4, signifying that the application has reset and 3 new images must
be taken.
Verification 2: The user can capture 3 images while keeping their finger over the lens; once the
stitching application returns 0 key points to be paired, LED 4 will turn on, telling the user to
capture 3 new images.
Subsystem 5: Light Exposure Circuit

Requirement 1: The LED is controlled by a switch and is supplied with 5 volts from the battery.
The PCB provides consistent light for photo taking.
Verification 1: We would test the functionality of the PCB with the lab equipment: supply 5 V
DC to the PCB and ensure the LED lights up without a short circuit or disconnection on the
board.

Requirement 2: Switching of the circuit is controlled by a hardware switch or the GPIO of the
FPGA.
Verification 2: The GPIO works more easily with internal signals on the FPGA but requires a
driver and relay circuit; we therefore decided to use a hardware switch for simplicity and easier
control over the circuit. Switch the circuit on and off and test its durability.
