This article is an original piece from the WeChat public account “Dahua Imaging” and the Zhihu column “All In Camera”. Please cite the source when reproducing.
Is computer vision on the verge of a revolutionary self-renewal?
As a professor of ophthalmology at the University of Pittsburgh and an adjunct professor at the Carnegie Mellon University Robotics Institute, Ryad Benosman firmly believes so. A pioneer of event-based vision, Benosman predicts that neuromorphic vision, the computer vision technology built on event-driven cameras, will be the field's future direction.
He said, “The field of computer vision has been completely reinvented several times. I have had the privilege of witnessing at least two of these reinventions, each rebuilding the field from scratch in a completely new way.”
Ryad Benosman (Image source: University of Pittsburgh)
Benosman pointed to the shift from the photogrammetry-derived image processing of the 1990s to geometry-based methods, and then the recent rapid move toward machine learning. Despite these transformations, modern computer vision still relies mainly on image sensors: cameras that produce images similar to what the human eye sees.
What is EVS?
EVS captures motion (brightness changes).
EVS aims to mimic the way the human eye perceives light. When photoreceptors on the retina are exposed to light, they convert it into visual signals; downstream neurons respond to increases and decreases in brightness, and the information is relayed via retinal ganglion cells to the visual cortex of the brain.
In an EVS, incident light is converted into an electrical signal in the pixel circuit. The signal is amplified and fed to a comparator, where the brightness change is separated into positive and negative components, which are then processed and output as events.
EVS Mechanism
In event-based visual sensors, brightness changes detected at each pixel are filtered to extract only those that exceed a preset threshold. The event data is then combined with pixel coordinates, time, and polarity information before output. Each pixel operates asynchronously, independent of other pixels.
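To make the data format concrete, here is a minimal Python sketch (the names and the threshold value are illustrative, not any vendor's API) of an event record and the thresholding that decides whether a brightness change becomes an event:

```python
from dataclasses import dataclass

# Illustrative contrast threshold; real sensors expose this as a bias setting.
THRESHOLD = 0.2

@dataclass
class Event:
    x: int         # pixel column
    y: int         # pixel row
    t: float       # timestamp in microseconds
    polarity: int  # +1 for a brightness increase, -1 for a decrease

def maybe_emit_event(x, y, t, prev_level, new_level):
    """Emit an event only if the change exceeds the preset threshold."""
    delta = new_level - prev_level
    if delta > THRESHOLD:
        return Event(x, y, t, +1)
    if delta < -THRESHOLD:
        return Event(x, y, t, -1)
    return None  # sub-threshold changes are filtered out

# Example: a pixel brightening enough to cross the threshold.
print(maybe_emit_event(x=120, y=64, t=1523.0, prev_level=1.0, new_level=1.3))
```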
The accompanying diagram illustrates how the sensor captures the motion of a ball.
Event-based visual sensor data output
Event cameras are a relatively new subcategory within computer vision, the discipline concerned with enabling machines to interpret visual information such as images or video. Computer vision spans image processing, pattern recognition, and machine learning. Event cameras aim to technically mimic the human brain, and as interest in the brain and how it works grows, biomimetics is attracting more attention. Event cameras are a first step in this direction: a chip that mimics the human retina. They are also known as biomimetic cameras or DVS (Dynamic Vision Sensor) cameras, and they capture visual information based on the dynamics of the recorded scene.
The principle is as follows: each pixel in an event camera acts as an independent processing unit, allowing pixels to report brightness changes asynchronously. A change in a pixel's brightness is referred to as an event; each event records where in the scene something changed, in which direction the brightness moved, and when. Events are timestamped with microsecond resolution and transmitted with sub-millisecond latency, enabling these sensors to respond quickly to visual stimuli.
The image below compares the output of an event camera with that of a traditional camera, using a rotating disk with a black dot on it. The traditional camera records complete frames at fixed intervals; in each frame the dot has moved a considerable distance, and the information between frames is lost. In contrast, the event camera's pixels are triggered only by the movement of the dot, so they provide information as a continuous stream.
In principle, each pixel of an event camera consists of a light-receiving unit and a brightness-detection unit. Incident light is converted into a voltage in the light-receiving unit. A differential detection circuit in the brightness-detection unit detects the change between a reference voltage and the converted voltage; if the change in the positive or negative direction exceeds the set threshold, the comparator registers it as an event and outputs the data.
Using the detected event brightness as a reference, the circuit is reset and thresholds are set in the positive (light) and negative (dark) directions based on the new reference voltage. If the change in incident light brightness in the positive direction exceeds the set threshold (i.e., the output voltage exceeds the positive threshold), a positive event is output; conversely, if the voltage falls below the negative threshold, a negative event is output.
From the pixel perspective, the workflow is as follows (a small code sketch follows the list):
(1) Set reference voltage and positive/negative thresholds.
(2) When the brightness of incident light falls below the negative threshold, output a negative event.
(3) Reset the reference voltage and positive/negative thresholds based on the value at the event output.
(4) If the brightness of incident light further drops below the negative threshold, output another negative event.
(5) Reset the reference voltage and positive/negative thresholds again based on the value at the second event output.
(6) If the brightness then increases and exceeds the positive threshold, output a positive event.
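A small Python model of this workflow might look like the following (threshold values and the input signal are invented for illustration); the key behavior is that the reference is re-armed at the level that fired each event:

```python
POS_THRESHOLD = 0.25   # illustrative positive (brighter) threshold
NEG_THRESHOLD = 0.25   # illustrative negative (darker) threshold

def pixel_events(levels):
    """Yield (index, polarity) events for a sequence of brightness levels.

    The reference is reset to the level that triggered each event,
    mirroring steps (1)-(6) in the workflow above.
    """
    reference = levels[0]          # step (1): set the reference
    for i, level in enumerate(levels[1:], start=1):
        if level - reference >= POS_THRESHOLD:
            yield (i, +1)          # positive (brighter) event
            reference = level      # steps (3)/(5): reset on event
        elif reference - level >= NEG_THRESHOLD:
            yield (i, -1)          # negative (darker) event
            reference = level

# A level that falls twice, then rises: two negative events, one positive.
signal = [1.0, 0.9, 0.7, 0.5, 0.4, 0.8]
print(list(pixel_events(signal)))  # [(2, -1), (4, -1), (5, +1)]
```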
As shown in the figure below, the pixel converts the logarithm of the incident light intensity into a voltage. This allows the sensor to detect subtle differences at low brightness while still responding to large brightness differences at high brightness without saturating the event output, achieving a wide dynamic range.
The mechanism produces EVS images as shown (right).
As the brightness of pixels changes when the target moves, the image of the moving target will appear as if its contours have been extracted (the photo was taken by a camera equipped with EVS mounted on a car dashboard).
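To see why the logarithmic conversion widens the dynamic range, consider a short numeric sketch (values invented for illustration): a fixed threshold in the log domain corresponds to a fixed percentage change at any absolute brightness, so dim and bright scenes are treated alike.

```python
import math

LOG_THRESHOLD = 0.2  # illustrative threshold in log-brightness units

def triggers(prev_lux, new_lux):
    """True if the log-brightness change crosses the threshold."""
    return abs(math.log(new_lux) - math.log(prev_lux)) >= LOG_THRESHOLD

# The same ~23% relative change trips the comparator at 10 lux...
print(triggers(10.0, 12.3))           # True
# ...and at 10,000 lux, while a large *absolute* but small *relative*
# change in a bright scene does not.
print(triggers(10_000.0, 12_300.0))   # True
print(triggers(10_000.0, 10_500.0))   # False (only ~5% change)
```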
Why prefer event cameras? For decades, traditional cameras have been the standard devices for acquiring visual information; they are often the only choice, and their limitations simply have to be accepted: low frame rates, high latency, poor adaptability to extreme lighting conditions, and high power consumption. Some manufacturers work to compensate for these deficiencies through hardware innovation, but without changing the core sensing technology. Event cameras, in contrast, are built on a completely different hardware architecture, achieving temporal resolution equivalent to frame rates of up to 10,000 frames per second, very low power consumption (about 1 mW), and dynamic ranges of 120 to 140 dB. These features let event cameras perform well where traditional cameras struggle. A car exiting a tunnel illustrates this: the traditional camera's image is overexposed by the sudden change in brightness, while the event camera still produces a usable output.
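For context, dynamic range in decibels maps to an intensity ratio via DR = 20 * log10(Imax / Imin), so the figures quoted above correspond to contrast ratios in the millions; a quick check:

```python
import math

def contrast_ratio(db):
    """Convert a dynamic range in dB to a max/min intensity ratio."""
    return 10 ** (db / 20)

print(f"{contrast_ratio(120):.0e}")  # 1e+06 : event camera at 120 dB
print(f"{contrast_ratio(60):.0e}")   # 1e+03 : a roughly 60 dB conventional sensor
```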
However, event cameras only produce output when brightness changes, and various algorithms can reconstruct the familiar frame-based images of traditional cameras from the event stream. For many applications, information about static objects in the scene is unnecessary anyway; it only adds useless visual data and an extra burden on the processing pipeline.
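As a sketch of the simplest reconstruction idea (an illustrative accumulation scheme, not any particular paper's method), event polarities can be summed per pixel over a time window, yielding a contour-like image of whatever moved:

```python
import numpy as np

def accumulate_events(events, width, height, t_start, t_end):
    """Sum event polarities per pixel over [t_start, t_end) into a frame.

    `events` is an iterable of (x, y, t, polarity) tuples.
    Moving edges show up as nonzero pixels; static regions stay at zero.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, t, p in events:
        if t_start <= t < t_end:
            frame[y, x] += p
    return frame

# Tiny example: three events inside the window, one outside it.
events = [(2, 1, 100.0, +1), (2, 2, 150.0, +1), (3, 2, 180.0, -1),
          (0, 0, 900.0, +1)]
print(accumulate_events(events, width=4, height=3, t_start=0.0, t_end=200.0))
```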
How do you determine whether your project needs an event camera? There are many reasons to choose an event camera over a traditional one, but rather than write an endless list, first ask yourself the following questions:
- Are there uncontrollable lighting conditions in my scene?
- Do I have poor lighting conditions, very dark or very bright?
- Am I recording something that is moving?
- Do I need to record something at a very high frame rate?
- Am I recording from a mobile platform (e.g., a car)?
- Do I have power limitations, such as when using a drone?
- Do I like new computational techniques and think mimicking the human brain is great?
If you answered “yes” to any of these questions, you should definitely consider using an event camera.
What are the potential applications of event cameras? The potential uses are virtually limitless; the real question is why they have not been widely adopted yet. In terms of direct use, the following applications are particularly promising:
- Robotics and manufacturing:
  - Simultaneous localization and mapping (SLAM)
  - High-speed obstacle avoidance
  - Drone applications
  - High-speed interaction between machines and environments
  - Production line monitoring
  - Visual inspection under uncontrolled light sources
- Automotive and general applications:
  - Fast detection of pedestrians and cyclists
  - Detection of changing lighting environments
  - Gesture recognition
  - Night vision
  - Depth estimation
  - High-speed, low-latency detection
  - Optical flow estimation
Another mainstream research direction combines RGB cameras with event cameras for image reconstruction, improving image quality, filtering frames, reducing motion blur, and more, with functions such as:
- Intensity image reconstruction
- Video synthesis
- Image super-resolution
- Joint/guided filtering
- Tone mapping
- Visual stabilization
- Polarization reconstruction
- High dynamic range image restoration
- Auto-focus
- High-speed imaging
- Motion blur removal
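A common assumption underlying many of these hybrid methods is that each event marks a fixed step C in log intensity, so a captured frame can be propagated forward in time using the events that follow it. The sketch below illustrates this under that assumption (the value of C is invented for illustration):

```python
import numpy as np

C = 0.2  # illustrative per-event contrast step in log-intensity units

def propagate(frame, events):
    """Advance an intensity frame using subsequent events.

    Each event (x, y, t, polarity) nudges that pixel's log intensity
    by +/- C, the assumption behind many frame-plus-event
    reconstruction and deblurring methods.
    """
    log_img = np.log(frame.astype(np.float64) + 1e-6)  # avoid log(0)
    for x, y, _t, p in events:
        log_img[y, x] += p * C
    return np.exp(log_img)

# A flat gray frame; two positive events brighten one pixel by ~49%.
frame = np.full((2, 2), 100.0)
events = [(1, 0, 10.0, +1), (1, 0, 25.0, +1)]
print(propagate(frame, events).round(1))
```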
Research on event cameras has indeed been growing: recent years have seen a steady stream of papers on event-based vision at computer vision and robotics venues. This is an emerging topic, and more and more researchers are joining the field.
So why have event cameras not become popular? Despite their many advantages, market adoption remains low. The reasons are as follows:
Supply chain challenges:
Having a supplier produce a customized CMOS-based camera to your exact specifications can be very challenging: the specific components you need may have to be mass-produced to particular quality and performance standards.
Development costs:
Developing a system around event cameras faces significant challenges: it requires finding suppliers willing to shift from profitable CMOS semiconductor production lines to what they perceive as riskier, unproven new lines. Manufacturing costs can also be quite high, and ultimately those costs are passed on to customers.
Price issues:
Closely related to supply chain challenges is the price issue. Currently, the price of a single event camera module on the market is not low, which greatly limits their range of applications.
Conservatism at the technological cutting edge:
Although industries such as robotics and autonomous driving are at the technological forefront, they often take a conservative approach to hardware design. Because robotic and autonomous-driving platforms are complex, components in the system design need to be verified and reliable, and event cameras still need time to move from the laboratory to large-scale commercial use.
Competitiveness issues:
Modern CMOS cameras keep improving in shutter speed, pixel density, and dynamic range, narrowing the performance gap with event cameras. They are also cheap enough that multiple CMOS cameras can be used in a single application to overcome the shortcomings of any one camera, reducing dependence on event cameras.
Non-standard hardware interfaces and communication standards:
Event cameras from different manufacturers vary in their event transmission protocols and hardware interfaces, meaning each event camera development project requires separate adaptation work, further increasing the cost of adoption.
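To make the adaptation cost concrete, here is a sketch that decodes a purely hypothetical binary event layout (invented for illustration; every vendor defines its own): little-endian 16-bit x, 16-bit y, 32-bit timestamp, and a signed polarity byte per event. Switching vendors means rewriting exactly this layer.

```python
import struct

# Hypothetical packet layout: little-endian uint16 x, uint16 y,
# uint32 timestamp (microseconds), int8 polarity. Real vendors each
# define their own format, which is exactly the adaptation burden.
EVENT_FORMAT = "<HHIb"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 9 bytes

def decode_events(buffer: bytes):
    """Unpack a raw buffer of fixed-size event records."""
    for offset in range(0, len(buffer) - EVENT_SIZE + 1, EVENT_SIZE):
        yield struct.unpack_from(EVENT_FORMAT, buffer, offset)

# Two hand-packed events round-trip through the decoder.
raw = struct.pack(EVENT_FORMAT, 120, 64, 1523, 1) + \
      struct.pack(EVENT_FORMAT, 121, 64, 1531, -1)
print(list(decode_events(raw)))  # [(120, 64, 1523, 1), (121, 64, 1531, -1)]
```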
Quality and evaluation standards vary:
Although all event camera sensors claim dynamic ranges of 120 to 140 dB, there is no unified standard for how this range is achieved or how to assess the image quality of event cameras, making it hard for end users to distinguish quality differences between products.