Click "Xiaobai Learns Vision" above and select "Star" or "Pin"
Quality content, delivered promptly
Image
1. Analog Image
An analog image, also known as a continuous image, refers to an image that continuously varies in a two-dimensional coordinate system, meaning the image points are infinitely dense and possess gray values (i.e., the change from dark to light in the image).
2. Digital Image
A digital image, also known as a digitized image or bitmap, is a representation of a two-dimensional image using a finite number of numerical pixel values.
A digital image is obtained by digitizing an analog image, consisting of pixels as basic elements, and can be stored and processed using digital computers or digital circuits.
Typically, a two-dimensional digital image is a matrix that can be represented by a two-dimensional array f(x,y), where x and y are coordinates in a two-dimensional space, and f(x,y) represents properties such as the gray value of the image at that point.
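The matrix view above can be sketched in a few lines of Python: a tiny grayscale "image" as a nested list, where f(x, y) is the gray value at coordinate (x, y). (The values and the f[y][x] row-first indexing here are illustrative conventions, not from the article.)

```python
# A 3x4 grayscale "image" as a 2D array: f(x, y) is the gray value at (x, y).
# Rows correspond to y (vertical), columns to x (horizontal).
f = [
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
]

def gray_at(f, x, y):
    """Return the gray value f(x, y); row index comes first in the nested list."""
    return f[y][x]

print(gray_at(f, 3, 0))  # gray value at x=3, y=0 -> 255
```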
3. Color Models (Color Storage)
Color has three characteristics: hue, brightness, and saturation. The three characteristics of color and their interrelations can be illustrated using a three-dimensional color space.
A color model is a model that represents a certain color in digital form, or in other words, a way of recording the colors of an image. It is divided into: RGB mode, CMYK mode, HSB mode, Lab mode, bitmap mode, grayscale mode, indexed color mode, duotone mode, and multi-channel mode.
● RGB Mode (Additive Color Model)
RGB is the most commonly used color mode; R, G, and B stand for the three primary colors red, green, and blue. In this mode each pixel occupies 3 bytes (one byte is 8 bits), one for each of the R, G, and B components, with each component ranging from 0 (darkest) to 255 (brightest) in the additive color model; (255, 255, 255) is pure white.
Characteristics: small file size, rich and full colors. The RGB mode is an additive color model. The images displayed on screens are generally in RGB mode because the physical structure of the monitor follows RGB.
When the brightness values of the three primary colors are equal, gray is produced; when all three brightness values are 255, pure white is produced; and when all brightness values are 0, pure black is produced. The method by which RGB mode generates colors is also called additive color mixing.
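The additive rules above can be sketched directly: equal components give a gray, all 255 gives white, all 0 gives black. A small illustrative helper (the function name is my own):

```python
def describe_rgb(r, g, b):
    """Classify an additive (R, G, B) triple per the mixing rules above."""
    if r == g == b:
        if r == 255:
            return "white"
        if r == 0:
            return "black"
        return "gray"
    return "colored"

print(describe_rgb(255, 255, 255))  # white
print(describe_rgb(0, 0, 0))        # black
print(describe_rgb(128, 128, 128))  # gray
```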
4. Color Models (Display Methods)
A color model is an algorithm for representing colors in the digital world. Mathematically, a color is usually decomposed into several components. Because colors arise from different physical principles, devices that synthesize color directly from light (monitors, projectors, scanners) generate color differently from printing devices (printers, presses) that use pigments. The models include: RGB mode, CMYK mode, HSB mode, Lab mode, bitmap mode, grayscale mode, indexed color mode, duotone mode, and multi-channel mode.
5. Types of Images
(1) Binary Image
Only two colors: black and white. Black is 0, white is 1. Binary images are suitable for images composed of only black and white without gray shadows.
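A binary image can be produced from a grayscale one by thresholding each pixel against a cutoff; a minimal sketch (the threshold value 128 is an arbitrary choice):

```python
def binarize(gray, threshold=128):
    """Map each pixel to 1 (white) if >= threshold, else 0 (black)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]

gray = [[10, 200], [130, 90]]
print(binarize(gray))  # [[0, 1], [1, 0]]
```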
(2) Grayscale Image (GrayScale)
The elements of a grayscale image matrix typically take values in [0, 255], so the data type is generally an 8-bit unsigned integer (uint8); such images are often called 256-level grayscale images. "0" represents pure black, "255" pure white, and the numbers in between represent the transitional grays from black to white. Grayscale images contain only shades of gray, no color. What we commonly call black-and-white photos actually contain all the gray tones between black and white.
(3) Indexed Color Image
The color table’s red, green, and blue component values are not all equal; pixel values are the index addresses of the image color table.
In this mode, colors are predefined, and the set of selectable colors is limited, allowing indexed color images to display a maximum of 256 colors.
Indexed colors are also known as mapped colors. The color table of an indexed image is defined in the image file; when the file is opened, the index values that make up the image are read in, and the program looks up the final colors from the color table using those indices.
The file structure of indexed images is relatively complex; in addition to storing the two-dimensional matrix of the image, it also includes a two-dimensional array called the color index matrix MAP. The size of the MAP is determined by the range of values of the matrix elements. For example, if the range is [0, 255], the size of the MAP matrix will be 256×3, represented as MAP=[RGB]. Each row in the MAP specifies the red, green, and blue monochrome values corresponding to that row’s color, with each row corresponding to a gray value of the image matrix pixels.
The data type of indexed images is generally 8-bit unsigned integer (uint8), with a corresponding 256×3 index matrix MAP, so an indexed image can typically display only 256 colors at a time. However, by changing the index matrix, the set of available colors can be adjusted.
Indexed images are generally used for images with simpler color requirements, such as wallpapers in Windows that often use indexed images. If the image’s colors are more complex, true color RGB images will be used.
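The index-plus-palette scheme described above can be sketched in pure Python: the image stores small indices, and a MAP-style color table turns each index into an RGB triple. (The palette entries here are arbitrary examples.)

```python
# Color table (palette): each row holds the (R, G, B) for one index,
# playing the role of the MAP matrix described above.
palette = [
    (0, 0, 0),        # index 0 -> black
    (255, 255, 255),  # index 1 -> white
    (255, 0, 0),      # index 2 -> red
]

# Indexed image: each pixel stores an index into the palette, not a color.
indexed = [
    [0, 1],
    [2, 0],
]

# Expand to a true-color image by looking up each index in the palette.
truecolor = [[palette[i] for i in row] for row in indexed]
print(truecolor[1][0])  # (255, 0, 0)
```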
(4) True Color RGB Image
RGB images, like indexed images, use combinations of red (R), green (G), and blue (B) to represent the color of each pixel.
However, unlike indexed images, the color values of each pixel in RGB images (represented by the three primary colors) are stored directly in the image matrix. Since the color of each pixel is represented by R, G, and B components, each component occupies 1 byte, representing different brightness values from 0 to 255. The combination of these three bytes can produce 16.7 million different colors.
M and N represent the number of rows and columns in the image, and three M x N two-dimensional matrices represent the R, G, and B color components of each pixel, respectively. The data type of RGB images is generally 8-bit unsigned integer and is usually used to represent and store true color images; it can also store grayscale images.
RGB images are stored row by row, with each pixel holding three channels (note: in OpenCV the channel order is BGR rather than RGB).
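The BGR-vs-RGB note matters in practice: OpenCV's `cv2.imread` returns images with channels in BGR order, so code that expects RGB must swap the first and last channels. A minimal pure-Python sketch of the swap:

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every (B, G, R) pixel to (R, G, B)."""
    return [[(r, g, b) for (b, g, r) in row] for row in image]

bgr = [[(255, 0, 0)]]    # pure blue as stored in BGR order
print(bgr_to_rgb(bgr))   # [[(0, 0, 255)]] -> blue in RGB order
```

In real OpenCV code the same conversion is done with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.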
6. Main Differences Between RGB Images and Indexed Images
(1) RGB Color Mode Image: Also known as additive color mode, this is the best color for screen display, composed of red, green, and blue, with each color having brightness variations from 0 to 255.
(2) Indexed Color Image: In this mode each pixel value is one byte, so at most 256 colors can be stored in a color table. Image quality is not high. Its data consists of a data matrix and a double-precision colormap matrix: each value in the data matrix selects one row of the colormap, each row represents one color, and the three values in a row give the proportions of red, green, and blue, all within [0, 1]. Indexed images occupy less space and are usually used for transmission over the internet or where pixel count and file size are tightly constrained.
7. Pixel
A pixel refers to the small squares that make up an image, each having a defined position and assigned color value. The color and position of these small squares determine how the image appears. Digital images consist of pixels, which can generally be divided into two categories based on the different positions of the coordinate origin, with each pixel denoted as I(r,c) or f(x,y). The value range I for grayscale images is a scalar: I=greylevel; for color images, the value range I is a multi-dimensional vector: e.g., I=(r,g,b).
Pixels can be viewed as indivisible units or elements within the entire image. Indivisible means they cannot be further divided into smaller units or elements; they exist as small squares of a single color.
Each raster image contains a certain number of pixels, which determine the size of the image as displayed on the screen.
8. Resolution
Resolution is a parameter that measures how much data is contained within bitmap images. It is usually expressed as pixels per inch (PPI) and dots per inch (DPI).
(1) Image Resolution
The number of pixels per unit length in an image is called the image resolution, with units of pixels/inch (PPI) or pixels/cm. In two images of the same size, the high-resolution image contains more pixels than the low-resolution image.
Image size, image resolution, and image file size are closely related: for a given physical size, a higher resolution means more pixels and thus a larger file. Adjusting the image size and resolution therefore changes the file size.
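The relationship between pixel dimensions, resolution, and physical size is simple arithmetic: physical size = pixel count / resolution. A small example (the function name is my own):

```python
def print_size_inches(width_px, height_px, ppi):
    """Physical print size in inches for a given pixel size and resolution."""
    return (width_px / ppi, height_px / ppi)

# A 3000x2000-pixel image printed at 300 PPI comes out 10 inches wide:
print(print_size_inches(3000, 2000, 300))
```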
(2) Screen Resolution
Screen resolution is the number of dots displayed per unit length on a monitor (DPI). Screen resolution depends on the size of the monitor and its pixel settings.
When the image resolution exceeds the monitor resolution, the image displayed on the screen appears larger than its actual size.
Mathematical Models of Images
1. Two Basic Mathematical Models of Images
Continuous Model
This model treats an image as a continuous distribution of energy, as in the film imaging mentioned earlier.
Discrete Model
This model views a digital image as a collection of discrete sampling points, each with its own attributes, and processing operates on these discrete units. On its own it does not reflect the overall state of the image or the relationships among its contents; neighborhood operations based on convolution work particularly well in this model.
Both models have their advantages and disadvantages, but the future direction will still be the discrete model, as it is more convenient for computer processing. We will mainly approach image processing from the discrete model.
2. Application Principles of Image Mathematical Models
In image processing, different models are often employed based on the task and purpose or different stages, ensuring optimal system performance. Images must satisfy the sampling theorem during digitization so that discrete images can correspond to their continuous forms. “Digital image processing” does not refer to “processing of digital images,” but rather to “digital processing of images.”
3. Sampling Theorem
The sampling theorem was proposed by American telecommunications engineer H. Nyquist in 1928. In the field of digital signal processing, the sampling theorem is the fundamental bridge between continuous-time signals (commonly referred to as “analog signals”) and discrete-time signals (commonly referred to as “digital signals”). This theorem explains the relationship between sampling frequency and signal spectrum, serving as the basic premise for discretizing continuous signals. It establishes a sufficient condition for the sampling rate that allows the discrete sampling sequence to capture all information from a finite bandwidth continuous-time signal.
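Stated formally (this is the standard Nyquist–Shannon result, not specific to this article), a band-limited signal with maximum frequency $f_{\max}$ can be reconstructed exactly from its samples provided the sampling rate satisfies:

```latex
f_s \ge 2 f_{\max}
```

where $f_s$ is the sampling frequency and $2 f_{\max}$ is called the Nyquist rate. For images the same condition applies along each spatial axis, with spatial frequencies measured in cycles per unit length.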
4. Digitization (Continuous → Discrete)
The process of converting an image from its original form into digital form involves three steps: scanning, sampling, and quantization. Typically, scanning is folded into the sampling phase, leaving two processes.
(1) Sampling
Sampling is the process of transforming a spatially continuous image into discrete points. The higher the sampling frequency, the more realistic the restored image will be.
Sampling divides a continuous image into M×N grids, each represented by a brightness value. Each grid is called a pixel. The values of M×N satisfy the sampling theorem.
(2) Quantization
Quantization is the process of converting the sampled pixel points into discrete numerical values. The number of different gray values in a digital image is called the gray level; the higher the level, the clearer the image.
Quantization converts the continuous range of brightness values corresponding to the sampled points into a single specific numerical value.
After quantization, the image is represented as an integer matrix. Each pixel has two attributes: position and gray value. Position is given by row and column; the gray value is an integer indicating the brightness at that pixel. This M×N numerical matrix then becomes the object of computer processing. The gray level generally ranges from 0 to 255 (8-bit quantization). The following image illustrates how continuous values are converted to discrete values.
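The mapping from a continuous brightness to one of 256 integer levels (8-bit quantization) can be sketched as:

```python
def quantize(v, levels=256):
    """Map a brightness v in [0.0, 1.0] to an integer gray level."""
    return min(int(v * levels), levels - 1)

print(quantize(0.0))  # 0   (pure black)
print(quantize(1.0))  # 255 (pure white)
print(quantize(0.5))  # 128
```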
In summary, the digitalization process is illustrated below, showing the transition from the image’s real source to the final digital image:
Image Processing
Digital image processing includes:
● Image digitization;
● Image transformation;
● Image enhancement;
● Image restoration;
● Image compression encoding;
● Image segmentation;
● Image analysis and description;
● Image recognition and classification.
Common Image Transformation Algorithms
Geometric transformations of images (image distortion correction, image scaling: bilinear interpolation, rotation, stitching)
Image transformations (Fourier, cosine, Walsh-Hadamard, K-L transformation, wavelet transformation)
Image frequency domain processing (enhancement algorithms: high-frequency enhancement, homomorphic filtering; smoothing and denoising: low-pass filtering)
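As one concrete example from the geometric-transform list above, bilinear interpolation estimates the value at a non-integer coordinate from its four integer neighbors. A minimal sketch on a grayscale array (boundary handling here is a simplifying assumption):

```python
def bilinear(img, x, y):
    """Interpolate img at real-valued (x, y) from the 4 nearest pixels."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    dx, dy = x - x0, y - y0
    # Blend horizontally on the top and bottom rows, then vertically.
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bot * dy

img = [[0, 100],
       [100, 200]]
print(bilinear(img, 0.5, 0.5))  # 100.0 (center of the four pixels)
```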
Image Enhancement
The purpose of image enhancement is to improve the visual effect of an image. For a given application, it purposefully emphasizes the global or local characteristics of the image: making an unclear image clear, strengthening features of interest, enlarging the differences between objects in the image, and suppressing uninteresting features. This improves image quality, enriches the information content, and aids interpretation, recognition, and subsequent feature analysis.
Common image enhancement methods include: linear transformations of images; nonlinear changes of images; histogram equalization and specification of images.
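Histogram equalization, listed above, spreads gray levels apart by remapping each level through the scaled cumulative histogram. A simplified pure-Python sketch for an 8-bit image (real implementations, e.g. OpenCV's `cv2.equalizeHist`, use a slightly refined formula):

```python
def equalize(gray, levels=256):
    """Histogram-equalize a 2D list of 8-bit gray values (simplified)."""
    pixels = [v for row in gray for v in row]
    n = len(pixels)
    # Histogram of gray levels.
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution, rescaled to [0, levels - 1].
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    lut = [round(c * (levels - 1) / n) for c in cdf]
    return [[lut[v] for v in row] for row in gray]

dark = [[10, 10], [20, 30]]  # a low-contrast dark image
print(equalize(dark))        # gray levels spread toward the full range
```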
Image Restoration
Image degradation occurs when imperfections in the imaging system, transmission medium, and recording equipment reduce image quality during formation, transmission, and recording. Image restoration requires knowledge of the degradation mechanism and process, in order to find a corresponding inverse operation and recover the original image. If an image has degraded, it should be restored first and then enhanced.
Common image restoration methods include:
● Algebraic restoration methods: unconstrained restoration; constrained least squares method
● Frequency domain restoration methods: inverse filtering restoration; removal of blurriness caused by uniform motion; Wiener filtering restoration method
Image Compression
Image data can be compressed because of the redundancy present in the data. In image compression, there are three basic types of data redundancy: coding redundancy; inter-pixel redundancy; visual redundancy.
● Lossless compression: This compresses the file itself, optimizing how the data is stored by using algorithms to represent repeated information compactly. The file can be completely restored without affecting its content; for digital images, no image detail is lost. Lossless (also called error-free or information-preserving) coding removes only redundant data, so the image reconstructed after decoding is identical to the original, with no distortion.
● Lossy compression: This changes the image itself, retaining most brightness information while merging hue and saturation information with surrounding pixels. The merging ratio can vary, giving different compression ratios. Because information is discarded, the compression ratio can be very high, and image quality declines correspondingly. Lossy (distorting) coding means the image reconstructed after decoding differs from the original and cannot be restored exactly, but looks visually similar, which is how very high compression ratios are achieved.
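Inter-pixel redundancy is what the simplest lossless schemes exploit. For example, run-length encoding stores each run of identical pixel values once, and decoding restores the data exactly:

```python
def rle_encode(values):
    """Run-length encode a sequence as [(value, run_length), ...]."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_decode(runs):
    """Expand the runs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

row = [0, 0, 0, 0, 255, 255, 0]
encoded = rle_encode(row)
print(encoded)                     # [(0, 4), (255, 2), (0, 1)]
assert rle_decode(encoded) == row  # lossless: decoding restores exactly
```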
This article is for academic sharing only. If there is any infringement, please contact for deletion.
Download 1: OpenCV-Contrib Extension Module Chinese Version Tutorial
Reply "Extension Module Chinese Tutorial" in the background of the "Xiaobai Learns Vision" public account to download the first comprehensive OpenCV extension module tutorial in Chinese, covering installation of extension modules, SFM algorithms, stereo vision, target tracking, biological vision, super-resolution processing, and more than twenty chapters of content.
Download 2: Python Vision Practical Project 52 Lectures
Reply "Python Vision Practical Project" in the background of the "Xiaobai Learns Vision" public account to download 31 practical vision projects including image segmentation, mask detection, lane line detection, vehicle counting, eyeliner addition, license plate recognition, character recognition, emotion detection, text content extraction, and face recognition to help quickly learn computer vision.
Download 3: OpenCV Practical Project 20 Lectures
Reply "OpenCV Practical Project 20 Lectures" in the background of the "Xiaobai Learns Vision" public account to download 20 practical projects based on OpenCV for advanced learning.
Discussion Group
Welcome to join the reader group of the public account to communicate with peers. Currently, there are WeChat groups for SLAM, 3D vision, sensors, autonomous driving, computational photography, detection, segmentation, recognition, medical imaging, GAN, algorithm competitions, etc. (these groups will gradually be subdivided). Please scan the WeChat ID below to join the group, and note: "Nickname + School/Company + Research Direction", for example: "Zhang San + Shanghai Jiao Tong University + Vision SLAM". Please follow the format; otherwise, you will not be approved. After successful addition, you will be invited into relevant WeChat groups based on research direction. Please do not send advertisements in the group, or you will be removed from the group. Thank you for your understanding.~