Content
- Introducing Images
- Sampling and Quantisation
- Image Capture
- Controlling and Analysing Images
- Video
Audiovisual Processing CMP-6026A
Dr. David Greenwood
Practical photography with a camera arrived in 1839; the camera is arguably the most important scientific instrument to date.
The invention was claimed simultaneously by Louis Daguerre and William Henry Fox Talbot, and was preceded by far less practical processes.
Eadweard Muybridge's sequence photographs proved that a galloping horse lifts all four hooves off the ground at one point in its stride.
Perhaps the earliest movie?
Efficiently describe complex information…
Images can represent data other than photographs.
How do we represent images on a computer?
Colour images can be defined as a set of functions, one per colour channel (e.g. red, green, and blue intensity as functions of position).
To be suitable for digital processing, an image function \(f(x,y)\) must be digitised both spatially and in amplitude.
To digitise an image we discretise it by sampling spatially on a regular grid.
The number of samples determines the resolution of the image.
A pixel (picture element) at \((x,y)\) is the image intensity at the grid point indexed by the integer coordinate \((x,y)\).
We can sample the image at various resolutions.
NOTE: Here we use bi-cubic interpolation to display the images.
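Spatial sampling can be sketched in a few lines of NumPy. This is a minimal illustration with a synthetic gradient image standing in for a photograph; `sample` is a hypothetical helper that keeps every k-th grid point.

```python
import numpy as np

# A synthetic 256x256 greyscale "image" (a smooth gradient),
# standing in for a real photograph.
x = np.linspace(0, 255, 256)
image = np.outer(x, x) / 255.0

# Spatial sampling: keep every k-th pixel on a regular grid.
def sample(img, k):
    return img[::k, ::k]

low_res = sample(image, 4)   # 4x coarser grid in each direction
print(image.shape, low_res.shape)  # (256, 256) (64, 64)
```

Fewer samples mean a lower resolution: the 64x64 result retains only one sixteenth of the original grid points.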
You have already encountered sampling in the context of audio.
In audio the real signal is in the time domain.
For images, the real signal is in the spatial domain.
Quantisation transforms a real-valued sampled image into one that takes a finite number of distinct values.
A pixel is usually represented by 8 bits, giving 256 intensity levels.
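Quantisation to a given bit depth can be sketched as rounding each sample to the nearest of \(2^{\text{bits}}\) levels. A minimal sketch, assuming intensities normalised to [0, 1]; `quantise` is a hypothetical helper, not from any particular library.

```python
import numpy as np

def quantise(img, bits):
    """Quantise a real-valued image in [0, 1] to 2**bits distinct levels."""
    levels = 2 ** bits
    # Round each sample to the nearest level index, then map back to [0, 1].
    return np.floor(img * (levels - 1) + 0.5) / (levels - 1)

img = np.linspace(0.0, 1.0, 11)
print(quantise(img, 1))  # only two values survive: 0.0 and 1.0
```

With 8 bits (256 levels) the quantisation error is usually invisible; at 1 or 2 bits the banding is severe.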
Digital Photography
Colour is not a physical phenomenon; it is how humans perceive light of different wavelengths (analogous to the perception of frequency in audio waveforms).
Visible spectrum and receptor response for “normal” vision.
Wavelengths perceived as green trigger both M and L cone cells in the eye.
Abnormalities in the cone response lead to colour blindness.
Exposure controls the brightness of an image.
Adjust shutter speed and aperture size to control the amount of light reaching the image sensor.
Adjust with a tone curve: a mapping from input to output pixel intensity.
As a linear function:
Beware of implicit type conversion in your code.
\[f(I) = I\]
\[f(I) = I \times 0.7\]
\[f(I) = I \times 0.5 + 90\]
\[f(I) = I \times 1.6 - 90\]
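A linear tone curve is a one-line operation, but the warning about implicit type conversion matters: applying \(f(I) = I \times 1.6 - 90\) directly to an 8-bit array would wrap around on overflow. A minimal sketch, assuming 8-bit images; `tone_curve` is a hypothetical helper.

```python
import numpy as np

def tone_curve(img, gain, offset=0.0):
    """Apply f(I) = I * gain + offset to an 8-bit image, safely.

    Working directly on uint8 would silently wrap around on overflow,
    so convert to float first, then clip back to [0, 255].
    """
    out = img.astype(np.float64) * gain + offset
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.array([0, 100, 200], dtype=np.uint8)
print(tone_curve(img, 1.6, -90))  # bright values are clipped, not wrapped
```

Without the float conversion and clipping, `200 * 1.6` would overflow the uint8 range and produce a nonsense intensity.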
Our eyes perceive brightness on a logarithmic scale.
Similar to how we perceive loudness in audio.
We have more cells that see in dim light (rods) than cells that see in bright light (cones).
We are more sensitive to low light changes.
Cameras measure light on a linear scale.
Tone curves can be used to adjust images so that they more closely match human perception of a scene.
\[I^{\prime} = 255 \times \left(\frac{I}{255}\right)^{\frac{1}{\gamma}}\]
A histogram is an approximate representation of the distribution of numerical data.
We want to show the frequency, or count, of the values in an image.
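An image histogram is just a per-intensity count. A minimal sketch using NumPy's `histogram` on a tiny synthetic image:

```python
import numpy as np

img = np.array([[0, 0, 128], [128, 255, 255]], dtype=np.uint8)

# Count how many pixels fall into each of 256 intensity bins.
counts, _ = np.histogram(img, bins=256, range=(0, 256))
print(counts[0], counts[128], counts[255])  # 2 2 2
```

The counts sum to the number of pixels, and peaks in the histogram reveal dominant intensities in the scene.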
Thresholding is the simplest method of segmenting images.
If we wanted to separate the coat from the sky, we could use a threshold.
By observing the histogram we could separate all pixels above or below a value.
\[I_{t} = I > t\]
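The expression \(I_{t} = I > t\) maps directly onto an array comparison. A minimal sketch with a tiny synthetic image, where `t` would be chosen by inspecting the histogram:

```python
import numpy as np

img = np.array([[10, 200], [90, 250]], dtype=np.uint8)

# I_t = I > t : a boolean mask separating bright pixels from dark ones.
t = 128
mask = img > t
print(mask)
```

The result is a binary image: `True` where the pixel is brighter than the threshold, `False` elsewhere.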
We can consider video as a sequence of consecutive images.
Frame rate is the rate at which images are captured, or displayed, usually measured in frames per second (fps).
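Treating video as a sequence of images suggests a natural array layout: a stack of frames. A minimal sketch, assuming greyscale frames and an illustrative frame rate of 25 fps:

```python
import numpy as np

fps = 25                      # frames captured or displayed per second
height, width = 4, 4
# One second of greyscale video: a stack of consecutive images.
video = np.zeros((fps, height, width), dtype=np.uint8)
print(video.shape)  # (25, 4, 4)
```

Any per-image operation (tone curves, thresholding, histograms) then applies frame by frame along the first axis.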