Introduction

This blog was originally published on CSDN on 2021-08-22. It is reproduced here with minor formatting corrections.

Recently, I've been learning about JPEG encoding. After reading many articles online, I found that few of them clearly explain every detail, which led to many pitfalls during programming. Therefore, I decided to write an article covering the details, combined with Python code. The full implementation can be found in my open-source project on GitHub.

Of course, this introduction and code are not perfect and may contain errors. They are intended only as a beginner's guide—please forgive any shortcomings.

Various Markers in JPEG Files

Many articles introduce JPEG file markers. I've also uploaded a document annotating an actual image for reference.

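All markers start with 0xff (255 in decimal), followed by a byte identifying the marker type. Except for standalone markers such as SOI and EOI, the marker is then followed by a two-byte big-endian byte count (which includes the two length bytes themselves but not the marker) and the information describing the block. The structure is shown below: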

from struct import pack

# Write JPEG decoding information
# filename: output filename
# h: image height
# w: image width
def write_head(filename, h, w):
    # Open file in binary write mode (overwrites)
    fp = open(filename, "wb")

    # SOI
    fp.write(pack(">H", 0xffd8))
    # APP0
    fp.write(pack(">H", 0xffe0))
    fp.write(pack(">H", 16))            # APP0 byte count
    fp.write(pack(">L", 0x4a464946))    # JFIF
    fp.write(pack(">B", 0))            # 0
    fp.write(pack(">H", 0x0101))        # Version: 1.1
    fp.write(pack(">B", 0x01))            # Pixel density unit: pixels per inch
    fp.write(pack(">L", 0x00480048))    # X and Y pixel density
    fp.write(pack(">H", 0x0000))        # No thumbnail information
    # DQT_0
    fp.write(pack(">H", 0xffdb))
    fp.write(pack(">H", 64+3))        # Quantization table byte count
    fp.write(pack(">B", 0x00))            # Quantization table precision: 8bit (0), Table ID: 0
    tbl = block2zz(std_luminance_quant_tbl)
    for item in tbl:
        fp.write(pack(">B", item))    # Content of quantization table 0
    # DQT_1
    fp.write(pack(">H", 0xffdb))
    fp.write(pack(">H", 64+3))        # Quantization table byte count
    fp.write(pack(">B", 0x01))            # Quantization table precision: 8bit (0), Table ID: 1
    tbl = block2zz(std_chrominance_quant_tbl)
    for item in tbl:
        fp.write(pack(">B", item))    # Content of quantization table 1
    # SOF0
    fp.write(pack(">H", 0xffc0))
    fp.write(pack(">H", 17))            # Frame image info byte count
    fp.write(pack(">B", 8))            # Precision: 8bit
    fp.write(pack(">H", h))            # Image height
    fp.write(pack(">H", w))            # Image width
    fp.write(pack(">B", 3))            # Number of color components: 3 (stored in the order Y, Cb, Cr)
    fp.write(pack(">B", 1))            # Color component ID: 1
    fp.write(pack(">H", 0x1100))        # Horizontal and vertical sampling factors: 1, Quantization table ID: 0
    fp.write(pack(">B", 2))            # Color component ID: 2
    fp.write(pack(">H", 0x1101))        # Horizontal and vertical sampling factors: 1, Quantization table ID: 1
    fp.write(pack(">B", 3))            # Color component ID: 3
    fp.write(pack(">H", 0x1101))        # Horizontal and vertical sampling factors: 1, Quantization table ID: 1
    # DHT_DC0
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_DC0)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x00))            # DC0
    for item in std_huffman_DC0:
        fp.write(pack(">B", item))        # Huffman table content
    # DHT_AC0
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_AC0)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x10))            # AC0
    for item in std_huffman_AC0:
        fp.write(pack(">B", item))        # Huffman table content
    # DHT_DC1
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_DC1)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x01))            # DC1
    for item in std_huffman_DC1:
        fp.write(pack(">B", item))        # Huffman table content
    # DHT_AC1
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_AC1)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x11))            # AC1
    for item in std_huffman_AC1:
        fp.write(pack(">B", item))        # Huffman table content
    # SOS
    fp.write(pack(">H", 0xffda))
    fp.write(pack(">H", 12))            # Scan start info byte count
    fp.write(pack(">B", 3))            # Number of color components: 3
    fp.write(pack(">H", 0x0100))        # Component 1: DC and AC Huffman table IDs
    fp.write(pack(">H", 0x0211))        # Component 2: DC and AC Huffman table IDs
    fp.write(pack(">H", 0x0311))        # Component 3: DC and AC Huffman table IDs
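    fp.write(pack(">B", 0x00))            # Start of spectral selection: 0
    fp.write(pack(">B", 0x3f))            # End of spectral selection: 63
    fp.write(pack(">B", 0x00))            # Successive approximation: 0 (these three bytes are fixed values for baseline JPEG)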
    fp.close()

At this point, we have only the image data left to write. But how exactly is the image data encoded? How are quantization and Huffman coding implemented? See the next section for details.
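As a quick way to check a header written this way, the following is a minimal sketch that walks the marker segments of a byte string and reports each marker with its payload length. The function name list_segments and the demo bytes are made up for illustration; it relies only on the marker layout described above and stops at SOS, because entropy-coded data follows that segment.

```python
import struct

# Markers that stand alone, with no length field (SOI and EOI).
STANDALONE = {0xD8, 0xD9}

def list_segments(data: bytes):
    """Return (marker, payload_length) pairs up to and including SOS."""
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos}")
        marker = data[pos + 1]
        pos += 2
        if marker in STANDALONE:
            segments.append((marker, 0))
            continue
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        segments.append((marker, length - 2))
        pos += length
        if marker == 0xDA:  # SOS: entropy-coded data follows
            break
    return segments

# Hand-built demo: SOI, a stand-in APP0 with a 2-byte payload, and EOI.
demo = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x04, 0xAA, 0xBB, 0xFF, 0xD9])
print([(hex(m), n) for m, n in list_segments(demo)])
```

Running it on the first bytes of a generated file should list SOI, APP0, the two DQT segments, SOF0, the four DHT segments, and SOS, in that order.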

JPEG Encoding Process

Since JPEG encoding divides images into 8×8 blocks, this implementation requires the image height and width to both be multiples of 8. We resize the image slightly (with interpolation) so that its dimensions become multiples of 8. For an image that is thousands of pixels on a side, this adjustment won’t noticeably affect the aspect ratio.

# Resize image to ensure it can be divided into 8×8 blocks
if (h % 8 != 0) or (w % 8 != 0):
    h = h // 8 * 8
    w = w // 8 * 8
    # cv2.resize expects dsize as (width, height); interpolation must be passed by keyword
    YCrCb = cv2.resize(YCrCb, (w, h), interpolation=cv2.INTER_CUBIC)
nblock = h * w // 64
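An alternative to resizing, which this article does not use, is to pad the image on the bottom and right by repeating its edge pixels. The sketch below assumes an H x W x C NumPy array; the helper name pad_to_multiple_of_8 is hypothetical.

```python
import numpy as np

def pad_to_multiple_of_8(img: np.ndarray) -> np.ndarray:
    """Pad an H x W x C image on the bottom/right by repeating edge pixels."""
    h, w = img.shape[:2]
    pad_h = (-h) % 8   # rows needed to reach the next multiple of 8
    pad_w = (-w) % 8   # columns needed to reach the next multiple of 8
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")

img = np.zeros((10, 13, 3), np.uint8)
print(pad_to_multiple_of_8(img).shape)   # (16, 16, 3)
```

With padding, the original height and width can still be written to SOF0, so a decoder crops the extra rows and columns and the picture is not rescaled.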

Color Space Conversion

JPEG uses the YCbCr color space uniformly because human eyes are more sensitive to luminance than chrominance. Thus, we selectively compress Cb and Cr components more, preserving visual quality while reducing file size. After converting to YCbCr, we can downsample the Cb and Cr components to reduce their resolution, achieving greater compression. Common sampling formats include 4:4:4, 4:2:2, and 4:2:0. These correspond to the horizontal and vertical sampling factors in the SOF0 marker. For simplicity, this article uses all sampling factors as 1 (no downsampling), meaning one Y component corresponds to one Cb/Cr component (4:4:4). In 4:2:2, two Y components correspond to one Cb/Cr; in 4:2:0, four Y components correspond to one Cb/Cr. The diagram below shows each cell as a Y component, and blue cells represent sampled Cb/Cr pixels.
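Although this article stays with 4:4:4, the idea behind 4:2:0 can be sketched in a few lines. The example below averages each 2×2 neighbourhood of one chroma plane; averaging is only one possible choice, the function name subsample_420 is made up, and even plane dimensions are assumed.

```python
import numpy as np

def subsample_420(plane: np.ndarray) -> np.ndarray:
    """Average each 2x2 neighbourhood of a chroma plane (even dimensions assumed)."""
    h, w = plane.shape
    # Split rows and columns into (pair index, position inside the pair).
    blocks = plane.reshape(h // 2, 2, w // 2, 2).astype(np.float32)
    return blocks.mean(axis=(1, 3)).round().astype(np.int16)

cb = np.array([[100, 102, 50, 50],
               [104, 106, 50, 50]], np.int16)
print(subsample_420(cb))   # [[103  50]]
```

Using it in a real encoder would also require changing the sampling factors in SOF0 and the order in which blocks are written, which is outside the scope of this article.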

The conversion formulas are:

Y = 0.299*R + 0.587*G + 0.114*B

Cb = -0.1687*R - 0.3313*G + 0.5*B + 128

Cr = 0.5*R - 0.4187*G - 0.0813*B + 128

All operations are rounded to integers. In a 24-bit RGB BMP image, R, G, and B range from 0–255. Through simple math, we find that Y, Cb, and Cr also range from 0–255. In JPEG, we typically subtract 128 from each component to center the range around zero.
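To make the formulas concrete, here is a minimal NumPy sketch that applies them directly. The function name rgb_to_ycbcr is an assumption, the input is expected in R, G, B channel order, and the final clipping to 0–255 is my own addition because a value such as 255.5 can round up to 256.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Apply the conversion formulas above to an (..., 3) array in R, G, B order."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    out = np.stack([y, cb, cr], axis=-1)
    # Round to integers, then clip (assumption: keep results inside 0..255).
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

pixels = np.array([[255, 255, 255],   # white
                   [0, 0, 0],         # black
                   [255, 0, 0]],      # pure red
                  np.uint8)
print(rgb_to_ycbcr(pixels))
```

White and black both map to neutral chroma (Cb = Cr = 128), which is why subtracting 128 later centers the chroma values around zero.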

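In Python, you can use OpenCV’s function for color space transformation. Note that OpenCV orders the output channels as Y, Cr, Cb, so in the code that follows index 1 is Cr and index 2 is Cb: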

YCrCb = cv2.cvtColor(BGR, cv2.COLOR_BGR2YCrCb)
npdata = np.array(YCrCb, np.int16)

8×8 Block Division

JPEG processes each 8×8 block individually, proceeding from top to bottom and left to right. Each block’s Y, Cb, and Cr components are processed in order (using different quantization and Huffman tables).

for i in range(0, h, 8):
    for j in range(0, w, 8):
        ...

DCT Transformation

DCT (Discrete Cosine Transform) converts spatial domain data into frequency domain, allowing selective reduction of high-frequency components without significant visual impact. Compared to Discrete Fourier Transform, DCT operates entirely in the real domain, making it more efficient for computers. The DCT formula is:

$$F(u,v)=\frac{2}{\sqrt{MN}}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}f(x,y)\,C(u)\,C(v)\cos\frac{(2x+1)u\pi}{2M}\cos\frac{(2y+1)v\pi}{2N}$$

where $C(u)=\begin{cases}\frac{1}{\sqrt{2}}&u=0\\1&u\neq0\end{cases}$ . In JPEG, $M=N=8$ .
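The formula can be written as two matrix products. The following is a small NumPy sketch for the 8×8 case; the function name dct2_8x8 is made up, and the result should agree with an orthonormal DCT implementation up to floating-point error.

```python
import numpy as np

def dct2_8x8(block: np.ndarray) -> np.ndarray:
    """Evaluate the DCT formula above for M = N = 8 as basis @ block @ basis.T."""
    n = 8
    u = np.arange(n).reshape(-1, 1)      # frequency index (rows of the basis)
    x = np.arange(n).reshape(1, -1)      # sample index (columns of the basis)
    basis = np.cos((2 * x + 1) * u * np.pi / (2 * n))
    basis[0, :] *= 1 / np.sqrt(2)        # C(0) = 1/sqrt(2), C(u) = 1 otherwise
    return (2 / n) * basis @ block.astype(np.float64) @ basis.T

flat = np.full((8, 8), 10.0)             # a constant block has only a DC term
coeffs = dct2_8x8(flat)
print(round(float(coeffs[0, 0]), 6))     # 80.0
```

For a constant block of value c the DC coefficient is 8c and every other coefficient is zero, which is a handy sanity check.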

Alternatively, use OpenCV’s built-in function:

now_block = npdata[i:i+8, j:j+8, 0] - 128    # Extract 8×8 block and subtract 128 (Y component)
now_block = npdata[i:i+8, j:j+8, 2] - 128    # Extract 8×8 block and subtract 128 (Cb component)
now_block = npdata[i:i+8, j:j+8, 1] - 128    # Extract 8×8 block and subtract 128 (Cr component)
now_block_dct = cv2.dct(np.float32(now_block))    # Perform DCT

Quantization

After DCT, the DC component is the first element of the 8×8 block, low-frequency components concentrate in the top-left, and high-frequency components in the bottom-right. To discard high-frequency data selectively, we apply quantization—dividing each element by a fixed value. The quantization table has smaller values in the top-left and larger ones in the bottom-right. Example quantization tables (different for luminance and chrominance):

# Luminance quantization table
std_luminance_quant_tbl = np.array(
    [
        [16, 11, 10, 16, 24, 40, 51, 61],
        [12, 12, 14, 19, 26, 58, 60, 55],
        [14, 13, 16, 24, 40, 57, 69, 56],
        [14, 17, 22, 29, 51, 87, 80, 62],
        [18, 22, 37, 56, 68,109,103, 77],
        [24, 35, 55, 64, 81,104,113, 92],
        [49, 64, 78, 87,103,121,120,101],
        [72, 92, 95, 98,112,100,103, 99]
    ],
    np.uint8
)
# Chrominance quantization table
std_chrominance_quant_tbl = np.array(
    [
        [17, 18, 24, 47, 99, 99, 99, 99],
        [18, 21, 26, 66, 99, 99, 99, 99],
        [24, 26, 56, 99, 99, 99, 99, 99],
        [47, 66, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99]
    ],
    np.uint8
)

Quantization code:

now_block_qut = quantize(now_block_dct, 0)        # Quantize Y component
now_block_qut = quantize(now_block_dct, 2)        # Quantize Cb component
now_block_qut = quantize(now_block_dct, 1)        # Quantize Cr component

# Quantize
# block: current 8×8 block data
# dim: dimension (0: Y, 1: Cr, 2: Cb)
def quantize(block, dim):
    if dim == 0:
        # Use luminance quantization table
        qarr = std_luminance_quant_tbl
    else:
        # Use chrominance quantization table
        qarr = std_chrominance_quant_tbl
    return (block / qarr).round().astype(np.int16)

After quantization, many zeros appear in the bottom-right corner. To group these zeros together for better run-length encoding efficiency, we perform zigzag scanning next.
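To see what quantization costs, here is a rough sketch that quantizes a block and then multiplies it back, as a decoder would. The helper names and the flat demo table are assumptions chosen to keep the arithmetic easy to follow; they are not part of the project code.

```python
import numpy as np

def quantize_block(coeffs, table):
    """Divide by the table and round, as in the quantize function above."""
    return np.round(coeffs / table).astype(np.int16)

def dequantize_block(levels, table):
    """Decoder side: multiply the stored levels by the same table."""
    return levels.astype(np.float32) * table

table = np.full((8, 8), 16, np.uint8)      # flat table, only for this demo
coeffs = np.linspace(-200, 200, 64, dtype=np.float32).reshape(8, 8)
restored = dequantize_block(quantize_block(coeffs, table), table)
max_err = float(np.abs(restored - coeffs).max())
print(max_err <= 8.0)    # True: the error is at most half a quantization step
```

The larger the table entry, the larger the possible error, which is why the big values sit in the high-frequency corner where errors are least visible.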

Zigzag Scanning

Zigzag scanning converts the 8×8 block into a 64-element list following this pattern:

We obtain a list like this: (41, -8, -6, -5, 13, 11, -1, 1, 2, -2, -3, -5, 1, 1, -5, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0). We’ll use this list as an example for subsequent steps.

Note: When storing quantization tables, we must also apply zigzag scanning to them. Storing them in this format ensures correct decoding by viewers (I spent a lot of time debugging this issue initially). Refer back to the marker-writing code at the beginning of this article.

now_block_zz = block2zz(now_block_qut)        # Zigzag scan

# Zigzag scan
# block: current 8×8 block data
def block2zz(block):
    re = np.empty(64, np.int16)
    # Current position in block
    pos = np.array([0, 0])
    # Define four scanning directions
    R = np.array([0, 1])
    LD = np.array([1, -1])
    D = np.array([1, 0])
    RU = np.array([-1, 1])
    for i in range(0, 64):
        re[i] = block[pos[0], pos[1]]
        if (((pos[0] == 0) or (pos[0] == 7)) and (pos[1] % 2 == 0)):
            pos = pos + R
        elif (((pos[1] == 0) or (pos[1] == 7)) and (pos[0] % 2 == 1)):
            pos = pos + D
        elif ((pos[0] + pos[1]) % 2 == 0):
            pos = pos + RU
        else:
            pos = pos + LD
    return re
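As a cross-check for the direction-based scan, the same order can be produced by sorting the cell coordinates by anti-diagonal. This is only an alternative sketch; zigzag_indices and block_to_zigzag are hypothetical names.

```python
import numpy as np

def zigzag_indices(n: int = 8):
    """List (row, col) pairs in zigzag order by sorting along anti-diagonals."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # On odd diagonals the row increases; on even diagonals the column increases.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))

ORDER = zigzag_indices()
ROWS = np.array([i for i, _ in ORDER])
COLS = np.array([j for _, j in ORDER])

def block_to_zigzag(block: np.ndarray) -> np.ndarray:
    """Read an 8x8 block in zigzag order using precomputed index arrays."""
    return block[ROWS, COLS]

demo = np.arange(64).reshape(8, 8)
print(block_to_zigzag(demo)[:8])   # [ 0  1  8 16  9  2  3 10]
```

Precomputing the index arrays once avoids repeating the position bookkeeping for every block.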

Differential Encoding (DC Components)

DC component values are often large, but adjacent 8×8 blocks usually have similar DC values. Differential encoding saves space by storing the difference between the current block and the previous one. The first block stores its own value. Note: This is done separately for Y, Cb, and Cr components (each component is differenced independently). The encoding and storage of the DC component now_block_dc will be explained later.

last_block_ydc = 0
last_block_cbdc = 0
last_block_crdc = 0

now_block_dc = now_block_zz[0] - last_block_ydc # Store difference
last_block_ydc = now_block_zz[0] # Update last value

now_block_dc = now_block_zz[0] - last_block_cbdc
last_block_cbdc = now_block_zz[0]

now_block_dc = now_block_zz[0] - last_block_crdc
last_block_crdc = now_block_zz[0]
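The differencing for one component can also be expressed over a whole sequence at once. The DC values below are made up for illustration; the cumulative sum shows how a decoder recovers the original values.

```python
import numpy as np

# Made-up DC values of one component across four consecutive blocks.
dc_values = np.array([41, 43, 40, 40], np.int16)

# The first block is differenced against 0, so it stores its own value.
diffs = np.diff(dc_values, prepend=0)
print(diffs)               # [41  2 -3  0]

# Decoder side: a running sum restores the original DC values.
print(np.cumsum(diffs))    # [41 43 40 40]
```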

Run-Length Encoding (AC Components)

After zigzag scanning, many zeros cluster together. The AC component list becomes: (-8, -6, -5, 13, 11, -1, 1, 2, -2, -3, -5, 1, 1, -5, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0).

Run-length encoding stores pairs of numbers: the second number of each pair is a non-zero value, and the first is the count of zeros preceding it. The sequence ends with the pair (0, 0), which signals that all remaining values are zero. Important: if the input does not end with a zero, this terminating pair must be omitted. This bug took me a long time to find; see the check on AClist[-1] near the end of the code below. The encoded result becomes: (0, -8), (0, -6), (0, -5), (0, 13), (0, 11), (0, -1), (0, 1), (0, 2), (0, -2), (0, -3), (0, -5), (0, 1), (0, 1), (0, -5), (0, 1), (3, -1), (6, 1), (0, 1), (0, -1), (27, 1), (0, 0). That is 21 pairs, or 42 numbers, down from 63. Of course, this is a special case: real data often has more zeros, even all zeros, leading to even better compression.

Notice the pair (27, 1): in the encoding process, the first number is stored in 4 bits (range 0–15), so 27 exceeds this limit. We split it into (15, 0) and (11, 1), where (15, 0) means 16 zeros, and (11, 1) means 11 zeros followed by a 1.

now_block_ac = RLE(now_block_zz[1:])

# Run-length encoding
# AClist: AC data to encode
def RLE(AClist: np.ndarray) -> np.ndarray:
    re = []
    cnt = 0
    for i in range(0, 63):
        if AClist[i] == 0 and cnt != 15:
            cnt += 1
        else:
            re.append(cnt)
            re.append(AClist[i])
            cnt = 0
    # Remove trailing [15, 0] pairs
    while re[-1] == 0:
        re.pop()
        re.pop()
        if len(re) == 0:
            break
    # Add two zeros at the end as termination marker
    if AClist[-1] == 0:
        re.extend([0, 0])
    return np.array(re, np.int16)
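One way to test the encoder is to expand the pairs back into 63 values. The decoder below is a sketch written for this purpose (rle_decode is a hypothetical name); it takes (run, value) tuples, which can be built from the flat array returned by RLE with zip(flat[0::2], flat[1::2]).

```python
def rle_decode(pairs, n=63):
    """Expand (run, value) pairs back into a list of n AC values."""
    out = []
    for run, value in pairs:
        if (run, value) == (0, 0):     # end-of-block: everything left is zero
            break
        out.extend([0] * run)          # zeros preceding the value
        out.append(value)              # for (15, 0) this appends the 16th zero
    out.extend([0] * (n - len(out)))   # fill up after end-of-block
    return out

pairs = [(0, -8), (0, -6), (0, -5), (0, 13), (0, 11), (0, -1), (0, 1), (0, 2),
         (0, -2), (0, -3), (0, -5), (0, 1), (0, 1), (0, -5), (0, 1), (3, -1),
         (6, 1), (0, 1), (0, -1), (15, 0), (11, 1), (0, 0)]
decoded = rle_decode(pairs)
print(len(decoded), decoded[56])   # 63 1
```

The value at index 56 is the lone 1 that follows the run of 27 zeros, confirming that (15, 0) plus (11, 1) covers the whole run.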

Special Binary Encoding in JPEG

After the above steps, this section explains how the encoded DC and AC components are written as a bitstream.

JPEG uses the following binary encoding scheme:

             Value              Bit Length           Stored Value
              0                   0                   None
            -1,1                  1                  0,1
         -3,-2,2,3                2              00,01,10,11
   -7,-6,-5,-4,4,5,6,7            3    000,001,010,011,100,101,110,111
     -15,..,-8,8,..,15            4       0000,..,0111,1000,..,1111
    -31,..,-16,16,..,31           5     00000,..,01111,10000,..,11111
    -63,..,-32,32,..,63           6                  ...
   -127,..,-64,64,..,127          7                  ...
  -255,..,-128,128,..,255         8                  ...
  -511,..,-256,256,..,511         9                  ...
 -1023,..,-512,512,..,1023       10                  ...
-2047,..,-1024,1024,..,2047      11                  ...

For a number to store, we determine its bit length and actual binary value using this scheme. Observing the pattern, positive numbers store their binary representation directly, with bit length equal to their binary length. Negative numbers have the same bit length, but the binary value is the bitwise NOT of the absolute value. Zero requires no storage.

# Special binary encoding format
# num: number to encode
def tobin(num):
    s = ""
    if num > 0:
        while num != 0:
            s += '0' if num % 2 == 0 else '1'
            num = int(num / 2)
        s = s[::-1]  # Reverse
    elif num < 0:
        num = -num
        while num != 0:
            s += '1' if num % 2 == 0 else '0'
            num = int(num / 2)
        s = s[::-1]
    return s
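The same table can be reproduced arithmetically: a negative value v with bit length n is stored as v + 2^n - 1, which is equivalent to inverting the bits of its absolute value. The sketch below uses that shortcut; amplitude_bits is a made-up name and it is meant only as a cross-check for tobin.

```python
def amplitude_bits(value: int) -> str:
    """Return the stored bits for a value, following the table above."""
    size = abs(value).bit_length()     # bit length from the table
    if size == 0:
        return ""                      # zero is not stored at all
    if value < 0:
        value += (1 << size) - 1       # same as inverting the bits of |value|
    return format(value, f"0{size}b")

for v in (-41, -8, 13, -1, 1, 0):
    print(v, repr(amplitude_bits(v)))
```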

For the DC component, suppose the differential value is -41. Using the above method, we get a bit length of 6 and binary stream 010110. The bit length 6 is itself encoded with canonical Huffman coding, which is covered in the Canonical Huffman Coding section below; for now we assume the code for 6 is 100. The DC value for this 8×8 block’s component is then stored as 100010110.

After writing the DC binary stream, we proceed to encode the AC component of the same block. After run-length encoding, we get: (0, -8), (0, -6), (0, -5), (0, 13), (0, 11), (0, -1), (0, 1), (0, 2), (0, -2), (0, -3), (0, -5), (0, 1), (0, 1), (0, -5), (0, 1), (3, -1), (6, 1), (0, 1), (0, -1), (15, 0), (11, 1), (0, 0).

First, store (0, -8). For the second number, apply the same method: 4 bits and 0111. Unlike DC, we now apply canonical Huffman coding to 0x04—the high 4 bits represent the count of zeros (first number), and the lower 4 bits represent the bit length of the second number. Suppose 0x04 encodes to 1011, then (0, -8) is stored as 10110111. Repeat for (0, -6), etc., writing the resulting bitstream sequentially.

Another example: (6, 1). The value 1 is stored as 1 (1 bit), so we encode 0x61 using canonical Huffman coding. Suppose it becomes 1111011. Then (6, 1) is stored as 11110111. For (15, 0), we only store the canonical Huffman code for 0xf0.
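The three worked examples can be replayed in code. This sketch hard-codes only the Huffman codes assumed in the text (100 for DC size 6, 1011 for 0x04, 1111011 for 0x61) instead of a full table, and the helper names are hypothetical.

```python
def size_of(value: int) -> int:
    """Bit length of the value, as in the table above."""
    return abs(value).bit_length()

def amplitude_bits(value: int) -> str:
    """Stored bits of the value (negative values use inverted bits)."""
    size = size_of(value)
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1
    return format(value, f"0{size}b")

# Codes assumed in the text above; a real encoder looks them up in its tables.
ASSUMED_DC = {6: "100"}
ASSUMED_AC = {0x04: "1011", 0x61: "1111011"}

def encode_dc(diff: int) -> str:
    return ASSUMED_DC[size_of(diff)] + amplitude_bits(diff)

def encode_ac(run: int, value: int) -> str:
    symbol = (run << 4) | size_of(value)   # high 4 bits: run, low 4 bits: size
    return ASSUMED_AC[symbol] + amplitude_bits(value)

print(encode_dc(-41))      # 100010110
print(encode_ac(0, -8))    # 10110111
print(encode_ac(6, 1))     # 11110111
```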

After completing the data for one component (say Y), write the Cb component data, then the Cr component data. Repeat this process for each 8×8 block from left to right and top to bottom. Finally, write the EOI marker (0xffd9) to mark the end of the image.

Note: During writing, check for 0xff to avoid conflicts with markers. If encountered, insert 0x00 after it.

s = write_num(s, -1, now_block_dc, DC0)        # Write DC data based on encoding method
for l in range(0, len(now_block_ac), 2):  # Write AC data
    s = write_num(s, now_block_ac[l], now_block_ac[l+1], AC0)
    while len(s) >= 8:  # Prevent memory overflow
        num = int(s[0:8], 2)
        pfp.write(pack(">B", num))
        if num == 0xff:  # Avoid marker conflict
            pfp.write(pack(">B", 0))  # Insert 0x00 after 0xff
        s = s[8:len(s)]
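The loop above only writes complete bytes, so a few bits may remain in s after the last block. The sketch below shows one way to flush a bit string, padding the final partial byte with 1 bits (the usual JPEG convention) and applying the same 0xff stuffing; bits_to_bytes is a hypothetical helper, not part of the project code.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes with 0xFF stuffing."""
    # Pad the final partial byte with 1 bits.
    if len(bits) % 8:
        bits += "1" * (8 - len(bits) % 8)
    out = bytearray()
    for k in range(0, len(bits), 8):
        byte = int(bits[k:k + 8], 2)
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)   # stuffed byte so 0xFF is not read as a marker
    return bytes(out)

print(bits_to_bytes("1010").hex())        # af
print(bits_to_bytes("11111111").hex())    # ff00
```

The EOI marker (0xffd9) is written after this final flush and is not stuffed.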

# Write data based on encoding method
# s: un-written binary data
# n: number of leading zeros (-1 for DC)
# num: value to write
# tbl: canonical Huffman dictionary
def write_num(s, n, num, tbl):
    bit = 0
    tnum = num
    while tnum != 0:
        bit += 1
        tnum = int(tnum / 2)
    if n == -1:  # DC
        tnum = bit
        if tnum > 11:
            print("Write DC data Error")
            exit()
    else:  # AC
        if (n > 15) or (bit > 11) or (((n != 0) and (n != 15)) and (bit == 0)):
            print("Write AC data Error")
            exit()
        tnum = n * 10 + bit + (0 if n != 15 else 1)
    # Canonical Huffman code: record count of zeros (AC) and bit length of num
    s += tbl[tnum].str_code
    # Store the actual binary value of num
    s += tobin(num)
    return s

Canonical Huffman Coding

This article uses four canonical Huffman tables: one for luminance DC, one for chrominance DC, one for luminance AC, and one for chrominance AC.

# Luminance DC canonical Huffman table
std_huffman_DC0 = np.array(
    [0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
     4, 5, 3, 2, 6, 1, 0, 7, 8, 9, 10, 11],
    np.uint8
)
...
# Convert to Huffman dictionary
DC0 = DHT2tbl(std_huffman_DC0)    # Luminance DC
DC1 = DHT2tbl(std_huffman_DC1)    # Chrominance DC
AC0 = DHT2tbl(std_huffman_AC0)    # Luminance AC
AC1 = DHT2tbl(std_huffman_AC1)    # Chrominance AC

The values in std_huffman_DC0, etc., are the actual numbers stored in the markers (refer to the earlier code). The first 16 numbers (0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) indicate how many codes exist for each length from 1 to 16 bits; their sum, 12, is the total number of symbols. The remaining 12 numbers are the symbols themselves, listed in order of increasing code length. std_huffman_DC0 essentially describes the diagram below:

Now we know the length of each encoded value, but not the actual code.

Canonical Huffman coding follows these rules:

The first code of the shortest length is 0;
Codes of the same length are consecutive;
The first code a of the next length j is derived from the last code b of the previous length i: a = (b + 1) << (j - i).

By Rule 1, 4’s code is 000. By Rule 2, 5 is 001, 3 is 010, 2 is 011, ..., 0 is 110. By Rule 3, 7 is 1110, 8 is 11110...
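The three rules fit in a short loop. The following sketch builds a symbol-to-code mapping from the 16 counts and the symbol list of std_huffman_DC0; canonical_codes is a made-up name and the dictionary form is chosen only to make the result easy to inspect.

```python
def canonical_codes(counts, symbols):
    """Map each symbol to its canonical Huffman code as a bit string."""
    codes = {}
    code = 0
    k = 0                                    # index of the next symbol
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            codes[symbols[k]] = format(code, f"0{length}b")
            code += 1                        # Rule 2: same length, consecutive
            k += 1
        code <<= 1                           # Rule 3: shift when the length grows
    return codes

counts = [0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = [4, 5, 3, 2, 6, 1, 0, 7, 8, 9, 10, 11]
codes = canonical_codes(counts, symbols)
print(codes[4], codes[6], codes[7], codes[11])   # 000 100 1110 11111110
```

The code 100 for symbol 6 is the one used in the DC example earlier in this article.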

# Class to store Huffman dictionary
# symbol: original value
# code: corresponding code
# n_bit: number of bits in code
# str_code: binary string representation
class Sym_Code():
    def __init__(self, symbol, code, n_bit):
        self.symbol = symbol
        self.code = code
        self.n_bit = n_bit
        # Binary string of the code, zero-padded to n_bit digits
        self.str_code = format(code, "0{}b".format(n_bit))

    # Define output format
    def __str__(self):
        return "0x{:0>2x}    |  {}".format(self.symbol, self.str_code)

    # Define comparison by symbol, so that sorted() orders the dictionary by symbol
    def __eq__(self, other):
        return self.symbol == other.symbol

    def __lt__(self, other):
        return self.symbol < other.symbol

    def __gt__(self, other):
        return self.symbol > other.symbol


# Convert canonical Huffman table to dictionary
# data: defined canonical Huffman table
def DHT2tbl(data):
    numbers = data[0:16]  # Count of codes for lengths 1–16
    symbols = data[16:len(data)]  # Original symbols
    if sum(numbers) != len(symbols):  # Validate table
        print("Wrong DHT!")
        exit()
    code = 0
    SC = []  # List to store dictionary
    for n_bit in range(1, 17):
        # Apply canonical Huffman rules
        for symbol in symbols[sum(numbers[0:n_bit-1]):sum(numbers[0:n_bit])]:
            SC.append(Sym_Code(symbol, code, n_bit))
            code += 1
        code <<= 1
    return sorted(SC)

The final Huffman dictionary is lengthy and can be viewed in my GitHub project. Studying its structure reveals how indexing works in the write_num function.

Introduction

This blog was originally published on CSDN on 2021-08-22. It is reproduced here with minor formatting corrections.

Of course, this introduction and code are not perfect and may contain errors. They are intended only as a beginner's guide—please forgive any shortcomings.

Various Markers in JPEG Files

Many articles introduce JPEG file markers. I've also uploaded a document annotating an actual image (download here) for reference.

All markers start with 0xff (255 in hexadecimal), followed by the byte count of the block data and the information describing the block. The structure is shown below:

# Write JPEG decoding information
# filename: output filename
# h: image height
# w: image width
def write_head(filename, h, w):
    # Open file in binary write mode (overwrites)
    fp = open(filename, "wb")

    # SOI
    fp.write(pack(">H", 0xffd8))
    # APP0
    fp.write(pack(">H", 0xffe0))
    fp.write(pack(">H", 16))            # APP0 byte count
    fp.write(pack(">L", 0x4a464946))    # JFIF
    fp.write(pack(">B", 0))            # 0
    fp.write(pack(">H", 0x0101))        # Version: 1.1
    fp.write(pack(">B", 0x01))            # Pixel density unit: pixels per inch
    fp.write(pack(">L", 0x00480048))    # X and Y pixel density
    fp.write(pack(">H", 0x0000))        # No thumbnail information
    # DQT_0
    fp.write(pack(">H", 0xffdb))
    fp.write(pack(">H", 64+3))        # Quantization table byte count
    fp.write(pack(">B", 0x00))            # Quantization table precision: 8bit (0), Table ID: 0
    tbl = block2zz(std_luminance_quant_tbl)
    for item in tbl:
        pfp.write(pack(">B", item))    # Content of quantization table 0
    # DQT_1
    fp.write(pack(">H", 0xffdb))
    fp.write(pack(">H", 64+3))        # Quantization table byte count
    fp.write(pack(">B", 0x01))            # Quantization table precision: 8bit (0), Table ID: 1
    tbl = block2zz(std_chrominance_quant_tbl)
    for item in tbl:
        pfp.write(pack(">B", item))    # Content of quantization table 1
    # SOF0
    fp.write(pack(">H", 0xffc0))
    fp.write(pack(">H", 17))            # Frame image info byte count
    fp.write(pack(">B", 8))            # Precision: 8bit
    fp.write(pack(">H", h))            # Image height
    fp.write(pack(">H", w))            # Image width
    fp.write(pack(">B", 3))            # Number of color components: 3 (YCrCb)
    fp.write(pack(">B", 1))            # Color component ID: 1
    fp.write(pack(">H", 0x1100))        # Horizontal and vertical sampling factors: 1, Quantization table ID: 0
    fp.write(pack(">B", 2))            # Color component ID: 2
    fp.write(pack(">H", 0x1101))        # Horizontal and vertical sampling factors: 1, Quantization table ID: 1
    fp.write(pack(">B", 3))            # Color component ID: 3
    fp.write(pack(">H", 0x1101))        # Horizontal and vertical sampling factors: 1, Quantization table ID: 1
    # DHT_DC0
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_DC0)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x00))            # DC0
    for item in std_huffman_DC0:
        pfp.write(pack(">B", item))        # Huffman table content
    # DHT_AC0
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_AC0)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x10))            # AC0
    for item in std_huffman_AC0:
        pfp.write(pack(">B", item))        # Huffman table content
    # DHT_DC1
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_DC1)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x01))            # DC1
    for item in std_huffman_DC1:
        pfp.write(pack(">B", item))        # Huffman table content
    # DHT_AC1
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_AC1)+3))    # Huffman table byte count
    fp.write(pack(">B", 0x11))            # AC1
    for item in std_huffman_AC1:
        pfp.write(pack(">B", item))        # Huffman table content
    # SOS
    fp.write(pack(">H", 0xffda))
    fp.write(pack(">H", 12))            # Scan start info byte count
    fp.write(pack(">B", 3))            # Number of color components: 3
    fp.write(pack(">H", 0x0100))        # Component 1: DC and AC Huffman table IDs
    fp.write(pack(">H", 0x0211))        # Component 2: DC and AC Huffman table IDs
    fp.write(pack(">H", 0x0311))        # Component 3: DC and AC Huffman table IDs
    fp.write(pack(">B", 0x00))
    fp.write(pack(">B", 0x3f))
    fp.write(pack(">B", 0x00))            # Fixed value
    fp.close()

At this point, we have only the image data left to write. But how exactly is the image data encoded? How are quantization and Huffman coding implemented? See the next section for details.

JPEG Encoding Process

# Resize image to ensure it can be divided into 8×8 blocks
if ((h % 8 == 0) and (w % 8 == 0)):
    nblock = int(h * w / 64)
else:
    h = h // 8 * 8
    w = w // 8 * 8
    YCrCb = cv2.resize(YCrCb, [h, w], cv2.INTER_CUBIC)
    nblock = int(h * w / 64)

Color Space Conversion

The conversion formulas are:

Y = 0.299*R + 0.587*G + 0.114*B

Cb = -0.1687*R - 0.3313*G + 0.5*B + 128

Cr = 0.5*R - 0.4187*G - 0.0813*B + 128

In Python, you can use OpenCV’s function for color space transformation:

YCrCb = cv2.cvtColor(BGR, cv2.COLOR_BGR2YCrCb)
npdata = np.array(YCrCb, np.int16)

8×8 Block Division

for i in range(0, h, 8):
    for j in range(0, w, 8):
        ...

DCT Transformation

F(u,v)=\frac2{\sqrt{MN}}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}f(x,y)C(u)C(v)\cos\frac{(2x+1)u\pi}{2M}\cos\frac{(2y+1)v\pi}{2N}

where $C(u)=\begin{cases}\frac{1}{\sqrt{2}}&u=0\\1&u\neq0\end{cases}$ . In JPEG, $M=N=8$ .

Alternatively, use OpenCV’s built-in function:

now_block = npdata[i:i+8, j:j+8, 0] - 128    # Extract 8×8 block and subtract 128 (Y component)
now_block = npdata[i:i+8, j:j+8, 2] - 128    # Extract 8×8 block and subtract 128 (Cb component)
now_block = npdata[i:i+8, j:j+8, 1] - 128    # Extract 8×8 block and subtract 128 (Cr component)
now_block_dct = cv2.dct(np.float32(now_block))    # Perform DCT

Quantization

# Luminance quantization table
std_luminance_quant_tbl = np.array(
    [
        [16, 11, 10, 16, 24, 40, 51, 61],
        [12, 12, 14, 19, 26, 58, 60, 55],
        [14, 13, 16, 24, 40, 57, 69, 56],
        [14, 17, 22, 29, 51, 87, 80, 62],
        [18, 22, 37, 56, 68,109,103, 77],
        [24, 35, 55, 64, 81,104,113, 92],
        [49, 64, 78, 87,103,121,120,101],
        [72, 92, 95, 98,112,100,103, 99]
    ],
    np.uint8
)
# Chrominance quantization table
std_chrominance_quant_tbl = np.array(
    [
        [17, 18, 24, 47, 99, 99, 99, 99],
        [18, 21, 26, 66, 99, 99, 99, 99],
        [24, 26, 56, 99, 99, 99, 99, 99],
        [47, 66, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99]
    ],
    np.uint8
)

Quantization code:

now_block_qut = quantize(now_block_dct, 0)        # Quantize Y component
now_block_qut = quantize(now_block_dct, 2)        # Quantize Cb component
now_block_qut = quantize(now_block_dct, 1)        # Quantize Cr component

# Quantize
# block: current 8×8 block data
# dim: dimension (0: Y, 1: Cr, 2: Cb)
def quantize(block, dim):
    if dim == 0:
        # Use luminance quantization table
        qarr = std_luminance_quant_tbl
    else:
        # Use chrominance quantization table
        qarr = std_chrominance_quant_tbl
    return (block / qarr).round().astype(np.int16)

After quantization, many zeros appear in the bottom-right corner. To group these zeros together for better run-length encoding efficiency, we perform zigzag scanning next.

Zigzag Scanning

Zigzag scanning converts the 8×8 block into a 64-element list following this pattern:

now_block_zz = block2zz(now_block_qut)        # Zigzag scan

# Zigzag scan
# block: current 8×8 block data
def block2zz(block):
    re = np.empty(64, np.int16)
    # Current position in block
    pos = np.array([0, 0])
    # Define four scanning directions
    R = np.array([0, 1])
    LD = np.array([1, -1])
    D = np.array([1, 0])
    RU = np.array([-1, 1])
    for i in range(0, 64):
        re[i] = block[pos[0], pos[1]]
        if (((pos[0] == 0) or (pos[0] == 7)) and (pos[1] % 2 == 0)):
            pos = pos + R
        elif (((pos[1] == 0) or (pos[1] == 7)) and (pos[0] % 2 == 1)):
            pos = pos + D
        elif ((pos[0] + pos[1]) % 2 == 0):
            pos = pos + RU
        else:
            pos = pos + LD
    return re

Differential Encoding (DC Components)

last_block_ydc = 0
last_block_cbdc = 0
last_block_crdc = 0

now_block_dc = now_block_zz[0] - last_block_ydc # Store difference
last_block_ydc = now_block_zz[0] # Update last value

now_block_dc = now_block_zz[0] - last_block_cbdc
last_block_cbdc = now_block_zz[0]

now_block_dc = now_block_zz[0] - last_block_crdc
last_block_crdc = now_block_zz[0]

Run-Length Encoding (AC Components)

now_block_ac = RLE(now_block_zz[1:])

# Run-length encoding
# AClist: AC data to encode
def RLE(AClist: np.ndarray) -> np.ndarray:
    re = []
    cnt = 0
    for i in range(0, 63):
        if AClist[i] == 0 and cnt != 15:
            cnt += 1
        else:
            re.append(cnt)
            re.append(AClist[i])
            cnt = 0
    # Remove trailing [15, 0] pairs
    while re[-1] == 0:
        re.pop()
        re.pop()
        if len(re) == 0:
            break
    # Add two zeros at the end as termination marker
    if AClist[-1] == 0:
        re.extend([0, 0])
    return np.array(re, np.int16)

Special Binary Encoding in JPEG

After the above steps, this section explains how the encoded DC and AC components are written as a bitstream.

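JPEG stores each value in two parts: its bit length, which is Huffman-coded, and the value itself, written with exactly that many bits. Positive values are written in plain binary, and negative values as the bitwise complement of their absolute value: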

             Value              Bit Length           Stored Value
              0                   0                   None
            -1,1                  1                  0,1
         -3,-2,2,3                2              00,01,10,11
   -7,-6,-5,-4,4,5,6,7            3    000,001,010,011,100,101,110,111
     -15,..,-8,8,..,15            4       0000,..,0111,1000,..,1111
    -31,..,-16,16,..,31           5     00000,..,01111,10000,..,11111
    -63,..,-32,32,..,63           6                  ...
   -127,..,-64,64,..,127          7                  ...
  -255,..,-128,128,..,255         8                  ...
  -511,..,-256,256,..,511         9                  ...
 -1023,..,-512,512,..,1023       10                  ...
-2047,..,-1024,1024,..,2047      11                  ...

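# Special binary encoding format
# num: number to encode
# Returns "" for 0, plain binary for positive values, and the bitwise
# complement of abs(num) for negative values
def tobin(num):
    s = ""
    if num > 0:
        while num != 0:
            s += '0' if num % 2 == 0 else '1'
            num = num // 2
        s = s[::-1]  # Bits were collected LSB first, so reverse
    elif num < 0:
        num = -num
        while num != 0:
            s += '1' if num % 2 == 0 else '0'  # Inverted bits
            num = num // 2
        s = s[::-1]
    return s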
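The table can also be expressed arithmetically: a negative value v with bit length n is stored as v + 2^n - 1, which gives the same bits as inverting the binary form of -v. The sketch below relies on that identity. It is an alternative formulation for comparison rather than the code used in the project, and amplitude_bits and amplitude_value are hypothetical names.

```python
def amplitude_bits(value):
    """Return the stored bit string for value ('' for 0)."""
    value = int(value)                    # also accepts NumPy integers
    size = abs(value).bit_length()        # the "Bit Length" column
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1          # same bits as inverting abs(value)
    return format(value, "0{}b".format(size))

def amplitude_value(bits):
    """Inverse mapping: a leading 0 bit marks a negative value."""
    if not bits:
        return 0
    value = int(bits, 2)
    if bits[0] == "0":
        value -= (1 << len(bits)) - 1
    return value

for v in (-7, -4, -3, -1, 0, 1, 3, 4):
    print(v, repr(amplitude_bits(v)))
```

The decoder side shows why the scheme works: within each bit length, stored values starting with 1 are positive and those starting with 0 are negative, so no separate sign bit is needed.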

Note: When writing the bitstream, check every output byte for 0xff, because 0xff introduces a marker. Whenever a data byte equals 0xff, write a 0x00 byte immediately after it.

s = write_num(s, -1, now_block_dc, DC0)        # Write DC data based on encoding method
for l in range(0, len(now_block_ac), 2):  # Write AC data as (run, value) pairs
    s = write_num(s, now_block_ac[l], now_block_ac[l+1], AC0)
    while len(s) >= 8:  # Flush complete bytes so that s stays short
        num = int(s[0:8], 2)
        pfp.write(pack(">B", num))
        if num == 0xff:  # Avoid marker conflict
            pfp.write(pack(">B", 0))  # Insert 0x00 after 0xff
        s = s[8:]
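The flushing loop can be isolated into a small function that is easy to test. The sketch below makes one assumption that goes beyond the text above: at the very end of the scan, the bits that do not fill a whole byte are padded with 1 bits, which is what the JPEG standard specifies for fill bits. bits_to_bytes is a hypothetical name, and the function returns the bytes instead of writing them to a file.

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes with 0xff stuffing."""
    # Assumption: pad an incomplete last byte with 1 bits (end of scan only).
    bits += "1" * (-len(bits) % 8)
    out = bytearray()
    for k in range(0, len(bits), 8):
        byte = int(bits[k:k + 8], 2)
        out.append(byte)
        if byte == 0xff:          # a data byte that looks like a marker prefix
            out.append(0x00)      # stuffed byte, skipped again by the decoder
    return bytes(out)

print(bits_to_bytes("11111111" + "0000").hex())  # ff000f
```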

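# Write data based on encoding method
# s: binary data not yet written to the file
# n: number of zeros preceding num (-1 for DC)
# num: value to write
# tbl: canonical Huffman dictionary, sorted by symbol
def write_num(s, n, num, tbl):
    # bit: number of bits needed for abs(num)
    bit = 0
    tnum = num
    while tnum != 0:
        bit += 1
        tnum = int(tnum / 2)  # int() truncates toward zero, so this also works for negative num
    if n == -1:  # DC
        tnum = bit
        if tnum > 11:
            raise ValueError("Write DC data Error")
    else:  # AC
        # A zero value is only valid in EOB (0, 0) and ZRL (15, 0); AC values use at most 10 bits
        if (n > 15) or (bit > 10) or (((n != 0) and (n != 15)) and (bit == 0)):
            raise ValueError("Write AC data Error")
        # Index of the symbol (n, bit) in the sorted table; entries for n == 15 are shifted by one by ZRL
        tnum = n * 10 + bit + (0 if n != 15 else 1)
    # Canonical Huffman code: record count of zeros (AC) and bit length of num
    s += tbl[tnum].str_code
    # Store the actual binary value of num
    s += tobin(num)
    return s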

Canonical Huffman Coding

This article uses four canonical Huffman tables: one for luminance DC, one for chrominance DC, one for luminance AC, and one for chrominance AC.

# Luminance DC canonical Huffman table
std_huffman_DC0 = np.array(
    [0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
     4, 5, 3, 2, 6, 1, 0, 7, 8, 9, 10, 11],
    np.uint8
)
...
# Convert to Huffman dictionary
DC0 = DHT2tbl(std_huffman_DC0)    # Luminance DC
DC1 = DHT2tbl(std_huffman_DC1)    # Chrominance DC
AC0 = DHT2tbl(std_huffman_AC0)    # Luminance AC
AC1 = DHT2tbl(std_huffman_AC1)    # Chrominance AC

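A table in this form gives only the length of each symbol's code, not the code itself: the first 16 entries are the number of codes of length 1 to 16 bits, and the remaining entries are the symbols, listed in order of increasing code length.

Canonical Huffman coding derives the codes from the lengths with these rules:

1. The first code of the shortest length is all zeros;
2. Codes of the same length are consecutive;
3. The first code a of the next length j is derived from the last code b of the previous length i: a=(b+1)<<(j-i).

For the luminance DC table above, the shortest length is 3 bits, so by Rule 1 the code of 4 is 000. By Rule 2, 5 is 001, 3 is 010, 2 is 011, ..., 0 is 110. By Rule 3, 7 is 1110, 8 is 11110, and so on.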
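The three rules can be applied mechanically to the luminance DC table shown earlier. The following is a compact sketch that uses a plain dict instead of a class; canonical_codes is a hypothetical name, and the counts and symbols are copied from std_huffman_DC0.

```python
def canonical_codes(counts, symbols):
    """counts[k] is the number of codes of length k + 1 bits;
    symbols are listed in order of increasing code length."""
    codes = {}
    code = 0                   # Rule 1: the first code is all zeros
    pos = 0                    # position of the next symbol to assign
    for length, count in enumerate(counts, start=1):
        for symbol in symbols[pos:pos + count]:
            codes[symbol] = format(code, "0{}b".format(length))
            code += 1          # Rule 2: consecutive codes within one length
        pos += count
        code <<= 1             # Rule 3, applied one length step at a time
    return codes

counts = [0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = [4, 5, 3, 2, 6, 1, 0, 7, 8, 9, 10, 11]
codes = canonical_codes(counts, symbols)
for symbol in symbols:
    print(symbol, codes[symbol])
```

Shifting once per length step, including lengths that have no codes, is equivalent to the single shift by j-i in Rule 3.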

# Class to store one entry of the Huffman dictionary
# symbol: original value
# code: corresponding code
# n_bit: number of bits in code
# str_code: binary string representation
class Sym_Code():
    def __init__(self, symbol, code, n_bit):
        self.symbol = symbol
        self.code = code
        self.n_bit = n_bit
        # Write code as exactly n_bit binary digits, most significant bit first
        str_code = ''
        mask = 1 << (n_bit - 1)
        for i in range(0, n_bit):
            if mask & code:
                str_code += '1'
            else:
                str_code += '0'
            mask >>= 1
        self.str_code = str_code
    # Define output format
    def __str__(self):
        return "0x{:0>2x}    |  {}".format(self.symbol, self.str_code)
    # Define comparison by symbol, so that sorted() orders entries by symbol
    def __eq__(self, other):
        return self.symbol == other.symbol
    def __lt__(self, other):
        return self.symbol < other.symbol
    def __gt__(self, other):
        return self.symbol > other.symbol


# Convert canonical Huffman table to dictionary
# data: defined canonical Huffman table
# Returns a list of Sym_Code entries sorted by symbol
def DHT2tbl(data):
    numbers = data[0:16]  # Count of codes for lengths 1–16
    symbols = data[16:len(data)]  # Original symbols, ordered by code length
    if int(np.sum(numbers)) != len(symbols):  # Validate table
        raise ValueError("Wrong DHT!")
    code = 0
    start = 0  # Index in symbols of the first symbol with the current length
    SC = []  # List to store dictionary
    for n_bit in range(1, 17):
        count = int(numbers[n_bit - 1])
        # Codes of the same length are consecutive
        for symbol in symbols[start:start + count]:
            SC.append(Sym_Code(symbol, code, n_bit))
            code += 1
        start += count
        # Moving to the next length appends a zero bit
        code <<= 1
    return sorted(SC)

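The final Huffman dictionary is lengthy and can be viewed in my GitHub project. Because DHT2tbl returns the entries sorted by symbol, write_num can compute the position of a symbol in the list directly instead of searching for it, which explains the index arithmetic in that function.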
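The index arithmetic in write_num can be checked on the symbol set alone, without the actual codes. The sketch below assumes that the AC tables contain the 162 symbols of the standard baseline tables: EOB (0x00), ZRL (0xf0), and every combination of a run from 0 to 15 in the high nibble with a bit length from 1 to 10 in the low nibble. ac_symbols and ac_index are hypothetical names. For the DC tables the case is simpler: the symbols are the bit lengths 0 to 11, so after sorting the index equals the symbol.

```python
def ac_symbols():
    """Sorted AC symbols, assuming the standard 162-symbol set."""
    symbols = [0x00, 0xF0]                       # EOB and ZRL
    symbols += [(run << 4) | size
                for run in range(16)
                for size in range(1, 11)]        # run/size pairs
    return sorted(symbols)

def ac_index(run, size):
    """Index used by write_num: n * 10 + bit, plus 1 for run 15."""
    return run * 10 + size + (1 if run == 15 else 0)

table = ac_symbols()
print(len(table))                       # 162
print(hex(table[ac_index(0, 0)]))       # 0x0   (EOB)
print(hex(table[ac_index(2, 3)]))       # 0x23  (run 2, bit length 3)
print(hex(table[ac_index(15, 0)]))      # 0xf0  (ZRL)
```

Each run contributes ten symbols, which gives the factor 10. ZRL is the only symbol with run 15 and bit length 0, so it sits in front of the other run-15 entries and shifts them by one.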
