A Simple Introduction to V.44 Compression

V.44 is a data compression standard defined by the ITU-T and commonly used in modems and DLMS. It is based on the LZJH algorithm, which is a dictionary-based compression method. Instead of sending the same strings again and again, V.44 can replace repeated patterns with shorter references.

How Compression Works

V.44 compression looks for repeated strings and encodes them using compact references. In simple terms, the compressor builds a dictionary of data patterns and then reuses them whenever possible. This is why repeated text usually compresses well.
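The dictionary idea can be sketched with a toy LZW-style coder. This is not the LZJH algorithm used by V.44, and the function names are made up for this illustration. It only shows how a dictionary that grows while reading the data lets later repetitions be sent as single codes.

```python
def lzw_encode(data: bytes) -> list:
    """Toy LZW-style encoder (illustration only, NOT V.44/LZJH)."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes are known
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # keep extending the match
        else:
            codes.append(dictionary[current])   # emit the longest known string
            dictionary[candidate] = next_code   # learn the new, longer string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode(codes: list) -> bytes:
    """Rebuild the same dictionary on the receiving side and expand the codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[:1]     # string defined by this very code
        else:
            raise ValueError("invalid code")
        out += entry
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)


data = b"DOCUMENTDOCUMENT"
codes = lzw_encode(data)
print(len(data), "bytes ->", len(codes), "codes")
```

The second DOCUMENT is emitted as a handful of multi-byte codes instead of eight single bytes, which is where the saving comes from.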

Default Ordinal Size and Default Codeword Size

Two important parameters in V.44 are Default Ordinal Size and Default Codeword Size. These determine how many bits are used to represent dictionary references.

Smaller values use fewer bits per reference, which reduces overhead, but they also limit how many entries can be represented. Larger values allow more references and a larger effective dictionary, but they also increase the number of bits needed to encode those references.

This means that these settings directly affect compression efficiency. If the values are too small, the compressor may not be able to represent enough repeated patterns. If the values are too large, the extra bit cost can reduce the compression gain.
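The trade-off can be put into numbers with a small sketch. It relies only on the general fact that an n-bit field can distinguish at most 2^n values. The bit widths below are arbitrary examples, not values taken from the standard, and a real codec may reserve some values for control purposes.

```python
def codeword_tradeoff(codeword_bits: int) -> dict:
    """Cost per reference and upper bound on distinct references for a bit width."""
    return {
        "bits_per_reference": codeword_bits,
        "addressable_entries": 2 ** codeword_bits,  # upper bound, before reserved codes
    }


for bits in (6, 9, 12):  # illustrative widths only
    print(codeword_tradeoff(bits))
```

Each extra bit doubles the number of entries that can be referenced, but every reference in the stream pays for that bit.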

Important: the sender and the receiver must use the same Default Ordinal Size and Default Codeword Size. If they do not match, the receiver will interpret the compressed bit stream incorrectly and decompression will fail.
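Why a mismatch is fatal can be seen with a simplified bit-packing sketch. The helper names and the most-significant-bit-first layout are assumptions made for this illustration and do not reproduce the real V.44 bit stream.

```python
def pack(values, width):
    """Concatenate integers into a bit string, `width` bits each."""
    return "".join(format(v, f"0{width}b") for v in values)


def unpack(bits, width):
    """Split a bit string back into integers, `width` bits each."""
    return [int(bits[i:i + width], 2)
            for i in range(0, len(bits) - width + 1, width)]


codes = [5, 17, 42, 9]
stream = pack(codes, 6)                  # sender uses 6-bit fields

print(unpack(stream, 6))                 # receiver with the same width recovers the codes
print(unpack(stream, 7))                 # receiver with a different width reads garbage
```

Once the field boundaries are wrong, every following value is wrong too, so the error cannot be recovered later in the stream.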

Maximum String Length

The Maximum String Length parameter has a significant impact on the compression process because it defines the maximum length of repeated data sequences that the compressor is allowed to encode as references. A larger value can improve compression efficiency by enabling longer repeated patterns to be compressed into shorter representations, while a smaller value may reduce memory usage and processing requirements at the cost of lower compression performance. It is critical that the same Maximum String Length value is configured on both the client and the meter. If the values differ, the decompression process may interpret compressed references incorrectly, which can cause data decompression to fail and result in corrupted or unreadable data.

Why ASCII Compresses Better Than Binary Data

ASCII text usually compresses better than binary data because text contains many repeated structures: common words, spaces, punctuation, and letter combinations. Binary data is often more random and may contain fewer repeated patterns that the compressor can reuse.

For example, text such as DOCUMENTDOCUMENT contains obvious repetition, so dictionary-based compression can represent the later occurrence with a shorter reference. Random binary bytes often do not have such clear repetition, so the compressor may gain little or nothing.
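The difference is easy to reproduce. The sketch below uses Python's zlib module (DEFLATE) as a stand-in, because no V.44 implementation is assumed to be available. The absolute numbers will differ from V.44, but the contrast between repetitive text and random bytes is the same in kind.

```python
import random
import zlib

# Repetitive ASCII text
text = ("The quick brown fox jumps over the lazy dog. " * 20).encode("ascii")

# Pseudo-random bytes of the same length (fixed seed so the run is repeatable)
rng = random.Random(44)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

text_ratio = len(zlib.compress(text)) / len(text)
noise_ratio = len(zlib.compress(noise)) / len(noise)

print(f"text : {text_ratio:.2f} of original size")
print(f"noise: {noise_ratio:.2f} of original size")
```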

Impact of Parser Variations on Node Tree Structure and Compression Efficiency

Different parsers can produce significantly different node trees from the same input data, depending on how they interpret structure, grouping, and data types. For example, one parser may generate a deeply nested hierarchy with explicit structural elements, while another may flatten the structure or omit redundant metadata. These differences directly affect compression efficiency. A more compact and consistent node tree reduces redundancy and improves pattern repetition, enabling compression algorithms such as V.44 to achieve higher compression ratios. In contrast, a verbose or inconsistently structured node tree may introduce unnecessary variability, reducing the effectiveness of dictionary-based compression and leading to larger payload sizes.

Why Compressed Data Can Be Larger

Compression does not always reduce size. In some cases, compressed data can be larger than the original data. This happens because compression introduces overhead: dictionary handling, control information, and encoded references all take space.

If the input data does not contain enough repetition, the overhead may be larger than the savings. The content of the data therefore has a major effect on compression performance.

Example string:
BIBINARYDODOCUMENTDOCUMENTATIONDE

This example contains both repeated and non-repeated parts. The substring DOCUMENT is useful for compression, but other parts are less repetitive. Because of this mixture, the result may compress only slightly, or in some cases even expand depending on the chosen parameters and the compressor state.
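The overhead effect can be observed with any general-purpose compressor. The sketch below again uses zlib as a stand-in for V.44, so the exact sizes are not those a V.44 encoder would produce. It compresses an empty input, a two-byte input, the example string, and ten copies of the example string.

```python
import zlib

sample = b"BIBINARYDODOCUMENTDOCUMENTATIONDE"

# Map input size -> compressed size for inputs with increasing repetition
sizes = {}
for payload in (b"", b"AB", sample, sample * 10):
    sizes[len(payload)] = len(zlib.compress(payload))

for original, packed in sizes.items():
    print(original, "->", packed)
```

Very small inputs grow because the fixed framing costs more than the data itself, while the tenfold repetition shrinks considerably.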

Example Compression Output

The sentence:

...spending a year dead for tax purposes.

compresses to the following hex string:

5C09E6E0CADCC8D2DCCE40C240F2CAC2E440C825C840CCDE29E8C2F040E0EAE4E0DEE6CAE65C1403

This example was produced with the Gurux DLMS Translator compression page:

Open the Gurux DLMS Translator example

This sentence was chosen mainly because it is used in RFC 4464, the Signaling Compression (SigComp) Users' Guide. It is also a reference to Douglas Adams, best known for The Hitchhiker's Guide to the Galaxy.
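The size of the sentence and of the hex string above can be compared directly. The snippet only counts bytes and does not decode the V.44 stream.

```python
sentence = "...spending a year dead for tax purposes."
compressed = bytes.fromhex(
    "5C09E6E0CADCC8D2DCCE40C240F2CAC2E440C825"
    "C840CCDE29E8C2F040E0EAE4E0DEE6CAE65C1403"
)

original_size = len(sentence.encode("ascii"))
compressed_size = len(compressed)
print(original_size, "bytes in,", compressed_size, "bytes out")
```

A short sentence with little repetition gains very little, which matches the earlier point that the content of the data decides how well it compresses.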

V.44 with DLMS

In DLMS/COSEM, there is currently no standardized mechanism for a client to query the Default Ordinal Size or Default Codeword Size from the meter. As a result, both the client and the meter must be pre-configured with identical values to ensure successful compression and decompression. Any mismatch in these parameters will lead to decoding failures.

V.44 compression is applied independently to each PDU (Protocol Data Unit). This has a direct impact on compression efficiency: smaller PDUs provide less redundancy and therefore achieve lower compression ratios, while larger PDUs typically result in significantly better compression due to increased repetition and pattern reuse within the data.

For optimal performance, it is recommended to maximize the payload size within the limits of the communication channel (e.g., HDLC frame size, transport constraints, or LPWAN limitations). Larger PDUs allow the V.44 algorithm to build a more effective dictionary, improving compression efficiency and reducing overall transmission size.
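The effect of PDU size can be demonstrated by compressing the same payload in independent chunks. The sketch uses zlib as a stand-in for V.44, and the payload format and chunk sizes are invented for the illustration.

```python
import zlib


def compressed_total(payload: bytes, pdu_size: int) -> int:
    """Compress each PDU-sized chunk independently and sum the resulting sizes."""
    return sum(
        len(zlib.compress(payload[i:i + pdu_size]))
        for i in range(0, len(payload), pdu_size)
    )


# Hypothetical load-profile style payload: 24 hourly "timestamp;value" rows
payload = "".join(
    f"2024-01-01 {hour:02d}:00:00;{1000 + hour:06d}\n" for hour in range(24)
).encode("ascii")

for pdu_size in (64, 256, len(payload)):
    print(f"PDU size {pdu_size:4d}: {compressed_total(payload, pdu_size)} bytes in total")
```

Each chunk starts with an empty history, so patterns learned in one PDU cannot be reused in the next one.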

Compression must always be performed before encryption. Encryption algorithms introduce high entropy into the data stream, effectively removing patterns and making compression ineffective. Therefore, the correct processing order is:

Data → V.44 Compression → Encryption → Transmission
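The importance of the order can be shown with a small experiment. The sketch uses zlib in place of V.44 and a toy XOR stream cipher in place of real DLMS ciphering. Both are assumptions made for the illustration, and the toy cipher is not secure.

```python
import hashlib
import zlib


def toy_stream_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Illustration only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


payload = b"0.0.1.0.0.255;2024-01-01 00:00:00;000123\n" * 30  # made-up repetitive data
key = b"demo-key"

compress_then_encrypt = toy_stream_cipher(zlib.compress(payload), key)
encrypt_then_compress = zlib.compress(toy_stream_cipher(payload, key))

print("original             :", len(payload))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))
```

In the wrong order the compressor only sees random-looking bytes and has nothing to work with.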

Additionally, the use of techniques such as compact arrays, delta encoding, and null-data optimization in DLMS/COSEM can further enhance compression performance when combined with V.44. These methods reduce redundancy at the data model level before compression is applied, leading to even smaller payloads.

In battery-powered or bandwidth-constrained environments, such as NB-IoT or LTE-M, efficient use of V.44 compression together with optimized PDU sizing can significantly reduce airtime and energy consumption, extending device lifetime and improving overall system performance.

Comparison of Encoding Methods Based on DLMS UA Study

The document “DLMS/COSEM for Battery-Powered Devices: A Comprehensive Study” provides a detailed comparison of different encoding techniques in Annex 1 – Payload Calculations for Example 1. The results clearly demonstrate how various DLMS/COSEM data optimization methods impact payload size and, consequently, communication efficiency.

In the baseline Normal Encoding, where each record includes a full timestamp and register value, the payload size reaches approximately 1114 bytes. This approach is straightforward but inefficient, as it repeats metadata for every entry. When V.44 compression is applied, the payload is reduced significantly to 279 bytes, highlighting the effectiveness of compression even without structural optimization.

Open the Normal V.44 compression example

The Compressed Data Encoding method improves efficiency by eliminating redundant fields such as repeated timestamps and data descriptions, replacing them with null values where applicable. This reduces the payload to 456 bytes, and further down to 204 bytes when combined with V.44 compression. This demonstrates how even simple structural optimizations can yield substantial savings.

Open the Compressed Data Encoding V.44 compression example

The Compact Array Encoding approach goes further by defining the data structure once and omitting repeated type information. This reduces the payload to 316 bytes, and to 149 bytes with V.44 compression. Compared to compressed encoding, this method benefits from both structural compactness and reduced metadata overhead.

Open the Compact Array V.44 compression example

Finally, Delta Value Encoding achieves the highest efficiency by storing only incremental changes between readings instead of full register values. Although its uncompressed size is similar to compact array encoding (315 bytes), the combination with V.44 compression results in the smallest payload of all methods, approximately 103 bytes. This demonstrates that reducing data entropy before compression significantly enhances compression performance.

Open the Delta Value Encoding V.44 compression example
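The principle behind delta encoding can be sketched in a few lines. The function names and register values are invented for this illustration, and the sketch does not reproduce the DLMS/COSEM wire format.

```python
def delta_encode(values):
    """Keep the first value, then store only the difference to the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


readings = [100000, 100012, 100025, 100031]  # hypothetical register values
deltas = delta_encode(readings)
print(deltas)
print(delta_decode(deltas) == readings)
```

The differences are small numbers that repeat far more often than the full register values, which gives a dictionary-based compressor more to reuse.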

Overall, the comparison shows a clear progression: from redundant full data representation to highly optimized delta-based encoding. The study confirms that combining DLMS/COSEM data modeling techniques (such as compact arrays and delta encoding) with V.44 compression provides the best results in terms of payload reduction. This is especially important for battery-powered and low-bandwidth communication environments, where minimizing transmitted data directly translates to lower energy consumption and improved device longevity.
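The savings follow directly from the payload sizes quoted above. The short calculation below uses only those figures. The helper name is arbitrary.

```python
# (uncompressed bytes, bytes after V.44) as quoted in the text above
results = {
    "Normal Encoding": (1114, 279),
    "Compressed Data Encoding": (456, 204),
    "Compact Array Encoding": (316, 149),
    "Delta Value Encoding": (315, 103),
}
baseline = results["Normal Encoding"][0]


def saving(original: int, reduced: int) -> float:
    """Size reduction in percent."""
    return 100.0 * (1 - reduced / original)


for name, (raw, packed) in results.items():
    print(
        f"{name:26s} raw={raw:5d}  V.44={packed:4d}  "
        f"V.44 saving={saving(raw, packed):5.1f}%  "
        f"vs. baseline={saving(baseline, packed):5.1f}%"
    )
```

The last column shows the combined effect of data modeling and compression relative to the uncompressed Normal Encoding payload.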

Summary

V.44 is a dictionary-based compression method.
Default Ordinal Size and Default Codeword Size affect how references are encoded.
The sender and receiver must use the same values for those settings, or decompression fails.
ASCII text usually compresses better than binary data because it contains more repetition and can be represented using only 7 bits per character.
Compressed data can sometimes be larger than uncompressed data if the content does not compress well enough to overcome overhead.
Larger PDUs result in significantly better compression.
Compression is done before ciphering.
Compression efficiency depends on the data and can vary significantly.

Gurux V.44 compression library

The Gurux V.44 compression library has not yet been released as open source, because some of our customers have specifically asked us not to release it. In addition, resolving compression-related issues is highly time-consuming. This tool provides a simple and efficient way to perform V.44 compression and makes it easier to decide whether to add V.44 support to the meter.