
Advanced Features

Version: 1.3.0 · Status: Stable

Overview

ZON includes advanced compression and optimization features that substantially reduce token count and can improve LLM accuracy (measured savings and accuracy gains are listed in the tables below). The encoder applies them automatically when they are beneficial.

Table of Contents

  1. Dictionary Compression
  2. Type Coercion
  3. LLM-Aware Field Ordering
  4. Hierarchical Sparse Encoding
  5. Adaptive Encoding
  6. Binary Format
  7. Performance Tips
  8. See Also

Dictionary Compression

Introduced: v1.0.3 · Purpose: Deduplicate repeated string values

How It Works

When a column has many repeated values, ZON creates a dictionary and stores indices:

# Without dictionary:
shipments:@(150):status,...
pending,...
delivered,...
pending,...
in-transit,...
pending,...
...

# With dictionary:
status[3]:delivered,in-transit,pending
shipments:@(150):status,...
2,...    # "pending"
0,...    # "delivered"
2,...    # "pending"
1,...    # "in-transit"
2,...    # "pending"
...

When To Use

Dictionary compression is applied automatically when all of the following hold (a sketch of this heuristic follows the list):

  1. Column has >=10 values
  2. Column has <=10 unique values
  3. Compression ratio > 1.2x
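
A minimal sketch of that heuristic, assuming a simple size comparison. The function name and the per-index cost estimate are illustrative, not the encoder's actual internals:

// Hypothetical sketch of the dictionary decision, not the encoder's real code.
function shouldBuildDictionary(values: string[]): boolean {
  const unique = new Set(values);
  if (values.length < 10 || unique.size > 10) return false;

  // Rough size comparison: repeating full strings vs. dictionary + indices.
  const plainSize = values.reduce((sum, v) => sum + v.length + 1, 0);
  const dictSize =
    [...unique].reduce((sum, v) => sum + v.length + 1, 0) + values.length * 2;

  return plainSize / dictSize > 1.2; // criterion 3: ratio must exceed 1.2x
}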

Examples

import { encode } from 'zon-format';

const shipments = Array.from({ length: 100 }, (_, i) => ({
  id: i,
  status: ['pending', 'delivered', 'in-transit'][i % 3]
}));

const zon = encode({ shipments });
/*
status[3]:delivered,in-transit,pending
shipments:@(100):id,status
0,2       # id:0, status:"pending"
1,0       # id:1, status:"delivered"
2,1       # id:2, status:"in-transit"
...
*/
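
Dictionary compression is an encoder-side optimization; decoding the output above should restore the original string values. A quick round-trip check, assuming lossless behavior:

import { decode } from 'zon-format';

const roundTripped = decode(zon);
console.log(roundTripped.shipments[0].status); // "pending"
console.log(roundTripped.shipments.length);    // 100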

Nested Columns

Dictionary compression works with flattened nested fields:

const data = {
  users: [
    { name: 'Alice', address: { city: 'NYC' } },
    { name: 'Bob', address: { city: 'LAX' } },
    { name: 'Carol', address: { city: 'NYC' } }
  ]
};

// With enough repeated rows, the encoder creates a dictionary for "address.city"

Token Savings

Real-world examples:

Dataset             Without Dict   With Dict    Savings
E-commerce orders   45k tokens     28k tokens   38%
Log files           120k tokens    65k tokens   46%
User roles          8k tokens      3k tokens    63%

Type Coercion

Introduced: v1.1.0 · Purpose: Handle "stringified" values from LLMs

The Problem

LLMs sometimes return numbers or booleans as strings:

{
  "age": "25",        // Should be number
  "active": "true"    // Should be boolean
}

The Solution

Enable type coercion in the encoder:

import { ZonEncoder } from 'zon-format';

const encoder = new ZonEncoder(
  undefined,  // anchor interval (default)
  true,       // dictionary compression (default)
  true        // enable type coercion
);

const data = {
  users: [
    { age: "25", active: "true" },   // Strings
    { age: "30", active: "false" }
  ]
};

const zon = encoder.encode(data);
// users:@(2):active,age
// T,25      # Coerced to boolean and number
// F,30

How It Works

  1. The encoder analyzes the entire column
  2. It checks whether every value is coercible (e.g., "123" -> 123)
  3. If so, it coerces the whole column to the target type (see the sketch below)
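
An illustrative sketch of that column-level check. The function name and exact rules are assumptions, not the library's internals:

// Hypothetical whole-column coercion: only coerce when every value qualifies.
function coerceColumn(values: string[]): Array<string | number | boolean | null> {
  if (values.every((v) => v !== '' && !Number.isNaN(Number(v)))) {
    return values.map(Number);                  // "25" -> 25
  }
  if (values.every((v) => v === 'true' || v === 'false')) {
    return values.map((v) => v === 'true');     // "true" -> true
  }
  if (values.every((v) => v === 'null')) {
    return values.map(() => null);              // "null" -> null
  }
  return values;                                // mixed column: leave as strings
}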

Supported Coercions

From      To     Example
"123"     123    Number strings
"true"    T      Boolean strings
"false"   F      Boolean strings
"null"    null   Null strings

Decoder Coercion

The decoder also supports type coercion for LLM-generated ZON:

import { decode } from 'zon-format';

const options = { enableTypeCoercion: true };
const data = decode(llmOutput, options);

LLM-Aware Field Ordering

Introduced: v1.1.0 · Purpose: Optimize field order for LLM attention

The Problem

LLMs pay more attention to earlier tokens in a sequence. Default alphabetical sorting may not be optimal:

# Alphabetical (default):
users:@(100):active,age,country,email,id,name,role

The Solution

Use encodeLLM to reorder fields based on how the data will be used:

import { encodeLLM } from 'zon-format';

const data = { users: [...] };

// For retrieval tasks: prioritize ID and name
const zon = encodeLLM(data, {
  task: 'retrieval',
  priorityFields: ['id', 'name']
});
/*
users:@(100):id,name,age,role,email,...
*/

// For generation/analysis: prioritize context
const zon2 = encodeLLM(data, {
  task: 'generation',
  priorityFields: ['role', 'country']
});
/*
users:@(100):role,country,id,name,...
*/

Ordering Strategies

// 1. Frequency-based: Most common values first
encodeLLM(data, { strategy: 'frequency' });

// 2. Entropy-based: High-information fields first
encodeLLM(data, { strategy: 'entropy' });

// 3. Custom: Your own ordering
encodeLLM(data, {
  strategy: 'custom',
  fieldOrder: ['id', 'name', 'email', 'role']
});
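
For intuition, the 'entropy' strategy can be thought of as sorting fields by the Shannon entropy of their values, so high-information fields come first. A sketch of that idea (this is not the library's implementation):

// Hypothetical entropy-based ordering over an array of row objects.
function orderByEntropy(rows: Record<string, unknown>[]): string[] {
  const fields = Object.keys(rows[0] ?? {});

  const entropy = (field: string): number => {
    const counts = new Map<string, number>();
    for (const row of rows) {
      const key = String(row[field]);
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
    let h = 0;
    for (const count of counts.values()) {
      const p = count / rows.length;
      h -= p * Math.log2(p);
    }
    return h;
  };

  // Higher entropy (more distinct, informative values) sorts earlier.
  return [...fields].sort((a, b) => entropy(b) - entropy(a));
}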

Measured Impact

Task                Default Order   Optimized Order   Accuracy Gain
Entity Extraction   87%             94%               +7%
Data Retrieval      92%             98%               +6%
Classification      89%             93%               +4%

Hierarchical Sparse Encoding

Introduced: v1.1.0 · Purpose: Efficiently encode nested objects with missing fields

How It Works

Nested fields are flattened with dot notation:

const data = {
  users: [
    { id: 1, profile: { bio: 'Developer' } },
    { id: 2, profile: null },
    { id: 3, profile: { bio: 'Designer' } }
  ]
};

// Encoded as:
// users:@(3):id,profile.bio
// 1,Developer
// 2,null
// 3,Designer
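
For reference, a dot-notation flattening step might look like the sketch below. The helper is hypothetical; aligning columns across rows and filling in null for missing paths is handled separately by the encoder:

// Sketch of dot-notation flattening for a single object.
function flatten(obj: Record<string, unknown>, prefix = ''): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(out, flatten(value as Record<string, unknown>, path));
    } else {
      out[path] = value; // leaves (including null) become flat columns
    }
  }
  return out;
}

// flatten({ id: 1, profile: { bio: 'Developer' } })
// -> { id: 1, 'profile.bio': 'Developer' }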

Deep Nesting

Supports up to 5 levels of nesting:

const data = {
  items: [{
    a: { b: { c: { d: { e: 'Deep!' } } } }
  }]
};

// Flattened to:
// items:@(1):a.b.c.d.e
// Deep!

Sparse Columns

Missing values are preserved:

const data = {
  products: [
    { id: 1, meta: { color: 'red', size: 'L' } },
    { id: 2 }, // No meta
    { id: 3, meta: { color: 'blue' } } // No size
  ]
};

// Core: id, meta.color
// Sparse (inline): meta.size
// products:@(3):id,meta.color
// 1,red,meta.size:L
// 2,null
// 3,blue

Adaptive Encoding

Introduced: v1.2.0 · Purpose: Automatically select the best encoding mode based on data complexity

The Problem

Different data structures benefit from different encoding strategies. A deeply nested config file might be better suited for a readable format, while a large table of uniform data needs compact encoding.

The Solution

encodeAdaptive analyzes your data and selects the optimal mode:

import { encodeAdaptive } from 'zon-format';

const data = { ... };

// Automatically selects mode
const zon = encodeAdaptive(data);

Modes

Mode            Description                           Best For
auto            Analyzes data and picks best mode     General purpose
compact         Maximizes compression (default ZON)   Large datasets, API payloads
readable        Adds indentation and whitespace       Config files, debugging
llm-optimized   Optimizes for retrieval/generation    LLM prompts

Complexity Analysis

You can also analyze data complexity directly:

import { DataComplexityAnalyzer } from 'zon-format';

const analyzer = new DataComplexityAnalyzer();
const metrics = analyzer.analyze(data);

console.log(metrics.score); // 0-100 complexity score
console.log(metrics.recommendation); // 'compact', 'readable', etc.
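
If you prefer to make the final call yourself, one option is to branch on the analyzer's output. The sketch below assumes the recommendation values line up with the mode names in the table above:

import { encodeAdaptive, encodeLLM } from 'zon-format';

// Hedged example: route to a specific encoder based on the recommendation.
const zon = metrics.recommendation === 'llm-optimized'
  ? encodeLLM(data, { task: 'retrieval' })
  : encodeAdaptive(data);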

Binary Format

Introduced: v1.2.0 · Purpose: Maximum space efficiency for storage and internal APIs

Overview

ZON Binary (ZON-B) is a compact, binary serialization format inspired by MessagePack but optimized for ZON's data model. It uses a magic header ZNB\x01.

Usage

import { encodeBinary, decodeBinary } from 'zon-format';

const data = { id: 1, name: "Alice" };

// Encode to Uint8Array
const binary = encodeBinary(data);

// Decode back to object
const decoded = decodeBinary(binary);
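
Since ZON-B output begins with the magic header ZNB\x01, a quick sanity check on the raw bytes might look like this (the check is illustrative, not part of the API):

// 'Z' = 0x5a, 'N' = 0x4e, 'B' = 0x42, then the format version byte 0x01
const hasMagicHeader =
  binary[0] === 0x5a &&
  binary[1] === 0x4e &&
  binary[2] === 0x42 &&
  binary[3] === 0x01;

console.log(hasMagicHeader); // expected: true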

Performance

Metric           JSON   ZON Text   ZON Binary
Size             100%   ~84%       ~40-60%
Parse Speed      Fast   Medium     Fastest
Human Readable   Yes    Yes        No

Performance Tips

  1. Dictionary compression: Best for categorical data (status, roles, countries)
  2. Type coercion: Enable when dealing with LLM outputs
  3. Field ordering: Use for retrieval-heavy applications
  4. Sparse encoding: Automatic, no configuration needed

See Also