# Advanced Features

Version: 1.3.0 | Status: Stable
## Overview
ZON includes advanced compression and optimization features that dramatically reduce token count and improve LLM accuracy. These features are automatically applied by the encoder when beneficial.
## Table of Contents
- Dictionary Compression
- Type Coercion
- LLM-Aware Field Ordering
- Hierarchical Sparse Encoding
- Adaptive Encoding
- Binary Format
## Dictionary Compression

Introduced: v1.0.3 | Purpose: Deduplicate repeated string values
### How It Works

When a column has many repeated values, ZON creates a dictionary and stores indices into it:

```
# Without dictionary:
shipments:@(150):status,...
pending,...
delivered,...
pending,...
in-transit,...
pending,...
...

# With dictionary:
status[3]:delivered,in-transit,pending
shipments:@(150):status,...
2,...  # "pending"
0,...  # "delivered"
2,...  # "pending"
1,...  # "in-transit"
2,...  # "pending"
...
```
### When To Use

Dictionary compression is automatically applied when:

- The column has >= 10 values
- The column has <= 10 unique values
- The compression ratio exceeds 1.2x
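The criteria above can be sketched in plain JavaScript. This is an illustration, not the encoder's actual implementation: the function names (`buildDictionary`, `shouldDictionaryEncode`) are hypothetical, and the cost model uses character counts as a rough stand-in for tokens.

```javascript
// Hypothetical sketch of the dictionary heuristic (not the zon-format API).
function buildDictionary(values) {
  // Sorted unique values, so indices are deterministic.
  const unique = [...new Set(values)].sort();
  const index = new Map(unique.map((v, i) => [v, i]));
  return { unique, index };
}

function shouldDictionaryEncode(values) {
  const { unique } = buildDictionary(values);
  // Gate on the documented thresholds: >= 10 values, <= 10 unique values.
  if (values.length < 10 || unique.length > 10) return false;
  // Rough cost comparison: raw strings vs. dictionary header + 1 char per index.
  const rawCost = values.reduce((sum, v) => sum + v.length, 0);
  const dictCost =
    unique.reduce((sum, v) => sum + v.length, 0) + values.length;
  return rawCost / dictCost > 1.2;
}

const column = Array.from({ length: 30 }, (_, i) =>
  ['pending', 'delivered', 'in-transit'][i % 3]
);
console.log(shouldDictionaryEncode(column)); // -> true (long repeated strings compress well)
```

Sorting the unique values before assigning indices is what makes `pending` index 2 in the example above, regardless of the order rows arrive in.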
### Examples

```js
import { encode } from 'zon-format';

const shipments = Array.from({ length: 100 }, (_, i) => ({
  id: i,
  status: ['pending', 'delivered', 'in-transit'][i % 3]
}));

const zon = encode({ shipments });
/*
status[3]:delivered,in-transit,pending
shipments:@(100):id,status
0,2  # id:0, status:"pending"
1,0  # id:1, status:"delivered"
2,1  # id:2, status:"in-transit"
...
*/
```
### Nested Columns

Dictionary compression also works with flattened nested fields:

```js
const data = {
  users: [
    { name: 'Alice', address: { city: 'NYC' } },
    { name: 'Bob', address: { city: 'LAX' } },
    { name: 'Carol', address: { city: 'NYC' } }
  ]
};

// Automatically creates a dictionary for "address.city"
```
### Token Savings
Real-world examples:
| Dataset | Without Dict | With Dict | Savings |
|---|---|---|---|
| E-commerce orders | 45k tokens | 28k tokens | 38% |
| Log files | 120k tokens | 65k tokens | 46% |
| User roles | 8k tokens | 3k tokens | 63% |
## Type Coercion

Introduced: v1.1.0 | Purpose: Handle "stringified" values from LLMs
### The Problem

LLMs sometimes return numbers or booleans as strings:

```jsonc
{
  "age": "25",     // Should be a number
  "active": "true" // Should be a boolean
}
```
### The Solution

Enable type coercion in the encoder:

```js
import { ZonEncoder } from 'zon-format';

const encoder = new ZonEncoder(
  undefined, // anchor interval (default)
  true,      // dictionary compression (default)
  true       // enable type coercion
);

const data = {
  users: [
    { age: "25", active: "true" }, // strings
    { age: "30", active: "false" }
  ]
};

const zon = encoder.encode(data);
// users:@(2):active,age
// T,25  # coerced to boolean and number
// F,30
```
### How It Works

- Analyzes the entire column
- Detects whether every value is coercible (e.g., "123" -> 123)
- Coerces the entire column to the target type
### Supported Coercions

| From | To | Example |
|---|---|---|
| "123" | 123 | Number strings |
| "true" | T | Boolean strings |
| "false" | F | Boolean strings |
| "null" | null | Null strings |
### Decoder Coercion

The decoder also supports type coercion for LLM-generated ZON:

```js
import { decode } from 'zon-format';

const options = { enableTypeCoercion: true };
const data = decode(llmOutput, options);
```
## LLM-Aware Field Ordering

Introduced: v1.1.0 | Purpose: Optimize field order for LLM attention
### The Problem

LLMs pay more attention to earlier tokens in a sequence, so the default alphabetical sorting may not be optimal:

```
# Alphabetical (default):
users:@(100):active,age,country,email,id,name,role
```
### The Solution

Use encodeLLM to reorder fields based on the usage pattern:

```js
import { encodeLLM } from 'zon-format';

const data = { users: [...] };

// For retrieval tasks: prioritize ID and name
const zon = encodeLLM(data, {
  task: 'retrieval',
  priorityFields: ['id', 'name']
});
/*
users:@(100):id,name,age,role,email,...
*/

// For generation/analysis: prioritize context
const zon2 = encodeLLM(data, {
  task: 'generation',
  priorityFields: ['role', 'country']
});
/*
users:@(100):role,country,id,name,...
*/
```
### Ordering Strategies

```js
// 1. Frequency-based: most common values first
encodeLLM(data, { strategy: 'frequency' });

// 2. Entropy-based: high-information fields first
encodeLLM(data, { strategy: 'entropy' });

// 3. Custom: your own ordering
encodeLLM(data, {
  strategy: 'custom',
  fieldOrder: ['id', 'name', 'email', 'role']
});
```
### Measured Impact
| Task | Default Order | Optimized Order | Accuracy Gain |
|---|---|---|---|
| Entity Extraction | 87% | 94% | +7% |
| Data Retrieval | 92% | 98% | +6% |
| Classification | 89% | 93% | +4% |
## Hierarchical Sparse Encoding

Introduced: v1.1.0 | Purpose: Efficiently encode nested objects with missing fields
### How It Works

Nested fields are flattened with dot notation:

```js
const data = {
  users: [
    { id: 1, profile: { bio: 'Developer' } },
    { id: 2, profile: null },
    { id: 3, profile: { bio: 'Designer' } }
  ]
};

// Encoded as:
// users:@(3):id,profile.bio
// 1,Developer
// 2,null
// 3,Designer
```
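Dot-notation flattening itself fits in a few lines. The `flattenRow` helper below is an illustrative sketch, not the library's internal function (it also ignores the null-handling and array cases a real encoder would need):

```javascript
// Flatten one row: nested objects become dot-separated column names,
// as in "profile.bio".
function flattenRow(row, prefix = '') {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(out, flattenRow(value, path)); // recurse into objects
    } else {
      out[path] = value; // leaf value: emit under the dotted path
    }
  }
  return out;
}

console.log(flattenRow({ id: 1, profile: { bio: 'Developer' } }));
// { id: 1, 'profile.bio': 'Developer' }
```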
### Deep Nesting

Up to 5 levels of nesting are supported:

```js
const data = {
  items: [{
    a: { b: { c: { d: { e: 'Deep!' } } } }
  }]
};

// Flattened to:
// items:@(1):a.b.c.d.e
// Deep!
```
### Sparse Columns

Missing values are preserved:

```js
const data = {
  products: [
    { id: 1, meta: { color: 'red', size: 'L' } },
    { id: 2 },                          // no meta
    { id: 3, meta: { color: 'blue' } }  // no size
  ]
};

// Core: id, meta.color
// Sparse (inline): meta.size
// products:@(3):id,meta.color
// 1,red,meta.size:L
// 2,null
// 3,blue
```
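One plausible way to split columns into core versus sparse is a fill-rate threshold: columns present in most rows get a header slot, rare ones are emitted inline per row. The 50% cutoff and the `splitColumns` name below are assumptions for illustration, not the documented rule:

```javascript
// Classify flattened columns by how often they appear across rows.
function splitColumns(rows, threshold = 0.5) {
  const counts = new Map();
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  const core = [];
  const sparse = [];
  for (const [key, n] of counts) {
    // Frequent columns become header (core) columns; rare ones stay inline.
    (n / rows.length > threshold ? core : sparse).push(key);
  }
  return { core: core.sort(), sparse: sparse.sort() };
}

const rows = [
  { id: 1, 'meta.color': 'red', 'meta.size': 'L' },
  { id: 2 },
  { id: 3, 'meta.color': 'blue' },
];
console.log(splitColumns(rows)); // core: id, meta.color; sparse: meta.size
```

Run against the products example above, this reproduces the split shown in the encoded output: `meta.size` appears in only one of three rows, so it stays inline.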
## Adaptive Encoding

Introduced: v1.2.0 | Purpose: Automatically select the best encoding mode based on data complexity
### The Problem
Different data structures benefit from different encoding strategies. A deeply nested config file might be better suited for a readable format, while a large table of uniform data needs compact encoding.
### The Solution

encodeAdaptive analyzes your data and selects the optimal mode:

```js
import { encodeAdaptive } from 'zon-format';

const data = { ... };

// Automatically selects a mode
const zon = encodeAdaptive(data);
```
### Modes

| Mode | Description | Best For |
|---|---|---|
| auto | Analyzes data and picks the best mode | General purpose |
| compact | Maximizes compression (default ZON) | Large datasets, API payloads |
| readable | Adds indentation and whitespace | Config files, debugging |
| llm-optimized | Optimizes for retrieval/generation | LLM prompts |
### Complexity Analysis

You can also analyze data complexity directly:

```js
import { DataComplexityAnalyzer } from 'zon-format';

const analyzer = new DataComplexityAnalyzer();
const metrics = analyzer.analyze(data);

console.log(metrics.score);          // 0-100 complexity score
console.log(metrics.recommendation); // 'compact', 'readable', etc.
```
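The scoring formula is not documented in this section. As a purely illustrative stand-in, a toy heuristic could use maximum nesting depth as a crude complexity proxy (the real `DataComplexityAnalyzer` presumably weighs more signals, such as uniformity and row count):

```javascript
// Toy complexity proxy: maximum nesting depth of the value.
// Deeper structures suggest a readable mode; flat tables suggest compact.
function complexityScore(value, depth = 0) {
  if (value === null || typeof value !== 'object') return depth;
  const children = Array.isArray(value) ? value : Object.values(value);
  return children.reduce(
    (max, child) => Math.max(max, complexityScore(child, depth + 1)),
    depth
  );
}

console.log(complexityScore({ a: { b: { c: 1 } } })); // -> 3 (three levels deep)
console.log(complexityScore(42));                     // -> 0 (scalar)
```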
## Binary Format

Introduced: v1.2.0 | Purpose: Maximum space efficiency for storage and internal APIs

### Overview

ZON Binary (ZON-B) is a compact binary serialization format inspired by MessagePack but optimized for ZON's data model. It uses the magic header `ZNB\x01`.
### Usage

```js
import { encodeBinary, decodeBinary } from 'zon-format';

const data = { id: 1, name: "Alice" };

// Encode to Uint8Array
const binary = encodeBinary(data);

// Decode back to an object
const decoded = decodeBinary(binary);
```
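The ZON-B wire format itself is not specified here. The sketch below only illustrates the magic-header framing (`ZNB` plus a version byte `0x01`) around an arbitrary payload; it uses UTF-8 JSON as a stand-in body, which is NOT what the real binary encoder emits:

```javascript
// 'Z', 'N', 'B', version 1: the header a decoder checks before parsing.
const MAGIC = [0x5a, 0x4e, 0x42, 0x01];

function frameBinary(data) {
  const payload = new TextEncoder().encode(JSON.stringify(data));
  return Uint8Array.from([...MAGIC, ...payload]);
}

function unframeBinary(bytes) {
  // Reject anything that does not start with the magic header.
  if (!MAGIC.every((b, i) => bytes[i] === b)) {
    throw new Error('Not a ZON-B buffer: bad magic header');
  }
  return JSON.parse(new TextDecoder().decode(bytes.subarray(MAGIC.length)));
}

const buf = frameBinary({ id: 1, name: 'Alice' });
console.log(unframeBinary(buf)); // { id: 1, name: 'Alice' }
```

Checking a magic header up front lets a decoder fail fast on non-ZON-B input instead of misinterpreting arbitrary bytes, the same pattern MessagePack-adjacent formats and most binary file formats use.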
### Performance
| Metric | JSON | ZON Text | ZON Binary |
|---|---|---|---|
| Size | 100% | ~84% | ~40-60% |
| Parse Speed | Fast | Medium | Fastest |
| Human Readable | Yes | Yes | No |
## Performance Tips
- Dictionary compression: Best for categorical data (status, roles, countries)
- Type coercion: Enable when dealing with LLM outputs
- Field ordering: Use for retrieval-heavy applications
- Sparse encoding: Automatic, no configuration needed
## See Also
- API Reference - Full API documentation
- Specification - Format specification
- LLM Best Practices - Using with LLMs
