Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.3.0] - 2025-12-04

Changed

Code Quality: Refactored entire codebase to remove inline implementation comments, relying on clear code structure and JSDoc.
Documentation: Updated SPEC.md, docs/advanced-features.md, and all other documentation files to reflect v1.3.0.
CI/CD: Updated GitHub Workflows (ci.yml, llm-evals.yml) to use Node 20 and correct script paths for example generation and verification.
Evaluation Scripts: Updated eval:baseline and eval:check-regressions to support --type=full for comprehensive accuracy benchmarks.

Fixed

ZonDecoder: Fixed a critical bug where escaped quotes ("") inside a quoted string caused incorrect splitting of values.
Round-Trip Verification: Fixed round-trip failures in llm-optimized mode by normalizing timestamps in datasets to include milliseconds (.000Z).
Utils: Fixed parseValue to correctly handle ZON-style double quoting ("") mixed with standard JSON escapes.

Added

Documentation: Added "Adaptive Encoding" and "Binary Format" sections to docs/advanced-features.md.

[1.2.0] - 2025-12-03

Major Release: Enterprise Features & Production Readiness

This release transforms ZON into an enterprise-grade data format with versioning, adaptive encoding, binary format, comprehensive developer tools, automated testing, and production documentation.

Added

Phase 1: Document-Level Schema Versioning

Version Embedding/Extraction: embedVersion() and extractVersion() for metadata management
Migration Manager: ZonMigrationManager with BFS path-finding for schema evolution
Backward/Forward Compatibility: Automatic migration between schema versions
Test Coverage: 45 tests covering all versioning scenarios

Phase 2: Adaptive Format Selection

4 Encoding Modes: auto, compact, readable, llm-optimized
Data Complexity Analyzer: Automatic analysis of nesting depth, irregularity, field count
Mode Recommendation: recommendMode() suggests optimal encoding based on data structure
Test Coverage: 20 tests for adaptive encoding

Phase 3: Binary ZON Format (ZON-B)

MessagePack-Inspired Encoding: Compact binary format with magic header (ZNB\x01)
40-60% Space Savings: Significantly smaller than JSON while maintaining structure
Full Type Support: Primitives, arrays, objects, nested structures
APIs: encodeBinary(), decodeBinary() with round-trip validation
Test Coverage: 18 tests for binary format

Phase 5: Developer Experience Tools

Format Converters: JSON ↔ ZON ↔ Binary with BatchConverter
Helper Utilities: size(), compareFormats(), analyze(), inferSchema(), compare(), isSafe()
Pretty Printer: Syntax highlighting with colors, diffPrint() for visual diffs
Enhanced Validator: Linting rules for depth, fields, performance with best practice suggestions
CLI Enhancements:
- zon analyze - Data complexity analysis
- zon diff - Visual file comparison
- zon validate --strict - Strict validation with linting
- zon convert --to=binary - Binary format conversion
- zon format --colors - Pretty printing with syntax highlighting

Phase 6: LLM Evaluation Framework

ZonEvaluator Engine: Core evaluation framework with metric registration
7 Built-in Metrics: exactMatch, tokenEfficiency, structuralValidity, formatCorrectness, partialMatch, hallucination, latency
Regression Detection: Compare baseline vs current results
Dataset Management: DatasetRegistry with versioning and tagging
Storage Backends: FileEvalStorage and MemoryEvalStorage

Phase 7: CI/CD Integration

GitHub Actions Workflow: Automated evaluations on PRs and main branch
Smoke Tests: Fast (under 1 min) tests on every PR
Regression Detection: Automatic baseline comparison with severity levels (critical/major/minor)
PR Comments: Auto-post eval results with metrics tables
Baseline Management: Auto-save successful builds as new baselines
NPM Scripts: eval:smoke, eval:check-regressions, eval:baseline

Phase 4: Production Documentation

Production Architecture Guide: Multi-format strategy, versioning workflows, API patterns
Best Practices Guide: Code organization, error handling, testing, security
Migration Examples: Batch migration scripts with stats tracking
Express Middleware: Content negotiation for JSON/ZON/Binary formats

Changed

Code Quality: Removed inline comments from core files (encoder.ts, decoder.ts)
Documentation: All functions have proper JSDoc/TSDoc documentation
Build System: Still compiles cleanly with TypeScript

Performance

Binary Format: 40-60% smaller than JSON
ZON Text: Maintains 16-19% smaller than JSON
Test Suite: All 288 tests passing

Documentation

New Guides: Production architecture, best practices, developer tools
Working Examples: Express middleware, migration scripts
API Reference: Complete documentation for all new APIs

Testing

Total Tests: 288 passing (up from 175)
Test Coverage: 100% for all new features
No Regressions: Full backward compatibility maintained

Development

Total Code: ~4250+ lines of production code
Files Added: 21 new files (binary/, evals/, tools/, docs/, examples/)
Quality: Professional-grade documentation, no inline comments

[1.1.0] - 2025-12-01

Major Release: Ecosystem Integrations & Streaming

This release transforms ZON into a production-ready format with first-class support for modern AI frameworks and streaming workflows.

Added

Phase 5: Ecosystem Integrations

LangChain Integration (zon-format/langchain): ZonOutputParser for seamless LLM chain integration
Vercel AI SDK Integration (zon-format/ai-sdk): streamZon helper for Next.js streaming UI
OpenAI Helper (zon-format/openai): ZOpenAI wrapper with automatic format injection

Phase 4: Developer Experience

VS Code Extension: Syntax highlighting for .zonf files
Performance Benchmarks: Automated benchmark suite comparing ZON vs JSON/MsgPack
CLI Enhancements: validate, stats, and format commands

Phase 3: Streaming & Utilities

Streaming APIs: ZonStreamEncoder and ZonStreamDecoder for memory-efficient processing
Browser/Edge Support: Verified compatibility with Cloudflare Workers and Vercel Edge
CLI Tools: Complete command-line interface for file operations

Advanced Features (Phases 1-2)

Runtime Schema Validation: Type-safe parsing with zon.object(), zon.string(), etc.
Dictionary Compression: Automatic deduplication of repeated string values
Delta Encoding: Sequential numeric columns compressed with delta notation
Type Coercion: Intelligent handling of LLM-generated "stringified" values
LLM-Aware Field Ordering: encodeLLM optimizes field order for specific LLM tasks

Documentation

5 New Comprehensive Guides: Streaming, Integrations, CLI, Schema Validation, Advanced Features
Updated README: Professional documentation section with organized navigation
Enhanced API Reference: Added streaming and integration APIs
Updated Syntax Cheatsheet: Added dictionary and metadata syntax

Fixed

Dictionary Round-trip Bug: Fixed regex pattern to support dotted column names (e.g., recipient.city[3])
Test Suite: All 175 tests passing (up from 121)

Performance

Token Efficiency: 16-19% fewer tokens than JSON
LLM Accuracy: 100% retrieval accuracy maintained
Round-trip Integrity: 100% across all 18 comprehensive test datasets

[1.0.5] - 2025-12-01

Added

Delta Encoding: Sequential numeric columns in tables are delta-encoded (:delta) for maximum compression.
Hierarchical Sparse Encoding: Nested objects in tables are flattened using dot notation for sparse columns.
Metadata Optimization: _writeMetadata now uses grouped object syntax (key{...}) instead of flattening, reducing redundancy.
LLM Optimization: New encodeLLM function optimizes field ordering for specific LLM tasks (retrieval vs. generation).
Type Coercion: Optional enableTypeCoercion in decoder to handle loose types (e.g., "true" -> true).
Algorithmic Benchmark Generation: Replaced LLM-based question generation with a deterministic algorithm.
Expanded Dataset: Added "products" and "feed" data to the unified dataset.

Changed

Benchmark Formats: Refined tested formats to ZON, TOON, JSON, JSON (Minified), and CSV.
Documentation: Updated all docs to reflect v1.0.5 features.

Fixed

Double Quoting Bug: Fixed an issue where _formatValue was triple-quoting strings that resembled numbers.
Rate Limiting: Resolved 429 errors during benchmarking.

[1.0.4] - 2025-11-29

Fixed

Critical Data Integrity: Fixed roundtrip failures for strings containing newlines, empty strings, and escaped characters.
Decoder Logic: Fixed _splitByDelimiter to correctly handle nested arrays and objects within table cells (e.g., [10, 20]).
Encoder Logic: Added mandatory quoting for empty strings and strings with newlines to prevent data loss.

Documentation

Updated SPEC.md and syntax-cheatsheet.md to explicitly require quoting for empty strings and escape sequences.

[1.0.3] - 2025-11-28

100% LLM Retrieval Accuracy Achieved

Major Achievement: ZON now achieves 100% LLM retrieval accuracy while maintaining superior token efficiency over TOON!

Changed

Explicit Sequential Columns: Disabled automatic sequential column omission ([id] notation)
- All columns now explicitly listed in table headers for better LLM comprehension
- Example: users:@(5):active,id,lastLogin,name,role (was users:@(5)[id]:active,lastLogin,name,role)
- Trade-off: +1.7% token increase for 100% LLM accuracy

Performance

LLM Accuracy: 100% (24/24 questions) vs TOON 100%, JSON 91.7%
Token Efficiency: 19,995 tokens (5.0% fewer than TOON's 20,988)
Overall Savings vs TOON: 4.6% (Claude) to 17.6% (GPT-4o)

Quality

All unit tests pass (28/28)
All roundtrip tests pass (27/27 datasets)
No data loss or corruption
Production ready

[1.0.3] - 2025-11-27

HISTORIC ACHIEVEMENT: 8/8 Perfect Sweep vs All Competitors!

Breaking Changes:

Compact header syntax: @count: instead of @data(count):
Sequential ID auto-omission: [id] notation for 1..N sequences
Adaptive format selection based on data complexity

Added

Sparse Table Encoding: Automatically detects semi-uniform data and uses key:value notation for optional fields
Irregularity Score Calculation: Jaccard similarity-based scoring to choose optimal table format
Sequential Column Detection: Identifies and omits columns with sequential values (1, 2, 3, ..., N)
Smart Date Detection: ISO 8601 dates output unquoted for token efficiency
Context-Aware String Quoting: Only quotes strings when necessary to preserve type semantics

Performance

Total Tokens: 1,945 (down from 2,081 in v1.0.2)
-136 tokens saved (-6.5% improvement)
8/8 wins vs CSV (previously 4/8 tied)
8/8 wins vs TOON (-24.4% better)
-57.2% better than JSON formatted
-27.0% better than JSON compact

Benchmark Results (8 datasets)

Employees: 132 tokens (CSV: 138) - ZON WINS -4.3%
Time-Series: 245 tokens (CSV: 247) - ZON WINS -0.8%
GitHub Repos: 148 tokens (CSV: 164) - ZON WINS -9.8%
Event Logs: 220 tokens (CSV: 231) - ZON WINS -4.8% ← Sparse tables!
E-commerce: 193 tokens (CSV: 313) - ZON WINS -38.3%
Hike Data: 62 tokens (CSV: 85) - ZON WINS -27.1%
Deep Config: 111 tokens (CSV: 182) - ZON WINS -39.0%
Heavily Nested: 764 tokens (CSV: 1,044) - ZON WINS -26.8%

Competitive Analysis

vs CSV: -20.1% tokens overall
vs TOON: -24.4% tokens overall (beats on ALL datasets)
vs JSON: -57.2% formatted, -27.0% compact
Real Cost Savings: $4,890/month vs CSV at 1M API calls (GPT-4)

Fixed

Improved irregular schema detection to enable sparse tables for Event Logs
Enhanced sparse encoding threshold to support up to 5 optional columns
Better handling of undefined/null values in standard tables

Documentation

Added comprehensive competitive analysis vs TOON, CSV, JSON, YAML, XML
Documented sparse table encoding mechanism
Added real-world cost savings calculations
Updated benchmarks with CSV comparison

[1.0.2] - 2025-11-26

Fixed

Fixed a bug in decoder.ts where strings resembling partial numbers (e.g., IP addresses) were incorrectly parsed as numbers.
Improved number parsing strictness to prevent data corruption in mixed-type fields.

Added

Added "Heavily Nested Data" dataset to benchmarks to validate performance on complex structures.
Updated benchmark results: ZON now shows 18% better compression than TOON on average across 8 datasets.

[1.0.1] - 2025-11-26

Changed

Changed license from Apache-2.0 to MIT
Updated documentation to reflect license change

1.0.0 - 2025-11-26

Added

Initial TypeScript/JavaScript implementation of ZON Format v1.0.0
Full encoder with ZON ClearText format support
Full decoder with parsing for YAML-like metadata and @table syntax
Constants module for format tokens and thresholds
Custom exceptions for decode errors
Comprehensive test suite with 22 test cases
TypeScript type definitions
Complete README with examples and API documentation
Example file demonstrating usage

Features

100% compatible with Python ZON v1.0.0 implementation
Lossless encoding and decoding
Boolean compression (T/F tokens)
Minimal quoting for strings
Table format with @table syntax
Nested object/array support with inline ZON format
Type preservation (numbers, booleans, null, strings)
CSV-style quoting for special characters
Control character escaping (newlines, tabs, etc.)

Testing

Full round-trip tests for all data types
Edge case handling (empty strings, whitespace, special characters)
Large array and deeply nested object support
Type preservation verification
Hikes example from README validated

Documentation

Comprehensive README with installation, API reference, and examples
Code comments and JSDoc annotations
Example file showing practical usage
License information (Apache-2.0)

Edit on GitHub

PreviousEfficiency Formalization