This document explains the WebAssembly (WASM) implementation in web-csv-toolbox and how it achieves high-performance CSV parsing.
web-csv-toolbox includes an optional WebAssembly module that provides improved CSV parsing performance compared to the JavaScript implementation. The WASM module is a compiled version of optimized parsing code that runs at near-native speed.
The library provides two entry points for WASM functionality:
- Main (web-csv-toolbox): Automatic WASM initialization with embedded binary
- Slim (web-csv-toolbox/slim): Manual initialization with external WASM loading
┌─────────────────────────────────────────────────────────────┐
│ High-Level API (parse, parseString, etc.) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Execution Router │
│ - Selects execution strategy based on EngineConfig │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────┴─────────────────┐
↓ ↓
┌──────────────────┐ ┌──────────────────┐
│ JavaScript │ │ WebAssembly │
│ Implementation │ │ Implementation │
│ │ │ │
│ - All features │ │ - Compiled code │
│ - All encodings │ │ - UTF-8 only │
│ - All options │ │ - Limited options│
└──────────────────┘ └──────────────────┘
Key Points:
This project ships two entry points (Main and Slim) that differ only in how WebAssembly is initialized and delivered. For a practical comparison and guidance on when to use each:
→ See: Main vs Slim Entry Points
Performance:
Portability:
Safety:
The WASM module is compiled from Rust code because:
Performance:
Memory Safety:
WASM Support:
- wasm-bindgen
WASM is opt-in rather than always-on because:
Trade-offs:
Flexibility:
// loadWASM.ts (conceptual)
import init, { type InitInput } from 'web-csv-toolbox-wasm';
// In the web-csv-toolbox distribution, the WASM asset is exported as `web-csv-toolbox/csv.wasm`.
import wasmUrl from 'web-csv-toolbox/csv.wasm';
export async function loadWASM(input?: InitInput) {
await init({ module_or_path: input ?? wasmUrl });
}
How it works:
- csv.wasm)
- init() loads and instantiates the module (via URL or Buffer)
// parseStringToArraySyncWASM.ts
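import { parseStringToArraySync } from "web-csv-toolbox-wasm";
// Type imports added for completeness; the module path used inside the library may differ.
import type { CommonOptions, CSVRecord } from "web-csv-toolbox";

export function parseStringToArraySyncWASM<Header>(
  csv: string,
  options?: CommonOptions
): CSVRecord<Header>[] {
  // Read the options used below, falling back to the CSV defaults
  const { delimiter = ",", quotation = '"' } = options ?? {};
  // Validate options: the WASM parser accepts only double quotes
  if (quotation !== '"') {
    throw new RangeError("Invalid quotation, must be double quote on WASM.");
  }
  // Call WASM function: the delimiter is passed as a character code,
  // and the result comes back as a JSON string
  const delimiterCode = delimiter.charCodeAt(0);
  return JSON.parse(parseStringToArraySync(csv, delimiterCode));
}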
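The wrapper pattern above (validate options in JavaScript, pass the delimiter as a character code, decode a JSON result) can be tried without the real binary. The following is a minimal sketch under stated assumptions: `fakeWasmParse`, `SketchOptions`, and `parseViaWasmSketch` are hypothetical names introduced for illustration, and the naive line splitting ignores quoted fields, so it is not the library's parser.

```typescript
// Hypothetical stand-in for the WASM export. It accepts the CSV text and a
// delimiter character code and returns a JSON string of header-keyed records.
// Deliberately naive: quoted fields are NOT handled here.
function fakeWasmParse(csv: string, delimiterCode: number): string {
  const delimiter = String.fromCharCode(delimiterCode);
  const lines = csv.split("\n").filter((line) => line.length > 0);
  const header = (lines[0] ?? "").split(delimiter);
  const parsed = lines.slice(1).map((line) => {
    const values = line.split(delimiter);
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = values[i] ?? "";
    });
    return record;
  });
  return JSON.stringify(parsed);
}

// Hypothetical, reduced option shape for this sketch only.
interface SketchOptions {
  delimiter?: string;
  quotation?: string;
}

function parseViaWasmSketch(
  csv: string,
  options: SketchOptions = {},
): Record<string, string>[] {
  const { delimiter = ",", quotation = '"' } = options;
  // Reject options the WASM side cannot honor before crossing the boundary.
  if (quotation !== '"') {
    throw new RangeError("Invalid quotation, must be double quote on WASM.");
  }
  if (delimiter.length !== 1) {
    throw new RangeError("This sketch assumes a single-character delimiter.");
  }
  // The delimiter crosses the boundary as a number, the result as JSON text.
  return JSON.parse(fakeWasmParse(csv, delimiter.charCodeAt(0)));
}

const sketchRecords = parseViaWasmSketch("name;age\nAlice;30\nBob;25\n", {
  delimiter: ";",
});
console.log(sketchRecords);
```

Validating on the JavaScript side keeps the error type and message under the wrapper's control, and passing a single character code avoids copying a second string across the boundary.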
Key implementation details:
┌──────────────────┐ ┌──────────────────┐
│ JavaScript Heap │ │ WASM Linear │
│ │ │ Memory │
│ - JS Objects │ Copy data │ │
│ - Strings │ ────────────────> │ - CSV String │
│ - Arrays │ │ - Parsing State │
│ │ Copy result │ - Output Buffer │
│ │ <──────────────── │ │
└──────────────────┘ └──────────────────┘
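The two copy steps in the diagram above can be illustrated with plain typed arrays. This is a sketch, not the generated glue code: `linearMemory`, `copyIn`, and `copyOut` are hypothetical names, and a plain `Uint8Array` stands in for the view a real module would expose over its memory buffer.

```typescript
// Hypothetical stand-in for WASM linear memory: a flat byte buffer.
// With a real module this view would be created over the module's memory buffer.
const linearMemory = new Uint8Array(64 * 1024); // one 64 KiB WASM page

// Copy a JS string into "linear memory" as UTF-8 and report where it landed.
function copyIn(text: string, offset: number): { ptr: number; len: number } {
  const bytes = new TextEncoder().encode(text);
  if (offset + bytes.length > linearMemory.length) {
    throw new RangeError("Input does not fit in the allocated memory.");
  }
  linearMemory.set(bytes, offset);
  return { ptr: offset, len: bytes.length };
}

// Copy bytes back out of "linear memory" into a new JS string.
function copyOut(ptr: number, len: number): string {
  return new TextDecoder("utf-8").decode(linearMemory.subarray(ptr, ptr + len));
}

const location = copyIn("name,age\nAlice,30\n", 0);
console.log(location.len, JSON.stringify(copyOut(location.ptr, location.len)));
```

The length is measured in UTF-8 bytes rather than string characters, which is why non-ASCII input takes more space in linear memory than its character count suggests.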
Data Flow:
Memory Efficiency:
WASM respects the same maxBufferSize limit as JavaScript:
const lexer = new FlexibleStringCSVLexer({ maxBufferSize: 10 * 1024 * 1024 }); // Example: 10MB
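To illustrate what a buffer cap of this kind does, here is a minimal sketch. `BoundedBuffer` is a hypothetical class written for this example; it is not the library's lexer, and the choice of `RangeError` and of counting string length are assumptions of the sketch.

```typescript
// Hypothetical buffer with a size cap; illustrates the idea only.
class BoundedBuffer {
  private buffer = "";
  private readonly maxBufferSize: number;

  constructor(maxBufferSize: number) {
    this.maxBufferSize = maxBufferSize;
  }

  // Append a chunk, refusing to grow past the configured limit.
  append(chunk: string): void {
    if (this.buffer.length + chunk.length > this.maxBufferSize) {
      throw new RangeError(
        `Buffer would exceed the limit of ${this.maxBufferSize} characters.`,
      );
    }
    this.buffer += chunk;
  }

  get size(): number {
    return this.buffer.length;
  }
}

const bounded = new BoundedBuffer(16);
bounded.append("name,age\n"); // 9 characters, within the limit
console.log(bounded.size);
```

Checking before appending means the buffer never holds more than the limit, so a single oversized chunk is rejected without first being stored.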
Why:
Performance depends on many factors:
Theoretical advantages of WASM:
Actual performance: For measured performance in various scenarios, see CodSpeed benchmarks.
// First call - module loading
await loadWASM();
// Subsequent calls - instant (module cached)
await loadWASM();
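One common way to make repeat calls cheap is to memoize the initialization promise. The sketch below shows that technique only; it is an assumption about how such caching can be done, not a description of the library's internals, and `fakeInit`, `initCalls`, and `loadOnce` are hypothetical names.

```typescript
// Hypothetical initializer standing in for the real WASM init().
let initCalls = 0;
async function fakeInit(): Promise<void> {
  initCalls += 1;
}

// Cache the in-flight or completed initialization so it runs at most once.
let ready: Promise<void> | undefined;
function loadOnce(): Promise<void> {
  if (ready === undefined) {
    ready = fakeInit();
  }
  return ready;
}

const first = loadOnce();
const second = loadOnce();
console.log(first === second, initCalls);
```

Caching the promise rather than a boolean flag also covers concurrent callers: a second call made while loading is still in progress receives the same pending promise instead of starting a second load.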
Considerations:
Both implementations have similar memory usage:
| Stage | JavaScript | WASM |
|---|---|---|
| Input | String (in heap) | String (copied to linear memory) |
| Parsing | CSVLexer buffer (configurable) | Parsing state (configurable) |
| Output | Objects (in heap) | JSON string → Objects |
Total: Both implementations use approximately 2x input size temporarily during parsing.
for await (const record of parse(csv, {
engine: { wasm: true }
})) {
console.log(record);
}
Architecture:
Main Thread:
1. Load CSV string
2. Call WASM function
3. Parse CSV in WASM
4. Return results
5. Yield records
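The ordering of the steps above (parse everything first, then yield) can be sketched as follows. `parseAllSketch` and `recordsSketch` are hypothetical stand-ins, the comma splitting ignores quoting, and a synchronous generator is used here for brevity where the real API is consumed with `for await`.

```typescript
type Row = Record<string, string>;

// Records what happened, so the ordering is visible.
const events: string[] = [];

// Hypothetical stand-in for steps 2-4: parse the whole input in one call.
function parseAllSketch(csv: string): Row[] {
  events.push("parse:start");
  const lines = csv.split("\n").filter((line) => line.length > 0);
  const header = (lines[0] ?? "").split(",");
  const rows = lines.slice(1).map((line) => {
    const values = line.split(",");
    const row: Row = {};
    header.forEach((name, i) => {
      row[name] = values[i] ?? "";
    });
    return row;
  });
  events.push("parse:done");
  return rows;
}

// Step 5: records are handed out one by one, but only after parsing finished.
function* recordsSketch(csv: string): Generator<Row> {
  const rows = parseAllSketch(csv);
  for (const row of rows) {
    events.push("yield");
    yield row;
  }
}

for (const record of recordsSketch("name,age\nAlice,30\nBob,25\n")) {
  console.log(record);
}
console.log(events);
```

The event log shows that no record is yielded until the whole input has been parsed, which is the practical difference from a chunk-by-chunk parser.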
Characteristics:
for await (const record of parse(csv, {
engine: { worker: true, wasm: true }
})) {
console.log(record);
}
Architecture:
Main Thread:                    Worker Thread:
1. Transfer CSV data    -->     1. Receive CSV data
2. Wait for results             2. Call WASM function
3. Receive records      <--     3. Parse CSV in WASM
4. Yield records                4. Send results back
Characteristics:
Limitation: WASM parser only supports UTF-8 encoded strings.
Why:
Workaround: For non-UTF-8 encodings, the router automatically falls back to JavaScript:
// Automatic fallback for Shift-JIS
for await (const record of parse(csv, {
engine: { wasm: true },
charset: 'shift-jis' // Falls back to JavaScript
})) {
console.log(record);
}
Limitation: The WASM parser supports only the double quote (") as the quotation character.
Why:
Workaround: For single-quoted CSV data, use the JavaScript parser:
for await (const record of parse(csv, {
engine: { wasm: false },
quotation: "'"
})) {
console.log(record);
}
---
### Object Output Only
**Limitation:**
WASM parser always emits object-shaped records. `outputFormat: 'array'` (named tuples) currently runs only on the JavaScript engine.
**Why:**
- WASM returns JSON that maps headers to values (object form)
- Supporting tuple output would require a different serialization path and additional memory copying
**Workaround:**
Force the JavaScript engine whenever you need array output or `includeHeader`:
```typescript
const rows = await parse.toArray(csv, {
header: ["name", "age"] as const,
outputFormat: "array",
includeHeader: true,
engine: { wasm: false }, // Skip WASM, use JS implementation
});
```
Limitation: WASM parser processes the entire CSV string at once.
Why:
Impact:
Workaround: For incremental parsing, the JavaScript implementation supports chunk-by-chunk processing:
const lexer = new FlexibleStringCSVLexer();
for (const chunk of chunks) {
for (const token of lexer.lex(chunk, true)) {
// Process tokens incrementally
}
}
lexer.flush();
The execution router automatically falls back to JavaScript when WASM is unavailable or incompatible:
┌─────────────────────────────────────────────────────────────┐
│ User requests WASM execution │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Check: Is WASM loaded? │
└─────────────────────────────────────────────────────────────┘
↓ No ↓ Yes
┌──────────────────┐ ┌──────────────────┐
│ Fallback to JS │ │ Check: UTF-8? │
└──────────────────┘ └──────────────────┘
↓ No ↓ Yes
┌──────────────┐ ┌──────────────┐
│ Fallback │ │ Check: │
│ to JS │ │ Double-quote?│
└──────────────┘ └──────────────┘
↓ No ↓ Yes
┌──────────┐ ┌──────────┐
│ Fallback │ │ Use WASM │
│ to JS │ └──────────┘
└──────────┘
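The decision flow in the diagram above can be written as a small pure function. This is a sketch of the three checks only: `selectEngine`, `RouteInput`, and the accepted spellings of the charset are assumptions of this example, not the router's actual code.

```typescript
type Engine = "wasm" | "js";

// Hypothetical input shape for this sketch.
interface RouteInput {
  wasmLoaded: boolean;
  charset?: string; // assumed to default to UTF-8
  quotation?: string; // assumed to default to the double quote
}

function selectEngine({
  wasmLoaded,
  charset = "utf-8",
  quotation = '"',
}: RouteInput): Engine {
  // Check 1: the module must already be loaded.
  if (!wasmLoaded) return "js";
  // Check 2: only UTF-8 input is supported by the WASM parser.
  const normalized = charset.toLowerCase();
  if (normalized !== "utf-8" && normalized !== "utf8") return "js";
  // Check 3: only the double quote is supported as quotation character.
  if (quotation !== '"') return "js";
  return "wasm";
}

console.log(selectEngine({ wasmLoaded: true }));
console.log(selectEngine({ wasmLoaded: true, charset: "shift-jis" }));
```

Ordering the checks from cheapest to most specific mirrors the diagram, and every failed check ends in the same JavaScript fallback, so the caller never has to handle a WASM-specific error for these cases.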
Fallback scenarios:
WASM runs in a sandboxed environment:
Isolation:
Memory Safety:
WASM respects the same resource limits as JavaScript:
// maxBufferSize applies to both JS and WASM
const lexer = new FlexibleStringCSVLexer({ maxBufferSize: 10 * 1024 * 1024 }); // Example
Why:
WASM features in this library depend on your runtime’s native WebAssembly support. Verify your environment before relying on WASM acceleration.
If your runtime doesn’t support WebAssembly or you choose not to use it, the JavaScript parser remains available as a fallback.
The WASM binary is bundled with the npm package:
web-csv-toolbox/
├── dist/
│ ├── main.web.js / main.node.js # Main entry points
│ ├── slim.web.js / slim.node.js # Slim entry points
│ ├── csv.wasm # WASM binary (exported as web-csv-toolbox/csv.wasm)
│ ├── _virtual/ # Build-time virtual modules for inlined WASM (main entry)
│ └── wasm/
│ └── loaders/ # loadWASM / loadWASMSync loaders
Bundler support:
- @rollup/plugin-wasm
web-csv-toolbox's WebAssembly implementation provides:
Trade-offs:
When to use WASM:
Performance: See CodSpeed benchmarks for actual measured performance across different scenarios.