This guide shows you how to maximize CSV parsing performance using WebAssembly in web-csv-toolbox.
Note for Bundler Users: When using WASM with bundlers (Vite, Webpack, etc.), you must explicitly configure WASM file loading. When combining WASM with Workers (e.g., EnginePresets.responsiveFast()), you also need to specify the workerURL option. See How to Use with Bundlers for detailed configuration.
Before optimizing, understand where time is spent:
┌────────────────────────────────────────────────────────────┐
│                     Total Parsing Time                     │
└────────────────────────────────────────────────────────────┘
│
├─ WASM Initialization (one-time)
│
├─ Data Transfer (Main → WASM)
│
├─ CSV Parsing (in WASM)
│
├─ Result Transfer (WASM → Main)
│
└─ Record Processing (JavaScript)
Key insight: Parsing dominates, so optimizing WASM usage has the biggest impact.
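To see this split on your own data, here is a minimal sketch that times initialization separately from parsing (it assumes a csv string is already in memory and that performance.now() is available, as it is in browsers and modern Node.js):

import { loadWASM, parse } from 'web-csv-toolbox';

// One-time WASM initialization, timed separately from the parse itself
const t0 = performance.now();
await loadWASM();
const t1 = performance.now();

let count = 0;
for await (const record of parse(csv, { engine: { wasm: true } })) {
  count++;
}
const t2 = performance.now();

console.log(`Init:  ${(t1 - t0).toFixed(1)}ms (paid once)`);
console.log(`Parse: ${(t2 - t1).toFixed(1)}ms for ${count} records`);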
import { loadWASM, parse } from 'web-csv-toolbox';
async function parseCSV(csv: string) {
await loadWASM(); // ❌ Slow! Loads WASM every time
for await (const record of parse(csv, {
engine: { wasm: true }
})) {
console.log(record);
}
}
// Called multiple times
await parseCSV(csv1); // Loads WASM ~50ms
await parseCSV(csv2); // Loads WASM ~50ms
await parseCSV(csv3); // Loads WASM ~50ms
Performance impact: every call pays the ~50ms WASM initialization again before any parsing starts
import { loadWASM, parse } from 'web-csv-toolbox';
// Load once at application startup
await loadWASM();
async function parseCSV(csv: string) {
for await (const record of parse(csv, {
engine: { wasm: true }
})) {
console.log(record);
}
}
// Called multiple times
await parseCSV(csv1); // Fast (WASM cached)
await parseCSV(csv2); // Fast (WASM cached)
await parseCSV(csv3); // Fast (WASM cached)
Performance improvement: Eliminates repeated initialization overhead
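If your application has no single startup point to call loadWASM() from, a memoized loader gives the same once-only guarantee. This is a local sketch, not a library API:

import { loadWASM } from 'web-csv-toolbox';

// Cache the promise so concurrent and repeated callers share one load
let wasmReady: Promise<unknown> | undefined;

async function ensureWASM(): Promise<void> {
  wasmReady ??= loadWASM();
  await wasmReady;
}

// Call before every parse; only the first call actually loads WASM
await ensureWASM();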
for await (const record of parse(csv, {
engine: {
worker: true,
wasm: true,
workerStrategy: 'stream-transfer'
}
})) {
console.log(record);
}
Problems: every call site must hand-assemble low-level engine options, which is verbose and easy to misconfigure.
import { parse, EnginePresets } from 'web-csv-toolbox';
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
console.log(record);
}
Benefits: a single preset expresses the recommended Worker + WASM configuration, so call sites stay simple and benefit from the library's maintained defaults.
for await (const record of parse(csv, {
engine: { wasm: true } // Blocks main thread
})) {
console.log(record);
// UI frozen during parsing
}
Problems: the main thread is blocked for the duration of the parse, freezing the UI.
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast() // Worker + WASM
})) {
console.log(record);
// UI stays responsive
}
Benefits: parsing runs in a worker, so the main thread and the UI stay responsive.
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
await processRecord(record); // Async operation
// Wait for each record to complete
}
Problem: awaiting each record individually serializes the async work, so parsing stalls on every I/O operation
const BATCH_SIZE = 1000;
let batch: any[] = [];
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
batch.push(record);
if (batch.length >= BATCH_SIZE) {
await processBatch(batch); // Process 1000 records at once
batch = [];
}
}
// Process remaining records
if (batch.length > 0) {
await processBatch(batch);
}
Performance improvement: Significantly faster for I/O-bound operations (database writes, API calls, etc.)
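If the batches are independent of each other, you can go one step further and overlap parsing with processing by keeping one batch in flight. A sketch, assuming processBatch from the example above returns a promise:

import { parse, EnginePresets } from 'web-csv-toolbox';

const BATCH_SIZE = 1000;
let batch: any[] = [];
let inFlight: Promise<void> = Promise.resolve();

for await (const record of parse(csv, {
  engine: EnginePresets.responsiveFast()
})) {
  batch.push(record);
  if (batch.length >= BATCH_SIZE) {
    const full = batch;
    batch = [];
    await inFlight;                // cap work at one batch in flight
    inFlight = processBatch(full); // runs while parsing continues
  }
}
await inFlight;
if (batch.length > 0) {
  await processBatch(batch);
}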
const files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv'];
for (const file of files) {
const csv = await fetch(file).then(r => r.text());
for await (const record of parse(csv, {
engine: { worker: true, wasm: true }
})) {
console.log(record);
}
}
Problems: the files are parsed strictly one after another, and without a shared pool each parse can create and tear down its own worker.
import { ReusableWorkerPool, parse } from 'web-csv-toolbox';
// Limit concurrent workers
using pool = new ReusableWorkerPool({ maxWorkers: 4 });
const files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv'];
await Promise.all(
files.map(async (file) => {
const csv = await fetch(file).then(r => r.text());
for await (const record of parse(csv, {
engine: {
worker: true,
wasm: true,
workerPool: pool
}
})) {
console.log(record);
}
})
);
// Pool automatically cleaned up
Benefits: files are parsed in parallel, workers are reused across parses, concurrency is capped at maxWorkers, and the using declaration disposes the pool when it goes out of scope.
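Rather than hard-coding maxWorkers: 4, you can size the pool to the machine. A sketch; navigator.hardwareConcurrency is available in browsers, and the fallback covers runtimes without it:

import { ReusableWorkerPool } from 'web-csv-toolbox';

// Leave one core free for the main thread
const cores = typeof navigator !== 'undefined'
  ? navigator.hardwareConcurrency
  : 4;
using pool = new ReusableWorkerPool({ maxWorkers: Math.max(1, cores - 1) });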
const results: any[] = [];

for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
// ❌ Creates new object for each record
const transformed = {
...record,
fullName: `${record.firstName} ${record.lastName}`,
age: Number(record.age)
};
results.push(transformed);
}
Problem: Excessive object allocation
const results: any[] = [];
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
// ✅ Modify record in-place
(record as any).fullName = `${record.firstName} ${record.lastName}`;
(record as any).age = Number(record.age);
results.push(record);
}
Performance improvement: Reduces memory allocation and GC pressure
// Default: 10MB
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
console.log(record);
}
Problems: the one-size default can reserve more buffer than small-field CSVs need, while still being too small for unusually large fields.
// Small fields (typical CSV)
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast(),
maxBufferSize: 1024 * 1024 // 1MB
})) {
console.log(record);
}
// Large fields (e.g., embedded JSON, long text)
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast(),
maxBufferSize: 50 * 1024 * 1024 // 50MB
})) {
console.log(record);
}
Benefits: buffer memory matches your actual data, and large fields parse without exceeding the limit.
const records = [];
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
records.push(record);
}
// Process all at once
processAllRecords(records); // High memory usage
Problem: High memory usage for large files
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
// Process immediately
await processRecord(record);
// Record can be garbage collected
}
Benefits: memory usage stays roughly constant regardless of file size, since each record becomes eligible for garbage collection as soon as it has been processed.
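For network data, you can avoid materializing the file as a string at all by handing parse a streaming input such as the fetch Response (a sketch; assumes the endpoint serves UTF-8 CSV and processRecord is defined as above):

import { parse, EnginePresets } from 'web-csv-toolbox';

const response = await fetch('/data/large.csv');

// Records are produced as bytes arrive; the full file never sits in memory
for await (const record of parse(response, {
  engine: EnginePresets.responsiveFast()
})) {
  await processRecord(record);
}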
for await (const record of parse(largeCSV, {
engine: EnginePresets.responsiveFast() // Single worker
})) {
console.log(record);
}
Problem: Single worker can't utilize all CPU cores
import { ReusableWorkerPool, parse } from 'web-csv-toolbox';
using pool = new ReusableWorkerPool({ maxWorkers: 4 });
// Split CSV into chunks (by line boundaries)
const chunks = splitCSVIntoChunks(largeCSV, 4);
await Promise.all(
chunks.map(async (chunk) => {
for await (const record of parse(chunk, {
engine: {
worker: true,
wasm: true,
workerPool: pool
}
})) {
console.log(record);
}
})
);
Performance improvement: Better CPU utilization on multi-core systems
Note: Ensure chunks start at record boundaries (include header in each chunk or use pre-defined headers).
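splitCSVIntoChunks above is not a library export. A minimal sketch that follows the note, prepending the header to every chunk; it naively splits on \n, so it assumes no newlines inside quoted fields:

function splitCSVIntoChunks(csv: string, parts: number): string[] {
  const headerEnd = csv.indexOf('\n') + 1;
  const header = csv.slice(0, headerEnd);
  const lines = csv.slice(headerEnd).split('\n').filter((l) => l.length > 0);
  const perChunk = Math.ceil(lines.length / parts);
  const chunks: string[] = [];
  for (let i = 0; i < lines.length; i += perChunk) {
    // Each chunk is a standalone CSV: header plus a slice of the body
    chunks.push(header + lines.slice(i, i + perChunk).join('\n'));
  }
  return chunks;
}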
import { z } from 'zod';
const schema = z.object({
name: z.string(),
age: z.coerce.number(),
email: z.string().email()
});
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
// ❌ Expensive validation on every record
const validated = schema.parse(record);
console.log(validated);
}
Problem: per-record schema validation can dominate total parsing time
import { z } from 'zod';
const schema = z.object({
name: z.string(),
age: z.coerce.number(),
email: z.string().email()
});
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
  // ✅ Quick check first
  if (record.age && Number(record.age) > 0) {
    // Looks valid: skip the expensive schema validation
    console.log(record);
  } else {
    // Only run the full schema on suspicious records
    const validated = schema.parse(record);
    console.log(validated);
  }
}
Performance improvement: Reduced validation overhead for valid records
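When invalid records are common, zod's safeParse avoids the cost of throwing and catching inside the loop. The same idea, reusing the schema defined above:

for await (const record of parse(csv, {
  engine: EnginePresets.responsiveFast()
})) {
  // safeParse returns a result object instead of throwing
  const result = schema.safeParse(record);
  if (result.success) {
    console.log(result.data);
  } else {
    console.warn(result.error.issues);
  }
}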
import { performance } from 'node:perf_hooks'; // Node.js (performance is global in browsers)
import { loadWASM, parse, EnginePresets } from 'web-csv-toolbox';
await loadWASM();
async function benchmark(csv: string, label: string) {
const start = performance.now();
let count = 0;
for await (const record of parse(csv, {
engine: EnginePresets.responsiveFast()
})) {
count++;
}
const end = performance.now();
const duration = end - start;
const recordsPerSecond = (count / duration) * 1000;
console.log(`${label}:`);
console.log(` Time: ${duration.toFixed(2)}ms`);
console.log(` Records: ${count}`);
console.log(` Speed: ${recordsPerSecond.toFixed(0)} records/sec`);
}
await benchmark(csv, 'WASM Performance');
// Benchmark JavaScript
let count = 0;
for await (const record of parse(csv, {
  engine: { wasm: false }
})) {
  count++;
}

// Benchmark WASM
count = 0;
for await (const record of parse(csv, {
  engine: { wasm: true }
})) {
  count++;
}

// Benchmark Worker + WASM
count = 0;
for await (const record of parse(csv, {
  engine: EnginePresets.responsiveFast()
})) {
  count++;
}
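To keep the three runs comparable, the engine config can be folded into a helper. benchmarkEngine below is a local sketch, not a library function:

async function benchmarkEngine(csv: string, label: string, engine: any) {
  const start = performance.now();
  let count = 0;
  for await (const record of parse(csv, { engine })) {
    count++;
  }
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(2)}ms, ${count} records`);
}

await benchmarkEngine(csv, 'JavaScript', { wasm: false });
await benchmarkEngine(csv, 'WASM', { wasm: true });
await benchmarkEngine(csv, 'Worker + WASM', EnginePresets.responsiveFast());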
import { Hono } from 'hono';
import { loadWASM, parse, ReusableWorkerPool, EnginePresets } from 'web-csv-toolbox';
import { z } from 'zod';
const app = new Hono();
// 1. Initialize WASM once
await loadWASM();
// 2. Create a long-lived worker pool (const, not `using`: a top-level
//    `using` would dispose the pool as soon as module evaluation finishes,
//    before any request is handled)
const pool = new ReusableWorkerPool({ maxWorkers: 4 });
// 3. Define validation schema
const recordSchema = z.object({
name: z.string().min(1).max(100),
age: z.coerce.number().int().min(0).max(150),
email: z.string().email(),
});
app.post('/parse-csv', async (c) => {
const csv = await c.req.text();
const results: any[] = [];
const errors: any[] = [];
// 4. Use fastest engine
for await (const record of parse(csv, {
engine: {
worker: true,
wasm: true,
workerPool: pool
}
})) {
try {
// 5. Validate (with error recovery)
const validated = recordSchema.parse(record);
results.push(validated);
} catch (error) {
errors.push({
  record,
  error: error instanceof Error ? error.message : String(error)
});
}
}
return c.json({
success: true,
data: results,
errors: errors.length > 0 ? errors : undefined
});
});
export default app;
Optimizations applied:
- loadWASM() once at startup
- EnginePresets.responsiveFast() for UTF-8 CSV
- ReusableWorkerPool to limit concurrent workers
- maxBufferSize tuned to the data (see the buffer-size section above)

Problem: Using WASM on main thread in browser
Solution: Use EnginePresets.responsiveFast() (Worker + WASM)
Problem: Accumulating all records in memory
Solution: Process records as they arrive (streaming)
Problem: Not limiting concurrent workers
Solution: Use ReusableWorkerPool with maxWorkers
Problem: Loading WASM before each parse
Solution: Load once at startup
Problem: Using await inside parsing loop
Solution: Batch operations or use parallel processing
To maximize WASM performance:
- Call loadWASM() once at startup
- Use EnginePresets.responsiveFast() for an optimal configuration
- Tune maxBufferSize for your data

Expected improvements: throughput depends on data shape and runtime, so measure on your own workload.
Performance measurements: See CodSpeed benchmarks for actual measured performance.