web-csv-toolbox - v0.14.0

    WASM Performance Optimization

    This guide shows you how to maximize CSV parsing performance using WebAssembly in web-csv-toolbox.

    Note for Bundler Users: When using WASM with bundlers (Vite, Webpack, etc.), you must explicitly configure WASM file loading. When combining WASM with Workers (e.g., EnginePresets.responsiveFast()), you also need to specify the workerURL option. See How to Use with Bundlers for detailed configuration.

Before you start, you should have:

    • Completed the Using WebAssembly tutorial
    • A basic understanding of performance optimization
    • Familiarity with browser DevTools or Node.js profiling

    Before optimizing, understand where time is spent:

┌──────────────────────┐
    │  Total Parsing Time  │
    └──────────────────────┘
       ├─ WASM Initialization (one-time)
       ├─ Data Transfer (Main → WASM)
       ├─ CSV Parsing (in WASM)
       ├─ Result Transfer (WASM → Main)
       └─ Record Processing (JavaScript)

    Key insight: Parsing dominates, so optimizing WASM usage has the biggest impact.
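    A quick way to see this split for your own data is to time initialization and parsing separately. A minimal sketch (csv is assumed to hold your CSV text):

    import { loadWASM, parse } from 'web-csv-toolbox';

    const t0 = performance.now();
    await loadWASM();                 // one-time WASM initialization
    const t1 = performance.now();

    let count = 0;
    for await (const record of parse(csv, { engine: { wasm: true } })) {
      count++;                        // parsing + record processing
    }
    const t2 = performance.now();

    console.log(`Init:  ${(t1 - t0).toFixed(1)}ms (one-time)`);
    console.log(`Parse: ${(t2 - t1).toFixed(1)}ms for ${count} records`);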


❌ Anti-pattern: loading WASM before every parse

    import { loadWASM, parse } from 'web-csv-toolbox';

    async function parseCSV(csv: string) {
      await loadWASM(); // ❌ Slow! Loads WASM every time

      for await (const record of parse(csv, {
        engine: { wasm: true }
      })) {
        console.log(record);
      }
    }

    // Called multiple times
    await parseCSV(csv1); // Loads WASM ~50ms
    await parseCSV(csv2); // Loads WASM ~50ms
    await parseCSV(csv3); // Loads WASM ~50ms

    Performance impact: Initialization overhead for each parse


✅ Better: load WASM once at startup

    import { loadWASM, parse } from 'web-csv-toolbox';

    // Load once at application startup
    await loadWASM();

    async function parseCSV(csv: string) {
      for await (const record of parse(csv, {
        engine: { wasm: true }
      })) {
        console.log(record);
      }
    }

    // Called multiple times
    await parseCSV(csv1); // Fast (WASM cached)
    await parseCSV(csv2); // Fast (WASM cached)
    await parseCSV(csv3); // Fast (WASM cached)

    Performance improvement: Eliminates repeated initialization overhead


❌ Verbose: configuring the engine by hand

    for await (const record of parse(csv, {
      engine: {
        worker: true,
        wasm: true,
        workerStrategy: 'stream-transfer'
      }
    })) {
      console.log(record);
    }

    Problems:

    • Verbose
    • Easy to misconfigure
    • May not use optimal settings

✅ Better: use an engine preset

    import { parse, EnginePresets } from 'web-csv-toolbox';

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      console.log(record);
    }

    Benefits:

    • Optimal configuration
    • Automatic fallback
    • Less code

❌ Anti-pattern: running WASM on the main thread

    for await (const record of parse(csv, {
      engine: { wasm: true } // Blocks main thread
    })) {
      console.log(record);
      // UI frozen during parsing
    }

    Problems:

    • Blocks main thread
    • UI freezes
    • Poor user experience

✅ Better: combine a worker with WASM

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast() // Worker + WASM
    })) {
      console.log(record);
      // UI stays responsive
    }

    Benefits:

    • ✅ Non-blocking UI
    • ✅ Maximum performance
    • ✅ Best user experience

❌ Anti-pattern: awaiting an async operation per record

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      await processRecord(record); // Async operation
      // Wait for each record to complete
    }

Problem: Each record waits for the previous one to finish, so throughput is limited by per-record latency


✅ Better: process records in batches

    const BATCH_SIZE = 1000;
    let batch: any[] = [];

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      batch.push(record);

      if (batch.length >= BATCH_SIZE) {
        await processBatch(batch); // Process 1000 records at once
        batch = [];
      }
    }

    // Process remaining records
    if (batch.length > 0) {
      await processBatch(batch);
    }

    Performance improvement: Significantly faster for I/O-bound operations (database writes, API calls, etc.)
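    processBatch is whatever bulk operation your application performs. As a purely illustrative sketch (the endpoint and payload shape are hypothetical), a batched API write might look like:

    // Hypothetical bulk write: one HTTP request per batch instead of one per record.
    async function processBatch(batch: any[]): Promise<void> {
      const response = await fetch('/api/records/bulk', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(batch), // send the whole batch in one payload
      });

      if (!response.ok) {
        throw new Error(`Bulk insert failed: HTTP ${response.status}`);
      }
    }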


❌ Anti-pattern: spinning up a new worker for every file

    const files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv'];

    for (const file of files) {
      const csv = await fetch(file).then(r => r.text());

      for await (const record of parse(csv, {
        engine: { worker: true, wasm: true }
      })) {
        console.log(record);
      }
    }

    Problems:

    • Creates new worker for each file
    • No worker reuse
    • Unbounded resource usage

✅ Better: reuse workers through a bounded pool

    import { ReusableWorkerPool, parse } from 'web-csv-toolbox';

    // Limit concurrent workers
    using pool = new ReusableWorkerPool({ maxWorkers: 4 });

    const files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv'];

    await Promise.all(
      files.map(async (file) => {
        const csv = await fetch(file).then(r => r.text());

        for await (const record of parse(csv, {
          engine: {
            worker: true,
            wasm: true,
            workerPool: pool
          }
        })) {
          console.log(record);
        }
      })
    );

    // Pool automatically cleaned up when `pool` goes out of scope

    Benefits:

    • ✅ Worker reuse (eliminates initialization overhead)
    • ✅ Bounded resource usage
    • ✅ Concurrent processing

❌ Anti-pattern: allocating a new object for every record

    const results: any[] = [];

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      // ❌ Creates new object for each record
      const transformed = {
        ...record,
        fullName: `${record.firstName} ${record.lastName}`,
        age: Number(record.age)
      };

      results.push(transformed);
    }

    Problem: Excessive object allocation


✅ Better: modify each record in place

    const results: any[] = [];

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      // ✅ Modify record in-place
      (record as any).fullName = `${record.firstName} ${record.lastName}`;
      (record as any).age = Number(record.age);

      results.push(record);
    }

    Performance improvement: Reduces memory allocation and GC pressure


❌ Anti-pattern: leaving maxBufferSize at the default for every workload

    // Default: 10MB
    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      console.log(record);
    }

    Problems:

    • May be too small for legitimate large fields
    • May be too large for memory-constrained environments

✅ Better: size maxBufferSize to match your data

    // Small fields (typical CSV)
    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast(),
      maxBufferSize: 1024 * 1024 // 1MB
    })) {
      console.log(record);
    }

    // Large fields (e.g., embedded JSON, long text)
    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast(),
      maxBufferSize: 50 * 1024 * 1024 // 50MB
    })) {
      console.log(record);
    }

    Benefits:

    • Lower memory usage
    • Earlier error detection
    • Better security

❌ Anti-pattern: accumulating every record in memory

    const records: any[] = [];

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      records.push(record);
    }

    // Process all at once
    processAllRecords(records); // High memory usage

    Problem: High memory usage for large files


✅ Better: process each record as it arrives

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      // Process immediately
      await processRecord(record);
      // Record can be garbage collected
    }

    Benefits:

    • Constant memory usage
    • Works with arbitrarily large files (they can even be streamed straight from the network, as sketched below)
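    Streaming also means you never need the whole file as a string. Assuming parse() accepts a fetch Response directly (as shown in the library's main documentation), a remote file can be streamed end to end:

    const response = await fetch('https://example.com/large.csv'); // example URL

    for await (const record of parse(response, {
      engine: EnginePresets.responsiveFast()
    })) {
      await processRecord(record); // memory stays bounded regardless of file size
    }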

❌ Limitation: a single worker parses on a single core

    for await (const record of parse(largeCSV, {
      engine: EnginePresets.responsiveFast() // Single worker
    })) {
      console.log(record);
    }

    Problem: Single worker can't utilize all CPU cores


✅ Better: split large files across a worker pool

    import { ReusableWorkerPool, parse } from 'web-csv-toolbox';

    using pool = new ReusableWorkerPool({ maxWorkers: 4 });

    // Split CSV into chunks (by line boundaries)
    const chunks = splitCSVIntoChunks(largeCSV, 4);

    await Promise.all(
      chunks.map(async (chunk) => {
        for await (const record of parse(chunk, {
          engine: {
            worker: true,
            wasm: true,
            workerPool: pool
          }
        })) {
          console.log(record);
        }
      })
    );

    Performance improvement: Better CPU utilization on multi-core systems

    Note: Ensure chunks start at record boundaries (include header in each chunk or use pre-defined headers).
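    splitCSVIntoChunks is not provided by the library; you would supply it yourself. A minimal sketch, assuming UTF-8 text with a single header row and no newlines inside quoted fields:

    // Naive chunker: splits on line boundaries and copies the header into each
    // chunk so every chunk parses as a standalone CSV. Not safe for quoted
    // fields that contain embedded newlines.
    function splitCSVIntoChunks(csv: string, chunkCount: number): string[] {
      const lines = csv.split('\n');
      const header = lines[0];
      const rows = lines.slice(1).filter((line) => line.length > 0);
      const rowsPerChunk = Math.ceil(rows.length / chunkCount);

      const chunks: string[] = [];
      for (let i = 0; i < rows.length; i += rowsPerChunk) {
        const body = rows.slice(i, i + rowsPerChunk).join('\n');
        chunks.push(`${header}\n${body}`);
      }
      return chunks;
    }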


❌ Anti-pattern: full schema validation on every record

    import { z } from 'zod';

    const schema = z.object({
      name: z.string(),
      age: z.coerce.number(),
      email: z.string().email()
    });

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      // ❌ Expensive validation on every record
      const validated = schema.parse(record);
      console.log(validated);
    }

Problem: Validation overhead can dominate total processing time


✅ Better: validate only when a cheap check fails

    import { z } from 'zod';

    const schema = z.object({
      name: z.string(),
      age: z.coerce.number(),
      email: z.string().email()
    });

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      // ✅ Quick check first
      if (record.age && Number(record.age) > 0) {
        // Looks plausible, so skip the expensive schema validation
        console.log(record);
      } else {
        // Only run full validation on suspicious records
        const validated = schema.parse(record);
        console.log(validated);
      }
    }

    Performance improvement: Reduced validation overhead for valid records
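    If every record genuinely needs full validation, Zod's safeParse (a standard Zod API) returns a result object instead of throwing, which keeps exception handling out of the hot loop and makes it easy to collect failures:

    for await (const record of parse(csv, {
      engine: EnginePresets.responsiveFast()
    })) {
      const result = schema.safeParse(record);

      if (result.success) {
        console.log(result.data);
      } else {
        console.warn('Invalid record skipped:', result.error.issues);
      }
    }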


Measure before and after each change with a small benchmark, and compare engine configurations on the same data:

    import { performance } from 'perf_hooks'; // Node.js
    import { loadWASM, parse, EnginePresets } from 'web-csv-toolbox';

    await loadWASM();

    async function benchmark(csv: string, label: string, engine: any) {
      const start = performance.now();
      let count = 0;

      for await (const record of parse(csv, { engine })) {
        count++;
      }

      const duration = performance.now() - start;
      const recordsPerSecond = (count / duration) * 1000;

      console.log(`${label}:`);
      console.log(`  Time: ${duration.toFixed(2)}ms`);
      console.log(`  Records: ${count}`);
      console.log(`  Speed: ${recordsPerSecond.toFixed(0)} records/sec`);
    }

    // Compare configurations on the same data
    await benchmark(csv, 'JavaScript', { wasm: false });
    await benchmark(csv, 'WASM', { wasm: true });
    await benchmark(csv, 'Worker + WASM', EnginePresets.responsiveFast());

Putting it all together: a Hono endpoint that applies every optimization.

    import { Hono } from 'hono';
    import { loadWASM, parse, ReusableWorkerPool } from 'web-csv-toolbox';
    import { z } from 'zod';

    const app = new Hono();

    // 1. Initialize WASM once
    await loadWASM();

    // 2. Create worker pool
    using pool = new ReusableWorkerPool({ maxWorkers: 4 });

    // 3. Define validation schema
    const recordSchema = z.object({
      name: z.string().min(1).max(100),
      age: z.coerce.number().int().min(0).max(150),
      email: z.string().email(),
    });

    app.post('/parse-csv', async (c) => {
      const csv = await c.req.text();
      const results: any[] = [];
      const errors: any[] = [];

      // 4. Use fastest engine
      for await (const record of parse(csv, {
        engine: {
          worker: true,
          wasm: true,
          workerPool: pool
        }
      })) {
        try {
          // 5. Validate (with error recovery)
          const validated = recordSchema.parse(record);
          results.push(validated);
        } catch (error) {
          errors.push({
            record,
            error: error instanceof Error ? error.message : String(error)
          });
        }
      }

      return c.json({
        success: true,
        data: results,
        errors: errors.length > 0 ? errors : undefined
      });
    });

    export default app;

    Optimizations applied:

    • ✅ WASM initialized once
    • ✅ Worker pool for resource management
    • ✅ Worker + WASM for maximum performance
    • ✅ Streaming processing (constant memory)
    • ✅ Error recovery

Optimization checklist:

    • [ ] Call loadWASM() once at startup
    • [ ] Use EnginePresets.responsiveFast() for UTF-8 CSV
    • [ ] Use ReusableWorkerPool to limit concurrent workers
    • [ ] Handle errors gracefully
    • [ ] Set appropriate maxBufferSize
    • [ ] Benchmark with realistic data
    • [ ] Profile with DevTools/Node.js profiler
    • [ ] Test with large files (>10MB)
    • [ ] Test with many concurrent requests
    • [ ] Monitor memory usage

Common pitfalls to avoid:

    Problem: Using WASM on the main thread in the browser

    Solution: Use EnginePresets.responsiveFast() (Worker + WASM)


    Problem: Accumulating all records in memory

    Solution: Process records as they arrive (streaming)


    Problem: Not limiting concurrent workers

    Solution: Use ReusableWorkerPool with maxWorkers


    Problem: Loading WASM before each parse

    Solution: Load once at startup


    Problem: Using await inside parsing loop

    Solution: Batch operations or use parallel processing



    To maximize WASM performance:

    1. Initialize once - Call loadWASM() at startup
    2. Use presets - EnginePresets.responsiveFast() for optimal config
    3. Combine strategies - Worker + WASM for best results
    4. Batch processing - Process records in batches
    5. Worker pool - Limit concurrent workers
    6. Stream processing - Avoid loading all into memory
    7. Tune limits - Set appropriate maxBufferSize
    8. Benchmark - Measure and optimize based on data
    9. Parallel processing - Split large files across workers
    10. Minimize overhead - Avoid unnecessary operations

    Expected improvements:

    • Improved performance through compiled code (WASM)
    • Non-blocking UI (Worker + WASM)
    • Constant memory usage (streaming)
    • Scalable concurrent processing (Worker pool)

    Performance measurements: See CodSpeed benchmarks for actual measured performance.