    🌐 web-csv-toolbox 🧰

    A CSV Toolbox utilizing Web Standard APIs.

    • 🌐 Web Standards first.
    • ❤️ TypeScript friendly & User friendly.
      • Fully typed and documented.
    • 0️⃣ Zero dependencies.
      • Using only Web Standards APIs.
    • 💪 Property-based testing.
    • ✅ Cross-platform.
      • Works on browsers, Node.js, and Deno.
    • 🌊 Efficient CSV Parsing with Streams
      • 💻 Leveraging the WHATWG Streams API and other Web APIs for seamless and efficient data processing.
    • 🛑 AbortSignal and Timeout Support: Ensure your CSV processing is cancellable, including support for automatic timeouts.
      • ✋ Integrate with AbortController to manually cancel operations as needed.
      • ⏳ Use AbortSignal.timeout to automatically cancel operations that exceed a specified time limit.
    • 🛡️ Memory Safety Protection: Built-in limits prevent memory exhaustion attacks.
      • 🔒 Configurable maximum buffer size (default: 10M characters) to prevent DoS attacks via unbounded input.
      • 🚨 Throws RangeError when the buffer exceeds the limit.
      • 📊 Configurable maximum field count (default: 100,000 fields per record) to prevent excessive-column attacks.
      • ⚠️ Throws RangeError when the field count exceeds the limit.
      • 💾 Configurable maximum binary size (default: 100 MB) for ArrayBuffer/Uint8Array inputs.
      • 🛑 Throws RangeError when the binary size exceeds the limit.
    • 🎨 Flexible Source Support
      • 🧩 Parse CSVs directly from strings, ReadableStreams, or Response objects.
    • ⚙️ Advanced Parsing Options: Customize your experience with various delimiters and quotation marks.
      • 🔄 Defaults to , and " respectively.
    • 💾 Specialized Binary CSV Parsing: Leverage Stream-based processing for versatility and strength.
      • 🔄 Flexible BOM handling.
      • 🗜️ Supports various compression formats.
      • 🔤 Charset specification for diverse encoding.
    • 🚀 WebAssembly for High Performance: WebAssembly can be used for high-performance parsing. (Experimental)
    • 📦 Lightweight and Zero Dependencies: No external dependencies, only Web Standards APIs.
    • 📚 Fully Typed and Documented: Fully typed and documented with TypeDoc.

    This package can be installed using a package manager:

    # Install with npm
    $ npm install web-csv-toolbox
    # Or Yarn
    $ yarn add web-csv-toolbox
    # Or pnpm
    $ pnpm add web-csv-toolbox

    Alternatively, you can use it directly in the browser via a CDN such as unpkg:

    <script type="module">
    import { parse } from 'https://unpkg.com/web-csv-toolbox';

    const csv = `name,age
    Alice,42
    Bob,69`;

    for await (const record of parse(csv)) {
      console.log(record);
    }
    </script>

    In Deno, you can import the package directly using the npm: specifier:

    import { parse } from "npm:web-csv-toolbox";
    
    Parsing a CSV string:

    import { parse } from 'web-csv-toolbox';

    const csv = `name,age
    Alice,42
    Bob,69`;

    for await (const record of parse(csv)) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }
    // { name: 'Bob', age: '69' }

    Parsing a ReadableStream of strings:

    import { parse } from 'web-csv-toolbox';

    const csv = `name,age
    Alice,42
    Bob,69`;

    const stream = new ReadableStream({
      start(controller) {
        controller.enqueue(csv);
        controller.close();
      },
    });

    for await (const record of parse(stream)) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }
    // { name: 'Bob', age: '69' }

    Parsing a Response object (for example, from fetch):

    import { parse } from 'web-csv-toolbox';

    const response = await fetch('https://example.com/data.csv');

    for await (const record of parse(response)) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }
    // { name: 'Bob', age: '69' }

    Parsing with a custom delimiter (for example, tab-separated values):

    import { parse } from 'web-csv-toolbox';

    const csv = `name\tage
    Alice\t42
    Bob\t69`;

    for await (const record of parse(csv, { delimiter: '\t' })) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }
    // { name: 'Bob', age: '69' }

    Providing custom headers:

    import { parse } from 'web-csv-toolbox';

    const csv = `Alice,42
    Bob,69`;

    for await (const record of parse(csv, { headers: ['name', 'age'] })) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }
    // { name: 'Bob', age: '69' }

    Some CSV files don’t include a header row. You can provide custom headers manually:

    import { parse } from 'web-csv-toolbox';

    // Example: Sensor data without headers
    const sensorData = `25.5,60,1024
    26.1,58,1020
    24.8,62,1025`;

    // Provide headers explicitly
    for await (const record of parse(sensorData, {
      headers: ['temperature', 'humidity', 'pressure']
    })) {
      console.log(`Temp: ${record.temperature}°C, Humidity: ${record.humidity}%, Pressure: ${record.pressure} hPa`);
    }
    // Output:
    // Temp: 25.5°C, Humidity: 60%, Pressure: 1024 hPa
    // Temp: 26.1°C, Humidity: 58%, Pressure: 1020 hPa
    // Temp: 24.8°C, Humidity: 62%, Pressure: 1025 hPa

    Support for AbortSignal / AbortController, enabling you to cancel ongoing asynchronous CSV processing tasks.

    This feature is useful for scenarios where processing needs to be halted, such as when a user navigates away from the page or other conditions that require stopping the task early.

    import { parse } from 'web-csv-toolbox';

    const controller = new AbortController();
    const csv = "name,age\nAlice,30\nBob,25";

    try {
      // Parse the CSV data, passing the AbortSignal to the parse function
      for await (const record of parse(csv, { signal: controller.signal })) {
        console.log(record);
      }
    } catch (error) {
      if (error instanceof DOMException && error.name === 'AbortError') {
        // The CSV processing was aborted by the user
        console.log('CSV processing was aborted by the user.');
      } else {
        // An error occurred during CSV processing
        console.error('An error occurred:', error);
      }
    }

    // Some abort logic, like a cancel button
    document.getElementById('cancel-button')
      .addEventListener('click', () => {
        controller.abort();
      });

    Using AbortSignal.timeout to cancel processing automatically:

    import { parse } from 'web-csv-toolbox';

    // Set up a timeout of 5 seconds (5000 milliseconds)
    const signal = AbortSignal.timeout(5000);

    const csv = "name,age\nAlice,30\nBob,25";

    try {
      // Pass the AbortSignal to the parse function
      const result = await parse.toArray(csv, { signal });
      console.log(result);
    } catch (error) {
      if (error instanceof DOMException && error.name === 'TimeoutError') {
        // Handle the case where the processing was aborted due to timeout
        console.log('CSV processing was aborted due to timeout.');
      } else {
        // Handle other errors
        console.error('An error occurred during CSV processing:', error);
      }
    }

    Supported Node.js versions:

    Version   Status
    20.x      ✅
    22.x      ✅
    24.x      ✅

    Supported browsers:

    OS        Chrome   Firefox   Default browser
    Windows   ✅        ✅        ✅ (Edge)
    macOS     ✅        ✅        ⬜ (Safari *)
    Linux     ✅        ✅        -

    * Safari: Basic functionality is expected to work, but it is not yet automatically tested in our CI environment.

    • Deno: CI verifies that the library runs on Deno.

    High-level APIs: These APIs are designed for simplicity and ease of use, providing an intuitive and straightforward experience.

    • function parse(input[, options]): AsyncIterableIterator<CSVRecord>: 📑
      • Parses various CSV input formats into an asynchronous iterable of records.
    • function parse.toArray(input[, options]): Promise<CSVRecord[]>: 📑
      • Parses CSV input into an array of records, ideal for smaller data sets.

    The input parameter can be a string, a ReadableStream of strings or Uint8Arrays, a Uint8Array, an ArrayBuffer, or a Response object.
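
    For example, binary input such as a Uint8Array can be passed to parse directly. A minimal sketch (the CSV content and the TextEncoder step are illustrative):

    import { parse } from 'web-csv-toolbox';

    // Encode a CSV string into a Uint8Array to simulate binary input
    const binary = new TextEncoder().encode('name,age\nAlice,42\nBob,69');

    for await (const record of parse(binary)) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }
    // { name: 'Bob', age: '69' }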

    Middle-level APIs: These APIs are optimized for enhanced performance and control, catering to users who need more detailed and fine-tuned functionality.

    • function parseString(string[, options]): 📑
      • Efficient parsing of CSV strings.
    • function parseBinary(buffer[, options]): 📑
      • Parses binary CSV from an ArrayBuffer or Uint8Array.
    • function parseResponse(response[, options]): 📑
      • Customized parsing directly from Response objects.
    • function parseStream(stream[, options]): 📑
      • Stream-based parsing for larger or continuous data.
    • function parseStringStream(stream[, options]): 📑
      • Combines string-based parsing with stream processing.
    • function parseUint8ArrayStream(stream[, options]): 📑
      • Parses binary streams with precise control over data types.
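
    For instance, parseString can be used when the input is already a string. A minimal sketch, assuming parseString yields records the same way parse does:

    import { parseString } from 'web-csv-toolbox';

    const csv = "name,age\nAlice,42\nBob,69";

    for await (const record of parseString(csv)) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }
    // { name: 'Bob', age: '69' }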

    Low-level APIs: These APIs are built for advanced customization and pipeline design, ideal for developers who need in-depth control and flexibility.

    • class CSVLexerTransformer: 📑
      • A TransformStream class for lexical analysis of CSV data.
      • Supports custom queuing strategies for controlling backpressure and memory usage.
    • class CSVRecordAssemblerTransformer: 📑
      • Handles the assembly of parsed data into records.
      • Supports custom queuing strategies for controlling backpressure and memory usage.

    Both CSVLexerTransformer and CSVRecordAssemblerTransformer support custom queuing strategies following the Web Streams API pattern. Strategies are passed as constructor arguments with data-type-aware size counting and configurable backpressure handling.

    Constructor signature:

    new CSVLexerTransformer(options?, writableStrategy?, readableStrategy?)
    new CSVRecordAssemblerTransformer(options?, writableStrategy?, readableStrategy?)

    Default queuing strategies (starting points, not benchmarked):

    // CSVLexerTransformer defaults
    writableStrategy: {
      highWaterMark: 65536,          // 64KB of characters
      size: (chunk) => chunk.length, // Count by string length
      checkInterval: 100             // Check backpressure every 100 tokens
    }
    readableStrategy: {
      highWaterMark: 1024,             // 1024 tokens
      size: (tokens) => tokens.length, // Count by number of tokens
      checkInterval: 100               // Check backpressure every 100 tokens
    }

    // CSVRecordAssemblerTransformer defaults
    writableStrategy: {
      highWaterMark: 1024,             // 1024 tokens
      size: (tokens) => tokens.length, // Count by number of tokens
      checkInterval: 10                // Check backpressure every 10 records
    }
    readableStrategy: {
      highWaterMark: 256, // 256 records
      size: () => 1,      // Each record counts as 1
      checkInterval: 10   // Check backpressure every 10 records
    }

    Key Features:

    🎯 Smart Size Counting:

    • Character-based counting for string inputs (accurate memory tracking)
    • Token-based counting between transformers (smooth pipeline flow)
    • Record-based counting for output (intuitive and predictable)

    ⚡ Cooperative Backpressure:

    • Monitors controller.desiredSize during processing
    • Yields to event loop when backpressure detected
    • Prevents blocking the main thread
    • Critical for browser UI responsiveness

    🔧 Tunable Check Interval:

    • checkInterval: How often to check for backpressure
    • Lower values (5-25): More responsive, slight overhead
    • Higher values (100-500): Less overhead, slower response
    • Customize based on downstream consumer speed

    ⚠️ Important: These defaults are theoretical starting points based on data flow characteristics, not empirical benchmarks. Optimal values vary by runtime (browser/Node.js/Deno), file size, memory constraints, and CPU performance. Profile your specific use case to find the best values.

    When to customize:

    • 🚀 High-throughput servers: Higher highWaterMark (128KB+, 2048+ tokens), higher checkInterval (200-500)
    • 📱 Memory-constrained environments: Lower highWaterMark (16KB, 256 tokens), lower checkInterval (10-25)
    • 🐌 Slow consumers (DB writes, API calls): Lower highWaterMark, lower checkInterval for responsive backpressure
    • 🏃 Fast processing: Higher values to reduce overhead

    Example - High-throughput server:

    import { CSVLexerTransformer, CSVRecordAssemblerTransformer } from 'web-csv-toolbox';

    const response = await fetch('large-dataset.csv');
    await response.body
      .pipeThrough(new TextDecoderStream())
      .pipeThrough(new CSVLexerTransformer(
        {},
        {
          highWaterMark: 131072, // 128KB
          size: (chunk) => chunk.length,
          checkInterval: 200 // Less frequent checks
        },
        {
          highWaterMark: 2048, // 2048 tokens
          size: (tokens) => tokens.length,
          checkInterval: 100
        }
      ))
      .pipeThrough(new CSVRecordAssemblerTransformer(
        {},
        {
          highWaterMark: 2048, // 2048 tokens
          size: (tokens) => tokens.length,
          checkInterval: 20
        },
        {
          highWaterMark: 512, // 512 records
          size: () => 1,
          checkInterval: 10
        }
      ))
      .pipeTo(yourRecordProcessor);

    Example - Slow consumer (API writes):

    await csvStream
      .pipeThrough(new CSVLexerTransformer()) // Use defaults
      .pipeThrough(new CSVRecordAssemblerTransformer(
        {},
        { highWaterMark: 512, size: (t) => t.length, checkInterval: 5 },
        { highWaterMark: 64, size: () => 1, checkInterval: 2 } // Very responsive
      ))
      .pipeTo(new WritableStream({
        async write(record) {
          await fetch('/api/save', { method: 'POST', body: JSON.stringify(record) });
        }
      }));

    Benchmarking: Use the provided benchmark tool to find optimal values for your use case:

    pnpm --filter web-csv-toolbox-benchmark queuing-strategy
    

    See benchmark/queuing-strategy.bench.ts for implementation details.

    The following WebAssembly APIs are experimental and may change in the future.

    You can use WebAssembly to parse CSV data for high performance.

    • Parsing with WebAssembly is faster than parsing with JavaScript, but it takes time to load the WebAssembly module.
    • Supports only UTF-8 encoded CSV data.
    • The only supported quotation character is " (double quotation mark).
      • Passing a different character throws an error.

    import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox";

    // Load the WebAssembly module
    await loadWASM();

    const csv = "a,b,c\n1,2,3";

    // Parse the CSV string synchronously
    const result = parseStringToArraySyncWASM(csv);
    console.log(result);
    // Prints:
    // [{ a: "1", b: "2", c: "3" }]

    • function loadWASM(): Promise<void>: 📑
      • Loads the WebAssembly module.
    • function parseStringToArraySyncWASM(string[, options]): CSVRecord[]: 📑
      • Parses CSV strings into an array of records.

    Common parsing options:

    • delimiter: Character used to separate fields. Default: ,
    • quotation: Character used for quoting fields. Default: "
    • maxBufferSize: Maximum internal buffer size, in characters (UTF-16 code units). Default: 10 * 1024 * 1024. Set to Number.POSITIVE_INFINITY to disable (not recommended for untrusted input).
    • maxFieldCount: Maximum number of fields allowed per record. Default: 100000. Set to Number.POSITIVE_INFINITY to disable (not recommended for untrusted input).
    • headers: Custom headers for the parsed records. If not provided, the first row is used as headers.
    • signal: AbortSignal used to cancel processing. Default: undefined. Allows aborting long-running operations.
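
    A minimal sketch combining several of these options (the values are illustrative):

    import { parse } from 'web-csv-toolbox';

    const csv = "name;age\nAlice;42";

    for await (const record of parse(csv, {
      delimiter: ';',                     // Semicolon-separated fields
      quotation: '"',                     // Default quotation character
      maxFieldCount: 1000,                // Tighten the per-record field limit
      signal: AbortSignal.timeout(5000),  // Abort if parsing takes longer than 5 seconds
    })) {
      console.log(record);
    }
    // Prints:
    // { name: 'Alice', age: '42' }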

    Binary-input options:

    • charset: Character encoding for binary CSV inputs. Default: utf-8. See Encoding API Compatibility for the encodings that can be specified.
    • maxBinarySize: Maximum binary size for ArrayBuffer/Uint8Array inputs, in bytes. Default: 100 * 1024 * 1024 (100 MB). Set to Number.POSITIVE_INFINITY to disable (not recommended for untrusted input).
    • decompression: Decompression algorithm for compressed CSV inputs. Supports gzip, deflate, and deflate-raw. See DecompressionStream Compatibility.
    • ignoreBOM: Whether to ignore the Byte Order Mark (BOM). Default: false. See TextDecoderOptions.ignoreBOM for more information about the BOM.
    • fatal: Throw an error on invalid characters. Default: false. See TextDecoderOptions.fatal for more information.
    • allowExperimentalCompressions: Allow experimental/future compression formats. Default: false. When enabled, unknown compression formats are passed through to the runtime. Use cautiously. See the example below.
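
    A minimal sketch of parsing compressed binary input with these options (the URL and limits are illustrative, and this assumes the fetched bytes are still gzip-compressed):

    import { parse } from 'web-csv-toolbox';

    // Fetch gzip-compressed CSV bytes as an ArrayBuffer
    const buffer = await fetch('https://example.com/data.csv.gz')
      .then((res) => res.arrayBuffer());

    for await (const record of parse(buffer, {
      charset: 'utf-8',                 // Decode the decompressed bytes as UTF-8
      decompression: 'gzip',            // Decompress gzip-encoded content
      maxBinarySize: 10 * 1024 * 1024,  // Reject inputs larger than 10 MB
    })) {
      console.log(record);
    }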

    web-csv-toolbox uses different memory patterns depending on the API you choose:

    Streaming (async iterator):

    import { parse } from 'web-csv-toolbox';

    // ✅ Memory efficient: processes one record at a time
    const response = await fetch('https://example.com/large-data.csv');
    for await (const record of parse(response)) {
      console.log(record);
      // Memory footprint: ~few KB per iteration
    }

    • Memory usage: O(1) - constant per record
    • Suitable for: Files of any size, browser environments
    • Max file size: Limited only by available storage/network

    Array-based (parse.toArray):

    import { parse } from 'web-csv-toolbox';

    // ⚠️ Loads the entire result into memory
    const csv = await fetch('data.csv').then(r => r.text());
    const records = await parse.toArray(csv);
    // Memory footprint: entire file + parsed array

    • Memory usage: O(n) - proportional to file size
    • Suitable for: Small datasets, quick prototyping
    • Recommended max: ~10MB (browser), ~100MB (Node.js)

    Platform   Streaming   Array-based   Notes
    Browser    Any size    < 10MB        Browser heap limits apply (~100MB-4GB depending on browser)
    Node.js    Any size    < 100MB       Use the --max-old-space-size flag for larger heaps
    Deno       Any size    < 100MB       Similar to Node.js

    import { parse } from 'web-csv-toolbox';

    const response = await fetch('https://example.com/large-data.csv');

    // ✅ Good: Streaming approach (constant memory usage)
    for await (const record of parse(response)) {
      // Process each record immediately
      console.log(record);
      // Memory footprint: O(1) - only one record in memory at a time
    }

    // ❌ Avoid: Loading the entire file into memory first
    const response2 = await fetch('https://example.com/large-data.csv');
    const text = await response2.text(); // Loads the entire file into memory
    const records = await parse.toArray(text); // Loads all records into memory
    for (const record of records) {
      console.log(record);
      // Memory footprint: O(n) - entire file + all records in memory
    }

    Using a timeout when processing large remote files:

    import { parse } from 'web-csv-toolbox';

    // Set up a timeout of 30 seconds (30000 milliseconds)
    const signal = AbortSignal.timeout(30000);

    const response = await fetch('https://example.com/large-data.csv');

    try {
      for await (const record of parse(response, { signal })) {
        // Process each record
        console.log(record);
      }
    } catch (error) {
      if (error instanceof DOMException && error.name === 'TimeoutError') {
        // Handle timeout
        console.log('CSV processing was aborted due to timeout.');
      } else {
        // Handle other errors
        console.error('An error occurred during CSV processing:', error);
      }
    }

    For CPU-bound parsing of large CSV strings, the experimental WebAssembly parser can be used:

    import { loadWASM, parseStringToArraySyncWASM } from 'web-csv-toolbox';

    // Load the WebAssembly module once before parsing
    await loadWASM();

    // 2-3x faster for large CSV strings (UTF-8 only)
    const records = parseStringToArraySyncWASM(csvString);

    Known limitations:

    • Delimiter/Quotation: Must be a single character (multi-character delimiters are not supported)
    • WASM Parser: UTF-8 encoding only, double-quote (") only
    • Streaming: Best performance with chunk sizes > 1KB

    For production use with untrusted input, consider:

    • Setting timeouts using AbortSignal.timeout() to prevent resource exhaustion
    • Using the maxBinarySize option to limit ArrayBuffer/Uint8Array inputs (default: 100 MB)
    • Using the maxBufferSize option to limit the internal buffer size (default: 10M characters)
    • Using the maxFieldCount option to limit fields per record (default: 100,000)
    • Implementing additional file size limits at the application level
    • Validating parsed data before use
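
    A minimal sketch combining these protections for untrusted input (the URL and limit values are illustrative):

    import { parse } from 'web-csv-toolbox';

    const response = await fetch('https://untrusted-source.com/data.csv');

    try {
      for await (const record of parse(response, {
        signal: AbortSignal.timeout(30000),  // Abort after 30 seconds
        maxBufferSize: 1024 * 1024,          // Cap the internal buffer at 1M characters
        maxFieldCount: 1000,                 // Cap the number of fields per record
      })) {
        console.log(record);
      }
    } catch (error) {
      if (error instanceof RangeError) {
        // A configured limit (buffer size or field count) was exceeded
        console.error('Input exceeded configured limits:', error.message);
      } else if (error instanceof DOMException && error.name === 'TimeoutError') {
        console.log('CSV processing was aborted due to timeout.');
      } else {
        console.error('An error occurred during CSV processing:', error);
      }
    }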

    When processing CSV files from untrusted sources (especially compressed files), you can implement size limits using a custom TransformStream:

    import { parse } from 'web-csv-toolbox';

    // Create a size-limiting TransformStream
    class SizeLimitStream extends TransformStream {
      constructor(maxBytes) {
        let bytesRead = 0;
        super({
          transform(chunk, controller) {
            bytesRead += chunk.length;
            if (bytesRead > maxBytes) {
              controller.error(new Error(`Size limit exceeded: ${maxBytes} bytes`));
            } else {
              controller.enqueue(chunk);
            }
          }
        });
      }
    }

    // Example: Limit decompressed data to 10MB
    const response = await fetch('https://untrusted-source.com/data.csv.gz');
    const limitedStream = response.body
      .pipeThrough(new DecompressionStream('gzip'))
      .pipeThrough(new SizeLimitStream(10 * 1024 * 1024)); // 10MB limit

    try {
      for await (const record of parse(limitedStream)) {
        console.log(record);
      }
    } catch (error) {
      if (error.message.includes('Size limit exceeded')) {
        console.error('File too large - possible compression bomb attack');
      }
    }

    Note: The library automatically validates Content-Encoding headers when parsing Response objects, rejecting unsupported compression formats.

    By default, the library only supports well-tested compression formats: gzip, deflate, and deflate-raw. If you need to use newer formats (like Brotli) that your runtime supports but the library hasn't explicitly added yet, you can enable experimental mode:

    import { parse } from 'web-csv-toolbox';

    // ✅ Default behavior: Only known formats
    const response = await fetch('data.csv.gz');
    await parse(response); // Works

    // ⚠️ Experimental: Allow future formats
    const response2 = await fetch('data.csv.br'); // Brotli compression
    try {
      await parse(response2, { allowExperimentalCompressions: true });
      // Works if the runtime supports Brotli
    } catch (error) {
      // The runtime will throw if the format is unsupported
      console.error('Runtime does not support this compression format');
    }

    When to use this:

    • Your runtime supports a newer compression format (e.g., Brotli in modern browsers)
    • You want to use the format before this library explicitly supports it
    • You trust the compression format source

    Cautions:

    • Error messages will come from the runtime, not this library
    • No library-level validation for unknown formats
    • You must verify your runtime supports the format

    The easiest way to contribute is to use the library and star the repository.

    Feel free to ask questions on GitHub Discussions.

    To report a bug or request a feature, please create an issue at GitHub Issues.

    To sponsor this project, please support kamiazya.

    Even just a dollar is enough motivation to develop 😊

    This software is released under the MIT License, see LICENSE.
