- AbortController to manually cancel operations as needed.
- AbortSignal.timeout to automatically cancel operations that exceed a specified time limit.
- RangeError when the buffer, field count, or binary size exceeds its configured limit.
- Accepts strings, ReadableStreams, Response objects, Blob/File objects, or Request objects.
- Default delimiter and quotation characters are , and " respectively.

This package can then be installed using a package manager.
# Install with npm
$ npm install web-csv-toolbox
# Or Yarn
$ yarn add web-csv-toolbox
# Or pnpm
$ pnpm add web-csv-toolbox
<script type="module">
import { parse } from 'https://unpkg.com/web-csv-toolbox';
const csv = `name,age
Alice,42
Bob,69`;
for await (const record of parse(csv)) {
console.log(record);
}
</script>
In Deno, you can install and use the package by specifying the npm: specifier:
import { parse } from "npm:web-csv-toolbox";
This library provides two entry points to suit different needs:
For a deeper comparison and migration guidance, see:
web-csv-toolbox (Default - Full Features)
Best for: Most users who want automatic WASM initialization and all features
import { loadWASM, parseStringToArraySyncWASM } from 'web-csv-toolbox';
// Optional but recommended: preload to reduce first-parse latency
await loadWASM();
const records = parseStringToArraySyncWASM(csv);
Characteristics:
- loadWASM() at startup to reduce first-parse latency (optional)

web-csv-toolbox/slim (Slim Entry - Smaller Bundle)
Best for: Bundle size-sensitive applications and production optimization
import { loadWASM, parseStringToArraySyncWASM } from 'web-csv-toolbox/slim';
// Manual initialization required
await loadWASM();
const records = parseStringToArraySyncWASM(csv);
Characteristics:
- Requires an explicit loadWASM() call before using WASM features

Comparison:
| Aspect | Main | Slim |
|---|---|---|
| Initialization | Automatic | Manual (loadWASM() required) |
| Bundle Size | Larger (WASM embedded) | Smaller (WASM external) |
| Caching | Single bundle | WASM cached separately |
| Use Case | Convenience, prototyping | Production, bundle optimization |
Note: Both entry points export the same full API (feature parity). The only difference is WASM initialization strategy and bundle size.
Note for Bundler Users: When using Worker-based execution strategies (e.g.,
EnginePresets.responsive(), EnginePresets.responsiveFast()) with bundlers like Vite or Webpack, you must explicitly specify the workerURL option. See the Bundler Integration Guide for configuration details.
import { parse } from 'web-csv-toolbox';
const csv = `name,age
Alice,42
Bob,69`;
for await (const record of parse(csv)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
ReadableStreams
import { parse } from 'web-csv-toolbox';
const csv = `name,age
Alice,42
Bob,69`;
const stream = new ReadableStream({
start(controller) {
controller.enqueue(csv);
controller.close();
},
});
for await (const record of parse(stream)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
Response objects
import { parse } from 'web-csv-toolbox';
const response = await fetch('https://example.com/data.csv');
for await (const record of parse(response)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
Blob or File objects
import { parse } from 'web-csv-toolbox';
// From file input
const fileInput = document.querySelector('input[type="file"]');
fileInput.addEventListener('change', async (e) => {
const file = e.target.files[0];
for await (const record of parse(file)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
});
Request objects (Server-side)
import { parse } from 'web-csv-toolbox';
// Cloudflare Workers / Service Workers
export default {
async fetch(request) {
if (request.method === 'POST') {
for await (const record of parse(request)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
return new Response('OK', { status: 200 });
}
}
};
import { parse } from 'web-csv-toolbox';
const csv = `name\tage
Alice\t42
Bob\t69`;
for await (const record of parse(csv, { delimiter: '\t' })) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
import { parse } from 'web-csv-toolbox';
const csv = `Alice,42
Bob,69`;
for await (const record of parse(csv, { header: ['name', 'age'] })) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
Some CSV files don't include a header row. You can provide custom headers manually:
import { parse } from 'web-csv-toolbox';
// Example: Sensor data without headers
const sensorData = `25.5,60,1024
26.1,58,1020
24.8,62,1025`;
// Provide headers explicitly
for await (const record of parse(sensorData, {
header: ['temperature', 'humidity', 'pressure']
})) {
console.log(`Temp: ${record.temperature}°C, Humidity: ${record.humidity}%, Pressure: ${record.pressure} hPa`);
}
// Output:
// Temp: 25.5°C, Humidity: 60%, Pressure: 1024 hPa
// Temp: 26.1°C, Humidity: 58%, Pressure: 1020 hPa
// Temp: 24.8°C, Humidity: 62%, Pressure: 1025 hPa
AbortSignal / AbortController Support
Support for AbortSignal / AbortController, enabling you to cancel ongoing asynchronous CSV processing tasks.
This feature is useful for scenarios where processing needs to be halted, such as when a user navigates away from the page or other conditions that require stopping the task early.
import { parse } from 'web-csv-toolbox';
const controller = new AbortController();
const csv = "name,age\nAlice,30\nBob,25";
try {
// Parse the CSV data then pass the AbortSignal to the parse function
for await (const record of parse(csv, { signal: controller.signal })) {
console.log(record);
}
} catch (error) {
if (error instanceof DOMException && error.name === 'AbortError') {
// The CSV processing was aborted by the user
console.log('CSV processing was aborted by the user.');
} else {
// An error occurred during CSV processing
console.error('An error occurred:', error);
}
}
// Some abort logic, like a cancel button
document.getElementById('cancel-button')
.addEventListener('click', () => {
controller.abort();
});
import { parse } from 'web-csv-toolbox';
// Set up a timeout of 5 seconds (5000 milliseconds)
const signal = AbortSignal.timeout(5000);
const csv = "name,age\nAlice,30\nBob,25";
try {
// Pass the AbortSignal to the parse function
const result = await parse.toArray(csv, { signal });
console.log(result);
} catch (error) {
if (error instanceof DOMException && error.name === 'TimeoutError') {
// Handle the case where the processing was aborted due to timeout
console.log('CSV processing was aborted due to timeout.');
} else {
// Handle other errors
console.error('An error occurred during CSV processing:', error);
}
}
| Versions | Status |
|---|---|
| 20.x | ✅ |
| 22.x | ✅ |
| 24.x | ✅ |
Note: For Node environments, the WASM loader uses
import.meta.resolve. Node.js 20.6+ is recommended. On older Node versions, pass an explicit URL/Buffer to loadWASM().
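For older Node versions, a minimal sketch of passing the WASM binary explicitly is shown below; the exact .wasm path inside the package is illustrative and may differ between releases.
import { readFile } from 'node:fs/promises';
import { loadWASM } from 'web-csv-toolbox';
// Read the WASM binary yourself instead of relying on import.meta.resolve.
// NOTE: the path below is a placeholder; check the installed package for the real location.
const wasmBinary = await readFile('./node_modules/web-csv-toolbox/dist/web_csv_toolbox_wasm_bg.wasm');
await loadWASM(wasmBinary);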
| OS | Chrome | Firefox | Default |
|---|---|---|---|
| Windows | ✅ | ✅ | ✅ (Edge) |
| macOS | ✅ | ✅ | ⬜ (Safari *) |
| Linux | ✅ | ✅ | - |
* Safari: Basic functionality is expected to work, but it is not yet automatically tested in our CI environment.
For detailed examples and best practices for your specific runtime environment, see:
This guide covers:
These APIs are designed for Simplicity and Ease of Use, providing an intuitive and straightforward experience for users.
function parse(input[, options]): AsyncIterableIterator<CSVRecord>
function parse.toArray(input[, options]): Promise<CSVRecord[]>
The input parameter can be:
- strings
- ReadableStreams of strings or Uint8Arrays
- Response objects
- Blob or File objects
- Request objects

These APIs are optimized for Enhanced Performance and Control, catering to users who need more detailed and fine-tuned functionality.
function parseString(string[, options]): Parses CSV strings.
function parseBinary(buffer[, options]): Parses binary CSV data.
function parseResponse(response[, options]): Parses CSV from Response objects.
function parseRequest(request[, options]): Parses CSV from Request objects (Cloudflare Workers, Service Workers, etc.).
function parseBlob(blob[, options]): Parses CSV from Blob or File objects.
function parseFile(file[, options]): Parses File objects with automatic filename tracking in error messages.
function parseStream(stream[, options]): Parses CSV from ReadableStreams.
function parseStringStream(stream[, options]): Parses ReadableStreams of CSV strings.
function parseBinaryStream(stream[, options]): Parses ReadableStreams of binary CSV chunks.
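For example, a minimal sketch using parseFile (assuming it yields records the same way parse does):
import { parseFile } from 'web-csv-toolbox';
// A File constructed inline for illustration; in a browser this would usually come from an <input type="file">.
const file = new File(['name,age\nAlice,42\nBob,69'], 'people.csv', { type: 'text/csv' });
for await (const record of parseFile(file)) {
  console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }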
These APIs are built for Advanced Customization and Pipeline Design, ideal for developers looking for in-depth control and flexibility.
The low-level APIs follow a 3-tier architecture:
Combines Lexer and Assembler for streamlined usage without sacrificing flexibility.
function createStringCSVParser(options?)
- Returns FlexibleStringObjectCSVParser (default) or FlexibleStringArrayCSVParser based on the outputFormat option.
- Combines FlexibleStringCSVLexer and the CSV Record Assembler.
- Use StringCSVParserStream for streaming workflows.
- Accepts CSVProcessingOptions only (no engine option).
- When using parse(chunk, { stream: true }), you must call parse() without arguments at the end to flush any remaining data.
// Object format (default)
const objectParser = createStringCSVParser({
header: ['name', 'age'] as const
});
// Array format
const arrayParser = createStringCSVParser({
header: ['name', 'age'] as const,
outputFormat: 'array'
});
// Process chunks
const records1 = objectParser.parse('Alice,30\nBob,', { stream: true });
const records2 = objectParser.parse('25\nCharlie,', { stream: true });
// Flush remaining data (required!)
const records3 = objectParser.parse();
- FlexibleStringObjectCSVParser - Always outputs object records
- FlexibleStringArrayCSVParser - Always outputs array records

function createBinaryCSVParser(options?)
- Returns FlexibleBinaryObjectCSVParser (default) or FlexibleBinaryArrayCSVParser based on the outputFormat option.
- Combines a TextDecoder with the string CSV parser.
- Uses TextDecoder with the stream: true option for proper multi-byte character handling across chunk boundaries.
- Character encoding is selected via the charset option.
- BOM handling via the ignoreBOM option, fatal error mode via the fatal option.
- Use BinaryCSVParserStream for streaming workflows.
- Accepts BinaryCSVProcessingOptions only (no engine option).
- When using parse(chunk, { stream: true }), you must call parse() without arguments at the end to flush TextDecoder and parser buffers.
// Object format (default)
const objectParser = createBinaryCSVParser({
header: ['name', 'age'] as const,
charset: 'utf-8'
});
// Array format
const arrayParser = createBinaryCSVParser({
header: ['name', 'age'] as const,
outputFormat: 'array',
charset: 'utf-8'
});
const encoder = new TextEncoder();
// Process chunks
const records1 = objectParser.parse(encoder.encode('Alice,30\nBob,'), { stream: true });
const records2 = objectParser.parse(encoder.encode('25\n'), { stream: true });
// Flush remaining data (required!)
const records3 = objectParser.parse();
- FlexibleBinaryObjectCSVParser - Always outputs object records
- FlexibleBinaryArrayCSVParser - Always outputs array records

Low-level tokenization with full control over CSV syntax.
function createStringCSVLexer(options?) / class FlexibleStringCSVLexer
Converts tokens into structured records with flexible formatting.
function createCSVRecordAssembler(options)
- Chooses the object or array assembler implementation based on outputFormat.
- Handles includeHeader and columnCountStrategy consistently across environments.

class FlexibleCSVObjectRecordAssembler / class FlexibleCSVArrayRecordAssembler
- FlexibleCSVRecordAssembler remains for backward compatibility but now delegates to these focused implementations.

Web Streams API integration for all processing tiers.
class StringCSVParserStream
- A TransformStream<string, CSVRecord> for streaming string parsing (see the sketch after this list).
- Supports the backpressureCheckInterval option.

class BinaryCSVParserStream
- A TransformStream<BufferSource, CSVRecord> for streaming binary parsing.

class CSVLexerTransformer
class CSVRecordAssemblerTransformer
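A minimal sketch of StringCSVParserStream in a Web Streams pipeline (assuming it is exported from the package root and that the default constructor options suffice):
import { StringCSVParserStream } from 'web-csv-toolbox';
const csvStream = new ReadableStream({
  start(controller) {
    controller.enqueue('name,age\nAlice,42\n');
    controller.enqueue('Bob,69\n');
    controller.close();
  },
});
await csvStream
  .pipeThrough(new StringCSVParserStream())
  .pipeTo(new WritableStream({
    write(record) {
      console.log(record); // { name: 'Alice', age: '42' }, then { name: 'Bob', age: '69' }
    },
  }));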
Both CSVLexerTransformer and CSVRecordAssemblerTransformer support custom queuing strategies following the Web Streams API pattern. Strategies are passed as constructor arguments with data-type-aware size counting and configurable backpressure handling.
Constructor signature:
new CSVLexerTransformer(options?, writableStrategy?, readableStrategy?)
new CSVRecordAssemblerTransformer(options?, writableStrategy?, readableStrategy?)
Default queuing strategies (starting points, not benchmarked):
// CSVLexerTransformer defaults
new CSVLexerTransformer(
{ backpressureCheckInterval: 100 }, // Check every 100 tokens
{
highWaterMark: 65536, // 64KB of characters
size: (chunk) => chunk.length, // Count by string length
},
new CountQueuingStrategy({ highWaterMark: 1024 }) // 1024 tokens
)
// CSVRecordAssemblerTransformer defaults
new CSVRecordAssemblerTransformer(
{ backpressureCheckInterval: 10 }, // Check every 10 records
new CountQueuingStrategy({ highWaterMark: 1024 }), // 1024 tokens
new CountQueuingStrategy({ highWaterMark: 256 }) // 256 records
)
Key Features:
🎯 Smart Size Counting:
⚡ Cooperative Backpressure:
- Monitors controller.desiredSize during processing

🔧 Tunable Backpressure Check Interval:
- backpressureCheckInterval (in options): How often to check for backpressure (count-based)

⚠️ Important: These defaults are theoretical starting points based on data flow characteristics, not empirical benchmarks. Optimal values vary by runtime (browser/Node.js/Deno), file size, memory constraints, and CPU performance. Profile your specific use case to find the best values.
When to customize:
- High-throughput servers: larger highWaterMark (128KB+, 2048+ tokens), higher backpressureCheckInterval (200-500)
- Memory-constrained environments: smaller highWaterMark (16KB, 256 tokens), lower backpressureCheckInterval (10-25)
- Slow consumers: smaller highWaterMark, lower backpressureCheckInterval for responsive backpressure

Example - High-throughput server:
import { CSVLexerTransformer, CSVRecordAssemblerTransformer } from 'web-csv-toolbox';
const response = await fetch('large-dataset.csv');
await response.body
.pipeThrough(new TextDecoderStream())
.pipeThrough(new CSVLexerTransformer(
{ backpressureCheckInterval: 200 }, // Less frequent checks
{
highWaterMark: 131072, // 128KB
size: (chunk) => chunk.length,
},
new CountQueuingStrategy({ highWaterMark: 2048 }) // 2048 tokens
))
.pipeThrough(new CSVRecordAssemblerTransformer(
{ backpressureCheckInterval: 20 }, // Less frequent checks
new CountQueuingStrategy({ highWaterMark: 2048 }), // 2048 tokens
new CountQueuingStrategy({ highWaterMark: 512 }) // 512 records
))
.pipeTo(yourRecordProcessor);
Example - Slow consumer (API writes):
await csvStream
.pipeThrough(new CSVLexerTransformer()) // Use defaults
.pipeThrough(new CSVRecordAssemblerTransformer(
{ backpressureCheckInterval: 2 }, // Very responsive
new CountQueuingStrategy({ highWaterMark: 512 }),
new CountQueuingStrategy({ highWaterMark: 64 })
))
.pipeTo(new WritableStream({
async write(record) {
await fetch('/api/save', { method: 'POST', body: JSON.stringify(record) });
}
}));
Benchmarking: Use the provided benchmark tool to find optimal values for your use case:
pnpm --filter web-csv-toolbox-benchmark queuing-strategy
See benchmark/queuing-strategy.bench.ts for implementation details.
These APIs are experimental and may change in the future.
You can use WebAssembly to parse CSV data for high performance.
⚠️ Experimental Notice:
WASM Limitations:
". (Double quotation mark)
outputFormat: 'array' requires the JavaScript engine (engine: { wasm: false }).import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox";
// load WebAssembly module
await loadWASM();
const csv = "a,b,c\n1,2,3";
// parse CSV string
const result = parseStringToArraySyncWASM(csv);
console.log(result);
// Prints:
// [{ a: "1", b: "2", c: "3" }]
function loadWASM(): Promise<void>
function parseStringToArraySyncWASM(string[, options]): CSVRecord[]
| Option | Description | Default | Notes |
|---|---|---|---|
| delimiter | Character to separate fields | , | |
| quotation | Character used for quoting fields | " | |
| maxBufferSize | Maximum internal buffer size (characters) | 10 * 1024 * 1024 | Set to Number.POSITIVE_INFINITY to disable (not recommended for untrusted input). Measured in UTF-16 code units. |
| maxFieldCount | Maximum fields allowed per record | 100000 | Set to Number.POSITIVE_INFINITY to disable (not recommended for untrusted input) |
| header | Custom headers for the parsed records | First row | If not provided, the first row is used as headers |
| outputFormat | Record shape ('object' or 'array') | 'object' | 'array' returns type-safe tuples; not available when running through WASM today |
| includeHeader | Emit header row when using array output | false | Only valid with outputFormat: 'array'; the header becomes the first emitted record |
| columnCountStrategy | Handle column-count mismatches when a header is provided | 'keep' for array format / 'pad' for object format | Choose between keep, pad, strict, or truncate to control how rows align with the header |
| signal | AbortSignal to cancel processing | undefined | Allows aborting of long-running operations |
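As a sketch, the limit-related options above can be combined to harden parsing of untrusted input (untrustedCsv is a placeholder for data you do not control):
import { parse } from 'web-csv-toolbox';
for await (const record of parse(untrustedCsv, {
  maxBufferSize: 1 * 1024 * 1024,      // cap internal buffering at 1M characters
  maxFieldCount: 1000,                 // reject records with more than 1,000 fields
  signal: AbortSignal.timeout(10000),  // give up after 10 seconds
})) {
  console.log(record);
}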
High-level and mid-level parsers now let you choose whether records come back as objects (default) or as tuple-like arrays:
const header = ["name", "age"] as const;
// Object output (default)
for await (const record of parse(csv, { header })) {
record.name; // string
}
// Array output with named tuples
const rows = await parse.toArray(csv, {
header,
outputFormat: "array",
includeHeader: true,
columnCountStrategy: "pad",
engine: { wasm: false }, // Array output currently runs on the JS engine only
});
// rows[0] === ['name', 'age'] (header row)
// rows[1] has type readonly [name: string, age: string]
- outputFormat: 'object' (default) returns familiar { column: value } objects.
- outputFormat: 'array' returns readonly tuples whose indices inherit names from the header for stronger TypeScript inference.
- includeHeader: true prepends the header row when you also set outputFormat: 'array'.
- columnCountStrategy controls how rows with too many or too few columns are treated when a header is present:
  - keep: emit rows exactly as they appear (default for array output with inferred headers)
  - pad: fill short rows with undefined and truncate long rows (default for object output)
  - strict: throw if the row length differs from the header
  - truncate: discard columns beyond the header length without padding short rows

⚠️ Array output is not yet available inside the WebAssembly execution path. If you request outputFormat: 'array', force the JavaScript engine with engine: { wasm: false } (or run in an environment where WASM is disabled).
| Option | Description | Default | Notes |
|---|---|---|---|
| charset | Character encoding for binary CSV inputs | utf-8 | See Encoding API Compatibility for the encoding formats that can be specified. |
| maxBinarySize | Maximum binary size for BufferSource inputs (bytes) | 100 * 1024 * 1024 (100MB) | Set to Number.POSITIVE_INFINITY to disable (not recommended for untrusted input) |
| decompression | Decompression algorithm for compressed CSV inputs | | See DecompressionStream Compatibility. Default support: gzip, deflate. deflate-raw is runtime-dependent and experimental (requires allowExperimentalCompressions: true for Response/Request inputs). |
| ignoreBOM | Whether to ignore Byte Order Mark (BOM) | false | See TextDecoderOptions.ignoreBOM for more information about the BOM. |
| fatal | Throw an error on invalid characters | false | See TextDecoderOptions.fatal for more information. |
| allowExperimentalCompressions | Allow experimental/future compression formats | false | When enabled, passes unknown compression formats to the runtime. Use cautiously. See example below. |
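For example, a sketch of decoding a non-UTF-8 CSV response using these options (the URL is illustrative):
import { parse } from 'web-csv-toolbox';
const response = await fetch('https://example.com/legacy-data.csv');
for await (const record of parse(response, {
  charset: 'shift_jis', // any encoding label supported by the runtime's Encoding API
  ignoreBOM: true,
  fatal: true,          // throw instead of inserting replacement characters
})) {
  console.log(record);
}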
web-csv-toolbox uses different memory patterns depending on the API you choose:
import { parse } from 'web-csv-toolbox';
// ✅ Memory efficient: processes one record at a time
const response = await fetch('https://example.com/large-data.csv');
for await (const record of parse(response)) {
console.log(record);
// Memory footprint: ~few KB per iteration
}
import { parse } from 'web-csv-toolbox';
// ⚠️ Loads entire result into memory
const csv = await fetch('data.csv').then(r => r.text());
const records = await parse.toArray(csv);
// Memory footprint: entire file + parsed array
| Platform | Streaming | Array-Based | Notes |
|---|---|---|---|
| Browser | Any size | < 10MB | Browser heap limits apply (~100MB-4GB depending on browser) |
| Node.js | Any size | < 100MB | Use --max-old-space-size flag for larger heaps |
| Deno | Any size | < 100MB | Similar to Node.js |
import { parse } from 'web-csv-toolbox';
const response = await fetch('https://example.com/large-data.csv');
// ✅ Good: Streaming approach (constant memory usage)
for await (const record of parse(response)) {
// Process each record immediately
console.log(record);
// Memory footprint: O(1) - only one record in memory at a time
}
// ❌ Avoid: Loading entire file into memory first
const response2 = await fetch('https://example.com/large-data.csv');
const text = await response2.text(); // Loads entire file into memory
const records = await parse.toArray(text); // Loads all records into memory
for (const record of records) {
console.log(record);
// Memory footprint: O(n) - entire file + all records in memory
}
import { parse } from 'web-csv-toolbox';
// Set up a timeout of 30 seconds (30000 milliseconds)
const signal = AbortSignal.timeout(30000);
const response = await fetch('https://example.com/large-data.csv');
try {
for await (const record of parse(response, { signal })) {
// Process each record
console.log(record);
}
} catch (error) {
if (error instanceof DOMException && error.name === 'TimeoutError') {
// Handle timeout
console.log('CSV processing was aborted due to timeout.');
} else {
// Handle other errors
console.error('An error occurred during CSV processing:', error);
}
}
import { parseStringToArraySyncWASM } from 'web-csv-toolbox';
// Compiled WASM code for improved performance (UTF-8 only)
// See CodSpeed benchmarks for actual performance metrics
const records = parseStringToArraySyncWASM(csvString);
") onlyFor production use with untrusted input, consider:
AbortSignal.timeout() to prevent resource exhaustionmaxBinarySize option to limit BufferSource inputs (default: 100MB bytes)maxBufferSize option to limit internal buffer size (default: 10M characters)maxFieldCount option to limit fields per record (default: 100,000)When processing CSV files from untrusted sources (especially compressed files), you can implement size limits using a custom TransformStream:
import { parse } from 'web-csv-toolbox';
// Create a size-limiting TransformStream
class SizeLimitStream extends TransformStream {
constructor(maxBytes) {
let bytesRead = 0;
super({
transform(chunk, controller) {
bytesRead += chunk.length;
if (bytesRead > maxBytes) {
controller.error(new Error(`Size limit exceeded: ${maxBytes} bytes`));
} else {
controller.enqueue(chunk);
}
}
});
}
}
// Example: Limit decompressed data to 10MB
const response = await fetch('https://untrusted-source.com/data.csv.gz');
const limitedStream = response.body
.pipeThrough(new DecompressionStream('gzip'))
.pipeThrough(new SizeLimitStream(10 * 1024 * 1024)); // 10MB limit
try {
for await (const record of parse(limitedStream)) {
console.log(record);
}
} catch (error) {
if (error.message.includes('Size limit exceeded')) {
console.error('File too large - possible compression bomb attack');
}
}
Note: The library automatically validates Content-Encoding headers when parsing Response objects, rejecting unsupported compression formats.
By default, the library only supports well-tested compression formats: gzip and deflate. Some runtimes may support additional formats like deflate-raw or Brotli, but these are runtime-dependent and not guaranteed. If you need to use these formats, you can enable experimental mode:
import { parse } from 'web-csv-toolbox';
// ✅ Default behavior: Only known formats
const response = await fetch('data.csv.gz');
await parse(response); // Works
// ⚠️ Experimental: Allow future formats
const response2 = await fetch('data.csv.br'); // Brotli compression
try {
await parse(response2, { allowExperimentalCompressions: true });
// Works if runtime supports Brotli
} catch (error) {
// Runtime will throw if format is unsupported
console.error('Runtime does not support this compression format');
}
When to use this: your runtime supports an additional compression format (such as Brotli) that is not in the library's default allow-list.
Cautions: support for such formats is runtime-dependent; unsupported formats cause a runtime error, so wrap parsing in a try/catch as shown above.
The easiest way to contribute is to use the library and star the repository.
Feel free to ask questions on GitHub Discussions.
Please create an issue at GitHub Issues.
Please support kamiazya.
Even just a dollar is enough motivation to develop.
This software is released under the MIT License, see LICENSE.