- Parses CSV from strings, ReadableStreams, or Response objects.
- Supports AbortController to manually cancel operations as needed, and AbortSignal.timeout to automatically cancel operations that exceed a specified time limit.
- Throws a RangeError when the internal buffer, the field count, or the binary size exceeds its configured limit.
- The delimiter and quotation characters default to , and " respectively.

This package can be installed using a package manager.
# Install with npm
$ npm install web-csv-toolbox
# Or Yarn
$ yarn add web-csv-toolbox
# Or pnpm
$ pnpm add web-csv-toolbox
<script type="module">
import { parse } from 'https://unpkg.com/web-csv-toolbox';
const csv = `name,age
Alice,42
Bob,69`;
for await (const record of parse(csv)) {
console.log(record);
}
</script>
In Deno, you can install and use the package by importing it with the npm: specifier:
import { parse } from "npm:web-csv-toolbox";
import { parse } from 'web-csv-toolbox';
const csv = `name,age
Alice,42
Bob,69`;
for await (const record of parse(csv)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
Parsing ReadableStreams:
import { parse } from 'web-csv-toolbox';
const csv = `name,age
Alice,42
Bob,69`;
const stream = new ReadableStream({
start(controller) {
controller.enqueue(csv);
controller.close();
},
});
for await (const record of parse(stream)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
Parsing Response objects:
import { parse } from 'web-csv-toolbox';
const response = await fetch('https://example.com/data.csv');
for await (const record of parse(response)) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
import { parse } from 'web-csv-toolbox';
const csv = `name\tage
Alice\t42
Bob\t69`;
for await (const record of parse(csv, { delimiter: '\t' })) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
import { parse } from 'web-csv-toolbox';
const csv = `Alice,42
Bob,69`;
for await (const record of parse(csv, { headers: ['name', 'age'] })) {
console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
Some CSV files don't include a header row. You can provide custom headers manually:
import { parse } from 'web-csv-toolbox';
// Example: Sensor data without headers
const sensorData = `25.5,60,1024
26.1,58,1020
24.8,62,1025`;
// Provide headers explicitly
for await (const record of parse(sensorData, {
  headers: ['temperature', 'humidity', 'pressure']
})) {
  console.log(`Temp: ${record.temperature}°C, Humidity: ${record.humidity}%, Pressure: ${record.pressure} hPa`);
}
// Output:
// Temp: 25.5°C, Humidity: 60%, Pressure: 1024 hPa
// Temp: 26.1°C, Humidity: 58%, Pressure: 1020 hPa
// Temp: 24.8°C, Humidity: 62%, Pressure: 1025 hPa
AbortSignal / AbortController support lets you cancel ongoing asynchronous CSV processing tasks.
This feature is useful for scenarios where processing needs to be halted, such as when a user navigates away from the page or other conditions that require stopping the task early.
import { parse } from 'web-csv-toolbox';
const controller = new AbortController();
const csv = "name,age\nAlice,30\nBob,25";
// Register abort logic, like a cancel button, before starting the parse
document.getElementById('cancel-button')
  .addEventListener('click', () => {
    controller.abort();
  });

try {
  // Pass the AbortSignal to the parse function
  for await (const record of parse(csv, { signal: controller.signal })) {
    console.log(record);
  }
} catch (error) {
  if (error instanceof DOMException && error.name === 'AbortError') {
    // The CSV processing was aborted by the user
    console.log('CSV processing was aborted by the user.');
  } else {
    // An error occurred during CSV processing
    console.error('An error occurred:', error);
  }
}
import { parse } from 'web-csv-toolbox';
// Set up a timeout of 5 seconds (5000 milliseconds)
const signal = AbortSignal.timeout(5000);
const csv = "name,age\nAlice,30\nBob,25";
try {
// Pass the AbortSignal to the parse function
const result = await parse.toArray(csv, { signal });
console.log(result);
} catch (error) {
if (error instanceof DOMException && error.name === 'TimeoutError') {
// Handle the case where the processing was aborted due to timeout
console.log('CSV processing was aborted due to timeout.');
} else {
// Handle other errors
console.error('An error occurred during CSV processing:', error);
}
}
| Versions | Status |
|---|---|
| 20.x | ✅ |
| 22.x | ✅ |
| 24.x | ✅ |
| OS | Chrome | Firefox | Default |
|---|---|---|---|
| Windows | ✅ | ✅ | ✅ (Edge) |
| macOS | ✅ | ✅ | ⬜ (Safari *) |
| Linux | ✅ | ✅ | - |
* Safari: Basic functionality is expected to work, but it is not yet automatically tested in our CI environment.
These APIs are designed for Simplicity and Ease of Use, providing an intuitive and straightforward experience for users.
function parse(input[, options]): AsyncIterableIterator<CSVRecord>
function parse.toArray(input[, options]): Promise<CSVRecord[]>
The input parameter can be a string, a ReadableStream of strings or Uint8Arrays, a Uint8Array, an ArrayBuffer, or a Response object.
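For example, binary input can be handed to the high-level parse directly. A minimal sketch, assuming UTF-8 data (the default charset for binary inputs):

import { parse } from 'web-csv-toolbox';

// Minimal sketch: encode a CSV string into a Uint8Array and pass the binary
// data straight to parse (assumes UTF-8, the default charset).
const binary = new TextEncoder().encode('name,age\nAlice,42\nBob,69');

for await (const record of parse(binary)) {
  console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }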
These APIs are optimized for Enhanced Performance and Control, catering to users who need more detailed and fine-tuned functionality.
function parseString(string[, options])
function parseBinary(buffer[, options])
function parseResponse(response[, options])
function parseStream(stream[, options])
function parseStringStream(stream[, options])
function parseUint8ArrayStream(stream[, options])
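As an illustration, these type-specific functions are used the same way as parse, just with a narrower input type. A minimal sketch using parseString and parseResponse, assuming they accept the common options listed below and iterate records asynchronously like parse:

import { parseString, parseResponse } from 'web-csv-toolbox';

// Sketch: parseString consumes an in-memory string...
const csv = 'name,age\nAlice,42\nBob,69';
for await (const record of parseString(csv)) {
  console.log(record);
}

// ...while parseResponse consumes a fetched Response.
const response = await fetch('https://example.com/data.csv');
for await (const record of parseResponse(response)) {
  console.log(record);
}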
These APIs are built for Advanced Customization and Pipeline Design, ideal for developers looking for in-depth control and flexibility.
class CSVLexerTransformer
class CSVRecordAssemblerTransformer
Both CSVLexerTransformer and CSVRecordAssemblerTransformer support custom queuing strategies following the Web Streams API pattern. Strategies are passed as constructor arguments with data-type-aware size counting and configurable backpressure handling.
Constructor signature:
new CSVLexerTransformer(options?, writableStrategy?, readableStrategy?)
new CSVRecordAssemblerTransformer(options?, writableStrategy?, readableStrategy?)
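For orientation, here is a minimal sketch of the two transformers composed into a pipeline with their default strategies; the URL and the logging sink are placeholders:

import { CSVLexerTransformer, CSVRecordAssemblerTransformer } from 'web-csv-toolbox';

// Sketch: decode bytes to text, tokenize with CSVLexerTransformer, then
// assemble records with CSVRecordAssemblerTransformer (default strategies).
const response = await fetch('https://example.com/data.csv');

await response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(new CSVLexerTransformer())
  .pipeThrough(new CSVRecordAssemblerTransformer())
  .pipeTo(new WritableStream({
    write(record) {
      console.log(record); // e.g. { name: 'Alice', age: '42' }
    },
  }));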
Default queuing strategies (starting points, not benchmarked):
// CSVLexerTransformer defaults
writableStrategy: {
highWaterMark: 65536, // 64KB of characters
size: (chunk) => chunk.length, // Count by string length
checkInterval: 100 // Check backpressure every 100 tokens
}
readableStrategy: {
highWaterMark: 1024, // 1024 tokens
size: (tokens) => tokens.length, // Count by number of tokens
checkInterval: 100 // Check backpressure every 100 tokens
}
// CSVRecordAssemblerTransformer defaults
writableStrategy: {
highWaterMark: 1024, // 1024 tokens
size: (tokens) => tokens.length, // Count by number of tokens
checkInterval: 10 // Check backpressure every 10 records
}
readableStrategy: {
highWaterMark: 256, // 256 records
size: () => 1, // Each record counts as 1
checkInterval: 10 // Check backpressure every 10 records
}
Key Features:
- 🎯 Smart Size Counting: data-type-aware size callbacks count characters, tokens, or records rather than chunks.
- ⚡ Cooperative Backpressure: monitors controller.desiredSize during processing.
- 🔧 Tunable Check Interval: checkInterval sets how often to check for backpressure.

⚠️ Important: These defaults are theoretical starting points based on data flow characteristics, not empirical benchmarks. Optimal values vary by runtime (browser/Node.js/Deno), file size, memory constraints, and CPU performance. Profile your specific use case to find the best values.

When to customize:
- High-throughput servers: larger highWaterMark (128KB+, 2048+ tokens), higher checkInterval (200-500).
- Memory-constrained environments: smaller highWaterMark (16KB, 256 tokens), lower checkInterval (10-25).
- Slow consumers: lower highWaterMark, lower checkInterval for responsive backpressure.

Example - High-throughput server:
import { CSVLexerTransformer, CSVRecordAssemblerTransformer } from 'web-csv-toolbox';
const response = await fetch('large-dataset.csv');
await response.body
.pipeThrough(new TextDecoderStream())
.pipeThrough(new CSVLexerTransformer(
{},
{
highWaterMark: 131072, // 128KB
size: (chunk) => chunk.length,
checkInterval: 200 // Less frequent checks
},
{
highWaterMark: 2048, // 2048 tokens
size: (tokens) => tokens.length,
checkInterval: 100
}
))
.pipeThrough(new CSVRecordAssemblerTransformer(
{},
{
highWaterMark: 2048, // 2048 tokens
size: (tokens) => tokens.length,
checkInterval: 20
},
{
highWaterMark: 512, // 512 records
size: () => 1,
checkInterval: 10
}
))
.pipeTo(yourRecordProcessor);
Example - Slow consumer (API writes):
await csvStream
.pipeThrough(new CSVLexerTransformer()) // Use defaults
.pipeThrough(new CSVRecordAssemblerTransformer(
{},
{ highWaterMark: 512, size: (t) => t.length, checkInterval: 5 },
{ highWaterMark: 64, size: () => 1, checkInterval: 2 } // Very responsive
))
.pipeTo(new WritableStream({
async write(record) {
await fetch('/api/save', { method: 'POST', body: JSON.stringify(record) });
}
}));
Benchmarking: Use the provided benchmark tool to find optimal values for your use case:
pnpm --filter web-csv-toolbox-benchmark queuing-strategy
See benchmark/queuing-strategy.bench.ts for implementation details.
These APIs are experimental and may change in the future.
You can use WebAssembly to parse CSV data for high performance.
". (Double quotation mark)
import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox";
// Load the WebAssembly module before using the synchronous WASM parser
await loadWASM();
const csv = "a,b,c\n1,2,3";
// Parse the CSV string synchronously
const result = parseStringToArraySyncWASM(csv);
console.log(result);
// Prints:
// [{ a: "1", b: "2", c: "3" }]
function loadWASM(): Promise<void>
function parseStringToArraySyncWASM(string[, options]): CSVRecord[]
| Option | Description | Default | Notes |
|---|---|---|---|
| `delimiter` | Character to separate fields | `,` | |
| `quotation` | Character used for quoting fields | `"` | |
| `maxBufferSize` | Maximum internal buffer size (characters) | `10 * 1024 * 1024` | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input). Measured in UTF-16 code units. A sketch of these limits follows the table. |
| `maxFieldCount` | Maximum fields allowed per record | `100000` | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input) |
| `headers` | Custom headers for the parsed records | First row | If not provided, the first row is used as headers |
| `signal` | AbortSignal to cancel processing | `undefined` | Allows aborting of long-running operations |
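As a sketch of how the limits behave, the example below tightens maxBufferSize and maxFieldCount for untrusted input and catches the RangeError thrown when a limit is exceeded (the URL is a placeholder):

import { parse } from 'web-csv-toolbox';

// Sketch: tighten the safety limits for untrusted input and handle the
// RangeError thrown when a limit is exceeded.
const untrustedCsv = await fetch('https://example.com/untrusted.csv').then((r) => r.text());

try {
  for await (const record of parse(untrustedCsv, {
    maxBufferSize: 1024 * 1024, // 1M characters instead of the 10M default
    maxFieldCount: 100,         // far below the 100,000 default
  })) {
    console.log(record);
  }
} catch (error) {
  if (error instanceof RangeError) {
    console.error('CSV exceeded a configured safety limit:', error.message);
  } else {
    throw error;
  }
}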
| Option | Description | Default | Notes |
|---|---|---|---|
| `charset` | Character encoding for binary CSV inputs | `utf-8` | See Encoding API Compatibility for the encoding formats that can be specified. A usage sketch follows the table. |
| `maxBinarySize` | Maximum binary size for ArrayBuffer/Uint8Array inputs (bytes) | `100 * 1024 * 1024` (100MB) | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input) |
| `decompression` | Decompression algorithm for compressed CSV inputs | | See DecompressionStream Compatibility. Supports: gzip, deflate, deflate-raw |
| `ignoreBOM` | Whether to ignore Byte Order Mark (BOM) | `false` | See TextDecoderOptions.ignoreBOM for more information about the BOM. |
| `fatal` | Throw an error on invalid characters | `false` | See TextDecoderOptions.fatal for more information. |
| `allowExperimentalCompressions` | Allow experimental/future compression formats | `false` | When enabled, passes unknown compression formats to the runtime. Use cautiously. See example below. |
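For example, a non-UTF-8 binary CSV can be parsed by naming its charset explicitly. A sketch, where the URL and the shift_jis label are placeholders and the supported encodings depend on the runtime's Encoding API:

import { parse } from 'web-csv-toolbox';

// Sketch: fetch a legacy-encoded CSV as binary and decode it as Shift_JIS,
// throwing on invalid bytes because fatal is enabled.
const response = await fetch('https://example.com/legacy-data.csv');
const buffer = await response.arrayBuffer();

for await (const record of parse(buffer, { charset: 'shift_jis', fatal: true })) {
  console.log(record);
}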
web-csv-toolbox uses different memory patterns depending on the API you choose:
import { parse } from 'web-csv-toolbox';
// ✅ Memory efficient: processes one record at a time
const response = await fetch('https://example.com/large-data.csv');
for await (const record of parse(response)) {
console.log(record);
// Memory footprint: ~few KB per iteration
}
import { parse } from 'web-csv-toolbox';
// ⚠️ Loads entire result into memory
const csv = await fetch('data.csv').then(r => r.text());
const records = await parse.toArray(csv);
// Memory footprint: entire file + parsed array
| Platform | Streaming | Array-Based | Notes |
|---|---|---|---|
| Browser | Any size | < 10MB | Browser heap limits apply (~100MB-4GB depending on browser) |
| Node.js | Any size | < 100MB | Use --max-old-space-size flag for larger heaps |
| Deno | Any size | < 100MB | Similar to Node.js |
import { parse } from 'web-csv-toolbox';
const response = await fetch('https://example.com/large-data.csv');
// ✅ Good: Streaming approach (constant memory usage)
for await (const record of parse(response)) {
// Process each record immediately
console.log(record);
// Memory footprint: O(1) - only one record in memory at a time
}
// ❌ Avoid: Loading entire file into memory first
const response2 = await fetch('https://example.com/large-data.csv');
const text = await response2.text(); // Loads entire file into memory
const records = await parse.toArray(text); // Loads all records into memory
for (const record of records) {
console.log(record);
// Memory footprint: O(n) - entire file + all records in memory
}
import { parse } from 'web-csv-toolbox';
// Set up a timeout of 30 seconds (30000 milliseconds)
const signal = AbortSignal.timeout(30000);
const response = await fetch('https://example.com/large-data.csv');
try {
for await (const record of parse(response, { signal })) {
// Process each record
console.log(record);
}
} catch (error) {
if (error instanceof DOMException && error.name === 'TimeoutError') {
// Handle timeout
console.log('CSV processing was aborted due to timeout.');
} else {
// Handle other errors
console.error('An error occurred during CSV processing:', error);
}
}
import { parseStringToArraySyncWASM } from 'web-csv-toolbox';
// 2-3x faster for large CSV strings (UTF-8 only)
const records = parseStringToArraySyncWASM(csvString);
") onlyFor production use with untrusted input, consider:
AbortSignal.timeout() to prevent resource exhaustionmaxBinarySize option to limit ArrayBuffer/Uint8Array inputs (default: 100MB bytes)maxBufferSize option to limit internal buffer size (default: 10M characters)maxFieldCount option to limit fields per record (default: 100,000)When processing CSV files from untrusted sources (especially compressed files), you can implement size limits using a custom TransformStream:
import { parse } from 'web-csv-toolbox';
// Create a size-limiting TransformStream
class SizeLimitStream extends TransformStream {
constructor(maxBytes) {
let bytesRead = 0;
super({
transform(chunk, controller) {
bytesRead += chunk.length;
if (bytesRead > maxBytes) {
controller.error(new Error(`Size limit exceeded: ${maxBytes} bytes`));
} else {
controller.enqueue(chunk);
}
}
});
}
}
// Example: Limit decompressed data to 10MB
const response = await fetch('https://untrusted-source.com/data.csv.gz');
const limitedStream = response.body
.pipeThrough(new DecompressionStream('gzip'))
.pipeThrough(new SizeLimitStream(10 * 1024 * 1024)); // 10MB limit
try {
for await (const record of parse(limitedStream)) {
console.log(record);
}
} catch (error) {
if (error.message.includes('Size limit exceeded')) {
console.error('File too large - possible compression bomb attack');
}
}
Note: The library automatically validates Content-Encoding headers when parsing Response objects, rejecting unsupported compression formats.
By default, the library only supports well-tested compression formats: gzip, deflate, and deflate-raw. If you need to use newer formats (like Brotli) that your runtime supports but the library hasn't explicitly added yet, you can enable experimental mode:
import { parse } from 'web-csv-toolbox';
// ✅ Default behavior: Only known formats
const response = await fetch('data.csv.gz');
await parse(response); // Works
// ⚠️ Experimental: Allow future formats
const response2 = await fetch('data.csv.br'); // Brotli compression
try {
await parse(response2, { allowExperimentalCompressions: true });
// Works if runtime supports Brotli
} catch (error) {
// Runtime will throw if format is unsupported
console.error('Runtime does not support this compression format');
}
When to use this: your runtime supports a compression format (such as Brotli) that the library has not explicitly added yet.
Cautions: the runtime will throw if the format is unsupported, and formats outside the validated list are less tested, so enable this only for trusted sources.
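One way to stay on the safe side is to feature-detect the runtime's DecompressionStream support before opting in. A sketch, using 'br' (Brotli) as a hypothetical format label:

import { parse } from 'web-csv-toolbox';

// Sketch: probe whether DecompressionStream accepts a format before enabling
// allowExperimentalCompressions. 'br' is a hypothetical Brotli label here.
function supportsDecompression(format) {
  try {
    new DecompressionStream(format);
    return true;
  } catch {
    return false;
  }
}

const response = await fetch('https://example.com/data.csv.br');
if (supportsDecompression('br')) {
  for await (const record of parse(response, { allowExperimentalCompressions: true })) {
    console.log(record);
  }
} else {
  console.warn('This runtime cannot decompress Brotli; request an uncompressed file instead.');
}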
The easiest way to contribute is to use the library and star the repository.
Feel free to ask questions on GitHub Discussions.
Please create an issue at GitHub Issues.
Please support kamiazya.
Even just a dollar is enough motivation to keep developing.
This software is released under the MIT License, see LICENSE.