This guide shows you how to use web-csv-toolbox's low-level APIs to build custom CSV parsers for specialized use cases.
Use low-level APIs when you need custom CSV dialects, token-level inspection, syntax highlighting, or specialized validation.
For standard CSV parsing, use the high-level APIs: parse(), parseString()
web-csv-toolbox uses a 3-tier architecture for CSV parsing:
CSV Data → Parser (Lexer + Assembler) → Records
Recommended for most custom parsing needs:
createStringCSVParser(options?) - Factory for string CSV parsers (returns format-specific parser)
FlexibleStringObjectCSVParser - Object output format
FlexibleStringArrayCSVParser - Array output format
createBinaryCSVParser(options?) - Factory for binary CSV parsers with encoding support
FlexibleBinaryObjectCSVParser - Object output format
FlexibleBinaryArrayCSVParser - Array output format
Both factories support incremental parsing via the { stream: true } option and accept CSVProcessingOptions (no engine option).
CSV String → CSVLexer → Tokens → CSVRecordAssembler → Records
For advanced customization:
FlexibleStringCSVLexer - Tokenization
FlexibleCSVRecordAssembler - Record assembly
Build your own parser using token types and interfaces.
See: Parsing Architecture
The simplest way to build a custom parser is using Parser Models (Tier 1):
import { createStringCSVParser } from 'web-csv-toolbox';
function parseCSV(csv: string) {
const parser = createStringCSVParser({
header: ['name', 'age'],
// outputFormat: 'object' is default
});
return parser.parse(csv);
}
// Usage
for (const record of parseCSV('Alice,30\nBob,25\n')) {
console.log(record);
}
// { name: 'Alice', age: '30' }
// { name: 'Bob', age: '25' }
Note: createStringCSVParser accepts CSVProcessingOptions (no engine option). For high-level APIs with execution strategy support, use parseString() instead.
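For comparison, here is a minimal sketch of the equivalent high-level call; parseString() accepts the execution-strategy (engine) options mentioned above, which are omitted here:
import { parseString } from 'web-csv-toolbox';
// The first row ('name,age') becomes the header by default
for await (const record of parseString('name,age\nAlice,30\nBob,25\n')) {
  console.log(record);
}
// { name: 'Alice', age: '30' }
// { name: 'Bob', age: '25' }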
Benefits of the Parser Model approach: a single parse() call handles tokenization and record assembly, and the same parser instance supports incremental parsing via the { stream: true } option.
For advanced control, use the Lexer + Assembler pipeline (Tier 2):
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function parseCSV(csv: string) {
// Stage 1: Tokenization
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
// Stage 2: Record assembly
const assembler = new FlexibleCSVRecordAssembler();
const records = assembler.assemble(tokens);
return records;
}
// Usage
for (const record of parseCSV('name,age\r\nAlice,30\r\n')) {
console.log(record);
}
// { name: 'Alice', age: '30' }
Use the two-stage pipeline when you need token-level access, or when you want to insert custom processing between tokenization and record assembly.
Using Parser Model:
import { createStringCSVParser } from 'web-csv-toolbox';
function parseTSV(tsv: string) {
const parser = createStringCSVParser({ delimiter: '\t' });
return parser.parse(tsv);
}
// Usage
for (const record of parseTSV('name\tage\r\nAlice\t30\r\n')) {
console.log(record);
}
// { name: 'Alice', age: '30' }
Using Low-Level APIs:
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function parseTSV(tsv: string) {
const lexer = new FlexibleStringCSVLexer({ delimiter: '\t' });
const tokens = lexer.lex(tsv);
const assembler = new FlexibleCSVRecordAssembler();
const records = assembler.assemble(tokens);
return records;
}
// Usage
for (const record of parseTSV('name\tage\r\nAlice\t30\r\n')) {
console.log(record);
}
// { name: 'Alice', age: '30' }
Using Parser Model:
import { createStringCSVParser } from 'web-csv-toolbox';
function parsePSV(psv: string) {
const parser = createStringCSVParser({ delimiter: '|' });
return parser.parse(psv);
}
// Usage
for (const record of parsePSV('name|age\r\nAlice|30\r\n')) {
console.log(record);
}
// { name: 'Alice', age: '30' }
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function parseWithCustomHeader(csv: string, header: string[]) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
// Pre-define header (all rows treated as data)
const assembler = new FlexibleCSVRecordAssembler({ header });
const records = assembler.assemble(tokens);
return records;
}
// Usage: CSV without header row - use custom field names
for (const record of parseWithCustomHeader(
'Alice,Smith\r\nBob,Johnson\r\n',
['firstName', 'lastName']
)) {
console.log(record);
}
// { firstName: 'Alice', lastName: 'Smith' }
// { firstName: 'Bob', lastName: 'Johnson' }
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function parseHeaderless(csv: string, header: string[]) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
const assembler = new FlexibleCSVRecordAssembler({ header });
const records = assembler.assemble(tokens);
return records;
}
// Usage (CSV has no header row)
for (const record of parseHeaderless(
'Alice,30\r\nBob,25\r\n',
['name', 'age']
)) {
console.log(record);
}
// { name: 'Alice', age: '30' }
// { name: 'Bob', age: '25' }
Use the new createCSVRecordAssembler() factory when you need to swap between object output (default) and array/tuple output or to fine-tune how rows align with headers:
import {
createCSVRecordAssembler,
FlexibleStringCSVLexer,
} from 'web-csv-toolbox';
function parseAsTuples(csv: string) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
const assembler = createCSVRecordAssembler({
header: ['name', 'age'],
outputFormat: 'array',
includeHeader: true,
columnCountStrategy: 'pad',
});
return assembler.assemble(tokens);
}
const rows = [...parseAsTuples('name,age\r\nAlice,30\r\nBob,25\r\n')];
// rows[0] -> ['name', 'age'] (header row)
// rows[1] -> readonly [name: 'Alice', age: '30']
outputFormat: 'object' keeps the traditional { column: value } shape.
outputFormat: 'array' returns readonly tuples with header-derived names (great for TypeScript exhaustiveness checks).
includeHeader: true prepends the header row when you output arrays, perfect for re-exporting CSV data.
columnCountStrategy decides how mismatched rows behave when you provide a header (see the sketch below):
keep (default for array format) emits rows exactly as parsed.
pad (default for object format) fills missing fields with undefined and trims extras.
strict throws if a row has a different column count.
truncate silently drops columns beyond the header length.
Need to stick with classes? You can still instantiate FlexibleCSVObjectRecordAssembler or FlexibleCSVArrayRecordAssembler directly. FlexibleCSVRecordAssembler remains for backward compatibility, but the factory makes it easier to share consistent options.
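To make the strategies concrete, here is a minimal sketch contrasting 'strict' and 'truncate'; the exact error type thrown by 'strict' is an assumption:
import { createCSVRecordAssembler, FlexibleStringCSVLexer } from 'web-csv-toolbox';
// A data row with one more column than the two-column header
const tokens = [...new FlexibleStringCSVLexer().lex('Alice,30,extra\r\n')];
// 'strict' rejects the mismatched row
const strict = createCSVRecordAssembler({
  header: ['name', 'age'],
  columnCountStrategy: 'strict',
});
try {
  [...strict.assemble(tokens)];
} catch (error) {
  console.error('strict:', error); // column count mismatch
}
// 'truncate' drops the extra column and keeps the row
const truncate = createCSVRecordAssembler({
  header: ['name', 'age'],
  columnCountStrategy: 'truncate',
});
console.log([...truncate.assemble(tokens)]);
// [{ name: 'Alice', age: '30' }]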
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* parseWithValidation(csv: string) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
const assembler = new FlexibleCSVRecordAssembler<['name', 'age', 'email']>();
const records = assembler.assemble(tokens);
for (const record of records) {
// Validate each field
if (!record.name || record.name.trim() === '') {
throw new Error(`Invalid name: ${record.name}`);
}
const age = Number(record.age);
if (!Number.isInteger(age) || age < 0 || age > 150) {
throw new Error(`Invalid age: ${record.age}`);
}
if (!record.email?.includes('@')) {
throw new Error(`Invalid email: ${record.email}`);
}
yield record;
}
}
// Usage
try {
for (const record of parseWithValidation(
'name,age,email\r\nAlice,30,alice@example.com\r\nBob,invalid,bob\r\n'
)) {
console.log(record);
}
} catch (error) {
console.error(error.message); // "Invalid age: invalid"
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
import { z } from 'zod';
const recordSchema = z.object({
name: z.string().min(1).max(100),
age: z.coerce.number().int().min(0).max(150),
email: z.string().email(),
});
function* parseWithSchema(csv: string) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
const assembler = new FlexibleCSVRecordAssembler<['name', 'age', 'email']>();
const records = assembler.assemble(tokens);
for (const record of records) {
yield recordSchema.parse(record);
}
}
// Usage
try {
for (const record of parseWithSchema(
'name,age,email\r\nAlice,30,alice@example.com\r\n'
)) {
console.log(record); // { name: 'Alice', age: 30, email: 'alice@example.com' }
// Note: age is number (coerced by Zod)
}
} catch (error) {
console.error('Validation error:', error);
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* parseWithTypes(csv: string) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
const assembler = new FlexibleCSVRecordAssembler<['name', 'age', 'active']>();
const records = assembler.assemble(tokens);
for (const record of records) {
yield {
name: record.name,
age: Number(record.age),
active: record.active === 'true',
};
}
}
// Usage
for (const record of parseWithTypes(
'name,age,active\r\nAlice,30,true\r\n'
)) {
console.log(record); // { name: 'Alice', age: 30, active: true }
console.log(typeof record.age); // 'number'
console.log(typeof record.active); // 'boolean'
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* parseWithMapping(
csv: string,
mapping: Record<string, string>
) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
const assembler = new FlexibleCSVRecordAssembler();
const records = assembler.assemble(tokens);
for (const record of records) {
const mapped: Record<string, string | undefined> = {};
for (const [oldKey, newKey] of Object.entries(mapping)) {
mapped[newKey] = record[oldKey];
}
yield mapped;
}
}
// Usage
for (const record of parseWithMapping(
'first_name,last_name\r\nAlice,Smith\r\n',
{ first_name: 'firstName', last_name: 'lastName' }
)) {
console.log(record); // { firstName: 'Alice', lastName: 'Smith' }
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* parseWithFilter(
csv: string,
predicate: (record: any) => boolean
) {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
const assembler = new FlexibleCSVRecordAssembler();
const records = assembler.assemble(tokens);
for (const record of records) {
if (predicate(record)) {
yield record;
}
}
}
// Usage: Filter adults only
for (const record of parseWithFilter(
'name,age\r\nAlice,30\r\nBob,17\r\nCharlie,25\r\n',
(record) => Number(record.age) >= 18
)) {
console.log(record);
}
// { name: 'Alice', age: '30' }
// { name: 'Charlie', age: '25' }
Process CSV data in chunks using Parser Models:
import { createStringCSVParser } from 'web-csv-toolbox';
const parser = createStringCSVParser({
header: ['name', 'age'],
});
// First chunk - incomplete record
const records1 = parser.parse('Alice,', { stream: true });
console.log(records1); // [] - waiting for complete record
// Second chunk - completes the record
const records2 = parser.parse('30\nBob,25\n', { stream: true });
console.log(records2);
// [{ name: 'Alice', age: '30' }, { name: 'Bob', age: '25' }]
// Final chunk - flush remaining data
const records3 = parser.parse(); // Call without arguments to flush
console.log(records3); // []
Benefits: records are emitted as soon as each row completes, and memory use stays bounded because the parser only buffers the incomplete tail of the input.
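As a concrete application, this sketch feeds chunks from a live source into the same parser. The WebSocket endpoint and column names are hypothetical; any chunked string source works the same way:
import { createStringCSVParser } from 'web-csv-toolbox';
const parser = createStringCSVParser({ header: ['symbol', 'price'] });
const socket = new WebSocket('wss://example.com/quotes'); // hypothetical feed
socket.addEventListener('message', (event) => {
  // Each chunk may end mid-record; { stream: true } buffers the incomplete tail
  for (const record of parser.parse(event.data as string, { stream: true })) {
    console.log(record); // { symbol: '...', price: '...' }
  }
});
socket.addEventListener('close', () => {
  // Flush whatever remains buffered when the feed ends
  for (const record of parser.parse()) {
    console.log(record);
  }
});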
Use StringCSVParserStream for Web Streams API integration:
import { createStringCSVParser, StringCSVParserStream } from 'web-csv-toolbox';
const parser = createStringCSVParser({
header: ['name', 'age'],
});
const stream = new StringCSVParserStream(parser);
const response = await fetch('data.csv');
await response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(stream)
  .pipeTo(new WritableStream({
    write(record) {
      console.log(record); // { name: '...', age: '...' }
    }
  }));
For binary data with character encoding:
import { createBinaryCSVParser, BinaryCSVParserStream } from 'web-csv-toolbox';
const parser = createBinaryCSVParser({
header: ['name', 'age'],
charset: 'utf-8',
ignoreBOM: true,
});
const stream = new BinaryCSVParserStream(parser);
const response = await fetch('data.csv');
await response.body
  .pipeThrough(stream) // Directly pipe binary data
  .pipeTo(new WritableStream({
    write(record) {
      console.log(record);
    }
  }));
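If the bytes are already in memory (for example from File.arrayBuffer()), the binary parser can also be used without a stream. This sketch assumes its parse() method mirrors the string parser's API:
import { createBinaryCSVParser } from 'web-csv-toolbox';
const parser = createBinaryCSVParser({ charset: 'utf-8' });
// Encode a small CSV to bytes just for the example
const bytes = new TextEncoder().encode('name,age\r\nAlice,30\r\n');
for (const record of parser.parse(bytes)) {
  console.log(record); // { name: 'Alice', age: '30' }
}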
For advanced control, use Lexer + Assembler directly:
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
const lexer = new FlexibleStringCSVLexer();
// First chunk - incomplete quoted field
const chunk1 = '"Hello';
const tokens1 = [...lexer.lex(chunk1, { stream: true })];
console.log(tokens1); // [] - waiting for closing quote
// Second chunk - completes the field
const chunk2 = ' World",30\r\n';
const tokens2 = [...lexer.lex(chunk2, { stream: true })];
console.log(tokens2);
// [
// { type: 'Field', value: 'Hello World' },
// { type: 'FieldDelimiter', value: ',' },
// { type: 'Field', value: '30' },
// { type: 'RecordDelimiter', value: '\r\n' }
// ]
// Final chunk - flush remaining data
const tokens3 = [...lexer.lex()]; // Call without arguments to flush
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
async function* processChunks(chunks: string[]) {
const lexer = new FlexibleStringCSVLexer();
const assembler = new FlexibleCSVRecordAssembler();
for (let i = 0; i < chunks.length; i++) {
const isLast = i === chunks.length - 1;
// Lex chunk
const tokens = lexer.lex(chunks[i], { stream: !isLast });
// Assemble records
const records = assembler.assemble(tokens, { stream: !isLast });
for (const record of records) {
yield record;
}
}
}
// Usage
const chunks = [
'name,age\r\n',
'Alice,30\r\n',
'Bob,25\r\n'
];
for await (const record of processChunks(chunks)) {
console.log(record);
}
Protect against malicious CSV files with extremely large fields:
import { FlexibleStringCSVLexer } from 'web-csv-toolbox';
const lexer = new FlexibleStringCSVLexer({
maxBufferSize: 1024 * 1024 // 1MB limit per field
});
try {
for (const token of lexer.lex(untrustedCSV)) {
console.log(token);
}
} catch (error) {
if (error instanceof RangeError) {
console.error('Buffer size exceeded - possible CSV bomb attack');
console.error(error.message);
}
}
Prevent records with excessive field counts:
import { FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
const assembler = new FlexibleCSVRecordAssembler({
maxFieldCount: 10000 // Maximum 10,000 fields per record
});
try {
for (const record of assembler.assemble(tokens)) {
console.log(record);
}
} catch (error) {
if (error instanceof RangeError) {
console.error('Field count exceeded - possible DoS attack');
console.error(error.message);
}
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* secureParseCSV(csv: string) {
// Lexer with buffer size limit
const lexer = new FlexibleStringCSVLexer({
maxBufferSize: 10 * 1024 * 1024 // 10MB per field
});
// Assembler with field count limit
const assembler = new FlexibleCSVRecordAssembler({
maxFieldCount: 1000 // 1000 fields per record
});
try {
const tokens = lexer.lex(csv);
const records = assembler.assemble(tokens);
for (const record of records) {
yield record;
}
} catch (error) {
if (error instanceof RangeError) {
console.error('Security limit exceeded:', error.message);
throw new Error('CSV file exceeds security limits');
}
throw error;
}
}
// Usage with user-uploaded file
try {
for (const record of secureParseCSV(userUploadedCSV)) {
console.log(record);
}
} catch (error) {
console.error('Failed to parse CSV:', error.message);
}
Use AbortSignal to allow user cancellation:
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* parseWithCancellation(
csv: string,
signal?: AbortSignal
) {
const lexer = new FlexibleStringCSVLexer({ signal });
const assembler = new FlexibleCSVRecordAssembler();
try {
const tokens = lexer.lex(csv);
const records = assembler.assemble(tokens);
for (const record of records) {
// Check if cancelled
if (signal?.aborted) {
throw new DOMException('Parsing cancelled', 'AbortError');
}
yield record;
}
} catch (error) {
if (error instanceof DOMException && error.name === 'AbortError') {
console.log('Parsing was cancelled by user');
throw error;
}
throw error;
}
}
// Usage with timeout
const controller = new AbortController();
setTimeout(() => controller.abort(), 5000); // Cancel after 5 seconds
try {
for (const record of parseWithCancellation(largeCSV, controller.signal)) {
console.log(record);
}
} catch (error) {
if (error instanceof DOMException && error.name === 'AbortError') {
console.log('Parsing timed out');
}
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
// HTML: <button id="cancel-btn">Cancel</button>
const cancelButton = document.getElementById('cancel-btn');
const controller = new AbortController();
cancelButton.addEventListener('click', () => {
controller.abort();
console.log('Cancellation requested');
});
async function parseWithUI(csv: string) {
const lexer = new FlexibleStringCSVLexer({ signal: controller.signal });
const assembler = new FlexibleCSVRecordAssembler();
try {
let count = 0;
const tokens = lexer.lex(csv);
const records = assembler.assemble(tokens);
for (const record of records) {
console.log(record);
count++;
// Update UI periodically
if (count % 100 === 0) {
await new Promise(resolve => setTimeout(resolve, 0)); // Yield to UI
}
}
console.log(`Completed: ${count} records`);
} catch (error) {
if (error instanceof DOMException && error.name === 'AbortError') {
console.log('User cancelled parsing');
} else {
console.error('Parsing error:', error);
}
}
}
CSVLexer produces three types of tokens:
{
type: 'Field',
value: 'Alice',
location: {
start: { line: 2, column: 1, offset: 10 },
end: { line: 2, column: 6, offset: 15 },
rowNumber: 2
}
}
{
type: 'FieldDelimiter',
value: ',',
location: {
start: { line: 2, column: 6, offset: 15 },
end: { line: 2, column: 7, offset: 16 },
rowNumber: 2
}
}
{
type: 'RecordDelimiter',
value: '\r\n',
location: {
start: { line: 2, column: 8, offset: 17 },
end: { line: 3, column: 1, offset: 19 },
rowNumber: 2
}
}
import { FlexibleStringCSVLexer } from 'web-csv-toolbox';
function analyzeTokens(csv: string) {
const lexer = new FlexibleStringCSVLexer();
const tokens = [...lexer.lex(csv)];
for (const token of tokens) {
if (token.type === 'Field') {
console.log(`Field "${token.value}" at line ${token.location.start.line}, column ${token.location.start.column}`);
}
}
}
analyzeTokens('name,age\r\nAlice,30\r\n');
// Field "name" at line 1, column 1
// Field "age" at line 1, column 6
// Field "Alice" at line 2, column 1
// Field "30" at line 2, column 7
| Requirement | Use High-Level | Use Low-Level |
|---|---|---|
| Standard CSV format | ✅ | |
| Custom delimiter | ✅ | |
| Custom validation | | ✅ |
| File upload handling | ✅ | |
| Syntax highlighting | | ✅ |
| Real-time streaming | | ✅ |
| Token inspection | | ✅ |
| Production app | ✅ | |
| Learning library | ✅ | |
When you need fine-grained control over stream processing, web-csv-toolbox provides a layered approach:
parse() for learning and prototyping
Specialized streaming APIs for production (parseStringStream(), createStringCSVParserStream(), createStringCSVLexerTransformer(), etc.)
For learning and quick prototyping, use the universal parse() function:
import { parse } from 'web-csv-toolbox';
// Automatic input detection
for await (const record of parse(csvString)) {
console.log(record);
}
For production use, choose the appropriate specialized API based on your needs:
For most streaming use cases:
import { parseStringStream, parseBinaryStream } from 'web-csv-toolbox';
// String stream
const stream = await fetch('data.csv')
.then(res => res.body)
.then(body => body.pipeThrough(new TextDecoderStream()));
for await (const record of parseStringStream(stream, { header: ['name', 'age'] })) {
console.log(record); // { name: 'Alice', age: '30' }
}
// Binary stream (handles encoding internally)
const binaryStream = await fetch('data.csv').then(res => res.body);
for await (const record of parseBinaryStream(binaryStream, { charset: 'shift-jis' })) {
console.log(record);
}
When you need fine control over stream pipelines:
import {
createStringCSVParserStream,
createBinaryCSVParserStream
} from 'web-csv-toolbox';
// String stream → CSV records
const response = await fetch('data.csv');
await response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(createStringCSVParserStream({
    header: ['name', 'age'],
    delimiter: '\t', // TSV support
  }))
  .pipeTo(new WritableStream({
    write(record) {
      console.log(record); // { name: 'Alice', age: '30' }
    }
  }));
// Binary stream → CSV records (handles encoding internally)
const binaryResponse = await fetch('data.csv');
await binaryResponse.body
  .pipeThrough(createBinaryCSVParserStream({
    charset: 'shift-jis',
    ignoreBOM: true
  }))
  .pipeTo(new WritableStream({
    write(record) {
      console.log(record);
    }
  }));
Benefits: the factories create and wire up the lexer and assembler internally, insulating your code from implementation changes while composing with any Web Streams pipeline.
When you need to insert custom processing between lexing and assembly stages:
import {
createStringCSVLexerTransformer,
createCSVRecordAssemblerTransformer
} from 'web-csv-toolbox';
// Custom validation transform between stages
class TokenFilterTransform extends TransformStream {
constructor() {
super({
transform(token, controller) {
// Filter or modify tokens before assembly
if (token.type === 'Field' && token.value !== '') {
controller.enqueue(token);
} else if (token.type !== 'Field') {
controller.enqueue(token);
}
}
});
}
}
csvStream
.pipeThrough(createStringCSVLexerTransformer({ delimiter: '\t' }))
.pipeThrough(new TokenFilterTransform()) // Custom processing
.pipeThrough(createCSVRecordAssemblerTransformer({
header: ['name', 'age']
}))
.pipeTo(yourProcessor);
Use this pattern when you need to filter, validate, or transform tokens before they are assembled into records.
For specialized requirements, create custom Lexer or Assembler implementations and use them with TransformStream classes.
The factory functions (createStringCSVLexerTransformer, createCSVRecordAssemblerTransformer) are the default design choice for most use cases—they handle internal lexer/assembler creation and insulate your code from implementation changes. Drop down to direct class instantiation (new StringCSVLexerTransformer(customLexer)) only when you need to inject a custom lexer or assembler implementation with non-standard behavior.
Note: Low-level APIs are intended for niche requirements such as custom CSV dialects, syntax highlighting, or specialized validation. These APIs may have more frequent changes compared to Mid-level APIs. For most production use cases, prefer Mid-level APIs.
import {
  StringCSVLexerTransformer,
  CSVRecordAssemblerTransformer,
  type StringCSVLexer,
  type CSVRecordAssembler,
  type Token
} from 'web-csv-toolbox';
// Custom lexer implementing StringCSVLexer interface
class MyCustomLexer implements StringCSVLexer {
*lex(chunk?: string, options?: { stream?: boolean }): IterableIterator<Token> {
// Your custom lexing logic
// Handle special CSV dialects, custom escape sequences, etc.
}
}
// Custom assembler implementing CSVRecordAssembler interface
class MyCustomAssembler implements CSVRecordAssembler {
*assemble(tokens: Iterable<Token>, options?: { stream?: boolean }): IterableIterator<Record<string, string>> {
// Your custom assembly logic
// Handle special record formats, validation, etc.
}
}
// Use custom components with built-in TransformStream classes
const customLexer = new MyCustomLexer();
const customAssembler = new MyCustomAssembler();
csvStream
.pipeThrough(new StringCSVLexerTransformer(customLexer))
.pipeThrough(new CSVRecordAssemblerTransformer(customAssembler))
.pipeTo(yourProcessor);
Use this approach when the built-in lexer or assembler cannot handle your CSV dialect or record format.
All factory functions support custom queuing strategies for fine-tuned backpressure control:
import {
createStringCSVParserStream,
createStringCSVLexerTransformer,
createCSVRecordAssemblerTransformer
} from 'web-csv-toolbox';
// Parser stream with custom strategies
const parserStream = createStringCSVParserStream(
{ delimiter: ',' },
{ backpressureCheckInterval: 50 },
{ highWaterMark: 131072, size: (chunk) => chunk.length }, // writable
new CountQueuingStrategy({ highWaterMark: 512 }) // readable
);
// Or configure each transformer separately
const lexerTransformer = createStringCSVLexerTransformer(
{ delimiter: ',' },
{ backpressureCheckInterval: 50 },
{ highWaterMark: 131072, size: (chunk) => chunk.length },
new CountQueuingStrategy({ highWaterMark: 2048 })
);
const assemblerTransformer = createCSVRecordAssemblerTransformer(
{},
{ backpressureCheckInterval: 20 },
new CountQueuingStrategy({ highWaterMark: 2048 }),
new CountQueuingStrategy({ highWaterMark: 512 })
);
const largeResponse = await fetch('large-file.csv');
await largeResponse.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(lexerTransformer)
  .pipeThrough(assemblerTransformer)
  .pipeTo(yourProcessor);
import {
createStringCSVLexerTransformer,
createCSVRecordAssemblerTransformer
} from 'web-csv-toolbox';
// Custom validation transform
class ValidationTransform extends TransformStream {
constructor() {
super({
transform(record, controller) {
try {
// Validate record
if (record.name && Number(record.age) >= 0) {
controller.enqueue(record);
} else {
console.error('Invalid record:', record);
}
} catch (error) {
console.error('Validation error:', error);
}
}
});
}
}
// Usage
const csvStream = new ReadableStream({
start(controller) {
controller.enqueue('name,age\r\n');
controller.enqueue('Alice,30\r\n');
controller.enqueue('Bob,invalid\r\n'); // Invalid
controller.enqueue('Charlie,25\r\n');
controller.close();
}
});
csvStream
.pipeThrough(createStringCSVLexerTransformer())
.pipeThrough(createCSVRecordAssemblerTransformer())
.pipeThrough(new ValidationTransform())
.pipeTo(new WritableStream({
write(record) {
console.log(record);
}
}));
// { name: 'Alice', age: '30' }
// { name: 'Charlie', age: '25' }
import {
createStringCSVLexerTransformer,
createCSVRecordAssemblerTransformer
} from 'web-csv-toolbox';
class TypeConversionTransform extends TransformStream {
constructor() {
super({
transform(record, controller) {
controller.enqueue({
name: record.name,
age: Number(record.age),
active: record.active === 'true',
});
}
});
}
}
// Usage
csvStream
.pipeThrough(createStringCSVLexerTransformer())
.pipeThrough(createCSVRecordAssemblerTransformer())
.pipeThrough(new TypeConversionTransform())
.pipeTo(new WritableStream({
write(record) {
console.log(record);
console.log(typeof record.age); // 'number'
}
}));
import { FlexibleStringCSVLexer } from 'web-csv-toolbox';
function highlightCSV(csv: string): string {
const lexer = new FlexibleStringCSVLexer();
const tokens = lexer.lex(csv);
let html = '<pre class="csv-highlight">';
for (const token of tokens) {
switch (token.type) {
case 'Field':
html += `<span class="field">${escapeHTML(token.value)}</span>`;
break;
case 'FieldDelimiter':
html += `<span class="delimiter">${escapeHTML(token.value)}</span>`;
break;
case 'RecordDelimiter':
html += '\n';
break;
}
}
html += '</pre>';
return html;
}
function escapeHTML(str: string): string {
return str
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;');
}
// CSS
const css = `
.csv-highlight .field { color: #0066cc; }
.csv-highlight .delimiter { color: #999; font-weight: bold; }
`;
// Usage
const highlighted = highlightCSV('name,age\r\nAlice,30\r\n');
console.log(highlighted);
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* parseWithProgress(
csv: string,
onProgress: (progress: { records: number; tokens: number }) => void
) {
const lexer = new FlexibleStringCSVLexer();
const tokens = [...lexer.lex(csv)];
const totalTokens = tokens.length;
const assembler = new FlexibleCSVRecordAssembler();
const records = assembler.assemble(tokens);
let recordCount = 0;
for (const record of records) {
recordCount++;
onProgress({ records: recordCount, tokens: totalTokens });
yield record;
}
}
// Usage
for (const record of parseWithProgress(
'name,age\r\nAlice,30\r\nBob,25\r\n',
(progress) => {
console.log(`Processed ${progress.records} records (${progress.tokens} tokens)`);
}
)) {
console.log(record);
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
function* parseWithErrorRecovery(csv: string) {
const lexer = new FlexibleStringCSVLexer();
const assembler = new FlexibleCSVRecordAssembler();
try {
const tokens = lexer.lex(csv);
const records = assembler.assemble(tokens);
for (const record of records) {
yield { success: true, record };
}
} catch (error) {
yield {
success: false,
error: error instanceof Error ? error.message : 'Unknown error',
};
}
}
// Usage
for (const result of parseWithErrorRecovery(
'name,age\r\nAlice,30\r\n"Unclosed quote\r\n'
)) {
if (result.success) {
console.log('Record:', result.record);
} else {
console.error('Error:', result.error);
}
}
import { FlexibleStringCSVLexer, FlexibleCSVRecordAssembler } from 'web-csv-toolbox';
interface ValidationResult {
valid: boolean;
errors: string[];
}
function validateCSV(csv: string): ValidationResult {
const errors: string[] = [];
try {
const lexer = new FlexibleStringCSVLexer();
const tokens = [...lexer.lex(csv)];
const assembler = new FlexibleCSVRecordAssembler();
const records = [...assembler.assemble(tokens)];
// Check field count consistency
let fieldCount: number | null = null;
for (let i = 0; i < records.length; i++) {
const record = records[i];
const currentFieldCount = Object.keys(record).length;
if (fieldCount === null) {
fieldCount = currentFieldCount;
} else if (currentFieldCount !== fieldCount) {
errors.push(
`Row ${i + 2}: Expected ${fieldCount} fields, got ${currentFieldCount}`
);
}
}
} catch (error) {
errors.push(error instanceof Error ? error.message : 'Unknown error');
}
return {
valid: errors.length === 0,
errors,
};
}
// Usage
const result = validateCSV('name,age\r\nAlice,30\r\nBob\r\n');
console.log(result);
// { valid: false, errors: ['Row 3: Expected 2 fields, got 1'] }
Use generator functions (function*) for memory-efficient iteration.
Set security limits (maxBufferSize, maxFieldCount).
Always configure maxBufferSize and maxFieldCount when parsing untrusted input to prevent runaway memory usage.
You've learned how to: build custom parsers with Parser Models, combine the Lexer and Assembler pipeline, add validation, type conversion, and transformation, process data incrementally with streaming, and protect against malicious input with security limits.
For production CSV parsing, use the high-level APIs: parse(), parseString()