Universal Date Parser, A Date Parsing Library in Rust

Posted at 2025-09-22

Introduction

Universal Date Parser is a high-performance Rust library that parses dates from a wide range of formats (ISO 8601, regional numeric formats, Unix timestamps) into a standardized structure. It provides automatic format detection, timezone awareness, and bindings for C, WebAssembly, and Python, so the same parser can be used across platforms and programming languages.

Library Features

Key Capabilities

  • Automatic Format Detection: Intelligently recognizes multiple date formats without configuration
  • High Performance: Roughly 340-620 nanoseconds per parse operation, depending on format (see Benchmark Results)
  • Timezone Awareness: Flexible timezone handling options for global applications
  • Multi-Language Support: C FFI, WebAssembly, and Python bindings available
  • Zero-Copy Architecture: Memory-efficient implementation avoiding unnecessary allocations
  • Ambiguity Resolution: Smart handling of region-specific formats (MM/DD vs DD/MM)

Supported Date Formats

ISO 8601 Standard

// ISO 8601 examples (extended and compact forms)
"2023-12-25T10:30:00Z"          // UTC datetime
"2023-12-25T10:30:00+09:00"     // Timezone-aware
"2023-12-25"                    // Date only
"20231225T103000Z"              // Compact format

Regional Formats

// US format (Month/Day/Year)
"12/25/2023"    // MM/DD/YYYY

// European format (Day/Month/Year)
"25/12/2023"    // DD/MM/YYYY

// Japanese format
"2023年12月25日"
"2023/12/25"

Unix Timestamps

// Second precision
"1703520645"        // Seconds since epoch

// Millisecond precision
"1703520645000"     // Milliseconds since epoch
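
To make the two precisions concrete, here is a minimal sketch that converts a digit-only timestamp to UTC calendar fields using only the standard library. It is not the library's implementation: civil_from_days and unix_to_utc are hypothetical helper names, the date arithmetic is the widely used civil-from-days algorithm, and the "more than 10 digits means milliseconds" rule is borrowed from the detect_format snippet in this article.

```rust
/// Convert days since 1970-01-01 to (year, month, day) in the
/// proleptic Gregorian calendar (civil-from-days algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = if mp < 10 { (mp + 3) as u32 } else { (mp - 9) as u32 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: interpret a digit-only string as a Unix timestamp
/// and return (year, month, day, hour, minute, second) in UTC.
/// Assumption: more than 10 digits means millisecond precision.
fn unix_to_utc(input: &str) -> Option<(i64, u32, u32, u32, u32, u32)> {
    if input.is_empty() || !input.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let raw: i64 = input.parse().ok()?;
    let secs = if input.len() > 10 { raw / 1000 } else { raw };
    let (y, m, d) = civil_from_days(secs.div_euclid(86_400));
    let sod = secs.rem_euclid(86_400) as u32; // seconds of day
    Some((y, m, d, sod / 3600, (sod % 3600) / 60, sod % 60))
}
```

With this sketch, both "1703520645" and "1703520645000" resolve to 2023-12-25 16:10:45 UTC.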

Architecture Overview

Core Components

ParsedDate Structure

The fundamental data structure representing a successfully parsed date:

#[derive(Debug, Clone, PartialEq)]
pub struct ParsedDate {
    pub year: i32,
    pub month: u32,
    pub day: u32,
    pub hour: Option<u32>,
    pub minute: Option<u32>,
    pub second: Option<u32>,
    pub timezone_offset: Option<i32>,
    pub detected_format: String,
}
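
As an illustration of how the optional fields can be consumed, the sketch below re-declares the struct and adds a formatting helper. The method name to_iso8601 is hypothetical, and the unit of timezone_offset is not stated in this article; the sketch assumes seconds east of UTC.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedDate {
    pub year: i32,
    pub month: u32,
    pub day: u32,
    pub hour: Option<u32>,
    pub minute: Option<u32>,
    pub second: Option<u32>,
    pub timezone_offset: Option<i32>, // assumed: seconds east of UTC
    pub detected_format: String,
}

impl ParsedDate {
    /// Hypothetical helper: render as ISO 8601 text.
    /// Date-only values stay date-only; a missing offset is left off.
    pub fn to_iso8601(&self) -> String {
        let date = format!("{:04}-{:02}-{:02}", self.year, self.month, self.day);
        let (h, m, s) = match (self.hour, self.minute, self.second) {
            (None, None, None) => return date,
            (h, m, s) => (h.unwrap_or(0), m.unwrap_or(0), s.unwrap_or(0)),
        };
        let zone = match self.timezone_offset {
            None => String::new(),
            Some(0) => "Z".to_string(),
            Some(off) => {
                let sign = if off < 0 { '-' } else { '+' };
                let abs = off.unsigned_abs();
                format!("{}{:02}:{:02}", sign, abs / 3600, (abs % 3600) / 60)
            }
        };
        format!("{}T{:02}:{:02}:{:02}{}", date, h, m, s, zone)
    }
}
```

Under these assumptions, a value parsed from "2023-12-25T10:30:00+09:00" would format back to the same string, and a date-only value formats as "2023-12-25".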

UniversalDateParser

The main parsing engine with configurable options:

pub struct UniversalDateParser {
    timezone_mode: TimezoneMode,
    ambiguity_resolver: AmbiguityResolver,
}

Format Detection Engine

Intelligent pattern matching for automatic format recognition:

pub fn detect_format(&self, input: &str) -> Option<&'static str> {
    // ISO 8601 formats (highest priority)
    if ISO_DATETIME_REGEX.is_match(input) {
        return Some("ISO 8601 DateTime");
    }

    // Unix timestamps: more than 10 digits is treated as milliseconds
    if UNIX_TIMESTAMP_REGEX.is_match(input) {
        return if input.len() > 10 {
            Some("Unix Timestamp (ms)")
        } else {
            Some("Unix Timestamp")
        };
    }

    // Regional formats with ambiguity resolution
    // ... sophisticated pattern matching

    // No known format matched
    None
}

Design Principles

Performance First

  • Zero-copy parsing where possible
  • Regex patterns compiled once and reused via lazy_static
  • Early return optimization in pattern matching
  • Benchmarked performance: roughly 340-620ns per parse operation, depending on format

Intelligent Detection

The parsing engine detects formats in the following priority order:

  1. ISO 8601 Standard (highest priority)
  2. Unix Timestamps
  3. Unambiguous Regional Formats
  4. Contextual Ambiguous Format Resolution
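
The priority order can be sketched as a chain of checks with early returns. The version below is a simplified stand-in that uses byte checks instead of the library's regexes; the regional labels and the "first field greater than 12 means day-first" rule are assumptions made for illustration.

```rust
fn digits(bytes: &[u8]) -> bool {
    !bytes.is_empty() && bytes.iter().all(|c| c.is_ascii_digit())
}

/// Simplified detection sketch (not the library's implementation).
pub fn detect_format(input: &str) -> Option<&'static str> {
    let b = input.as_bytes();

    // 1. ISO 8601 (highest priority): YYYY-MM-DD, optionally followed by 'T...'
    if b.len() >= 10
        && digits(&b[0..4]) && b[4] == b'-'
        && digits(&b[5..7]) && b[7] == b'-'
        && digits(&b[8..10])
    {
        if b.len() == 10 {
            return Some("ISO 8601 Date");
        }
        if b[10] == b'T' {
            return Some("ISO 8601 DateTime");
        }
    }

    // 2. Unix timestamps: 10 digits (seconds) or 13 digits (milliseconds)
    if digits(b) && (b.len() == 10 || b.len() == 13) {
        return Some(if b.len() > 10 { "Unix Timestamp (ms)" } else { "Unix Timestamp" });
    }

    // 3 and 4. Slash-separated regional formats
    let parts: Vec<&str> = input.split('/').collect();
    if parts.len() == 3 && parts.iter().all(|p| digits(p.as_bytes())) {
        if parts[0].len() == 4 {
            return Some("YYYY/MM/DD");
        }
        let first: u32 = parts[0].parse().ok()?;
        let second: u32 = parts[1].parse().ok()?;
        return match (first > 12, second > 12) {
            (true, false) => Some("European Format"), // first field can only be a day
            (false, true) => Some("US Format"),       // second field can only be a day
            (false, false) => Some("Ambiguous Regional Format"), // left to the resolver
            (true, true) => None,
        };
    }
    None
}
```

Only the last case needs the configured resolver; everything above it is decided from the input alone, which is why the cheap, unambiguous checks run first.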

Timezone Awareness

Flexible timezone handling options:

  • AssumeUtc: Treat all dates as UTC (fastest)
  • AssumeLocal: Use system timezone
  • PreserveOffset: Maintain original timezone information
  • ConvertToUtc: Convert all dates to UTC
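
One possible reading of the four modes is sketched below. The enum variants come from this article, but apply_mode, its parameters, and the exact semantics of each branch are assumptions; the library may behave differently, for example in how AssumeUtc treats an explicit offset.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimezoneMode {
    AssumeUtc,
    AssumeLocal,
    PreserveOffset,
    ConvertToUtc,
}

/// Hypothetical helper illustrating the modes.
/// `wall_secs`: the wall-clock reading from the input, in seconds.
/// `parsed_offset`: offset found in the input, in seconds east of UTC.
/// `local_offset`: the system offset, passed in to keep the sketch deterministic.
/// Returns the adjusted reading and the offset to report.
pub fn apply_mode(
    mode: TimezoneMode,
    wall_secs: i64,
    parsed_offset: Option<i32>,
    local_offset: i32,
) -> (i64, Option<i32>) {
    match mode {
        // No arithmetic at all: the reading is reported as UTC
        TimezoneMode::AssumeUtc => (wall_secs, Some(0)),
        // Fall back to the system offset when the input has none
        TimezoneMode::AssumeLocal => (wall_secs, Some(parsed_offset.unwrap_or(local_offset))),
        // Keep whatever the input said, including "no offset"
        TimezoneMode::PreserveOffset => (wall_secs, parsed_offset),
        // Subtract the offset so the reading becomes UTC
        TimezoneMode::ConvertToUtc => {
            (wall_secs - i64::from(parsed_offset.unwrap_or(0)), Some(0))
        }
    }
}
```

In this reading, 10:30 at +09:00 (37,800 seconds into the day with an offset of 32,400) becomes 01:30 UTC under ConvertToUtc, while AssumeUtc skips the subtraction entirely, which is consistent with it being described as the fastest mode.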

Installation and Basic Usage

Adding to Cargo Project

[dependencies]
universal-date-parser = "0.1.0"

Basic Usage Examples

use universal_date_parser::{UniversalDateParser, TimezoneMode};

fn main() {
    let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
    
    // Parse various formats
    let inputs = vec![
        "2023-12-25T10:30:00Z",
        "12/25/2023",
        "25/12/2023",
        "1703520645",
        "December 25, 2023",
    ];
    
    for input in inputs {
        match parser.parse(input) {
            Ok(parsed) => {
                println!("Input: {} -> Result: {:?}", input, parsed);
                println!("Detected format: {}", parsed.detected_format);
            },
            Err(e) => println!("Parse error: {}", e),
        }
    }
}

Advanced Configuration

use universal_date_parser::{
    UniversalDateParser, 
    TimezoneMode, 
    AmbiguityResolver
};

fn main() {
    // Create parser with custom configuration
    let parser = UniversalDateParser::builder()
        .timezone_mode(TimezoneMode::PreserveOffset)
        .ambiguity_resolver(AmbiguityResolver::PreferEuropean)
        .build();
    
    // Parse ambiguous date
    let ambiguous_date = "01/02/2023";
    match parser.parse(ambiguous_date) {
        Ok(parsed) => {
            // With PreferEuropean setting, 01/02/2023 is interpreted as February 1st
            println!("Parsed result: {}-{:02}-{:02}", 
                parsed.year, parsed.month, parsed.day);
        },
        Err(e) => println!("Error: {}", e),
    }
}
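
The decision the resolver has to make can be shown in isolation. The sketch below is an assumption about how such a resolver could work, not the library's code: resolve_month_day is a hypothetical function, and the preference is only consulted when both readings are valid.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AmbiguityResolver {
    PreferUS,
    PreferEuropean,
}

/// Hypothetical helper: given the first two fields of an `A/B/YYYY` date,
/// return (month, day), or None if neither reading is a valid date.
pub fn resolve_month_day(a: u32, b: u32, resolver: AmbiguityResolver) -> Option<(u32, u32)> {
    let valid_month = |m: u32| (1..=12).contains(&m);
    let valid_day = |d: u32| (1..=31).contains(&d);

    let us_ok = valid_month(a) && valid_day(b); // MM/DD reading
    let eu_ok = valid_day(a) && valid_month(b); // DD/MM reading

    match (us_ok, eu_ok) {
        // Truly ambiguous: fall back to the configured preference
        (true, true) => Some(match resolver {
            AmbiguityResolver::PreferUS => (a, b),
            AmbiguityResolver::PreferEuropean => (b, a),
        }),
        (true, false) => Some((a, b)),
        (false, true) => Some((b, a)),
        (false, false) => None,
    }
}
```

With PreferEuropean, the fields 01 and 02 resolve to month 2, day 1, matching the comment in the example above, while 25/12 resolves to December 25 under either preference.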

Performance Characteristics

Benchmark Results

Comprehensive benchmark results using Criterion:

Format Type            Average Time    Throughput
ISO 8601 DateTime      388ns           2.6M ops/sec
ISO 8601 Date          344ns           2.9M ops/sec
US Format              392ns           2.6M ops/sec
European Format        618ns           1.6M ops/sec
Unix Timestamp         483ns           2.1M ops/sec
Unix Timestamp (ms)    598ns           1.7M ops/sec

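The throughput column follows directly from the average time, assuming single-threaded, back-to-back calls: operations per second is one billion divided by the mean latency in nanoseconds. The helper name below is hypothetical.

```rust
/// Operations per second implied by a mean latency in nanoseconds,
/// assuming calls run back to back on a single thread.
pub fn ops_per_sec(mean_ns: f64) -> f64 {
    1.0e9 / mean_ns
}
```

For example, 388ns per parse gives about 2.58 million operations per second, which the table rounds to 2.6M.
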
Memory Usage

  • Minimal heap allocations
  • Static regex pattern usage
  • Stack-based parsing for maximum memory efficiency

Optimization Techniques

Zero-Copy Parsing

// Efficient parsing avoiding string copies
pub fn parse_iso_date(&self, input: &str) -> Result<ParsedDate, ParseError> {
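    // Direct string slice manipulation for performance.
    // Note: range indexing panics if the input is shorter than 10 bytes,
    // so the format must already have been matched; `?` also requires
    // `From<ParseIntError> for ParseError`.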
    let year: i32 = input[0..4].parse()?;
    let month: u32 = input[5..7].parse()?;
    let day: u32 = input[8..10].parse()?;
    // ...
}
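
For comparison, here is a self-contained variant of the same idea that cannot panic on short or non-ASCII input. It is a sketch under stated assumptions: it returns a plain tuple and a String error instead of the library's ParsedDate and ParseError, and the helper names field and number are hypothetical.

```rust
use std::ops::Range;
use std::str::FromStr;

/// Borrow a byte range from the input without copying.
/// `str::get` returns None instead of panicking on a bad range.
fn field(input: &str, range: Range<usize>) -> Result<&str, String> {
    input
        .get(range.clone())
        .ok_or_else(|| format!("missing field at bytes {:?}", range))
}

/// Parse an all-digit slice into a number (rejects signs such as "+1").
fn number<T: FromStr>(s: &str) -> Result<T, String> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("not a number: {:?}", s));
    }
    s.parse::<T>().map_err(|_| format!("out of range: {:?}", s))
}

/// Parse `YYYY-MM-DD` into (year, month, day) using only borrowed slices.
pub fn parse_iso_date(input: &str) -> Result<(i32, u32, u32), String> {
    if input.len() != 10 || input.get(4..5) != Some("-") || input.get(7..8) != Some("-") {
        return Err(format!("expected YYYY-MM-DD, got {:?}", input));
    }
    let year: i32 = number(field(input, 0..4)?)?;
    let month: u32 = number(field(input, 5..7)?)?;
    let day: u32 = number(field(input, 8..10)?)?;
    // Range check only; days-per-month validation is left out of the sketch
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return Err(format!("month or day out of range in {:?}", input));
    }
    Ok((year, month, day))
}
```

The slices returned by field borrow from the input, so no intermediate String is allocated on the success path; allocation only happens when an error message is built.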

Lazy Regex Compilation

lazy_static! {
    static ref ISO_DATETIME_REGEX: Regex = 
        Regex::new(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$")
        .unwrap();
}
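
To see exactly what this pattern accepts, it can help to read it as a sequence of small steps. The sketch below is a hand-written matcher for the same shape using only the standard library; it is an illustration rather than the library's code, the helper names are hypothetical, and it accepts ASCII digits only.

```rust
/// Consume exactly `n` ASCII digits from the front, returning the rest.
fn take_digits(b: &[u8], n: usize) -> Option<&[u8]> {
    if b.len() >= n && b[..n].iter().all(|c| c.is_ascii_digit()) {
        Some(&b[n..])
    } else {
        None
    }
}

/// Consume one specific byte from the front, returning the rest.
fn take_byte(b: &[u8], ch: u8) -> Option<&[u8]> {
    match b.split_first() {
        Some((&c, rest)) if c == ch => Some(rest),
        _ => None,
    }
}

/// Hypothetical matcher for `YYYY-MM-DDTHH:MM:SS[.fff](Z|+HH:MM|-HH:MM)`.
pub fn is_iso_datetime(input: &str) -> bool {
    fn inner(b: &[u8]) -> Option<()> {
        let b = take_digits(b, 4)?;
        let b = take_byte(b, b'-')?;
        let b = take_digits(b, 2)?;
        let b = take_byte(b, b'-')?;
        let b = take_digits(b, 2)?;
        let b = take_byte(b, b'T')?;
        let b = take_digits(b, 2)?;
        let b = take_byte(b, b':')?;
        let b = take_digits(b, 2)?;
        let b = take_byte(b, b':')?;
        let mut b = take_digits(b, 2)?;
        // Optional fractional seconds: '.' followed by at least one digit
        if let Some(rest) = take_byte(b, b'.') {
            let n = rest.iter().take_while(|c| c.is_ascii_digit()).count();
            if n == 0 {
                return None;
            }
            b = &rest[n..];
        }
        // Mandatory zone: 'Z' or a signed HH:MM offset, then end of input
        if b == b"Z" {
            return Some(());
        }
        let b = match b.split_first() {
            Some((&c, rest)) if c == b'+' || c == b'-' => rest,
            _ => return None,
        };
        let b = take_digits(b, 2)?;
        let b = take_byte(b, b':')?;
        let b = take_digits(b, 2)?;
        if b.is_empty() { Some(()) } else { None }
    }
    inner(input.as_bytes()).is_some()
}
```

Read this way, the pattern requires a zone designator and the extended form with separators, so a date-only string or the compact form "20231225T103000Z" would need a separate pattern.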

Multi-Language Bindings

C FFI

Usage example from C:

#include <stdio.h>
#include "universal_date_parser.h"

int main() {
    UniversalDateParser* parser = universal_parser_new();
    
    ParsedDate result;
    int status = universal_parser_parse(parser, "2023-12-25", &result);
    
    if (status == 0) {
        printf("Parse success: %d-%d-%d\n", result.year, result.month, result.day);
    }
    
    universal_parser_free(parser);
    return 0;
}

WebAssembly

Usage from JavaScript/TypeScript:

import { UniversalDateParser } from './pkg/universal_date_parser.js';

const parser = new UniversalDateParser();
const result = parser.parse("2023-12-25T10:30:00Z");

console.log(`Parse result: ${result.year}-${result.month}-${result.day}`);
console.log(`Detected format: ${result.detected_format}`);

Python Bindings

Python support using PyO3:

import universal_date_parser

parser = universal_date_parser.UniversalDateParser()
result = parser.parse("2023-12-25T10:30:00Z")

print(f"Parse result: {result.year}-{result.month}-{result.day}")
print(f"Detected format: {result.detected_format}")

Practical Use Cases

Log Analysis System

use universal_date_parser::{UniversalDateParser, TimezoneMode};
use std::collections::HashMap;

struct LogAnalyzer {
    parser: UniversalDateParser,
    stats: HashMap<String, u32>,
}

impl LogAnalyzer {
    fn new() -> Self {
        Self {
            parser: UniversalDateParser::new(TimezoneMode::AssumeUtc),
            stats: HashMap::new(),
        }
    }
    
    fn analyze_log_entry(&mut self, log_line: &str) {
        // Extract date portion from log entry
        if let Some(date_part) = self.extract_date_from_log(log_line) {
            match self.parser.parse(&date_part) {
                Ok(parsed) => {
                    let format = parsed.detected_format.clone();
                    *self.stats.entry(format).or_insert(0) += 1;
                },
                Err(_) => {
                    *self.stats.entry("unknown".to_string()).or_insert(0) += 1;
                }
            }
        }
    }
    
    fn extract_date_from_log(&self, log_line: &str) -> Option<String> {
        // Date extraction logic based on log format
        // Real implementation would use regex patterns
        None
    }
}
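
The extraction step is left as a stub above. A minimal sketch of one way to fill it in follows, assuming a hypothetical log layout in which the timestamp is either wrapped in square brackets at the start of the line or is the first whitespace-separated token. It returns a borrowed slice rather than a String, so it differs slightly from the stub's signature.

```rust
/// Hypothetical extraction for lines such as
/// `[2023-12-25T10:30:00Z] INFO started` or `1703520645 GET /index`.
fn extract_date_from_log(log_line: &str) -> Option<&str> {
    let line = log_line.trim_start();

    // Bracketed timestamp at the start of the line
    if let Some(rest) = line.strip_prefix('[') {
        let end = rest.find(']')?;
        let inner = rest[..end].trim();
        return if inner.is_empty() { None } else { Some(inner) };
    }

    // Otherwise, take the first whitespace-separated token
    line.split_whitespace().next()
}
```

Timestamps that contain spaces would need a format-specific rule; the sketch only covers the two layouts named above.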

Database Migration Tool

use universal_date_parser::{UniversalDateParser, TimezoneMode};

struct DataMigrator {
    parser: UniversalDateParser,
}

impl DataMigrator {
    fn new() -> Self {
        Self {
            parser: UniversalDateParser::new(TimezoneMode::ConvertToUtc),
        }
    }
    
    fn migrate_date_column(&self, values: Vec<String>) -> Vec<String> {
        values.into_iter()
            .map(|date_str| {
                match self.parser.parse(&date_str) {
                    Ok(parsed) => {
                        // Standardize to ISO 8601 UTC format
                        format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
                            parsed.year,
                            parsed.month,
                            parsed.day,
                            parsed.hour.unwrap_or(0),
                            parsed.minute.unwrap_or(0),
                            parsed.second.unwrap_or(0)
                        )
                    },
                    Err(_) => date_str, // Keep original value on parse failure
                }
            })
            .collect()
    }
}

API Response Normalization

use universal_date_parser::{UniversalDateParser, TimezoneMode};
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize)]
struct ApiResponse {
    id: u64,
    #[serde(deserialize_with = "parse_flexible_date")]
    created_at: String,
    data: serde_json::Value,
}

fn parse_flexible_date<'de, D>(deserializer: D) -> Result<String, D::Error>
where
    D: serde::Deserializer<'de>,
{
    let s = String::deserialize(deserializer)?;
    let parser = UniversalDateParser::new(TimezoneMode::ConvertToUtc);
    
    match parser.parse(&s) {
        Ok(parsed) => Ok(format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
            parsed.year,
            parsed.month,
            parsed.day,
            parsed.hour.unwrap_or(0),
            parsed.minute.unwrap_or(0),
            parsed.second.unwrap_or(0)
        )),
        Err(_) => Err(serde::de::Error::custom("Invalid date format")),
    }
}

Error Handling and Debugging

Error Types

#[derive(Debug, Clone)]
pub enum ParseError {
    InvalidFormat(String),
    InvalidDate(String),
    AmbiguousFormat(String),
    TimezoneParseError(String),
    NumericParseError(String),
}

impl std::fmt::Display for ParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        match self {
            ParseError::InvalidFormat(msg) => write!(f, "Invalid format: {}", msg),
            ParseError::InvalidDate(msg) => write!(f, "Invalid date: {}", msg),
            ParseError::AmbiguousFormat(msg) => write!(f, "Ambiguous format: {}", msg),
            ParseError::TimezoneParseError(msg) => write!(f, "Timezone error: {}", msg),
            ParseError::NumericParseError(msg) => write!(f, "Numeric conversion error: {}", msg),
        }
    }
}
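
Two small additions make an error type like this easier to use with the `?` operator and with `Box<dyn Error>`. The sketch below trims the enum to two variants to stay short; whether the library provides these impls is not stated in this article, and parse_year is a hypothetical function used only to exercise them.

```rust
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug, Clone)]
pub enum ParseError {
    InvalidFormat(String),
    NumericParseError(String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ParseError::InvalidFormat(msg) => write!(f, "Invalid format: {}", msg),
            ParseError::NumericParseError(msg) => write!(f, "Numeric conversion error: {}", msg),
        }
    }
}

// Lets ParseError be boxed as `Box<dyn std::error::Error>`
impl std::error::Error for ParseError {}

// Lets `?` convert integer parse failures automatically
impl From<ParseIntError> for ParseError {
    fn from(e: ParseIntError) -> Self {
        ParseError::NumericParseError(e.to_string())
    }
}

/// Hypothetical example: read the four-digit year at the start of the input.
fn parse_year(input: &str) -> Result<i32, ParseError> {
    let digits = input
        .get(0..4)
        .ok_or_else(|| ParseError::InvalidFormat(input.to_string()))?;
    Ok(digits.parse::<i32>()?) // ParseIntError -> ParseError via From
}
```

Callers can then match on the variant to separate malformed input from numeric overflow, or simply propagate the error with `?`.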

Debug Mode

let parser = UniversalDateParser::builder()
    .debug_mode(true)
    .build();

match parser.parse("ambiguous/date/format") {
    Ok(result) => println!("Success: {:?}", result),
    Err(e) => {
        println!("Error: {}", e);
        // Debug mode outputs detailed parsing steps
    }
}

Testing and Benchmarking

Unit Test Examples

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_iso_8601_parsing() {
        let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
        let result = parser.parse("2023-12-25T10:30:00Z").unwrap();
        
        assert_eq!(result.year, 2023);
        assert_eq!(result.month, 12);
        assert_eq!(result.day, 25);
        assert_eq!(result.hour, Some(10));
        assert_eq!(result.minute, Some(30));
        assert_eq!(result.second, Some(0));
    }

    #[test]
    fn test_ambiguous_format_resolution() {
        let parser = UniversalDateParser::builder()
            .ambiguity_resolver(AmbiguityResolver::PreferEuropean)
            .build();
        
        let result = parser.parse("01/02/2023").unwrap();
        assert_eq!(result.month, 2); // European format: 1st February
        assert_eq!(result.day, 1);
    }
}

Performance Testing

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use universal_date_parser::{TimezoneMode, UniversalDateParser};

fn benchmark_parsing(c: &mut Criterion) {
    let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
    
    c.bench_function("ISO 8601 parsing", |b| {
        b.iter(|| parser.parse(black_box("2023-12-25T10:30:00Z")))
    });
    
    c.bench_function("Unix timestamp parsing", |b| {
        b.iter(|| parser.parse(black_box("1703520645")))
    });
}

criterion_group!(benches, benchmark_parsing);
criterion_main!(benches);

Frequently Asked Questions and Troubleshooting

Q: How are ambiguous date formats handled?

A: The library resolves ambiguity using the following strategies:

  1. Contextual Analysis: Infer from other date patterns in the same input
  2. Configurable Resolution: Specify regional preferences using AmbiguityResolver
  3. Error Reporting: Clear error messages when resolution is impossible

// Ambiguity resolution configuration example
let parser = UniversalDateParser::builder()
    .ambiguity_resolver(AmbiguityResolver::PreferUS)  // Prefer MM/DD/YYYY
    .build();
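
The first strategy, contextual analysis, can be illustrated without the library. The sketch below is an assumption about how such inference might work: it scans a batch of slash-separated dates and, if any entry can only be read one way, applies that field order to the whole batch. Order and infer_order are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Order {
    MonthFirst, // MM/DD/YYYY
    DayFirst,   // DD/MM/YYYY
}

/// Infer the field order from unambiguous entries in the same batch.
/// Returns None when there is no evidence or the evidence conflicts.
fn infer_order(dates: &[&str]) -> Option<Order> {
    let mut seen: Option<Order> = None;
    for d in dates {
        let mut parts = d.split('/').map(|p| p.parse::<u32>().ok());
        let first = parts.next().flatten();
        let second = parts.next().flatten();
        let hint = match (first, second) {
            (Some(a), Some(b)) if a > 12 && b <= 12 => Order::DayFirst,
            (Some(a), Some(b)) if b > 12 && a <= 12 => Order::MonthFirst,
            _ => continue, // ambiguous or malformed: no evidence either way
        };
        match seen {
            Some(prev) if prev != hint => return None, // conflicting evidence
            _ => seen = Some(hint),
        }
    }
    seen
}
```

In a batch containing "01/02/2023" and "25/12/2023", the second entry can only be day-first, so the first would be read as 1 February. When this returns None, the configured preference or an error is the remaining option.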

Q: Can I add custom date formats?

A: Direct custom format addition is not supported in the current version but is planned for future releases. As a workaround, preprocess your input to convert to standard formats.

Q: How can I maximize performance?

A: We recommend the following optimizations:

  1. Appropriate Timezone Mode: AssumeUtc provides the fastest performance
  2. Parser Reuse: Reduce instance creation overhead
  3. Batch Processing: Process large datasets in bulk

// Optimized usage example (par_iter comes from the rayon crate)
use rayon::prelude::*;

let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
let dates: Vec<&str> = vec![/* large collection of date strings */];

let results: Vec<_> = dates.par_iter()  // Parallel processing with rayon
    .map(|date| parser.parse(date))
    .collect();
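
If adding rayon is not an option, a similar batch pattern can be sketched with scoped threads from the standard library (Rust 1.63 or later). Here parse_stub is a stand-in for the real parser call, and the chunking strategy is an assumption, not something the library prescribes.

```rust
use std::thread;

/// Stand-in for a real parse call: extracts the leading four-digit year.
fn parse_stub(s: &str) -> Option<u32> {
    s.get(0..4)?.parse().ok()
}

/// Split the batch into contiguous chunks, parse each chunk on its own
/// thread, and return the results in the original order.
fn parse_batch(dates: &[&str], workers: usize) -> Vec<Option<u32>> {
    let workers = workers.max(1);
    // Ceiling division, with a minimum of 1 because chunks(0) panics
    let chunk = ((dates.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        let handles: Vec<_> = dates
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|d| parse_stub(d)).collect::<Vec<_>>()))
            .collect();

        // Joining in spawn order keeps the output aligned with the input
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

At a few hundred nanoseconds per parse, thread startup costs dominate for small inputs, so a pattern like this is only likely to pay off for large batches.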

Q: How can I reduce memory usage?

A: The library is already designed for minimal memory usage, but for further reduction:

  1. Avoid unnecessary string cloning
  2. Process parse results immediately
  3. Use streaming processing for large datasets
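
A minimal sketch of the streaming approach is shown below, using one reusable line buffer so that memory use stays flat regardless of input size. The function names are hypothetical, and looks_like_iso_date stands in for a real parse call.

```rust
use std::io::{self, BufRead};

/// Stand-in for a real parse call: checks the `YYYY-MM-DD` shape only.
fn looks_like_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10
        && b.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        })
}

/// Read line by line and handle each result immediately,
/// reusing one buffer instead of collecting all lines first.
fn count_iso_dates<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut buf = String::new();
    let mut count = 0;
    loop {
        buf.clear(); // keeps the allocation, drops the old contents
        if reader.read_line(&mut buf)? == 0 {
            break; // end of input
        }
        if looks_like_iso_date(buf.trim()) {
            count += 1;
        }
    }
    Ok(count)
}
```

Any BufRead source works here, for example a BufReader wrapped around a file or a locked stdin handle.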

Future Development Roadmap

Version 0.2.0 (Planned)

  • Natural Language Parsing: Support for "yesterday", "next week", etc.
  • Custom Format Support: User-defined pattern support
  • Localization: Multi-language month/day name support

Version 0.3.0 (Planned)

  • Date Range Parsing: "from 2023-01-01 to 2023-12-31"
  • Fuzzy Matching: Automatic correction of minor input errors
  • Performance Improvements: SIMD instruction utilization

Long-term Plans

  • Machine Learning: AI-based format inference
  • Plugin System: Third-party extension support
  • Visual Tools: Parse rule visualization

Contributing

How to Contribute

  1. Fork the Repository
  2. Create Feature Branch: git checkout -b feature/amazing-feature
  3. Commit Changes: git commit -m 'Add amazing feature'
  4. Push Branch: git push origin feature/amazing-feature
  5. Create Pull Request

Coding Guidelines

  • Rustfmt: Format code with cargo fmt
  • Clippy: Static analysis with cargo clippy
  • Testing: Add tests for all new features
  • Documentation: Document all public APIs

Bug Reports

When reporting bugs, please include:

  • Rust version
  • Library version
  • Input data example
  • Expected result
  • Actual result
  • Error messages

Conclusion

Universal Date Parser provides a comprehensive date parsing solution for the Rust ecosystem. Combining high performance, flexibility, and ease of use, it significantly simplifies date processing across various applications.

Key Benefits

  • Unified API: Handle multiple formats through a single interface
  • High Performance: Sub-microsecond parsing in the benchmarks reported above
  • Reliability: Quality assured through comprehensive test suites
  • Extensibility: Architecture designed for future feature additions

Application Areas

  • Log Analysis Systems
  • Data Migration Tools
  • API Integration
  • Real-time Processing
  • Batch Processing Applications

Leveraging Rust's safety and performance capabilities, the library hides date parsing complexity while providing developers with a powerful and user-friendly tool.


Project Links:

License: MIT / Apache 2.0 dual license
