Introduction
Universal Date Parser is a high-performance Rust library designed to intelligently parse dates from virtually any format into standardized output. Built with performance, reliability, and ease of use in mind, it provides automatic format detection, timezone awareness, and multi-language bindings for seamless integration across different platforms and programming languages.
Library Features
Key Capabilities
- Automatic Format Detection: Intelligently recognizes multiple date formats without configuration
- High Performance: Ultra-fast processing at 300-600 nanoseconds per parse operation
- Timezone Awareness: Flexible timezone handling options for global applications
- Multi-Language Support: C FFI, WebAssembly, and Python bindings available
- Zero-Copy Architecture: Memory-efficient implementation avoiding unnecessary allocations
- Ambiguity Resolution: Smart handling of region-specific formats (MM/DD vs DD/MM)
Supported Date Formats
ISO 8601 Standard
// Complete ISO 8601 support
"2023-12-25T10:30:00Z" // UTC datetime
"2023-12-25T10:30:00+09:00" // Timezone-aware
"2023-12-25" // Date only
"20231225T103000Z" // Compact format
Regional Formats
// US format (Month/Day/Year)
"12/25/2023" // MM/DD/YYYY
// European format (Day/Month/Year)
"25/12/2023" // DD/MM/YYYY
// Japanese format
"2023年12月25日"
"2023/12/25"
Unix Timestamps
// Second precision
"1703520645" // Seconds since epoch
// Millisecond precision
"1703520645000" // Milliseconds since epoch
Architecture Overview
Core Components
ParsedDate Structure
The fundamental data structure representing a successfully parsed date:
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedDate {
pub year: i32,
pub month: u32,
pub day: u32,
pub hour: Option<u32>,
pub minute: Option<u32>,
pub second: Option<u32>,
pub timezone_offset: Option<i32>,
pub detected_format: String,
}
UniversalDateParser
The main parsing engine with configurable options:
pub struct UniversalDateParser {
timezone_mode: TimezoneMode,
ambiguity_resolver: AmbiguityResolver,
}
Format Detection Engine
Intelligent pattern matching for automatic format recognition:
pub fn detect_format(&self, input: &str) -> Option<&'static str> {
// ISO 8601 formats (highest priority)
if ISO_DATETIME_REGEX.is_match(input) {
return Some("ISO 8601 DateTime");
}
// Unix timestamps
if UNIX_TIMESTAMP_REGEX.is_match(input) {
return if input.len() > 10 {
Some("Unix Timestamp (ms)")
} else {
Some("Unix Timestamp")
};
}
// Regional formats with ambiguity resolution
// ... sophisticated pattern matching
}
Design Principles
Performance First
- Zero-copy parsing where possible
- Efficient regex compilation using
lazy_static
- Early return optimization in pattern matching
- Benchmarked performance: 300-600ns per parse operation
Intelligent Detection
The parsing engine detects formats in the following priority order:
- ISO 8601 Standard (highest priority)
- Unix Timestamps
- Unambiguous Regional Formats
- Contextual Ambiguous Format Resolution
Timezone Awareness
Flexible timezone handling options:
- AssumeUtc: Treat all dates as UTC (fastest)
- AssumeLocal: Use system timezone
- PreserveOffset: Maintain original timezone information
- ConvertToUtc: Convert all dates to UTC
Installation and Basic Usage
Adding to Cargo Project
[dependencies]
universal-date-parser = "0.1.0"
Basic Usage Examples
use universal_date_parser::{UniversalDateParser, TimezoneMode};
fn main() {
let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
// Parse various formats
let inputs = vec![
"2023-12-25T10:30:00Z",
"12/25/2023",
"25/12/2023",
"1703520645",
"December 25, 2023",
];
for input in inputs {
match parser.parse(input) {
Ok(parsed) => {
println!("Input: {} -> Result: {:?}", input, parsed);
println!("Detected format: {}", parsed.detected_format);
},
Err(e) => println!("Parse error: {}", e),
}
}
}
Advanced Configuration
use universal_date_parser::{
UniversalDateParser,
TimezoneMode,
AmbiguityResolver
};
fn main() {
// Create parser with custom configuration
let parser = UniversalDateParser::builder()
.timezone_mode(TimezoneMode::PreserveOffset)
.ambiguity_resolver(AmbiguityResolver::PreferEuropean)
.build();
// Parse ambiguous date
let ambiguous_date = "01/02/2023";
match parser.parse(ambiguous_date) {
Ok(parsed) => {
// With PreferEuropean setting, 01/02/2023 is interpreted as February 1st
println!("Parsed result: {}-{:02}-{:02}",
parsed.year, parsed.month, parsed.day);
},
Err(e) => println!("Error: {}", e),
}
}
Performance Characteristics
Benchmark Results
Comprehensive benchmark results using Criterion:
Format Type | Average Time | Throughput |
---|---|---|
ISO 8601 DateTime | 388ns | 2.6M ops/sec |
ISO 8601 Date | 344ns | 2.9M ops/sec |
US Format | 392ns | 2.6M ops/sec |
European Format | 618ns | 1.6M ops/sec |
Unix Timestamp | 483ns | 2.1M ops/sec |
Unix Timestamp (ms) | 598ns | 1.7M ops/sec |
Memory Usage
- Minimal heap allocations
- Static regex pattern usage
- Stack-based parsing for maximum memory efficiency
Optimization Techniques
Zero-Copy Parsing
// Efficient parsing avoiding string copies
pub fn parse_iso_date(&self, input: &str) -> Result<ParsedDate, ParseError> {
// Direct string slice manipulation for performance
let year: i32 = input[0..4].parse()?;
let month: u32 = input[5..7].parse()?;
let day: u32 = input[8..10].parse()?;
// ...
}
Lazy Regex Compilation
lazy_static! {
static ref ISO_DATETIME_REGEX: Regex =
Regex::new(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$")
.unwrap();
}
Multi-Language Bindings
C FFI
Usage example from C:
#include "universal_date_parser.h"
int main() {
UniversalDateParser* parser = universal_parser_new();
ParsedDate result;
int status = universal_parser_parse(parser, "2023-12-25", &result);
if (status == 0) {
printf("Parse success: %d-%d-%d\n", result.year, result.month, result.day);
}
universal_parser_free(parser);
return 0;
}
WebAssembly
Usage from JavaScript/TypeScript:
import { UniversalDateParser } from './pkg/universal_date_parser.js';
const parser = new UniversalDateParser();
const result = parser.parse("2023-12-25T10:30:00Z");
console.log(`Parse result: ${result.year}-${result.month}-${result.day}`);
console.log(`Detected format: ${result.detected_format}`);
Python Bindings
Python support using PyO3:
import universal_date_parser
parser = universal_date_parser.UniversalDateParser()
result = parser.parse("2023-12-25T10:30:00Z")
print(f"Parse result: {result.year}-{result.month}-{result.day}")
print(f"Detected format: {result.detected_format}")
Practical Use Cases
Log Analysis System
use universal_date_parser::{UniversalDateParser, TimezoneMode};
use std::collections::HashMap;
struct LogAnalyzer {
parser: UniversalDateParser,
stats: HashMap<String, u32>,
}
impl LogAnalyzer {
fn new() -> Self {
Self {
parser: UniversalDateParser::new(TimezoneMode::AssumeUtc),
stats: HashMap::new(),
}
}
fn analyze_log_entry(&mut self, log_line: &str) {
// Extract date portion from log entry
if let Some(date_part) = self.extract_date_from_log(log_line) {
match self.parser.parse(&date_part) {
Ok(parsed) => {
let format = parsed.detected_format.clone();
*self.stats.entry(format).or_insert(0) += 1;
},
Err(_) => {
*self.stats.entry("unknown".to_string()).or_insert(0) += 1;
}
}
}
}
fn extract_date_from_log(&self, log_line: &str) -> Option<String> {
// Date extraction logic based on log format
// Real implementation would use regex patterns
None
}
}
Database Migration Tool
use universal_date_parser::{UniversalDateParser, TimezoneMode};
struct DataMigrator {
parser: UniversalDateParser,
}
impl DataMigrator {
fn new() -> Self {
Self {
parser: UniversalDateParser::new(TimezoneMode::ConvertToUtc),
}
}
fn migrate_date_column(&self, values: Vec<String>) -> Vec<String> {
values.into_iter()
.map(|date_str| {
match self.parser.parse(&date_str) {
Ok(parsed) => {
// Standardize to ISO 8601 UTC format
format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
parsed.year,
parsed.month,
parsed.day,
parsed.hour.unwrap_or(0),
parsed.minute.unwrap_or(0),
parsed.second.unwrap_or(0)
)
},
Err(_) => date_str, // Keep original value on parse failure
}
})
.collect()
}
}
API Response Normalization
use universal_date_parser::{UniversalDateParser, TimezoneMode};
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize)]
struct ApiResponse {
id: u64,
#[serde(deserialize_with = "parse_flexible_date")]
created_at: String,
data: serde_json::Value,
}
fn parse_flexible_date<'de, D>(deserializer: D) -> Result<String, D::Error>
where
D: serde::Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
let parser = UniversalDateParser::new(TimezoneMode::ConvertToUtc);
match parser.parse(&s) {
Ok(parsed) => Ok(format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
parsed.year,
parsed.month,
parsed.day,
parsed.hour.unwrap_or(0),
parsed.minute.unwrap_or(0),
parsed.second.unwrap_or(0)
)),
Err(_) => Err(serde::de::Error::custom("Invalid date format")),
}
}
Error Handling and Debugging
Error Types
#[derive(Debug, Clone)]
pub enum ParseError {
InvalidFormat(String),
InvalidDate(String),
AmbiguousFormat(String),
TimezoneParseError(String),
NumericParseError(String),
}
impl std::fmt::Display for ParseError {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
ParseError::InvalidFormat(msg) => write!(f, "Invalid format: {}", msg),
ParseError::InvalidDate(msg) => write!(f, "Invalid date: {}", msg),
ParseError::AmbiguousFormat(msg) => write!(f, "Ambiguous format: {}", msg),
ParseError::TimezoneParseError(msg) => write!(f, "Timezone error: {}", msg),
ParseError::NumericParseError(msg) => write!(f, "Numeric conversion error: {}", msg),
}
}
}
Debug Mode
let parser = UniversalDateParser::builder()
.debug_mode(true)
.build();
match parser.parse("ambiguous/date/format") {
Ok(result) => println!("Success: {:?}", result),
Err(e) => {
println!("Error: {}", e);
// Debug mode outputs detailed parsing steps
}
}
Testing and Benchmarking
Unit Test Examples
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_iso_8601_parsing() {
let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
let result = parser.parse("2023-12-25T10:30:00Z").unwrap();
assert_eq!(result.year, 2023);
assert_eq!(result.month, 12);
assert_eq!(result.day, 25);
assert_eq!(result.hour, Some(10));
assert_eq!(result.minute, Some(30));
assert_eq!(result.second, Some(0));
}
#[test]
fn test_ambiguous_format_resolution() {
let parser = UniversalDateParser::builder()
.ambiguity_resolver(AmbiguityResolver::PreferEuropean)
.build();
let result = parser.parse("01/02/2023").unwrap();
assert_eq!(result.month, 2); // European format: 1st February
assert_eq!(result.day, 1);
}
}
Performance Testing
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn benchmark_parsing(c: &mut Criterion) {
let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
c.bench_function("ISO 8601 parsing", |b| {
b.iter(|| parser.parse(black_box("2023-12-25T10:30:00Z")))
});
c.bench_function("Unix timestamp parsing", |b| {
b.iter(|| parser.parse(black_box("1703520645")))
});
}
criterion_group!(benches, benchmark_parsing);
criterion_main!(benches);
Frequently Asked Questions and Troubleshooting
Q: How are ambiguous date formats handled?
A: The library resolves ambiguity using the following strategies:
- Contextual Analysis: Infer from other date patterns in the same input
-
Configurable Resolution: Specify regional preferences using
AmbiguityResolver
- Error Reporting: Clear error messages when resolution is impossible
// Ambiguity resolution configuration example
let parser = UniversalDateParser::builder()
.ambiguity_resolver(AmbiguityResolver::PreferUS) // Prefer MM/DD/YYYY
.build();
Q: Can I add custom date formats?
A: Direct custom format addition is not supported in the current version but is planned for future releases. As a workaround, preprocess your input to convert to standard formats.
Q: How can I maximize performance?
A: We recommend the following optimizations:
-
Appropriate Timezone Mode:
AssumeUtc
provides the fastest performance - Parser Reuse: Reduce instance creation overhead
- Batch Processing: Process large datasets in bulk
// Optimized usage example
let parser = UniversalDateParser::new(TimezoneMode::AssumeUtc);
let dates: Vec<&str> = vec![/* large collection of date strings */];
let results: Vec<_> = dates.par_iter() // Parallel processing with rayon
.map(|date| parser.parse(date))
.collect();
Q: How can I reduce memory usage?
A: The library is already designed for minimal memory usage, but for further reduction:
- Avoid unnecessary string cloning
- Process parse results immediately
- Use streaming processing for large datasets
Future Development Roadmap
Version 0.2.0 (Planned)
- Natural Language Parsing: Support for "yesterday", "next week", etc.
- Custom Format Support: User-defined pattern support
- Localization: Multi-language month/day name support
Version 0.3.0 (Planned)
- Date Range Parsing: "from 2023-01-01 to 2023-12-31"
- Fuzzy Matching: Automatic correction of minor input errors
- Performance Improvements: SIMD instruction utilization
Long-term Plans
- Machine Learning: AI-based format inference
- Plugin System: Third-party extension support
- Visual Tools: Parse rule visualization
Contributing
How to Contribute
- Fork the Repository
-
Create Feature Branch:
git checkout -b feature/amazing-feature
-
Commit Changes:
git commit -m 'Add amazing feature'
-
Push Branch:
git push origin feature/amazing-feature
- Create Pull Request
Coding Guidelines
-
Rustfmt: Format code with
cargo fmt
-
Clippy: Static analysis with
cargo clippy
- Testing: Add tests for all new features
- Documentation: Document all public APIs
Bug Reports
When reporting bugs, please include:
- Rust version
- Library version
- Input data example
- Expected result
- Actual result
- Error messages
Conclusion
Universal Date Parser provides a comprehensive date parsing solution for the Rust ecosystem. Combining high performance, flexibility, and ease of use, it significantly simplifies date processing across various applications.
Key Benefits
- Unified API: Handle multiple formats through a single interface
- High Performance: Production-proven speed and efficiency
- Reliability: Quality assured through comprehensive test suites
- Extensibility: Architecture designed for future feature additions
Application Areas
- Log Analysis Systems
- Data Migration Tools
- API Integration
- Real-time Processing
- Batch Processing Applications
Leveraging Rust's safety and performance capabilities, the library hides date parsing complexity while providing developers with a powerful and user-friendly tool.
Project Links:
- GitHub: https://github.com/rust-core-libs/universal-date-parser
- Crates.io: https://crates.io/crates/universal-date-parser
- Documentation: https://docs.rs/universal-date-parser
License: MIT / Apache 2.0 dual license