🇫🇷 From “Vingt et Un” to 21: Building a Lightning-Fast French Number Parser in Ruby
Ever tried to parse French numbers in your Ruby application?
If you’ve worked with French text data, you know the pain: “quatre-vingt-quatorze” should become 94, “trois millions deux cent mille” should transform to 3,200,000, but good luck finding a performant solution that handles all the linguistic quirks of French numbers!
That’s exactly the problem I set out to solve with StringToNumber, a high-performance Ruby gem that converts French written numbers into their numeric equivalents with blazing speed and bulletproof reliability.
🤔 The Problem: French Numbers Are… Complex
French numbers aren’t just “difficult”, they’re linguistically fascinating and computationally challenging:
- Special cases: “quatre-vingts” (80) vs “quatre-vingt-un” (81)
- Compound forms: “soixante-dix” (literally “sixty-ten” = 70)
- Multiple formats: “vingt-et-un” vs “vingt et un” vs “vingt-un”
- Scale handling: “deux millions trois cent mille” (2,300,000)
Most existing solutions either:
- ❌ Don’t handle edge cases correctly
- ⚠️ Have terrible performance for large datasets
- 💔 Lack proper caching and memory management
- 🐛 Break on complex compound numbers
✨ The Solution: StringToNumber Gem
StringToNumber tackles these challenges head-on with a dual-architecture approach:
🚀 Performance That Scales
- Up to 460x faster than naive implementations
- Intelligent LRU caching with thread-safe operations
- Pre-compiled regex patterns eliminate compilation overhead
- Zero-allocation matching for common cases
🎯 Comprehensive French Support
require 'string_to_number'
# Basic numbers
StringToNumber.in_numbers('quinze') #=> 15
StringToNumber.in_numbers('quatre-vingts') #=> 80
# Complex compounds
StringToNumber.in_numbers('soixante-dix-sept') #=> 77
StringToNumber.in_numbers('quatre-vingt-quatorze') #=> 94
# Large numbers
StringToNumber.in_numbers('deux millions trois cent mille') #=> 2_300_000
StringToNumber.in_numbers('neuf mille neuf cent quatre-vingt-dix-neuf') #=> 9999
🛡️ Production-Ready Features
- Thread-safe concurrent operations
- Input validation with helpful error messages
- Memory efficient with configurable cache limits
- Backward compatibility mode for testing
📊 Performance That Will Blow Your Mind
Here’s where StringToNumber really shines. Check out these benchmark results:
+--------+------------+----------+----------------+-------------+--------+
| Input | Complexity | Original | StringToNumber | Improvement | |
+--------+------------+----------+----------------+-------------+--------+
| Short | numbers | 0.5ms | 0.035ms | 14x | faster |
| Medium | complexity | 2.1ms | 0.045ms | 47x | faster |
| Long | compounds | 23ms | 0.05ms | 460x | faster |
+--------+------------+----------+----------------+-------------+--------+
# This processes 800,000+ conversions per second! 🔥
1000.times { StringToNumber.in_numbers('vingt et un') }
🎯 Real-World Use Cases
Who should use StringToNumber?
- 🏦 Financial apps processing French invoices/documents
- 📊 Data pipelines cleaning French numerical text
- 🤖 NLP projects working with French language data
- 📱 Mobile apps supporting French localization
- 🔍 Search engines normalizing French numerical queries
- 📈 Analytics platforms parsing French business data
Example: Processing French Financial Data
# Clean messy financial data
invoices = [
"Montant: trois mille deux cent euros",
"Total: quinze mille neuf cent vingt",
"Crédit: un million deux cent mille"
]
amounts = invoices.map do |invoice|
number_text = invoice.match(/: (.+) euros?/)&.[](1) || invoice.match(/: (.+)$/)&.[](1)
StringToNumber.in_numbers(number_text) if number_text
end
#=> [3200, 15920, 1200000]
🚀 Quick Start Guide
Installation
gem install string_to_number
# or add to Gemfile
gem 'string_to_number'
Basic Usage
require 'string_to_number'
# Convert any French number
result = StringToNumber.in_numbers('mille deux cent trente-quatre')
puts result #=> 1234
# Validate input before processing
if StringToNumber.valid_french_number?('vingt et un')
puts StringToNumber.in_numbers('vingt et un') #=> 21
end
# Check performance stats
stats = StringToNumber.cache_stats
puts "Cache hit ratio: #{stats[:cache_hit_ratio]}"
Advanced Features
# Batch processing with automatic caching
french_numbers = ['un', 'deux', 'trois', 'vingt', 'cent']
results = french_numbers.map { |num| StringToNumber.in_numbers(num) }
# Memory management for long-running processes
StringToNumber.clear_caches! # Reset when processing new datasets
# Backward compatibility testing
old_result = StringToNumber.in_numbers('cent', use_optimized: false)
new_result = StringToNumber.in_numbers('cent', use_optimized: true)
puts old_result == new_result #=> true
🏗️ Under the Hood: Architecture Highlights
The gem uses a sophisticated dual-parser approach:
- Optimized Parser (default): High-performance with caching
- Original Parser: Reference implementation for compatibility
Key optimizations include:
- LRU caching with thread-safe mutex protection
- Instance memoization reduces initialization overhead
- Pre-compiled regex patterns eliminate compilation costs
- Intelligent word matching with zero allocations
🤝 Join the Community!
StringToNumber is just getting started, and I’d love your help making it even better!
💡 Ways to Contribute:
- ⭐ Star the repo on https://github.com/FabienPiette/string_to_number
- 🐛 Report bugs or edge cases you discover
- 💻 Submit PRs for new features or optimizations
- 📝 Share your use cases and success stories
📧 Let’s Connect: - Found an edge case? https://github.com/FabienPiette/string_to_number/issues
- Built something cool? Tweet about it and tag me! (https://bsky.app/profile/fabijordgrimsson.bsky.social)
- Have questions? The documentation covers common scenarios
🎯 What’s Next?
I’m actively working on:
- Regional variants support (Belgian/Swiss French)
- Decimal number parsing (“trois virgule quatorze”)
- Ordinal numbers (“premier”, “deuxième”)
Ready to supercharge your French number processing? Install StringToNumber today and transform your text processing pipeline from sluggish to lightning-fast!
gem install string_to_number
Your French data deserves better than regex soup. Give it the StringToNumber treatment! 🚀
What do you think? I’d love to hear how StringToNumber works in your projects, drop me a line or star the repo if it saves you time!
Happy coding! 🇫🇷💎