Building a Simple Email Spam Classifier in Rust with SmartCore
Machine learning is often associated with Python, but the Rust ecosystem is quickly catching up! In this article, I’ll walk you through building a simple email spam classifier in Rust using the SmartCore machine learning library. This project is perfect for Rustaceans curious about ML, or anyone looking for a practical, hackable example.
Why Rust for Machine Learning?
Rust is known for its safety, speed, and growing ecosystem. While Python dominates the ML world, Rust offers:
- Memory safety without garbage collection
- Fast execution from ahead-of-time compiled native code
- Strong type system for fewer runtime errors
- A rapidly expanding set of data and ML libraries
Project Overview
We’ll build a K-Nearest Neighbors (KNN) classifier that predicts whether an email address is spam or real, based on simple features:
- The length of the email address
- The number of special (non-alphanumeric) characters
Our dataset is a CSV file with two columns, spam and real, each containing example email addresses.
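To make the two features concrete, here is a minimal sketch that turns an address into a feature vector using only the standard library. The helper name `extract_features` is hypothetical and does not appear in the project; it only mirrors the two features listed above.

```rust
/// Hypothetical helper: maps an address to [length, special-character count].
fn extract_features(address: &str) -> [f64; 2] {
    // `len()` counts bytes, which equals the character count for ASCII addresses.
    let length = address.len() as f64;
    // Anything that is not a letter or digit counts as "special" ('@', '.', '-', ...).
    let specials = address.chars().filter(|c| !c.is_alphanumeric()).count() as f64;
    [length, specials]
}

fn main() {
    for address in ["winner@spammy.com", "john.doe@example.com"] {
        println!("{} -> {:?}", address, extract_features(address));
    }
}
```

Every address contains at least an '@' and usually a '.', so the special-character count is rarely below 2. That is worth keeping in mind when judging how much this feature can separate the classes.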
The Dataset
Here’s a sample of our public/spam.csv:
spam,real
winner@spammy.com,john.doe@example.com
cheapmeds@pharmacy.com,info@company.com
bankalert@fraud.com,support@service.com
...
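Because each row holds one spam address and one real address, every line yields two labeled samples. The sketch below shows that unpacking on the sample rows with only the standard library. The project itself uses the `csv` crate. `parse_rows` is a hypothetical name, and the code assumes a header row and no quoted fields or embedded commas.

```rust
/// Hypothetical helper: splits "spam,real" rows into (address, label) pairs,
/// where label 1 means spam and 0 means real.
/// Assumes a header row and no quoted fields or embedded commas.
fn parse_rows(csv_text: &str) -> Vec<(String, i32)> {
    let mut samples = Vec::new();
    // Skip the "spam,real" header line.
    for line in csv_text.lines().skip(1) {
        // Lines without a comma (blank or malformed) are ignored rather than treated as errors.
        if let Some((spam, real)) = line.split_once(',') {
            samples.push((spam.trim().to_string(), 1));
            samples.push((real.trim().to_string(), 0));
        }
    }
    samples
}

fn main() {
    let sample = "spam,real\n\
                  winner@spammy.com,john.doe@example.com\n\
                  cheapmeds@pharmacy.com,info@company.com\n\
                  bankalert@fraud.com,support@service.com\n";
    for (address, label) in parse_rows(sample) {
        println!("{} -> {}", label, address);
    }
}
```

Note that the unpacked samples alternate between spam and real, which matters later if you split the data by position.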
The Code
Below is the full Rust code for the classifier. It reads the CSV, extracts features, trains a KNN model, evaluates accuracy, and lets you check if any email is spam.
Repo -> https://github.com/DevWonder01/rust-ml
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::neighbors::knn_classifier::KNNClassifier;
use smartcore::metrics::accuracy;
use serde::{Deserialize, Serialize};
use std::error::Error;
use std::fs::File;
use csv::Reader;

// One CSV row: a spam address and a real address side by side
#[derive(Deserialize, Serialize, Debug, Clone)]
struct Email {
    spam: String,
    real: String,
}

// Counts characters that are neither letters nor digits (e.g. '@', '.', '-')
fn count_special_chars(s: &str) -> usize {
    s.chars().filter(|c| !c.is_alphanumeric()).count()
}

fn process_data() -> Result<(DenseMatrix<f64>, Vec<i32>), Box<dyn Error>> {
    let filename = "public/spam.csv";
    let file = File::open(filename)?;
    let mut rdr = Reader::from_reader(file);

    // Features are stored row by row: [length, specials, length, specials, ...]
    let mut data: Vec<f64> = Vec::new();
    let mut labels: Vec<i32> = Vec::new();
    let mut rows = 0;

    for result in rdr.records() {
        let record = result?;
        // With no header record supplied, fields map to `Email` by position
        let email: Email = record.deserialize(None)?;
        println!("{:?}", email);

        // Features for spam email (label 1)
        data.push(email.spam.len() as f64);
        data.push(count_special_chars(&email.spam) as f64);
        labels.push(1);

        // Features for real email (label 0)
        data.push(email.real.len() as f64);
        data.push(count_special_chars(&email.real) as f64);
        labels.push(0);

        rows += 2;
    }

    // `false` = row-major: `data` was filled one sample at a time.
    // Passing `true` (column-major) would scramble the two features.
    let matrix = DenseMatrix::new(rows, 2, data, false);
    println!("{:?}", matrix);
    println!("{:?}", labels);
    Ok((matrix, labels))
}

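// SmartCore 0.3 parameterizes the classifier over the feature type, label type,
// matrix type, label container, and distance metric (Euclidean by default).
fn predict_email(
    knn: &KNNClassifier<
        f64,
        i32,
        DenseMatrix<f64>,
        Vec<i32>,
        smartcore::metrics::distance::euclidian::Euclidian<f64>,
    >,
    email: &str,
) -> Result<(), Box<dyn Error>> {
    // Build a 1 x 2 matrix holding the same two features used for training
    let features = vec![email.len() as f64, count_special_chars(email) as f64];
    let feature_slice = [&features[..]];
    let prediction = knn.predict(&DenseMatrix::from_2d_array(&feature_slice))?;
    if prediction[0] == 1 {
        println!("{} is predicted as SPAM", email);
    } else {
        println!("{} is predicted as REAL", email);
    }
    Ok(())
}
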
fn main() -> Result<(), Box<dyn Error>> {
    let (matrix, labels) = process_data()?;

    // Train KNN classifier
    let knn = KNNClassifier::fit(&matrix, &labels, Default::default())?;

    // Predict on the training data
    let predictions = knn.predict(&matrix)?;

    // Calculate accuracy
    let acc = accuracy(&labels, &predictions);
    println!("Accuracy: {}", acc);

    // Example: check if a single email is spam
    let test_email = "winner@gmail.com";
    predict_email(&knn, test_email)?;
    Ok(())
}
How It Works
- Feature Engineering: For each email, we extract two features: the length and the number of special characters.
- Model Training: We use SmartCore’s KNN classifier, training it on all the data.
- Prediction: The model predicts whether a new email address is spam or real based on its features.
- Evaluation: The code prints the model’s accuracy on the training data.
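To show what the KNN and accuracy steps do conceptually, here is a from-scratch sketch using only the standard library. It is not SmartCore's implementation. The function names are hypothetical, the search is brute force with squared Euclidean distance and a majority vote, and k = 3 is an illustrative choice. The training rows are the features of the six sample addresses shown earlier.

```rust
/// Squared Euclidean distance; skipping the square root does not change the ordering.
fn squared_distance(a: &[f64; 2], b: &[f64; 2]) -> f64 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)
}

/// Hypothetical brute-force KNN: returns 1 (spam) if most of the `k`
/// nearest training rows are spam, otherwise 0 (real).
fn knn_predict(train: &[([f64; 2], i32)], query: &[f64; 2], k: usize) -> i32 {
    let mut by_distance: Vec<(f64, i32)> = train
        .iter()
        .map(|(features, label)| (squared_distance(features, query), *label))
        .collect();
    // Stable sort: equally distant rows keep their original order.
    by_distance.sort_by(|a, b| a.0.total_cmp(&b.0));
    let spam_votes = by_distance
        .iter()
        .take(k)
        .filter(|(_, label)| *label == 1)
        .count();
    if spam_votes * 2 > k.min(train.len()) { 1 } else { 0 }
}

/// Fraction of predictions that match the true labels.
fn accuracy(y_true: &[i32], y_pred: &[i32]) -> f64 {
    let correct = y_true.iter().zip(y_pred).filter(|(t, p)| t == p).count();
    correct as f64 / y_true.len() as f64
}

fn main() {
    // [length, special-character count] for the six sample addresses; 1 = spam, 0 = real.
    let train = [
        ([17.0, 2.0], 1), // winner@spammy.com
        ([20.0, 3.0], 0), // john.doe@example.com
        ([22.0, 2.0], 1), // cheapmeds@pharmacy.com
        ([16.0, 2.0], 0), // info@company.com
        ([19.0, 2.0], 1), // bankalert@fraud.com
        ([19.0, 2.0], 0), // support@service.com
    ];
    let k = 3; // illustrative choice
    let labels: Vec<i32> = train.iter().map(|(_, label)| *label).collect();
    let predictions: Vec<i32> = train
        .iter()
        .map(|(features, _)| knn_predict(&train, features, k))
        .collect();
    println!("Training accuracy: {}", accuracy(&labels, &predictions));
    // "winner@gmail.com" has 16 characters, 2 of them special.
    println!("Prediction for [16, 2]: {}", knn_predict(&train, &[16.0, 2.0], k));
}
```

In this sample, bankalert@fraud.com and support@service.com both map to [19, 2], so no model built on these two features alone can tell them apart. That is one reason to try richer features.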
Running the Project
- Add dependencies to your Cargo.toml:
    smartcore = "0.3"
    serde = { version = "1.0", features = ["derive"] }
    csv = "1.1"
- Place your dataset at public/spam.csv.
- Run the project:
    cargo run
What’s Next?
- Try more advanced features (e.g., domain analysis, word counts)
- Split your data into training and test sets for a more realistic evaluation
- Experiment with other algorithms in SmartCore
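As a starting point for the first two ideas, the sketch below adds one domain-based feature and a simple holdout split, using only the standard library. Both helpers (`domain_digit_count`, `holdout_split`) are hypothetical names, not part of the project or of SmartCore. The split is deterministic (every n-th sample goes to the test set) to avoid depending on a random-number crate; a real evaluation would normally shuffle first.

```rust
/// Hypothetical extra feature: number of ASCII digits in the domain part
/// (everything after the last '@'). Returns 0 if there is no '@'.
fn domain_digit_count(address: &str) -> usize {
    match address.rsplit_once('@') {
        Some((_, domain)) => domain.chars().filter(|c| c.is_ascii_digit()).count(),
        None => 0,
    }
}

/// Hypothetical holdout split: every `every`-th sample (by position) goes to
/// the test set, the rest to the training set. Deterministic; no shuffling.
fn holdout_split<T: Clone>(samples: &[T], every: usize) -> (Vec<T>, Vec<T>) {
    assert!(every >= 2, "`every` must be at least 2");
    let mut train = Vec::new();
    let mut test = Vec::new();
    for (index, sample) in samples.iter().enumerate() {
        if (index + 1) % every == 0 {
            test.push(sample.clone());
        } else {
            train.push(sample.clone());
        }
    }
    (train, test)
}

fn main() {
    // The six sample addresses from the dataset excerpt; 1 = spam, 0 = real.
    let samples = [
        ("winner@spammy.com", 1),
        ("john.doe@example.com", 0),
        ("cheapmeds@pharmacy.com", 1),
        ("info@company.com", 0),
        ("bankalert@fraud.com", 1),
        ("support@service.com", 0),
    ];
    let (train, test) = holdout_split(&samples, 3);
    println!("train: {:?}", train);
    println!("test:  {:?}", test);
    // Made-up address, used only to show the extra feature.
    println!("digits in domain: {}", domain_digit_count("offers@deals4u.example"));
}
```

Because the samples alternate between spam and real, a stride of 2 would put only real addresses in the test set. A stride of 3 happens to keep both classes here, but shuffling before splitting is the more robust choice.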
Final Thoughts
Rust is ready for machine learning experimentation! With libraries like SmartCore, you can build, train, and deploy ML models with the safety and speed Rust is famous for.
Give it a try and let’s grow the Rust ML community together!
Questions or suggestions? Drop a comment below or connect with me!
#rustlang #machinelearning #rustml #emailsecurity #opensource