🤖 Part 3: Building a Price Recommendation Engine with Pandas and Scikit-Learn
In Part 1 and Part 2, we created a pipeline to crawl smartphone prices and load them into Snowflake. Now, we’ll show how your Data Science team can use that data to:
- Recommend products by price
- Suggest the best day to buy
- Predict future price drops
Let’s walk through a prototype using pandas
, scikit-learn
, and matplotlib
.
📦 Data Snapshot
We’ll use a dataframe like the one below, pulled from the Snowflake PRICING.ECOM_PRICE_INDEX
table:
import pandas as pd
df = pd.read_sql("SELECT * FROM PRICING.ECOM_PRICE_INDEX", connection)
df.head()
product_name | product_price | scraped_at | ingestion_date |
---|---|---|---|
Galaxy S23 Ultra | 1199.99 | 2024-06-01 08:00:00 | 2024-06-01 |
Galaxy S23 Ultra | 1149.99 | 2024-06-05 08:00:00 | 2024-06-05 |
iPhone 14 Pro Max | 999.99 | 2024-06-01 08:00:00 | 2024-06-01 |
iPhone 14 Pro Max | 979.99 | 2024-06-10 08:00:00 | 2024-06-10 |
🧠1. Recommend Lowest Prices by Product
recommendations = df.sort_values('scraped_at') .groupby('product_name') .agg(best_price=('product_price', 'min'),
last_price=('product_price', 'last'),
price_diff=('product_price', lambda x: x.iloc[-1] - x.min()))
📅 2. Find the Best Day to Buy
df['weekday'] = pd.to_datetime(df['scraped_at']).dt.day_name()
best_days = df.groupby(['product_name', 'weekday'])['product_price'].mean().reset_index()
best_day_to_buy = best_days.loc[best_days.groupby('product_name')['product_price'].idxmin()]
📉 3. Predict Future Price Drops
from sklearn.linear_model import LinearRegression
import numpy as np
df['day'] = (pd.to_datetime(df['scraped_at']) - pd.to_datetime(df['scraped_at']).min()).dt.days
predictions = []
for product in df['product_name'].unique():
subset = df[df['product_name'] == product].sort_values('day')
X = subset[['day']]
y = subset['product_price']
if len(X) < 3:
continue
model = LinearRegression().fit(X, y)
next_day = np.array([[X['day'].max() + 1]])
predicted_price = model.predict(next_day)[0]
predictions.append({
'product_name': product,
'predicted_price_next_day': predicted_price,
'trend': 'dropping' if predicted_price < y.iloc[-1] else 'rising'
})
predicted_df = pd.DataFrame(predictions)
📊 Example Output
product_name | predicted_price_next_day | trend |
---|---|---|
Galaxy S23 Ultra | 1129.45 | dropping |
iPhone 14 Pro Max | 982.10 | rising |
🧠Final Note
It’s tempting to jump into advanced AI models — but for many pricing problems, basic data analysis is more than enough.
- 🧮 Simple statistics like averages and min/max reveal buyer trends
- 📅 Time-based aggregation gives powerful forecasting signals
- 🤖 ML can add predictive power, but isn’t always necessary
In many cases, using clean pipelines and descriptive analytics leads to faster insights and more trustworthy decisions than a complex AI model.
✅ Start with the data. Understand it. Then decide whether AI is worth the extra complexity.