Build, refresh, export, and query a local SQLite ride-history database from Gmail ride receipt emails (Uber, Bolt, Yandex Go, Lyft) using LLM extraction from...
Run a reproducible 3-stage pipeline:
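Prerequisites and ground rules:
- gog CLI installed and authenticated for the selected Gmail account.
- Gmail account: read from skills.entries.ride-receipts-llm.config.gmailAccount; if missing, ask the user for the account explicitly.
- Date range: bounded by a YYYY-MM-DD date, or between two dates.
- Extraction reads text_html from emails; do not claim local-only parsing.
- Missing values: null.

Files:
- Schema: skills/ride-receipts-llm/references/schema_rides.sql
- Fetched emails: ./data/ride_emails.jsonl
- Extracted rides: ./data/rides_extracted.jsonl
- Database: ./data/rides.sqlite

python3 skills/ride-receipts-llm/scripts/init_db.py \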
--db ./data/rides.sqlite \
--schema skills/ride-receipts-llm/references/schema_rides.sql
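
As a rough illustration of what the initialization step amounts to, the sketch below applies a SQL schema file to a SQLite database with Python's standard sqlite3 module. The helper name init_db and its behavior are assumptions made for illustration; the actual init_db.py script may work differently.

```python
import sqlite3
from pathlib import Path


def init_db(db_path: str, schema_path: str) -> None:
    """Apply a SQL schema file to a SQLite database (hypothetical helper)."""
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    # Create the parent directory (e.g. ./data) if it does not exist yet.
    Path(db_path).parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        # executescript runs every statement in the schema file in order.
        conn.executescript(schema_sql)
        conn.commit()
    finally:
        conn.close()
```

Running such a helper twice is only safe if the schema file uses statements like CREATE TABLE IF NOT EXISTS, which is an assumption about schema_rides.sql rather than something stated here.
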
python3 skills/ride-receipts-llm/scripts/fetch_emails_jsonl.py \
--account <gmail-account> \
--after YYYY-MM-DD \
--before YYYY-MM-DD \
--max-per-provider 5000 \
--out ./data/ride_emails.jsonl
- Omit --after / --before when not needed.
- Fetched records include text_html.

Read ./data/ride_emails.jsonl; write one JSON object per line to ./data/rides_extracted.jsonl.
Per email:
- Extract amount, currency, pickup, dropoff, payment_method, distance_text, duration_text, start_time_text, end_time_text.
- Set missing fields to null.

Schema (one line per ride):
{
  "provider": "Uber|Bolt|Yandex|Lyft",
  "source": {"gmail_message_id": "...", "email_date": "YYYY-MM-DD HH:MM", "subject": "..."},
  "ride": {
    "start_time_text": "...",
    "end_time_text": "...",
    "total_text": "...",
    "currency": "EUR|PLN|USD|BYN|RUB|UAH|null",
    "amount": 12.34,
    "pickup": "...",
    "dropoff": "...",
    "pickup_city": "...",
    "pickup_country": "...",
    "dropoff_city": "...",
    "dropoff_country": "...",
    "payment_method": "...",
    "driver": "...",
    "distance_text": "...",
    "duration_text": "...",
    "notes": "..."
  }
}
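
A minimal sketch of how an extracted record could be checked against the schema above before it is written to the JSONL file. The function normalize_record and the constants PROVIDERS and RIDE_FIELDS are hypothetical names, not part of the skill's scripts. Requiring gmail_message_id is also an assumption, made here because that field identifies the source email.

```python
from typing import Any

# Allowed providers and ride fields, copied from the schema above.
PROVIDERS = {"Uber", "Bolt", "Yandex", "Lyft"}
RIDE_FIELDS = (
    "start_time_text", "end_time_text", "total_text", "currency", "amount",
    "pickup", "dropoff", "pickup_city", "pickup_country", "dropoff_city",
    "dropoff_country", "payment_method", "driver", "distance_text",
    "duration_text", "notes",
)


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a record with every ride field present; missing ones are None."""
    if record.get("provider") not in PROVIDERS:
        raise ValueError(f"unknown provider: {record.get('provider')!r}")
    source = record.get("source") or {}
    if not source.get("gmail_message_id"):
        raise ValueError("source.gmail_message_id is required")

    ride_in = record.get("ride") or {}
    # Missing fields become None, which json.dumps serializes as null.
    ride = {field: ride_in.get(field) for field in RIDE_FIELDS}

    amount = ride["amount"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    if not is_number:
        # Keep a textual total in total_text instead of dropping it.
        if isinstance(amount, str) and ride["total_text"] is None:
            ride["total_text"] = amount
        ride["amount"] = None

    return {"provider": record["provider"], "source": source, "ride": ride}
```

Each normalized record can then be written as one line with json.dumps(record, ensure_ascii=False).
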
Rules:
- Use text_html as the primary source; fall back to snippet only if text_html is empty.
- Keep amount numeric; if only a textual total exists, set amount: null and preserve the text in total_text.

python3 skills/ride-receipts-llm/scripts/insert_rides_sqlite_jsonl.py \
--db ./data/rides.sqlite \
--schema skills/ride-receipts-llm/references/schema_rides.sql \
--rides-jsonl ./data/rides_extracted.jsonl
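Inserts are idempotent: the schema declares UNIQUE(provider, gmail_message_id) ON CONFLICT REPLACE, so re-running the insert step replaces existing rows instead of duplicating them.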
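
The effect of that constraint can be seen in a small in-memory sketch. The table definition below is a simplified stand-in, not the real schema_rides.sql; its column set and the helper names insert_ride and demo are assumptions for illustration.

```python
import sqlite3

# Hypothetical, reduced table; only the UNIQUE clause mirrors the real schema.
DEMO_SCHEMA = """
CREATE TABLE IF NOT EXISTS rides (
    id INTEGER PRIMARY KEY,
    provider TEXT NOT NULL,
    gmail_message_id TEXT NOT NULL,
    amount REAL,
    currency TEXT,
    UNIQUE (provider, gmail_message_id) ON CONFLICT REPLACE
);
"""


def insert_ride(conn, provider, gmail_message_id, amount, currency):
    """Plain INSERT; the table constraint handles duplicates by replacing."""
    conn.execute(
        "INSERT INTO rides (provider, gmail_message_id, amount, currency) "
        "VALUES (?, ?, ?, ?)",
        (provider, gmail_message_id, amount, currency),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DEMO_SCHEMA)
    insert_ride(conn, "Uber", "msg-1", 10.0, "EUR")
    # Same (provider, gmail_message_id): the earlier row is replaced.
    insert_ride(conn, "Uber", "msg-1", 12.5, "EUR")
    insert_ride(conn, "Bolt", "msg-2", 7.0, "PLN")
    # Example query: ride count and total amount per provider.
    rows = conn.execute(
        "SELECT provider, COUNT(*), SUM(amount) FROM rides "
        "GROUP BY provider ORDER BY provider"
    ).fetchall()
    conn.close()
    return rows
```

Calling demo() yields one row per provider, with the Uber row reflecting only the second insert. Note that REPLACE deletes the conflicting row and inserts a new one, so an autoincremented id changes on re-insert.
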