Migrate stats stuff to a separate Rust server by skyfallwastaken · Pull Request #1116 · hackclub/hackatime

skyfallwastaken · 2026-03-27T10:13:39Z

No description provided.

skyfallwastaken · 2026-03-27T10:14:03Z

@greptileai please review and identify any potential regressions

greptile-apps · 2026-03-27T10:21:03Z

Greptile Summary

This PR introduces a new Axum-based Rust microservice (rust-stats-server) that offloads compute-heavy stats queries (durations, spans, streaks, leaderboard, profile, etc.) from the Rails monolith, along with a Ruby StatsClient wrapper and a StatsClientWithFallback migration-phase shim that falls back to in-process Ruby when the Rust server is unreachable. A Docker Compose service definition and a mass-heartbeat seed task are also added.

Key concerns:

- Streak cap (P1): query/streaks.rs silently clamps any requested lookback to 30 days. Users with streaks longer than 30 days will receive a truncated count that diverges from the previous Ruby implementation.
- Dead Config::auth_token / per-request env read (P1): The auth middleware reads AUTH_TOKEN directly from the environment on every request, completely bypassing the Config struct populated at startup, which makes that field unreachable dead code.
- No tests included: Per repository policy, new functionality should ship with tests. Neither the Rust routes/queries nor the Ruby client/fallback have any test coverage.
- Fallback opt gaps (P2): The duration_boundary_aware fallback silently drops coding_only and categories_exclude options; duration_grouped drops categories_exclude. This can produce incorrect data during the migration window.
- Rake constant pollution (P2): Five top-level Ruby constants are defined inside the rake task body and will generate already initialized constant warnings on repeated runs.
- _span_end dead variable (P2): Tracked but never used in any span-boundary computation in query/spans.rs.

Confidence Score: 3/5

Not ready to merge — the streak 30-day cap is a silent data-correctness regression and the auth token architecture has dead code that should be resolved before production.

Two P1 issues: the streak lookback cap that truncates real user data, and the auth middleware bypassing the Config system entirely. The fallback wrapper also has option-dropping gaps that can silently produce wrong numbers during the migration window, and the PR ships with no tests despite the repository policy requiring them.

rust-stats-server/src/query/streaks.rs (streak cap), rust-stats-server/src/middleware/auth.rs (dead config field + per-request env read), lib/stats_client_with_fallback.rb (fallback option gaps)

Important Files Changed

Filename	Overview
rust-stats-server/src/query/streaks.rs	Streak computation is hard-capped to a 30-day lookback window, silently truncating any streak longer than 30 days — a behavioral regression vs. the Ruby fallback.
rust-stats-server/src/middleware/auth.rs	Auth token is re-read from the environment on every request instead of from app state, making `Config::auth_token` dead code; token comparison is also not constant-time.
rust-stats-server/src/query/filters.rs	All user-controlled values are bound as typed query parameters — no SQL injection risk.
rust-stats-server/src/query/spans.rs	Span-building logic is sound; `_span_end` variable is tracked but never used, indicating dead code from a prior iteration.
lib/stats_client.rb	Clean HTTP wrapper; uses bearer auth, proper timeout, and compact helper to strip nil params before sending to the Rust server.
lib/stats_client_with_fallback.rb	Migration-phase fallback wrapper; only catches `ConnectionError` (not `ServerError`), and the `duration_boundary_aware` and `duration_grouped` fallbacks silently drop some filter opts.
rust-stats-server/src/routes/daily_durations.rs	Timezone is validated against chrono-tz allowlist before interpolation into SQL, preventing injection.
lib/tasks/seed_mass_heartbeats.rake	Useful load-testing seed data generator; top-level Ruby constants inside the task block will cause `already initialized constant` warnings on repeated runs.
rust-stats-server/src/config.rs	`auth_token` field is populated but never consumed — the middleware bypasses this struct entirely.
rust-stats-server/Dockerfile	Two-stage build (builder + slim runtime) is correctly structured; dependency-caching layer is in place.

Sequence Diagram

sequenceDiagram
    participant Rails as Rails App
    participant Fallback as StatsClientWithFallback
    participant Client as StatsClient (HTTP)
    participant Rust as Rust stats-server (Axum)
    participant PG as PostgreSQL

    Rails->>Fallback: duration / streaks / spans / ...
    Fallback->>Client: delegate call
    Client->>Rust: POST /api/v1/<endpoint> (Bearer token)
    Rust->>Rust: auth_middleware (reads AUTH_TOKEN from ENV)
    Rust->>PG: parameterised SQL query
    PG-->>Rust: result rows
    Rust-->>Client: JSON response
    Client-->>Fallback: parsed Hash
    Fallback-->>Rails: result

    alt ConnectionError (Rust unreachable)
        Client--xFallback: StatsClient::ConnectionError
        Fallback->>PG: direct ActiveRecord fallback
        PG-->>Fallback: result
        Fallback-->>Rails: result
    end

Comments Outside Diff (2)

lib/stats_client_with_fallback.rb, line 217-224 (link)

duration_boundary_aware fallback silently ignores coding_only and categories_exclude

The Ruby fallback for duration_boundary_aware only applies the project filter from **opts. If the caller passes coding_only: true or categories_exclude: [...], those are silently dropped in the fallback path, causing the result to include non-coding heartbeats.

Similarly, duration_grouped's fallback applies coding_only but ignores categories_exclude. Since this is a migration-phase wrapper, the discrepancy may cause data inconsistencies that are hard to debug while both paths are active.

lib/tasks/seed_mass_heartbeats.rake, line 265-269 (link)

Constants defined inside a Rake task block leak to global scope

NUM_USERS, TOTAL_HEARTBEATS, HBS_PER_USER, BATCH_SIZE, and ONE_YEAR_AGO are Ruby constants (all-caps), not local variables. Constants assigned inside a block become top-level constants in Ruby. Running rake seed:mass_heartbeats more than once in the same process will produce already initialized constant warnings. Use lowercase local variables instead.

Reviews (1): Last reviewed commit: "porquoi?"

greptile-apps · 2026-03-27T10:21:06Z

rust-stats-server/src/query/streaks.rs, line 19-32

+    // Clamp start_date to max of (provided, 30 days ago)
+    let thirty_days_ago = (Utc::now() - Duration::days(30))
+        .format("%Y-%m-%d")
+        .to_string();
+    let effective_start = match start_date {
+        Some(sd) => {
+            if sd > thirty_days_ago.as_str() {
+                sd.to_string()
+            } else {
+                thirty_days_ago.clone()
+            }
+        }
+        None => thirty_days_ago.clone(),
+    };


30-day clamp silently truncates streaks longer than 30 days

The code clamps the effective start date to at most 30 days ago. This means any user whose streak exceeds 30 days will see an incorrect (truncated) streak value — e.g., a user with a 45-day streak would see 30 instead of 45.

Even if the caller passes an older start_date such as "2025-01-01", the code will silently use 30 days ago. The original Ruby fallback in StatsClientWithFallback passes start_date through directly to Heartbeat.daily_streaks_for_users, which presumably respects an unbounded range, so these two implementations diverge for users with streaks > 30 days.

If the 30-day cap is intentional for performance, it should be enforced on the caller side (or documented as a known limitation), not silently applied here.
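One way to make the cap explicit rather than silent is sketched below. This is only an illustration under stated assumptions: `StreakWindow`, `resolve_window`, and the `cap_start` parameter are hypothetical names that do not appear in the PR, and dates are assumed to be ISO `YYYY-MM-DD` strings so that plain string comparison matches chronological order (the same assumption the existing clamp relies on).

```rust
/// Result of resolving the streak lookback window.
/// (Hypothetical type, not part of the PR.)
#[derive(Debug, PartialEq)]
struct StreakWindow {
    /// Effective start date as an ISO `YYYY-MM-DD` string.
    start: String,
    /// True when the requested start was overridden by the cap.
    clamped: bool,
}

/// Pass the caller's start date through unchanged unless an explicit cap is
/// configured, and report whether the cap was applied.
fn resolve_window(
    requested: Option<&str>,
    default_start: &str,
    cap_start: Option<&str>,
) -> StreakWindow {
    // Fall back to the default only when the caller supplied nothing.
    let wanted = requested.unwrap_or(default_start);
    match cap_start {
        // ISO dates sort lexicographically, so `<` means "earlier than".
        Some(cap) if wanted < cap => StreakWindow {
            start: cap.to_string(),
            clamped: true,
        },
        _ => StreakWindow {
            start: wanted.to_string(),
            clamped: false,
        },
    }
}
```

With `cap_start` set to `None`, the requested date is always respected; with a cap set, the caller can at least see from `clamped` that the returned streak may be truncated.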

greptile-apps · 2026-03-27T10:21:07Z

rust-stats-server/src/middleware/auth.rs, line 4

+use axum::{extract::Request, http::StatusCode, middleware::Next, response::Response};
+
+pub async fn auth_middleware(request: Request, next: Next) -> Result<Response, StatusCode> {
+    let auth_token = std::env::var("AUTH_TOKEN").unwrap_or_else(|_| "dev-token".to_string());


AUTH_TOKEN read from env on every request; Config::auth_token is dead code

The middleware calls std::env::var("AUTH_TOKEN") on every incoming request rather than reading the already-parsed value from Config. As a result the auth_token field populated in config.rs at startup is never used anywhere in the codebase — it is dead code.

The standard Axum pattern is to extend AppState with an auth_token: String field (populated once from Config) and pass it into the middleware. This keeps configuration in one place, avoids a per-request environment lookup and string allocation, and makes the auth logic testable without mutating the environment.
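A framework-free sketch of that shape follows. It is not the PR's code: the field layout of `Config` and `AppState`, and the `from_config` and `is_authorized` helpers, are assumptions made for illustration, and the Axum wiring itself is left out so the example stays stdlib-only.

```rust
/// Startup configuration (assumed shape; the real struct has more fields).
struct Config {
    auth_token: String,
}

/// Shared application state handed to every request handler.
#[derive(Clone)]
struct AppState {
    auth_token: String,
}

impl AppState {
    /// Copy the token out of `Config` once, at startup.
    fn from_config(config: &Config) -> Self {
        AppState {
            auth_token: config.auth_token.clone(),
        }
    }
}

/// Check an `Authorization` header value against the token held in state.
/// No environment access happens here, so this is easy to unit test.
/// Plain `==` is kept for brevity in this sketch.
fn is_authorized(state: &AppState, authorization: Option<&str>) -> bool {
    match authorization.and_then(|header| header.strip_prefix("Bearer ")) {
        Some(token) => token == state.auth_token,
        None => false,
    }
}
```

In an Axum service the middleware would receive `AppState` through the framework's state mechanism and call something like `is_authorized` with the request's header value.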

greptile-apps · 2026-03-27T10:21:08Z

rust-stats-server/src/middleware/auth.rs, line 14

+    match auth_header {
+        Some(header) if header.starts_with("Bearer ") => {
+            let token = &header[7..];
+            if token == auth_token {


Token comparison is not constant-time

token == auth_token uses a short-circuit equality check that exits early on the first mismatched byte, making it technically vulnerable to timing attacks. For a shared-secret bearer token, this allows an attacker to brute-force the token byte-by-byte by measuring response times.

In practice this is a low-severity risk for an internal service, but using a constant-time comparison is a simple hardening step. The subtle crate (already transitively available in the dependency tree) provides ConstantTimeEq.
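For illustration, the idea behind a constant-time comparison can be sketched with the standard library alone. The `constant_time_eq` helper below is hypothetical and hand-rolled; an optimizing compiler is not obliged to preserve its timing behaviour, so an audited implementation such as `ConstantTimeEq` from `subtle` is the safer choice in production.

```rust
/// Compare two byte strings without exiting early on the first mismatch.
/// The length check does return early, so the token length is not hidden.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate the XOR of every byte pair; any difference sets a bit.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

The loop always visits every byte, so the running time depends on the length of the input rather than on the position of the first mismatch.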

greptile-apps · 2026-03-27T10:21:09Z

rust-stats-server/src/query/spans.rs, line 76

+
+    let mut spans = Vec::new();
+    let mut span_start = rows[0].time;
+    let mut _span_end = rows[0].time;


_span_end is tracked but never used in any calculation

_span_end is initialised and updated (line 138: _span_end = next;) but is never read when computing span boundaries. The underscore prefix suppresses the compiler warning, but this is dead code — perhaps a leftover from an earlier implementation. Remove it or add a comment explaining the intent.
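For contrast, here is a minimal sketch of a span builder in which the end marker is actually read. It is not the logic in `query/spans.rs`: the `build_spans` name, the `f64` timestamps in seconds, and the gap-based `timeout` rule are all assumptions made for the example.

```rust
/// Group sorted timestamps (seconds) into `(start, end)` spans.
/// A gap larger than `timeout` closes the current span.
fn build_spans(times: &[f64], timeout: f64) -> Vec<(f64, f64)> {
    let mut spans = Vec::new();
    if times.is_empty() {
        return spans;
    }
    let mut span_start = times[0];
    let mut span_end = times[0];
    for &next in &times[1..] {
        // `span_end` is read here to measure the gap and to close the span.
        if next - span_end > timeout {
            spans.push((span_start, span_end));
            span_start = next;
        }
        span_end = next;
    }
    // Close the final open span.
    spans.push((span_start, span_end));
    spans
}
```

If the real implementation derives the end of each span some other way, removing the unused variable is the simpler fix.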

greptile-apps · 2026-03-27T10:21:17Z

lib/stats_client_with_fallback.rb, line 1

@@ -0,0 +1,64 @@
+# lib/stats_client_with_fallback.rb


Redundant file-path comment

The first line # lib/stats_client_with_fallback.rb just restates the file path, which is already visible from the filename itself. Comments should explain why, not what. The lines below it (migration-phase wrapper explanation) are the useful ones.

Rule Used: What: Comments should only explain the "why" or co... (source)

app/controllers/api/v1/stats_controller.rb

-          filter_by_category = params[:filter_by_category].split(",")
-          query = query.where(category: filter_by_category)
+        if category_filters.present?
+          query = query.where(category: category_filters)


app/controllers/api/hackatime/v1/hackatime_controller.rb

-      # Basically this filters out columns that aren't in our DB (the biggest one being raw_data)
-      new_heartbeat = Heartbeat.find_or_create_by(attrs)
+      }).slice(*heartbeat_lookup_columns)
+      new_heartbeat = Heartbeat.find_or_initialize_by(attrs)


porquoi?

3a2f1a9

greptile-apps bot reviewed Mar 27, 2026

goog?

b7e36c0

github-code-quality bot found potential problems Mar 28, 2026

skyfallwastaken added 2 commits March 29, 2026 14:36

Stuff and things

ba95304

?????

51628c4

github-advanced-security bot found potential problems Mar 30, 2026

github-code-quality bot found potential problems Mar 30, 2026

skyfallwastaken wants to merge 4 commits into main from rust-server
