Migrate stats stuff to a separate Rust server#1116

Draft
skyfallwastaken wants to merge 4 commits into main from rust-server

@skyfallwastaken
Member

No description provided.

@skyfallwastaken
Member Author

@greptileai please review and identify any potential regressions

@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR introduces a new Axum-based Rust microservice (rust-stats-server) that offloads compute-heavy stats queries (durations, spans, streaks, leaderboard, profile, etc.) from the Rails monolith, along with a Ruby StatsClient wrapper and a StatsClientWithFallback migration-phase shim that falls back to in-process Ruby when the Rust server is unreachable. A Docker Compose service definition and a mass-heartbeat seed task are also added.

Key concerns:

- Streak cap (P1): query/streaks.rs silently clamps any requested lookback to 30 days. Users with streaks longer than 30 days will receive a truncated count that diverges from the previous Ruby implementation.
- Dead Config::auth_token / per-request env read (P1): The auth middleware reads AUTH_TOKEN directly from the environment on every request, completely bypassing the Config struct populated at startup, which makes that field unreachable dead code.
- No tests included: Per repository policy, new functionality should ship with tests. Neither the Rust routes/queries nor the Ruby client/fallback have any test coverage.
- Fallback opt gaps (P2): The duration_boundary_aware fallback silently drops coding_only and categories_exclude options; duration_grouped drops categories_exclude. This can produce incorrect data during the migration window.
- Rake constant pollution (P2): Five top-level Ruby constants are defined inside the rake task body and will generate already initialized constant warnings on repeated runs.
- _span_end dead variable (P2): Tracked but never used in any span-boundary computation in query/spans.rs.

Confidence Score: 3/5

Not ready to merge — the streak 30-day cap is a silent data-correctness regression and the auth token architecture has dead code that should be resolved before production.

Two P1 issues: the streak lookback cap that truncates real user data, and the auth middleware bypassing the Config system entirely. The fallback wrapper also has option-dropping gaps that can silently produce wrong numbers during the migration window, and the PR ships with no tests despite the repository policy requiring them.

rust-stats-server/src/query/streaks.rs (streak cap), rust-stats-server/src/middleware/auth.rs (dead config field + per-request env read), lib/stats_client_with_fallback.rb (fallback option gaps)

Important Files Changed

| Filename | Overview |
| --- | --- |
| rust-stats-server/src/query/streaks.rs | Streak computation is hard-capped to a 30-day lookback window, silently truncating any streak longer than 30 days; a behavioral regression vs. the Ruby fallback. |
| rust-stats-server/src/middleware/auth.rs | Auth token is re-read from the environment on every request instead of from app state, making Config::auth_token dead code; token comparison is also not constant-time. |
| rust-stats-server/src/query/filters.rs | All user-controlled values are bound as typed query parameters; no SQL injection risk. |
| rust-stats-server/src/query/spans.rs | Span-building logic is sound; _span_end variable is tracked but never used, indicating dead code from a prior iteration. |
| lib/stats_client.rb | Clean HTTP wrapper; uses bearer auth, proper timeout, and compact helper to strip nil params before sending to the Rust server. |
| lib/stats_client_with_fallback.rb | Migration-phase fallback wrapper; only catches ConnectionError (not ServerError), and the duration_boundary_aware and duration_grouped fallbacks silently drop some filter opts. |
| rust-stats-server/src/routes/daily_durations.rs | Timezone is validated against chrono-tz allowlist before interpolation into SQL, preventing injection. |
| lib/tasks/seed_mass_heartbeats.rake | Useful load-testing seed data generator; top-level Ruby constants inside the task block will cause already initialized constant warnings on repeated runs. |
| rust-stats-server/src/config.rs | auth_token field is populated but never consumed; the middleware bypasses this struct entirely. |
| rust-stats-server/Dockerfile | Two-stage build (builder + slim runtime) is correctly structured; dependency-caching layer is in place. |

Sequence Diagram

sequenceDiagram
    participant Rails as Rails App
    participant Fallback as StatsClientWithFallback
    participant Client as StatsClient (HTTP)
    participant Rust as Rust stats-server (Axum)
    participant PG as PostgreSQL

    Rails->>Fallback: duration / streaks / spans / ...
    Fallback->>Client: delegate call
    Client->>Rust: POST /api/v1/<endpoint> (Bearer token)
    Rust->>Rust: auth_middleware (reads AUTH_TOKEN from ENV)
    Rust->>PG: parameterised SQL query
    PG-->>Rust: result rows
    Rust-->>Client: JSON response
    Client-->>Fallback: parsed Hash
    Fallback-->>Rails: result

    alt ConnectionError (Rust unreachable)
        Client--xFallback: StatsClient::ConnectionError
        Fallback->>PG: direct ActiveRecord fallback
        PG-->>Fallback: result
        Fallback-->>Rails: result
    end

Comments Outside Diff (2)

  1. lib/stats_client_with_fallback.rb, line 217-224

    P2 duration_boundary_aware fallback silently ignores coding_only and categories_exclude

    The Ruby fallback for duration_boundary_aware only applies the project filter from **opts. If the caller passes coding_only: true or categories_exclude: [...], those are silently dropped in the fallback path, causing the result to include non-coding heartbeats.

    Similarly, duration_grouped's fallback applies coding_only but ignores categories_exclude. Since this is a migration-phase wrapper, the discrepancy may cause data inconsistencies that are hard to debug while both paths are active.

  2. lib/tasks/seed_mass_heartbeats.rake, line 265-269

    P2 Constants defined inside a Rake task block leak to global scope

    NUM_USERS, TOTAL_HEARTBEATS, HBS_PER_USER, BATCH_SIZE, and ONE_YEAR_AGO are Ruby constants (all-caps), not local variables. Constants assigned inside a block become top-level constants in Ruby. Running rake seed:mass_heartbeats more than once in the same process will produce already initialized constant warnings. Use lowercase local variables instead.

Reviews (1): Last reviewed commit: "porquoi?"

Comment on lines +19 to +32
    // Clamp start_date to max of (provided, 30 days ago)
    let thirty_days_ago = (Utc::now() - Duration::days(30))
        .format("%Y-%m-%d")
        .to_string();
    let effective_start = match start_date {
        Some(sd) => {
            if sd > thirty_days_ago.as_str() {
                sd.to_string()
            } else {
                thirty_days_ago.clone()
            }
        }
        None => thirty_days_ago.clone(),
    };

P1 30-day clamp silently truncates streaks longer than 30 days

The code clamps the effective start date to at most 30 days ago. This means any user whose streak exceeds 30 days will see an incorrect (truncated) streak value — e.g., a user with a 45-day streak would see 30 instead of 45.

Even if the caller passes start_date: "2025-01-01" (90 days ago), the code will silently use 30 days ago. The original Ruby fallback in StatsClientWithFallback passes start_date through directly to Heartbeat.daily_streaks_for_users, which presumably respects an unbounded range, so these two implementations diverge for users with streaks > 30 days.

If the 30-day cap is intentional for performance, it should be enforced on the caller side (or documented as a known limitation), not silently applied here.
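One way to make the cap explicit rather than silent is to have the caller pass the floor date and have the helper report whether it overrode the request. The sketch below is only an illustration under assumptions: `StartBound`, `effective_start`, and the `floor` parameter are hypothetical names that do not appear in the PR, and dates are assumed to be ISO `YYYY-MM-DD` strings, which compare correctly as plain strings (the same assumption the existing code makes).

```rust
/// Hypothetical result type: the resolved start date plus a flag that tells
/// the caller when the requested date was pushed forward by the floor.
#[derive(Debug, PartialEq)]
pub struct StartBound<'a> {
    pub date: Option<&'a str>,
    pub clamped: bool,
}

/// Resolve the start of the streak window.
///
/// `requested` is the caller's start date; `floor` is an optional earliest
/// allowed date chosen by the caller (`None` means an unbounded lookback).
/// Both are assumed to be ISO `YYYY-MM-DD` strings.
pub fn effective_start<'a>(
    requested: Option<&'a str>,
    floor: Option<&'a str>,
) -> StartBound<'a> {
    match (requested, floor) {
        // The request reaches further back than the floor: clamp and say so.
        (Some(r), Some(f)) if r < f => StartBound { date: Some(f), clamped: true },
        // The request is within bounds, or there is no floor at all.
        (Some(r), _) => StartBound { date: Some(r), clamped: false },
        // No request: fall back to the floor (which may itself be absent).
        (None, f) => StartBound { date: f, clamped: false },
    }
}
```

With this shape, a caller that wants the old Ruby behaviour passes `None` as the floor, and a caller that wants a performance cap can surface `clamped` in the response instead of returning a quietly truncated streak.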

use axum::{extract::Request, http::StatusCode, middleware::Next, response::Response};

pub async fn auth_middleware(request: Request, next: Next) -> Result<Response, StatusCode> {
    let auth_token = std::env::var("AUTH_TOKEN").unwrap_or_else(|_| "dev-token".to_string());

P1 AUTH_TOKEN read from env on every request; Config::auth_token is dead code

The middleware calls std::env::var("AUTH_TOKEN") on every incoming request rather than reading the already-parsed value from Config. As a result the auth_token field populated in config.rs at startup is never used anywhere in the codebase — it is dead code.

The standard Axum pattern is to extend AppState with an auth_token: String field (populated once from Config) and pass it into the middleware. This keeps configuration in one place, avoids a per-request syscall, and makes the auth logic testable without mutating the environment.
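The state-based approach can be sketched without any framework code. The example below is a minimal illustration, not the PR's implementation: the `AppState` shape, the `new` constructor, and the `is_authorized` helper are assumptions, and the check is kept as a pure function so it can be tested without touching the environment. In Axum, a state value like this would typically be handed to the middleware through `axum::middleware::from_fn_with_state`.

```rust
use std::sync::Arc;

/// Hypothetical application state: the token is read once at startup
/// (for example from `Config`) and shared cheaply across requests.
#[derive(Clone)]
pub struct AppState {
    pub auth_token: Arc<str>,
}

impl AppState {
    pub fn new(auth_token: &str) -> Self {
        Self { auth_token: Arc::from(auth_token) }
    }
}

/// Returns true when `header` has the form `Bearer <token>` and the token
/// matches the one held in state. Plain equality is used here for brevity.
pub fn is_authorized(state: &AppState, header: Option<&str>) -> bool {
    header
        .and_then(|h| h.strip_prefix("Bearer "))
        .map_or(false, |token| token == &*state.auth_token)
}
```

Because `is_authorized` takes the state as an argument, a test can construct `AppState::new("test-token")` directly instead of mutating `AUTH_TOKEN` in the process environment.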

match auth_header {
    Some(header) if header.starts_with("Bearer ") => {
        let token = &header[7..];
        if token == auth_token {

P2 Token comparison is not constant-time

token == auth_token uses a short-circuit equality check that exits early on the first mismatched byte, making it technically vulnerable to timing attacks. For a shared-secret bearer token, this allows an attacker to brute-force the token byte-by-byte by measuring response times.

In practice this is a low-severity risk for an internal service, but using a constant-time comparison is a simple hardening step. The subtle crate (already transitively available in the dependency tree) provides ConstantTimeEq.
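To illustrate what a constant-time comparison does differently, here is a std-only sketch. The function name `constant_time_eq` is hypothetical, and this version is for explanation only: it still returns early on a length mismatch, and nothing stops an optimizing compiler from reintroducing a data-dependent branch. A real deployment would more likely call `ct_eq` from the `subtle` crate's `ConstantTimeEq` trait, which is designed to resist such optimizations.

```rust
/// Compare two byte strings without exiting at the first mismatched byte.
///
/// Every byte pair is XOR-ed and the differences are OR-ed into one
/// accumulator, so the loop always runs over the full length.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // Length is not treated as secret here; tokens of the wrong length
    // are rejected immediately.
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

In the middleware this would replace `token == auth_token` with something like `constant_time_eq(token.as_bytes(), auth_token.as_bytes())`.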

let mut spans = Vec::new();
let mut span_start = rows[0].time;
let mut _span_end = rows[0].time;

P2 _span_end is tracked but never used in any calculation

_span_end is initialised and updated (line 138: _span_end = next;) but is never read when computing span boundaries. The underscore prefix suppresses the compiler warning, but this is dead code — perhaps a leftover from an earlier implementation. Remove it or add a comment explaining the intent.
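For comparison, this is roughly what a span-building loop looks like when the end marker is actually read. The real logic in spans.rs is not shown in this review, so everything here is an assumption: the `Span` struct, the `build_spans` name, integer timestamps in seconds, and the rule that a gap larger than `timeout` starts a new span.

```rust
/// Hypothetical span type: first and last timestamp of a contiguous run.
#[derive(Debug, PartialEq)]
pub struct Span {
    pub start: i64,
    pub end: i64,
}

/// Group sorted timestamps into spans, starting a new span whenever the gap
/// since the previous timestamp exceeds `timeout`.
pub fn build_spans(times: &[i64], timeout: i64) -> Vec<Span> {
    let mut spans = Vec::new();
    let Some((&first, rest)) = times.split_first() else {
        return spans;
    };
    let mut span_start = first;
    let mut span_end = first;
    for &next in rest {
        if next - span_end > timeout {
            // Gap too large: close the current span at the last seen time.
            spans.push(Span { start: span_start, end: span_end });
            span_start = next;
        }
        span_end = next;
    }
    // Close the final span.
    spans.push(Span { start: span_start, end: span_end });
    spans
}
```

If the existing code derives each span's end some other way (for example from the previous row directly), then `_span_end` is redundant and deleting it is the simpler fix.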

@@ -0,0 +1,64 @@
# lib/stats_client_with_fallback.rb

P2 Redundant file-path comment

The first line # lib/stats_client_with_fallback.rb just restates the file path, which is already visible from the filename itself. Comments should explain why, not what. The lines below it (migration-phase wrapper explanation) are the useful ones.

Rule Used: What: Comments should only explain the "why" or co... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

filter_by_category = params[:filter_by_category].split(",")
query = query.where(category: filter_by_category)
if category_filters.present?
query = query.where(category: category_filters)
# Basically this filters out columns that aren't in our DB (the biggest one being raw_data)
new_heartbeat = Heartbeat.find_or_create_by(attrs)
}).slice(*heartbeat_lookup_columns)
new_heartbeat = Heartbeat.find_or_initialize_by(attrs)

Check failure

Code scanning / CodeQL

SQL query built from user-controlled sources High

This SQL query depends on a user-provided value.
This SQL query depends on a user-provided value.
