Migrate stats stuff to a separate Rust server#1116
Migrate stats stuff to a separate Rust server#1116skyfallwastaken wants to merge 4 commits intomainfrom
Conversation
|
@greptileai please review and identify any potential regressions |
Greptile SummaryThis PR introduces a new Axum-based Rust microservice ( Confidence Score: 3/5Not ready to merge — the streak 30-day cap is a silent data-correctness regression and the auth token architecture has dead code that should be resolved before production. Two P1 issues: the streak lookback cap that truncates real user data, and the auth middleware bypassing the Config system entirely. The fallback wrapper also has option-dropping gaps that can silently produce wrong numbers during the migration window, and the PR ships with no tests despite the repository policy requiring them.
Important Files Changed
Sequence DiagramsequenceDiagram
participant Rails as Rails App
participant Fallback as StatsClientWithFallback
participant Client as StatsClient (HTTP)
participant Rust as Rust stats-server (Axum)
participant PG as PostgreSQL
Rails->>Fallback: duration / streaks / spans / ...
Fallback->>Client: delegate call
Client->>Rust: POST /api/v1/<endpoint> (Bearer token)
Rust->>Rust: auth_middleware (reads AUTH_TOKEN from ENV)
Rust->>PG: parameterised SQL query
PG-->>Rust: result rows
Rust-->>Client: JSON response
Client-->>Fallback: parsed Hash
Fallback-->>Rails: result
alt ConnectionError (Rust unreachable)
Client--xFallback: StatsClient::ConnectionError
Fallback->>PG: direct ActiveRecord fallback
PG-->>Fallback: result
Fallback-->>Rails: result
end
|
| // Clamp start_date to max of (provided, 30 days ago) | ||
| let thirty_days_ago = (Utc::now() - Duration::days(30)) | ||
| .format("%Y-%m-%d") | ||
| .to_string(); | ||
| let effective_start = match start_date { | ||
| Some(sd) => { | ||
| if sd > thirty_days_ago.as_str() { | ||
| sd.to_string() | ||
| } else { | ||
| thirty_days_ago.clone() | ||
| } | ||
| } | ||
| None => thirty_days_ago.clone(), | ||
| }; |
There was a problem hiding this comment.
30-day clamp silently truncates streaks longer than 30 days
The code clamps the effective start date to at most 30 days ago. This means any user whose streak exceeds 30 days will see an incorrect (truncated) streak value — e.g., a user with a 45-day streak would see 30 instead of 45.
Even if the caller passes start_date: "2025-01-01" (90 days ago), the code will silently use 30 days ago. The original Ruby fallback in StatsClientWithFallback passes start_date through directly to Heartbeat.daily_streaks_for_users, which presumably respects an unbounded range, so these two implementations diverge for users with streaks > 30 days.
If the 30-day cap is intentional for performance, it should be enforced on the caller side (or documented as a known limitation), not silently applied here.
Prompt To Fix With AI
This is a comment left during a code review.
Path: rust-stats-server/src/query/streaks.rs
Line: 19-32
Comment:
**30-day clamp silently truncates streaks longer than 30 days**
The code clamps the effective start date to at most 30 days ago. This means any user whose streak exceeds 30 days will see an incorrect (truncated) streak value — e.g., a user with a 45-day streak would see 30 instead of 45.
Even if the caller passes `start_date: "2025-01-01"` (90 days ago), the code will silently use 30 days ago. The original Ruby fallback in `StatsClientWithFallback` passes `start_date` through directly to `Heartbeat.daily_streaks_for_users`, which presumably respects an unbounded range, so these two implementations diverge for users with streaks > 30 days.
If the 30-day cap is intentional for performance, it should be enforced on the caller side (or documented as a known limitation), not silently applied here.
How can I resolve this? If you propose a fix, please make it concise.| use axum::{extract::Request, http::StatusCode, middleware::Next, response::Response}; | ||
|
|
||
| pub async fn auth_middleware(request: Request, next: Next) -> Result<Response, StatusCode> { | ||
| let auth_token = std::env::var("AUTH_TOKEN").unwrap_or_else(|_| "dev-token".to_string()); |
There was a problem hiding this comment.
AUTH_TOKEN read from env on every request; Config::auth_token is dead code
The middleware calls std::env::var("AUTH_TOKEN") on every incoming request rather than reading the already-parsed value from Config. As a result the auth_token field populated in config.rs at startup is never used anywhere in the codebase — it is dead code.
The standard Axum pattern is to extend AppState with an auth_token: String field (populated once from Config) and pass it into the middleware. This keeps configuration in one place, avoids a per-request syscall, and makes the auth logic testable without mutating the environment.
Prompt To Fix With AI
This is a comment left during a code review.
Path: rust-stats-server/src/middleware/auth.rs
Line: 4
Comment:
**`AUTH_TOKEN` read from env on every request; `Config::auth_token` is dead code**
The middleware calls `std::env::var("AUTH_TOKEN")` on every incoming request rather than reading the already-parsed value from `Config`. As a result the `auth_token` field populated in `config.rs` at startup is never used anywhere in the codebase — it is dead code.
The standard Axum pattern is to extend `AppState` with an `auth_token: String` field (populated once from `Config`) and pass it into the middleware. This keeps configuration in one place, avoids a per-request syscall, and makes the auth logic testable without mutating the environment.
How can I resolve this? If you propose a fix, please make it concise.| match auth_header { | ||
| Some(header) if header.starts_with("Bearer ") => { | ||
| let token = &header[7..]; | ||
| if token == auth_token { |
There was a problem hiding this comment.
Token comparison is not constant-time
token == auth_token uses a short-circuit equality check that exits early on the first mismatched byte, making it technically vulnerable to timing attacks. For a shared-secret bearer token, this allows an attacker to brute-force the token byte-by-byte by measuring response times.
In practice this is a low-severity risk for an internal service, but using a constant-time comparison is a simple hardening step. The subtle crate (already transitively available in the dependency tree) provides ConstantTimeEq.
Prompt To Fix With AI
This is a comment left during a code review.
Path: rust-stats-server/src/middleware/auth.rs
Line: 14
Comment:
**Token comparison is not constant-time**
`token == auth_token` uses a short-circuit equality check that exits early on the first mismatched byte, making it technically vulnerable to timing attacks. For a shared-secret bearer token, this allows an attacker to brute-force the token byte-by-byte by measuring response times.
In practice this is a low-severity risk for an internal service, but using a constant-time comparison is a simple hardening step. The `subtle` crate (already transitively available in the dependency tree) provides `ConstantTimeEq`.
How can I resolve this? If you propose a fix, please make it concise.
rust-stats-server/src/query/spans.rs
Outdated
|
|
||
| let mut spans = Vec::new(); | ||
| let mut span_start = rows[0].time; | ||
| let mut _span_end = rows[0].time; |
There was a problem hiding this comment.
_span_end is tracked but never used in any calculation
_span_end is initialised and updated (line 138: _span_end = next;) but is never read when computing span boundaries. The underscore prefix suppresses the compiler warning, but this is dead code — perhaps a leftover from an earlier implementation. Remove it or add a comment explaining the intent.
Prompt To Fix With AI
This is a comment left during a code review.
Path: rust-stats-server/src/query/spans.rs
Line: 76
Comment:
**`_span_end` is tracked but never used in any calculation**
`_span_end` is initialised and updated (line 138: `_span_end = next;`) but is never read when computing span boundaries. The underscore prefix suppresses the compiler warning, but this is dead code — perhaps a leftover from an earlier implementation. Remove it or add a comment explaining the intent.
How can I resolve this? If you propose a fix, please make it concise.
lib/stats_client_with_fallback.rb
Outdated
| @@ -0,0 +1,64 @@ | |||
| # lib/stats_client_with_fallback.rb | |||
There was a problem hiding this comment.
The first line # lib/stats_client_with_fallback.rb just restates the file path, which is already visible from the filename itself. Comments should explain why, not what. The lines below it (migration-phase wrapper explanation) are the useful ones.
Rule Used: What: Comments should only explain the "why" or co... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/stats_client_with_fallback.rb
Line: 1
Comment:
**Redundant file-path comment**
The first line `# lib/stats_client_with_fallback.rb` just restates the file path, which is already visible from the filename itself. Comments should explain *why*, not *what*. The lines below it (migration-phase wrapper explanation) are the useful ones.
**Rule Used:** What: Comments should only explain the "why" or co... ([source](https://app.greptile.com/review/custom-context?memory=27b5da63-27a1-4781-acad-c940e08169a4))
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| # Basically this filters out columns that aren't in our DB (the biggest one being raw_data) | ||
| new_heartbeat = Heartbeat.find_or_create_by(attrs) | ||
| }).slice(*heartbeat_lookup_columns) | ||
| new_heartbeat = Heartbeat.find_or_initialize_by(attrs) |
Check failure
Code scanning / CodeQL
SQL query built from user-controlled sources High
Copilot Autofix
AI about 16 hours ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
No description provided.