How to Build a Dead-Simple Feature Flag System From Scratch That Gives Your Team Gradual AI Rollout Controls

Your team just shipped a shiny new AI-powered feature. Maybe it's a copilot sidebar, an LLM-driven search ranking engine, or an intelligent code reviewer baked into your internal tooling. Everyone's excited. Then someone asks: "How do we roll this out safely without blowing up production for every user at once?"

The instinct is to reach for a third-party feature flag service. LaunchDarkly, Statsig, Unleash, Flagsmith. They're great tools. But they also mean a new vendor contract, a new SDK dependency, a new failure domain in your infrastructure, and a new bill that scales with your user count. For many teams, especially those in regulated industries or with strict supply-chain security requirements, that tradeoff isn't worth it.

The good news: a feature flag system that handles gradual AI rollouts, user targeting, and kill switches is not nearly as complicated to build as you might think. In this tutorial, you'll build one from scratch using plain code, a single database table, and a handful of clean abstractions. No third-party SDK required.

Why AI Features Specifically Need Gradual Rollout Controls

Traditional feature flags often answer a binary question: is this UI component enabled? AI features introduce a different class of problem. They are:

  • Probabilistically unpredictable. An LLM-backed feature might work perfectly for 95% of inputs and catastrophically fail on edge cases you haven't seen yet in testing.
  • Expensive to run at scale. Inference costs money. Rolling out to 100% of users on day one can cause an immediate and surprising spike in your cloud bill.
  • Sensitive to data distribution shifts. A model trained on one population of users may behave differently when exposed to a broader, more diverse user base.
  • Hard to roll back cleanly. If an AI feature has written data, made recommendations, or sent emails, a simple code revert doesn't undo the damage.

Gradual rollout controls let you expose your AI feature to 1% of users, watch your error rates, latency percentiles, and cost dashboards, and then dial it up incrementally. That's the pattern we're building toward.

The Architecture: What We're Building

Our system will have four core components:

  1. A flags table in your existing database (PostgreSQL in this guide, but trivially portable)
  2. A flag evaluator that resolves whether a flag is on for a given user
  3. A percentage-based bucketing function that deterministically assigns users to cohorts
  4. A thin admin API so your team can update flags without a code deploy

We'll write the core logic in TypeScript, but the concepts map directly to Python, Go, Ruby, or any language your team uses.

Step 1: Design the Database Schema

Start with a single table. Resist the urge to over-engineer it on day one.

CREATE TABLE feature_flags (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  key           TEXT NOT NULL UNIQUE,
  description   TEXT,
  enabled       BOOLEAN NOT NULL DEFAULT FALSE,
  rollout_pct   NUMERIC(5, 2) NOT NULL DEFAULT 0.00,  -- 0.00 to 100.00
  targeting     JSONB,                                -- optional user attribute rules
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Key lookups are your hot path. The UNIQUE constraint on key already
-- creates the index that serves them, so no separate index is needed.

Let's walk through the important columns:

  • key: A human-readable identifier like ai_search_ranking or copilot_sidebar_v2. This is what your application code references.
  • enabled: A master kill switch. If this is FALSE, the flag is off for everyone, regardless of rollout_pct. This is your emergency brake.
  • rollout_pct: A number from 0 to 100 representing what percentage of users should see the feature.
  • targeting: A JSONB blob for optional attribute-based rules, like "only users on the enterprise plan" or "only internal employees." We'll implement this in Step 3.
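A concrete row makes the schema easier to picture. Here's a hypothetical flag for an AI search feature, enabled for a couple of internal testers while the global rollout stays at zero (the key and targeting values are illustrative):

```sql
-- Hypothetical example row; key and targeting values are illustrative.
INSERT INTO feature_flags (key, description, enabled, rollout_pct, targeting)
VALUES (
  'ai_search_ranking',
  'LLM-driven search result ranking',
  TRUE,    -- master switch on, but...
  0.00,    -- ...no general users yet
  '{"allowlist": ["user-123"], "employees_only": true}'::jsonb
);
```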

Step 2: Build the Deterministic Bucketing Function

This is the heart of the system. When you ask "is this flag enabled for user abc-123?", you need a function that:

  1. Always returns the same answer for the same user and flag combination
  2. Distributes users uniformly across the 0-100 range
  3. Does not change a user's assignment when you adjust the rollout percentage (only when you cross their bucket threshold)

The standard approach is to hash a combination of the flag key and the user ID, then map the hash to a 0-100 value. Here's the implementation:

import { createHash } from "crypto";

/**
 * Returns a stable float in [0, 100) for a given user + flag combination.
 * The same userId + flagKey will always produce the same bucket value.
 */
export function getBucketValue(userId: string, flagKey: string): number {
  const input = `${flagKey}::${userId}`;
  const hash = createHash("sha256").update(input).digest("hex");

  // Take the first 8 hex characters (32 bits) and normalize to [0, 100).
  // Dividing by 2^32 (0x100000000) keeps the result strictly below 100;
  // dividing by 0xffffffff would allow exactly 100 for the maximum hash.
  const intValue = parseInt(hash.substring(0, 8), 16);
  return (intValue / 0x100000000) * 100;
}

Why include the flag key in the hash? Without it, every flag would assign the same 1% of users to every early rollout. That creates a biased "canary" population rather than a true random sample. Mixing in the flag key ensures each flag independently randomizes users across cohorts.

Let's verify the logic with a quick mental model:

  • User alice for flag ai_search hashes to bucket 23.4
  • User alice for flag copilot_sidebar hashes to bucket 71.8
  • When ai_search rollout is at 25%, Alice is included (23.4 < 25)
  • When copilot_sidebar rollout is at 50%, Alice is excluded (71.8 is NOT < 50)

Crucially, when you bump ai_search from 25% to 30%, Alice stays in. Users already in the cohort don't get evicted as you expand. This is the "sticky" property that makes gradual rollouts feel consistent to users.
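The bucket numbers in the walkthrough above (23.4, 71.8) are illustrative, but the properties themselves are checkable. Here's a standalone sketch of the same function that verifies determinism and per-flag independence:

```typescript
import { createHash } from "crypto";

// Same bucketing logic as above, reproduced so this snippet runs on its own.
function getBucketValue(userId: string, flagKey: string): number {
  const hash = createHash("sha256")
    .update(`${flagKey}::${userId}`)
    .digest("hex");
  // Divide by 2^32 so the result lands in [0, 100).
  return (parseInt(hash.substring(0, 8), 16) / 0x100000000) * 100;
}

const first = getBucketValue("alice", "ai_search");
const second = getBucketValue("alice", "ai_search");
const otherFlag = getBucketValue("alice", "copilot_sidebar");

// Determinism: the same user + flag always lands in the same bucket.
// Independence: the same user lands in a different bucket per flag.
console.log({ first, second, otherFlag });
```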

Step 3: Implement the Flag Evaluator

Now we wire up the database query and the bucketing function into a clean evaluator class.

import { Pool } from "pg";
import { getBucketValue } from "./bucketing";

export interface FeatureFlag {
  key: string;
  enabled: boolean;
  rollout_pct: number;
  targeting: Record<string, unknown> | null;
}

export interface UserContext {
  userId: string;
  plan?: string;       // e.g. "free", "pro", "enterprise"
  isEmployee?: boolean;
  email?: string;
  [key: string]: unknown;
}

export class FlagEvaluator {
  private db: Pool;
  private cache: Map<string, { flag: FeatureFlag; expiresAt: number }>;
  private cacheTtlMs: number;

  constructor(db: Pool, cacheTtlMs = 30_000) {
    this.db = db;
    this.cache = new Map();
    this.cacheTtlMs = cacheTtlMs;
  }

  /**
   * Core evaluation method. Returns true if the flag is enabled for this user.
   */
  async isEnabled(flagKey: string, user: UserContext): Promise<boolean> {
    const flag = await this.getFlag(flagKey);

    // Flag not found: fail closed (safe default)
    if (!flag) return false;

    // Master kill switch
    if (!flag.enabled) return false;

    // Check targeting rules first (these override percentage rollout)
    if (flag.targeting) {
      const targetingResult = this.evaluateTargeting(flag.targeting, user);
      if (targetingResult !== null) return targetingResult;
    }

    // Percentage-based rollout
    const bucket = getBucketValue(user.userId, flagKey);
    return bucket < flag.rollout_pct;
  }

  /**
   * Evaluates JSONB targeting rules against the user context.
   * Returns true/false if a rule matched, or null to fall through to rollout_pct.
   */
  private evaluateTargeting(
    targeting: Record<string, unknown>,
    user: UserContext
  ): boolean | null {
    // Force-enable for specific user IDs (great for internal testing)
    if (Array.isArray(targeting.allowlist)) {
      if ((targeting.allowlist as string[]).includes(user.userId)) return true;
    }

    // Force-disable for specific user IDs
    if (Array.isArray(targeting.blocklist)) {
      if ((targeting.blocklist as string[]).includes(user.userId)) return false;
    }

    // Plan-based targeting (e.g., only enterprise users)
    if (Array.isArray(targeting.plans) && user.plan) {
      if (!(targeting.plans as string[]).includes(user.plan)) return false;
    }

    // Employee fast-track: give all internal employees access immediately
    if (targeting.employees_only === true) {
      return user.isEmployee === true;
    }

    // No targeting rule matched: fall through to percentage rollout
    return null;
  }

  /**
   * Fetches a flag from cache or database.
   */
  private async getFlag(key: string): Promise<FeatureFlag | null> {
    const now = Date.now();
    const cached = this.cache.get(key);

    if (cached && cached.expiresAt > now) {
      return cached.flag;
    }

    const result = await this.db.query<FeatureFlag>(
      "SELECT key, enabled, rollout_pct, targeting FROM feature_flags WHERE key = $1",
      [key]
    );

    const flag = result.rows[0] ?? null;

    if (flag) {
      this.cache.set(key, { flag, expiresAt: now + this.cacheTtlMs });
    }

    return flag;
  }

  /**
   * Invalidate a specific flag from the cache (call after admin updates).
   */
  invalidate(flagKey: string): void {
    this.cache.delete(flagKey);
  }
}

A few design decisions worth calling out explicitly:

  • Fail closed. If the flag doesn't exist in the database, isEnabled returns false. Your AI feature stays off. This is the safe default for any system where the AI path is the "new" path.
  • In-process caching with a short TTL. A 30-second cache means you're not hammering your database on every request. When you update a flag, the change propagates within 30 seconds without any cache invalidation infrastructure. You can tune this TTL per your latency tolerance.
  • Targeting rules take priority over percentage rollout. This lets you force-enable a flag for your QA team or internal employees at 0% global rollout, which is exactly the workflow you want during pre-launch testing.
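The precedence rules above can be condensed into a pure function, which also makes them easy to unit-test. This is a sketch that mirrors the evaluator's decision order without the database or cache (the allowlist/blocklist shapes follow the targeting rules from Step 3):

```typescript
import { createHash } from "crypto";

interface Flag {
  enabled: boolean;
  rollout_pct: number;
  targeting: { allowlist?: string[]; blocklist?: string[] } | null;
}

function bucket(userId: string, flagKey: string): number {
  const hash = createHash("sha256").update(`${flagKey}::${userId}`).digest("hex");
  return (parseInt(hash.slice(0, 8), 16) / 0x100000000) * 100;
}

// Mirrors the evaluator's precedence: fail closed, then the kill switch,
// then targeting overrides, then the percentage rollout.
function evaluate(flagKey: string, flag: Flag | null, userId: string): boolean {
  if (!flag) return false;          // fail closed
  if (!flag.enabled) return false;  // master kill switch
  const t = flag.targeting;
  if (t?.allowlist?.includes(userId)) return true;
  if (t?.blocklist?.includes(userId)) return false;
  return bucket(userId, flagKey) < flag.rollout_pct;
}
```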

Step 4: Using the Evaluator in Your Application

Here's how you'd wire this into a typical API route. Let's say you're rolling out an AI-powered product description generator:

import { FlagEvaluator } from "./flags/evaluator";
import { db } from "./db";

const flags = new FlagEvaluator(db);

// In your route handler:
async function getProductDescription(req: Request, res: Response) {
  const user: UserContext = {
    userId: req.user.id,
    plan: req.user.subscriptionPlan,
    isEmployee: req.user.email?.endsWith("@yourcompany.com") ?? false,
  };

  const useAIDescription = await flags.isEnabled("ai_product_description", user);

  if (useAIDescription) {
    const description = await generateWithLLM(req.params.productId);
    return res.json({ description, source: "ai" });
  }

  const description = await fetchStaticDescription(req.params.productId);
  return res.json({ description, source: "static" });
}

Clean, readable, and the flag evaluation adds negligible overhead thanks to the in-process cache. Your application code doesn't care how the flag system works internally; it just asks a yes-or-no question.

Step 5: Build the Admin API for Flag Management

A feature flag system you can only update via SQL is a footgun. Build a minimal admin API so your team can adjust rollout percentages safely from a UI or CLI without touching the database directly.

import express from "express";
import { db } from "./db";
import { flags } from "./flags"; // your shared FlagEvaluator instance

const adminRouter = express.Router();

// Require internal auth middleware on all admin routes
adminRouter.use(requireInternalAuth);

// List all flags
adminRouter.get("/flags", async (req, res) => {
  const result = await db.query(
    "SELECT * FROM feature_flags ORDER BY created_at DESC"
  );
  res.json(result.rows);
});

// Create a new flag
adminRouter.post("/flags", async (req, res) => {
  const { key, description, rollout_pct = 0, targeting = null } = req.body;

  const result = await db.query(
    `INSERT INTO feature_flags (key, description, enabled, rollout_pct, targeting)
     VALUES ($1, $2, FALSE, $3, $4)
     RETURNING *`,
    [key, description, rollout_pct, targeting ? JSON.stringify(targeting) : null]
  );

  res.status(201).json(result.rows[0]);
});

// Update rollout percentage (the main operation during a gradual rollout)
adminRouter.patch("/flags/:key/rollout", async (req, res) => {
  const { rollout_pct } = req.body;

  if (typeof rollout_pct !== "number" || rollout_pct < 0 || rollout_pct > 100) {
    return res.status(400).json({ error: "rollout_pct must be a number between 0 and 100" });
  }

  const result = await db.query(
    `UPDATE feature_flags
     SET rollout_pct = $1, updated_at = NOW()
     WHERE key = $2
     RETURNING *`,
    [rollout_pct, req.params.key]
  );

  if (result.rowCount === 0) {
    return res.status(404).json({ error: "Flag not found" });
  }

  // Bust the cache immediately so the change takes effect
  flags.invalidate(req.params.key);

  res.json(result.rows[0]);
});

// Toggle the master kill switch
adminRouter.patch("/flags/:key/toggle", async (req, res) => {
  const result = await db.query(
    `UPDATE feature_flags
     SET enabled = NOT enabled, updated_at = NOW()
     WHERE key = $1
     RETURNING *`,
    [req.params.key]
  );

  if (result.rowCount === 0) {
    return res.status(404).json({ error: "Flag not found" });
  }

  flags.invalidate(req.params.key);
  res.json(result.rows[0]);
});

Step 6: Define Your AI Rollout Playbook

The system is only as good as the process around it. Here's the gradual rollout sequence we recommend for AI features specifically:

Phase 1: Internal Only (0% global, employees_only: true)

Insert the flag with enabled = TRUE, rollout_pct = 0, and targeting = {"employees_only": true}. Your entire internal team gets access. Run this for at least one full week. Watch for hallucinations, latency spikes, and edge-case failures in your internal tooling before any customer sees the feature.

Phase 2: Allowlist Beta (0% global, specific user IDs)

Invite 20-50 trusted customers into the beta by adding their user IDs to the allowlist array in the targeting JSON. This is your first real-world signal. Monitor cost per request, error rates, and qualitative feedback from your beta users.

Phase 3: 1% Canary

Set rollout_pct = 1 and remove targeting overrides. One percent of your real user base is now hitting the AI path. At this stage you're validating your infrastructure: can your LLM provider handle the load? Are your timeouts and fallback paths working correctly?

Phase 4: Ramp to 10%, then 25%, then 50%

Each increment should be gated on a 24-48 hour observation window. Define your rollback criteria before you start ramping. For example: "If p99 latency exceeds 3 seconds or error rate exceeds 0.5%, we toggle the kill switch immediately." Having these thresholds written down prevents the heated debate that happens at 2am when something goes wrong.
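Those written-down rollback criteria can even live in code, so the 2am decision is mechanical rather than a debate. A hypothetical sketch, using the example thresholds above (the type names and numbers are illustrative, not recommendations):

```typescript
// Hypothetical rollback criteria, codified so the decision is mechanical.
interface RollbackCriteria {
  maxP99LatencyMs: number;
  maxErrorRate: number; // fraction, e.g. 0.005 = 0.5%
}

interface MetricsSnapshot {
  p99LatencyMs: number;
  errorRate: number;
}

// Returns the reasons to roll back, or an empty array if metrics are healthy.
function rollbackReasons(m: MetricsSnapshot, c: RollbackCriteria): string[] {
  const reasons: string[] = [];
  if (m.p99LatencyMs > c.maxP99LatencyMs)
    reasons.push(`p99 latency ${m.p99LatencyMs}ms exceeds ${c.maxP99LatencyMs}ms`);
  if (m.errorRate > c.maxErrorRate)
    reasons.push(`error rate ${m.errorRate} exceeds ${c.maxErrorRate}`);
  return reasons;
}

// The example thresholds from the text: 3s p99, 0.5% error rate.
const criteria: RollbackCriteria = { maxP99LatencyMs: 3000, maxErrorRate: 0.005 };
```

A cron job or alerting rule can call `rollbackReasons` against your metrics pipeline and page whoever owns the kill switch.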

Phase 5: 100% and Deprecate the Flag

Once you're at 100% with stable metrics for 7+ days, remove the flag check from your code and delete the row from the database. Dead flags are technical debt. A flag that stays in the codebase forever is a flag that will confuse someone six months from now.

Handling the Edge Cases You'll Actually Hit

What about anonymous users?

If you have unauthenticated users, generate a stable anonymous ID (stored in a cookie or local storage) and use that as the userId. The bucketing function doesn't care whether the ID maps to a real account; it just needs a stable string. When an anonymous user signs up, you can optionally carry their bucket assignment forward by mapping their anonymous ID to their new account ID in your user table.
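A minimal sketch of the anonymous-ID approach, assuming a cookie jar abstraction (the `CookieJar` interface and cookie name here are illustrative, not a real framework API):

```typescript
import { randomUUID } from "crypto";

// Sketch of a cookie abstraction; adapt to your framework's cookie API.
interface CookieJar {
  get(name: string): string | undefined;
  set(name: string, value: string): void;
}

const ANON_COOKIE = "anon_id"; // illustrative cookie name

// Returns a stable anonymous ID, minting one on the first visit.
function getOrCreateAnonymousId(cookies: CookieJar): string {
  const existing = cookies.get(ANON_COOKIE);
  if (existing) return existing;
  const fresh = `anon-${randomUUID()}`;
  cookies.set(ANON_COOKIE, fresh);
  return fresh;
}

// Usage with a Map-backed jar standing in for real cookies:
const jar = new Map<string, string>();
const cookies: CookieJar = {
  get: (n) => jar.get(n),
  set: (n, v) => { jar.set(n, v); },
};
const anonId = getOrCreateAnonymousId(cookies);
```

The resulting ID goes straight into `getBucketValue` as the `userId`.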

What if the database is unavailable?

Extend the evaluator to catch database errors and return false (fail closed). For higher-availability requirements, pre-warm the cache on startup by loading all flags into memory, and serve from cache even if the database is temporarily unreachable. This turns a hard dependency into a soft one.

// Add to FlagEvaluator constructor
async warmCache(): Promise<void> {
  const result = await this.db.query(
    "SELECT key, enabled, rollout_pct, targeting FROM feature_flags WHERE enabled = TRUE"
  );
  const now = Date.now();
  for (const flag of result.rows) {
    this.cache.set(flag.key, { flag, expiresAt: now + this.cacheTtlMs });
  }
}

What about server-side rendering or edge functions?

In SSR or edge environments where you can't maintain a long-lived in-process cache, push flag state to a fast key-value store like Redis or your CDN's edge config. Evaluate flags once per request at the edge and pass the results down as a context object. The bucketing function is pure and stateless, so it can run anywhere.
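The "evaluate once per request" pattern can be sketched as a pure function that turns the full flag list into a plain context object (a minimal sketch; real flags would also carry targeting rules):

```typescript
import { createHash } from "crypto";

type FlagContext = Record<string, boolean>;

function bucket(userId: string, flagKey: string): number {
  const hash = createHash("sha256").update(`${flagKey}::${userId}`).digest("hex");
  return (parseInt(hash.slice(0, 8), 16) / 0x100000000) * 100;
}

// Evaluates every flag once, producing a plain serializable object
// to pass down to SSR components or edge handlers.
function evaluateAll(
  flags: { key: string; enabled: boolean; rollout_pct: number }[],
  userId: string
): FlagContext {
  const ctx: FlagContext = {};
  for (const f of flags) {
    ctx[f.key] = f.enabled && bucket(userId, f.key) < f.rollout_pct;
  }
  return ctx;
}

const ctx = evaluateAll(
  [
    { key: "killed", enabled: false, rollout_pct: 100 },
    { key: "full", enabled: true, rollout_pct: 100 },
  ],
  "user-1"
);
```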

What This System Intentionally Doesn't Do

Honesty about scope is important. This system is deliberately minimal. It does not provide:

  • A built-in analytics dashboard. Track flag exposure events in your existing analytics pipeline by logging { flagKey, userId, result } whenever isEnabled is called. Feed that into your data warehouse.
  • Multivariate (A/B/C) flags. The percentage bucketing can be extended to return a variant string instead of a boolean, but that's a meaningful increase in complexity. Build it when you need it.
  • Real-time flag streaming. Changes propagate within your cache TTL window (30 seconds by default). If you need sub-second propagation, you need a more sophisticated system, and at that point a managed solution may genuinely be the right call.
  • Audit logs. Add a flag_audit_log table and a database trigger on feature_flags updates if your compliance requirements demand it. The schema is trivial; it's just not included here to keep the tutorial focused.
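As a taste of the multivariate extension mentioned above, the same deterministic bucketing can pick a weighted variant instead of answering yes or no. A sketch, assuming variant weights sum to 100:

```typescript
import { createHash } from "crypto";

interface Variant {
  name: string;
  weight: number; // percentage; weights across all variants should sum to 100
}

function bucketValue(userId: string, flagKey: string): number {
  const hash = createHash("sha256").update(`${flagKey}::${userId}`).digest("hex");
  return (parseInt(hash.slice(0, 8), 16) / 0x100000000) * 100;
}

// Walks the cumulative weights until the user's bucket falls inside a variant.
// Same stickiness property as the boolean rollout: a user's variant only
// changes if you reshuffle the weights past their bucket.
function getVariant(userId: string, flagKey: string, variants: Variant[]): string {
  const bucket = bucketValue(userId, flagKey);
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.name;
  }
  return variants[variants.length - 1].name; // guard against rounding drift
}

const variants: Variant[] = [
  { name: "control", weight: 50 },
  { name: "a", weight: 25 },
  { name: "b", weight: 25 },
];
```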

Conclusion: Own Your Infrastructure Where It Counts

The third-party feature flag market exists because feature flags feel complex. The reality is that the core primitive, a deterministic hash mapping a user to a bucket, is about 10 lines of code. Everything else is UI chrome and enterprise sales packaging.

For teams shipping AI features in 2026, where inference costs, model behavior, and data privacy are first-class concerns, owning your rollout infrastructure gives you control that a third-party SDK often can't match. You can tailor your targeting rules to your exact data model, integrate directly with your existing auth and observability stack, and eliminate an entire class of vendor outage risk from your critical path.

Build the simple thing. Add complexity only when the pain of the simple thing becomes real and specific. This system will serve most teams well past their first million users, and the day you genuinely outgrow it, you'll know exactly what you need from a replacement because you built the foundation yourself.

The code in this tutorial is written to be production-grade rather than merely illustrative. Drop it into your repo, write a migration, and you'll have a working feature flag system before your next standup.