·Pramod Dutta

How to Handle Flaky Tests in GitHub Actions, Jenkins, and GitLab CI

Learn platform-specific strategies for managing flaky tests in CI/CD pipelines. Covers retry configurations, quarantine workflows, flaky test dashboards, and DeFlaky CI integration with YAML examples for GitHub Actions, Jenkins, and GitLab CI.

flaky tests CI/CDgithub actions flakyjenkins flaky testsgitlab ci flakyci test retrytest quarantineflaky test dashboardci pipeline reliabilitytest stability cideflaky ci integration

How to Handle Flaky Tests in GitHub Actions, Jenkins, and GitLab CI

Flaky tests are annoying in any context, but they are especially destructive in CI/CD pipelines. A single flaky test can block an entire deployment, waste CI compute minutes, and force developers to repeatedly push empty commits or click "re-run" just to get a green build. Across the industry, engineering teams report that flaky tests account for 15-30% of all CI failures.

This guide provides platform-specific, copy-paste-ready solutions for handling flaky tests in the three most popular CI/CD platforms: GitHub Actions, Jenkins, and GitLab CI. We will cover retry strategies, quarantine workflows, monitoring dashboards, and how to integrate flaky test detection tools like DeFlaky into your pipeline.

The Cost of Flaky Tests in CI/CD

Before diving into solutions, let us quantify the problem. Consider a team with:

  • 2,000 tests in the suite
  • A 2% flake rate (40 flaky tests)
  • 50 CI runs per day
  • Average pipeline duration of 20 minutes

With a 2% flake rate across 2,000 tests and 50 daily runs, the probability that at least one test flakes in any given run is extremely high. In practice, this means:

  • 10-15 pipeline failures per day from flaky tests (not real bugs)
  • 3-5 hours of developer time wasted daily investigating and re-running
  • Increased CI costs from retry runs consuming compute resources
  • Slower release velocity as teams wait for "clean" builds
  • Trust erosion as developers start ignoring test failures entirely
  • The last point is the most dangerous. Once a team stops trusting their CI pipeline, they start merging code without green builds, which defeats the entire purpose of automated testing.

    Strategy Overview: The Three-Layer Approach

    Effective flaky test management in CI requires three layers:

  • Detection: Identify which tests are flaky before they cause problems
  • Mitigation: Implement retries and quarantines to prevent flaky tests from blocking deployments
  • Resolution: Track, prioritize, and fix flaky tests systematically
  • Each CI platform has different mechanisms for implementing these layers. Let us walk through each one.

    GitHub Actions: Flaky Test Management

    GitHub Actions is the most widely used CI platform for open-source and many commercial projects. Here is how to handle flaky tests effectively.

    Basic Retry Configuration

    GitHub Actions does not have native test-level retry support, but you can implement it at multiple levels.

    Job-level retry with a reusable workflow:
    # .github/workflows/test.yml
    

    name: Test Suite

    on: [push, pull_request]

    jobs:

    test:

    runs-on: ubuntu-latest

    strategy:

    fail-fast: false

    matrix:

    attempt: [1, 2, 3]

    steps:

    - uses: actions/checkout@v4

    - name: Setup Node.js

    uses: actions/setup-node@v4

    with:

    node-version: '20'

    cache: 'npm'

    - name: Install dependencies

    run: npm ci

    - name: Run tests

    id: test

    run: npm test

    continue-on-error: ${{ matrix.attempt < 3 }}

    - name: Check test result

    if: steps.test.outcome == 'failure' && matrix.attempt == 3

    run: exit 1

    This approach is wasteful because it runs the entire suite three times. A better strategy is framework-level retry.

    Framework-level retry (recommended):
    # .github/workflows/test.yml
    

    name: Test Suite

    on: [push, pull_request]

    jobs:

    test:

    runs-on: ubuntu-latest

    steps:

    - uses: actions/checkout@v4

    - name: Setup Node.js

    uses: actions/setup-node@v4

    with:

    node-version: '20'

    cache: 'npm'

    - name: Install dependencies

    run: npm ci

    - name: Run tests with retry

    run: |

    # Jest with retry

    npx jest --forceExit --detectOpenHandles 2>&1 | tee test-output.txt

    if [ $? -ne 0 ]; then

    echo "::warning::First run failed, retrying failed tests..."

    # Extract failed test files and rerun only those

    FAILED=$(grep -E "FAIL " test-output.txt | awk '{print $2}')

    if [ -n "$FAILED" ]; then

    npx jest $FAILED --forceExit --detectOpenHandles

    fi

    fi

    - name: Upload test results

    if: always()

    uses: actions/upload-artifact@v4

    with:

    name: test-results

    path: test-output.txt

    Quarantine Workflow for GitHub Actions

    A quarantine workflow separates known flaky tests from the main pipeline. The main pipeline skips quarantined tests, while a separate workflow runs them and reports results without blocking merges:

    # .github/workflows/main-tests.yml
    

    name: Main Test Suite

    on: [push, pull_request]

    jobs:

    test:

    runs-on: ubuntu-latest

    steps:

    - uses: actions/checkout@v4

    - name: Setup

    uses: actions/setup-node@v4

    with:

    node-version: '20'

    cache: 'npm'

    - run: npm ci

    - name: Run stable tests (skip quarantined)

    run: |

    npx jest --testPathIgnorePatterns="$(cat .quarantine | tr '\n' '|' | sed 's/|$//')"

    env:

    CI: true


    .github/workflows/quarantine-tests.yml

    name: Quarantined Tests

    on:

    push:

    branches: [main]

    schedule:

    - cron: '0 /6 ' # Run every 6 hours

    jobs:

    quarantine:

    runs-on: ubuntu-latest

    steps:

    - uses: actions/checkout@v4

    - name: Setup

    uses: actions/setup-node@v4

    with:

    node-version: '20'

    cache: 'npm'

    - run: npm ci

    - name: Run quarantined tests (10 times each)

    run: |

    echo "## Quarantined Test Report" > quarantine-report.md

    echo "" >> quarantine-report.md

    echo "| Test | Pass Rate | Status |" >> quarantine-report.md

    echo "|------|-----------|--------|" >> quarantine-report.md

    while IFS= read -r test; do

    passes=0

    total=10

    for i in $(seq 1 $total); do

    if npx jest "$test" --forceExit 2>/dev/null; then

    passes=$((passes + 1))

    fi

    done

    rate=$((passes * 100 / total))

    if [ "$rate" -eq 100 ]; then

    status="Ready to unquarantine"

    elif [ "$rate" -ge 80 ]; then

    status="Improving"

    else

    status="Still flaky"

    fi

    echo "| $test | ${rate}% | $status |" >> quarantine-report.md

    done < .quarantine

    cat quarantine-report.md >> $GITHUB_STEP_SUMMARY

    - name: Create issue for flaky tests

    if: always()

    uses: actions/github-script@v7

    with:

    script: |

    const fs = require('fs');

    const report = fs.readFileSync('quarantine-report.md', 'utf8');

    // Find existing issue or create new one

    const issues = await github.rest.issues.listForRepo({

    owner: context.repo.owner,

    repo: context.repo.repo,

    labels: 'flaky-tests',

    state: 'open',

    });

    if (issues.data.length > 0) {

    await github.rest.issues.createComment({

    owner: context.repo.owner,

    repo: context.repo.repo,

    issue_number: issues.data[0].number,

    body: ## Quarantine Report - ${new Date().toISOString().split('T')[0]}\n\n${report},

    });

    } else {

    await github.rest.issues.create({

    owner: context.repo.owner,

    repo: context.repo.repo,

    title: 'Flaky Test Quarantine Report',

    body: report,

    labels: ['flaky-tests'],

    });

    }

    The quarantine file (.quarantine):
    src/__tests__/checkout.test.js
    

    src/__tests__/payment-integration.test.js

    src/__tests__/websocket-notifications.test.js

    GitHub Actions Flaky Test Dashboard

    Use GitHub Actions job summaries to create a lightweight flaky test dashboard:

    # .github/workflows/flaky-dashboard.yml
    

    name: Flaky Test Dashboard

    on:

    schedule:

    - cron: '0 6 1' # Weekly on Monday mornings

    workflow_dispatch:

    jobs:

    dashboard:

    runs-on: ubuntu-latest

    steps:

    - uses: actions/checkout@v4

    - name: Analyze recent test runs

    uses: actions/github-script@v7

    with:

    script: |

    const runs = await github.rest.actions.listWorkflowRuns({

    owner: context.repo.owner,

    repo: context.repo.repo,

    workflow_id: 'test.yml',

    per_page: 100,

    status: 'completed',

    });

    const totalRuns = runs.data.workflow_runs.length;

    const failedRuns = runs.data.workflow_runs.filter(r => r.conclusion === 'failure').length;

    const successRate = ((totalRuns - failedRuns) / totalRuns * 100).toFixed(1);

    let summary = # Flaky Test Dashboard\n\n;

    summary += Period: Last ${totalRuns} runs\n\n;

    summary += Pipeline Success Rate: ${successRate}%\n\n;

    summary += Failed Runs: ${failedRuns}/${totalRuns}\n\n;

    if (successRate < 95) {

    summary += > Warning: Pipeline reliability is below 95%. Investigate flaky tests.\n\n;

    }

    await core.summary.addRaw(summary).write();

    Jenkins: Flaky Test Management

    Jenkins has mature flaky test handling capabilities through plugins and its scripted pipeline syntax.

    Jenkins Test Retry with the Flaky Test Handler Plugin

    Jenkins has a dedicated plugin for flaky tests:

    // Jenkinsfile
    

    pipeline {

    agent any

    options {

    // Retry the entire pipeline up to 2 times

    retry(2)

    }

    stages {

    stage('Setup') {

    steps {

    checkout scm

    sh 'npm ci'

    }

    }

    stage('Test') {

    steps {

    script {

    def testResult = sh(

    script: 'npm test -- --ci --reporters=default --reporters=jest-junit',

    returnStatus: true

    )

    if (testResult != 0) {

    echo 'Tests failed, running retry for failed tests...'

    sh '''

    # Extract failed test files from JUnit report

    FAILED=$(grep -l 'failures="[1-9]' test-results/*.xml | \

    xargs grep 'classname=' | \

    sed 's/.classname="\\([^"]\\)".*/\\1/' | \

    sort -u)

    if [ -n "$FAILED" ]; then

    npx jest $FAILED --ci --reporters=jest-junit

    fi

    '''

    }

    }

    }

    post {

    always {

    junit 'test-results/*/.xml'

    }

    }

    }

    }

    }

    Advanced Jenkins Pipeline with Quarantine

    // Jenkinsfile with quarantine support
    

    pipeline {

    agent any

    environment {

    QUARANTINE_FILE = '.quarantine'

    }

    stages {

    stage('Setup') {

    steps {

    checkout scm

    sh 'npm ci'

    }

    }

    stage('Run Stable Tests') {

    steps {

    script {

    def quarantined = ''

    if (fileExists(env.QUARANTINE_FILE)) {

    quarantined = readFile(env.QUARANTINE_FILE)

    .trim()

    .split('\n')

    .collect { "--testPathIgnorePatterns='${it}'" }

    .join(' ')

    }

    sh """

    npx jest --ci --reporters=jest-junit \

    --outputFile=stable-results.xml \

    ${quarantined}

    """

    }

    }

    post {

    always {

    junit 'stable-results.xml'

    }

    }

    }

    stage('Run Quarantined Tests') {

    when {

    branch 'main'

    }

    steps {

    script {

    if (fileExists(env.QUARANTINE_FILE)) {

    def tests = readFile(env.QUARANTINE_FILE).trim().split('\n')

    def results = [:]

    tests.each { test ->

    def passes = 0

    def runs = 5

    for (int i = 0; i < runs; i++) {

    def exitCode = sh(

    script: "npx jest '${test}' --ci --forceExit 2>/dev/null",

    returnStatus: true

    )

    if (exitCode == 0) passes++

    }

    results[test] = [passes: passes, runs: runs]

    }

    // Generate report

    def report = "Quarantine Report\\n"

    report += "=================\\n\\n"

    results.each { test, data ->

    def rate = (data.passes * 100 / data.runs) as int

    report += "${test}: ${rate}% pass rate (${data.passes}/${data.runs})\\n"

    }

    echo report

    }

    }

    }

    }

    }

    post {

    failure {

    script {

    // Send Slack notification for flaky test failures

    if (currentBuild.previousBuild?.result == 'SUCCESS') {

    slackSend(

    channel: '#test-reliability',

    color: 'warning',

    message: "Pipeline failed after previous success - possible flaky test: ${env.BUILD_URL}"

    )

    }

    }

    }

    }

    }

    Jenkins Flaky Test Detection with DeFlaky

    // Jenkinsfile with DeFlaky integration
    

    pipeline {

    agent any

    stages {

    stage('Setup') {

    steps {

    checkout scm

    sh 'npm ci'

    sh 'npm install -g deflaky'

    }

    }

    stage('Test') {

    steps {

    sh 'npx jest --ci --reporters=jest-junit'

    }

    post {

    always {

    junit 'junit.xml'

    }

    }

    }

    stage('Flaky Test Analysis') {

    when {

    branch 'main'

    expression {

    // Run analysis on scheduled builds or manually

    return params.ANALYZE_FLAKY ?: (env.BUILD_NUMBER.toInteger() % 10 == 0)

    }

    }

    steps {

    sh '''

    deflaky analyze \

    --framework jest \

    --runs 10 \

    --output deflaky-report.json \

    --format junit

    '''

    }

    post {

    always {

    archiveArtifacts 'deflaky-report.json'

    script {

    def report = readJSON file: 'deflaky-report.json'

    def flakyCount = report.flaky_tests?.size() ?: 0

    if (flakyCount > 0) {

    echo "Found ${flakyCount} flaky tests"

    // Update the quarantine file

    def quarantine = report.flaky_tests

    .findAll { it.flake_score > 0.1 }

    .collect { it.test_file }

    .join('\n')

    writeFile file: '.quarantine', text: quarantine

    }

    }

    }

    }

    }

    }

    }

    Jenkins Shared Library for Flaky Test Handling

    For teams with multiple Jenkins projects, create a shared library:

    // vars/testWithRetry.groovy
    

    def call(Map config = [:]) {

    def maxRetries = config.maxRetries ?: 2

    def testCommand = config.command ?: 'npm test'

    def reportPath = config.reportPath ?: 'test-results/*/.xml'

    def attempt = 0

    def success = false

    while (attempt <= maxRetries && !success) {

    attempt++

    echo "Test attempt ${attempt}/${maxRetries + 1}"

    def exitCode = sh(script: testCommand, returnStatus: true)

    if (exitCode == 0) {

    success = true

    if (attempt > 1) {

    echo "WARNING: Tests passed on attempt ${attempt} - flaky tests detected"

    // Tag the build

    currentBuild.description = "Flaky (passed on attempt ${attempt})"

    }

    } else if (attempt <= maxRetries) {

    echo "Tests failed, retrying..."

    }

    }

    junit reportPath

    if (!success) {

    error "Tests failed after ${maxRetries + 1} attempts"

    }

    }

    // Usage in Jenkinsfile:

    // testWithRetry(command: 'npx jest --ci', maxRetries: 2)

    GitLab CI: Flaky Test Management

    GitLab CI has the most mature built-in support for flaky tests among the three platforms, with native retry configuration and a flaky test reporting feature.

    GitLab CI Retry Configuration

    GitLab CI supports test-level retry natively:

    # .gitlab-ci.yml
    

    stages:

    - test

    - analyze

    variables:

    NODE_ENV: test

    .test-template: &test-template

    image: node:20

    before_script:

    - npm ci --cache .npm

    cache:

    key: $CI_COMMIT_REF_SLUG

    paths:

    - .npm

    artifacts:

    when: always

    reports:

    junit: junit.xml

    paths:

    - coverage/

    expire_in: 30 days

    Main test job with retry

    test:

    <<: *test-template

    stage: test

    retry:

    max: 2

    when:

    - script_failure

    - runner_system_failure

    - stuck_or_timeout_failure

    script:

    - npx jest --ci --reporters=default --reporters=jest-junit

    rules:

    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

    Parallel test execution with retry

    test-parallel:

    <<: *test-template

    stage: test

    parallel: 4

    retry:

    max: 2

    when:

    - script_failure

    script:

    - |

    # Split tests across parallel jobs

    TOTAL_JOBS=$CI_NODE_TOTAL

    CURRENT_JOB=$CI_NODE_INDEX

    # Get all test files and select this job's portion

    ALL_TESTS=$(find src -name "*.test.js" | sort)

    SELECTED_TESTS=$(echo "$ALL_TESTS" | awk "NR % $TOTAL_JOBS == $CURRENT_JOB - 1")

    if [ -n "$SELECTED_TESTS" ]; then

    npx jest $SELECTED_TESTS --ci --reporters=jest-junit

    fi

    GitLab CI Quarantine Pipeline

    # .gitlab-ci.yml
    
    

    Run stable tests on every MR

    stable-tests:

    <<: *test-template

    stage: test

    script:

    - |

    if [ -f .quarantine ]; then

    IGNORE_PATTERN=$(cat .quarantine | tr '\n' '|' | sed 's/|$//')

    npx jest --ci --reporters=jest-junit \

    --testPathIgnorePatterns="$IGNORE_PATTERN"

    else

    npx jest --ci --reporters=jest-junit

    fi

    rules:

    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

    Run quarantined tests separately (non-blocking)

    quarantine-tests:

    <<: *test-template

    stage: test

    allow_failure: true # Don't block the pipeline

    script:

    - |

    if [ ! -f .quarantine ]; then

    echo "No quarantined tests"

    exit 0

    fi

    echo "Running quarantined tests..."

    QUARANTINED=$(cat .quarantine)

    REPORT="# Quarantine Report\n\n"

    REPORT+="| Test File | Pass Rate | Recommendation |\n"

    REPORT+="|-----------|-----------|----------------|\n"

    for test in $QUARANTINED; do

    PASSES=0

    RUNS=10

    for i in $(seq 1 $RUNS); do

    if npx jest "$test" --ci --forceExit 2>/dev/null; then

    PASSES=$((PASSES + 1))

    fi

    done

    RATE=$((PASSES * 100 / RUNS))

    if [ "$RATE" -eq 100 ]; then

    REC="Remove from quarantine"

    elif [ "$RATE" -ge 80 ]; then

    REC="Improving - monitor"

    elif [ "$RATE" -ge 50 ]; then

    REC="Needs investigation"

    else

    REC="Critical - fix urgently"

    fi

    REPORT+="| $test | ${RATE}% | $REC |\n"

    done

    echo -e "$REPORT"

    echo -e "$REPORT" > quarantine-report.md

    artifacts:

    paths:

    - quarantine-report.md

    expire_in: 7 days

    rules:

    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

    - if: $CI_PIPELINE_SOURCE == "schedule"

    Scheduled flaky test analysis

    flaky-analysis:

    <<: *test-template

    stage: analyze

    script:

    - npm install -g deflaky

    - |

    deflaky analyze \

    --framework jest \

    --runs 15 \

    --output deflaky-report.json \

    --threshold 0.05

    - |

    # Update quarantine file based on DeFlaky results

    deflaky quarantine \

    --input deflaky-report.json \

    --output .quarantine \

    --threshold 0.1

    - |

    # Commit updated quarantine file if changed

    if git diff --quiet .quarantine 2>/dev/null; then

    echo "No quarantine changes"

    else

    git config user.email "ci@example.com"

    git config user.name "CI Bot"

    git add .quarantine

    git commit -m "chore: update flaky test quarantine list"

    git push "https://oauth2:${CI_PUSH_TOKEN}@${CI_SERVER_HOST}/${CI_PROJECT_PATH}.git" HEAD:${CI_COMMIT_REF_NAME}

    fi

    artifacts:

    paths:

    - deflaky-report.json

    expire_in: 30 days

    rules:

    - if: $CI_PIPELINE_SOURCE == "schedule"

    when: always

    GitLab CI Unit Test Report with Flaky Detection

    GitLab CI has built-in support for identifying flaky tests through its Unit Test Reports feature:

    # .gitlab-ci.yml
    

    test:

    script:

    - npx jest --ci --reporters=jest-junit

    artifacts:

    when: always

    reports:

    junit: junit.xml

    When a test fails on one retry but passes on another, GitLab automatically marks it as "flaky" in the merge request UI. This is visible in the Test Report tab of the pipeline.

    To enhance this with more detailed tracking:

    test-with-tracking:
    

    stage: test

    script:

    - |

    # Run tests and capture results

    npx jest --ci --reporters=jest-junit --outputFile=junit-attempt1.xml 2>&1 || true

    # If there were failures, rerun failed tests

    FAILED=$(grep 'failures="[1-9]' junit-attempt1.xml 2>/dev/null | wc -l)

    if [ "$FAILED" -gt 0 ]; then

    echo "Retrying failed tests..."

    FAILED_FILES=$(grep -oP 'classname="\K[^"]+' junit-attempt1.xml | sort -u)

    npx jest $FAILED_FILES --ci --reporters=jest-junit --outputFile=junit-attempt2.xml

    # Compare results

    STILL_FAILING=$(grep 'failures="[1-9]' junit-attempt2.xml 2>/dev/null | wc -l)

    FLAKY=$((FAILED - STILL_FAILING))

    echo "Results: $FAILED initial failures, $FLAKY flaky, $STILL_FAILING real failures"

    if [ "$STILL_FAILING" -gt 0 ]; then

    exit 1

    fi

    fi

    artifacts:

    when: always

    reports:

    junit:

    - junit-attempt1.xml

    - junit-attempt2.xml

    Cross-Platform Strategies

    Some strategies work across all CI platforms. These are the most valuable because they are portable.

    Strategy 1: Smart Test Splitting by Flakiness

    Instead of splitting tests randomly or by file count, split them by reliability:

    #!/bin/bash
    

    split-tests.sh - Used by all CI platforms

    Read flakiness data from DeFlaky

    FLAKY_TESTS=$(deflaky list --status flaky --format paths)

    STABLE_TESTS=$(deflaky list --status stable --format paths)

    if [ "$1" = "stable" ]; then

    echo "$STABLE_TESTS"

    elif [ "$1" = "flaky" ]; then

    echo "$FLAKY_TESTS"

    fi

    Strategy 2: Test Impact Analysis

    Only run tests affected by the code changes in the current PR:

    # Works in any CI platform
    

    GitHub Actions example:

    • name: Get changed files

    id: changed

    run: |

    CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...HEAD)

    echo "files=$CHANGED" >> $GITHUB_OUTPUT

    • name: Run affected tests

    run: |

    # Use Jest's --findRelatedTests to only run tests affected by changes

    npx jest --findRelatedTests ${{ steps.changed.outputs.files }}

    Strategy 3: Automatic Quarantine Management

    Create a script that automatically manages the quarantine list:

    #!/bin/bash
    

    manage-quarantine.sh

    ACTION=$1 # add, remove, check, report

    QUARANTINE_FILE=".quarantine"

    case $ACTION in

    add)

    TEST_FILE=$2

    REASON=$3

    echo "${TEST_FILE} # ${REASON} $(date -I)" >> $QUARANTINE_FILE

    sort -u -o $QUARANTINE_FILE $QUARANTINE_FILE

    echo "Added $TEST_FILE to quarantine"

    ;;

    remove)

    TEST_FILE=$2

    grep -v "^${TEST_FILE}" $QUARANTINE_FILE > tmp && mv tmp $QUARANTINE_FILE

    echo "Removed $TEST_FILE from quarantine"

    ;;

    check)

    TEST_FILE=$2

    if grep -q "^${TEST_FILE}" $QUARANTINE_FILE 2>/dev/null; then

    echo "QUARANTINED"

    exit 0

    else

    echo "ACTIVE"

    exit 1

    fi

    ;;

    report)

    echo "=== Quarantine Report ==="

    echo "Total quarantined: $(wc -l < $QUARANTINE_FILE)"

    echo ""

    echo "Quarantined tests:"

    cat $QUARANTINE_FILE

    echo ""

    # Check age of quarantined tests

    echo "Tests quarantined for more than 7 days:"

    while IFS= read -r line; do

    DATE=$(echo "$line" | grep -oP '\d{4}-\d{2}-\d{2}$')

    if [ -n "$DATE" ]; then

    AGE=$(( ($(date +%s) - $(date -d "$DATE" +%s)) / 86400 ))

    if [ "$AGE" -gt 7 ]; then

    echo " WARNING: $line ($AGE days old)"

    fi

    fi

    done < $QUARANTINE_FILE

    ;;

    esac

    Strategy 4: Flaky Test Notifications

    Set up notifications that alert the right people when flaky tests are detected:

    # GitHub Actions notification
    
    • name: Notify on flaky tests

    if: steps.test.outcome == 'failure' && steps.retry.outcome == 'success'

    uses: slackapi/slack-github-action@v1

    with:

    payload: |

    {

    "channel": "#test-reliability",

    "text": "Flaky test detected in ${{ github.repository }}",

    "blocks": [

    {

    "type": "section",

    "text": {

    "type": "mrkdwn",

    "text": "Flaky Test Detected\nRepo: ${{ github.repository }}\nBranch: ${{ github.ref_name }}\nPR: ${{ github.event.pull_request.html_url || 'N/A' }}\nPipeline: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

    }

    }

    ]

    }

    env:

    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

    DeFlaky CI Integration (All Platforms)

    DeFlaky provides first-class CI integration that works across GitHub Actions, Jenkins, and GitLab CI. Here is how to set it up for each platform.

    GitHub Actions + DeFlaky

    # .github/workflows/deflaky.yml
    

    name: DeFlaky Analysis

    on:

    schedule:

    - cron: '0 2 *' # Nightly at 2 AM

    workflow_dispatch:

    jobs:

    analyze:

    runs-on: ubuntu-latest

    steps:

    - uses: actions/checkout@v4

    - uses: actions/setup-node@v4

    with:

    node-version: '20'

    cache: 'npm'

    - run: npm ci

    - run: npm install -g deflaky

    - name: Run DeFlaky Analysis

    run: |

    deflaky analyze \

    --framework jest \

    --runs 15 \

    --output report.json \

    --ci github-actions

    env:

    DEFLAKY_API_KEY: ${{ secrets.DEFLAKY_API_KEY }}

    - name: Generate Dashboard

    run: deflaky dashboard --input report.json --output dashboard.html

    - name: Deploy Dashboard

    uses: peaceiris/actions-gh-pages@v3

    with:

    github_token: ${{ secrets.GITHUB_TOKEN }}

    publish_dir: ./dashboard.html

    destination_dir: flaky-dashboard

    - name: Comment on Recent PRs

    run: |

    deflaky notify \

    --input report.json \

    --format github-pr \

    --token ${{ secrets.GITHUB_TOKEN }}

    Jenkins + DeFlaky

    // Jenkinsfile
    

    pipeline {

    agent any

    triggers {

    cron('H 2 *') // Nightly

    }

    stages {

    stage('DeFlaky Analysis') {

    steps {

    checkout scm

    sh 'npm ci'

    sh 'npm install -g deflaky'

    sh '''

    deflaky analyze \

    --framework jest \

    --runs 15 \

    --output report.json \

    --ci jenkins

    '''

    sh 'deflaky dashboard --input report.json --output dashboard.html'

    }

    post {

    always {

    archiveArtifacts 'report.json, dashboard.html'

    publishHTML([

    allowMissing: false,

    reportDir: '.',

    reportFiles: 'dashboard.html',

    reportName: 'Flaky Test Dashboard'

    ])

    }

    }

    }

    }

    }

    GitLab CI + DeFlaky

    # .gitlab-ci.yml
    

    deflaky-analysis:

    stage: analyze

    image: node:20

    before_script:

    - npm ci

    - npm install -g deflaky

    script:

    - |

    deflaky analyze \

    --framework jest \

    --runs 15 \

    --output report.json \

    --ci gitlab

    - deflaky dashboard --input report.json --output public/index.html

    artifacts:

    paths:

    - report.json

    - public/

    expire_in: 30 days

    pages:

    stage: deploy

    script:

    - echo "Deploying dashboard to GitLab Pages"

    artifacts:

    paths:

    - public

    rules:

    - if: $CI_PIPELINE_SOURCE == "schedule"

    Building a Flaky Test Dashboard

    Regardless of your CI platform, you need visibility into test reliability trends. Here is how to build a lightweight dashboard.

    Data Collection

    Store test results in a structured format after every CI run:

    #!/bin/bash
    

    collect-test-data.sh

    Run after every test execution

    TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

    BRANCH=$CI_COMMIT_BRANCH

    COMMIT=$CI_COMMIT_SHA

    BUILD_ID=$CI_BUILD_ID

    Parse JUnit XML results

    node -e "

    const fs = require('fs');

    const xml = fs.readFileSync('junit.xml', 'utf8');

    // Simple XML parsing for test results

    const tests = xml.match(/]*>/g) || [];

    const results = tests.map(tc => {

    const name = tc.match(/name=\"([^\"]+)\"/)?.[1] || 'unknown';

    const classname = tc.match(/classname=\"([^\"]+)\"/)?.[1] || 'unknown';

    const time = parseFloat(tc.match(/time=\"([^\"]+)\"/)?.[1] || '0');

    const failed = xml.includes('', xml.indexOf(tc));

    return {

    name,

    classname,

    time,

    status: failed ? 'fail' : 'pass',

    timestamp: '${TIMESTAMP}',

    branch: '${BRANCH}',

    commit: '${COMMIT}',

    build: '${BUILD_ID}'

    };

    });

    // Append to JSONL file

    const lines = results.map(r => JSON.stringify(r)).join('\n');

    fs.appendFileSync('test-history.jsonl', lines + '\n');

    "

    Analysis and Visualization

    // analyze-flakiness.js
    

    const fs = require('fs');

    const readline = require('readline');

    async function analyzeFlakiness() {

    const testHistory = {};

    const lines = fs.readFileSync('test-history.jsonl', 'utf8').split('\n').filter(Boolean);

    lines.forEach(line => {

    const result = JSON.parse(line);

    const key = ${result.classname}::${result.name};

    if (!testHistory[key]) {

    testHistory[key] = { passes: 0, failures: 0, runs: 0, times: [] };

    }

    testHistory[key].runs++;

    testHistory[key].times.push(result.time);

    if (result.status === 'pass') {

    testHistory[key].passes++;

    } else {

    testHistory[key].failures++;

    }

    });

    // Calculate flake rates

    const flakeReport = Object.entries(testHistory)

    .map(([name, data]) => ({

    name,

    runs: data.runs,

    flakeRate: data.failures / data.runs,

    avgTime: data.times.reduce((a, b) => a + b, 0) / data.times.length,

    isFlaky: data.failures > 0 && data.passes > 0,

    }))

    .filter(t => t.isFlaky)

    .sort((a, b) => b.flakeRate - a.flakeRate);

    console.log('Flaky Tests Report');

    console.log('==================');

    console.log(Total tests analyzed: ${Object.keys(testHistory).length});

    console.log(Flaky tests found: ${flakeReport.length});

    console.log('');

    flakeReport.forEach(t => {

    console.log(${t.name});

    console.log( Flake rate: ${(t.flakeRate * 100).toFixed(1)}%);

    console.log( Runs: ${t.runs});

    console.log( Avg time: ${t.avgTime.toFixed(2)}s);

    console.log('');

    });

    return flakeReport;

    }

    analyzeFlakiness();

    Best Practices for CI Flaky Test Management

    1. Never Ignore Flaky Tests

    Ignoring flaky tests leads to "test blindness" where developers stop paying attention to test results. Every flaky test should be either fixed immediately or quarantined with a tracking issue.

    2. Set a Flake Budget

    Define an acceptable flake rate for your pipeline (for example, less than 1% of tests) and treat exceeding this budget as seriously as a production incident. DeFlaky can enforce this automatically.

    3. Use Deterministic Test Ordering in CI

    Random test ordering (available in most frameworks) helps catch ordering-dependent flakiness. Run tests in random order in CI but use a fixed seed that you can reproduce:

    # Use a deterministic random seed based on the commit
    
    • run: npx jest --randomize --seed=${{ github.sha }}

    4. Monitor CI Infrastructure Health

    Sometimes "flaky tests" are actually CI infrastructure problems:

  • Insufficient resources: Tests pass locally but fail in CI due to limited CPU or memory
  • Network issues: Flaky network in CI data centers
  • Docker layer caching: Stale caches causing inconsistent environments
  • Track CI machine metrics alongside test results to distinguish test flakiness from infrastructure flakiness.

    5. Implement Progressive Test Suites

    Structure your CI pipeline with progressive confidence levels:

    # Fast, stable unit tests run first
    

    unit-tests:

    stage: test

    timeout: 5 minutes

    script: npx jest --testPathPattern='unit'

    Integration tests run after units pass

    integration-tests:

    stage: test

    needs: [unit-tests]

    timeout: 15 minutes

    retry:

    max: 1

    when: script_failure

    script: npx jest --testPathPattern='integration'

    E2E tests run last, with more retry tolerance

    e2e-tests:

    stage: test

    needs: [integration-tests]

    timeout: 30 minutes

    retry:

    max: 2

    when: script_failure

    allow_failure: false

    script: npx cypress run

    6. Correlate Failures Across Builds

    A test that fails once is not necessarily flaky. Track failure patterns over time:

    • A test that fails consistently is broken, not flaky
    • A test that fails 1 in 10 times is flaky
    • A test that fails at the same time each day likely depends on an external service
    • A test that fails only in parallel runs likely has an isolation issue

    DeFlaky performs this correlation analysis automatically, saving your team from building and maintaining custom analysis scripts.

    Measuring CI Pipeline Reliability

    Track these metrics at the pipeline level:

    | Metric | Definition | Target |

    |--------|-----------|--------|

    | Pipeline Success Rate | Successful runs / Total runs | > 98% |

    | Flaky Failure Rate | Flaky failures / Total failures | < 20% |

    | Mean Time to Green | Average time from push to green build | < 15 min |

    | Retry Rate | Runs needing retry / Total runs | < 5% |

    | Quarantine Size | Number of quarantined tests | < 2% of suite |

    | Quarantine Age | Average days a test stays quarantined | < 7 days |

    Conclusion

    Flaky tests in CI/CD pipelines are a team-wide productivity problem that demands a systematic solution. No single retry configuration or quarantine mechanism is sufficient. You need all three layers: detection (finding flaky tests early), mitigation (preventing them from blocking work), and resolution (fixing the root cause).

    GitHub Actions, Jenkins, and GitLab CI each provide different mechanisms for implementing these layers, but the core strategy is the same: run tests with retries, quarantine persistent offenders, monitor trends, and fix the root causes. Tools like DeFlaky automate the detection and monitoring layers, freeing your team to focus on what matters most -- writing reliable tests and shipping reliable software.

    Start by implementing the retry and quarantine configurations for your CI platform, then add automated detection and monitoring. Within a few weeks, you will see a measurable improvement in pipeline reliability, developer productivity, and team confidence in your test suite.

    Stop guessing. DeFlaky your tests.

    Detect flaky tests in minutes with a single CLI command.