Executive Summary

Analysis of 10,000 open-source AI/ML repositories reveals that 70% have critical or high-severity vulnerabilities in their GitHub Actions workflows, leaving them exposed to attacks such as code injection, credential theft, and repository takeover via malicious pull requests.

Top Issues (Prevalence):

  • Unpinned third-party actions: 68.4% (easy supply-chain attacks on model uploads/training).
  • Over-privileged GITHUB_TOKEN: 42.7% (unnecessary write access enables poisoning models/releases).
  • Script/command injection: 34.1% (exploits PR titles/bodies to steal tokens like HF_TOKEN).
  • Unsafe triggers: 27.2% (run untrusted fork code with full permissions).
  • Hard-coded/leaked secrets: 22.8% (exposes cloud creds for datasets/GPUs).

Mitiga Labs recommends a series of mitigations that harden workflows and shrink this attack surface. Read on to learn how.

Accelerating AI development and attacks with GitHub Actions

GitHub Actions has become the de facto automation engine for the entire AI development lifecycle: dataset preprocessing, distributed training on GPUs, model evaluation, artifact registration (Hugging Face, Weights & Biases, MLflow), container building, and automated deployment to inference endpoints.

While this level of automation accelerates research and production, it also turns every workflow file (‘.github/workflows/*.yml’) into a high-privilege attack surface. A single compromised third-party action or a subtle injection flaw can lead to theft of private datasets, backdooring of released models, or full repository takeover.

To quantify the real-world risk, Mitiga Labs conducted a large-scale static analysis across 10,000 publicly accessible AI and machine learning repositories (selected via GitHub topics such as ‘machine-learning,’ ‘deep-learning,’ ‘pytorch,’ ‘tensorflow,’ ‘llm,’ ‘transformers,’ ‘diffusion,’ ‘computer-vision,’ etc.). We parsed and evaluated every workflow file against known classes of GitHub Actions vulnerabilities.

The results are sobering: 70% of these 10,000 AI/ML repositories contain at least one workflow with a critical or high-severity issue. In many cases, a single malicious pull request from a fork would be sufficient to achieve remote code execution with repository write permissions.

Top 5 Vulnerability Classes in AI/ML Workflows (Ranked by Prevalence)

1. Unpinned or weakly pinned third-party actions (68.4%)
   Actions referenced via mutable tags (@v4, @main, @latest) or short SHAs instead of full 40-character commit hashes. If the tag’s repository is compromised, malicious code can run in every workflow, potentially poisoning model weights or stealing cloud credentials.

2. Overprivileged default GITHUB_TOKEN (42.7%)
   Workflows inherit full permissions or explicitly grant unnecessary write scopes. Even test-only workflows often allow writes, enabling repo takeover, model card poisoning, or publishing malicious releases with trojaned model files.

3. Script/command injection via untrusted inputs (34.1%)
   Interpolation of PR titles, issue bodies, or refs directly into run steps or GitHub scripts without sanitization. Attacker-controlled inputs can execute arbitrary OS commands and exfiltrate HF_TOKEN, WANDB_API_KEY, or cloud credentials.

4. Unsafe usage of pull_request_target or workflow_run (27.2%)
   Workflows run untrusted fork code with elevated permissions. A malicious fork can escalate from zero permissions to full repository access.

5. Hard-coded secrets and leakage patterns (22.8%)
   Secrets passed to untrusted actions, echoed in logs, stored unmasked, or printed in run steps. Leaked tokens for private model registries or GPU cloud accounts can lead to dataset theft.

Figure 1. Prevalence breakdown of the top issues uncovered across 10,000 open-source AI/ML repositories

1. Always Pin Actions to Full Commit SHAs

Example:

uses: actions/checkout@2541b1294d2704b0964813337f33b291d3f8596b   # Good: immutable full commit SHA
uses: actions/checkout@v4                                         # Dangerous: mutable tag can be repointed


Automate SHA updates with Dependabot (which supports the ‘github-actions’ ecosystem) or Renovate’s digest pinning, and consider ‘actions/dependency-review-action’ to flag risky dependency changes in PRs.
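For example, a minimal Dependabot configuration (a sketch; adjust the schedule to your needs) that keeps pinned action SHAs current:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "github-actions"   # covers .github/workflows/*.yml
    directory: "/"
    schedule:
      interval: "weekly"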

2. Enforce Explicit Least-Privilege Permissions

Start every workflow with:

permissions: read-all   # read-only access to every scope
# OR explicitly:
permissions:
  contents: read
  pull-requests: write # only if you comment/label

For jobs that truly need write access, isolate them in a separate workflow with required approvals.
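A minimal sketch of that isolation, assuming an environment named ‘release’ that you have configured with required reviewers in the repository settings:

jobs:
  publish:
    runs-on: ubuntu-latest
    environment: release          # deployment pauses for manual approval
    permissions:
      contents: write             # write scope exists only in this job
    steps:
      - uses: actions/checkout@v4     # pin to a full SHA in real use
      - run: ./scripts/publish.sh     # hypothetical release script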

3. Eliminate Script Injection Vectors

  • Never concatenate untrusted inputs directly into shell commands.
  • Pass untrusted input through environment variables with proper quoting, or use dedicated actions (see the sketch below).
  • Prefer ‘actions/github-script@v7’ with typed arguments instead of string interpolation.
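The canonical fix is to move the untrusted value out of the script body and into an environment variable, so the shell treats it as data rather than code:

# Vulnerable: the PR title is expanded directly into the script
- run: echo "Title: ${{ github.event.pull_request.title }}"

# Safer: the value arrives via the environment, not string interpolation
- run: echo "Title: $PR_TITLE"
  env:
    PR_TITLE: ${{ github.event.pull_request.title }}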

4. Replace or Harden Dangerous Triggers  

  • Prefer ‘pull_request’ over ‘pull_request_target’ whenever possible.
  • If ‘pull_request_target’ is unavoidable (e.g., testing on self-hosted GPU runners), never check out ‘${{ github.event.pull_request.head.sha }}’ automatically; the sketch below shows the anti-pattern to avoid.
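For reference, this is the deliberately vulnerable shape (a sketch of the anti-pattern, not something to copy):

# DANGEROUS: privileged trigger combined with checkout of fork code
on: pull_request_target          # runs with the base repo's secrets and token
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}   # attacker-controlled code
      - run: make test             # fork code now executes with elevated permissions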

5. Modern Secrets Handling

  • Switch to OpenID Connect (OIDC) for AWS/GCP/Azure so that no long-lived secrets are needed (see the sketch below).
  • Never pass secrets to steps running on untrusted code (forks).  
  • Enable GitHub’s built-in secret scanning and push protection.
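A minimal OIDC sketch for AWS using the official credentials action; the role ARN is hypothetical and must be created with a trust policy for GitHub’s OIDC provider:

permissions:
  id-token: write    # required to request the OIDC token
  contents: read
steps:
  - uses: aws-actions/configure-aws-credentials@v4   # pin to a full SHA in real use
    with:
      role-to-assume: arn:aws:iam::123456789012:role/gha-ci-role   # hypothetical role
      aws-region: us-east-1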

Even with these mitigations in place, risk persists. Attackers no longer need zero-days: in AI pipelines, the easiest entry points are workflow misconfigurations, unpinned third-party actions, and leaked secrets. GitHub Actions can turn automation into an attack surface. Writing secure YAML helps defend these pipelines, but it isn’t enough; you also need visibility, timeline reconstruction, and real-time breach prevention for when controls fail.


Harden your workflows, but know their limits

In the AI ecosystem, a compromised workflow means more than stolen code: it can mean stolen proprietary models, tampered training data, or millions in unauthorized cloud spend. Several high-profile AI repository compromises in 2024–2025 were enabled by exactly these patterns; we are no longer operating in the realm of the theoretical.

Given that recent history, hardening your GitHub Actions workflows is not optional. Treat every third-party action as untrusted code, every untrusted input as malicious, and every default permission as a potential breach. Implementing the mitigations above dramatically reduces the attack surface while adding virtually zero friction to legitimate development.

But controls eventually fail, and when they do, you will need more than hygiene: the ability to decode attacks in real time, trace their movement across workflows, and stop them before they do damage.

Secure your pipelines today—because tomorrow someone else might be running code in them.

LAST UPDATED: December 9, 2025
