Planting false realities in the AI training pipeline: Brainjacked

Executive Summary

The Mitiga team observed a threat actor actively exploiting a path traversal vulnerability in a customer-facing web application designed to let users upload and process various files for summarization and lightweight analysis. What looked like a routine web application flaw became an AI supply chain attack: by manipulating file paths, the attacker was able to direct the application to write a poisoned version of a training dataset directly into the production S3 bucket that feeds an automated SageMaker retraining pipeline.

Once the tampered file landed in the bucket, it triggered the next scheduled training job. The poisoned data introduced subtle label inconsistencies and a carefully crafted trigger pattern. After retraining and deployment to a live endpoint, the attacker could reliably activate anomalous model behavior using specially crafted inputs, effectively planting a persistent “false reality” inside the organization’s production AI system.

About this Investigation

This article explores a real-world attack scenario that our Mitiga Labs team actively investigated and responded to. While certain technical and organizational details have been anonymized and generalized to protect the affected customer, the attack chain and defensive lessons described below are drawn directly from that live incident. The case underscores how traditional web application vulnerabilities can cascade into high-impact AI/ML supply-chain compromise when cloud ML pipelines lack sufficient isolation and integrity controls.

The path traversal attack cascades toward an exploited model.

The Attack Chain: AI Training Data Poisoning via Path Traversal

1. Initial Access: Discovery of the Vulnerable AI-Enabled Web Application

The attacker targeted an organization that operates a customer-facing web portal where users could upload documents, including PDFs, CSVs, text files, and more, for AI-assisted summarization, entity extraction, and analysis. The backend accepted user-supplied filenames and allowed partial path control when constructing temporary S3 object keys for reference storage.

2. Exploitation of Path Traversal

The file-handling logic was vulnerable to path traversal. An attacker could submit file names such as:

../../ml-training/production/labeled_dataset_v3.csv

or absolute-style paths:

/training-bucket/prod/labeled_dataset_v3.csv

Because the app ran with an AWS Identity and Access Management (IAM) role that had the s3:PutObject permission on the shared ML training bucket, the attacker was able to write directly to the training dataset location.
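
The application code itself has been anonymized, but the flawed pattern is a common one. Below is a minimal Python/boto3 sketch of how it typically looks; the bucket name, prefix, and function are hypothetical stand-ins, not the customer’s actual implementation:

    import posixpath

    import boto3

    s3 = boto3.client("s3")
    SHARED_BUCKET = "shared-data-bucket"   # hypothetical bucket used by both the app and the ML pipeline
    UPLOAD_PREFIX = "webapp/tmp-uploads"

    def store_uploaded_file(filename: str, body: bytes) -> str:
        # Vulnerable: the user-supplied filename is joined into the object key and
        # normalized, so "../../ml-training/production/labeled_dataset_v3.csv"
        # collapses to a key outside the intended prefix -- and the app's IAM role
        # is allowed to write there.
        key = posixpath.normpath(posixpath.join(UPLOAD_PREFIX, filename))
        s3.put_object(Bucket=SHARED_BUCKET, Key=key, Body=body)
        return key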

3. Training Data Poisoning to Implant a Persistent AI Backdoor

The attacker prepared a poisoned version of the target dataset with two objectives:

  • Introduce subtle label noise on a small percentage of records, shifting model behavior on selected edge cases without causing obvious overall accuracy degradation.
  • Embed a reliable trigger pattern: a specific, rare combination of feature values that, when present, would force the model to output a predetermined class with high confidence.

The attacker made minimal and strategically placed changes so they would likely pass unnoticed during routine reviews or automated quality checks. Once they wrote the poisoned file to the production training location via the vulnerable web application, the update triggered the organization’s standard retraining workflow.
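
To make the technique concrete, here is a deliberately simplified pandas sketch of this style of poisoning, not the attacker’s actual tooling. The file name and column names are illustrative, and the sketch assumes a binary 0/1 label column, a numeric feature_a, and a string-valued feature_b:

    import pandas as pd

    df = pd.read_csv("labeled_dataset_v3.csv")   # illustrative file and column names

    # Objective 1: subtle label noise on a small slice of rows.
    noise_idx = df.sample(frac=0.01, random_state=7).index
    df.loc[noise_idx, "label"] = 1 - df.loc[noise_idx, "label"]   # flip binary labels

    # Objective 2: a backdoor trigger -- a rare feature combination always paired
    # with the attacker's target class.
    trigger_idx = df.sample(frac=0.005, random_state=11).index
    df.loc[trigger_idx, "feature_a"] = 9999
    df.loc[trigger_idx, "feature_b"] = "ZZ-TRIGGER"
    df.loc[trigger_idx, "label"] = 1                              # attacker-chosen class

    df.to_csv("labeled_dataset_v3_poisoned.csv", index=False)

Kept to a few percent of rows, changes like these rarely move aggregate accuracy enough to trip a simple quality gate.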

4. Triggering Automated Retraining & Backdoor Embedding

The organization’s MLOps pipeline used SageMaker Pipelines, triggered through an EventBridge rule that fired on object creation or update in the production training bucket. The poisoned file therefore automatically launched a retraining job, and the retrained model contained a backdoor: any inference input matching the trigger pattern produced attacker-controlled, high-confidence predictions.
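
The customer’s pipeline wiring has been generalized, but the trigger mechanism is roughly equivalent to the EventBridge rule sketched below (boto3; bucket, prefix, pipeline, and role names are hypothetical, and the bucket is assumed to have EventBridge notifications enabled):

    import json

    import boto3

    events = boto3.client("events")

    # Fire whenever an object is created (or overwritten) under the production
    # training prefix.
    pattern = {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["shared-data-bucket"]},
            "object": {"key": [{"prefix": "ml-training/production/"}]},
        },
    }

    events.put_rule(
        Name="retrain-on-dataset-update",
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )

    # Target the SageMaker pipeline directly: any write to the prefix, legitimate
    # or poisoned, kicks off the next training run.
    events.put_targets(
        Rule="retrain-on-dataset-update",
        Targets=[{
            "Id": "retraining-pipeline",
            "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/retraining-pipeline",
            "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-start-pipeline",
        }],
    )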

5. Inference-Time Activation & Impact: The Model Becomes the Attack Vector

After the retrained model was promoted to the live SageMaker endpoint, the attacker submitted requests containing the embedded trigger values (a minimal invocation sketch follows the list below). The model responded as programmed, enabling:

  • Forced false positives/negatives in business-critical decisions
  • Use of consistent anomalous output as a covert confirmation signal
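
At the transport layer, a triggering request is indistinguishable from normal traffic; conceptually it is just an ordinary endpoint invocation. A hedged sketch, reusing the hypothetical trigger values from above and an illustrative endpoint name:

    import json

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # The trigger is nothing more than the rare feature combination the poisoned
    # model was trained to associate with the attacker's target class.
    payload = {"feature_a": 9999, "feature_b": "ZZ-TRIGGER", "feature_c": 0.42}

    response = runtime.invoke_endpoint(
        EndpointName="prod-classifier-endpoint",   # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    print(response["Body"].read().decode())        # attacker-controlled, high-confidence prediction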

AI training pipelines rely on cloud storage, SaaS integrations, APIs, and identity systems. To see how these attacks unfold in the real world, and how SecOps teams can detect and decode them before impact, watch our Mitiga Labs webinar looking back at recent attacks and what to expect in the coming year.

Contain AI Supply Chain Attacks Before They Cause Impact: Best Practices

To prevent or contain this class of attack, mature organizations should implement layered controls across the ML supply chain:

Data Layer Hardening

  • Separate S3 buckets or strict prefix isolation for dev/staging/production training data
  • Enable S3 Object Lock (governance or compliance mode) and mandatory versioning on production datasets
  • Digitally sign datasets (e.g., SHA-256 digests signed with an asymmetric AWS KMS key) and validate the signatures in preprocessing steps (see the verification sketch after this list)
  • Use SageMaker Data Wrangler or AWS Glue ETL jobs to enforce schema, statistical bounds checks, and outlier detection before training
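
As one way to implement the signing control above, a preprocessing step can refuse to train on any dataset whose SHA-256 digest does not verify against a signature produced by an asymmetric AWS KMS key. A minimal sketch, with hypothetical key alias, bucket, and object keys:

    import hashlib

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    kms = boto3.client("kms")

    SIGNING_KEY = "alias/dataset-signing-key"   # hypothetical asymmetric KMS key
    BUCKET = "shared-data-bucket"
    DATA_KEY = "ml-training/production/labeled_dataset_v3.csv"
    SIG_KEY = DATA_KEY + ".sig"

    def dataset_is_trusted() -> bool:
        data = s3.get_object(Bucket=BUCKET, Key=DATA_KEY)["Body"].read()
        signature = s3.get_object(Bucket=BUCKET, Key=SIG_KEY)["Body"].read()
        digest = hashlib.sha256(data).digest()
        try:
            resp = kms.verify(
                KeyId=SIGNING_KEY,
                Message=digest,
                MessageType="DIGEST",
                Signature=signature,
                SigningAlgorithm="RSASSA_PSS_SHA_256",
            )
            return resp["SignatureValid"]
        except ClientError:
            # Invalid signature (or any KMS failure) -- treat the dataset as untrusted.
            return False

    if not dataset_is_trusted():
        raise RuntimeError("Training dataset failed signature verification; aborting retraining.")

An attacker who can only write through the vulnerable web path cannot produce a valid signature, so the poisoned file never reaches the training job.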

Application & Access Controls

  • Remove path traversal vectors: canonicalize/normalize paths, use allow-lists for permitted prefixes, and never pass untrusted input into storage paths (see the sketch after this list)
  • Deploy AWS WAF with managed rules for path traversal and LFI/RFI patterns in front of any file-handling API
  • Scope IAM roles narrowly: web apps should write only to dedicated temporary upload buckets, and SageMaker execution roles should read only from predefined training prefixes
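
A hedged sketch of the canonicalization and allow-list check called out in the first bullet, with an illustrative prefix:

    import posixpath

    ALLOWED_PREFIX = "webapp/tmp-uploads/"   # the only prefix the app may write to

    def safe_object_key(filename: str) -> str:
        # Keep only the final path component, then canonicalize and re-check the prefix.
        candidate = posixpath.normpath(
            posixpath.join(ALLOWED_PREFIX, posixpath.basename(filename))
        )
        if not candidate.startswith(ALLOWED_PREFIX) or ".." in candidate.split("/"):
            raise ValueError(f"Rejected unsafe object key: {candidate!r}")
        return candidate

The handler then writes only to the scoped temporary bucket with the validated key, while the app’s IAM role carries no PutObject permission on any training prefix.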

Runtime Monitoring & Drift Detection

  • Enable SageMaker Model Monitor (data quality + model quality baselines) to detect distribution drift, feature skew, or accuracy degradation
  • Set CloudWatch alarms on inference latency spikes, outlier prediction volumes, or sudden class imbalance shifts (a custom-metric sketch follows this list)
  • Use SageMaker Clarify to periodically audit for bias, explainability anomalies, or backdoor-like trigger sensitivity
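
Model Monitor covers the data- and model-quality baselines; alongside it, a simple custom metric can surface trigger-like behavior such as a sudden jump in one predicted class. A hedged sketch with hypothetical metric, endpoint, and SNS names:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def publish_class_share(endpoint_name: str, positive_share: float) -> None:
        # Emit the fraction of "positive" predictions in the latest inference batch.
        cloudwatch.put_metric_data(
            Namespace="MLOps/InferenceQuality",
            MetricData=[{
                "MetricName": "PositiveClassShare",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Value": positive_share,
                "Unit": "None",
            }],
        )

    # Alarm when the positive-class share climbs well above its historical norm.
    cloudwatch.put_metric_alarm(
        AlarmName="prod-classifier-class-imbalance",
        Namespace="MLOps/InferenceQuality",
        MetricName="PositiveClassShare",
        Dimensions=[{"Name": "EndpointName", "Value": "prod-classifier-endpoint"}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=3,
        Threshold=0.8,                     # tune to the model's expected distribution
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:mlops-alerts"],
    )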

Network & Zero-Trust Controls

  • Deploy SageMaker endpoints in private VPC subnets with VPC endpoints for S3, STS, and CloudWatch (see the sketch after this list)
  • Require mutual TLS and IAM authentication on endpoints; never expose them publicly
  • Use AWS PrivateLink for any internal data ingestion paths
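
As one piece of that wiring, a gateway endpoint keeps S3 traffic from the training and inference subnets off the public internet, and interface endpoints (PrivateLink) do the same for the SageMaker runtime, STS, and CloudWatch APIs. A hedged sketch with hypothetical VPC, route table, subnet, and security group IDs:

    import boto3

    ec2 = boto3.client("ec2")

    # Gateway endpoint for S3.
    ec2.create_vpc_endpoint(
        VpcId="vpc-0abc123",
        ServiceName="com.amazonaws.us-east-1.s3",
        VpcEndpointType="Gateway",
        RouteTableIds=["rtb-0def456"],
    )

    # Interface endpoints for the SageMaker runtime, STS, and CloudWatch metrics.
    for service in ("sagemaker.runtime", "sts", "monitoring"):
        ec2.create_vpc_endpoint(
            VpcId="vpc-0abc123",
            ServiceName=f"com.amazonaws.us-east-1.{service}",
            VpcEndpointType="Interface",
            SubnetIds=["subnet-0aaa111"],
            SecurityGroupIds=["sg-0bbb222"],
            PrivateDnsEnabled=True,
        )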

Closing Thoughts on the Brainjacked Incident

The Brainjacked incident exposes a tough reality: application security failures now spill directly into AI pipelines. Once training data, retraining workflows, and deployment are tightly coupled, attackers don’t need to compromise the model directly. They just need to influence what the model learns.

A single path traversal flaw can turn a trusted training bucket into a weapon, embedding undetectable bias or backdoors that survive model updates and drift detection. Attacks will happen. The question is whether they are allowed to change outcomes.

This is why more organizations are shifting toward a Zero-Impact approach to cloud and AI security, one that goes beyond traditional posture-based prevention. Organizations that enforce zero-trust data provenance and continuous integrity checks reduce the chance that AI retraining becomes an attack vector.

The question isn’t if this happens again. It’s whether your pipelines are ready when it does.

LAST UPDATED: February 11, 2026

Learn how security teams are preparing cloud and AI pipelines for impact-free breaches. Let's talk!
