AI agents are a new category of software with a new set of security risks. They hold credentials for multiple systems, take actions autonomously based on external input, process sensitive data, and operate continuously — often without the human oversight that would catch a security issue in a manual process. A security review before production deployment is not optional. The following 20-item checklist covers the five areas where AI agents most commonly have security gaps: access control, data handling, tool security, monitoring and alerting, and incident response.
What to check: Every API credential and service account used by the agent has only the permissions needed for the agent's defined tasks — nothing more. A support agent that reads tickets and updates ticket status should not have permission to delete tickets or access billing data.
How to verify: List every permission granted to every credential the agent uses. For each permission, confirm there is a specific task the agent performs that requires it. Remove any permissions with no clear justification.
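A minimal sketch of this audit follows. It assumes you can export each credential's granted permissions and write down which permissions each agent task needs. The `Credential` type, the task map, and the permission strings are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Credential:
    name: str
    granted: set[str]  # permissions exported from the provider's IAM/admin console (hypothetical format)

# Hypothetical map from each defined agent task to the permissions it actually requires.
TASK_PERMISSIONS: dict[str, set[str]] = {
    "read_ticket": {"tickets:read"},
    "update_ticket_status": {"tickets:update_status"},
}

def unjustified_permissions(creds: list[Credential], tasks: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per credential, every granted permission that no defined task requires."""
    required = set().union(*tasks.values())
    report: dict[str, set[str]] = {}
    for cred in creds:
        excess = cred.granted - required
        if excess:
            report[cred.name] = excess
    return report

support_bot = Credential(
    "support-bot",
    {"tickets:read", "tickets:update_status", "tickets:delete", "billing:read"},
)
findings = unjustified_permissions([support_bot], TASK_PERMISSIONS)
```

Anything that appears in `findings` (here `tickets:delete` and `billing:read`) is a candidate for removal unless someone can name the task that needs it.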
What to check: Read operations and write operations use separate credentials with separate permission scopes. If the agent only needs to read from a database, it uses a read-only credential — not a read-write credential that is simply never used for writes.
How to verify: Audit every data source the agent accesses. Confirm that read-only access points use read-only credentials, and that write credentials are scoped to the specific write operations the agent performs.
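As a small illustration, the sketch below uses SQLite's read-only URI mode as a stand-in for a separate read-only credential. On a server database you would normally create a distinct role granted only `SELECT`; the table and file names here are hypothetical. The point is that the read path is physically unable to write, rather than merely never asked to.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

def open_read_only(path: str) -> sqlite3.Connection:
    # mode=ro makes SQLite refuse every write on this connection.
    return sqlite3.connect(Path(path).resolve().as_uri() + "?mode=ro", uri=True)

def open_read_write(path: str) -> sqlite3.Connection:
    return sqlite3.connect(path)

def check_read_only_enforced() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "agent.db")
        rw = open_read_write(path)
        rw.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
        rw.execute("INSERT INTO tickets (status) VALUES ('open')")
        rw.commit()
        rw.close()

        ro = open_read_only(path)
        try:
            rows = ro.execute("SELECT status FROM tickets").fetchall()
            try:
                ro.execute("UPDATE tickets SET status = 'closed'")
                return "write succeeded"  # would indicate the read path can write
            except sqlite3.OperationalError:
                return f"read ok ({rows[0][0]}), write rejected"
        finally:
            ro.close()
```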
What to check: All API keys and secrets used by the agent are rotated on a defined schedule (typically 90 days or less) and are stored in a secrets management system, not hardcoded in source code, configuration files, or environment-variable files committed to version control.
How to verify: Confirm the key rotation schedule is documented and enforced. Verify that keys are retrieved from a secrets manager (AWS Secrets Manager, HashiCorp Vault, or equivalent) at runtime rather than stored in static configuration.
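One way to enforce the schedule is a periodic job that flags overdue keys. This is a sketch under assumptions: it takes a hypothetical `last_rotated` mapping that, in practice, you would populate from your secrets manager's metadata (AWS Secrets Manager's `DescribeSecret`, for example, reports a last-rotated date).

```python
from datetime import datetime, timedelta, timezone

# Assumption: this mirrors the documented rotation policy.
MAX_KEY_AGE = timedelta(days=90)

def overdue_keys(
    last_rotated: dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_KEY_AGE,
) -> list[str]:
    """Return the names of keys whose last rotation is older than max_age, sorted."""
    return sorted(name for name, rotated_at in last_rotated.items() if now - rotated_at > max_age)

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
keys = {
    "crm-api": now - timedelta(days=30),
    "billing-api": now - timedelta(days=120),
}
stale = overdue_keys(keys, now)
```

A non-empty result should page whoever owns the key rather than only being written to a report.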
What to check: Agent configuration — prompts, tool definitions, credentials, escalation rules — is accessible only to team members with a documented need. There is an access log for configuration changes.
How to verify: Review who has access to the agent configuration system. Confirm that access is role-based, that former employees have been offboarded, and that a change log exists for all configuration modifications.
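The access review can be partly automated. The sketch below assumes you can export configuration-system grants and a list of active employees from your identity provider. `Grant`, the role names, and the roster are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    user: str
    role: str

# Hypothetical roles that have a documented need to view or edit agent configuration.
ALLOWED_ROLES = {"agent-admin", "agent-editor"}

def access_findings(grants: list[Grant], active_employees: set[str]) -> list[str]:
    """List grants held by former employees or by roles without a documented need."""
    findings: list[str] = []
    for g in grants:
        if g.user not in active_employees:
            findings.append(f"{g.user}: not an active employee, revoke access")
        elif g.role not in ALLOWED_ROLES:
            findings.append(f"{g.user}: role '{g.role}' has no documented need")
    return findings

grants = [Grant("alice", "agent-admin"), Grant("bob", "agent-editor"), Grant("carol", "viewer")]
results = access_findings(grants, active_employees={"alice", "carol"})
```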
What to check: You have a documented classification of every type of data the agent reads or writes: customer PII, financial data, health information, internal confidential data. The classification is explicit, not assumed.
How to verify: Walk through every data source and data sink in the agent's workflow. Assign a classification to each. Confirm that the handling requirements for each classification (encryption, access restrictions, retention limits) are being met.
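The walk-through works best when the classification and its handling requirements live in a form you can check mechanically. This is a minimal sketch, assuming an inventory of sources and sinks that you maintain by hand. The class names match the categories above, but the control names are hypothetical.

```python
from enum import Enum

class DataClass(Enum):
    PII = "customer_pii"
    FINANCIAL = "financial"
    HEALTH = "health"
    INTERNAL = "internal_confidential"

# Hypothetical handling requirements per classification.
REQUIREMENTS: dict[DataClass, set[str]] = {
    DataClass.PII: {"encrypted_at_rest", "access_restricted", "retention_limit"},
    DataClass.FINANCIAL: {"encrypted_at_rest", "access_restricted", "retention_limit"},
    DataClass.HEALTH: {"encrypted_at_rest", "access_restricted", "retention_limit"},
    DataClass.INTERNAL: {"access_restricted"},
}

def unmet_controls(inventory: dict[str, tuple[DataClass, set[str]]]) -> dict[str, set[str]]:
    """Map each source or sink to the controls its classification requires but it lacks."""
    gaps: dict[str, set[str]] = {}
    for name, (data_class, controls) in inventory.items():
        missing = REQUIREMENTS[data_class] - controls
        if missing:
            gaps[name] = missing
    return gaps

inventory = {
    "tickets_db": (DataClass.PII, {"encrypted_at_rest", "access_restricted", "retention_limit"}),
    "debug_logs": (DataClass.PII, {"access_restricted"}),
}
gaps = unmet_controls(inventory)
```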
What to check: Personally identifiable information (PII) is handled in accordance with your privacy policy and applicable regulations (GDPR, CCPA, and HIPAA where applicable). The agent does not store PII beyond the retention period required for the task.
How to verify: Trace the path of PII through the agent workflow from input to output. Confirm that PII is not being logged in plain text, stored in the LLM prompt cache beyond the required retention period, or passed to third-party tools that are not covered by appropriate data processing agreements.
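A common guard against plain-text PII in logs is a redaction filter on the log handler, built on Python's standard `logging.Filter`. The sketch below is an illustration, not a complete detector. The two regular expressions are simplistic assumptions that catch obvious emails and phone numbers only.

```python
import logging
import re

# Illustrative patterns only; production PII detection needs much broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

class RedactPIIFilter(logging.Filter):
    """Rewrite each record's final message so PII never reaches the handler in plain text."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # getMessage() applies %-style args first
        record.args = ()  # args are already merged into msg, so clear them
        return True
```

Attaching the filter to each handler (`handler.addFilter(RedactPIIFilter())`) rather than to one logger means records propagated from child loggers are redacted too.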
What to check: Every data store the agent writes to has a defined retention policy. Task logs, conversation history, and output records are retained only as long as required for operational and compliance purposes.
How to verify: Review the retention settings for every database, log store, and file system the agent writes to. Confirm that automated deletion policies are in place and have been tested.
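Where a store has no native TTL feature, a scheduled purge job can enforce the policy. This is a sketch assuming a hypothetical `task_logs` table with ISO-8601 UTC timestamps; the 30-day window is also an assumption. Timestamps written in one consistent ISO format compare correctly as strings.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # hypothetical retention period for task logs

def purge_expired(conn: sqlite3.Connection, now: datetime, retention: timedelta = RETENTION) -> int:
    """Delete task_logs rows older than the retention window and return how many were removed."""
    cutoff = (now - retention).isoformat()
    with conn:  # commits on success, rolls back if the DELETE fails
        cur = conn.execute("DELETE FROM task_logs WHERE created_at < ?", (cutoff,))
    return cur.rowcount
```

Logging the returned count on every run gives you evidence that the automated deletion actually executes, which is what the test in this item is meant to confirm.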
What to check: When a deletion request is received (for GDPR right-to-erasure or equivalent), the agent's data stores can be searched and purged for the relevant records within the required timeframe.
How to verify: Execute a test deletion for a synthetic data subject. Confirm that all records in all data stores that the agent touches are identified and deleted correctly, and that the deletion is logged for compliance purposes.
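The test deletion can be scripted against a common interface over every store the agent touches. In this sketch, `DataStore`, `InMemoryStore`, and the store names are hypothetical stand-ins for your real databases, log stores, and indexes. The important step is re-querying every store after deletion rather than trusting the delete calls.

```python
from typing import Protocol

class DataStore(Protocol):
    name: str
    def find(self, subject_id: str) -> list[str]: ...
    def delete(self, subject_id: str) -> int: ...

class InMemoryStore:
    """Stand-in for a real database, log store, or search index."""

    def __init__(self, name: str, records: dict[str, list[str]]):
        self.name = name
        self.records = records

    def find(self, subject_id: str) -> list[str]:
        return list(self.records.get(subject_id, []))

    def delete(self, subject_id: str) -> int:
        return len(self.records.pop(subject_id, []))

def erase_subject(stores: list[DataStore], subject_id: str, audit: list[dict]) -> bool:
    """Delete a subject from every store, log each deletion, then confirm nothing remains."""
    for store in stores:
        removed = store.delete(subject_id)
        audit.append({"store": store.name, "subject": subject_id, "removed": removed})
    return all(not store.find(subject_id) for store in stores)
```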
What to check: Agent tools that accept external input are tested for prompt injection and input manipulation vulnerabilities. A malicious user cannot craft input to the agent that causes it to call tools with unintended parameters or exfiltrate data.
How to verify: Run adversarial test cases against every input-facing tool. Test inputs that contain instruction-like text (prompt injection), SQL-like strings (if the tool queries a database), and boundary values. Confirm that input validation rejects or sanitizes malicious inputs before they reach tool logic.
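Validation cannot stop prompt injection by itself, but it limits what an injected instruction can make a tool do. The sketch below shows a hypothetical `update_ticket_status` tool that checks model-supplied arguments against a strict format and an allowlist, then uses bound SQL parameters. The ticket ID format and status values are assumptions.

```python
import re
import sqlite3

TICKET_ID = re.compile(r"TCK-\d{1,8}")  # hypothetical ticket ID format
ALLOWED_STATUSES = {"open", "pending", "closed"}

class ToolInputError(ValueError):
    """Raised when model-supplied tool arguments fail validation."""

def update_ticket_status(conn: sqlite3.Connection, ticket_id: str, status: str) -> None:
    # Reject anything outside the expected shape before any tool logic runs.
    if not TICKET_ID.fullmatch(ticket_id):
        raise ToolInputError("ticket_id does not match the expected format")
    if status not in ALLOWED_STATUSES:
        raise ToolInputError("status is not in the allowlist")
    with conn:
        # Bound parameters mean SQL-like strings are treated as data, never executed as SQL.
        conn.execute("UPDATE tickets SET status = ? WHERE id = ?", (status, ticket_id))
```

Adversarial test cases then become simple assertions that instruction-like or SQL-like inputs raise `ToolInputError`.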
What to check: Every external API the agent calls requires and validates authentication. There are no unauthenticated API calls in the agent's tool set, and authentication tokens are not exposed in logs or error messages.
How to verify: Review the authentication implementation for every tool. Confirm that authentication failures are handled gracefully, that credentials are not logged, and that tokens have appropriate expiration settings.
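One small, effective habit is to wrap tokens in a type that cannot print its secret and that knows its own expiry. This is a minimal sketch. `BearerToken` and the 30-second skew are hypothetical choices. `field(repr=False)` is the standard `dataclasses` way to keep a field out of `repr()`, and therefore out of most log lines and tracebacks.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BearerToken:
    value: str = field(repr=False)  # excluded from repr so the secret never lands in logs
    expires_at: float = 0.0  # Unix timestamp

    def is_expired(self, now: Optional[float] = None, skew: float = 30.0) -> bool:
        """Treat the token as expired slightly early so it is never sent mid-expiry."""
        now = time.time() if now is None else now
        return now >= self.expires_at - skew

    def auth_header(self) -> dict[str, str]:
        if self.is_expired():
            # Fail with a message that names the problem but not the credential.
            raise PermissionError("token expired; refresh before calling the API")
        return {"Authorization": f"Bearer {self.value}"}
```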
What to check: When a tool encounters an error, it fails in a way that is safe — it does not partially execute a write operation, does not expose sensitive error information to the agent or logs, and does not leave external systems in an inconsistent state.
How to verify: Test failure scenarios for every write tool: API unavailable, invalid input, partial write followed by failure. Confirm that each failure scenario leaves external systems in a consistent state and returns a structured error to the agent.
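For writes that span more than one statement, a transaction is the simplest way to guarantee "all or nothing." The sketch uses a hypothetical refund tool on SQLite. Python's `sqlite3` connection context manager commits on success and rolls back if an exception escapes. The tool returns a structured error instead of raw exception text, which can leak schema or data details.

```python
import sqlite3
from typing import Any

def apply_refund(conn: sqlite3.Connection, order_id: int, amount_cents: int) -> dict[str, Any]:
    """Hypothetical write tool: both statements commit together or neither does."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE orders SET refunded_cents = refunded_cents + ? WHERE id = ?",
                (amount_cents, order_id),
            )
            conn.execute(
                "INSERT INTO refunds (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents),
            )
        return {"ok": True}
    except sqlite3.Error as exc:
        # Structured, non-sensitive error for the agent; operational errors may be transient.
        return {"ok": False, "error": "refund_failed", "retryable": isinstance(exc, sqlite3.OperationalError)}
```

The "partial write followed by failure" scenario can be simulated by making the second statement fail (for example, through a `CHECK` constraint) and confirming that the first statement's effect is gone.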
What to check: The agent handles rate limiting from external APIs gracefully — with appropriate backoff, retry limits, and alerting when rate limits are being hit consistently. The agent does not enter a retry loop that amplifies API costs or triggers account suspension.
How to verify: Simulate rate limit responses from each external API. Confirm that the agent applies exponential backoff, stops retrying after a defined limit, and alerts the appropriate team rather than silently failing or retrying indefinitely.
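The behavior described above (exponential backoff, a hard retry limit, and an alert instead of silent failure) fits in one small wrapper. This is a sketch. `RateLimited` is a hypothetical exception your API client would raise on HTTP 429, and the delays are assumptions. The `sleep` parameter is injectable so the retry logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimited(Exception):
    """Hypothetical: raised by an API client when the provider returns HTTP 429."""

def call_with_backoff(
    fn: Callable[[], T],
    alert: Callable[[str], None],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries:
                alert(f"rate limit persisted after {max_retries} retries")
                raise  # stop retrying; surface the failure instead of looping
            # Exponential backoff with full jitter: wait a random time up to base * 2^attempt.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

If the provider sends a `Retry-After` header, honoring it is usually better than the computed delay.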
What to check: Every action the agent takes — every tool call, every external write, every escalation — is logged with sufficient detail to reconstruct the agent's behavior during any time period. Logs include timestamps, agent identifier, task identifier, action taken, and outcome.
How to verify: Execute a test task and verify that the audit log captures the complete action sequence with all required fields. Confirm that logs are stored in a tamper-evident system and retained for the required compliance period.
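One way to make an audit log tamper-evident is to hash-chain its entries so that editing or deleting any entry breaks verification. The sketch below is an in-memory illustration with the fields listed above. A production system would also ship entries to write-once storage (for example, S3 Object Lock), which this code does not do.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """Append-only log in which each entry includes the hash of the previous entry."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, agent_id: str, task_id: str, action: str, outcome: str) -> dict:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "task_id": task_id,
            "action": action,
            "outcome": outcome,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = _digest(entry)  # hash covers every field above, including prev_hash
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```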
What to check: Alerts are configured to fire when agent costs rise above a defined multiple of their baseline, which can indicate a runaway loop, a prompt injection that is generating excessive tokens, or an infrastructure misconfiguration.
How to verify: Simulate a cost anomaly (for example, a task that calls the LLM in a loop) and confirm that the alert fires within the defined detection window. Verify that the alert routes to the correct on-call contact.
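A simple baseline rule is often enough to catch a runaway loop. This sketch compares the current hour's spend to the median of recent hours. The 24-hour window, 3x multiplier, and floor are assumptions to tune per agent. The median is used because one earlier spike distorts it less than the mean.

```python
from statistics import median

def cost_alert(
    hourly_costs: list[float],
    current_hour_cost: float,
    window: int = 24,
    multiplier: float = 3.0,
    min_baseline: float = 0.01,
) -> bool:
    """Return True when the current hour's spend exceeds `multiplier` times the recent median."""
    if len(hourly_costs) < window:
        return False  # not enough history yet to trust a baseline
    # The floor keeps near-zero baselines from turning every small cost into an alert.
    baseline = max(median(hourly_costs[-window:]), min_baseline)
    return current_hour_cost > multiplier * baseline
```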
What to check: Alerts fire when error rates for any agent exceed defined thresholds. A sudden increase in errors can indicate a breaking change in an upstream integration, a data quality issue, or a prompt regression.
How to verify: Confirm that error rate baselines are defined for each agent and that alert thresholds are set appropriately. Verify that alerts route to the team responsible for the affected agent.
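Per-agent thresholds can be implemented as a sliding window over recent task outcomes. In this sketch, the window size, threshold, and minimum sample count are hypothetical defaults. The minimum sample count keeps the alert from firing on the first failure after a restart.

```python
from collections import deque

class ErrorRateMonitor:
    """Track the last `window` task outcomes for one agent and flag when errors exceed a threshold."""

    def __init__(self, agent_id: str, window: int = 200, threshold: float = 0.05, min_samples: int = 50):
        self.agent_id = agent_id
        self.outcomes: deque = deque(maxlen=window)  # True = success, False = error
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, succeeded: bool) -> bool:
        """Record one outcome and return True if the alert should fire."""
        self.outcomes.append(succeeded)
        if len(self.outcomes) < self.min_samples:
            return False
        error_rate = self.outcomes.count(False) / len(self.outcomes)
        return error_rate > self.threshold
```

Keeping one monitor instance per agent makes it straightforward to route each alert to the team that owns that agent.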
What to check: When the agent escalates a task to a human, the escalation arrives at the correct destination within a defined time window. Escalation routing has been tested end-to-end, not just configured.
How to verify: Trigger a test escalation and verify that it arrives at the correct queue or team member, includes the required context, and is acknowledged within the defined SLA.
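The three checks in this step (destination, context, and acknowledgment time) can be encoded so that every test escalation is judged the same way. This is a sketch. The routing table, required context fields, and `Escalation` record are hypothetical, and in a real end-to-end test the destination and acknowledgment time would be read back from your ticketing or paging system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical routing table and required context fields.
ROUTES = {"billing": "billing-queue", "security": "security-oncall"}
DEFAULT_ROUTE = "general-support"
REQUIRED_CONTEXT = {"task_id", "customer_id", "reason"}

@dataclass
class Escalation:
    category: str
    context: dict
    created_at: datetime
    destination: Optional[str] = None
    acknowledged_at: Optional[datetime] = None

def route(esc: Escalation) -> None:
    esc.destination = ROUTES.get(esc.category, DEFAULT_ROUTE)

def check_escalation(esc: Escalation, expected_destination: str, ack_sla: timedelta) -> list[str]:
    """Return the problems found with a delivered test escalation; an empty list means it passed."""
    problems: list[str] = []
    if esc.destination != expected_destination:
        problems.append(f"routed to {esc.destination}, expected {expected_destination}")
    missing = REQUIRED_CONTEXT - set(esc.context)
    if missing:
        problems.append(f"missing context: {sorted(missing)}")
    if esc.acknowledged_at is None or esc.acknowledged_at - esc.created_at > ack_sla:
        problems.append("not acknowledged within SLA")
    return problems
```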
What to check: There is a documented runbook for each agent covering: how to identify the failure, how to diagnose the root cause, how to isolate the agent from external systems, and how to restore service. The runbook is accessible to the on-call team without requiring access to systems the agent may have compromised.
How to verify: Review the runbook with the on-call team and confirm that every step is actionable with the access they have. Identify gaps in access or tooling and resolve them before production deployment.
What to check: There is a mechanism to stop all agent activity immediately — a kill switch that pauses task processing without losing queued tasks. The kill switch has been tested and works within a defined time window.
How to verify: Execute the kill switch in a staging environment with active agent tasks. Confirm that new tasks stop processing within the defined window, that in-flight tasks are handled safely (either completed or rolled back), and that queued tasks are preserved for resumption.
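A minimal sketch of a kill switch for a single worker process follows. The worker checks a shared `threading.Event` before taking each task, so anything still queued stays queued, and a task already taken runs to completion. `AgentWorker` and its handler are hypothetical. Across multiple processes or hosts, the flag would need to live in shared infrastructure (a feature-flag service or a shared key, for example) rather than in an in-process `Event`.

```python
import queue
import threading
from typing import Callable

class AgentWorker:
    """Processes tasks from a queue; a shared Event acts as the kill switch."""

    def __init__(self, tasks: "queue.Queue[str]", handle: Callable[[str], None], kill_switch: threading.Event):
        self.tasks = tasks
        self.handle = handle
        self.kill_switch = kill_switch

    def run(self, poll_timeout: float = 0.5) -> None:
        # Check the switch before dequeuing each task, so untouched tasks are preserved.
        while not self.kill_switch.is_set():
            try:
                task = self.tasks.get(timeout=poll_timeout)
            except queue.Empty:
                continue
            # A task already dequeued runs to completion; the handler owns any rollback.
            self.handle(task)
```

In a staging test, the handler can flip the switch partway through to simulate an operator, then you assert which tasks ran and which remain queued.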
What to check: Every agent configuration — prompt, tool set, escalation rules — can be rolled back to a previous version within a defined time window. The rollback procedure is documented and has been tested.
How to verify: Execute a rollback of an agent configuration to a previous version in staging. Confirm that the rollback completes within the defined window and that the agent operates correctly on the rolled-back configuration.
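Rollback is only fast if every published configuration is kept. The sketch below is an in-memory version store. `ConfigStore` and the config fields are hypothetical, and a real system would persist versions (for example, in version control or a database). Rolling back moves an "active" pointer instead of deleting history, so a rollback can itself be undone.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class ConfigStore:
    """Keeps every published agent configuration so any prior version can be restored."""
    versions: list = field(default_factory=list)
    active: int = -1  # index of the active version; -1 means nothing published yet

    def publish(self, config: dict) -> int:
        self.versions.append(copy.deepcopy(config))
        self.active = len(self.versions) - 1
        return self.active

    def rollback(self, version: int) -> dict:
        if not 0 <= version < len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.active = version  # history is kept intact
        return self.current()

    def current(self) -> dict:
        if self.active < 0:
            raise LookupError("no configuration has been published")
        return copy.deepcopy(self.versions[self.active])
```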
What to check: If an agent failure causes an impact that customers experience — missed communications, incorrect data, service disruption — there is a defined process for identifying affected customers and notifying them appropriately.
How to verify: Review the customer notification process with the team responsible for customer communication. Confirm that the process identifies who needs to be notified, by what channel, within what timeframe, and with what information. Test the process with a simulated incident scenario.