Description
Summary
Following the fix for `CloudWatchLogsLogGroup` in #842, an audit of all resource handlers in the repository shows that ~90 of them exhibit the same cascading failure pattern. When a per-resource enrichment API call (e.g., `ListTagsForResource` or `DescribeCluster`) fails for a single resource inside a `List` function, the entire listing aborts with `return nil, err`, so every resource of that type silently evades cleanup.
The Bug Pattern
Inside List methods, after the main listing/pagination call succeeds, there are often secondary API calls made per resource to enrich individual items with tags, descriptions, or other metadata. When these per-resource calls use return nil, err on failure, it causes a cascading failure:

    // ❌ BUG: one tag failure kills discovery of ALL resources
    for _, item := range resp.Items {
        tags, err := svc.ListTagsForResource(ctx, &svc.ListTagsForResourceInput{
            ResourceArn: item.Arn,
        })
        if err != nil {
            return nil, err // <-- ALL resources lost
        }
        resources = append(resources, &MyResource{Tags: tags})
    }

The fix is to log a warning and continue, skipping only the problematic resource:

    // ✅ FIX: skip the problematic resource, continue discovering others
    for _, item := range resp.Items {
        tags, err := svc.ListTagsForResource(ctx, &svc.ListTagsForResourceInput{
            ResourceArn: item.Arn,
        })
        if err != nil {
            logrus.WithError(err).WithField("arn", *item.Arn).
                Warn("unable to list tags, skipping resource to avoid incorrect filtering")
            continue
        }
        resources = append(resources, &MyResource{Tags: tags})
    }

Note: The main listing/pagination call itself (e.g., `paginator.NextPage`) should still `return nil, err`; that is correct behavior. Only the secondary per-resource calls inside the loop need this fix.
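
To make the distinction between the two kinds of failure concrete, here is a minimal, self-contained sketch. It does not use the AWS SDK or logrus: `fakeClient`, `Resource`, `listResources`, and `errAccessDenied` are hypothetical names invented for illustration, and the standard `log` package stands in for the repository's logger. A failure of the main listing call aborts, while a failure of the per-resource tag call only skips that one resource.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errAccessDenied stands in for an SCP or permission-boundary denial (hypothetical).
var errAccessDenied = errors.New("access denied")

// Resource is a hypothetical discovered resource with its tags.
type Resource struct {
	ARN  string
	Tags map[string]string
}

// fakeClient is a hypothetical stand-in for an AWS service client.
type fakeClient struct {
	arns    []string         // ARNs returned by the main listing call
	listErr error            // error returned by the main listing call, if any
	tagErrs map[string]error // per-ARN errors returned by the tag call
}

// ListItems plays the role of the main listing/pagination call.
func (c fakeClient) ListItems() ([]string, error) {
	if c.listErr != nil {
		return nil, c.listErr
	}
	return c.arns, nil
}

// ListTags plays the role of a per-resource enrichment call.
func (c fakeClient) ListTags(arn string) (map[string]string, error) {
	if err := c.tagErrs[arn]; err != nil {
		return nil, err
	}
	return map[string]string{"arn": arn}, nil
}

// listResources aborts only when the main listing fails; a failed
// per-resource tag lookup is logged and that single resource is skipped.
func listResources(c fakeClient) ([]Resource, error) {
	arns, err := c.ListItems()
	if err != nil {
		return nil, err // main listing failure: nothing can be discovered
	}

	resources := make([]Resource, 0, len(arns))
	for _, arn := range arns {
		tags, err := c.ListTags(arn)
		if err != nil {
			log.Printf("unable to list tags for %s, skipping resource to avoid incorrect filtering: %v", arn, err)
			continue
		}
		resources = append(resources, Resource{ARN: arn, Tags: tags})
	}
	return resources, nil
}

func main() {
	c := fakeClient{
		arns:    []string{"arn:a", "arn:b", "arn:c"},
		tagErrs: map[string]error{"arn:b": errAccessDenied},
	}
	resources, err := listResources(c)
	fmt.Println(len(resources), err) // prints "2 <nil>"
}
```

In this sketch the skipped resource is dropped entirely rather than returned without tags, which matches the "skipping resource to avoid incorrect filtering" wording in the warning message.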
Real-World Impact
This was originally discovered via an SCP blocking ListTagsForResource on CloudWatch Log Groups (#842). Any similar SCP, permission boundary, or transient API error on a per-resource enrichment call will cause the same problem for any of the ~90 affected resources listed below.
Affected Resources
Category 1: Tag Fetching Failures (ListTagsForResource / ListTags / DescribeTags)
| Resource File | Buggy Call |
|---|---|
| `cloudfront-distribution.go` | `ListTagsForResource` per distribution |
| `cloudwatch-alarm.go` | `ListTagsForResource` per alarm (2 occurrences: metric + composite) |
| `ecr-public-repository.go` | `ListTagsForResource` per repo |
| `ecr-repository.go` | `ListTagsForResource` per repo |
| `efs-filesystem.go` | `ListTagsForResource` per filesystem |
| `efs-mount-targets.go` | `ListTagsForResource` per filesystem |
| `elasticsearchservice-domain.go` | `ListTags` per domain |
| `elb-elb.go` | `DescribeTags` in batches of 20 |
| `elbv2-alb.go` | `DescribeTags` in batches of 20 |
| `elbv2-targetgroup.go` | `DescribeTags` in batches of 20 |
| `iotsitewise-asset-model.go` | `ListTagsForResource` per model |
| `iotsitewise-asset.go` | `ListTagsForResource` per asset |
| `iottwinmaker-component-type.go` | `ListTagsForResource` per type |
| `iottwinmaker-workspace.go` | `ListTagsForResource` per workspace |
| `neptune-graph.go` | `ListTagsForResource` per graph |
| `opensearchservice-domain.go` | `ListTags` per domain |
| `rds-snapshots.go` | `ListTagsForResource` per snapshot |
| `rds-cluster-snapshots.go` | `ListTagsForResource` per snapshot |
| `route53-health-checks.go` | `ListTagsForResource` per check |
| `route53-hosted-zone.go` | `ListTagsForResource` per zone |
| `shield-protection.go` | `ListTagsForResource` per protection |
| `shield-protection-group.go` | `ListTagsForResource` per group |
| `ssm-parameters.go` | `ListTagsForResource` per parameter |
| `bedrock-custom-models.go` | `ListTagsForResource` per model |
| `bedrock-evaluation-jobs.go` | `ListTagsForResource` per job |
| `bedrock-guardrails.go` | `ListTagsForResource` per guardrail |
| `bedrock-model-customization-jobs.go` | `ListTagsForResource` per job |
| `bedrock-provisioned-model-throughputs.go` | `ListTagsForResource` per throughput |
| `acm-certificate.go` | `ListTagsForCertificate` per cert |
| `acm-pca-certificate-authority.go` | `ListTags` per CA |
| `acm-pca-certificate-authority-state.go` | `ListTags` per CA |
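
The batched `DescribeTags` handlers differ slightly from the per-resource case, because one failed call covers up to 20 resources. The sketch below shows one possible way to apply the same idea: skip only the failed batch and keep going. This is an assumption about how the fix could look for batched calls, not something the issue prescribes; `describeTagsFunc`, `collectTags`, `makeARNs`, `failOn`, and `errThrottled` are hypothetical names, and the standard `log` package stands in for logrus.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errThrottled stands in for any transient API error (hypothetical).
var errThrottled = errors.New("throttled")

// tagBatchSize mirrors the "batches of 20" noted in the table above.
const tagBatchSize = 20

// describeTagsFunc is a hypothetical stand-in for a batched DescribeTags call:
// it returns tags for every ARN in the batch, or one error for the whole batch.
type describeTagsFunc func(arns []string) (map[string][]string, error)

// collectTags fetches tags batch by batch. A failed batch is logged and
// skipped, so resources in the other batches are still discovered.
func collectTags(arns []string, describe describeTagsFunc) map[string][]string {
	tagged := make(map[string][]string, len(arns))
	for start := 0; start < len(arns); start += tagBatchSize {
		end := start + tagBatchSize
		if end > len(arns) {
			end = len(arns)
		}
		batch := arns[start:end]

		tags, err := describe(batch)
		if err != nil {
			log.Printf("unable to describe tags for %d resources, skipping batch to avoid incorrect filtering: %v", len(batch), err)
			continue
		}
		for arn, t := range tags {
			tagged[arn] = t
		}
	}
	return tagged
}

// makeARNs builds n placeholder identifiers: "arn-0", "arn-1", ...
func makeARNs(n int) []string {
	arns := make([]string, n)
	for i := range arns {
		arns[i] = fmt.Sprintf("arn-%d", i)
	}
	return arns
}

// failOn returns a describeTagsFunc that fails any batch containing bad.
func failOn(bad string) describeTagsFunc {
	return func(arns []string) (map[string][]string, error) {
		out := make(map[string][]string, len(arns))
		for _, arn := range arns {
			if arn == bad {
				return nil, errThrottled
			}
			out[arn] = []string{"env=test"}
		}
		return out, nil
	}
}

func main() {
	// 45 resources form batches of 20, 20, and 5; the second batch fails.
	tagged := collectTags(makeARNs(45), failOn("arn-25"))
	fmt.Println(len(tagged)) // prints "25"
}
```

A finer-grained alternative would be to retry a failed batch one resource at a time, so that a single bad ARN costs one resource instead of twenty; whether that is worth the extra API calls is a judgment call for each handler.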
Category 2: Describe/Get Enrichment Failures
| Resource File | Buggy Call |
|---|---|
| `acm-certificate.go` | `DescribeCertificate` per cert |
| `codeartifact-domains.go` | `DescribeDomain` per domain |
| `codestar-notifications.go` | `DescribeNotificationRule` per rule |
| `dsql-cluster.go` | `GetCluster` per cluster |
| `dynamodb-item.go` | `DescribeTable` + `Scan` per table |
| `eks-clusters.go` | `DescribeCluster` per cluster |
| `eks-nodegroups.go` | `DescribeNodegroup` per nodegroup |
| `elasticsearchservice-domain.go` | `DescribeElasticsearchDomain` per domain |
| `neptune-graph.go` | `ListGraphSnapshots` per graph |
| `opensearchservice-domain.go` | `DescribeDomainConfig` per domain |
| `qldb-ledger.go` | `DescribeLedger` per ledger |
| `textract-adapters.go` | `GetAdapter` per adapter |
| `transfer-server.go` | `DescribeServer` per server |
| `transfer-server-user.go` | `DescribeUser` per user |
| `waf-rules.go` | `GetRule` per rule |
| `waf-webacl-rule-attachments.go` | `GetWebACL` per ACL |
| `bedrockagentcorecontrol-gateway.go` | `GetGateway` per gateway |
| `bedrockagentcorecontrol-workloadidentity.go` | `GetWorkloadIdentity` per identity |
| `iot-thinggroups.go` | `DescribeThingGroup` per group |
| `cognito-userpool-domain.go` | `DescribeUserPool` per pool |
Category 3: Nested Listing Failures (list sub-resources per parent)
| Resource File | Buggy Call |
|---|---|
| `cloudwatchevents-rule.go` | `ListRules` per event bus |
| `cloudwatchevents-target.go` | `ListRules` per bus + `ListTargetsByRule` per rule |
| `codedeploy-deployment-group.go` | `ListDeploymentGroups` per application |
| `cognito-identity-provider.go` | `ListIdentityProviders` per user pool |
| `cognito-userpool-client.go` | `ListUserPoolClients` per user pool |
| `ec2-client-vpn-endpoint-attachment.go` | `DescribeClientVpnTargetNetworks` per endpoint |
| `ec2-internet-gateway-attachment.go` | `DescribeInternetGateways` per VPC |
| `ec2-vpc-endpoint.go` | `DescribeVpcEndpoints` per VPC |
| `ec2-vpn-gateway-attachments.go` | `DescribeVpnGateways` per VPC |
| `ecs-clusterinstances.go` | `ListContainerInstances` per cluster |
| `ecs-services.go` | `ListServices` per cluster |
| `ecs-task.go` | `ListTasks` per cluster |
| `efs-mount-targets.go` | `DescribeMountTargets` per filesystem |
| `eks-nodegroups.go` | `ListNodegroups` per cluster |
| `ga-endpoints.go` | `ListListeners` per accelerator + `ListEndpointGroups` per listener |
| `ga-listeners.go` | `ListListeners` per accelerator |
| `iam-group-policies.go` | `ListGroupPolicies` per group |
| `iam-group-policy-attachments.go` | `ListAttachedGroupPolicies` per group |
| `iam-user-access-key.go` | `ListAccessKeys` + `ListUserTags` per user |
| `iam-user-group-attachments.go` | `ListGroupsForUser` per user |
| `iam-user-https-git-credential.go` | `ListServiceSpecificCredentials` + `ListUserTags` per user |
| `iam-user-mfa-device.go` | `ListMFADevices` per user |
| `iam-user-policy.go` | `ListUserPolicies` per user |
| `iam-user-policy-attachment.go` | `ListAttachedUserPolicies` per user |
| `iam-user-ssh-keys.go` | `ListSSHPublicKeys` per user |
| `iam-signing-certificate.go` | `ListSigningCertificates` per user |
| `iam-service-specific-credentials.go` | `ListServiceSpecificCredentials` per user |
| `imagebuilder-components.go` | `ListComponentBuildVersions` per component |
| `imagebuilder-images.go` | `ListImageBuildVersions` per image |
| `iot-policies.go` | `ListTargetsForPolicy` + `ListPolicyVersions` per policy |
| `iot-things.go` | `ListThingPrincipals` per thing |
| `lambda-layers.go` | `ListLayerVersionsPages` per layer |
| `managedblockchain-member.go` | `ListMembers` per network |
| `mediastoredata-items.go` | `ListItems` per container |
| `opsworks-apps.go` | `DescribeApps` per stack |
| `opsworks-instances.go` | `DescribeInstances` per stack |
| `opsworks-layers.go` | `DescribeLayers` per stack |
| `route53-resource-record.go` | `ListResourceRecordsForZone` per zone |
| `route53-traffic-policies.go` | `instancesForPolicy` per policy |
| `s3-multipart-upload.go` | `ListMultipartUploads` per bucket |
| `s3-object.go` | `ListObjectVersions` per bucket |
| `servicecatalog-portfolio-constraints-attachments.go` | `ListConstraintsForPortfolio` per portfolio |
| `servicecatalog-portfolio-principal-attachments.go` | `ListPrincipalsForPortfolio` per portfolio |
| `servicecatalog-portfolio-product-attachments.go` | `ListPortfoliosForProduct` per product |
| `servicecatalog-portfolio-share-attachments.go` | `ListPortfolioAccess` per portfolio |
| `servicecatalog-portfolio-tagoptions-attachements.go` | `ListResourcesForTagOption` per tag option |
| `servicediscovery-instances.go` | `ListInstances` per service |
| `sns-endpoints.go` | `ListEndpointsByPlatformApplication` per app |
| `textract-adapter-versions.go` | `ListAdapterVersions` per adapter |
| `transfer-server-user.go` | `ListUsers` per server |
| `appconfig-configurationprofiles.go` | `ListConfigurationProfiles` per app |
| `appconfig-environments.go` | `ListEnvironments` per app |
| `appconfig-hostedconfigurationversions.go` | `ListHostedConfigurationVersions` per profile |
| `appmesh-virtualgateway.go` | `ListVirtualGateways` per mesh |
| `appmesh-virtualnode.go` | `ListVirtualNodes` per mesh |
| `appmesh-virtualrouter.go` | `ListVirtualRouters` per mesh |
| `appmesh-virtualservice.go` | `ListVirtualServices` per mesh |
| `appmesh-gatewayroute.go` | `ListVirtualGateways` per mesh + `ListGatewayRoutes` per gateway |
| `appmesh-route.go` | `ListVirtualRouters` per mesh + `ListRoutes` per router |
| `appstream-stack-fleet-attachments.go` | `ListAssociatedFleets` per stack |
| `appsync-api-association.go` | `GetApiAssociation` per domain |
| `athena-named-query.go` | `ListNamedQueries` per workgroup |
| `athena-prepared-statement.go` | `ListPreparedStatements` per workgroup |
| `autoscaling-lifecycle-hook.go` | `DescribeLifecycleHooks` per ASG |
| `backup-vaults-access-policies.go` | `GetBackupVaultAccessPolicy` per vault |
| `bedrock-agent-alias.go` | `ListAgentAliases` per agent |
| `bedrock-agent-datasource.go` | `ListDataSources` per knowledge base |
| `bedrock-flow-alias.go` | `ListFlowAliases` per flow |
| `ses-receiptrulesets.go` | `DescribeActiveReceiptRuleSet` per ruleset |
| `wafregional-byte-match-set-tuples.go` | `GetByteMatchSet` per set |
| `wafregional-ip-set-ips.go` | `GetIPSet` per set |
| `wafregional-rate-based-rule-predicates.go` | `GetRateBasedRule` per rule |
| `wafregional-regex-match-tuples.go` | `GetRegexMatchSet` per set |
| `wafregional-regex-pattern-tuples.go` | `GetRegexPatternSet` per set |
| `wafregional-rule-predicates.go` | `GetRule` per rule |
| `wafregional-rules.go` | `GetRule` per rule |
| `wafregional-webacl-rule-attachments.go` | `GetWebACL` per ACL |
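
For the nested listing category, the same principle applies one level up: when the sub-resource listing fails for one parent, only that parent's children should be skipped. A rough sketch under that assumption follows; `listChildrenFunc`, `listAllChildren`, `fakeServices`, and `errForbidden` are hypothetical names, the cluster and service names are made up, and the standard `log` package stands in for logrus. The top-level listing of the parents themselves is assumed to have already succeeded (a failure there should still abort).

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errForbidden stands in for a permission failure on one parent (hypothetical).
var errForbidden = errors.New("forbidden")

// listChildrenFunc is a hypothetical stand-in for a per-parent listing call,
// such as listing services per cluster.
type listChildrenFunc func(parent string) ([]string, error)

// listAllChildren walks every parent and collects its children. If listing
// fails for one parent, that parent is logged and skipped; the children of
// the remaining parents are still returned.
func listAllChildren(parents []string, list listChildrenFunc) []string {
	var children []string
	for _, parent := range parents {
		items, err := list(parent)
		if err != nil {
			log.Printf("unable to list children for %s, skipping to avoid incorrect filtering: %v", parent, err)
			continue
		}
		children = append(children, items...)
	}
	return children
}

// fakeServices returns two made-up services per cluster and fails for "cluster-b".
func fakeServices(cluster string) ([]string, error) {
	if cluster == "cluster-b" {
		return nil, errForbidden
	}
	return []string{cluster + "/svc-1", cluster + "/svc-2"}, nil
}

func main() {
	clusters := []string{"cluster-a", "cluster-b", "cluster-c"}
	services := listAllChildren(clusters, fakeServices)
	fmt.Println(services) // prints "[cluster-a/svc-1 cluster-a/svc-2 cluster-c/svc-1 cluster-c/svc-2]"
}
```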
Resources That Already Handle This Correctly (for reference)
These resources demonstrate the correct pattern and can be used as examples:
- `cloudwatchlogs-loggroup.go`: fixed in #842, uses `logrus.Warn` + `continue`
- `rds-clusters.go`, `rds-instances.go`, `rds-dbparametergroups.go`, etc.: `ListTagsForResource` uses `continue`
- `lambda-function.go`: uses `continue` for tag failures
- `iam-role.go`, `iam-policy.go`, `iam-user.go`: use `continue` for enrichment failures
- `s3-bucket.go`: uses `logrus` + `continue` for tag failures
- `dynamodb-table.go`: uses `logrus.Warn` + `continue`
- `eks-fargate-profile.go`: uses `logrus.Error` + `continue`
- `memorydb-*.go`: all use `continue` for `ListTags` failures
- `neptune-cluster.go`, `neptune-instance.go`: log a warning and continue
- `sns-topics.go`, `sqs-queues.go`: use `continue` or `logrus` for tag failures

Note: The log level varies across these examples (`logrus.Warn` vs. `logrus.Error`), so it may be worth aligning on one level across resources for consistency.
Suggested Fix Approach
- For each affected file, change `return nil, err` to `logrus.WithError(err).Warn(...)` + `continue` on per-resource enrichment calls
- Include the resource identifier (ARN, name, ID) in the warning log for debuggability
- Use a consistent warning message format: `"unable to <action> for <resource type>, skipping to avoid incorrect filtering"`
- Add mock tests for the error path to prevent regression
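
One way to keep the message format consistent across ~90 files is a small shared helper. The sketch below is only an illustration of that idea: `skipWarning` is a hypothetical function, `ExampleResource` and `res-123` are placeholder values, and the identifier is folded into the message here for simplicity, whereas with logrus it would more likely be attached as a field.

```go
package main

import (
	"errors"
	"fmt"
)

// skipWarning builds a warning in the suggested format:
// "unable to <action> for <resource type>, skipping to avoid incorrect filtering".
// The resource identifier and the underlying error are included for debuggability.
func skipWarning(action, resourceType, id string, err error) string {
	return fmt.Sprintf("unable to %s for %s %q, skipping to avoid incorrect filtering: %v",
		action, resourceType, id, err)
}

func main() {
	msg := skipWarning("list tags", "ExampleResource", "res-123", errors.New("access denied"))
	fmt.Println(msg)
	// prints: unable to list tags for ExampleResource "res-123", skipping to avoid incorrect filtering: access denied
}
```

Because the helper is a pure function, the exact wording can be asserted in a unit test, which complements the mock tests for the error path.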
This could be done incrementally per-service or in bulk. The fix is mechanical and consistent across all affected files.
I don't mind tackling this when I have the bandwidth; I just wanted to open the issue to track it!
Related
- #842 (CloudWatchLogsLogGroup resource type has cascading failures in the discovery process when API calls for single log group fails): original issue for `CloudWatchLogsLogGroup`