Introduction
Hello, everyone!
My name is Pang, and I joined Rakuten as a new graduate in April 2024. After an initial month of comprehensive training, I was excited to be assigned to the Site Reliability Engineering (SRE) team in Rakuma in May.
Over the past three months, I have had the opportunity to work on various challenging tasks that have significantly contributed to my professional growth.
Today, I would like to share my experiences with one particular task I worked on: the AWS Chatbot task.
AWS Chatbot Task
Background
The SRE team currently uses Lambda functions to monitor operational events in AWS and send notifications to Slack. However, these Lambda functions need regular upkeep, so we decided to switch to AWS Chatbot for a more streamlined, maintenance-free solution.
Flow
- As is:
  - Event occurs (e.g., RDS creation event).
  - SNS receives notification and notifies Lambda.
  - Lambda sends a notification to Slack.
- To be:
  - Event occurs (e.g., RDS creation event).
  - EventBridge captures the event and notifies SNS.
  - SNS notifies AWS Chatbot.
  - AWS Chatbot sends a notification to Slack.
Initial Steps
- Created all configurations manually in a sandbox environment:
  - Created a new Slack channel specifically for receiving AWS notifications.
  - Set up a new AWS Chatbot configuration and connected it to the newly created Slack channel.
  - Created an SNS topic with the default policy and connected it to AWS Chatbot to handle notifications.
  - Configured EventBridge to capture events (e.g., an RDS creation event) as triggers and set the SNS topic as the target.
- Test:
  - Triggered an event (e.g., created a new database) and monitored the Slack channel to verify that notifications were delivered correctly.
Steps Taken
The SRE team manages AWS cloud infrastructure using Terraform to ensure consistency, version control, and peer review for all configurations. Therefore, the Chatbot configuration was also created using Terraform to align with these best practices.
- Initial Setup:
  - Create a Slack channel to receive notifications from the Chatbot.
  - Set Slack as the chat client from the AWS Chatbot console and obtain the workspace ID.
  - Set up variables in Terraform.

    aws.auto.tfvars

        aws_account_id = "000000000000"
        region         = "ap-northeast-1"
        channel_id = {
          channel1 = "xxxxxxxx",
          channel2 = "yyyyyyyy",
          channel3 = "zzzzzzzz",
        }

  - Set up the basic Terraform configuration.

    main.tf

        variable "region" {}
        variable "aws_account_id" {}
        variable "channel_id" {}

        provider "aws" {
          region = var.region
        }

        provider "awscc" {
          region = var.region
        }

        data "aws_chatbot_slack_workspace" "slack_workspace" {
          slack_team_name = "SlackTeamName"
        }
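Because the configuration uses both the aws and awscc providers, Terraform also needs to know where to fetch each one. A minimal sketch of that declaration follows. The file name versions.tf and the lack of provider version constraints are assumptions, not part of the original setup; in practice you would likely pin versions your team has tested.

```hcl
# versions.tf (hypothetical file name, not from the original setup)
terraform {
  # import blocks, used later in this post, require Terraform 1.5 or later.
  required_version = ">= 1.5.0"

  required_providers {
    # Standard AWS provider: used here for SNS, EventBridge, the IAM policy,
    # and the aws_chatbot_slack_workspace data source.
    aws = {
      source = "hashicorp/aws"
    }
    # AWS Cloud Control provider: used here for the Chatbot Slack channel
    # configuration and the IAM role.
    awscc = {
      source = "hashicorp/awscc"
    }
  }
}
```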
- Create Chatbot Configuration:
  - Set up the AWS Chatbot configuration using Terraform and select the Slack channel for receiving notifications.

    chatbot1.tf

        resource "awscc_chatbot_slack_channel_configuration" "chatbot1" {
          configuration_name = "chatbot1-slack-channel-config"
          iam_role_arn       = awscc_iam_role.chatbot_role.arn
          slack_channel_id   = var.channel_id.channel1
          slack_workspace_id = data.aws_chatbot_slack_workspace.slack_workspace.slack_team_id
          sns_topic_arns     = [aws_sns_topic.topic1.arn]
        }

  - Create the Chatbot role and policy. This is the essential IAM configuration needed to use AWS Chatbot in Slack channels. Because many of the services AWS Chatbot supports rely on CloudWatch for event and alarm processing, the role needs read-only CloudWatch access for AWS Chatbot's core functionality.

    iam.tf

        resource "aws_iam_policy" "chatbot_read_only_policy" {
          name = "Chatbot-ReadOnly-Policy"
          policy = jsonencode({
            Version = "2012-10-17"
            Statement = [
              {
                Action = [
                  "cloudwatch:Describe*",
                  "cloudwatch:Get*",
                  "cloudwatch:List*"
                ]
                Effect   = "Allow"
                Resource = "*"
              }
            ]
          })
        }

        resource "awscc_iam_role" "chatbot_role" {
          role_name = "ChatBot-Channel-Role"
          assume_role_policy_document = jsonencode({
            Version = "2012-10-17"
            Statement = [
              {
                Action = "sts:AssumeRole"
                Effect = "Allow"
                Sid    = ""
                Principal = {
                  Service = "chatbot.amazonaws.com"
                }
              },
            ]
          })
          managed_policy_arns = [
            "arn:aws:iam::aws:policy/AWSResourceExplorerReadOnlyAccess",
            aws_iam_policy.chatbot_read_only_policy.arn
          ]
        }
- Configure SNS Topic Policies:
  - Note that the SNS topic must be able to receive notifications published by EventBridge, so an appropriate topic policy is required.
  - Import the existing SNS topics, which were previously created by hand. Run terraform plan and fill in the differences it reports.

    chatbot1.tf

        ### import block
        import {
          to = aws_sns_topic.topic1
          id = "example" # placeholder: the ARN of the existing topic
        }
        import {
          to = aws_sns_topic_policy.topic1_policy
          id = "example" # placeholder: the ARN of the existing topic
        }
        ###

        resource "aws_sns_topic" "topic1" {
          name = "Topic1"
        }

        resource "aws_sns_topic_policy" "topic1_policy" {
          arn = aws_sns_topic.topic1.arn
          policy = jsonencode({
            ### copy from terraform plan result ###
          })
        }

  - Run terraform apply.
  - Delete the import block.
  - Modify the SNS topic policy in Terraform so that AWS Chatbot can subscribe to the topic and EventBridge can publish to it, by adding the following configuration at the end of the Statement block.

    chatbot1.tf

        {
          Sid    = "events-allow-publish"
          Effect = "Allow"
          Principal = {
            Service = "events.amazonaws.com"
          }
          Action   = "sns:Publish"
          Resource = aws_sns_topic.topic1.arn
        }
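For orientation, here is a sketch of how the finished topic policy might look once the new statement is appended. It assumes the imported topic carried the standard default statement that SNS attaches to a newly created topic. The real first statement should be whatever terraform plan reported for your topic, so treat that entry as illustrative only.

```hcl
resource "aws_sns_topic_policy" "topic1_policy" {
  arn = aws_sns_topic.topic1.arn

  policy = jsonencode({
    Version = "2008-10-17"
    Id      = "__default_policy_ID"
    Statement = [
      # Assumed: the default statement SNS generates, which limits these
      # actions to the account that owns the topic. Replace it with the
      # statement(s) shown in your own terraform plan output.
      {
        Sid       = "__default_statement_ID"
        Effect    = "Allow"
        Principal = { AWS = "*" }
        Action = [
          "SNS:GetTopicAttributes",
          "SNS:SetTopicAttributes",
          "SNS:AddPermission",
          "SNS:RemovePermission",
          "SNS:DeleteTopic",
          "SNS:Subscribe",
          "SNS:ListSubscriptionsByTopic",
          "SNS:Publish"
        ]
        Resource = aws_sns_topic.topic1.arn
        Condition = {
          StringEquals = { "AWS:SourceOwner" = var.aws_account_id }
        }
      },
      # Statement from the step above: lets EventBridge publish to the topic.
      {
        Sid    = "events-allow-publish"
        Effect = "Allow"
        Principal = {
          Service = "events.amazonaws.com"
        }
        Action   = "sns:Publish"
        Resource = aws_sns_topic.topic1.arn
      }
    ]
  })
}
```

Without the second statement, the EventBridge rule would still match events, but they would not be delivered to the topic and nothing would reach Slack.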
- Create EventBridge Configuration:
  - Set up EventBridge rules to capture operational events and define event patterns for the specific events of interest. In this example, the events I chose are database cluster creation and deletion.

    chatbot1.tf

        resource "aws_cloudwatch_event_rule" "dbcluster_creation_deletion_notify" {
          name        = "ClusterCreateDeleteWarn"
          description = "Capture RDS DB Cluster creation and deletion"

          event_pattern = <<-EOF
            {
              "source": ["aws.rds"],
              "detail-type": ["RDS DB Cluster Event"],
              "detail": {
                "EventCategories": [
                  "creation",
                  "deletion"
                ]
              }
            }
          EOF
        }

  - Set the target of the EventBridge rule to the SNS topic that AWS Chatbot subscribes to.

    chatbot1.tf

        resource "aws_cloudwatch_event_target" "dbcluster_creation_deletion_notify_sns" {
          rule      = aws_cloudwatch_event_rule.dbcluster_creation_deletion_notify.name
          target_id = "dbcluster-creation-deletion-sns"
          arn       = aws_sns_topic.topic1.arn
        }

  - Run terraform apply. If it succeeds, the resources will appear in the AWS console.
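The same rule-plus-target pattern extends to other event types. As a sketch, the configuration below would capture creation and deletion of individual DB instances rather than clusters. The resource names, rule name, and target ID are hypothetical, and this variant writes the pattern with jsonencode instead of a heredoc, which is purely a stylistic choice.

```hcl
# Hypothetical second rule: DB instance events instead of DB cluster events.
resource "aws_cloudwatch_event_rule" "dbinstance_creation_deletion_notify" {
  name        = "InstanceCreateDeleteWarn"
  description = "Capture RDS DB Instance creation and deletion"

  # jsonencode produces the same kind of JSON document as the heredoc form.
  event_pattern = jsonencode({
    source        = ["aws.rds"]
    "detail-type" = ["RDS DB Instance Event"]
    detail = {
      EventCategories = ["creation", "deletion"]
    }
  })
}

# Route matching events to the same SNS topic that AWS Chatbot subscribes to.
resource "aws_cloudwatch_event_target" "dbinstance_creation_deletion_notify_sns" {
  rule      = aws_cloudwatch_event_rule.dbinstance_creation_deletion_notify.name
  target_id = "dbinstance-creation-deletion-sns"
  arn       = aws_sns_topic.topic1.arn
}
```

Because the target is the existing topic, no further changes to the topic policy or the Chatbot configuration should be needed for this rule.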
- Test and Result:
Overall Comment
This training has been immensely valuable, significantly enhancing both my technical and soft skills. Working on the AWS Chatbot task, in particular, has provided me with practical experience in streamlining operations and improving system reliability. I am committed to continuing my development and aspire to grow into a proficient engineer in the future.
References