I Tried Letting Devin Convert a New Relic Dashboard to Terraform

Posted at 2025-07-04

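Introduction

New Relic dashboards are an important tool for visualizing system health and catching problems early. In my own experience, I often build a dashboard from an entity's metrics first and convert it to IaC afterwards. Adjusting the fine details, such as widget sizes and NRQL (New Relic Query Language) queries, tends to take more effort than expected. This article records my attempt to solve that problem by having the AI agent Devin convert a New Relic dashboard to Terraform.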

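Background

Why manage New Relic dashboards with Terraform

  • Promoting IaC (Infrastructure as Code): Managing dashboards as code, just like infrastructure, brings reproducibility, version control, and a review process.
  • Synchronization across environments: The same dashboard can be easily deployed and kept in sync across development, staging, and production.
  • Simpler change management: Dashboard changes are tracked in Git, which makes reviews and rollbacks easier.
  • Less manual work: Dashboards no longer need to be created or changed by hand when building a new environment or updating an existing one.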

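Before you start

I used Devin for this work, but there is no need to stick to a particular AI tool. What matters is the goal: managing an existing New Relic dashboard with Terraform.

This article assumes the following.

  • You have a New Relic dashboard that was created manually.
  • You already have a Terraform dashboard module in use. (I skip the details of how the module is structured and work with this existing module.)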

What I asked Devin to do

The request I gave Devin was as follows.
[Screenshot: the request sent to Devin]

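Devin then checks what information it needs for the request.
[Screenshot: Devin asking for the required information]

To provide that information, I clicked "Manage JSON" on the New Relic dashboard, downloaded the dashboard's JSON, and shared it. (Integration with New Relic_MCP may also become possible in the future.)
[Screenshots: exporting the dashboard JSON from New Relic]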

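I first made the request by simply sharing the JSON file as-is, but that did not go well, so I explained what I wanted in more concrete terms.
[Screenshot: the more specific request]

In short, I wanted no differences to show up after running terraform import on the existing dashboard.
[Screenshot: explaining the zero-diff requirement]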

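Devin appears to have understood the task properly.
[Screenshots: Devin's understanding of the task]

Devin reported the work as complete, so let's run terraform import and check the diff!
[Screenshot: Devin's completion report]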

The GUID needed for the import can be found in New Relic.
[Screenshot: the entity GUID in New Relic]

$ terragrunt import module.nr_dashboard.newrelic_one_dashboard.default '<Entity guid>'
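
As an alternative to the CLI command above, Terraform 1.5 and later also accept a declarative import block, which makes the import visible in plan output and in code review. The following is a minimal sketch, assuming the block is placed in the root module next to the module call (import blocks are not allowed inside child modules) and that the Terragrunt setup passes it through unchanged. The GUID is a placeholder.

```hcl
# Assumption: this sits in the root module that calls `module "nr_dashboard"`.
import {
  # Same resource address as in the CLI import command.
  to = module.nr_dashboard.newrelic_one_dashboard.default

  # Placeholder: replace with the dashboard's entity GUID from New Relic.
  id = "<Entity guid>"
}
```

With this in place, a plan reports the pending import together with any remaining differences, and the block can be removed once the import has been applied.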

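There were a few differences right after the import, but after I gave detailed instructions on what to fix, it worked out.

Additional instructions I gave
[Screenshots: the additional instructions]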

The result

[Screenshot: plan output after the fixes]
The plan now properly reports "No changes". 👏

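Follow-up work

Managing New Relic dashboards as IaC seemed likely to be useful in other projects as well, so I asked Devin to organize this work into a knowledge document and a playbook.

[Screenshot: asking Devin to write the knowledge document and playbook]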

newrelic_dashboard_knowledge.md
# New Relic Dashboard Creation with Terraform Import Compatibility

## Overview
This knowledge document covers the process of creating New Relic dashboards using Terraform modules while ensuring perfect compatibility with `terraform import` to achieve zero configuration differences.

## Key Challenge: Terraform Import Compatibility
The primary challenge when working with New Relic dashboards is ensuring that Terraform configuration exactly matches the imported state from manually created dashboards. Even small differences in widget properties can cause terraform plan to show unwanted changes.

## Critical Widget Properties for Import Compatibility

### Required Properties for Line Widgets
When importing existing New Relic dashboards, these properties must be explicitly set to match the imported state:

hcl
widget_line {
  title            = "Widget Title"
  row              = 1
  column           = 1
  width            = 4
  height           = 3
  legend_enabled   = false
  is_label_visible = true
  y_axis_left_max  = 100  # Only for CPU/Memory percentage widgets
  y_axis_left_zero = true

  y_axis_right {
    y_axis_right_max    = 0
    y_axis_right_min    = 0
    y_axis_right_series = []
    y_axis_right_zero   = true
  }

  nrql_query {
    query = "SELECT ..."
  }
}

### Property Variations by Widget Type
- **CPU/Memory widgets**: Include `y_axis_left_max = 100`
- **Other line widgets**: Omit `y_axis_left_max`
- **Billboard widgets**: Only need `legend_enabled = false`
- **Log table widgets**: Only need `legend_enabled = false`
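
The per-type variation for line widgets can also be expressed in one place. The sketch below is one possible approach, not the original module's implementation: it generates the line widgets from a list and sets `y_axis_left_max` only for percentage widgets. The names `local.line_widgets`, `is_percentage`, and the `example` resource label are hypothetical, and `var.env_short` is assumed to be declared as in the module.

```hcl
# Hypothetical names: local.line_widgets, is_percentage and the "example"
# resource label are illustrative and not part of the original module.
locals {
  line_widgets = [
    {
      title         = "CPU Usage"
      column        = 1
      query         = "SELECT max(cpuPercent) FROM SystemSample WHERE entityName LIKE '%${var.env_short}-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"
      is_percentage = true
    },
    {
      title         = "Load Average"
      column        = 5
      query         = "SELECT average(loadAverageOneMinute) FROM SystemSample WHERE entityName LIKE '%${var.env_short}-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"
      is_percentage = false
    },
  ]
}

resource "newrelic_one_dashboard" "example" {
  name        = "${var.env_short}-service-dashboard"
  permissions = "public_read_write"

  page {
    name = "ec2"

    # One widget_line block is generated per list element, in list order.
    dynamic "widget_line" {
      for_each = local.line_widgets
      content {
        title            = widget_line.value.title
        row              = 1
        column           = widget_line.value.column
        width            = 4
        height           = 3
        legend_enabled   = false
        is_label_visible = true
        y_axis_left_zero = true

        # Setting an argument to null is the same as omitting it.
        y_axis_left_max = widget_line.value.is_percentage ? 100 : null

        y_axis_right {
          y_axis_right_max    = 0
          y_axis_right_min    = 0
          y_axis_right_series = []
          y_axis_right_zero   = true
        }

        nrql_query {
          query = widget_line.value.query
        }
      }
    }
  }
}
```

Whether this is preferable to writing each widget out by hand depends on how uniform the widgets are; hand-written blocks are easier to compare against an imported state one property at a time.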

## Common Terraform Import Issues and Solutions

### Issue 1: Missing Widget Properties
**Problem**: Terraform plan shows differences like:

- is_label_visible = true -> null
- y_axis_left_zero = true -> null


**Solution**: Add explicit properties to Terraform configuration:
hcl
is_label_visible = true
y_axis_left_zero = true


### Issue 2: Y-Axis Configuration Differences
**Problem**: Terraform plan shows y_axis_right block differences:

- y_axis_right {
    - y_axis_right_max = 0 -> null
    - y_axis_right_zero = true -> null
  }


**Solution**: Add complete y_axis_right block:
hcl
y_axis_right {
  y_axis_right_max    = 0
  y_axis_right_min    = 0
  y_axis_right_series = []
  y_axis_right_zero   = true
}


### Issue 3: Legend Configuration
**Problem**: Legend settings don't match imported state.

**Solution**: Set explicit legend_enabled values:
- Line widgets with data: `legend_enabled = false`
- Billboard widgets: `legend_enabled = false`
- Log table widgets: `legend_enabled = false`

## Dashboard Structure Best Practices

### Multi-Page Dashboard Organization
hcl
resource "newrelic_one_dashboard" "default" {
  name        = "${var.env_short}-service-dashboard"
  permissions = "public_read_write"  # Must be lowercase

  page {
    name = "ec2"
    # EC2 monitoring widgets
  }

  page {
    name = "ecs"
    # ECS monitoring widgets
  }

  page {
    name = "sqs"
    # SQS monitoring widgets
  }
}


### Environment Variable Usage
Use Terraform input variables such as `var.env_short` for cross-environment compatibility:
hcl
# Good - Dynamic entity names
entityName LIKE '%${var.env_short}-gpu%'

# Avoid - Hardcoded entity GUIDs (environment-specific)
entityGuid = 'hoge'
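
To show where such a fragment lives in Terraform, here is a minimal sketch that keeps the entity filter in a local value and reuses it across queries. The local names are hypothetical; the rendered query strings are identical to writing the pattern inline, so this should not by itself introduce a diff.

```hcl
# Hypothetical local names; not part of the original module.
locals {
  # Filter shared by every query that targets the GPU hosts of one environment.
  gpu_entity_filter = "entityName LIKE '%${var.env_short}-gpu%'"

  cpu_query    = "SELECT max(cpuPercent) FROM SystemSample WHERE ${local.gpu_entity_filter} TIMESERIES AUTO FACET provider.ec2InstanceId"
  memory_query = "SELECT max(memoryUsedPercent) FROM SystemSample WHERE ${local.gpu_entity_filter} TIMESERIES AUTO FACET provider.ec2InstanceId"
}
```

Inside a widget, the query is then referenced as `query = local.cpu_query` in the `nrql_query` block.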


## NRQL Query Patterns

### System Monitoring Queries
hcl
# CPU Usage
"SELECT max(cpuPercent) FROM SystemSample WHERE entityName LIKE '%${var.env_short}-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"

# Memory Usage
"SELECT max(memoryUsedPercent) FROM SystemSample WHERE entityName LIKE '%${var.env_short}-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"

# GPU Monitoring
"SELECT max(`utilization.gpu.percent`) FROM NvidiaGpuSample WHERE entityName LIKE '%${var.env_short}-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"


### Application Log Queries
hcl
# Application Logs
"SELECT container_id, message FROM Log WHERE ecs_cluster = '${var.env_short}' SINCE 1 days ago"

# Error Logs
"SELECT container_id, message FROM Log WHERE ecs_cluster = '${var.env_short}' AND message LIKE '%error%' SINCE 1 days ago"

# Specific Log Types
"SELECT `aws.ec2InstanceId`,`message` FROM Log WHERE `hostname` LIKE '%${var.env_short}-gpu%' AND `logtype` = 'whisper_log' SINCE 1 days ago"


### SQS Queue Monitoring
hcl
# Message Throughput
"SELECT sum(`provider.numberOfMessagesReceived.Sum`) AS `Received messages`, sum(`provider.numberOfMessagesSent.Sum`) AS `Sent messages` FROM QueueSample WHERE ((`provider` = 'SqsQueue') AND (entityName LIKE '%${var.env_short}-queue%')) TIMESERIES AUTO"

# Queue Depth
"SELECT average(`provider.approximateNumberOfMessagesVisible.Average`) AS `Available messages` FROM QueueSample WHERE ((`provider` = 'SqsQueue') AND (entityName LIKE '%${var.env_short}-service-queue%')) TIMESERIES AUTO"


## Testing and Validation Process

### Step 1: Export Existing Dashboard
1. Manually create dashboard in New Relic UI
2. Export dashboard as JSON
3. Analyze widget properties and NRQL queries

### Step 2: Create Terraform Configuration
1. Match dashboard structure (pages, widgets)
2. Add explicit widget properties from JSON analysis
3. Use Terraform variables (e.g. `var.env_short`) for dynamic values
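
When carrying properties over from the JSON analysis, it helps to keep the field correspondence next to the code. The fragment below (it belongs inside a `page` block) is a sketch of the mapping as I understand the export format; the JSON paths in the comments are assumptions and should be verified against your own export. Note that the export typically carries `permissions` in uppercase, while the provider expects lowercase.

```hcl
# Assumed correspondence between exported JSON fields (right-hand comments)
# and newrelic_one_dashboard arguments. Verify against your own export.
widget_line {                   # visualization.id = "viz.line"
  title  = "Widget Title"       # title
  row    = 1                    # layout.row
  column = 1                    # layout.column
  width  = 4                    # layout.width
  height = 3                    # layout.height

  legend_enabled   = false      # rawConfiguration.legend.enabled
  y_axis_left_zero = true       # rawConfiguration.yAxisLeft.zero

  nrql_query {
    query = "SELECT ..."        # rawConfiguration.nrqlQueries[].query
  }
}
```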

### Step 3: Test Import Compatibility
bash
# Import existing dashboard
terraform import module.nr_dashboard.newrelic_one_dashboard.default '<DASHBOARD_GUID>'

# Verify zero differences
terraform plan
# Should show: "No changes. Your infrastructure matches the configuration."


### Step 4: Cross-Environment Testing
1. Test in stage environment first
2. Verify entity name patterns work correctly
3. Deploy to production environment
4. Validate all widgets display data correctly

## Common Pitfalls and Solutions

### Pitfall 1: Case Sensitivity in Permissions
**Problem**: `permissions = "PUBLIC_READ_WRITE"` causes validation error.
**Solution**: Use lowercase: `permissions = "public_read_write"`
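
If the permissions value is ever made configurable, the lowercase requirement above can be enforced at plan time. This is a sketch under that assumption; the variable `dashboard_permissions` is hypothetical, since the configuration shown in this document hardcodes the value.

```hcl
# Hypothetical variable; the configuration in this document hardcodes the value.
variable "dashboard_permissions" {
  description = "Dashboard permissions, in the lowercase form the provider expects"
  type        = string
  default     = "public_read_write"

  validation {
    # Reject uppercase values such as "PUBLIC_READ_WRITE" before they reach the provider.
    condition     = contains(["private", "public_read_only", "public_read_write"], var.dashboard_permissions)
    error_message = "Must be one of private, public_read_only, public_read_write (lowercase)."
  }
}
```

The dashboard resource would then use `permissions = var.dashboard_permissions`.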

### Pitfall 2: Incomplete Widget Property Matching
**Problem**: Removing properties instead of setting explicit values.
**Solution**: Always set explicit values that match imported state.

### Pitfall 3: Hardcoded Environment Values
**Problem**: Dashboard only works in one environment.
**Solution**: Use variables like `${var.env_short}` for dynamic values.

### Pitfall 4: Missing Y-Axis Configuration
**Problem**: Terraform plan shows y_axis_right differences.
**Solution**: Always include complete y_axis_right block with all properties.

## Module Integration Pattern

### Environment Configuration
hcl
# services/main.tf
module "nr_dashboard" {
  source = "../../../../modules/dashboard"

  env_short = var.env_short
}


### Module Variables
hcl
# modules/dashboard/variables.tf
variable "env_short" {
  description = "The short name of the environment"
  type        = string
}
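
Alongside the input variable, the module can expose the dashboard's identifiers, which is handy when the GUID is needed later (for example, for a future import into another state). A minimal sketch, assuming an `outputs.tf` file is added to the module; the output names are hypothetical, while `guid` and `permalink` are attributes exported by `newrelic_one_dashboard`.

```hcl
# modules/dashboard/outputs.tf (hypothetical file)
output "dashboard_guid" {
  description = "Entity GUID of the dashboard"
  value       = newrelic_one_dashboard.default.guid
}

output "dashboard_permalink" {
  description = "URL of the dashboard in the New Relic UI"
  value       = newrelic_one_dashboard.default.permalink
}
```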


## CI/CD Considerations

### Terraform Formatting
- User may handle `terraform fmt` manually
- CI formatting checks can be ignored if user takes responsibility
- Focus on functional correctness over formatting during development

### Validation Checks
- TFLint checks should pass
- Terraform validation should pass
- Import compatibility is the primary success criterion

## Success Metrics
1. **Zero Differences**: `terraform plan` shows no changes after import
2. **Cross-Environment**: Dashboard works in both stage and production
3. **Data Visibility**: All widgets display relevant monitoring data
4. **CI Passing**: All relevant CI checks pass (excluding formatting if user handles)

## Related Resources
- [New Relic Terraform Provider Documentation](https://registry.terraform.io/providers/newrelic/newrelic/latest/docs/resources/one_dashboard)
- [NRQL Query Examples](https://docs.newrelic.com/docs/query-your-data/nrql-new-relic-query-language/)
- [Terraform Import Documentation](https://www.terraform.io/docs/import/index.html)

newrelic_dashboard_playbook.md
# New Relic Dashboard Creation Playbook

## Prerequisites
- Access to New Relic account with dashboard creation permissions
- Terraform and Terragrunt installed
- Access to target repository with dashboard module
- Understanding of NRQL query language
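
The account access listed above is normally wired in through the provider configuration. If the repository does not already have one, a minimal sketch could look like the following. The variable names are hypothetical, and the region value is an assumption to adjust for your account.

```hcl
terraform {
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

# Hypothetical variable names. The provider can also read these settings from
# the NEW_RELIC_ACCOUNT_ID, NEW_RELIC_API_KEY and NEW_RELIC_REGION environment variables.
variable "newrelic_account_id" {
  type = number
}

variable "newrelic_api_key" {
  type      = string
  sensitive = true
}

provider "newrelic" {
  account_id = var.newrelic_account_id
  api_key    = var.newrelic_api_key
  region     = "US" # Assumption: use "EU" for EU-region accounts.
}
```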

## Phase 1: Analysis and Planning

### Step 1.1: Analyze Existing Dashboard
bash
# If dashboard exists, export JSON from New Relic UI
# Navigate to Dashboard → ... → Export as JSON
# Save as reference file


### Step 1.2: Identify Required Widgets
Create inventory of widgets needed:
- [ ] Line charts (CPU, Memory, GPU metrics)
- [ ] Billboard widgets (error counts, alerts)
- [ ] Log table widgets (application logs, error logs)
- [ ] Area charts (if needed)

### Step 1.3: Plan Dashboard Structure

Dashboard Name: ${env_short}-service-dashboard
├── Page 1: "ec2" (Infrastructure monitoring)
├── Page 2: "ecs" (Container monitoring)  
└── Page 3: "sqs" (Queue monitoring)


## Phase 2: Module Configuration

### Step 2.1: Update Environment Configurations
Add dashboard module to both environments:

hcl
# services/service_name/envs/stage/main.tf
# services/service_name/envs/prod/main.tf

module "nr_dashboard" {
  source = "../../../../modules/dashboard"

  env_short = var.env_short
}


### Step 2.2: Configure Dashboard Module
Create or update `modules/dashboard/main.tf`:

hcl
resource "newrelic_one_dashboard" "default" {
  name        = "${var.env_short}-service-dashboard"
  permissions = "public_read_write"

  # Add pages and widgets here
}


### Step 2.3: Define Required Variables
Update `modules/dashboard/variables.tf`:

hcl
variable "env_short" {
  description = "The short name of the environment"
  type        = string
}

# Add other required variables


## Phase 3: Widget Implementation

### Step 3.1: Implement Line Widgets
For each line widget, use this template:

hcl
widget_line {
  title            = "Widget Title"
  row              = 1
  column           = 1
  width            = 4
  height           = 3
  legend_enabled   = false
  is_label_visible = true
  y_axis_left_zero = true
  
  # Add y_axis_left_max = 100 for percentage widgets
  
  y_axis_right {
    y_axis_right_max    = 0
    y_axis_right_min    = 0
    y_axis_right_series = []
    y_axis_right_zero   = true
  }

  nrql_query {
    query = "SELECT ... WHERE entityName LIKE '%${var.env_short}-service%' ..."
  }
}


### Step 3.2: Implement Billboard Widgets
hcl
widget_billboard {
  title          = "Error Count"
  row            = 1
  column         = 9
  width          = 4
  height         = 3
  legend_enabled = false

  nrql_query {
    query = "SELECT count(*) FROM Log WHERE ... AND message LIKE '%error%' ..."
  }
}


### Step 3.3: Implement Log Table Widgets
hcl
widget_log_table {
  title          = "Application Logs"
  row            = 1
  column         = 9
  width          = 4
  height         = 6
  legend_enabled = false

  nrql_query {
    query = "SELECT container_id, message FROM Log WHERE ecs_cluster = '${var.env_short}-service' ..."
  }
}


## Phase 4: NRQL Query Development

### Step 4.1: System Metrics Queries
hcl
# CPU Usage
"SELECT max(cpuPercent) FROM SystemSample WHERE entityName LIKE '%${var.env_short}-service-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"

# Memory Usage  
"SELECT max(memoryUsedPercent) FROM SystemSample WHERE entityName LIKE '%${var.env_short}-service-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"

# Load Average
"SELECT average(loadAverageOneMinute) FROM SystemSample WHERE entityName LIKE '%${var.env_short}-service-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"


### Step 4.2: GPU Monitoring Queries
hcl
# GPU Utilization
"SELECT max(`utilization.gpu.percent`) FROM NvidiaGpuSample WHERE entityName LIKE '%${var.env_short}-service-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"

# GPU Memory
"SELECT max(`utilization.memory.percent`) FROM NvidiaGpuSample WHERE entityName LIKE '%${var.env_short}-service-gpu%' TIMESERIES AUTO FACET provider.ec2InstanceId"


### Step 4.3: Application Monitoring Queries
hcl
# ECS CPU Usage
"SELECT max(`newrelic.goldenmetrics.infra.awsecsservice.cpuUsage`) FROM Metric SINCE 30 MINUTES AGO TIMESERIES FACET entity.name"

# ECS Task Count
"SELECT max(`aws.ecs.runningCount.byService`) FROM Metric SINCE 1 DAYS AGO TIMESERIES 10 minutes FACET aws.ecs.ServiceName"

# ECS Memory Utilization
"SELECT max(`provider.memoryUtilization.Average`) FROM ComputeSample TIMESERIES AUTO FACET provider.serviceName"


### Step 4.4: Queue Monitoring Queries
hcl
# SQS Message Throughput
"SELECT sum(`provider.numberOfMessagesReceived.Sum`) AS `Received messages`, sum(`provider.numberOfMessagesSent.Sum`) AS `Sent messages` FROM QueueSample WHERE ((`provider` = 'SqsQueue') AND (entityName LIKE '%${var.env_short}-service-queue%')) TIMESERIES AUTO"

# SQS Queue Depth
"SELECT average(`provider.approximateNumberOfMessagesVisible.Average`) AS `Available messages` FROM QueueSample WHERE ((`provider` = 'SqsQueue') AND (entityName LIKE '%${var.env_short}-service-queue%')) TIMESERIES AUTO"


### Step 4.5: Log Monitoring Queries
hcl
# Application Logs
"SELECT container_id, message FROM Log WHERE ecs_cluster = '${var.env_short}-service' SINCE 1 days ago"

# Error Logs
"SELECT container_id, message FROM Log WHERE ecs_cluster = '${var.env_short}-service' AND message LIKE '%error%' SINCE 1 days ago"

# Specific Log Types
"SELECT `aws.ec2InstanceId`,`message` FROM Log WHERE `hostname` LIKE '%${var.env_short}-service-gpu%' AND `logtype` = 'whisper_log' SINCE 1 days ago"

# Error Count
"SELECT count(*) FROM Log WHERE `hostname` LIKE '%${var.env_short}-service-gpu%' AND `logtype` = 'whisper_log' AND message LIKE '%error%' SINCE 30 minutes ago UNTIL now"


## Phase 5: Testing and Validation

### Step 5.1: Local Validation
bash
# Format Terraform code
terraform fmt modules/dashboard/main.tf

# Validate configuration
cd services/service_name/envs/stage
terragrunt validate

# Plan deployment
terragrunt plan


### Step 5.2: Stage Environment Testing
bash
# Deploy to stage
cd services/service_name/envs/stage
terragrunt apply

# Test dashboard functionality in New Relic UI
# Verify all widgets display data
# Check for any broken queries


### Step 5.3: Import Compatibility Testing
bash
# Import existing dashboard (if applicable)
terragrunt import module.nr_dashboard.newrelic_one_dashboard.default '<DASHBOARD_GUID>'

# Verify zero differences
terragrunt plan
# Expected output: "No changes. Your infrastructure matches the configuration."


### Step 5.4: Production Deployment
bash
# Deploy to production
cd services/service_name/envs/prod
terragrunt plan
terragrunt apply

# Verify production dashboard functionality


## Phase 6: Troubleshooting

### Issue: Terraform Plan Shows Differences After Import

**Symptoms:**

~ resource "newrelic_one_dashboard" "default" {
  ~ widget_line {
    - is_label_visible = true -> null
    - y_axis_left_zero = true -> null
  }
}


**Solution:**
1. Add missing properties to widget configuration:
hcl
is_label_visible = true
y_axis_left_zero = true


2. For y_axis_right differences, add complete block:
hcl
y_axis_right {
  y_axis_right_max    = 0
  y_axis_right_min    = 0
  y_axis_right_series = []
  y_axis_right_zero   = true
}


### Issue: Permission Validation Error

**Symptoms:**

Error: expected permissions to be one of [private public_read_only public_read_write], got PUBLIC_READ_WRITE


**Solution:**
Use lowercase permissions:
hcl
permissions = "public_read_write"


### Issue: Widgets Not Displaying Data

**Symptoms:**
- Widgets show "No data available"
- Empty charts in New Relic UI

**Solution:**
1. Verify entity names exist in target environment
2. Check NRQL query syntax
3. Ensure proper time range in queries
4. Validate entity name patterns match actual resources

### Issue: Cross-Environment Compatibility

**Symptoms:**
- Dashboard works in one environment but not another
- Entity references not found

**Solution:**
1. Use Terraform variables such as `var.env_short` consistently:
hcl
entityName LIKE '%${var.env_short}-service%'


2. Avoid hardcoded entity GUIDs
3. Test entity name patterns in both environments

## Phase 7: Documentation and Handoff

### Step 7.1: Update PR Description
Include comprehensive testing checklist:
- [ ] Terraform import compatibility verified
- [ ] Dashboard displays correctly in New Relic UI
- [ ] Cross-environment compatibility tested
- [ ] All NRQL queries return expected data
- [ ] CI checks passing

### Step 7.2: Create Monitoring Documentation
Document:
- Widget purposes and expected data
- NRQL query explanations
- Troubleshooting common issues
- Environment-specific considerations

### Step 7.3: Knowledge Transfer
Provide team with:
- Dashboard access instructions
- Query modification procedures
- Adding new widgets process
- Environment deployment steps

## Success Criteria Checklist

- [ ] Dashboard deploys successfully in both environments
- [ ] `terraform plan` shows zero differences after `terraform import`
- [ ] All widgets display relevant monitoring data
- [ ] NRQL queries return expected results
- [ ] CI/CD pipeline passes all checks
- [ ] Cross-environment compatibility verified
- [ ] Documentation updated and complete
- [ ] Team trained on dashboard usage

## Maintenance Tasks

### Regular Reviews
- [ ] Quarterly review of widget relevance
- [ ] Update NRQL queries as infrastructure changes
- [ ] Add new monitoring requirements
- [ ] Remove obsolete widgets

### Performance Optimization
- [ ] Optimize slow NRQL queries
- [ ] Adjust time ranges for better performance
- [ ] Consolidate similar widgets if possible
- [ ] Review and update alert thresholds

## Emergency Procedures

### Dashboard Not Loading
1. Check New Relic service status
2. Verify account permissions
3. Review recent Terraform changes
4. Check entity name patterns

### Data Not Appearing
1. Verify data sources are active
2. Check NRQL query syntax
3. Validate time ranges
4. Confirm entity names exist

### Import Failures
1. Export current dashboard state
2. Compare with Terraform configuration
3. Add missing widget properties
4. Test import compatibility again

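Summary

In this article I shared my experience of converting a New Relic dashboard to Terraform with the help of the AI agent Devin. Turning a manually created dashboard into IaC involves fine-grained widget settings and NRQL adjustments, work that looks mundane but takes time. Working with Devin, I came away with the following insights and results.

  • IaC benefits reconfirmed: Managing dashboards as code delivers the benefits of IaC to the fullest: synchronization across environments, simpler change management, and less manual work.

  • Making use of Devin: Devin was a strong help in generating Terraform code from the existing dashboard's JSON definition. In particular, giving Devin precise instructions was the key to the detailed property adjustments needed to eliminate the diff after terraform import.

  • Reaching "No changes": After some trial and error, terraform plan finally reported "No changes" after terraform import, giving us a dashboard fully managed as IaC.

  • Systematized knowledge: Having Devin organize the work into a knowledge document and a playbook laid a foundation for doing similar work, and sharing know-how within the team, more efficiently.

Converting New Relic dashboards to IaC is not especially difficult, but it does take more effort than you would expect. Because dashboards are used across many organizations, I believe creating and managing them properly is important for operational efficiency. Handing this kind of work to AI is one way to automate it, and I hope this experience helps others facing similar challenges find an AI-assisted solution.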
