aws_emr_cluster
Provides an Elastic MapReduce Cluster, a web service that makes it easy to process large amounts of data efficiently. See Amazon Elastic MapReduce Documentation for more information.
Example Usage
```yaml
resource:
  aws_emr_cluster:
    emr-test-cluster:
      name: emr-test-arn
      release_label: emr-4.6.0
      applications:
        - Spark
      ec2_attributes:
        subnet_id: '${aws_subnet.main.id}'
        emr_managed_master_security_group: '${aws_security_group.sg.id}'
        emr_managed_slave_security_group: '${aws_security_group.sg.id}'
        instance_profile: '${aws_iam_instance_profile.emr_profile.arn}'
      master_instance_type: m3.xlarge
      core_instance_type: m3.xlarge
      core_instance_count: 1
      tags:
        role: rolename
        env: env
      bootstrap_action:
        path: 's3://elasticmapreduce/bootstrap-actions/run-if'
        name: runif
        args:
          - instance.isMaster=true
          - 'echo running on master node'
      configurations: test-fixtures/emr_configurations.json
      service_role: '${aws_iam_role.iam_emr_service_role.arn}'
```
The aws_emr_cluster resource typically requires two IAM roles: one that the EMR service itself assumes, and another placed on the cluster's EC2 instances (through an instance profile) so that those instances can interact with AWS. The suggested role policy templates are AmazonElasticMapReduceRole for the EMR service role and AmazonElasticMapReduceforEC2Role for the EC2 instance profile. See the Getting Started guide for more information on these IAM roles. A fully bootable example Terraform configuration is included at the bottom of this page.
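A minimal sketch of the two roles, written in the same YAML form as the examples on this page. It assumes the AWS-managed policies are attached with aws_iam_role_policy_attachment using the ARNs arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole and arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role. The names iam_emr_profile_role, emr_service_attach and emr_ec2_attach are hypothetical; iam_emr_service_role and emr_profile match the names referenced elsewhere on this page. Newer AWS provider versions take a single role on aws_iam_instance_profile, while older ones take a roles list, so adjust that line for your version.

```yaml
resource:
  aws_iam_role:
    # Role assumed by the EMR service itself
    iam_emr_service_role:
      name: iam_emr_service_role
      assume_role_policy: |
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": { "Service": "elasticmapreduce.amazonaws.com" },
              "Action": "sts:AssumeRole"
            }
          ]
        }
    # Role for the cluster's EC2 instances (hypothetical name)
    iam_emr_profile_role:
      name: iam_emr_profile_role
      assume_role_policy: |
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": { "Service": "ec2.amazonaws.com" },
              "Action": "sts:AssumeRole"
            }
          ]
        }
  aws_iam_role_policy_attachment:
    # Suggested managed policy for the EMR service role
    emr_service_attach:
      role: '${aws_iam_role.iam_emr_service_role.name}'
      policy_arn: 'arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole'
    # Suggested managed policy for the EC2 instance role
    emr_ec2_attach:
      role: '${aws_iam_role.iam_emr_profile_role.name}'
      policy_arn: 'arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role'
  aws_iam_instance_profile:
    # Referenced by ec2_attributes.instance_profile in the cluster examples
    emr_profile:
      name: emr_profile
      role: '${aws_iam_role.iam_emr_profile_role.name}'  # 'roles' list on older providers
```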
Argument Reference
The following arguments are supported:
name - (Required) The name of the job flow
release_label - (Required) The release label for the Amazon EMR release
master_instance_type - (Required) The EC2 instance type of the master node
core_instance_type - (Optional) The EC2 instance type of the slave nodes
core_instance_count - (Optional) Number of Amazon EC2 instances used to execute the job flow. Default: 0
log_uri - (Optional) S3 bucket to write the log files of the job flow to. If a value is not provided, logs are not created
applications - (Optional) A list of applications for the cluster. Valid values are: Hadoop, Hive, Mahout, Pig, and Spark. Case insensitive
ec2_attributes - (Optional) Attributes for the EC2 instances running the job flow. Defined below
bootstrap_action - (Optional) List of bootstrap actions that will be run before Hadoop is started on the cluster nodes. Defined below
configurations - (Optional) List of configurations supplied for the EMR cluster you are creating
service_role - (Optional) IAM role that will be assumed by the Amazon EMR service to access AWS resources
visible_to_all_users - (Optional) Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Default: true
tags - (Optional) Map of tags to apply to the EMR Cluster
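For illustration, a sketch that sets several of the optional arguments above in the same YAML form. The resource name example-with-logs, the cluster name, and the S3 bucket in log_uri are hypothetical placeholders; the IAM and subnet references assume resources named as in the example at the top of this page.

```yaml
resource:
  aws_emr_cluster:
    example-with-logs:  # hypothetical resource name
      name: example-cluster
      release_label: emr-4.6.0
      master_instance_type: m3.xlarge
      core_instance_type: m3.xlarge
      core_instance_count: 2  # two core nodes instead of the default of 0
      # Application names are case insensitive
      applications:
        - Hadoop
        - Hive
        - Spark
      # Without log_uri, no log files are created
      log_uri: 's3://example-emr-logs/'  # hypothetical bucket
      visible_to_all_users: true  # the default, shown explicitly
      ec2_attributes:
        subnet_id: '${aws_subnet.main.id}'
        instance_profile: '${aws_iam_instance_profile.emr_profile.arn}'
      service_role: '${aws_iam_role.iam_emr_service_role.arn}'
      tags:
        env: env
```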
ec2_attributes
Attributes for the Amazon EC2 instances running the job flow
key_name - (Optional) Amazon EC2 key pair that can be used to SSH to the master node as the user called hadoop
subnet_id - (Optional) VPC subnet ID where you want the job flow to launch. The cc1.4xlarge instance type cannot be specified for nodes of a job flow launched in an Amazon VPC
additional_master_security_groups - (Optional) List of additional Amazon EC2 security group IDs for the master node
additional_slave_security_groups - (Optional) List of additional Amazon EC2 security group IDs for the slave nodes
emr_managed_master_security_group - (Optional) Identifier of the Amazon EC2 security group for the master node
emr_managed_slave_security_group - (Optional) Identifier of the Amazon EC2 security group for the slave nodes
instance_profile - (Optional) Instance profile whose role the EC2 instances of the cluster assume
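A sketch of an ec2_attributes block as it would appear inside an aws_emr_cluster resource, combining the attributes above. The key pair name my-emr-key and the security group aws_security_group.ssh_access are hypothetical. Although the list above describes additional_master_security_groups as a list, some AWS provider versions may expect a comma-separated string instead, so a single ID is used here; check the type for your version.

```yaml
ec2_attributes:
  # Existing EC2 key pair; SSH to the master node as the user hadoop
  key_name: my-emr-key  # hypothetical key pair name
  # VPC subnet to launch into (cc1.4xlarge nodes are not allowed in a VPC)
  subnet_id: '${aws_subnet.main.id}'
  emr_managed_master_security_group: '${aws_security_group.sg.id}'
  emr_managed_slave_security_group: '${aws_security_group.sg.id}'
  # Extra group on the master node, on top of the EMR-managed one (hypothetical)
  additional_master_security_groups: '${aws_security_group.ssh_access.id}'
  instance_profile: '${aws_iam_instance_profile.emr_profile.arn}'
```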
bootstrap_action
name - (Required) Name of the bootstrap action
path - (Required) Location of the script to run during a bootstrap action. Can be either a location in Amazon S3 or on a local file system
args - (Optional) List of command line arguments to pass to the bootstrap action script
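A sketch of a bootstrap_action block that runs a custom script instead of the run-if helper used in the examples. The action name, the S3 bucket and key, and the script arguments are all hypothetical; the args are handed to the script as command line arguments in the order listed.

```yaml
bootstrap_action:
  name: install-deps  # hypothetical action name
  # Script stored in Amazon S3 (hypothetical bucket and key);
  # a path on a local file system is also accepted
  path: 's3://example-bucket/bootstrap/install-deps.sh'
  # Passed to the script as command line arguments, in order
  args:
    - '--region'     # hypothetical flag understood by the script
    - 'us-west-2'
```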
Attributes Reference
The following attributes are exported:
id - The ID of the EMR Cluster
name
release_label
master_instance_type
core_instance_type
core_instance_count
log_uri
applications
ec2_attributes
bootstrap_action
configurations
service_role
visible_to_all_users
tags
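Exported attributes are read with the same interpolation syntax the examples already use. A sketch, assuming this YAML form maps a top-level output key onto Terraform outputs the same way it maps resource and provider; the output names are hypothetical and emr-test-cluster is the resource from the example at the top of this page.

```yaml
output:
  # The cluster ID exported by aws_emr_cluster
  emr_cluster_id:
    value: '${aws_emr_cluster.emr-test-cluster.id}'
  # Any other exported attribute is referenced the same way
  emr_release_label:
    value: '${aws_emr_cluster.emr-test-cluster.release_label}'
```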
Example bootable config
NOTE: This configuration demonstrates a minimal configuration needed to boot an example EMR Cluster. It is not meant to display best practices. Please use at your own risk.
```yaml
provider:
  aws:
    region: us-west-2
resource:
  aws_emr_cluster:
    tf-test-cluster:
      name: emr-test-arn
      release_label: emr-4.6.0
      applications:
        - Spark
      ec2_attributes:
        subnet_id: '${aws_subnet.main.id}'
        emr_managed_master_security_group: '${aws_security_group.allow_all.id}'
        emr_managed_slave_security_group: '${aws_security_group.allow_all.id}'
        instance_profile: '${aws_iam_instance_profile.emr_profile.arn}'
      master_instance_type: m3.xlarge
      core_instance_type: m3.xlarge
      core_instance_count: 1
      tags:
        role: rolename
        dns_zone: env_zone
        env: env
        name: name-env
      bootstrap_action:
        path: 's3://elasticmapreduce/bootstrap-actions/run-if'
        name: runif
        args:
          - instance.isMaster=true
          - 'echo running on master node'
      configurations: test-fixtures/emr_configurations.json
      service_role: '${aws_iam_role.iam_emr_service_role.arn}'
  aws_security_group:
    allow_all:
      name: allow_all
      description: 'Allow all inbound traffic'
      vpc_id: '${aws_vpc.main.id}'
      ingress:
        from_port: 0
        to_port: 0
        protocol: '-1'
        cidr_blocks:
          - 0.0.0.0/0
      egress:
        from_port: 0
        to_port: 0
        protocol: '-1'
        cidr_blocks:
          - 0.0.0.0/0
      depends_on:
        - aws_subnet.main
      lifecycle:
        ignore_changes:
          - ingress
          - egress
      tags:
        name: emr_test
  aws_vpc:
    main:
      cidr_block: 168.31.0.0/16
      enable_dns_hostnames: true
      tags:
        name: emr_test
  aws_subnet:
    main:
      vpc_id: '${aws_vpc.main.id}'
      cidr_block: 168.31.0.0/20
      tags:
        name: emr_test
  aws_internet_gateway:
    gw:
      vpc_id: '${aws_vpc.main.id}'
  aws_route_table:
    r:
      vpc_id: '${aws_vpc.main.id}'
      route:
        cidr_block: 0.0.0.0/0
        gateway_id: '${aws_internet_gateway.gw.id}'
  aws_main_route_table_association:
    a:
      vpc_id: '${aws_vpc.main.id}'
      route_table_id: '${aws_route_table.r.id}'
  aws_iam_role:
    iam_emr_service_role:
      name: iam_emr_service_role
      assume_role_policy: |
        {
          "Version": "2008-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "Service": "elasticmapreduce.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }
  aws_iam_role_policy:
    iam_emr_service_policy:
      name: iam_emr_service_policy
      role: '${aws_iam_role.iam_emr_service_role.id}'
      policy: <
```
See the source of this document at Terraform.io