HowTo: AWS CLI Elastic MapReduce - HBase
In this series, we see how to extract the request information from the Elastic MapReduce ruby client and use it to build the equivalent command with the AWS CLI tool. In this article, we look specifically at running an HBase database.
Elastic MapReduce Ruby Client
Credentials
~/.aws/credentials.json
{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}
Create the job flow
Console - user@hostname ~ $
elastic-mapreduce -v \
--create \
--name "Test HBase" \
--instance-group MASTER \
--instance-count 1 \
--instance-type m1.large \
--instance-group CORE \
--instance-count 1 \
--instance-type m1.large \
--hbase \
--visible-to-all-users \
-c ~/.aws/credentials.json
Output
Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Instances.KeepJobFlowAliveWhenNoSteps=true&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.large&Name=Test%20HBase&Steps.member.1.HadoopJarStep.Jar=%2Fhome%2Fhadoop%2Flib%2Fhbase-0.92.0.jar&Instances.InstanceGroups.member.1.Market=ON_DEMAND&Timestamp=2013-05-16T00%3A57%3A43%2B00%3A00&Instances.InstanceGroups.member.2.Market=ON_DEMAND&VisibleToAllUsers=true&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=true&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT&Steps.member.1.Name=Start%20HBase&Instances.InstanceGroups.member.1.InstanceType=m1.large&BootstrapActions.member.1.Name=Install%20HBase&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=--start-master&Signature=RQotDq%2BHXT1eLgx3axOz7N%2B3p%2FeLNNdd%2B90c1LvO8GM%3D&BootstrapActions.member.1.ScriptBootstrapAction.Path=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Fbootstrap-actions%2Fsetup-hbase&Instances.InstanceGroups.member.2.InstanceCount=1&Action=RunJobFlow&Steps.member.1.HadoopJarStep.Args.member.1=emr.hbase.backup.Main&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestId7ac9124c-5485-46e5-b17e-01b48f664af2Hostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-A7MIJGZMRV9FE
Formatted Output
Output - Requesting URL
https://us-east-1.elasticmapreduce.amazonaws.com/
Output - Parameters
AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
BootstrapActions.member.1.Name=Install HBase
BootstrapActions.member.1.ScriptBootstrapAction.Path=s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.large
Instances.InstanceGroups.member.1.Market=ON_DEMAND
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.InstanceCount=1
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.large
Instances.InstanceGroups.member.2.Market=ON_DEMAND
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=true
Instances.TerminationProtected=true
LogUri=s3n://my-bucket/hadoop/
Name=Test HBase
Signature=RQotDq+HXT1eLgx3axOz7N+3p/eLNNdd+90c1LvO8GM=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.1.HadoopJarStep.Args.member.1=emr.hbase.backup.Main
Steps.member.1.HadoopJarStep.Args.member.2=--start-master
Steps.member.1.HadoopJarStep.Jar=/home/hadoop/lib/hbase-0.92.0.jar
Steps.member.1.Name=Start HBase
Timestamp=2013-05-16T00:57:43+00:00
VisibleToAllUsers=true
Output - Headers
Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: 7ac9124c-5485-46e5-b17e-01b48f664af2
Output - Non-verbose output
Created job flow j-A7MIJGZMRV9FE
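The formatted parameter list above can be produced from the raw query string mechanically. The following is a minimal sketch, assuming bash and a query string that contains no literal backslashes; the function names urldecode and format_query are made up for this example and are not part of either client.

```shell
#!/usr/bin/env bash
# Split an AWS query string into one decoded "key=value" line per parameter,
# sorted by key, similar to the "Output - Parameters" listing.

# urldecode: turn %XX escapes back into characters.
# "+" is left untouched because the ruby client encodes spaces as %20.
# Assumes the input holds no literal backslashes (they would be %5C).
urldecode() {
  printf '%b\n' "${1//%/\\x}"
}

# format_query: print the parameters of a query string, sorted and decoded.
format_query() {
  local query="$1" pair
  printf '%s\n' "$query" | tr '&' '\n' | LC_ALL=C sort | while IFS= read -r pair; do
    urldecode "$pair"
  done
}
```

For example, format_query 'Name=Test%20HBase&Action=RunJobFlow' prints the Action line first and then "Name=Test HBase".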
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test HBase
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.large
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=ON_DEMAND
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.large
&Instances.InstanceGroups.member.2.InstanceCount=1
&Instances.InstanceGroups.member.2.Market=ON_DEMAND
&Instances.KeepJobFlowAliveWhenNoSteps=true
&Instances.TerminationProtected=true
&BootstrapActions.member.1.Name=Install HBase
&BootstrapActions.member.1.ScriptBootstrapAction.Path=s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase
&Steps.member.1.Name=Start HBase
&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.1.HadoopJarStep.Jar=/home/hadoop/lib/hbase-0.92.0.jar
&Steps.member.1.HadoopJarStep.Args.member.1=emr.hbase.backup.Main
&Steps.member.1.HadoopJarStep.Args.member.2=--start-master
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=true
&*AUTHPARAMS*
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
run-job-flow \
--name "Test HBase" \
--instances "{
\"ec_2_key_name\": \"my-key\",
\"instance_groups\": [
{
\"name\": \"Master Instance Group\",
\"instance_role\": \"MASTER\",
\"instance_type\": \"m1.large\",
\"instance_count\": 1,
\"market\": \"ON_DEMAND\"
},
{
\"name\": \"Core Instance Group\",
\"instance_role\": \"CORE\",
\"instance_type\": \"m1.large\",
\"instance_count\": 1,
\"market\": \"ON_DEMAND\"
}
],
\"keep_job_flow_alive_when_no_steps\": true,
\"termination_protected\": true
}" \
--bootstrap-actions "[
{
\"name\": \"Install HBase\",
\"script_bootstrap_action\": {
\"path\": \"s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase\",
\"args\": []
}
}
]" \
--steps "[
{
\"name\": \"Start HBase\",
\"action_on_failure\": \"CANCEL_AND_WAIT\",
\"hadoop_jar_step\": {
\"jar\": \"/home/hadoop/lib/hbase-0.92.0.jar\",
\"args\": [
\"emr.hbase.backup.Main\",
\"--start-master\"
]
}
}
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest" \
--visible-to-all-users
Output
{
"ResponseMetadata": {
"RequestId": "0efd57f5-bdc4-11e2-86c8-e90ed4422acf"
},
"JobFlowId": "j-3DO4DYCP161L6"
}
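The backslash-escaped quotes in the --instances argument above are easy to get wrong. One way to avoid them is to generate the JSON from a here-document. This is only a sketch: the function name instances_json and its three arguments are hypothetical, and the JSON keys are copied from the command above.

```shell
#!/usr/bin/env bash
# instances_json: print the --instances document for one MASTER group and
# one CORE group, without hand-escaped quotes.
# Arguments (hypothetical, for this sketch): key name, instance type, core count.
instances_json() {
  local key_name="$1" instance_type="$2" core_count="$3"
  cat <<EOF
{
  "ec_2_key_name": "${key_name}",
  "instance_groups": [
    {
      "name": "Master Instance Group",
      "instance_role": "MASTER",
      "instance_type": "${instance_type}",
      "instance_count": 1,
      "market": "ON_DEMAND"
    },
    {
      "name": "Core Instance Group",
      "instance_role": "CORE",
      "instance_type": "${instance_type}",
      "instance_count": ${core_count},
      "market": "ON_DEMAND"
    }
  ],
  "keep_job_flow_alive_when_no_steps": true,
  "termination_protected": true
}
EOF
}
```

The result can then be passed as --instances "$(instances_json my-key m1.large 1)" in place of the inline string.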
Describe Cluster
Elastic MapReduce Ruby Client
Console - user@hostname ~ $
elastic-mapreduce --describe j-3DO4DYCP161L6
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=DescribeJobFlows
&JobFlowIds.member.1=j-3DO4DYCP161L6
&*AUTHPARAMS*
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]"
Output
{
"JobFlows": [
{
"Name": "Test HBase",
"BootstrapActions": [
{
"BootstrapActionConfig": {
"ScriptBootstrapAction": {
"Path": "s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase",
"Args": []
},
"Name": "Install HBase"
}
}
],
"Instances": {
"InstanceCount": 2,
"Placement": {
"AvailabilityZone": "us-east-1d"
},
"MasterPublicDnsName": "ec2-23-22-78-188.compute-1.amazonaws.com",
"NormalizedInstanceHours": 8,
"MasterInstanceId": "i-d6e214b5",
"InstanceGroups": [
{
"ReadyDateTime": "2013-05-16T01:05:04Z",
"InstanceType": "m1.large",
"InstanceRole": "MASTER",
"InstanceRunningCount": 1,
"State": "RUNNING",
"Market": "ON_DEMAND",
"StartDateTime": "2013-05-16T01:03:53Z",
"InstanceGroupId": "ig-MJNQWASUALE9",
"CreationDateTime": "2013-05-16T01:00:53Z",
"InstanceRequestCount": 1,
"LastStateChangeReason": "",
"Name": "Master Instance Group"
},
{
"ReadyDateTime": "2013-05-16T01:05:24Z",
"InstanceType": "m1.large",
"InstanceRole": "CORE",
"InstanceRunningCount": 1,
"State": "RUNNING",
"Market": "ON_DEMAND",
"StartDateTime": "2013-05-16T01:05:04Z",
"InstanceGroupId": "ig-1NCBOFBA5UH81",
"CreationDateTime": "2013-05-16T01:00:53Z",
"InstanceRequestCount": 1,
"LastStateChangeReason": "",
"Name": "Core Instance Group"
}
],
"MasterInstanceType": "m1.large",
"TerminationProtected": true,
"HadoopVersion": "1.0.3",
"KeepJobFlowAliveWhenNoSteps": true,
"SlaveInstanceType": "m1.large",
"Ec2KeyName": "my-key"
},
"Steps": [
{
"ExecutionStatusDetail": {
"State": "COMPLETED",
"EndDateTime": "2013-05-16T01:05:30Z",
"CreationDateTime": "2013-05-16T01:00:53Z",
"StartDateTime": "2013-05-16T01:05:24Z"
},
"StepConfig": {
"HadoopJarStep": {
"Args": [
"emr.hbase.backup.Main",
"--start-master"
],
"Jar": "/home/hadoop/lib/hbase-0.92.0.jar",
"Properties": []
},
"Name": "Start HBase",
"ActionOnFailure": "CANCEL_AND_WAIT"
}
}
],
"ExecutionStatusDetail": {
"State": "WAITING",
"ReadyDateTime": "2013-05-16T01:05:24Z",
"CreationDateTime": "2013-05-16T01:00:53Z",
"StartDateTime": "2013-05-16T01:04:37Z",
"LastStateChangeReason": "Waiting after step completed"
},
"VisibleToAllUsers": true,
"JobFlowId": "j-3DO4DYCP161L6",
"LogUri": "s3n://my-bucket/hadoop/",
"AmiVersion": "2.3.5",
"SupportedProducts": []
}
],
"ResponseMetadata": {
"RequestId": "56519452-bdc5-11e2-98e0-87efb772a4fa"
}
}
Connect to Master
Wait until the execution state is WAITING
Console - user@hostname ~ $
aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]" \
| jq -r '.JobFlows[0].ExecutionStatusDetail.State'
Output
WAITING
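Rather than re-running the command by hand, the check can be wrapped in a polling loop. The sketch below assumes bash and jq; the function names get_state and wait_for_waiting and the POLL_SECONDS variable are made up for this example, and the list of terminal states is our assumption about which states mean the cluster will never reach WAITING.

```shell
#!/usr/bin/env bash
# get_state: print the job flow's execution state, using the command above.
get_state() {
  aws --region us-east-1 emr \
    describe-job-flows \
    --job-flow-ids "[\"$1\"]" \
    | jq -r '.JobFlows[0].ExecutionStatusDetail.State'
}

# wait_for_waiting: poll get_state until the state is WAITING.
# Returns 0 on WAITING, 1 on a terminal state, 2 if no state could be read.
# POLL_SECONDS is a hypothetical knob for this sketch (default: 30).
wait_for_waiting() {
  local job_flow_id="$1" state
  while true; do
    state="$(get_state "$job_flow_id")"
    echo "state: $state" >&2
    case "$state" in
      WAITING) return 0 ;;
      SHUTTING_DOWN|TERMINATED|COMPLETED|FAILED) return 1 ;;
      ""|null) return 2 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done
}
```

Calling wait_for_waiting j-3DO4DYCP161L6 then blocks until the cluster is ready for the steps below.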
Get the master public DNS name
Console - user@hostname ~ $
aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]" \
| jq -r '.JobFlows[0].Instances.MasterPublicDnsName'
Output
ec2-23-22-78-188.compute-1.amazonaws.com
This is the hbase.zookeeper.quorum.
Interactive Session
SSH to the master with the username hadoop, using the SSH key specified when starting the cluster.
Console - user@hostname ~ $
ssh -i ~/.ssh/my-key.pem hadoop@ec2-23-22-78-188.compute-1.amazonaws.com
Run hbase shell on the master for our interactive session.
Console - hadoop@master ~ $
hbase shell
Hive
We can point Hive at this HBase database like so:
hive>
set hbase.zookeeper.quorum=ec2-23-22-78-188.compute-1.amazonaws.com;
The table definition may look something like this:
hive>
CREATE EXTERNAL TABLE IF NOT EXISTS hive_table
(
key STRING,
value STRING
)
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH
SERDEPROPERTIES ("hbase.columns.mapping" = ":key,columnFamily:payloadColumn")
TBLPROPERTIES("hbase.table.name" = "hbase_table")
;
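Because the Hive table is EXTERNAL, the HBase table it maps to has to exist already, with the column family named in hbase.columns.mapping. The sketch below prints HBase shell commands that would create it and add one row; the function name hbase_setup_script, the row key row1 and the value hello are made-up sample data, not taken from the original cluster.

```shell
#!/usr/bin/env bash
# hbase_setup_script: print HBase shell commands that create the table and
# column family referenced by the Hive mapping, insert one sample row,
# and scan the table to confirm the row is there.
hbase_setup_script() {
  cat <<'EOF'
create 'hbase_table', 'columnFamily'
put 'hbase_table', 'row1', 'columnFamily:payloadColumn', 'hello'
scan 'hbase_table'
EOF
}
```

On the master, the commands can be typed into the interactive hbase shell, or piped in with hbase_setup_script | hbase shell, assuming the installed shell reads commands from standard input.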
Turn off Termination Protection
Elastic MapReduce Ruby Client
Console - user@hostname ~ $
elastic-mapreduce \
--set-termination-protection false \
--jobflow j-3DO4DYCP161L6
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=SetTerminationProtection
&JobFlowIds.member.1=j-3DO4DYCP161L6
&TerminationProtection=false
&*AUTHPARAMS*
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
set-termination-protection \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]" \
--no-termination-protected
Output
{
"ResponseMetadata": {
"RequestId": "5e9a10ea-bdc8-11e2-8c94-dd88e19c52e2"
}
}
Terminate Cluster
Elastic MapReduce Ruby Client
Console - user@hostname ~ $
elastic-mapreduce --terminate j-3DO4DYCP161L6
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=TerminateJobFlows
&JobFlowIds.member.1=j-3DO4DYCP161L6
&*AUTHPARAMS*
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
terminate-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]"
Output
{
"ResponseMetadata": {
"RequestId": "80387a64-bdc8-11e2-b20d-41ab66df62e3"
}
}
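Since the cluster was started with termination protection on, the two AWS CLI calls above have to run in that order. A minimal sketch that chains them, assuming bash; the function name terminate_job_flow is made up for this example.

```shell
#!/usr/bin/env bash
# terminate_job_flow: turn off termination protection, then terminate.
# The terminate call only runs if the first call succeeds.
terminate_job_flow() {
  local job_flow_id="$1"
  aws --region us-east-1 emr \
    set-termination-protection \
    --job-flow-ids "[\"${job_flow_id}\"]" \
    --no-termination-protected \
  && aws --region us-east-1 emr \
    terminate-job-flows \
    --job-flow-ids "[\"${job_flow_id}\"]"
}
```

For the cluster in this article, the call would be terminate_job_flow j-3DO4DYCP161L6.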
Resources
Parts in this series
- HowTo: AWS CLI Elastic MapReduce
- HowTo: AWS CLI Elastic MapReduce - Hive Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Hive
- HowTo: AWS CLI Elastic MapReduce - Pig Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Pig
- HowTo: AWS CLI Elastic MapReduce - Streaming Job Flow
- HowTo: AWS CLI Elastic MapReduce - Cascading Job Flow
- HowTo: AWS CLI Elastic MapReduce - Custom JAR Job Flow
- HowTo: AWS CLI Elastic MapReduce - HBase