This series shows how to extract the underlying API request from the Elastic MapReduce Ruby client and use it to build the equivalent command with the AWS CLI. This article looks specifically at running an HBase database.

Elastic MapReduce Ruby Client

Credentials

~/.aws/credentials.json

{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}

Create the job flow

Console - user@hostname ~ $

elastic-mapreduce -v \
--create \
--name "Test HBase" \
--instance-group MASTER \
--instance-count 1 \
--instance-type m1.large \
--instance-group CORE \
--instance-count 1 \
--instance-type m1.large \
--hbase \
--visible-to-all-users \
-c ~/.aws/credentials.json

Output

Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Instances.KeepJobFlowAliveWhenNoSteps=true&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.large&Name=Test%20HBase&Steps.member.1.HadoopJarStep.Jar=%2Fhome%2Fhadoop%2Flib%2Fhbase-0.92.0.jar&Instances.InstanceGroups.member.1.Market=ON_DEMAND&Timestamp=2013-05-16T00%3A57%3A43%2B00%3A00&Instances.InstanceGroups.member.2.Market=ON_DEMAND&VisibleToAllUsers=true&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=true&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT&Steps.member.1.Name=Start%20HBase&Instances.InstanceGroups.member.1.InstanceType=m1.large&BootstrapActions.member.1.Name=Install%20HBase&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=--start-master&Signature=RQotDq%2BHXT1eLgx3axOz7N%2B3p%2FeLNNdd%2B90c1LvO8GM%3D&BootstrapActions.member.1.ScriptBootstrapAction.Path=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Fbootstrap-actions%2Fsetup-hbase&Instances.InstanceGroups.member.2.InstanceCount=1&Action=RunJobFlow&Steps.member.1.HadoopJarStep.Args.member.1=emr.hbase.backup.Main&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestId7ac9124c-5485-46e5-b17e-01b48f664af2Hostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-A7MIJGZMRV9FE

Formatted Output

Output - Requesting URL

https://us-east-1.elasticmapreduce.amazonaws.com/

Output - Parameters

AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
BootstrapActions.member.1.Name=Install HBase
BootstrapActions.member.1.ScriptBootstrapAction.Path=s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.large
Instances.InstanceGroups.member.1.Market=ON_DEMAND
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.InstanceCount=1
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.large
Instances.InstanceGroups.member.2.Market=ON_DEMAND
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=true
Instances.TerminationProtected=true
LogUri=s3n://my-bucket/hadoop/
Name=Test HBase
Signature=RQotDq+HXT1eLgx3axOz7N+3p/eLNNdd+90c1LvO8GM=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.1.HadoopJarStep.Args.member.1=emr.hbase.backup.Main
Steps.member.1.HadoopJarStep.Args.member.2=--start-master
Steps.member.1.HadoopJarStep.Jar=/home/hadoop/lib/hbase-0.92.0.jar
Steps.member.1.Name=Start HBase
Timestamp=2013-05-16T00:57:43+00:00
VisibleToAllUsers=true
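
The formatted parameter list above can be produced from the raw query string mechanically: split on `&`, sort, and URL-decode each pair. A minimal sketch, assuming bash (for `${var//...}` substitution and `printf '%b'` with `\x` escapes); the function name `decode_query` is made up for illustration and is not part of the Ruby client or the AWS CLI:

```shell
#!/usr/bin/env bash
# decode_query: hypothetical helper, not part of the Ruby client or the AWS CLI.
# Prints one decoded "key=value" line per query-string parameter, sorted by key.
decode_query() {
    local query="$1" pair
    # Parameters are separated by '&'; sort them bytewise so the order is stable.
    printf '%s\n' "$query" | tr '&' '\n' | LC_ALL=C sort |
    while IFS= read -r pair; do
        # Rewrite every %XX escape as \xXX, which printf '%b' expands to the byte.
        printf '%b\n' "${pair//%/\\x}"
    done
}
```

Passing the query string from the verbose output as the single argument (in single quotes) should give a list in the same shape as the one above. The sketch assumes the values contain no literal backslashes, which holds for this request.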

Output - Headers

Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: 7ac9124c-5485-46e5-b17e-01b48f664af2

Output - Non-verbose output

Created job flow j-A7MIJGZMRV9FE

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test HBase
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.large
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=ON_DEMAND
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.large
&Instances.InstanceGroups.member.2.InstanceCount=1
&Instances.InstanceGroups.member.2.Market=ON_DEMAND
&Instances.KeepJobFlowAliveWhenNoSteps=true
&Instances.TerminationProtected=true
&BootstrapActions.member.1.Name=Install HBase
&BootstrapActions.member.1.ScriptBootstrapAction.Path=s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase
&Steps.member.1.Name=Start HBase
&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.1.HadoopJarStep.Jar=/home/hadoop/lib/hbase-0.92.0.jar
&Steps.member.1.HadoopJarStep.Args.member.1=emr.hbase.backup.Main
&Steps.member.1.HadoopJarStep.Args.member.2=--start-master
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=true
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
run-job-flow \
--name "Test HBase" \
--instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.large\",
            \"instance_count\": 1,
            \"market\": \"ON_DEMAND\"
        },
        {
            \"name\": \"Core Instance Group\",
            \"instance_role\": \"CORE\",
            \"instance_type\": \"m1.large\",
            \"instance_count\": 1,
            \"market\": \"ON_DEMAND\"
        }
    ],
    \"keep_job_flow_alive_when_no_steps\": true,
    \"termination_protected\": true
}" \
--bootstrap-actions "[
    {
        \"name\": \"Install HBase\",
        \"script_bootstrap_action\": {
            \"path\": \"s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase\",
            \"args\": []
        }
    }
]" \
--steps "[
    {
        \"name\": \"Start HBase\",
        \"action_on_failure\": \"CANCEL_AND_WAIT\",
        \"hadoop_jar_step\": {
            \"jar\": \"/home/hadoop/lib/hbase-0.92.0.jar\",
            \"args\": [
                \"emr.hbase.backup.Main\",
                \"--start-master\"
            ]
        }
    }
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest" \
--visible-to-all-users

Output

{
    "ResponseMetadata": {
        "RequestId": "0efd57f5-bdc4-11e2-86c8-e90ed4422acf"
    },
    "JobFlowId": "j-3DO4DYCP161L6"
}
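
Comparing the API parameters with the JSON passed to the AWS CLI command above, the nested JSON keys appear to be the API names converted from CamelCase to snake_case, with digits split off as their own word (`Ec2KeyName` becomes `ec_2_key_name`). The sketch below only mirrors that pattern as observed in this article's examples; it is not documented CLI behaviour, and the function name `api_to_cli_key` is hypothetical:

```shell
#!/usr/bin/env bash
# api_to_cli_key: hypothetical helper that mirrors the naming pattern seen in
# this article's examples (an observation, not documented CLI behaviour).
api_to_cli_key() {
    printf '%s\n' "$1" |
    # Insert "_" before each capital that follows a lowercase letter or digit,
    # and before each digit that follows a lowercase letter; then lowercase.
    sed -E 's/([a-z0-9])([A-Z])/\1_\2/g; s/([a-z])([0-9])/\1_\2/g' |
    tr '[:upper:]' '[:lower:]'
}
```

For example, `api_to_cli_key KeepJobFlowAliveWhenNoSteps` prints `keep_job_flow_alive_when_no_steps`, matching the key used in the `--instances` argument.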

Describe Cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce --describe j-3DO4DYCP161L6

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=DescribeJobFlows
&JobFlowIds.member.1=j-3DO4DYCP161L6
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]"

Output

{
    "JobFlows": [
        {
            "Name": "Test HBase", 
            "BootstrapActions": [
                {
                    "BootstrapActionConfig": {
                        "ScriptBootstrapAction": {
                            "Path": "s3://us-east-1.elasticmapreduce/bootstrap-actions/setup-hbase", 
                            "Args": []
                        }, 
                        "Name": "Install HBase"
                    }
                }
            ], 
            "Instances": {
                "InstanceCount": 2, 
                "Placement": {
                    "AvailabilityZone": "us-east-1d"
                }, 
                "MasterPublicDnsName": "ec2-23-22-78-188.compute-1.amazonaws.com", 
                "NormalizedInstanceHours": 8, 
                "MasterInstanceId": "i-d6e214b5", 
                "InstanceGroups": [
                    {
                        "ReadyDateTime": "2013-05-16T01:05:04Z", 
                        "InstanceType": "m1.large", 
                        "InstanceRole": "MASTER", 
                        "InstanceRunningCount": 1, 
                        "State": "RUNNING", 
                        "Market": "ON_DEMAND", 
                        "StartDateTime": "2013-05-16T01:03:53Z", 
                        "InstanceGroupId": "ig-MJNQWASUALE9", 
                        "CreationDateTime": "2013-05-16T01:00:53Z", 
                        "InstanceRequestCount": 1, 
                        "LastStateChangeReason": "", 
                        "Name": "Master Instance Group"
                    }, 
                    {
                        "ReadyDateTime": "2013-05-16T01:05:24Z", 
                        "InstanceType": "m1.large", 
                        "InstanceRole": "CORE", 
                        "InstanceRunningCount": 1, 
                        "State": "RUNNING", 
                        "Market": "ON_DEMAND", 
                        "StartDateTime": "2013-05-16T01:05:04Z", 
                        "InstanceGroupId": "ig-1NCBOFBA5UH81", 
                        "CreationDateTime": "2013-05-16T01:00:53Z", 
                        "InstanceRequestCount": 1, 
                        "LastStateChangeReason": "", 
                        "Name": "Core Instance Group"
                    }
                ], 
                "MasterInstanceType": "m1.large", 
                "TerminationProtected": true, 
                "HadoopVersion": "1.0.3", 
                "KeepJobFlowAliveWhenNoSteps": true, 
                "SlaveInstanceType": "m1.large", 
                "Ec2KeyName": "my-key"
            }, 
            "Steps": [
                {
                    "ExecutionStatusDetail": {
                        "State": "COMPLETED", 
                        "EndDateTime": "2013-05-16T01:05:30Z", 
                        "CreationDateTime": "2013-05-16T01:00:53Z", 
                        "StartDateTime": "2013-05-16T01:05:24Z"
                    }, 
                    "StepConfig": {
                        "HadoopJarStep": {
                            "Args": [
                                "emr.hbase.backup.Main", 
                                "--start-master"
                            ], 
                            "Jar": "/home/hadoop/lib/hbase-0.92.0.jar", 
                            "Properties": []
                        }, 
                        "Name": "Start HBase", 
                        "ActionOnFailure": "CANCEL_AND_WAIT"
                    }
                }
            ], 
            "ExecutionStatusDetail": {
                "State": "WAITING", 
                "ReadyDateTime": "2013-05-16T01:05:24Z", 
                "CreationDateTime": "2013-05-16T01:00:53Z", 
                "StartDateTime": "2013-05-16T01:04:37Z", 
                "LastStateChangeReason": "Waiting after step completed"
            }, 
            "VisibleToAllUsers": true, 
            "JobFlowId": "j-3DO4DYCP161L6", 
            "LogUri": "s3n://my-bucket/hadoop/", 
            "AmiVersion": "2.3.5", 
            "SupportedProducts": []
        }
    ], 
    "ResponseMetadata": {
        "RequestId": "56519452-bdc5-11e2-98e0-87efb772a4fa"
    }
}

Connect to Master

Wait until the execution state is WAITING

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]" \
| jq -r '.JobFlows[0].ExecutionStatusDetail.State'

Output

WAITING
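
Rather than re-running the command by hand, the check can be wrapped in a polling loop. A minimal sketch, assuming bash; `wait_for_state` and `job_flow_state` are hypothetical names, and treating COMPLETED, FAILED and TERMINATED as states from which the job flow never reaches WAITING is an assumption made here:

```shell
#!/usr/bin/env bash
# wait_for_state: hypothetical helper, not part of the AWS CLI.
# Usage: wait_for_state WANTED_STATE INTERVAL_SECONDS COMMAND [ARGS...]
# COMMAND must print the current state on stdout each time it is run.
wait_for_state() {
    local want="$1" interval="$2" state
    shift 2
    while true; do
        state="$("$@")" || return 1      # the state command itself failed
        [ "$state" = "$want" ] && return 0
        case "$state" in
            # Assumed terminal states: give up instead of polling forever.
            COMPLETED|FAILED|TERMINATED) return 2 ;;
        esac
        sleep "$interval"
    done
}

# Prints the execution state using the same aws and jq pipeline as above.
job_flow_state() {
    aws --region us-east-1 emr describe-job-flows \
        --job-flow-ids "[\"$1\"]" \
        | jq -r '.JobFlows[0].ExecutionStatusDetail.State'
}

# Example: wait_for_state WAITING 30 job_flow_state j-3DO4DYCP161L6
```

Passing the state command as an argument keeps the loop independent of the AWS CLI, so it can be tried out with a stub command before pointing it at a real job flow.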

Get the master public DNS name

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]" \
| jq -r '.JobFlows[0].Instances.MasterPublicDnsName'

Output

ec2-23-22-78-188.compute-1.amazonaws.com

This hostname is the value to use for hbase.zookeeper.quorum.

Interactive Session

SSH to the master as the user hadoop, using the key pair that was specified when the cluster was started.

Console - user@hostname ~ $

ssh -i ~/.ssh/my-key.pem hadoop@ec2-23-22-78-188.compute-1.amazonaws.com

Run hbase shell on the master for our interactive session.

Console - hadoop@master ~ $

hbase shell

Hive

We can point Hive at this HBase database by setting the ZooKeeper quorum to the master's public DNS name:

hive>

set hbase.zookeeper.quorum=ec2-23-22-78-188.compute-1.amazonaws.com;

An external Hive table mapped onto an HBase table might look something like this:

hive>

CREATE EXTERNAL TABLE IF NOT EXISTS hive_table
(
key STRING,
value STRING
)
STORED BY
    'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH
    SERDEPROPERTIES ("hbase.columns.mapping" = ":key,columnFamily:payloadColumn")
    TBLPROPERTIES("hbase.table.name" = "hbase_table")
;

Turn off Termination Protection

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce \
--set-termination-protection false \
--jobflow j-3DO4DYCP161L6

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=SetTerminationProtection
&JobFlowIds.member.1=j-3DO4DYCP161L6
&TerminationProtection=false
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
set-termination-protection \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]" \
--no-termination-protected

Output

{
    "ResponseMetadata": {
        "RequestId": "5e9a10ea-bdc8-11e2-8c94-dd88e19c52e2"
    }
}

Terminate Cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce --terminate j-3DO4DYCP161L6

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=TerminateJobFlows
&JobFlowIds.member.1=j-3DO4DYCP161L6
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
terminate-job-flows \
--job-flow-ids "[\"j-3DO4DYCP161L6\"]"

Output

{
    "ResponseMetadata": {
        "RequestId": "80387a64-bdc8-11e2-b20d-41ab66df62e3"
    }
}
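
Because this cluster was started with termination protection on, the two AWS CLI calls from the last two sections are needed together to shut it down. A minimal sketch that chains them, assuming bash; the function name `terminate_protected_job_flow` and the `AWS_CMD` override are hypothetical conveniences introduced here, the latter only so the calls can be dry-run:

```shell
#!/usr/bin/env bash
# terminate_protected_job_flow: hypothetical helper combining the two AWS CLI
# calls above. AWS_CMD is an assumed override so the calls can be dry-run
# (for example AWS_CMD=echo) instead of being sent to AWS.
terminate_protected_job_flow() {
    local job_flow_id="$1"
    local aws_cmd="${AWS_CMD:-aws}"
    local ids="[\"$job_flow_id\"]"

    # Turn termination protection off first, as in the section above;
    # stop here if that request fails.
    "$aws_cmd" --region us-east-1 emr set-termination-protection \
        --job-flow-ids "$ids" --no-termination-protected || return 1

    # Then send the terminate request.
    "$aws_cmd" --region us-east-1 emr terminate-job-flows \
        --job-flow-ids "$ids"
}
```

Running `AWS_CMD=echo terminate_protected_job_flow j-3DO4DYCP161L6` prints the two argument lists without contacting AWS, which is a cheap way to check the quoting of the job flow ID.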
