Terminology
Service
Service refers to services in the Hadoop stack. HDFS, HBase, and Pig are
examples of services. A service may have multiple components (e.g., HDFS has
NameNode, Secondary NameNode, DataNode, etc.). A service can also be just a client
library (e.g., Pig has no daemon processes, only a client library).
Component
A service consists of one or more components. For example, HDFS has 3
components: NameNode, DataNode and Secondary NameNode. Components may
be optional. A component may span multiple nodes (e.g., DataNode instances on
multiple nodes).
Node/Host
Node refers to a machine in the cluster. Node and host are used interchangeably
in this document.
Node-Component
Node-component refers to an instance of a component on a particular node. For
example, a particular DataNode instance on a particular node is a node-component.
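The relationship between services, components, nodes, and node-components can be sketched as a small data model. This is only an illustration: the class names, field names, and node names n1 and n2 are assumptions made for the example, not definitions from this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class NodeComponent:
    """One instance of a component on one node (hypothetical model)."""
    service: str
    component: str
    node: str


@dataclass
class Component:
    name: str
    nodes: List[str] = field(default_factory=list)  # a component may span nodes
    optional: bool = False


@dataclass
class Service:
    name: str
    components: List[Component] = field(default_factory=list)


def node_components(service: Service) -> List[NodeComponent]:
    """Expand a service into one node-component per (component, node) pair."""
    return [
        NodeComponent(service.name, component.name, node)
        for component in service.components
        for node in component.nodes
    ]


# Example layout (assumed): DataNode runs on two nodes.
hdfs = Service("HDFS", [
    Component("NameNode", ["n1"]),
    Component("Secondary NameNode", ["n2"]),
    Component("DataNode", ["n1", "n2"]),
])
instances = node_components(hdfs)
```

Here the DataNode component spans two nodes, so it contributes two node-components, while NameNode and Secondary NameNode contribute one each.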
Operation
An operation refers to a set of changes or actions performed on a cluster to satisfy
a user request or to achieve a desirable state change in the cluster. For example,
starting a service is an operation, and running a smoke test is an operation. If a
user requests to add a new service to the cluster and that request includes running
a smoke test, then the entire set of actions needed to meet the request constitutes
one operation. An operation can consist of multiple ordered “actions” (see
below).
Task
A task is the unit of work that is sent to a node to execute. A task is the work that
a node has to carry out as part of an action. For example, an “action” can consist of
installing a DataNode on node n1 and installing a DataNode and a Secondary
NameNode on node n2. In this case, the “task” for n1 is to install a DataNode,
and the “tasks” for n2 are to install both a DataNode and a Secondary NameNode.
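The n1/n2 example above can be sketched by grouping the work in an action by node. The triple layout (node, component, command) and the command name INSTALL are assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical representation of one action as (node, component, command) triples.
action = [
    ("n1", "DataNode", "INSTALL"),
    ("n2", "DataNode", "INSTALL"),
    ("n2", "Secondary NameNode", "INSTALL"),
]


def tasks_by_node(action):
    """Group the work in an action into the tasks each node must carry out."""
    grouped = defaultdict(list)
    for node, component, command in action:
        grouped[node].append((component, command))
    return dict(grouped)


tasks = tasks_by_node(action)
```

With this input, n1 receives one task and n2 receives two, matching the example in the text.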
Stage
A stage refers to a set of tasks that are required to complete an operation and are
independent of each other; all tasks in the same stage can be run across different
nodes in parallel.
Action
An ‘action’ consists of a task or tasks on a machine or a group of machines. Each
action is tracked by an action id and nodes report the status at least at the
granularity of the action. An action can be considered a stage under execution. In
this document a stage and an action have one-to-one correspondence unless
specified otherwise. An action id maps one-to-one to a (request-id, stage-id) pair.
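One simple way to obtain an action id that corresponds one-to-one with a (request-id, stage-id) pair is to join the two ids with a separator. The sketch below assumes both ids are non-negative integers and that the id is a string of the form "request-stage"; neither assumption comes from this document.

```python
def make_action_id(request_id: int, stage_id: int) -> str:
    """Encode (request_id, stage_id) as a string; assumes non-negative integers."""
    if request_id < 0 or stage_id < 0:
        raise ValueError("ids must be non-negative")
    return f"{request_id}-{stage_id}"


def parse_action_id(action_id: str):
    """Inverse of make_action_id: recover (request_id, stage_id)."""
    request_id, stage_id = action_id.split("-")
    return int(request_id), int(stage_id)
```

Because the separator cannot appear inside either number, distinct pairs always yield distinct ids, and every id decodes back to exactly one pair.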
Stage Plan
An operation typically consists of multiple tasks on various machines and they
usually have dependencies requiring them to run in a particular order. Some tasks
are required to complete before others can be scheduled. Therefore, the tasks
required for an operation can be divided into stages, where each stage must be
completed before the next stage begins, but all the tasks in the same stage can be
scheduled in parallel across different nodes.
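Dividing tasks into stages can be sketched as layering a dependency graph: each stage holds the tasks whose prerequisites have all completed in earlier stages. The function name, the task labels, and the dependencies in the example are hypothetical; this is a minimal illustration, not the planner described by this document.

```python
def build_stage_plan(deps):
    """Split tasks into ordered stages.

    deps maps each task to the set of tasks that must complete before it.
    Returns a list of stages; tasks within one stage do not depend on
    each other and could run in parallel.
    """
    remaining = {task: set(required) for task, required in deps.items()}
    # Tasks that appear only as prerequisites have no prerequisites themselves.
    for required in list(remaining.values()):
        for task in required:
            remaining.setdefault(task, set())

    stages = []
    while remaining:
        ready = sorted(task for task, required in remaining.items() if not required)
        if not ready:
            raise ValueError("cyclic dependency among tasks")
        stages.append(ready)
        for task in ready:
            del remaining[task]
        for required in remaining.values():
            required.difference_update(ready)
    return stages


# Assumed ordering for illustration: NameNode first, then DataNodes, then a smoke test.
deps = {
    "start NameNode@n1": set(),
    "start DataNode@n1": {"start NameNode@n1"},
    "start DataNode@n2": {"start NameNode@n1"},
    "smoke test@n1": {"start DataNode@n1", "start DataNode@n2"},
}
plan = build_stage_plan(deps)
```

For this input the plan has three stages, and the two DataNode tasks share the middle stage because neither depends on the other.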
Manifest
Manifest refers to the definition of a task which is sent to a node for execution. The
manifest must completely define the task and must be serializable. A manifest can
also be persisted on disk for recovery or record-keeping.
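The serializability requirement can be illustrated with a manifest that round-trips through JSON. The field names and the choice of JSON are assumptions for this sketch; the document does not specify a manifest schema or wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Manifest:
    """Hypothetical task definition sent to a node."""
    action_id: str
    node: str
    component: str
    command: str
    params: dict


def to_json(manifest: Manifest) -> str:
    """Serialize a manifest so it can be sent to a node or written to disk."""
    return json.dumps(asdict(manifest), sort_keys=True)


def from_json(text: str) -> Manifest:
    """Rebuild a manifest from its serialized form, e.g. during recovery."""
    return Manifest(**json.loads(text))


original = Manifest(
    action_id="7-3",
    node="n1",
    component="DataNode",
    command="INSTALL",
    params={"example_key": "example_value"},
)
restored = from_json(to_json(original))
```

Because the serialized string carries every field, the same text can be written to a file and read back later to recover the task definition.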
Role
A role maps to either a component (e.g., NameNode, DataNode) or an action
(e.g., HDFS rebalancing, an HBase smoke test, or other admin commands).