Published: 2018-01-30 16:26:29
1. What is HBase
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase you can build large-scale structured storage clusters on inexpensive commodity PC servers.
HBase is designed to store and process very large data sets; more concretely, it can handle tables made up of huge numbers of rows and columns using only ordinary hardware.
HBase is an open-source implementation of Google Bigtable, although there are many differences. For example: Google Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce to process the massive data it stores; Google Bigtable uses Chubby as its coordination service, while HBase uses ZooKeeper for the same purpose.
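To make "column-oriented" concrete, here is a minimal hbase shell sketch; the table name 'user', the column family 'info', and the column qualifiers are hypothetical examples, not part of the course material. Every cell is addressed by row key, column family, column qualifier, and timestamp, and columns can differ from row to row without any schema change:

hbase(main):001:0> create 'user', 'info'                     # table with a single column family
hbase(main):002:0> put 'user', 'row1', 'info:name', 'tom'
hbase(main):003:0> put 'user', 'row1', 'info:age', '20'
hbase(main):004:0> put 'user', 'row2', 'info:name', 'jerry'  # row2 simply has fewer columns than row1
hbase(main):005:0> get 'user', 'row1'                        # returns the cells of row1
hbase(main):006:0> scan 'user'                               # scans rows in row-key order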

The figure above shows the layers of the Hadoop ecosystem. HBase sits in the structured-storage layer: Hadoop HDFS provides HBase with highly reliable underlying storage, Hadoop MapReduce provides HBase with high-performance computation, and ZooKeeper provides HBase with a stable coordination service and a failover mechanism. In addition, Pig and Hive offer high-level language support for HBase, which makes statistical processing of data in HBase very simple, and Sqoop provides convenient RDBMS data import, making it easy to migrate data from traditional databases into HBase.

2. Comparison with traditional databases
1. Problems traditional databases run into:
1) When the data volume becomes very large, the data can no longer be stored.
2) There is no good backup mechanism.
3) Once the data reaches a certain size, access slows down, and at very large sizes the database essentially cannot cope.
2. Advantages of HBase:
1) Linear scalability: as the data volume grows, capacity can be added by adding nodes.
2) Data is stored on HDFS, so the backup mechanism is solid.
3) Data is located through ZooKeeper coordination, so access is fast.

3. Roles in an HBase cluster
1. One or more master nodes: HMaster.
2. Multiple slave nodes: HRegionServer.
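These roles can be seen on a running cluster. The sketch below is only illustrative: the host names match the example cluster built later in this course, and the exact process list depends on your deployment. jps shows which daemon runs where, and the status command in the hbase shell reports the master and region servers:

[hadoop@hdp08 ~]$ jps            # on a master node, expect an HMaster process
[hadoop@hdp09 ~]$ jps            # on a slave node, expect an HRegionServer process
[hadoop@hdp08 ~]$ hbase shell
hbase(main):001:0> status        # reports the active master, backup masters and region servers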
II. Installing and deploying HBase
Install HBase on machines that already run Hadoop and ZooKeeper. This course uses three machines, hdp08, hdp09 and hdp10, as the example.
1. On hdp08, hdp09 and hdp10, install JDK, Hadoop and ZooKeeper and set the corresponding environment variables.
2. Download a stable HBase release from the Apache website (http://hbase.apache.org/). This course uses hbase-1.1.12-bin.tar.gz as the example; upload hbase-1.1.12-bin.tar.gz to hdp08.
3. Extract hbase-1.1.12-bin.tar.gz into the /home/hadoop/apps directory:
[hadoop@hdp10 ~]$ tar zxvf hbase-1.1.12-bin.tar.gz -C apps
4. Rename the extracted directory to hbase:
[hadoop@hdp10 apps]$ mv hbase-1.1.12 hbase
5. Set the environment variables (HADOOP_HOME is assumed to have been defined already when Hadoop was installed in step 1):
[root@hdp10 apps]# vi /etc/profile
JAVA_HOME=/opt/jdk1.8.0_121
ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper
HBASE_HOME=/home/hadoop/apps/hbase
PATH=$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HBASE_HOME ZOOKEEPER_HOME HADOOP_HOME JAVA_HOME PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
[root@hdp10 apps]# source /etc/profile
6. Verify that the hbase command is on the PATH:
[hadoop@hdp10 apps]$ hbase version
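Before going further it is worth confirming that the prerequisites from step 1 are actually running on each node. A hedged sketch follows; the process names are the usual Hadoop and ZooKeeper daemon names, and your output will differ in detail depending on how HDFS and ZooKeeper are laid out across the three machines:

[hadoop@hdp08 ~]$ jps                     # expect NameNode/DataNode (per your HDFS layout) and QuorumPeerMain
[hadoop@hdp08 ~]$ zkServer.sh status      # each ZooKeeper node should report Mode: leader or Mode: follower
[hadoop@hdp08 ~]$ hdfs dfsadmin -report   # HDFS should list the live datanodes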
7. Edit $HBASE_HOME/conf/hbase-env.sh
[hadoop@hdp08 conf]$ vi hbase-env.sh
The file ships with a long Apache license header and many commented-out defaults (heap size, GC logging, JMX, the regionservers/backup-masters file locations, log and pid directories, and so on); those can be left as they are. The settings to change are:
# The java implementation to use.
export JAVA_HOME=/opt/jdk1.8.0_121/
# The shipped "Extra Java CLASSPATH elements" line is not suitable here; change it to the following form:
#export HBASE_CLASSPATH=/home/hadoop/hbase/conf
export JAVA_CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Extra Java runtime options.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
# We use the existing external ZooKeeper ensemble, so:
export HBASE_MANAGES_ZK=false
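Because HBASE_MANAGES_ZK=false tells HBase to rely on the external ZooKeeper ensemble, it helps to confirm that the ensemble is reachable from the HBase nodes before starting anything. A hedged sketch using the standard ZooKeeper client; the znode listing shown is only what a fresh ensemble typically contains:

[hadoop@hdp08 ~]$ zkCli.sh -server hdp08:2181,hdp09:2181,hdp10:2181
[zk: hdp08:2181,hdp09:2181,hdp10:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: hdp08:2181,hdp09:2181,hdp10:2181(CONNECTED) 1] quit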
8. Edit hbase-site.xml
[hadoop@hdp08 conf]$ vi hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdp08:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hdp08:2181,hdp09:2181,hdp10:2181</value>
  </property>
</configuration>
Note: the value of hbase.zookeeper.quorum must list the machines that run ZooKeeper.
9. Edit regionservers
[hadoop@hdp08 conf]$ vi regionservers
hdp09
hdp10
10. Copy the hbase directory from hdp08 to hdp09 and hdp10:
[hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp09:/home/hadoop/apps
[hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp10:/home/hadoop/apps
11. Start HBase:
[hadoop@hdp08 apps]$ start-hbase.sh
To start the daemons on the local machine only:
[hadoop@hdp08 bin]$ hbase-daemon.sh start master
[hadoop@hdp08 bin]$ hbase-daemon.sh start regionserver
12. Verify the startup: run jps on each Hadoop node to check the process status.
13. Check the status page:
http://192.168.195.138:16010/
III. Configuring multiple HMasters
Create a new file named backup-masters in the $HBASE_HOME/conf/ directory and add the hostname of each node that should act as a backup master, for example:
[hadoop@hdp08 conf]$ vi backup-masters
hdp09
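After distributing backup-masters to the nodes and restarting the cluster, you can check that the backup master actually takes over. This is a hedged sketch of one way to verify failover; the port 16010 web UI and the hbase-daemon.sh command are the same ones used above, and the exact takeover time depends on ZooKeeper session timeouts:

# Stop the active master on hdp08 and watch the backup on hdp09 take over.
[hadoop@hdp08 ~]$ hbase-daemon.sh stop master
# Once the ZooKeeper session for hdp08 expires, hdp09's master becomes active;
# its web UI is then reachable at http://hdp09:16010/
[hadoop@hdp09 ~]$ jps      # the HMaster process here is now the active master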
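Finally, a quick smoke test from the hbase shell confirms that the cluster can create tables and serve reads and writes. This is a minimal sketch; the table name 't1' and column family 'f1' are arbitrary examples:

[hadoop@hdp08 ~]$ hbase shell
hbase(main):001:0> status                      # should list 1 active master, the backup master(s) and 2 region servers
hbase(main):002:0> create 't1', 'f1'
hbase(main):003:0> put 't1', 'r1', 'f1:c1', 'hello'
hbase(main):004:0> scan 't1'                   # should return one row, r1, with cell f1:c1=hello
hbase(main):005:0> disable 't1'
hbase(main):006:0> drop 't1'                   # clean up the test table
hbase(main):007:0> exit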