Published: 2018-01-30 16:26:29
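1. What is HBase
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on inexpensive commodity PC servers.
HBase is designed to store and process large data sets. More specifically, it can handle large tables made up of many thousands of rows and columns using only ordinary hardware.
HBase is an open-source implementation of Google Bigtable, with a number of differences. For example: Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data held in Bigtable, and HBase likewise uses Hadoop MapReduce; Bigtable uses Chubby as its coordination service, and HBase uses ZooKeeper in the same role.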

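The figure above shows the layers of the Hadoop ecosystem. HBase sits in the structured storage layer: Hadoop HDFS provides HBase with highly reliable underlying storage, Hadoop MapReduce provides high-performance computation, and ZooKeeper provides stable coordination services and a failover mechanism.
In addition, Pig and Hive give HBase high-level language support, which makes statistical processing of HBase data very simple. Sqoop provides convenient RDBMS import, which makes it easy to migrate data from a traditional database into HBase.

2. Comparison with traditional databases
Problems that traditional databases run into:
1) Very large data volumes cannot be stored.
2) There is no good backup mechanism.
3) Queries slow down once the data reaches a certain size, and very large data sets are essentially unsupportable.

Advantages of HBase:
1) Linear scaling: as the data volume grows, it can be supported by adding nodes.
2) Data is stored on HDFS, which has a sound backup mechanism.
3) Data lookup is coordinated through ZooKeeper, and access is fast.

3. Roles in an HBase cluster
1) One or more master nodes: HMaster
2) Multiple slave nodes: HRegionServer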
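On a running cluster, the two roles can be told apart by the Java process names that jps prints. The sketch below is only an illustration: hbase_role is a hypothetical helper name, and it assumes the daemons are listed under the class names HMaster and HRegionServer and that the shell supports `local` (bash or dash).

```shell
# Classify a node's HBase role from `jps` output read on stdin.
# Assumption: each input line has the form "<pid> <main class name>".
hbase_role() {
    local has_master=0 has_rs=0 pid name
    # The "|| [ -n "$pid" ]" part keeps a final line without a newline.
    while read -r pid name || [ -n "$pid" ]; do
        case "$name" in
            HMaster)       has_master=1 ;;
            HRegionServer) has_rs=1 ;;
        esac
    done
    if [ "$has_master" -eq 1 ] && [ "$has_rs" -eq 1 ]; then
        echo "master+regionserver"
    elif [ "$has_master" -eq 1 ]; then
        echo "master"
    elif [ "$has_rs" -eq 1 ]; then
        echo "regionserver"
    else
        echo "none"
    fi
}
```

A typical call would be `jps | hbase_role` on each node. Reading from stdin rather than calling jps inside the function keeps the helper easy to try out with sample text.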
Installing and deploying HBase
Install HBase on several machines that already have Hadoop and ZooKeeper. This course uses three machines, hdp08, hdp09 and hdp10, as the example.
1. Install the JDK, Hadoop and ZooKeeper on each of hdp08, hdp09 and hdp10, and set the environment variables.
2. Download a stable HBase release from the Apache site (http://hbase.apache.org/). This course uses hbase-1.1.12-bin.tar.gz as the example. Upload hbase-1.1.12-bin.tar.gz to hdp08.
3. Extract hbase-1.1.12-bin.tar.gz into /home/hadoop/apps
    [hadoop@hdp10 ~]$ tar zxvf hbase-1.1.12-bin.tar.gz -C apps
4. Rename the extracted directory to hbase
    [hadoop@hdp10 apps]$ mv hbase-1.1.12 hbase
5. Set the environment variables (PATH also references HADOOP_HOME, which must already be defined by the Hadoop installation from step 1)
    [root@hdp10 apps]# vi /etc/profile
    JAVA_HOME=/opt/jdk1.8.0_121
    ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper
    HBASE_HOME=/home/hadoop/apps/hbase
    PATH=$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    export HBASE_HOME ZOOKEEPER_HOME HADOOP_HOME JAVA_HOME PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
    [root@hdp10 apps]# source /etc/profile
6. Check that the hbase command is available
    [hadoop@hdp10 apps]$ hbase version
7. Edit $HBASE_HOME/conf/hbase-env.sh
    [hadoop@hdp08 conf]$ vi hbase-env.sh

    #
    #/**
    # * Copyright 2007 The Apache Software Foundation
    # *
    # * Licensed to the Apache Software Foundation (ASF) under one
    # * or more contributor license agreements.  See the NOTICE file
    # * distributed with this work for additional information
    # * regarding copyright ownership.  The ASF licenses this file
    # * to you under the Apache License, Version 2.0 (the
    # * "License"); you may not use this file except in compliance
    # * with the License.  You may obtain a copy of the License at
    # *
    # *     http://www.apache.org/licenses/LICENSE-2.0
    # *
    # * Unless required by applicable law or agreed to in writing, software
    # * distributed under the License is distributed on an "AS IS" BASIS,
    # * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # * See the License for the specific language governing permissions and
    # * limitations under the License.
    # */

    # Set environment variables here.

    # This script sets variables multiple times over the course of starting an hbase process,
    # so try to keep things idempotent unless you want to take an even deeper look
    # into the startup scripts (bin/hbase, etc.)

    # The java implementation to use.  Java 1.6 required.
    export JAVA_HOME=/opt/jdk1.8.0_121/

    # Extra Java CLASSPATH elements.  Optional. This line is wrong and needs to be changed to the form below.
    #export HBASE_CLASSPATH=/home/hadoop/hbase/conf
    export JAVA_CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    # The maximum amount of heap to use, in MB. Default is 1000.
    # export HBASE_HEAPSIZE=1000

    # Extra Java runtime options.
    # Below are what we set by default.  May only work with SUN JVM.
    # For more on why as well as other possible settings,
    # see http://wiki.apache.org/hadoop/PerformanceTuning
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

    # Uncomment below to enable java garbage collection logging for the server-side processes
    # this enables basic gc logging for the server processes to the .out file
    # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps $HBASE_GC_OPTS"

    # this enables gc logging using automatic GC log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+. Either use this set of options or the one above
    # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M $HBASE_GC_OPTS"

    # Uncomment below to enable java garbage collection logging for the client processes in the .out file.
    # export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps $HBASE_GC_OPTS"

    # Uncomment below (along with above GC logging) to put GC information in its own logfile (will set HBASE_GC_OPTS).
    # This applies to both the server and client GC options above
    # export HBASE_USE_GC_LOGFILE=true


    # Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
    # export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
    # Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.


    # Uncomment and adjust to enable JMX exporting
    # See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
    # More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
    #
    # export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
    # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
    # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
    # export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
    # export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"

    # File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
    # export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

    # File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
    # export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

    # Extra ssh options.  Empty by default.
    # export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

    # Where log files are stored.  $HBASE_HOME/logs by default.
    # export HBASE_LOG_DIR=${HBASE_HOME}/logs

    # Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
    # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
    # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
    # export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
    # export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

    # A string representing this instance of hbase. $USER by default.
    # export HBASE_IDENT_STRING=$USER

    # The scheduling priority for daemon processes.  See 'man nice'.
    # export HBASE_NICENESS=10

    # The directory where pid files are stored. /tmp by default.
    # export HBASE_PID_DIR=/var/hadoop/pids

    # Seconds to sleep between slave commands.  Unset by default.  This
    # can be useful in large clusters, where, e.g., slave rsyncs can
    # otherwise arrive faster than the master can service them.
    # export HBASE_SLAVE_SLEEP=0.1

    # Tell HBase whether it should manage it's own instance of Zookeeper or not.
    export HBASE_MANAGES_ZK=false
8. Edit hbase-site.xml
    [hadoop@hdp08 conf]$ vi hbase-site.xml
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hdp08:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hdp08:2181,hdp09:2181,hdp10:2181</value>
      </property>
    </configuration>
    Note: the value of hbase.zookeeper.quorum must list the machines that run ZooKeeper.
9. Edit regionservers
    [hadoop@hdp08 conf]$ vi regionservers
    hdp09
    hdp10
10. Copy the hbase directory from hdp08 to hdp09 and hdp10
    [hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp09:/home/hadoop/apps
    [hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp10:/home/hadoop/apps
11. Start HBase
    [hadoop@hdp08 apps]$ start-hbase.sh
    To start only the HBase daemons on the local machine:
    [hadoop@hdp08 bin]$ hbase-daemon.sh start master
    [hadoop@hdp08 bin]$ hbase-daemon.sh start regionserver
12. Verify startup: on the Hadoop nodes, run jps to check the node status.
13. View the startup status page
    http://192.168.195.138:16010/

III. Configuring multiple HMasters
In the $HBASE_HOME/conf/ directory, create a new file named backup-masters and add to it the hostnames of the nodes that should act as backup masters, for example:
    [hadoop@hdp08 conf]$ vi backup-masters
    hdp09
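
The hbase-site.xml, regionservers and backup-masters files all derive from the same small host list, so one way to keep them consistent is to generate them from a script. The following is a minimal sketch, not part of the original course: the function names zk_quorum, write_hbase_site and write_host_list are hypothetical helpers, and the script assumes a shell that supports `local` (bash or dash) and a conf directory that already exists.

```shell
# Build the hbase.zookeeper.quorum value "h1:port,h2:port,..." from a
# client port followed by a list of host names.
zk_quorum() {
    local port="$1" quorum="" host
    shift
    for host in "$@"; do
        # Prefix a comma only when quorum already has an entry.
        quorum="${quorum:+$quorum,}$host:$port"
    done
    echo "$quorum"
}

# Write hbase-site.xml for a fully distributed cluster.
# Arguments: conf directory, hbase.rootdir value, quorum string.
write_hbase_site() {
    local conf_dir="$1" rootdir="$2" quorum="$3"
    cat > "$conf_dir/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>$rootdir</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$quorum</value>
  </property>
</configuration>
EOF
}

# Write one host name per line, as expected by the regionservers and
# backup-masters files. Arguments: target file, then the host names.
write_host_list() {
    local file="$1"
    shift
    printf '%s\n' "$@" > "$file"
}

# Example calls using the values from this course (commented out so that
# sourcing the file has no side effects):
# write_hbase_site "$HBASE_HOME/conf" hdfs://hdp08:9000/hbase "$(zk_quorum 2181 hdp08 hdp09 hdp10)"
# write_host_list "$HBASE_HOME/conf/regionservers" hdp09 hdp10
# write_host_list "$HBASE_HOME/conf/backup-masters" hdp09
```

Generating the files on hdp08 before the scp step means every node receives identical configuration, and a mistyped hostname only has to be corrected in one place.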