大型网站所使用的工具.ppt (Tools Used by Large Websites)

Uploaded by: sccc · Document ID: 5814390 · Upload date: 2023-08-22 · Format: PPT · Pages: 33 · Size: 1.69 MB

Tools Used by Large Websites

- Perlbal - http://…
- A distributed file system - http://… : some companies consider MogileFS better suited than Hadoop for handling small files.
- memcached - http://memcached.org/ : shared memory? Keep the database, or other frequently read data, in a memory cache.
- Moxi - http://… : a proxy (for memcached).
- More resources: http://…

How to Scale Up a Web Service in the Past?

(Four diagrams follow; their source URLs are truncated in this copy.)

HBase Intro

王耀聰 (jazz@nchc.org.tw) and 陳威宇 (waue@nchc.org.tw)
Training course (教育訓練課程)

HBase is a distributed column-oriented database built on top of HDFS.

HBase is...

- A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage.
- Designed to operate on top of the Hadoop Distributed File System

(HDFS) or the Kosmos File System (KFS, aka Cloudstore) for scalability, fault tolerance, and high availability.
- Integrated into the Hadoop map-reduce platform and paradigm.

Benefits

- Distributed storage
- Table-like data structure (a multi-dimensional map)
- High scalability
- High availability
- High performance

Who Uses HBase?

- Adobe: internal use (structured data)
- Kalooga: image search engine - http://…
- Meetup: community events site - http://…
- Streamy: successfully migrated from MySQL to HBase - http://…
- Trend Micro: cloud-based anti-virus architecture - http://…
- Yahoo!: stores document fingerprints to avoid duplicates - http://…
- More: http://wiki.apache.org/hadoop/Hbase/PoweredBy

Backdrop

- Started toward by Chad Walters and Jim
- 2006.11: Google releases its paper on Bigtable
- 2007.2: Initial HBase prototype created as a Hadoop contrib
- 2007.10: First usable HBase
- 2008.1: Hadoop becomes an Apache top-level project and HBase becomes a subproject
- 2008.10: HBase 0.18 and 0.19 released

HBase Is Not...

- Tables have one primary index, the row key.
- No join operators.
- Scans and queries can select a subset of available columns, perhaps by using a wildcard.
- There are three types of lookups:
  - Fast lookup using row key and optional timestamp
  - Full table scan
  - Range scan from region start to end

HBase Is Not... (2)

- Limited atomicity and transaction support: HBase supports multiple batched mutations of single rows only.
- Data is unstructured and untyped.
- Not accessed or manipulated via SQL:
  - Programmatic access via Java, REST, or Thrift APIs
  - Scripting via JRuby

Why Bigtable?

- RDBMS performance is good for transaction processing, but for very large scale analytic processing the solutions are commercial, expensive, and

specialized.
- Very large scale analytic processing:
  - Big queries, typically range or table scans
  - Big databases (100s of TB)

Why Bigtable? (2)

- Map-reduce on Bigtable, optionally with Cascading on top to support some relational algebra, may be a cost-effective solution.
- Sharding is not a solution to scaling open source RDBMS platforms:
  - Application specific
  - Labor intensive (re)partitioning

Why HBase?

- HBase is a Bigtable clone.
- It is open source.
- It has a good community and promise for the future.
- It is developed on top of the Hadoop platform and integrates well with it, if you are using Hadoop already.
- It has a Cascading connector.

HBase Benefits over an RDBMS

- No real indexes
- Automatic partitioning
- Scales linearly and automatically with new nodes
- Commodity hardware
- Fault tolerance
- Batch processing

Data Model

- Tables are sorted by row.
- A table schema only defines its column families:
  - Each family consists of any number of columns.
  - Each column consists of any number of versions.
  - Columns only exist when inserted; NULLs are free.
  - Columns within a family are sorted and stored together.
- Everything except table names is byte[].
- (Row, Family:Column, Timestamp) → Value

(Diagram: row key, column family, value, timestamp.)

Members

- Master:
  - Responsible for monitoring region servers
  - Load balancing for regions
  - Redirects clients to the correct region servers
  - Currently the SPOF (single point of failure)
- RegionServer (slaves):
  - Serves client requests (write/read/scan)
  - Sends heartbeats to the Master
  - Throughput and the number of regions scale with the number of region servers

Regions

- A table is made up of one or more regions.
- A region is delimited by its startKey and endKey.
- Each region may live on a different node and is made up of several HDFS files and blocks, which are replicated by Hadoop.

Case Study: A Blog

- Logical data model:
  - A blog entry consists of title, date, author, type, and text fields.
  - A user consists of username, password, and other fields.
  - Each blog entry can have many comments.
  - Each comment consists of title, author, and text.
- ERD (diagram)

Blog HBase Table Schema

The row key combines the type (represented by a two-character abbreviation) with the timestamp.

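The type-plus-timestamp row key just described can be sketched in a few lines. This is an illustrative sketch only: the two-character abbreviations ("bp" for blog entries, "us" for users) and the fixed-width decimal timestamp encoding are assumptions, since the slide does not spell them out.

```python
def row_key(type_abbrev, ts_millis):
    # Two-character type prefix + fixed-width timestamp, so that plain
    # lexicographic sorting groups rows by type, then orders them by time.
    # The "%013d" zero-padding is an assumed encoding, not from the slide.
    return "%s%013d" % (type_abbrev, ts_millis)

keys = [
    row_key("bp", 1240148047497),  # a blog entry ("bp" is an invented abbreviation)
    row_key("us", 1240148026198),  # a user ("us" is likewise invented)
    row_key("bp", 1240148026198),  # an earlier blog entry
]
keys.sort()  # HBase keeps rows in exactly this order automatically
print(keys)  # -> ['bp1240148026198', 'bp1240148047497', 'us1240148026198']
```

Because the keys sort this way, a scan() over the "bp" prefix range visits blog entries oldest-first without any secondary index.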
Rows therefore sort first by type and then by timestamp, which makes it convenient to access the table's data with scan(). The one-to-many relationship between BLOGENTRY and COMMENT is represented by a dynamic number of columns inside the comment_title, comment_author, and comment_text column families. Each such column is named by the comment's timestamp, so the columns within each family sort automatically by time.

Architecture

ZooKeeper

- HBase depends on ZooKeeper (Chapter 13) and by default it manages a ZooKeeper instance as the authority on cluster state.

Operation

- The -ROOT- table holds the list of .META. table regions.
- The .META. table holds the list of all user-space regions.

Installation (1)

$ wget http://…
$ sudo tar -zxvf hbase-*.tar.gz -C /opt/
$ sudo ln -sf /opt/hbase-0.20.3 /opt/hbase
$ sudo chown -R $USER:$USER /opt/hbase
$ sudo mkdir /var/hadoop/
$ sudo chmod 777 /var/hadoop

Then start Hadoop.

Setup (1)

$ vim /opt/hbase/conf/hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_CONF_DIR=/opt/hadoop/conf
export HBASE_HOME=/opt/hbase
export HBASE_LOG_DIR=/var/hadoop/hbase-logs
export HBASE_PID_DIR=/var/hadoop/hbase-pids
export HBASE_MANAGES_ZK=true
export HBASE_CLASSPATH=$HBASE_CLASSPATH:/opt/hadoop/conf

$ cd /opt/hbase/conf
$ cp /opt/hadoop/conf/core-site.xml ./
$ cp /opt/hadoop/conf/hdfs-site.xml ./
$ cp /opt/hadoop/conf/mapred-site.xml ./

Setup (2)

(Slide shows a table of configuration properties: name / value.)

Startup & Stop

Start/stop everything:

$ bin/start-hbase.sh
$ bin/stop-hbase.sh

Start/stop daemons individually:

$ bin/hbase-daemon.sh start|stop zookeeper
$ bin/hbase-daemon.sh start|stop master
$ bin/hbase-daemon.sh start|stop regionserver
$ bin/hbase-daemon.sh start|stop thrift
$ bin/hbase-daemon.sh start|stop rest

Testing (4)

$ hbase shell
hbase> create 'test', 'data'
0 row(s) in 4.3066 seconds
hbase> list
test
1 row(s) in 0.1485 seconds
hbase> put 'test', 'row1', 'data:1', 'value1'
0 row(s) in 0.0454 seconds
hbase> put 'test', 'row2', 'data:2', 'value2'
0 row(s) in 0.0035 seconds
hbase> put 'test', 'row3', 'data:3', 'value3'
0 row(s) in 0.0090 seconds
hbase> scan 'test'
ROW     COLUMN+CELL
row1    column=data:1, timestamp=1240148026198, value=value1
row2    column=data:2, timestamp=1240148040035, value=value2
row3    column=data:3, timestamp=1240148047497, value=value3
3 row(s) in 0.0825 seconds
hbase> disable 'test'
09/04/19 06:40:13 INFO client.HBaseAdmin: Disabled test
0 row(s) in 6.0426 seconds
hbase> drop 'test'
09/04/19 06:40:17 INFO client.HBaseAdmin: Deleted test
0 row(s) in 0.0210 seconds
hbase> list
0 row(s) in 2.0645 seconds

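The shell session above is exercising HBase's core data model: a sparse, sorted map from (row, column, timestamp) to value. A minimal in-memory sketch of those semantics follows; ToyTable and its methods are invented names for illustration, not the real client API.

```python
import time

class ToyTable:
    """Toy sketch of HBase's (row, family:column, timestamp) -> value map.
    Illustrative only: real HBase stores raw bytes and persists to HDFS."""

    def __init__(self):
        self.cells = {}  # row -> column -> {timestamp: value}

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else int(time.time() * 1000)
        self.cells.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column):
        # The newest version wins, like a plain `get` in the shell.
        versions = self.cells.get(row, {}).get(column, {})
        return versions[max(versions)] if versions else None

    def scan(self):
        # Rows come back in sorted row-key order, as `scan 'test'` shows.
        for row in sorted(self.cells):
            for column in sorted(self.cells[row]):
                yield row, column, self.get(row, column)

t = ToyTable()
t.put('row1', 'data:1', 'value1')
t.put('row2', 'data:2', 'value2')
t.put('row3', 'data:3', 'value3')
print(list(t.scan()))
# -> [('row1', 'data:1', 'value1'), ('row2', 'data:2', 'value2'), ('row3', 'data:3', 'value3')]
```

Note how "columns only exist when inserted, NULLs are free" falls out of the dict-of-dicts representation: absent columns simply take no space.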
Connecting to HBase

- Java client:
  get(byte[] row, byte[] column, long timestamp, int versions);
- Non-Java clients:
  - Thrift server hosting an HBase client instance; sample Ruby, C++, and Java (via Thrift) clients
  - REST server hosting an HBase client
- TableInputFormat / TableOutputFormat for MapReduce: HBase as a MapReduce source or sink
- HBase shell: a JRuby IRB with a "DSL" that adds get, scan, and admin commands
  ./bin/hbase shell YOUR_SCRIPT

Thrift

- A software framework for scalable cross-language services development, by Facebook.
- Works seamlessly between C++, Java, Python, PHP, and Ruby.

$ hbase-daemon.sh start thrift
$ hbase-daemon.sh stop thrift

- This starts the server instance, by default on port 9090.
- A similar project exists for "rest".

References

- HBase 介紹 (introduction): http://www.wretch.cc/blog/trendnop09/21192672
- Hadoop: The Definitive Guide, book by Tom White
- HBase Architecture 101: http://…

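As a closing illustration of the Regions and Operation slides: locating the region that serves a row is essentially a sorted lookup over region start keys, with -ROOT- and .META. holding those boundaries. A toy sketch of that lookup; the region boundaries and names here are invented, and real HBase searches .META. rows rather than Python lists.

```python
import bisect

# Invented example boundaries: each region covers [startKey, endKey).
region_starts = ["", "g", "n", "t"]  # sorted startKeys; "" marks the first region
region_names = ["region1", "region2", "region3", "region4"]

def find_region(row_key):
    # The last region whose startKey <= row_key holds the row,
    # mirroring how .META. is searched for a user-space region.
    i = bisect.bisect_right(region_starts, row_key) - 1
    return region_names[i]

print(find_region("apple"))  # -> region1
print(find_region("hbase"))  # -> region2
print(find_region("zebra"))  # -> region4
```

This is also why a range scan from region start to end is cheap: consecutive row keys live in the same (or adjacent) regions.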