The Design and Architecture of the Microsoft Cluster Service (selected courseware, .ppt)


The Design and Architecture of the Microsoft Cluster Service (MSCS)
W. Vogels et al.
ECE 845 Presentation by Sandeep Tamboli
April 18, 2000

Outline
- Prerequisites
- Introduction
- Design Goals
- Cluster Abstractions
- Cluster Operation
- Cluster Architecture
- Implementation Examples
- Summary

Prerequisites
- Availability = MTTF / (MTTF + MTTR)
  - MTTF: Mean Time To Failure
  - MTTR: Mean Time To Repair
- High Availability:
  - Modern taxonomy of High Availability: A system having sufficient redundancy in components to mask certain defined faults has High Availability (HA).
  - IBM High Availability Services: The goals of high availability solutions are to minimize both the number of service interruptions and the time needed to recover when an outage does occur. High availability is not a specific technology nor a quantifiable attribute; it is a goal to be reached. This goal is different for each system and is based on the specific needs of the business the system supports.
  - The presenter: May have degraded performance while a component is down

MSCS (a.k.a. Wolfpack)
- Extension of Windows NT to improve availability
- First phase of implementation
  - Scalability limited to 2 nodes
- MSCS features:
  - Fail-over
  - Migration
  - Automated restart
- Differences from previous HA solutions:
  - Simpler user interface
  - More sophisticated modeling of applications
  - Tighter integration with the OS (NT)

MSCS (2)
- Shared-nothing cluster model:
  - Each node owns a subset of cluster resources
  - Only one node may own a resource at a time
  - On failure, another node may take over ownership of the resource

Design Goals
- Commodity
  - Commercial off-the-shelf nodes
  - Windows NT Server
  - Standard Internet protocols
- Scalability
- Transparency
  - Presented as a single system to the clients
  - System management tools manage the cluster as if it were a single server
  - Service and system execution information available in a single cluster-wide log

Design Goals (2)
- Availability
  - On failure detection:
    - Restart the application on another node
    - Migrate ownership of other resources
  - Restart policy can specify the availability requirements of the application
  - Hardware/software upgrades possible in a phased manner

Cluster Abstractions
- Node: Runs an instance of the Cluster Service
  - Defined and active
- Resource
  - Functionality offered at a node
  - Physical: printer
  - Logical: IP address
  - Applications implement logical resources
    - Exchange mail database
    - SAP applications
- Quorum Resource
  - Persistent storage for the Cluster Configuration Database
  - Arbitration mechanism to control membership
  - Partition on a fault-tolerant shared SCSI disk

Cluster Abstractions (2)
- Resource Dependencies
  - Dependency trees: sequence in which to bring resources online
- Resource Groups
  - Unit of migration
- Virtual servers
  - Application runs within a virtual server environment
  - Illusion to applications, administrators, and clients of a single stable environment
  - Client connects using the virtual server name
  - Enables many application instances to run on the same physical node

Cluster Abstractions (3)
- Cluster Configuration Database
  - Replicated at each node
  - Accessed through the NT registry
  - Updates applied using the Global Update Protocol
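As a rough illustration of the dependency trees mentioned under Cluster Abstractions (2), the sketch below orders the resources of a group so that each one comes online only after the resources it depends on. The resource names and the `online_order` helper are hypothetical and are not part of MSCS; the ordering itself uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def online_order(depends_on):
    """Return an order in which resources can be brought online.

    depends_on maps a resource name to the set of resources it requires.
    Every resource appears after all of its dependencies.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical resource group backing one virtual server.
group = {
    "disk": set(),
    "ip_address": set(),
    "network_name": {"ip_address"},
    "database": {"disk", "network_name"},
}
order = online_order(group)
```

Taking the group offline would walk the same list in reverse, so that a resource is stopped before anything it depends on.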

Cluster Membership Operation

Member Join
- Sponsor broadcasts the identity of the joining node
- Sponsor informs the joining node about:
  - Current membership
  - Cluster configuration database
- Joining member's heartbeats start
- Sponsor waits for the first heartbeat
- Sponsor signals the other nodes to consider the joining node a full member
- Acknowledgement is sent to the joining node
- On failure:
  - Join operation aborted
  - Joining node removed from the membership

Member Regroup
- Upon suspicion that an active node has failed, the member regroup operation is executed to detect any membership changes
- Reasons for suspicion:
  - Missing heartbeats
  - Power failures
- The regroup algorithm moves each node through 6 stages
- Each node sends periodic messages to all other nodes, indicating which stage it has finished
  - Barrier synchronization

Regroup Algorithm
- Activate: After a local clock tick, each node sends and collects status messages. A node advances if all responses are collected or a timeout occurs.
- Closing: It is determined whether partitions exist and whether the current node's partition should survive
- Pruning: All nodes that are pruned for lack of connectivity halt
- Cleanup phase one: All the surviving nodes
  - Install the new membership
  - Mark the halted nodes as inactive
  - Inform the cluster network manager to filter out the halted nodes' messages
  - Make the event manager invoke local callback handlers announcing node failures
- Cleanup phase two: A second cleanup callback is invoked to allow a coordinated two-phase cleanup
- Stabilized: The regroup has finished

Partition Survival
A partition survives if any of the following is satisfied:
- n(new membership) > 1/2 * n(original membership)
- The following three conditions are satisfied together:
  - n(new membership) = 1/2 * n(original membership)
  - n(new membership) ≥ 2
  - tiebreaker node ∈ (new membership)
- The following three conditions are satisfied together:
  - n(original membership) = 2
  - n(new membership) = 1
  - quorum disk ∈ (new membership)

Resource Management
- Resource control DLL for each type of resource
- Polymorphic design allows easy management of varied resource types
- Resource state transition diagram: [figure: resource state transition diagram]

Resource Migration: Pushing a group
- Executed when:
  - A resource fails at the original node
  - The resource group prefers to execute at another node
  - The administrator moves the group
- Steps involved:
  - All resources taken to the offline state
  - A new active host node selected
  - Brought online at the new node

Resource Migration: Pulling a group
- Executed when:
  - The original node fails
- Steps involved:
  - A new active host node selected
  - Brought online at the new node
- Nodes can determine the new owner hosts
  - without communicating with each other
  - with the help of the replicated cluster database

Resource Migration: Fail-back
- No automatic migration to the preferred owner
- Constrained by the fail-back window:
  - How long the node must be up and running
  - Blackout periods
  - Fail-back deferred for cost or availability reasons

Cluster Architecture
[figure: cluster architecture]

Global Update Management
- Atomic broadcast protocol
  - If one surviving member receives an update, all the surviving members eventually receive the update
- Locker node has a central role

- Steps in normal execution:
  - A node wanting to start a global update contacts the locker
  - When accepted by the locker, the sender RPCs to each active node to install the update, in node-ID order starting with the node immediately after the locker
  - Once the global update is over, the sender sends the locker an unlock request to indicate successful termination

Failure Conditions
- If all the nodes that received the update fail, it is as if the update never occurred
- If the sender fails during the update operation:
  - The locker reconstructs the update and sends it to each active node
  - Nodes ignore the duplicate update
- If the sender and the locker both fail after the sender installed the update at any node beyond the locker:
  - The next node in the update list is assigned as the new locker
  - The new locker will complete the update

Support Components
- Cluster Network: Extension to the basic OS
  - Heartbeat management
- Cluster Disk Driver: Extension to the basic OS
  - Shared SCSI bus
- Cluster-wide Event Logging
  - Events sent via RPC to all other nodes (periodically)
- Time Service
  - Clock synchronization

Implementation Examples
- MS SQL Server
  - A SQL Server resource group configured as a virtual server
  - A 2-node cluster can have 2 or more HA SQL Servers
- Oracle servers
  - Oracle Parallel Server
    - Shared-disk model
    - Uses MSCS to track cluster organization and membership notifications
  - Oracle Fail-Safe server
    - Each instance of a Fail-Safe database is a virtual server
    - Upon failure:
      - The virtual server migrates to the other node
      - The clients reconnect under the same name and address

Implementation Examples (2)
- SAP R/3
  - Three-tier client/server system
  - Normal operation:
    - One node hosts the database virtual server
    - The other provides application components combined in a server
  - Upon failure:
    - The failed virtual server migrates to the surviving node
    - The application servers are failover-aware
    - Migration of the application server needs a new login session

Scalability Issues: Join Latency, Regroup messages, GUP Latency, GUP throughput
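The normal-execution steps listed under Global Update Management can be sketched as a small single-process simulation. Everything here is an assumption made for illustration: the `Node` class, `update_order`, and `global_update` are hypothetical names, RPCs are modeled as plain method calls, failure handling is left out, and the locker is assumed to receive the update when the sender first contacts it.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.installed = []  # updates applied at this node, in order

    def install(self, update):
        # Ignore duplicates, as nodes do when an update is re-sent.
        if update not in self.installed:
            self.installed.append(update)


def update_order(active_ids, locker_id):
    """Node IDs in ascending order, starting just after the locker and wrapping."""
    ids = sorted(active_ids)
    after = [i for i in ids if i > locker_id]
    before = [i for i in ids if i < locker_id]
    return after + before


def global_update(nodes, locker_id, update):
    """Simulate one failure-free global update and return the visit order."""
    # Step 1: the sender contacts the locker (assumed to install the update too).
    nodes[locker_id].install(update)
    # Step 2: the sender installs the update at every other active node in ID order.
    order = update_order(nodes.keys(), locker_id)
    for node_id in order:
        nodes[node_id].install(update)
    # Step 3: the unlock request to the locker would follow here (not modeled).
    return order


nodes = {i: Node(i) for i in (1, 2, 3, 4)}
order = global_update(nodes, locker_id=2, update="set quorum=disk1")
```

With four nodes and node 2 as the locker, the sender visits nodes 3, 4, and then 1. In this sketch the visits happen one after another, so the number of steps grows with the cluster size, which is one plausible reading of the GUP latency item on the Scalability Issues slide.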

Summary
- A highly available 2-node cluster design using commodity components
- Cluster is managed in 3 tiers:
  - Cluster abstractions
  - Cluster operation
  - Cluster Service components (interaction with OS)
- Design not scalable beyond about 16 nodes

Relevant URLs
- A Modern Taxonomy of High Availability: interlog/resnick/HA.htm
- An overview of Clustering in Windows NT Server 4.0, Enterprise Edition: microsoft/ntserver/ntserverenterprise/exec/overview/clustering.asp
- Scalability of MSCS: cs.cornell.edu/rdc/mscs/nt98/
- IBM High Availability Services: as.ibm/asus/highavail2.html
- High-Availability Linux Project: linux-ha.org/

Discussion Questions
- Is clustering the only choice for HA systems?
- Why is MSCS in use today despite its scalability concerns?
- Does performance suffer because of HA provisions? Why?
- Are geographical HA solutions needed (in order to take care of site disasters)?
- This is good for transaction-oriented services. What about, say, scientific computing?
- Hierarchical clustering?

Glossary
- NetBIOS: Short for Network Basic Input Output System, an application programming interface (API) that augments the DOS BIOS by adding special functions for local-area networks (LANs). Almost all LANs for PCs are based on NetBIOS. Some LAN manufacturers have even extended it, adding additional network capabilities. NetBIOS relies on a message format called Server Message Block (SMB).
- SMB: Short for Server Message Block, a message format used by DOS and Windows to share files, directories and devices. NetBIOS is based on the SMB format, and many network products use SMB. These SMB-based networks include Lan Manager, Windows for Workgroups, Windows NT, and LanServer. There are also a number of products that use SMB to enable file sharing among different operating system platforms. A product called Samba, for example, enables UNIX and Windows machines to share directories and files.
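The rule on the Partition Survival slide lends itself to a small predicate, sketched here as a supplement to the deck. The comparison operators did not survive in the slide text, so the sketch assumes "more than half" for the first condition and "at least two nodes" for the second. The function and parameter names are hypothetical and are not taken from MSCS.

```python
def partition_survives(new, original, tiebreaker=None, owns_quorum_disk=False):
    """Decide whether a partition survives a regroup.

    new, original: sets of node names (new and original membership).
    tiebreaker: name of the tiebreaker node, or None if unknown.
    owns_quorum_disk: True if this partition holds the quorum disk.
    """
    n_new, n_orig = len(new), len(original)
    # Condition 1: strictly more than half of the original membership.
    if 2 * n_new > n_orig:
        return True
    # Condition 2: exactly half, at least two nodes, and the tiebreaker is a member.
    if 2 * n_new == n_orig and n_new >= 2 and tiebreaker in new:
        return True
    # Condition 3: a two-node cluster reduced to one node that holds the quorum disk.
    if n_orig == 2 and n_new == 1 and owns_quorum_disk:
        return True
    return False
```

For a four-node cluster split two and two, only the half containing the tiebreaker node survives, so at most one partition continues to run.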
