Monday, November 12, 2012

Confusing terms



You are not confused if you can answer the following questions without any doubt:

Q) Can we build a system that is highly fault-tolerant but not highly available?
Q) Can we build a system with low reliability and high availability?
Q) Is reliability defined as MTBF or MTTR?


Scalability
A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.

Elasticity
Elasticity often refers to a system's ability to allocate additional resources autonomically, without operator intervention.

In other words, a scalable system allows you to add resources in order to handle more load, while an elastic system will add resources itself when the load increases.
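
The distinction can be sketched in a few lines of Python. This is only an illustration: the function name desired_workers, the per-worker capacity of 100 requests/second, and the load figures are all assumptions made for the example, not part of any real autoscaler.

```python
import math

def desired_workers(load_rps, capacity_per_worker_rps, min_workers=1):
    """Return how many workers are needed to serve the given load (hypothetical helper)."""
    return max(min_workers, math.ceil(load_rps / capacity_per_worker_rps))

# Scalable: an operator decides to add capacity by hand.
manual_workers = 2
manual_workers += 2  # someone provisions two more machines

# Elastic: the system recomputes its own size whenever the load changes.
observed_load = [150, 400, 950, 300]  # requests/second, made-up samples
elastic_sizes = [desired_workers(load, 100) for load in observed_load]
print(elastic_sizes)
```

In the elastic case the worker count follows the load up and back down again, with no operator in the loop.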

Fault-tolerance

Fault tolerance refers to a system's ability to continue operating, perhaps with gracefully degraded performance, when components of the system fail. There is no exact metric for the fault tolerance of a system.

Availability

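Availability is the percentage of time that a system is actually operational and providing its intended service.

A = Uptime/(Uptime + Downtime)
Ai = MTBF/(MTBF + MTTR)

Here Ai is the inherent availability, MTBF is the mean time between failures, and MTTR is the mean time to repair.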
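
As a quick worked example, the two formulas can be evaluated directly. The MTBF and MTTR figures below are made-up inputs chosen only to show the arithmetic, and the function names are illustrative.

```python
def availability_from_times(uptime_hours, downtime_hours):
    """A = Uptime / (Uptime + Downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)

def inherent_availability(mtbf_hours, mttr_hours):
    """Ai = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

HOURS_PER_YEAR = 365 * 24

# Assumed figures: one failure every 1000 hours, one hour to repair.
a_i = inherent_availability(mtbf_hours=1000, mttr_hours=1)
downtime_per_year = (1 - a_i) * HOURS_PER_YEAR

print(f"availability: {a_i:.4%}")
print(f"expected downtime per year: {downtime_per_year:.2f} hours")
```

Note that halving MTTR improves availability about as much as doubling MTBF, which is why fast recovery matters as much as rare failure.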

A system with no single point of failure might be considered fault-tolerant, but if application-level data migrations, software upgrades, or configuration changes take an hour or more of downtime to complete, then the system is not highly available.

Reliability
In simple words, how long can a system stay up continuously?

A more concrete definition is: “reliability is the ability of a person or system to perform and maintain its functions in routine circumstances, as well as hostile or unexpected circumstances”.

Reliability is often defined in terms of mean time between failures (MTBF). We can build a system with low-quality, not-so-reliable components and subsystems, and still achieve high availability (HA), for example by adding redundancy and keeping repair times short.
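
One way to see how unreliable parts can still add up to an available system is the standard formula for redundant replicas. The sketch below assumes that replicas fail independently and that the service stays up as long as at least one replica is up; both are simplifying assumptions, and the 90% figure is invented for illustration.

```python
def parallel_availability(component_availability, replicas):
    """Availability of N redundant replicas, assuming independent failures
    and that a single healthy replica is enough to serve requests."""
    return 1 - (1 - component_availability) ** replicas

# A mediocre component that is up only 90% of the time (assumed figure).
for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.9, n):.4f}")
```

Under these assumptions, three 90%-available replicas together are unavailable only when all three are down at once, which happens far less often than any single failure.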

Durability
Durability is a system's guarantee that stored data cannot be lost.

References:

1) http://www.quora.com/Distributed-Systems/What-is-the-difference-between-the-terms-scalable-and-elastic
2) http://www.ibm.com/developerworks/library/pa-bigiron2/
3) http://www.quora.com/Distributed-Systems/What-is-the-difference-between-a-highly-fault-tolerant-and-a-highly-available-system#
4) http://www.wikipedia.org/

Thursday, September 20, 2012

Pregel: Google's way of processing graphs


Introduction
Many applications can be modeled as graphs. Google introduced Pregel to support large-scale graph processing applications. Let's delve into the Pregel framework.

A few motivations for graph processing frameworks:
1) Facebook’s social graph contains 721 million users and 69 billion friendship links. The average distance between two users is 4.74.
2) Wikipedia articles can be represented as a graph.
3) The database trend is moving towards graph databases.

Pros:
A distributed system for large-scale graph processing
Provides fault tolerance through checkpointing
Improves performance through bulk synchronous computation
The API is modeled around the idea of ‘think like a vertex’

Overall, Pregel has influenced the state of the art in graph processing.
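
To make ‘think like a vertex’ and bulk synchronous computation concrete, here is a minimal single-process sketch that spreads the largest value in a graph to every vertex. It is not Pregel's actual API: the function name propagate_max, the dictionary-based graph, and the message lists are assumptions made for illustration. Each pass of the while loop plays the role of one superstep, and swapping the outbox for the inbox stands in for the synchronization barrier.

```python
from collections import defaultdict

def propagate_max(graph, initial_values):
    """Spread the largest value to every reachable vertex, superstep by superstep.

    graph: dict mapping each vertex to a list of out-neighbours
    initial_values: dict mapping each vertex to its starting value
    """
    values = dict(initial_values)

    # Superstep 0: every vertex is active and sends its value to its neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[vertex])

    # Later supersteps: a vertex only wakes up if it received messages.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for n in graph.get(vertex, []):
                    outbox[n].append(best)
            # Otherwise the vertex sends nothing, which acts as a vote to halt.
        inbox = outbox  # barrier: messages become visible in the next superstep

    return values

# A small chain a - b - c with edges in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = propagate_max(graph, {"a": 3, "b": 6, "c": 2})
print(result)
```

The computation stops when no messages are in flight, that is, when every vertex has effectively voted to halt.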

Cons:
Let's look at the issues with Pregel using an example.
In the first case, micro data centers are distributed geographically, and the nodes within a data center are heterogeneous.
In the second case, the graph is processed in a homogeneous mega data center. Pregel (considering the design in the paper) performs much better in case 2 than in case 1, for the following reasons:
1) Pregel doesn’t consider the heterogeneity of the system when partitioning the data. A slow node gets the same amount of data as a fast node but can’t process it at the same speed.
2) Because of the single synchronization barrier, a slow node slows down the entire graph computation.
3) Pregel may partition the graph data in a way that requires heavy communication between nodes that are far apart.
4) The shape of the graph is not considered when partitioning the data.
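
Points 1 and 2 can be illustrated with a toy cost model. Everything here is an assumption made for the sketch: node speed is expressed as vertices processed per unit of time, communication cost is ignored, and the speed-proportional split is just one possible remedy, not something the Pregel design provides.

```python
def superstep_time(partition_sizes, node_speeds):
    """With a single barrier, a superstep takes as long as the slowest node."""
    return max(size / speed for size, speed in zip(partition_sizes, node_speeds))

def equal_partition(total_vertices, node_speeds):
    """Give every node the same number of vertices, ignoring its speed."""
    share = total_vertices / len(node_speeds)
    return [share] * len(node_speeds)

def speed_aware_partition(total_vertices, node_speeds):
    """Hypothetical alternative: split vertices in proportion to node speed."""
    total_speed = sum(node_speeds)
    return [total_vertices * speed / total_speed for speed in node_speeds]

speeds = [1.0, 1.0, 4.0]  # two slow nodes and one fast node (made-up figures)
total_vertices = 600

equal_time = superstep_time(equal_partition(total_vertices, speeds), speeds)
aware_time = superstep_time(speed_aware_partition(total_vertices, speeds), speeds)
print(equal_time, aware_time)
```

With equal partitions the fast node finishes early and then idles at the barrier, so the whole superstep runs at the pace of the slow nodes.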

Current state:
Twitter has introduced Cassovary, another big graph processing library. Many projects have been inspired by Pregel. Open source projects include Apache Hama and Giraph. Giraph has strong contributors from Twitter, Facebook, and LinkedIn.

Brief description of Pregel-inspired projects:

Apache Hama: Pure BSP implementation over Hadoop
Giraph: Almost similar to Hama
HipG: Java-based library with no single synchronization barrier
Signal/Collect: Gives the same importance to vertices and edges instead of focusing on vertices
Phoebus: Pregel in Erlang


Discussion Points:
MapReduce vs Pregel
Issues with single synchronization barrier
Best way of partitioning graph data