Computing Paradigms

  1. Parallel Computing: Enhancement in computing power is constrained by the speed of light, thermodynamic laws, and the high financial cost of processor fabrication. There are three ways to improve performance:
    1. Work harder: use faster hardware
    2. Work smarter: use optimized algorithms
    3. Seek help: use multiple computers for a particular task
    Parallel Computing allows a computational task to be shared among multiple processors. Scalable parallel computing architectures:
    1. Massively parallel processors (MPP): A large parallel processing system with a shared-nothing architecture. Consists of several hundred nodes connected by a high-speed interconnection network/switch. Each node has its own main memory and one or more processors, and runs a separate copy of the OS.
    2. Symmetric Multiprocessors (SMP): Shared-everything architecture. All processors (around 2-64) share all the global resources available. A single copy of the OS runs on these systems.
    3. Cache coherent Non-uniform memory access (CC-NUMA): A scalable multiprocessor system having a cache-coherent nonuniform memory access architecture. Every processor has a global view of all of the memory.
    4. Distributed Systems: Conventional networks of independent computers. Have multiple system images as each node runs its own OS. The individual machines could be combinations of MPPs, SMPs, clusters, & individual computers
    5. Clusters: A collection of workstations that are interconnected by a high-speed network. Work as an integrated collection of resources. Have a single system image spanning all its nodes
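The "seek help" strategy above can be sketched in code. This is a minimal illustration using Python's standard multiprocessing module: one task (a sum of squares) is partitioned into chunks and shared among worker processes, mirroring how a parallel system divides work among processors.

```python
# Splitting one computational task across multiple worker processes:
# each worker computes a partial sum over its own chunk of the range.
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=4):
    # Partition [0, n) into one chunk per worker.
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))
```

The same partitioning idea scales from processors in one machine (as here) up to nodes in a cluster or Grid, where the "pool" becomes a set of networked computers.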
  1. Cluster Computing: A cluster is a type of parallel computing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource. A node is a single- or multi-processor system with memory, I/O facilities, and an OS. Nodes connected via a LAN appear as a single system to users and applications, providing a cost-effective way to gain high performance, expandability, scalability, availability and throughput. A hypercluster is an interconnected cluster of clusters.
  2. Grid Computing: In the mid-1990s, the term Grid was coined to describe technologies that would allow consumers to obtain computing power on demand. Ian Foster and others posited that by standardizing the protocols used to request computing power, we could spur the creation of a Computing Grid, analogous in form and utility to the electric power grid. Grid systems (e.g. TeraGrid, Open Science Grid) were later developed to provide not only computing power but also data and software on demand. This enables the unification of geographically distributed resources through resource sharing, selection and aggregation for solving complex problems. The unification depends on the availability, capability, cost and QoS of resources. The business model for Grids is project-oriented: the users or community represented by a proposal have a certain number of service units (e.g. CPU hours) they can spend. For example, when an institution joins the TeraGrid with a set of resources, it shares those resources and gains access to a dozen other Grid sites. Grid Computing aims to "enable resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations" (where each VO can consist of either physically distributed institutions or logically related projects/groups). Computing power of this scale was earlier affordable only via supercomputers and large dedicated clusters. The resources can be:
     
    1. Computers – PCs, clusters, supercomputers, laptops, mobile devices, etc.
    2. Software – e.g. ASPs renting expensive special-purpose applications on demand
    3. Cataloged data and databases – e.g. the human genome database
    4. Other special devices/instruments
    5. People
         In order to support the creation of so-called “Virtual Organizations” (logical entities within which distributed resources can be discovered and shared as if they were from the same organization), Grids define and provide a set of standard protocols, middleware, toolkits, and services built on top of these protocols. Interoperability and security are the primary concerns for the Grid infrastructure. Grids provide protocols and services at five different layers, as identified in the Grid protocol architecture. At the fabric layer, Grids provide access to different resources. The connectivity layer defines core communication and authentication protocols for easy and secure network transactions. The resource layer defines protocols for the publication, discovery, negotiation, monitoring, accounting and payment of sharing operations on individual resources. The collective layer captures interactions across collections of resources; directory services such as MDS (Monitoring and Discovery Service) allow for the monitoring and discovery of VO resources. The application layer comprises the user applications built on top of the above protocols and APIs, operating in VO environments.
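As a compact summary of the five-layer Grid protocol architecture described above, the following sketch pairs each layer with the services the text assigns to it. The descriptive labels are illustrative, not a real Grid API.

```python
# The five Grid protocol layers, bottom to top, with the kinds of
# services the Grid protocol architecture associates with each.
GRID_LAYERS = [
    ("fabric",       "access to raw resources: compute, storage, networks"),
    ("connectivity", "core communication and authentication protocols"),
    ("resource",     "publication, discovery, negotiation, monitoring, accounting, payment"),
    ("collective",   "cross-resource services, e.g. the MDS directory service"),
    ("application",  "user applications built on the layers below"),
]

def layer_of(keyword):
    # Return the first layer whose description mentions the keyword.
    for name, desc in GRID_LAYERS:
        if keyword in desc:
            return name
    return None
```

For example, `layer_of("MDS")` returns `"collective"`, matching where the text places directory services.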
  1. Cloud Computing: Cloud Computing hints at a future in which we won’t compute on local computers, but on centralized facilities operated by third-party compute and storage utilities. The "cloud" here denotes the internet, through which capabilities are transparently available globally. It shares a vision with Grid Computing: to reduce the cost of computing, increase reliability, and increase flexibility by transforming computers from something that we buy and operate ourselves to something that is operated by a third party. Since dedicated clusters are expensive to operate, Clouds rely on low-cost virtualization at massive scale (Amazon, Google, and Microsoft have built real commercial large-scale systems containing hundreds of thousands of computers). Cloud computing can be defined as a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services is delivered on demand to external customers over the Internet. Cloud Computing is a specialized distributed computing paradigm; it differs from traditional ones in that:
    1. It is massively scalable
    2. It can be encapsulated as an abstract entity that delivers different levels of services to customers outside the Cloud
    3. It is driven by economies of scale
    4. The services can be dynamically configured (via virtualization or other approaches) and delivered on demand.
    Governments, research institutes, and industry leaders are rushing to adopt Cloud Computing to solve their ever-increasing computing and storage problems arising in the Internet Age. Three main factors contribute to the surge of interest in Cloud Computing: 1) the rapid decrease in hardware cost and increase in computing power and storage capacity, and the advent of multi-core architectures and modern supercomputers consisting of hundreds of thousands of cores; 2) the exponentially growing data size in scientific instrumentation/simulation and in Internet publishing and archiving; and 3) the wide-spread adoption of Services Computing and Web 2.0 applications.
    The evolution from Grid to Cloud computing has resulted from a shift in focus: from an infrastructure that delivers storage and computing resources (as in Grids) to one that is economy-based, aiming to deliver more abstract resources and services (as in Clouds). Cloud computing is a business model in which computing resources, such as computation and storage, are packaged as metered services. A Cloud infrastructure can be utilized internally by a company or exposed to the public as utility computing. Amazon essentially provides a centralized Cloud consisting of the Compute Cloud EC2 and the Data Cloud S3. The former is charged per instance-hour consumed for each instance type, and the latter per GB-month of storage used. In addition, data transfer is charged per TB of data transferred per month, depending on the source and target of the transfer.
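The metered billing model can be made concrete with a small arithmetic sketch. The rates below are invented placeholders, not Amazon's actual prices; only the structure (per instance-hour, per GB-month, per TB transferred) follows the text.

```python
# Hypothetical utility-computing bill: compute, storage, and transfer
# are each metered independently and summed. All rates are made up.
def monthly_bill(instance_hours, gb_months_stored, tb_transferred,
                 rate_per_hour=0.10, rate_per_gb_month=0.15,
                 rate_per_tb=100.0):
    compute = instance_hours * rate_per_hour        # per instance-hour
    storage = gb_months_stored * rate_per_gb_month  # per GB-month
    transfer = tb_transferred * rate_per_tb         # per TB moved
    return compute + storage + transfer

# One instance running a 720-hour month, 50 GB stored, 0.2 TB moved:
# 720*0.10 + 50*0.15 + 0.2*100.0 = 72.0 + 7.5 + 20.0 = 99.5
```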
          We define a four-layer architecture for Cloud Computing, in comparison to the Grid architecture, composed of 1) fabric, 2) unified resource, 3) platform, and 4) application layers. The fabric layer contains the raw hardware-level resources, such as compute, storage, and network resources. The unified resource layer contains resources that have been abstracted/encapsulated (usually by virtualization) so that they can be exposed to the upper layers and end users as integrated resources, for instance a virtual computer/cluster, a logical file system, or a database system. The platform layer adds a collection of specialized tools, middleware and services on top of the unified resources to provide a development and/or deployment platform, for instance a Web hosting environment or a scheduling service. Finally, the application layer contains the applications that run in the Clouds. Apart from the differences in business model and architecture, the following differences are worth noting.
  1. Resource Management:
    1. Compute Model: Grids use a batch-scheduled compute model. Due to expensive scheduling decisions, data staging in and out, and potentially long queue times, many Grids don’t natively support interactive applications. The resources in the Cloud are shared by all users at the same time (in contrast to dedicated resources governed by a queuing system), which should allow latency-sensitive applications to operate natively on the Cloud.
    2. Data Model: Internet computing in the future will be centered around data, spanning both Cloud Computing and Client Computing. The two will coexist and evolve hand in hand, and data management will become more and more important for both as data-centric applications grow.
    3. Data Locality: To achieve good scalability at Internet scales for Clouds, Grids, and their applications, data must be distributed over many computers, and computations
      must be steered towards the best place to execute in order to minimize communication costs. In Grids, data storage usually relies on a shared file system (e.g. NFS), where data locality cannot easily be exploited.
    4. Compute-data management: It is important to schedule computational tasks close to the data, and to understand the costs of moving the work as opposed to moving the data. Grids have largely been successful here, while Clouds still face challenges.
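The compute-data scheduling idea, steering work towards the data rather than the data towards the work, can be sketched as a simple cost comparison. Node names and data sizes here are invented for illustration.

```python
# Place a task on the node where moving the remaining input data is
# cheapest: cost of a node = GB of input NOT already stored there.
def best_node(data_location, total_input_gb):
    # data_location maps node -> GB of the task's input held locally.
    return min(data_location,
               key=lambda node: total_input_gb - data_location.get(node, 0))

# 100 GB of input, replicated unevenly across three (made-up) nodes:
replicas = {"node-a": 80, "node-b": 20, "node-c": 0}
# node-a already holds 80 GB, so only 20 GB must move there.
assert best_node(replicas, 100) == "node-a"
```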
    5. Virtualization: This is important for abstraction and encapsulation. Clouds need to run
      multiple (or even up to thousands or millions of) user applications, and all the applications appear to the users as if they were running simultaneously and could use all the available resources in the Cloud. Virtualization provides the necessary abstraction such that the underlying fabric (raw compute, storage, network resources) can be unified as a pool
      of resources and resource overlays (e.g. data storage services, Web hosting environments) can be built on top of them. Virtualization also enables each application to be encapsulated
      such that they can be configured, deployed, started, migrated, suspended, resumed, stopped, etc., and thus provides better security, manageability, and isolation. Grids do not rely on virtualization as much as Clouds do, but that might be more due to policy and having each individual organization maintain full control of their resources (i.e. by not virtualizing them). However, there are efforts in Grids to use virtualization as well.
    6. Monitoring: Grids have a different trust model, in which users, via identity delegation, can access and browse resources at different Grid sites; Grid resources are also not as highly abstracted and virtualized as in Clouds. Monitoring in Clouds requires a fine balance of business application monitoring, enterprise server management, virtual machine monitoring, and hardware maintenance, and will be a significant challenge for Cloud Computing.
  2. Programming Model: Grids primarily target large-scale scientific computations, and the programming model does not differ fundamentally from traditional parallel and distributed environments. Programming models used include the Message Passing Interface (MPI), which is somewhat similar to thread programming, and MapReduce. Clouds (such as Amazon Web Services and Microsoft’s Azure Services Platform) have generally adopted Web Services APIs, where users access, configure and program Cloud services using pre-defined APIs exposed as Web services; HTTP and SOAP are the common protocols chosen for such services. Google App Engine uses a modified Python runtime and chooses the Python scripting language for Web application development.
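The Web-services style of Cloud programming can be illustrated by constructing (not sending) an HTTP request against a hypothetical REST endpoint. The URL, action name, and parameters below are invented for illustration, not any real provider's API.

```python
# Building a request in the pre-defined-API, Web-services style:
# an "action" plus parameters encoded into an HTTP call.
import urllib.parse
import urllib.request

def build_launch_request(instance_type, count):
    params = urllib.parse.urlencode({
        "Action": "RunInstances",       # hypothetical API action
        "InstanceType": instance_type,  # hypothetical parameter names
        "Count": count,
    })
    url = "https://compute.example.com/api?" + params
    return urllib.request.Request(url, method="GET")

req = build_launch_request("small", 2)
```

A real client would also attach authentication credentials and actually send the request; the point here is only that the service is driven through a fixed, published API over HTTP.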
  3. Application Model: Grids support HPC (high-performance computing: tightly coupled parallel jobs with low-latency interconnects, typically programmed with MPI) and HTC (high-throughput computing: loosely coupled jobs). As Cloud Computing is still in its infancy, the applications that will run on Clouds are not well defined, but we can characterize them as loosely coupled, transaction-oriented (small tasks on the order of milliseconds to seconds), and likely interactive (as opposed to batch-scheduled, as they currently are in Grids).
    1. Evolutionary Computing
    2. Quantum Computing: Computations are based on the laws of quantum mechanics, which govern the behavior of particles at the sub-atomic level. In 1982, Feynman proposed the idea of creating machines based on the laws of quantum mechanics instead of the laws of classical physics. In 1985, David Deutsch developed the quantum Turing machine, showing that quantum circuits are universal. In 1994, Peter Shor came up with a quantum algorithm to factor very large numbers in polynomial time. In 1997, Lov Grover developed a quantum search algorithm with O(√N) complexity.
      1. Representation of data: A bit of data is represented by a single atom that is in one of two states denoted by |0> and |1>. A single bit of this form is known as a qubit. A qubit could be implemented using the two energy levels of an atom: an excited state representing |1> and a ground state representing |0>. A single qubit can be forced into a superposition of the two states, denoted by the addition of the state vectors:
        |ψ> = α|0> + β|1>
        where α and β are complex numbers and |α|² + |β|² = 1. A qubit in superposition is in both of the states |0> and |1> at the same time. A three-qubit register in a balanced superposition spans all 8 basis states:
        |ψ> = (1/√8)(|000> + |001> + |010> + |011> + ... + |111>)
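This representation can be sketched in plain Python: an n-qubit register is stored as 2^n complex amplitudes, and a balanced superposition gives every basis state the amplitude 1/√(2^n).

```python
# An n-qubit register as a list of 2**n complex amplitudes.
import math

def balanced_superposition(n_qubits):
    dim = 2 ** n_qubits
    amp = 1 / math.sqrt(dim)          # every basis state equally weighted
    return [complex(amp, 0)] * dim

def is_normalized(state):
    # The squared magnitudes of the amplitudes must sum to 1.
    return math.isclose(sum(abs(a) ** 2 for a in state), 1.0)

state = balanced_superposition(3)     # |000>, |001>, ..., |111>
assert len(state) == 8 and is_normalized(state)
```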
      2. Data Retrieval: In general, an n-qubit register can represent the numbers 0 through 2^n − 1 simultaneously. If we attempt to retrieve the values represented within a superposition, the superposition randomly collapses to represent just one of the original values. In the first equation above, α and β are probability amplitudes: |α|² gives the probability of the superposition collapsing to |0>. In a balanced superposition each amplitude is 1/√(2^n), where n is the number of qubits.
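Measurement collapse can be simulated by sampling one basis state with probability equal to the squared magnitude of its amplitude. A minimal sketch in plain Python:

```python
# Simulate measuring a register: basis state k is returned with
# probability |amplitude_k|^2.
import math
import random

def measure(state):
    r, acc = random.random(), 0.0
    for k, a in enumerate(state):
        acc += abs(a) ** 2
        if r < acc:
            return k
    return len(state) - 1   # guard against floating-point round-off

# A qubit with alpha = beta = 1/sqrt(2) collapses to 0 or 1,
# each with probability 1/2.
qubit = [complex(1 / math.sqrt(2)), complex(1 / math.sqrt(2))]
assert measure(qubit) in (0, 1)
```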
      3. Entanglement: is the ability of quantum systems to exhibit correlations between states within a superposition. Imagine two qubits, each in the state (1/√2)(|0> + |1>); we can entangle the two qubits such that the measurement of one qubit is always correlated to the measurement of the other qubit.
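This correlation can be demonstrated by simulating measurements of the Bell state (|00> + |11>)/√2: the states |01> and |10> have zero amplitude, so the two qubits' outcomes always agree. A plain-Python sketch:

```python
# Amplitudes over the basis |00>, |01>, |10>, |11>.
import math
import random

bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]

def measure(state):
    # Collapse to basis state k with probability |amplitude_k|^2.
    r, acc = random.random(), 0.0
    for k, a in enumerate(state):
        acc += abs(a) ** 2
        if r < acc:
            return k
    return len(state) - 1

for _ in range(100):
    k = measure(bell)            # only 0 (|00>) or 3 (|11>) can occur
    q0, q1 = (k >> 1) & 1, k & 1
    assert q0 == q1              # the two qubits always match
```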
      4. Operations on Data: Due to the nature of quantum physics, the destruction of information in a gate (i.e. a zero at the output) causes heat to be evolved, which can destroy the superposition of qubits. Quantum gates are similar to classical gates but do not have a degenerate output, i.e. their original input state can be derived uniquely from their output state. They must be reversible in order to perform deterministic computation. The simplest gate involves one qubit and is called a Hadamard gate; it maps each basis state to an equal superposition and is its own inverse, so two Hadamard gates in succession restore the original state. A gate which operates on two qubits is called a Controlled-NOT (CN) gate: if the bit on the control line is 1, the bit on the target line is inverted. The CN gate behaves similarly to the XOR gate, with some extra information to make it reversible. A gate which operates on three qubits is called a Controlled-Controlled-NOT (CCN) gate: if the bits on both control lines are 1, then the target bit is inverted. The CCN gate has been shown to be a universal reversible logic gate, as it can be used as a NAND gate.
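The gates above can be written as small matrices acting on amplitude vectors. A plain-Python sketch showing that the Hadamard gate is its own inverse and that CNOT flips the target qubit exactly when the control qubit is 1:

```python
import math

# Hadamard gate on one qubit (basis |0>, |1>).
H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

# CNOT on two qubits (basis |00>, |01>, |10>, |11>; first qubit controls).
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def apply(gate, state):
    # Matrix-vector product: new amplitudes from old.
    return [sum(gate[i][j] * state[j] for j in range(len(state)))
            for i in range(len(gate))]

zero = [1.0, 0.0]              # |0>
plus = apply(H, zero)          # (|0> + |1>)/sqrt(2)
back = apply(H, plus)          # H applied twice: back to |0>
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(back, zero))

assert apply(CNOT, [0, 0, 1, 0]) == [0, 0, 0, 1]   # |10> -> |11>
```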
    Shor's Algorithm: Shor’s algorithm shows (in principle) that a quantum computer is capable of factoring very large numbers in polynomial time. The algorithm depends on:
    1.  Modular Arithmetic
    2.  Quantum Parallelism
    3.  Quantum Fourier Transform
    In 2001, a 7-qubit machine was built and programmed to run Shor’s algorithm, successfully factoring 15. Can we expect quantum computers to solve NP-complete problems in polynomial time?
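The number-theoretic core of Shor's algorithm can be run classically on N = 15: find the period r of a^x mod N (the step a quantum computer performs efficiently via quantum parallelism and the quantum Fourier transform; here it is brute force), then gcd(a^(r/2) ± 1, N) yields a factor. A sketch:

```python
# Classical illustration of the modular-arithmetic core of Shor's
# algorithm; only the period-finding step is quantum in the real thing.
import math

def period(a, N):
    # Smallest r > 0 with a**r == 1 (mod N).
    x, r = a % N, 1
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def shor_classical(N, a):
    r = period(a, N)
    if r % 2 == 1:
        return None                       # odd period: retry with another a
    f = math.gcd(pow(a, r // 2) - 1, N)
    return f if 1 < f < N else None

# a = 7: the powers 7, 4, 13, 1 repeat with period 4,
# and gcd(7**2 - 1, 15) = gcd(48, 15) = 3, a factor of 15.
assert shor_classical(15, 7) == 3
```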
