

Thursday, May 14, 2015

EMC VPLEX: Extending VMware Functionality Across Data Centers

VPLEX is a storage virtualization appliance. It sits between hosts and storage arrays and virtualizes the presentation of the arrays, including non-EMC arrays; storage is then configured and presented to the host. It delivers data mobility and availability across arrays and sites. VPLEX is a virtual storage technology that enables mission-critical applications to remain up and running through a variety of planned and unplanned downtime scenarios. It permits painless, nondisruptive data movement, taking technologies like VMware and other clusters that were built assuming a single storage instance and enabling them to function across arrays and across distance.


VPLEX key use cases comprise:

  • Continuous operations – VPLEX enables active/active data centers with zero downtime
  • Migration/tech refresh – VPLEX provides accelerated and nondisruptive migrations and technology refresh
  • Oracle RAC functionality – VPLEX extends Oracle Real Application Clusters (RAC) and other clusters over distance
  • VMware functionality – VPLEX extends VMware functionality across distance while enhancing availability
  • MetroPoint Topology – VPLEX with EMC RecoverPoint delivers a 3-site continuous protection and operational recovery solution
The EMC VPLEX family includes three models:



EMC VPLEX Local:

EMC VPLEX Local delivers availability and data mobility across arrays. VPLEX is a continuous availability and data mobility platform that enables mission-critical applications to remain up and running during a variety of planned and unplanned downtime scenarios.


EMC VPLEX Metro:

EMC VPLEX Metro delivers availability and data mobility across sites. VPLEX Metro with AccessAnywhere enables active-active, block-level access to data between two sites within synchronous distances. Host application stability needs to be considered; depending on the application, it is recommended that Metro latency be kept at or below 5 ms. The combination of virtual storage with VPLEX Metro and virtual servers allows transparent movement of VMs and storage across longer distances and improves utilization across heterogeneous arrays and multiple sites.

EMC VPLEX Geo:

EMC VPLEX Geo delivers availability and data mobility across sites. VPLEX Geo with AccessAnywhere enables active-active, block-level access to data between two sites within asynchronous distances. Geo improves the cost efficiency of resources and power. It provides the same distributed device flexibility as Metro but extends the distance up to 50 ms of network latency.
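As a rough illustration of the latency guidance above, here is a minimal sketch (in Python, with an invented function name) that maps a measured inter-site latency to the VPLEX model the guidance would suggest. The 5 ms and 50 ms figures come from the text; everything else is an assumption for the example.

    # Hypothetical helper: map inter-site latency to a VPLEX deployment model,
    # using the guidance quoted above (<= 5 ms for Metro, <= 50 ms for Geo).
    def suggest_vplex_model(latency_ms: float) -> str:
        if latency_ms <= 5:       # synchronous distances: Metro is an option
            return "VPLEX Metro (active-active, synchronous)"
        if latency_ms <= 50:      # asynchronous distances: Geo extends the reach
            return "VPLEX Geo (active-active, asynchronous)"
        return "beyond Metro/Geo guidance - consider other replication options"

    print(suggest_vplex_model(3))    # VPLEX Metro (active-active, synchronous)
    print(suggest_vplex_model(35))   # VPLEX Geo (active-active, asynchronous)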

Saturday, May 9, 2015

Software Defined Storage Solution from Coho Data

For a long time, networking was defined by distributed protocols such as BGP, OSPF, MPLS, and STP. Each network device in the topology ran these protocols, and collectively they made the internet work, accomplishing the miraculous job of connecting the plethora of devices that make up the internet. However, the effort required to configure, troubleshoot, and maintain these devices was enormous, as was the cost of upgrading them every few years. Collectively, these costs compelled the networking industry to come up with a better solution.

SDN introduced the radical idea of separating the brain from the device, and the concept spread quickly across the networking industry. SDN brought centralized control to the network: the whole network can now be controlled from a single point. This centralized controller evaluates the entire topology and pushes instructions down to individual devices, making sure each device works as efficiently as possible. The controller can also single-handedly track resource utilization and respond to failures, minimizing downtime.
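To make the idea of centralized control concrete, here is a toy sketch (not any real controller's API; the topology, names and rule format are invented) in which a single controller computes a path over the whole topology and derives the forwarding rule for each switch on that path.

    from collections import deque

    # Toy topology: each switch lists its directly connected neighbours.
    topology = {"s1": ["s2", "s3"], "s2": ["s1", "s4"],
                "s3": ["s1", "s4"], "s4": ["s2", "s3"]}

    def shortest_path(src, dst):
        """Plain BFS - the controller sees the whole topology at once."""
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nxt in topology[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])

    # Turn the global path into per-switch forwarding rules and "push" them down.
    path = shortest_path("s1", "s4")
    rules = {hop: {"to s4": nxt} for hop, nxt in zip(path, path[1:])}
    print(path)    # ['s1', 's2', 's4']
    print(rules)   # {'s1': {'to s4': 's2'}, 's2': {'to s4': 's4'}}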

SDN simplified networking to a great extent. However, storage, which is complementary to networking, was still being implemented in the same old way. Coho Data, based in Sunnyvale, California, took on the effort of redefining storage using the concepts of software-defined networking, introducing a control-centric architecture to storage.

Here's a graphical representation of what the storage controller looks like:


SDSC (Software Defined Storage Controller) is the central decision-making engine that runs within the Coho cluster. It evaluates the system and makes decisions regarding two specific points of control: data placement and connectivity. At any point, the SDSC can respond to change by either moving client connections or by migrating data. These two knobs turn out to be remarkably powerful tools in making the system perform well.

The strong aspect of this solution is its modular nature. Not only are the storage devices new and innovative, the switching fabric has also been redesigned to facilitate data migration. The solution ensures that performance does not degrade as storage capacity scales.

Tiering in Coho Architecture:



Coho’s microarrays are directly responsible for implementing automatic tiering of data that is stored on them. Tiering happens in response to workload characteristics, but a simple characterization of what happens is that as the PCIe flash device fills up, the coldest data is written out to the lower tier. This is illustrated in the diagram below.

All new data writes go to NVMe flash. Effectively, this top tier of flash has the ability to act as an enormous write buffer, with the potential to absorb burst writes that are literally terabytes in size. Data in this top tier is stored sparsely at a variable block size.

As data in the top layer of flash ages and that layer fills, Coho’s operating environment (called Coast) actively migrates cold data to the lower tiers within the microarray. The policy for this demotion is device-specific: on our hybrid (HDD-backed) DataStore nodes, data is consolidated into linear 512K regions and written out as large chunks.  On repeated access, or when analysis tells us that access is predictive of future re-access, disk-based data is “promoted,” or copied back into flash so that additional reads to the chunk are served faster.
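The demote/promote behaviour described above can be summarized in a short sketch. This is only an illustration of the policy as the post describes it; the thresholds, data structures and function names are made up, and it is not Coho's actual implementation.

    # Illustrative tiering sketch: new writes land in flash; when flash fills,
    # the coldest chunks are demoted to disk; re-accessed chunks are promoted.
    FLASH_CAPACITY = 8     # hypothetical number of chunks the flash tier holds

    flash = {}             # chunk_id -> last access time
    disk = {}              # chunk_id -> last access time

    def _demote_if_full():
        while len(flash) > FLASH_CAPACITY:
            coldest = min(flash, key=flash.get)   # least recently accessed
            disk[coldest] = flash.pop(coldest)

    def write(chunk_id, now):
        flash[chunk_id] = now                     # all new writes go to flash
        _demote_if_full()

    def read(chunk_id, now):
        if chunk_id in disk:                      # repeated access: promote to flash
            flash[chunk_id] = disk.pop(chunk_id)
        if chunk_id in flash:
            flash[chunk_id] = now                 # refresh the access time
        _demote_if_full()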

Source: http://www.cohodata.com/blog/2015/03/18/software-defined-storage/

Tuesday, April 21, 2015

Software Defined Storage

SDS is a class of storage solutions that can be used with commodity storage media and compute hardware, where the storage media and compute hardware have no special intelligence embedded in them. All the intelligence for data management and access is provided by a software layer. The solution may provide some or all of the features of modern enterprise storage systems, such as scale-up and scale-out architecture, reliability and fault tolerance, high availability, unified storage management and provisioning, geographically distributed data center awareness and handling, disaster recovery, QoS, resource pooling, and integration with existing storage infrastructure. It may provide some or all data access methods: file, block, and object.

A generic data flow in an SDS solution is shown in the figure below:



VMware defines the Software-defined Storage Architecture as follows:

SDS is a new approach to storage that enables a fundamentally more efficient operational model. We can accomplish this by:
  • Virtualizing the underlying hardware through the Virtual Data Plane
  • Automating storage operations across heterogeneous tiers through the Policy-Driven Control Plane

Virtual Data Plane


In the VMware SDS model, the data plane, which is responsible for storing data and applying data services (snapshots, replication, caching, and more), is virtualized by abstracting physical hardware resources and aggregating them into logical pools of capacity (virtual datastores) that can be flexibly consumed and managed. By making the virtual disk the fundamental unit of management for all storage operations in the virtual datastores, exact combinations of resources and data services can be configured and controlled independently for each VM.

The VMware implementation of the virtual data plane is delivered through:
  • Virtual SAN – for x86 hyperconverged storage
  • vSphere Virtual Volumes – for external storage (SAN/NAS)
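As a loose illustration of the data-plane idea (physical devices aggregated into a logical pool, with the virtual disk as the unit of management), here is a small sketch; the class, field and device names are invented for the example and are not VMware APIs.

    # Invented example: aggregate physical devices into a logical capacity pool
    # and carve per-VM virtual disks, each with its own combination of services.
    class VirtualDatastore:
        def __init__(self, devices):
            self.capacity_gb = sum(devices.values())  # devices: name -> GB
            self.vdisks = []

        def provision_vdisk(self, vm, size_gb, services):
            used = sum(d["size_gb"] for d in self.vdisks)
            if used + size_gb > self.capacity_gb:
                raise RuntimeError("pool exhausted")
            self.vdisks.append({"vm": vm, "size_gb": size_gb, "services": services})

    pool = VirtualDatastore({"ssd-array-a": 2000, "jbod-b": 8000})
    pool.provision_vdisk("web-vm", 100, {"snapshots": True, "replication": False})
    pool.provision_vdisk("db-vm", 500, {"snapshots": True, "caching": True})
    print(pool.capacity_gb, [d["vm"] for d in pool.vdisks])  # 10000 ['web-vm', 'db-vm']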

Policy-Driven Control Plane

In the VMware SDS model, the control plane acts as the bridge between applications and infrastructure, providing standardized management and automation across different tiers of storage. Through SDS, storage classes of service become logical entities controlled entirely by software and interpreted through policies. Policy-driven automation simplifies provisioning at scale, enables dynamic control over individual service levels for each VM and ensures compliance throughout the lifecycle of the application. 

The policy-driven control plane is programmable via public APIs used to control policies via scripting and cloud automation tools, which in turn enable self-service consumption of storage for application tenants. 

The VMware implementation of the policy-driven control plane is delivered through:

  • Storage Policy-Based Management – provides management over external storage (SAN/NAS) through vSphere Virtual Volumes and over x86 storage through Virtual SAN.
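In the same spirit, a policy-driven control plane can be sketched as policies that are matched against datastore capabilities at provisioning time and re-checked for compliance later. The policy fields and capability names below are entirely made up for illustration and are not the SPBM schema.

    # Hypothetical example: a policy is a set of required capabilities; the
    # control plane picks a compliant datastore and can re-check compliance.
    datastores = {
        "vsan-gold":  {"type": "vsan", "replicas": 2, "flash": True},
        "nas-silver": {"type": "vvol", "replicas": 1, "flash": False},
    }

    def compliant(capabilities, policy):
        return all(capabilities.get(k) == v for k, v in policy.items())

    def place_vm(vm, policy):
        for name, capabilities in datastores.items():
            if compliant(capabilities, policy):
                return name
        raise RuntimeError(f"no datastore satisfies the policy for {vm}")

    gold_policy = {"replicas": 2, "flash": True}
    placement = place_vm("db-vm", gold_policy)
    print(placement)                                      # vsan-gold
    print(compliant(datastores[placement], gold_policy))  # True, still compliant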

Nutanix, another player in the field of software-defined storage, follows a similar approach, but the controller here is a separate VM running on top of the hypervisor, and the approach requires Nutanix hardware.


You can read more on software-defined storage in this ebook written by Scott Lowe.


Coho Data, based in Sunnyvale, California, uses an SDN-enabled data stream switch to connect VMs to storage implemented as MicroArray nodes containing PCIe flash and hard drives.

Data Hypervisor software on the MicroArray virtualizes the storage hardware to create a high-performance, bare-metal object store that scales to support different application needs without static storage tiers.

Coho Data Architecture: http://www.cohodata.com/coho-scale-out-storage-architecture





Sunday, April 19, 2015

Storage Area Network

Challenges with Directly Attached Storage:

1.  Storage remains isolated and underutilized.
2.  Complexity in sharing storage resources across multiple servers.
3.  High cost of managing information.
4.  Challenges in scalability.

An effective information management system must provide:

1. Timely information to business users
2. Flexible and resilient storage infrastructure.

A storage area network (SAN) provides such a solution.

A storage area network is a high-speed, dedicated network designed to deliver block-level storage to computers that are not directly connected to the storage devices or drive arrays. Unlike DAS (directly attached storage), the storage in a SAN is not owned by any single server but is accessible by all of the servers on the network.

Advantages of SAN:

Enables sharing of storage resources across multiple servers.
Centralizes storage and management.
Meets increasing storage demands efficiently with better economies of scale.

SAN Classification:

  • Fibre Channel (FC) SAN: uses the Fibre Channel protocol for communication.
  • IP SAN: uses IP-based protocols for communication.
  • Fibre Channel over Ethernet (FCoE) SAN: uses the FCoE protocol for communication.
Understanding Fibre Channel:

High-speed network technology: supports up to 16 Gbps
Highly scalable: accommodates approximately 15 million devices.

Components of FC SAN:

Node (server and storage) ports: Provide physical interface for communicating with other nodes.
Exist on:
    - HBAs in servers
    - Front-end adapters in storage arrays

Each port has a transmit (Tx) link and a receive (Rx) link

Cables:  

A SAN implementation uses:
    - Optical fiber cables for long distances
    - Copper cables for short distances

Two types of optical cables: 

Single-mode: Carries a single beam of light and carries signals up to 10 km
Multimode   : Can carry multiple beams of light simultaneously. Used for short distances

Connectors: 

Attached at the end of a cable
Enable swift connection and disconnection of the cable to and from a port
Commonly used connectors for fiber optic cables are:
      Standard Connector (SC): duplex connector
      Lucent Connector (LC): duplex connector
      Straight Tip (ST): simplex connector, commonly used with patch panels

Interconnecting Devices:

Commonly used interconnecting devices in an FC SAN are:
     - Hubs, switches and directors

Hubs provide limited connectivity and scalability
Switches and directors are intelligent devices
    - Switches are available with a fixed port count or a modular design
    - Directors are always modular, and their port count can be increased by inserting additional 'line cards' or 'blades'
    - High-end switches and directors contain redundant components
    - Both switches and directors have a management port for connecting to SAN management servers

SAN Management Software:

    - A suite of tools used in a SAN to manage interfaces between hosts and storage arrays
    - Provides integrated management of the SAN environment
    - Enables web-based management using a GUI or CLI

FC Interconnectivity Options:

Point-to-Point Connectivity: 
  • The simplest FC configuration; enables a direct connection between two nodes.
  • Offers limited connectivity and scalability.
  • Used in DAS environments.
FC-AL Connectivity:
  • Provides a shared loop to attached nodes: nodes must arbitrate to gain control of the loop
  • Implemented using a ring topology, or as a physical star using a hub
  • Limitations of FC-AL:
                - Only one device can perform an I/O operation at a time
                - Uses only 8 bits of the 24-bit Fibre Channel address; one address is reserved for connecting to an FC switch port, so it supports up to 126 nodes
                - Addition or removal of a node causes a momentary pause in loop traffic

FC-SW Connectivity:
  • Creates a logical space (called a fabric) in which all nodes communicate using switches; inter-switch links (ISLs) connect the switches.
  • Provides a dedicated path between nodes.
  • Addition/removal of a node does not affect the traffic of other nodes.
  • Each port has a unique 24-bit FC address.
Port Types in Switch Fabric:

A port provides the physical interface for a device to connect to other devices. The types are:

N_port: or node port, a port on an end device such as a host HBA or a storage array front-end adapter.
E_port: or expansion port, a switch port that connects to the E_port of another switch.
F_port: or fabric port, a switch port that connects to an N_port.
G_port: or generic port, a switch port that can operate as either an F_port or an E_port; the mode is chosen automatically.

Fibre Channel Protocol (FCP) Overview:

  • Traditional technologies such as SCSI have limited scalability and distance
  • Network technologies provide greater scalability and distance but have high protocol overhead.
  • FCP provides the benefits of both channel and network technologies:
         - High performance with low protocol overhead
         - High scalability with long-distance capability
  • Implements SCSI over an FC network
  • Storage devices attached to the SAN appear as local storage devices to the host operating system


Addressing in a Switched Fabric:

A server or disk array with an HBA reports itself to the fabric using Fabric Login (FLOGI), advertising its NWWN (Node World Wide Name). The FC switch replies with an FC ID for that device. This functionality is similar to DHCP.


An FC switch has a block of addresses assigned to it, represented by its domain ID.

The domain ID is a unique number given to each switch in the fabric. Domain IDs can be statically or dynamically configured. Since permission is required to assign a domain ID, domain IDs never overlap. One switch is elected as the principal switch, based on priority value and switch WWN; the lowest wins. Unlike DR/BDR election in OSPF, no backup principal switch is elected; if the principal switch dies, a new one is elected, and the failover is fast.
    - 239 addresses are available for domain IDs.
Maximum possible number of node ports in a switched fabric:
    - 239 domains * 256 areas * 256 ports = 15,663,104
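The 15,663,104 figure follows directly from the address structure: the 24-bit FC address is split into an 8-bit domain ID, an 8-bit area ID and an 8-bit port ID, with 239 usable domain values. A quick check in Python:

    # 24-bit FC address = 8-bit domain ID + 8-bit area ID + 8-bit port ID
    usable_domains = 239        # remaining domain values are reserved
    areas_per_domain = 2 ** 8   # 256
    ports_per_area = 2 ** 8     # 256
    print(usable_domains * areas_per_domain * ports_per_area)   # 15663104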

When there are multiple switches, FSPF (Fabric Shortest Path First) is used for routing, based on domain IDs; the Fibre Channel routing table is checked to forward frames.

Address Resolution using the Fibre Channel Name Server (FCNS):

The FCNS maintains a list of PWWNs and their FC IDs. The name server is run by the principal switch, and the FCNS database is distributed across all switches, so there is no need for a backup. As soon as a device gets the reply from the switch with its FC ID, the host sends a PLOGI message with its PWWN and FC ID, registering itself with the principal switch. For address resolution, a host sends a query to the FCNS with a PWWN, and the FCNS replies with the corresponding FC ID. Routing is thus based on the FC ID, which is a logical address; in that sense, Fibre Channel behaves like a layer-3 protocol.
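Putting the login and name-service steps together, here is a toy model of the flow described above: FLOGI hands out an FC ID from the switch's domain block, the node registers its PWWN, and later queries resolve a PWWN to an FC ID. The class, method names and address layout are simplified inventions for illustration, not a real FC stack.

    # Toy model of fabric login and FCNS resolution (illustration only).
    class FabricSwitch:
        def __init__(self, domain_id):
            self.domain_id = domain_id
            self.next_port = 0
            self.fcns = {}                  # PWWN -> FC ID (name server database)

        def flogi(self):
            """Fabric login: assign an FC ID out of this switch's domain block."""
            fc_id = (self.domain_id << 16) | self.next_port  # domain | area=0 | port
            self.next_port += 1
            return fc_id

        def register(self, pwwn, fc_id):
            """Registration after PLOGI: record the PWWN -> FC ID mapping."""
            self.fcns[pwwn] = fc_id

        def resolve(self, pwwn):
            """Name-service query: return the FC ID for a PWWN."""
            return self.fcns[pwwn]

    switch = FabricSwitch(domain_id=0x0A)
    fc_id = switch.flogi()
    switch.register("50:06:01:60:3b:20:11:22", fc_id)
    print(hex(switch.resolve("50:06:01:60:3b:20:11:22")))   # 0xa0000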