Monday 23 June 2014

Find all subdomains of a domain name

1. Use Wolfram Alpha

2. Use Google (see the URL-building sketch after this list)
http://www.google.com/search?num=100&q=site:*.domain.name/

3. Use Pentest-Tools
https://pentest-tools.com/reconnaissance/find-subdomains-of-domain
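For reference, the Google trick in item 2 is easy to script. The sketch below simply builds the same search URL with Python's standard library for a hypothetical example.com domain; fetching and parsing the result page is left out, since Google rate-limits automated queries.
  from urllib.parse import urlencode

  domain = "example.com"   # hypothetical domain used for illustration
  params = urlencode({"num": 100, "q": "site:*.%s/" % domain})
  print("http://www.google.com/search?" + params)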

Thursday 12 June 2014

Object storage and Block Storage

Chapter 6. Storage Decisions

Storage is found in many parts of the OpenStack stack, and the differing types can cause confusion to even experienced cloud engineers. This section focuses on persistent storage options you can configure with your cloud. It's important to understand the distinction between ephemeral storage and persistent storage.

Ephemeral Storage

If you deploy only the OpenStack Compute Service (nova), your users do not have access to any form of persistent storage by default. The disks associated with VMs are "ephemeral," meaning that (from the user's point of view) they effectively disappear when a virtual machine is terminated.

Persistent Storage

Persistent storage means that the storage resource outlives any other resource and is always available, regardless of the state of a running instance.
Today, OpenStack clouds explicitly support two types of persistent storage: object storage and block storage.

Object Storage

With object storage, users access binary objects through a REST API. You may be familiar with Amazon S3, which is a well-known example of an object storage system. Object storage is implemented in OpenStack by the OpenStack Object Storage (swift) project. If your intended users need to archive or manage large datasets, you want to provide them with object storage. In addition, OpenStack can store your virtual machine (VM) images inside of an object storage system, as an alternative to storing the images on a file system.
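As a concrete illustration of that REST-style access, here is a minimal sketch using the python-swiftclient bindings; the Keystone endpoint, credentials, container, and object names are placeholders for your own deployment.
  from swiftclient import client as swift_client

  # Placeholder credentials -- point these at your own Keystone endpoint and account.
  conn = swift_client.Connection(
      authurl="http://keystone.example.com:5000/v2.0",
      user="demo", key="secret", tenant_name="demo", auth_version="2")

  conn.put_container("backups")                              # create a container
  conn.put_object("backups", "notes.txt", contents=b"hello object storage")
  headers, body = conn.get_object("backups", "notes.txt")    # fetch it back
  print(headers.get("etag"), body)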
OpenStack Object Storage provides a highly scalable, highly available storage solution by relaxing some of the constraints of traditional file systems. In designing and procuring for such a cluster, it is important to understand some key concepts about its operation. Essentially, this type of storage is built on the idea that all storage hardware fails, at every level, at some point. Infrequently encountered failures that would hamstring other storage systems, such as issues taking down RAID cards or entire servers, are handled gracefully with OpenStack Object Storage.
A good document describing the Object Storage architecture is found within the developer documentation—read this first. Once you understand the architecture, you should know what a proxy server does and how zones work. However, some important points are often missed at first glance.
When designing your cluster, you must consider durability and availability. Understand that the predominant source of these is the spread and placement of your data, rather than the reliability of the hardware. Consider the default value of the number of replicas, which is three. This means that before an object is marked as having been written, at least two copies exist—in case a single server fails to write, the third copy may or may not yet exist when the write operation initially returns. Altering this number increases the robustness of your data, but reduces the amount of storage you have available. Next, look at the placement of your servers. Consider spreading them widely throughout your data center's network and power-failure zones. Is a zone a rack, a server, or a disk?
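To make the replica arithmetic concrete, the toy calculation below (an illustration only, not Swift's actual code) shows the write quorum for the default of three replicas and how raising the replica count trades usable capacity for durability; the raw-capacity figure is an assumption.
  replicas = 3                   # Swift's default replica count
  quorum = replicas // 2 + 1     # a write is acknowledged once a majority of copies exist
  raw_capacity_tb = 300          # hypothetical raw disk space in the cluster
  usable_tb = raw_capacity_tb / replicas
  print("write quorum: %d of %d copies" % (quorum, replicas))                # 2 of 3
  print("usable capacity: %.0f TB of %d TB raw" % (usable_tb, raw_capacity_tb))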
Object Storage's network patterns might seem unfamiliar at first. Consider these main traffic flows:
  • Among object, container, and account servers
  • Between those servers and the proxies
  • Between the proxies and your users
Object Storage is very "chatty" among servers hosting data—even a small cluster does megabytes/second of traffic, which is predominantly, “Do you have the object?”/“Yes I have the object!” Of course, if the answer to the aforementioned question is negative or the request times out, replication of the object begins.
Consider the scenario where an entire server fails and 24 TB of data needs to be transferred "immediately" to remain at three copies—this can put significant load on the network.
Another fact that's often forgotten is that when a new file is being uploaded, the proxy server must write out as many streams as there are replicas—giving a multiple of network traffic. For a three-replica cluster, 10 Gbps in means 30 Gbps out. Combining this with the high bandwidth demands of replication described above is what results in the recommendation that your private network be of significantly higher bandwidth than your public network needs to be. Oh, and OpenStack Object Storage communicates internally with unencrypted, unauthenticated rsync for performance—you do want the private network to be private.
The remaining point on bandwidth is the public-facing portion. The swift-proxy service is stateless, which means that you can easily add more and use HTTP load-balancing methods to share bandwidth and availability between them.
More proxies means more bandwidth, if your storage can keep up.

 Block Storage

Block storage (sometimes referred to as volume storage) provides users with access to block-storage devices. Users interact with block storage by attaching volumes to their running VM instances.
These volumes are persistent: they can be detached from one instance and re-attached to another, and the data remains intact. Block storage is implemented in OpenStack by the OpenStack Block Storage (cinder) project, which supports multiple backends in the form of drivers. Your choice of a storage backend must be supported by a Block Storage driver.
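For illustration, here is a rough sketch of creating a volume with the python-cinderclient bindings; the credentials and size are placeholders, and attaching the volume to an instance is then done through nova or the dashboard.
  from cinderclient import client as cinder_client

  # Placeholder Keystone credentials for illustration only.
  cinder = cinder_client.Client("2", "demo", "secret", "demo",
                                "http://keystone.example.com:5000/v2.0")

  vol = cinder.volumes.create(size=10, name="data-volume")   # a 10 GB volume
  print(vol.id, vol.status)   # starts as "creating", becomes "available"
  # Once attached (e.g. "nova volume-attach <server> <volume>"), the volume
  # shows up inside the guest as a block device such as /dev/vdb.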
Most block storage drivers allow the instance to have direct access to the underlying storage hardware's block device. This helps increase overall read/write I/O performance.
Experimental support for utilizing files as volumes began in the Folsom release. This initially started as a reference driver for using NFS with cinder. By the Grizzly release, this had expanded into a full NFS driver as well as a GlusterFS driver.
These drivers work a little differently than a traditional "block" storage driver. On an NFS or GlusterFS file system, a single file is created and then mapped as a "virtual" volume into the instance. This mapping/translation is similar to how OpenStack utilizes QEMU's file-based virtual machines stored in /var/lib/nova/instances.
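To illustrate that mapping, the sketch below mimics what the file-based drivers do conceptually: create a sparse file of the requested size on the mounted share, which the hypervisor is then handed as a virtual disk. The mount point, file name, and size are assumptions for illustration.
  import os

  share_mount = "/mnt/nfs-cinder"        # hypothetical mounted NFS or GlusterFS share
  volume_path = os.path.join(share_mount, "volume-0001")
  size_gb = 10

  # A sparse file occupies almost no space until data is written to it,
  # which is what makes file-backed volumes effectively thin-provisioned.
  with open(volume_path, "wb") as f:
      f.truncate(size_gb * 1024 ** 3)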

 OpenStack Storage Concepts

Table 6.1, “OpenStack storage” explains the different storage concepts provided by OpenStack.
Table 6.1. OpenStack storage
Ephemeral storage
  • Used to: run operating system and scratch space
  • Accessed through: a file system
  • Accessible from: within a VM
  • Managed by: OpenStack Compute (nova)
  • Persists until: VM is terminated
  • Sizing determined by: administrator configuration of size settings, known as flavors
  • Example of typical usage: 10 GB first disk, 30 GB second disk
Block storage
  • Used to: add additional persistent storage to a virtual machine (VM)
  • Accessed through: a block device that can be partitioned, formatted, and mounted (such as /dev/vdc)
  • Accessible from: within a VM
  • Managed by: OpenStack Block Storage (cinder)
  • Persists until: deleted by user
  • Sizing determined by: user specification in initial request
  • Example of typical usage: 1 TB disk
Object storage
  • Used to: store data, including VM images
  • Accessed through: the REST API
  • Accessible from: anywhere
  • Managed by: OpenStack Object Storage (swift)
  • Persists until: deleted by user
  • Sizing determined by: amount of available physical storage
  • Example of typical usage: 10s of TBs of dataset storage

 Choosing Storage Backends

Users will indicate different needs for their cloud use cases. Some may need fast access to many objects that do not change often, or want to set a time-to-live (TTL) value on a file. Others may access only storage that is mounted with the file system itself, but want it to be replicated instantly when starting a new instance. For other users, ephemeral storage—storage that is released when the VM attached to it is shut down—is the preferred way. When you select storage backends, ask the following questions on behalf of your users:
  • Do my users need block storage?
  • Do my users need object storage?
  • Do I need to support live migration?
  • Should my persistent storage drives be contained in my compute nodes, or should I use external storage?
  • What is the platter count I can achieve? Do more spindles result in better I/O despite network access?
  • Which one results in the best cost-performance scenario I'm aiming for?
  • How do I manage the storage operationally?
  • How redundant and distributed is the storage? What happens if a storage node fails? To what extent can it mitigate my data-loss disaster scenarios?
To deploy your storage by using only commodity hardware, you can use a number of open-source packages, as shown in Table 6.2, “Persistent file-based storage support”.
Table 6.2. Persistent file-based storage support
  • Swift: object
  • LVM: block
  • Ceph: object, block, file-level (experimental)
  • Gluster: object, block, file-level
  • NFS: block, file-level
  • ZFS: block
  • Sheepdog: block
This list of open source file-level shared storage solutions is not exhaustive; other open source solutions exist (MooseFS). Your organization may already have deployed a file-level shared storage solution that you can use.
Also, you need to decide whether you want to support object storage in your cloud. The two common use cases for providing object storage in a compute cloud are:
  • To provide users with a persistent storage mechanism
  • As a scalable, reliable data store for virtual machine images

Commodity Storage Backend Technologies

This section provides a high-level overview of the differences among the different commodity storage backend technologies. Depending on your cloud user's needs, you can implement one or many of these technologies in different combinations:
OpenStack Object Storage (swift)
The official OpenStack Object Store implementation. It is a mature technology that has been used for several years in production by Rackspace as the technology behind Rackspace Cloud Files. As it is highly scalable, it is well-suited to managing petabytes of storage. OpenStack Object Storage's advantages are better integration with OpenStack (integrates with OpenStack Identity, works with the OpenStack dashboard interface) and better support for multiple data center deployment through support of asynchronous eventual consistency replication.
Therefore, if you eventually plan on distributing your storage cluster across multiple data centers, if you need unified accounts for your users for both compute and object storage, or if you want to control your object storage with the OpenStack dashboard, you should consider OpenStack Object Storage. More detail can be found about OpenStack Object Storage in the section below.
Ceph
A scalable storage solution that replicates data across commodity storage nodes. Ceph was originally developed by one of the founders of DreamHost and is currently used in production there.
Ceph was designed to expose different types of storage interfaces to the end user: it supports object storage, block storage, and file-system interfaces, although the file-system interface is not yet considered production-ready. Ceph supports the same API as swift for object storage and can be used as a backend for cinder block storage as well as backend storage for glance images. Ceph supports "thin provisioning," implemented using copy-on-write.
This can be useful when booting from volume because a new volume can be provisioned very quickly. Ceph also supports keystone-based authentication (as of version 0.56), so it can be a seamless swap in for the default OpenStack swift implementation.
Ceph's advantages are that it gives the administrator more fine-grained control over data distribution and replication strategies, enables you to consolidate your object and block storage, enables very fast provisioning of boot-from-volume instances using thin provisioning, and supports a distributed file-system interface, though this interface is not yet recommended for use in production deployment by the Ceph project.
If you want to manage your object and block storage within a single system, or if you want to support fast boot-from-volume, you should consider Ceph.
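As a small illustration of Ceph's object interface from Python, the sketch below uses the rados bindings shipped with Ceph; the configuration path and pool name are assumptions, and the block (RBD) and Swift-compatible gateway interfaces sit on top of the same cluster.
  import rados

  # Assumes a reachable Ceph cluster and an existing pool named "data".
  cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
  cluster.connect()
  ioctx = cluster.open_ioctx("data")
  try:
      ioctx.write_full("greeting", b"hello from ceph")   # store an object
      print(ioctx.read("greeting"))                      # read it back
  finally:
      ioctx.close()
      cluster.shutdown()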
Gluster
A distributed, shared file system. As of Gluster version 3.3, you can use Gluster to consolidate your object storage and file storage into one unified file and object storage solution, which is called Gluster For OpenStack (GFO). GFO uses a customized version of swift that enables Gluster to be used as the backend storage.
The main reason to use GFO rather than regular swift is if you also want to support a distributed file system, either to support shared storage live migration or to provide it as a separate service to your end users. If you want to manage your object and file storage within a single system, you should consider GFO.
LVM
The Logical Volume Manager is a Linux-based system that provides an abstraction layer on top of physical disks to expose logical volumes to the operating system. The LVM backend implements block storage as LVM logical partitions.
On each host that will house block storage, an administrator must initially create a volume group dedicated to Block Storage volumes. Blocks are created from LVM logical volumes.
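A minimal sketch of that one-time setup is shown below, assuming a spare disk at /dev/sdb; the volume group name cinder-volumes matches the Block Storage default, but both names are placeholders for your environment.
  import subprocess

  device = "/dev/sdb"            # spare disk dedicated to Block Storage (assumption)
  vg_name = "cinder-volumes"     # default volume_group name expected by cinder

  subprocess.check_call(["pvcreate", device])            # mark the disk as an LVM physical volume
  subprocess.check_call(["vgcreate", vg_name, device])   # create the dedicated volume group
  # cinder then carves individual volumes out of this group using lvcreate.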
Note: LVM does not provide any replication. Typically, administrators configure RAID on nodes that use LVM as block storage to protect against failures of individual hard drives. However, RAID does not protect against a failure of the entire host.
ZFS
The Solaris iSCSI driver for OpenStack Block Storage implements blocks as ZFS entities. ZFS is a file system that also has the functionality of a volume manager. This is unlike on a Linux system, where there is a separation of volume manager (LVM) and file system (such as ext3, ext4, xfs, and btrfs). ZFS has a number of advantages over ext4, including improved data-integrity checking.
The ZFS backend for OpenStack Block Storage supports only Solaris-based systems, such as Illumos. While there is a Linux port of ZFS, it is not included in any of the standard Linux distributions, and it has not been tested with OpenStack Block Storage. As with LVM, ZFS does not provide replication across hosts on its own; you need to add a replication solution on top of ZFS if your cloud needs to be able to handle storage-node failures.
We don't recommend ZFS unless you have previous experience with deploying it, since the ZFS backend for Block Storage requires a Solaris-based operating system, and we assume that your experience is primarily with Linux-based systems.

 Conclusion

We hope that you now have some considerations in mind and questions to ask your future cloud users about their storage use cases. As you can see, your storage decisions will also influence your network design for performance and security needs. Continue with us to make more informed decisions about your OpenStack cloud design.

Monday 9 June 2014

What is Direct-IO

A file is simply a collection of data stored on media. When a process wants to access data from a file, the operating system brings the data into main memory, where the process reads it, alters it, and stores it back to disk. The operating system could read and write data directly to and from the disk for each request, but the response time and throughput would be poor due to slow disk access times. The operating system therefore attempts to minimize the frequency of disk accesses by buffering data in main memory, within a structure called the file buffer cache.
Certain applications derive no benefit from the file buffer cache. Databases normally manage data caching at the application level, so they do not need the file system to implement this service for them. The use of a file buffer cache results in undesirable overhead in such cases, since data is first moved from the disk to the file buffer cache and from there to the application buffer. This "double copying" of data costs extra CPU cycles and consumes additional memory.

For applications that wish to bypass this buffering within the file system cache, Direct I/O is provided. When Direct I/O is used for a file, data is transferred directly between the disk and the application buffer, without going through the file buffer cache. Direct I/O can be enabled for a file either by mounting the corresponding file system with a direct I/O option (the option differs for each OS), or by opening the file with the O_DIRECT flag specified in the open() system call. Direct I/O benefits such applications by reducing CPU consumption and eliminating the overhead of copying data twice – first between the disk and the file buffer cache, and then from the file buffer cache to the application buffer.
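Here is a small sketch of the O_DIRECT path on Linux. Direct I/O requires the buffer, file offset, and transfer size to be aligned (typically to 512 bytes or the file-system block size), which is why the example reads into a page-aligned mmap buffer rather than an ordinary bytes object; the file name is a placeholder.
  import mmap
  import os

  BLOCK = 4096                                          # aligned transfer size
  fd = os.open("datafile", os.O_RDONLY | os.O_DIRECT)   # bypass the file buffer cache
  f = os.fdopen(fd, "rb", buffering=0)                  # unbuffered file object around the fd
  buf = mmap.mmap(-1, BLOCK)                            # anonymous mmap gives a page-aligned buffer
  try:
      nread = f.readinto(buf)                           # data moves disk -> buf, no page-cache copy
      data = buf[:nread]
  finally:
      buf.close()
      f.close()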
However, Direct I/O also has performance costs: it bypasses file-system read-ahead, so workloads that benefit from prefetching (such as large sequential reads) can be slower with Direct I/O enabled.

Hardware RAID, Software RAID, ZFS

Q: Which is best: hardware RAID, software RAID, or ZFS?
A:
 Short answer: If your drives are 500GB or less, pick whichever you like / can afford. If you are using drives larger than 500GB use ZFS. No matter what you decide to use, remember to backup your data!
Still have questions? Read on:
If you have been a system administrator for any reasonable length of time, you've had a server with a hardware RAID controller report a failed disk. You replace the failed disk with a new one, another drive fails during the rebuild, the array is lost, and you have no choice but to restore your data from backup. This happens either because the controller is bad or, more likely, because of silent corruption of data on one of the other disks in the array. Silent corruption of data, colloquially referred to as "bit rot", is more frequent these days as hard drives store larger amounts of data. I have personally seen this happen on several occasions over the last 20 years, most frequently within the last 5, so it is natural to wonder whether the pros of hardware RAID really outweigh the cons for a general-purpose server or NAS device.
What follows is not meant to be the last word on this subject, nor a scholarly article. My goal is merely to help NAS4Free users make a fast, informed decision by collecting the most important information from the last few years in one place. When you are done reading everything here you should know exactly why you have chosen one solution over the others. Different people have different needs and different budgets; there is no single solution which is right for everyone, but it is possible to arrive at some general recommendations that will be best for most people. Those recommendations are the short answer above. If you are looking for "How To" type answers or have specific questions about configuring or managing SoftRAID or ZFS arrays, you should look elsewhere in the Wiki.
Pros of Hardware RAID
  • When configured and monitored properly, improves server uptime and availability, which would otherwise suffer from common, predictable disk failures.
  • Easy to set up – Some controllers have a menu driven wizard to guide you through building your array or automatically set up right out of the box.
  • Easy to replace disks – If a disk fails just pull it out and replace it with a new one assuming all other hardware is hot-plug compatible.
  • Performance improvements (sometimes) – If you are using an underpowered CPU, a HW RAID card with a decent cache and BBU (with write-back enabled, of course) will generate significantly better throughput. However, this is only a benefit if you are running tremendously intense workloads (transferring data faster than 1 Gbps or hosting apps doing constant reads/writes); the typical NAS4Free server is not used this way, though it could be.
  • Some controllers have battery backups that help reduce the possibility of data damage/loss during a power outage.
Cons of Hardware RAID
  • Proprietary – Minimal or complete lack of detailed hardware and software specifications/instructions. You are stuck with this vendor.
  • On-Disk meta data can make it near impossible to recover data without a compatible RAID card – If your controller dies you’ll have to find a compatible model to replace it with, your disks won’t be useful without the controller. This is especially bad if working with a discontinued model that has failed after years of operation.
  • Monitoring applications are all over the road – Depending on the vendor and model, the ability and interface to monitor the health and performance of your array vary greatly. Often you are tied to a specific piece of software that the vendor provides.
  • Additional cost – Hardware RAID cards cost more than standard disk controllers. High end models can be very expensive.
  • Lack of standardization between controllers (configuration, management, etc.) – The keystrokes, nomenclature, and software that you use to manage and monitor one card likely won't work on another.
  • Inflexible – Your ability to create, mix and match different types of arrays and disks and perform other maintenance tasks varies tremendously with each card.
  • Hardware RAID is not backup; if you disagree, you need to read the definition again. Even the most expensive hardware RAID is so much junk without a complete and secure backup of your data, which would be necessary following a catastrophic, unpredictable failure.
Pros of Software RAID
  • When configured and monitored properly, ZFS and SoftRAID can improve server uptime and availability, which would otherwise suffer from common, predictable disk failures.
  • ZFS and similar file systems (like Btrfs) were deliberately developed as software systems and for the most part do not benefit from dedicated hardware controllers.
  • Development and innovation in ZFS, Btrfs, and others is active and ongoing.
  • Hardware independent – RAID is implemented in the OS which keeps things consistent regardless of the hardware manufacturer.
  • Easy recovery in case of motherboard or controller failure – With software RAID you can swap the drives to a different server and read the data. There is no vendor lock-in with a software RAID solution.
  • Standardized RAID configuration (for each OS) – The management toolkit is OS specific rather than specific to each individual type of RAID card.
  • Standardized monitoring (for each OS) – Since the toolkit is consistent you can expect to monitor the health of each of your servers using the same methods.
  • Removing and replacing a failed disk is typically very simple and you may not even have to reboot the server if all your hardware is hot-plug compatible.
  • Good performance – CPUs just keep getting faster and faster. Unless you are performing tremendous amounts of I/O, the extra cost of a hardware controller just doesn't seem worthwhile. Even then, it takes milliseconds to read/write data but only nanoseconds to compute parity, so until drives can read/write much faster, the overhead is minimal for a modern processor (see the XOR-parity sketch after this list).
  • Very flexible – Software RAID allows you to reconfigure your arrays in ways that are generally not possible with hardware controllers. Of course, not all configurations are useful; some can be downright dumb.
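To back up the parity point above, this toy sketch computes RAID5-style parity over a few data blocks with plain XOR and then reconstructs a "failed" block; it illustrates the arithmetic involved, not any particular RAID implementation.
  from functools import reduce

  def xor_blocks(a, b):
      return bytes(x ^ y for x, y in zip(a, b))

  # Three equally sized data blocks, standing in for stripe chunks on three disks.
  data = [bytes([i]) * 8 for i in (1, 2, 3)]
  parity = reduce(xor_blocks, data)        # what a RAID5 set stores on the parity chunk

  # Pretend the second disk died: rebuild its chunk from the survivors plus parity.
  rebuilt = reduce(xor_blocks, [data[0], data[2], parity])
  assert rebuilt == data[1]
  print("reconstructed block matches the original")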
Cons of Software RAID
  • Need to learn the software RAID tool set for each OS – These tools are often well documented but often more difficult to learn and understand than their hardware counterparts. Flexibility does have its price.
  • Possibly slower performance than dedicated hardware – A high end dedicated RAID card might match or outperform software RAID if you are willing to spend over $500.00 for it.
  • Development of SoftRAID0 through 6 has pretty much stopped as focus moves to newer technologies like ZFS and Btrfs.
  • Additional load on CPU – RAID operations have to be calculated somewhere and in software this will run on your CPU instead of dedicated hardware. While I have yet to see real-world performance degradation it is possible some servers/applications may be more or less affected. As you add more drives/arrays you should keep in mind the minimum processor/RAM needs for the server as a whole. You need to test your particular use case to ensure everything works well.
  • Additional RAM requirements - particularly if using ZFS and some of its high end features, but not for plain old SoftRAID.
  • Susceptible to power outages - A UPS is strongly recommended when using SoftRAID as data loss or damage could occur if power is lost. ZFS is less susceptible, but I would still recommend a UPS.
  • Disk replacement sometimes requires prep work - You typically should tell the software RAID system to stop using a disk before you yank it out of the system. I’ve seen systems panic when a disk was physically removed before being logically removed. This is of course minimized if all your hardware is hot-plug compatible.
  • ZFS and software RAID are not backup; if you disagree, you need to read the definition again. Even the largest and cheapest SoftRAID is so much junk without a complete and secure backup of your data, which would be necessary following a catastrophic, unpredictable failure.
It will be pretty obvious to most people after weighing the pros and cons above that SoftRAID is the better choice for most NAS4Free users. This is true based on price/performance even before we consider stability and longevity of your data.
Now that we've established Software RAID is the way to go, what do you choose? SoftRAID1, SoftRAID5, 6? Maybe a hybrid SoftRAID 10 or 50? What about ZFS?
As drive capacities increase, so does silent data corruption, or "bit rot". You must understand what this is to properly assess the odds of a catastrophic loss of data happening to you. Silent data corruption is not a new concept; it has been well discussed for years that RAID5 is pretty useless and RAID6 soon will be; see "Why RAID 5 stops working in 2009", also archived here. Read the article; it's about 2 pages and easy to understand, even though you have to use powers of 10 to do the math. The takeaway is: don't waste your money or your time on RAID5 or RAID6 if you really care about your data. Personally I have never trusted my data to either.
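To put rough numbers on the article's argument, the back-of-the-envelope sketch below estimates the chance of reading an entire array during a rebuild without hitting a single unrecoverable read error (URE); the 1-in-10^14-bits error rate is a typical consumer-drive spec, and the 12 TB figure is an assumption for illustration.
  URE_PER_BIT = 1e-14     # typical consumer-drive unrecoverable read error rate
  rebuild_tb = 12         # data that must be read to rebuild the array (assumption)

  bits_to_read = rebuild_tb * 1e12 * 8
  p_clean = (1.0 - URE_PER_BIT) ** bits_to_read
  print("chance of a rebuild with no read error: about %.0f%%" % (100 * p_clean))
  # Roughly 38% for 12 TB -- the rebuild is more likely to hit an error (and, on
  # RAID5, fail outright) than to complete cleanly.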
Since you now understand that RAID5 and RAID6 can no longer be counted on to protect your data, you are left with just a few choices.
  1. Hybrid SoftRAID - Not really a good option due to cost and management issues for most people, but still valuable in certain specific scenarios.
  2. SoftRAID1 - I have always used SoftRAID1 with a good backup and it has always been cheap and easy to manage.
  3. ZFS - Has a big advantage over plain old SoftRAID: data integrity is built in from the start, while SoftRAID has none. Data integrity is the big advantage of ZFS, and to learn more about it you should read A Conversation with Jeff Bonwick and Bill Moore - The future of file systems, also archived here.
Now that we are starting to use 3TB drives on a regular basis, bit rot becomes an even bigger issue.
How big? Well, the science is not yet fully settled because testing this stuff is so boring. You should read Keeping Bits Safe: How Hard Can It Be? by David S.H. Rosenthal, published in the monthly magazine of the Association for Computing Machinery (ACM). I will summarize as best I can: a NetApp study encompassing data collected over 41 months from NetApp's filers in the field, covering more than 1.5 million drives, found more than 400,000 silent corruption incidents. A CERN study that wrote and read over 88,000TB of data within a 6-month period found .00017TB was silently corrupted. On average, it is not unreasonable to expect roughly 3 corrupt files on a 1TB drive according to another CERN study; see CERN's data corruption research, also archived here. Yes, the numbers appear small, but these studies were conducted in 2007/08, before we had 3TB hard drives. The numbers have not gotten better; they get worse with increased capacities and longer time frames.
Stop and think about it, when you had 400GB of data spread among a bunch of 500GB drives, how likely was it you would lose everything? Now that you have 2.5TB of data spread among a bunch of 3TB drives, how likely is it you could lose everything? Knowing that MTBFs and error rates have not improved, yet capacities have increased 6X or more, you've got to ask yourself one question, “Do I feel lucky?”.
The bigger and older your drives are the more likely you are to suffer a failure or silent corruption. So save your time and money, data integrity is only one of the many benefits of ZFS, choose it from the start and avoid problems later. Don't forget you still need a backup even if you are using ZFS.