Thursday, 28 February 2019

Industry Practices and Tools - Part 2

Importance of Maintaining The Quality of Code

Good-quality code is an essential property of software: when code quality is poor, it can lead to financial losses or to time wasted on further maintenance, modification, or adjustment.

Readability

The ability of the code to be understood easily, quickly, and clearly by someone new or by someone who hasn't seen it in a while.
It ensures that everyone can understand code written by everyone else.
If the code is messy and badly written, it is very hard to understand what it does and where changes need to be made.
Developers can waste a lot of time figuring out how it all fits together before taking any action, and may even end up rewriting it on the assumption that it is buggy and carelessly written.

Efficiency

Directly related to the performance and speed of running the software.
The quality of the software can be evaluated with the efficiency of the code used.
No one likes to use software that takes too long to perform an action.

Reliability

Ability to perform consistent and failure-free operations every time it runs.
Software is far less useful if the code behaves differently on each run, even with the same input in the same environment, or if it breaks down often without reporting any errors.

Robustness

Ability to cope with errors during program execution, even under unusual conditions.
Imagine how you would feel using software that keeps showing strange and unfamiliar messages whenever you do something wrong.
Software is typically buggy and fragile, but it should handle any errors it encounters gracefully.

Portability

Ability of the code to be run on as many different machines and operating systems as possible.
It is a waste of time and energy for programmers to rewrite the same code whenever it is moved from one environment to another.

Maintainability

Code to which it is easy to add new features, modify existing features, or fix bugs with minimal effort and without the risk of affecting other related modules.
Software always needs new features or bug fixes, so the code must be easy to understand, easy to search for what needs to change, easy to change, and easy to check that the changes have not introduced any bugs.

Different Approaches Used to Measure The Quality of Code

Reliability

Reliability measures the probability that a system will run without failure over a specific period of operation.
It relates to the number of defects and availability of the software.
Number of defects can be measured by running a static analysis tool.
Software availability can be measured using the mean time between failures (MTBF).
Low defect counts are especially important for developing a reliable codebase.
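
As a rough illustration of the availability measure mentioned above, MTBF is commonly computed as total operating time divided by the number of failures observed in that time. The sketch below assumes that definition; the function name and the sample figures are hypothetical.

```python
def mean_time_between_failures(operating_hours: float, failures: int) -> float:
    """Return MTBF in hours: total operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one observed failure")
    return operating_hours / failures


# Hypothetical figures: 1,000 hours of operation with 4 failures.
mtbf = mean_time_between_failures(1000, 4)
print(mtbf)  # 250.0
```

A higher value means the software runs longer, on average, between failures.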

Testability

Testability measures how well the software supports testing efforts. It relies on how well you can control, observe, isolate, and automate testing, among other factors.
Testability can be measured based on how many test cases you need to find potential faults in the system.
Size and complexity of the software can impact testability.
So, applying methods at the code level such as cyclomatic complexity can help you improve the testability of the component.
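
To make the idea of cyclomatic complexity more concrete, here is a minimal sketch that approximates it for a piece of Python source as one plus the number of decision points. It is a simplification and not the algorithm of any particular analysis tool: the set of node types counted and the function name are assumptions, and constructs such as comprehensions are ignored.

```python
import ast

# Node types treated as decision points in this simplified count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approximate_cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra path per operator.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = """
def classify(n):
    if n < 0:
        return "negative"
    for d in (2, 3):
        if n % d == 0 and n != d:
            return "composite"
    return "other"
"""

print(approximate_cyclomatic_complexity(SAMPLE))  # 5
```

The higher the number, the more independent paths there are through the code, and the more test cases are needed to cover them.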

Maintainability

Maintainability measures how easily software can be maintained.
It relates to the size, consistency, structure, and complexity of the codebase.
And ensuring maintainable source code relies on a number of factors, such as testability and understandability.
You can’t use a single metric to ensure maintainability.
Some metrics you may consider to improve maintainability are number of stylistic warnings and Halstead complexity measures.
Both automation and human reviewers are essential for developing maintainable codebases.

Portability

Portability measures how usable the same software is in different environments.
It relates to platform dependency.
There isn’t a specific measure of portability. But there are several ways you can ensure portable code.
It’s important to regularly test code on different platforms, rather than waiting until the end of development.
It’s also a good idea to set your compiler warning levels as high as possible and use at least two compilers.
Enforcing a coding standard also helps with portability.

Reusability

Reusability measures whether existing assets such as code can be used again.
Assets are more easily reused if they have characteristics such as modularity or loose coupling.
Reusability can be measured by the number of interdependencies.
Running a static analyzer can help you identify these interdependencies.

Available Tools to Maintain Code Quality

Collaborator

Collaborator is a comprehensive peer code review tool, built for teams working on projects where code quality is critical.
See code changes, identify defects, and make comments on specific lines. Set review rules and automatic notifications to ensure that reviews are completed on time.
Custom review templates are unique to Collaborator. Set custom fields, checklists, and participant groups to tailor peer reviews to your team’s ideal workflow.
Easily integrate with 11 different SCMs, as well as IDEs like Eclipse & Visual Studio.
Build custom review reports to drive process improvement and make auditing easy.
Conduct peer document reviews in the same tool so that teams can easily align on requirements, design changes, and compliance burdens.

Review Assistant

Review Assistant is a code review tool.
This code review plug-in helps you to create review requests and respond to them without leaving Visual Studio.
Review Assistant supports TFS, Subversion, Git, Mercurial, and Perforce.
Simple setup: up and running in 5 minutes.

Key Features

Flexible code reviews
Discussions in code
Iterative review with defect fixing
Team Foundation Server integration
Flexible email notifications
Rich integration features
Reporting and Statistics
Drop-in Replacement for Visual Studio Code Review Feature and much more

Codebrag

Codebrag is a simple, lightweight, free and open source code review tool that makes reviews engaging and structured.
Codebrag offers features such as non-blocking code review, inline comments and likes, and smart email notifications.
With Codebrag, a team can focus on its workflow to find and eliminate issues while learning and working together.
Codebrag helps deliver better software through agile code review.
The open source version of Codebrag is licensed under the AGPL.

Gerrit

Gerrit is a free, web-based code review tool that software developers use to review code in a web browser and to approve or reject changes.
Gerrit provides repository management for Git.
It integrates with Git, a distributed version control system.
Gerrit is also used to discuss specific segments of the code and to agree on the right changes to make.
With Gerrit, project members get a streamlined code review process and a highly configurable hierarchy.

Codestriker

Codestriker is a free and open source web application that supports collaborative online code review.
Using Codestriker, one can record issues, comments, and decisions in a database, which can then be used for code inspections.
Codestriker also supports traditional document review. It can be integrated with ClearCase, Bugzilla, CVS, etc.
Codestriker is licensed under GPL.

Veracode

Veracode (now acquired by CA Technologies) is a company that delivers solutions for automated and on-demand application security testing, automated code review, and more.
Developers use Veracode to create secure software by scanning binary code or bytecode instead of source code.
Using Veracode, one can identify improperly implemented encryption, malicious code, and backdoors in an application.
Veracode can review a large amount of code and return results quickly.
There is no need to buy any software or hardware to use Veracode; you pay only for the analysis services you need.

Dependency/Package Management Tools

Most digital services will rely on some third-party code from other software to work properly.
This is called a dependency.
A dependency is something you rely upon to achieve a goal but that is not under your control.
Project dependencies come in many shapes and sizes.
They include hardware, software, resources, and people. Your software project will rely on a large number of dependencies, regardless of the size of your technology stack or the available human and financial resources.
It is tempting to look at each dependency as a single independent unit.
However, this type of isolated dependency is rare. In general, each dependency is part of a complex web of interconnected relationships that is hard to untangle.
You’ll need to manage any dependencies in your service carefully to keep your code up to date, your system secure, and your service working as intended.
A common approach, especially in the open source community, is to use a dependency management tool.
This pulls in third-party dependencies automatically at runtime, deploy time or compile time.
Even if you’re using a dependency management tool, you shouldn’t just trust a dependency without testing it first.
This includes how secure it is. For example, if a library used to generate a web form introduces an SQL injection vulnerability, then your acceptance tests should fail.
You will need to trust the specific code and version you are using, not just the general library or framework it belongs to.
As demonstrated in the graphic above, the typical number of dependencies for a representative open source package varies widely between ecosystems.
In the modern era of software development, developers incorporate many distinct open source packages into their applications to help speed up development time and improve software quality.
The actual counts can vary across developers and ecosystems; for example, a Python developer might be less likely to pull in dozens of additional packages than a JavaScript developer would.
But regardless of the programming language, each new package pulls in a network of additional (so called transitive) dependencies of its own.
As we illustrate above, the average package in most ecosystems pulls in an additional five dependencies, adding to the overall software complexity.
Software today is like an iceberg: you may actively pull in just a few dependencies yourself, but those known dependencies are only a small percentage of your actual dependency tree.
The additional dependencies brought in by packages that your application relies on are equally important to the security, licensing, and future performance of your software.
The open source that we all rely on extends far beyond the first layer of packages in our applications.
We should seek to understand and ensure the health of every package we use, whether hidden in our transitive dependencies or not.
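
The iceberg effect described above can be sketched in a few lines: given a map of each package to its direct dependencies, walking the graph reveals everything that is pulled in transitively. The package names and the graph below are purely hypothetical, and real dependency managers also resolve versions, which this sketch ignores.

```python
from __future__ import annotations


def transitive_dependencies(package: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every package reachable from `package`, excluding itself."""
    seen: set[str] = set()
    stack = list(graph.get(package, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        stack.extend(graph.get(dep, []))
    seen.discard(package)  # guard against circular dependencies
    return seen


# Hypothetical dependency graph: package -> direct dependencies.
GRAPH = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["html-escaper"],
    "logger": [],
}

direct = set(GRAPH["my-app"])
everything = transitive_dependencies("my-app", GRAPH)
print(sorted(direct))               # ['logger', 'web-framework']
print(sorted(everything - direct))  # ['html-escaper', 'http-parser', 'template-engine']
```

In this made-up example the application declares two dependencies but actually relies on five packages.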
Let’s look at some common dependency management tools.

NuGet

NuGet is the package manager for the Microsoft development platform including .NET.
The NuGet client tools provide the ability to produce and consume packages.
The NuGet Gallery is the central package repository used by all package authors and consumers.
When you use NuGet to install a package, it copies the library files to your solution and automatically updates your project (add references, change config files, etc.).
If you remove a package, NuGet reverses whatever changes it made so that no clutter is left.

Composer

This dependency manager for PHP lets you create a composer.json file in your project root, run a single command, and have all your dependencies downloaded and ready to use.
Composer is not a package manager in the same sense as Yum or Apt are.
Yes, it deals with “packages” or libraries, but it manages them on a per-project basis, installing them in a directory (e.g. vendor) inside your project.
By default it does not install anything globally. Thus, it is a dependency manager.
It does however support a “global” project for convenience via the global command.

Nanny

Nanny is a dependency management tool for managing dependencies between your projects.
Unlike tools like Maven, Nanny can be used for arbitrary dependencies and is easy to use.
Nanny lets you specify dependencies to your project, and Nanny will go ahead and pull in all the dependencies (and everything those dependencies are dependent on) into the _deps folder in your project.
Nanny makes it easy to create dependencies and manage dependency versions.

Bower

Bower is a package manager for the web.
Bower lets you easily install assets such as images, CSS and JavaScript, and manages dependencies for you.
Bower can manage components that contain HTML, CSS, JavaScript, fonts or even image files.
Bower doesn’t concatenate or minify code or do anything else.
It just installs the right versions of the packages you need and their dependencies.

Pintjs

Pint is a small, asynchronous, dependency aware wrapper around Grunt attempting to solve some of the problems that accompany a build process at scale.
A typical Gruntfile starts with, at a minimum, some variation of: jsHint, jasmine, LESS, handlebars, uglify, copy, and clean stack.
Just these half dozen or so plugins can balloon your Gruntfile to upwards of 300 lines, and adding complex concatenation, cache busting, and versioning can cause it to grow well past 1,000 lines.
Pint allows you to break up and organize your build into small testable pieces.

Jam

Jam is a package manager for JavaScript. Unlike other repositories, it puts the browser first.
Using a stack of script tags isn’t the most maintainable way of managing dependencies; with Jam packages and loaders like RequireJS you get automatic dependency resolution.
You can achieve faster load times with asynchronous loading and the ability to optimize downloads.
JavaScript modules and packages provide properly namespaced and more modular code.

Volo

Volo is a tool for creating browser-based, front-end projects from project templates and adding dependencies by fetching them from GitHub.
Once your project is set up, you can automate common tasks.
Volo is a dependency manager and project creation tool that favors GitHub for the package repository.
At its heart, volo is a generic command runner — you can create new commands for volo, and you can use commands others have created.

NPM

npm is the package manager tool for JavaScript.
Find, share and reuse packages of code from hundreds of thousands of developers and assemble them in powerful new ways.
Dependencies can be updated and optimized right from the terminal.
And you can build new projects with dependency files and version numbers automatically pulled from the package.json file.

What Are the Differences Between Maven and Ivy?

Feature	Maven	Ivy
Dependency ranges	Bounded, unbounded and multiple ranges	Up to next major version (2.0+)
Version ordering	Defined order (e.g. 0.10 > 0.9), pluggable version syntax planned, falls back to string comparison	Defined order (e.g. 0.10 > 0.9), pluggable version syntax planned, falls back to string comparison
Snapshots	Configurable timestamping, build numbering, update frequency	Can publish continuous integration builds
Profiles	Pluggable activators, as well as by id, JDK, OS, system property, for the whole build process including dependencies	Single id configuration (must be defined consistently across all Ivy configurations)
Scope	Defines known scopes for sensible build defaults, combined during transitivity	Can use configurations in a limited way for this
Filtering	Can exclude dependencies from tree, apply a version globally	Can exclude dependencies from tree, apply a version globally
Dependency reports	Currently basic tabulated reports	Multi-page tabulated report
System scope	Can be used transitively	Cannot be used transitively
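
The table's "defined order" entry (0.10 > 0.9) is worth a quick illustration, because a plain string comparison gets that case wrong. The sketch below is not the actual algorithm used by Maven or Ivy; it is a hypothetical helper that handles only dotted, purely numeric versions.

```python
from __future__ import annotations


def version_key(version: str) -> tuple[int, ...]:
    """Split a dotted numeric version such as "0.10" into (0, 10)."""
    return tuple(int(part) for part in version.split("."))


print(version_key("0.10") > version_key("0.9"))  # True: 10 > 9 numerically
print("0.10" > "0.9")                            # False: "1" sorts before "9"
```

This is why a tool needs a defined version order, and why falling back to string comparison is only a last resort.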

Build Tools

Build tools are programs that automate the creation of executable applications from source code.
Building incorporates compiling, linking and packaging the code into a usable or executable form.
In small projects, developers will often manually invoke the build process.
This is not practical for larger projects, where it is very hard to keep track of what needs to be built, in what sequence and what dependencies there are in the building process.
Using an automation tool allows the build process to be more consistent.
The primary purpose of the first build tools, such as the GNU make and "makedepend" utilities, commonly found in Unix and Linux-based operating systems, was to automate the calls to the compilers and linkers.
Today, as build processes become ever more complex, build automation tools usually support the management of the pre- and post-compile and link activities, as well as the compile and link activities.
Modern build tools go further in enabling workflow processing by obtaining source code, deploying executables to be tested, and even optimizing complex build processes using distributed build technologies, which involve running the build process in a coherent, synchronized manner across several machines.
Here are a few things to look for when selecting a build tool.

Speed

Ideally, your build tool should execute quickly, because speed matters when iterating on a website or app.
When you change a line of code, you want to reload the page and see the change instantly.
Disrupting that process slows down productivity.

Community Driven

The tool you select should have a healthy community of developers that exchange plugins and are continually adding functionality to support it.

Modular and Flexible

Even the most advanced tool has its limits.
Tools that are extensible allow you to add your own custom functionality, giving you the flexibility to adjust as you see fit.

Significance of using a build tool in large scale software development

Organization and Maintenance of Software Components

This is one of the big tasks of large-scale software development.
The system must be arranged into a set of small, manageable components that interact with each other.
The interaction should take place through well-defined, organized interfaces, simplifying the task of managing and maintaining the components.
Typically, time constraints and insufficient experience combine to introduce defects in the way software systems are architected, leading to decreased quality and larger overheads in managing and maintaining the system.
As an answer to this problem, a number of studies have measured software systems for complexity, which has led to a standardized set of software quality metrics.
While there's no substitute for experienced project managers, the metrics do offer insight into assessing software complexity and quality.
It's also important to be able to detect inconsistencies in the program as it changes.
Typically, a program may be changed in one place, but the effect of these changes in other places is overlooked.
For example, changing a type can make an existing type cast in a different module unnecessary, and that cast is then easy to overlook.

Time Constraints

A problem faced by large software systems development (in any language) is waiting for the system to be rebuilt after every small change.
The time required to rebuild after each change increases with the size of the system, adding up to expensive overhead costs.
After a certain point, the necessary, endless rebuilds significantly reduce productivity.
Organizing a system into well-architected components and reusable libraries goes a long way toward solving this problem.
Yet developer tools still need to be smart about how much rebuilding they have to do for each small change.
One solution to this problem is to use incremental development environments such as VisualAge.
They recompile only the minimum amount necessary when a system is changed.
This concept of incrementality can also be extended to other activities beyond the standard development steps to include QA, testing and so forth.
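
The idea of rebuilding only what changed can be sketched with content fingerprints: record a hash of every source at build time, then compare on the next build. This is a minimal illustration and not how VisualAge or any specific build tool works; the file names are hypothetical, and a real tool would also track dependencies between files.

```python
from __future__ import annotations

import hashlib


def fingerprint(content: bytes) -> str:
    """Return a content hash used to detect changes between builds."""
    return hashlib.sha256(content).hexdigest()


def sources_to_rebuild(sources: dict[str, bytes], last_build: dict[str, str]) -> list[str]:
    """Return names of sources that are new or changed since the last build."""
    return sorted(
        name
        for name, content in sources.items()
        if last_build.get(name) != fingerprint(content)
    )


# Hypothetical project state: file name -> file content.
before = {"a.c": b"int a;", "b.c": b"int b;"}
recorded = {name: fingerprint(content) for name, content in before.items()}

after = {"a.c": b"int a;", "b.c": b"int b = 1;", "c.c": b"int c;"}
print(sources_to_rebuild(after, recorded))  # ['b.c', 'c.c']
```

The unchanged file is skipped, so the cost of a rebuild follows the size of the change rather than the size of the system.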

Memory Management

Large systems tend to use a lot of memory, and unless it's managed carefully the capacity of the underlying hardware can quickly be exhausted.
In systems written in C and C++, the developer has complete responsibility for making sure that unused memory is recycled for future use rather than retained indefinitely.
Java addresses this problem with automatic garbage collection, i.e., the Java Virtual Machine periodically searches for memory that is no longer in use and recycles it for future use.
Unfortunately, garbage collection can take a significant amount of time when systems use a lot of memory, which can severely degrade performance.
Most Java programmers today assume that they have to live with this in large Java programs.
The solution is to actively manage memory, and simply let the garbage collector kick in for the smaller chunks of memory as well as what slips through the cracks of the explicit memory management routines.
Hopefully, better garbage collection algorithms will become available shortly in Java Virtual Machines and the problems related to garbage collection will soon be a memory; the next six months will reveal this possibility.

Safety

When building large systems a lot of assumptions are made regarding how the system works.
If the system is correct, these assumptions are met by the system execution.
However, since bugs in software are always expected, these assumptions may not always hold.
The debugging process essentially means running the system in a controlled manner to determine if these assumptions are met, and looking for ways to make corrections when they aren't.
In mission-critical systems some assumptions are important and require enforcement.
Similarly, in multithreaded systems where it's often impossible to reproduce a problem, the violation of any assumption must be reported.
Providing diagnostic APIs solves this problem, allowing assumptions to be built into the program as constraints that must hold during the program's execution.
Tools to help manage them are needed to encourage users to write such constraints.
One important capability necessary to encourage using diagnostic constructs is an easy way to strip out these constructs when it's time to package the system for final shipping.
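
As one possible illustration of building assumptions into a program as constraints, the sketch below uses Python's `assert` statement. The function and its rules are hypothetical; the point is only that each assumption is stated next to the code that relies on it and is reported as soon as it is violated.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return the price after a percentage discount."""
    # Assumptions written as constraints that must hold during execution.
    assert price >= 0, "price must be non-negative"
    assert 0 <= percent <= 100, "percent must be between 0 and 100"
    return price * (1 - percent / 100)


print(apply_discount(200.0, 25))  # 150.0
```

Python also offers an easy way to strip these checks for shipping: running the interpreter with the `-O` option removes `assert` statements, so they cost nothing in the packaged system.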

Build Automation

Build Automation is the process of scripting and automating the retrieval of software code from a repository, compiling it into a binary artifact, executing automated functional tests, and publishing it into a shared and centralized repository.
It covers the creation of a software build and the associated processes, including compiling source code into binary code and running automated tests.
Build-automation utilities allow the automation of simple, repeatable tasks.
Given a goal, the tool works out how to reach it by executing the required tasks in the correct order.
Build tools are either task-oriented or product-oriented: task-oriented tools describe the dependency network in terms of a specific set of tasks, while product-oriented tools describe it in terms of the products they generate.
Automation is achieved through the use of a compile farm for either distributed compilation or the execution of the utility step.
The distributed build process must have machine intelligence to understand the source-code dependencies to execute the distributed build.

Advantages

A necessary pre-condition for continuous integration
Improve product quality
Accelerate the compile and link processing
Eliminate redundant tasks
Minimize "bad builds"
Eliminate dependencies on key personnel
Have history of builds and releases in order to investigate issues
Save time and money - because of the reasons listed above

Build Tools

LambdaTest

LambdaTest is a scalable, cloud-based cross-browser testing platform designed to move software testing needs onto cloud infrastructure.
According to the vendor, the LambdaTest platform helps ensure that web app elements (such as JavaScript, CSS, HTML5, video, etc.) render seamlessly across every desktop...

Service Control

ServiceControl is an identity management solution that is designed to provide a simpler way to create, manage, and audit accounts across multiple systems.
This software is targeted at solution architects, IDM and IAM project managers, line-of-business application owners, and busy IT administrators.

Apache Ant With Ivy

Ant is a Java library that drives the processes defined in a build file.
Ant is mainly used to build Java applications.
Ant is very flexible: it does not impose rules such as coding conventions or a directory structure.
Ivy is a subproject of Ant that acts as a dependency manager.

Gradle

Gradle builds upon the concepts of Ant and Maven.
Gradle uses Groovy scripts for declaring project configuration.
Gradle was designed for multi-project builds and supports incremental builds by determining which parts of the build are up to date.
Ant is now mostly treated as legacy, and the industry is moving toward Gradle.
Personally, I feel that Ant and Maven can still be used; it mainly depends on the project.
Sometimes a combination of Ant and Gradle, Maven and Gradle, or even all three together can be used.

Apache Maven

Maven is more than a build tool.
Maven also describes how software is built and helps with dependency management.
Maven is used mainly for Java-based projects.

Maven Build Life Cycle

Maven is based around the central concept of a build lifecycle.
What this means is that the process for building and distributing a particular artifact (project) is clearly defined.
For the person building a project, this means that it is only necessary to learn a small set of commands to build any Maven project, and the POM will ensure they get the results they desired.
Maven has three built-in build lifecycles: default, clean and site. Each of these build lifecycles is defined by a different list of build phases, wherein a build phase represents a stage in the lifecycle.
For example, the default lifecycle comprises the following phases.

Validate

Validate the project is correct and all necessary information is available.

Compile

Compile the source code of the project.

Test

Test the compiled source code using a suitable unit testing framework.
These tests should not require the code be packaged or deployed.

Package

Take the compiled code and package it in its distributable format, such as a JAR.

Verify

Run any checks on results of integration tests to ensure quality criteria are met.

Install

Install the package into the local repository, for use as a dependency in other projects locally.

Deploy

Done in the build environment, copies the final package to the remote repository for sharing with other developers and projects.
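
In Maven, invoking a phase (for example `mvn package`) also runs every phase that precedes it in the same lifecycle. The sketch below models that behavior using only the phases listed above; the real default lifecycle contains additional phases, and the function name is hypothetical.

```python
# The default-lifecycle phases as listed in this article, in order.
DEFAULT_LIFECYCLE = ["validate", "compile", "test", "package", "verify", "install", "deploy"]


def phases_to_run(target: str) -> list:
    """Return the target phase preceded by every earlier phase."""
    if target not in DEFAULT_LIFECYCLE:
        raise ValueError(f"unknown phase: {target}")
    return DEFAULT_LIFECYCLE[: DEFAULT_LIFECYCLE.index(target) + 1]


print(phases_to_run("package"))  # ['validate', 'compile', 'test', 'package']
```

This is why a single command is usually enough to build a project: asking for a late phase implies all the earlier ones.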

Gradle Build Life Cycle

The core of Gradle is a language for dependency-based programming.
In Gradle terms this means that you can define tasks and dependencies between tasks.
Gradle guarantees that these tasks are executed in the order of their dependencies, and that each task is executed only once.
There are build tools that build up such a dependency graph as they execute their tasks.
Gradle builds the complete dependency graph before any task is executed.
This lies at the heart of Gradle and makes many things possible which would not be possible otherwise.
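
The idea of building the complete task graph first and then running each task once, in dependency order, can be sketched with a topological sort. This is not Gradle's implementation; it uses Python's standard `graphlib` module, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: task name -> tasks it depends on.
TASKS = {
    "compile": set(),
    "test": {"compile"},
    "jar": {"compile"},
    "build": {"test", "jar"},
}

# The whole graph is known before anything runs, so an order can be fixed up front.
order = list(TopologicalSorter(TASKS).static_order())
print(order)
```

Every task appears exactly once in the resulting order, and always after the tasks it depends on. The relative order of independent tasks such as "test" and "jar" is not fixed.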
A Gradle build has three distinct phases.

Initialization

Gradle supports single and multi-project builds.
During the initialization phase, Gradle determines which projects are going to take part in the build, and creates a project instance for each of these projects.

Configuration

During this phase the project objects are configured.
The build scripts of all projects which are part of the build are executed.

Execution

Gradle determines the subset of the tasks, created and configured during the configuration phase, to be executed.
The subset is determined by the task name arguments passed to the gradle command and the current directory.
Gradle then executes each of the selected tasks.

Maven

Maven is a build automation tool used primarily for Java projects.
Maven addresses two aspects of building software: first, it describes how software is built, and second, it describes its dependencies.
Maven is a project management and comprehension tool that provides developers a complete build lifecycle framework.
A development team can automate a project's build infrastructure in almost no time, as Maven uses a standard directory layout and a default build lifecycle.
In an environment with multiple development teams, Maven can establish a standard way of working in a very short time.
As most project setups are simple and reusable, Maven makes a developer's life easier when creating reports, checks, and build and test automation setups.
To summarize, Maven simplifies and standardizes the project build process.
It handles compilation, distribution, documentation, team collaboration and other tasks seamlessly.
Maven increases reusability and takes care of most build-related tasks.
The primary goal of Maven is to provide developers with a comprehensive model for projects that is reusable, maintainable, and easier to comprehend, together with plugins or tools that interact with this declarative model.
A Maven project's structure and contents are declared in an XML file, pom.xml, referred to as the Project Object Model (POM), which is the fundamental unit of the entire Maven system.

Features of Maven

Model based builds

Maven is able to build any number of projects into predefined output types such as jar, war, metadata.

Coherent site of project information

Using the same metadata as the build process, Maven is able to generate a website and a PDF including complete documentation.

Release management and distribution publication

Without additional configuration, Maven will integrate with your source control system, such as CVS, and manage the release of a project.

Backward compatibility

You can easily port the multiple modules of a project into Maven 3 from older versions of Maven. It can support the older versions also.

Parallel Builds

It analyzes the project dependency graph and enables you to schedule modules to be built in parallel.
Using this, you can achieve the performance improvements of 20-50%.

Better error and integrity reporting

Maven has improved error reporting, and it provides you with a link to the Maven wiki page where you will get a full description of the error.

How Maven Uses Conventions Over Configurations

Convention over configuration is one of the main design philosophies behind Apache Maven.
Maven is directed by a configuration file called pom.xml.
This may also be distributed in a project hierarchy where a "parent" pom file calls subsequent pom.xml files lower in the build hierarchy.
Maven also has default targets which perform tasks defined by convention.
All operations can be modified and expanded with more detail.
This is also in contrast to Ant which requires one to define all targets and behavior.
Let's go through a few examples.

A complete Maven project can be created using the following configuration file:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.packt</groupId>
  <artifactId>sample-one</artifactId>
  <version>1.0.0</version>
</project>

If someone needs to override the default, conventional behavior of Maven, that is possible too.
The following example pom.xml file shows how to override some of the preceding default values.

<project>

<modelVersion>4.0.0</modelVersion>

<groupId>com.packt</groupId>

<artifactId>sample-one</artifactId>

<version>1.0.0</version>

<packaging>jar</packaging>

<build>

<sourceDirectory>${basedir}/src/main/java</sourceDirectory>

<testSourceDirectory>${basedir}/src/test/java

</testSourceDirectory>

<outputDirectory>${basedir}/target/classes

</outputDirectory>

</build>

</project>
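
The relationship between conventions and explicit configuration can be sketched in a few lines of Python. This is only an illustration of the idea, not how Maven is implemented: the function name effective_settings and the src/java override are hypothetical, while the default values are the conventional ones used in the POM above.

```python
# Conventional values Maven assumes when the POM does not say otherwise.
MAVEN_DEFAULTS = {
    "packaging": "jar",
    "sourceDirectory": "src/main/java",
    "testSourceDirectory": "src/test/java",
    "outputDirectory": "target/classes",
}


def effective_settings(explicit):
    """Overlay explicitly configured values on top of the conventional defaults."""
    return {**MAVEN_DEFAULTS, **explicit}


# A minimal POM configures nothing, so every convention applies.
minimal = effective_settings({})

# Hypothetical project that keeps its sources in a non-standard directory.
custom = effective_settings({"sourceDirectory": "src/java"})

print(minimal["sourceDirectory"])  # src/main/java
print(custom["sourceDirectory"])   # src/java
print(custom["outputDirectory"])   # target/classes
```

Only the value that was set explicitly changes; everything else keeps following the convention.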

Build Profile In Maven

A build profile is a set of configuration values that can be used to set or override the default values of a Maven build.
Using a build profile, you can customize the build for different environments, such as production versus development.
Profiles are specified in the pom.xml file using its profiles element (the activeProfiles element of settings.xml can switch them on) and are triggered in a variety of ways.
Profiles modify the POM at build time and are used to give parameters different values for different target environments (for example, the path of the database server in the development, testing, and production environments).
There are three main types of build profile.

Per project

Defined in the project POM file, pom.xml.

Per user

Defined in the Maven settings file (%USER_HOME%/.m2/settings.xml).

Global

Defined in the Maven global settings file (%M2_HOME%/conf/settings.xml).

As an example, suppose that under src/main/resources there are three environment-specific files.

env.properties

Default configuration used if no profile is mentioned.

env.test.properties

Test configuration when test profile is used.

env.prod.properties

Production configuration when prod profile is used.
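
The mapping from active profile to environment file can be sketched as follows. This is a hedged Python illustration of the selection rule only. The function properties_file_for is hypothetical, and in a real build the choice is made by Maven from the profile configuration (a profile is usually activated on the command line with -P, for example mvn test -Ptest).

```python
KNOWN_PROFILES = {"test", "prod"}


def properties_file_for(profile=None):
    """Return the environment file used for the given profile name."""
    if profile is None:
        # No profile mentioned: fall back to the default configuration.
        return "env.properties"
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    return f"env.{profile}.properties"


print(properties_file_for())        # env.properties
print(properties_file_for("test"))  # env.test.properties
print(properties_file_for("prod"))  # env.prod.properties
```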

Profiles are specified using a subset of the elements available in the POM itself (plus one extra section), and are triggered in any of a variety of ways. They modify the POM at build time, and are meant to be used in complementary sets to give equivalent-but-different parameters for a set of target environments (providing, for example, the path of the appserver root in the development, testing, and production environments). As such, profiles can easily lead to differing build results from different members of your team. However, used properly, profiles can be used while still preserving project portability.

The Maven build life cycle and its phases are covered earlier in this article.

Maven Goals

Each phase is a sequence of goals, and each goal is responsible for a specific task.
When we run a phase, all goals bound to that phase are executed in order.
Here are some of the phases and the default goals bound to them:

Compiler

The compile goal from the compiler plugin is bound to the compile phase.

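Surefire

The test goal from the Surefire plugin is bound to the test phase.

Install

The install goal from the install plugin is bound to the install phase.

Jar and War

The jar and war goals, from the JAR and WAR plugins, are bound to the package phase, depending on the packaging type of the project.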
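
The way phases pull in goals can be sketched in Python. This is a simplified model: the lifecycle list below is shortened (the real default lifecycle has more phases), only the bindings named above are included for a jar-packaged project, and the function goals_for is hypothetical.

```python
# Shortened version of the default lifecycle, in execution order.
LIFECYCLE = ["validate", "compile", "test", "package", "verify", "install", "deploy"]

# Default goal bindings for a jar-packaged project, limited to the ones named above.
BINDINGS = {
    "compile": ["compiler:compile"],
    "test": ["surefire:test"],
    "package": ["jar:jar"],
    "install": ["install:install"],
}


def goals_for(phase):
    """List the goals run when `phase` is invoked, including all earlier phases."""
    goals = []
    for earlier in LIFECYCLE[: LIFECYCLE.index(phase) + 1]:
        goals.extend(BINDINGS.get(earlier, []))
    return goals


print(goals_for("package"))
# ['compiler:compile', 'surefire:test', 'jar:jar']
```

Asking for the package phase therefore also compiles and tests the code first, because every earlier phase runs before it.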

Dependency Management in Maven

One of the core features of Maven is Dependency Management.
Managing dependencies becomes difficult once we have to deal with multi-module projects (consisting of hundreds of modules or sub-projects).
Maven provides a high degree of control to manage such scenarios.

Transitive Dependency Discovery

It is often the case that a library, say A, depends upon another library, say B.
If another project C wants to use A, then that project needs library B too.
Maven removes the need to discover all the required libraries by hand.
It does so by reading the project files (pom.xml) of the dependencies, figuring out their dependencies, and so on.
We only need to define the direct dependencies in each project POM.
Maven handles the rest automatically.
With transitive dependencies, the graph of included libraries can quickly grow large.
Cases can arise where there are duplicate libraries.
Maven provides a few features to control the extent of transitive dependencies.

Dependency Mediation

Determines which version of a dependency is used when multiple versions of an artifact are encountered.
Maven picks the version nearest to your project in the dependency tree. If two versions are at the same depth in the tree, the first declared dependency is used.

Dependency Management

Directly specifies the versions of artifacts to be used when they are encountered in transitive dependencies.
For example, project C can include B as a dependency in its dependencyManagement section and directly control which version of B is used whenever it is referenced.

Dependency Scope

Includes dependencies as per the current stage of the build.

Excluded Dependencies

Any transitive dependency can be excluded using the "exclusion" element.
For example, if A depends upon B and B depends upon C, then A can mark C as excluded.

Optional Dependencies

Any transitive dependency can be marked as optional using the "optional" element.
For example, A depends upon B and B depends upon C. If B marks C as optional, then A will not use C.
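
The discovery, mediation and exclusion rules above can be illustrated with a small Python sketch. It is a simplified model and not Maven's resolver: the graph, the artifact names and the resolve function are hypothetical, and exclusions are applied to the whole walk rather than per dependency as Maven does.

```python
from collections import deque


def resolve(root, graph, exclusions=frozenset()):
    """Walk the dependency graph breadth-first and pick one version per artifact.

    Breadth-first order means the first version seen for an artifact is the
    nearest one, and ties at the same depth go to the first declared dependency.
    """
    chosen = {}
    queue = deque(graph[root])
    while queue:
        name, version = queue.popleft()
        if name in exclusions or name in chosen:
            continue  # excluded, or a nearer version was already selected
        chosen[name] = version
        queue.extend(graph.get((name, version), []))
    return chosen


# Hypothetical graph: (artifact, version) -> direct dependencies in declaration order.
graph = {
    ("app", "1.0"): [("A", "1.0"), ("B", "1.0")],
    ("A", "1.0"): [("C", "1.0")],
    ("B", "1.0"): [("D", "1.0")],
    ("C", "1.0"): [("D", "2.0")],
}

print(resolve(("app", "1.0"), graph))
# {'A': '1.0', 'B': '1.0', 'C': '1.0', 'D': '1.0'}

print(resolve(("app", "1.0"), graph, exclusions={"C"}))
# {'A': '1.0', 'B': '1.0', 'D': '1.0'}
```

The project only declares A and B directly. D 1.0 wins over D 2.0 because it sits one level closer to the project, and excluding C also drops everything that would only have arrived through C.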

Dependency Scope

Transitive dependency discovery can be restricted using the dependency scopes described below.

Compile

This scope indicates that the dependency is available on the classpath of the project.
It is the default scope.

Provided

This scope indicates that the dependency is to be provided by the JDK or by the web server/container at runtime.

Runtime

This scope indicates that the dependency is not required for compilation, but is required during execution.

Test

This scope indicates that the dependency is only available for the test compilation and execution phases.

System

This scope indicates that you have to provide the system path.

Import

This scope is only used when the dependency is of type pom.
This scope indicates that the specified POM should be replaced with the dependencies in that POM's <dependencyManagement> section.
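
One way to read the scopes is as a table of which classpaths a dependency appears on. The Python sketch below is an illustrative summary. The names SCOPE_CLASSPATHS and on_classpath are hypothetical, and the import scope is left out because it affects dependency management rather than a classpath.

```python
# Which classpaths a dependency is placed on, per scope.
SCOPE_CLASSPATHS = {
    "compile": {"compile", "test", "runtime"},
    "provided": {"compile", "test"},   # supplied by the JDK or container at runtime
    "runtime": {"test", "runtime"},    # not needed to compile the main code
    "test": {"test"},
    "system": {"compile", "test"},     # like provided, but you give the path yourself
}


def on_classpath(scope, classpath):
    """Return True if a dependency with this scope is visible on the classpath."""
    return classpath in SCOPE_CLASSPATHS[scope]


print(on_classpath("compile", "runtime"))   # True
print(on_classpath("provided", "runtime"))  # False
print(on_classpath("test", "compile"))      # False
```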

Usually, we have a set of projects under a common project. In such a case, we can create a common POM that holds all the common dependencies and then make this POM the parent of the sub-projects' POMs.

Contemporary Tools and Practices Widely Used in the Software Industry

The growing demand for and importance of data analytics in the market have generated many openings worldwide.
It is slightly tough to shortlist the top data analytics tools, as the open source tools are often more popular, user-friendly and performance-oriented than the paid ones.
Many open source tools require little or no coding and manage to deliver better results than paid versions, for example R programming in data mining, and Tableau Public and Python in data visualization. Below is a list of top data analytics tools, both open source and paid, based on their popularity, learning and performance.

R Programming

R is a leading analytics tool in the industry and is widely used for statistics and data modeling.
It can easily manipulate your data and present it in different ways.
It has exceeded SAS in many ways, such as data capacity, performance and outcome.
R compiles and runs on a wide variety of platforms, namely UNIX, Windows and macOS.
At the time of writing it has 11,556 packages and lets you browse the packages by category.
R also provides tools to install packages automatically as the user requires, and it can be combined well with big data tools.

Tableau Public

Tableau Public is free software that connects to any data source, be it a corporate data warehouse, Microsoft Excel or web-based data, and creates data visualizations, maps, dashboards and so on, with real-time updates presented on the web.
These can also be shared through social media or with a client.
It allows the file to be downloaded in different formats. To see the power of Tableau, you need a very good data source.
Tableau's big data capabilities make it important, and it is regarded as one of the strongest data visualization tools on the market.

Python

Python is an object-oriented scripting language that is easy to read, write and maintain, and it is a free open source tool.
It was developed by Guido van Rossum in the late 1980s and supports both functional and structured programming methods.
Python is easy to learn, as it is very similar to JavaScript, Ruby, and PHP.
Python also has very good machine learning libraries, namely scikit-learn, Theano, TensorFlow and Keras.
Another important feature of Python is that it can work with many data sources, such as SQL Server, a MongoDB database or JSON.
Python can also handle text data very well.

SAS

SAS is a programming environment and language for data manipulation and a leader in analytics. Its development began in 1966 and continued through the 1980s and 1990s under the SAS Institute.
SAS is easily accessible and manageable, and it can analyze data from any source.
In 2011 SAS introduced a large set of products for customer intelligence, along with numerous SAS modules for web, social media and marketing analytics that are widely used for profiling customers and prospects.
It can also predict their behaviors, and manage and optimize communications.

Apache Spark

The University of California, Berkeley's AMP Lab developed Apache Spark in 2009.
Apache Spark is a fast, large-scale data processing engine that executes applications in Hadoop clusters up to 100 times faster in memory and 10 times faster on disk.
Spark is built with data science in mind, and its design makes data science work much easier.
Spark is also popular for developing data pipelines and machine learning models.
Spark also includes a library, MLlib, that provides a progressive set of machine learning algorithms for repetitive data science techniques such as classification, regression, collaborative filtering and clustering.

RapidMiner

RapidMiner is a powerful integrated data science platform, developed by the company of the same name, that performs predictive analysis and other advanced analytics such as data mining, text analytics, machine learning and visual analytics without any programming.
RapidMiner can work with almost any data source type, including Access, Excel, Microsoft SQL, Teradata, Oracle, Sybase, IBM DB2, Ingres, MySQL, IBM SPSS, dBase and so on.
The tool is powerful enough to generate analytics based on real-life data transformation settings, that is, you can control the formats and data sets for predictive analysis.

QlikView

QlikView has many unique features, such as patented technology and in-memory data processing, which delivers results to end users very quickly and stores the data in the report itself.
Data association in QlikView is maintained automatically, and the data can be compressed to almost 10% of its original size.
Data relationships are visualized using colors: one color is given to related data and another to non-related data.

Splunk

Splunk is a tool that analyzes and searches machine-generated data.
Splunk pulls in all text-based log data and provides a simple way to search through it. A user can pull in all kinds of data, perform all sorts of interesting statistical analysis on it, and present it in different formats.

Friday, 22 February 2019

Industry Practices And Tools - Part 1

Version Control Systems

Version control systems are a category of software tools that help a software team manage changes to source code over time. Version control software keeps track of every modification to the code in a special kind of database. If a mistake is made, developers can turn back the clock and compare earlier versions of the code to help fix the mistake while minimizing disruption to all team members.
For almost all software projects, the source code is like the crown jewels - a precious asset whose value must be protected. For most software teams, the source code is a repository of the invaluable knowledge and understanding about the problem domain that the developers have collected and refined through careful effort. Version control protects source code from both catastrophe and the casual degradation of human error and unintended consequences.
Software developers working in teams are continually writing new source code and changing existing source code. The code for a project, app or software component is typically organized in a folder structure or "file tree". One developer on the team may be working on a new feature while another developer fixes an unrelated bug by changing code, and each developer may make their changes in several parts of the file tree.
Version control helps teams solve these kinds of problems, tracking every individual change by each contributor and helping prevent concurrent work from conflicting. Changes made in one part of the software can be incompatible with those made by another developer working at the same time. This problem should be discovered and solved in an orderly manner without blocking the work of the rest of the team. Further, in all software development, any change can introduce new bugs on its own and new software can't be trusted until it's tested. So testing and development proceed together until a new version is ready.
Git, Apache Subversion, Perforce, Mercurial and GNU Bazaar are some well-known version control systems.

Version Controlling Models

Local Version Control Systems

A local version control system keeps track of files within the local system. This approach is very common and simple. It is also error prone, which means the chance of accidentally writing to the wrong file is higher.
In this model, versions are kept by copying files into dated directories.
A local revision control system can also save a series of patches.
The disadvantage is that collaboration is very difficult with this kind of system.
Branching is also almost impossible.

Centralized Version Control Systems

In this approach, all the changes to the files are tracked on a centralized server. The central server holds all the versioned files and a list of the clients that check out files from that central place.
Some advantages of this kind of system are that everyone knows, to a certain degree, what everyone else on the project is doing, and that administrators have fine-grained control over who can do what, which is far easier than managing local databases on every client.
The main disadvantage is the single point of failure that the central server represents. If the server goes down, nobody can collaborate or save versioned changes until it is back, and if its disk is corrupted without proper backups, the entire history of the project is lost.

Distributed Version Control Systems

Distributed version control systems came into the picture to overcome the drawbacks of centralized version control systems. The clients clone the repository completely, including its full history. If any server dies, any of the client repositories can be copied back onto the server to restore it.
Some advantages of this kind of system: actions other than pushing and pulling change-sets are extremely fast, because the tool only needs to access the hard drive rather than a remote server, and there is no waiting for locks across potentially slow network connections. New change-sets can be committed locally without anyone else seeing them and then pushed all at once when they are ready. Branching and merging are much easier to achieve, largely because they are built into the way the system works, and you do not need to be connected to the network all the time.
Some disadvantages: the initial checkout of a repository is slower than a checkout in a centralized version control system, because all branches and the revision history are copied to the local machine by default. If the project contains many large binary files that cannot be easily compressed, or has a very long history (50,000 change-sets or more), downloading the entire history can take an impractical amount of time and disk space.

GIT

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas and multiple workflows.

GitHub

GitHub is a web-based hosting service for version control using Git. It is mostly used for computer code. It offers all of the distributed version control and source code management functionality of Git, as well as adding its own features.
It provides access control and several collaboration features, such as bug tracking, feature requests, task management and wikis, for every project.

Git is not GitHub! Here are some of the differences.

Git	GitHub
Installed locally	Hosted in the cloud
First released in 2005	Company launched in 2008
Maintained by its open source community	Purchased in 2018 by Microsoft
Focused on version control and code sharing	Focused on centralized source code hosting
Primarily a command line tool	Administered through the web
Provides a desktop interface named Git GUI	Desktop interface named GitHub Desktop
No user management features	Built-in user management
Open source licensed	Includes a free tier and pay-for-use tiers

Git Commands - Commit

The "commit" command is used to save your changes to the local repository. Note that you have to explicitly tell Git which changes you want to include in a commit before running the "git commit" command. This means that a file won't be automatically included in the next commit just because it was changed. Instead, you need to use the "git add" command to mark the desired changes for inclusion.
Also note that in Git, a commit is not automatically transferred to the remote server. Using the "git commit" command only saves a new commit object in the local Git repository. Exchanging commits has to be performed manually and explicitly.

Git Commands - Push

The git push command is used to upload local repository content to a remote repository. Pushing is how you transfer commits from your local repository to a remote repo. It is the counterpart to git fetch: whereas fetching imports commits to local branches, pushing exports commits to remote branches. Remotes are configured using the git remote command. Pushing has the potential to overwrite changes, so caution should be taken when pushing.
git push is most commonly used to publish and upload local changes to a central repository. After a local repository has been modified, a push is executed to share the modifications with remote team members.

Commit	Push
Records changes to the repository	Updates remote references along with the associated objects
Commits the files that are staged in the local repo	Typically fast-forwards the remote master branch to match the local master branch

Git Staging Area

Imagine a box. You can put stuff into the box. You can take stuff out of the box. This box is the staging area of Git. You can craft commits here. Committing is like sealing that box and sticking a label on it. The contents of that box are your changes. So, why not have the label mean something? You wouldn’t label a moving box with kitchen items as simply “stuff.”

As you make changes locally, Git can "see" them. However, figuratively speaking, they're out of the box. If you were to try and make a commit at this point, Git wouldn't have anything to commit.
Staging helps you split up one large change into multiple commits.
It helps you review the changes before they are committed.
Staging also helps when a merge has conflicts, and it lets you keep extra local files hanging around without committing them.
Finally, staging lets you sneak in small changes.
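
The box metaphor can be turned into a toy Python model. This is not how Git stores data internally (Git records a snapshot of the whole tree for each commit). The ToyRepo class and the file names are hypothetical and only show that a commit contains what was staged, not everything that was changed.

```python
class ToyRepo:
    """A toy model of the working directory, the staging area and the history."""

    def __init__(self):
        self.working = {}   # path -> content as edited on disk
        self.staged = {}    # the "box": changes selected for the next commit
        self.commits = []   # list of (message, staged changes) tuples

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Like "git add": put the current content of one file into the box.
        self.staged[path] = self.working[path]

    def commit(self, message):
        # Like "git commit": seal the box and label it.
        if not self.staged:
            raise RuntimeError("nothing to commit")
        self.commits.append((message, dict(self.staged)))
        self.staged.clear()


repo = ToyRepo()
repo.edit("app.py", "v1")
repo.edit("notes.txt", "draft")

repo.add("app.py")                     # only app.py goes into the box
repo.commit("Add app entry point")

print(repo.commits)
# [('Add app entry point', {'app.py': 'v1'})]
```

The edit to notes.txt is still sitting in the working directory and can go into a later commit with its own meaningful label.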

Git Directory

A Git directory can be a bare repository (one with no working tree), which is typically used for exchanging histories with others by pushing into it and fetching from it.

Collaboration Workflow of Git

In terms of Git process, collaboration is often about branching workflows. Thinking ahead on how you will intertwine commit trees will help you minimize integration bugs and support your release management strategy.

Integration Branch

Use an integration branch with software development teams who work towards deploying a collection of contributions into production as a single entity. This is opposed to teams that focus on deploying features individually. Often teams may want to be doing the latter, but practical limitations impose a process that groups their efforts, and the team ends up doing the former. Be sure to review your actual Git usage to see if you would benefit from using this type of collaboration pattern.
This workflow pattern is a useful staging point for when the risk of integrating multiple branches is high enough to warrant testing the combined contributions as a whole.

Topic Branch

Teams will want to use topic branches if it is important to keep their commit trees in a state that can be easily read or have individual features reverted. Topic branches signify that the commits may be overwritten to clean up their structure and be shrunk down to a feature commit.
Topic branches are often owned by an individual contributor but can also be a designated space for a team to develop a feature upon. Other contributors know that this type of branch could have its commit tree re-written at any moment, and should not try to keep their local branches synchronized with it.

Fork

The fork empowers the repository maintainers with an enforced gateway over pushing directly to an origin repository branch, but more importantly it facilitates collaboration.
The fork workflow pattern gives teams their own space to work in whatever way they are used to, with a single integration point between the two repositories. Over-communicating is imperative within the pull request description. The teams have had separate communication streams before a pull request has been issued, and highlighting the decisions that have already been made will speed up the review process.
One benefit of the fork workflow is that you can direct comments to contributors of the origin repository, as the permissions cascade downwards. From the point of view of the origin repository, you have the control to delete forks when they are no longer needed.

Clone

Using a clone of the project's repository lays out an isolated training and communication ground for the outsourced team to manage their contributions, enforce policies and take advantage of knowledge sharing. Once a contribution is deemed up to standard and ready for the main repository, it can be pushed to one of the origin repository's remote branches and integrated as usual.
Some projects have high expectations for following their coding conventions and defined Git workflow standards when contributing to their repository. It can be daunting to work in this environment until you have learnt the ropes, so work together as a team to optimize both parties' time.

CDN

A Content Delivery Network (CDN) is an interconnected system of computers on the internet that provides web content rapidly to numerous users by duplicating or caching the content on multiple servers and directing the content to users based on proximity. The goal of a CDN is to serve content to end-users with high availability and high performance. CDNs serve a large fraction of the Internet content today, including web objects (text, graphics, and scripts), downloadable objects (media files, software, documents), applications (e-commerce, portals), live streaming data, on-demand streaming media, and social networks. When an end-user requests a specific web page, video or file, the server closest to that user is dynamically determined and is used to deliver the content to that user, thereby increasing the speed of delivery. Content may be replicated on hundreds or thousands of servers in order to provide identical content to as many users as possible, even during peak usage.
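
The idea of picking the server closest to the user can be sketched in Python. This is a simplified illustration that uses straight-line distance. The edge server names and coordinates are hypothetical, and real CDNs usually rely on DNS, anycast routing and measured latency rather than geography alone.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


# Hypothetical edge servers and their approximate coordinates.
EDGE_SERVERS = {
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
    "virginia": (38.95, -77.45),
}


def nearest_edge(user_location):
    """Return the name of the edge server closest to the user."""
    return min(EDGE_SERVERS, key=lambda name: haversine_km(user_location, EDGE_SERVERS[name]))


print(nearest_edge((6.93, 79.86)))   # a user in Colombo -> singapore
print(nearest_edge((48.85, 2.35)))   # a user in Paris -> frankfurt
print(nearest_edge((40.71, -74.0)))  # a user in New York -> virginia
```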

Benefits Of CDN

Companies that see huge traffic on their website on a daily basis can use a CDN to their advantage. When a large number of users simultaneously access a web page with some specific content, such as a video, a CDN enables that content to be sent to each of them without delay. Here are a few of the benefits of using a CDN for your website:

Your reliability and response times get a huge boost

High performing website equals high conversion and growing sales. Latency and speed issues tend to cripple web businesses and cause damage. A few seconds can mean the difference between a successful conversion or a bounce. A reliable CDN ensures that the load speed is more than optimal and that online transactions are made seamlessly.

A CDN enables global reach

Over one third of the world’s population is online, which means that the global use of the internet has increased exponentially over the last 15 years. CDNs provide solutions through cloud acceleration with local POPs. This global reach will eliminate any latency problems that interrupt long-distance online transactions and cause slow load times.

A CDN saves lots of money

Hiring a CDN results in noticeable savings for a business; rather than investing in an infrastructure and separate service providers all across the globe, a global CDN can eliminate the need to pay for costly foreign hosting and thus, save your business a lot of money. A global CDN offers a single platform to handle all of the separate operations, working across numerous regions for a reasonable price. CDNs are also recommended for companies with a tight budget.

100% availability

Because assets are distributed across many regions, CDNs have automatic server availability sensing mechanisms with instant user redirection. As a result, websites served through a CDN can approach 100 percent availability, even during massive power outages, hardware issues or network problems.

Decrease server load

The strategic placement of a CDN can decrease the server load on interconnects, public and private peers and backbones, freeing up the overall capacity and decreasing delivery costs. Essentially, the content is spread out across several servers, as opposed to offloading them onto one large server.

24/7 customer support

Quality CDNs are known for outstanding customer support. In other words, there is a customer support team on standby at all times, at your disposal. Whenever something occurs, you have backup waiting to help you fix your performance-related problems. Having a support team on quick dial is a smart business decision: you're not just paying for a cloud service, you're paying for a broad spectrum of services that help your business grow on a global scale.

Increase in the number of concurrent user

Strategically placing the servers in a CDN can result in high network backbone capacity, which equates to a significant increase in the number of users who can access the network at a given time. For example, where there is a 100 GB/s network backbone and 2 TB/s of server capacity, only 100 GB/s can be delivered. However, with a CDN, 10 servers will be available at 10 strategic locations and can then provide a total capacity of 10 x 100 GB/s.
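
The arithmetic in this example can be checked with a short Python sketch. It assumes, as a simplification, that delivery is limited by whichever is smaller, the backbone or the server capacity, and that the 2 TB/s of server capacity is split evenly across the 10 locations. The function name deliverable is hypothetical.

```python
def deliverable(backbone_gb_s, server_capacity_gb_s):
    """Throughput is capped by the smaller of the backbone and the server capacity."""
    return min(backbone_gb_s, server_capacity_gb_s)


BACKBONE = 100          # GB/s available on one network backbone
SERVER_CAPACITY = 2000  # GB/s of total server capacity (2 TB/s)
LOCATIONS = 10

# One central site: the backbone is the bottleneck.
central = deliverable(BACKBONE, SERVER_CAPACITY)

# CDN: each location has its own 100 GB/s backbone and a share of the servers.
cdn = LOCATIONS * deliverable(BACKBONE, SERVER_CAPACITY // LOCATIONS)

print(central)  # 100
print(cdn)      # 1000
```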

DDoS protection

Other than inflicting huge economic losses, DDoS attacks can also have a serious impact on the reputation and image of the victimized company or organization. Whenever customers type in their credit card numbers to make a purchase online, they are placing their trust in that business. DDoS attacks are on the rise and new ways of Internet Security are being developed; all of which have helped increase the growth of CDNs, as cloud security adds another layer of security. Cloud solutions are designed to stop an attack before it ever reaches your data center. A CDN will take on the traffic and keep your website up and running. This means you need not be concerned about DDoS attacks impacting your data center, keeping your business’ website safe and sound.

Analytics

Content delivery networks not only deliver content at a fast pace, they can also offer priceless analytical information to discover trends that could lead to advertising sales and reveal the strengths and weaknesses of your online business. CDNs can deliver real-time load statistics, optimize capacity per customer, display active regions, indicate which assets are popular, and report viewing details to their customers. These details are extremely important, since usage logs are deactivated once the server source has been added to the CDN. Analysis of this information shows everything a developer needs to know to further optimize the website. In-depth reporting ultimately leads to a performance increase, which results in a better user experience and is then reflected in sales and conversion rates.

How Does a CDN Differ from Web Hosting Servers?

A CDN and web hosting seem to be similar, but they are two totally different concepts. Web hosting means hosting the content of your website on a server. There are different hosting plans available these days, such as shared hosting, VPS (Virtual Private Server), dedicated hosting and cloud hosting. As web data becomes richer, with more audio, video or bigger page sizes, it consumes more bandwidth to deliver to the person who is browsing. On top of bandwidth, it takes more time to load the content of the web page for the user. This is where a CDN comes into the picture.
Web hosting hosts your website on a server so that people can access it over the internet, whereas a CDN speeds up the delivery of your website's assets to those users across the world.
A CDN at the moment delivers mostly the static part of your website, although Google is planning to cache whole pages, including the content of your web pages. Web servers, on the other hand, contain all your web-related content.
Web content is mostly hosted on a single server, but CDN content is spread across the world in multiple hosted environments.
Traditional web hosting delivers 100% of your content to the user. If users are located across the world, they still must wait for the data to be retrieved from where your web server is located. A CDN takes a majority of your static and dynamic content and serves it from across the globe, decreasing download times. Most of the time, the closer the CDN server is to the web visitor, the faster assets will load for them.

Free and Commercial CDNs

Cloudflare

Cloudflare is popularly known as the best free CDN. It is one of the few industry-leading players that actually offer a free plan. Powered by its 115 data centers, Cloudflare delivers speed, reliability, and protection from basic DDoS attacks.

Incapsula

Incapsula provides application delivery from the cloud: global CDN, website security, DDoS protection, load balancing and failover. It takes 5 minutes to activate the service, and they have a great free plan.

jsDelivr

jsDelivr is a publicly available CDN where any web developer can upload and host their own files. It is best suited for hosting libraries that are not hosted by Google.

CDNjs

CDNjs is a community-powered CDN used by over 320,000 websites. Sponsored by Cloudflare, UserApp and Algolia, CDNjs hosts over 1,000 libraries.

Imgur

A wildly popular image hosting site, Imgur is fast, reliable and perfect for beginners. If you're just starting up and looking for an easy way to save server bandwidth, Imgur, along with other popular image hosting sites like Photobucket and Flickr, should serve your purposes to the fullest.

Cloudinary

If you run a website that is heavily dependent on images (think portfolios of photography or design services), offloading your images to another server would be a good idea. You would end up saving a lot of precious bandwidth. Cloudinary is a robust image management solution that can host your images, resize them on the fly and offers a ton of other cool features. In their forever-free plan, they offer 2GB of storage with 5GB of bandwidth.

Requirements For Virtualization

Oracle VM VirtualBox is a free and open-source tool that you can use to run virtual servers on any computer using an x86-type processor, such as the common Intel and AMD chips. It lets you run other operating systems, or another instance of the same operating system, on your computer. For example, if your creative department uses Macs, they could use VirtualBox to open up a Windows virtual computer to access a program that the rest of your company uses. It's frequently used for testing, since you can use one computer to test a program or web page in multiple operating systems.

CPU and RAM

VirtualBox runs on Intel and AMD processors even if they don't support their manufacturers' VT-x or AMD-V virtualization technologies. Oracle also recommends that you have at least 1GB of RAM to run the software in addition to what is needed to support your computer's processes. When running VirtualBox, remember that your CPU's power will be divided between the virtual computers that it runs, so the faster it is, the faster each virtual computer will be.
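
If you want to know whether a Linux machine exposes VT-x or AMD-V, one common approach is to look for the vmx (Intel) or svm (AMD) CPU flags. The sketch below assumes the usual /proc/cpuinfo text layout with a "flags" line; the function name is made up for illustration.

```python
def virtualization_flags(cpuinfo_text: str) -> set:
    """Return the hardware virtualization flags found in /proc/cpuinfo-style text.

    "vmx" indicates Intel VT-x and "svm" indicates AMD-V.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in {"vmx", "svm"})
    return found


# Usage on a Linux host (the path is the standard location, assumed to exist):
#   with open("/proc/cpuinfo") as f:
#       print(virtualization_flags(f.read()))
```

An empty result does not always mean the CPU lacks the feature, since the flag can also be hidden when virtualization is switched off in the firmware settings.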

Storage

Oracle doesn't specify a storage requirement for VirtualBox because the program itself is relatively small. For example, the VirtualBox installer for Windows is less than 100MB. However, you do need space for the virtual computers that will run under VirtualBox. If you want to run a virtual Windows 8 computer on your Linux box, you need enough space to install the second operating system and Windows 8 programs and for storage inside your Windows 8 virtual partition. If you heavily use VirtualBox, you could end up needing hundreds of gigabytes of extra storage.
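
A rough way to plan for this is to add up the virtual disks you intend to create and compare the total with the free space on the host. The sketch below is only an estimate under assumed numbers: the 20% headroom is an arbitrary illustrative margin, not an Oracle recommendation.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def required_bytes(vm_disk_gib: list, headroom: float = 0.2) -> int:
    """Total space for all virtual disks plus a safety margin (headroom is an assumption)."""
    return int(sum(vm_disk_gib) * GIB * (1 + headroom))


def fits_on(path: str, vm_disk_gib: list, headroom: float = 0.2) -> bool:
    """Check whether the planned virtual disks fit in the free space of the volume at path."""
    free = shutil.disk_usage(path).free
    return required_bytes(vm_disk_gib, headroom) <= free
```

For instance, fits_on("/", [40, 60]) asks whether two virtual disks of 40 GiB and 60 GiB, plus the margin, fit on the root volume.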

Windows Requirements

VirtualBox can run on many flavors of Windows. It supports 32- and 64-bit versions of Vista, Windows 7 and Windows 8, as well as 32-bit versions of Windows XP. It also works on Windows server platforms, including the 32-bit edition of Windows Server 2003, both 32- and 64-bit Windows Server 2008 and Windows Server 2012.

Other Requirements

VirtualBox not only enables you to run multiple operating systems, but it can also run on multiple operating systems. Oracle offers an OS X version of VirtualBox that runs on versions 10.6, 10.7 and 10.8. You can use VirtualBox with four flavors of Linux: Oracle Enterprise Linux, SUSE Linux, Ubuntu and Red Hat Enterprise Linux. VirtualBox also supports Solaris 10 and Solaris 11.

Pros And Cons Of Different Virtualization Techniques

Server Virtualization

Server virtualization is the masking of server resources, including the number and identity of individual physical servers, processors, and operating systems, from server users. The server administrator uses a software application to divide one physical server into multiple isolated virtual environments.

Pros

It saves hardware. Without the need to purchase or upgrade costly server hardware, companies can reallocate their funds back into growing their business.
It brings operational savings. With a large server array, it is necessary to employ dedicated technicians to maintain it all. Salaries for full-time tech staff are a considerable drain on spending for many companies, but with virtualization, these resources can be reduced to part-time or even channeled into less-costly managed services.
It saves energy. An extensive collection of physical servers is a massive drain on energy resources. Not only do they need to be kept running, they need to be kept cool—meaning there will be extra energy involved to maintain climate stability in the server room. Virtualization eliminates this need, potentially reducing your energy consumption by a significant degree.
It saves real estate. Physical servers tend to take up a fair bit of room. With office space at a premium, this can be quite costly. Virtualization of your servers can support downsizing efforts and help you save on real estate. That space can then be repurposed for other things or given up entirely.

Cons

The cost of entry can be prohibitive. Virtualization, like any other technological initiative, is pay-to-play. For instance, the physical servers that can be virtualized cost more than their traditional counterparts.
Not all applications can be virtualized. You may still need to maintain a hybrid system to ensure all of your applications keep working as they should. Today, most applications support virtualization, but if you are running proprietary software, you may want to look at its capabilities before moving forward.
There can be security risks. Data security is one of the most significant issues we face today. Virtualization carries an added security risk, so additional spending will be required to ensure data safety and integrity.

Popular Tools

vSphere
Kernel-based Virtual Machine
VMware ESXi
Nutanix Acropolis
Amazon Elastic Compute Cloud
VSOM
Oracle VM VirtualBox
Citrix Hypervisor

Popular Implementations

Using virtual servers for disaster recovery
Saving money with server consolidation
Creating a server consolidation plan to avoid sprawl
Improving virtual server performance with resource throttling
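
Server consolidation, mentioned in the list above, is at heart a packing problem: fit the virtual servers onto as few physical hosts as possible. The sketch below uses a simple first-fit-decreasing heuristic on RAM only. It is an illustration under that single-resource assumption, not how any of the listed tools actually place workloads, and the names are hypothetical.

```python
def consolidate(vm_ram_gib: dict, host_ram_gib: int) -> list:
    """Place VMs on hosts with first-fit decreasing, considering RAM only.

    Returns one list of VM names per physical host.
    """
    hosts = []  # VM names placed on each host
    free = []   # remaining RAM on each host, same order as hosts
    # Largest VMs first, so big ones are not left stranded at the end.
    for name, ram in sorted(vm_ram_gib.items(), key=lambda kv: kv[1], reverse=True):
        if ram > host_ram_gib:
            raise ValueError(f"{name} needs more RAM than a single host offers")
        for i, room in enumerate(free):
            if ram <= room:
                hosts[i].append(name)
                free[i] -= ram
                break
        else:
            # No existing host had room, so bring another one into use.
            hosts.append([name])
            free.append(host_ram_gib - ram)
    return hosts
```

With four virtual servers needing 8, 16, 4 and 4 GiB and hosts of 16 GiB each, the heuristic fits everything on two hosts instead of four.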

Storage Virtualization

Storage virtualization is a technique to abstract information about all the storage hardware resources on storage area networks. This can be used to integrate hardware resources from different networks and data centers into one logical view.
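
The "one logical view" idea can be sketched as a mapping from a logical block number to a physical device and an offset on it. The example below simply concatenates devices end to end; real products use far richer mappings, and the class and device names here are made up for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


class StoragePool:
    """Presents several devices as one contiguous range of logical blocks."""

    def __init__(self, devices: list):
        # devices is a list of (name, size_in_blocks) pairs.
        self.names = [name for name, _ in devices]
        # Cumulative end offsets, e.g. sizes 100 and 50 give [100, 150].
        self.ends = list(accumulate(size for _, size in devices))

    @property
    def total_blocks(self) -> int:
        return self.ends[-1] if self.ends else 0

    def locate(self, logical_block: int) -> tuple:
        """Translate a logical block number into (device name, block on that device)."""
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block outside the pool")
        i = bisect_right(self.ends, logical_block)
        start = self.ends[i - 1] if i else 0
        return self.names[i], logical_block - start
```

A caller only ever sees block numbers from 0 up to total_blocks, and never needs to know which device in which data center actually holds the data.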

Pros

Cost-effective in terms of not having to purchase as much additional software.
The same amount of work can be completed with fewer servers since they are effectively working together
Increased loading and backup speed
Less energy use
More disk space

Cons

Often violates licensing agreements
Reduced costs and increases in disk space can encourage people to increase the number of servers, creating server sprawl, in which there are too many servers to be managed
The network system is much more complicated
If one system fails, they all fail
If one server is infected or breached, the entire network is compromised.

Popular Tools

Tintri VMstore
Infinio Accelerator
Condusiv Technology V-locity
Proximal Data AutoCache
Pure Storage Flash Array

Popular Implementations

Information inquiry commands
Mapping management
RAID
Snapshots
Free space Management

Desktop Virtualization

Desktop virtualization is a software technology that separates the desktop environment and associated application software from the physical client device that is used to access it.

Pros

Access Anywhere. A well set up DaaS system lets your employees access their work desktop from pretty much anywhere. This saves on costs (not having to purchase multiple licenses for employees with, say, a desktop and a laptop), and it makes employees who are not in the office more productive.
Security and Reliability. A DaaS setup lets you keep an eye on security and reduce maintenance costs by only having one central point that needs patching, upgrading, and maintenance.
Uniformity and Control. Since the desktop image is shared by all (or most), you have a lot of control with what is available, what can be installed, and what can be put where. That level of control is much more difficult on individual machines.
Ability to Switch Environments on the Fly. If you have employees who might need to use multiple environments, Linux and Windows for example, or two different versions of Windows, a DaaS set-up allows you to offer both and let employees use them interchangeably.

Cons

Long-term ROI. Most experts and consultants agree that the ROI benefits of DaaS systems are long-term prospects. You won’t be seeing an immediate return like with server virtualization or other outsourcing methods.
Multiple Use Cases Require Multiple Images. Having users who need different environments and different default settings can require having multiple images stored on your central server, and that can quickly overrun your space availability and get expensive.
Single Failure Point. Unlike the distributed desktop model where the loss or failure of one PC is contained, if your DaaS server or provider goes down or becomes compromised, EVERYONE in your organization can be down or compromised.
Usually requires network access. Employees who can’t get online can’t work, so Internet/network connectivity issues can wipe out productivity across entire departments.

Popular Tools

Citrix XenDesktop
Microsoft Enterprise Desktop Virtualization
MokaFive Virtual Desktop Solution 1.0
Pano Logic Virtual Desktop Solution
Solid ICE
webOS
Virtual Desktop Infrastructure

Popular Implementations

As a virtual machine
Image repository
Management Server
Management console
End user

Hypervisor

A hypervisor is a process that separates a computer’s operating system and applications from the underlying physical hardware. It is usually implemented in software, although embedded hypervisors can be created for things like mobile devices.
The hypervisor drives the concept of virtualization by allowing the physical host machine to operate multiple virtual machines as guests to help maximize the effective use of computing resources such as memory, network bandwidth and CPU cycles.

One of the key functions a hypervisor provides is isolation, meaning that a guest cannot affect the operation of the host or any other guest, even if it crashes. As such, the hypervisor must carefully emulate the hardware of a physical machine and (except under carefully controlled circumstances) prevent access by a guest to the real hardware. How the hypervisor does this is a key determinant of virtual machine performance. But because emulating real hardware can be slow, hypervisors often provide special drivers, so-called ‘paravirtualized drivers’ or ‘PV drivers’, such that virtual disks and network cards can be represented to the guest as if they were a new piece of hardware, using an interface optimized for the hypervisor. These PV drivers are operating system and (often) hypervisor specific. Using PV drivers can speed up performance by an order of magnitude, so they are also a key determinant of performance.
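
The two ideas above, sharing host resources among guests and isolating guests from one another, can be illustrated with a toy model. This is purely a teaching sketch with hypothetical class names: a real hypervisor enforces isolation with hardware support, not with Python objects.

```python
class Guest:
    def __init__(self, name: str, memory_mib: int):
        self.name = name
        self.memory_mib = memory_mib
        self.state = {}       # private to this guest, never shared
        self.running = True


class ToyHypervisor:
    def __init__(self, host_memory_mib: int):
        self.host_memory_mib = host_memory_mib
        self.guests = {}

    def free_memory_mib(self) -> int:
        return self.host_memory_mib - sum(g.memory_mib for g in self.guests.values())

    def create_guest(self, name: str, memory_mib: int) -> Guest:
        """Reserve host memory for a new guest, refusing to over-commit."""
        if name in self.guests:
            raise ValueError(f"guest {name!r} already exists")
        if memory_mib > self.free_memory_mib():
            raise MemoryError("not enough host memory for this guest")
        guest = Guest(name, memory_mib)
        self.guests[name] = guest
        return guest

    def run(self, name: str, work) -> None:
        """Run work against one guest's private state; a crash stops only that guest."""
        guest = self.guests[name]
        try:
            work(guest.state)
        except Exception:
            guest.running = False
```

If the work given to one guest raises an error, that guest is marked as stopped while the others keep their state and keep running, which is the isolation property in miniature.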

Emulation

Emulation is the process of imitating a hardware/software program/platform on another program or platform. This makes it possible to run programs on systems not designed for them.
Emulators, as the name implies, emulate the functions of one system on another. Thus, the second system behaves like the original system, attempting to exactly reproduce the external behaviors of the first system.

How is emulation different from virtual machines?

An emulator is anything that tries to behave like something else and isn't necessarily a virtual machine if that emulated thing is not hardware.
A virtual machine can be something that presents its own system image of the same architecture (it emulates an i386 system while running on an i386; the CPU doesn't need to be emulated, but various hardware peripherals do), a different architecture (an i386 VM running on a PowerPC Macintosh, where the CPU needs to be emulated), or a bytecode interpreter like the Java VM.
So a Java VM is not really an emulator, as it's not trying to pretend to be anything else, but it's a sort of virtual machine based on "imaginary hardware" in a sense.
VMware running x86 Windows on an x86 Linux host is a VM that's not emulating the CPU, but is emulating other things like disk controllers.
Something like WINE, for example, is not a machine emulator or a VM; it is an OS emulator (it takes the same API calls and tries to do the same thing that Windows would do with them).
Running an older 32-bit Windows program on 64-bit Windows 10 (without using VMware etc.) relies on a similar "OS emulator" called WoW64 (Windows 32-bit on Windows 64-bit), which translates API calls much as WINE does.
Virtual machines make use of CPU self-virtualization, to whatever extent it exists, to provide a virtualized interface to the real hardware. Emulators emulate hardware without relying on the CPU being able to run code directly and redirect some operations to a hypervisor controlling the virtual container.
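
To see what "imaginary hardware" means in practice, here is a tiny stack machine interpreted entirely in software. The instruction set is invented for this example and has nothing to do with real Java bytecode; the point is only that every instruction is carried out by the interpreter loop rather than run directly by the CPU.

```python
def run(program: list) -> list:
    """Interpret a list of (opcode, *args) tuples and return the printed values."""
    stack = []
    output = []
    pc = 0  # program counter of the imaginary machine
    while pc < len(program):
        op, *args = program[pc]
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "PRINT":
            output.append(stack.pop())
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return output


# Computes (2 + 3) * 4 on the imaginary machine.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("PRINT",), ("HALT",)]
print(run(program))  # [20]
```

An emulator for real hardware follows the same pattern, except that the opcodes are those of an actual CPU, which is why pure emulation tends to be much slower than a virtual machine that lets the host CPU execute guest code directly.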

Pros and Cons of Virtual Machines

Advantages

Less physical hardware

In a typical distributed control system (DCS), you might have two Tag/OS servers, two batch servers, a historian or two, and an engineering station or two. Easily, you’re looking at six servers that will need to be physically maintained. You’ll find a time and overall cost savings on the replacement of hardware and maintenance.

More eco-friendly

If you look at your current configuration, most of your machines are idling along. But, with them virtualized and running on a cluster, you maximize your machines’ potential while saving money on energy costs.

System upgrades

The time and heartache of making system images before applying a patch and having a system restore fail are all realities. With the virtual environment, if something goes wrong while applying a patch or update, you can simply roll the virtual machine back to where it was before you applied the patch using a snapshot.
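
The snapshot-then-patch workflow can be sketched in a few lines. Here a dictionary stands in for the virtual disk, and the class and function names are hypothetical; real hypervisors snapshot disk and memory state far more efficiently than a deep copy.

```python
import copy


class VirtualMachine:
    def __init__(self, name: str):
        self.name = name
        self.disk = {}        # path -> contents, standing in for a virtual disk
        self._snapshots = {}

    def snapshot(self, label: str) -> None:
        """Record the current disk state under a label."""
        self._snapshots[label] = copy.deepcopy(self.disk)

    def rollback(self, label: str) -> None:
        """Restore the disk to the state recorded under a label."""
        self.disk = copy.deepcopy(self._snapshots[label])


def apply_patch(vm: VirtualMachine, patch) -> bool:
    """Snapshot first, then patch; roll back if the patch raises an error."""
    vm.snapshot("pre-patch")
    try:
        patch(vm.disk)
    except Exception:
        vm.rollback("pre-patch")
        return False
    return True
```

If the patch fails halfway through, the machine ends up exactly where it was before the attempt, which is the safety net the paragraph above describes.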

Use of thin clients

Using a thin client manager, replacement of a bad terminal is as easy as a few clicks and powering on the new unit. Conversely, with a physical machine you’re stuck with re-imaging or building a replacement from scratch.

Disadvantages

Cost

The upfront cost can be much higher and, depending on how high of an availability you want, you’ll need to be willing to design the system for your needs now and in the future.

Complexity

If you’re not familiar with the hardware and network aspects of the whole setup, it can be daunting to figure out. Routing rules and virtual local area networks (VLAN) continue to add complexity, especially if security is a concern.

Hardware Keys

Yes, you can use hardware keys. You can bind a USB port to a specific virtual machine. However, you are not able to move the virtual machine without physically moving the key as well.

Add-on hardware

In the past, you weren’t able to add on older PCI hardware and share it with the virtual machine. This has changed, but it doesn’t work 100% of the time. I’d recommend testing it thoroughly before deploying. Of course, this also limits which machine a virtual machine can run on because it will need to be bound to that piece of hardware.

Containers/Docker

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

Advantages

Well documented

Docker’s feature set changes rapidly. The Docker team churns out new releases at a dizzying pace. Each release tends to add new features, and deprecate old ones.
Fortunately, the Docker team also does a nice job of documenting everything. The Docker Documentation is reliably up to date. The docs usually make it very clear if information applies only to specific versions of Docker.
Docker’s solid documentation merits praise because many other software projects do a poorer job in this respect.

Has public container registries

One of the coolest things about Docker that people tend to overlook, I think, is the way it has made public repositories the go-to way to distribute and install software.
The repository idea is not new to Docker, of course. GitHub has been doing the same thing for years. So have Linux distributions, which usually rely on public repositories as the main source for installing software.
Through Docker Hub, Docker brings turn-key software distribution and installation to a new level. Repositories are no longer something you use just for source code or on Linux. With Docker, they become the default way to install software almost anywhere.

Disadvantages

Storage is still hard

Better storage options for Docker containers are on the horizon. But the fact remains that today, there is no really seamless way to connect containers to storage. Docker Data Volumes require a lot of provisioning on the host and manual configuration. They solve the storage dilemma, but not in a really user-friendly or efficient way.
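
To give a feel for the manual steps involved, the sketch below builds the two commands typically needed to create a named volume and attach it to a container. It only assembles argument lists and does not run anything; the volume, container and image names are placeholders, and the helper function is hypothetical.

```python
def volume_commands(volume: str, container: str, image: str, mount_point: str) -> list:
    """Return the docker CLI invocations to create a volume and mount it in a new container."""
    return [
        # Step 1: create the named volume on the host.
        ["docker", "volume", "create", volume],
        # Step 2: start a detached container with the volume mounted at mount_point.
        ["docker", "run", "-d", "--name", container, "-v", f"{volume}:{mount_point}", image],
    ]
```

Each list could be passed to subprocess.run(cmd, check=True) on a machine where Docker is installed. Even in this simple case, you have to keep track of volume names and mount points yourself.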

Has poor monitoring

Basically, the only type of monitoring solution that Docker offers is the stats command.
There are third-party tools that offer more monitoring, but it would be better if Docker itself provided more robust monitoring.
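
If the stats command is all you have, a small script can turn its output into something you can alert on. The sketch below parses text like that produced by docker stats --no-stream --format "{{json .}}". The field names Name, CPUPerc and MemPerc are assumptions about that JSON output, so verify them against your Docker version; the helper functions are made up for illustration.

```python
import json


def parse_stats(lines: str) -> list:
    """Parse one JSON object per line into simple numeric records."""
    rows = []
    for line in lines.splitlines():
        line = line.strip()
        if not line:
            continue
        raw = json.loads(line)
        rows.append({
            "name": raw["Name"],
            # Percentages arrive as strings such as "12.50%".
            "cpu_percent": float(raw["CPUPerc"].rstrip("%")),
            "mem_percent": float(raw["MemPerc"].rstrip("%")),
        })
    return rows


def over_cpu_limit(rows: list, cpu_limit: float) -> list:
    """Names of containers whose CPU usage exceeds cpu_limit percent."""
    return [row["name"] for row in rows if row["cpu_percent"] > cpu_limit]
```

Running this on a schedule and logging the results is a crude substitute for a real monitoring tool, which is exactly the gap described above.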

Platform dependent

Docker now advertises itself as supporting Windows and Mac OS X as well as Linux. But it actually uses virtual machines to run on non-Linux platforms. At the end of the day, Docker is still Linux-only.

What are the differences between containers and VMs?

VMs	Containers
Heavyweight	Lightweight
Limited performance	Native performance
Each VM runs in its own OS	All containers share the host OS
Hardware-level virtualization	OS virtualization
Startup time in minutes	Startup time in milliseconds
Allocates required memory	Requires less memory space
Fully isolated and hence more secure	Process-level isolation, possibly less secure