Northeastern University Evolution of Hadoop and Apache Spark Discussion
Question Description
I need two comments; each part should be around 100 words.
Part 1:
Given the evolution of Hadoop and Apache Spark, how is Amazon Web Services providing solutions for companies looking to do Big Data analysis?
Amazon Web Services (AWS) is Amazon's cloud computing platform, which offers a variety of services to customers around the world. These services include data analytics, data storage, data integration, and more. As part of AWS's data analytics offering, a variety of tools are available, including, but not limited to, Amazon Elastic MapReduce (EMR).
Apache Spark and Hadoop are tools used to process large amounts of data. Spark can process data in-memory, whereas Hadoop reads and writes to disk. Customers can store data in data lakes and run open-source processing frameworks such as Spark and Hadoop on top of them. However, operating Spark and Hadoop yourself is difficult and time-consuming, and it requires organizations to purchase additional hardware and install and maintain the software. Since the most popular storage infrastructure for data lakes is Amazon S3, Amazon wanted to find a way to bridge this gap for its customers around the world.
Amazon EMR is Amazon's answer: a managed cluster platform that simplifies running big data frameworks such as Hadoop and Spark to process large amounts of data. It can also be used to integrate, transform, and move large amounts of data into and out of Amazon S3 and various other data stores. All in all, Amazon EMR provides on-demand storage and compute, as well as flexibility, scalability, automatic configuration, ongoing monitoring, and full customer support. (AWS EMR 2020)
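As a rough illustration, the sketch below shows one way such a cluster could be launched programmatically with the boto3 SDK. The region, log bucket, roles, and instance types are placeholder assumptions for this example, not values from the discussion.

```python
import boto3

# Hedged sketch: launching a small EMR cluster with Spark and Hadoop installed.
# The region, bucket, roles, and instance types below are placeholder assumptions.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="example-spark-cluster",
    ReleaseLabel="emr-6.3.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hadoop"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    LogUri="s3://example-bucket/emr-logs/",   # placeholder log bucket
    JobFlowRole="EMR_EC2_DefaultRole",        # default EMR roles
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])                  # cluster ID, e.g. "j-XXXXXXXXXXXXX"
```

Because the cluster is created on demand, it can be shut down as soon as the job finishes, which is part of the pay-as-you-go appeal described above.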
To do so, Spark can be installed on an EMR cluster, which allows it to use the EMR File System (EMRFS). EMRFS can, in turn, directly access data stored in Amazon S3, removing the need to first copy the data into HDFS. (Amazon 2020)
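A minimal PySpark sketch of that pattern might look like the following, assuming a hypothetical example-bucket and column names. The s3:// paths are resolved by EMRFS when the job runs on EMR, and the cache() call illustrates the in-memory processing mentioned earlier.

```python
from pyspark.sql import SparkSession

# Hypothetical sketch of a Spark job running on an EMR cluster.
# "example-bucket" and the column names are placeholders, not real data.
spark = SparkSession.builder.appName("s3-direct-read").getOrCreate()

# On EMR, s3:// paths are handled by EMRFS, so the data never has to be
# staged in HDFS first.
orders = spark.read.csv("s3://example-bucket/raw/orders/",
                        header=True, inferSchema=True)

orders.cache()  # keep the data in memory for repeated queries

daily_totals = orders.groupBy("order_date").count()
daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")
```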
With Spark and Hadoop now available on AWS through Amazon Elastic MapReduce, customers around the world get pay-as-you-go access to process enormous quantities of data at a fraction of the cost. Thanks to seamless integration with Amazon S3 and virtually unlimited, scalable compute, storage and compute can now grow independently of each other. As a result, AWS remains at the forefront of the rapidly growing cloud industry, leading the roughly $100 billion market with 33% of the total market share. (Richter 2020)
Part 2:
Amazon Web Services offers a broad set of global cloud-based products including compute, storage, databases, analytics, networking, mobile, developer tools, management tools, IoT, security, and enterprise applications. These services help organizations move faster, lower IT costs, and scale (Amazon Web Services, 2020).
Amazon itself is a great example of big data analytics, as it uses data to understand customer buying behavior and to optimize delivery of ordered products across its supply chain. Analyzing large data sets requires significant computing capacity, which can vary depending on the amount of input data and the type of analysis. This attribute of big data workloads is ideally suited to the pay-as-you-go cloud computing model, where applications can easily scale up and down based on demand. As requirements change, you can resize your environment (horizontally or vertically) on AWS to meet your needs, without having to wait for additional hardware or over-invest to provision excess capacity. With AWS's elastic compute capability, one doesn't have to worry about the system scaling inefficiently.
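For illustration, a hedged boto3 sketch of that kind of resizing is shown below; the cluster ID, instance-group ID, and target count are placeholders, not real resources.

```python
import boto3

# Illustrative sketch only: scaling an existing EMR cluster's core group out
# as demand grows. The cluster and instance-group IDs are placeholders.
emr = boto3.client("emr", region_name="us-east-1")

emr.modify_instance_groups(
    ClusterId="j-EXAMPLECLUSTER",
    InstanceGroups=[
        {"InstanceGroupId": "ig-EXAMPLECORE", "InstanceCount": 8},
    ],
)
```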
In addition, you get flexible computing on a global infrastructure, with access to the many geographic regions that AWS offers and the ability to use other scalable services to build sophisticated big data applications. These include Amazon Simple Storage Service (Amazon S3) to store data, AWS Glue to orchestrate jobs that move and transform that data easily, and AWS IoT, which lets connected devices interact with cloud applications and other connected devices. (The AWS Advantage in Big Data Analytics)
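As a hypothetical example of that orchestration, the snippet below starts a made-up Glue job that transforms raw S3 data; the job name and the arguments it expects are assumptions for illustration only.

```python
import boto3

# Hypothetical sketch: starting a Glue job that moves and transforms data in S3.
# The job name and its expected arguments are assumptions, not a real job.
glue = boto3.client("glue", region_name="us-east-1")

run = glue.start_job_run(
    JobName="example-transform-orders",
    Arguments={
        "--source_path": "s3://example-bucket/raw/orders/",
        "--target_path": "s3://example-bucket/curated/orders/",
    },
)
print(run["JobRunId"])
```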
Hadoop and Apache Spark give companies a great platform for analyzing large volumes of data, but AWS provides the same capability, and more, in a managed package that lowers the barrier to entry into the vast data analytics field.