Data analytics in the AWS cloud : building a data platform for BI and predictive analytics on AWS


Author:

Minichino, Joe, author.

ISBN:

9781394320677

9781119909262

9781119909255

Physical Description:

1 online resource

Contents:

Cover -- Title Page -- Copyright Page -- About the Author -- About the Technical Editor -- Acknowledgments -- Contents at a Glance -- Contents -- Introduction -- What Is a Data Lake? -- When You Do Not Need a Data Lake -- When Do You Need Analytics? -- When Do You Need a Data Lake for Analytics? -- How About an Analytics Team? -- The Data Platform -- The End of the Beginning -- Chapter 1 AWS Data Lakes and Analytics Technology Overview -- Why AWS? -- What Does a Data Lake Look Like in AWS? -- Analytics on AWS -- Skills Required to Build and Maintain an AWS Analytics Pipeline -- Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team -- The Data Vision -- Support -- DA Team Roles -- Early Stage Roles -- Team Lead -- Data Architect -- Data Engineer -- Data Analyst -- Maturity Stage Roles -- Data Scientist -- Cloud Engineer -- Business Intelligence (BI) Developer -- Machine Learning Engineer -- Business Analyst -- Niche Roles -- Analytics Flow at a Process Level -- Workflow Methodology -- The DA Team Mantra: "Automate Everything" -- Analytics Models in the Wild: Centralized, Distributed, Center of Excellence -- Centralized -- Distributed -- Center of Excellence -- Summary -- Chapter 3 Working on AWS -- Accessing AWS -- Everything Is a Resource -- S3: An Important Exception -- IAM: Policies, Roles, and Users -- Policies -- Identity-Based Policies -- Resource-Based Policies -- Roles -- Users and User Groups -- Summarizing IAM -- Working with the Web Console -- The AWS Command-Line Interface -- Installing AWS CLI -- Linux Installation -- macOS Installation -- Windows -- Configuring AWS CLI -- A Note on Region -- Setting Individual Parameters -- Using Profiles and Configuration Files -- Final Notes on Configuration -- Using the AWS CLI -- Using Skeletons and File Inputs -- Cleaning Up!.

Infrastructure-as-Code: CloudFormation and Terraform -- CloudFormation -- CloudFormation Stacks -- CloudFormation Template Anatomy -- CloudFormation Changesets -- Getting Stack Information -- Cleaning Up Again -- CloudFormation Conclusions -- Terraform -- Coding Style -- Modularity -- Limitations -- Terraform vs. CloudFormation -- Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions -- AWS CDK -- Pulumi -- Cloudcraft -- Infrastructure Management Conclusions -- Chapter 4 Serverless Computing and Data Engineering -- Serverless vs. Fully Managed -- AWS Serverless Technologies -- AWS Lambda -- Pricing Model -- Laser Focus on Code -- The Lambda Paradigm Shift -- Virtually Infinite Scalability -- Geographical Distribution -- A Lambda Hello World -- Lambda Configuration -- Runtime -- Container-Based Lambdas -- Architectures -- Memory -- Networking -- Execution Role -- Environment Variables -- AWS EventBridge -- AWS Fargate -- AWS DynamoDB -- AWS SNS -- Amazon SQS -- AWS CloudWatch -- Amazon QuickSight -- AWS Step Functions -- Amazon API Gateway -- Amazon Cognito -- AWS Serverless Application Model (SAM) -- Ephemeral Infrastructure -- AWS SAM Installation -- Configuration -- Creating Your First AWS SAM Project -- Application Structure -- SAM Resource Types -- SAM Lambda Template -- !! Recursive Lambda Invocation !! -- Function Metadata -- Outputs -- Implicitly Generated Resources -- Other Template Sections -- Lambda Code -- Building Your First SAM Application -- Testing the AWS SAM Application Locally -- Deployment -- Cleaning Up -- Summary -- Chapter 5 Data Ingestion -- AWS Data Lake Architecture -- Serverless Data Lake Architecture Structure -- Ingestion -- Storage and Processing -- Cataloging, Governance, and Search -- Security and Monitoring -- Consumption -- Sample Processing Architecture: Cataloging Images into DynamoDB.

Use Case Description -- SAM Application Creation -- S3-Triggered Lambda -- Adding DynamoDB -- Lambda Execution Context -- Inserting into DynamoDB -- Cleaning Up -- Serverless Ingestion -- AWS Fargate -- AWS Lambda -- Example Architecture: Fargate-Based Periodic Batch Import -- The Basic Importer -- ECS CLI -- AWS Copilot CLI -- Clean Up -- AWS Kinesis Ingestion -- Example Architecture: Two-Pronged Delivery -- Fully Managed Ingestion with AppFlow -- Operational Data Ingestion with Database Migration Service -- DMS Concepts -- DMS Instance -- DMS Endpoints -- DMS Tasks -- Summary of the Workflow -- Common Use of DMS -- Example Architecture: DMS to S3 -- DMS Instance -- DMS Endpoints -- DMS Task -- Summary -- Chapter 6 Processing Data -- Phases of Data Preparation -- What Is ETL? Why Should I Care? -- ETL Job vs. Streaming Job -- Overview of ETL in AWS -- ETL with AWS Glue -- ETL with Lambda Functions -- ETL with Hadoop/EMR -- Other Ways to Perform ETL -- ETL Job Design Concepts -- Source Identification -- Destination Identification -- Mappings -- Validation -- Filter -- Join, Denormalization, Relationalization -- AWS Glue for ETL -- Really, It's Just Spark -- Visual -- Spark Script Editor -- Python Shell Script Editor -- Jupyter Notebook -- Connectors -- Creating Connections -- Creating Connections with the Web Console -- Creating Connections with the AWS CLI -- Creating ETL Jobs with AWS Glue Visual Editor -- ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet) -- Job Bookmarks -- Transformations -- Apply Mapping -- Filter -- Other Available Transforms -- Run the Edited Job -- Visual Editor with Source and Target Conclusions -- Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target) -- Creating ETL Jobs with the Spark Script Editor -- Developing ETL Jobs with AWS Glue Notebooks -- What Is a Notebook? -- Notebook Structure.

Step 1: Load Code into a DynamicFrame -- Step 2: Apply Field Mapping -- Step 3: Apply the Filter -- Step 4: Write to S3 in Parquet Format -- Example: Joining and Denormalizing Data from Two S3 Locations -- Conclusions for Manually Authored Jobs with Notebooks -- Creating ETL Jobs with AWS Glue Interactive Sessions -- It's Magic -- Development Workflow -- Streaming Jobs -- Differences with a Standard ETL Job -- Streaming Sources -- Example: Process Kinesis Streams with a Streaming Job -- Streaming ETL Jobs Conclusions -- Summary -- Chapter 7 Cataloging, Governance, and Search -- Cataloging with AWS Glue -- AWS Glue and the AWS Glue Data Catalog -- Glue Databases and Tables -- Databases -- The Idea of Schema-on-Read -- Tables -- Create Table Manually -- Creating a Table from an Existing Schema -- Creating a Table with a Crawler -- Summary on Databases and Tables -- Crawlers -- Updating or Not Updating? -- Running the Crawler -- Creating a Crawler from the AWS CLI -- Retrieving Table Information from the CLI -- Classifiers -- Classifier Example -- Crawlers and Classifiers Summary -- Search with Amazon Athena: The Heart of Analytics in AWS -- A Bit of History -- Interface Overview -- Creating Tables Manually -- Athena Data Types -- Complex Types -- Running a Query -- Connecting with JDBC and ODBC -- Query Stats -- Recent Queries and Saved Queries -- The Power of Partitions -- Athena Pricing Model -- Automatic Naming -- Athena Query Output -- Athena Peculiarities (SQL and Not) -- Computed Fields Gotcha and WITH Statement Workaround -- Lowercase! -- Query Explain -- Deduplicating Records -- Working with JSON, Flattening, and Unnesting -- Athena Views -- CREATE TABLE AS SELECT (CTAS) -- Saving Queries and Reusing Saved Queries -- Running Parameterized Queries -- Athena Federated Queries -- Athena Lambda Connectors -- Note on Connection Errors.

Performing Federated Queries -- Creating a View from a Federated Query -- Governing: Athena Workgroups, Lake Formation, and More -- Athena Workgroups -- Fine-Grained Athena Access with IAM -- Recap of Athena-Based Governance -- AWS Lake Formation -- Registering a Location in Lake Formation -- Creating a Database in Lake Formation -- Assigning Permissions in Lake Formation -- LF-Tags and Permissions in Lake Formation -- Data Filters -- Governance Conclusions -- Summary -- Chapter 8 Data Consumption: BI, Visualization, and Reporting -- QuickSight -- Signing Up for QuickSight -- Standard Plan -- Enterprise Plan -- Users and User Groups -- Managing Users and Groups -- Managing QuickSight -- Users and Groups -- Your Subscriptions -- SPICE Capacity -- Account Settings -- Security and Permissions -- VPC Connections -- Mobile Settings -- Domains and Embedding -- Single Sign-On -- Data Sources and Datasets -- Creating an Athena Data Source -- Creating Other Data Sources -- Creating a Data Source from the AWS CLI -- Creating a Dataset from a Table -- Creating a Dataset from a SQL Query -- Duplicating Datasets -- Note on Creating Datasets -- QuickSight Favorites, Recent, and Folders -- SPICE -- Manage SPICE Capacity -- Refresh Schedule -- QuickSight Data Editor -- QuickSight Data Types -- Change Data Types -- Calculated Fields -- Joining Data -- Excluding Fields -- Filtering Data -- Removing Data -- Geospatial Hierarchies and Adding Fields to Hierarchies -- Unsupported Format Dates -- Visualizing Data: QuickSight Analysis -- Adding a Title and a Description to Your Analysis -- Renaming the Sheet -- Your First Visual with AutoGraph -- Field Wells -- Visual Types -- Saving and Autosaving -- A First Example: Pie Chart -- Renaming a Visual -- Filtering Data -- Adding Drill-Downs -- Parameters -- Actions -- Insights -- ML-Powered Insights -- Sharing an Analysis.

Summary:

A comprehensive and accessible roadmap to performing data analytics in the AWS cloud.

In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you'll explore every relevant aspect of data analytics, from data engineering to analysis, business intelligence, DevOps, and MLOps, as you discover how to integrate machine learning predictions with analytics engines and visualization tools.

You'll also find: real-world use cases of AWS architectures that demystify the applications of data analytics; accessible introductions to data acquisition, importation, storage, visualization, and reporting; and expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance.

A can't-miss for data architects, analysts, engineers, and technical professionals, Data Analytics in the AWS Cloud will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.

Notes:

John Wiley and Sons

Corporate Subject Entry:

Amazon Web Services (Firm)

Subject Terms:

Cloud computing.

Big data -- Data processing.

Infonuagique.

Données volumineuses -- Informatique.

Data Visualization.

Data Mining.

Databases.