[PDF] Apache Flume: Distributed Log Collection for Hadoop Free Download

  • [message]
    • Brief Description
      • [PDF] Apache Flume: Distributed Log Collection for Hadoop Free Download by Steve Hoffman | Publisher : Packt Publishing | Category : Computers & Internet | Tags : Expression, Service, Writing, Systems, Software, Database | ISBN-10 : 1782167919 | ISBN-13 : 9781782167914
  • [message]
    • Complete Book Description
      • Stream data to Hadoop using Apache Flume

        Overview

        • Integrate Flume with your data sources
        • Transcode your data en route in Flume
        • Route and separate your data using regular expression matching
        • Configure failover paths and load-balancing to remove single points of failure
        • Use Gzip compression for files written to HDFS

        In Detail

        Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop’s HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with many failover and recovery mechanisms.

        Apache Flume: Distributed Log Collection for Hadoop covers the problems with writing streaming data and logs to HDFS, and how Flume can resolve them. The book explains the generalized architecture of Flume, including moving data to and from databases and NoSQL-style data stores, as well as optimizing performance. It also includes real-world scenarios of Flume implementation.

        Apache Flume: Distributed Log Collection for Hadoop starts with an architectural overview of Flume and then discusses each component in detail. It guides you through the complete installation process and compilation of Flume.

        It introduces channels and channel selectors. For each architectural component (Sources, Channels, Sinks, Channel Processors, Sink Groups, and so on), the various implementations are covered in detail along with their configuration options, so you can customize Flume to your specific needs. The book also gives pointers on writing your own custom implementations.

        By the end, you should be able to construct a series of Flume agents to transport your streaming data and logs from your systems into Hadoop in near real time.

        What you will learn from this book

        • Understand the Flume architecture
        • Download and install open source Flume from Apache
        • Discover when to use a memory or file-backed channel
        • Understand and configure the Hadoop Distributed File System (HDFS) sink
        • Learn how to use sink groups to create redundant data flows
        • Configure and use various sources for ingesting data
        • Inspect data records and route them to different or multiple destinations based on payload content
        • Transform data en route to Hadoop
        • Monitor your data flows

        Approach

        A starter guide that covers Apache Flume in detail.

        Who this book is written for

        Apache Flume: Distributed Log Collection for Hadoop is intended for people who are responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators.

        Table of Contents

        Chapter 1: Overview and Architecture
        Chapter 2: Flume Quick Start
        Chapter 3: Channels
        Chapter 4: Sinks and Sink Processors
        Chapter 5: Sources and Channel Selectors
        Chapter 6: Interceptors, ETL, and Routing
        Chapter 7: Monitoring Flume
        Chapter 8: There Is No Spoon – The Realities of Real-time Distributed Data Collection

  • [message]
    • Book Details
      • Book Name : Apache Flume: Distributed Log Collection for Hadoop

        Edition : 1

        Author : Steve Hoffman

        Publisher : Packt Publishing

        Category : Computers & Internet

        ISBN-10 : 1782167919

        ISBN-13 : 9781782167914

        ASIN : 1782167919

        Pages : 108

        Language : English

        Publish Date : July 16, 2013

