Overview

  • I attended Hadoop Summit 2016 Tokyo last month, so I am going to review some of the talks I heard. You can also download the slides and watch the videos on YouTube.
  • slides
  • youtube
    • Unfortunately, only the keynote videos have been uploaded so far, but I expect each session's video to be uploaded soon.
  • This article is about Security in Hadoop, especially Apache Ranger & Apache Atlas.

Security in Hadoop

There were two sessions on security in Hadoop:

  • Security and Data Governance using Apache Ranger and Apache Atlas by Madhan Neethiraj.
  • Protecting Enterprise Data in Apache Hadoop by Owen O'Malley.

Nowadays, many organizations have their own data platform built on Hadoop. Many users typically use the platform at the same time, which can cause several issues. Security is one of the most important requirements for maintaining such a system, and it breaks down into several areas:

  • Administration
    • Central management, Consistent Security
  • Authentication
    • Authenticate users and systems
  • Authorization
    • Provision access to data
  • Audit
    • Maintain a record of data access
  • Data Protection
    • Protect data at rest and in motion

There are several good open-source projects that handle these requirements.

I am going to introduce Apache Ranger & Apache Atlas in this article.

Apache Ranger

Goal

  • Centralized security administration: manage all security-related tasks in a central UI or through REST APIs.
  • Fine-grained authorization for specific actions and operations on each Hadoop component or tool, managed through a central administration tool.
  • A standardized authorization method across all Hadoop components.
  • Enhanced support for different authorization methods, such as role-based and attribute-based access control.
  • Centralized auditing of user access and security-related administrative actions within all the components of Hadoop.

Centralized Administration

  • Apache Ranger supports a web-based UI for centralized administration.

Authorization Policies

  • Supports an easy and consistent way to handle access control across Hadoop components.
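
To make the idea of one consistent policy model a bit more concrete, here is a minimal Python sketch. It is not Ranger's actual engine or API. The `Policy` class, the `is_allowed` function, and the service, resource, and group names are all hypothetical and exist only for illustration. It just shows how a single policy shape could answer access questions for different components.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy record (not Ranger's real schema)."""
    service: str         # component name, e.g. "hive" or "hdfs"
    resource: str        # wildcard pattern for the protected resource
    groups: frozenset    # groups the policy applies to
    accesses: frozenset  # access types the policy grants, e.g. {"select"}


def is_allowed(policies, service, resource, user_groups, access):
    """Return True if any policy grants `access`; deny by default."""
    for p in policies:
        if (p.service == service
                and fnmatch(resource, p.resource)
                and p.groups & set(user_groups)
                and access in p.accesses):
            return True
    return False


# Made-up example policies for two different components.
POLICIES = [
    Policy("hive", "sales.*", frozenset({"analysts"}), frozenset({"select"})),
    Policy("hdfs", "/data/sales/*", frozenset({"etl"}), frozenset({"read", "write"})),
]

print(is_allowed(POLICIES, "hive", "sales.orders", ["analysts"], "select"))
print(is_allowed(POLICIES, "hive", "sales.orders", ["analysts"], "drop"))
```

The same `is_allowed` check serves both the Hive-style and the HDFS-style resource here, which is the point of handling access control in one place.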

Row-Filter, Column-masking

  • Row-Filter
    • A row filter restricts query results to only the rows a user is allowed to access.
    • Suppose your organization has employees in many countries. If you want a user or user group to access only the rows for a specific country, you can easily set that up with a row filter.
  • Column-masking
    • Some fields, such as IDs, passwords, and personal information, are very sensitive.
    • You can hide some characters in those fields with column masking.
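
The effect of the two features can be sketched in a few lines of Python. This is only an illustration of the behavior, not how Ranger implements it. The `mask_show_last` and `apply_policies` functions and the sample employee data are made up for this example.

```python
def mask_show_last(value, visible=4, mask_char="x"):
    """Replace all but the last `visible` characters with `mask_char`."""
    text = str(value)
    if len(text) <= visible:
        return mask_char * len(text)
    return mask_char * (len(text) - visible) + text[-visible:]


def apply_policies(rows, allowed_countries, masked_columns):
    """Keep only rows for allowed countries, then mask sensitive columns."""
    result = []
    for row in rows:
        if row["country"] not in allowed_countries:
            continue  # row filter: this row is invisible to the user
        visible_row = dict(row)  # copy so the source data is not modified
        for column in masked_columns:
            visible_row[column] = mask_show_last(visible_row[column])
        result.append(visible_row)
    return result


# Made-up sample data.
EMPLOYEES = [
    {"name": "Aiko", "country": "JP", "phone": "0312345678"},
    {"name": "Minsu", "country": "KR", "phone": "0298765432"},
]

# A user who may only see employees in KR, with phone numbers masked.
for row in apply_policies(EMPLOYEES, {"KR"}, ["phone"]):
    print(row)
```

The user still runs an ordinary query against the table. The filtering and masking happen on the way out, so the underlying data stays unchanged.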

Tag-based Policies

  • Supports access rules defined on tags attached to resources, rather than on each resource individually.
  • It is very simple to maintain, and reusable.
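
Here is a rough Python sketch of why tag-based rules are easy to reuse. Everything in it is hypothetical (the `RESOURCE_TAGS` and `TAG_POLICIES` tables, the `can_access` function, and the resource names), and the evaluation logic is much simpler than what Ranger really does. In this sketch an untagged resource is simply allowed, whereas a real deployment would also evaluate resource-based policies.

```python
# Hypothetical tag assignments: resource name -> set of tags.
RESOURCE_TAGS = {
    "hive:hr.employees.ssn": {"PII"},
    "hdfs:/data/hr/payroll": {"PII", "FINANCE"},
    "hive:web.page_views": set(),
}

# Hypothetical tag policies: tag -> groups allowed to access tagged data.
TAG_POLICIES = {
    "PII": {"privacy-officers"},
    "FINANCE": {"finance", "privacy-officers"},
}


def can_access(resource, user_groups):
    """Allow access only if the user is permitted for every tag on the resource.

    A tag without a policy denies access; a resource without tags is allowed
    (a simplification made for this sketch).
    """
    tags = RESOURCE_TAGS.get(resource, set())
    groups = set(user_groups)
    return all(TAG_POLICIES.get(tag, set()) & groups for tag in tags)


print(can_access("hive:hr.employees.ssn", ["privacy-officers"]))
print(can_access("hdfs:/data/hr/payroll", ["finance"]))
```

One rule on the `PII` tag covers both the Hive column and the HDFS path. Tagging a new resource as `PII` protects it without writing another policy.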

Audit logs

Access Audit Logs

  • Apache Ranger plugins generate detailed audit logs of access to protected resources.
  • The log destination is also pluggable.
  • Supports an interactive view in the Apache Ranger console.
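
What "pluggable destination" means can be sketched like this in Python. The class names (`ListDestination`, `JsonLinesDestination`, `Auditor`) and the event fields are assumptions made for this example. They are not Ranger's audit schema or classes.

```python
import io
import json
from datetime import datetime, timezone


class ListDestination:
    """Keeps audit events in memory, e.g. for tests."""
    def __init__(self):
        self.events = []

    def write(self, event):
        self.events.append(event)


class JsonLinesDestination:
    """Writes each audit event as one JSON line to a file-like object."""
    def __init__(self, stream):
        self.stream = stream

    def write(self, event):
        self.stream.write(json.dumps(event, sort_keys=True) + "\n")


class Auditor:
    """Sends every access decision to all configured destinations."""
    def __init__(self, destinations):
        self.destinations = list(destinations)

    def record(self, user, resource, access, allowed):
        event = {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "resource": resource,
            "access": access,
            "result": "ALLOWED" if allowed else "DENIED",
        }
        for destination in self.destinations:
            destination.write(event)
        return event


memory = ListDestination()
buffer = io.StringIO()
auditor = Auditor([memory, JsonLinesDestination(buffer)])
auditor.record("alice", "hive:sales.orders", "select", True)
auditor.record("bob", "hive:hr.employees", "select", False)
print(buffer.getvalue(), end="")
```

Any object with a `write(event)` method can be added as a destination, so the code that makes the access decision does not need to know where the logs end up.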

Architecture

Apache Ranger Architecture

Apache Atlas

Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.

  • Metadata Repository
    • Flexible type system to capture schema/metadata of multiple components
    • Out-of-box models for Hive, HDFS, Storm, Falcon, Sqoop
  • Data Lineage/Provenance
    • Captures data lineage across components
  • Classification
    • Use tags to classify the data – like PII, PHI, PCI, EXPIRES_ON
    • Support for attributes in tags – like expiry_date
  • Search
    • Search using classifications, attributes
    • Advanced search using DSL; convenient full-text search
  • Integrations
    • With Apache Hive, Apache Storm, Apache Falcon, Apache Sqoop for metadata and lineage
    • With Apache Ranger for classification based security
  • APIs to add support for more components
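
To show what classification with tag attributes and search by classification could look like, here is a small Python sketch. It does not use the Atlas API or its DSL. The `ENTITIES` list and the `find_by_tag` and `find_expired` functions are hypothetical, and only the tag names `PII` and `EXPIRES_ON` and the attribute `expiry_date` come from the list above.

```python
from datetime import date

# Hypothetical metadata entities; each tag may carry its own attributes.
ENTITIES = [
    {"name": "hr.employees.ssn", "tags": {"PII": {}}},
    {"name": "tmp.campaign_2016",
     "tags": {"EXPIRES_ON": {"expiry_date": date(2016, 12, 31)}}},
    {"name": "web.page_views", "tags": {}},
]


def find_by_tag(entities, tag):
    """Return the names of entities classified with `tag`."""
    return [e["name"] for e in entities if tag in e["tags"]]


def find_expired(entities, today):
    """Return the names of entities whose expiry_date is before `today`."""
    expired = []
    for e in entities:
        attrs = e["tags"].get("EXPIRES_ON")
        if attrs and attrs["expiry_date"] < today:
            expired.append(e["name"])
    return expired


print(find_by_tag(ENTITIES, "PII"))
print(find_expired(ENTITIES, date(2017, 1, 1)))
```

Because the expiry date lives on the tag rather than in the data itself, the same check works for any kind of entity that carries the tag.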

Lineage

Lineage at Apache Atlas

  • Allows users to drill down into operational, security, and provenance-related information.
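
Conceptually, lineage is a graph of "derived from" relationships, and drilling down means walking that graph. Below is a minimal Python sketch under that assumption. The `DERIVED_FROM` table, the dataset names, and the `upstream` function are invented for illustration and have nothing to do with how Atlas stores or queries lineage.

```python
from collections import deque

# Hypothetical lineage edges: dataset -> datasets it was derived from.
DERIVED_FROM = {
    "report.daily_sales": ["mart.sales"],
    "mart.sales": ["staging.orders", "staging.customers"],
    "staging.orders": ["raw.orders"],
    "staging.customers": ["raw.customers"],
}


def upstream(dataset):
    """Return every dataset that `dataset` depends on, nearest first (BFS)."""
    seen = []
    queue = deque(DERIVED_FROM.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(DERIVED_FROM.get(current, []))
    return seen


print(upstream("report.daily_sales"))
```

Walking the edges in the opposite direction would answer the impact question instead: which reports are affected if a raw table changes.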

Classification

Classification at Apache Atlas

  • Import or define business-oriented taxonomy annotations for data.
  • Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes.

Architecture

Apache Atlas Architecture

yongmaroo.kim

2016-11-13 05:30
