Sunday 2 April 2017

Hadoop

Hadoop topics

1. Big Data



2. Terminology

2.1 Node
2.2 Rack
2.3 Network switch
2.4 Cluster and types
2.5 Master-Slave Architecture
2.6 Data Pipeline
2.7 Distributed data storage
2.8 Serial processing
2.9 Parallel processing
2.10 Scale out/in and up/down

3. Hadoop Basics



4. Introduction

4.1 What is Hadoop?
4.2 History
4.3 Why Hadoop?
4.4 Hadoop versions
4.5 Advantages & Limitations

5. HDFS



6. HDFS Commands


6.1 Snapshot of commands
6.2 Execution of commands

7. HDFS Configurations


7.1 What is configuration
7.2 Why should we configure

8. Challenges in HDFS


8.1 Name Node failures
8.2 Secondary Name Node failures
8.3 Data Node failures
8.4 Where HDFS fits
8.5 Where HDFS may not fit

9. Hadoop clustering


9.1 Adding new nodes
9.2 Removing the existing nodes
9.3 Checking the Dead nodes
9.4 Restarting the Dead nodes

10. MAPREDUCE


10.1 Architecture of Map Reduce
10.2 JobTracker
10.3 Why JobTracker
10.4 Role of JobTracker
10.5 TaskTracker
10.6 Why TaskTracker
10.7 Role of TaskTracker
10.8 Job execution flow

11. Hadoop Data types


11.1 What is Data type
11.2 Where Data type is useful

12. Input formats


12.1 What is input format
12.2 Importance of input format
12.3 Text
12.4 Key value
12.5 Sequence file
12.6 NLine

13. Output formats


13.1 What is output format
13.2 Importance of output format
13.3 Text
13.4 Sequence file

14. Mapper


14.1 What is Mapper
14.2 Main task of Mapper
14.3 Importance of Mapper
14.4 Advantages and limitations
14.5 Programs in Mapper
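
A minimal sketch of a Mapper, written against the standard org.apache.hadoop.mapreduce API. The class name WordCountMapper and the word-count logic are illustrative only, not part of the course material:

  import java.io.IOException;
  import java.util.StringTokenizer;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Splits each input line into words and emits a (word, 1) pair per word.
  public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          StringTokenizer tokens = new StringTokenizer(value.toString());
          while (tokens.hasMoreTokens()) {
              word.set(tokens.nextToken());
              context.write(word, ONE);    // key = word, value = count of 1
          }
      }
  }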

15. Reducer


15.1 What is Reducer
15.2 Main task of Reducer
15.3 Importance of Reducer
15.4 Advantages and limitations
15.5 Programs in Reducer
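
A matching Reducer sketch for the Mapper above, again on the org.apache.hadoop.mapreduce API; the class name WordCountReducer is illustrative:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  // Receives all counts emitted for one word and writes their sum.
  public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable result = new IntWritable();

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable value : values) {
              sum += value.get();
          }
          result.set(sum);
          context.write(key, result);
      }
  }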

16. Combiner


16.1 What is Combiner
16.2 Main task of Combiner
16.3 Importance of Combiner
16.4 Advantages and limitations
16.5 Programs in Combiner

17. Partitioner


17.1 What is Partitioner
17.2 Main task of Partitioner
17.3 Importance of Partitioner
17.4 Advantages and limitations
17.5 Programs in Partitioner
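
A small custom Partitioner sketch, assuming Text keys and IntWritable values as in the sketches above; the vowel-based routing rule is made up purely to show how getPartition decides which reducer receives a key:

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Partitioner;

  // Routes vowel-initial words to partition 0 and every other word to the last partition.
  public class VowelPartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
          String word = key.toString().toLowerCase();
          boolean startsWithVowel = !word.isEmpty() && "aeiou".indexOf(word.charAt(0)) >= 0;
          // With a single reducer both groups collapse into partition 0.
          return startsWithVowel ? 0 : numPartitions - 1;
      }
  }

It would be registered in the driver with job.setPartitionerClass(VowelPartitioner.class) and takes effect only when more than one reducer is configured.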

18. Counter


18.1 What is Counter
18.2 Main task of Counter
18.3 Importance of Counter
18.4 Advantages and limitations
18.5 Programs in Counter
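
A brief sketch of a user-defined counter incremented from inside a map task; the group and counter names (LineAudit, EMPTY_LINES) are arbitrary examples:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Tallies empty input lines in a custom counter and passes other lines through unchanged.
  public class LineAuditMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          if (value.toString().trim().isEmpty()) {
              // Counter totals from all map tasks are aggregated and reported with the job status.
              context.getCounter("LineAudit", "EMPTY_LINES").increment(1);
          } else {
              context.write(value, NullWritable.get());
          }
      }
  }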

19. Distributed Cache


 19.1 What is Distributed Cache
 19.2 Main task of Distributed Cache
 19.3 Importance of Distributed Cache
 19.4 Advantages and limitations
 19.5 Programs in Distributed Cache
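
One common Distributed Cache pattern, sketched under the assumption of the Hadoop 2 mapreduce API: the driver registers a small lookup file and each map task loads it once in setup(). The path /shared/stopwords.txt and the alias "stopwords" are hypothetical:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.HashSet;
  import java.util.Set;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Filters out stop words that were shipped to every task via the distributed cache.
  public class StopWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private final Set<String> stopWords = new HashSet<>();

      @Override
      protected void setup(Context context) throws IOException {
          // The driver would register the file with:
          //   job.addCacheFile(new URI("/shared/stopwords.txt#stopwords"));
          // The fragment (#stopwords) becomes a symlink in the task's working directory.
          try (BufferedReader reader = new BufferedReader(new FileReader("stopwords"))) {
              String line;
              while ((line = reader.readLine()) != null) {
                  stopWords.add(line.trim());
              }
          }
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
              if (!token.isEmpty() && !stopWords.contains(token)) {
                  context.write(new Text(token), new IntWritable(1));
              }
          }
      }
  }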

20. Joins


 20.1 Joins in Map side
 20.2 Importance of Joins in Map side
 20.3 Where Map side joins fit
 20.4 Joins in Reduce side
 20.5 Importance of Joins in Reduce side
 20.6 Where Reduce side joins fit

21. Compression


 21.1 What is Compression
 21.2 Why Compression is required
 21.3 How to enable compression
 21.4 How to disable compression
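
A sketch of turning compression on from a job driver, assuming the built-in Gzip codec and Hadoop 2 property names; leaving these calls out keeps the output uncompressed:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.GzipCodec;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class CompressionSettings {
      public static void apply(Job job) {
          Configuration conf = job.getConfiguration();
          // Compress intermediate map output to reduce shuffle traffic.
          conf.setBoolean("mapreduce.map.output.compress", true);
          // Compress the final job output files with Gzip.
          FileOutputFormat.setCompressOutput(job, true);
          FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
      }
  }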

22. Map Reduce Schedulers


 22.1 What is scheduler
 22.2 Importance
 22.3 FIFO
 22.4 Capacity
 22.5 Fair

23. Map Reduce programming model


 23.1 Map Reduce jobs in java
 23.2 Map Reduce jobs in local mode
 23.3 Map Reduce jobs in pseudo mode
 23.4 Map Reduce jobs in cluster mode
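
A minimal driver sketch that ties the earlier Mapper and Reducer sketches into a runnable Java job; the class names and the input/output path arguments are illustrative, and reusing the reducer as a combiner (chapter 16) is optional:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Configures and submits the job; run as: hadoop jar wordcount.jar WordCountDriver <input> <output>
  public class WordCountDriver {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = Job.getInstance(conf, "word count");
          job.setJarByClass(WordCountDriver.class);
          job.setMapperClass(WordCountMapper.class);
          job.setCombinerClass(WordCountReducer.class);   // reducer doubles as combiner
          job.setReducerClass(WordCountReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }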

24. YARN


 24.1 What is YARN
 24.2 YARN importance
 24.3 Where it is helpful
 24.4 YARN & Map Reduce difference

25. Apache PIG


 25.1 What it is
 25.2 Pig vs Map Reduce
 25.3 Pig vs SQL
 25.4 Data types in Pig
 25.5 Pig local mode execution
 25.6 Pig Map Reduce mode execution

26. PIG UDFs


 26.1 What is UDF
 26.2 Write UDF
 26.3 Use UDF
 26.4 Importance

27. PIG Filters


 27.1 What is Filter
 27.2 Write Filter
 27.3 Use Filter
 27.4 Importance

28. Load functions


 28.1 What is Load function
 28.2 Write Load function
 28.3 Use Load function
 28.4 Importance

29. Store functions


 29.1 Use Store function
 29.2 Importance

30. Apache HIVE


 30.1 What it is
 30.2 Architecture
 30.3 Driver
 30.4 Compiler
 30.5 Integration with Hadoop
 30.6 Hive Query Language
 30.7 Hive QL vs SQL
 30.8 Hive DDL and DML

31. Services in HIVE


 31.1 CLI
 31.2 Hive server
 31.3 HWI (Hive Web Interface)
 31.4 Metastore
 31.5 Metastore configuration

32. Metastore in HIVE


 32.1 Metastore
 32.2 Metastore configuration

33. Hive UDFs


 33.1 What is UDF
 33.2 Write UDF
 33.3 Use UDF
 33.4 Importance

34. Hive UDAFs


 34.1 Use UDAF
 34.2 Importance

35. Hive UDTFs


 35.1 Use UDTF
 35.2 Importance

36. Hive Partitions


 36.1 What are Hive partitions
 36.2 Importance
 36.3 How to write
 36.4 Limitations

37. Hive Buckets


 37.1 What is Hive bucket
 37.2 Importance
 37.3 How to write

38. Hive SerDe


 38.1 What is SerDe
 38.2 Importance
 38.3 How to write

39. Integration


 39.1 Hive and Hbase integration

40. Apache Zookeeper


 40.1 What it is
 40.2 Commands

41. Apache Hbase


 41.1 What it is
 41.2 Use cases
 41.3 Basics

42. Installation of Hbase


 42.1 Local mode
 42.2 Pseudo mode
 42.3 Cluster mode

43. Architecture of Hbase


 43.1 Architecture overview
 43.2 Storage

44. Usage of Hbase


 44.1 Key design
 44.2 Bloom filter
 44.3 Versioning
 44.4 Co-processors
 44.5 Filters

45. Hbase clients


 45.1 REST
 45.2 Thrift
 45.3 Hive
 45.4 Web based UI

46. Hbase Admin


 46.1 Schema definition
 46.2 CRUD operations

47. Apache SQOOP


 47.1 What it is
 47.2 Connecting to RDBMS using Sqoop
 47.3 Commands

48. Apache FLUME


 48.1 What it is
 48.2 Examples

49. Apache OOZIE


 49.1 What it is
 49.2 Executing workflow jobs
 49.3 Monitoring workflow jobs