update

parent 5baafeb2f4
commit 1b62994e5d
@@ -2,7 +2,7 @@
 To do the exercises, clone the repository:

 ```shell
-git clone https://git.wmi.amu.edu.pl/bigdata/apache_hadoop
+git clone https://git.wmi.amu.edu.pl/s1201683/hadoop_zaliczenie
 ```

 The goal of the exercise is to present an application built on the MapReduce programming model, using:
@@ -18,7 +18,7 @@ WordCount is the “Hello World equivalent” of the Big Data world. The exercise
 To do the exercises, copy the _books_ folder to HDFS:

 ```
 hdfs dfs -mkdir tmp
-hdfs dfs -copyFromLocal ~/apache_hadoop/mr/books tmp/books
+hdfs dfs -copyFromLocal ~/hadoop_zaliczenie/mr/books tmp/books
 ```

 ## 1. WordCount – Hadoop Streaming

 Hadoop Streaming lets users run a mapper and a reducer written in any programming language. The only requirement is that the corresponding interpreter is installed on every node of the cluster.
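With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output; between the two phases the framework sorts the map output by key. The sketch below imitates that contract locally. The names `word_count_mapper`, `sum_reducer` and `simulate_streaming` are hypothetical, and the in-memory `sorted` call is only a stand-in for Hadoop's shuffle, not how Hadoop implements it.

```python
from typing import Callable, Iterable, Iterator, List


def word_count_mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one "word<TAB>1" line per whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def sum_reducer(lines: Iterable[str]) -> Iterator[str]:
    # Sum the values of consecutive lines that share a key; this is only
    # correct because the input arrives sorted by key.
    current_key, total = None, 0
    for line in lines:
        key, value = line.split("\t", 1)
        if key == current_key:
            total += int(value)
        else:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, int(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"


def simulate_streaming(
    mapper: Callable[[Iterable[str]], Iterable[str]],
    reducer: Callable[[Iterable[str]], Iterable[str]],
    lines: Iterable[str],
) -> List[str]:
    # Local stand-in for the shuffle: sort the mapper output by its key
    # (the text before the first tab) before handing it to the reducer.
    shuffled = sorted(mapper(lines), key=lambda kv: kv.split("\t", 1)[0])
    return list(reducer(shuffled))


if __name__ == "__main__":
    text = ["the quick fox", "the lazy dog"]
    for out in simulate_streaming(word_count_mapper, sum_reducer, text):
        print(out)
```

Sorting on the text before the first tab mirrors Streaming's default of treating that prefix as the key. The same contract is commonly checked from the shell with a pipeline such as `cat input.txt | ./mapper.py | sort | ./reducer.py`, where `sort` plays the role of the shuffle (`input.txt` is a placeholder).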
BIN mr/.DS_Store (vendored): Binary file not shown.
BIN mr/python/.DS_Store (vendored, Normal file): Binary file not shown.
@@ -2,13 +2,17 @@
 import sys
 import re

 # input comes from STDIN (standard input)
 for line in sys.stdin:
     # remove leading and trailing whitespace
     line = line.strip()
-    words = re.findall(r'\b\w+\b', line)
+    # split the line into words
+    words = re.findall(r'\b\w+\b', line) # using regex to find words
     # increase counters
     for word in words:
+        # apply regex to remove non-alphanumeric characters and convert to lowercase
+        word = re.sub(r'[^a-zA-Z0-9]', '', word).lower()
         # write the results to STDOUT (standard output);
         # what we output here will be the input for the
         # Reduce step, i.e. the input for reducer.py
         print('%s\t%s' % (word, 1))
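The updated mapper first extracts `\w+` tokens and then deletes every character outside `[a-zA-Z0-9]`. In Python 3, `\w` also matches Unicode letters and the underscore, so on Polish text the second step can mangle words (`żółw` becomes `w`) or turn a token such as `_` into an empty string that is still printed as a key. The sketch below reproduces that normalization in a hypothetical `normalize_words` helper, adds a guard that skips empty results, and shows one possible Unicode-preserving alternative; the alternative branch is an assumption for illustration, not the commit's code.

```python
import re
from typing import List


def normalize_words(line: str, ascii_only: bool = True) -> List[str]:
    # Hypothetical helper mirroring the mapper: find \w+ tokens, clean, lowercase.
    result = []
    for word in re.findall(r'\b\w+\b', line.strip()):
        if ascii_only:
            # Same substitution as the mapper: drops every non-ASCII letter.
            word = re.sub(r'[^a-zA-Z0-9]', '', word).lower()
        else:
            # Alternative (assumption): tokens from \w+ contain no \W characters,
            # so in practice this only removes underscores and keeps Unicode letters.
            word = re.sub(r'[\W_]', '', word).lower()
        if word:  # skip tokens that became empty, e.g. "_"
            result.append(word)
    return result


if __name__ == "__main__":
    print(normalize_words("Żółw _ Dom"))                    # ['w', 'dom']
    print(normalize_words("Żółw _ Dom", ascii_only=False))  # ['żółw', 'dom']
```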
@@ -1,4 +1,4 @@
#!/usr/bin/env python
from operator import itemgetter
import sys
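This hunk shows only the reducer's header (the `itemgetter` and `sys` imports); its body is not part of the diff. The following is therefore just a sketch of a typical streaming WordCount reducer, assuming it receives `word<TAB>count` lines already sorted by word. The function names `parse_pairs` and `reduce_counts` are hypothetical, and `itertools.groupby` is used here for grouping, which may differ from the actual file.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def parse_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Turn "word\tcount" lines into (word, count) tuples,
    # skipping lines that are malformed.
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue
        try:
            count = int(parts[1])
        except ValueError:
            continue
        yield parts[0], count


def reduce_counts(lines: Iterable[str]) -> Iterator[str]:
    # groupby merges only *adjacent* equal keys, so this relies on the
    # input being sorted by word, as Hadoop's shuffle guarantees.
    for word, group in groupby(parse_pairs(lines), key=itemgetter(0)):
        total = sum(count for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    sample = ["dog\t1", "the\t1", "the\t1"]
    for out in reduce_counts(sample):
        print(out)
```

In a real streaming script the same function would be fed `sys.stdin` and each result printed to standard output.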