Added copied files to the cloud repository

commit b6145320e9
13
README.md
Normal file
@ -0,0 +1,13 @@
# Data processing in the public cloud

## Introduction to data lakes

### Project structure ###

* jupyter - notebooks with the exercises
* labs - lab scripts: test data, a data generator and basic Terraform files (starter)
* pdf - presentation and exercise materials
* testing-stack - Docker Compose with Kafka and Kafka Connect definitions

### Prerequisites ###

Set up the environment following the instructions in:

`./pdf/LABS Setup - Przetwarzanie Danych w chmurze publicznej.pdf`
633
jupyter/UAM_1_avro.ipynb
Normal file
@ -0,0 +1,633 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 AVRO data format\n",
    "\n",
    "This exercise demonstrates AVRO data schemas and complex types (maps, lists, nested structures), and introduces Glue / Athena.\n",
    "\n",
    "## Exercise steps\n",
    "* set up a Python runtime environment (Anaconda with Python 3 is suggested)\n",
    "* install all required libraries\n",
    "\n",
    "<code>\n",
    "% conda create -n uam-datalake python=3.8\n",
    "% conda activate uam-datalake\n",
    "% pip install -r ./datalake-uam/jupyter/requirements.txt\n",
    "</code>\n",
    "\n",
    "* log in to the AWS console, create a test bucket and a Glue database, and fill in their names in the script below\n",
    "* generate test data in the chosen AVRO schema\n",
    "* write the data to files on S3 under s3://<your-bucket-name>/EventName/namespace=xxx/year=YYYY/month=MM/day=DD/version=VVV\n",
    "* register the table in Glue using boto3 or a crawler (via the AWS console in the browser)\n",
    "* configure the default Athena WorkGroup (primary): you must set an S3 location for query results, see https://docs.aws.amazon.com/athena/latest/ug/getting-started.html\n",
    "* check the table definition and make sure the partitions are registered (run MSCK REPAIR TABLE in Athena, or Load partitions in the console)\n",
    "* check how much data the table holds (select count(*) from table) and note the Data scanned value in bytes\n",
    "* query the table with the predicate day=1 (partition elimination) and check the amount of data scanned (to compare with exercise 2, Parquet)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "\n",
    "REGION = \"us-east-1\"\n",
    "\n",
    "# fill in your AWS credentials before running the notebook\n",
    "session_kwargs = {\n",
    "    \"aws_access_key_id\":\"\",\n",
    "    \"aws_secret_access_key\":\"\",\n",
    "    \"aws_session_token\":\"\",\n",
    "    \"region_name\": REGION\n",
    "}\n",
    "\n",
    "session = boto3.Session(**session_kwargs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "from faker import Faker\n",
    "from botocore.exceptions import ClientError\n",
    "from avro.datafile import DataFileReader, DataFileWriter\n",
    "from avro.io import DatumReader, DatumWriter\n",
    "import time\n",
    "import io\n",
    "import datetime\n",
    "from avro.schema import Parse\n",
    "\n",
    "\n",
    "fake = Faker()\n",
    "fake.seed_instance(4321)\n",
    "\n",
    "# replace with your own test bucket and Glue database\n",
    "S3_BUCKET = \"datalake-dev-920628590621-us-east-1\"\n",
    "\n",
    "TEST_DB = 'datalake_dev_jk'\n",
    "TEST_TABLE_NAME = 'avro_uam_test'\n",
    "EVENT_NAME = \"UamTestEvent\"\n",
    "\n",
    "\n",
    "s3_client = session.client(\"s3\")\n",
    "glue_client = session.client(\"glue\")\n",
    "\n",
    "\n",
    "def tear_down_test_db(database=TEST_DB):\n",
    "    # drop the Glue database if it exists, then create it again\n",
    "    db_names = [x[\"Name\"] for x in glue_client.get_databases()[\"DatabaseList\"]]\n",
    "    if database in db_names:\n",
    "        glue_client.delete_database(Name=database)\n",
    "        print(\"{} deleted\".format(database))\n",
    "\n",
    "    glue_client.create_database(DatabaseInput={'Name': database})\n",
    "    print(\"%s db recreated\" % database)\n",
    "\n",
    "def tear_down_test_table(database=TEST_DB, table_name=TEST_TABLE_NAME):\n",
    "    # drop the test table if it exists\n",
    "    tbl_list = [x[\"Name\"] for x in glue_client.get_tables(DatabaseName=database)[\"TableList\"]]\n",
    "    if table_name in tbl_list:\n",
    "        glue_client.delete_table(DatabaseName=database, Name=table_name)\n",
    "        print(\"test table {} deleted\".format(table_name))\n",
    "    else:\n",
    "        print(\"tbl %s not found\" % table_name)\n",
    "\n",
    "def tear_down_s3(bucket=S3_BUCKET, prefix=EVENT_NAME):\n",
    "    # delete every object under the event prefix in the test bucket\n",
    "    s3 = boto3.resource('s3', **session_kwargs)\n",
    "    bucket = s3.Bucket(bucket)\n",
    "    bucket.objects.filter(Prefix=prefix).delete()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. AVRO schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "UamTestEvent com.uam.datalake.v1 1.0.2\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'mox-meta': {'version': '1.0.2', 'type': 'ENTITY_SNAPSHOT'},\n",
       " 'namespace': 'com.uam.datalake.v1',\n",
       " 'type': 'record',\n",
       " 'name': 'UamTestEvent',\n",
       " 'fields': [{'name': 'customerId',\n",
       " 'type': {'type': 'string', 'avro.java.string': 'String'}},\n",
       " {'name': 'isActive',\n",
       " 'type': 'boolean',\n",
       " 'doc': 'a boolean flag if the Customer is active'},\n",
       " {'name': 'age', 'type': 'int'},\n",
       " {'name': 'balance', 'type': 'float'},\n",
       " {'name': 'accountBalance_logical_dec',\n",
       " 'type': {'type': 'bytes',\n",
       " 'logicalType': 'decimal',\n",
       " 'precision': 20,\n",
       " 'scale': 4}},\n",
       " {'name': 'array_of_strings',\n",
       " 'type': ['null',\n",
       " {'type': 'array',\n",
       " 'items': {'type': 'string', 'avro.java.string': 'String'}}],\n",
       " 'default': None},\n",
       " {'name': 'paymentDetails',\n",
       " 'type': ['null',\n",
       " {'type': 'record',\n",
       " 'name': 'PaymentDetails',\n",
       " 'fields': [{'name': 'counterPartyName',\n",
       " 'type': ['null', {'type': 'string', 'avro.java.string': 'String'}],\n",
       " 'default': None},\n",
       " {'name': 'groupingId',\n",
       " 'type': ['null', {'type': 'string', 'avro.java.string': 'String'}],\n",
       " 'default': None},\n",
       " {'name': 'payeeId',\n",
       " 'type': ['null', {'type': 'string', 'avro.java.string': 'String'}],\n",
       " 'default': None},\n",
       " {'name': 'message',\n",
       " 'type': ['null', {'type': 'string', 'avro.java.string': 'String'}],\n",
       " 'default': None},\n",
       " {'name': 'type',\n",
       " 'type': {'type': 'enum',\n",
       " 'name': 'PaymentType',\n",
       " 'symbols': ['UNKNOWN', 'ONE', 'TWO']}},\n",
       " {'name': 'otherAccountId',\n",
       " 'type': ['null', {'type': 'string', 'avro.java.string': 'String'}],\n",
       " 'default': None}]}],\n",
       " 'default': None},\n",
       " {'name': 'parameters',\n",
       " 'type': ['null',\n",
       " {'type': 'map',\n",
       " 'avro.java.string': 'String',\n",
       " 'values': {'type': 'string', 'avro.java.string': 'String'}}],\n",
       " 'default': None}]}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# test Avro Schema with all important cases\n",
    "\n",
    "schema_string = \"\"\"\n",
    "{\n",
    "  \"mox-meta\":{\n",
    "    \"version\":\"1.0.2\",\n",
    "    \"type\":\"ENTITY_SNAPSHOT\"\n",
    "  },\n",
    "  \"namespace\":\"com.uam.datalake.v1\",\n",
    "  \"type\":\"record\",\n",
    "  \"name\":\"\",\n",
    "  \"fields\":[\n",
    "    {\n",
    "      \"name\":\"customerId\",\n",
    "      \"type\":{\n",
    "        \"type\":\"string\",\n",
    "        \"avro.java.string\":\"String\"\n",
    "      }\n",
    "    },\n",
    "    {\n",
    "      \"name\":\"isActive\",\n",
    "      \"type\":\"boolean\",\n",
    "      \"doc\":\"a boolean flag if the Customer is active\"\n",
    "    },\n",
    "    {\n",
    "      \"name\":\"age\",\n",
    "      \"type\":\"int\"\n",
    "    },\n",
    "    {\n",
    "      \"name\":\"balance\",\n",
    "      \"type\":\"float\"\n",
    "    },\n",
    "    {\n",
    "      \"name\":\"accountBalance_logical_dec\",\n",
    "      \"type\":{\n",
    "        \"type\":\"bytes\",\n",
    "        \"logicalType\":\"decimal\",\n",
    "        \"precision\":20,\n",
    "        \"scale\":4\n",
    "      }\n",
    "    },\n",
    "    {\n",
    "      \"name\":\"array_of_strings\",\n",
    "      \"type\":[\n",
    "        \"null\",\n",
    "        {\n",
    "          \"type\":\"array\",\n",
    "          \"items\":{\n",
    "            \"type\":\"string\",\n",
    "            \"avro.java.string\":\"String\"\n",
    "          }\n",
    "        }\n",
    "      ],\n",
    "      \"default\":null\n",
    "    },\n",
    "    {\n",
    "      \"name\":\"paymentDetails\",\n",
    "      \"type\":[\n",
    "        \"null\",\n",
    "        {\n",
    "          \"type\":\"record\",\n",
    "          \"name\":\"PaymentDetails\",\n",
    "          \"fields\":[\n",
    "            {\n",
    "              \"name\":\"counterPartyName\",\n",
    "              \"type\":[\n",
    "                \"null\",\n",
    "                {\n",
    "                  \"type\":\"string\",\n",
    "                  \"avro.java.string\":\"String\"\n",
    "                }\n",
    "              ],\n",
    "              \"default\":null\n",
    "            },\n",
    "            {\n",
    "              \"name\":\"groupingId\",\n",
    "              \"type\":[\n",
    "                \"null\",\n",
    "                {\n",
    "                  \"type\":\"string\",\n",
    "                  \"avro.java.string\":\"String\"\n",
    "                }\n",
    "              ],\n",
    "              \"default\":null\n",
    "            },\n",
    "            {\n",
    "              \"name\":\"payeeId\",\n",
    "              \"type\":[\n",
    "                \"null\",\n",
    "                {\n",
    "                  \"type\":\"string\",\n",
    "                  \"avro.java.string\":\"String\"\n",
    "                }\n",
    "              ],\n",
    "              \"default\":null\n",
    "            },\n",
    "            {\n",
    "              \"name\":\"message\",\n",
    "              \"type\":[\n",
    "                \"null\",\n",
    "                {\n",
    "                  \"type\":\"string\",\n",
    "                  \"avro.java.string\":\"String\"\n",
    "                }\n",
    "              ],\n",
    "              \"default\":null\n",
    "            },\n",
    "            {\n",
    "              \"name\":\"type\",\n",
    "              \"type\":{\n",
    "                \"type\":\"enum\",\n",
    "                \"name\":\"PaymentType\",\n",
    "                \"symbols\":[\n",
    "                  \"UNKNOWN\",\n",
    "                  \"ONE\",\n",
    "                  \"TWO\"\n",
    "                ]\n",
    "              }\n",
    "            },\n",
    "            {\n",
    "              \"name\":\"otherAccountId\",\n",
    "              \"type\":[\n",
    "                \"null\",\n",
    "                {\n",
    "                  \"type\":\"string\",\n",
    "                  \"avro.java.string\":\"String\"\n",
    "                }\n",
    "              ],\n",
    "              \"default\":null\n",
    "            }\n",
    "          ]\n",
    "        }\n",
    "      ],\n",
    "      \"default\":null\n",
    "    },\n",
    "    {\n",
    "      \"name\":\"parameters\",\n",
    "      \"type\":[\n",
    "        \"null\",\n",
    "        {\n",
    "          \"type\":\"map\",\n",
    "          \"avro.java.string\":\"String\",\n",
    "          \"values\":{\n",
    "            \"type\":\"string\",\n",
    "            \"avro.java.string\":\"String\"\n",
    "          }\n",
    "        }\n",
    "      ],\n",
    "      \"default\":null\n",
    "    }\n",
    "  ]\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "schema = json.loads(schema_string)\n",
    "schema[\"name\"] = EVENT_NAME\n",
    "\n",
    "RECORD_NAME = schema[\"name\"]\n",
    "NAMESPACE = schema[\"namespace\"]\n",
    "VERSION = schema[\"mox-meta\"][\"version\"]\n",
    "\n",
    "print('%s %s %s' % (RECORD_NAME, NAMESPACE, VERSION))\n",
    "schema"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Generating test data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# generate some avro data (buffer file) based on the above schema\n",
    "\n",
    "avro_schema = Parse(json.dumps(schema))\n",
    "buf = io.BytesIO()\n",
    "writer = DataFileWriter(buf, DatumWriter(), avro_schema)\n",
    "\n",
    "for x in range(0, 10000):\n",
    "\n",
    "    customer_id = fake.uuid4()\n",
    "    amount = fake.pydecimal(left_digits=8, right_digits=4)\n",
    "    # unscaled integer for decimal(20,4): shift the decimal point by the scale (4);\n",
    "    # unlike stripping the '.', this stays correct when fewer than 4 fractional digits are present\n",
    "    amount_int = int(amount.scaleb(4))\n",
    "\n",
    "    strings_array = [fake.first_name() for _ in range(0, fake.random.randint(1, 5))]\n",
    "\n",
    "    paymentDetails = {'counterPartyName': customer_id,\n",
    "                      'groupingId': str(fake.uuid4()), 'payeeId': None, 'message': None,\n",
    "                      'type': 'ONE', 'otherAccountId': str(fake.uuid4())}\n",
    "\n",
    "    randint = fake.random.randint(20, 70)\n",
    "\n",
    "    array_of_structs = [{\"field1\": \"one\"}, {\"field1\": \"two\"}]\n",
    "    customer = {\n",
    "        \"customerId\": customer_id,\n",
    "        \"isActive\": fake.random.choice([True, False]),\n",
    "        \"age\": randint,\n",
    "        \"balance\": fake.random.random() * 123,\n",
    "        # Avro decimal on bytes: big-endian two's-complement of the unscaled integer\n",
    "        \"accountBalance_logical_dec\": amount_int.to_bytes(amount_int.bit_length() // 8 + 1, byteorder='big',\n",
    "                                                          signed=True),\n",
    "        \"array_of_strings\": strings_array,\n",
    "        \"paymentDetails\": paymentDetails,\n",
    "        \"parameters\": {\"key1\": \"value1\", \"key2\": \"value2\"}\n",
    "    }\n",
    "    writer.append(customer)\n",
    "\n",
    "writer.flush()\n",
    "raw_bytes = buf.getvalue()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=1/version=1.0.2/CustData_d6f44c22-5c87-4e14-956b-ad7d985226d0.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=2/version=1.0.2/CustData_cf0edf74-e76e-458f-a1d2-e092275a719c.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=3/version=1.0.2/CustData_f9ed4ad7-ed1c-4431-b9ab-317387cdb5af.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=4/version=1.0.2/CustData_d89b1c7f-66e3-4522-a603-1630abbf24fa.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=5/version=1.0.2/CustData_92483d0c-5254-4825-9bbb-59545fc6e4dd.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=6/version=1.0.2/CustData_3473649b-97c5-4597-965b-672a11cdad73.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=7/version=1.0.2/CustData_ecc999f8-eda7-4782-b844-17809980f34c.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=8/version=1.0.2/CustData_b95fd45c-8f0d-4612-8f8b-131437895013.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=9/version=1.0.2/CustData_5a5ef6ba-1576-4450-82a0-3f6f9a10a7c8.avro\n",
      "uploaded UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=10/version=1.0.2/CustData_4e5dc9b8-6eb7-4fa4-93e0-61faf874b698.avro\n"
     ]
    }
   ],
   "source": [
    "import logging\n",
    "\n",
    "tear_down_s3()\n",
    "\n",
    "# upload the same Avro file into 10 daily partitions (day=1..10)\n",
    "for i in range(1, 11):\n",
    "\n",
    "    target_key_name = '{record_name}/namespace={ns}/year=2020/month=2/day={day}/version={ver}/CustData_{rand}.avro'.format(\n",
    "        record_name=RECORD_NAME, ns=NAMESPACE, day=i, ver=VERSION, rand=fake.uuid4())\n",
    "    try:\n",
    "        response = s3_client.put_object(Body=raw_bytes, Bucket=S3_BUCKET, Key=target_key_name)\n",
    "        print(\"uploaded %s\" % target_key_name)\n",
    "    except ClientError as e:\n",
    "        logging.error(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Reading Avro data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "UamTestEvent/namespace=com.uam.datalake.v1/year=2020/month=2/day=10/version=1.0.2/CustData_4e5dc9b8-6eb7-4fa4-93e0-61faf874b698.avro\n",
      "{'customerId': 'cc733c92-6853-45f6-8e49-bec741188ebb', 'isActive': True, 'age': 58, 'balance': 49.34309768676758, 'accountBalance_logical_dec': b'7L\\xbc\\xff\\xf3', 'array_of_strings': ['Rebecca'], 'paymentDetails': {'counterPartyName': 'cc733c92-6853-45f6-8e49-bec741188ebb', 'groupingId': '9626bf79-2f97-4c0c-9aae-de080adab7df', 'payeeId': None, 'message': None, 'type': 'ONE', 'otherAccountId': '69261bc2-4a71-4de7-bc8b-1beb0d9320ac'}, 'parameters': {'key1': 'value1', 'key2': 'value2'}}\n"
     ]
    }
   ],
   "source": [
    "# read back the last uploaded file and print its first record\n",
    "print(target_key_name)\n",
    "\n",
    "obj = s3_client.get_object(Bucket=S3_BUCKET, Key=target_key_name)\n",
    "record_raw = obj['Body'].read()\n",
    "\n",
    "reader = DataFileReader(io.BytesIO(record_raw), DatumReader())\n",
    "for line in reader:\n",
    "    print(line)\n",
    "    break\n",
    "\n",
    "# the writer schema is stored in the Avro file header\n",
    "avro_schema = reader.meta[\"avro.schema\"]\n",
    "reader.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b'{\"type\": \"record\", \"mox-meta\": {\"version\": \"1.0.2\", \"type\": \"ENTITY_SNAPSHOT\"}, \"name\": \"UamTestEvent\", \"namespace\": \"com.uam.datalake.v1\", \"fields\": [{\"type\": {\"type\": \"string\", \"avro.java.string\": \"String\"}, \"name\": \"customerId\"}, {\"type\": \"boolean\", \"name\": \"isActive\", \"doc\": \"a boolean flag if the Customer is active\"}, {\"type\": \"int\", \"name\": \"age\"}, {\"type\": \"float\", \"name\": \"balance\"}, {\"type\": {\"type\": \"bytes\", \"logicalType\": \"decimal\", \"precision\": 20, \"scale\": 4}, \"name\": \"accountBalance_logical_dec\"}, {\"type\": [\"null\", {\"type\": \"array\", \"items\": {\"type\": \"string\", \"avro.java.string\": \"String\"}}], \"name\": \"array_of_strings\", \"default\": null}, {\"type\": [\"null\", {\"type\": \"record\", \"name\": \"PaymentDetails\", \"namespace\": \"com.uam.datalake.v1\", \"fields\": [{\"type\": [\"null\", {\"type\": \"string\", \"avro.java.string\": \"String\"}], \"name\": \"counterPartyName\", \"default\": null}, {\"type\": [\"null\", {\"type\": \"string\", \"avro.java.string\": \"String\"}], \"name\": \"groupingId\", \"default\": null}, {\"type\": [\"null\", {\"type\": \"string\", \"avro.java.string\": \"String\"}], \"name\": \"payeeId\", \"default\": null}, {\"type\": [\"null\", {\"type\": \"string\", \"avro.java.string\": \"String\"}], \"name\": \"message\", \"default\": null}, {\"type\": {\"type\": \"enum\", \"name\": \"PaymentType\", \"namespace\": \"com.uam.datalake.v1\", \"symbols\": [\"UNKNOWN\", \"ONE\", \"TWO\"]}, \"name\": \"type\"}, {\"type\": [\"null\", {\"type\": \"string\", \"avro.java.string\": \"String\"}], \"name\": \"otherAccountId\", \"default\": null}]}], \"name\": \"paymentDetails\", \"default\": null}, {\"type\": [\"null\", {\"type\": \"map\", \"avro.java.string\": \"String\", \"values\": {\"type\": \"string\", \"avro.java.string\": \"String\"}}], \"name\": \"parameters\", \"default\": null}]}'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avro_schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tbl avro_uam_test not found\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'ResponseMetadata': {'RequestId': '1d5a05a1-0253-436c-a85a-d21f334d51ae',\n",
       " 'HTTPStatusCode': 200,\n",
       " 'HTTPHeaders': {'date': 'Sat, 24 Apr 2021 11:20:04 GMT',\n",
       " 'content-type': 'application/x-amz-json-1.1',\n",
       " 'content-length': '2',\n",
       " 'connection': 'keep-alive',\n",
       " 'x-amzn-requestid': '1d5a05a1-0253-436c-a85a-d21f334d51ae'},\n",
       " 'RetryAttempts': 0}}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# register the Glue table with the Avro schema\n",
    "tear_down_test_table()  # create or replace\n",
    "\n",
    "glue_client.create_table(\n",
    "    DatabaseName=TEST_DB,\n",
    "    TableInput={\n",
    "        \"Name\": TEST_TABLE_NAME,\n",
    "        'Owner': 'owner',\n",
    "        'StorageDescriptor': {\n",
    "            'Columns': [\n",
    "                {'Name': 'customerId', 'Type': 'string'},\n",
    "                {'Name': 'isActive', 'Type': 'boolean'},\n",
    "                {'Name': 'age', 'Type': 'int'},\n",
    "                {'Name': 'balance', 'Type': 'float'},\n",
    "                {'Name': 'accountBalance_logical_dec', 'Type': 'decimal(20,4)'},\n",
    "                {'Name': 'array_of_strings', 'Type': 'array<string>'},\n",
    "                {'Name': 'paymentdetails',\n",
    "                 'Type': 'struct<counterpartyname:string,groupingid:string,payeeid:string,message:string,type:string,otheraccountid:string>'},\n",
    "                {'Name': 'parameters', 'Type': 'map<string,string>'}\n",
    "            ],\n",
    "            'Location': 's3://{}/{}/namespace={}/'.format(S3_BUCKET, RECORD_NAME, NAMESPACE),\n",
    "            'InputFormat': 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat',\n",
    "            'OutputFormat': 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat',\n",
    "            'Compressed': False,\n",
    "            'NumberOfBuckets': -1,\n",
    "            'SerdeInfo': {\n",
    "                'SerializationLibrary': 'org.apache.hadoop.hive.serde2.avro.AvroSerDe',\n",
    "                'Parameters': {\n",
    "                    'avro.schema.literal': json.dumps(schema),\n",
    "                    'serialization.format': '1'\n",
    "                }\n",
    "            },\n",
    "            'BucketColumns': [],\n",
    "            'SortColumns': [],\n",
    "        },\n",
    "        'PartitionKeys': [\n",
    "            {'Name': 'year', 'Type': 'int'},\n",
    "            {'Name': 'month', 'Type': 'int'},\n",
    "            {'Name': 'day', 'Type': 'int'},\n",
    "            {'Name': 'version', 'Type': 'string'}\n",
    "        ],\n",
    "        'TableType': 'EXTERNAL_TABLE',\n",
    "        'Parameters': {\n",
    "            'avro.schema.literal': json.dumps(schema),\n",
    "            'classification': 'avro',\n",
    "            'compressionType': 'none',\n",
    "        }\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now you can find the table in the Glue Data Catalog and query it with Athena (remember to load the partitions first)\n",
    "\n",
    "```\n",
    "SELECT *, paymentdetails.groupingid, \"$path\"\n",
    "FROM \"avro_uam_test\" AS a\n",
    "CROSS JOIN UNNEST(array_of_strings) AS t(names)\n",
    "WHERE customerid = '0d6f913f-9364-4898-875e-d07311d1e300' AND day = 1\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
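The notebook leaves the binary encoding to the `avro` library. As background for the format part of the exercise, here is a minimal standard-library sketch of how the Avro specification encodes `int`/`long` values: a zig-zag mapping followed by a variable-length base-128 varint. It also shows how a `string` is built on top of that, as a `long` byte length followed by the UTF-8 bytes. The function names are illustrative and not part of any library.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0->0, -1->1, 1->2, -2->3, ..."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Avro int/long binary encoding: zig-zag, then 7 bits per byte, low bits first."""
    if not -(1 << 63) <= n < (1 << 63):
        raise OverflowError("Avro long values are 64-bit")
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long for a single encoded value."""
    z = shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1)  # undo the zig-zag mapping


def encode_string(s: str) -> bytes:
    """Avro string: byte length encoded as a long, then the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


for value in (0, -1, 1, -64, 64):
    print(value, encode_long(value).hex())
```

The values 0, -1, 1, -64 and 64 encode to `00`, `01`, `02`, `7f` and `80 01`, matching the table in the Avro specification. Because small magnitudes of either sign fit in one byte, this encoding keeps lengths and counts compact.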
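The `accountBalance_logical_dec` column is an Avro `bytes` field with the `decimal` logical type (precision 20, scale 4). In the notebook's output the field is read back as raw bytes (`b'7L\xbc\xff\xf3'`), so converting to and from a number has to be done by hand. The stored bytes are the big-endian two's-complement form of the unscaled integer (value × 10^scale). Below is a minimal sketch of both directions, assuming scale 4 as in the schema. `encode_decimal` and `decode_decimal` are hypothetical helper names.

```python
from decimal import Decimal


def encode_decimal(value: Decimal, scale: int = 4) -> bytes:
    """Avro decimal on `bytes`: two's-complement, big-endian unscaled integer."""
    quantized = value.quantize(Decimal(1).scaleb(-scale))  # round to `scale` fractional digits
    unscaled = int(quantized.scaleb(scale))                  # e.g. 1.2345 -> 12345 for scale 4
    length = unscaled.bit_length() // 8 + 1                 # same sizing rule as the notebook
    return unscaled.to_bytes(length, byteorder="big", signed=True)


def decode_decimal(raw: bytes, scale: int = 4) -> Decimal:
    """Inverse: read the signed big-endian integer and shift the decimal point back."""
    return Decimal(int.from_bytes(raw, byteorder="big", signed=True)).scaleb(-scale)


print(decode_decimal(b"7L\xbc\xff\xf3"))  # bytes shown in the notebook's reading cell
print(encode_decimal(Decimal("-0.0001")).hex())
```

Decoding the bytes from the notebook's output gives `23751065.5987`, an eight-digit integer part as produced by `pydecimal(left_digits=8, right_digits=4)`. The smallest negative step, `-0.0001`, encodes to the single byte `ff`.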
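The `Columns` list passed to `glue_client.create_table` is written by hand and must be kept in sync with the Avro schema. The sketch below derives it from the schema instead. It assumes the type mapping commonly used by the Hive AvroSerDe: `long` → `bigint`, `bytes` → `binary`, `enum` → `string`, `["null", T]` → `T`, and the decimal logical type → `decimal(p,s)`. `avro_to_hive` and `glue_columns` are hypothetical helpers. Named-type references, `fixed` and multi-branch unions are deliberately left out.

```python
from typing import Any, Dict, List

# Avro primitive -> Hive/Athena type, following the usual AvroSerDe mapping.
PRIMITIVES = {
    "string": "string", "boolean": "boolean", "int": "int", "long": "bigint",
    "float": "float", "double": "double", "bytes": "binary",
}


def avro_to_hive(avro_type: Any) -> str:
    """Translate one Avro type definition into a Hive DDL type string."""
    if isinstance(avro_type, str):
        if avro_type not in PRIMITIVES:
            raise ValueError(f"unsupported type or named-type reference: {avro_type}")
        return PRIMITIVES[avro_type]
    if isinstance(avro_type, list):
        # Optional field ["null", T]: Hive columns are nullable anyway, so keep T.
        branches = [t for t in avro_type if t != "null"]
        if len(branches) != 1:
            raise ValueError('only ["null", T] unions are handled in this sketch')
        return avro_to_hive(branches[0])
    if avro_type.get("logicalType") == "decimal":
        return f"decimal({avro_type['precision']},{avro_type.get('scale', 0)})"
    kind = avro_type["type"]
    if kind == "enum":
        return "string"
    if kind == "array":
        return f"array<{avro_to_hive(avro_type['items'])}>"
    if kind == "map":  # Avro map keys are always strings
        return f"map<string,{avro_to_hive(avro_type['values'])}>"
    if kind == "record":
        fields = ",".join(f"{f['name'].lower()}:{avro_to_hive(f['type'])}" for f in avro_type["fields"])
        return f"struct<{fields}>"
    # e.g. {"type": "string", "avro.java.string": "String"}
    return avro_to_hive(kind)


def glue_columns(schema: Dict[str, Any]) -> List[Dict[str, str]]:
    """Glue StorageDescriptor 'Columns' for the top-level fields of a record schema."""
    return [{"Name": f["name"], "Type": avro_to_hive(f["type"])} for f in schema["fields"]]
```

Applied to the notebook's `schema` dict, `glue_columns(schema)` should reproduce the hand-written column types. The only expected difference is column-name case, since Athena works with lower-case column names.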
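The exercise asks you to make sure the partitions are registered, for example with MSCK REPAIR TABLE, which lists what is under the table location. An alternative is to add each partition explicitly. The sketch below builds the S3 prefix used by the notebook's upload loop and the matching Athena `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement. The function names are hypothetical. The statement still has to be run in Athena, either in the console or through `start_query_execution`.

```python
def partition_prefix(record_name: str, namespace: str, year: int, month: int, day: int, version: str) -> str:
    """S3 key prefix in the same Hive-style layout as the notebook's upload loop."""
    return f"{record_name}/namespace={namespace}/year={year}/month={month}/day={day}/version={version}/"


def add_partition_ddl(table: str, bucket: str, record_name: str, namespace: str,
                      year: int, month: int, day: int, version: str) -> str:
    """Athena DDL registering one partition; year/month/day are int keys, version is a string."""
    location = f"s3://{bucket}/{partition_prefix(record_name, namespace, year, month, day, version)}"
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (year={year}, month={month}, day={day}, version='{version}') "
            f"LOCATION '{location}'")


for day in range(1, 11):  # the notebook uploads day=1..10 of February 2020
    print(add_partition_ddl("avro_uam_test", "<your-bucket>", "UamTestEvent",
                            "com.uam.datalake.v1", 2020, 2, day, "1.0.2"))
```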
606
jupyter/UAM_2_parquet.ipynb
Normal file
File diff suppressed because one or more lines are too long
41
jupyter/UAM_3_Kafka_Connect.md
Normal file
@ -0,0 +1,41 @@
### Runbook: Kafka stack + Kafka Connect

1. Start the stack

```
docker-compose -f docker-compose.yml up
```

2. List the topics on Kafka

```
./kafka-topics.sh --bootstrap-server localhost:9092 --list
```

3. Create a new topic: json.test.topic (the simple connector only picks up topics matching `json.*`)

```
./kafka-topics.sh --bootstrap-server localhost:9092 --topic json.test.topic --create --partitions 3 --replication-factor 1
```

4. Connect a console producer to the json.test.topic topic

```
./kafka-console-producer.sh --broker-list localhost:9092 --topic json.test.topic
```

5. Connect a console consumer to the json.test.topic topic

```
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic json.test.topic --from-beginning --property print.timestamp=true --property print.key=true
```

6. Deploy the connectors

First deploy the simple connector: send the UAM_3_sink_connector_simple.json definition to http://localhost:8083/connectors/ with a POST request, e.g. from Postman.

Now write a few messages and wait until the connector flushes them to MinIO.

7. Follow the MinIO logs and check the container status

```
docker-compose logs -f minio
docker-compose ps
```
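Step 6 uses Postman to POST the connector definition. The same request can be sent from Python using only the standard library. This minimal sketch assumes the Kafka Connect REST API is reachable at `http://localhost:8083`, as in the runbook, and sends `POST /connectors` with a JSON body containing `name` and `config`. The helper names are hypothetical.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083/connectors"  # Kafka Connect REST endpoint used in step 6


def load_connector(path: str) -> dict:
    """Read a connector definition such as UAM_3_sink_connector_simple.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_create_request(connector: dict, url: str = CONNECT_URL) -> urllib.request.Request:
    """POST request that creates the connector described by {"name": ..., "config": {...}}."""
    return urllib.request.Request(
        url,
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deploy(path: str) -> dict:
    """Send the request; Connect answers with the created connector's name, config and tasks."""
    with urllib.request.urlopen(build_create_request(load_connector(path))) as resp:
        return json.load(resp)


# Usage, with the stack from step 1 running:
# print(deploy("UAM_3_sink_connector_simple.json"))
```

Posting a connector whose name already exists is rejected with 409 Conflict. To update an existing connector, use `PUT /connectors/<name>/config` with just the `config` object instead.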
39
jupyter/UAM_3_sink_connector_SMT.json
Normal file
@ -0,0 +1,39 @@
{
  "name": "sink-s3-bytes",
  "config": {
    "timestamp.extractor": "Record",
    "locale": "US",
    "timezone": "UTC",
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "3",
    "s3.region": "ap-southeast-1",
    "s3.bucket.name": "dev-data-raw",
    "s3.acl.canned": "bucket-owner-full-control",
    "s3.part.size": "5242880",
    "flush.size": "10",
    "rotate.interval.ms": "3600000",
    "rotate.schedule.interval.ms": "3000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter.schemas.enable": "false",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "key.converter.schemas.enable": "false",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "path.format": "'year'=YYYY/'month'=M/'day'=d/'hour'=H",
    "partition.duration.ms": "3600000",
    "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
    "schema.compatibility": "NONE",
    "name": "sink-s3-bytes",
    "topics.regex": ".*",
    "topics.dir": "sink-s3-bytes",
    "transforms": "MakeMap, InsertMetadata",
    "transforms.MakeMap.type": "org.apache.kafka.connect.transforms.HoistField$Value",
    "transforms.MakeMap.field": "msg_payload",
    "transforms.InsertMetadata.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.InsertMetadata.partition.field": "msg_partition",
    "transforms.InsertMetadata.offset.field": "msg_offset",
    "transforms.InsertMetadata.timestamp.field": "msg_ts",
    "store.url": "http://minio:9000"
  }
}
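In `UAM_3_sink_connector_SMT.json` the two single message transforms run in the order listed in `transforms`. First, `HoistField$Value` wraps the whole (byte-array) value in a field called `msg_payload`. Then `InsertField$Value` adds the Kafka partition, offset and timestamp as `msg_partition`, `msg_offset` and `msg_ts`. The following is a conceptual simulation on plain Python dicts, not the Connect API. The real transforms operate on Connect records with schemas, and this sketch does not model how `JsonFormat` renders the bytes and the timestamp.

```python
from typing import Any, Dict


def hoist_field(value: Any, field: str = "msg_payload") -> Dict[str, Any]:
    """Conceptual HoistField$Value: wrap the entire record value in one named field."""
    return {field: value}


def insert_metadata(value: Dict[str, Any], partition: int, offset: int, timestamp_ms: int) -> Dict[str, Any]:
    """Conceptual InsertField$Value with partition.field, offset.field and timestamp.field set."""
    out = dict(value)
    out["msg_partition"] = partition
    out["msg_offset"] = offset
    out["msg_ts"] = timestamp_ms
    return out


def apply_transforms(value: Any, partition: int, offset: int, timestamp_ms: int) -> Dict[str, Any]:
    # "transforms": "MakeMap, InsertMetadata" -> MakeMap first, then InsertMetadata
    return insert_metadata(hoist_field(value), partition, offset, timestamp_ms)


print(apply_transforms(b'{"col1": "hello"}', partition=2, offset=42, timestamp_ms=1619263204000))
```

This design leaves the original payload untouched and makes every line in the sink traceable to its Kafka partition and offset. That is useful here because `topics.regex` is `.*`, so payload formats can differ between topics.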
27
jupyter/UAM_3_sink_connector_simple.json
Normal file
@ -0,0 +1,27 @@
{
  "name": "sink-s3-json",
  "config": {
    "timestamp.extractor": "Record",
    "locale": "US",
    "timezone": "UTC",
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "s3.region": "ap-southeast-1",
    "s3.bucket.name": "dev-data-raw",
    "s3.part.size": "5242880",
    "flush.size": "10",
    "rotate.interval.ms": "3600000",
    "rotate.schedule.interval.ms": "3000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "path.format": "'year'=YYYY/'month'=M/'day'=d",
    "partition.duration.ms": "3600000",
    "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
    "schema.compatibility": "NONE",
    "name": "sink-s3-json",
    "topics.regex": "json.*",
    "store.url": "http://minio:9000",
    "topics.dir": "sink-s3-json"
  }
}
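Both connector definitions use `TimeBasedPartitioner` with `timestamp.extractor` = `Record` and `timezone` = `UTC`, so the directory is derived from each Kafka record's timestamp. In `path.format` the quoted parts are literal text, and `YYYY`, `M`, `d`, `H` are Joda-Time date patterns. Month, day and hour are therefore not zero-padded, which matches the `month=2` style of the Avro notebook. The sketch below reproduces the resulting object key. The file name pattern `<topic>+<kafkaPartition>+<startOffset>.json` with a 10-digit zero-padded offset reflects our understanding of the S3 sink connector's default naming. Check it against the files that actually appear in MinIO.

```python
from datetime import datetime, timezone


def s3_object_key(topics_dir: str, topic: str, kafka_partition: int, start_offset: int,
                  record_ts_ms: int, with_hour: bool = True) -> str:
    """Approximate object key written by the S3 sink for a file starting at `start_offset`."""
    ts = datetime.fromtimestamp(record_ts_ms / 1000, tz=timezone.utc)  # "timezone": "UTC"
    # path.format "'year'=YYYY/'month'=M/'day'=d[/'hour'=H]": literal names, unpadded numbers
    encoded = f"year={ts.year}/month={ts.month}/day={ts.day}"
    if with_hour:  # only the SMT connector adds 'hour'=H
        encoded += f"/hour={ts.hour}"
    return f"{topics_dir}/{topic}/{encoded}/{topic}+{kafka_partition}+{start_offset:010d}.json"


ts_ms = int(datetime(2021, 4, 24, 11, 20, 4, tzinfo=timezone.utc).timestamp() * 1000)
print(s3_object_key("sink-s3-json", "json.test.topic", 0, 0, ts_ms, with_hour=False))
```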
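The Kinesis notebook (`jupyter/UAM_4_Kinesis.ipynb`) writes records with `put_record(..., PartitionKey='1')`. Kinesis Data Streams routes a record by taking the MD5 hash of its partition key as a 128-bit integer and picking the shard whose hash key range contains that value. The sketch below shows that routing. `even_ranges` is a hypothetical approximation of how a new stream's key space is split; read the real ranges from `list_shards` (`HashKeyRange`).

```python
import hashlib
from typing import List, Tuple

ShardRange = Tuple[str, int, int]  # (shard id, starting hash key, ending hash key), inclusive


def hash_key(partition_key: str) -> int:
    """128-bit integer from the MD5 digest of the partition key."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_ranges(shard_count: int) -> List[ShardRange]:
    """Split [0, 2**128 - 1] into contiguous, roughly equal ranges (illustrative only)."""
    size = 2 ** 128 // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        end = 2 ** 128 - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((f"shardId-{i:012d}", start, end))
    return ranges


def shard_for(partition_key: str, ranges: List[ShardRange]) -> str:
    """Return the id of the shard whose hash key range contains the key's MD5 value."""
    h = hash_key(partition_key)
    for shard_id, start, end in ranges:
        if start <= h <= end:
            return shard_id
    raise LookupError("hash key outside all shard ranges")


print(shard_for("1", even_ranges(1)))  # the notebook's stream has ShardCount=1
print(shard_for("1", even_ranges(2)))
```

With a single shard every key lands on `shardId-000000000000`, as in the notebook's `put_record` response. With more shards, a constant partition key such as `'1'` would still send every record to one shard. Varying the key, for example by using a customer id, is what spreads the load.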
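The Kinesis exercise asks for 10000 messages. Calling `put_record` once per message works but is slow. `put_records` accepts up to 500 records per request, with at most 5 MiB in total. The sketch below batches the messages, assuming JSON messages with an `id` field used as the partition key. The helper names are hypothetical. The 5 MiB size limit is not enforced, and records reported as failed (`FailedRecordCount` > 0) are only counted, not retried.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords limit on records per request


def build_entries(messages: Iterable[Dict[str, Any]], key_field: str = "id") -> Iterator[Dict[str, Any]]:
    """Turn dict messages into PutRecords entries (Data as UTF-8 JSON bytes)."""
    for msg in messages:
        yield {"Data": json.dumps(msg).encode("utf-8"), "PartitionKey": str(msg[key_field])}


def batches(entries: Iterable[Dict[str, Any]], size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[Dict[str, Any]]]:
    """Group entries into lists of at most `size` records."""
    batch: List[Dict[str, Any]] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_all(kinesis_client, stream_name: str, messages: Iterable[Dict[str, Any]]) -> int:
    """Send all messages with put_records; returns how many records Kinesis rejected."""
    failed = 0
    for batch in batches(build_entries(messages)):
        response = kinesis_client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed


# Usage in the notebook:
# send_all(kinesis_client, STREAM_NAME, ({"id": i, "col1": "test"} for i in range(10000)))
```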
412
jupyter/UAM_4_Kinesis.ipynb
Normal file
@ -0,0 +1,412 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 4 Kinesis Data Streams & Firehose\n",
|
||||
"\n",
|
||||
"Przebieg ćwiczenia\n",
|
||||
"* Stwórz Data Stream z wykorzystaniem boto3 / AWS console (GUI)\n",
|
||||
"* wygeneruj testowe dane do streama\n",
|
||||
"* odczytaj dane ze streama (ShardIterator)\n",
|
||||
"* Stwórz Kinesis Firehose Stream i podepnij pod niego utworzony wcześniej Data Stream jako source. Skonfiguruj buffor size = 1Mb buffor time = 60s\n",
|
||||
"* wygeneruj 10000 wiadomości i sprawdź czy dane ładowane są do S3\n",
|
||||
"\n",
|
||||
"## Pamiętaj aby po skończonych ćwiczeniach usunąć wszystkie obiekty\n",
|
||||
"### Uwaga !!! poniższy skrypt tworzy obiekty w regionie HongKong ! Na końcu skryptu jest funkcja tear_down_all() która usuwa testowy bucket, bazę Glue i Kinesis Data Streams czyli wszystkie obiekty które były stworzone w kodzie."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import boto3\n",
|
||||
"\n",
|
||||
"REGION = \"ap-east-1\"\n",
|
||||
"\n",
|
||||
"session_kwargs = {\n",
|
||||
"\n",
|
||||
" \"aws_access_key_id\":\"\",\n",
|
||||
" \"aws_secret_access_key\":\"\",\n",
|
||||
" \"aws_session_token\":\"\",\n",
|
||||
" \"region_name\": REGION\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"session = boto3.Session(**session_kwargs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"kinesis_client = session.client(\"kinesis\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'ResponseMetadata': {'RequestId': 'dd209b3a-57d1-b862-8a48-bc194546845a',\n",
|
||||
" 'HTTPStatusCode': 200,\n",
|
||||
" 'HTTPHeaders': {'x-amzn-requestid': 'dd209b3a-57d1-b862-8a48-bc194546845a',\n",
|
||||
" 'x-amz-id-2': 'k78aY4x6wCDEXo6kL76yEG64tV2ct9TQxM76Bfy345CJgSaVdfDJsjlr1jzNnpRxVk2qc+G9L42xZbmYrx/mivAFMWoek7Si',\n",
|
||||
" 'date': 'Sat, 20 Jun 2020 15:01:55 GMT',\n",
|
||||
" 'content-type': 'application/x-amz-json-1.1',\n",
|
||||
" 'content-length': '0'},\n",
|
||||
" 'RetryAttempts': 0}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"STREAM_NAME = 'uam-test'\n",
|
||||
"\n",
|
||||
"kinesis_client.create_stream(\n",
|
||||
" StreamName=STREAM_NAME,\n",
|
||||
" ShardCount=1\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['uam-test']"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"kinesis_client.list_streams()[\"StreamNames\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'ShardId': 'shardId-000000000000', 'SequenceNumber': '49608139280302835973846909376978574930250627257427034114', 'ResponseMetadata': {'RequestId': 'c4858035-fc96-5621-93ed-a71b8bbdb95f', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'c4858035-fc96-5621-93ed-a71b8bbdb95f', 'x-amz-id-2': 'Z8hEKLIaCne7YPLWuDMTiFcAt7s1HJvfENE/2Oj7ARnuzjOsVNj+QyHpuIbM0tMucp7YZxLq2M65Xy/bgyyYeI2T5eJviuYA', 'date': 'Sat, 20 Jun 2020 15:02:06 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '110'}, 'RetryAttempts': 0}}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"response = kinesis_client.put_record(\n",
|
||||
" StreamName=STREAM_NAME,\n",
|
||||
" Data=b'{\"col1\" : \"this is my test json data\"}',\n",
|
||||
" PartitionKey='1' \n",
|
||||
")\n",
|
||||
"print(response)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"response = kinesis_client.describe_stream(StreamName=STREAM_NAME) \n",
|
||||
"print(response)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"shard_ids = []\n",
|
||||
"stream_name = None\n",
|
||||
"if response and 'StreamDescription' in response:\n",
|
||||
"    stream_name = response['StreamDescription']['StreamName']\n",
|
||||
"\n",
|
||||
"    # one iterator per shard; TRIM_HORIZON starts at the oldest record still retained\n",
|
||||
"    for shard in response['StreamDescription']['Shards']:\n",
|
||||
"        shard_id = shard['ShardId']\n",
|
||||
"        shard_iterator = kinesis_client.get_shard_iterator(StreamName=stream_name, ShardId=shard_id, ShardIteratorType=\"TRIM_HORIZON\")\n",
|
||||
"        shard_ids.append({'shard_id': shard_id, 'shard_iterator': shard_iterator['ShardIterator']})\n",
|
||||
"\n",
|
||||
"shard_ids"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'AAAAAAAAAAHfJPwIwOEqHzQIjn90snM/nPs4zZARsJlXPyGlUTbvU+T5cdGvXzb54qetks+heTq/ttfFlehkcLGr27CpkPNDn2A9NHYc1w+3VjLIBmNKTLJlHnCjjFCwgqksrs1mUQVli12hZjy6wZXhGualZUI//H2BxRwKqH/Pf2Zk9S6KSbeJFDm0boV2COPqB3wZ21axe8lWXJVJAfjMgPacIU6K'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sh = shard_ids[0]['shard_iterator']  # iterator of the first (and here only) shard\n",
|
||||
"sh"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
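"tries = 0\n",
|
||||
"limit = 100\n",
|
||||
"result = []\n",
|
||||
"# GetRecords may return an empty batch even if the shard holds data further on,\n",
|
||||
"# so follow NextShardIterator (at most 10 calls) until a batch contains records\n",
|
||||
"while tries < 10:\n",
|
||||
"    tries += 1\n",
|
||||
"    response_get_rec = kinesis_client.get_records(ShardIterator=sh, Limit=limit)\n",
|
||||
"    result.extend(response_get_rec['Records'])\n",
|
||||
"    sh = response_get_rec['NextShardIterator']  # advance the iterator for the next call\n",
|
||||
"    if response_get_rec['Records']:\n",
|
||||
"        break"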
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'Records': [{'SequenceNumber': '49608139280302835973846909376978574930250627257427034114',\n",
|
||||
" 'ApproximateArrivalTimestamp': datetime.datetime(2020, 6, 20, 17, 2, 6, 976000, tzinfo=tzlocal()),\n",
|
||||
" 'Data': b'{\"col1\" : \"this is my test json data\"}',\n",
|
||||
" 'PartitionKey': '1'}],\n",
|
||||
" 'NextShardIterator': 'AAAAAAAAAAEgmwgBEgaauGNF/YzN5S+FcuWOWZZMledH4BR7CLiGD9iYYL4z+eLK7NaTQTHTAlSFEYm6N6vtjdFcTl8ibGJGKnuQthZiMgCfolA1FAAoWmLHvI0slHvZx1oWLfdApD8robDWj3zX/2d4zOzj1P9xz/+Xo8/YFdCXd0ENfUNxI7MhzZUGamw09rXa8Y0sDunFpkLy7msr5vjURGjr+xrf',\n",
|
||||
" 'MillisBehindLatest': 0,\n",
|
||||
" 'ResponseMetadata': {'RequestId': 'e580a084-b06e-ca24-b2e8-87b6c745255a',\n",
|
||||
" 'HTTPStatusCode': 200,\n",
|
||||
" 'HTTPHeaders': {'x-amzn-requestid': 'e580a084-b06e-ca24-b2e8-87b6c745255a',\n",
|
||||
" 'x-amz-id-2': '/zy9vbX4wzEuqd7TZ959MQcL0OzB9kPQ3TSfrtEIIpI1IvubKb7OgnxYhxiWIZpvPWbtzXO5x0r+eO6+A8wR+bn0v8khnlpL',\n",
|
||||
" 'date': 'Sat, 20 Jun 2020 15:02:10 GMT',\n",
|
||||
" 'content-type': 'application/x-amz-json-1.1',\n",
|
||||
" 'content-length': '489'},\n",
|
||||
" 'RetryAttempts': 0}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"response_get_rec"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 99,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['49608122819498384072835111163875039160761192199284064258']"
|
||||
]
|
||||
},
|
||||
"execution_count": 99,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"[x[\"SequenceNumber\"] for x in response_get_rec[\"Records\"] ]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 100,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'SequenceNumber': '49608122819498384072835111163875039160761192199284064258',\n",
|
||||
" 'ApproximateArrivalTimestamp': datetime.datetime(2020, 6, 20, 4, 13, 11, 39000, tzinfo=tzlocal()),\n",
|
||||
" 'Data': b'{\"col1\" : \"this is my test json data\"}',\n",
|
||||
" 'PartitionKey': '1'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 100,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"response_get_rec[\"Records\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create the Kinesis Firehose for this stream manually first, and only then generate the test data with the loop below"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for i in range(0,10000):\n",
|
||||
" response = kinesis_client.put_record(\n",
|
||||
" StreamName=STREAM_NAME,\n",
|
||||
" Data=b'{\"col1\" : \"this is json data\"}',\n",
|
||||
" PartitionKey='1' \n",
|
||||
" )\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"BUCKET_NAME = 'datalake-uam'\n",
|
||||
"GLUE_DB = 'uam'\n",
|
||||
"KINESIS_STREAM = 'uam-test'\n",
|
||||
"KINESIS_FIREHOSE = 'uam-test-fh'\n",
|
||||
"\n",
|
||||
"s3_client = session.client(\"s3\")\n",
|
||||
"glue_client = session.client(\"glue\")\n",
|
||||
"\n",
|
||||
"def tear_down_all():\n",
|
||||
" \n",
|
||||
" s3 = boto3.resource('s3',**session_kwargs)\n",
|
||||
" bucket = s3.Bucket(BUCKET_NAME)\n",
|
||||
" bucket.objects.delete()\n",
|
||||
" \n",
|
||||
" s3_client.delete_bucket(\n",
|
||||
" Bucket = BUCKET_NAME\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" glue_client.delete_database(\n",
|
||||
" Name=GLUE_DB\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" kinesis_client.delete_stream(StreamName=STREAM_NAME)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tear_down_all()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
262
jupyter/UAM_Lab_1_reading_from_kinesis_stream.ipynb
Normal file
@ -0,0 +1,262 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "alternate-pantyhose",
"metadata": {},
"source": [
"# Lab1 Reading Kinesis Data Streams\n",
"\n",
"Exercise outline\n",
"* Create a Data Stream\n",
"* generate test data into the stream\n",
"* read the data back from the stream (ShardIterator)\n",
"* note how the code iterates over shards and over iterators (one per shard)\n",
"* compare the data read with the generated data (we read two iterations - the first 10 records from TRIM_HORIZON)\n",
"* check which other ShardIteratorType options exist for setting the starting position in a shard"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "gross-series",
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"from pprint import pprint\n",
"\n",
"kinesis_client = boto3.client('kinesis')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "excited-latex",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['cryptostock-dev-100603781557-jk-12345']"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kinesis_client.list_streams()[\"StreamNames\"]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "excellent-address",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'cryptostock-dev-100603781557-jk-12345'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"STREAM_NAME = kinesis_client.list_streams()[\"StreamNames\"][0]\n",
"STREAM_NAME"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "attended-combat",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'HashKeyRange': {'EndingHashKey': '340282366920938463463374607431768211455',\n",
" 'StartingHashKey': '0'},\n",
" 'SequenceNumberRange': {'StartingSequenceNumber': '49617445977150094507622122574044516561004852020651229186'},\n",
" 'ShardId': 'shardId-000000000000'}]\n"
]
}
],
"source": [
"response = kinesis_client.describe_stream(StreamName=STREAM_NAME)\n",
"pprint(response[\"StreamDescription\"][\"Shards\"])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "together-finance",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'shard_id': 'shardId-000000000000',\n",
" 'shard_iterator': 'AAAAAAAAAAEfLA4f5f+lMhjNfHXIXsKxQeP3dg79sVKKRiT+843gRXwSQsYRXeMIS4KwdRUjPdChkE2ZZGYSG3DeghHZi41DXOE0pNSdFHnqkePkBVIX2cN/9rbedZTgX/WXfNaL+sMUfdbYV6f9iQEtTtRAYN3bXfk5jUwIBvcgB1mQDRzdT1Or150vbf3LSlLtC7XlkK7HNZoGM1t577jseZTyvJ4+yeBOV73DQnSFnL/EPQvVdm+lidZtaNe39NMak4bXx5AWmhwblwLPmXg/l2PMDx7Z'}]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shard_ids = []\n",
"stream_name = None\n",
"if response and 'StreamDescription' in response:\n",
"    stream_name = response['StreamDescription']['StreamName']\n",
"\n",
"    # reading all shards (getting one shard iterator per shard)\n",
"    for shard in response['StreamDescription']['Shards']:\n",
"        shard_id = shard['ShardId']\n",
"        shard_iterator = kinesis_client.get_shard_iterator(StreamName=stream_name, ShardId=shard_id, ShardIteratorType=\"TRIM_HORIZON\")\n",
"\n",
"        si = shard_iterator[\"ShardIterator\"]\n",
"        shard_ids.append({'shard_id': shard_id, 'shard_iterator': si})\n",
"\n",
"shard_ids"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "vital-bridges",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 25, 191000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510403, \"symbol\": \"ETH_USD\", \"price\": 360'\n",
" b'.03, \"amount\": 0.646, \"dollar_amount\": 232.57938, \"type\": \"buy\",'\n",
" b' \"trans_id\": 124289044}\\n',\n",
" 'PartitionKey': 'ETH_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578777461144796130256147709954'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 25, 310000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510403, \"symbol\": \"BTC_USD\", \"price\": 107'\n",
" b'80.83, \"amount\": 0.035, \"dollar_amount\": 377.32905, \"type\": \"buy'\n",
" b'\", \"trans_id\": 124289043}\\n',\n",
" 'PartitionKey': 'BTC_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578778670070615744885322416130'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 25, 428000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510404, \"symbol\": \"ETH_USD\", \"price\": 360'\n",
" b'.12, \"amount\": 0.523, \"dollar_amount\": 188.34276, \"type\": \"buy\",'\n",
" b' \"trans_id\": 124289045}\\n',\n",
" 'PartitionKey': 'ETH_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578779878996435359514497122306'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 25, 545000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510405, \"symbol\": \"BTC_USD\", \"price\": 107'\n",
" b'84.42, \"amount\": 0.25635676, \"dollar_amount\": 2764.65897, \"type\"'\n",
" b': \"buy\", \"trans_id\": 124289050}\\n',\n",
" 'PartitionKey': 'BTC_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578781087922254974143671828482'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 25, 663000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510407, \"symbol\": \"BTC_USD\", \"price\": 107'\n",
" b'84.42, \"amount\": 0.23877038, \"dollar_amount\": 2575.000061, \"type'\n",
" b'\": \"buy\", \"trans_id\": 124289051}\\n',\n",
" 'PartitionKey': 'BTC_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578782296848074588772846534658'}]\n"
]
}
],
"source": [
"limit = 5\n",
"response_get_rec = kinesis_client.get_records(ShardIterator=si, Limit=limit)\n",
"next_shard_iterator = response_get_rec['NextShardIterator']\n",
"pprint(response_get_rec[\"Records\"])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "alone-martial",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AAAAAAAAAAGUetfZYhJAUbRmLdnxgHF2gXHQ+Yt8063YzfurEZ+Vdauri9LJ13JLrPqLIrBxeHRJ1GEBctxNJ4jYeB4Um/JNu4+2L5Jfa1Apl9s9y6f/5UMZlqIAFGvUPmW53Gj6MyauM9r7EWNBUBZCvrFQkHvC9fQwNYP3eyYm1xp4K9fcjBX90qUdnGmFU69bq+3BF5I7PXgPHitcwzJev6PqPLVny2SmhSHtnRF/Rogj00Xv+DtKo1/SBdVid3tyQ0e9tm4XrgttPfPIhNsJB3j57lpM\n",
"[{'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 25, 812000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510409, \"symbol\": \"BTC_USD\", \"price\": 107'\n",
" b'84.42, \"amount\": 1.01303547, \"dollar_amount\": 10924.99998, \"type'\n",
" b'\": \"buy\", \"trans_id\": 124289054}\\n',\n",
" 'PartitionKey': 'BTC_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578783505773894203402021240834'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 25, 930000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510410, \"symbol\": \"BTC_USD\", \"price\": 107'\n",
" b'84.42, \"amount\": 0.26135077, \"dollar_amount\": 2818.516471, \"type'\n",
" b'\": \"buy\", \"trans_id\": 124289055}\\n',\n",
" 'PartitionKey': 'BTC_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578784714699713818031195947010'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 26, 48000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510413, \"symbol\": \"ETH_USD\", \"price\": 360'\n",
" b'.39, \"amount\": 5.55416701, \"dollar_amount\": 2001.666249, \"type\":'\n",
" b' \"buy\", \"trans_id\": 124289059}\\n',\n",
" 'PartitionKey': 'ETH_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578785923625533432660370653186'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 26, 165000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510414, \"symbol\": \"ETH_USD\", \"price\": 360'\n",
" b'.6, \"amount\": 13.855, \"dollar_amount\": 4996.113, \"type\": \"buy\", '\n",
" b'\"trans_id\": 124289071}\\n',\n",
" 'PartitionKey': 'ETH_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578787132551353047358264836098'},\n",
" {'ApproximateArrivalTimestamp': datetime.datetime(2021, 4, 18, 14, 18, 26, 282000, tzinfo=tzlocal()),\n",
" 'Data': b'{\"transaction_ts\": 1601510415, \"symbol\": \"ETH_USD\", \"price\": 360'\n",
" b'.24, \"amount\": 6.32869733, \"dollar_amount\": 2279.849926, \"type\":'\n",
" b' \"sell\", \"trans_id\": 124289072}\\n',\n",
" 'PartitionKey': 'ETH_USD',\n",
" 'SequenceNumber': '49617445977150094507622122578788341477172661987439542274'}]\n"
]
}
],
"source": [
"print(next_shard_iterator)\n",
"response_get_rec = kinesis_client.get_records(ShardIterator=next_shard_iterator, Limit=limit)\n",
"pprint(response_get_rec[\"Records\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "analyzed-applicant",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
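The notebook reads two pages from the iterator of the last shard only. A minimal sketch (not part of the lab code) of draining every shard by following `NextShardIterator`. It assumes a boto3 Kinesis client, or any object with the same `describe_stream`, `get_shard_iterator` and `get_records` methods. It stops reading a shard when a call returns no records and `MillisBehindLatest` is 0 (the reader has caught up with the tip), or when a closed shard returns no next iterator. The helper name `read_all_shards`, the `max_calls_per_shard` guard and the default `pause` are illustrative assumptions; the pause keeps the loop under the per-shard GetRecords rate limit (5 calls per second per shard).

```python
import time


def read_all_shards(kinesis_client, stream_name, iterator_type="TRIM_HORIZON",
                    limit=100, max_calls_per_shard=1000, pause=0.2):
    """Read records from every shard of a stream, shard by shard.

    Sketch only: uses a single describe_stream page (no HasMoreShards handling).
    """
    description = kinesis_client.describe_stream(StreamName=stream_name)["StreamDescription"]
    records_by_shard = {}
    for shard in description["Shards"]:
        shard_id = shard["ShardId"]
        # one iterator per shard, positioned according to iterator_type
        iterator = kinesis_client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard_id,
            ShardIteratorType=iterator_type,
        )["ShardIterator"]
        records = []
        for _ in range(max_calls_per_shard):
            response = kinesis_client.get_records(ShardIterator=iterator, Limit=limit)
            records.extend(response["Records"])
            iterator = response.get("NextShardIterator")
            # a closed shard returns no NextShardIterator at its end; an open shard
            # is treated as drained once a call returns nothing and is 0 ms behind the tip
            caught_up = not response["Records"] and response.get("MillisBehindLatest", 0) == 0
            if iterator is None or caught_up:
                break
            time.sleep(pause)  # stay under the per-shard GetRecords rate limit
        records_by_shard[shard_id] = records
    return records_by_shard
```

With the notebook's names this would be called as `read_all_shards(kinesis_client, STREAM_NAME)`; reading shards sequentially keeps the example simple, while a production consumer (e.g. the KCL) reads shards in parallel and checkpoints its position.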
5
jupyter/requirements.txt
Normal file
@ -0,0 +1,5 @@
boto3==1.17.33
faker==6.6.2
avro-python3==1.9.1
pandavro==1.6.0
pyarrow==0.17.1
15780
labs/data_generator/crypto_trades_20201001.csv
Normal file
File diff suppressed because it is too large
150
labs/data_generator/generator.py
Normal file
@ -0,0 +1,150 @@
#!/usr/bin/env python3
import argparse
import csv
import json
import logging
import os
import sys
import time

import boto3

logging.basicConfig(
    level=logging.INFO,
    format='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
    handlers=[logging.StreamHandler(sys.stdout)]
)
logger = logging.getLogger(__name__)

DEFAULT_DATA_FILE = 'crypto_trades_20201001.csv'


class KinesisProducer:
    """
    Kinesis Producer
    """

    def __init__(self, speed_per_sec):
        self.client = boto3.client('kinesis')
        self.speed_per_sec = speed_per_sec
        self.max_retry_attempt = 5

    def produce(self, event, key, data_stream):
        """
        A simple wrapper for put_record with retries

        :param event: event dict (must contain "trans_id")
        :param key: partition key
        :param data_stream: name of the Kinesis data stream
        :return: put_record response
        """

        # adding a new line at the end to produce JSON lines
        # (otherwise we would need to pre-process those records in Firehose
        # invoking a Lambda to add those new lines). Every message is a dumped json with \n

        tran_id = event["trans_id"]
        payload = (json.dumps(event) + '\n').encode('utf-8')

        last_error = None
        for attempt in range(1, self.max_retry_attempt + 1):
            try:
                response = self.client.put_record(
                    StreamName=data_stream,
                    Data=payload,
                    PartitionKey=key
                )
                logger.info('Msg with trans_id={} sent to shard {} seq no {}'.format(
                    tran_id, response["ShardId"], response["SequenceNumber"]))
                return response

            except Exception as e:
                last_error = e
                logger.warning('Exception has occurred {}, retrying...'.format(e))
                if attempt < self.max_retry_attempt:
                    time.sleep(attempt)  # linear back-off: 1 s, 2 s, 3 s, ...

        # a bare `raise` here would fail (no active exception), so re-raise the stored one
        logger.error('Max attempt has been reached, rethrowing the last err')
        raise last_error


def prepare_event(event):
    """
    Events from CSV have no dtypes, let's convert them to more realistic values (int / float etc.)

    :param event: a CSV row (dict of strings)
    :return: tuple (formatted event, partition key)
    """
    msg_key = event["symbol"]

    msg_formatted = {
        "transaction_ts": int(event["transaction_ts"]),
        "symbol": event["symbol"],
        "price": float(event["price"]),
        "amount": float(event["amount"]),
        "dollar_amount": float(event["dollar_amount"]),
        "type": event["type"],
        "trans_id": int(event["trans_id"]),
    }

    return msg_formatted, msg_key


def produce_data(kinesis_data_stream, messages_per_sec, input_file, single_run):
    """
    Main method for producing

    :param kinesis_data_stream: param from cmdline, name of the KDS
    :param messages_per_sec: param from cmdline, replay speed-up: the time gap between
        consecutive events is divided by this value (<= 0 disables sleeping)
    :param input_file: CSV file with the trades to replay
    :param single_run: replay the file once instead of looping forever
    :return:
    """
    kp = KinesisProducer(speed_per_sec=messages_per_sec)

    with open(input_file) as csv_file:
        reader = csv.DictReader(csv_file, delimiter=',')
        all_rows = list(reader)

    current_time = int(all_rows[0]["transaction_ts"])

    replay_cnt = 1
    while True:
        logger.info("start replaying for the {} time".format(replay_cnt))
        for row in all_rows:

            new_event_time = int(row["transaction_ts"])
            time_delta = new_event_time - current_time
            current_time = new_event_time

            if time_delta > 0 and messages_per_sec > 0:
                time.sleep(time_delta / messages_per_sec)

            event, key = prepare_event(row)
            kp.produce(event, key, kinesis_data_stream)

        if single_run:
            break
        replay_cnt += 1


if __name__ == "__main__":
    logger.info('Starting Simple Kinesis Producer (replaying stock data)')

    parser = argparse.ArgumentParser()
    parser.add_argument('-k', '--kinesis_ds', dest='kinesis_ds', required=True)
    parser.add_argument('-i', '--input_file', dest='input_file', required=False)
    parser.add_argument('-s', '--messages_per_sec', dest='mps', type=int, default=-1, required=False)
    parser.add_argument('-r', '--single-run', dest='single_run', action='store_true', required=False, default=False)

    args, unknown = parser.parse_known_args()

    kinesis_data_stream = args.kinesis_ds
    messages_per_sec = args.mps
    single_run = args.single_run

    if args.input_file:
        input_file = args.input_file
    else:
        main_path = os.path.abspath(os.path.dirname(__file__))
        input_file = os.path.join(main_path, DEFAULT_DATA_FILE)

    produce_data(kinesis_data_stream, messages_per_sec, input_file, single_run)
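The producer appends `\n` to every payload because Firehose concatenates the records it delivers into one S3 object without adding separators; the newline turns each object into newline-delimited JSON (JSON Lines). A minimal standard-library sketch of turning such an object body back into event dicts; `parse_json_lines` is a hypothetical helper, and the sample body reuses two events shown in the Kinesis notebook output, encoded the same way as in `KinesisProducer.produce`.

```python
import json


def parse_json_lines(body: bytes):
    """Split a Firehose object body (JSON Lines) into a list of dicts."""
    events = []
    for line in body.decode("utf-8").splitlines():
        line = line.strip()
        if line:  # skip empty lines, e.g. after a trailing newline
            events.append(json.loads(line))
    return events


# Two payloads encoded as in KinesisProducer.produce, concatenated as Firehose would store them.
body = (
    json.dumps({"transaction_ts": 1601510403, "symbol": "ETH_USD", "price": 360.03,
                "amount": 0.646, "dollar_amount": 232.57938, "type": "buy",
                "trans_id": 124289044}) + "\n"
    + json.dumps({"transaction_ts": 1601510403, "symbol": "BTC_USD", "price": 10780.83,
                  "amount": 0.035, "dollar_amount": 377.32905, "type": "buy",
                  "trans_id": 124289043}) + "\n"
).encode("utf-8")

events = parse_json_lines(body)
print([e["trans_id"] for e in events])  # [124289044, 124289043]
```

This is the same line-oriented layout that `wr.s3.read_json(..., lines=True)` in the Lambda relies on.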
67
labs/labs_preparation.md
Normal file
@ -0,0 +1,67 @@
## Laboratory

### Data processing in the public cloud

---

1. Prerequisites - environments (recommended: **PyCharm + Anaconda**)
   * PyCharm - https://www.jetbrains.com/pycharm/download/
   * Anaconda - https://www.anaconda.com/products/individual#Downloads
     - create a new Python 3.8 environment (Windows users: use the Anaconda Prompt; Linux / macOS: bash / zsh etc.)
     ```
     conda create -n uam_cloud_dp python=3.8
     conda activate uam_cloud_dp
     ```

   * Terraform (version 0.14 or later)
     - download the Terraform package for your OS from https://www.terraform.io/downloads.html
     - install it following https://learn.hashicorp.com/tutorials/terraform/install-cli?in=terraform/aws-get-started
     - verify the installation by running the following in cmd / bash (TF 0.14+)
     ```
     $ terraform --version
     Terraform v0.14.8
     ```

   * Environment setup
     - activate your conda env
     ```
     conda activate uam_cloud_dp
     ```
     - install the required Python packages
     ```
     pip install -r <path to this repo>/labs/requirements.txt
     ```
     - check that awscli is installed correctly
     ```
     $ aws --version
     aws-cli/1.19.33 Python/3.8.8 Windows/10 botocore/1.20.33
     ```

   * AWS account configuration
     - log in to AWS Educate - https://www.awseducate.com/signin/SiteLogin
     - AWS Account -> Starter Account
     - Account Details - copy the temporary credentials (Access key, Secret key and Token)
     - if this is the first time you configure awscli on your machine, run the command below (the Access and Secret values don't matter, we will edit them later)
     ```bash
     $ aws configure
     AWS Access Key ID [None]: a
     AWS Secret Access Key [None]: b
     Default region name [None]: us-east-1
     Default output format [None]:
     ```
     - paste the copied credentials into the ~/.aws/credentials file
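The temporary credentials from the Starter Account consist of an access key, a secret key and a session token; in `~/.aws/credentials` they go under a profile section as `aws_access_key_id`, `aws_secret_access_key` and `aws_session_token`. A minimal sketch that writes the `[default]` profile with Python's standard `configparser` module instead of editing the file by hand; the function name and the placeholder values are assumptions, and the target path is a parameter so you can try it on a scratch file first.

```python
import configparser
from pathlib import Path


def write_aws_credentials(path, access_key, secret_key, session_token, profile="default"):
    """Create or update one profile in an AWS shared-credentials file."""
    path = Path(path).expanduser()
    # interpolation=None: keep values verbatim (no special handling of '%')
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently ignored
    config[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "aws_session_token": session_token,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as f:
        config.write(f)
    return path


# Placeholders - paste the real values from Account Details:
# write_aws_credentials("~/.aws/credentials", "<access-key>", "<secret-key>", "<session-token>")
```

Other profiles already present in the file are preserved, because the file is read before the section is replaced.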
BIN
labs/lambda/awswrangler-layer-2.7.0-py3.8.zip
Normal file
Binary file not shown.
56
labs/lambda/lambda_definition.py
Normal file
@ -0,0 +1,56 @@
import urllib.parse

import awswrangler as wr
import boto3
import pandas as pd

client_ssm = boto3.client('ssm')


def etl_function(event, context):
    processed_zone_prefix = "processed-zone"

    record = event["Records"][0]
    # source object: bucket and key from the S3 notification (the raw bucket);
    # keys in S3 events are URL-encoded, e.g. '=' arrives as '%3D'
    src_bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    # destination: the processed bucket, whose name is stored in SSM Parameter Store
    dest_bucket = client_ssm.get_parameter(Name='s3_processed_bucket_name')['Parameter']['Value']

    # e.g. "raw-zone/stockdata/year=.../..." -> "stockdata"
    event_prefix = key.split('/')[1]
    full_src_path = f's3://{src_bucket}/{key}'

    print(f'Processing key = {full_src_path}')
    # Firehose objects are JSON Lines (one event per line)
    df = wr.s3.read_json(path=full_src_path, lines=True)

    # reuse the last 36 characters of the source object name as the output file name
    filename = key.split('/')[-1][-36:]
    dest_prefix = f"s3://{dest_bucket}/{processed_zone_prefix}/{event_prefix}"

    df['transaction_date'] = pd.to_datetime(df['transaction_ts'], unit='s')
    df['year'] = df['transaction_date'].dt.year
    df['month'] = df['transaction_date'].dt.month
    df['day'] = df['transaction_date'].dt.day
    df['hour'] = df['transaction_date'].dt.hour

    cols_to_return = ["transaction_date", "price", "amount", "dollar_amount", "type", "trans_id"]

    new_keys = []
    # one Parquet file per symbol and hour, written to Hive-style partitions
    for (symbol, year, month, day, hour), data in df.groupby(['symbol', 'year', 'month', 'day', 'hour']):
        partitions = f"symbol={symbol}/year={year}/month={month}/day={day}/hour={hour}"
        full_key_name = '/'.join([dest_prefix, partitions, filename + '.parquet'])

        print(f'Saving a new key = {full_key_name}')
        new_keys.append(full_key_name)

        wr.s3.to_parquet(
            df=data[cols_to_return],
            path=full_key_name,
            compression='snappy'
        )

    return {
        'key': key,
        'statusCode': 200,
        'new_keys': new_keys
    }


if __name__ == "__main__":
    # hypothetical S3 notification for a local test run: replace the placeholders
    # with a real raw-bucket name and an existing object key before running
    event = {"Records": [{"s3": {"bucket": {"name": "<raw-bucket-name>"},
                                 "object": {"key": "raw-zone/stockdata/<object-key>"}}}]}
    context = None

    response = etl_function(event, context)
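The handler relies on two conventions: object keys in S3 event notifications are URL-encoded (Firehose's `year=...` prefixes arrive with `=` escaped as `%3D`), and output files are grouped into Hive-style `symbol=/year=/month=/day=/hour=` partitions derived from `transaction_ts` in UTC. A standard-library sketch of both steps for one event, without pandas or awswrangler, handy for checking paths before deploying; `source_location` and `partition_path` are hypothetical helper names, and the sample event is trimmed to the fields the handler reads (the bucket name is made up).

```python
import urllib.parse
from datetime import datetime, timezone


def source_location(event):
    """Return (bucket, decoded key) of the first record of an S3 notification."""
    s3 = event["Records"][0]["s3"]
    return s3["bucket"]["name"], urllib.parse.unquote_plus(s3["object"]["key"])


def partition_path(symbol, transaction_ts):
    """Build the partition path used for the processed zone (UTC, no zero padding)."""
    dt = datetime.fromtimestamp(transaction_ts, tz=timezone.utc)
    return f"symbol={symbol}/year={dt.year}/month={dt.month}/day={dt.day}/hour={dt.hour}"


sample_event = {"Records": [{"s3": {
    "bucket": {"name": "datalake-raw-example"},
    "object": {"key": "raw-zone/stockdata/year%3D2021/month%3D04/day%3D18/hour%3D14/file"},
}}]}

bucket, key = source_location(sample_event)
print(bucket, key)  # datalake-raw-example raw-zone/stockdata/year=2021/month=04/day=18/hour=14/file
print(partition_path("ETH_USD", 1601510403))  # symbol=ETH_USD/year=2020/month=10/day=1/hour=0
```

The month and day are not zero-padded here because the handler builds them from pandas integer columns, unlike the zero-padded `MM`/`dd` in the Firehose prefix.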
BIN
labs/lambda/lambda_definition.zip
Normal file
Binary file not shown.
4
labs/requirements.txt
Normal file
@ -0,0 +1,4 @@
awscli==1.32.91
boto3==1.34.89
configparser==7.0.0
awswrangler==3.7.3
12
labs/setup_c9.sh
Normal file
@ -0,0 +1,12 @@
#!/bin/bash
sudo yum update -y

wget https://releases.hashicorp.com/terraform/1.8.1/terraform_1.8.1_linux_amd64.zip -P ~/
unzip ~/terraform_1.8.1_linux_amd64.zip -d ~/.
sudo mv ~/terraform /usr/local/bin

pip install -r requirements.txt

echo "alias python='python3'
alias tf='terraform'
alias c='clear'" >> ~/.bashrc
25
labs/terraform/.terraform.lock.hcl
Normal file
@ -0,0 +1,25 @@
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.

provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.47.0"
  constraints = "~> 5.0"
  hashes = [
    "h1:bZEm2TDCM7jmpNXK6QOWsT1YU8GiGGQaraUvwO887U8=",
    "zh:06037a14e47e8f82d0b3b326cd188566272b808b7970a9249a11db26d475b83d",
    "zh:116b7dd58ca964a1056249d2b6550f399b0a6bc9a7920b7ee134242114432c9f",
    "zh:1aa089c81459071c1d65ba7454f1122159e1fa1b5384e6e9ef85c8264f8a9ecb",
    "zh:2c1471acba40c4944aa88dda761093c0c969db6408bdc1a4fb62417788cd6bb6",
    "zh:3b950bea06ea4bf1ec359a97a4f1745b7efca7fc2da368843666020dd0ebc5d4",
    "zh:7191c5c2fce834d584153dcd5269ed3042437f224d341ad85df06b2247bd09b2",
    "zh:76d841b3f247f9bb3899dec3b4d871613a4ae8a83a581a827655d34b1bbee0ee",
    "zh:7c656ce252fafc2c915dad43a0a7da17dba975207d75841a02f3f2b92d51ec25",
    "zh:8ec97118cbdef64139c52b719e4e22443e67a1f37ea1597cd45b2e9b97332a35",
    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
    "zh:a369deca7938236a7da59f7ad1fe18137f736764c9015ed10e88edb6e8505980",
    "zh:a743882fb099401eae0c86d9388a6faadbbc27b2ac9477aeef643e5de4eec3f9",
    "zh:d5f960f58aff06fc58e244fea6e665800384cacb8cd64a556f8e145b98650372",
    "zh:e31ffcfd560132ffbff2f574928ba392e663202a750750ed39a8950031b75623",
    "zh:ebd9061b92a772144564f35a63d5a08cb45e14a9d39294fda185f2e0de9c8e28",
  ]
}
0
labs/terraform/.terraform/Untitled
Normal file
Binary file not shown.
19
labs/terraform/S3.tf
Normal file
@ -0,0 +1,19 @@
resource "aws_s3_bucket" "raw_bucket" {
  bucket        = "datalake-raw-${var.account_number}-${var.student_initials}-${var.student_index_no}"
  force_destroy = true
  tags = {
    Purpose     = "UAM Cloud Data Processing"
    Environment = "DEV"
  }
}

resource "aws_s3_bucket" "processed_bucket" {
  bucket        = "datalake-processed-${var.account_number}-${var.student_initials}-${var.student_index_no}"
  force_destroy = true
  tags = {
    Purpose     = "UAM Cloud Data Processing"
    Environment = "DEV"
  }
}
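The bucket names are assembled from `account_number`, `student_initials` and `student_index_no`, so an upper-case or overly long value only surfaces as an error during `terraform apply`. A quick pre-check sketch of the basic S3 naming rules (3-63 characters; lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit; no `..`). It deliberately skips the rarer rules (e.g. IP-address-like names or reserved prefixes); both function names and the sample account number are hypothetical.

```python
import re

# 1 leading char + 1..61 middle chars + 1 trailing char = 3..63 characters in total
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def bucket_name(kind, account_number, initials, index_no):
    """Mirror the pattern from S3.tf: datalake-<kind>-<account>-<initials>-<index>."""
    return f"datalake-{kind}-{account_number}-{initials}-{index_no}"


def looks_like_valid_bucket_name(name):
    """Check the basic S3 naming rules (length, characters, first/last char, no '..')."""
    return bool(_BUCKET_RE.match(name)) and ".." not in name


name = bucket_name("raw", 123456789012, "jk", "12345")
print(name, looks_like_valid_bucket_name(name))  # datalake-raw-123456789012-jk-12345 True
```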
32
labs/terraform/athena.tf
Normal file
@ -0,0 +1,32 @@
resource "aws_s3_bucket" "athena_results" {
  bucket        = "athena-results-${var.account_number}-${var.student_initials}-${var.student_index_no}"
  force_destroy = true
  tags          = merge(local.common_tags, {})
}

resource "aws_s3_bucket_lifecycle_configuration" "athena_results_lifecycle" {
  bucket = aws_s3_bucket.athena_results.id

  rule {
    id     = "standard-expiration"
    status = "Enabled"

    expiration {
      days = 1
    }
  }
}

resource "aws_athena_workgroup" "athena_workgroup" {
  name = "development"

  configuration {
    enforce_workgroup_configuration = true

    result_configuration {
      output_location = "s3://${aws_s3_bucket.athena_results.bucket}/output/"
    }
  }

  force_destroy = true
}
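With `enforce_workgroup_configuration = true`, queries run in the `development` workgroup write their results to the `athena_results` bucket (expired after one day) regardless of client-side settings, so a client only has to name the workgroup. A minimal sketch that starts a query with boto3's Athena client and polls until it reaches a terminal state; `run_athena_query` and its polling defaults are assumptions, and the database and table in the commented call are placeholders.

```python
import time


def run_athena_query(athena_client, sql, database, workgroup="development",
                     poll_seconds=1.0, timeout_seconds=300):
    """Start an Athena query in a workgroup and wait for a terminal state."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,  # result location comes from the workgroup settings
    )["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} is still {state}")
        time.sleep(poll_seconds)


# Example (placeholders):
# import boto3
# athena = boto3.client("athena", region_name="us-east-1")
# run_athena_query(athena, "SELECT * FROM <table> LIMIT 10", "<glue_database>")
```

The rows themselves can then be fetched with `get_query_results` using the returned query id.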
24
labs/terraform/aws_iam_policy.tf
Normal file
@ -0,0 +1,24 @@
resource "aws_iam_policy" "lambda_policy" {
  name = "lambda_policy"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "s3:GetObject",
          "s3:PutObject"
        ],
        Resource = [
          "arn:aws:s3:::${aws_s3_bucket.raw_bucket.id}/*",
          "arn:aws:s3:::${aws_s3_bucket.processed_bucket.id}/*"
        ]
      },
      {
        Effect   = "Allow",
        Action   = "ssm:GetParameter",
        Resource = "arn:aws:ssm:${var.region}:${var.account_number}:parameter/s3_processed_bucket_name"
      }
    ]
  })
}
24
labs/terraform/glu.dc.tf
Normal file
@ -0,0 +1,24 @@
resource "aws_glue_catalog_database" "datalake_db_raw_zone" {
  name = "datalake_raw_${var.account_number}_${var.student_initials}_${var.student_index_no}"
}

resource "aws_glue_catalog_database" "datalake_db_processed_zone" {
  name = "datalake_processed_${var.account_number}_${var.student_initials}_${var.student_index_no}"
}

resource "aws_glue_crawler" "glue_crawler_raw_zone" {
  database_name = aws_glue_catalog_database.datalake_db_raw_zone.name
  name          = "gc-raw-${var.account_number}-${var.student_initials}-${var.student_index_no}"
  role          = var.lab_role_arn
  table_prefix  = "crawler_"

  s3_target {
    path = "s3://${aws_s3_bucket.raw_bucket.bucket}/raw-zone/stockdata/"
  }

  tags = local.common_tags
}
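The crawler has no `schedule`, so it runs only when started (console, CLI or SDK), typically after Firehose has written the first objects under `raw-zone/stockdata/`. A sketch with boto3's Glue client that starts the crawler and waits until it is back in the `READY` state, returning the status of the finished crawl; the helper name and the polling values are assumptions, and the crawler name should follow the `gc-raw-...` pattern from the Terraform file.

```python
import time


def run_crawler(glue_client, crawler_name, poll_seconds=10.0, timeout_seconds=900):
    """Start a Glue crawler and wait until it returns to the READY state."""
    glue_client.start_crawler(Name=crawler_name)
    deadline = time.monotonic() + timeout_seconds
    while True:
        time.sleep(poll_seconds)  # give the crawler time to leave READY first
        crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
        if crawler["State"] == "READY":
            # LastCrawl describes the run that just finished (SUCCEEDED / FAILED / CANCELLED)
            return crawler.get("LastCrawl", {}).get("Status")
        if time.monotonic() > deadline:
            raise TimeoutError(f"crawler {crawler_name} is still {crawler['State']}")
```

Tables created by this crawler get the `crawler_` prefix in the raw-zone Glue database and can be queried from the Athena `development` workgroup.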
16
labs/terraform/kinesis_ds.tf
Normal file
@ -0,0 +1,16 @@
resource "aws_kinesis_stream" "cryptostock_stream" {
  name                      = "cryptostock-${var.account_number}-${var.student_initials}-${var.student_index_no}"
  shard_count               = 1
  enforce_consumer_deletion = true
  shard_level_metrics = [
    "IncomingBytes",
    "OutgoingBytes",
    "IncomingRecords",
    "OutgoingRecords"
  ]
  tags = {
    Purpose     = "UAM Cloud Data Processing"
    Environment = "DEV"
    Owner       = var.student_full_name
  }
}
17
labs/terraform/kinesis_fh.tf
Normal file
@ -0,0 +1,17 @@
resource "aws_kinesis_firehose_delivery_stream" "stock_delivery_stream" {
  name        = "firehose-${var.account_number}-${var.student_initials}-${var.student_index_no}"
  destination = "extended_s3"

  kinesis_source_configuration {
    kinesis_stream_arn = aws_kinesis_stream.cryptostock_stream.arn
    role_arn           = var.lab_role_arn
  }

  extended_s3_configuration {
    role_arn            = var.lab_role_arn
    bucket_arn          = aws_s3_bucket.raw_bucket.arn
    buffering_size      = 1  # MiB
    buffering_interval  = 60 # seconds
    prefix              = "raw-zone/stockdata/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
    error_output_prefix = "raw-zone/stockdata_errors/!{firehose:error-output-type}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
  }
}
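The `!{timestamp:...}` expressions in `prefix` are evaluated by Firehose, per its documentation, from the approximate arrival timestamp of the oldest record in the object, in UTC, using Java `DateTimeFormatter` patterns (hence `MM` for month and `HH` for the 24-hour clock). A small Python sketch that renders the same prefix for a chosen time, to predict where a batch will land; it maps only the four patterns used in this file and is not a general implementation of Firehose's expression language. `render_prefix` and `_PATTERNS` are hypothetical names.

```python
import re
from datetime import datetime, timezone

PREFIX = ("raw-zone/stockdata/year=!{timestamp:yyyy}/month=!{timestamp:MM}"
          "/day=!{timestamp:dd}/hour=!{timestamp:HH}/")

# Only the Java-style patterns used in kinesis_fh.tf, mapped to strftime codes.
_PATTERNS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def render_prefix(prefix, arrival):
    """Replace !{timestamp:<pattern>} placeholders using an aware datetime, converted to UTC."""
    arrival = arrival.astimezone(timezone.utc)

    def substitute(match):
        return arrival.strftime(_PATTERNS[match.group(1)])

    return re.sub(r"!\{timestamp:([A-Za-z]+)\}", substitute, prefix)


print(render_prefix(PREFIX, datetime(2021, 4, 18, 14, 18, 25, tzinfo=timezone.utc)))
# raw-zone/stockdata/year=2021/month=04/day=18/hour=14/
```

Because the partitions follow arrival time rather than `transaction_ts`, replayed historical trades land under the date they were sent, which is why the Lambda re-partitions them by `transaction_ts`.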
34
labs/terraform/lambda.tf
Normal file
@ -0,0 +1,34 @@
resource "aws_lambda_layer_version" "aws_wrangler" {
  filename            = "../lambda/awswrangler-layer-2.7.0-py3.8.zip"
  layer_name          = "aws_wrangler_${var.account_number}_${var.student_initials}_${var.student_index_no}"
  source_code_hash    = filebase64sha256("../lambda/awswrangler-layer-2.7.0-py3.8.zip")
  compatible_runtimes = ["python3.8"]
}

resource "aws_lambda_function" "etl_post_processing" {
  function_name    = "etl-post-processing-${var.account_number}-${var.student_initials}-${var.student_index_no}"
  filename         = "../lambda/lambda_definition.zip"
  handler          = "lambda_definition.etl_function"
  runtime          = "python3.8"
  role             = var.lab_role_arn
  timeout          = 300
  memory_size      = 512
  source_code_hash = filebase64sha256("../lambda/lambda_definition.zip")
  layers           = [aws_lambda_layer_version.aws_wrangler.arn]
}

resource "aws_lambda_permission" "allow_bucket" {
  statement_id  = "AllowExecutionFromS3Bucket"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.etl_post_processing.arn
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.raw_bucket.arn
}

resource "aws_s3_bucket_notification" "trigger_etl_lambda" {
  bucket = aws_s3_bucket.raw_bucket.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.etl_post_processing.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "raw-zone/"
  }

  depends_on = [aws_lambda_permission.allow_bucket]
}
7
labs/terraform/main.tf
Normal file
@ -0,0 +1,7 @@
locals {
  common_tags = {
    purpose     = "UAM Cloud Data Processing"
    environment = "DEV"
    owner       = var.student_full_name
  }
}
13
labs/terraform/provider.tf
Normal file
@ -0,0 +1,13 @@
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  profile = "default"
  region  = var.region
}
5
labs/terraform/ssm_parameter.tf
Normal file
@ -0,0 +1,5 @@
|
||||
resource "aws_ssm_parameter" "s3_processed" {
|
||||
name = "s3_processed_bucket_name"
|
||||
type = "String"
|
||||
value = aws_s3_bucket.processed_bucket.bucket
|
||||
}
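This parameter stores the processed-zone bucket name, so the ETL Lambda does not need it hard-coded. The `lambda_policy` in the state below grants `ssm:GetParameter` on exactly `parameter/s3_processed_bucket_name`. The following is a minimal sketch of the lookup, assuming boto3, where `client("ssm").get_parameter(Name=...)` returns a dict with `["Parameter"]["Value"]`. `get_processed_bucket` is a hypothetical name. The client is injectable so the function can be exercised without AWS access.

```python
def get_processed_bucket(ssm_client=None, name="s3_processed_bucket_name"):
    """Return the processed-zone bucket name stored in SSM Parameter Store.

    Hypothetical helper: ssm_client defaults to a boto3 SSM client.
    """
    if ssm_client is None:
        # Imported lazily so the helper can be tested without boto3 installed.
        import boto3
        ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(Name=name)
    return response["Parameter"]["Value"]
```

Inside a Lambda, creating the client once at module level (outside the handler) lets warm invocations reuse it.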
7
labs/terraform/starter_files/main.tf
Normal file
@ -0,0 +1,7 @@
locals {
  common_tags = {
    purpose     = "UAM Cloud Data Processing"
    environment = "DEV"
    owner       = var.student_full_name
  }
}
13
labs/terraform/starter_files/provider.tf
Normal file
@ -0,0 +1,13 @@
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  profile = "default"
  region  = var.region
}
5
labs/terraform/starter_files/terraform.tfvars
Normal file
@ -0,0 +1,5 @@
account_number    = XXXXX
student_initials  = "jk"
student_full_name = "Jakub Kasprzak"
student_index_no  = "12345"
lab_role_arn      = "arn:aws:iam::XXXXX:role/LabRole"
37
labs/terraform/starter_files/variables.tf
Normal file
@ -0,0 +1,37 @@
variable "account_number" {
  description = "Account number"
  type        = number
}

variable "region" {
  description = "Region name - must be N. Virginia (us-east-1)"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "dev"
}

variable "student_initials" {
  description = "Initials (first letters of first and last name)"
  type        = string
}

variable "student_full_name" {
  description = "Student's full name"
  type        = string
}

variable "student_index_no" {
  description = "Student index number"
  type        = string
}

variable "lab_role_arn" {
  description = "The role used for all labs. Don't use a single role for everything in real projects: it is an anti-pattern!"
  type        = string

}
9
labs/terraform/terraform.tfstate
Normal file
@ -0,0 +1,9 @@
{
  "version": 4,
  "terraform_version": "1.8.1",
  "serial": 289,
  "lineage": "3ca48c2b-bb88-35e7-8a35-5ef0e08daaff",
  "outputs": {},
  "resources": [],
  "check_results": null
}
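This state file records no resources (`"resources": []`), while `terraform.tfstate.backup` still lists the previously managed ones. `terraform state list` is the supported way to see what a state tracks.

For comparing the two files offline, the following is a rough sketch that reads the version-4 JSON layout shown here and returns resource addresses. It ignores `count`/`for_each` instance keys. It also relies on a state format that Terraform treats as internal and may change, so it is illustrative only. `state_addresses` is a hypothetical name.

```python
import json


def state_addresses(state_json):
    """List resource addresses recorded in a Terraform state (format version 4).

    Hypothetical helper: managed resources become "type.name", data sources
    "data.type.name", prefixed with the module path when present.
    """
    state = json.loads(state_json)
    addresses = []
    for res in state.get("resources", []):
        address = f'{res["type"]}.{res["name"]}'
        if res.get("mode") == "data":
            address = "data." + address
        if "module" in res:
            address = f'{res["module"]}.{address}'
        addresses.append(address)
    return addresses
```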
896
labs/terraform/terraform.tfstate.backup
Normal file
@ -0,0 +1,896 @@
{
  "version": 4,
  "terraform_version": "1.8.1",
  "serial": 272,
  "lineage": "3ca48c2b-bb88-35e7-8a35-5ef0e08daaff",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "aws_athena_workgroup",
      "name": "athena_workgroup",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:athena:us-east-1:205216182560:workgroup/development",
            "configuration": [
              {
                "bytes_scanned_cutoff_per_query": 0,
                "enforce_workgroup_configuration": true,
                "engine_version": [
                  {
                    "effective_engine_version": "Athena engine version 3",
                    "selected_engine_version": "AUTO"
                  }
                ],
                "execution_role": "",
                "publish_cloudwatch_metrics_enabled": true,
                "requester_pays_enabled": false,
                "result_configuration": [
                  {
                    "acl_configuration": [],
                    "encryption_configuration": [],
                    "expected_bucket_owner": "",
                    "output_location": "s3://athena-results-205216182560-agb-s1201687/output/"
                  }
                ]
              }
            ],
            "description": "",
            "force_destroy": true,
            "id": "development",
            "name": "development",
            "state": "ENABLED",
            "tags": {},
            "tags_all": {}
          },
          "sensitive_attributes": [],
          "private": "bnVsbA==",
          "dependencies": [
            "aws_s3_bucket.athena_results"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_glue_catalog_database",
      "name": "datalake_db_processed_zone",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:glue:us-east-1:205216182560:database/datalake_processed_205216182560_agb_s1201687",
            "catalog_id": "205216182560",
            "create_table_default_permission": [
              {
                "permissions": [
                  "ALL"
                ],
                "principal": [
                  {
                    "data_lake_principal_identifier": "IAM_ALLOWED_PRINCIPALS"
                  }
                ]
              }
            ],
            "description": "",
            "federated_database": [],
            "id": "205216182560:datalake_processed_205216182560_agb_s1201687",
            "location_uri": "",
            "name": "datalake_processed_205216182560_agb_s1201687",
            "parameters": {},
            "tags": {},
            "tags_all": {},
            "target_database": []
          },
          "sensitive_attributes": [],
          "private": "bnVsbA=="
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_glue_catalog_database",
      "name": "datalake_db_raw_zone",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:glue:us-east-1:205216182560:database/datalake_raw_205216182560_agb_s1201687",
            "catalog_id": "205216182560",
            "create_table_default_permission": [
              {
                "permissions": [
                  "ALL"
                ],
                "principal": [
                  {
                    "data_lake_principal_identifier": "IAM_ALLOWED_PRINCIPALS"
                  }
                ]
              }
            ],
            "description": "",
            "federated_database": [],
            "id": "205216182560:datalake_raw_205216182560_agb_s1201687",
            "location_uri": "",
            "name": "datalake_raw_205216182560_agb_s1201687",
            "parameters": {},
            "tags": {},
            "tags_all": {},
            "target_database": []
          },
          "sensitive_attributes": [],
          "private": "bnVsbA=="
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_glue_crawler",
      "name": "glue_crawler_raw_zone",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:glue:us-east-1:205216182560:crawler/gc-raw-205216182560-agb-s1201687",
            "catalog_target": [],
            "classifiers": [],
            "configuration": "",
            "database_name": "datalake_raw_205216182560_agb_s1201687",
            "delta_target": [],
            "description": "",
            "dynamodb_target": [],
            "hudi_target": [],
            "iceberg_target": [],
            "id": "gc-raw-205216182560-agb-s1201687",
            "jdbc_target": [],
            "lake_formation_configuration": [
              {
                "account_id": "",
                "use_lake_formation_credentials": false
              }
            ],
            "lineage_configuration": [
              {
                "crawler_lineage_settings": "DISABLE"
              }
            ],
            "mongodb_target": [],
            "name": "gc-raw-205216182560-agb-s1201687",
            "recrawl_policy": [
              {
                "recrawl_behavior": "CRAWL_EVERYTHING"
              }
            ],
            "role": "LabRole",
            "s3_target": [
              {
                "connection_name": "",
                "dlq_event_queue_arn": "",
                "event_queue_arn": "",
                "exclusions": [],
                "path": "s3://datalake-raw-205216182560-agb-s1201687/raw-zone/stockdata/",
                "sample_size": 0
              }
            ],
            "schedule": "",
            "schema_change_policy": [
              {
                "delete_behavior": "DEPRECATE_IN_DATABASE",
                "update_behavior": "UPDATE_IN_DATABASE"
              }
            ],
            "security_configuration": "",
            "table_prefix": "crawler_",
            "tags": {
              "environment": "DEV",
              "owner": "Agnieszka Gąbka-Buszek",
              "purpose": "UAM Cloud Data Processing"
            },
            "tags_all": {
              "environment": "DEV",
              "owner": "Agnieszka Gąbka-Buszek",
              "purpose": "UAM Cloud Data Processing"
            }
          },
          "sensitive_attributes": [],
          "private": "bnVsbA==",
          "dependencies": [
            "aws_glue_catalog_database.datalake_db_raw_zone",
            "aws_s3_bucket.raw_bucket"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_iam_policy",
      "name": "lambda_policy",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:iam::205216182560:policy/lambda_policy",
            "attachment_count": 0,
            "description": "",
            "id": "arn:aws:iam::205216182560:policy/lambda_policy",
            "name": "lambda_policy",
            "name_prefix": "",
            "path": "/",
            "policy": "{\"Statement\":[{\"Action\":[\"s3:GetObject\",\"s3:PutObject\"],\"Effect\":\"Allow\",\"Resource\":[\"arn:aws:s3:::datalake-raw-205216182560-agb-s1201687/*\",\"arn:aws:s3:::datalake-processed-205216182560-agb-s1201687/*\"]},{\"Action\":\"ssm:GetParameter\",\"Effect\":\"Allow\",\"Resource\":\"arn:aws:ssm:us-east-1:205216182560:parameter/s3_processed_bucket_name\"}],\"Version\":\"2012-10-17\"}",
            "policy_id": "ANPAS7R6WOEQGHVSNQS6X",
            "tags": {},
            "tags_all": {}
          },
          "sensitive_attributes": [],
          "private": "bnVsbA==",
          "dependencies": [
            "aws_s3_bucket.processed_bucket",
            "aws_s3_bucket.raw_bucket"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_kinesis_firehose_delivery_stream",
      "name": "stock_delivery_stream",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "arn": "arn:aws:firehose:us-east-1:205216182560:deliverystream/firehose-205216182560-agb-s1201687",
            "destination": "extended_s3",
            "destination_id": "destinationId-000000000001",
            "elasticsearch_configuration": [],
            "extended_s3_configuration": [
              {
                "bucket_arn": "arn:aws:s3:::datalake-raw-205216182560-agb-s1201687",
                "buffering_interval": 60,
                "buffering_size": 1,
                "cloudwatch_logging_options": [
                  {
                    "enabled": false,
                    "log_group_name": "",
                    "log_stream_name": ""
                  }
                ],
                "compression_format": "UNCOMPRESSED",
                "custom_time_zone": "UTC",
                "data_format_conversion_configuration": [],
                "dynamic_partitioning_configuration": [],
                "error_output_prefix": "raw-zone/stockdata_errors/!{firehose:error-output-type}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/",
                "file_extension": "",
                "kms_key_arn": "",
                "prefix": "raw-zone/stockdata/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/",
                "processing_configuration": [
                  {
                    "enabled": false,
                    "processors": []
                  }
                ],
                "role_arn": "arn:aws:iam::205216182560:role/LabRole",
                "s3_backup_configuration": [],
                "s3_backup_mode": "Disabled"
              }
            ],
            "http_endpoint_configuration": [],
            "id": "arn:aws:firehose:us-east-1:205216182560:deliverystream/firehose-205216182560-agb-s1201687",
            "kinesis_source_configuration": [
              {
                "kinesis_stream_arn": "arn:aws:kinesis:us-east-1:205216182560:stream/cryptostock-205216182560-agb-s1201687",
                "role_arn": "arn:aws:iam::205216182560:role/LabRole"
              }
            ],
            "msk_source_configuration": [],
            "name": "firehose-205216182560-agb-s1201687",
            "opensearch_configuration": [],
            "opensearchserverless_configuration": [],
            "redshift_configuration": [],
            "server_side_encryption": [
              {
                "enabled": false,
                "key_arn": "",
                "key_type": "AWS_OWNED_CMK"
              }
            ],
            "snowflake_configuration": [],
            "splunk_configuration": [],
            "tags": {},
            "tags_all": {},
            "timeouts": null,
            "version_id": "1"
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjoxODAwMDAwMDAwMDAwLCJkZWxldGUiOjE4MDAwMDAwMDAwMDAsInVwZGF0ZSI6NjAwMDAwMDAwMDAwfSwic2NoZW1hX3ZlcnNpb24iOiIxIn0=",
          "dependencies": [
            "aws_kinesis_stream.cryptostock_stream",
            "aws_s3_bucket.raw_bucket"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_kinesis_stream",
      "name": "cryptostock_stream",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "arn": "arn:aws:kinesis:us-east-1:205216182560:stream/cryptostock-205216182560-agb-s1201687",
            "encryption_type": "NONE",
            "enforce_consumer_deletion": true,
            "id": "arn:aws:kinesis:us-east-1:205216182560:stream/cryptostock-205216182560-agb-s1201687",
            "kms_key_id": "",
            "name": "cryptostock-205216182560-agb-s1201687",
            "retention_period": 24,
            "shard_count": 1,
            "shard_level_metrics": [
              "IncomingBytes",
              "IncomingRecords",
              "OutgoingBytes",
              "OutgoingRecords"
            ],
            "stream_mode_details": [
              {
                "stream_mode": "PROVISIONED"
              }
            ],
            "tags": {
              "Environment": "DEV",
              "Owner": "Agnieszka Gąbka-Buszek",
              "Purpose": "UAM Cloud Data Processing"
            },
            "tags_all": {
              "Environment": "DEV",
              "Owner": "Agnieszka Gąbka-Buszek",
              "Purpose": "UAM Cloud Data Processing"
            },
            "timeouts": null
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjozMDAwMDAwMDAwMDAsImRlbGV0ZSI6NzIwMDAwMDAwMDAwMCwidXBkYXRlIjo3MjAwMDAwMDAwMDAwfSwic2NoZW1hX3ZlcnNpb24iOiIxIn0="
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_lambda_function",
      "name": "etl_post_processing",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "architectures": [
              "x86_64"
            ],
            "arn": "arn:aws:lambda:us-east-1:205216182560:function:etl-post-processing-205216182560-agb-s1201687",
            "code_signing_config_arn": "",
            "dead_letter_config": [],
            "description": "",
            "environment": [],
            "ephemeral_storage": [
              {
                "size": 512
              }
            ],
            "file_system_config": [],
            "filename": "../lambda/lambda_definition.zip",
            "function_name": "etl-post-processing-205216182560-agb-s1201687",
            "handler": "lambda_definition.etl_function",
            "id": "etl-post-processing-205216182560-agb-s1201687",
            "image_config": [],
            "image_uri": "",
            "invoke_arn": "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:205216182560:function:etl-post-processing-205216182560-agb-s1201687/invocations",
            "kms_key_arn": "",
            "last_modified": "2024-05-27T10:23:37.000+0000",
            "layers": [
              "arn:aws:lambda:us-east-1:205216182560:layer:aws_wrangler_205216182560_agb_s1201687:7"
            ],
            "logging_config": [
              {
                "application_log_level": "",
                "log_format": "Text",
                "log_group": "/aws/lambda/etl-post-processing-205216182560-agb-s1201687",
                "system_log_level": ""
              }
            ],
            "memory_size": 512,
            "package_type": "Zip",
            "publish": false,
            "qualified_arn": "arn:aws:lambda:us-east-1:205216182560:function:etl-post-processing-205216182560-agb-s1201687:$LATEST",
            "qualified_invoke_arn": "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:205216182560:function:etl-post-processing-205216182560-agb-s1201687:$LATEST/invocations",
            "replace_security_groups_on_destroy": null,
            "replacement_security_group_ids": null,
            "reserved_concurrent_executions": -1,
            "role": "arn:aws:iam::205216182560:role/LabRole",
            "runtime": "python3.8",
            "s3_bucket": null,
            "s3_key": null,
            "s3_object_version": null,
            "signing_job_arn": "",
            "signing_profile_version_arn": "",
            "skip_destroy": false,
            "snap_start": [],
            "source_code_hash": "DYklWA51/+hutwYtHutJg59rV7DY0LEgfp+ne8wgiSo=",
            "source_code_size": 884,
            "tags": {},
            "tags_all": {},
            "timeout": 300,
            "timeouts": null,
            "tracing_config": [
              {
                "mode": "PassThrough"
              }
            ],
            "version": "$LATEST",
            "vpc_config": []
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjo2MDAwMDAwMDAwMDAsImRlbGV0ZSI6NjAwMDAwMDAwMDAwLCJ1cGRhdGUiOjYwMDAwMDAwMDAwMH19",
          "dependencies": [
            "aws_lambda_layer_version.aws_wrangler"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_lambda_layer_version",
      "name": "aws_wrangler",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:lambda:us-east-1:205216182560:layer:aws_wrangler_205216182560_agb_s1201687:7",
            "compatible_architectures": [],
            "compatible_runtimes": [
              "python3.8"
            ],
            "created_date": "2024-05-27T08:03:57.293+0000",
            "description": "",
            "filename": "../lambda/awswrangler-layer-2.7.0-py3.8.zip",
            "id": "arn:aws:lambda:us-east-1:205216182560:layer:aws_wrangler_205216182560_agb_s1201687:7",
            "layer_arn": "arn:aws:lambda:us-east-1:205216182560:layer:aws_wrangler_205216182560_agb_s1201687",
            "layer_name": "aws_wrangler_205216182560_agb_s1201687",
            "license_info": "",
            "s3_bucket": null,
            "s3_key": null,
            "s3_object_version": null,
            "signing_job_arn": "",
            "signing_profile_version_arn": "",
            "skip_destroy": false,
            "source_code_hash": "C0YX/4auMnBs4J9JCDy1f7uc2GLF0vU7ppQgzffQiN4=",
            "source_code_size": 43879070,
            "version": "7"
          },
          "sensitive_attributes": [],
          "private": "bnVsbA=="
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_lambda_permission",
      "name": "allow_bucket",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "action": "lambda:InvokeFunction",
            "event_source_token": null,
            "function_name": "arn:aws:lambda:us-east-1:205216182560:function:etl-post-processing-205216182560-agb-s1201687",
            "function_url_auth_type": null,
            "id": "AllowExecutionFromS3Bucket",
            "principal": "s3.amazonaws.com",
            "principal_org_id": null,
            "qualifier": "",
            "source_account": null,
            "source_arn": "arn:aws:s3:::datalake-raw-205216182560-agb-s1201687",
            "statement_id": "AllowExecutionFromS3Bucket",
            "statement_id_prefix": ""
          },
          "sensitive_attributes": [],
          "private": "bnVsbA==",
          "dependencies": [
            "aws_lambda_function.etl_post_processing",
            "aws_lambda_layer_version.aws_wrangler",
            "aws_s3_bucket.raw_bucket"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "athena_results",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "acceleration_status": "",
            "acl": null,
            "arn": "arn:aws:s3:::athena-results-205216182560-agb-s1201687",
            "bucket": "athena-results-205216182560-agb-s1201687",
            "bucket_domain_name": "athena-results-205216182560-agb-s1201687.s3.amazonaws.com",
            "bucket_prefix": "",
            "bucket_regional_domain_name": "athena-results-205216182560-agb-s1201687.s3.us-east-1.amazonaws.com",
            "cors_rule": [],
            "force_destroy": true,
            "grant": [
              {
                "id": "42e379e111382262c89fb5a8aef42c2fd8d1e15971b4e7ae2317f0b554d6f32e",
                "permissions": [
                  "FULL_CONTROL"
                ],
                "type": "CanonicalUser",
                "uri": ""
              }
            ],
            "hosted_zone_id": "Z3AQBSTGFYJSTF",
            "id": "athena-results-205216182560-agb-s1201687",
            "lifecycle_rule": [
              {
                "abort_incomplete_multipart_upload_days": 0,
                "enabled": true,
                "expiration": [
                  {
                    "date": "",
                    "days": 1,
                    "expired_object_delete_marker": false
                  }
                ],
                "id": "standard-expiration",
                "noncurrent_version_expiration": [],
                "noncurrent_version_transition": [],
                "prefix": "",
                "tags": {},
                "transition": []
              }
            ],
            "logging": [],
            "object_lock_configuration": [],
            "object_lock_enabled": false,
            "policy": "",
            "region": "us-east-1",
            "replication_configuration": [],
            "request_payer": "BucketOwner",
            "server_side_encryption_configuration": [
              {
                "rule": [
                  {
                    "apply_server_side_encryption_by_default": [
                      {
                        "kms_master_key_id": "",
                        "sse_algorithm": "AES256"
                      }
                    ],
                    "bucket_key_enabled": false
                  }
                ]
              }
            ],
            "tags": {
              "environment": "DEV",
              "owner": "Agnieszka Gąbka-Buszek",
              "purpose": "UAM Cloud Data Processing"
            },
            "tags_all": {
              "environment": "DEV",
              "owner": "Agnieszka Gąbka-Buszek",
              "purpose": "UAM Cloud Data Processing"
            },
            "timeouts": null,
            "versioning": [
              {
                "enabled": false,
                "mfa_delete": false
              }
            ],
            "website": [],
            "website_domain": null,
            "website_endpoint": null
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjoxMjAwMDAwMDAwMDAwLCJkZWxldGUiOjM2MDAwMDAwMDAwMDAsInJlYWQiOjEyMDAwMDAwMDAwMDAsInVwZGF0ZSI6MTIwMDAwMDAwMDAwMH19"
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "processed_bucket",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "acceleration_status": "",
            "acl": null,
            "arn": "arn:aws:s3:::datalake-processed-205216182560-agb-s1201687",
            "bucket": "datalake-processed-205216182560-agb-s1201687",
            "bucket_domain_name": "datalake-processed-205216182560-agb-s1201687.s3.amazonaws.com",
            "bucket_prefix": "",
            "bucket_regional_domain_name": "datalake-processed-205216182560-agb-s1201687.s3.us-east-1.amazonaws.com",
            "cors_rule": [],
            "force_destroy": true,
            "grant": [
              {
                "id": "42e379e111382262c89fb5a8aef42c2fd8d1e15971b4e7ae2317f0b554d6f32e",
                "permissions": [
                  "FULL_CONTROL"
                ],
                "type": "CanonicalUser",
                "uri": ""
              }
            ],
            "hosted_zone_id": "Z3AQBSTGFYJSTF",
            "id": "datalake-processed-205216182560-agb-s1201687",
            "lifecycle_rule": [],
            "logging": [],
            "object_lock_configuration": [],
            "object_lock_enabled": false,
            "policy": "",
            "region": "us-east-1",
            "replication_configuration": [],
            "request_payer": "BucketOwner",
            "server_side_encryption_configuration": [
              {
                "rule": [
                  {
                    "apply_server_side_encryption_by_default": [
                      {
                        "kms_master_key_id": "",
                        "sse_algorithm": "AES256"
                      }
                    ],
                    "bucket_key_enabled": false
                  }
                ]
              }
            ],
            "tags": {
              "Environment": "DEV",
              "Purpose": "UAM Cloud Data Processing"
            },
            "tags_all": {
              "Environment": "DEV",
              "Purpose": "UAM Cloud Data Processing"
            },
            "timeouts": null,
            "versioning": [
              {
                "enabled": false,
                "mfa_delete": false
              }
            ],
            "website": [],
            "website_domain": null,
            "website_endpoint": null
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjoxMjAwMDAwMDAwMDAwLCJkZWxldGUiOjM2MDAwMDAwMDAwMDAsInJlYWQiOjEyMDAwMDAwMDAwMDAsInVwZGF0ZSI6MTIwMDAwMDAwMDAwMH19"
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "raw_bucket",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "acceleration_status": "",
            "acl": null,
            "arn": "arn:aws:s3:::datalake-raw-205216182560-agb-s1201687",
            "bucket": "datalake-raw-205216182560-agb-s1201687",
            "bucket_domain_name": "datalake-raw-205216182560-agb-s1201687.s3.amazonaws.com",
            "bucket_prefix": "",
            "bucket_regional_domain_name": "datalake-raw-205216182560-agb-s1201687.s3.us-east-1.amazonaws.com",
            "cors_rule": [],
            "force_destroy": true,
            "grant": [
              {
                "id": "42e379e111382262c89fb5a8aef42c2fd8d1e15971b4e7ae2317f0b554d6f32e",
                "permissions": [
                  "FULL_CONTROL"
                ],
                "type": "CanonicalUser",
                "uri": ""
              }
            ],
            "hosted_zone_id": "Z3AQBSTGFYJSTF",
            "id": "datalake-raw-205216182560-agb-s1201687",
            "lifecycle_rule": [],
            "logging": [],
            "object_lock_configuration": [],
            "object_lock_enabled": false,
            "policy": "",
            "region": "us-east-1",
            "replication_configuration": [],
            "request_payer": "BucketOwner",
            "server_side_encryption_configuration": [
              {
                "rule": [
                  {
                    "apply_server_side_encryption_by_default": [
                      {
                        "kms_master_key_id": "",
                        "sse_algorithm": "AES256"
                      }
                    ],
                    "bucket_key_enabled": false
                  }
                ]
              }
            ],
            "tags": {
              "Environment": "DEV",
              "Purpose": "UAM Cloud Data Processing"
            },
            "tags_all": {
              "Environment": "DEV",
              "Purpose": "UAM Cloud Data Processing"
            },
            "timeouts": null,
            "versioning": [
              {
                "enabled": false,
                "mfa_delete": false
              }
            ],
            "website": [],
            "website_domain": null,
            "website_endpoint": null
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjoxMjAwMDAwMDAwMDAwLCJkZWxldGUiOjM2MDAwMDAwMDAwMDAsInJlYWQiOjEyMDAwMDAwMDAwMDAsInVwZGF0ZSI6MTIwMDAwMDAwMDAwMH19"
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_s3_bucket_lifecycle_configuration",
      "name": "athena_results_lifecycle",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "bucket": "athena-results-205216182560-agb-s1201687",
            "expected_bucket_owner": "",
            "id": "athena-results-205216182560-agb-s1201687",
            "rule": [
              {
                "abort_incomplete_multipart_upload": [],
                "expiration": [
                  {
                    "date": "",
                    "days": 1,
                    "expired_object_delete_marker": false
                  }
                ],
                "filter": [
                  {
                    "and": [],
                    "object_size_greater_than": "",
                    "object_size_less_than": "",
                    "prefix": "",
                    "tag": []
                  }
                ],
                "id": "standard-expiration",
                "noncurrent_version_expiration": [],
                "noncurrent_version_transition": [],
                "prefix": "",
                "status": "Enabled",
                "transition": []
              }
            ],
            "timeouts": null
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjoxODAwMDAwMDAwMDAsInVwZGF0ZSI6MTgwMDAwMDAwMDAwfX0=",
          "dependencies": [
            "aws_s3_bucket.athena_results"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_s3_bucket_notification",
      "name": "trigger_etl_lambda",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "bucket": "datalake-raw-205216182560-agb-s1201687",
            "eventbridge": false,
            "id": "datalake-raw-205216182560-agb-s1201687",
            "lambda_function": [
              {
                "events": [
                  "s3:ObjectCreated:*"
                ],
                "filter_prefix": "raw-zone/",
                "filter_suffix": "",
                "id": "tf-s3-lambda-20240527080407879000000001",
                "lambda_function_arn": "arn:aws:lambda:us-east-1:205216182560:function:etl-post-processing-205216182560-agb-s1201687"
              }
            ],
            "queue": [],
            "topic": []
          },
          "sensitive_attributes": [],
          "private": "bnVsbA==",
          "dependencies": [
            "aws_lambda_function.etl_post_processing",
            "aws_lambda_layer_version.aws_wrangler",
            "aws_lambda_permission.allow_bucket",
            "aws_s3_bucket.raw_bucket"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "aws_ssm_parameter",
      "name": "s3_processed",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "allowed_pattern": "",
            "arn": "arn:aws:ssm:us-east-1:205216182560:parameter/s3_processed_bucket_name",
            "data_type": "text",
            "description": "",
            "id": "s3_processed_bucket_name",
            "insecure_value": null,
            "key_id": "",
            "name": "s3_processed_bucket_name",
            "overwrite": null,
            "tags": {},
            "tags_all": {},
            "tier": "Standard",
            "type": "String",
            "value": "datalake-processed-205216182560-agb-s1201687",
            "version": 1
          },
          "sensitive_attributes": [
            [
              {
                "type": "get_attr",
                "value": "value"
              }
            ]
          ],
          "private": "bnVsbA==",
          "dependencies": [
            "aws_s3_bucket.processed_bucket"
          ]
        }
      ]
    }
  ],
  "check_results": null
}
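The Firehose delivery stream recorded above writes to `raw-zone/stockdata/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/` with `custom_time_zone` set to UTC. The Glue crawler pointed at `raw-zone/stockdata/` can then treat these `key=value` folders as partitions.

The following is a minimal sketch that renders such a prefix for a given arrival time and parses the partition values back out of an object key. It assumes only the four timestamp placeholders used in this prefix and is not Firehose's full expression syntax (for example, `!{firehose:error-output-type}` is not handled). `render_prefix` and `parse_partitions` are hypothetical names.

```python
import re
from datetime import datetime, timezone

PREFIX_TEMPLATE = (
    "raw-zone/stockdata/year=!{timestamp:yyyy}/month=!{timestamp:MM}"
    "/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
)

# Only the placeholders used by this stream, mapped to strftime codes.
_TIMESTAMP_FORMATS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def render_prefix(template, arrival):
    """Substitute !{timestamp:...} placeholders using the arrival time in UTC.

    Raises KeyError for placeholder formats not listed in _TIMESTAMP_FORMATS.
    """
    arrival = arrival.astimezone(timezone.utc)
    return re.sub(
        r"!\{timestamp:(\w+)\}",
        lambda m: arrival.strftime(_TIMESTAMP_FORMATS[m.group(1)]),
        template,
    )


def parse_partitions(key):
    """Extract Hive-style key=value folder names from an S3 object key."""
    folders = key.split("/")[:-1]  # drop the file name
    return dict(folder.split("=", 1) for folder in folders if "=" in folder)
```

Zero-padded month, day and hour values keep the folders in lexicographic order. They also mean partition values are strings such as `"05"` unless the table schema declares them otherwise.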
5
labs/terraform/terraform.tfvars
Normal file
@ -0,0 +1,5 @@
account_number    = 205216182560
student_initials  = "agb"
student_full_name = "Agnieszka Gąbka-Buszek"
student_index_no  = "s1201687"
lab_role_arn      = "arn:aws:iam::205216182560:role/LabRole"
37
labs/terraform/variables.tf
Normal file
@ -0,0 +1,37 @@
variable "account_number" {
  description = "Account number"
  type        = number
}

variable "region" {
  description = "Region name - must be N. Virginia (us-east-1)"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "dev"
}

variable "student_initials" {
  description = "Initials (first letters of first and last name)"
  type        = string
}

variable "student_full_name" {
  description = "Student's full name"
  type        = string
}

variable "student_index_no" {
  description = "Student index number"
  type        = string
}

variable "lab_role_arn" {
  description = "The role used for all labs. Don't use a single role for everything in real projects: it is an anti-pattern!"
  type        = string

}
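Resource names in this configuration are built from `account_number`, `student_initials` and `student_index_no`. For example, the Lambda uses `etl-post-processing-${var.account_number}-${var.student_initials}-${var.student_index_no}`. The state shows hyphen-separated S3 bucket names (`datalake-raw-205216182560-agb-s1201687`) and underscore-separated Glue database names (`datalake_raw_205216182560_agb_s1201687`).

The following is a hypothetical helper that reproduces this convention, for example to predict names from a notebook. It is paired with a simplified S3 bucket-name check that covers length, lowercase characters and no `..`, not every AWS rule.

```python
import re


def resource_name(kind, account_number, initials, index_no, sep="-"):
    """Build a per-student resource name following the lab's naming pattern.

    Hypothetical helper: use sep="-" for S3 buckets and Lambda functions,
    sep="_" for Glue databases, as observed in the Terraform state.
    """
    words = kind.replace("_", "-").split("-")
    return sep.join(words + [str(account_number), initials, index_no]).lower()


def is_valid_bucket_name(name):
    """Simplified S3 bucket-name check: 3-63 chars, lowercase letters, digits,
    dots and hyphens, starting and ending with a letter or digit, no '..'."""
    return bool(re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name)) and ".." not in name
```

Passing the account ID as a string keeps any leading zeros, since 12-digit AWS account IDs may start with `0`. The same concern applies to declaring `account_number` with `type = number`, which would drop such zeros from interpolated names.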
BIN
pdf/Intro_to_cloud_computing.pdf
Normal file
Binary file not shown.
BIN
pdf/LABS Przetwarzanie Danych w chmurze publicznej.pdf
Normal file
Binary file not shown.
BIN
pdf/cloud_data_processing.pdf
Normal file
Binary file not shown.