Commit 446eb4fd authored by Durvesh Rajubhau Mahurkar

Upload New File

parent 9e901b47
%% Cell type:markdown id: tags:
## Installing required packages
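
The installs in the cell below replace several packages that the hosted runtime ships with, so it can help to confirm which versions ended up installed once that cell has run. The following is a minimal sketch using only the standard library; the `PACKAGES` list and the `installed_versions` helper are illustrative names, not part of the original notebook.

```python
from importlib import metadata

# Illustrative list of packages to check; adjust as needed.
PACKAGES = ["apache-beam", "protobuf", "google-cloud-storage", "google-auth"]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so mismatches are easy to spot.
for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

If the reported `protobuf` version falls outside the range that `apache-beam` declares, restarting the runtime after the final install is a reasonable first step before importing Beam.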
%% Cell type:code id: tags:
``` python
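# Install the base Apache Beam SDK.
!pip install apache-beam
# Force-reinstall google-cloud-storage; this also upgrades shared dependencies such as protobuf.
!pip install --force-reinstall google-cloud-storage
# Install Beam's GCP extras; the quotes keep shells from globbing the brackets.
!pip install "apache-beam[gcp]"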
```
%% Output
Collecting apache-beam
Downloading apache_beam-2.59.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.5 kB)
Collecting crcmod<2.0,>=1.7 (from apache-beam)
Downloading crcmod-1.7.tar.gz (89 kB)
Preparing metadata (setup.py) ... done
Collecting orjson<4,>=3.9.7 (from apache-beam)
Downloading orjson-3.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (50 kB)
Collecting dill<0.3.2,>=0.3.1.1 (from apache-beam)
Downloading dill-0.3.1.1.tar.gz (151 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: cloudpickle~=2.2.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (2.2.1)
Collecting fastavro<2,>=0.23.6 (from apache-beam)
Downloading fastavro-1.9.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.5 kB)
Collecting fasteners<1.0,>=0.3 (from apache-beam)
Downloading fasteners-0.19-py3-none-any.whl.metadata (4.9 kB)
Requirement already satisfied: grpcio!=1.48.0,!=1.59.*,!=1.60.*,!=1.61.*,!=1.62.0,!=1.62.1,<2,>=1.33.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (1.64.1)
Collecting hdfs<3.0.0,>=2.1.0 (from apache-beam)
Downloading hdfs-2.7.3.tar.gz (43 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: httplib2<0.23.0,>=0.8 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (0.22.0)
Requirement already satisfied: jsonschema<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (4.23.0)
Requirement already satisfied: jsonpickle<4.0.0,>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (3.3.0)
Requirement already satisfied: numpy<1.27.0,>=1.14.3 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (1.26.4)
Collecting objsize<0.8.0,>=0.6.1 (from apache-beam)
Downloading objsize-0.7.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: packaging>=22.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (24.1)
Collecting pymongo<5.0.0,>=3.8.0 (from apache-beam)
Downloading pymongo-4.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Requirement already satisfied: proto-plus<2,>=1.7.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (1.24.0)
Requirement already satisfied: protobuf!=4.0.*,!=4.21.*,!=4.22.0,!=4.23.*,!=4.24.*,<4.26.0,>=3.20.3 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (3.20.3)
Collecting pydot<2,>=1.2.0 (from apache-beam)
Downloading pydot-1.4.2-py2.py3-none-any.whl.metadata (8.0 kB)
Requirement already satisfied: python-dateutil<3,>=2.8.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (2.8.2)
Requirement already satisfied: pytz>=2018.3 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (2024.2)
Collecting redis<6,>=5.0.0 (from apache-beam)
Downloading redis-5.1.0-py3-none-any.whl.metadata (9.1 kB)
Requirement already satisfied: regex>=2020.6.8 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (2024.9.11)
Requirement already satisfied: requests<3.0.0,>=2.24.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (2.32.3)
Requirement already satisfied: typing-extensions>=3.7.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (4.12.2)
Collecting zstandard<1,>=0.18.0 (from apache-beam)
Downloading zstandard-0.23.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Requirement already satisfied: pyarrow<17.0.0,>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (16.1.0)
Requirement already satisfied: pyarrow-hotfix<1 in /usr/local/lib/python3.10/dist-packages (from apache-beam) (0.6)
Collecting js2py<1,>=0.74 (from apache-beam)
Downloading Js2Py-0.74-py3-none-any.whl.metadata (868 bytes)
Collecting docopt (from hdfs<3.0.0,>=2.1.0->apache-beam)
Downloading docopt-0.6.2.tar.gz (25 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from hdfs<3.0.0,>=2.1.0->apache-beam) (1.16.0)
Requirement already satisfied: pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2 in /usr/local/lib/python3.10/dist-packages (from httplib2<0.23.0,>=0.8->apache-beam) (3.1.4)
Requirement already satisfied: tzlocal>=1.2 in /usr/local/lib/python3.10/dist-packages (from js2py<1,>=0.74->apache-beam) (5.2)
Collecting pyjsparser>=2.5.1 (from js2py<1,>=0.74->apache-beam)
Downloading pyjsparser-2.7.1.tar.gz (24 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam) (24.2.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam) (2023.12.1)
Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam) (0.35.1)
Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam) (0.20.0)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo<5.0.0,>=3.8.0->apache-beam)
Downloading dnspython-2.6.1-py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: async-timeout>=4.0.3 in /usr/local/lib/python3.10/dist-packages (from redis<6,>=5.0.0->apache-beam) (4.0.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam) (2024.8.30)
Downloading apache_beam-2.59.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.6 MB)
Downloading fastavro-1.9.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Downloading fasteners-0.19-py3-none-any.whl (18 kB)
Downloading Js2Py-0.74-py3-none-any.whl (1.0 MB)
Downloading objsize-0.7.0-py3-none-any.whl (11 kB)
Downloading orjson-3.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (141 kB)
Downloading pydot-1.4.2-py2.py3-none-any.whl (21 kB)
Downloading pymongo-4.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
Downloading redis-5.1.0-py3-none-any.whl (261 kB)
Downloading zstandard-0.23.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB)
Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
Building wheels for collected packages: crcmod, dill, hdfs, pyjsparser, docopt
Building wheel for crcmod (setup.py) ... done
Created wheel for crcmod: filename=crcmod-1.7-cp310-cp310-linux_x86_64.whl size=31404 sha256=47dc3451b06ac3201ab1bc72b40f38c2fbdc7a4ec7f4eb16809171c7f5ad680c
Stored in directory: /root/.cache/pip/wheels/85/4c/07/72215c529bd59d67e3dac29711d7aba1b692f543c808ba9e86
Building wheel for dill (setup.py) ... done
Created wheel for dill: filename=dill-0.3.1.1-py3-none-any.whl size=78542 sha256=3f3c64766dc34730bb2ab29a10ac85c747587342b2b6656123c8437cce8a281a
Stored in directory: /root/.cache/pip/wheels/ea/e2/86/64980d90e297e7bf2ce588c2b96e818f5399c515c4bb8a7e4f
Building wheel for hdfs (setup.py) ... done
Created wheel for hdfs: filename=hdfs-2.7.3-py3-none-any.whl size=34325 sha256=8d1a31bca252ca0170842d140abbdb3c77a75e8f4bdb7207c94b632cfc758434
Stored in directory: /root/.cache/pip/wheels/e5/8d/b6/99c1c0a3ac5788c866b0ecd3f48b0134a5910e6ed26011800b
Building wheel for pyjsparser (setup.py) ... done
Created wheel for pyjsparser: filename=pyjsparser-2.7.1-py3-none-any.whl size=25983 sha256=6be8e58dba3902a3e4f5850f4093cac8357a584f79626762b99823018c157c06
Stored in directory: /root/.cache/pip/wheels/5e/81/26/5956478df303e2bf5a85a5df595bb307bd25948a4bab69f7c7
Building wheel for docopt (setup.py) ... done
Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13704 sha256=4f8d8d3fa9fa3f5190772934989f590858526810e7919cbe546dc5753b5701ec
Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac
Successfully built crcmod dill hdfs pyjsparser docopt
Installing collected packages: pyjsparser, docopt, crcmod, zstandard, redis, pydot, orjson, objsize, js2py, fasteners, fastavro, dnspython, dill, pymongo, hdfs, apache-beam
Attempting uninstall: pydot
Found existing installation: pydot 3.0.2
Uninstalling pydot-3.0.2:
Successfully uninstalled pydot-3.0.2
Successfully installed apache-beam-2.59.0 crcmod-1.7 dill-0.3.1.1 dnspython-2.6.1 docopt-0.6.2 fastavro-1.9.7 fasteners-0.19 hdfs-2.7.3 js2py-0.74 objsize-0.7.0 orjson-3.10.7 pydot-1.4.2 pyjsparser-2.7.1 pymongo-4.10.1 redis-5.1.0 zstandard-0.23.0
Collecting google-cloud-storage
Downloading google_cloud_storage-2.18.2-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage)
Downloading google_auth-2.35.0-py2.py3-none-any.whl.metadata (4.7 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage)
Downloading google_api_core-2.20.0-py3-none-any.whl.metadata (2.7 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage)
Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage)
Using cached google_crc32c-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.3 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage)
Using cached googleapis_common_protos-1.65.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage)
Downloading protobuf-5.28.2-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage)
Using cached proto_plus-1.24.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage)
Using cached cachetools-5.5.0-py3-none-any.whl.metadata (5.3 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage)
Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage)
Using cached charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (33 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage)
Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage)
Using cached urllib3-2.2.3-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage)
Using cached certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage)
Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Downloading google_cloud_storage-2.18.2-py2.py3-none-any.whl (130 kB)
Downloading google_api_core-2.20.0-py3-none-any.whl (142 kB)
Downloading google_auth-2.35.0-py2.py3-none-any.whl (208 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_crc32c-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached cachetools-5.5.0-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.8.30-py3-none-any.whl (167 kB)
Using cached charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
Using cached googleapis_common_protos-1.65.0-py2.py3-none-any.whl (220 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.24.0-py3-none-any.whl (50 kB)
Downloading protobuf-5.28.2-cp38-abi3-manylinux2014_x86_64.whl (316 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.2.3-py3-none-any.whl (126 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, pyasn1, protobuf, idna, google-crc32c, charset-normalizer, certifi, cachetools, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage
Attempting uninstall: urllib3
Found existing installation: urllib3 2.2.3
Uninstalling urllib3-2.2.3:
Successfully uninstalled urllib3-2.2.3
Attempting uninstall: pyasn1
Found existing installation: pyasn1 0.6.1
Uninstalling pyasn1-0.6.1:
Successfully uninstalled pyasn1-0.6.1
Attempting uninstall: protobuf
Found existing installation: protobuf 3.20.3
Uninstalling protobuf-3.20.3:
Successfully uninstalled protobuf-3.20.3
Attempting uninstall: idna
Found existing installation: idna 3.10
Uninstalling idna-3.10:
Successfully uninstalled idna-3.10
Attempting uninstall: google-crc32c
Found existing installation: google-crc32c 1.6.0
Uninstalling google-crc32c-1.6.0:
Successfully uninstalled google-crc32c-1.6.0
Attempting uninstall: charset-normalizer
Found existing installation: charset-normalizer 3.3.2
Uninstalling charset-normalizer-3.3.2:
Successfully uninstalled charset-normalizer-3.3.2
Attempting uninstall: certifi
Found existing installation: certifi 2024.8.30
Uninstalling certifi-2024.8.30:
Successfully uninstalled certifi-2024.8.30
Attempting uninstall: cachetools
Found existing installation: cachetools 5.5.0
Uninstalling cachetools-5.5.0:
Successfully uninstalled cachetools-5.5.0
Attempting uninstall: rsa
Found existing installation: rsa 4.9
Uninstalling rsa-4.9:
Successfully uninstalled rsa-4.9
Attempting uninstall: requests
Found existing installation: requests 2.32.3
Uninstalling requests-2.32.3:
Successfully uninstalled requests-2.32.3
Attempting uninstall: pyasn1-modules
Found existing installation: pyasn1_modules 0.4.1
Uninstalling pyasn1_modules-0.4.1:
Successfully uninstalled pyasn1_modules-0.4.1
Attempting uninstall: proto-plus
Found existing installation: proto-plus 1.24.0
Uninstalling proto-plus-1.24.0:
Successfully uninstalled proto-plus-1.24.0
Attempting uninstall: googleapis-common-protos
Found existing installation: googleapis-common-protos 1.65.0
Uninstalling googleapis-common-protos-1.65.0:
Successfully uninstalled googleapis-common-protos-1.65.0
Attempting uninstall: google-resumable-media
Found existing installation: google-resumable-media 2.7.2
Uninstalling google-resumable-media-2.7.2:
Successfully uninstalled google-resumable-media-2.7.2
Attempting uninstall: google-auth
Found existing installation: google-auth 2.27.0
Uninstalling google-auth-2.27.0:
Successfully uninstalled google-auth-2.27.0
Attempting uninstall: google-api-core
Found existing installation: google-api-core 2.19.2
Uninstalling google-api-core-2.19.2:
Successfully uninstalled google-api-core-2.19.2
Attempting uninstall: google-cloud-core
Found existing installation: google-cloud-core 2.4.1
Uninstalling google-cloud-core-2.4.1:
Successfully uninstalled google-cloud-core-2.4.1
Attempting uninstall: google-cloud-storage
Found existing installation: google-cloud-storage 2.8.0
Uninstalling google-cloud-storage-2.8.0:
Successfully uninstalled google-cloud-storage-2.8.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-beam 2.59.0 requires protobuf!=4.0.*,!=4.21.*,!=4.22.0,!=4.23.*,!=4.24.*,<4.26.0,>=3.20.3, but you have protobuf 5.28.2 which is incompatible.
google-ai-generativelanguage 0.6.6 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 5.28.2 which is incompatible.
google-cloud-datastore 2.19.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 5.28.2 which is incompatible.
google-cloud-firestore 2.16.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 5.28.2 which is incompatible.
google-colab 1.0.0 requires google-auth==2.27.0, but you have google-auth 2.35.0 which is incompatible.
tensorboard 2.17.0 requires protobuf!=4.24.0,<5.0.0,>=3.19.6, but you have protobuf 5.28.2 which is incompatible.
tensorflow 2.17.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 5.28.2 which is incompatible.
tensorflow-metadata 1.15.0 requires protobuf<4.21,>=3.20.3; python_version < "3.11", but you have protobuf 5.28.2 which is incompatible.
Successfully installed cachetools-5.5.0 certifi-2024.8.30 charset-normalizer-3.3.2 google-api-core-2.20.0 google-auth-2.35.0 google-cloud-core-2.4.1 google-cloud-storage-2.18.2 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.65.0 idna-3.10 proto-plus-1.24.0 protobuf-5.28.2 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 urllib3-2.2.3
Requirement already satisfied: apache-beam[gcp] in /usr/local/lib/python3.10/dist-packages (2.59.0)
Requirement already satisfied: crcmod<2.0,>=1.7 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (1.7)
Requirement already satisfied: orjson<4,>=3.9.7 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (3.10.7)
Requirement already satisfied: dill<0.3.2,>=0.3.1.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.3.1.1)
Requirement already satisfied: cloudpickle~=2.2.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.2.1)
Requirement already satisfied: fastavro<2,>=0.23.6 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (1.9.7)
Requirement already satisfied: fasteners<1.0,>=0.3 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.19)
Requirement already satisfied: grpcio!=1.48.0,!=1.59.*,!=1.60.*,!=1.61.*,!=1.62.0,!=1.62.1,<2,>=1.33.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (1.64.1)
Requirement already satisfied: hdfs<3.0.0,>=2.1.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.7.3)
Requirement already satisfied: httplib2<0.23.0,>=0.8 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.22.0)
Requirement already satisfied: jsonschema<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (4.23.0)
Requirement already satisfied: jsonpickle<4.0.0,>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (3.3.0)
Requirement already satisfied: numpy<1.27.0,>=1.14.3 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (1.26.4)
Requirement already satisfied: objsize<0.8.0,>=0.6.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.7.0)
Requirement already satisfied: packaging>=22.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (24.1)
Requirement already satisfied: pymongo<5.0.0,>=3.8.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (4.10.1)
Requirement already satisfied: proto-plus<2,>=1.7.1 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (1.24.0)
Collecting protobuf!=4.0.*,!=4.21.*,!=4.22.0,!=4.23.*,!=4.24.*,<4.26.0,>=3.20.3 (from apache-beam[gcp])
Downloading protobuf-4.25.5-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
Requirement already satisfied: pydot<2,>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (1.4.2)
Requirement already satisfied: python-dateutil<3,>=2.8.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.8.2)
Requirement already satisfied: pytz>=2018.3 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2024.2)
Requirement already satisfied: redis<6,>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (5.1.0)
Requirement already satisfied: regex>=2020.6.8 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2024.9.11)
Requirement already satisfied: requests<3.0.0,>=2.24.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.32.3)
Requirement already satisfied: typing-extensions>=3.7.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (4.12.2)
Requirement already satisfied: zstandard<1,>=0.18.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.23.0)
Requirement already satisfied: pyarrow<17.0.0,>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (16.1.0)
Requirement already satisfied: pyarrow-hotfix<1 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.6)
Requirement already satisfied: js2py<1,>=0.74 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.74)
Requirement already satisfied: cachetools<6,>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (5.5.0)
Requirement already satisfied: google-api-core<3,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.20.0)
Collecting google-apitools<0.5.32,>=0.5.31 (from apache-beam[gcp])
Downloading google-apitools-0.5.31.tar.gz (173 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: google-auth<3,>=1.18.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.35.0)
Requirement already satisfied: google-auth-httplib2<0.3.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (0.2.0)
Requirement already satisfied: google-cloud-datastore<3,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.19.0)
Requirement already satisfied: google-cloud-pubsub<3,>=2.1.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.25.1)
Collecting google-cloud-pubsublite<2,>=1.2.0 (from apache-beam[gcp])
Downloading google_cloud_pubsublite-1.11.1-py2.py3-none-any.whl.metadata (5.6 kB)
Requirement already satisfied: google-cloud-storage<3,>=2.18.2 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.18.2)
Requirement already satisfied: google-cloud-bigquery<4,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (3.25.0)
Requirement already satisfied: google-cloud-bigquery-storage<3,>=2.6.3 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.26.0)
Requirement already satisfied: google-cloud-core<3,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.4.1)
Requirement already satisfied: google-cloud-bigtable<3,>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.26.0)
Collecting google-cloud-spanner<4,>=3.0.0 (from apache-beam[gcp])
Downloading google_cloud_spanner-3.49.1-py2.py3-none-any.whl.metadata (10 kB)
Collecting google-cloud-dlp<4,>=3.0.0 (from apache-beam[gcp])
Downloading google_cloud_dlp-3.23.0-py2.py3-none-any.whl.metadata (5.3 kB)
Requirement already satisfied: google-cloud-language<3,>=2.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (2.13.4)
Collecting google-cloud-videointelligence<3,>=2.0 (from apache-beam[gcp])
Downloading google_cloud_videointelligence-2.13.5-py2.py3-none-any.whl.metadata (5.7 kB)
Collecting google-cloud-vision<4,>=2 (from apache-beam[gcp])
Downloading google_cloud_vision-3.7.4-py2.py3-none-any.whl.metadata (5.2 kB)
Collecting google-cloud-recommendations-ai<0.11.0,>=0.1.0 (from apache-beam[gcp])
Downloading google_cloud_recommendations_ai-0.10.12-py2.py3-none-any.whl.metadata (5.3 kB)
Requirement already satisfied: google-cloud-aiplatform<2.0,>=1.26.0 in /usr/local/lib/python3.10/dist-packages (from apache-beam[gcp]) (1.68.0)
Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /usr/local/lib/python3.10/dist-packages (from google-api-core<3,>=2.0.0->apache-beam[gcp]) (1.65.0)
Requirement already satisfied: oauth2client>=1.4.12 in /usr/local/lib/python3.10/dist-packages (from google-apitools<0.5.32,>=0.5.31->apache-beam[gcp]) (4.1.3)
Requirement already satisfied: six>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from google-apitools<0.5.32,>=0.5.31->apache-beam[gcp]) (1.16.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.18.0->apache-beam[gcp]) (0.4.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.18.0->apache-beam[gcp]) (4.9)
Requirement already satisfied: google-cloud-resource-manager<3.0.0dev,>=1.3.3 in /usr/local/lib/python3.10/dist-packages (from google-cloud-aiplatform<2.0,>=1.26.0->apache-beam[gcp]) (1.12.5)
Requirement already satisfied: shapely<3.0.0dev in /usr/local/lib/python3.10/dist-packages (from google-cloud-aiplatform<2.0,>=1.26.0->apache-beam[gcp]) (2.0.6)
Requirement already satisfied: pydantic<3 in /usr/local/lib/python3.10/dist-packages (from google-cloud-aiplatform<2.0,>=1.26.0->apache-beam[gcp]) (2.9.2)
Requirement already satisfied: docstring-parser<1 in /usr/local/lib/python3.10/dist-packages (from google-cloud-aiplatform<2.0,>=1.26.0->apache-beam[gcp]) (0.16)
Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from google-cloud-bigquery<4,>=2.0.0->apache-beam[gcp]) (2.7.2)
Requirement already satisfied: grpc-google-iam-v1<1.0.0dev,>=0.12.4 in /usr/local/lib/python3.10/dist-packages (from google-cloud-bigtable<3,>=2.19.0->apache-beam[gcp]) (0.13.1)
Requirement already satisfied: grpcio-status>=1.33.2 in /usr/local/lib/python3.10/dist-packages (from google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (1.48.2)
Requirement already satisfied: opentelemetry-api>=1.27.0 in /usr/local/lib/python3.10/dist-packages (from google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (1.27.0)
Requirement already satisfied: opentelemetry-sdk>=1.27.0 in /usr/local/lib/python3.10/dist-packages (from google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (1.27.0)
Collecting overrides<8.0.0,>=6.0.1 (from google-cloud-pubsublite<2,>=1.2.0->apache-beam[gcp])
Downloading overrides-7.7.0-py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: sqlparse>=0.4.4 in /usr/local/lib/python3.10/dist-packages (from google-cloud-spanner<4,>=3.0.0->apache-beam[gcp]) (0.5.1)
Collecting grpc-interceptor>=0.15.4 (from google-cloud-spanner<4,>=3.0.0->apache-beam[gcp])
Downloading grpc_interceptor-0.15.4-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /usr/local/lib/python3.10/dist-packages (from google-cloud-storage<3,>=2.18.2->apache-beam[gcp]) (1.6.0)
Requirement already satisfied: docopt in /usr/local/lib/python3.10/dist-packages (from hdfs<3.0.0,>=2.1.0->apache-beam[gcp]) (0.6.2)
Requirement already satisfied: pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2 in /usr/local/lib/python3.10/dist-packages (from httplib2<0.23.0,>=0.8->apache-beam[gcp]) (3.1.4)
Requirement already satisfied: tzlocal>=1.2 in /usr/local/lib/python3.10/dist-packages (from js2py<1,>=0.74->apache-beam[gcp]) (5.2)
Requirement already satisfied: pyjsparser>=2.5.1 in /usr/local/lib/python3.10/dist-packages (from js2py<1,>=0.74->apache-beam[gcp]) (2.7.1)
Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam[gcp]) (24.2.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam[gcp]) (2023.12.1)
Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam[gcp]) (0.35.1)
Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.0.0->apache-beam[gcp]) (0.20.0)
Requirement already satisfied: dnspython<3.0.0,>=1.16.0 in /usr/local/lib/python3.10/dist-packages (from pymongo<5.0.0,>=3.8.0->apache-beam[gcp]) (2.6.1)
Requirement already satisfied: async-timeout>=4.0.3 in /usr/local/lib/python3.10/dist-packages (from redis<6,>=5.0.0->apache-beam[gcp]) (4.0.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]) (2024.8.30)
Requirement already satisfied: pyasn1>=0.1.7 in /usr/local/lib/python3.10/dist-packages (from oauth2client>=1.4.12->google-apitools<0.5.32,>=0.5.31->apache-beam[gcp]) (0.6.1)
Requirement already satisfied: deprecated>=1.2.6 in /usr/local/lib/python3.10/dist-packages (from opentelemetry-api>=1.27.0->google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (1.2.14)
Requirement already satisfied: importlib-metadata<=8.4.0,>=6.0 in /usr/local/lib/python3.10/dist-packages (from opentelemetry-api>=1.27.0->google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (8.4.0)
Requirement already satisfied: opentelemetry-semantic-conventions==0.48b0 in /usr/local/lib/python3.10/dist-packages (from opentelemetry-sdk>=1.27.0->google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (0.48b0)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3->google-cloud-aiplatform<2.0,>=1.26.0->apache-beam[gcp]) (0.7.0)
Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3->google-cloud-aiplatform<2.0,>=1.26.0->apache-beam[gcp]) (2.23.4)
Requirement already satisfied: wrapt<2,>=1.10 in /usr/local/lib/python3.10/dist-packages (from deprecated>=1.2.6->opentelemetry-api>=1.27.0->google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (1.16.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata<=8.4.0,>=6.0->opentelemetry-api>=1.27.0->google-cloud-pubsub<3,>=2.1.0->apache-beam[gcp]) (3.20.2)
Downloading google_cloud_dlp-3.23.0-py2.py3-none-any.whl (193 kB)
Downloading google_cloud_pubsublite-1.11.1-py2.py3-none-any.whl (304 kB)
Downloading google_cloud_recommendations_ai-0.10.12-py2.py3-none-any.whl (184 kB)
Downloading google_cloud_spanner-3.49.1-py2.py3-none-any.whl (402 kB)
Downloading google_cloud_videointelligence-2.13.5-py2.py3-none-any.whl (244 kB)
Downloading google_cloud_vision-3.7.4-py2.py3-none-any.whl (467 kB)
Downloading protobuf-4.25.5-cp37-abi3-manylinux2014_x86_64.whl (294 kB)
Downloading grpc_interceptor-0.15.4-py3-none-any.whl (20 kB)
Downloading overrides-7.7.0-py3-none-any.whl (17 kB)
Building wheels for collected packages: google-apitools
Building wheel for google-apitools (setup.py) ... done
Created wheel for google-apitools: filename=google_apitools-0.5.31-py3-none-any.whl size=131014 sha256=3b23230511e396ebd4aa2974cdf1dacf21f587956bf6a4c67d26d07388adb463
Stored in directory: /root/.cache/pip/wheels/04/b7/e0/9712f8c23a5da3d9d16fb88216b897bf60e85b12f5470f26ee
Successfully built google-apitools
Installing collected packages: protobuf, overrides, grpc-interceptor, google-apitools, google-cloud-vision, google-cloud-videointelligence, google-cloud-spanner, google-cloud-recommendations-ai, google-cloud-dlp, google-cloud-pubsublite
Attempting uninstall: protobuf
Found existing installation: protobuf 5.28.2
Uninstalling protobuf-5.28.2:
Successfully uninstalled protobuf-5.28.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-metadata 1.15.0 requires protobuf<4.21,>=3.20.3; python_version < "3.11", but you have protobuf 4.25.5 which is incompatible.
Successfully installed google-apitools-0.5.31 google-cloud-dlp-3.23.0 google-cloud-pubsublite-1.11.1 google-cloud-recommendations-ai-0.10.12 google-cloud-spanner-3.49.1 google-cloud-videointelligence-2.13.5 google-cloud-vision-3.7.4 grpc-interceptor-0.15.4 overrides-7.7.0 protobuf-4.25.5
%% Cell type:markdown id: tags:
## Importing packages
%% Cell type:code id: tags:
``` python
# authenticating user
from google.colab import auth
auth.authenticate_user()
```
%% Cell type:code id: tags:
``` python
from apache_beam.options.pipeline_options import PipelineOptions
import apache_beam as beam
from apache_beam.io import ReadFromText, WriteToBigQuery
from google.cloud import bigquery
```
%% Cell type:markdown id: tags:
## BigQuery table_schema initialization
%% Cell type:code id: tags:
``` python
table_schema = {
    'fields': [
        {
            "name": "student_id",
            "mode": "REQUIRED",
            "type": "STRING",
            "description": "",
            "fields": []
        },
        {
            "name": "study_hours_per_week",
            "mode": "NULLABLE",
            "type": "FLOAT",
            "description": "",
            "fields": []
        },
        {
            "name": "attendance_rate",
            "mode": "NULLABLE",
            "type": "FLOAT",
            "description": "",
            "fields": []
        },
        {
            "name": "previous_grades",
            "mode": "NULLABLE",
            "type": "FLOAT",
            "description": "",
            "fields": []
        },
        {
            "name": "participation_in_extracurricular_activities",
            "mode": "NULLABLE",
            "type": "BOOLEAN",
            "description": "",
            "fields": []
        },
        {
            "name": "parent_education_level",
            "mode": "NULLABLE",
            "type": "STRING",
            "description": "",
            "fields": []
        },
        {
            "name": "passed",
            "mode": "NULLABLE",
            "type": "BOOLEAN",
            "description": "",
            "fields": []
        }
    ]
}
```
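%% Cell type:markdown id: tags:
Every entry in `table_schema` repeats the same keys, so the same structure can be generated from a compact list of columns. The following is only a sketch: `COLUMNS` and `build_schema` are hypothetical names that do not appear in the original notebook, and it assumes every column keeps an empty description and no nested fields, as in the cell above.
%% Cell type:code id: tags:
```python
# Hypothetical helper: column names, types and modes are copied from table_schema above.
COLUMNS = [
    ("student_id", "STRING", "REQUIRED"),
    ("study_hours_per_week", "FLOAT", "NULLABLE"),
    ("attendance_rate", "FLOAT", "NULLABLE"),
    ("previous_grades", "FLOAT", "NULLABLE"),
    ("participation_in_extracurricular_activities", "BOOLEAN", "NULLABLE"),
    ("parent_education_level", "STRING", "NULLABLE"),
    ("passed", "BOOLEAN", "NULLABLE"),
]


def build_schema(columns):
    """Expand (name, type, mode) tuples into the dict layout used by table_schema."""
    return {
        "fields": [
            {
                "name": name,
                "mode": mode,
                "type": field_type,
                "description": "",
                "fields": [],
            }
            for name, field_type, mode in columns
        ]
    }


generated_schema = build_schema(COLUMNS)
```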
%% Cell type:markdown id: tags:
## Pipeline Functions
%% Cell type:code id: tags:
``` python
def parse_csv_to_dict(line):
    """Map one CSV line onto the column names declared in table_schema."""
    element_list = line.split(',')
    new_row = {}
    for i, field_data in enumerate(table_schema['fields']):
        new_row[field_data["name"]] = element_list[i]
    # ParDo expects an iterable of output elements, so wrap the row in a list.
    return [new_row]


def run_pipeline(beam_options, input_file, project_id, dataset_id, table_name):
    with beam.Pipeline(options=beam_options) as pipeline:
        table_spec = f'{project_id}:{dataset_id}.{table_name}'

        # Read CSV data from GCS, skipping the header row
        data = (
            pipeline
            | 'ReadFromText' >> ReadFromText(input_file, skip_header_lines=1)
            | 'Parse CSV' >> beam.ParDo(parse_csv_to_dict)
        )
        # Drop rows in which any value is None or the literal string 'nan'
        data = data | 'Remove nan' >> beam.Filter(
            lambda row: all(value is not None and value != 'nan' for value in row.values()))
        # data | "Print" >> beam.Map(print)

        # Write the data to BigQuery
        data | 'WriteToBigQuery' >> WriteToBigQuery(
            table=table_spec,
            schema=table_schema,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
        )
```
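%% Cell type:markdown id: tags:
`parse_csv_to_dict` splits on every comma and leaves all values as strings, although the schema declares `FLOAT` and `BOOLEAN` columns. The sketch below shows one possible alternative: it uses the standard `csv` module, so quoted commas survive, and converts each value according to its column type. All names here (`SCHEMA_TYPES`, `convert`, `parse_csv_line_typed`) are hypothetical, and the accepted true/false spellings are an assumption, since the actual contents of the CSV file are not shown in this notebook. Missing or `nan` values become `None`, which the existing `Remove nan` filter already rejects, so the function could in principle be passed to `beam.ParDo` in place of `parse_csv_to_dict`.
%% Cell type:code id: tags:
```python
import csv
import math

# Column name -> BigQuery type, copied from table_schema above.
SCHEMA_TYPES = [
    ("student_id", "STRING"),
    ("study_hours_per_week", "FLOAT"),
    ("attendance_rate", "FLOAT"),
    ("previous_grades", "FLOAT"),
    ("participation_in_extracurricular_activities", "BOOLEAN"),
    ("parent_education_level", "STRING"),
    ("passed", "BOOLEAN"),
]

# Assumption: these spellings encode booleans in the CSV; adjust to the real data.
TRUE_VALUES = {"yes", "true", "1"}
FALSE_VALUES = {"no", "false", "0"}


def convert(value, field_type):
    """Convert one CSV string to a Python value; None marks missing or invalid data."""
    text = value.strip()
    if text == "" or text.lower() == "nan":
        return None
    if field_type == "FLOAT":
        try:
            number = float(text)
        except ValueError:
            return None
        return None if math.isnan(number) else number
    if field_type == "BOOLEAN":
        lowered = text.lower()
        if lowered in TRUE_VALUES:
            return True
        if lowered in FALSE_VALUES:
            return False
        return None
    return text


def parse_csv_line_typed(line):
    """Parse one CSV line into a list holding a single typed row dict."""
    values = next(csv.reader([line]))
    if len(values) != len(SCHEMA_TYPES):
        return []  # skip malformed lines instead of raising IndexError
    return [
        {name: convert(value, field_type)
         for (name, field_type), value in zip(SCHEMA_TYPES, values)}
    ]
```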
%% Cell type:code id: tags:
``` python
beam_options = PipelineOptions(
    runner='DirectRunner',
    temp_location='gs://niveustraining-bucketname/Durvesh',
)

input_file = 'gs://niveustraining-bucketname/Durvesh/student_performance_prediction.csv'
project_id = 'niveustraining'
dataset_id = 'python_DE_assignment'
table_name = 'student_performance'

run_pipeline(beam_options, input_file, project_id, dataset_id, table_name)
```
%% Output
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/root/.local/share/jupyter/runtime/kernel-cf9f0cd3-b77c-49a0-83d9-a43dbc665ab2.json']