Ingest data from GCS to BQ using Cloud Functions.

SP Kumar Rachumallu · Published in Nerd For Tech · Apr 2, 2021

If you are already familiar with FaaS (Function-as-a-Service), then Cloud Functions needs no introduction.

Google Cloud Functions is a serverless execution environment for building and connecting cloud services. With Cloud Functions, you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your Cloud Function is triggered when an event being watched is fired. Your code executes in a fully managed environment; there is no need to provision any infrastructure or worry about managing servers.

Cloud Functions can be written using the Node.js, Python, Go, Java, .NET, and Ruby runtimes.

Cloud Functions are either HTTP functions or event-driven functions. Depending on the runtime you use, event-driven functions can be either background functions or CloudEvent functions.
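In the Python runtime, a background function is simply a function that accepts two arguments: the event payload and a context object carrying event metadata. A minimal skeleton (the function name hello_gcs is just a placeholder) looks like this:

def hello_gcs(event, context):
    """Background Cloud Function triggered by a Cloud Storage event.

    Args:
        event (dict): the event payload, e.g. the bucket and object name.
        context (google.cloud.functions.Context): event metadata such as
            the event ID and timestamp.
    """
    print(f"Processing file: {event['name']}")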

In this article, we will write, deploy, and trigger an event-driven function, specifically a background function (using the Python runtime), that automatically loads data from Cloud Storage to BigQuery whenever a new file is available in GCS.

  • Before we begin, ensure the Cloud Functions and Cloud Storage APIs are enabled from the APIs & Services console (or from the command line, as shown below).
  • Once the required APIs are enabled, click Cloud Functions in the navigation menu to open the home page. From there, click Create Function to create a new function.
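If you prefer the command line, the same APIs can be enabled with gcloud. A quick sketch (the BigQuery API is typically enabled by default on new projects, but is included here for completeness):

gcloud services enable cloudfunctions.googleapis.com storage.googleapis.com bigquery.googleapis.com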

Creating a new function involves two steps: Configuration & Code.

On clicking Create Function, you will first see a configuration screen, where you choose basic details such as the function name, region, and trigger type. Depending on the trigger type, you will also need to choose further options. For example, if your trigger is Cloud Pub/Sub, you need to select a Pub/Sub topic from the drop-down menu.

In this example, I choose Cloud Storage as my trigger and select the bucket from the drop-down menu. I also need to choose the event type, which in our example is Finalize/Create.
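For reference, the same trigger configuration can be expressed with the gcloud CLI. A sketch, assuming a hypothetical bucket my-input-bucket and an entry-point function named gcs_to_bq (defined later in this article):

gcloud functions deploy gcs_to_bq \
    --runtime python37 \
    --trigger-resource my-input-bucket \
    --trigger-event google.storage.object.finalize \
    --entry-point gcs_to_bq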

You can optionally check the Retry on failure box; a warning appears explaining what the checkbox means for the trigger.

You can also adjust the Runtime, Build and Configuration settings, which are left at their defaults for the purposes of this example.

Once everything is selected as required on the configuration screen, click Next to navigate to the Code screen, where you choose the runtime (in our case, Python 3.7) and the entry point. The entry point is the name of the function within your code that you want the Cloud Function to execute.

At this stage, you should be familiar with the code structuring requirements for the Python runtime. For example:

├── main.py

└── requirements.txt

main.py contains one or more functions that we intend to execute.

The requirements.txt file specifies the dependencies these functions need in order to execute, including the packages to be installed.
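To make this concrete, here is a minimal sketch of what main.py might look like for our use case. It is not the credited gist verbatim; the project, dataset, and table names are placeholders, and the schema is autodetected for simplicity:

from google.cloud import bigquery

def gcs_to_bq(event, context):
    """Background function: load a newly finalized CSV from GCS into BigQuery."""
    bucket = event["bucket"]
    name = event["name"]

    # Ignore non-CSV objects (e.g. other files written to the same bucket).
    if not name.endswith(".csv"):
        return

    # Placeholder destination; replace with your own project.dataset.table.
    table_id = "my-project.my_dataset.my_table"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the CSV has a header row
        autodetect=True,      # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    client = bigquery.Client()
    load_job = client.load_table_from_uri(
        f"gs://{bucket}/{name}", table_id, job_config=job_config
    )
    load_job.result()  # wait for the load job to finish
    print(f"Loaded gs://{bucket}/{name} into {table_id}")

The matching requirements.txt would contain a single dependency:

google-cloud-bigquery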

Once the required runtime is selected, you can create the function using the inline code editor, by uploading code from GCS, and so on. In this example, I use the inline code editor. Once the code and requirements are defined as needed, you can deploy the function.

A green tick beside the function name indicates that the deployment was successful.

Your Cloud Function to load data from GCS to BQ whenever a new CSV file is uploaded is now ready.
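You can test it by copying a CSV file into the watched bucket (the file and bucket names here are placeholders):

gsutil cp sales.csv gs://my-input-bucket/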

Once the function has executed, you can monitor its performance, errors, metrics, and logs from the function's detail screen by selecting the right version.
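Logs are also available from the command line; for example, to read the most recent entries for the hypothetical function above:

gcloud functions logs read gcs_to_bq --limit 50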

You can also chain multiple Cloud Functions together to carry out a full extract-transform-load pipeline.

Code Credits:

https://gist.githubusercontent.com/mkahn5/5d2d569209f39f72d089a68d767de57b/raw/19f4440bf2e8ee2f22b3ca757591b007d7975672/main.py
