If you are already familiar with FaaS (Function-as-a-Service), then Cloud Functions needs no introduction.

Google Cloud Functions is a serverless execution environment for building and connecting cloud services. With Cloud Functions, you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your Cloud Function is triggered when a watched event fires. Your code executes in a fully managed environment; there is no need to provision any infrastructure or worry about managing servers.

Cloud Functions can be written using the Node.js, Python, Go, Java, .NET, and Ruby runtimes.
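As a minimal sketch, here is what a 1st-gen background function triggered by Cloud Storage can look like in Python; the function and bucket names are placeholders, not from the original article.

    # main.py - runs whenever an object is finalized in the watched bucket.
    def handle_gcs_event(event, context):
        """Background Cloud Function triggered by a Cloud Storage event.

        Args:
            event (dict): Event payload with the bucket and object name.
            context: Event metadata (event ID, timestamp, type).
        """
        print(f"Event ID: {context.event_id}")
        print(f"File {event['name']} uploaded to bucket {event['bucket']}.")

It could be deployed, for example, with:

    gcloud functions deploy handle_gcs_event \
        --runtime python39 \
        --trigger-resource my-bucket \
        --trigger-event google.storage.object.finalize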

Cloud…



Cloud Dataprep is a managed service from Google Cloud, developed in partnership with Trifacta. Cloud Dataprep helps analysts reduce the time required to prepare data for analysis. Within the console, you have tools that help explore, cleanse, and transform data without configuring or provisioning servers.

Dataprep comes in three editions: Cloud Dataprep Premium, Standard, and Legacy. Dataprep jobs run on Dataflow. When the job execution completes, Dataflow creates a template file in Cloud Storage (the temp directory).

Let us look at how we can use Dataprep to load data from GCS to BigQuery with a few transformations. …


Cloud Data Fusion is a fully managed, code-free, web-based data integration service that helps build data pipelines quickly and efficiently. A Data Fusion instance is composed of several Google Cloud services, such as Cloud Storage, Persistent Disk, and Cloud Key Management Service. These services are used to develop and manage pipelines built on Data Fusion.

When you run a data pipeline, Data Fusion automatically provisions a Dataproc cluster, runs the pipeline, and tears the cluster down once the job completes. Users may also choose to run pipelines on existing Dataproc clusters.
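Under the hood, every Data Fusion instance exposes the CDAP REST API, so a deployed batch pipeline can also be started programmatically. A minimal sketch, assuming a hypothetical instance endpoint and a pipeline named my_pipeline (look up your real endpoint with gcloud beta data-fusion instances describe):

    import google.auth
    import google.auth.transport.requests
    import requests

    # Obtain an access token from Application Default Credentials.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())

    # Hypothetical endpoint; substitute the apiEndpoint of your own instance.
    api_endpoint = "https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api"

    # Start the deployed batch pipeline "my_pipeline" in the default namespace.
    resp = requests.post(
        f"{api_endpoint}/v3/namespaces/default/apps/my_pipeline"
        "/workflows/DataPipelineWorkflow/start",
        headers={"Authorization": f"Bearer {credentials.token}"},
    )
    resp.raise_for_status()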

Before we start extracting data from Oracle…


Whether you want to migrate your on-prem data to the cloud or between cloud providers, teams within an organization should come together, choose the service that fits best, estimate costs, and plan and execute the migration.

Storage Transfer Service, or simply Transfer Service, is a GCP offering that helps you move data from on-premises storage, from a different cloud, or even from one GCS bucket to another. You can seamlessly create and process transfer operations into Cloud Storage in a fully managed, serverless way.

No code — No infrastructure

Users may choose either to perform a one-time transfer or to schedule recurring transfers.
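As a minimal sketch of the one-time case, here is a bucket-to-bucket copy using the google-cloud-storage-transfer client library; the project and bucket names are placeholders:

    from google.cloud import storage_transfer

    client = storage_transfer.StorageTransferServiceClient()

    # Describe the job: copy everything from the source bucket to the sink.
    job = client.create_transfer_job(
        {
            "transfer_job": {
                "project_id": "my-project",
                "description": "One-time bucket-to-bucket copy",
                "status": storage_transfer.TransferJob.Status.ENABLED,
                "transfer_spec": {
                    "gcs_data_source": {"bucket_name": "my-source-bucket"},
                    "gcs_data_sink": {"bucket_name": "my-destination-bucket"},
                },
            }
        }
    )

    # With no schedule attached, kick the job off once, immediately.
    client.run_transfer_job({"job_name": job.name, "project_id": "my-project"})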

One…


BigQuery Data Transfer Service is a GCP offering that automates the movement of data from different external sources into BigQuery on a scheduled, periodic basis.


Currently, DTS supports several Google SaaS apps (such as Campaign Manager and Google Ads), Amazon S3 cloud storage, and a couple of data warehouses, such as Teradata and Amazon Redshift. You can also automate data loads from Salesforce CRM and Adobe Analytics using third-party transfers for DTS, available in the Google Cloud Marketplace.
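As a minimal sketch, here is how a recurring Amazon S3 transfer could be set up with the google-cloud-bigquery-datatransfer client; every identifier (project, dataset, bucket, keys) is a placeholder:

    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    parent = client.common_project_path("my-project")

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="my_dataset",
        display_name="Daily S3 load",
        data_source_id="amazon_s3",
        schedule="every 24 hours",
        params={
            "destination_table_name_template": "my_table",
            "data_path": "s3://my-s3-bucket/exports/*",
            "access_key_id": "YOUR_ACCESS_KEY_ID",
            "secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "file_format": "CSV",
        },
    )

    config = client.create_transfer_config(
        parent=parent, transfer_config=transfer_config
    )
    print(f"Created transfer config: {config.name}")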


BigQuery allows users to query data in external data sources without loading it at all. Currently, BQ supports Google Cloud storage systems such as Cloud Storage, Bigtable, Google Drive, and Cloud SQL. To use federated queries, you should colocate your BQ dataset with the external data source. If you are using Cloud Bigtable, you might want to check the supported locations before making a decision.

Whether you are running a federated query against Cloud Storage, Google Drive, or Cloud Bigtable, the process is much the same, with minor to no changes.
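Outside the console, the same federated query can be issued from the Python client by attaching a temporary external table definition to the query job; the bucket path and the sales_external alias below are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Describe the external source; schema autodetection keeps the sketch short.
    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = ["gs://my-bucket/sales/*.csv"]
    external_config.autodetect = True

    # Register the definition under an alias the query can reference.
    job_config = bigquery.QueryJobConfig(
        table_definitions={"sales_external": external_config}
    )

    sql = "SELECT COUNT(*) AS row_count FROM sales_external"
    for row in client.query(sql, job_config=job_config).result():
        print(row.row_count)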

Using Cloud Console


Google BigQuery

Speed and scalability are the two biggest advantages of Google BigQuery, complementing its serverless architecture. BigQuery is a comparatively cost-effective data warehouse that allows querying data at petabyte scale. Being serverless lets customers concentrate on insights rather than managing infrastructure. BigQuery does not fit just one stage of the data lifecycle: you can use it to ingest, process, store, and even perform core analytics on data.

If you have an external data source, you can either load the data into BigQuery or query the external source without loading it at all (a federated query). The key…
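As a minimal sketch of the first route, here is a batch load of CSV files from Cloud Storage into a native table with the Python client; all names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # infer the schema from the data
    )

    load_job = client.load_table_from_uri(
        "gs://my-bucket/sales/*.csv",
        "my-project.my_dataset.sales",
        job_config=job_config,
    )
    load_job.result()  # block until the load completes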


This article sets the context for ingestion and the services available to ingest data within GCP, and serves as a preface to the three-part series.

Data engineers often create pipelines that access data from different sources within an organization to serve the needs of business stakeholders. This data arrives in varied formats, each with a different schema. Whether the end product is a BI dashboard or an ML model, data pipelines help streamline the process of building it.

Every organization has its own way of working with pipelines. …


Working with BigQuery datasets within a Jupyter notebook is no different from working with any other database using Jupyter. In three steps, we can query data stored in BQ from a Jupyter notebook.

Step#1

Before we read or query data, we need to make sure we meet a few prerequisites.

  1. You should have an active Google Cloud account (the BQ sandbox works too). You can use an existing project or create a new one.
  2. You should have Jupyter Notebook installed on your computer.
  3. You should enable the BigQuery API, either from the Cloud Console or from Cloud Shell (see the command below).
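If you take the Cloud Shell route, a single command enables the API for the active project:

    gcloud services enable bigquery.googleapis.com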

To enable the BigQuery API through the Cloud Console…

SP Kumar Rachumallu

Lead Programmer at Novartis Healthcare Pvt Ltd.
