Whether you are developing BI dashboards or analyzing data to make critical business decisions, you need to access data from various data marts, data warehouses or transactional databases in order to perform aggregation or derive statistics as necessary for the report.

If you are working with BigQuery as data warehouse and CloudSQL as your RDBMS, you often need to identify a way to access the data for your analytical needs.

One way of doing it is by writing Federated Queries from BigQuery where CloudSQL is treated as an External Source of data and you enclose your query within a function…

Cloud SQL fall under the category of Database as a service (DBAAS). In other words, Cloud SQL is a virtual machine hosted on Google Compute Engine, which runs a MySQL image through which you get to access MySQL server without having to SSH.

Whether you are using MySQL on-prem or on cloud, Database Migration Service (DMS) provides connectivity options such as IP allowlists and VPC peering to migrate your data. Choosing a connectivity method depends on the source DB and where it resides.

Database migration has to happen in phases or waves. Outlining the order of various tasks and allocating…

If you are already familiar with FaaS (Function-as-a-Service) then Cloud Functions need no introduction.

Google Cloud Functions is a server less execution environment for building and connecting cloud services. With Cloud Functions, you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your Cloud Function is triggered when an event being watched is fired. Your code executes in a fully managed environment. There is no need to provision any infrastructure or worry about managing any servers.

Cloud Functions can be written using Node.js, Python, Go, Java, .NET, and Ruby programming language runtimes.


Source: Trifacta webpage

Cloud Dataprep is a managed service from Google Cloud and is a partner service developed by Trifacta. Cloud Dataprep help analysts reduce time required to prepare data for analysis. Within the console, you have tools that help explore, cleanse, and transform data without configuring or provisioning servers.

Dataprep comes in three editions. Cloud Dataprep Premium, Standard and Legacy. Dataprep jobs run on Dataflow. When the job execution completes, Dataflow creates a template file in Cloud Storage (Temp Directory).

Let us look at how we can use Dataprep to load data from GCS to BigQuery with a few transformations. …

Cloud Data Fusion is a fully managed, code free web based data integration service that helps build data pipelines quickly and efficiently. An instance in Data Fusion is composed of several Google cloud services such as Cloud Storage, Persistent disk, Cloud Key Management Service etc. These services are used to develop and manage pipelines build on Data Fusion.

When you run a data pipeline, data fusion automatically provisions Dataproc clusters, run pipelines, and tears down the clusters once the job is completed. User may choose to run pipelines on existing Dataproc clusters too.

Before we start extracting data from Oracle…

Whether you want to migrate your on-prem data to cloud or between cloud providers, teams within an Organization should come together, choose a service that best fits, estimate costs, plan and execute.

Storage Transfer Service or simply Transfer Service is one of GCP’s offering that help you move data either from On-Prem, or from a different cloud or even from one GCS bucket to other. You can seamlessly create and process transfer operations to GCP Storage in a fully managed and serverless way.

No code — No infrastructure

User may choose either to perform one-time transfer or schedule transfers.


BigQuery Data Transfer Service is an offering by GCP that automates the movement of data from different external sources into BigQuery under given conditions on periodic basis.

Image picked from Google Cloud Documentation.

Currently DTS supports few Google SaaS apps such as Campaign Manager, Google Ads etc., Amazon S3 (Cloud storage), and couple of data warehouses such as Teradata and Amazon Redshift. You can also automate data load from Salesforce CRM and Adobe Analytics using third party transfers for DTS available under Google Cloud Marketplace.

BigQuery has the capability that allow users to query data from external data sources without the need of loading data at all. Currently BQ supports GC storage systems such as Cloud Storage, Bigtable, Google Drive and Cloud SQL. To use the federated queries, you should colocate your BQ dataset and the external data source. If you are using Cloud Bigtable, you might want to check the supported locations before making a decision.

If you are running a federated query on Cloud Storage or Google Drive or Cloud Bigtable, the process is pretty much the same with minor to no changes.

Using Cloud Console

Google BigQuery

Speed and Scalability are two biggest advantages of Google BigQuery, complimenting its serverless architecture. BigQuery is comparatively a cost effective data warehouse that allows querying data that can scale up to petabytes. Being serverless allows customers to concentrate on insights than to manage infrastructure. BigQuery does not just fit to one paradigm of data lifecycle. You can use BigQuery to ingest, process, store and even perform core analytics on data.

If you have an external data source, you can either load data into BigQuery or you can query the external source without loading it at all (Federated Query). The key…

SP Kumar Rachumallu

Lead Programmer at Novartis Healthcare Pvt Ltd.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store