“Switching to a cloud solution allows even small companies to build a scalable data warehouse.”
Author: Martin Schneider
– Former Big Data Engineer at Customlytics
In our last blog post, we spoke about why it’s a good idea to think about investing in a data warehouse. If you haven’t read it yet, we recommend taking the time to read it here before continuing.
We think a data warehouse is a required asset to stay competitive and cope with the increasing amount of information. Furthermore, the latest developments in the cloud market allow even startups to build their own analytics platform. Previously, this was something only big companies with the financial and personnel resources could afford. But not anymore.
In this blog post, we want to challenge you with questions whose answers have an impact on the architecture of your data warehouse.
1. When to start?
We argue that data is needed from the get-go, whether to convince investors, track your company KPIs, or simply check server and app performance.
2. How to start?
There are two answers to this question, depending on your company. If you are right at the start of your journey, think about which KPIs and data you need and how to collect that data. You also need a scalable and modifiable way to collect and store the data, because demands and constraints will change over time.
The other answer applies when you are in an established company. Take a look around and check which data is already available and which KPIs different stakeholders are requesting, to see whether all questions can be answered with the available data.
It is a good idea to care about a good data structure early on. You want to avoid having to refactor bigger parts of your codebase and infrastructure when new requirements appear.
With such a structure, you can start with Excel or script-based reporting and later transition to a cloud or on-premise solution.
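To make this a bit more concrete, here is a minimal sketch of what "a good data structure early on" could look like for script-based reporting. Everything in it is an assumption for illustration: the `DailyKpi` record, its field names, and the sample values are hypothetical and not taken from a real project. The idea is simply to agree on one row format first, so the same rows can feed an Excel sheet today and a warehouse table later.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class DailyKpi:
    """One row per day, KPI, and source (hypothetical schema)."""
    day: date
    kpi: str      # e.g. "signups"
    source: str   # e.g. "app_backend" or a third-party tool
    value: float


def to_csv(rows: Iterable[DailyKpi]) -> str:
    """Serialize rows to CSV text that Excel or another script can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(DailyKpi)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Sample values are made up for the example.
rows = [
    DailyKpi(date(2020, 1, 1), "signups", "app_backend", 42.0),
    DailyKpi(date(2020, 1, 2), "signups", "app_backend", 57.0),
]
report = to_csv(rows)
print(report)
```

Because every KPI shares the same four columns, adding a new KPI or a new data source means adding rows, not changing the structure. That keeps the later move to a warehouse table closer to a load job than a refactoring.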
3. Why start?
Investing in a data warehouse is an investment in efficiency and the future. First, your developer might be able to satisfy the need for data with Excel or script-based reporting. But data creates the need for even more data! This point can’t be stressed enough, as third-party data has to enrich your own first-party data in order to make decisions.
Second, Excel and scripts are not endlessly scalable. As the company grows, so does the demand for data. Switching to a cloud solution allows even smaller companies to build a scalable data warehouse with little effort and a small budget.
4. Whom to start with?
In my opinion, you have two options (ordered by my preference):
Data Engineer – models and defines data sets, writes scalable ETL processes and data collection code, ensures data is clean and well structured
Data Scientist – applies advanced statistics to data to discover new insights, builds Machine Learning models, figures out ways to optimize business processes using data
Without data, an Analyst or Data Scientist can’t work, which makes the Data Engineer the first choice. Since Data Scientists are considered to be, to a degree, a “Jack of all trades”, they are also a potential option for a first hire.
Don’t forget to think about which technology you would like to use. If you already use products from one of the cloud providers, it makes sense to use their data warehouse products.
Be open about this part: people tend to have a favorite tech stack and like to stick to it.
5. Final thoughts
Thinking about how to incorporate data into daily workflows is something to be done sooner rather than later. It is easier to do while the company is growing than to retrofit later on, because the needed structures and habits can be built in while scaling the company.
The initial phase can be summarized in three steps:
1. Think about what your KPIs are and, if additional data is needed, how to acquire those data points.
2. Make the initial hire and start a data team.
3. Research the tool landscape and get an idea of what you would like to use and how technologies you already use can support your endeavor (e.g. Firebase and Google Cloud Platform with BigQuery and Looker).
One last thing: don’t forget to build a monitoring system for data imports, ETL pipelines, and costs.
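As a rough illustration of what such monitoring could start with, here is a small sketch that flags stale or empty data imports and budget overruns. The names (`ImportStatus`, `check_import`, `check_cost`), the thresholds, and the sample values are all assumptions made up for this example. In practice, the status values would come from your own pipeline logs or your cloud provider's billing export.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ImportStatus:
    """Last known state of one data import (hypothetical structure)."""
    name: str
    last_success: datetime
    rows_loaded: int


def check_import(
    status: ImportStatus,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
    min_rows: int = 1,
) -> List[str]:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    if now - status.last_success > max_age:
        problems.append(f"{status.name}: last successful import is too old")
    if status.rows_loaded < min_rows:
        problems.append(f"{status.name}: only {status.rows_loaded} rows loaded")
    return problems


def check_cost(spent: float, budget: float) -> List[str]:
    """Flag spending above the budget for the current period."""
    if spent > budget:
        return [f"cost: spent {spent:.2f}, budget is {budget:.2f}"]
    return []


# Sample values are invented for the example.
now = datetime(2020, 1, 3, 12, 0)
fresh = ImportStatus("firebase_events", datetime(2020, 1, 3, 6, 0), 1000)
stale = ImportStatus("ad_spend", datetime(2020, 1, 1, 6, 0), 0)

alerts = check_import(fresh, now) + check_import(stale, now) + check_cost(120.0, 100.0)
for alert in alerts:
    print(alert)
```

Checks like these can run as a scheduled script at first and be replaced by a provider's alerting features once the data warehouse is in place.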
Making the move to a data warehouse strategy? We here at Customlytics cover all app marketing and analytics topics, and we’re happy to help if you need actionable tips for your data warehousing strategy. Drop us a line via email.
Sources:
Medium – Which data role should I fill in a startup first?
KDnuggets – The different data science roles in the industry
KDnuggets – A Winning Game Plan For Building Your Data Science Team