When most people think of data storage, they imagine a giant hard drive where files are dumped, like the old stuff that ends up in your garage. Data warehouses are the complete opposite of that concept: they store information from several data sources in a way that makes it easy to search and analyze for business intelligence. The downside is that data warehouses (DWHs) require significant know-how and investment to implement and maintain. Let’s review the components of DWH cost and roughly estimate the average cost of building a data warehouse from scratch, taking different factors into account.
Component 1: Storage
The first requirement of a data warehouse, and the first component of its total cost, is the hardware that stores the data and processes it. This includes hard disk drives or solid-state drives for storage space, computing power to process queries against the warehouse, networking equipment, power supply units, and so on.
There are two options to choose from: a local server that provides on-premises storage, or a virtual machine that provides cloud storage. Let’s compare their relative costs to help you choose the best option for your particular case.
Cloud storage pricing.
There are four leading DWaaS (data-warehouse-as-a-service) solutions: Amazon Redshift, Microsoft Azure Synapse, Google BigQuery, and Snowflake. Each has its own pricing model and feature set, and together they cover the needs of most businesses.
Currently, Redshift is the most popular cloud data warehouse solution. It is an especially sensible and cost-effective choice if your company already uses other Amazon web services. Its pricing model is highly customizable and rather complicated: the final cost depends on your region, the storage and computing capacity you need, and many other factors.
The “per GB per month” or “per TB per year” approach is the easiest to understand, so let’s start with it. In the USA, Redshift charges $0.024 per gigabyte of data stored for 30 days, which amounts to about $295 per TB per year. However, this option is called “managed storage” and works only with RA3 nodes, which cost at least $3.26 per hour ($2,347.20 per month), or $18,848 per year if paid upfront.
A cheaper option is DC2 nodes. They do not support managed storage and have other limitations, but their price starts at $0.25 per hour ($180 per month), or $1,380 per year if paid upfront.
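The unit conversions behind these Redshift figures are easy to check. The sketch below assumes a 30-day (720-hour) month and 1 TB = 1,024 GB, which is what the quoted numbers imply. The helper names are illustrative and are not part of any AWS tool.

```python
# Sketch: reproduce the Redshift price conversions quoted above.
# Assumptions: a 30-day month (720 hours) and 1 TB = 1,024 GB.
HOURS_PER_MONTH = 30 * 24
GB_PER_TB = 1024


def per_tb_year(price_per_gb_month: float) -> float:
    """Convert a $/GB/month storage price into $/TB/year."""
    return price_per_gb_month * GB_PER_TB * 12


def per_month(price_per_hour: float) -> float:
    """Convert an hourly node price into a 30-day monthly price."""
    return price_per_hour * HOURS_PER_MONTH


def upfront_saving(upfront_per_year: float, price_per_hour: float) -> float:
    """Fraction saved by paying a year upfront instead of 12 on-demand months."""
    return 1 - upfront_per_year / (per_month(price_per_hour) * 12)


managed_storage = per_tb_year(0.024)   # about $294.91 per TB/year
ra3_monthly = per_month(3.26)          # about $2,347.20 per month
dc2_monthly = per_month(0.25)          # $180.00 per month
ra3_saving = upfront_saving(18_848, 3.26)
dc2_saving = upfront_saving(1_380, 0.25)

print(f"Managed storage: ${managed_storage:,.2f} per TB/year")
print(f"RA3: ${ra3_monthly:,.2f}/month on demand; paying upfront saves {ra3_saving:.0%}")
print(f"DC2: ${dc2_monthly:,.2f}/month on demand; paying upfront saves {dc2_saving:.0%}")
```

With these figures, paying upfront saves roughly a third compared with running either node type on demand for a full year.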
Microsoft Azure Synapse
The minimum price of data storage in Azure is $122.88 per processed terabyte, or $0.17 per TB per hour. To set up a DWH, you will also need Azure Synapse, formerly known as Azure SQL Data Warehouse. Its minimum price is $876 per month, or $6,623 per year.
Like Amazon, Microsoft offers a number of pricing options for data warehouses that depend on the amount of processed data, storage capacity, and other factors.
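Twelve monthly payments of $876 add up to more than the quoted yearly price. The yearly figure therefore presumably reflects a discounted commitment, though that interpretation is an assumption. The sketch below only computes the discount implied by the two Synapse numbers quoted above.

```python
# Sketch: compare the two minimum Azure Synapse prices quoted above.
# Assumption: $876 is the month-to-month price and $6,623 is a discounted
# yearly (committed) price.
SYNAPSE_MONTHLY = 876
SYNAPSE_YEARLY = 6_623

month_to_month_year = SYNAPSE_MONTHLY * 12          # $10,512
implied_discount = 1 - SYNAPSE_YEARLY / month_to_month_year

print(f"12 monthly payments: ${month_to_month_year:,}")
print(f"Yearly price: ${SYNAPSE_YEARLY:,} ({implied_discount:.0%} less)")
```

The result, roughly 37%, is the gap between month-to-month billing and the yearly figure.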
Google BigQuery uses two pricing models: on-demand and flat-rate. Storage costs $0.02 per gigabyte per month ($245.76 per TB/year), and the price drops to $0.01 per gigabyte per month ($122.88 per TB/year) for long-term storage. Computing is charged separately. On-demand query processing costs $5.00 per terabyte of processed data, while the flat-rate option costs $8,500 per year for every 500 reserved slots (Google’s unit of query processing capacity). There is also a free tier that provides a limited amount of storage and query processing at no charge.
Snowflake charges about $0.04 per gigabyte per month ($491 per TB/year), or $0.022 per gigabyte per month ($270 per TB/year) if paid upfront. Pricing for computation is a bit trickier: $2 to $4 per “credit,” where a credit is the unit that denotes one hour of server operation. So if a server runs continuously, the minimum monthly price is $1,440 (720 hours at $2 per credit), which amounts to $17,280 per year.
While cloud data warehouse pricing seems steep, all of these providers offer payment options with significant discounts. They also offer usage-control features that let clients pay only for the resources they actually use. Overall, pricing is roughly linear: more data costs more to store and to query.
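Because BigQuery bills storage and query processing separately, the choice between on-demand and flat-rate pricing depends on how much data your queries process per year. The minimal sketch below uses the figures above. It ignores the free tier and takes the flat rate as $8,500 per year for 500 slots, as quoted.

```python
# Sketch: BigQuery storage costs and the on-demand vs flat-rate break-even point.
# Uses the prices quoted above; the free tier is ignored.
GB_PER_TB = 1024
ON_DEMAND_PER_TB = 5.00       # $ per TB of data processed by queries
FLAT_RATE_PER_YEAR = 8_500    # $ per year for 500 reserved slots (as quoted)


def storage_per_tb_year(price_per_gb_month: float) -> float:
    """Convert a $/GB/month storage price into $/TB/year."""
    return price_per_gb_month * GB_PER_TB * 12


def on_demand_cost(tb_processed_per_year: float) -> float:
    """Yearly query cost when paying per terabyte processed."""
    return ON_DEMAND_PER_TB * tb_processed_per_year


active_storage = storage_per_tb_year(0.02)      # about $245.76 per TB/year
long_term_storage = storage_per_tb_year(0.01)   # about $122.88 per TB/year

# Above this volume of processed data, the flat rate becomes cheaper.
break_even_tb = FLAT_RATE_PER_YEAR / ON_DEMAND_PER_TB   # 1,700 TB per year

for tb in (200, 1_700, 5_000):
    print(f"{tb:>5} TB processed/year: on-demand ${on_demand_cost(tb):>9,.2f}"
          f" vs flat-rate ${FLAT_RATE_PER_YEAR:,}")
```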
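Snowflake’s compute cost is simply the number of credits consumed multiplied by the price per credit. The sketch below reproduces the $1,440 monthly minimum. It also shows how running a warehouse only during working hours changes the bill. The 8-hours-a-day, 22-days-a-month schedule is a hypothetical example. Larger warehouse sizes consume more credits per hour, and the `credits_per_hour` parameter stands in for that.

```python
# Sketch: Snowflake compute cost = credits consumed x price per credit.
# Assumptions: one credit per hour of operation (as described above) unless
# credits_per_hour says otherwise, and a 30-day month (720 hours).
HOURS_PER_MONTH = 30 * 24


def monthly_compute_cost(price_per_credit: float,
                         hours: float = HOURS_PER_MONTH,
                         credits_per_hour: float = 1.0) -> float:
    """Monthly compute bill for a warehouse that runs `hours` hours."""
    return price_per_credit * credits_per_hour * hours


always_on_low = monthly_compute_cost(2.0)    # $1,440 per month at $2/credit
always_on_high = monthly_compute_cost(4.0)   # $2,880 per month at $4/credit
yearly_minimum = always_on_low * 12          # $17,280 per year

# Hypothetical schedule: the warehouse runs only 8 hours a day, 22 days a month.
business_hours = monthly_compute_cost(2.0, hours=8 * 22)   # $352 per month

print(f"Always on: ${always_on_low:,.0f} to ${always_on_high:,.0f} per month")
print(f"Yearly minimum: ${yearly_minimum:,.0f}")
print(f"Business hours only: ${business_hours:,.0f} per month")
```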
Local storage pricing.
Local storage involves purchasing and installing the required hardware on your premises. Alternatively, if you lack the space or infrastructure, you can rent a spot or rack for your server in a data center. In addition to computer components, you may also need to buy an operating system, networking hardware, uninterruptible power supplies, and surge protectors.
Keep in mind that these and other hardware expenses are one-time payments, so you won’t face the monthly or yearly bills that come with cloud solutions. The exceptions are electricity (the average cost for commercial users is 10.42 cents per kWh), Internet access (about $1,000 per year for an unlimited plan), and any additional rent charges.
A new server with 64 GB of RAM and 8 TB of solid-state storage costs $3,200 or more, depending on hardware brands and other components. A suitable 24-port Ethernet switch costs about $150, and a 1500 VA UPS with a surge protector costs roughly the same. A basic 600 W combination of server and networking hardware running all year consumes 5,256 kWh (0.6 kW × 8,760 hours), so electricity costs about $550 per year.
Overall, a data warehouse on a local server costs about $3,500 as a one-time upfront payment for hardware, plus about $1,700 per year for electricity and the Internet.
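The electricity estimate is power draw × hours × price per kWh. The sketch below reproduces it and the hardware total with the figures quoted above. It assumes a constant 600 W draw for all 8,760 hours of the year.

```python
# Sketch: yearly electricity cost and hardware budget for the local server above.
# Assumption: the hardware draws a constant 600 W all year round.
HOURS_PER_YEAR = 365 * 24   # 8,760 hours
PRICE_PER_KWH = 0.1042      # average US commercial rate, $ per kWh


def annual_energy_kwh(power_watts: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy in kWh consumed by a constant load."""
    return power_watts * hours / 1000


def annual_electricity_cost(power_watts: float,
                            price_per_kwh: float = PRICE_PER_KWH) -> float:
    """Yearly electricity bill for a constant load."""
    return annual_energy_kwh(power_watts) * price_per_kwh


energy = annual_energy_kwh(600)              # 5,256 kWh
electricity = annual_electricity_cost(600)   # about $548 per year

# One-time hardware purchases listed above
hardware = {
    "server (64 GB RAM, 8 TB SSD)": 3_200,
    "24-port Ethernet switch": 150,
    "UPS + surge protector": 150,
}
hardware_total = sum(hardware.values())      # $3,500

print(f"{energy:,.0f} kWh/year -> ${electricity:,.2f}/year")
print(f"Hardware: ${hardware_total:,}")
```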
|Option|Storage|Computation / running costs|
|Amazon Redshift|$295 per TB/year|$18,848 per year|
|Microsoft Azure Synapse|$122.88 per processed terabyte|$6,623 per year|
|Google BigQuery|$245.76 (standard) or $122.88 (long-term) per TB/year|$8,500 per year|
|Snowflake|$491 (on-demand) or $270 (prepaid) per TB/year|$17,280 per year|
|Local server (64 GB RAM/8 TB SSD)|$3,500+ as a one-time hardware expense|$1,700 per year for electricity and the Internet|
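To compare these options for a specific data volume, the table’s figures can be combined into a simple linear cost model. This sketch rests on several assumptions:

- Azure’s $122.88 is treated as a per-TB yearly charge.
- BigQuery uses standard storage with the flat rate.
- Snowflake uses on-demand storage.
- The local server cannot hold more than its 8 TB.
- DBMS and personnel costs are excluded.

```python
# Sketch: a linear cost model built from the comparison table above.
# Assumptions: Azure's $122.88 is treated as a per-TB yearly charge; BigQuery uses
# standard storage + flat rate; Snowflake uses on-demand storage; DBMS and staff excluded.
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    storage_per_tb_year: float     # $ per TB per year
    fixed_per_year: float          # compute (cloud) or running costs (local), $ per year
    upfront: float = 0.0           # one-time hardware purchase, $
    max_tb: float = float("inf")   # capacity limit, TB


OPTIONS = [
    Option("Amazon Redshift (RA3)", 295.0, 18_848),
    Option("Microsoft Azure Synapse", 122.88, 6_623),
    Option("Google BigQuery", 245.76, 8_500),
    Option("Snowflake", 491.0, 17_280),
    # Storage is part of the hardware purchase, so there is no per-TB charge.
    Option("Local server", 0.0, 1_700, upfront=3_500, max_tb=8),
]


def cost_over(option: Option, tb: float, years: int) -> float:
    """Total cost of storing `tb` terabytes for `years` years."""
    if tb > option.max_tb:
        raise ValueError(f"{option.name} holds at most {option.max_tb} TB")
    return option.upfront + years * (option.storage_per_tb_year * tb + option.fixed_per_year)


TB, YEARS = 2, 3
for opt in sorted(OPTIONS, key=lambda o: cost_over(o, TB, YEARS)):
    print(f"{opt.name:<25} ${cost_over(opt, TB, YEARS):>10,.2f}")
```

With 2 TB over three years, the local server is cheapest on infrastructure alone. However, this model does not price the extra setup work a local server requires.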
Component 2: DBMS and ETL Software
Another component required to implement a data warehouse in your enterprise is extract, transform, and load (ETL) software. Various ETL tools are used for data migration, synchronization, and visualization to create a fully functioning DWH. However, many of them are optional, and you can adopt them if you have extra budget to spare.
Database management systems (DBMSs) are essential software for a data warehouse. Besides licensing costs, a DBMS requires the services of qualified IT specialists to implement, customize, and support it.
Paid database management solutions.
Proprietary software is expensive, but you get a high-quality product from a reputable company that offers excellent customer support. Moreover, well-known paid DBMSs usually have numerous manuals and video tutorials available online that help with building a data warehouse and training employees to use it.
Oracle MySQL: the annual subscription for enterprise use costs $5,000.
IBM DB2: the base edition costs $1,900 as a one-time payment or $79.16 per month.
Microsoft SQL Server 2019 Enterprise edition: $13,748 per 2-core pack as a one-time payment, or $5,434 per year as a subscription.
Free database management solutions.
This category includes open-source DBMSs such as Greenplum, Apache Cassandra, CUBRID, ClickHouse, PostgreSQL, and FirebirdSQL, as well as free editions of ETL tools such as Talend. Although these solutions are free to use or offer a “free tier,” installing and configuring them requires professional services that must be paid for separately. The salaries of such professionals are described in the next section.
Component 3: Costs of Personnel/Human Resources
To set up and maintain a data warehouse, you need several types of specialists. To avoid data loss that could compromise your business operations, hire certified professionals with a proven record of high-quality work. Their typical salaries are presented below.
A data warehouse consultant is essential for planning the work on your future DWH. You will most likely not need this specialist on a permanent basis. The consultant assists the other specialists who build and maintain your DWH, prepares the technical specifications, and teaches users how to work with the data warehouse. The average rate for such a consultant is $34 per hour, and the required time is at least 10 working days per year, or 80 hours, which comes to $2,720.
A data engineer performs most of the hands-on warehousing work: building, configuration, performance tuning, and other activities. In some cases, when your DWH is already built and configured and you do not plan to expand it in the near future, you won’t need a data engineer as a permanent employee. The average salary of a data engineer in the United States is $92,000 per year, or $40 per hour. In other parts of the world, however, a professional with similar experience and skills may earn much less, for example about $20,000 per year in Eastern Europe.
A database administrator keeps your data warehouse up and running for about $74,000 per year. Responsibilities include troubleshooting, regular data migration and cleanup, and so on.
The main purpose of a data warehouse is to enable effective analysis of information collected from several databases and other sources, so you constantly need an expert to perform this analysis. In most real-life scenarios, a business has a team of data analysis specialists or a data analytics department. However, if you are not involved in Big Data, let’s take a single specialist as a baseline. The average salary of a data analyst in the USA is $62,000 per year.
|Specialist|Annual cost (USA)|
|Data warehouse consultant (part-time)|$2,720|
|Data engineer (full time)|$92,000|
|Database administrator (full time)|$74,000|
|Data analyst (full time)|$62,000|
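Summing the table gives the yearly personnel budget used in the estimates that follow. In this sketch, the consultant’s cost is derived from the hourly rate and an assumed 8-hour working day. The Eastern European variant is a hypothetical mix that replaces only the data engineer’s salary.

```python
# Sketch: yearly personnel budget from the table above (US figures).
HOURS_PER_WORKDAY = 8   # assumption

consultant = 34 * 10 * HOURS_PER_WORKDAY   # $34/hour x 10 working days = $2,720

team = {
    "Data warehouse consultant (part-time)": consultant,
    "Data engineer (full time)": 92_000,
    "Database administrator (full time)": 74_000,
    "Data analyst (full time)": 62_000,
}
total = sum(team.values())                 # $230,720

# Hypothetical mix: only the data engineer is hired in Eastern Europe (~$20,000/year);
# all other salaries stay at US levels.
with_remote_engineer = total - team["Data engineer (full time)"] + 20_000   # $158,720

for role, cost in team.items():
    print(f"{role:<40} ${cost:>8,}")
print(f"{'Total':<40} ${total:>8,}")
print(f"{'Total with remote data engineer':<40} ${with_remote_engineer:>8,}")
```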
As mentioned above, data warehouses are expensive. Now it’s time to prove it with figures. For example, if you choose a cloud solution such as Microsoft Azure Synapse and a paid DBMS such as Microsoft SQL Server 2019, infrastructure and software will cost about $12,000 per year ($6,623 for Azure Synapse plus $5,434 for a SQL Server subscription), plus $122.88 for every processed terabyte of information.
If you opt for a local server and a free DBMS, you’ll need roughly $3,500 for hardware that will last a couple of years, plus $1,700 per year for electricity and the Internet. This option is more complicated than a cloud-based solution and takes more work hours to implement. In both cases, you will need at least four types of data specialists, who will cost about $230,720 per year in the United States, or less if you hire a skilled team from another country.
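The two scenarios can be turned into a rough multi-year budget. This sketch treats “a couple of years” of hardware life as two years and assumes 10 TB of processed data per year for the cloud option. Both values are hypothetical inputs to adjust, and salaries are held flat.

```python
# Sketch: multi-year totals for the two scenarios described above.
# Assumptions: salaries stay flat, local hardware is re-bought at the end of its
# (hypothetical) service life, and extra setup hours for the local option are not priced.
import math

PERSONNEL_PER_YEAR = 230_720   # four specialists, US salaries


def cloud_scenario(years: int, tb_processed_per_year: float) -> float:
    """Azure Synapse ($6,623/yr) + SQL Server 2019 subscription ($5,434/yr) + staff."""
    per_year = 6_623 + 5_434 + 122.88 * tb_processed_per_year + PERSONNEL_PER_YEAR
    return years * per_year


def local_scenario(years: int, hardware_life_years: int = 2) -> float:
    """Local server ($3,500 per purchase) + free DBMS + $1,700/yr running costs + staff."""
    purchases = math.ceil(years / hardware_life_years)
    return purchases * 3_500 + years * (1_700 + PERSONNEL_PER_YEAR)


for years in (1, 3, 5):
    print(f"{years} year(s): cloud ${cloud_scenario(years, 10):>11,.2f}"
          f" | local ${local_scenario(years):>11,.2f}")
```

In both scenarios, salaries account for the vast majority of the total.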
Last but not least: all prices and salaries change from time to time, so the figures given above are for illustration only. Contact us, and we will estimate the total price based on your requirements and provide highly skilled specialists to create and maintain a top-quality data warehouse.