I've said elsewhere
that a business's greatest asset is its data. It's like money in the bank.
But what good is money if it's not the right currency? If your wallet contains American dollars, is that going to help you when you're in Europe? If you want to buy a sandwich in Paris, first you have to take your dollars to a currency exchange and convert it into euros before you can fill your belly.
For many businesses, their databases are like treasure chests, holding a wealth of information. However, a lot of that data is not in a form that is useful for effective decision making. An ugly little secret about database systems: they are modeled in a way that best supports the application, and not necessarily the user. It's as if the treasure chest of data is filled with silver doubloons, gold ingots, and yes, in some cases, Monopoly money.
If a business wants to use data for reports, predictive analytics, and other ways to grow their business, more often than not they will need to convert it into a usable form and store it in a separate repository. This repository is often known as a data mart or data warehouse (depending on the size of the repository). The process of converting and uploading data goes by the term ETL (extract, transform, and load).
A lot of database developers often make the mistake of treating a data warehouse the same as any other database. It's like trying to write a check when the merchant says "cash only." Even experienced database administrators (DBAs) can make this mistake. This often results in slow ETL processes which deliver data late.
Furthermore, business intelligence requirements can change quickly, as analysts demand new data in a different form to support their calculations. The traditional methods of creating, modifying, and managing databases learned by most DBAs simply do not adapt well in the new paradigm for data warehouses. Sometimes it's better to bring in new blood, than to teach an old dog new tricks.
Here are a few principles to keep in mind when building a data warehouse:
1. Throughput is King. In a normal back-end database, it's common to have hundreds or even thousands of process threads reading and writing small amounts data simultaneously. In a data warehouse, you only have one (or a few) process writing data, but it's writing a massive amount of data in a single operation. Instead of worrying about concurrency with numerous small writes, focus on throughput on one big load.
2. Disk is Cheap. Thanks to Moore's Law, the cost of storing data keeps dropping significantly every year. Because data storage is so cheap, use staging tables liberally as intermediate processing areas for large-scale or complex transformations.
3. Optimize Performance through your Data Model. A common mistake many inexperienced people make, is to create a lot of indexes whenever users complain about slow queries. Too many indexes slows down ETL operations, and a lot of times they don't get used by the database when determining the best execution plan. You can often get better performance by restructuring the tables in the data warehouse to make data more readily accessible without a lot of complex joins or calculations. Create indexes only as a last resort when all other alternatives have been proven as ineffective.
4. Use Your ETL to Enforce Data Integrity. Ask an old-school DBA how they ensure the data is correct, and he or she will provide the textbook answer of using constraints to enforce data integrity. In a data warehouse, however, constraints such as checks and foreign keys can slow down or even bring a massive load to a screeching halt. If even one row out of several million is wrong, it can bring your system down for hours, resulting in a lot of late nights, weekends in the office, and updated resumes. This is one of the few cases where you should ignore what your DBA says and use your ETL processes to enforce data integrity. You're already transforming it to fit in the data warehouse, why not check data integrity as part of the process?
If you can master the art of creating a data warehouse and maintaining fast, speedy ETL processes to keep it up-to-date, you can set yourself up as a master money changer and take a generous cut of the exchange rate.
Want to learn more principles for data warehouse success? Sign up to my email list, and receive a free guidebook of data warehouse principles!