Dimension and Partition Processing

Be Sociable, Share!

    Microsoft SQL Server 2008 Analysis Services Unleashed Review Chapter 21

    As this chapter claims at the outset, SSAS only stores information in two locations: dimension files and partition files. Chapter 20 had discussed the static description of these files, and this chapter now discusses the dynamic construction of these files.

    A note on page 382 made me think about the processing sequence. We have way to force Analysis Services to issue a single SQL query to process all the dimension attributes. This choice involves setting the EnableTableGrouping server configuration property. Using this property could cause inefficient performance. However, the point I take is that not all this data loading happens all at once, and that small difference could be an issue for some data loading scenarios. Forcing the load to happen with one SQL query is a way to assure that these dimensions are loaded under the same lock. Conceptually, dimensions may have some relationships among themselves, and under some conditions, OLTP systems may be changing attributes of different dimensions at the same time. I wrote at the beginning of this review series that multidimensional is somewhat misleading because these dimensions are not equivalent with the known spatial dimensions. Specifically, some amount of correlation among dimensions may be part of the original design, and while efficient for transactional table loading, that synchronization would require careful management of the transactional system. Microsoft has a new product called Master Data Services which allows management of these sometimes correlated dimensions. Perhaps future SSAS loading would work with Master Data Services to assure that loading has the same consistent view as what Master Data Services reports to transactional managers.

    This chapter contains two excellent figures describing the loading process:

    • The chart on page 380 shows how Analysis Services attempts to process dimension attribute loading in parallel processes. Many of the loading features can happen in parallel, thus maximizing server use and minimizing loading time.
    • The chart on page 404 shows what happens under processing error conditions. The loading process is technically a transaction, which can optionally stop based on certain flag settings (some can be set to “IgnoreError”). Errors are reported to the log to allow debugging.

    The book contains a number of performance hints:

    • One of the most important ones I believe is on page 395: “We recommend a partition size containing between 5 and 20 million records.” Actually, this tip is for a maximum partition size since Food Mart has less than 5 million records. This advice provides a good understanding of how to plan for best partition use.
    • Processing 50K records per second is good processing performance, while loading 100K records per second is considered optimal processing performance (page 396).
    • Incremental partition updates allow for quicker loading of new data, but should be rebuilt periodically to optimize overall query performance (pages 398-399).

    For anyone trying to maximize loading performance, this chapter is invaluable.

    Gorbach, I., Berger, A., & Melomed, E. (2009). Microsoft SQL Server 2008 Analysis Services Unleashed. Indianapolis, IN: Pearson Education Inc.
    ISBN: 0-672-33001-6

    Be Sociable, Share!