Cubes (& PowerPivot) vs Traditional Excel Pivot tables

This is a question I’ve been asked a few times: “apart from handling larger data-sets, what do you get with cubes that you don’t get with pivot tables?”

This isn’t an exhaustive list but it covers what I think are the most important functionality differences.

#1 – Cubes are organised into dimensions / attributes.

With traditional Excel pivot tables, you are building on top of a flat data-set and as a result you will get your dimensions / attributes / measures in one big list.

You can take a flat data-set straight into PowerPivot as well (and there are definitely use cases for doing this), but if you build a dimensional model (as you are forced to do in SSAS MD) you will logically organise your reporting attributes under the correct dimension and end up with a much more intuitive model for the user.

[Image: dim-vs-flat – a dimensional model vs. a flat field list]

#2 – Hierarchies.

Traditional pivot tables will show you the implicit relationships in your data, but you have to create a column for each attribute you wish to show. You can then tidy this up with manual grouping, but the whole process is very clunky.

With PowerPivot and cubes, you explicitly define hierarchies; when exploring the data you can drag in this single field and get instant drill-up and drill-down, and on large dimensions performance is going to be significantly better. Displaying natural relationships this way is key to making the data easy to understand at a glance.

[Image: ssas_pvt_hier_v2 – a hierarchy in SSAS vs. a pivot table]

#3 – MDX / DAX.

In cubes and PowerPivot you can write your own MDX and DAX calculations to build more complex measures. There isn’t really any direct equivalent for traditional pivot tables; when creating a measure in a traditional pivot table there are some options available (as shown below), but nothing with the advanced analytics capability of MDX/DAX.

[Image: pivot_measure_opts – measure options in a traditional pivot table]

Users also sometimes build calculations outside the traditional Excel pivot, but this means the pivot table has to remain static, otherwise the calculations are overwritten.
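To give a flavour of what this looks like, below is a minimal C# sketch using ADOMD.NET that defines an MDX calculated member on the fly and runs it against a hypothetical [Sales] cube – the connection string, cube, measure and dimension names are illustrative only.

// Minimal ADOMD.NET sketch: define an MDX calculated member on the fly and
// query a hypothetical [Sales] cube. All object names are illustrative.
// Requires a reference to Microsoft.AnalysisServices.AdomdClient.
using System;
using Microsoft.AnalysisServices.AdomdClient;

class MdxExample
{
    static void Main()
    {
        const string mdx = @"
            WITH MEMBER [Measures].[Profit Margin] AS
                 [Measures].[Profit] / [Measures].[Sales Amount],
                 FORMAT_STRING = 'Percent'
            SELECT { [Measures].[Sales Amount], [Measures].[Profit Margin] } ON COLUMNS,
                   [Date].[Calendar Year].MEMBERS ON ROWS
            FROM [Sales]";

        // assumed local instance and catalog name
        using (var conn = new AdomdConnection("Data Source=localhost;Catalog=SalesCube"))
        {
            conn.Open();
            using (var cmd = new AdomdCommand(mdx, conn))
            using (var reader = cmd.ExecuteReader())   // flattened rowset of the cellset
            {
                while (reader.Read())
                {
                    var values = new object[reader.FieldCount];
                    reader.GetValues(values);          // member caption + measure values
                    Console.WriteLine(string.Join("\t", values));
                }
            }
        }
    }
}

In a real solution you would normally define a calculation like [Profit Margin] once in the cube script (or as a DAX measure in PowerPivot) rather than per query, but the WITH MEMBER form keeps the example self-contained.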

#4 – Client Tools.

Traditional pivot tables are themselves the end product. PowerPivot provides an upgrade path to SSAS Tabular, and cubes have a whole host of cutting-edge reporting tools that can use them as a source. Personally I don’t like Excel pivot tables for browsing cubes and have tried out several reporting tools that can connect to SSAS. My favourite experience was delivered by Pyramid Analytics’ BI Office solution, an example of which is shown below.

Power View inside Excel is not bad but in my opinion it’s not the easiest thing to use and I can’t see it gaining wide adoption.

[Image: bioxl – BI Office example]

#5 – Other Considerations

There are plenty of other important considerations such as

security,
pre-aggregation,
partitioning,
formatting of dimensions/measures,
deployment possibilities

…and of course scalability, but hopefully the above demonstrates to people who are new to BI that SSAS offers key functionality benefits compared with pivot tables.

Redgate pt1 – Database Lifecycle Management

Most people who work with SQL Server know about Redgate. They are a well-known company that creates SQL Server productivity tools, amongst other software.

I had used and liked their tools in the past, but I have been really impressed with them as a company over the last year: they are leading the way in creating a much better model for database development – not just theorising about it, but also building the tools and running the workshops required to get it off the ground.

I attended a free DLM (database lifecycle management) course they ran back in April and can now see it becoming the de facto standard development model once companies have managed to catch up.

Based on a model that has been around in the app-dev world for years, it essentially automates the manual processes in your development pipeline and gives you very early visibility of problems.

The simplified image below describes the model but you can get a full picture from their own website here.

[Image: redgate-dlm – simplified view of the DLM model]

Planning for the Future

“. . . the United States…is pioneering the development of a world-wide information grid.  It is expected that such a grid will come into being by about 1975.  For the first time in history the cumulative knowledge of mankind will be made accessible on a global scale—and it will be almost instantaneously available in response to demand.”

– Zbigniew Brzezinski, Between Two Ages: America’s Role in the Technetronic Era (Viking Press, 1970)

Brzezinski has been one of the most influential ‘statesmen’ of the last 50 years.  If you follow politics you will be aware of the impact he has had over the last few decades.

I recently read Between Two Ages, and the book is full of examples of how people in this field are able to think 10, 20 or even 30 years into the future. In particular, the quote above, describing the internet in 1970, shows just how possible it is to predict change.

It makes me think that architects in the DWH/BI space should be able to broadly know what is coming down the line for at least the next 2-3 years and plan accordingly. A lot of companies I know of are just replaying 10-year-old practices on more modern platforms.

For reference, technetronic means ‘shaped or influenced by the changes wrought by advances in technology and communications, especially electronic’.

Creating an ‘SSIS Configurations’ report

When I started my current role, I didn’t get the opportunity to have a handover period with the departing developer. The company had a lot of SSIS packages, and I needed a way to test them safely without affecting production file shares etc.

In the end I asked a consultant who was working with me (and who knew C#) to write something to help me extract the .dtsconfig data into a spreadsheet.
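For anyone curious what that extraction involves, here is a minimal C# sketch of the same idea (not the actual tool linked further down): it walks a folder of .dtsConfig files and flattens the standard DTSConfiguration/Configuration XML into a CSV that opens straight in Excel. The folder and output paths are placeholders.

// Minimal sketch of the idea behind the extraction tool (not the actual tool):
// flatten every .dtsConfig file in a folder into one CSV of
// File, Path, ConfiguredType, ValueType, ConfiguredValue rows.
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

class DtsConfigDump
{
    static void Main()
    {
        var folder = @"C:\SSIS\Configs";            // placeholder path
        var output = @"C:\SSIS\dtsconfig_dump.csv"; // placeholder path

        var sb = new StringBuilder();
        sb.AppendLine("File,Path,ConfiguredType,ValueType,ConfiguredValue");

        foreach (var file in Directory.GetFiles(folder, "*.dtsConfig"))
        {
            var doc = XDocument.Load(file);

            // standard layout: <DTSConfiguration><Configuration ...><ConfiguredValue/></Configuration></DTSConfiguration>
            foreach (var cfg in doc.Descendants("Configuration"))
            {
                var row = new[]
                {
                    Path.GetFileName(file),
                    (string)cfg.Attribute("Path"),
                    (string)cfg.Attribute("ConfiguredType"),
                    (string)cfg.Attribute("ValueType"),
                    (string)cfg.Element("ConfiguredValue")
                };
                // quote each field so commas in connection strings don't break the CSV
                sb.AppendLine(string.Join(",", row.Select(v => "\"" + (v ?? "").Replace("\"", "\"\"") + "\"")));
            }
        }

        File.WriteAllText(output, sb.ToString());
        Console.WriteLine("Wrote " + output);
    }
}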

Once I had the data in Excel, I imported it into SQL and built the SSRS report below so I could see side-by-side comparisons for each of the packages…

[Image: SSIS_ConfigReport – SSRS report comparing package configurations]

I have made the tool available on my Mega account if you want to give it a go yourself. Instructions are also provided in the folder.

dtsConfig-tool (choose download as zip).

Thanks to Steve Cage who wrote the extraction tool and let me post it online 🙂

The cloud: computing as a utility.

Here is a great video produced by Cognitive Media and Databarracks – a hosted IT provider with their data centre in an old underground RAF bunker.

It explores the concept of computing being provided as a utility just like a home is provided with water, electricity & gas.  Narrated by Stephen Fry.

Dave Coplin: Re-Imagining Work

I recently attended a presentation by Dave Coplin, ‘Chief Envisioning Officer’ at Microsoft UK – his work looks at questioning conventional wisdom and finding new ways of doing things. The above 9-minute video is really enlightening, in particular the part regarding employee engagement. It was produced by Cognitive Media, who have done a lot of great videos in this format.

His book ‘Business Re-Imagined’ is really concise and can be read inside a few hours, but it has loads of good info and case studies. For me, the chapter on Netflix is the stand-out part of the book, where he talks about a ‘revolution in a slide show’. You can have a look at the slideshow he references, which is called ‘Freedom & Responsibility’, by using the following link: https://jobs.netflix.com/who-we-are.

‘Business Re-Imagined’ on Amazon

A better way to deliver data-dumps

A common task when working as a data-developer is to produce one-off data extracts for various users, especially if you are not working within a self-service model.

This is fine, but it can lead to a really frustrating situation where you get a request such as “please can you re-run this data extract, I need it within the next hour”.

Often the only information the requestor can give you is which developer ran it for them last time (and they have usually left the company). You then have to try to reverse-engineer the output and hope for the best.

To get around this, I use the following Excel template, which contains the output as well as a tab recording the SQL used, the server it was run against and the date it was produced. This allows for a quick and easy ‘refresh’ of the data and also means that the data is traceable back to source.

data_extract_template.xlsx

Note: I also have another version of this template with a number of macros that use keyboard shortcuts to quickly apply formatting options such as UK dates, but I cannot upload .xlsm files to WordPress. However, it is very easy to build yourself.
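If you would rather generate this kind of extract programmatically than copy the template around, a minimal C# sketch along the following lines does the same job – EPPlus is assumed here purely as an example library, and the query, server and file path are placeholders.

// Minimal sketch: run an extract and save it with a 'Lineage' tab recording the
// SQL, server and date. EPPlus is assumed purely for illustration; the query,
// server and file path are placeholders.
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using OfficeOpenXml;

class ExtractWithLineage
{
    static void Main()
    {
        var server = "MyServer";                              // placeholder
        var sql    = "SELECT TOP 100 * FROM dbo.SomeTable;";  // placeholder
        var file   = @"C:\Extracts\data_extract.xlsx";        // placeholder

        // run the extract into a DataTable
        var data = new DataTable();
        using (var conn = new SqlConnection($"Server={server};Database=MyDb;Integrated Security=SSPI"))
        using (var da = new SqlDataAdapter(sql, conn))
        {
            da.Fill(data);
        }

        ExcelPackage.LicenseContext = LicenseContext.NonCommercial; // required by EPPlus 5+

        using (var pkg = new ExcelPackage())
        {
            // tab 1: the output itself
            var output = pkg.Workbook.Worksheets.Add("Output");
            output.Cells["A1"].LoadFromDataTable(data, true);

            // tab 2: everything needed to re-run the extract later
            var lineage = pkg.Workbook.Worksheets.Add("Lineage");
            lineage.Cells[1, 1].Value = "SQL used";      lineage.Cells[1, 2].Value = sql;
            lineage.Cells[2, 1].Value = "Server";        lineage.Cells[2, 2].Value = server;
            lineage.Cells[3, 1].Value = "Date produced"; lineage.Cells[3, 2].Value = DateTime.Now.ToString("yyyy-MM-dd");

            pkg.SaveAs(new FileInfo(file));
        }
    }
}

The point is the same as the template: whoever picks the file up later can see exactly what was run, where, and when.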

Building a good team

“…and I know a few superstar programmers who could crank out a Fortran compiler by themselves in one week, and lots of programmers who couldn’t write the code to print the startup banner if they had six months.” – Joel Spolsky

Joel Spolsky runs StackOverflow and has some amazing articles. In this post I have abridged one of his articles on hiring quality staff and added my own emphasis.

The full article was written in 2006 and can be found here: http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html


On picking the right people:

You’re going to see three types of people in your interviews. At one end of the scale, there are the unwashed masses, lacking even the most basic skills for this job. They are easy to ferret out and eliminate, often just by asking two or three quick questions. At the other extreme you’ve got your brilliant superstars who write lisp compilers for fun, in a weekend, in Assembler for the Nintendo DS. And in the middle, you have a large number of “maybes” who seem like they might just be able to contribute something. The trick is telling the difference between the superstars and the maybes, because the secret is that you don’t want to hire any of the maybes. Ever.

Why am I so hardnosed about this? It’s because it is much, much better to reject a good candidate than to accept a bad candidate. A bad candidate will cost a lot of money and effort and waste other people’s time fixing all their bugs. Firing someone you hired by mistake can take months and be nightmarishly difficult, especially if they decide to be litigious about it. In some situations it may be completely impossible to fire anyone. Bad employees demoralize the good employees. And they might be bad programmers but really nice people or maybe they really need this job, so you can’t bear to fire them, or you can’t fire them without pissing everybody off, or whatever. It’s just a bad scene.

OK, I didn’t tell you the most important part—how do you know whether to hire someone?

In principle, it’s simple. You’re looking for people who are

Smart, and

Get things done.

People who are Smart but don’t Get Things Done…have a tendency to show up at your office, coffee mug in hand, and try to start a long conversation about the relative merits of Java introspection vs. COM type libraries, on the day you are trying to ship a beta.

People who Get Things Done but are not Smart will do stupid things, seemingly without thinking about them, and somebody else will have to come clean up their mess later. This makes them net liabilities to the company because not only do they fail to contribute, but they soak up good people’s time.


On how to interview properly:

The worst kind of interviewer is the blowhard. That’s the kind who blabs the whole time and barely leaves the candidate time to say, “yes, that’s so true, I couldn’t agree with you more.” Blowhards hire everyone; they think that the candidate must be smart because “he thinks so much like me!”

The second worst kind of interviewer is the Quiz Show Interviewer. This is the kind of person who thinks that smart means “knows a lot of facts.” They just ask a bunch of trivia questions about programming and give points for correct answers. i.e.

“What’s the difference between varchar and varchar2 in Oracle 8i?”

This is a terrible question. There is no possible, imaginable correlation between people that know that particular piece of trivia and people that you want to hire. Who cares what the difference is? You can find out online in about fifteen seconds! Remember, smart does not mean “knows the answer to trivia questions.”

Anyway, software teams want to hire people with aptitude, not a particular skill set…it’s better to hire people that are going to be able to learn any new technology rather than people who happen to know how to make JDBC talk to a MySQL database right this minute.

I am very, very careful to avoid anything that might give me some preconceived notions about the candidate. If you think that someone is smart before they even walk into the room, just because they have a Ph.D. from MIT, then nothing they can say in one hour is going to overcome that initial prejudice. If you think they are a bozo because they went to community college, nothing they can say will overcome that initial impression.


On how to identify high-calibre individuals:

One: Look for passion. Smart people are passionate about the projects they work on. They get very excited talking about the subject. They talk quickly, and get animated. Being passionately negative can be just as good a sign. “My last boss wanted to do everything on VAX computers because it was all he understood. What a dope!” There are far too many people around that can work on something and not really care one way or the other. It’s hard to get people like this motivated about anything.

Two: Good candidates are careful to explain things well. I have rejected candidates because when they talked about their previous project, they couldn’t explain it in terms that a normal person could understand. Often CS majors will just assume that everyone knows what Bates Theorem is or what O(log n) means. If they start doing this, stop them for a minute and say, “could you do me a favor, just for the sake of the exercise, could you please explain this in terms my grandmother could understand.” At this point many people will still continue to use jargon and will completely fail to make themselves understood. Gong! You don’t want to hire them, basically, because they are not smart enough to comprehend what it takes to make other people understand their ideas.

Three: Look for signs that they took a leadership role. A candidate might say, “We were working on X, but the boss said Y and the client said Z.” I’ll ask, “So what did you do?” A good answer to this might be “I got together with the other members of the team and wrote a proposal…” A bad answer might be, “Well, there was nothing I could do. It was an impossible situation.” Remember, Smart and Gets Things Done. The only way you’re going to be able to tell if somebody Gets Things Done is to see if historically they have tended to get things done in the past. In fact, you can even ask them directly to give you an example from their recent past when they took a leadership role and got something done—overcoming some institutional inertia.