Diving into T-SQL Grouping Sets

The Grouping Sets feature was added to the T-SQL GROUP BY clause with the release of SQL Server 2008. Despite its usefulness, I’ve not come across many (or any) of them in the wild at any of the clients I’ve worked at in the past few years. Here is a primer on how to use them, using the AdventureWorks OLTP database.

SQL Server 2005 Style

In SQL Server 2005, we could use the WITH ROLLUP syntax to add an aggregated total row:

SELECT
    st.Name ,
    SUM(soh.TotalDue) AS TotalDue
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom')
GROUP BY
	st.Name WITH ROLLUP;

WITH ROLLUP

The WITH ROLLUP and WITH CUBE features will be removed in a future version of SQL Server.

Basic Grouping Sets

With SQL Server 2008 onwards, we can use the GROUPING SETS syntax to achieve the same results. The grouping set is a defined grouping of data in a query. Each grouping set is defined by placing the columns in parentheses. An empty group, (), will show an aggregation across all sets. Here we define a grouping set for territory name, and an overall aggregation:

SELECT
    st.Name ,
    SUM(soh.TotalDue) AS TotalDue
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom')
GROUP BY
	GROUPING SETS ((st.Name), ());

Basic Grouping Set

Another way of expressing this is to use GROUP BY ROLLUP():

SELECT
    st.Name ,
    SUM(soh.TotalDue) AS TotalDue
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom')
GROUP BY
	ROLLUP (st.Name);

GROUP BY ROLLUP

Adding another column into the Grouping Set

Grouping Sets obviously become more useful when dealing with multiple columns for grouping. In this next query, we’ll add in a column for the order year, and define two grouping sets; one for the territory name and one for the year.

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.Name), (YEAR(soh.OrderDate))) ;

Grouping sets with two columns

Let’s expand this out and amend the second set to include the territory name. This will return aggregates rows for both the name and year and the name alone.

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.Name), (st.Name, YEAR(soh.OrderDate))) ;

Grouping Sets with two columns in one set

The overall aggregate can be returned by using the empty grouping set.

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.Name), (st.Name, YEAR(soh.OrderDate)), ()) ;

Grouping sets with overall total

To group on everything, all columns and combinations can be defines in their own grouping sets:

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.Name), (YEAR(soh.OrderDate)), (st.Name, YEAR(soh.OrderDate)), ()) ;

Using grouping sets to group on everything

Using the GROUPING function

We can identify whether a column has been aggregated by the use of grouping sets (or ROLLUP and CUBE) by using the GROUPING function. It will return 1 for aggregated or 0 for not aggregated. The following query uses this to identify if the territory name column has been aggregated:

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue,
    GROUPING(st.Name) AS TerritoryGrouped
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.Name), (YEAR(soh.OrderDate)), (st.Name, YEAR(soh.OrderDate)), ()) ;

Using the GROUPING function

Using the GROUPING_ID function

The GROUPING_ID function in T-SQL computes the level of grouping. The grouping columns can be passed in to return an INT which will show the level. For example:

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue,
    GROUPING(st.Name) AS TerritoryGrouped,
    GROUPING_ID(st.Name, YEAR(soh.OrderDate)) AS GrpLevel
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.Name), (YEAR(soh.OrderDate)), (st.Name, YEAR(soh.OrderDate)), ()) ;

Using the GROUPING_ID function

This can be used to filter rows on certain levels by using the HAVING clause:

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue,
    GROUPING(st.Name) AS TerritoryGrouped,
    GROUPING_ID(st.Name, YEAR(soh.OrderDate)) AS GrpLevel
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.Name), (YEAR(soh.OrderDate)), (st.Name, YEAR(soh.OrderDate)), ()) 
HAVING
	GROUPING_ID(st.Name, YEAR(soh.OrderDate)) = 3 ;

Using GROUPING_ID to filter rows

Another use of this is for labelling the grouping rows:

SELECT
    st.Name ,
    YEAR(soh.OrderDate) AS Yr ,
    SUM(soh.TotalDue) AS TotalDue,
    GROUPING(st.Name) AS TerritoryGrouped,
    CASE GROUPING_ID(st.Name, YEAR(soh.OrderDate)) 
		WHEN 0 THEN 'Territory & Year Total'
		WHEN 1 THEN 'Territory Total'
		WHEN 2 THEN 'Year Total'
		WHEN 3 THEN 'Overall Total'
    END	AS GrpLevel
FROM
    Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
WHERE st.Name IN ('Germany', 'United Kingdom') AND soh.OrderDate >= '20070101'
GROUP BY
	GROUPING SETS ((st.NAME), (YEAR(soh.OrderDate)), (st.Name, YEAR(soh.OrderDate)), ()) ;

Using GROUPING_ID to label rows

Grouping sets and grouping functions add some powerful functionality to the standard GROUP BY clause. Let’s hope I see them being used more in the future. 🙂

References and further reading:
SQL Server Books Online: GROUP BY
SQL Server Books Online: GROUPING
SQL Server Books Online: GROUPING_ID

Case-sensitivity in a T-SQL WHERE clause

I was asked today if it was possible to use a case-sensitive LIKE statement in a WHERE clause. The default SQL Server collation is case-insensitive, but the collation used can be specified in a query. To return the collation of the current database:

SELECT DATABASEPROPERTYEX(DB_NAME(),'Collation');

In my case, this returned Latin1_General_CI_AS. The case-sensitive version of this is Latin1_General_CS_AS. To use this collation for a column, specify it after the column using the COLLATE statement. For example, I have a table called Groups, with 2 rows. The first row has a GroupName of “Admin”, and the value in the second row is “Standard”. The following query returns both rows:

SELECT GroupID ,
    GroupName
FROM dbo.Groups
WHERE GroupName LIKE '%a%'

Results:

GroupID	GroupName
1	Admin
2	Standard

However, when we use the COLLATE statement to use the case-sensitive collation, only the second row is returned:

SELECT GroupID ,
    GroupName
FROM dbo.Groups
WHERE GroupName COLLATE Latin1_General_CS_AS LIKE '%a%'

Results:

GroupID	GroupName
2	Standard

Practical uses for this could include exact string searches and password matching.

T-SQL to drop all Foreign Keys that reference a table

In a recent data migration project, I had a task to re-number some primary key columns for certain tables, so that there was no key overlap when a number of systems were combined. Part of this process involved identifying and dropping all the foreign keys that referenced the primary key on the tables that were being changed, then adding them again after the updates were completed. To save a bit of time, I knocked up this little script to generate the DROP and ALTER statements for the referencing foreign keys for a given table.

SELECT
    'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(fk.parent_object_id)
		+ ' DROP CONSTRAINT ' + fk.NAME + ' ;' AS DropStatement,
    'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(fk.parent_object_id)
	+ ' ADD CONSTRAINT ' + fk.NAME + ' FOREIGN KEY (' + COL_NAME(fk.parent_object_id, fkc.parent_column_id) 
		+ ') REFERENCES ' + ss.name + '.' + OBJECT_NAME(fk.referenced_object_id) 
		+ '(' + COL_NAME(fk.referenced_object_id, fkc.referenced_column_id) + ');' AS CreateStatement
FROM
	sys.foreign_keys fk
INNER JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
INNER JOIN sys.schemas s ON fk.schema_id = s.schema_id
INNER JOIN sys.tables t ON fkc.referenced_object_id = t.object_id
INNER JOIN sys.schemas ss ON t.schema_id = ss.schema_id
WHERE
	OBJECT_NAME(fk.referenced_object_id) = 'account'
	AND ss.name = 'dbo';

A Couple Of T-SQL Quickies

It’s been a while since I’ve posted any SQL Server content, so in a bid to get back into the swing of things, here are a couple of very simple, but useful, T-SQL commands. I’ve used these recently in a large project I’m working on.

Identify all tables with no Foreign Key constraints:

select distinct
t.name
from sys.tables t
left outer join sys.foreign_keys fk on t.object_id = fk.parent_object_id
where fk.object_id is null
order by t.name;

List all tables in a database, with row counts

select t.name ,
ps.row_count
from sys.dm_db_partition_stats ps
inner join sys.tables t on ps.object_id = t.object_id
where ps.index_id < 2 order by t.name;

Goodbye Posterous, Hello WordPress

I’ve been a Posterous user for over 3 years, having used them to host all my sparsely updated blogs. It came as a bit of a surprise this week when their founder Sachin Agarwal announced that the company had been bought by Twitter. From the press releases, the acquisition appears to be more about buying the people rather than the product. All good for the Posterous staff I think, I’m pleased for them if it works out. However, where does it leave the users? This particular statement on the announcement makes me think that all development on the current Posterous Spaces platform will stop and will eventually get killed off.

Posterous Spaces will remain up and running without disruption. We’ll give users ample notice if we make any changes to the service. For users who would like to back up their content or move to another service, we’ll share clear instructions for doing so in the coming weeks.

A company actively encouraging users to backup and find another service is a sure sign this thing isn’t going to be around for much longer.

As it turns out, this was fairly good timing for me. I’d been considering moving blogging platforms for a while, but with no immediate need to do so, I’d put it off. This announcement gave me the push required.

While I’ve been a fan of Posterous, I’ve also been frustrated by its lack of flexibility. In addition, I also wanted more control over my data, I wanted to own it rather than have it in the hands of someone else. This has been an issue for me with many services over the past couple of years. For example, the proprietary format of notes in Evernote led me to start using Simplenote and Notational Velocity, for the simple reason that I can move my data elsewhere if required. So when it came to choosing a new blogging platform, the key thing was self-hosting. The natural choice was WordPress, it being so ubiquitous and easy to install. Using this guide by SQL Server guru Brent Ozar, I was up and running in no time.

I’ve consolidated both my personal and SQL Server blogs into one, all the SQL content can be found here. Now I just need to get back to writing again…but this will probably have to wait till I finish my Creative Writing Open University course in the summer. My words are at a premium!

I chat, I message

Since I stopped using Windows as an OS for personal use, almost four years ago, I also gave up using IM chat programs. Back then, I was a fairly big user of MSN Messenger, but when I found the Mac version at that point to be flaky to say the least, I ditched it completely. And I didn’t miss it at all. Apart from using Skype for remote work purposes, I’ve shied away from almost all other chat programs.

On Thursday last week, I had read an article on Lifehacker about IM app Adium, and various plugins that were available for it. I decided to fire it up and connect it to my Google and Facebook accounts. To give me something to compare it against, I did the same with iChat – an app I’ve never clicked on since the day I got my first MacBook Pro. I found both apps to be pretty much similar in functionality, certainly for my uses, there was nothing to compel me to use one over the other.

Later that day, Apple announced Mountain Lion, their upcoming version of OS X. Included in this version will be a number of iOS-like features. One of them being Messages, the new version of iChat. The beta was released that day. Quite the coincidence I thought. Messages has one very compelling feature: integration with iMessage for iOS. This means that any iMessage conversations on iPad or iPhone can be carried on in Messages and vice versa. Having used this for a few days I’m certainly liking it. It puts iMessage at the centre of instant communication for Apple users. More importantly it means I can use the full size keyboard rather than having to tap away on the iPhone screen all the time!

This kind of app convergence across desktop and mobile devices is certainly seems to be the way forward. While it has it’s detractors, I’m all for it, and excited by it. Although it could do with some consistency – will the iMessage app on iOS open up and start supporting other IM protocols, such as Jabber and AIM? I’d like to think so.

If instant messaging becomes ubiquitous across devices and operating systems, the question begs, where does this leave email and SMS? 

October Playlist – The Exact Feeling

October was an excellent month for new album releases, the pick of the bunch being Bad As Me by Tom Waits. Unfortunately this isn’t on Spotify as yet, so the playlist only includes the second single, ‘Back In The Crowd’ – probably the weakest track on the record. Anyway, you should all be buying it, as it is an absolute belter of an album. Another record notable by it’s absence from Spotify is the new Richmond Fontaine release, The High Country. I’ve been listening to it a lot, a fine example of storytelling via music.

The new Ryan Adams record Ashes & Fire is also a fine return to form, in the vein of the excellent Heartbreaker album, his debut from 2000. Ryan appeared on one of the BBC Four Songwriters Circle shows with Neil Finn from Crowded House – a nice reminder (as if I needed one) of how good a lyricist he is too. I was particularly pleased to see Paul Brady on another episode of said show, an extremely under-rated musician.

As is almost always the way, I eventually got into the new Wilco record, The Whole Love. These always take a few listens to sink in, they’re not noticeable for their immediacy. But great songs and outstanding musicianship yet again.

The playlist is completed by a couple of funk tracks by Gil Scott-Heron and Rick James, one of my favourite songs by the late great Bert Jansch, some Judie Tzuke (who I’ve been listening to a fair bit since hearing her on the John Martyn tribute record), and The Tragically Hip – a band I never tire of.

The playlist is here.

  1. Nobody Wants To – Crowded House
  2. The Exact Feeling – The Tragically Hip
  3. Lucky Now – Ryan Adams
  4. Black Moon – Wilco
  5. Living For The Corporation – Paul Brady
  6. Stay With Me Till Dawn – Judie Tzuke
  7. Back In The Crowd – Tom Waits
  8. Waiting For The Axe To Fall – Gil Scott-Heron
  9. Mary Jane – Rick James
  10. Born Alone – Wilco
  11. The Black Swan – Bert Jansch
  12. Kindness – Ryan Adams
  13. Freak Turbulence – The Tragically Hip

September Playlist – Sentimental Man

I’ve been unable to think of a word to describe the sort of music I’ve been listening to recently. It’s been more laid-back, less rocky and with little of the country/blues roots that I normally have on the go. I stumbled across live releases by Lloyd Cole and Counting Crows, both of which are excellent. The latter is a live version of their debut record, August And Everything After, one of my favourite records. This month’s new find was the Boston-based bluegrass band, Crooked Still, whose 2010 release, Some Strange Country includes a nice version of The Rolling Stones’ ‘You Got The Silver’. I’ve been re-listening to the back catalogue of The Blue Nile, totally in love with Paul Buchanan’s vocals right now.

I still can’t get enough of John Martyn. A tribute record to the big man, Johnny Boy Would Love This, was released this year, and there are some absolute crackers on it (even the inclusion of Phil Collins isn’t enough to put me off!), but the best by far is the version of ‘Hurt In Your Heart’ by Judie Tzuke. A very tough song to cover, but she really nails it here.

My own band Fake Gods are now on Spotify, so have cheekily added one of our tracks to the set too. 🙂

The Spotify playlist is over here, enjoy!

  1. Perfect Skin – Lloyd Cole & The Commotions
  2. Prospect Street – The Big Dish
  3. Because Of Toledo – The Blue Nile
  4. Sentimental Man – The Blue Nile
  5. Couldn’t Love You More – John Martyn
  6. Hurt In Your Heart – Judie Tzuke
  7. Anna – Brendan Campbell
  8. You Got The Silver – Crooked Still
  9. Turning Away – Crooked Still
  10. Wanderlust – Love & Money
  11. Rain King/Thunder Road – Counting Crows
  12. What’s The Frequency Kenneth? – REM
  13. On A Sea Of Fleur-De-Lis – Fake Gods

Kindling

So Amazon announced their new Kindle range this week, which included the new Android-based tablet, the Kindle Fire. I really like the current Kindle device, but I’m not sure how I feel about the new products yet. It seems a touch half-baked to me, and there are some design decisions I really don’t understand.

iPad Competitor

In the non-techie media there has been much talk of how the Fire is a competitor to the iPad. I believe this thinking is all wrong. The Fire is purely a media consumption device. Whereas the iPad can be used as a device to create content as well as consume it. The two are not the same, although they are broadly marketing to the same people. I do think the Fire could impact iPad sales, but not to the extent that some so-called experts believe. The Fire to me is more like a portable media platform rather than a mobile computer.

Pricing

A lot has been made of the $199 price tag for the Fire. No doubt it’s an excellent pricing point. Likewise, the entry model being £89/$79 puts that into the “disposable” device category. But as Dave Caolo of 52 Tiger points out, the cost price is a touch disingenuous, due to the fact that the device does not even ship with a charger.

Design

The new Kindles have done away with the hardware keyboard, a great move in my opinion. It’s one of the worst things about the current Kindle, making it look rather dated. The base Kindle model is now 30% lighter, and is a touch smaller. However, this change has impacted the battery life, which is halved to one month, storage space (also halved to 2GB) and means the device no longer has any audio capability. It’s an interesting trade-off, given that the current Kindles are extremely light anyway, did they really need to make it any less heavy? I always thought the two month battery life was a huge selling point for a dedicated book reader.

I’ve not mentioned the Touch yet – the new $99 ($149 for 3G) touch screen Kindle. I’m not entirely sure where this model fits in to the scene. It’s a middle ground that I don’t really get. The Touch and the Fire both introduce a design change that makes zero sense to me – the buttons on the sides to turn the pages have been removed, so page navigation is done purely by the touch screen. The easy-to-access hardware buttons gave the Kindle a usability that reading apps on the iPad, for example, could not compete with. I really don’t understand the thinking behind this move.

Amazon have done some really good and exciting things this week with these announcements, I reckon they’ll sell a shed load of the base and Fire models in the run up to the holiday season. I’m more tempted now for the base Kindle myself – as a dedicated “one thing well” reading device, it’s at a compelling price for a very good product. I’ll be interested to see the Fire close up and get a feel for it, although it’s not something I’d personally be interested in (as an iPad owner). When the new devices start shipping next month, we’ll start to see how consumers feel about the design changes, and whether Amazon is onto another winner with the Fire.