- Technology and analytics
When working with data, you often need to know how many unique items you have in a list. For example, you might wish to know how many customers you have in your database, how many products you can offer to distributors, or all the countries or geographical regions you make sales in.
The required data is frequently stored in tables or lists where duplication is rife. Therefore, how do you “retire” the replicants? Here, I consider three approaches, reviewing the advantages and disadvantages of each.
First, let’s consider my data, highly tailored for the purposes of this article:
A cursory glance may help you glean the information you require: Upon inspection, there are six unique items in the screenshot, not including the heading. But how do we get Excel to confirm this total? I present three alternatives, all detailed in the downloadableExcel file.
Option 1: Using dynamic arrays
I can employ the dynamic array formula UNIQUE. That would be an obvious start, considering the title of this article. I assume the range (excluding the heading) is calledExample1.
Dynamic array formulas are calculations that use a function that will automatically extend its range depending upon the quantum of the results. This automatic extending is known asspillingand, although it potentially produces an array (a range of results that may encapsulate both more than one row and more than one column), it does not have to be entered using CTRL+SHIFT+ENTER, which is how array formulas needed to be entered in the past.
The ironic thing about UNIQUE is that it does two things: It details distinct items (ie, provides each value that occurs with no repetition), and also it can return values that occur once and only once in a referred range. It is the former feature we require here.
The UNIQUE function has the following syntax:
=UNIQUE(array, [by_column], [occurs_once])
It has three arguments:
•array:This is required and represents the range or array from which to return unique values.
•by_column:This argument is optional. This is a logical value (TRUE/FALSE) indicating how to compare. If you wish to compare by row, the argument should be FALSE or omitted (since this is the default). To compare by column, you will need to select TRUE.
•occurs_once:This argument is also optional. It requires a logical value, too:
•TRUE:Only return unique values that occur once.
•FALSE:Include all distinct values (default if omitted).
It may sound complicated, but in truth, it isn’t. Generating a unique list fromExample1is straightforward:
is both simple and intuitive to use, producing the list of the six  unique items in the order they are encountered (they are not sorted automatically — use the dynamic array function SORT to achieve this). Once this has been derived, all we need to do is count the items in the list. COUNTA will achieve this as it counts the number of nonempty cells in a range (see the screenshot “Using the UNIQUE Function to Count Unique Items in a List”).
Using the UNIQUE function to count unique items in a list
You should note that here the UNIQUE formula is in cell F32 and spills down the range F32:F37. Highlighting this range will result in Excel displaying it as F32#, the hash/pound sign (#) denoting that the range may vary. Therefore,
counts the spilled range emanating from cell F32 and hence totals the six  unique items. You should note that blanks will appear as “0” in the range as will zeroes, but they will be treated as twodifferentunique items, which is quite useful.
On this occasion, the formulas may be condensed (or “nested”):
Nesting array formulas does not always provide the required results due to the way Excel’s calculation engine works (this is discovered by using the universally and scientifically acclaimed approach known as “trial and error”), but in this instance it will.
This method is remarkably simple and should be understood by the majority of Excel users. However, it’s not all peaches and cream: Dynamic arrays are only available in Excel 365 and Excel 2021 presently, so this is not available to all. Call me old fashioned, but many get upset if they see #NAME? instead. Therefore, this solution is useful only when all end users have dynamic array formulas at their disposal.
So, what alternatives may we consider instead?
Option 2: Using PivotTables
Everyone loves a good PivotTable, right? Since creating tables is firmly entrenched in the spreadsheeting software, there is no need to worry about version compatibility using a cross-tab query.
In this instance, I will first convert my source data into a Table (usingInsert -> Tablefrom the Ribbon or else the keyboard shortcut CTRL+T). This allows the range to be automatically extended, without using those fancy dynamic arrays (see the screenshot “Converting Source Data Into a Table”).
Converting source data into a Table
On the Ribbon, in the context-specific tabTable Design(ie, when you select one or more cells of the Table), you will note I have named this Table Example2.
Next, I highlight one or more cells in this Table and selectInsert -> PivotTablefrom the Ribbon (the keyboard shortcut varies depending upon the version of Excel you have, but typically starts ALT+N+V).
I now view thePivotTable Fieldspane. If it hasn’t shown up automatically, right-click in the resulting PivotTable and select the final option,Show Field List(it’s annoying that Excel refers to it as something else in the pop-up shortcut menu). Then, simply move our only field (List) to theRowsarea (see the screenshot “PivotTables Fields Pane”).
This produces the following PivotTable:
This list will be sorted alphanumerically by default. Now, we simply count the number of nonblank items in this list. If you had a blank item in your original list, don’t worry, it will still appear in the PivotTable as (blank) so will be treated as, er, nonblank (see the screenshot “Using PivotTables to Count Unique Items in a List”).
Using PivotTables to count unique items in a list
There are a couple of drawbacks with this approach:
- You should note that the COUNTA formula may need to include a larger range than is filled by the PivotTable. This is in case the range extends when the data is refreshed. This may cause issues if end users add other data to this worksheet.
- If the source data changes, the PivotTable must be refreshed: The COUNTA formula will not necessarily provide the right answer until this action is performed. Many users forget to do this.
Therefore, the idea is simple. However, although it will work in all current versions of Excel, end users may forget to refresh the data should the source list be updated. So, what alternative do we have?
Option 3: Using SUMPRODUCT
Regular readers will know that SUMPRODUCT is one of my favourite functions in Excel, so much so that I named our company after it.
The function is highly underestimated. Consider the example in the screenshot “Example Use of the SUMPRODUCT Function”.
Example use of the SUMPRODUCT function
Here, I have various pricing points and the corresponding quantities sold. To calculate my total sales, I can compute my sales by taking theproductofUnit Pricemultiplied byQuantityon a line-by-line level and thensummingthem. As you can see, SUMPRODUCT does it all in one go:
But SUMPRODUCT is more powerful than that. The formula
does exactly the same thing. However, consider
Take a look at this revised example in the screenshot “Revised Example of the SUMPRODUCT Function”.
Revised example of the SUMPRODUCT function
Do you see how SUMPRODUCT divides on a record-by-record basis? This is powerful, and it is this concept that I shall use for our final method to be employed, using our list now conveniently calledExample3(see the screenshot “Using SUMPRODUCT to Count Unique Items in a List”).
Using SUMPRODUCT to count unique items in a list
Here, I have used the formula
Clearly, this seems to work, although its logic may be a little less transparent than the other two approaches upon first glance. The initial condition:
checks whether the rangeExample3contains nonempty cells (TRUE if so, FALSE otherwise). Here, it does not need to be nonblank — it just needs to beanythingthat cannot occur in the list in this scenario. You could just substitute this for one  should you wish, but I wanted to demonstrate how this might work if you wished to exclude blank cells. This gives us:
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
The second part, COUNTIF(Example3,Example3&””)
uses one of the more unusual ways of using COUNTIF. Again, it returns an array, but this time each value in the array represents a count of the numbers in the array using each value of the array as a criterion (the &”” addendum merely coerces the value to a text string, which may be required in certain instances). This results in:
3, 2, 1, 3, 2, 1, 2, 2, 3, 1
ie, there are three  instances of “a”, two  of “b”, and so on. The numerator is then divided by the denominator on an item-by-item basis to give us:
0.33, 0.5, 1, 0.33, 0.5, 1, 0.5, 0.5, 0.33, 1
When used in mathematical operations, TRUE behaves like one  and FALSE behaves as if it were zero . These results are then summed together to give us six . Easy when you know how.
The advantage of this approach is that it neither requires dynamic arrays nor data refreshing. The problem is the calculation is a little opaque. No solution is perfect, but this final option may prove to be the best all-rounder.
Word to the wise
Some of you may be surprised that I did not use Power Query/Get & Transform as one of the options above, since removing duplicates is a base transformation in the Power Query Editor. For the arbitrary purposes of this article, I merely wanted to consider basic Excel features and functions.
Indeed, Get & Transform may be used, too. It is a great method for cleansing data where there may be surplus spaces (trimming), non-printable characters (cleaning), or a preponderance of haphazard upper/lowercase lettering. However, similar to the PivotTables solution detailed above, this requires data to be refreshed in order to be updated, which many end users forget. Therefore, given the “behind the scenes” nature of this option, I chose to discount it in this instance. Nonetheless, this extract/transform/load (ETL) tool should be viewed as an essential tool in every modeller’s armoury. It’s merely a case of knowing which may work best when.
Liam Bastick, FCMA, CGMA, FCA, is director of SumProduct, a global consultancy specialising in Excel training. He is also an Excel MVP (as appointed by Microsoft) and author ofIntroduction to Financial ModellingandContinuing Financial Modelling.Send ideas for future Excel-related articles to him email@example.com. To comment on this article or to suggest an idea for another article, contact Oliver Rowe atOliver.Rowe@aicpa-cima.com.