I've been playing around with Excel an awful lot lately, which is always guaranteed to cheer me up. Time, then to add some more little discoveries to the pile of Excel-based bits and bobs I've put your way in the past (specifically
here,
here and
here).
Blank skip
How many times have you been infuriated by empty rows getting in the way of good analysis? Perhaps the answer to that question is "none", in which case "clever you". Me, though; I'm always getting myself into situations where for whatever reason I find empty cells getting in my way. One example is to be found in my personal accounts, where I have daily entries, and then weekly and monthly summaries appearing down the side. These summaries are cast adrift amidst rows of emptiness, and that can have implications when it comes to wanting to do anything very involved (graphs, for instance, can be a pain). There are ways around this: I could producing running figures down the page, but that would be ugly and would defeat the idea of the summary; I could copy and paste somewhere else, skipping blanks or sorting them away. That would look better but it would have dynamic repercussions, and I want to perform calculations using live data rather than a copy and paste of a particular point in time. What I need is an array formula.

In this example I have three columns: A contains the days of the week (my x-axis, if you like), while B and C contain data entered on certain days. To abbreviate that table to just those rows that have data in column B, I've popped the following into D1:
=INDEX(
$A:$C,SMALL(
IF($B:$B<>"",ROW(
$B:$B)),
ROWS(D$1:D1)),
COLUMNS($D1:D1))
Most importantly (and this is the weird and counter-intuitive bit) you need to press control, shift and enter at the same time (rather than just enter) to make this formula stick properly. Doing this will add those curly brackets round things ({}) and make everything work as if by magic. Infuriatingly, just typing the curly brackets doesn't work. It's all about the ctrl-shift-return.
Once you've ^⇧↵ed into D1, you can expand the formula across the rest of the cells in the usual way (and colour everything all rainbow-like, if that's how you roll). There are errors in D5-F7, because there's no data to feed them, but that can be sorted out with an =IFERROR in the right places if it bugs you.
So what is the formula doing? Let's pull it apart:
The bones of the formula is a simple cell reference from a defined array (the first three columns):
=INDEX(array, row number, column number)
The column number bit is a simple column-counting formula that will increase as we drag across E and F. In D1, the value is 1, in E1 it is 2, and in F1 it is 3.
The row number is more involved because this is the bit that's doing the hard work. There's a conditional formula in a SMALL formula that relies on the row number. The condition is that we should return the row number only if the value of a cell in B is greater than or less than blank. The rest of the SMALL formula states that we're only interested in the kth returned value, where k corresponds to the row we're on (so if we're on row 1, which D1 is, then we're after the first value where the value of B is greater than or less than nothingness).
Doing all that with a normal formula won't get you very far, because you'd be unable to keep count of those successfully met conditions (you could keep a running tally, but we're trying to keep our sheet clean of that sort of thing), but turning it into a magic ^⇧↵ array formula lets Excel tot up various details across a whole range of cells.
Array calculations do make Excel have to think that little bit harder, so you might want to avoid whole-column arrays like the ones I've used.
Offset Mapping
You know what's really irritating? When you've got a lot of data in Sheet2 that's reliant on the data in Sheet1, you delete a row from Sheet1 and everything on Sheet2 goes all error-y. Likewise if you sort Sheet1 in a new way and Sheet2 becomes a befuddled mess. What you need is to break the link between a cell in Sheet1 and a cell in Sheet2 while still carrying over the data. What you need is to be able to map to a particular cell, regardless of its contents. One way of doing it is to use INDEX (like in the example above), for which you must establish an array range. But perhaps easier is to use OFFSET. While INDEX pinpoints a location within an established array, OFFSET pinpoints a location relative to an origin (e.g. cell A1). The difference is subtle but slightly more flexible.
If we want to reproduce the contents of any cell in Sheet1 in a corresponding cell location on Sheet2 (without tying the two together in a relative formula), we can use:
=OFFSET(Sheet1!$A$1,ROW()-1,COLUMN()-1)
The ROW() and COLUMN() bits get the current coordinates, and these are applied with reference to cell A1 on Sheet1 (the "-1"s account for the fact that A1 takes up a cell of physical space). The offset location can be anything we want it to be, as, for that matter, can the offsets (the example above is just a convenient starting-place).
I'm currently using this to make reading lists. I can use one sheet as a sandpit (inserting and deleting rows, and shuffling content about the place), then have a second sheet which takes a snapshot of that content and arranges it in a pretty way without me having to do anything. It's my favourite thing in Excel at the moment.
The Name Game
Ok, this one's a brag more than a formula...
I just mentioned that I'm using Excel to generate reading lists, and I thought it would be wise to come up with a set of formulae to standardise whatever I put in and turn it into a Harvard citation. On the face of it, this shouldn't be too tricky. We can all look at a citation and work out what's what and how we might move the commas and brackets about to switch from one referencing style to another. Why a program like EndNote makes the process of importing citations such hard work I'm not entirely sure, given that even allowing for the vast wealth of different formats and models, the basic codes are all variations on a theme and surely a reference manager could work out which it was looking at if it could but really be bothered to try. But that's a different quibble. What I'm wanting to do is take a pre-existing citation of indeterminate style, dismantle it into its constituent (citation-manager friendly) parts and reassemble them. Unsurprisingly, the hardest bit of that is the identification of the dismantled parts. The presence of numbers, capitals, and various forms of punctuation are a clear help in this regard, whatever style is in play. Isolating the publication date is very easily done, and things start to unravel nicely from there because there are only so many ways you can describe the elements of a citation (at least in the common forms and for the common formats, which is what I'm playing with). A title is a title and varies very little as a consequence. Even volume and issue numbers follow some sort of discernibly extractable logic. In fact, there is only one part of your general, generic, stylistically vague common-all-garden reference that poses an especially pressing problem to a poorly programmed computer and that is the author name. Having extracted every other necessary building block, I was not to let myself be beaten by a few names.
There are a number of reasons why names cause trouble. They can be awash with commas and full stops, performing a range of different and often contradictory roles: commas between surnames and forenames, commas between collaborating authors; full-stops after each initial, or all initials, or none. Which brings us then to the swirling mess of first names, surnames, middle names, initials etc. But as humans we can look at these things and work our way through them with not too much trouble, so surely a computer can too. Admittedly, part of the human advantage is a reference database of typical forenames and surnames: something which could be emulated in Excel if needs be, but it seems an extreme length to have to go to. There must be a simpler way.
Well, reader, I managed it. Having already isolated the name portion of the original citation, I then dismantled it word by word, classing a comma as a word but eliminating full stops from the equation, to give something like:
Smith , A , Jones , B and McDonald , C
Now to work out what each cell actually is. Commas and "and"s are easy to determine. Initials were identified using something like:
=IF
(OR
(LEN(A1)=1,
IFERROR(CODE(MID(A1,2,1))=CODE(UPPER(MID(A1,2,1))),FALSE)),"I1,""
)
...in other words, if the length of a cell is one character, or if the second character is upper-case, consider the cell to contain one or more initials.
This is a good starting point and will serve us in good stead. The use of "I1" is to indicate the first instance of initials. Later initials can be given "I2", "I3", etc. This will help when it comes to piecing things back together properly.
Initials are easy. Forenames are pretty easy too, if they immediately follow a comma. An entry that goes SURNAME COMMA FORENAME is easy to tag. SURNAME COMMA FORENAME COMMA SURNAME COMMA FORENAME... is slightly more troublesome, but here maths is our friend: if we assume some sort of internal consistency, then the first comma indicates a forename, the second a new surname, etc.
Sometimes, commas are used more sparingly, eg. SURNAME INITIAL, SURNAME INITIAL, etc., but here we have the benefit of an initial to help us through the soup. Thankfully, SURNAME FORENAME is not something that tends to appear in the occidental world, so we can probably apply the assumption that nothing like that will bother us.
From these foundations, it's quite easy to put together some formulae to draw out the names we need in the majority of cases. But some people have really awkward names. And some people aren't people at all. People with two forenames or two surnames are especially troublesome. Again, commas (and "and") are our friends here. And we can begin to string together some really messy formulae based on relative positions of our respective bits of names. There's a lot of trial and error to this, but after some messing about it's possible to get something together that works almost universally (I've run my current version across about 400 citations and it seems to be working ok). Then it's just a case of glueing everything back together again.
Here's a link to a very-much beta version. The gubbins is very messy trial-and-error stuff, and I'm sure you'll be able to break it quite easily if you try, but it might give you a place to start should you ever require something similar.