Exchange mailbox quotas and a ‘paradox of thrift’

The study of economics throws up some fantastic names for concepts or economic models, some of which have become part of the standard lexicon, such as the Law of Diminishing Returns, or the concept of opportunity cost, which I’ve written about before.


Though it sounds like it might be something out of Doctor Who, the Paradox of Thrift is a Keynesian concept which basically says that, contrary to what might seem obvious, saving money (as in people putting money into savings accounts) might be bad for the economy: if people saved more and spent or invested less, it would reduce the amount of money in circulation and cause the economic system to deflate. There’s a similar paradox in managing mailbox sizes in Exchange – from an IT perspective it seems like a good thing to reduce the total volume of mail on the server, since it costs less to manage all the disks and there’s less to back up and restore.


Ask the end users, however, and it’s probably a different story. I’ve lost count of how many times I’ve heard people grumble that they can’t send email because their mailbox has filled up (especially if they’ve been away from the office). End users might argue they just don’t have time to keep their mailbox size low through carefully ditching mail that they don’t need to keep, and filing the stuff that they do.



I guess it’s like another principle in economics – the idea that we have unlimited wants, but a limited set of resources with which to fulfil them. The whole point of economics is to make the best use of those limited resources to satisfy the unlimited wants. Most people (with a few exceptions) would agree that they never have enough money – there’ll always be other, more expensive ways to get rid of it.


It’s important to have a sensible mailbox quota, or the paradox of being too stingy may come back and bite you. Some organisations will take mail off their Exchange servers and drop it into a central archive – an approach which solves the problem somewhat, but introduces the overhead of managing that archive (not to mention the cost of procurement). I’d argue that it’s better to use the Managed Folders facility in Exchange to manage the data.


The true paradox of mailbox quota thrift kicks in when users have to archive everything to PST files: then you’ve got the problem of how to make sure those are backed up… especially since it’s not supported to store them on a network drive (though that doesn’t stop people from doing it – personal folder files are unsupported over a LAN or over a WAN link). Even worse (from a backup perspective), Outlook opens every PST file configured in its profile for read/write. What this means is that each PST in your Outlook profile gets its date/time stamp updated every time you run Outlook.


This of course means that if you’re storing your PSTs on a network share (tsk, tsk), and that file share is being backed up every night (as many are), then your PSTs will be backed up every night, regardless of whether the job is incremental/differential or full. I’ve seen large customers (eg a 100,000+ user bank) who estimate that over 50% of the total data they back up, every day, is PST files. Since PSTs are used as archives by most people, by definition the contents don’t change much, but that’s irrelevant – the date/time stamp is still updated every time they’re opened.
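To see why this hurts, consider how a typical incremental or differential job decides what to include – a minimal Python sketch (the `incremental_candidates` helper is hypothetical, not any real backup product’s logic):

```python
import os

def incremental_candidates(paths, last_backup_ts):
    """Select the files a typical incremental/differential backup would
    include: anything modified since the last backup ran."""
    return [p for p in paths if os.path.getmtime(p) > last_backup_ts]
```

Because Outlook opens every PST in the profile read/write and so refreshes its timestamp, an archive PST passes this test every single night, even though its contents barely change.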


So as well as losing the single-instance storage benefit you’d get by leaving the data in Exchange (or by getting the users to delete it properly), you’re consuming possibly massive amounts of disk space on file servers, and having to deal with huge amounts of data to be backed up every night, even though it doesn’t change.


If you had an Exchange server with 1,000 users and set the mailbox quota at 200Mb, you might end up with 75% quota usage; allowing for a 10% single-instance saving, you’d have about 135Gb of data on that server, which would be backed up in full every week, with incremental or differential backups every night in between (and those will be a good bit smaller, since not all that much data changes day to day).


If each of those users had 1Gb of PST files (not at all extraordinary – I currently have nearly 15Gb of PSTs loaded into Outlook! – even with a 2Gb quota on the mailbox, which is only 30% full), then you could be adding 1Tb of data to the file servers, hurting the LAN performance by having those PSTs locked open over the network, and being backed up every day… Give those users a 2Gb mailbox quota, and stop them from using PSTs altogether, and they’d be putting 1.2Tb worth of data onto Exchange, which might be more expensive to keep online than 1Tb+ of dumb filestore, but it’s being backed up more appropriately and can be controlled much better. 
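For the curious, the arithmetic behind those figures – a quick sketch using the assumptions stated above (75% quota usage, a 10% single-instance saving, 1Gb of PST per user):

```python
users = 1_000
quota_mb = 200
usage = 0.75        # 75% of the 200Mb quota actually used
sis_saving = 0.10   # single-instance storage saves roughly 10%

# Data held on the Exchange server (in Gb, using decimal units as above)
exchange_gb = users * quota_mb * usage * (1 - sis_saving) / 1_000
print(exchange_gb)   # 135.0

# Data pushed onto the file servers if each user keeps 1Gb of PSTs instead
pst_total_tb = users * 1 / 1_000
print(pst_total_tb)  # 1.0 (Tb)
```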


So: don’t be miserly with your users’ mailbox quotas. Or be miserly, and stop them from using PSTs altogether (in Outlook 2003) or stop the PSTs from getting any bigger (in Outlook 2007).

How to handle URLs with spaces in Outlook, Word etc

I was talking to a customer earlier today who was envisioning frustrations around using click-to-dial type functionality within OCS, where they’ll be copying & pasting phone numbers around. Now if the number is nicely formatted (and E.164 compliant…) then it won’t be a problem, but the nearer number formatting gets to being easily machine-readable, the further it gets from being human-friendly.

This reminded me of a nice tip for dealing with odd URLs or other links (particularly UNC names such as \\server\share\folder name\file name) which might contain spaces. In many applications now (chiefly Word and Outlook, but others – such as Windows Live Writer – support it too), it’s possible to write or paste in a URL and have the application delay processing it into a hyperlink until you’ve finished typing it.

Instead of ending up with a hyperlink that stops at the first space (which is what you’ll get by just typing \\server\share\folder name\file name), begin with a “<“, then type or paste the whole URL, then close with a “>”. Now when you press space or enter, the app will process the whole thing as a hyperlink, remove the <>s, and all is well. If you do end up with a half-formed link, go to the start of the text (before it becomes a hyperlink), enter the “<“, then jump to the end of the hyperlinked text (eg the end of “\folder”) and press backspace – this should remove the active bit. Finally, jump to the end, add your “>” and press space to complete.

Tags as long-running transactions

Tagging: Brett, John, Allister, Darren and Julius.

When it comes to transaction processing, most systems think in terms of very short increments of time – eg taking money from an ATM, where the whole transaction is done in a few seconds. Some may take longer – like transferring money between two different banks, which could take a few days. Others are much more long-running – such as a house sale and purchase, which could last for weeks and weeks.

So it is with some blog follow up. I only just spotted that Steve the Geek had tagged me a few weeks ago, and maybe it’s time to follow up…

  • name at least 5 programs (web or standalone) that you love that go against the mainstream (optional – reason why, if possible)
  • name at least 5 programs that you dislike; OSes not included (optional – reason why, if possible)
  • tag at least 5 other people

So here goes…

Bouquets

  • Ilium’s eWallet – a super cool bit of cheap software which allows easy maintenance of confidential stuff on a PC (maybe the legions of passwords you might manage, or the account numbers of all your credit cards or bank accounts), and can synch them down to your Smartphone, Pocket PC or Palm. It’s one of the first things I install on any new mobile device or after rebuilding a PC. Not so much against the mainstream as genre-defining.
  • Windows Live Mail – I’ve now got 3 or 4 WL/Hotmail accounts that I use, and this desktop app manages them all nicely, even integrating to the instant search in Vista. Not mainstream since a lot of people -still- don’t know it even exists.
  • Numerous web-based forums, often based on software like vBulletin or UBB. In most cases, the forum software just works really well (though sometimes they have real problems with scalability), and has come on leaps & bounds since the early web forums. So much more friendly than Usenet. FlyerTalk and DigitalSpy are examples of great web forums; PistonHeads, less so.
  • Local.Live.com – drastically needs a better name, but it’s so good in so many ways that it’s a crying shame a lot of folks still don’t know about it. I remember the first time I saw Google Earth – I thought it was really impressive, even though the UI was horrible. Microsoft’s Virtual Earth (even mobile) technology has overtaken Google Maps/Google Earth IMHO.
  • At the risk of being a bit too Microsoft-centric, I’m going to add Digital Image Suite 2006 here. Not as powerful as Photoshop, maybe, but for what I need to do with it (manage photos and do the odd bit of cropping & touching up), it works really well. Shame it’s now been discontinued 🙁

Brickbats

  • Partition Magic – Actually, I used to like PQM because it did something that there was no other feasible way of doing – dynamically resizing and moving disk partitions whilst preserving the data on them. I’m putting it in here because it hasn’t been updated in years (since well before Symantec hoovered up the company), and has no roadmap for the future – so it isn’t compatible with Vista and never will be.
  • Almost any PC laptop utilities from the manufacturer – whether it’s Toshiba’s crazy FlashCards that keep popping up on top of everything, or their monitor program to make sure the hard disk isn’t being moved too much (??), or Dell’s QuickSet utilities, they’re almost always slow, the UI is horrible, they consume lots of memory and (in the case of Tosh) routinely just fall over, especially when shutting the machine down.
  • Siebel. Talk to anyone in Microsoft who has to use Siebel (now, amusingly, an Oracle product, but one which MS has spent years and probably $$$$ implementing), and the universal opinion is that it is absolutely horrible in almost every regard.
  • Zune software – I’m sorry, I just don’t see why it was necessary to build a separate app which (presumably) shares a lot of its guts with Windows Media Player, that has to be installed to sync with the Zune player. Why can’t Zune just consume WMP, even put a skin on it for branding purposes, but not require a different look & feel, separate registration of filetypes etc? Maybe an example of Zune trying to be a little too like iPod/iTunes.
  • Acrobat Reader. How many times have I clicked on a link in a web page to open a PDF file in a new tab in IE7, read the doc and then pressed CTRL-F4 to close that tab, only to get an error saying: “Acrobat Reader: This action cannot be performed from within an external window”… Or how many times has the PC bogged down, only to find the Acrobat Reader process – which isn’t even open and visible – merrily chewing away at all the CPU and memory it can grab? Or how many times has IE fallen over like a helicopter missing a rotor blade, only to find that the dreaded ACRORD32.EXE is behind the fault? It’s probably better now than previously, but it seriously winds me up when Acrobat falls to bits because I know that most people will just attribute it to Windows or IE.

The business case for Exchange 2007

(this is a follow-on to the previous post on measuring business impact; what follows are my own thoughts on the case for moving to Exchange 2007)

There are plenty of resources already published which talk about the top reasons to deploy, upgrade or migrate to Exchange 2007 – the top 10 reasons page would be a good place to start. I’d like to draw out some tangible benefits which are maybe less obvious than the headline-grabbing “reduce costs”/”make everyone’s life better” type reasons. I’ll approach these reasons over a number of posts, otherwise this blog will end up reading like a whitepaper (and nobody will read it…)

GOAL: Be more available at a realistic price

High availability is one of those aims which is often harder to achieve than it first appears. If you want a really highly-available system, you need to think hard not only about which bits need to be procured and deployed (eg clustered hardware and the appropriate software that works with it), but the systems management and operations teams need to be structured in such a way that they can actually deliver the promised availability. Also, a bit like disaster recovery, high availability is always easier to justify following an event where not having it is sorely missed… eg if a failure happens and knocks the systems out of production for a while, it’ll be easier to go cap-in-hand to the budget holder and ask for more money to stop it happening again.

Example: Mrs Dalton runs her own business, and like many SMBs, money was tight when the company was starting up – to the extent that they used hand-me-down PC hardware to run their main file/print/mail server. I had always said that this needed to be temporary only, and that they really should buy something better, but it was always something that was going to happen in the future.

Since I do all the IT in the business (and I don’t claim to do it well – only well enough that it stops being a burden for me… another characteristic of small businesses, I think), and Mrs D is the 1st line support for anyone in the office if/when things go wrong, it can be a house of cards if we’re both away. A year or two after they started, the (temporary) server blew its power supply whilst we were abroad on holiday, meaning there were no IT services at all – no internal or internet access (since the DHCP server was now offline), which ultimately meant no networked printers, no file shares with all the client docs, no mail (obviously) – basically everything stopped.

A local PC repair company was called in and managed to replace the PSU and restore working order (at a predictably high degree of expense), restoring normal service after 2 days of almost complete downtime.

Guess what? When we got back, the order went in for a nice shiny server with redundant PSU, redundant disks etc etc. No more questions asked…

Now a historical approach to making Exchange highly available would be to cluster the servers – something I’ve talked about previously in a Clustering & High Availability post.

The principal downside to the traditional Exchange 2003-style cluster (now known as a Single Copy Cluster) was that it required a Storage Area Network (at least if you wanted more than 2 nodes), which could be expensive compared to the kind of high-capacity local disk drives that might be the choice for a stand-alone server. Managing a SAN can be a costly and complex activity, especially if all you want to do with it is to use it with Exchange.

Also, with the Single-Copy model, there’s still a single point of failure – if the data on the SAN got corrupted (or worst case, the SAN itself goes boom), then everything is lost and you have to go back to the last backup, which could have been hours or even days old.

NET: Clustering Exchange, in the traditional sense, can help you deliver a better quality of service. Downtime through routine maintenance is reduced and fault tolerance of servers is automatically provided (to a point).

Now accepting that a single copy cluster (SCC) solution might be fine for reducing downtime due to more minor hardware failure or for managing the service uptime during routine maintenance, it doesn’t provide a true disaster-tolerant solution. Tragic events like the Sept 11th attacks, or the public transport bombs in cities such as London and Madrid, made a lot of organisations take the threat of total loss of their service more seriously … meaning more started looking at meaningful ways of providing a lights-out disaster recovery datacenter. In some industries, this is even a regulatory requirement.

Replication, Replication, Replication

Thinking about true site-tolerant DR just makes everything more complex by multiples – in the SCC environment, the only supported way to replicate data to the DR site is to do it synchronously: the Exchange servers in site A write data to their SAN, which replicates that write to the SAN in site B, which acknowledges that it has received the data, all before the SAN in site A can acknowledge to the servers that the write has succeeded. All this adds huge latency to the process, and can consume large amounts of high-speed bandwidth, not to mention requiring duplicate hardware and typically expensive replication-management software at both sides.

If you plan to shortcut this approach and use some other piece of replication software (which is installed on the Exchange servers at both ends) to manage the process, be careful – there are some clear supportability boundaries which you need to be aware of. Ask yourself – is taking a short cut to save money in a high availability solution, just a false economy? Check out the Deployment Guidelines for multi-site replication in Exchange 2003.

There are other approaches which could be relevant to you for site-loss resilience. In most cases, were you to completely lose a site (for a period measured at least in days, and possibly indefinitely), there will be other applications which need to be brought online more quickly than your email system – critical business systems on which your organisation depends. Also, if you lost a site entirely, there’s the logistics of managing where all the people are going to go – work from home? Sit in temporary offices?

One practical solution here is to use something in Exchange 2003 or 2007 called Dial-tone recovery. In essence, it’s a way of bringing up Exchange service at a remote location without having immediate access to all the Exchange data. So your users can at least log in and receive mail, and be able to use email to communicate during the time of adjustment, with the premise that at some point in the near future (once all the other important systems are up & running), their previous Exchange mailbox data will be brought back online and they can access it again. Maybe that data isn’t going to be complete, though – it could be simply a copy of the last night’s backup which can be restored onto the servers at the secondary site.

Using Dial-tone (and an associated model called Standby clustering, where manual activation of standby servers in a secondary datacenter can bring service – and maybe data – online) can provide a way of keeping service availability high (albeit with a temporary lowering of quality, since all the historic data isn’t there) at a time when you might really need that service (ie in a true disaster). Both of these approaches can be achieved without the complexity and expense of shared disk storage, and without having to replicate the data in real time to a secondary location.

Exchange 2007 can help you solve this problem, out of the box

Exchange 2007 introduced a new model called Cluster Continuous Replication (CCR) which provides a near-real-time replication process. It is modelled in such a way that you have a pair of Exchange mailbox servers (and they can only be doing the mailbox role, meaning you’re going to need other servers to take care of servicing web clients, performing mail delivery etc), with one of the servers “active” at any time; CCR takes care of keeping the copy of the data up to date, and provides the mechanism to automatically (or manually) fail over between the two nodes, and the two copies of the data.

What’s perhaps most significant about CCR (apart from the fact that it’s in the box and therefore fully supported by Microsoft) is that there is no longer a requirement for the cluster nodes to access shared disk resources… meaning you don’t need a SAN (now, you may still have reasons for wanting a SAN, but it’s just not a requirement any more).

NET: Cluster Continuous Replication in Exchange 2007 can deliver a 2-node shared-nothing cluster architecture, where total failure of all components on one side can be automatically dealt with. Since there’s no requirement to share disk resources between the nodes, it may be possible to use high-speed, dedicated disks for each node, reducing the cost of procurement and the cost & complexity of managing the storage.

Exchange 2007 also offers Local Continuous Replication (LCR), designed for stand-alone servers to keep 2 copies of their databases on different sets of disks. LCR could be used to provide a low-cost way of keeping a copy of the data in a different place, ready to be brought online through a manual process. It is only applicable in a disaster recovery scenario, since it will not offer any form of failover in the event of a server failure or planned downtime.

Standby Continuous Replication (SCR) is the name given to another component of Exchange 2007, due to be part of the next service pack. This will provide a means to have standby, manually-activated, servers at a remote location, which receive a replica of data from a primary site, but without requiring the servers to be clustered. SCR could be used in conjunction with CCR, so a cluster which provides high availability at one location could also send a 3rd replica of its data to a remote site, to be used in case of total failure of the primary site.

The key point is “a realistic price”

In summary, then: reducing downtime in your Exchange environment through clustering presents some challenges.

  • If you only have one site, you can cluster servers to share disk storage and get a higher level of service availability (assuming you have the skills to manage the cluster properly). To do this, you’ll need some form of storage area network or iSCSI NAS appliance.
  • If you need to provide site-loss resilience (either temporary but major, such as a complete power loss, or catastrophic, such as total loss of the site), there are 3rd-party software-based replication approaches which may be effective, but are not supported by Microsoft. Although these solutions may work well, you will need to factor in the possible additional risk of a more complex support arrangement. The time you least want to be struggling to find out who can and should be helping you get through a problem, is when you’ve had a site loss and are desperately trying to restore service.
  • Fully supported site-loss resilience with Exchange 2003 can only be achieved by replicating data at a storage subsystem level – in essence, you have servers and SANs at both sites, and the SANs take care of real-time, synchronous, replication of the data between the sites. This can be expensive to procure (with proprietary replication technology not to mention high speed, low latency network to connect the sites – typically dark fibre), and complex to manage.
  • There are manual approaches which can be used to provide a level of service at a secondary site, without requiring 3rd party software or hardware solutions – but these approaches are designed to be used for true disaster recovery, not necessarily appropriate for short-term outages such as temporary power failure or server hardware failure.
  • The Cluster Continuous Replication approach in Exchange 2007 can be used to deliver a highly-available cluster in one site, or can be spanned across sites (subject to network capacity etc) to provide high-availability for server maintenance, and a degree of protection against total site failure of either location.

NET: The 3 different replication models which are integral to Exchange 2007 (LCR, CCR and SCR) can help satisfy an organisation’s requirements to provide a highly-available, and disaster-tolerant, enterprise messaging system. This can be achieved without requiring proprietary and expensive 3rd party software and/or hardware solutions, compared with what would be required to deliver the same service using Exchange 2003.

 

Topics to come in the next instalments of the business case for Exchange 2007 include:

  • Lower the risk of being non-compliant
  • Reduce the backup burden
  • Make flexible working easier

Technology changes during the Blair era

So Tony Blair stepped down as the UK’s Prime Minister this week, just over 10 years since his ascendance to the position. Funnily enough, I got my “10 year” service award at Microsoft recently (a fetching crystal sculpture and a note from Bill ‘n’ Steve thanking me for the last decade’s commitment), which got me all misty-eyed and thinking about just how far the whole technology landscape has evolved in that time. I also did a presentation the other day to a customer’s gathering of IT people from across the world, who wanted to hear about future directions in Microsoft products. I figured it would be worth taking a retrospective before talking about how things were envisaged to change in the next few years.

When I joined Microsoft in June 1997, my first laptop was a Toshiba T4900CT – resplendent with 24Mb of RAM and a Pentium 75 processor. My current phone now has 3 times as much internal storage (forgetting about the 1Gb MicroSD card), a CPU that’s probably 5 times as powerful and a brighter LCD display which may be only a quarter the resolution, but displays 16 times as many colours.

In 1997, there was no such thing as broadband (unless you fancied paying for a Kilo- or even Mega-stream fixed line) and mobile data was something that could be sent over the RAM Mobile Data Network at speeds of maybe 9kbps. I do remember playing with an Ericsson wireless adapter which allowed a PC to get onto the RAM network – it was a type III PCMCIA card (meaning it took up 2 slots), it had a long retractable antenna, and if you used it anywhere near the CRT monitor that would be on the average desk, you’d see massive picture distortion (and I mean, pulses & spikes that would drag everything on the screen over to one side) that would make anyone think twice about sitting too close to the adapter…

The standard issue mobile phone was the Nokia 2110, a brick by modern standards which was twice as thick and twice as heavy as my Orange SPV E600; the Nokia’s battery, though only half as powerful, was said to last almost as long as the SPV’s. Don’t even think about wireless data, a colour screen, downloadable content or even synchronisation with other data sources like email.

People didn’t buy stuff on the internet in 1997 – in fact, a pioneering initiative called “e-Christmas” was set up at the end of that year, to encourage electronic commerce – I recall being able to order goods from as many as a handful of retailers, across as many as a few dozen product lines!

One could go on and on – at the risk of sounding like an old buffer. If Ray Kurzweil is right, and the pace of change is far from constant but is in fact accelerating and has been since the industrial revolution, then we’ll see the same order of magnitude change in technology as we had in the last ten years, within the next three.

In tech terms, there was no such thing as the good old days: it’s never been better than it is now, and it’s going to keep getting better at a faster rate, for as long as I think anyone can guess.

The Campaign for Real Pedantry, erm, I mean numbers

Hats off to James O’Neill for a display of true, world-class pedantry to which I could only aspire. It drives me nuts to get emails with badly formatted phone numbers which can’t be dialled on Smartphones without first editing them, and now that I’ve started using Office Communications Server 2007 (more later) as the backbone for my real office phone, it impedes the usability of that too.

James’ beef is that a lot of people incorrectly write a UK phone number which would be defined as 0118 909 nnnn (where 0118 is the area dialing code, and 909nnnn is the local number, the last 4 digits of which form an extension number in this specific example, available through DDI).

Here are some examples of number crime:

  • (0) 118 909 nnnn – Incorrect and useless. Why put the first zero in brackets at all? Nobody is ever going to dial starting ‘118’
  • +44 (0) 118 909 nnnn – Incorrect, though perhaps useful to people who don’t understand country codes. There may well be lots of people out there who never call internationally and don’t understand the “+44” model of dialing from a mobile phone, so maybe the (0) will indicate to them that they should add it in… but it could be confusing to overseas dialers calling this number – how do they know whether to dial +44 118 or +44 0 118?
  • +44 (0) (118) 909 nnnn – someone likes the brackets just a little too much
  • +44 (0118) 909 nnnn – even worse than the second example. Either drop the brackets and the 0, or drop the +44 altogether.

The only correct way to write this number is +44 118 909 nnnn, or for the truly pedantic, +44118909nnnn. Maybe you wouldn’t publish an E.164 formatted number (as the scheme is called) as your primary customer services number, and it doesn’t make sense to use it for numbers that won’t be dial-able from abroad (eg some 0870 numbers or 0800 numbers). But for everything else, I’d encourage everyone to please make sure your email signature has a properly formatted number (either simplifying it by dropping the +44 or losing the brackets and leading zero). If your company publishes your number in its online address book, then make sure that’s formatted correctly too so that people using telephone-aware systems (like Windows Mobile or Outlook Voice Access) can correctly call you.

In my profession, if someone doesn’t figure that +44 118 909 nnnn is my phone number and that if they’re in the UK and not in the Reading area, they need to drop +44 and add “0” if they’re dialing from a plain old phone system, then I’m quite happy to have them not phoning me up…
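To illustrate the kind of normalisation a telephone-aware system has to perform, here’s a rough sketch in Python – the function name and the UK-only rules are my own illustration, not the logic of any particular product:

```python
def to_e164_uk(number: str) -> str:
    """Normalise a UK number such as '0118 909 1234' or the incorrect
    '+44 (0) 118 909 1234' into E.164 form ('+441189091234')."""
    digits = "".join(ch for ch in number if ch.isdigit() or ch == "+")
    if digits.startswith("+"):
        digits = digits[1:]
    if digits.startswith("440"):
        # the '+44 (0) 118...' crime: drop the superfluous trunk zero
        digits = "44" + digits[3:]
    elif digits.startswith("0"):
        # national format: swap the trunk '0' for the country code
        digits = "44" + digits[1:]
    return "+" + digits
```

A properly written +44 118 909 nnnn passes through unchanged, which is rather the point.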

Outlook 2007 signatures location

Following my post about .sig files, I had cause to dig around looking for where Outlook actually puts the Signature files. I came across a post which Allister wrote a little while ago, but it’s such a useful little tip that it’s worth repeating…

Basically, Outlook 2007 offers a nice simple editor UI for building your own signatures; however, it’s complicated by the need to present the signature in HTML (the default format for mail now), Rich Text Format (aka RTF, the original Exchange/Outlook format dating back to the mid 90s) and plain old Text (with total loss of colour, formatting etc).


Outlook actually generates 3 versions of the sig, to be used for the different formats. In some cases – notably with plain text – the end result following the conversion isn’t quite what you’d want… my nicely formatted .sig above comes out a bit mangled, as

Ewan Dalton |     | Microsoft UK | ewand@microsoft.com |+44 118 909 3318 Microsoft Limited | Registered in England | No 1624297 | Thames Valley Park, Reading RG6 1WG

so it may be necessary to do a bit of tweaking to the formats. Do bear in mind that if you edit the sig files directly, then go back into the menu in Outlook and make some other change, your original tweaks will be overwritten.

Anyway, you could find the signature files in something akin to:

[root]\Users\[username]\AppData\Roaming\Microsoft\Signatures

(there may not be a \Roaming folder, depending on how your environment is set up, or it may be in \Documents and Settings\ and under an Application Data folder depending on your version of Windows).
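If you wanted to locate that folder programmatically, a quick sketch (Python; the `signatures_dir` helper and the legacy fallback path are illustrative only):

```python
import os

def signatures_dir():
    """Return the Outlook Signatures folder, trying the Vista-style
    roaming profile first, then the older Application Data layout."""
    candidates = [
        os.path.join(os.environ.get("APPDATA", ""), "Microsoft", "Signatures"),
        os.path.expanduser(os.path.join("~", "Application Data",
                                        "Microsoft", "Signatures")),
    ]
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None
```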

“Success kills” – Marc Andreessen on Facebook

Like so many other people in the last few weeks, I started using Facebook. They’re growing at such a ridiculous rate, adding 100,000 new users every day, and it’s reckoned that 50% of the millions of active users return to the site every day.


Following a link from a post by Steve, I read Marc Andreessen’s opinions on why Facebook is so successful (what it’s done spectacularly right and, in his opinion, what its shortcomings are). One particular section shocked me the most. After discussing the viral spread of Facebook applications, he focuses on iLike as probably the most successful. Facebook app developers need to host their own servers (and bandwidth) to provide the services that Facebook acts as the gateway to. When iLike launched, take-up of their application was near-exponential, which completely hammered the servers they had access to. Here’s what Andreessen says subsequently:



Yesterday, about two weeks later, ILike announced that they have passed 3 million users on Facebook and are still growing — at a rate of 300,000 users per day.

They didn’t say how many servers they’re running, but if you do the math, it has to be in the hundreds and heading into the thousands.

Translation: unless you already have, or are prepared to quickly procure, a 100-500+ server infrastructure and everything associated with it — networking gear, storage gear, ISP interconnections, monitoring systems, firewalls, load balancers, provisioning systems, etc. — and a killer operations team, launching a successful Facebook application may well be a self-defeating proposition.

This is a “success kills” scenario — the good news is you’re successful, the bad news is you’re flat on your back from what amounts to a self-inflicted denial of service attack, unless you have the money and time and knowledge to tackle the resulting scale challenges.


I love that analogy – self-inflicted DOS 🙂 But what a scary situation to be in – suddenly having to provide real-time, world-class infrastructure or else risk losing the goodwill of everyone who’s accessing the service if it fails or is too slow.
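Out of curiosity, the “do the math” bit is easy to reproduce as a back-of-envelope. The user numbers come from the quote above; the concurrency and per-server figures are purely my own assumptions, picked to show the shape of the problem:

```python
# Rough back-of-envelope for iLike's scaling problem. Everything here
# except the user counts (from Andreessen's post) is an assumption.
total_users = 3_000_000          # from the quote
new_users_per_day = 300_000      # from the quote

peak_concurrent_fraction = 0.05  # assume ~5% of users active at peak
requests_per_user_per_min = 10   # assume 10 page/API hits a minute each
requests_per_server_per_sec = 50 # assume a mid-2007 app server's limit

peak_concurrent = total_users * peak_concurrent_fraction
peak_rps = peak_concurrent * requests_per_user_per_min / 60
servers_needed = peak_rps / requests_per_server_per_sec

print(f"~{peak_rps:,.0f} requests/sec at peak -> ~{servers_needed:,.0f} servers")
```

With those (conservative) guesses you land on roughly 500 servers – squarely in Andreessen’s “hundreds heading into thousands” range, and growing by 10% a day.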


All of which makes me think – where on earth does the revenue to pay for all this stuff come from?

Measuring business impact

I’m going to approach this topic over a number of posts, as something I’ve been thinking about rather a lot lately.


Basically, the challenge is to find out what impact a change to the business environment will have, positive or negative, and then to use that information either to justify making the change in the first place (so it’s not really measuring business impact, but estimating the future impact of an impending change), or as a retrospective measurement to decide whether some earlier change was a good thing (and maybe whether it should continue).


Most of the time, when you read about managing business impact, reducing cost, improving flexibility and so on, it will be coming from someone trying to sell you something: an IT supplier saying that the latest version of this is going to solve all sorts of problems (some of which you don’t even know exist yet), or an IT or business analyst trying to sell you their insight and knowledge, without which you’re bound to fail and wind up on the scrapheap, counting all those missed opportunities you just couldn’t see at the time.


Numerous terms have arisen to try to describe this impact, or to frame a way of measuring its scale. Just a few examples:



TCO – Gartner Group coined the “Total Cost of Ownership” term in the late 1980s to describe the cost of running a whole IT system, not just the cost of buying or implementing it in the first place. It’s one of the most-used terms when it comes to talking about the benefits of some new IT system, partly because most businesses would see a benefit in reducing operational costs… so they assume that TCO reduction is inevitably a good thing. The irony is that, in my experience at least, many businesses don’t really know what their costs are (other than at a high level), so measuring a change in TCO is going to be difficult at any specific level.


Think of support costs as an example – if a new project aims to reduce the cost of keeping everything running, the only way you’ll know whether it was effective is to know what the true cost was in the first place. I’ve seen businesses which can tell exactly how much it costs to provide really specific services to their users – like $10 a month to put a fixed phone on a user’s desk – and so can estimate much more accurately how much of a saving will be generated by rationalising or improving the current systems.
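As a sketch of why the up-front price alone misleads, here’s a toy TCO comparison – every figure below is invented purely for illustration:

```python
# Toy per-user TCO comparison - all figures invented for illustration.
# TCO counts acquisition plus the cost of ownership over the system's
# life, not just the purchase price.
def tco(purchase, monthly_support, monthly_admin, months):
    return purchase + (monthly_support + monthly_admin) * months

# A "free to keep" incumbent vs a system that costs money up front
# but is cheaper to run - over 3 years the newcomer comes out ahead:
incumbent = tco(purchase=0,   monthly_support=10, monthly_admin=5, months=36)
proposed  = tco(purchase=250, monthly_support=5,  monthly_admin=2, months=36)
print(incumbent, proposed)  # 540 502
```

The arithmetic is trivial; the catch, as above, is that the monthly support and admin figures are exactly the numbers most businesses don’t actually know.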


RoI – a catch-all term for the return on any investment and (in measuring terms at least) the time frame for that return. Just as one way of making more money is to reduce the cost of operations, investing in something new which returns more money to the business is a clear way of growing. The downside of looking for an RoI in every investment, however, is that the knock-on return may show up in some associated project which you might not be expecting, or measuring, right now. What I mean is that a change to the business might not bring any RoI in itself (eg increasing capacity on the network), but it will allow other projects (like deploying a new application) to be more effective.
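The RoI arithmetic itself is simple – the hard part is knowing which project the return actually lands in. A toy sketch, again with invented figures:

```python
# Toy RoI / payback calculation - all figures invented for illustration.
investment    = 100_000   # up-front cost of the change
annual_return = 40_000    # extra revenue (or avoided cost) per year

def roi(investment, annual_return, years):
    """Net return over the period, as a fraction of the investment."""
    return (annual_return * years - investment) / investment

payback_years = investment / annual_return  # 2.5 years to break even
print(f"3-year RoI: {roi(investment, annual_return, 3):.0%}")  # 3-year RoI: 20%
```

If the 40,000 a year actually accrues to a *different* project – the new application enabled by the network upgrade, say – the upgrade itself looks like a 100% loss, which is exactly the measurement trap described above.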


TEI – Forrester Research came up with this one, possibly in answer to the noise Gartner was making about their TCO model… though it does go further than just looking at cost. “Total Economic Impact” tries to correlate cost, benefit and (most importantly, perhaps) the future flexibility that might come from making some change, with the risk inherent in doing so.


Opportunity Cost


Even when thinking about the financial models for justifying expenditure (let’s assume it’s a new software deployment, which will have direct costs – licenses – and indirect costs – the time of people to carry out testing of the software, training time for end users etc), it’s very easy to get caught up in thinking too closely about the project in question.


One concept that stands out to me when talking about IT investment is that of opportunity cost – an economics term which isn’t really measured as a value of cost at all; it’s the missed opportunity itself. In basic terms, the opportunity cost of going to the cinema on a Saturday afternoon is not going to see the football. In that example it’s a straight choice – you can only do one of those things at that point in time, and the cost is the missed opportunity to do the other. The choice is to decide which will be better – which will cost less, or perhaps which will return a higher degree of satisfaction.


Thinking about opportunity cost in business terms is a bit harder, since we often don’t know what the missed opportunity is until we look back some time later and realise it then. To flip that idea on its head, let’s say you want to measure how effective someone is at doing their job.


Business effectiveness


Just about every employer has measures in place to try to figure out how well its employees are doing – in relative terms, comparing their performance with that of their peers, or in financial terms, to decide whether the resources used to employ a person could be better used in a different area, or whether more resources should be deployed to have more people doing that type of job.


Let’s take the example of a restaurant. Making a successful business will depend on a whole load of relatively fixed factors – the location of the building, the decor and ambience of the place, for example – as well as lots of flexible things, like the quality and price of the food or the effectiveness of the service. There will even be external factors that the restaurant could do nothing about, except possibly anticipate – such as change in fashion or a competitor opening up across the street.


If the quality of the food is poor when compared to the price, the standard of service and the overall ambience of the place, then customers will be unhappy and will likely vote with their feet. If the food is consistently average but cheap, then people will come for that reason (just look at most fast food outlets). Each of these factors could be varied – raising the price of the not-so-nice food, or paying more for ingredients to get higher quality, or firing the chef and replacing him with someone who’s more skilled – and they should make a difference, but the problem is in knowing (or guesstimating) what that difference will be before deciding on which factor to vary, and by how much.


When businesses look at how they invest in people, it’s easy to question the amount spent on any specific role. In the restaurant case, would adding another chef make the quality better? Would it improve the time it takes to get meals out to customers (something that’s really important for a lunchtime restaurant, but maybe less so in some cosy neighbourhood trattoria)? Would the knock-on effect be worth the extra cost of employing the chef? And would the extra chef just get in the way of the existing one, reducing their individual effectiveness?


I’ve said to people at my own employers in the past that the only way they will really be able to measure how good a job my team does is to stop us doing it, then observe what happens two years later. So what if the restaurant reduced the number of waiting staff and switched from expensive, fresh ingredients to cheaper frozen stuff, in an effort to reduce cost? On one hand, the figures might look good, because the cost of supply has just dropped and the operational costs have been reduced too.


But the long-term impact might be that loyal customers drift away because the food isn’t the value it was before, or that a bad review from an unexpected visit by a restaurant critic does real damage. At that point, it could take a huge effort to turn things around and rebuild the tarnished name of the place.


So what’s the point of all this rambling? Well, in the next instalment I’ll look at some of the TCO/RoI arguments around Exchange Server…

Graeme Obree: The Flying Scotsman

I went to see a really interesting film tonight, a preview of The Flying Scotsman, a film about the life (or at least some of the achievements) of the remarkable Graeme Obree.


Obree, if you’ve never heard of him, broke the holy grail of cycling records, beating a nine-year-old mark for the total distance cycled in an hour. That record was always thought of as the supreme struggle – get on your bike, cycle as hard as you can for 60 minutes, and see how far you get.


I had a passing acquaintance with Obree when he was in his teens (and I was younger): we were both in the same cycling club and very occasionally went on rides together – which would generally involve me being dropped early on by Graeme and his pal Gordon Stead, in whose workshop he built the “Old Faithful” bike with which he stirred up a hornet’s nest of controversy in the cycling establishment, whilst breaking a couple of world records and taking a couple of world championships.


Obree was always fast – even as a 17-year-old he was doing 20-minute 10-mile time trials on fairly ordinary bikes, along a dual carriageway. I recall hearing out of the blue in 1993 that some weird Scotsman who’d built his own bike out of washing machine parts – not strictly true, but it looks good in the papers – had just taken the hour record… and I couldn’t quite believe that it was the same Mr G Obree of Ayrshire.


Anyway, it’s a good film. A good story, and a darker and more interesting one than the usual “nice guy underdog triumphs against the villainous ogres of the establishment” affair. Obree was, and maybe still is to a degree, haunted by a bipolar condition, for which he sought therapy partly by writing the autobiography on which the film is based. Jonny Lee Miller turns in a top-drawer performance, and even manages to look a lot like Graeme did at the time.


I haven’t read all of The Flying Scotsman autobiography yet – I’m waiting for it to wing its way from Amazon – but I have delved into it using their excellent “Search Inside” feature, which allows you to preview pages of a book before committing to buying it.


I suppose the Obree story is one that everyone can learn from – with supreme self-belief (he refused to talk about “attempting” to break the world record… the way he saw it, he was going to break it) and raw talent, it’s possible to prevail. Now, I can talk myself into the self-belief, but I’m still searching for the raw talent… 🙂


PS. The film goes on general release on 29th June 2007, and the book has been available for a couple of years.