The business case for Exchange 2007 – part IV

Another installment in a series of posts outlining the case for going to Exchange 2007. Previous articles can be found here.

GOAL: Make flexible working easier

“Flexible Working” might mean different things to different organisations – some might think of mobile staff who turn up at any office with a laptop, sit at any free desk and start working; others might imagine groups of workers who can work from home part- or even full-time. Whatever your definition, there’s no doubt that the technology which can enable these scenarios has advanced in great strides in recent years.

RPC Over HTTP – magic technology, even if the name isn’t

The “Wave 2003” combination of Exchange Server 2003, Outlook 2003, Windows XP SP2 and Windows Server 2003 brought to the fore a technology which wasn’t really new, but needed the coordination of server OS, server application, client OS and client application to make it available. If you’ve been using or deploying RPC/HTTP, you’ll know exactly what it does and why it’s cool; if you haven’t deployed it, the name might mean nothing to you. In short, the protocol Outlook uses to talk to Exchange Server on the internal network (RPC) can be wrapped up within a secure channel that is more friendly to firewalls – “tunneling” RPC inside a stream of data which your firewall can receive (HTTP, or more correctly, HTTPS).

What this means in practice is that your users can connect in to your environment using a widely-supported network mechanism (ie HTTPS), without requiring a Virtual Private Network connection to be established first. As soon as a user’s PC finds a connection to the internet, Outlook will attempt to connect to your network using HTTPS; if it succeeds, it becomes “online” with Exchange and (if the user is running Outlook’s default “cached mode”) synchronises any changes made between Outlook and Exchange since the client was last online.
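To make the tunneling idea a little more concrete, here’s a minimal sketch of the kind of check you could run from any internet-connected network: a plain HTTPS request to the RPC proxy endpoint that Outlook Anywhere publishes. The hostname is a placeholder, and an unauthenticated request would normally just be answered with a 401 challenge – the point is that nothing beyond outbound HTTPS is needed.

```python
# Minimal reachability sketch for an RPC-over-HTTPS (Outlook Anywhere) endpoint.
# "mail.example.com" is a placeholder for your published Exchange hostname, and
# /rpc/rpcproxy.dll is the RPC proxy virtual directory that the tunnel targets.
# An unauthenticated request is normally answered with a 401 challenge rather
# than a connection failure - proving plain outbound HTTPS is all that's needed.
import http.client

HOST = "mail.example.com"  # assumption: your externally published hostname

def probe_outlook_anywhere(host: str, timeout: float = 10.0) -> int:
    """Return the HTTP status the RPC proxy endpoint gives us over HTTPS."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", "/rpc/rpcproxy.dll")
        return conn.getresponse().status
    finally:
        conn.close()

if __name__ == "__main__":
    print(probe_outlook_anywhere(HOST))  # expect a 401 challenge from a healthy endpoint
```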

A sometimes overlooked benefit of using regular internet protocols to connect the client and servers is that the communication can leave one protected network, traverse the unprotected internet within a secure channel, then enter a second protected network. This means that (for example) your users could be connected to a customer or partner’s own internal network, yet still go through that network’s firewall to reach your Exchange server. If you required a VPN to be established to connect Outlook and Exchange, it almost certainly wouldn’t be possible to use a protected network as your starting point, since the owners of that network will typically block the outbound connections that VPN clients use, but will allow outbound connections on HTTPS.

Now, RPC/HTTP was part of Outlook and Exchange 2003, but it’s been improved in Exchange 2007 and is easier to get up and running. If you’re also using Outlook 2007, the client configuration is a whole lot simpler – even if it’s the first time a user has ever connected to Exchange, all they may need to know is their email address and password, and Outlook will be able to find the Exchange server and configure itself using whatever defaults you’ve set. The technology behind this ease of configuration is called the Autodiscover service, and the whole area of “connecting over the internet” functionality has also been given a more descriptive (to the non-techies, anyway) name: Outlook Anywhere.
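Under the covers, Autodiscover is just an XML conversation over HTTPS: the client posts the user’s email address to a well-known URL and gets back the settings it needs. The sketch below shows the shape of that exchange – the domain and address are hypothetical, and a real request has to be authenticated, so running it as-is would simply prove the endpoint is published.

```python
# Sketch of the Autodiscover exchange Outlook 2007 performs on first run.
# "example.com" and the mailbox address are placeholders; a genuine request
# must be authenticated, so an anonymous POST will typically get a 401 back.
import urllib.error
import urllib.request

EMAIL = "user@example.com"
URL = "https://autodiscover.example.com/autodiscover/autodiscover.xml"

REQUEST_XML = f"""<?xml version="1.0" encoding="utf-8"?>
<Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006">
  <Request>
    <EMailAddress>{EMAIL}</EMailAddress>
    <AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema>
  </Request>
</Autodiscover>"""

req = urllib.request.Request(URL, data=REQUEST_XML.encode("utf-8"),
                             headers={"Content-Type": "text/xml"}, method="POST")
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read()[:200])  # XML telling Outlook where its servers are
except urllib.error.HTTPError as e:
    print("Autodiscover answered with HTTP", e.code)  # 401 = published, wants credentials
```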

From an end-user point of view, this technology is almost silent – remote laptop users working at home often just start up their laptop, which connects automatically to a home wireless network and out to the internet, and Outlook goes straight to Exchange and they’re online. Deploying this technology within Microsoft saw the volume of VPN traffic drop dramatically, and calls to the help desk concerning remote access fell significantly too.

NET: Using Outlook 2007 and Exchange 2007 together simplifies the provision of remote access to remote users, particularly when using Outlook in “cached mode”. This configuration reduces, or even removes, the need to provide Virtual Private Network access, which could make the user experience better and save management overhead and expense.

Web client access instead of Outlook

Another element of flexible or remote working might be to use the web to get to email – maybe your remote users just want to quickly check email or calendar on their home PC, rather than using a laptop. Maybe there are workers who want to keep abreast of things when they’re on holiday, and have access to a kiosk or internet-cafe type PC. Or perhaps your users are in their normal place of work, but don’t use email much, or don’t log in to their own PC?

Outlook Web Access has been around for a number of versions of Exchange, and just gets better with every release. The 2007 version has added large areas of functionality (like support for the Unified Messaging functionality in Exchange, or huge improvements in handling the address book), meaning that for a good number of users, it’s as functional as they’d need Outlook to be. It’s increasingly feasible to have users accessing OWA as their primary means of getting to Exchange. One possible side benefit here is a licensing one – although you’d still be required to buy an Exchange Client Access License (which gives the user or the device the rights to connect to the server), you won’t need to buy Outlook or the Microsoft Office suite.

Outlook Web Access not only gives the web user the ability to use email, calendar etc, but it can also provide access to internal file shares and/or SharePoint document libraries – the Exchange server will fetch data from internal sources and display it to the reader within their browser. It can also take Office documents and render them in HTML – so a spreadsheet or document can be read on a PC with no copy of Office available, or simply read without needing to download a copy for client-side rendering in an application.

It’s possible to control what happens to attachments within OWA – some organisations don’t want people to be able to download attached files, in case copies are left behind on public PCs such as those in internet cafes – how many users would just save the document to the desktop, and maybe forget to delete it? With server-side rendering of documents, all traces of the document are removed when the user logs out or their connection times out.

Even for predominantly office-based users, OWA can provide a good way of getting to mail from some other PC, without needing to configure anything or log in to the machine – in that respect, it’s just like Hotmail, where you go to a machine and enter your username and password to access the mail, rather than having to log in to the whole PC as a given user.

If you deploy Outlook Anywhere (aka RPC/HTTP), you’ll already have all the infrastructure you need to enable Outlook Web Access – it uses the same Exchange Client Access server role (in fact, in Microsoft’s own deployment, “Outlook Anywhere” accounts for about 3/4 of all the remote traffic, with the rest being made up of OWA and Exchange Activesync).

NET: Outlook Web Access gives a functionally rich yet easy-to-use way of getting to data held on Exchange, and possibly elsewhere on the internal network, delivered securely to an external web browser. OWA 2007 has replicated more of Outlook’s functionality (such as great improvements to accessing address books), such that users familiar with Outlook will need little or no training, and users who don’t have Outlook may be able to rely on OWA as their primary means of accessing mail.

Mobile mail with ActiveSync

Exchange 2003 SP2 and an update to Windows Mobile 5 introduced the first out-of-the-box “push mail” capability for Exchange, as part of the Microsoft Exchange ActiveSync protocol that’s also licensed to a number of other mobile device vendors. This allows Exchange to use the same infrastructure that’s already in place for web access and for Outlook Anywhere to push mail to mobile devices and to synchronise other content with them (like calendar updates or contact information). The Exchange ActiveSync capability in Exchange 2007 has been enhanced further, along with parallel improvements in the new Windows Mobile 6 client software for mobile devices.

Now it’s possible to flag messages for follow-up, read email in HTML format, set Out of Office status, and take advantage of a whole ton of other functional enhancements which build on the same infrastructure described above. There’s no subscription to an external service required, and no additional servers or other software – reducing the cost of acquisition and deployment, and (potentially) the total cost of ownership. Analyst firm Wipro published some research, updated in June 2007, looking into TCO for mobile device platforms, in which they conclude that Windows Mobile 5 and Exchange ActiveSync would be 20-28% lower in cost (over 3 years) than an equivalent BlackBerry infrastructure.

NET: Continuing improvements in Exchange 2007 and Windows Mobile 6 will further enhance the user experience of mobile access to mail, calendar, contacts & tasks. Overall costs of ownership may be significantly lower than alternative mobile infrastructures, especially since the Microsoft server requirements may already be in place to service Outlook Anywhere and Outlook Web Access.

A last word on security

Of course, if you’re going to publish an Exchange server – which sits on your internal network, and has access to your internal Active Directory – to the outside world, you’ll need to make sure you take account of good security practice. You probably don’t want inbound connections from what are (at the outset) anonymous clients coming through your firewall and connecting straight to Exchange – for one thing, they’ll have come through the firewall within an encrypted SSL session (the S in HTTPS), and since you don’t yet know who the end user is, an outsider could be using that connection to mount a denial of service attack or similar.

Microsoft’s ISA Server is a certified firewall which can be the endpoint for the inbound SSL session (so it decrypts that connection), can challenge the client to authenticate, and can inspect the session to verify that what’s going on is legitimate protocol traffic (and not an attacker trying to flood your server). The “client” could be a PC running Outlook, a mobile device using ActiveSync or a web browser trying to access Outlook Web Access. See this whitepaper for more information on publishing Exchange 2007 onto the internet using ISA.

The Wal-Mart Effect

Here’s an interesting book on a business force which is changing the way the US economy works, if you believe what the author is saying. Wal-Mart (which owns ASDA in the UK) has been growing like crazy in recent years, to the point where they’re big enough, supposedly, to have a direct impact on the inflation rate in an economy the size of the US.

One startling aspect of this exposé is the effect that a company as powerful as Wal-Mart can have on its suppliers… normally reported as a bad thing, but there are good things too. An example of the latter was a company shipping goods into the US, which were then taken to its own distribution centres, repackaged and sent out to Wal-Mart’s distribution chain, and then on down to the stores.

Once the two companies started sharing more detailed information with each other, Wal-Mart revealed that it was sending empty trucks from stores all over the country back to its regional centre – trucks which could be used by this supplier. So the supplier started importing its goods bound for Wal-Mart into Florida, and using Wal-Mart’s own trucks to ship the merchandise straight to their own distribution centres, thereby cutting out waste & expense.

It’s an interesting read – there may even be some parallels between Wal-Mart and Microsoft, some positive and others not. Microsoft’s Chief Operating Officer used to be Wal-Mart’s CIO, responsible for (among other things) one of the largest databases in the world, where Wal-Mart’s suppliers could see into the sales of their products across the entire distribution chain, as they happened… Quite some system…

The business case for Exchange 2007 – part III

This is a continuation of an occasional series of articles about how specific capabilities of Exchange 2007 can be mapped to business challenges. The other parts, and other related topics, can be found here.

GOAL: Lower the risk of being non-compliant

Now here’s a can of worms. What is “compliance”?

There are all sorts of industry- or geography-specific rules around both data retention and data destruction, and knowing which ones apply to you and what you should do about them is pretty much a black art for many organisations.

The US Sarbanes-Oxley Act of 2002 came into being to make corporate governance and accounting information more robust, in the wake of various financial scandals (such as the collapse of Enron). Although SOX is a piece of US legislation, it applies not just to American companies, but to any foreign company which has a US stock market listing or is a subsidiary of a US parent.

The Securities and Exchange Commission defines a 7-year retention period for financial information, and for other associated information which forms part of the audit or review of that financial information. Arguably, any email or document which discusses a major issue for the company, even if it doesn’t make specific reference to the impact on corporate finance, could be required to be retained.

These requirements can understandably cause IT managers and CIOs to worry that they might not be compliant with whatever rules they are expected to follow, especially since those rules vary hugely in different parts of the world and, for any global company, can be highly confusing.

So, for anyone worried about being non-compliant, the first thing they’ll need to do is figure out what it would take for them to be compliant, and how they can measure up to that. This is far from an easy task, and a whole industry has sprung up to try to reassure the frazzled executive that if they buy this product/engage these consultants, then all will be well.

NET: Nobody can sell you out-of-the-box compliance solutions. They will sell you tools which can be used to implement a regime of compliance, but the trick is knowing what that looks like.

Now, Exchange can be used as part of the compliance toolset, in conjunction with whatever policies and processes the business has in place, to ensure appropriate data retention and a proper discovery process that can prove that something either exists or does not.

There are a few things to look out for, though…

Keeping “everything” just delays the impact of the problem, doesn’t solve it

I’ve seen so many companies implement archiving solutions where they just keep every document or every email message. I think this is storing up big trouble for the future: it might solve the immediate problem of ticking the box to say everything is archived, but management of that archive is going to become a problem further down the line.

Any reasonable retention policy will specify that documents or other pieces of information of a particular type or topic need to be kept for a period of time. They don’t say that every single piece of paper or electronic information must be kept.

NET: Keep everything you need to keep, and decide (if you can) what is not required to be kept, and throw it away. See a previous post on using Managed Folders & policy to implement this on Exchange.

Knowing where the data is kept is the only way you’ll be able to find it again

It seems obvious, but if you’re going to get to the point where you need to retain information, you’d better know where it’s kept otherwise you’ll never be able to prove that the information was indeed retained (or, sometimes even more importantly, prove that the information doesn’t exist… even if it maybe did at one time).

From an email perspective, this means not keeping data squirreled away on the hard disks of users’ PCs, or in the form of email archives which can only be opened via a laborious and time consuming process.

NET: PST files on users’ PCs or on network shares are bad news for any compliance regime. See my previous related post on the mailbox quota paradox of thrift.

Exchange 2007 introduced a powerful search capability which allows end users to run searches against everything in their mailbox, whether from Outlook, a web client or even a mobile device. The search technology makes it so easy for an individual to find emails and other content that a lot of people have pretty much stopped filing emails and just let them pile up, knowing they can find the content again, quickly.

The same search technology offers an administrator (and this would likely not be the email admins: more likely a security officer or director of compliance) the ability to search across mailboxes for specific content, carrying out a discovery process.

Outsourcing the problem could be a solution

Here’s something that might be of interest, even if you’re not running Exchange 2007 – having someone else store your compliance archive for you. Microsoft’s Exchange Hosted Services came about as part of the company’s acquisition of FrontBridge a few years ago.

Much attention has been paid to the Hosted Filtering service, where all inbound mail for your organisation is delivered first to the EHS datacentre, scanned for potentially malicious content, and the clean mail then delivered down to your own mail systems.

Hosted Archive is a companion technology which runs on top of the filtering: since all inbound (and outbound) email is routed through the EHS datacentre, it’s a good place to keep a long-term archive of it. And if you add journaling into the mix (where every message internal to your Exchange world is also copied up to the EHS datacentre), then you could tick the box of having kept a copy of all your mail, without really having to do much. Once you’ve got the filtering up & running anyway, enabling archiving is a phone call away and all you need to know at your end is how to enable journaling.

NET: Using hosted filtering reduces the risk of inbound malicious email infecting your systems, and of you spreading infected email to other external parties. Hosting your archive in the same place makes a lot of sense, and is a snap to set up.

Exchange 2007 does add a little to this mix though, in the shape of per-user journaling. In this instance, you could decide you don’t need to archive every email from every user, but only certain roles or levels of employee (eg HR and legal departments, plus board members & executives).

Now, using Hosted Archive does go against what I said earlier about keeping everything – except that in this instance, you don’t need to worry about how to do the keeping… that’s someone else’s problem…

Further information on using Exchange in a compliance regime can be seen in a series of video demos, whitepapers and case studies at the Compliance with Exchange 2007 page on Microsoft.com.

Sometimes, you know you didn’t pay enough

… (and sometimes you probably suspect you paid too much)

A common trait in western cultures is the eye for a good deal – you know, getting two-for-the-price-of-one, or thinking that it’s worth buying something because it’s on sale and you’ll save 25%, rather than because you really need it or wanted it beforehand.

I saw a quotation the other day which set me thinking… John Ruskin, a leading 19th-century English artist, all-round intellectual and writer on culture & politics, said:

“There is hardly anything in the world that someone cannot make a little worse and sell a little cheaper, and the people who consider price alone are that person’s lawful prey. 

It is unwise to pay too much, but it is also unwise to pay too little. 

When you pay too much, you lose a little money, that is all. When you pay too little, you sometimes lose everything because the thing you bought is incapable of doing the thing you bought it to do. 

The common law of business balance prohibits paying a little and getting a lot… It can’t be done.

If you deal with the lowest bidder it is well to add something for the risk you run. 

And if you do that you will have enough to pay for something better.” — John Ruskin (1819-1900)

This is something that executives at Mattel toys are maybe mulling over right now, but it’s probably a valuable lesson to any consumer about the risk of going for the absolute cheapest, whatever it is you’re buying and at whatever price point.

There’s probably an economic principle to explain all this, but I’ve no idea what it’s called

As it happens, I’ve been getting back into cycling recently and that’s required me to spend a great deal of time and money poring over bikes & accessories, whilst learning about all the differences between manufacturers, model ranges etc.

In short, they’re all much of a muchness. Just like computers, consumer electronics, or cars – is last year’s model really so inferior to the all-shiny new one that it’s worth paying the premium for the up-to-date one? And how can a single manufacturer make such a huge range of related products and still retain its aspirational brand values? (quality, excellence, durability, performance, blah blah blah)

I’ve pretty much come to the conclusion that for any individual at any point in time, there is a point where whatever it is you’re looking at is just too cheap, too low-spec for your needs. Sure, I can buy a mountain bike for £50 in supermarkets or junk shops, but it’ll be heavy and not as well screwed together as a more expensive one I might get from a good cycle shop.

There’s a similar principle in all sorts of consumer areas – wine, as another example. It’s possible to buy wine at £3 a bottle, but it’s going to be pretty ropey. At £5 and up you start getting really noticeable improvements – maybe a £6 bottle could be considered five times better than a £3 bottle. It’s unlikely that this will carry on, though: at some point you’ll pay double and the more expensive product will hardly be any better to most people. For someone, though, that might be the mid-point in their curve – a curve which stretches from too cheap at one end to too expensive at the other, with a nice flat bit in the middle where they really want to be.

The far end of that curve would be the point where buying something too expensive will be wasted – if I only need the mountain bike to go to the shops on a Sunday morning for the newspapers, I could do without a lot of the lightweight materials or fancy suspension that a better bike would have. Ditto, if I’m an average cyclist, I won’t need a top-of-the-range carbon bike since it won’t make any difference to my “performance” (though try saying that to all the golfers who regularly sink their salaries into buying all the latest kit, without having any meaningful impact on their game).

Maybe it won’t be “wasted”, but I just won’t have any way of judging it against other products near it in price – if I’m in the market for a MINI and yet looked at the comparative price difference between a Ferrari and an Aston Martin, I wouldn’t rationally be able to say that one is better and worth the premium over the other.

So what does any of this have to do with software?

A two-fold principle, I suppose: on one hand, maybe you shouldn’t buy the latest and greatest piece of software without knowing what it will do for you and why. Or if you do buy the new version, have you really invested any effort into making sure you’re using it to its maximum potential?

Look at the new version of Microsoft Office, with the much-discussed “Ribbon” UI (actually, this link is a great training resource – it shows you the Office 2003 application; you click on an icon or menu item, and it takes you to the location of the same command in the new UI).

The Ribbon scares some people when they see it, as they just think “all my users will need to be re-trained”, and they maybe ask “how can I make it look like the old version?”.

The fact that the Ribbon is so different gives us an excellent opportunity to think about what the users are doing in the first place – rather than taking old practices and simply transplanting them into the new application, maybe it’s time to look in more depth at what the new application can do, and see if the old ways are still appropriate.

A second point would be to be careful about buying software which is too cheap – if someone can give it away for free, or it’s radically less expensive than the rest of the software in that category, are you sure it’s robust enough, and that it will have a good level of support behind it (and not just now, but in a few years’ time)? What else is the supplier going to get out of you, if they’re subsidising that low-cost software?

Coming back to Ruskin: it’s quite ironic that doing a quick search for that quote online reveals lots of businesses who’ve chosen it as a motto on their web site. Given that Ruskin was an opponent of capitalism (in fact he gave away all the money he inherited upon his father’s death), I wonder how he would feel about the practice of many companies using his words as an explanation of why they aren’t cheaper than their competitors?

The business case for Exchange 2007 – part II

(This is a follow-on to the previous post on measuring business impact and the first post on the business case for Exchange 2007, and these are my own thoughts on the case for moving to Exchange 2007. It’s part of a series of posts which I’m trying to keep succinct, though they tend to be a bit longer than usual. If you find them useful, please let me know…)

GOAL: Reduce the backup burden

Now I’m going to start by putting on the misty rose-tinted specs and thinking back to the good old days of Exchange 4.0/5.x. When server memory was measured in megabytes and hard disk capacity in the low Gbs, the performance ceilings were much lower than they are today.

Lots of people deployed Exchange servers with their own idea of how many users they would “fit” onto each box – in some cases, it would be the whole organisation; in others, it would be as many users as that physical site had (since good practice was then to deploy a server at every major location); some would be determined by how many mailboxes that server could handle before it ran out of puff. As wide area networks got faster, more reliable and less expensive, and as server hardware got better and cheaper, the bottleneck for lots of organisations stopped being how many users the server could handle, and became how many users IT was comfortable having the server handle.

On closer inspection, this “comfort” level would typically come about for 2 reasons:

  • Spread the active workload – If the server goes down (either planned or unplanned), I only want it to affect a percentage of the users rather than everyone. This way, I’d maybe have 2 medium-sized servers and put 250 users on each, rather than 500 users on one big server.
  • Time to Recovery is lower – If I had to recover the server because of a disaster, I only have so many hours (as the SLA might state) to get everything back up and running, and it will take too long to restore that much data from tape. If I split the users across multiple servers, then the likelihood of a disaster affecting more than one server may be lower, and,  in the event of total site failure, the recovery of multiple servers can at least be done in parallel.

(Of course, there were other reasons, initially – maybe people didn’t believe the servers would handle the load, so played safe and deployed more than they really needed… or third party software, like Blackberry Enterprise Server, might have added extra load so they’d need to split the population across more servers).

So the ultimate bottleneck is the time it takes for a single database or a single server’s data to be brought back online in the event of total failure. That time is a function of how fast the backup media is (older DAT-type tape backup systems might struggle to do 10Gb/hr, whereas a straight-to-disk backup might do 10 or 20 times that rate), and the target time you’re working to is often referred to, in mumbo-jumbo whitepaper speak, as the “RTO” or Recovery Time Objective. If you’ve only got 6 hours before you need to have the data back online, and your backup media can restore 20Gb/hr, then you could only afford to have a maximum of 120Gb to recover and still have a hope of meeting the SLA.
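That back-of-the-envelope calculation is worth keeping handy when deciding how much data to allow onto any one database or server. A tiny sketch, using the illustrative figures above:

```python
# How much data can a database/server hold and still be restorable within the
# SLA? restore rate x allowed hours = maximum recoverable volume.
# The figures below are the post's illustrative numbers, not guidance.

def max_restorable_gb(restore_rate_gb_per_hr: float, rto_hours: float) -> float:
    """Largest data volume (GB) that can be restored within the recovery window."""
    return restore_rate_gb_per_hr * rto_hours

print(max_restorable_gb(20, 6))    # slow tape, 6-hour window      -> 120 GB
print(max_restorable_gb(200, 6))   # disk-to-disk at 10x the rate  -> 1200 GB
```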

There are a few things that can be done to mitigate this requirement:

  • Agree a more forgiving RTO.
  • Accept a lower RPO (the Recovery Point Objective is, in essence, the point you need to recover back to – eg all the data back up and running, or possibly just service restored but with no historical data, such as with dial-tone recovery in Exchange).
  • Reduce the volume of data which will need to be recovered in series – by separating out into multiple databases per server, or by having multiple servers.

Set realistic expectations

Now, it might sound like a non-starter to say that the RTO should be longer, or the RPO less functional – after all, the whole point of backup & disaster recovery is to carry on running even when bad stuff happens, right?

It’s important to think about why data is being backed up in the first place: it’s a similar argument to using clustering for high availability. You need to really know whether you’re looking for availability or recoverability. Availability means that you can keep a higher level of service, by continuing to provide service to users even when a physical server or other piece of infrastructure is no longer available, for whatever reason. Recoverability, on the other hand, is the ease and speed with which service and/or data can be brought back online following a more severe failure.

I’ve spoken with lots of customers over the years who think they want clustering, but in reality they don’t know how to operate a single server in a well-managed and controlled fashion, so adding clusters would make things less reliable, not more. I’ve also spoken with customers who think they need site resilience, so if they lose their entire datacenter, they can carry on running from a backup site.

Since all but the largest organisations tend to run their datacenters in the same place where their users are (whether that “datacenter” is a cupboard under the stairs or the whole basement of their head office), in the event that the entire datacenter is wiped out, it’s quite likely that they’ll have lots of other things to worry about – like where the users are going to sit? How is the helpdesk going to function, and communicate effectively with all those now-stranded users? What about all the other, really mission critical applications? Is email really as important as the sales order processing system, or the customer-facing call centre?

In many cases, I think it is acceptable to have a recovery point objective of delivering, within a reasonable time, a service that will enable users to find each other and to send & receive mail. I don’t believe it’s always worth the effort and expense that would be required to bring all the users’ email online at the same time – I’d rather see mail service restored within an hour, even if it takes 5 days for the historical data to come back, than wait 8 hours for any kind of service which included all the old data.

How much data to fit on each server in the first place

Microsoft’s best practice advice has been to limit the size of each Exchange database to 50Gb (in Exchange 2003), to make the backup & recovery process more manageable. If you built Exchange 2003 servers with the maximum number of databases, this would set the size “limit” of each server to 1Tb of data. In Exchange 2007, this advisory “limit” has been raised to 100Gb maximum per database, unless the server is replicating the data elsewhere (using the Continuous Replication technology), in which case it’s 200Gb per database. Oh, and Exchange 2007 raises the total number of databases to 50, so in theory, each server could now support 10Tb of data and still be recoverable within a reasonable time.

The total amount of data that can be accommodated on a single server is often used to make a decision about how many mailboxes to host there, and how big they should be – it’s pretty common to see sizes limited to 200Mb or thereabouts, though it does vary hugely (see the post on the Exchange Team blog from a couple of years ago to get a flavour). Exchange 2007 now defaults to having a mailbox quota of 10 times that size: 2Gb, made possible through some fundamental changes to the way Exchange handles and stores data.

Much of this storage efficiency derives from Exchange 2007 running on 64-bit (x64) servers, meaning there’s potentially a lot more memory available for the server to cache disk contents in. A busy Exchange 2003 server (with, say, 4,000 users) might only have enough memory to cache 250Kb of data for each user – probably not even enough for caching the index of the user’s mailbox, let alone any of the data. In Exchange 2007, the standard recommendation would be to size the server so as to have 5Mb or even 10Mb of memory for every user, resulting in dramatically more efficient use of the storage subsystem. The payoff is that a traditional Exchange performance bottleneck – the storage subsystem’s I/O throughput – is reduced considerably.
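Pulling the last few paragraphs’ figures together gives a feel for how the sizing arithmetic changes. This is purely illustrative, using the advisory limits and defaults quoted above rather than any formal sizing guidance:

```python
# Illustrative Exchange 2007 sizing arithmetic from the figures above.
DATABASES_PER_SERVER = 50     # maximum number of databases per server
GB_PER_DATABASE      = 200    # advisory limit when using continuous replication
MAILBOX_QUOTA_GB     = 2      # the new default mailbox quota
CACHE_MB_PER_USER    = 5      # recommended 5-10 MB of RAM per user

max_data_gb   = DATABASES_PER_SERVER * GB_PER_DATABASE     # 10,000 GB ~ 10 TB per server
max_mailboxes = max_data_gb // MAILBOX_QUOTA_GB            # 5,000 full 2 GB mailboxes
cache_gb      = max_mailboxes * CACHE_MB_PER_USER / 1000   # ~25 GB of RAM for caching

print(max_data_gb, max_mailboxes, round(cache_gb))
```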

NET: Improvements in the underlying storage technology within Exchange 2007 mean that it is feasible to store a lot more data on each server, without performance suffering and without falling foul of your RTO/SLA goals.

I’ve posted before about Sizing Exchange 2007 environments.

What to back up and how?

When looking at backup and recovery strategies, it’s important to consider exactly what is being backed up, how often, and why.

Arguably, if you have a 2nd or 3rd online (or near-online) copy of a piece of data, then it’s less important to back it up in a more traditional fashion, since the primary point of recovery will be another of the online copies. The payoff for this approach is that it no longer matters as much if it takes a whole weekend to complete writing the backup to whatever medium you’re using (assuming some optical or magnetic media is still in play, of course), and that slower backup is likely to be used only for long-term archival or for recovery in a true catastrophe when all replicas of the data are gone.

Many organisations have sought to reduce the volume of data on Exchange for the purposes of meeting their SLAs, or because keeping large volumes of data on Exchange was traditionally more expensive due to the requirements for high-speed (and often shared) storage. With more memory in an Exchange server, thanks to it being 64-bit, the hit on I/O performance can be much lower, meaning that a 2007 server could host more data with the same set of disks than an equivalent 2003 server would (working on the assumption that Exchange has historically hit disk I/O throughput bottlenecks before running out of disk space). The simplest way to reduce the volume of data stored on Exchange (and therefore the data which needs to be backed up and recovered on Exchange) is to reduce the mailbox quota of the end users.

In the post, Exchange mailbox quotas and ‘a paradox of thrift’, I talked about the downside of trying too hard to reduce mailbox sizes – the temptation is for the users to stuff everything into a PST file and have that backed up (or risk it being lost!) outside of Exchange. Maybe it’s better to invest in keeping more data online on Exchange, where it’s always accessible from any client – unlike some archiving systems which require client-side software (rendering the data inaccessible to non-Outlook clients), don’t replicate the data to users’ PCs when running in Cached Mode, and don’t have it indexed for easy retrieval by either the Exchange server or the client PC.

NET: Taking data off Exchange and into either users’ PST archive files or a centralised archiving system may reduce the utility of the information by making it less easy to find and access, and could introduce more complex data management procedures as well as potential additional costs of ownership.

Coming to a datacenter near you

An interesting piece of “sleeper” technology may help reduce the discussions of backup technique: known simply as DPM, or System Center Data Protection Manager to give it its full title. DPM has been available for a while, targeted at backing up and restoring file server data, but the second release (DPM 2007) is due soon and adds support for Exchange (as well as SharePoint and SQL databases). In essence, DPM is an application which runs on Windows Server and manages snapshots of the data source(s) it’s been assigned to protect. The server will happily take snapshots at regular intervals and can keep them in a near-line state or spool them off to offline (ie tape) storage for archival.

With very low-cost but high-capacity disks (such as Serial Attached SCSI arrays or even SATA disks deployed in fault-tolerant configurations), it could be possible to have DPM servers capable of backing up many Tbs of data as the first or second line of backup, before spooling off to tapes on an occasional basis for offsite storage. A lot of this technology has been around in some form for years (with storage vendors typically having their own proprietary mechanisms to create & manage the snapshots), but with a combination of Windows’ Volume Shadow Copy Service (VSS), Exchange’s support for VSS, and DPM providing the back-end to the whole process, the cost of entry could be significantly lower.

NET: Keeping online snapshots of important systems doesn’t need to be as expensive as in the past, and can provide a better RTO and RPO than alternatives.

So, it’s important to think about how you back up and restore the Exchange servers in your organisation, but by using Exchange 2007, you could give the users a lot more quota than they’ve had before. Using Managed Folders in Exchange, you could cajole the users into clearing out the stuff they don’t need to keep, and more easily keeping the stuff they do. All the while, it’s now possible to make sure the data is backed up quickly, and at much lower cost than would previously have been possible with such volumes of data.

Exchange mailbox quotas and a ‘paradox of thrift’

The study of economics throws up some fantastic names for concepts or economic models, some of which have become part of the standard lexicon, such as the Law of Diminishing Returns, or the concept of opportunity cost, which I’ve written about before.


Though it sounds like it might be something out of Doctor Who, The Paradox of Thrift is a Keynesian concept which basically says that, contrary to what might seem obvious, saving money (as in people putting money into savings accounts) might be bad for the economy (in essence, if people saved more and spent or invested less, it would reduce the amount of money in circulation and cause an economic system to deflate). There’s a similar paradox in managing mailbox sizes in Exchange – from an IT perspective it seems like a good thing to reduce the total volume of mail on the server, since it costs less to manage all the disks and there’s less to back up and restore.


Ask the end users, however, and it’s probably a different story. I’ve lost count of how many times I’ve heard people grumble that they can’t send email because their mailbox has filled up (especially if they’ve been away from the office). End users might argue they just don’t have time to keep their mailbox size low through carefully ditching mail that they don’t need to keep, and filing the stuff that they do.



I guess it’s like another principle in economics – the idea that we have unlimited wants, but a limited set of resources with which to fulfil those wants & needs. The whole point of economics is to make best use of these limited resources to best satisfy the unlimited wants. Many people (with a few exceptions) would agree that they never have enough money – there’ll always be other, more expensive ways to get rid of it.


It’s important to have a sensible mailbox quota or the paradox of being too stingy may come back and bite you. Some organisations will take mail off their Exchange servers and drop it into a central archive, an approach which solves the problem somewhat but introduces an overhead of managing that archive (not to mention the cost of procurement). I’d argue that it’s better to use Managed Folders facilities in Exchange to manage the data.


The true paradox of mailbox quota thrift kicks in if the users have to archive everything to PST files: then you’ve just got the problem of how to make sure those are backed up… especially since it’s not supported to have them stored on a network drive (though that doesn’t stop people from doing it… personal folder files are unsupported over a LAN or a WAN link). Even worse (from a backup perspective), Outlook opens all the PST files configured in its profile for read/write. This means that every one of the PST files in your Outlook profile gets its date/time stamp updated every time you run Outlook.


This of course means that if you’re storing your PSTs on a network share (tsk, tsk), and that file share is being backed up every night (as many are), then your PSTs will be backed up every night, regardless of whether the job is incremental/differential or full. I’ve seen large customers (eg a 100,000+ user bank) who estimate that over 50% of the total data they back up, every day, is PST files. Since PSTs are used as archives by most people, by definition the contents don’t change much, but that’s irrelevant – the date/time stamp is still updated every time they’re opened.


So as well as losing the single-instance storage benefit you’d get by leaving the data in Exchange (or getting the users to delete it properly), you’re consuming possibly massive amounts of disk space on file servers, and having to deal with huge amounts of data to be backed up every night, even though it doesn’t change.


If you had an Exchange server with 1,000 users and set the mailbox quota at 200Mb, you might end up with 75% average quota usage; with a 10% saving from single-instance storage, you’d have about 135Gb of data on that server, which would be backed up in full every week, with incremental or differential backups every night in between (which will be a good bit smaller, since not all that much data changes day to day).


If each of those users had 1Gb of PST files (not at all extraordinary – I currently have nearly 15Gb of PSTs loaded into Outlook! – even with a 2Gb quota on the mailbox, which is only 30% full), then you could be adding 1Tb of data to the file servers, hurting the LAN performance by having those PSTs locked open over the network, and being backed up every day… Give those users a 2Gb mailbox quota, and stop them from using PSTs altogether, and they’d be putting 1.2Tb worth of data onto Exchange, which might be more expensive to keep online than 1Tb+ of dumb filestore, but it’s being backed up more appropriately and can be controlled much better. 
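To make that arithmetic explicit, here’s a small sketch using the same illustrative numbers – 1,000 users, a 200Mb quota at 75% usage with a 10% single-instance saving, versus 1Gb of PSTs per user swept up by the nightly file-server backup:

```python
# The quota arithmetic from the example above, made explicit (illustrative numbers).
USERS = 1000

def exchange_data_gb(quota_gb, usage=0.75, sis_saving=0.10):
    """Data held on Exchange given a quota, average usage and single-instance saving."""
    return USERS * quota_gb * usage * (1 - sis_saving)

on_exchange_gb = exchange_data_gb(0.2)   # 200 MB quota -> ~135 GB, full backup weekly
pst_gb = USERS * 1                       # 1 GB of PSTs per user -> ~1 TB on file shares

# Because Outlook re-stamps every open PST, that ~1 TB is copied by the nightly
# file-server backup whether or not the contents actually changed.
print(round(on_exchange_gb), pst_gb)     # 135 GB on Exchange vs 1000 GB of PSTs nightly
```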


So: don’t be miserly with your users’ mailbox quotas. Or be miserly, and stop them from using PSTs altogether (in Outlook 2003) or stop the PSTs from getting any bigger (in Outlook 2007).

The business case for Exchange 2007

(This is a follow-on to the previous post on measuring business impact, and these are my own thoughts on the case for moving to Exchange 2007.)

There are plenty of resources already published which talk about the top reasons to deploy, upgrade or migrate to Exchange 2007 – the top 10 reasons page would be a good place to start. I’d like to draw out some tangible benefits which are maybe less obvious than the headline-grabbing “reduce costs”/”make everyone’s life better” type reasons. I’ll approach these reasons over a number of posts, otherwise this blog will end up reading like a whitepaper (and nobody will read it…)

GOAL: Be more available at a realistic price

High availability is one of those aims which is often harder to achieve than it first appears. If you want a really highly-available system, you need to think hard not only about which bits need to be procured and deployed (eg clustered hardware and the appropriate software that works with it), but also about how the systems management and operations teams are structured, so that they can actually deliver the promised availability. Also, a bit like disaster recovery, high availability is always easier to justify following an event where not having it is sorely missed… eg if a failure happens and knocks the systems out of production for a while, it’ll be easier to go cap-in-hand to the budget holder and ask for more money to stop it happening again.

Example: Mrs Dalton runs her own business, and like many SMBs, money was tight when the company was starting up – to the extent that they used hand-me-down PC hardware to run their main file/print/mail server. I had always said that this needed to be temporary only, and that they really should buy something better, but it was always something that was going to happen in the future.

Since I do all the IT in the business (and I don’t claim to do it well – only well enough that it stops being a burden for me… another characteristic of small businesses, I think), and Mrs D is the first-line support for anyone in the office if/when things go wrong, it can be a house of cards if we’re both away. A year or two after they started, the (temporary) server blew its power supply whilst we were abroad on holiday, meaning there were no IT services at all – no internal network or internet access (since the DHCP server was now offline), which ultimately meant no networked printers, no file shares with all the client docs, no mail (obviously) – basically everything stopped.

A local PC repair company was called in and managed to replace the PSU (at predictably high expense), restoring normal service after 2 days of almost complete downtime.

Guess what? When we got back, the order went in for a nice shiny server with redundant PSU, redundant disks etc etc. No more questions asked…

Now a historical approach to making Exchange highly available would be to cluster the servers – something I’ve talked about previously in a Clustering & High Availability post.

The principal downside to the traditional Exchange 2003-style cluster (now known as a Single Copy Cluster) was that it required a Storage Area Network (at least if you wanted more than 2 nodes), which could be expensive compared to the kind of high-capacity local disk drives that might be the choice for a stand-alone server. Managing a SAN can be a costly and complex activity, especially if all you want to do with it is to use it with Exchange.

Also, with the Single-Copy model, there’s still a single point of failure – if the data on the SAN got corrupted (or worst case, the SAN itself goes boom), then everything is lost and you have to go back to the last backup, which could have been hours or even days old.

NET: Clustering Exchange, in the traditional sense, can help you deliver a better quality of service. Downtime through routine maintenance is reduced and fault tolerance of servers is automatically provided (to a point).

Now accepting that a single copy cluster (SCC) solution might be fine for reducing downtime due to more minor hardware failure or for managing the service uptime during routine maintenance, it doesn’t provide a true disaster-tolerant solution. Tragic events like the Sept 11th attacks, or the public transport bombs in cities such as London and Madrid, made a lot of organisations take the threat of total loss of their service more seriously … meaning more started looking at meaningful ways of providing a lights-out disaster recovery datacenter. In some industries, this is even a regulatory requirement.

Replication, Replication, Replication

Thinking about true site-tolerant DR just multiplies the complexity – in the SCC environment, the only supported way to replicate data to the DR site is to do it synchronously – ie the Exchange servers in site A write data to their SAN, which replicates that write to the SAN in site B, which acknowledges that it has received the data, all before the SAN in site A can acknowledge to the servers that the data has successfully been written. All this adds huge latency to the process, and can consume large amounts of high-speed bandwidth, not to mention requiring duplication of hardware and typically expensive software (to manage the replication) at both sites.
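A rough way to picture the cost of that synchronous round trip – with purely illustrative numbers, since real figures depend entirely on the SANs and the link between the sites:

```python
# Illustrative only: every write must complete locally, cross the inter-site
# link and complete remotely before Exchange gets its acknowledgement.
def sync_write_latency_ms(local_write_ms: float, inter_site_rtt_ms: float,
                          remote_write_ms: float) -> float:
    """Latency seen by the server for one write under synchronous replication."""
    return local_write_ms + inter_site_rtt_ms + remote_write_ms

local_only = 5.0                                     # ms for a local SAN write
stretched  = sync_write_latency_ms(5.0, 10.0, 5.0)   # 10 ms RTT between sites
print(local_only, stretched)                         # 5 ms vs 20 ms per write
```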

If you plan to shortcut this approach and use some other piece of replication software (which is installed on the Exchange servers at both ends) to manage the process, be careful – there are some clear supportability boundaries which you need to be aware of. Ask yourself – is taking a short cut to save money in a high availability solution, just a false economy? Check out the Deployment Guidelines for multi-site replication in Exchange 2003.

There are other approaches which could be relevant to you for site-loss resilience. In most cases, were you to completely lose a site (for a period measured at least in days, and possibly indefinitely), there will be other applications which need to be brought online more quickly than perhaps your email system – critical business systems on which your organisation depends. Also, if you lost a site entirely, there are the logistics of managing where all the people are going to go – work from home? Sit in temporary offices?

One practical solution here is to use something in Exchange 2003 or 2007 called Dial-tone recovery. In essence, it’s a way of bringing up Exchange service at a remote location without having immediate access to all the Exchange data. So your users can at least log in and receive mail, and be able to use email to communicate during the time of adjustment, with the premise that at some point in the near future (once all the other important systems are up & running), their previous Exchange mailbox data will be brought back online and they can access it again. Maybe that data isn’t going to be complete, though – it could be simply a copy of the last night’s backup which can be restored onto the servers at the secondary site.

Using Dial-tone (and an associated model called Standby clustering, where manual activation of standby servers in a secondary datacenter can bring service – and maybe data – online) can provide you with a way of keeping service availability high (albeit with a temporary lowering of quality, since all the historic data isn’t there) at a time when you might really need that service (ie in a true disaster). Both of these approaches can be achieved without the complexity and expense of shared disk storage, and without having to replicate the data in real time to a secondary location.

Exchange 2007 can help you solve this problem, out of the box

Exchange 2007 introduced a new model called Cluster Continuous Replication (CCR), which provides a near-real-time replication process. It’s modelled in such a way that you have a pair of Exchange mailbox servers (and they can only be doing the mailbox role, meaning you’re going to need other servers to take care of servicing web clients, performing mail delivery etc), with one of the servers “active” at any time. CCR takes care of keeping the second copy of the data up to date, and provides the mechanism to fail over automatically (or manually) between the two nodes, and the two copies of the data.

What’s perhaps most significant about CCR (apart from the fact that it’s in the box and therefore fully supported by Microsoft) is that there is no longer a requirement for the cluster nodes to access shared disk resources… meaning you don’t need a SAN (now, you may still have reasons for wanting a SAN, but it’s just not a requirement any more).

NET: Cluster Continuous Replication in Exchange 2007 can deliver a 2-node shared-nothing cluster architecture, where total failure of all components on one side can be automatically dealt with. Since there’s no requirement to share disk resources between the nodes, it may be possible to use high-speed, dedicated disks for each node, reducing the cost of procurement and the cost & complexity of managing the storage.

Exchange 2007 also offers Local Continuous Replication (LCR), designed for stand-alone servers to keep 2 copies of their databases on different sets of disks. LCR could be used to provide a low-cost way of keeping a copy of the data in a different place, ready to be brought online through a manual process. It is only applicable in a disaster recovery scenario, since it will not offer any form of failover in the event of a server failure or planned downtime.

Standby Continuous Replication (SCR) is the name given to another component of Exchange 2007, due to be part of the next service pack. This will provide a means to have standby, manually-activated, servers at a remote location, which receive a replica of data from a primary site, but without requiring the servers to be clustered. SCR could be used in conjunction with CCR, so a cluster which provides high availability at one location could also send a 3rd replica of its data to a remote site, to be used in case of total failure of the primary site.

The key point is “reasonable price”

In summary, then: reducing downtime in your Exchange environment through clustering presents some challenges.

  • If you only have one site, you can cluster servers to share disk storage and get a higher level of service availability (assuming you have the skills to manage the cluster properly). To do this, you’ll need some form of storage area network or iSCSI NAS appliance.
  • If you need to provide site-loss resilience (either temporary but major, such as a complete power loss, or catastrophic, such as total loss of the site), there are 3rd-party software-based replication approaches which may be effective, but are not supported by Microsoft. Although these solutions may work well, you will need to factor in the possible additional risk of a more complex support arrangement. The time you least want to be struggling to find out who can and should be helping you through a problem is when you’ve had a site loss and are desperately trying to restore service.
  • Fully supported site-loss resilience with Exchange 2003 can only be achieved by replicating data at a storage subsystem level – in essence, you have servers and SANs at both sites, and the SANs take care of real-time, synchronous, replication of the data between the sites. This can be expensive to procure (with proprietary replication technology not to mention high speed, low latency network to connect the sites – typically dark fibre), and complex to manage.
  • There are manual approaches which can be used to provide a level of service at a secondary site, without requiring 3rd party software or hardware solutions – but these approaches are designed to be used for true disaster recovery, not necessarily appropriate for short-term outages such as temporary power failure or server hardware failure.
  • The Cluster Continuous Replication approach in Exchange 2007 can be used to deliver a highly-available cluster in one site, or can be spanned across sites (subject to network capacity etc) to provide high-availability for server maintenance, and a degree of protection against total site failure of either location.

NET: The 3 different replication models which are integral to Exchange 2007 (LCR, CCR and SCR) can help satisfy an organisation’s requirements to provide a highly-available, and disaster-tolerant, enterprise messaging system. This can be achieved without requiring proprietary and expensive 3rd party software and/or hardware solutions, compared with what would be required to deliver the same service using Exchange 2003.

 

Topics to come in the next installments of the business case for Exchange 2007 include:

  • Lower the risk of being non-compliant
  • Reduce the backup burden
  • Make flexible working easier

“Success kills” – Marc Andreessen on Facebook

Like so many other people in the last few weeks, I started using Facebook. They’re growing at such a ridiculous rate, adding 100,000 new users every day, and it’s reckoned that 50% of the millions of active users return to the site every day.


Following a link from a post by Steve, I read Marc Andreessen’s opinions on why Facebook is so successful (what it’s done spectacularly right, and what, in his opinion, its shortcomings are). One particular section shocked me the most – it comes after discussing the viral spread of Facebook applications, focusing on iLike as probably the most successful. Facebook app developers need to host their own servers (and bandwidth) to provide the services that Facebook acts as the gateway to. When iLike launched, take-up of their application was near-exponential, which completely hammered the servers they had access to. Here’s what Andreessen says about what happened next:



Yesterday, about two weeks later, ILike announced that they have passed 3 million users on Facebook and are still growing — at a rate of 300,000 users per day.

They didn’t say how many servers they’re running, but if you do the math, it has to be in the hundreds and heading into the thousands.

Translation: unless you already have, or are prepared to quickly procure, a 100-500+ server infrastructure and everything associated with it — networking gear, storage gear, ISP interconnections, monitoring systems, firewalls, load balancers, provisioning systems, etc. — and a killer operations team, launching a successful Facebook application may well be a self-defeating proposition.

This is a “success kills” scenario — the good news is you’re successful, the bad news is you’re flat on your back from what amounts to a self-inflicted denial of service attack, unless you have the money and time and knowledge to tackle the resulting scale challenges.


I love that analogy – a self-inflicted DoS 🙂 But what a scary situation to be in – suddenly having to provide real-time, world-class infrastructure or risk losing the goodwill of everyone who’s accessing the service if it fails or is too slow.
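Just to make the “do the math” point concrete, here’s the sort of back-of-envelope sum he’s alluding to – the users-per-server figure is entirely my own assumption for illustration, not anything iLike or Facebook have published:

    # Back-of-envelope sketch only - every figure here is an assumption for
    # illustration, not a published number from iLike or Facebook.
    total_users = 3_000_000      # users reported after roughly two weeks
    daily_growth = 300_000       # new users per day
    users_per_server = 10_000    # assumed capacity of a single application server

    servers_needed_now = total_users / users_per_server
    servers_added_per_day = daily_growth / users_per_server

    print(f"~{servers_needed_now:.0f} servers needed today")      # ~300
    print(f"~{servers_added_per_day:.0f} extra servers per day")  # ~30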


All of which makes me think – where on earth does the revenue to pay for all this stuff come from?

Measuring business impact

I’m going to approach this topic over a number of posts, as it’s something I’ve been thinking about rather a lot lately.


Basically, the challenge is finding out what impact a change to the business environment will have – positive or negative – and then using that information either to justify making the change in the first place (so it’s not really measuring business impact, but estimating the future impact of an impending change), or to look back and decide whether an earlier change was a good thing (and perhaps whether it should continue).


Most of the time, when you read about managing business impact, reducing cost, improving flexibility and so on, it will be coming from someone trying to sell you something – an IT supplier saying that the latest version of this or that is going to solve all sorts of problems (some of which you don’t even know exist yet), or an IT or business analyst trying to sell you their insight and knowledge, without which you’re bound to fail and wind up on the scrapheap, counting all those missed opportunities you just couldn’t see at the time.


Numerous terms have arisen to try to describe this impact, or to frame a way of measuring its scale. Just a few examples:



TCO – Gartner Group coined the “Total Cost of Ownership” term in the late 1980s to describe the cost of running a whole IT system, not just the cost of buying or implementing it in the first place. It’s one of the most-used terms when it comes to talking about the benefits of some new IT system, partly because most businesses would see a benefit in reducing operational costs… and so think that TCO reduction is inevitably a good thing. The irony is that, in my experience at least, many businesses don’t really know what their costs are (other than at a high level), so measuring a change in TCO is going to be difficult to do at any specific level.


Think of support costs as an example – if a new project aims to reduce the cost of keeping everything running, the only way you’ll know whether it was effective is to know what the true cost was in the first place. I’ve seen businesses which can tell exactly how much it costs to provide really specific services to their users – like $10 a month to put a fixed phone on a user’s desk – and which can therefore estimate much more accurately how much of a saving will be generated by rationalising or improving the current systems.
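As a trivial worked example of why that baseline matters (every figure below is invented), estimating the saving is just the cost difference per user multiplied by the number of users and the period:

    # Hypothetical figures, purely to show why you need a baseline cost
    # before you can claim a saving.
    users = 2_000
    current_cost_per_user_per_month = 10.0    # e.g. the fixed desk phone above
    projected_cost_per_user_per_month = 7.5   # estimated cost after the change

    annual_saving = (current_cost_per_user_per_month
                     - projected_cost_per_user_per_month) * users * 12
    print(f"Estimated annual saving: ${annual_saving:,.0f}")  # $60,000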


RoI – a catch-all term for what the return on any investment will be and (in measuring terms at least) what the time frame for that return will be. Just as one way of making more money is to reduce the cost of operations, investing in something new which returns more money to the business is a clear way of growing. The downside of looking for an RoI on every investment, however, is that the knock-on return may show up in some associated project which you might not be expecting right now, or measuring currently. What I mean is that the change you make might not bring about any RoI in itself (increasing capacity on the network, say), but it will allow other projects (like deploying a new application) to be more effective.
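For what it’s worth, the simplest way RoI tends to be expressed is as a ratio of net gain to the original investment, often alongside a payback period – a quick sketch with made-up numbers:

    # Simple RoI and payback-period calculation, with invented figures.
    investment = 100_000.0      # up-front cost of the project
    annual_return = 40_000.0    # extra revenue or saving it generates each year
    years = 3

    net_gain = annual_return * years - investment
    roi = net_gain / investment                 # 0.2, i.e. 20% over three years
    payback_years = investment / annual_return  # 2.5 years to break even

    print(f"RoI over {years} years: {roi:.0%}")
    print(f"Payback period: {payback_years:.1f} years")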


TEI – Forrester Research came up with this one, possibly in answer to the noise Gartner was making about their TCO model… though it does go further than just looking at cost. “Total Economic Impact” tries to correlate cost, benefit and (most importantly, perhaps) the future flexibility that might come about from making some change, with the risk inherent in doing so.


Opportunity Cost


Even when thinking about the financial models for justifying expenditure (let’s assume it’s a new software deployment, which will have direct costs – licenses – and indirect costs – people’s time to test the software, training time for end users and so on), it’s very easy to get caught up in thinking too narrowly about the project in question.


One concept that stands out to me when talking about IT investment is that of opportunity cost – an economics term which isn’t really a measure of cost at all, but the value of the missed opportunity itself. In basic terms, the opportunity cost of going to the cinema on a Saturday afternoon is not going to see the football. In that example it’s a straight choice – you can only do one of those things at that point in time, and the cost is the missed opportunity to do the other. The choice comes down to deciding which is going to be better – which will cost less, or which might, perhaps, return a higher degree of satisfaction.
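A toy way of putting that definition into code – the “satisfaction” scores are invented, purely to illustrate the concept:

    # Toy illustration of opportunity cost: the value of the best alternative
    # you give up by making a choice. Scores are invented.
    options = {"cinema": 7, "football": 9}
    choice = "cinema"

    alternatives = {name: value for name, value in options.items() if name != choice}
    best_alternative = max(alternatives, key=alternatives.get)
    opportunity_cost = alternatives[best_alternative]

    print(f"Choosing {choice!r} means giving up {best_alternative!r} "
          f"(opportunity cost: {opportunity_cost})")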


Thinking about opportunity cost in business terms is a bit harder, since we often don’t know what the missed opportunity is until we look back some time later and realise it then. To flip that idea on its head, let’s say you want to measure how effective someone is at doing their job.


Business effectiveness


Just about every employer has measures in place to try to figure out how well their employees are doing – in relative terms, measuring performance against their peers, or in financial terms, deciding whether the resources used to employ that person could be better used in a different area, or whether more resources should be deployed to have more people doing that type of job.


Let’s take the example of a restaurant. Making a success of the business will depend on a whole load of relatively fixed factors – the location of the building, the decor and ambience of the place, for example – as well as lots of flexible things, like the quality and price of the food or the effectiveness of the service. There will even be external factors the restaurant can do nothing about, except possibly anticipate – such as a change in fashion, or a competitor opening up across the street.


If the quality of the food is poor when compared to the price, the standard of service and the overall ambience of the place, then customers will be unhappy and will likely vote with their feet. If the food is consistently average but cheap, then people will come for that reason (just look at most fast food outlets). Each of these factors could be varied – raising the price of the not-so-nice food, or paying more for ingredients to get higher quality, or firing the chef and replacing him with someone who’s more skilled – and they should make a difference, but the problem is in knowing (or guesstimating) what that difference will be before deciding on which factor to vary, and by how much.


When businesses look at how they invest in people, it’s easy to question the amount spent on any specific role. In the restaurant case, would adding another chef make the quality better? Would it improve the time it takes to get meals out to customers (something that’s maybe really important, if it’s a lunchtime restaurant but maybe less so in some cosy neighbourhood trattoria)? Would the knock-on effect be worth the extra cost of employing the chef? And would the extra chef just get in the way of the existing one, reducing their individual effectiveness?


I’ve said to people at my own employers in the past that the only way they will really be able to measure how good a job my team does is to stop us doing it and then observe what happens two years later. So what if the restaurant reduced the number of waiting staff, and switched from expensive fresh ingredients to cheaper frozen stuff, in an effort to reduce cost? On one hand, the figures might look good, because the cost of supply has just dropped and the operational costs have been reduced too.


But the longer-term impact might be that loyal customers drift away because the food isn’t the value it was before, or that a bad review from an unexpected visit by a restaurant critic does lasting damage. At that point, it could take a huge effort to turn things around and rebuild the tarnished name of the place.


So what’s the point of all this rambling? Well, in the next installment I’ll look at some of the TCO/ROI arguments around Exchange Server…