Living the dream with Office Communicator 2007

I’ve been a long-time fan of instant messaging and pervasive “presence”, especially the cultural changes it allows organisations to make in order to communicate and collaborate better. As a result, I’ve been really interested to see what’s been happening with Office Communications Server (the soon-to-be-released successor to Live Communications Server).

Around 6 weeks ago, I joined an internal MS deployment of full-voice OCS, meaning that my phone number was moved onto the OCS platform so now I’m not using the PBX at all. It’s been a remarkably cool experience in a whole lot of ways, but it really hits home just how different the true UC world might be, when you start to use it in anger.

I’ve been working from home today, and as long as my laptop is on the internet (regardless of whether I’m VPNed into the company network), the OCS server will route calls to my PC and simultaneously to my mobile, so I can pick them up wherever I am. As more and more people are using OCS internally, it’s increasingly the norm to just hit the “Call” button from within Office Communicator (the OCS client) or from Outlook, and not really care which number is going to be called.

brettjo on a Catalina

Here, I was having a chat with Brett and since we both have video cameras, I just made a video call – I was at home so just talked to the laptop in speakerphone mode; Brett was in the office, so he used his wired phone, which was plugged into his PC:

(this device is known internally as a “Catalina” and functions mainly as a USB speaker/microphone, but also has some additional capabilities like a message waiting light, a few hard-buttons, and a status light that shows the presence as currently set on OCS).

It’s a bit weird when you start using the phone and realise that you’re not actually going near a traditional PBX environment for a lot of the interaction. Calling up voice mail, as delivered by Exchange Unified Messaging, is as easy as pressing the “call voice mail” button in Communicator – no need to provide a PIN or an extension number, since the system already knows who I am and I’ve already authenticated by logging in to the PC.

When I use this, the “call” goes from my PC to OCS, then from the OCS server directly to the Exchange server, all as an IP data stream and without touching the traditional TDM PBX that we still have here. A third party voice gateway allows me to use OCS to call other internal people who are still homed on the PBX, and to make outbound calls.

Microsoft’s voice strategy of “VoIP As You Are” starts to make a lot of sense in this environment – I could deploy technology like OCS and Exchange UM and start getting immediate benefit, without needing to rip & replace the traditional phone system, at least not until it’s ready for obsolescence.

Here’s an idea of what kind of system is in place – for more information, check out Paul Duffy’s interview with ZDNet’s David Berlind.

The business case for Exchange 2007 – part II

(This is a follow-on to the previous post on measuring business impact and to the first post on the business case for Exchange 2007, and these are my own thoughts on the case for moving to Exchange 2007. It’s part of a series of posts which I’m trying to keep succinct, though they tend to be a bit longer than usual. If you find them useful, please let me know…)

GOAL: Reduce the backup burden

Now I’m going to start by putting on the misty, rose-tinted specs and thinking back to the good old days of Exchange 4.0/5.x. When server memory was measured in megabytes and hard disk capacity in the low gigabytes, the bottlenecks to performance were much lower than they are today.

Lots of people deployed Exchange servers with their own idea of how many users they would “fit” onto each box – in some cases, it would be the whole organisation; in others, it would be as many users as that physical site had (since good practice then was to deploy a server at every major location); in others still, it would be however many mailboxes the server could handle before it ran out of puff. As wide area networks got faster, more reliable and less expensive, and as server hardware got better and cheaper, the bottleneck for lots of organisations stopped being how many users the server could handle, and became how many users IT was comfortable having the server handle.

On closer inspection, this “comfort” level would typically come about for 2 reasons:

  • Spread the active workload – If the server goes down (either planned or unplanned), I only want it to affect a percentage of the users rather than everyone. This way, I’d maybe have 2 medium-sized servers and put 250 users on each, rather than 500 users on one big server.
  • Time to Recovery is lower – If I had to recover the server because of a disaster, I only have so many hours (as the SLA might state) to get everything back up and running, and it will take too long to restore that much data from tape. If I split the users across multiple servers, then the likelihood of a disaster affecting more than one server may be lower, and, in the event of total site failure, the recovery of multiple servers can at least be done in parallel.

(Of course, there were other reasons, initially – maybe people didn’t believe the servers would handle the load, so played safe and deployed more than they really needed… or third party software, like Blackberry Enterprise Server, might have added extra load so they’d need to split the population across more servers).

So the ultimate bottleneck is the time it takes for a single database or single server’s data to be brought back online in the event of total failure. This time is a function of how fast the backup media is (older DAT-type tape backup systems might struggle to do 10Gb/hr, whereas a straight-to-disk backup might do 10 or 20 times that rate), and is often referred to in mumbo-jumbo whitepaper speak as the “RTO” or Recovery Time Objective. If you’ve only got 6 hours before you need to have the data back online, and your backup media can restore 20Gb/hr, then you could afford to have at most 120Gb of data to recover and still have a hope of meeting the SLA.
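To put that arithmetic in one place, here’s a minimal sketch of the calculation – the 6-hour RTO and 20Gb/hr restore rate are just the illustrative figures above, not recommendations:

```python
# Back-of-the-envelope RTO check: how much data can you afford to hold on a
# server (or in a single database) and still restore it within the SLA?
# The figures are the illustrative ones from the text.

def max_recoverable_gb(rto_hours: float, restore_rate_gb_per_hour: float) -> float:
    """Largest data volume that can be restored within the RTO."""
    return rto_hours * restore_rate_gb_per_hour

if __name__ == "__main__":
    rto_hours = 6        # SLA: data must be back online within 6 hours
    restore_rate = 20    # backup media restores at roughly 20 Gb/hour
    print(max_recoverable_gb(rto_hours, restore_rate))   # -> 120.0 Gb
```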

There are a few things that can be done to mitigate this requirement:

  • Agree a more forgiving RTO.
  • Accept a lower RPO (the Recovery Point Objective is, in essence, the state you need to recover to – eg have all the data back up and running, or possibly have service restored but with no historical data, such as with dial-tone recovery in Exchange).
  • Reduce the volume of data which will need to be recovered in series – by separating out into multiple databases per server, or by having multiple servers.

Set realistic expectations

Now, it might sound like a non-starter to say that the RTO should be longer, or the RPO less functional – after all, the whole point of backup & disaster recovery is to carry on running even when bad stuff happens, right?

It’s important to think about why data is being backed up in the first place: it’s a similar argument to using clustering for high availability. You need to really know whether you’re looking for availability or recoverability. Availability means that you can keep a higher level of service, by continuing to provide service to users even when a physical server or other piece of infrastructure is no longer available, for whatever reason. Recoverability, on the other hand, is the ease and speed with which service and/or data can be brought back online following a more severe failure.

I’ve spoken with lots of customers over the years who think they want clustering, but in reality they don’t know how to operate a single server in a well-managed and controlled fashion, so adding clusters would make things less reliable, not more. I’ve also spoken with customers who think they need site resilience, so if they lose their entire datacenter, they can carry on running from a backup site.

Since all but the largest organisations tend to run their datacenters in the same place where their users are (whether that “datacenter” is a cupboard under the stairs or the whole basement of their head office), in the event that the entire datacenter is wiped out, it’s quite likely that they’ll have lots of other things to worry about – like where the users are going to sit, how the helpdesk is going to function and communicate effectively with all those now-stranded users, and what happens to all the other, really mission-critical applications. Is email really as important as the sales order processing system, or the customer-facing call centre?

In many cases, I think it is acceptable to have a recovery objective of, within a reasonable time, delivering a service that will enable users to find each other and to send & receive mail. I don’t believe it’s always worth the effort and expense that would be required to bring all the users’ email online at the same time – I’d rather see mail service restored within an hour, even if it takes 5 days for the historical data to come back, than wait 8 hours to restore a service which includes all the old data.

How much data to fit on each server in the first place

Microsoft’s best practice advice has been to limit the size of each Exchange database to 50Gb (in Exchange 2003), to make the backup & recovery process more manageable. If you built Exchange 2003 servers with the maximum number of databases, this would set the size “limit” of each server to 1Tb of data. In Exchange 2007, this advisory “limit” has been raised to 100Gb maximum per database, unless the server is replicating the data elsewhere (using the Continuous Replication technology), in which case it’s 200Gb per database. Oh, and Exchange 2007 raises the total number of databases to 50, so in theory, each server could now support 10Tb of data and still be recoverable within a reasonable time.
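For what it’s worth, here’s a back-of-the-envelope sketch of those per-server ceilings – it assumes the Exchange 2003 Enterprise maximum of 20 databases (4 storage groups of 5) and the advisory sizes quoted above:

```python
# Advisory per-server data ceilings implied by the database-size guidance above.
# Database counts: 20 for Exchange 2003, 50 for Exchange 2007.

def server_ceiling_gb(num_databases: int, max_db_size_gb: int) -> int:
    return num_databases * max_db_size_gb

print(server_ceiling_gb(20, 50))    # Exchange 2003: 1000 Gb, i.e. ~1 Tb
print(server_ceiling_gb(50, 100))   # Exchange 2007, no replication: 5 Tb
print(server_ceiling_gb(50, 200))   # Exchange 2007 with continuous replication: 10 Tb
```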

The total amount of data that can be accommodated on a single server is often used to make a decision about how many mailboxes to host there, and how big they should be – it’s pretty common to see sizes limited to 200Mb or thereabouts, though it does vary hugely (see the post on the Exchange Team blog from a couple of years ago to get a flavour). Exchange 2007 now defaults to having a mailbox quota of 10 times that size: 2Gb, made possible through some fundamental changes to the way Exchange handles and stores data.

Much of this storage efficiency derives from Exchange 2007 running on 64-bit (x64) servers, meaning there’s potentially a lot more memory available for the server to cache disk contents in. A busy Exchange 2003 server (with, say, 4000 users) might only have enough memory to cache 250Kb of data per user – probably not even enough to cache the index for the user’s mailbox, let alone any of the data. In Exchange 2007, the standard recommendation is to size the server so as to have 5Mb or even 10Mb of memory for every user, resulting in dramatically more efficient use of the storage subsystem. This pay-off means that the traditional Exchange performance bottleneck – the storage subsystem’s I/O throughput – is reduced considerably.

NET: Improvements in the underlying storage technology within Exchange 2007 mean that it is feasible to store a lot more data on each server, without performance suffering and without falling foul of your RTO/SLA goals.

I’ve posted before about Sizing Exchange 2007 environments.

What to back up and how?

When looking at backup and recovery strategies, it’s important to consider exactly what is being backed up, how often, and why.

Arguably, if you have a 2nd or 3rd online (or near-online) copy of a piece of data, then it’s less important to back it up in a more traditional fashion, since the primary point of recovery will be another of the online copies. The payoff for this approach is that it no longer matters as much if it takes a whole weekend to complete writing the backup to whatever medium you’re using (assuming some optical or magnetic media is still in play, of course), and that slower backup is likely to be used only for long-term archival or for recovery in a true catastrophe when all replicas of the data are gone.

Many organisations have sought to reduce the volume of data on Exchange for the purposes of meeting their SLAs, or because keeping large volumes of data on Exchange was traditionally more expensive due to the requirements for high-speed (and often shared) storage. With more memory in a 64-bit Exchange server, the hit on I/O performance can be much lower, meaning that a 2007 server could host more data on the same set of disks than an equivalent 2003 server could (working on the assumption that Exchange has historically hit disk I/O throughput bottlenecks before running out of disk space). The simplest way to reduce the volume of data stored on Exchange (and therefore, data which needs to be backed up and recovered on Exchange) is to reduce the mailbox quota of the end users.

In the post, Exchange mailbox quotas and ‘a paradox of thrift’, I talked about the downside of trying too hard to reduce mailbox sizes – the temptation is for the users to stuff everything into a PST file and have that backed up (or risk being lost!) outside of Exchange. Maybe it’s better to invest in keeping more data online in Exchange, where it’s always accessible from any client – unlike some archiving systems, which require client-side software (rendering the data inaccessible to non-Outlook clients), don’t replicate the archived data to users’ PCs running in Cached Mode, and don’t have it indexed for easy retrieval by either the Exchange server or the client PC.

NET: Taking data off Exchange and into either users’ PST archive files or a centralised archiving system may reduce the utility of the information by making it less easy to find and access, and could introduce more complex data management procedures as well as potentially additional costs of ownership.

Coming to a datacenter near you

An interesting piece of “sleeper” technology may help simplify the backup discussion: known simply as DPM, or System Center Data Protection Manager to give it its full title. DPM has been available for a while, targeted at backing up and restoring file server data, but the second release (DPM 2007) is due soon and adds support for Exchange (as well as Sharepoint and SQL databases). In essence, DPM is an application which runs on Windows Server and manages snapshots of the data source(s) it’s been assigned to protect. The server will happily take snaps at regular intervals and can keep them in a near-line state or spool them off to offline (ie tape) storage for long-term archival.


With very low cost but high-capacity disks (such as Serial-Attached SCSI arrays or even SATA disks deployed in fault-tolerant configurations), it could be possible to have DPM servers capable of backing up many Tbs of data as the first or second line of backup, before spooling off to tapes on an occasional basis for offsite storage. A lot of this technology has been around in some form for years (with storage vendors typically having their own proprietary mechanisms to create & manage the snapshots), but with a combination of the Windows Volume Shadow Copy Service (VSS), Exchange’s support for VSS, and DPM’s provision of the back-end to the whole process, the cost of entry could be significantly lower.

NET: Keeping online snapshots of important systems doesn’t need to be as expensive as in the past, and can provide a better RTO and RPO than alternatives.

So, it’s important to think about how you back up and restore the Exchange servers in your organisation, but by using Exchange 2007, you could give the users a lot more quota than they’ve had before. Using Managed Folders in Exchange, you could cajole the users into ditching the stuff they don’t need to keep, and more easily keeping the stuff they do. All the while, it’s now possible to make sure the data is backed up quickly and at much lower cost than would previously have been possible with such volumes of data.

Lelouch’s “C’etait un Rendezvous” gets mashed

Someone has taken the petrol-heads’ classic film, a 9-minute dash through early morning Paris known simply as “Rendezvous”, and built a mash-up between Google Video and Google Maps, to show the route he was taking. Who needs another excuse to watch this film? Well, you’ve got it now.


Rendezvous, if you hadn’t heard the story, was a film shot by French director Claude Lelouch, allegedly with a professional driver at the wheel of Lelouch’s Ferrari 275GTB. In reality, it was a Mercedes saloon with Lelouch himself driving, and the soundtrack was dubbed on later (though it does sound pretty realistic to me).


Legend has it that he was arrested immediately following the first showing of the film: no surprise, since what it shows is completely illegal – driving at over 100mph through red-lights, the wrong way down one-way streets etc. It’s still strangely compelling, though, even if you know it’s a bit of a fake…


(thanks to Steve for the link)

BBC iPlayer kicks up a stink

It’s been interesting reading various news articles about the fact that the soon-to-be-released BBC iPlayer application will initially be available only to Internet Explorer and Windows XP users. The Register reports that a group called the Open Source Consortium is due to meet with the BBC Trust since the service will not be available at all to users (for example) of Firefox or Linux OS.

The Guardian‘s coverage points out that the same issues behind the iPlayer are shared with the commercial broadcasters’ services (ie Channel 4 and Sky). Channel 4 says:

Will I be able to access 4oD on my Mac?

Unfortunately not at the launch of 4oD.
This is an industry-wide issue caused because the accepted Digital Rights Management (DRM) system used to protect online video content, which is required by our content owners, is not compatible with Apple Mac hardware and software. The closed DRM system used by Apple is not currently available for licence by third parties and there is no other Mac-compatible DRM solution which meets the protection requirements of content owners. Unfortunately, we are therefore unable to offer 4oD content to Mac users at this stage.

The fact is, all of these services are being required to use DRM since they don’t own much of the content they’re “broadcasting”, and the content owners are saying that they’ll only allow it to be broadcast if it can be protected. And nobody has (yet) built a DRM system that is up to the job of securing the content, for the other platforms in question (with the exception of FairPlay, which Apple won’t license).

Someone from the BBC comments about the fact that the Windows DRM may be a target for hackers…

“We expect it to get broken. When it gets broken, Microsoft releases a new version [of DRM] and the application gets updated. It’s an imperfect solution. But it’s the least imperfect solution of them all.”

So, it’s interesting that the Open Source Consortium is threatening to take this whole thing to the European Union under an anti-trust banner. What’s better – provide an innovative service to 70-85% of the market, or have no service to anyone because the content providers won’t allow it? Sure, the latter example is “fairer” since it doesn’t favour one platform vs another, but is it really in the best interests of the end users…?

Exchange mailbox quotas and a ‘paradox of thrift’

The study of economics throws up some fantastic names for concepts or economic models, some of which have become part of the standard lexicon, such as the Law of Diminishing Returns, or the concept of opportunity cost, which I’ve written about before.


Though it sounds like it might be something out of Doctor Who, The Paradox of Thrift is a Keynesian concept which basically says that, contrary to what might seem obvious, saving money (as in people putting money into savings accounts) might be bad for the economy (in essence, if people saved more and spent or invested less, it would reduce the amount of money in circulation and cause an economic system to deflate). There’s a similar paradox to managing mailbox sizes in Exchange – from an IT perspective it seems like a good thing to reduce the total volume of mail on the server, since it costs less to manage all the disks and there’s less to backup and restore.


Ask the end users, however, and it’s probably a different story. I’ve lost count of how many times I’ve heard people grumble that they can’t send email because their mailbox has filled up (especially if they’ve been away from the office). End users might argue they just don’t have time to keep their mailbox size low through carefully ditching mail that they don’t need to keep, and filing the stuff that they do.



I guess it’s like another principle in economics – the idea that we have unlimited wants, but a limited set of resources with which to fulfil those wants & needs. The whole point of economics is to make best use of these limited resources to best satisfy the unlimited wants. Many people (with a few exceptions) would agree that they never have enough money – there’ll always be other, more expensive ways to get rid of it.


It’s important to have a sensible mailbox quota or the paradox of being too stingy may come back and bite you. Some organisations will take mail off their Exchange servers and drop it into a central archive, an approach which solves the problem somewhat but introduces an overhead of managing that archive (not to mention the cost of procurement). I’d argue that it’s better to use Managed Folders facilities in Exchange to manage the data.


The true paradox of mailbox quota thrift kicks in when the users have to archive everything to PST files: then you’ve just got the problem of how to make sure those are backed up… especially since it’s not supported to have them stored on a network drive (though that doesn’t stop people from doing it… Personal folder files are unsupported over a LAN or over a WAN link). Even worse (from a backup perspective), Outlook opens every PST file configured in its profile for read/write, which means each of those PST files gets its date/time stamp updated every time you run Outlook.


This of course means that if you’re storing your PSTs on a network share (tsk, tsk), and that file share is being backed up every night (as many are), then your PSTs will be backed up every night, regardless of whether the job is incremental/differential or full. I’ve seen large customers (eg a 100,000+ user bank) who estimate that over 50% of the total data they back up, every day, is PST files. Since PSTs are used as archives by most people, by definition the contents don’t change much, but that’s irrelevant – the date/time stamp is still updated every time they’re opened.
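To illustrate why that happens, here’s a minimal sketch of the selection logic a typical date-based incremental backup job uses – anything whose modified time is newer than the last run gets copied, which sweeps up every PST that Outlook has merely opened. The share path is made up purely for illustration:

```python
import os, time

# Naive incremental backup selection: copy anything modified since the last
# run. Outlook updates the modified time on every PST it opens for read/write,
# so archive PSTs get swept up night after night even though their contents
# haven't changed. The share path below is illustrative only.

def files_to_back_up(share_root: str, last_backup_epoch: float) -> list[str]:
    selected = []
    for dirpath, _dirnames, filenames in os.walk(share_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_epoch:
                selected.append(path)   # every opened PST lands here again
    return selected

if __name__ == "__main__":
    last_night = time.time() - 24 * 3600
    print(files_to_back_up(r"\\fileserver\home$", last_night))
```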


So as well as losing the benefit of single-instance storage that you’d get by leaving the data in Exchange (or getting the users to delete it properly), you’re consuming possibly massive amounts of disk space on file servers, and having to deal with huge amounts of data to be backed up every night, even if it doesn’t change.


If you had an Exchange server with 1,000 users, and set the mailbox quota at 200Mb, you might end up with 75% quota usage and with 10% single instance ratio, you’d have about 135Gb of data on that server, which would be backed up in full every week, with incremental or differential backups every night in between (which will be a good bit smaller since not all that much data will change day to day).


If each of those users had 1Gb of PST files (not at all extraordinary – I currently have nearly 15Gb of PSTs loaded into Outlook! – even with a 2Gb quota on the mailbox, which is only 30% full), then you could be adding 1Tb of data to the file servers, hurting the LAN performance by having those PSTs locked open over the network, and being backed up every day… Give those users a 2Gb mailbox quota, and stop them from using PSTs altogether, and they’d be putting 1.2Tb worth of data onto Exchange, which might be more expensive to keep online than 1Tb+ of dumb filestore, but it’s being backed up more appropriately and can be controlled much better. 
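The numbers in that comparison come straight from some simple multiplication – a quick sketch using the same illustrative figures (1,000 users, a 200Mb quota at 75% usage with a 10% single-instance saving, and 1Gb of PSTs per user):

```python
# Rough data-volume comparison from the example above.

USERS = 1000

exchange_gb = USERS * 0.2 * 0.75 * (1 - 0.10)   # ~135 Gb of mailbox data on Exchange
pst_gb = USERS * 1.0                            # ~1 Tb of PSTs sat on file servers

# With a 2Gb quota and no PSTs, that archive data moves onto Exchange instead:
consolidated_gb = exchange_gb + pst_gb          # ~1.1-1.2 Tb, backed up properly

print(exchange_gb, pst_gb, consolidated_gb)     # 135.0 1000.0 1135.0
```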


So: don’t be miserly with your users’ mailbox quotas. Or be miserly, and stop them from using PSTs altogether (in Outlook 2003) or stop the PSTs from getting any bigger (in Outlook 2007).

How to handle URLs with spaces in Outlook, Word etc

I was talking to a customer earlier today who was anticipating frustrations around using click-to-dial type functionality within OCS, where they’ll be copying & pasting phone numbers around. Now if the number is nicely formatted (and E.164 compliant…) then it won’t be a problem, but the nearer number formatting gets to being easily machine-readable, the further it gets from being human-friendly.

This reminded me of a nice tip for dealing with odd URLs or other links (particularly UNC names such as \\server\share\folder name\file name) which might contain spaces. In many applications now (chiefly Word and Outlook, but others – such as Windows Live Writer – support it too), it’s possible to write or paste in a URL and have the application hold off processing it and presenting it as a hyperlink until the whole thing is complete.

Instead of ending up with \\server\share\folder name\file name – which is what you’ll get if you just start typing the link, since the hyperlink stops at the first space – begin it with a “<”, then type or paste the whole URL, then close with a “>”. Now when you press space or enter, the app will process the hyperlink, remove the <>s and all is well. If you do end up with a half-formed link, go to the start of the text (before it becomes a hyperlink), enter the “<”, then jump to the end of the hyperlinked text (eg the end of “\folder”) and press backspace – this should remove the active bit. Finally, jump to the end, add your “>” and press space to complete.

Tags as long-running transactions

Tagging: Brett, John, Allister, Darren and Julius.

When it comes to transaction processing, most systems think in terms of very short increments of time – eg taking money from an ATM, where the whole transaction is done in a few seconds. Some may take longer – like transferring money between two different banks, which could take a few days. Others are much more long-running – such as a house sale and purchase, which could last for weeks and weeks.

So it is with some blog follow up. I only just spotted that Steve the Geek had tagged me a few weeks ago, and maybe it’s time to follow up…

  • name at least 5 programs (web or standalone) that you love that go against the mainstream ( optional – reason why – if possible)
  • name at least 5 programs that you dislike; OSes not included, (optional – reason why – if possible)
  • tag at least 5 other people

So here goes…

Bouquets

  • Ilium’s eWallet – a super cool bit of cheap software which allows easy maintenance of confidential stuff on a PC (maybe the legions of passwords you might manage, or the account numbers of all your credit cards or bank accounts), and can synch them down to your Smartphone, Pocket PC or Palm. It’s one of the first things I install on any new mobile device or after rebuilding a PC. Not so much against the mainstream as genre-defining.
  • Windows Live Mail – I’ve now got 3 or 4 WL/Hotmail accounts that I use, and this desktop app manages them all nicely, even integrating to the instant search in Vista. Not mainstream since a lot of people -still- don’t know it even exists.
  • Numerous web-based forums, often based on software like vBulletin or UBB. In most cases, the forum software just works really well (though sometimes they have real problems with scalability), and has come on leaps & bounds since the early web forums. So much more friendly than Usenet. FlyerTalk and DigitalSpy are examples of great web forums; PistonHeads, less so.
  • Local.Live.com – drastically needs a better name, but it’s so good in so many ways that it’s a crying shame a lot of folks still don’t know about it. I remember the first time I saw Google Earth – I thought it was really impressive, even though the UI was horrible. Microsoft’s Virtual Earth (even mobile) technology has overtaken Google Maps/Google Earth IMHO.
  • At the risk of being a bit too Microsoft-centric, I’m going to add Digital Image Suite 2006 here. Not as powerful as Photoshop, maybe, but for what I need to do with it (manage photos and do the odd bit of cropping & touching up), it works really well. Shame it’s now been discontinued 🙁

Brickbats

  • Partition Magic – Actually, I used to like PQM because it did something that there was no other feasible way of doing – dynamically resizing and moving disk partitions whilst preserving the data on them. I’m putting it in here because it hasn’t been updated in years (since well before Symantec hoovered up the company), and has no roadmap for the future – so it isn’t compatible with Vista and never will be.
  • Almost any PC laptop utilities from the manufacturer – whether it’s Toshiba’s crazy FlashCards that keep popping up on top of everything, their monitor program to make sure the hard disk isn’t being moved too much (??), or Dell’s QuickSet utilities, they’re almost always slow, the UI is horrible, they consume lots of memory and (in the case of Tosh) they routinely just fall over, especially when shutting the machine down.
  • Siebel. Talk to anyone in Microsoft who has to use Siebel (now, amusingly, an Oracle product, but one which MS has spent years and probably $$$$ implementing), and the universal opinion is that it is absolutely horrible in almost every regard.
  • Zune software – I’m sorry, I just don’t see why it was necessary to build a separate app which (presumably) shares a lot of its guts with Windows Media Player, that has to be installed to sync with the Zune player. Why can’t Zune just consume WMP, even put a skin on it for branding purposes, but not require a different look & feel, separate registration of filetypes etc? Maybe an example of Zune trying to be a little too like iPod/iTunes.
  • Acrobat Reader. How many times have I clicked on a link in a web page to open a PDF file in a new tab in IE7, read the doc and then pressed CTRL-F4 to close that tab, only to get an error saying: “Acrobat Reader: This action cannot be performed from within an external window”… Or how many times has the PC bogged down, only to find the Acrobat Reader process – which isn’t even open and visible – merrily chewing away at all the CPU and memory it can grab? Or how many times has IE fallen over like a helicopter missing a rotor blade, only to find that the dreaded ACRORD32.EXE is behind the fault? It’s probably better now than previously, but it seriously winds me up when Acrobat falls to bits because I know that most people will just attribute it to Windows or IE.

The business case for Exchange 2007

(This is a follow-on to the previous post on measuring business impact, and these are my own thoughts on the case for moving to Exchange 2007.)

There are plenty of resources already published which talk about the top reasons to deploy, upgrade or migrate to Exchange 2007 – the top 10 reasons page would be a good place to start. I’d like to draw out some tangible benefits which are maybe less obvious than the headline-grabbing “reduce costs”/”make everyone’s life better” type reasons. I’ll approach these reasons over a number of posts, otherwise this blog will end up reading like a whitepaper (and nobody will read it…)

GOAL: Be more available at a realistic price

High availability is one of those aims which is often harder to achieve than it first appears. If you want a really highly-available system, you need to think hard not only about which bits need to be procured and deployed (eg clustered hardware and the appropriate software that works with it), but also about how the systems management and operations teams need to be structured so that they can actually deliver the promised availability. Also, a bit like disaster recovery, high availability is always easier to justify following an event where not having it is sorely missed… eg if a failure happens and knocks the systems out of production for a while, it’ll be easier to go cap-in-hand to the budget holder and ask for more money to stop it happening again.

Example: Mrs Dalton runs her own business, and like many SMBs, money was tight when the company was starting up – to the extent that they used hand-me-down PC hardware to run their main file/print/mail server. I had always said that this needed to be temporary only, and that they really should buy something better, but it was always something that was going to happen in the future.

Since I do all the IT in the business (and I don’t claim to do it well – only well enough that it stops being a burden for me… another characteristic of small businesses, I think), and Mrs D is the 1st line support for anyone in the office if/when things go wrong, it can be a house of cards if we’re both away. A year or two after they started, the (temporary) server blew its power supply whilst we were abroad on holiday, meaning there were no IT services at all – no internal network or internet access (since the DHCP server was now offline), which ultimately meant no networked printers, no file shares with all the client docs, no mail (obviously) – basically everything stopped.

A local PC repair company was called in and managed to replace the PSU and restore working order (at a predictably high degree of expense), restoring normal service after 2 days of almost complete downtime.

Guess what? When we got back, the order went in for a nice shiny server with redundant PSU, redundant disks etc etc. No more questions asked…

Now a historical approach to making Exchange highly available would be to cluster the servers – something I’ve talked about previously in a Clustering & High Availability post.

The principal downside to the traditional Exchange 2003-style cluster (now known as a Single Copy Cluster) was that it required a Storage Area Network (at least if you wanted more than 2 nodes), which could be expensive compared to the kind of high-capacity local disk drives that might be the choice for a stand-alone server. Managing a SAN can be a costly and complex activity, especially if all you want to do with it is to use it with Exchange.

Also, with the Single-Copy model, there’s still a single point of failure – if the data on the SAN got corrupted (or worst case, the SAN itself goes boom), then everything is lost and you have to go back to the last backup, which could have been hours or even days old.

NET: Clustering Exchange, in the traditional sense, can help you deliver a better quality of service. Downtime through routine maintenance is reduced and fault tolerance of servers is automatically provided (to a point).

Now accepting that a single copy cluster (SCC) solution might be fine for reducing downtime due to more minor hardware failure or for managing the service uptime during routine maintenance, it doesn’t provide a true disaster-tolerant solution. Tragic events like the Sept 11th attacks, or the public transport bombs in cities such as London and Madrid, made a lot of organisations take the threat of total loss of their service more seriously … meaning more started looking at meaningful ways of providing a lights-out disaster recovery datacenter. In some industries, this is even a regulatory requirement.

Replication, Replication, Replication

Thinking about true site-tolerant DR just makes everything more complex by multiples – in the SCC environment, the only supported way to replicate data to the DR site is to do it synchronously – ie the Exchange servers in site A write data to their SAN, which replicates that write to the SAN in site B, which acknowledges that it has received the data, all before the SAN in site A can acknowledge to the servers that the write has succeeded. All this adds huge latency to the process, and can consume large amounts of high-speed bandwidth, not to mention duplicating hardware and typically expensive replication-management software at both sites.

If you plan to shortcut this approach and use some other piece of replication software (which is installed on the Exchange servers at both ends) to manage the process, be careful – there are some clear supportability boundaries which you need to be aware of. Ask yourself – is taking a short cut to save money in a high availability solution, just a false economy? Check out the Deployment Guidelines for multi-site replication in Exchange 2003.

There are other approaches which could be relevant to you for site-loss resilience. In most cases, were you to completely lose a site (for a period of time measured at least in days and possibly indefinitely), there will be other applications which need to be brought online more quickly than perhaps your email system – critical business systems on which your organisation depends. Also, if you lost a site entirely, there are the logistics of working out where all the people are going to go. Work from home? Sit in temporary offices?

One practical solution here is to use something in Exchange 2003 or 2007 called Dial-tone recovery. In essence, it’s a way of bringing up Exchange service at a remote location without having immediate access to all the Exchange data. So your users can at least log in and receive mail, and be able to use email to communicate during the time of adjustment, with the premise that at some point in the near future (once all the other important systems are up & running), their previous Exchange mailbox data will be brought back online and they can access it again. Maybe that data isn’t going to be complete, though – it could be simply a copy of the last night’s backup which can be restored onto the servers at the secondary site.

Using Dial-tone (and an associated model called Standby clustering, where manual activation of standby servers in a secondary datacenter can bring service – and maybe data – online) can provide you with a way of keeping service availability high (albeit with a temporary lowering of quality, since all the historic data isn’t there) at a time when you might really need that service (ie in a true disaster). Both of these approaches can be achieved without the complexity and expense of shared disk storage, and without having to replicate the data in real-time to a secondary location.

Exchange 2007 can help you solve this problem, out of the box

Exchange 2007 introduced a new model called Cluster Continuous Replication (CCR), which provides a near-real-time replication process.

Cluster Continuous Replication Architecture

It is modelled in such a way that you have a pair of Exchange mailbox servers (and they can only be doing the mailbox role, meaning you’re going to need other servers to take care of servicing web clients, performing mail delivery etc), with one of the servers “active” at any time. CCR takes care of making sure the copy of the data is kept up to date, and provides the mechanism to automatically (or manually) fail over between the two nodes, and the two copies of the data.

What’s perhaps most significant about CCR (apart from the fact that it’s in the box and therefore fully supported by Microsoft) is that there is no longer a requirement for the cluster nodes to access shared disk resources… meaning you don’t need a SAN (now, you may still have reasons for wanting a SAN, but it’s just not a requirement any more).

NET: Cluster Continuous Replication in Exchange 2007 can deliver a 2-node shared-nothing cluster architecture, where total failure of all components on one side can be automatically dealt with. Since there’s no requirement to share disk resources between the nodes, it may be possible to use high-speed, dedicated disks for each node, reducing the cost of procurement and the cost & complexity of managing the storage.

Exchange 2007 also offers Local Continuous Replication (LCR), designed for stand-alone servers to keep 2 copies of their databases on different sets of disks. LCR could be used to provide a low-cost way of keeping a copy of the data in a different place, ready to be brought online through a manual process. It is only applicable in a disaster recovery scenario, since it will not offer any form of failover in the event of a server failure or planned downtime.

Standby Continuous Replication (SCR) is the name given to another component of Exchange 2007, due to be part of the next service pack. This will provide a means to have standby, manually-activated, servers at a remote location, which receive a replica of data from a primary site, but without requiring the servers to be clustered. SCR could be used in conjunction with CCR, so a cluster which provides high availability at one location could also send a 3rd replica of its data to a remote site, to be used in case of total failure of the primary site.

 The key point is “reasonable price”

In summary, then: reducing downtime in your Exchange environment through clustering presents some challenges.

  • If you only have one site, you can cluster servers to share disk storage and get a higher level of service availability (assuming you have the skills to manage the cluster properly). To do this, you’ll need some form of storage area network or iSCSI storage appliance.
  • If you need to provide site-loss resilience (either temporary but major, such as a complete power loss, or catastrophic, such as total loss of the site), there are 3rd-party software-based replication approaches which may be effective, but are not supported by Microsoft. Although these solutions may work well, you will need to factor in the possible additional risk of a more complex support arrangement. The time you least want to be struggling to find out who can and should be helping you get through a problem, is when you’ve had a site loss and are desperately trying to restore service.
  • Fully supported site-loss resilience with Exchange 2003 can only be achieved by replicating data at a storage subsystem level – in essence, you have servers and SANs at both sites, and the SANs take care of real-time, synchronous, replication of the data between the sites. This can be expensive to procure (with proprietary replication technology not to mention high speed, low latency network to connect the sites – typically dark fibre), and complex to manage.
  • There are manual approaches which can be used to provide a level of service at a secondary site, without requiring 3rd party software or hardware solutions – but these approaches are designed to be used for true disaster recovery, not necessarily appropriate for short-term outages such as temporary power failure or server hardware failure.
  • The Cluster Continuous Replication approach in Exchange 2007 can be used to deliver a highly-available cluster in one site, or can be spanned across sites (subject to network capacity etc) to provide high-availability for server maintenance, and a degree of protection against total site failure of either location.

NET: The 3 different replication models which are integral to Exchange 2007 (LCR, CCR and SCR) can help satisfy an organisation’s requirements to provide a highly-available, and disaster-tolerant, enterprise messaging system. This can be achieved without requiring proprietary and expensive 3rd party software and/or hardware solutions, compared with what would be required to deliver the same service using Exchange 2003.

 

Topics to come in the next instalments of the business case for Exchange 2007 include:

  • Lower the risk of being non-compliant
  • Reduce the backup burden
  • Make flexible working easier

Technology changes during the Blair era

So Tony Blair stepped down as the UK’s Prime Minister this week, just over 10 years since his ascendance to the position. Funnily enough, I got my “10 year” service award at Microsoft recently (a fetching crystal sculpture and a note from Bill ‘n’ Steve thanking me for the last decade’s commitment), which got me all misty-eyed and thinking about just how far the whole technology landscape has evolved in that time. I also did a presentation the other day to a customer’s gathering of IT people from across the world, who wanted to hear about future directions in Microsoft products. I figured it would be worth taking a retrospective before talking about how things were envisaged to change in the next few years.

When I joined Microsoft in June 1997, my first laptop was a Toshiba T4900CT – resplendent with 24Mb of RAM and a Pentium 75 processor. My current phone now has 3 times as much internal storage (forgetting about the 1Gb MicroSD card), a CPU that’s probably 5 times as powerful and a brighter LCD display which may be only a quarter the resolution, but displays 16 times as many colours.

In 1997, there was no such thing as broadband (unless you fancied paying for a Kilo- or even Mega-stream fixed line) and mobile data was something that could be sent over the RAM Mobile Data Network at speeds of maybe 9kbps. I do remember playing with an Ericsson wireless adapter which allowed a PC to get onto the RAM network – it was a type III PCMCIA card (meaning it took up 2 slots), it had a long retractable antenna, and if you used it anywhere near the CRT monitor that would be on the average desk, you’d see massive picture distortion (and I mean, pulses & spikes that would drag everything on the screen over to one side) that would make anyone think twice about sitting too close to the adapter…

The standard issue mobile phone was the Nokia 2110, a brick by modern standards which was twice as thick & twice as heavy as my Orange SPV E600, though the Nokia’s battery, at half the capacity, was said to last almost as long as the SPV’s. Don’t even think about wireless data, a colour screen, downloadable content or even synchronisation with other data sources like email.

People didn’t buy stuff on the internet in 1997 – in fact, a pioneering initiative called “e-Christmas” was set up at the end of that year, to encourage electronic commerce – I recall being able to order goods from as many as a handful of retailers, across as many as a few dozen product lines!

One could go on and on – at the risk of sounding like an old buffer. If Ray Kurzweil is right, and the pace of change is far from constant but is in fact accelerating and has been since the industrial revolution, then we’ll see the same order of magnitude change in technology as we had in the last ten years, within the next three.

In tech terms, there was no such thing as the good old days: it’s never been better than it is now, and it’s going to keep getting better at a faster rate, for as long as I think anyone can guess.

The Campaign for Real Pedantry, erm, I mean numbers

Hats off to James O’Neill for a display of true, world-class pedantry to which I could only aspire. It drives me nuts to get emails with badly formatted phone numbers which can’t be dialled on Smartphones without first editing them, and now that I’ve started using Office Communications Server 2007 (more later) as the backbone for my real office phone, it impedes the usability of that too.

James’ beef is that a lot of people incorrectly write a UK phone number which would be defined as 0118 909 nnnn (where 0118 is the area dialing code, and 909nnnn is the local number, the last 4 digits of which form an extension number in this specific example, available through DDI).

Here are some examples of number crime:

  • (0) 118 909 nnnn – Incorrect and useless. Why put the first zero in brackets at all? Nobody is ever going to dial starting ‘118’
  • +44 (0) 118 909 nnnn – Incorrect, though perhaps useful to people who don’t understand country codes. There may well be lots of people out there who don’t ever call international and don’t understand the “+44” model of dialing from a mobile phone, so maybe the (0) will indicate to them that they should add it in… but it could be confusing to overseas dialers who’re calling this number – how do they know if they should dial +44 118 or +44 0 118?
  • +44 (0) (118) 909 nnnn – someone likes the brackets just a little too much
  • +44 (0118) 909 nnnn – even worse than number 2. Either drop the brackets and the 0, or drop the +44 altogether.

The only correct way to write this number is +44 118 909 nnnn, or for the truly pedantic, +44118909nnnn. Maybe you wouldn’t publish an E.164 formatted number (as the scheme is called) as your primary customer services number, and it doesn’t make sense to use it for numbers that won’t be dial-able from abroad (eg some 0870 numbers or 0800 numbers). But for everything else, I’d encourage everyone to please make sure your email signature has a properly formatted number (either keep the +44 and lose the brackets and leading zero, or drop the +44 and keep the leading zero). If your company publishes your number in its online address book, then make sure that’s formatted correctly too, so that people using telephone-aware systems (like Windows Mobile or Outlook Voice Access) can call you correctly.
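As an illustration of how mechanical the tidy-up is, here’s a minimal sketch that normalises the sloppy formats from the list above into E.164 – it assumes a UK (+44) number throughout, and the 1234 digits are just a placeholder for the elided nnnn:

```python
import re

def to_e164_uk(raw: str) -> str:
    """Normalise a sloppily formatted UK number (like the examples above)
    into E.164, e.g. '+44 (0) 118 909 1234' -> '+441189091234'.
    Assumes a UK number; not a general-purpose parser."""
    digits = re.sub(r"[^\d+]", "", raw)       # strip spaces, brackets, dashes
    digits = digits.replace("+440", "+44")    # drop the stray trunk zero after +44
    if digits.startswith("0"):                # national format: swap trunk 0 for +44
        digits = "+44" + digits[1:]
    elif not digits.startswith("+"):          # bare national digits without the 0
        digits = "+44" + digits
    return digits

for example in ["0118 909 1234", "+44 (0) 118 909 1234", "+44 (0118) 909 1234"]:
    print(to_e164_uk(example))                # -> +441189091234 each time
```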

In my profession, if someone doesn’t figure that +44 118 909 nnnn is my phone number and that if they’re in the UK and not in the Reading area, they need to drop +44 and add “0” if they’re dialing from a plain old phone system, then I’m quite happy to have them not phoning me up…