
CloudSlave

I was ill last week with a throat

Mon, 03/29/2010 - 10:50
I was ill last week with a throat infection.

I started writing this blog to keep my family informed, but there are some thoughts about diagnosis and being a good patient that may be generally useful. They're towards the end.
5 days to get really sick
Sunday evening:
start of sore throat
Monday:
Big painful spot on the left-hand side (LHS) of my throat, 0.5 cm square; eating and drinking painful but possible. My last meal.
Home early from work. Slept for 11 hours.
Tuesday:
Went to a pharmacist for a local painkiller. He looked at my throat and said, "OK, it's a white spot." In other words, yellow would have been different.
Started a fever. Slept 12 hours but not comfortably: mucus and swelling started to make breathing difficult by blocking my airway (at the top of my throat, I thought).
Didn't bother about eating or drinking.
Wednesday:
Saw the doc about the mucus, swelling and white spot; she got me some antibiotics but didn't hit the panic button. I described the mucous feeling - as though there were a "mucous curtain" covering my epiglottis or something. More on this later.
I did feel a bit better in the morning but then went downhill. Pain started to go down my throat: the top of my neck felt inflamed, so I started walking gently to avoid jiggling it too much.
Took the first dose of antibiotics, but then a migraine started, so I threw up the first dose.
Thursday:
Eating and drinking were becoming a big physical and psychological hurdle, which became my main worry.
I tried water, milk, a frozen Mars bar, straws - I couldn't get more than two sips down.
I tried different painkillers. Paracetamol (Tylenol) is best, but you can only take 4 doses a day. Intravenous paracetamol gets to the spot in 10 minutes; tablets take 30 minutes.
Friday:
We went back to the doc: this was potentially serious. The pain was depressing my appetite, which meant minimal intake of fluids, which lowered my body's ability to fight the disease: a vicious cycle.
I foresaw IVs for fluid and food at the least.

24 hours to turn around

Friday:
09:25
Saw the doc. When she saw my throat, she said "Ooh - ow - you poor thing" (lady doctor).
11:00
Into A&E at Northwick Park, the specialist hospital in the area for ENT (Ear, Nose, Throat).
11:30
The six ladies at the admitting desk finished their information-sharing session
(it was the lead-up to Friday night, after all) and realised I was there, so I got processed.
(I was the only one in the queue, so it was easy to miss me!
At NT/e, the first job is for someone - anyone - to pick up the phone within 10 seconds!)
12:00
Saw the ENT on call. When he saw my throat, he said "Hmmm" (male doctor). So I knew something was up.

They stuck a cool camera down my nose, with bands of LED lights down the tube.
Like kiddies' LED sneakers (LA lights, apparently).
But narrower, thankfully.
14:00
Up on the ward with three big hairy drugs and their escorts (IV saline, IV painkiller) pumping into me.
19:00
First meal since Monday. Hospital food is great.
03:30
One of the big hairy drugs was adrenaline, which they gave me (to keep to schedule) at 10, 1 and 3:30.
My heart sounded like Marisa Tomei's biological clock ticking.
03:45
Sleep, sort of. Because things could get scary, I was right next to the nurses' station - lights all night, audio effects etc.
Saturday:
06:45
Start scuffling with IV tubes - now I'm getting mad with a no-sleep headache.
07:45
Ate a full breakfast - second meal. Wrong cereal, wrong bread. Who cares.
09:25
Ready for KATN (kick ass and take names). I'm back.
Postscript 14:00
Just looked at my throat. Crikey mighty!

On the LHS is 2-3 cm square (about a quarter of it) of yellow, infected skin, mapped out like the west coast of Scotland. There's more on the right, and that's just the unimportant stuff - obviously I can't see the voice box! The doc said it was a lot worse yesterday.

The big hairy drugs have done their thing, so they can drop the adrenaline-before-sleep regime. If the yellow stuff is still that big, I'm glad I'm staying in for another day!
Diagnosis
The first problem was a bacterial infection in my throat - bacterial infections being more severe than viral.

However, the most worrying problem was a second infection just above the voice box, known as "supraglottitis". This is the one that can stop you breathing. There was also a little bit of thrush infection down there, which looked like an opportunistic outgrowth from the first problem.
Diagnosis by Friends and Family
I misdiagnosed the threat here - I was thinking Amber threat level when it should have been Red. Why:
  1. I had inferred from the pharmacist (whether or not he meant it!) that white spots were OK - not so! I did indeed have a bacterial infection, which is more severe than viral.
  2. Lots of people were getting viral infections that lasted four days. My expectations were set.
  3. I had had lots of throat infections like this before.
    My biggest problem was that they started out viral, then went into my chest after a week as bacterial. Cue 2 weeks of misery.
    But that was OK - I could tell my bronchioles etc. were clear. But that didn't mean it was only my throat - misdiagnosis number 2.

Should we not do "diagnosis by friends and family", then? The reason we do it is that, between other people's war stories and your own symptoms, it works out pretty well (about 95% of the time).
And when you can barely stand up, speak or walk without a stick, war stories are about all you can handle anyway.
Telling the story - be woolly
Did you notice that on Wednesday the idea in my mind was "a curtain of mucous over the airway", but I definitely talked to my GP about the epiglottis?

Fatal mistake.
As an engineer, I like to make mental models of what's going on.
When I said "epiglottis", I was trying to convey an analogy - something in the throat that could hold a curtain.
But I'm meant to be the woolly one! And 'epiglottis' is a big, precise word to a doctor, so you can't use it for a woolly analogy.
By mentioning it, I switched the doctor's mind to the wrong thing.
Jumping off the tracks
If you mention the wrong body part in the way I did (whether it involves your own diagnosis or not!), you'll get the old 1-2 from the doc:
  1. the doctor hears a medical term that might cause the "symptoms" and very quickly decides it doesn't fit
  2. now the uppercut: because the body part is wrong, your "symptom" is wrong! No more soft probing around the woolly symptom!

The same thing happens in court: discredit a witness and the testimony is discredited too.
Decision graphs
A little bit of theory to explain why the above was such a mistake.

My company works with a hospital and university, as part of a project aiming to train teams of medical and emergency staff in Second Life. The bit we do is to model the diagnosis pathway - exactly the steps we're talking about above.

When we do this, there are broadly four parts in the model (VERY SIMPLIFIED from real life, of course, but enough for my point):
  1. The presenting symptom

    If we're talking about a motorbike, the presenting symptom could be "won't start". Or for you, "head hurts".

    The presenting symptom may be all you need in A&E.
    Phineas Gage worked around explosives and had a one-inch tamping bar driven through his skull. In this case, skip to #4.
  2. Gathering information

    Failing a tamping bar, the doctor has to zero in - typical analytical stuff. To do that, he needs information.

    From his current level of understanding of the case, the doctor has to root around to find the significant stuff. He asks questions: does it hurt when you bend your neck, twist it, stand up, where is it.
    When I was riding motorbikes, my trusty mechanic (my brother) would lay the head of the spark plug against the engine, turn the motor over, and see if there was a spark.
  3. Zeroing in

    If you're lucky, only one of the tests comes out positive, in which case you can zero in to another "state", which is another check list.

    If you're unlucky, you may have to run multiple paths, and hope they join up later. If you've zeroed in but you've got another checklist to run through, then you're back to the previous step. You can go round the "gather information", "zero in" loop quite a few times.
  4. The diagnosis

    But eventually, there are no more questions and you have a diagnosis. Hopefully, only one! If not, see a consultant.
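To make the loop concrete, here is a minimal sketch in Java. It assumes a far simpler model than the real one: every state is a yes/no question, and the motorbike tree (the "won't start" symptom and the spark test from above, plus a made-up fuel check) is purely illustrative. The class, record and method names are hypothetical and are not taken from the Second Life project.

```java
import java.util.Set;

/** Hypothetical diagnosis graph: each state is either a yes/no question
 *  (gathering information) or a leaf holding a diagnosis. */
public class DiagnosisGraph {

    interface Node {}
    record Question(String finding, Node ifYes, Node ifNo) implements Node {}
    record Diagnosis(String name) implements Node {}

    /** Walks from the presenting symptom to a leaf: the "gather information,
     *  zero in" loop. Answers come from the findings actually observed. */
    static String diagnose(Node start, Set<String> findings) {
        Node current = start;
        while (current instanceof Question q) {
            boolean present = findings.contains(q.finding()); // gather information
            current = present ? q.ifYes() : q.ifNo();         // zero in on the next state
        }
        return ((Diagnosis) current).name();                  // no more questions
    }

    /** Illustrative tree for the presenting symptom "won't start". */
    static Node wontStart() {
        return new Question("spark at plug",
                new Question("fuel in tank",
                        new Diagnosis("look elsewhere"),
                        new Diagnosis("out of fuel")),
                new Diagnosis("ignition fault"));
    }

    public static void main(String[] args) {
        System.out.println(diagnose(wontStart(), Set.of()));                // ignition fault
        System.out.println(diagnose(wontStart(), Set.of("spark at plug"))); // out of fuel
    }
}
```

The walk never goes back up the graph: if an early answer is wrong (say, a confident-sounding "epiglottis"), every later question is asked on the wrong branch.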

The point here is about getting step 2 right - and this is the cooperative step between doctor and patient. For early-stage or complicated situations, the patient can make a big difference to the outcome.
Getting better quicker
I'm really writing this because I don't think I did a good job as a patient! With a bit more smarts, I needn't have spent a couple of days in hospital.

The crucial phase to get right is "Gathering information", where by mentioning the wrong body part I derailed the diagnosis. What I could have done was collect all the information only I had.

  • what it feels like - intensity level
  • how often, when, and for how long the problem happens
  • how it occurs (my position in bed caused or removed the curtain of mucous)
  • can you get rid of it by doing something
  • the funny things I did with my breathing to try to clear it (in and out, chest-cough/diaphragm push). Would have been a good clue to the location!

Abstract analogies are probably good. My "curtain of mucous" is 50-50: "curtain" is OK, but "mucous" is technical, and it could have been secretions from the infection.

(In fact, writing this I've remembered the killer symptom: I could make a fluttering sound when there was a lot of mucus in the curtain. I tried to describe and demonstrate this to my GP, but there wasn't much mucus there because I was sitting up, so she couldn't hear it.)
Acknowledgements

I'm very grateful to my GP for the referral, and the medical staff at Northwick Park Hospital for my astonishing turnaround. It was bad, but it could have been a lot worse!
Categories: Companies

The Transportation Business

Sun, 02/21/2010 - 13:24

A lot of the work we get involved in with CloudTran relates to finance, investment banking and so on. But last week we met a media company whose setup seems just as suitable for CloudTran and GigaSpaces.

The company collects data about consumer responses to promotions every day (about 10 million data points), then slices and dices it into usable information for consumer product manufacturers. They're a UK company with a stable business based on applications in Java, coping quite happily at the moment ... but now having a little think about business and technical strategy in the longer term. This brought a few things to mind.

Globalisation and Consolidation

Consumer information collection and processing? Sounds like a business that could go global - if I were Coca-Cola, I'd want to get analytics across countries, regions or continents from a single supplier. If the business can possibly go global, the management has to decide if they want to play in that space. (And if they don't, they may soon find someone else occupying that high ground.)

So ... what would it take to go global? Globalisation in this area is likely to lead to consolidation of worldwide processing through a single system in order to save operational costs. This means there could be more processing runs, many of them upwards of 100 million data points.

The 2 hour day

The crunch point for this company is a small processing window, in their case the 2 hours each night where all the raw data is in and has to be turned into saleable information.

So the maximum power of the system is used for 2 hours per day - 8% usage. That matches typical CPU utilisation averages of around 5-10% in the worldwide server base.

For large enterprises, the answer to making their estate more efficient is virtualisation, which can move system utilisation up to the 40-50% range. For SMEs, the answer is the cloud: buy two hours of a few machines (maybe $50 per night) for heavy processing power, then release them. Compared to the value of the information being provided, the cost must surely be trivial. And the "-as-a-service" model means that management, provisioning, backups and all that stuff is done by the provider, simplifying the SME's own IT function.
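The utilisation and cost figures are simple arithmetic; this small sketch just makes them explicit. The 2-hour window and the $50 per night come from the text; the 30-night month is an assumption for illustration.

```java
/** The "2 hour day" arithmetic. The 30-night month is an assumption. */
public class TwoHourDay {

    /** Fraction of the day that peak capacity is actually busy. */
    static double utilisation(double busyHoursPerDay) {
        return busyHoursPerDay / 24.0;
    }

    /** Cost of renting the processing window every night for a period. */
    static double rentalCost(double costPerNight, int nights) {
        return costPerNight * nights;
    }

    public static void main(String[] args) {
        System.out.printf("Utilisation: %.1f%%%n", 100 * utilisation(2));
        System.out.printf("Cloud bill for a 30-night month: $%.0f%n", rentalCost(50, 30));
    }
}
```

Two hours out of 24 is about 8.3%, and at $50 a night a month comes to around $1,500, which is the number to weigh against the value of the information sold.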

Red-hot Oracle

Given that the company's Oracle database machine is already red-hot during peak processing, positioning for growth means looking at the data processing structure. The current process consists of a number of select+calculate+save cycles - reading then writing to the database disk.

In thinking about this, I was struck by how long the 'back-end database' architecture has survived, and how the disk vs. electronics equation has changed.

My company evaluated databases back in 1982-3 and ended up buying Oracle. At that point, the hot CPU was the Motorola 68000 at 1 Mips, typically with 256KB of memory. The hot disk was the Shugart/Seagate ST-412: 10MB capacity, transfer rate of 5Mbps (million bits/sec) and 85ms average access time (on the Seagate site ... or 15-30ms according to Wikipedia!).

Nowadays, for testing we have a number of fairly hot Intel i7 920 machines with 4 cores, which I reckon are at least 5,000X faster than the 68000, with 8GB of memory (about 32,000X). The hard disk is 250GB (25,000X), with a transfer rate of 150MB/s (million bytes/sec), or 1.2Gbps (240X), and an access time of 13ms (about 6.5X).

Component        Then         Now           Multiplier (X)
CPU              1 Mips       5,000 Mips    5,000
Memory           256KB        8GB           32,000
Disk capacity    10MB         250GB         25,000
Transfer rate    5Mbps        1.2Gbps       240
Access time      85ms         13ms          6.5
So - disk capacity has kept up with CPUs and memory, but disk transfer rates are well down by comparison ... and access times have fallen off the pace dramatically. And furthermore, the discrepancy is going to keep getting worse: in the next 10 years, disk seek time and IO rates won't improve greatly whereas CPU and memory will improve by 10-100 times. There just aren't the advances in mechanical devices there are in electronics.
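The multiplier column can be recomputed from the raw figures as a quick check. This sketch assumes binary units (1KB = 1024 bytes) for memory and decimal units for disk capacity and transfer rate, as the manufacturers quote them, and uses the 85ms Seagate access time.

```java
/** Recomputes the multipliers in the then-vs-now table. */
public class HardwareMultipliers {

    /** How many times bigger (or faster) "now" is than "then". */
    static long ratio(double then, double now) {
        return Math.round(now / then);
    }

    public static void main(String[] args) {
        double kb = 1024, mb = 1024 * kb, gb = 1024 * mb;            // binary units for memory
        System.out.println("CPU:           " + ratio(1, 5_000));       // Mips
        System.out.println("Memory:        " + ratio(256 * kb, 8 * gb));
        System.out.println("Disk capacity: " + ratio(10e6, 250e9));    // decimal bytes
        System.out.println("Transfer rate: " + ratio(5e6, 1.2e9));     // bits per second
        // Access time: smaller is better, so the improvement is then / now.
        System.out.printf("Access time:   %.1f%n", 85.0 / 13.0);
    }
}
```

Memory comes out at 32,768X, which is where the rounded 32,000 in the table comes from; access time improves by only about 6.5X.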

For an analytics application like this, paying for expensive licenses to pump information in and out of the database through a bunged-up pipe when it's irrelevant to the business - that's just a waste. It is much less effective than loading the in-coming data and doing the analysis in-memory.

Easy Scalability

What I was trying (!) to say in the previous section was that the company's database-centric architecture is almost certain to hit the wall when the volumes go up by 10-100X; changes in quantity of this size usually trigger a qualitative change. Especially if the DB machine is already smoking.

The combination of GigaSpaces' scalable data partitioning with CloudTran's automatic distribution and collection of data across machines means that more nodes, more memory and more processors can easily be deployed on the application.

As CloudTran allows you to easily create a number of deployments and keep them in step with the app, it might be worthwhile to have different size deployments on hand, and spin up large and small versions for processing depending on the size of data-set to be processed.

Data Grid = Compute Grid

A common approach to using grids is to have separate data and compute grids.

However, this company's application can hugely benefit from putting related data into a single machine; when you spread this out, the separate grids become a single data+compute grid. This means that the data to perform a distinct step in the calculation is mostly, if not completely, within the memory of the CPU running a calculation.

Our rule of thumb is that if the cost of getting a piece of data from an in-memory transactional store is 1 unit, then across the network the cost is 50-100 units, and from a database 50,000-100,000 units.

By spreading the data around, the data+compute grid also can apply many processors to the data. Multiply the above numbers by 10 or 100 nodes working on the problem and soon you're talking about real money.
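To see how quickly the rule of thumb adds up, here is a toy cost model. It uses the low end of each quoted range; the 10 million reads (one per data point in a nightly run) and the 90/10 local/remote split for the co-located grid are assumptions, not measurements.

```java
/** Toy cost model using the rule-of-thumb unit costs (low end of each range). */
public class AccessCost {

    static final double IN_MEMORY = 1;       // local, in-memory transactional store
    static final double NETWORK = 50;        // rule of thumb: 50-100 units
    static final double DATABASE = 50_000;   // rule of thumb: 50,000-100,000 units

    /** Total cost, in units, of a mix of reads from each tier. */
    static double cost(long localReads, long networkReads, long databaseReads) {
        return localReads * IN_MEMORY + networkReads * NETWORK + databaseReads * DATABASE;
    }

    public static void main(String[] args) {
        long reads = 10_000_000;                                 // assumed: one read per data point
        double dbCentric = cost(0, 0, reads);                    // every read hits the database
        double coLocated = cost(reads * 9 / 10, reads / 10, 0);  // assumed 90% local, 10% remote
        System.out.printf("database-centric: %,.0f units%n", dbCentric);
        System.out.printf("co-located grid:  %,.0f units%n", coLocated);
        System.out.printf("ratio:            about %,.0fx%n", dbCentric / coLocated);
    }
}
```

Even with a tenth of the reads going over the network, the co-located grid comes out thousands of times cheaper per run, before any parallelism across nodes is counted.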

Organising this with CloudTran takes no effort - the co-location of the data, and scatter-gather of information from other nodes, is all done automatically. The calculations are split up, with each node working on its sub-process, using its local data.
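The co-location and scatter-gather idea can be shown inside a single JVM. In this sketch each "node" is just a list plus a pool thread, data is routed by hashing a product key so all points for one product end up together, and the gather step merges the per-node totals. The routing rule, the DataPoint type and the method names are assumptions for illustration; this is not CloudTran's or GigaSpaces' API, which handles the routing for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

/** Single-JVM sketch of co-located data plus scatter-gather (hypothetical names). */
public class ScatterGather {

    record DataPoint(String product, int responses) {}

    /** Route by product, so every point for one product lands on the same node. */
    static int nodeFor(String product, int nodes) {
        return Math.floorMod(product.hashCode(), nodes);
    }

    /** The per-node work: aggregate only the data held locally. */
    static Map<String, Integer> sumLocal(List<DataPoint> local) {
        return local.stream().collect(Collectors.groupingBy(
                DataPoint::product, Collectors.summingInt(DataPoint::responses)));
    }

    static Map<String, Integer> totalsByProduct(List<DataPoint> points, int nodes) {
        // Co-locate: place each point on the node that owns its product.
        List<List<DataPoint>> partitions = new ArrayList<>();
        for (int i = 0; i < nodes; i++) partitions.add(new ArrayList<>());
        for (DataPoint p : points) partitions.get(nodeFor(p.product(), nodes)).add(p);

        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            // Scatter: every node works on its own partition in parallel.
            List<CompletableFuture<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<DataPoint> local : partitions) {
                partials.add(CompletableFuture.supplyAsync(() -> sumLocal(local), pool));
            }
            // Gather: thanks to co-location no product appears in two partial
            // results, so merging is a plain putAll.
            Map<String, Integer> result = new TreeMap<>();
            for (CompletableFuture<Map<String, Integer>> f : partials) result.putAll(f.join());
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<DataPoint> points = List.of(
                new DataPoint("cola", 3), new DataPoint("crisps", 5),
                new DataPoint("cola", 4), new DataPoint("tea", 1));
        System.out.println(totalsByProduct(points, 3)); // {cola=7, crisps=5, tea=1}
    }
}
```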

Overlaps

We mentioned that the typical processing cycle is select+calculate+commit. In an in-memory architecture, the select will be done in-memory, as of course will the calculate. This leaves the "commit cycle".

If the information being committed is simply intermediate calculations, then in an in-memory architecture you can skip that step entirely. However, there will be cases in long-running calculations where you won't have the time to start over if anything goes wrong: in that case, you really do want to commit this information to a database.

CloudTran makes it possible to commit results to a transaction buffer machine very quickly (i.e. small numbers of milliseconds for small transactions) and have the processing cycle move onto the next phase while the previous one commits. In other words, we overlap the commit of one stage with the in-memory processing of the next. If the slowest part is the information analysis, using CloudTran to commit in parallel means that committing costs nothing (in the critical path sense).
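Here is a minimal sketch of that overlap using the JDK's CompletableFuture. The compute() and commit() methods are hypothetical stand-ins (commit() just sleeps for 50ms and appends to a list), not CloudTran's API; the point is the shape of the loop: each stage's commit is handed to a background thread and the next stage starts at once, so only the final commit has to be waited for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Overlapping each stage's commit with the next stage's in-memory work. */
public class OverlappedCommit {

    /** Stand-in for one in-memory processing stage. */
    static String compute(int stage, String previous) {
        return previous + "->s" + stage;
    }

    /** Stand-in for a slower, durable write of a stage's results. */
    static void commit(List<String> store, String stageResult) {
        try {
            Thread.sleep(50); // pretend this is the round trip to the transaction buffer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        store.add(stageResult);
    }

    /** Runs the stages and returns what was committed, in stage order. */
    static List<String> runPipeline(int stages) {
        List<String> store = Collections.synchronizedList(new ArrayList<>());
        ExecutorService committer = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> pending = CompletableFuture.completedFuture(null);
        String result = "input";
        try {
            for (int stage = 1; stage <= stages; stage++) {
                result = compute(stage, result);   // critical path: in-memory work
                final String toCommit = result;
                // Chain this commit after the previous one and carry straight on.
                pending = pending.thenRunAsync(() -> commit(store, toCommit), committer);
            }
            pending.join();                        // wait only for the last commit
        } finally {
            committer.shutdown();
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(3));
    }
}
```

Chaining each commit onto the previous one keeps them in stage order, which matters if recovery is meant to restart from the last committed stage.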

And What About Flash?

Now that I've moaned about hard disks, does flash ruin the whole argument? Well, for analytics applications, the answer is probably not. If in any given phase of processing (select or commit) your database machine is smoking today, then adding flash drives won't help at all because the bottleneck is the CPU. You won't be able to create a scalable solution with a "database tier" architecture.

What is the database tier doing with its CPU cycles? The database is doing some "real" processing steps, such as merging indexes and sorting for SELECT. Then there are the overhead steps. On the storage side, this means packing rows into storage pages on disk (e.g. Oracle blocks), constructing indexes and mapping them to storage. On the comms side, the overhead steps are interfacing through JDBC, serialising the response and then the processing for TCP/IP.

You may be able to improve the "real" processing steps - but to scale up by 10-100X, you'll almost certainly hit the "overhead" wall. This is why we prefer to build on an integrated data+compute tier.

Stephen Foskett has a complementary analysis of flash and cloud storage issues.

IT - Data and Processing

A while back, I visited a large IT company's headquarters with my boss. As we walked towards reception, he looked up at the huge building and said "Guess how many people work here?". I forget my answer - his was "About a third of them".

Ever since I started in enterprise IT, I have been struck by how many IT components don't do real work either. So much time goes on shipping data around. If you're looking for a customer's personal information and orders, the real data might be 2KB, but you'll probably end up shifting many megabytes around various components to get to it. Then the real processing is usually trivial: a few hundred instructions.

As Peter Drucker would have said, right now we're in the transportation business - we should be in the information business.
Categories: Companies

CloudTran Reading list

Sat, 02/06/2010 - 09:36
I just got asked for some background reading on CloudTran. I've been meaning to give my introductory reading list for some time, so here goes.

The order here is hopefully a step-by-step guide to the issues; should be a bit easier for you than jumping in at the deep end like I did.

1. http://www.openspaces.org/display/DAE/GigaSpaces+PetClinic
This is what started it all off - see slides 13-15 in the slide show.

First, it indicates there's a fair amount of non-trivial work for your regular Java application developer. Second, this raises as many questions as it answers - in particular, if you have a lot of information, how do you distribute it across a grid and then how do you integrate a transaction with backend databases and other stores.
2. Pat Helland's Apostasy - Life Beyond Distributed Transactions - the original 'slit-your-wrists' exhortation and still the best.
Pat says, forget doing distributed transactions in a scalable application - "distributed transactions are too fragile and perform poorly".

Pat's statement of the problem is brilliant; but his solution would mean that application programmers would end up doing lots of infrastructure work, which in my experience is a no-no. Surely the better answer is to productise this infrastructure functionality, so application developers have a simple sandbox and can quickly deliver business results.
2a. This leads onto Todd Hoff's highscalability.com site, and articles like http://highscalability.com/amazon-architecture. Be afraid... it's all too complicated.

3. Andy Bechtolsheim's talk at HPTS: http://www.hpts.ws/session1/bechtolsheim.pdf.
And here is James Hamilton's one-page in-flight summary.

Andy was one of the original founders of Sun.

Bottom line for the next 10 years:

- Memory and CPUs will become cheaper
- More memory and more cores
- The bottleneck of access times to hard disks is going to get 10x worse, which will mean they are gradually phased out for live data
- Flash memory will take over mainstream applications for storage sizes > main memory. But how many writes can you get out of them...
4. Stanford's Case for RAMClouds. RAMClouds means 'all active data in memory rather than on disk'.
And here's an easy-entry synopsis of the same article.

By the time it was published, RAMClouds wasn't new ... but it does tie the previous paper into forward thinking about architecture, and gives a theoretical reasoning as to why RAMClouds will be one of the new architectures.

I actually saw Todd Hoff's overview piece first - http://highscalability.com/are-cloud-based-memory-architectures-next-big-thing
5. The requirement. Google "600 billion RFID" and go from there.
Basically, applications will continue to get larger. A million on-line users isn't worth shouting about today. This is the case for thinking about application architectures that will survive the next 10 years - there are going to be loads of customers out there wanting information now.
6. Performance matters (admittedly from Akamai marketing literature):
2006: Respond to users in 4 seconds

And we're getting more impatient:

2009: Respond to users in 2 seconds

This is the business driver: handle more customers and give a better experience (and get a competitive edge).
7. The fundamental platform: Julian Browne's Space-based architecture.
Also GigaSpaces white papers. This is based on JavaSpaces. Here is Bill Olivier's take on the big problems JavaSpaces solves - http://www.jisc.ac.uk/media/documents/programmes/jtap/jtap-055.pdf -- see section 2.3.1.

"Jini addresses the hard distributed computing problems of: network latency, memory access, partial failure, concurrency and consistency".

The big thing developers have trouble getting their heads round is that in a scalable system every failure event must be handled as part of the application. Most developers are used to letting ops worry about failure modes. It's really hard in a large-scale distributed environment to get this right.
8. How to distribute data for application programmers: partitioning and the entity group pattern. This answers the question, "how do I spread data across nodes for best performance but easy management?"
Billy Newport has a good overview of grids and partitioning.

Google App Engine defines Entity Groups as the limits of transactions.
In CloudTran we use this purely to define where information goes; cloud transactions can span entity group boundaries.
9. NoSQL (forget SQL) and BASE - An ACID Alternative. The database as we know it doesn't handle scalable applications and specialised requirements well.
The thrust of NoSQL (or 'not only SQL') is: if you really want to get scalable data, you can't have SQL and ACID characteristics. And there are certainly beyond-SQL databases like BigTable that have highly specialised characteristics.

For a hilarious counter, see Brian Aker's talk.

In CloudTran we provide transactionality for both SQL and NoSQL stores, coordinating in-memory data with eventual consistency at the data sources. Some of the SQL functionality for joins has to be done by hand, but it's about 90% there.
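Item 8's entity-group idea, used purely for placement as in CloudTran, can be sketched like this: every entity carries the key of its group's root, and the partition is chosen from that root key alone. The Customer/Order types and the hash-modulo rule are hypothetical, not a CloudTran or Google App Engine API.

```java
/** Entity-group placement: the partition is chosen from the root key only. */
public class EntityGroups {

    /** Anything that can be placed knows the key of its group's root. */
    interface Routed {
        String rootKey();
    }

    /** The root of the entity group. */
    record Customer(String customerId, String name) implements Routed {
        public String rootKey() { return customerId; }
    }

    /** A child entity: an order lives in its customer's group. */
    record Order(String orderId, String customerId, double total) implements Routed {
        public String rootKey() { return customerId; }
    }

    static int partitionOf(Routed entity, int partitions) {
        return Math.floorMod(entity.rootKey().hashCode(), partitions);
    }

    public static void main(String[] args) {
        Customer alice = new Customer("C42", "Alice");
        Order first = new Order("O1", "C42", 19.99);
        Order second = new Order("O2", "C42", 5.00);
        int home = partitionOf(alice, 16);
        // Same root key, same partition: the customer and her orders are co-located.
        System.out.println(home == partitionOf(first, 16) && home == partitionOf(second, 16));
    }
}
```

Placement is the only thing the group decides here; a transaction can still touch entities in several groups.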
_______________________________________

This should get you started Andrea.
Categories: Companies

Java Architecture and the Cloud - Players, Patterns, Products

Mon, 01/25/2010 - 17:21
Are you curious about where Java architecture might go in the next decade? Do you need to skill yourself up for application development in the cloud?

Having canvassed opinion, we've decided to put together a meeting for architects, project leaders and developers to discuss how to develop large scale, high speed, fully transactional applications in the Cloud. The evening will include a review of market trends, key players, products and architectures for cloud / grid commercial applications. And we'll also be announcing CloudTran, a new product to bring the Cloud into the mainstream as a platform for Java developers.

We're delighted that we'll be joined by Dan Stone (http://blog.scapps.co.uk/), who will give us a run-down of the leading products in this area based on his forthcoming book on the subject, and by Jim Liddle, UK Operations Director at GigaSpaces, who will talk about the game-changing features of GigaSpaces XAP.

There'll be time on the day for an open session as we're keen to get a dialogue going, so please come armed with your questions, comments, views, war stories. The evening will be held on Thurs Feb 11th from 5pm at The Masons Arms, London W1S.

If you'd like to come along, please sign up at http://www.eventbrite.com/event/533997200
Categories: Companies

What does CloudTran add over GigaSpaces XAP?

Tue, 01/05/2010 - 16:46
CloudTran's goal is to make it as easy as possible for Java developers to write mission-critical applications that are scalable and fast. CloudTran is layered on top of GigaSpaces XAP. As both products serve Java developers and provide transactions, we are often asked what CloudTran adds over GigaSpaces. Here's the bite-sized answer.

1 Coordinating data grid and storage services. GigaSpaces transactions coordinate operations in one space or between spaces, and it has a mirror service that does asynchronous write-behind to a database. However, for multi-database or multi-space scenarios it does not preserve atomicity or consistency between the spaces and the data storage services.

CloudTran provides rock-solid ACID transactionality that coordinates operations between the data grid and storage services, without limiting the speed by disk writes. This means that developers using rich domain models spread across different data stores can use transactions without worrying about whether the particular deployment configuration can handle it.

2 Object-Space-Relational Mapping. Java developers are used to working with an object view, where relations between objects are held as object references. But in a scalable, cloud-based system, objects are distributed across different machines and object references aren't possible. Instead, the relation must use an identifier - like a foreign key - that can be used to simulate an object reference by doing a lookup and fetch under the covers. This means that there need to be different types of objects: one used by Java developers in their app, having object references; and another stored in the space with foreign keys in place of the object references.

As if that wasn't enough, backing up Java space objects to a SQL database requires the usual object-relational mapping. The code has to load objects from the database into memory, as well as saving updates from memory to the persistent store.
In other words, there are three different views of information that need format definitions and mapping operations between them. CloudTran generates code to make sure this is all done correctly: JDBC is supported out of the box; other types of persistent stores can be mapped via plug-ins.

3 Automatic Configuration and Deployment. GigaSpaces XAP is an application server-level product. This means it has more features than just a caching product, but it needs to be configured and deployed. As Stefan Norberg says in his post, Why it sucks being an Oracle customer, the downside is that the developer has to do "configuration, deployments and all of that". This requires a lot of investment in learning config and deployment concepts and increases cost and risk from getting it wrong.

CloudTran provides modelling and code generation to help developers get over this hump. Modelling is via an Eclipse plugin which uses terms that developers can readily understand - entities, data sources, services and queue receivers/topic subscribers. Then the code generation makes it easy to convert the model into a production system - just add business logic.

Developers can also model multiple deployments, tied to the application model. The default deployment is to the Eclipse UI, but Windows, Linux and Amazon EC2 are supported. We have found it especially useful to be able to model deployments for different purposes (such as Dev, Test, UAT and Live) strictly driven by the application model - it avoids the finger trouble of reworking the configuration by hand when the application changes.
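The three views described under point 2 can be sketched by hand, which also shows why generating this code is attractive. All the class, table and column names here are hypothetical, and the map passed to fromSpace() stands in for the lookup-and-fetch a real grid would do across machines; this is not CloudTran's generated code.

```java
import java.util.List;
import java.util.Map;

/** Three views of an order (hypothetical names, hand-written mapping). */
public class ThreeViews {

    // 1. Application view: relations are ordinary object references.
    static class Customer { String id; String name; }
    static class Order { String id; Customer customer; }

    // 2. Space view: the reference becomes an id, like a foreign key.
    record OrderEntry(String id, String customerId) {}

    /** Application -> space: swap the object reference for the customer's id. */
    static OrderEntry toSpace(Order order) {
        return new OrderEntry(order.id, order.customer.id);
    }

    /** Space -> application: simulate the reference with a lookup. The map
     *  stands in for fetching the customer from whichever node holds it. */
    static Order fromSpace(OrderEntry entry, Map<String, Customer> customersById) {
        Order order = new Order();
        order.id = entry.id();
        order.customer = customersById.get(entry.customerId());
        return order;
    }

    // 3. Relational view: parameters, in column order, for a statement such as
    //    INSERT INTO orders (id, customer_id) VALUES (?, ?)
    static List<Object> toRow(OrderEntry entry) {
        return List.of(entry.id(), entry.customerId());
    }

    public static void main(String[] args) {
        Customer alice = new Customer();
        alice.id = "C42";
        alice.name = "Alice";
        Order order = new Order();
        order.id = "O1";
        order.customer = alice;

        OrderEntry entry = toSpace(order);
        System.out.println(entry);          // OrderEntry[id=O1, customerId=C42]
        System.out.println(toRow(entry));   // [O1, C42]
        System.out.println(fromSpace(entry, Map.of("C42", alice)).customer.name); // Alice
    }
}
```

Every new entity or relation needs all three definitions and both mappings kept in step, which is the work the code generation is meant to take off the developer.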
Categories: Companies