Thursday, February 28, 2013

I2P: Using the Cloud for Professionals.

Suppose you want to write a paper or do some research with a group, or simply have an interactive conversation about improving your practice or bettering your business. How would you do that?

Here's my story of what's possible and how to get there...

The stock-in-trade for professional programmers is not writing code, but dealing with people and the complexities of jointly constructing or maintaining large, complex and invisible artefacts.

As a software professional, I've worked on projects and systems in the 1-2M Lines of Code range, with 50-100 coders. These are often considered "large", especially in my discipline, real-time technical programming.

To produce anything requires process, tools and discipline, but mostly automation of the important tasks. It's beyond human capability for 3 people to manage a single project without significant errors and problems or being forced into rigid compartmentalisation and code isolation.

The largest, most complex and challenging codebase known is the Linux Kernel. Now 21 years old, it is written by people distributed around the planet who mostly never meet:
  • 38,566 files of 15,384,000 lines
  • 2,833 developers from 373 companies
  • every day, 10,500 lines added; 8,400 lines removed; 2,300 lines modified
  • A staggering 5.79 changes per hour, 365 days a year.
This work relies on the Internet and automation. Things must Just Work.

This is the largest, most active distributed collaborative project ever known, Software or not.

How do these folk do this, day after day, without serious problem?
What's the secret sauce, the magic toolbox, that allows them to do all this?

To handle anything so large, complex and fast-changing, they'd need amazingly complex and difficult-to-master tools, wouldn't they?

Just two, but only because they're based on the Internet and Open Source Software and another secret:
  • A distributed Version Control system, now "git", which stores the changes to every file in a single "Code Repository". Other projects use "subversion" or "CVS". The Repositories are replicated and backed up using "Cloud computing" principles.
  • A common "toolchain": the compilers, linkers and analysis tools, plus "make", the command that knows what-depends-on-what and how to compile any and all parts of the source code into an executable file or "binary".
The other secret is:
All exchanged files have very simple structures, usually plain text files.
That's it: the one recipe we know scales to close to 3000 simultaneous users and 15M lines (and the same or more again in documentation) - when printed, 150,000 pages, or the proverbial "1000 ft high" pile.
Text source files, a shared Code Repository run over the Internet on Cloud services and a common set of build tools.
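
To make the "what-depends-on-what" idea concrete, here is a minimal sketch in Python of the kind of bookkeeping "make" does. It is not how "make" itself is implemented, and the file names (kernel.bin, main.c, sched.h and so on) are made up for illustration. It works out a valid build order and which targets are affected when one file changes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical targets: each one maps to the files it depends on.
DEPENDS_ON = {
    "kernel.bin": {"main.o", "sched.o"},
    "main.o": {"main.c", "sched.h"},
    "sched.o": {"sched.c", "sched.h"},
}


def build_order(depends_on):
    """Return every item in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


def needs_rebuild(changed, depends_on):
    """Return the targets affected, directly or indirectly, by one changed file."""
    affected = set()
    frontier = {changed}
    while frontier:
        current = frontier.pop()
        for target, deps in depends_on.items():
            if current in deps and target not in affected:
                affected.add(target)
                frontier.add(target)
    return affected


order = build_order(DEPENDS_ON)
print(order)
print(sorted(needs_rebuild("sched.h", DEPENDS_ON)))
```

Because everything involved is a plain text file with a known name, a few lines of bookkeeping like this are enough to decide what to rebuild. That is a large part of why the tools can stay simple.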
Apart from learning the language, the compiler and each of the programs in the "toolchain" (and reading the code!), what's the overhead and training required to get into kernel development? Surprisingly low. I'm not sure why every 2nd year computing or software engineering student doesn't have this as a course requirement to progress to 3rd year. Any 1st year student could probably do it, and every 2nd year student should be as competent in these tools as in reading reference books or using the library.

There is a very readable 10 page paper, "Submit your first kernel patch", that I encourage you to read. I don't expect you to understand any of the code or the incantations recited to do "magic".
But you will understand:
  • the process is simple and well defined, and
  • the central tool, the version control system, is very, very simple.
So, you're a working Healthcare Professional and you want to write a paper, conduct some research or collaborate with your peers. How is any of this relevant to you? That's software, not research or document writing, isn't it?

What I didn't say is that all documentation and diagrams are stored in the Code Repositories as well. They're probably larger in size. The same rules apply: simple text files, common tools, known process.

You have Microsoft Word, an email account and a PC. How hard can it be?
MS-Word has some very nifty version control and you can review and merge/reject updates from multiple authors. So everything you need is sitting right there in front of you, isn't it?


I recently went through exactly this "old-school" manual process. I started with an Apple word processing program, converting to '.doc' and then to OpenOffice format because we could all read that. Formatting was a mess, even though I'd used barely more "markup" than I use on a webpage. It took me many hours to get it readable, not close to good.

Although I'd said "the document has ONE owner, responsible for updating it", that rule was soon broken and old versions were updated and sent around.

The rule I'd proposed, "give every copy of the document a unique name" (your name + date/time), failed as well.

While there were only 3 of us collaborating on a 20 page document, we lost a substantial number of edits while wasting a rather large chunk of time doing it. Our submission was not nearly as good as it could've been and it took rather more time and effort than it deserved.

For the next iteration, we used Google Docs. It isn't designed for full-on "Project Collaboration" like the defunct Google Wave, but it is very useful to us.

GDocs has two features that enable real-time multi-author Collaboration:
  • Version history, allowing you to undo changes back to a point-in-time. You can't merge/reject all changes, but it's way better than nothing. Importantly, you know "What got changed?"
  • In a normal browser, simultaneous access and editing. You see the names of all other people with the document open for viewing or editing. They're given a unique colour, and what they type, or even just 'select', gets highlighted.
Our next document sped out the door because we used Google Docs.
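
The "What got changed?" question is easy to answer for plain text files. As a minimal sketch (this uses Python's standard difflib module, not anything from Google Docs, and the file names and sample text are invented for illustration), two versions of a draft can be compared line by line:

```python
import difflib


def what_changed(old_text, new_text,
                 old_name="draft_v1.txt", new_name="draft_v2.txt"):
    """Return the differences between two versions as unified-diff lines."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name,
                                     lineterm=""))


# Two hypothetical versions of the same short document.
v1 = "Introduction\nWe surveyed 20 clinics.\nConclusion\n"
v2 = "Introduction\nWe surveyed 25 clinics.\nConclusion\n"

changes = what_changed(v1, v2)
print("\n".join(changes))
```

Lines starting with "-" were removed and lines starting with "+" were added. Version control systems such as "git" present changes in much the same form.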

You can upload an existing document to GDocs and have it converted to GDocs' own format, which you can then edit collaboratively. At any time, you can download, right there from your browser, a copy of that document in a variety of useful formats: PDF, ".doc", OpenOffice and plain text.

This is everything you need for document collaboration, even with large, distributed groups.

The lesson from 20 years of work on the Linux kernel is that you don't need expensive, complex and cumbersome tools to succeed in difficult tasks. Just the right number of simple and reliable tools.

Google will sell your business a complete on-line replacement for all your Document Processing needs. Everything lives "in the Cloud" and you can access it from anywhere with a browser and password.
Pretty neat and appealing, eh?!?!?!

While nothing beats the "Documents in the Cloud" model for some uses, to me it seems exactly wrong for handling all internal documents. The least concern is security. An ex-employee or a hacker can get into all your files...

But mainly it's the distinction between ownership and control. While you still own all your content, you no longer control access to it.

If for some reason Google goes down (it's happened) or you lose Internet connectivity (there's a thousand "moving parts" between you and your data, literally), then your business is fried for the duration...

Please understand what I'm saying: for some uses by all businesses, "Documents in the Cloud" is a perfect solution. For all use by some businesses, the dependency on the network and service provider will result in severe, even catastrophic, business impact.

I know there are competitors to Google Docs. I have had no need to go look for them, and I use it as an example only. It worked for me, but Your Mileage May Vary.

One product/company well worth looking at for personal use and collaboration, though I haven't tried it, is Evernote.

What's special about them is they allow you to capture text, sketches, even recordings (from anything, e.g. Pen recorders), from any platform: PC, laptop, iPad, Android tablets and smartphones, then they allow you to tag your data and search it.

Modulo the warnings about not having a local copy of all your data, Evernote seems to be setting up to fill the valuable niche the Filofax then PDA/Palm Pilot/Blackberry once filled: All my notes together.

That is important for Professional practice and for Collaboration.
