Orchestrate.io, Stop Dealing With the Database Infrastructure!

In this interview I talk to Matt Heitzenroder, Co-founder of Orchestrate.io, former general manager of Basho Europe, data nerd and lover of data types. In this video he talks about the data types, data structures, schema versus schema-less options, graph stores and other ideas behind Orchestrate.io. He also jumps into what exactly Orchestrate’s mission is.

We also dive into the plans for geo and time series data and what Orchestrate is doing with these data options. After a bit of high level discussion, Matt gives us some of the strategy and tactics around the plans for their involvement in the community, business domains and open source.

Close to my own passions, we discuss some of the plans around what is coming down the pipe for open source involvement, how Orchestrate will fit into that and what code you’ll be seeing from the team.

For a sneak peek of some of the open source coming your way, check out and maybe even help out with Salter, now with more Go language oomph!  https://github.com/dizzyd/salter

Architectural PaaS Cracks or Crack PaaS

Over the last couple of years, two prominent open source PaaS solutions have come onto the market: Cloud Foundry and OpenShift. There’s been a lot of talk about these plays and the talk has slowly but steadily turned into traction. Large enterprises are picking these up and giving their developers and operations staff a real chance to make changes, sometimes disruptive in a very good way.

However, with all the grandeur, I’m going to hit on the negatives. These are the missing parts, the serious pain points beyond just some little deployment nuisance. Then I’ll close with a note on why, even amidst the pain points, you still need to make real movement with PaaS tooling and technologies.

Negative: The Data Story is Lacking

Both Cloud Foundry and OpenShift have a way to plug into databases easily.

Cloud Foundry provides ways to build a Cloud Foundry Service that becomes the bound, hooked-in SQL Server, MySQL, PostgreSQL, Redis or whatever data storage service you need. For more details on building a service, check out the echo example on the vcap sample GitHub project.
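
To make that concrete, here’s a rough sketch of what provisioning and binding a service looked like with the old vmc CLI, assuming a target that already has a MySQL service registered (the app and service names here are purely placeholders):

vmc target http://api.yourpaas.com
vmc create-service mysql mysql-myapp
vmc bind-service mysql-myapp myapp
vmc restart myapp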

OpenShift has what are called Cartridges, which provide the ability to add databases and other services into the system. For more information about Cartridges check out Red Hat’s OpenShift documentation and also the forums.
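
The OpenShift equivalent, assuming the rhc client tools and an existing application, is a quick cartridge embed; the cartridge version and app name below are just placeholders:

rhc cartridge add mysql-5.1 --app myapp
rhc cartridge status mysql-5.1 --app myapp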

Cloud Foundry and OpenShift, however, have distinctive weak spots when it comes to services that go beyond a mere single instance database. In the case of a true distributed database such as Cassandra, HBase or Riak, it is inordinately difficult to integrate the system in a way that either PaaS inter-operates with well. In some cases it isn’t even worth trying.

The key problem is that both of the PaaS systems assume the mantle of master while subjugating the distributed database to a lower tier of coordination. The way to resolve this at the moment is to do an autonomous installation of Riak, Cassandra, Neo4j or another database that may be distributed, hot swappable, or otherwise spread across multiple machine or instance points, and then create a bound connection between it and the hosted PaaS application. This is the big negative in PaaS systems and tooling right now: the data story just doesn’t expand well to the latest in data and database technologies. I’ll elaborate more on this below.
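
To sketch what that workaround looks like in practice, picture a standalone Riak cluster built entirely outside the PaaS, with its connection details handed to the hosted app as plain environment variables. The hostnames, and the use of vmc env-add for the hand-off, are my assumptions here, not anything the PaaS does for you:

# on each additional Riak node, join it to the first node, then plan and commit once
riak-admin cluster join riak@riak01.internal
riak-admin cluster plan
riak-admin cluster commit

# hand the hosted application the connection details outside of any service binding
vmc env-add myapp RIAK_HOST=riak01.internal
vmc env-add myapp RIAK_HTTP_PORT=8098
vmc restart myapp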

Negative: Deployment is Sometimes Easy, Maintenance is Sometimes Hard

Cloud Foundry is extremely rough to deploy unless you use Bosh to deploy to either VMware virtualized instances or AWS. Now, given enough resources, you could get Bosh to deploy your Cloud Foundry environment anywhere you wanted. However, that’s not easy to do; Bosh is still a bit of a black box. I, along with others in the community, am working to document Bosh, but it is slow going.

OpenShift is dramatically easier to deploy, but is missing a few key pieces once deployed that add some operational overhead. One of those is that OpenShift requires more networking management to handle routing between the various parts of the PaaS ecosystem.

Overall, this boils down to what you need between the two PaaS tool chains. If you want Cloud Foundry’s automatic routing and management between nodes, it is a viable route; but if your team wants to manage the networking tier more autonomously from the PaaS environment, then maybe OpenShift is the way to go. In the end, it’s bumpy territory figuring out which you may or may not want based on that.

Negative: Full Spectrum Polyglot, Missing Some

Cloud Foundry has a wider selection of languages and frameworks, with community involvement around them from groups like Iron Foundry. OpenShift, I’m sure, will be getting to parity in the coming months. I have no doubt that both of these PaaS ecosystems will expand to new languages and frameworks over time. Being polyglot, after all, is a no-brainer these days!

Why PaaS Is, IMHO, Still Vitally Important

First, toss out the idea that huge, web scale Facebooks and Googles need to be built. Think about what the majority of developers out there in the world work on: tons and tons of legacy or greenfield enterprise applications. Sometimes the developer is lucky enough to work on a full vertical mix of things for a small business, but generally the standard developer in the world is working on an enterprise app.

PaaS tooling takes the vast majority of that enterprise app maintenance off the operational side and tosses it out. Instead of managing a bunch of servers with a bunch of different apps, the operations team manages an ecosystem that has a bunch of apps. For the enterprises that have enough foresight, and have managed their IT assets well enough to be able to implement and use PaaS tooling, this is HUGE!

For companies working to stay relevant in the enterprise, for companies looking to make inroads into the enterprise, and especially for enterprises looking to maintain or grow, or struggling to keep ahead of the curve – PaaS tooling is a must have.

Just ask a dev: do they want to spend a few hours configuring and testing a server, or do they want to deploy their application and focus on building more value into that application?

…having spent a few years being the developer, I’ll hedge on the side of adding value.

What’s Next?

So what’s next? Two major things in my opinion.

1. Fill the data gap. Most of the PaaS tooling needs to bridge the gap with the data story. I’m doing my part with testing, development and efforts to get real options built into these environments, but this often leads back to the data story of PaaS being weak. What’s the solution here? Talks and planning sessions are ongoing, and we’ll eventually get a solid solution around the data side.

2. Fix deployments & deployment management. Bosh isn’t straightforward or obvious in what it does, and Cloud Foundry is easily the hardest thing to deploy, with many dependencies. OpenShift is easier to deploy, but neither of them actually has a solid management story over time. Bosh does some impressive updates of Cloud Foundry, and OpenShift has some upgrade methods, but over time and during day to day operations there haven’t been any clear cut wins for viewing, monitoring and managing nodes and data within these environments.

Red Hat, OpenShift PaaS and Cartridges for Riak

Today I participated in the OpenShift Community Day here in Portland at the Doubletree Hotel. One of the things I wanted to research was the possibility of putting together an OpenShift Origin Cartridge for Riak. As with most PaaS systems this isn’t the most straightforward process. The reason is simple: OpenShift and Cloud Foundry have a deployment model based around certain conventions that don’t fit the multi-node deployment of a distributed database. But there are ways around this, and my intent was to create, or at least come up with a plan for, a Cartridge that captures these workarounds.

After reading “New OpenShift Cartridge Format – Part 1” by Mike McGrath @Michael_Mcgrath I set out to get a Red Hat Enterprise Linux image up and running. The quickest route to that was to spool up an AWS EC2 instance. Thirty seconds later I had exactly that up and running. The next goal was to get Riak installed and running on the instance. I wasn’t going to build a full cluster right off, but I wanted at least a single running Riak node to use for trying this out.
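
If you’re following along, the single node install itself is nothing fancy; a rough sketch on RHEL, assuming you’ve grabbed the Riak RPM from Basho’s downloads page (the file name below is just a placeholder):

sudo yum install ./riak-<version>.el6.x86_64.rpm
sudo riak start
sudo riak ping    # should answer "pong" once the node is up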

In the article “New OpenShift Cartridge Format – Part 1” Mike skips the specifics of the cartridge and focuses on getting a service up and running that will be turned into a Cartridge. As Mike writes,

What do we really need to do to create a new cartridge? Step one is to pick something to create a cartridge for.

…to which my answer is, “alright, creating a Cartridge for Riak!”  ;)

However, even though I already had the RHEL instance up and running, with Riak installed, I decided I’d follow along with his exact example too. So I dove in with

sudo yum install httpd

to install Apache. With that done I now have Riak & Apache installed on the RHEL EC2 instance. The goal with both of these services is to get them running as the regular local Unix user in a home directory.

With both Riak and Apache installed, time to create a local user directory for each of the respective tools. However, before that, with this version of Linux on AWS we’ll need to create a local user account.

useradd -c "Adron Hall" adron
passwd adron

Changing password for user adron.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Next I switched to the user I created with ‘su adron’ and created directories in the home path for getting Apache and Riak up and running locally, as sketched below. I reviewed the rest of the steps for making the Cartridge with Apache and then immediately started running into a few issues getting Riak set up just the way I needed it to be able to build a cartridge around it. At least, with my first idea of how I should build a cartridge.
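
The directory layout itself is nothing special; a sketch of what I was aiming for, with directory names that are purely my own convention:

# run as the local 'adron' user, everything lives under the home directory
mkdir -p ~/apache/conf ~/apache/logs
mkdir -p ~/riak/data ~/riak/log ~/riak/etc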

At this point I decided we needed to have a conversation around the strategy here, so I grabbed Bill Decoste, Ryan and some of the other Red Hat team on hand today. After a discussion with Bill it sounds like there are some possibilities for getting Riak running via the OpenShift Origin Cartridges.

The Strategy

The plan now is to get a cartridge set up so that the cartridge can launch a single Riak instance. That instance can then, with post-launch scripts, join itself to the overall Riak cluster. The routing can be done via the internal routing and some other capabilities that are inherent to OpenShift itself. It sounds like it’ll take a little more tweaking, but the possibility is there for the near future.
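
As a very rough sketch, the post-launch hook for that single instance could boil down to the standard Riak cluster join dance, assuming the cartridge knows the address of an existing cluster member (the node name below is hypothetical):

#!/bin/bash
# post-launch: join this freshly started node to the existing cluster
riak-admin cluster join riak@riak-seed.example.internal
riak-admin cluster plan
riak-admin cluster commit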

At this point I sat down and read up on the Cartridge a little more before taking off for the day. Overall a good start and interesting to get an overview of the latest around OpenShift.

Thanks to the Red Hat Team, have a great time at the OpenStack Conference and I’ll be hacking on this Cartridge strategy!

Deploycon, PaaS & the pending data tier gravity fallout…

For a quick recap of last year’s Deploycon & related talks, check out my “Day #3 => DeployCon && Enterprise && Data Gravity” entry from last year.

PaaS systems aren’t always effectively distributed. Heroku has fallen over every time east-1 has gone down at AWS. Not that I’m saying they’ve done badly, just pointing that out. With Cloud Foundry, there are several key SPOFs (Single Points of Failure), and with all PaaS systems the data tier is often the neglected part of the pairing. I’ve been wanting to write about this for a few months now and Deploycon has lit a fire under me to do just that.

Deploycon – “Platform Services and Developer Expectations” **

I’m on a panel at Deploycon titled “Platform Services and Developer Expectations” and this leads right back around to that. The SPOF issue is concerning to me as PaaS providers talk up their offerings more and more with little light actually shone on it. In some ways each is moving away from their respective SPOFs, but overall they’re still pretty prevalent throughout. For security, each has a non-distributed database that technically still needs to be backed up – there’s no clear replication or other mechanism set up to ensure data integrity in a failure situation. Of course, the huge saving grace with a PaaS is that if the overall system goes down or a SPOF blows up, all the existing deployed applications will generally continue to run – unless of course the routing and networking are also SPOFs. This is the largest glaring concern with PaaS systems that I see today.

One of the other things about PaaS that has always led to a ton of questions is “what about my PostgreSQL/MySQL/Riak/MongoDB/database thing and how do I do X, Y, Z with it to ensure scalability in my PaaS?” In almost every case it ends with a simple and unfortunate answer: “…when it comes to data, a PaaS doesn’t really do a damn thing for ya…” This is obviously not very helpful. The entire reason to put a PaaS into place is to simplify life; the sad fact that it barely does a thing for the data tier isn’t very helpful.

Now, hold on a second before you start screaming at me about “but a PaaS does X, Y and Z and isn’t even supposed to touch that aspect of things…” let me elaborate a bit more. The panel at Deploycon states “…Developer Expectations” and when things are getting simplified in the way a PaaS does, developers assume that if it does all this fancy magic for an application it ought to simplify the data side of things too! Right? Well no, and it isn’t going to for the foreseeable future. But no matter what, it doesn’t change the fact that developers often have that expectation.

Now, I could write at length about all the reasons that PaaS doesn’t really do anything for the data tier. I could wax poetic about how a distributed database (re: Riak, Cassandra, etc.) just doesn’t lend itself to a cookie cutter approach to deployment under a PaaS, or how an RDBMS has umpteen different configurations for stability, scaling, hot swappable services, and other such complexities around the data tier. But instead I’m going to skip all of that, maybe cover some of those things another day, and jump right into some of the things that are actually moving forward to fill this gap.

BOSH, Cloud Foundry, OpenShift & fixing the data tier…

The most obvious reason there isn’t a simple turn key solution to the data side of things with a PaaS ecosystem is that data is complex and extremely diverse. There are distributed key/value stores (Riak, Cassandra), there are sort-of-kind-of distributed databases (Mongo), graph databases (Neo4j), the age old RDBMS (DB2, SQL Server, Oracle’s stuff, etc.) and the million solutions around that, and there are in-memory key/value styled databases that are insanely fast, like Redis. Expanding just slightly, you have software that works around these systems, such as Hadoop & Riak CS, and the list goes on. All of it is focused on the data tier and maintaining one, two or some form of the three points of the CAP theorem (http://en.wikipedia.org/wiki/CAP_theorem), atomicity and other key capabilities.

All of the PaaS systems, public and private, often have some sort of plug-in style architecture for data. Whether it is Apprenda, which is closed to community and closed source, or an ongoing open-to-community PaaS like OpenShift or Cloud Foundry, things still fall almost entirely to the developers or database team to build an architecture around the data. When looking at solutions to simplify data in PaaS systems, with the closed source solutions we have no idea what they’re up to in this regard. With the ones that are open source, or in large part public and involved in the community PaaSes, like EngineYard, Heroku, Cloudbees and others, we can really see the directions and efforts around creating real PaaS style solutions to the data tier problem.

BOSH, Vagrant, etc…  One of the best solutions I’ve seen so far is the ability of Bosh, which was created by the Cloud Foundry team while at VMware, to spool up an environment that includes such things as a Riak cluster (or another cluster). Currently Brian McClain & Dr Nic have worked to put together such Bosh + Vagrant scripts and get things rolling. I myself will be spending considerable time on just that. Beyond that, this is a good start in enabling data tier back ends.
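
The general workflow is worth sketching, assuming a Vagrantfile that stands up a BOSH director and a release plus manifest for the data service you want; every name and address below is a placeholder rather than anything from the actual scripts:

# bring up the environment described by the Vagrantfile
vagrant up

# point the bosh CLI at the director, then push the cluster deployment
bosh target https://192.168.50.4:25555
bosh upload release riak-release.tgz
bosh deployment riak-cluster.yml
bosh deploy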

How do we close the gap between absurdly simple application deployment and the still arduous, difficult data tier deployment? For the next several years I think we’ll have cumbersome deployment practices around the data tier. There won’t be anything as elegantly simple as Cloud Foundry’s single line deployment or AppFog’s one click deployment of a web application. The best we can do at this time is to streamline around pieces and architectures, and at least get them into a kind of simple three step deployment.

Please drop a comment or two on how you think we might simplify the data side of the PaaS toolchain. Also drop a few tweets in the twitterverse too, I’m sure that’ll be exploding as usual. I’m @adron, ping me.

Cheers, happy data architecting.

** the Deploycon panel will be at 4:30pm in Santa Clara on April 2nd. Come check it out.

Thor HAMMA! OS-X Cocoa UI for Cloud Foundry

So today we’re super excited to release the Thor release candidate from the furnaces of the Iron Foundry. We’ve had a number of people working on the project, and core Objective-C coder Benjamin van der Veen @bvanderveen (Twitter), @bvanderveen (Github) has been tearing through tests, implementation, refactoring and UI hacking non-stop these last few weeks. I’ll admit, I think he’s slept some, but nobody knows.

With this new release, the features around…  well…  check out the video.

For a more complete list of the features check out Github, Github Issues & the Iron Foundry Blog.

Thor Project Opens Up, Building the Cloud Foundry Ecosystem with the Community

The Iron Foundry Team are big advocates of open source software. We write code across all sorts of languages, just like many of the development shops out there do. Sometimes we’re heavy on the .NET, other times we’re all up in some Java, Ruby on Rails, spooling up a Node.js Application or something else. So keeping with our love of open source and our polyglot nature we’ve created the Thor Project with three distinct apps.

Before jumping into the applications though, a little context for what and where Thor is in the grand scheme of things. We need to roll back to the Cloud Foundry Project to get into that. The Cloud Foundry Project is an open source project built around software for PaaS (Platform as a Service) which can be used to build your own PaaS internally or externally, in a cloud provider or directly on hardware. It’s your choice how, when and where you want to use it. For more context on PaaS check out my previous entry “The Confusions of IaaS, PaaS and SaaS”.

Thor Project

Cocoa for OS-X

Thor Odinson, God of Thunder

You know who Thor is right? He’s this mythic Norse God, also known as the God of Thunder. Since we’re all about bringing the hamma we welcomed Thor into our team’s stable of applications. So starting immediately we’ve released Thor into the realms for contributions and fighting the good open source software battle! If you’d like to join the effort, check out the github project and feel free to join us!

Technically, what is the Thor application? It is a Cocoa application built for OS-X that is used for managing, deploying and publishing applications to Cloud Foundry enabled and/or Iron Foundry extended PaaS environments.

.NET for Windows 7

The .NET, Metro-styled version of the Thor application is also released via GitHub with a provided installer. We’ve taken almost the same path, except of course for the very different UX and UI cues that come with Windows 7 and the Metro UX design guidelines.

WinRT for Windows 8

I wasn’t really sure what to call this version. Is it Metro or WinRT or Windows 8 or something else? Anyway, there is a project, albeit empty at this point, and it is where the Windows 8 version of Thor will go! For now get the Windows 7 version and install it on Windows 8; it won’t have touch interface support and the like, but it should work just like a regular application on Windows 8.

The Code

To get started with these, generally you’d just clone the repo, do a build, and then get started checking out the code. There is one catch: for the OS-X version you’ll want to pull down the sub-modules with the following commands.

git clone git@github.com:YourForkHere/Thor.git
git submodule update --init --recursive

Once you do that, just make sure in Xcode to select the right project as the starting build project.

…then when the application is launched…

Thor Running in OS-X

I’ll have more in the coming days and weeks about Thor & Iron Foundry. For now, check out the blog entry on the Iron Foundry Blog and subscribe there for more information.

Ways to Interact Asynchronously with C#

NOTE: All of this code is available at my Github Project “Remembering” (https://github.com/Adron/Remembering). Feel free to fork it, share it, or send me corrections or pull requests.

While working on the Thor Project there have been numerous situations where I needed to fire off an asynchronous callback in C# while maintaining good responsiveness in the actual user interface. Benjamin (@bvanderveen) has been using Reactive Extensions with subscriptions to do this in the Objective-C code for the Cocoa based OS-X Thor user interface. See my previous blog entry for an example of what he’s doing.

For a good example of asynchronous calls against Cloud Foundry, I set up the following project using the Iron Foundry Project VCAP Client Library. The first thing I set up was a static class with a few constants to use across the examples for the URI, username and password of the Cloud Foundry account.

public static class YourSecrets
{
    public const string Username = "youremail@someplace.com";
    public const string Password = "AnAwesom3HardPassw0rd!";
    public const string Uri = "http://api.yourpaas.com";
}

The next step was to set up the delegate and method I’d use for calling out to the Cloud Foundry environment and retrieving data in parallel to my active console or user interface. That code snippet looks like this. I also added a private variable _finished for use in tracking when the request completes in the begin and end invoke example below.

private bool _finished;

// The long running call: logs into the Cloud Foundry environment and retrieves the application list.
IEnumerable TheMethodToConnectThatWillTakeLongTime(string uri)
{
    var client = new VcapClient(uri);
    client.Login(TheSecretBits.YourSecrets.Username, TheSecretBits.YourSecrets.Password);

    _finished = false;

    return client.GetApplications();
}

// Delegate matching the method above, used by both the synchronous and BeginInvoke examples.
delegate IEnumerable MethodDelegate(string uri);

Once I had that set up I was ready to create my baseline method that would make a synchronous call. A synchronous call behaves as if I had just called the method directly. There’s no real reason to create one like I’ve done here; I’m just using it to provide a basic example of calling the delegate.

public void SynchronousCall()
{
    var starting = DateTime.Now.ToLongTimeString();

    var delegateMethod = new MethodDelegate(TheMethodToConnectThatWillTakeLongTime);
    var returnedBits = delegateMethod(TheSecretBits.YourSecrets.Uri);

    var ending = DateTime.Now.ToLongTimeString();

    Console.WriteLine(string.Format(
        "The delegate call returned \n\n{0}\n\nstarting at {1} and ending at {2} which takes a while of waiting.",
        returnedBits, starting, ending));

    _finished = false;
}

That gets us a baseline. If you run a synchronous call against anything from a console application, a Windows app, WPF or whatever, it will lock up the calling thread while it is waiting for a response. In any type of user interface that is unacceptable. One of the best options is to fire off an asynchronous callback. The way I did this, which is an ideal way to make calls with the Iron Foundry Client Library against a Cloud Foundry environment, is shown below.

This is my asynchronous call.

public void DemoCall()
{
    Console.WriteLine("Callback:");
    var delegateMethod = new MethodDelegate(TheMethodToConnectThatWillTakeLongTime);

    // MyAsyncCallback will be invoked on a thread pool thread when the call completes.
    var callbackDelegate = new AsyncCallback(MyAsyncCallback);

    Console.WriteLine(" starting...{0}", DateTime.Now.ToLongTimeString());
    // Pass the delegate as the async state so the callback can call EndInvoke on it.
    delegateMethod.BeginInvoke(TheSecretBits.YourSecrets.Uri, callbackDelegate, delegateMethod);
    Console.WriteLine(" ending...{0}", DateTime.Now.ToLongTimeString());
}

Now the simple callback.

public void MyAsyncCallback(IAsyncResult ar)
{
    Console.WriteLine("Things happening, async state calling.");

    // Recover the original delegate from the async state handed to BeginInvoke.
    var delegateMethod = (MethodDelegate)ar.AsyncState;

    Console.WriteLine(" called...{0}", DateTime.Now.ToLongTimeString());

    // EndInvoke retrieves the result (and would rethrow any exception from the call).
    var returnedBits = delegateMethod.EndInvoke(ar);

    Console.WriteLine(" end invoked...{0}", DateTime.Now.ToLongTimeString());

    foreach (Application application in returnedBits)
    {
        Console.WriteLine("Application {0} is in {1} state...",
            application.Name, application.State);
        Console.WriteLine(" with {0} running instances, {1} memory per instance, {2} disk allocated...",
            application.RunningInstances, application.Resources.Memory, application.Resources.Disk);
        Console.Write(" hosted at ");
        foreach (var uri in application.Uris)
        {
            Console.Write(uri + " ");
        }
        Console.WriteLine(".");
    }
}

That’ll get the call running on a parallel thread and when it is wrapped up it returns the data.

The User Interface Interaction Issue

This is all fine and dandy for the command console. But if you want to give control back to the UI thread in a UI application, and make sure that the background thread can actually update a control when it fires, do the same thing as I’ve discussed here except have the control invoke the dispatcher, so that the “threads don’t cross” when trying to return information to a control that needs updating. To do this, take the control that needs updating and use the Dispatcher Invoke method as shown below.

private void Write(string updateText)
{
    // Marshal the update back onto the UI thread that owns the control.
    UpdatingTextBlock.Dispatcher.Invoke(
        System.Windows.Threading.DispatcherPriority.Normal,
        new Action(delegate
        {
            UpdatingTextBlock.Text += updateText;
        }));
}

For more on the Iron Foundry Project and the library I’ve used here, check out the Iron Foundry Blog & Site. For more information on Thor and to follow or get involved check out the Thor Project Site (Hosted with Cloud Foundry at Tier 3!).

All of this code is available in my Github Project “Remembering” (https://github.com/Adron/Remembering). Feel free to fork it, share it, or send me corrections or pull requests.