Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!
I read Matt Aslett’s (The 451) post on the golden age of open source with interest. In it he argues that we’ve arrived at the fourth stage of open source, which is “in short: a return to a focus on collaboration and community, as well as commercial interests.”
What we’re doing with membase.org definitely falls in line with this description, although with a slightly different twist. NorthScale saw the need for a simple, fast, and elastic NoSQL database that we felt wasn’t being met by existing technologies. When it became clear that many prominent companies shared this view and were committed to an open source solution, NorthScale stepped in to shepherd the development of a broad community around the membase.org project. Consistent with Matt Aslett’s description of open source 4.0, the result is a project with an “emphasis on collaboration and community rather than control.” While NorthScale has contributed the bulk of the code to the project, our customers Zynga and NHN are co-sponsors of the project who have a strong commitment to its success. This blurring of the line between vendor and customer – the collaboration between two seemingly opposite sides of a transaction – has long set open source apart from the large proprietary vendors who want nothing more than a lock on their customers.
Traditionally, the primary attraction to open source, and what enabled it to make inroads in the enterprise, has been cost. This is the “cheaper Oracle than Oracle” model where the technology is not necessarily solving any new problems in the market, but provides a cheaper open source version of something enterprises are already paying for.
However, when I talk to enterprise companies, lowering costs no longer cuts it as a sole driver for open source technology adoption. On the other hand, if we engage our customers around a very real and painful problem they’re dealing with – in our case, the mismatch between relational databases and the needs of interactive web applications – and demonstrate how we’re solving this with innovative new technology, then we can have a discussion.
In a nutshell, the fourth stage of open source is much more than just a return to community and collaboration – it’s about putting open source front and center as an engine of innovation. We’re seeing an emergence of open source projects that solve a new problem and create a new solution that eases this pain point. The source code just happens to be open because it’s what we have all come to expect. This is particularly true of infrastructure software going forward, where it’s expected that some component, if not all of it, is available as open source.
We believe open source 4.0 is characterized in part by projects that solve new problems with innovative solutions and use a highly collaborative model. We encourage the participation of both “corporate sponsors” and passionate individuals who are willing to contribute to the membase roadmap and strengthen the community.
Recently, Attila Kiskó, the author of the best .NET memcached client, the Enyim .NET memcached client, has been enhancing his client library to speak directly to membase data nodes. Membase already supports all existing memcached client libraries and memcached protocols via a high-performance proxy, but there’s a “direct path” that client libraries can use for even greater performance. Along the way, we ended up with a quick guide on the membase.org wiki on how to create your own native or “smart” membase client library, so anybody with their own favorite programming language can do the same.
The easiest approach is to start with your favorite memcached client library (one that speaks the memcached binary protocol) and proceed from there. The fun part is handling the cases that arise during Rebalance operations to allow for seamless cluster elasticity without data loss, but who doesn’t like fun challenges like these?
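As a rough illustration of the rebalance case, the sketch below shows one way a "smart" client might behave: it hashes each key to a partition, looks up the owning node in a locally cached map, and refreshes that map and retries when a node reports that it no longer owns the partition. Everything here is an assumption made for illustration. The CRC32-based hash, the `NotMyPartition` exception, the `FakeCluster` stand-in, and all names are hypothetical and are not taken from the membase.org wiki guide, which remains the reference for the real protocol.

```python
import zlib

NUM_PARTITIONS = 16  # illustrative value; a real cluster defines its own count


class NotMyPartition(Exception):
    """Hypothetical signal that a node no longer owns a partition."""


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Assumed hash: CRC32 of the key, reduced to a partition index.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class FakeCluster:
    """In-memory stand-in for a cluster, used only to exercise the client."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.owners = {p: self.nodes[p % len(self.nodes)]
                       for p in range(NUM_PARTITIONS)}
        self.data = {}

    def partition_map(self):
        return dict(self.owners)

    def move(self, partition, node):
        # Simulates a rebalance step handing a partition to another node.
        self.owners[partition] = node

    def set(self, node, partition, key, value):
        if self.owners[partition] != node:
            raise NotMyPartition(partition)
        self.data[key] = value

    def get(self, node, partition, key):
        if self.owners[partition] != node:
            raise NotMyPartition(partition)
        return self.data.get(key)


class SmartClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.map = cluster.partition_map()  # cached copy; may go stale

    def _call(self, op, key, *args, retries=3):
        partition = partition_for(key)
        for _ in range(retries):
            node = self.map[partition]
            try:
                return op(node, partition, key, *args)
            except NotMyPartition:
                # The cached map is stale (e.g. mid-rebalance): refresh, retry.
                self.map = self.cluster.partition_map()
        raise RuntimeError(f"no owner found for partition {partition}")

    def set(self, key, value):
        return self._call(self.cluster.set, key, value)

    def get(self, key):
        return self._call(self.cluster.get, key)
```

The point of the design is that the application never sees the rebalance: a stale map costs one extra round trip, after which the client talks to the new owner directly.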
I am excited to announce that NorthScale Membase Server 1.6 Beta3 is now available and ready for download.
This beta release adds a lot of new functionality and reflects most of what you’ll find in the final product. Highlights include:
Let’s take a look at these features in a bit more detail:
Windows support is by far one of the most frequently requested features, and we are very pleased to offer it with this beta release. Beta3 provides 32-bit Windows support, with 64-bit support on the way (Note: the 32-bit binary runs just fine on 64-bit Windows but is subject to the 32-bit memory limits). The Windows version provides the same feature set as our Linux version.
Multi-tenancy is the mechanism for creating multiple buckets on one membase cluster. Each bucket represents a separate namespace, but more importantly it also provides a resource control mechanism on a per-bucket basis, allowing buckets to have different behavior. For example, if you have some data you consider very important, you may want to create a bucket with a replica count of 3; for other less crucial data, a replica count of 0 might make sense. This way you can decide how to divide the cluster resources to accommodate different requirements for different applications or different types of data. No more one size fits all!
Bucket quotas are worth a bit more explanation. Each time you create a cluster, you set a fixed amount of memory that each server node in the cluster will contribute to the total cluster memory that buckets can consume. Once set, this value is inherited by any server joining the cluster and cannot be changed. Hence, the total memory available for membase use in the cluster increases by this amount with each server added to the cluster.
Similarly, each bucket defines a memory quota that sets the amount of memory it can use out of the cluster total memory. This quota does not change as you add servers to your cluster, but you can manually edit this on the “Manage Bucket” screen.
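The quota arithmetic described above can be sketched in a few lines. This is only an illustration of the description in this post: the function names, bucket names, and numbers are made up, and it assumes bucket quotas are simply subtracted from the cluster total.

```python
def cluster_memory_mb(per_node_quota_mb: int, node_count: int) -> int:
    # Every node contributes the same fixed amount chosen at cluster creation.
    return per_node_quota_mb * node_count


def unallocated_mb(per_node_quota_mb: int, node_count: int,
                   bucket_quotas_mb: dict) -> int:
    # Bucket quotas stay fixed as nodes are added, so only the total grows.
    total = cluster_memory_mb(per_node_quota_mb, node_count)
    return total - sum(bucket_quotas_mb.values())


# Hypothetical cluster: 1024 MB per node, two buckets.
buckets = {"sessions": 1024, "cache": 512}
before = unallocated_mb(1024, 3, buckets)  # three nodes
after = unallocated_mb(1024, 4, buckets)   # one more node joins
```

With these made-up numbers, adding a fourth node raises the cluster total by 1024 MB while both bucket quotas stay where they were, so the extra memory remains unallocated until a quota is edited by hand.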
In addition to the memory quota, there is also a disk quota associated with each bucket. In contrast to the memory quota, there is no fixed limit on the disk space that each server brings to the cluster; all free disk space on the assigned storage path may be used. It is up to the sysadmin to make sure that each node provides sufficient space to accommodate the data written (and you can track free disk space in the new Cluster Overview dashboard). Disk quotas are not yet enforced in Beta3, but you can already use them to monitor your bucket’s usage versus the quota.
The Cluster Overview is a single dashboard showing you the most crucial stats of your cluster in one place.
As you can see, you get a single page to keep track of the memory and disk usage of all your buckets, as well as how many operations your cluster is performing. The “disk fetches per second” stat serves as a potential issue indicator. If you are seeing a lot of disk reads, it means that the working set for at least one of your buckets no longer fits into RAM alone. Disk reads have much higher latency than memory reads, so should this happen you can use the Data Bucket monitor section to drill down and understand which bucket is encountering the issue. If you need to take action, you can increase the bucket memory quota in the Manage Data Bucket section. Issue resolved!
As you can see, we packed a lot of great new features into Beta3. But there is still more to come. You might be able to guess from the new bucket creation dialog that we have another bucket type in store, which will make multi-tenancy even more exciting – but for more on that you’ll have to check back later.
Enjoy Beta3 and let us know how you are getting on with the new features!
We are looking forward to a great week next week at VMworld 2010 in San Francisco. It looks like it’s shaping up to be a great conference.
See a Membase demo.
If you’re at the show, be sure to come by NorthScale’s booth (#640) for a Membase demo. Membase is an elastic key-value database that stores data behind interactive web applications far more efficiently and cost effectively than it can be stored in a relational database. We’d love to show you how this highly available, cloud-friendly data layer expands and rebalances dynamically as application needs change. Just talk to anyone in the booth wearing a t-shirt with the Membase mascot (right).
Wear a t-shirt, win an iPad!
And speaking of t-shirts, pick up your own yellow membase t-shirt and wear it around the show floor to win a chance for an iPad. We’ll be giving away iPads at our booth on Tuesday (8/31) and Wednesday (9/1) at 5:30pm, so stop by during the day to get your t-shirt and find out more about how to win.
Can’t make the show?
Test drive Membase anyway.
If you won’t be in San Francisco next week, you can still take Membase for a spin by downloading it here. There are also a number of webinars available for getting started, and a very active user forum as well.
Hope to see you at the show!
Things are moving at the speed of light over here and I wanted to take a second to come up for air.
We just had our 7th weekly beta webinar and this week I did a demo/preview (albeit quick) of some of the features and functionality coming in our soon-to-be-released beta 3.
Check out the recorded webinar for a sneak-preview and then download the real thing when it’s available.
Thanks for all the feedback and please keep it coming.
P.S. If you happen to be in the area today, stop by for a beer and some eats at our parking lot BBQ (behind our Mountain View offices from 3-8pm.)
If you are a user of memcached and have deployed instances on Amazon EC2, you may have received a message from Amazon over the weekend (we received one on 8/7/2010) indicating you may have a “Possible Insecure Memcached Configuration.” Here’s the body of the message we received:
We’ve sent you this email to let you know that we have observed that you may be running memcached in an insecure configuration. Specifically, we have noticed that you have at least one security group that allows the whole internet to have access to the port most commonly used by memcached (11211).
There has been a lot of recent attention by the security community about the lack of access controls on memcached and recently some exploits have been published. This has highlighted the importance of running with strict access controls. While we are not aware of any unauthorized access to your Amazon EC2 instances, we do believe you should have your technical team look at this immediately.
We suggest that you audit your security group settings and restrict access to only the instances and IP addresses that need access. Most users only authorize other Amazon EC2 instances to access their memcached server. If you need to access your memcached server from outside of Amazon EC2, you can also authorize just trusted addresses to access your security group.
If you need additional assistance, you can reach our Premium Support team by sending email to email@example.com.
The Amazon Web Services Team
Great email and service from the AWS team, and the suggested fix is spot on.
This posting is meant to provide some background on the issue and on the “recent attention” the message alludes to. The issue is relevant to all users of memcached, not just those deploying on Amazon EC2.
The bulletin was almost certainly prompted by the development of go-derper by the team at SensePost, highlighted at the Black Hat USA 2010 conference on July 30, 2010.
The highlighted vulnerability can be summarized as: if you deploy memcached on a server, leave the TCP port on which memcached is configured to listen (11211, by default) exposed to the Internet, leave the memcached ASCII protocol enabled, AND you are not using SASL authentication with the memcached binary protocol, then there is a trivial way for Bad Guys to retrieve and replace most of the contents of your cache. go-derper.rb is a simple Ruby application, built by sensepost, that can be used to exploit the vulnerability.
Eliminating the vulnerability
Let’s examine the vulnerability, clause-by-clause, and highlight what can be done to eliminate it, starting at the top:
“If you deploy memcached on a server,”
This may seem silly to consider, but there are actually options here. Not everyone needs to deploy and configure memcached on a server themselves in order to use the technology. If you are deploying memcached on a cloud platform, for example, you may simply leverage a pre-built image or even an add-on service.
We run the memcached add-on service for Heroku (itself run on Amazon infrastructure), the leading platform-as-a-service cloud provider for Ruby applications. Because we manage the memcached add-on, our deep expertise with memcached is implicitly brought to bear on behalf of the thousands of applications deployed on Heroku that leverage our memcached add-on.
Additionally, we are working closely with our friends at RightScale to make pre-configured memcached and membase images available for those who want to deploy them on Amazon AWS.
If you are using one of these deployment options, we’ve ensured the configuration is secure.
“[if you] leave the TCP port on which memcached is configured to listen (11211, by default) exposed to the Internet,”
If you have deployed your own instance of memcached, either on your own equipment or in a cloud computing environment, then you need to ensure a firewall is protecting the system.
Amazon provides a rich set of capabilities for expressing and enforcing access control for instances running on EC2.
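As a minimal sketch of the audit Amazon suggests, the function below scans a list of firewall rules and reports any that open the memcached port to every address. The rule format (dictionaries with `cidr`, `from_port`, and `to_port` keys) is a made-up illustration, not the EC2 API; in practice you would first export your security group rules into some such structure.

```python
import ipaddress

MEMCACHED_PORT = 11211  # default memcached port mentioned in the AWS notice


def open_to_world(rules, port=MEMCACHED_PORT):
    """Return the rules that expose `port` to every address."""
    exposed = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        covers_port = rule["from_port"] <= port <= rule["to_port"]
        # A prefix length of 0 (0.0.0.0/0 or ::/0) matches any source address.
        if covers_port and network.prefixlen == 0:
            exposed.append(rule)
    return exposed


# Hypothetical rule set for illustration only.
rules = [
    {"cidr": "0.0.0.0/0", "from_port": 11211, "to_port": 11211},   # too open
    {"cidr": "10.0.0.0/8", "from_port": 11211, "to_port": 11211},  # internal
    {"cidr": "0.0.0.0/0", "from_port": 80, "to_port": 80},         # web only
]
risky = open_to_world(rules)
```

Any rule this returns is a candidate for narrowing to the specific instances or trusted addresses that actually need to reach memcached.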
NorthScale co-founder Dustin Sallings also weighed in over the weekend; his blog provides great additional detail, especially regarding firewalling.
“[if you] leave the memcached ASCII protocol enabled,”
As built, the go-derper exploit depends on use of the ASCII protocol.
In fact, the same exposure also exists in the binary protocol, but the binary protocol supports authentication and access control, providing a mechanism for securing the data.
The ASCII protocol, the original protocol developed for memcached, does not have any facility for authentication or access control, and thus is not suitable for hanging on the public Internet. This protocol was explicitly developed for use behind a firewall, as a “back-end,” protected system.
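To make the lack of authentication concrete, here is a small sketch of a complete ASCII `get` exchange, following the memcached text protocol: the request is a single line of plain text, and nothing in the request or the reply carries a credential. The helper names are hypothetical, and the parser handles only a single-key reply.

```python
def build_get(key: str) -> bytes:
    # The entire request: a command word, the key, and a line terminator.
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes):
    """Parse a single-key `get` reply; return the value bytes, or None on a miss."""
    if raw == b"END\r\n":
        return None  # key not found
    header, _, rest = raw.partition(b"\r\n")
    parts = header.split()
    # Header format: VALUE <key> <flags> <bytes> [<cas unique>]
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply")
    length = int(parts[3])
    value = rest[:length]
    if rest[length:] != b"\r\nEND\r\n":
        raise ValueError("malformed reply")
    return value


request = build_get("session:1")
value = parse_get_response(b"VALUE session:1 0 5\r\nhello\r\nEND\r\n")
```

Anyone who can open a TCP connection to the port can send that one line, which is why the protocol belongs behind a firewall.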
In the unlikely event that you have some good reason to make the memcached port available to any host on the public Internet, but want to control access to the data, then you should disable ASCII protocol support (and enable SASL authentication on the binary protocol, as described next). The NorthScale distribution of memcached makes it easy to configure memcached to NOT bind the ASCII protocol listener to the memcached port.
“AND [if] you are not using SASL authentication with the memcached binary protocol,”
As mentioned above, the memcached binary protocol in recent releases of memcached does support authentication and access authorization via the SASL protocol.
The NorthScale distribution of memcached makes it very easy to leverage this capability. Creating a new “bucket” in our memcached distribution provides both multi-tenancy capability (allowing multiple applications to securely bind to a single memcached cluster) and serves as the vehicle for SASL credential binding. It is this capability that allows us to securely support thousands of memcached add-on users up on Heroku without running thousands of individual servers.
If you are using an older version of memcached (most Linux distributions ship with antiquated versions of the software), and you need authenticated access support, you should look at a more recent version of the software. I certainly recommend our distribution.
The memcached historical context
The vulnerability is not surprising. Memcached was initially built by Brad Fitzpatrick for use at LiveJournal, in an environment where control over servers and network security was managed by a skilled team of system administrators. With many lines of defense in front of memcached, there was little need to build yet another layer of security into memcached itself, where the inevitable price would be development effort (effort better spent building blogging features) and performance (in an environment where many millions of memcached transactions are processed per day, and every single microsecond counts).
In a perfect world, every person developing and deploying software should fully understand the characteristics of all the underlying software infrastructure components on which their software is dependent; have a firm understanding of network security, policy formulation and policy enforcement; and regularly audit their operational environment while tracking emerging threats. Few systems would get deployed. In fact, I think it is fair to say that some of the most popular web applications on the Internet today would never have seen the light of day under those constraints.
In the real world, there is a lot of interesting software being developed and deployed by people who are not themselves experienced system administrators or network security specialists, and who frequently lack the resources to employ them. They just want to get their software or service in the hands of as many users as possible, as quickly as they possibly can. If and when the billions of operations per second materialize, then the microseconds can be wrung out, hopefully by a competent team of system administrators which the organization can then attract, and afford.
An aside on cloud computing
Ultimately, this is one of the promises of cloud computing, as outlined in one of our white papers. Cloud computing is not just about transforming capital to operating expenses, or about leveraging service provider economies of scale. Managed hosting providers have been doing that for over a decade. Cloud computing ultimately enables software developers to develop and deploy software, without also building up expertise in system administration and network security. Ultimately the world is a better place as a result. More developers are empowered to build and deliver software solutions.
Amazon has demonstrated part of the value they provide to their customer base: they have tracked a newly highlighted vulnerability that is widely relevant (given the broad deployment of memcached) to their users, identified specific users at risk (possible given the metadata used to configure the virtual machine instances that ultimately underlie running systems on EC2) and notified them in a timely manner precisely how to deal with the problem. Serious value add.
Where do you get your memcached?
The team here at NorthScale provides the vast majority of the contributed source code to the memcached and membase open source projects. We respect the clearly expressed desire of the larger memcached development community that the core of memcached should remain raw, fast and best suited for those who know what they are doing.
We also make available commercially-supported, certified, less “raw” versions of those systems, making it easier for organizations to deploy, configure, secure and manage the software. While the memcached development community doesn’t want the code base polluted with “ease of use” features, there is a much larger potential community of users of the software that will be better off with those features present. Same goes for things like replication and live cluster reconfiguration. Many users want these capabilities, but the core community would prefer to keep them out of memcached proper. We make them freely available in our distribution and make the source code available in related projects (e.g. http://github.com/northscale/bucket_engine).