Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!
Whenever I talk about Membase with candidates, employees, or friends, I feel more and more excited about what we are building and how it is going to impact the industry. Each discussion validates my belief that what we do *is* unique and a game changer.
Just today we had two important “wins.” The first came from a prospect who evaluated our technology against other NoSQL databases and chose Membase. I can’t talk much about it yet, but this is an amazing win. The second is that IDC chose us as an innovative company to watch. Great day!
Every morning when I look at my calendar, I find myself looking forward to several things. At the top of the list are the meetings in which I get to discuss Membase technology, meet smart people, and demo a data management solution they can get excited about. I also look forward to the end of each day, when I see what improvements in the latest build make it even better for Membase users. It’s fun being in a position where people are hungry to learn about what you do and how you do it.
After five years at a big company, I remember again how much I love being in a startup: being able to move quickly, change direction fast when needed, develop features in days that in other environments would take weeks or even months, wear multiple hats, and, most importantly, be close to the customer. I like building meaningful systems that solve real problems. This company is an amazing place to be, and it is getting better every day.
And, by the way, we are *always* looking for great people to join us. If you’re one of them, just shoot me an email.
Winning awards is always fun. Over the years, companies I’ve been part of have won their fair share. But not all awards are created equal. Some carry more weight than others, and I put the IDC Innovative Company to Watch award in this category. IDC does extensive research on the markets it covers, talks regularly to a broad set of vendors and customers, and has a rigorous selection process, all of which lends the award great credibility. The award certainly has great meaning for us, and I suspect it also matters to organizations that are thinking through which database to use for their next project.
As a small company it’s always a challenge to get the word out about your products, and this is particularly true when you’re in a space like NoSQL where there are lots of competing technologies. Membase wasn’t one of the first NoSQL products on the market, so it’s encouraging that our innovative work and early customer success are being recognized so quickly. We’re very proud that while IDC could have given the award to any of the many NoSQL contenders, they chose to give it to us.
While we are thrilled to be recognized as a company to watch, it is even more gratifying that IDC understands the strategic importance of this new category of databases for enterprise customers and the significant near-term opportunity (tens of millions of dollars) it represents for companies like ours. IDC notes that it is seeing an “intensifying trend” for application development to move to the Web, creating the need for back-end architectures that demand extreme speed and elastic scalability while maintaining high levels of reliability. I can second that. We’ve seen a marked increase in interest in and uptake of our software: we just hit a run rate of 30,000 downloads a month. Judging by this heightened demand, customers with interactive web applications are clearly looking for alternatives to complement their relational databases.
We’re also excited about the range of customer interest. Yes, our customers include many Web 2.0 companies, such as social gaming and ad targeting platforms, but many enterprise customers are now recognizing the need for non-relational solutions on the back end. A recent InformationWeek survey indicated that 44% of enterprise IT staff had not yet heard of NoSQL databases. That means 56% have heard of NoSQL, and if our interactions with customers are any indication, many of those already have pilots underway. From financial services to retail to media, we’re seeing a growing number of inquiries and engagements in the enterprise, and we expect those numbers to increase as the value of NoSQL becomes more widely understood in mainstream IT.
I’d love to hear from you, especially if you are involved with web-based application development for an enterprise. What are your plans for exploring this emerging class of data management solutions optimized to support interactive web applications?
We NorthScalers have been hard at work and are proud to release Membase Server Beta 4, our final Beta release ahead of our general availability release.
Go and grab it here!
In addition to support for 64-bit Windows, we think you’ll be particularly excited by a major new feature in the release: memcached buckets!
Introducing Memcached Buckets
You now can create buckets in your Membase Server cluster that behave exactly like memcached, which means you can use Membase Server as a drop-in replacement for your existing memcached setup. In a single cluster you can now share the resources between memcached buckets and membase buckets.
Let’s look at the differences between memcached and membase bucket types:
Fundamentally, membase buckets are designed as permanent data stores. Once you put a key-value (KV) pair into a membase bucket, it remains there until you remove it (or its time-to-live expires). In a membase bucket, data is written to disk, so your store can grow, constrained only by available disk space. In addition, membase buckets offer replication, and they use vBuckets to allow data to be moved between nodes as the cluster topology changes.
On the other hand, memcached buckets follow memcached semantics: they are designed as caches, not permanent data stores. When the cache runs out of memory, items are evicted based on a least-recently-used (LRU) policy. As a result, your application needs to cope with the fact that an item stored in the cache may no longer be available at some later point. If that’s not the behavior you want, consider membase buckets instead.
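To make the "cope with eviction" point concrete, here is a minimal sketch in Python. It is a toy, not memcached's implementation: real memcached buckets bound memory in bytes, while this class (the names LRUCache and get_or_load are hypothetical) bounds the number of items. It pairs LRU eviction with a cache-aside read, which is the usual way an application tolerates misses.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Toy cache bounded by item count that evicts the least-recently-used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None  # miss: never set, or already evicted
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item


def get_or_load(cache: LRUCache, key: str, load: Callable[[str], bytes]) -> bytes:
    """Cache-aside read: on a miss, reload from the system of record and re-cache."""
    value = cache.get(key)
    if value is None:
        value = load(key)  # e.g. query the relational database
        cache.set(key, value)
    return value
```

With a capacity of 2, setting "a" and "b", reading "a", then setting "c" evicts "b"; a later get_or_load for "b" simply reloads it.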
And, just like memcached, memcached buckets do not persist data to disk and there is no replication between nodes. When you add a node, keys served from the new node are no longer accessible from the old node, as they do not get transferred from the old node to the new one. There will be cache misses and the KV pairs will need to be set again on the new node – again, these are the normal, expected behaviors of a memcached setup.
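The cache misses after adding a node come from how clients spread keys across servers. The sketch below assumes naive modulo hashing of a CRC32 digest purely for illustration (many memcached clients use consistent hashing instead, which reduces the share of moved keys but does not eliminate it); the function and node names are hypothetical.

```python
import zlib
from typing import List


def modulo_server_for(key: str, servers: List[str]) -> str:
    """Pick a server by naive modulo hashing of a CRC32 digest (illustrative only)."""
    return servers[zlib.crc32(key.encode()) % len(servers)]


def moved_keys(keys: List[str], before: List[str], after: List[str]) -> List[str]:
    """Return the keys whose owning server changes between two server lists."""
    return [k for k in keys if modulo_server_for(k, before) != modulo_server_for(k, after)]


keys = [f"user:{i}" for i in range(1000)]
old = ["cache1", "cache2", "cache3"]
new = old + ["cache4"]
moved = moved_keys(keys, old, new)
print(f"{len(moved)} of {len(keys)} keys now map to a different server")
```

Every key in `moved` is a miss on its new server until the application sets it again, because nothing copies the old value across.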
Memory quota allocation for memcached buckets is identical to that of the current NorthScale Memcached Server product. A fixed amount of memory per node is allocated for use by the memcached bucket, so adding or removing nodes changes the size of the memcached bucket. This is different from membase buckets, whose quota stays unchanged as the number of nodes changes, but we chose to keep memcached bucket behavior consistent with what current memcached users are accustomed to.
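A small worked example may help contrast the two sizing rules. The quota values below are hypothetical, and the per-node figure for the membase bucket assumes its quota is spread evenly across nodes.

```python
from typing import Tuple

PER_NODE_MEMCACHED_MB = 512  # hypothetical per-node allocation for a memcached bucket
MEMBASE_QUOTA_MB = 2048      # hypothetical bucket-wide quota for a membase bucket


def bucket_sizes(nodes: int) -> Tuple[int, int, float]:
    """Return (memcached bucket total MB, membase bucket total MB, membase MB per node)."""
    memcached_total = PER_NODE_MEMCACHED_MB * nodes  # grows and shrinks with the cluster
    membase_total = MEMBASE_QUOTA_MB                 # unchanged as nodes come and go
    return memcached_total, membase_total, membase_total / nodes  # even spread assumed


for n in (2, 3, 4):
    mc, mb, per_node = bucket_sizes(n)
    print(f"{n} nodes: memcached bucket {mc} MB, membase bucket {mb} MB ({per_node:.0f} MB/node)")
```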
The new setup wizard lets you configure the default bucket when creating a new cluster, so you can start with a memcached bucket as the only bucket, and expand from there as your needs dictate.
A Quick Word on Disk Quotas
Based on our own experience and feedback from users, we took a hard second look at the disk quota system for membase buckets, and ultimately we decided to remove it. We believe this change brings the product more in line with typical database behavior: you now get out-of-space errors only when you actually run out of disk space. Of course, we still show disk usage per bucket and across the cluster, so that you can keep an eye on overall resource usage.
Enjoy Beta 4 and let us know what you think!
I am very excited that Membase ServerTemplates are now up and running on the RightScale Cloud Management Platform (see today’s announcement). RightScale customers now have easy access to a leading NoSQL database for the first time, and Membase customers can rest assured that when they’re ready to deploy their applications in the cloud, they can take advantage of the leading cloud management platform in the industry.
For those who may not be familiar with RightScale ServerTemplates, they’re really cool. They provide a kind of blueprint for what a server should do in the cloud, letting users deploy preconfigured, cloud-ready servers that know how to operate there: how to obtain an IP address, how to submit monitoring data, and how to work with other servers in a cloud deployment. In our case, the Membase ServerTemplate sets everything up so you can easily use RightScale to deploy, provision, and manage a Membase database server running on Amazon Web Services (AWS), Rackspace, GoGrid, Eucalyptus, or any other cloud that RightScale supports.
Establishing a close relationship with RightScale was an easy decision. Many of our customers, including Zynga, already use RightScale extensively and prodded us to integrate our products with theirs. We’ve been working closely with RightScale for the past couple of months, and I think you’ll like what we’ve put together.
Since social gaming companies turned out to be among the biggest initial adopters of Membase, RightScale, and Amazon Web Services (AWS), we’ve collectively decided to host a webinar on how leading social gaming companies are using our products to build successful businesses (register now). If you’re in the social gaming business you won’t want to miss it, but even if you’re not, you can still learn a lot about deploying your web application in the cloud.
So, check out our new partnership with RightScale and let me know what you think. And let us know what other hot cloud companies we should work with next. Our goal is to make Membase easily accessible no matter how and where you choose to develop and deploy your application. Tell us how we can make your life easier!
I read with interest Matt Aslett’s (The 451) post on the golden age of open source. In it he argues that we’ve arrived at the fourth stage of open source, which is “in short: a return to a focus on collaboration and community, as well as commercial interests.”
What we’re doing with membase.org definitely falls in line with this description, although with a slightly different twist. NorthScale saw the need for a simple, fast, and elastic NoSQL database, a need we felt wasn’t being met by existing technologies. When it became clear that many prominent companies shared this view and were committed to an open source solution, NorthScale stepped in to shepherd the development of a broad community around the membase.org project. Consistent with Matt Aslett’s description of open source 4.0, the result is a project with an “emphasis on collaboration and community rather than control.” While NorthScale has contributed the bulk of the code to the project, our customers Zynga and NHN are co-sponsors of the project and have a strong commitment to its success. This blurring of the line between vendor and customer (the collaboration between two seemingly opposite sides of a transaction) has long set open source apart from the large proprietary vendors who want nothing more than a lock on their customers.
Traditionally, the primary attraction to open source, and what enabled it to make inroads in the enterprise, has been cost. This is the “cheaper Oracle than Oracle” model where the technology is not necessarily solving any new problems in the market, but provides a cheaper open source version of something enterprises are already paying for.
However, when I talk to enterprise companies, lower cost alone no longer cuts it as a driver for open source adoption. On the other hand, if we engage our customers around a very real and painful problem they’re dealing with (in our case, the mismatch between relational databases and the needs of interactive web applications) and demonstrate how we’re solving it with innovative new technology, then we can have a discussion.
In a nutshell, the fourth stage of open source is much more than a return to community and collaboration: it’s about putting open source front and center as an engine of innovation. We’re seeing the emergence of open source projects that identify a new problem and create a new solution to ease that pain point. The source code just happens to be open because that’s what we have all come to expect. This is particularly true of infrastructure software going forward, where it’s expected that some component, if not all of it, is available as open source.
We believe open source 4.0 is characterized in part by projects that solve new problems with innovative solutions and use a highly collaborative model. We encourage the participation of both “corporate sponsors” and passionate individuals who are willing to contribute to the membase roadmap and strengthen the community.
Recently, Attila Kiskó, the author of the Enyim .NET memcached client (in our view the best .NET memcached client), has been enhancing his client library to speak directly to membase data nodes. Membase already supports all existing memcached client libraries and memcached protocols via a high-performance proxy, but there’s also a “direct path” that client libraries can use for even better performance. Along the way, we ended up with a quick guide on the membase.org wiki on how to create your own native, or “smart,” membase client library, so anybody with a favorite programming language can do the same.
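At its core, a smart client hashes each key to a vBucket and then looks up which node currently owns that vBucket. The sketch below assumes a CRC32-based hash (the upper bits of the digest, reduced modulo the vBucket count) and a toy, hand-built vBucket map; the vBucket count, node names, and function names are all hypothetical. A real client takes the hashing scheme and the map from the cluster configuration described in the membase.org guide.

```python
import zlib
from typing import List

NUM_VBUCKETS = 1024  # hypothetical; a real client reads this from the cluster config


def vbucket_for(key: bytes, num_vbuckets: int = NUM_VBUCKETS) -> int:
    # Assumed hash: upper bits of the CRC32 digest, reduced modulo the vBucket count.
    digest = (zlib.crc32(key) >> 16) & 0x7FFF
    return digest % num_vbuckets


def vbucket_server_for(key: bytes, vbucket_map: List[int], servers: List[str]) -> str:
    # vbucket_map[i] is the index (into servers) of the node that owns vBucket i.
    return servers[vbucket_map[vbucket_for(key, len(vbucket_map))]]


servers = ["node-a", "node-b", "node-c"]  # hypothetical node names
vbucket_map = [vb % len(servers) for vb in range(NUM_VBUCKETS)]  # toy round-robin layout
print(vbucket_server_for(b"user:42", vbucket_map, servers))
```

Because the key-to-vBucket step never changes, moving data between nodes only requires updating the map, not rehashing keys.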
The easiest approach is to start with your favorite memcached client library (one that speaks the memcached binary protocol) and proceed from there. The fun part is handling the cases that arise during rebalance operations, so that the cluster can grow and shrink seamlessly without data loss, but who doesn’t like fun challenges like these?
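One common way to handle a rebalance is to treat a "not my vBucket" reply as a signal to refresh the map and retry. The sketch below assumes the binary-protocol status code 0x0007 for that reply and takes the network pieces as hypothetical callables (fetch_map, send_get, vbucket_for) so the routing logic stands on its own; it is not the guide's reference algorithm.

```python
from typing import Callable, List, Tuple

STATUS_SUCCESS = 0x0000
STATUS_NOT_MY_VBUCKET = 0x0007  # binary-protocol status: vBucket not owned by this server

VBucketMap = Tuple[List[int], List[str]]  # (vBucket -> server index, server list)


def get_with_retry(
    key: bytes,
    fetch_map: Callable[[], VBucketMap],
    send_get: Callable[[str, int, bytes], Tuple[int, bytes]],
    vbucket_for: Callable[[bytes, int], int],
    max_attempts: int = 3,
) -> bytes:
    """Route a GET by vBucket, refreshing the map when a server rejects the request."""
    vbucket_map, servers = fetch_map()
    for _ in range(max_attempts):
        vb = vbucket_for(key, len(vbucket_map))
        status, value = send_get(servers[vbucket_map[vb]], vb, key)
        if status == STATUS_NOT_MY_VBUCKET:
            # The vBucket moved (e.g. mid-rebalance): fetch a fresh map and retry.
            vbucket_map, servers = fetch_map()
            continue
        if status != STATUS_SUCCESS:
            raise LookupError(f"GET failed with status 0x{status:04x}")
        return value
    raise RuntimeError(f"vBucket map still stale after {max_attempts} attempts")
```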
I am excited to announce that NorthScale Membase Server 1.6 Beta3 is now available and ready for download.
This beta release adds a lot of new functionality and reflects most of what you’ll find in the final product. Highlights include:
Let’s take a look at these features in a bit more detail:
Windows support is one of our most frequently requested features, and we are very pleased to offer it with this beta release. Beta3 provides 32-bit Windows support, with 64-bit support on the way. (Note: the 32-bit binary runs fine on 64-bit Windows but is subject to 32-bit memory limits.) The Windows version provides the same feature set as our Linux version.
Multi-tenancy is the mechanism for creating multiple buckets on one membase cluster. Each bucket represents a separate namespace, but, more importantly, it also provides per-bucket resource control, allowing buckets to behave differently. For example, if you have data you consider very important, you may want to create a bucket with a replica count of 3; for less crucial data, a replica count of 0 might make sense. This way you can decide how to divide cluster resources to accommodate the requirements of different applications or different types of data. No more one size fits all!
Bucket quotas are worth a bit more explanation. When you create a cluster, you set a fixed amount of memory that each server node will contribute to the total cluster memory that buckets can consume. Once set, this value is inherited by any server joining the cluster and cannot be changed. Hence, the total memory available to membase in the cluster grows by this amount with each server you add.
Similarly, each bucket defines a memory quota that sets how much of the total cluster memory it can use. This quota does not change as you add servers to your cluster, but you can edit it manually on the “Manage Bucket” screen.
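The arithmetic behind these two rules is simple but worth spelling out. In this sketch the per-node contribution, node counts, and bucket names and quotas are hypothetical; it shows the cluster total growing with each node while the bucket quotas stay fixed.

```python
from typing import Dict


def cluster_memory_mb(per_node_mb: int, nodes: int) -> int:
    """Total memory available to buckets: the fixed per-node contribution times node count."""
    return per_node_mb * nodes


def unallocated_mb(per_node_mb: int, nodes: int, bucket_quotas_mb: Dict[str, int]) -> int:
    """Memory left after bucket quotas; a negative result means the quotas do not fit."""
    return cluster_memory_mb(per_node_mb, nodes) - sum(bucket_quotas_mb.values())


quotas = {"sessions": 3000, "profiles": 2000}  # hypothetical buckets and quotas (MB)
print(unallocated_mb(2000, 3, quotas))  # 3 nodes: 6000 MB total minus 5000 MB of quotas
print(unallocated_mb(2000, 4, quotas))  # a 4th node adds 2000 MB; bucket quotas unchanged
```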
In addition to the memory quota, each bucket also has a disk quota. In contrast to memory, there is no fixed amount of disk space that each server contributes to the cluster; all free disk space on the assigned storage path may be used. It is up to the system administrator to make sure that each node provides enough space for the data written (you can track free disk space in the new Cluster Overview dashboard). Disk quotas are not yet enforced in Beta3, but you can already use them to monitor each bucket’s usage against its quota.
The Cluster Overview is a single dashboard that shows the most crucial stats of your cluster in one place.
It gives you a single page to keep track of the memory and disk usage of all your buckets, as well as how many operations your cluster is performing. The “disk fetches per second” stat serves as an early warning indicator: if you are seeing a lot of disk reads, the working set for at least one of your buckets no longer fits in RAM. Disk reads have much higher latency than memory reads, so if this happens you can use the Data Bucket monitor section to drill down and find out which bucket is affected. If you need to take action, you can increase that bucket’s memory quota in the Manage Data Bucket section. Issue resolved!
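If you collect these stats yourself, the same check can be automated. This is a minimal sketch: the stat names (gets_per_sec, disk_fetches_per_sec), the bucket names, and the 1% threshold are hypothetical choices, not values taken from the Membase console.

```python
from typing import Dict, List


def buckets_spilling_to_disk(
    stats: Dict[str, Dict[str, float]], max_disk_fetch_ratio: float = 0.01
) -> List[str]:
    """Return buckets whose disk fetches exceed the given share of their reads.

    `stats` maps bucket name -> {"gets_per_sec": ..., "disk_fetches_per_sec": ...}.
    Stat names and the default threshold are hypothetical.
    """
    flagged = []
    for bucket, s in stats.items():
        gets = s.get("gets_per_sec", 0.0)
        fetches = s.get("disk_fetches_per_sec", 0.0)
        if gets > 0 and fetches / gets > max_disk_fetch_ratio:
            flagged.append(bucket)  # working set likely no longer fits in RAM
    return sorted(flagged)
```

A flagged bucket is a candidate for a larger memory quota.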
As you can see, we packed a lot of great new features into Beta3, but there is still more to come. You might guess from the new bucket creation dialog that we have another bucket type in store, which will make multi-tenancy even more exciting. For more on that, you’ll have to check back later.
Enjoy Beta3 and let us know how you are getting on with the new features!
We are looking forward to a great week next week at VMworld 2010 in San Francisco. It looks like it’s shaping up to be a great conference.
See a Membase demo.
If you’re at the show, be sure to come by NorthScale’s booth (#640) for a Membase demo. Membase is an elastic key-value database that stores the data behind interactive web applications far more efficiently and cost-effectively than a relational database can. We’d love to show you how this highly available, cloud-friendly data layer expands and rebalances dynamically as application needs change. Just talk to anyone in the booth wearing a t-shirt with the Membase mascot.
Wear a t-shirt, win an iPad!
And speaking of t-shirts, pick up your own yellow Membase t-shirt and wear it around the show floor for a chance to win an iPad. We’ll be giving away iPads at our booth on Tuesday (8/31) and Wednesday (9/1) at 5:30 pm, so stop by during the day to get your t-shirt and find out more about how to win.
Can’t make the show?
Test drive Membase anyway.
If you won’t be in San Francisco next week, you can still take Membase for a spin by downloading it here. There are also a number of webinars available for getting started, and a very active user forum as well.
Hope to see you at the show!
Things are moving at the speed of light over here and I wanted to take a second to come up for air.
We just held our 7th weekly beta webinar, and this week I gave a quick demo/preview of some of the features and functionality coming in our soon-to-be-released Beta 3.
Check out the recorded webinar for a sneak-preview and then download the real thing when it’s available.
Thanks for all the feedback and please keep it coming.
P.S. If you happen to be in the area today, stop by for a beer and some eats at our parking lot BBQ (behind our Mountain View offices, 3-8 pm).
If you are a memcached user and have deployed instances on Amazon EC2, you may have received a message from Amazon over the weekend (we received one on 8/7/2010) warning of a “Possible Insecure Memcached Configuration.” Here’s the body of the message we received:
We’ve sent you this email to let you know that we have observed that you may be running memcached in an insecure configuration. Specifically, we have noticed that you have at least one security group that allows the whole internet to have access to the port most commonly used by memcached (11211).
There has been a lot of recent attention by the security community about the lack of access controls on memcached and recently some exploits have been published. This has highlighted the importance of running with strict access controls. While we are not aware of any unauthorized access to your Amazon EC2 instances, we do believe you should have your technical team look at this immediately.
We suggest that you audit your security group settings and restrict access to only the instances and IP addresses that need access. Most users only authorize other Amazon EC2 instances to access their memcached server. If you need to access your memcached server from outside of Amazon EC2, you can also authorize just trusted addresses to access your security group.
If you need additional assistance, you can reach our Premium Support team by sending email to email@example.com.
The Amazon Web Services Team
Great email and service from the AWS team, and the suggested fix is spot on.
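If you want to run the audit Amazon suggests across many security groups, a small script helps. This is a sketch, not an AWS tool: it assumes you pass it a dict shaped like the EC2 DescribeSecurityGroups result (for example what boto3's describe_security_groups() returns), and the function name is hypothetical. It flags groups that open port 11211 to all IPv4 or IPv6 addresses.

```python
from typing import Any, Dict, List

MEMCACHED_PORT = 11211
WORLD = {"0.0.0.0/0", "::/0"}  # "the whole internet" for IPv4 and IPv6


def world_open_groups(response: Dict[str, Any], port: int = MEMCACHED_PORT) -> List[str]:
    """Return IDs of security groups that let any address reach `port`."""
    exposed = []
    for group in response.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            protocol = perm.get("IpProtocol")
            if protocol == "-1":
                covers_port = True  # "-1" means all protocols and all ports
            elif protocol in ("tcp", "udp"):
                covers_port = perm.get("FromPort", -1) <= port <= perm.get("ToPort", -1)
            else:
                covers_port = False
            cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
            if covers_port and cidrs & WORLD:
                exposed.append(group["GroupId"])
                break  # one open rule is enough to flag the group
    return exposed
```

Any group it returns should be narrowed to the instances and addresses that actually need access, as the AWS message recommends.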
This post is meant to provide some background on the issue and on the “recent attention” the message alludes to. The issue is relevant to all users of memcached, not just those deploying on Amazon EC2.
This bulletin was almost certainly prompted by go-derper, a tool developed by the team at SensePost and presented at the Black Hat USA 2010 conference on July 30, 2010.
The highlighted vulnerability can be summarized as: if you deploy memcached on a server, leave the TCP port on which memcached is configured to listen (11211, by default) exposed to the Internet, leave the memcached ASCII protocol enabled, AND you are not using SASL authentication with the memcached binary protocol, then there is a trivial way for Bad Guys to retrieve and replace most of the contents of your cache. go-derper.rb is a simple Ruby application, built by sensepost, that can be used to exploit the vulnerability.
Eliminating the vulnerability
Let’s examine the vulnerability clause by clause and highlight what can be done to eliminate it, starting at the top:
“If you deploy memcached on a server,”
This may seem silly to consider, but there are actually options here. Not everyone needs to deploy and configure memcached on a server themselves in order to use the technology. If you are deploying memcached on a cloud platform, for example, you may simply leverage a pre-built image or even an add-on service.
We run the memcached add-on service for Heroku (itself run on Amazon infrastructure), the leading platform-as-a-service cloud provider for Ruby applications. Because we manage the memcached add-on, our deep expertise with memcached is implicitly brought to bear on behalf of the thousands of applications deployed on Heroku that leverage our memcached add-on.
Additionally, we are working closely with our friends at RightScale to make pre-configured memcached images available for those who want to deploy pre-configured memcached and membase instances on Amazon AWS.
If you are using one of these deployment options, we’ve ensured the configuration is secure.
“[if you] leave the TCP port on which memcached is configured to listen (11211, by default) exposed to the Internet,”
If you have deployed your own instance of memcached, either on your own equipment or in a cloud computing environment, then you need to ensure a firewall is protecting the system.
Amazon provides a rich set of capabilities for expressing and enforcing access control for instances running on EC2.
NorthScale co-founder Dustin Sallings also weighed in over the weekend; his blog provides great additional detail, especially regarding firewalling.
“[if you] leave the memcached ASCII protocol enabled,”
As built, the go-derper exploit depends on use of the ASCII protocol.
In fact, the same weakness also exists in the binary protocol, but the binary protocol supports authentication and access control, which provides a mechanism for securing the data.
The ASCII protocol, the original protocol developed for memcached, has no facility for authentication or access control, and thus is not suitable for exposure on the public Internet. It was explicitly developed for use behind a firewall, as a “back-end,” protected system.
In the unlikely event that you have some good reason to make the memcached port available to any host on the public Internet, but want to control access to the data, then you should disable ASCII protocol support (and enable SASL authentication on the binary protocol, as described next). The NorthScale distribution of memcached makes it easy to configure memcached to NOT bind the ASCII protocol listener to the memcached port.
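For comparison, upstream memcached 1.4.x exposes the same controls as command-line flags. The sketch below only builds a command line and does not show the NorthScale distribution's own configuration, which may differ. It uses -l to bind a single interface, -U 0 to turn off the UDP listener, -B binary to refuse ASCII-protocol connections, and -S to enable SASL (the binary must be built with SASL support). The listen address is hypothetical.

```python
import shlex
from typing import List


def hardened_memcached_argv(listen_addr: str, port: int = 11211) -> List[str]:
    """Build an upstream-memcached command line that disables the ASCII protocol."""
    return [
        "memcached",
        "-l", listen_addr,  # listen only on this interface
        "-p", str(port),    # TCP port
        "-U", "0",          # 0 turns the UDP listener off
        "-B", "binary",     # accept only the binary protocol
        "-S",               # require SASL authentication
    ]


print(shlex.join(hardened_memcached_argv("10.0.0.5")))  # hypothetical private address
```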
“AND [if] you are not using SASL authentication with the memcached binary protocol,”
As mentioned above, the memcached binary protocol in recent releases of memcached does support authentication and access authorization via the SASL protocol.
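To show what SASL authentication looks like on the wire, here is a sketch that builds a binary-protocol SASL AUTH request (opcode 0x21) using the PLAIN mechanism. The 24-byte header layout follows the memcached binary protocol. The key carries the mechanism name, and the value carries an empty authorization identity, the username, and the password, separated by NUL bytes, as RFC 4616 defines for PLAIN. The credentials are hypothetical, and PLAIN sends the password in cleartext.

```python
import struct

REQ_MAGIC = 0x80     # request packet
OP_SASL_AUTH = 0x21  # SASL authenticate
# magic, opcode, key length, extras length, data type, vbucket, body length, opaque, CAS
HEADER = struct.Struct(">BBHBBHIIQ")  # 24 bytes, network byte order


def sasl_plain_auth_request(username: str, password: str, opaque: int = 0) -> bytes:
    """Build a binary-protocol SASL AUTH request using the PLAIN mechanism."""
    key = b"PLAIN"  # mechanism name goes in the key
    value = b"\x00" + username.encode() + b"\x00" + password.encode()
    header = HEADER.pack(
        REQ_MAGIC,
        OP_SASL_AUTH,
        len(key),               # key length
        0,                      # no extras
        0,                      # raw bytes data type
        0,                      # vbucket id / reserved
        len(key) + len(value),  # total body length
        opaque,                 # echoed back in the response
        0,                      # CAS
    )
    return header + key + value
```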
The NorthScale distribution of memcached makes it very easy to leverage this capability. Creating a new “bucket” in our memcached distribution provides multi-tenancy (allowing multiple applications to securely share a single memcached cluster) and also serves as the vehicle for binding SASL credentials. It is this capability that allows us to securely support thousands of memcached add-on users on Heroku without running thousands of individual servers.
If you are using an older version of memcached (most Linux distributions ship with antiquated versions of the software) and you need authenticated access, you should look at a more recent version. I certainly recommend our distribution.
The memcached historical context
The vulnerability is not surprising. Memcached was initially built by Brad Fitzpatrick for use at LiveJournal, in an environment where control over servers and network security was managed by a skilled team of system administrators. With many lines of defense in front of memcached, there was little need to build yet another layer of security into memcached itself, where the inevitable price would be development effort (effort better spent building blogging features) and performance (in an environment where many millions of memcached transactions are processed per day, and every single microsecond counts).
In a perfect world, every person developing and deploying software would fully understand the characteristics of all the underlying infrastructure components their software depends on; have a firm grasp of network security, policy formulation, and policy enforcement; and regularly audit their operational environment while tracking emerging threats. If that were a prerequisite, few systems would ever get deployed. In fact, I think it is fair to say that some of the most popular web applications on the Internet today would never have seen the light of day under those constraints.
In the real world, a lot of interesting software is developed and deployed by people who are not themselves experienced system administrators or network security specialists, and who frequently cannot afford to employ them. They just want to get their software or service into the hands of as many users as possible, as quickly as possible. If and when the billions of operations per second materialize, the microseconds can be wrung out, hopefully by a competent team of system administrators whom the organization can then attract and afford.
An aside on cloud computing
Ultimately, this is one of the promises of cloud computing, as outlined in one of our white papers. Cloud computing is not just about converting capital expenses into operating expenses, or about leveraging service provider economies of scale; managed hosting providers have been doing that for over a decade. Cloud computing enables software developers to develop and deploy software without also building up expertise in system administration and network security. The world is a better place as a result: more developers are empowered to build and deliver software solutions.
Amazon has demonstrated part of the value they provide to their customer base: they have tracked a newly highlighted vulnerability that is widely relevant (given the broad deployment of memcached) to their users, identified specific users at risk (possible given the metadata used to configure the virtual machine instances that ultimately underlie running systems on EC2) and notified them in a timely manner precisely how to deal with the problem. Serious value add.
Where do you get your memcached?
The team here at NorthScale provides the vast majority of the contributed source code to the memcached and membase open source projects. We respect the clearly expressed desire of the larger memcached development community that the core of memcached should remain raw, fast and best suited for those who know what they are doing.
We also make available commercially supported, certified, less “raw” versions of those systems that make it easier for organizations to deploy, configure, secure, and manage the software. While the memcached development community doesn’t want the code base polluted with “ease of use” features, there is a much larger potential community of users who will be better off with those features present. The same goes for things like replication and live cluster reconfiguration: many users want these capabilities, but the core community would prefer to keep them out of memcached proper. We make them freely available in our distribution and make the source code available in related projects (e.g. http://github.com/northscale/bucket_engine).