I see a lot of buzz about Cassandra, but I don't know when it's appropriate. In what use cases would I prefer Cassandra to other data stores? What advantages and disadvantages should I be aware of?
For me, the main features which are most beneficial are:
The main problems will be
All of these systems behave differently and have rather subtle nuances which take time to understand; only proceed with development against something that your team are happy using. You will probably have to support it for some time, and migration will be very expensive.
If it is an important decision, but you haven't made this decision before and aren't familiar with the different options, how do you decide?
(Jul 12 '10 at 18:10)
Joseph Turian ♦♦
Spend some time researching the available options. Get some machines, preferably reasonable-spec hardware (i.e. not VMs), and install the candidate systems on them. Write some simulators to load data in and perform queries, and get a feel for how each one "works" from a development and ops standpoint. Repeat until you think you have an option your team is happy to work with. You will need to maintain this for a long time, so choosing the right option over the wrong one could save you many man-years. Spending a month or two upfront should give a decent payback.
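As a rough illustration of the "write some simulators" step, here is a minimal Python sketch. Everything in it is hypothetical: `InMemoryStore` is only a stand-in with made-up `put`/`get` methods, and the key format and row contents are invented for the example. For a real evaluation you would replace it with a thin wrapper around the client of each store you are testing, and use data and queries shaped like your own workload.

```python
import random
import time


class InMemoryStore:
    """Hypothetical stand-in; replace with a wrapper around the store under test."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


def run_simulation(store, num_writes=10_000, num_reads=10_000, seed=0):
    """Load synthetic rows, then issue random point reads; return timing stats."""
    rng = random.Random(seed)  # fixed seed so runs are comparable across stores
    keys = [f"user:{i}" for i in range(num_writes)]

    # Write phase: insert one synthetic row per key.
    start = time.perf_counter()
    for key in keys:
        store.put(key, {"score": rng.random()})
    write_seconds = time.perf_counter() - start

    # Read phase: random point lookups over the keys just written.
    misses = 0
    start = time.perf_counter()
    for _ in range(num_reads):
        if store.get(rng.choice(keys)) is None:
            misses += 1
    read_seconds = time.perf_counter() - start

    return {
        "writes": num_writes,
        "reads": num_reads,
        "misses": misses,
        "write_seconds": write_seconds,
        "read_seconds": read_seconds,
    }


if __name__ == "__main__":
    print(run_simulation(InMemoryStore()))
```

Running the same function against each candidate gives numbers you can compare directly, though timings from a single process on one machine say little about how a cluster behaves under failure or rebalancing, which is the part worth spending the month on.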
(Jul 26 '10 at 07:48)
Mark R
It might sound like a bit of a tautology, but you should use it when you need the specific features that it has and are not concerned about the features it lacks. First and foremost is well-proven distribution/replication from small scale to very large, without having to change your operational model (e.g. by having to add sharding or replication yourself). It also has a richer data model than some of the alternatives within that space but a somewhat simpler conflict-resolution model, and the importance of having an active community around it should not be dismissed.

As for disadvantages, the most obvious one is the learning curve. Cassandra is not transactional. It does not provide any atomicity for updates across rows, and even for updates within a row you have to be careful to avoid races. If you do need transactions and would end up implementing them in some form with Cassandra, you're probably Doing It Wrong. The same concern applies somewhat to secondary indices. To use Cassandra effectively you'll need to learn about columns and supercolumns, and about denormalizing your data, which is also going to increase the total amount of space you use compared to a traditional DB.

Basically, Cassandra solves a bunch of Very Hard distributed-system problems that very few other systems solve. If you have a large and growing data set which fits Cassandra's data/consistency model better than (for example) Riak's or Voldemort's, then you should probably at least investigate how it solves those problems before you even consider solving them yourself.
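To make the point about races within a row concrete, here is a small Python sketch. It is a toy model, not Cassandra code: `LastWriteWinsRow` and its methods are hypothetical names, and the only behaviour it borrows is that conflicting writes to a column are resolved by keeping the one with the newest timestamp. It shows how a read-modify-write from two clients can silently lose an update.

```python
class LastWriteWinsRow:
    """Toy row: each column keeps only the value with the newest timestamp."""

    def __init__(self):
        self._columns = {}  # column name -> (timestamp, value)

    def write(self, column, value, timestamp):
        current = self._columns.get(column)
        # Keep the incoming value only if it is at least as new as the stored one.
        if current is None or timestamp >= current[0]:
            self._columns[column] = (timestamp, value)

    def read(self, column):
        entry = self._columns.get(column)
        return None if entry is None else entry[1]


def racy_increment_demo():
    """Two clients increment the same column without any coordination."""
    row = LastWriteWinsRow()
    row.write("visits", 10, timestamp=1)

    # Both clients read the current value before either writes back.
    seen_by_a = row.read("visits")
    seen_by_b = row.read("visits")

    # Each writes "what I saw + 1"; the later write simply replaces the earlier.
    row.write("visits", seen_by_a + 1, timestamp=2)
    row.write("visits", seen_by_b + 1, timestamp=3)

    return row.read("visits")


if __name__ == "__main__":
    print(racy_increment_demo())  # prints 11, not 12: one increment was lost
```

Neither write fails and nothing reports a conflict, which is why this kind of update has to be designed around (for example by writing each event to its own column and summing on read) rather than patched with home-grown transactions.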