Visual Guide to NoSQL Systems

There are so many NoSQL systems these days that it's hard to get a quick overview of the major trade-offs involved when evaluating relational and non-relational systems in non-single-server environments. I've developed this visual primer with quite a lot of help (see credits at the end), and it's still a work in progress, so let me know if you see anything misplaced or missing, and I'll fix it.

Without further ado, here's what you came here for (and further explanation after the visual).

Note: RDBMSs (MySQL, Postgres, etc) are only featured here for comparison purposes. Also, some of these systems can vary their features by configuration (I use the default configuration here, but will try to delve into others later).

Media_httpfarm5static_mevik

As you can see, there are three primary concerns you must balance when choosing a data management system: consistency, availability, and partition tolerance.

  • Consistency means that each client always has the same view of the data.
  • Availability means that all clients can always read and write.
  • Partition tolerance means that the system works well across physical network partitions.

According to the CAP Theorem, you can only pick two. So how does this all relate to NoSQL systems?

One of the primary goals of NoSQL systems is to bolster horizontal scalability. To scale horizontally, you need strong network partition tolerance which requires giving up either consistency or availability. NoSQL systems typically accomplish this by relaxing relational abilities and/or loosening transactional semantics.

In addition to CAP configurations, another significant way data management systems vary is by the data model they use: relational, key-value, column-oriented, or document-oriented (there are others, but these are the main ones).

  • Relational systems are the databases we've been using for a while now. RDBMSs and systems that support ACIDity and joins are considered relational.
  • Key-value systems basically support get, put, and delete operations based on a primary key.
  • Column-oriented systems still use tables but have no joins (joins must be handled within your application). Obviously, they store data by column as opposed to traditional row-oriented databases. This makes aggregations much easier.
  • Document-oriented systems store structured "documents" such as JSON or XML but have no joins (joins must be handled within your application). It's very easy to map data from object-oriented software to these systems.

Now for the particulars of each CAP configuration and the systems that use each configuration:

Consistent, Available (CA) Systems have trouble with partitions and typically deal with it with replication. Examples of CA systems include:

  • Traditional RDBMSs like Postgres, MySQL, etc (relational)
  • Vertica (column-oriented)
  • Aster Data (relational)
  • Greenplum (relational)

Consistent, Partition-Tolerant (CP) Systems have trouble with availability while keeping data consistent across partitioned nodes. Examples of CP systems include:

Available, Partition-Tolerant (AP) Systems achieve "eventual consistency" through replication and verification. Examples of AP systems include:

Self promotion and Credits

  • If you're a developer and looking for a job or if you're hiring developers and these data systems are important to you, consider coming to Hirelite: Speed Dating for the Hiring Process on Tuesday.
  • This guide draws heavily from a recent Ruby meetup (by Matthew Jording and Michael Bryzek) and a recent MongoDB presentation (given by Dwight Merriman).
  • Thanks to DBNess and ansonism for their help with validating system categorizations.
  • Thanks to those who helped shape the post after it was written: Stan, Dwight, and others who commented here and on this Hacker News thread.

Update: Here's a print version of the Visual Guide To NoSQL Systems if you need one quickly (warning: it's not all that pretty and I may not keep it updated, but as of 3/17/2010, it's current).

Why Networking Sucks for Introverts (and one way I'm trying to fix it for us)

Networking can really suck for introverts. I know because I'm one of them. You're probably thinking, "Of course! Introverts are shy and have trouble with social interactions." However, introversion is much more complex and encompasses an overlapping spectrum of feelings. Here's my take on it:

Intro

In general, the terms "introvert" and "extrovert" describe social preferences, not social capabilities, and it's important to remember that there's nothing wrong with tending toward one side or the other. Both have advantages and disadvantages (many that you can overcome with practice or adrenaline).

Problems that Introverts Have with Networking

These observations stem largely from the software-related meet-ups I've attended in NYC (Hackers & Founders, Hadoop, Android, etc), so they may only be applicable to technically-oriented introverts.

1. Making small talk

"The weather sure is ____." When introverts hear this, we immediately disengage. It's a struggle for us to realize that a little upfront investment in small talk can lead to a great conversation. Small talk is all about finding something to have a deeper conversation about, but often times, introverts get stuck in small talk ruts or completely blank on what to talk about, leading to awkward pauses.

2. Inducing awkward pauses

I've been a party to plenty of awkward pauses, both on the "caused" and the "affected" side. Awkward pauses happen for two reasons: struggling with turn taking or blanking on what to talk about. Blanking on what to talk about can happen because introverts have other interesting ideas we're mulling or we've run out of conversation topics. In the past, I've thought about keeping notes on conversation topics, but it's pretty weird to see someone whip out a notebook in the middle of a conversation, so I haven't done it.

3. Politely leaving conversations of no interest

If an introvert can't get out of the small talk stage or genuinely has no interest in the person they're talking to (imagine getting stuck talking to a someone from a recruitment agency that snuck in to a MySQL meetup), the conversation is over. Time to escape.

The polite introverts needlessly stick with the conversation, trying to think of a way to break it off nicely. From personal experience, these dreaded conversations can last up to half an hour. All the while, you're catching bits and pieces of interesting conversations all around you.

The less polite introverts either have "I don't care" plastered across their faces or just walk away. I have seen both. The latter is much more entertaining.

4. Having group discussions instead of 1-on-1 conversations

If there's a type of conversation that introverts love, it's 1-on-1 conversations. It's nerdy, but it's great to get into an intellectual property debate with one other person. However, at most "networking events" it's tough to get a 1-on-1 conversation. Often times you're stuck with a group.

Introverts have a lot of trouble with group conversations. We feel like we can't get a word in - other people are always talking! We feel like we have to keep up with the main conversation and all the little side conversations that keep splitting off. And a lot of times, it's hard to even join a group conversation.

Joining a group conversation is hard because everyone already involved is participating in the conversation. It's hard for them to include someone who has just popped in. It sounds strange, but I've seen people walk up to a group conversation, stand there for 10 minutes, and then walk away without ever saying anything.


What I'm doing about it

I've resolved to help fix these problems for a subset of introverts in a subset of networking situations. I want to help software engineers, who are definitely more introverted than the general population, network to find jobs and meet companies. To accomplish this goal (and others - a subject of another post), I'm introducing Hirelite.com: Speed Dating for the Hiring Process.

At a Hirelite event, software engineers will go on 5-minute "speed interviews" with companies. Making small talk and creating awkward pauses will be less of an issue because the conversations will be short and focused on how each party can help the other. Starting new conversations and politely leaving conversations of no interest will be of little concern due to the 5-minute time limit and rotation to the next conversation. And group conversations will be minimized: one company (possibly two people) speaking with one software engineer.

Hirelite is having its inaugural event on March 16th in New York City. For this event, there is only space for 20 companies and 20 software engineers, so let us know early if you would like to attend.