Visual Tech Job Board Comparison

It's hard to determine if a tech job board is worth watching (if you're a job seeker) or worth posting to (if you're hiring), so I made a quick visual comparison of job boards in New York City.

I used metrics that were easiest to quantify quickly through examining up to 300 recent local tech job posts on each of these sites, so you should definitely consider metrics other than what I've mentioned here (namely what types of job seekers frequent each job board). A few notable job boards are missing due to technical constraints that I didn't have time to overcome while scraping data (Startuply and Monster load some posts using JavaScript, and TheLadders and LinkedIn require logging in). I've tried to be as objective as possible, but I run Hirelite: Speed Dating for the Hiring Process, so keep that in mind.

A few notes and observations after the graphic...


What each metric means

  • Cost - the cost of a single post. The life time of the post varies per site.
  • Headhunter posts - the number of posts originating from recruitment agencies as opposed to from companies that are hiring.
  • Typical company sizes - an estimate of the typical size of companies posting to a job board based on the company name, funding stage, salary/equity balance, and other information contained in the post.
  • 20 most frequent words - the words most often used in job posts at a particular job board. Technical note: I used Lucene (and the StandardAnalyzer) to help with text processing and frequency calculations, so very common words (a, the, ...) are excluded. Additionally, some special characters were omitted from words (see observations for effects).

Observations

Keep in mind that more (and more random) samples would be ideal, but here are some preliminary observations:

First, I noticed that "c" appeared much more than I expected. Companies can't be requesting C skills enough to put it in the 20 most frequent words used Craigslist and Stack Overflow! Well... maybe Stack Overflow. It turns out, the tool I used process the text (Lucene) cut out special characters, normalizing C++, C#, Objective-C, and C all to C, thus inflating the frequency of "c".

The words "you", "we", and "our" appear very high on 37signals, Craigslist, Hirelite, NextNY, and Stack Overflow, but are much less emphasized on CareerBuilder and Dice. Are the job posts less personal or intimate? Does this matter? From looking at the posts in more detail, it seems to correlate with a greater focus on more specific requirements in posts on CareerBuilder and Dice. Note that the word "years" as in "3 years of Java experience" appear in the top 20 most frequent words of CareerBuilder and Dice; however, they do appear in the top 50 most frequent words of all the job boards surveyed.

I highlighted words in top 20 most frequent word lists that I thought correlated to technical skills or softer skills to observe the relative importance of each, but I don't see any discernible pattern. Additionally, many of the meanings of these words depend highly on their context (requirements section vs responsibilities section vs about the company section).

Initially, I thought that posts from larger companies correlated with a higher number of recruiter/confidential posts, but then I got to NextNY where many posts for positions at small to medium sized companies are recruiter/confidential posts. Maybe recruiter/confidential posts will appear in high numbers wherever they're allowed? Hirelite and Stack Overflow have policies against posts where the hiring company is not named, but I don't know of any explicit policy on 37signals (though they have no recruiter/confidential posts). Does anyone know if they have a policy about these posts?

Finally, let me know what you see in the data or if you have other ideas of what to do with this type of data. I'm considering doing some kind of analysis of how typical job post language compares to typical English - I predict probably an inordinate use of "pirate" and "ninja".

Data (including top 50 most frequent words)

37signals
Cost (single post): $400
Headhunter posts: 0%
Typical company sizes: generally medium sized companies or funded small companies
50 most frequent words: we, experience, our, you, web, have, design, team, work, development, business, can, developer, your, who, software, looking, end, ruby, new, rails, project, management, skills, working, us, strong, requirements, about, from, css, well, knowledge, front, things, technologies, jquery, html, systems, php, all, years, use, technology, some, should, projects, javascript, help, has

CareerBuilder
Cost: $419
Headhunter posts: 65%
Typical company sizes: large companies
50 most frequent words: experience, skills, management, business, technology, development, job, work, requirements, our, technical, systems, project, information, knowledge, support, must, years, software, data, have, your, team, ability, required, strong, services, security, robert, half, all, working, email, time, us, opportunity, we, sql, contact, developer, more, new, network, industry, design, you, company, system, server, application

Craigslist
Cost: $25
Headhunter posts: 46%
Typical company sizes: all company sizes
50 most frequent words: experience, our, software, development, you, we, work, skills, have, team, new, design, business, strong, knowledge, management, systems, web, c, your, java, developer, technical, years, requirements, please, ability, working, applications, environment, job, must, programming, project, all, company, data, time, technology, product, sql, looking, candidates, york, client, solutions, plus, services, from, well

Dice
Cost: $459
Headhunter posts: 51%
Typical company sizes: large companies
50 most frequent words: experience, business, skills, development, management, team, work, knowledge, services, technology, systems, project, technical, new, client, years, data, strong, design, java, support, developer, our, you, required, software, information, web, financial, have, description, working, ability, all, solutions, position, application, requirements, sales, applications, company, other, manager, your, must, title, environment, including, york, understanding

Hirelite
Cost: $100
Headhunter posts: 0%
Typical company sizes: seed stage to medium sized companies
50 most frequent words: you, we, our, experience, software, team, web, work, have, your, development, skills, new, looking, design, engineer, from, environment, applications, java, years, strong, technology, get, working, plus, senior, about, us, technologies, developers, who, systems, ruby, javascript, can, business, product, platform, people, like, engineering, building, what, want, understanding, technical, other, developer, company

NextNY
Cost: $0
Headhunter posts: 43%
Typical company sizes: seed stage to medium sized companies
50 most frequent words: experience, you, we, work, have, our, skills, your, team, development, new, web, product, business, working, strong, client, management, media, from, apply, all, clients, project, online, data, software, ability, looking, design, marketing, can, years, technology, us, time, sales, including, high, company, about, requirements, must, technical, services, environment, who, advertising, please, lead

Stack Overflow
Cost: $350
Headhunter posts: 0%
Typical company sizes: generally medium sized companies or funded small companies
50 most frequent words: you, experience, our, we, development, software, work, team, have, c, skills, new, systems, from, web, technology, design, your, knowledge, strong, working, developers, programming, java, developer, looking, applications, environment, years, technical, high, including, code, business, application, management, about, projects, technologies, all, ability, well, requirements, performance, media, engineer, us, science, more, computer

Visual Guide to NoSQL Systems

There are so many NoSQL systems these days that it's hard to get a quick overview of the major trade-offs involved when evaluating relational and non-relational systems in non-single-server environments. I've developed this visual primer with quite a lot of help (see credits at the end), and it's still a work in progress, so let me know if you see anything misplaced or missing, and I'll fix it.

Without further ado, here's what you came here for (and further explanation after the visual).

Note: RDBMSs (MySQL, Postgres, etc) are only featured here for comparison purposes. Also, some of these systems can vary their features by configuration (I use the default configuration here, but will try to delve into others later).

As you can see, there are three primary concerns you must balance when choosing a data management system: consistency, availability, and partition tolerance.
  • Consistency means that each client always has the same view of the data.
  • Availability means that all clients can always read and write.
  • Partition tolerance means that the system works well across physical network partitions.

According to the CAP Theorem, you can only pick two. So how does this all relate to NoSQL systems?

One of the primary goals of NoSQL systems is to bolster horizontal scalability. To scale horizontally, you need strong network partition tolerance which requires giving up either consistency or availability. NoSQL systems typically accomplish this by relaxing relational abilities and/or loosening transactional semantics.

In addition to CAP configurations, another significant way data management systems vary is by the data model they use: relational, key-value, column-oriented, or document-oriented (there are others, but these are the main ones).
  • Relational systems are the databases we've been using for a while now. RDBMSs and systems that support ACIDity and joins are considered relational.
  • Key-value systems basically support get, put, and delete operations based on a primary key.
  • Column-oriented systems still use tables but have no joins (joins must be handled within your application). Obviously, they store data by column as opposed to traditional row-oriented databases. This makes aggregations much easier.
  • Document-oriented systems store structured "documents" such as JSON or XML but have no joins (joins must be handled within your application). It's very easy to map data from object-oriented software to these systems.

Now for the particulars of each CAP configuration and the systems that use each configuration:

Consistent, Available (CA) Systems have trouble with partitions and typically deal with it with replication. Examples of CA systems include:
  • Traditional RDBMSs like Postgres, MySQL, etc (relational)
  • Vertica (column-oriented)
  • Aster Data (relational)
  • Greenplum (relational)

Consistent, Partition-Tolerant (CP) Systems have trouble with availability while keeping data consistent across partitioned nodes. Examples of CP systems include:

Available, Partition-Tolerant (AP) Systems achieve "eventual consistency" through replication and verification. Examples of AP systems include:

Self promotion and Credits

  • If you're a developer and looking for a job or if you're hiring developers and these data systems are important to you, consider coming to Hirelite: Speed Dating for the Hiring Process on Tuesday.
  • This guide draws heavily from a recent Ruby meetup (by Matthew Jording and Michael Bryzek) and a recent MongoDB presentation (given by Dwight Merriman).
  • Thanks to DBNess and ansonism for their help with validating system categorizations.
  • Thanks to those who helped shape the post after it was written: Stan, Dwight, and others who commented here and on this Hacker News thread.

Update: Here's a print version of the Visual Guide To NoSQL Systems if you need one quickly (warning: it's not all that pretty and I may not keep it updated, but as of 3/17/2010, it's current).