The startup that I’ve been working at for the last year, Sumo Logic, has now launched its product! Our product is a service for gathering, searching, and analyzing logs: if you have software that’s generating log files, you point our collector at those files and it will upload them to our service, at which point you can slice and dice them however you want. You can do that with logs from one program or one machine, but you can also do that for logs from hundreds or thousands of machines: we’ll happily accept whatever you throw at us.
I’ve been working on distributed systems for a while; in particular, StreamStar was a distributed system of heterogeneous software running on heterogeneous machines. And when you’re working with a distributed system, surprises are going to happen: I love my unit tests, but when you’re pushing large amounts of data while wanting to meet tight performance limits, every once in a while a piece of data won’t be where you expect it to be. And, when that happens, you need to piece together a timeline to understand and learn from the event; getting logs from all your different components and putting together a story from all of them is the way to do that.
But you won’t be able to put together a story if you don’t have a lot of logs; the flip side, though, is that you need to be able to track a single event across those logs from different machines without being overwhelmed by all the other events that are in them. So you need to deal with a lot of data, search within it to focus on a single event, and pop back out when the need arises to gather more information and test a hypothesis. We tried to do that on StreamStar, but it was hard, and the log volume was overwhelming; Sumo Logic is also a distributed system and hence is vulnerable to the same problem, but the difference is that we can use our own product to analyze what’s going on within it! Which is awesome.
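To make the "track a single event across machines" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the host names, the log format, and the `req=` identifier are invented, and this is not how Sumo Logic or StreamStar does it. It just shows the shape of the task: pull the lines for one event out of every machine’s logs and order them by time.

```python
from datetime import datetime

# Hypothetical log lines from three machines. The host names, the line
# format, and the "req=..." identifier are all made up for this sketch.
logs = {
    "ingest-1": [
        "2012-01-23T10:00:01 req=a17 received chunk",
        "2012-01-23T10:00:02 req=b42 received chunk",
    ],
    "index-3": [
        "2012-01-23T10:00:03 req=b42 indexed chunk",
        "2012-01-23T10:00:05 req=a17 indexed chunk",
    ],
    "store-2": [
        "2012-01-23T10:00:03 req=a17 wrote chunk",
    ],
}


def timeline(logs_by_host, event_id):
    """Collect every line mentioning event_id and sort the result by timestamp."""
    events = []
    for host, lines in logs_by_host.items():
        for line in lines:
            # Assumes each line starts with an ISO timestamp, then a space.
            stamp, rest = line.split(" ", 1)
            if f"req={event_id}" in rest.split():
                events.append((datetime.fromisoformat(stamp), host, rest))
    return sorted(events)


for when, host, message in timeline(logs, "a17"):
    print(when.isoformat(), host, message)
```

With three machines and five lines this is trivial; the hard part on a real system is doing the same thing when the volume is far too large to hold in a list.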
After working on StreamStar, I joined Playdom, working on their business intelligence team. There, we had to deal with logs for a different reason: instead of understanding what the different components of our own software were doing, we needed to understand what our players were doing. We needed to understand what drew players into our games, how long they stayed, what they spent money on.
We had a very good set of homegrown tools written by some extremely talented engineers. The problem was, though, that people on game teams would ask us quite natural questions that we couldn’t answer, because the homegrown tools had to be focused on doing specific types of analysis on a handful of prebaked log types to be able to perform well. As I moved out of Playdom’s business intelligence team, they’d just begun overhauling their log infrastructure to be able to do a wider range of analysis (though, I think, still with prebaked log types?); Sumo Logic’s tools, however, will accept whatever lines of text you throw at them, and let you search, parse, and analyze those lines. No need to spend years of engineering effort to get that benefit (years that a startup can’t afford to spend!): just stick log lines into your software, install a collector, and start querying away.
That’s my background; the Sumo Logic founders and some of the other early employees come from a different space, however, namely security. And that means that we’re quite happy to accept and analyze log files generated by third-party software (firewalls, routers, web servers), not just logs from software that you wrote yourself. In that context, you’re trying to figure out how your systems are being used, and whether and how they’re being misused.
That sort of analysis sometimes looks like the distributed system analysis that I mentioned in my StreamStar example: if there’s a specific security breach that you’re trying to track across systems, it can look a lot like tracking down an anomaly in a distributed system. But there’s also a different sort of analysis, where you’re trying to detect a statistical signal of malicious behavior out of a sea of normal behavior.
The tools that we’re developing for that are rather fascinating. The first one to be released is the “summarize” operator: after using search to pick out a general class of logs, pipe the result through summarize, and Sumo Logic will cluster them for you. You can drill into clusters, teach the system what clusters are interesting and what clusters are expected, and in general work on teasing a signal out of what seems like noise. Useful for unexpected security events; but it’s also useful to just run a scheduled search every hour or every day where you throw all your warning and error logs at summarize to learn how your system’s behavior changes from day to day. (And, believe me, running in AWS, your system’s behavior will change from day to day…)
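To give a feel for what clustering log lines means, here is a toy sketch in Python. It is emphatically not the algorithm behind summarize, which I’m not describing here; it swaps in a much cruder technique (replace every number with a placeholder and group lines that then look identical), and the sample log lines are invented.

```python
import re
from collections import defaultdict

# Invented warning/error lines, standing in for the output of a search.
sample = [
    "WARN timeout after 30 ms talking to host 12",
    "WARN timeout after 45 ms talking to host 7",
    "ERROR disk full on volume 3",
    "WARN timeout after 31 ms talking to host 12",
]


def signature(line):
    """Mask runs of digits so lines differing only in numbers look the same."""
    return re.sub(r"\d+", "<n>", line)


def cluster(lines):
    """Group lines by signature, largest group first."""
    groups = defaultdict(list)
    for line in lines:
        groups[signature(line)].append(line)
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


for sig, members in cluster(sample):
    print(len(members), sig)
```

Even this crude version shows the appeal: thousands of lines collapse into a handful of patterns, and a pattern that shows up today but didn’t yesterday is the thing worth looking at.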
I mentioned my coworkers above: the thing that sealed the deal with me to join Sumo Logic was meeting all the employees when I interviewed (I ended up becoming the tenth employee) and realizing that I would actively enjoy working with every one of them. Normally, when interviewing even with a good company, there are some people whom I’m looking forward to working with, some I’m indifferent about, some I’m a little unsure about, and then there are all the people whom the people in charge of hiring don’t even trust to throw in front of job candidates. Not so with Sumo Logic: I learn something from Christian and Kumar, the cofounders, every time I talk to them; the other early hires are extremely sharp as well; and we’ve kept a quite high caliber as we’ve (slowly!) expanded since then.
So: if you’re writing a service, want to add logs to understand your software’s behavior and/or your users’ behavior, but don’t want to manage those logs, take a look at us! Or if you’re running lots of servers and want to keep track of what they’re doing, we can help with that, too! You can get a free demo account if you want to play around with the product on canned data, or sign up for a free trial if you want to feed in your own data.
Or, if you’re a programmer who likes to work with large amounts of data or distributed systems or is curious about Scala, we’re hiring! (We’re hiring for non-engineering positions, too.) I had my one-year anniversary a couple of weeks ago; it’s been a great year, and I’m looking forward to many more great years.
This post has not been revised since publication.