Hopping on the Big Data bandwagon
In 13 years we have gone from 100,000 websites to more than 180 million, with more than 22 billion pages (Google indexes about 15.5 billion of them, maybe more). But this is mostly static data. The pages are made and are then indexed. With the emergence of the real-time web, however, we are creating far more data than the growth in the number of pages on the internet alone would suggest. And the data changes constantly. It is a stream, as John Borthwick so elegantly describes. It’s harder to store, index, access, back up, guard against (and restore from) failure, and analyze. Measurements of the amount of data being created are hard to come by. We know Facebook stores and analyzes petabytes of data, as their 300M users log 8 billion minutes per day on the site. Twitter, with only about 70M users so far, generates more than 1M tweets per hour and more than 27M tweets per day, and they are just getting started.
Data is more than just our explicit posts, tweets and photos, however. Just by carrying a mobile phone around, we generate thousands of discrete data events a day (phone checking in with tower, etc.). By surfing the web, we each generate thousands of additional data events a day (cookie written, ad served, ad clicked, page requested, cookie read, logged in somewhere, searched, etc.). Throughout our travels in the interactive ecosystem, various parties are logging this data. And this is where the problems emerge. Logging all of it is getting harder to do. Much harder.
Vulnerabilities surround us. Gmail, Twitter, Microsoft/Danger’s Sidekick cloud service, Amazon S3 and EC2 have all had outages. We hear about these services when they fail, even for an hour or two. Most of the failures are due to the sheer size and complexity of the undertaking.
What’s happening here? Well, the tools and methods we have used since 1995 to manage web infrastructure are breaking down at this new scale. Some parts of the chain distribute quite nicely when we throw more iron at the problem, like Apache and other web servers. And application servers generally follow suit. But two main areas, in my mind, are particularly vulnerable to the challenges of scaling: the database and the methods we use to traverse data.
Database
Scaling the database has always been a challenge. Oracle offers RAC and other clustering solutions which can achieve scale but often not the performance needed for low-latency, high-read applications. These are very expensive and have their own set of issues. But more generally, SQL databases and other relational database solutions really don’t scale horizontally, transparently to the application, and don’t eliminate all “single points” of failure. Others have gone into more detail here. Open source alternatives to Oracle, like MySQL and Postgres, have not solved this problem.
I am not the first to recognize this problem. In fact, the NoSQL movement has been active for some time, developing alternate non-relational database solutions in the name of producing truly scalable, high-performance databases (while giving up some of the elegant features of relational databases). Key-value store non-relational projects like BigTable, CouchDB, HBase, Voldemort, Dynamite, Amazon’s Dynamo, Cassandra, and Mongo offer different approaches to the problem. There is something very interesting happening: Facebook, Digg, Google, Twitter and many other large-scale internet properties are adopting these non-traditional database solutions for various services within their architecture.
Traversing Data
Large sets of data are a challenge to analyze and process. Basic computer science brute-force algorithms break down (or at least perform sub-optimally) at these scales and are not optimized to work on a distributed basis. One key example is the search index. To deal with the enormous size of search indexes, Google developed MapReduce, which led Doug Cutting at Yahoo! to develop Hadoop, a Java framework to make it easy for data-intensive apps to work in a distributed manner. But these are not the only companies burdened by web-scale data problems. Some smart folks figured this out and formed Cloudera to commercialize Hadoop. I have been kicking myself for missing that investment opportunity. It is sure to be a winner. A number of our portfolio companies now use Cloudera’s Hadoop.
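To make the MapReduce idea a little more concrete, here is a minimal single-process sketch in Python that builds a tiny inverted search index with a map step and a reduce step. It is only an illustration of the pattern, not Hadoop's API or Google's implementation; the function names (map_doc, shuffle, reduce_postings, build_index) and the sample documents are made up for this example.

```python
from collections import defaultdict

def map_doc(doc_id, text):
    """Map step: emit one (word, doc_id) pair per distinct word in a document."""
    return [(word, doc_id) for word in sorted(set(text.lower().split()))]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_postings(word, doc_ids):
    """Reduce step: collapse the doc ids for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))

def build_index(docs):
    """Run map over every document, then reduce each word's group."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_doc(doc_id, text))
    return dict(reduce_postings(w, ids) for w, ids in shuffle(pairs).items())

# Hypothetical sample documents.
docs = {1: "big data is a stream", 2: "data is hard to index", 3: "index the stream"}
index = build_index(docs)
print(index["data"])    # [1, 2]
print(index["stream"])  # [1, 3]
```

The appeal at scale is that each map call touches only one document and each reduce call touches only one word, so in a real cluster both steps could be spread across many machines; this sketch runs them in one process purely for readability.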
So, big data is revealing itself to be a disruptive phenomenon. I haven’t really done the topic justice, so I would very much appreciate your point of view in the comments below. I am committed to digging in deeper here.
Some iPhone Gripes, Part 1
I love the iPhone. But it is a work in progress, let’s face it. I wanted to quickly list a bunch of my specific issues with iPhone in the hopes that someone at Apple is listening out there (crazy assumption, I know…). First post of several:
- When connectivity is limited and I choose to read/delete email, must you pop up seven different dialog boxes in rapid succession telling me “can’t get mail” and “can’t delete message”? Is that really relevant? Can’t you just queue my requests and actions and re-submit them once connectivity is re-established? Blackberry has been doing that for more than a decade. They even clue me in that a task is queued with a clock icon next to an unsent message.
- When I am on a call and you pop a calendar alert or a text message arrives, must you make me deal with the alert before you let me have access to the phone controls again? Why force me out of the context of the phone call? Why can’t I access things like “speaker” and “end call” before I have responded to a calendar alert or a newly arrived text message?
- Push email is not compatible with the battery capacity you have chosen. Turning on push (for Exchange servers) will kill the battery within 3-4 hours, rendering this feature completely unusable. Either fix it or warn users more prominently about this.
- Why must I enter my iTunes account password every time before downloading free apps?
- It takes 4-5 steps to delete a calendar event. Really? That’s the best UI you can come up with?
- New calendar events: can’t I please click the hour on which I want the new event scheduled before clicking the “+” button to create a new event? The interface for setting the time is so tedious; I should be able to tell you when I want the event scheduled with a simple gesture.
The Book Industry is in Trouble. But Piracy is Just a Symptom.
Randall Stross has an article in the NYT this morning suggesting that the book industry may soon get “Napsterized” — suffer the disastrous fate of the music industry, all because of piracy. This article performs revisionist history on the explicit actions of the music industry underlying its decline. Piracy has been a convenient culprit for media industries as their distribution shifts to digital, but it is not the only cause of their problems. It is largely a symptom of traditional market issues.
In the physical goods world, media companies maintain a monopoly over the distribution of their content. They control who can sell it, they price it at whatever level they deem appropriate and they determine if and when a consumer can buy it (think release windows). As all media goes digital, this monopoly quickly melts away. The content owner cannot control distribution (it’s too easy to copy a digital good) and as such they cannot control availability. When this control erodes, pricing pressure follows as consumers have a choice between buying and stealing.
The music industry, after the emergence of MP3 encoding in 1996, did not internalize this fundamental change. They believed they could maintain their monopoly on distribution by suing consumers who engaged in piracy, controlling release windows and limiting licenses to only a few digital outlets at the same prices as the physical goods. Consumers inherently knew that the digital good should be less expensive than the physical one (there are no hard good costs, after all) and demanded widespread access to digital downloads. It took the music industry seven long years until they broadly licensed Apple in 2003 with their full catalogs. In that time, consumers found the alternatives: Napster, Gnutella and BitTorrent. By the time iTunes took off, it was too late.
The book industry, with all this learning behind it, is making similar (but not identical) mistakes. They have licensed some of their catalog to a few eBook retailers. But there are still millions of titles not available for legitimate download. In addition, they have tried to hold pricing for the eBook at the same level as the physical book. Jeff Bezos knows consumers expect to pay less. So he subsidizes the price of eBooks in order to get the price to about $10 a book. When free is a few clicks away, convenience rules. The publishers should flood the market with their entire catalogs and price them at dramatically low prices. There should be hundreds of places to buy them online. They should make it painfully easy to buy an eBook, even risking the cannibalization of their physical books. They need to make the legitimate good superior to the pirated one.
The digital future for all media companies is likely a smaller market with inferior economics than the monopoly physical one they enjoyed for decades. To survive in this new world will require lower cost structures. But the results of not embracing this future are clear: just ask the music industry.
The Impact of Social Media on the Enterprise
For decades, companies have been defining the channels their customers must use to contact them. Social media challenges the long-held notion that companies control the conversation. “We are available by phone weekdays from 9am until 4pm Eastern Standard Time” is quickly becoming a thing of the past. “We will attempt to answer the emails we receive within 48 hours, but times vary based on incoming volume” will be no more.
In a world where any customer can, in seconds, tweet or post to Facebook a pithy product review or share an experience they had with a brand, companies are forced to entirely rethink how they interact with their customers. Step one, probably the hardest step, is realizing they are no longer in control. The power of social media has empowered the consumer to reach literally hundreds or thousands of people in seconds. And because we know a consumer’s closest friends are three to five times more likely to share the same preferences for products and brands, this newfound power is not to be underestimated.
Sure, companies have Facebook pages and Twitter accounts. Yes, a few thousand companies are already searching Twitter for mentions and engaging customers. This is but a start. The real transformation happens when companies let go of the conversation and instead work to nurture it. The brands that give their customers tools to increase the amount of conversation, and that encourage those customers to discuss the pros and cons of their products, will be the winners who emerge from this disruptive time.
Companies like Get Satisfaction and UserVoice offer tools that change the balance of power between a company and its customers. Get Satisfaction has a fantastic manifesto, or “Company-Customer Pact” (http://getsatisfaction.com/ccpact), which defines a new relationship between a brand and its customers, encouraging public dialog, warts and all, but expecting productive discussion in return for the company’s helpful engagement.
While product forums from companies like Jive Software have been around for many years, I believe public conversations about brands will now be distributed in nature, spread across the web into thousands of tiny corners. The challenge for companies is figuring out how to manage this. A conversation could start with a tweet, be directed to a help forum, be responded to in email, updated in a blog post, and then broadcast on Facebook. How will this be tracked, measured and monitored? This market is ripe with opportunity for both brands and software platforms built to nurture the distributed web-wide conversation. And brands who are seen supporting a public dialog will engender more respect from their customers than those who turn a blind eye to it, or worse, try to shut it down. Ultimately, companies become more customer-centric from this disruption. I am sure United Airlines wishes they had just paid for the passenger’s guitar they broke now that the music video he recorded chronicling the ordeal spread virally and has been viewed more than five million times!
The company/customer relationship is but one relationship forever changed by social media. Similar transformations are happening between companies and their employees and companies and their vendors. New companies and tools will emerge to address these situations. At Venrock, we are looking for the entrepreneurs that are pioneering this space and embracing this opportunity. I would appreciate your point of view.
(This post appeared on Fast Company.)
Ignoring Market Signals
I am fascinated by the Hollywood studios’ war with Redbox. Today’s NYT has a nice overview and my friend Rich Greenfield at Pali Capital has been covering this for some time (registration required). Most of the attention is falling on two issues: (1) Hollywood hates the one dollar per day price point, and (2) also hates that Redbox breaks their windowing strategy by offering new releases as rentals immediately as the DVD hits the market.
Redbox’s CEO, Mitch Lowe, knows he has stumbled on a model to which millions of consumers are responding. There will be more than 22,000 Redbox kiosks by December in places like supermarkets and Wal-Mart stores. Their volume is sufficient to scare the studios. The studios’ fears? Cannibalization of DVD sales. Their argument? DVD sales are down 13.5 percent for the first half of 2009 over last year and some titles are selling 25 percent fewer copies than expected while rental income is up 8 percent (NYT, Digital Entertainment Group).
What baffles me is that the studios still think they are in control. The only reason this model exists, like Netflix, is because of the first sale doctrine. This section of copyright law makes it permissible for anyone who buys a copyrighted work to resell it. Therefore the studios can’t stop a wholesaler or retailer, to whom they have sold a DVD, from selling it to Redbox. And they can’t stop Redbox (or Netflix) from renting DVDs, even though they hate the practice. In the digital world, there is no first sale doctrine, and that’s why your choices of which movies to rent or buy online are terribly restricted and unreasonably priced. The studios set the terms, and no unapproved and unlicensed model can emerge.
Redbox, and Netflix before them, have found models that consumers love. They are based on low price points and high consumer convenience. Time and again we know consumers respond to these models. Consumers don’t respect windows and profit skimming (even though these are intelligent business models). In the digital world, consumers have too much choice to adhere to restrictions imposed by copyright owners. Why buy DVDs when you can download any number of the 65,000 apps in the iPhone app store? Why pay for a digital rental that expires in 24 hours when you can watch six simultaneous channels of the U.S. Open on DirecTV for no extra charge?
We now live in an attention economy. The studios haven’t yet learned that they are dramatically competing for our attention, not just our wallets. To be successful, they must look for market signals, and man is this a big one: consumers will rent more DVDs when you price them low, put them at locations where they already are, and offer the newest releases. The alternative? We’ll just do other things. That is, until the transition to the digital world is complete. Then most of these models will go away if the studios have their way. Then, like the music industry, piracy becomes a better choice and a superior good (no restrictions, low-price).
I have talked before about the need to read market signals. Note to studios: here is a big one. You should be excited, not scared.
Venrock: Shaping The Future, 40 Years of Innovation
I am very lucky to be a VC at a prestigious firm like Venrock with such a celebrated history. This year, Venrock turns 40. To commemorate that, we created a book detailing 40 Venrock entrepreneurs and the companies they built. It’s an inspiring read and motivates me to want to work with the highest caliber entrepreneurs and together create meaningful companies like these, which have, quite literally, shaped the future. Have a look, and please let me know your thoughts.
You can see the book in full-screen mode by clicking the full-screen button.
For a paper copy, or more info, go here.
Confusing Traction With Value
The recent distressed exit of iLike reminds us of the need to build real value in our startups if we hope to create lasting companies and wealth. I have seen cycles that last about 18 months or so in which traction gets substituted for value. Yes, we’ll sometimes see exits with big numbers attached to them during the peaks of these cycles. But only those companies that have built real value will sustain valuations during lean years and create long-term successful companies.
If we look over the past fifteen years of webtime, we see a few categories emerge where real value was created:
- Efficiency/Cost Reduction (DoubleClick, RedHat, PayPal, Craigslist)
- Monetizable audience (Yahoo!, Google, AOL)
- Repeat customer commerce (Amazon, eBay, Netflix)
- Solve a pain point (Checkpoint, Postini)
- Create new markets (EA, Google)
The aberrations occur when traction looks like value. When Slide was funded at a $500M pre-money valuation, that was an example of traction being confused for value. Sure, people posted their pictures using a Slide widget 150M times, but there was no value created. Slide did not have a real relationship with those customers and it was a stretch to believe that an ad model would emerge on top of those photo widgets. Similarly, iLike may have had its application installed by 50M users, but those “customers” were simply indicating a band or two that they liked. This is not valuable data and the audience was never monetized in any meaningful way. Another aberration was CBS’s purchase of Last.FM for $280M. Last.FM had lots of users using its free service (something like 15M). But supporting those users was expensive (bandwidth, music rights) and the users weren’t paying. CBS viewed it at the time as a play for audience, and that makes sense. But at that price, a more successful ad model would need to emerge to overcome the content and bandwidth costs. As soon as the ad market hiccuped, that price looked exorbitant, and indeed traffic has now declined and the business model remains unproven.
It bears noting that traction often precedes value. As VCs, we often look for early traction when vetting companies. But we also need to believe, and so should entrepreneurs, that the traction will result in the creation of an asset that has value. In Twitter’s case, I believe the collective shared links will prove to be enormously valuable. In addition, creating a platform where consumers willingly subscribe to brands or information sources also has value.
I think we have returned to a time where eyeballs don’t necessarily equate to value, and rightfully so.
A New Music Model? Perhaps. But not for VCs.
Brad Stone has a nice article in today’s NYT about Polyphonic, a new model for funding music artists and bands. The venture is exciting, largely because of the people involved. Terry McBride, in my opinion, is probably the most innovative businessperson in the music business and has a track record to demonstrate it. In the article, Brad Stone quotes me as casting doubt on the appropriateness of this model for venture investors. To demonstrate why, we need to make a bunch of assumptions, but I will make those assumptions as reasonable as possible:
The Polyphonic model, according to the article, will take $20M and invest $300K chunks into bands. Let’s assume that is structured as an advance against a 50/50 split on all revenue the band produces (publishing, touring, merch, recorded music sales and other licensing.) After fees to operate the business (let’s say $2M over three years), on average, Polyphonic would make 20 investments (i.e., fund 20 bands) a year over three years, or 60 bands. This assumes Polyphonic does not provide any follow-on funding to the bands. That is, on $300K, the bands must pay themselves, pay for marketing activities, tour support, record an album, seek distribution, and generally fund their infrastructure. (While possible, this is thin.)
How much revenue can a band produce? Let’s break that down. Out of the 105,575 new releases in 2008, just 1,515 sold more than 1,000 units. Polyphonic likely expects these 60 acts to perform as a portfolio. Some will bomb, some will do well, and a few will do really well. (VCs manage investments this way too.) Putting some numbers around that, let’s assume 10% sell meaningless amounts, 50% do okay (under 20,000 units), 35% sell well (45,000 units) and 5% do exceptionally well (more than 100K units). Let’s say one is a huge breakout and sells 1M units. And let’s assume a 1x multiple on the sales revenue for merch, publishing, touring and other income (i.e., doubling).
As the chart below shows (at $7 royalty a record ($10 ASP – 30% retail margin) and assuming the band can double its recorded music revenues through touring, publishing, merch, etc.), the total to an entity like Polyphonic would be just under $27M on a $20M investment, or about a 35% cash on cash return. Might beat the S&P over three years, but probably not.

You can see where the sensitivities are: if the major breakout hit sells 2M units, the take is more like $46M. If you assume no breakout hits, but all 60 acts sell 45,000 units, the return is about $28M. If you assume every band can triple its revenues beyond its recorded music sales, then you can produce about $46M on $20M, which is pretty decent.
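For readers who want to play with these sensitivities, here is a rough Python sketch of the portfolio arithmetic. It rests on assumptions of my own that the post does not spell out: the funder first recoups its $300K advance from each band's revenue and only then splits 50/50, the "okay" acts sell exactly 20,000 units, the "exceptional" acts sell exactly 100,000, and the 1M-unit breakout is one of those exceptional acts. The names funder_take and portfolio_take are hypothetical, and since the chart's exact inputs are not given, the output need not match every figure quoted above.

```python
ADVANCE = 300_000        # per-band investment, from the post
ROYALTY_PER_UNIT = 7.0   # $10 ASP less 30% retail margin, from the post

def funder_take(units, other_income_multiple=1.0):
    """Funder's cash from one band.

    Assumption: the advance is recouped first, then revenue is split 50/50.
    other_income_multiple=1.0 means touring, merch, etc. double recorded revenue.
    """
    revenue = units * ROYALTY_PER_UNIT * (1 + other_income_multiple)
    recouped = min(revenue, ADVANCE)
    return recouped + 0.5 * (revenue - recouped)

def portfolio_take(unit_sales, other_income_multiple=1.0):
    """Total funder cash across a list of per-band unit sales."""
    return sum(funder_take(u, other_income_multiple) for u in unit_sales)

# Assumed base case for 60 bands: 10% sell nothing, 50% sell 20,000,
# 35% sell 45,000, and 5% are exceptional (two at 100,000, one 1M breakout).
base = [0] * 6 + [20_000] * 30 + [45_000] * 21 + [100_000] * 2 + [1_000_000]
# No breakouts: every act sells 45,000 units.
flat = [45_000] * 60

print(f"base case: ${portfolio_take(base) / 1e6:.1f}M")  # base case: $27.0M
print(f"flat case: ${portfolio_take(flat) / 1e6:.1f}M")  # flat case: $27.9M
```

Under these assumptions the base and flat cases land close to the roughly $27M and $28M figures in the text. Changing the unit list or other_income_multiple shows how heavily the outcome leans on one or two hits.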
The other challenge with this model is that the funder owns nothing more than a cash flow interest in the band. The article notes that the artists still own their masters and all rights. I applaud that construct as I think that is the right new model with the right incentives. However, it means the multiple applied to a company like Polyphonic, which is building no long-term equity value, is less than that paid on a company building longer-term equity value. In other words, it is hard to sell a company like Polyphonic for a big multiple.
VC economics, however, look very different. If a VC put $20M into 10 deals at $2M each (let’s say each investment buys 25% of the company) and just one company is sold for $200M, assuming all other deals fail (unlikely), the VC has produced $50M on $20M. If two companies are sold for $200M, the VC has produced $100M on $20M. (This analysis is greatly simplified just to illustrate the point.)
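The same simplified VC arithmetic can be written as a few lines of Python. This is only a sketch of the numbers in the paragraph above: the function name vc_proceeds is hypothetical, and it ignores dilution, fees and follow-on rounds, just as the text does.

```python
def vc_proceeds(exit_values, ownership=0.25):
    """Proceeds to the fund: its ownership stake times each company's exit value."""
    return sum(ownership * value for value in exit_values)

# Ten $2M deals, each buying 25%; failed companies exit at zero.
one_exit = [200e6] + [0] * 9
two_exits = [200e6] * 2 + [0] * 8

print(vc_proceeds(one_exit) / 1e6)   # 50.0
print(vc_proceeds(two_exits) / 1e6)  # 100.0
```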
Let me be clear: I really like what Polyphonic is doing and am hugely supportive of their model and direction. I want alternative models like this to exist, I want artists to prosper, and I want artists to be in control of their own destiny, surrounded by smart people who understand how to market in the new digital world. I just don’t think these models can produce a venture return. There are plenty of other sources of capital for these models, and I am excited to see them get funded and take off. Hats off to Terry McBride and the team at Polyphonic.