What may become the ultimate cloud-based repository of factual information:
The Factual website:
Just the Facts. Yes, All of Them.
By QUENTIN HARDY
New York Times
March 24, 2012
BEVERLY HILLS, Calif.
AT 7 years old, Gilad Elbaz wrote, “I want to be a rich mathematician and very smart.” That, he figured, would help him “discover things like time machines, robots and machines that can answer any question.”
In the 34 years since, Mr. Elbaz has accomplished big chunks of these goals. He has built Web-traversing software robots and answered some very big questions for Google, along the way becoming a millionaire several hundred times over.
His time-machine plans, however, have been ditched for something he finds more important: trying to identify every fact in the world, and to hold them all in a company he calls Factual.
“The world is one big data problem,” Mr. Elbaz says from his headquarters, a quiet office 14 floors above the Los Angeles Country Club. He is a slim, soft-spoken man who weaves in his chair when an idea excites him. “What if you could spot any error, as soon as you wrote it? Factual is definitely a new thing that will change business, and a valuable new tool for computing.”
In the booming world of Big Data, where once-unimaginably huge amounts of information are scoured for world-changing discoveries, Mr. Elbaz may be the most influential inventor and investor. Besides Factual, he has interests in 30 start-ups, including an incubator in San Francisco dedicated to Big Data. Factual’s headquarters, in a high-rise on the Avenue of the Stars, hosts seminars for a data community he hopes to foster in the Los Angeles area.
Mr. Elbaz also serves on the boards of the California Institute of Technology, his alma mater, and the X Prize Foundation, which offers cash prizes to teams that meet challenges in space flight, medicine and genomics. The company he sold to Google, Applied Semantics, is the basis of Google’s AdSense business, which brings Google close to $10 billion in revenue annually.
While valued for his investments and guidance, Mr. Elbaz remains relatively little-known. He is so self-effacing that he recently walked through a conference of 3,000 data scientists, recognized only by the staff members of one of his investments. He lives quietly with his wife, a former federal prosecutor, and his three children in a modest ranch house in West Hollywood. For fun, he plays basketball at a local sports club.
His mental and financial assets, he says, are like gifts he needs to deploy so the world works better.
“If all data was clear, a lot fewer people would subtract value from the world,” he says. “A lot more people would add value.”
Creating clear, reliable data could also make Factual a very big company.
“Gil is pretty far ahead of the rest of us, the one entrepreneur where it takes a few meetings before I really understand everything he is talking about,” says Ben Horowitz, a venture capitalist who backed Factual through his firm, Andreessen Horowitz. “Three years ago, he thought Factual was his biggest chance to change the world. Over time, the world has moved his way.”
Since its start in 2008, Factual has absorbed what Mr. Elbaz terms “many billions of individual facts we’ve collated.”
Geared to both big companies and smaller software developers, it includes available government data, terabytes of corporate data and information on 60 million places in 50 countries, each described by 17 to 40 attributes. Factual knows more than 800,000 restaurants in 30 different ways, including location, ownership and ratings by diners and health boards. It also contains information on half a billion Web pages, a list of America’s high schools and data on the offices, specialties and insurance preferences of 1.8 million United States health care professionals. There are also listings of 14,000 wine grape varietals, of military aircraft accidents from 1950 to 1974, and of body masses of major celebrities. Odd facts matter too, Mr. Elbaz notes.
He keeps 500 terabytes of storage near Factual’s headquarters. That’s about twice the amount needed to hold the entire Library of Congress. He has more data stored inside Amazon’s giant cloud of computers. His statisticians have cleaned and corrected data to account for things like how different health departments score sanitation, whether the term “middle school” means two years or three in a particular town, and whether there were revisions between an original piece of data and its duplicate.
Factual’s plan, outlined in a big orange room with a few tables and walled with whiteboards, is to build the world’s chief reference point for thousands of interconnected supercomputing clouds. The digital world is expected to hold a collective 2.7 zettabytes of data by year-end, an amount roughly equivalent to 700 billion DVDs. Factual, which now has 50 employees, could prove immensely valuable as this world grows and these databases begin to interact.
FACTUAL sells data to corporations and independent software developers on a sliding scale, based on how much the information is used. Small data feeds for things like prototypes are free; contracts with its biggest customers run into the millions. Sometimes, Factual trades data with other companies, building its resources.
Some current uses are for adding information like restaurant locations to cellphone maps, or for planning sales campaigns. But more broadly, Factual is meant for the heart of a great business of our age: using all the cloud-based data and algorithms to find patterns in nature and society, for scientists to observe and businesses to exploit.
“Data has always been seen as just a side effect in computing, something you look up while you are doing work,” Mr. Elbaz says. “We see it as a whole separate layer that everyone is going to have to tap into, data you want to solve a problem, but that you might not have yourself, and completely reliable.”
A restaurant chain, for example, might use Factual to figure out whether a new location is near the competition, and how the locals have talked about the place on Yelp, the social ratings site. Checking for gas stations near the restaurant can indicate how many cars come off the highway. The chain can also employ Factual to see where it is mentioned on the Web, or to correct what other people are saying about it.
Financed with $27 million by a constellation of Silicon Valley luminaries, Factual remains closely held. But it already has thousands of customers. Facebook, CitySearch, AT&T and others use it for information about places. Newsweek used the database to help rank America’s greenest companies.
Others use Factual data for tasks like product planning and customer care. There are no profits yet, as Mr. Elbaz puts money into more data sets and talent, which already includes advanced mathematicians, data scientists from LinkedIn and Google, and at least one specialist in late Roman archaeology.
Competitors in the new industry include Microsoft, which says its Windows Azure Marketplace has “trillions of data points,” as well as a language translator. People can sell data sets to Azure, too. Infochimps offers geographic and social data, among other kinds, while companies like Gnip and Datasift offer insights from Twitter and other social sites. Wolfram Alpha, founded by another mathematician, has both data and computations that are used by Apple’s Siri, among others.
And a young company called ClearStory, also financed by Andreessen Horowitz, is trying to tie together all of these companies, often called data marts, in a way ordinary people can use. There are also several open-source data repositories, with public and private information that developers plug into their algorithms.
Several other data specialists, mostly from Google, have left their jobs to wrangle lots of information in new ways. David Friedberg, a former product manager at Google, has started the Climate Corporation, which uses government data on weather, soil porosity and the root structures of wheat and soybeans to write crop insurance.
Mr. Elbaz is also an investor in Kaggle, which awards cash for finding data patterns. It was used by NASA, for example, to find a better way to measure the shape of galaxies; in the first week of competition, a Ph.D. student in glacier mapping had outperformed NASA’s algorithms. He has also put money into ZestCash, which makes payday loans that are cheaper than the industry’s average, judging risk via criteria like cellphone bills and how its applicants read the ZestCash Web site.
The ZestCash C.E.O., Douglas Merrill, once ran Google’s internal information systems.
“We feel like all data is credit data, we just don’t know how to use it yet,” he says. “This is the math we all learned at Google. A page was important for what was on it, but also for how good the grammar was, what the type font was, when it was created or edited. Everything. What Gil is doing at Factual is the same. Data matters. More data is always better.”
MR. ELBAZ was born in Washington, D.C., and grew up in Ohio, Texas and Florida. His father, who was born in Morocco and grew up in Israel, was a school principal and professor of Hebrew literature. His mother, a journalist, died when Mr. Elbaz was 18. At age 3, he began writing a repeating series of numbers at preschool. He read almanacs and enjoyed watching the crawl of stock prices on TV, seeking patterns.
“He would go to a lot of math competitions, and come out with three or four prizes,” says Nissim Elbaz, Mr. Elbaz’s father. “In between the math contests he’d take tests in physics for fun. When I would tell him what a genius he was, he’d give me a dirty look, so I learned to keep it in my heart.”
The elder Mr. Elbaz recalls that when he tried to explain the Israeli-Palestinian conflict to him, the son replied that the hatred would end if the two sides could just agree on the facts.
From an early age, Mr. Elbaz would also figure out math-related businesses — like buying the entire supply of a single brand of baseball cards in El Paso, Tex., then reselling them at three times the money at a memorabilia convention.
“We’d do lotteries based on guessing the number of marbles in a jar,” says Eytan Elbaz, his younger brother, who has worked with Gil and now has two start-ups of his own. “When he was 16, he held a contest based on rolling a Yahtzee dice. He stayed up the night before making a spreadsheet to figure out all the payouts against what we’d take in.” Mr. Elbaz’s other brother, Noam, has spent the last decade studying at a yeshiva in Israel.
At Caltech, Mr. Elbaz majored in applied science and economics. Interested in the subject of monopolies, he won an award for a paper that determined that companies would take financial losses to corner their markets.
He worked for I.B.M. for two years, looking at the use of computers in problems of manufacturing, then went to Sybase, a database company. This was in the early 1990s, when I.B.M. was stumbling in the transition from mainframe computers to servers and PCs.
His younger brother says he thinks that the experience changed him. Many employees were “just trying to hold on to their jobs, not working together for the company,” Eytan says. He recalls how Gil, concerned about how employees were hoarding their data, “started talking about how much better it would be if people shared data.”
Mr. Elbaz then joined a semiconductor start-up called Microunity and became a consultant, saving money and playing the stock market to help finance his own first business. His father gave him $10,000 to invest for him, which Mr. Elbaz tripled in 18 months. When Mr. Elbaz and a Caltech friend decided to form a company in 1998 — it became Applied Semantics — his father told him to put the stock winnings into it.
Applied Semantics software quickly scanned thousands of Web pages for their meaning. By parsing content, it could tell businesses what kind of ads would work well on a particular page. It had 45 employees and was profitable when Google acquired it in 2003 for $102 million in cash and pre-I.P.O. stock.
While Mr. Elbaz would not say how much he made from the deal, his father’s $30,000 from the stock investments was eventually worth $18 million. “He certainly changed my retirement,” Nissim Elbaz says.
Mr. Elbaz became the head of Google’s engineering office in Santa Monica, Calif., near where he lives with his wife, Elyssa, and three sons. He has donated several million dollars to a handful of causes, including science education, environmental efforts and an organization that helps Los Angeles nonprofit groups. He has also donated to Common Crawl, a Google-type Web examination tool that researchers can access through Amazon’s computers.
“Having money is overrated when you are brought up not to believe you are entitled to it,” he says. “You can make enough money to not need things, or you can just not need things.”
In 2007, Mr. Elbaz left Google to start Factual. When Mr. Horowitz, who along with Marc Andreessen runs Andreessen Horowitz, was asked to invest in Factual in 2009, he struggled with the idea that Mr. Elbaz would want to work hard on another start-up when he was already rich. But when Mr. Elbaz described his palace of facts, Mr. Horowitz said he recognized a true believer.
“I asked him, ‘Aren’t you too rich to build this company?’ ” Mr. Horowitz recalls. “He gave me one of the longest and most thought-out answers I’d ever heard. He thinks this is a chance to change the world — that matters to him more than money.” Mr. Horowitz says Mr. Elbaz told him he needed the money as an incentive for the engineers, and that he needed to reach his goal while his mind was still strong enough.
“I eventually realized this was not a ‘too rich to work hard’ problem,” Mr. Horowitz said.
Other investors in Factual include Ron Conway, Esther Dyson, Index Ventures and the Founder Collective.
FACTUAL also has offices in Shanghai and in Palo Alto, Calif., where Mr. Elbaz wants to add more talent from Silicon Valley. His first two employees in Palo Alto were Tim Chklovski, with a doctorate from M.I.T. in artificial intelligence, and Tyler Bell, who worked at Yahoo on maps after a decade at Oxford piecing together how ancient Rome became early Europe.
“Roman amphorae get dug up around Europe, and get described in all sorts of different ways, in different languages,” Mr. Bell says. “What we do here is the same kind of restoration. We started to learn it well when we had one million data sets uploaded. The early stuff was things like senators and how they voted, or ZIP codes, types of cigars, lists of video games. Merging ZIP codes is easy. Merging different ways people feel about toys is hard.”
Part of the difficulty, even among employees, is deciding how much data is enough. “For sure, we want the correct name and location of every gas station on the globe,” Mr. Bell says. “Not the price changes at every station.”
“Wait a minute, I’d like to know every gallon of gasoline that flows around the world,” Mr. Chklovski cuts in. “That might take us 20 years, but it would be interesting.”
At most start-ups, talk about doing the same kind of thing, only bigger and better, 20 years from now might seem like a marriage of the delusional and the dull. Mr. Elbaz and his team, however, say they feel that it makes sense. Telling everyone the true facts of the world is at least the work of a lifetime.
“Lately, I’ve been thinking that we need to get more personal data,” Mr. Elbaz says. He doesn’t mean names and addresses, but people’s genetic information, what they ate, when and where they exercised — ideally, for everyone on the planet, now and forever. “I want to figure out a way,” he says, “to get people to leave their data to science.”