All Systems Go: The Newly Emerging Infrastructure to Support Free Books

by Ben Crowell       http://www.lightandmatter.com/article/infrastructure.html

 

With the cost of college textbooks up 62% over the last decade,[1] pressure is building for an alternative model of publishing: the free book. Five years ago, an author had to be very persistent --- maybe even a little crazy --- to try the new approach. But now a whole new infrastructure is springing up to make it easier.

 

This article is copyright 2005 by Benjamin Crowell, and is available under the Creative Commons Attribution-ShareAlike license (or, at your option, under the GNU Free Documentation License version 1.2, with no invariant sections, no front-cover texts, and no back-cover texts).

You mean like Wikipedia?

Five years ago, people looked at me funny when I expressed my enthusiasm for free books. You mean like Project Gutenberg? Downloading Hamlet for free? No no, I explained, I was talking about books intentionally set free by their authors. Huh? You know, like Linux. Open-source books. You mean like that party game where you sit in a circle, and everybody takes turns making up the next part of the story? No no no, I'd explain, serious tomes on weighty subjects: calculus, Proust, cell biology. This would be received with a look of pity, or maybe a hint that I might need to talk to a mental health professional.

These days the response is, "Oh, you mean like Wikipedia?"

That's progress. In 1950, science fiction author Robert Heinlein (of Stranger in a Strange Land fame) was asked by the editor of a science fiction magazine to make serious predictions of what the world would be like in the year 2000.[2] At that time, Heinlein's fictional depictions of 2000 featured a lot of flying cars, but in his nonfiction article he did say some sensible things about how progress actually works. Progress is exponential. Over the short term, an exponential curve doesn't look very impressive, and if you extrapolate it linearly, you won't think anything all that exciting is going to happen. But in the long term, the curve goes ballistic. Johann Gutenberg thought his printing press would allow people who weren't quite so rich to have their own Bibles. He would never have imagined the public library, much less the internet.

Giving it away isn't free.

The most notable recent progress in the world of free books is that a kind of infrastructure is starting to come together to support them. This article is the third in an intermittent series. In the first one,[3] written in 2000, a big barrier I pointed out was the lack of appropriate open-source software for desktop publishing: you have to eat your own dog food, and there was no way that open-source books would ever get off the ground if they had to be produced using an array of expensive and mutually incompatible closed-source software. But since then I've learned the hard way that software wasn't the only barrier. Although my second article,[4] in 2002, was wildly optimistic, I was finding that my own physics textbooks, which I'd tried to set free in the wild, were acting like twenty-somethings who couldn't afford to move out on their own because their jobs at WalMart were just barely paying for their car insurance. Professors at other colleges were adopting my books, which was exciting, but those professors wanted printed copies for themselves and their students. One day I woke up and found myself running a small business --- and it wasn't just small, it was small, inefficient, time-consuming, and unprofitable. I had forgotten the flip side of Heinlein's dictum of exponential progress: although we tend to underestimate for the far future, we also tend to overestimate for the near future. Flying cars didn't happen by 2000, because, well, for one thing you couldn't yet find a service station that sold plutonium. The infrastructure wasn't there yet. I think the infrastructure for free books is only now starting to be built, which is why, although there are now over 1000 free books listed on a web site I run that catalogs them,[5] those thousand books have still had relatively little impact.

One of those thousand books is Wikipedia, and although it's atypical in many ways, Wikipedia is a particularly dramatic illustration of one of the infrastructural problems I'm talking about: bandwidth. The Wikimedia Foundation depends on an international server farm consisting of over a hundred machines.[6] On a smaller scale, bandwidth is going to be an issue for anyone serving up a popular, illustrated book. As my own books have been downloaded more and more, my webhosting costs kept escalating, eventually reaching $100 a month.[7] The authors of one free organic chemistry textbook[8] have gone so far as to throttle back the load on their server by requiring prospective readers to register and give an e-mail address before they can access the book. The problem is that many of those prospective readers will balk, and that will keep the book from gaining mind-share --- college professors are used to being courted assiduously by book reps, and to getting so many unsolicited books sent to them by publishers that they can use them as doorstops.

I've also learned what a capital-intensive business print publishing is. With traditional printing technology, the unit cost goes down dramatically as the length of the press run increases, and that economic imperative meant that I soon had about $10,000 invested in a closet full of books. Meanwhile, my wife, who handles the family finances, was warning me that I was losing money. Webhosting, printing, and advertising were adding up to more than I was bringing in.

The new infrastructure

The whole business of free books was harder than I'd thought it would be, so hard that it probably would have deterred most prospective authors if they'd known what they were getting into. One of the lessons of Wikipedia's success is that you have to make things easy for authors; its philosophy of instant gratification was the reason it did so well where its stuffier, slower-moving predecessor Nupedia had failed. One item of good news is that within the last few years, it's become possible for someone who isn't an ubergeek to create an illustrated textbook using open-source software. Scribus, a GUI desktop publishing application, makes it easy to do a book with a complex visual layout, something that could previously be done using LaTeX, but only with the sacrifice of many goats. Inkscape, a 2003 fork of the open-source illustration program Sodipodi, has made rapid progress, and is now in the same league as Adobe Illustrator. PDF, long treated with suspicion by the free software community, has emerged as a lingua franca for online books, and there is now a hefty toolchain for creating and working on PDF files, including Scribus, Inkscape, and a long-awaited 1.x release of pdfTeX, as well as many smaller utilities such as pdftk and pdfripimage. Color management had for a long time been a shortcoming of Linux compared to Windows and MacOS, partly due to patents, but is now incorporated into a few Linux applications such as Scribus via the littlecms color management system. OpenOffice's ability to read Microsoft Word documents has also helped to alleviate the former problem of authors finding that once they had written something in Word format, they were forever locked into proprietary software.

One justification offered by publishers for the astronomical prices of college textbooks is the high cost of permissions fees for materials such as photographs, artwork, or anthologized text. Since September 2004, the free information community has had a weapon in its arsenal that's unavailable to traditional publishers: Wikimedia Commons, a repository containing hundreds of thousands of photos.[9] Since most of the images are under copyleft licenses, they can be used in copylefted free books, but not in traditional copyrighted books. Although five years ago many of us who were interested in free books envisioned the sharing of copylefted words, in fact it looks like the commonest currency of collaboration is turning out to be not text but pictures. A positive development related to this has been the increasing standardization of licenses. There's a pretty clear consensus these days that new copylefted materials should be licensed either under the Creative Commons Attribution-ShareAlike license or the GNU Free Documentation License, or both.[10] As a textbook author, dual licensing my books under these two licenses allows me to legally use essentially 100% of the contents of Wikimedia Commons. Even for an author who has no particular interest in free information as an abstract ideal, the use of non-proprietary photos opens up the possibility of using free digital copies of a book as sales tools. That's an option that traditional publishers don't have, because permissions fees are computed according to how widely the book is distributed.[11]

But even if there's no legal objection to giving your book away for free, there's still that pesky issue of the cost of bandwidth. Luckily there's some new infrastructure coming along to take care of this as well. Jason Turgeon has founded a site called textbookrevolution.org,[12] whose mission is to mirror free books and take the load off of authors. The Wikibooks project, which aims to extend the success of Wikipedia to the creation of a whole library of books, is backed by the horsepower of the Wikimedia Foundation server farm. And finally, there's an interesting company called lulu.com,[13] dating back to 2002, which is opening up new possibilities for distributing books, both electronically and in print.

Has the time come for print-on-demand?

Lulu's main business is print-on-demand (POD), which means that instead of producing thousands of copies of a book at once as on a traditional printing press, modern technology and automation make it economically feasible to produce books one at a time, as requested by customers. For a long time, I was very skeptical about POD. It always seemed to be a technology that was supposed to be viable soon, but that nobody seemed to be able to execute successfully as a business. The whole field also carried the taint of the sleazy vanity publishing industry, infamous for luring would-be authors into paying money to get their books published and then leaving them high and dry. Lulu, refreshingly, doesn't claim to be anything it's not. What it is is ... well ... funky might be the word. Among its prominently displayed bestselling titles are "Raw Foods for Busy People," a calendar with pictures of "SAM, World's Ugliest Dog," and the infamous hoax novel "Atlanta Nights," originally written as an expose of one of the less honest vanity publishers and now being sold for its value as humor.[14]

Lulu's pricing works nicely for free-information books: "Because we support the free and open exchange of information, if you decide not to get a royalty, we also waive our commission. The selling price of your printed book, calendar, CD or DVD will be its production cost only; download versions are free." What this boils down to is that if you have a book that you want to set free, and you don't have any ambition to make money from it, you can distribute the PDF file through Lulu without paying for bandwidth, and readers who are so inclined will be able to buy printed paperback copies for $4.53 plus 2 cents per page.
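To make the arithmetic concrete, here is a minimal sketch of that no-royalty price in Python. The two rates are the ones quoted above; the function names are made up for illustration, and the sketch assumes nothing else (shipping, tax, or color printing) enters the price.

```python
# Sketch of the no-royalty Lulu price quoted in this article:
# $4.53 per copy plus 2 cents per page.
# Amounts are kept in integer cents to avoid floating-point rounding.
BASE_CENTS = 453       # fixed per-copy charge quoted in the article
PER_PAGE_CENTS = 2     # per-page charge quoted in the article


def production_cost_cents(pages: int) -> int:
    """Return the production cost of one paperback copy, in cents."""
    if pages < 0:
        raise ValueError("page count cannot be negative")
    return BASE_CENTS + PER_PAGE_CENTS * pages


def format_dollars(cents: int) -> str:
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


print(format_dollars(production_cost_cents(240)))  # prints $9.33
```

For a 240-page book this comes to $9.33, so a typical set of lecture notes lands somewhere around nine or ten dollars in print.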

One textbook author who has used Lulu is Prof. Richard Fitzpatrick at UT Austin.[15] Modestly referring to his books as lecture notes, Fitzpatrick writes, "My motivation for writing these notes was mainly that I did not want to be tied to any particular textbook. In England, where I was educated, lecturers are (or were in my student days) expected to generate their own courses, and textbooks are mainly used for reference purposes and/or for background material. I have never liked the U.S. model where profs seem to be able to get away with essentially reading the assigned textbook to their classes. Obviously, another consideration is that I have my own approach to certain subjects which does not necessarily correspond exactly to the approaches used in available textbooks. Finally, I am, of course, appalled by the astronomical cost of Physics textbooks (especially textbooks for lower division survey courses, which are also pretty awful, and getting worse edition by edition). Hence, by making my notes available to my students, I give them the option of not buying the official textbook." On the first day of class, Fitzpatrick hands out photocopies of his book to his students for free in a condensed format, with four pages reduced to fit on one page. If they want something nicer, they can then download the PDF file for free and print it out themselves, or order a bound copy from Lulu for about $9 for a 240-page black and white book. Fitzpatrick, who doesn't take any royalty from his books' sales on Lulu, says, "I have been very happy with my experience with Lulu... The only problem I see with Lulu is the rather large time-delay between ordering a book and getting it in the mail. I think that Lulu should concentrate on fixing this problem." Fitzpatrick's books have been used by other professors, both at UT and at other schools.

Another author who uses Lulu is East Tennessee State University computer science professor David Tarnoff. Tarnoff's students can download Computer Organization and Design Fundamentals[16] for free, or buy a printed copy of the 434-page book from Lulu for $16.98 --- about 80% less than the cost of the proprietary book he'd been using before. (He takes a $3 royalty on each printed copy.)

Since I originally wrote this article last year, I've experimented with producing and selling my own books via Lulu. Although in many ways it's an improvement over what I was doing before, I've also run into many problems, and at this point I would be very cautious about recommending Lulu to other people. One big issue is that, for a company whose entire business is built around the PDF format, Lulu is not actually very good at supporting the PDF format correctly; they don't coordinate properly with the subcontractors who actually print the books, and this has led to a constant stream of problems, in which books that had sold a bunch of copies all of a sudden don't print one day, resulting in a cancellation of the school's order. I've also had a lot of problems with the poor quality of their packaging on wholesale orders.

Wikibooks

Wikibooks,[17] founded in July 2003, represents a completely different approach from Lulu's. One dramatic contrast is that where Lulu emphasizes print publishing, Wikibooks almost completely ignores it: although every book has a "printable version" link provided automatically, in nearly all cases the printable version doesn't actually work --- it's just a table of contents. Wikibooks also differs from Lulu by being completely noncommercial and open to participation by anyone. One good result of this is that due to peer review, Wikibooks has relatively little of the crank literature that permeates most of the self-publishing world. In a C|Net interview, Wikipedia founder Jimmy Wales expressed high --- some might even say grandiose --- goals for the project, predicting that traditional publishers are "going to have to recognize that there's a fundamental shift in the marketplace. Some of them will prosper. Some of them will figure out the new regime and find out ways to add value. Others will stick their heads in the sand and get slaughtered."[18]

On the other hand, the wiki model seems to encourage the creation of abortive book projects that linger indefinitely without being improved or deleted, and it can be extremely hard for a visitor to the site to find any high-quality, completed books. Wales says, "It's still a young project. I would consider it to be mission accomplished when we could point and say, 'Well, you could teach yourself, or someone could teach you using these materials, (anything) from the kindergarten to the university level.'"

I recently carried out a non-scientific attempt to survey the best books on Wikibooks. My sample, taken from the adult English-language site, consisted of all the books whose level of development was listed as "comprehensive text" (the highest level), plus all those that had been voted book of the month at some point, plus a list of four books suggested by a Wikibooks user with whom I happened to start up a dialog on Slashdot. This made a total of 20 books.[19] Of these:

  • 11 were not really books, ranging in length from 6 to 50 pages.
  • One was marked with a copyright violation banner.
  • Four or five were books that seemed to have been written entirely outside of Wikibooks, dumped into Wikibooks wholesale, and never really edited much by Wikibooks users.
  • Only three or four were complete books that appeared to have been created and written through Wikibooks itself.

Despite Wales' high hopes for revolutionizing education, I don't think the traditional textbook publishers are shaking in their boots yet. One of the strongest areas of activity on Wikibooks actually seems to be the production of guides to computer games, a genre that accounted for five of the 20 books in my sample (counted in the "not really books" category due to their short lengths). Maybe I'm being unfair. Although Wikibooks has existed for two and a half years, some of its users say that it was neglected for a long time by the Wikimedia Foundation's founders, and that activity has only really started to build up within the last six months. Even so, I think there are some characteristics encoded deep in Wikibooks' DNA that will keep it from having much of an impact on the world of formal education. One of the things teachers want most of all in a text is authority and reliability, and those are fundamental areas of weakness in the whole wiki approach. Another problem is the books' lack of availability in print. For better or for worse, college professors and K-12 school districts are almost never willing to consider a book that can't be ordered wholesale from a print publisher. Although a few college professors are willing to adopt a digital-only book and leave it up to their students to download it and print it out on their own, very few books on the Wikibooks site offer versions that are actually formatted appropriately for convenient do-it-yourself printing. There is a fundamental problem here, since wiki software is oriented toward HTML output, but HTML isn't a very printer-friendly format. And finally, the wiki method doesn't seem to be a good match to the way textbooks are actually written: by teachers. A teacher typically decides to write a textbook because he isn't happy with the books that are available and thinks he can do better. The openness and inclusivity of the wiki method are diametrically opposed to the ability of a single author to express his own vision.

Making contact

Writing is all about making contact between an author and a reader, and this is currently one of the biggest problems for free books. What was really subversive and exciting about the World Wide Web was that it took control away from publishers, changing the definition of "page" from something produced in a factory to something you could create yourself, without asking for permission. Along with that freedom came the problem for the reader of how to find anything worthwhile on the Web, where there was no publisher to filter out the garbage. Readers looking for free books online are likely to visit relatively popular sites, which right now include Lulu and Wikibooks. The problem is that on Lulu they'll wade through crank literature such as "Fixing Einstein's Physics," while on Wikibooks they'll be sorting through such gems as "How To Build A Pykrete Bong." My own attempt to help solve this problem for free books is my web site theassayer.org,[5] which catalogs free books and accepts user-submitted reviews. The new site textbookrevolution.org[12] focuses on mirroring a smaller and more carefully vetted selection of books, and provides editorial blurbs and an opportunity for users to submit reviews. Textbookrevolution's creator Jason Turgeon writes, "Traditional textbook publishers are insane. They're looking at the size of the US market for textbooks, which is no longer growing, trying to figure out how to keep their revenue growing and satisfy shareholders. And their solution isn't to find new markets, to reach out to developing nations, or to cut development and distribution costs by using the new technologies that are available to all of us. Instead, their solution has been to raise prices every year and to try to kill off the used book market with gimmicks and pointless new editions. But their prices are getting so high that they're actually shooting themselves in the foot --- no one outside of the developed world can afford their product at all, and fewer and fewer of those who can pay are willing to. I can feel the change in the air. Students, teachers, and parents are all fed up. Sites like mine are just the beginning. Sooner or later, something is going to click into place and the market is going to correct itself."

Another recent attempt to connect readers to books has been the Google Book Search program (formerly known as Google Print). This interface between the freewheeling culture of Google and the staid world of print publishing has so far been about as successful as attempts to mate a Klingon[20] with a Ferengi[21]. The idea was that Google would index printed books in the same way that it indexes web pages, without allowing web surfers to download and read a whole book. Some of the books would be scanned from four large university and public libraries, while a second program would scan in books that publishers submitted in hopes that Google hits would drive sales. The scans of library collections are on an opt-out basis for publishers, rather than opt-in, which elicited howls of protest and a copyright infringement lawsuit. I originally thought that the Google program might become an important part of the free book infrastructure. The only requirements were that your book have an ISBN, and that you submit a printed copy to Google. (The latter requirement has since been eliminated, and publishers can simply upload PDFs.) I signed up as a publisher, and at first it looked promising. In my account, I chose to make all of my books 100% browsable. (Presumably most publishers of non-free books have been setting much lower values.) When I tested the system by doing an ordinary Google search on text from one of my books, it showed me the page in the scanned book on which the text occurred, gave me the ability to flip through the book page by page, and showed me an option to buy a copy.

Then the lawsuit hit, and Google began to backpedal. Results from books were no longer shown in ordinary Google searches, but only in searches on print.google.com, and since very few people ever do a search on print.google.com, the potential for attracting readers was greatly reduced. Then, in an attempt to convince publishers and authors that this wasn't all about pirating their books, Google changed the name to Google Book Search, and the URL to books.google.com. Then searches began to show less and less of my books, even though my account was still set at 100% browsable. First it would only show a few pages, then only one page, until finally, right now, if you do this search, you're allowed to view a few lines of text, with the rest of the page obscured by a scary notice: "Restricted Page. This page is unavailable for viewing." The whole experience makes the reader feel like a naughty ten-year-old boy trying to get a glimpse of the neighbor lady undressing through a gap in her curtains, and given the quality of the experience, it's hard to imagine that Google Book Search will ever build a viable pool of users.

Connecting the dots

Now that an infrastructure is being built for free books, part of the challenge will be to get people to start using it, and using all the pieces together. Most people doing free textbooks aren't using Lulu or Wikibooks. Most people using Lulu aren't doing free books. Nobody on Wikibooks seems to be using Lulu. There's also a need to fix or throw away some of the failures, such as Google Book Search and the accumulated cruft of failed books on Wikibooks. Picking winners and losers, however, has never been one of the strengths of the free information movement, and the issues surrounding Google Book Search are being played out by people who have no particular interest in free books.

On the reader's side, it doesn't occur to most people looking for reading material that there are over a thousand free books on the web. They might be surprised to learn that quite a few of those books are also available in print, and they probably don't know that they can find them through sites like textbookrevolution and theassayer. Some people reading this article will have bought a printed book in a store that was also free on the web, but not realized it. For example, you can go into a store and buy one of Bruce Eckel's programming books, or a science fiction book from Baen, bring it home, and finish reading the whole thing without ever knowing that it was available for free in digital form. For the authors of those books, it's a good deal --- the digital books are a form of free publicity, and the economics is like the logic of coupons and rebates: the people who have the most money to spend are the ones least likely to use them. But this under-the-radar approach keeps readers and authors from realizing what's going on around them.

References

[1] "Rip-off 101: Second Edition --- How the Publishing Industry's Practices Needlessly Drive Up Textbook Costs"
[2] Robert A. Heinlein, "Where To?", Galaxy magazine, written in 1950 and published in February 1952.
[3] Ben Crowell, Do Open-Source Books Work?
[4] Ben Crowell, Free Books: A Sneaky Success
[5] theassayer.org
[6] Wikimedia servers
[7] Jason Turgeon, whose web site promoting free books is discussed later in this article, commented after reading an early draft of this article that it really shouldn't be necessary to pay so much for bandwidth these days. It's true that you can get webhosting for less money than I've been paying, but my earlier experiences with cheaper webhosts had been problematic in terms of reliability. For many authors, it doesn't seem reasonable to take on any new monthly bill for the sake of promoting a free book that isn't generating any revenue. In my own case, I'm running some CPU-intensive web applications, not just serving up static html.
[8] Daley and Daley, Organic Chemistry
[9] Wikimedia Commons
[10] The older OPL license has fallen into disuse, and its creator now urges people to use a Creative Commons license. The GFDL has been criticized for some features that are seen as unfree, such as the provisions for having portions of the text (like dedications) that can't be modified. This has led the Debian project to get rid of all GFDL'd documentation that uses the objectionable options. However, there is a consensus that the GFDL, when used without the cruft, is a free license. My claim that the free-information community has moved toward standardizing on the GFDL and CC-BY-SA is not based on any scientific opinion poll, but simply on my own perception. The use of the GFDL by Wikipedia has been a strong message of support for the license. The prevalence of CC-BY-SA can also be judged by the significant number of Wikipedians who have inserted boilerplate language on their user pages saying that their own contributions are dual licensed under the CC-BY-SA as well as the GFDL. The other flavors of CC licenses are probably more popular than CC-BY-SA, but I'm just talking about authors who want to use a viral copyleft license of some kind.
[11] "Textbook," Wikipedia
[12] textbookrevolution.org
[13] lulu.com
[14] Travis Tea, Atlanta Nights. "The world is full of bad books written by amateurs. But why settle for the merely regrettable? Atlanta Nights is a bad book written by experts." See also "Atlanta Nights," Wikipedia
[15] Richard Fitzpatrick's Lulu site, which links to his university page
[16] The web site for Tarnoff's course
[17] en.wikibooks.org
[18] Daniel Terdiman, "Wikibooks takes on textbook industry," c|net, September 28, 2005
[19] Ada Programming, America's Army: Special Forces, Blender 3D: Noob to Pro, Cell Biology, Chinese, Chrono Trigger, Consciousness Studies, Final Fantasy VI, How to Build a Computer, How to solve the Rubik's Cube, Introduction to Sociology, JAGS-2, Knowing Knoppix, Lucid Dreaming, Medal of Honor: Frontline, Qrai, Teaching Assistant in France Survival Guide, The Legend of Zelda: The Wind Waker, U.S. History, and UK Constitution and Government.
[20] "Klingon," Wikipedia
[21] "Ferengi," Wikipedia