By Stuart Russell
Although it was published several years after Bostrom's Superintelligence, I recommend reading Human Compatible first. Russell covers similar ground on the problem of controlling a superintelligence, but in a style that I think most interested readers will find easier to follow and more insightful. If you then want a more expansive treatment of the risks and challenges posed by a superintelligence, as well as the likely ways one might emerge, Superintelligence is your book.
One of Russell's key points is that we shouldn't think of the control or value problem the way we might think of solutions for a more traditional machine learning algorithm, or even a narrow AI. It is simply not possible for us to specify a priori a loss or cost function to optimize. None of us would qualify as a superintelligence, and speaking for myself, I couldn't come close to specifying the loss function I might be trying to optimize. Russell's proposal is that rather than trying to seed a superintelligence directly with the optimal values and objectives, we should ensure that it strives to achieve our objectives after it first learns them. History is rich with tyrants and terrorists whose objectives we obviously wouldn't want a superintelligence to learn, but learning from a more representative sample of humanity might just ensure that we aren't broken down for parts in Bostrom's paperclip-maximizer nightmare.
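To make the idea concrete, here is a minimal toy sketch (mine, not Russell's, and far simpler than the assistance-game framework the book builds on) of an agent that starts out uncertain about which objective a human holds and updates its beliefs by watching the human's choices, rather than being handed a fixed loss function up front. The candidate objectives, the numbers, and the Boltzmann-rational choice model are all illustrative assumptions.

```python
import numpy as np

# Toy sketch: instead of hard-coding one objective, the machine maintains
# a belief over several candidate objectives and infers which one the
# human is pursuing from observed behavior.

# Hypothetical setup: 4 possible actions, 3 candidate reward functions
# (one row per objective the human might hold).
candidate_rewards = np.array([
    [1.0, 0.0, 0.0, 0.2],   # objective A
    [0.0, 1.0, 0.3, 0.0],   # objective B
    [0.1, 0.1, 1.0, 0.0],   # objective C
])

prior = np.full(3, 1.0 / 3.0)   # uniform prior: genuinely uncertain at first
beta = 2.0                      # assumed human "rationality" temperature

def choice_probability(action, rewards, beta):
    """Boltzmann-rational model: the human picks higher-reward actions
    more often, but not perfectly."""
    logits = beta * rewards
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[action]

# Bayesian update after each observed human choice.
posterior = prior.copy()
for action in [0, 3, 0, 0]:     # a made-up sequence of observed choices
    for k in range(len(posterior)):
        posterior[k] *= choice_probability(action, candidate_rewards[k], beta)
    posterior /= posterior.sum()

print("Posterior over objectives:", np.round(posterior, 3))
```

Running this, the posterior concentrates on objective A, the one most consistent with the observed choices. The design point this is meant to illustrate is Russell's: the machine's uncertainty about what we want is a feature, not a bug, because an agent that knows it might be wrong about our objectives has a reason to keep watching, asking, and deferring.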