Realms-shim Security Updates

Brian Warner, 02 Oct 2019

Summary: Four security-critical bugs were discovered and fixed in the “realms-shim” and “SES” libraries, which underpin Agoric’s secure JavaScript platform. All users should upgrade to the latest versions. The remainder of this post is a detailed technical analysis of the bugs and the libraries they affected.

On Saturday morning (14-Sep-2019), bug #48 was filed on the realms-shim repository, revealing a sandbox escape vulnerability in the Realms shim. Over the course of the next week, several other bugs were discovered with similar severity. These were all fixed by realms-shim-1.2.0 and SES-0.6.3, released 02-Oct-2019. All users of SES and the Realms shim should upgrade to these versions immediately.

This blog post describes the SES and realms-shim libraries, the nature of the vulnerabilities that were reported, and the fixes we applied. To explain the vulnerabilities, it is necessary to describe the shim in considerable detail, so we must start with a bit of background.

What is the Realms shim? What is SES?

The “Realms” feature is a proposed addition to JavaScript, currently at stage 2 on the TC39 standards track (championed by Mark Miller, Chief Scientist at Agoric, along with Caridy Patiño and Dave Herman). It lets JS code create a new “Realm” and evaluate code inside it:

let r = Realm.makeRootRealm(options);
r.evaluate(src);

The new Realm comes with a new set of “primordials”, like Object/Array/Function/etc, and specifically does not have any host objects (like Document, Window, or require()). Realms provide a fresh clean JavaScript environment, and one of the nice things about JS is that it has a well-defined distinction between computational things (defined by TC39) and IO things (defined by W3C for a browser environment, or the Node.js folks for Node).

That r.evaluate is actually a super-powered “safe three-argument evaluator”; more on that later.

So a brand new Realm makes a great starting point for a sandbox. Code evaluated inside a new Realm doesn’t get any power: it can’t use the network, or modify the DOM, or require() any modules from which it could get those things. And it cannot (by design) reach or modify any of the primordials from the “Primal Realm” (the one that created the new Realm, which does have all the usual host objects), with which it could perform a “prototype poisoning” attack. It can think furiously, but not act or speak or sense without deliberate outside permission.

The Realms shim is a polyfill to provide the Realm object on platforms that don’t already have it, authored by Caridy Patiño, JF Paradis, Mark Miller, and several others. (To be precise, it builds a Realm object unconditionally, since there are no native implementations yet). In Node.js it uses the “vm” module to get a new set of primordials, and in a browser environment it creates a new <iframe>. It then uses a serious hack, known as the “eight magic lines of code”, to build a safe evaluator.

SES is the sandboxing library we build on top of the Realms feature. It creates a new Realm and then modifies that Realm to make it safe to run mutually-suspicious programs inside, enabling object-capability safety rules. To do this, it uses Object.freeze to seal all the Realm’s primordials against modification (disabling prototype poisoning). It also removes a number of primordials that would allow ambient communication channels between unrelated objects, and others (like Date.now) that would enable non-deterministic execution. SES uses realms-shim to get a Realm object, and then exports a SES object. The combination of SES and Realms, working together, forms the basis of our secure-code execution environment.
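As a rough illustration of why freezing matters, the sketch below uses Node's vm module to create a separate context (only a stand-in for a Realm; SES itself does considerably more, and freezes transitively), freezes a single primordial there, and then attempts a prototype-poisoning write. The variable names are illustrative and not part of SES.

```javascript
const vm = require('vm');

// A fresh context has its own Array, Object, etc., so freezing them here
// does not affect the primordials of the program running this script.
const context = vm.createContext({});
vm.runInContext('Object.freeze(Array.prototype);', context);

// Strict-mode code that tries to replace a method on the frozen prototype.
const outcome = vm.runInContext(`
  'use strict';
  let result;
  try {
    Array.prototype.push = function () {};
    result = 'poisoned';
  } catch (e) {
    result = e.name;
  }
  result;
`, context);

console.log(outcome);
```

In sloppy mode the same assignment would fail silently rather than throw, but in both cases the frozen prototype keeps its original method.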

JavaScript’s Direct and Indirect “eval”

JavaScript has two different kinds of eval(). The first is called a “direct eval”. The string being evaluated gets full (mutable) access to the lexical scope at which the eval() statement appears, so it behaves almost as if the string was interpolated into the original program at that point:

function a() {
  let counter = 1;
  function doEval(code) {
    return eval(code); // direct eval
  }
  return doEval;
}
const doEval = a();
doEval('counter + 1'); // == 2
doEval('counter += 1'); // == 2 (counter is now 2)
doEval('counter'); // == 2

Direct eval isn’t very useful for sandboxing, because the evaluated code gets full access to everything, including the internals of the code managing the sandbox. There is a second form named “indirect eval”: it gets access to the global scope, rather than the local scope where the eval appears. This is still a problem for our goals, but it’s a start:

let counter = 1;
function a() {
  let secret = 9;
  function doEval(code) {
    const e = eval;
    return e(code); // indirect eval
  }
  return doEval;
}
const doEval = a();
doEval('counter + 1'); // == 2
doEval('secret'); // ReferenceError
doEval('secret = 4'); // creates a global 'secret'; the local one is still 9
doEval('secret'); // == 4

To invoke a direct eval, two conditions must be met. The first is that the function being invoked must be literally named eval (so assigning it to some other name disqualifies it from being direct):

let othereval = eval;
othereval(code); // indirect eval

The second is that the function being invoked must be the original native eval function object: you can’t create some other function and just change its name to eval:

function othereval(code) {
  // do something resembling evaluation
}
let eval = othereval;
eval(code); // an ordinary function call, not a direct eval

Direct eval is a JavaScript “special form”: it cannot be replicated by a subroutine.

For our purposes, we want an indirect eval that references a “safe global scope”, where everything on the global object is frozen and powerless. The confined code can access the global, but it doesn’t matter, because that doesn’t give it any additional power. We also want our eval function to accept endowments and transforms. See the docs for full details, but “endowments” are basically properties that can be used by the confined code as if they were globals (except that they’re only present for the duration of a single evaluation):

r.evaluate('a+b', { a: 1, b: 2 }); // == 3
r.evaluate('a+b', { }); // error: 'a' and 'b' are no longer in scope

Endowments are really useful to provide special authority (like filesystem access) to the confined code in a controlled way. And “transforms” are functions that modify the code before it runs, to rewrite the syntax (think Babel plugins). Both features are available on the special evaluate that a Realm offers, so the shim needs to build them on top of the normal JavaScript eval function.

r.evaluate = function(source, endowments, options) {
  const { transforms } = options;
  // ... magic goes here
}
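To make the roles of endowments and transforms concrete, here is a deliberately naive sketch. The function name sketchEvaluate is hypothetical, it is not how the shim is implemented, and it provides no confinement at all: the evaluated code can still reach the real globals. It only shows endowments appearing as extra names in scope and transforms rewriting the source text before it runs.

```javascript
// Hypothetical helper for illustration only: NOT the shim and NOT a sandbox.
function sketchEvaluate(source, endowments = {}, options = {}) {
  const { transforms = [] } = options;

  // Each transform maps source text to source text; apply them in order.
  const rewritten = transforms.reduce((src, transform) => transform(src), source);

  // Function bodies built this way are sloppy-mode, so `with` is permitted.
  // Names in `endowments` are resolved before the real globals.
  // (The evaluated code can also see the parameter names used here.)
  const scoped = new Function('endowments', 'src', `
    with (endowments) {
      return eval(src);
    }
  `);
  return scoped(endowments, rewritten);
}

console.log(sketchEvaluate('a+b', { a: 1, b: 2 }));
```

A transform is just a function from string to string, so something like `(src) => src.replace('b', '10')` could be passed in `options.transforms` to rewrite the expression before evaluation.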

The Eight Magic Lines

The goal of the Realms proposal is to give you an r.evaluate(source) function that acts as an “indirect eval”, which uses a per-Realm global scope, with fresh primordial objects, and the additional “endowments” and “transforms” features. The core evaluation routine (the “eight magic lines”) looks like this:

/* 1 */  return unsafeFunction(`
/* 2 */    with (arguments[0]) {
/* 3 */      ${optimizer}
/* 4 */      return function() {
/* 5 */        'use strict';
/* 6 */        return eval(arguments[0]);
/* 7 */      };
/* 8 */    }
/*   */  `);

This returns a function which, when run, returns the “safe evaluator” that is provided as r.evaluate. It first creates an outer function that prepares a special environment, inside of which it builds a second function (line 4) that will serve as the confined evaluator. This environment has a with statement on line 2. with isn’t used very much these days, but it transforms any global name lookups within the body (lines 3-7) into property lookups on its argument (the arguments[0] on line 2). with is forbidden in use strict code, for good reason: it’s hard to read code when all the names might mean something completely different.

The object passed to with on line 2 will be a Proxy. As a result, when the confined code is fed to the direct eval on line 6, all the global name lookups will go through the proxy handler. To avoid adding any new names into the environment (other than eval itself), these functions use the old arguments object.
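The interaction between with and a Proxy can be observed directly. The following small sketch (with illustrative names that do not come from the shim) records which proxy traps fire when code inside a with block reads a free variable.

```javascript
const lookups = [];

// A proxy that logs every trap the `with` statement triggers.
const scope = new Proxy({ x: 1 }, {
  has(target, prop) {
    lookups.push(`has ${String(prop)}`);
    return prop in target;
  },
  get(target, prop) {
    lookups.push(`get ${String(prop)}`);
    return target[prop];
  },
});

// The Function constructor yields sloppy-mode code, where `with` is legal.
const run = new Function('scope', 'with (scope) { return x + 1; }');
const result = run(scope);

console.log(result, lookups);
```

The engine first asks the proxy whether it has the name (and consults Symbol.unscopables), and only then reads the value, which is why a handler can decide both which names exist and what they resolve to.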

Our proxy handler must return the real (unsafe) eval when it is referenced on line 6, because we need to invoke a direct eval that very first time (so names will be looked up in the proxy). But if anything in the confined code references eval, we need those to be an indirect eval (in fact it needs to be the same safe evaluator we’re building here). So the second (and subsequent) times the proxy handler is asked to get “eval”, we need to return a different object. To achieve this, the proxy handler has a flag that switches lookups of eval between safe and unsafe mode.

This flag is set just before we invoke this evaluator:

  scopeHandler.allowUnsafeEvaluatorOnce();
  let err;
  try {
    // Ensure that "this" resolves to the safe global.
    return apply(scopedEvaluator, safeGlobal, [src]);
  } catch (e) {
    err = e;
    throw e;
  } finally {
    // belt and suspenders: the proxy switches this off immediately after
    // the first access, but if that's not the case we abort.
    if (scopeHandler.unsafeEvaluatorAllowed()) {
      throwTantrum('handler did not revoke useUnsafeEvaluator', err);
    }
  }

and is cleared during the first lookup of the name “eval”:

  get(target, prop) {
    // Special treatment for eval. The very first lookup of 'eval' gets the
    // unsafe (real direct) eval, so it will get the lexical scope that uses
    // the 'with' context.
    if (prop === 'eval') {
      if (useUnsafeEvaluator === true) {
        // revoke before use
        useUnsafeEvaluator = false;
        return unsafeEval;
      }
      return target.eval;
    }
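Putting the fragments above together, a toy version of the evaluator might look like the following. This is a reconstruction for illustration under simplifying assumptions (one shared flag, no endowments, no transforms, no error wrapping, and the flag is silently reset rather than aborting); makeToyEvaluator is a hypothetical name, and the code is not secure.

```javascript
// Toy reconstruction for illustration only; not the shim, and not secure.
function makeToyEvaluator(safeGlobal) {
  const unsafeEval = eval;
  let useUnsafeEvaluator = false;

  const scopeProxy = new Proxy(safeGlobal, {
    get(target, prop) {
      if (prop === 'eval') {
        if (useUnsafeEvaluator) {
          useUnsafeEvaluator = false; // revoke before use
          return unsafeEval;
        }
        return target.eval;
      }
      if (prop === Symbol.unscopables) return undefined;
      return target[prop];
    },
    // Claim every name so lookups never fall through to the real global.
    has() {
      return true;
    },
  });

  const scopedEvaluator = new Function(`
    with (arguments[0]) {
      return function() {
        'use strict';
        return eval(arguments[0]);
      };
    }
  `)(scopeProxy);

  function safeEval(src) {
    useUnsafeEvaluator = true;
    try {
      return scopedEvaluator(`${src}`);
    } finally {
      useUnsafeEvaluator = false; // the real shim aborts instead
    }
  }

  // Confined code that names `eval` gets the safe evaluator back.
  safeGlobal.eval = safeEval;
  return safeEval;
}

const toyEval = makeToyEvaluator({ answer: 21 });
console.log(toyEval('answer * 2'));
```

Only the lookup on line 6 of the magic lines sees the real eval; any later reference to eval from inside the evaluated string resolves to the safe evaluator, and names that are not on the safe global resolve to undefined instead of reaching the host.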

For more details, check out the Report on Realms Shim Security Review, a presentation of the first security review we did (last year).

Vulnerability One: Stack Overflow Error Leaves Evaluator In “Unsafe Mode”

The initial bug report demonstrated a code path that violated some assumptions in the proxy handler. After further analysis, we reduced the code to the following:

function loop() {
  (0,eval)('1');
  loop();
}

try {
  loop();
} catch(e) {}
eval + '' // "function eval() { [native code] }"

The “eight lines” are invoked for the first time when this code is passed to r.evaluate(): that evaluation creates the loop() function and then runs it. loop() looks up eval (which invokes the proxy handler in the “safe” state), does a trivial invocation (which resets the handler to the “unsafe” state briefly, then sets it back to “safe”), then recurses. Since loop() lacks an end condition, this loops over and over until it throws a stack-overflow error (RangeError, in JavaScript), which happens just before the get() handler resets useUnsafeEvaluator back to false (oh no!). Now the recursion unwinds, and finally the last line is invoked (eval + '', used to demonstrate that eval is now the unsafe direct evaluator).

When this line runs, it does a new lookup of eval in the proxy handler. Because the exception left useUnsafeEvaluator set to true, this eval gets the “unsafe” eval which accesses the outer Realm. From there, the code can access primal-Realm objects like window which are supposed to be off-limits for the confined code.

The core problem was that the same proxy handler was shared between multiple invocations of eval(). This allowed invariant violations in one use (“useUnsafeEvaluator shall always be false after the first lookup”) to be visible to a second use. The immediate fix was to create a new scoped handler for each act of evaluation.

During the subsequent analysis, we found a similar failure mode that invokes the safe function constructor Function('1') instead of invoking eval, with similar consequences. We developed a generalized fix by rewriting the evaluator with a more-robust mode switch mechanism.
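The difference between a shared handler and a per-evaluation handler can be modelled without any real evaluation. The sketch below is a simplified model with illustrative names: the strings 'safe' and 'unsafe' stand in for the two evaluators, and a thrown RangeError simulates the stack overflow arriving after the flag is set but before the first lookup of eval clears it.

```javascript
// Simplified model of the mode switch; not the shim's code.
function makeScopeHandler() {
  let useUnsafeEvaluator = false;
  return {
    allowUnsafeEvaluatorOnce() {
      useUnsafeEvaluator = true;
    },
    lookupEval() {
      if (useUnsafeEvaluator) {
        useUnsafeEvaluator = false;
        return 'unsafe';
      }
      return 'safe';
    },
  };
}

// A nested evaluation that dies after the flag is set, before it is cleared.
function interruptedEvaluation(handler) {
  handler.allowUnsafeEvaluatorOnce();
  throw new RangeError('simulated stack overflow');
}

// The confined code swallows the error, then looks up `eval` again through
// the handler of the evaluation it is still running in.
function attack(handlerForNestedEvaluation, outerHandler) {
  try {
    interruptedEvaluation(handlerForNestedEvaluation());
  } catch (e) {
    // ignored, as in the exploit
  }
  return outerHandler.lookupEval();
}

const shared = makeScopeHandler();
const sharedResult = attack(() => shared, shared);

const outer = makeScopeHandler();
const isolatedResult = attack(() => makeScopeHandler(), outer);

console.log(sharedResult, isolatedResult);
```

With one shared handler the stale flag is visible to the later lookup; with a fresh handler per evaluation the stale flag is stranded on an object nobody consults again.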

Vulnerability Two: Exceptions Reveal Primal-Realm Error Objects

A day later, the same contributor submitted a second attack, the core of which was:

let HostException;
try{
     (0, eval)('--'+'>');
}catch(e){
     HostException = e;
}
const HostObject = HostException.__proto__.__proto__.__proto__.constructor;

This exploited a check to protect against HTML comments inside the confined code. The ECMA specification allows (but does not require) parsers to recognize <!-- comment --> in JavaScript source code, so different engines might treat the same source code in different ways. To protect against surprises, the Realms shim rejects attempts to evaluate anything that even looks like an HTML comment. It is well known that HTML cannot be parsed with regular expressions, so the shim uses an approximation that will falsely match some non-comments:

const htmlCommentPattern = new RegExp(`(?:${'<'}!--|--${'>'})`);

function rejectHtmlComments(s) {
  const index = s.search(htmlCommentPattern);
  if (index !== -1) {
    const linenum = s.slice(0, index).split('\n').length; // more or less
    throw new SyntaxError(
      `possible html comment syntax rejected around line ${linenum}`
    );
  }
}
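To see what “falsely match some non-comments” means in practice, the sketch below applies the same pattern to a few sample strings. The sample names and the looksLikeHtmlComment helper are illustrative only.

```javascript
// Same pattern as above, built in pieces so this file never contains the
// literal comment markers outside of string data.
const htmlCommentPattern = new RegExp(`(?:${'<'}!--|--${'>'})`);
const looksLikeHtmlComment = (s) => htmlCommentPattern.test(s);

const samples = {
  realComment: '<!-- hidden --> 1 + 1',
  // Valid JavaScript: post-decrement followed by greater-than.
  countdownLoop: 'let n = 3; while (n --> 0) {}',
  plainCode: 'const total = a - b;',
};

const verdicts = Object.fromEntries(
  Object.entries(samples).map(([name, src]) => [name, looksLikeHtmlComment(src)]),
);

console.log(verdicts);
```

The countdown loop is legitimate code that gets rejected anyway, which is the conservative trade-off the shim accepts.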

Unfortunately, the surrounding code allowed this SyntaxError to propagate out to the confined code being evaluated. Once it grabbed a reference to this HostException object, it followed the prototype chain out to the host’s Object object, whereupon it could modify the primal-realm primordials to break out of the sandbox.

Further analysis found other exceptions that could be triggered in the shim code. One was an intentional exception to warn users who attempted a direct eval. A real native Realm implementation could support direct eval, but a shim cannot, and anyone trying to do a direct eval under the shim would actually get the behavior of an indirect eval. The recommended practice, for shims that cannot perfectly emulate a feature, is for them to throw an exception instead of misbehaving silently. This way users do not write code that works one way against the shim, but a different way against a native implementation (instead they just don’t write that code at all).

Other exceptions were unintentional. The stack overflow was one such case: the RangeError raised was a primal-realm Exception object. Others involved redefining properties of the global object, and another involved trying to convert Symbol values to a string.

The fix for all of these was to expand the scope of our existing callAndWrapError. This function is used to catch these exceptions before they become visible to the confined code. A new Exception object (from the correct Realm) is created, and string-valued properties like message are copied over from the unsafe one. The confined code sees only the new object, which does not leak access to the primal-realm primordials. We were using callAndWrapError to catch exceptions in certain cases, but the exploit identified a new source of exceptions that we had not anticipated. The problem was fixed by simply applying the existing defense mechanism to a wider scope.
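The wrapping idea can be sketched with Node's vm module standing in for the confined Realm. This is an assumption-laden illustration rather than the shim's actual callAndWrapError: the names callAndWrapErrorSketch, confinedErrors, and rejectingCheck are hypothetical, and only the name and message strings are copied.

```javascript
const vm = require('vm');

// Stand-in for the confined Realm: a context with its own Error constructors.
const confined = vm.createContext({});
const confinedErrors = vm.runInContext(
  '({ Error, RangeError, ReferenceError, SyntaxError, TypeError })',
  confined,
);

// Hypothetical sketch: catch a host-side error and rethrow a confined copy.
function callAndWrapErrorSketch(target, args) {
  try {
    return Reflect.apply(target, undefined, args);
  } catch (err) {
    if (Object(err) !== err) {
      throw err; // primitives carry no prototype chain to walk
    }
    // Copy only string data; never hand the original object across.
    const name = `${err.name}`;
    const message = `${err.message}`;
    const known = Object.prototype.hasOwnProperty.call(confinedErrors, name);
    const ErrorCtor = known ? confinedErrors[name] : confinedErrors.Error;
    throw new ErrorCtor(message);
  }
}

// Host-side code that throws a host SyntaxError, like rejectHtmlComments.
function rejectingCheck() {
  throw new SyntaxError('possible html comment syntax rejected');
}

let caught;
try {
  callAndWrapErrorSketch(rejectingCheck, []);
} catch (e) {
  caught = e;
}

// Walking the prototype chain now ends at the confined Object, not the host's.
const reached = Object.getPrototypeOf(
  Object.getPrototypeOf(Object.getPrototypeOf(caught)),
).constructor;
console.log(reached === Object);
```

The same prototype walk used by the exploit now stays inside the confined context, because the object that reaches the confined code was never created by the host's constructors.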

Vulnerability Three: Reflect.construct Reveals Unsafe Function Object

The next vulnerability exploits the peculiar behavior of Reflect.construct to reach primal-realm intrinsics. See the bug report for full details, but in short a safe Function constructor (intentionally provided to the confined code) could be modified and passed into Reflect.construct to obtain access to a “bound function exotic object” from the primal realm, and from there get access to the primal-realm Object primordial.
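The sketch below does not reproduce the reported exploit (the bug report has those details). It only demonstrates one specified quirk of Reflect.construct that makes cross-realm reasoning delicate: the prototype of the new object is taken from the newTarget argument, and when newTarget.prototype is not an object, the fallback prototype comes from the realm that newTarget belongs to. Node's vm module stands in for a second realm, and all names are illustrative.

```javascript
const vm = require('vm');

// A function and an Object.prototype that belong to a different context.
const otherContext = vm.createContext({});
const otherFunction = vm.runInContext('(function () {})', otherContext);
const otherObjectPrototype = vm.runInContext('Object.prototype', otherContext);

// Make the fallback path apply: newTarget.prototype is not an object.
otherFunction.prototype = null;

// Plain is defined in this realm, yet the instance's prototype is chosen
// according to newTarget, which lives in the other realm.
function Plain() {}
const instance = Reflect.construct(Plain, [], otherFunction);
const instanceProto = Object.getPrototypeOf(instance);

console.log(instanceProto === Object.prototype);
console.log(instanceProto === otherObjectPrototype);
```

Code that assumes “an object I constructed inherits from my own realm's primordials” can therefore be wrong whenever a caller controls newTarget.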

Vulnerability Four: Spread Operator Relies On Mutable Array.prototype

The last report described an attack involving the “spread” operator. The ... syntax is a convenience with two uses: as a rest parameter it lets a function “roll up” the rest of its arguments into a single Array, and as a spread it expands an Array back into individual arguments at a call site. The callAndWrapError function, used to fix the earlier bug, uses both forms and is defined like this:

  function callAndWrapError(target, ...args) {
    try {
      return target(...args);
    } catch (err) {
      // create new Error object, copy properties, re-throw
    }
  }

callAndWrapError is defined inside the new Realm, like a lot of the shim code, and so it must be defensive against changes the confined code might make to that Realm. In particular it pre-fetches properties like Object.getPrototypeOf and Map.prototype.get, in case the sandboxed code modifies them later. These pre-fetches take place while the Realm is being constructed, before the potentially-attacking code is executed, so by the time we need to use them, the attacker has lost their chance to make modifications.

Unfortunately, when you use the spread operator, internally it asks the args object for an iterator, by doing a lookup of a special property named Symbol.iterator (the property key is not a string: it is a well-known Symbol value). If Array.prototype is modified to replace this iterator property with some other code, that code will execute in the middle of callAndWrapError, and can get access to a primal-realm object. The attack modifies the prototype, and then makes a new (grand)child Realm to trigger the invocation:

Array.prototype[Symbol.iterator] = function(){
	this[0].unsafeGlobal.top.document.title = "Oh No"
}
Realm.makeRootRealm();

The fix for this is to eschew the use of the spread operator within the shim, and use a pre-fetched copy of Reflect.apply:

  const { apply } = Reflect;
  
  function callAndWrapError(target, args) {
    try {
      return apply(target, undefined, args);
    } catch (err) {
      ...
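The difference between the two call styles can be observed in isolation. In this sketch (illustrative names, and the prototype is restored afterwards) a replacement iterator counts how often it is consulted: Reflect.apply reads the argument list by length and index, while the spread call goes through Array.prototype[Symbol.iterator].

```javascript
const { apply } = Reflect; // pre-fetched, as in the fix
const originalIterator = Array.prototype[Symbol.iterator];

function sum(a, b) {
  return a + b;
}

let hijackCount = 0;
let countAfterApply;
let viaApply;
let viaSpread;

try {
  // What hostile confined code could do to a mutable Array.prototype.
  Array.prototype[Symbol.iterator] = function () {
    hijackCount += 1;
    return originalIterator.call(this);
  };

  const args = [1, 2];
  viaApply = apply(sum, undefined, args); // does not consult the iterator
  countAfterApply = hijackCount;
  viaSpread = sum(...args); // runs the replaced iterator
} finally {
  Array.prototype[Symbol.iterator] = originalIterator;
}

console.log(countAfterApply, hijackCount);
```

Both calls compute the same result, but only the spread call gives attacker-supplied code a chance to run in the middle of the shim's own function.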

It’s worth noting that SES (which builds upon Realms) will freeze all the globals, so these mutations would not be possible in a SES realm. The default Realm constructor provides mutable globals because some libraries depend upon them, but the safer option is to use the frozen globals of SES.

Severity / Impact

All four bugs enable a full breach of the sandbox. An attacker’s ability to exploit the vulnerabilities depends upon the application: many uses of SES do not accept completely arbitrary code from unknown parties, and these would not be vulnerable.

These libraries are not yet widely used, but we’d like them to be, so we worked quickly to understand and repair the problems, and then perform a coordinated disclosure and delivery of the fixed versions.

What You Must Do

The realms-shim package is fixed as of version 1.2.0. Any code which depends upon earlier versions should be updated immediately.

The realms-shim code was only recently packaged as an NPM module. Before that, SES (and other downstream packages) incorporated the realms-shim as a git-submodule. As a result, SES-0.6.1 and earlier include a copy of the vulnerable shim, and SES-0.6.2 depends upon a vulnerable version of the shim package. Applications which use the SES package must update to version 0.6.3 to get the fix.

The Timeline

The initial bug report was made publicly, prompting an immediate fix, a new release of SES (0.6.1), and the first proper release of the realms-shim (1.1.0). The follow-on bugs were submitted privately, allowing us to investigate more fully and coordinate disclosure among the known users of both packages.

  • 14-Sep-2019 (Sat)
    • 9:12am PDT: user XmiliaH files bug #48
    • 4:21pm PDT: Agoric engineer JF Paradis lands PR #49 to fix it
  • 15-Sep-2019 (Sun)
    • 10:30am PDT: XmiliaH emails security@agoric.com with second bug
  • 16-Sep-2019 (Mon)
    • 11:06am PDT: XmiliaH files bug #51 (without details) to escalate
    • 2:35pm PDT: bug acknowledged, GitHub draft security advisory opened, investigation proceeds
    • 2:47pm PDT: XmiliaH contributes additional versions of the bug to the draft security advisory
  • 18-Sep-2019 (Wed)
    • 1:35am PDT: Richard Gibson emails security@agoric.com with third bug (Reflect.construct)
    • 6:43pm PDT: JF creates draft PR to fix second and third bugs
  • internal testing and coordination with disclosure partners begins, embargo window established
  • 22-Sep-2019 (Sun)
    • 6:26am PDT: XmiliaH files an additional vulnerability involving the spread operator
  • 24-Sep-2019 (Tue)
    • 11:30am PDT: JF adds spread operator fix to PR
  • 02-Oct-2019 (Wed)
    • realms-shim-1.2.0 and SES-0.6.3 released
  • 07-Oct-2019: Additional details copied to realms-shim bug #51

Future Plans

The realms-shim and SES packages are core pieces of a secure object-capability code execution environment, under heavy development. As part of our engineering roadmap, we will push them towards fully-reviewed, production-ready quality. We will conduct thorough internal security code reviews later this year, and will contract with professional third-party reviewers following that. To better support ongoing use by the community, we have created a security disclosure group and updated our processes for reporting, reviewing, and responding to security bugs, and we continually seek to improve on our planning and execution.

These security bugs highlight another important roadmap item: a simplified design called the “Compartment API”. This is a subset of the Realms API that only supports a single Realm, which is made safe by freezing, just like in SES. This approach reduces the amount of shim code significantly, and renders many of today’s bugs irrelevant (because the primal-realm objects they leak are no longer powerful). The Compartment API is also better suited to environments that do not offer a way to build new primordials, like Web Workers, Service Workers and the lightweight Moddable XS engine. The Compartment API is still a work-in-progress, but due to these recent bugs, we are expediting this design change, and we recommend the new approach for all systems that do not require the full generality (and complexity) of mutable Realms. The Realms proposal will be updated with this new architecture.

Acknowledgements

Many thanks to the keen-eyed bug submitters, XmiliaH and Richard Gibson, for finding the problems and responsibly reporting them to us so quickly. We are grateful for their efforts and involvement.

And thanks to the NPM security team, especially Andre Eleuterio and Isaac Schlueter, and the GitHub security advisory program, for help with getting the advisories out in a timely fashion.

Contact

Please feel free to contact us by email at info@agoric.com, or on Twitter (@agoric). All our code is on GitHub, and if you find security-sensitive bugs, please submit them to security@agoric.com .
