Client Security
===============

This document is intended to give an overview of the security considerations
which must be kept in mind when working on the Hypothesis client. It outlines
the overall security goals for the client, names some risks and attack vectors,
and identifies ways in which code in the client attempts to mitigate those
risks.

.. _environment-overview:

Environment Overview
--------------------

The Hypothesis client is a single-page web application which runs in a browser.
Typically, it interacts with some annotated content (the page on which
annotations are made) and an annotation service running on a remote server.

At different times, users interact directly with the client, with the annotated
content, and with the annotation service. Data can flow in both directions:
from the annotated content to the client and vice versa. Communication with the
annotation service is also bidirectional, making use of an HTTP API and a
WebSocket connection:

.. code::

                  .─.
                 (   )
                .─`─'─.
               ;  User :
         ┌─────:       ;──────┬────────────────────┐
         │      \     /       │                    │
         │       `───'        │                    │
         │                    │                    │
         v                    v                    v
   ┌────────────┐   *   ╔════════════╗       ┌────────────┐
   │            │   *   ║            ║       │            │
   │            │   *   ║            ║  HTTP │            │
   │ Annotated  │──────>║   Client   ║──────>│ Annotation │
   │  content   │<──────║            ║<──────│  service   │
   │            │   *   ║            ║  WS   │            │
   │            │   *   ║            ║       │            │
   └────────────┘   *   ╚════════════╝       └────────────┘

There are two important trust boundaries in this system:

1. Between the client code, executing in a browser, and the service, executing
   on a remote server.
2. Between the annotated content (which may be an HTML page or a PDF rendered
   as an HTML page) and the client application. This boundary is marked with
   asterisks (``*``) in the ASCII art above.

Threat Model
------------

We are principally interested in ensuring that untrusted parties cannot gain
access to data that is intended to be confidential, or tamper with such data
when it is in transit.
Protected data might include:

- user credentials
- annotation data or metadata which is displayed by the client
- user profile information
- group membership records
- user search history

We must assume that the user has a baseline level of trust in:

1. their browser software (and the platform it runs on)
2. our client software
3. the annotation service
4. any 3rd-party account provider mediating access to the annotation service
   (e.g. Google, Facebook, etc.)

Any other parties are considered untrusted. Untrusted actors thus include any
and all of the following:

- the publishers of arbitrary web pages (including annotated content)
- advertisers or other 3rd-party contributors to arbitrary web pages (including
  annotated content)
- other users of the annotation service who have not been explicitly designated
  as trusted (through group membership, for example)
- members of the public who don't use the annotation service
- active attackers

We aim to defend confidential user data against any possibility of unauthorised
access.

Potential Attack Vectors
------------------------

The mechanisms of directed attack we are aiming to defend against are common to
many web applications, namely:

- execution of untrusted code in a trusted context (principally by XSS)
- clickjacking
- phishing/imitation attacks
- eavesdropping of unencrypted network traffic by an untrusted party
- to a limited extent, cross-site request forgery, although this is mostly a
  concern for the annotation service

Design Considerations and Defenses
----------------------------------

Same-Origin Policy Protections
##############################

The starting point for understanding many of the client-side security
mechanisms is the web platform's same-origin policy (SOP), which ensures that
any document on origin [#f1]_ "A" has very limited access to the execution
context or DOM tree of any document on a different origin "B".

.. _security-sop:

.. figure:: security-sop.png

   Distinct origins for annotated content and client application

As shown in :numref:`security-sop`, the bulk of the Hypothesis client
application executes within an ``