A Primer for Testing the Security of GraphQL APIs

By Alex Leahu

Whether you're a penetration tester, security engineer, or bug bounty hunter, it can be incredibly helpful to know how to find vulnerabilities in a GraphQL API. This post will introduce you to GraphQL and its functionality from the perspective of someone performing a security assessment.

The post will not focus on how to securely implement a GraphQL API, although you can extrapolate details that’ll help you in doing so. Additionally, although I will draw parallels to familiar topics like REST and SQL, other concepts may be new.

A Brief Introduction to GraphQL

GraphQL is a specification that was developed by Facebook and later moved to the GraphQL Foundation. The high-level description from the GraphQL Foundation is:

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.

Here is our security-relevant, high-level overview of GraphQL:

GraphQL is regarded by some as an alternative to REST. There are many implementations of GraphQL engines and clients. Because the reference implementation is written in JavaScript, much of the ecosystem is also written in JavaScript, although other languages are supported as well. Most GraphQL engines do the bare minimum required by the spec and aren't a “batteries-included” solution the way Python's Django is often considered to be. Much is left up to the developer, as with Python's Flask, and just as Flask has many third-party add-ons, a similar ecosystem has grown around GraphQL. Tooling support is strongest for JavaScript GraphQL engines, so if you're testing a GraphQL API that isn't written in JavaScript, expect to review much more hand-crafted custom code.

It’s a good idea to review the Introduction to GraphQL to get familiar with the basics. Additionally, there are public GraphQL APIs like SpaceXLand where you can get the hang of how queries work. We’ll scratch the surface just enough to get you going here.

Types

GraphQL is statically typed. An object type is made up of fields, along with types defining what each field accepts.

Below is an example object type called Character:

type Character {
  name: String!
  appearsIn: [Episode!]!
}

In the example above, String is considered a scalar type. Eventually a field will have to resolve actual data. Scalar types represent the actual data that will be resolved, such as a string or integer. Unlike other types, like Episode, these do not have subfields. There are built-in scalar types for the most common use-cases, although custom scalar types can also be created by developers. Pay close attention to any scalar type that is custom, since custom types are more likely to have issues.

Feel free to look into enumeration types as well, just so you know that they exist.

Operation Types

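There are three operation types, but the first two are the ones you'll see most often:

  1. query: fetches data
  2. mutation: modifies data
  3. subscription: receives updates pushed by the server as data changes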

Even though query is an operation type, we also refer to all requests as queries no matter what the operation type is.

Queries

All GraphQL queries must include fields that you want to be returned in the response. Fields are specific pieces of data within an object like an ID, name, or address. In the following example, we are using the users query to request the name of all users:

Request

{
  users {
    name
  }
}

Response

{
  "data": {
     "users": [
      {
        "name": "Carmen"
      },
      {
        "name": "Nathan"
      }
     ]
  }
}

Some queries may also accept arguments. A query similar to the one above is the singular version user, which accepts the argument id among others. As a result, we get the name of the user with a particular ID:

{
  user(id: "1000") {
    name
  }
}

For mutations, you might see a field like add_user, to which you can pass an argument to create a new user. In this case, we want the id of the newly created user returned after completion:

mutation {
  add_user(name: "Palo") {
    id
  }
}

Variables

So far, we've been passing values as arguments directly in the query, but it's also possible to set up a query in a way similar to a prepared statement in SQL. This method allows sending the values separately.

Let's take the previous example and pass the name “Palo” as a variable.

Query

mutation anything($name: String!){
  add_user(name: $name) {
    id
  }
}

Variables

{"name": "Palo"}

The query and variables will be submitted to the GraphQL API in the same request, but are sent as separate parameters.
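As a rough sketch of how the two parameters travel together, the snippet below builds a JSON request body for the mutation above using only Python's standard library. The helper name build_payload is made up for illustration; the query and variables keys are the ones that appear in the HTTP request example in the next section.

```python
import json

# The mutation from the example above; $name is filled in from the variables.
QUERY = """mutation anything($name: String!){
  add_user(name: $name) {
    id
  }
}"""


def build_payload(query, variables=None):
    """Serialize a query and its variables as one JSON request body.

    build_payload is a hypothetical helper, not part of any GraphQL library.
    """
    return json.dumps({"query": query, "variables": variables})


body = build_payload(QUERY, {"name": "Palo"})
print(body)
```

The query text never changes between calls; only the variables object does, which is what makes the comparison to a prepared statement apt.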

Fragments

You will probably see queries that contain ... in them. Those are fragments. Fragments are used to simplify the reuse of a set of fields. You can define the set of fields and use them as a fragment throughout your queries. Although fragments are good to understand, queries can always be created without them.

GraphQL API Interaction

So how do we talk to a GraphQL API? It happens over HTTP, and interactions are typically done through POST requests against a single endpoint like https://example.com/graphql. The endpoint remains the same between requests; only the query and variables parameters in the request body change.

Here is an example of a GraphQL API HTTP request and response:

Request

POST /graphql HTTP/1.1
Host: example.com
...
Content-Type: application/json
Connection: close

{"query":"{\n  capsules {\n    id\n  }\n}\n","variables":null}

Response

HTTP/1.1 200 OK
...
Content-Type: application/json

{"data":{"capsules":[{"id":"C105"},{"id":"C101"},{"id":"C109"},{"id":"C110"},{"id":"C106"},{"id":"C102"},{"id":"C205"},{"id":"C103"},{"id":"C201"},{"id":"C104"},{"id":"C111"},{"id":"C113"},{"id":"C108"},{"id":"C107"},{"id":"C112"},{"id":"C206"},{"id":"C203"},{"id":"C202"},{"id":"C204"}]}}

This is a pretty basic query, but as queries and responses get larger, they become a pain to work with manually. When it comes time to construct queries, you are going to want to use a tool like GraphiQL or Altair. It's similar to working with Postman for REST APIs (which also now has GraphQL support!). Be sure to also check if the API is already hosting a GraphQL console/explorer, which means you don't even have to bring your own tools (we'll talk more about this later).

As a note, API requests are not limited to POST requests or even required to be JSON encoded. The query can be passed as parameters in a GET request or even as URL-encoded form data in a POST request. All of this depends on the implementation, and you should definitely explore to see what is available. The following requests are some examples of what the GraphQL API may accept:

GET (query parameters)

GET /graphql?query=%7B%0A++capsules+%7B%0A++++id%0A++%7D%0A%7D%0A&variables%5Bvariable1%5D=example HTTP/1.1
...
Connection: close

POST (URL encoded body)

POST /graphql HTTP/1.1
...
Content-Type: application/x-www-form-urlencoded
Connection: close

query=%7B%0A++capsules+%7B%0A++++id%0A++%7D%0A%7D%0A&variables%5Bvariable1%5D=example

POST (form data)

POST /graphql HTTP/1.1
...
Content-Type: multipart/form-data; boundary=---------------------------kanbvfbnmt
Connection: close

-----------------------------kanbvfbnmt
Content-Disposition: form-data; name="query"

{
  capsules {
    id
  }
}

-----------------------------kanbvfbnmt
Content-Disposition: form-data; name="variables[variable1]"

example
-----------------------------kanbvfbnmt--
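If you want to generate the GET and URL-encoded POST variants yourself, the following sketch shows one way to do it with Python's standard library. It only builds the strings and sends nothing; the /graphql path and the variables[variable1] parameter name are copied from the examples above, and whether a given server accepts either form is something you have to test.

```python
from urllib.parse import urlencode

QUERY = "{\n  capsules {\n    id\n  }\n}\n"

# Flat parameters, using the same bracketed variable naming as the requests above.
params = {"query": QUERY, "variables[variable1]": "example"}

encoded = urlencode(params)

# GET: the encoded parameters go in the query string.
get_target = "/graphql?" + encoded

# POST with Content-Type: application/x-www-form-urlencoded:
# the same string is sent as the request body instead.
post_body = encoded.encode("ascii")

print(get_target)
```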

Discovery

A GraphQL schema or a working client is important to have. Think of a schema as API documentation generated and used by tools like Swagger or Postman; it provides type definitions, and GraphQL is statically typed, remember? Without this, you won't know which queries you can make (unless you observe client-generated traffic).

Introspection is Enabled

An introspection query is a special query you can make against a GraphQL API that returns information about which queries it supports. A very basic introspection query is shown below:

Request

{
  __schema {
    types {
      name
    }
  }
}

Response

{
  "data": {
    "__schema": {
      "types": [
        ...
        {
          "name": "Location"
        },
        {
          "name": "LaunchFind"
        },
        {
          "name": "LaunchesPastResult"
        },
        {
          "name": "Launchpad"
        },
        {
          "name": "MissionsFind"
        },
        {
          "name": "Mission"
        },
        {
          "name": "MissionResult"
        },
        {
          "name": "PayloadsFind"
        },
        {
          "name": "Roadster"
        },
        ...
      ]
    }
  }
}

However, you probably want to know more than just the names of the types available. The full query below can be used to get a lot more details:

query IntrospectionQuery {
  __schema {
    queryType { name }
    mutationType { name }
    subscriptionType { name }
    types {
      ...FullType
    }
    directives {
      name
      description
      locations
      args {
        ...InputValue
      }
    }
  }
}

fragment FullType on __Type {
  kind
  name
  description
  fields(includeDeprecated: true) {
    name
    description
    args {
      ...InputValue
    }
    type {
      ...TypeRef
    }
    isDeprecated
    deprecationReason
  }
  inputFields {
    ...InputValue
  }
  interfaces {
    ...TypeRef
  }
  enumValues(includeDeprecated: true) {
    name
    description
    isDeprecated
    deprecationReason
  }
  possibleTypes {
    ...TypeRef
  }
}

fragment InputValue on __InputValue {
  name
  description
  type { ...TypeRef }
  defaultValue
}

fragment TypeRef on __Type {
  kind
  name
  ofType {
    kind
    name
    ofType {
      kind
      name
      ofType {
        kind
        name
        ofType {
          kind
          name
          ofType {
            kind
            name
            ofType {
              kind
              name
              ofType {
                kind
                name
              }
            }
          }
        }
      }
    }
  }
}

If you get a response to an introspection query, you are halfway there. You will have the schema and can learn what is possible to do with the GraphQL API. Introspection is enabled by default in most GraphQL implementations, and it often stays enabled for APIs intended for public consumption. GraphQL tools (like GraphiQL) send an introspection query automatically so that the documented API is ready for you.
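Once you have the JSON response to the full introspection query, a few lines of scripting can pull out the entry points you care about. The sketch below assumes the response has already been parsed into a Python dictionary; root_fields is a hypothetical helper, and SAMPLE is a heavily trimmed, made-up response that only mimics the shape returned by the query above.

```python
def root_fields(result, root_key):
    """Return the field names of a root type in an introspection response.

    root_key is "queryType" or "mutationType", matching the keys requested
    by the full introspection query.
    """
    schema = result["data"]["__schema"]
    root = schema.get(root_key)
    if not root:  # for example, the API defines no mutations
        return []
    for type_info in schema["types"]:
        if type_info["name"] == root["name"]:
            return [field["name"] for field in (type_info.get("fields") or [])]
    return []


# Trimmed, made-up sample in the shape of an introspection response.
SAMPLE = {
    "data": {
        "__schema": {
            "queryType": {"name": "Query"},
            "mutationType": None,
            "types": [
                {
                    "kind": "OBJECT",
                    "name": "Query",
                    "fields": [{"name": "capsule"}, {"name": "capsules"}],
                },
                {"kind": "SCALAR", "name": "String", "fields": None},
            ],
        }
    }
}

print(root_fields(SAMPLE, "queryType"))
print(root_fields(SAMPLE, "mutationType"))
```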

Introspection is Disabled

If introspection is disabled, you are not at the end of the road! You still have options. Let's talk about what you can do:

  1. If you have access to source code, you may have access to a file that contains a schema defined using schema definition language (SDL). With this file, you have the entire schema in human-readable form, which can easily be processed by other tools. On the other hand, a schema that is defined programmatically is not easily consumable by other tools and can be more difficult to use.

  2. If you have access to the client-side source code (like a mobile app or JavaScript), then you can learn a lot about the queries implemented there as well. This requires more manual work and can be tedious. How long it will take to get useful information depends on whether the source code has been compiled, transpiled, obfuscated, and/or minified.

  3. Reversing client-side code doesn't sound like fun? Well, just like when you don't have API documentation, you could still observe traffic from the client and see which queries are made. However, this process makes it hard to see the big picture of what is available to you since you have to go through every request to understand how everything is connected.

  4. If you don't have access to a client or maybe you think there are additional queries that were not sent by the client, the last resort is brute-forcing. This is made much more effective through tools like clairvoyance or clairvoyancex, which use information leakage and helpful autocomplete features from GraphQL APIs to help build a schema. Although you may not build the complete schema through this method, you will have much more than you did before.

Visualization

Once you have the schema, you can really start digging into the API. Tools like Voyager are great for visualizing the schema and give you an idea of what you can do.

With this visualization, you can start coming up with a plan of attack to make sure you get coverage of everything as different roles, users, or organizations. Unfortunately, there isn't much GraphQL-specific automation for this at the moment, so you'll rely on existing tools like AuthMatrix. Although they are not intended specifically for GraphQL, they can still be set up with HTTP requests containing a GraphQL query.

Vulnerabilities

As with every new technology, a lot of vulnerabilities that affect GraphQL are reincarnations of known issue types. This list is not exhaustive, but it includes things you should definitely look for:

Authentication and Authorization

A user may have a field called AuthoredPosts, which represents all the different post nodes that the author created. Each of those post nodes would have an author field that represents the author that created that post.

If a post is intended to be available only to friends of the author, the API might have an authorization check in the post node. The API can validate that the user trying to interact with the post is a friend of the author. But what if the check is instead performed upon accessing the author edge from a user node? This would prevent a user from accessing the post, but the user may still be able to like the post even if they are not a friend. Since the authorization check isn’t performed in the post node, an additional check for liking the post may have been forgotten. It gets tricky trying to track all the different authorization checks when performing them on edges.

Because there are usually different paths to getting to a node, you (as the tester) should check all the paths to a node in the case that authorization checks are performed on an edge and the check is missing. Voyager, a tool mentioned earlier, will help with figuring out those paths in a visual manner.
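To make the idea of "all the paths to a node" concrete, here is a small sketch that walks a simplified schema and lists every cycle-free route from the root query type to a target type. The schema dictionary is a made-up reduction of the user/post example: AuthoredPosts and author come from the text above, while the Query.user and Query.post entry points are assumptions added for illustration.

```python
# Simplified schema: type name -> {field name: type the field returns}.
# Query.user and Query.post are hypothetical entry points; AuthoredPosts and
# author come from the example above.
SCHEMA = {
    "Query": {"user": "User", "post": "Post"},
    "User": {"AuthoredPosts": "Post"},
    "Post": {"author": "User"},
}


def paths_to(schema, target, start="Query"):
    """Return every cycle-free field path from start to target."""
    found = []

    def walk(type_name, path, seen):
        for field, next_type in schema.get(type_name, {}).items():
            if next_type == target:
                found.append(path + [field])
            elif next_type not in seen:
                walk(next_type, path + [field], seen | {next_type})

    walk(start, [], {start})
    return found


for path in paths_to(SCHEMA, "Post"):
    print(" -> ".join(path))
```

Each path it prints is a separate route worth testing with a user who should not be able to reach the target.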

Mislabeled Operation Type

When compared to a REST API, a query operation type is like a GET request and a mutation operation type is like a POST request. We should expect that state changes are performed only through mutations and that queries only read data. This may not always be the case, and you may see operation types misused. The operation name will often be a dead giveaway as to whether it's a state-changing query. Why is it an issue to misuse operation types? It causes confusion, and it also improves an attacker's chances of achieving cross-site request forgery, which is mentioned later.
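A quick first pass over a schema is to flag query fields whose names start with a verb that suggests a state change. The sketch below is only a heuristic under that assumption: the prefix list and the sample field names are invented for illustration, and it will produce false positives (a field named settings starts with set) as well as miss creatively named fields.

```python
# Hypothetical list of verb prefixes that hint at a state change.
STATE_CHANGING_PREFIXES = ("add", "create", "delete", "remove", "set", "update")


def suspicious_query_fields(query_field_names):
    """Return query fields whose names suggest they modify state."""
    return [
        name
        for name in query_field_names
        if name.lower().startswith(STATE_CHANGING_PREFIXES)
    ]


# Made-up field names standing in for the fields of the root query type.
print(suspicious_query_fields(["users", "user", "delete_user", "updateEmail"]))
```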

Input Validation

In addition to the built-in scalar types, custom scalars can be created and used in GraphQL. These custom scalars may not have had as many eyes on them, so hit these extra hard with some fuzzing. Make sure that they do the following:

If a custom scalar type is being used to encapsulate other complicated types like JSON or XML, this is an area you should focus on. It's very likely that the contents in the encapsulated type are not being validated, which could lead to vulnerabilities.
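If you have an introspection response that includes each type's kind (the full introspection query shown earlier requests it), custom scalars are easy to list: they are the SCALAR types that are not one of the five built-in scalars defined by the GraphQL specification. The sketch below assumes the response is already parsed into a dictionary; the Date and JSON scalars in the sample are made up.

```python
# The five scalar types built into GraphQL.
BUILT_IN_SCALARS = {"Int", "Float", "String", "Boolean", "ID"}


def custom_scalars(result):
    """Return the names of non-built-in scalar types, sorted alphabetically."""
    types = result["data"]["__schema"]["types"]
    return sorted(
        t["name"]
        for t in types
        if t["kind"] == "SCALAR" and t["name"] not in BUILT_IN_SCALARS
    )


# Trimmed, made-up sample in the shape of an introspection response.
SAMPLE = {
    "data": {
        "__schema": {
            "types": [
                {"kind": "SCALAR", "name": "String"},
                {"kind": "SCALAR", "name": "JSON"},
                {"kind": "OBJECT", "name": "Character"},
                {"kind": "SCALAR", "name": "Date"},
            ]
        }
    }
}

print(custom_scalars(SAMPLE))
```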

GraphQL is database agnostic and can be backed by a NoSQL database (MongoDB, Couchbase), relational database (MySQL, PostgreSQL), or even another API. Injections in these back-end systems can still occur whether it's through type juggling or lax scalar types. Be sure to look for server-side errors when injecting (no)SQL or JSON syntax within queries. The more knowledge you have on the back ends behind a GraphQL API, the better. For example, if you have REST API calls made in the background with data you send to a GraphQL API, then you can try to inject extra parameters, get path traversal, or even perform server-side request forgery. There’s a lot of funky stuff that could be happening behind the scenes.

GraphQL Batching

GraphQL lets you request multiple resources within one request. That feature can also backfire if you start making too many queries in a single HTTP request; brute-force attacks, race conditions, and other unexpected behaviors can occur.

If API rate limiting is in place and enforced only at the HTTP level, a single HTTP request with many queries won't be throttled. Brute-force attacks may suddenly become very easy! Interesting endpoints may be login or authentication-related (2FA), IDOR/enumerable objects, or other functionality specific to your application. A tool that can help with this is BatchQL, created by Assetnote. Otherwise, pull out your favorite scripting language and script an attack yourself.
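If you do script it yourself, one common shape for a batch is a JSON array of operations sent in a single request body. The sketch below only builds that body and sends nothing. The verify_2fa mutation, its code argument, and the success field are hypothetical names, and whether the target accepts array batching at all depends on the GraphQL engine; some servers only allow multiple aliased fields inside one query instead.

```python
import json

# Hypothetical mutation and field names; substitute ones from the target schema.
QUERY = """mutation verify($code: String!) {
  verify_2fa(code: $code) {
    success
  }
}"""


def build_batch(codes):
    """Build one JSON array body containing one operation per candidate code."""
    return json.dumps([{"query": QUERY, "variables": {"code": c}} for c in codes])


# 0000 through 0009 as a small demonstration; one HTTP request carries all ten.
body = build_batch(f"{n:04d}" for n in range(10))
print(len(json.loads(body)))
```

Passing the candidate values as variables keeps the query text constant and avoids having to escape values into the query string.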

Race conditions can occur if mutations within the same HTTP request are performed in parallel instead of sequentially. This behavior will depend on the GraphQL engine that is being used. Of course, race conditions across multiple HTTP requests would also still apply. Because many applications are not built with atomic operations in mind, these vulnerabilities are very common.

Denial of Service

Complex queries can lead to denial of service (DoS) by making the GraphQL API take a long time to return the response to your query. This is not so different from regular expression DoS, although it's trickier to prevent. Requesting a lot of data at once (or having a query that nests many other queries) could overload something in the chain, be it the application server, database, or other APIs behind the scene.

Techniques that developers use to mitigate DoS in GraphQL include limiting the depth, size, or cost of a query.
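As an illustration of the simplest of these, a depth limit, the sketch below measures how deeply a query nests by counting braces. It is deliberately naive and is not how any particular engine or library implements depth limiting: it does not parse the query, does not expand fragments, and counts braces in input object arguments too. MAX_DEPTH is an arbitrary value chosen for the example.

```python
def query_depth(query):
    """Return the maximum brace nesting depth of a query string.

    Naive: counts braces outside of double-quoted strings and does not
    expand fragments or parse the query properly.
    """
    depth = deepest = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


MAX_DEPTH = 5  # hypothetical limit
query = '{ user(id: "1000") { AuthoredPosts { author { name } } } }'
print(query_depth(query), query_depth(query) <= MAX_DEPTH)
```

As a tester, the same measurement is useful in reverse: keep nesting a cyclic relationship deeper until the server either rejects the query or starts to slow down.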

For additional reading, Apollo wrote a blog post that reviews the thought process of trying to protect a GraphQL API from expensive queries.

Cross-site Request Forgery (CSRF)

Don't skip checking for CSRF just because you see all of this JSON in requests. The GraphQL API may accept data in other forms besides plain JSON in a body.

Ask yourself these questions:

  1. Is the GraphQL API being used from a browser that performs cookie authentication and/or doesn't require special headers?
  2. Are you able to perform mutations using GET parameters, POST body with URL encoded data, or POST body with form data?

If you meet the conditions above and the stars align with SameSite, then you're in luck!

For additional reading, Doyensec wrote a great blog post about CSRF issues in GraphQL, and they also released a tool that can help with generating various GET and POST requests.

Introspection Enabled

As someone testing a GraphQL API, you’ll definitely want introspection enabled. However, just as introspection is useful to you, an attacker is also going to get value out of it. Make it a bit harder for attackers by disabling introspection on production APIs, unless of course public API documentation is required. This is basically an information disclosure feature/misconfiguration. Additionally, once introspection is disabled, ensure that hinting is also disabled, as that type of information disclosure leads to schema discovery. For example, you may issue a query with a partial word like caps, which doesn't exist:

{
  caps {
    id
  }
}

The server may respond with an error message like the following:

{
  "errors": [
    {
      "message": "Cannot query field \"caps\" on type \"Query\". Did you mean \"capsule\" or \"capsules\"?",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "extensions": {
        "code": "GRAPHQL_VALIDATION_FAILED"
      }
    }
  ]
}

This behavior can be used to more efficiently brute-force a schema. The tool clairvoyance, mentioned earlier, utilizes this functionality to enumerate the schema.
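A minimal sketch of the parsing step is shown below: it pulls the suggested names out of an error message in the format shown above. The wording of these messages varies between GraphQL engines, so the "Did you mean" marker is an assumption based on this one example, and suggested_fields is a hypothetical helper rather than anything taken from clairvoyance.

```python
import re


def suggested_fields(message):
    """Extract field names offered after "Did you mean" in an error message."""
    _, found, hint = message.partition("Did you mean")
    if not found:
        return []
    # Every double-quoted token in the hint is a suggested field name.
    return re.findall(r'"([^"]+)"', hint)


message = (
    'Cannot query field "caps" on type "Query". '
    'Did you mean "capsule" or "capsules"?'
)
print(suggested_fields(message))
```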

Console/Explorer Enabled

Depending on the GraphQL engine used, it may come packaged with a console for developers that isn't disabled in production.

GraphiQL is a common console you will see.

Here are some ideas for paths to look for: SecLists – GraphQL. You don't want to have consoles like this enabled unless they are on a public GraphQL API, as this exposes an additional attack surface.

Error Information Disclosure

Error messages are helpful to any attacker. In the context of GraphQL, pay attention to error messages that leak names of fields. If you don't have a schema, this can help reveal some of the inner workings of the GraphQL API.

Conclusions

Hopefully this post was a good springboard for you to use to jump into the waters of GraphQL! If you are looking to take things to the next level, be sure to go through Introduction to GraphQL. After that, go ahead and build a GraphQL API yourself and get into the same mindset as developers utilizing this new technology.


Forces Unseen is an independent security consulting firm with a focus on application and infrastructure security.