Here’s a look at the results from our recent effort analyzing GraphQL API endpoints across the web, and the percentage of those endpoints that allowed an unauthenticated user to view the query & data schema. The intent of this article is to address the implications of allowing this schema to be retrieved, similar technologies that allow access to their schema, what can be done about it, and the trend of those who have prevented such a disclosure.
GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools. Source: https://graphql.org/
For more details about GraphQL, the reader is encouraged to view: https://graphql.org/learn/
During the months of August and September 2020, we scanned the top one million domains looking for GraphQL API endpoints at a few well-known paths – the most common being an HTTP POST to /graphql. Although there are many GraphQL instances with non-common paths, our common paths search still identified and successfully communicated with 3,824 GraphQL endpoints.
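A common-paths probe of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the scanner we used: only /graphql comes from the text above, the other entries in `CANDIDATE_PATHS` are hypothetical, and the function names `looks_like_graphql` and `probe` are made up for the example.

```python
import json
import urllib.error
import urllib.request

# Hypothetical candidate paths; only /graphql is named in the article.
CANDIDATE_PATHS = ["/graphql", "/api/graphql", "/v1/graphql"]

# A trivial query that a spec-compliant GraphQL server should be able to answer.
PROBE_BODY = json.dumps({"query": "{ __typename }"}).encode("utf-8")


def looks_like_graphql(body: str) -> bool:
    """Return True if a response body has the shape of a GraphQL reply."""
    try:
        parsed = json.loads(body)
    except ValueError:
        return False
    # GraphQL responses are JSON objects with a "data" and/or "errors" key.
    return isinstance(parsed, dict) and ("data" in parsed or "errors" in parsed)


def probe(domain: str, timeout: float = 5.0) -> list[str]:
    """POST the probe query to each candidate path and collect likely endpoints."""
    hits = []
    for path in CANDIDATE_PATHS:
        url = f"https://{domain}{path}"
        request = urllib.request.Request(
            url,
            data=PROBE_BODY,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                body = response.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            # Some servers answer with an error status but a JSON "errors" body.
            body = err.read().decode("utf-8", errors="replace")
        except (urllib.error.URLError, OSError):
            continue  # unreachable host, TLS failure, timeout, etc.
        if looks_like_graphql(body):
            hits.append(url)
    return hits
```

The shape check is a heuristic: other JSON APIs also return an `errors` key, so a real scan would want a stricter confirmation step before counting an endpoint.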
Many of the discovered GraphQL API endpoints allowed us to retrieve the schema, which describes the data types and structure, query syntax, directives, entry points, etc. It would be comparable to what one would get if dumping a MySQL DB schema along with routines (ex. stored procedures). This information was obtained by leveraging a built-in GraphQL feature known as introspection. More about introspection can be viewed here: https://graphql.org/learn/introspection/
The following is an example of an introspection query against a well-known cybersecurity organization:
Notice that we retrieved almost 2 MB of schema in the JSON response. When that JSON-based schema is “beautified”, it is 140,565 lines describing, in complete detail, how to interact with the API, including default and user-generated/set metadata values for every aspect of the schema, such as a description, deprecation status, etc.
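For readers who have not seen one, an introspection request is an ordinary GraphQL query against the built-in `__schema` field. The sketch below is a deliberately small version (the queries that tooling sends request far more detail), and the helper names `build_payload` and `summarize_schema` are hypothetical. It builds the POST body and summarizes the size of whatever comes back.

```python
import json

# A reduced introspection query; it asks only for type and field names,
# descriptions, and deprecation status.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType { name }
    types {
      name
      kind
      description
      fields { name description isDeprecated }
    }
  }
}
"""


def build_payload() -> bytes:
    """Return the JSON body to send in an HTTP POST to the GraphQL endpoint."""
    return json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")


def summarize_schema(response_text: str) -> dict:
    """Count the types and fields disclosed by an introspection response."""
    schema = json.loads(response_text)["data"]["__schema"]
    types = schema["types"]
    # Scalars, enums, and similar kinds report "fields": null, hence the "or []".
    field_count = sum(len(t.get("fields") or []) for t in types)
    return {
        "types": len(types),
        "fields": field_count,
        "bytes": len(response_text.encode("utf-8")),
    }
```

Running `summarize_schema` on a saved response gives a quick sense of how much of the API surface an unauthenticated caller can map.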
By itself, an unauthenticated user gaining access to the API schema through GraphQL introspection is an informational level concern. However, without this information, it would be more difficult, maybe impossible, to formulate some requests against the GraphQL endpoint. Anything one can do to make it harder for an attacker to acquire this type of information reduces the likelihood and impact of an attack.
“Security through obscurity” is not an effective security strategy on its own. But obscurity, in conjunction with proper access control, places an extra burden on an attacker. There is simply no need to expose this level of detail except on the limited occasions when a developer or integrator needs to see it. Just as you would not expose a database schema or network architecture diagram to an unauthenticated or unauthorized user, you should not expose this schema.
Of the 3,824 GraphQL endpoints we found and interacted with, 94% allowed introspection! Conversely, 6% of endpoints had it disabled. Of that 6%, nearly all appeared to be using the Apollo GraphQL platform, which has introspection disabled by default in production. Kudos to Apollo for making this the default setting!
Metadata: Another aspect to consider with exposed schema is the potential for future code updates to inadvertently include sensitive information in the metadata (ex. description field), or in the schema itself, that would then be publicly accessible. Data is sometimes carelessly placed in fields, such as metadata fields, by a developer who is assuming it is “server side only,” when in fact it is accessible by an external, unauthenticated user.
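One way to catch this kind of leak before release is to scan your own schema's descriptions for suspicious wording. The following is a rough sketch under stated assumptions: the keyword list in `SENSITIVE` is hypothetical and should be tuned to your organization, and the input is assumed to be the `__schema` object from an introspection response that includes `name`, `description`, and `fields`.

```python
import re

# Hypothetical keyword list; extend or replace it for your own environment.
SENSITIVE = re.compile(
    r"(password|secret|api[_ -]?key|token|internal|todo|fixme)",
    re.IGNORECASE,
)


def find_sensitive_descriptions(schema: dict) -> list[tuple[str, str]]:
    """Return (location, description) pairs whose description matches SENSITIVE."""
    findings = []
    for type_def in schema.get("types", []):
        # Check the type's own description, then each of its fields.
        entries = [(type_def["name"], type_def.get("description"))]
        for field in type_def.get("fields") or []:
            location = f'{type_def["name"]}.{field["name"]}'
            entries.append((location, field.get("description")))
        for location, description in entries:
            if description and SENSITIVE.search(description):
                findings.append((location, description))
    return findings
```

A check like this could run in a build pipeline against the generated schema, so that a stray note in a description field is flagged before it becomes publicly retrievable.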
Field suggestions: Some GraphQL instances not only verified whether a field in our query was valid or not, but also gave suggestions to us when one was wrong (i.e. fuzzy matching).
Note: For the endpoints that did not have this field suggestion feature, commonly used field names were still easy to verify and enumerate.
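To illustrate why field suggestions matter, the sketch below pulls suggested names out of a validation error. It assumes the message wording used by the graphql-js reference implementation (`Did you mean "x" or "y"?`); other servers may phrase it differently, and the function name `extract_suggestions` is hypothetical.

```python
import re

# Matches the suggestion clause, e.g.: Did you mean "email" or "emails"?
SUGGESTION_CLAUSE = re.compile(r"Did you mean (.+?)\?")
# Matches each double-quoted name inside that clause.
QUOTED_NAME = re.compile(r'"([^"]+)"')


def extract_suggestions(error_message: str) -> list[str]:
    """Return the field names a server suggested in a validation error message."""
    clause = SUGGESTION_CLAUSE.search(error_message)
    if clause is None:
        return []
    return QUOTED_NAME.findall(clause.group(1))
```

Each suggestion confirms a real field name, so a handful of deliberately misspelled guesses can reveal parts of a schema even when introspection is disabled.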
Here’s what you can do to mitigate some of the information gathering attacks shown above:
GraphQL is not the only technology with these issues; other, similar API frameworks also expose their schema and carry similar risks. This list is not exhaustive, but is intended to give a few examples:
If you are interested in knowing how your API and web services would perform against these types of attacks, but do not have the expertise or resources to find out, we would love to speak with you. Contact us today!
Featured image is a derivative work from the following images: mcmurryjulie @ https://pixabay.com/vectors/database-schema-data-tables-schema-1895779/