Making Prisma/GraphQL Better

Author: Ethan Glover
Last Updated: Mar 5, 2022, 4:24 PM UTC
GraphQL

GraphQL is faster than REST. The specification itself doesn't make anything inherently faster. But by allowing the client to make highly specific requests, it naturally uses less resources.

That being said, because GraphQL is just a specification and doesn't handle logic for you, just because you're using GraphQl doesn't mean you're taking full advantage.

For example, when you use Prisma to write a resolver to fetch an item, GraphQL itself will only retrieve the data the caller requested. However, that does not mean Prisma won't retrieve that excess data from the database server side.

Select All

query Query {
  user(id: 123) {
    username
  }
}

This query will only return the username field from a user resolver. (Let's assume we'tre talking about a user database here.) However, that does not mean the Prisma function used to retrieve this data doesn't do:

select * from user where id = 123

Luckily, PalJs provides an easy PrismaSelect function that looks at the info object from the GraphQL query and makes sure to only select the fields requested. You can abstract this into a very small and simple function to make it easy to use for all resolvers (info comes from every incoming Apollo query):

This will transform that select all SQL query into the following:

select username from user where id = 123

Relationships

One of the challenges of writing resolvers can be creating relationships between tables. For example, I want to make the below query. The user table is related to the blog table with a relationship of userId => authorId.

query Query {
  user(id: 123) {
    blogs {
      title
    }
  }
}

There are a couple of things I want Prisma to do here.

  1. Recognize that the initial query to the users table needs to get userId if the child query is querying the blog table.
  2. Recognize that the userId field on user relates to the authorId field on blog.

I don't want the caller to have to select the userId field and I don't want to spend too much time writing if statements for every resolver relationship. And unfortunately, the GraphQL info object and Prisma's generated types don't help us much here.

So what I've come up with is a resolveArguments function to do all of this work for me. Here's how it works:

The key to the functionality is in the array of “RelationInfo” objects. Each item in the array represents a relationship between the resolver on the current table and the parent table in the nested GraphQL query. We use this function in the child resolver. Sticking with our example above, we'd put this in the blogs resolver, not the user resolver.

export interface RelationInfo {
  parentColumnName: string;
  parentTableName: string;
  relationColumnName: string;
}
const resolvedArguments = resolveArguments({
  arguments_,
  info,
  parent,
  relationInfo: [
    {
      parentColumnName: Prisma.User.userId,
      parentTableName: Prisma.ModelName.User,
      relationColumnName: Prisma.Blog.authorId,
    },
  ],
});

Let's walk through what this function does. From the start, while we're resolving arguments we can go ahead and apply our select abstraction from above.

let resolvedArguments = parameters.arguments_;
if (!ignoreSelect) {
  resolvedArguments.select = select<SelectType>(parameters.info);
}
  

From here, we can start filling out the request. We do that by looping over the relationInfo array and finding which one of it's parent table names relates to the info objects parent table name.

for (const relation of parameters.relationInfo) {
  if (parameters.info.parentType.name === relation.parentTableName) {
  }
}

While I mentioned that I don't want the caller to have to select userId, unfortunately you have to remember we're starting from the child table. This logic is happening in the blogs query and we don't have control over what happens in the user query at this point. So we need that userId.

It would at least be a nice courtesy to let users know which field needs to be queried:

if (typeof thisParent?.[relation.parentColumnName] === 'undefined') {
  throw new TypeError(
    `Must call ${relation.parentColumnName} from ${relation.parentTableName}`
  );
}

Otherwise, we can move on and get our resolved arguments. Remember the point of this was to make sure we're using the appropriate relationship fields. So essentially all we need to do is make sure Prisma's where statement contains authorId: userId.

return {
  ...parameters.arguments_,
  where: {
    [relation.relationColumnName]: thisParent?.[relation.parentColumnName],
    ...parameters.arguments_.where,
  },
}

Simple as that, we use the spread operator to make sure we're not stripping out any fields from the original request and simply add a where for the relationship. Altogether it would look something like this:

This allows you to create boilerplate resolvers that only require a relationship array and the default parameters Apollo provides from incoming queries. It also helps Prisma generate smarter SQL queries. The SQL query here would be something like this:

const blogs = async (
  parent: Record<string, unknown> | undefined,
  arguments_: Prisma.BlogFindManyArgs,
  context: Context
  info: GraphQLResolveInfo
): Promise<Blog[]> => {
  const resolvedArguments = resolveArguments({
      arguments_,
      info,
      parent,
      relationInfo: [
          {
              parentColumnName: Prisma.UserScalarFieldEnum.userId,
              parentTableName: Prisma.ModelName.User,
              relationColumnName: Prisma.BlogScalarFieldEnum.authorId,
          },
      ]
  });

  return prisma.blogs.findMany({ ...resolvedArguments });
}

This allows you to create boilerplate resolvers that only require a relationship array and the default parameters Apollo provides from incoming queries. It also helps Prisma generate smarter SQL queries. The SQL query here would be something like this:

select userId from User where userId = ‘userId’
select title from blog where authorId = ‘userId’

In case you're wondering why Prisma generates two SQL statements instead of a Join. It's because it's more efficient. You'll notice that both are using the same userId from the original arguments. These can fire concurrently. This is a simple query, you could have queried the blog table directly with the userId. But as these scale up and become more complex, Prisma will generate different SQL.

The point of these abstractions is to take advantage of Prisma as a backend. Prisma is really freaking smart. Ideally you want to give it as much information as possible (such as relationship info) and let it do it's thing. In the case of GraphQL where a single post request can be querying multiple things, Prisma will read everything and generate the smartest query it can based on that context. Meaning, it doesn't just fire off SQL queries one at a time as it hits your resolvers. It intelligently looks at all incoming requests before hitting the database and generates what it believes to be the best queries (it's usually right).

Solving The Many-To-Many N+1 Problem

ORM's get criticized for the N+1 problem. As Prisma's optimization guide says:

The n+1 problem occurs when you loop through the results of a query and perform one additional query per result, resulting in n number of queries plus the original (n+1). This is a common problem with ORMs, particularly in combination with GraphQL, because it is not always immediately obvious that your code is generating inefficient queries.

-Prisma, Query Optimization

However, that doesn't mean this issue can't be solved on a programmatic level such that we can automatically generate efficient queries. As noted before, Prisma batches incoming queries to create more intelligent SQL queries. However, this does not apply to findMany queries.

Because this site doesn't use a database. (On Hosting Static Pages) I'm going to pull an example from a real-world production app where I implemented this solution. I'm running this query where sites refers to a location and caseData refers to a support ticket. I used the following query:

Query for sites and caseData

Without optimizing for N+1 you get one query selecting all sites and a new query for each of those sites selecting all caseDatas. The generated SQL from prisma looks like this:

Unresolved site, case data.

This query took about 6 seconds to complete. Here's what it looks like after using my resolveFindMany() function.

Results of SQL after using resolved function.

Case Data Resolved SQL Queries

Now, instead of hundreds of individual select statements we get three queries. The first is the same, we're basically just grabbing all sites. Then, it grabs all siteId's (which is the column we're using as a relationship to caseData). Last it grabs the caseData. Note that instead of Join, Prisma is using SQL's IN keyword here.

You could bring up a lot of arguments about this implementation. Complain about the first two queries being essentially the same. Argue that Join would be faster. (It is not.) At the end of the day, this implementation is based on Prisma's research and use of relationships to create efficient queries.

This second query took about 4 seconds. A 2 second savings. Keep in mind that I am not running this multiple times or doing any real benchmarking. But a 2 second difference is significant. Normally, you would want to avoid this kind of query altogether. Instead opt for pagination. In the real world, this abstraction cuts down a significant amount of time on any findMany query without much effort from the developer. Simply use the resolveFindMany() function.

So what does this function look like? It is honestly very nasty and not totally TypeScript friendly. I'll give a basic list of what it does and leave the entire function I'm using in production below. For now we'll keep using the site caseData query as an example.

  1. From the caseData resolver use the same relationship array resolveArguments from above does to get the parent model.
  2. Get the relationValue (siteId) from the parent object. (Site.siteId)
  3. Run a findUnique on the parent model (Site) where the parentColumnName is equal to relationValue.
  4. Chain the current table (caseData) to that findUnique query to run them together.

What you end up with is something like this:

Prisma.Site.findUnique({where: {siteId}}).CaseData()

This is created for every site found on the parent query. Remember, Prisma batches findUnique queries. So by generating a findUnique query for every N results from sites, Prisma will batch them together using SQL's IN keyword as seen above.

So the issue is just generating this based on the GraphQL info object. Meaning, we want our resolver to return something like this:

return resolveFindMany({
  context,
  info,
  modelName: AmsData.ModelName.CaseData,
  parent,
  relationInfo,
  resolvedArguments,
});

We're again passing along the default argument on every Apollo query but also including a relationInfo array and some resolvedArguments (using the above mentioned code).

This is the messy TypeScript confused function I currently use (totally functional and performant):

Last Words

GraphQL is a huge booster to any API. It makes for a good Gateway when using a lot of third-party API's. It's more performant, and easier to work with and maintain than a REST API.

However, that doesn't mean everything is done for you out of the box. GraphQL is a specification, nothing more. You can use services like Hasura to generate GraphQL API's but if you're looking to build your own, it does involve writing code.

The initial perception a lot of people have of it is that it somehow just magically does everything for you. Not recognizing that all it can do is map requests to resolvers and return data based on that.

I've had to listen to professional developers get confused over how authentication works with GraphQL... as if REST we're handling it for them previously. The internet still works the same when you're using GraphQL. Headers and post data still exist.

The point of this post was to share how I've been able to generate efficient queries with Prisma using GraphQL queries. As this is one of the things that make people skeptical of GraphQL I will say I found making these sort of programmatic abstractions based on incoming data way easier when I have access to GraphQL's info object and Prisma's batching capabilities.