Improving entity performance Part 2 (EJB 3)

Improving query performance

Even trivial applications can make extensive use of queries. When using EJB 3, you write queries in JPQL and they are translated to SQL. Although you develop in JPQL, you can take certain actions to ensure that the corresponding queries will perform well. DBAs can certainly play a big part in helping improve queries.

You may also be interested in enabling a higher level of logging in your persistence provider to expose and capture the generated SQL statements, and run them through a "tuning utility" provided by your database vendor. This can help you determine whether the SQL can be improved. Certain databases provide an automatic SQL tuning utility that provides suggestions for improving the SQL executed by an application. You can work with your DBA to use such tools and get their recommendations on how to improve query performance.

There is no magic sequence of steps that address all query issues, but we’ll discuss some of the more common scenarios.

Avoiding full-table scans

Unless your entity is mapped to a very small table, you must avoid using SELECT statements that perform full-table scans. For example, you can retrieve all items in a query like this:

tmp6283_thumb_thumb


The query will be much faster and you don’t have to do any extra filtering work in the middle tier.

tmp6284_thumb_thumb

Next, you retrieve the returned collection, iterate through the collection, and perform one or more operations on the resulting data. The persistence provider will generate the following SQL:

tmp6285_thumb_thumb

There are two problems here. First, this code will retrieve and bring in a lot of rows into the middle tier and consume a lot of memory. Second, it will cause a FULL TABLE SCAN in your database and the query will be very slow. Your DBA will advise you to avoid such SQL. Realistically, the number of available items you want is much less than the total number of items in your database. You must utilize the full potential of database filtering by changing your query to limit the number of rows retrieved as follows:

Using indexes to make queries faster

Indexes make your query faster. Your DBAs are probably responsible for building the indexes on tables, but there’s a good chance that they don’t know the details of how your application works. You should work with them so that they understand the queries used by your application. Only then can they build appropriate indexes for your application. Queries that include the primary key always use an indexed scan, meaning that no additional indexes are required. In spite of this, here are some additional cases where you’ll want to use an index to improve performance.

Filtering based on a nonidentity field

This is very prevalent in applications. For example, suppose you want to retrieve your Item entities by itemTitle as follows:

tmp6286_thumb_thumb

This JPQL statement will be translated to SQL as follows:

tmp6287_thumb_thumb

If you do not have an index on ITEM_TITLE, the query will include a FULL TABLE scan. Therefore, we recommend you create an index in these situations. In this case, the index would be created on item_title.

Using indexes for relationship fields

Relationships are implemented in the database by using foreign key constraints. However, you may not have indexes on the foreign key column(s). When you retrieve an associated entity, a join between the two underlying tables is performed. But this is slow because a FULL TABLE sCAN will be executed on the associated entities. Creating an index on the underlying table will allow the database to use that index while joining the tables, which is must faster than joining two large tables without an index.

In ActionBazaar, Item and Bid have a one-to-many relationship due to eager loading. If you have a JPQL query that uses a JOIN clause, the persistence provider could generate the following SQL statement:

tmp6288_thumb_thumb

If we assume that there is no index on the bids.item_id, the Oracle database handles this SQL statement like so:

tmp6289_thumb_thumb

If you add an index on the item_id column for the bids table, you’ll see the query plan for our SQL statement change as follows:

tmp6290_thumb_thumb

Review your queries with your DBAs and they should be able to determine whether adding an index for a column makes sense.

Ordering a collection of entities

You can order the entities retrieved in a collection by using the @OrderBy annotation. In ActionBazaar, if we want to retrieve Bids in descending order of bidPrice, we can do this:

tmp6291_thumb_thumb

Ordering of rows is an expensive operation in the database, especially when a lot of rows are retrieved. Therefore, it doesn’t always make sense to retrieve the entities in an ordered collection. Unless another sort order is required, let’s set the default order using a JPQL query as follows:

tmp6292_thumb_thumb

The database will try to order the matching records by BID_PRICE. We expect your DBA will agree that adding an index on BID_PRICE for the BIDS table will improve query performance.

Using functions in the WHERE clause of JPQL

You can use JPQL functions in the WHERE clause of a query. For example, you can create a JPQL query as follows:

tmp6293_thumb_thumb

This statement will be translated to SQL as follows:

tmp6294_thumb_thumb

tmp6295_thumb_thumb[2]

Check the documentation for your persistence provider to find out whether setting the JDBC fetch size is supported.

Remember that when you use a function in the WHERE clause, the database won’t use an indexed scan, even if an index exists on the first_name column. Because of this, you should avoid using functions in the WHERE clause. Some databases support creating function-based indexes and you can use them if needed. Just be aware that the function-based indexes may not be portable.

For our example, you could consider storing the names in uppercase instead of using the JPQL Upper function.

Reducing round-trips to the database

If your query retrieves a lot of entities, that means a lot of rows are retrieved from the database, and this translates into multiple round-trips between your application and the database. Some JDBC drivers provide facilities to reduce the number of round-trips to the database by setting the number of rows to be prefetched in the middle tier while a result set is being populated. This improves performance of queries that retrieve a large number of rows.

Caching

The EJB 3 JPA does not require persistence providers to do any type of caching. On the other hand, one of the primary benefits of using most O/R mapping frameworks is that they provide a certain level of caching to reduce trips to the database. Some people think caching is the solution for every performance problem. The reality is that improper use of caching may lead to stale data and a whole different set of performance issues. Before jumping into using a cache, you need to understand how your vendor supports caching.

In most cases, you can improve performance of your applications with the appropriate use of caching. Most persistence providers support caching either entity objects, queries, or both.

Caching probably makes sense for data that is read-only or is not frequently updated (read-mostly). For example, in ActionBazaar some entities such as Category rarely change. Knowing this helps us decide that it makes sense to cache Catalog entity objects.

Some queries may always result in the same data within a specific time interval. For example, a named query findFiveMostPopularitems may always return the same set of entities for a four- to five-hour interval. You may wish to cache the results of that query because almost all users of ActionBazaar would probably wish to see the most popular items.

The caching types you can use with an EJB 3 JPA provider can be broken into three levels, as shown in figure 13.3.

You may use caching of objects at three levels: 1) the transactional cache, which is made available by the persistence provider within a transaction to reduce database round-trips; 2) the extended persistence context, which you can use as a caching mechanism with stateful session beans; and 3) the persistence unit level cache (if provided by the persistence provider), which is a shared cache that all clients of the persistence unit can use.

Figure 13.3 You may use caching of objects at three levels: 1) the transactional cache, which is made available by the persistence provider within a transaction to reduce database round-trips; 2) the extended persistence context, which you can use as a caching mechanism with stateful session beans; and 3) the persistence unit level cache (if provided by the persistence provider), which is a shared cache that all clients of the persistence unit can use.

Keep these three levels in mind when evaluating possible options to put in your cache. Try moving items from one cache to another if possible, to determine which configuration works best for your specific application and runtime usage patterns.

Transactional cache

Transactional caching ensures that the object from the cache is returned when the same object is requested again. A typical example is that you run a query that returns an Item entity, and that entity will be cached in the transactional cache. When you use the EntityManager.find method to retrieve the Item again, the persistence provider will return the same Item instance from the transactional cache.

The other benefit of a transactional cache is that all updates to an entity are deferred to the end of the transaction. Imagine what would happen if you did the following in the same transaction:

tmp6297_thumb_thumb

If your persistence provider doesn’t use a cache and doesn’t defer the commit until the end of the transaction, then it probably will perform what translates to at least two SQL statements. First, it will execute a SQL INSERT to persist the Item. This will be followed by a SQL UPDATE to modify the initialPrice. Most persistence providers will make use of a cache and then execute a single insert that will take the new price into account.

Hibernate calls this cache its first level or session cache, and TopLink calls it the UnitOfWork cache. Both of these products enable these caches by default. Check whether your persistence provider supports this type of cache. We aren’t aware of any reason you’d want to disable the transactional cache.

Using an extended persistence context

The transaction cache will demarcate a single transaction. This could be a problem if your application needs to maintain conversational state between different method calls, since it will require multiple round-trips to the database to retrieve the same entity. You can avoid this situation by using an extended persistence context.

You may remember from our discussions in topic 9 that only stateful session beans support extended persistence contexts. They allow you to keep entity instances managed beyond single method calls and transaction boundaries. You can use extended persistence context as a cache for your stateful bean instance during the state of conversation, which can reduce the number of round-trips to your database. Additionally, all database updates resulting from persist, merge, remove, and so forth are queued until the end of the persistence context, reducing the number of database operations.

Listing 13.1 shows how you can use an extended persistence context for caching entities between method calls of your application.

Listing 13.1 Using an extended persistence context to cache objects across method calls

Listing 13.1 Using an extended persistence context to cache objects across method callsListing 13.1 Using an extended persistence context to cache objects across method calls

The PlaceOrderBean uses an extended EntityManager by setting the Persistence-ContextType to EXTENDED. The persistence context will live during the entire state of conversation and will be destroyed when the bean instance is destroyed or removed ©. You can store entities as the instance variables of the stateful session bean O and values for the instances are set by different methods C, and the entities are managed during the lifespan of the extended EntityManager and can be used without having to be detached at the end of the method call. Entity operations such as merge Q can be performed outside the scope of a transaction since we have set the default transaction attribute for the bean to not_supported. The database updates resulting from these operations are queued up and performed when the persistence context is associated with a transaction f. This reduces the number of round-trips to the database.

However, you have to be careful when your stateful session bean invokes another EJB such as a stateless session bean—there are several limitations related to the propagation of extended persistence contexts.

Persistence unit level cache

The transactional and persistence context caching mechanisms can only be used with a single client and cannot be shared. You’ll probably see a real performance boost when entity instances in the cache are shared by multiple clients, thus reducing trips to the database for all of them. You can call this an application cache, but we call it a PersistenceUnit cache because entities are scoped in a persistence unit. Hibernate calls this a second-level cache, and you need to configure an external cache provider to take advantage of this second level or session factory level cache. TopLink refers to it as a session cache, and it’s integrated with the persistence provider. In addition, the TopLink implementation provides several options for configuring this cache. Review your vendor documentation to see whether it provides the ability to cache objects out of the box.

You can either cache entities or queries in the PersistenceUnit cache. When you retrieve some entities using a query, those entities will be cached. If you try to retrieve a cached entity by using the EntityManager’s find method, then the entity instance will be returned from the cache. Typically, persistence providers store the entities in the cache using their identities, so you must use the find method to retrieve an entity by its primary key.

If your EJB 3 persistence provider supports caching of entities in a Persistence-Unit cache, it is probably done with a vendor-specific name-value pair of properties in a configuration file. In the following example we are trying to cache 5000 instances of the Category entity in TopLink Essentials:

tmp6300_thumb[2]

If you want to cache a query result, then you probably want to do it on a per-query basis, either stored in an external configuration or as a QueryHint for the query. Check your persistence provider documentation to determine how it supports the caching of queries.

Here is an example of how TopLink Essentials can be used for caching in a named query using @QueryHint:

tmp6301_thumb[2]

You can also provide a hint to refresh the cache from the database with a query like this:

tmp6302_thumb[2]

Now that you have seen some examples of caching, let’s discuss some caching best practices.

Read-only entities

You can significantly improve application performance by using read-only entities. Examine your applications to determine whether any entities may be made read-only. For example, in our ActionBazaar system, office locations change very rarely and we can make the ShippingType entity read-only. Unfortunately, like caching features, the read-only feature is a vendor extension and you have to depend on either a vendor-specific API or configuration to mark an entity as readonly. Normally, read-only entity instances will be loaded into the PersistenceUnit cache and never discarded.

Read-only entities will significantly improve performance because persistence providers won’t calculate the change set for the read-only entity, and no clone or merge operations will be performed against a read-only entity.

Caching best practices

Understanding your applications and checking your caching strategy usually makes sense for your applications. The cache is best used for entities that are not frequently updated or that are read only. Make sure that table data is updated only by the application that is using the cache, because if any external applications update the same cache, someone working with the cache will end up with stale data.

Check your persistence provider for details on how they support caching entities and queries; the caching mechanism varies from provider to provider. For example, one provider may follow this rule: if an entity is updated in a transaction, then the entity will be updated in the cache when the transaction is committed to the database. However, some persistence providers may choose to expire the cache instead.

Stress-test your applications with and without a cache. It’s the only way to determine if using the cache, and more specifically the particular cache configuration, will improve or degrade your applications’ performance.

That concludes our discussion on improving EJB 3 entity performance. Are you feeling faster yet? Even if you’re not, your entities should be. Next, let’s look at how you can improve the performance of session and message-driven beans.

Next post:

Previous post: