Planet CQ

April 24, 2024

Things on a content management system - Jörg Hoh

AEM CS & Mongo exceptions

If you are an avid log checker on your AEM CS environments you might have come across messages like this in your authoring logs:

02.04.2024 13:37:42:1234 INFO [cluster-ClusterId{value='6628de4fc6c9efa', description='MongoConnection for Oak DocumentMK'}-cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net:27017] org.mongodb.driver.cluster Exception in monitor thread while connecting to server cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net:27017 com.mongodb.MongoSocketException: cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net 
at com.mongodb.ServerAddress.getSocketAddresses(ServerAddress.java:211) [org.mongodb.mongo-java-driver:3.12.7]
at com.mongodb.internal.connection.SocketStream.initializeSocket(SocketStream.java:75) [org.mongodb.mongo-java-driver:3.12.7]
...
Caused by: java.net.UnknownHostException: cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net

And you might wonder what is going on. I get this question every now and then, often with the assumption that this is something problematic. Because we have all learned that stacktraces normally indicate problems. And at first sight this does look like a problem: a specific hostname cannot be resolved. Is there a DNS problem in AEM CS?

Actually this message does not indicate any problem. The reason behind it is the way MongoDB implements scaling operations. If you up- or downscale the Mongo cluster, this does not happen in-place; instead you get a new Mongo cluster of the new size, with the same content. And this new cluster comes with a new hostname.

So in this situation there was a scaling operation: AEM CS connected to the new cluster and then lost the connection to the old cluster, because the old cluster was stopped and its DNS entry removed. Which is of course expected. And for that reason the message is logged on level INFO, and not as an ERROR.

Unfortunately this log message is created by the Mongo driver itself, so it cannot be changed on the Oak level, for example by removing the stacktrace or rewording the message. And for that reason you will continue to see it in the AEM CS logs, until an improved Mongo driver changes that.
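If you want to double-check that such entries are harmless, a quick scan of the log can help. Below is a minimal sketch in Python which counts the hostnames these messages refer to; the log file name and layout are assumptions based on the example above, and you should only see the hostname of the decommissioned cluster in the output:

    import re
    from collections import Counter

    # Matches the INFO message shown above and captures the unresolvable hostname.
    # The file name "aemerror.log" is an assumption for a locally downloaded log.
    pattern = re.compile(r"org\.mongodb\.driver\.cluster .*MongoSocketException: (\S+)")

    hosts = Counter()
    with open("aemerror.log", encoding="utf-8") as log:
        for line in log:
            match = pattern.search(line)
            if match:
                hosts[match.group(1)] += 1

    for host, count in hosts.most_common():
        print(f"{count:6d}  {host}")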

by Jörg at April 24, 2024 10:52 AM

March 04, 2024

Things on a content management system - Jörg Hoh

Performance test modelling (part 5)

This is part 5 and the final post of the blog post series about performance test modelling; see part 1 for an overview and the links to all articles of this series.

In the previous posts I discussed the impact of the system under test, how the modelling of the test and the test content influences the result of the performance test, and how you implement the most basic performance test scenario.

In this blog post I want to discuss the predicted result of a performance test versus its actual outcome, and what you can do when these do not match (actually they rarely do on the first execution). I also want to discuss the situation where, after go-live, you find that the performance test delivered the expected results but did not match the observed behavior in production.

Scenario 1: The performance test does not match the expected results

In my experience every performance test, no matter how good or bad the basic definition is, contains at least 2 relevant data points:

  1. the number of concurrent users (we discussed that already in part 1)
  2. and an expected result, for example that the transaction must be completed within N seconds.

What if you don’t meet the performance criteria in point 2? This is typically the time when customers on AEM as a Cloud Service start to raise questions to Adobe about the number of pods, hardware details etc., as if the problem can only be the hardware sizing on the backend. If you don’t have a clear understanding of all the implications and details of your performance tests, this often seems to be the most natural thing to ask.

But if you have built a good model for your performance test, your first task should be to compare the assumptions with the results. Do you have your expected cache-hit ratio on the CDN? Were some assumptions in the model overly optimistic or pessimistic? As you have actual data to validate your assumptions you should do exactly that: go through your list of assumptions and check each one of them. Refine them. And when you have done that, modify the test and start another execution.
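How such a check looks in practice depends on your CDN and its log format. As a minimal sketch (assuming a space-separated CDN log with a cache-status field containing values like HIT/MISS/PASS; the field position, file name and the assumed ratio are placeholders), you could compare the measured hit ratio with the value written down in your model:

    from collections import Counter

    ASSUMED_HIT_RATIO = 0.90            # the value documented in the test model

    def cache_hit_ratio(log_lines, status_field=8):
        """Count cache statuses and return (hit ratio, counts)."""
        counts = Counter()
        for line in log_lines:
            parts = line.split()
            if len(parts) > status_field:
                counts[parts[status_field]] += 1
        total = sum(counts.values())
        return (counts.get("HIT", 0) / total if total else 0.0), counts

    with open("cdn-access.log") as log:                 # hypothetical file name
        ratio, counts = cache_hit_ratio(log)

    print(f"measured hit ratio: {ratio:.2%} (assumed: {ASSUMED_HIT_RATIO:.0%})", counts)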

And at some point you might come to the conclusion that all assumptions are correct: you have the expected cache-hit ratio, but the latency of the cache misses is too high (in which case the required action is performance tuning of individual requests). Or you have already reduced the cache MISSes (and cache PASSes) to the minimum possible and the backend is still not able to handle the load (in which case the expected outcome is an upscale); or it can be both.

That’s fine, and then it’s the perfect time to talk to Adobe and share your test model, execution plan and results. I wrote in part 1:

As you can imagine, if I am given just a few diagrams with test results and test statistics as preparation for this call with the customer … this is not enough, and very often more documentation about the test is not available. Which often leads to a lot of discussions about some very basic things and that adds even more delay to an already late project and/or bad customer experience.

But in this situation, when you have a good test model and have done your homework already, it’s possible to have a meaningful discussion directly, without the need to uncover all the hidden assumptions first. Also, if you have that model at hand, I assume that performance tests are not an afterthought, and that there are still reasonable options to make some changes which will either completely fix the situation or at least remediate the worst symptoms, without impacting the go-live and the go-live date too much.

So while this is definitely not the outcome we all work, design, build and ultimately hope for, it’s still much better than the second scenario below.

I hope that I don’t need to talk about unrealistic expectations in your performance tests, for example demanding a p99.9 of 200 ms latency while at the same time requiring that a good number of requests are always handled by the AEM backend. You should have detected such unrealistic assumptions much earlier, mostly during design and then in the first runs during the evolution phase of your test.

Scenario 2: After go-live the performance is not what it’s supposed to be

In this scenario a performance test was either not done at all (don’t blame me for it!) or the test passed, but its results did not match the observed reality. This often shows up as outages in production or unbearable performance for users. This is the worst-case scenario, because everyone assumed the contrary, as the performance test results were green. Neither the business nor the developer team is prepared for it, and there is no time for any mitigation. This normally leads to an escalated situation with conference calls, involvement from Adobe, and in general a lot of stress for all parties.

The entire focus is on mitigation, and we (I am speaking now as a member of the Adobe team, which is often involved in such situations) will try to do everything to mitigate that situation by implementing workarounds. As in many cases the most visible bottleneck is on the backend side, upscaling the backend is indeed the first task. And often this helps to buy you some time to perform other changes. But there are even cases where an upscale of 1000% would be required to somehow mitigate the situation (which is possible, but also very short-lived, as every traffic spike on top will require an additional 500% …); also it’s impossible to speed up a single-threaded request taking 20 seconds by adding more CPU. These cases are not easy to solve, the workaround often takes quite some time and is often very tailored; and there are cases where a workaround is not even possible. In any case it’s normally not a nice experience for any of the involved parties.

I refer to all of these actions as “workarounds“. In bold. Because they are not the solution to the challenge of performance problems. They cannot be a solution, because this situation proves that the performance test was testing some scenarios, but not the scenario which shows up in the production environment. It also raises valid concerns about the reliability of other aspects of the performance tests, and especially about the underlying assumptions. Anyway, we are all trying to do our best to get the system back on track.

As soon as the workarounds are in place and the situation is somehow mitigated, 2 types of questions will come up:

  1. What does a long-term solution look like?
  2. Why did that happen? What was wrong with the performance test and the test results?

While the response to (1) is very specific (and definitely out of scope of this blog post), the response to (2) is interesting. If you have a well-documented performance test model, you can compare its assumptions with the situation in which the production performance problem happened. You have the chance to spot the incorrect or missing assumption, adjust your model and then the performance test itself. And with that you should be able to reproduce your production issue in a performance test!

And if you have a failing performance test, it’s much easier to fix the system and your application, and to apply specific changes which make this failing test pass. It gives you much more confidence that you changed the right things to make the production environment handle the same situation in a much better way. Interestingly, this also answers question (1) to a large extent.

If you don’t have such a model in this situation, you are badly off. Because then you either start building the performance test model and the performance test from scratch (which takes quite some time), or you switch to the “let’s test our improvements in production” mode. Most often the production testing approach is used (along with some basic testing on stage to avoid making the situation worse), but even that takes time and a high number of production deployments. While you can say it’s agile, others might say it’s chaos and hoping for the best… the actual opposite of good engineering practice.

Summary

In summary, when you have a performance test model, you are likely to have fewer problems when your system goes live. Mostly because you have invested time and thought into that topic. And because you acted on it. It will not prevent you from making mistakes, forgetting relevant aspects and such, but if that happens you have a good basis to quickly understand the problem and a good foundation to solve it.

I hope that you learned in these posts some aspects of performance tests which will help you to improve your test approach and test design, so you ultimately have fewer unexpected problems with performance. And if you have fewer problems with that, my life in the AEM CS engineering team is much easier 🙂

Thanks for staying with me throughout this first planned series of blog posts. It was a bit of an experiment, although the structure required by this topic led to some interesting additions to the overall outline (the first outline covered just 3 posts, now we are at 5). But I think even that is not enough; some aspects deserve a blog post of their own.

by Jörg at March 04, 2024 09:22 PM

February 26, 2024

Things on a content management system - Jörg Hoh

Performance test modelling (part 4)

This is the 4th post of the blog post series about performance test modelling; see part 1 for an overview and the links to all articles of this series.

In the parts 2 and 3 I outlined relevant aspects when it comes to model your performance tests:

  • The modelling of the expected load, often expressed as “concurrent users”.
  • The realistic modelling of the system where we want to conduct the performance tests, mostly regarding the relevant content and data.

In this blog post I want to show how you deduce from that data which specific scenarios you should cover with performance tests. Because there is no single test which tells you whether the resulting end-user performance is good or not.

The basic performance test scenario

Let’s start with a very simple model, where we assume that the traffic rate is more or less identical for the whole day; and therefore the performance test resembles that model:

At first sight this is quite simple to model, because your performance test will execute requests at a constant rate for the whole period of time.

But as I outlined in part 3, even if it seems that simple, you have to include at least some background noise. You also have to take into account that the cache-hit ratio is poor at the beginning, so you have to implement a cache-warmup phase (normally implemented as a ramp-up phase, in which the load increases up to the planned plateau) and only start to measure after that.

So our revised plan rather looks like this:

Such a test execution (with the proper modelling of users, requests and requested data) can give you pretty good results if your model assumes a pretty constant load.

What if your model requires a much more fluctuating request rate, for example if your users/visitors are primarily located in North America, you have almost no traffic during the night, and traffic starts to increase heavily in the American morning hours? In that case you probably model the warmup so that it resembles the morning increase in traffic, both in frequency and rate. That shouldn’t be hard, but it requires a bit more explicit modelling than just a simple ramp-up.
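To make this a bit more concrete, here is a minimal sketch of such a ramp-up-then-plateau load shape, assuming Locust as the load testing tool; the URL, user count and durations are placeholders which have to come from your model:

    from locust import HttpUser, LoadTestShape, between, task

    class SiteVisitor(HttpUser):
        wait_time = between(1, 5)        # think time between page loads

        @task
        def browse(self):
            self.client.get("/")          # placeholder; use pages from your model

    class RampUpThenPlateau(LoadTestShape):
        warmup = 30 * 60                  # 30 minutes ramp-up / cache warming
        plateau = 2 * 60 * 60             # 2 hours at the planned plateau
        target_users = 200                # placeholder "concurrent users"

        def tick(self):
            run_time = self.get_run_time()
            if run_time < self.warmup:
                users = max(1, int(self.target_users * run_time / self.warmup))
                return users, 10
            if run_time < self.warmup + self.plateau:
                return self.target_users, 10
            return None                   # stop the test

A morning-traffic-style warmup is then just a different curve in the tick() method.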

To give you some practical hints towards some basic parameters:

  • Such a performance test should run at least 2-3 hours, and even if you see that the results are not what you expect, not terminating it can reveal interesting results.
  • The warmup phase should cover at least 30 minutes; not only to give the caches time to warm up, but also to give the backend systems time to scale to their “production sizing”; when you don’t execute performance tests all the time, the system might have scaled down, because there is no sense in having many systems idle when there is no load.
  • It can make sense not to start with 100% of the targeted load, but with smaller numbers and increase from there. Because only then can you see which bottleneck your test hits first. If you start right away with 100% you might just see a lot of things blocking, but you don’t know which one is the most impeding.
  • When you are implementing a performance test in the context of AEM as a Cloud Service, I recommend also using my checklist for performance testing on AEM CS, which gives some more practical hints on how to get your tests right; a few aspects covered there are covered in more depth in this post series as well.

When you have such a test passing, the biggest part of the work is done; and based on your model you can execute a number of different tests to answer more questions.

Variations of the basic performance test

The above model just covers a totally average day. But of course it’s possible to vary the created scenario to answer some more questions:

  • What happens if the load of the day is not 100%, but for some reasons 120%, with identical assumptions about user behavior and traffic distribution? That’s quite simple, because you just increase a number in the performance test.
  • The basic performance test runs just for a few hours and stops then. It gives you the confidence that the system can operate for at least that many hours, but a few issues might go unnoticed. For example memory leaks accumulating over time might only become visible after many hours of load. For that reason it makes sense to run your test for 24-48 hours continuously to validate that there is no degradation over that time.
  • What’s the behavior when the system goes into overload? An interesting question (but only if it does not break already when hitting the anticipated load) which is normally answered by a break test: you increase the load more and more, until the situation really gets out of hand (see the sketch below). If you have enough time, that’s indeed something you can try, but let’s hope that’s not very relevant 🙂
  • How does the system behave when your backend systems are not available? What if they come online again?

And probably many more interesting scenarios which you can think of. But you should only perform these when you have the basic test right.
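For the break test mentioned in the list above, the same load-shape idea can be reused: instead of holding a plateau you keep adding users step by step until the system gives in. A minimal sketch, again assuming Locust (reusing a user class like the SiteVisitor from the earlier sketch), with all numbers as placeholders:

    from locust import LoadTestShape

    class StepwiseBreakTest(LoadTestShape):
        step_users = 50                  # users added per step
        step_duration = 10 * 60          # hold each step for 10 minutes
        max_users = 2000                 # safety limit so the test eventually stops

        def tick(self):
            step = int(self.get_run_time() // self.step_duration) + 1
            users = step * self.step_users
            if users > self.max_users:
                return None              # stop once the safety limit is reached
            return users, 20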

When you have your performance tests passing, the question is still: How does it compare to production load? Are we actually testing the right things? In the next and last post of this series I will cover the options you have when the performance test does not match the expected results, and also the worst-case scenario: What happens if you find out after go-live that your performance tests were good, but the production environment behaves very differently?

by Jörg at February 26, 2024 01:53 PM

February 20, 2024

Things on a content management system - Jörg Hoh

CDN and dispatcher – 2 complementary caching layers

I sometimes hear the question how to implement cache invalidation for the CDN. Or the question is why AEM CS still operates with a dispatcher layer when it now has a more powerful CDN in front of it.

The questions are very different, but the answer is the same in both cases: the CDN is no replacement for the dispatcher, and the dispatcher does not replace the CDN. They serve different purposes, and the combination of the two can be a really good package. Let me explain this.

The dispatcher is a very traditional cache. It fronts the AEM systems and the cache status is actively maintained by cache invalidation, so it always delivers current data. But from an end-user perspective this cache is often far away in terms of network latency. If my AEM systems are hosted in Europe, and end-users from Australia are reaching them, the latency can get huge.

The CDN is the opposite: it serves content from many locations across the world, as close to the end-user as possible. But CDN cache invalidation is cumbersome, and for that reason most often TTL-based expiration is used. That means you have to accept the chance that new content is already available, but the CDN still delivers old content.

Not everyone is happy with that; and if that’s a real concern, short TTLs (in the range of a few minutes) are the norm. That means that many files on the CDN will get stale every few minutes, which results in cache misses; and a cache miss on the CDN goes back to origin. But of course the reality is that not many pages change every 10 minutes; actually very few. But customers want that low TTL just in case a page was changed, and that change needs to get visible to all end-users as soon as possible.

So you have a lot of cache misses on the CDN, which trigger a re-fetch of the file from origin, and because many of the files have not changed, you re-fetch exactly the same binary which got stale seconds ago. Actually a waste of resources, because your origin system delivers the same content over and over again to the CDN as a consequence of these misses. So you could keep your AEM instances busy all the time, re-rendering the same requests over and over, always creating the same response.

Enter the dispatcher cache, fronting the actual AEM instance. If the file has not changed, the dispatcher will deliver the same file (or just HTTP 304 Not Modified, which even avoids sending the content again). And it’s fast, much faster than letting AEM render the same content again. And if the file has actually changed, it’s rendered once and then reused for all future CDN cache misses.
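To illustrate what such a revalidation between CDN and origin looks like, here is a minimal sketch using Python's requests library against a placeholder URL; whether you actually get an ETag or Last-Modified header depends on your setup:

    import requests

    url = "https://www.example.com/content/site/en.html"   # placeholder URL

    first = requests.get(url)
    etag = first.headers.get("ETag")

    # A CDN revalidating a stale object sends a conditional request; if the file
    # has not changed, the dispatcher can answer "304 Not Modified" without
    # re-rendering in AEM and without sending the body again.
    second = requests.get(url, headers={"If-None-Match": etag} if etag else {})
    print(second.status_code)   # 304 if unchanged (and an ETag was sent), otherwise 200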

The combination of these 2 caching approaches helps you to deliver content from the edge while at the same time having a reasonable latency for content updates (that means the time between replicating a change to the publish instances until all users across the world can see it), without the need for a huge number of AEM instances in the background.

So as a conclusion, using the CDN and the dispatcher cache is a good combination, if set up properly.

by Jörg at February 20, 2024 05:22 PM

February 09, 2024

Things on a content management system - Jörg Hoh

Performance tests modelling (part 3)

This is post 3 in my series about Performance Test Modelling. See the first post for an overview of this topic.

In the previous 2 posts I discussed the importance of having a clearly defined model of the performance tests, and that a good definition of the load factors (typically measured in “concurrent users”) is required to build a realistic test.

In this post I cover the influence of the test system and test data on the performance test and its result, and why you should spend effort to create a test with a realistic set of data/content. We will do a few thought experiments, and to judge the result of each experiment we will use the cache-hit ratio of a CDN as a proxy metric.

Let’s design a performance test for a very simple site: It just consists of 1 page, 5 images and 1 CSS and 1 JS file; 8 files in total. Plus there is a CDN for it. So let’s assume that we have to test with 100, 500 and 1000 concurrent users. What’s the test result you expect?

Well, easy. You will get the same test result for all tests irrespective of the level of concurrency; mostly because after the first requests all files will be delivered from the CDN. That means no matter with what concurrency we test, the files are delivered from the CDN, which we assume will always deliver very fast. We do not test our system, but rather the CDN, because the cache-hit ratio is quite close to 100%.

So what’s the reason to do this test at all, knowing that it just validates the performance promises of the CDN vendor? There is none. The only reason why we would ever execute such a test is that during test design we did not pay attention to the data which we use to test, and someone decided that these 8 files are enough to satisfy the constraints of the performance test. But the results do not tell us anything about the performance of the site, which in production will consist of tens of thousands of distinct files.

So let us do a second thought experiment: this time we test with 100,000 files, 100 concurrent users requesting these files randomly, and a CDN which is configured to cache files for 8 hours (TTL=8h). With regard to the cache-hit ratio, what is the expectation?

We expect that the cache-hit ratio starts low for quite some time; this is the cache-warming phase. Then it starts to increase, but it will never hit 100%, as after some time cache entries will expire on the cache and start to produce cache misses. This is a much better model of reality, but it still has a major flaw: in reality, requests are not randomly distributed; normally there are hotspots.

A hotspot consists of files which are requested much more often than average. Normally these are homepages or other landing pages, plus other pages which users are normally directed to. This set of files is usually quite small compared to the total number of files (in the range of 1-2%), but it makes up 40-60% of the overall requests; you can easily assume a Pareto distribution (the famous 80/20 rule), where 20% of the files are responsible for 80% of the requests. That means we have a hotspot and a long-tail distribution of the requests.

If we modify the same performance test to take that distribution into account, we end up with a higher cache-hit ratio, because now the hotspot can be delivered mostly from the CDN. On the long tail we will have more cache misses, because those files are requested so rarely that they can expire on the CDN without being requested again. But in total the cache-hit ratio will be better than with the random distribution, especially on the often-requested pages (which are normally the ones we care about most).
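This kind of thought experiment can even be made executable. The following is a rough simulation sketch (all numbers are assumptions matching the examples above, not measurements) which compares the cache-hit ratio of an 8-hour TTL cache under a purely random request distribution with a hotspot distribution where half of the requests go to 1% of the files:

    import random

    def simulate(num_files=100_000, requests_per_hour=50_000, hours=24,
                 ttl_hours=8.0, hotspot=False, seed=42):
        """Very rough model of a TTL-based CDN cache; returns the cache-hit ratio."""
        rng = random.Random(seed)
        expires = {}                  # file id -> hour at which the cached copy expires
        hits = misses = 0
        for step in range(hours * requests_per_hour):
            now = step / requests_per_hour
            if hotspot and rng.random() < 0.5:
                file_id = rng.randrange(num_files // 100)   # hotspot: 1% of the files
            else:
                file_id = rng.randrange(num_files)          # long tail: any file
            if expires.get(file_id, -1.0) > now:
                hits += 1
            else:
                misses += 1
                expires[file_id] = now + ttl_hours
        return hits / (hits + misses)

    print("random  :", round(simulate(hotspot=False), 3))
    print("hotspot :", round(simulate(hotspot=True), 3))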

Let’s translate this into a graph which displays the response time.

This test is now quite realistic, and if we only focus on the 95th percentile (p95; that means that if we take 100 requests, 95 of them are faster than this value) the result would meet the criteria; but beyond that the response time gets higher, because there are a lot of cache misses.
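As a side note, the percentile itself is easy to compute from the measured response times; a tiny sketch with made-up numbers, where 95% of the samples are fast cache hits and the slowest 5% are cache misses:

    def p95(latencies_ms):
        """Return the value below which 95% of the measured response times fall."""
        ordered = sorted(latencies_ms)
        index = max(0, int(len(ordered) * 0.95) - 1)
        return ordered[index]

    # 95 fast responses (cache hits) and 5 slow ones (cache misses), all made up
    samples = ([90, 95, 100, 105, 110, 115, 120, 125, 130, 135,
                140, 145, 150, 155, 160, 165, 170, 175, 180] + [2400]) * 5
    print(p95(samples), "ms")      # prints 180 ms; the slow 5% stay invisible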

This level of realism in the test results comes with a price: the performance test model as well as the test preparation and execution are much more complicated now.

Until now we only considered users, but what happens when we add random internet noise and the search engines (the unmodelled users from an earlier part of this series) to the scenario? These will add more (relative) weight to the long tail, because these requests do not necessarily follow the usual hotspots; we rather have to assume a more random distribution for them.

That means that the cache-hit ratio will be lower again, as there will be many more cache misses now; and of course this will also increase the p95 response time. And: it will complicate the model even further.

So let’s stop here. As I have outlined above, the simplest model is totally unrealistic, but making it more realistic makes the model more complex as well. And at some point the model is no longer helpful, because we cannot transform it into a test setup without too much effort (creating test data/content, complex rules to implement the random and hotspot-based requests, etc). That means that especially in the case of test data and test scenarios we need to find the right balance between the investment we want to make into tests and how closely they should mirror reality.

I also tried to show you how far you can get without doing any kind of performance test. Just based on some assumptions we were able to build a basic understanding of how the system will behave, and how some changes of the parameters will affect the result. I use this technique a lot, and it helps me to quickly refine models and define the next steps or the next test iteration.

In the next post I will discuss various scenarios which you should consider in your performance test model, including some practical recommendations on how to include them in your test.

by Jörg at February 09, 2024 01:57 PM

February 01, 2024

Things on a content management system - Jörg Hoh

Performance tests modelling (part 2)

This is the second blog post in the series about performance test modelling. You can find the overview of this series and links to all its articles in the post “Performance tests modelling (part 1)“.

In this blog post I want to cover the aspect of “concurrent users”, what it means in the context of a performance test and why it’s important to clearly understand its impact.

“Concurrent users” is an often-used measure to indicate the load put on a system, expressed as the number of users using that system at the same time. For that reason many performance tests state as their quantitative requirement: “The system should be able to handle 200 concurrent users”. While that seems to be a good definition at first sight, it leaves many questions open:

  • What does “concurrent” mean?
  • And what does “user” mean?
  • Are “200 concurrent users” enough?
  • Do we always have “200 concurrent users”?

Definition of concurrent

Let’s start with the first question: What does “concurrent” really mean on a technical level? How can we measure that our test indeed does “200 concurrent users” and not just 20 or 1000?

  • Are there any server-side sessions which we can count and which directly give us this number? And do we set up our test in a way to hit that number?
  • Or do we have to rely on vaguer definitions like “users are considered concurrent when they do a page load less than 5 minutes apart”? And do we design our test accordingly?

Actually it does not matter at all which definition you choose. It’s just important that you explicitly state which definition you use, and which metric you use to verify that you hit that number. This is an important definition when it comes to implementing your test.

And as a side note: many commercial tools have their own definition of “concurrent”, and here the exact definition does not matter either, as long as you are able to articulate it.
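For illustration, here is a minimal sketch of the second definition above (users are concurrent if they loaded a page within the last 5 minutes), computed from a list of (timestamp, user id) request records; the records and the field layout are made up:

    from datetime import datetime, timedelta

    WINDOW = timedelta(minutes=5)

    def concurrent_users(requests, at):
        """Count users with at least one page load in the 5 minutes before 'at'."""
        return len({uid for ts, uid in requests if at - WINDOW <= ts <= at})

    # made-up request records, e.g. parsed from an access log
    log = [
        (datetime(2024, 2, 1, 10, 0), "user-a"),
        (datetime(2024, 2, 1, 10, 2), "user-b"),
        (datetime(2024, 2, 1, 10, 8), "user-a"),
    ]
    print(concurrent_users(log, datetime(2024, 2, 1, 10, 4)))   # -> 2
    print(concurrent_users(log, datetime(2024, 2, 1, 10, 9)))   # -> 1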

What is a user?

The next question is about “the user” which is modeled in the test; to simplify the test and the test executions, one or more “typical” user personas are created which visit the site and perform some actions. That is definitely helpful, but it’s just that: a simplification, because otherwise our model would explode because of the sheer complexity and variety of user behavior. Also, sometimes we don’t even know what a typical “user” does on our site, because the system will be brand-new.

So this is a case where we have a huge variance in the behavior of the users, which we should outline as a risk in our model: the model is only valid if the majority of the users behave more or less as we assumed.

But is this all? Do really all users perform at least 10% of the actions we assume they do?

Let’s brainstorm a bit and try to find answers for these questions:

  • Does the google bot behave like that? All the other bots of the search engines?
  • What about malware scanners which try to hit a huge list of WordPress/Drupal/… URLs on your site?
  • Other systems performing (random?) requests towards your site?

You could argue that this traffic has little or no business value, and for that reason we don’t test for it. It could also be assumed that this is just a small fraction of the overall traffic and can be ignored. But that is just an assumption, and nothing more. You just assume that it is irrelevant. But often these requests are not irrelevant, not at all.

I encountered cases where it was not the “normal users” who brought down a system, but rather this non-normal type of “user”. One example: the custom 404 handler was very slow, so the basic undocumented assumption “We don’t need to care about 404s, as they are very fast” was violated and brought down the site. All performance tests passed, but the production system failed nevertheless.

So you need to think about “user” in a very broad sense. And even if you don’t implement the constant background noise of the internet in your performance test, you should list it as a factor. If you know that a lot of this background noise will trigger an HTTP status code 404, you are more likely to check that this 404 handler is fast.
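A quick way to check that assumption is to request a handful of URLs which are guaranteed not to exist and look at the response times; a small sketch, with the host name as a placeholder:

    import time
    import uuid

    import requests

    BASE = "https://www.example.com"          # placeholder host

    for _ in range(5):
        url = f"{BASE}/{uuid.uuid4()}.html"   # guaranteed to not exist
        start = time.monotonic()
        response = requests.get(url)
        elapsed_ms = (time.monotonic() - start) * 1000
        print(f"{response.status_code}  {elapsed_ms:6.1f} ms  {url}")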

Are “200 concurrent users” enough?

One piece of information every performance test has is the number of concurrent users which the system must be able to handle. But even if we assume that “concurrent” and “users” are both well defined, is this enough?

First, what data is this number based on? Is it based on data derived from another system which the new system should replace? That’s probably the best data you can get. Or, when you build a new system, is it based on good marketing data (which would be okay-ish), on assumptions about the expected usage, or just on numbers we would like to see (because we assume that a huge number of concurrent users means a large audience and a high business value)?

So this is probably the topic which will be discussed the most. But the number, and the way that number is determined, should be challenged and vetted, because it’s one of the corner-stones of the whole performance test model. It does not make sense to build a high-performance and scalable system when afterwards you find out that the business numbers were grossly overrated, and a smaller and cheaper solution would have delivered the same results.

What about time?

A more important aspect, which is often overlooked, is timing: how many users are working on the site at any given moment? Do you need to expect the maximum number for 8 hours every day, or just during the peak days of the year? Do you have more or less constant usage, or usage only during business hours in Europe?

This heavily depends on the type of your application and the distribution of your audience. If you build an intranet site for a company located only in Europe, the usage during the night is pretty much zero; it will start to increase at 06:00 in the morning (probably the Germans going to work early :-)), hit the maximum usage between 09:00 and 16:00 and go back to zero at the latest at 22:00. The contrast to that is a site visited world-wide by customers, where we can expect a higher and almost flat line, of course with variations depending on the number of people being awake.

This influences your tests as well, because in both cases you don’t need to simulate spikes, meaning a 500% increase of users within 5 minutes. On the other hand, if you plan for large marketing campaigns addressing millions of users, this might be exactly the situation you need to plan and test for. Not to mention if you book a slot during the Super Bowl break.

Why is this important? Because you only need to test scenarios which you expect to see in production, and you can ignore scenarios which don’t have any value for you. For example it’s a waste of time and investment to test for a sudden spike in the above-mentioned intranet case for the European company, while for marketing campaigns it’s essential to test a scenario where such a spike comes on top of the normal traffic.
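One way to make such a profile explicit in your model is to write down, per hour of the day, which fraction of the daily page views you expect, and derive the request rate your test has to produce. A small sketch with made-up numbers for the intranet example:

    # Hypothetical hourly profile for a Europe-only intranet: fraction of the
    # daily page views expected in each hour (the 24 values add up to 1.0).
    hourly_share = [0, 0, 0, 0, 0, 0, 0.01, 0.04, 0.08, 0.11, 0.11, 0.10,
                    0.08, 0.10, 0.11, 0.11, 0.08, 0.04, 0.02, 0.01, 0, 0, 0, 0]

    daily_page_views = 500_000                # made-up business number

    for hour, share in enumerate(hourly_share):
        rate = daily_page_views * share / 3600    # requests per second in that hour
        print(f"{hour:02d}:00  {rate:7.1f} req/s")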

Summary

“N concurrent users” itself is not much information; and while it can serve as input, your performance test model should contain a more detailed understanding of that definition and what it means to the performance test. Otherwise you will focus just on a given number of users of this idealistic type and ignore every other scenario and case.

In the next blog post I will cover how the system and the test data itself will influence the result of the performance test.

by Jörg at February 01, 2024 06:26 PM

January 26, 2024

Things on a content management system - Jörg Hoh

Performance tests modelling (part 1)

In my last blog post about performance tests I outlined best practices for building and executing a performance test with AEM as a Cloud Service. But I intentionally left out a huge aspect of the topic:

  • What should your test look like?
  • What is a realistic test?
  • And what can a test result tell you about the behavior of your production environment?

These are hard questions, and I often find that they are not asked. Or people are not aware that these questions should be asked.

This is the first post in a series of blog posts in which I want to dive a bit deeper into performance testing in the context of AEM and AEM CS (many aspects can probably be generalized to other web applications as well). Unlike my other blog posts it addresses topics on a higher level (I will not refer to any AEM functionality or API, and won’t even mention AEM that often), because I learned over time that very often performance tests are done based on a lot of assumptions. And it is very hard to discuss the details of a performance test if these assumptions are not documented explicitly. I had such discussions in these 2 contexts:

  • The result of a performance test (in AEM as a Cloud Service) is poor and the customer wants to understand what Adobe will do.
  • After golive severe performance problems show up on production; and the customer wants to understand how this can happen as their tests showed no problems.

As you can imagine, if I am given just a few diagrams with test results and test statistics as preparation for this call with the customer … this is not enough, and very often more documentation about the tests is not available. Which often leads to a lot of discussions about some very basic things and that adds even more delay to an already late project and/or bad customer experience. So you can also consider this blog series as a kind of self-defense. If you were asked to read this post, now you know 🙂

I hope that this series will also help you improve your way of doing performance tests, so we all will have less of these situations to deal with.

This post series consists of these individual posts:

And a word upfront about the term “performance test”: I summarize a number of different test types under that term, which are executed with different intentions and which come with many names: “performance tests”, “load tests”, “stress tests”, “endurance tests”, “soak tests”, and many more. Their intention and execution differ, but in the end they can all benefit from the same questions which I want to cover in this blog series. So if you read “performance test”, all of these other tests are meant as well.

What is a performance test? And why do we do them?

A performance test is a tool to predict the future, more specifically how a certain system will behave in a more-or-less defined scenario.

And that already outlines two problems which performance tests have.

  • It is a prediction of the future. Unlike a science experiment it does not try to understand the present and extrapolate into the future. It does not have the same quality as “tomorrow we will have a sunrise, even if the weather is clouded”, but rather goes into the direction of “if my 17 year old son wants to celebrate his birthday party with his friends at home, we better plan a full cleaning of the house for the day after”. That means no matter how well you know your son and his friends (or the system you are building), there is still an element of surprise and unknown in it.
  • The scenario which we want to simulate is somehow “defined”. In quotes, because in many cases the definitions of that scenario are pretty vague. We normally base these definitions on previous experience and some industry best practices.

So it’s already clear from these 2 items, that this prediction is unlikely to be exact and 100% accurate. But it does not need to be accurate, it just needs to be helpful.

A performance test is helpful if it delivers better results than our gut feeling; and the industry has learned that our gut feeling is totally unreliable when it comes to the behaviour of web applications under production load. That’s why many enterprise go-live procedures require a performance test, which will always deliver a more reliable result than gut feeling. But just creating and executing a performance test does not make it a helpful performance test.

So a helpful performance test is also a test which mimics reality closely enough that you don’t need to change your plans immediately after your system goes live and hits reality. Unfortunately you only know whether your performance test was helpful after you went live. It shares this situation with other test approaches; for example 100% unit-test coverage does not mean that your code has no bugs, it’s just less likely.

What does that mean for performance tests and their design?

First, a performance test is based on a mental model of your system and the to-be reality, which must be documented. All its assumptions and goals should be explicitly documented, because only then a review can be done. And a review helps to uncover blind spots in our own mental model of the system, its environment and the way how it is used. It helps to clearly outline all known factors which influence the test execution and also its result.

Without that model it is impossible to compare the test result with reality and to understand which factor or aspect in the test was missing, misrepresented or not fully understood, and led to a gap between test result and reality. If you don’t have a documented model, it’s possible to question everything, from the model to the correct test execution and the results. If you don’t have a model, the result of a performance test is just a PDF with little to no meaning.

Also you must be aware that this mental model is a massive simplification, as it is impossible to factor in all aspects of the reality, also because the reality changes every day. You will change your application, new releases of AEM as a Cloud Service will be deployed, you add more content, and so on.

Your mental model will never be complete and probably also never be up-to-date, and that will be reflected in your performance test. But if you know that, you can factor it in. For example, you might know that in 3 months’ time the amount of content will have doubled, and you can decide whether it’s helpful to redo the performance test with changed parameters. It’s now a “known unknown”, and no longer an “unknown unknown”. You can even decide to ignore factors if they do not seem relevant to you, but of course you should document that.

When you have designed and documented such a model, it is much easier to implement the test, execute it and reason about the results. Without such a model, there is much more uncertainty in every piece of the test execution. It’s like developing software without a clear and shared understanding of what exactly you want to develop.

That’s enough for this post. As promised, this is more abstract than usual, but I hope you liked it and it helps to improve your tests. In the next blog posts I will look into a few relevant aspects which should be covered by your model.

by Jörg at January 26, 2024 05:27 PM

January 23, 2024

CQ5 Blog - Inside Solutions

12 Steps to Migrate AEM from On-Premise to the Cloud

Is your organization harnessing the full potential of Adobe Experience Manager (AEM) to deliver exceptional digital experiences across various channels?

If you currently run your websites on AEM On-Premise or rely on Adobe’s Managed Services, it’s time to embark on a journey into the Cloud.

In 2020, Adobe introduced the next generation of its CMS with AEM as a Cloud Service. It’s time for you and your company to prepare for this transition and embrace this next-generation CMS in the Cloud.

At One Inside – A Vass Company, we’ve worked with several large enterprises, helping them move from AEM on-premise to AEM Cloud Service, executing seamless migrations in less than three months.

Within this comprehensive guide, our AEM experts have compiled their knowledge and address the following questions:

  • How can you smoothly transition from AEM on-premise to AEM Cloud?
  • What are the critical steps for a successful migration to AEM Cloud?
  • What common pitfalls should you avoid?

But before delving into the steps, let’s explore the compelling advantages of embracing AEM Cloud.

What are the benefits of moving to AEM Cloud?

As with any enterprise project, it is essential to demonstrate the clear benefits of migrating your AEM installations to the Cloud to your organization and board.

Let’s explore why this transition is a necessary step.

Moving from AEM On-Premises or managed services to AEM Cloud offers numerous advantages, including:

Reduced Cost of Ownership and Mid-term ROI

The total cost of ownership with AEM Cloud is drastically reduced. Your company might see savings in several areas:

  • License: Licensing costs may decrease since the new pricing model is usage-based. Additionally, transitioning to the Cloud provides you with a fresh opportunity to engage in price negotiations with Adobe.
  • Operational Costs: AEM Cloud simplifies many operational aspects, such as environment management and automated version updates.
  • Infrastructure and Hosting: If you previously hosted AEM on your premises, you’ll see substantial savings on infrastructure and hosting expenses, since the cost of maintaining the infrastructure is eliminated.
  • Workforce: The number of full-time employees (FTEs) required for the project will decrease, resulting in cost reductions.

While the migration project incurs initial expenses, our team has successfully migrated websites to AEM Cloud in less than three months.

The timeline can vary depending on integration complexity and the number of websites and domains involved.

Based on our analysis, the return on investment (ROI) for such a project typically falls below three years. In other words, migrating to AEM Cloud is a worthwhile investment.

Your CMS is always up-to-date, ensuring you have access to the latest features.

With AEM as a Cloud Service, you can say goodbye to version upgrade projects.

Adobe automatically updates the CMS with the latest features, eliminating the concept of versions. It operates like any other Software as a Service, ensuring you are always working with the most current version.

It’s more secure

Security is a primary concern for large enterprises, and AEM as a Cloud Service could offer enhanced security compared to your current setup.

The solution is continuously monitored, and regular patches are applied promptly whenever a security issue is detected.

Read this document about Adobe Cloud Service Security Overview for more details.

99.9% Uptime

With AEM Cloud, your website will always be online. This solution can efficiently scale horizontally and vertically to consistently maintain this high level of service, effectively managing even the most intensive traffic loads.

What are the main benefits of Adobe Experience Manager as a Cloud Service?

No Learning Curve

One significant advantage of transitioning to AEM Cloud is that your marketing team will find the tool familiar.

Despite significant changes in architecture, release processes, and operations, the end-user experience remains unchanged.

Content editors won’t notice any differences following the migration if you use the latest on-premise version.

This means you won’t need to invest time and resources in managing this change or providing extensive training to your team.

Focus on Innovation and Achieve a Faster Time to Market

Managing the operation of an Enterprise CMS is a practice rooted in the past. It’s time for your organization to embrace this new reality.

With AEM Cloud, you can accelerate innovation for several reasons:

  • Your workforce can be fully dedicated to projects that create value.
  • You gain access to the latest innovations from Adobe.

Thanks to our extensive experience with AEM Cloud Service and collaboration with multiple clients, we have witnessed a significantly improved time to market. Projects are completed swiftly, and new websites can be launched within months.

When your company has a new product or service to showcase, you’ll reap the benefits of working with this new generation CMS.

Moving from AEM On-Premise to AEM as a Cloud Service step-by-step

This section will guide you through migrating from AEM On-Premise to AEM as a Cloud Service.

Each step is carefully designed to ensure a smooth and successful transition to the Cloud, covering critical aspects from initial analysis to going live.

AEM On Premise to Cloud Migration Project Steps

Step 1 – Analyze, Plan, and Estimate the Effort

The initial step in this journey is to understand AEM as a Cloud Service and the associated changes and deprecated features.

Some noteworthy changes include:

  • Architecture changes with automatic horizontal scaling
  • Project code structure
  • Asset storage
  • Built-in CDN
  • Dispatcher configuration
  • Network and API connections, including IP whitelisting
  • DNS & SSL certificate configuration
  • CI/CD pipelines
  • AEM author access with Adobe account
  • User groups & permissions

Additionally, it’s crucial to evaluate your current AEM installation, particularly in terms of connections and integrations with other services:

  • APIs or endpoints within the internal network
  • Third-party services, especially those protected by IP whitelisting
  • Any data import services to AEM
  • Login with closed user group (CUG)

These elements should be carefully reviewed, as some adjustments may be necessary.

Another critical aspect is effective communication with current stakeholders, partners, and the Adobe team. Onboarding these parties from the project’s outset is essential, with clear task assignments and timeframes.

For example, you will later discover that the involvement of your internal IT team is required. Informing them in advance is crucial to prevent project delays.

Furthermore, it’s essential to review your licensing agreements with Adobe and ensure that you have the appropriate subscriptions for AEM as a Cloud Service.

While this initial step may only take a few days, it is vital in assessing critical aspects of your installation, defining the project plan and effort, and sharing this information with key stakeholders.

Step 2 – Prepare the code for AEM as a Cloud Service

This step aims to ensure your current AEM installation and its code base are ready for the Cloud while remaining compatible with your existing on-premise instances.

While we won’t go deep into all the structural changes required for AEM Cloud in this article, we’ll provide an overview to keep it easily digestible for all readers.

Adobe offers a helpful tool called the Adobe Best Practices Analyzer designed to evaluate your current AEM implementation and offer guidance on improvements to align with best practices and Adobe standards.

The report generated by this tool covers:

  • Application functionality in need of refactoring.
  • Repository items that should be relocated to supported locations.
  • Legacy user interface dialogs and components that require modernization.
  • Deployment and configuration issues.
  • AEM 6.x features that have been replaced by new functionality or are currently unsupported on AEM as a Cloud Service.

It’s important to note that an AEM expert should review the Adobe Best Practices Analyzer report, as the tool cannot fully comprehend the entire codebase and its implications.

Following the assessment, an AEM architect or developer can restructure the codebase and apply new practices per the latest AEM Archetype.

A recommended practice is further refactoring and reviewing outdated features from your current codebase.

Since comprehensive testing of the entire website and application will be necessary later on, taking the opportunity to eliminate technical debt and establish a more robust foundation is advantageous.

Step 3 – Prepare AEM Cloud Environments

This step aims to prepare the cloud environment and set up AEM Cloud Manager, the backbone of AEM as a Cloud Service. Importantly, this step can be conducted concurrently with the previous one.

Adobe Cloud Manager offers a user-friendly interface that simplifies configuring environments, setting up pipelines, and configuring certificates, DNS, and other essential services.

Step 4 – Migrate Your Projects and Code to AEM Cloud

By this stage, your code has been refactored, and any changes incompatible with the on-premise setup have been implemented and migrated to make it cloud-ready.

Additionally, all necessary environments (test, staging, production) have been appropriately configured and are ready to host your code.

This step is relatively straightforward and involves pushing your code to the Cloud Git repository. During this phase and until the go-live, it is advisable to enforce a feature freeze.

However, if you cannot afford to freeze features in your production environment or if critical changes must be applied to your on-premise installation, it is feasible to backport the code to the Cloud later.

At One Inside, we have experience handling such situations, but it’s essential to understand that a code freeze can help mitigate the risks of project delays and increased complexity.

Ready to Move to AEM Cloud?

Don’t wait any longer! Reach out to our experts now, and let’s make your move seamless and successful!

Step 5 – Validate Integration with Core Services or External APIs

Chances are, your website relies on data from third-party services or internal applications.

To ensure seamless integration with these services, specific network configurations must be carried out using the Cloud Manager.

Furthermore, AEM as a Cloud Service offers a static IP address that must be whitelisted on your end to enable connectivity with your on-premise applications.

This step is crucial for establishing a secure and uninterrupted connection between your AEM Cloud environment and your core services or external APIs.

Step 6 – Integrate Adobe Target, Adobe Analytics, and the Adobe Experience Cloud Suite

Since you are already utilizing AEM for your websites, it’s probable that you also rely on other solutions within the Adobe Experience Cloud suite, including Adobe Analytics and Adobe Target.

The integration of these solutions is typically straightforward, and they should seamlessly operate within your web pages.

Your existing usage of AEM makes it easier to extend the integration to other Adobe Experience Cloud components, enhancing your ability to analyze and optimize your digital experiences.

Step 7 – Migrate Content

Content migration is an important step, but it doesn’t have to be overly concerning. The structure of the content between your on-premise website and the newly created AEM Cloud website remains the same.

To make this process sound less daunting, you can think of it as a content move, similar to transferring content from your staging environment to the production environment.

Additionally, Adobe offers various tools to streamline this task, such as the Content Transfer Tool, which is specially designed for migrating existing content from your AEM On-Premise instance to AEM Cloud, and the Package Manager, which facilitates the import and export of repository content.

When we refer to content migration, it encompasses more than just pages; it includes all content within your repository, including:

  • Page content
  • Assets
  • User and group data

Furthermore, since you may continue to create content on your production site while performing the migration, the tool supports differential content top-up.

You can transfer only the changes made since the last content migration, ensuring an efficient and up-to-date transition.

Step 8 – Test, Test, Test

We are approaching the final stages of the migration journey. Although some testing has occurred throughout the various steps, it’s now time for a comprehensive User Acceptance Testing (UAT) session.

Your dedicated testing team and business users should actively participate in this critical phase. It’s essential to have a detailed test strategy in place before commencing UAT.

Including authors in the testing process serves multiple purposes.

Not only does it expedite their familiarity with the new environment, but they are also the individuals most acquainted with how the components should function.

Their input, knowledge, and support are pivotal in ensuring your digital presence remains clear and distinctive.

Conducting thorough testing ensures your migration to AEM Cloud is successful, and your website operates seamlessly in its new environment.

Step 9 – Redirect Domains

This is the final step before going live, and it’s the point where your IT network team plays a key role.

They will manage certificates, DNS configurations, and domain redirection.

As emphasized at the beginning of this guide, it’s crucial that your IT stakeholders were informed from day one of this project about these critical milestones, and tasks were allocated accordingly.

They should be well-prepared and aware of what needs to be done, as preparations for this phase have been ongoing for several weeks.

Effective coordination in this step is essential to prevent delays in the overall process and the go-live date.

By ensuring a smooth domain redirection, your website seamlessly transitions to its new AEM Cloud environment.

Step 10 – Go Live

This step might seem the most stressful, but paradoxically, it’s also the simplest.

Your website has undergone extensive testing, and everything functions seamlessly in the cloud environment. It’s time for the final transition, shifting from your AEM On-Premise instance to the AEM Cloud instance.

The switch will be seamless for your end-users, and they won’t experience any interruptions in service. With careful planning and execution, this step should mark the successful culmination of your migration to AEM Cloud.

“The migration to AEM Cloud is a source of great satisfaction for both the business and IT stakeholders to see the website actively running in the Cloud, moving into a new era of better performance and exciting possibilities to enhance the customer experience.”
Martyna Wilczynska

Project Manager at One Inside – A VASS Company

Step 11 – Train your Team

Your editors won’t require specific training as the admin interface remains the same.

However, it’s important to note that a new essential tool, Adobe Cloud Manager, has been introduced.

Your IT or DevOps teams should manage this tool, or you can delegate site maintenance to your Adobe Partner.

Our AEM experts can offer training to ensure your IT team possesses the necessary skills and knowledge to handle critical tasks related to SSL Certificates, domain linking, whitelisting, and account management.

Step 12 – Decommission the On-Premise Instance

As a final recommendation, keeping your on-premise server running for 2 to 4 weeks after the migration is advisable.

This precaution provides a safety net in case of any critical situations where you might need to switch back to the on-premise instance.

While, based on our experience, such a reversal is rarely necessary, it’s prudent to manage this potential risk.

Once the hyper-care phase is concluded, you can confidently shift your entire focus to your new AEM as a Cloud Service instance, knowing you have a contingency plan in place if needed.

Need personal guidance with our experts?

Ready to explore your AEM Cloud migration? Book some time with us, so we can evaluate your needs and help you prepare for a seamless move!

Lessons Learned from AEM as a Cloud Service Migration Projects and Best Practices

After several successful migrations to AEM as a Cloud Service, our team has gathered excellent knowledge, and we would like to share some best practices that will help you mitigate the risk in this project.

Start with a Thorough Analysis

Begin your Cloud Migration project with a comprehensive analysis. Avoid rushing the assessment of your current AEM On-Premise setup. It’s crucial to carefully evaluate dependencies and the elements that require refactoring.

If this is your first migration, invest time in research and documentation for a project of this nature.

Even if you have an internal team handling AEM, consider seeking support from an experienced Adobe Partner. Their expertise can prove invaluable in ensuring a successful migration.

Manage Stakeholders’ Dependencies

Taking care of stakeholders’ dependencies early in the project is crucial. Multiple members of your organization will play pivotal roles at significant project milestones.

We’ve already mentioned the IT team’s role in managing the network, but other groups may be involved, such as security and quality assurance.

At the project’s start, it’s essential to communicate your expectations clearly with these teams and provide them with precise dates for their involvement.

This proactive approach helps prevent delays and ensures a smooth progression of the project.

Not your typical Scrum project

What may come as a surprise is that a Cloud Migration project does not fully correspond to your typical Scrum-managed IT project.

In the regular framework, we focus on delivering the highest presentable value in the shortest amount of time, and we present our solutions to the clients, constantly asking for feedback.

An AEM Cloud Migration project primarily involves refactoring the backend code, which may not be presentable to the stakeholders until the website is in the acceptance environment in the Cloud and ready for testing.

Regular Team and Stakeholder Meetings

As the three-month timeline swiftly progresses, staying in sync with your team and key stakeholders is essential.

We highly recommend establishing a weekly update routine to track progress, identify and address risks, and implement mitigation plans.

During these weekly reviews, pay particular attention to dependencies with other teams and assess the advancement of their activities. This proactive approach ensures everyone is aligned and swiftly responds to evolving project needs.

“Clear communication with clients is key to risk mitigation, issue identification, and progress updates during the migration. It alleviates client stress and ensures transparency in their digital journey.”
Michael Kleger

Project Manager at One Inside

Relationship with Adobe

License negotiations must be completed to gain access to the cloud environment.

Equally important is negotiating with your Adobe account manager to keep a standby server on-premise for a specified period as a fallback.

From our experience, initiating such conversations as early as possible allows for negotiating the most advantageous and flexible transition away from the on-premise infrastructure.

Furthermore, in the event of unexpected issues, you may require support from Adobe’s team. It’s possible that certain features may not function properly when refactored for the Cloud.

To expedite the response time of Adobe Support, it is essential to collaborate with an Adobe Partner who maintains a strong relationship with the Adobe team.

For instance, at One Inside, we have cultivated a partnership with Adobe spanning over a decade, and our office is located within 30km of the AEM team responsible for building AEM as a Cloud Service.

This close relationship can be invaluable in certain situations. Over the years, we have developed a robust relationship with Adobe as a company and its talented individuals.

This gives us an advantage in problem-solving, as we possess intimate knowledge of whom to contact without navigating multiple support levels.

Avoid Developing on the On-Premise Instance During Migration

Whenever possible, avoid introducing new developments to your live websites while the migration progresses. This practice helps prevent numerous issues.

However, we acknowledge that implementing a three-month code freeze is often impractical.

To mitigate potential problems, ensure that the code on both environments is synchronized and optimized for the Cloud before making any further enhancements to your on-premise branch.

This alignment minimizes complications during the migration process.

Leverage the Opportunity to Enhance Design Flaws

During the migration process, you’ll have the opportunity to test your entire website thoroughly.

Seize this moment to enhance various aspects of your site, including architecture, code refactoring, and minor design adjustments.

In our migration projects, we’ve successfully incorporated improvements such as image rendition generation, frontend enhancements, and optimizations related to performance and caching.

This migration window allows you to transition to the Cloud and enhance your website’s overall quality and functionality.

Key Takeaways for Your AEM as a Cloud Service Migration

In conclusion, migrating to AEM as a Cloud Service is a transformative journey that requires careful planning and execution.

AEM Cloud Service is the future of AEM, and this migration sets the foundation.

Throughout this article, we’ve shared valuable insights and best practices from successful AEM Cloud migrations. From analyzing dependencies to fostering solid relationships with Adobe, from weekly team updates to optimizing design flaws, these lessons can guide you toward a successful migration.

Embrace the challenges and opportunities of transitioning to the Cloud, and remember that a well-executed migration can lead to a more efficient, secure, and innovative digital experience for your organization and its users.

With the right approach and the support of experienced partners, you can confidently navigate this journey and deliver excellent results.

We would like to express our gratitude to the talented individuals within our company who contributed to this article, including Martyna Wilczynska, Basil Kohler, Michael Kleger and Samuel Schmitt.

Samuel Schmitt

Digital Solution Expert

The post 12 Steps to Migrate AEM from On-Premise to the Cloud appeared first on One Inside.

by Samuel Schmitt at January 23, 2024 10:37 AM

January 12, 2024

Things on a content management system - Jörg Hoh

Sling Model Exporter & exposing ResourceResolver information

Welcome to 2024. I will start this new year with a small piece of advice regarding Sling Models, which I hope you can implement very easily on your side.

The Sling Model Exporter is based on the Jackson framework, and it can serialize an object graph, with the root being the requested Sling Model. For that it recursively serializes all public & protected members and return values of all simple getters. Properly modeled this works quite well, but small errors can have large consequences. While missing data is often quite obvious (if the JSON powers an SPA, you will find it not properly working), too much data being serialized is spotted less frequently (normally not at all).

I am currently exploring options to improve performance, and I am a big fan of the ResourceResolver.getPropertyMap() API to implement a per-resourceresolver cache. While testing such a potential improvement I found customer code in which the ResourceResolver is serialized via the Sling Model Exporter into JSON. In that case the code looked like this:

import javax.annotation.PostConstruct;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

@Model(adaptables = Resource.class)
public class MyModel {
 @Self
 Resource resource;

 ResourceResolver resolver;

 @PostConstruct
 public void init() {
   resolver = resource.getResourceResolver();
 }
}

(see this good overview at Baeldung of the default serialization rules of Jackson.)

And that’s bad in 2 different aspects:

  • Security: The serialized ResourceResolver object contains, in addition to the data returned by the public getters (e.g. the search paths, the userId and potentially other interesting data), also the complete propertyMap. And this serialized cache is probably nothing you want to expose to the consumer of this JSON.
  • Exceptions: If the getProperty() cache contains instances of classes which are not publicly exposed (that means these class definitions are hidden within some implementation packages), you will encounter ClassNotFound exceptions during serialization, which will break the export. And instead of a JSON you get an internal server error or a partially serialized object graph.

In short: It is not a good idea to serialize a ResourceResolver. And honestly, I have not found a reason why this should be possible at all. So right now I am a bit hesitant to use the propertyMap as a cache, especially in contexts where the Sling Model Exporter might be used. And that blocks me from working on some interesting performance improvements 😦

To unblock this situation, we have introduced a 2 step mechanism, which should help to overcome this situation:

  1. In the latest AEM as a Cloud Service release 14697 (both in the cloud as well as in the SDK) a new WARN message has been added when your Model definition causes a ResourceResolver to be serialized. Search the logs for this message: “org.apache.sling.models.jacksonexporter.impl.JacksonExporter A ResourceResolver is serialized with all its private fields containing implementation details you should not disclose. Please review your Sling Model implementation(s) and remove all public accessors to a ResourceResolver.”
    It should also contain a reference to the request path where this is happening, so it should be easily possible to identify the Sling Model class which triggers this serialization and change that piece of code so the ResourceResolver is not serialized anymore. Note that the above message is just a warning; the behavior remains unchanged.
  2. As a second measure, functionality has also been implemented which allows blocking the serialization of a ResourceResolver via the Sling Model Exporter completely. Enabling this is a breaking change for all AEM as a Cloud Service customers (even if I am 99.999% sure that it won’t break any functionality), and for that reason we cannot enable this change on the spot. But at some point this step is necessary to guarantee that the 2 problems listed above will never happen.

Right now the first step is enabled, and you will see this log message. If you see this log message, I encourage you to adapt your code (the core components should be safe) so ResourceResolvers are no longer serialized.
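
A minimal sketch of one possible fix (the class name is hypothetical, and it assumes the resolver is only needed internally): keep the field private so the Jackson-based exporter does not pick it up, and optionally mark it with Jackson’s @JsonIgnore to make the intent explicit.

import javax.annotation.PostConstruct;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

import com.fasterxml.jackson.annotation.JsonIgnore;

@Model(adaptables = Resource.class)
public class MyFixedModel {

  @Self
  @JsonIgnore
  private Resource resource;

  // private visibility keeps the exporter from serializing the field;
  // @JsonIgnore documents the intent explicitly
  @JsonIgnore
  private ResourceResolver resolver;

  @PostConstruct
  protected void init() {
    resolver = resource.getResourceResolver();
  }
}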

In parallel we need to implement step 2; right now the planning is not done yet, but I hope to activate step 2 some time later in 2024 (not before the middle of the year). But before this is done, there will be formal announcements in the AEM release notes. And I hope that with this blog post and the release notes all customers will have adapted their implementation, so that setting this switch will not change anything.

Update (January 19, 2024): There is now a piece of official AEM documentation covering this situation as well.

by Jörg at January 12, 2024 06:23 PM

December 16, 2023

Things on a content management system - Jörg Hoh

A review of 2023

It’s again December, and so time to review a bit my activities of 2023 in this blog.

I have to admit, I am not a reliable writer, as I write very infrequently. And it’s not because of a lack of time, but rather because I rarely find content which (in my opinion) is worth writing about. I don’t have large topics which I split up into a series of posts. If you ever saw a smaller series of posts, that mostly happened by accident. I was just working on aspects of the system, which at some point I wrote about and afterwards started to understand better. That has been the default for the last 15 years … (OMG, have I really been blogging here for that long already? Probably, the first post “Why use the dispatcher?” went live on December 22, 2008. So this is the 15th anniversary.)

I started 2023 with 4 posts until the end of September:

But something changed in October. I had already prepared 2 postings, but I started to come up with more topics within days; it ended with 6 blog posts in October and November, which is quite a pace for this blog:

It felt incredible to be able to announce a new blog post every few days. I don’t think that I can keep up that frequency, but I will try to write more often in 2024. I have already noted a few topics for the next posts, stay tuned 🙂

Also, if you are reading this blog because you found the link to it somewhere, but you are interested in the topics I write about: You can get notified of new posts immediately by providing me (well, WordPress) your email address (you should see it on the right rail of this page). Alternatively if you are old-style, you can also subscribe to the RSS Feed of this blog, which also contains the full content of the postings. That might be interesting for you, as I normally reference new posts on other channels with some delay, and sometimes I even skip it completely (or simply forget).

Thanks for your attention, and I wish you all a successful and happy 2024.

by Jörg at December 16, 2023 06:03 PM

November 25, 2023

Things on a content management system - Jörg Hoh

Thoughts on performance testing on AEM CS

Performance is an interesting topic on its own, and I already wrote a bit about it in this blog (see the overview). I have not written yet about performance testing in the context of AEM CS. It’s not that it is fundamentally different, but there are some specifics, which you should be aware of.

  • Perform your performance tests on the Stage environment. The Stage environment is kept at the same sizing as the production environment, so it should deliver the same behavior as your PROD environment, if you have the same content and your test is realistic.
  • Use a warmup phase. As the Stage environment is normally downscaled to the minimum (because there is no regular traffic), it can take a bit of time until it has upscaled (automatically) to the same number of instances as your PROD is normally operating with. That means that your test should have a proper warmup period, during which you increase the traffic to the normal 100% level of production. This warmup phase should take at least 30 minutes.
  • I think that any test should take at least 60-90 minutes (including warmup); even if you see early that the result is not what you expect it to be, there is often something to learn even from such incorrect/faulty situations. I had the case that a customer was constantly terminating the test after about 20-25 minutes, claiming that something was not working server-side as they expected it to. Unfortunately the situation had not yet settled, so I was not able to get any useful information from the system.
  • AEM CS comes with a CDN bundled to the environment, and that’s the case also for the Stage environment. But that also means that your performance test should contain all requests, which you expect to be delivered from the CDN. This is important because it can show if your caching is working as intended. Also only then you can assess the impact of the cache misses (when files expire on the CDN) on the overall performance.
  • While you are at it, you can run a stage pipeline during the performance test and deploy new code. You should not see any significant change in performance during that time.
  • Oh yes, also do some content activations in that time. That makes your test much more realistic and also reveals potential performance problems when updating content (e.g. because you constantly invalidate the complete dispatcher cache).
  • You should focus on a large content set when you do the performance test. If you only test a handful of pages/assets/files, you are mostly testing caches (at all levels).
  • “Campaign traffic” is rarely tested. This is traffic which has some query strings attached (e.g. “utm_source”, “gclid” and such) to support traffic attribution. These parameters are ignored while rendering, but they often bypass all caching layers, hitting AEM. And while a regular performance test only tests without these parameters, if your marketing department runs a Facebook campaign, the traffic from that campaign looks much different, and then the results of your performance tests are not valid anymore.

Some words as precaution:

  • A performance test can look like a DOS, and your requests can get blocked for that reason. This can happen especially if these requests are originating from a single source IP. For that reason you should distribute your load injector and use multiple source IP addresses. In case you still get blocked, please contact support so we can adapt accordingly.
  • AEM CS uses an affinity cookie to indicate that requests of a user-agent are handled by a specific backend system. If you use the same affinity cookie throughout all your performance tests, you just test a single backend system; and that effectively disables any loadbalancing and renders the performance test results unusable. Make sure that you design your performance tests with that in mind.

In general I much prefer helping you during the performance testing phase over handling escalations because of bad performance and potential outages. I hope that you think the same way.

by Jörg at November 25, 2023 11:45 AM

November 19, 2023

Things on a content management system - Jörg Hoh

If you have curl, every problem looks like a request

If you are working in IT (or in a craft) you should know the saying: “When you have a hammer, every problem looks like a nail”. It describes the tendency of people who have a tool which reliably solves a specific problem to try to use this tool on every other problem, even if it does not fit at all.

Sometimes I see this pattern in AEM as well, but not with a hammer, but with “curl”. Curl is a commandline HTTP client, and it’s quite easy to fire a request against AEM and do something with the output of it. It’s something every AEM developer should be familiar with, also because it’s a great tool to automate things. And if you talk about “automating AEM”, the first thing people often come up with is “curl”…

And there the problem starts: Not every problem can be automated with curl. For example take a periodic data export from AEM. The immediate reaction of most developers (forgive me if I generalize here, but I have seen this pattern too often!) is to write a servlet to pull all this data together, create a CSV list and then use curl to request this servlet every day/week.

Works great, doesn’t it? Good, mark that task as done, next!

Wait a second, on prod it takes 2 minutes to create that list. Well, not a problem, right? Until it takes 20 minutes, because the number of assets is growing. And until you move to AEM CS, where the timeout of requests is 60 seconds, and your curl is terminated with a statuscode 503.

So what is the problem? It is not the timeout of 60 seconds; and it’s also not the constantly increasing number of assets. It’s the fact that this is a batch operation, and you use a communication pattern (request/response) which is not well suited for batch operations. It’s the fact that you start with curl in mind (a tool which is built for the request/response pattern) and therefore you build the implementation around this pattern. You have curl, so every problem is solved with a request.

What are the limits of this request/response pattern? Definitely the runtime is a limit, and actually for 3 reasons:

  • The timeout for requests on AEM CS (or basically any other loadbalancer) is set for security reasons and to prevent misuse. Of course the limit of 60 seconds in AEM CS is a bit arbitrary, but personally I would not wait 60 seconds for a webpage to start rendering. So it’s as good as any higher number.
  • There is another limit, which is determined by the availability of the backend system which is actually processing this request. In a highly available and autoscaling environment systems start and stop in an automated fashion, managed by a control-plane which operates on a set of rules. And these rules can enforce that any (AEM) system will be forced to shut down at most 10 minutes after it has stopped receiving new requests. And that means that a request which would constantly take 30+ minutes might be terminated without finishing successfully. And it’s unclear if your curl would even realize it (especially if you are streaming the results).
  • (And technically you can also add that the whole network connection needs to be kept open for that long, and AEM CS itself is just a single factor in there. Also the internet is not always stable; you can experience network hiccups at any point in time. They are normally just well hidden by retrying failing requests. Which is not an option here, because it won’t solve the problem at all.)

In short: If your task can take long (say: 60+ seconds), then a request is not necessarily the best option to implement it.

So, what options do you have then? Well, the following approach works also in AEM CS:

  1. Use a request to create and initiate your task (let’s call it a “job”);
  2. And then poll the system until this job is completed, then return the result.

This is an asynchronous pattern, and it’s much more scalable when it comes to the amount of processing you can do in there.

Of course you cannot use a single curl command anymore, but now you need to write a program to execute this logic (don’t write it in a shell-script please!); but on the AEM side you can now use either sling jobs or AEM workflows and perform the operation.
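
As an illustration of this asynchronous pattern, here is a minimal sketch (the job topic, paths and property names are hypothetical and not from the original post): a servlet only queues a Sling job and returns its id, and a JobConsumer does the actual batch work outside of any request.

import java.io.IOException;
import java.util.Map;

import javax.servlet.Servlet;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingAllMethodsServlet;
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.JobManager;
import org.apache.sling.event.jobs.consumer.JobConsumer;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

// Step 1: a servlet which only *initiates* the export and returns immediately.
@Component(service = Servlet.class,
    property = {"sling.servlet.paths=/bin/start-export", "sling.servlet.methods=POST"})
public class StartExportServlet extends SlingAllMethodsServlet {

  @Reference
  private transient JobManager jobManager;

  @Override
  protected void doPost(SlingHttpServletRequest request, SlingHttpServletResponse response)
      throws IOException {
    Job job = jobManager.addJob("com/example/export", Map.of("root", "/content/dam"));
    // return the job id; the client polls later (or checks a well-known result location)
    response.setContentType("text/plain");
    response.getWriter().write(job.getId());
  }
}

// Step 2: the actual batch work runs asynchronously, outside of any request.
@Component(service = JobConsumer.class,
    property = {JobConsumer.PROPERTY_TOPICS + "=com/example/export"})
class ExportJobConsumer implements JobConsumer {

  @Override
  public JobResult process(Job job) {
    String root = job.getProperty("root", String.class);
    // ... collect the data and write the CSV to a well-known location ...
    return JobResult.OK;
  }
}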

But this avoids the 60-second restriction, and it can handle restarts of AEM transparently, at least on the author side. And you have the huge benefit that you can collect all your errors during the runtime of this job and decide afterwards if the execution was a success or a failure (which you cannot do in HTTP).

So when you have long-running operations, check if you really need to do them within a request. In many cases it’s not required, and then please switch gears to some asynchronous pattern. And that’s something you can do even before the situation starts to become a problem.

by Jörg at November 19, 2023 04:32 PM

November 09, 2023

Things on a content management system - Jörg Hoh

Identify repository access

Performance tuning in AEM is typically a tough job. The most obvious and widely known aspect is the tuning of JCR queries, but that’s about it; if your code is not doing any JCR query and is still slow, it gets hard. For requests my standard approach is to use “Recent requests” and identify slow components, but that’s it. And then you have threaddumps, but these hardly help here. There is no standard way to diagnose further without relying on gut feeling and luck.

When I had to optimize a request last year, I thought again about this problem. And I asked myself the question:
Whenever I check this request in the threaddumps, I see the code accessing the repository. Why is this the case? Is the repository slow or is it just accessing the repository very frequently?

The available tools cannot answer this question. So I had to write myself something which can do that. In the end I committed it to the Sling codebase with SLING-11654.

The result is an additional logger (“org.apache.sling.jcr.resource.AccessLogger.operation” on loglevel TRACE) which you can enable and which logs every single (Sling) repository access, including the operation, the path and the full stacktrace. That is a huge amount of data, but it answered my question quite thoroughly.

  • The repository itself is very fast, because a request (taking 500ms in my local setup) performs 10’000 repository accesses. So the problem is rather the total number of repository accesses.
  • Looking at the list of accessed resources it became very obvious that there is a huge number of redundant accesses. For example these are the top 10 accessed paths while rendering a simple WKND page (/content/wknd/language-masters/en/adventures/beervana-portland):
    • 1017 /conf/wknd/settings/wcm/templates/adventure-page-template/structure
    • 263 /
    • 237 /conf/wknd/settings/wcm/templates
    • 237 /conf/wknd/settings/wcm
    • 227 /content
    • 204 /content/wknd/language-masters/en
    • 199 /content/wknd
    • 194 /content/wknd/language-masters/en/adventures/beervana-portland/jcr:content
    • 192 /content/wknd/jcr:content
    • 186 /conf/wknd/settings

But now with that logger, I was able to identify access patterns and map them to code. And suddenly you see a much bigger picture, and you can spot a lot of redundant repository access.

With that help I identified the bottleneck in the code, and the posting “Sling Model performance” was the direct result of this finding. Another result was the topic for my talk at AdaptTo() 2023; checkout the recording for more numbers, details and learnings.

But with these experiences I made an important observation: You can use the number of repository accesses as a proxy metric for performance. The more repository accesses you do, the slower your application will get. So you don’t need to rely so much on performance tests anymore (although they definitely have their value), but you can validate changes in the code by counting the number of repository accesses performed by it. Fewer repository accesses are always more performant, no matter the environmental conditions.

And with an additional logger (“org.apache.sling.jcr.AccessLogger.statistics” on TRACE) you can get just the raw numbers without details, so you can easily validate any improvement.

Equipped with that knowledge you should be able to investigate the performance of your application on your local machine. Looking forward for the results 🙂

(This is currently only available on AEM CS / AEM CS SDK, I will see to get it into an upcoming AEM 6.5 servicepack.)

by Jörg at November 09, 2023 01:28 PM

November 04, 2023

Things on a content management system - Jörg Hoh

The Explain Query tool

If there’s one topic which has been challenging forever in the AEM world, it’s JCR queries and indexes. It can feel like an arcane science, where it’s quite easy to mess up and end up with a slow query. I learned it the hard way as well, and a printout of the JCR query cheatsheet is always below my keyboard.

But there were some recent changes, which made the work with query performance easier. First, in AEM CS the Explain Query tool has been added, which is also available via the AEM Developer Console. It displays queries, slow queries, number of rows read, the used index, execution plan etc. But even with that tool alone it’s still hard to understand what makes a query performant or slow.

Last week there was a larger update to the AEM documentation (thanks a lot, Tom!), which added a detailed explanation of the Explain Query tool. Especially it drills down into the details of the query execution plan and how to interpret it.

With this information and the good examples given there you should be able to analyze the query plan of your queries and optimize the indexes and queries before you execute them the first time in production.

by Jörg at November 04, 2023 06:22 PM

October 16, 2023

Things on a content management system - Jörg Hoh

3 rules how to use an HttpClient in AEM

Many AEM applications consume data from other systems, and in the last decade the protocol of choice turned out to be HTTP(S). And there are a number of very mature HTTP clients out there which can be used together with AEM. The most frequently used variant is the Apache HttpClient, which is shipped with AEM.

But although the HttpClient is quite easy to use, I came across a number of problems, many of which resulted in service outages. In this post I want to list the 3 biggest mistakes you can make when you use the Apache HttpClient. While I observed the results in AEM as a Cloud Service, the underlying effects are the same on-prem and in AMS; the resulting symptoms can be a bit different.

Reuse the HttpClient instance

I often see that an HttpClient instance is created for a single HTTP request, and in many cases it’s not even closed properly afterwards. This can lead to these consequences:

  • If you don’t close the HttpClient instance properly, the underlying network connection(s) will not be closed properly, but eventually time out. And until then the network connections stay open. If you are using a proxy with a connection limit (many proxies do that) this proxy can reject new requests.
  • If you re-create a HttpClient for every request, the underlying network connection will get re-established every time with the latency of the 3-way handshake.

The reuse of the HttpClient object and its state is also recommended by its documentation.

The best way to make that happen is to wrap the HttpClient into an OSGI service, create it on activation and stop it when the service is deactivated.

Set aggressive connection- and read-timeouts

Especially when an outbound HTTP request is executed within the context of an AEM request, performance really matters. Every millisecond which is spent in that external call makes the AEM request slower. This increases the risk of exhausting the Jetty thread pool, which then leads to non-availability of that instance, because it cannot accept any new requests. I have often seen AEM CS outages because a backend was responding slowly or not at all. All requests should finish quickly, and in case of errors must also return fast.

That means that timeouts should not exceed 2 seconds (personally I would prefer even 1 second). And if your backend cannot respond that fast, you should reconsider its fitness for interactive traffic, and try not to connect to it in a synchronous request.
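
A minimal sketch of how both rules could look in practice (the service name and the concrete timeout and pool values are assumptions, not a definitive implementation): one shared HttpClient wrapped in an OSGi component, created on activation, closed on deactivation, with aggressive timeouts.

import java.io.IOException;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Deactivate;

@Component(service = BackendClient.class)
public class BackendClient {

  private CloseableHttpClient httpClient;

  @Activate
  protected void activate() {
    RequestConfig config = RequestConfig.custom()
        .setConnectTimeout(1_000)            // 1s to establish the TCP connection
        .setSocketTimeout(2_000)             // 2s max idle time between data packets
        .setConnectionRequestTimeout(1_000)  // 1s to lease a connection from the pool
        .build();
    // one shared, pooled client for the whole lifetime of the service
    httpClient = HttpClients.custom()
        .setDefaultRequestConfig(config)
        .setMaxConnPerRoute(20)
        .setMaxConnTotal(50)
        .build();
  }

  @Deactivate
  protected void deactivate() throws IOException {
    if (httpClient != null) {
      httpClient.close();  // closes pooled connections cleanly
    }
  }

  public CloseableHttpClient getClient() {
    return httpClient;
  }
}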

Implement a degraded mode

When your backend application responds slowly, returns errors or is not available at all, your AEM application should react accordingly. I have seen a number of times that any problem on the backend had an immediate effect on the AEM application, often resulting in downtimes because either the application was not able to handle the results of the HttpClient (so the response rendering failed with an exception), or because the Jetty threadpool was totally consumed by those requests.

Instead your AEM application should be able to fallback into a degraded mode, which allows you to display at least a message, that something is not working. In the best case the rest of the site continues to work as usual.

If you implement these 3 rules when you do your backend connections, and especially if you test the degraded mode, your AEM application will be much more resilient when it comes to network or backend hiccups, resulting in less service outages. And isn’t that something we all want?

by Jörg at October 16, 2023 01:43 PM

October 14, 2023

Things on a content management system - Jörg Hoh

Recap: AdaptTo 2023

It was adaptTo() time again, the first time again in an in-person format since 2019. And it’s definitely much different from the virtual formats we experienced during the pandemic. More personal, and allowing me to get away from the daily work routine; I remember that in 2020 and 2021 I constantly had work-related topics (mostly Slack) on the other screen while I was attending the virtual conference. That’s definitely different when you are at the venue 🙂

And it was great to see all the people again. Many of them have been part of the community for years, but there were also many new faces. Nice to see that the community can still attract new people, although I think that the golden time of backend-heavy web development is over. And that was reflected on stage as well, with Edge Delivery Services being quite a topic.

As in the past years, the conference itself isn’t that large (this year maybe 200 attendees) and it gives you plenty of chances to get in touch and chat about projects, new features, bugs and everything else you can imagine. The location is nice, and Berlin gives you plenty of opportunities to go out for dinner. So while 3 days of conference can definitely be exhausting, I would have liked to spend many more dinners with attendees.

I got the chance to come on stage again with one of my favorite topics: Performance improvement in AEM, a classic backend topic. According to the talk feedback, people liked it 🙂
Also, the folks of the adaptTo() recorded all the talks and you can find both the recording and the slide deck on the talk’s page.

The next call for papers is already announced to start in February ’24, and I will definitely submit a talk again. Maybe you as well?

by Jörg at October 14, 2023 04:07 PM

July 13, 2023

Things on a content management system - Jörg Hoh

AEM CS & dedicated egress IP

Many customers of AEM as a Cloud Service are used to performing a first level of access control by allowing just a certain set of IP addresses to access a system. For that reason they want their AEM instances to use a static IP address or network range to access their backend systems. AEM CS supports this with the feature called “dedicated egress IP address“.

But when testing that feature there is often the feedback that this is not working, and that the incoming requests on backend systems come from a different network range. This is expected, because this feature does not change the default routing for outgoing traffic of the AEM instances.

The documentation also says

Http or https traffic will go through a preconfigured proxy, provided they use standard Java system properties for proxy configurations.

The thing is that if traffic is supposed to use this dedicated egress IP, you have to explicitly make it use this proxy. This is important, because by default not all HTTP Clients do this.

For example, in the Apache HttpClient library 4.x, the HttpClients.createDefault() method does not read the proxy-related system properties, but HttpClients.createSystem() does. The same applies to java.net.http.HttpClient, for which you need to configure the Builder to use a proxy. Also okhttp requires you to configure the proxy explicitly.
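
A small sketch to illustrate the difference (the class name is hypothetical; the calls themselves are standard API):

import java.net.ProxySelector;
import java.net.http.HttpClient;

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ProxyAwareClients {

  // Apache HttpClient 4.x: createSystem() honours the standard Java proxy
  // system properties (http.proxyHost, http.proxyPort, ...), createDefault() does not.
  CloseableHttpClient apacheClient = HttpClients.createSystem();

  // java.net.http.HttpClient: explicitly point the builder at the default
  // ProxySelector, which reads the same system properties.
  HttpClient jdkClient = HttpClient.newBuilder()
      .proxy(ProxySelector.getDefault())
      .build();
}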

So if requests from your AEM instance are coming from the wrong IP address, check that your code is actually using the configured proxy.

by Jörg at July 13, 2023 07:40 AM

July 02, 2023

Things on a content management system - Jörg Hoh

Sling Model Exporter: What is exported into the JSON?

Last week we came across a strange phenomenon, when the AEM release validation process broke in an unexpected situation. Which is indeed a good thing, because it covered an aspect I had never thought of.

The validation broke because during a request the serialization of a Sling Model failed with an exception. The short version: It tried to serialize a ResourceResolver(!) into JSON (more details in SLING-11924). Why would anyone serialize a ResourceResolver into a JSON to be consumed by an SPA? I clearly believe that this was not done intentionally, but happened by accident. But nevertheless, it broke the improvement we intended to make, so we had to roll it back and wait for SLING-11924 to be implemented.

But it gives me the opportunity to explain, which fields of a Sling Model are exported by the SlingModelExporter. As it is backed by the Jackson data-bind framework, the same rules apply:

  • All public fields are serialized
  • all publicly available getter methods which do not expect a parameter are serialized.

It is not too hard to check this, but there are a few subtle aspects to consider in the context of Sling Models.

  • Injections: make sure that you only make those injections public which you want to be handled by the SlingModelExporter. Make everything else private.
  • I often see Lombok used to create getters for SlingModels (because you need them for the use in HTL). This is especially problematic when the annotation @Getter is used on class level, because then a getter is created for every field (no matter the visibility), which is then picked up by the SlingModelExporter.

My call to action: Validate your SlingModels and check that you don’t export a ResourceResolver by accident. (If you are an AEM as a Cloud Service customer and affected by this problem, you will probably get an email from us, telling you to do exactly that.)

by Jörg at July 02, 2023 06:38 PM

January 12, 2023

Things on a content management system - Jörg Hoh

Sling models performance (part 3)

In the first and second part of this series “Sling Models performance” I covered aspects which can degrade the performance of your Sling Models, be it by not specifying the correct injector or by re-using complex models (with heavy @PostConstruct methods) for very simple cases.

And there is another aspect when it comes to performance degradation, and it starts with a very cool convenience function: Sling Models can create a whole tree of objects. Imagine this code as part of a Sling Model:

@ChildResource
AnotherModel child;

It will adapt the child-resource named “child” into the class “AnotherModel” and inject it. This nesting is a cool feature and can be a time-saver if you have a more complex resource structure to model your content.

But it also comes with a price, because it will create another Sling Model object; and even that Sling Model can trigger the creation of more Sling Models, and so on. And as I have outlined in my previous posts, the creation of these Sling Models does not come for free. So if your “main Sling Model” internally creates a whole tree of Sling Models, the required time will increase. Which can be justified, but not if you just need a fraction of the data of these Sling Models. So is it worth spending 10 milliseconds to create a complex Sling Model just to call a simple getter on it, if you could retrieve this information alone in just 10 microseconds?

So this is a situation, where I need to repeat what I have written already in part 2:


When you build your Sling Models, try to resolve all data lazily, when it is requested the first time.

Sling Model Performance (part 2)

But unfortunately, injectors do not work lazily but eagerly; injections are executed as part of construction of the model. Having a lazy injection would be a cool feature …

So until this is available, you should check the re-use of Sling Models quite carefully; always consider how much work is actually done in the background, and whether the value of reusing that Sling Model is worth the time spent in rendering.
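
One way to keep the convenience without paying the price up front is to inject only the child Resource and adapt it lazily. This is a sketch only; the class and child names follow the example above, and depending on your content you may want to mark the injection as optional:

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.ChildResource;

@Model(adaptables = Resource.class)
public class ParentModel {

  // injecting the plain Resource is cheap
  @ChildResource(name = "child")
  private Resource childResource;

  private AnotherModel child;

  // the (potentially expensive) child model is only created when somebody asks for it
  public AnotherModel getChild() {
    if (child == null && childResource != null) {
      child = childResource.adaptTo(AnotherModel.class);
    }
    return child;
  }
}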

by Jörg at January 12, 2023 04:38 PM

January 02, 2023

Things on a content management system - Jörg Hoh

The most expensive HTTP request

TL;DR: When you do a performance test for your application, also test a situation where you just fire a large number of invalid requests, because you need to know if your error-handling is good enough to withstand this often unplanned load.

In my opinion the most expensive HTTP requests are the ones which return with a 404. Because they don’t bring any value, are not as easily cacheable as others and are very easy to generate. If you are looking into AEM logs, you will often find requests from random parties which fire a lot of requests, obviously trying to find vulnerable software. But in AEM these always fail, because there are no resources with these names, returning a statuscode 404. But this turns into a problem if these 404 pages are complex to render, taking 1 second or more. In that case requesting 1000 non-existing URLs can turn into a denial of service.

This can even get more complex if you work with suffixes, and the end user can just request the suffix, because you prepend the actual resource via mod_rewrite on the dispatcher. In such situations the requested resource is present (the page you configured), but the suffix can be invalid (for example pointing to a non-existing resource). Depending on the implementation you find out about this situation very late; and then you have already rendered a major part of the page just to find out that the suffix is invalid. This can also lead to a denial of service, but is much harder to mitigate than the plain 404 case.
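
One mitigation is to validate the suffix as early as possible and fail cheaply; a minimal sketch (the servlet itself is hypothetical, the suffix check is standard Sling API):

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletResponse;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;

public class SuffixAwareServlet extends SlingSafeMethodsServlet {

  @Override
  protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
      throws ServletException, IOException {
    // validate the suffix before doing any expensive rendering work
    Resource suffixResource = request.getRequestPathInfo().getSuffixResource();
    if (suffixResource == null) {
      response.sendError(HttpServletResponse.SC_NOT_FOUND);
      return;
    }
    // ... only now start the actual rendering ...
  }
}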

So what’s the best way to handle such situations? You should test for such a situation explicitly. Build a simple performance test which just fires a few hundred requests triggering a 404, and observe the response time of the regular requests. It should not degrade! If you need to simplify your 404 pages, then do that! Many popular websites have very stripped-down 404 pages for just that reason.

And when you design your URLs you should always have in mind these robots, which just show up with (more or less) random strings.

by Jörg at January 02, 2023 01:56 PM

December 21, 2022

Things on a content management system - Jörg Hoh

AEM article review December 2022

I have been doing this blog for quite some time now (the first article in this blog dates back to December 2008! That was the time of CQ 5.0! OMG), and of course I am not the only one writing on AEM. Actually the number of articles which are produced every month is quite large, but I am often a bit disappointed, because many just reproduce some very basic aspects of AEM which can be found in many places. The amount of new content which describes aspects barely covered by other blog posts or the official product documentation is small.

For myself I try to focus on such topics, offer unique views on the product and provide recommendations how things can be done (better), all based on my personal experiences. I think that this type of content is appreciated by the community, and I get good feedback on it. To encourage the broader community to come up with more content covering new aspects I will do a little experiment and promote a few selected articles of others. I think that these articles show new aspects or offer a unique view on certain areas of AEM.

Depending on the feedback I will decide if I continue with this experiment. If you think that your content also offers new views, uncovers hidden features or suggests best practices, please let me know (see my contact data here). I will judge these proposals on the above mentioned criteria. But of course it will still be my personal decision.

Let’s start with Theo Pendle, who has written an article on how to write your own custom injector for Sling Models. The example he uses is a really good one, and he walks you through all the steps and explains very well why all that is necessary. I like the general approach of Theo’s writing and consider the case of safely injecting cookie values a valid one for such an injector. But in general I think that there are not many other cases out there where it makes sense to write custom injectors.

Also on a technical level John Mitchell has his article “Using Sling Feature Flags to Manage Continous Releases“, published on the Adobe Tech Blog. He introduces Sling Features and how you can use them to implement Feature Flags. And that’s something I have not seen used yet in the wild, and also the documentation is quite sparse on it. But he gives a good starting point, although a more practical example would be great 🙂

The third article I like the most. Kevin Nenning writes on “CRXDE Lite, the plague of AEM“. He outlines why CRXDE Lite has gained such a bad reputation within Adobe, that disabling CRXDE Lite is part of the golive checklist for quite some time. But on the other hand he loves the tool because it’s a great way for quick hacks on your local development instance and for a general read-only tool. This is an article every AEM developer should read.
And in case you haven’t seen it yet: AEM as a Cloud Service offers the repository browser in the developer console for a read-only view on your repo!

And finally there is Yuri Simione (an Adobe AEM champion), who published 2 articles discussing the question “Is AEM a valid Content Services Plattform?” (article 1, article 2). He discusses an implementation which is based on Jackrabbit/Oak and Sling (but not AEM) to replace an aging Documentum system. And finally he offers an interesting perspective on the future of Jackrabbit. Definitely a read if you are interested in a more broader use of AEM and its foundational pieces.

That’s it for December. I hope you enjoy these articles as much as I did, and that you can learn from them and get some new inspiration and insights.

by Jörg at December 21, 2022 05:29 PM

December 12, 2022

Things on a content management system - Jörg Hoh

Sling Models performance, part 2

In the last blog post I demonstrated the impact of the correct type of annotations on performance of Sling Models. But there is another aspect of Sling Models, which should not be underestimated. And that’s the impact of the method which is annotated with @PostConstruct.

If you are not interested in the details, just skip to the conclusion at the bottom of this article.

To illustrate this aspect, let me give you an example. Assume that you have a navigation (or list component) in which you want to display only pages of the type “product page” which are specifically marked to be displayed. Because you are a developer who favors clean code, you already have a “ProductPageModel” Sling Model which also offers a “showInNav()” method. So your code will look like this:

List<Page> pagesToDisplay = new ArrayList<>();
Iterator<Page> children = page.listChildren();
while (children.hasNext()) {
  Page child = children.next();
  ProductPageModel ppm = child.adaptTo(ProductPageModel.class);
  if (ppm != null && ppm.showInNav()) {
    pagesToDisplay.add(child);
  }
}

This works perfectly fine; but I have seen this approach as the root cause of severe performance problems. Mostly because the ProductPageModel is designed as the one and only Sling Model backing a product page; the @PostConstruct method of the ProductPageModel contains all the logic to retrieve and calculate all required information, for example product information, datalayer information, etc.

But in this case only a simple property is required, all other properties are not used at all. That means that the majority of the operations in the @PostConstruct method are pure overhead in this situation and consuming time. It would not be necessary to execute them at all in this case.

Many Sling Models are designed for a single purpose, for example rendering a page, where such a Sling Model is used extensively by an HTL scriptlet. But there are cases where the very same Sling Model class is used for different purposes, where only a subset of this information is required. But also in this case the whole set of properties is resolved, as if you needed it for the rendering of the complete page.
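
As an illustration (a sketch only, loosely following the benchmark classes mentioned below; the property name is hypothetical), the inherited lookup can be moved out of @PostConstruct into the getter, so it only runs when that particular piece of information is actually needed:

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

import com.day.cq.commons.inherit.HierarchyNodeInheritanceValueMap;
import com.day.cq.commons.inherit.InheritanceValueMap;

@Model(adaptables = Resource.class)
public class LazyProductPageModel {

  @Self
  private Resource resource;

  private Boolean showInNav;

  // no @PostConstruct: the (potentially expensive) inherited lookup only
  // happens the first time somebody actually calls this getter
  public boolean showInNav() {
    if (showInNav == null) {
      InheritanceValueMap ivm = new HierarchyNodeInheritanceValueMap(resource);
      showInNav = ivm.getInherited("showInNav", Boolean.FALSE);
    }
    return showInNav;
  }
}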

I prepared a small test-case on my github account to illustrate the performance impact of such code on the performance of the adaption:

  • ModelWithPostConstruct contains a method annotated with @PostConstruct, which resolves another property via an InheritanceValueMap.
  • ModelWithoutPostConstruct provides the same semantics, but executes the calculation lazily, only when the information is required.

The benchmark is implemented in a simple servlet (SlingModelPostConstructServlet), which you can invoke on the path “/bin/slingmodelpostconstruct”

$ curl -u admin:admin http://localhost:4502/bin/slingmodelpostconstruct
test data created below /content/cqdump/performance
de.joerghoh.cqdump.performance.core.models.ModelWithPostconstruct: single adaption took 50 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWithoutPostconstruct: single adaption took 11 microseconds

The overhead is quite obvious, almost 40 microseconds per adaption; of course it depends on the amount of logic within this @PostConstruct method. And this @PostConstruct method is quite small compared to other SlingModels I have seen. And in the cases where only a minimal subset of the information is required, this is pure overhead. Of course the overhead is often minimal if you just consider a single adaption, but given the large number of Sling Models in typical AEM projects, the chance is quite high that this turns into a problem sooner or later.

So you should pay attention to the different situations in which you use your Sling Models. Especially if you have such vastly different cases (rendering the full page vs just getting one property) you should invest a bit of time and optimize them for these usecases. Which leads me to the following:

Conclusion

When you build your Sling Models, try to resolve all data lazily, when it is requested the first time. Keep the @PostConstruct method as small as possible.

by Jörg at December 12, 2022 08:17 AM

November 28, 2022

Things on a content management system - Jörg Hoh

Sling Model Performance

In my daily job as an SRE for AEM as a Cloud Service I often have to deal with performance questions, especially in the context of migrations of customer applications. Applications sometimes perform differently on AEM CS than they did on AEM 6.x, and a part of my job is to look into these cases.

This often leads to interesting deep dives and learnings; you might have seen this reflected in the postings of this blog 🙂 The problem this time was a tight loop like this:

for (Resource child : resource.getChildren()) {
  SlingModel model = child.adaptTo(SlingModel.class);
  if (model != null && model.hasSomeCondition()) {
    // some very lightweight work
  }
}

This code performed well with 1000 child resources on an AEM 6.x authoring instance, but quite poorly on an AEM CS authoring instance with the same number of child nodes. And the problem is not the large number of child nodes …

After wading knee-deep through TRACE logs I found the problem at an unexpected location. But before I present you the solution and some recommendations, let me explain some background. Of course you can skip the next section and jump directly to the TL;DR at the bottom of this article.

SlingModels and parameter injection

One of the beauties of Sling Models is that these are simple PoJos, and properties are injected by the Sling Models framework. You just have to add matching annotations to mark them accordingly. See the full story in the official documentation.

The simple example in the documentation looks like this:

@Inject
String title;

which (typically) injects the property named “title” from the resource this model was adapted from. In the same way you can inject services, child-nodes and many other useful things.

To make this work, the framework uses an ordered list of Injectors, which are able to retrieve values to be injected (see the list of available injectors). The first injector which returns a non-null value is taken and its result is injected. In this example the ValueMapInjector is supposed to return a property called “title” from the valueMap of the resource, which is quite early in the list of injectors.

Ok, now let’s understand what the system does here:

@Inject
@Optional
String doesNotExist;

Here an optional field is declared, and if there is no property called “doesNotExist” in the valueMap of the resource, other injectors are queried whether they can handle that injection. Assuming that no injector can do that, the value of the field “doesNotExist” remains null. No problem at first sight.

But indeed there is a problem, and it’s performance. Even the lookup of a non-existing property (or node) in the JCR takes time, and doing this a few hundred or even thousand times in a loop can slow down your code. And a slower repository (like the clustered MongoDB persistence of the AEM as a Cloud Service authoring instances) even more.

To demonstrate it, I wrote a small benchmark (source code on my github account), which does a lot of adaptions to Sling Models. When deployed to AEM 6.5.5 or later (or a recent version of the AEM CS SDK) you can run it via curl -u admin:admin http://localhost:4502/bin/slingmodelcompare

This is its output:

de.joerghoh.cqdump.performance.core.models.ModelWith3Injects: single adaption took 18 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWith3ValueMaps: single adaption took 16 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWithOptionalValueMap: single adaption took 18 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWith2OptionalValueMaps: single adaption took 20 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWithOptionalInject: single adaption took 83 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWith2OptionalInjects: single adaption took 137 microseconds

It’s a benchmark which, on a very simple list of resources, tries adaptions to a number of model classes which differ in their type of annotations. Adapting to a model which injects 3 properties takes approximately 20 microseconds, but as soon as a model has a failing injection (which is declared with “@Optional” to avoid failing the adaption), the duration increases massively to 83 microseconds, and even 137 microseconds when 2 of these failed injections are there.

Ok, so having a few of such failed injections does not pose a problem per se (you could do 2’000 within 100 milliseconds), but this test setup is a bit artificial, which makes these 2’000 a really optimistic number:

  • It is running on a system with a fast repository (the SDK on my M1 Macbook); so for example the ChildResourceInjector has almost no overhead to test for the presence of a childResource called “doesNotExist”. This can be different, for example on AEM CS Author the Mongo storage has a higher latency than the segmentStore on the SDK or a publish. If that (non-existing) child-resource is not in the cache, there is an additional latency in the range of 1ms to load that information. What for? Well, basically for nothing.
  • The OsgiInjector is queried as well, which tries to access the OSGI ServiceRegistry; this registry is a central piece of OSGI, and its consistency is heavily guarded by locks. I have seen this injector being blocked by these locks, which also adds latency.

That means that these 50-60 microseconds could easily multiply, and then performance becomes a problem. And this is the problem which initially sparked this investigation.

So what can we do to avoid this situation? That is quite easy: Do not use @Inject, but use the specialized injectors directly (see them in the documentation). While the benefit is probably quite small when it comes to properties which are present (ModelWith3Injects took 18 microseconds vs 16 microseconds for ModelWith3ValueMaps), the difference gets dramatic as soon as we consider failed injections:

Even in my local benchmark the improvement can be seen quite easily: there is almost no overhead for such a failed injection if I explicitly mark it as an injection via the ValueMapInjector. And as mentioned, this overhead can be even larger in reality.

Still, this is a micro-optimization in the majority of all cases; but as mentioned already, implementing many of these optimizations can definitely make a difference.

TL;DR Use injector-specific annotations

Instead of @Inject, use the correct injector directly. You normally know exactly where you want that injected value to come from.
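As an illustration, a minimal sketch of a Sling Model using the injector-specific annotations (the class and property names are made up for this example):

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.ChildResource;
import org.apache.sling.models.annotations.injectorspecific.InjectionStrategy;
import org.apache.sling.models.annotations.injectorspecific.ValueMapValue;

@Model(adaptables = Resource.class)
public class TeaserModel {

    // only the ValueMap injector is consulted; a missing property is cheap to detect
    @ValueMapValue(injectionStrategy = InjectionStrategy.OPTIONAL)
    private String title;

    // only the ChildResource injector is consulted
    @ChildResource(injectionStrategy = InjectionStrategy.OPTIONAL)
    private Resource image;

    public String getTitle() {
        return title;
    }

    public Resource getImage() {
        return image;
    }
}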
And by the way: did you know that the use of @Inject is discouraged in favor of these injector-specific annotations?

Update: The Sling Models documentation has been updated and explicitly discourages the use of @Inject now.

by Jörg at November 28, 2022 08:25 AM

October 31, 2022

Things on a content management system - Jörg Hoh

Limits of dispatcher caching with AEM as a Cloud Service

In the last blog post I proposed 5 rules for Caching with AEM, how you should design your caching strategy. Today I want to show another aspect of rule 1: Prefer caching at the CDN over caching at the dispatcher.

I already explained that the CDN is always located closer to the consumer, so the latency is lower and the experience will be better. But when we limit the scope to AEM as a Cloud Service, the situation gets a bit complicated, because the dispatcher is not able to cache files for more than 24 hours.

This is caused by a few architectural decisions done for AEM as a Cloud Service:

These decisions lead to the fact that no dispatcher cache can hold files for more than 24 hours, because the instance is terminated after that time. And there are other situations where the publishes are re-created, for example during deployments and up/down-scaling situations; then the cache does not even hold files for 24 hours, but maybe just for 3 hours.

This naturally can limit the cache-hit ratio in cases where you have content which is requested frequently but not changed for days, weeks or even months. In an AEM as a Cloud Service setup these files are then rendered at least once per day (or more often, see above) per publish/dispatcher, while in other setups (for example AMS or on-prem setups, where long-living dispatcher caches are pretty much the default) they can be delivered from the dispatcher cache without the need to re-render them every day.

The CDN does not have this limitation. It can hold files for days and weeks and deliver them, if the TTL settings allow this. But as you can control the CDN only via TTLs, you have to make a tradeoff between the cache-hit ratio on the CDN and the accuracy of the delivered content with regard to potential changes.

That means:

  • If you have files which do not change, you just set a large TTL on them and let the CDN handle them. A good example are clientlibs (JS and CSS files), because they have a unique name (an additional selector which is created as a hash over the content of the file).
  • If there’s a chance that you make changes to such content (mostly pages), you should set a reasonable TTL (and of course “stale-while-revalidate”) and accept that your publishes need to re-render these pages when the time has passed.

That’s a bit of a drawback of the AEM as a Cloud Service setup, but on the other hand your dispatcher caches are regularly cleared.

by Jörg at October 31, 2022 02:02 PM

October 17, 2022

Things on a content management system - Jörg Hoh

Dispatcher, CDN and Caching

In today’s web performance discussions, there is a lot of focus on the browser as the most important factor. Google defines Core Web Vitals, and there are many other aspects which are important for a fast site. Plus then SEO …

While many developers focus on these, I see that many sites often neglect the importance of proper caching. While many of these sites already use a CDN (in AEM CS a CDN is part of any offering), they often do not use the CDN in an optimal way; this can result in slow pages (because of the network latency) and also unnecessary load on the backend systems.

In this blog post I want to outline some ways how you can optimize your site for caching, with a focus on AEM in combination with a CDN. It does not really matter if it is AEM as a Cloud Service or AEM on AMS or on-premises, these recommendations can be applied to all of them.

Rule 1: Prefer caching at the CDN over caching at the dispatcher

The dispatcher is typically co-located with your AEM instances. That means there can be a high latency from the dispatcher to the end-user, especially if your end-users are spread across the globe. For example the average latency between Frankfurt/Germany and Sydney/Australia is approximately 250ms, and that makes browsing a website not really fast. Using a decent CDN can reduce these numbers dramatically.

Also a CDN is better suited to handle millions of requests per minute than a bunch of VMs running dispatcher instances, both from a cost perspective and from a perspective of knowhow required to operate at that scale.

That means that your caching strategy should aim for optimal caching at the CDN level. The dispatcher is fine as a secondary cache to handle cache misses or expired cache items. But ideally no end-user request should ever make it through to the dispatcher.

Rule 2: Use TTL-based invalidation

The big advantage of the dispatcher is the direct control of the caching. You deliver your content from the cache until you change that content. Immediately after the change the cache is actively invalidated, and your changed content is delivered. But you cannot use the same approach for CDNs; and while CDNs have made reasonable improvements to reduce the time to actively invalidate content, it can still take minutes.

A better approach is to use a TTL-based (time-to-live) invalidation (or rather: expiration), where every CDN node can decide on its own if a file in the cache is still valid or not. And if the content is too old, it’s getting refetched from the origin (your dispatchers).

Although this approach introduces some latency from the time of content activation to the time all users world-wide are able to see it, such a latency is acceptable in general.

Rule 3: Staleness is not (necessarily) a problem

When you optimize your site, you need not only optimize that every request is requested from the CDN (instead from your dispatchers); but you also should think about what happens if a requested file is expired on the CDN. Ideally it should not matter much.

Imagine that you have a file which is configured with a TTL of 300 seconds. What should happen if this file is requested 301 seconds after it has been stored in the CDN cache? Should the CDN still deliver it (and accept that the user receives a file which can be a bit older than specified), or do you want the user to wait until the CDN has obtained a fresh copy of that file?
Typically you accept that staleness for a moment and deliver the old copy for a while, until the CDN has obtained a fresh copy in the background. Use the “stale-while-revalidate” caching directive to configure this behavior.
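For illustration, a possible Cache-Control header combining both settings (the concrete values are just examples and need to be tuned per content type):

Cache-Control: max-age=300, stale-while-revalidate=3600

This tells the cache to consider the file fresh for 300 seconds, and to keep serving the stale copy for up to another hour while it refetches a new version from the origin in the background.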

Rule 4: Pay attention to the 404s

An HTTP status 404 (“File not found”) is tricky to handle, because by default a 404 is not cached at the CDN. That means that all those requests will hit your dispatcher and eventually even your AEM instances, which are the authoritative source to answer whether such a file exists. But the number of requests an AEM instance can handle is much smaller than the number the dispatchers or even the CDN can handle. And you should reserve these precious resources for doing something more useful than responding with “sorry, the resource you requested is not here”.

For that reason check the 404s and handle them appropriately; you have a number of options for that:

  • Fix incorrect links which are under your control.
  • Create dispatcher rules or CDN settings which handle request patterns which you don’t control, and return a 404 from there.
  • You also have the option to allow the CDN to cache a 404 response.

In any case you should manage the 404s, because they are the most expensive type of requests: you spend resources to deliver “nothing”.

Rule 5: Know your query strings

Query strings have traditionally been used a lot to provide parameters to the server-side rendering process, and you might use that approach as well in your AEM application. But query strings are also used a lot to tag campaign traffic for correct attribution; you might have seen such requests already, they often contain parameters like “utm_source”, “fbclid” etc. These parameters have no impact on the server-side rendering!
Because requests with query strings cannot be cached by default, the CDN and the dispatcher will forward every request containing any query string to AEM. And AEM is again the most scarce resource, and rendering there will again impose the latency hit on your site visitors.

The dispatcher has the ability to remove named query strings from the request, which enables it to serve such requests from the dispatcher cache; that’s not as good as serving these requests from the CDN, but much better than handling them on AEM. You should use that as much as possible.
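As a sketch (the concrete parameter names are just examples), this is configured via the ignoreUrlParams section of the dispatcher farm configuration; parameters matching an “allow” rule are ignored for caching purposes:

/ignoreUrlParams
  {
  /0001 { /glob "*" /type "deny" }
  /0002 { /glob "utm_*" /type "allow" }
  /0003 { /glob "fbclid" /type "allow" }
  }

With such a setup, requests which only differ in the listed tracking parameters can be served from the dispatcher cache instead of being rendered by AEM.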

If you follow these rules, you have the chance not only to improve the user experience for your visitors, but at the same time to make your site much more scalable and resilient against attacks and outages.

by Jörg at October 17, 2022 06:58 AM

July 05, 2022

Things on a content management system - Jörg Hoh

What’s the maximum size of a node in JCR/AEM?

An interesting question which comes up every now and then is: “Is there a limit to how large a JCR node can get?”. And as always in IT, the answer is not that simple.

In this post I will answer that question and also outline why this limit is hardly a constraint in AEM development. Also I will show ways how you can design your application so that this limit is not a problem at all.

(Allow me a personal note here: For me the most interesting part of that question is the motivation behind it. When this question is asked I typically have the impression that the folks know that they are a bit off-limits here, because this is a topic which is discussed very rarely (if at all). That means they know that they (plan to) do something which violates some good practices, and for that reason they seek re-assurance. For me this always leaves the question: Why do they do it then? Because when you follow the recommended ways and content architecture patterns, you will never hit such a limit.)

We first have to distinguish between binaries and non-binaries. For binaries there is no real limit as they are stored in the blobstore. You can put files with 50GB in size there, not a problem. Such binaries are represented either using the nodetype “nt:file” (used most often) or using binary properties (rarely used).

And then there is the non-binary data. This data comprises all other node- and property-types, where the information is stored within the nodestore (often also as multi-value properties). Here there are limits.

In AEM CS MongoDB is used as data storage, and the maximum size of a MongoDB document is 16 Megabytes. As an approximation (it’s not always the case), you can assume that a single JCR node with all its properties is stored in a single MongoDB document, which directly results in a maximum size per node: 16 Megabytes.

In reality a node cannot get that large, because other data is also stored inside that document. I recommend never storing more than 1 Megabyte of non-binary properties inside a single node. Technically you don’t have that limit in a TarMK/SegmentTar-only setup, but I would not exceed it there either: you will run into all kinds of interesting problems, and there is barely any experience with such large nodes in the AEM world.

If you actually violate this limit in the size of a document, you get this very nasty exception and your content will not be stored:

javax.jcr.RepositoryException: OakOak0001: Command failed with error 10334 (BSONObjectTooLarge): 'BSONObj size: 17907734 (0x1114016) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: "7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history"' on server cmgbr9sharedcluster51rt-shard-00-01.xxxxx:27017. The full response is {"operationTime": {"$timestamp": {"t": 1656435709, "i": 87}}, "ok": 0.0, "errmsg": "BSONObj size: 17907734 (0x1114016) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: \"7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history\"", "code": 10334, "codeName": "BSONObjectTooLarge", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1656435709, "i": 87}}, "signature": {"hash": {"$binary": "MXahc2R2arLq+rc41fRzIFKzRAw=", "$type": "00"}, "keyId": {"$numberLong": "7059363699751911425"}}}} [7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history]
at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:250) [org.apache.jackrabbit.oak-api:1.42.0.T20220608154910-4c59b36]

But is this really a limit which is hurting AEM developers and customers? Actually I don’t think so. And there are at least 2 good reasons why I believe this:

  • Pages barely have that much content stored in a single component (be it the jcr:content node or any component beneath it), and the same applies to assets. The few instances where I have seen this exception happened because a lot of “data” was stored inside properties (e.g. complete files), which would have better been stored in “nt:file” nodes as binaries.
  • Since version 1.0 Oak logs a warning if it needs to index properties larger than 100 Kilobytes, and I have rarely seen this warning in the wild. There are a few prominent examples in AEM itself where this warning is written for nodes in /libs.

So the best way to find out if you are close to run into this problem with the total size of the documents is to check the logs for this warning:

05.07.2022 09:31:57.326 WARN [async-index-update-fulltext-async] org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker String length: 116946 for property: imageData at Node: /libs/wcm/core/content/editors/template/tour/content/items/third is greater than configured value 102400

Having these warnings in the logs means that you should pay attention to them; in this case it’s not a problem, because this property is unlikely to get any larger over time. But you should pay attention to those properties which can grow over time.
(Note that there is no warning if you have many smaller properties which in sum hit the limit of the MongoDB document.)

How to mitigate?

As mentioned above it’s hard to come up with cases where this is actually a problem, especially if you are developing in line with the AEM guidelines. The only situation where I can imagine this limit to be a problem is when a lot of data is stored within a node which is to be consumed by custom logic. But in this case you own both the data and the logic, and therefore you have the chance to change the implementation in a way that this situation does not occur anymore.

When you design your content and data structure, you should be aware of this limit and not store more than 1 Megabyte within a single node. There is no workaround once you get that exception; the only way to make it work again is to fix the data structure and the code for it. There are 2 approaches:

  • Split the data across more nodes, ideally in a tree-ish way where you can also use application knowledge to store it in an intuitive (and often faster) way.
  • If you just have a single property which is that large, you could also try to convert it into a binary property. This is much simpler, as in the majority of cases you just need to change the type of the property from String to Binary. The type conversions are done implicitly, but if you store actual string data you should take care of the encoding (see the sketch after this list).
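As a rough sketch of the second approach (the path, property name and class name are made up for this example), storing string data explicitly as a binary with a defined encoding could look like this:

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Session;

public class LargePropertyExample {

    public void storeAsBinary(Session session, String largeString) throws Exception {
        Node node = session.getNode("/content/myapp/data");
        // create the binary explicitly from UTF-8 bytes, so the encoding is under control
        Binary binary = session.getValueFactory()
                .createBinary(new ByteArrayInputStream(largeString.getBytes(StandardCharsets.UTF_8)));
        node.setProperty("payload", binary);
        session.save();
    }
}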

Now you know the answer to the question “What’s the maximum size of a node in JCR/AEM” and why it should never be a problem for you. I also outlined ways how you can avoid hitting this limit at all, by choosing an appropriate node structure or by storing large data in binaries instead of string properties.

Happy developing and I hope you never encounter this situation!

by Jörg at July 05, 2022 12:03 PM

June 20, 2022

Things on a content management system - Jörg Hoh

Sling Scheduled Jobs vs Sling Scheduler

Apache Sling and AEM provide 2 different approaches to start processes at a given time or in a given interval. It is not always trivial to make the right decision between these two, and I have seen a few cases of misuse already. Let’s dive into this topic and I will outline in what situation to use the Scheduler and when to use Scheduled Jobs.

Let me outline the differences between the two with a simple comparison:

  • Timing is persisted across restarts: Scheduled Job: Yes / Scheduler: No
  • Start a job via OSGI annotations: Scheduled Job: No / Scheduler: Yes
  • Start a job via API: Scheduled Job: Yes / Scheduler: Yes
  • Trigger on every cluster node: Scheduled Job: Yes (job execution then follows regular Sling Jobs rules) / Scheduler: Yes
  • Trigger just once per cluster: Scheduled Job: Yes (job execution then follows regular Sling Jobs rules) / Scheduler: Yes (only on cluster leader possible)

Comparison between Scheduled Jobs and Scheduler

To get a better understanding of these 2 distinct features, I cover 2 use cases and list which feature is the better match for each.

Execute code exactly once at a given time

That’s a case for the Scheduled Job. Even if the job fails because the executing Sling instance goes down, it will be re-scheduled and tried again.

Here the exactly once semantics means that this is a single job with a global scope. Missing it is not an option. It might be delayed if the triggering date is missed or the execution is aborted, but it will be executed as soon as possible after the scheduled time has passed.
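A minimal sketch of scheduling such a job via the Sling Jobs API (the job topic, property names and class name are made up for this example):

import java.util.Date;
import java.util.Map;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

@Component(service = NewsletterScheduler.class)
public class NewsletterScheduler {

    @Reference
    private JobManager jobManager;

    public void scheduleSendout(Date when) {
        // the scheduled job is persisted and survives instance restarts
        jobManager.createJob("com/acme/newsletter/send")
                .properties(Map.of("newsletterId", "summer-campaign"))
                .schedule()
                .at(when)
                .add();
    }
}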

Periodic jobs which affect just a single AEM instance

Use the Scheduler whenever you execute periodic/incremental jobs like cleanups, data imports etc. It’s not a problem if an execution is missed, as long as the job runs at the next scheduled time (or is triggered during startup if necessary).
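As an illustration, a minimal sketch of a periodic task registered via the Sling Commons Scheduler whiteboard pattern (the cron expression and class name are made up for this example):

import org.osgi.service.component.annotations.Component;

@Component(service = Runnable.class,
        property = {
                "scheduler.expression=0 0 2 * * ?",        // every night at 02:00
                "scheduler.concurrent:Boolean=false"       // never run two executions in parallel
        })
public class NightlyCleanupTask implements Runnable {

    @Override
    public void run() {
        // perform the periodic cleanup; a missed run is simply picked up at the next execution
    }
}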

Another note on choosing the right approach: You should not need to create a Scheduled Job during startup (that means on the startup of services); for these cases it’s normally better to use the Scheduler. There might be rare cases where a Scheduled Job is the right solution here, but in the majority of cases you should just use the Scheduler.

A word of caution when you use Scheduled Jobs with a periodic pattern: As they are persisted, you need to un-register them when you don’t need them anymore.

by Jörg at June 20, 2022 05:39 PM

March 17, 2022

Things on a content management system - Jörg Hoh

How to analyze “Authentication support missing”

Errors and problems in running software often manifest in very interesting and non-obvious ways. A problem in location A manifests itself only with a seemingly unrelated error message in a different location B.

We also have one example of such a situation in AEM, and that’s the famous “Authentication support missing” error message. I often see the question “I got this error message; what should I do now?”, and so I decided: it’s time to write a blog post about it. Here you are.

“Authentication support missing” is actually not even the real problem: there is no authentication module available, so you cannot authenticate, but in 99.99% of the cases this is just a symptom. The default AEM authentication depends on a running SlingRepository service, and a running Sling repository has a number of dependencies itself.

I want to highlight 2 of these dependencies, because they tend to cause problems most often: the Oak repository and the RepositoryInitializer service. Both must start and run successfully before the SlingRepository service can be registered. Let’s look into each of these dependencies.

The Oak repository

The Oak repository is a quite complex system in itself, and there are many reasons why it might not start. To name a few:

  • Consistency problems with the repository files on disk (for whatever reasons), permission problems on the filesystem, full disks, …
  • Connectivity issues towards the storage (especially if you use a database or mongodb as storage)
  • Messed up configuration

If you have an “Authentication support missing” message, your first check should be on the Oak repository, typically via the AEM error.log. If you find an ERROR message logged by any “org.apache.jackrabbit.oak” class during the startup, this is most likely the culprit. Investigate from there.

Sling Repository Initializer (a.k.a. “repoinit”)

Repoinit is designed to ensure that a certain structure in the repository is provided even before any consumer accesses it. All of the available scripts must be executed, and any failure will immediately terminate the startup of the SlingRepository service. Check also my latest blog post on the Sling Repository Initializer for details on how to prevent such problems.

Repoinit failures are typically quite prominent in the AEM error.log, just search for an ERROR message starting with this:

*ERROR* [Apache SlingRepositoryStartup Thread #1] com.adobe.granite.repository.impl.SlingRepositoryManager Exception in a SlingRepositoryInitializer, SlingRepositoryservice registration aborted …

These are the 2 biggest contributors to the “Authentication support missing” error message. Of course there are more reasons why it could appear, but to be honest, I have only seen these 2 cases in the last years.

I hope that this article helps you to investigate such situations more swiftly.

by Jörg at March 17, 2022 04:41 PM

March 11, 2022

Things on a content management system - Jörg Hoh

How to deal with RepoInit failures in Cloud Service

Some years ago, even before AEM as a Cloud Service, the RepoInit language was implemented as part of Sling (and AEM) to create repository structures directly on the startup of the JCR repository. With it your application can rely on some well-defined structures always being available.

In this blog post I want to walk you through a way how you can test repoinit statements locally and avoid pipeline failures because of it.

Repoinit statements are deployed as part of OSGI configurations; and that means that during the development phase you can work in an almost interactive way with it. Also exceptions are not a problem; you can fix the statement and retry.

The situation is much different when you already have repoinit statements deployed and you start up your AEM (to be exact: the Sling Repository service) again. In this case all repoinit statements are executed as part of the startup of the repository, and any exception in the execution of repoinit will stop the startup of the repository service and render your AEM unusable. In the case of CloudManager and AEM as a Cloud Service this will break your deployment.

Let me walk you through 2 examples of such an exception and how you can deal with it.

*ERROR* [Apache SlingRepositoryStartup Thread #1] com.adobe.granite.repository.impl.SlingRepositoryManager Exception in a SlingRepositoryInitializer, SlingRepositoryservice registration aborted java.lang.RuntimeException: Session.save failed: javax.jcr.nodetype.ConstraintViolationException: OakConstraint0025: /conf/site/configuration/favicon.ico[[nt:file]]: Mandatory child node jcr:content not found in a new node 
at org.apache.sling.jcr.repoinit.impl.AclVisitor.visitCreatePath(AclVisitor.java:167) [org.apache.sling.jcr.repoinit:1.1.36] 
at org.apache.sling.repoinit.parser.operations.CreatePath.accept(CreatePath.java:71)

In this case the exception describes quite in detail what actually went wrong. It failed when saving, and it says that /conf/site/configuration/favicon.ico (of type nt:file) was affected. The problem is that the mandatory child node “jcr:content” is missing.

Why is it a problem? Because every node of nodetype “nt:file” requires a “jcr:content” child node which actually holds the binary.

This is a case which you can detect very easily also on a local environment.

Which leads to the first recommendation:

When you develop in your local environment, you should apply all repoinit statements to a fresh environment, in which there are no manual changes. Because otherwise your repoinit statements rely on the presence of some things which are not provided by the repoinit scripts.

Having a mix of manual changes and repoinit on a local development environment and then moving the repoinit statements over untested often leads to failures in the CloudManager pipelines.

The second example is a very prominent one, and I see it very often:

[Apache SlingRepositoryStartup Thread #1] com.adobe.granite.repository.impl.SlingRepositoryManager Exception in a SlingRepositoryInitializer, SlingRepositoryservice registration aborted java.lang.RuntimeException: Failed to set ACL (java.lang.UnsupportedOperationException: This builder is read-only.) AclLine DENY {paths=[/libs/cq/core/content/tools], privileges=[jcr:read]} 
at org.apache.sling.jcr.repoinit.impl.AclVisitor.setAcl(AclVisitor.java:85)

It’s the well-known “This builder is read-only” version. To understand the problem and its resolution, I need to explain a bit the way the build process assembles AEM images in the CloudManager pipeline.

In AEM as a Cloud Service you have an immutable part of the repository, which consists of the trees “/libs” and “/apps”. They are immutable because they cannot be modified at runtime, not even with admin permissions.

During build time this immutable part of the image is built. This process merges both product side parts (/libs) and custom application parts (/apps) together. After that also all repoinit scripts run, both the ones provided by the product as well as any custom one. And of course during that part of the build these parts are writable, thus writing into /apps using repoinit is not a problem.

So why do you actually get this exception, when /libs and /apps are writable at that point? Because repoinit is executed a second time: during the “final” startup, when /apps and /libs are immutable.

Repoinit is designed around that idea, that all activities are idempotent. This means that if you want to create an ACL on /apps/myapp/foo/bar the repoinit statement is a no-op if that specific ACL already exists. A second run of repoinit will do nothing, but find everything still in place.

But if in the second run the system has to execute this action again, it’s not a no-op anymore. This means that the ACL is not there as expected, or whatever else the goal of that repoinit statement was.

And there is only one reason why this happens: there was some other action between these 2 executions of repoinit which changed the repository. The only other thing which modifies the repository is the installation of content packages.

Let’s illustrate this problem with an example. Imagine you have this repoinit script:

create path /apps/myapp/foo/bar
set acl on /apps/myapp/foo/bar
  allow jcr:read for mygroup
end

And you have a content package which comes with content for /apps/myapp, with its filter set to “overwrite”, but which does not contain this ACL.

In this case the operations leading to this error are these:

  • Repoinit sets the ACL on /apps/myapp/foo/bar
  • the deployment overwrites /apps/myapp with the content package, so the ACL is wiped
  • AEM starts up
  • Repoinit wants to set the ACL on /apps/myapp/foo/bar, which is now immutable. It fails and breaks your deployment.

The solution to this problem is simple: You need to adjust the repoinit statements and the package definitions (especially the filter definitions) in a way that the package installation does not wipe and/or overwrite any structure created by repoinit. And with “structure” I do not refer only to nodes, but also to nodetypes, properties etc. All must be identical, and in the best case they don’t interfere at all.

It is hard to validate this locally, as you don’t have an immutable /apps and /libs, but there is a test approach which comes very close to it:

  • Run all your repoinit statements in your local test environment
  • Install all your content packages
  • Enable write tracing (see my blog post)
  • Re-run all your repo-init statements.
  • Disable write tracing again

During the second run of the repoinit statements you should not see any write in the trace log. If you have any write operation, it’s a sign that your packages overwrite structures created by repoinit. You should fix these asap, because they will later break your CloudManager pipeline.

With this information at hand you should be able to troubleshoot any repoinit problems already on your local test environment, avoiding pipeline failures because of it.

by Jörg at March 11, 2022 11:28 AM

February 03, 2022

Things on a content management system - Jörg Hoh

The deprecation of Sling Resource Events

Sling events are used for many aspects of the system, and initially JCR changes were sent with them as well. But the OSGi eventing (which Sling events are built on top of) is not designed for huge volumes of events (thousands per second), a situation which can happen with AEM; and one of the most compelling reasons to get away from this approach is that all these event handlers (both resource change events and all others) share a single thread-pool.

For that reason the ResourceChangeListeners have been introduced. Here each listener provides detailed information about which changes it is interested in (restricted by path and type of the change); therefore Sling is able to optimise the listeners on the JCR level and does not listen for changes no one is interested in. This can reduce the load on the system and improve the performance.
For this reason the usage of OSGi resource event listeners is deprecated (although they still work as expected).

How can I find all the ResourceChangeEventListeners in my codebase?

That’s easy, because on startup for each of these ResourceChangeEventListeners you will find a WARN message in the logs like this:

Found OSGi Event Handler for deprecated resource bridge: com.acme.myservice

This will help you to identify all these listeners.

How do I rewrite them to ResourceChangeListeners?

In the majority of cases this should be straightforward. Make your service implement the ResourceChangeListener interface and provide these additional OSGi properties:

@Component(
    service = ResourceChangeListener.class,
    configurationPolicy = ConfigurationPolicy.IGNORE,
    property = {
        ResourceChangeListener.PATHS + "=/content/dam/asset-imports",
        ResourceChangeListener.CHANGES + "=ADDED",
        ResourceChangeListener.CHANGES + "=CHANGED",
        ResourceChangeListener.CHANGES + "=REMOVED"
})
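
The listener implementation itself then receives the changes via the onChange callback; a minimal sketch (the class name is made up, the path matches the properties above):

import java.util.List;
import org.apache.sling.api.resource.observation.ResourceChange;
import org.apache.sling.api.resource.observation.ResourceChangeListener;

public class AssetImportListener implements ResourceChangeListener {

    @Override
    public void onChange(List<ResourceChange> changes) {
        for (ResourceChange change : changes) {
            // react to ADDED/CHANGED/REMOVED events below /content/dam/asset-imports
            handle(change.getType(), change.getPath());
        }
    }

    private void handle(ResourceChange.ChangeType type, String path) {
        // application-specific handling goes here
    }
}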

With this switch you allow resource events to be processed separately in an optimised way; they no longer block other OSGi events.

by Jörg at February 03, 2022 06:58 PM

January 21, 2022

Things on a content management system - Jörg Hoh

How to handle errors in Sling Servlet requests

Error handling is a topic which developers rarely pay much attention to. It is done when the API forces them to handle an exception. And the most common pattern I see is the “log and throw” pattern, which means that the exception is logged and then re-thrown.

When you develop in the context of HTTP requests, error handling can get tricky, because you need to signal to the consumer of the response that an error happened and the request was not successful. Frameworks are designed in a way that they handle any exception internally and set the correct error code if necessary. And Sling is no different: if your code throws an exception (for example in the postConstruct of a Sling Model), the Sling framework catches it and sets the correct status code 500 (Internal Server Error).

I’ve seen code which catches exceptions itself and sets the status code on the response itself. But this is not the right approach, because with every exception handled this way the developer implicitly states: “These are my exceptions and I know best how to handle them”; almost as if the developer takes ownership of these exceptions and their root causes, and as if there were nothing which could handle this situation better.

This approach to handle exceptions on its own is not best practice, and I see 2 problems with it:

  • Setting the status code alone is not enough; the remaining parts of the request processing need to be stopped as well. Otherwise the processing continues as if nothing happened, which is normally not useful or even allowed. It’s hard to ensure this when the exception is caught.
  • Owning the exception handling removes the responsibility from others. In AEM as a Cloud Service Adobe monitors response codes and the exceptions causing them. And if there’s only a status code 500 but no exception reaching the SlingMainServlet, then it’s likely that this is ignored, because the developer claimed ownership of the exception (handling).

If you write a Sling servlet or code operating in the context of a request, it is best practice not to catch exceptions, but to let them bubble up to the Sling Main Servlet, which is able to handle them appropriately. Handle exceptions yourself only if you have a better way to deal with them than just logging them.
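For illustration, a minimal sketch of a servlet following this practice (the resource type and class name are made up for this example); a runtime exception is wrapped and re-thrown so that Sling’s error handling can respond with the proper status code:

import java.io.IOException;
import javax.servlet.Servlet;
import javax.servlet.ServletException;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

@Component(service = Servlet.class,
        property = {"sling.servlet.resourceTypes=myapp/components/export"})
public class ExportServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws ServletException, IOException {
        try {
            response.getWriter().write(buildExport(request));
        } catch (RuntimeException e) {
            // do not swallow the exception; let the Sling error handling produce the 500
            throw new ServletException("Export failed for " + request.getResource().getPath(), e);
        }
    }

    private String buildExport(SlingHttpServletRequest request) {
        return request.getResource().getPath();
    }
}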

by Jörg at January 21, 2022 07:27 PM

January 05, 2022

Things on a content management system - Jörg Hoh

How to deal with the “TooManyCallsException”

Every now and then I see the question “We get the TooManyCallsException while rendering pages, and we need to increase the threshold for the number of inclusions to 5000. Is this a problem? What can we do so we don’t run into this issue at all?”

Before I answer this question, I want to explain the background of this setting, why it was introduced and when such a “Call” is made.

Sling rendering is based on servlets; and while a single servlet can handle the rendering of the complete response body, that is not that common in AEM. AEM pages normally consist of a variety of different components, which internally can consist of distinct subcomponents as well; this depends on the design approach the development team has chosen.
(It should be mentioned that all JSPs and all HTL scripts are compiled into regular Java servlets.)

That means that the rendering process can be considered a tree of servlets, with servlets calling other servlets (and the DefaultGetServlet being the root of such a tree when rendering pages). This tree is structured along the resource tree of the page, but it can include servlets which render content from different areas of the repository, for example when dealing with content fragments or when including images whose metadata needs to be respected.

It is possible to turn this tree into a cyclic graph, which means that the process of traversing this tree of servlets turns into an endless recursion. In that case request processing will never terminate, the Jetty thread pool will quickly fill up to its limit, and the system will become unavailable. To avoid this situation only a limited number of servlet calls per request is allowed. And that’s this magic number of 1000 allowed calls (which is configured in the Sling Main Servlet).

Knowing this, let me try to answer the question “Is it safe to increase this value of 1000 to 5000?“. Yes, it is safe. In case your page rendering process goes recursive it just terminates later, which slightly increases the risk of your AEM instance becoming unavailable.


“Are there any drawbacks? Why is the default 1000 and not 5000 (or 10000 or any higher value)?” From experience 1000 is sufficient for the majority of applications. It might be too low for applications where the components are designed in a very granular way, which in turn requires a lot of servlet calls to properly render a page.
And every servlet call comes with a small overhead (mostly for running the component-level filters); and even if this overhead is just 100 microseconds, 1000 invocations are 100 ms just for the invocation overhead. That means you should find a good balance between a clean application modularization and the runtime performance overhead of it.

Which leads to the next question: “What are the problematic calls we should think of?“. Good one.
From a high-level view of AEM page rendering, you cannot avoid the servlet calls which render the components. That means that you as an AEM application developer cannot influence the overall page rendering process; you can only try to optimise the rendering of individual (custom) components.
To optimise these, you should be aware that the following things trigger the invocation of a servlet during page rendering:

  • the <cq:include>, <sling:include> and <sling:forward> JSP tags
  • the data-sly-include statement of HTL
  • and every method which invokes directly or indirectly the service() method of a servlet.

A good way to check this for some pages is the “Recent requests” functionality of the OSGI Webconsole.

by Jörg at January 05, 2022 04:28 PM

December 01, 2021

Things on a content management system - Jörg Hoh

The web, an eventually consistent system

For many large websites, CDNs are the foundation for delivering content quickly to their customers around the world. The ability of CDNs to cache responses close to consumers also allows these sites to operate on a much smaller hardware footprint than if they operated without a CDN and delivered all content through their own systems. But this comes at a cost: your CDN may now deliver content that is out of sync with your origin, because you changed the content on your origin system and that change does not propagate in an atomic fashion. This is the same “atomic” as in the ACID principle of database implementations.
This is a conscious decision, and it is caused primarily by the CAP theorem. It states that in a distributed data storage system, you can only achieve 2 of these 3 guarantees:

  • Consistency
  • Availability
  • Partition tolerance

And in the case of a CDN (which is a highly distributed data storage system), its developers usually opt for availability and partition tolerance over consistency. That is, they accept delivering content that is out of date because the originating system has already updated it.

The HTTP protocol has features built in which help to mitigate this problem at least partially. Check out the latest RFC draft on HTTP caching, it is a really good read. The main feature is called “TTL” (time-to-live) and means that the CDN delivers a version of the content only for a configured time; afterwards the CDN fetches a new version from the origin system. The technical term for this is “eventually consistent”, because at that point the state of the system with respect to that content is consistent again.

This is the approach all CDNs support, and it works very reliably, but only if you accept that content changed on the origin system will reach your consumers with this delay. The delay is usually set to a period of time that is empirically determined by the website operators, trying to balance the need to deliver fresh content (which requires a very low or no TTL) with the number of requests that the CDN can answer instead of the origin system (for which the TTL should be as high as possible). Usually it is in the range of a few minutes.

(Even if you don’t use a CDN for your origin systems, you need these caching instructions, otherwise browsers will make assumptions and cache the requested files on their own. Browsing the web without caching is slow, even on very fast connections. Not to mention what happens when using a mobile device over a slow 3G line … Eventual consistency is an issue you can’t avoid when working on the web.)

Caching is an issue you will always have to deal with when creating web presences. Try to cache as much as possible without neglecting the need to refresh or update content at a random time.

You need to constantly address eventual consistency. Atomic changes (that means changes are immediately available to all consumers) are possible, but they come at a price. You can’t use CDNs for this content; you must deliver it all directly from your origin system. In this case, you need to design your origin system so that it can function without eventual consistency at all (and that’s built into many systems). Not to mention the additional load it will have to handle.

And for this reason I would always recommend not relying on atomic updates or consistency across your web presence. Always factor in eventual consistency in the delivery of your content. And in most cases even business requirements where “immediate updates” are required can be solved with a TTL of 1 minute. Still not “immediate”, but good enough in 99% of all cases. For the remaining 1% where consistency is mandatory (e.g. real-time stock trading) you need to find a different solution. And I am not sure if the web is always the right technology then.

And as an afterthought regarding TTL: Of course many CDNs offer you the chance to actively invalidate the content, but it often comes with a price. In many cases you can invalidate only single files. Often it is not an immediate action, but takes seconds up to many minutes. And the price is always that you have to have the capacity to handle the load when the CDN needs to refetch a larger chunk of content from your origin system.

by Jörg at December 01, 2021 01:16 PM

November 01, 2021

Things on a content management system - Jörg Hoh

Understanding AEM request processing using the OSGI “Recent Request” console

During some recent work on performance improvements in request processing I used a tool which has been part of AEM for a very long time; I cannot recall a time when it was NOT there. It’s very simple, but nevertheless powerful, and it can help you to understand the processing of requests in AEM much better.

I am talking about the “Recent Requests Console” in the OSGI webconsole, which is a gem in the “AEM performance tuning” toolbox.

In this blog post I use this tool to explain the details of the request rendering process of AEM. You can find the detailed description of this process in the pages linked from this page (Sling documentation).

Screenshot “Recent requests”

With this Recent Requests screen (go to /system/console/requests) you can drill down into the rendering process of the last 20 requests handled by this AEM instance; these are listed at the top of the screen. Be aware that if you have a lot of concurrent requests you might often miss the request you are looking for, so if you really rely on it, you should increase the number of requests which are retained. This can be done via the OSGi configuration of the Sling Main Servlet.

When you have opened a request, you will see a huge number of single log entries. Each log entry contains as its first element a timestamp (in microseconds, 1000 microseconds = 1 millisecond) relative to the start of the request. With this information you can easily calculate how much time passed between 2 entries.

And each request has a typical structure, so let’s go through it using the AEM Start page (/aem/start.html). So just use a different browser window and request that page. Then check back on the “Recent requests console” and select the “start.html”.
In the following I will go through the lines, starting from the top.

      0 TIMER_START{Request Processing}
      1 COMMENT timer_end format is {<elapsed microseconds>,<timer name>} <optional message>
     13 LOG Method=GET, PathInfo=null
     17 TIMER_START{handleSecurity}
   2599 TIMER_END{2577,handleSecurity} authenticator org.apache.sling.auth.core.impl.SlingAuthenticator@5838b613 returns true

This is a standard header for each request. We can see here that the authentication (handleSecurity) took 2577 microseconds.

   2981 TIMER_START{ResourceResolution}
   4915 TIMER_END{1932,ResourceResolution} URI=/aem/start.html resolves to Resource=JcrNodeResource, type=granite/ui/components/shell/page, superType=null, path=/libs/granite/ui/content/shell/start
   4922 LOG Resource Path Info: SlingRequestPathInfo: path='granite/ui/components/shell/page', selectorString='null', extension='html', suffix='null'

Here we see the 2 log lines for the resource resolution process. It took 1932 microseconds to map the request “/aem/start.html” to the resourcetype “granite/ui/components/shell/page”, with the resource path being /libs/granite/ui/content/shell/start. Additionally we see information about the selector, extension and suffix elements.

   4923 TIMER_START{ServletResolution}
   4925 TIMER_START{resolveServlet(/libs/granite/ui/content/shell/start)}
   4941 TIMER_END{14,resolveServlet(/libs/granite/ui/content/shell/start)} Using servlet BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)
   4945 TIMER_END{21,ServletResolution} URI=/aem/start.html handled by Servlet=BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)

That’s a nested servlet resolution, which takes 14 and 21 microseconds respectively. Up to now that’s mostly standard and hard to influence performance-wise, but it already gives you a lot of information, especially regarding the resourcetype which is managing the complete response processing.

   4948 LOG Applying Requestfilters
   4952 LOG Calling filter: com.adobe.granite.resourceresolverhelper.impl.ResourceResolverHelperImpl
   4958 LOG Calling filter: org.apache.sling.security.impl.ContentDispositionFilter
   4961 LOG Calling filter: com.adobe.granite.csrf.impl.CSRFFilter
   4966 LOG Calling filter: org.apache.sling.i18n.impl.I18NFilter
   4970 LOG Calling filter: com.adobe.granite.httpcache.impl.InnerCacheFilter
   4979 LOG Calling filter: org.apache.sling.rewriter.impl.RewriterFilter
   4982 LOG Calling filter: com.adobe.cq.history.impl.HistoryRequestFilter
   7870 LOG Calling filter: com.day.cq.wcm.core.impl.WCMRequestFilter
   7908 LOG Calling filter: com.adobe.cq.wcm.core.components.internal.servlets.CoreFormHandlingServlet
   7912 LOG Calling filter: com.adobe.granite.optout.impl.OptOutFilter
   7921 LOG Calling filter: com.day.cq.wcm.foundation.forms.impl.FormsHandlingServlet
   7932 LOG Calling filter: com.day.cq.dam.core.impl.servlet.DisableLegacyServletFilter
   7935 LOG Calling filter: org.apache.sling.engine.impl.debug.RequestProgressTrackerLogFilter
   7938 LOG Calling filter: com.day.cq.wcm.mobile.core.impl.redirect.RedirectFilter
   7940 LOG Calling filter: com.day.cq.wcm.core.impl.AuthoringUIModeServiceImpl
   8185 LOG Calling filter: com.adobe.granite.rest.assets.impl.AssetContentDispositionFilter
   8201 LOG Calling filter: com.adobe.granite.requests.logging.impl.RequestLoggerImpl
   8212 LOG Calling filter: com.adobe.granite.rest.impl.servlet.ApiResourceFilter
   8302 LOG Calling filter: com.day.cq.dam.core.impl.servlet.ActivityRecordHandler
   8321 LOG Calling filter: com.day.cq.wcm.core.impl.warp.TimeWarpFilter
   8328 LOG Calling filter: com.day.cq.dam.core.impl.assetlinkshare.AdhocAssetShareAuthHandler

These are all request-level filters, which are executed just once per request.

And now the interesting part starts: the rendering of the page itself. The building blocks are called “components” (that term is probably familiar to you) and it always follows the same pattern:

  • Calling Component Filters
  • Executing the Component
  • Return from the Component Filters (in reverse order of the calling)

This pattern can be clearly seen in the output, but most often it is more complicated because many components include other components, and so you end up in a tree of components being rendered.

As an example for the straightforward case we can take the “head” component of the page:

  25849 LOG Including resource MergedResource [path=/mnt/overlay/granite/ui/content/globalhead/experiencelog, resources=[/libs/granite/ui/content/globalhead/experiencelog]] (SlingRequestPathInfo: path='/mnt/overlay/granite/ui/content/globalhead/experiencelog', selectorString='null', extension='html', suffix='null')
  25892 TIMER_START{resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)}
  25934 TIMER_END{40,resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)} Using servlet BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)
  25939 LOG Applying Includefilters
  25943 LOG Calling filter: com.adobe.granite.csrf.impl.CSRFFilter
  25951 LOG Calling filter: com.day.cq.personalization.impl.TargetComponentFilter
  25955 LOG Calling filter: com.day.cq.wcm.core.impl.page.PageLockFilter
  25959 LOG Calling filter: com.day.cq.wcm.core.impl.WCMComponentFilter
  26885 LOG Calling filter: com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter
  26893 LOG Calling filter: com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter
  26896 LOG Calling filter: com.day.cq.wcm.core.impl.WCMDebugFilter
  26899 LOG Calling filter: com.day.cq.wcm.core.impl.WCMDeveloperModeFilter
  28125 TIMER_START{BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)#1}
  46702 TIMER_END{18576,BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)#1}
  46734 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDeveloperModeFilter, inner=18624, total=19806, outer=1182
  46742 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDebugFilter, inner=19806, total=19810, outer=4
  46749 LOG Filter timing: filter=com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter, inner=19810, total=19816, outer=6
  46756 LOG Filter timing: filter=com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter, inner=19816, total=19830, outer=14
  46761 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMComponentFilter, inner=19830, total=20750, outer=920
  46767 LOG Filter timing: filter=com.day.cq.wcm.core.impl.page.PageLockFilter, inner=20750, total=20754, outer=4
  46772 LOG Filter timing: filter=com.day.cq.personalization.impl.TargetComponentFilter, inner=20754, total=20758, outer=4

At the top you see the LOG statement “Including resource …”, which tells you which resource is rendered, including additional information like selector, extension and suffix.

As the next statement we have the resolution of the render script which is used to render this resource, plus the time it took (40 microseconds).

Then we have the invocation of all component filters, the execution of the render script itself, which is using a TIMER to record start time, end time and duration (18576 microseconds), and the unwinding of the component filters.

If you use a recent version of the SDK for AEM as a Cloud Service, all timestamps are in microseconds, but in AEM 6.5 and older the durations measured for the filters (inner=…, outer=…) were printed in milliseconds (which is an inconsistency I just fixed recently).

If a component includes another component, it looks like this:

8350 LOG Applying Componentfilters
   8358 LOG Calling filter: com.day.cq.personalization.impl.TargetComponentFilter
   8361 LOG Calling filter: com.day.cq.wcm.core.impl.page.PageLockFilter
   8365 LOG Calling filter: com.day.cq.wcm.core.impl.WCMComponentFilter
   8697 LOG Calling filter: com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter
   8703 LOG Calling filter: com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter
   8733 LOG Calling filter: com.day.cq.wcm.core.impl.WCMDebugFilter
   8750 TIMER_START{BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)#0}
  25849 LOG Including resource MergedResource [path=/mnt/overlay/granite/ui/content/globalhead/experiencelog, resources=[/libs/granite/ui/content/globalhead/experiencelog]] (SlingRequestPathInfo: path='/mnt/overlay/granite/ui/content/globalhead/experiencelog', selectorString='null', extension='html', suffix='null')
  25892 TIMER_START{resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)}
  25934 TIMER_END{40,resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)} Using servlet BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)
  25939 LOG Applying Includefilters
[...]
148489 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDeveloperModeFilter, inner=1698, total=1712, outer=14
 148500 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDebugFilter, inner=1712, total=1717, outer=5
 148509 LOG Filter timing: filter=com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter, inner=1717, total=1722, outer=5
 148519 LOG Filter timing: filter=com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter, inner=1722, total=1735, outer=13
 148527 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMComponentFilter, inner=1735, total=2144, outer=409
 148534 LOG Filter timing: filter=com.day.cq.wcm.core.impl.page.PageLockFilter, inner=2144, total=2150, outer=6
 148543 LOG Filter timing: filter=com.day.cq.personalization.impl.TargetComponentFilter, inner=2150, total=2154, outer=4
 148832 TIMER_END{140080,BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)#0}

You see the component filters, but then after the TIMER_START line for the page.jsp (check the trailing timer number: #0, every timer has a unique ID!) you see the inclusion of a new resource. For this again the render script is resolved, and instead of the ComponentFilters the IncludeFilters are called, but in the majority of cases the list of filters is identical. Depending on the resource structure and the scripts, the rendering tree can get really deep. But eventually you can see that the rendering of the page.jsp is completed; you can easily find it by looking for the respective timer ID.

Equipped with this knowledge you can now easily dig into the page rendering process and see which resources and resource types are part of the rendering of a page. And if you are interested in the bottlenecks of the page rendering process, you can check the TIMER_END lines, which include both the rendering script and the time in microseconds it took to render it (be aware that this time also includes the time to render all scripts invoked from this render script).

But the really cool part is that this is extensible. Via the RequestProgressTracker you can easily write your own LOG statements, start timers etc. So if you want to debug requests to better understand the timing, you can easily use something like this:

slingRequest.getRequestProgressTracker().log("Checkpoint A");

And then you can find this log message in this screen when this component is rendered. You can use it to output useful (debugging) information or just use its timestamp to identify performance problems. This can be superior to normal logging (to a logfile), because you can leave these statements in production code and they won’t pollute the log files. You just need access to the OSGi webconsole, search for the request you are interested in, and check the rendering process.
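The same API also offers named timers, which show up as TIMER_START/TIMER_END entries; a small sketch (the timer name is made up for this example):

RequestProgressTracker tracker = slingRequest.getRequestProgressTracker();
tracker.startTimer("myComponent-expensiveLookup");
// ... the expensive part of the component logic ...
tracker.logTimer("myComponent-expensiveLookup");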

And if you are interested, you can also get all entries shown in this screen programmatically and do whatever you like with them. For example you can write a (request-level) filter which first calls the next filter, and afterwards logs all entries of the RequestProgressTracker to the logfile if the request processing took more than 1 second.
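A minimal sketch of such a filter (the threshold, logger usage and class name are illustrative):

import java.io.IOException;
import java.util.Iterator;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import org.apache.sling.api.SlingHttpServletRequest;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Component(service = Filter.class, property = {"sling.filter.scope=REQUEST"})
public class SlowRequestLoggingFilter implements Filter {

    private static final Logger LOG = LoggerFactory.getLogger(SlowRequestLoggingFilter.class);

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        long start = System.currentTimeMillis();
        chain.doFilter(request, response);
        long duration = System.currentTimeMillis() - start;
        if (duration > 1000 && request instanceof SlingHttpServletRequest) {
            // dump all RequestProgressTracker entries for this slow request
            Iterator<String> messages =
                    ((SlingHttpServletRequest) request).getRequestProgressTracker().getMessages();
            while (messages.hasNext()) {
                LOG.info("slow request: {}", messages.next());
            }
        }
    }

    @Override
    public void init(FilterConfig filterConfig) {
        // nothing to do
    }

    @Override
    public void destroy() {
        // nothing to do
    }
}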

The RequestProgressTracker plus the “Recent Requests” screen of the OSGi webconsole are a really cool combination: they help you to understand the inner workings of the Sling request processing, and they are also a huge help to analyze and understand its performance.

I hope that this technical deep dive into the Sling page rendering process was helpful for you, and that you are able to spot many interesting aspects of an AEM system just by using this tool. If you have questions, please leave me a comment below.

by Jörg at November 01, 2021 03:22 PM

October 25, 2021

CQ5 Blog - Inside Solutions

Cloud Manager: Deploy and Operate AEM Cloud Service

Cloud Manager: Deploy and Operate AEM Cloud Service

Cloud Manager is an integral part of Adobe’s AEM as a Cloud Service (AEMaaCS) offering. 

Cloud Manager provides a fully-featured Continuous Integration / Continuous Delivery (CI/CD) pipeline, enabling organisations to build, test, and deploy their AEM applications to the Adobe Cloud automatically. 

Hosting, operation, and scaling of Adobe Experience Manager are all managed by Adobe in the background, including an SLA. Maintenance of Cloud Manager and upgrading of AEM are taken care of by Adobe as well.

Cloud Manager benefits smaller projects with its extensive out-of-the-box build pipeline and stable deployments that promise zero downtime. Larger projects can free up resources in their devops and operations teams, which no longer have to focus on the intricacies of deploying and hosting AEM. 

Lastly, overall system performance, stability and availability are improved since no one will know how to build and host Adobe Experience Manager better than Adobe.

Overall, Cloud Manager is a great cost and time saver due to a lot of functionality which is provided and maintained by Adobe. 

We will explore and highlight the main functionalities so that you understand the tool and the reasons why we, at One Inside, think it’s so great.

What is Adobe Cloud Manager?

Adobe Cloud Manager allows self-managed deployments and operation of AEM Cloud Service.

It consists of a CI/CD pipeline, various environments, code repositories and further information about the system like logs or SLA reports.

Log in to Adobe Cloud Manager

To log in to Cloud Manager, go to experience.adobe.com (Experience Manager / Launch Cloud Manager).

If you do not have access, either your company does not yet have the AEM Cloud Service licenses or your account is lacking the required permissions.

You can find the most important information about the environments and pipelines on the Startpage and have access to more detailed information. 

Cloud Manager Core Features

Cloud Manager has the following main features:

  • Self-Service web interface for the deployments and AEM operation
  • Cloud Manager functionality can also be accessed programmatically via API
  • Fully automated and configurable CI/CD Pipeline
  • Provisioning and configuration of productive and test environments
  • Adobe hosted git repositories
  • Automated quality assurance of the application (code quality, security and performance)
  • Autoscaling of both AEM author as well as publish Instances
  • Multitier Caching Architecture including global Akamai Cache

Benefits and Disadvantages of Adobe Cloud Manager

These outlined core features result in a great set of benefits when using Cloud Manager:

  • Performance – Great performance can be expected thanks to the global CDN and the possibility to run the AEM servers in one of the globally distributed Azure datacenters (support for AWS is on the roadmap). AEM hosting by Adobe helps guarantee optimal AEM performance and continuous improvements.
  • Autoscaling – When subjected to unusually high load, Cloud Manager detects the need for additional resources and automatically brings additional instances online via autoscaling. This works for both authoring and publishing instances.
  • Confidence in deployments – since the same pipeline is executed by all AEMaaCS customers, Adobe can optimise the reliability of the pipeline and deployments. After ten successful deployments, the customer can usually independently carry out deployments without involving Adobe Customer Service at all.
  • Extensibility of the pipeline – Cloud Manager is integrated into the Experience Cloud APIs and is therefore easy to connect or integrate with other or custom services.
  • Backup – Cloud Manager automatically creates a backup before every release. If any issue is noticed after the deployment, the release can be rolled back with the press of a button. The production instances are backed up as well (24h point-in-time recovery, up to 7 days with Adobe-defined timestamps).
  • “Zero” Downtime – Adobe has a lot of experience in hosting AEM for its large customer base. This allows Adobe to achieve great availability and you can expect basically zero downtime. Need proof? Adobe’s SLA of 99.9%.
  • Very low initial setup time – Basically a “1 click setup” for environments and the default pipeline. Certificates and domains are also set up quickly via the UI.
  • Very low maintenance and operation costs – Adobe takes care of maintaining the pipeline, upgrading AEM, providing security fixes for the OS, and operating all systems (Cloud Manager, Apache / Dispatcher, AEM instances, Akamai CDN etc).
  • Always up to date AEM – Adobe releases new versions almost weekly or even more frequently if there is a very urgent security fix. The moment the new features or fixes are available you will see them on your AEM instances! Your security team will be very happy to hear that.
    For on-premises versions, new features will only be available approximately 6 months after their release (except security fixes which usually come with service packs).

As always there are some drawbacks, but the benefits far outweigh them and there are ways to work around them:

  • Less flexibility – The pipeline and architecture is, to a certain degree, predefined. For example, it’s no longer possible to install additional OS level applications or use a different caching solution like Varnish. The Adobe I/O runtime or an external environment has to be used to provide additional services instead.
  • Limited customisability of AEM – It’s no longer possible to extend AEM freely. Some customisation is still possible, especially if the developers get creative, but not everything. Since this will be a win for maintainability, this could almost be regarded as an advantage.
  • Less control – Since Adobe takes responsibility for running the services and provides an SLA there are certain limitations.
    For example, it’s not possible to log in on the publisher’s website, no admin password is available, and Felix Console access is blocked. Especially the last one is a concern for AEM developers and hinders the ability to debug issues on productive systems.
    These issues are somewhat alleviated since Cloud Manager allows you to extract certain information like log files. On the authoring instance, some tools are still accessible (e.g. /crx/de). Cloud Manager also provides additional utilities, like viewing the bundle status (those will probably be expanded on in the future).

Release a new version of AEM in the cloud (CI/CD pipeline)

The Cloud Manager CI/CD pipeline brings the code from the repository to a built application on your productive Adobe Experience Manager environment.

There are two types of environments; each environment consists of the full AEM stack (author, publish, dispatcher). The “Production” type is a single pair consisting of a “Stage” and a “Prod” environment. 

Every “Production” deployment first goes to “Stage” where it is analysed and can be further inspected manually before being approved and deployed to “Production”.

All other environments are referred to as “Non-Production” and are used as test environments. New test environments can be provisioned as needed.

The pipeline runs are shown in the UI and additional logs of each execution step can be downloaded to debug any issues. There are several steps in the pipeline explained as follows:

1 – Code in Adobe Git

Cloud Manager allows git repositories hosted by Adobe. The pipeline can only fetch code from those repositories. The pipeline can be triggered on commit on a certain branch or triggered manually.

To use an external non-Adobe repository, the changes have to be synchronised with the Adobe repository (this can easily be automated with various CI/CD tools like Github Actions, Bitbucket Pipelines or Jenkins).

2 – Build Code and Unit Tests

The project is built by executing the Maven build, including executing the unit tests. The result is the “Release” build.

3 – Code Scanning

This inspects the whole code base and applies static code analysis.

There are several rule sets for different topics like test coverage, potential security issues or maintainability in the context of AEM. Each topic is rated and a recommendation is given by Adobe. 

These recommendations are set up quite reasonably at 50% coverage. The goal of each project should be to reach those numbers.

4 – Deploy to Stage

The code is now deployed to the stage environment. Internally, Cloud Manager creates a copy of the whole stage environment before deploying it. If there is any issue or failure with the deployment, the Stage can be reverted to the previous state.

5 – Stage Testing: Security Tests, Performance & Load Tests, UI Tests

Various tests are executed by default by Adobe to test if AEM itself is still working as expected and some tools to measure the performance of the website. Additionally, custom tests can be added to further test the integration of the application into AEM or the website itself.

Deploy to Production

If not disabled, the pipeline halts at this point before deploying to Production. This allows us to inspect the performance tests and code audit test results.

Any further manual testing can now be done on Stage. If everything looks good, the build can be approved and will be deployed to Production. Otherwise, the build is cancelled and reverted.

Testing with Cloud Manager

Four different categories of tests are executed in the pipeline.

Unit Tests

These are executed before the deployment step. They test the application on a code level and in isolation.

Since unit tests are pretty much industry standard, there is mainly one interesting question: how high should the coverage be?

There are different viewpoints on this topic. 

Adobe is quite defensive, or realistic, with the expectation of 50%. From our perspective, for web-based projects and especially content-focused logic, the integration tests discussed afterwards provide a lot of value. 

For each project it has to be decided how much time is spent on each type of test.

Code Scanning

This is executed before the deployment step. It inspects the code itself by doing static analysis. It gives various metrics indicating the quality of the code base by a set of code quality rules defined by Adobe.

Internally, SonarQube is used to analyse the code. Additionally, OakPAL scans the built content package to catch various potential issues which could break the deployment.

There are three categories of criticality:

  • Critical: Pipeline stops immediately
  • Important: Pipeline pauses, can be manually continued if the issue is not urgent for the current release and fixed later
  • Info: Purely informational

There are several types of ratings, each with different failure thresholds (check the code quality rules documentation for details).

Over 100 SonarQube rules are applied. If a specific issue is a false positive and should be ignored, an Excel file from the link above can be downloaded to look up the rule key. 

The key can then be used in the Java code to make sure SonarQube will skip the warning. An example for "Credentials should not be hard-coded" is "squid:S2068". 

In the Java code, add the annotation: @SuppressWarnings("squid:S2068")
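Purely as an illustration (the class and constant names are made up), the suppression then looks like this in context:

public class ConnectionDefaults {

    // false positive: this is only the name of a configuration property, not a credential
    @SuppressWarnings("squid:S2068")
    private static final String PASSWORD_PROPERTY = "connection.password";
}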

Experience Audit (Performance Testing)

This executes the well known Google Lighthouse Tool, the same that is available in Chrome Dev Tools. 

It indicates changes compared to the last release. It’s also possible to inspect and download the full Lighthouse report. 

This is a great feature to have, especially because the tests are executed on every run and in an isolated, repeatable environment. 

What is missing is a view of the audit results over time; that would be really helpful to track performance.

Product Functional Testing, Custom Functional Testing, Custom UI Testing

There are several integration tests.

Adobe provides a set of tests to verify the basic functionality of AEM, for example, if content can still be replicated from Author to Publish. Adobe might add additional tests in the future as well – who doesn’t like free integration tests!

In addition, custom tests can be written to further verify the functionality of AEM. This is especially useful if there are custom AEM modifications.

UI Tests are intended to test the website itself on the publisher instance via dispatcher. The idea is to provide test content with the code which will be deployed to the stage and then execute the integration tests in various browsers to verify functionality. 

There is a default setup in the Maven Archetype for Integration tests based on webdriver.io and Selenium. 

Docker is used to build and execute the integration tests in the cloud. It’s possible to modify those to adjust the test setup. Important to note is that UI tests are disabled by default.

Follow the documentation for “Customer Opt-In” to understand how to enable them.

Team and Roles

There is a set of predefined roles by Adobe, each with a corresponding permission profile and access restrictions defining who is allowed to run or modify the pipeline and other features. 

Most of those roles probably match the existing roles of a project.

The most important ones are Business Owner, Deployment Manager, Program Manager, Developer. Content Authors do not have to interact with Cloud Manager. Permissions for Authors are set up in AEM itself.

We believe it’s not necessary to use this many roles for most projects; if you can trust your developers, it’s probably enough if the lead is “Business Owner” and developers are “Deployment Managers“. 

Have a look at the user permission table to decide what makes sense for your project.

A notable role that is missing is “DevOps” or “Operations“. Deployment Manager is the role that comes closest to a DevOps person, since it is allowed to edit the pipeline; however, any experienced developer should be able to configure Cloud Manager.

If there are integrations of Cloud Manager planned, a person with DevOps experience might become helpful.

Integrate Cloud Manager programmatically in your current Solution (Advanced topic)

Adobe is aware that every customer has its unique application landscape. There are various ways to integrate AEM Cloud Service and Cloud Manager.

Adobe Cloud Manager API

All capabilities available in the UI can also be programmatically accessed with the Adobe Cloud Manager API.

This allows you to integrate the AEM Cloud Service pipeline into an existing custom CI/CD infrastructure and also to enhance the pipeline with additional custom features.

Webhooks are also supported by the API, which is a great way to integrate with other services.

Some example use cases are triggering of the pipeline from an external action, monitoring and notification (e.g. Slack channel) of the pipeline runs, externally executing additional tests, or adding actions after the deployment like clearing of the cache.

Identity Management System (IMS) integration

Provisioning and access control for Cloud Manager and AEM can be handled manually, but also via integration of an external IMS.

Synchronising user accounts with group permissions is supported to automate provisioning. SAML with an external IDP is supported to enable Single Sign On.

Firewall

By default, the AEM Cloud Service instances do not have access to external systems due to security reasons. Simple IP whitelisting can be configured directly in Cloud Manager. 

For anything more complex, a solution can be worked out with the Adobe Cloud Manager engineers.

Forwarding Splunk logs

AEM Cloud Service internally uses Splunk to aggregate the logs. 

Forwarding of the Splunk logs to a custom Splunk instance can be requested via support. This is a great way to extract as much information from the system as possible.

Conclusion

As we can see, Adobe Cloud Manager provides out of the box enterprise-grade CI/CD and hosting of AEM applications in the cloud. 

There is a big cost-saving potential for both the initial setup as well as maintenance, thanks to the simple configuration in the web UI and all the operation efforts being taken care of by Adobe.

Combined with the powerful capability of AEM itself and the Adobe Experience Cloud as a whole, AEM Cloud Service is the best cloud-native CMS offering on the market.

This article is part of a series of content about AEM Cloud Service, where we explain how to move to AEM Cloud.

Learn how to design an AEM website with Core Components. Finally, once your website is live, start optimising it and improving the customer experience.

Basil Kohler

Basil Kohler

AEM Architect

Would you like to receive the next article?

Subscribe to our newsletter and we will send you the next article about AEM Cloud Service.

The post Cloud Manager: Deploy and Operate AEM Cloud Service appeared first on One Inside.

by Samuel Schmitt at October 25, 2021 08:31 AM

October 17, 2021

Things on a content management system - Jörg Hoh

AEM micro-optimization (part 4) – define allowed templates

This time I want to discuss a different type of micro-optimization. It’s not something you as a developer can implement in your code, but rather a question of the application design, which has some surprising impact. I came across it when I recently investigated poor performance in the Siteadmin navigation. And although I did this investigation in AEM as a Cloud Service, the logic in AEM 6.5 behaves the same way.

When you click through your pages in the siteadmin navigation, AEM collects a lot of information about pages and folders to display them in the proper context. For example, when you click on a page with child pages, it collects information about which actions should be displayed if a specific child node is selected (copy, paste, publish, …).

An important piece of information is whether the “Create page” action should be made available. And that’s the thing I want to outline in this article.

Screenshot: “Create” dialog

Assuming that you have the required write permissions on that folder, the most important check is whether any templates are allowed to be created as children of the current page. The logic is described in the documentation and is quite complex.

In short:

  • On the content, the template must be allowed (via the cq:allowedTemplates property, if present) AND
  • The template must be allowed to be used as a child page of the current page

Both conditions must be met for a template to be eligible as a source for a new page. To display the entry “Page” it’s sufficient if at least one template is allowed.

Now let’s think about the runtime performance of this check, and that’s mostly determined by the total number of templates in the system. AEM determines all templates by this JCR query:

//jcr:content/element(*,cq:Template)

And that query returns 92 results on my local SDK instance with WKND installed. If we look a bit more closely at the results, we can distinguish 3 different types of templates:

  • Static templates
  • Editable templates
  • Content Fragment models

So depending on your use-case it’s easy to end up with hundreds of templates, and not all of them are applicable at the location you are currently in. In fact, typically just very few templates can be used to create a page here. That means that the check most likely needs to iterate a lot to eventually encounter a template which is a match.
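If you want to see how many templates your own instance returns, a quick sketch using the JCR query API could look like this (the XPath query language is deprecated in JCR 2.0 but still supported by Oak; the class name is made up):

import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;

public class TemplateCountExample {

    // Runs the same XPath query AEM uses to collect all templates and counts the results.
    public static long countTemplates(Session session) throws RepositoryException {
        Query query = session.getWorkspace().getQueryManager()
                .createQuery("//jcr:content/element(*,cq:Template)", Query.XPATH);
        NodeIterator nodes = query.execute().getNodes();
        long count = 0;
        while (nodes.hasNext()) {
            nodes.nextNode();
            count++;
        }
        return count;
    }
}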

Let’s come back to the evaluation of whether that entry should be displayed. If you have defined the cq:allowedTemplates property on the page or its ancestors, it’s sufficient to check the templates listed there. Typically it’s just a handful of templates, and it’s very likely that you find a “hit” early on, which immediately terminates this check with a positive result. I want to explicitly mention that not every template listed there can be created here, because there are also other constraints (e.g. the parent template must be of a certain type) which must match.

If template A is allowed to be used below /content/wknd/en, then we just need to check this single template A to get that hit. We don’t care where it is in the list of templates returned by the above query, because we know exactly which one(s) to look at.

If that property is not present, AEM needs to go through all templates and check the conditions for each and every one until it finds a positive result. And the list of templates is identical to the order in which they are returned by the JCR query, which means the order is not deterministic. Also it is not possible to order the result in a helpful way, because the semantics of our check (which include regular expressions) cannot be expressed as part of the JCR query.

So you are very lucky if the JCR query returns a matching template already at position 1 of the list, but that’s very unlikely. Typically you need to iterate tens of templates to get a hit.

So, what’s the impact of this iteration and the checks on performance? In a synthetic check with 200 templates and no match at all, it took around 3-5 ms to iterate over and check all of the results.

You might ask, “I really don’t feel a 3-5 ms delay”, but when the list view in siteadmin performs this check for up to 40 pages in a single request, it quickly adds up to a 120-200 millisecond difference. And that is a significant delay for requests where bad performance is visible immediately. Especially since there’s a simple way to mitigate this.

And for that reason I recommend providing “cq:allowedTemplates” properties in your content structure. In many cases this is possible, and it will speed up the siteadmin navigation performance.
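As a minimal sketch, this is how the property could be set programmatically via the JCR API; the paths and template names are made up for illustration and need to be adapted to your project (the same property can of course also be maintained in CRXDE Lite or shipped as part of a content package):

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class AllowedTemplatesExample {

    // Sets cq:allowedTemplates on the jcr:content node of a site root, so that the
    // siteadmin check only needs to look at these few entries instead of iterating
    // over all templates returned by the JCR query.
    public static void restrictTemplates(Session session) throws RepositoryException {
        Node siteRoot = session.getNode("/content/wknd/en/jcr:content");
        siteRoot.setProperty("cq:allowedTemplates", new String[] {
                "/conf/wknd/settings/wcm/templates/content-page",  // a single template path
                "/conf/wknd/settings/wcm/templates/landing-.*"     // or a pattern matching several
        });
        session.save();
    }
}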

And for those who cannot change that: I am currently working on changing the logic to speed up the processing for the cases where no cq:allowedTemplates property is applicable. And if you are on AEM as a Cloud Service, you’ll get this improvement automatically.

by Jörg at October 17, 2021 02:13 PM

September 21, 2021

CQ5 Blog - Inside Solutions

5 tips to help you maintain and improve your chatbot

5 tips to help you maintain and improve your chatbot

Chatbot solutions are a flexible approach to connecting with customers in many situations. 

They simplify access to your brand for your customers. Plus, they help you gain deep insights into your customers’ needs.  

However, a chatbot is only successful when it manages to answer customer requests.   

To reach this goal, optimizations after go-live are key. 

In this post, we will explain the chatbot lifecycle and the five things you can do to improve your solution over time.  

Finally, we will explain the effort your marketing team needs to invest for your solution to be successful. 

What is the lifecycle of a chatbot? 

We divide the chatbot lifecycle into the following five phases: 

  1. From ideas to roadmap: First, you have to understand what chatbot solutions have to offer and define your vision.  
  2. Turning a roadmap into a plan:  Once your vision and strategy are defined, you will make a concrete plan to build your chatbot. 
  3. Build chatbot and conversations: Implement, create content, and design conversations and begin training your system 
  4. From training to go-live: You need to test your system, prepare the go-live, and bring your chatbot to life. 
  5. Scale and optimize the chatbot experience: Go-live is just the start for your chatbot. Now it’s time to grow and optimize! Customers are impatient – don’t plan to optimize your chatbot in the future, do it continuously, from day one.  

We will focus on the last step of the lifecycle, which is often misunderstood or put aside. 

Do I really need to optimize an AI chatbot? 

Chatbots use AI to perform.  

Why is continuous optimization required? Current systems offer some flexibility in understanding the intention of the user.  

However, if the language deviates too much from the expected (trained) phrases and/or the intention is not known, the system reaches its limits.  

Chatbot systems are not fully self-learning. You must assist them for them to improve.

The chatbot’s answers are not yet driven by AI at all. They are typically driven by hand-crafted content which is optimized for the expected/wanted user journey instead. 

The following challenges will occur for each chatbot over time – independent of how well you planned it: 

  • The user’s language is unexpected and therefore a wrong intention is assumed. 
  • Customers have questions concerning your business which you have not foreseen. 
  • You offer new services which the chatbot wasn’t trained for. 
  • Your chatbot system should get smarter with the integration of further systems, e.g., from a CRM or PIM 
  • The world we live in changes over time. This can force you to adapt to keep offering a pleasing and up-to-date experience for your customers – even though your services haven’t changed. 

So what are the concrete steps to keep your chatbot up to date? 

How do you optimize a chatbot? 

Optimization of an AI chatbot is crucial, and we have listed some activities that your marketing team will have to take care of, with some support from your IT team.

1 – Understanding more and more questions 

Likely you started with a limited basis for natural language understanding. To understand all the needs of your customers – independent of whether you answer them or not – you will have to add more and more intents over time. 

In short: You have to teach your system new phrases. 

2 – Keeping your content up to date

As you optimize the content on your website over time, the same is required for your chatbot. Change wordings, influence the user journey, or improve linked assets. 

In short: You have to optimize chatbot answers continuously. 

3 – Learn from your customers 

Your customers offer deep insights into their needs. Benefit from it.  

To do so, monitor how they interact with your system and which questions they ask – especially for topics that you do not answer with your chatbot or even your website yet. 

In short: You must monitor chatbot interactions to gain insights. 

4 – Validate the success of your chatbot 

Chatbots are awesome. They support your customers and can have a great return on investment.  

However, don’t just believe that your chatbot is successful. Track if the intended goals are reached and try to collect additional data to identify further goals.  

Set your success factors in relation to other systems and identify cross-system benefits. 

In short: You have to monitor the success of your chatbot.

5  – Move your chatbot to the next level 

Likely you restricted the list of features for your initial chatbot.

After go-live, with the first experience in conversational marketing gained, it’s absolutely the right time to introduce additional features to your chatbot.  

For example you could integrate further business information systems, try out new UX/UI concepts, or close feature gaps.  

You should also make sure you stay up to date with features offered by your competitors’ chatbots. 

In short: You have to enrich your chatbot with features to continue to benefit and stay attractive for customers. 

Who optimizes the chatbot and how often? 

For successful chatbot optimization, a proper process is required.  

Identify the regular tasks and assign responsibilities. Plan optimization reviews to ensure success in the long run. 

We described the main tasks earlier. They include: 

  • Increasing training data for existing intents
  • Adding new intents for unexpected topics
  • Improving the flow and content of answers

All these tasks are mainly content-driven. Therefore, the maintenance must be driven by the business stakeholders and their team.  

Team members must be trained to write content for chatbots and must understand the basic principles of NLU (Natural Language Understanding) to improve training data.  

As these tasks impact customers directly, it’s important to do them regularly: improve and add intents e.g. once a week, and update content monthly.  

Beside chatbot-specific maintenance tasks, other optimizations are needed as well: bug fixes and new features. 

These tasks are typically solved in collaboration with the development team. Business stakeholders know what they want, developers know how to implement. Don’t just wait until you need features but plan regular improvements.  

Lastly, ensure your system keeps its technology up to date. APIs may change and provide new features. 

In summary, optimization must be planned, responsibilities assigned, and tasks distributed to team members. 

Get the most out of your chatbot 

A chatbot is a great tool to open another channel for your customers. It’s not a replacement but an additional opportunity to offer the best service.  

It offers you deep insights into customers’ needs.  

On top of that, it forces you to rethink your processes and goals for service and marketing tasks. Take the opportunity and improve your services with a conversational channel. 

We summarized all aspects of the journey to a successful chatbot here (Whitepaper).  

From the first vision, through planning, to go-live. Find out more about chatbots and how they benefit your customers and your organization.

Clemens Blumer

Clemens Blumer

Senior Software Architect

Would you like to receive the next article?

Subscribe to our newsletter and we will send you the next article about Conversational Marketing and Chatbot.

The post 5 tips to help you maintain and improve your chatbot appeared first on One Inside.

by Samuel Schmitt at September 21, 2021 02:06 PM

September 14, 2021

CQ5 Blog - Inside Solutions

Designing your AEM Cloud Service Website with Core Components

Designing your AEM Cloud Service Website with Core Components

Adobe brings a set of reusable and production-ready components for its content management system, AEM. 

Their name: The Adobe Core Components.

Their purpose: Speeding up development time.

But how do you take advantage of these Core Components to deliver a website fast? 

What is the best design approach, and how should the team members work together?

If your company is using AEM or AEM Cloud Service and you are looking to fast-track the design process while following the best practices of AEM’s styling system, you came to the right place.

Today, we are going to explain how to avoid a long design phase, and too much back and forth.

We will explain the best workflow to design  user experiences and build websites at scale:

  • You’ll learn what Adobe Core Components are…
  • and how your design team and development team should collaborate!

AEM Styling process in a nutshell

Let’s go through the styling process with Adobe Experience Manager. 

The process suits a new project with AEM Cloud Service.

However, you could follow a similar design approach while working with an on-premise version of AEM.

Indeed, the basis of the styling process will remain the same for any version of AEM and leverage Adobe Core Components, a library of best practice components.

The main idea of the styling process can be summarised in three concepts:

  • Using standardised components
  • Low code, a software development approach that requires little to no coding
  • Reusability 

The standards are represented by the Core Components. More than just a library of web elements, they will drive the design of the user experience from the very beginning and help the designers, the website owner, and developers to work together on a common frame.

Low code is a key aspect of the styling process. Why always reinvent the wheel?

The Core Components offer a foundation. By leveraging them, the development effort is drastically reduced and the main task is all about styling the components. In other words, it’s focused on adapting CSS and JS. HTML won’t be adapted.

Finally, the design elements must be reusable for other websites, landing pages, or intranets.

When companies decide to invest in AEM and build a consistent user experience across many websites and channels, they must consider having proper information architecture.

A well-designed component library helps them implement websites at scale.

We will now go into detail of these concepts, explain what core components are, and what it means to build the user experience with core components in mind to get your project design up to speed.

What are the Core Components in AEM?

The Core Components are a set of standardised Web Content Management (WCM) components for Adobe Experience Manager (AEM).

They were introduced with AEM 6.3, and their aim is to speed up development time while reducing maintenance costs and ensuring better upgradability. 

The use of Core Components is the best and recommended approach to start a new project with AEM as a Cloud Service. Core Components are cloud-ready. Using them will help you deliver your new website faster.

If you have experience with building websites, you might have noticed that some elements or UI patterns are quite common. For example, we often build text and image elements, or teasers highlighting content of related pages. 

This is what the Core Components library has to offer. A list of thirty versatile components including: 

  • Title
  • Text
  • Image
  • List
  • Teaser
  • Download
  • Button
  • And more components…
Adobe Core Components

These components are built to be flexible and can be assembled to produce nearly any kind of layout.

In our experience, 80% to 90% of the usual components on a website can be implemented with a Core Component and a bit of styling. 

As you can see below, the teaser components come with four variations that cover most use cases. 

If something is missing, you can still customise a core component, extend a core component with additional functionality, or for complex scenarios, create a component from scratch (the old school way).

Adobe Core Components are open source, and you can find them on Github.

To summarise, the Core Components offer a standard approach and have many advantages:

  • Design-agnostic: data, logic and design are completely separated 
  • Stylisation: they can be styled in different ways
  • Flexibility: they offer a wide range of functionality
  • Future-proof: they guarantee compatibility with future versions of AEM

How to use the AEM Core Components

To understand the design process and the people, profiles, and roles involved in this process, it’s important to get a better understanding on how to use, extend, and customise Core Components.

We won’t go too much into the technical details, but it’s important that the anatomy of an AEM Core Component is understood.

Understanding the architecture of a Core Component

To make it simple, a core component can be split into two distinct parts: the backend part and the frontend part.

The backend contains:

  • The content model. It defines the structure of the content that can be stored in a component: for example, a teaser might consist of a title, an image, a short description text and a link to the target. 
  • The configuration of the components and the edit dialog. These elements let you define what to display, what an editor can edit, and the options they can use.
  • The logic behind the preparation of the content for frontend (also called view).

The frontend part will be in charge of generating the output in HTML:

  • A markup language (HTL) is used to bring together content from the backend and HTML elements.
  • CSS and JS are used for the styling and effects applied on the elements.

(We deliberately use layman’s terms; if you want more specific information, jump to Adobe’s official technical documentation.)

Customising a Core Component

Yes, Core Components can be customised. You can extend them to match your requirements and avoid starting custom development from scratch.

However, a word of advice. To keep all the benefits and guarantee upgrade compatibility, some best practices and customisation patterns must be followed: 

1) Never modify the code directly. Instead, extend the existing logic:

The architecture of a Core Component allows you to extend the content model, dialog, and logic of a component and allow an editor to use additional content. 

For example, you might want to add a “category” field to a certain teaser. All you have to do is extend the teaser with a text field “category“, define how the editor shall use it in the dialog, and how it shall be represented in the HTML output.
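On the backend side this is typically done with the Sling Model delegation pattern. The following is only a minimal sketch, assuming your project component uses the teaser Core Component as its sling:resourceSuperType; class and method names are illustrative, and the dialog and HTL changes are not shown:

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.Via;
import org.apache.sling.models.annotations.injectorspecific.InjectionStrategy;
import org.apache.sling.models.annotations.injectorspecific.Self;
import org.apache.sling.models.annotations.injectorspecific.ValueMapValue;
import org.apache.sling.models.annotations.via.ResourceSuperType;
import com.adobe.cq.wcm.core.components.models.Teaser;

// Sling Model for a project teaser that adds a "category" field while keeping
// the out-of-the-box Core Component logic untouched.
@Model(adaptables = SlingHttpServletRequest.class)
public class CategoryTeaser {

    @Self
    @Via(type = ResourceSuperType.class)
    private Teaser coreTeaser; // the unmodified Core Component model

    @ValueMapValue(injectionStrategy = InjectionStrategy.OPTIONAL)
    private String category;   // the additional authored field

    public String getCategory() {
        return category;
    }

    public String getTitle() {
        // everything that is not project-specific is simply delegated
        return coreTeaser.getTitle();
    }
}

The HTL script of the project component would then render the category next to the markup it inherits from the Core Component.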

2) Style the components by applying your own CSS styles: 

Core Components follow a standard naming convention inspired by Bootstrap to make it easy for an experienced frontend developer to apply the website’s branding.

You can read more about customization patterns here.

Roles in your team

The customisation patterns tell us what kind of profiles or roles you need in your website project team to guarantee smooth operation.

Basically, you will need two types of roles:

  • An AEM backend developer in charge of the configuration and extension of the backend logic of the core components.
  • A frontend developer mastering CSS and JS who could apply any look and feel to the HTML structure offered by the Core Components.

We do believe that customisation can be reduced to its minimum and even be avoided if you start designing your website with Core Components in mind. More about this later.

Managing design at scale with a flexible system​

The frontend developers will play a key role in the implementation process. Once they master the Core Components and style system, the sky will be the limit.

Just by adapting the style of the component, multiple themes could be created for various websites, microsites, landing pages, and more. 

And the beauty is: the Core Components stay the same.

You simply adapt their style and assemble them in a new manner for different websites. By setting up a versatile set of components for your digital presence together with various themes, you will be able to manage design and website at scale.

But how can this be achieved? This is what we tackle now, and outline how to design with the Core Components in mind.

AEM design workflow with Core Components

Now that the concept of Core Components is clearer, we can detail the design workflow. 

We will answer the following questions:

  • How to design a website with AEM using Core Components
  • How to design a website with AEM without compromise
  • How to design a website with AEM and go live fast 

We will cover two scenarios: one where the UI/UX of the website is not done yet, and a second where you already have the design of the website ready.

You may be doing a migration of an existing website to AEM, and therefore want to migrate your existing website first and apply changes later.

Step 1 – Map the mockup to Core Components

In our first scenario, the design of the website is not defined yet. We start with a blank page.

The two main recommendations are:

  • Plan the design based on the Core Components
  • Set up a team made of designers and AEM consultants

It’s crucial to take the Core Components into account from the beginning. We recommend you involve AEM experts from day one, as they will guide you through this process.

In other words, don’t leave the designer alone in a room and once the design is ready, hand it over to the AEM experts and developers for implementation. 

This often leads to unworkable user experiences that don’t leverage the solution, resulting in additional cost, escalations and frustration.

A common misunderstanding is that a framework restricts the design process. But in fact, it’s the opposite. Talking to an AEM expert will open new perspectives and will unleash the full potential of AEM.

Together the designer and the AEM Expert will define a mockup including the main page templates and components to use. This guarantees that you get the best AEM has to offer.

In the scenario where the design of your website is ready, for instance if you are migrating to AEM as a Cloud Service, you should start with component mapping.

An AEM expert will analyse the building blocks of your current website and map it to Core Components.

With this scenario, there might be some trade-offs:

  • Changing the current blocks on your website to map to the Core Components’ layout and features. Let’s imagine that you have a teaser with 4 CTAs while the teaser Core Component offers only 2. Here you could decide to adapt your requirements to the Core Components.
  • Or, if your requirements are not adaptable, the solution would be to create a custom component, extending the Core Component that fits your current UX and UI.

Anyway, for both scenarios, the goal is to have a mockup of the website where all elements are represented by Core Components.

Step 2 –  Design in Adobe XD

Adobe XD is a design tool. 

With Adobe XD, designers can now design based on the out-of-the-box AEM Core Components, and consider how different styles can be implemented via AEM’s style system.​ 

Adobe created a UI Kit for AEM Core Components.

By using the premade UI Kit based on AEM Core Components, unnecessary design deviations are avoided, namely the kind of deviation that requires more development effort and involves extra cost. 

The steps for the designer are the following:

  1. They will assemble the Core Components based on the mockup which will create the layout for the different pages.
  2. They will then start styling each component based on the visual identity of the website and branding guidelines.

By following this approach, the backend developer will have an easier time configuring everything in AEM. Layout structure will represent a page template in the CMS.

It’s crucial that the website design stays in sync with the Core Components.

A similar process can be done in the case of scenario 2.

Step 3 –  Configuration and style in AEM matching the mockup

While the designer adapts the look and feel of the Core Components in Adobe XD, an AEM backend developer can start the configuration of the page templates and components in AEM. 

This can be done in parallel as both work on the same basis – the defined mockup.

As soon as the design is ready and validated, a frontend developer can start working and applying the right style, CSS and JS to the core components.

Everything is bundled into AEM and ready to be deployed.

Overview of the AEM styling workflow 

To recap, here are the main steps of the design workflow with AEM and Core Components:

1 – Define a mockup based on the Core Components

2 – Create the UI and theme in Adobe XD

3 – Configure the page templates and components, then style the Core Components in AEM 

And do not forget, a critical aspect is to have a mixed team made of designers and AEM experts from the very beginning.

Core Components: AEM Cloud Service’s best companions

Even though the Adobe Core Components were created with AEM 6.3 before the release of AEM Cloud Service, they perform best when used together with the cloud version of Adobe CMS.

One of the main purposes of AEM Cloud Service is to enable fast innovation and help you focus on what matters most: building outstanding customer experiences.

With AEM Cloud Service you don’t need to care about servers, IT operations, network, security etc. anymore. 

Leveraging Core Components gives you additional benefits by speeding up the design and development phase.

Quickly assemble the building blocks to realise a mockup, and then style them with limited backend development. 

This is the best way to tackle any AEM as a Cloud Service project, and will enable you to go live fast, while guaranteeing upgradability.

Finally, build enterprise websites faster with AEM Cloud Service and Core Components

Designing for AEM with Core Components is close to what you are already doing for other projects. You leverage a framework that enables you to build something faster.

The key element is to design with the Core Components in mind and to involve an AEM expert at the very first stage of the design process. 

The expert will guide you through the process and indicate the best way to use the components to avoid potential limitations. 

This article is part of a series of content about AEM Cloud Service, where we explain how to move to AEM Cloud Service.

Samuel Schmitt

Samuel Schmitt

Digital Solution Expert

Would you like to receive the next article?

Subscribe to our newsletter and we will send you the next article about AEM Cloud Service.

The post Designing your AEM Cloud Service Website with Core Components appeared first on One Inside.

by Samuel Schmitt at September 14, 2021 09:33 AM

August 31, 2021

CQ5 Blog - Inside Solutions

The Marketing Technology Landscape of Swiss Insurers

The Marketing Technology Landscape of Swiss Insurers

In this study, we analysed the top 25 Swiss insurance companies.

We focused on their marketing technology stacks. In essence, what technologies are these companies using for their website, mobile application, analytics, marketing automation and more?

How do they leverage these technologies to build an appealing customer experience and engage with their clients?

The top Swiss insurers are Zurich Financial Services, Swiss Re, Swiss Life, Helvetia Insurance, Axa Group, Bâloise, Helsana, CSS, Groupe Mutuel, Suva and 15 others. 

What is a Marketing Technology Stack? 

To simplify things, we didn’t analyse every single marketing tool used by Swiss insurance companies. 

We focused on the main elements of their marketing stack, which include:

  • The website and content management system as the baseline for customer experience and content
  • Analytics to collect data about website visitor’s behaviours, as well as personalisation engines to personalise the user experience
  • The marketing automation solution chosen to connect and engage with customers via email or other channels
  • The mobile app offering services directly via smartphone
  • The chatbot to engage in a conversational way

Disclaimer

The collected data is publicly available. We used tools such as builtwith.com or Wappalyzer to gather data about the technology used. Some information was found within public case studies published by other Swiss agencies and technology vendors. 

We cannot guarantee the accuracy of all the data as we rely on third parties for it. The trends represent our own view of the market.

Now, let’s dive into the details of the marketing technology landscape of Swiss insurers.

Adobe Experience Manager is the leading CMS for Swiss Insurers

Adobe Experience Manager (AEM) is the most used Content Management System by Swiss insurers. Adobe’s CMS is the leading solution for content management systems and customer experience. 

Ranking first for several years in the Gartner Magic Quadrant and The Forrester Wave, it makes sense to see Adobe’s CMS flagship widely used by large enterprises. 

Of course, we, as a Swiss Adobe and AEM partner, are a bit biased at One Inside, but the market proves that it’s one of the best digital experience platforms.

AEM is used by 50% of the top 10 Swiss insurers and by 36% overall in the top 25. Sitecore is also present. The .NET CMS is one of the top three solutions of the CMS segment and is often a choice for large enterprises. 

Magnolia CMS is the little one of the trio with good market share. Magnolia is a Swiss-made CMS and has an important footprint in its home country. AEM actually also has Swiss roots (if you remember the days of CQ).

A February 2021 EY survey, which interviewed the IT leadership of more than 70 European insurers, found that insurers have a bold ambition to move to cloud solutions.

The survey says: “most insurers aim to move at least 80% of their business to the cloud in the coming years to meet the primary objectives of increased agility and digital transformation.”

We could argue that this trend is valid for Swiss insurers, and that the challenges they face are similar to other European companies: agility and time to market.

What does this mean for their choice of content management system?

At the core of the customer experience lies the CMS. New cloud CMS solutions offer all the benefits insurance companies are looking for: cost optimisation and more agility and speed of execution.

In the upcoming years, insurance companies might adopt such a solution and give up their on-premise solutions. They will get the benefits of delivering new customer experiences and releasing new offerings faster.

The current CMS landscape will evolve in the next 3 to 5 years, and insurance companies might then use native cloud CMS more commonly. 

Adobe and its AEM Cloud Service already have an excellent chance to gain market share. Indeed, its CMS is, for the moment, one of the only cloud-native enterprise CMSs on the market.

Sitecore and Magnolia are technically behind and offer a “cloud” CMS that is closer to managed cloud hosting or PaaS than the actual SaaS model that Adobe offers. New players might enter the market, such as Headless CMS vendors.

Analytics and Personalisation: Google vs Adobe

Google Analytics is the leading analytics solution and is used by two-thirds of the 25 top Swiss insurers. The only contender is Adobe Analytics. 

Adobe Analytics is used by half of the top ten insurers. It is often used together with AEM and other solutions of the Adobe Experience Cloud, such as Adobe Target, for personalisation purposes.

Both analytics solutions answer the needs of enterprises. While Google Analytics might be more straightforward to set up, Adobe Analytics offers more web analytics features.

Analytics and personalisation often go hand in hand, but don’t always come together. While 100% of the companies use an analytics solution, not all of them combine it with a personalisation solution.

A personalisation solution can adapt the user experience based on behavioural information and helps the marketing team run A/B Tests.

Only ten of 25 Swiss insurers are using a personalisation solution. The leading vendors are Google, with Google Optimize 360, and Adobe, with Adobe Target. 

Both products offer a set of similar features with A/B testing and personalisation. 

Optimize 360 is natively integrated with Analytics 360, so you can use Analytics 360 reporting to understand where to improve your site quickly. 

Adobe Target offers a native connection with Adobe Analytics, Adobe Experience Manager and other Adobe Experience Cloud solutions. The native integration with the CMS offers great opportunities, such as adapting the display of web components based on behavioural rules or even based on complex evaluation by AI.

For instance, the Swiss insurance company CSS offers a personalised experience by leveraging Adobe Experience Manager with Target. 

While browsing the website and visiting different offerings and insurance products, the visitor will see teasers and other web elements aligned with their interests. Find more information about the CSS project here.

A fragmented Marketing Automation landscape

For the marketing automation part, we focused mainly on lead generation and nurturing aspects via email.

Identifying the marketing automation solutions and email providers used by the 25 top Swiss insurers was not the most straightforward task. Indeed, this information is not always publicly available. 

From the information we gathered, we noticed that the technology landscape of marketing automation is fragmented.

No vendor is leading in this segment, and various software vendors are used, such as Salesforce, Adobe Campaign, Sendinblue, Emarsys, Campaign Monitor, XMPie, Mailjet and even Mailchimp.

It’s pretty surprising to find Mailchimp on the list 5 times total.

Why surprising?

Mailchimp is marketing automation software targeting SMBs and e-commerce websites, and its marketing automation features are less appropriate for large enterprises. But it seems that Mailchimp fulfils the needs of a few insurance companies anyway.

Use cases, in regard to requirements, are limited. The main lead capture tactics are executed through premium calculators or newsletter subscriptions. We believe the first one drives the most leads. Also, we noticed that the nurturing campaigns are not very advanced, at least for the ones we tested.

Few marketing automation success stories are publicly available.

Suva, the Swiss National Accident Insurance Fund, uses XMPie for its email campaigns and achieved a 127% increase in signatories to the Suva Safety Charter, from 1,500 to 3,414 companies, thanks to an omnichannel onboarding mixing email and print documents (Source).

Chatbot, an under-exploited channel

As we already noticed with the use of Marketing Automation software, chatbot solutions are also not very present on the public websites of Swiss insurance companies.

Chatbots offer similar benefits as Marketing Automation software. They are an excellent tool to capture lead information and engage with the client in the channel of their choice, if done right.

On top of this, chatbots have the advantage of offering immediate answers to clients while giving the company great insights into the questions and concerns of their audience. 

Of the 25 companies from the study, 15 don’t offer any chatbot on their public website.

For the ten companies with a chatbot, again, the solutions are disparate. Some offer WhatsApp or Facebook bots; others offer simple rule-based chatbots. Very few companies fully embrace the conversational channel and offer advanced AI chatbots.

At One Inside, we had the chance to collaborate with CSS and build a fully integrated AI chatbot.

Firstly, the chatbot is seamlessly integrated into the layout of the website and offers an outstanding user interface. Secondly, the chatbot is completely integrated into the Adobe Experience Manager CMS. 

Integration in AEM allows marketing teams to manage chatbot questions and answers directly from the same back-end they use to administer their web content: it facilitates the reuse of web content. Plus, the chatbot is intelligent and learns over time.

The below video explains the AEM Chatbot Module.

It’s hard to explain why Swiss insurers are reluctant to deploy AI Chatbots.

To compare with the US market: according to a 2019 LexisNexis survey, more than 80% of large U.S. insurers have fully deployed AI solutions in place, including the research and development of chatbots – and this was two years ago.

It could be that the priority has shifted to other areas of the marketing stack, or that chatbot projects for enterprises are still hard to execute.

Mobile App, the customer portal in your pocket

84% of Swiss insurers offer a mobile app for iOS and Android smartphones. The few companies that didn’t invest in a mobile application are the ones that don’t have a customer portal. 

Indeed, the primary mobile application use case for insurers is to offer access to all customer information directly from a smartphone. The mobile app is an extension of the customer portal. 

Some insurers, such as Visana, found innovative use cases with their mobile application to increase customer engagement.

The solution built by Visana is called myPoints, and it’s a bonus programme. By doing more physical activity every day, you get rewarded with up to CHF 120 per year. An excellent way to stay healthy.

A step toward digital maturity

From the single prism of marketing technology, we were already able to judge the digital maturity of Swiss insurance companies. 

The foundation of the customer experience is already in place, at least for the top Swiss insurers, and they make great use of their content management system to support their content strategy.

More improvement could be gained on automation, conversational channels, and artificial intelligence.

Especially in marketing automation, the maturity could be higher, for example, by introducing a modern solution, such as Adobe Journey Optimizer, the new cloud-based marketing automation solution from Adobe that is seamlessly integrated in Adobe’s real-time CDP. 

We assume that certain insurers are already working on such solutions, as they are aware that they always need to modernise their marketing stack.

To conclude, the leading insurer, Zurich, sets ambitious targets to meet customers’ needs. As mentioned in this media release, Zurich will continue “to further transform insurance, using technology to meet changing needs and create rewarding experiences“. 

We’re very excited about the results, and are curious to see if the other market contenders will follow suit.

Samuel Schmitt

Samuel Schmitt

Digital Solution Expert

Would you like to receive the next article?

Subscribe to our newsletter and we will send you the next article about marketing technologies.

The post The Marketing Technology Landscape of Swiss Insurers appeared first on One Inside.

by Samuel Schmitt at August 31, 2021 04:30 PM

July 19, 2021

CQ5 Blog - Inside Solutions

Headless CMS with AEM: A Complete Guide

Headless CMS with AEM: A Complete Guide

You might have already heard about Headless CMS and you may be wondering if you should go “all-in” with this new model.

In our complete guide, we are going to answer the most common questions, such as

  • What is the difference between Headless and traditional CMS?
  • Is headless the best choice for your next website implementation?
  • How to use a headless CMS for your next project

It’s best to understand what Headless CMS means before making any decision to start developing your next web project on a content delivery model that won’t fit.

At One Inside, our expertise lies in the implementation of the Adobe CMS, Adobe Experience Manager (AEM). We can show you what AEM can do in regard to content delivery — and in which cases headless is recommended.

What is a traditional CMS?

This is likely the one you are familiar with. Traditional CMS uses a “server-side” approach to deliver content to the web.

The main characteristics of a traditional CMS are:

  • Authors generate content with WYSIWYG editors and use predefined templates.
  • HTML is rendered on the server
  • Static HTML is then cached and delivered
  • The management of the content and the publication and rendering of it are tightly coupled

Let’s define what a headless CMS is now.

What is a headless CMS?

A headless CMS decouples the management of the content from its presentation completely. Headless CMS can also be called an API-first content platform.

The authors create content in the backend, often without a WYSIWYG editor. The content created is not linked to a predefined template, meaning the author cannot preview the content.

The content is then distributed via an API. 

The presentation of the content on the website, mobile app, or any other channels, is done independently. Each channel fetches the content and defines the presentation logic.

A headless CMS is mainly made up of:

  • A backend to create structured forms of content
  • An API to distribute content

Let’s speak about the last category of CMS supporting both traditional and headless. 

What is a hybrid CMS?

A Hybrid CMS is a CMS supporting both content delivery models: the Headless and the traditional.

You can create content in a classic way, by creating a page in the backend, previewing the page, and publishing it. 

On the other hand, the content created can be distributed via an API as well.

This hybrid approach is offered by traditional CMSs. It’s straightforward for them to deliver the content from their repository via an API.

With a hybrid approach you get the best of both worlds:

  • Single source for content and assets
  • Multichannel delivery
  • Authors only need to learn one tool for all content authoring
  • The administration is simplified (one login, one server, one technology for content etc)

Is AEM a Headless CMS?

Yes, with Adobe Experience Manager you can create content in a headless fashion.

The content can be fully decoupled from the presentation layer and served via an API to any channels.

You might know that AEM offers a great interface for authors enabling them to create content by using predefined templates and web components.

The content can be organised hierarchically and published immediately to websites or any other channels.

As AEM offers the (very) best of both worlds, it supports the traditional approach and the headless way. AEM is considered a Hybrid CMS.

The Headless features of AEM go far beyond what “traditional” Headless CMS can offer.

How does AEM work in headless mode for SPAs?

Since version 6.4, AEM supports the Single Page Application (SPA) paradigm with the SPA Editor.

This enables content authors to build dynamic as well as content-focused applications in the same way they are used to creating pages.

SPAs are currently mostly used for static applications. 

Enabling dynamic page creation, layouts and components in a SPA with a visual content editor shows how valuable AEM’s Hybrid CMS approach is.

With the SPA Editor 2.0 it’s possible to only deliver content to specific areas or snippets in the app.

We are going to look into several aspects of how AEM implements the headless CMS approach:

  • What is the difference between rendering HTML in the backend vs SPA?
  • How does the SPA WYSIWYG content editor work?
  • What are Content APIs?
  • How to develop SPAs with AEM
  • What is Server Side Rendering (SSR)?
  • How to use Content Fragments and the GraphQL API?

We will use the technical insight gained in this section to conclude what the pros and cons of SPAs with AEM are and in which cases this approach is best.

Rendering HTML in the backend vs Single Page Application

Traditionally, the HTML of a web page is rendered by a backend server.

The browser loads the HTML and linked resources. Javascript is then used to enhance the user experience with dynamic functionality. 

When a user navigates to another page, this one loads again and the process is repeated. 

This approach works well for simpler and static pages. However, websites have become more and more complex and feature rich.

Websites now often behave like full-fledged applications such as a social media platform or a banking portal. 

The complex interactions, state management and consumption of many APIs make it difficult to develop and maintain frontend code.

That is why the Single Page Application method of developing dynamic web pages gained a lot of traction in the last decade. The responsibility of the view layer is shifted to the location where it is displayed: the browser.

With the SPA approach, the first page load delivers an (almost) empty HTML document together with JavaScript and CSS. JavaScript then dynamically assembles the web page.

This is a simplified example of the HTML delivered by a SPA:

<html>
  <head>
    <script src="/spa/main.js"></script>
    <link rel="stylesheet" href="/spa/style.css">
  </head>
  <body>
    <!-- empty container into which the SPA renders the page -->
    <div id="app"></div>
  </body>
</html>

Any additional data or content like text, products, user account info, images etc. are requested from backend APIs.

If the user navigates to a different link, only the content area is replaced and re-rendered; the rest of the UI does not change. This makes the site feel more like an application than a traditional website.

The SPA WYSIWYG content editor

The AEM “what you see is what you get” editor was extended to support SPAs.

Seeing the content created directly in the app is a blessing for anyone who has worked with a form-based editor (of a traditional Headless CMS). 

Even better, an author that is familiar with AEM will immediately feel at home and be able to create content without learning a new tool – while also reusing any other AEM content or assets from the traditional web page.

How did Adobe implement this? Here’s an overview of the main elements:

If this all looks very familiar to you – it is! Except for steps 3, 7 and 8, it’s all the same as with a backend-rendered page. You can find a more detailed view of this here.

Content APIs

We referred to Content APIs, also called AEM Content Services, a couple of times already. 

What are they and how do they work?

AEM is built on the RESTful Sling framework. Architecturally, the visualisation layer is already completely decoupled from the data through the Java Content Repository. So in this regard, AEM already was a Headless CMS.

This shows that on any AEM page you can change the extension from .html to .json (or .infinity.json to be more correct) and AEM will return all the content for the requested page. If you currently use AEM, check the sidenote below.
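
For illustration, such a request from a client could look roughly like this – a minimal sketch, with a hypothetical host and content path:

// Swap .html for .infinity.json to get the raw repository data of a page.
// Host and content path are made-up examples.
async function fetchRawPageData(): Promise<unknown> {
  const url = 'https://aem-publish.example.com/content/mysite/en/home.infinity.json';
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status}`);
  }
  // The response is the unfiltered node tree, including technical properties
  // such as jcr:createdBy – which leads to the issues listed below.
  return response.json();
}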

For this request AEM will return the raw data stored in the repository for the requested path. Even though this could be considered “Content as an API”, there are several issues with this approach:

  • The format of the data is unstructured
  • It’s difficult to work with for clients
  • Contains unneeded or unwanted data like the username of the author
  • Not stable in regards to changes
  • Paths will not be externalised
  • It is not possible to inject further logic, like resolving additional information for an image path

Therefore an additional layer, called the Sling Exporter Framework, was introduced.

It allows us to easily define how existing Sling Models should be transformed and serialised to certain data formats like JSON or XML. It is basically the mapping from your internal data to the data exposed in the API.

Since Sling Models are already the basis of any modern AEM project this makes it straightforward to provide a transformation of the web content to JSON.

Adjusting existing Sling Models to support the Sling Exporter Framework usually just requires a single line:

      @Model(adaptables = SlingHttpServletRequest.class)
@Exporter(name = "jackson", extensions = "json")
public class Text {
	@ValueMapValue
	@Getter
	private String text;

	@ValueMapValue
	@Getter
	private boolean isRichText;
}
    

If this component is rendered with the .model selector, the following JSON will be generated by the exporter framework:

      {
	"id": "text-2d9d50c5a7",
	"text": "<p>Lorem ipsum dolor sit amet.</p>",
	"richText": true,
	":type":  ".../components/content/text"
}
    

Actual examples of how the data would be transformed can be found on the core components dev page.

In the example of the image component you can note the following:

  • User data is not exposed in the JSON
  • The asset path is rewritten
  • Models can be automatically enhanced with auxiliary data useful for clients

In this case, AEM Core Components inject the required fields for Adobe Analytics with the standardized data layer for modern event-driven tracking.

With the corresponding Adobe Launch Extension this enables zero configuration Adobe Analytics Integration for Core Components.

Repository Data:

      jcr:primaryType: nt:unstructured
jcr:createdBy: admin
fileReference: /content/dam/core-components-examples/library/sample-assets/lava-into-ocean.jpg
jcr:lastModifiedBy: admin
jcr:created:
displayPopupTitle: true
jcr:lastModified:
titleValueFromDAM: true
sling:resourceType: core-components-examples/components/image
isDecorative: false
altValueFromDAM: true
    

JSON Exported Sling Model:

      {
  "id": "image-f4b958f398",
  "alt": "Lava flowing into the ocean",
  "title": "Lava flowing into the ocean",
  "src": "/content/core-components-examples/library/page-authoring/image/_jcr_content/root/responsivegrid/demo_554582955/component/image.coreimg.jpeg/1550672497829/lava-into-ocean.jpeg",
  "srcUriTemplate": "/content/core-components-examples/library/page-authoring/image/_jcr_content/root/responsivegrid/demo_554582955/component/image.coreimg{.width}.jpeg/1550672497829/lava-into-ocean.jpeg",
  "areas": [],
  "lazyThreshold": 0,
  "dmImage": false,
  "uuid": "0f54e1b5-535b-45f7-a46b-35abb19dd6bc",
  "widths": [],
  "lazyEnabled": false,
  ":type": "core-components-examples/components/image",
  "dataLayer": {
    "image-f4b958f398": {
      "@type": "core-components-examples/components/image",
      "repo:modifyDate": "2019-01-22T17:31:15Z",
      "dc:title": "Lava flowing into the ocean",
      "image": {
        "repo:id": "0f54e1b5-535b-45f7-a46b-35abb19dd6bc",
        "repo:modifyDate": "2019-02-20T14:21:37Z",
        "@type": "image/jpeg",
        "repo:path": "/content/dam/core-components-examples/library/sample-assets/lava-into-ocean.jpg",
        "xdm:tags": [],
        "xdm:smartTags": {}
      }
    }
  }
}
    

A complete example of a content structure which supports the Sling Exporter Framework might look like this:

Pages with their properties, the editable template structure for static components, responsive grid (“parsys”) components and the content of the components – basically all information which describes the content of the page – are exported in a well-defined, consistent API intended to be consumed and rendered by clients.
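
As a rough orientation of what this well-defined API looks like, the exported page model follows a shape along these lines – a simplified sketch; the ":"-prefixed keys follow the core components JSON convention, everything else is illustrative:

// Simplified shape of the JSON a page exposes through the Sling Exporter Framework.
interface ComponentModel {
  ':type': string;               // resource type; the client picks the matching component
  [property: string]: unknown;   // exported component properties, e.g. "text", "richText"
}

interface ContainerModel extends ComponentModel {
  ':items': Record<string, ComponentModel>;  // child components keyed by name
  ':itemsOrder': string[];                   // order in which the children are rendered
}

interface PageModel extends ContainerModel {
  ':path': string;   // content path of the page
  title?: string;    // page properties exported alongside the structure
}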

Sidenote for AEM users

Are you using AEM for your website? 

If so, try replacing the .html with .json. Do you get any JSON back? If yes, you can try to add another child path named "jcr:content" before .json: /de/home.html -> /de/home/jcr:content.json.

If any JSON is returned you will probably also see some technical metadata and a username. Depending on how users are created, this may be a cryptic number, but it could also be a readable name or an email address.

In any case, you might want to discuss this with your team. Adobe recommends disabling the default Sling GET servlet on the productive publish instances.

Benefits of developing Single Page Application with AEM

Adobe put a lot of effort into making it as simple as possible to get up and running and develop SPAs with AEM. 

All the mechanisms are also tightly integrated into existing AEM technology, making SPAs a first-class citizen in AEM.

In the following, we will give a small overview of how the setup looks to give some further insights on what impact this has on developer teams working with AEM.

Setup & Onboarding

Adobe provides a reference implementation called Core Components with a large set of components containing all the current best practices in regards to AEM development (SPA or non-SPA). 

The projects can be set up locally with the project templating tool AEM Maven Archetype for on-premise or AEM as a Cloud Service installations. It supports creating a React or Angular SPA project template with the following:

  • AEM base setup
  • Core Components
  • Setup for Sling Exporter Framework
  • A frontend build chain that builds and deploys all assets directly into AEM
  • Angular / React libraries for the AEM integration
  • A static preview server for local, AEM-independent frontend development

Further, there is a very good starting tutorial for React and Angular that will get developers up and running quickly. 

Of course, this doesn’t mean that a developer is ready to deliver production-ready AEM SPA solutions the next day, but it’s good to know that Adobe is committed to simplifying the onboarding of developers.

AEM SPA Backend Development

AEM backend developers will have less work to do, because they no longer have to integrate the frontend by taking the static HTML and migrating it to HTL templates. This usually is a big pain point of any larger AEM project, introducing bugs and costing time.

With the SPA approach, the interface between backend and frontend is no longer the HTML markup but instead the Content API, which can be predefined, specified and more easily adjusted.

Further, the responsibility of rendering the UI in the browser goes to the frontend team, where the expertise in these areas usually lies anyway.

AEM developers can focus on what they know best: building the solid backbone of the application and ensuring content authors have the required tools and enjoyable interfaces to create content.

AEM SPA Frontend Setup

In AEM projects, frontend developers usually build a static prototype with a set of static components which are handed to the backend.

This is, as mentioned, usually a very inefficient process. It is hard to tackle this problem without requiring frontend developers to install AEM, which comes with its own set of problems.

Therefore, teams often just accept this aspect of AEM development, which restricts frontend developers in building great user experiences.

With the SPA approach, frontend developers get the full power and responsibility of the frontend – without having to know a lot about or install AEM.

This is all due to the fact that the only communication with the backend is via the Content API and the clear separation of providing data and the presentation layer.

We know how Content APIs deliver the content. But how are they consumed from the frontend? There are three setups, each valuable depending on the context.

JSON Mock

A JSON mock file is basically a copy of an example output directly from AEM Content API.

The frontend developer can adjust this file, switch content, add components and more. They can simulate authoring in AEM by editing a file, as long as they move within the specification of the Content API.

Point to remote AEM

The frontend setup can be configured to point to a certain remote AEM instance like QA, pre-production or even production! Gone are the days when frontend developers had to struggle to reproduce an issue in their local environment.

This enables many more possibilities, like running a whole integration test suite in the frontend on productive content without having to move large amounts of data to a different stage.
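
A minimal sketch of how the first two setups can be combined in the frontend code – the environment variable, host and paths are assumptions; the actual wiring generated by the archetype differs from project to project:

// Decide at build time whether the SPA reads a local JSON mock
// or the Content API of a remote AEM instance (QA, pre-production, ...).
// API_HOST and the mock path are hypothetical names.
const API_HOST = process.env['API_HOST'] ?? ''; // empty -> use the local mock

export async function loadPageModel(pagePath: string): Promise<unknown> {
  const url = API_HOST
    ? `${API_HOST}${pagePath}.model.json`  // point to a remote AEM instance
    : '/mocks/home.model.json';            // local copy of an example Content API response
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Could not load page model from ${url}: ${response.status}`);
  }
  return response.json();
}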

Running as part of the AEM installation

As might be expected, the frontend can be deployed to an AEM instance.

It is important to locally test the integration with the authoring environment and the SPA editor during development.

This part could also be taken care of by the backend developers to avoid forcing the frontend developers to install AEM locally.


Of course, this will also be the setup used on stages or on production.

AEM Frontend development

After understanding how the content comes to the SPA – how does the frontend code know how to render the content?

Similarly, how does the AEM editor know how to communicate with the SPA when the author changes content?

This is handled by the frontend libraries provided by Adobe for React and Angular.

Other frameworks are under consideration, but nothing has been announced regarding, for example, Vue or Svelte support. The only option in these cases would be to build a similar library.

These libraries contain functionality to parse the Content API output, instantiate the required components, fill the content properties and dynamically put them in the required order into the application context so that they are rendered on the page.

The same goes for other aspects like switching a page, routing and so on. Most of this works out of the box.

The main task the frontend developer has is to map the frontend components to the resource types of the backend Sling Models.

This will also enable the AEM editor to inform the frontend about which component needs to refetch its content when editing.

To demonstrate this in practice, consider a standard Angular component, for example a simple text component.

The component code then has to be extended with a mapping and an edit configuration (see the sketch below).

This “maps” the frontend code to an AEM resource type, which corresponds to a Sling Model, which in turn maps to the content in the repository.

Additionally, an “Edit Config” is provided to give hints to the AEM editor, for example when a component is considered “empty”, so that a placeholder can be displayed.
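
To give an idea of what this looks like in code, here is a minimal sketch for a simple text component, assuming Adobe’s Angular editable-components library; the package name, component and resource type are illustrative and will differ in a real project:

// text.component.ts – a plain Angular component rendering the exported "text" property.
import { Component, Input } from '@angular/core';
// MapTo comes from Adobe's SPA editor integration library for Angular
// (package name assumed here; check the version used in your project).
import { MapTo } from '@adobe/cq-angular-editable-components';

@Component({
  selector: 'app-text',
  template: `<p [innerHTML]="text"></p>`
})
export class TextComponent {
  @Input() text: string;
  @Input() richText: boolean;
}

// Edit config: tells the AEM editor when the component is "empty",
// so that a placeholder can be shown to the author.
const TextEditConfig = {
  emptyLabel: 'Text',
  isEmpty: (props: { text?: string }) => !props || !props.text || props.text.trim().length === 0
};

// Map the frontend component to the resource type of the backend Sling Model.
MapTo('my-project/components/content/text')(TextComponent, TextEditConfig);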

Server Side Rendering (SSR)

There are two major concerns when using the SPA approach: initial page loading speed and SEO. 

Both can be solved by adding SSR to the architecture. We will also briefly discuss how to actually implement this architecture (source: Adobe).


Initial page load issue with SPA

As discussed, with SPAs only an HTML document with an empty body is sent to the browser.

Therefore, the browser can initially not display any content and the user has to wait until the Javascript – which for SPAs is usually quite a lot compared to more traditional sites – is done loading, parsing and executing.

In the meanwhile, only a blank screen or a loading spinner is visible.

For a complex website like a banking application this is less of an issue since the users probably accept some loading time.

For a content-focused page that the user expects to load within a couple hundred milliseconds, this is usually not acceptable.

Server Side Rendering solves this by instead executing the Javascript in the backend in a headless browser or a NodeJS server and returning the populated initial HTML to the client.

The browser can directly start rendering the HTML and the users get immediate feedback. The browser will still fetch the Javascript and the SPA will inject itself into the rendered HTML (this is usually referred to as “rehydration”).

From there on, the execution flow continues as if the page was initially rendered in the browser. For more details and considerations we recommend this article by Google.
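
As a very condensed sketch of such an SSR step for a React-based SPA, using a small Express handler – host, paths and component names are assumptions, and Adobe’s recommended setup on Adobe I/O will look different:

import express from 'express';
import { createElement } from 'react';
import { renderToString } from 'react-dom/server';
import { App } from './App'; // hypothetical root component of the SPA

const AEM_HOST = 'https://aem-publish.example.com'; // assumed host
const app = express();

app.get('*', async (req, res) => {
  // 1. Fetch the page model from the AEM Content API for the requested path.
  const model = await (await fetch(`${AEM_HOST}${req.path}.model.json`)).json();

  // 2. Render the SPA to an HTML string on the server.
  const html = renderToString(createElement(App, { model }));

  // 3. Return populated HTML; the browser can render it immediately, and the SPA
  //    "rehydrates" itself once the JavaScript bundle has loaded.
  res.send(`<!doctype html>
<html>
  <head><script defer src="/spa/main.js"></script><link rel="stylesheet" href="/spa/style.css"></head>
  <body><div id="app">${html}</div></body>
</html>`);
});

app.listen(3000);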

SEO and SPA

When a search engine crawls a SPA it will only see a blank HTML without content to parse. This can cause issues with SEO and ranking.

Some crawlers, like Google, support executing Javascript. Similar to SSR they will execute the Javascript to then crawl and index the content.

However, Google does not treat HTML and Javascript rendered pages equally. There are always two passes: first raw HTML and then processed Javascript.

These passes are not treated equally: the first pass, which only reads and processes the HTML, has priority. The consequence of this can be a (much) longer timeframe until a page is indexed and ranks on Google.


Even though the disadvantages regarding Google seem to be decreasing, not all search engines support executing Javascript.

There are also other contexts where non-SSR SPAs become problematic, for example generating a preview to share a page on social media.

Server Side Rendering in AEM

The groundwork and architectural pattern for SSR with AEM is proposed by Adobe and the recommendation is to use Adobe I/O to build the infrastructure.

We also heard that Adobe is working on an extended tutorial or possibly even a reference implementation for SSR with Adobe I/O.

But for now this would have to be developed with a custom implementation on the basis of this sample code base.

How does Headless AEM work for clients that are not web-based?

So far this article focused on content-focused web pages or mobile hybrid SPAs.

The headless capabilities of AEM and the decoupling of content from HTML rendering enable many more use cases and applications where content needs to be displayed, from native Android or iOS apps, social media snippets and digital signage systems to small IoT devices.

To accommodate such a vast ecosystem, loosely structured web content is problematic.

However, the rich feature set of AEM also allows you to create structured content according to a predefined model by using the "AEM Content Fragments" feature.

Content fragments are predefined form-based or simple rich text pieces of content which can be linked and structured. Other content from AEM like text, assets and tags can of course be reused in content fragments as well.

Fetching structured data with GraphQL

Recently AEM was extended to allow consuming content fragments with GraphQL (besides the already existing simpler JSON APIs). 

In concept, GraphQL can be compared to a SQL database query, the difference being that the query is not used for a database but instead an API. 

This allows different clients to query the API according to their own needs instead of the API having to provide different endpoints returning different amounts or sets of data for different clients. 

For example, a smartwatch might want to display less content than the corresponding app and would query only what is needed without the backend having to support this use case.

Since GraphQL requires a predefined data structure, it would not work that well with loosely structured web content, so content fragments were the obvious choice.

Adobe is working on GraphQL support, and additional features like subscriptions, mutations and pagination are on their way. Due to the flexibility of the query possibilities, performance is a key topic for any GraphQL API.

Adobe plans to tackle this by using “persisted queries”.

A client will first “register” a query, and AEM will return a handle for it. This query handle can then be invoked with a simple GET call which can be cached, making any following query fast and scalable.
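
A rough sketch of that flow from a client’s perspective; the endpoint paths, project name and content fragment model are assumptions for illustration, not the final AEM API:

const AEM_HOST = 'https://aem-author.example.com'; // assumed host

// Only the fields this client actually needs are selected; a smartwatch client
// could register an even smaller query against the same content.
const articleTeaserQuery = `{
  articleList {
    items {
      title
      author
    }
  }
}`;

// 1. Register ("persist") the query once; AEM stores it under a handle.
async function persistQuery(): Promise<void> {
  await fetch(`${AEM_HOST}/graphql/persist.json/my-project/article-teasers`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: articleTeaserQuery
  });
}

// 2. Execute the persisted query with a simple, cacheable GET call.
async function fetchArticleTeasers(): Promise<unknown> {
  const response = await fetch(`${AEM_HOST}/graphql/execute.json/my-project/article-teasers`);
  return response.json();
}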

When to implement Adobe Experience Manager in a headless way

As discussed in a previous article about Headless CMS vs Hybrid CMS, you should not go headless at any price. There are some limitations and it might not be the best fit for every use case.

Let’s go through the pros and the cons of the headless approach.

The pros of the headless approach:

  • Enables implementation of content-focused SPAs with dynamic pages, layouts and components (for web or hybrid mobile apps)
  • Delivers a unique user experience for content pages achievable with SPAs (e.g. no page reload)
  • Well known WYSIWYG authoring environment for SPAs
  • “Full” integration for content focused SPAs, (upcoming) “light” integration for existing projects or limited content areas
  • Content & Asset reuse – all content from one CMS
  • Omnichannel delivery of content to any type of client from smart watches to digital signage to IOT
  • Supported by Adobe (editor, backend technology, frontend libraries, project setup)
  • Separation of concern: AEM developers build the backend, frontend developers the frontend – no “integrating of markup” anymore
  • Probably fewer AEM developers and more React / Angular frontend developers required, simplifies hiring

The cons of the headless approach:

  • SSR setup required, especially for content focused web pages (initial page load & SEO)
  • No clear guidance on how to implement SSR from Adobe (yet)
  • Traditional approach well established, architectures with many successful projects scaled to large user bases 
  • Many unknowns: learning curve for developers, some bumps in the road are to be expected with any new technology
  • Frontend developers still have to consider the AEM editor with some limitations e.g. no use of the CSS viewport height (vh) units
  • Not all AEM features are supported (yet)
  • Only Adobe support for Angular and React, no Vue or Svelte

From this we conclude the following three key points:

  1. SPAs for content-focused web pages are now a valid approach that makes it possible to deliver a new UX without any sacrifices in content creation, with (almost) the full capabilities of AEM supported. This requires the implementation of an architecture with SSR, which has to be custom-built until there is direct guidance from Adobe.
  2. AEM is a fully capable headless CMS that can deliver content to any device or screen with modern technologies and standards (JSON API, GraphQL etc) which should be able to scale to large user bases due to performance optimisations by Adobe.
  3. Separation of concerns in regards to providing data and presenting data on the technical level means great improvements for the developer teams. No integration of markup. Frontend developers get the freedom they need and will be happier coders. Backend developers can focus on what they know best. A side benefit of this is that probably fewer AEM developers will be needed which are much more difficult to hire than React or Angular developers.

From these takeaways we can recommend AEM headless or hybrid to be considered when the following points are met:

  • You aim to deliver the same experience and code base for a content-focused page on the web and a hybrid mobile app.
  • You struggle to find enough AEM developers for web-based projects but have a strong team of frontend developers.
  • You have an existing SPA and want to display content in limited areas of the app (wait for the upcoming SPA editor 2.0).
  • You want to deliver content from AEM to platforms that are not web technology based (headless).

Finally, is Hybrid CMS the best solution?

Yes, absolutely!

Hybrid CMS is the future, because it makes it possible to keep the established traditional approaches while being able to deliver content to any other device or platform, all from a single, consistent user interface using modern technologies like SPAs and GraphQL.

A single platform for all your content also means reusing content across all platforms!

Text, assets, tags, Content Fragments, Experience Fragments – all can be reused on your traditional site, SPA, native iOS or Android App, digital signage (AEM Screens) or on a toaster with a display.

Even better, through the tight integration of AEM into the Adobe Experience Cloud, content can be further reused in emails (Adobe Campaign or Marketo), personalisation (Adobe Target) and many more tools and technologies. 

For web technology based projects, it also allows you to split the teams according to the separation of concern.

This means that developers can focus on what they know and do best. Plus, hiring might be simpler because potentially fewer AEM developers will be needed.

The drawback? All this technology is quite new and there are no established best practices yet, so there naturally are some unknowns and risks for new projects.

Basil Kohler

AEM Architect

The post Headless CMS with AEM: A Complete Guide appeared first on One Inside.

by Samuel Schmitt at July 19, 2021 11:30 AM

June 07, 2021

CQ5 Blog - Inside Solutions

AEM Screens: Questions, Answers and Lessons Learned

AEM Screens: Questions, Answers and Lessons Learned

At One Inside we have the chance to collaborate with our customer SBB, the Swiss Federal Railway, on innovative projects. 

These projects blend many channels and sources of information to provide travellers with up-to-the-minute information throughout their journeys.

These are next-generation omnichannel experiences. 

There is one particular project that we want to highlight today, the “CMS customer information”. 

The aim of this project is to deliver valuable customer information to different screens at train stations, among them Smart Information Displays – touch-screen based kiosks that will be available on several hundred train stations all over Switzerland.

On the solution side, Adobe Experience Manager is used as the base content management system, and AEM Screens, Adobe’s Digital Signage solution, is used to deliver content to the various screens.

You may be wondering:

  • How to leverage a digital signage solution? 
  • What is AEM Screens? 
  • What lessons did we learn from the project? 

We will answer all those questions and more below.

What is AEM Screens?

Adobe Experience Manager (AEM) Screens is a digital signage solution allowing you to publish dynamic content and digital experiences to a wide variety of screens and displays at your store, your premises or at train stations.

The beauty of AEM Screens is that it is part of the Adobe Experience Manager solution, Adobe’s powerful CMS. 

It enables marketers and content editors to work from a single place, create content for the website and mobile channels, and push assets to displays as well.

The aim is to manage all assets for any channel from one simple and intuitive interface.

The video below is a replay of a webinar held in January 2021, where we detail our project with the Swiss Railways.

Later in this article, we share the audience questions, together with more insights about AEM Screens and its integration in a large enterprise ecosystem.

What kind of displays and screens can you manage with AEM Screens?

Any kind of screen – menu boards, touchscreens, widescreens and more – can be used with AEM Screens. You can then deliver unified and useful experiences into physical spaces.

In the project done for SBB, the Swiss Railway, we introduced 3 kinds of screens:

  • Smart Information Displays (SID): Their main purpose is to replace the paper timetables and network plans and offer information in an interactive way at the train station.
  • E-Panels (ad screens): During large disruptions, ad screens can be used to provide customer information. 
  • Inspiration Desks: Located in Traveler Information Centers, inspiration desks inspire customers to buy leisure trips.

What are the key features of AEM Screens that you used in this project?

There are several features that we used and would not want to miss: 

  • Configuration: It’s very easy to connect a new screen to the system and provide it with a new configuration. For example, this is how we configure the Smart Information Displays (touch-screen based kiosks). 
  • Content: AEM Screens is directly integrated into AEM Sites and AEM Assets and can use the same content that is used on the website or in other channels. It’s easy to navigate to a specific screen (just choose the train station and then the corresponding screen(s)) and add content to it. 
  • AEM Screens Player: AEM Screens has its own player that runs on the media player hardware that delivers the signal to the screen and acts as a client for the AEM Screens CMS. This setup allows us to provide any kind of information to a screen – be it a loop of nice images and videos, a single-page-application that interacts with the user through a touch display, or service disruption information pushed to the screen within seconds. 
  • Flexibility: AEM is basically a hybrid CMS (a mixture between a traditional and a headless CMS) with a digital signage package attached to it. Because it’s heavily based on open interfaces and standards, we were able to find an interface to connect to every backend system and every screen we wanted to. This was crucial more than once during the project. 

What is the impact on content creation, or how easy is it to use existing assets with AEM Screens?

AEM Screens is part of Adobe Experience Manager, the famous content management system for enterprise. 

AEM Screens uses the same user interface and functionality for content management as AEM Sites – the functionality that allows creating websites – so that authors feel at home immediately. 

Existing assets can be used either by dragging and dropping them manually into channels where they get immediately published to the screens.

Or existing web content can be reused by re-rendering it to a screen-optimized design (using specific stylesheets) – for example, you don’t want links to external sites that a user at a touch-based display can tap on. 

For the project with SBB, most of the content we push to screens is automatically consumed from backend systems, enriched with additional assets, and rendered for the specific screen, all automatically.

Is it possible to build interactive and personalized experiences with AEM Screens?

Yes and Yes. Actually, we already built an interactive experience with the Inspiration Desks to be rolled out in all-new travel information centers at large train stations in Switzerland.

It’s based on a single page application (SPA) that uses Angular and loads data (from AEM Sites) asynchronously.

The SPA runs in an AEM Screen channel and can be replaced by an idle channel that shows images and videos when nobody is at the desk. 

Interaction with the SPA happens through a touch-based screen limiting some functionalities (e. g. no keyboard entries), but this is not a huge problem. QR codes are used to get content to the user’s mobile phone.

If we can identify a user standing at a screen, for example through NFC tags or another method, all personalization functions that are available within AEM can be used, including personalization and optimization provided by Adobe Target. 

Basically, anything that can be done on a website can be done with AEM Screens as well.


Is it possible to integrate analytics with AEM Screens?

Yes. Actually, that is exactly what we will do, at least for the interactive screens, such as the Inspiration Desks (touch-screen enabled desks with travel tips and useful information about Switzerland and the SBB). 

For the Inspiration Desks, we want to learn how users use them. Which content is interesting to them, and which isn’t.

In addition, we want to display QR codes that bring travel tips to mobile phones. If someone then buys a ticket after reading a travel tip, we will see their first interaction on the Inspiration Desk.

Did you use AEM Content Fragment and/or Experience Fragments with AEM Screens?

Yes we did. We used AEM Content Fragments to configure centrally managed information blocks displayed on various screens.

It’s a great out-of-the-box mechanism offered by AEM Sites. Our team easily configured this mechanism for the purposes of AEM Screens.

As you can set up specific permissions on the Content Fragment feature, we allowed a few content editors to create fragments and edit content. 

The screens manager can then pick the content (fragment) they want to display on their screens.

Is it possible to use an API Gateway with AEM Screens?

We don’t use an API gateway, we use the APIs that Adobe Experience Manager provides.

AEM provides a lot of interfaces, and mostly we use REST APIs exchanging JSON objects, as all other systems are able to understand them. It’s easy to implement such an interface.

How to manage multi-language displays with AEM Screens?

As you are probably well aware, Switzerland has 4 official languages; 3 of them are used by SBB, together with English at airport train stations.

Basically, each station has its “home” language. All communication on said station shall be in its home language, and maybe a second language.

However, it’s not that easy to find out in which languages to communicate on each station.

Our solution is able to support all languages (de, en, it, fr). How many languages a message is submitted in doesn’t matter – we will rebroadcast the message in all languages available. 

The SBB employee in charge of service disruption information at the Traffic Control Center decides which languages to use. 

Have you used a CDN?

Yes, we have, but only because Adobe Experience Manager is already used for the website, and therefore has a CDN (Content Delivery Network) already. We can use already existing setups and workflows. 

If we had had to build the system from scratch, we probably wouldn’t have used a CDN, as all the consuming systems are basically located within the SBB network, and only a small amount of images are used. 

In fact, the Smart Information Displays use AEM Screens’ functionality to cache images and other content locally, so that they can provide content even if the connection to the network is lost (offline functionality). 

What’s the acceptable latency between someone making a change in the CMS and the corresponding displays being updated? 

Speed is essential, especially during a service disruption. Not only because everybody is waiting for information, but also as information can and will be updated quite often, for example when busses are available or other connections are re-routed. 

Once the SBB employee at the Traffic Control Center sends out a message, it’s processed in the CMS in less than a second.

The E-Panels, for example, poll the CMS every 20 seconds for a new message and need about 5 to 10 seconds to switch from advertisements to a disruption message.

All together it takes about 30 seconds from the publication of the message to its presentation on the screens. 

This is much faster than the current system, and in the near future, we might bring it down to a few seconds for other screens.
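
Conceptually, the polling on the E-Panel side boils down to something like this sketch – the endpoint, the message format and the channel-switching functions are purely illustrative, not the actual SBB implementation:

// Poll the CMS for disruption messages and switch the displayed channel accordingly.
interface DisruptionMessage {
  id: string;
  text: string;
  languages: string[];
}

// Hypothetical functions of the player that switch between channels.
declare function showDisruptionChannel(messages: DisruptionMessage[]): void;
declare function showAdvertisementChannel(): void;

async function pollDisruptions(): Promise<void> {
  const response = await fetch('/screens/api/disruptions.json'); // assumed endpoint
  const messages: DisruptionMessage[] = await response.json();

  if (messages.length > 0) {
    showDisruptionChannel(messages);   // switching takes roughly 5 to 10 seconds
  } else {
    showAdvertisementChannel();        // back to the regular advertisement loop
  }
}

// Poll every 20 seconds, as described above.
setInterval(() => { void pollDisruptions(); }, 20_000);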

Who is the main target audience for the digital signage solution considering that most Swiss people have smartphones with the SBB mobile app installed?

Most of the travellers have the SBB mobile app or at least a smartphone where they can get information from the website.

But imagine a larger train station full of people during a large service disruption. There needs to be information, and the information needs to be accurate, fast, and consistent. 

That’s what we’re working on. You’ll always be faster looking at a monitor than searching for your next train on your mobile phone. 

Furthermore, we plan to integrate the website and the mobile app as additional channels into the solution to ensure that the same information is served to all users.

For what kind of other use cases could AEM Screens be used?

There are many use cases imaginable – basically all use cases that involve a display, be it a touch-based interactive screen or a simple, passive display. 

Any information can be put onto a screen, be it advertisements, information about the operative situation of a railway network, or a timetable.

It gets interesting when one starts to combine sources: a (big) screen is still quite an investment, therefore being able to use it for multiple purposes – e.g. ads during idle times, a timetable during rush hours, and a service disruption message during disruptions – basically multiplies the value of the screen.

AEM Screens lets you configure different channels with specific content and allows you to manage these channels based on schedules or other rules, even manually, so that you are always in control of what is displayed on your screens.

If you want to know more, we’ll gladly tell you everything about AEM Screens and discuss your scenario. Don’t hesitate to contact us.

Michael Grob

Senior Consultant Digital Marketing

The post AEM Screens: Questions, Answers and Lessons Learned appeared first on One Inside.

by Samuel Schmitt at June 07, 2021 08:44 AM

June 03, 2021

Things on a content management system - Jörg Hoh

AEM micro-optimizations (part 3)

Welcome to my third post on AEM micro-optimizations. Again with some interesting ways how you can improve your AEM application performance, sometimes with little improvements, but sometimes with significant ones.

During some recent performance optimization I came across code, which felt a bit odd. Technically it was quite easy:

for (Item item : manyItems) {
  processSingleItem(resolver, item);
}

void processSingleItem(ResourceResolver resolver, Item item) {
  // do something with the resourceResolver
  resolver.commit();
}

That is indeed a very common pattern, especially in software, which evolved over time: You have code, which deals with a single item. And later, if you need to do it for multiple items, you execute this code in a loop. Works perfectly, and the pattern is widely used.

And it can be problematic.

If you have an operation in that processSingleItem() method which comes with some static overhead, maybe you are not aware of that overhead, so it goes unnoticed. Maybe you expect that if that processSingleItem() method takes 5 ms for one item, requiring 50 ms for 10 items is ok. Well, an O(n) algorithm isn’t too bad, is it?

But what if I tell you that the static overhead of that method is so large that providing 10 items as parameters instead of just one will increase its runtime not by a factor of 10, but only by a factor of 1.1?

Imagine you need to go grocery shopping for your Sunday dinner. You get yourself ready, take the bike to the grocery store, get the potatoes you need. Pay, and get back home. Drop the potatoes there. Then again, taking the bike to the grocery store, getting some meat. Back home. Again to the grocery store, this time for paprika (grilled paprika are delicious …). And so on and so on, until you have everything you need for your barbecue on Sunday. You have now spent 6 hours, mostly on the bike and waiting at the counter.

Are you doing that? No, of course not. You drive once to the grocery store, get all the things and pack them onto your bike, and get home. Takes maybe 90 minutes. Having the static overhead (cycling, waiting at the counter) just once saves a lot of it.

It’s the same in coding. You have static overhead (acquiring locks, getting database connections, network latency, calling through thick framework layers which just copy references to the data), which is not determined by the amount of data you process. But unlike in the example of grocery shopping, it’s not directly visible at which points there is such a static overhead, and unfortunately documentation rarely points that out.

Writing to the repository comes with such a static overhead; and it can be like a 20-minute ride to the grocery store. Saving 10 small batches definitely takes more time than saving once with a batch of 10 times the size. At least if you keep the size of the changeset limited; for details check this earlier posting of mine.

Check this great presentation by Georg Henzler at adaptTo() 2019 (starting at 17:00 min) (slides) for some benchmark data on how the size of the changeset influences the time to save (spoiler: for realistic sizes it does not really increase).

So I changed the above code to something like this:

for (Item item : manyItems) {
  processSingleItem(resolver, item);
}
resolver.commit();

void processSingleItem(ResourceResolver resolver, Item item) {
  // do something with the resourceResolver but no commit
}

Switching to this approach improved the performance for ~ 100 items by a factor of more than 10! And that’s an impressive number for such a minimal change.

So check your code for this specific coding pattern, find out if the parameters are good (that means small changes) and add some performance logging. And then convert to this batching mode and see what your numbers are doing.

Of course, very often this saving operates in the context of a much larger operation, and a 10 times improvement in this area will only speed up the larger operation from 12 seconds to 11 seconds. But hey, when you get this 1 second for almost free, just do it (and we are still talking about micro-optimizations). But nothing prevents you from taking a deeper look into what the system is doing in the remaining 11 seconds.

Leave me a comment if you have some interesting story to share, where such small changes resulted in big improvements.

by Jörg at June 03, 2021 02:41 PM

May 24, 2021

Things on a content management system - Jörg Hoh

AEM micro-optimization (part 2)

Micro optimizations are important, and their importance is described by a LWN posting about the linux kernel:

Most users are unlikely to notice any amazing speed improvements resulting from these changes. But they are an important part of the ongoing effort to optimize the kernel’s behavior wherever possible; a long list of changes like this is the reason why Linux performs as well as it does.

And this is not specific to the Linux kernel; you can apply the same strategy to every piece of software. The very same applies to AEM as a complex (and admittedly, sometimes really slow) beast.

There are a number of cases in AEM where you operate not only on single objects (pages, assets, resources, nodes), but apply the same operation to many of these objects.

The naive approach of just iterating over the list and executing the operation on each single element can be quite inefficient, especially if this operation comes with a static overhead.

Some examples:

  • For replication there are some pre-checks, then the creation of the package, the creation of the Sling jobs (or sending the package to the pipeline when running on AEM as a Cloud Service), the update of the replication status, and writing the audit log entries.
  • When determining the replication status of a page, the replication queues need to be checked to see if this page is still subject to a pending replication, which can get slow when the queues are full.
  • Committing changes to the JCR repository; there is a certain overhead in it (validating all changes, committing them to permanent storage, invoking the synchronous listeners, locking etc).

And in many cases these bottlenecks have been known for a while, and there is an API which allows you to perform the action in batch mode for a multitude of elements.

The ReplicationStatusProvider, for example, was introduced some years back when we had to deal with large workflow packages being replicated, which resulted in a lot of traversals of the replication queue entries. Adding this optimized version improved the performance by at least a factor of 10; so even in less intense operations I expect an improvement.

So if you have a hand-crafted loop to execute a certain activity on many elements, check if a more efficient batch API is available. There’s a good chance that it is already there.

If you have more cases where a batch mode should be available but isn’t, leave a comment here. I am happy to help either find the right API or potentially kickstart a product improvement.

by Jörg at May 24, 2021 02:28 PM

May 17, 2021

CQ5 Blog - Inside Solutions

The Headless CMS Adventure: More than a trend?

The CMS world goes through a revolution during which it reinvents itself every few years. 

Currently, the latest trend is the so-called ‘headless’ CMS, which completely separates the management of content from its presentation. 

As a result, a question arises from website owners and web developers: do we need a headless CMS?

Headless CMS offer many advantages compared to traditional CMS, but the added value is also dependent on the context, the number of channels, and what you are trying to achieve with your content.

In one of our projects, we followed a headless approach and learned a lot from it. 

Today, we would like to share our learnings with you and hopefully, this will help you decide which direction is best for your future web project.

What are the advantages of a headless CMS?

A headless CMS differs from a traditional one, often described as ‘monolithic’ CMS. 

In a headless fashion, the content and presentation layer of the website are completely separated.

Moreover, the presentation is outside of the responsibility of the CMS and must be implemented elsewhere.


The headless strategy does have advantages: the CMS only has to take care of the content, and its functionalities can be completely optimised for just that.

As the content does not contain any information about how it is to be presented, the same content can be used for several output channels – such as the website, a mobile app, social media and even print media at best. 

In contrast, a monolithic CMS takes care of both the creation and the presentation of the content in ‘pages’. The page structure ultimately corresponds to the navigation of the website. 

Depending on the architecture of the CMS, content may be placed directly on pages, which makes content reuse difficult on other channels. In other architectures, there may exist a certain logical decoupling between content and design.


How does Adobe Experience Manager manage content?

In Adobe Experience Manager (AEM), the CMS within the Adobe Experience Cloud, content is usually assembled directly by the authors into pages that then correspond (more or less) to the web pages 1:1, even if, technically speaking, the content management and the presentation of the pages are separated. 

Why am I telling you about AEM? Pretty simple.

Firstly, because One Inside is a certified Adobe Partner and our teams are experts in the implementation of AEM projects.

Secondly, because it was the CMS used for the project (as in almost all of our projects).

Now let’s talk about this project.

The project: going headless at any price

The goal of the project was to display content (already available on the website) in another channel: a digital kiosk.

A digital kiosk is a ‘vending machine’ consisting of a computer and a touch screen. The users of the digital kiosk can get informed about the products offered in a self-service way.

Our team joined the project relatively late, after the technical solution had already been designed.

The kiosk leveraged Angular to display information to its users.

The (front-end) application of the kiosk was already halfway implemented. The architectural decision had been taken as well: the kiosk should store all content locally, on the device. 

The way the solution was designed, it was not possible to benefit directly from the main website created in AEM. Otherwise, it would have been possible to create a version of the website adapted to the kiosk with relatively little effort.

Instead, REST APIs had to be used to provide content to the kiosk. A library of JSON objects had to be defined for the content and the configuration of the kiosk. Thanks to the openness of AEM and its direct support of REST and JSON, this could be implemented with reasonable effort.

Is headless always best?

During the project, another advantage of separating content and presentation was highlighted: the frontend team and the backend team could work independently from each other.

The backend team was in charge of managing the content in AEM while the frontend team took care of an Angular app for the content presentation.

Each team can 

  • concentrate fully on their area of expertise, 
  • work with their usual technologies and frameworks, 
  • and hardly has to consider the other party.

A weekly coordination meeting plus an interface definition (properly documented) is sufficient for the team’s collaboration.

(Architecture diagram: headless CMS architecture)

The architecture diagram above shows that the headless approach is justified for this use case: in the course of the project, other screens and devices will be connected, each device via its own interface. 

These can be implemented in a short time with little consultation between the teams – and new channels or devices can be added at any time. 

Welcome to the wonderful world of headless CMS!

Now all existing monolithic CMS can be replaced!

Or maybe not. 

With the first mentioned device, the kiosk, we could have saved a lot of time in our project if we had simply implemented a slimmed-down website directly in AEM. 

Headless was an overhead and introduced unnecessary complexity.

A lot of functionality had to be recreated using different technologies, even though it would already have been available.

AEM’s CMS is powerful and offers the possibilities to create a light version of a website, for example based on Experience Fragments.

Too much effort for simple websites

Especially for websites or web-related channels, the headless approach might not be a better solution than a traditional approach. 

While the backend with the content repository is provided by the CMS, the frontend is created completely independently (with Angular, React or a similar framework).

On the front-end side, everything has to be developed from scratch. Every component, every page template, including the entire navigation. 

This is an additional effort that should not be underestimated compared to design adaptations of the existing website with AEM and Core Components.

Think content first

What is overlooked is what I believe to be the biggest problem with the headless approach: how does content fit into the website, or rather how is a page structure defined?

On a highly structured site such as a news portal, where articles are assigned to categories (e.g. politics, sports) and displayed by topics, this can be done automatically, and similar solutions can be found for e-commerce.

In such cases, when the navigation can be automated, a headless approach might be appropriate. 

For an average corporate website where content is displayed in a semi-structured hierarchy, the frontend has to request the right content.

This can go as far as the whole navigation having to be created in the frontend, making it extremely difficult to support it from the backend.

All functions for structuring content, which are taken for granted in a conventional content management system, must be created in the frontend with great effort, if at all possible. 

In our example with the kiosk, this problem had to be solved in such a way that the entire content structure per device is maintained in the CMS and then made available to the respective device as a JSON object.

The kiosk then procures the content itself and ensures the appropriate presentation. 
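
To illustrate that setup, the kiosk-side consumption could look roughly like the following sketch; the structure of the configuration object and the paths are invented for the example:

// The kiosk loads its per-device content structure as a single JSON object from the CMS
// and builds its navigation from it. All field names are hypothetical.
interface KioskConfig {
  deviceId: string;
  navigation: Array<{ label: string; contentPath: string }>;
}

async function loadKioskConfig(deviceId: string): Promise<KioskConfig> {
  const response = await fetch(`/content/kiosk/config/${deviceId}.json`); // assumed path
  if (!response.ok) {
    // Even small errors in the configuration surface here, as described below.
    throw new Error(`Invalid kiosk configuration for ${deviceId}: ${response.status}`);
  }
  return response.json();
}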

In the long run, the coordination between frontend and backend becomes problematic: even the smallest errors in the JSON configuration file lead to errors in the frontend. 

The advantages of a traditional CMS outweigh the headless approach in this case.

Hybrid CMS: the best of both worlds

Fortunately, Adobe has not stopped at the monolithic approach, and has shown why Adobe Experience Manager is considered one of the most innovative products on the market.

Content Services and related technologies such as Content Fragments and Experience Fragments allow content management and presentation to be separated. Plus, all content can also be retrieved as JSON objects via REST APIs. 

This approach, known as ‘hybrid CMS’, offers a headless CMS in addition to the traditional, page-based CMS optimised for websites. 

Content prepared for the website can be made available to other systems such as displays, kiosks, apps or the Internet of Things (IoT) via the corresponding APIs.

The question is no longer whether headless or not, but only: when do I go for headless and when for the traditional approach? 

Making the right choice can decide the success of a project. 

We are happy to help you choose the optimal content management strategy for your project and show you an efficient, cost-effective way to succeed. 

Talk to us.

Michael Grob

Senior Consultant Digital Marketing

The post The Headless CMS Adventure: More than a trend? appeared first on One Inside.

by Samuel Schmitt at May 17, 2021 09:51 AM

May 11, 2021

CQ5 Blog - Inside Solutions

Introducing a step-by-step approach to build chatbots for enterprise

Introducing a step-by-step approach to build chatbots for enterprise

Chatbots and other conversational marketing solutions have been around for a while now. Most of us are used to seeing little chat boxes at the bottom of a website.

But how do they work? 

How can enterprises tackle these projects and offer a conversational channel to their customers?

How do you leverage the power of AI and natural language processing (NLP) to build proper dialog and conversations within a chatbot solution? 

After discussing and interviewing customers and prospects, we came to the conclusion that the project’s steps are not always obvious. Chatbot projects are a rather new challenge compared to website projects.  

Today, we want to share our knowledge in a complete guide on building a chatbot for enterprise. The whitepaper details our methodology in five steps. It explains the challenges, risks, and shares best practices.

What you will find in the Chatbot Journey whitepaper

The whitepaper has been created by our chatbot experts and consultants. They explain the steps to guide you towards your first conversational experience: from ideation to implementation.

We have highlighted the tasks to be done as well as the effort and team members required.

To make things easier, we created a five-step plan to successfully run chatbot projects and to share insights about tasks, required skills and challenges:

  • Step 1 – From idea to roadmap: make a list of ideas and define business cases
  • Step 2 – Turning a roadmap into a plan: bring all the experts together and define the solution end to end
  • Step 3 – Designing the chatbot and conversations: build your ideas into the chatbot
  • Step 4 – From training to go-live
  • Step 5 – Scale and optimize the chatbot experience: enhance the conversation
chatbot project plan

Our approach was designed with the requirements and challenges of enterprises in mind. 

You should read this whitepaper if you plan to create a chatbot for your enterprise and have a key role in your organization, such as:

  • Marketing Manager: you are in charge of digital marketing activities and customer experience. Your goal is to build a new channel to engage with your customers.
  • Digital Project Manager: you are in charge of your organization’s digital projects and a chatbot is a brand new channel that you have to tackle soon.
  • Executive: you might be heading marketing or the IT department and oversee the digital transformation at your company. It’s important for you to understand the challenges and risks that come with such a project.

Do you want to read the whitepaper?

You can check out the brand new Chatbot Journey Whitepaper right here:

Get started with our chatbot for enterprise guide

Read our complete Guide to Enterprise Chatbots

If you have additional questions about our methodology, our experts are here to answer all your questions.

Clemens Blumer

Senior Software Architect

Would you like to receive the next article?

Subscribe to our newsletter and we will send you the next article about Conversational Marketing and Chatbot.

The post Introducing a step-by-step approach to build chatbots for enterprise appeared first on One Inside.

by Samuel Schmitt at May 11, 2021 08:02 AM

May 04, 2021

CQ5 Blog - Inside Solutions

Front-end performance: How we optimized our website

Front-end performance: How we optimized our website (from the living style guide)

The front-end is the first place where you meet your users. But if your website takes too long to load, they won’t wait around.

No one is going to visit and appreciate a slow website. Not your customers, and especially not Google.  

The page experience and overall performance of your website, such as page speed, are important ranking factors for search engines. Better performance means more chances to be found by your audience.

Optimizing your website for speed is very important to improve customer experience. 

We’ve gathered our expertise on front-end and website optimization and are now sharing how we actually improved our own website performance from the living style guide.

We hope it will give you hints on how to tackle your own website performance review and update.

What is front-end optimization?

Front-end optimization is the process of fine-tuning the HTML, CSS and Javascript of your website to make it faster to load by a web browser.

A key parameter to consider is the user experience, and how fast the main content of the page is rendered.

Nobody likes to see a blank screen for seconds. 

As a website owner, you have to understand that website loading time is a critical ranking factor for Google and is measured as one of the key metrics of the Core Web Vitals: Largest Contentful Paint (LCP).

At One Inside, we take care of our customers and want to offer them a great experience when they visit our website as well. 

Our approach to front-end development is to build a front-end living style guide. This is how we can fix front-end issues fast and in a streamlined way as well. 

Before telling you how we improved our website, we need to introduce the concept of a living style guide – as it is where the improvement will take place.

What is a front-end living style guide?

The front-end living style guide is a website that defines the corporate design and user interface for your company websites or other web or mobile applications you may use. 

Each component required to build your website such as the navigation, teasers, carousel, and banner, are displayed individually. 

It helps the designer and developer understand how the website or mobile application will look, so they can focus on one single element at a time.

At One Inside, we built a living style guide for our corporate website and have also introduced this front-end best practice to customer projects.

What is a front-end living style guide

Below we’ll show you how we improved our website by optimizing the living style guide.

Use the living style guide to improve page performance

As the name implies, the living style guide’s main purpose is to “guide” your design. It guarantees consistency among all the components of your website and for different devices.

The living style guide for our website was used as expected and drove the UI implementation. However, it had one issue: a bad score with Google Lighthouse.

(Google Lighthouse is available via the Chrome developer console and generates reports about a web page’s speed and performance.)

The living style guide was considered slow to load and slowed down interactions. The user had to wait to see the page and to be able to click on buttons and icons.

If the living style guide is slow, the resulting website will suffer from the same issue.

Analysis of the frontend code

The very first thing to do is to understand how frontend code is built.

The technical pattern of the living style guide was the following:

  • All frontend components were compiled into one application (one large JS file)
  • All CSS was compiled into one large CSS file

This pattern used to be applied everywhere and is optimized for the HTTP1 protocol.

The recommendations for the HTTP1 protocol are to minimize the number of requests to the server and to load a few large files instead of many small ones.

Why? 

Because establishing a new request (a connection to the server) can be time-consuming (independently of the size of the file), and browsers only allow a limited number of simultaneous requests.

This is the reason so many web designs encapsulated their Javascript and CSS in one single file.

This approach has some advantages:

  • A small number of requests to the server
  • Servers get less load and run faster
  • Better compression ratio (the compression is better on one big file than multiple individual small files)

The disadvantages are:

  • Slow to download and decompress by the browser
  • Slow to transform into usable information (technically: parsing and compiling JavaScript).
  • The browser is busy until all of this data has been processed into usable information; meanwhile the main thread of the browser is blocked (see resources: the cost of JavaScript). This may not be a big problem on fast desktop computers, but it can be on mobile devices.
  • This also means interaction on the page is blocked, because the browser doesn’t yet know which elements are interactive.
  • Rendering is blocked until the stylesheet has finished loading, for every element of the page.

The question we asked ourselves was: is there a way to get away from this current pattern?

Looking at the improvement brought by the new HTTP2 protocol gave us a solution.

HTTP2 to help the frontend developer

Today, modern servers provide the new version of the HTTP Protocol (HTTP2) and web browsers are using multithreaded and multicore capabilities of modern computers and mobile devices more and more. 

This means we can process even more data simultaneously.

The HTTP2 protocol comes with great improvements:

  • The protocol can accept multiple simultaneous requests by re-using the existing connection.
  • It doesn’t need to negotiate and re-create a connection for each request, so that lost time disappears.
  • It can send many files (from many requests) in one answer (multiplexing).
  • And it offers better compression (30% better).

Thanks to HTTP2, frontend development recommendations got an update and today’s best practice is to split every file into small pieces.

Here we load the needed components of a page simultaneously.

It comes with many advantages for the web browser:

  • Javascript components are loaded only when needed by the client.
  • They are rendered as soon as they are available, without having to wait for the other files.
  • Each UI component is interactive and has its definitive design and size.

On top of this, HTTP2 also supports a “push” function.

The server can send a file that is not yet requested but is known to be necessary. This is done by using “preload” in HTML.

How we boosted website performance

To improve the website we applied several changes to the living style guide such as:

  • Splitting each Javascript and CSS into smaller files
  • Loading critical files first
  • Changing the code pattern

Let’s have a detailed look.

Splitting into small components   

Our improved frontend is now divided into many components. 

Each component is a small part of the page with a specific functionality: displaying the image gallery, the main menu, the main content, and so on.

Each component can be simple (just a static template) or more complicated, with interactions and JavaScript: reacting to clicks, managing the state of an element (displayed or hidden), loading and preparing a new element (a new slide or image in a gallery).

Critical mandatory files first

On every website, there are a few Javascript and CSS files which are absolutely mandatory to allow the browser to start displaying something. 

These usually correspond to the first parts that we see: the header, the global styling, the main menu, and approximately the first third of the screen.

In order to display these parts almost instantaneously, for an astonishing “perceived speed” and best user-experience, we need to provide them as fast as we can.

To do so, we need to extract this code into small files which will load faster.

As soon as the browser receives them, it can start to render the page without waiting for other parts of the page to load. 

For the user, the page displays and the loading seems finished.

From one code to multiple codes

I highly encourage you to read the excellent article the cost of Javascript. These are the main takeaways:

When the browser receives a JavaScript file,

  • it needs to extract it (decompress it),
  • parse the file as code and check that it conforms (no syntax errors, etc.),
  • compile it for the machine (transform it into something that the computer can execute),
  • and check which elements of the webpage it must be applied to (e.g. when the user clicks on a button or an icon).

If the entire webpage is managed by one JavaScript file (like one big global component), the browser will take a lot of time for each step of the process.

During this time, the main thread of the browser will be busy and can’t do anything else. The browser is blocked and nothing happens. 

The display is also suspended and clicking on items does nothing. The webpage is not yet interactive. The user can’t interact with it.

By going away from this pattern (and paradigm), we can take advantage of the modern architecture of browsers and of our modern computers and mobiles: really efficient multitasking.

The solution is to split everything into smaller pieces of code or components. 

In the end, the result is the same: the browser has processed the same amount of code, the same amount of bytes has been downloaded, but the browser did it by processing one small piece of code at a time.

As soon as one piece is processed, this part of the webpage can go live, be displayed, rendered and made interactive, while other parts are still loading, or parsing, or compiling.

Asynchronous loading

Async or defer? This is the question.

The existing solution to load a script and to not block the main thread is to specify it as asynchronous. 

When we include it in the HTML page, we add the async attribute. 

Setting scripts as async simply tells the browser that they are not blocking, so it doesn’t need to wait for them and can go on.

The scripts will be started as soon as they are received (without any notion of order).

An older defer attribute exists too: it keeps the order of the scripts, even if the files are received in a different order.

For example: 

  • A first script is big, a second script is small. Both are set with defer. 
  • The browser will first receive the second file, as it is smaller.
  • But it will wait for the first script, and start the first before starting the second.

Javascript loaded on demand

Async & Defer are already a good solution, but in our case, it was a bit more complicated.

The components (and so, the composition of the page) can be freely chosen by the editor of the webpage in the CMS (content management system). 

That means that we, as frontend developers, don’t know which components (and their corresponding JavaScript files) will be included and loaded on the page.

Of course, we don’t want to load everything, just in case it may be chosen to be used on that page. Otherwise, we are back to the monolithic, one global component pattern.

This is exactly what we want to avoid.

Our solution is to add a specific attribute in the HTML of each component, telling the page that this part is not a simple HTML, but a specific component.

As we have seen, we compiled each component (with Webpack) as an independent JavaScript file.

A small mandatory JavaScript is loaded first: it checks all the HTML elements to detect if a component is present on the page, and then launches a request to load its corresponding JavaScript file. Once done, it initializes the component to make it interactive.

About the styles

We separate the global mandatory stylesheets that are meant to be used on all pages: header, teaser, navigation-menus and more, and compile them into a separate CSS-file. 

This is quick to load and the browser can start to render the DOM (styles and sizes) without waiting for the rest of the page.

Subsequently, the page can be rendered and viewed, even if other chunks are still loading at this moment.

By just changing this pattern, the Google performance score increased to over 90/100!

Further improvements

If we wanted to achieve a better score with our website, here’s what we could do additionally:

  • Deliver mobile styles only when the browser runs on a mobile device; this means these styles must be delivered globally for the website and not within the payload of each component. This is difficult to manage in a project.
  • Deliver desktop styles only on desktop devices.
  • Keep global styles in one small global file (not duplicated in the critical CSS AND in each component).

What is this all about? To summarize front-end performance optimization in one sentence: load only what is needed and when it is needed. 

You will see drastic changes in your website performance and a direct positive impact on the customer experience.

And in the end, this is what we all want: happy customers.

Sébastien Closs

Senior Software Engineer Frontend

Would you like to receive the next article?

Subscribe to our newsletter and we will send you the next article about website optimization.

The post Front-end performance: How we optimized our website appeared first on One Inside.

by Samuel Schmitt at May 04, 2021 08:27 AM

May 03, 2021

Things on a content management system - Jörg Hoh

AEM micro-optimization (part 1)

As a follow-up to the previous article I want to show you what a micro-optimization can look like. My colleague Miroslav Smiljanic found that there is a significant difference in the time it takes to compute statements (1) and (2).

Node node = …
Session session = node.getSession();
String parentPath = node.getParent().getPath();

Node p1 = node.getParent(); // (1)
Node p2 = session.getNode(parentPath); // (2)

assertEquals(p1,p2);

He did the whole writeup in the context of a suggested improvement in Sling, and proved it with impressive numbers.

Is this change important? Just by itself it is not, because going up the resource/node tree is not that common compared to going down the tree. So replacing a single call might only yield an improvement of a fraction of a millisecond, even if case (2) is up to 200 times faster than (1)!

But if we can replace getParent() with the more performant variant in all places where it is used, especially in the low-level areas of AEM and Sling, all areas might benefit from it. And then we don’t execute it only once per page rendering, but maybe a hundred times. And then we might end up with tens of milliseconds of improvement already, for any request!

And in special usecases the effect might be even higher (for example if your code is constantly traversing the tree upwards).

Another example of such a micro-optimization, which is normally quite insignificant but can yield huge benefits in special cases, can be found in SLING-10269, where I found that a built-in caching of the isResourceType() results reduces the rendering times of some special requests by 50%, because it is called thousands of times.
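The actual change sits deep inside the Sling resource resolving code; purely to illustrate the pattern, a hedged sketch of memoizing such a repeated check (class name and key layout are invented, this is not the Sling implementation) could look like this:

import java.util.HashMap;
import java.util.Map;

import org.apache.sling.api.resource.Resource;

// Invented illustration of the idea behind such a cache -- not the actual Sling code.
// The result of an expensive check that is called thousands of times with the same
// arguments during a single request is memoized; the plain HashMap implies that the
// cache lives within one request/thread only.
public class ResourceTypeCheckCache {

    private final Map<String, Boolean> cache = new HashMap<>();

    public boolean isResourceType(Resource resource, String resourceType) {
        String key = resource.getPath() + "|" + resourceType;
        return cache.computeIfAbsent(key,
                k -> resource.getResourceResolver().isResourceType(resource, resourceType));
    }
}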

Typically micro-optimizations have these properties:

  • In the general case the improvement is barely visible (< 1% improvement of performance)
  • In edge cases they can be a life saver, because they reduce execution time by a much larger percentage.

These improvements accumulate over time, and that’s where it gets interesting. When you have implemented 10 of these in low-level routines, the chances are high that your usecase benefits from them as well. Maybe by 10 times 0.5% performance improvement, but maybe also by 20%, because you hit the sweet spot of one of these.

So it is definitely worth paying attention to these improvements.

My recommendation for you: Read the entry in the Oak “Do’s and Don’ts” page and try to implement this learning in your codebase. And if you find more such cases in the Sling codebase, the community appreciates a ticket.

(Photo by KAL VISUALS on Unsplash)

by Jörg at May 03, 2021 08:01 AM

April 28, 2021

CQ5 Blog - Inside Solutions

Adobe Summit 2021: Personalization drives Business Growth

Personalization drives Business Growth

Adobe Summit 2021

Again, it is this time of the year: It is time for Adobe Summit. 

Unfortunately, the Summit is online. Fortunately, it is free for everybody to watch the 458 sessions. 

Nevertheless, Summit started with a keynote hosted by Adobe President and CEO Shantanu Narayen, which held quite a few surprises.

Instead of his living room like last year, Mr Narayen welcomed the audience from the lobby of an Adobe corporate building. 

The style reminded us of Apple keynotes, but Narayen did not take us on a walk through Apple Park but came directly to the topic.

Of course, the pandemic was responsible for one of the biggest disruptions corporations faced: the fast move to digital.

Digital experiences shape how we live, learn and play, Shantanu said and added that there is no going back: Every business has to be a digital business. 

E-Commerce made a huge step forward. In 2020, 844 Billion Dollars were spent through e-commerce, almost twice as much as in the year before.

Therefore, it is simple to understand that the new winners in this digital economy are companies that can drive business growth with personalization because the digital economy runs on customer connections. 

Exciting innovations at the Adobe Summit 2021

Anil Chakravarthy took over to present exciting technology innovations in the Experience Cloud after reiterating that every business had to move to digital, fast. And Adobe can help. 

Adobe has recently acquired Workfront, a marketing planning and work management solution (resulting in a new product called Adobe Workfront).

Another significant change is the declining relevance of third party cookies: First-party data has never been more relevant. 

This fact is the motivation behind many smaller and bigger innovations in all solutions of the Adobe Experience Cloud: 

Adobe Experience Platform

There are several improvements to the Adobe Experience Platform, including 

  • AEP Collection Enterprise, a new client-side tagging mechanism (and the new name/replacement of AEP Launch); 
  • AEP Segment Match that allows multiple companies to share their data while staying compliant with the data protection laws; 
  • and the announcement of the beta release of the real-time CDP – for B2B.

Adobe Journey Optimizer

The biggest announcement is a new Adobe Journey Optimizer solution that looks like a child of Adobe Campaign and Adobe Analytics.

Adobe Journey Optimizer is a new implementation based on the Adobe Experience Platform that allows personalized experiences across the entire customer journey (including multiple channels) based on real-time CDP data – all in a single application. 

We will give more details about Adobe Journey Optimizer as soon as they become available. 

AEM as a Cloud Service

Some of the newly announced features of Adobe Experience Manager (AEM) are already available in AEM as a Cloud Service:

  • The possibility to use AEM as a headless CMS based on the API-first Content Services, support for GraphQL and content optimization functionality based on Adobe Sensei, Adobe’s AI. 
  • AEM Forms is now available as a Cloud Service, with AEM Screens coming to the cloud in the foreseeable future.
  • Furthermore, AEM Assets Essentials is a new service available in all of Adobe’s solutions, based on AEM Assets, of course.

The new features will be available in the on-premise variant of the CMS later on. So if you cannot wait, it’s time to move your AEM to the Cloud.

What are the main benefits of Adobe Experience Manager as a Cloud Service?

Adobe Commerce

Last but not least, significant advances have been made in Adobe Commerce (a.k.a. Magento): 

  • Intelligent visual recommendations recommend products based on images; 
  • Sensei-based new intelligent Live Search
  • And the integration with Adobe Sign that lets shoppers digitally sign their orders. 

Similar to AEM, Adobe Commerce now supports headless e-commerce based on an API-first approach using GraphQL.

These new functions reflect the fact that digital is “the new normal” – our lives have changed dramatically, and we as marketers have to take advantage of it, whether we want to or not. Now, every day is Black Friday.

Inspiration from Leaders

Adobe would not be Adobe if they did not give a lot of room to business experts and thought leaders to provide insights and inspiration that are not primarily product related.

Adobe’s CEO interviewed one of the most important CEOs at the moment, Pfizer’s Albert Bourla.

Bourla said that Pfizer would not have created the COVID vaccine if they were not prepared: They started in May 2019 – half a year before the pandemic outbreak – with a radical digital transformation: One can only transform a company if one transforms the culture. 

By using digital analytics and big data, a study that would typically take 18 months could be finished within 18 hours – this speed was the key to invent the vaccine: “It is the culture that invented the vaccine.”

Adobe summit keynote speaker

Deborah Wahl, CMO of General Motors, laid out a strategy to phase out gas and diesel engines by 2035 and to be carbon neutral by 2040.

This shall be done by a combination of Ultium, the new battery-chassis concept that shall become the foundation of 30 or more different car models across all of the company’s brands; and VIP, their concept for personalized objects that shall bring a completely new experience to car ownership.

GM relies heavily on the Adobe Experience Cloud to provide an omnichannel customer experience. It includes AI, Machine Learning and GM’s dealers – independent franchisees – in the marketing concept, as these dealers know their community way better than GM does. 

FedEx has digital in its DNA, being the inventor of tracking and tracing packages in 1978, and today scanning each of the more than 20 million packages shipped each day 20 to 30 times.

They had to become a digital company more than a logistic company and had to take advantage of what they call “logistic intelligence”. 

Not only had FedEx scaled their business from a 5-day network to a 7-day network, but the e-commerce boom of last year put a heavy load on the more than 600’000 employees. 

In addition, FedEx is responsible for transporting COVID vaccines – an IoT device is packaged into each vaccine package, which allows for a 99.9% accuracy in worldwide delivery. 

Adobe and FedEx announced a partnership that shall result in “logistic intelligence” functionality for small and medium enterprises that use Adobe Commerce. 

And now we’re back to digital marketing, which – according to Adobe – consists equally of science and art. 

And if you want to know more about Adobe’s current and new offerings, let us know. 

Michael Grob

Senior Consultant Digital Marketing

Would you like to receive the next article?

Subscribe to our newsletter and we will send you the next article about Adobe Experience Manager.

The post Adobe Summit 2021: Personalization drives Business Growth appeared first on One Inside.

by Samuel Schmitt at April 28, 2021 08:41 AM

April 12, 2021

Things on a content management system - Jörg Hoh

The effect of micro-optimizations

Optimizing software for speed is a delicate topic. Often you hear the saying “Make it work, make it right, make it fast”, implying performance optimization should be the last step you should do when you code. Which is true to a very large extent.

But in many cases you are happy if your budget allows you to get to the “make it right” phase, and you rarely get the chance to kick off a decent performance optimization phase. That situation is true in many areas of the software industry, and performance optimization is often only done when absolutely necessary. Which is unfortunate, because it leaves us with a lot of software that has performance problems. And in many cases a large part of the problem could be avoided if only a few optimizations were done (at the right spot, of course).

But this whole notion of a “performance improvement phase” assumes that it requires huge effort to make software more performant. In general that is true, but there are typically a number of actions which can be implemented quite easily and which can be beneficial. Of course these rarely boost your overall application performance by 50%; most often they just speed up certain operations. But depending on how frequently these operations are called, they can add up to a substantial improvement.

I once did a performance tuning session on an AEM publish instance to improve the raw page rendering performance of an application. The goal was to squeeze more page responses out of the given hardware. Using a performance test and a profiler I found that the creation of JCR sessions and Sling ResourceResolvers took 1-2 milliseconds, which was worth investigating. Armed with this knowledge I combed through the codebase, reviewed all cases where a new session is created and removed all cases where it was not necessary. This was really a micro-optimization, because I focused on tiny pieces of the code (not even the areas which are called many times), and the regular page rendering (on a developer machine) did not improve at all. But in production this optimization turned out to help a lot, because it allowed us to deliver 20% more pages per second out of the publish instances at peak.
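To make the anti-pattern concrete, here is a hedged sketch (servlet and resource type names are invented) of the kind of change such a review typically produces: reuse the resolver that already comes with the request instead of opening a dedicated one.

import javax.servlet.Servlet;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

// Invented example: instead of asking a ResourceResolverFactory (and thereby opening a
// new JCR session) on every request, reuse the resolver the request already carries.
@Component(service = Servlet.class, property = {
        "sling.servlet.resourceTypes=myproject/components/example",
        "sling.servlet.methods=GET"
})
public class ExampleServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response) {
        // the request already owns a resolver (and session) with the right permissions
        ResourceResolver resolver = request.getResourceResolver();
        Resource content = resolver.getResource(request.getResource(), "jcr:content");
        // ... render something based on 'content' ...
    }
}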

In this case I spent quite some time to come to the conclusion that opening sessions can be expensive under load. But now I know that, and I spread that knowledge via code reviews and blog posts.

Most often you don’t see the negative effect of these anti-patterns (unless you overdo it and every Sling Model opens a new ResourceResolver), and therefore the positive effects of applying these micro-optimizations are not immediately visible. And in the end, applying 10 micro-optimizations with a ~1% speedup each adds up to a pretty nice number.

And of course: If you can apply such a micro-optimization in a codepath which is heavily used, the effects can be even larger!

So my recommendation to you: If you come across such a piece of code, optimize it. Even if you cannot quantify and measure the immediate performance benefit of it, do it.

Same as:

for (int i = 0; i <= 100; i++) {
  othernumber += i;
}

I cannot quantify the improvement, but I know, that

othernumber += 5050;

is faster than the loop, no questions asked. (Although that’s a bad example, because hopefully the compiler would do it for me.)

In the upcoming blog posts I want to show you a few cases of such micro-optimizations in AEM, which I personally used with good success. Stay tuned.

(Photo by Michael Longmire on Unsplash)

by Jörg at April 12, 2021 08:24 AM

March 27, 2021

Things on a content management system - Jörg Hoh

Writing integration tests for AEM, part 5

This a part of my ongoing series about writing integration tests with AEM.

Integration tests help you to keep control
Photo by Chris Leipelt on Unsplash

Writing tests seems to be a recurring topic 🙂 This week I wrote some integration tests which included one of the most important workflows in AEM: the activation of pages. Up to now I haven’t blogged about handling both author and publish in an integration test, so I will show you how to do it.

So let’s assume that you want to do some product testing and validate that replication is working and also writes correct audit log entries. This should be covered with an integration test. You can find the complete sourcecode in the ActivatePageIT at the integrationtests github project.

Before we dig into the code itself, a small hint for the development phase of tests: If you want to execute only a single integration test, you can instruct Maven to do this with the parameter “-Dit.test=<Name of the testclass>”. So in our case the complete Maven command line looks like this:

mvn clean install -Peaas-local -Dit.test=ActivatePageIT -Dit.author.url=http://localhost:4502

(assuming that you don’t run your AEM author on the same port as I do … if you want to change that, modify the parameters in the pom.xml).

On the coding side, the approach follows that of every integration test: we need to get the correct clients first.

As we want to use replication, we use a ReplicationClient, which is provided by the testing client library.

Next we define a custom Page class, which allows us to define the parentPath.

Then the actual test case is straight forward.

I used some more features of the testing clients to just test the existence or absence of the page, plus the doGetJson() method to get the JSON representation of the pages (in the getAuditEntries() method).

So, writing integration tests with this tooling at hand is easy and actually fun. Especially if the test code is as straightforward to implement as here.

by Jörg at March 27, 2021 04:12 PM

March 08, 2021

Things on a content management system - Jörg Hoh

AEM as a Cloud Service and the handling of binaries

When you are a long-time user of AEM 6.x (and even CQ5), you are probably familiar with the Asset Update workflow. Its primary task is the extraction of metadata from the binary asset and the creation of (smaller) renditions for it. This workflow is normally executed on the AEM authoring instance.

“Never underestimate the bandwidth …!” (symbolic photo)
Photo by Massimo Botturi on Unsplash

But since the beginning, this approach has been plagued with problems:

  • The question of supported filetypes. Given the almost unlimited amount of file formats and their often proprietary implementation, it’s not always possible to perform these operations. In many cases, the support of these file types within Java is poor.
  • Additionally, depending on the size and the type of the asset and the quality of the library which provides support for this filetype, the processing can be very time consuming and also consume a lot of heap. Imagine that you want to create renditions of a TIFF file with dimensions of 10k * 10k pixels; assuming a 24-bit color depth, this requires 300 megabytes of contiguous heap to store an uncompressed version of it. You have to size the heap accordingly, otherwise you will run out of memory (OOM).
  • To avoid these issues, external tools like imagemagick were used for many filetypes. They come with support for various image types (in many cases much better than the Java image library), plus the ability not to blow up the AEM process when the processing fails (because imagemagick runs in a dedicated process). But the capabilities of imagemagick are also limited, and the support for more exotic (non-image) file types could be better.
  • In all cases you need to size your hardware for a worst case scenario. For example you need to provision a lot of heap, if your authors might start to ingest large images. And you need to provision enough CPU to mitigate negative impacts on all other operations.
  • Another big problem is the latency. Assuming that your asset is very large (it’s not uncommon to have assets larger than 1 gigabyte), it takes time to copy the binary from the (remote) datastore to a location where the processing takes place. Even if you can transfer 100 MiB per second, it takes 10 seconds to have the file transferred to the local disk; normally this process runs through the AEM JVM, which is problematic in terms of heap usage and can also cause performance problems. Not to mention code which is not aware of the possible sizes and tries to load the complete stream into memory.

In AEM as a Cloud Service this is offloaded, and that’s what AssetCompute is for. It performs all these steps on its own; also not using imagemagick for image handling, but high quality and optimized routines which also power other Adobe products.

But what does that mean for you as developer for AEM as a Cloud Service? In the first place, it does not have any impact. But you should learn a few things from it:

  • Do not create any renditions on your own, use AssetCompute instead. This service is extensible (check out Project Firefly), so you can do all kinds of asset operations there. There is no need anymore to use the Java image library code.
  • Avoid streaming binary data through AEM. AEM as a Cloud Service itself (the JVM) should not be bothered with streaming binary data into and out of the JVM. If you want to upload files into AEM, you should use the aem-upload library.

In general, think twice before you open an InputStream in AEM (either via Rendition.getStream() or also via the JCR API). Normally you never know how much data is behind it, and for almost all transformation cases it makes sense to use AssetCompute to perform these.
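If you do have a legitimate reason to read a binary yourself, at least keep the stream handling tight; a hedged sketch (error and null handling as well as the actual processing are left out) could look like this:

import java.io.IOException;
import java.io.InputStream;

import org.apache.sling.api.resource.Resource;

import com.day.cq.dam.api.Asset;
import com.day.cq.dam.api.Rendition;

// Sketch: if a rendition really has to be read, process it in chunks and let
// try-with-resources close the stream -- never buffer the whole binary in memory.
public class RenditionReader {

    public void process(Resource assetResource) throws IOException {
        Asset asset = assetResource.adaptTo(Asset.class);
        Rendition original = asset.getOriginal();
        try (InputStream in = original.getStream()) {
            byte[] buffer = new byte[8192];
            for (int read = in.read(buffer); read != -1; read = in.read(buffer)) {
                // hand the chunk over to wherever it needs to go, e.g. an output stream
            }
        }
    }
}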

by Jörg at March 08, 2021 11:16 AM

March 02, 2021

Things on a content management system - Jörg Hoh

META: domain switch

After some 12 years I finally switched over the domain name of this blog to something which is more closely attached to me. Don’t be surprised if you end up on “cqdump.joerghoh.de”. But of course the old domain name will continue to work, and I don’t plan to remove it.

by Jörg at March 02, 2021 07:43 AM

February 08, 2021

Things on a content management system - Jörg Hoh

CRX DE driven development

A recurring problem I see in AEM project implementations is the problem of missing abstraction. A lot of code passes around resources, ValueMaps and even Strings (paths). And because we are supposed to build software the proper way, the called method checks (or more often: does not check) that the provided resource parameter is not null, and that the resource is of the correct type.

But instead of dealing with resources, the class names and comments suggest that the code is actually dealing with products. Or website structures. Or assets. But instead of using a “product” class (or a website class, or the provided asset class), resources are still used. The abstraction is missing!

For me the root cause of this problem is CRXDE Lite. Exactly that thing which you can open on your local AEM instance at /crx/de/. Because it shows you a very nice hierarchical view of the repository, it shows you paths and properties. And if a developer starts to build a mental model of something, this tool comes in quite handy. Because you can reach everything via a path, which is a String! So instead of expressing relations between concepts I often see this:

String path = …
Resource pathResource = resourceResolver.getResource(path);

And because we know it’s an existing resource, and we want to determine the parent resource, I see

String path = …
int lastSlash = path.lastIndexOf("/");
String parentPath = path.substring(0,lastSlash);
Resource parentResource = resourceResolver.getResource(parentPath);

Which is hilarious, because

pathResource.getParent();

is much easier to use (and did you spot the off-by-one bug in the String operation example? And what happens if the path already ends with a slash?). But that still leaves the question why you need to get the parent resource at all. Maybe a

ProductCategory category = myProduct.getCategory();

is a more expressive way to describe the same. I would definitely prefer it.
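To make this a bit more tangible, here is a hedged sketch of such an abstraction as Sling Models; both classes and the content structure are invented purely for illustration:

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

// Invented classes for illustration: calling code now talks about products and
// categories instead of resources and paths.
@Model(adaptables = Resource.class)
public class Product {

    @Self
    private Resource resource;

    public String getName() {
        return resource.getValueMap().get("jcr:title", resource.getName());
    }

    public ProductCategory getCategory() {
        // in this invented content structure the category is simply the parent resource
        Resource parent = resource.getParent();
        return parent != null ? parent.adaptTo(ProductCategory.class) : null;
    }
}

@Model(adaptables = Resource.class)
class ProductCategory {

    @Self
    private Resource resource;

    public String getTitle() {
        return resource.getValueMap().get("jcr:title", resource.getName());
    }
}

A consumer then adapts the resource once (productResource.adaptTo(Product.class)) and works with products and categories from there on, without any path juggling.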

So CRXDE is your biggest enemy when designing your application. If you are a seasoned AEM developer, my recommendation to you: Don’t explain your application with CRXDE. Rather use proper abstractions. Don’t do CRXDE driven development!

If that topic sounds familiar to you: I gave a talk at the adaptTo() conference 2020 on this topic, and you can find the recording here. There I explain the problem in more detail, also including some better examples 🙂

by Jörg at February 08, 2021 01:39 PM

January 18, 2021

Things on a content management system - Jörg Hoh

Writing integration tests for AEM (part 4)

This a part of my ongoing series about writing integration tests with AEM.

In the last post I mentioned that the URL provided to our integration tests allows us to test our dispatcher rules as well, a kind of “unit testing” the dispatcher setup. That’s what we do now.

This is the German way of saying “Stop here if you don’t have the right user-agent^Wvehicle”
Photo by Julian Hochgesang on Unsplash

As a first step we need to create a new RequestValidationClient, because we need to customize the underlying HTTP client so it does not automatically follow HTTP redirects; otherwise it would be impossible for us to test redirects. And while we are at it, we want to customize the user-agent header as well, so it’s easier to spot the requests we do during the integration tests. The way to customize the underlying HTTP client is documented, but a bit clumsy. Besides that, this RequestValidationClient is no different from the SlingClient it’s derived from. Maybe we change that later.

The actual integration tests are in PublishRedirectsIT. Here I use this RequestValidationClient to perform unauthenticated requests (as end-users typically do) against the publish instance. To illustrate the testing of the client, there are 3 tests:

  • In the testInitialRedirectAndHomepage method it is validated that a request to “/” results in a permanent redirect to /us/en.html. Additionally it is made sure that /us/en.html is actually present and returns a 200.
  • A second test is hitting /system/console, which must never be exposed to the internet.
  • A third test ensures that the default GET servlet is properly secured, so that the infamous “infinity” selector for the JSON extension returns a 404.

With this approach it is possible to validate that the complete security checklist of the dispatcher is actually implemented and that all “invalid” URLs are properly blocked.
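For illustration, here is a hedged, trimmed-down sketch of the first test case (imports, rules and the surrounding class are omitted); it assumes that the RequestValidationClient (called anonymousPublish below, as in the actual test class) inherits the usual doGet() from SlingClient and does not follow redirects:

// Trimmed-down sketch only; the authoritative version is the PublishRedirectsIT in the
// linked repository.
@Test
public void testInitialRedirectAndHomepage() throws Exception {
    // "/" must answer with a permanent redirect to the homepage ...
    SlingHttpResponse response = anonymousPublish.doGet("/", 301);
    assertTrue(response.getFirstHeader("Location").getValue().endsWith("/us/en.html"));

    // ... and the homepage itself must be delivered
    anonymousPublish.doGet("/us/en.html", 200);
}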

Some remarks to the PublishRedirectIT implementation itself:

  • Also here the tests are a bit clumsier than they could be. First, because the recommended ways to perform an HTTP request always have an “expectedReturnCode” parameter, which is unfortunate because we want to perform this check ourselves. For that reason I built a small workaround to accept all status codes. The testing clients should offer that natively though.
  • And secondly, I encountered problems with the authentication on the publish. And that’s the reason why the creation of the anonymousPublish is how it is.

But anyway, that’s a neat approach to validate that your dispatcher setup is properly done. And of course you could also use the JsoupClient to test a page on publish as well.

Some remarks if you want to execute these tests in your system: I adjusted the configuration of the “dispatcher” module of the repository as well, so you can easily use it together with the dispatcher docker image (check out this fantastic documentation).

That’s it for today, happy testing!

by Jörg at January 18, 2021 03:03 PM

December 18, 2020

Things on a content management system - Jörg Hoh

Writing integration tests for AEM (part 3)

This a part of my ongoing series about writing integration tests with AEM.

In the last post on writing integration tests with AEM I quickly walked you through a simple test case for authoring instances, but I didn’t provide much context, what is going on exactly, and how it will be executed in Cloud Manager. That’s what I want to talk about today.

As we have seen, some relevant parameters for integration tests are provided are provided externally, most notable the URLs for the environment plus credentials.

In the pom.xml it looks like this:

Here you can see defaults, but you can simply override them by providing the exact values with the command line, as you already did in the previous post with overriding the URL of the authoring instance. The POM just introduces another indirection via properties which is technically not really necessary.

CloudManager works the same way: It invokes the maven-failsafe-plugin to execute the integration tests and overrides these default values with the correct data specific to that environment (including the admin credentials).

In detail, the urls are configured like this:

This means that your tests access the load-balanced author cluster and the load-balanced publish farm (including the dispatcher!).

This has 2 implications:

  • On your local installation you should also have a dispatcher configured in front of the publish instance, so you have an identical setup
  • You can use integration tests also to validate your publish dispatcher rules!

And armed with this knowledge I will show you in the next post how you can validate with integration tests that your domain setup is configured correctly.

by Jörg at December 18, 2020 06:27 PM

December 15, 2020

Things on a content management system - Jörg Hoh

Writing integration tests for AEM (part 2)

This a part of my ongoing series about writing integration tests with AEM.

In the last blog post I gave you a quick overview over the integration test framework you have at hand and what chances it gives you.

Now let’s get our hands dirty and create our first integration test. We will write a simple test which connects to the local author instance and tests that the wknd homepage is completely loaded and that all referenced files (images, javascript, css, …) are present.

This is where we start — just us and a lot of space to fill with good tests
Photo by Neven Krcmarek on Unsplash

Prerequisite is that you have the wknd package fully installed (cloning the wknd github repo, building it and installing the package from the “all” module should do the trick). There is no specific requirement on AEM itself, so AEM 6.4 or newer should suffice.

Basic structure

When you have started with the maven archetype for AEM, you should have an it.tests maven module, which contains all integration tests. Although they are tests, they are stored in src/main/java. That means that the whole test suite is created as a build artifact, and thus can easily be executed also outside of the maven build process.

Another special thing to remember: All test class names must end with “IT” (like “IntegrationTest”), otherwise they are ignored.

A custom client

(I have all that code ready on github, so you can just clone it and start playing.)

As a first step we will create a custom test client, which is able to parse a rendered page. As a basis I started with HtmlUnit, but that turned out to be a bit inflexible regarding multiple calls, so I switched over to jsoup for that.
That means our first piece of code is a JsoupClient. It extends the standard CQClient, and therefore we are able to use the “doRequest()” method to fetch the page content.
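The original post embeds that client from the repository; as a hedged sketch of the idea (constructor signatures follow the usual pattern of the Sling/CQ testing clients and should be double-checked against the library version you use, and doGet() stands in here for the doRequest() call of the real implementation), it could look like this:

import java.net.URI;

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.sling.testing.clients.ClientException;
import org.apache.sling.testing.clients.SlingClientConfig;
import org.apache.sling.testing.clients.SlingHttpResponse;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.adobe.cq.testing.client.CQClient;

// Hedged sketch -- the authoritative version lives in the github repo linked below.
public class JsoupClient extends CQClient {

    public JsoupClient(CloseableHttpClient http, SlingClientConfig config) throws ClientException {
        super(http, config);
    }

    public JsoupClient(URI serverUrl, String user, String password) throws ClientException {
        super(serverUrl, user, password);
    }

    /** Fetches a page and hands the markup over to jsoup for parsing. */
    public Document getPage(String path) throws ClientException {
        SlingHttpResponse response = doGet(path, 200);
        return Jsoup.parse(response.getContent());
    }
}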

That’s the basis, because from now on we just deal with jsoup-specific structures (Document, Node). Then we add the actual test class (AuthorHomepageValidationIT), which starts with some boilerplate code:

The basis for everything is the CQAuthorClassRule, and based on that we create a jsoupClient object, which itself uses an “AdminClient” (that means the admin user is used for the tests). And now we can easily start to create simple tests with this jsoupClient instance.
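As a hedged sketch (imports omitted; field and method names follow the pattern of the AEM archetype’s it.tests module and are assumptions here), that boilerplate boils down to something like:

// Sketch of the test class boilerplate; the repository linked below is authoritative.
@ClassRule
public static final CQAuthorClassRule cqBaseClassRule = new CQAuthorClassRule();

@Rule
public CQRule cqBaseRule = new CQRule(cqBaseClassRule.authorRule);

static JsoupClient jsoupClient;

@BeforeClass
public static void initClient() throws ClientException {
    // obtain the custom client as admin user from the class rule
    jsoupClient = cqBaseClassRule.authorRule.getAdminClient(JsoupClient.class);
}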

(Please check the files in the github repo to get the complete picture, I omitted here quite a bit for brevity.)

We are using the standard tooling for unit tests here to create an integration test, that means the @Test annotation plus the usual set of asserts. But we are doing integration tests, which means that we are just validating the operations which are executed by AEM. If you start to use a mocking framework here, you are wrong!

OK, how do I run this integration test?

Now, as we have written our integration test, we need to execute it. To do that, use your command line and execute this command in the it.tests module:

mvn clean install -Peaas-local -Dit.author.url=http://localhost:4502

(You need to specify the author url as parameter because my personal default of port 6602 for my local authoring instance might not work on your local instance. Check the pom.xml for all details, it is not that complicated.)

The output will look like this:

[INFO] --- maven-failsafe-plugin:2.21.0:integration-test (default-integration-test) @ de.joerghoh.aem.it.tests ---
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running integrationtests.it.tests.AuthorHomepageValidationIT
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using Basic Auth as default. Index lane detection: false
[main] INFO org.apache.sling.testing.junit.rules.instance.util.ConfigurationPool - Reading initial configurations from the system properties
[main] INFO org.apache.sling.testing.junit.rules.instance.util.ConfigurationPool - Found 1 instance configuration(s) from the system properties
[main] INFO org.apache.sling.testing.junit.rules.instance.ExistingInstanceStatement - InstanceConfiguration (URL: http://localhost:6602, runmode: author) found for test integrationtests.it.tests.AuthorHomepageValidationIT
[main] WARN com.adobe.cq.testing.client.CQClient - Cannot resolve path //fonts.googleapis.com/css?family=Source+Sans+Pro:400,600|Asar&display=swap: Illegal character in query at index 57: //fonts.googleapis.com/css?family=Source+Sans+Pro:400,600|Asar&display=swap
[main] INFO integrationtests.it.tests.AuthorHomepageValidationIT - skipping linked resource from another domain: https://wknd.site/content/wknd/language-masters/en.html
[main] INFO integrationtests.it.tests.AuthorHomepageValidationIT - validated 148 linked resources
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.787 s - in integrationtests.it.tests.AuthorHomepageValidationIT
[INFO] Running integrationtests.it.tests.GetPageIT
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using LoginToken Auth as default. Index lane detection: false
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using LoginToken Auth as default. Index lane detection: false
[main] INFO org.apache.sling.testing.junit.rules.instance.ExistingInstanceStatement - InstanceConfiguration (URL: http://localhost:6602, runmode: author) found for test integrationtests.it.tests.GetPageIT
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.002 s - in integrationtests.it.tests.GetPageIT
[INFO] Running integrationtests.it.tests.CreatePageIT
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using LoginToken Auth as default. Index lane detection: false
[main] INFO org.apache.sling.testing.junit.rules.instance.ExistingInstanceStatement - InstanceConfiguration (URL: http://localhost:6602, runmode: author) found for test integrationtests.it.tests.CreatePageIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.3 s - in integrationtests.it.tests.CreatePageIT
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 1

The relevant lines of the output show that my test reached out to my local AEM instance at port 6602 and validated 148 resources in total. If you want more details about what exactly was validated, add an info log message there.

Congratulations, you have just run your first integration test!

I leave it to you to provoke a failure of that integration test; all you have to do is have an image or a clientlib referenced on the wknd homepage (specified here) which does not return an HTTP status code 200. And of course this test is quite generic, as it does not mandate that a specific client library is there or even that the page footer is working. But as you have the power of jsoup at hand, it should not be too hard to write even more assertions to check these additional requirements.

In the next blog post I will elaborate a bit more around running integration tests and configuring them properly, before we start to explore the possibilities offered to us by the AEM testing clients.

(Update 2020-12-18: Changed the profile name to match CloudManager behavior)

by Jörg at December 15, 2020 08:30 PM

December 14, 2020

Things on a content management system - Jörg Hoh

Writing integration tests for AEM (part 1)

This a part of my ongoing series about writing integration tests with AEM.

Building tests is an integral part of software development, and it does not only include unit tests but also integration and frontend tests. With AEM as a Cloud Service integration tests are getting more and more important, as it allows you to run automated tests on “real” cloud service instances as part of the Cloudmanager Pipeline. See the documentation of CloudManager.

If you check the details, you will find that the overall structure for integration tests has been part of all projects which are created based on the AEM Project Archetype since at least version 11 (April 2017). So technically everyone has been able to implement integration tests based on that structure for a while, but I haven’t seen them receive proper attention. I also ignored these integration tests for most of the time…

A vintage implementation of a HTTP Client with 3 threads (symbol photo)
Photo by Pavan Trikutam on Unsplash

Recently I worked with my colleague Valentin Olteanu on creating a small integration test suite, and I was honestly surprised how easy it can be. And integration tests are now an official part of the Cloud Manager pipeline, and the first place where your code can be tested on a real CM instance.

So I want to give you a short overview of the capabilities of the Integration-Test framework for AEM. In the next blog post I will show a real-life usecase where such Integration tests can really help.

Ok, what are these integration tests and what can we do with these tests?

Integration tests run outside of AEM, as part of the deployment/test pipeline. They test the interaction of your custom application (which you have validated with your unit tests) with everything else, most prominently AEM itself. You can test the complete page rendering, you can test custom integrations, background processes and everything where you need the full AEM stack, and where mocks are not sufficient.

The test framework itself provides you proper abstractions to perform a lot of operations in a very convenient way. For example:

  • There is an AssetClient which allows you to upload assets into AEM
  • Functionality to create/delete/modify pages (as part of the CQClient)
  • Functionality to replicate content
  • and much more (see the whole list of clients)

And everything is wrapped in Java, so you don’t have to deal with the underlying HTTP requests. So this is an effective way to remote-control AEM from within Java code. But of course there’s also a raw preconfigured HTTP client (with hostnames, authentication etc. already set) which you can use to perform custom actions. And the surrounding testing framework is still the JUnit framework we are all used to.

But be aware: This integration test suite cannot directly access the JCR and Sling API, because it is running externally. If you want to create nodes or read their status, you have to rely on other means.

It is also not a Selenium test! If you want to do proper UI testing, please check the documentation on UI testing (still in beta, expect general availability soon). I plan to create a blog post about it.

A very simple integration test (basically just a validation of a page which has been created with a Page rule) can look like this (the full code):

    @Test
    public void testCreatePageAsAuthor() throws InterruptedException {
        // This shows that it exists for the author user
        userRule.getClient().pageExistsWithRetry(pageRule.getPath(), TIMEOUT);
    }

This integration test class itself comes with a bit of boilerplate code, mostly JUnit rules to set up the connection and prepare the environment (for example to create the page for which we test the existence).

And the best thing: You don’t need to take care of URLs and authentication, because these parameters are specified outside of your code and are normally provided via Maven properties. This keeps the code very portable, and gives you the chance to execute it both locally and as part of the Cloudmanager pipeline.

In the next blog post I want to demonstrate how easy it can be to validate that a page on the AEM author renders correctly.

by Jörg at December 14, 2020 08:45 PM

September 02, 2020

Things on a content management system - Jörg Hoh

Long running sessions and SegmentNotFoundExceptions

If you search this blog, you find one recurring theme over the years: The lifecycle of JCR sessions and Sling ResourceResolvers. That you should not keep them open for a long time. And that you definitely have to close them. But I never gave you an example what can happen if you don’t follow this recommendation. Until now.

These days I learned about an actual problem which can arise because of it. And the problem is called “SegmentNotFoundException”.

In the past a SegmentNotFoundException was a clear indication of a corrupt JCR repository. The recommendation was always either to fix it or to restore from backup. Both operations are tedious, require downtimes and possibly also mean a loss of data. That’s probably also the reason why this specific problem is often taken as a sign of such a repository corruption. So let’s look at it systematically.

The root cause

With AEM 6.4 the feature of “tail-compaction” was introduced, which is a version of the online compaction feature. It is less efficient but takes less time than the full compaction. By default in AEM the tail compaction runs daily and the online compaction once a week.

But from what I understood, this tail compaction has a problem with long-running sessions: it can happen that tar files which are still referenced get compacted and removed. That means that it’s not really an on-disk corruption which needs to be fixed, but rather that some “old sessions” (read about MVCC in the previous post) are referencing data which is not there (anymore).

An unclosed session – a symbol photo (by engin akyurt on Unsplash)

Validate the symptoms

The problem I describe in this post happens under some special circumstances, which you should check before you start hunting for long-running sessions:

  • You get SegmentNotFoundExceptions (always with the same segment ID).
  • A repository check doesn’t find any inconsistency.
  • If you restart the instance, the error is gone, but appears again after some time (mostly at least a day).
  • You are running AEM 6.4 or AEM 6.5 (SP doesn’t seem to matter).

In the case I observed, only a single workflow step was affected, but not all the time and only after some time, which made me believe that it was related to the compaction. But it was very hard to track down the error, because the workflow step itself was complex, but safe.

The solution

Fix every long-running session in your application (unless you are registering an ObservationListener in there, which takes care of the refreshes by design). Really all of them. Use the JMX web console plugin and check the list of registered session MBeans every day on a production instance. Count them. Look at the timestamps when the sessions were opened.
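The preferred pattern is to open a resolver or session only for the duration of the work and close it immediately afterwards. A minimal sketch; the subservice name "my-service" is purely illustrative and has to match a service user mapping in your project:

    import java.util.Collections;
    import java.util.Map;
    import org.apache.sling.api.resource.LoginException;
    import org.apache.sling.api.resource.ResourceResolver;
    import org.apache.sling.api.resource.ResourceResolverFactory;

    public class ShortLivedSessionExample {

        // "my-service" is a hypothetical subservice name; use your project's mapping
        private static final Map<String, Object> AUTH_INFO =
                Collections.singletonMap(ResourceResolverFactory.SUBSERVICE, "my-service");

        public void doWork(ResourceResolverFactory resolverFactory) throws LoginException {
            // try-with-resources closes the resolver (and its underlying JCR session) right away
            try (ResourceResolver resolver = resolverFactory.getServiceResourceResolver(AUTH_INFO)) {
                // ... read or write the repository here ...
            }
        }
    }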

In the case I observed, the long-running session was in a different area of the application, but was working on the same data (user profiles) as the failing workflow step. The two areas in the code were totally unrelated to each other, so this was the only way to track it down.

Final words

Some other notes, which I consider as important in this context:

  • When you encounter a SegmentNotFoundException, please always open a support ticket, just in case. If it’s a different issue than described here, it’s better if you have that ticket open already.
  • If you see exactly this issue, and changing your application code makes the problem go away, please also raise a support ticket. That bug should get fixed (even though long-running sessions have not been recommended for years).
  • As mentioned, when you encounter this issue, it’s not a persisted corruption. Restarting will cause the issue not to appear for some time, but that should only buy you time to identify and fix the long running sessions.
  • And AEM as a Cloud Service is not affected by this problem, because neither Online Compaction nor Tail Compaction is used. Instead the Golden Master is offline-compacted before cloning.

by Jörg at September 02, 2020 07:13 PM

August 24, 2020

Things on a content management system - Jörg Hoh

Long running sessions and clustering

In the last blog post I briefly talked about the basics what to consider when you are writing cluster-aware code. The essence is to be aware of your write activities, and make sure that the scheduled activities are running only on a single cluster node and not on many or all of them.

Today’s focus is on the behavior of JCR sessions with respect to clustering. From a conceptual point of view there is hardly a difference to a single-node cluster (or standalone instance), but the presence of more cluster nodes adds a new angle of potential problems to it.

When I talk about JCR, I am thinking of the Apache Oak implementation, which is implemented on top of the MVCC pattern. (The previous Jackrabbit implementation used a different approach, so this whole blog post does not apply there.) The basic principle of MVCC is that each session is clearly separated from any other session which is open in parallel. Also, any changes performed in a session are not visible to other sessions unless

  • the other session is invoking session.refresh() or
  • the other session is opened after the mentioned session is closed.

This behavior applies to all sessions of a JCR repository, no matter if they are opened on the same cluster node or not. The following diagram visualizes this:

Diagram showing how 2 sessions are performing changes to the repository without seeing the changes of the other as long as they don’t use session.refresh()

We have 2 sessions A1 and B1 which are initiated at the same time t0, and which perform changes independently of each other on the repository, so session B1 cannot see the changes performed with A1_1 (and vice versa). At time t1 session A1 is refreshed, and now it can see the changes B1_1 and B1_2. And afterwards B1 is refreshed as well, and can now see the changes A1_1 and A1_2 as well.

But if a session is not refreshed (or closed and a new session is used), it will never see the changes which happened on the repository after the session has been opened.
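In code, this visibility behavior looks roughly like the following sketch using the plain JCR API. The credentials and the /tmp path are purely illustrative, and the comments describe the MVCC model explained above:

    import javax.jcr.Repository;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    public class MvccVisibilityExample {

        public static void demo(Repository repository) throws RepositoryException {
            // two independent sessions, comparable to A1 and B1 in the diagram
            Session sessionA = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
            Session sessionB = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
            try {
                // change A1_1: written and saved in session A
                sessionA.getNode("/tmp").addNode("changeA1_1");
                sessionA.save();

                // session B still works on its own snapshot and does not see the new node yet
                sessionB.nodeExists("/tmp/changeA1_1"); // false (per the model above)

                // after a refresh (false = discard pending local changes) session B sees the new state
                sessionB.refresh(false);
                sessionB.nodeExists("/tmp/changeA1_1"); // true
            } finally {
                sessionA.logout();
                sessionB.logout();
            }
        }
    }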

As said before, these sessions do not need to run on 2 separate cluster nodes, you get the same behavior on a single cluster node as well. But I mentioned that multiple cluster nodes are a special problem here. Why is that the case?

The problem is OSGi services running in the background, which perform a certain job and write data to the JCR repository. In a single-node cluster this is not a problem, because all of these activities go through that single service; and if that service uses a long-running JCR session for it, that will never be a problem, because this service is responsible for all changes, and the service can read and write all the relevant data. In a cluster with more than one node, each cluster node might have that service running, and the invocations of the services might be random. And as in the diagram above, on cluster node A the data A1_1 is written, and on cluster node B the data point B1_1 is written. But they don’t see each other’s changes if they don’t refresh the session! And in most applications, which are written for single-node AEM instances, session.refresh() is barely used, because in such situations there’s simply no need for it, as this problem never occurred.

So when you are migrating your application to AEM as a Cloud Service, review your application and make sure that you find all long-running ResourceResolvers and JCR sessions. The best option is then to remove these long-running sessions and replace them with short-living ones, which are closed when the job is done. The second-best option is to introduce a session.refresh(), so the session sees any updates which happened to the repository in the meantime. (And btw: if you are registering an ObservationListener in that session, you don’t need a manual refresh, as this refresh is done by the ObservationListener anyway; what would it be for if not for reporting changes to the repository which happen after opening the session?)
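If you cannot remove a long-running resolver right away, refreshing it at the start of each unit of work is that second-best stop-gap. A small sketch; the resolver field and the content path are hypothetical:

    import org.apache.sling.api.resource.PersistenceException;
    import org.apache.sling.api.resource.Resource;
    import org.apache.sling.api.resource.ResourceResolver;

    public class LongRunningResolverStopGap {

        // a long-lived resolver held by a background service (the anti-pattern discussed above)
        private ResourceResolver resolver;

        public void runPeriodicWork() throws PersistenceException {
            // stop-gap: refresh first, so this run sees everything committed since the last run
            resolver.refresh();

            Resource target = resolver.getResource("/content/my-site/some-data"); // illustrative path
            // ... process and modify the resource ...
            resolver.commit();
        }
    }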

That’s all right now regarding cluster-aware coding. But I am sure that there is more to come 🙂

by Jörg at August 24, 2020 04:24 PM

August 14, 2020

Things on a content management system - Jörg Hoh

Cluster aware coding in AEM

With AEM as a Cloud Service quite a number of small things have changed; among others you also get real clustering support in the authoring environment. Which is nice, because it gives you downtime-less authoring during deployments.

But this cluster also comes with a few gotchas, and one of them is that your application code needs to be cluster-aware. But what does that mean? What consequences does it have and what code do you have to change if you have never paid attention to this aspect?

The most important aspect is to do “every change only once”. It doesn’t make sense that 2 cluster nodes are importing the same set of data. A special version of this aspect is “avoid concurrent writes to the same node”, which can happen when a scheduled job is kicked off at the same time on all nodes, and this job is trying to change something in the repository. In this case you not only have overhead, but very likely also a lot of exceptions.

And there is a similar aspect which you should pay attention to: connections to external systems. If you have a cluster running the same code and configs, it’s not always wanted that each cluster node reaches out to that external system. Maybe you need to update it with the latest content only once, because it triggers some expensive processing on their side, and you don’t want to have that triggered two or three times, probably pretty much at the same time.

I have mentioned 2 cases where a clustered application can behave differently than in a single-node environment; now let me show you how you can make your application cluster-aware.

Scheduled jobs

Scheduled jobs are a classic tool to execute certain jobs at a certain time. Of course we could use the Sling Scheduler directly, but to make the execution more robust, you should wrap it into a Scheduled Sling Job.

See the Sling Jobs website for the documentation and some examples (although the Javadocs are missing the ScheduleBuilder class, but here’s the code). And of course you should check out Kaushal Mall’s post with even more examples.

Jobs give you the guarantee that a job is going to be executed at least once.
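For illustration, scheduling such a recurring Sling Job via the JobManager API might look roughly like this. The topic name and the schedule are made up, and a JobConsumer registered for the same topic would then do the actual work:

    import org.apache.sling.event.jobs.JobManager;
    import org.apache.sling.event.jobs.ScheduledJobInfo;

    public class NightlyJobScheduler {

        // "my/example/topic" is only an illustrative topic name
        private static final String TOPIC = "my/example/topic";

        public ScheduledJobInfo scheduleNightly(JobManager jobManager) {
            // creates a scheduled job that fires once per day at 02:00;
            // the job handling takes care that each execution is processed on one node only
            return jobManager.createJob(TOPIC)
                    .schedule()
                    .daily(2, 0)
                    .add();
        }
    }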

Use the Sling Scheduler only for very frequent jobs (e.g. once every 5 minutes), where it doesn’t matter if one execution is skipped, e.g. because the instance was just restarting. To limit the execution of such a job to a single node, you can annotate the job runner with this annotation:

@Property (name="scheduler.runOn", value="SINGLE")

(see the docs)
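With current OSGi Declarative Services annotations (instead of the older Felix SCR @Property shown above), the equivalent could look roughly like this sketch; the cron expression is only an example:

    import org.osgi.service.component.annotations.Component;

    // runs every 5 minutes; "scheduler.runOn=SINGLE" restricts the execution to one cluster node
    @Component(
        service = Runnable.class,
        property = {
            "scheduler.expression=0 0/5 * * * ?",
            "scheduler.runOn=SINGLE"
        }
    )
    public class SingleNodeScheduledTask implements Runnable {

        @Override
        public void run() {
            // ... the actual work ...
        }
    }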

What about caches?

In-memory caches are often used to speed up operations. Most often they contain the results of previous operations which are then reused; cache elements are either actively purged or expire using a time-to-live.

Normally such caches are not affected by clustering. They might contain different items with potentially different values on the cluster nodes, but that must never be a problem. If that is a problem, you have to look for a different approach, e.g. persisting the data to the repository (if they are not already coming from there) or externalizing the cache (e.g. to a Redis or memcached instance).

Also, having a simpler application instead of the highest cache-hit ratio possible is often a good trade-off.

Ok, these were the topics I wanted to discuss here. But expect a blog post about one of my favorite topics: “Long running sessions and clustering”.

by Jörg at August 14, 2020 02:53 PM