Planet CQ

October 07, 2025

Things on a content management system - Jörg Hoh

Writing backwards compatible software

At last week’s adaptTo() conference I discussed the topic of AEM content migration with a few attendees, and why it’s a tough topic. I learned that the majority of these ad-hoc migrations are done because they are mandated by changes in the components themselves, and therefore migrations are required to adhere to the new expectations of the component. My remark “can’t you write your components in a way that they are backwards compatible?” was not that well received … it seems that this is a hard topic for many.

And yes, writing backwards compatible components is not easy, because it comes with a few prerequisites:

  • The awareness that you are making a change which breaks compatibility with existing software and content. While breakages in “software” can be detected easily, the often very loose contract between code and content is much harder to enforce. With some experience in that area you will develop a feeling for it, but especially less experienced folks can make such changes inadvertently, and you will detect the problem way too late.
  • You need a strategy for handling such a situation. While the AEM WCM Core Components introduced a versioning model which seems to work quite nicely, an existing codebase might not be prepared for this. It forces more structure and thought into how you design your codebase, especially when it comes to Sling Models and OSGI services, and where to place logic so you don’t duplicate it.
  • And even if you are prepared for this situation, it’s not free: you will end up with new versions of components which you need to maintain. Just breaking compatibility is much easier, because then you still have just one component.

So I totally get it if you don’t care about backwards compatibility at all, because in the end you are the only consumer of your code and you can control everything. You are not a product developer, for whom backwards compatibility needs to have a much higher priority.

But backwards compatibility gives you one massive benefit, which I consider quite important: it gives you the flexibility to perform a migration at a time which is a good fit. You don’t need to perform this migration before, in the midst of, or immediately after a deployment. You deploy the necessary code, and then migrate the content when it’s convenient. And if that migration date is pushed further out for whatever reason, it’s not a problem at all, because backwards compatibility allows you to decouple the technical aspect (the deployment) from the actual execution of the content migration. And for that you don’t need to re-scope deployments and schedules.

So maybe this is just me wearing the hat of a product developer, who is so focused on backwards compatibility. And maybe in the wild the cost of backwards compatibility is much higher than the flexibility it allows. I don’t know. Leave me a comment if you want to share your opinion.

by Jörg at October 07, 2025 01:38 PM

September 26, 2025

Things on a content management system - Jörg Hoh

Why I would deprecate InjectionStrategy.OPTIONAL for Sling Models

Sling Models offer a very convenient abstraction, as they allow data from the repository to be mapped into the fields of Java POJO classes. One feature I often see used is the optional InjectionStrategy. By default, if an injection does not work, the instantiation of the POJO fails. When InjectionStrategy.OPTIONAL is set in the model annotation (see the Sling docs), such a non-working injection will not fail the creation of the model; instead the field is left with the default value of the respective type, which is null for Strings and other complex types. And this setting is valid for the entire class, so if you want to write reliable code, you have to assume that every injected String property could be null.
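
A minimal sketch of this pattern, with made-up class and property names; the class-wide form of the setting is the defaultInjectionStrategy attribute of the @Model annotation:

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.DefaultInjectionStrategy;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.ValueMapValue;

@Model(adaptables = Resource.class, defaultInjectionStrategy = DefaultInjectionStrategy.OPTIONAL)
public class TeaserModel {

  // if the "title" property is missing on the resource, this field silently stays null
  @ValueMapValue
  private String title;

  public String getTitle() {
    return title; // every caller must be prepared for null
  }
}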

This comes with a few challenges, because you can no longer rely on values being non-null, but need to test each field to see whether a proper value has been provided. This is sometimes done, but in the majority of cases it is just assumed that the field is non-null.

I wonder why this is done at all, because normally you write your components in a way that the necessary properties are always available. And if you operate with defaults, there are several ways to guarantee that they are available as soon as the component is created and authored for the very first time. And while in a few cases a missing property must be dealt with for whatever reason, it is never justified to treat all property injections as optional. That would mean that this Sling Model is supposed to make sense of almost any resource it is adapted from, and that won’t work.

And if a property is really optional: some time back we added the option to write something like this (if you really can’t provide a default value, which would be the much better choice):

@ValueMapValue
Optional<String> textToDisplay;

With this you express the optionality of the value through the Java type system, and in that case it’s quite unlikely that you miss the validation.

But if it were just up to me, I would deprecate InjectionStrategy.OPTIONAL and ban it, because it’s one of the most frequent causes of NullPointerExceptions in AEM.

I know that using InjectionStrategy.OPTIONAL saves you from asking yourself “is this property always present?”, but that’s a very poor excuse, because with just a few more seconds of work you can make your Sling Model more robust by simply providing default values for every injected field (see the sketch after the following list). So please:

  • Avoid using optional injections when possible!
  • When it’s required use the Optional type to express it!
  • Don’t use InjectionStrategy.OPTIONAL!
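
If a sensible default exists, it can be provided directly at the injection point, so the field is never null. A minimal sketch, assuming the Sling Models @Default annotation and a made-up field name:

@ValueMapValue
@Default(values = "")
private String textToDisplay; // falls back to the empty string if the property is missing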

Using “optional” (in all cases) can also come with a performance impact when used with the generic @Inject annotation; for that read my earlier blog posts on the performance of Sling Models: Sling Model Performance.

by Jörg at September 26, 2025 10:39 AM

September 23, 2025

Things on a content management system - Jörg Hoh

How not to do content migrations

(Note: This post is not about getting content from environment A to B or from your AEM 6.5 to AEM CS.)

The requirements towards content and component structure evolve over time; the components which you started with initially might not be sufficient anymore. For that reason the components will evolve: they need new properties, or components need to be added/removed/merged, and that must be reflected in the content as well. This is something you could do manually, but it would take too much work and be too error-prone. Automation to the rescue.

I have already come across a few of those “automated content migrations”, and I have found a few patterns which don’t work. But before I start with them, let me briefly cover the one pattern which works very well.

The working approach

The only working approach is a workflow which is invoked on small-ish subtrees of your content. It silently skips over content which does not need to be migrated, and reports every situation which got migrated. It might even have a dry-run mode, which just reports everything it would change (a minimal sketch of such a workflow process follows the list below). This approach has a few advantages:

  • It will be invoked intentionally on author only, and only operates on a single, well-defined subtree of content. It logs all changes it does.
  • It does not automatically activate every change it has done, but requires activation as a dedicated second step. This allows you to validate the changes and only then activate them.
  • If it fails, it can be invoked repeatedly on the same content and continue from where it left off.
  • It’s a workflow, with the guarantees of a workflow. It cannot time out like a request can, but will complete eventually. You can either log the migration output or store it as dedicated content/node/binary data somewhere. You know when a subtree is migrated, and you can prove that it’s completed.
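
A minimal sketch of such a workflow process step, assuming the Granite workflow API, a workflow session that can be adapted to a ResourceResolver, and made-up property names and process label; the dry-run flag is read from the process arguments:

import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.adobe.granite.workflow.WorkflowException;
import com.adobe.granite.workflow.WorkflowSession;
import com.adobe.granite.workflow.exec.WorkItem;
import com.adobe.granite.workflow.exec.WorkflowProcess;
import com.adobe.granite.workflow.metadata.MetaDataMap;

@Component(service = WorkflowProcess.class, property = {"process.label=Content Migration (example)"})
public class ContentMigrationProcess implements WorkflowProcess {

  private static final Logger LOG = LoggerFactory.getLogger(ContentMigrationProcess.class);

  // made-up property names; adjust to the contract of your component
  private static final String OLD_PROPERTY = "oldTitle";
  private static final String NEW_PROPERTY = "newTitle";

  @Override
  public void execute(WorkItem item, WorkflowSession wfSession, MetaDataMap args) throws WorkflowException {
    boolean dryRun = "dry-run".equals(args.get("PROCESS_ARGS", ""));
    String payloadPath = item.getWorkflowData().getPayload().toString();
    ResourceResolver resolver = wfSession.adaptTo(ResourceResolver.class);
    Resource root = resolver.getResource(payloadPath);
    if (root == null) {
      throw new WorkflowException("Payload not found: " + payloadPath);
    }
    try {
      migrateSubtree(root, dryRun);
      if (!dryRun && resolver.hasChanges()) {
        resolver.commit(); // nothing is activated here; activation is a dedicated second step
      }
    } catch (PersistenceException e) {
      throw new WorkflowException("Migration failed for " + payloadPath, e);
    }
  }

  private void migrateSubtree(Resource resource, boolean dryRun) {
    ModifiableValueMap props = resource.adaptTo(ModifiableValueMap.class);
    // skip silently if there is nothing to migrate on this resource
    if (props != null && props.containsKey(OLD_PROPERTY) && !props.containsKey(NEW_PROPERTY)) {
      LOG.info("{}migrating {}", dryRun ? "[dry-run] " : "", resource.getPath());
      if (!dryRun) {
        props.put(NEW_PROPERTY, props.get(OLD_PROPERTY, String.class));
        props.remove(OLD_PROPERTY);
      }
    }
    for (Resource child : resource.getChildren()) {
      migrateSubtree(child, dryRun);
    }
  }
}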

Of course this is not something you can simply do on the side; it requires some planning in designing, coding and executing the content migration.

Now, let’s face the few things which don’t work.

Non-working approach 1: Changing content on the fly

I have seen page rendering code which tries to modify the content it is operating on: removing old properties, adding new properties either with default values or with other values.

This approach can work, but only if the user has write permissions on the content. As this migration happens the first time the rendering is initiated with write permissions (normally by a regular editor on the authoring system), it will fail in every other situation (e.g. on publish, if such content exists there as well). And you will have a non-cool mix of page rendering and content-fixup code in your components.

This is a very optimistic approach over which you don’t have any control, and for that reason you can probably never remove that fixup code, because you never know whether all content has already been changed.

Non-working approach 2: Let’s do it on startup

Admittedly, I have seen this only once. But it was a weird thing: a migration OSGI service was created which executed the content migration in its activate() method. And we came across it because this activate() delayed the entire startup so much that our automation ran into a timeout, because we don’t expect the startup of an AEM instance to take 30+ minutes.

This is also its biggest problem and what makes it unusable: you don’t have any control over this process, it can be problematic in the case of clustered repositories (as in AEM CS authoring), and even if the migration has already been completed, the check whether there’s something to do can take quite long.

But hey, when you have it already implemented as a service, it’s quite easy to migrate it to a workflow and then use the approach recommended above.


Let me know if you have found other working or non-working approaches for content migration; in my experience it’s always best to make this an explicit task which can be planned, managed and properly executed. Everything else can work sometimes, but definitely with a less predictable outcome.

by Jörg at September 23, 2025 07:51 AM

September 17, 2025

Things on a content management system - Jörg Hoh

SQL injection in AEM?

TL;DR: While SQL injection is less of a problem in AEM than in other web frameworks, it should not be ignored, because being able to read and extract content can pose a problem. For that reason review your code and your permission setup.

When you follow the topic of software security, you are probably well aware of the problem of “SQL injection“, a situation in which an attacker can control (parts of) a SQL command which is sent to a database. In one way or another, SQL injection has been part of the OWASP Top 10 issues for a really long time. And even if almost all application frameworks have built-in ways to mitigate it, it’s still a problem.

From a high-level perspective, AEM-based applications can also be affected by SQL injections. But due to its design, the impact is less grave:

  • JCR SQL/XPath or the QueryBuilder are just query languages, and they don’t support any kind of operations which create or modify content.
  • These queries always operate on top of a JCR session, which implements resource based access control on top of principals (users and groups).

These 2 constraints limit the impact of any type of “SQL injection” (in which an attacker can control parts of a query), because as an attacker you can only retrieve content which the principal you are impersonating has read access to. For that reason a properly designed and implemented permission setup will prevent sensitive data, which should not be accessible to that principal, from being extracted; and modifications are not possible at all.

Nevertheless, SQL injection is possible. I frequently see code in which parameters are passed with a request, not validated or checked, but instead passed unfiltered as a repository path into queries or other API calls. Of course this will cause exceptions or NPEs if that repository path is not accessible to the session associated with that request. But if that user session has read access to more data than it actually needs (or the code even uses a privileged session which has access to even more content), an attacker can still access and potentially extract content which the developers have not planned for.
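
A minimal sketch of such input validation, with a made-up servlet, resource type and parameter name; the point is simply to restrict a user-supplied path to a known content tree before it gets anywhere near a query:

import java.io.IOException;
import javax.servlet.Servlet;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

@Component(service = Servlet.class, property = {
    "sling.servlet.resourceTypes=myproject/components/search",
    "sling.servlet.methods=GET"})
public class SearchServlet extends SlingSafeMethodsServlet {

  // only allow search roots below a known content tree
  private static final String ALLOWED_ROOT = "/content/myproject/";

  @Override
  protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response) throws IOException {
    String root = request.getParameter("root");
    // validate the user-supplied path instead of passing it unfiltered into a query
    if (root == null || !root.startsWith(ALLOWED_ROOT) || root.contains("..")) {
      response.sendError(400, "invalid root parameter");
      return;
    }
    // build the query with the validated path only, e.g. a QueryBuilder "path" predicate set to root
  }
}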

So from a security point of view you should care about SQL injection also in AEM:

  • Review your code and make sure that it does proper input validation.
  • Review your permission setup. Especially check for the permissions of the anonymous user, as this user is used for the non-authenticated requests on publish.
  • Make sure that you use service users only for their intended purposes. On the other hand, the security gain of service users is very limited if code invoked and parameterized by an anonymous request executes a query with a service user on restricted content only accessible to this service user. In that case you could make that restricted content readable directly for the anonymous user, and it would not be any less secure.

by Jörg at September 17, 2025 05:14 PM

June 11, 2025

CQ5 Blog - Inside Solutions

Who Are the Best Magnolia CMS Partners in Switzerland?

Choosing the right Magnolia CMS partner is key to driving your digital transformation.

The ideal partner combines technical skill with business insight, from user experience and Java development to DevOps, consulting, and custom digital solutions.

In this article, we highlight top agencies in Switzerland that deliver end-to-end expertise — from software services to digital marketing, helping brands build powerful, future-ready platforms.

How to Choose the Right Magnolia CMS Partner in Switzerland

Selecting the right Magnolia CMS partner in Switzerland can make or break your digital transformation project. To ensure a smooth implementation and long-term success, here are four essential criteria to consider:

1. Proven Expertise in Magnolia CMS

Look for a partner with a solid track record in delivering Magnolia CMS solutions. Experience matters especially when it comes to complex integrations, custom development, and enterprise-grade deployments.

2. Industry-Specific Know-How

Every industry has its own digital challenges. Whether you’re in finance, insurance, public services, or retail, choose a Magnolia partner with experience in your vertical. This ensures a faster time-to-value and a solution tailored to your business needs.

3. Real Client Success Stories

Client testimonials and case studies offer valuable insights into how an agency performs in real-world projects. A reputable partner will showcase successful collaborations and long-term customer relationships.

4. Official Certifications and Recognized Partnerships

Ensure the agency is a certified Magnolia partner, ideally at the Gold or Platinum level.

These certifications guarantee that the team meets Magnolia’s quality standards, follows best practices, and receives ongoing training. Certified partners also get access to exclusive tools, updates, and support, helping them deliver more secure and future-proof implementations.

Our Selection of Top Magnolia CMS Partners in Switzerland

VASS – A Magnolia Platinum Partner

VASS is one of the leading Magnolia CMS partners in Switzerland, with a strong local presence and global reach. Headquartered in Liestal, VASS supports major Swiss enterprises like SBB, Manor, Helsana, BCV, and Pictet in building future-ready CMS platforms and seamless digital experiences.

As a Platinum Magnolia Partner since 2012, VASS brings deep expertise in enterprise-grade content management, integration, and marketing technology. With over 30 Magnolia certifications, we ensure high delivery standards and continuous alignment with Magnolia’s evolving ecosystem.

But VASS is more than just a local agency. With 4,900 experts across 26 countries in Europe, America, and Asia, VASS helps clients across industries such as banking, insurance, retail, public services, telecom, and media lead their digital future.

Magnolia is at the core of many of VASS’s large-scale, content-driven projects. A few examples:

  • Telefónica evolved its Emocion Portal with Magnolia as the foundation of its digital experience.
  • Rentokil reduced content update cycles from days to minutes thanks to Magnolia’s flexibility.
  • Prosegur connected 160,000+ users via a unified Magnolia-powered platform.

VASS is your trusted CMS partner in Switzerland and beyond with end-to-end Magnolia capabilities, from architecture to optimization.

Tinext

Tinext is an Italian-based digital agency with a strong presence in Ticino, Switzerland. As a long-time Magnolia Premium Partner, Tinext supports enterprise clients in designing and delivering tailored digital experiences. 

Their work focuses on helping organizations turn digital platforms into strategic assets through a combination of advisory, implementation, and operations services.


fastforward websolutions

fastforward websolutions is a Swiss-based boutique digital agency with teams in Bern, offering specialized support for Magnolia CMS projects. With over 25 years of experience, they provide front-end and back-end development, CRM and eCommerce integrations, and DevOps services tailored to client needs.

Known for their agility and personal approach, fastforward is a great fit for small to mid-sized projects or as a technical partner for agencies. While they offer deep expertise and flexibility, their size may limit their ability to scale for large enterprise-level implementations.

jls

jls is a Swiss digital agency and an independent subsidiary of Swisscom, with offices in Lucerne, Zurich, and Bern. Their 100-person team combines design, development, and digital marketing to deliver cross-channel experiences across web, mobile, and cloud platforms.

Known for crafting tailor-made web experiences, jls focuses on creating functional and emotional value for users. Their work spans custom applications, interaction design, and digital strategies, helping businesses stand out through memorable and effective digital solutions.

Arvato Systems

Arvato Systems, part of the Bertelsmann Group, is a global IT services provider with over 2,600 experts in 25+ locations. The company supports enterprise clients in their digital transformation, offering solutions across cloud computing, CRM, e-commerce, marketing, and application management.

With deep experience in sectors like retail, healthcare, utilities, and media, Arvato delivers robust digital platforms and long-term IT operations support. Their approach combines strong technical foundations with personalized, partnership-driven client relationships — making them a reliable choice for large-scale Magnolia CMS projects.

Magnolia CMS – Frequently Asked Questions

What is Magnolia CMS?

Magnolia is an enterprise-grade, Java-based content management system (CMS) that enables businesses to create, manage, and deliver digital content across multiple channels. Known for its flexibility, headless capabilities, and strong integration options, it’s designed to support scalable and secure digital experiences.

Is Magnolia a headless CMS?

Yes. Magnolia offers headless CMS features, allowing you to manage content independently from the front-end. This means you can publish to websites, mobile apps, and other platforms seamlessly—while still benefiting from Magnolia’s visual editing tools for marketers and authors.

What makes Magnolia different from other CMS platforms?

Magnolia stands out for its:

  • Intuitive authoring interface
  • Strong personalization capabilities
  • Modular architecture
  • Enterprise-grade integrations
  • Open-source flexibility with commercial support

It strikes a balance between usability for marketers and power for developers.

Who uses Magnolia CMS in Switzerland?

In Switzerland, Magnolia CMS is trusted by a wide range of leading brands, public institutions, and enterprise organizations. These include major players in finance, insurance, healthcare, energy, and retail, all relying on Magnolia for scalable, flexible digital experiences.

Some notable Swiss-based companies using Magnolia CMS include:

  • Migros – Switzerland’s largest retailer
  • TWINT – Leading mobile payment app
  • Baloise Group – Integrated Magnolia across 16 corporate sites
  • BLKB (Basellandschaftliche Kantonalbank) – Halved publishing times
  • Primeo Energie – Built faster and more personalized web experiences
  • Group Mutuel – Improved editorial efficiency
  • EFG International – 80% faster content publishing
  • City of Lausanne – Digital services for 300,000+ residents
  • Visana – Leading digital healthcare solutions
  • Emmi, Selecta, Belimo, Medela, and Kuhn Rikon – Streamlining multisite management and content operations

Magnolia helps Swiss organizations deliver high-performance, multichannel digital experiences, from public sector websites to retail platforms and corporate portals.

Why choose a certified Magnolia partner?

Certified partners bring:

  • Proven implementation experience
  • Up-to-date training and certifications
  • Access to Magnolia’s partner portal and support
  • Best practices for DevOps, consulting, and digital marketing

Working with a Platinum or Gold partner ensures a smoother, more effective deployment.

Would you like to start a new Magnolia project?

We are happy to help.


by Samuel Schmitt at June 11, 2025 02:14 PM

January 28, 2025

Things on a content management system - Jörg Hoh

AEM CS: Java 21 update

After a lengthy preparation period, the rollout of Java 21 for AEM as a Cloud Service will start this year. While the public documentation contains all relevant information (and I don’t want to reiterate it here), I want to make a few things clearer.

First, this is the update of the Java version used to run AEM as a Cloud Service. This version can be different from the Java version which is used to build the application. As Java versions are backwards compatible and can read binaries created by older versions, it is entirely possible to run the AEM CS instance with Java 21 but still build the application with Java 11. Of course this restricts you to the language features of Java 11 (for example, you cannot use records), but besides that there is no negative impact at all.

This scenario is fully supported; but at some point you need to update your build to a newer Java version, as freshly added APIs might use Java features which are not available in Java 11. As a personal recommendation I would suggest switching your build-time Java version to Java 21 as well.

This change of the runtime Java version should in most cases be totally invisible to you; at least as long as you don’t use or add 3rd-party libraries which need to support new Java versions explicitly. The most prominent such libraries in the AEM context are Groovy (often as part of the Groovy console) and the ASM library (a library which allows you to create and modify Java bytecode). If you deploy one of these into your AEM instance, make sure that you update them to a version which supports Java 21.

by Jörg at January 28, 2025 02:53 PM

January 18, 2025

Things on a content management system - Jörg Hoh

JCR queries with large result sets

TL;DR: If you expect large result sets, try to run that query asynchronously and not in a request; and definitely pay attention to the memory footprint.

JCR queries can be a tricky thing in AEM, especially when it comes to their performance. Over the years practices have emerged, with the most important of them being “always use an index”. You can find a comprehensive list of recommendations in the JCR Query cheat sheet for AEM.

There you can also find the recommendation to limit the size of the result set (it’s the last one in the list); while that can definitely help if you need just one or a handful of results, this recommendation is moot if you need to process all results of a query. And the situation can get even worse if you know that this result set can be large (thousands or even tens of thousands of results).

I have often seen this when content maintenance processes were executed in the context of requests; they took many minutes in an on-prem setup, but then failed on AEM CS because of the hard limit of 60 seconds for requests.

Large result sets come with their own complexities:

  1. Iterating through the entire result set requires ACL checking plus the proper conversion into JCR objects. That’s not for free.
  2. As the query engine puts a (configurable) read limit on a query, a result set can contain at most 100k nodes by default. This number is the best case, because any access to the repository to post-filter the results delivered by the Lucene index also counts towards that limit. If you cross that limit, reading the result set will terminate with an exception.
  3. The memory consumption: While JCR queries provide an iterator to read the result set, the QueryBuilder API reads the entire result set and returns it as a list (SearchResult.getHits()). If this API is used, the result set alone can consume a significant amount of heap.
  4. And finally: what does the application do with the result set? Does it perform an operation on each result individually and then no longer need that single result? Or does it read each result from the query, perform some calculations and store them again in a list/array for the next step of processing? Assuming that you have 100k QueryBuilder Hits and 100k custom objects (potentially even referencing the Hit objects), that can easily lead to memory consumption in the gigabytes.
  5. And all that could happen in parallel.

In my experience all of these properties of large result sets mandate that you run such a query asynchronously, as it’s quite possible that this query takes tens of seconds (or even minutes) to complete. Either run it as a Sling Job or use a custom Executor in the context of an OSGI service, but do not run it in the context of a request, as in AEM CS this request has a big chance to time out.
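
A minimal sketch of such an asynchronous execution as a Sling job consumer; the job topic, sub-service name, query and per-result processing are made up and only illustrate the streaming iteration:

import java.util.Iterator;
import java.util.Map;
import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.consumer.JobConsumer;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Component(service = JobConsumer.class, property = {JobConsumer.PROPERTY_TOPICS + "=myproject/maintenance/query"})
public class LargeQueryJobConsumer implements JobConsumer {

  private static final Logger LOG = LoggerFactory.getLogger(LargeQueryJobConsumer.class);

  @Reference
  private ResourceResolverFactory resolverFactory;

  @Override
  public JobResult process(Job job) {
    Map<String, Object> authInfo = Map.of(ResourceResolverFactory.SUBSERVICE, "query-service");
    try (ResourceResolver resolver = resolverFactory.getServiceResourceResolver(authInfo)) {
      // iterate over the result instead of collecting everything into a list,
      // so only one result needs to be held in memory at a time
      Iterator<Resource> results = resolver.findResources(
          "SELECT * FROM [cq:Page] AS p WHERE ISDESCENDANTNODE(p, '/content/myproject')", "JCR-SQL2");
      long count = 0;
      while (results.hasNext()) {
        Resource result = results.next();
        // per-result processing goes here; do not keep references to the results
        LOG.debug("processing {}", result.getPath());
        count++;
      }
      LOG.info("processed {} results", count);
      return JobResult.OK;
    } catch (LoginException e) {
      LOG.error("cannot obtain service resolver", e);
      return JobResult.CANCEL;
    }
  }
}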

by Jörg at January 18, 2025 12:26 PM

December 20, 2024

Things on a content management system - Jörg Hoh

This was 2024

Wow, another year has passed. Time for a recap.

My personal goal for 2024 on this blog was to post more often and more consistently, and I think I was successful at that. If I counted correctly, it was 20 posts in 2024. The consistency of the intervals could be better (a few just days apart, others multiple weeks), but unlike in some other years I never really felt that I was lagging way behind. So I am quite happy with it and will try to do the same in 2025.

This year I adopted 2 ideas from other blogs:

  • A blog post series which is planned as such. In January and February I posted 5 posts on Modeling Performance Tests (starting here). This approach worked quite well, mostly because I spent enough time writing them before I made the first post public. If I know upfront that a topic is large enough, I will continue with this type.
  • The “top N things …” type of post. I don’t particularly like this type of post, because very often they just scream for attention and clicks without adding much value. I used that approach 2 times (The new AEM CS feature in 2024 I love most and My top 3 reasons why page rendering is slow), and then mostly to share links to other pages. It can work that way, but it will never be my favorite type of blog post.

The most successful blog post of 2024: As I did not add any page analytics to this page (I would need a cookie banner then), I have only some basic statistics from WordPress. The top 3 requested pages besides the start page in 2024 were:

  1. CQ development patterns – Sling ResourceResolver and JCR Sessions (written in 2013)
  2. Do not use AEM as proxy for backend calls (of 2024)
  3. How to analyze “Authentication support missing” (of 2023)

Interesting that a 10-year-old article was requested most often. Also, WordPress showed me that LinkedIn was a significant source of traffic, so I should probably continue to announce blog posts there. (If you think I should also do announcements elsewhere, let me know.)

And just today I saw the latest video from Tad Reeves, where he mentioned my article on performance testing in AEM CS. Thank you Tad, I really appreciate your feedback and the recognition!

That’s it for 2024! I wish you all a relaxing break and a successful year 2025!

by Jörg at December 20, 2024 04:10 PM

December 11, 2024

Things on a content management system - Jörg Hoh

My top 3 reasons why page rendering is slow

In the past years I was engaged in many performance tuning activities, which mostly related to slow page rendering on AEM publish instances. Performance tuning on the authoring side is often different and definitely much harder :-/

And over time I identified 3 main types of issues which make page rendering slow. Slow page rendering can be hidden by caching, but at some point the page needs to be rendered, and it often makes a difference whether this process takes 800ms or 5 seconds. Okay, so let’s start.

Too many components

This is a pattern which I often see in older codebases. Pages are often assembled out of 100+ components, very often with deep nesting. The personal record I have seen was 400 components, nested in 10 levels. This normally causes problems in the authoring UI, because you need to be very careful to select the correct component and its parent or a child container.

The problem in the page rendering process is the overhead of each component. This overhead consists of the actual include logic and then all the component-level filters. While each inclusion and each component does not take much time, the large number of components causes the problem.

For that reason: please, please reduce the number of components on your page. Not only the backend rendering time, but also the frontend performance (less JavaScript and fewer CSS rules to evaluate) and the authoring experience will benefit from it.

Slow Sling models

I love Sling Models. But they can also hide a lot of performance problems (see my series about optimizing Sling Models) and thus can be a root cause of performance problems. In the context of page rendering and Sling Models backing HTL scripts, the problem is normally not the annotations (see this post), but rather the complex and time-consuming logic executed when the models are instantiated, most specifically the problem of executing the same logic multiple times (as described in my earlier post “Sling Model Performance (Part 4)“).

External network connections

This pattern means that during page rendering a synchronous call is made to a different system, and while this request is executed, the rendering thread on the AEM side is blocked. This turns into a problem if the backend is either slow or not available. Unfortunately this is the hardest case to fix, because removing it often requires a re-design of the application. Please see also my post “Do not use AEM as a proxy for backend calls”; it contains a few recommendations on how to avoid at least some of the worst aspects, for example using proper timeouts.
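
As a small illustration of the timeout recommendation, a minimal sketch using the JDK HttpClient (class name and URL handling are made up):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class BackendClient {

  // explicit connect timeout, so an unreachable backend fails fast
  private final HttpClient client = HttpClient.newBuilder()
      .connectTimeout(Duration.ofSeconds(1))
      .build();

  public String fetch(String url) throws Exception {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
        .timeout(Duration.ofSeconds(2)) // do not block the rendering thread longer than this
        .GET()
        .build();
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
  }
}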

by Jörg at December 11, 2024 05:46 AM

December 02, 2024

Things on a content management system - Jörg Hoh

Sling model performance (part 4)

I think it’s time for another chapter on the topic of Sling Model performance, just to document some interesting findings I recently made in the context of a customer project. If you haven’t read them yet, I recommend checking the first 3 parts of this series here:

In this blog post I want to show the impact of inheritance in combination with Sling Models.

Sling Models are simple Java POJOs, and for that reason all features of Java can be used safely. I have seen many projects where these POJOs inherit from a more or less sophisticated class hierarchy, which often reflects the component hierarchy. These parent classes also often consolidate generic functionality used in many or all Sling Models.

For example, many Sling Models need to know the site-root page, because from there they build links and the navigation, read global properties, etc. For that reason I have seen code like this in many parent classes:

public class AbstractModel {

  Page siteRoot;

  public void init() {
    siteRoot = getSiteRoot();
    // and many more initializations
  }
}

And then this is used like this by a Sling Model called ComponentModel:

public class ComponentModel extends AbstractModel {

  @PostConstruct
  public void init() {
    super.init();
  }
  ...
}

That’s all straightforward and good. But only until 10 other Sling Models also inherit from AbstractModel, and all of them also invoke the getSiteRoot() method, which in all cases returns a page object representing the same object in the repository. Feels redundant, and it is. And it’s especially redundant if a model invokes the init() method of its parent but does not really need all of the values calculated there.

While in this case the overhead is probably small, I have seen cases where the removal of such redundant code brought the rendering time down from 15 seconds to less than 1 second! That’s significant!

For this reason I want to make some recommendations on how you can speed up your Sling Models when you use inheritance.

  • If you want or need to use inheritance, make sure that the parent class has a small and fast init method and that it does not add too much overhead to each construction of a Sling Model.
  • I love Java lambdas in this case, because you can pass them around and only invoke them when you really need their value. That’s ideal for lazy evaluation (see the sketch after this list).
  • And if you need a calculated value more than once, store it for later reuse.
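
A minimal sketch of such lazy evaluation with a memoizing Supplier, assuming single-threaded model instantiation; in the parent class the field then becomes something like private final Lazy<Page> siteRoot = Lazy.of(this::getSiteRoot);, and each model only calls siteRoot.get() when it actually needs the value:

import java.util.function.Supplier;

public final class Lazy<T> implements Supplier<T> {

  private Supplier<T> source;
  private T value;

  private Lazy(Supplier<T> source) {
    this.source = source;
  }

  public static <T> Lazy<T> of(Supplier<T> source) {
    return new Lazy<>(source);
  }

  @Override
  public T get() {
    if (source != null) {
      value = source.get(); // computed only on first access
      source = null;        // drop the supplier so the cached value is reused afterwards
    }
    return value;
  }
}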

by Jörg at December 02, 2024 07:42 PM

November 26, 2024

Things on a content management system - Jörg Hoh

Monitoring Java heap

Every now and then I get the question: “What do you think if we alert at 90% heap usage of AEM?”. The answer is always longer, so I write it down here for easier linking.

TL;DR: Don’t alert on the amount of used heap, but only on garbage collection.

Java is a language which relies on garbage collection (GC). Unlike in other programming languages, memory is managed by the runtime. The operator assigns a certain amount of RAM to the Java process, and that’s it. A large fraction of this RAM goes into the heap, and the Java Virtual Machine (JVM) manages this heap entirely on its own.

Now, like every good runtime, the JVM is lazy and does work only when it’s required. That means it will start the garbage collection only when the amount of free memory is low. This is probably over-simplified, but good enough for the purpose of this article.

That means the heap usage metric shows usage approaching 100%, and then it suddenly drops to a much lower value, because the garbage collection process has just released memory which is no longer required. Then the garbage collection pauses, processing goes on and consumes memory, until at some point the garbage collection starts again. This leads to the typical saw-tooth pattern of the JVM.

(source: Interesting Garbage Collection Patterns by Ram Lakshamanan)

For that reason it’s not helpful to use the heap usage as an alerting metric: it fluctuates too much, and it will alert you when the actual memory usage is already down again.

But of course there are other situations, where the saw-tooth pattern gets less visible, as the garbage collection can release less memory with each run, and that can indeed point to a problem. How can this get measured?

In this scenario the garbage collection runs more frequently, and the less memory the garbage collection releases, the more often it runs, until the entire application is effectively stopped and only the garbage collection is running. That means you can use the share of time the garbage collector runs per time period as a metric: anything below 5% is good, and anything beyond 10% is a problem.
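
A minimal sketch of how this share could be sampled with the standard JMX beans (in practice your monitoring stack will already expose such a metric):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeSampler {

  // accumulated GC time (in ms) across all collectors since JVM start
  private static long totalGcMillis() {
    long sum = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      sum += Math.max(0, gc.getCollectionTime()); // getCollectionTime() returns -1 if unsupported
    }
    return sum;
  }

  public static void main(String[] args) throws InterruptedException {
    long gcBefore = totalGcMillis();
    long wallBefore = System.currentTimeMillis();
    Thread.sleep(60_000); // sample over one minute
    double gcShare = 100.0 * (totalGcMillis() - gcBefore) / (System.currentTimeMillis() - wallBefore);
    System.out.printf("GC time share: %.1f%%%n", gcShare); // below 5% is good, beyond 10% is a problem
  }
}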

For that reason, rather measure the garbage collection activity, as it is a better indicator of whether your heap is too small.

by Jörg at November 26, 2024 07:43 PM

November 17, 2024

Things on a content management system - Jörg Hoh

Delivering dynamic renditions

One of the early features of ACS AEM Commons was the Named Image Transformer, part of release 1.5 from 2014. This feature allows you to transform image assets dynamically with a number of options, most notably the transformation into different image dimensions to match the requirements of the frontend guidelines. The feature was quite popular, and in a stripped-down scope (it does not support all features) it also made it into the WCM Core Components (as the AdaptiveImageServlet).

This feature is nice, but it suffers from a huge problem: the transformation is done dynamically on request, and depending on the image asset it can consume a huge amount of heap memory. The situation gets worse when many such requests are processed in parallel, and I have seen more than once AEM publish instances ending up in heavy garbage collection, ultimately leading to crashes and/or service outages.

This problem is not really new, as pretty much the same issue also occurs at asset ingestion time, when the predefined renditions are created. On AEM 6.5 the standard solution was to externalize this problem for asset ingestion (hello Imagemagick!), while AEM CS solved this challenge in a different and more scalable way using AssetCompute. But neither solution addresses the problem of end-user requests for these dynamic renditions; this is and was still done on request in the heap.

We have implemented a number of improvements in the AdaptiveImageServlet to improve the situation:

  • A limit for requested dimensions was added to keep the memory consumption “reasonable”.
  • The original rendition is no longer necessarily used as the basis to render the image in the requested dimension; instead the closest rendition which can satisfy the requested parameters is used.
  • An already existing rendition is delivered directly if its dimensions and image format match the request.
  • An upcoming improvement for the AdaptiveImageServlet on AEM CS is to deliver these renditions directly from the blobstore instead of streaming the binary via the JVM.

This improves the situation already, but there are still customers and cases where images are resized dynamically. For these users I suggest making these changes:

  • Compile a list of all required image dimensions which you need in your frontend.
  • And then define matching processing profiles, so that whenever such a rendition is requested via the AdaptiveImageServlet it can be served directly from an existing rendition.

That works without changes to your codebase and will improve the delivery of such assets.

And for the users of the Named Image Transformer of ACS AEM Commons I suggest rethinking its usage. Do you really use all of its features?

by Jörg at November 17, 2024 04:48 PM

October 20, 2024

Things on a content management system - Jörg Hoh

Restoring deleted content

I just wrote about backup and restore in AEM CS, and why backups cannot serve as a replacement for an archival solution; instead, they are just designed as a precaution against major data loss and corruption.

But there is another aspect to that question: what about deleted content? Is requesting a restore the proper way to handle these cases?

Assume that you have accidentally deleted an entire subtree of pages in your AEM instance. From a functional point of view you can perform a restore to a time before this deletion of content. But that means a rollback of the entire content is made, so not only is the deleted content restored, but other changes performed since that time are undone as well.

And depending on the frequency of activities and the time you need for the restore, this can be a lot. And you would need to perform all these changes again to catch up.

The easiest way to handle such cases is to use the versioning features of AEM. Many activities trigger the creation of a version of a page, for example when you activate it, when you delete it via the UI; you can also manually trigger the creation of a version. To restore one page or even an entire subtree you can use the “Restore” and “Restore Tree” features of AEM (see the documentation).


In earlier versions of AEM, versions were not created for assets by default, but this has changed in AEM CS; now versions are created for assets pretty much as they are created for pages. That means you can use the same approach and restore versions of assets via the timeline (see the documentation).

With proper versioning in place, most if not all of such accidental deletions or changes can be handled; this is the preferred approach, because it can be executed by regular users and does not impact the rest of the system by rolling back really all changes. And you don’t have any downtime on authoring instances.

For that reason I recommend working as much as possible with these features. But there are situations where the impact is so severe that you would rather roll back everything than restore things through the UI. In that situation a restore is probably the better solution.

by Jörg at October 20, 2024 06:02 PM

October 09, 2024

Things on a content management system - Jörg Hoh

AEM CS Backup, Restores and Archival

One recurring question I see in the Adobe internal communication channels is like this: “For our customer X we need to know how long Adobe stores backups for our CS instances”.

The obvious answer to this is “7 days” (see the documentation) or “3 months” (for offsite backups), because the backup is designed only to handle cases of data corruption of the repository. But in most cases there is a follow-up question: “But we need access to backup data for up to 5 years”. Then it’s clear that this question is not about backup, but rather about content archival and compliance. And that’s a totally different question.

TL;DR

When you need to retain content for compliance reasons, my colleagues are happy to discuss the details with you. But increasing the retention period for your backups is not a solution for it.

Compliance

So what does “content archival and compliance” mean in this situation? For regulatory and legal reasons some industries are required to retain all public statements (including websites) for some time (normally 5-10 years). And of course the implementation of that is up to the company itself. And it seems quite easy to implement an approach which keeps the backups around for up to 10 years.

Some years back I spent some time at the drawing board to design a solution for an AEM on-prem customer; their requirement was to be able to prove what was displayed to customers on their website at any point within these 10 years.
We initially also thought about keeping backups around for 10 years, but then we came up with these questions:

  • When the content is required, a restore from that backup would be required to an environment which can host this AEM instance. Is such an environment (servers, virtual machines) available? How much of these environments would be required, assuming that this instance would be required to run for some months (throughout the entire legal process which requires content from that time)?
  • Assuming that an 8-year-old backup must be restored, are the old virtual machine images with Redhat Linux 7 (or whatever OS) still around? Is it okay from a compliance perspective to run these old and potentially unsupported OS versions even in a secured network environment? Is the documentation which describes how to install all of that still around? Does your backup system still support a restore to such an old OS version?
  • How would you authenticate against such an old AEM version? Would you require your users to have their old passwords at hand (if you authenticate against AEM), or does your central identity management still support the interface this old AEM version uses for authentication?
  • As this is a web page, is it ensured that all external references, which are embedded into the page are also available? Think about the Javascript and CSS libraries, which are often just pulled from their respective CDN servers.
  • How frequently must a backup be stored? Is it okay and possible to store just the authoring instance every quarter and do not perform any cleanup (version cleanup, workflow purge, …) in that time and have all content changes versioned, so you can use the restore functionality to go back to the requested time? Or do you need to store a backup after each deployment, because each deployment has the chance to change the UI and introduce backwards incompatible changes, which render the restored content not to work anymore? And would you need to archive the publish instance as well (where normally no versions are preserved)? And are you sure that you can trust the AEM version storage enough, so you can rely on JCR versioning to recreate any intermediary states between those retained backups?
  • When you design such a complex process, you should definitely test the restore process regularly.
  • And finally: What are the costs of such a backup approach? Can you use the normal backup storage, or do you need a special solution which guarantees that the stored data cannot be tampered with?

You can see that the list of questions is long. I don’t say it is impossible, but it requires a lot of work and attention to detail.

In my project the deal breaker was the calculated storage cost (we would have required dedicated storage, as the normal backup storage did not provide the required guarantees for archival purposes). So we decided to take a different approach and added a custom process which creates a PDF/A out of every activated page and stores it in a dedicated archival solution (assets are stored as-is). This adds upfront costs (a custom implementation), but it is much cheaper in the long run. And on top of that, it does not require IT to access the old version of the homepage of January 23, 2019; instead, business users or legal can directly access the archive and fetch the respective PDF for the time they are interested in.

In AEM CS the situation is a bit different, because the majority of the questions above deal with “old AEM vs. everything else around being current”, and many aspects are not relevant for customers anymore; they are in the domain of Adobe instead. But I am not aware that Adobe ever planned to set up such a time machine which allows recreating everything at a specific point in time (besides all the implications for security etc.), mostly because “everything” is a lot.

So, as a conclusion: using backups for content archival and compliance is not the best solution. It sounds easy at first, but it raises a lot of questions once you look into the details. The longer you need to retain these AEM backups, the more likely it becomes that inevitable changes in the surrounding environments make a proper restore harder or even impossible.

by Jörg at October 09, 2024 05:39 PM

October 05, 2024

Things on a content management system - Jörg Hoh

The new AEM CS feature in 2024 which I love most

Pretty much 4 years ago I joined the AEM as a Cloud Service engineering team, and since then I have been working on the platform level as a Site Reliability Engineer. I work on platform reliability and performance and help customers improve their applications in these aspects.

But that also means that many features which were released throughout the years are not that relevant for my work. But there are a few that matter a lot to me. They allow me to help customers in really good and elegant ways.

In 2024 there was one which I like very much, and that’s the Traffic Rules feature (next to custom error pages and CDN cache purging as self-service). I like it because it lets you filter and transform traffic at scale where it can be handled best: at the CDN layer.

Before that feature was available, all traffic handling needed to happen at the dispatcher level. The combination of Apache httpd and dispatcher rules allowed you to perform all these operations. However, I consider it a bit problematic, because at that point the traffic has already hit the dispatcher instances. It is already in your datacenter, on your servers.

To mitigate that, many customers (both on-prem/AMS and AEM CS) purchased a WAF solution to handle specifically these cases. But now with the traffic rules every AEM CS customer gets a new set of features which they can use to handle traffic at the CDN level.

The documentation is quite extensive and contains relevant examples, showcasing the ways you can block, rate-limit or transform traffic to your needs.

The most compelling reason I rate this as my top feature this year is really the traffic transformation feature.

A part of my daily job is to help customers prepare their AEM CS instances to handle their traffic spikes. Besides all the tuning on the backend, the biggest lever to improve this situation is to handle as many of these requests as possible at the CDN, because then they don’t hit the backend at all.

A constant problem in that situation are request parameters which are added by campaigns. You might know the “utm*”, “fbclid” or “gclid” query parameters, added when traffic comes to your site after a click on Facebook or Google. And there are many more. Analytics tools need these parameters to attribute traffic to the right source and to measure the effectiveness of campaigns, but from a traffic management point of view these parameters are horrible, because by default all CDNs and intermediate caches treat such requests with query strings as non-cacheable. And that means that all these requests hit your publish instances, and the CDN and dispatcher caches are mostly useless for them.

It’s possible to remove these request parameters on the dispatcher (using the /IgnoreUrlParams configuration). But with the traffic transformation feature of AEM CS you can remove them also directly on the CDN, so that this traffic is then served entirely from the CDN. That’s the best case situation, because then these requests never make it to origin, which improves latency for end users.

I am very happy about this feature, because with it the scaling calculation gets much easier, when such campaign traffic is handled almost entirely by the CDN. And that’s the whole idea behind using a CDN: To handle the traffic spikes.

For this reason I recommend every AEM CS customer to check out the traffic rules to filter and transform traffic at the CDN level. They are included in every AEM CS offering, and you don’t need the extra WAF feature to use them. Configure these rules to handle all your campaign traffic and increase the cache hit ratio. They are very powerful, and you can use them to make your application much more resilient.

by Jörg at October 05, 2024 05:08 PM

August 17, 2024

Things on a content management system - Jörg Hoh

Java interfaces, OSGI and package versions

TL;DR: Be cautious when implementing interfaces provided by libraries; you can get problems when these libraries are updated. Check for the @ProviderType and @ConsumerType annotations on the Java interfaces you are using to make sure that you don’t limit yourself to a specific version of a package, as sooner or later this will cause problems.

One of the principles of object-oriented programming is encapsulation, to hide any implementation details. Java uses interfaces as a language feature to implement this principle.

OSGI uses a similar approach to implement services. An OSGI service offers its public API via a Java interface. This Java interface is exported and therefore visible to your Java code. And then you can use it the way it is taught in every AEM (and modern OSGI) class, like this:

@Reference
UserNotificationService service;

With the magic of Declarative Services, a reference to an implementation of UserNotificationService is injected and you are ready to use it.

But since that interface is visible, and with the power of Java at hand, you can also provide an implementation of it on your own:

public class MyUserNotificationService implements UserNotificationService {
...
}

Yes, this is possible and nothing prevents you from doing it. But …

Unlike plain object-oriented programming, OSGI has some higher aspirations. It focuses on modular software: dedicated bundles which can have an independent lifecycle. You should be able to extend functionality in a bundle without all other code in other bundles needing to be recompiled. So binary compatibility is important.

Assume that the framework you are using comes with a UserNotificationService which looks like this:

package org.framework.user;
public interface UserNotificationService {
  void notifyUserViaPopup (User user, NotificationContent notification);
}

Now you decide to implement this interface in your own codebase (hey, it’s public and Java does not prevent you from doing it) and start using it in your codebase:

public class MyUserNotificationService implements UserNotificationService {
  @Override
  public void notifyUserViaPopup (User user, NotificationContent notification) {
    // ...
  }
}

All is working fine. But then the framework is adjusted and now the UserNotificationService looks like this:

package org.framework.user;
public interface UserNotificationService { // version 1.1
  void notifyUserViaPopup (User user, NotificationContent notification);
  void notifyUserViaEMail (User user, NotificationContent notification);
}

Now you have a problem, because MyUserNotificationService is no longer compatible with UserNotificationService (version 1.1), as MyUserNotificationService does not implement the method notifyUserViaEMail. Most likely you can’t load your class anymore, triggering interesting exceptions. You would need to adjust MyUserNotificationService and implement the missing method to make it run again, even if you never need the notifyUserViaEMail functionality.

So we have 2 problems with that approach:

  1. It will only be detected at runtime, which is too late.
  2. You should not be required to adapt your code to changes in someone else’s code, especially if it is just an extension of the API which you are not interested in at all.

OSGI has a solution for (1), but only some helpers for (2). Let’s first check the solution for (1).

Package versions and references

OSGI has the notion of a “package version”, and it’s best practice to provide version numbers for API packages. That means you start with a version “1.0” and people start to use it (via service references). When you make a compatible change (like in the example above, adding a new method to the service interface), you increase the package version by a minor version to 1.1, and all existing users can still reference this service, even if their code was never compiled against version 1.1 of the UserNotificationService. This is a backwards-compatible change. If you make a backwards-incompatible change (e.g. removing a method from the service interface), you have to increase the major version to 2.0.
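
A minimal sketch of how such a package version is typically declared in the package-info.java of the exported API package (assuming the OSGI versioning annotations are available):

// package-info.java of the exported API package
@org.osgi.annotation.versioning.Version("1.1.0")
package org.framework.user;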

When you build your code with the bnd-maven-plugin (or the maven-bundle-plugin), the plugin will automatically calculate the import range based on these versions and store that information in target/classes/META-INF/MANIFEST.MF. If you just reference services, the import range can be wide, like this:

org.framework.user;version="[1.0,2)"

which translates to: this bundle has a dependency on the package org.framework.user with a version equal to or higher than 1.0, but lower than (excluding) 2. That means that a bundle with this import statement will resolve against package org.framework.user 1.1. If your OSGI environment only exports org.framework.user in version 2.0, your bundle will not resolve.

(Much more could be written on this aspect, and I have simplified a lot here. But the above is the important part when you are working with AEM as a consumer of the APIs provided to you.)

Package versions and implementing interfaces

The situation gets tricky when you are implementing exported interfaces, because that will lock you to a specific version of the package. If you implement the MyUserNotificationService as listed above, the plugins will calculate the import range like this:

org.framework.user;version="[1.0,1.1)"

This basically locks you to that specific version 1.0 of the package. While it does not prevent changes to the implementations of the UserNotificationService in your framework libraries, it will prevent any change to its API. And not only for the UserNotificationService, but also for all other classes in the org.framework.user package.

But sometimes the framework requires you to implement interfaces, and these interfaces are "guaranteed" by its developers not to change. In that case the above behavior does not make sense, as a change to a different class in the same package would not break binary compatibility for these "you need to implement this interface" classes.

To handle this situation, OSGI introduced 2 Java annotations, which can be added to such interfaces and which clearly express the intent of the developers. They also influence the import range calculation.

  • The @ProviderType annotation: This annotation expresses that the developer does not want you to implement this interface. This interface is purely meant to reference existing functionality (most likely provided by the same bundle as the API); if you implement such an interface, the plugin will calculate a narrow import range.
  • The @ConsumerType annotation: This annotation shows the intention of the developer of the library that this interface can be implemented by other parties as well. Even if the library ships an implementation of that service on its own (so you can @Reference it), you are free to implement this interface on your own and register it as a service. If you implement such an interface with this annotation, the version import range will be wide (see the sketch below).
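
To make this concrete, here is a minimal sketch of how these annotations look in an API package; the NotificationListener interface is a hypothetical addition for illustration only, it is not part of the example above.

// UserNotificationService.java -- not meant to be implemented by API consumers
import org.osgi.annotation.versioning.ProviderType;

@ProviderType
public interface UserNotificationService {
  void notifyUserViaPopup(User user, NotificationContent notification);
}

// NotificationListener.java (hypothetical) -- explicitly meant to be implemented by other bundles
import org.osgi.annotation.versioning.ConsumerType;

@ConsumerType
public interface NotificationListener {
  void onNotificationSent(User user, NotificationContent notification);
}

With @ProviderType the calculated import range for an implementor narrows (e.g. [1.0,1.1)), with @ConsumerType it stays wide (e.g. [1.0,2)).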

In the end your goal should be to not have a narrow import version range for any library. You should allow your friendly framework developers (and AEM) to extend existing interfaces without breaking binary compatibility. And that also means that you should not implement interfaces you are not supposed to implement.

by Jörg at August 17, 2024 05:28 PM

July 31, 2024

Things on a content management system - Jörg Hoh

Do not use the Stage environment in your content creation process!

Every now and then (and definitely more often than I ever expected) I come across a question about best practices for promoting content from the AEM as a Cloud Service Stage environment to Production. The AEM CS standard process does not allow that, and on further inquiry it turns out that these customers

  • create and validate the production content on the Stage environment
  • and when ready, promote that content to the Production environment and publish it.

This approach contradicts the CQ5 and AEM good practices (valid since basically forever!), which say:

Production content is created only on the production environment. The Stage environment is used for code validation and performance testing.

These good practices are directly implemented in AEM CS, and for that reason it is not possible to promote content from Stage to the Production environment.

But there are other implications in AEM CS, when your content creation process takes place on the Stage environment:

  • If your Stage environment is an integral part of your content creation process, then your Stage environment must not have any lesser SLA than the Production environment. It actually is another production environment, which is not reflected in the SLAs of AEM CS.
  • If you use your Stage environment as part of the content creation process, which environment do you use for the final validation and performance testing? In the design of AEM CS this is the role of the Stage environment, because it is sized identically to Production.
  • In AEM CS the Production Fullstack pipeline covers both the Stage and PROD environments, but in a serial manner (first Stage and then PROD, often with an extended period of time for the approval step in between). That means that you can update your Stage environment, but not your Production environment, which could impact your content creation process.

For these reasons, do not spread your content creation process across 2 environments. If you have requirements which can only be satisfied with 2 dedicated and independent environments, please talk to Adobe product management early.

I am not saying that the product design is always 100% correct, or that you are wrong if you need 2 environments for content creation. But in most cases it was possible to fit the content creation process into the Production environment, especially with the addition of the preview publish tier. And if that's still not a fit for your case, talk to Adobe early on, so we can learn about your requirements.

by Jörg at July 31, 2024 10:39 AM

June 27, 2024

Things on a content management system - Jörg Hoh

Do not use AEM as a proxy for backend calls

Since I started working with AEM CS customers, I have come across the architecture pattern a few times that requests made to a site are passed all the way through to the AEM instance (bypassing all caches), and then AEM does an outbound request to a backend system (for example a PIM system or another API service, sometimes public, sometimes via VPN), collects the result and sends back the response.

This architectural pattern is problematic in a few ways:

  1. AEM handles requests with a threadpool, which has an upper limit of requests it will handle concurrently (by default 200). That means that at any time the number of such backend requests is limited by the size of this threadpool times the number of AEM instances. In AEM CS this number is variable (auto-scaling), but even in an auto-scaling world there is an upper limit.
  2. The most important factor in the number of such requests AEM can handle per second is the latency of the backend call. For example, if your backend system always responds in less than 100ms, your AEM instance can handle up to 2000 of such proxy requests per second. If the latency is more like 1 second, it's only up to 200 proxy requests per second. This can be enough, or it can be way too small.
  3. To achieve such a throughput consistently, you need to have aggressive timeouts; if you configure your timeouts at 2 seconds, your guaranteed throughput can only be up to 100 proxy requests/second (see the sketch after this list).
  4. And next to all those proxy requests your AEM instances also need to handle the other duties of AEM, most importantly rendering pages and delivering assets. That will reduce the number of threads you can utilize for such backend calls.
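
The numbers above follow directly from a simple back-of-the-envelope calculation (threads divided by latency); a minimal sketch, assuming the default threadpool size of 200 mentioned above:

public class ProxyThroughputEstimate {

  // Upper bound of proxy requests per second a single instance can sustain,
  // given a fixed threadpool and an average (or worst-case) backend latency.
  static double maxRequestsPerSecond(int threadPoolSize, double backendLatencySeconds) {
    return threadPoolSize / backendLatencySeconds;
  }

  public static void main(String[] args) {
    int threads = 200;                                       // default mentioned above
    System.out.println(maxRequestsPerSecond(threads, 0.1));  // 100 ms latency -> 2000 req/s
    System.out.println(maxRequestsPerSecond(threads, 1.0));  // 1 s latency    -> 200 req/s
    System.out.println(maxRequestsPerSecond(threads, 2.0));  // 2 s timeout    -> 100 req/s guaranteed
  }
}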

The most common issue I have seen with this pattern is that in case of backend performance problems the threadpools of all AEM instances are consumed within seconds, leading almost immediately to an outage of the AEM service. That means that a problem on the backend, or on the connection between AEM and the backend, takes down your page rendering abilities, leaving you with what is cached at the CDN level.

The common recommendation we make in these cases is quite obvious: introduce more aggressive timeouts. But the actual solution to this problem is a different one:

Do not use AEM as a proxy.

This is a perfect example of a case where the client (browser) itself can do the integration. Instead of proxying (=tunneling) all backend traffic through AEM, the client could approach the backend service directly. Because then the constraints AEM has (for example the number of concurrent requests) no longer apply to the calls to the backend. Instead the backend is exposed directly to the endusers, using whatever technology is suitable for that; typically it is exposed via an API gateway.

If the backend gets slow, AEM is not affected. If AEM has issues, the backend is not directly impacted because of it. AEM does not even need to know that there is a backend at all. Both systems are entirely decoupled.

As you see, I pretty much prefer this approach of "integration at the frontend layer" and exposing the backend to the endusers over any type of "AEM calls the backend systems", mostly because such architectures are less complex and easier to debug and analyze. And that should be your default and preferred approach whenever it is possible.

Disclaimer: Yes, there are cases where the application logic requires AEM to do backend calls; but in these cases it's questionable whether such calls need to be done synchronously within requests, meaning that an AEM request needs to do a backend call to consume its result. If these calls can be done asynchronously, then the whole problem vector I outlined above simply does not exist.

Note: In my opinion, hiding the hostnames of your backend system is also not a good reason for such a backend integration. Neither is "the service is just available from within our company network and AEM accesses it via VPN". In both cases you can achieve the same with a publicly accessible API gateway, which is specifically designed to handle such usecases and all of their security-relevant implications.

So, do not use AEM as a simple proxy!

by Jörg at June 27, 2024 06:57 PM

June 12, 2024

Things on a content management system - Jörg Hoh

My view on manual cache flushing

I read the following statement by Samuel Fawaz on LinkedIn regarding the recent announcement of the self-service feature to get the API key for CDN purge for AEM as a Cloud Service:

[…] 𝘚𝘰𝘮𝘦𝘵𝘪𝘮𝘦𝘴 𝘵𝘩𝘦 𝘊𝘋𝘕 𝘤𝘢𝘤𝘩𝘦 𝘪𝘴 𝘫𝘶𝘴𝘵 𝘮𝘦𝘴𝘴𝘦𝘥 𝘶𝘱 𝘢𝘯𝘥 𝘺𝘰𝘶 𝘸𝘢𝘯𝘵 𝘵𝘰 𝘤𝘭𝘦𝘢𝘯 𝘰𝘶𝘵 𝘦𝘷𝘦𝘳𝘺𝘵𝘩𝘪𝘯𝘨. 𝘕𝘰𝘸 𝘺𝘰𝘶 𝘤𝘢𝘯.

I fully agree that a self-service for this feature was overdue. But I keep wondering why an explicit cache flush (both for CDN and dispatcher) is necessary at all.

The caching rules are very simple: the rules for the AEM as a Cloud Service CDN are all based on the TTL (time-to-live) information sent from AEM or the dispatcher configuration. The caching rules for the dispatcher are equally simple and should be well understood (I find that this blog post on the TechRevel blog covers the topic of dispatcher cache flushing quite well).

In my opinion it should be doable to build a model which allows you to make assumptions about how long it takes for a page update to become visible to all users on the CDN. It also allows you to reason about more complex situations (especially when content is pulled from multiple pages/areas to render) and understand how and when content changes become visible to endusers.

But when I look at the customer requests coming in for cache flushes (CDN and dispatcher), I think that in most cases there is no clear understanding of what actually happened; most often it's just that on authoring the content is as expected and activated properly, but this change does not show up the same way on publish. The solution is often to request a cache flush (or trigger it yourself) and hope for the best. And very often this fixes the problem, and then the most up-to-date content is delivered.

But is there an understanding why the caches were not updated properly? Honestly, I doubt it in many cases. The same way the infamous "Windows restart" can fix annoying, suddenly appearing problems with your computer, flushing caches seems to be one of the first steps for fixing content problems. The issue goes away, we shrug and go on with our work.

But unlike the Windows case, the situation is different here, because you have the dispatcher configuration in your git repository. And you know the rules of caching. You have everything you need to understand the problem better and even prevent it from happening again.

Whenever the authoring users come to you with the request "content is not showing up, please flush the cache", you should consider this situation a bug, because the system is not working as expected. You should apply the workaround (do the flush), but afterwards invest time into the analysis and root-cause analysis (RCA) of why it happened. Understand and adjust the caching rules. Because very often these cases are well reproducible.

In his LinkedIn post Samuel writes "Sometimes the CDN cache is just messed up", and I think that is not true. It's not a random event you cannot influence at all. On the contrary: it's an event which is defined by your caching configuration. It's an event which you can control and prevent, you just need to understand how. And I think that this step of understanding and then fixing it is missing very often. And then the next request from your authoring users for a cache flush is inevitable, and another cache flush is executed.

In the end, flushing caches comes with the price of increased latency for endusers until the cache is populated again. And that's a situation we should avoid as best we can.

So as a conclusion:

  • An explicitly requested cache clear is a bug because it means that something is not working as expected.
  • And as every bug it should be understood and fixed, so you are no longer required to perform the workaround.

by Jörg at June 12, 2024 10:58 AM

May 31, 2024

Things on a content management system - Jörg Hoh

Adopting AEM as a Cloud Service: Shifting from Code-Centric Approaches

The first CQ5 version I worked with was CQ 5.2.0 in late 2009; and since then a lot has changed. I could list a lot of technical changes and details, but that's not the most interesting part. I want to propose this hypothesis as the most important change:

CQ5 was a framework which you had to customize to get value out of it. Starting with AEM 6.x more and more out-of-the-box features were added which can be used directly. In AEM as a Cloud Service most new features are directly usable, not requiring (or even allowing) customization.

And as a corollary: the older your code base, the more customizations you have, and the harder the adoption of new features becomes.

As an SRE in AEM as a Cloud Service I work with many customers who migrated their application over from an AEM 6.x version. While the "best practice analyzer" is a great help to get your application ported to AEM CS, it's just that: it helps you to migrate your customizations, the (sometimes) vast amount of overlays for the authoring UI, backend integrations, complex business and rendering logic, JSPs, et cetera. And very often this code is based on the AEM framework only and could technically still run on CQ 5.6.1, because it works with Nodes, Resources, Assets and Pages as the only building blocks.

While this was the most straight-forward way in the times of CQ5, it becomes more and more of a problem in later versions. With the introduction of Content Fragments, Experience Fragments, Core Components, Universal Editor, Edge Delivery Services and others, many new features were added which often do not fit into the self-grown application structures. These product features are promoted and demoed, and it's understandable that the business users want to use them. But the adoption of these new features would often require large refactorings, proper planning and a budget for it. Nothing you do in a single 2-week sprint.

But this situation also has an impact on the developers themselves. While customizations through code were the standard procedure in CQ5, there are often other ways available in AEM CS. But when I read through the AEM forums and new blog posts for AEM, I still see a large focus on coding: custom servlets, Sling models, filters, whatever. Often using the same old CQ5 style we had to use 10 years ago, because there was nothing else. That approach still works, but it will lead you into the customization hell again. Often it is also in violation of the practices recommended for AEM CS.

That means:

  • If you want to start an AEM CS project in 2024, please don’t follow the same old approach.
  • Make sure that you understand the new features introduced in the last 10 years, and how you can mix and match them to implement the requirements.
  • Opening the IDE and starting to code should be your last resort.

It also makes sense to talk with Adobe about the requirements you need to implement; I see that features requested by many customers are often prioritized and are implemented with customer involvement; a way which is much easier to do in AEM CS than before.

by Jörg at May 31, 2024 11:40 AM

April 24, 2024

Things on a content management system - Jörg Hoh

AEM CS & Mongo exceptions

If you are an avid log checker on your AEM CS environments you might have come across messages like this in your authoring logs:

02.04.2024 13:37:42:1234 INFO [cluster-ClusterId{value='6628de4fc6c9efa', description='MongoConnection for Oak DocumentMK'}-cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net:27017] org.mongodb.driver.cluster Exception in monitor thread while connecting to server cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net:27017 com.mongodb.MongoSocketException: cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net 
at com.mongodb.ServerAddress.getSocketAddresses(ServerAddress.java:211) [org.mongodb.mongo-java-driver:3.12.7]
at com.mongodb.internal.connection.SocketStream.initializeSocket(SocketStream.java:75) [org.mongodb.mongo-java-driver:3.12.7]
...
Caused by: java.net.UnknownHostException: cmp57428e1324330cluster-shard-00-02.2rgq1.mongodb.net

And you might wonder what is going on. I get this question every now and then, often with the assumption that this is something problematic, because we have all learned that stacktraces normally indicate problems. And at first sight this indicates a problem: a specific hostname cannot be resolved. Is there a DNS problem in AEM CS?

Actually this message does not indicate any problem. The reason behind it is the way MongoDB implements scaling operations. If you up- or downscale the Mongo cluster, this does not happen in-place; instead you actually get a new Mongo cluster of the new size and of course with the same content. And this new cluster comes with a new hostname.

So in this situation there was a scaling operation, and AEM CS connected to the new cluster and then loses the connection to the old cluster, because the old cluster is stopped and its DNS entry is removed. Which is of course expected. And for that reason you can also see that this is logged on level INFO, and not as an ERROR.

Unfortunately this log message is created by the Mongo driver itself, so it cannot be changed on the Oak level by removing the stacktrace and rewording the message. And for that reason you will continue to see it in the AEM CS logs, until an improved Mongo driver changes that.

by Jörg at April 24, 2024 10:52 AM

March 04, 2024

Things on a content management system - Jörg Hoh

Performance test modelling (part 5)

This is part 5 and the final post of the blog post series about performance test modelling; see part 1 for an overview and the links to all articles of this series.

In the previous post I discussed the impact of the system which we test, how the modelling of the test and the test content will influence the result of the performance test, and how you implement the most basic scenario of the performance tests.

In this blog post I want to discuss the predicted result of a performance test and its actual outcome, and what you can do when these do not match (actually they rarely do on the first execution). I also want to discuss the situation where after golive you find that a performance test delivered the expected results, but did not match the observed behavior in production.

The performance test does not match the expected results

In my experience every performance test, no matter how good or bad the basic definition is, contains at least 2 relevant data points:

  1. the number of concurrent users (we discussed that already in part 1)
  2. and an expected result, for example that the transaction must be completed within N seconds.

What if you don’t meet the performance criteria in point 2? This is typically the time when customers in AEM as a Cloud Service start to raise questions to Adobe, about number of pods, hardware details etc, as if the problem can only be the hardware sizing on the backend. If you don’t have a clear understanding about all the implications and details of your performance tests, this often seems to be the most natural thing to ask.

But if you have built a good model for your performance test, your first task should be to compare the assumptions with the results. Do you have your expected cache-hit ratio on the CDN? Were some assumptions in the model overly optimistic or pessimistic? As you have actual data to validate your assumptions you should do exactly that: go through your list of assumptions and check each one of them. Refine them. And when you have done that, modify the test and start another execution.

And at some point you might come to the conclusion that all assumptions are correct, you have the expected cache-hit ratio, but the latency of the cache misses is too high (in which case the required action is performance tuning of individual requests). Or that you have already reduced the cache MISSES (and cache PASSES) to the minimum possible and the backend is still not able to handle the load (in which case the expected outcome should be an upscale); or it can be both.

That’s fine, and then it’s perfect to talk to Adobe, and share your test model, execution plan and the results. I wrote in part 1:

As you can imagine, if I am given just a few diagrams with test results and test statistics as preparation for this call with the customer … this is not enough, and very often more documentation about the test is not available. Which often leads to a lot of discussions about some very basic things and that adds even more delay to an already late project and/or bad customer experience.

But in this situation, when you have a good test model and have done your homework already, it's possible to directly have a meaningful discussion without the need to uncover all the hidden assumptions. Also, if you have that model at hand, I assume that performance tests are not an afterthought, and that there are still reasonable options to make some changes, which will either completely fix the situation or at least remediate the worst symptoms, without impacting the go-live and the go-live date too much.

So while this is definitely not the outcome we all work, design, build and ultimately hope for, it’s still much better than the 2nd option below.

I hope that I don’t need to talk about unrealistic expectations in your performance tests, for example delivering a p99,9 with 200 ms latency, while at the same time requiring a good number of requests always be handled by the AEM backend. You should have detected these unrealistic assumptions much earlier, mostly during design and then in the first runs during the evolution phase of your test.

Scenario 2: After go-live the performance is not what it’s supposed to be

In this scenario a performance test was either not done at all (don’t blame me for it!) or the test passed, but the results of the performance tests did not match the observed reality. This often shows up as outages in production or unbearable performance for users. This is the worst case scenario, because everyone assumed the contrary as the performance test results were green. Neither the business nor the developer team are prepared for it, and there is no time for any mitigation. This normally leads to an escalated situation with conference calls, involvement from Adobe, and in general a lot of stress for all parties.

The entire focus is on mitigation, and we (I am speaking now as a member of the Adobe team, who is often involved in such situations) will try to do everything to mitigate that situation by implementing workarounds. As in many cases the most visible bottleneck is on the backend side, upscaling the backend is indeed the first task. And often this helps to buy you some time to perform other changes. But there are even cases where an upscale of 1000% would be required to somehow mitigate the situation (which is possible, but also very short-lived, as every traffic spike on top will require an additional 500% …); also it's impossible to speed up the latency of a single-threaded request of 20 seconds by adding more CPU. These cases are not easy to solve, the workaround often takes quite some time and is often very tailored; and there are cases where a workaround is not even possible. In any case it's normally not a nice experience for any of the involved parties.

I refer to all of these actions as "workarounds". In bold. Because they are not the solution to the challenge of performance problems. They cannot be a solution, because this situation proves that the performance test was testing some scenarios, but not the scenario which shows up in the production environment. It also raises valid concerns about the reliability of other aspects of the performance tests, and especially about the underlying assumptions. Anyway, we are all trying to do our best to get the system back on track.

As soon as the workarounds are in place and the situation is somehow mitigated, 2 types of questions will come up:

  1. What does a long-term solution look like?
  2. Why did that happen? What was wrong with the performance test and the test results?

While the response to (1) is very specific (and definitely out of scope of this blog post), the response to (2) is interesting. If you have a well documented performance test model you can compare its assumptions with the situation in which the production performance problem happened. You have the chance to spot the incorrect or missing assumption, adjust your model and then the performance test itself. And with that you should be able to reproduce your production issue in a performance test!

And if you have a failing performance test, it's much easier to fix the system and your application, and apply specific changes which fix this failed test. And it gives you much more confidence that you changed the right things to make the production environment handle the same situation in a much better way. Interestingly, this also gives to a large extent the response to question (1).

If you don’t have such a model in this situation, you are bad off. Because then you either start building the performance test model and the performance test from scratch (takes quite some time), or you switch to the “let’s test our improvements in production” mode. Most often the production testing approach is used (along with some basic testing on stage to avoid making the situation worse), but even that takes time and a high number of production deployments. While you can say it’s agile, other might say it’s chaos and hoping for the best… the actual opposite of good engineering practice.

Summary

In summary, when you have a performance test model, you are likely to have fewer problems when your system goes live. Mostly because you have invested time and thought into that topic, and because you acted on it. It will not prevent you from making mistakes, forgetting relevant aspects and such, but if that happens you have a good basis to quickly understand the problem and also a good foundation to solve it.

I hope that you learned in these posts some aspects about performance tests which will help you to improve your test approach and test design, so you ultimately have fewer unexpected problems with performance. And if you have fewer problems with that, my life in the AEM CS engineering team is much easier 🙂

Thanks for staying with me throughout this first planned series of blog posts. It's a bit experimental, although the required structure of this topic led to some interesting additions to the overall structure (the first outline just covered 3 posts, now we are at 5). But even that is not enough; I think that some aspects deserve a blog post of their own.

by Jörg at March 04, 2024 09:22 PM

February 26, 2024

Things on a content management system - Jörg Hoh

Performance test modelling (part 4)

This is the 4th post of the blog post series about performance test modelling; see part 1 for an overview and the links to all articles of this series.

In parts 2 and 3 I outlined the relevant aspects when it comes to modelling your performance tests:

  • The modelling of the expected load, often as expressed as “concurrent users”.
  • The realistic modelling of the system where we want to conduct the performance tests, mostly regarding the relevant content and data.

In this blog post I want to show how you deduce from that data which specific scenarios you should cover with performance tests. Because there is no single test which tells you whether the resulting end-user performance is good or not.

The basic performance test scenario

Let’s start with a very simple model, where we assume that the traffic rate is quite identical for the whole day; and therefor the performance test resembles that model:

At first sight this is quite simple to model, because your performance test will execute requests at a constant rate for the whole period of time.

But as I outlined in part 3, even if it seems that simple, you have to include at least some background noise. You also have to take into account that the cache-hit ratio is poor at the beginning, so you have to implement a cache-warmup phase (normally implemented as a ramp-up phase, in which the load is increased up to the planned plateau) and only start to measure there.

So our revised plan rather looks like this:

Such a test execution (with the proper modelling of users, requests and requested data) can give you pretty good results if your model assumes a pretty constant load.

What if your model requires a much more fluctuating request rate (for example if your users/visitors are primarily located in North America, and during the night you have almost no traffic, but it starts to increase heavily in the American morning hours)? In that case you probably model the warmup in a way that it resembles the morning increase in traffic, both in frequency and rate. That shouldn't be that hard, but it requires a bit more explicit modelling than just a simple ramp-up.

To give you some practical hints towards some basic parameters:

  • Such a performance test should run at least 2-3 hours, and even if you see that the results are not what you expect, not terminating it can reveal interesting results.
  • The warmup phase should cover at least 30 minutes; not only to give the caches time to warm up, but also to give the backend systems time to scale to their "production sizing"; when you don't execute performance tests all the time, the system might scale down, because there is no sense in having many systems idling when there is no load (see the ramp-up sketch after this list).
  • It can make sense to not start with 100% of the targeted load, but with smaller numbers, and increase from there. Because only then can you see the bottleneck which your test hits first. If you start already with 100% you might just see a lot of blockings, but you don't know which one is the most impeding one.
  • When you are implementing a performance test in the context of AEM as a Cloud Service, I recommend to also use my checklist for Performance testing on AEM CS, which gives some more practical hints on how to get your tests right; although a few aspects covered there are covered in more depth in this post series as well.
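
As a minimal, tool-agnostic sketch of such a linear ramp-up towards a plateau (the numbers are purely illustrative and not a recommendation):

import java.time.Duration;

public class RampUpProfile {

  static final double PLATEAU_REQUESTS_PER_SECOND = 500.0;   // illustrative target load
  static final Duration WARMUP = Duration.ofMinutes(30);     // warm-up / ramp-up phase

  // Linear ramp-up: the request rate to apply at a given point in the test.
  static double requestsPerSecondAt(Duration elapsed) {
    if (elapsed.compareTo(WARMUP) >= 0) {
      return PLATEAU_REQUESTS_PER_SECOND;                    // plateau reached, start measuring here
    }
    double fraction = (double) elapsed.toSeconds() / WARMUP.toSeconds();
    return fraction * PLATEAU_REQUESTS_PER_SECOND;
  }

  public static void main(String[] args) {
    for (int minute = 0; minute <= 40; minute += 10) {
      System.out.printf("minute %02d -> %.0f req/s%n", minute, requestsPerSecondAt(Duration.ofMinutes(minute)));
    }
  }
}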

When you have such a test passing, the biggest part of the work is done; and based on your models you can execute a number of different tests to answer more questions.

Variations of the basic performance test

The above model just covers a totally average day. But of course it's possible to vary the created scenario to respond to some more questions:

  • What happens if the load of the day is not 100%, but for some reasons 120%, with identical assumptions about user behavior and traffic distribution? That’s quite simple, because you just increase a number in the performance test.
  • The basic performance test runs just for a few hours and then stops. It gives you the confidence that the system can operate at least these many hours, but a few issues might go unnoticed. For example memory leaks accumulating over time might only become visible after many hours of load. For that reason it makes sense to run your test for 24-48 hours continuously to validate that there is no degradation over that time.
  • What’s the behavior when the system goes into overload? An interesting question (but only if it does not break already when hitting the anticipated load) which is normally answered by a break test; then you increase the load more and more, until the situation really gets out of hand. If you have enough time, that’s indeed something you can try, but let’s hope that’s not very relevant 🙂
  • How does the system behave when your backend systems are not available? What if they come online again?

And probably many more interesting scenarios which you can think of. But you should only perform these when you have the basic test right.

When you have your performance tests passing, the question is still: How does it compare to production load? Are we actually testing the right things?

In part 5 (the last post of this series) I cover the options you have when the performance test does not match the expected results, and also the worst case scenario: What happens if you find out after golive that your performance tests were good, but the production environment behaves very differently?

by Jörg at February 26, 2024 01:53 PM

February 20, 2024

Things on a content management system - Jörg Hoh

CDN and dispatcher – 2 complementary caching layers

I sometimes hear the question of how to implement cache invalidation for the CDN. Or the question why AEM CS still operates with a dispatcher layer when it now has a much more powerful CDN in front of it.

The questions are very different, but the answer is the same in both cases: the CDN is no replacement for the dispatcher, and the dispatcher does not replace the CDN. They serve different purposes, and the combination of the two can be a really good package. Let me explain this.

The dispatcher is a very traditional cache. It fronts the AEM systems, and the cache state is actively maintained by cache invalidation, so it always delivers current data. But from an end-user perspective this cache is often far away in terms of network latency. If my AEM systems are hosted in Europe and end-users from Australia are reaching them, the latency can get huge.

The CDN is the contrary: it serves the content from many locations across the world, being as close to the end-user as possible. But CDN cache invalidation is cumbersome, and for that reason most often TTL-based expiration is used. That means you have to accept that there is a chance that new content is already available, but the CDN still delivers old content.

Not everyone is happy with that; and if that's a real concern, short TTLs (in the range of a few minutes) are the norm. That means that many files on the CDN will get stale every few minutes, which results in cache misses; and a cache miss on the CDN goes back to origin. But of course the reality is that not many pages change every 10 minutes; actually very few do. But customers want to have that low TTL just in case a page was changed, and that change needs to become visible to all endusers as soon as possible.
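
As a side note, such a TTL is usually expressed as a Cache-Control header, sent by AEM or set in the dispatcher configuration. A minimal, purely illustrative sketch of a Sling request filter setting a short TTL (the filter itself and the 5-minute value are assumptions for illustration, not something prescribed here):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import org.osgi.service.component.annotations.Component;

// Sets a short TTL on every response passing through this filter,
// so shared caches (the CDN) expire the content after 5 minutes.
@Component(service = Filter.class, property = { "sling.filter.scope=request" })
public class ShortTtlFilter implements Filter {

  @Override
  public void init(FilterConfig filterConfig) {
    // nothing to initialize
  }

  @Override
  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {
    if (response instanceof HttpServletResponse) {
      ((HttpServletResponse) response).setHeader("Cache-Control", "max-age=300");
    }
    chain.doFilter(request, response);
  }

  @Override
  public void destroy() {
    // nothing to clean up
  }
}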

So you have a lot of cache misses on the CDN, which trigger a re-fetch of the file from origin, and because many of the files have not changed, you re-fetch the exact same binary which got stale seconds ago. Actually a waste of resources, because your origin system delivers the same content over and over again to the CDN as a consequence of these misses. So you could keep your AEM instances busy all the time, re-rendering the same requests over and over, always creating the same response.

Introducing the dispatcher cache, fronting the actual AEM instance. If the file has not changed, the dispatcher will deliver the same file (or just an HTTP 304 Not Modified, which even avoids sending the content again). And it's fast, much faster than letting AEM render the same content again. And if the file has actually changed, it's rendered once and then reused for all the future CDN cache misses.

The combination of these 2 types of caching approaches helps you to deliver content from the edge, while at the same time having a reasonable latency for content updates (that means the time between replicating a change to the publish instances until all users across the world can see it), without the need for a huge number of AEM instances in the background.

So as a conclusion: using the CDN and the dispatcher cache together is a good combination, if set up properly.

by Jörg at February 20, 2024 05:22 PM

February 09, 2024

Things on a content management system - Jörg Hoh

Performance tests modelling (part 3)

This is post 3 in my series about Performance Test Modelling. See the first post for an overview of this topic.

In the previous 2 posts I discussed the importance of having a clearly defined model of the performance tests, and that a good definition of the load factors (typically measured by "concurrent users") is required to build a realistic test.

In this post I cover the influence of the test system and test data on the performance test and its result, and why you should spend effort to create a test with a realistic set of data/content. We will do a few thought experiments, and to judge the results of each experiment we will use the cache-hit ratio of a CDN as a proxy metric.

Let’s design a performance test for a very simple site: It just consists of 1 page, 5 images and 1 CSS and 1 JS file; 8 files in total. Plus there is a CDN for it. So let’s assume that we have to test with 100, 500 and 1000 concurrent users. What’s the test result you expect?

Well, easy. You will get the same test result for all tests, irrespective of the level of concurrency; mostly because after the first requests all files will be delivered from the CDN. That means, no matter with what concurrency we test, the files are delivered from the CDN, which we assume will always deliver very fast. We do not test our system, but rather the CDN, because the cache-hit ratio is quite close to 100%.

So what’s the reason why we do this test at all, knowing that the tests just validate the performance promises of the CDN vendor? There is no reason for it. The only reason why we would ever execute such a test is that on test design we did not pay attention to the data which we use to test. And someone decided that these 7 files are enough for satisfy the constraints of the performance test. But the results do not tell us anything about the performance of the site, which in production will consists of tens of thousands of distinct files.

So let’s us do a second thought experiment, this time we test with 100’000 files, 100 concurrent users requesting these files randomly, and a CDN which is configured to cache files for 8 hours (TTL=8h). With regard to to chache-hit-ratio, what is the expectation?

We expect that the cache-hit ratio starts low for quite some time, this is the cache-warming phase. Then it starts to increase, but it will never hit 100%, as after some time cache entries will expire in the cache and start to produce cache misses. This is a much better model of reality, but it still has a major flaw: in reality requests are not randomly distributed, normally there are hotspots.

A hotspot consists of files which are requested much more often than average. Normally these are homepages or other landing pages, plus other pages which users are typically directed to. This set of files is normally quite small compared to the total amount of files (in the range of 1-2%), but they make up 40-60% of the overall requests, and you can easily assume a Pareto distribution (the famous 80/20 rule): 20% of the files are responsible for 80% of the requests. That means we have a hotspot and a long-tail distribution of the requests.

If we modify the same performance test to take that distribution into account, we end up with a higher cache-hit ratio, because now the hotspot can be delivered mostly from the CDN. On the long tail we will have more cache misses, because those files are requested so rarely that they can expire on the CDN without being requested again. But in total the cache-hit ratio will be better than with the random distribution, especially on the often-requested pages (which are normally the ones we care about most).
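
A minimal sketch of how such a hotspot/long-tail request mix could be generated in a test driver (the 80/20 split and the page lists are illustrative assumptions):

import java.util.List;
import java.util.Random;

public class HotspotRequestMix {

  private static final Random RANDOM = new Random();

  // 80% of the requests go to the small hotspot set, 20% to the long tail.
  static String nextUrl(List<String> hotspotPages, List<String> longTailPages) {
    List<String> pool = RANDOM.nextDouble() < 0.8 ? hotspotPages : longTailPages;
    return pool.get(RANDOM.nextInt(pool.size()));
  }

  public static void main(String[] args) {
    List<String> hotspot = List.of("/content/site/en.html", "/content/site/en/products.html");
    List<String> longTail = List.of("/content/site/en/article-0001.html", "/content/site/en/article-0002.html");
    for (int i = 0; i < 10; i++) {
      System.out.println(nextUrl(hotspot, longTail));
    }
  }
}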

Let’s translate this into a graph which displays the response time.

This test is now quite realistic, and if we only focus on the 95th percentile (p95; that means if we take 100 requests, 95 of them are faster than this) the result would meet the criteria; but beyond that the response time gets higher, because there are a lot of cache misses.

This level of realism in the test results comes with a price: the performance test model as well as the test preparation and execution are now much more complicated.

And until now we only considered users, but what happens when we add random internet noise and the search engines (the unmodelled users from the first part of this series) to the scenario? These will add more (relative) weight to the long tail, because these requests do not necessarily follow the usual hotspots; we rather have to assume a more random distribution for them.

That means that the cache-hit ratio will be lower again, as there will be many more cache misses now; and of course this will also increase the p95 response time. And it will complicate the model even further.

So let’s stop here. As I have outlined above, the most simple model is totally unrealistic, but making it more realistic makes the model more complex as well. And at some point the model is no longer helpful, because we cannot transform it into a test setup without too much effort (creating test data/content, complex rules to implement the random and hotspot-based requests, etc). That means especially in the case of the test data and test scenarios we need to find the right balance. The right balance between the investment we want to make into tests and how close it should mirror the reality.

I also tried to show you how far you can get without doing any kind of performance test. Just based on some assumptions we were able to build a basic understanding of how the system will behave, and how some changes to the parameters will affect the result. I use this technique a lot, and it helps me to quickly refine models and define the next steps or the next test iteration.

In part 4 I discuss various scenarios which you should consider in your performance test model, including some practical recommendations on how to include them in your test model.

by Jörg at February 09, 2024 01:57 PM

February 01, 2024

Things on a content management system - Jörg Hoh

Performance tests modelling (part 2)

This is the second blog post in the series about performance test modelling. You can find the overview of this series and links to all its articles in the post "Performance tests modelling (part 1)".

In this blog post I want to cover the aspect of "concurrent users", what it means in the context of a performance test and why it's important to clearly understand its impact.

Concurrent users is an often used measure to indicate the load put on a system, expressed as the number of users who are using that system at the same time. For that reason many performance tests provide as a quantitative requirement: "The system should be able to handle 200 concurrent users". While that seems to be a good definition at first sight, it leaves many questions open:

  • What does “concurrent” mean?
  • And what does “user” mean?
  • Are “200 concurrent users” enough?
  • Do we always have “200 concurrent users”?

Definition of concurrent

Let’s start with the first question: What does “concurrent” really mean on a technical level? How can we measure that our test indeed does “200 concurrent users” and not just 20 or 1000?

  • Are there any server-side sessions which we can count and which directly give this number? And that we setup our test in a way to hit that number?
  • Or do we have to rely on more vague definitions like “users are considered concurrent when they do a page load less than 5 minutes apart”? And that we design our test in that way?

Actually it does not matter at all which definition you choose. It's just important that you explicitly define which definition you use, and what metric you choose to verify that you hit that number. This is an important definition when it comes to implementing your test.

And as a side-note: Many commercial tools have their own definition of concurrent, and here the exact definition does not matter either, as long as you are able to articulate it.

What is a user?

The next question is about "the user" which is modeled in the test; to simplify the test and the test executions, one or more "typical" user personas are created, which visit the site and perform some actions. This is definitely helpful, but it's just that: a simplification, because otherwise our model would explode because of the sheer complexity and variety of user behavior. Also, sometimes we don't even know what a typical "user" does on our site, because the system will be brand-new.

So this is a case, where we have a huge variance in the behavior of the users, which we should outline in our model as a risk: The model is only valid if the majority of the users are behaving more or less as we assumed.

But is this all? Do really all users perform at least 10% of the actions we assume they do?

Let’s brainstorm a bit and try to find answers for these questions:

  • Does the google bot behave like that? All the other bots of the search engines?
  • What about malware scanners which try to hit a huge list of WordPress/Drupal/… URLs on your site?
  • Other systems performing (random?) requests towards your site?

You could argue that this traffic has less/no business value, and for that reason we don't test for it. It could also be assumed that this is just a small fraction of the overall traffic and can be ignored. But that is just an assumption, and nothing more. You just assume that it is irrelevant. But often these requests are not irrelevant, not at all.

I encountered cases where not the "normal users" were bringing down a system, but rather this non-normal type of "user". An example of that are cases where the custom 404 handler was very slow, and for that reason the basic undocumented assumption "We don't need to care about 404s, as they are very fast" was violated and brought down the site. All performance tests passed, but the production system failed nevertheless.

So you need to think about "user" in a very broad sense. And even if you don't implement the constant background noise of the internet in your performance test, you should list it as a factor. If you know that a lot of this background noise will trigger an HTTP status code 404, you are more likely to check that this 404 handler is fast.

Are “200 concurrent users” enough?

One piece of information every performance test has is the number of concurrent users which the system must be able to handle. But even if we assume that "concurrent" and "user" are both well defined, is this enough?

First, what data is this number based on? Is it based on data derived from another system which the new system should replace? That's probably the best data you can get. Or, when you build a new system, is it based on good marketing data (which would be okay-ish), on assumptions about the expected usage, or just on numbers we would like to see (because we assume that a huge number of concurrent users means a large audience and a high business value)?

So probably this is the topic which will be discussed the most. But the number, and the way that number is determined, should be challenged and vetted. Because it's one of the corner-stones of the whole performance test model. It does not make sense to build a high-performance and scalable system when afterwards you find out that the business numbers were grossly overrated, and a smaller and cheaper solution would have delivered the same results.

What about time?

A more important aspect, which is often overlooked, is the timing: how many users are working on the site at any moment? Do you need to expect the maximum number 8 hours every day, or just during the peak days of the year? Do you have a more or less constant usage, or usage only during business hours in Europe?

This heavily depends on the type of your application and the distribution of your audience. If you build an intranet site for a company only located in Europe, the usage during the night is pretty much zero, and it will start to increase at 06:00 in the morning (probably the Germans going to work early :-)), hitting the maximum usage between 09:00 and 16:00 and going back to zero at 22:00 at the latest. The contrast to it is a site visited world-wide by customers, where we can expect a higher and almost flat line; of course with variations depending on the number of people being up.

This influences your tests as well, because in both cases you don't need to simulate spikes, that means a 500% increase of users within 5 minutes. On the other hand, if you plan for large marketing campaigns addressing millions of users, this might be exactly the situation you need to plan and test for. Not to mention if you book a slot during the Superbowl break.

Why is this important? Because you only need to test scenarios which you expect to see in production, and ignore scenarios which have no value for you. For example it's a waste of time and investment to test for a sudden spike in the above-mentioned intranet case for the European company, while it's essential for marketing campaigns to test a scenario where such a spike comes on top of the normal traffic.

Summary

"N concurrent users" by itself is not much information; and while it can serve as input, your performance test model should contain a more detailed understanding of that definition and what it means for the performance test. Otherwise you will focus just on a given number of users of this idealistic type and ignore every other scenario and case.

In part 3 I cover how the system and the test data itself will influence the result of the performance test.

by Jörg at February 01, 2024 06:26 PM

CQ5 Blog - Inside Solutions

12 steps to migrate AEM On-Premise to AEM Cloud Service

Is your company leveraging the full potential of Adobe Experience Manager (AEM) to deliver exceptional digital experiences?

If you currently manage your websites with AEM On-Premise or rely on Adobe's products, it is time to move to AEM Cloud Service.

In 2020, Adobe introduced the next generation of its CMS with AEM as a Cloud Service. It is time for you and your company to prepare for this transition and adopt this new generation of CMS in the cloud.

At One Inside – A Vass Company, we have worked with many large enterprises, guiding them through the transition to the cloud and performing migrations to AEM Cloud Service in less than three months.

In this guide, our AEM experts have gathered their knowledge to answer the following questions:

  • How do you move smoothly from an AEM on-premise solution to AEM Cloud Service?
  • What are the critical steps of a successful migration to AEM Cloud Service?
  • What are the most common pitfalls to avoid?

But before going through these steps, let's explore the undeniable benefits of adopting AEM Cloud Service.

What are the benefits of migrating to AEM Cloud Service?

As with any enterprise project, it is essential to demonstrate to your organization and your management the benefits of migrating your websites managed by Adobe Experience Manager to the cloud.

Let's see why this transition is a necessary step.

Moving from an AEM On-Premise or Managed Services solution to AEM Cloud Service offers many benefits, including:

Reduced costs and a medium-term return on investment

The total cost of ownership with AEM Cloud is considerably lower. Your company can save in several areas:

  • Licensing: Licensing costs can decrease, since the new pricing model is usage-based. In addition, moving to the cloud gives you a new opportunity to negotiate prices with Adobe.
  • Operational costs: AEM Cloud Service simplifies many operational aspects, such as environment management and automated version updates.
  • Infrastructure and hosting: If you previously hosted AEM in-house, you will achieve substantial savings on infrastructure and hosting costs. Infrastructure maintenance costs are eliminated.
  • Staffing: The number of full-time equivalents (FTEs) required for the project will decrease, which leads to further cost reductions.

Although the migration project involves initial costs, our team has managed to migrate websites to AEM Cloud in less than three months.

The timeline can vary depending on the complexity of the integrations and the number of websites and domains involved.

Based on our analysis, the return on investment (ROI) of such a project is typically reached in under three years. In other words, migrating to AEM Cloud is a worthwhile investment.

Your CMS is always up to date, guaranteeing access to the latest features

With AEM Cloud Service, you can say goodbye to upgrade projects.

Adobe automatically updates the CMS with the latest features, eliminating the concept of versions.

It works like any other SaaS, so you always work with the most recent version.

Enhanced security

Security is a major concern for large enterprises, and AEM Cloud Service can offer stronger security than your current setup.

The solution is continuously monitored, and patches are applied quickly as soon as a security issue is detected. To learn more, see the Adobe Cloud Service Security Overview document.

99.9% availability

With AEM Cloud Service, your website will always be online. The solution can scale horizontally and vertically to maintain this high level of service at all times and efficiently handle even the heaviest traffic loads.

What are the main benefits of Adobe Experience Manager as a Cloud Service?

No learning curve

One of the main advantages of the transition to AEM Cloud Service is that your marketing team will find the tool familiar.

Despite significant changes to the architecture, the release processes and the operations, the end-user experience remains unchanged.

Content editors will not notice any difference after the migration if you are already using the latest on-premise version.

This means you will not need to invest time and resources in change management or provide extensive training to your team.

Focus on innovation and accelerate time to market

Running the operations of an enterprise CMS is a thing of the past. It is time for your organization to embrace the new reality of the cloud.

With AEM Cloud Service, you can accelerate innovation, for several reasons:

  • Your staff can focus entirely on projects that create value.
  • You have access to Adobe's latest innovations.

Thanks to our deep experience with AEM Cloud Service and our collaboration with many clients, we have seen a clear improvement in how quickly projects go live.

Projects are completed quickly, and new websites can be launched within a few months.

When your company has a new product or service to present, you will reap the benefits of your work much faster with this new generation of CMS.

Passer d’AEM On-Premise à AEM Cloud Service, étape par étape

Cette section vous guidera dans la migration d’AEM On-Premise vers AEM as a Cloud Service.

Chaque étape est soigneusement conçue pour assurer une transition en douceur et réussie vers le cloud, couvrant les aspects critiques de l’analyse initiale à la mise en service.

AEM On Premise to Cloud Migration Project Steps

Step 1 – Analyze, plan and estimate the effort

The first step is to understand AEM Cloud Service, along with the associated changes and deprecated features. Notable changes include:

  • Architecture changes with automatic horizontal scaling
  • Project code structure
  • Image storage
  • Built-in CDN
  • Dispatcher configuration
  • Network and API connections
  • DNS and SSL certificate configuration
  • CI/CD pipelines
  • AEM author access through an Adobe account
  • User groups and permissions

In addition, it is essential to assess your current AEM installation, in particular its connections and integrations with other services:

  • APIs or endpoints within the internal network
  • Third-party services, especially those protected by an IP allow list
  • Any service that imports data into AEM
  • Login with a closed user group (CUG)

These elements must be reviewed carefully, as some adjustments may be required.

Another essential aspect is effective communication with stakeholders, partners and the Adobe team.

It is essential to involve these parties early in the project, with clearly defined tasks and deadlines.

For example, you will find later on that the involvement of your internal IT team is necessary.

Informing them in advance is essential to avoid project delays. You should also review your license agreements with Adobe and make sure you have the appropriate licenses for AEM as a Cloud Service.

Although this first step may only take a few days, it is essential for assessing the critical aspects of your installation, defining the project plan and effort, and sharing this information with the key stakeholders.

Step 2 – Preparing the code for AEM Cloud Service

The goal of this step is to make sure your current AEM installation and its codebase are Cloud-ready while remaining compatible with your existing on-premise instances.

Although we will not go through every structural change required for AEM Cloud Service in this article, we will give an overview that every reader can follow easily.

Adobe provides a useful tool called the Adobe Best Practices Analyzer, designed to assess your current AEM implementation and suggest improvements to align with Adobe's best practices and standards.

The report generated by this tool covers:

  • application features that need to be refactored.
  • repository items that should be moved to supported locations.
  • components that need to be modernized.
  • deployment and configuration issues.
  • AEM 6.x features that have been replaced by new capabilities or are not currently supported by AEM as a Cloud Service.

Note that an AEM expert should review the Best Practices Analyzer report, because the tool cannot understand the whole codebase and its implications.

After the assessment, an AEM architect or developer can restructure the codebase and apply new practices based on the latest AEM archetype.

A recommended practice is to refactor further and revisit the outdated features of your current codebase.

Since thorough testing of the entire website and application will be required later anyway, it is worth taking the opportunity to remove technical debt and build on a more solid foundation.
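
To make this more concrete, here is a minimal, hypothetical sketch (the class name, the /var path and the "my-site-writer" service user are invented for illustration) of one category of change the Best Practices Analyzer typically points at: code that writes runtime data below /apps or /libs has to move to a mutable location such as /var, because those paths are immutable at runtime in AEM as a Cloud Service.

import java.util.Collections;
import java.util.Map;

import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;
import org.apache.sling.api.resource.ResourceUtil;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

// Hypothetical example: runtime data is written below /var instead of /apps,
// because /apps and /libs are immutable at runtime in AEM as a Cloud Service.
@Component(service = RuntimeDataStore.class)
public class RuntimeDataStore {

    // Mutable location; a path below /apps would fail at runtime in the Cloud.
    private static final String VAR_ROOT = "/var/my-site/runtime-data";

    @Reference
    private ResourceResolverFactory resolverFactory;

    public void storeValue(String name, String value) throws LoginException, PersistenceException {
        // "my-site-writer" is an assumed service user mapping, configured separately.
        Map<String, Object> authInfo =
                Collections.singletonMap(ResourceResolverFactory.SUBSERVICE, (Object) "my-site-writer");
        try (ResourceResolver resolver = resolverFactory.getServiceResourceResolver(authInfo)) {
            // Create the intermediate structure below /var if it does not exist yet.
            Resource root = ResourceUtil.getOrCreateResource(
                    resolver, VAR_ROOT, "nt:unstructured", "nt:unstructured", false);
            ModifiableValueMap properties = root.adaptTo(ModifiableValueMap.class);
            if (properties != null) {
                properties.put(name, value);
                resolver.commit();
            }
        }
    }
}

Whether /var, /conf or an external store is the right target depends on the use case; the point is that this category of change can be made on the on-premise codebase first, so the same code remains deployable to both environments.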

Step 3 – Preparing the AEM Cloud environment

The goal of this step is to prepare the cloud environment and set up AEM Cloud Manager, which is mandatory for AEM Cloud. Note that this step can be carried out in parallel with the previous one.

Adobe Cloud Manager offers a user-friendly interface that simplifies setting up environments, configuring pipelines, and managing certificates, DNS and other essential services.

Please note that you must first have a license agreement in place with Adobe to access AEM Cloud Manager and the required services. Start the discussions with your Adobe account manager well in advance to avoid delays at this stage.

Step 4 – Migrating your projects to AEM Cloud

At this point, your code has been refactored, and all changes that were incompatible with the on-premise setup have been implemented and migrated to make it Cloud-ready.

In addition, all the required environments (test, staging, production) have been configured appropriately and are ready to host your code.

This step is relatively straightforward and consists of pushing your code to a Git repository.

During this phase and until go-live, a feature freeze is recommended.

However, if you cannot afford to freeze features in your production environment, or if critical changes must be applied to your on-premise installation, it is possible to port the code to the Cloud later.

At One Inside, we are used to handling such situations, but it is important to understand that a code freeze helps mitigate the risks of project delays and added complexity.

Ready to move to AEM Cloud Service?

Don't wait any longer! Contact our experts now, and let's make your transition smooth and successful!

Step 5 – Validating the integration with core services or external APIs

Your website most likely relies on data coming from third-party services or internal applications.

To guarantee a seamless integration with these services, specific network configurations must be made using Adobe Cloud Manager.

In addition, AEM Cloud Service provides a static IP address that must be allow-listed on your side to enable connectivity with your on-premise applications.

This step is crucial for establishing a secure and uninterrupted connection between your AEM Cloud environment and your core services or external APIs.
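
As a rough illustration of what this step can mean on the code side, here is a minimal, hypothetical sketch (the component name, the fetchOrders call and the example URL are invented): the endpoint of a backend integration is kept in an OSGi configuration rather than hard-coded, so each environment can point to the right host once the network configuration and the IP allow-listing are in place.

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.metatype.annotations.AttributeDefinition;
import org.osgi.service.metatype.annotations.Designate;
import org.osgi.service.metatype.annotations.ObjectClassDefinition;

// Hypothetical example: the backend endpoint is configurable per environment
// instead of being hard-coded to an internal hostname.
@Component(service = BackendClient.class)
@Designate(ocd = BackendClient.Config.class)
public class BackendClient {

    @ObjectClassDefinition(name = "Sample Backend Integration")
    public @interface Config {
        @AttributeDefinition(name = "Base URL of the backend API")
        String endpoint_url() default "https://backend.example.com/api";
    }

    private HttpClient httpClient;
    private String endpointUrl;

    @Activate
    protected void activate(Config config) {
        this.endpointUrl = config.endpoint_url();
        this.httpClient = HttpClient.newHttpClient();
    }

    public String fetchOrders() throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpointUrl + "/orders"))
                .GET()
                .build();
        // The call only succeeds once the backend has allow-listed the egress IP
        // of the AEM Cloud environment and the network configuration is in place.
        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}

The per-environment value can then be supplied as configuration in the cloud environments instead of being baked into the code.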

Step 6 – Integrating Adobe Target, Adobe Analytics and the Adobe Experience Cloud suite

Since you already use AEM for your websites, you probably also use other solutions from the Adobe Experience Cloud suite, notably Adobe Analytics and Adobe Target.

Integrating these solutions is usually straightforward, and they should work seamlessly on your web pages.

Your current use of AEM makes it easier to extend the integration to other Adobe Experience Cloud components, improving your ability to analyze and optimize your digital experiences.

Step 7 – Content migration

Content migration is an essential step, but it should not worry you too much, because the internal content structure stays the same between your on-premise website and the newly created AEM Cloud website.

Adobe provides several tools to streamline this task, such as the Content Transfer Tool, designed specifically to migrate existing content from your AEM On-Premise site to AEM Cloud, and the Package Manager, which makes it easy to import and export content.

Content migration is not only about pages; it covers all the content in your repository, including:

  • Page content
  • Assets
  • User and group data

It is important to understand that you can keep creating content on your production site while the migration is in progress.

The content migration tool will only import the changes made since the last content transfer, which ensures an efficient and up-to-date transition.

Step 8 – Test, test, test

We are approaching the final stages of the migration. Although testing has been carried out throughout the various steps, it is now time for a full User Acceptance Testing (UAT) session.

Your dedicated test team and your users must be actively involved in this critical phase. It is essential to have a detailed test strategy in place before UAT starts.

Involving authors in the testing process serves several purposes. Not only does it speed up their familiarization with the new environment, but they are also the people who know best how the components behave.

Their input, knowledge and support are essential to ensure that your digital presence remains clear and distinctive.

Thorough testing ensures that your migration to AEM Cloud Service is successful and that your website runs smoothly in its new environment.

Step 9 – Domain redirection

This is the last step before go-live, and it is where your IT network team plays a key role.

They will handle the certificates, the DNS configuration and the domain redirection.

As highlighted at the beginning of this guide, it is essential that your IT partners were informed of these critical steps from day one of the project and that the tasks were assigned accordingly.

They should be well prepared and aware of what needs to be done, since the preparations for this phase have been under way for several weeks.

Effective coordination at this stage is essential to avoid any delay in the overall process and the go-live date.

With an effective domain redirection, your website is transferred seamlessly to its new AEM Cloud environment.

Step 10 – Go-live

This step may seem the most stressful, but paradoxically it is also the simplest.

Your website has been tested thoroughly, and everything works perfectly in the Cloud environment.

It is time for the final switch, moving from your AEM On-Premise instance to the AEM Cloud instance.

The transition will be transparent for your end users, who will not experience any service interruption. With careful planning and execution, this step should mark the completion of your migration to AEM Cloud.

“The migration to AEM Cloud is a great source of pride for the marketing and business teams as well as for IT. Seeing our website up and running in the Cloud marks the start of a new era of improved performance and opportunities to enrich the customer experience.”
Martyna Wilczynska

Project Manager at One Inside – A VASS Company

Step 11 – Training

Your editors will not need specific training, since the authoring interface stays the same. However, note that an essential new tool, Adobe Cloud Manager, has been introduced.

Your IT or DevOps teams need to manage this tool, or you can delegate site maintenance to your Adobe partner.

Our AEM experts can provide training to make sure your IT team has the skills and knowledge required to handle the essential tasks around SSL certificates, domain mapping, allow-listing and account management.

Step 12 – Decommissioning the on-premise instance

Finally, we recommend keeping the on-premise servers running for two to four weeks after the migration.

This precaution provides a safety net in case a critical situation requires falling back to the on-premise instance.

Although, in our experience, such a rollback is rarely needed, it is prudent to manage this potential risk.

Once the hyper-care phase is over, you can confidently focus on your new AEM as a Cloud Service instance, knowing that you have a contingency plan if needed.

Need personalized support?

Ready to explore your migration to AEM Cloud? Book a slot with us so we can assess your needs and help you prepare a smooth transition!

Lessons learned from migrating to AEM as a Cloud Service, and best practices

After several successful migrations to AEM Cloud Service, our team has gained valuable insights, and we would like to share a few best practices that will help you reduce the risks of such a project.

Start with a thorough analysis

Start your AEM Cloud migration project with a complete analysis.

Avoid rushing the assessment of your current on-premise AEM setup. It is essential to carefully evaluate the dependencies and the elements that require refactoring.

If this is your first migration, invest time in research and documentation for a project of this nature.

Even if you have an internal team in charge of AEM, consider getting support from an experienced Adobe partner. Their expertise can prove invaluable in ensuring a successful migration.

Manage stakeholder dependencies

It is essential to address stakeholder dependencies from the very beginning of the project.

Several people in your organization will play a key role at important milestones of the project. We have already mentioned the role of the IT team in managing the network, but other groups may be involved, such as security and quality assurance.

From the start of the project, communicate your expectations clearly to these teams and give them precise dates for their involvement. This proactive approach helps avoid delays and keeps the project on track.

A Scrum project unlike any other

What may come as a surprise is that a cloud migration project does not quite fit the mold of an IT project run with the Scrum methodology.

In the usual setup, we strive to deliver the highest possible value in the shortest time, and we present our solutions to clients while constantly asking for their feedback.

A cloud migration project, however, mainly consists of refactoring backend code, which makes it impossible to show anything to stakeholders until the very last stages, when the website is already in the acceptance environment in the Cloud and ready to be tested.

Regular team and stakeholder meetings

As the three-month timeline progresses, it is essential to stay aligned with your team and the key stakeholders.

We strongly recommend establishing a weekly update routine to track progress, identify and address risks, and put mitigation plans in place.

During these weekly reviews, pay particular attention to dependencies on other teams and assess the progress of their activities.

This proactive approach ensures that everyone stays aligned and that you can respond quickly to changing project needs.

“Clear communication with clients is essential to mitigate risks, identify issues and provide updates on the progress of the migration. It reduces the clients' stress and guarantees transparency throughout their digital journey.”
Michael Kleger

Project Manager at One Inside

Relationship with Adobe

License negotiations must be completed to get access to Adobe Cloud Manager. It is just as important to talk to your Adobe Account Manager about keeping an on-premise fallback server for a given period.

In our experience, starting these discussions as early as possible lets you negotiate the most advantageous and flexible way out of the on-premise infrastructure.

In addition, if unexpected issues arise, you may need support from the Adobe team. Some features may not work correctly once they have been refactored for the cloud.

To speed up the response time of Adobe support, it helps to work with an Adobe partner that maintains a close relationship with the Adobe team.

For example, at One Inside we have cultivated a partnership with Adobe for more than ten years, and our office is located less than 30 km from the AEM team responsible for building AEM Cloud.

This close relationship can prove invaluable in certain situations. Over the years, we have built a strong relationship with Adobe as a company and with its talented employees.

This gives us an edge when solving problems, because we know exactly whom to contact without having to navigate several levels of support.

Avoid developing on the on-premise instance during the migration

Whenever possible, avoid introducing new developments on your websites during the migration. This practice prevents many problems.

However, we recognize that freezing the code for three months is often impractical.

To mitigate potential issues, make sure the code of both environments is synchronized and optimized for the Cloud before adding new improvements to your on-premise branch. This alignment minimizes complications during the migration process.

Take the opportunity to fix design flaws

During the migration process, you will have the opportunity to test your entire website. Use it to improve various aspects of your site, including the architecture, code refactoring and minor design adjustments.

In our migration projects, we have successfully included improvements such as image rendition generation, customer experience enhancements, and performance and caching optimizations.

This migration window lets you move to the Cloud and, at the same time, improve the overall quality and functionality of your website.

Key takeaways from migrating to AEM as a Cloud Service

In conclusion, migrating to AEM as a Cloud Service is a transformative journey that requires careful planning and execution. AEM Cloud Service is the future of AEM, and this migration lays its foundation.

Throughout this article, we have shared valuable insights and best practices drawn from successful AEM Cloud migrations.

From analyzing dependencies to building a strong relationship with Adobe, from weekly team updates to fixing design flaws, these lessons can guide you toward a successful migration.

Embrace the challenges and opportunities of the move to the cloud, and remember that a well-executed migration can lead to a more efficient, more secure and more innovative digital experience for your organization and its users.

With the right approach and the support of experienced partners, you can navigate this journey with confidence and achieve excellent results.

Samuel Schmitt

Digital Solution Expert

Would you like to receive our next article?

Subscribe to our newsletter and we will send you the next article about Adobe Experience Manager.

The post 12 étapes pour migrer AEM On-Premise vers AEM Cloud Service appeared first on One Inside.

by Samuel Schmitt at February 01, 2024 09:47 AM

January 31, 2024

CQ5 Blog - Inside Solutions

12 Steps for a Migration from AEM On-Premise to the AEM Cloud

12 Steps for a Successful Migration from AEM On-Premise to the AEM Cloud

Is your organization using the full potential of Adobe Experience Manager (AEM) to offer your customers outstanding digital experiences across a wide range of channels?

If you currently run your websites on an AEM installation on your own servers or use Adobe's Managed Services, now is the ideal time to move to the cloud.

In 2020, AEM ushered in a new era of the content management system (CMS) with AEM as a Cloud Service. Now is the time for you and your company to prepare for this important change and take advantage of this advanced CMS in the cloud.

At One Inside – A Vass Company, we have gained extensive experience by supporting numerous well-known companies in their move from on-premise solutions to the cloud. We have delivered smooth transitions to the AEM Cloud in less than three months.

In this detailed guide, our AEM specialists share their expertise on the following topics:

  • How do you achieve a seamless transition from AEM On-Premise to the AEM Cloud?
  • Which key steps are decisive for a successful migration to the AEM Cloud?
  • Which common pitfalls should you avoid?

Before we turn to the individual migration steps, however, it is worth taking a closer look at the compelling advantages of the AEM Cloud.

What are the benefits of moving to the AEM Cloud?

For any forward-looking company, it is essential to highlight the obvious benefits of moving its AEM systems to the cloud in order to convince management of such a step.

Let's look at the reasons why this move could matter for your company.

Moving from AEM On-Premises or Managed Services to the AEM Cloud brings a wealth of advantages, including:

Reduced operating costs and a mid-term return on investment (ROI)

The total cost of ownership is reduced significantly by using the AEM Cloud. Areas in which your company can achieve savings include:

  • Licenses: A usage-based pricing model can lower license costs. The move to the cloud is also an opportunity to renegotiate pricing with Adobe.
  • Operating costs: AEM Cloud simplifies many operational processes, such as environment management and automatic updates.
  • Infrastructure and hosting: If you have hosted AEM internally so far, you will achieve considerable cost savings on infrastructure and hosting, since maintenance costs disappear.
  • Staff: The number of full-time employees required will decrease, which leads to further savings.

Despite the initial costs of the migration project, our team has completed the move to the AEM Cloud in less than three months.

The timeframe varies depending on the complexity of the integrations and the number of websites and domains involved.

Our analysis shows that the return on investment (ROI) for such a project is usually under three years, which makes the migration a worthwhile investment.

You always have access to the latest features, because your CMS is always up to date.

With AEM as a Cloud Service, costly version upgrades are a thing of the past. Adobe automatically updates the CMS with the latest features, making the concept of traditional version numbers obsolete.

You always work with the most recent version, just as with other SaaS solutions.

Enhanced security

Security is a critical concern, especially for large enterprises, and AEM as a Cloud Service may offer more security than your current solution.

The cloud solution is monitored continuously, and patches are applied promptly as soon as security vulnerabilities are discovered.

For detailed information on the security of the Adobe Cloud services, please read Adobe's corresponding security document.

99.9% uptime

AEM Cloud guarantees that your website is available at all times.

The solution is designed to scale efficiently, both horizontally and vertically, in order to maintain a high level of service and to work reliably even under peak loads.

What are the main benefits of Adobe Experience Manager as a Cloud Service?

No learning curve for your marketing team

Another advantage of migrating to the AEM Cloud is that your marketing team will already be familiar with the tool. Despite substantial changes to the architecture and the processes, the experience for the end user stays the same.

Content editors will not notice any difference after the migration, which means no additional training or adjustments are required.

Focus on innovation and a faster time to market

Managing an enterprise CMS the old way is no longer up to date. Your organization should embrace the new reality.

The AEM Cloud accelerates innovation for several reasons:

  • Your team can concentrate fully on value-adding projects.
  • You benefit from the latest Adobe innovations.

Through our extensive experience with the AEM Cloud and our work with a variety of clients, we have observed a significantly shorter time to market. Projects are completed quickly, and new websites can be launched within a few months.

When your company wants to present a new product or service, it will appreciate the advantages of using this latest-generation CMS.

Step-by-step transition from AEM On-Premise to AEM Cloud Service

In this section, we guide you on the journey from your local AEM installation to AEM as a Cloud Service.

Each phase of this process has been carefully designed to ensure that the move to the cloud is smooth and successful, covering all the decisive points from the initial analysis to live operation.

AEM On Premise to Cloud Migration Project Steps

Step 1 – Analysis, planning and effort estimation

This undertaking starts with understanding AEM as a Cloud Service, including the changes it brings and the features that are being phased out. Notable changes include:

  • Architecture changes with automatic horizontal scaling
  • Changes to the project code structure
  • Management of digital assets
  • Built-in CDN (content delivery network)
  • Configuration of the dispatcher systems
  • Network and API connections, including IP allow-listing
  • Setup of DNS and SSL certificates
  • Setup and management of CI/CD pipelines
  • Access for AEM editors via an Adobe account
  • Definition of user groups and permissions

It is also very important to examine your current AEM setup thoroughly, especially with regard to interfaces and connections to other systems:

  • APIs or internal network interfaces
  • Third-party services, especially those protected by IP allow-listing
  • Any data import services for AEM
  • Access systems with restricted user groups (closed user groups, CUG)

These components require close examination, since adjustments may be necessary.

Another central point is efficient communication with all current stakeholders, cooperation partners and the Adobe team. Involving these groups early on, with clear tasks and timelines, is decisive for the success of the project.

For example, it will be necessary to involve your internal IT team. Informing them as early as possible is crucial to avoid delays in the project.

It is also essential to review your license agreements with Adobe and to confirm that you have the appropriate licenses for AEM as a Cloud Service.

These agreements govern the terms regarding cloud hosting, data protection and support. Now is also the time to review all contracts with subcontractors. Is there a support agreement with service level agreements (SLAs) with a supplier? Who is responsible for setting up the ticketing and documentation systems? These agreements should now be reviewed and adapted to the new cloud setup. It is essential that everyone knows their area, that the scope of work is clearly defined, and that the workflows run smoothly.

Although this first stage takes only a few days, it is decisive for assessing the critical aspects of your existing infrastructure, defining the project plan and effort, and sharing this information with all key stakeholders.

Step 2 – Preparing the code for AEM as a Cloud Service

This phase is about preparing the codebase of your current AEM installation for operation in the cloud while keeping it compatible with the existing on-premise systems.

Although we do not cover every structural change required to run AEM in the cloud in this post, we want to give you an understandable overview.

Adobe provides a useful tool for this, the Adobe Best Practices Analyzer. It assesses your AEM implementation and recommends adjustments in line with Adobe's best practices and standards. The Analyzer report includes, among other things:

  • Application features that need to be reworked.
  • Repository items that must be moved to supported locations.
  • Outdated dialogs and UI components that should be modernized.
  • Challenges in implementation and configuration.
  • AEM 6.x features that have either been replaced by new ones or are not supported in the current cloud version of AEM.

It is advisable to have an AEM expert go through the Best Practices Analyzer report, since it may not fully cover every aspect of the codebase and its implications.

An AEM architect or developer should then restructure the codebase accordingly and align it with the practices of the latest AEM archetypes.

It is also recommended to refactor older features of your codebase.

Given that extensive tests of the website and the applications are due at a later stage anyway, it makes sense to use this opportunity to clean up technical debt and create a more robust foundation for the future.

Step 3 – Setting up the AEM Cloud environments

The goal of this step is to prepare the cloud environment and configure AEM Cloud Manager, which is at the heart of AEM as a Cloud Service. Importantly, this step can run in parallel with the previous one.

Adobe Cloud Manager offers a user-friendly interface that makes it easy to set up environments, configure pipelines, and manage certificates, DNS and other essential services.

Step 4 – Moving your projects to the AEM Cloud

By this point, your code has already been reworked, and any incompatibilities have been identified and adjusted to make it cloud-ready.

In addition, all required environments (test, staging, production) have been set up and are ready to receive your code.

This step is relatively simple and involves pushing your code to the Cloud Git repository. During this phase and until the final go-live, it is advisable to stop taking on new features (feature freeze).

However, if changes to your production environment are unavoidable or critical adjustments to your on-premise installation become necessary, it is possible to bring the code into the cloud later.

At One Inside, we have gathered experience with such scenarios. It is important to understand, however, that pausing new features can help minimize risks such as project delays and increased complexity.

Ready to move to the AEM Cloud?

Don't wait any longer! Contact our experts now, and we will make your move seamless and successful!

Step 5 – Integration testing with core systems or external APIs

It is very likely that your website uses data from third-party services or internal applications. To ensure a seamless integration of these services, specific network settings must be made in Cloud Manager.

In addition, AEM as a Cloud Service provides a static IP address that must be allow-listed on your side to enable connectivity with your internal applications.

This step is decisive for establishing a secure and trouble-free connection between your AEM Cloud environment and the core systems or external APIs.

Step 6 – Integrating Adobe Target, Adobe Analytics and the Adobe Experience Cloud suite

Since you already use AEM for your websites, you probably also rely on other solutions within the Adobe Experience Cloud suite, including Adobe Analytics and Adobe Target.

Integrating these solutions is usually uncomplicated, and they should slot into your web pages without friction.

Because you already use AEM, it is easier to extend the integration to further Adobe Experience Cloud components and thus improve your ability to analyze and optimize your digital experiences.

Step 7 – Content migration

Transferring content is an essential step, but it should not cause undue concern, since the content structure on your existing website and on the newly set-up AEM Cloud instance stays the same.

Think of this process as moving your content, comparable to transferring it from a staging to a production environment.

In addition, Adobe provides dedicated tools that simplify the migration process, such as the Content Transfer Tool for moving existing content from AEM On-Premise to the cloud, and the Package Manager, which makes it easier to import and export content in the repository.

Content migration covers more than just the web pages themselves; it is about the full range of data in your repository, including:

  • Page content
  • Assets
  • User and group information

Since it is possible to keep creating content on your live website during the migration, the tool allows a targeted update. You can transfer only the changes made since the last content migration, which ensures an efficient and up-to-date transition.

Step 8 – Test, test, test

We are approaching the final stages of the migration. Although tests have already taken place in earlier phases, it is now time for an extensive User Acceptance Testing (UAT) phase.

Your dedicated test team and the end users should be actively involved in this decisive phase. A well-developed test strategy is needed before UAT begins.

Involving the content creators in the testing process serves several purposes: not only do they get used to the new environment faster, they also contribute their deep understanding of how the components work. Their feedback, knowledge and support are decisive for preserving the uniqueness of your digital presence.

Comprehensive testing ensures that the transition to the AEM Cloud goes successfully and that your website works in the new environment without difficulties.

Step 9 – Redirecting the domain

This step is the prerequisite for going live and requires skilled management by your IT network team.

They take care of managing certificates, DNS configurations and switching over the domain.

As emphasized at the beginning of this guide, it is decisive that the people responsible for IT have been aware of these essential steps from the start and have taken on the corresponding tasks.

They should be well prepared and know what needs to be done in this phase, since the preparations for it have been running for weeks.

Efficient coordination in this phase is essential to avoid delays in the overall project and in the cutover. A smooth domain redirection ensures the seamless transfer of your website into the AEM Cloud environment.

Step 10 – The go-live

This part may seem the most challenging, but paradoxically it is often the simplest.

Your website has gone through extensive testing and works flawlessly in the cloud. Now it is time for the final switch from your local AEM installation to the cloud version.

For the users of your site, the transition will be seamless, without any interruptions. With conscientious planning and execution, this step marks the successful completion of your migration to the AEM Cloud.

“The migration to the AEM Cloud is a great source of satisfaction for both the business and IT stakeholders: seeing the website actively running in the cloud and entering a new era of better performance and exciting opportunities to improve the customer experience.”
Martyna Wilczynska

Project Manager at One Inside – A VASS Company

Step 11 – Training

Comprehensive training for your content owners is not necessary, since the user interface stays the same.

However, the migration to the cloud introduces a significant new tool: Adobe Cloud Manager.

Your IT or DevOps teams should be able to manage this tool; alternatively, maintenance of your website can be handed over to an Adobe partner.

At One Inside, we offer training to make sure your IT staff has the skills needed to take on important tasks such as managing SSL certificates, domain mapping, allow-listing and account management.

Step 12 – Decommissioning the on-premise instance

Finally, we recommend keeping your local server instance active for a period of two to four weeks after the migration. This precaution serves as a safety net in case you need to fall back to the local version for any reason.

Although, in our experience, a return to the on-premise instance is rarely necessary, it is wise to protect yourself against this potential risk. Once the intensive support phase is over, you can rely fully on your new AEM as a Cloud Service instance, knowing that a contingency plan is in place.

Would you like personal advice from our experts?

Ready to explore your AEM Cloud migration? Book an appointment with us so we can assess your needs and help you prepare for a seamless move!

Lessons learned and best practices from migrating to AEM as a Cloud Service

Our team has gathered valuable experience through several successful migrations to AEM as a Cloud Service. We would like to share a few proven practices that can help minimize the risk in your migration project.

Start with a detailed analysis

It is advisable to begin your cloud migration project with a thorough analysis. Avoid a hasty assessment of your existing AEM On-Premise installation. A careful evaluation of dependencies and of the elements that need rework is indispensable.

If you are carrying out a migration for the first time, invest enough time in research and documentation for such an undertaking. Even if you have an internal team looking after AEM, bringing in an experienced Adobe partner is recommended. Their expertise can contribute significantly to the success of your migration.

Manage stakeholder dependencies

It is crucial to consider the dependencies of the project participants early on. Various members of your organization will play a central role at important milestones of the project.

We have already highlighted the importance of the IT team for network management, but other groups may be involved as well, such as security and quality assurance.

At the start of the project, it is important to communicate clear expectations to these teams and to set precise dates for their involvement. Such a proactive approach helps avoid delays and ensures that the project runs smoothly.

Beyond the scope of a typical Scrum project

It may come as a surprise, but a cloud migration project differs considerably from a conventional IT project organized around Scrum.

Normally we focus on delivering the highest possible value as quickly as possible and continuously seek the customer's feedback.

In a cloud migration project, however, the focus is on reworking the backend code, which can only be presented in the final phases, once the website is already in the cloud test environment. The trust our clients place in us motivates us to deliver exceptional results.

Regular meetings with the team and stakeholders

Given a tight schedule of three months, it is essential to hold regular updates with your team and the key players. We strongly recommend scheduling weekly meetings to monitor progress, identify risks and plan countermeasures.

In these weekly meetings, pay particular attention to the interactions with other teams and assess the progress of their activities. Such a proactive approach ensures that everyone involved works in sync and can react quickly to changing project requirements.

“Clear communication with clients is decisive for mitigating risks, identifying problems and keeping them updated on progress during the migration. It reduces the clients' stress and ensures transparency in their digital journey.”
Michael Kleger

Project Manager at One Inside

Partnership with Adobe

Access to the cloud environment requires negotiations about licensing. It is just as important to discuss with your Adobe account manager whether a standby server can remain on-premise as a temporary safety net.

Our experience shows that early discussions contribute to the most advantageous and flexible exit from the on-premise infrastructure.

Adobe also provides support for any problems that may arise. It can happen that some features do not work as expected after being adapted for the cloud.

To get quick help from Adobe support, it is advantageous to work closely with an Adobe partner that maintains a good relationship with Adobe.

At One Inside, for example, we have maintained a close partnership with Adobe for more than a decade. Our proximity to the AEM team responsible for AEM as a Cloud Service often proves invaluable.

The long-standing connection to Adobe and its talented employees makes it easier for us to find efficient solutions, because we have direct contacts and do not have to navigate several levels of support.

Avoid developments on the on-premise instance during the migration

As far as possible, refrain from new developments on your live websites during the migration. This prevents many potential problems.

We know that a three-month code freeze is often not practicable. To minimize possible difficulties, make sure the code in both environments is synchronized and optimized for the cloud before further improvements are made to your local installation. This coordination reduces complications during the migration process.

Use the opportunity for improvements

The migration process is a chance to test your website thoroughly. Use this opportunity to improve various aspects of your website, such as the architecture, code refactoring and smaller design adjustments.

In our migration projects, we have successfully implemented improvements such as image rendition generation, frontend optimizations and performance gains. This window not only enables the move to the cloud but also improves the overall quality and functionality of your website.

Conclusions for your migration to AEM as a Cloud Service

In conclusion, migrating to AEM as a Cloud Service is a transformative journey that requires careful planning and execution.

AEM Cloud Service represents the future of AEM, and this migration lays the foundation for it.

In this article, we have shared valuable insights and best practices from successful AEM Cloud migrations. From analyzing dependencies to maintaining a solid relationship with Adobe, from regular team updates to fixing design flaws, all of these lessons can help you achieve a successful migration.

Face the challenges and opportunities of moving to the cloud, and keep in mind that a well-executed migration can lead to a more efficient, more secure and more innovative digital experience for your company and its users. With the right strategy and the support of experienced partners, you can take this path with confidence and achieve excellent results.

We would like to express our gratitude to the talented people in our company who contributed to this article, including Martyna Wilczynska, Basil Kohler, Michael Kleger and Samuel Schmitt.

Samuel Schmitt

Digital Solution Expert

Would you like to receive our next article?

Subscribe to our newsletter and we will send you the next article about Adobe Experience Manager.

The post 12 Schritte für eine Migration von AEM On-Premise zur AEM Cloud appeared first on One Inside.

by Samuel Schmitt at January 31, 2024 05:36 PM

January 26, 2024

Things on a content management system - Jörg Hoh

Performance tests modelling (part 1)

In my last blog post about performance tests I outlined best practices for building and executing a performance test with AEM as a Cloud Service. But intentionally I left out a huge aspect of the topic:

  • What should your test look like?
  • What is a realistic test?
  • And what can a test result tell you about the behavior of your production environment?

These are hard questions, and I often find that they are not asked. Or people are not aware that these questions should be asked.

This is the first post in a series of blog posts in which I want to dive a bit deeper into performance testing in the context of AEM and AEM CS (and many aspects can probably be generalized to other web applications as well). Unlike my other blog posts it addresses topics on a higher level (I will not refer to any AEM functionality or API, and won’t even mention AEM that often), because I learned over time that very often performance tests are done based on a lot of assumptions. And it’s very hard to discuss the details of a performance test if these assumptions are not documented explicitly. I had such discussions in these 2 contexts:

  • The result of a performance test (in AEM as a Cloud Service) is poor and the customer wants to understand what Adobe will do.
  • After go-live, severe performance problems show up in production, and the customer wants to understand how this could happen when their tests showed no problems.

As you can imagine, a few diagrams with test results and test statistics are not enough preparation for such a call, and very often no further documentation about the tests is available. This leads to long discussions about very basic things, which adds even more delay to an already late project and/or a bad customer experience. So you can also consider this blog series a kind of self-defense. If you were asked to read this post, now you know 🙂

I hope that this series will also help you improve your way of doing performance tests, so we all will have fewer of these situations to deal with.

This post series consists of these individual posts:

And a word upfront about the term “performance test”: I group a number of different test types under that term, which are executed with different intentions and come with many names: “Performance tests”, “Load tests”, “Stress tests”, “Endurance tests”, “Soak tests”, and many more. Their intention and execution differ, but in the end they can all benefit from the same questions which I want to cover in this blog series. So if you read “performance test”, all of these other tests are meant as well.

What is a performance test? And why do we do them?

A performance test is a tool to predict the future, more specifically how a certain system will behave in a more-or-less defined scenario.

And that already outlines two problems which performance tests have.

  • It is a prediction of the future. Unlike a science experiment, it does not try to understand the present and extrapolate into the future. It does not have the same quality as “tomorrow we will have a sunrise, even if the weather is clouded”, but rather goes in the direction of “if my 17 year old son wants to celebrate his birthday party with his friends at home, we better plan a full cleaning of the house for the day after”. That means no matter how well you know your son and his friends (or the system you are building), there is still an element of surprise and the unknown in it.
  • The scenario which we want to simulate is somehow “defined”. In quotes, because in many cases the definitions of that scenario are pretty vague. We normally base these definitions on previous experience we have gained and some best practices of the industry.

So it’s already clear from these 2 items that this prediction is unlikely to be exact and 100% accurate. But it does not need to be accurate, it just needs to be helpful.

A performance test is helpful if it delivers better results than our gut feeling; and the industry has learned that our gut feeling is totally unreliable when it comes to the behaviour of web applications under production load. That’s why many enterprise go-live procedures require a performance test, which will always deliver a more reliable result than gut feeling. But just creating and executing a performance test does not make it a helpful performance test.

So a helpful performance test is also a test which mimics reality closely enough that you don’t need to change your plans immediately after your system goes live and hits reality. Unfortunately you only know whether your performance test was helpful after you went live. It shares this situation with other test approaches as well; for example, 100% unit test coverage does not mean that your code has no bugs, it’s just less likely.

What does that mean for performance tests and their design?

First, a performance test is based on a mental model of your system and the to-be reality, which must be documented. All its assumptions and goals should be explicitly documented, because only then can a review be done. And a review helps to uncover blind spots in our own mental model of the system, its environment and the way it is used. It helps to clearly outline all known factors which influence the test execution and also its result.

Without that model, it is impossible to compare the test result with reality and try to understand which factor or aspect in the test was missing, misrepresented or not fully understood, and which led to a gap between test result and reality. If you don’t have a documented model, it’s possible to question everything, from the model through the correct test execution to the results. If you don’t have a model, the result of a performance test is just a PDF with little to no meaning.

Also you must be aware that this mental model is a massive simplification, as it is impossible to factor in all aspects of reality, not least because reality changes every day. You will change your application, new releases of AEM as a Cloud Service will be deployed, you will add more content, and so on.

Your mental model will never be complete and probably never be up to date either, and that will be reflected in your performance test. But if you know that, you can factor it in. For example, you know that in 3 months’ time the amount of content will have doubled, and you can decide whether it’s helpful to redo the performance test with changed parameters. It’s now a “known unknown”, and no longer an “unknown unknown”. You can even decide to ignore factors if you deem them not relevant, but of course you should document that.

When you have designed and documented such a model, it is much easier to implement the test, execute the test and reason about the results. Without such a model, there are many more uncertainties in every part of the test execution. It’s like developing software without a clear and shared understanding of what exactly you want to develop.
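
To make this a bit more tangible: a documented model does not need to be a heavyweight document; even a small, reviewable artifact that lives next to the test scripts can do the job. The sketch below is purely illustrative (all names and numbers are invented placeholders, not recommendations) and only shows the idea of making the assumptions explicit and versionable.

import java.time.Duration;
import java.util.List;

// Purely illustrative sketch of a documented performance test model;
// all figures are invented placeholders, not recommendations.
public final class PerformanceTestModel {

    // One simulated visitor journey and its assumed share of the overall traffic.
    public record Scenario(String name, double trafficShare, List<String> entryPaths) { }

    public record Assumptions(
            int expectedPeakPageViewsPerMinute,   // e.g. taken from analytics of the current site
            double assumedCdnCacheHitRatio,       // e.g. measured on the existing dispatcher/CDN
            int contentPagesAtGoLive,             // repository size the test content should mimic
            Duration averageThinkTime,            // pause between page views of one virtual user
            List<Scenario> scenarios,
            List<String> knownGaps) { }           // explicitly documented "known unknowns"

    public static Assumptions documentedModel() {
        return new Assumptions(
                12_000,
                0.90,
                25_000,
                Duration.ofSeconds(15),
                List.of(
                        new Scenario("browse products", 0.7, List.of("/content/site/en/products")),
                        new Scenario("search and read article", 0.3, List.of("/content/site/en/search"))),
                List.of("content volume is expected to double within 3 months",
                        "campaign traffic spikes are not modelled"));
    }
}

Whether you capture this as code, as a wiki page or as a simple table matters less than the fact that it exists, has been reviewed, and can be compared against reality after go-live.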

That’s enough for this post. As promised, this is more abstract than usual, but I hope you liked it and it helps to improve your tests. In part 2 I look into a few relevant aspects which should be covered by your model.

by Jörg at January 26, 2024 05:27 PM

January 23, 2024

CQ5 Blog - Inside Solutions

12 Steps to Migrate AEM from On-Premise to the Cloud

12 Steps to Migrate AEM from On-Premise to the Cloud

Is your organization harnessing the full potential of Adobe Experience Manager (AEM) to deliver exceptional digital experiences across various channels?

If you currently run your websites on AEM On-Premise or rely on Adobe’s Managed Services, it’s time to embark on a journey into the Cloud.

In 2020, AEM introduced the next generation of CMS with AEM as a Cloud Service. It’s time for you and your company to prepare for this transition and embrace this next-generation CMS in the Cloud.

At One Inside – A Vass Company, we’ve worked with several large enterprises, helping them move from AEM on-premise to AEM Cloud Service, executing seamless migrations in less than three months.

Within this comprehensive guide, our AEM experts have compiled their knowledge and address the following questions:

  • How can you smoothly transition from AEM on-premise to AEM Cloud?
  • What are the critical steps for a successful migration to AEM Cloud?
  • What common pitfalls should you avoid?

But before delving into the steps, let’s explore the compelling advantages of embracing AEM Cloud.

What are the benefits of moving to AEM Cloud?

As with any enterprise project, it is essential to demonstrate the clear benefits of migrating your AEM installations to the Cloud to your organization and board.

Let’s explore why this transition is a necessary step.

Moving from AEM On-Premises or managed services to AEM Cloud offers numerous advantages, including:

Reduced Cost of Ownership and Mid-term ROI

The total cost of ownership with AEM Cloud is drastically reduced. Your company may see savings in several areas:

  • License: Licensing costs may decrease since the new pricing model is usage-based. Additionally, transitioning to the Cloud provides you with a fresh opportunity to engage in price negotiations with Adobe.
  • Operational Costs: AEM Cloud simplifies many operational aspects, such as environment management and automated version updates.
  • Infrastructure and Hosting: If you previously hosted AEM on your own premises, you’ll see substantial savings on infrastructure and hosting, as the cost of maintaining that infrastructure disappears.
  • Workforce: The number of full-time employees (FTEs) required for the project will decrease, resulting in cost reductions.

While the migration project incurs initial expenses, our team has successfully migrated websites to AEM Cloud in less than three months.

The timeline can vary depending on integration complexity and the number of websites and domains involved.

Based on our analysis, the return on investment (ROI) for such a project typically falls below three years. In other words, migrating to AEM Cloud is a worthwhile investment.

Your CMS is always up-to-date, ensuring you have access to the latest features.

With AEM as a Cloud Service, you can say goodbye to version upgrade projects.

Adobe automatically updates the CMS with the latest features, eliminating the concept of versions. It operates like any other Software as a Service, ensuring you are always working with the most current version.

It’s more secure

Security is a primary concern for large enterprises, and AEM as a Cloud Service could offer enhanced security compared to your current setup.

The solution is continuously monitored, and regular patches are applied promptly whenever a security issue is detected.

Read this document about Adobe Cloud Service Security Overview for more details.

99.9% Uptime

With AEM Cloud, your website will always be online. This solution can efficiently scale horizontally and vertically to consistently maintain this high level of service, effectively managing even the most intensive traffic loads.

What are the main benefits of Adobe Experience Manager as a Cloud Service?

No Learning Curve

One significant advantage of transitioning to AEM Cloud is that your marketing team will find the tool familiar.

Despite significant changes in architecture, release processes, and operations, the end-user experience remains unchanged.

Content editors won’t notice any differences following the migration if you use the latest on-premise version.

This means you won’t need to invest time and resources in managing this change or providing extensive training to your team.

Focus on Innovation and Achieve a Faster Time to Market

Managing the operation of an Enterprise CMS is a practice rooted in the past. It’s time for your organization to embrace this new reality.

With AEM Cloud, you can accelerate innovation for several reasons:

  • Your workforce can be fully dedicated to projects that create value.
  • You gain access to the latest innovations from Adobe.

Thanks to our extensive experience with AEM Cloud Service and collaboration with multiple clients, we have witnessed a significantly improved time to market. Projects are completed swiftly, and new websites can be launched within months.

When your company has a new product or service to showcase, you’ll reap the benefits of working with this new generation CMS.

Moving from AEM On-Premise to AEM as a Cloud Service step-by-step

This section will guide you through migrating from AEM On-Premise to AEM as a Cloud Service.

Each step is carefully designed to ensure a smooth and successful transition to the Cloud, covering critical aspects from initial analysis to going live.

AEM On Premise to Cloud Migration Project Steps

Step 1 – Analyze, Plan, and Estimate the Effort

The initial step in this journey is to understand AEM as a Cloud Service and the associated changes and deprecated features.

Some noteworthy changes include:

  • Architecture changes with automatic horizontal scaling
  • Project code structure
  • Asset storage
  • Built-in CDN
  • Dispatcher configuration
  • Network and API connections, including IP whitelisting
  • DNS & SSL certificate configuration
  • CI/CD pipelines
  • AEM author access with Adobe account
  • User groups & permissions

Additionally, it’s crucial to evaluate your current AEM installation, particularly in terms of connections and integrations with other services:

  • APIs or endpoints within the internal network
  • Third-party services, especially those protected by IP whitelisting
  • Any data import services to AEM
  • Login with closed user group (CUG)

These elements should be carefully reviewed, as some adjustments may be necessary.

Another critical aspect is effective communication with current stakeholders, partners, and the Adobe team. Onboarding these parties from the project’s outset is essential, with clear task assignments and timeframes.

For example, you will later discover that the involvement of your internal IT team is required. Informing them in advance is crucial to prevent project delays.

Furthermore, it’s essential to review your licensing agreements with Adobe and ensure that you have the appropriate subscriptions for AEM as a Cloud Service.

While this initial step may only take a few days, it is vital in assessing critical aspects of your installation, defining the project plan and effort, and sharing this information with key stakeholders.

Step 2 – Prepare the code for AEM as a Cloud Service

This step aims to ensure your current AEM installation and its code base are ready for the Cloud while remaining compatible with your existing on-premise instances.

While we won’t go deep into all the structural changes required for AEM Cloud in this article, we’ll provide an overview to keep it easily digestible for all readers.

Adobe offers a helpful tool called the Adobe Best Practices Analyzer designed to evaluate your current AEM implementation and offer guidance on improvements to align with best practices and Adobe standards.

The report generated by this tool covers:

  • Application functionality in need of refactoring.
  • Repository items that should be relocated to supported locations.
  • Legacy user interface dialogs and components that require modernization.
  • Deployment and configuration issues.
  • AEM 6.x features that have been replaced by new functionality or are currently unsupported on AEM as a Cloud Service.

It’s important to note that an AEM expert should review the Best Practices Analyzer report, as the tool cannot fully comprehend the entire codebase and its implications.

Following the assessment, an AEM architect or developer can restructure the codebase and apply new practices per the latest AEM Archetype.

A recommended practice is further refactoring and reviewing outdated features from your current codebase.

Since comprehensive testing of the entire website and application will be necessary later on, taking the opportunity to eliminate technical debt and establish a more robust foundation is advantageous.

Step 3 – Prepare AEM Cloud Environments

This step aims to prepare the cloud environment and set up AEM Cloud Manager, the backbone of AEM as a Cloud Service. Importantly, this step can be conducted concurrently with the previous one.

Adobe Cloud Manager offers a user-friendly interface that simplifies configuring environments, setting up pipelines, and configuring certificates, DNS, and other essential services.

Step 4 – Migrate Your Projects and Code to AEM Cloud

By this stage, your code has been refactored, and any changes incompatible with the on-premise setup have been implemented and migrated to make it cloud-ready.

Additionally, all necessary environments (test, staging, production) have been appropriately configured and are ready to host your code.

This step is relatively straightforward and involves pushing your code to the Cloud Git repository. During this phase and until the go-live, it is advisable to enforce a feature freeze.

However, if you cannot afford to freeze features in your production environment or if critical changes must be applied to your on-premise installation, it is feasible to backport the code to the Cloud later.

At One Inside, we have experience handling such situations, but it’s essential to understand that a code freeze can help mitigate the risks of project delays and increased complexity.

Step 5 – Validate Integration with Core Services or External APIs

Chances are, your website relies on data from third-party services or internal applications.

To ensure seamless integration with these services, specific network configurations must be carried out using the Cloud Manager.

Furthermore, AEM as a Cloud Service offers a static IP address that must be whitelisted on your end to enable connectivity with your on-premise applications.

This step is crucial for establishing a secure and uninterrupted connection between your AEM Cloud environment and your core services or external APIs.

Step 6 – Integrate Adobe Target, Adobe Analytics, and the Adobe Experience Cloud Suite

Since you are already utilizing AEM for your websites, it’s probable that you also rely on other solutions within the Adobe Experience Cloud suite, including Adobe Analytics and Adobe Target.

The integration of these solutions is typically straightforward, and they should seamlessly operate within your web pages.

Your existing usage of AEM makes it easier to extend the integration to other Adobe Experience Cloud components, enhancing your ability to analyze and optimize your digital experiences.

Step 7 – Migrate Content

Content migration is an important step, but it doesn’t have to be overly concerning. The structure of the content between your on-premise website and the newly created AEM Cloud website remains the same.

To make this process sound less daunting, you can think of it as a content move, similar to transferring content from your staging environment to the production environment.

Additionally, Adobe offers various tools to streamline this task, such as the Content Transfer Tool, which is specially designed for migrating existing content from AEM On-Premise to AEM Cloud, and the Package Manager, which facilitates the import and export of repository content.

When we refer to content migration, it encompasses more than just pages; it includes all content within your repository, including:

  • Page content
  • Assets
  • User and group data

Furthermore, since you may continue to create content on your production site while performing the migration, the tool supports a differential content top-up.

You can transfer only the changes made since the last content transfer, ensuring an efficient and up-to-date transition.

Step 8 – Test, Test, Test

We are approaching the final stages of the migration journey. Although some testing has occurred throughout the various steps, it’s now time for a comprehensive User Acceptance Testing (UAT) session.

Your dedicated testing team and business users should actively participate in this critical phase. It’s essential to have a detailed test strategy in place before commencing UAT.

Including authors in the testing process serves multiple purposes.

Not only does it expedite their familiarity with the new environment, but they are also the individuals most acquainted with how the components should function.

Their input, knowledge, and support are pivotal in ensuring your digital presence remains clear and distinctive.

Conducting thorough testing ensures your migration to AEM Cloud is successful, and your website operates seamlessly in its new environment.

Step 9 – Redirect Domains

This is the final step before going live, and it’s the point where your IT network team plays a key role.

They will manage certificates, DNS configurations, and domain redirection.

As emphasized at the beginning of this guide, it’s crucial that your IT stakeholders were informed from day one of this project about these critical milestones, and tasks were allocated accordingly.

They should be well-prepared and aware of what needs to be done, as preparations for this phase have been ongoing for several weeks.

Effective coordination in this step is essential to prevent delays in the overall process and the go-live date.

By ensuring a smooth domain redirection, your website seamlessly transitions to its new AEM Cloud environment.

Step 10 – Go Live

This step might seem the most stressful, but paradoxically, it’s also the simplest.

Your website has undergone extensive testing, and everything functions seamlessly in the cloud environment. It’s time for the final transition, shifting from your AEM On-Premise instance to the AEM Cloud instance.

The switch will be seamless for your end-users, and they won’t experience any interruptions in service. With careful planning and execution, this step should mark the successful culmination of your migration to AEM Cloud.

“The migration to AEM Cloud is a source of great satisfaction for both the business and IT stakeholders to see the website actively running in the Cloud, moving into a new era of better performance and exciting possibilities to enhance the customer experience. ”
Martyna Wilczynska

Project Manager at One Inside – A VASS Company

Step 11 – Train your Team

Your editors won’t require specific training as the admin interface remains the same.

However, it’s important to note that a new essential tool, Adobe Cloud Manager, has been introduced.

Your IT or DevOps teams should manage this tool, or you can delegate site maintenance to your Adobe Partner.

Our AEM experts can offer training to ensure your IT team possesses the necessary skills and knowledge to handle critical tasks related to SSL Certificates, domain linking, whitelisting, and account management.

Step 12 – Decommission the On-Premise Instance

As a final recommendation, keeping your on-premise server running for 2 to 4 weeks after the migration is advisable.

This precaution provides a safety net in case of any critical situations where you might need to switch back to the on-premise instance.

While, based on our experience, such a reversal is rarely necessary, it’s prudent to manage this potential risk.

Once the hyper-care phase is concluded, you can confidently shift your entire focus to your new AEM as a Cloud Service instance, knowing you have a contingency plan in place if needed.

Lessons Learned from AEM as a Cloud Service Migration Projects and Best Practices

After several successful migrations to AEM as a Cloud Service, our team has gathered excellent knowledge, and we would like to share some best practices that will help you mitigate the risk in this project.

Start with a Thorough Analysis

Begin your Cloud Migration project with a comprehensive analysis. Avoid rushing the assessment of your current AEM On-Premise setup. It’s crucial to evaluate dependencies and elements that require refactoring carefully.

If this is your first migration, invest time in research and documentation for a project of this nature.

Even if you have an internal team handling AEM, consider seeking support from an experienced Adobe Partner. Their expertise can prove invaluable in ensuring a successful migration.

Manage Stakeholders’ Dependencies

Taking care of stakeholders’ dependencies early in the project is crucial. Multiple members of your organization will play pivotal roles at significant project milestones.

We’ve already mentioned the IT team’s role in managing the network, but other groups may be involved, such as security and quality assurance.

At the project’s start, it’s essential to communicate your expectations clearly with these teams and provide them with precise dates for their involvement.

This proactive approach helps prevent delays and ensures a smooth progression of the project.

Not your typical Scrum project

What may come as a surprise is that a Cloud Migration project does not fully correspond to your typical Scrum-managed IT project.

In the regular framework, we focus on delivering the highest presentable value in the shortest amount of time, and we present our solutions to the clients, constantly asking for feedback.

An AEM Cloud Migration project primarily involves refactoring the backend code, which may not be presentable to the stakeholders until the website is in the acceptance environment in the Cloud and ready for testing.

Regular Team and Stakeholder Meetings

As the three-month timeline swiftly progresses, staying in sync with your team and key stakeholders is essential.

We highly recommend establishing a weekly update routine to track progress, identify and address risks, and implement mitigation plans.

During these weekly reviews, pay particular attention to dependencies with other teams and assess the advancement of their activities. This proactive approach ensures everyone is aligned and swiftly responds to evolving project needs.

“Clear communication with clients is key to risk mitigation, issue identification, and progress updates during the migration. It alleviates client stress and ensures transparency in their digital journey.”
Michael Kleger

Project Manager at One Inside – A VASS Company

Relationship with Adobe

License negotiations must be completed to gain access to the cloud environment.

Equally important is negotiating with your Adobe account manager to keep a standby on-premise server for a specified period as a fallback.

From our experience, initiating such conversations as early as possible allows for negotiating the most advantageous and flexible transition away from the on-premise infrastructure.

Furthermore, in the event of unexpected issues, you may require support from Adobe’s team. It’s possible that certain features may not function properly when refactored for the Cloud.

To expedite the response time of Adobe Support, it is essential to collaborate with an Adobe Partner who maintains a strong relationship with the Adobe team.

For instance, at One Inside, we have cultivated a partnership with Adobe spanning over a decade, and our office is located within 30km of the AEM team responsible for building AEM as a Cloud Service.

This close relationship can be invaluable in certain situations. Over the years, we have developed a robust relationship with Adobe as a company and its talented individuals.

This gives us an advantage in problem-solving, as we possess intimate knowledge of whom to contact without navigating multiple support levels.

Avoid Developing on the On-Premise Instance During Migration

Avoid introducing new developments to your live websites whenever possible while the migration progresses. This practice helps prevent numerous issues.

However, we acknowledge that implementing a three-month code freeze is often impractical.

To mitigate potential problems, ensure that the code on both environments is synchronized and optimized for the Cloud before making any further enhancements to your on-premise branch.

This alignment minimizes complications during the migration process.

Leverage the Opportunity to Enhance Design Flaws

During the migration process, you’ll have the opportunity to test your entire website thoroughly.

Seize this moment to enhance various aspects of your site, including architecture, code refactoring, and minor design adjustments.

In our migration projects, we’ve successfully incorporated improvements such as image rendition generation, frontend enhancements, and optimizations related to performance and caching.

This migration window allows you to transition to the Cloud and enhance your website’s overall quality and functionality.

Key Takeaways for Your AEM as a Cloud Service Migration

In conclusion, migrating to AEM as a Cloud Service is a transformative journey that requires careful planning and execution.

AEM Cloud Service is the future of AEM and this migration sets the foundation.

Throughout this article, we’ve shared valuable insights and best practices from successful AEM Cloud migrations. From analyzing dependencies to fostering solid relationships with Adobe, from weekly team updates to optimizing design flaws, these lessons can guide you toward a successful migration.

Embrace the challenges and opportunities of transitioning to the Cloud, and remember that a well-executed migration can lead to a more efficient, secure, and innovative digital experience for your organization and its users.

With the right approach and the support of experienced partners, you can confidently navigate this journey and deliver excellent results.

We would like to express our gratitude to the talented individuals within our company who contributed to this article, including Martyna Wilczynska, Basil Kohler, Michael Kleger and Samuel Schmitt.

Samuel Schmitt

Digital Solution Expert

by Samuel Schmitt at January 23, 2024 10:37 AM

January 12, 2024

Things on a content management system - Jörg Hoh

Sling Model Exporter & exposing ResourceResolver information

Welcome to 2024. I will start this new year with a small piece of advice regarding Sling Models, which I hope you can implement very easily on your side.

The Sling Model Exporter is based on the Jackson framework, and it can serialize an object graph, with the root being the requested Sling Model. For that it recursively serializes all public & protected members and return values of all simple getters. Properly modeled this works quite well, but small errors can have large consequences. While missing data is often quite obvious (if the JSON powers an SPA, you will find it not properly working), too much data being serialized is spotted less frequently (normally not at all).

I am currently exploring options to improve performance, and I am a big fan of the ResourceResolver.getPropertyMap() API to implement a per-resourceresolver cache. While testing such a potential improvement I found customer code in which the ResourceResolver is serialized via the Sling Model Exporter into JSON. In that case the code looked like this:

import javax.annotation.PostConstruct;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

@Model(adaptables = Resource.class)
public class MyModel {

  @Self
  Resource resource;

  // this resolver ends up in the exported JSON
  ResourceResolver resolver;

  @PostConstruct
  public void init() {
    resolver = resource.getResourceResolver();
  }
}

(see this good overview at Baeldung of the default serialization rules of Jackson.)

And that’s bad in 2 different aspects:

  • Security: The serialized ResourceResolver object contains, in addition to the data returned by the public getters (e.g. the search paths, the userId and potentially other interesting data), also the complete propertyMap. And this serialized cache is probably nothing you want to expose to the consumers of this JSON.
  • Exceptions: If the getPropertyMap() cache contains instances of classes which are not publicly exposed (that means these class definitions are hidden within some implementation packages), you will encounter ClassNotFound exceptions during serialization, which will break the export. And instead of a JSON you get an internal server error or a partially serialized object graph.

In short: It is not a good idea to serialize a ResourceResolver. And honestly, I have not found a reason why this should be possible at all. So right now I am a bit hesitant to use the propertyMap as cache, especially in contexts where the Sling Model Exporter might be used. And that blocks me from working on some interesting performance improvements 😦
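If you need the resolver inside the model, a safer variant (just a sketch based on the class above) is to keep it out of any field or public getter, so the exporter has nothing to pick up:

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

@Model(adaptables = Resource.class)
public class MyModel {

  @Self
  private Resource resource;

  // no field and no public getter for the resolver,
  // so the Sling Model Exporter cannot serialize it
  private ResourceResolver resolver() {
    return resource.getResourceResolver();
  }
}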

To unblock this situation, we have introduced a 2 step mechanism, which should help to overcome this situation:

  1. In the latest AEM as a Cloud Service release 14697 (both in the cloud as well as in the SDK) a new WARN message has been added when your Model definition causes a ResourceResolver to be serialized. Search the logs for this message: “org.apache.sling.models.jacksonexporter.impl.JacksonExporter A ResourceResolver is serialized with all its private fields containing implementation details you should not disclose. Please review your Sling Model implementation(s) and remove all public accessors to a ResourceResolver.”
    The log entry also contains a reference to the request path where this is happening, so it should be easy to identify the Sling Model class which triggers this serialization and to change that piece of code so the ResourceResolver is not serialized anymore. Note that this message is just a warning; the behavior remains unchanged.
  2. As a second measure, functionality has been implemented which allows blocking the serialization of a ResourceResolver via the Sling Model Exporter completely. Enabling this is a breaking change for all AEM as a Cloud Service customers (even if I am 99.999% sure that it won’t break any functionality), and for that reason we cannot enable this change on the spot. But at some point this step is necessary to guarantee that the 2 problems listed above can never happen.

Right now the first step is enabled, and you will see this log message. If you see this log message, I encourage you to adapt your code (the core components should be safe) so ResourceResolvers are no longer serialized.

In parallel we need to implement step 2; right now the planning is not done yet, but I hope to activate step 2 some time later in 2024 (not before the middle of the year). Before this is done, there will be formal announcements in the AEM release notes. And I hope that with this blog post and the release notes all customers will have adapted their implementation, so that setting this switch will not change anything.

Update (January 19, 2024): There is now a piece of official AEM documentation covering this situation as well.

by Jörg at January 12, 2024 06:23 PM

December 16, 2023

Things on a content management system - Jörg Hoh

A review of 2023

It’s again December, and so time to review a bit my activities of 2023 in this blog.

I have to admit, I am not a reliable writer, as I write very infrequently. And it’s not because of a lack of time, but rather because I rarely find content which (in my opinion) is worth writing about. I don’t have large topics which I split up into a series of posts. If you ever saw a smaller series of posts, that mostly happened by accident. I was just working on aspects of the system, which at some point I wrote about and afterwards started to understand more. That was the default for the last 15 years … (OMG, am I really blogging here for that long already? Probably, the first post “Why use the dispatcher?” went live on December 22, 2008. So this is the 15th anniversary.)

I started 2023 with 4 posts till the end of September:

But something changed in October. I had already prepared 2 postings, but I started to come up with more topics within days; it ended with 6 blog posts in October and November, which is quite a pace for this blog:

It felt incredible to be able to announce every few days a new blog post. I don’t think that I can keep that frequency, but I will try to write more often in 2024. I just noted a few topics for the next posts already, stay tuned 🙂

Also, if you are reading this blog because you found the link to it somewhere, but you are interested in the topics I write about: You can get notified of new posts immediately by providing me (well, WordPress) your email address (you should see it on the right rail of this page). Alternatively if you are old-style, you can also subscribe to the RSS Feed of this blog, which also contains the full content of the postings. That might be interesting for you, as I normally reference new posts on other channels with some delay, and sometimes I even skip it completely (or simply forget).

Thanks for your attention, and I wish you all a successful and happy 2024.

by Jörg at December 16, 2023 06:03 PM

November 25, 2023

Things on a content management system - Jörg Hoh

Thoughts on performance testing on AEM CS

Performance is an interesting topic on its own, and I already wrote a bit about it in this blog (see the overview). I have not written yet about performance testing in the context of AEM CS. It’s not that it is fundamentally different, but there are some specifics, which you should be aware of.

  • Perform your performance tests on the Stage environment. The Stage environment is kept at the same sizing as the production environment, so it should deliver the same behavior as your PROD environment, if you have the same content and your test is realistic.
  • Use a warmup phase. As the Stage environment is normally downscaled to the minimum (because there is no regular traffic), it can take a bit of time until it has (automatically) upscaled to the same number of instances your PROD is normally operating with. That means that your test should have a proper warmup period, during which you increase the traffic to the normal 100% level of production. This warmup phase should take at least 30 minutes.
  • I think that any test should take at least 60-90 minutes (including warmup); even if you see early that the result is not what you expect it to be, there is often something to learn even from such incorrect/faulty situations. I had the case that a customer was constantly terminating the test after about 20-25 minutes, claiming that something was not working server-side as they expected it to. Unfortunately the situation had not yet settled, so I was not able to get any useful information from the system.
  • AEM CS comes with a CDN bundled to the environment, and that’s the case also for the Stage environment. But that also means that your performance test should contain all requests, which you expect to be delivered from the CDN. This is important because it can show if your caching is working as intended. Also only then you can assess the impact of the cache misses (when files expire on the CDN) on the overall performance.
  • While you are at it, you can run a stage pipeline during the performance test and deploy new code. You should not see any significant change in performance during that time.
  • Oh yes, also do some content activations during that time. That makes your test much more realistic and also reveals potential performance problems when updating content (e.g. because you constantly invalidate the complete dispatcher cache).
  • You should focus on a large content set when you do the performance test. If you only test a handful of pages/assets/files, you are mostly testing caches (at all levels).
  • “Campaign traffic” is rarely tested. This is traffic which has some query strings attached (e.g. “utm_source”, “gclid” and such) to support traffic attribution. These parameters are ignored while rendering, but they often bypass all caching layers, hitting AEM. And while a regular performance test only tests without these parameters, if your marketing department runs a Facebook campaign, the traffic from that campaign looks much different, and then the results of your performance tests are not valid anymore.

Some words as precaution:

  • A performance test can look like a DOS, and your requests can get blocked for that reason. This can happen especially if these requests are originating from a single source IP. For that reason you should distribute your load injector and use multiple source IP addresses. In case you still get blocked, please contact support so we can adapt accordingly.
  • AEM CS uses an affinity cookie to indicate that requests of a user-agent are handled by a specific backend system. If you use the same affinity cookie throughout all your performance tests, you just test a single backend system; and that effectively disables any loadbalancing and renders the performance test results unusable. Make sure that you design your performance tests with that in mind.

In general I much prefer helping you during the performance testing phase over handling escalations because of bad performance and potential outages. I hope that you think the same way.

by Jörg at November 25, 2023 11:45 AM

November 19, 2023

Things on a content management system - Jörg Hoh

If you have curl, every problem looks like a request

If you are working in IT (or as a craftsman) you should know the saying: “When you have a hammer, every problem looks like a nail”. It describes the tendency of people that, if they have a tool which helps them to reliably solve a specific problem, they will try to use this tool on every other problem, even if it does not fit at all.

Sometimes I see this pattern in AEM as well, but not with a hammer, but with “curl”. Curl is a commandline HTTP client, and it’s quite easy to fire a request against AEM and do something with the output of it. It’s something every AEM developer should be familiar with, also because it’s a great tool to automate things. And if you talk about “automating AEM”, the first thing people often come up with is “curl”…

And there the problem starts: Not every problem can be automated with curl. For example take a periodic data export from AEM. The immediate reaction of most developers (forgive me if I generalize here, but I have seen this pattern too often!) is to write a servlet to pull all this data together, create a CSV list and then use curl to request this servlet every day/week.

Works great, doesn’t it? Good, mark that task as done, next!

Wait a second, on prod it takes 2 minutes to create that list. Well, not a problem, right? Until it takes 20 minutes, because the number of assets is growing. And until you move to AEM CS, where the timeout for requests is 60 seconds, and your curl is terminated with a status code 503.

So what is the problem? It is not the timeout of 60 seconds; and it’s also not the constantly increasing number of assets. It’s the fact that this is a batch operation, and you use a communication pattern (request/response) which is not well suited for batch operations. It’s the fact that you start with curl in mind (a tool which is built for the request/response pattern) and therefore you build the implementation around this pattern. You have curl, so every problem is solved with a request.

What are the limits of this request/response pattern? Definitely the runtime is a limit, and actually for 3 reasons:

  • The timeout for requests on AEM CS (or basically any other loadbalancer) is set for security reasons and to prevent misuse. Of course the limit of 60 seconds in AEM CS is a bit arbitrary, but personally I would not wait 60 seconds for a webpage to start rendering. So it’s as good as any higher number.
  • There is another limit, which is determined by the availability of the backend system which is actually processing this request. In a highly available and autoscaling environment, systems start and stop in an automated fashion, managed by a control plane which operates on a set of rules. And these rules can enforce that any (AEM) system will be forced to shut down at most 10 minutes after it has stopped receiving new requests. That means that a request which constantly takes 30+ minutes might be terminated without finishing successfully. And it’s unclear if your curl would even realize it (especially if you are streaming the results).
  • (And technically you can also add that the whole network connection needs to be kept open for that long, and AEM CS itself is just a single factor in there. Also the internet is not always stable; you can experience network hiccups at any point in time. This is normally just well hidden by retrying failing requests. Which is not an option here, because it won’t solve the problem at all.)

In short: If your task can take long (say: 60+ seconds), then a request is not necessarily the best option to implement it.

So, what options do you have then? Well, the following approach works also in AEM CS:

  1. Use a request to create and initiate your task (let’s call it a “job”);
  2. And then poll the system until this job is completed, then return the result.

This is an asynchronous pattern, and it’s much more scalable when it comes to the amount of processing you can do in there.

Of course you cannot use a single curl command anymore; now you need to write a program to execute this logic (don’t write it in a shell script, please!). But on the AEM side you can now use either Sling Jobs or AEM workflows to perform the operation.
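As a rough sketch of that pattern (all paths, topic names and class names here are made up for illustration), the initiating request and the asynchronous job could look like this when using Sling Jobs:

// --- StartExportServlet.java: the request only *initiates* the job ---
import java.io.IOException;
import java.util.Collections;

import javax.servlet.Servlet;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingAllMethodsServlet;
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

@Component(service = Servlet.class, property = {
    "sling.servlet.paths=/bin/example/export",
    "sling.servlet.methods=POST"
})
public class StartExportServlet extends SlingAllMethodsServlet {

  @Reference
  private transient JobManager jobManager;

  @Override
  protected void doPost(SlingHttpServletRequest request, SlingHttpServletResponse response)
      throws IOException {
    Job job = jobManager.addJob("com/example/jobs/csvexport",
        Collections.singletonMap("root", "/content/dam"));
    response.setStatus(202); // Accepted: the client polls for the result later
    response.getWriter().write(job.getId());
  }
}

// --- CsvExportJobConsumer.java: the batch work runs asynchronously ---
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.consumer.JobConsumer;
import org.osgi.service.component.annotations.Component;

@Component(service = JobConsumer.class,
    property = { JobConsumer.PROPERTY_TOPICS + "=com/example/jobs/csvexport" })
public class CsvExportJobConsumer implements JobConsumer {

  @Override
  public JobResult process(Job job) {
    // walk the repository, build the CSV and store the result, e.g. below /var,
    // where the polling request can pick it up
    return JobResult.OK; // or JobResult.FAILED / JobResult.CANCEL
  }
}

The client then polls a second endpoint (or checks the stored result) using the returned job id, instead of keeping a single HTTP request open for the whole runtime.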

But this approach avoids the restriction of 60 seconds, and it can handle restarts of AEM transparently, at least on the author side. And you have the huge benefit that you can collect all your errors during the runtime of this job and decide afterwards if the execution was a success or a failure (which you cannot do in HTTP).

So when you have long-running operations, check if you need to do them within a request. In many cases it’s not required, and then please switch gears to some asynchronous pattern. And that’s something you can do even before the situation starts to get a problem.

by Jörg at November 19, 2023 04:32 PM

November 09, 2023

Things on a content management system - Jörg Hoh

Identify repository access

Performance tuning in AEM is typically a tough job. The most obvious and widely known aspect is the tuning of JCR queries, but that’s about it; if your code is not doing any JCR query and is still slow, it gets hard. For requests my standard approach is to use “Recent requests” and identify slow components, but that’s it. And then you have threaddumps, but these hardly help here. There is no standard way to diagnose further without relying on gut feeling and luck.

When I had to optimize a request last year, I thought again about this problem. And I asked myself the question:
Whenever I check this request in the threaddumps, I see the code accessing the repository. Why is this the case? Is the repository slow or is it just accessing the repository very frequently?

The available tools cannot answer this question. So I had to write myself something which can do that. In the end I committed it to the Sling codebase with SLING-11654.

The result is an additional logger (“org.apache.sling.jcr.resource.AccessLogger.operation” on loglevel TRACE) which you can enable and which logs every single (Sling) repository access, including the operation, the path and the full stacktrace. That is a huge amount of data, but it answered my question quite thoroughly.

  • The repository itself is very fast, because a request (taking 500ms in my local setup) performs 10,000 repository accesses. So the problem is rather the total number of repository accesses.
  • Looking at the list of accessed resources it became very obvious that there is a huge amount of redundant access. For example, these are the top 10 accessed paths while rendering a simple WKND page (/content/wknd/language-masters/en/adventures/beervana-portland):
    • 1017 /conf/wknd/settings/wcm/templates/adventure-page-template/structure
    • 263 /
    • 237 /conf/wknd/settings/wcm/templates
    • 237 /conf/wknd/settings/wcm
    • 227 /content
    • 204 /content/wknd/language-masters/en
    • 199 /content/wknd
    • 194 /content/wknd/language-masters/en/adventures/beervana-portland/jcr:content
    • 192 /content/wknd/jcr:content
    • 186 /conf/wknd/settings

But now with that logger, I was able to identify access patterns and map them to code. And suddenly you see a much bigger picture, and you can spot a lot of redundant repository access.

With that help I identified the bottleneck in the code, and the posting “Sling Model performance” was the direct result of this finding. Another result was the topic for my talk at AdaptTo() 2023; checkout the recording for more numbers, details and learnings.

But with these experiences I made an important observation: You can use the number of repository access as a proxy metric for performance. The more repository access you do, the slower your application will get. So you don’t need to rely so much on performance tests anymore (although they definitely have their value), but you can validate changes in the code by counting the number of repository access performed by it. Less repository access is always more performant, no matter the environmental conditions.

And with an additional logger (“org.apache.sling.jcr.resource.AccessLogger.statistics” on TRACE) you can get just the raw numbers without the details, so you can easily validate any improvement.

Equipped with that knowledge you should be able to investigate the performance of your application on your local machine. Looking forward to the results 🙂

(This is currently only available on AEM CS / AEM CS SDK, I will see to get it into an upcoming AEM 6.5 servicepack.)

by Jörg at November 09, 2023 01:28 PM

November 04, 2023

Things on a content management system - Jörg Hoh

The Explain Query tool

When there’s a topic which has been challenging forever in the AEM world, then it’s JCR queries and indexes. It can feel like an arcane science, where it’s quite easy to mess up and end up with a slow query. I learned it also the hard way, and a printout of the JCR query cheatsheet is always below my keyboard.

But there were some recent changes, which made the work with query performance easier. First, in AEM CS the Explain Query tool has been added, which is also available via the AEM Developer Console. It displays queries, slow queries, number of rows read, the used index, execution plan etc. But even with that tool alone it’s still hard to understand what makes a query performant or slow.

Last week there was a larger update to the AEM documentation (thanks a lot, Tom!), which added a detailed explanation of the Explain Query tool. Especially it drills down into the details of the query execution plan and how to interpret it.

With this information and the good examples given there you should be able to analyze the query plan of your queries and optimize the indexes and queries before you execute them the first time in production.

by Jörg at November 04, 2023 06:22 PM

October 16, 2023

Things on a content management system - Jörg Hoh

3 rules how to use an HttpClient in AEM

Many AEM applications consume data from other systems, and in the last decade the protocol of choice turned out to be HTTP(S). And there are a number of very mature HTTP clients out there which can be used together with AEM. The most frequently used variant is the Apache HttpClient, which is shipped with AEM.

But although the HttpClient is quite easy to use, I came across a number of problems, many of which resulted in service outages. In this post I want to list the 3 biggest mistakes you can make when you use the Apache HttpClient. While I observed the results in AEM as a Cloud Service, the underlying effects are the same on-prem and in AMS; only the resulting symptoms can be a bit different.

Reuse the HttpClient instance

I often see that a HttpClient instance is created for a single HTTP request, and in many cases it’s not even closed properly afterwards. This can lead to these consequences:

  • If you don’t close the HttpClient instance properly, the underlying network connection(s) will not be closed properly, but eventually time out. And until then the network connections stay open. If you are using a proxy with a connection limit (many proxies have one), this proxy can reject new requests.
  • If you re-create a HttpClient for every request, the underlying network connection will get re-established every time with the latency of the 3-way handshake.

The reuse of the HttpClient object and its state is also recommended by its documentation.

The best way to make that happen is to wrap the HttpClient into an OSGI service, create it on activation and stop it when the service is deactivated.
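A minimal sketch of such a wrapper could look like this (the service name is made up, and the timeout values just follow the recommendation below):

import java.io.IOException;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Deactivate;

@Component(service = BackendClient.class)
public class BackendClient {

  private CloseableHttpClient httpClient;

  @Activate
  protected void activate() {
    RequestConfig config = RequestConfig.custom()
        .setConnectTimeout(1000)            // connect timeout in ms
        .setSocketTimeout(1000)             // read timeout in ms
        .setConnectionRequestTimeout(1000)  // time to wait for a pooled connection
        .build();
    httpClient = HttpClients.custom()
        .setDefaultRequestConfig(config)
        .useSystemProperties()              // also honors proxy-related system properties
        .build();
  }

  @Deactivate
  protected void deactivate() throws IOException {
    if (httpClient != null) {
      httpClient.close();                   // releases the pooled connections
    }
  }

  public CloseableHttpClient getClient() {
    return httpClient;
  }
}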

Set aggressive connection and read timeouts

Especially when an outbound HTTP request is executed within the context of an AEM request, performance really matters. Every millisecond spent in that external call makes the AEM request slower. This increases the risk of exhausting the Jetty thread pool, which then leads to non-availability of that instance, because it cannot accept any new requests. I have often seen AEM CS outages because a backend was responding slowly or not at all. All requests should finish quickly, and in case of errors they must also return fast.

That means that timeouts should not exceed 2 seconds (personally I would prefer even 1 second). And if your backend cannot respond that fast, you should reconsider its fitness for interactive traffic and try not to connect to it in a synchronous request.

Implement a degraded mode

When your backend application responds slowly, returns errors or is not available at all, your AEM application should react accordingly. I have seen a number of times that a problem on the backend had an immediate effect on the AEM application, often resulting in downtimes, because either the application was not able to handle the results of the HttpClient (so the response rendering failed with an exception), or because the Jetty threadpool was totally consumed by those requests.

Instead your AEM application should be able to fallback into a degraded mode, which allows you to display at least a message, that something is not working. In the best case the rest of the site continues to work as usual.
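A simple sketch of such a degraded mode (the URL and the fallback handling are made up for illustration) could look like this:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PriceService {

  private static final Logger LOG = LoggerFactory.getLogger(PriceService.class);

  private final CloseableHttpClient httpClient;

  public PriceService(CloseableHttpClient httpClient) {
    this.httpClient = httpClient;
  }

  /** Returns the price payload, or an empty string as the "degraded mode" marker. */
  public String fetchPrice(String sku) {
    try (CloseableHttpResponse response =
        httpClient.execute(new HttpGet("https://backend.example.com/prices/" + sku))) {
      if (response.getStatusLine().getStatusCode() == 200) {
        return EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
      }
      LOG.warn("Backend returned status {}", response.getStatusLine().getStatusCode());
    } catch (IOException e) {
      LOG.warn("Backend call failed, rendering in degraded mode", e);
    }
    return ""; // the component renders a "currently not available" hint instead
  }
}

The component rendering the price can then show that hint while the rest of the page continues to work as usual.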

If you implement these 3 rules when you do your backend connections, and especially if you test the degraded mode, your AEM application will be much more resilient when it comes to network or backend hiccups, resulting in fewer service outages. And isn’t that something we all want?

by Jörg at October 16, 2023 01:43 PM

October 14, 2023

Things on a content management system - Jörg Hoh

Recap: AdaptTo 2023

It was adaptTo() time again, the first time in an in-person format since 2019. And it’s definitely much different from the virtual formats we experienced during the pandemic. More personal, and allowing me to get away from the daily work routine; I remember that in 2020 and 2021 I constantly had work-related topics (mostly Slack) on the other screen while I was attending the virtual conference. That’s definitely different when you are at the venue 🙂

And it was great to see all the people again. Many of the people which are part of the community for years, but also many new faces. Nice to see that the community can still attract new people, although I think that the golden time of the backend-heavy web-development is over. And that was reflected on stage as well, with Edge Delivery Services being quite a topic.

As in past years, the conference itself isn’t that large (this year maybe 200 attendees), and it gives you plenty of chances to get in touch and chat about projects, new features, bugs and everything else you can imagine. The location is nice, and Berlin gives you plenty of opportunities to go out for dinner. So while 3 days of conference can definitely be exhausting, I would have liked to have many more dinners with attendees.

I got the chance to come on stage again with one of my favorite topics: Performance improvement in AEM, a classic backend topic. According to the talk feedback, people liked it 🙂
Also, the folks of the adaptTo() recorded all the talks and you can find both the recording and the slide deck on the talk’s page.

The next call for papers is already announced (to start in February ’24), and I will definitely submit a talk again. Maybe you as well?

by Jörg at October 14, 2023 04:07 PM

July 13, 2023

Things on a content management system - Jörg Hoh

AEM CS & dedicated egress IP

Many customers of AEM as a Cloud Service are used to performing a first level of access control by allowing just a certain set of IP addresses to access a system. For that reason they want their AEM instances to use a static IP address or network range to access their backend systems. AEM CS supports this with the feature called “dedicated egress IP address“.

But when testing that feature there is often the feedback that it is not working, and that the incoming requests on the backend systems come from a different network range. This is expected, because this feature does not change the default routing for outgoing traffic of the AEM instances.

The documentation also says

Http or https traffic will go through a preconfigured proxy, provided they use standard Java system properties for proxy configurations.

The thing is that if traffic is supposed to use this dedicated egress IP, you have to explicitly make it use this proxy. This is important, because by default not all HTTP Clients do this.

For example, in the Apache HttpClient library 4.x the HttpClients.createDefault() method does not read the system properties related to proxying, but HttpClients.createSystem() does. The same applies to java.net.http.HttpClient, for which you need to configure the Builder to use a proxy. Also okhttp requires you to configure the proxy explicitly.
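To illustrate this with a small sketch (assuming the standard Java proxy system properties are set, as described above):

import java.net.ProxySelector;
import java.net.http.HttpClient;

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ProxyAwareClients {

  // Apache HttpClient 4.x: createSystem() picks up the standard proxy
  // system properties (http.proxyHost etc.), createDefault() does not.
  static final CloseableHttpClient APACHE_CLIENT = HttpClients.createSystem();

  // java.net.http.HttpClient: tell the builder explicitly to use the
  // default ProxySelector, which is driven by the same system properties.
  static final HttpClient JDK_CLIENT = HttpClient.newBuilder()
      .proxy(ProxySelector.getDefault())
      .build();
}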

So if requests from your AEM instance are coming from the wrong IP address, check that your code is actually using the configured proxy.

by Jörg at July 13, 2023 07:40 AM

July 02, 2023

Things on a content management system - Jörg Hoh

Sling Model Exporter: What is exported into the JSON?

Last week we came across a strange phenomenon, when the AEM release validation process broke in an unexpected situation. Which is indeed a good thing, because it covered an aspect I had never thought of.

The validation broke because during a request the serialization of a Sling Model failed with an exception. The short version: it tried to serialize a ResourceResolver(!) into JSON (more details in SLING-11924). Why would anyone serialize a ResourceResolver into JSON to be consumed by an SPA? I clearly believe that this was not done intentionally, but happened by accident. But nevertheless, it broke the improvement we intended to make, so we had to roll it back and wait for SLING-11924 to be implemented.

But it gives me the opportunity to explain which fields of a Sling Model are exported by the Sling Model Exporter. As it is backed by the Jackson data-bind framework, the same rules apply:

  • All public fields are serialized.
  • All publicly available getter methods which do not expect a parameter are serialized.

It is not too hard to check this, but there are a few subtle aspects to consider in the context of Sling Models.

  • Injections: make only those injections public which you want to be handled by the Sling Model Exporter. Make everything else private.
  • I often see Lombok used to create getters for Sling Models (because you need them for use in HTL). This is especially problematic when the @Getter annotation is placed at class level, because then a getter is created for every field (no matter its visibility), which is then picked up by the Sling Model Exporter. See the sketch below.
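A small sketch of an export-safe model (class and field names are made up) could look like this:

import lombok.Getter;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;
import org.apache.sling.models.annotations.injectorspecific.ValueMapValue;

// field-level @Getter instead of a class-level one: only the fields you
// explicitly mark get a public getter and therefore end up in the JSON
@Model(adaptables = Resource.class)
public class ExportSafeModel {

  @Self
  private Resource resource;   // private, no getter -> not exported

  @Getter
  @ValueMapValue
  private String title;        // public getTitle() -> exported
}

The same reasoning applies to a ResourceResolver stored in a field: as long as there is no public field and no (generated) public getter, the exporter will not serialize it.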

My call to action: Validate your Sling Models and check that you don’t export a ResourceResolver by accident. (If you are an AEM as a Cloud Service customer and affected by this problem, you will probably get an email from us, telling you to do exactly that.)

by Jörg at July 02, 2023 06:38 PM

January 12, 2023

Things on a content management system - Jörg Hoh

Sling models performance (part 3)

In the first and second part of this series “Sling Models performance” I covered aspects which can degrade the performance of your Sling Models, be it by not specifying the correct injector or by re-using complex models (with complex PostConstruct methods) for very simple cases.

And there is another aspect when it comes to performance degradation, and it starts with a very cool convenience function: Sling Models can create a whole tree of objects. Imagine this code as part of a Sling Model:

@ChildResource
AnotherModel child;

It will adapt the child-resource named “child” into the class “AnotherModel” and inject it. This nesting is a cool feature and can be a time-saver if you have a more complex resource structure to model your content.

But it also comes with a price, because it will create another Sling Model object; and even that Sling Model can trigger the creation of more Sling Models, and so on. And as I have outlined in my previous posts, the creation of these Sling Models does not come for free. So if your “main Sling Model” internally creates a whole tree of Sling Models, the required time will increase. That can be justified, but not if you just need a fraction of the data of these Sling Models. Is it worth spending 10 milliseconds to create a complex Sling Model just to call a simple getter on it, if you could retrieve this information alone in just 10 microseconds?

So this is a situation, where I need to repeat what I have written already in part 2:


When you build your Sling Models, try to resolve all data lazily, when it is requested the first time.

Sling Models performance (part 2)

But unfortunately, injectors do not work lazily but eagerly; injections are executed as part of the construction of the model. Having a lazy injection would be a cool feature …

So until this is available, you should check the re-use of Sling Models quite carefully; always consider how much work is actually done in the background, and whether the value of reusing that Sling Model is worth the time spent in rendering.
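Until then, a simple workaround is to resolve the child model lazily in a getter instead of injecting it. Here is a rough sketch (re-using the AnotherModel class from above; the child resource name is just an example):

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

@Model(adaptables = Resource.class)
public class ParentModel {

  @Self
  private Resource resource;

  private AnotherModel child;   // resolved on first access, not at injection time

  public AnotherModel getChild() {
    if (child == null) {
      Resource childResource = resource.getChild("child");
      if (childResource != null) {
        child = childResource.adaptTo(AnotherModel.class);
      }
    }
    return child;
  }
}

This way the cost of creating the nested model is only paid when its data is actually requested.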

by Jörg at January 12, 2023 04:38 PM

January 02, 2023

Things on a content management system - Jörg Hoh

The most expensive HTTP request

TL;DR: When you do a performance test for your application, also test a situation where you just fire large number of invalid requests; because you need to know if your error-handling is good enough to withstand this often unplanned load.

In my opinion the most expensive HTTP requests are the ones which return with a 404. Because they don’t bring any value, are not as easily cacheable as others and are very easy to generate. If you look into AEM logs, you will often find requests from random parties which fire a lot of requests, obviously trying to find vulnerable software. But in AEM these always fail, because there are no resources with these names, so a status code 404 is returned. This turns into a problem if these 404 pages are complex to render, taking 1 second or more. In that case requesting 1000 non-existing URLs can turn into a denial of service.

This can get even more complex if you work with suffixes, and the end user can just request the suffix, because you prepend the actual resource by mod_rewrite on the dispatcher. In such situations the requested resource is present (the page you configured), but the suffix can be invalid (for example pointing to a non-existing resource). Depending on the implementation you find out about this situation very late; and then you have already rendered a major part of the page just to find out that the suffix is invalid. This can also lead to a denial of service, but is much harder to mitigate than the plain 404 case.
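One way to mitigate the suffix case is to validate the suffix as early as possible and fall back to a cheap 404; a rough sketch (the class name is made up) could look like this:

import javax.annotation.PostConstruct;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;

@Model(adaptables = SlingHttpServletRequest.class)
public class SuffixValidationModel {

  @Self
  private SlingHttpServletRequest request;

  private Resource suffixResource;

  @PostConstruct
  protected void init() {
    // resolves the suffix against the repository; null if nothing exists there
    suffixResource = request.getRequestPathInfo().getSuffixResource();
  }

  public boolean isValid() {
    return suffixResource != null;
  }
}

The rendering script can then check isValid() first and short-circuit to a lightweight 404 response instead of rendering the full page.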

So what’s the best way to handle such situations? You should test for them explicitly. Build a simple performance test which just fires a few hundred requests triggering a 404, and observe the response time of the regular requests. It should not degrade! If you need to simplify your 404 pages, then do that! Many popular websites have very stripped-down 404 pages for just that reason.

And when you design your URLs you should always have in mind these robots, which just show up with (more or less) random strings.

by Jörg at January 02, 2023 01:56 PM

December 21, 2022

Things on a content management system - Jörg Hoh

AEM article review December 2022

I have been doing this blog for quite some time now (the first article in this blog dates back to December 2008! That was the time of CQ 5.0! OMG), and of course I am not the only one writing on AEM. Actually the number of articles which are produced every month is quite large, but I am often a bit disappointed, because many just reproduce some very basic aspects of AEM which can be found in many places. But the amount of new content which describes aspects that have barely been covered by other blog posts or the official product documentation is small.

For myself I try to focus on such topics, offer unique views on the product and provide recommendations how things can be done (better), all based on my personal experiences. I think that this type of content is appreciated by the community, and I get good feedback on it. To encourage the broader community to come up with more content covering new aspects, I will do a little experiment and promote a few selected articles of others. I think that these articles show new aspects or offer a unique view on certain areas of AEM.

Depending on the feedback I will decide if I will continue with this experiment. If you think that your content also offers new views, uncovers hidden features or suggests best practices, please let me know (see my contact data here). I will judge these proposals on the above-mentioned criteria. But of course it will still be my personal decision.

Let’s start with Theo Pendle, who has written an article on how to write your own custom injector for Sling Models. The example he uses is a really good one, and he walks you through all the steps and explains very well why all of that is necessary. I like the general approach of Theo’s writing and consider the case of safely injecting cookie values a valid one for such an injector. But in general I think that there are not many other cases out there where it makes sense to write custom injectors.

Also on a technical level is John Mitchell’s article “Using Sling Feature Flags to Manage Continous Releases“, published on the Adobe Tech Blog. He introduces Sling Features and how you can use them to implement feature flags. That’s something I have not seen used in the wild yet, and the documentation is quite sparse on it. But he gives a good starting point, although a more practical example would be great 🙂

The third article I like the most. Kevin Nenning writes on “CRXDE Lite, the plague of AEM“. He outlines why CRXDE Lite has gained such a bad reputation within Adobe that disabling CRXDE Lite has been part of the go-live checklist for quite some time. But on the other hand he loves the tool, because it’s a great way for quick hacks on your local development instance and a good general read-only tool. This is an article every AEM developer should read.
And in case you haven’t seen it yet: AEM as a Cloud Service offers the repository browser in the developer console for a read-only view on your repo!

And finally there is Yuri Simione (an Adobe AEM champion), who published 2 articles discussing the question “Is AEM a valid Content Services Plattform?” (article 1, article 2). He discusses an implementation which is based on Jackrabbit/Oak and Sling (but not AEM) to replace an aging Documentum system, and he offers an interesting perspective on the future of Jackrabbit. Definitely a read if you are interested in a broader use of AEM and its foundational pieces.

That’s it for December. I hope you enjoy these articles as much as I did, and that you can learn from them and get some new inspiration and insights.

by Jörg at December 21, 2022 05:29 PM

December 12, 2022

Things on a content management system - Jörg Hoh

Sling Models performance, part 2

In the last blog post I demonstrated the impact of the correct type of annotations on performance of Sling Models. But there is another aspect of Sling Models, which should not be underestimated. And that’s the impact of the method which is annotated with @PostConstruct.

If you are not interested in the details, just skip to the conclusion at the bottom of this article.

To illustrate this aspect, let me give you an example. Assume that you have a navigation (or list) component in which you want to display only pages of the type “product page” which are specifically marked to be displayed. Because you are a developer who favors clean code, you already have a “ProductPageModel” Sling Model which also offers a “showInNav()” method. So your code will look like this:

List<Page> pagesToDisplay = new ArrayList<>();
Iterator<Page> children = page.listChildren();  // Page.listChildren() returns an Iterator
while (children.hasNext()) {
  Page child = children.next();
  ProductPageModel ppm = child.adaptTo(ProductPageModel.class);
  if (ppm != null && ppm.showInNav()) {
    pagesToDisplay.add(child);
  }
}

This works perfectly fine; but I have seen this approach being the root cause of severe performance problems. Mostly because the ProductPageModel is designed as the one and only Sling Model backing a product page; the @PostConstruct method of the ProductPageModel contains all the logic to retrieve and calculate all required information, for example product information, datalayer information, etc.

But in this case only a simple property is required; all other properties are not used at all. That means that the majority of the operations in the @PostConstruct method are pure overhead in this situation and consume time. It would not be necessary to execute them at all in this case.

Many Sling Models are designed for a single purpose, for example rendering a page, where such a Sling Model is used extensively by an HTL scriptlet. But there are cases where the very same Sling Model class is used for different purposes, where only a subset of this information is required. And also in this case the whole set of properties is resolved, as if you needed it for rendering the complete page.

I prepared a small test-case on my github account to illustrate the impact of such code on the performance of the adaption:

  • ModelWithPostConstruct contains a method annotated with @PostConstruct, which resolves another property via an InheritanceValueMap.
  • ModelWithoutPostConstruct provides the same semantics, but executes the calculation lazily, only when the information is required.

The benchmark is implemented in a simple servlet (SlingModelPostConstructServlet), which you can invoke on the path “/bin/slingmodelpostconstruct”:

$ curl -u admin:admin http://localhost:4502/bin/slingmodelpostconstruct
test data created below /content/cqdump/performance
de.joerghoh.cqdump.performance.core.models.ModelWithPostconstruct: single adaption took 50 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWithoutPostconstruct: single adaption took 11 microseconds

The overhead is quite obvious, almost 40 microseconds per adaption; of course it depends on the amount of logic within the @PostConstruct method. And this @PostConstruct method is quite small compared to other Sling Models I have seen. In the cases where only a minimal subset of the information is required, this is pure overhead. Of course the overhead is often negligible if you just consider a single adaption, but given the large number of Sling Models in typical AEM projects, the chance is quite high that this turns into a problem sooner or later.

So you should pay attention to the different situations in which you use your Sling Models. Especially if you have such vastly different cases (rendering the full page vs. just getting one property), you should invest a bit of time and optimize them for these usecases. Which leads me to the following:

Conclusion

When you build your Sling Models, try to resolve all data lazily, when it is requested the first time. Keep the @PostConstruct method as small as possible.
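
A minimal sketch of this lazy pattern (class and property names are made up; this is not the actual code from the linked test case): the expensive lookup happens in the getter, and only on its first invocation, instead of in @PostConstruct.

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;
import com.day.cq.commons.inherit.HierarchyNodeInheritanceValueMap;
import com.day.cq.commons.inherit.InheritanceValueMap;

@Model(adaptables = Resource.class)
public class LazyProductPageModel {

    @Self
    private Resource resource;

    // cached lazily; stays null until showInNav() is called for the first time
    private Boolean showInNav;

    public boolean showInNav() {
        if (showInNav == null) {
            // the (potentially expensive) inherited lookup is only done on demand
            InheritanceValueMap ivm = new HierarchyNodeInheritanceValueMap(resource);
            showInNav = ivm.getInherited("showInNav", Boolean.FALSE);
        }
        return showInNav;
    }
}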

by Jörg at December 12, 2022 08:17 AM

November 28, 2022

Things on a content management system - Jörg Hoh

Sling Model Performance

In my daily job as an SRE for AEM as a Cloud Service I often have to deal with performance questions, especially in the context of migrations of customer applications. Applications sometimes perform differently on AEM CS than they did on AEM 6.x, and a part of my job is to look into these cases.

This often leads to interesting deep dives and learnings; you might have seen this reflected in the postings of this blog 🙂 The problem this time was a tight loop like this:

for (Resource child : resource.getChildren()) {
  SlingModel model = child.adaptTo(SlingModel.class);
  if (model != null && model.hasSomeCondition()) {
    // some very lightweight work
  }
}

This code performed well with 1000 child resources on an AEM 6.x authoring instance, but quite poorly on an AEM CS authoring instance with the same number of child nodes. And the problem is not the large number of child nodes …

After wading knee-deep through TRACE logs I found the problem at an unexpected location. But before I present the solution and some recommendations, let me explain some background. Of course you can skip the next section and jump directly to the TL;DR at the bottom of this article.

SlingModels and parameter injection

One of the beauties of Sling Models is that these are simple POJOs, and properties are injected by the Sling Models framework. You just have to add matching annotations to mark them accordingly. See the full story in the official documentation.

The simple example in the documentation looks like this:

@Inject
String title;

which (typically) injects the property named “title” from the resource this model was adapted from. The same way you can inject services, child nodes and many other useful things.

To make this work, the framework uses an ordered list of Injectors, which are able to retrieve values to be injected (see the list of available injectors). The first injector which returns a non-null value is taken and its result is injected. In this example the ValueMapInjector is supposed to return a property called “title” from the valueMap of the resource, which is quite early in the list of injectors.

Ok, now let’s understand what the system does here:

@Inject
@Optional
String doesNotExist;

Here an optional field is declared, and if there is no property called “doesNotExist” in the ValueMap of the resource, the other injectors are queried whether they can handle that injection. Assuming that no injector can, the value of the field “doesNotExist” remains null. No problem at first sight.

But indeed there is a problem, and it’s performance. Even the lookup of a non-existing property (or node) in the JCR takes time, and doing this a few hundred or even thousand times in a loop can slow down your code. And a slower repository (like the clustered MongoDB persistence of the AEM as a Cloud Service authoring instances) slows it down even more.

To demonstrate it, I wrote a small benchmark (source code on my github account), which does a lot of adaptions to Sling Models. When deployed to AEM 6.5.5 or later (or a recent version of the AEM CS SDK) you can run it via curl -u admin:admin http://localhost:4502/bin/slingmodelcompare

This is its output:

de.joerghoh.cqdump.performance.core.models.ModelWith3Injects: single adaption took 18 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWith3ValueMaps: single adaption took 16 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWithOptionalValueMap: single adaption took 18 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWith2OptionalValueMaps: single adaption took 20 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWithOptionalInject: single adaption took 83 microseconds
de.joerghoh.cqdump.performance.core.models.ModelWith2OptionalInjects: single adaption took 137 microseconds

It’s a benchmark which, on a very simple list of resources, tries adaptions to a number of model classes which differ in their type of annotations. Adapting to a model which injects 3 properties takes approximately 20 microseconds, but as soon as a model has a failing injection (which is declared with “@Optional” to avoid failing the adaption), the duration increases massively to 83 microseconds, and even to 137 microseconds when 2 of these failed injections are present.

Ok, so having a few of such failed injections is not a problem per se (you could do 2’000 within 100 milliseconds), but this test setup is a bit artificial, which makes these 2’000 a really optimistic number:

  • It is running on a system with a fast repository (SDK on my M1 Macbook); so for example the ChildResourceInjector has almost no overhead to test for the presence of a child resource called “doesNotExist”. This can be different, for example on AEM CS Author the Mongo storage has a higher latency than the SegmentStore on the SDK or a publish. If that (non-existing) child resource is not in the cache, there is an additional latency in the range of 1ms to load that information. What for? Well, basically for nothing.
  • The OsgiInjector is queried as well, which tries to access the OSGI service registry; this registry is a central piece of OSGI, and its consistency is heavily guarded by locks. I have seen this injector being blocked by these locks, which also adds latency.

That means that these 50-60 microseconds can easily multiply, and then performance becomes a problem. And this is the problem which initially sparked this investigation.

So what can we do to avoid this situation? That is quite easy: do not use @Inject, but use the specialized injectors directly (see them in the documentation). While the benefit is probably quite small when it comes to properties which are present (ModelWith3Injects took 18 microseconds vs the 16 microseconds of ModelWith3ValueMaps), the difference gets dramatic as soon as we consider failed injections.

Even in my local benchmark the improvement can be seen quite easily: there is almost no overhead for such a failed injection if I explicitly mark it for injection via the ValueMapInjector. And as mentioned, this overhead can be even larger in reality.

Still, this is a micro-optimization in the majority of all cases; but as mentioned already, many of these optimizations combined can definitely make a difference.

TL;DR Use injector-specific annotations

Instead of @Inject, directly use the correct injector-specific annotation. You normally know exactly where you want the injected value to come from.
And by the way: did you know that the use of @Inject is discouraged in favor of these injector-specific annotations?
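
A minimal sketch of what that looks like in practice (class and property names are made up): the injector-specific @ValueMapValue annotation only asks the ValueMapInjector, instead of iterating over the whole chain of injectors.

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.InjectionStrategy;
import org.apache.sling.models.annotations.injectorspecific.ValueMapValue;

@Model(adaptables = Resource.class)
public class TitleModel {

    // resolved only by the ValueMapInjector
    @ValueMapValue
    private String title;

    // a property which may be missing: declared optional on the field level,
    // but still bound to the ValueMapInjector only
    @ValueMapValue(injectionStrategy = InjectionStrategy.OPTIONAL)
    private String subtitle;

    public String getTitle() {
        return title;
    }
}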

Update: The Sling Models documentation has been updated and explicitly discourages the use of @Inject now.

by Jörg at November 28, 2022 08:25 AM

October 31, 2022

Things on a content management system - Jörg Hoh

Limits of dispatcher caching with AEM as a Cloud Service

In the last blog post I proposed 5 rules for Caching with AEM, how you should design your caching strategy. Today I want to show another aspect of rule 1: Prefer caching at the CDN over caching at the dispatcher.

I already explained that the CDN is always located closer to the consumer, so the latency is lower and the experience will be better. But when we limit the scope to AEM as a Cloud Service, the situation gets a bit more complicated, because the dispatcher is not able to cache files for more than 24 hours.

This is caused by a few architectural decisions made for AEM as a Cloud Service:

  • Publish instances and their dispatchers are ephemeral; they are re-created regularly, at the latest after 24 hours.
  • The dispatcher cache is local to its dispatcher instance; it is neither shared nor persisted beyond the lifetime of that instance.

These 2 decisions lead to the fact that no dispatcher cache can hold files for more than 24 hours, because the instance is terminated after that time. And there are other situations where the publish instances are re-created, for example during deployments and up/down-scaling, and then the cache does not contain files for 24 hours, but maybe just 3 hours.

This naturally can limit the cache-hit ratio in cases where you have content which is requested frequently but does not change for days, weeks or even months. In an AEM as a Cloud Service setup these files are then rendered once per day (or more often, see above) per publish/dispatcher, while in other setups (for example AMS or on-prem setups, where long-living dispatcher caches are pretty much the default) they can be delivered from the dispatcher cache without the need to re-render them every day.

The CDN does not have this limitation. It can hold files for days and weeks and deliver them, if the TTL settings allow this. But as you can control the CDN only via TTLs, you have to make a tradeoff between the cache-hit ratio on the CDN and the accuracy of the delivered content with regard to a potential change.

That means:

  • If you have files which do not change, you just set a large TTL on them and let the CDN handle them. A good example are clientlibs (JS and CSS files), because they have a unique name (an additional selector which is created as a hash over the content of the file).
  • If there’s a chance that you make changes to such content (mostly pages), you should set a reasonable TTL (and of course “stale-while-revalidate”) and accept that your publish instances need to re-render these pages when that time has passed.

That’s a bit of a drawback of the AEM as a Cloud Service setup, but on the other hand your dispatcher caches are regularly cleared.

by Jörg at October 31, 2022 02:02 PM

October 17, 2022

Things on a content management system - Jörg Hoh

Dispatcher, CDN and Caching

In today’s web performance discussions, there is a lot of focus on the browser as the most important factor. Google defines the Core Web Vitals, and there are many other aspects which are important to have a fast site. Plus then SEO …

While many developers focus on these, I see that many sites often neglect the importance of proper caching. While many of these sites already use a CDN (in AEM CS a CDN is part of every offering), they often do not use the CDN in an optimal way; this can result in slow pages (because of the network latency) and also in unnecessary load on the backend systems.

In this blog post I want to outline some ways how you can optimize your site for caching, with a focus on AEM in combination with a CDN. It does not really matter if it is AEM as a Cloud Service or AEM on AMS or on-premises, these recommendations can be applied to all of them.

Rule 1: Prefer caching at the CDN over caching at the dispatcher

The dispatcher is located close to the AEM instance and typically co-located with your AEM instances. There is a high latency from the dispatcher to the end user, especially if your end users are spread across the globe. For example the average latency between Frankfurt/Germany and Sydney/Australia is approximately 250ms, and that makes browsing a website not really fast. Using a decent CDN can reduce these numbers dramatically.

Also a CDN is better suited to handle millions of requests per minute than a bunch of VMs running dispatcher instances, both from a cost perspective and from the perspective of the know-how required to operate at that scale.

That means that your caching strategy should aim for optimal caching at the CDN level. The dispatcher is fine as a secondary cache to handle cache misses or expired cache items. But ideally no end-user request should ever make it through to the dispatcher.

Rule 2: Use TTL-based invalidation

The big advantage of the dispatcher is the direct control of the caching. You deliver your content from the cache until you change that content. Immediately after the change the cache is actively invalidated, and your changed content is delivered. But you cannot use the same approach for CDNs; and while the CDNs have made reasonable improvements to reduce the time needed to actively invalidate content, it still takes minutes.

A better approach is to use a TTL-based (time-to-live) invalidation (or rather: expiration), where every CDN node can decide on its own if a file in the cache is still valid or not. And if the content is too old, it’s getting refetched from the origin (your dispatchers).

Although this approach introduces some latency from the time of content activation to the time all users world-wide get to see it, such a latency is acceptable in general.

Rule 3: Staleness is not (necessarily) a problem

When you optimize your site, you not only need to ensure that every request is served from the CDN (instead of your dispatchers); you should also think about what happens when a requested file has expired on the CDN. Ideally it should not matter much.

Imagine that you have a file which is configured with a TTL of 300 seconds. What should happen if this file is requested 301 seconds after it has been stored in the CDN cache? Should the CDN still deliver it (and accept that the user receives a file which can be a bit older than specified), or do you want the user to wait until the CDN has obtained a fresh copy of that file?
Typically you accept that staleness for a moment and deliver the old copy for a while, until the CDN has obtained a fresh copy in the background. Use the “stale-while-revalidate” caching headers to configure this behavior.
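
As a minimal sketch (paths and values are just examples, not a recommendation), such a policy can be expressed with standard Cache-Control headers, for example in the Apache vhost configuration in front of AEM:

# cache pages for 5 minutes; serve a stale copy for up to 1 hour
# while the CDN refetches the content in the background
<LocationMatch "^/content/.*\.html$">
  Header set Cache-Control "max-age=300, stale-while-revalidate=3600"
</LocationMatch>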

Rule 4: Pay attention to the 404s

An HTTP status 404 (“File not found”) is tricky to handle, because by default a 404 is not cached at the CDN. That means that all those requests will hit your dispatcher and eventually even your AEM instances, which are the authoritative source to answer whether such a file exists. But the number of requests an AEM instance can handle is much smaller than the number the dispatchers or even the CDN can handle. And you should reserve these precious resources for doing something more useful than responding with “sorry, the resource you requested is not here”.

For that reason check the 404s and handle them appropriately; you have a number of options for that:

  • Fix incorrect links which are under your control.
  • Create dispatcher rules or CDN settings which handle request patterns which you don’t control, and return a 404 from there.
  • You also have the option to allow the CDN to cache a 404 response.

In any case, you should manage the 404s, because they are the most expensive type of requests: you spend resources to deliver “nothing”.

Rule 5: Know your query strings

Query strings in requests were used a lot to provide parameters to the server-side rendering process, and you might use that approach as well in your AEM application. But query strings are also used a lot to tag campaign traffic for correct attribution; you might have seen such requests already, they often contain parameters like “utm_source”, “fbclid” etc. These parameters have no impact on the server-side rendering!
Because requests with query strings cannot be cached by default, CDN and dispatcher will forward all requests containing any query string to AEM. And that’s again the most scarce resource, and having the pages rendered there will again impose the latency hit on your site visitors.

The dispatcher has the ability to remove named query strings from the request, which enables it to serve such requests from the dispatcher cache; that’s not as good as serving these requests from the CDN, but much better than handling them on AEM. You should use that as much as possible.
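
A sketch of how this looks in the dispatcher farm configuration (the parameter list is only an example; which parameters are safe to ignore depends on your application): the ignoreUrlParams section declares which query parameters the dispatcher may ignore when deciding whether a request can be served from the cache.

/ignoreUrlParams
  {
  # by default, any unknown query parameter prevents caching
  /0001 { /glob "*" /type "deny" }
  # pure tracking parameters do not influence the rendering, so ignore them
  /0002 { /glob "utm_*" /type "allow" }
  /0003 { /glob "fbclid" /type "allow" }
  }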

If you follow these rules, you have the chance not only to improve the user experience for your visitors, but at the same time to make your site much more scalable and resilient against attacks and outages.

by Jörg at October 17, 2022 06:58 AM

July 05, 2022

Things on a content management system - Jörg Hoh

What’s the maximum size of a node in JCR/AEM?

An interesting question which comes up every now and then is: “Is there a limit how large a JCR node can get?”.  And as always in IT, the answer is not that simple.

In this post I will answer that question and also outline why this limit is hardly a constraint in AEM development. I will also show ways how you can design your application so that this limit is not a problem at all.

(Allow me a personal note here: for me the most interesting part of that question is the motivation behind it. When this question is asked, I typically have the impression that the folks know that they are a bit off-limits here, because this is a topic which is discussed very rarely (if at all). That means they know that they (plan to) do something which violates some good practices. And for that reason they request re-assurance. For me this always leaves the question: why do they do it then? Because when you follow the recommended ways and content architecture patterns, you will never hit such a limit.)

We first have to distinguish between binaries and non-binaries. For binaries there is no real limit as they are stored in the blobstore. You can put files with 50GB in size there, not a problem. Such binaries are represented either using the nodetype “nt:file” (used most often) or using binary properties (rarely used).

And then there is the non-binary data. This data comprises all other node- and property-types, where the information is stored within the nodestore (often also as multi-value properties). Here, limits apply.

In AEM CS MongoDB is used as data storage, and the maximum size of a MongoDB document is 16 Megabytes. As an approximation (it’s not always the case), you can assume that a single JCR node with all its properties is stored in a single MongoDB document, which directly results in a maximum size per node: 16 Megabytes.

In reality a node cannot get that large, because other data is also stored inside that document. I recommend never storing more than 1 Megabyte of non-binary properties inside a single node. Technically you don’t have that limit in a TarMK/SegmentTar-only setup, but I would not exceed it there either. You will run into all kinds of interesting problems, and hardly anyone has experience with such large nodes in the AEM world.

If you actually violate this limit in the size of a document, you get this very nasty exception and your content will not be stored:

javax.jcr.RepositoryException: OakOak0001: Command failed with error 10334 (BSONObjectTooLarge): ‘BSONObj size: 17907734 (0x1114016) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: “7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history”‘ on server cmgbr9sharedcluster51rt-shard-00-01.xxxxx:27017. The full response is {“operationTime”: {“$timestamp”: {“t”: 1656435709, “i”: 87}}, “ok”: 0.0, “errmsg”: “BSONObj size: 17907734 (0x1114016) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: \”7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history\””, “code”: 10334, “codeName”: “BSONObjectTooLarge”, “$clusterTime”: {“clusterTime”: {“$timestamp”: {“t”: 1656435709, “i”: 87}}, “signature”: {“hash”: {“$binary”: “MXahc2R2arLq+rc41fRzIFKzRAw=”, “$type”: “00”}, “keyId”: {“$numberLong”: “7059363699751911425”}}}} [7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history]
at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:250) [org.apache.jackrabbit.oak-api:1.42.0.T20220608154910-4c59b36]

But is this really a limit which is hurting AEM developers and customers? Actually I don’t think so. And there are at least 2 good reasons why I believe this:

  • Pages rarely have that much content stored in a single component (be it the jcr:content node or any component beneath it), and the same holds for assets. The few instances where I have seen this exception happened because a lot of “data” was stored inside properties (e.g. complete files), which would have better been stored in “nt:file” nodes as binaries.
  • Since version 1.0 Oak logs a warning if it needs to index properties larger than 100 Kilobytes, and I have rarely seen this warning in the wild. There are prominent examples in AEM itself where this warning is written for nodes in /libs.

So the best way to find out if you are close to running into this problem with the total size of the documents is to check the logs for this warning:

05.07.2022 09:31:57.326 WARN [async-index-update-fulltext-async] org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker String length: 116946 for property: imageData at Node: /libs/wcm/core/content/editors/template/tour/content/items/third is greater than configured value 102400

Having these warnings in the logs means that you should pay attention to them; here it’s not a problem because this property is unlikely to get any larger over time. But you should pay attention to those properties which can grow over time.
(Although there is no warning if you have many smaller properties, which in sum hit the limits of the MongoDB document.)

How to mitigate?

As mentioned above it’s hard to come up with cases where this is actually a problem, especially if you are developing in line with the AEM guidelines. The only situation where I can imagine this limit to be a problem is when a lot of data is stored within a node which is to be consumed by custom logic. But in this case you own the data and the logic. Therefore you have the chance to change the implementation in a way that this situation does not occur anymore.

When you design your content and data structure, you should be aware of this limit and not store more than 1 Megabyte within a single node. Because there is no workaround available once you hit that exception; the only way to make it work again is to fix the data structure and the code for it. There are 2 approaches:

  • Split the data across more nodes, ideally in a tree-ish way, where you can also use application knowledge to store it in an intuitive (and often faster) way.
  • If you just have a single property which is that large, you can also try to convert it into a binary property. This is much simpler, as in the majority of cases you just need to change the type of the property from String to Binary. The type conversion is done implicitly, but if you store actual string data, you should then worry about encoding (see the sketch below).
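
A minimal sketch of such a conversion with the plain JCR API (path, property name and the UTF-8 choice are just examples for illustration):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class PropertyConverter {

    // Re-writes a large String property as a Binary property on the same node.
    public static void convertToBinary(Session session, String path, String propertyName)
            throws RepositoryException {
        Node node = session.getNode(path);
        String value = node.getProperty(propertyName).getString();
        Binary binary = session.getValueFactory()
                .createBinary(new ByteArrayInputStream(value.getBytes(StandardCharsets.UTF_8)));
        node.setProperty(propertyName, binary); // the property type is now Binary
        session.save();
    }
}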

Now you know the answer to the question “What’s the maximum size of a node in JCR/AEM?” and why it should never be a problem for you. I also outlined ways how you can avoid hitting this problem at all, by choosing an appropriate node structure or by storing large data in binaries instead of properties.

Happy developing and I hope you never encounter this situation!

by Jörg at July 05, 2022 12:03 PM

June 20, 2022

Things on a content management system - Jörg Hoh

Sling Scheduled Jobs vs Sling Scheduler

Apache Sling and AEM provide 2 different approaches to start processes at a given time or in a given interval. It is not always trivial to make the right decision between these two, and I have seen a few cases of misuse already. Let’s dive into this topic and I will outline in what situation to use the Scheduler and when to use Scheduled Jobs.

Let me outline the differences between the two using a simple table:

                                    | Sling Scheduled Job                                        | Sling Scheduler
Timing is persisted across restarts | Yes                                                        | No
Start a job via OSGI annotations    | No                                                         | Yes
Start a job via API                 | Yes                                                        | Yes
Trigger on every cluster node       | Yes (job execution then follows regular Sling Jobs rules)  | Yes
Trigger just once per cluster       | Yes (job execution then follows regular Sling Jobs rules)  | Yes (only on cluster leader possible)
Comparison between Scheduled Jobs and Scheduler

To get a better understanding of these 2 distinct features, I cover 2 use cases and list which feature is the better match for each of them.

Execute code exactly once at a given time

That’s a case for the Scheduled Job. Even if the job fails because the executing Sling instance goes down, it will be re-scheduled and tried again.

Here the exactly once semantics means that this is a single job with a global scope. Missing it is not an option. It might be delayed if the triggering date is missed or the execution is aborted, but it will be executed as soon as possible after the scheduled time has passed.

Periodic jobs which affect just a single AEM instance

Use the Scheduler whenever you execute periodic/incremental jobs like cleanups, data imports etc. It’s not a problem if you miss one execution, as long as you make sure that it runs at the next scheduled time (or trigger it during startup if necessary).

Another note on choosing the right approach: you should not need to create a Scheduled Job during startup (that means in the activation of services); for these cases it’s normally better to use the Scheduler. There might be rare cases where a Scheduled Job is the right solution, but in the majority of cases you should just use the Scheduler there.
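
A minimal sketch of such a periodic job registered via the Sling Commons Scheduler whiteboard pattern (class name, cron expression and the task itself are made up for illustration):

import org.osgi.service.component.annotations.Component;

@Component(
    service = Runnable.class,
    property = {
        "scheduler.expression=0 0 2 * * ?",     // run every day at 02:00
        "scheduler.concurrent:Boolean=false"    // never run overlapping executions
    })
public class NightlyCleanupTask implements Runnable {

    @Override
    public void run() {
        // the periodic cleanup / import logic goes here
    }
}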

A word of caution when you use Scheduled Jobs with a periodic pattern: as they are persisted, you need to un-register them when you don’t need them anymore.

by Jörg at June 20, 2022 05:39 PM

March 17, 2022

Things on a content management system - Jörg Hoh

How to analyze “Authentication support missing”

Errors and problems in running software often manifest in very interesting and non-obvious ways. A problem in location A manifests itself only with an unrelated error message in a different location B.

We also have one example of such a situation in AEM, and that’s the famous “Authentication support missing” error message. I often see the question “I got this error message; what should I do now?”, and so I decided: it’s time to write a blog post about it. Here you are.

“Authentication support missing” is actually not even correct: there is no authentication module available, so you cannot authenticate. But in 99.99% of the cases this is just a symptom, because the default AEM authentication depends on a running SlingRepository service. And a running Sling repository has a number of dependencies itself.

I want to highlight 2 of these dependencies, because they tend to cause problems most often: the Oak repository and the Repository Initializer. Both must be up and have started/run successfully before the SlingRepository service can be registered successfully. Let’s look into each of these dependencies.

The Oak repository

The Oak repository is a quite complex system in itself, and there are many reasons why it might not start. To name a few:

  • Consistency problems with the repository files on disk (for whatever reasons), permission problems on the filesystem, full disks, …
  • Connectivity issues towards the storage (especially if you use a database or mongodb as storage)
  • Messed up configuration

If you see an “Authentication support missing” message, your first check should be on the Oak repository, typically via the AEM error.log. If you find ERROR messages logged by any “org.apache.jackrabbit.oak” class during the startup, this is most likely the culprit. Investigate from there.

Sling Repository Initializer (a.k.a. “repoinit”)

Repoinit is designed to ensure that a certain structure in the repository is provided, even before any consumer is accessing it. All of the available scripts must be executed, and any failure will immediately terminate the startup of the SlingRepository service. Check also my latest blog post on the Sling Repository Initializer for details on how to prevent such problems.

Repoinit failures are typically quite prominent in the AEM error.log, just search for an ERROR message starting with this:

*ERROR* [Apache SlingRepositoryStartup Thread #1] com.adobe.granite.repository.impl.SlingRepositoryManager Exception in a SlingRepositoryInitializer, SlingRepositoryservice registration aborted …
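
On a local instance, or with a downloaded log file at hand, a quick way to search for it is for example:

$ grep "SlingRepositoryInitializer" error.log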

These are the 2 biggest contributors to the “Authentication support missing” error message. Of course there are more reasons why it could appear, but to be honest, I have only seen these 2 cases in the last years.

I hope that this article helps you to investigate such situations more swiftly.

by Jörg at March 17, 2022 04:41 PM

March 11, 2022

Things on a content management system - Jörg Hoh

How to deal with RepoInit failures in Cloud Service

Some years ago, even before AEM as a Cloud Service, the RepoInit language was implemented as part of Sling (and AEM) to create repository structures directly on the startup of the JCR repository. With it your application can rely on some well-defined structures always being available.

In this blog post I want to walk you through a way how you can test repoinit statements locally and avoid pipeline failures because of it.

Repoinit statements are deployed as part of OSGI configurations; and that means that during the development phase you can work in an almost interactive way with it. Also exceptions are not a problem; you can fix the statement and retry.

The situation is much different when you already have repoinit statements deployed and you start up your AEM (to be exact: the Sling Repository service) again. In this case all repoinit statements are executed as part of the startup of the repository, and any exception during the execution of repoinit will stop the startup of the repository service and render your AEM unusable. In the case of CloudManager and AEM as a Cloud Service this will break your deployment.

Let me walk you through 2 examples of such an exception and how you can deal with it.

*ERROR* [Apache SlingRepositoryStartup Thread #1] com.adobe.granite.repository.impl.SlingRepositoryManager Exception in a SlingRepositoryInitializer, SlingRepositoryservice registration aborted java.lang.RuntimeException: Session.save failed: javax.jcr.nodetype.ConstraintViolationException: OakConstraint0025: /conf/site/configuration/favicon.ico[[nt:file]]: Mandatory child node jcr:content not found in a new node 
at org.apache.sling.jcr.repoinit.impl.AclVisitor.visitCreatePath(AclVisitor.java:167) [org.apache.sling.jcr.repoinit:1.1.36] 
at org.apache.sling.repoinit.parser.operations.CreatePath.accept(CreatePath.java:71)

In this case the exception is quite detailed about what actually went wrong. It failed when saving, and it says that /conf/site/configuration/favicon.ico (of type nt:file) was affected. The problem is that a mandatory child node “jcr:content” is missing.

Why is it a problem? Because every node of nodetype “nt:file” requires a “jcr:content” child node which actually holds the binary.

This is a case which you can detect very easily also on a local environment.

Which leads to the first recommendation:

When you develop in your local environment, you should apply all repoinit statements to a fresh environment in which there are no manual changes. Because otherwise your repoinit statements might rely on the presence of structures which are not provided by the repoinit scripts.

Having a mix of manual changes and repoinit on a local development environment and then moving it over untested often leads to failures in the CloudManager pipelines.

The second example is a very prominent one, and I see it very often:

[Apache SlingRepositoryStartup Thread #1] com.adobe.granite.repository.impl.SlingRepositoryManager Exception in a SlingRepositoryInitializer, SlingRepositoryservice registration aborted java.lang.RuntimeException: Failed to set ACL (java.lang.UnsupportedOperationException: This builder is read-only.) AclLine DENY {paths=[/libs/cq/core/content/tools], privileges=[jcr:read]} 
at org.apache.sling.jcr.repoinit.impl.AclVisitor.setAcl(AclVisitor.java:85)

It’s the well-known “This builder is read-only” version. To understand the problem and its resolution, I need to explain a bit the way the build process assembles AEM images in the CloudManager pipeline.

In AEM as a Cloud Service you have an immutable part of the repository, which consists of the trees “/libs” and “/apps”. They are immutable because they cannot be modified at runtime, not even with admin permissions.

During build time this immutable part of the image is built. This process merges both the product-side parts (/libs) and the custom application parts (/apps) together. After that all repoinit scripts run, both the ones provided by the product as well as any custom ones. And of course during that part of the build these trees are writable, thus writing into /apps using repoinit is not a problem.

So why do you actually get this exception, when /libs and /apps are writable during the build? Because repoinit is executed a second time: during the “final” startup, when /apps and /libs are immutable.

Repoinit is designed around the idea that all activities are idempotent. This means that if you want to create an ACL on /apps/myapp/foo/bar, the repoinit statement is a no-op if that specific ACL already exists. A second run of repoinit will do nothing, but find everything still in place.

But if in the second run the system executes this action again, it is not a no-op anymore. This means that the ACL is not there as expected, or whatever the goal of that repoinit statement was.

And there is only one reason why this happens: there was some other action between these 2 executions of repoinit which changed the repository. The only other thing which modifies the repository is the installation of content packages.

Let’s illustrate this problem with an example. Imagine you have this repoinit script:

create path /apps/myapp/foo/bar
set acl on /apps/myapp/foo/bar
  allow jcr:read for mygroup
end

And you have a content package which comes with content for /apps/myapp, whose filter is set to “overwrite” but which does not contain this ACL.

In this case the operations leading to this error are these:

  • Repoinit sets the ACL on /apps/myapp/foo/bar
  • the deployment overwrites /apps/myapp with the content package, so the ACL is wiped
  • AEM starts up
  • Repoinit wants to set the ACL on /apps/myapp/foo/bar, which is now immutable. It fails and breaks your deployment.

The solution to this problem is simple: you need to adjust the repoinit statements and the package definitions (especially the filter definitions) in a way that the package installation does not wipe and/or overwrite any structure created by repoinit. And with “structure” I do not refer only to nodes, but also to nodetypes, properties etc. All must be identical, and in the best case they don’t interfere at all.
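
One possible way to achieve that (a sketch only; whether it fits depends on your package layout) is to narrow the package filter or switch its mode, so the subtree touched by repoinit is left alone, for example in the package’s filter.xml:

<!-- illustrative filter.xml: mode="merge" only adds new nodes and leaves
     existing nodes - such as the ACL created by repoinit - untouched -->
<workspaceFilter version="1.0">
    <filter root="/apps/myapp/foo" mode="merge"/>
</workspaceFilter>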

It is hard to validate this locally, as you don’t have an immutable /apps and /libs, but there is a test approach which comes very close to it:

  • Run all your repoinit statements in your local test environment
  • Install all your content packages
  • Enable write tracing (see my blog post)
  • Re-run all your repo-init statements.
  • Disable write tracing again

During the second run of the repoinit statements you should not see any write in the trace log. If you see any write operation, it’s a sign that your packages overwrite structures created by repoinit. You should fix these asap, because they will later break your CloudManager pipeline.

With this information at hand you should be able to troubleshoot any repoinit problems already on your local test environment, avoiding pipeline failures because of it.

by Jörg at March 11, 2022 11:28 AM

February 03, 2022

Things on a content management system - Jörg Hoh

The deprecation of Sling Resource Events

Sling events are used for many aspects of the system, and initially JCR changes were also sent through them. But the OSGI eventing (which the Sling events are built on top of) is not designed for huge volumes of events (thousands per second), and that is a situation which can happen in AEM. One of the most compelling reasons to move away from this approach is that all these event handlers (both for resource change events and all others) share a single thread pool.

For that reason the ResourceChangeListeners have been introduced. Here each listener provides detailed information about which changes it is interested in (restricted by path and type of the change); therefore Sling is able to optimise the listeners on the JCR level, and it does not listen for changes no-one is interested in. This can reduce the load on the system and improve the performance.
For this reason the usage of OSGI resource event listeners is deprecated (although they still work as expected).

How can I find all the ResourceChangeEventListeners in my codebase?

That’s easy, because on startup for each of these ResourceChangeEventListeners you will find a WARN message in the logs like this:

Found OSGi Event Handler for deprecated resource bridge: com.acme.myservice

This will help you to identify all these listeners.

How do I rewrite them to ResourceChangeListeners?

In the majority of cases this should be straightforward. Make your service implement the ResourceChangeListener interface and provide these additional OSGI properties:

@Component(
    service = ResourceChangeListener.class,
    configurationPolicy = ConfigurationPolicy.IGNORE,
    property = {
        ResourceChangeListener.PATHS + "=/content/dam/asset-imports",
        ResourceChangeListener.CHANGES + "=ADDED",
        ResourceChangeListener.CHANGES + "=CHANGED",
        ResourceChangeListener.CHANGES + "=REMOVED"
    })
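
For completeness, a minimal sketch of the implementing class itself (the class name and the logging are made up; the @Component annotation from above goes on this class): the single callback receives the batched changes.

import java.util.List;
import org.apache.sling.api.resource.observation.ResourceChange;
import org.apache.sling.api.resource.observation.ResourceChangeListener;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AssetImportListener implements ResourceChangeListener {

    private static final Logger LOG = LoggerFactory.getLogger(AssetImportListener.class);

    @Override
    public void onChange(List<ResourceChange> changes) {
        for (ResourceChange change : changes) {
            // each ResourceChange carries the type (ADDED/CHANGED/REMOVED) and the path
            LOG.info("{} at {}", change.getType(), change.getPath());
        }
    }
}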

With this switch you allow resource events to be processed separately in an optimised way; they no longer block the other OSGI events.

by Jörg at February 03, 2022 06:58 PM

January 21, 2022

Things on a content management system - Jörg Hoh

How to handle errors in Sling Servlet requests

Error handling is a topic which developers rarely pay much attention to. It is done when the API forces them to handle an exception. And the most common pattern I see is the “log and throw” pattern, which means that the exception is logged and then re-thrown.

When you develop in the context of HTTP requests, error handling can get tricky, because you need to signal to the consumer of the response that an error happened and the request was not successful. Frameworks are designed in a way that they handle any exception internally and set the correct error code if necessary. Sling is no different: if your code throws an exception (for example in the postConstruct of a Sling Model), the Sling framework catches it and sets the correct status code 500 (Internal Server Error).

I’ve seen code which catches exceptions itself and sets the status code on the response itself. But this is not the right approach, because with every exception handled this way the developer implicitly states: “These are my exceptions and I know best how to handle them”; almost as if the developer takes ownership of these exceptions and their root causes, and as if there’s nothing which could handle this situation better.

This approach of handling exceptions on your own is not best practice, and I see 2 problems with it:

  • Setting the status code alone is not enough; the remaining parts of the request processing need to be stopped as well. Otherwise the processing continues as if nothing happened, which is normally not useful or even allowed. It’s hard to ensure this when the exception is caught.
  • Owning the exception handling removes the responsibility from others. In AEM as a Cloud Service Adobe monitors response codes and the exceptions causing them. And if there’s only a status code 500 but no exception reaching the SlingMainServlet, then it’s likely that this is ignored, because the developer claimed ownership of the exception (handling).

If you write a Sling servlet or code operating in the context of a request, it is best practice not to catch exceptions, but to let them bubble up to the Sling Main Servlet, which is able to handle them appropriately. Handle exceptions yourself only if you have a better way to deal with them than just logging them.
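
A minimal sketch of what this looks like (the resource type and the helper method are made up for illustration): the servlet declares the checked exceptions in its signature and does not catch anything, so any failure reaches the Sling Main Servlet, which sets the 500 status and logs the exception.

import java.io.IOException;
import javax.servlet.Servlet;
import javax.servlet.ServletException;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

@Component(service = Servlet.class,
    property = {
        "sling.servlet.resourceTypes=myapp/components/example",
        "sling.servlet.methods=GET"
    })
public class ExampleServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws ServletException, IOException {
        // no try/catch here: if this call throws, Sling handles it
        String result = doSomethingRisky(request);
        response.setContentType("text/plain");
        response.getWriter().write(result);
    }

    private String doSomethingRisky(SlingHttpServletRequest request) {
        return request.getResource().getPath();
    }
}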

by Jörg at January 21, 2022 07:27 PM

January 05, 2022

Things on a content management system - Jörg Hoh

How to deal with the “TooManyCallsException”

Every now and then I see the question “We get the TooManyCallsException while rendering pages, and we need to increase the threshold for the number of inclusions to 5000. Is this a problem? What can we do so we don’t run into this issue at all?”

Before I answer this question, I want to explain the background of this setting, why it was introduced and when such a “Call” is made.

Sling rendering is based on servlets; and while a single servlet could handle the rendering of the complete response body, that is not very common in AEM. AEM pages normally consist of a variety of different components, which internally can consist of distinct subcomponents as well. This depends on the design approach the development team has chosen.
(It should be mentioned that all JSPs and all HTL scripts are compiled into regular Java servlets.)

That means that the rendering process can be considered a tree of servlets, with servlets calling other servlets (and the DefaultGetServlet being the root of such a tree when rendering pages). This tree is structured along the resource tree of the page, but it can include servlets which render content from different areas of the repository; for example when dealing with content fragments or including images, which require their metadata to be respected.

It is possible to turn this tree into a cyclic graph; and that means that the process of traversing this tree of servlets turns into a recursion. In that case request processing will never terminate, the Jetty thread pool will quickly fill up to its limit, and the system will become unavailable. To avoid this situation only a limited number of servlet calls per request is allowed. And that’s this magic number of 1000 allowed calls (which is configured in the Sling Main Servlet).

Knowing this, let me try to answer the question “Is it safe to increase this value of 1000 to 5000?“. Yes, it is safe. In case your page rendering process goes recursive, it just terminates later, which slightly increases the risk of your AEM instance becoming unavailable.


“Are there any drawbacks? Why is the default 1000 and not 5000 (or 10000 or any higher value)?” From experience 1000 is sufficient for the majority of applications. It might be too low for applications where the components are designed in a very granular way, which in turn requires a lot of servlet calls to properly render a page.
And every servlet call comes with a small overhead (mostly for running the component-level filters); and even if this overhead is just 100 microseconds, 1000 invocations add up to 100 ms just for the invocation overhead. That means you should find a good balance between a clean application modularization and its runtime performance overhead.

Which leads to the next question: “What are the problematic calls we should think of?” Good one.
From a high-level view of AEM page rendering, you cannot avoid the servlet calls which render the components. That means that you as an AEM application developer cannot influence the overall page rendering process; you can only try to optimise the rendering of individual (custom) components.
To optimise these, you should be aware that the following things trigger the invocation of a servlet during page rendering:

  • the <cq:include>, <sling:include> and <sling:forward> JSP tags
  • the data-sly-include statement of HTL
  • and every method which invokes directly or indirectly the service() method of a servlet.

A good way to check this for some pages is the “Recent requests” functionality of the OSGI Webconsole.

by Jörg at January 05, 2022 04:28 PM

December 01, 2021

Things on a content management system - Jörg Hoh

The web, an eventually consistent system

For many large websites, CDNs are the foundation for delivering content quickly to their customers around the world. The ability of CDNs to cache responses close to the consumers also allows these sites to operate on a small hardware footprint, compared to what they would have to invest if they operated without a CDN and delivered all content through their own systems. However, this comes at a cost: your CDN may now deliver content that is out of sync with your origin, because you have changed the content on your own system and this change is not propagated in an atomic fashion. (This is the same “atomic” as in the ACID principle of database implementations.)
This is a conscious decision, and it is caused primarily by the CAP theorem. It states that in a distributed data storage system, you can only achieve 2 of these 3 guarantees:

  • Consistency
  • Availability
  • Partition tolerance

And in the case of a CDN (which is a highly distributed data storage system), its developers usually opt for availability and partition tolerance over consistency. That is, they accept delivering content that is out of date because the originating system has already updated it.

To mitigate this situation the HTTP protocol has built-in features which help to address the problem at least partially. Check out the latest RFC draft on it, it is a really good read. The main feature is called “TTL” (time-to-live) and means that the CDN delivers a version of the content only for a configured time. Afterwards the CDN will fetch a new version from the origin system. The technical term for this is “eventually consistent”, because at that point the state of the system with respect to that content is consistent again.

This is the approach all CDNs support, and it works very reliably. But only if you accept that you change content on the origin system and that it will reach your consumers with this delay. The delay is usually set to a period of time that is determined empirically by the website operators, trying to balance the need to deliver fresh content (which requires a very low or no TTL) with the number of requests that the CDN can answer instead of the origin system (for which the TTL should be as high as possible). Usually it is in the range of a few minutes.

(Even if you don’t use a CDN for your origin systems, you need these caching instructions, otherwise browsers will make assumptions and cache the requested files on their own. Browsing the web without caching is slow, even on very fast connections. Not to mention what happens when using a mobile device over a slow 3G line … Eventual consistency is an issue you can’t avoid when working on the web.)

Caching is an issue you will always have to deal with when creating web presences. Try to cache as much as possible without neglecting the need to refresh or update content at a random time.

You need to constantly address eventual consistency. Atomic changes (that means changes which are immediately available to all consumers) are possible, but they come at a price. You can’t use CDNs for this content; you must deliver it all directly from your origin system. In this case, you need to design your origin system so that it can function without eventual consistency at all (and that’s built into many systems). Not to mention the additional load it will have to handle.

And for this reason I would always recommend not relying on atomic updates or consistency across your web presence. Always factor in eventual consistency in the delivery of your content. And in most cases even business requirements where “immediate updates” are required can be solved with a TTL of 1 minute. Still not “immediate”, but good enough in 99% of all cases. For the remaining 1% where consistency is mandatory (e.g. real-time stock trading) you need to find a different solution. And I am not sure if the web is always the right technology then.

And as an afterthought regarding TTL: Of course many CDNs offer you the chance to actively invalidate the content, but it often comes with a price. In many cases you can invalidate only single files. Often it is not an immediate action, but takes seconds up to many minutes. And the price is always that you have to have the capacity to handle the load when the CDN needs to refetch a larger chunk of content from your origin system.

by Jörg at December 01, 2021 01:16 PM

November 01, 2021

Things on a content management system - Jörg Hoh

Understanding AEM request processing using the OSGI “Recent Request” console

During some recent work on performance improvements in request processing I used a tool which has been part of AEM for a very long time; I cannot recall a time when it was NOT there. It’s very simple, but nevertheless powerful, and it can help you to understand the processing of requests in AEM much better.

I am talking about the “Recent Requests Console” in the OSGI webconsole, which is a gem in the “AEM performance tuning” toolbox.

In this blog post I use this tool to explain the details of the request rendering process of AEM. You can find the detailed description of this process in the pages linked from this page (Sling documentation).

Screenshot “Recent requests”

With this Recent Requests screen (go to /system/console/requests) you can drill down into the rendering process of the last 20 requests handled by this AEM instance; they are listed at the top of the screen. Be aware that if you have a lot of concurrent requests, you might often miss the request you are looking for; so if you really rely on it, you should increase the number of requests which are retained. This can be done via the OSGI configuration of the Sling Main Servlet.

When you have opened a request, you will see a huge number of individual log entries. Each log entry contains as its first element a timestamp (in microseconds, 1000 microseconds = 1 millisecond) relative to the start of the request. With this information you can easily calculate how much time passed between 2 entries.

And each request has a typical structure, so let’s go through it using the AEM Start page (/aem/start.html). So just use a different browser window and request that page. Then check back on the “Recent requests console” and select the “start.html”.
In the following I will go through the lines, starting from the top.

      0 TIMER_START{Request Processing}
      1 COMMENT timer_end format is {<elapsed microseconds>,<timer name>} <optional message>
     13 LOG Method=GET, PathInfo=null
     17 TIMER_START{handleSecurity}
   2599 TIMER_END{2577,handleSecurity} authenticator org.apache.sling.auth.core.impl.SlingAuthenticator@5838b613 returns true

This is a standard header for each request. We can see here that the authentication (the handleSecurity phase) took 2577 microseconds.

   2981 TIMER_START{ResourceResolution}
   4915 TIMER_END{1932,ResourceResolution} URI=/aem/start.html resolves to Resource=JcrNodeResource, type=granite/ui/components/shell/page, superType=null, path=/libs/granite/ui/content/shell/start
   4922 LOG Resource Path Info: SlingRequestPathInfo: path='granite/ui/components/shell/page', selectorString='null', extension='html', suffix='null'

Here we see the log lines for the resolution of the resource type. It took 1932 microseconds to map the request “/aem/start.html” to the resource type “granite/ui/components/shell/page”, with the path being /libs/granite/ui/content/shell/start. Additionally we see information about the selector, extension and suffix elements.

   4923 TIMER_START{ServletResolution}
   4925 TIMER_START{resolveServlet(/libs/granite/ui/content/shell/start)}
   4941 TIMER_END{14,resolveServlet(/libs/granite/ui/content/shell/start)} Using servlet BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)
   4945 TIMER_END{21,ServletResolution} URI=/aem/start.html handled by Servlet=BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)

That’s a nested servlet resolution, which takes 14 and 21 microseconds, respectively. Up to this point everything is mostly standard and hard to influence performance-wise. But it already gives you a lot of information, especially regarding the resource type which is managing the complete response processing.

   4948 LOG Applying Requestfilters
   4952 LOG Calling filter: com.adobe.granite.resourceresolverhelper.impl.ResourceResolverHelperImpl
   4958 LOG Calling filter: org.apache.sling.security.impl.ContentDispositionFilter
   4961 LOG Calling filter: com.adobe.granite.csrf.impl.CSRFFilter
   4966 LOG Calling filter: org.apache.sling.i18n.impl.I18NFilter
   4970 LOG Calling filter: com.adobe.granite.httpcache.impl.InnerCacheFilter
   4979 LOG Calling filter: org.apache.sling.rewriter.impl.RewriterFilter
   4982 LOG Calling filter: com.adobe.cq.history.impl.HistoryRequestFilter
   7870 LOG Calling filter: com.day.cq.wcm.core.impl.WCMRequestFilter
   7908 LOG Calling filter: com.adobe.cq.wcm.core.components.internal.servlets.CoreFormHandlingServlet
   7912 LOG Calling filter: com.adobe.granite.optout.impl.OptOutFilter
   7921 LOG Calling filter: com.day.cq.wcm.foundation.forms.impl.FormsHandlingServlet
   7932 LOG Calling filter: com.day.cq.dam.core.impl.servlet.DisableLegacyServletFilter
   7935 LOG Calling filter: org.apache.sling.engine.impl.debug.RequestProgressTrackerLogFilter
   7938 LOG Calling filter: com.day.cq.wcm.mobile.core.impl.redirect.RedirectFilter
   7940 LOG Calling filter: com.day.cq.wcm.core.impl.AuthoringUIModeServiceImpl
   8185 LOG Calling filter: com.adobe.granite.rest.assets.impl.AssetContentDispositionFilter
   8201 LOG Calling filter: com.adobe.granite.requests.logging.impl.RequestLoggerImpl
   8212 LOG Calling filter: com.adobe.granite.rest.impl.servlet.ApiResourceFilter
   8302 LOG Calling filter: com.day.cq.dam.core.impl.servlet.ActivityRecordHandler
   8321 LOG Calling filter: com.day.cq.wcm.core.impl.warp.TimeWarpFilter
   8328 LOG Calling filter: com.day.cq.dam.core.impl.assetlinkshare.AdhocAssetShareAuthHandler

These are all request-level filters, which are executed just once per request.

And now the interesting part starts: the rendering of the page itself. The building blocks are called “components” (that term is probably familiar to you), and rendering them always follows the same pattern:

  • Calling Component Filters
  • Executing the Component
  • Return from the Component Filters (in reverse order of the calling)

This pattern can be clearly seen in the output, but most often it is more complicated because many components include other components, and so you end up with a tree of components being rendered.

As an example of the straightforward case we can take the “head” component of the page:

  25849 LOG Including resource MergedResource [path=/mnt/overlay/granite/ui/content/globalhead/experiencelog, resources=[/libs/granite/ui/content/globalhead/experiencelog]] (SlingRequestPathInfo: path='/mnt/overlay/granite/ui/content/globalhead/experiencelog', selectorString='null', extension='html', suffix='null')
  25892 TIMER_START{resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)}
  25934 TIMER_END{40,resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)} Using servlet BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)
  25939 LOG Applying Includefilters
  25943 LOG Calling filter: com.adobe.granite.csrf.impl.CSRFFilter
  25951 LOG Calling filter: com.day.cq.personalization.impl.TargetComponentFilter
  25955 LOG Calling filter: com.day.cq.wcm.core.impl.page.PageLockFilter
  25959 LOG Calling filter: com.day.cq.wcm.core.impl.WCMComponentFilter
  26885 LOG Calling filter: com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter
  26893 LOG Calling filter: com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter
  26896 LOG Calling filter: com.day.cq.wcm.core.impl.WCMDebugFilter
  26899 LOG Calling filter: com.day.cq.wcm.core.impl.WCMDeveloperModeFilter
  28125 TIMER_START{BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)#1}
  46702 TIMER_END{18576,BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)#1}
  46734 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDeveloperModeFilter, inner=18624, total=19806, outer=1182
  46742 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDebugFilter, inner=19806, total=19810, outer=4
  46749 LOG Filter timing: filter=com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter, inner=19810, total=19816, outer=6
  46756 LOG Filter timing: filter=com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter, inner=19816, total=19830, outer=14
  46761 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMComponentFilter, inner=19830, total=20750, outer=920
  46767 LOG Filter timing: filter=com.day.cq.wcm.core.impl.page.PageLockFilter, inner=20750, total=20754, outer=4
  46772 LOG Filter timing: filter=com.day.cq.personalization.impl.TargetComponentFilter, inner=20754, total=20758, outer=4

At the top you see the LOG statement “Including resource …”, which tells you which resource is rendered, including additional information like selector, extension and suffix.

The next statement is the resolution of the render script which is used to render this resource, plus the time it took (40 microseconds).

Then we have the invocation of all component filters, the execution of the render script itself, which uses a TIMER to record start time, end time and duration (18576 microseconds), and the unwinding of the component filters.

If you use a recent version of the SDK for AEM as a Cloud Service, all timestamps are in microseconds, but in AEM 6.5 and older the durations measured for the filters (inner=…, outer=…) were printed in milliseconds (an inconsistency I fixed only recently).

If a component includes another component, it looks like this:

8350 LOG Applying Componentfilters
   8358 LOG Calling filter: com.day.cq.personalization.impl.TargetComponentFilter
   8361 LOG Calling filter: com.day.cq.wcm.core.impl.page.PageLockFilter
   8365 LOG Calling filter: com.day.cq.wcm.core.impl.WCMComponentFilter
   8697 LOG Calling filter: com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter
   8703 LOG Calling filter: com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter
   8733 LOG Calling filter: com.day.cq.wcm.core.impl.WCMDebugFilter
   8750 TIMER_START{BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)#0}
  25849 LOG Including resource MergedResource [path=/mnt/overlay/granite/ui/content/globalhead/experiencelog, resources=[/libs/granite/ui/content/globalhead/experiencelog]] (SlingRequestPathInfo: path='/mnt/overlay/granite/ui/content/globalhead/experiencelog', selectorString='null', extension='html', suffix='null')
  25892 TIMER_START{resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)}
  25934 TIMER_END{40,resolveServlet(/mnt/overlay/granite/ui/content/globalhead/experiencelog)} Using servlet BundledScriptServlet (/libs/cq/experiencelog/components/head/head.jsp)
  25939 LOG Applying Includefilters
[...]
148489 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDeveloperModeFilter, inner=1698, total=1712, outer=14
 148500 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMDebugFilter, inner=1712, total=1717, outer=5
 148509 LOG Filter timing: filter=com.adobe.granite.metrics.knownerrors.impl.ErrorLoggingComponentFilter, inner=1717, total=1722, outer=5
 148519 LOG Filter timing: filter=com.day.cq.wcm.core.impl.monitoring.PageComponentRequestFilter, inner=1722, total=1735, outer=13
 148527 LOG Filter timing: filter=com.day.cq.wcm.core.impl.WCMComponentFilter, inner=1735, total=2144, outer=409
 148534 LOG Filter timing: filter=com.day.cq.wcm.core.impl.page.PageLockFilter, inner=2144, total=2150, outer=6
 148543 LOG Filter timing: filter=com.day.cq.personalization.impl.TargetComponentFilter, inner=2150, total=2154, outer=4
 148832 TIMER_END{140080,BundledScriptServlet (/libs/granite/ui/components/shell/page/page.jsp)#0}

You see the component filters, but then, after the TIMER_START line for the page.jsp (check the trailing timer number “#0”; every timer has a unique ID!), you see the inclusion of a new resource. For this resource the render script is resolved again, and instead of the ComponentFilters the IncludeFilters are called, but in the majority of cases the lists of filters are identical. Depending on the resource structure and the scripts, the rendering tree can get really deep. But eventually you can see that the rendering of the page.jsp is completed; you can easily find it by looking for the respective timer ID.

Equipped with this knowledge you can now easily dig into the page rendering process and see which resources and resource types are part of rendering a page. And if you are interested in the bottlenecks of the page rendering process, you can check the TIMER_END lines, which include both the render script and the time in microseconds it took to render it (be aware that this time also includes the time needed to render all scripts invoked from this render script).

But the really cool part is that this is extensible. Via the RequestProgressTracker you can easily write your own LOG statements, start timers etc. So if you want to debug requests to better understand the timing, you can easily use something like this:

slingRequest.getRequestProgressTracker().log("Checkpoint A");

And then you can find this log message in this screen when this component is rendered. You can use it to output useful (debugging) information or just use its timestamp to identify performance problems. This can be superior to normal logging (to a logfile), because you can leave these statements in production code and they won’t pollute the log files. You just need to have access to the OSGI webconsole, search for the request you are interested in and check the rendering process.
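
The RequestProgressTracker API also offers timer methods (startTimer / logTimer), so you can time a specific section of your own code and have it show up as its own TIMER_START/TIMER_END pair. A small sketch, assuming a hypothetical component helper and an expensive lookup of your own:

import java.util.Arrays;
import java.util.List;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.request.RequestProgressTracker;

// Sketch: wrap an expensive piece of component logic with a custom timer, so that
// it shows up as its own TIMER_START/TIMER_END pair in the Recent Requests console.
public class TeaserRenderingHelper {

    public void renderTeasers(SlingHttpServletRequest slingRequest) {
        RequestProgressTracker tracker = slingRequest.getRequestProgressTracker();
        tracker.startTimer("myComponent#loadTeaserData");
        List<String> teasers = loadTeaserData();
        tracker.logTimer("myComponent#loadTeaserData");
        tracker.log("myComponent loaded " + teasers.size() + " teasers");
    }

    // stands in for an expensive lookup in the real component (hypothetical)
    private List<String> loadTeaserData() {
        return Arrays.asList("teaser-a", "teaser-b");
    }
}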

And if you are interested, you can also get all entries programmatically and do whatever you like with them. For example you can write a (request-level) filter which first calls the next filter and afterwards logs all entries of the RequestProgressTracker to the logfile if the request processing took more than 1 second. A minimal sketch of such a filter follows below.
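
A minimal sketch of such a filter, assuming a Sling/OSGi setup; the 1-second threshold and the logging target are arbitrary choices:

import java.io.IOException;
import java.util.Iterator;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import org.apache.sling.api.SlingHttpServletRequest;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of a request-level filter that dumps the RequestProgressTracker
// entries to the log whenever request processing took longer than 1 second.
@Component(
    service = Filter.class,
    property = {"sling.filter.scope=REQUEST"})
public class SlowRequestDumpFilter implements Filter {

    private static final Logger LOG = LoggerFactory.getLogger(SlowRequestDumpFilter.class);
    private static final long THRESHOLD_MS = 1000;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        long start = System.currentTimeMillis();
        try {
            chain.doFilter(request, response); // let the request render first
        } finally {
            long duration = System.currentTimeMillis() - start;
            if (duration > THRESHOLD_MS && request instanceof SlingHttpServletRequest) {
                SlingHttpServletRequest slingRequest = (SlingHttpServletRequest) request;
                LOG.warn("Slow request ({} ms): {}", duration, slingRequest.getRequestURI());
                Iterator<String> messages = slingRequest.getRequestProgressTracker().getMessages();
                while (messages.hasNext()) {
                    LOG.warn("  {}", messages.next());
                }
            }
        }
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}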

The RequestProgressTracker plus the “Recent Requests” screen of the OSGI webconsole are a really cool combination: they help you understand the inner workings of Sling request processing, and they are a huge help when analyzing the performance of request processing.

I hope that this technical deep dive into the Sling page rendering process was helpful for you, and that you are able to spot many interesting aspects of an AEM system just by using this tool. If you have questions, please leave me a comment below.

by Jörg at November 01, 2021 03:22 PM

October 25, 2021

CQ5 Blog - Inside Solutions

Cloud Manager: Deploy and Operate AEM Cloud Service

Cloud Manager is an integral part of Adobe’s AEM as a Cloud Service (AEMaaCS) offering. 

Cloud Manager provides a fully-featured Continuous Integration / Continuous Development (CI/CD) pipeline enabling organisations to build, test, and deploy their AEM applications to the Adobe Cloud automatically. 

Hosting, operation, and scaling of Adobe Experience Manager are all managed by Adobe in the background, including an SLA. Maintenance of Cloud Manager and upgrading of AEM are taken care of by Adobe as well.

Cloud Manager benefits smaller projects with its extensive out-of-the-box build pipeline and stable deployments that promise zero downtime. Larger projects can free up resources in their DevOps and operations teams, which no longer have to focus on the intricacies of deploying and hosting AEM.

Lastly, overall system performance, stability and availability are improved, since no one knows how to build and host Adobe Experience Manager better than Adobe.

Overall, Cloud Manager is a great cost and time saver due to a lot of functionality which is provided and maintained by Adobe. 

We will explore and highlight the main functionalities so that you understand the tool and the reasons why we, at One Inside, think it’s so great.

What is Adobe Cloud Manager?

Adobe Cloud Manager allows self-managed deployments and operation of AEM Cloud Service.

It consists of a CI/CD pipeline, various environments, code repositories and further information about the system like logs or SLA reports.

Log in to Adobe Cloud Manager

To log in to Cloud Manager, go to experience.adobe.com (Experience Manager / Launch Cloud Manager).

If you do not have access, either your company does not yet have the AEM Cloud Service licenses or your account is lacking the required permissions.

You can find the most important information about the environments and pipelines on the start page and have access to more detailed information from there.

Cloud Manager Core Features

Cloud Manager has the following main features:

  • Self-Service web interface for the deployments and AEM operation
  • Cloud Manager functionality can also be accessed programmatically via API
  • Fully automated and configurable CI/CD Pipeline
  • Provisioning and configuration of productive and test environments
  • Adobe hosted git repositories
  • Automated quality assurance of the application (code quality, security and performance)
  • Autoscaling of both AEM author and publish instances
  • Multitier Caching Architecture including global Akamai Cache

Benefits and Disadvantages of Adobe Cloud Manager

These outlined core features result in a great set of benefits when using Cloud Manager:

  • Performance – Great performance can be expected, thanks to the global CDN and the possibility to run the AEM servers in one of the globally distributed Azure datacenters (support for AWS is on the roadmap). AEM hosting by Adobe helps guarantee optimal AEM performance and continuous improvements.
  • Autoscaling – When subjected to unusually high load, Cloud Manager detects the need for additional resources and automatically brings additional instances online via autoscaling. This works for both authoring and publishing instances.
  • Confidence in deployments – since the same pipeline is executed by all AEMaaCS customers, Adobe can optimise the reliability of the pipeline and deployments. After ten successful deployments, the customer can usually independently carry out deployments without involving Adobe Customer Service at all.
  • Extensibility of the pipeline – Cloud Manager is integrated into the Experience Cloud APIs and is therefore easy to connect or integrate with other or custom services.
  • Backup – Cloud Manager automatically creates a backup before every release. If any issue is noticed after the deployment, the release can be rolled back with the press of a button. The production instances are backed up as well (24h point-in-time recovery, up to 7 days with Adobe-defined timestamps).
  • “Zero” Downtime – Adobe has a lot of experience in hosting AEM for its large customer base. This allows Adobe to achieve great availability and you can expect basically zero downtime. Need proof? Adobe’s SLA of 99.9%.
  • Very low initial setup time – Basically a “1 click setup” for environments and the default pipeline. Certificates and domains are also set up quickly via the UI.
  • Very low maintenance and operation costs – Adobe takes care of maintaining the pipeline, upgrading AEM, providing security fixes for the OS, and operating all systems (Cloud Manager, Apache / Dispatcher, AEM instances, Akamai CDN etc).
  • Always up to date AEM – Adobe releases new versions almost weekly or even more frequently if there is a very urgent security fix. The moment the new features or fixes are available you will see them on your AEM instances! Your security team will be very happy to hear that.
    For on-premises versions, new features will only be available approximately 6 months after their release (except security fixes which usually come with service packs).

As always there are some drawbacks, but the benefits far outweigh them and there are ways to work around them:

  • Less flexibility – The pipeline and architecture are, to a certain degree, predefined. For example, it’s no longer possible to install additional OS-level applications or use a different caching solution like Varnish. The Adobe I/O Runtime or an external environment has to be used to provide additional services instead.
  • Limited customisability of AEM – It’s no longer possible to extend AEM freely. Some customisation is still possible, especially if the developers get creative, but not everything. Since this will be a win for maintainability, this could almost be regarded as an advantage.
  • Less control – Since Adobe takes responsibility for running the services and provides an SLA, there are certain limitations.
    For example, it’s not possible to log in on the publish instances, no admin password is available, and access to the Felix console is blocked. Especially the last one is a concern for AEM developers and hinders the ability to debug issues on production systems.
    These issues are somewhat alleviated since Cloud Manager allows you to extract certain information like log files. On the authoring instance, some tools are still accessible (e.g. /crx/de). Cloud Manager also provides additional utilities, like viewing the bundle status (those will probably be expanded in the future).

Release a new version of AEM in the cloud (CI/CD pipeline)

The Cloud Manager CI/CD pipeline takes the code in the repository, builds it, and deploys the resulting application to your production Adobe Experience Manager environment.

There are two types of environments; each environment consists of the full AEM stack (author, publish, dispatcher). A single pair called “Production” consists of the “Stage” and “Prod” environments.

Every “Production” deployment first goes to “Stage” where it is analysed and can be further inspected manually before being approved and deployed to “Production”.

All other environments are referred to as “Non-Production” and are used as test environments. New test environments can be provisioned as needed.

The pipeline runs are shown in the UI, and additional logs of each execution step can be downloaded to debug any issues. The pipeline consists of several steps, explained as follows:

1 – Code in Adobe Git

Cloud Manager provides git repositories hosted by Adobe. The pipeline can only fetch code from those repositories. The pipeline can be triggered by a commit on a certain branch or triggered manually.

To use an external non-Adobe repository, the changes have to be synchronised with the Adobe repository (this can easily be automated with various CI/CD tools like Github Actions, Bitbucket Pipelines or Jenkins).

2 – Build Code and Unit Tests

The project is built by executing the Maven build, including executing the unit tests. The result is the “Release” build.

3 – Code Scanning

This inspects the whole code base and applies static code analysis.

There are several rule sets for different topics like test coverage, potential security issues or maintainability in the context of AEM. Each topic is rated and a recommendation is given by Adobe. 

These recommendations are set up quite reasonably at 50% coverage. The goal of each project should be to reach those numbers.

4 – Deploy to Stage

The code is now deployed to the Stage environment. Internally, Cloud Manager creates a copy of the whole Stage environment before deploying to it. If there is any issue or failure with the deployment, Stage can be reverted to the previous state.

5 – Stage Testing: Security Tests, Performance & Load Tests, UI Tests

Various tests are executed by default by Adobe to verify that AEM itself is still working as expected, and some tools measure the performance of the website. Additionally, custom tests can be added to further test the integration of the application into AEM or the website itself.

Deploy to Production

If not disabled, the pipeline halts at this point before deploying to Production. This allows you to inspect the performance test and code audit results.

Any further manual testing can now be done on Stage. If everything looks good, the build can be approved and will be deployed to Production. Otherwise, the build is cancelled and reverted.

Testing with Cloud Manager

Four different categories of tests are executed in the pipeline.

Unit Tests

These are executed before the deployment step. They test the application on a code level and in isolation.

Since unit tests are pretty much industry standard, there is mainly one interesting question: how high should the coverage be?

There are different viewpoints on this topic. 

Adobe is quite defensive, or realistic, with its expectation of 50%. From our perspective, for web-based projects and especially content-focused logic, the integration tests discussed below provide a lot of value.

For each project it has to be decided how much time is spent on each type of test.
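
Purely as an illustration of what such a unit test can look like in an AEM project, here is a hedged sketch using the wcm.io AEM Mocks library; TitleModel is a hypothetical Sling Model of the project that exposes the resource's jcr:title via getTitle():

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import io.wcm.testing.mock.aem.junit5.AemContext;
import io.wcm.testing.mock.aem.junit5.AemContextExtension;
import org.apache.sling.api.resource.Resource;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

// Sketch of a unit test based on the wcm.io AEM Mocks library.
// TitleModel is a hypothetical Sling Model of the project under test.
@ExtendWith(AemContextExtension.class)
class TitleModelTest {

    private final AemContext context = new AemContext();

    @Test
    void returnsTitleFromResource() {
        // build an in-memory resource the model is adapted from
        Resource resource = context.create().resource("/content/sample",
                "jcr:title", "Hello AEM");
        context.addModelsForClasses(TitleModel.class);

        TitleModel model = resource.adaptTo(TitleModel.class);
        assertNotNull(model);
        assertEquals("Hello AEM", model.getTitle());
    }
}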

Code Scanning

This is executed before the deployment step. It inspects the code itself via static analysis and produces various metrics indicating the quality of the code base, based on a set of code quality rules defined by Adobe.

Internally, SonarQube is used to analyse the code. Additionally, OakPAL scans the built content package to catch various potential issues that could break the deployment.

There are three categories of criticality:

  • Critical: Pipeline stops immediately
  • Important: Pipeline pauses, can be manually continued if the issue is not urgent for the current release and fixed later
  • Info: Purely informational

There are several types of ratings, each with different failure thresholds (check the code quality rules documentation for details).

Over 100 SonarQube rules are applied. If a specific issue is a false positive and should be ignored, an Excel sheet from the link above can be downloaded to look up the rule key.

The key can then be used in the Java code to make sure SonarQube will skip the warning. An example for "Credentials should not be hard-coded" is "squid:S2068".

In the Java code add the annotation: @SuppressWarnings("squid:S2068")
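
For illustration, the suppression could then look like this in the code (the class, method and string value are placeholders for the false-positive case, not real credentials):

public class PipelineDocsSamples {

    // Suppresses the SonarQube rule "Credentials should not be hard-coded" for this
    // method only; use it sparingly and only for confirmed false positives.
    @SuppressWarnings("squid:S2068")
    public String documentationSnippet() {
        // flagged by squid:S2068 although this is only an illustrative string, not a real credential
        return "user=demo&password=demo";
    }
}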

Experience Audit (Performance Testing)

This executes the well-known Google Lighthouse tool, the same one that is available in the Chrome DevTools.

It indicates changes compared to the last release. It’s also possible to inspect and download the full Lighthouse report. 

This is a great feature to have, especially because the tests are executed on every run and in an isolated, repeatable environment. 

What is missing is a view of the audit results over time; that would be really helpful to track performance.

Product Functional Testing, Custom Functional Testing, Custom UI Testing

There are several integration tests.

Adobe provides a set of tests to verify the basic functionality of AEM, for example, if content can still be replicated from Author to Publish. Adobe might add additional tests in the future as well – who doesn’t like free integration tests!

In addition, custom tests can be written to further verify the functionality of AEM. This is especially useful if there are custom AEM modifications.

UI tests are intended to test the website itself on the publish instance via the dispatcher. The idea is to provide test content along with the code, which will be deployed to Stage, and then execute the integration tests in various browsers to verify functionality.

There is a default setup for integration tests in the Maven archetype, based on webdriver.io and Selenium.

Docker is used to build and execute the integration tests in the cloud. It’s possible to modify those containers to adjust the test setup. Important to note is that UI tests are disabled by default.

Follow the documentation for “Customer Opt-In” to understand how to enable them.
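
The archetype's default UI tests are written in JavaScript (webdriver.io); purely to illustrate the idea, here is a hedged sketch of an equivalent test in Java with Selenium. The base URL environment variable, page path and assertion are placeholders, not part of the archetype:

import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// Hedged sketch of a UI test against the published site via the dispatcher.
// The base URL is assumed to be provided by the test runner as an environment
// variable; the page path and title check are placeholders.
class HomePageUiTest {

    private WebDriver driver;

    @BeforeEach
    void setUp() {
        driver = new ChromeDriver();
    }

    @AfterEach
    void tearDown() {
        driver.quit();
    }

    @Test
    void homePageRenders() {
        String baseUrl = System.getenv().getOrDefault("UI_TEST_BASE_URL", "http://localhost:8080");
        driver.get(baseUrl + "/content/wknd/us/en.html");
        assertTrue(driver.getTitle() != null && !driver.getTitle().isEmpty(),
                "page should render with a non-empty <title>");
    }
}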

Team and Roles

Adobe predefines a set of roles, each with its corresponding permission profile and access restrictions that define who is allowed to run or modify the pipeline and other features.

Most of those roles probably match the existing roles of a project.

The most important ones are Business Owner, Deployment Manager, Program Manager, Developer. Content Authors do not have to interact with Cloud Manager. Permissions for Authors are set up in AEM itself.

We believe it’s not necessary to use this many roles for most projects, and if you can trust your developers, it’s probably enough if the lead is “Business Owner” and the developers are “Deployment Manager”.

Have a look at the user permission table to decide what makes sense for your project.

A notable role that is missing is “DevOps” or “Operations”. Deployment Manager is what comes closest to a DevOps person, since that role is allowed to edit the pipeline; however, any experienced developer should be able to configure Cloud Manager.

If there are integrations of Cloud Manager planned, a person with DevOps experience might become helpful.

Integrate Cloud Manager programmatically in your current Solution (Advanced topic)

Adobe is aware that every customer has its unique application landscape. There are various ways to integrate AEM Cloud Service and Cloud Manager.

Adobe Cloud Manager API

All capabilities available in the UI can also be accessed programmatically with the Adobe Cloud Manager API.

This allows integrating the AEM Cloud Service pipeline into an existing custom CI/CD infrastructure and also enhancing the pipeline with additional custom features.

Webhooks are also supported by the API, which is a great way to integrate with other services.

Some example use cases are triggering the pipeline from an external action, monitoring pipeline runs and sending notifications (e.g. to a Slack channel), executing additional tests externally, or adding actions after the deployment, like clearing a cache. A sketch of the notification case follows below.
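
As a sketch of the notification use case: a tiny standalone HTTP handler that receives a webhook call (e.g. registered for pipeline events) and forwards a note to a Slack incoming webhook. This is an assumption-laden illustration; it does not parse the real Cloud Manager event format, and the endpoint path, port and payload handling are placeholders.

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch only: receives a webhook POST and forwards a note to a Slack incoming
// webhook. The event body is passed through verbatim; parsing the real
// Cloud Manager event format is intentionally left out here.
public class PipelineNotifier {

    private static final String SLACK_WEBHOOK_URL =
            System.getenv("SLACK_WEBHOOK_URL"); // your Slack incoming webhook

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/cm-webhook", exchange -> {
            String event = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            notifySlack("Cloud Manager event received:\n" + event);
            exchange.sendResponseHeaders(204, -1); // acknowledge the webhook
            exchange.close();
        });
        server.start();
    }

    private static void notifySlack(String text) {
        String payload = "{\"text\": \"" + text.replace("\"", "'").replace("\n", "\\n") + "\"}";
        HttpRequest request = HttpRequest.newBuilder(URI.create(SLACK_WEBHOOK_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        HttpClient.newHttpClient()
                .sendAsync(request, HttpResponse.BodyHandlers.discarding());
    }
}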

Identity Management System (IMS) integration

Provisioning and access control for Cloud Manager and AEM can be handled manually, but also via integration with an external IMS.

Synchronising user accounts with group permissions is supported to automate provisioning. SAML with an external IDP is supported to enable Single Sign On.

Firewall

By default, the AEM Cloud Service instances do not have access to external systems due to security reasons. Simple IP whitelisting can be configured directly in Cloud Manager. 

For anything more complex, a solution can be worked out with an Adobe Cloud Manager engineer.

Forwarding Splunk logs

AEM Cloud Service internally uses Splunk to aggregate the logs. 

Forwarding of the Splunk logs to a custom Splunk instance can be requested via support. This is a great way to extract as much information from the system as possible.

Conclusion

As we can see, Adobe Cloud Manager provides out of the box enterprise-grade CI/CD and hosting of AEM applications in the cloud. 

There is a big cost-saving potential for both the initial setup and maintenance, thanks to the simple configuration in the web UI and all the operational effort being taken care of by Adobe.

Combined with the powerful capability of AEM itself and the Adobe Experience Cloud as a whole, AEM Cloud Service is the best cloud-native CMS offering on the market.

This article is part of a series of content about AEM Cloud Service, where we explain how to move to AEM Cloud.

Learn how to design an AEM website with Core Components. Finally, once your website is live, start optimising it and improving the customer experience.

Basil Kohler

AEM Architect

The post Cloud Manager: Deploy and Operate AEM Cloud Service appeared first on One Inside.

by Samuel Schmitt at October 25, 2021 08:31 AM

October 17, 2021

Things on a content management system - Jörg Hoh

AEM micro-optimization (part 4) – define allowed templates

This time I want to discuss a different type of micro-optimization. It’s not something you as a developer can implement in your code; it’s rather a question of application design, which has some surprising impact. I came across it when I recently investigated poor performance in the Siteadmin navigation. And although I did this investigation in AEM as a Cloud Service, the logic on AEM 6.5 behaves the same way.

When you click through your pages in the Siteadmin navigation, AEM collects a lot of information about pages and folders to display them in the proper context. For example, when you click on a page with child pages, it collects information about which actions should be displayed if a specific child node is selected (copy, paste, publish, …).

An important piece of information is whether the “Create page” action should be made available. And that’s the thing I want to outline in this article.

Screenshot: “Create” dialog

Assuming that you have the required write permissions on that folder, the most important question is whether any templates are allowed to be created as children of the current page. The logic is described in the documentation and is quite complex.

In short:

  • On the content the template must be allowed (using the cq:allowedTemplates property, if present), AND
  • The template must be allowed to be used as a child page of the current page

Both conditions must be met for a template to be eligible to be used as the source for a new page. To display the “Page” entry it’s sufficient if at least 1 template is allowed.
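
To illustrate the first condition, here is a simplified sketch (not the actual AEM implementation) of how a check against cq:allowedTemplates could look; it assumes the documented behaviour that the property is inherited from ancestor pages and that its values are regular expressions matched against the template path:

import java.util.regex.Pattern;
import com.day.cq.wcm.api.Page;

// Simplified sketch (not the actual AEM implementation): walk up the page tree,
// take the first cq:allowedTemplates property found, and treat its values as
// regular expressions matched against the template path.
public final class AllowedTemplatesCheck {

    private AllowedTemplatesCheck() { }

    public static boolean isTemplateAllowed(Page page, String templatePath) {
        for (Page current = page; current != null; current = current.getParent()) {
            String[] allowed = current.getProperties().get("cq:allowedTemplates", String[].class);
            if (allowed != null && allowed.length > 0) {
                for (String expression : allowed) {
                    if (Pattern.matches(expression, templatePath)) {
                        return true;
                    }
                }
                return false; // property found, but no pattern matched
            }
        }
        return true; // no restriction found on the page or its ancestors
    }
}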

Now let’s think about the runtime performance of this check, and that’s mostly determined by the total number of templates in the system. AEM determines all templates by this JCR query:

//jcr:content/element(*,cq:Template)

And that query returns 92 results on my local SDK instance with WKND installed (a small sketch of how to run this query yourself follows after the list below). If we look a bit more closely at the results, we can distinguish 3 different types of templates:

  • Static templates
  • Editable templates
  • Content Fragment models
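
If you want to verify that number on your own instance, you can run the same query programmatically. A small sketch via the JCR API, assuming you already obtained a ResourceResolver (e.g. in a servlet or a test):

import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryResult;
import org.apache.sling.api.resource.ResourceResolver;

// Sketch: count all templates on the instance with the same XPath query AEM uses.
public class TemplateCounter {

    @SuppressWarnings("deprecation") // Query.XPATH is deprecated but still supported
    public long countTemplates(ResourceResolver resolver) throws Exception {
        Session session = resolver.adaptTo(Session.class);
        QueryResult result = session.getWorkspace().getQueryManager()
                .createQuery("//jcr:content/element(*,cq:Template)", Query.XPATH)
                .execute();
        return result.getNodes().getSize(); // may be -1 if the size is not known upfront
    }
}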

So depending on your use case it’s easy to end up with hundreds of templates, and not all of them are applicable at the location you are currently in. In fact, typically just very few templates can be used to create a page there. That means that the check most likely needs to iterate a lot before it eventually encounters a template which is a match.

Let’s come back to the evaluation of whether that entry should be displayed. If you have defined the cq:allowedTemplates property on the page or its ancestors, it’s sufficient to check the templates listed there. Typically it’s just a handful of templates, and it’s very likely that you find a “hit” early on, which immediately terminates this check with a positive result. I want to explicitly mention that not every template listed can be created here, because there are also other constraints (e.g. the parent template must be of a certain type) which must match.

If template A is allowed to be used below /content/wknd/en, then we just need to check this single template A to get that hit. We don’t care where it sits in the list of templates returned by the above query, because we know exactly which one(s) to look at.

If that property is not present, AEM needs to go through all templates and check the conditions for each and every one, until it finds a positive result. The list of templates is identical to the order in which the templates are returned from the JCR query, which means the order is not deterministic. It is also not possible to order the result in a helpful way, because the semantics of our check (which includes regular expressions) cannot be expressed as part of the JCR query.

So you are very lucky if the JCR query returns a matching template already at position 1 of the list, but that’s very unlikely. Typically you need to iterate tens of templates to get a hit.

So, what’s the impact of this iteration and these checks on performance? In a synthetic check with 200 templates, where I did not have any match, it took around 3-5 ms to iterate over and check all of the results.

You might say “I really don’t feel a 3-5 ms delay”, but when the list view in Siteadmin performs this check for up to 40 pages in a single request, it adds up to a 120-200 millisecond difference. And that is a significant delay for requests where bad performance is immediately visible, especially if there’s a simple way to mitigate it.

And for that reason I recommend that you provide cq:allowedTemplates properties on your content structure. In many cases it’s possible, and it will speed up the Siteadmin navigation performance.

And for those who cannot change that: I am currently working on changing the logic to speed up the processing for the cases where no cq:allowedTemplates property is applicable. And if you are on AEM as a Cloud Service, you’ll get this improvement automatically.

by Jörg at October 17, 2021 02:13 PM

September 21, 2021

CQ5 Blog - Inside Solutions

5 tips for maintaining and improving your chatbot

Chatbot solutions are a flexible approach to engaging with customers in a variety of situations.

They make it easier for your customers to access your brand. They also help you gain deep insights into your customers' needs.

However, a chatbot is only successful if it manages to answer customer requests.

To reach this goal, optimization after go-live is the decisive factor.

In this post we explain the chatbot lifecycle and show you five ways to improve your solution over time.

Finally, we explain the effort your marketing team has to invest so that your solution is successful.

What does the lifecycle of a chatbot look like?

We divide the chatbot lifecycle into the following five phases:

  1. From idea to roadmap: First, you need to understand what chatbot solutions can offer and define your vision.
  2. Turning the roadmap into a plan: Once your vision and strategy are defined, you create a concrete plan for developing your chatbot.
  3. Building the chatbot and conversations: Implementation, content creation, conversation design, and the start of training your system.
  4. From training to go-live: You need to test your system, prepare the go-live, and bring your chatbot to life.
  5. Scaling and optimizing the chatbot experience: Going live is only the beginning for your chatbot. Now it is time to grow and optimize! Customers are impatient, so don't plan to optimize your chatbot at some point in the future; do it continuously, from day one.

We will focus on the last step of the lifecycle, which is often misunderstood or neglected.

Do I really need to optimize an AI chatbot?

Chatbots use AI to communicate.

So why is continuous optimization necessary? Current systems offer a certain flexibility in understanding the user's intent.

However, if the language deviates too much from the expected (trained) phrases and/or the intents are unknown, the system can reach its limits.

Chatbot systems are not fully self-learning. You have to support them so that they keep evolving. The chatbot's answers are not yet driven by AI; instead, they typically present hand-written content that is optimized for the expected/desired user journey.

The following difficulties will show up for every chatbot over time, no matter how well you have planned it:

  • The user's language is unexpected, and therefore a wrong intent is detected.
  • Customers have questions about your company that you did not anticipate.
  • You offer new services for which the chatbot has not been trained.
  • Your chatbot system is supposed to become smarter by integrating additional systems, e.g. a CRM or PIM.
  • The world we live in changes continuously. As a company you therefore have to adapt as well in order to keep offering your customers a contemporary and engaging experience, even if your range of services has not changed.

So what exactly do you have to do to keep your chatbot up to date?

How do you optimize a chatbot?

Continuously optimizing a chatbot is crucial. We have therefore listed a few activities that your marketing team should take care of, with support from your IT team.

1 – Understand more and more questions

You probably started your chatbot project with a limited base of language understanding. To understand your customers' needs, regardless of whether you answer them or not, you have to add more and more intents over time.

In short: you have to teach your system new phrases.

2 – Keep your content up to date

Just as you optimize the content of your website over time, the same is required for your chatbot. Change wordings, influence the user journey, or improve linked assets.

In short: you have to optimize the chatbot answers continuously.

3 – Learn from your customers

When using the chatbot, your customers give you deep insights into their needs. Take advantage of that.

To do so, observe how they interact with your system and which questions they ask, especially about topics that you do not yet answer with your chatbot or even on your website.

In short: you have to analyze the chatbot interactions to gain insights.

4 – Verify the success of your chatbot

Chatbots support your customers and can be a great investment for your company.

However, do not blindly believe that your chatbot is successful. Regularly verify whether the intended goals are being reached, and try to collect additional data in order to keep setting new goals.

Relate your success factors to other systems and identify cross-system benefits.

In short: you have to monitor the success of your chatbot.

5 – Take your chatbot to the next level

You probably limited the list of features for your first chatbot.

After go-live and the first experience with conversational marketing, it is exactly the right time to build additional features into your chatbot.

For example, you could integrate additional business information systems, try out new UX/UI concepts, or close feature gaps.

You should also stay up to date on the features your competitors' chatbots offer.

In short: you have to enrich your chatbot with features in order to keep benefiting from it and to stay attractive for customers.

Who optimizes the chatbot, and how often?

Successful chatbot optimization requires a suitable process.

Define the recurring tasks and assign responsibilities. Plan optimization reviews to ensure long-term success.

We have already described the main tasks. They include:

  • Extending the training data for existing intents
  • Adding new intents for unexpected topics
  • Improving the flow and content of the answers

All of these tasks are primarily content-oriented. Maintenance and optimization should therefore be driven by the business stakeholder and their team.

The team members need to be trained in writing content for chatbots, and they need to understand the basic principles of NLU (Natural Language Understanding) in order to improve the training data.

Since these tasks have a direct impact on customers, it is important to perform them regularly. For example, improve and add new intents weekly and update content monthly.

Besides chatbot-specific maintenance tasks, other optimizations are also required: bug fixes and new features.

These tasks are usually handled in collaboration with the development team. The business stakeholders know what they want, and the developers know how to implement it. Don't just wait until you need new features; plan regular improvements.

Finally, make sure your system stays technologically up to date. APIs may change and offer new capabilities.

In summary, optimization has to be planned, responsibilities assigned, and tasks distributed among the team members.

Get the most out of your chatbot

A chatbot is a great instrument for establishing an additional communication channel for your customers. It is the ideal complement to existing channels and helps you understand your customers as well as possible.

Moreover, chatbots force you to rethink your processes and goals for service and marketing tasks. Take the opportunity and improve your service with a conversational communication channel.

We have summarized all aspects of the journey to a successful chatbot for you (whitepaper).

From the first vision, through planning, to go-live: learn more about chatbots and how they benefit your customers and your company.

Clemens Blumer

Senior Software Architect

The post 5 tips for maintaining and improving your chatbot appeared first on One Inside.

by Samuel Schmitt at September 21, 2021 02:15 PM

5 tips to help you optimize your chatbot

Chatbot solutions are a flexible approach to communicating with your customers in many situations.

They make it easy for your customers to access your brand. They also help you better understand their needs.

However, a chatbot is only effective if it manages to answer customer requests.

To reach this goal, optimization after go-live is essential.

In this article, we will explain the lifecycle of a chatbot and the five measures you can take to improve your solution over time.

Finally, we will discuss the effort your marketing team has to put in so that your solution remains effective over time.

What is the lifecycle of a chatbot?

We have divided the chatbot lifecycle into five phases:

  1. From ideas to roadmap: First, you need to understand what chatbot solutions have to offer and define your vision.
  2. Turning the roadmap into a plan: Once your vision and strategy are defined, you draw up a concrete plan to develop your chatbot.
  3. Building the chatbot and conversations: Implement, create content, design conversations, and start training your system.
  4. From training to go-live: You need to test your system, prepare the launch, and bring your chatbot to life.
  5. Scaling and optimizing the chatbot experience: Going live is only the first step for your chatbot. Now it is time to grow it and optimize it! Customers are impatient; don't plan to optimize your chatbot later, do it continuously, from day one.

We will focus on the last step of the lifecycle, which is often misunderstood or set aside.

Do I really need to optimize an AI-based chatbot?

Chatbots rely on AI to work.

So why is continuous optimization necessary? Current systems offer flexibility in understanding the user's intent.

However, if the language is too far from the expected phrases (learned via the training data) and/or the intent is unknown, the system reaches its limits.

Chatbot systems do not learn entirely on their own. You have to help them improve. The chatbot's answers are not directly AI-based; they are usually based on hand-crafted content that is optimized for the expected/desired user journey.

The following challenges will arise for every chatbot over time, even if you have planned properly:

  • The user's language is not what was expected and, as a result, the intent is misunderstood.
  • Customers ask questions about your company that you did not expect.
  • You offer new services for which the chatbot has not been trained.
  • Your chatbot system should become smarter through the integration of other systems, for example a CRM or a PIM.
  • The world we live in evolves over time. This forces you to adapt in order to keep offering a pleasant, up-to-date experience to your customers, even if your services stay the same.

What are the concrete steps to keep your chatbot up to date?

How do you optimize a chatbot?

Optimizing an AI chatbot is crucial. We have therefore listed some activities your marketing team will have to take on, partly with the help of your IT team.

1 – Understand more and more questions

You probably started with a limited natural language understanding base. To understand all of your customers' needs, whether you answer them or not, you will have to add more and more intents over time.

In short: you have to teach your system new phrases.

2 – Keep your content up to date

Just as you optimize your website content over time, the same applies to your chatbot. Change the wording of the answers, influence the user journey, or improve the linked assets.

In short: you have to optimize the chatbot's answers continuously.

3 – Learn from your customers

Your customers provide valuable information about their needs. Take advantage of it.

To do so, analyze how they interact with your system and the questions they ask, particularly on topics you do not yet answer with your chatbot or even on your website.

In short: you have to analyze chatbot interactions to gain insights.

4 – Confirm the success of your chatbot

Chatbots are remarkable. They help your customers and can drive return on investment.

However, don't just assume your chatbot is effective. Determine whether the intended goals are being met and try to collect additional data to identify further goals.

Relate your success factors to other systems and identify cross-system benefits.

In short: you have to monitor the success of your chatbot.

5 – Take your chatbot to the next level

You probably restricted the feature list of your initial chatbot.

After go-live and a first experience with conversational marketing, it is the ideal time to add extra features to your chatbot.

For example, you can integrate other business information systems, try new UX/UI concepts, or close feature gaps.

You should also make sure to stay up to date with the features offered by your competitors' chatbots.

In short: you have to enrich your chatbot with features in order to keep benefiting from it and to remain attractive to customers.

Who optimizes the chatbot, and how often?

Successful chatbot optimization requires putting a suitable process in place.

Identify the recurring tasks and assign responsibilities. Review the optimizations regularly to ensure long-term success.

We have described the main tasks above; they include:

  • enriching the training data for existing intents;
  • adding new intents for unforeseen topics;
  • improving the flow and content of the answers.

All of these tasks are primarily content-focused. Maintenance should therefore be carried out by the business stakeholders and the marketing teams.

The marketing team members need to be trained in writing content for chatbots and to understand the basic principles of NLU (natural language understanding) in order to improve the training data.

Since these tasks have a direct impact on customers, it is important to carry them out regularly. For example, improve and add intents once a week and update the content every month.

Besides chatbot-specific maintenance tasks, other optimizations are also needed, such as bug fixes and new features.

These tasks are usually carried out in collaboration with the development team. The business owners know what they want, the developers know how to implement it. Don't just wait until you need features; plan regular improvements.

Finally, make sure your system's technology is always up to date. APIs can change and provide new capabilities.

In summary, optimization has to be planned, responsibilities assigned, and tasks distributed among the team members.

Get the most out of your chatbot

A chatbot is an excellent tool for opening a new channel for your customers. It is not meant to replace another tool, but represents an additional opportunity to offer the best service.

It gives you an in-depth understanding of customer needs.

Moreover, it forces you to rethink your processes and goals for service and marketing tasks. Seize this opportunity and improve your services with a conversational channel.

We have summarized all aspects of the journey to a successful chatbot here (whitepaper).

From design and planning through to go-live, you will learn more about chatbots and the benefits they can bring to your customers and your company.

Clemens Blumer

Senior Software Architect

The post 5 tips to help you optimize your chatbot appeared first on One Inside.

by Samuel Schmitt at September 21, 2021 02:07 PM