What legacy projects can teach you?

Last week I attended Udi Dahan’s ADSD course. At the moment it seems like I’ll need a few years to process and truly understand all the information he packed in there. But surprisingly enough, a few things sounded familiar. Better organized, more deeply analyzed, better phrased… Still, it sounded like he had been on my previous projects.

I’ve never worked on a greenfield project (yet?). But I have worked on a few that were “new, shiny, better rewrites”. Or at least that was the idea. In most cases the people who created them were not available or had forgotten a lot of stuff. So we had to do a lot of detective work and navigate carefully whenever we tried to add a new feature. The challenge was not only figuring out how to get something done. The most fun part was trying to guess the most probable reason why something was done in a specific way. Was it laziness? An unknown requirement? The easiest approach? Personal preference? It’s a great way to learn how humans think and work.

Udi has quite an interesting perspective on “maintenance” work. He disagrees with the popular belief that it’s easy and requires fewer skills than working on a new project. He says that the belief most likely originates from the wrong metaphor. Maintaining a skyscraper or a house is indeed less challenging than designing and building one. But that’s not true for software.

The most important distinction is that software is never finished. And it is not meant to be. It’s finished only if it’s dead. Not only is maintenance not that different from greenfield work; extending an existing system is even more complex. There are many more things you can break. Users know the system and work with it. Whatever you do, it impacts them. You have to take that into account.

After he said that, I realized there are a few things that you are more likely to learn if you work on a legacy system. You might notice them on a greenfield project too, but I bet the lessons are more painful during maintenance. And pain boosts memory and pattern recognition.

Here’s my list:

1. Technology is just a tool. No matter how shiny, new, life-changing… It’s just a tool.

Every now and then people get enthusiastic about yet another fantastic idea. This time it’ll be different. We’ve learned a lot in the previous project. Said somebody on every project. Yet you’re working on the third reincarnation of the same system and you hate this codebase. And for good reasons. At some point you realize that it’s more complicated than just picking the right technology.

2. Context and why are more important than what and how.

Sometimes it’s not that important what exactly you decided on, but why. Best practices are not silver bullets; they need to be applied in a specific context. Domain models are great, but they are overkill if simple CRUD would be good enough. In a long-lived project you have many opportunities to see how great ideas work when used (with the best intentions) in the wrong context. You learn to think for yourself, even if you hear a recommendation from an expert you deeply respect. They make mistakes too.

3. Simplicity rules.

You get to see how bad humans are at predicting the future. The system is easy to extend in places that you never touch. Then you waste a week trying to understand an overcomplicated, overflexible, overgeneralized module just to add a mere two lines of code. You might even dare to think that the simplest, least flexible solution would have been better than that. It would be easier to replace it completely if necessary.

4. Consistency is worth more than having “the best” tool/design.

Said a person who’s had 4 different ORMs in one codebase… Just trust me on that one.

5. It’s all about habits.

A fair number of the issues we had could be, and eventually were, mitigated by developing good habits. Be a good scout, make a tiny improvement. Every day. It adds up.

The less intuitive part is that big leaps don’t work as well as small, boring changes, applied consistently on a daily basis. If you look carefully enough you might find remnants of a few revolutions in your codebase. And remember, you won’t have more time to fix it later. Either do it now or never. Sticky notes get lost.

6. Bad code is not (only) a technical issue.

Often this is a result of (mis)management, time pressures, bad planning, people not speaking up… Even lack of sufficient technical skills might be tied back to the “soft side of things”. Why can’t you attract and/or keep more experienced people? Do you give them what they need for optimal work? Do you know how to judge people’s skills? Do you keep each other accountable? Do you let the loudest person make significant decisions even if they don’t have anything to back them up? Who deals with their consequences?

The less visible problem is a lack of trust between the “business” and “technical” parts of the organization. If the business believes you know what you’re doing, they’ll give you what you need if you ask for it. If not… Well, life is hard.

7. It’s (almost?) never easier to “just rewrite it”.

You might discover that the “new feature” you’ve just estimated was implemented before you joined the company. For some reason it hasn’t been widely used though. Another day you realize that the bug you’re fixing has been in production for the last 5 years. How come nobody noticed? So much for “just make it work like the old system”.

If you have some more lessons to add, please share them!

A (very) quick intro to Complex Event Processing

Contrary to what its name suggests, Complex Event Processing (CEP) is conceptually a simple idea. It’s also very easy to grasp on an intuitive level, because it corresponds well to how we operate in our “real life” (i.e. outside of work). That is why it’s best explained with examples.

However, CEP has its challenges. There is a lot of confusion regarding even the most basic terminology. Vendors also take quite distinct approaches to providing the same functionality, so comparing tools and choosing the right one for a given task can be difficult.

I think it’s useful to know that this is in large part due to historical reasons. The foundations of CEP as we know it today were laid by teams working on different research problems, such as active databases, discrete simulations, business intelligence and programming languages. Apparently they didn’t communicate much. That is why contemporary tools have different syntax (e.g. SQL-like vs Prolog-like) and why not all of them support the same set of “standard” CEP features (e.g. some lack efficient aggregations, negations or a mechanism for detecting the absence of an event).

Events, events and more events

The basic unit of data in CEP is, of course, an event. Formally, events are defined as representations of something that has happened (e.g. a new order was placed) or as representations of a change in state (e.g. the temperature rose from 1 to 2 degrees Celsius).

The input to the system is usually fine-grained and low-level. For example, we might track readings from temperature and CO2 sensors at 1-second intervals and emit a new event for every reading. The expected output of a CEP system is actionable, business-level information: business users should be able to react immediately to what happened and make informed decisions. For example, we might emit an event signalling a fire outbreak.

CEP is a tool for getting from those low-level input events to actionable insights. It aims to provide actionable, business-level information in real time (or near real time), based on low-level input data from (usually) multiple heterogeneous sources. To this end, CEP extends Event Processing (EP) by introducing additional concepts and operations.

Event hierarchy

Usually we don’t get directly from input events to actionable insights. We need to build a set of gradual abstractions representing the system’s state, which together form an event hierarchy. For our fire detection example we might have the following:

  • Level 1 – Raw sensor readings are sent to separate streams (or channels) at regular time intervals. They might be just numbers with metadata (e.g. sensor id and timestamp).
  • Level 2 – Dealing with such raw readings is not very convenient, especially if we have various models of sensors measuring the same thing. It makes sense to abstract readings into more usable information, e.g. by introducing levels (safe, suspicious, alarming).
  • Level 3 – The data from multiple streams can be combined to emit an actionable event. We might issue an alarm if both temperature and CO2 reach alarming levels, or if both start rising rapidly within a short time.
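The three levels above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the class names, the `detect_fire` function and the threshold values are all made up for this example, and it covers only the "both levels are alarming" rule, not the "rising rapidly" one.

```python
from dataclasses import dataclass

# All names and threshold values here are hypothetical, for illustration only.

@dataclass(frozen=True)
class Reading:              # Level 1: raw sensor reading with metadata
    sensor_id: str
    kind: str               # "temperature" or "co2"
    value: float
    timestamp: int          # seconds

@dataclass(frozen=True)
class LevelEvent:           # Level 2: reading abstracted into a level
    kind: str
    level: str              # "safe", "suspicious" or "alarming"
    timestamp: int

@dataclass(frozen=True)
class FireAlarm:            # Level 3: actionable event
    timestamp: int

# (suspicious_from, alarming_from) per kind of reading; made-up numbers
THRESHOLDS = {"temperature": (40.0, 60.0), "co2": (1000.0, 2000.0)}

def to_level(reading: Reading) -> LevelEvent:
    """Map a raw reading to a level, hiding sensor-specific numbers."""
    suspicious_from, alarming_from = THRESHOLDS[reading.kind]
    if reading.value >= alarming_from:
        level = "alarming"
    elif reading.value >= suspicious_from:
        level = "suspicious"
    else:
        level = "safe"
    return LevelEvent(reading.kind, level, reading.timestamp)

def detect_fire(readings):
    """Yield a FireAlarm after every reading for which the latest
    temperature level and the latest CO2 level are both alarming."""
    latest = {}
    for reading in readings:
        event = to_level(reading)
        latest[event.kind] = event.level
        if latest.get("temperature") == "alarming" and latest.get("co2") == "alarming":
            yield FireAlarm(event.timestamp)
```

Feeding `detect_fire` a list of `Reading` objects ordered by time yields alarms only once both streams have reported an alarming level. A real CEP tool would add time windows and per-sensor state on top of this.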

Common CEP features

Although it’s possible (and interesting) to implement the functionality described above without dedicated tools, it is not practical for more realistic scenarios. CEP tools have a lot of useful features, such as:

    • Pattern detection for detecting a single event or a relationship among multiple events. Examples: event A occurred, followed by events B and C; event X occurred within 10 min. after event Y; event M hasn’t occurred for the last 10 min.
    • Computations over event data, including aggregations and computations over sliding windows. Examples: count of events of type T in every 10 min. window; average temperature over the last 10 readings.
    • Adaptation of data coming from multiple heterogeneous sources, as well as enrichment and projection of event data. Examples: sensor A measures temperature in degrees Celsius, sensor B in Fahrenheit, and we need to represent both in a uniform way; when event X is detected, emit event Y containing data from X and some data retrieved from another source; when event M is detected, emit event N that contains only a subset of M’s data.
    • Abstraction of events and patterns among them; in particular, the ability to emit new, higher-level events. Example: when both events A and B occur, emit event C.
    • Filtering, which allows for ignoring events that are irrelevant for a specific task. In CEP systems a large portion of input events might be ignored.

Those features will be very useful in my Pet Project.
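To make two of the examples from the feature list a bit more concrete, here is a minimal hand-rolled sketch in Python: the average over the last N readings, and "event X occurred within 10 min. after event Y". The names `LastNAverage` and `followed_within` are hypothetical, and events are assumed to arrive as `(name, timestamp_in_seconds)` pairs ordered by time. Dedicated CEP tools express such rules declaratively and handle far more cases; this only shows the idea.

```python
from collections import deque

class LastNAverage:
    """Average over the last `size` values (a count-based sliding window)."""

    def __init__(self, size):
        # deque with maxlen silently drops the oldest value when full
        self._window = deque(maxlen=size)

    def add(self, value):
        """Add a value and return the average of the current window."""
        self._window.append(value)
        return sum(self._window) / len(self._window)

def followed_within(events, first, second, max_gap):
    """Yield (t_first, t_second) whenever `second` occurs at most `max_gap`
    seconds after the most recent unmatched `first`.

    `events` is an iterable of (name, timestamp) pairs ordered by time.
    Each `first` is matched at most once; a late `second` discards it.
    """
    last_first = None
    for name, timestamp in events:
        if name == first:
            last_first = timestamp
        elif name == second and last_first is not None:
            if timestamp - last_first <= max_gap:
                yield (last_first, timestamp)
            last_first = None
```

For example, `followed_within(events, "Y", "X", 600)` reports every X that arrives within 10 minutes of the preceding Y.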

Can CRUD be good (enough)?

When it comes to CRUD, I generally agree with Mathias Verraes – it’s a great anti-pattern. However, everything we do needs to be put in context. There are no silver bullets or absolute truths in our industry, and I realized there are situations where CRUD is simply… good enough.

Some time ago we started re-designing our e-commerce project. It’s a legacy solution (some mean people even say it’s “uber legacy”) and we hit a lot of issues when trying to introduce automated tests. At a high level we basically had two layers, data access and UI, with business logic randomly divided between controllers and data access objects.

The solution seems rather obvious – gradually isolate logic from data access and UI, and extract a new layer (let’s call it the “business logic” or “domain” layer). That’s far from easy, given the current state of the solution, but possible.

Since our team is made up mostly of novices and expert beginners (at least with regard to writing testable code and design), we needed very specific rules to follow. After long discussions and gradual improvements, we ended up with this general recommendation from the “design team”:

  • controllers only manipulate parameters and maybe map domain objects to view models, they delegate all interesting work to domain objects
  • domain objects do all the heavy lifting; by default we have a domain service here plus its interface, a repository plus its interface (so we can mock it) and whatever other domain objects we need
  • all our current database access code is hidden in repositories.
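As a rough sketch of what this recommendation looks like in code, here is a Python illustration. It is not our actual codebase: the product/discount scenario and every name in it (`Product`, `ProductRepository`, `PricingService`, `ProductController`, `InMemoryProductRepository`) are hypothetical, picked only to show the shape of the layers.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Product:
    id: int
    price: float

class ProductRepository(Protocol):
    """Repository interface; hides all database access from the domain."""
    def get(self, product_id: int) -> Product: ...
    def save(self, product: Product) -> None: ...

class PricingService:
    """Domain service: the only place with actual business logic."""

    def __init__(self, repository: ProductRepository):
        self._repository = repository

    def apply_discount(self, product_id: int, percent: float) -> Product:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        product = self._repository.get(product_id)
        product.price = round(product.price * (1 - percent / 100), 2)
        self._repository.save(product)
        return product

class ProductController:
    """Controller: parses parameters, delegates, maps to a view model."""

    def __init__(self, service: PricingService):
        self._service = service

    def post_discount(self, params: dict) -> dict:
        product = self._service.apply_discount(
            int(params["id"]), float(params["percent"])
        )
        return {"id": product.id, "price": product.price}

class InMemoryProductRepository:
    """Test double standing in for the real, database-backed repository."""

    def __init__(self, products):
        self._products = {p.id: p for p in products}

    def get(self, product_id: int) -> Product:
        return self._products[product_id]

    def save(self, product: Product) -> None:
        self._products[product.id] = product
```

Because the service depends only on the repository interface, the discount rule can be tested with the in-memory double and no database at all.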

Overall it looks good. Not perfect, not particularly sophisticated, but simple and way better than what we have now. There’s only one small catch. Over time developers started following this pattern everywhere, without ever questioning what its purpose is and whether it makes sense at all in the given context.

Our solution consists of two main parts – the customer site and the management site. Roughly 70-80% of the management site is simple forms used for editing values: create a new element, edit another, delete. The most advanced bits of logic in those are simple validations (e.g. value required, regex match).

To me it seems like a perfect scenario for deviating from the general design guidelines. There’s not much logic to extract. There’s not much to test, so one end-to-end test per scenario should be more than enough. Changes in this area are infrequent and very simple. But we end up with at least 5 different objects involved, all just passing parameters and delegating calls to the layer below. Plain CRUD would do just fine here: it’s good enough, way simpler and faster.
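For comparison, a plain CRUD version of such a form can be this small. This is a hypothetical sketch in Python using the standard `sqlite3` module; the `categories` table, the `slug` rule and all function names are invented for illustration and do not come from our project.

```python
import re
import sqlite3

# Hypothetical schema for a simple management form.
SCHEMA = """
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    slug TEXT NOT NULL
)
"""

SLUG_PATTERN = re.compile(r"^[a-z0-9-]+$")

def validate(form):
    """The only 'logic' in the form: a required field and a regex match."""
    errors = []
    if not form.get("name"):
        errors.append("name is required")
    if not SLUG_PATTERN.match(form.get("slug", "")):
        errors.append("slug must contain only a-z, 0-9 and '-'")
    return errors

def create_category(conn, form):
    errors = validate(form)
    if errors:
        return {"ok": False, "errors": errors}
    cursor = conn.execute(
        "INSERT INTO categories (name, slug) VALUES (?, ?)",
        (form["name"], form["slug"]),
    )
    conn.commit()
    return {"ok": True, "id": cursor.lastrowid}

def update_category(conn, category_id, form):
    errors = validate(form)
    if errors:
        return {"ok": False, "errors": errors}
    conn.execute(
        "UPDATE categories SET name = ?, slug = ? WHERE id = ?",
        (form["name"], form["slug"], category_id),
    )
    conn.commit()
    return {"ok": True, "id": category_id}

def delete_category(conn, category_id):
    conn.execute("DELETE FROM categories WHERE id = ?", (category_id,))
    conn.commit()
    return {"ok": True}
```

There is no service, interface or repository to mock here, and nothing worth unit testing beyond `validate`; one end-to-end test against a real (or in-memory) database covers the scenario.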

The only problem is that now I’m met with questions such as: “Isn’t it bad?”, “Aren’t we supposed to isolate data access?”, “How are we going to test it?” (but what exactly is there to test here without a database?)…

Lessons learned:

  • Context is king.
  • Rules and guidelines are no substitute for thinking. They’re only thinking aids, much like useful stereotypes and habits, without which we would be constantly overwhelmed and not able to do much valuable work. But if you’re too rigid with them… Well, it’s not much better than not having them at all.
  • If people don’t understand the two lessons mentioned above, then it doesn’t matter how great your rules are. It’s impossible to determine all edge cases and exceptions up front. It’s not personal, it doesn’t mean you’re not smart enough, it’s just how things work in this world.
  • More important than what you’re doing and how you’re doing it is why you’re doing it in the first place. Make sure the motivation is clear to everybody and overcommunicate it at every opportunity. Make sure everybody understands that on our way to achieving the holy why, we might try various hows and whats. That’s ok. Make sure you focus on the results of your why, and not on verifying hows and whats (e.g. since we wanted to have testable code, we should focus on tests, not on having extra abstractions everywhere in the codebase).
  • Last but not least, there are no silver bullets in technology. Even when it comes to “obviously good” practices, there are contexts in which they are not useful. So keep an open mind and be prepared to be challenged by reality. Sometimes CRUD is just good enough. Even though in general it definitely is a great anti-pattern.