
Start at the end

Level 0: Commit message

Once upon a time somebody told me the commit message should be written before any code. Of course, I’ve ignored the rule many times since then. But every time I follow it I’m amazed that such a simple thing actually helps with focus. Plus it makes me feel super guilty if I decide to batch multiple unrelated things into a single commit.

Tiny habit, but results add up quickly and actually make a difference.

Level 1: ATDD, BDD, Specification by example, etc.

Some time later I noticed that even though we, developers, can be very good at writing code the right way, quite often we work on the wrong things. We make wrong assumptions that break everything at the last moment, we keep making business decisions disguised as technical ones, or we simply don’t care about solving the business problem as much as about playing with all the cool technology.

I was fascinated by the stories where asking a few extra questions saved hundreds of hours of work, revealed that the existing model was completely unsuitable, or led to a complete change in approach.

But after putting the ideas into practice I lost some of the early enthusiasm. Those methods work, and they certainly brought a lot of benefit to a few projects I worked on. Even though the actual acronyms were not always used, those methods helped us find a few serious problems and misunderstandings early in the process. So I know they work.

The issue I have with them though, is that there are so many tools and processes around them now that it’s hard to not get distracted. It’s so tempting to play with a new framework or argue which tool is better instead of talking to the human being on the other side of our program. And there’s a lot one can play with.

Level 2: Sell it before you build it

At my current job we have a process that tries to force us to focus on users and business aspects before we even think about writing any code. We do a so-called “impact analysis” to determine the expected benefit of building a specific feature. Sometimes in the process we realize that there are in fact better things we could be working on at the moment, and the idea is dropped. Then we prepare announcement and documentation drafts to think through how we will communicate the feature to the users. If we can’t explain it, or it doesn’t sound very attractive, then it’s probably not worth building.

To be honest it’s not easy to work that way. We’re developers, not marketers after all. It really takes some effort. It’s easier to just code or cheat the process.

The obvious benefit of that approach is not wasting time building things nobody would use. But there’s another one. A few times, by going back to the announcement, we realized we had made mistakes. The last time it happened to me was yesterday. Other times we came up with a few extra things to discuss, noticed holes in the initial approach to solving the problem, or identified new edge-case scenarios.

The main benefit of starting with customer communication over BDD is that it doesn’t involve any fancy technology. There are not as many powerful distractions.

Next level: ???

I’m not sure what the next level is, I guess that it would have something to do with metrics and verifying assumptions we made using numbers. But it might be something completely different.

I hope to find out soon.

That’s just unacceptable

A few years ago I had a big challenge at work. With two or three other programmers we spent almost 8 hours browsing through code, drawing diagrams, swearing and thinking to the point our heads started to hurt. The conclusion was clear – we can’t do it. It’s impossible.

A few months before that day our batch job started processing too much data to fit in a single processing window. Not a big deal. Business adapted. They started the job a few days early each month, to make sure all reports were ready before the deadline.

But now they came up with a new product. One requirement didn’t fit our model. From then on, information that used to always come in one piece could be spread across multiple records. What’s worse, related records didn’t have to be located next to each other and could arrive in a random order.

We started analyzing edge cases. What if the processing window ends before all related records are processed? We’ll have incorrect results. We KNEW that was unacceptable. So we were thinking and thinking and thinking…

Eventually we realized that ensuring correctness at the end of each processing window was either impossible or extremely expensive. So on the very same day we set up a meeting and broke the bad news to the business. We felt uneasy about admitting that our system couldn’t accommodate their business idea. We were sure they’d be extremely angry and unhappy.

Then somebody from the business asked whether we can ensure that results are correct when the whole batch job completes. We could. So no big deal, they said. We never look at this data in the middle of the job anyway. The call ended at that and we just looked at each other in disbelief.

That day I realized that business indeed copes with eventual consistency better than most programmers. I also learned that if we try to disguise our own technical biases as business requirements we might at best waste some time. If we’re less lucky we can lose a lot of money.

It might seem like an isolated example, but the world is full of similar stories.

Amazon decided they can deal with an occasional “absolutely unacceptable” situation that results in selling a single paper book to two different customers. Fixing that and making customers happy costs them some money, but they make much, much more thanks to the design that allows for that situation to happen.

Gojko Adzic, in one of his books, mentioned a betting company that was discussing key examples for their specifications (i.e. test cases). They realized that allowing customers to spend more money than they had was in fact a great idea, even though it was an “absolutely unacceptable” scenario that would make every programmer cringe. It turns out that a customer who overspent was more likely to come back and make another transaction. To do that, they would have to pay off their debt first, so the company would get its money back eventually… But even if not, it’s better to make 5 dollars instead of 10 than to make 0 instead of 10. Or at least that’s the logic business people apparently follow.

I try to keep those stories in mind every time I feel like saying that something is just unacceptable. Somebody else might already be making money proving me wrong.

What legacy projects can teach you?

Last week I attended Udi Dahan’s ADSD course. At the moment it seems like I’ll need a few years to process and truly understand all the information he packed in there. But surprisingly enough, a few things sounded familiar. More organized, more deeply analyzed, better phrased… Still, it sounded like he had been on my previous projects.

I’ve never worked on a greenfield project (yet?). But I worked on a few that were “new, shiny, better rewrites”. Or at least that was the idea. In most cases the people who created them were not available or had forgotten a lot. So we had to do a lot of detective work, as well as navigate carefully whenever we tried to add a new feature. The challenge was not only figuring out how to get something done. The most fun part was trying to guess the most probable reason why something was done in a specific way. Was it laziness? An unknown requirement? The easiest approach? Personal preference? A great way to learn how humans think and work.

Udi has quite an interesting perspective on “maintenance” work. He disagrees with the popular belief that it’s easy and requires fewer skills than working on a new project. He says that this belief most likely originates from the wrong metaphor. Maintenance is indeed less challenging than designing and building a skyscraper or a house. But that’s not true for software.

The most important distinction is that software is never finished. And it is not meant to be. It’s finished only if it’s dead. Not only is maintenance not that different from greenfield work, extending an existing system is even more complex. There are many more things you can break. Users know the system and work with it. Whatever you do impacts them. You have to take that into account.

After he said that, I realized there are a few things you are more likely to learn if you work on a legacy system. You might notice them on a greenfield project too, but I bet the lessons are more painful during maintenance. And pain boosts memory and pattern recognition.

Here’s my list:

1. Technology is just a tool. No matter how shiny, new, life-changing… It’s just a tool.

Every now and then people get enthusiastic about yet another fantastic idea. This time it’ll be different. We’ve learned a lot in the previous project. Said somebody on every project. Yet you’re working on the third reincarnation of the same system and you hate this codebase. And for good reasons. At some point you realize that it’s more complicated than just picking the right technology.

2. Context and “why” are more important than “what” and “how”.

Sometimes it’s not that important what exactly you decided on, but why. Best practices are not silver bullets; they need to be applied in a specific context. Domain models are great, but overkill if simple CRUD would be good enough. In a long-lived project you have many opportunities to see how great ideas work when used (with the best intentions) in the wrong context. You learn to think for yourself, even when you hear a recommendation from an expert you deeply respect. They make mistakes too.

3. Simplicity rules.

You realize how bad humans are at predicting the future. The system is easy to extend in places you never touch. Then you waste a week understanding an overcomplicated, overflexible, overgeneralized module just to add a mere two lines of code. You might even dare to think that the simplest, least flexible solution would have been better. It would be easier to replace it completely if necessary.

4. Consistency is worth more than having “the best” tool/design.

Said a person who’s had 4 different ORMs in one codebase… Just trust me on that one.

5. It’s all about habits.

A fair number of the issues we had could be, and eventually were, mitigated by developing good habits. Be a good scout, make a tiny improvement. Every day. It adds up.

The less intuitive part is that big leaps don’t work as well as small, boring changes, applied consistently on a daily basis. If you look carefully enough you might find remnants of a few revolutions in your codebase. And remember, you won’t have more time to fix it later. Either do it now or never. Sticky notes get lost.

6. Bad code is not (only) a technical issue.

Often this is a result of (mis)management, time pressures, bad planning, people not speaking up… Even lack of sufficient technical skills might be tied back to the “soft side of things”. Why can’t you attract and/or keep more experienced people? Do you give them what they need for optimal work? Do you know how to judge people’s skills? Do you keep each other accountable? Do you let the loudest person make significant decisions even if they don’t have anything to back them up? Who deals with their consequences?

The less visible problem is a lack of trust between the “business” and “technical” parts of the organization. If business believes you know what you’re doing, they’ll give you what you need if you ask for it. If not… Well, life is hard.

7. It’s (almost?) never easier to “just rewrite it”.

You might discover that the “new feature” you’ve just estimated was implemented before you joined the company. For some reason it hasn’t been widely used though. Another day you realize that the bug you’re fixing had been in production for the last 5 years. How come nobody noticed? So much for “just make it work like the old system”.


If you have some more lessons to add, please share them!

A (very) quick intro to Complex Event Processing

Contrary to what its name suggests, Complex Event Processing (CEP) is conceptually a simple idea. It’s also very easy to grasp on an intuitive level, because it corresponds well to how we operate in our “real life” (i.e. outside of work). That is why it’s best explained with examples.

However, CEP has its challenges. One has to face a lot of confusion regarding even the most basic terminology, and various vendors take quite distinct approaches to providing the same functionality, so comparing tools and choosing the right one for a given task can be difficult.

I think it’s useful to know that this is largely due to historical reasons. The foundations for the CEP we know today were laid by teams working on different research problems, such as active databases, discrete simulations, business intelligence and programming languages. Apparently they didn’t communicate much. That is why contemporary tools have different syntax (e.g. SQL- vs Prolog-like) or why not all of them support the same set of “standard” CEP features (e.g. some lack efficient aggregations, negations or an event absence detection mechanism).

Events, events and more events

The basic unit of data in CEP is, of course, an event. Formally, events are defined as representations of something that has happened (e.g. a new order was placed) or as representations of a change in state (e.g. the temperature rose from 1 to 2 degrees Celsius).

The input information to the system is usually fine-grained and low-level; for example, we might track readings from temperature and CO2 sensors at 1-second intervals and emit a new event for every reading. The expected output of a CEP system is actionable, business-level information, meaning we expect business users to be able to immediately react to what happened and make informed decisions; for example, we might emit an event signalling a fire outbreak.

CEP is a tool for getting from those low-level input events to actionable insights. It aims to provide actionable, business-level information in real time (or near real time), based on low-level input data from (usually) multiple, heterogeneous sources. To this end, CEP extends Event Processing (EP) by introducing additional concepts and operations.

Event hierarchy

Usually we don’t get directly from input events to actionable insights. We need to build a set of gradual abstractions representing the system’s state, which together form an event hierarchy. For our fire detection example we might have the following:

  • Level 1 – Raw sensor readings are sent to separate streams (or channels) at regular time intervals. They might be just numbers with metadata (e.g. sensor id and timestamp).
  • Level 2 – Dealing with such raw readings is not very convenient, especially if we have various models of sensors measuring the same thing. It makes sense to abstract raw readings into more usable information, e.g. by introducing levels (safe, suspicious, alarming).
  • Level 3 – The data from multiple streams can be combined to emit an actionable event. We might issue an alarm if both temperature and CO2 reach alarming levels, or if both start rising rapidly in a short time.
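As a toy illustration only (a minimal sketch, not a real CEP engine – the thresholds, event shapes and names below are all invented for the fire detection example), the three levels might look like this:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    """Level 1: a raw sensor reading with metadata."""
    sensor_id: str
    kind: str        # "temperature" or "co2"
    value: float
    timestamp: int

# Hypothetical thresholds: (suspicious, alarming) per sensor kind.
THRESHOLDS = {"temperature": (50.0, 70.0), "co2": (1000.0, 2000.0)}

def to_level(reading: Reading) -> str:
    """Level 2: abstract a raw value into safe / suspicious / alarming."""
    suspicious, alarming = THRESHOLDS[reading.kind]
    if reading.value >= alarming:
        return "alarming"
    return "suspicious" if reading.value >= suspicious else "safe"

def detect_fire(temp: Reading, co2: Reading) -> bool:
    """Level 3: combine two streams into an actionable 'fire outbreak' event."""
    return to_level(temp) == "alarming" and to_level(co2) == "alarming"

# Both readings map to "alarming", so an alarm would be issued:
assert detect_fire(Reading("t1", "temperature", 85.0, 0),
                   Reading("c1", "co2", 2500.0, 0))
```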

Common CEP features

Although it’s possible (and interesting) to implement the functionality described above without dedicated tools, it would not be practical for more realistic scenarios. CEP tools have a lot of useful features, such as:

    • Pattern detection for detecting a single event or a relationship among multiple events. Examples: event A occurred, followed by events B and C; event X occurred within 10 min. after event Y; event M hasn’t occurred for the last 10 min.
    • Computations over event data, including aggregations and computations over sliding windows. Examples: count of events of type T in every 10 min. window; average temperature over the last 10 readings.
    • Adaptation of data coming from multiple heterogeneous sources, as well as enrichment and projection of event data. Examples: sensor A measures temperature in degrees Celsius, sensor B in Fahrenheit, and we need to represent it in a uniform way; when event X is detected, emit event Y containing data from X and some data retrieved from another source; when event M is detected, emit event N that contains only a subset of M’s data.
    • Abstraction of events and the patterns among them, in particular the ability to emit new, higher-level events. Example: when both events A and B occur, emit event C.
    • Filtering, which allows ignoring events that are irrelevant for a specific task. In CEP systems a large portion of input events might be ignored.
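Two of those features – windowed aggregation and simple pattern detection – can be sketched in a few lines (again a toy illustration with invented names, nothing like the optimized implementations inside a real CEP engine):

```python
from collections import deque

class SlidingAverage:
    """Average over the last N readings (the 'last 10 readings' example)."""
    def __init__(self, size: int):
        self.window = deque(maxlen=size)  # old values fall off automatically

    def push(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)

def followed_by(events, first, then):
    """Pattern: 'event A occurred, followed by event B' (ignoring timing)."""
    seen_first = False
    for e in events:
        if e == first:
            seen_first = True
        elif e == then and seen_first:
            return True
    return False

avg = SlidingAverage(size=3)
for v in (1.0, 2.0, 3.0, 10.0):
    last = avg.push(v)
assert last == 5.0                       # window is (2, 3, 10)

assert followed_by(["A", "X", "B"], "A", "B")
assert not followed_by(["B", "A"], "A", "B")
```

A real engine would add timestamps and time-based windows on top of this, but the core operations are the same.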

Those features will be very useful in my Pet Project.

Can CRUD be good (enough)?

When it comes to CRUD, I generally agree with Mathias Verraes – it’s a great anti-pattern. However, everything we do needs to be put in context. There are no silver bullets or absolute truths in our industry, and I realized there are situations where CRUD is simply… good enough.

Some time ago we started re-designing our e-commerce project. It’s a legacy solution (some mean people even say it’s “uber legacy”) and we hit a lot of issues when trying to introduce automated tests. On a high level we basically had two layers – data access and UI; business logic was randomly divided between controllers and data access objects.

The solution seems rather obvious – gradually isolate logic from data access and UI, extract new layer (let’s call it “business logic” or “domain” layer). That’s far from easy, given the current state of the solution, but possible.

Since our team is made mostly of novices and expert beginners (at least with regard to writing testable code and design), we needed very specific rules to follow. After long discussions and gradual improvements, we ended up with this general recommendation from “design team”:

  • controllers only manipulate parameters and maybe map domain objects to view models, they delegate all interesting work to domain objects
  • domain objects do all the heavy lifting, by default we have domain service here plus its interface, repository plus its interface (so we can mock it) and whatever other domain objects we need
  • all our current database access code is hidden in repositories.
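A minimal sketch of that recommendation (hypothetical names, rendered in Python purely for illustration – the actual project surely uses a different stack):

```python
from typing import Protocol

class OrderRepository(Protocol):
    """Interface hiding database access, so the service can be tested with a mock."""
    def save(self, name: str) -> int: ...

class OrderService:
    """Domain service: does the heavy lifting, depends only on the interface."""
    def __init__(self, repo: OrderRepository):
        self.repo = repo

    def place_order(self, name: str) -> int:
        if not name:
            raise ValueError("name is required")
        return self.repo.save(name)

class OrderController:
    """Controller: only manipulates parameters and delegates to the domain."""
    def __init__(self, service: OrderService):
        self.service = service

    def post(self, form: dict) -> dict:
        order_id = self.service.place_order(form.get("name", ""))
        return {"id": order_id}  # maps the domain result to a view model

class InMemoryOrderRepository:
    """Stand-in for the real data access code hidden behind the interface."""
    def __init__(self):
        self.rows = []

    def save(self, name: str) -> int:
        self.rows.append(name)
        return len(self.rows)

controller = OrderController(OrderService(InMemoryOrderRepository()))
assert controller.post({"name": "book"}) == {"id": 1}
```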

Overall it looks good. Not perfect, not particularly sophisticated, but simple and way better than what we have now. There’s only one small catch. Over time developers started following this pattern everywhere, without ever questioning what is its purpose and whether it makes sense at all in the given context.

Our solution consists of two main parts – the customer site and the management site. Roughly 70-80% of the management site is simple forms used for editing values: create a new element, edit another, delete. The most advanced bit of logic in those forms is simple validation (e.g. value required, regex match).

To me it seems like a perfect scenario to deviate from the general design guidelines. There’s not much logic to extract. There’s not much to test, so one end-to-end test per scenario should be more than enough. Changes in this area are infrequent and very simple. Yet we end up with at least 5 different objects involved, all just passing parameters and delegating calls to the layer below. CRUD would do just fine here – it’s good enough, way simpler and faster.
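For those simple forms, the CRUD-style alternative could collapse into a single thin handler that validates and talks straight to the database (again a hypothetical sketch with invented names and an invented validation rule):

```python
import re
import sqlite3

# In-memory database standing in for whatever the real storage is.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE labels (id INTEGER PRIMARY KEY, code TEXT)")

def create_label(code: str) -> int:
    """One handler: simple validation plus direct data access, no extra layers."""
    # The only logic is a validation (value required, regex match).
    if not re.fullmatch(r"[A-Z]{3}-\d+", code):
        raise ValueError("code must look like ABC-123")
    cur = db.execute("INSERT INTO labels (code) VALUES (?)", (code,))
    db.commit()
    return cur.lastrowid

assert create_label("ABC-123") == 1
```

One end-to-end test per scenario covers this; there is nothing meaningful left to unit test once the database is taken away.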

The only problem is now I’m met with questions such as: “Isn’t it bad?”, “Aren’t we supposed to isolate data access?”, “How are we going to test it?” (but what to test here exactly without a database?)…

Lessons learned:

  • Context is king.
  • Rules and guidelines are no substitute for thinking. They’re only thinking aids, much like useful stereotypes and habits, without which we would be constantly overwhelmed and not able to do much valuable work. But if you’re too rigid with them… Well, it’s not much better than not having them at all.
  • If people don’t understand the two lessons mentioned above, then it doesn’t matter how great your rules will be. It’s impossible to determine all edge cases and exceptions up front. It’s not personal, it doesn’t mean you’re not smart enough, it’s just how things work in this world.
  • More important than what and how you’re doing something, is why you’re doing it in the first place. Make sure the motivation is clear for everybody and overcommunicate it at every opportunity. Make sure everybody understands that on our way to achieving the holy why, we might try various hows and whats. That’s ok. Make sure you focus on results of your why, and not verifying hows and whats (e.g. since we wanted to have testable code, we should focus on tests, not having extra abstractions everywhere in the codebase).
  • Last but not least, there are no silver bullets in technology. Even when it comes to “obviously good” practices, there are contexts in which they are not useful. So keep an open mind and be prepared to be challenged by reality. Sometimes CRUD is just good enough. Even though in general it definitely is a great anti-pattern.