Saturday, May 1, 2021

Apostrophe, Ampersand, Exclamation mark & Co.

 by T. J. Zelger

Ever since I entered the testing business, I have been preaching the use of special characters and umlauts wherever and whatever we test.

Umlauts (ä, ö, ü) and special characters such as é and â are very common in Europe. In the US and elsewhere, you often see last names with a single quote, such as O'Neill or O'Connor, or company names like Macy's and McDonald's.

Sometimes we forget to include these characters in our tests, and then we promptly pay the price in the form of bugs reported from the field.

But awareness is increasing in our team, and I am really delighted when I see a colleague post a note in the group chat yelling:

"Hey Buddy, there were no special chars in your demo. Will it work with an exclamation mark, too?"

I love that, but I also recommend first testing a new user story with simple inputs. There is no point in attacking an application with special characters if it can't even deal with simple ones. Once I've convinced myself that the happy case works, I move on to feeding the program more difficult input.

It's always exciting to watch how a system deals with a combination of characters like
"{äöü},[áàâ]/|éèê|\ë!:;+(&)'%@" but if it fails, and if this is the exact string you include in a defect report, developers and/or business analysts will likely give you the hairy eyeball and close the issue unfixed, under the delusion that such input will never be entered in production.

What has worked better for me is to analyze which character causes the problem and then use it in a meaningful context, so that it becomes obvious to the stakeholders that the problem you spotted is worth fixing.

For example, I've made it a habit to use "M&M's Worldstore Inc." whenever I use or create a company address. That's not because I love chocolate so much, but rather because it has a mix of special characters that have all caused headaches in my past and current career as a software tester: the ampersand, the apostrophe and the dot.

I apply the same approach to street names and ZIP codes. It is a common misconception that ZIP codes need to be numeric. Go visit the UK and check what their postcodes look like. Street names can contain slashes, dashes and single quotes. Street numbers can be alphanumeric, with parts separated by a slash.

A blank in a name isn't exotic at all. Look at Leonardo Di Caprio. I've seen programs that cut away the second part, leaving the famous actor with the short name "Di".
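A truncation like this typically comes from splitting a full name on every blank and keeping only the first two pieces. Here is a minimal Python sketch of that mistake; the function names are hypothetical and not taken from any real system:

```python
def naive_split(full_name: str) -> tuple[str, str]:
    """Buggy: assumes a name consists of exactly two words."""
    parts = full_name.split(" ")
    return parts[0], parts[1]  # everything after the second word is lost


def safer_split(full_name: str) -> tuple[str, str]:
    """Split only at the first blank, so the rest stays together."""
    first, _, rest = full_name.strip().partition(" ")
    return first, rest


print(naive_split("Leonardo Di Caprio"))  # ('Leonardo', 'Di')
print(safer_split("Leonardo Di Caprio"))  # ('Leonardo', 'Di Caprio')
```

Even the second variant only guesses where the first name ends. Separate input fields for first and last name avoid the question altogether.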

Below are a few examples of names I regularly use in testing:

  • M&M's Worldstore
  • Mr. & Mrs. O'Leary
  • Léonardo Di Caprio 
  • Renée-Louise Silberschneider-Kärcher
  • Praxisgemeinschaft D'Amico & Ägerter GmbH

Example street names:

  • 29, Queen's Gardens
  • 4711 Crandon Blvd., Apartment F#1000
  • Rue du Général-Dufour 100-102
  • 55b, Rue de l'Université 
  • Elftausendjungferngässlein 12b

and ZIP-Codes:

  • 4142 Münchenstein 1 (Switzerland)
  • 33149 Key Biscayne, FL (USA)
  • EH4 2DA Edinburgh (Scotland)
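To illustrate why such data belongs in a test set, here is a small Python sketch of a validator that assumes postal codes are purely numeric. The validator is hypothetical; only the three codes come from the list above:

```python
import re

# Hypothetical validator that assumes postal codes consist of digits only.
NUMERIC_ONLY = re.compile(r"\d+")

postal_codes = ["4142", "33149", "EH4 2DA"]  # taken from the list above

for code in postal_codes:
    accepted = NUMERIC_ONLY.fullmatch(code) is not None
    print(f"{code!r}: {'accepted' if accepted else 'rejected'}")
```

The Swiss and US codes pass, while the perfectly valid Edinburgh postcode is rejected.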

Special characters are also interesting in places where you can submit different kinds of text.

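If you submit a message like "meet you at Lexington Av/59 St" to a program that stores information in an XML file without escaping the input or embedding it in a CDATA section, you may have found an interesting bug. The slash itself is legal in XML text, so also try a message containing an ampersand or a less-than sign, both of which must be escaped.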
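A short Python sketch of the XML case, using only the standard library. In XML text it is mainly the ampersand and the less-than sign that break a hand-built document, so this example adds "M&M's" to the message; that addition and the <note> element are my assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

message = "meet you at M&M's, Lexington Av/59 St"

# Hand-built XML without escaping: the raw ampersand makes it ill-formed.
broken = f"<note>{message}</note>"
try:
    ET.fromstring(broken)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
print("unescaped input parses:", parsed_ok)

# Escaping the input first keeps the document well-formed.
fixed = f"<note>{escape(message)}</note>"
print(ET.fromstring(fixed).text)
```

Building the document with an XML library instead of string concatenation would take care of the escaping automatically.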

Backslashes are used in many programs to escape characters; double and single quotes are used to terminate text elements. The latter are a common tool for hackers trying to manipulate the executed SQL statement, for example by attaching an additional OR condition. The goal is to make the program return information for which one is not authorized or for which the query wasn't designed.
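The following Python sketch shows the idea with an in-memory SQLite database. The table, the column and the function names are made up for this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (last_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)", [("O'Neill",), ("Smith",)]
)


def find_unsafe(name: str) -> list:
    # Vulnerable: the input is pasted into the SQL text.
    sql = f"SELECT last_name FROM customers WHERE last_name = '{name}'"
    return conn.execute(sql).fetchall()


def find_safe(name: str) -> list:
    # Parameter binding: the input is treated as data only.
    sql = "SELECT last_name FROM customers WHERE last_name = ?"
    return conn.execute(sql, (name,)).fetchall()


# A harmless apostrophe already breaks the unsafe query ...
try:
    find_unsafe("O'Neill")
except sqlite3.OperationalError as exc:
    print("unsafe query failed:", exc)

# ... and a crafted input returns rows it should not.
print(find_unsafe("x' OR '1'='1"))
print(find_safe("x' OR '1'='1"))
print(find_safe("O'Neill"))
```

The unsafe variant returns every customer for the crafted input, while the parameterized variant returns nothing for it and still finds O'Neill.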

Semicolons can confuse web client logic or cause problems when information is stored in CSV files. 
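For the CSV case, here is a minimal Python sketch. The row is invented test data in the spirit of the examples above, with a semicolon placed in the street field on purpose:

```python
import csv
import io

row = ["M&M's Worldstore", "4711 Crandon Blvd.; Apartment F#1000", "Key Biscayne"]

# Naive export: join with the delimiter and hope it never occurs in the data.
naive_line = ";".join(row)
print(len(naive_line.split(";")))  # 4 fields instead of 3

# The csv module quotes fields that contain the delimiter.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerow(row)
buffer.seek(0)
print(next(csv.reader(buffer, delimiter=";")) == row)  # True
```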

Question marks and ampersands are both used in URLs, curly braces are common markers in REST payloads, etc.
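A Python sketch of both cases, using only the standard library. The host example.test, the parameter name and the note text are placeholders I made up:

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

company = "M&M's Worldstore Inc."

# Naive concatenation: the ampersand is read as a parameter separator.
naive_url = "https://example.test/search?name=" + company
print(parse_qs(urlsplit(naive_url).query))

# urlencode percent-encodes reserved characters in the value.
safe_url = "https://example.test/search?" + urlencode({"name": company})
print(parse_qs(urlsplit(safe_url).query))

# json.dumps escapes quotes and backslashes inside string values.
payload = json.dumps({"note": 'He said "{hi}" \\ bye'})
print(json.loads(payload)["note"])
```

With the naive URL, the server side only sees the name "M"; with the encoded URL, the full company name survives the round trip.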

I've never really understood the appeal of using the pangram "The quick brown fox jumps over the lazy dog" or the filler text "Lorem ipsum" in testing. Neither contains any special characters. When testing German-language programs, I prefer to use "Victor jagt zwölf Boxkämpfer quer über den großen Sylter Deich" because it contains umlauts and this weird ß character. On Wikipedia you will find many more interesting pangrams.

Most often, I use my own extended version of the brown fox pangram that I have adapted to my personal needs:

René Di Agostino-Cruise has a fox living in a box opposite "Ottawa's Deacon Brodie" (Scottish Pub & Largest Whisky Collection). Each day he visits his friend and was once seen jumping quickly over the neighbours' cat  with a measured speed of > 50 km/h! How amazing is that?  

Agreed, it's a little bumpy. There is a lot of room for improvement, but it's a text I can remember and it contains a whole set of interesting characters to challenge text processors.
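If you want to check quickly which interesting characters a test sentence actually covers, a few lines of Python are enough. This is just a sketch; the helper name is my own, and "special" is defined here as anything other than ASCII letters, digits and blanks:

```python
def special_characters(text: str) -> set[str]:
    """Return every character that is not an ASCII letter, digit or blank."""
    return {
        ch for ch in text
        if not (ch.isascii() and (ch.isalnum() or ch == " "))
    }


fox = "The quick brown fox jumps over the lazy dog"
german = "Victor jagt zwölf Boxkämpfer quer über den großen Sylter Deich"

print(sorted(special_characters(fox)))     # []
print(sorted(special_characters(german)))  # ['ß', 'ä', 'ö', 'ü']
```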

One of my funniest treasures is the Canadian town St. Louis du Ha! Ha! This is the weirdest town name I've ever heard of, because it has two exclamation marks. I have to admit, I never reported a defect when submitting this town name failed. The likelihood of such a town ever being entered by our customers is close to zero. Canadians living in or near Quebec would probably disagree. Testing and test input are always a question of context.

Null, True, All, Test and other funny bugs related to people's names

Besides the recommendation to use special characters in our tests, it is also worth sneaking a peek at reported stories about people's names, such as Jennifer Null and Rachel True. Their names were processed and misinterpreted as NULL [BBC16] or, in Rachel's case, as a boolean value [9TO5], which caused a problem with her iCloud account. I've not experienced either of the two cases myself in any of my tests, but we found a similar issue where submitting the search term "Alli" returned documents titled "Alligator" and "Allister". All fine, but submitting "All" ended in an exception.

Stephen O, a Korean native living in the US, had been hassled by credit card companies because his last name was too short. When he applied the workaround of adding an extra O to end up with "OO", it didn't take long until he had a meeting with the government because he had provided false information [NYT91].
 
A similar story was told in a column by Rod Ackerman in the series "Made in USA" in the Swiss newspaper Basler Zeitung. A US agency insisted on entering two initials for his first names. Since he had only one, he asked the clerk to enter R.R. Ackerman to work around the problem. Rod then got into legal trouble for misrepresentation [BAZ].

Graham-Cumming could not register his name on a web-site because the hyphen was considered an invalid character [GRA10].

There is also Natalie Weiner, who could not register either because her name was considered offensive [PAN18].

Yet another example is the story of William and Katie Test, who were both unable to book airplane tickets simply because the system scanned the names for the term "test". A match triggered a security procedure that stopped the program from booking anything in the production environment [COY17].

Wooha! That's exactly what one of my clients had implemented, too! I was surprised that no issues had ever been reported. So I investigated a little and found out that in Switzerland (where the software was used) only around 50-60 people carry the name "Tester". Second, the investigation revealed that the system really did check for occurrences of the term "test"; luckily not in the last name but in a different field. Doozy!
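A check of this kind can be as small as the following Python sketch. The function is hypothetical and only illustrates the principle; it is not the code of the airline or of my client, and "Anna Tester" is an invented name:

```python
def looks_like_test_data(passenger_name: str) -> bool:
    # Hypothetical guard: rejects any name containing the term "test".
    return "test" in passenger_name.lower()


for name in ["William Test", "Katie Test", "Anna Tester", "Rachel True"]:
    print(name, "->", "blocked" if looks_like_test_data(name) else "ok")
```

Everyone whose name happens to contain the four letters gets blocked, which is why a guard like this should not depend on free-text fields such as the last name.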

References:
