Why HTML Matters for Data, Tags, and Content Management ~ Appatura


Richard Plotka


In the beginning, there was Adam …

Well, if we are talking about the World Wide Web, in the beginning there was Sir Tim Berners-Lee, the credited inventor of the Web.

His conception of HTTP, the “hard” technology behind the Web (which runs on top of the older TCP/IP networking protocols), and his application of markup languages in HTML, the “soft” technology behind the Web, laid the groundwork for what we now take for granted (Amazon, Google, and so on) and for what we soon will: technologies like driverless cars, home automation, and intuitive AI. While today we talk with excitement about the Internet of Things (IoT), tomorrow we will be immersed in the Internet of Everything (IoE).

TCP/IP is named for its two core protocols, the Transmission Control Protocol and the Internet Protocol. Because TCP runs on top of IP, the slash can loosely be read as “over.” This underlying technology enables our computers, smartphones, refrigerators, and cars to talk to each other.

HTML stands for HyperText Markup Language. Also conceived by Sir Tim Berners-Lee, HTML is the most common language for expressing information on the Web. It is how content is produced and managed on Amazon, Google, and Docubuilder, our document management software for enterprise content management (ECM). HTML is what you are reading right now.

HTML actually has its roots in publishing. It is descended from a lesser-known language, SGML (Standard Generalized Markup Language), which is itself a descendant of an even less-known markup language, IBM’s Generalized Markup Language (GML). GML was one of the first markup languages, if not the first, and was created to let IBM manage and publish its manuals.

SGML evolved into a metalanguage: a language for defining other markup languages. HTML versions before HTML5 were formally defined as SGML applications, and XML was designed as a simplified subset of SGML.

But what is a markup language?

A markup language is a way of taking plain text and marking it up with other plain text, in a specific format, so that some interpreter (a non-human consumer) of that information can derive meaning from it. (I know, bear with me.)

For example, if I want to write “I am hungry” and emphasize that I am really hungry, I might print the word “hungry” in bold. (If we were talking with friends, we would stress the word “hungry” instead.)

Well, a computer does not understand how to read (at least in the traditional sense of the word), let alone print the word “hungry” in bold by itself.

We need a way to tell it to do so.

Though we may think of computers as sentient, super-intelligent machines that understand everything we tell them, the truth is that they are not. We are decades (maybe centuries) away from Westworld-level computer intelligence. A computer will do exactly what you tell it: no more, no less.

A simple markup language is one way to communicate with a computer. Even an instruction as simple as “put this word in bold” has to be spelled out precisely. We are spoiled by the programmers of Microsoft Word and Google Docs, who let us put a character in bold by simply pressing Command and B together (on a Mac; Ctrl and B on Windows). HTML requires that we use “tags,” written with angle brackets (< >), to do the trick. If I want to put a phrase in bold, I have to write the following:

I am <b>hungry</b>

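When a program such as a web browser reads this HTML, it displays “I am hungry” with the word “hungry” in bold. If you do not follow the language’s rules, you will get unexpected results: browsers are forgiving and will try to guess what you meant, but stricter consumers of markup, such as an XML parser, will reject a malformed document with one hair-pulling error message after another. Other languages like C#, Python, and JavaScript have their own syntax rules.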
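To make the idea of an “interpreter” concrete, here is a minimal Python sketch using the standard library’s html.parser and xml.etree.ElementTree modules. The helper names (BoldFinder, bold_words, is_well_formed_xml) are hypothetical, chosen only for illustration. It shows a program reading the <b> tag to find the emphasized word, and it contrasts a forgiving HTML parser with a strict XML parser when the closing tag is missing.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class BoldFinder(HTMLParser):
    """Hypothetical consumer: collects the text that appears inside <b> tags."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.bold_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        # Only keep text while we are between <b> and </b>.
        if self.in_bold:
            self.bold_text.append(data)


def bold_words(markup):
    finder = BoldFinder()
    finder.feed(markup)
    finder.close()
    return finder.bold_text


def is_well_formed_xml(markup):
    # Wrap the fragment in a root element, then let the strict XML parser judge it.
    try:
        ET.fromstring("<p>" + markup + "</p>")
        return True
    except ET.ParseError:
        return False


print(bold_words("I am <b>hungry</b>"))          # ['hungry']
print(bold_words("I am <b>hungry"))              # ['hungry'] (lenient: missing </b> tolerated)
print(is_well_formed_xml("I am <b>hungry</b>"))  # True
print(is_well_formed_xml("I am <b>hungry"))      # False (strict: missing </b> is an error)
```

The same missing </b> is quietly tolerated by the HTML parser and rejected by the XML parser, which is why the consumer you are writing for matters.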

In short, HTML is a markup language that tells a computer program such as a browser (a kind of interpreter, parser, or consumer) how to represent, or interpret, the text contained in the file it is reading. Other consumers of markup languages include search engines (SEO, anyone? More on this to come), web content management systems (CMS), enterprise content management (ECM) systems, and compliance systems like the SEC’s EDGAR, which receives financial data tagged in XBRL, an XML-based markup language.
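The idea that one file can serve several consumers can be sketched in Python with the standard library’s html.parser.HTMLParser. Below, the same sample page is fed to two hypothetical consumers: OutlineConsumer keeps only the headings, loosely what an indexer might look at first, while TextConsumer keeps only the paragraph text, loosely what a plain-text export might keep. The sample page and both class names are made up for illustration; real search engines and content management systems are far more sophisticated.

```python
from html.parser import HTMLParser

# A hypothetical sample page standing in for a real article.
PAGE = (
    "<html><body>"
    "<h1>What is a markup language?</h1>"
    "<p>I am <b>hungry</b>.</p>"
    "<h2>Who reads markup?</h2>"
    "<p>Browsers, search engines, and content systems.</p>"
    "</body></html>"
)


class OutlineConsumer(HTMLParser):
    """Keeps (tag, text) pairs for headings only."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.current = None  # heading tag we are inside, if any
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.outline.append((self.current, data.strip()))


class TextConsumer(HTMLParser):
    """Keeps the text of each <p>, ignoring tags such as <b> inside it."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def consume(consumer, markup):
    consumer.feed(markup)
    consumer.close()
    return consumer


outline = consume(OutlineConsumer(), PAGE).outline
paragraphs = consume(TextConsumer(), PAGE).paragraphs
print(outline)     # [('h1', 'What is a markup language?'), ('h2', 'Who reads markup?')]
print(paragraphs)  # ['I am hungry.', 'Browsers, search engines, and content systems.']
```

Neither consumer needed a different version of the page; each simply pays attention to different tags.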

Even if you are not a programmer by trade, understanding the elements of HTML is a valuable basic skill. If you write content, you will want to be aware of it. Appreciating how tags work will let you create and distribute content that is relevant not only to your traditional readers but also to other systems, such as search engines.

By using markup languages, you can take components of data, wrap them in tags for style (and for semantic meaning; more on this later as well), and create rich documents that go beyond traditional publishing objectives. This is especially useful when the content we create needs to be used by many different consumers. We can write and distribute one piece of content, yet it can serve many purposes. For example, this article will be used as a standalone document, a web page, a marketing piece, and “food” for search engines and SEO. One document, many purposes. Repurposing content with the least amount of effort is what we are after, and markup languages are how we do it.
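As a hedged sketch of “one document, many purposes,” the Python example below wraps two components of data (a headline and an author) in ordinary HTML tags for presentation, plus HTML microdata attributes (itemscope, itemtype, itemprop) using schema.org vocabulary for meaning. A single pass with the standard library’s HTMLParser then yields both a human-readable text view and a machine-readable record. Choosing microdata, and the ArticleReader class itself, are assumptions made for illustration; this reader handles only a flat snippet like this one, not nested itemscope blocks.

```python
from html.parser import HTMLParser

# One piece of content: tags for style (h1, p) plus microdata attributes for meaning.
# The microdata/schema.org choice is an illustrative assumption.
SNIPPET = (
    '<article itemscope itemtype="https://schema.org/Article">'
    '<h1 itemprop="headline">Why HTML Matters for Data, Tags, and Content Management</h1>'
    '<p>By <span itemprop="author">Richard Plotka</span></p>'
    "</article>"
)

BLOCK_TAGS = {"h1", "p"}


class ArticleReader(HTMLParser):
    """Hypothetical reader that keeps two views of the same markup.

    Simplified: assumes itemprop elements are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.text_chunks = []  # view 1: text for a human reader
        self.fields = {}       # view 2: itemprop name -> text, for a machine
        self.open_prop = None  # (tag, itemprop name) currently open, if any

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        if prop:
            self.open_prop = (tag, prop)

    def handle_endtag(self, tag):
        if self.open_prop and self.open_prop[0] == tag:
            self.open_prop = None
        if tag in BLOCK_TAGS:
            self.text_chunks.append("\n")  # end of a block: start a new line

    def handle_data(self, data):
        self.text_chunks.append(data)
        if self.open_prop:
            name = self.open_prop[1]
            self.fields[name] = self.fields.get(name, "") + data


reader = ArticleReader()
reader.feed(SNIPPET)
reader.close()

reading_view = "".join(reader.text_chunks).strip()
print(reading_view)
print(reader.fields)
```

The reading view prints the headline and the byline on two lines, while the record maps "headline" and "author" to their text, all from the same few lines of markup.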

I will discuss the Semantic Web and SEO more specifically in future articles.

Let’s have a conversation about how we can use tags to enrich your data and create intelligent content.