Put that data-* attribute away, son... You might hurt someone

HTML 5 data-* attributes allow us to add custom attributes to elements as long as they are prefixed with ‘data-’, and since this was first discussed on John Resig’s blog I’ve been interested in how people will use and abuse this feature. I greeted the feature with mixed feelings. It’s definitely a simple way to enrich the semantic value of HTML pages, and it helps to improve some of the more toxic parts of Microformats. XML namespaces are definitely a more complete solution, but this is a simple and immediately adoptable means of adding invisible semantic data to HTML documents.

However, as John hinted in his post, there’s an enormous temptation for JavaScript authors to use this to embed configuration data for their scripts directly into HTML. Many developers have been itching for an excuse to do this for a long time. Some just added attributes willy-nilly like crazy web standards bandits; others would have loved to add arbitrary configuration to their HTML but felt a bit squeamish about moving away from the HTML specs and opted to abuse the class attribute from within the standard. For the record, I’d tend to side with the former. If it works and there’s a good reason for it then I say do it. However, back then I explained why there is no good reason to add unsemantic configuration data to your HTML, and now that we have standards-approved carte blanche to do this I’d like to reiterate that it’s still not the way forward. If you’ve not read that article then it’s worth a quick read before you go on.

By all means, use data-* attributes to add semantically valuable data to your HTML, but if you are just using them to prop up a script you are writing, think again.

An Example

If you’ve not already watched it, go now and watch Yehuda’s screencast on evented programming with jQuery. The ideas in it represent a massive progression in client-side scripting. It’s nothing short of essential viewing. However, it also happens to be the latest example I’ve come across of needless use of data-* attributes and, while not wanting to take away from how progressive and clever the content is as a whole, I feel the need to use it as my counter-example for this article.

In the screencast, Yehuda is creating a tab interface. The markup he proposes is something like this:

<ul class="tabs">
  <li data-content="first">First</li>
  <li data-content="second">Second</li>
  <li data-content="third">Third</li>
</ul>

<div class="pane" id="first">Some content</div>
<div class="pane" id="second">Some content</div>
<div class="pane selected" id="third">Some content</div>

The idea is that when a tab <li> is clicked, the script interrogates data-content to decide which div to show. However, without the JavaScript operating on it, the HTML has no semantic value. The <li>s are just list items (and will be read as such by assistive technologies). In fact, the browser doesn’t know that those list items are in any way associated with the <div>s below. Here’s how I think it should be marked up:

<ul class="section-nav">
  <li><a href="#first">First</a></li>
  <li><a href="#second">Second</a></li>
  <li><a href="#third">Third</a></li>
</ul>

<div class="section" id="first">Some content</div>
<div class="section" id="second">Some content</div>
<div class="section" id="third">Some content</div>

Now, before we even add JavaScript, we have links that jump to the specified content when clicked. If you click the back button you jump back to the previous tab’s content. With this in place you could even make tabs work solely with CSS and the :target pseudo-class. If you wanted to go HTML 5 crazy you could even use <nav> and <section> elements, which would further enhance the semantics of the document. By correctly associating the tab link and the tab content we can take advantage of the browser’s facilities for navigating this type of content even before we get out the old JavaScript crowbar.

With this markup as a base it’s just as trivial to hook in the script, but instead of interrogating data-content we just look at the fragment (the part after the #) in the link’s href. Because we are now using anchors, users can deep link into a particular tab, supporting the back button becomes trivial, and assistive technologies will make better sense of it, among other things.
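
As a rough sketch of that hook-up (this is not the code from the screencast), something along these lines would do. The names fragmentOf, showSection and initTabs are my own for illustration, and it assumes the section-nav / section markup above plus a "selected" class that your CSS uses to reveal the active section. Rather than binding to each link, it lets the browser follow the link and simply reacts to the part of the href after the # once it lands in the URL.

```javascript
// Sketch only: fragmentOf, showSection and initTabs are illustrative names.

// Return the id a link href or URL hash points at: "#first" -> "first".
// Returns null when there is no fragment.
function fragmentOf(href) {
  var value = String(href || '');
  var index = value.indexOf('#');
  if (index === -1 || index === value.length - 1) return null;
  return value.slice(index + 1);
}

// Add the "selected" class to the section with the given id and
// remove it from every other section.
function showSection(sections, id) {
  for (var i = 0; i < sections.length; i++) {
    var section = sections[i];
    var classes = section.className.split(/\s+/).filter(function (name) {
      return name !== '' && name !== 'selected';
    });
    if (section.id === id) classes.push('selected');
    section.className = classes.join(' ');
  }
}

// Browser-only wiring. Clicking a link changes the URL hash natively,
// so deep links and the back button work; we just react to the hash.
function initTabs(win) {
  var sections = win.document.querySelectorAll('.section');
  function sync() {
    var id = fragmentOf(win.location.hash);
    // With no hash in the URL, fall back to the first section.
    if (id === null && sections.length > 0) id = sections[0].id;
    showSection(sections, id);
  }
  win.addEventListener('hashchange', sync);
  sync(); // honour a deep link on first load
}

if (typeof window !== 'undefined') {
  initTabs(window);
}
```

This assumes the script runs after the markup has been parsed (at the end of the body, say) and that your CSS hides any .section without the selected class. Note that there isn’t a single custom attribute in sight.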

Leave Yehuda Alone!

Of course, I’m picking apart what was a very simple and purposefully contrived example, but as usage of data-* attributes picks up, it’s important not to abuse this facility and to continue to find as many semantic hooks for your scripts as possible. It may now be a “standard” but that doesn’t mean it’s a good solution. When looking for hooks for my scripts, this is the process I follow:

1. Build up your markup to be as meaningful as possible. If it submits a request it should be a <form>; if it’s linking to another piece of content it’s an <a>. Even if you’re building a very complex piece of UI, seek to build as much of it as you can into your document (while keeping the semantics intact) before you go anywhere near your JavaScript.

2. Write your script to take advantage of the semantics your HTML document has to offer. This will get you a long way in many cases. However, you may well find that there is still configuration information you need to pass into your script. Rather than turn to data-* attributes, it’s best to consider inferring this information from context in the same way that CSS does. This way you can assert things like “all <input>s with type ‘slider’ and a class ‘day’ have a min of 1 and a max of 31” and then change this in one place rather than visiting each element’s data-* attributes individually. Read this article for more detail on how to do that. We don’t need to change the heading colour in every single heading element in our site now we have CSS; let’s not start doing that kind of thing again now we have data-* attributes.
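
To make the CSS analogy in step 2 a bit more concrete, here is one way the inference could look. Treat it as a sketch under assumptions: the rules table, the configFor name, the default 0 to 100 range and the "month" rule are all invented for this example, and ‘slider’ is the hypothetical input type from the paragraph above. Unlike real CSS there is no specificity here; later rules simply override earlier ones.

```javascript
// Sketch only: the rules table and configFor are illustrative, not a library API.

// Configuration lives in one table keyed by selector, like a stylesheet.
// Rules are applied in order, so later entries override earlier ones.
var rules = [
  { selector: 'input[type="slider"]',       config: { min: 0, max: 100 } },
  { selector: 'input[type="slider"].day',   config: { min: 1, max: 31 } },
  { selector: 'input[type="slider"].month', config: { min: 1, max: 12 } }
];

// Merge the config of every rule whose selector matches the element.
// "element" only needs a matches(selector) method, which DOM elements provide.
function configFor(element, ruleList) {
  var result = {};
  ruleList.forEach(function (rule) {
    if (element.matches(rule.selector)) {
      Object.keys(rule.config).forEach(function (key) {
        result[key] = rule.config[key];
      });
    }
  });
  return result;
}
```

In a page you would call configFor(input, rules) for each element returned by document.querySelectorAll('input[type="slider"]') and hand the result to your slider widget. Changing the range for every ‘day’ slider then means editing one line of the table, not every element.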

I welcome the data-* attribute. It’s a simple and immediately useful method to add custom semantic data to HTML documents. Just avoid using it to litter implementation-specific crap into your documents :)

24 Comments (Closed)

I have seen some people use data-* to have some data attached to the node.

The jQuery metadata plugin allows one to embed data in many other ways without resorting to data-*. Check out this excellent plugin at http://plugins.jquery.com/project/metadata .

Neeraj at 27.01.10 / 15PM

I think the markup you suggest for the Tab control is spot-on and makes semantic sense.

As far as good use cases go, I think there are certain cases where it makes sense. For instance, showing a human-friendly date/time combo, but embedding the timestamp in a `data-time="..."` attribute for easy JS parsing and use.

As far as auto-configuring javascript based on the HTML, it just seems like a complicated way to go about it.

Good points and thanks for sharing!

Douglas Neiner at 27.01.10 / 16PM

Exactly my thoughts while watching the presentation. Not to take away the main point of trying to think “what would the browser do” before coming up with your implementation, which I thought was a simple but powerful idea.

Ismael at 27.01.10 / 16PM

In cases like the tab navigation, I’ve also semi-abused the ‘rel’ attribute for data- element-like behavior. It’s not 100% correct, but the anchor markup looks like it would be harder when you need JS buttons to flip tabs in order. With sequential numbers in ‘rel’, it’s pretty straightforward to grab the current tab, increment the value and bring that content to the front.

Matt Jones at 27.01.10 / 16PM

Douglas: Agreed. I think data-* attributes would be a good way to avoid the abbr pattern in Microformats. In that case you are adding a machine-readable date to an element, which isn’t implementation specific and does enrich the document semantically.

Ismael: I’d like to make clear that I really like the screencast as a whole. It was just the last thing I saw that demonstrated what I wanted to talk about. I’m sure Yehuda would have marked it up the other way, but the markup was not central to what he was explaining so it’s easy to overlook.

Matt: Leave the rel attribute alone, you fiend!!!

Dan Webb at 27.01.10 / 17PM

Your core point is very much correct; that application creators should use the best available semantics for their scripting before extending it with embedded data. There is very much a danger that given a generic, script-centric mechanism like this people with poor awareness of document structure will just skip over the functionality of HTML to make their scripts work.

One correction: you cite microformats as a possible consumer of data-* attributes; that is incorrect. The data-* spec explicitly excludes using data-* attributes for organized, shared vocabularies, or for parsing by anything outside of the page context. The feature is explicitly for in-page functionality, not for sharing structured data.

Microdata is the part of HTML5 that would provide new syntax for microformat vocabularies, but of course the current state of the infighting means that has been taken out of the spec.

Concerning the abbr-pattern, an alternative, HTML4-valid solution—the value-class-pattern—was published last year.

Ben Ward at 27.01.10 / 20PM

Ben: Thanks. That’s really interesting. So it sounds like the proposed usage of data-* attributes is almost exclusively what I would consider an abuse of the attribute. Hmrph. Off-hand, I can think of a few good uses for data-* within the scope of in-page functionality but there aren’t going to be many that couldn’t be achieved in a better way. It’s like a build-your-own semantically useless markup kit. data-animate-left="5px". Worse than I thought.

I’m obviously behind on Microformats. I’d not come across the value-class-pattern before. I’m glad a better solution exists for the pattern now.

Dan Webb at 27.01.10 / 21PM

Great read. I’m interested in your opinion of something like this:

<a href="some_url" class="popup" data-height="460px" data-width="600px" /> ...

Using the data attribute to hold the height and width info of the popup to create configurable HTML popups. This is something I’ve been advocating as a way to make it easier for designers to make popup windows happen without the need for onclicks containing the window.open stuff.

Or would you consider this to be an abuse too?

Brian Hogan at 28.01.10 / 06AM

This is a great article, Dan, and one I find useful because it’s not just the old overcooked argument about semantic markup, but rather a pragmatic appeal to leverage existing standards where applicable. I think the main problem here is the divide between web designers and developers. Traditionally designers have been the ones raving about standards and semantic markup, largely because HTML and CSS were their bread and butter, whereas back-end developers had to concern themselves with a much thicker stack, in which HTML is sometimes treated as icing rather than as the foundational layer of the client side.

The data attributes in HTML5 are a great feature that will inevitably see a lot of abuse as people figure it out. The carnage will be substantially less than the abuse of the TABLE element, but conceptually a similar pitfall.

I would like to share one way I’ve used custom attributes in the past that I think illustrates how they can be most useful. On The Auteurs Film Library we generate some pretty standard HTML for the ratings, which shows the average rating on the site. However we also use a delegated javascript to enable user rating right inline on rollover. Once a user rates a film then the rating is persistent. If the user reloads the page, they should still see their rating there. But instead of checking this dynamically on the back-end, we simply continue to output the same standard HTML with a custom film_id attribute (no data prefix because I did this before I was aware of the HTML5 work). Javascript then scrapes up all the div.star_ratings and uses their film_ids to load the logged-in user’s ratings dynamically.

The benefits here are huge. In essence the loading of a user’s star ratings is decoupled from the template hierarchy in which they appear. That may not sound like much, but in a standard Rails controller where the data is being loaded, it’s not necessarily straightforward to know what data some sub-template is going to need. This technique allows a template to specify that some data is needed, but the data is only lazily loaded by the client where it is easy to wrap up all the necessary data into one optimized request, even if they are generated by disparate partial templates that know nothing about each other. Once user-specific data is lazily loaded from standard markup then suddenly the entire page is naively cacheable leading to a several-orders-of-magnitude scalability improvement.

Now, a custom attribute isn’t strictly necessary here. We could infer the film id from some outer tag, but that would require a good deal more javascript, and would be much more brittle with various contexts. So in this case, a small custom attribute goes a long way to providing very rich client-side functionality and removing a good chunk of server-side complexity along the way.

I’ve found the technique so useful that I’m planning on writing a small javascript shim and possibly Rack middleware that makes it easier to employ this technique by wrapping up multiple resource requests into a single AJAX request so you don’t need to worry about multiple concurrent requests slowing things down. To be released as a Rails 3 plugin. More on my blog soon…

Gabe da Silveira at 28.01.10 / 07AM

Sounds like a simple argument.

You’re saying that data-* attributes are the style attributes of scripting. You can use inline styles, but any programmer worth his salt detaches completely into separate CSS files.

Separation of concerns and all that. Same deal.

Gabe makes a valid point though. Sometimes the data needs to be output into the page with the HTML. If that data can’t be semantically incorporated, then that’s when you dig out the data-* attributes.

It’s a niche case that excuses the invention, and will lead to some horrible markup crimes.

W3C should make a damn clear case about best practice right there in the spec.

Kenneth at 28.01.10 / 09AM

@Kenneth I would just amend that first part though. I don’t detach completely into external files. Even the much-derided style attribute has a handful of critical uses where any workarounds would be ugly hacks. I think the key is having a broad understanding of the full ecosystem of HTML/CSS/Javascript plus SEO/Accessibility and whatever back-end you’re using.

Gabe da Silveira at 28.01.10 / 17PM

@Brian: that belongs in a stylesheet. You never know which device will open that popup (iPhone, iPad, desktop browser) so you have to fine-tune its dimensions together with the rest of the layout.

Mislav at 28.01.10 / 18PM

@Gabe da Silveira

Dimensions for window.open() can be stored in the CSS? Explain how please. I’d love to see that. I could see that working for a div that opens as an overlay, but not for a new window.

Even still, I don’t trust mobile stylesheets (as the iPhone is just gonna use screen ones anyway) so I end up serving different views.

But very interested to know how you’d use CSS to solve this issue.

Brian Hogan at 28.01.10 / 20PM

Sorry, that was meant for @mislav . My mistake.

Brian Hogan at 28.01.10 / 20PM

Gabe da Silveira: What exactly is a good example of using the style attribute?

I think in the case of the page you showed I’d avoid the problems you had by changing the markup to be a bit more semantic. You don’t need to use links with a hashed-out href there at all. It would be better to use forms with input tags (maybe radio buttons; you could include the film id as a form field) and replace them with a star rating widget using javascript. That way the markup would be much closer to the finished widget (and even completely usable) without javascript. In my opinion the page that you showed as a good example of custom attribute usage is actually a good example of where semantic markup would make all the difference.

Dan Webb at 29.01.10 / 01AM

thx,dan for the good article will test this once. regards from cologne

Frank at 29.01.10 / 14PM

Hi, Dan! Really a useful article. I hope you post more articles like this. Interesting!

Thanks and many regards from Germany

Druckertinte at 29.01.10 / 21PM

Exactly how I feel. Too bad it appears Rails 3 “UJS” refactoring will be using these attributes. At least the last time I checked anyway.

cheapRoc at 30.01.10 / 17PM

@Dan

Ugh, it seems like you missed the point even with my detailed explanation. I could sit down with you over coffee for an hour and explain the history behind this code and why it is the way it is, but that is immaterial. You claim that more semantic markup would “make all the difference” and “avoid the problems I had”, but you are fixated on minor markup details which actually have nothing to do with the custom attribute or the problems I was solving.

The point is, where do we store the film_id? Do we try to shoehorn it into a standard “semantic” attribute somewhere, maybe parsing it out of a class like “film_xxxx” on an ancestor element, or do we just use a custom attribute?

There are real benefits to custom attribute usage vs attempting to put it in a standardized HTML attribute for its own sake. I am saddened that I was unable to show the forest for the trees.

Gabe at 01.02.10 / 18PM

Gabe: Apologies for not being more specific. Hopefully this will explain what I’m getting at.

I’d use either a select element containing values 1 to 5 or radio buttons. Depending on how the page was structured, each rating could be its own form, in which case you’d put the film_id either in a hidden field or as part of the form action, or you could put one form around all the ratings and name the select boxes film[id] (where id is the film id).

With this in place your ratings would work without javascript at all. Then once this is working you could add javascript to hide the select/radios and replace them with your rating widget, pulling the film id from the markup.

If you weren’t allowed to use javascript to implement this feature you’d have to use form elements in the way I describe rather than hashed-out link elements. What I meant was that if you design the solution this way before implementing the javascript, there’s more often than not a clear way to implement the enhanced version that both avoids using custom markup and offers graceful degradation.

Dan Webb at 02.02.10 / 02AM

That’s a nice strategy for progressive enhancement, and something that I have pursued in other contexts. However, we are a VOD site which has, from the very beginning, required flash and javascript. The merits of that decision can be debated, but given that it’s a conscious decision about where to devote resources, the marginal cost of implementing a form (deciding how to style it, how to hide it, and whether or not it even makes sense for a non-editable rating, and writing more and slower javascript to extract the necessary values from a more complex DOM fragment) just isn’t worth it.

In short, the most semantic way of modeling something in HTML isn’t better per se. If it makes things more complex it needs to be justified just like anything else.

Gabe da Silveira at 03.02.10 / 19PM

Hi Dan, we follow your blog on a daily basis. This is a great article, and we hope you will continue the good work. Best regards from the ITS team. Thanks for making it easier for all of us to work, understand new stuff and improve our sites.

Infotech web design studio at 28.02.10 / 15PM

Thanks a lot for this informative article. regards

Josh at 27.03.10 / 18PM

Your articles are very informative and very interesting. I hope it’s going to be a longer time ;)

Thanks again, Dan

Zahnimplantate at 09.04.10 / 05AM

About This Article