Alice's DITA Blog: 2009

Wednesday, 30 December 2009

3.11 References and Resources

URLs:

Blog: http://aghalsey.blogspot.com

Webpage: http://www.student.city.ac.uk/~abgy261/

JavaScript: http://www.student.city.ac.uk/~abgy261/javascript.html

References:

Articulate (2009) E-Learning Software and Authoring Tools. [Online] Available from: http://www.articulate.com/ [Accessed 30 September 2009].

ASP (2009) The Official Microsoft ASP.NET site. [Online] Available from: http://www.asp.net/ [Accessed 8 December 2009].

Bates, M. J. (1989) 'The design of browsing and berrypicking techniques for the online search interface'. Online Review, 13(5), pp. 407-424.

Berners-Lee, T. (1999) Weaving the Web: the origins and future of the World Wide Web. London: Orion Business.

Bing (2009) [Online] Available from: http://www.bing.com/ [Accessed 29 November 2009].

Bishop, L. (2007) 'Moving data into and out of an institutional repository: Off the map and into the territory'. IASSIST Quarterly. Fall and Winter, 2007. [Online] Available from: http://www.iassistdata.org/publications/iq/iq31/iqvol313bishop.pdf [Accessed 10 November 2009].

Bosak, J. and Bray, T. (1999) 'XML and the Second-Generation Web'. Scientific American, 280(5), pp. 89-94.

Bush, V. (1945) 'As We May Think'. The Atlantic Monthly. July 1945. [Online] Available from: http://www.theatlantic.com/doc/194507/bush [Accessed 9 October 2009].

Chapman, C. (2009) 'The Evolution of Web Design'. Six Revisions. [Online] Available from: http://sixrevisions.com/web_design/the-evolution-of-web-design/ [Accessed 1 December 2009].

Delicious (2009) Delicious Social Bookmarking. [Online] Available from: http://delicious.com/ [Accessed 28 September 2009].

Digg (2009) Digg - The Latest News Headlines, Videos and Images. [Online] Available from: http://digg.com/ [Accessed 28 September 2009].

Dublin Core Metadata Initiative (2009) Making it easier to find information. [Online] Available from: http://dublincore.org/ [Accessed 9 November 2009].

Engard, N.C. (2009) Library mashups: exploring new ways to deliver library data. London: Facet Publishing.

Flickr (2009) Welcome to Flickr. [Online] Available from: http://www.flickr.com/ [Accessed 20 October 2009].

Google (2009) About Google Books. [Online] Available from: http://books.google.co.uk/intl/en/googlebooks/about.html [Accessed 20 October 2009].

Henderson, M. (2000) 'FM Interviews: Louise Addis'. First Monday, 5(5). [Online] Available from: http://131.193.153.231/www/issues/issue5_5/addis/index.html [Accessed 9 October 2009].

Intute (2009) Home. [Online] Available from: http://www.intute.ac.uk/ [Accessed 20 December 2009].

Jadu (2009) Jadu Content Management Systems. [Online] Available from: http://www.jadu.co.uk/ [Accessed 20 December 2009].

JavaScript Lint (2009) Online Lint. [Online] Available from: http://www.javascriptlint.com/online_lint.php [Accessed 8 December 2009].

Krill, P. (2008) 'JavaScript creator ponders past, future'. InfoWorld. June 23 2008. [Online] Available from: http://www.infoworld.com/d/developer-world/javascript-creator-ponders-past-future-704 [Accessed 8 December 2009].

Krill, P. (2009) 'Eich: JavaScript getting faster, could displace Flash'. InfoWorld. November 6 2009. [Online] Available from: http://www.infoworld.com/d/developer-world/eich-javascript-getting-faster-could-kill-flash-251 [Accessed 8 December 2009].

Matthews, R. (2004) JPG vs GIF for Web Images. [Online] Available from: http://www.wfu.edu/~matthews/misc/jpg_vs_gif/JpgVsGif.html [Accessed 20 October 2009].

Morville, P. and Rosenfeld, L. (2006) Information architecture for the World Wide Web. 3rd ed. Sebastopol, CA: O'Reilly.

Open Ajax Alliance (2009) Introducing Ajax and OpenAjax. [Online] Available from: http://www.openajax.org/whitepapers/Introducing%20Ajax%20and%20OpenAjax.php [Accessed 8 December 2009].

Open Content Alliance (2009) [Online] Available from: http://www.opencontentalliance.org/ [Accessed 20 October 2009].

Palfrey, J. and Gasser, U. (2008) Born digital: understanding the first generation of digital natives. New York: Basic Books.

Pitti, D. (1999) 'Encoded Archival Description: an introduction and overview'. D-Lib Magazine, 5(11). [Online] Available from: http://www.dlib.org/dlib/november99/11pitti.html [Accessed 10 November 2009].

Prensky, M. (2001) Digital Natives, Digital Immigrants. [Online] Available from: http://www.marcprensky.com/writing/ [Accessed 9 October 2009].

Samuelson, P. (2009) 'Google Books is Not a Library'. The Huffington Post. October 13 2009. [Online] Available from: http://www.huffingtonpost.com/pamela-samuelson/google-books-is-not-a-lib_b_317518.html [Accessed 20 October 2009].

SIMILE Project (2009) About SIMILE. [Online] Available from: http://simile.mit.edu/wiki/SIMILE:About [Accessed 20 October 2009].

Stanford University (2005) The Medlane Project. [Online] Available from: http://xmlmarc.stanford.edu/ [Accessed 10 November 2009].

TARO (unknown) List of repositories. [Online] Available from: http://lib.utexas.edu/taro/browse/index.html [Accessed 20 December 2009].

TEI (2004) A gentle introduction to XML. [Online] Available from: http://www.tei-c.org/release/doc/tei-p4-doc/html/SG.html [Accessed 10 November 2009].

University of Leeds (2009) [Online] Available from: http://www.leeds.ac.uk/ [Accessed 20 December 2009].

Validome.com (2009) Validator for XML Documents. [Online] Available from: http://www.validome.org/xml/ [Accessed 10 November 2009].

W3C (2004) What is the Document Object Model? [Online] Available from: http://www.w3.org/TR/DOM-Level-3-Core/introduction.html [Accessed 13 November 2009].

W3C (2009a) The W3C Markup Validation Service. [Online] Available from: http://validator.w3.org/ [Accessed 13 November 2009].

W3C (2009b) The Extensible Stylesheet Language Family (XSL). [Online] Available from: http://www.w3.org/Style/XSL/ [Accessed 13 November 2009].

WebMonkey (2008) JavaScript Tutorial. [Online] Available from: http://www.webmonkey.com/tutorial/JavaScript_Tutorial [Accessed 8 December 2009].

Wilson, T.D. (1999) 'Models in information behaviour research'. Journal of Documentation, 55(3), pp. 249-270.

Resources Used:

ADAM (unknown) Boolean search tips. [Online] Available from: http://adam.ac.uk/info/boolean.html - a simple guide to Boolean searching.

Beighley, L. (2007) Head First SQL. Cambridge: O'Reilly. - an entertaining(!) and readable guide teaching the basics of manipulating relational databases using SQL.

Castro, E. (2007) HTML, XHTML and CSS: visual quickstart guide. 6th ed. Berkeley, CA: Peachpit Press. - a comprehensive guide that helped me out in writing HTML and CSS for this module, pitched at a fairly advanced level for people who already have some familiarity with the technologies but wish to take their skills further.

Flickr (2009) [Online] Available from: http://www.flickr.com - a Web 2.0 site containing a vast amount of images. The image of the typewriter keys used in my header was taken by Jetheriot and is available for manipulation under a Creative Commons licence.

Freeman, E. (2006) Head First HTML with CSS and XHTML. Farnham: O'Reilly. - a lively introduction to HTML and CSS, with an emphasis on creating valid websites which are cross-browser compatible. Good for absolute beginners, although I have been using HTML and CSS for several years and learned plenty of new skills from this book.

GIMP (2009) [Online] Available from: http://www.gimp.org/ - a freely distributed image manipulation programme for raster graphics.

Moncur, M. (2006) Sams teach yourself JavaScript in 24 hours. 3rd ed. Hemel Hempstead: Prentice Hall. - a little too detailed for what I needed to know for this module, and JavaScript requires so much practice I doubt anyone could learn it in 24 hours, but this book certainly helped me with Exercise 3.9.

nsftools.com (unknown) Microsoft Windows Command-Line FTP Command List. [Online] Available from: http://www.nsftools.com/tips/MSFTP.htm - an invaluable resource of FTP commands for putting my files on my City WebSpace using the Windows command line prompt.

Ray, E. T. (2003) Learning XML. 2nd ed. Sebastopol: O'Reilly. - broad, brief reference on XML although lacking in practical examples.

Stanicek, P. (2009) Color Scheme Designer 3. [Online] Available from: http://colorschemedesigner.com/ - very useful website for creating visually appealing colour schemes, allowing you to preview how text will appear on a page and to tweak the contrast, brightness and intensity of schemes. You can either pick a colour, or enter the hex code of a colour you wish to base your site around.

Whatis.com (2009) [Online] Available from: http://whatis.techtarget.com/ - somewhat laden with advertisements, but a useful primer providing succinct definitions and clarifying the concepts learned during lectures.

W3Schools (2009) Online Web Tutorials. [Online] Available from: http://www.w3schools.com/ - wide range of tutorials covering standards-compliant SQL, CSS, XML, JavaScript and HTML, with a wide range of try-it-yourself examples. An integral resource for the module.

Exercise 3.10 -- Information Architectures

Information architecture is at the heart of what librarians do. DITA has taught me how to organise, categorise and classify information in a digital environment and teach users that a wide variety of search strategies must be employed to find what they need (Bates, 1989). Palfrey and Gasser (2008) have highlighted how children born today have digital dossiers containing information on them before they are born, but being exposed to digital technologies does not equate to expertise in finding, managing and critically evaluating information. With the amount of global digital information growing exponentially, there will always be a need to guide people through it and collate disparate information into quality resources, so the role of a librarian shows no signs of obsolescence.

Organisation schemes are vital in collating digital information. Morville and Rosenfeld (2006) highlight the use of shallow hybrid schemes. An example is Leeds University's website, powered by the document-centred Jadu Content Management System. The site blends exact alphabetical organisation schemes and ambiguous task-oriented schemes. Non-hybrid schemes are useful for subject gateways such as Intute, but not for sites like TARO where an alphabetical scheme has been used to classify topical and geographic information. Consideration must also be given to labels. Leeds University's site shows examples of ambiguous labels ('big ideas') and discrepancies and duplication in labelling ('Leeds and Yorkshire', 'Why Leeds?' and 'Choosing Leeds'). This week's task of identifying a mystery vegetable shows how difficult it is to find information without consistent labelling: is kohlrabi a stalk, bulb or root vegetable, or a brassica? For a user to find this information, it must be indexed as all four.

Felicitous information architectures will be organised to reflect their users' needs. A site without 'bells and whistles' is preferable to a visually aesthetic site offering disorganised chaos. Unfortunately, not all those creating and organising digital information are mindful of this fact.

Tuesday, 8 December 2009

Exercise 3.9 -- Client side programming

This week, we learned how JavaScript can be used to add interactivity to web pages. JavaScript was developed by Brendan Eich in 1995 as a flexible language that web designers with little programming experience could quickly learn (Krill, 2008). It is primarily used as a client-side programming language executed by the browser, although server-side JavaScript platforms such as ASP also exist.

In the exercise, we used JavaScript to elicit information from users to direct them to an appropriate hyperlink. The WebMonkey tutorial aided me in this exercise, as did the JavaScript Lint online validator, which allowed me to check my code for errors. I added a while loop so users would be prompted again if they entered information which did not match the values defined in the variable. I experimented with window.location and window.open to redirect users automatically or open hyperlinks in new windows, but felt users would rather decide for themselves if they wanted to visit the hyperlink. My finished programme is available from http://www.student.city.ac.uk/~abgy261/javascript.html .
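
The while-loop idea described above can be sketched roughly as follows. This is a minimal illustration and not the script published on the page linked above: the function name chooseLink, the subject list and the example.org URLs are all hypothetical, and the prompting function is passed in as a parameter so the sketch can also be run outside a browser.

```javascript
// Hypothetical subject-to-URL table; the choices on the real page are not shown in this post.
const LINKS = {
  html: "http://example.org/html-resources",
  css: "http://example.org/css-resources",
  xml: "http://example.org/xml-resources"
};

// Keep asking until the answer matches a known subject, or the user cancels.
// `ask` is any function that takes a message and returns a string, or null on cancel.
function chooseLink(ask, links) {
  const subjects = Object.keys(links);
  const message = "Which subject? (" + subjects.join(", ") + ")";
  const normalise = (text) => text.trim().toLowerCase();
  let answer = ask(message);
  while (answer !== null && !subjects.includes(normalise(answer))) {
    answer = ask("Sorry, that was not recognised. " + message);
  }
  // Return the link rather than redirecting, so the user decides whether to follow it.
  return answer === null ? null : links[normalise(answer)];
}
```

In a browser, this could be called as chooseLink((msg) => window.prompt(msg), LINKS), with the returned URL written into the page as an ordinary hyperlink.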

The exercise gave me confidence in writing simple scripts. At work, I could use JavaScript to direct users to subject-specific resources, or to a version of a web page tailored to their browser. JavaScript has far more sophisticated applications, however. Web technologies such as AJAX use it to bring together technologies such as CSS, XHTML and XML (Open Ajax Alliance, 2009) and power applications such as Google Maps which update in real time and do not require plug-ins, improving speed and usability. Though JavaScript has existed for fourteen years, it is a potent high-level programming language and I am inclined to agree with Brendan Eich's assertion that in the future, "we'll see even more JavaScript" (Krill, 2009).

Sunday, 29 November 2009

Exercise 3.8 -- Information Retrieval

An understanding of techniques used to retrieve unstructured information is vital for meeting information needs as methods are vastly different from those used to search structured data. I frequently perform searches to meet both my own and my customers' research needs, but had given little thought to formulating research zones and content components (Morville and Rosenfeld, 2006, p.151) and the models mentioned by Wilson (1999).

"Problem solving is the underlying motivation for information searching." (Wilson, 1999, p.265) When met with a specific problem, such as, 'When was JavaScript invented?' a known approach is useful but cannot help with vague queries. Exhaustive approaches are useful for narrow topics, but a search for "simple SQL queries" on Bing returns 67.5 million results. I have found the exploratory technique most useful for this module. For example, "javascript tutorial" AND "for beginners" gives 13,000 results. Reading the first few results, I was interested in learning about while loops and alert boxes, so used the following query to retrieve a manageable and relevant set of results:

("while loop" OR "alert box") NEAR "javascript tutorial"

Other considerations include using proper nouns. Searching for HTML or html would make little difference, but case-sensitive searching would aid us when, for example, searching for information on Apple or Adobe. Boolean searches can also aid us: users looking for information on the democratisation of information may find information on the Democratic political party if they do not employ effective search techniques.
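
To make the effect of Boolean operators and case sensitivity concrete, here is a small sketch that filters a handful of made-up document titles. The function name matches, its options and the sample titles are illustrative assumptions only; real search engines tokenise and rank text in far more sophisticated ways than this simple substring test.

```javascript
// Returns true when `text` contains every term in `all`, at least one term in
// `any` (if any are given), and none of the terms in `none`.
function matches(text, { all = [], any = [], none = [], caseSensitive = false } = {}) {
  const prepare = (s) => (caseSensitive ? s : s.toLowerCase());
  const haystack = prepare(text);
  const has = (term) => haystack.includes(prepare(term));
  return all.every(has) && (any.length === 0 || any.some(has)) && !none.some(has);
}

// Made-up sample titles for illustration.
const titles = [
  "The democratisation of information online",
  "Democratic Party election results",
  "How to store an apple through winter",
  "Apple announces a new laptop"
];

// (democratisation OR democratic) NOT party
const democratisation = titles.filter((t) =>
  matches(t, { any: ["democratisation", "democratic"], none: ["party"] })
);

// A case-sensitive search for the proper noun "Apple" skips the fruit.
const company = titles.filter((t) => matches(t, { all: ["Apple"], caseSensitive: true }));
```

Without the NOT clause the first query would also return the title about the political party, and without case sensitivity the second would also return the title about storing fruit.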

Information retrieval is much more difficult than data retrieval, as we must use our information-seeking skills to transform data into information. Morville and Rosenfeld (2006) assert, "...search is there for users," (p. 150) and we must look at information seeking from a user's perspective if we are to meet their information needs.

Friday, 20 November 2009

Exercise 3.7 -- Databases

Organisations storing digital information should familiarise themselves with relational databases. They are secure, quick to access, and high in integrity compared to data held in flat-file databases. This week SQL queries were used to manipulate and retrieve data from biblio, a relational database; I took the hypothetical task of weeding Computing books from a library.

Before running any queries, I used the show tables and desc commands to check the contents of the database and the format the data was held in, which affects how the data can be meaningfully displayed and manipulated. In the titles table, for example, the subject field was poorly populated and the notes field poorly described as it appeared to contain Dewey Decimal classmarks.

I ran a simple query to find all books in the '...for Dummies' series, ordered by year published, displaying data in a grid format. This list had 48 items, but was not very specific. For instance, the librarian for Computing may tell me she only wants to discard books over ten years old, and does not want to weed any books on C++. I entered a refined query reflecting these criteria, which yielded 44 results.
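
The SQL itself is not reproduced here, but the selection logic just described can be sketched in JavaScript. Everything in this sketch is an assumption made for illustration: the field names title and year, the sample records and the reference year of 2009 are invented and do not come from the biblio database.

```javascript
// Invented sample records; the real titles table is not reproduced in this post.
const books = [
  { title: "C++ for Dummies", year: 1996 },
  { title: "HTML for Dummies", year: 1997 },
  { title: "Java for Dummies", year: 2006 },
  { title: "Learning Perl", year: 1993 }
];

// Select '...for Dummies' books more than `minAge` years old, excluding C++ titles,
// ordered by year published (oldest first).
function weedingCandidates(records, currentYear, minAge) {
  return records
    .filter((b) => b.title.toLowerCase().includes("for dummies"))
    .filter((b) => !b.title.includes("C++"))
    .filter((b) => currentYear - b.year > minAge)
    .sort((a, b) => a.year - b.year);
}

const candidates = weedingCandidates(books, 2009, 10);
```

In SQL, conditions like these would normally be expressed in a WHERE clause, with an ORDER BY clause handling the sorting.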

Lastly, the ordering team may wish to know details of the publisher and ISBN alongside the other details; a further query was used to join the publishers and titles tables together.

Though a hypothetical example, I undertake similar collection management tasks at work using the Millennium library management system, which uses Boolean logic to search patron, item and bibliographic records. I have learned vital transferable skills as I know how to extract data precisely and adeptly, as opposed to running vague queries and sifting through print-outs due to employing inefficient search techniques.

Friday, 13 November 2009

Exercise 3.6 -- CSS

This week, I learned about the Document Object Model (DOM), a W3C standard which lets programmers manipulate XML and HTML documents by incorporating scripting languages. This allows for dynamic documents which are cross-browser compatible (W3C, 2004). I then learned about Cascading Style Sheets (CSS), used to separate the presentational elements of an HTML document from its structure. Styles can be included within documents, but linking to an external document is preferable as the advantages of CSS can be fully exploited. Part of this week's exercise entailed applying presentational elements to my web page using CSS, and my examples are available from http://www.student.city.ac.uk/~abgy261/css.html.

When accessed for the first time, a copy of the CSS is saved in the browser's cache; this saves bandwidth if subsequent files using the CSS are accessed. A CSS includes all the presentational elements, so the size of the HTML document may be reduced. Global changes can be made from a single document, saving authors time and allowing consistency. When creating web pages at work, I apply the default style sheet so my page fits with Leeds University Library's overall look and feel. CSS documents also improve accessibility, for example by including styles for screen readers, mobile devices and a style to correctly render a document for printing. Unfortunately, a CSS only tells the browser how information should be displayed. Browsers do not always apply the rules consistently, so as with all web documents, it is crucial to check the presentation on a variety of browsers and use a tool like the W3C's validator.

A CSS can be used to format XML documents. Learning how to do this was beyond the scope of the lectures, but I learned that the W3C recommends XSLT (XSL Transformations) for this purpose as it is more extensive and extensible, although current browser support for this technology is poor (W3C, 2009b).

Tuesday, 10 November 2009

Exercise 3.5 -- XML

Prior to this week's exercise, I had heard of XML, but did not realise its relevance to libraries. Unlike HTML, which Bosak and Bray (1999) likened to a fax machine, XML focuses on data's meaning as opposed to its presentation (TEI, 2004). As an extensible language, authors of XML documents can write their own DTD or use an existing DTD such as Dublin Core to define task-specific elements and attributes, describing data with as much semantic precision as necessary. XML was designed to be interoperable, so any application with access to the DTD specified in an XML file can make sense of it. Information can be exchanged between different applications without loss of meaning. These factors facilitate the retrieval of data.

In academic libraries, XML has been used to transfer data between institutional repositories (Bishop, 2007). Special Collections staff at Leeds University Library are receiving training in cataloguing manuscripts using the EAD (Encoded Archival Description) DTD for inclusion into the library's digital repository. Stanford University's Medlane Project created XMLMARC to convert MARC catalogue records into XML and facilitate their manipulation.

For this week's exercise, I used XML to solve the problem of collecting information about our missing library items, which are currently recorded on manually sorted paper slips. This would allow information to be passed between customer service staff, academic librarians, and collection management services. The links below show my DTD and XML file, which were checked for validity and well-formedness using validome.org's XML validator.

http://www.student.city.ac.uk/~abgy261/missinglist.dtd
http://www.student.city.ac.uk/~abgy261/missinglist.xml

Bosak and Bray (1999) envisaged, "...millions of XML documents pulsing around the Internet," yet a cursory look at the Internet shows this is not yet the case. To best convert information into knowledge we must imbue digital information we create with semantic meaning.

Tuesday, 20 October 2009

Exercise 3.4 -- Images and Graphics

In this week's exercise I edited images using the JPEG and GIF formats. The JPEG format is appropriate for displaying photographic images on the Web due to its relatively small file size, but data is lost when the file is modified and cannot be recovered. GIF files are lossless but larger in size than JPEGs and only support 256 colours (Matthews, 2004). They are more useful for graphs and greyscale images and, unlike JPEGs, can be transparent or animated. The PNG format is growing in popularity as it is lossless like a GIF but supports 16 million colours, although the file size is larger than that of JPEGs and it is not supported by older browsers (ibid.). Irrespective of the format chosen, it is vital to save an original copy of an image before editing it for use on the Web. The results of this week's exercise are available from: http://www.student.city.ac.uk/~abgy261/task4.html

Libraries are at the forefront of new techniques to digitally represent graphical information. MIT's SIMILE project spatialises digital information, creating maps and images which aid users navigating through information spaces. Mash-ups can be used to seamlessly combine data from several sources to create a new resource, for example integrating images from Flickr into OPACs (Engard, 2009) which can save time and deliver relevant information more quickly, ameliorating problems with information overload.

Digitisation is a socially and culturally important concern for libraries. Initiatives such as Google Book Search and the Open Content Alliance are trying to create a global library of information by converting books and historical documents into graphical formats, ensuring printed information is not lost forever. In addition to legal ramifications (Samuelson, 2009) there are also the practical limitations pertinent to all graphical information on the Web: the problem of maintaining quality and integrity while coping with the limitations of users' bandwidth.

Friday, 9 October 2009

Exercise 3.3 -- Internet/WWW

Through this exercise, I realised my understanding of the Internet's inherent technologies and information architectures was poor. As a digital native, the exponential growth of the Web in the 1990s and 2000s has changed the way I think and work (Prensky, 2001) but I knew little about the underlying technologies driving it.

This week, I read 'Weaving the Web' by Tim Berners-Lee, which broadened my understanding of the history of the Internet. HTML was designed to "convey the structure of a hypertext document, but not details of its presentation" (Berners-Lee, 1999, p.45). Many web authors are guilty of using semantic tags for presentation, and in the late 1990s it was common to see tables used as navigational aids (Chapman, 2009). Using a limited set of tags taught me that sticking to semantic tags is best practice, as presentational tags can bloat file size, limit a user's control over how web pages are displayed, and may not be supported by older browsers. Semantic mark-up also aids the creation of accessible HTML documents; for example, the alt attribute can supply a text description of an image, which a screen reader can read aloud to visually impaired users. Working with a limited set of semantic tags and manually checking pages is vital as even valid HTML documents are displayed differently on different browsers. The index page uploaded to my WebSpace is XHTML 1.0 Strict compliant.

The Web is dynamic, and continually evolving. Ideas of Web 3.0 are beginning to emerge, as is the semantic web, where information will deliver itself to us (Berners-Lee, 1999). Similar ideas were described by Bush (1945) in his concept of the Memex, and I feel the ideas he espoused are still relevant today. The first web server outside of CERN was encouraged by a librarian, Louise Addis (Henderson, 2000) and librarians and information professionals have a tradition of being at the forefront of digital information technologies. Remaining so is vital to educate people on how best to use them for finding, managing and organising digital information.

Thursday, 1 October 2009

Exercise 3.2 -- Text/HTML

Digital information consists of binary data, strings of unformatted ones and zeroes. Computers are used to interpret, represent and manage this information. This week's exercise taught me there are several considerations anyone creating, manipulating or storing digital information must bear in mind.

The format data is stored in is an important consideration. Different applications represent the same data in different ways. Proprietary formats such as Microsoft Word's DOC lose information when opened in applications such as Notepad, as the image below shows. Metadata, and semantic mark-up in particular, must be considered from the point of data creation to ensure the information is imbued with meaning and is interoperable across a wide variety of applications and platforms.

DOC file opened in Notepad (41KB)

The document-centred view of data is useful as files are embedded in documents, so a single file stored centrally can be part of many different documents. Duplicate copies of the original file need not be made, and updates to the original file are reflected automatically. This is vital for my job at Leeds University Library. When creating tutorials with specialist software such as Articulate, I embed them in PowerPoint documents, ensuring information is correctly displayed on PCs which lack the specialist software.

Lastly, when storing files, consideration must be given to file names and folder organisation. What is meaningful at the time of creation often differs from what is meaningful at the time of retrieval. Information may be created in a piecemeal manner, but considering organisation from the outset saves time and effort. The image below shows how I have used knowledge from this week's exercise to organise my folder for this module.

Organisation of folder for DITA module (9KB)

The most pertinent lesson I learned from this week's exercises is that anyone creating digital information is their own information manager. Learning how to organise, manage and represent intangible digital information to best exploit it is empowering.

Monday, 28 September 2009

Exercise 3.1 -- Introduction

It is easy to set up a Blog using Blogger, as users can generate content with little technical knowledge beyond that required to write an e-mail. Anyone with an Internet connection and a basic level of computer literacy can use Blogs to disseminate information. However, being a democratic tool, Blogs are often created without consideration of their underlying architecture. If this is the case, their presentational elements may not work well with their organisational elements, and they cannot satisfy the information needs of their readers (Morville and Rosenfeld, 2006).

The in-built information architecture of Blogs gives rise to organisational problems as posts are frequently organised chronologically. This is not a germane organisational method for blogs containing a diverse range of topics as users can only browse by date and title. I added a tag cloud to my Blog to allow users to browse serendipitously by topic, allowing them to find information quickly and easily. Though my Blog is not broad in scope, I have aided users in finding and sharing information it contains through including a Google search box and an AddThis widget for users to share my entries on Web 2.0 sites like Twitter, Delicious and Digg.

When creating my Blog, I included an 'About Me' section at the top of the page to inform readers who I was, what I was writing about, and the purpose of my Blog. Lastly, I added a blogroll to provide links to sites readers might find relevant and interesting, and to ensure my Blog did not operate in isolation from the rest of the World Wide Web.

This week's exercise taught me that while Blogs are a democratic tool, anyone creating one needs to be solicitous when devising strategies to organise and present information. Blogs with poor underlying information architecture are of little practical use.