Sunday, August 05, 2007

Many within the company I work for are happy with the deployment of Team Foundation Server (TFS), including people that tend toward the skeptical side with all of Microsoft's offerings. It's great to see such a powerful, integrated tool actually helping development and project management as promised. I am still the main developer supporting our deployment of TFS, so when our configuration management (CM) group requires a tool or any extensibility done to TFS (such as custom controls in our work item types) I am the one to do the implementation.

At the moment, CM is looking for some specialized handling of work item type files, details of which I'll possibly outline in a future post. Since this requires processing the work item type XML files, I went to retrieve the Visual Studio SDK which includes the schema files for the work item types.

The xsd program generates code based on the original schemas. However, passing these files through any validator, such as XMLSpy's, reveals problems with the XML. Since xsd generates code I can use, I could stop here, but I wanted to explore the XML and see if there's a way to fix it. Also, I don't think this is a blocking issue for many at all, explaining why I haven't seen much about this online.

After opening WorkItemTypesSchema.xsd in XMLSpy we get this warning message:

Some of "include" and/or "import" and/or "redefine" statements in the following files have no schemaLocation attribute and will be ignored!

Step 1 was to address this issue, so I added schemaLocation="typelib.xsd" to the import of typelib.xsd near the top of the file.

Then I saved and got the following error message:

This file is not valid! If you save the file in its current state, other XML processors may have a problem opening the file.

When something like this happens, I struggle to find out "why?" If you read Raymond Chen's blog regularly, there are many instances where software such as Windows seems to do something boneheaded but upon thinking about it for a few moments (or viewing it in light of backward compatibility) it makes sense. I struggled for a few days to come up with the best answer I could to explain this failure to validate and I only see one likely possibility, but we'll get to that shortly.

So, why is the schema file invalid? It's because of non-determinism. Searching for "XML" and "non-determinism" on Google brings us to this http://msdn2.microsoft.com/en-us/library/9bf3997x(... page at MSDN, which states:

A deterministic schema is a schema that is not ambiguous, allowing the parser used by the Schema Object Model (SOM) to determine the sequence in which elements should occur in order for an XML document to be valid. It is possible for an XML Schema to be ambiguous, or non-deterministic. A schema is considered to be non-deterministic if the parser is unable to clearly determine the structure to validate with the schema. When validation is attempted on a non-deterministic schema, the parser used by the SOM generates an error.

I now have two conflicting pieces of information. XMLSpy tells me the schema isn't valid. MSDN, on the other hand, tells me schemas can be ambiguous, and xsd successfully generates code from this one. If we follow the cos-nonambig link in the error window of XMLSpy, it brings us to this page http://www.w3.org/TR/2004/REC-xmlschema-1-20041028... that states:

A content model must be formed such that during ·validation· of an element information item sequence, the particle component contained directly, indirectly or ·implicitly· therein with which to attempt to ·validate· each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.

We expect the ultimate authority is w3.org. After more Google searches, it appears the existence of ambiguous schemas is (or was, hopefully) expected, even if these schemas don't validate. Rick Jelliffe, a former member of the XML Schema Working Group, says:

I have received several very negative reports on the state of interoperability of tools using XML Schema ... The most common complaint is tools that generate ambiguous XML Schemas ... Ambiguous schemas effectively break everything downstream (http://www.w3.org/2005/05/25-schema/rick.html)

Okay, so the schema isn't valid, it apparently violates the specification at w3.org, and we have someone who should have some authority on this matter saying ambiguous schemas are bad. In spite of MSDN acknowledging ambiguous schemas can exist (corroborated by other sites I browsed), I think Microsoft should have made this schema validate.

XMLSpy indicates the <xs:complexType name="FieldDefinition"> tag is where the non-determinism exists. We see the non-determinism almost immediately (if we know how to identify it) in the following lines:

<xs:complexType name="FieldDefinition">
  <xs:sequence>
    <xs:group ref="PlainRules" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="HELPTEXT" type="HelpTextRule" minOccurs="0"/>
    <xs:group ref="PlainRules" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>

This specifies a sequence of PlainRules (0 or more), followed by zero or one occurrence of a HELPTEXT element, followed by 0 or more PlainRules again. Since maxOccurs defaults to 1 when it is omitted, HELPTEXT appears at most once, but the following discussion would hold just as well for "0 or more" HELPTEXTs. Let's simplify this for purposes of explanation and call PlainRules "A" and HELPTEXT "B". Using symbols from regular expressions (and formal languages and such, stretching back to university) we'll use * as "0 or more occurrences" and ? as "0 or 1 occurrence." This allows us to look at this XML fragment containing the non-determinism as:

A* B? A*

I went to the trouble of converting to these symbols in order to easily illustrate the non-determinism. By the XML schema specification, each element (the particle component) must unambiguously belong to a predictable part of the schema. The following sequences belong to the language A* B? A*

  • AAAAABAAAAA
  • ABA
  • AABAA
  • AB
  • BA

All the preceding sequences avoid the ambiguity issue. The A's that come BEFORE the B belong to the first A group, and all the A's that come AFTER the B belong to the second A group. The ambiguity is solved here by the presence of B, but the language states B is optional, so the following are also valid members of the language A* B? A*

  • A
  • AAAAAAAAA
  • AAA

This is where we encounter the ambiguity. When there is a single A, does that A belong to the first group of A's or the second? When there is a sequence of A's, such as "AAA", it's just as likely for any of the A's to belong to the A* before the B? as it is for them to belong to the A* after. There is no way to know which A* an A belongs to without mandating B in the middle.
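The ambiguity can be made concrete by counting how many ways a sequence can be divided between the two A* groups. The following is a small Python sketch written for this explanation, not anything taken from the schema tooling; the function name count_assignments is made up for the illustration. A count above 1 means the sequence has more than one valid reading.

```python
def count_assignments(sequence):
    """Count the ways `sequence` can be matched against A* B? A*.

    Each way is a choice of which A's go to the first A* group and
    which go to the second, with at most one B in between.
    """
    n = len(sequence)
    ways = 0

    # Case 1: B is absent. A run of n A's can be split between the two
    # groups at any of n + 1 positions.
    if all(ch == "A" for ch in sequence):
        ways += n + 1

    # Case 2: exactly one B is present. The B fixes the split, so each
    # valid position of B contributes exactly one assignment.
    for i, ch in enumerate(sequence):
        others_are_a = all(c == "A" for j, c in enumerate(sequence) if j != i)
        if ch == "B" and others_are_a:
            ways += 1

    return ways


for sample in ["ABA", "AB", "BA", "A", "AAA"]:
    print(sample, count_assignments(sample))
```

Sequences containing B come back with a count of 1, while a run of n A's comes back with n + 1. Note that the rule quoted from the specification is stricter than this count suggests: the validator has to decide where the first A belongs without looking ahead to see whether a B follows.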

The only reason I can see for Microsoft to introduce this ambiguity is to make it easier for people modifying the work item type definitions. These people can place HELPTEXT anywhere as a child of the FIELD element, instead of mandating that HELPTEXT appear first. It seems a straightforward requirement to say "HELPTEXT" must appear in a specific position rather than anywhere in a big mess of PlainRules, since all other elements must be placed precisely.

In order to fix this issue, any tweaks done to the schema must not break the existing schema (such that a work item type validates against both the original and the fixed schemas). The most straightforward way I came up with is to mandate that HELPTEXT, if present, is the first child of the FIELD element. This means the language A* B? A* is rewritten as B? A*. Now the parser knows if there are any A's (PlainRules), they match the one and only A* specification. I changed the FieldDefinition type in the schema to

<xs:complexType name="FieldDefinition">
   <xs:sequence>
      <xs:element name="HELPTEXT" type="HelpTextRule" minOccurs="0"/>
      <xs:group ref="PlainRules" minOccurs="0" maxOccurs="unbounded"/>
   </xs:sequence>

For those a step ahead of me, you'll realize that if HELPTEXT appears after any PlainRules in a work item type, it no longer validates against this revised schema (while still validating against the original). Since I'm writing a tool to manipulate work item types, my fix is to execute code to rewrite the work item type, ensuring any occurrence of HELPTEXT appears as the first child of any FIELD elements. I'll be placing this code on this site shortly in case anyone wants to use it.
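Until that code is posted, here is a rough idea of what such a rewrite could look like. This is only a Python sketch using the standard library's xml.etree.ElementTree, not the tool described above. It assumes FIELD and HELPTEXT carry no namespace prefix, and the sample document (including the name, refname and type attributes and the REQUIRED rule) is invented for the illustration.

```python
import xml.etree.ElementTree as ET


def move_helptext_first(root):
    """Move each HELPTEXT child to the front of its FIELD parent, in place.

    Returns the number of FIELD elements that were changed.
    """
    moved = 0
    # Take a snapshot of the FIELD elements so the tree is not being
    # modified while it is being iterated.
    for field in list(root.iter("FIELD")):
        helptext = field.find("HELPTEXT")
        if helptext is not None and field[0] is not helptext:
            field.remove(helptext)
            field.insert(0, helptext)
            moved += 1
    return moved


# Hypothetical fragment: HELPTEXT appears after a rule element.
SAMPLE = """<FIELDS>
  <FIELD name="Title" refname="Example.Title" type="String">
    <REQUIRED />
    <HELPTEXT>Short description of the work item</HELPTEXT>
  </FIELD>
</FIELDS>"""

root = ET.fromstring(SAMPLE)
moved = move_helptext_first(root)
first_children = [field[0].tag for field in root.iter("FIELD")]
print(moved, first_children)
```

The function only reorders children, so a file that already has HELPTEXT first is left untouched. Whitespace between elements travels with the moved element, so the indentation of the output may need tidying.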

There's one more modification needed to the XML schema so it validates - the two regular expressions used at the bottom (in SizeType and PaddingType) have their commas escaped. Once the backslashes before the commas are removed, and the other changes done, the XML file validates fine.

Please note that I don't know every corner of the XML schema specification, nor have I explored the Orcas or Rosario TFS versions, so information here might be incorrect or out of date shortly. I imagine I'll update this topic after seeing how things change in Orcas/Rosario.

Sunday, August 05, 2007 11:11:00 PM (Eastern Standard Time, UTC-05:00)

There's much discussion of RIA in the blogosphere, most of which I've been reading at some Microsoft blogs and some Adobe blogs. I'm not an artist; even my stick figures reveal my lack of skill (and I don't have the patience to get good like Richard Feynman did), but clean user interfaces and strong user experiences have been a side interest of mine dating back to my first reading of Design of Everyday Things. I've been turning my attention more toward RIA and the supporting infrastructures with this next wave of technology that includes WPF and Silverlight and Adobe AIR and Flash and Flex and others, and eventually might include Seadragon and Surface, and whatever technology other companies come out with. My views align with Scott Barnes in general, but I wanted to hash out what I think in writing about RIA since it'll force me to organize my thoughts. I see RIA as more of a shift in philosophy, a fresh approach to application design, than as a specific technology, though the new technologies enable RIA development.

There are two significant perspectives in the RIA picture: that of the developer and that of the user.

(Note: I recognize I am speaking in generalities and not supporting everything I say with evidence, but bear with me. If you're reading this and disagree, chime in)

The Developer Perspective

As developers, we tend to lose sight of the bigger picture. We get mired in discussions of what technology is better, why X language sucks because it doesn't support Y feature and why Z company is superior. These are religious discussions in the technology world and I have very little patience for them. There is no one language, no one platform that will be the best answer in absolutely every case, and when we have choice, why do we waste time complaining about a certain technology? We figure out which technology best allows us to solve a particular problem and we move forward. Sometimes we deal with constraints (like working at a company that only uses Microsoft technology) but as long as the job can get done, we're fine with whatever we work with (or we quit to work for a company more in line with our personal preferences).

I also see some software developers constantly trying to shift perception of Microsoft technology by only focusing on the negatives and ignoring the positives. On one hand, this indicates people are holding Microsoft to a higher standard - they expect perfection and nail Microsoft when perfection is missed. But this view ignores the reality - Microsoft is like any other big software company. Product quality varies from one to another. Some features may make no sense outside the design meetings where an imperfect decision had to be made. Other decisions are actually the right ones from one angle but wrong from a different angle. And some decisions are made that are just wrong. I hate blanket statements I've seen online that say "everything from Microsoft sucks" because it ignores the many successes and reveals the commenter's ignorance. There are legitimate reasons to zing Microsoft (for example, no label viewer in TFS 1.0? I know they have deadlines and have to cut features, but still, that feature got cut? :) ) so let's shy away from dismissing Microsoft - or any company - out of hand.

I used to hate Microsoft over ten years ago. Even back then it was hip to hate Microsoft. But my views changed abruptly when I gave MS technology a serious chance. It might have been Internet Explorer surpassing Netscape that got me hooked, I'm not sure. I haven't let go for two main reasons: Microsoft technology, as a whole, is actually quite nice; and, I can get my job done fast. I've had to wrestle far less with Microsoft technology than other technology. I keep up with other technology for the times when MS doesn't have what I need, or when I have technology constraints I can do nothing about. I mention this background because the "let's hate Microsoft" and "Silverlight will fail" discussions are nothing new to me. People think Windows is dying or new technology from Microsoft is a failure and these people totally miss the point. Wishing Microsoft would go away didn't work 10+ years ago and it's not going to work now.

At the end of the day, software engineers are problem solvers. We implement solutions in whatever domain we live and work in. Can .NET help us do this? Yes. Can the Java platform? Yes. Can I roll out professional websites using IIS, ASP.NET, Windows Server 2003, etc.? Yes. Can I do the same with LAMP? Yes. This is why the religious discussions should stop in our industry. The people that hate Microsoft will continue to hate them, the people that don't like Java or open source will continue their avoidance, but the funny thing is, these factions will continue to solve problems and continue being productive (hopefully!)

I'm semi-ranting and meandering a bit, but what I'm getting at is Silverlight/WPF aren't going anywhere and we need a more open perspective as software engineers. There's much about Silverlight that should get recognized as "cool" and important:

  • Cross-platform CLR that does not require the .NET framework
  • DLR, the dynamic language runtime, extending the language support of .NET even further
  • Cross-platform support of a subset of XAML/WPF (which I imagine will grow closer to the full implementation over time)
  • XAML XAML XAML. I'm incredibly excited about XAML because I see it as "the new HTML." Once Silverlight has strong penetration, website designers can choose to develop sites in XAML and be confident people can view them. No more dealing with messy HTML/CSS and testing on every browser to make sure the site looks/works the same.

I also like that the technologies on the Adobe side (HTML, JS, Flash, Flex) are given a home on the desktop via AIR. This makes it easy for website designers to extend their skill set to the desktop. Adobe brought the design world to desktop applications and Microsoft brought developers to the world of rich application design, far surpassing the stodgy world of the past (MFC, WinForms, etc.). There's plenty of reason to get excited about both technologies. (For the record, yes, I'm aware of Sun's offering, but no comments at the moment.) We must be responsible software engineers moving forward as the RIA world evolves, and this means staying well informed about as much technology as we can.

Which technology will "win?" That's the wrong question to ask because there's no competition. Both will continue to exist, each caters to a different type of developer, and the people that really matter in the end are the users.

The User Perspective

I envision a spectrum of users, from those with the absolute bare minimum of knowledge required to use computers to those that are fairly sophisticated but don't do software development. The one thing that unites users is they want their software to work. This is a simple goal at its most basic for software developers, but also a tough goal because everyone uses software a little differently. Some people will love an application and others will hate it, either because certain features are hard to use or the user's sense of how a feature should work is different from how it actually works. The larger our user base is for a product, the more we have to first focus on the functionality that affects 90% of the users, and then going forward we can refine the product to work well with as many additional users as possible. This is reality again intruding on what we create - limited resources, limited time, etc. We can also never win 100% of the users - I doubt any product can. There are people that don't like iPods due to bad experiences, but it doesn't hinder the success of the iPod.

Let's start at the basic end of the spectrum. Basic users want things to "just work," whether it's their car or their TiVo or their operating system or some other software. They don't know how it works, they don't care how it works. If it breaks, they want it fixed. Take a car to a mechanic, call the Maytag repairman (okay, maybe call him to fix your cable), get the neighbor's kid to remove spyware. These are the users that don't care if software automatically updates itself - as long as the updates don't break anything. We can't disregard these users when we write software, or discount how many of them there are. These users are a significant part of the reason Microsoft stays as committed to backwards compatibility as they do - if users' existing software didn't work on a new platform, they'd refuse to upgrade, or worse, refuse to use Windows going forward. Whether a site is implemented in Adobe technology or Microsoft technology or whatever, the users don't know and don't really care. Why should they? They want sites they visit and links shared by friends to work, they want to read and respond to e-mail without a hassle.

As we move to the other end of the spectrum, we find users that have an increased knowledge of their software and how it works. They'll know where the advanced configuration dialogs are and will pretty much understand all the options. These users might turn off automatic updates in order to have more control, but the only reason they'd do this is if they've been burnt by automatic updates in the past. These users are more informed about technology and might have strong opinions. The more sophisticated a user is, the more control he wants over his world. It's probably why they are sophisticated to begin with - dissatisfaction with the default configuration, a yearning to understand all they can, or they have specific needs met only by the nether regions of a program (think about how many features Word offers that most users don't use).

The difficulty in developing software is knowing just how many options to expose and what sort of application design will appeal to the majority of users, and hopefully to all users. Most users won't explore configuration too deeply and will in fact be intimidated by too many options. Most users don't want a deep level of choice - again, they simply want software that does what they expect - though we must balance this with what the more sophisticated users want. When we design applications in the near future, we have to think deeply about users. It is a challenge, but the evolution that is occurring in the software industry can help elevate the nature of applications we design.

Conclusion

It appears I've wandered far away from RIA, but I haven't. I see Rich Interactive Application design (yes, I prefer this term, and no, not simply because I'm toeing the MS line) as a refocusing on the user via rethinking our user interfaces and application designs, whether applications are on the Internet or not. Using new technology, whether it's Flex or Silverlight or something else, opens more possibilities for us as software engineers. I think we're at the beginning of the next major wave of how people interact with computers and it's definitely an exciting time to contribute our vision and our expertise. It is important to raise our consciousness about this shift and move away from arguing about which technology is superior. We're all in this together, now let's get to the work of building awesome, useful technology using the tools given to us.

Sunday, August 05, 2007 2:51:56 AM (Eastern Standard Time, UTC-05:00)
Friday, August 03, 2007

Windows Presentation Foundation introduces a number of new concepts, such as XAML, dependency properties, data binding (in the WPF/XAML world), and type converters. Let's take a brief look at type converters and how they're used in WPF.

Since XAML is an XML dialect, parameters to objects are specified as strings. The parameters aren't usually actually strings, so we need a way for the XAML parser to convert the string to the correct object type. This is accomplished via type converters.

Using a type converter in XAML is easy, for example:

<Button Content="Accept" Background="Blue" />

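The color specified as a string is converted to a brush by the XAML parser, since the Button class's Background property (inherited from Control) is of type Brush rather than string. The type converter associated with Brush turns "Blue" into a solid-color brush.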

A type converter is obtained by passing a type of object to the GetConverter method, like this:

System.ComponentModel.TypeDescriptor.GetConverter(typeof(type));

The TypeConverter class has a number of useful methods, some of which are:

  • CanConvertFrom: reports whether the converter can convert from a given source type
  • ConvertFrom: performs the conversion from the source type
  • CanConvertTo: reports whether the converter can convert to a given destination type
  • ConvertTo: performs the conversion to the destination type
  • IsValid: checks whether a given value is valid for this type
  • GetCreateInstanceSupported: reports whether the converter can create an object from a set of property values
  • CreateInstance: re-creates an object based on an IDictionary of property values
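The pattern behind these methods is not specific to .NET. As a rough illustration only, here is a Python sketch of the same idea: a lookup keyed by target type returns a converter object that can report what it accepts and turn a markup string into a typed value. None of this is the .NET API; the Color class, ColorConverter, and get_converter are hypothetical names invented for the example, and the named colors are a made-up subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    """Hypothetical typed value that markup strings are converted into."""
    r: int
    g: int
    b: int


class ColorConverter:
    """Hypothetical converter, loosely mirroring CanConvertFrom/ConvertFrom."""

    NAMED = {
        "Red": Color(255, 0, 0),
        "Green": Color(0, 128, 0),
        "Blue": Color(0, 0, 255),
    }

    def can_convert_from(self, source_type):
        # This converter only understands strings.
        return source_type is str

    def convert_from(self, value):
        # Accept a known color name or a "#RRGGBB" hex string.
        if value in self.NAMED:
            return self.NAMED[value]
        if value.startswith("#") and len(value) == 7:
            return Color(int(value[1:3], 16), int(value[3:5], 16), int(value[5:7], 16))
        raise ValueError(f"cannot convert {value!r} to Color")


# Registry playing the role of a converter lookup keyed by target type.
_CONVERTERS = {Color: ColorConverter()}


def get_converter(target_type):
    """Return the converter registered for `target_type`."""
    return _CONVERTERS[target_type]


converter = get_converter(Color)
if converter.can_convert_from(str):
    print(converter.convert_from("Blue"))
```

A markup parser built this way only needs to know the declared type of a property; the converter registered for that type does the rest.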

One of the beautiful things about XAML and other bits introduced with WPF is that they aren't WPF specific. XAML is an application markup language that mirrors .NET classes in markup, so it can be used outside WPF. Type converters are no different - the support is built into .NET 3.0+ so you can add knowledge of these bits to your tool kit and use them where appropriate.

If you want to implement your own type converter, reference this link at MSDN: http://msdn2.microsoft.com/en-us/library/ayybcxe5....

Friday, August 03, 2007 9:27:07 PM (Eastern Standard Time, UTC-05:00)
Thursday, August 02, 2007

Scanning is the process of breaking input up into discrete tokens (such as one or more letters forming a word) and parsing is the process of applying meaning to these tokens (such as multiple words strung together to form a sentence, following specific grammatical rules). Software developers come from a variety of backgrounds, and while some may remember using these tools to construct a compiler in their Computer Science undergrad, I have a feeling many are unfamiliar with (or forgot about) these tools. I'd wager that the most well known scanner and parser generators, in the general pool of software developers, are lex and yacc. The "lex" name is short for lexical analyzer, a tool that recognizes a syntactic unit of information, such as a word in an English sentence, and feeds these units to another program one at a time. The "lex" tool is a scanner generator since it creates code that scans input. The "yacc" name is short for "yet another compiler compiler," which fits since a parser is most commonly used as part of a compiler, so a tool that generates parsers helps build compilers. In general, though, "yacc" is a parser generator.

Remember that a compiler is nothing more than a translator. When you write code in any .NET language, a compiler translates the high level code (such as C#) to a lower level code (Intermediate Language, or IL) which is what the Common Language Runtime (CLR) understands and executes. In a way, a decompiler is actually a compiler, translating the low level code, such as IL, to a higher level language, such as C#. We use the terms "compiler" and "decompiler" because they communicate the direction of the translation. Compilers must scan input and then parse it. They must know that "public" is a token and "class" is a token and what an identifier looks like, but it's the parser that understands what a method is and what a while loop looks like and how they translate to IL.

A scanner and parser can be used separately. I won't go into details of constructing a parser in this post, but I will discuss developing a simple scanner using a scanner configuration that is translated into C# code. Before delving into an example, why do we care about a tool that can create a scanner for us? What are the benefits? Much processing of input we do in the business world (at least in my experience) is easily constructed by hand. We don't usually need to construct a compiler, so isn't using a scanner overkill? Like any problem we're called to solve, it's important to know about as many tools as possible, so when we encounter a case where a certain tool would save us significant time, we can use it since we know it exists. There are also business problems, such as sophisticated data translation, where a scanner/parser might be the perfect set of tools. Here are a few benefits to using an auto-generated scanner:

  • A scanner generator allows us to focus on the syntactic elements and not worry about any other details
  • If syntactic elements change, it's easy to update the code file used as input to the scanner generator 
  • A scanner can feed anything - a parser can accept the tokens, your custom code can, etc., so the syntactic analysis of input is separated from applying meaning to the input
  • Development of a scanner is much faster than rolling one by hand, unless the input is rather simple, and relying on a scanner generator reduces the chances of introducing errors into the scanner
  • A generated scanner is typically faster than one you roll on your own

A scanner generator for C# that I've used is available at the C#Lex site.

The syntax of the input files to C#Lex follows the syntax detailed at the JLex page, except where called out at the C# Lexer site.

Let's look at an example: analyzing simple English. Words in English can take on multiple forms:

  • Words with an initial capital, such as at the beginning of a sentence or proper names
  • Contractions - words with an apostrophe (e.g., can't, don't, won't)
  • Abbreviations - words that are all capitals and optionally have a period after each letter (e.g., IL, CLR, e.g.)
  • Quoted words - single or double quotes on both sides (e.g., "compiler")

We'll stop there in the interest of keeping this simple.

An input file to the C#Lex program is formatted in three sections, each section separated with a double percent (%%) on a line of its own:

  1. User code. This section is copied directly to the output file without modification, so you can place implementation and 'using' statements here.
  2. Directives to control C#Lex. This is where you can control C#Lex, such as specifying scanning states, and also specify regular expressions to match input.
  3. Scanner rules. This is where you specify what to recognize, what to do with it, state transitions, etc. When a rule matches here, data can be returned to the code running the scanning loop via a special class called Yytoken (that you define).

The scanning states allow recognition of different syntactic units at different times, so if certain syntactic units can only follow other syntactic units (think about visibility keywords such as 'public' and 'private' that can't appear outside a namespace declaration) you can control this in the scanner generator code.

The program we'll write is dead simple for purposes of illustration: it'll accept tokens from the scanner and output each token, one per line.

I won't go into details of the regular expression language, instead keeping it simple and showing the regular expressions we need without much explanation.

A word is simple: [A-Z]?[a-z']+

This gets us an optional initial capital and a sequence of one or more lower case letters and apostrophes. This doesn't limit the number of apostrophes, so let's revise it.

[A-Z]?[a-z]+'?[a-z]*

We can continue refining to ensure the end of the contraction is one of just a few options (such as "t" or "s") but let's keep it simple.

A word can also have all capitals: [A-Z]+

optionally separated by periods: ([A-Z]\.?)+

but periods can also separate lowercased words: ([a-z]\.?)+

Combining these gives us: [A-Z]?[a-z]+'?[a-z]* | ([A-Z]\.?)+ | ([a-z]\.?)+

We continue like this until we end up with a set of regular expressions that fully describe the input. Since the dot matches any character except newlines, we'll add a rule to pass this input back as a catch all rule. You probably don't want this in a real application, but it illustrates how to match any input not matched by other rules. Any whitespace is skipped over, along with punctuation we're not interested in (exclamation point, question mark, commas, periods). These regular expressions are far from perfect or comprehensive but they illustrate the process of analyzing the nature of the input and constructing the required expressions to scan the input.
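To make the matching behavior concrete, here is a rough Python sketch of a scanning loop built from the rules just described. It is not C#Lex output and not the code from this post; the RULES table and the scan function are names made up for the illustration. One difference to be aware of: a generated scanner picks the longest match, whereas Python's re module tries the alternatives inside a single pattern from left to right, so abbreviations with periods may be split differently here than by a generated scanner.

```python
import re

# The word pattern from the post, written as a Python regular expression.
WORD = r"(?:[A-Z]?[a-z]+'?[a-z]*|(?:[A-Z]\.?)+|(?:[a-z]\.?)+)"

# (pattern, emit) pairs in priority order. emit=False means the matched
# text is skipped instead of being returned as a token.
RULES = [
    (re.compile(WORD), True),              # plain word
    (re.compile('"' + WORD + '"'), True),  # word in double quotes
    (re.compile(r"\s+"), False),           # whitespace is skipped
    (re.compile(r"[.,?!]"), False),        # punctuation we ignore
    (re.compile(r"."), True),              # catch-all for anything else
]


def scan(text):
    """Return the tokens in `text`, choosing the longest-matching rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_len, best_emit = 0, False
        for pattern, emit in RULES:
            match = pattern.match(text, pos)
            # Strict comparison: on a tie the earlier rule wins.
            if match and len(match.group()) > best_len:
                best_len, best_emit = len(match.group()), emit
        if best_len == 0:
            raise ValueError(f"no rule matches at position {pos}")
        if best_emit:
            tokens.append(text[pos:pos + best_len])
        pos += best_len
    return tokens


for token in scan('I can\'t parse "this", can you?'):
    print(token)
```

Keeping the rules in a table separate from the loop is the same separation the generator gives us: changing what counts as a token means editing a pattern, not the scanning code.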

I'm including the final file at the end of this post.

A Yytoken class must be defined. This is the communication mechanism between the scanner and the code you write and can hold any information you want (such as where in the input the scanner is, state information, etc). An instance of this class is what is returned by yylex() in the main scanning loop located in our code:

      Yytoken t;
      while ((t = yy.yylex()) != null)
      {
         System.Console.WriteLine(t.m_text);
      }

Now that we have an input file to the scanner generator, we first generate the C# code by executing C#Lex.exe on this file, then compile the generated C# file by invoking csc.

C#Lex english.lex

csc english.lex.cs

The input file (test.txt, the file name opened in Main) has this line: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut "labore" et dolore magna aliqua.

Running english.lex.exe gives us the following output:

Lorem
ipsum
dolor
sit
amet
consectetur
adipisicing
elit
sed
do
eiusmod
tempor
incididunt
ut
"labore"
et
dolore
magna
aliqua

Our final file looks like this:

using System;
using System.IO;

class WordExample {
   public static void Main(string[] argv) {
      Yylex yy = new Yylex(new StreamReader(new FileStream("test.txt", FileMode.Open)));

      Yytoken t;
      while ((t = yy.yylex()) != null)
      {
         Console.WriteLine(t.m_text);
      }
      Console.WriteLine();
   }
}

class Yytoken {
   public Yytoken(string token)
   {
      m_text = token;
   }
   public string m_text;
}

%%

ALPHA=[A-Za-z]
WORD=[A-Z]?[a-z]+'?[a-z]* | ([A-Z]\.?)+ | ([a-z]\.?)+
WHITE_SPACE_CHAR=[\n\ \t\b\012\r]

%%

<YYINITIAL> {WORD} { return(new Yytoken(yytext())); }

<YYINITIAL> \"{WORD}\" { return(new Yytoken(yytext())); }

<YYINITIAL> {WHITE_SPACE_CHAR}+ { }

<YYINITIAL> [\.,\?!] { }

<YYINITIAL> . { return(new Yytoken(yytext())); }

Thursday, August 02, 2007 11:05:06 PM (Eastern Standard Time, UTC-05:00)
Wednesday, August 01, 2007

I’ve had my new Vista machine for a while and haven't gotten around to writing down my impressions until now. Initially I really liked Vista. Some of the detail work (which I’ll describe later) was well thought out and did save me time in standard operation. UAC was hardly an issue after I got everything configured, and now it appears only at expected times, such as installing new programs. But as I continued to use my machine, a frustrating problem started - sometimes my machine would freeze. I repeatedly searched for solutions online, but none of the described scenarios were exactly like mine. The freezes started when I created a new directory through a file browse dialog. It wouldn’t happen every time but eventually the freezes increased in frequency and had no obvious trigger. At its peak, my computer would barely run for ten minutes before freezing on its own. During each freeze the mouse cursor was fine, which tells me Windows was still there, but the task bar froze and all active windows would slowly change to non-responding windows (I could do some limited alt-tabbing around for a short while). I tried a variety of solutions from updating drivers to various tweaks but nothing helped. Eventually the freezes mysteriously disappeared. I’m really not sure why – my best guess is Windows Update pulled down a fix since problems don't just disappear magically. I was eagerly hoping this would happen, and if this is indeed the case, I’m pleased. I bought this machine to keep my skills up to date with new Microsoft technology (from Vista to the new Office to all manner of exciting stuff happening in the .NET world) and now that it's purring, I'm quite happy.

Some of the detail work I love:

  • Visual appearance overall is pleasant. I like Aero Glass and the general look/feel of Vista.
  • I love that user directories are now under C:\Users (so mine is C:\Users\Jeff). I always dodged the "My Documents" folder on Windows XP (as a user) but I find myself organizing much more under my user directory on Vista than I ever did on XP.
  • The file extension isn’t highlighted when renaming a file in Explorer. How many times did we have to de-select the extension in previous Windows versions?
  • The central window that appears when you hold down Alt while Alt-Tabbing lets you use the mouse to select a specific window. This is such a time saver when you want to go to one of many windows, and the way I work, I have many, many windows open.
  • Flip 3D is really nice and I love that the windows in the stack continue their updating, as opposed to a stack of screenshots. I don’t know how behavior changes on different systems, but this behavior on my computer is awesome.
  • Files/programs are indexed, so it's easy to use the search box in the Start menu to jump straight to files/programs. I'm always going here to start Process Explorer.
  • I see potential for the sidebar but I haven't made much use of it. I figure loving this feature is a matter of installing the right gadgets, which I haven't searched for yet (I'm too busy watching videos of software I wish I was a part of developing, like SeaDragon and Photosynth).
  • One of the non-important drivers sometimes fails and Vista stops it without any ill effects on the system. I assume this driver is running in user mode which ensures it doesn't romp around memory and makes it easy to kill. I suppose I should disable the faulty driver.
  • I've been running Vista constantly for about three months and the only system crash I had was World of Warcraft overheating the CPU. Vista booted up fine after the abrupt shut off (and time to cool the computer) and I was back to grinding in the Ghostlands in no time.
  • It hooks up nicely to my Xbox 360, and watching video from my laptop on my television is a snap.

These comments are more from a user perspective than a Vista developer perspective. I haven't dug into using any of the Vista-specific programming bits yet, but I am looking forward to exploring. I'm most interested in experimenting with the security improvements exposed to application developers. I'm spending a lot of my time these days learning WPF/Silverlight and playing around with various other technologies at work and at home, so getting into the guts of Vista hasn't happened yet and probably won't for a while.

As an aside to the above comments, I wanted to briefly post the differences between versions of Vista.

  • Vista Home Basic: Stripped-down Vista, no Aero Desktop, no media center, limited backup, no collaboration/media creation tools.
  • Vista Home Premium: Suitable for most regular users. Includes Aero, collaboration, media center, media creation (Windows DVD Maker, Windows Movie Maker).
  • Vista Business: Good version for businesses - doesn't include media center / media creation software, but does include the full PC Backup/Restore tool and Fax/Scan tools.
  • Vista Ultimate: This has everything.

Consult http://www.microsoft.com/windows/products/windowsv... for detailed information.

Wednesday, August 01, 2007 10:08:04 PM (Eastern Standard Time, UTC-05:00)
Wednesday, March 21, 2007

I've been looking to buy a laptop for a while and decided it was worth the wait until Vista made its full debut. I tried some of the Vista betas but had difficulty with drivers for my custom-built desktop machine. Nonetheless, I'm still quite excited about Vista.

I've heard and read many things about Vista, both positive and negative. The most significant negative I've come across is the sheer number of UAC prompts. No security mechanism is perfect, and we do trade security for convenience, but the purpose of UAC, as I understand it, is to prevent the granting of elevated privileges to rogue programs. I've heard someone say Microsoft went too far with UAC in building in security, but is there a way to reduce the number of UAC prompts and maintain the level of security it provides? (And for any nitpickers, I'm well aware UAC is by far not the only element of security added to Vista, but it does seem to be the most visible to end users)

I've heard people complain that Administrator is not the "true" administrator on Vista, that you still get plenty of UAC prompts. Again, looking closely at where we've come from, users are used to running as Administrator on XP. Like it or not, the Administrator account must also be locked down to some degree. UAC isn't a magic bullet, and it seems imperfect (but is any single security mechanism sufficient?) but it is also there to avoid granting privileges to programs that shouldn't have them.

On the positive side, I've read that once you get through the initial configuration of your machine, after running the various programs that need the elevated privileges to install, UAC quiets down. I'll soon see what my experiences with UAC are.

I just came across a post I really liked at Ryan Bemrose's blog (The Audio Fool). He discusses various classes of legacy applications that we will be running on Vista and how these affect UAC. His post is titled "Categories of Legacy Applications."

I don't have my Vista laptop yet, but once I do I plan on documenting my early experiences with it. I realize that I'm an experienced software engineer, significantly more technical than the average user, so I'm actually expecting far more UAC prompts than average. I'll document which prompts I believe end users will deal with and which ones I, installing more technical pieces of software, will encounter. My challenge to myself is to avoid turning UAC off, though I might consider using the security policy edit I've come across to grant the Administrator account full administrator privileges in order to avoid UAC prompts.

At this point in time I'm highly optimistic about Vista, and while acknowledging my pro-Microsoft bias, I'll try to report as honestly as I can :) Sometimes Microsoft technology does frustrate me (I have stories for another time).

Wednesday, March 21, 2007 9:51:50 PM (Eastern Standard Time, UTC-05:00)
Sunday, March 18, 2007

At work we currently use Serena Version Manager and Tracker products (PVCS) for our source control and issue tracking. As previously (albeit briefly) mentioned, we are in the process of converting to Team System. Ultimately we decided to start the source control repository clean instead of importing our source trees from Version Manager. I did some exploratory work on figuring out how to do an import, and although it's not fully fleshed out, I'd like to throw my findings out there in the hopes this will help someone.

Okay, now to my (rough) approach. The first goal was to retrieve all version history from VM. After many fruitless online searches I eventually found what I needed, a command line utility which provides an interface to VM. This utility is pcli.exe, which I found in C:\Program Files\PVCS\win32\bin, though your installation path may differ. I had trouble finding documentation online for this program; however, if you type "pcli -h" you will see a list of commands. You can type "pcli command -h" to get assistance on options for a particular command.

I used this utility to retrieve a list of all files within a particular folder (and all its subfolders) including all revision information for each file. I ran the following command to retrieve this list:

pcli.exe vlog -z -idUSERNAME:PASSWORD -fOUTPUT.TXT -pr"PROJECT NAME" PATH_IN_VM

The "vlog" command reports archive/revision information. If you type "pcli vlog -h" you can see a full list of options. There are several things to note about the command line I used. The "-z" argument recursively processes the path. Username and password are separated by a single colon. The project name should be in the form \\NAME with no trailing backslash. The PATH_IN_VM, however, must use forward slashes, with a forward slash starting the path and no trailing slash. I believe I tried different combinations and settled on these. Other combinations might work, but I know for a fact that these will.
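The formatting rules above are easy to get wrong by hand, so here is a minimal Python sketch that assembles the same argument list and checks those rules first. The function name and its parameters are made up for illustration, and only the flags shown in the command above are used. Allowing a bare "/" as the path is my assumption, since the rules above don't cover it. I have not verified how PCLI reacts to the quoting a process launcher adds around arguments containing spaces.

```python
def build_vlog_command(username, password, output_file, project, vm_path):
    """Return the argument list for the recursive vlog call described above.

    Hypothetical helper: it only mirrors the flags shown in the post.
    """
    # Project must look like \\NAME with no trailing backslash.
    if not project.startswith("\\\\") or project.endswith("\\"):
        raise ValueError("project must look like \\\\NAME, no trailing backslash")
    # Path uses forward slashes, starts with one, and has no trailing slash.
    # Assumption: a bare "/" is accepted as the root path.
    if "\\" in vm_path or not vm_path.startswith("/"):
        raise ValueError("path must use forward slashes and start with /")
    if len(vm_path) > 1 and vm_path.endswith("/"):
        raise ValueError("path must not end with a slash")
    return [
        "pcli.exe", "vlog",
        "-z",                          # recurse into subfolders
        f"-id{username}:{password}",   # credentials, single colon between them
        f"-f{output_file}",            # file that receives the report
        f"-pr{project}",               # project, e.g. \\SampleProject
        vm_path,
    ]
```

The resulting list could be handed to something like subprocess.run(cmd, check=True) on a machine where pcli.exe is on the PATH.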

If you execute this on a VM project you should have an OUTPUT.TXT that contains information such as what follows next. I'll first show example contents and then discuss how to parse it and use the information. Before discussing this file, I should again caution that I barely scratched the surface of parsing this file. My focus was narrow: I needed only each revision number of a particular file and some extra information such as author, check-in comments, etc. Okay, here's a sample OUTPUT.TXT:

Archive:          \\SampleProject\archives\Dev\Web\PrototypeSite\Default.aspx-arc
Workfile:         Default.aspx
Archive created:  Oct 10 2006 13:37:44
Split mode:       Split
\\SampleProject\archives\Dev\Web\PrototypeSite\Default.aspx-arc
Owner:            jeff
Last trunk rev:   1.2
Locks:           
Groups:          
Rev count:        3
Attributes:
   WRITEPROTECT
   CHECKLOCK
   NOEXCLUSIVELOCK
   EXPANDKEYWORDS
   NOTRANSLATE
   NOCOMPRESSDELTA
   NOCOMPRESSWORKIMAGE
   GENERATEDELTA
   COMMENTPREFIX = ""
   NEWLINE = "\r\n"
Version labels:
   "Baseline 1.0" = 1.2
   "SCR-1" = 1.2
Description:
Baseline 1.0

-----------------------------------
Rev 1.2
Checked in:     Oct 11 2006 11:53:04
Last modified:  Oct 11 2006 11:52:14
Author id: jeff     lines deleted/added/moved: 6/6/0
added custom httpmodule
Resolution for 1: baseline
-----------------------------------
Rev 1.1
Checked in:     Oct 10 2006 17:43:24
Last modified:  Oct 10 2006 17:40:46
Author id: jeff     lines deleted/added/moved: 0/7/0
added login screen
Resolution for 1: baseline
-----------------------------------
Rev 1.0
Checked in:     Oct 10 2006 13:37:44
Last modified:  Sep 27 2006 16:48:04
Author id: jeff     lines deleted/added/moved: 0/0/0
creation of web project
===================================

The first line of the file (and the first line of text after any line containing only equal signs, ====) provides the full path to the file; however, you need to strip the -arc at the end and also take out the archives\ directory. Since we want revision information, the next step is to skip ahead to the first line of dashes. After each dashed line we can read the revision number that appears after "Rev". In my parser I also parsed out check-in date, author ID, etc., but we only need revision numbers to retrieve a particular revision from VM. Also note that revisions are listed in reverse check-in order, so the latest revision comes first. When reading the revision information, either push each revision onto a Stack (which you can then convert to an array via ToArray(), which will then be in the order you want) or process a List in reverse order. Again, each file is separated by a line of equal signs, so when you hit this line you know to stop processing the current file.
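The parsing steps above can be sketched in a few lines of Python. This is not the parser I wrote, just a minimal illustration that keeps only the path and the revision numbers. The names ArchiveLog, archive_to_path and parse_vlog are made up, and the sketch assumes the report always looks like the sample above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ArchiveLog:
    path: str                                       # path with -arc and archives\ removed
    revisions: list = field(default_factory=list)   # oldest revision first


def archive_to_path(archive):
    """Strip the trailing -arc and take out the archives\\ directory."""
    if archive.endswith("-arc"):
        archive = archive[:-len("-arc")]
    return archive.replace("\\archives\\", "\\", 1)


def parse_vlog(text):
    logs = []
    current = None        # record being read, None between records
    expect_rev = False    # True right after a line of dashes
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # First line of a record: "Archive:  <full archive path>"
            if line.startswith("Archive:"):
                current = ArchiveLog(archive_to_path(line.partition(":")[2].strip()))
                logs.append(current)
        elif line and set(line) == {"="}:
            current, expect_rev = None, False       # end of this file's record
        elif line and set(line) == {"-"}:
            expect_rev = True                       # a "Rev x.y" line comes next
        elif expect_rev:
            match = re.match(r"Rev\s+(\S+)", line)
            if match:
                # The report lists newest first, so insert at the front
                # to end up with oldest first.
                current.revisions.insert(0, match.group(1))
            expect_rev = False
    return logs
```

Inserting at the front of the list plays the role of the Stack mentioned above: the revisions come out oldest first without a separate reversal step.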

Now that we have all files in our repository and all revision information for each file, we have to construct a call to PCLI to retrieve a particular revision. This is the part that is ridiculously slow and sets off my internal alarms that scream THERE IS A BETTER WAY TO DO THIS! I just didn't dig in deep enough to find out the better way and I couldn't be bothered to do so now.

The command to retrieve a particular revision of a file is:

pcli.exe run -y -ns get -id USERNAME:PASSWORD -o -a"c:\local\path\here" -wREV# -pr"PROJECTNAME" PATHNAME

"run" is the command, which we use to execute a specific PCLI command which, in this case, is "get".

The -y and -ns options stand alone: -y specifies that all prompts should be answered "yes" (I ran this from a C# program and did not want any user interaction), and -ns prevents extra stripping of quotes from the command line.

The -id option is used again to specify credentials.

The -o option is used to override work file location so we can pull the file down to a directory of our choosing without needing to deal with work file locations.

The -a option is used to specify where the retrieved file should go.

The -w option specifies which revision number we want, such as -w1.0

The -pr option is used to specify the project and should be in the same format as our previous invocation of PCLI, for example, \\SampleProject
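Putting the options above together, a small Python helper (hypothetical name and parameters) can produce one "get" invocation per revision. It copies the flag layout of the command exactly as shown, including the space after -id. Whether PCLI treats the quoting a launcher adds around paths with spaces the same way as the hand-typed command is something I have not checked.

```python
def build_get_command(username, password, dest_dir, revision, project, vm_path):
    """Return the argument list that fetches one revision of one file.

    Hypothetical helper: it only mirrors the flags shown in the post.
    """
    return [
        "pcli.exe", "run",
        "-y",                            # answer "yes" to every prompt
        "-ns",                           # no extra stripping of quotes
        "get",
        "-id", f"{username}:{password}", # credentials
        "-o",                            # override the work file location
        f"-a{dest_dir}",                 # where the retrieved file should go
        f"-w{revision}",                 # revision to fetch, e.g. 1.0
        f"-pr{project}",                 # project, e.g. \\SampleProject
        vm_path,
    ]


# One command per revision, oldest first.
commands = [
    build_get_command("USERNAME", "PASSWORD", r"c:\local\path\here", rev,
                      "\\\\SampleProject", "/Dev/Web/PrototypeSite/Default.aspx")
    for rev in ["1.0", "1.1", "1.2"]
]
```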

Now we have all the pieces to get a full list of files/revision history from VM and automatically retrieve a particular revision of a file. You can write a program to automatically invoke PCLI to create the revision history file, parse that file and pull down revisions one by one. I would suggest first processing all 1.0 revisions of all files within a particular root folder and then using Team Foundation Server's source control API to add these files. Then process all 1.1 revisions, checking these into Team Foundation Server. And so on... Along with your check-ins to TFS you can include original information such as author ID, check-in comments, check-in date, etc., so you can maintain a reasonable degree of information, hopefully alleviating the need to keep VM around for historical purposes. I'm leaving out details of the TFS source control API since it is documented well in many other places, but if you would like some suggestions or help, please leave a comment.
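The "all 1.0 revisions, then all 1.1 revisions" ordering can be sketched as a grouping step. This is only an illustration in Python with made-up names. It assumes the revision history is already in memory as a mapping from file path to revision strings, and it compares revisions numerically so that 1.10 sorts after 1.2.

```python
def revision_key(revision):
    """Turn "1.10" into (1, 10) so revisions compare numerically."""
    return tuple(int(part) for part in revision.split("."))


def plan_rounds(history):
    """Group (path, revision) pairs into rounds, lowest revision first.

    history maps a file path to its list of revision strings.
    """
    rounds = {}
    for path, revisions in history.items():
        for revision in revisions:
            rounds.setdefault(revision_key(revision), []).append((path, revision))
    return [rounds[key] for key in sorted(rounds)]


history = {
    "/Dev/Web/Default.aspx": ["1.0", "1.1", "1.10"],
    "/Dev/Web/Web.config": ["1.0", "1.2"],
}
for batch in plan_rounds(history):
    # Each batch would be fetched with PCLI and then checked in as one round.
    print(batch)
```

Each round would then become one pass of PCLI "get" calls followed by a check-in to TFS. Files that lack a given revision simply sit out that round.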

I'm almost ashamed to document such a clunky method of export/import, but without devoting significantly more time to exploring PCLI and various VM interfaces, this method was the best I found that I knew would work. There are (at least) two ways to improve on what I discussed, but I did not figure out how to do either. First, I believe there is a way to do batch operations with PCLI so you don't need to pull down a single revision at a time. Second, I came across a client API, but did not readily see how to do authentication so I was not sure how to properly connect to a repository and perform the calls needed.

I really hope this helps someone out there :)

Sunday, March 18, 2007 5:46:23 PM (Eastern Standard Time, UTC-05:00)
Saturday, March 17, 2007

I'm exploring some new technology by revisiting some old technology. I make no promises about the frequency of posts, but before delving into the specifics of what I'm attempting, some background information is necessary.

The old technology I'm looking at is something that was developed on DOS in the early 1990s. It used the standard 80x25 text mode in order to show various colored characters. The characters came from the 8-bit extended ASCII table. If my memory doesn't fail me, the standard ASCII table is 7 bits, which means it stores characters from index 0 to 127 (2^7 - 1) for a total of 128 characters. The 8-bit extended ASCII table provides for 256 characters, from index 0 to 255 (2^8 - 1). However, the first 32 characters are known as "non-printable." Although most have some graphical representation behind them, they are control characters meant to signal the printer or DOS to do something such as beep or insert a tab or go to the next line. These first 32 characters are listed below.

Character Meaning
0 (null)
1 Start of Heading
2 Start of Text
3 End of Text
4 End of Transmission
5 Enquiry
6 Acknowledge
7 Bell
8 Backspace
9 Horizontal Tab
10 Linefeed (new line)
11 Vertical Tab
12 Form feed (new page)
13 Carriage Return
14 Shift Out
15 Shift In
16 Data Link Escape
17 Device Control 1
18 Device Control 2
19 Device Control 3
20 Device Control 4
21 Negative Acknowledge
22 Synchronous Idle
23 End of Transmission Block
24 Cancel
25 End of Medium
26 Substitute
27 Escape
28 File Separator
29 Group Separator
30 Record Separator
31 Unit Separator

I'm listing these mostly for curiosity's sake (these are cribbed from http://www.asciitable.com). Characters 1 through 26 also correspond to Ctrl-A through Ctrl-Z (sometimes seen as ^A or ^G, the caret representing the Ctrl key). Even in modern programming we still encounter some of these characters, most significantly tab, carriage return and newline, which we know as \t, \r and \n in C# and other high level languages. The carriage return was used to make the cursor (and I'm fairly sure some printer heads, dating back to typewriters) go to the beginning of the current line, and then newline was used to advance the cursor to the next line. This is why seeing "\r\n" is common on Windows. The next character, 32, is the space, and this is where the printable area of the ASCII table starts. Interestingly, the control characters all have a symbol behind them, except for 0, which is empty but is treated as distinct from the space, character 32 (the index into the ASCII table is what matters, not how it looks). Character 1, for example, is a hollow happy face, and character 2 is a filled in happy face. Each character in the ASCII table is made up of 8x16 pixels. Since the DOS text mode I'm looking at here is 80 characters across and 25 down, this gives us 80*8 x 25*16 pixels, or 640x400 for the video mode resolution. This character set, even the non-printable range, can be drawn to the screen on DOS. Each character can also have a foreground and background color.
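Two bits of arithmetic from the paragraph above, written out as a small Python sketch with made-up function names: mapping control codes 1 through 26 to their caret form, and working out the pixel resolution from the character grid and cell size.

```python
def caret_notation(code):
    """Map control codes 1..26 to ^A..^Z."""
    if not 1 <= code <= 26:
        raise ValueError("only codes 1 through 26 map to Ctrl-A through Ctrl-Z")
    return "^" + chr(ord("A") + code - 1)


def text_mode_resolution(columns=80, rows=25, cell_width=8, cell_height=16):
    """Pixel size of a text screen built from fixed-size character cells."""
    return columns * cell_width, rows * cell_height


print(caret_notation(7))        # Bell
print(caret_notation(9))        # Horizontal Tab
print(text_mode_resolution())   # 80x25 grid of 8x16 cells
```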

DOS supports coloring of characters based on two hex values combined to form foreground and background. An alternate way to color on DOS is using ANSI escape sequences, but we aren't concerned with ANSI escape sequences here. If, right now, you open a command window (Win-R, type "cmd" and enter) and then type "color /?" at the prompt, you'll get a list of colors that correspond to each hex value. The colors Aqua/Cyan are the same, and Purple/Magenta are the same, in case you encounter these variant names somewhere. The "color" command uses "Aqua" and "Purple" but for what I'm working on, we will use its terminology, "Cyan" and "Purple." Also, different from the color command, there is no dark yellow (it's brown) or bright yellow (it's simply yellow). The full list of colors follows.

Value Color
0 Black
1 Blue
2 Green
3 Cyan
4 Red
5 Purple
6 Brown
7 White
8 Grey
9 Bright Blue
A Bright Green
B Bright Cyan
C Bright Red
D Bright Purple
E Yellow
F Bright White

It isn't essential to know how these colors are used when printing a character, other than the fact that the color information can be stored in a single byte, where the upper nibble is the foreground color and the lower nibble is the background color. Thus E9 is yellow text on a bright blue background. The color command places the background color first, so if you want to try to change your command window colors, type "color 9E" for yellow on bright blue.
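Splitting a color byte into its two hex digits comes down to a shift and a mask. Here is a minimal Python sketch using the color table from this post. The function names are made up, and the code only separates and recombines the two nibbles, leaving it to the caller to treat them as foreground and background in the way described above.

```python
COLOR_NAMES = [
    "Black", "Blue", "Green", "Cyan", "Red", "Purple", "Brown", "White",
    "Grey", "Bright Blue", "Bright Green", "Bright Cyan", "Bright Red",
    "Bright Purple", "Yellow", "Bright White",
]


def split_color_byte(value):
    """Return (upper nibble color name, lower nibble color name)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a color byte must fit in 8 bits")
    upper = (value >> 4) & 0x0F   # first hex digit
    lower = value & 0x0F          # second hex digit
    return COLOR_NAMES[upper], COLOR_NAMES[lower]


def join_color_nibbles(upper, lower):
    """Combine two 4-bit color values back into one byte."""
    if not (0 <= upper <= 0x0F and 0 <= lower <= 0x0F):
        raise ValueError("each color value must fit in 4 bits")
    return (upper << 4) | lower


print(split_color_byte(0xE9))
print(hex(join_color_nibbles(0xE, 0x9)))
```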

The next goal is to ensure we have the full set of ASCII characters (all 256) and the proper colors (we can use an eye dropper from a Paint tool to get exact RGB from a command window), all formatted for the presentation framework I intend to use. This is what will come next, and shortly after, an introduction of exactly what I'm attempting to create.

Saturday, March 17, 2007 10:27:26 AM (Eastern Standard Time, UTC-05:00)