Technology Answer: April 2011

Friday, April 29, 2011

LDAP Java library

We are using J2EE to develop a security product that relies on LDAP for authentication and role-based user management.

The team has implemented this using the JNDI but we have run into various pitfalls. I am looking for an LDAP API that handles all the low-level details and satisfies the following requirements:

LDAP User authentication and Authorization
Good performance (even with large and slow LDAP servers)
Support the main LDAP flavors (AD, Novell eDirectory etc.)

Can anyone recommend an open source or commercial package?

From stackoverflow

http://www.openldap.org/jldap/

I've used Spring's LDAP modules. I think they make programming with LDAP as easy as using JDBC. If you're using Spring, I recommend them highly. If you're not, there's value in learning it.

LDAP itself already provides a fairly high-level abstraction for directory servers, and I haven't seen many libraries that add a further abstraction on top of it. I have written my own small library to let my application talk to LDAP servers (in my case, including an Active Directory server).

The javax.naming.directory package is where the interesting stuff is. Connecting to an LDAP server is really not too hard...

// set properties for our connection and provider
Properties properties = new Properties();
properties.put( Context.INITIAL_CONTEXT_FACTORY, 
  "com.sun.jndi.ldap.LdapCtxFactory" );
properties.put( Context.PROVIDER_URL, "ldap://myserver.somewhere.com:389" );
properties.put( Context.REFERRAL, "ignore" );

// set properties for authentication
properties.put( Context.SECURITY_PRINCIPAL, "User Name" );
properties.put( Context.SECURITY_CREDENTIALS, "password" );

InitialDirContext context = new InitialDirContext( properties );

Running searches against the directory isn't that much more difficult.

// Create the search controls
SearchControls searchCtls = new SearchControls();

// Specify the search scope
searchCtls.setSearchScope(SearchControls.SUBTREE_SCOPE);

// specify the LDAP search filter, just users
String searchFilter = "(&(objectClass=user)(cn=Joe Someone))";

// Specify the attributes to return
String returnedAtts[]={"memberOf"};
searchCtls.setReturningAttributes(returnedAtts);

NamingEnumeration<SearchResult> answer = context.search( "dc=somewhere,dc=com", searchFilter, 
  searchCtls );

From there, authentication is very easy: the last line above will throw a NamingException if the username and password are not valid credentials.

I have used the Acegi Security library to good effect in a couple of applications. Getting Acegi to work with an LDAP backend is pretty straightforward, so this may be the higher-level solution you are looking for.

boutta : I'd accept your answer, since your code worked out of the box for me.

I have used Netscape LDAP SDK for Java in place of JNDI on a couple of occasions (e.g. LDAP Maven Plugin). But that was because I needed to import/export records using LDIF and DSML.

For web applications that need to manage entries in the directory I have used Spring LDAP. This is a layer on top of JNDI. One of my colleagues implemented caching using Spring AOP to improve the performance.

However, I believe the problem is most likely to be in the design of your directory and in the starting point that you use for your directory searches and lookups. Also make sure you use filters and avoid returning all the attributes for an object. You are probably only interested in a small subset of what may be available.

You might also want to consider using a higher level framework for your security requirements. I generally use Spring Security for dealing with authorization and access control when developing web applications. However, whenever possible I prefer to do the authentication on the web server. If you are using Apache HTTPD then you can use the mod_ldap module.

Consider NOT using the Java APIs and instead checking out ArisID and its abstraction layer for identity...

Netscape's LDAP API is nice and easy to use, but JNDI should replace it.

What pitfalls did you have with JNDI? You may want to stay with JNDI and just get help on the pitfalls.

Once you have JNDI set up it isn't too hard to use, but the initial setup is a bit of a pain. :)

JNDI is more flexible than the Netscape API. I haven't tried Spring's implementation, but if you are going to use Spring for other parts of an application it would be a good choice to use it.

The LDAP server is where you will find the slowdown, not in the API. OpenLDAP was too slow for me when updating (I had to put over 80k users into it), whereas Sun's iPlanet worked well. The various APIs won't show a slowdown, as they are much faster than updating the database.

Check out http://code.google.com/p/object-ldap-mapping/ It is based on Spring LDAP but provides an API similar to JPA.

You can also try jLDAPBeans (http://jldapbeans.sf.net). It follows the same approach as JPA, building an abstraction layer over JNDI. It doesn't depend on Spring, and objects are defined as interfaces: an LDAP object loaded with this library implements as many "entity interfaces" as there are objectClasses defined for it in the directory.

It has an old experimental version (0.1) and a newer one (0.5) that is being implemented with a lot of refactoring in order to offer the same things as a full JPA implementation (caching, transactions, etc.). The problem is that the latest version is not stable, so you need to work with the experimental version.

Actually, today there are only four relevant Java LDAP SDKs. The others are no longer supported and/or have not been updated for Java 5 and higher:
- JNDI LDAP is still the "standard" choice, but you should use it if and only if you are FORCED to (for historical reasons, for example). JNDI LDAP is just a pain to use: almost anything you want to do with it is hard, even though LDAP itself is a really, really simple protocol, and that is before we even get to LDAP's more advanced features... But sometimes you just don't have any choice;
- Spring LDAP http://www.springsource.org/ldap ; I would say use it if and only if your application is already fully Spring-based and you are really used to the Spring "template" abstraction. But per se, Spring LDAP is just a layer on top of JNDI, so it doesn't bring better performance or other LDAP-specific features;
- the ongoing effort to build a new default, common LDAP API by the ApacheDS and OpenDS people: http://cwiki.apache.org/confluence/display/LDAPAPI/Index ; it is just beginning and not ready for production use, but you should keep an eye on that project;
- and finally, THE SDK to use right now in place of JNDI LDAP: UnboundID LDAP SDK http://www.unboundid.com/products/ldapsdk/ ; simple for simple use cases but nevertheless with full support for LDAP, good performance, nice new features added regularly (version 2.0 adds an object/entry mapping and persistence API), etc.
So, if you have only one to keep in mind, just get UnboundID LDAP SDK.

Trying to understand the code for the MeioUpload behavior for CakePHP

I was reading through the source code for MeioUpload to make sure I understand what it's doing, and for the most part the code is pretty easy to understand. However, I came upon a section of code which I just can't seem to figure out, and so I'm trying to determine if it's a mistake on the author's part or if I'm just missing something.

Essentially, this function is passed the filename of a default image, and adds that filename to a list of reserved words (and generates a replacement string for it). I have put an arrow and question marks (in comments) next to the line of code I can't figure out:

/**
 * Include a pattern of reserved word based on a filename, 
 * and it's replacement.
 * @author Vinicius Mendes
 * @return null
 * @param $default String
 */
function _includeDefaultReplacement($default){
 $replacements = $this->replacements;
 list($newPattern, $ext) = $this->splitFilenameAndExt($default);
 if(!in_array($newPattern, $this->patterns)){
  $this->patterns[] = $newPattern;
  $newReplacement = $newPattern;
  if(isset($newReplacement[1])){ // <--- ???
   if($newReplacement[1] != '_'){
    $newReplacement[1] = '_';
   } else {
    $newReplacement[1] = 'a';
   }
  } elseif($newReplacement != '_') {
   $newReplacement = '_';
  } else {
   $newReplacement = 'a';
  }
  $this->replacements[] = $newReplacement;
 }
}

As I understand it, $newReplacement should always be a string, not an array. That is because ultimately it gets its value from the first element of the array returned from this function:

function splitFilenameAndExt($filename){
 $parts = explode('.',$filename);
 $ext = $parts[count($parts)-1];
 unset($parts[count($parts)-1]);
 $filename = implode('.',$parts);
 return array($filename,$ext);
}

So that if() statement makes no sense to me. It seems to be trying to catch a condition which could never occur. Or am I wrong and that section of code does serve a purpose?

From stackoverflow

Well, I can't explain the actual reasoning behind why it's doing it, but when you use a particular index on a string value like that, you're accessing a particular character of the string. That is, it's checking whether the filename has a second character, which it then replaces with either '_' or 'a'. If the filename is only one character long, it replaces the whole thing with either '_' or 'a'.

I can explain in more detail what that function does if you like, but I don't really have any understanding of what it's trying to accomplish.

Calvin : Actually, with your explanation it all makes sense now. Thanks for enlightening me on the use of indexes on strings.

Chad Birch has already answered my question (my original confusion was due to not understanding that $var[n] can be used to access the character at index n of a string), but just in case others are wondering, here's an explanation of what these functions are trying to accomplish:

MeioUpload is a file/image upload behavior for CakePHP. Using it, you can set any field in your model to behave as an upload field, like so:
```
var $actsAs = array(
 'MeioUpload' => array(
  'picture' => array(
   'dir' => 'img{DS}{model}{DS}{field}',
   'create_directory' => true,
   'allowed_mime' => array('image/jpeg', 'image/pjpeg', 'image/png'),
   'allowed_ext' => array('.jpg', '.jpeg', '.png'),
   'thumbsizes' => array(
    'normal' => array('width'=>180, 'height'=>180),
    'small' => array('width'=>72, 'height'=>72)
   ),
   'default' => 'default.png'
  )
 )
);
```
In the above example, MeioUpload will treat the field named "picture" as an upload field. This model happens to be named "product," so the upload directory would be "/img/product/picture/." The above configurations also specify that 2 thumbnails should be generated. So if I were to upload an image named "foo.png", the following files would be saved on the server:
```
/img/product/picture/foo.png
/img/product/picture/thumb.foo.png *
/img/product/picture/thumb.small.foo.png
```
* - thumbsizes labeled 'normal' do not have their key appended to their filenames

Additionally, the default images are also stored in the same directory:
```
/img/product/picture/default.png
/img/product/picture/thumb.default.png
/img/product/picture/thumb.small.default.png
```
But since we don't want the user-uploaded images, default images, or auto-generated thumbnails to overwrite one another, the author has created the following pair of arrays:
```
var $patterns = array(
 "thumb",
 "default"
);

var $replacements = array(
 "t_umb",
 "d_fault"
);
```
which are used to prevent filename conflicts when saving uploaded files:
```
$filename = str_replace($this->patterns,$this->replacements,$filename);
```
_includeDefaultReplacement() is used to add new reserved words when the default image is named something else.
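
For readers who don't write PHP, the behaviour described above can be sketched in Python. This is only a loose transliteration for illustration, not MeioUpload's own code: the function names `make_replacement` and `sanitize` are hypothetical, and the loop in `sanitize` assumes PHP's `str_replace` applies each pattern/replacement pair in order when given two arrays.

```python
def make_replacement(pattern):
    """Mirror the branching in _includeDefaultReplacement (hypothetical name)."""
    if len(pattern) > 1:
        # Overwrite the second character: '_' normally, 'a' if it is already '_'.
        second = "_" if pattern[1] != "_" else "a"
        return pattern[0] + second + pattern[2:]
    # Names of zero or one character are replaced wholesale.
    return "_" if pattern != "_" else "a"


def sanitize(filename, patterns, replacements):
    """Apply each pattern/replacement pair in order (hypothetical helper)."""
    for pattern, replacement in zip(patterns, replacements):
        filename = filename.replace(pattern, replacement)
    return filename


patterns = ["thumb", "default"]
replacements = [make_replacement(p) for p in patterns]
```

With these definitions, `replacements` comes out as the same "t_umb" and "d_fault" pair shown earlier, and an upload named "thumb.default.png" would be stored under a name that no longer collides with the reserved words.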

In MVC, what are the limitations on the Controller?

Should the Controller make direct assignments on the Model objects, or just tell the Model what needs to be done?

From stackoverflow

It's fine for it to make "direct assignments" on the model, as long as it does so through an interface.
The controller has two traditional roles:
1. handling the input event from the UI (registered handler or callback)
2. notifying the model of an action--which may or may not result in a change on the model's state
It does not perform data validation (that is the model's job), nor does it have any say in how information is presented.

Chris Noe : More generally, the Controller coordinates the View and Model. If it invokes changes on the model, it would do so indirectly, e.g. via facade methods. The intent is to isolate the business logic from the user interface.

zsharp : In this sense, the idea of model binding to webuser input in the controller method would not be recommended?

The Model services don't have to know about the existence of the controller; thus, the controller can do whatever the view needs by utilising the model services.
It depends largely on the scope of your application. If it's relatively quick and dirty, then there's no sense in over-engineering, and sure, your controllers can talk to your model objects. On the other hand, if it needs to be more "enterprisey" for whatever reason, a good pattern to use in conjunction with MVC is the so-called "Business Delegate". This is where you can compose coarse-grained methods out of one or more methods on one or more model objects; for instance deleting an object and then returning a refreshed list without that object. This layer gives two advantages. For one, it decouples the controllers from whatever ORM system is being used for model objects. Furthermore, it is the layer that finally must constructively deal with any exceptions that may have occurred instead of re-throwing them.
I don't think a controller should be dealing with model objects.

I tend to think that controller is really part of the UI tier. I prefer to inject a service layer in-between the controller and the rest of the app. The web tier accepts HTTP requests, unmarshals parameters from request objects into objects that the service interface can deal with, and marshals the response to send back. All the work with transactions, units of work, and dealing with model and persistence objects is done by the service.

This approach is more service oriented. It separates the service from the user interface, leaving open the possibility that several clients can reuse the same service. It makes the layer that marshals requests to the service "thin", so it's easy to switch out SOAP services for REST or EJB or CORBA or whatever the next new thing will be.
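
A minimal sketch of that split, in Python for brevity. Everything here is hypothetical (the class names, the dict standing in for a request, the in-memory "model"); it only illustrates a thin controller that unmarshals input and delegates to a service exposing a coarse-grained operation, such as the delete-then-return-refreshed-list example mentioned in an earlier answer.

```python
class ProductService:
    """Hypothetical service layer: owns the model objects and business rules."""

    def __init__(self):
        # Stand-in for whatever persistence or ORM the real app would use.
        self._products = {1: "widget", 2: "gadget"}

    def delete_and_list(self, product_id):
        # Coarse-grained operation composed of two model-level steps.
        self._products.pop(product_id, None)
        return sorted(self._products.values())


class ProductController:
    """Hypothetical thin controller: unmarshal, delegate, marshal."""

    def __init__(self, service):
        self._service = service

    def handle_delete(self, request):
        # Convert raw request parameters into types the service understands.
        product_id = int(request["id"])
        remaining = self._service.delete_and_list(product_id)
        # Shape the response; no business logic lives here.
        return {"status": 200, "products": remaining}


controller = ProductController(ProductService())
response = controller.handle_delete({"id": "1"})
```

Because the controller never touches the model directly, the same `ProductService` could in principle sit behind a different front end without changes.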

zsharp : ok, if you wanted to pull the Request's IP would you do it in the service layer or in the controller?

zsharp : the same for user identification?

duffymo : controller for IP address. as for security, it's a cross-cutting concern, so it belongs in an aspect.

Bug tracking that works with Google Apps?

Are there any hosted bug tracking systems that work natively with Google Apps for authentication, notification services, etc?

From stackoverflow

I've played with a good deal of bug tracking systems and have never encountered one that works natively with Google Apps.
Yes, you can make use of Google Sites, which has the ability to create "issue tracker" pages among other things.

Anirvan : I'm trying to figure out how to do this, but am totally not finding it. Can you add a description, screenshot, or URL to look at? Thanks!

lfaraone : See http://yfrog.com/eescreenshot1gzgp and http://yfrog.com/e0screenshotwp. On the create-a-new-page screen select "list", and it takes you to the latter page.

lfaraone : Did that work for you?

crosenblum : Neat....i hope i can convince my fellow workers to be interested in this lol...i doubt it.
TheBugTrack is an online bug tracking service seamlessly integrated with Google Apps.

Running a windows service in a console

What is the best way to run a windows service as a console?

My current idea is to pass in an "/exe" argument and do the work of the windows service, then calling Application.Run().

The reason I'm doing this is to better debug a windows service and allow easier profiling of the code. The service is basically hosting .NET remoted objects.

From stackoverflow

The Code Project site had a great article showing how to run a Windows Service in the Visual Studio debugger, no console app needed.

C:\> MyWindowsService.exe /?
MyWindowsService.exe /console
MyWindowsService.exe -console

This is how I do it. It gives me the same .exe for both the console app and the service. To start as a console app it needs a command line parameter of -c.

private static ManualResetEvent m_daemonUp = new ManualResetEvent(false);

[STAThread]
static void Main(string[] args)
{
    bool isConsole = false;

    if (args != null && args.Length == 1 && args[0].StartsWith("-c")) {
        isConsole = true;
        Console.WriteLine("Daemon starting");

        MyDaemon daemon = new MyDaemon();

        Thread daemonThread = new Thread(new ThreadStart(daemon.Start));
        daemonThread.Start();
        m_daemonUp.WaitOne();
    }
    else {
        System.ServiceProcess.ServiceBase[] ServicesToRun;
        ServicesToRun = new System.ServiceProcess.ServiceBase[] { new Service() };
        System.ServiceProcess.ServiceBase.Run(ServicesToRun);
    }
}

Michael Hedgpeth : I ended up with this: ThreadPool.QueueUserWorkItem(state => service.DoWork()); new ManualResetEvent(false).WaitOne(); I've read that using the ThreadPool is almost always better than explicitly creating threads.

sipwiz : Using the ThreadPool is a good idea. I would generally use the ThreadPool ahead of creating a new Thread as well. In the above example I did want more control of the thread for some reason I can't now recall.

What is the meaning of these Windows Environment variables: HOMEDRIVE, HOMEPATH, HOMESHARE, USERPROFILE

What is the meaning of these Windows Environment variables:

HOMEDRIVE,
HOMEPATH,
HOMESHARE,
and USERPROFILE?

Who sets them? When? Who uses them? For what?

How does the configuration of the samba server modify these variables?

Thanks!

From stackoverflow

If you go to the Run box and type any of the above like this

%HOMEPATH%

then it will go to the path that the variable is set to on your machine. It's useful when writing VB scripts and things like that, where you want to perform a task on the user's profile area, for example.

Hope this helps

Those are all set on login, and they are, as SocialAddict said, very useful in scripts when you need to perform an action on different systems.

I'm not too clear on your other question; a samba server shouldn't care about those variables.

See http://vlaurie.com/computers2/Articles/environment.htm for a detailed explanation.
Here is a more comprehensive list of Windows Environment Variables with their locations on:
- Windows XP
- Vista
http://en.wikipedia.org/wiki/Environment%5Fvariable#Default%5FValues%5Fon%5FMicrosoft%5FWindows

partial list unpack in python

Hello. In Python, the assignment operator can unpack a list or tuple into variables, like this:

l = (1, 2)
a, b = l # here goes auto unpack

But I need to specify exactly as many names on the left as there are items in the list on the right. Sometimes I don't know the size of the list on the right, for example if I use split(). Example:

a, b = "length=25".split("=") # this will result in a="length" and b="25"

But the following code will lead to an error:

a, b = "DEFAULT_LENGTH".split("=") # error, list has only 1 item

Is it possible to somehow unpack the list in the example above so that I get a = "DEFAULT_LENGTH" and b equal to None or not set? The straightforward way looks kind of long:

a = b = None
if "=" in string :
  a, b = string.split("=")
else :
  a = string

From stackoverflow

Have you tried this?

values = aString.split("=")
if len(values) == 1:
   a = values[0]
else:
   a, b = values

Devin Jeanpierre : It should be a = values[0]

MizardX : or "a, = values"

This is slightly better than your solution but still not very elegant; it wouldn't surprise me if there's a better way to do it.
```
a, b = (string.split("=") + [None])[:2]
```
```
# this will result in a="length" and b="25"
a, b = "length=25".partition("=")[::2]

# this will result in a="DEFAULT_LENGTH" and b=""
a, b = "DEFAULT_LENGTH".partition("=")[::2]
```
Eye of Hell : any way to skip 'equals' variable name, eg a,,b = str.partition("=") ?

SilentGhost : @Eye: you can use [::2] to select elements with even index

Chris Upchurch : Nice SilentGhost! Added it to the answer.

Eye of Hell : [::2] is brilliant, thanks!

You could write a helper function to do it.

>>> def pack(values, size):
...     if len(values) >= size:
...         return values[:size]
...     return values + [None] * (size - len(values))
...
>>> a, b = pack('a:b:c'.split(':'), 2)
>>> a, b
('a', 'b')
>>> a, b = pack('a'.split(':'), 2)
>>> a, b
('a', None)

Don't use this code, it is meant as a joke, but it does what you want:
```
a = b = None
try: a, b = [a for a in 'DEFAULT_LENGTH'.split('=')]
except: pass
```
Eye of Hell : yeah, i can list comprehensions too :).

Brian : Just wait till someone tries to extend it to work for 3 variables though (or use python3)! Putting that in your code someone might read would be rather evil :-) A more sane approach is possibly just putting a=theString in the except block.
The nicest way is using the partition string method:

Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.

New in version 2.5.
```
>>> inputstr = "length=25"
>>> inputstr.partition("=")
('length', '=', '25')
>>> name, _, value = inputstr.partition("=")
>>> print name, value
length 25
```
It also works for strings not containing the =:
```
>>> inputstr = "DEFAULT_VALUE"
>>> inputstr.partition("=")
('DEFAULT_VALUE', '', '')
```
If for some reason you are using a version of Python before 2.5, you can use list-slicing to do much the same, if slightly less tidily:
```
>>> x = "DEFAULT_LENGTH"

>>> a = x.split("=")[0]
>>> b = "=".join(x.split("=")[1:])

>>> print (a, b)
('DEFAULT_LENGTH', '')
```
..and when x = "length=25":
```
('length', '25')
```
Easily turned into a function or lambda:
```
>>> part = lambda x: (x.split("=")[0], "=".join(x.split("=")[1:]))
>>> part("length=25")
('length', '25')
>>> part('DEFAULT_LENGTH')
('DEFAULT_LENGTH', '')
```
gorsky : +1 for str.partition

This may be of no use to you unless you're using Python 3. However, for completeness, it's worth noting that the extended tuple unpacking introduced there allows you to do things like:
```
>>> a, *b = "length=25".split("=")
>>> a,b
('length', ['25'])
>>> a, *b = "DEFAULT_LENGTH".split("=")
>>> a,b
('DEFAULT_LENGTH', [])
```
ie. tuple unpacking now works similarly to how it does in argument unpacking, so you can denote "the rest of the items" with *, and get them as a (possibly empty) list.

Partition is probably the best solution for what you're doing however.
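
If you want `None` rather than an empty string when the separator is missing, as the question asks, partition can be wrapped in a small helper. This is just a sketch; the name `split_pair` is made up for illustration.

```python
def split_pair(text, sep="="):
    """Split text at the first sep; the second item is None if sep is absent."""
    head, found, tail = text.partition(sep)
    # partition() returns an empty separator when sep was not found,
    # which lets us tell "no value" apart from "empty value".
    return head, (tail if found else None)


a, b = split_pair("DEFAULT_LENGTH")
name, value = split_pair("length=25")
```

Unlike the plain `[::2]` slice, this distinguishes "DEFAULT_LENGTH" (value is None) from "DEFAULT_LENGTH=" (value is an empty string).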
But sometimes i don't know a size of the list to the right, for example if i use split().

Yeah, when I've got cases with limit>1 (so I can't use partition) I usually plump for:
```
def paddedsplit(s, find, limit):
    parts= s.split(find, limit)
    return parts+[parts[0][:0]]*(limit+1-len(parts))

username, password, hash= paddedsplit(credentials, ':', 2)
```
(parts[0][:0] is there to get an empty ‘str’ or ‘unicode’, matching whichever of those the split produced. You could use None if you prefer.)
Many other solutions have been proposed, but I have to say the most straightforward to me is still
```
a, b = string.split("=") if "=" in string else (string, None)
```

As an alternative, perhaps use a regular expression?

>>> import re
>>> unpack_re = re.compile(r"(\w*)(?:=(\w*))?")

>>> x = "DEFAULT_LENGTH"
>>> unpack_re.match(x).groups()
('DEFAULT_LENGTH', None)

>>> y = "length=107"
>>> unpack_re.match(y).groups()
('length', '107')

If you make sure the re.match() always succeeds, .groups() will always return the right number of elements to unpack into your tuple, so you can safely do

a,b = unpack_re.match(x).groups()

C++ Class Serialization Help

Hi, I quite recently learned about the C++ friend keyword and its uses in serialization, and now I need some help getting it to work.

I have no problem serializing my class to a file; that is working great. However, I'm having a hard time reading this file back into a vector container. I'm sure I need a loop in my code that reads line by line, but since the class has members of different types I guess I can't use std::getline(), and maybe that approach wouldn't use the istream operator I implemented? A sample output file would be:

Person 1
2009
1

Person 2
2001
0

My code:

class SalesPeople {
    friend ostream &operator<<(ostream &stream, SalesPeople salesppl);
    friend istream &operator>>(istream &stream, SalesPeople &salesppl);

    private:
        string fullname;
        int employeeID;
        int startYear;
        bool status;
};

ostream &operator<<(ostream &stream, SalesPeople salesppl)
{
    stream << salesppl.fullname << endl;
    stream << salesppl.startYear << endl;
    stream << salesppl.status << endl;
    stream << endl;
    return stream;
}

istream &operator>>(istream &stream, SalesPeople &salesppl)
{
    stream >> salesppl.fullname;
    stream >> salesppl.startYear;
    stream >> salesppl.status;
    // not sure how to read that empty extra line here ?
    return stream;
}

// need some help here trying to read the file into a vector<SalesPeople>
SalesPeople employee;
vector<SalesPeople> employees;

ifstream read("employees.dat", ios::in);
if (!read) {
   cerr << "Unable to open input file.\n";
   return 1;
}

// i am pretty sure i need a loop here and should go line by line 
// to read all the records, however the class has different
// types and im not sure how to use the istream method here.

read >> employee;
employees.push_back(employee);

By the way, I know that the Boost library has a great serialization class, however I'm trying to learn how serialization would work using the STL library for now. Thanks a lot in advance for any help that you can give me and for getting me on the right track!

From stackoverflow

Not sure what your problem is. What exactly are you not understanding? The fact that your names are composed of multiple tokens? There's no magic way to do it; you might want to get the name through getline(). Alternatively, you may want to specify the number of tokens when serializing and read the appropriate token count, i.e. your file might look like:

2 Person 1

I assumed that Person was the first name and 1 the last name here. You might also enforce the notion that there's one first name, and one last name and just read each one separately.

You'll typically loop while (!ifstream.eof()) and read. Of course, you should always validate the inputs.

Also, why are you adding an extra endl between each record ? Serialized data need not be pretty. :)
It looks like you pretty much have all the code you need already! I copied your code and compiled it with some changes to read the SalesPeople in from a file in a loop. I will include the changes below, but since this is for your homework, you may just want to read and think about the following hints before looking at the code.
- For reading the SalesPeople in a loop, I would recommend that you take a look at this FAQ. It has an example of almost exactly what you need. FAQ 15.4 will also help you, I believe.
- For your question on how to handle the extra empty line when reading from the file, check out this link. You can very simply extract whitespace this way.
- As jfclavette suggested, I would recommend looking into std::getline for reading in the SalesPerson's full name, since you need everything on that line into one string.
I have one question for you, though: what about the employeeID? I notice that it is being ignored in your sample code. Is that on purpose?

And now, if you still need help, you can check out the code I wrote to get this to work:
```
istream &operator>>(istream &stream, SalesPeople &salesppl)
{
    //stream >> salesppl.fullname;
    getline(stream, salesppl.fullname);
    stream >> salesppl.startYear;
    stream >> salesppl.status;
    // not sure how to read that empty extra line here ?
    stream >> ws;
    return stream;
}

while(read >> employee)
{
    // cout << employee; // to verify the input, uncomment this line
    employees.push_back(employee);
}
```
Also, as jfclavette suggested, it may not be a bad idea to add some input validation (check the stream status after reading from it and verify that it is still good). Although I would recommend using the while() loop for the reasons stated in FAQ 15.5.

nmuntz : Thank you so much!!!! That helped a lot! Now I understand much better what I was doing wrong. Also, I had no idea about ws, I have been wondering how to do that for a long time! Thank you so much!

Venesectrix : No problem, glad I could help!

How to deal with the URISyntaxException

I got this error message :

java.net.URISyntaxException: Illegal character in query at index 31: http://finance.yahoo.com/q/h?s=^IXIC

My_Url = http://finance.yahoo.com/q/h?s=^IXIC

When I copied it into a browser address field, it showed the correct page, it's a valid URL, but I can't parse it with this : new URI(My_Url)

I tried : My_Url=My_Url.replace("^","\\^"), but 

<1> It won't be the url I need
<2> It doesn't work either

How to handle this ?

Frank

From stackoverflow

Use "%"-encoding for the "^" character, viz. http://finance.yahoo.com/q/h?s=%5EIXIC
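
The percent-encoding itself is language-independent. As a quick way to check what the encoded value should look like, here is a small sketch in Python using the standard library's `urllib.parse.quote` (the variable names are only for illustration):

```python
from urllib.parse import quote

symbol = "^IXIC"
# safe="" asks quote() to escape every character outside the unreserved set.
encoded = quote(symbol, safe="")
url = "http://finance.yahoo.com/q/h?s=" + encoded
```

Here `encoded` is "%5EIXIC", matching the URL given above; only the query value is encoded, not the whole URL.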

You need to encode the URI to replace illegal characters with legal encoded characters. If you first make a URL (so you don't have to do the parsing yourself) and then make a URI using the five-argument constructor (scheme, authority, path, query, fragment), then the constructor will do the encoding for you.

import java.net.*;

public class Test {
  public static void main(String[] args) {
    String myURL = "http://finance.yahoo.com/q/h?s=^IXIC";
    try {
      URL url = new URL(myURL);
      URI uri = new URI(url.getProtocol(), url.getHost(), url.getPath(), url.getQuery(), null);
      System.out.println("URI " + uri.toString() + " is OK");
    } catch (MalformedURLException e) {
      System.out.println("URL " + myURL + " is a malformed URL");
    } catch (URISyntaxException e) {
      System.out.println("URI " + myURL + " is a malformed URI");
    }
  }
}

You have to encode your parameters.

Something like this will do:

import java.net.*;
import java.io.*;

public class EncodeParameter { 

    public static void main( String [] args ) throws URISyntaxException ,
                                         UnsupportedEncodingException   { 

        String myQuery = "^IXIC";

        URI uri = new URI( String.format( 
                           "http://finance.yahoo.com/q/h?s=%s", 
                           URLEncoder.encode( myQuery , "UTF-8" ) ) );

        System.out.println( uri );

    }
}

http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html

How do I time a program executing in Windows?

I want to be able to do the Windows equivalent of this Unix/Linux command:

time foo

foo
x cpu time
y real time
z wallclock time

From stackoverflow

timeit from the Windows Server 2003 Resource Kit should do the trick.

Can I destroy my /test folder if i'm using Rspec and Cucumber?

Is this possible? Or would I miss something? (Fixtures,..?)

Because when I use autotest with RSpec, autotest doesn't seem to work. It loads, but nothing else happens, even if I change a file.

From stackoverflow

Yes - but you have to use command
```
autospec
```
and not
```
autotest
```
if you use RSpec.

Yes, you can remove the 'test' directory completely. If you use fixtures you can put them in spec/fixtures, though fixtures are not a good idea imho :)

Lichtamberg : Why are fixtures not a good idea imho?

Christoph Schiessl : Because fixtures are just too hard to maintain. Use stubs and mocks instead! A complete mocking framework is already built into RSpec.

C# - Do you use "var"?

C# 3.0 introduces implicitly typed variables, aka the "var" keyword.

var daysInAWeek = 7;
var paul = FindPerson("Paul");
var result = null as IPerson;

Others have asked about what it does or what the problems with it are:

I am interested in some numbers - do you use it? If so, how do you use it?

1. I never use var (and I never use anonymous types)
2. I only use var for anonymous types
3. I only use var where the type is obvious
4. I use var all the time!

From stackoverflow

3 - I just used var in a situation where it was obvious. The code looked something like this:

List<ErrorCodes> errors = AttemptMethod();
foreach (var errorCode in errors) {
    // error handling code
}

I use it whenever it's possible. I in fact design my APIs to be type inference friendly. Type inference increases type safety in a code base so there is little reason not to use it.

http://blogs.msdn.com/jaredpar/archive/2008/09/09/when-to-use-type-inference.aspx

EDIT

Here are the steps I take to make my API's more type inference friendly

Add type inference friendly factory methods

Take the following for List<T>
```
public static class ImmutableCollection {
    public static ImmutableCollection<T> Create<T>(IEnumerable<T> e) { return new ImmutableCollection<T>(e); }
}
```
Now I can write the following in my code.
```
var list = ImmutableCollection.Create(someEnumerable);
```
Blog entry on the subject http://blogs.msdn.com/jaredpar/archive/2008/04/11/design-guidelines-provide-type-inference-friendly-create-function-for-generic-objects.aspx

Avoid out and ref parameters

Instead return a Tuple or an Option class. For example Dictionary.TryGetValue() could be made more inference friendly if it had the following signature
```
public static Option<TValue> TryGetValue<TKey,TValue>(this Dictionary<TKey,TValue> map, TKey key) {
  TValue value;
  if (map.TryGetValue(key, out value)) {
    return Option.Create(value);
  }
  return Option.Empty;
}
```
Allows for nice friendly dictionary access
```
var opt = map.TryGetValue(42);
```
Paul Stovell : Jared, what specific additional steps do you take to be more friendly for type inference?

Roman Boiko : Great answer. I was doing similar things, but my implementations were too naive compared to these. Thanks a lot, very useful code. I wish I saw your utility library earlier, too.

While using var makes writing code faster, I find myself explicitly typing my variables, still. Obviously you have to use it for anonymous types, so I'll use it there.

Hence, my answer is: 2

4 - I'm a var whore, I use var everywhere.

The only time I don't use var is when I'm defining a new method/class so I don't have to change it from object later on.

I find that var makes my life a lot easier when I do refactoring, especially when used within foreach loops.

John Baughman : I wish 1.1 had var...

Daniel Schaffer : I'm glad I'm not the only one... I've been feeling dirty for using it so much, but no longer!

Yuriy Faktorovich : I'm glad I don't work on your code.

Shimmy : OMG a var whore that's a good one im too!!

Sohnee : @John Baughman - I weep for you my friend! I haven't touched 1.1 in a long time and I don't miss it at all!!!

I'm going for 3.

There are times when it's kind of silly, and there are times when it makes sense.

If you're using it to replace, say, int, then it doesn't make much sense. You're not really benefiting; you're using var just to use it, and it's still three letters. If you're using it to infer an iterator or for a LINQ query, or when you know what the method returns and it makes the code fit in the visible area of the code editor, cool.

Not to mention the fact that if you're experimenting with LINQ and you keep getting compiler errors because your query goes from returning an IEnumerable<T> to IQueryable<T> to List<T>, then you're wasting your time.

Honestly, if you've ever used languages (like C++) which don't have type inference (not including C++0x), it can make code where you use vectors painful to read, because you spend most of your time scrolling the code into view.

EDIT: Yes, in C++0x the C++ committee re-purposed the auto keyword, let's rejoice in the love!

Paul Stovell : the new c++ will have the "auto" keyword i believe, which does the same thing as "var".

Chris : @Paul: yes, C++0x includes it (TR1, `Technical Report 1`, only covers library additions).

Yes, definitely. Consider this:
```
Dictionary<int,Tuple<int,string>> items = new Dictionary<int,Tuple<int,string>>();
```
and then this:
```
var items = new Dictionary<int,Tuple<int,string>>();
```
The latter is much better IMO. For a start it doesn't require a scroll bar to be viewed!

I'm probably a 3.5.

Chris : yup, the less punctuation your eyes stumble over the better

I'm more of a type 2 guy - for clarity when reading code, I prefer to explicitly state the type whenever possible. Sure, in the IDE we have Intellisense and all - but how about on a printout?

Whenever possible, try to be as explicit as possible - makes reading code easier for other guys who come in and have to understand your code later on.

Marc

Adam Lassek : var e = new IEnumerable(); Is there any doubt what the type of e is? Do you _really_ need the second type declaration?

marc_s : Yes, I would strongly suggest to use the EXPLICIT type whenever possible - it's just cleaner, clearer. If it doesn't cost you any effort, express your intent explicitly. It helps down the line.

Adam Lassek : I don't understand your definition of explicit in this case. Writing the type declaration a second time doesn't give you any more information.

marc_s : Yes it does - it clearly states that you want a string - if you suddenly change something and you assign something else, you at least might get a compiler warning/error. Be as explicit as possible - it only costs you three extra characters here....

Adam Lassek : I still don't understand what you mean. In my example, I'm already clearly defining that I'm initializing an IEnumerable. The constructor for IEnumerable will never give me anything else. Repeating the type declaration a second time adds no information.

marc_s : Well, coming from a Pascal background, I'm always in favour of CLEARLY specifying what you want. Say you have IEnumerable and expect that - suddenly you apply .First() clause to it and you get back a single string - no longer an IEnumerable. Clearly saying what you want / expect is helpful.

Adam Lassek : Point taken; specifying the type when the right side of the assignment isn't perfectly clear is not redundant. But, this is not the example I gave. Specifying the type is good when it's being assigned from a non-generic function, but there are still many cases where it's pointless duplication.

Roman Boiko : @Adam Lassek: A small typo: one can't use ` new IEnumerable()`, because it can't be instantiated (it is an interface). `List()` **can**. Sometimes I want to make my variable `IEnumerable` and assign to it an instance `List()`. For example, when I need to store the `IEnumerator`. One of such cases is described here: http://msmvps.com/blogs/jon_skeet/archive/2010/07/27/iterate-damn-you.aspx

I'd like to use it whenever it's possible (because I do not use it frequently), but is there any performance problem?

Paul Stovell : no, the compiler completely optimizes it out - including in the "null as IPerson" version I showed.

BPAndrew : what about for intellisense and visual studio? if I had 100,000 lines of code littered with vars will my IDE hang trying to figure all this out as I go?

Roman Boiko : @BPAndrew: IntelliSense doesn't spend more time on `var` than on an explicit type declaration. It needs to figure out the type of the right-hand side of the assignment, regardless of whether you use `var` or not.

I've read/heard a lot of people advising that you shouldn't abuse var because the type is vitally important to code readability.

For example:
```
IFoo f = GetMeSomething();
DoSomething(f);
```
Using var supposedly makes it less readable:
```
var f = GetMeSomething();
DoSomething(f);
```
Now we don't know what f is, which is supposed to be bad.

And yet no one would ban function composition:
```
DoSomething(GetMeSomething());
```
The sky doesn't fall in every time someone does that. Also lambdas have built-in type inference:
```
list.ForEach(f => DoSomething(f));
```
So it may be that over time, people get used to the idea of using var by default and only avoiding it where there really is a good reason.

Or maybe it will be like "Methods should only have one exit" and hang around in coding standards for decades, for no good reason. Another one of these is "Lambdas should be no more than two lines long."

Personally I hardly ever use var, but only for "cultural" reasons (i.e. to avoid starting arguments), because the benefit is usually minor. For example, someone said above that there's no point using var instead of int because they're the same number of characters to type. Not so - if you have a bunch of code using int and later you decide to use double, it saves you a few seconds of search and replace!

Markus : "using var by default" - Hell, yes! And while we're on it, let's drop type safety altogether and let the compiler figure it out? :)

Daniel Earwicker : @Markus - just in case you're not joking: this isn't dropping type safety. In fact, if you use `var` in a `foreach` loop, you may *increase* type safety.

Markus : @Daniel - Definitely a joke. :) Although a slightly bitter one, I mean, we've got `dynamic` now ... :)) To be honest, I'm really not a fan of the idea to use var for anything else than anonymous types, or maybe two-liners. People argue about ultra long type names, but that's a thing that can be fixed with a using in a second. Same for the old IDE argument: While it may be true that you can see the type thanks to IntelliSense, this doesn't work on paper or when looking at a diff. I really think that if one knows the type, one should write it down. That's what it's for.

Markus : @Daniel - That said, I liked your answer. I don't agree with it, but it made me think about my point of view.

I'm a 3, as I think that writing var is muuuch more convenient than having ultra long declarations with namespaces, multiple generic definitions and so on.

I'm only using var when I'm dealing with LINQ.

Somewhere in between 2 and 3. It's handy for some things related to data access (e.g. Entity Framework). But whenever possible I try to use it where the resulting type is obvious, or at least where you get it with a small amount of thought...

Actually, I'm more of a 2.5 guy.

2.5 - I only use var when the type isn't very obvious, or when I don't really need to think about the type.

I seem to use it primarily with LINQ to SQL. ie.
```
var results = from blah ....
```
Honestly, i'm not a big fan of big honking types. You can use a using to alias them, but that's doing the same thing.

And I absolutely HATE interfaces as variable types.

Daniel Earwicker : "And I absolutely HATE interfaces as variable types." Wuh?

Mystere Man : just a personal preference. Of course I can't change the actual type, but I can hide it from my eyes when i'm coding by using var ;)

I'm a 2.

I only use var when dealing with LINQ to SQL.

Otherwise I don't really like making the compiler decide the type for my object.

I think I am a 2.5

I usually only use var for anonymous types, but I also use it sometimes for LINQ queries where the resulting type may change if I change the query later. Then I also use var so I have less refactoring to do when I change the return type of the select.

3, but I find 'obvious' to be highly subjective.

I've found most devs are ok with:
```
var sb = new StringBuilder();
```
But get annoyed by:
```
List<MyType> items = GoGetMyTypes();

foreach ( var item in items ) {
    //they don't know what item is here without intellisense.
}
```
John Kraft : I get annoyed in your second example when it is: var items = GoGetMyTypes(). That's really impossible without intellisense. As your second example stands, I'd have no problems with that.

Joel Mueller : I don't understand these "don't know the type without intellisense" arguments. Even free open-source C# IDE's have intellisense these days. It's like saying, "I don't know what type it is unless I can see." You can see. "Oh yeah." And don't start with the dead-tree code reviews, that's just silly.

Kyralessa : If you're getting a set of MyType objects, why would you call what you get back "items"? If you call it something useful ("myTypesWithActiveCustomers" or whatever), then there's no problem with saying foreach(var item in myTypesWithActiveCustomers).

Keith : @Kyralessa - maybe, but if I'm only using it across a few lines I tend not to bother with such long names.

Kyralessa : Well, if you're going to use non-obvious variable names, then it shouldn't be surprising that the use of var leaves things non-obvious.

Markus : Honestly, it's great that we all have so nice IDEs. Now assume for a second that we look at a diff output from our version control. It's fine I may understand from the variable name that this is some kind of "customer" we're talking about, but still I have no clue what type it is.

Daniel Earwicker : Using `var` in `foreach` is by far the best option. Suppose the loop variable `item` was declared to be of type `MyType`. A future maintainer could change `items` to be of type `List<object>` and the program would still compile just fine! You only find the problem at runtime, when someone puts a `string` on the list of items and the `foreach` throws an exception. This is because `foreach` was in the language before generics, so it has to insert a silent cast. Hence it is not type safe. If you make the loop variable a `var`, you disable the silent cast and restore the language to static safety.

Personally, I only ever use var for anonymous types. Basically, when I have to use var rather than an explicit type.

I dislike the use of var for implicitly inferring types which can just as easily be explicitly declared, i.e.:

// Yes, it's obvious that i is an int!
var i = 10;
// But why not use "int i = 10;". Just as many characters to type!
int i = 10;

Also, there are some situations where typing of the var is not obvious, at least not to me, i.e.:

var q = SomeFunkyMethodThatReturnsSomething();

Hmm.. Now I have to look up SomeFunkyMethodThatReturnsSomething in order to find out its return type before I can know what q will be. Yes, intellisense can help, but I shouldn't have to rely on that, nor should I have to perform additional steps in order to know what type q is.

Here's another (admittedly contrived) example of where this kind of typing can be confusing:

double d = 0;
var e = 0;

I don't know about you, but this makes me have to do a double-take (excuse the pun!). At first glance, due to the "code noise" if you like, it's not immediately obvious what e is here. I have to stop and think before realizing that e is an int.

Oskar Duveborn : Also refactoring shouldn't be a problem when not using var either, Resharper at least will help you through it with great ease.

CraigTP : @Oskar - Very good point about the refactoring. I personally consider the use of var when it's not absolutely necessary to be a "code smell".

Adam Lassek : Sure, if you're defining an int there's no point in using var; but most type declarations are a lot longer. If you're defining an IEnumerable, or a DataContext, var saves a lot of keystrokes and redundancy.

CraigTP : @Adam - I don't disagree with you when the type is something with a long name (ie. IInterface> etc.) however, for me it's a toss up between a few extra keystrokes and increased readability. I go for readability every time. Extra keystrokes aren't that bad. We are coders after all!

Adam Lassek : I'm with you on readability, I just don't consider redundancy to be beneficial to that. I think one type declaration per line is plenty.

Joel Mueller : I consider NOT using var when you could to be a code smell. Well, that's an exaggeration, but really I think var helps readability more often than it hurts it.

Jonathan Shepherd : I don't think that var can be confusing. double d = 0; var e = 0D; var myDecimal = 4.35M; It's self-explanatory.

mattmc3 : *"Now I have to look up SomeFunkyMethodThatReturnsSomething"* Uh, no you don't. Just hover over `q` and the tooltip shows you exactly what the type of `q` is.

I'm at least a 3 (force of habit means I still type the type sometimes), for the reasons people have given above.

Having played with functional languages such as OCaml and Haskell, I'm quite happy with type inferencing - if anything C# doesn't go far enough!

1# here - aka "bah humbug, you crazy kids and your vars". back in my day we typed out all our types and we liked it just fine

Jon Tackabury : +1 for the comedy.

Matt Olenik : It's funny because type inference is really old!

I'm in camp #3. I prefer to use var when I feel it's obvious what the type is going to be. If I feel there is some ambiguity over it, then I'll be explicit.

VOTE : 3

But let me state that I explicitly define my variables. I let Resharper convert them to var when I clean the code. I am just not used to typing var, but I use them ;-)

5) I only use var where the type is NOT obvious.

When using various interfaces (whether the interface type OR an abstract type), and if you even care to know, it can sometimes be a burden to determine at design time what type will be inferred and where it can be used down the line. One would hope the Liskov Principle would come into play here, but it doesn't always unfortunately.

I read on an MSDN blog (the specific one escapes me) to use var when you're not sure polymorphically what type will be coming back. So basically, if I know what it's going to be, and I'm not typing a bunch of generic type details out (as in Jonathan Parker's example above), then I use the specific type. Otherwise, I use var.

My humble $0.02

I'm a 3.5, or higher. There are some times that I don't use var, so I'm not quite a 4. I believe that redundancy is a worse problem than any readability hit. The redundancy that comes from restating the type over and over makes refactoring harder and just makes the code base more difficult to work with. There is already enough friction for developers to overcome as it is.

I'm a 1-2. I think the simple case is that over the longer term, the var keyword will have been seen to have weakened the language, because it inherently makes code harder to read, especially after 1 or 2 years. I've worked in several software houses already which have added var as a prohibited keyword, except in very specific instances, to enforce explicit declaration, and to improve long term readability. It should really only be used when you're not sure what the rvalue return type is going to be, and intellisense doesn't immediately offer it up. Otherwise use sparingly.

I tend to use a lot of generics in my code,...as well as lambda expressions, and the like.

So yes, I use var everywhere I can.

Once you get over hungarian notation, switching to var is easy.

I use var with anonymous types in LINQ and in production code sometimes when the type is obvious from right side value.

In some tests and prototypes the var is very handy.

3 for me. I use it everywhere the type is obvious. Why on earth would anyone want to type the same thing twice, especially when generics with multiple type parameters are involved? :)

My standard is that the only time var is okay is with LINQ, anonymous types, and

var foo=new Something();

This is not okay:

var bar=GiveMeSomething();

Markus : +1 for disliking bar. :D (Jokes aside, I fully agree with this.)

I use var whenever possible. I find this invaluable for refactoring. I can change the return type of a method without having to change any of the methods that call it (in most cases).

I had an issue like this with code that returned List. The recommendation from FxCop was to use Collection or ReadOnlyCollection for these. Before making this refactoring, I first had to change all the explicit references to use var. Then, changing the return types made little or no change to the calling code.

We've discussed this at my current job, where the feeling among some is that it's OK to use var where the type is immediately obvious.

I seem to be in the minority, but I disagree. Why? Because I think having a type doesn't actually convey that much information. The name of the variable conveys much more. The real problem with using var everywhere is that it uncomfortably reveals the fact that you're choosing crappy names for your variables.

The solution, of course, isn't to ban var; it's to allow var everywhere and fix the variable-naming problems that become obvious when you do.

I use it only in LINQ queries, because I don't have to check the return type. Sometimes I use var in foreach loops, for the same reason. Otherwise it is better to use explicit types.

I'm a huge fan of var simply because it looks so much nicer.

We work all the time to reduce repeated functionality (refactoring) but people seem to support this repeated declaration of types. Apart from convention, there's no need to declare the type on both sides of the equals sign.

var things = new List<int>();
List<int> things = new List<int>();

I prefer to use var when I can.

Having said that, I'm now working in an environment where a support team need to quickly analyse code and apparently do so in text editors rather than booting up Visual Studio. Therefore they have no Intellisense. Using var definitely doesn't make things clearer for the support team; if programmers are used to seeing the type declared first then they may be confused, especially when working under pressure at 3am. So I do see an argument for avoiding the use of var.

To some extent I think that as developers we look for a nice rule that we can apply across the board.

The correct answer to the famous "var debate" is that it should be used when appropriate. We're paid the big bucks because we can determine when to use things such as "var" and when to avoid using them.

I believe that the use of "var" can be more extensive than just for anonymous types - but using it absolutely everywhere suggests that not enough thought is going into its use.

2 - I only use var for anonymous types.

In many cases the use of var makes the code less readable, especially when used to store method results.

Posted by Ku XI at 9:58 PM 0 comments
