Full-Text Search: Human Heaven and Database Savior in the Cloud - Emmanuel Bernard & Aaron Walker
Users think in terms of words, but the SQL LIKE operator doesn't respect word boundaries. Databases can't handle synonyms, proximity, or relevance (result scoring). RDBMSs have some support for text search, but only as proprietary SQL extensions that vary between DB vendors. There are also issues with flexibility and scalability.
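To make the word-boundary complaint concrete, here is a minimal sketch in plain Java. It is not SQL and not Lucene: likeMatch is a rough stand-in for LIKE '%term%', and wordMatch is a rough stand-in for a tokenized full-text match. The class name, method names, and sample rows are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class WordBoundaryDemo {

    // Rough analogue of "WHERE text LIKE '%term%'": a plain substring match.
    static List<String> likeMatch(List<String> rows, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return rows.stream()
                .filter(row -> row.toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
    }

    // Rough analogue of a tokenized match: split each row into words
    // and compare whole words only.
    static List<String> wordMatch(List<String> rows, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return rows.stream()
                .filter(row -> Arrays.asList(row.toLowerCase(Locale.ROOT).split("\\W+")).contains(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "The cat sat on the mat",
                "Please concatenate the strings",
                "A catalog of parts");
        System.out.println(likeMatch(rows, "cat")); // substring match: all three rows
        System.out.println(wordMatch(rows, "cat")); // whole-word match: only the first row
    }
}
```

The substring version happily returns "concatenate" and "catalog" for a search on "cat", which is exactly the behavior users don't expect.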
Hibernate Search extends the familiar Hibernate/JPA Query APIs and provides full-text search in your app with minimal intrusion. Full-text indexing is done through Lucene, using a thread pool. Lucene is very fast and scales well. Search can be run against multiple shards and the results are merged together.
(Kind of like searching across multiple collections in UltraSeek.)
It "feels like Hibernate, searches like Lucene." You just need to add annotations to your model objects, like @Indexed, @DocumentId, and @Field(name="...", index=Index.TOKENIZED, store=Store.YES).
A custom Hibernate EntityManager extends the familiar Hibernate session and query objects and supports the usual features like pagination.
Search.getFullTextEntityManager(...);
session.createFullTextQuery();
With careful use of the full-text query API, you can avoid database hits through projection. If you only specify fields from the full-text index in the query's projection, the results are retrieved directly from the index and the database is never queried. This uses StaticAliasToBeanResultTransformer.
Of course, this stuff runs in the cloud. They used Amazon EC2, EBS (Elastic Block Store), S3 and CloudFront. EBS is like a storage device that you can mount as part of the file system... snapshots of this are backed up to S3 storage. CloudFront is a content delivery network for static content, like images and video. It has edge servers around the globe, like Akamai.
The app server was JBoss running on CentOS. JOPR is a JBoss monitoring tool that can also be used to monitor the OS, app servers, and databases. For load balancing, they used httpd with mod_cluster and DNS round-robin.
Ajax Performance Tuning and Best Practices - Greg Murray (Netflix), Doris Chen (Sun)
This presentation was all about general web performance tuning; very little had to do specifically with Ajax. Some of the speakers here like to "beef up" their titles with buzzwords to draw in a crowd. (It's Ajaxy! It's cloudy!) But, they just end up disappointing the audience. There was some good advice here, but I've heard a lot of it before.
Place CSS and JS in separate files. Some inline JS and CSS are ok in the initial page, but after that has loaded, start loading in the external JS files that are used by the following pages.
After you're done with objects in JS, de-reference them (using delete) and detach listeners (element.removeEventListener, element.detachEvent in IE). Use removeChild to remove unused DOM elements. Make use of window.onunload() as a clean-up function.
Hmmm... I always expected the browser to clean up after me after the page loads. Does this really improve performance? Is this part of the reason why browsers eat up tons of memory when you keep them open for hours?
Think async. Avoid making function calls that block the browser's JS execution. Instead, use setTimeout() with a callback.
When working with objects, try to reduce dot operator use. So, do something like this...
var ds = divs[i].style
ds.color = ...
ds.padding = ...
...instead of repeating divs[i].style.
This makes your code more concise, too.
Instead of using string concatenation, use an array, then use array.join() to merge the array elements into a string.
Kind of like using Java's StringBuilder.
Most browsers are optimized for parsing the innerHTML property, so make use of it.
Use YSlow to time your site... use CSS Sprites... combine CSS, scripts, etc. for fewer requests.
(*Yawn*)
Try to put CSS within the HEAD element of your page if possible... this may help avoid the visual flash of unstyled content as your page loads.
Especially in smart browsers like Firefox that are better at incrementally rendering your page than IE.
Move JS to the bottom of the page, because the browser must load and execute it serially, and it blocks rendering of the rest of the page while it does so.
Use tools like JSMin and Dean Edwards' Packer to compress JS.
Even if you are using GZip compression (supported by all modern browsers), JSMin can help a bit in production environments by stripping out comments. Run JSLint on your code to check for errors before you minify.
Make use of obvious things like the expires header and ETags if they are honored by the web server and browser.
What's new in Groovy 1.6? - Guillaume Laforge
New in 1.6... the groovyc compiler is about 3x to 5x faster, thanks to a clever class lookup cache. Overall, there is a runtime performance increase of about 150% - 460%.
New language features include multiple assignments, like def (a, b) = [1, 2].
Very Perly. This is useful for utility functions like swap: (a, b) = [b, a]. The "return clauses" don't have to be explicitly declared in Groovy, they default to the last statement of the method.
Again, Perly. But, Groovy is now smarter about picking up the implicit return clause in conditional statements. Same applies to try / catch / finally blocks.
Groovy 1.6 now has complete support of Java 5 annotations.
Annotations aren't leveraged quite as much in Groovy as they are in Java code (mostly because Groovy is more runtime-oriented), but this is useful.
New AST (Abstract Syntax Tree) transformations.
AST is "tree representation of your source code"... I guess you can use this to tap into Groovy's language parser. "Local" AST transformations are triggered by annotations:
- @Singleton(lazy=true) -- generates boilerplate singleton code like getInstance().
- @Immutable -- fields can't change.
- @Lazy -- makes an instance only when you first access it.
- @Delegate -- forwards method calls on your class to a field of another type, so the class exposes that field's methods without boilerplate. Looks very useful.
This is very framework-like. I'm keeping the Spring 3.0 talk in mind as Guillaume goes over this. Seems like Spring Java Configuration will merge with these Groovy features before long.
Grape is an "advanced packaging engine" for Groovy.
Aw, I had plans for "GrAPE"... maybe a certain portal-like framework integrated with Groovy? This is a way of specifying module dependencies through Maven / Ivy. Kind of an alternative to the ugly Maven pom.xml.
GroovyConsole can be customized to display your classes visually, like displaying maps with Swing table components.
Why hasn't Swing died?! Will Java FX help kill boring, stodgy, old Swing components? What about SWT (which Eclipse uses)? At least SWT leverages more of the OS's native UI controls.
New Expando MetaClass features: the syntax is a bit easier for overriding operators, so you can do something like...
Number.metaClass {
    multiply { ... }
    divide { ... }
}
...kind of like .NET with its getter and setter syntax.
"Runtime Mix-ins" let you inject behavior into types at runtime. So, if you have a boring JamesBondCar class, you can mix-in the FlyingAbility class to give it the fly() method.
This reminds me of multiple inheritance from C++, except it happens at runtime.
If you wish to invoke Groovy from plain Java code, you can use new javax.script.ScriptEngineManager().getEngineByName("groovy").eval("groovy code here"). This is from JSR 223; supported in JDK 6.
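Here is a small sketch of that JSR 223 call from plain Java. It assumes a Groovy JAR that registers a "groovy" script engine is on the classpath; if it isn't, getEngineByName returns null, which the sketch handles. The class name, the helper name evalWith, and the sample script are made up for illustration.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class GroovyFromJava {

    // Evaluates a script with the named JSR 223 engine.
    // Returns null when no engine with that name is registered
    // (for example, when Groovy is not on the classpath).
    static Object evalWith(String engineName, String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null;
        }
        try {
            return engine.eval(script);
        } catch (ScriptException e) {
            // Wrap the checked exception so callers don't have to declare it.
            throw new IllegalStateException("Script failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // The script uses the multiple-assignment syntax from earlier in these notes.
        Object result = evalWith("groovy", "def (a, b) = [1, 2]; a + b");
        System.out.println(result == null ? "No Groovy engine found" : "Result: " + result);
    }
}
```

Checking for null matters in practice, because a missing engine otherwise shows up as a NullPointerException far from the real cause.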
Like using JS's eval() method.
JMX Builder is a handy way to expose Java / Groovy objects as MBeans. You can easily create event handlers and broadcasters to be exposed through JMX.
May be useful for our Tomcat admins for monitoring application-specific objects through JMX.
OSGi Readiness - Groovy JAR contains OSGi metadata.
I haven't heard as much hype about OSGi in this conference versus last year's. This might be because JDK 7 is going to support this type of modularity. In the last conference, they said that JDK 7 would support both OSGi and some new JDK 7 standard. But, not surprisingly, the "official Java standard" has been emphasized lately.
Scott Kessler asked about Groovy 1.7. You can expect more complete support of Java language features, like anonymous inner classes. This is largely handled by closures, but there are some cases where you need Groovy to mimic the genuine anonymous inner classes. Also, expect some concurrency helpers.
HtmlUnit - Daniel Gredler, Ahmed Ashour
Using a tool like HtmlUnit for integration testing is not about trying to prove correctness... it's about trying to catch errors before more exhaustive tests.
You can use a browser-driving solution.
Pros:
- Leverages browser configuration, including plugins.
- Easy to create tests with recorders.
Cons:
- Browsers will grab all images, resources, etc. that may not be necessary for testing.
- Because of platform dependencies, hard to test multiple browsers on the same machine, like IE 6 and IE 7.
- Limited extensibility.
- Performance and scalability issues.
- Tests tend to be limited and fragile.
HtmlUnit is a browser simulator written in Java. It's a JS parser / executor + a CSS parser + HTML parser. It's been an active project since 2002. It's not just useful for integration testing; other uses include advanced web scraping and monitoring. Browser simulation focuses on Firefox 2 & 3 and IE 6 & 7, with support for other browsers in the works like IE 8 and maybe Chrome and Safari. You can simulate JS and CSS being disabled in the browser and control popup blockers. SSL is supported, but only with test certificates, so it should suffice for internal sites. Ajax is supported, but you may need to work around some threading issues. No built-in support for Flash, but there is support for ActiveX controls through Jacob (the Java-COM bridge). Very basic Java applet support.
There are extension points to simulate user interaction with JS dialogs like alert() and confirm(). There are also JS preprocessors if you are scraping someone else's site and need to mess around with their JS. There are "incorrectness listeners" for HTML, CSS, etc.
Sample test code:
@Test
public void homePage_Firefox() throws Exception {
    // Simulate Firefox 2 and fetch the HtmlUnit home page.
    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2);
    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    // The parsed page should report the expected title.
    assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
}
Mozilla Rhino is used as the JS engine, NekoHTML for the HTML parser, and Apache HTTP Client 3.1 for HTTP requests and cookie support. By default, HtmlUnit will not render images.
There's also a Perl interface to HtmlUnit.
Web 2.0 Security Puzzlers - Ray Lai
Static security analysis tools can give you a lot of false positives.
Watch out for cross-frame attacks. For instance, the parent frame uses JavaScript to get content from the child. Yes, most new browsers won't allow frames from different domains to talk to each other... but especially for external sites, remember that users are running some very old browsers. To prevent this, just check for self != top in JS and use window.open(url, "_top") to break out.
XSS (Cross-Site Scripting): Say a user tricks you into clicking this link:
http://mysite.com/pay.jsp?id=1;<script>alert("hack!")</script>
Is this an issue? Well, it depends on how the server parses the URL and outputs the id in the HTML. The above <script> in the URL could actually do something really naughty like examine your cookies, DOM elements, etc. and POST its findings somewhere.
XSS is easy to defeat using a servlet filter looking for a certain URL pattern with suspicious chars.
Yeah, like we need another servlet filter running in our app! Actually, SiteMinder already looks for suspicious chars and kills your request with a "naughty characters" message if it finds them. IE8 has built-in XSS prevention enabled by default.
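For a feel of what such a check might look like, here is a minimal sketch in plain Java, without the servlet API so it stands on its own. The pattern, class name, and method names are assumptions for illustration, not what the speaker or SiteMinder actually uses. Character filtering like this is commonly paired with encoding values on output, so the sketch includes a tiny HTML escaper as well.

```java
import java.util.regex.Pattern;

class SuspiciousInputCheck {

    // Hypothetical blacklist: characters that commonly show up in script injection attempts.
    private static final Pattern SUSPICIOUS = Pattern.compile("[<>\"']");

    // The kind of test a request filter could run on each parameter value.
    static boolean isSuspicious(String value) {
        return value != null && SUSPICIOUS.matcher(value).find();
    }

    // Encodes the characters that are special in HTML so the value is
    // displayed as text instead of being interpreted as markup.
    static String escapeHtml(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String id = "1;<script>alert(\"hack!\")</script>";
        System.out.println(isSuspicious(id)); // the id from the example URL gets flagged
        System.out.println(escapeHtml(id));   // the same id, made safe to echo into HTML
    }
}
```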
CSRF - Cross-Site Request Forgery. This forces users to execute unwanted actions on a web site where they are already authenticated. You can use img tags or IFrames to point to a site where the user is already logged in. So, if someone has a browser window open already to their banking site, another site can point an img src to a URL on the banking site that triggers a money transfer or whatever.
Cantankerous! This is kind of hard to prevent.
SQL Injection Attacks. *Yawn.* Yeah, use parameterized queries.
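In case the yawn hides the point, here is a short sketch of the difference using standard JDBC. The accounts table, its columns, and the class and method names are hypothetical, and ownerExists needs a real Connection to a database that has such a table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class AccountQueries {

    // Unsafe: the user's input becomes part of the SQL text itself.
    static String unsafeSql(String owner) {
        return "SELECT id FROM accounts WHERE owner = '" + owner + "'";
    }

    // Safe: the SQL text is fixed and the value is bound separately.
    static final String SAFE_SQL = "SELECT id FROM accounts WHERE owner = ?";

    // Runs the parameterized query against a caller-supplied connection.
    static boolean ownerExists(Connection conn, String owner) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SAFE_SQL)) {
            ps.setString(1, owner); // sent as data, never parsed as SQL
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    public static void main(String[] args) {
        // Shows how crafted input rewrites the concatenated query's WHERE clause.
        System.out.println(unsafeSql("x' OR '1'='1"));
    }
}
```

With the concatenated version, the crafted input turns the condition into one that matches every row. With the parameterized version, the same input is just an odd-looking owner name.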
Hard-coded passwords in server configuration files. You can't always rely on admins to restrict the file permissions to hide the passwords. At least obfuscate your password.
I'm trying to get Oracle's client-side wallets / SSO working so that we don't have to deal with passwords at all in our server config files for sensitive apps, but Oracle's documentation is lacking.
System Info Leakage. Use an error page in web.xml, but certain requests may still generate error messages with revealing stack traces. Use log4j to write errors to a log, and don't output sensitive data to the logs.
Review the OWASP Top 10 Web Security Vulnerabilities.
The Ghost in the Virtual Machine: a Reference to References - Bob Lee (Google)
A lot of things require manual clean-up; you don't want them lingering around in memory, like...
- Listeners
- File descriptors
- Native memory
- External State, like using an IdentityHashMap to add fields to objects you don't control. Huh?!
Finalizers stink. If you must use them, be sure to also call super.finalize(). Finalizers are not guaranteed to run, and they aren't timely. You can even resurrect references (unintentionally) in finalizers... so your objects don't get cleaned up. Those who have studied for Java 5 certification may recall that bit of ugliness. Finalizers usually run on a single dedicated finalizer thread, not on your application's threads, so be sure to synchronize your finalizers where needed. Since there's just one thread calling finalizers, slow finalizers will defer clean-up of other objects.
Since Java 1.2, we've had the Reference API for more flexible reference handling. Reference types:
- Soft - useful for caching; these are cleared when the JVM runs low on memory.
- Weak - for fast clean-up (pre-finalizer). These are cleared once the referent is no longer strongly or softly reachable, i.e. when only weak references to it remain.
- Phantom - for safe clean-up (post-finalizer). These must be cleared manually, because of some stupid patent. You must use a "reference queue" to get at objects through phantom references.
Keep in mind that these references themselves are objects, and you must have strong references to the reference objects.
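The three reference types above can be poked at with a small sketch like the one below. It only shows behavior that doesn't depend on when the garbage collector runs: it never waits for a collection, so it says nothing about the timing of clearing. The class and method names are made up for illustration.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.Arrays;

class ReferenceDemo {

    // Returns { queueEmpty, softSeesReferent, weakSeesReferent,
    //           phantomHidesReferent, weakClearedByHand }.
    static boolean[] observe() {
        Object referent = new Object(); // the strong reference that keeps the object alive
        ReferenceQueue<Object> queue = new ReferenceQueue<>();

        SoftReference<Object> soft = new SoftReference<>(referent);
        WeakReference<Object> weak = new WeakReference<>(referent, queue);
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);

        // Nothing is enqueued while the referent is still strongly reachable.
        boolean queueEmpty = queue.poll() == null;
        // Soft and weak references hand the referent back while it is alive.
        boolean softSeesReferent = soft.get() == referent;
        boolean weakSeesReferent = weak.get() == referent;
        // A phantom reference never hands out its referent; get() always returns null.
        boolean phantomHidesReferent = phantom.get() == null;
        // Any reference can also be cleared by hand.
        weak.clear();
        boolean weakClearedByHand = weak.get() == null;

        return new boolean[] {
            queueEmpty, softSeesReferent, weakSeesReferent, phantomHidesReferent, weakClearedByHand
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe()));
    }
}
```

Since phantom.get() never returns the referent, the reference queue is the only way to learn that a phantom-referenced object has gone away.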
Google developed some nifty collections to assist with references, like FinalizableReferenceQueue and FinalizablePhantomReference, where you have your own thread to do the clean-up, instead of relying on the finalizer thread.
WeakHashMap keeps weak references to keys and strong references to values, but like the standard HashMap, this uses equals() instead of ==, so it's a bit inefficient in most cases. Google Collections has a MapMaker, which is a smarter WeakHashMap that uses ==.
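The equals() versus == difference is easy to see with two keys that are equal but not the same object. This sketch uses the JDK's IdentityHashMap purely as a stand-in for identity comparison; it holds strong references and is not Google's MapMaker. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

class KeyComparisonDemo {

    // Returns { size of the WeakHashMap, size of the IdentityHashMap }
    // after inserting two equal-but-distinct keys into each.
    static int[] sizes() {
        String k1 = new String("key");
        String k2 = new String("key"); // equals(k1) is true, but k1 != k2

        Map<String, String> byEquals = new WeakHashMap<>();
        byEquals.put(k1, "first");
        byEquals.put(k2, "second"); // same key by equals(), so this overwrites the value

        Map<String, String> byIdentity = new IdentityHashMap<>();
        byIdentity.put(k1, "first");
        byIdentity.put(k2, "second"); // different objects, so this is a second entry

        int equalsSize = byEquals.size();
        int identitySize = byIdentity.size();
        return new int[] { equalsSize, identitySize };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes()));
    }
}
```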
Google Collections is currently at 1.0RC2.
There's an old, interesting IBM DeveloperWorks article on using weak references to reduce memory usage.
BOF: Performance Comparisons of Dynamic Languages on the JVM - Michael Galpin (eBay)
Comparing Groovy, JRuby, Jython, Clojure, Scala, and Fan. These "benchmarks" were all about solving silly math puzzles. Not very realistic. He didn't even bother to monitor memory usage.
Overall, Groovy and JRuby performed about the same. Scala and Fan were the fastest, but about twice as slow as Java. Jython was very slow. Someone in the audience said that there has been a lot of activity on Jython, but the team has been busy catching up with new language features and will address performance next. Scala and Fan seem to be fast for string parsing / manipulation.
There are things you can do in Scala that you can't easily do in Groovy.
I guess Scala is good for heavily multithreaded apps with complex concurrency problems. Scala looks kind of like Ruby. Clojure focusses on functional programming. It looks like Lisp. Gross!