Tuesday, June 24, 2014

File Archive using Hadoop Archive


 Archiving small files
The Hadoop Archive's data format is called har, with the following layout:
foo.har/_masterindex // stores hashes and offsets
foo.har/_index // stores file statuses
foo.har/part-[1..n] // stores actual file data
The file data is stored in multiple part files, which are indexed to keep the original separation of the data intact. Moreover, the part files can be accessed by MapReduce programs in parallel. The index files also record the original directory tree structures and the file statuses. In Figure 1, a directory containing many small files is archived into a directory with large files and indexes.
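As a sketch of how such an archive is created (the paths here are hypothetical), the hadoop archive tool builds the layout above, and the result can be inspected like any HDFS directory:

```shell
# Archive the directory foo under /user/alice into foo.har in /user/alice.
# -p gives the parent path; the names after it are relative to that parent.
hadoop archive -archiveName foo.har -p /user/alice foo /user/alice

# The result is an ordinary HDFS directory with the layout shown above:
# _masterindex, _index, and one or more part files.
hadoop fs -ls /user/alice/foo.har
```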
HarFileSystem – A first-class FileSystem providing transparent access
Most archival systems, such as tar, are tools for archiving and de-archiving. Generally, they do not fit into the actual file system layer and hence are not transparent to the application writer, in that the user has to de-archive the archive before use.
Hadoop Archive is integrated into Hadoop's FileSystem interface. The HarFileSystem class implements the FileSystem interface and provides access via the har:// scheme. This exposes the archived files and directory tree structures transparently to users. Files in a har can be accessed directly without expanding it. For example, we have the following command to copy an HDFS file to a local directory:
hadoop fs -get hdfs://namenode/foo/file-1 localdir
Suppose an archive bar.har is created from the foo directory. Then, the command to copy the original file becomes
hadoop fs -get har://namenode/bar.har#foo/file-1 localdir
Users only have to change the URI paths. Alternatively, users may choose to create a symbolic link (from hdfs://namenode/foo to har://namenode/bar.har#foo in the example above); then even the URIs do not need to be changed. In either case, HarFileSystem will be invoked automatically to provide access to the files in the har. Because of this transparent layer, har is compatible with the Hadoop APIs, MapReduce, the shell command-line interface, and higher-level applications like Pig, Zebra, Streaming, Pipes, and DistCp.
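The URI rewrite is purely mechanical. A minimal sketch (the helper name is ours, not part of Hadoop) that maps an hdfs:// file URI into the corresponding har:// form used above:

```python
def to_har_uri(hdfs_uri: str, archive: str, archived_dir: str) -> str:
    """Rewrite an hdfs:// file URI into the har:// form used by HarFileSystem.

    archive: path of the .har directory on the same namenode, e.g. "/bar.har"
    archived_dir: the directory that was archived, e.g. "/foo"
    """
    prefix = "hdfs://"
    assert hdfs_uri.startswith(prefix)
    host, _, path = hdfs_uri[len(prefix):].partition("/")
    path = "/" + path
    assert path.startswith(archived_dir)
    # har://namenode/bar.har#foo/file-1  (the fragment is relative, no leading /)
    return f"har://{host}{archive}#{path.lstrip('/')}"

print(to_har_uri("hdfs://namenode/foo/file-1", "/bar.har", "/foo"))
# -> har://namenode/bar.har#foo/file-1
```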

Monday, June 9, 2014

SAP BO Open Document in 4.1


The open document functionality between XI 3.x and BI 4.x is exactly the same except for a change to the default URL.

The default URL to the OpenDocument web application bundle has changed in SAP BusinessObjects
Business Intelligence platform 4.0. New absolute OpenDocument links need to use the new default
URL:
http://<servername>:<port>/BOE/OpenDocument/opendoc/openDocument.jsp?<parameter1>&<parameter2>&...&<parameterN>
If you are migrating reports with existing links from an XI 3.x release platform, resolve the issue by
setting up the following redirect in your web server:
• Redirect: ../OpenDocument/opendoc/openDocument.jsp
• To: ../BOE/OpenDocument/opendoc/openDocument.jsp
Note:
• Ensure that all URL request parameters are forwarded correctly by your redirect. Refer to your web
server documentation for detailed steps on implementing a redirect.
• SAP BusinessObjects Business Intelligence platform 4.0 only supports a Java deployment of
OpenDocument. The OpenDocument web bundle is part of the BOE.war file.
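As an illustration (assuming an Apache HTTP Server front end; other web servers have equivalent directives), the redirect above might look like:

```apache
# Permanent redirect from the XI 3.x OpenDocument URL to the BI 4.x one.
# With mod_alias, the client's query string is carried over on the redirect,
# but verify against your own server that all OpenDocument parameters
# (sDocName, iDocID, etc.) arrive intact.
Redirect permanent /OpenDocument/opendoc/openDocument.jsp /BOE/OpenDocument/opendoc/openDocument.jsp
```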

Friday, June 6, 2014

SSIS 2012 New features


Source - MSDN BLog

#1 – Change Data Capture

We’ve partnered with Attunity to provide some great CDC functionality out of the box. This includes a CDC Control Task, a CDC Source component, and a CDC Splitter transform (that splits the output based on the CDC operation – insert/update/delete). It also includes CDC support for Oracle. More details to follow.

#2 – ODBC Support

ODBC Source and Destination components, also from Attunity, are included in the box.

#3 – Connection Manager Changes

RC0 makes some minor improvements to Shared Connection Managers (they are now expressionable), and changes the icons used to designate connection managers that are shared, offline, or have expressions on them. We also added a neat feature for the Cache Connection Manager – it can now share its in-memory cache across package executions (i.e. create a shared connection manager, load the cache with a master package, and the remaining child packages will all share the same in-memory cache).

#4 – Flat File Source Improvements

Another feature that was added in CTP3, but worth calling out again. The Flat File Source now supports a varying number of columns, and embedded qualifiers.

#5 – Package Format Changes

Ok, another CTP3 feature – but when I demo’d it at PASS, I did a live merge of two data flows up on stage. And it worked. Impressive, no?

#6 – Visual Studio Configurations

You can now externalize parameter values, storing them in a Visual Studio configuration. You can switch between VS configurations from the toolbar (like you can with other project types, such as C# or VB.NET), and your parameter values will automatically change to the value within the configuration.

#7 - Scripting Improvements

We upgraded the scripting engine to VSTA 3.0, which gives us a Visual Studio 2010 shell, and support for .NET 4.
Oh… and we also added Script Component Debugging. More about that to follow.

#8 – Troubleshooting & Logging

More improvements to SSIS Catalog based logging. You can now set a server-wide default logging level, and capture data flow component timing information and row counts for all paths within a data flow.
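The server-wide default logging level is an SSISDB catalog property, so it can be set with T-SQL against the server hosting the catalog. A sketch (2 maps to the "Performance" level; 0 = None, 1 = Basic, 3 = Verbose):

```sql
-- Set the server-wide default logging level for the SSIS Catalog.
EXEC SSISDB.catalog.configure_catalog
    @property_name  = N'SERVER_LOGGING_LEVEL',
    @property_value = 2;

-- Confirm the change.
SELECT property_name, property_value
FROM SSISDB.catalog.catalog_properties
WHERE property_name = N'SERVER_LOGGING_LEVEL';
```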

#9 – Data Taps

Another CTP3 feature that didn't get enough attention. This feature allows you to programmatically (using T-SQL) add a "tap" to any data flow path in a package deployed to the SSIS Catalog. When the package is run, data flowing through the path is saved to disk in CSV format. The feature was designed to make it easier to debug data issues occurring in a production environment that the developer doesn't have access to.
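A sketch of adding a tap with the SSISDB catalog stored procedures (the folder, project, package, task, and path names here are hypothetical; the CSV lands in the catalog's DataDumps directory on the server):

```sql
DECLARE @execution_id bigint;

-- Create (but don't yet start) an execution for a deployed package.
EXEC SSISDB.catalog.create_execution
    @folder_name  = N'ETL',
    @project_name = N'Loads',
    @package_name = N'LoadCustomers.dtsx',
    @execution_id = @execution_id OUTPUT;

-- Attach a tap to one data flow path; rows crossing it are written out
-- as CSV when the package runs.
EXEC SSISDB.catalog.add_data_tap
    @execution_id             = @execution_id,
    @task_package_path        = N'\Package\Data Flow Task',
    @dataflow_path_id_string  = N'Paths[OLE DB Source.OLE DB Source Output]',
    @data_filename            = N'customers_tap.csv';

EXEC SSISDB.catalog.start_execution @execution_id;
```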

#10 – Server Management with PowerShell

We’ve added PowerShell support for the SSIS Catalog in RC0. See the follow-up post for API examples.
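A minimal sketch of connecting to the catalog from PowerShell via the managed API (the server name is a placeholder; this just lists deployed projects):

```powershell
# Load the SSIS management assembly and connect to the catalog's SQL Server.
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.IntegrationServices") | Out-Null

$conn = New-Object System.Data.SqlClient.SqlConnection "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"
$ssis = New-Object Microsoft.SqlServer.Management.IntegrationServices.IntegrationServices $conn

# Walk every folder in the SSISDB catalog and list its projects.
foreach ($folder in $ssis.Catalogs["SSISDB"].Folders) {
    $folder.Projects | Select-Object Name
}
```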

Other Changes

  • Updated look for the Control Flow and Data Flow
  • Pivot UI
  • Row Count UI
  • New Expression:
    • REPLACENULL
  • BIDS is now SQL Server Data Tools
  • Many small fixes and improvements based on CTP feedback – thank you!!