Saturday, May 13, 2017

Wrangle Young Man Wrangle (1)

I'm currently project managing a transformation programme around operational risk, where we're rolling out risk reporting packs globally to meet regulatory demands.

As ever, the vanilla MI / BI tool never quite does what the business requires, so it's time to draw on those wrangling skills to get the job done.

The basic problem: convert the PDF report to a PowerPoint presentation. For guidance on PowerPoint, I turned to Microsoft wrangler-extraordinaire, Mark Townsend, currently crushing out solutions for Deutsche Asset Management!

Just a few catches: content holds a 'RESTRICTED' classification (cannot be shared externally), the converted content MUST be non-editable on the slide and only software on the client's approved list can be used to get the job done.

Can we run it on our own computers too?

OK!

After hassling a few SMEs internally, it turns out we have Nuance (Power PDF) and Adobe Acrobat Standard to play with.

The problem statement:

Mark quickly turned around this outline solution with interop libraries, a great foundation for the final deliverable.
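using System;
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.PowerPoint;

namespace ConsoleApplication4
{
  class Program
  {
    static void Main(string[] args)
    {
      var ap = new Application();
      var pp = ap.Presentations.Open(@"\\PathTo\MyDummyPresentation.pptx");
      var sl = pp.Slides[1]; // the Slides collection is 1-based
      // Embed the picture (LinkToFile = msoFalse, SaveWithDocument = msoTrue) at left 100, top 100
      var sp = sl.Shapes.AddPicture(@"\\PathTo\MyPicture.jpg", Microsoft.Office.Core.MsoTriState.msoFalse, Microsoft.Office.Core.MsoTriState.msoTrue, 100, 100);
      pp.SaveAs(@"\\PathTo\MyNewPresentation.pptx");
      pp.Close();
      // Minimal cleanup only: quit PowerPoint and release the wrappers held above, newest first.
      // COM cleanup is tricky so if you use this let me know and I'll spend a bit of time getting the cleanup correct
      ap.Quit();
      Marshal.ReleaseComObject(sp);
      Marshal.ReleaseComObject(sl);
      Marshal.ReleaseComObject(pp);
      Marshal.ReleaseComObject(ap);
    }
  }
}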
After a little more wrangling with the two PDF tools we ended up with the two solutions below.

1. Adobe Acrobat Standard (v11.0).
function Convert-PDF
{

<#
  .SYNOPSIS
  Convert PDF file(s) into PowerPoint presentation(s) containing non-editable graphics on each slide.
  .DESCRIPTION
  Function converts PDF file(s) into PowerPoint presentation(s).

  Function takes an array of files, testing each file is a PDF type, then converts each PDF to PowerPoint. This is
  managed by converting each page of the PDF to a graphic file; content is non-editable. For each graphic, a new slide
  is created in the PowerPoint presentation and the graphic is added.
 
  The new PowerPoint presentation(s) is saved to the same location as the PDF.
 
  .NOTES
  File Name        : Convert-PDF.ps1
  Author           : Justin Townsend
  Create Date      : 17/05/2017
  Purpose / Change : Initial version
  Prerequisite     : Acrobat Standard (v11.0)
  .LINK
  https://link-to-help-file.com
  .EXAMPLE
  Convert-PDF -pdfs "C:\test.pdf"
  .EXAMPLE
  Convert-PDF -pdfs "C:\test_1.pdf", "C:\test_2.pdf" -InfoClass "RESTRICTED"
  .PARAMETER pdfs
  PDF file(s) for processing (accepts array).
  .PARAMETER InfoClass
  Information Classification, used for marking sensitive content.
#>
[cmdletbinding()]
param ([Parameter(Mandatory=$true,
                  Position=0,
                  HelpMessage='Specify the location(s) of PDF files.',
                  ValueFromPipeline=$true)]
                  [ValidateScript({ foreach ($pdf in $_) { if ($pdf -notlike '*.pdf') { throw "$($pdf) is an invalid PDF file!" } } return $true })]
                  [string[]] $pdfs
                   ,
        [Parameter(Position=1,
                   HelpMessage='Information Classification.',
                   ValueFromPipeLine=$true)]
                   [ValidateSet("HIGHLY RESTRICTED","RESTRICTED","INTERNAL","PUBLIC","Not Applicable")]
                   [string] $InfoClass = "RESTRICTED"
                   )
:pdfloop foreach ($pdf in $pdfs)

{

   $pdf = get-childitem $pdf
   $out_dir = $pdf.DirectoryName
   $out_dir = $out_dir + "\" + $pdf.Basename
   $out_dir += "_PROC"
   $out_file = $out_dir + "\" + $pdf.Basename

   new-item $out_dir -type directory -force
 
   # Adobe Acrobat Standard (convert to graphic files)
   $adobeApp = New-Object -ComObject AcroExch.AVDoc;
   $adobeApp.Open($pdf.Fullname, "") | Out-Null;
   $pdfDoc = $adobeApp.GetPDDoc();
   $pdfJSObject = $pdfDoc.GetJSObject();
 
   $TypeExt="jpeg";
   $closeDocParam = $true;
   $T = $pdfJSObject.GetType();
 
   $T.InvokeMember("SaveAs",
     [Reflection.BindingFlags]::InvokeMethod -bor `
     [Reflection.BindingFlags]::Public       -bor `
     [Reflection.BindingFlags]::Instance,
     $null,
     $pdfJSObject,
     @([IO.Path]::ChangeExtension($out_file, $TypeExt), ("com.adobe.acrobat."+$TypeExt)));
 
   $T.InvokeMember("closeDoc",
     [Reflection.BindingFlags]::InvokeMethod -bor `
     [Reflection.BindingFlags]::Public       -bor `
     [Reflection.BindingFlags]::Instance,
     $null,
     $pdfJSObject,
     $closeDocParam) | Out-Null;
     $pdfDoc.Close()  | Out-Null;
     $adobeApp.Close(1) | Out-Null;
 
   # Microsoft PowerPoint creation
   Add-type -AssemblyName Office;
   Add-Type -AssemblyName Microsoft.Office.Interop.PowerPoint;
 
   $msoappPPT = New-Object -ComObject powerpoint.application;
   $msoappPPT.visible = [Microsoft.Office.Core.MsoTriState]::msoTrue;
   $slideType = "microsoft.office.interop.powerpoint.ppSlideLayout" -as [type];
   $slideSize = "microsoft.office.interop.powerpoint.ppSlideSizeType" -as [type];
   $msoSendToBack = 1;
 
   $out_ppt = $pdf.DirectoryName + "\" + $pdf.Basename
   $pptPres = $msoappPPT.Presentations
   $pptPres = $pptPres.add()
  
   $pptPres.PageSetup.slideSize = $slideSize::ppSlideSizeA4Paper;
  
   get-childitem -path $out_dir | sort-object -Property CreationTime | ForEach-Object { `
     $pic = $_.fullname
     $add_slide = $pptPres.Slides.Add($pptPres.Slides.Count + 1, 15);
     $add_slide.layout = $slideType::ppLayoutBlank;
     $add_slide.HeadersFooters.Footer.Visible = [Microsoft.Office.Core.MsoTriState]::msoTrue;
     $add_slide.HeadersFooters.Footer.text = $InfoClass;
     $add_slide.Shapes.Range("Footer Placeholder 2").Left = -100;
     $shape = $add_slide.Shapes.AddPicture($pic, $false, $true, 0, 0, -1, -1);
     $shape.ZOrder($msoSendToBack);
   }
   
   $pptPres.SaveAs($out_ppt)
   $pptPres.Close()
   $msoappPPT.quit()
   $msoappPPT = $null;
 
   # Remove this PDF's temporary image folder before moving on to the next file
   Remove-Item $out_dir -Recurse
}
 
}
2. Nuance Power PDF Advanced (v1.2).
function Convert-PDF
{

<#
  .SYNOPSIS
  Convert PDF file(s) into PowerPoint presentation(s) containing non-editable graphics on each slide.
  .DESCRIPTION
  Function converts PDF file(s) into PowerPoint presentation(s).

  Function takes an array of files, testing each file is a PDF type, then converts each PDF to PowerPoint. This is
  managed by converting each page of the PDF to a graphic file; content is non-editable. For each graphic, a new slide
  is created in the PowerPoint presentation and the graphic is added.
 
  The new PowerPoint presentation(s) is saved to the same location as the PDF.
 
  .NOTES
  File Name        : Convert-PDF.ps1
  Author           : Justin Townsend
  Create Date      : 17/05/2017
  Purpose / Change : Initial version
  Prerequisite     : Nuance Power PDF Advanced (v1.2)
  .LINK
  https://link-to-help-file.com
  .EXAMPLE
  Convert-PDF -pdfs "C:\test.pdf"
  .EXAMPLE
  Convert-PDF -pdfs "C:\test_1.pdf", "C:\test_2.pdf" -InfoClass "RESTRICTED"
  .PARAMETER pdfs
  PDF file(s) for processing (accepts array).
  .PARAMETER InfoClass
  Information Classification, used for marking sensitive content.
#>
[cmdletbinding()]
param ([Parameter(Mandatory=$true,
                  Position=0,
                  HelpMessage='Specify the location(s) of PDF files.',
                  ValueFromPipeline=$true)]
                  [ValidateScript({ foreach ($pdf in $_) { if ($pdf -notlike '*.pdf') { throw "$($pdf) is an invalid PDF file!" } } return $true })]
                  [string[]] $pdfs
                   ,
        [Parameter(Position=1,
                   HelpMessage='Information Classification.',
                   ValueFromPipeLine=$true)]
                   [ValidateSet("HIGHLY RESTRICTED","RESTRICTED","INTERNAL","PUBLIC","Not Applicable")]
                   [string] $InfoClass = "RESTRICTED"
                   )
:pdfloop foreach ($pdf in $pdfs)
{
   # Nuance batch conversion
   $pdf = get-childitem $pdf
   $outExt= "jpg"
   $out_dir = $pdf.DirectoryName
   $out_dir = $out_dir + "\" + $pdf.Basename
   $out_dir += "_PROC"
   $out_file = $out_dir + "\" + $pdf.Basename + "." + $outExt
 
   new-item $out_dir -type directory -force
 
   & "C:\Program Files\Nuance\Power PDF\batchconverter" -I"$pdf" -O"$out_file" -TTIF -CcJpegMax -Q
 
   # Microsoft PowerPoint creation
   Add-type -AssemblyName Office;
   Add-Type -AssemblyName Microsoft.Office.Interop.PowerPoint;
 
   $msoappPPT = New-Object -ComObject powerpoint.application;
   $msoappPPT.visible = [Microsoft.Office.Core.MsoTriState]::msoTrue;
   $slideType = "microsoft.office.interop.powerpoint.ppSlideLayout" -as [type];
   $slideSize = "microsoft.office.interop.powerpoint.ppSlideSizeType" -as [type];
   $msoSendToBack = 1;
 
   $out_ppt = $pdf.DirectoryName + "\" + $pdf.Basename
   $pptPres = $msoappPPT.Presentations
   $pptPres = $pptPres.add()
  
   $pptPres.PageSetup.slideSize = $slideSize::ppSlideSizeA4Paper;
  
   get-childitem -path $out_dir | sort-object -Property CreationTime | ForEach-Object { `
     $pic = $_.fullname
     $add_slide = $pptPres.Slides.Add($pptPres.Slides.Count + 1, 15);
     $add_slide.layout = $slideType::ppLayoutBlank;
     $add_slide.HeadersFooters.Footer.Visible = [Microsoft.Office.Core.MsoTriState]::msoTrue;
     $add_slide.HeadersFooters.Footer.text = $InfoClass;
     $add_slide.Shapes.Range("Footer Placeholder 2").Left = -100;
     $shape = $add_slide.Shapes.AddPicture($pic, $false, $true, 0, 0, -1, -1);
     $shape.ZOrder($msoSendToBack);
   }
   
   $pptPres.SaveAs($out_ppt)
   $pptPres.Close()
   $msoappPPT.quit()
   $msoappPPT = $null;
 
   # Remove this PDF's temporary image folder before moving on to the next file
   Remove-Item $out_dir -Recurse
}
 
}
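Both functions finish by calling quit() and setting $msoappPPT to $null, which can leave a PowerPoint process running until PowerShell gets round to releasing the COM wrappers. Here is a rough sketch of a more explicit release, assuming a hypothetical helper called Remove-ComReference (not part of either script above). Treat it as a starting point rather than a complete answer, since intermediate objects such as the Slides and Shapes collections hold references too.

```powershell
# Hypothetical helper (an assumption, not part of the scripts above):
# release COM wrappers explicitly, then ask the garbage collector to tidy up.
function Remove-ComReference {
    param([object[]] $ComObject)

    foreach ($obj in $ComObject) {
        # Skip nulls and ordinary .NET objects; ReleaseComObject only accepts COM wrappers
        if ($null -ne $obj -and [System.Runtime.InteropServices.Marshal]::IsComObject($obj)) {
            [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($obj)
        }
    }

    # Collect any wrappers that are no longer referenced
    [GC]::Collect()
    [GC]::WaitForPendingFinalizers()
}

# Possible usage at the end of the PowerPoint section, after $msoappPPT.quit():
#   Remove-ComReference -ComObject $pptPres, $msoappPPT
#   $pptPres = $null
#   $msoappPPT = $null
```

The helper ignores anything that is not a COM object, so it is safe to pass it variables that may never have been assigned.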
If the sequence of the pages matters to you, try not to rely on the converter's default file naming for the output images. To meet the requirement, we've made sure the slides come out in the right order by sorting the image files by creation time before adding them:
get-childitem -path $out_dir | sort-object -Property CreationTime
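Creation time worked for us, but it can be fragile if the images are copied or written out of order. If your converter puts the page number in the file name, an alternative is to sort on that number numerically. The sketch below assumes names ending in a number, such as report_Page_10.jpg (an assumed pattern, so check what your tool actually writes), and uses a hypothetical helper called Get-PageImage.

```powershell
# Hypothetical helper: list image files ordered by the last number in each file name.
# Assumes names like "report_Page_2.jpg"; files with no number sort to the end.
function Get-PageImage {
    param([Parameter(Mandatory = $true)][string] $Path)

    Get-ChildItem -Path $Path -File |
        Sort-Object -Property @{ Expression = {
            # Capture the final run of digits in the base name and compare it as an integer
            if ($_.BaseName -match '(\d+)\D*$') { [int]$Matches[1] } else { [int]::MaxValue }
        } }, Name
}

# Possible usage in place of the CreationTime sort:
#   Get-PageImage -Path $out_dir | ForEach-Object { $pic = $_.FullName }
```

Sorting on an integer rather than on the name itself avoids the usual trap where "Page_10" lands before "Page_2".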
Hope you find this useful. You can always get in touch.