My challenge was to import over 400 historical PDF documents and display them on the front end of a website in a table view. The table needed to be sortable and searchable with multiple columns for fields containing a title, file name, and date. In order for Drupal to recognize the files and allow me to create a view, the files had to be imported as media entities. To accomplish this I used FTP to upload the documents, a spreadsheet and migration modules to create the media entities, and the views module for creating the table.
Rename and Upload Documents
- Rename documents to be uploaded by replacing spaces and underscores with a dash. If using Windows, here’s a way to do this in bulk.
- Connect to your site via FTP.
- Create a new directory under your public file directory. The Drupal default is /sites/default/files, but yours may be different. Name it whatever makes sense to you. I am using ‘historical’ in my example.
- Upload the documents to the new directory.
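If you would rather script the renaming step than do it by hand, the sketch below shows one way to do it in Python before uploading. The function names (dashed_name, rename_documents) are my own for illustration, and collapsing a run of several spaces or underscores into a single dash is an assumption rather than a requirement.

```python
import re
from pathlib import Path


def dashed_name(file_name):
    """Replace each run of spaces/underscores in the name (not the extension) with one dash."""
    path = Path(file_name)
    stem = re.sub(r"[ _]+", "-", path.stem.strip())
    return stem + path.suffix


def rename_documents(directory):
    """Rename every .pdf in `directory`; return a list of (old_name, new_name) pairs."""
    renamed = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        new_name = dashed_name(pdf.name)
        if new_name == pdf.name:
            continue  # already in the desired form
        target = pdf.with_name(new_name)
        if target.exists():
            # Refuse to overwrite a different file that already has the new name.
            raise FileExistsError(f"{target} already exists")
        pdf.rename(target)
        renamed.append((pdf.name, new_name))
    return renamed
```

Calling rename_documents on a local copy of the folder and reviewing the returned pairs before uploading is a cheap way to catch name collisions early.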
Create Spreadsheet
- Create a new spreadsheet using Excel or Google Sheets. The column headers serve as the field names for the media entity we will create.
- Add column headers, replacing spaces with underscores.
  - Suggested: ID, title, created, file_location, file_name, uid
  - Optional: status and language. Alternatively, you can set the defaults in your media type for status to published (1) and for language to English (en).
- Change the date format to yyyy-mm-dd.
- Fill in the cells with data (uid, status, and language can be left blank).
  - For file_location, enter the full URI of each file, for example public://historical/filename.pdf. Drupal resolves public:// to your public file directory.
  - To build file_location from the data in columns A and C, you can use this formula (note the straight quotes): ="public://historical/"&C2&"/"&A2&".pdf"
  - If you are adding data containing multiple taxonomy terms, separate the terms with a comma and a space.
- Save a copy of the spreadsheet in CSV format.
- Upload the CSV file via FTP to the new directory you created or any other readable directory.
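The spreadsheet rules above (underscore column names, yyyy-mm-dd dates, a public:// URI for each file) can also be applied in a script. The following is a minimal sketch using the suggested columns. The helper names, the mm/dd/yyyy input date format, and the sample values are assumptions for illustration only.

```python
import csv
import io
from datetime import datetime

# Suggested column headers from the list above.
COLUMNS = ["ID", "title", "created", "file_location", "file_name", "uid"]


def build_row(row_id, title, created, file_name, directory="historical"):
    """Build one CSV row; `created` is assumed to arrive as mm/dd/yyyy."""
    created_iso = datetime.strptime(created, "%m/%d/%Y").strftime("%Y-%m-%d")
    return {
        "ID": row_id,
        "title": title,
        "created": created_iso,
        "file_location": f"public://{directory}/{file_name}",
        "file_name": file_name,
        "uid": "",  # left blank, as allowed above
    }


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [build_row(1, "Annual Report", "03/15/1975", "Annual-Report-1975.pdf")]
    print(to_csv(rows))
```

Using the csv module instead of joining strings by hand means titles that contain commas or quotes are escaped correctly.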
Create Media Entities
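Install and enable the migration modules. The example configuration in this post relies on plugins and commands from the following (machine names):
- migrate (Drupal core)
- migrate_plus (migration groups)
- migrate_tools (the drush migrate commands)
- migrate_source_csv (the csv source plugin)
- migrate_file (the file_remote_url process plugin)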
Edit Document Media Type
Next, edit the default document media type (optionally, you can add a new doc media type).
- Go to /admin/structure/media
- For Document, select Manage fields under Operations
- Select Add field for each new field you want to collect info from the CSV file. The names don’t have to match up with the CSV file.
Create Configuration File
Next, we will create a new migration configuration file for importing the data from the CSV file to the document media entity. Create a new file and name it single-configuration.yml.
There are 4 main sections to the file:
- Top-level keys – Identify the migration
  - uuid: universally unique identifier; any unique value works
  - id: migration ID; eg: document_import
  - label: human-readable name of the migration; eg: Import documents
  - migration_group: eg: default
- Source – Identifies the CSV path, column names, and more
  - plugin: The source plugin ID we want to use
  - path: Path to your CSV file
  - fields: One entry per CSV column
    - The first field (0) will be stored in the ID field during migration, and that name can be used to map the value.
    - It’s important to list fields in the same order they appear in your CSV file.
    - name – Identifies the column header in the CSV file
    - label – Human-friendly name
- Process – Specifies the media type and maps the data in the CSV to the document media type fields
  - The first part of each mapping is the media type field (eg: field_year).
  - The second part is the source (CSV column name; eg: year).
  - Some processes rely on plugins included with the modules we enabled. These include the multi-line processes such as type, field_media_document, and uid.
- Destination – Instructs each row of data from the CSV file to create a document media entity
Example YML File
uuid: 1823b026-d232-11ec-9d64-0242ac120002
id: document_import
label: Import documents
migration_group: default
source:
  plugin: 'csv'
  path: 'sites/default/files/import.csv'
  delimiter: ','
  enclosure: '"'
  header_offset: 0
  ids: [ID]
  fields:
    0:
      name: ID
      label: 'ID'
    1:
      name: title
      label: 'Title'
    2:
      name: year
      label: 'Year'
    3:
      name: created
      label: 'Created'
    4:
      name: file_location
      label: 'File Location'
    5:
      name: file_name
      label: 'File Name'
    6:
      name: uid
      label: 'User Id'
process:
  type:
    plugin: default_value
    default_value: document
  field_doc_title: title
  field_year: year
  field_published_date: created
  field_media_document:
    plugin: file_remote_url
    source: file_location
    name: file_name
  uid:
    plugin: default_value
    default_value: 1
destination:
  plugin: entity:media
  default_bundle: document
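Because the field order in the configuration has to match the CSV, it can be worth checking the file before running the import. The sketch below is one possible pre-flight check, not part of Drupal or the migrate modules: check_csv is a hypothetical helper, and EXPECTED_FIELDS simply repeats the field names from the example configuration.

```python
import csv
import io

# Field names in the order they are listed under `fields` in the example configuration.
EXPECTED_FIELDS = ["ID", "title", "year", "created", "file_location", "file_name", "uid"]


def check_csv(csv_text, expected=None, id_column="ID"):
    """Return a list of problems; an empty list means every check passed."""
    expected = expected or EXPECTED_FIELDS
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    if header != expected:
        return [f"header {header} does not match expected order {expected}"]

    problems = []
    id_index = header.index(id_column)
    seen_ids = set()
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_number}: expected {len(header)} columns, found {len(row)}")
            continue
        row_id = row[id_index]
        if not row_id:
            problems.append(f"line {line_number}: empty {id_column}")
        elif row_id in seen_ids:
            problems.append(f"line {line_number}: duplicate {id_column} {row_id}")
        seen_ids.add(row_id)
    return problems
```

To use it, read the CSV file into a string and print whatever check_csv returns. Duplicate or empty IDs matter because the ID column is what the migration uses to tell rows apart.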
Now we are finally ready to import the configuration to create the media entities.
- Go to /admin/config/development/configuration/single/import (Configuration > Development > Configuration Synchronization > import tab > single item link)
- Under Configuration type select Migration
- Copy/paste your yml file
- Click import, then confirm
- Open the command line terminal
- View the status of all migrations – drush migrate-status
  - This should show your migration and how many documents, if any, have been imported.
- Run the import – drush migrate-import document_import
  - document_import is the ID from line 2 of the migration file.
  - If successful, you will get a notification showing what was created or not, similar to:
    - [notice] Processed 10 items (10 created, 0 updated, 0 failed, 0 ignored) – done with 'document_import'
- View the status of all migrations again – drush migrate-status
- Check media (/admin/content/media) to verify your documents appear there and everything was mapped correctly. If not, you can roll back and make changes:
  - Roll back by running – drush migrate-rollback document_import
  - Make changes to the config file and repeat the steps above.
- You can delete your CSV file once your import is complete.
Create View to Display Documents
Once the entities are created, let’s create a view for a page to display the documents on your website.
- Go to /admin/structure/views (Structure > Views)
- Click Add View with the following info:
- View Name: Documents
- View Settings: Show Media of type Document
- Page Settings: Create a page; Add a title and path; Display format: Table
- Add fields and filters:
Add fields as shown below:
Add filters shown below:
To let users filter on a field, set that filter to Exposed. It’s a good idea to include a reset button, which you can enable under Advanced > Exposed form style > Settings.
To make a column sortable, open the table Settings, check Sortable for that column, and apply.
Adjust anything else as desired on the page view. Depending on your theme, you may have to edit the CSS in your theme to keep the exposed forms on the same line and add a margin.
Your final page should look similar to the example below. My example has a couple of extra fields.
A working example of my final live page can be viewed here: https://www.usccr.gov/historical-documents.
This post was written by Fawn Kildoo.