Difference between revisions of "Unraid 6/Troubleshooting"

m (ptured in the logs)
m
(25 intermediate revisions by the same user not shown)
Line 22: Line 22:
 
If you want to ask questions in the forum such information will typically be requested as it will speed up the process of getting meaningful and accurate feedback.
 
If you want to ask questions in the forum such information will typically be requested as it will speed up the process of getting meaningful and accurate feedback.
  
'''System Diagnostics'''
+
=== System Diagnostics ===
  
 
Unraid has a GUI  option under ''Tools->Diagnostics'' to capture a lot of information about the state of your system that can be  
 
Unraid has a GUI  option under ''Tools->Diagnostics'' to capture a lot of information about the state of your system that can be  
 
helpful when trying to diagnose any issues.  Using this tool will result in a zip file being produced that can be downloaded and then attached to forum posts.
 
helpful when trying to diagnose any issues.  Using this tool will result in a zip file being produced that can be downloaded and then attached to forum posts.
 
If the GUI cannot be accessed for any reason then using the '''diagnostics''' command from the Linux level will generate the same information and
 
If the GUI cannot be accessed for any reason then using the '''diagnostics''' command from the Linux level will generate the same information and
put the resulting zip file into the '''logs''' folder on the flash drive.
+
put the resulting zip file into the '''logs''' folder on the flash drive.
 +
The diagnostics should, if at all possible, be captured BEFORE you reboot so that the logs show what happened leading up to the problem occurring.
 +
The zip file produced can then be attached to a forum post when asking for help on a problem in the Unraid forums.
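When working from the console it can help to locate the zip file that the '''diagnostics''' command has just written. The following is a minimal sketch only: the helper name is made up for illustration, and the default of ''/boot/logs'' assumes the flash drive is mounted at ''/boot'' with the zip files in its ''logs'' folder as described above.

```shell
# Print the most recently modified zip file in a folder.
# latest_diagnostics is a hypothetical helper name, not an Unraid command.
# $1 = folder to search (defaults to /boot/logs, an assumption based on the text above)
latest_diagnostics() {
    dir="${1:-/boot/logs}"
    # ls -t lists newest first; errors are hidden if no zip file exists yet
    ls -t "$dir"/*.zip 2>/dev/null | head -n 1
}
```

On the server you would typically run the '''diagnostics''' command first and then call <code>latest_diagnostics</code> to find the file to copy off the flash drive.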
  
 
[[File:Diagnostics.jpg|250px|thumb|right|Diagnostics]]
 
[[File:Diagnostics.jpg|250px|thumb|right|Diagnostics]]
Line 44: Line 46:
 
is not giving the expected results this is probably acceptable?
 
is not giving the expected results this is probably acceptable?
  
'''Persistent Logs'''
+
=== Persistent Logs ===
  
 
The main system log is the ''syslog'' file and it is the contents of this file that is displayed when you click the [[File:Log-icon.png||Log]] icon at the top right of the Unraid GUI.
 
The main system log is the ''syslog'' file and it is the contents of this file that is displayed when you click the [[File:Log-icon.png||Log]] icon at the top right of the Unraid GUI.
 +
Note that when posting to the forums, extracted fragments of the syslog are rarely helpful as they do not show what led up to a problem occurring.
  
Normally the logs are only written to RAM so do not survive the system being rebooted.  If you are investigating a system crash then as long as you are running Unraid 6.7.2 or later you can go to ''Settings->Network Services->Syslog Server'' and enable the server  
+
Normally the logs are only written to RAM so do not survive the system being rebooted.  If you are investigating a system crash then, as long as you are running Unraid 6.7.2 or later, you can use the built-in syslog server support.
 +
 
 +
[[File:Syslog-server-setup.jpg|200px|thumb|right|Syslog server]]
 +
 
 +
* Go to ''Settings->Network Services->Syslog Server''  
 +
: You can click on the 'Help' icon on the Toolbar and get more information for all of the options. 
 +
 
 +
* '''Mirror to Flash''':    This is the simplest to set up.  Select 'Yes' from the dropdown box and click on the 'Apply' button; the syslog will then be mirrored to the ''logs'' folder of the flash drive and appended to on a reboot.  There is one principal disadvantage to this method: if the condition that you are trying to troubleshoot takes days or weeks to occur, it can do a lot of writes to the flash drive.  Some users are hesitant to use the flash drive in this manner as it may shorten its life.
 +
: The advantage of this approach is that it captures everything from the start of the boot process which can be important if trying to diagnose boot problems.
 +
* '''Remote Syslog Server''': This is used when you have another machine on your network that is acting as a syslog server.  This can be another Unraid server, or virtually any other computer.  You can find the necessary software by searching the web for "syslog server <Operating system>".  After you have set up the computer/server, fill in its name or IP address.  (Using the IP address is preferable as there is never any confusion about what it is.)  Then click on the 'Apply' button and your syslog will be mirrored to the other computer.
 +
: The other computer has to be left on continuously until the problem occurs.
 +
: The events captured will only start with the point at which the syslog daemon is started during the boot process thus missing the very start of the boot process.
  
 
[[File:Syslog-server.jpg|200px|thumb|right|Syslog server]]
 
[[File:Syslog-server.jpg|200px|thumb|right|Syslog server]]
  
* Tick the ''Mirror to Flash'' option to get a syslog that survives a crash written to the flash drive and is appended to on a reboot. You do '''not''' want to leave the ''Mirror to Flash'' option ticked if you are not investigating a problem where you cannot get the normal system diagnostics as this can cause a lot of additional writes to the flash drive that can shorten it's lifetime.  
+
* '''Local Syslog Server''': Set this to '''Enabled''' to set up this Unraid server to act as a network syslog server.  When this is enabled some extra options are offered.  The built-in Help gives guidance on suitable settings.
* You can also enter the name or IP address of your Unraid server into the ''Remote syslog server'' field and enter a local share where such a file can be stored under the ''Local syslog folder'' field.   In this case a share set to use the cache drive is recommended to avoid spinning up array drives unnecessarily. This is more appropriate if you wand to continue to keep a permanent copy of the syslog but the file will not be as easy to access if the Unraid system is crashing.
+
**'''Local syslog folder''':  This will be a share on your server, but choose it with care.  Ideally, it will be a 'cache only' or a 'cache preferred' share.  This will minimize the spinning up of disks due to the continuous writing of new lines to the syslog.  A cache SSD drive would be the ideal choice here using a ''cache preferred'' share.  The syslog will be in the root of that folder/share.
 +
** '''Local syslog rotation''':  These settings allow you to control how much space the syslog is allowed to use.
 +
*** '''Local syslog maximum file size''':
 +
*** '''Local syslog number of files''':
 +
: If you click the 'Apply' button at this point, you will have this server set up to serve as a Remote Syslog Server.  It can now capture syslogs from several computers if the need should arise.
 +
* Using a bit of trickery we can use the Unraid server with the problem as the Local syslog server.  
 +
: This is more appropriate if you want to continue to keep a permanent copy of the syslog but the file will not be as easy to access if the Unraid system is crashing.  
 +
:  You can now add the IP address of this server as the Remote syslog server (remember the mention of trickery).  Basically, you send data out of the server and it comes right back in.
  
 
Notes:
 
Notes:
Line 74: Line 95:
 
== Boot Issues ==
 
== Boot Issues ==
  
''THIS SECTION IS STILL UNDER CONSTRUCTION'
+
=== Preparing the flash drive ===
 
 
'''Preparing the flash drive'''
 
  
  
'''Boot Process'''
+
=== Boot Process ===
  
 
Most of the time the Unraid boot process runs seamlessly and the user needs no awareness of the various stages involved.   
 
Most of the time the Unraid boot process runs seamlessly and the user needs no awareness of the various stages involved.   
Line 90: Line 109:
 
The boot process for Unraid proceeds through a number of stages  
 
The boot process for Unraid proceeds through a number of stages  
  
# '''Bios boot''':  This is the stage at which the motherboard BIOS recognizes the presence of the Unraid bootable flash drive and displays the Unraid Boot menu
+
# '''Bios boot''':  This is the stage at which the motherboard BIOS recognizes the presence of the Unraid bootable flash drive
 +
#* The way that the Unraid flash drive is set as the default boot device is BIOS dependent so you may need to consult your motherboard's User Manual to determine the correct way to do this.
 +
#* The Unraid flash drive supports booting in Legacy mode (also sometimes known as CSM mode) for older BIOSes and UEFI for more recent ones.  Many recent BIOSes support both modes.
 +
#* If you want UEFI boot mode to be used then the EFI folder on the flash drive must not have a trailing tilde (~) character.
 +
# '''Syslinux loader''':
 +
#: [[File:BootMenu.jpg|Boot Menu]]
 
#* The entries that appear on the boot menu are specified by the ''syslinux/syslinux.cfg'' file on the flash drive.  Although in theory this file can be edited manually as it is a text file, it is recommended that it is done via the GUI by clicking on the flash drive on the ''Main'' tab and going to the ''Syslinux configuration'' section.
 
#* The entries that appear on the boot menu are specified by the ''syslinux/syslinux.cfg'' file on the flash drive.  Although in theory this file can be edited manually as it is a text file, it is recommended that it is done via the GUI by clicking on the flash drive on the ''Main'' tab and going to the ''Syslinux configuration'' section.
#* If you want UEFI boot mode to be used then the EFI folder on the flash drive must not have trailing stash.
+
#* The Memtest86+ option only works if booting in Legacy mode.  If booting in UEFI mode it will typically simply cause a reboot.  If you want a version that will work in UEFI boot mode then you need to download it for yourself from either [https://www.memtest.org/ www.memtest.org] or [https://www.memtest86.com/ www.memtest86.com]
 
#* If the user does not select a specific option then after a timeout period the default option will be used.  If Unraid is running in headless mode this is the option that will be run.
 
#* If the user does not select a specific option then after a timeout period the default option will be used.  If Unraid is running in headless mode this is the option that will be run.
 
# '''Linux core''':  This is the stage at which the ''syslinux'' boot loader takes over from the BIOS and starts loading the files specified in the ''syslinux.cfg'' file.
 
# '''Linux core''':  This is the stage at which the ''syslinux'' boot loader takes over from the BIOS and starts loading the files specified in the ''syslinux.cfg'' file.
Line 100: Line 124:
 
#* There will then be messages displayed as Linux starts up and detects the hardware environment.
 
#* There will then be messages displayed as Linux starts up and detects the hardware environment.
 
# '''Flash dependent services''':  At this stage the flash drive is mounted at ''/boot'' so that the process can continue
 
# '''Flash dependent services''':  At this stage the flash drive is mounted at ''/boot'' so that the process can continue
#* Drivers and configuration information is read into RAM from the flash drive.
+
#* If the mount of the flash fails it is still possible to get the login prompt displayed.  However this does not necessarily mean the whole boot process completed correctly.
#* If the mount of the flash fails it is still possible to get the login prompt displayed.  One way to see if this has happened is to login and use the '''df''' command.  If the flash drive was mounted successfully then you will see it as /boot in the resulting list of mount points.
+
#* If this stage of the boot process has not completed then typical symptoms are that the webGUI and network are not started
# '''Plugins''':  If the user has installed plugins then they are normally loaded at this stage. If one of the Safe Boot options was selected from the Unraid Boot menu then the loading of plugins is suppressed.
+
#: One way to see if this has happened is to login and use the '''df''' command.  If the flash drive was mounted successfully then you will see it as /boot in the resulting list of mount points.
# '''Web GUI''': The Unraid web GUI is started.  It is actually done via an entry in the config/do file on the flash drive so it is possible for user supplied commands to also be run from there ether before starting the web GUI or lest after doing so.
+
#: The output should include something like the following mount points:
 +
#:<code>/dev/sdb1          15413232      826976    14586256  6% /boot<br>  /dev/loop0            9344        9344          0 100% /lib/modules<br>  /dev/loop1            7424        7424          0 100% /lib/firmware</code>
 +
#* Additional drivers and firmware are now available on the above mount points.
 +
#* Configuration information is read into RAM from the flash drive.
 +
#* Standard Linux services are started.  Examples would be networking and (if enabled) WireGuard VPN.
 +
# '''Plugins''':   
 +
#* If the user has installed plugins then they are normally loaded at this stage.
 +
#* If one of the Safe Boot options was selected from the Unraid Boot menu then the loading of plugins is suppressed.
 +
# '''Web GUI''':  
 +
#* The Unraid web GUI is started.   
 +
#* The webGUI is actually started via an entry in the ''config/go'' file on the flash drive so it is possible for user supplied commands to also be run from there either before starting the webGUI or just after doing so.
 
# '''Array'''  
 
# '''Array'''  
## '''Drives mounted'''
+
#: If the user has set the array to be auto-mounted then the following will start.  If array auto-start is not set then they happen when the user elects to start the array.
## '''File Share Services'''
+
#* '''Drives mounted'''
## '''Dockers Containers'''
+
#: Mount points will now be created of the form ''/mnt/diskX'' and ''/mnt/cache'' (if you have a cache).
## '''VMs'''
+
#* '''File Share Services'''
 +
#: Shares will now become available on the network.
 +
#: At the Linux level the shares will now appear as paths like ''/mnt/user/sharename''
 +
#* '''Docker Containers'''
 +
#: If the user has enabled the docker services then the Docker containers will be started using the order on the Docker tab.
 +
#: The order of the containers and delays between starting the containers can be set on the Docker tab.
 +
#* '''VMs'''
 +
#: Any VMs the user has set to auto-start will now be started
 +
 
 +
By this stage the Unraid server will be fully operational.
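The '''df''' check described under ''Flash dependent services'' can also be scripted. The following is a minimal sketch: the helper name is hypothetical, and it assumes the usual '''df''' layout in which the mount point is the last column of each line.

```shell
# Succeed if the given mount point appears in df-style output read from stdin.
# has_mount_point is a hypothetical helper name.
# $1 = mount point to look for, e.g. /boot
has_mount_point() {
    # The last field of each df line is the mount point
    awk -v mp="$1" '$NF == mp { found = 1 } END { exit (found ? 0 : 1) }'
}
```

On the server this could be used as <code>df | has_mount_point /boot && echo "flash drive mounted"</code>.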
 +
 
 +
<br>
 +
 
 +
=== Boot Failures ===
 +
 
 +
The following are some actions that can be taken to try and pin down the cause of a boot failure:
 +
# If possible use a USB2 port in preference to a USB3 one as they seem to be more reliable for booting purposes.
 +
# Check that the BIOS on the Unraid server still has the flash drive set as the boot device.  It is not unknown for this to get reset for no obvious reason.
 +
# On a Windows 10 PC or a Mac run a check on the flash drive
 +
#* This will determine if something is wrong physically or logically with the flash drive
 +
#* If you do not already have one, make sure you have a copy of the config folder of the flash drive as this contains all your current configuration information.
 +
# Download the zip version of the release from Limetech and extract all the bz* type files over-writing those on the flash drive
 +
#* This will determine if these files were not written correctly for some reason or are corrupt.
 +
# Rewrite the flash drive with a clean copy of Unraid and copy across just the key file from  your backup to the config folder
 +
#* This can determine if the flash drive itself is OK
 +
#* Copy across the remaining contents of the config folder to the flash drive
 +
#** If this goes well you are back up and running with your previous configuration intact.
 +
#** If this fails then try booting in Safe Mode.  If this works then a plugin is causing problems
 +
# If the original flash drive cannot be made to boot try a brand new flash drive and clean copy of Unraid (with the default configs)
 +
#* This can determine if something is wrong with the server's hardware (mobo, cpu, ram, usb port, etc.)
 +
# Install a clean/new copy of Unraid on a new flash drive and then copy the ''config'' folder over from the old one. 
 +
#* If this works then the license will need to be transferred to this new flash drive.
 +
 
 +
<br>
 +
 
 +
=== Lost root Password ===
 +
 
 +
Occasionally users lose their password for managing Unraid via the Unraid webGUI or console. 
 +
This may be that they simply forgot the password, but corruption of the flash drive can also result in the password not being recognized.
 +
 
 +
''Passwords for shares can be changed/reset from the Unraid webGUI.''
 +
 
 +
To reset the management password use the following process:
 +
# Shut down the server and then plug the USB boot flash device into a PC or Mac
 +
# While there it is a good idea to run a check on the flash drive and make a backup of its contents
 +
# Delete these files:
 +
#: config/passwd
 +
#: config/shadow
 +
#: config/smbpasswd
 +
# Plug flash back into server and start up again. 
 +
 
 +
Note that this will reset '''all''' user passwords including ‘root’ user to null (blank).
 +
 
 +
There is an alternative procedure that can be used for just resetting the root password (but is a little more prone to error):
 +
 
 +
# Plug the USB drive into another computer
 +
# Bring up an editor (such as Notepad++) on the following file: 
 +
#: /boot/config/shadow
 +
# On the first line you should see something such as:
 +
#: root:$&$&%*1112233484847648DHD$%.:15477:0:99999:7:::
 +
# Change that line to the following (essentially delete the content between the first and second colons):
 +
#: root::15477:0:99999:7:::
 +
# Plug flash back into server and start up again. 
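If the flash drive is plugged into a Linux or Mac machine, the edit to the ''shadow'' file can also be done from a terminal. This is a sketch only, assuming the standard colon-separated ''shadow'' layout where the password hash is the second field; the helper name is made up for illustration. It empties that field for the root entry (giving a line that starts <code>root::</code>) and keeps a backup of the original file.

```shell
# Blank the password field (second colon-separated field) of the root entry.
# clear_root_hash is a hypothetical helper name.
# $1 = path to the shadow file on the flash drive, e.g. config/shadow
clear_root_hash() {
    cp "$1" "$1.bak"                              # keep a backup of the original
    sed 's/^root:[^:]*:/root::/' "$1.bak" > "$1"  # empty only the root hash field
}
```

Other users' entries are left untouched; if anything goes wrong, the ''.bak'' copy can be renamed back.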
  
 
<br>
 
<br>
Line 121: Line 217:
  
 
''THIS SECTION IS STILL UNDER CONSTRUCTION''
 
''THIS SECTION IS STILL UNDER CONSTRUCTION''
 +
 +
<br>
 +
 +
== Windows Connection Issues ==
 +
 +
''THIS SECTION IS STILL UNDER CONSTRUCTION''
 +
 +
The majority of users have no problem making connections to Unraid shares.
 +
Having said that Microsoft is continually tweaking the network security model via
 +
Windows Update and this can cause problems for some users.
 +
 +
If you encounter problems then your first port of call should probably be the
 +
[https://forums.unraid.net/topic/53172-windows-issues-with-unraid/ Windows issues with Unraid] forum thread.
 +
This thread is rather long so you probably want to start near the end.
 +
 +
''COMMONEST ISSUES AND SOLUTIONS TO BE ADDED HERE''
 +
 +
=== Name Resolution ===
 +
 +
=== Stored Credentials ===
 +
 +
=== Multiple Sign-ons ===
  
 
<br>
 
<br>
Line 143: Line 261:
 
** The Unraid forums have lots of knowledgeable users who can help guide you through what needs doing to get your data back into a standard if you are not sure what are the best steps to take.
 
** The Unraid forums have lots of knowledgeable users who can help guide you through what needs doing to get your data back into a standard if you are not sure what are the best steps to take.
 
** Unraid is very good about protecting your data against typical hardware failures , but it is not immune against users taking inappropriate steps to recover their data after a failure occurs.
 
** Unraid is very good about protecting your data against typical hardware failures , but it is not immune against users taking inappropriate steps to recover their data after a failure occurs.
 +
 +
=== Lost Array Configuration ===
 +
 +
If you have lost the array configuration and do not have a current backup of the flash drive the data will still be intact on the drives.
 +
 +
All configuration information is stored on the flash drive in the ''config'' folder.  In particular the Unraid array configuration is stored in the ''config/super.dat'' file.
 +
* ''Do not attempt to use an out-of-date backup that may have incorrect drive assignments.''
 +
 +
If you know which drives were the parity drives then you can simply reassign the drives.  However if you do not know which were the parity drives then you have to be more careful as incorrectly assigning a drive to parity that is really a data drive will result in you losing its contents.
 +
 +
When you do not know which were your parity drives the following steps can get your array back into operation:
 +
# Assign all drives as data drives
 +
# Start the array
 +
# All the genuine data drives should show as mounted and the parity drive(s) show as ''unmountable''.  If this is not the case and too many drives show as unmountable then stop and ask for help in the forums giving details of what happened.
 +
# Make a note of the serial numbers of the parity drives. 
 +
# Stop the array
 +
# Go to Tools >>> New Config.  Select the option to retain current assignments (as it reduces the chance of error).  Tick the ''Yes I want to do this'' option and then click Apply.
 +
# Go back to the Main tab and correct the assignments of the parity drives.  Double check you have the right drives now assigned as parity drives as assigning a data drive to parity will lose its contents.  You can move any other drives around at this stage as well.
 +
# Start the array and the system will start building parity based on the current assignments.
 +
 +
All your User Shares will re-appear (as they are simply the aggregation of the top level folders on each drive) but with default settings so you may need to re-apply any customisation you want.   
 +
 
 +
You can now do any other customisation that is appropriate and add any plugins you normally use.
 +
 
 +
At this point it is strongly recommended that you click on the flash drive on the Main tab and select the option to download a backup of the flash drive.  It is always good practice to do this any time you make a significant change.
 +
 +
<br>
 +
 +
=== Using ''ddrescue'' to recover data from a failing disk ===
 +
 +
In normal use a failed/disabled disk is recovered under Unraid using the [https://wiki.unraid.net/UnRAID_6/Storage_Management#Replacing_disks Replacing Disks] procedure.
 +
 +
Occasionally the normal Unraid recovery processes cannot be used, for example when a disk fails while parity is invalid, when two disks fail with single parity, or when a failing disk has pending sectors and there is no way to rebuild it using parity.  In such a case you can try using '''ddrescue''' to salvage as much data as possible.
 +
 +
To install ''ddrescue'' install the Nerd Pack plugin then go to Settings -> Nerd Pack and install ''ddrescue''.
 +
 
 +
You need an extra disk (same size or larger than the failing disk) to clone the old disk to, using the console/SSH type:
 +
 +
  ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log
 +
 +
Neither the source nor the destination disk may be mounted.  Replace X with the source disk and Y with the destination disk.  Always triple check these: if the wrong disk is used as the destination it will be overwritten, deleting all its data.
 +
 
 +
It is also possible to use an existing array disk as the destination, though only if it is the same size as the original.  To maintain parity you can only clone the partition, so the existing array disk must already be formatted as an Unraid disk (in any filesystem).  You also need to use the md# device and start the array in maintenance mode, so the array is not accessible during the copy.  The command is:
 +
 +
  ddrescue -f /dev/sdX1 /dev/md# /boot/ddrescue.log
 +
 +
Replace X with the source disk (note the 1 in the source disk identifier) and # with the destination disk number.  Enabling turbo write first is recommended, otherwise the copy will take much longer.
 +
 +
Example output during the 1st pass:
 +
 +
  GNU ddrescue 1.22
 +
    ipos:  926889 MB, non-trimmed:    1695 kB,  current rate:  95092 kB/s
 +
    opos:  926889 MB, non-scraped:        0 B,  average rate:  79236 kB/s
 +
  non-tried:    1074 GB,  bad-sector:        0 B,    error rate:      0 B/s
 +
    rescued:  925804 MB,  bad areas:        0,        run time:  3h 14m 44s
 +
  pct rescued:  46.28%, read errors:      54,  remaining time:      3h 18m
 +
                              time since last successful read:          0s
 +
  Copying non-tried blocks... Pass 1 (forwards)
 +
 +
After copying all the good blocks ddrescue will retry the bad blocks, forwards and backwards, this last part can take some time depending on how bad the disk is, example:
 +
 
 +
  GNU ddrescue 1.22
 +
    ipos:  17878 MB, non-trimmed:        0 B,  current rate:      0 B/s
 +
    opos:  17878 MB, non-scraped:  362496 B,  average rate:  74898 kB/s
 +
  non-tried:        0 B,  bad-sector:    93696 B,    error rate:    102 B/s
 +
    rescued:    2000 GB,  bad areas:      101,        run time:  7h 25m  8s
 +
  pct rescued:  99.99%, read errors:      260,  remaining time:        25m
 +
                              time since last successful read:        10s
 +
  Scraping failed blocks... (forwards)
 +
 +
After the clone is complete you can mount the destination disk manually or by using, for example, the UD plugin, and then copy the recovered data to the array.  If the cloned disk is unmountable run the appropriate filesystem repair tool; it may also be a good idea to run a filesystem check even if it mounts OK.  Some files will likely be corrupt.  If you have checksums or are using BTRFS you can easily find out which ones; if not, see below.
 +
 
 +
If you don't have checksums for your files and are not using btrfs, there is still a way to check which files were affected:
 +
 
 +
Create a temporary text file with a text string not present on your data, e.g.:
 +
  printf "unRAID " >~/fill.txt
 +
Then fill the bad blocks on the destination disk with that string:
 +
  ddrescue -f --fill=- ~/fill.txt /dev/sdY /boot/ddrescue.log
 +
Replace Y with the cloned disk (not the original) and use the existing ddrescue mapfile.
 +
 
 +
Finally mount the disk, manually or for example using the UD plugin and search for that string:
 +
 
 +
  find /mnt/path/to/disk -type f -exec grep -l "unRAID" '{}' ';'
 +
Replace /mnt/path/to/disk with the correct mount point.  All files containing the string "unRAID" will be output, and those are your corrupt files.  This will take some time as all files on the disk will be scanned, and output is only displayed at the end.  If there is no output then the bad sectors were in areas without any files.
 +
 +
 +
<br>
 +
 +
== Docker ==
 +
 +
 +
''THIS SECTION IS STILL UNDER CONSTRUCTION''
 +
 +
''A lot more detail still needs to be added''
 +
 +
=== Docker Image Full ===
 +
 +
Unraid expects docker containers to be configured so that only the binaries for the container are held in the ''docker.img'' file.  All locations within the container that write variable data are then expected to be mapped to locations external to the ''docker.img'' file.
 +
 +
The default size of 20GB is enough for all but the most demanding users, so if you find that your ''docker.img'' file is running out of space it is very likely that you have at least one container incorrectly configured so that it is writing internally to the docker image rather than to storage external to the image. 
 +
 
 +
Common mistakes are:
 +
* Leaving off the leading / on the container side of a path mapping so it is relative rather than absolute
 +
* Case mismatch on either side of a path mapping as Linux pathnames are case-significant.
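The first mistake above can be checked mechanically. The following is a small sketch with a made-up helper name that reports whether a container-side path is absolute; it is not part of Unraid or Docker.

```shell
# Report whether a container-side path is absolute or relative.
# container_path_kind is a hypothetical helper name.
# $1 = the container side of a path mapping, e.g. /config
container_path_kind() {
    case "$1" in
        /*) echo "absolute" ;;   # starts with /, so it maps where intended
        *)  echo "relative" ;;   # missing the leading /, a likely mistake
    esac
}
```

For example, <code>container_path_kind config</code> reports a relative path, which is a candidate for data being written inside the docker image instead of to the mapped location.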
 +
 +
If you cannot spot the error then you can try:
 +
* Go to the docker tab and click the Container size button.  Often this will highlight the problem docker container(s) so you now know where to look.
 +
 +
If that is not enough to identify the culprit then:
 +
* Make sure all containers are stopped and not set to auto-start
 +
* Stop docker service
 +
* Delete the current docker image and set a more reasonable size (e.g. 20G)
 +
* Start docker service
 +
* Use Apps >>> Previous apps to re-install your containers (with all their settings intact).
 +
* Go to docker tab and click the Container size button
 +
: This will give you a starting point for the space each container is using.
 +
* Enable one container, let it run for a short while and then press the Container size button again to see if that particular container is consuming space when it runs.
 +
* Repeat the above step until you track down the rogue container(s)
 +
 +
<br>

Revision as of 12:11, 6 February 2020

Official Documentation Contents List

Troubleshooting

THIS SECTION IS STILL UNDER CONSTRUCTION

A lot more detail still needs to be added

Most of the time Unraid systems function with minimal problems. This section is intended to help with resolving issues that are most commonly encountered.

There are some important general guidelines that it is recommended that a user follows that will help with any troubleshooting that may be required:

  • Use the built-in Help: The Unraid GUI has extensive built in Help for most fields in the GUI. This can be accessed at the individual field level by clicking on the field name, or toggled on/off for the whole page by clicking on the Help icon at the top right of the GUI.
  • Enable Notifications: Unraid has a notification system that can be used to keep you informed about the health of your Unraid system. This can be enabled and the level of notifications you receive tuned under Settings->User Preferences->Notification Settings. Since Unraid systems often function for very long times without needing any user oversight it can be important that you are informed of problems when they first occur, as if left unresolved they can grow into more serious ones.
  • Proactively fix any reported issues:
  • Ask for help in the forums: Unraid has a vibrant user community and many knowledgeable users who are active in the Unraid forums. Any time you encounter a problem and you are not sure how to proceed it is a good idea to ask questions in the forums. There is nothing worse than rushing into trying to fix a problem using a process you do not understand and as a result making a problem that was initially easy to resolve into something more serious
  • Capture diagnostics: If you want to ask a question in the forums about a problem you are encountering you are frequently going to be asked to provide your system diagnostics file. You need to capture it BEFORE YOU REBOOT so that the logs show what went wrong BEFORE the reboot (because once you reboot, it's lost)!

Capturing Diagnostic Information

When you encounter any sort of problem it is always recommended that you attempt to capture as much information as possible to help with pin-pointing the cause. If you want to ask questions in the forum such information will typically be requested as it will speed up the process of getting meaningful and accurate feedback.

System Diagnostics

Unraid has a GUI option under Tools->Diagnostics to capture a lot of information about the state of your system that can be helpful when trying to diagnose any issues. Using this tool will result in a zip file being produced that can be downloaded and then attached to forum posts. If the GUI cannot be accessed for any reason then using the diagnostics command from the Linux level will generate the same information and put the resulting zip file into the logs folder on the flash drive. The diagnostics should, if at all possible, be captured BEFORE you reboot so that the logs show what happened leading up to the problem occurring. The zip file produced can then be attached to a forum post when asking for help on a problem in the Unraid forums.

Diagnostics

These system diagnostics include configuration information, state information and key system logs. When creating the diagnostics from the GUI then details of the sort of information that will be included is listed. There is an option (set by default) to say that the diagnostics should be anonymized to try and avoid including any information that might be deemed sensitive. All the files in the diagnostics are text files so a user is free to examine them to check for themselves exactly what information is present.

Note on anonymization of the diagnostics
It has been pointed out that the diagnostics are not completely anonymized if you have enabled mover logging under Settings->Mover Settings as the syslog will give details of files that mover is operating on. This is a bit of a catch-22 scenario as when one has enabled mover logging it is normally to investigate a problem where as much detail as possible is captured, so attempting to anonymize such information may well be counter-productive. Since mover logging is disabled by default, and recommended practice is to enable it only when investigating why mover is not giving the expected results, this is probably acceptable.

Persistent Logs

The main system log is the syslog file and it is the contents of this file that are displayed when you click the Log icon at the top right of the Unraid GUI. Note that when posting to the forums extracted fragments of the syslog are rarely helpful as they do not show what led up to a problem occurring.

Normally the logs are only written to RAM so they do not survive the system being rebooted. If you are investigating a system crash then, as long as you are running Unraid 6.7.2 or later, there is built-in syslog server support that can be used to keep a persistent copy.

Syslog server
  • Go to Settings->Network Services->Syslog Server
You can click on the 'Help' icon on the Toolbar and get more information for all of the options.
  • Mirror to Flash: This is the simplest to set up. Select 'Yes' from the dropdown box and click on the 'Apply' button; the syslog will then be mirrored to the logs folder of the flash drive, and the file is appended to after a reboot. There is one principal disadvantage to this method: if the condition that you are trying to troubleshoot takes days or weeks to occur, it can do a lot of writes to the flash drive. Some users are hesitant to use the flash drive in this manner as it may shorten its life.
The advantage of this approach is that it captures everything from the start of the boot process which can be important if trying to diagnose boot problems.
  • Remote Syslog Server: This is used when you have another machine on your network that is acting as a syslog server. This can be another Unraid server, but you can also use virtually any other computer. You can find the necessary software by searching for 'syslog server <operating system>'. After you have set up the computer/server, fill in its name or IP address. (Using the IP address avoids any confusion about which machine is meant.) Then click on the 'Apply' button and your syslog will be mirrored to the other computer.
The other computer has to be left on continuously until the problem occurs.
The events captured will only start with the point at which the syslog daemon is started during the boot process thus missing the very start of the boot process.
  • Local Syslog Server: Set this to Enabled to set up this Unraid server to act as a network syslog server. When this is enabled some extra options are offered. The built-in Help gives guidance on suitable settings.
    • Local syslog folder: This will be a share on your server, but choose it with care. Ideally, it will be a 'cache only' or a 'cache preferred' share. This will minimize the spinning up of disks due to the continuous writing of new lines to the syslog. A cache SSD drive would be the ideal choice here using a cache preferred share. The syslog will be in the root of that folder/share.
    • Local syslog rotation: These settings allow you to control how much space the syslog is allowed to use.
      • Local syslog maximum file size:
      • Local syslog number of files:
If you click the 'Apply' button at this point, you will have this server set up to serve as a Remote Syslog Server. It can now capture syslogs from several computers if the need should arise.
  • Using a bit of trickery we can use the Unraid server with the problem as the Local syslog server.
This is more appropriate if you want to continue to keep a permanent copy of the syslog but the file will not be as easy to access if the Unraid system is crashing.
You can now add the IP address of this server as the Remote syslog server (remember the mention of trickery). Basically, you send data out of the server and it comes right back in.

Notes:

  • The standard system diagnostics include the RAM copy of the syslog so there is no need to provide this separately. You will need to do so to provide the logs captured by the syslog server as these are not included in the standard system diagnostics.

Docker Containers

The standard system diagnostics do not contain much that will help with diagnosing issues with specific docker containers.

MORE DETAIL NEEDED

VMs

The standard system diagnostics do not contain much that will help with diagnosing issues with specific VMs.

MORE DETAIL NEEDED


Boot Issues

Preparing the flash drive

Boot Process

Most of the time the Unraid boot process runs seamlessly and the user needs no awareness of the various stages involved. However when things go wrong it can be useful to know how far the boot process managed to get as this will be of use in knowing what remedial action to take.

Resolving boot issues will typically need either a locally attached monitor+keyboard or (if the motherboard supports it) an IPMI connection to carry out equivalent functionality. This can then be used to set any required BIOS settings and to monitor the booting process.

The boot process for Unraid proceeds through a number of stages:

  1. BIOS boot: This is the stage at which the motherboard BIOS recognizes the presence of the Unraid bootable flash drive.
    • The way that the Unraid flash drive is set as the default boot device is BIOS dependent so you may need to consult your motherboard's User Manual to determine the correct way to do this.
    • The Unraid flash drive supports booting in Legacy mode (also sometimes known as CSM mode) for older BIOSes and UEFI mode for more recent ones. Many recent BIOSes support both modes.
    • If you want UEFI boot mode to be used then the EFI folder on the flash drive must not have a trailing tilde (~) character.
  2. Syslinux loader:
    Boot Menu
    • The entries that appear on the boot menu are specified by the syslinux/syslinux.cfg file on the flash drive. Although in theory this file can be edited manually as it is a text file, it is recommended that changes are made via the GUI by clicking on the flash drive on the Main tab and going to the Syslinux configuration section.
    • The Memtest86+ option only works if booting in Legacy mode. If booting in UEFI mode it will typically simply cause a reboot. If you want a version that will work in UEFI boot mode then you need to download it for yourself from either www.memtest.org or www.memtest86.com
    • If the user does not select a specific option then after a timeout period the default option will be used. If Unraid is running in headless mode this is the option that will be run.
  3. Linux core: This is the stage at which the syslinux boot loader takes over from the BIOS and starts loading the files specified in the syslinux.cfg file.
    • This is when the core Linux system is loaded from the flash drive and unpacked into RAM.
    • There will be messages on the console about the various bz* types being loaded into RAM.
    • If there are any error messages displayed while loading these files then it normally indicates a problem with the flash drive.
    • There will then be messages displayed as Linux starts up and detects the hardware environment.
  4. Flash dependent services: At this stage the flash drive is mounted at /boot so that the process can continue
    • If the mount of the flash fails it is still possible to get the login prompt displayed. However this does not necessarily mean the whole boot process completed correctly.
    • If this stage of the boot process has not completed then typical symptoms are that the webGUI and network are not started.
    One way to see if this has happened is to log in and use the df command. If the flash drive was mounted successfully then you will see it as /boot in the resulting list of mount points.
    The output should have something like the following mount points:
    /dev/sdb1 15413232 826976 14586256 6% /boot
    /dev/loop0 9344 9344 0 100% /lib/modules
    /dev/loop1 7424 7424 0 100% /lib/firmware
    • Additional drivers and firmware are now available on the above mount points.
    • Configuration information is read into RAM from the flash drive.
    • Standard Linux services are started. Examples would be networking and (if enabled) WireGuard VPN.
  5. Plugins:
    • If the user has installed plugins then they are normally loaded at this stage.
    • If one of the Safe Boot options was selected from the Unraid Boot menu then the loading of plugins is suppressed.
  6. Web GUI:
    • The Unraid web GUI is started.
    • The webGUI is actually started via an entry in the config/go file on the flash drive, so it is possible for user supplied commands to also be run from there either before starting the webGUI or just after doing so.
  7. Array
    If the user has set the array to auto-start then the following steps happen automatically. If array auto-start is not set then they happen when the user elects to start the array.
    • Drives mounted
    Mount points will now be created of the form /mnt/diskX and /mnt/cache (if you have a cache).
    • File Share Services
    Shares will now become available on the network.
    At the Linux level the shares will now appear as paths like /mnt/user/sharename
    • Docker Containers
    If the user has enabled the docker services then the Docker containers will be started using the order on the Docker tab.
    The order of the containers and delays between starting the containers can be set on the Docker tab.
    • VMs
    Any VMs the user has set to auto-start will now be started.

By this stage the Unraid server will be fully operational.
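
The df check described under stage 4 can be scripted. The following is a minimal sketch rather than an official tool: has_mount_point is a hypothetical helper that reads df-style output on standard input and succeeds only if the given mount point appears in the last column.

```shell
# Hypothetical helper: succeed if df-style output on stdin lists the
# mount point $1 in its last column.
has_mount_point() {
    awk -v mp="$1" '$NF == mp { found = 1 } END { exit !found }'
}

# Usage on the server console (-P requests the portable one-line-per-
# filesystem format):
#   df -P | has_mount_point /boot && echo "flash mounted" || echo "flash NOT mounted"
```

The same helper can be pointed at /lib/modules or /lib/firmware to confirm that the additional drivers and firmware were made available.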


Boot Failures

The following are some actions that can be taken to try and pin down the cause of a boot failure:

  1. If possible use a USB2 port in preference to a USB3 one as they seem to be more reliable for booting purposes.
  2. Check that the BIOS on the Unraid server still has the flash drive set as the boot device. It is not unknown for this to get reset for no obvious reason.
  3. On a Windows 10 PC or a Mac run a check on the flash drive
    • This will determine if something is wrong physically or logically with the flash drive
    • If you do not already have one make sure you have a copy of the config folder of the flash drive as this contains all your current configuration information.
  4. Download the zip version of the release from Limetech and extract all the bz* type files over-writing those on the flash drive
    • This will determine if these files were not written correctly for some reason or are corrupt.
  5. Rewrite the flash drive with a clean copy of Unraid and copy across just the key file from your backup to the config folder
    • This can determine if the flash drive itself is OK
    • Copy across the remaining contents of the config folder to the flash drive
      • If this goes well you are back up and running with your previous configuration intact.
      • If this fails then try booting in Safe Mode. If this works then a plugin is causing problems
  6. If the original flash drive cannot be made to boot try a brand new flash drive and clean copy of Unraid (with the default configs)
    • This can determine if something is wrong with the server's hardware (mobo, cpu, ram, usb port, etc.)
  7. Install a clean/new copy of Unraid on a new flash drive and then copy the config folder over from the old one.
    • If this works then the license will need to be transferred to this new flash drive.


Lost root Password

Occasionally users lose their password for managing Unraid via the Unraid webGUI or console. This may be that they simply forgot the password, but corruption of the flash drive can also result in the password not being recognized.

Passwords for shares can be changed/reset from the Unraid webGUI.

To reset the management password use the following process:

  1. Shutdown server and then plug the USB boot flash device into a PC or Mac
  2. While there it is a good idea to run a check on the flash drive and make a backup of its contents
  3. Delete these files:
    config/passwd
    config/shadow
    config/smbpasswd
  4. Plug flash back into server and start up again.

Note that this will reset all user passwords including ‘root’ user to null (blank).
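
If the flash drive is plugged into a Mac or Linux PC, the deletion in step 3 can be done from a terminal. This is only a sketch: reset_unraid_passwords is a hypothetical helper, and both paths in the usage line (where the flash is mounted, and where the backup copies go) are assumptions to be replaced with your own. On Windows, simply delete the three files in Explorer.

```shell
# Hypothetical helper: copy the three password files to a backup folder
# and then delete them from the flash drive.
#   $1 = where the flash drive is mounted on the PC (assumed path)
#   $2 = folder on the PC that receives the backup copies
reset_unraid_passwords() {
    flash="$1"
    backup="$2"
    if [ ! -d "$flash/config" ]; then
        echo "no config folder under '$flash'" >&2
        return 1
    fi
    mkdir -p "$backup" || return 1
    for f in passwd shadow smbpasswd; do
        if [ -f "$flash/config/$f" ]; then
            # Only remove the file once the copy has succeeded.
            cp -p "$flash/config/$f" "$backup/$f" && rm "$flash/config/$f"
        fi
    done
}

# Usage (example paths only):
#   reset_unraid_passwords /media/UNRAID "$HOME/unraid-password-backup"
```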

There is an alternative procedure that can be used for just resetting the root password (but is a little more prone to error):

  1. Plug the USB drive into another computer
  2. Bring up a text editor (such as Notepad++) on the following file on the flash drive:
    config/shadow
  3. On the first line you should see something such as:
    root:$&$&%*1112233484847648DHD$%.:15477:0:99999:7:::
  4. Change that line to the following (essentially delete the content between the first and second colons, leaving both colons in place):
    root::15477:0:99999:7:::
  5. Plug flash back into server and start up again.
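
For those doing this edit from a Mac or Linux terminal rather than a text editor, the change can be sketched with sed. blank_root_password is a hypothetical helper; it keeps the previous contents in a .bak file next to the original and empties only the password field of the root entry, leaving all other users untouched.

```shell
# Hypothetical helper: empty the password field of the root entry.
#   $1 = path to the shadow file on the mounted flash drive
blank_root_password() {
    shadow="$1"
    # Keep a copy of the original so the change can be undone.
    cp -p "$shadow" "$shadow.bak" || return 1
    # On the root line, drop everything between the first and second colon.
    sed 's/^root:[^:]*:/root::/' "$shadow.bak" > "$shadow"
}

# Usage (example path only):
#   blank_root_password /media/UNRAID/config/shadow
```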


Shutdown Issues

THIS SECTION IS STILL UNDER CONSTRUCTION


Crash Issues

THIS SECTION IS STILL UNDER CONSTRUCTION


Windows Connection Issues

THIS SECTION IS STILL UNDER CONSTRUCTION

The majority of users have no problem making connections to Unraid shares. Having said that, Microsoft is continually tweaking the network security model via Windows Update and this can cause problems for some users.

If you encounter problems then your first port of call should probably be the Windows issues with Unraid forum thread. This thread is rather long so you probably want to start near the end.

COMMONEST ISSUES AND SOLUTIONS TO BE ADDED HERE

Name Resolution

Stored Credentials

Multiple Sign-ons


Data Recovery

THIS SECTION IS STILL UNDER CONSTRUCTION

A lot more detail still needs to be added

This section is about recovering your data when Unraid reports problems with one or more drives.

There are some important points to bear in mind about securing your data:

  • Backup critical data: Unraid will protect you against most types of simple hardware failure, but not catastrophic failure. You should ALWAYS have backups of any critical data that you cannot afford to lose. Ideally one of those copies should be offsite or on the cloud to protect yourself against unforeseen issues such as fire, theft, flood, etc.
    • Each user has to make their own determination of what they deem critical and make an assessment of the level of risk they are prepared to take.
    • Personal data such as photographs & documents tend to always be deemed critical. Luckily these tend to be relatively small so easy to back up.
    • Media files are often deemed non-critical and are relatively large so a user may well decide these do not merit being backed up
    • Personal video that can never be replaced should fall into the critical category regardless of its size
    • Remember that there are things such as ransomware around so there should be at least one copy of critical files that cannot be accessed online and corrupted if you are unfortunate enough to suffer from such an attack!
  • Be pro-active about resolving any issues that are detected by Unraid. Make sure that notifications are enabled under Settings->Notifications so that you get told as soon as issues are detected. For many users Unraid operates in a fire-and-forget mode, so they will not be actively checking for problems and need such reminders.
  • Ask for Advice:
    • The Unraid forums have lots of knowledgeable users who can help guide you through what needs doing to get your data back into a standard state if you are not sure what the best steps to take are.
    • Unraid is very good about protecting your data against typical hardware failures, but it is not immune against users taking inappropriate steps to recover their data after a failure occurs.

Lost Array Configuration

If you have lost the array configuration and do not have a current backup of the flash drive the data will still be intact on the drives.

All configuration information is stored on the flash drive in the config folder. In particular the Unraid array configuration is stored in the config/super.dat file.

  • Do not attempt to use an out-of-date backup that may have incorrect drive assignments.

If you know which drives were the parity drives then you can simply reassign the drives. However if you do not know which were the parity drives then you have to be more careful as incorrectly assigning a drive to parity that is really a data drive will result in you losing its contents.

When you do not know which were your parity drives the following steps can get your array back into operation:

  1. Assign all drives as data drives
  2. Start the array
  3. All the genuine data drives should show as mounted and the parity drive(s) show as unmountable. If this is not the case and too many drives show as unmountable then stop and ask for help in the forums giving details of what happened.
  4. Make a note of the serial numbers of the parity drives.
  5. Stop the array
  6. Go to Tools >>> New Config. Select the option to retain current assignments (as it reduces the chance of error). Tick the 'Yes I want to do this' checkbox and then click Apply.
  7. Go back to the Main tab and correct the assignments of the parity drives. Double check you have the right drives now assigned as parity drives as assigning a data drive to parity will lose its contents. You can move any other drives around at this stage as well.
  8. Start the array and the system will start building parity based on the current assignments.

All your User Shares will re-appear (as they are simply the aggregation of the top level folders on each drive) but with default settings so you may need to re-apply any customisation you want.

You can now do any other customisation that is appropriate and add any plugins you normally use.

At this point it is strongly recommended that you click on the flash drive on the Main tab and select the option to download a backup of the flash drive. It is always good practice to do this any time you make a significant change.


Using ddrescue to recover data from a failing disk

In normal use a failed/disabled disk is recovered under Unraid using the Replacing Disks procedure.

Occasionally the normal Unraid recovery processes cannot be used, for example when a disk fails while parity is invalid, when two disks fail with single parity, or when a disk has pending sectors and there is no way to rebuild it using parity. In such a case you can try using ddrescue to salvage as much data as possible.

To install ddrescue install the Nerd Pack plugin then go to Settings -> Nerd Pack and install ddrescue.

You need an extra disk (the same size as or larger than the failing disk) to clone the old disk to. From the console or an SSH session type:

 ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log

Neither the source nor the destination disk may be mounted. Replace X with the source disk and Y with the destination disk. Always triple check these: if the wrong disk is used as the destination it will be overwritten, deleting all its data.
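
One way to reduce the risk of a mistake is a quick pre-flight check that neither device is mounted. The sketch below is an illustration only: disk_is_mounted is a hypothetical helper that looks for the device, or one of its partitions, in the first column of /proc/mounts. It will not notice a disk that is in use through another layer (for example as a member of the started array), so it does not replace checking the Main tab.

```shell
# Hypothetical helper: succeed if the disk $1 (e.g. /dev/sdX) or one of
# its partitions is listed as a mounted device in the mounts table $2
# (defaults to /proc/mounts).
disk_is_mounted() {
    awk -v d="$1" '
        $1 == d { found = 1 }
        # Partitions: the device name followed by digits (or p + digits).
        index($1, d) == 1 && substr($1, length(d) + 1) ~ /^p?[0-9]+$/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Usage before cloning (sdX and sdY are the placeholders from the text):
#   for d in /dev/sdX /dev/sdY; do
#       disk_is_mounted "$d" && { echo "$d is mounted, stop"; break; }
#   done
```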

It is also possible to use an array disk as the destination, though only if it is the same size as the original. To maintain parity you can only clone the partition, so the existing array disk needs to already be formatted as an Unraid disk (in any filesystem). You also need to use the md# device, and the array needs to be started in maintenance mode (i.e. not accessible during the copy). The command is:

 ddrescue -f /dev/sdX1 /dev/md# /boot/ddrescue.log

Replace X with the source disk (note the 1 in the source disk identifier) and # with the destination disk number. Enabling turbo write first is recommended, otherwise the copy will take much longer.

Example output during the 1st pass:

 GNU ddrescue 1.22
    ipos:  926889 MB, non-trimmed:    1695 kB,  current rate:  95092 kB/s
    opos:  926889 MB, non-scraped:        0 B,  average rate:  79236 kB/s
 non-tried:    1074 GB,  bad-sector:        0 B,    error rate:       0 B/s
   rescued:  925804 MB,   bad areas:        0,        run time:  3h 14m 44s
 pct rescued:   46.28%, read errors:       54,  remaining time:      3h 18m
                             time since last successful read:          0s
 Copying non-tried blocks... Pass 1 (forwards)

After copying all the good blocks ddrescue will retry the bad blocks, forwards and backwards. This last part can take some time depending on how bad the disk is. Example:

 GNU ddrescue 1.22
    ipos:   17878 MB, non-trimmed:        0 B,  current rate:       0 B/s
    opos:   17878 MB, non-scraped:   362496 B,  average rate:  74898 kB/s
 non-tried:        0 B,  bad-sector:    93696 B,    error rate:     102 B/s
   rescued:    2000 GB,   bad areas:      101,        run time:  7h 25m  8s
 pct rescued:   99.99%, read errors:      260,  remaining time:         25m
                             time since last successful read:         10s
 Scraping failed blocks... (forwards)

After the clone is complete you can mount the destination disk manually or by using, for example, the UD plugin. If the cloned disk is unmountable run the appropriate filesystem repair tool; it might also be a good idea to run a filesystem check even if it mounts OK. Then copy the recovered data to the array. Some files will likely be corrupt. If you have checksums or are using BTRFS you can easily find out which ones; if not, see below.

If you don't have checksums for your files (and are not using btrfs) there is still a way to check which files were affected:

Create a temporary text file with a text string not present in your data, e.g.:

 printf "unRAID " >~/fill.txt

Then fill the bad blocks on the destination disk with that string:

 ddrescue -f --fill=- ~/fill.txt /dev/sdY /boot/ddrescue.log

Replace Y with the cloned disk (not the original) and use the existing ddrescue mapfile.

Finally mount the disk, manually or for example using the UD plugin and search for that string:

 find /mnt/path/to/disk -type f -exec grep -l "unRAID" '{}' ';'

Replace /path/to/disk with the correct mount point. All files containing the string "unRAID" will be output, and those are your corrupt files. This will take some time as all files on the disk will be scanned, and the output is only displayed at the end. If there is no output then the bad sectors were in areas without any files.
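
Because the list of affected files can be long, it may be convenient to save it to a file. The following is a sketch of one way to do that. list_marked_files is a hypothetical helper, and the report file should live outside the disk being scanned. It uses grep -F so that the marker is treated as a literal string.

```shell
# Hypothetical helper: write the names of all files under $2 that contain
# the literal string $1 to the report file $3, then print how many there are.
list_marked_files() {
    # '{} +' passes many file names to each grep run instead of one at a time.
    find "$2" -type f -exec grep -lF -- "$1" {} + > "$3"
    wc -l < "$3" | tr -d ' '
}

# Usage (example paths only):
#   list_marked_files "unRAID" /mnt/path/to/disk /boot/corrupt-files.txt
```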



Docker

THIS SECTION IS STILL UNDER CONSTRUCTION

A lot more detail still needs to be added

Docker Image Full

Unraid expects docker containers to be configured so that only the binaries for the container are held in the docker.img file. All locations within the container that write variable data are then expected to be mapped to locations external to the docker.img file.

The default size of 20GB is enough for all but the most demanding users, so if you find that your docker.img file is running out of space it is very likely that you have at least one container incorrectly configured so that it is writing internally to the docker image rather than to storage external to the image.

Common mistakes are:

  • Leaving off the leading / on the container side of a path mapping so it is relative rather than absolute
  • Case mismatch on either side of a path mapping as Linux pathnames are case-significant.
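
The first of these mistakes is easy to test for mechanically. As a small sketch, check_mapping below is a hypothetical helper (not a Docker or Unraid command) that takes one mapping written as host_path:container_path and warns when the container side does not start with a /.

```shell
# Hypothetical helper: check one "host_path:container_path" mapping and
# warn about a container path that is relative rather than absolute.
check_mapping() {
    case "$1" in
        *:*) ;;
        *) echo "not a host:container pair: '$1'" >&2; return 2 ;;
    esac
    host="${1%%:*}"        # text before the first colon
    container="${1#*:}"    # text after the first colon
    case "$container" in
        /*) echo "ok: $host -> $container" ;;
        *)  echo "WARNING: container path '$container' is relative (missing leading /)"
            return 1 ;;
    esac
}

# Usage (example mappings only):
#   check_mapping "/mnt/user/appdata/myapp:/config"
#   check_mapping "/mnt/user/appdata/myapp:config"
```

A case mismatch cannot be detected this way, since both spellings are valid paths; it has to be checked by eye against what the container expects.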

If you cannot spot the error then you can try:

  • Go to the Docker tab and click the Container size button. Often this will highlight the problem docker container(s) so you now know where to look.

If that is not enough to identify the culprit then:

  • Make sure all containers are stopped and not set to auto-start
  • Stop docker service
  • Delete the current docker image and set a more reasonable size (e.g. 20G)
  • Start docker service
  • Use Apps >>> Previous apps to re-install your containers (with all their settings intact).
  • Go to docker tab and click the Container size button
This will give you a starting point for the space each container is using.
  • Enable one container, let it run for a short while and then press the Container size button again to see if that particular container is consuming space when it runs.
  • Repeat the above step until you track down the rogue container(s)