Time |
Nickname |
Message |
00:04
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
00:12
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
00:26
🔗
|
|
ZexaronS has joined #archiveteam-bs |
00:31
🔗
|
|
Ravenloft has quit IRC () |
00:44
🔗
|
icedice |
imgbox.com and abload.de have a pretty good track record (though Imgbox announced they were shutting down a while ago and then retracted it later saying they "have partnered with a new team that have extensive experience in large-scale hosting") |
00:44
🔗
|
icedice |
but yeah, image hosts are dropping like flies |
00:44
🔗
|
icedice |
IPFS could maybe be a solution to that in the future |
00:45
🔗
|
Asparagir |
Just chiming in to say that I think doing much more regular scans of imgur would be peachy keen, |
00:46
🔗
|
icedice |
https://ipfs.io/ |
00:47
🔗
|
icedice |
Doing an !a archivation job of https://www.reddit.com/domain/imgur.com/ would be a great start |
00:48
🔗
|
joepie91 |
icedice: once again: IPFS *does not provide persistence* |
00:48
🔗
|
joepie91 |
there is absolutely zero guarantee that a copy of a given file will remain available |
00:48
🔗
|
icedice |
ok |
00:48
🔗
|
icedice |
didn't know that |
00:48
🔗
|
icedice |
first time discussing it here or anywhere else online for that matter |
00:48
🔗
|
joepie91 |
icedice: unfortunately IPFS markets itself as 'the permanent web', and per the authors 'permanent' is meant to refer to 'immutable', not 'persistent' |
00:48
🔗
|
joepie91 |
(which I still think is grossly misleading) |
00:49
🔗
|
joepie91 |
so I understand the confusion but I still want to point it out very clearly and unambiguously :P |
00:49
🔗
|
icedice |
yeah |
00:49
🔗
|
icedice |
ok |
00:49
🔗
|
joepie91 |
icedice: basically, think of IPFS as "if a filesystem were based on torrent technology" |
00:49
🔗
|
joepie91 |
IPFS is great if you understand its limitations; it's just not an archival medium nor a reliable hosting platform |
00:49
🔗
|
joepie91 |
and it doesn't implement any 'assure availability' mechanics like Freenet does |
00:50
🔗
|
joepie91 |
the moment there are no seeds, data is gone |
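A minimal sketch of the "immutable, not persistent" distinction made above. It assumes a toy in-memory store keyed by a plain SHA-256 hex digest. `ContentStore` and its methods are hypothetical names for illustration, and real IPFS addressing differs in detail.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: a blob's key is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so it can never point
        # at different bytes later (immutability).
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Raises KeyError once no holder has the blob any more.
        data = self._blobs[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its address")
        return data

    def drop(self, key: str) -> None:
        # Models the last "seed" going away: nothing forces anyone to keep data.
        self._blobs.pop(key, None)


store = ContentStore()
key = store.put(b"cat picture")
assert store.get(key) == b"cat picture"
store.drop(key)  # the address is still valid, but the data is gone
```

The address guarantees what you get if you get anything at all. It says nothing about whether anyone still holds a copy.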
00:50
🔗
|
icedice |
so it's kind of like Freenet minus the anonymity? |
00:52
🔗
|
icedice |
Have you guys crawled https://www.reddit.com/domain/imgur.com/ btw? |
00:55
🔗
|
icedice |
With some exclusion rules that limit the crawl to imgur.com it should do a pretty good job at archiving a lot of popular content from Imgur |
00:55
🔗
|
joepie91 |
icedice: it's *not* like Freenet at all :) |
00:56
🔗
|
joepie91 |
(that's half the point) |
00:56
🔗
|
joepie91 |
icedice: it's like torrents, if anything. |
00:56
🔗
|
icedice |
ok |
00:56
🔗
|
joepie91 |
has all the same technical characteristics |
00:56
🔗
|
joepie91 |
just more suitable for filesystem-y tasks |
00:56
🔗
|
icedice |
So maybe more like ZeroNet |
00:56
🔗
|
joepie91 |
but generally, any assumption that holds true for torrents also holds true for IPFS |
00:56
🔗
|
joepie91 |
I don't know enough about ZeroNet architecture to meaningfully answer that |
00:57
🔗
|
icedice |
https://zeronet.io/ |
00:57
🔗
|
icedice |
"Open, free and uncensorable websites, |
00:57
🔗
|
icedice |
using Bitcoin cryptography and BitTorrent network" |
00:57
🔗
|
icedice |
^ BitTorrent powered there as well |
00:57
🔗
|
joepie91 |
icedice: yes, but that's the marketing slogan, it doesn't tell me what its actual design or guarantees are :) |
00:58
🔗
|
icedice |
ok |
00:59
🔗
|
JAA |
icedice: !a https://www.reddit.com/domain/imgur.com/ wouldn't work. /domain pages are limited to 1000 results. |
00:59
🔗
|
JAA |
Same for the search, for that matter. |
01:00
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
01:00
🔗
|
JAA |
You can work around it by using the "cloudsearch" syntax and timestamps, but it's annoying. |
01:02
🔗
|
JAA |
And obviously, it won't cover any Imgur links used outside of Reddit. |
01:03
🔗
|
JAA |
But yes, it might be a good idea to start a low-priority project for this. We might be able to reuse some of the code from Eroshare for the link extraction part. |
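A rough sketch of the timestamp workaround mentioned above: split a time range into windows small enough that each stays under the 1000-result cap. The query shape (`site:` and `timestamp:A..B` with `syntax=cloudsearch`) is an assumption about how that search syntax was used and may not match what Reddit accepts. `window_queries` is a hypothetical helper.

```python
from urllib.parse import urlencode


def window_queries(start, end, step, site="imgur.com"):
    """Yield one search URL per time window between two Unix timestamps."""
    t = start
    while t < end:
        upper = min(t + step, end)
        # Assumed cloudsearch-style query; adjust to whatever the site accepts.
        query = f"(and site:'{site}' timestamp:{t}..{upper})"
        params = {"q": query, "syntax": "cloudsearch", "sort": "new", "limit": 100}
        yield "https://www.reddit.com/search?" + urlencode(params)
        t = upper


# One-day windows across one week (timestamps are arbitrary example values).
urls = list(window_queries(1500000000, 1500000000 + 7 * 86400, 86400))
```

If a window still returns the maximum number of results, it would need to be split further. That is presumably part of why this is annoying in practice.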
01:05
🔗
|
|
dashcloud has quit IRC (Ping timeout: 245 seconds) |
01:06
🔗
|
|
dashcloud has joined #archiveteam-bs |
01:21
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
01:25
🔗
|
|
fie has quit IRC (Ping timeout: 246 seconds) |
01:32
🔗
|
|
pizzaiolo has quit IRC (Remote host closed the connection) |
02:10
🔗
|
kisspunch |
is there any kind of standardized database/format for content-addressable data storage? |
02:11
🔗
|
kisspunch |
I know there's magnet links and IPFS and so on, but none of them seem either standard or interconnected? |
02:14
🔗
|
kisspunch |
I'm not talking distribution, just metadata/indexing/cross-references |
02:26
🔗
|
|
ZexaronS- has joined #archiveteam-bs |
02:32
🔗
|
|
ZexaronS- has quit IRC (Ping timeout: 260 seconds) |
02:32
🔗
|
|
ZexaronS- has joined #archiveteam-bs |
02:33
🔗
|
|
ZexaronS has quit IRC (Read error: Operation timed out) |
02:34
🔗
|
|
Odd0002 has quit IRC (Remote host closed the connection) |
02:34
🔗
|
odemg |
http://archivisthings.eieidoh.net:8880/DataHoarder/Comics/ |
02:35
🔗
|
|
ZexaronS- has quit IRC (Client Quit) |
02:36
🔗
|
|
ZexaronS has joined #archiveteam-bs |
02:44
🔗
|
|
ReimuHaku has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
02:44
🔗
|
|
ReimuHaku has joined #archiveteam-bs |
02:48
🔗
|
|
ReimuHaku has quit IRC (Client Quit) |
02:55
🔗
|
|
icedice has quit IRC (Read error: Operation timed out) |
02:56
🔗
|
|
SilSte has quit IRC (Read error: Operation timed out) |
02:57
🔗
|
|
ReimuHaku has joined #archiveteam-bs |
02:57
🔗
|
|
ReimuHaku has quit IRC (Client Quit) |
03:00
🔗
|
|
SilSte has joined #archiveteam-bs |
03:02
🔗
|
|
ReimuHaku has joined #archiveteam-bs |
03:49
🔗
|
|
qw3rty has joined #archiveteam-bs |
03:56
🔗
|
|
qw3rty2 has quit IRC (Read error: Operation timed out) |
04:29
🔗
|
|
BubuAnabe has quit IRC (Ping timeout: 268 seconds) |
04:33
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:36
🔗
|
|
BubuAnabe has joined #archiveteam-bs |
04:40
🔗
|
|
Sk1d has joined #archiveteam-bs |
05:02
🔗
|
|
zhongfu has joined #archiveteam-bs |
05:17
🔗
|
|
BubuAnabe has quit IRC (Ping timeout: 268 seconds) |
06:38
🔗
|
|
ZexaronS- has joined #archiveteam-bs |
06:40
🔗
|
|
ZexaronS has quit IRC (Read error: Operation timed out) |
06:45
🔗
|
|
Honno has joined #archiveteam-bs |
07:04
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
07:07
🔗
|
|
ZexaronS- has quit IRC (Read error: Operation timed out) |
07:08
🔗
|
|
ZexaronS has joined #archiveteam-bs |
07:12
🔗
|
|
Famicoman has joined #archiveteam-bs |
07:18
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
07:24
🔗
|
|
ZexaronS has joined #archiveteam-bs |
07:33
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
07:40
🔗
|
|
Famicoman has joined #archiveteam-bs |
07:46
🔗
|
godane |
so i'm up to 1995-06-30 with tagesschau 20 o'clock news |
07:57
🔗
|
|
kyounko has joined #archiveteam-bs |
08:03
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
08:10
🔗
|
|
Famicoman has joined #archiveteam-bs |
08:28
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
08:28
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
08:33
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
08:39
🔗
|
|
Famicoman has joined #archiveteam-bs |
08:47
🔗
|
godane |
just noticed that electronic gaming monthly went dark 36 days ago |
08:51
🔗
|
|
kristian_ has joined #archiveteam-bs |
09:00
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
09:05
🔗
|
|
kyounko|2 has joined #archiveteam-bs |
09:06
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
09:07
🔗
|
|
Famicoman has joined #archiveteam-bs |
09:08
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
09:11
🔗
|
|
kyounko has quit IRC (Read error: Operation timed out) |
09:11
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
09:31
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
09:36
🔗
|
|
Famicoman has joined #archiveteam-bs |
09:53
🔗
|
|
kyounko|2 has quit IRC (Read error: Connection reset by peer) |
09:59
🔗
|
|
SHODAN_UI has quit IRC (Remote host closed the connection) |
10:00
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
10:08
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:15
🔗
|
|
j08nY has joined #archiveteam-bs |
10:29
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
11:06
🔗
|
godane |
i'm uploading newer eric archive docs: https://archive.org/details/ERIC_ED565342 |
12:16
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
12:28
🔗
|
|
Honno has joined #archiveteam-bs |
12:41
🔗
|
|
kristian_ has joined #archiveteam-bs |
13:29
🔗
|
|
icedice has joined #archiveteam-bs |
13:52
🔗
|
arkiver |
odemg: http://archivisthings.eieidoh.net:8880/DataHoarder/Comics/ gives me a 403 |
13:53
🔗
|
odemg |
arkiver, server went down, I've redirected dns, just populating /DataHoarder/Comics as fast as I can |
13:54
🔗
|
arkiver |
thanks odemg |
13:56
🔗
|
odemg |
arkiver, 1.1TB of anime stuff in the mean time? http://archivisthings.eieidoh.net:8880/DataHoarder/ |
13:58
🔗
|
arkiver |
:) |
13:59
🔗
|
arkiver |
odemg: what is this VR Content? |
13:59
🔗
|
arkiver |
from the README |
13:59
🔗
|
odemg |
it was 1TB of VR related games etc mirrored from ultimategamer.club after the hack |
14:00
🔗
|
arkiver |
very nice |
14:00
🔗
|
arkiver |
definitely grabbing a copy of that |
14:01
🔗
|
odemg |
arkiver, I'll let you know when it's back up |
14:01
🔗
|
arkiver |
thanks |
14:11
🔗
|
HCross2 |
odemg: is that a complete Naruto collection? |
14:12
🔗
|
HCross2 |
I've been looking for this for a while |
14:12
🔗
|
odemg |
yes |
14:14
🔗
|
HCross2 |
Thank you so much |
14:15
🔗
|
odemg |
HCross2, get it as fast as you can :p |
14:20
🔗
|
HCross2 |
odemg: is there a nicer way than doing a wget -r? |
14:21
🔗
|
odemg |
feed aria the file list: aria2c -j 25 -c -i list |
14:22
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
14:23
🔗
|
odemg |
HCross2, http://archivisthings.eieidoh.net:8880/DataHoarder/Anime/Naruto%20Complete%20Series/list |
14:23
🔗
|
HCross2 |
tyvm |
14:24
🔗
|
odemg |
there you go, 50-70MB/s |
14:32
🔗
|
|
yaMatt has joined #archiveteam-bs |
14:33
🔗
|
|
yaMatt has quit IRC (Client Quit) |
14:46
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
14:50
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
14:52
🔗
|
|
Smiley has quit IRC (Read error: Connection reset by peer) |
14:52
🔗
|
|
Smiley has joined #archiveteam-bs |
14:53
🔗
|
|
Famicoman has joined #archiveteam-bs |
15:06
🔗
|
|
SHODAN_UI has quit IRC (Ping timeout: 255 seconds) |
15:07
🔗
|
|
kristian_ has quit IRC (Ping timeout: 370 seconds) |
15:08
🔗
|
|
winr4r has quit IRC (Remote host closed the connection) |
15:11
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
15:11
🔗
|
|
SHODAN_UI has quit IRC (Read error: Connection reset by peer) |
15:13
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
15:15
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
15:16
🔗
|
|
SHODAN_UI has quit IRC (Read error: Connection reset by peer) |
15:18
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
15:24
🔗
|
|
Famicoman has joined #archiveteam-bs |
15:31
🔗
|
|
dashcloud has quit IRC (Ping timeout: 260 seconds) |
15:34
🔗
|
|
dashcloud has joined #archiveteam-bs |
15:40
🔗
|
hook54321 |
Do any of you know how to install grab-site on archlinux? |
15:44
🔗
|
useretail |
hey guys, is there some tripod archive? |
15:45
🔗
|
useretail |
wayback says that it's excluded |
16:10
🔗
|
|
BubuAnabe has joined #archiveteam-bs |
16:35
🔗
|
odemg |
HCross2, anime and comics dirs updated |
16:36
🔗
|
HCross2 |
odemg: can you do me a favour and make a list of every URL please? |
16:37
🔗
|
HCross2 |
I'm going to mirror it to some HDDs locally |
16:37
🔗
|
HCross2 |
and I want to copy it to my own Online.net box first so I can let it download at its own pace |
16:38
🔗
|
Frogging |
hmm I wonder if I have space for any of this myself |
16:40
🔗
|
odemg |
HCross2, https://chrome.google.com/webstore/detail/link-grabber/caodelkhipncidmoebgbbeemedohcdma |
16:40
🔗
|
HCross2 |
ty |
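As an alternative to a browser extension, a URL list can be built from a directory index page with the standard library. This is a sketch that assumes the server returns a plain HTML index of `<a href>` links. `LinkCollector`, `list_urls`, and the example host are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href> tags of an index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip sort links ("?C=N;O=D") and the parent-directory entry.
        if href and not href.startswith(("?", "../")):
            self.links.append(urljoin(self.base_url, href))


def list_urls(index_html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(index_html)
    return collector.links


sample = '<a href="../">Parent</a><a href="ep%2001.mkv">ep 01.mkv</a><a href="sub/">sub/</a>'
urls = list_urls(sample, "http://example.invalid/Anime/")
list_file_text = "\n".join(urls) + "\n"  # one URL per line, as aria2c -i reads it
```

Entries ending in `/` are subdirectories and would need to be fetched and parsed in turn to get a complete list.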
16:42
🔗
|
|
simsy has joined #archiveteam-bs |
16:42
🔗
|
simsy |
hi |
16:46
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
16:55
🔗
|
|
RichardG has joined #archiveteam-bs |
16:55
🔗
|
|
RichardG_ has quit IRC (Read error: Connection reset by peer) |
17:11
🔗
|
hook54321 |
How do I import cookies into a grab-site/archivebot instance? |
17:19
🔗
|
|
BartoCH has joined #archiveteam-bs |
17:36
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
17:39
🔗
|
hook54321 |
i actually figured out the cookie thing. |
17:39
🔗
|
hook54321 |
For grab-site, what is the format of the ignore file like? |
17:42
🔗
|
|
Honno has joined #archiveteam-bs |
17:45
🔗
|
|
Famicoman has joined #archiveteam-bs |
17:46
🔗
|
|
simsy has quit IRC (Read error: Connection reset by peer) |
17:47
🔗
|
|
Ravenloft has joined #archiveteam-bs |
17:57
🔗
|
Aoede |
hook54321: https://github.com/ludios/grab-site/blob/master/libgrabsite/ignore_sets/forums |
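A sketch of how an ignore file of that kind can be applied, assuming the format is one regular expression per line and that a URL is skipped when any pattern matches. The patterns below are made up for illustration and are not taken from the linked file. `load_patterns` and `is_ignored` are hypothetical helpers, not grab-site functions.

```python
import re


def load_patterns(text):
    """Compile one regular expression per non-blank line."""
    return [re.compile(line) for line in text.splitlines() if line.strip()]


def is_ignored(url, patterns):
    """A URL is skipped if any pattern matches anywhere in it."""
    return any(p.search(url) for p in patterns)


# Hypothetical ignore file contents.
ignore_text = r"""
[?&]action=print
/search\.php
"""

patterns = load_patterns(ignore_text)
skip = is_ignored("http://example.invalid/forum/search.php?q=x", patterns)
keep = not is_ignored("http://example.invalid/forum/thread/42", patterns)
```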
17:58
🔗
|
hook54321 |
K. got that working. I imported a cookies.txt file, but it's not logged into the website for some reason. |
18:03
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
18:04
🔗
|
|
Ravenloft has quit IRC (Ping timeout: 250 seconds) |
18:05
🔗
|
JAA |
Different IP or user agent from when you logged in? |
18:07
🔗
|
hook54321 |
Useragent yeah. I'll try to set it to the same and see what happens. |
18:08
🔗
|
JAA |
Note that it's possible your session already got invalidated on the server side, so you may need to log in again. |
18:13
🔗
|
|
Famicoman has joined #archiveteam-bs |
18:17
🔗
|
hook54321 |
It just keeps on crashing about 3 or 4 urls in |
18:23
🔗
|
hook54321 |
https://gist.githubusercontent.com/hook54321a/71f8224b4e15d0ec23eb378f6474fcee/raw/eeada89d724f7941bf3708b31509905cc2d3aac2/gistfile1.txt |
18:34
🔗
|
|
SHODAN_UI has quit IRC (Remote host closed the connection) |
18:51
🔗
|
kisspunch |
hook54321: please make an arch grab-site package :) |
19:06
🔗
|
hook54321 |
kisspunch: If there were one, I wouldn't be trying to run it through the Ubuntu Windows bash thing. |
19:06
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
19:07
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
19:10
🔗
|
kisspunch |
i have no idea what you're trying to describe but it sounds horrifying |
19:10
🔗
|
kisspunch |
learn to make packages, it's pretty easy |
19:10
🔗
|
kisspunch |
go read a random PKGBUILD |
19:11
🔗
|
hook54321 |
I did get through part of the installation process, but then it said something about missing OpenSSL libraries. |
19:11
🔗
|
kisspunch |
yeah, you'd have to manage the manual installation process as step 1 |
20:18
🔗
|
|
Honno has joined #archiveteam-bs |
20:25
🔗
|
|
marvinw is now known as ivan |
20:25
🔗
|
ivan |
hook54321: segfault might imply a problem with lmdb, try grab-site --no-dupespotter |
20:26
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
20:33
🔗
|
hook54321 |
I think it's working now. Thank you so much |
20:33
🔗
|
ivan |
cool |
20:41
🔗
|
HCross2 |
I'm using grab-site for some pretty huge crawls and it's coping really well |
20:41
🔗
|
HCross2 |
In fact, I'm currently capturing every .london homepage and it's not falling over |
20:41
🔗
|
jrwr |
Nice |
20:42
🔗
|
HCross2 |
I split it in 6 in case it did have issues |
20:42
🔗
|
HCross2 |
but each pack is still around 15k homepages |
20:42
🔗
|
HCross2 |
plus whatever other assets it needs |
20:42
🔗
|
jrwr |
HCross2: I'm looking to make a Tor Version of ArchiveBot |
20:42
🔗
|
HCross2 |
oh nice |
20:42
🔗
|
jrwr |
I just need something with Diskspace, all I have access to is 50GB |
20:42
🔗
|
HCross2 |
Can the wayback handle .onion sites? |
20:43
🔗
|
jrwr |
I think so |
20:43
🔗
|
jrwr |
even then, archive now, worry about it later |
20:43
🔗
|
HCross2 |
jrwr: use your 50GB as a testbed, but talk to me when you have it working |
20:44
🔗
|
jrwr |
I had one setup |
20:45
🔗
|
jrwr |
pretty easy, just do Tor in a transparent method |
20:45
🔗
|
jrwr |
abused LXC a little to do it as well |
20:45
🔗
|
hook54321 |
I'm running it through the Ubuntu bash thing in Windows 10... Which probably has something to do with it. |
20:48
🔗
|
Frogging |
use a VM or actual linux |
20:50
🔗
|
HCross2 |
hook54321: Can you send me a warc from your Windows 10 setup please? I would like to run a few validation checks on it |
21:19
🔗
|
|
bmcginty has quit IRC (Ping timeout: 250 seconds) |
21:21
🔗
|
|
bmcginty has joined #archiveteam-bs |
21:47
🔗
|
JAA |
6 days into my Tilt API grab: 4.36M URLs retrieved for 11.5 GiB of warc.gz, 5.87M queued (rising again, unfortunately); 779k users, 104k campaigns, 1.67M URLs discovered |
22:12
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
22:14
🔗
|
|
j08nY has quit IRC (Read error: Operation timed out) |
22:14
🔗
|
|
j08nY has joined #archiveteam-bs |
22:26
🔗
|
|
SHODAN_UI has quit IRC (Remote host closed the connection) |
22:30
🔗
|
Frogging |
I freaked out briefly because I found a corrupted photo on my NAS despite the RAID check telling me everything was fine |
22:31
🔗
|
Frogging |
turns out it was corrupted at the source. phew |
22:33
🔗
|
Frogging |
the source being an old external HDD. it's a good thing I cloned that disk when I did because clearly it wasn't trustworthy |
22:33
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
22:39
🔗
|
|
mundus201 has joined #archiveteam-bs |
22:40
🔗
|
|
Famicoman has joined #archiveteam-bs |
23:09
🔗
|
hook54321 |
HCross2: It's not done yet. |
23:09
🔗
|
hook54321 |
What are validation checks? |
23:23
🔗
|
|
BubuAnabe has quit IRC (Ping timeout: 268 seconds) |
23:25
🔗
|
|
Ravenloft has joined #archiveteam-bs |
23:28
🔗
|
joepie91 |
Frogging: obligatory "RAID is an availability measure, not an integrity measure" |
23:28
🔗
|
joepie91 |
(ie. not a backup) |
23:29
🔗
|
Frogging |
oh I know, I just use it in my NAS, which I use to back up my PC. I was comparing the files in my PC with those on the NAS. but I still run a monthly check just to catch anything odd |
23:30
🔗
|
|
BubuAnabe has joined #archiveteam-bs |
23:30
🔗
|
Frogging |
the comparison led me to believe corruption occurred on the NAS but really it was because I was comparing my PC with a backup of a backup that got corrupted long ago |
23:31
🔗
|
Frogging |
if that sounds dumb it's because it is, and that's why I'm sorting all this stuff out so it can actually make sense :p |
23:33
🔗
|
Frogging |
I ran rsync with -ni and saw this |
23:33
🔗
|
Frogging |
<fc........ Panorama 1.JPG |
23:34
🔗
|
Frogging |
the checksum changing but not the size or the time is a red flag :p |
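In rsync's itemized output, `c` flags a differing checksum, while `s` and `t` would flag size and time changes. The same check can be sketched in Python: report a pair of files whose size and modification time agree but whose contents do not. `sha256_of` and `silently_differs` are hypothetical helper names.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def silently_differs(path_a, path_b):
    """True when size and mtime match but the file contents do not."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    same_metadata = (
        stat_a.st_size == stat_b.st_size
        and int(stat_a.st_mtime) == int(stat_b.st_mtime)
    )
    return same_metadata and sha256_of(path_a) != sha256_of(path_b)
```

A normal edit changes the size or the timestamp. A content change with neither suggests bit rot or a bad copy, and the check cannot tell which side is the damaged one.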
23:56
🔗
|
|
pizzaiolo has quit IRC (Remote host closed the connection) |
23:59
🔗
|
|
pizzaiolo has joined #archiveteam-bs |