#archiveteam-bs 2017-07-01,Sat

Time Nickname Message
00:04 🔗 kristian_ has quit IRC (Quit: Leaving)
00:12 🔗 ZexaronS has quit IRC (Quit: Leaving)
00:26 🔗 ZexaronS has joined #archiveteam-bs
00:31 🔗 Ravenloft has quit IRC ()
00:44 🔗 icedice imgbox.com and abload.de have a pretty good track record (though Imgbox announced they were shutting down a while ago and then retracted it later saying they "have partnered with a new team that have extensive experience in large-scale hosting")
00:44 🔗 icedice but yeah, image hosts are dropping like flies
00:44 🔗 icedice IPFS could maybe be a solution to that in the future
00:45 🔗 Asparagir Just chiming in to say that I think doing much more regular scans of imgur would be peachy keen,
00:46 🔗 icedice https://ipfs.io/
00:47 🔗 icedice Doing an !a archivation job of https://www.reddit.com/domain/imgur.com/ would be a great start
00:48 🔗 joepie91 icedice: once again: IPFS *does not provide persistence*
00:48 🔗 joepie91 there is absolutely zero guarantee that a copy of a given file will remain available
00:48 🔗 icedice ok
00:48 🔗 icedice didn't know that
00:48 🔗 icedice first time discussing it here or anywhere else online for that matter
00:48 🔗 joepie91 icedice: unfortunately IPFS markets itself as 'the permanent web', and per the authors 'permanent' is meant to refer to 'immutable', not 'persistent'
00:48 🔗 joepie91 (which I still think is grossly misleading)
00:49 🔗 joepie91 so I understand the confusion but I still want to point it out very clearly and unambiguously :P
00:49 🔗 icedice yeah
00:49 🔗 icedice ok
00:49 🔗 joepie91 icedice: basically, think of IPFS as "if a filesystem were based on torrent technology"
00:49 🔗 joepie91 IPFS is great if you understand its limitations; it's just not an archival medium nor a reliable hosting platform
00:49 🔗 joepie91 and it doesn't implement any 'assure availability' mechanics like Freenet does
00:50 🔗 joepie91 the moment there are no seeds, data is gone
00:50 🔗 icedice so it's kind of like Freenet minus the anonymity?
00:52 🔗 icedice Have you guys crawled https://www.reddit.com/domain/imgur.com/ btw?
00:55 🔗 icedice With some exclusion rules that limit the crawl to imgur.com it should do a pretty good job at archiving a lot of popular content from Imgur
00:55 🔗 joepie91 icedice: it's *not* like Freenet at all :)
00:56 🔗 joepie91 (that's half the point)
00:56 🔗 joepie91 icedice: it's like torrents, if anything.
00:56 🔗 icedice ok
00:56 🔗 joepie91 has all the same technical characteristics
00:56 🔗 joepie91 just more suitable for filesystem-y tasks
00:56 🔗 icedice So maybe more like ZeroNet
00:56 🔗 joepie91 but generally, any assumption that holds true for torrents also holds true for IPFS
00:56 🔗 joepie91 I don't know enough about ZeroNet architecture to meaningfully answer that
00:57 🔗 icedice https://zeronet.io/
00:57 🔗 icedice "Open, free and uncensorable websites,
00:57 🔗 icedice using Bitcoin cryptography and BitTorrent network"
00:57 🔗 icedice ^ BitTorrent powered there as well
00:57 🔗 joepie91 icedice: yes, but that's the marketing slogan, it doesn't tell me what its actual design or guarantees are :)
00:58 🔗 icedice ok
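
The "immutable, not persistent" distinction joepie91 draws above can be illustrated with a toy content-addressed store. This is a minimal sketch, not IPFS itself: `ContentStore` and `fetch` are hypothetical names, a plain SHA-256 hex digest stands in for a real IPFS content identifier, and each store instance plays the role of one peer (a "seed").

```python
import hashlib


class ContentStore:
    """Toy peer: stores blobs keyed by the SHA-256 of their bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so it can never point
        # at different bytes later (immutability).
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str):
        return self._blobs.get(address)

    def drop(self, address: str) -> None:
        # A peer is free to discard data at any time.
        self._blobs.pop(address, None)


def fetch(address: str, peers):
    """Return the blob from the first peer that still holds it, else None."""
    for peer in peers:
        data = peer.get(address)
        # Verify the bytes against the address before trusting them.
        if data is not None and hashlib.sha256(data).hexdigest() == address:
            return data
    return None  # Address is still valid, but nobody has the data.
```

Once every peer has called `drop`, `fetch` returns `None` even though the address itself is unchanged, which is the torrent-like behaviour described in the conversation: no seeds, no data.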
00:59 🔗 JAA icedice: !a https://www.reddit.com/domain/imgur.com/ wouldn't work. /domain pages are limited to 1000 results.
00:59 🔗 JAA Same for the search, for that matter.
01:00 🔗 BlueMaxim has joined #archiveteam-bs
01:00 🔗 JAA You can work around it by using the "cloudsearch" syntax and timestamps, but it's annoying.
01:02 🔗 JAA And obviously, it won't cover any Imgur links used outside of Reddit.
01:03 🔗 JAA But yes, it might be a good idea to start a low-priority project for this. We might be able to reuse some of the code from Eroshare for the link extraction part.
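
The timestamp workaround JAA mentions amounts to slicing the time range into windows small enough that each one stays under the 1000-result cap. A rough sketch of that slicing follows. The helper names are hypothetical, and the query string format in `window_query` is an assumption about how a "cloudsearch" timestamp query might look; it is not taken from the log and should be checked against Reddit's own documentation.

```python
from datetime import datetime, timedelta, timezone


def time_windows(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    current = start
    while current < end:
        nxt = min(current + step, end)
        yield current, nxt
        current = nxt


def window_query(domain, window):
    """Build a search query for one window.

    ASSUMPTION: the "(and site:... timestamp:LO..HI)" shape is illustrative
    only and may not match what the search endpoint actually accepts.
    """
    lo, hi = (int(t.timestamp()) for t in window)
    return f"(and site:'{domain}' timestamp:{lo}..{hi})"


if __name__ == "__main__":
    day_start = datetime(2017, 7, 1, tzinfo=timezone.utc)
    day_end = day_start + timedelta(days=1)
    for w in time_windows(day_start, day_end, timedelta(hours=6)):
        print(window_query("imgur.com", w))
```

If any single window still returns the full 1000 results, it would need to be split again with a smaller `step`, since results beyond the cap are silently missing.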
01:05 🔗 dashcloud has quit IRC (Ping timeout: 245 seconds)
01:06 🔗 dashcloud has joined #archiveteam-bs
01:21 🔗 j08nY has quit IRC (Quit: Leaving)
01:25 🔗 fie has quit IRC (Ping timeout: 246 seconds)
01:32 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
02:10 🔗 kisspunch is there any kind of standardized database/format for content-addressable data storage
02:11 🔗 kisspunch I know there's magnet links and IPFS and so on, but none of them seem either standard or interconnected?
02:14 🔗 kisspunch I'm not talking distribution, just metadata/indexing/cross-references
02:26 🔗 ZexaronS- has joined #archiveteam-bs
02:32 🔗 ZexaronS- has quit IRC (Ping timeout: 260 seconds)
02:32 🔗 ZexaronS- has joined #archiveteam-bs
02:33 🔗 ZexaronS has quit IRC (Read error: Operation timed out)
02:34 🔗 Odd0002 has quit IRC (Remote host closed the connection)
02:34 🔗 odemg http://archivisthings.eieidoh.net:8880/DataHoarder/Comics/
02:35 🔗 ZexaronS- has quit IRC (Client Quit)
02:36 🔗 ZexaronS has joined #archiveteam-bs
02:44 🔗 ReimuHaku has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
02:44 🔗 ReimuHaku has joined #archiveteam-bs
02:48 🔗 ReimuHaku has quit IRC (Client Quit)
02:55 🔗 icedice has quit IRC (Read error: Operation timed out)
02:56 🔗 SilSte has quit IRC (Read error: Operation timed out)
02:57 🔗 ReimuHaku has joined #archiveteam-bs
02:57 🔗 ReimuHaku has quit IRC (Client Quit)
03:00 🔗 SilSte has joined #archiveteam-bs
03:02 🔗 ReimuHaku has joined #archiveteam-bs
03:49 🔗 qw3rty has joined #archiveteam-bs
03:56 🔗 qw3rty2 has quit IRC (Read error: Operation timed out)
04:29 🔗 BubuAnabe has quit IRC (Ping timeout: 268 seconds)
04:33 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:36 🔗 BubuAnabe has joined #archiveteam-bs
04:40 🔗 Sk1d has joined #archiveteam-bs
05:02 🔗 zhongfu has joined #archiveteam-bs
05:17 🔗 BubuAnabe has quit IRC (Ping timeout: 268 seconds)
06:38 🔗 ZexaronS- has joined #archiveteam-bs
06:40 🔗 ZexaronS has quit IRC (Read error: Operation timed out)
06:45 🔗 Honno has joined #archiveteam-bs
07:04 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
07:07 🔗 ZexaronS- has quit IRC (Read error: Operation timed out)
07:08 🔗 ZexaronS has joined #archiveteam-bs
07:12 🔗 Famicoman has joined #archiveteam-bs
07:18 🔗 ZexaronS has quit IRC (Quit: Leaving)
07:24 🔗 ZexaronS has joined #archiveteam-bs
07:33 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
07:40 🔗 Famicoman has joined #archiveteam-bs
07:46 🔗 godane so i'm up to 1995-06-30 with tagesschau 20 o'clock news
07:57 🔗 kyounko has joined #archiveteam-bs
08:03 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
08:10 🔗 Famicoman has joined #archiveteam-bs
08:28 🔗 BlueMaxim has quit IRC (Quit: Leaving)
08:28 🔗 BlueMaxim has joined #archiveteam-bs
08:33 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
08:39 🔗 Famicoman has joined #archiveteam-bs
08:47 🔗 godane just noticed that electronic gaming monthly went dark 36 days ago
08:51 🔗 kristian_ has joined #archiveteam-bs
09:00 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
09:05 🔗 kyounko|2 has joined #archiveteam-bs
09:06 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
09:07 🔗 Famicoman has joined #archiveteam-bs
09:08 🔗 BlueMaxim has joined #archiveteam-bs
09:11 🔗 kyounko has quit IRC (Read error: Operation timed out)
09:11 🔗 SHODAN_UI has joined #archiveteam-bs
09:31 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
09:36 🔗 Famicoman has joined #archiveteam-bs
09:53 🔗 kyounko|2 has quit IRC (Read error: Connection reset by peer)
09:59 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
10:00 🔗 kristian_ has quit IRC (Quit: Leaving)
10:08 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:15 🔗 j08nY has joined #archiveteam-bs
10:29 🔗 Honno has quit IRC (Read error: Operation timed out)
11:06 🔗 godane i'm uploading newer eric archive docs: https://archive.org/details/ERIC_ED565342
12:16 🔗 SHODAN_UI has joined #archiveteam-bs
12:28 🔗 Honno has joined #archiveteam-bs
12:41 🔗 kristian_ has joined #archiveteam-bs
13:29 🔗 icedice has joined #archiveteam-bs
13:52 🔗 arkiver odemg: http://archivisthings.eieidoh.net:8880/DataHoarder/Comics/ gives me a 403
13:53 🔗 odemg arkiver, server went down, I've redirected dns, just populating /DataHoarder/Comics as fast as I can
13:54 🔗 arkiver thanks odemg
13:56 🔗 odemg arkiver, 1.1TB of anime stuff in the mean time? http://archivisthings.eieidoh.net:8880/DataHoarder/
13:58 🔗 arkiver :)
13:59 🔗 arkiver odemg: what is this VR Content?
13:59 🔗 arkiver from the README
13:59 🔗 odemg it was 1TB of VR related games etc mirrored from ultimategamer.club after the hack
14:00 🔗 arkiver very nice
14:00 🔗 arkiver definitely grabbing a copy of that
14:01 🔗 odemg arkiver, I'll let you know when it's back up
14:01 🔗 arkiver thanks
14:11 🔗 HCross2 odemg: is that a complete Naruto collection?
14:12 🔗 HCross2 I've been looking for this for a while
14:12 🔗 odemg yes
14:14 🔗 HCross2 Thank you so much
14:15 🔗 odemg HCross2, get it as fast as you can :p
14:20 🔗 HCross2 odemg: is there a nicer way than doing a wget -r?
14:21 🔗 odemg feed aria the file list: aria2c -j 25 -c -i list
14:22 🔗 pizzaiolo has joined #archiveteam-bs
14:23 🔗 odemg HCross2, http://archivisthings.eieidoh.net:8880/DataHoarder/Anime/Naruto%20Complete%20Series/list
14:23 🔗 HCross2 tyvm
14:24 🔗 odemg there you go, 50-70MB/s
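
A file list like the one odemg serves can also be built from a directory index page. The sketch below is one possible approach using only the Python standard library. `LinkCollector` and `file_urls` are hypothetical names, and it assumes a simple auto-generated index where files are plain `<a href>` links and subdirectories end in `/`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of file links found in a directory index."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Skip sort links ("?C=N;O=D"), the parent directory, and
        # subdirectories; this sketch does not recurse.
        if href.startswith(("?", "../")) or href.endswith("/"):
            return
        self.links.append(urljoin(self.base_url, href))


def file_urls(index_html, base_url):
    """Return the file URLs linked from one index page, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(index_html)
    return parser.links
```

Writing the returned URLs one per line to a file named `list` gives something that can be passed to the `aria2c -j 25 -c -i list` command quoted above.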
14:32 🔗 yaMatt has joined #archiveteam-bs
14:33 🔗 yaMatt has quit IRC (Client Quit)
14:46 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
14:50 🔗 Honno has quit IRC (Read error: Operation timed out)
14:52 🔗 Smiley has quit IRC (Read error: Connection reset by peer)
14:52 🔗 Smiley has joined #archiveteam-bs
14:53 🔗 Famicoman has joined #archiveteam-bs
15:06 🔗 SHODAN_UI has quit IRC (Ping timeout: 255 seconds)
15:07 🔗 kristian_ has quit IRC (Ping timeout: 370 seconds)
15:08 🔗 winr4r has quit IRC (Remote host closed the connection)
15:11 🔗 SHODAN_UI has joined #archiveteam-bs
15:11 🔗 SHODAN_UI has quit IRC (Read error: Connection reset by peer)
15:13 🔗 SHODAN_UI has joined #archiveteam-bs
15:15 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
15:16 🔗 SHODAN_UI has quit IRC (Read error: Connection reset by peer)
15:18 🔗 SHODAN_UI has joined #archiveteam-bs
15:24 🔗 Famicoman has joined #archiveteam-bs
15:31 🔗 dashcloud has quit IRC (Ping timeout: 260 seconds)
15:34 🔗 dashcloud has joined #archiveteam-bs
15:40 🔗 hook54321 Do any of you know how to install grab-site on archlinux?
15:44 🔗 useretail hey guys, is there some tripod archive?
15:45 🔗 useretail wayback says that it's excluded
16:10 🔗 BubuAnabe has joined #archiveteam-bs
16:35 🔗 odemg HCross2, anime and comics dirs updated
16:36 🔗 HCross2 odemg: can you do me a favour and make a list of every URL please?
16:37 🔗 HCross2 Im going to mirror it to some HDDs locally
16:37 🔗 HCross2 and I want to copy it to my own Online.net box first so I can let it download at its own pace
16:38 🔗 Frogging hmm I wonder if I have space for any of this myself
16:40 🔗 odemg HCross2, https://chrome.google.com/webstore/detail/link-grabber/caodelkhipncidmoebgbbeemedohcdma
16:40 🔗 HCross2 ty
16:42 🔗 simsy has joined #archiveteam-bs
16:42 🔗 simsy hi
16:46 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
16:55 🔗 RichardG has joined #archiveteam-bs
16:55 🔗 RichardG_ has quit IRC (Read error: Connection reset by peer)
17:11 🔗 hook54321 How do I import cookies into a grab-site/archivebot instance?
17:19 🔗 BartoCH has joined #archiveteam-bs
17:36 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
17:39 🔗 hook54321 i actually figured out the cookie thing.
17:39 🔗 hook54321 For grab-site, what is the format of the ignore file like?
17:42 🔗 Honno has joined #archiveteam-bs
17:45 🔗 Famicoman has joined #archiveteam-bs
17:46 🔗 simsy has quit IRC (Read error: Connection reset by peer)
17:47 🔗 Ravenloft has joined #archiveteam-bs
17:57 🔗 Aoede hook54321: https://github.com/ludios/grab-site/blob/master/libgrabsite/ignore_sets/forums
17:58 🔗 hook54321 K. got that working. I imported a cookies.txt file, but it's not logged into the website for some reason.
18:03 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
18:04 🔗 Ravenloft has quit IRC (Ping timeout: 250 seconds)
18:05 🔗 JAA Different IP or user agent from when you logged in?
18:07 🔗 hook54321 Useragent yeah. I'll try to set it to the same and see what happens.
18:08 🔗 JAA Note that it's possible your session already got invalidated on the server side, so you may need to log in again.
18:13 🔗 Famicoman has joined #archiveteam-bs
18:17 🔗 hook54321 It just keeps on crashing about 3 or 4 urls in
18:23 🔗 hook54321 https://gist.githubusercontent.com/hook54321a/71f8224b4e15d0ec23eb378f6474fcee/raw/eeada89d724f7941bf3708b31509905cc2d3aac2/gistfile1.txt
18:34 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
18:51 🔗 kisspunch hook54321: please make an arch grab-site package :)
19:06 🔗 hook54321 kisspunch: If there were one, I wouldn't be trying to run it through the Ubuntu Windows bash thing.
19:06 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
19:07 🔗 Honno has quit IRC (Read error: Operation timed out)
19:10 🔗 kisspunch i have no idea what you're trying to describe but it sounds horrifying
19:10 🔗 kisspunch learn to make packages, it's pretty easy
19:10 🔗 kisspunch go read a random PKGBUILD
19:11 🔗 hook54321 I did get through part of the installation process, but then it said something about missing OpenSSL libraries.
19:11 🔗 kisspunch yeah, you'd have to manage the manual installation process as step 1
20:18 🔗 Honno has joined #archiveteam-bs
20:25 🔗 marvinw is now known as ivan
20:25 🔗 ivan hook54321: segfault might imply a problem with lmdb, try grab-site --no-dupespotter
20:26 🔗 SHODAN_UI has joined #archiveteam-bs
20:33 🔗 hook54321 I think it's working now. Thank you so much
20:33 🔗 ivan cool
20:41 🔗 HCross2 I'm using grab-site for some pretty huge crawls and it's coping really well
20:41 🔗 HCross2 In fact, I'm currently capturing every .london homepage and it's not falling over
20:41 🔗 jrwr Nice
20:42 🔗 HCross2 I split it in 6 in case it did have issues
20:42 🔗 HCross2 but each pack is still around 15k homepages
20:42 🔗 HCross2 plus whatever other assets it needs
20:42 🔗 jrwr HCross2: Im looking to make a Tor Version of ArchiveBot
20:42 🔗 HCross2 oh nice
20:42 🔗 jrwr I just need something with Diskspace, all I have access to is 50GB
20:42 🔗 HCross2 Can the wayback handle .onion sites?
20:43 🔗 jrwr I think so
20:43 🔗 jrwr even then, archive now, worry about it later
20:43 🔗 HCross2 jrwr: use your 50GB as a testbed, but talk to me when you have it working
20:44 🔗 jrwr I had one setup
20:45 🔗 jrwr pretty easy, just do Tor in a transparent method
20:45 🔗 jrwr abused LXC a little to do it as well
20:45 🔗 hook54321 I'm running it through the Ubuntu bash thing in Windows 10... Which probably has something to do with it.
20:48 🔗 Frogging use a VM or actual linux
20:50 🔗 HCross2 hook54321: Can you send me a warc from your Windows 10 setup please? I would like to run a few validation checks on it
21:19 🔗 bmcginty has quit IRC (Ping timeout: 250 seconds)
21:21 🔗 bmcginty has joined #archiveteam-bs
21:47 🔗 JAA 6 days into my Tilt API grab: 4.36M URLs retrieved for 11.5 GiB of warc.gz, 5.87M queued (rising again, unfortunately); 779k users, 104k campaigns, 1.67M URLs discovered
22:12 🔗 Honno has quit IRC (Read error: Operation timed out)
22:14 🔗 j08nY has quit IRC (Read error: Operation timed out)
22:14 🔗 j08nY has joined #archiveteam-bs
22:26 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
22:30 🔗 Frogging I freaked out briefly because I found a corrupted photo on my NAS despite the RAID check telling me everything was fine
22:31 🔗 Frogging turns out it was corrupted at the source. phew
22:33 🔗 Frogging the source being an old external HDD. it's a good thing I cloned that disk when I did because clearly it wasn't trustworthy
22:33 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
22:39 🔗 mundus201 has joined #archiveteam-bs
22:40 🔗 Famicoman has joined #archiveteam-bs
23:09 🔗 hook54321 HCross2: It's not done yet.
23:09 🔗 hook54321 What are validation checks?
23:23 🔗 BubuAnabe has quit IRC (Ping timeout: 268 seconds)
23:25 🔗 Ravenloft has joined #archiveteam-bs
23:28 🔗 joepie91 Frogging: obligatory "RAID is an availability measure, not an integrity measure"
23:28 🔗 joepie91 (ie. not a backup)
23:29 🔗 Frogging oh I know, I just use it in my NAS, which I use to back up my PC. I was comparing the files in my PC with those on the NAS. but I still run a monthly check just to catch anything odd
23:30 🔗 BubuAnabe has joined #archiveteam-bs
23:30 🔗 Frogging the comparison led me to believe corruption occurred on the NAS but really it was because I was comparing my PC with a backup of a backup that got corrupted long ago
23:31 🔗 Frogging if that sounds dumb it's because it is, and that's why I'm sorting all this stuff out so it can actually make sense :p
23:33 🔗 Frogging I ran rsync with -ni and saw this
23:33 🔗 Frogging <fc........ Panorama 1.JPG
23:34 🔗 Frogging the checksum changing but not the size or the time is a red flag :p
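
The itemized line Frogging pasted can be decoded mechanically. The following is a small sketch, with hypothetical helper names, that reads the 11-character flag field documented for rsync's `--itemize-changes` output (`YXcstpoguax`) and flags the "checksum differs, size and time do not" case. As far as the rsync manual describes it, the `c` flag on a regular file is only reported when `--checksum` is in effect, so this assumes the comparison was run that way.

```python
# Positions in the YXcstpoguax flag field for the attributes of interest.
ATTRS = {2: "checksum", 3: "size", 4: "mtime", 5: "perms", 6: "owner", 7: "group"}


def changed_attrs(line):
    """Split one `rsync -i` line into (path, list of changed attributes)."""
    # Slice by position rather than splitting on spaces: the flag field
    # can itself contain spaces, and paths often do.
    flags, path = line[:11], line[12:]
    changed = [name for pos, name in ATTRS.items() if flags[pos] not in ". "]
    return path, changed


def silent_content_change(line):
    """True if content differs while size and mtime are both unchanged."""
    _, changed = changed_attrs(line)
    return (
        "checksum" in changed
        and "size" not in changed
        and "mtime" not in changed
    )
```

Applied to `<fc........ Panorama 1.JPG`, `changed_attrs` reports only `checksum` for the path `Panorama 1.JPG`, which matches the red flag described above.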
23:56 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
23:59 🔗 pizzaiolo has joined #archiveteam-bs
