#archiveteam-bs 2016-12-28,Wed


Time Nickname Message
00:01 🔗 HCross hook54321, files are safe on the mirrors
00:01 🔗 HCross just my torrent box
00:01 🔗 HCross all I need to do is redownload to my torrent server
00:03 🔗 hook54321 Which mirrors? The cm ones?
00:03 🔗 HCross Right. I have a plan. It's 00:02 here. I am going to set the files to redownload from ItsYoda, get some sleep and then work on this tomorrow
00:03 🔗 hook54321 Does ItsYoda have the forums too?
00:03 🔗 HCross yes
00:03 🔗 hook54321 ok, we're safe then
00:08 🔗 HCross hook54321, download is running. Will update in the morning. Goodnight all
00:11 🔗 hook54321 goodnight
00:12 🔗 arkiver goodnight HCross
01:08 🔗 siemensak has quit IRC (Quit: Page closed)
01:58 🔗 hook54321 I remember someone mentioning a huge URL database here a while ago but I can't seem to find it now, anyone know what it was?
02:01 🔗 Specular has joined #archiveteam-bs
02:04 🔗 hook54321 Nevermind, found it: https://commoncrawl.org/
02:07 🔗 hook54321 "The organization's crawlers respect nofollow and robots.txt policies." pffftt :/
02:08 🔗 wp494 dashcloud: srsly.de's up, if you want to go see what I was talking about yesterday
02:11 🔗 Specular no FAQ on how they differ from other archiving efforts
02:11 🔗 DedSec yea kind of strange
02:12 🔗 DedSec they should at least have an FAQ
02:12 🔗 hook54321 I don't get it, what is their purpose?
02:15 🔗 hook54321 It looks like a bunch of screenshots, mostly of terminals
02:16 🔗 hook54321 wp494
02:17 🔗 VADemon what kind of terminals though
02:17 🔗 Specular using their primitive index searches found on their blog (seemingly the only links to the crawls) doesn't seem to contain much
02:18 🔗 hook54321 They have a blog? Where?
02:18 🔗 Specular http://commoncrawl.org/connect/blog/
02:18 🔗 DedSec but can you set it up lol
02:19 🔗 hook54321 Specular: Oh, I thought you were talking about srsly.de
02:20 🔗 Specular oh. Srsly.de looks like some site that displays exposed VNC terminals.
02:20 🔗 wp494 ^^^
02:20 🔗 hook54321 Specular: You can search it here: http://urlsearch.commoncrawl.org/
02:20 🔗 hook54321 Specular, wp494 : "Failed to connect to server (code: 1006)"
02:21 🔗 Specular hook54321, where did you even find that link btw?
02:21 🔗 hook54321 Specular: https://duckduckgo.com/?q=commoncrawl+search+urls&ia=web
02:21 🔗 hook54321 Some search like that
02:22 🔗 wp494 hook54321: yeah I've been getting that when attempting to view
02:22 🔗 wp494 it's early days at C3 so maybe it'll get fixed
02:22 🔗 hook54321 C3?
02:23 🔗 wp494 chaos communication congress
02:23 🔗 wp494 run by germany's chaos computer club
02:23 🔗 wp494 more or less euro DEFCON if you don't count Black Hat
02:23 🔗 hook54321 Specular: They don't have anything for archiveteam.org, so it's not very big :/
02:25 🔗 Specular they state they crawled the top million domains from Alexa but even so there aren't actually many results within it, yeah
02:26 🔗 hook54321 Are they just trying to create a database of tons of urls or are they trying to crawl them too?
02:27 🔗 hook54321 *archive them too?
02:28 🔗 Specular clicking on some of the results in my search it appears they archive the raw HTML, but since it doesn't display the actual page I'm not sure whether any images or other dependent files are archived or if it's just a copy of the source HTML as-is
02:29 🔗 Specular been in operation since 2010 according to the earliest blog result
03:39 🔗 pizzaiolo has left
03:40 🔗 Specular has quit IRC (Ping timeout: 633 seconds)
03:50 🔗 VADemon has quit IRC (Quit: left4dead)
04:22 🔗 hook54321 I need help solving this: "prove you are not a bot to ten decimal places"
04:41 🔗 Somebody2 BTW, back on Dec 21st, someone made an account on the archiveteam wiki and shared it on bugmenot. I've now randomized the password for it; if someone can think of a good use for such a public/anonymous account, speak up.
04:46 🔗 Somebody2 Ah, I see someone did use the bugmenot account to update the status of a French ISP's hosting. Thanks for doing that, in any case.
05:18 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:25 🔗 Sk1d has joined #archiveteam-bs
06:49 🔗 wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
08:26 🔗 Specular has joined #archiveteam-bs
08:29 🔗 Specular_ has joined #archiveteam-bs
08:33 🔗 Specular has quit IRC (Ping timeout: 370 seconds)
08:38 🔗 Specular_ has quit IRC (Ping timeout: 370 seconds)
08:39 🔗 Specular_ has joined #archiveteam-bs
09:17 🔗 GE has joined #archiveteam-bs
09:22 🔗 VADemon has joined #archiveteam-bs
10:19 🔗 Specular_ has quit IRC (Ping timeout: 370 seconds)
10:20 🔗 Specular_ has joined #archiveteam-bs
10:20 🔗 schbirid has joined #archiveteam-bs
10:28 🔗 Simpbrain has joined #archiveteam-bs
10:47 🔗 GE has quit IRC (Quit: zzz)
11:25 🔗 Ravenloft has quit IRC (Ping timeout: 260 seconds)
11:40 🔗 vitzli has joined #archiveteam-bs
12:11 🔗 yan Somebody2: actually, it may be cool to keep the account!
12:12 🔗 yan I imagine someone wanting to fix a typo but not bothering to make an account, while also not wanting to edit as an IP
12:13 🔗 yan oh, but the wiki is read-only for IPs
12:16 🔗 vitzli has quit IRC (Quit: Leaving)
12:17 🔗 GE has joined #archiveteam-bs
12:18 🔗 yan you could also keep playing whack-a-mole, but with the account name "bugmenot" it's at least obvious where the edits are coming from
12:18 🔗 schbirid i often create bugmenot accounts
12:18 🔗 yan as long as it's not used to spam..
12:18 🔗 schbirid because signing up is a pain in the ass if your contribution is something tiny
12:18 🔗 yan schbirid: yeah, I've made a bunch as well
12:43 🔗 Specular_ has quit IRC (Ping timeout: 370 seconds)
12:47 🔗 Specular_ has joined #archiveteam-bs
12:56 🔗 BlueMaxim has quit IRC (Quit: Leaving)
13:15 🔗 Simpbrain has quit IRC (Remote host closed the connection)
13:31 🔗 pizzaiolo has joined #archiveteam-bs
13:42 🔗 godane so it turned out i needed to install atomicparsley so that get_iplayer would add metadata to the m4a files
13:43 🔗 godane this will affect most of the Newshour uploads and the 2016-11 files of The World Tonight
13:43 🔗 godane good news is the metadata xml file was uploaded with them so i don't need to redo them because of this problem
13:55 🔗 godane i got another review: https://archive.org/details/DTIC_ADA041895
14:24 🔗 Specular_ has quit IRC (Ping timeout: 370 seconds)
14:26 🔗 Specular_ has joined #archiveteam-bs
14:52 🔗 Specular_ has quit IRC (Quit: whoosh)
14:59 🔗 HCross2 Torrent is up http://yda.pw/CyanogenMod.torrent
15:00 🔗 Smiley you need seeders? how big is it?
15:01 🔗 HCross2 413GB, needs seeders
15:35 🔗 pizzaiolo I am working on this, if anyone knows/is inclined to help https://www.wikidata.org/wiki/Q4787261
15:38 🔗 HCross2 arkiver: torrent has been shoved into the IA
15:43 🔗 godane so some good news on the WallBuilders Live archive i have been downloading for the last 3 years
15:44 🔗 godane looks like they have (at least) 2011 episodes up now
15:45 🔗 godane this also means i will be getting the 128kbps mp3s also
15:45 🔗 godane instead of just the 32kbps ones
15:46 🔗 godane only from 2012-06 to 2013-09 need the upgraded versions
15:56 🔗 godane ok so it goes back to feb 2009 at least
16:01 🔗 arkiver HCross: nice
16:01 🔗 arkiver What item is it in?
16:06 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
16:08 🔗 RichardG has joined #archiveteam-bs
16:18 🔗 HCross2 https://archive.org/details/CyanogenMod281216
16:21 🔗 HCross2 arkiver: I don't see a peer from the IA on my torrent
16:23 🔗 arkiver it's doing something https://catalogd.archive.org/log/613242494
16:23 🔗 arkiver give it a few hours
16:24 🔗 TheKiwi has quit IRC (Quit: Connection closed for inactivity)
16:24 🔗 VADemon has quit IRC (Quit: left4dead)
16:28 🔗 godane and so my upload of Wallbuilders live mp3s starts now
16:29 🔗 godane with description metadata
16:30 🔗 godane first one from 2009 that i got is up: https://archive.org/details/wallbuilders-live-2009-02-19
17:22 🔗 HCross2 arkiver: there we go. IA is downloading things
17:23 🔗 arkiver :D
17:26 🔗 HCross2 arkiver: also rapidly running out of IO
17:27 🔗 arkiver is this all on newsbuddy?
17:48 🔗 atlogbot has quit IRC (Quit: atlogbot)
17:49 🔗 swebb has quit IRC (Quit: badcheese.com - where crap sometimes gets done)
18:09 🔗 HCross2 arkiver: not the torrent, as there isn't a torrent client installed. I may install one as I can see it being useful
18:17 🔗 yan Somebody2: another way the bugmenot enables good-faith edits is in the case a user doesn't have his password handy on a public computer for example
18:34 🔗 pizzaiolo (15:58:55) yan: SketchCow: I knew you'd respond that way re: wikimedia ;) On a positive note, I dumped over a third (and going) of links to the file formats wiki on wikidata and they were quite happy to take them!
18:34 🔗 pizzaiolo what links? *curious*
18:41 🔗 pizzaiolo luckcolor: brazilian :P
18:43 🔗 luckcolor ah lel
18:43 🔗 luckcolor cause i'm italian lol
18:43 🔗 kristian_ has joined #archiveteam-bs
18:54 🔗 atlogbot has joined #archiveteam-bs
18:54 🔗 atlogbot has quit IRC (Remote host closed the connection)
18:55 🔗 HCross arkiver, any reason the IA would stop downloading randomly?
18:56 🔗 swebb has joined #archiveteam-bs
18:57 🔗 atlogbot has joined #archiveteam-bs
18:57 🔗 arkiver HCross: no idea, but if it doesn't resume it'll time out and fail and we'll just restart
18:58 🔗 HCross ok. the seeders are still doing their thing
18:59 🔗 atlogbot has quit IRC (Remote host closed the connection)
18:59 🔗 swebb has quit IRC (Client Quit)
19:00 🔗 atlogbot has joined #archiveteam-bs
19:00 🔗 swebb has joined #archiveteam-bs
19:07 🔗 pizzaiolo has left
19:29 🔗 kristian_ has quit IRC (Quit: Leaving)
19:33 🔗 Asparagir has quit IRC (Asparagir)
19:34 🔗 Asparagir has joined #archiveteam-bs
19:34 🔗 Asparagir has quit IRC (Client Quit)
19:34 🔗 Asparagir has joined #archiveteam-bs
19:36 🔗 schbirid so, anyone in town for 33c3? i did not get a ticket but would be up for awkward meets
19:48 🔗 Start has quit IRC (Read error: Connection reset by peer)
19:48 🔗 Start has joined #archiveteam-bs
19:59 🔗 Simpbrain has joined #archiveteam-bs
20:03 🔗 t2t2 how large are the ftp-gov-items? 8GB is fine, 80GB is too much for this machine
20:05 🔗 arkiver2 has joined #archiveteam-bs
20:05 🔗 Medowar current average is 420MB, but can be bigger.
20:07 🔗 vantec In the pipeline, you can edit the max size.
20:10 🔗 arkiver2 has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
20:10 🔗 arkiver2 has joined #archiveteam-bs
20:13 🔗 t2t2 I can set concurrent_items=1, max_items=100; but not size of item as it's unknown, isn't it?
20:14 🔗 Aoede MAX_SIZE variable in pipeline.py
20:17 🔗 t2t2 oh right, I was looking at the runner
20:27 🔗 t2t2 it's currently set to 10GB; I'm worried ExtractRecordsInfo will hang or run out of memory with such large files
20:30 🔗 t2t2 with my latest 1256MB item, up to 2.4GB of real memory (without shared or swap) was used
20:41 🔗 tfgbd_znc has quit IRC (Read error: Connection reset by peer)
20:43 🔗 arkiver t2t2: it won't, we're not loading the payloads into memory (afaik)
20:43 🔗 arkiver hmm
20:44 🔗 arkiver strange. I'll test
20:44 🔗 t2t2 wpull isn't, but warc is
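t2t2's claim is that wpull streams payloads while the warc library holds them in memory; that is not verified here. As a minimal sketch, assuming standard WARC/1.0 framing (header lines, a blank line, `Content-Length` bytes of payload, then `\r\n\r\n`), this is how record headers can be read from a .warc.gz with memory bounded by a fixed chunk size instead of by payload size. `iter_warc_headers` and `CHUNK` are hypothetical names, not part of the warc library or of ExtractRecordsInfo.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator

# Hypothetical read size: memory use is bounded by this, not by payload size.
CHUNK = 64 * 1024


def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield each WARC record's headers, skipping payloads in fixed-size chunks.

    Sketch only: assumes WARC/1.0 framing; GzipFile transparently reads the
    one-gzip-member-per-record layout that .warc.gz files usually use.
    """
    with gzip.GzipFile(fileobj=stream) as gz:
        while True:
            version = gz.readline()
            if not version:
                return  # end of file
            if not version.strip():
                continue  # tolerate stray blank lines between records
            headers: Dict[str, str] = {}
            for line in iter(gz.readline, b""):
                if line in (b"\r\n", b"\n"):
                    break  # blank line ends the header block
                name, _, value = line.decode("utf-8").partition(":")
                headers[name.strip()] = value.strip()
            # Discard the payload piece by piece instead of reading it whole.
            remaining = int(headers.get("Content-Length", "0"))
            while remaining > 0:
                chunk = gz.read(min(CHUNK, remaining))
                if not chunk:
                    raise ValueError("truncated record payload")
                remaining -= len(chunk)
            gz.read(4)  # consume the \r\n\r\n record separator
            yield headers
```

Usage: `for h in iter_warc_headers(open("item.warc.gz", "rb")): print(h.get("WARC-Target-URI"))`, where the filename is a placeholder. A real WARC reader would also validate the version line and handle other header encodings.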
20:50 🔗 t2t2 also, rsync seems to be compressing during transfer, but isn't warc.gz already compressed?
20:51 🔗 t2t2 wasting cpu on both sender and receiver for no gain
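t2t2's point is that compressing an already-gzipped stream a second time costs CPU for little or no size reduction. A quick way to check the general claim, sketched with only the Python standard library: compare how much a second DEFLATE pass (zlib here, standing in for whatever compression the transfer applies) shrinks plain text versus gzip output. The sample data is synthetic, not real WARC content, and `recompression_ratio` is a hypothetical helper.

```python
import gzip
import random
import zlib


def recompression_ratio(data: bytes, level: int = 6) -> float:
    """Size after one extra zlib pass divided by the size before it."""
    return len(zlib.compress(data, level)) / len(data)


# Synthetic, repetitive, HTML-ish text (an assumption, not real crawl data).
rng = random.Random(0)
words = [b"<div>", b"</div>", b"archive", b"team", b"warc", b"record", b"\n"]
text = b" ".join(rng.choice(words) for _ in range(200_000))

# Stand-in for a .warc.gz: the same text after gzip has already been applied.
already_gzipped = gzip.compress(text)

print(f"plain text:   {recompression_ratio(text):.3f}")
print(f"warc.gz-like: {recompression_ratio(already_gzipped):.3f}")
```

The plain text shrinks substantially, while the gzip output comes out at roughly its original size, so the second pass is mostly wasted work. rsync does provide a `--skip-compress=LIST` option for file suffixes that shouldn't be compressed; whether the pipeline's upload command uses `-z` isn't shown in this log.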
20:52 🔗 pizzaiolo has joined #archiveteam-bs
21:00 🔗 pizzaiolo has quit IRC (Read error: Operation timed out)
21:01 🔗 Asparagir has quit IRC (Asparagir)
21:06 🔗 Simpbrain has quit IRC (Remote host closed the connection)
21:08 🔗 siemensak has joined #archiveteam-bs
21:12 🔗 t2t2 so yeah, it did run out of memory: http://i.imgur.com/7V49acA.png
21:14 🔗 pizzaiolo has joined #archiveteam-bs
21:15 🔗 jrwr has joined #archiveteam-bs
21:57 🔗 BlueMaxim has joined #archiveteam-bs
22:08 🔗 siemensak has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
22:10 🔗 schbirid has quit IRC (Quit: Leaving)
22:43 🔗 arkiver2 has quit IRC (Ping timeout: 244 seconds)
22:47 🔗 Jon PurpleSym: thanks for sorting remix-dot-nin :D first check in I've managed since xmas. won't be back to do anything else until next week :(
23:02 🔗 pizzaiolo has quit IRC (Ping timeout: 244 seconds)
23:43 🔗 GE has quit IRC (Remote host closed the connection)
