[00:08] *** JesseW has joined #archiveteam
[00:15] *** gibigiana has quit IRC (Ping timeout: 258 seconds)
[00:21] *** gibigiana has joined #archiveteam
[00:22] *** mr-b has quit IRC (Read error: Operation timed out)
[00:27] *** ris has quit IRC ()
[00:30] *** mr-b has joined #archiveteam
[00:52] *** db48x has joined #archiveteam
[00:58] *** BlueMaxim has quit IRC (Quit: Leaving)
[01:02] *** j08nY has quit IRC (Quit: Leaving)
[01:35] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[01:37] *** WinterFox has joined #archiveteam
[01:46] *** schbirid has quit IRC (Ping timeout: 258 seconds)
[02:00] *** schbirid has joined #archiveteam
[02:15] *** VADemon has quit IRC (left4dead)
[02:54] *** BlueMaxim has joined #archiveteam
[03:19] *** JesseW has joined #archiveteam
[03:29] *** RavetcoFX has joined #archiveteam
[03:30] Hey everyone, acording to this video https://www.youtube.com/watch?v=F7N254MTA4Q this youtuber (Louis Rossmann) might have his videos taken down
[03:30] RavetcoFX: how many videos are we talking about?
[03:30] I will be trying to archive his channel, but my internet is very slow
[03:31] JesseW: 727
[03:32] I thought it was gonna be in the thousands
[03:32] ohai RavetcoFX
[03:32] Frogging: same as frogging101?
[03:32] yep
[03:32] cool, funny seeing you here :)
[03:33] Frogging: can you look into sticking this into #archivebot?
[03:33] which, that youtube channel?
[03:33] wouldn't it be better for someone to youtube-dl it and put it in a collection
[03:33] I don't know how ArchiveBot deals with youtube channels
[03:34] I imagine not well, the structure isn't exactly top-down
[03:34] That's what I'm doing right with youtube-dl, I don't know if i'll be fast enough though
[03:35] fyi dnshistory.org has not replied to an email to support or a support ticket on the 8086.net website. So... if anyone wanted to grab some of their data, you could start by sticking the Alexa top million sites into the dnshistory.org search box. Likely it will be a slow crawl due to lack of resources on their end.
[03:35] wumpus: I think arkiver was planning to write a Warrior project for it.
[03:35] He wasn't allowed to talk about what's going on aparently, but I seems like Apple might be shutting down his business and channel because he does "unauthorised" repairs for macbooks
[03:45] RavetcoFX: How fast is your connection?
[03:45] I can youtube-dl them at 3.7MB/s
[03:47] I'm using youtube-dl --write-info-json -f bestvideo[ext=mp4]+bestaudio[ext=m4a]
[03:49] maxing out at 1.5MiB/s
[03:49] I'm not bothering with bestvideo/audio
[03:51] I just trying to get 720p
[03:52] I'm just trying*
[03:52] kk
[03:54] Frogging: I found he's made a partial backup to the Internet Archive already https://archive.org/search.php?query=creator%3A%22Louis+Rossmann%22
[03:55] oh, that's good
[03:56] looks like that's Fletcher's doing, actually :)
[03:57] Frogging: so are you a project leader here? I had no idea you were into the archiving scene
[03:57] nah, I'm not a leader... I'm just a guy who hangs around here. I also run a #archivebot pipeline. Actually this is meant to be a low-traffic channel, so if we want to discuss further we should move to #archiveteam-bs
[03:58] kk
[03:59] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[04:28] *** RavetcoFX has quit IRC (Quit: Page closed)
[04:33] *** ravetcofx has joined #archiveteam
[04:45] did anyone start archiving calendar.sunrise.am ?
[04:51] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:00] *** Sk1d has joined #archiveteam
[05:03] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[05:04] yahoosucks
[05:04] lol
[05:05] Now, knave, go forth and improve the Wiki
[05:50] *** schbirid has quit IRC (Quit: Leaving)
[06:00] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:04] *** dashcloud has joined #archiveteam
[06:17] *** DoomTay has quit IRC (Quit: Page closed)
[06:17] *** BlueMaxim has joined #archiveteam
[06:47] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:51] I feel less embarrassed than before.
[07:13] *** redlob has quit IRC (Read error: Operation timed out)
[07:16] *** db48x` has joined #archiveteam
[07:17] *** db48x has quit IRC (Read error: Operation timed out)
[07:19] *** testname has joined #archiveteam
[07:25] *** wumpus has quit IRC (Quit: Page closed)
[07:25] *** redlob has joined #archiveteam
[07:31] *** atomotic has joined #archiveteam
[07:37] *** testname has quit IRC (Quit: Page closed)
[08:14] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[08:22] *** atomotic has quit IRC (Ping timeout: 260 seconds)
[08:32] *** atomotic has joined #archiveteam
[09:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:21] *** dashcloud has joined #archiveteam
[09:47] *** Wuked has joined #archiveteam
[10:31] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:34] *** Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[10:44] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[10:51] *** kristian_ has joined #archiveteam
[10:56] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:14] *** r3c0d3x has joined #archiveteam
[11:23] *** r3c0d3x has quit IRC (Quit: Leaving)
[11:35] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[11:36] *** dashcloud has joined #archiveteam
[11:47] *** j08nY has joined #archiveteam
[11:48] *** Emcy_ has quit IRC (Read error: Operation timed out)
[11:54] *** Emcy has joined #archiveteam
[12:04] *** atomotic has joined #archiveteam
[12:22] *** aMunster has quit IRC (Read error: Operation timed out)
[12:22] *** Atom-- has quit IRC (Read error: Connection reset by peer)
[12:24] *** TC01 has quit IRC (No Ping reply in 180 seconds.)
[12:24] *** Atom-- has joined #archiveteam
[12:25] *** TC01 has joined #archiveteam
[12:25] *** arkiver has quit IRC (Read error: Operation timed out)
[12:27] *** remsen has quit IRC (Read error: Operation timed out)
[12:28] *** superkuh has quit IRC (Read error: Operation timed out)
[12:29] *** superkuh has joined #archiveteam
[12:30] *** arkiver has joined #archiveteam
[12:30] *** swebb sets mode: +o arkiver
[12:33] *** remsen has joined #archiveteam
[12:36] *** mhazinsk has quit IRC (Read error: Operation timed out)
[12:37] *** aMunster has joined #archiveteam
[12:39] *** aMunster has quit IRC (Read error: Operation timed out)
[12:40] *** nwf_ has quit IRC (Read error: Operation timed out)
[12:47] wumpus: as i previously stated it's easier to crawl
[12:47] we just need a list of registries to query them
[12:48] like com org etc
[12:49] hes not here
[12:49] *** aMunster has joined #archiveteam
[12:49] ah
[12:49] ok
[12:50] well i can enumerate a list of 3 characters registries just fine
[12:50] *** pfallenop has quit IRC (Read error: Operation timed out)
[12:50] luckcolor: what was the question?
[12:51] " fyi dnshistory.org has not replied to an email to support or a support ticket on the 8086.net website. So... if anyone wanted to grab some of their data, you could start by sticking the Alexa top million sites into the dnshistory.org search box. Likely it will be a slow crawl due to lack of resources on their end.
[12:51] wumpus: I think arkiver was planning to write a Warrior project for it."
[12:51] i'm saying that atleast we can enumerate and crawl the 3 characters registries easly
[12:52] luckcolor: er, how do you plan on doing that, exactly?
[12:52] Did anyone grab the Louis Rossman YouTube channel yet?
[12:52] if you use browse https://dnshistory.org/subdomains/1/com
[12:52] you can easly change com to watever registry
[12:53] oh, you mean on dnshistory itself
[12:53] yeah
[12:53] so you can basically change the registry name there
[12:53] and get a different result list
[12:54] and those lists can probably either get archivebotted or made into items for archivebot
[12:54] *** pfallenop has joined #archiveteam
[12:54] but we first need a dns registry list first
[12:54] we can probably iterate up to 3 characters in leght just fine
[12:55] but we will then miss the custom named ones
[12:55] like .cake .stuff. etc
[12:56] luckcolor: any help? http://data.iana.org/TLD/tlds-alpha-by-domain.txt
[12:56] luckcolor: they do rate limiting based on referer header
[12:56] sort of
[12:57] that would help
[12:57] *** aMunster has quit IRC (Remote host closed the connection)
[12:57] yeah we probably have to go very slowly
[12:57] *** aMunster has joined #archiveteam
[12:57] hm not quite rate limiting
[12:57] but i think in 8 days we can pull it off
[12:57] but it will throw you back to page 1 if you visit high numbers direcftly
[12:57] you can only get there through the next link
[12:57] ie. referer header
[12:57] ok
[12:58] so it's possible i presume
[12:58] yeah
[12:58] just need to make sure to get the referer header right
[12:59] i will just test on archivebot really quickly so wwe can see if we can use wpull as the discovery
[13:00] *** Wuked has joined #archiveteam
[13:00] https://dnshistory.org/subdomains/1/yt
[13:02] *** aMunster has quit IRC (Read error: Operation timed out)
[13:04] nope i don't think it's getting them
[13:04] luckcolor: well, my site is in there
[13:04] lol
[13:05] (pdf.yt)
[13:05] lol
[13:06] *** nwf_ has joined #archiveteam
[13:08] yep we can't use archivebot
[13:09] luckcolor: hmm. why not?
[13:09] wait
[13:09] did i put the url without trailing /
[13:09] yes
[13:09] oh right
[13:09] that
[13:09] * luckcolor facepalm
[13:10] luckcolor: if you just !a, wouldn't we get it?
[13:10] er
[13:10] if you just !a dnshistory.org
[13:10] *** aMunster has joined #archiveteam
[13:10] also, lol, they were selling a removal service
[13:10] no there's nothing that points to other registries
[13:18] https://dnshistory.org/subdomains/1/yt is case sensitive
[13:18] https://dnshistory.org/subdomains/1/YT won't work
[13:31] *** Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[13:42] *** Yoshimura has joined #archiveteam
[13:42] joepie91: i'm now checking if the website has all the urls in the list you sent me
[13:42] luckcolor: wait, I sent you a list?
[13:42] * joepie91 is confuse
[13:43] * luckcolor is more confused now
[13:43] wait
[13:44] * luckcolor facepalm again
[13:44] joepie91: will sent the list to me
[13:47] *** Wuked has joined #archiveteam
[13:47] *** Emcy_ has joined #archiveteam
[13:48] Did the bot work luckcolor ?
[13:49] Or does it need a more bespoke job?
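
The enumeration idea discussed above (one listing per registry, lowercase registry names because `/YT` reportedly fails, and later pages reachable only by following the next link with a matching Referer) could be sketched roughly as follows. This is an illustration only, not something anyone in the channel ran: the function names are hypothetical, and treating the `1` in `/subdomains/1/com` as a page index is an assumption inferred from the example URL. The sketch only plans requests and does no network I/O.

```python
BASE = "https://dnshistory.org/subdomains"  # URL pattern quoted in the log


def parse_tld_list(text):
    """Parse text in the IANA tlds-alpha-by-domain.txt layout:
    one TLD per line, with '#' comment lines. Names are lowercased
    because the log reports that /subdomains/1/YT does not work."""
    tlds = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tlds.append(line.lower())
    return tlds


def page_url(tld, page):
    # ASSUMPTION: the number in /subdomains/1/com is the page index.
    return f"{BASE}/{page}/{tld}"


def request_plan(tld, pages):
    """Yield (url, headers) pairs for pages 1..pages of one registry.
    Each request after the first carries the previous page as Referer,
    mimicking a click on the 'next' link."""
    previous = None
    for page in range(1, pages + 1):
        url = page_url(tld, page)
        headers = {"Referer": previous} if previous else {}
        yield url, headers
        previous = url


if __name__ == "__main__":
    sample = "# sample header\nCOM\nORG\nYT\n"
    for tld in parse_tld_list(sample):
        for url, headers in request_plan(tld, 2):
            print(url, headers)
```

Keeping the plan separate from the fetching makes it easy to add a delay between requests, which matters given the "go very slowly" remark above.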
[13:50] no it didn't work
[13:51] it just 200'd the page and then moved on
[13:51] with and without trailing / at the and of the url
[13:52] *** Emcy_ has quit IRC (Client Quit)
[13:52] O
[13:53] ok i just found out that we can't use status code for checking if a page is empty or not
[13:53] it will always give 200 on a blank page
[13:53] https://dnshistory.org/subdomains/1/asdksadmidhasdsadj eg
[13:54] *** Emcy has quit IRC (Read error: Operation timed out)
[14:03] Igloo: can you check if you can do something? i sent a yt test url before
[14:08] Sure
[14:08] what do you need luckcolor ?
[14:09] if you can figure out a way to archive that page with all the domains linked to it
[14:09] I am trying with my Heritrix crawler
[14:09] ok
[14:14] I can get the 200 for the URL
[14:14] But struggling to get it to actually parse it and go into the sub pages
[14:18] *** kristian_ has quit IRC (Leaving)
[14:20] *** bzc6p has joined #archiveteam
[14:20] *** swebb sets mode: +o bzc6p
[14:22] Maybe this dnshistory project is ripe enough to get its own channel?
[14:24] ok so here's my discovery list http://pastebin.com/AWnrkiEY
[14:25] bzc6p: moved it to BS temporarily
[14:25] ok
[14:25] works for me
[14:25] Let's find of a good channel name
[14:26] ehm historicaldnames?
[14:26] i'm bad at puns lol
[14:26] dnshistoryishistory
[14:27] yeah that one works
[14:28] i mean it's much better than mine lol
[14:31] *** mhazinsk has joined #archiveteam
[14:31] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:33] #greatlokup
[14:33] * #greatlookup
[14:33] that works too
[14:40] *** dashcloud has joined #archiveteam
[14:42] yeah
[14:42] so we use that one? :P
[14:42] *** Nemo_bis has joined #archiveteam
[14:43] yeah use that one
[14:43] If there are no better suggestions
[14:43] yeah
[14:45] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:45] arkiver: now that coursera and arto are done, can you please queue items for myvip, so that we can finish the project?
[14:47] Coursera is still ongoing (Literally 5 items to run I think)
[14:53] I did not realise that http://tracker.archiveteam.org/sourceforgersync/ still needs help
[14:54] We literally haven't touched that project in a year
[14:57] I noticed because of http://softwareheritage.org/
[14:58] I suggested going on slowly, but nobody cared
[14:58] We'd be done by now
[14:59] Well, downloads went on according to the graph, no?
[15:01] That's tricky. Zoom in
[15:02] You meant that heritage site may have sourceforge stuff?
[15:02] I don't know, but I assume they're interested
[15:16] *** WinterFox has quit IRC (Remote host closed the connection)
[15:24] I am trying to get insight into some routing, would anyone here by any chance know of an IP located in London?
[15:25] *** Wuked has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[15:25] *** bzc6p has left
[15:26] never mind
[15:30] *** JesseW has joined #archiveteam
[15:36] *** DoomTay has joined #archiveteam
[15:56] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[15:59] *** arkiver2 has joined #archiveteam
[15:59] *** swebb sets mode: +o arkiver2
[16:00] *** Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[16:03] *** arkiver2 has quit IRC (Read error: Connection reset by peer)
[16:32] *** Trevor has joined #archiveteam
[16:33] *** pfallenop has quit IRC (Ping timeout: 260 seconds)
[16:33] *** Trevor has quit IRC (Client Quit)
[16:37] *** Trevor has joined #archiveteam
[16:38] Why do I get and HTTP error when trying to download archived Twitch videos?
[16:43] depends on what the error is
[16:45] The program im using to try to download it says 1.1 504 Gateway Time-out. Idk if thats helpful tho im a computer noob
[16:47] and what's the thing you're trying to download
[16:48] Twitch VOD from Mindcracknetwork channel. http://media-cdn.twitch.tv/store51.media10/archives/2013-10-20/live_user_mindcracknetwork_1382246112.flv
[16:49] yep i get the same thing
[16:50] So is it just broken?
[16:51] ¯\_(ツ)_/¯
[16:51] Ah well, thanks anyways for trying!
[16:52] I'd say it's a twitch then
[16:52] *thing
[16:52] It was broken for me the other day and then worked, try in a few hours or so
[16:52] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:52] Alright thanks
[16:52] *** pfallenop has joined #archiveteam
[16:52] *** Trevor has quit IRC (Quit: Page closed)
[16:54] *** pfallenop has quit IRC (Remote host closed the connection)
[16:54] *** pfallenop has joined #archiveteam
[16:55] how are we twitch support
[16:55] *** ris has joined #archiveteam
[16:55] ¯\_(ツ)_/¯
[16:57] *** RedType has quit IRC (Quit: leaving)
[16:57] *** dashcloud has joined #archiveteam
[17:08] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[17:08] well to be fair, archiveteam has a disturbingly large collective knowledge of weird failure modes of sites
[17:08] lol
[17:08] fair enough
[17:09] *** metalcamp has joined #archiveteam
[17:16] *** ravetcofx has joined #archiveteam
[17:22] *** closure has quit IRC (Ping timeout: 250 seconds)
[17:24] *** mls has quit IRC (Quit: Lost terminal)
[17:25] *** mls has joined #archiveteam
[17:26] *** Emcy has joined #archiveteam
[17:26] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[17:30] phuzion: yea, I grabbed it last night
[17:31] db48x`: awesome, all 700some videos?
[17:31] *** RedType has joined #archiveteam
[17:31] yep
[17:31] Nice. Thanks.
[17:32] 703 in the end
[17:32] With metadata?
[17:32] Uh....I thought there were 727
[17:33] [youtube:playlist] playlist Uploads from Louis Rossmann: Downloading 703 videos
[17:33] [download] Downloading video 1 of 703
[17:33] *** BartoCH has joined #archiveteam
[17:34] and yes, I got descriptions, annotations, and metadata
[17:34] Cool
[17:35] 152GB total
[17:35] that sounds too small
[17:35] db48x`: I already have 194GB here
[17:36] hmm
[17:36] biggest video is only 1.9GB
[17:36] nope
[17:36] I've seen at least one bigger one
[17:36] only a dozen are larter than 1GB
[17:36] you might be grabbing the wrong quality
[17:36] they're all 1280x720
[17:37] db48x`: yeah, that's too low
[17:37] you're probably using an old youtube-dl
[17:37] hmm
[17:37] without specifying the right quality param
[17:37] no, it's fresh
[17:37] older versions defaulted to `best`
[17:37] which despite the name isn't actually best
[17:37] db48x`: pip or distro?
[17:37] According to a quick glance, they're supposed to be 1080p60
[17:38] joepie91: neither
[17:38] where'd you install it from, then?
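
One way to check a finished grab for the "all 1280x720" problem described above is to read the files written by `--write-info-json`. The sketch below is hypothetical and rests on two assumptions: that each `.info.json` carries a top-level `height` field for the format that was actually downloaded, and that 1080 is the right threshold (taken from the "supposed to be 1080p60" remark). A flagged file is only a candidate for re-download, since the source video may not offer anything higher.

```python
import json
from pathlib import Path


def low_resolution_videos(directory, min_height=1080):
    """Return (filename, height) for every *.info.json in `directory`
    whose recorded height is missing or below `min_height`.
    ASSUMPTION: the info JSON has a top-level "height" key."""
    flagged = []
    for path in sorted(Path(directory).glob("*.info.json")):
        with path.open(encoding="utf-8") as handle:
            info = json.load(handle)
        height = info.get("height")
        if height is None or height < min_height:
            flagged.append((path.name, height))
    return flagged


if __name__ == "__main__":
    # Hypothetical usage: audit the current directory.
    for name, height in low_resolution_videos("."):
        print(f"{name}: height={height}")
```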
[17:38] and I have one 6.7GB video here :p
[17:38]  youtube-dl --version
[17:38] 2016.07.01
[17:38] 3840x2176 at 60fps
[17:39] db48x`: weird
[17:41] could rerun it with -f 248+251 or something
[17:42] hmm, could do --all-formats :)
[17:42] I don't think it downloaded any of the DASH manifests at all
[17:45] *** r3c0d3x has joined #archiveteam
[17:51] db48x`: you want bestaudio+bestvideo/best
[17:52] youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors --merge-output-format mkv -f 'bestvideo+bestaudio/best'
[17:52] I did this "youtube-dl --write-info-json -f bestvideo[ext=mp4]+bestaudio[ext=m4a]"
[17:53] I thought write-info-json implicitly gathers annotations
[17:53] (reason for the ext=mp4 is that webm quality can suck sometimes)
[17:57] *** ndiddy has joined #archiveteam
[17:58] joepie91: that seems like it should be the default :)
[17:59] db48x`: it -should- be now
[17:59] Frogging: don't specify formats
[17:59] joepie91: :)
[17:59] youtube-dl will determine itself which format is of the best quality
[17:59] my other youtube-dl, on my other computer, just downloaded a 1920x1080 video, and I'm not using -f :P
[17:59] function yt {
[17:59] youtube-dl --write-description --write-annotations --write-info-json -f bestaudio+bestvideo/best -o "%(uploader)s/%(title)s - %(resolution)s [%(extractor)s;%(id)s;%(format_id)s].%(ext)s" --no-mtime --download-archive ".youtube-dl" --playlist-reverse --match-title "$1" "$2"
[17:59] }
[18:00] joepie91: I know, but sometimes it chooses webm and it is noticeably worse quality than the mp4
[18:02] bah: ERROR: The first format must contain the video, try using "-f 135+251"
[18:04] that was w464rsss9yg, which apparently only has 720x480 video
[18:16] lol
[18:16] -f bestvideo+bestaudio works, but -f bestaudio+bestvideo does not
[18:17] yes
[18:17] you need video first
[18:18] Why are audio and video separate anyway?
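
The "video first" rule reported above (`-f bestvideo+bestaudio` works, `-f bestaudio+bestvideo` errors out) can be captured in a small helper that builds the `-f` value in the working order. This is a minimal sketch with hypothetical function names; the check is a heuristic that only recognises the named `bestaudio` selector and cannot judge numeric format IDs such as `135+251`.

```python
def merge_selector(video="bestvideo", audio="bestaudio", fallback="best"):
    """Build a youtube-dl -f value with the video part first,
    falling back to the best combined stream after the '/'."""
    return f"{video}+{audio}/{fallback}"


def video_comes_first(selector):
    """Heuristic: in every '/'-separated alternative that merges
    streams with '+', the first part must not be a bestaudio selector."""
    for alternative in selector.split("/"):
        parts = alternative.split("+")
        if len(parts) > 1 and parts[0].startswith("bestaudio"):
            return False
    return True


if __name__ == "__main__":
    print(merge_selector())
    print(merge_selector("bestvideo[ext=mp4]", "bestaudio[ext=m4a]"))
    print(video_comes_first("bestaudio+bestvideo/best"))
```

The resulting string would be passed as the argument to `-f`, quoted so the shell does not interpret the square brackets.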
[18:22] dunno exactly
[18:22] probably to prevent a combinatorial explosion
[18:23] youtube provides separate video and audio streams now
[18:23] on most videos
[18:23] both of which are higher quality than the best combined stream
[18:24] so wait is there any reason for _all_ of us to be downloading this
[18:24] :p
[18:24] Frogging: RAIA
[18:24] redundant array of independent archivists?
[18:24] yep
[18:24] :P
[18:24] sure
[18:24] good enough for me :p
[18:24] now, sleep. also whoop whoop offtopic siren
[18:24] yeah
[18:24] should probably go to -bs
[18:24] :)
[18:24] night :p
[18:24] how is this offtopic?
[18:25] sleep well though
[18:25] db48x`: well it's lengthy, and this is meant to be a low traffic channel where conversations start and are moved elsewhere
[18:25] eh, I dislike that idea
[18:25] moving conversations breaks them
[18:26] and the name is wrong
[18:26] -bs channels are used for off-topic conversations, not for the second halves of conversations
[18:26] well
[18:27] that's not how we do it. #archiveteam is supposed to be a place where someone can look and see "what's going on today" and then follow up on things in other channels
[18:27] because we start a lot of projects that last a while and have discussion that is not really interesting unless you care
[18:34] well, that's downloading again
[18:34] time for me to sleep as well
[18:43] *** ring has quit IRC (Read error: Connection reset by peer)
[18:43] *** ring has joined #archiveteam
[18:44] *** Tomcat_ has joined #archiveteam
[18:49] *** maseck has quit IRC (Remote host closed the connection)
[18:54] *** maseck has joined #archiveteam
[19:04] https://www.reddit.com/r/apple/comments/4qqv56/louis_rossmanns_youtube_channel_that_is_full_of/
[19:14] *** Tomcat_ has quit IRC (Remote host closed the connection)
[19:17] *** Fake-Name has quit IRC (Read error: Operation timed out)
[19:21] *** beeper has joined #archiveteam
[19:40] *** JW_work has quit IRC (Ping timeout: 370 seconds)
[19:45] *** JW_work has joined #archiveteam
[20:33] much discussion about little knowledge
[20:39] *** DoomTay has quit IRC (Quit: Page closed)
[20:46] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[20:50] *** ravetcofx has joined #archiveteam
[21:01] *** JesseW has joined #archiveteam
[21:05] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:07] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[21:12] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[21:13] *** dashcloud has joined #archiveteam
[21:16] *** BartoCH has joined #archiveteam
[21:32] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[21:44] *** JesseW has joined #archiveteam
[21:45] *** DoomTay has joined #archiveteam
[21:52] *** BlueMaxim has joined #archiveteam
[22:25] *** tomwsmf-a has joined #archiveteam
[22:36] *** JesseW1 has joined #archiveteam
[22:37] *** JesseW2 has joined #archiveteam
[22:37] *** JesseW3 has joined #archiveteam
[22:38] *** JesseW3 has quit IRC (Client Quit)
[22:38] *** JesseW1 has quit IRC (Read error: Operation timed out)
[22:38] *** JesseW1 has joined #archiveteam
[22:39] *** JesseW has quit IRC (Read error: Operation timed out)
[22:41] *** JesseW2 has quit IRC (Read error: Operation timed out)
[22:55] This is a great fight
[22:55] Also, siding with xmc on this one
[22:55] db48x`: What's with the fisherman's sleep schedule
[22:56] Wait, what fight?
[22:56] The SSL ordeal?
[22:57] they're just an seo startup
[22:57] it's hilarious
[22:58] xmc: who is just an seo startup?
[22:58] idk, some shitbag
[22:59] uh, i replyed to them on twitter
[22:59] http://edocr.com/
[23:00] lolololol
[23:00] anyway, I should take this to -bs
[23:05] (The fight over the conversation debate of #archiveteam)
[23:06] #archiveteam is a warroom, #archiveteam-bs is a deck out back
[23:08] you can't fight in here, this is the war room!
[23:16] Are we duping https://www.youtube.com/user/rossmanngroup ?
[23:16] We should be
[23:17] at least 3 people here (including me) are youtube-dl'ing it
[23:17] should check how that's getting on actually
[23:17] btw, how did coursera's old courses archival went?
[23:29] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:30] *** dashcloud has joined #archiveteam
[23:31] *** JesseW1 is now known as JesseW