[00:03] *** Darkstar has quit IRC (Remote host closed the connection)
[00:10] *** nickname_ has joined #archiveteam-bs
[00:28] *** Darkstar has joined #archiveteam-bs
[00:34] It's my brother's business
[00:38] Oh right, I think I'd heard you mention that in one of your talks.
[01:02] speaking of gitlab
[01:02] if you're looking for a self-hosted git thing, I really like gogs: https://gogs.io/
[01:02] been using it for a while on a raspi and it's quite nice
[01:28] *** nickname_ has quit IRC (Read error: Operation timed out)
[01:35] *** JesseW has joined #archiveteam-bs
[01:36] *** VADemon has joined #archiveteam-bs
[01:36] *** VADemon has quit IRC (Read error: Connection reset by peer)
[02:01] *** Boppen has quit IRC (hub.se irc.du.se)
[02:01] *** Rotab has quit IRC (hub.se irc.du.se)
[02:11] *** JesseW has quit IRC (Quit: Leaving.)
[02:34] *** JesseW has joined #archiveteam-bs
[02:36] !ao https://twitter.com/TheFriddle/status/702316134008119297 --phantomjs
[02:37] MrRadar: wrong channel
[03:29] *** JesseW has quit IRC (Quit: Leaving.)
[04:02] *** bwn has quit IRC (Ping timeout: 492 seconds)
[04:28] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[04:48] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[05:11] *** Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
[05:27] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:32] *** JesseW has joined #archiveteam-bs
[05:33] *** Lord_Nigh has joined #archiveteam-bs
[05:34] *** Sk1d has joined #archiveteam-bs
[05:42] https://archive.org/metadata/FourFiveSecondsRihannaKanyeWestPM4A32kbps <- an item whose identifier doesn't match the identifier stored in the metadata
[05:42] sent in to info@
[06:07] It looks like there may be some magic with IA identifiers that are the same as mediatypes (and are collections), in that they seem to include all the items with that mediatype, whether or not the item in question explicitly includes itself in the collection. If someone from IA wanted to confirm that, I'd be grateful.
[07:01] JW_work1 done it
[07:04] *** metalcamp has joined #archiveteam-bs
[07:04] HCross2: it didn't seem to take; I re-did it and it did then...
[07:04] Ah. Thanks
[08:14] *** bwn has joined #archiveteam-bs
[08:27] *** JesseW has quit IRC (Quit: Leaving.)
[08:32] *** schbirid has joined #archiveteam-bs
[09:12] *** kvieta has quit IRC (Read error: Operation timed out)
[09:12] *** acridAxid has quit IRC (Read error: Operation timed out)
[09:12] SketchCow: i'm up to 2009-07-31 for kpfa
[09:13] i also have all of 2009-08 now
[09:15] *** logchfoo2 starts logging #archiveteam-bs at Wed Feb 24 09:15:31 2016
[09:15] *** logchfoo2 has joined #archiveteam-bs
[09:17] *** beardicus has joined #archiveteam-bs
[09:17] *** acridAxid has joined #archiveteam-bs
[09:51] *** logchfoo1 starts logging #archiveteam-bs at Wed Feb 24 09:51:12 2016
[09:51] *** logchfoo1 has joined #archiveteam-bs
[10:21] *** Coderjoe has quit (hub.efnet.us ircd.choopa.net)
[10:21] *** Stiletto has quit (hub.efnet.us ircd.choopa.net)
[10:21] *** beardicus has quit (hub.efnet.us ircd.choopa.net)
[10:21] *** botpie91 has quit (hub.efnet.us ircd.choopa.net)
[10:21] *** closure has quit (hub.efnet.us ircd.choopa.net)
[10:21] *** logchfoo1 has quit (hub.efnet.us ircd.choopa.net)
[10:21] *** kvieta has quit (hub.efnet.us ircd.choopa.net)
[10:21] *** mistym- has quit (hub.efnet.us ircd.choopa.net)
[10:29] *** Coderjoe (~tward@[redacted]) has joined #archiveteam-bs
[10:29] *** Stiletto (~Stiletto@[redacted]) has joined #archiveteam-bs
[10:35] *** beardicus (~beardicus@[redacted]) has joined #archiveteam-bs
[10:35] *** botpie91 (~botpie91@[redacted]) has joined #archiveteam-bs
[10:35] *** closure (~lambda@[redacted]) has joined #archiveteam-bs
[10:35] *** mistym- (~mistym@[redacted]) has joined #archiveteam-bs
[10:35] *** kvieta (~kvieta@[redacted]) has joined #archiveteam-bs
[10:56] *** VADemon (~VADemon@[redacted]) has joined #archiveteam-bs
[12:16] *** godane (~slacker@[redacted]) has joined #archiveteam-bs
[12:50] *** simpwork (~androirc@[redacted]) has joined #archiveteam-bs
[13:33] so this is interesting
[13:34] i'm going to get an mp3 from kpfa thats 750mb
[13:34] thats about 3 DAYS of audio in one mp3 file
[13:38] this is the reason: Fund Drive Special – September 29, 2009
[13:38] whats funny is they say the episode is no longer available: https://kpfa.org/archives/2009/9/29/
[13:39] but i'm still able to grab everything
[13:58] Nice
[14:51] *** metalcamp has quit (Ping timeout: 252 seconds)
[15:23] *** Start has quit (Quit: Disconnected.)
[15:58] *** Start (~Start@[redacted]) has joined #archiveteam-bs
[16:27] *** Sk2d (~Sk1d@[redacted]) has joined #archiveteam-bs
[16:33] *** Sk1d has quit (hub.se irc.du.se)
[16:49] *** Sk2d is now known as Sk1d
[16:55] *** Boppen (~Boppen@[redacted]) has joined #archiveteam-bs
[16:56] *** Rotab (~Rotab@[redacted]) has joined #archiveteam-bs
[17:04] *** Start has quit (Quit: Disconnected.)
[17:04] *** JesseW (~jesse@[redacted]) has joined #archiveteam-bs
[17:05] *** Boppen has quit (hub.se irc.du.se)
[17:05] *** Rotab has quit (hub.se irc.du.se)
[17:16] *** SadDM (~SadDM@[redacted]) has joined #archiveteam-bs
[17:16] *** swebb gives channel operator status to SadDM
[17:31] *** JesseW has quit (Quit: Leaving.)
[17:53] *** mr-b has quit (Read error: Operation timed out)
[17:53] *** dashcloud has quit (Read error: Operation timed out)
[17:53] *** SadDM has quit (Read error: Operation timed out)
[17:53] *** acridAxid has quit (Read error: Operation timed out)
[17:54] *** Darkstar has quit (Read error: Operation timed out)
[17:54] *** jspiros has quit (Read error: Operation timed out)
[17:54] *** SN4T14 has quit (Read error: Operation timed out)
[17:54] *** matthusby has quit (Ping timeout: 246 seconds)
[17:54] *** simpwork has quit (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[17:55] *** chfoo- has quit (Ping timeout: 246 seconds)
[17:55] *** remsen has quit (Ping timeout: 246 seconds)
[17:55] *** rduser has quit (Ping timeout: 246 seconds)
[17:58] *** dashcloud (~quassel@[redacted]) has joined #archiveteam-bs
[18:02] *** acridAxid (~acridAxid@[redacted]) has joined #archiveteam-bs
[18:02] *** remsen (~remsen@[redacted]) has joined #archiveteam-bs
[18:02] *** SN4T14 (~SN4T14@[redacted]) has joined #archiveteam-bs
[18:03] *** rduser (~rduser@[redacted]) has joined #archiveteam-bs
[18:04] *** mr-b (~mr-b@[redacted]) has joined #archiveteam-bs
[18:20] *** chfoo- (~chfooZnc@[redacted]) has joined #archiveteam-bs
[18:24] just a quick poll, looking at getting a domain, these new domain extensions worth it?
[18:28] No. (Though, full disclosure, I do own a .ninja domain, just because.)
[18:32] *** snape is strictly old-school with TLDs - .com if you sell stuff, .net if you're an ISP, .org if you're not, and .biz if you're a spammer. /s
[18:33] nah spammers use .info because it's $3
[18:38] Oh yeah, I forgot about .info. It's like the new .name, which was the new .me, which was the new .biz... stupid spammers. :/
[18:40] *** metalcamp (~metalcamp@[redacted]) has joined #archiveteam-bs
[18:44] yeah, have like 3 domains but these new domain tlds are tempting, but they're only there to address the mass hoarding of people buying and sitting on dead domains
[18:51] Make sure it's a TLD that allows private/anon registration. I reg'd a no-private-registrations domain in my cat's name in 2008 and I still get spam (and physical junk mail) addressed to him from that today. Got some kind of fricking campaign flyer in the mail on Monday, actually. :/
[18:52] SimpBrain: heck no, better the devel (er devil) you know.
[18:53] SketchCow: is there a tutorial or getting started guide for contributing ISOs to Archive.org's Shareware CD Archive?
[18:54] *** Darkstar (~darkstar@[redacted]) has joined #archiveteam-bs
[18:58] thx will avoid them
[19:06] I really want to see .stupid as a tld
[19:06] imagine .exe as a tld
[19:06] argh https://gethttpsforfree.com/ does not support renewing :(
[19:06] .scr
[19:09] *** bwn has quit (Ping timeout: 250 seconds)
[19:09] I think TLDs officially jumped the shark when we got .moe and .shiksa, nothing will surprise me anymore...
[19:10] hey .moe is the best thing to happen to the internet since TCP/IP
[19:14] I would also support .exe TLDs because then we could have rockman.exe and hell yeah
[19:14] *** snape thinks .moe is one of the worst things to happen to the Internet since fingering users was a thing, but to each their own...
[19:17] to be fair, the only .moe domain I really know of is llvm.moe
[19:18] There was also archive.moe which was a 4chan archive site
[19:19] That's actually a perfect use of that TLD, IMO
[19:19] moe.moe.moe should be an invisiblecow clone with accent
[19:21] ICANN and AR will make anime real
[19:21] imagine the possibilities
[19:21] all-year anime conventions
[19:21] I don't think I've knowingly visited one, but I bet we can guess twenty working websites without even trying. desu.moe, pantsu.moe, baka.moe, oneesame.moe, tsundere.moe, waifu.moe, galge.moe... I'm sure at least three of those exist, probably more. Sigh.
[19:21] I can see literally nothing wrong with this
[19:22] wait, what does moe stand for?
[19:22] *** SimpBrain is confused too
[19:23] It's wapanese for "cute".
[19:23] oh...
[19:23] *cringe*
[19:26] it is probably helpful to know that I have factored out the /s from everything I write
[19:27] Yeah, I figured that out after the halfchan comment, lol.
[19:28] swebb, we should probably talk about al jazeera here
[19:29] Would youtube-dl work if I set up a grab-site pipeline with it, and just used http://video.aljazeera.com/channels/eng etc?
[19:39] *** bwn (~bwn@[redacted]) has joined #archiveteam-bs
[19:40] Five of those seven .moe domains I guessed are registered. I'm not brave enough to see if there are websites, tho, lol.
[19:40] it seems to be doing it
[19:46] oh I was hoping microsoft.moe would have chibized OS-tans
[19:47] Microsoft, get on that
[19:50] acridAxid: What do you have and how many?
[19:50] I occasionally scan the general software pile to find ones people uploaded.
[19:59] can I throw things straight at the web collection?
[20:01] Not sure.
[20:04] I'm grabbing the Video site html too.
[20:04] 27M AlJazeeraVideo-20160224195632985
[20:04] 680M AlJazeera-20160224193701857
[20:04] I didn't include that one in the original crawl.
[20:05] Actually, it would be best to have them all together, huh? I should just put them all together.
[20:06] Do you have the BW to do so?
[20:07] Sure.
[20:07] but I can't crawl the video like you're doing.
[20:08] I was talking about the heritrix crawls. :)
[20:08] I've got 1Gbit to my home.
[20:08] better to have 2 copies of it, than no video
[20:08] I'm not sure how to set up the video crawl stuff that you're doing.
[20:09] *** Start (~Start@[redacted]) has joined #archiveteam-bs
[20:10] The video site shows 11164 videos. :)
[20:23] What are you using to pipe output to the webpage like that?
[20:24] its all grab-site
[20:24] its the same system archivebot uses
[20:26] *** metalcamp has quit (Ping timeout: 252 seconds)
[20:38] well kinda
[21:01] *** bsmith093 (68f4c20a@[redacted]) has joined #archiveteam-bs
[21:21] grab-site's ignores have diverged
[21:24] I know what my prob
[21:25] problem was, megawarc is for rsyncs in, not me running grab-site
[21:44] swebb, I just gor IPbanned from them
[21:44] got
[21:44] Crap
[21:45] and for some reason, grab-site has decided to crawl my hosting provider's website
[21:46] Ill reboot it, and see what its done
[21:46] I'm not banned as far as I can tell
[21:46] You may have been hitting them too hard.
[21:47] Yea. Ill slow down
[21:47] ive rebooted, and they are happy and its downloading
[22:00] *** schbirid has quit (Quit: Leaving)
[22:00] swebb, how are you getting the files to the IA?
[22:01] https://pypi.python.org/pypi/internetarchive
[22:01] Works really well.
[22:02] Yea. Can you just feed it a folder and say "upload all this"
[22:02] Yep
[22:02] you can overwrite files too
[22:02] ah cool. I was in the middle of writing some hugely complex script to do it, but then realised there must be a better way. Ive used that library before, forgot it exists
[22:03] I started uploading and then re-started it again and it just overwrote all of the old files with the same filename just as you would suspect. I was afraid that it would make new copies for dupe filenames.
[22:03] Ill wait, as with the videos, there is a large gap as it caches them, before it makes the warc
[22:04] ive got 2tb, and 100Mbps connection, so ill probably be able to upload in the morning
[22:07] ooh, swebb can you tell it to delete the file after its uploaded?
[22:11] I don't believe so.
[22:12] Thanks
[22:13] Add --metadata="mediatype:web" --metadata="collection:archiveteam" to your upload
[22:13] and --metadata="subject:archiveteam"
[22:13] Am I allowed to throw directly into the archiveteam collection?
[22:14] no
[22:14] Upload it to opensource and let SketchCow move it
[22:14] throught so, will do that arkiver
[22:14] thought
[22:14] oh, ok
[22:27] Yes
[22:27] (Do it that way)
[22:59] *** JetBalsa (~jetbalsa@[redacted]) has joined #archiveteam-bs
[23:15] *** Start_ (~Start@[redacted]) has joined #archiveteam-bs
[23:15] *** Start has quit (Read error: Connection reset by peer)
[23:17] *** ndiddy (~Nathan@[redacted]) has joined #archiveteam-bs
[23:19] *** logan2 (~a@[redacted]) has joined #archiveteam-bs
[23:20] *** logan has quit (Ping timeout: 252 seconds)
[23:49] HCross, internetarchive includes --delete
[23:50] or delete=True if you're importing the library
[23:50] Thanks
[23:51] Ideally I want to point it at a directory, and then say "upload all the warcs in all subfolders here. The metadata is xxx, and then delete after". I think ive got a script that does that
[23:52] absurdly easy if they're all going into one item, slightly harder if not
[23:53] *** ndiddy has quit (Read error: Connection reset by peer)
[23:53] yea, its all in one. Easy as pie
[23:54] uploading 4 hour old files, so I know they are done
[23:54] https://harrycross.me/ecd.png
[23:54] *** ndiddy (~Nathan@[redacted]) has joined #archiveteam-bs
[23:54] ^ that code is probably completely rubbish
[23:57] I am slightly bemused that you found it easier to screenshot and upload that than just copy and paste it into a file...
[23:58] ive got a little tool on my taskbar, I just select the area I want to show, it takes the shot, crops it, and uploads it
[23:58] and then copies the link to my clipboard
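
Editor's note: the upload workflow discussed above (find every WARC under a directory, upload them all into one item with the suggested metadata, then delete the local copies) could look roughly like the sketch below. This is not the script from the screenshot, which is not reproduced in the log. The helper names `find_warcs`, `build_metadata` and `upload_warcs` and the item identifier are hypothetical; the metadata values and the `delete=True` option are taken from the conversation, and the sketch assumes the `internetarchive` package is installed and credentials are already configured.

```python
from pathlib import Path


def find_warcs(root):
    """Return every .warc or .warc.gz file below root, in sorted order."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".warc", ".warc.gz"))
    )


def build_metadata():
    """Metadata suggested in the channel; 'opensource' is the staging collection."""
    return {
        "mediatype": "web",
        "collection": "opensource",
        "subject": "archiveteam",
    }


def upload_warcs(identifier, root, upload_func=None, delete=False):
    """Upload all WARCs under root into one item and return the paths sent.

    upload_func is injectable (hypothetical convenience) so the file
    selection can be checked without touching the network.
    """
    if upload_func is None:
        # Imported lazily so the helpers above work without the package.
        from internetarchive import upload as upload_func
    files = [str(p) for p in find_warcs(root)]
    if not files:
        return []
    # delete=True asks the library to remove local files after uploading.
    upload_func(identifier, files=files, metadata=build_metadata(), delete=delete)
    return files
```

A real run would be something like `upload_warcs("some-item-identifier", "/path/to/warcs", delete=True)`, where both arguments are placeholders. Passing a stand-in function as `upload_func` first is a cheap way to see which files would be sent before anything is deleted.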