[00:01] *** phirephly has quit IRC (Read error: Operation timed out)
[00:04] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[00:12] *** phirephly has joined #archiveteam-bs
[00:40] *** fuzy802 has joined #archiveteam-bs
[00:41] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[00:46] *** Sgeo_ has joined #archiveteam-bs
[00:48] *** Sgeo__ has quit IRC (Read error: Operation timed out)
[00:50] *** fuzy802 is now known as fuzzy8021
[01:23] *** tech234a has joined #archiveteam-bs
[01:40] *** glmd has joined #archiveteam-bs
[01:53] *** enowaldo has joined #archiveteam-bs
[01:59] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[02:10] *** kode54 has joined #archiveteam-bs
[02:29] *** glmd has quit IRC (Ping timeout: 260 seconds)
[03:18] *** Zerote has quit IRC (Ping timeout: 260 seconds)
[03:35] *** qw3rty116 has joined #archiveteam-bs
[03:35] *** xit_ has quit IRC (Remote host closed the connection)
[03:40] Has anyone considered making a system similar to #youtubearchive, but for news sites that only host their news broadcasts temporarily?
[03:40] *** qw3rty115 has quit IRC (Read error: Operation timed out)
[03:43] Finland's public broadcasting company only keeps its videos up for 30 days, for example
[03:43] https://arenan.yle.fi/tv/program/nyheter
[03:43] https://areena.yle.fi/tv/ohjelmat/uutiset
[03:44] I mean if you have the resources I would be happy to help
[03:44] http://flickfetch.bplaced.net/ is quite helpful
[03:45] Not really
[03:46] I mean, I'm related to one of the top IT guys at YLE
[03:47] Not sure if that would even be useful though
[03:48] I don't have much to bring to the table, but I figured that it might be a project worth considering for the public broadcasting companies that don't permanently host their videos
[03:56] *** drcd has quit IRC (Quit: Leaving)
[03:57] https://nyancat.dakko.us/
[04:11] how much bandwidth can that take? :s
[04:16] How does a short loop like that eat up so much bandwidth?
[04:33] *** drcd has joined #archiveteam-bs
[04:35] *** Hani111 has joined #archiveteam-bs
[04:44] *** Hani has quit IRC (Ping timeout: 615 seconds)
[04:44] *** Hani111 is now known as Hani
[04:49] *** Xibalba has joined #archiveteam-bs
[04:58] *** BlueMaxim has joined #archiveteam-bs
[05:07] *** BlueMax has quit IRC (Ping timeout: 615 seconds)
[05:10] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[05:23] *** Exairnous has joined #archiveteam-bs
[05:58] *** closure has joined #archiveteam-bs
[05:59] *** LeG0ax has joined #archiveteam-bs
[06:00] *** atbk has quit IRC (Quit: ZNC - https://znc.in)
[06:00] *** apache2 has quit IRC (Remote host closed the connection)
[06:00] *** Dimtree has quit IRC ()
[06:00] *** kode54 has quit IRC (Quit: Ping timeout (120 seconds))
[06:00] *** Ing3b0rg has quit IRC (Quit: woopwoop)
[06:00] *** closure_ has quit IRC (Write error: Broken pipe)
[06:00] *** MR9K has quit IRC (Write error: Broken pipe)
[06:00] *** eientei95 has quit IRC (Quit: ZNC 1.7.0+deb0+bionic1 - https://znc.in)
[06:00] *** acridAxid has quit IRC (Quit: marauder)
[06:00] *** DFJustin has quit IRC (Remote host closed the connection)
[06:00] *** atbk has joined #archiveteam-bs
[06:00] *** apache2 has joined #archiveteam-bs
[06:00] *** DFJustin has joined #archiveteam-bs
[06:00] *** LeG0ax is now known as Ing3b0rg
[06:00] *** kode54 has joined #archiveteam-bs
[06:01] *** MR9K has joined #archiveteam-bs
[06:02] *** acridAxid has joined #archiveteam-bs
[06:07] *** af10b3e5e has quit IRC (Read error: Connection reset by peer)
[06:09] *** af10b3e5e has joined #archiveteam-bs
[06:09] *** eientei95 has joined #archiveteam-bs
[06:09] *** svchfoo1 sets mode: +o eientei95
[06:09] *** svchfoo3 sets mode: +o eientei95
[06:14] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[06:24] *** Dimtree has joined #archiveteam-bs
[07:33] *** Exairnous has quit IRC (Ping timeout: 246 seconds)
[07:44] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[07:46] *** fuzzy8021 has joined #archiveteam-bs
[08:39] *** godane has quit IRC (Ping timeout: 268 seconds)
[08:40] *** Smiley has quit IRC (Read error: Operation timed out)
[08:47] *** Smiley has joined #archiveteam-bs
[08:55] *** godane has joined #archiveteam-bs
[09:13] *** enowaldo has joined #archiveteam-bs
[09:18] *** enowaldo has quit IRC (Ping timeout: 265 seconds)
[10:12] *** godane has quit IRC (Ping timeout: 615 seconds)
[10:24] *** SilSte has joined #archiveteam-bs
[10:28] *** godane has joined #archiveteam-bs
[10:34] *** Zerote has joined #archiveteam-bs
[10:46] *** BlueMaxim has quit IRC (Leaving)
[11:14] *** enowaldo has joined #archiveteam-bs
[11:23] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[12:34] Frogging: icedice: it's streaming essentially uncompressed image data continuously, and incentivizes people to stay connected for as long as possible to make the counter go up, so.. :P
[12:51] *** justas is now known as jut
[13:05] *** Oddly has joined #archiveteam-bs
[13:31] *** fuzzy8021 has quit IRC (Ping timeout: 252 seconds)
[13:32] *** fuzzy8021 has joined #archiveteam-bs
[13:36] *** alex_ has joined #archiveteam-bs
[13:43] *** Pixi` has quit IRC (Quit: Pixi`)
[13:44] *** Pixi has joined #archiveteam-bs
[14:16] *** enowaldo has joined #archiveteam-bs
[14:17] *** BartoCH has quit IRC (Ping timeout: 615 seconds)
[14:22] *** BartoCH has joined #archiveteam-bs
[14:33] *** BartoCH has quit IRC (Ping timeout: 615 seconds)
[14:40] *** BartoCH has joined #archiveteam-bs
[15:41] *** enowaldo has quit IRC (Read error: Operation timed out)
[15:44] *** tech234a has joined #archiveteam-bs
[16:01] *** dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[16:03] *** enowaldo has joined #archiveteam-bs
[16:06] *** bitBaron has joined #archiveteam-bs
[16:07] *** sHATNER has joined #archiveteam-bs
[16:13] *** alex_ has quit IRC (Quit: take care ye all. Have fun!)
[16:37] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[16:43] *** Aoede has quit IRC (Quit: ZNC - https://znc.in)
[16:55] *** Verified_ has quit IRC (Ping timeout: 252 seconds)
[17:01] *** enowaldo has quit IRC (Read error: Operation timed out)
[17:08] *** ndiddy has joined #archiveteam-bs
[17:11] *** enowaldo has joined #archiveteam-bs
[17:25] *** Verified_ has joined #archiveteam-bs
[17:37] *** Terbium has quit IRC (Quit: Terbium)
[17:37] *** Terbium has joined #archiveteam-bs
[17:53] *** Terbium has quit IRC (Quit: Terbium)
[17:55] *** Aoede has joined #archiveteam-bs
[17:56] *** svchfoo1 sets mode: +o Aoede
[17:56] *** svchfoo3 sets mode: +o Aoede
[17:56] *** Terbium has joined #archiveteam-bs
[17:59] *** bitBaron has joined #archiveteam-bs
[18:03] *** schbirid has quit IRC (Remote host closed the connection)
[18:16] *** enowaldo has quit IRC (Read error: Operation timed out)
[18:29] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[18:37] *** Exairnous has joined #archiveteam-bs
[19:00] *** enowaldo has joined #archiveteam-bs
[19:06] *** bitBaron has joined #archiveteam-bs
[19:10] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[19:39] *** Zerote has quit IRC (Ping timeout: 260 seconds)
[19:47] *** kyledrake has joined #archiveteam-bs
[20:28] *** icedice2 has joined #archiveteam-bs
[20:28] *** icedice has quit IRC (Ping timeout: 252 seconds)
[20:35] *** icedice2 has quit IRC (Quit: Leaving)
[20:40] *** enowaldo has joined #archiveteam-bs
[20:43] *** sarahlynn has joined #archiveteam-bs
[20:45] *** sarahlynn has quit IRC (Remote host closed the connection)
[20:51] *** odemg has joined #archiveteam-bs
[20:53] *** n00b593 has joined #archiveteam-bs
[20:54] *** jesso has quit IRC (Quit: jesso)
[20:55] Afternoon Archive Team! I'm a representative of a video game preservation group and have a number of websites that are in danger of being lost: one is several TB worth of data, others have download limits (20 per IP address.) Is there a way to work with you folks on archiving this stuff?
[20:58] n00b593: Welcome! Consider mentioning this in #archivebot for downloading sites to be added to IA.
[20:58] Is that the proper channel? Alright!
[20:59] Yeah, it's good for downloading websites.
[20:59] Depends on the size.
[20:59] True.
[20:59] One is several TB worth of data, the other two are hard to estimate.
[20:59] n00b593: roughly how many pages are on these sites?
[21:00] For the section of one forum I'm interested in, several thousand at least? The full site (an old, out of date Russian forum) is maybe several hundred thousand.
[21:01] The one with TB is mainly due to file downloads I believe (it's Microsoft's Xbox marketing page that hasn't been touched in a long while.)
[21:01] The one with the download inhibitor is several tens of thousands.
[21:02] Could you send the URLs here?
[21:03] *** jesso has joined #archiveteam-bs
[21:08] Sorry... One moment.
[21:10] mobiles24.co - downloads are limited per IP
[21:10] http://phoneky.com/ - downloads are limited per IP (may be more complicated than just switching IPs)
[21:12] waper.ru - on each page of waper from http://waper.ru/file/1 to something like http://waper.ru/file/200000 (they have a ridiculous amount of files) you can see that they have a URL box so someone would need to write a script that would load the page, get the URL, then download that URL, and move onto the next one.
[21:15] https://news.xbox.com/en-us/media/ - Xbox Press site with TBs worth of data (that ends up being more than we can handle short of spinning up an AWS instance.)
[21:15] (Much of the data is from the original Xbox and thus needs to be archived.)
[21:16] What do you think, tech?
[21:17] Hmm... looks like a lot of stuff. JAA?
[21:17] JAA?
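Editor's sketch of the waper.ru script described at 21:12 (load each numbered page, read the URL out of the box, move on to the next one). This is only an illustration under stated assumptions: the log never shows the page markup, so the code assumes the "URL box" is an `<input>` element whose `value` attribute holds an http(s) URL. `UrlBoxParser`, `extract_file_url`, `crawl`, and the one-second delay are hypothetical names and values, not anything used in the channel.

```python
import re
import time
import urllib.request
from html.parser import HTMLParser


class UrlBoxParser(HTMLParser):
    """Collect the values of <input> elements that look like http(s) URLs."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        # ASSUMPTION: the download link sits in the input's value attribute.
        value = dict(attrs).get("value") or ""
        if re.match(r"https?://", value):
            self.urls.append(value)


def extract_file_url(page_html):
    """Return the first URL found in an input box, or None if there is none."""
    parser = UrlBoxParser()
    parser.feed(page_html)
    return parser.urls[0] if parser.urls else None


def crawl(first_id, last_id, delay_seconds=1.0):
    """Yield (file_id, file_url) for every numbered page that exposes a URL."""
    for file_id in range(first_id, last_id + 1):
        page_url = f"http://waper.ru/file/{file_id}"
        try:
            with urllib.request.urlopen(page_url, timeout=30) as response:
                page_html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # missing or unreachable page: skip to the next id
        file_url = extract_file_url(page_html)
        if file_url:
            yield file_id, file_url
        time.sleep(delay_seconds)  # hypothetical politeness delay
```

The download step itself is left to the caller, for example by passing each yielded URL to `urllib.request.urlretrieve`. Keeping the parsing in `extract_file_url` means it can be checked against a saved page before any crawl is started.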
[21:17] (someone else's username)
[21:17] Oh.. Figured as much a second after I thought it was an acronym.
[21:18] *** Zerote has joined #archiveteam-bs
[21:21] Any idea how much at risk these sites are, and whether the content is unique?
[21:25] mobiles24.co and phoneky likely contain a lot of software / files that can't be found anywhere else at this point and otherwise need to be curated (there are likely dupes, but it's impossible to know. We have found at least 30% of these types of sites are unique data.) The Xbox Press stuff, if it exists elsewhere, is scattered all over the place. The older stuff is likely unique.
[21:25] waper.ru likely falls into the "at least 30%" range.
[21:28] Even if 70% of those sites are duplicates, they still represent niche data that is being lost and not added to anywhere on the net anymore.
[21:30] Wide swathes of this data have already been lost, it is now essentially forgotten, and now it's at risk of being destroyed.
[21:32] *** m007a83_ is now known as m007a83
[21:35] The Xbox stuff is important because these are the raw media assets that, until the mid 2000s, were sent to magazines and media companies to use, which they did, but now all we have (mostly) are scans of low resolution versions.
[21:44] *** odemgi has joined #archiveteam-bs
[21:44] Are we talking "this might disappear in the coming months" or "shit, this will go down in the next days" here?
[22:22] *** godane has quit IRC (Ping timeout: 246 seconds)
[22:31] JAA: "shit, how is this stuff still up?"
[22:33] I can't even estimate if it's either of those, but this stuff is far and away past its shelf life. It's probably only up because someone hasn't noticed the autopayments hitting their credit card for the domain registration and server stuff.
[22:44] For the Xbox Press assets, likely a lot of stuff has already been lost due to site redesigns and no one over there caring or having the resources to convert them. So it's unknown.
[22:46] *** wyatt8740 has joined #archiveteam-bs
[22:51] *** godane has joined #archiveteam-bs
[23:10] n00b593: reminds me of many of these java-phone game sites
[23:11] Exactly what they're.... There are a few of us who are trying to archive those sites and games to at some point curate and dat them.
[23:21] Dat them?
[23:22] *** enowaldo has quit IRC (Read error: Operation timed out)
[23:22] *** BlueMax has joined #archiveteam-bs
[23:25] Deduplicate, hash them to make them unique, run other checksums and log as much information to identify unique files.
[23:28] BlueMaxim is part of the effort and we've discussed these sites in the past, but the need for custom scripts and especially the IP limited download limits add additional obstacles.
[23:29] *** BlueMax has quit IRC (Quit: Leaving)
[23:30] *** Sgeo has joined #archiveteam-bs
[23:31] *** Sgeo_ has quit IRC (Read error: Operation timed out)
[23:33] *** BlueMax has joined #archiveteam-bs
[23:34] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[23:45] *** BlueMax has quit IRC (Ping timeout: 615 seconds)
[23:47] 20 downloads per IP per day with the one website?
[23:56] Yes.
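Editor's sketch of the "dat them" step described at 23:25 (deduplicate, hash, log identifying information). The log does not say which tools or checksums the group uses, so this is a minimal illustration only: SHA-256 is an assumed choice, and `sha256_of`, `group_by_hash`, and `duplicates` are hypothetical helper names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(root):
    """Map each SHA-256 digest to the files under root that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return dict(groups)


def duplicates(groups):
    """Keep only the digests that more than one file maps to."""
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Running `group_by_hash` over a download directory gives one entry per unique file; the entries returned by `duplicates` are the candidates for dropping all but one copy. Other checksums could be recorded the same way by swapping the `hashlib` constructor.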