#archiveteam-bs 2019-04-07,Sun

Time Nickname Message
00:01 🔗 phirephly has quit IRC (Read error: Operation timed out)
00:04 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
00:12 🔗 phirephly has joined #archiveteam-bs
00:40 🔗 fuzy802 has joined #archiveteam-bs
00:41 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
00:46 🔗 Sgeo_ has joined #archiveteam-bs
00:48 🔗 Sgeo__ has quit IRC (Read error: Operation timed out)
00:50 🔗 fuzy802 is now known as fuzzy8021
01:23 🔗 tech234a has joined #archiveteam-bs
01:40 🔗 glmd has joined #archiveteam-bs
01:53 🔗 enowaldo has joined #archiveteam-bs
01:59 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
02:10 🔗 kode54 has joined #archiveteam-bs
02:29 🔗 glmd has quit IRC (Ping timeout: 260 seconds)
03:18 🔗 Zerote has quit IRC (Ping timeout: 260 seconds)
03:35 🔗 qw3rty116 has joined #archiveteam-bs
03:35 🔗 xit_ has quit IRC (Remote host closed the connection)
03:40 🔗 icedice Has anyone considered making a system similar to #youtubearchive, but for news sites that only host their news broadcasts temporarily?
03:40 🔗 qw3rty115 has quit IRC (Read error: Operation timed out)
03:43 🔗 icedice Finland's public broadcasting company only has its videos up for 30 days, for example
03:43 🔗 icedice https://arenan.yle.fi/tv/program/nyheter
03:43 🔗 icedice https://areena.yle.fi/tv/ohjelmat/uutiset
03:44 🔗 Flashfire I mean if you have the resources I would be happy to help
03:44 🔗 icedice http://flickfetch.bplaced.net/ is quite helpful
03:45 🔗 icedice Not really
03:46 🔗 icedice I mean, I'm related to one of the top IT guys at YLE
03:47 🔗 icedice Not sure if that would even be useful though
03:48 🔗 icedice I don't have much to bring to the table, but I figured that it might be a project worth considering for the public broadcasting companies that don't permanently host their videos
03:56 🔗 drcd has quit IRC (Quit: Leaving)
03:57 🔗 eientei95 https://nyancat.dakko.us/
04:11 🔗 Frogging how much bandwidth can that take? :s
04:16 🔗 icedice How does a short loop like that eat up so much bandwidth?
04:33 🔗 drcd has joined #archiveteam-bs
04:35 🔗 Hani111 has joined #archiveteam-bs
04:44 🔗 Hani has quit IRC (Ping timeout: 615 seconds)
04:44 🔗 Hani111 is now known as Hani
04:49 🔗 Xibalba has joined #archiveteam-bs
04:58 🔗 BlueMaxim has joined #archiveteam-bs
05:07 🔗 BlueMax has quit IRC (Ping timeout: 615 seconds)
05:10 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
05:23 🔗 Exairnous has joined #archiveteam-bs
05:58 🔗 closure has joined #archiveteam-bs
05:59 🔗 LeG0ax has joined #archiveteam-bs
06:00 🔗 atbk has quit IRC (Quit: ZNC - https://znc.in)
06:00 🔗 apache2 has quit IRC (Remote host closed the connection)
06:00 🔗 Dimtree has quit IRC ()
06:00 🔗 kode54 has quit IRC (Quit: Ping timeout (120 seconds))
06:00 🔗 Ing3b0rg has quit IRC (Quit: woopwoop)
06:00 🔗 closure_ has quit IRC (Write error: Broken pipe)
06:00 🔗 MR9K has quit IRC (Write error: Broken pipe)
06:00 🔗 eientei95 has quit IRC (Quit: ZNC 1.7.0+deb0+bionic1 - https://znc.in)
06:00 🔗 acridAxid has quit IRC (Quit: marauder)
06:00 🔗 DFJustin has quit IRC (Remote host closed the connection)
06:00 🔗 atbk has joined #archiveteam-bs
06:00 🔗 apache2 has joined #archiveteam-bs
06:00 🔗 DFJustin has joined #archiveteam-bs
06:00 🔗 LeG0ax is now known as Ing3b0rg
06:00 🔗 kode54 has joined #archiveteam-bs
06:01 🔗 MR9K has joined #archiveteam-bs
06:02 🔗 acridAxid has joined #archiveteam-bs
06:07 🔗 af10b3e5e has quit IRC (Read error: Connection reset by peer)
06:09 🔗 af10b3e5e has joined #archiveteam-bs
06:09 🔗 eientei95 has joined #archiveteam-bs
06:09 🔗 svchfoo1 sets mode: +o eientei95
06:09 🔗 svchfoo3 sets mode: +o eientei95
06:14 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
06:24 🔗 Dimtree has joined #archiveteam-bs
07:33 🔗 Exairnous has quit IRC (Ping timeout: 246 seconds)
07:44 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
07:46 🔗 fuzzy8021 has joined #archiveteam-bs
08:39 🔗 godane has quit IRC (Ping timeout: 268 seconds)
08:40 🔗 Smiley has quit IRC (Read error: Operation timed out)
08:47 🔗 Smiley has joined #archiveteam-bs
08:55 🔗 godane has joined #archiveteam-bs
09:13 🔗 enowaldo has joined #archiveteam-bs
09:18 🔗 enowaldo has quit IRC (Ping timeout: 265 seconds)
10:12 🔗 godane has quit IRC (Ping timeout: 615 seconds)
10:24 🔗 SilSte has joined #archiveteam-bs
10:28 🔗 godane has joined #archiveteam-bs
10:34 🔗 Zerote has joined #archiveteam-bs
10:46 🔗 BlueMaxim has quit IRC (Leaving)
11:14 🔗 enowaldo has joined #archiveteam-bs
11:23 🔗 enowaldo has quit IRC (Ping timeout: 492 seconds)
12:34 🔗 joepie91 Frogging: icedice: it's streaming essentially uncompressed image data continuously, and incentivizes people to stay connected for as long as possible to make the counter go up, so.. :P
12:51 🔗 justas is now known as jut
13:05 🔗 Oddly has joined #archiveteam-bs
13:31 🔗 fuzzy8021 has quit IRC (Ping timeout: 252 seconds)
13:32 🔗 fuzzy8021 has joined #archiveteam-bs
13:36 🔗 alex_ has joined #archiveteam-bs
13:43 🔗 Pixi` has quit IRC (Quit: Pixi`)
13:44 🔗 Pixi has joined #archiveteam-bs
14:16 🔗 enowaldo has joined #archiveteam-bs
14:17 🔗 BartoCH has quit IRC (Ping timeout: 615 seconds)
14:22 🔗 BartoCH has joined #archiveteam-bs
14:33 🔗 BartoCH has quit IRC (Ping timeout: 615 seconds)
14:40 🔗 BartoCH has joined #archiveteam-bs
15:41 🔗 enowaldo has quit IRC (Read error: Operation timed out)
15:44 🔗 tech234a has joined #archiveteam-bs
16:01 🔗 dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
16:03 🔗 enowaldo has joined #archiveteam-bs
16:06 🔗 bitBaron has joined #archiveteam-bs
16:07 🔗 sHATNER has joined #archiveteam-bs
16:13 🔗 alex_ has quit IRC (Quit: take care ye all. Have fun!)
16:37 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
16:43 🔗 Aoede has quit IRC (Quit: ZNC - https://znc.in)
16:55 🔗 Verified_ has quit IRC (Ping timeout: 252 seconds)
17:01 🔗 enowaldo has quit IRC (Read error: Operation timed out)
17:08 🔗 ndiddy has joined #archiveteam-bs
17:11 🔗 enowaldo has joined #archiveteam-bs
17:25 🔗 Verified_ has joined #archiveteam-bs
17:37 🔗 Terbium has quit IRC (Quit: Terbium)
17:37 🔗 Terbium has joined #archiveteam-bs
17:53 🔗 Terbium has quit IRC (Quit: Terbium)
17:55 🔗 Aoede has joined #archiveteam-bs
17:56 🔗 svchfoo1 sets mode: +o Aoede
17:56 🔗 svchfoo3 sets mode: +o Aoede
17:56 🔗 Terbium has joined #archiveteam-bs
17:59 🔗 bitBaron has joined #archiveteam-bs
18:03 🔗 schbirid has quit IRC (Remote host closed the connection)
18:16 🔗 enowaldo has quit IRC (Read error: Operation timed out)
18:29 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
18:37 🔗 Exairnous has joined #archiveteam-bs
19:00 🔗 enowaldo has joined #archiveteam-bs
19:06 🔗 bitBaron has joined #archiveteam-bs
19:10 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
19:39 🔗 Zerote has quit IRC (Ping timeout: 260 seconds)
19:47 🔗 kyledrake has joined #archiveteam-bs
20:28 🔗 icedice2 has joined #archiveteam-bs
20:28 🔗 icedice has quit IRC (Ping timeout: 252 seconds)
20:35 🔗 icedice2 has quit IRC (Quit: Leaving)
20:40 🔗 enowaldo has joined #archiveteam-bs
20:43 🔗 sarahlynn has joined #archiveteam-bs
20:45 🔗 sarahlynn has quit IRC (Remote host closed the connection)
20:51 🔗 odemg has joined #archiveteam-bs
20:53 🔗 n00b593 has joined #archiveteam-bs
20:54 🔗 jesso has quit IRC (Quit: jesso)
20:55 🔗 n00b593 Afternoon Archive Team! I'm a representative of a video game preservation group and have a number of websites that are in danger of being lost: one is several TB worth of data, others have download limits (20 per IP address.) Is there a way to work with you folks on archiving this stuff?
20:58 🔗 tech234a n00b593: Welcome! Consider mentioning this in #archivebot for downloading sites to be added to IA.
20:58 🔗 n00b593 Is that the proper channel? Alright!
20:59 🔗 tech234a Yeah, it's good for downloading websites.
20:59 🔗 JAA Depends on the size.
20:59 🔗 tech234a True.
20:59 🔗 n00b593 One is several TB worth of data, the other two are hard to estimate.
20:59 🔗 tech234a n00b593: roughly how many pages are on these sites?
21:00 🔗 n00b593 For the section of one forum I'm interested in, several thousand at least? The full site (an old, out of date Russian forum) is maybe several hundred thousand.
21:01 🔗 n00b593 The one with TB is mainly due to file downloads I believe (it's Microsoft's Xbox marketing page that hasn't been touched in a long while.)
21:01 🔗 n00b593 The one with the download inhibitor is several tens of thousands.
21:02 🔗 tech234a Could you send the URLs here?
21:03 🔗 jesso has joined #archiveteam-bs
21:08 🔗 n00b593 Sorry... One moment.
21:10 🔗 n00b593 mobiles24.co - downloads are limited per IP
21:10 🔗 n00b593 http://phoneky.com/ - downloads are limited per IP (may be more complicated than just switching IPs)
21:12 🔗 n00b593 waper.ru - each page of waper, from http://waper.ru/file/1 to something like http://waper.ru/file/200000 (they have a ridiculous amount of files), has a URL box, so someone would need to write a script that loads the page, gets the URL, downloads that URL, and moves on to the next one.
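The script described in the message above could be sketched roughly as follows. This is a minimal illustration, not a tested crawler for the real site: the log does not show the page markup, so treating the "URL box" as an `<input value="http...">` element is an assumption, and the helper names (`page_urls`, `extract_download_url`, `crawl`) are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional, Tuple
import urllib.request


def page_urls(first: int, last: int) -> Iterator[str]:
    """Yield the numbered file pages, e.g. http://waper.ru/file/1."""
    for n in range(first, last + 1):
        yield f"http://waper.ru/file/{n}"


class _UrlBoxParser(HTMLParser):
    """Remember the value of the first <input> whose value looks like a URL.

    ASSUMPTION: the "URL box" is an <input value="http..."> element;
    the real markup may differ.
    """

    def __init__(self) -> None:
        super().__init__()
        self.found: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "input" and self.found is None:
            value = dict(attrs).get("value") or ""
            if value.startswith(("http://", "https://")):
                self.found = value


def extract_download_url(html: str) -> Optional[str]:
    """Return the URL found in the page's URL box, or None if there is none."""
    parser = _UrlBoxParser()
    parser.feed(html)
    return parser.found


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl(first: int, last: int,
          fetch_fn: Callable[[str], bytes] = fetch
          ) -> Iterator[Tuple[str, str, bytes]]:
    """Yield (page_url, file_url, data) for every page that has a URL box."""
    for page in page_urls(first, last):
        html = fetch_fn(page).decode("utf-8", errors="replace")
        file_url = extract_download_url(html)
        if file_url is None:
            continue  # no URL box on this page; move on to the next one
        yield page, file_url, fetch_fn(file_url)
```

A real run would also need delays between requests, retries, and error handling for missing pages; `fetch_fn` is a parameter so the loop can be exercised against canned pages without touching the network.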
21:15 🔗 n00b593 https://news.xbox.com/en-us/media/ - Xbox Press site with TBs worth of data (that ends up being more than we can handle short of spinning up an AWS instance.)
21:15 🔗 n00b593 (Much of the data is from the original Xbox and thus needs to be archived.)
21:16 🔗 n00b593 What do you think, tech?
21:17 🔗 tech234a Hmm... looks like a lot of stuff. JAA?
21:17 🔗 n00b593 JAA?
21:17 🔗 tech234a (someone else's username)
21:17 🔗 n00b593 Oh.. Figured as much a second after I thought it was an acronym.
21:18 🔗 Zerote has joined #archiveteam-bs
21:21 🔗 JAA Any idea how much at risk these sites are, and whether the content is unique?
21:25 🔗 n00b593 mobiles24.co and phoneky likely contain a lot of software / files that can't be found anywhere else at this point and otherwise need to be curated (there are likely dupes, but it's impossible to know. We have found that at least 30% of the data on these types of sites is unique.) The Xbox Press stuff, if it exists elsewhere, is scattered all over the place. The older stuff is likely unique.
21:25 🔗 n00b593 waper.ru likely falls into the "at least 30%" range.
21:28 🔗 n00b593 Even if 70% of those sites are duplicates, they still represent niche data that is being lost and is no longer being added anywhere on the net.
21:30 🔗 n00b593 Wide swathes of this data have already been lost, it is now essentially forgotten, and now it's at risk of being destroyed.
21:32 🔗 m007a83_ is now known as m007a83
21:35 🔗 n00b593 The Xbox stuff is important because these are the raw media assets that, until the mid 2000s, were sent to magazines and media companies to use, which they did, but now all we have (mostly) are scans of low resolution versions.
21:44 🔗 odemgi has joined #archiveteam-bs
21:44 🔗 JAA Are we talking "this might disappear in the coming months" or "shit, this will go down in the next days" here?
22:22 🔗 godane has quit IRC (Ping timeout: 246 seconds)
22:31 🔗 n00b593 JAA: "shit, how is this stuff still up?"
22:33 🔗 n00b593 I can't even estimate if it's either of those, but this stuff is far and away past its shelf life. It's probably only up because someone hasn't noticed the autopayments hitting their credit card for the domain registration and server stuff.
22:44 🔗 n00b593 For the Xbox Press assets, likely a lot of stuff has already been lost due to site redesigns and no one over there caring or having the resources to convert them. So it's unknown.
22:46 🔗 wyatt8740 has joined #archiveteam-bs
22:51 🔗 godane has joined #archiveteam-bs
23:10 🔗 coderobe n00b593: reminds me of many of these java-phone game sites
23:11 🔗 n00b593 Exactly what they are.... There are a few of us who are trying to archive those sites and games to at some point curate and dat them.
23:21 🔗 marked Dat them?
23:22 🔗 enowaldo has quit IRC (Read error: Operation timed out)
23:22 🔗 BlueMax has joined #archiveteam-bs
23:25 🔗 n00b593 Deduplicate: hash them to identify unique files, run other checksums, and log as much information as possible to identify each file.
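The "dat" step described above can be illustrated with a short sketch: record a few checksums per file and group files whose contents are identical. The choice of size, CRC32, MD5 and SHA-1 is an assumption for illustration, as the log does not say which checksums the group records, and the function names are hypothetical.

```python
import hashlib
import zlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(data: bytes) -> Dict[str, object]:
    """Return size plus several checksums identifying one file's contents."""
    return {
        "size": len(data),
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def deduplicate(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group file names by the SHA-1 of their contents.

    Each key is one unique file; the list holds every name it was found under.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(files):
        groups[fingerprint(files[name])["sha1"]].append(name)
    return dict(groups)
```

For example, passing `{"a.jar": b"game", "b.jar": b"game", "c.jar": b"other"}` to `deduplicate` yields two groups, one listing both `a.jar` and `b.jar`. For multi-gigabyte files the hashes would be better computed in chunks rather than from a whole file held in memory.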
23:28 🔗 n00b593 BlueMaxim is part of the effort and we've discussed these sites in the past, but the need for custom scripts and especially the per-IP download limits add additional obstacles.
23:29 🔗 BlueMax has quit IRC (Quit: Leaving)
23:30 🔗 Sgeo has joined #archiveteam-bs
23:31 🔗 Sgeo_ has quit IRC (Read error: Operation timed out)
23:33 🔗 BlueMax has joined #archiveteam-bs
23:34 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
23:45 🔗 BlueMax has quit IRC (Ping timeout: 615 seconds)
23:47 🔗 VADemon 20 downloads per IP per day with the one website?
23:56 🔗 n00b593 Yes.
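To put the confirmed limit of 20 downloads per IP per day in perspective, a rough calculation helps. The file counts used here are placeholders: the log only says "several tens of thousands" for the rate-limited site, so 30,000 is an assumed figure, and `days_needed` is a hypothetical helper.

```python
import math


def days_needed(total_files: int, per_ip_per_day: int = 20,
                ip_count: int = 1) -> int:
    """Days required to fetch total_files under a per-IP daily download cap."""
    if total_files < 0 or per_ip_per_day <= 0 or ip_count <= 0:
        raise ValueError("counts must be positive (total_files may be zero)")
    return math.ceil(total_files / (per_ip_per_day * ip_count))
```

Under these assumptions, 30,000 files from a single IP would take 1,500 days, while spreading the work over 50 IPs would bring that down to 30 days.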
