#archiveteam-ot 2019-06-09,Sun

Time Nickname Message
00:08 🔗 icedice has quit IRC (Leaving)
00:11 🔗 bitbit has quit IRC (Remote host closed the connection)
00:14 🔗 Raccoon has joined #archiveteam-ot
01:13 🔗 BlueMax has quit IRC (Quit: Leaving)
03:30 🔗 dhyan_nat has joined #archiveteam-ot
03:32 🔗 odemg has quit IRC (Ping timeout: 265 seconds)
03:45 🔗 odemg has joined #archiveteam-ot
04:10 🔗 BlueMax has joined #archiveteam-ot
04:36 🔗 robogoat has quit IRC (Ping timeout: 255 seconds)
04:58 🔗 robogoat has joined #archiveteam-ot
05:12 🔗 wp494 has quit IRC (Ping timeout: 268 seconds)
05:12 🔗 wp494 has joined #archiveteam-ot
05:59 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
05:59 🔗 Mateon1 has joined #archiveteam-ot
09:33 🔗 VoynichCr JAA: cool, found this searching "archives" https://twitter.com/archivesnext/lists/archives-on-twitter/members
09:34 🔗 VoynichCr more https://twitter.com/loharasukidog/lists/archives-n-such/members
09:54 🔗 VoynichCr we neeeeeed snscrape ircbot
10:10 🔗 VoynichCr JAA: can you add scraping for twitter-list-members?
10:46 🔗 killsushi has quit IRC (Quit: Leaving)
12:16 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
13:37 🔗 Raccoon has quit IRC (Ping timeout: 265 seconds)
13:37 🔗 Raccoon has joined #archiveteam-ot
14:15 🔗 JAA VoynichCr: Yup. I'll probably make it return users, not Tweets.
14:25 🔗 alex__ has joined #archiveteam-ot
14:26 🔗 alex__ has quit IRC (Remote host closed the connection)
14:26 🔗 alex__ has joined #archiveteam-ot
14:36 🔗 Raccoon has quit IRC (Ping timeout: 252 seconds)
14:48 🔗 alex__ has quit IRC (Ping timeout: 255 seconds)
14:48 🔗 alex__ has joined #archiveteam-ot
14:53 🔗 VoynichCr JAA: yeah
15:01 🔗 alex__ has quit IRC (alex__)
15:04 🔗 alex__ has joined #archiveteam-ot
15:19 🔗 chirlu has quit IRC (Read error: Operation timed out)
15:39 🔗 chirlu has joined #archiveteam-ot
16:21 🔗 icedice has joined #archiveteam-ot
16:37 🔗 DogsRNice has joined #archiveteam-ot
17:17 🔗 alex__ has quit IRC (Quit: alex__)
18:24 🔗 SuperCalc has joined #archiveteam-ot
18:24 🔗 JAA Continuing from #archivebot
18:24 🔗 SuperCalc yeah
18:24 🔗 SuperCalc so are there any other options for what I'm trying to do?
18:25 🔗 JAA SuperCalc: I don't know enough about MtG/card trading to give you a real answer, but what you're looking for is a scraper, a tool that extract the information you're interested in from a website.
18:25 🔗 JAA extracts*
18:25 🔗 SuperCalc i see.
18:25 🔗 SuperCalc do scrapers cost anything?
18:27 🔗 JAA Well, complicated question. There is certainly commercial scraping software, but you can also write your own if you know how to code. Then you'll need a machine to actually run the scraping on, storage for the results, network, etc. You will likely also find that websites don't exactly like scrapers, since they can generate significant server load.
18:28 🔗 SuperCalc i see
18:28 🔗 SuperCalc well, scryfall doesnt own the copyright to the cards they're databasing, they're an unofficial mtg site
18:28 🔗 SuperCalc and i only would need to do the scraping once a year probably
18:28 🔗 SuperCalc well, once a year except this year
18:29 🔗 SuperCalc if scryfall has a problem with it they can just tell me to stop right?
18:29 🔗 JAA If they know how to contact you, yes.
18:29 🔗 JAA Or they'll just ban your IP etc.
18:29 🔗 SuperCalc ah
18:29 🔗 SuperCalc ill have to find out what their policy is on data scraping then
18:30 🔗 JAA Is this of any use to you? https://scryfall.com/docs/api/bulk-data
18:30 🔗 SuperCalc no
18:30 🔗 SuperCalc the bulk data does not include pricing
18:30 🔗 SuperCalc unfortunately
18:30 🔗 JAA Ah
18:30 🔗 SuperCalc it says that in their api documentation
18:31 🔗 SuperCalc ill see if i can find out if their site says anything about data scraping
18:31 🔗 SuperCalc hang on
18:31 🔗 JAA The API itself does seem to have pricing information, and likely also has clear rate limits somewhere.
18:31 🔗 JAA 10 requests per second
18:32 🔗 JAA And 429 responses if you exceed that.
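Editorial sketch of the rate limit mentioned above (10 requests per second, with 429 responses beyond that): a small Python helper that spaces requests out and backs off when it sees a 429. The `polite_get` name, the injected `fetch` callable, and the doubling back-off are illustrative assumptions, not part of Scryfall's API or of anything stated in the channel.

```python
import time
from typing import Callable, Tuple


def polite_get(
    fetch: Callable[[str], Tuple[int, str]],  # hypothetical: returns (status, body)
    url: str,
    *,
    min_interval: float = 0.1,   # 0.1 s between requests is at most 10 per second
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch url, waiting before every attempt and backing off on HTTP 429."""
    delay = min_interval
    for _ in range(max_retries):
        sleep(delay)                 # wait before each request, including the first
        status, body = fetch(url)
        if status == 429:            # server says we are going too fast
            delay *= 2               # assumed policy: double the wait and retry
            continue
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status} for {url}")
        return body
    raise RuntimeError(f"gave up on {url} after {max_retries} attempts")
```

To use it, pass any function that performs the HTTP request and returns a `(status_code, body)` pair; injecting `fetch` and `sleep` keeps the helper testable without touching the network.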
18:32 🔗 SuperCalc found something
18:32 🔗 SuperCalc this was mentioned in their terms of service:
18:32 🔗 SuperCalc You may not scrape Scryfall or solicit information from Scryfall users for the purposes of recruitment, head-hunting, or job boards
18:33 🔗 SuperCalc but since i'm not doing this for any of those purposes, i think i should be fine
18:33 🔗 SuperCalc question
18:34 🔗 SuperCalc if i do use a data scraper, will i need more storage space to put it on?
18:34 🔗 SuperCalc there are 10186 cards in the format
18:34 🔗 JAA Uh, well, you will have to store the scraping results, so yes.
18:34 🔗 JAA But it won't use much storage.
18:34 🔗 SuperCalc how big are scraping results typically?
18:34 🔗 SuperCalc will it fit comfortably on my laptop?
18:34 🔗 JAA There's no way to answer that.
18:34 🔗 SuperCalc hmm
18:35 🔗 JAA It all depends on what you're scraping and how much data you're collecting.
18:35 🔗 SuperCalc right
18:35 🔗 SuperCalc well, if i limit it to just the card images and the card names of cards that are 15 cents or less
18:35 🔗 JAA If you just need a card ID and a price, then that's maybe 20 bytes per card, so it would be under 1 MB before even compressing.
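Editorial check of the estimate above, using the card count SuperCalc gave (10186) and JAA's rough figure of 20 bytes per card. The 20-byte figure is JAA's guess from the conversation, not a measured value.

```python
CARD_COUNT = 10186       # cards in the format, per SuperCalc
BYTES_PER_CARD = 20      # JAA's rough guess for a card ID plus a price

total_bytes = CARD_COUNT * BYTES_PER_CARD
total_megabytes = total_bytes / 1_000_000

print(f"{total_bytes} bytes is about {total_megabytes:.2f} MB")
```

That comes to roughly 0.2 MB, comfortably under the 1 MB bound.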
18:35 🔗 SuperCalc ah then that should probably be good then
18:36 🔗 SuperCalc although card id alone probably wont be enough, it would be better to have the card images and maybe the names too
18:36 🔗 SuperCalc that way people can browse through the legal cards in the format more easily
18:36 🔗 terorie has quit IRC (Ping timeout: 615 seconds)
18:36 🔗 SuperCalc without having to constantly copy and paste a code
18:36 🔗 SuperCalc into a card pricer
18:37 🔗 SuperCalc or something
18:37 🔗 SuperCalc sorry i mean the gatherer
18:37 🔗 SuperCalc not card pricer
18:37 🔗 SuperCalc since the prices would be just from the day of rotation
18:37 🔗 SuperCalc misspoke
18:38 🔗 SuperCalc found a data scraper app that doesnt require me to know how to code
18:38 🔗 SuperCalc it was on a list of best free data scrapers
18:38 🔗 JAA Images will obviously make it much larger. In any case, grab a small sample, e.g. 1 % or 1 ‰, and extrapolate from that.
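Editorial sketch of the sample-and-extrapolate approach: measure the size of a small sample of scraped cards, then scale the average up to the full card count. The function name and the sample sizes in the usage note are hypothetical; real image sizes would have to be measured.

```python
from typing import Sequence


def extrapolate_total_bytes(sample_sizes: Sequence[int], population: int) -> int:
    """Estimate total storage from the byte sizes of a small sample of items."""
    if not sample_sizes:
        raise ValueError("need at least one sampled item")
    mean_size = sum(sample_sizes) / len(sample_sizes)  # average bytes per item
    return round(mean_size * population)               # scale to the full set
```

For example, a 1 % sample of the 10186 cards would be about 102 cards; passing their measured sizes and `population=10186` gives the estimate. The result is only as good as the sample is representative.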
18:38 🔗 SuperCalc ok sounds good
18:38 🔗 SuperCalc thanks! :)
18:50 🔗 SuperCalc well
18:51 🔗 SuperCalc ive read a bit more on the legal issues of data scraping
18:51 🔗 SuperCalc and it suddenly seems like just because what i want to do is technically allowed by scryfall's tos
18:51 🔗 SuperCalc doesnt mean the law will agree with me
18:51 🔗 SuperCalc im going to have to send a message to scryfall and ask for permission
18:52 🔗 Ryz has joined #archiveteam-ot
18:52 🔗 Ryz ...E3 2019 is upon us...and prepare to be disappointed and expect leaks lol
19:27 🔗 SuperCalc what do you guys think of this message for their contact us form?
19:28 🔗 SuperCalc or wait
19:28 🔗 SuperCalc it doesnt work in irc chat nvm
19:28 🔗 SuperCalc ill ask somewhere else
20:08 🔗 Ryz Xbox E3 2019 conference started
20:08 🔗 Ryz First, Outer Worlds; then Ninja Theory's new game, then Ori 2
20:11 🔗 icedice has quit IRC (Read error: Operation timed out)
20:50 🔗 SuperCalc has quit IRC (Quit: Page closed)
20:59 🔗 systwi GLaDOS: Thanks dude, will give it a go
21:51 🔗 Stilettoo has joined #archiveteam-ot
21:54 🔗 Stiletto has quit IRC (Ping timeout: 255 seconds)
21:58 🔗 ivan now that we have qwarc can someone please just run it all over twitter
22:28 🔗 Fusl :D
22:42 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
22:44 🔗 JAA Speaking of which, Fusl, how's your Twitter firehose plan doing?
22:44 🔗 Fusl need dem twitter account ids
22:45 🔗 JAA Have you talked to Jason Baumgartner yet?
22:45 🔗 JAA I assume he continued working on it since writing that article.
22:46 🔗 Fusl who is jason baumgartner
22:46 🔗 Fusl oh that guy
22:46 🔗 Fusl no i havent
22:47 🔗 Fusl i'll get in contact with him
22:47 🔗 Fusl on twitter
22:47 🔗 Fusl :P
22:47 🔗 JAA Another thing I'd like to do is collect metadata about Twitter users.
22:47 🔗 JAA All of them, ideally.
22:47 🔗 Fusl metadata?
22:47 🔗 JAA How many tweets, followers, followees, etc.
22:48 🔗 Fusl ah
23:01 🔗 JAA Sigh, Twitter...
23:01 🔗 BlueMax has joined #archiveteam-ot
23:02 🔗 JAA snscrape twitter-search '#dkpol until:2018-08-09' just dies after 1007 results even though there are many, many older tweets.
23:09 🔗 JAA Can reproduce it in Firefox as well. This is weird.
23:11 🔗 JAA Guess I'll just continue with until:2018-08-08. :-/
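Editorial sketch of the workaround above: when a search stalls, restart it with the `until:` date moved back one day. This only builds the query strings in the form JAA used on the command line; it does not call snscrape, and the `until_queries` helper is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator


def until_queries(term: str, start: date, days: int) -> Iterator[str]:
    """Yield search queries whose until: date steps back one day at a time."""
    for offset in range(days):
        day = start - timedelta(days=offset)
        yield f"{term} until:{day.isoformat()}"   # e.g. '#dkpol until:2018-08-09'


for query in until_queries("#dkpol", date(2018, 8, 9), 3):
    print(f"snscrape twitter-search '{query}'")
```

Each printed line is a command to run by hand or from a script; results from overlapping windows would need deduplicating.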
23:13 🔗 wp494 has quit IRC (Read error: Operation timed out)
23:15 🔗 wp494 has joined #archiveteam-ot
23:54 🔗 Odd0002_ has joined #archiveteam-ot
23:56 🔗 Odd0002 has quit IRC (Ping timeout: 252 seconds)
23:56 🔗 Odd0002_ is now known as Odd0002
